hanchenye-llvm-project/llvm/test/CodeGen/AMDGPU
Matt Arsenault e5d9515fb7 DAGCombiner: Don't stop finding better chain on 2 aliases
The comment says this was stopped because it was unlikely to be
profitable. This is not true if you want to combine vector loads
with multiple components.

For a simple case that looks like

t0 = load t0 ...
t1 = load t0 ...
t2 = load t0 ...
t3 = load t0 ...

t4 = store t0:1, t0:1
  t5 = store t4, t1:0
    t6 = store t5, t2:0
	  t7 = store t6, t3:0

We want to get all of these stores onto a chain
that is a TokenFactor of these N loads. This mostly
solves the AMDGPU merge-stores.ll regressions
with -combiner-alias-analysis for merging vector
stores of vector loads.

llvm-svn: 250138
2015-10-13 00:49:00 +00:00
..
32-bit-local-address-space.ll
README
add-debug.ll
add.ll AMDGPU: Add sdst operand to VOP2b instructions 2015-08-29 07:16:50 +00:00
add_i64.ll
address-space.ll PeepholeOptimizer: Remove redundant copies 2015-09-25 20:22:12 +00:00
and.ll AMDGPU: Add some more tests for literal operands 2015-09-25 18:21:47 +00:00
anyext.ll
array-ptr-calc-i32.ll AMDGPU: Add sdst operand to VOP2b instructions 2015-08-29 07:16:50 +00:00
array-ptr-calc-i64.ll AMDGPU: Avoid using 64-bit shift for i64 (shl x, 32) 2015-07-14 18:20:33 +00:00
atomic_cmp_swap_local.ll
atomic_load_add.ll
atomic_load_sub.ll
basic-branch.ll
basic-loop.ll
bfe_uint.ll
bfi_int.ll
big_alu.ll
bitcast.ll
bswap.ll
build_vector.ll
call.ll
call_fs.ll
calling-conventions.ll AMDGPU/SI: Remove calling convention assertion from LowerFormalArguments() 2015-10-06 21:16:34 +00:00
cayman-loop-bug.ll
cf-stack-bug.ll
cf_end.ll
cgp-addressing-modes-flat.ll AMDGPU: Assume SMRD access for constant address space 2015-08-07 20:18:34 +00:00
cgp-addressing-modes.ll AMDGPU/SI: Fold operands through REG_SEQUENCE instructions 2015-09-09 15:43:26 +00:00
coalescer_remat.ll
codegen-prepare-addrmode-sext.ll
combine_vloads.ll
commute-compares.ll
commute-shifts.ll AMDGPU: really don't commute REV opcodes if the target variant doesn't exist 2015-06-26 20:29:10 +00:00
commute_modifiers.ll
complex-folding.ll
concat_vectors.ll
copy-illegal-type.ll
copy-to-reg.ll
ctlz_zero_undef.ll
ctpop.ll
ctpop64.ll AMDGPU: Reduce number of copies emitted 2015-09-24 07:16:37 +00:00
cttz_zero_undef.ll
cvt_f32_ubyte.ll AMDGPU: Add sdst operand to VOP2b instructions 2015-08-29 07:16:50 +00:00
cvt_flr_i32_f32.ll
cvt_rpi_i32_f32.ll
dagcombiner-bug-illegal-vec4-int-to-fp.ll
debug.ll
default-fp-mode.ll
disconnected-predset-break-bug.ll
dot4-folding.ll
drop-mem-operand-move-smrd.ll AMDGPU: Fix dropping mem operands when moving to VALU 2015-08-29 06:48:46 +00:00
ds-negative-offset-addressing-mode-loop.ll AMDGPU: Add sdst operand to VOP2b instructions 2015-08-29 07:16:50 +00:00
ds-sub-offset.ll AMDGPU: Handle sub of constant for DS offset folding 2015-09-08 19:34:22 +00:00
ds_read2.ll DAGCombiner: Combine extract_vector_elt from build_vector 2015-10-12 23:59:50 +00:00
ds_read2_offset_order.ll AMDGPU/SI: Fix read2 merging into a super register. 2015-07-14 17:57:36 +00:00
ds_read2_superreg.ll Introduce target hook for optimizing register copies 2015-09-24 08:36:14 +00:00
ds_read2st64.ll AMDGPU: Add sdst operand to VOP2b instructions 2015-08-29 07:16:50 +00:00
ds_write2.ll AMDGPU/SI: Fix read2 merging into a super register. 2015-07-14 17:57:36 +00:00
ds_write2st64.ll AMDGPU/SI: Fix read2 merging into a super register. 2015-07-14 17:57:36 +00:00
dynamic_stackalloc.ll AMDGPU: Produce error on dynamic_stackalloc 2015-08-26 18:37:13 +00:00
elf.ll AMDGPU/SI: Set ELF OS/ABI to ELFOSABI_AMDGPU_HSA 2015-06-26 21:15:11 +00:00
elf.r600.ll
empty-function.ll
endcf-loop-header.ll
extload-private.ll
extload.ll
extract-vector-elt-i64.ll AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG 2015-09-25 17:27:08 +00:00
extract_vector_elt_i16.ll
fabs.f64.ll
fabs.ll
fadd.ll
fadd64.ll
fceil.ll
fceil64.ll DAGCombiner: Combine extract_vector_elt from build_vector 2015-10-12 23:59:50 +00:00
fcmp-cnd.ll
fcmp-cnde-int-args.ll
fcmp.ll Fix CHECK directives that weren't checking. 2015-08-31 21:10:35 +00:00
fcmp64.ll
fconst64.ll
fcopysign.f32.ll
fcopysign.f64.ll
fdiv.f64.ll
fdiv.ll
fetch-limits.r600.ll
fetch-limits.r700+.ll
ffloor.f64.ll
ffloor.ll
flat-address-space.ll Fix CHECK directives that weren't checking. 2015-08-31 21:10:35 +00:00
floor.ll
fma-combine.ll [DAGCombiner] Improve FMA support for interpolation patterns 2015-09-21 20:32:48 +00:00
fma.f64.ll
fma.ll
fmad.ll
fmax.ll
fmax3.f64.ll
fmax3.ll
fmax_legacy.f64.ll
fmax_legacy.ll
fmaxnum.f64.ll
fmaxnum.ll
fmin.ll
fmin3.ll
fmin_legacy.f64.ll
fmin_legacy.ll
fminnum.f64.ll
fminnum.ll
fmul-2-combine-multi-use.ll Only do fmul (fadd x, x), c combine if the fadd only has one use 2015-07-17 01:14:35 +00:00
fmul.ll
fmul64.ll
fmuladd.ll AMDGPU/SI: Select mad patterns to v_mac_f32 2015-07-13 15:47:57 +00:00
fnearbyint.ll
fneg-fabs.f64.ll
fneg-fabs.ll
fneg.f64.ll
fneg.ll
fp-classify.ll
fp16_to_fp.ll
fp32_to_fp16.ll
fp_to_sint.f64.ll
fp_to_sint.ll
fp_to_uint.f64.ll
fp_to_uint.ll
fpext.ll
fptrunc.ll
frem.ll
fsqrt.ll
fsub.ll
fsub64.ll
ftrunc.f64.ll DAGCombiner: Combine extract_vector_elt from build_vector 2015-10-12 23:59:50 +00:00
ftrunc.ll
gep-address-space.ll DAGCombiner: Combine extract_vector_elt from build_vector 2015-10-12 23:59:50 +00:00
global-directive.ll
global-extload-i1.ll
global-extload-i8.ll
global-extload-i16.ll
global-extload-i32.ll
global-zero-initializer.ll
global_atomics.ll AMDGPU: Fix printing trailing whitespace for mubuf atomics 2015-09-24 07:51:17 +00:00
gv-const-addrspace-fail.ll
gv-const-addrspace.ll AMDGPU: don't match vgpr loads for constant loads 2015-07-27 18:16:08 +00:00
half.ll Introduce target hook for optimizing register copies 2015-09-24 08:36:14 +00:00
hsa.ll AMDGPU/SI: Use .hsatext section instead of .text for HSA 2015-09-25 21:41:28 +00:00
i1-copy-implicit-def.ll
i1-copy-phi.ll
i8-to-double-to-float.ll
icmp-select-sete-reverse-args.ll
icmp64.ll
image-attributes.ll AMDGPU: Add pass to lower OpenCL image and sampler arguments. 2015-08-07 23:19:30 +00:00
image-resource-id.ll AMDGPU: Add pass to lower OpenCL image and sampler arguments. 2015-08-07 23:19:30 +00:00
imm.ll
indirect-addressing-si.ll AMDGPU: Use explicit register size indirect pseudos 2015-10-07 00:42:51 +00:00
indirect-private-64.ll
infinite-loop-evergreen.ll
infinite-loop.ll
inline-asm.ll
inline-calls.ll
input-mods.ll
insert_subreg.ll
insert_vector_elt.ll
invariant-load-no-alias-store.ll DAGCombiner: Assume invariant load cannot alias a store 2015-07-10 22:17:40 +00:00
jump-address.ll
kcache-fold.ll
kernel-args.ll
large-alloca.ll
large-constant-initializer.ll
lds-initializer.ll
lds-oqap-crash.ll
lds-output-queue.ll
lds-size.ll
lds-zero-initializer.ll
legalizedag-bug-expand-setcc.ll
lit.local.cfg
literals.ll
llvm.AMDGPU.abs.ll
llvm.AMDGPU.barrier.global.ll
llvm.AMDGPU.barrier.local.ll
llvm.AMDGPU.bfe.i32.ll AMDGPU: Add sdst operand to VOP2b instructions 2015-08-29 07:16:50 +00:00
llvm.AMDGPU.bfe.u32.ll
llvm.AMDGPU.bfi.ll
llvm.AMDGPU.bfm.ll
llvm.AMDGPU.brev.ll
llvm.AMDGPU.clamp.ll
llvm.AMDGPU.class.ll AMDGPU: Improve accuracy of instruction rates for VOPC 2015-09-25 16:58:25 +00:00
llvm.AMDGPU.cube.ll
llvm.AMDGPU.cvt_f32_ubyte.ll
llvm.AMDGPU.div_fixup.ll
llvm.AMDGPU.div_fmas.ll AMDGPU/SI: Fix extra space when printing v_div_fmas_* 2015-06-28 18:16:14 +00:00
llvm.AMDGPU.div_scale.ll
llvm.AMDGPU.flbit.i32.ll
llvm.AMDGPU.fract.f64.ll AMDGPU/SI: Fix the V_FRACT_F64 SI bug workaround 2015-07-27 11:37:42 +00:00
llvm.AMDGPU.fract.ll
llvm.AMDGPU.imad24.ll
llvm.AMDGPU.imax.ll
llvm.AMDGPU.imin.ll
llvm.AMDGPU.imul24.ll
llvm.AMDGPU.kill.ll
llvm.AMDGPU.ldexp.ll
llvm.AMDGPU.legacy.rsq.ll
llvm.AMDGPU.mul.ll
llvm.AMDGPU.rcp.f64.ll
llvm.AMDGPU.rcp.ll
llvm.AMDGPU.rsq.clamped.f64.ll
llvm.AMDGPU.rsq.clamped.ll
llvm.AMDGPU.rsq.ll
llvm.AMDGPU.tex.ll
llvm.AMDGPU.trig_preop.ll
llvm.AMDGPU.trunc.ll
llvm.AMDGPU.umad24.ll
llvm.AMDGPU.umax.ll
llvm.AMDGPU.umin.ll
llvm.AMDGPU.umul24.ll
llvm.SI.fs.interp.ll
llvm.SI.gather4.ll
llvm.SI.getlod.ll
llvm.SI.image.ll
llvm.SI.image.sample.ll
llvm.SI.image.sample.o.ll
llvm.SI.imageload.ll
llvm.SI.load.dword.ll
llvm.SI.resinfo.ll
llvm.SI.sample-masked.ll
llvm.SI.sample.ll
llvm.SI.sampled.ll
llvm.SI.sendmsg-m0.ll
llvm.SI.sendmsg.ll
llvm.SI.tbuffer.store.ll
llvm.SI.tid.ll
llvm.amdgcn.buffer.wbinvl1.ll AMDGPU: Add cache invalidation instructions. 2015-09-24 19:52:21 +00:00
llvm.amdgcn.buffer.wbinvl1.sc.ll AMDGPU: Add cache invalidation instructions. 2015-09-24 19:52:21 +00:00
llvm.amdgcn.buffer.wbinvl1.vol.ll AMDGPU: Add cache invalidation instructions. 2015-09-24 19:52:21 +00:00
llvm.amdgcn.s.dcache.inv.ll AMDGPU: Add s_dcache_* instructions 2015-09-24 19:52:27 +00:00
llvm.amdgcn.s.dcache.inv.vol.ll AMDGPU: Add s_dcache_* instructions 2015-09-24 19:52:27 +00:00
llvm.amdgcn.s.dcache.wb.ll AMDGPU: Add s_dcache_* instructions 2015-09-24 19:52:27 +00:00
llvm.amdgcn.s.dcache.wb.vol.ll AMDGPU: Add s_dcache_* instructions 2015-09-24 19:52:27 +00:00
llvm.amdgpu.dp4.ll
llvm.amdgpu.kilp.ll
llvm.amdgpu.lrp.ll [DAGCombiner] Improve FMA support for interpolation patterns 2015-09-21 20:32:48 +00:00
llvm.cos.ll
llvm.dbg.value.ll DI: Require subprogram definitions to be distinct 2015-08-28 20:26:49 +00:00
llvm.exp2.ll
llvm.log2.ll
llvm.memcpy.ll
llvm.pow.ll
llvm.rint.f64.ll
llvm.rint.ll
llvm.round.f64.ll Introduce target hook for optimizing register copies 2015-09-24 08:36:14 +00:00
llvm.round.ll AMDGPU/SI: Add support for shrinking v_cndmask_b32_e32 instructions 2015-07-14 14:15:03 +00:00
llvm.sin.ll
llvm.sqrt.ll
load-i1.ll
load-input-fold.ll
load.ll
load.vec.ll
load64.ll
local-64.ll
local-atomics.ll
local-atomics64.ll
local-memory-two-objects.ll AMDGPU: Add sdst operand to VOP2b instructions 2015-08-29 07:16:50 +00:00
local-memory.ll
loop-address.ll
loop-idiom.ll
lshl.ll
lshr.ll
m0-spill.ll
mad-combine.ll AMDGPU/SI: Select mad patterns to v_mac_f32 2015-07-13 15:47:57 +00:00
mad-sub.ll AMDGPU/SI: Select mad patterns to v_mac_f32 2015-07-13 15:47:57 +00:00
mad_int24.ll
mad_uint24.ll
madak.ll AMDGPU/SI: Select mad patterns to v_mac_f32 2015-07-13 15:47:57 +00:00
madmk.ll AMDGPU/SI: Select mad patterns to v_mac_f32 2015-07-13 15:47:57 +00:00
max-literals.ll
max.ll
max3.ll
merge-stores.ll DAGCombiner: Don't stop finding better chain on 2 aliases 2015-10-13 00:49:00 +00:00
min.ll
min3.ll
missing-store.ll
move-addr64-rsrc-dead-subreg-writes.ll Introduce target hook for optimizing register copies 2015-09-24 08:36:14 +00:00
mubuf.ll
mul.ll
mul_int24.ll
mul_uint24.ll AMDGPU: Avoid using 64-bit shift for i64 (shl x, 32) 2015-07-14 18:20:33 +00:00
mulhu.ll
no-initializer-constant-addrspace.ll
no-shrink-extloads.ll
opencl-image-metadata.ll AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering pass 2015-10-01 21:16:05 +00:00
operand-folding.ll AMDGPU: Add sdst operand to VOP2b instructions 2015-08-29 07:16:50 +00:00
operand-spacing.ll
or.ll
packetizer.ll
parallelandifcollapse.ll
parallelorifcollapse.ll
partially-dead-super-register-immediate.ll LiveIntervalAnalysis: Avoid multiple connected liveness components 2015-09-22 22:37:44 +00:00
predicate-dp4.ll
predicates.ll
private-memory-atomics.ll
private-memory-broken.ll
private-memory.ll AMDPGU/SI: Negative offsets aren't allowed in MUBUF's vaddr operand 2015-07-16 19:40:09 +00:00
promote-alloca-bitcast-function.ll AMDGPU: Fix crash if called function is a bitcast 2015-07-28 18:29:14 +00:00
promote-alloca-stored-pointer-value.ll AMDGPU: Don't try to use LDS/vector for private if pointer value stored 2015-07-28 18:47:00 +00:00
pv-packing.ll
pv.ll
r600-encoding.ll
r600-export-fix.ll
r600-infinite-loop-bug-while-reorganizing-vector.ll
r600cfg.ll
reciprocal.ll
register-count-comments.ll AMDGPU/SI: Fix printing useless info with amdhsa 2015-08-15 00:12:39 +00:00
reorder-stores.ll
rotl.i64.ll
rotl.ll
rotr.i64.ll
rotr.ll
rsq.ll
rv7x0_count3.ll
s_movk_i32.ll AMDGPU: Reduce number of copies emitted 2015-09-24 07:16:37 +00:00
saddo.ll
salu-to-valu.ll AMDGPU: Fix splitting x16 SMRD loads 2015-09-28 20:54:52 +00:00
sampler-resource-id.ll AMDGPU: Add pass to lower OpenCL image and sampler arguments. 2015-08-07 23:19:30 +00:00
scalar_to_vector.ll
schedule-fs-loop-nested-if.ll
schedule-fs-loop-nested.ll
schedule-fs-loop.ll
schedule-global-loads.ll
schedule-if-2.ll
schedule-if.ll
schedule-kernel-arg-loads.ll
schedule-vs-if-nested-loop-failure.ll
schedule-vs-if-nested-loop.ll
scratch-buffer.ll AMDGPU: Add sdst operand to VOP2b instructions 2015-08-29 07:16:50 +00:00
sdiv.ll
sdivrem24.ll
sdivrem64.ll
select-i1.ll
select-vectors.ll AMDGPU/SI: Add support for shrinking v_cndmask_b32_e32 instructions 2015-07-14 14:15:03 +00:00
select.ll
select64.ll AMDGPU/SI: Fold operands through REG_SEQUENCE instructions 2015-09-09 15:43:26 +00:00
selectcc-cnd.ll
selectcc-cnde-int.ll
selectcc-icmp-select-float.ll
selectcc-opt.ll
selectcc.ll
set-dx10.ll
setcc-equivalent.ll
setcc-opt.ll AMDGPU/SI: Better handle s_wait insertion 2015-08-21 22:47:27 +00:00
setcc.ll
setcc64.ll
seto.ll
setuo.ll
sext-eliminate.ll
sext-in-reg.ll AMDGPU: Fix not moving users of s_bfe_i64 to VALU 2015-08-26 20:47:58 +00:00
sgpr-control-flow.ll
sgpr-copy-duplicate-operand.ll
sgpr-copy.ll
shared-op-cycle.ll
shl.ll AMDGPU: Avoid using 64-bit shift for i64 (shl x, 32) 2015-07-14 18:20:33 +00:00
shl_add_constant.ll AMDGPU: Add sdst operand to VOP2b instructions 2015-08-29 07:16:50 +00:00
shl_add_ptr.ll AMDGPU: Add sdst operand to VOP2b instructions 2015-08-29 07:16:50 +00:00
si-annotate-cf-assertion.ll
si-annotate-cf.ll
si-instr-info-correct-implicit-operands.ll AMDGPU/SI: Add implicit register operands in the correct order. 2015-07-31 23:30:09 +00:00
si-literal-folding.ll AMDGPU/SI: Add test for folding constants into operands 2015-08-27 17:41:27 +00:00
si-lod-bias.ll
si-sgpr-spill.ll
si-spill-cf.ll
si-triv-disjoint-mem-access.ll AMDGPU: Fix sched model for VOP2b instructions 2015-09-26 02:25:45 +00:00
si-vector-hang.ll
sign_extend.ll
simplify-demanded-bits-build-pair.ll
sint_to_fp.f64.ll AMDGPU: Improve accuracy of instruction rates for some FP instructions 2015-08-22 00:50:41 +00:00
sint_to_fp.ll
smrd.ll AMDGPU/SI: Add support for 32-bit immediate SMRD offsets on CI 2015-08-06 19:28:38 +00:00
split-scalar-i64-add.ll
sra.ll
srem.ll
srl.ll AMDGPU: Avoid using 64-bit shift for i64 (shl x, 32) 2015-07-14 18:20:33 +00:00
ssubo.ll
store-barrier.ll
store-v3i32.ll
store-v3i64.ll
store-vector-ptrs.ll
store.ll
store.r600.ll
store_typed.ll AMDGPU: Add MEM_RAT STORE_TYPED. 2015-10-01 17:51:34 +00:00
structurize.ll
structurize1.ll
sub.ll AMDGPU: Add sdst operand to VOP2b instructions 2015-08-29 07:16:50 +00:00
subreg-coalescer-crash.ll
subreg-coalescer-undef-use.ll Test for specific output in lit test 2015-07-01 22:34:59 +00:00
subreg-eliminate-dead.ll
swizzle-export.ll
tex-clause-antidep.ll
texture-input-merge.ll
trunc-cmp-constant.ll
trunc-store-f64-to-f16.ll
trunc-store-i1.ll
trunc-store.ll AMDGPU: Fix v16i32 to v16i8 truncstore 2015-07-31 04:12:04 +00:00
trunc-vector-store-assertion-failure.ll
trunc.ll
tti-unroll-prefs.ll
uaddo.ll
udiv.ll
udivrem.ll AMDGPU: Add sdst operand to VOP2b instructions 2015-08-29 07:16:50 +00:00
udivrem24.ll
udivrem64.ll
uint_to_fp.f64.ll AMDGPU: Improve accuracy of instruction rates for some FP instructions 2015-08-22 00:50:41 +00:00
uint_to_fp.ll
unaligned-load-store.ll
unhandled-loop-condition-assertion.ll
unroll.ll
unsupported-cc.ll
urecip.ll
urem.ll
use-sgpr-multiple-times.ll PeepholeOptimizer: Remove redundant copies 2015-09-25 20:22:12 +00:00
usubo.ll
v1i64-kernel-arg.ll
v_cndmask.ll
v_mac.ll AMDGPU/SI: Select mad patterns to v_mac_f32 2015-07-13 15:47:57 +00:00
valu-i1.ll AMDGPU: Improve accuracy of instruction rates for VOPC 2015-09-25 16:58:25 +00:00
vector-alloca.ll
vertex-fetch-encoding.ll
vop-shrink.ll AMDGPU: Add sdst operand to VOP2b instructions 2015-08-29 07:16:50 +00:00
vselect.ll AMDGPU/SI: Add support for shrinking v_cndmask_b32_e32 instructions 2015-07-14 14:15:03 +00:00
vselect64.ll
vtx-fetch-branch.ll
vtx-schedule.ll
wait.ll AMDGPU/SI: Better handle s_wait insertion 2015-08-21 22:47:27 +00:00
work-item-intrinsics.ll
wrong-transalu-pos-fix.ll
xor.ll AMDGPU/SI: Add support for shrinking v_cndmask_b32_e32 instructions 2015-07-14 14:15:03 +00:00
zero_extend.ll

README

+==============================================================================+
| How to organize the lit tests                                                |
+==============================================================================+

- If you write a test for matching a single DAG opcode or intrinsic, it should
  go in a file called {opcode_name,intrinsic_name}.ll (e.g. fadd.ll)

- If you write a test that matches several DAG opcodes and checks for a single
  ISA instruction, then that test should go in a file called {ISA_name}.ll (e.g.
  bfi_int.ll

- For all other tests, use your best judgement for organizing tests and naming
  the files.

+==============================================================================+
| Naming conventions                                                           |
+==============================================================================+

- Use dash '-' and not underscore '_' to separate words in file names, unless
  the file is named after a DAG opcode or ISA instruction that has an
  underscore '_' in its name.