hanchenye-llvm-project/llvm/lib
Sanjay Patel ae945e7927 [InstCombine] transform more extract/insert pairs into shuffles (PR2109)
This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229

The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.

The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109

For that example, the IR becomes:

%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2

And x86 SSE output improves from:

movq	(%rdi), %xmm1           ## xmm1 = mem[0],zero
movdqa	%xmm1, %xmm2
shufps	$229, %xmm2, %xmm2      ## xmm2 = xmm2[1,1,2,3]
shufps	$48, %xmm0, %xmm1       ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps	$132, %xmm1, %xmm0      ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps	$32, %xmm0, %xmm2       ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps	$36, %xmm2, %xmm0       ## xmm0 = xmm0[0,1],xmm2[2,0]
retq

To the almost optimal:

movhpd	(%rdi), %xmm0

Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases. 
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.

Differential Revision: http://reviews.llvm.org/D15096

llvm-svn: 256394
2015-12-24 21:17:56 +00:00
..
Analysis Add a missing const qualifier on the context instruction. This somehow 2015-12-24 09:08:08 +00:00
AsmParser Implemented Support of IA interrupt and exception handlers: 2015-12-21 14:07:14 +00:00
Bitcode Remove overly strict new assert in BitcodeReader. 2015-12-21 15:38:13 +00:00
CodeGen Use range-based for loops. NFC 2015-12-24 05:20:40 +00:00
DebugInfo Remove unused constants from TypeTableBuilder.cpp. 2015-12-24 19:15:56 +00:00
ExecutionEngine Delete APIs that have been deprecated since 2010. 2015-12-19 21:42:07 +00:00
Fuzzer [libFuzzer] add AFL-style dictionary for C++, remove the old file with tokens 2015-12-22 01:50:51 +00:00
IR [Function] Properly remove use when clearing personality 2015-12-23 18:27:23 +00:00
IRReader [ThinLTO] Metadata linking for imported functions 2015-12-17 17:14:09 +00:00
LTO Rename variables to reflect linker split (NFC) 2015-12-18 19:28:59 +00:00
LibDriver [Option] Use an ArrayRef to store the Option Infos in OptTable. NFC 2015-10-21 16:30:42 +00:00
LineEditor
Linker Handle empty Subprogram list when linking metadata. 2015-12-22 01:17:19 +00:00
MC Form reform for MCDwarf. 2015-12-23 01:57:31 +00:00
Object Handle archives with paths in the names. 2015-12-18 16:07:17 +00:00
Option Convert Arg, ArgList, and Option to dump() to dbgs() rather than errs(). 2015-12-18 18:55:26 +00:00
Passes [PM] Port StripDeadPrototypes to the new pass manager 2015-10-30 23:28:12 +00:00
ProfileData [ProfileData] Make helper function static. 2015-12-24 10:03:37 +00:00
Support [Support] Allow multiple paired calls to {start,stop}Timer() 2015-12-22 17:36:17 +00:00
TableGen [TblGen] ArrayRefize TGParser. No functional change intended. 2015-10-24 12:46:45 +00:00
Target [X86][ms-inline asm] Add support for memory operands that include structs 2015-12-24 12:09:51 +00:00
Transforms [InstCombine] transform more extract/insert pairs into shuffles (PR2109) 2015-12-24 21:17:56 +00:00
CMakeLists.txt LibDriver, llvm-lib: introduce. 2015-06-09 21:50:22 +00:00
LLVMBuild.txt Wrap some long lines in LLVMBuild files. NFC 2015-06-12 18:44:57 +00:00
Makefile LibDriver, llvm-lib: introduce. 2015-06-09 21:50:22 +00:00