Commit Graph

249 Commits

Author SHA1 Message Date
Evan Cheng 0e7b00d79f Replace all target specific implicit def instructions with a target independent one: TargetInstrInfo::IMPLICIT_DEF.
llvm-svn: 48380
2008-03-15 00:03:38 +00:00
Evan Cheng 5be52a6053 Fix some 80 col violations.
llvm-svn: 48361
2008-03-14 07:46:48 +00:00
Evan Cheng 96bdbd6c5d Fix a number of encoding bugs. SSE 4.1 instructions MPSADBWrri, PINSRDrr, etc. have 8-bits immediate field (ImmT == Imm8).
llvm-svn: 48360
2008-03-14 07:39:27 +00:00
Evan Cheng 99ee78ef63 Clean up my own mess.
X86 lowering normalize vector 0 to v4i32. However DAGCombine can fold (sub x, x) -> 0 after legalization. It can create a zero vector of a type that's not expected (e.g. v8i16). We don't want to disable the optimization since leaving a (sub x, x) is really bad. Add isel patterns for other types of vector 0 to ensure correctness. It's highly unlikely to happen other than in bugpoint reduced test cases.

llvm-svn: 48279
2008-03-12 07:02:50 +00:00
Evan Cheng 95cf661534 Implement x86 support for @llvm.prefetch. It corresponds to prefetcht{0|1|2} and prefetchnta instructions.
llvm-svn: 48042
2008-03-08 00:58:38 +00:00
Evan Cheng 3ea44e4ee9 isTwoAddress = 1 -> Constraints.
llvm-svn: 47941
2008-03-05 08:19:16 +00:00
Evan Cheng 6ec7dc6bea PSLLWri etc. are two-address instructions.
llvm-svn: 47940
2008-03-05 08:11:27 +00:00
Evan Cheng 6200c225e0 - When DAG combiner is folding a bit convert into a BUILD_VECTOR, it should check if it's essentially a SCALAR_TO_VECTOR. Avoid turning (v8i16) <10, u, u, u> to <10, 0, u, u, u, u, u, u>. Instead, simply convert it to a SCALAR_TO_VECTOR of the proper type.
- X86 now normalize SCALAR_TO_VECTOR to (BIT_CONVERT (v4i32 SCALAR_TO_VECTOR)). Get rid of X86ISD::S2VEC.

llvm-svn: 47290
2008-02-18 23:04:32 +00:00
Andrew Lenharth 9b254eed32 llvm.memory.barrier, and impl for x86 and alpha
llvm-svn: 47204
2008-02-16 01:24:58 +00:00
Nate Begeman 8ef50214f0 SSE4.1 64b integer insert/extract pattern support
Move formats into the formats file

llvm-svn: 47035
2008-02-12 22:51:28 +00:00
Nate Begeman 2d77e8e446 Enable SSE4 codegen and pattern matching.
Add some notes to the README.

llvm-svn: 46949
2008-02-11 04:19:36 +00:00
Nate Begeman 3050f74a1d xmm0 variable blends
llvm-svn: 46931
2008-02-10 18:47:57 +00:00
Nate Begeman 727c7634c7 memopv16i8 had wrong alignment requirement, would have broken pabsb
pabs{b,w,d} are not two address
fix extract-to-mem sse4 ops
add sse4 vector sign extend nodes

llvm-svn: 46915
2008-02-09 23:46:37 +00:00
Nate Begeman 6715f755cc Skeleton of insert and extract matching, more to come
llvm-svn: 46902
2008-02-09 01:38:08 +00:00
Nate Begeman e146c0e3fd The rest of the SSE4.1 intrinsic patterns that are obvious to me. Getting
Evan's help with the rest.

llvm-svn: 46697
2008-02-04 06:00:24 +00:00
Nate Begeman ccdfd4aa17 Some more SSE 4.1 intrinsic patterns.
llvm-svn: 46696
2008-02-04 05:34:34 +00:00
Nate Begeman e14fdfaecd SSE 4.1 Intrinsics and detection
llvm-svn: 46681
2008-02-03 07:18:54 +00:00
Chris Lattner a91f77eaac Significantly simplify and improve handling of FP function results on x86-32.
This case returns the value in ST(0) and then has to convert it to an SSE
register.  This causes significant codegen ugliness in some cases.  For 
example in the trivial fp-stack-direct-ret.ll testcase we used to generate:

_bar:
	subl	$28, %esp
	call	L_foo$stub
	fstpl	16(%esp)
	movsd	16(%esp), %xmm0
	movsd	%xmm0, 8(%esp)
	fldl	8(%esp)
	addl	$28, %esp
	ret

because we move the result of foo() into an XMM register, then have to
move it back for the return of bar.

Instead of hacking ever-more special cases into the call result lowering code
we take a much simpler approach: on x86-32, fp return is modeled as always 
returning into an f80 register which is then truncated to f32 or f64 as needed.
Similarly for a result, we model it as an extension to f80 + return.

This exposes the truncate and extensions to the dag combiner, allowing target
independent code to hack on them, eliminating them in this case.  This gives 
us this code for the example above:

_bar:
	subl	$12, %esp
	call	L_foo$stub
	addl	$12, %esp
	ret

The nasty aspect of this is that these conversions are not legal, but we want
the second pass of dag combiner (post-legalize) to be able to hack on them.
To handle this, we lie to legalize and say they are legal, then custom expand
them on entry to the isel pass (PreprocessForFPConvert).  This is gross, but
less gross than the code it is replacing :)

This also allows us to generate better code in several other cases.  For 
example on fp-stack-ret-conv.ll, we now generate:

_test:
	subl	$12, %esp
	call	L_foo$stub
	fstps	8(%esp)
	movl	16(%esp), %eax
	cvtss2sd	8(%esp), %xmm0
	movsd	%xmm0, (%eax)
	addl	$12, %esp
	ret

where before we produced (incidentally, the old bad code is identical to what
gcc produces):

_test:
	subl	$12, %esp
	call	L_foo$stub
	fstpl	(%esp)
	cvtsd2ss	(%esp), %xmm0
	cvtss2sd	%xmm0, %xmm0
	movl	16(%esp), %eax
	movsd	%xmm0, (%eax)
	addl	$12, %esp
	ret

Note that we generate slightly worse code on pr1505b.ll due to a scheduling 
deficiency that is unrelated to this patch.

llvm-svn: 46307
2008-01-24 08:07:48 +00:00
Chris Lattner f4b0c99d63 add some missing flags.
llvm-svn: 45859
2008-01-11 06:59:07 +00:00
Chris Lattner 317332fc2a Start inferring side effect information more aggressively, and fix many bugs in the
x86 backend where instructions were not marked maystore/mayload, and perf issues where
instructions were not marked neverHasSideEffects.  It would be really nice if we could
write patterns for copy instructions.

I have audited all the x86 instructions down to MOVDQAmr.  The flags on others and on
other targets are probably not right in all cases, but no clients currently use this
info that are enabled by default.

llvm-svn: 45829
2008-01-10 07:59:24 +00:00
Chris Lattner aca7ca3730 remove explicit sets of 'neverHasSideEffects' that can now be
inferred from the instr patterns.

llvm-svn: 45824
2008-01-10 05:45:39 +00:00
Chris Lattner a4ce4f6987 rename isLoad -> isSimpleLoad due to evan's desire to have such a predicate.
llvm-svn: 45667
2008-01-06 23:38:27 +00:00
Chris Lattner f3ebc3f3d2 Remove attribution from file headers, per discussion on llvmdev.
llvm-svn: 45418
2007-12-29 20:36:04 +00:00
Evan Cheng 01c7c198ee Fix JIT encoding for CMPSD as well.
llvm-svn: 45268
2007-12-20 19:57:09 +00:00
Bill Wendling b3d85a5d4b Add "mayHaveSideEffects" and "neverHasSideEffects" flags to some instructions. I
based what flag to set on whether it was already marked as
"isRematerializable". If there was a further check to determine if it's "really"
rematerializable, then I marked it as "mayHaveSideEffects" and created a check
in the X86 back-end similar to the remat one.

llvm-svn: 45132
2007-12-17 23:07:56 +00:00
Chris Lattner dab6bd902e Fix the JIT encoding of cmp*ss, which aborts with this assertion currently:
X86CodeEmitter.cpp:378: failed assertion `0 && "Immediate size not set!"'

I *think* this is right, but Evan, please verify.  It also looks like
CMPSDrr and maybe others are missing this info.  Evan, plz investigate.

llvm-svn: 45074
2007-12-16 20:12:41 +00:00
Evan Cheng 23d2d4dc6c Make better use of instructions that clear high bits; fix various 2-wide shuffle bugs.
llvm-svn: 45058
2007-12-15 03:00:47 +00:00
Evan Cheng 6e68381e02 Implicit def instructions, e.g. X86::IMPLICIT_DEF_GR32, are always re-materializable and they should not be spilled.
llvm-svn: 44960
2007-12-12 23:12:09 +00:00
Evan Cheng c829e5cdf0 Remove a bogus optimization. It's not possible to do a move to low element to a <8 x i16> or <16 x i8> vector.
llvm-svn: 44669
2007-12-06 22:14:22 +00:00
Chris Lattner 5728bdd4db Fix a long standing deficiency in the X86 backend: we would
sometimes emit "zero" and "all one" vectors multiple times,
for example:

_test2:
	pcmpeqd	%mm0, %mm0
	movq	%mm0, _M1
	pcmpeqd	%mm0, %mm0
	movq	%mm0, _M2
	ret

instead of:

_test2:
	pcmpeqd	%mm0, %mm0
	movq	%mm0, _M1
	movq	%mm0, _M2
	ret

This patch fixes this by always arranging for zero/one vectors
to be defined as v4i32 or v2i32 (SSE/MMX) instead of letting them be
any random type.  This ensures they get trivially CSE'd on the dag.
This fix is also important for LegalizeDAGTypes, as it gets unhappy
when the x86 backend wants BUILD_VECTOR(i64 0) to be legal even when
'i64' isn't legal.

This patch makes the following changes:

1) X86TargetLowering::LowerBUILD_VECTOR now lowers 0/1 vectors into
   their canonical types.
2) The now-dead patterns are removed from the SSE/MMX .td files.
3) All the patterns in the .td file that referred to immAllOnesV or
   immAllZerosV in the wrong form now use *_bc to match them with a
   bitcast wrapped around them.
4) X86DAGToDAGISel::SelectScalarSSELoad is generalized to handle 
   bitcast'd zero vectors, which simplifies the code actually.
5) getShuffleVectorZeroOrUndef is updated to generate a shuffle that
   is legal, instead of generating one that is illegal and expecting
   a later legalize pass to clean it up.
6) isZeroShuffle is generalized to handle bitcast of zeros.
7) several other minor tweaks.

This patch is definite goodness, but has the potential to cause random
code quality regressions.  Please be on the lookout for these and let 
me know if they happen.

llvm-svn: 44310
2007-11-25 00:24:49 +00:00
Nate Begeman d4d45c268c Add support for vectors to int <-> float casts.
llvm-svn: 44204
2007-11-17 03:58:34 +00:00
Dale Johannesen d50c8bcef6 Add missing SSE builtins: CVTPD2PI, CVTPS2PI,
CVTTPD2PI, CVTTPS2PI, CVTPI2PD, CVTPI2PS.

llvm-svn: 43523
2007-10-30 22:15:38 +00:00
Arnold Schwaighofer 1f0da1fefb Corrected many typing errors. And removed 'nest' parameter handling
for fastcc from X86CallingConv.td.  This means that nested functions
are not supported for calling convention 'fastcc'.

llvm-svn: 42934
2007-10-12 21:30:57 +00:00
Dale Johannesen 62f65edc32 Add missing argument to PALIGNR
llvm-svn: 42874
2007-10-11 20:58:37 +00:00
Evan Cheng f4b5d491df Added DAG xforms. e.g.
(vextract (v4f32 s2v (f32 load $addr)), 0) -> (f32 load $addr) 
(vextract (v4i32 bc (v4f32 s2v (f32 load $addr))), 0) -> (i32 load $addr)
Remove x86 specific patterns.

llvm-svn: 42677
2007-10-06 02:46:29 +00:00
Evan Cheng a1b7e95039 Typo. X86comi doesn't read / write chain's.
llvm-svn: 42492
2007-10-01 18:12:48 +00:00
Evan Cheng 5fb5a1f389 Enabling new condition code modeling scheme.
llvm-svn: 42459
2007-09-29 00:00:36 +00:00
Evan Cheng e95f391ef1 Added support for new condition code modeling scheme (i.e. physical register dependency). These are a bunch of instructions that are duplicated so the x86 backend can support both the old and new schemes at the same time. They will be deleted after
all the kinks are worked out.

llvm-svn: 42285
2007-09-25 01:57:46 +00:00
Dale Johannesen e36c400255 Fix PR 1681. When X86 target uses +sse -sse2,
keep f32 in SSE registers and f64 in x87.  This
is effectively a new codegen mode.
Change addLegalFPImmediate to permit float and
double variants to do different things.
Adjust callers.

llvm-svn: 42246
2007-09-23 14:52:20 +00:00
Evan Cheng 483e1ce16e Add implicit def of EFLAGS on those instructions that may modify flags.
llvm-svn: 41962
2007-09-14 21:48:26 +00:00
Evan Cheng 3e18e504ae Remove (somewhat confusing) Imp<> helper, use let Defs = [], Uses = [] instead.
llvm-svn: 41863
2007-09-11 19:55:27 +00:00
Dan Gohman a95cbb0007 Avoid storing and reloading zeros and other constants from stack slots
by flagging the associated instructions as being trivially rematerializable.

llvm-svn: 41775
2007-09-07 21:32:51 +00:00
Evan Cheng c2081fe573 Mark load instructions with isLoad = 1.
llvm-svn: 41595
2007-08-30 05:49:43 +00:00
Bill Wendling cdbd82ee37 64-bit SSSE3 ops that use MMX registers don't require 16-byte alignment.
Make a 'memop' pattern just for them.

llvm-svn: 41017
2007-08-11 09:52:53 +00:00
Bill Wendling 7014615087 For kicks, I though it would be fun to use the correct opcode.
llvm-svn: 40985
2007-08-10 09:00:17 +00:00
Bill Wendling 2377206923 Adding SSSE3 intrinsics.
llvm-svn: 40982
2007-08-10 06:22:27 +00:00
Dan Gohman 8932bff7fe Fix the alignment requirements of several unpck and shuf instructions.
Generalize isPSHUFDMask and add a unary SHUFPD pattern so that SHUFPD's
memory operand alignment can be tested as well, with a fix to avoid
breaking MMX's use of isPSHUFDMask.

llvm-svn: 40756
2007-08-02 21:17:01 +00:00
Dan Gohman 4d436e2b7d Fix pastos in vector arithmetic intrinsics.
llvm-svn: 40754
2007-08-02 21:06:40 +00:00
Dan Gohman fa3eeeedc0 Mark the SSE and MMX load instructions that
X86InstrInfo::isReallyTriviallyReMaterializable knows how to handle
with the isReMaterializable flag so that it is given a chance to handle
them. Without hoisting constant-pool loads from loops this isn't very
visible, though it does keep CodeGen/X86/constant-pool-remat-0.ll from
making a copy of the constant pool on the stack.

llvm-svn: 40736
2007-08-02 14:27:55 +00:00
Evan Cheng da549ece5c Missing Requires.
llvm-svn: 40691
2007-08-01 21:42:24 +00:00