Commit Graph

40967 Commits

Author SHA1 Message Date
Owen Anderson cf7f941121 Add a prototype of a new peephole optimizing pass that uses LazyValueInfo to simplify PHIs and selects.
This pass addresses the missed optimizations from PR2581 and PR4420.
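
For illustration only (not part of the commit), a minimal sketch of the kind of
select such a pass could fold once LazyValueInfo has proven the condition on the
current path:

   ; hypothetical example: in a block reachable only when %x > 0,
   ; LazyValueInfo knows %x is nonzero, so %sel folds to %a
   %cmp = icmp ne i32 %x, 0
   %sel = select i1 %cmp, i32 %a, i32 %b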

llvm-svn: 112325
2010-08-27 23:31:36 +00:00
Owen Anderson 38f6b7fe3b Improve the precision of getConstant().
llvm-svn: 112323
2010-08-27 23:29:38 +00:00
Bob Wilson 13ce07fa92 Change ARM VFP VLDM/VSTM instructions to use addressing mode #4, just like
all the other LDM/STM instructions.  This fixes asm printer crashes when
compiling with -O0.  I've changed one of the NEON tests (vst3.ll) to run
with -O0 to check this in the future.

Prior to this change VLDM/VSTM used addressing mode #5, but not really.
The offset field was used to hold a count of the number of registers being
loaded or stored, and the AM5 opcode field was expanded to specify the IA
or DB mode, instead of the standard ADD/SUB specifier.  Much of the backend
was not aware of these special cases.  The crashes occurred when rewriting
a frameindex caused the AM5 offset field to be changed so that it did not
have a valid submode.  I don't know exactly what changed to expose this now.
Maybe we've never done much with -O0 and NEON.  Regardless, there's no longer
any reason to keep a count of the VLDM/VSTM registers, so we can use
addressing mode #4 and clean things up in a lot of places.

llvm-svn: 112322
2010-08-27 23:18:17 +00:00
Chris Lattner 6c1395f62a Enhance the shift propagator to handle the case when you have:
A = shl x, 42
...
B = lshr ..., 38

which can be transformed into:
A = shl x, 4
...

iff we can prove that the would-be-shifted-in bits
are already zero.  This eliminates two shifts in the testcase
and allows elimination of the whole i128 chain in the real example.
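
As a hedged illustration (not from the commit), one case where the precondition
holds because the high bits of %x are known zero:

   %x = and i64 %v, 4194303    ; only the low 22 bits can be set
   %A = shl i64 %x, 42
   %B = lshr i64 %A, 38        ; can be rewritten as: %B = shl i64 %x, 4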

llvm-svn: 112314
2010-08-27 22:53:44 +00:00
Devang Patel f2855b147f Simplify.
llvm-svn: 112305
2010-08-27 22:25:51 +00:00
Chris Lattner 18d7fc8fc6 Implement a pretty general logical shift propagation
framework, which is good at ripping through bitfield
operations.  This generalizes a bunch of the existing
xforms that instcombine does, such as 
  (x << c) >> c -> and
to handle intermediate logical nodes.  This is useful for
ripping up the "promote to large integer" code produced by
SRoA.
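
For reference, a minimal sketch of the basic form named above (illustrative only):

   %a = shl i32 %x, 24
   %b = lshr i32 %a, 24        ; becomes: %b = and i32 %x, 255
   ; the framework also looks through logical ops (and/or/xor)
   ; sitting between the two shifts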

llvm-svn: 112304
2010-08-27 22:24:38 +00:00
Bob Wilson af371b49a8 Unsigned value cannot be < 0.
llvm-svn: 112300
2010-08-27 21:44:35 +00:00
Dan Gohman 15871f23e3 When merging adjacent operands, scan ahead and merge all equal
adjacent operands at once, instead of just two at a time.

llvm-svn: 112299
2010-08-27 21:39:59 +00:00
Chris Lattner 25a198e72b remove some special shift cases that have been subsumed into the
more general simplify demanded bits logic.

llvm-svn: 112291
2010-08-27 21:04:34 +00:00
Dan Gohman c866bf4fec Make the {A,+,B}<L> + {C,+,D}<L> --> Other + {A+C,+,B+D}<L>
transformation collect all the addrecs with the same loop
and combine them at once rather than starting everything over
at the first chance.
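
For example (an illustration, not from the commit), three addrecs on the same
loop can now be folded in a single step:

   {1,+,2}<%L> + {3,+,4}<%L> + {5,+,6}<%L>  -->  {9,+,12}<%L>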

llvm-svn: 112290
2010-08-27 20:45:56 +00:00
Bill Wendling 6628431a91 Remove now unneeded command line flag that enables 'optimize compares.'
llvm-svn: 112287
2010-08-27 20:39:09 +00:00
Owen Anderson 99d4cb861b Fix typos in comments.
llvm-svn: 112286
2010-08-27 20:32:56 +00:00
Chris Lattner 7398434675 teach the truncation optimization that an entire chain of
computation can be truncated if it is fed by a sext/zext whose source type
doesn't have to exactly match the truncation result type.
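
A hedged example of the kind of chain this enables (illustrative; note the i8
source type differs from the i16 result type):

   %e = zext i8 %x to i32
   %a = add i32 %e, 7
   %t = trunc i32 %a to i16    ; the whole chain can be recomputed in i16:
                               ;   %e16 = zext i8 %x to i16
                               ;   %t   = add i16 %e16, 7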

llvm-svn: 112285
2010-08-27 20:32:06 +00:00
Dan Gohman 9bad2fb378 Switch ScalarEvolution's main Value*->SCEV* map from std::map
to DenseMap.

llvm-svn: 112281
2010-08-27 18:55:03 +00:00
Chris Lattner 90cd746e63 Add an instcombine to clean up a common pattern produced
by the SRoA "promote to large integer" code, eliminating
some type conversions like this:

   %94 = zext i16 %93 to i32                       ; <i32> [#uses=2]
   %96 = lshr i32 %94, 8                           ; <i32> [#uses=1]
   %101 = trunc i32 %96 to i8                      ; <i8> [#uses=1]

This also unblocks other xforms from happening; now clang is able to compile:

struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B+A.C+A.D; }

into:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	pshufd	$1, %xmm0, %xmm2
	addss	%xmm0, %xmm2
	movdqa	%xmm1, %xmm3
	addss	%xmm2, %xmm3
	pshufd	$1, %xmm1, %xmm0
	addss	%xmm3, %xmm0
	ret

on x86-64, instead of:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	movd	%xmm0, %rax
	shrq	$32, %rax
	movd	%eax, %xmm2
	addss	%xmm0, %xmm2
	movapd	%xmm1, %xmm3
	addss	%xmm2, %xmm3
	movd	%xmm1, %rax
	shrq	$32, %rax
	movd	%eax, %xmm0
	addss	%xmm3, %xmm0
	ret

This seems pretty close to optimal to me, at least without
using horizontal adds.  This also triggers in lots of other
code, including SPEC.

llvm-svn: 112278
2010-08-27 18:31:05 +00:00
Bob Wilson edf722add3 Add alignment arguments to all the NEON load/store intrinsics.
Update all the tests using those intrinsics and add support for
auto-upgrading bitcode files with the old versions of the intrinsics.

llvm-svn: 112271
2010-08-27 17:13:24 +00:00
Owen Anderson 6ebbd92380 Use LVI to eliminate conditional branches where we've tested a related condition previously. Update tests for this change.
This fixes PR5652.
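
A minimal sketch of the kind of branch this removes (hypothetical, not from the
commit):

   entry:
     %cmp1 = icmp sgt i32 %x, 10
     br i1 %cmp1, label %bb1, label %bb2
   bb1:                                      ; only reached when %x > 10
     %cmp2 = icmp sgt i32 %x, 5              ; LVI proves this is true
     br i1 %cmp2, label %bb3, label %bb4     ; folds to: br label %bb3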

llvm-svn: 112270
2010-08-27 17:12:29 +00:00
Dan Gohman 2706567c5c Optimize SCEVComplexityCompare. Use a 3-way return instead of a 2-way
return to avoid needing two calls to test for equivalence, and sort
addrecs by their degree before examining their operands.

llvm-svn: 112267
2010-08-27 15:26:01 +00:00
Anton Korobeynikov c0b36921c2 Properly handle passing of FP values to a varargs function on Win64: the
value should be copied to the corresponding shadow register as well.
Patch by Cameron Esfahani!

llvm-svn: 112262
2010-08-27 14:43:06 +00:00
Benjamin Kramer 1f6012479f MCELF: Port EmitInstruction changes from MachO streamer. Patch by Roman Divacky.
llvm-svn: 112260
2010-08-27 10:40:51 +00:00
Benjamin Kramer 05e22982c8 MCELF: Always overwrite FixedValue.
llvm-svn: 112259
2010-08-27 10:38:39 +00:00
Daniel Dunbar 1844a71e66 X86: Fix an encoding issue with LOCK_ADD64mr, which could lead to very-hard-to-find miscompiles with the integrated assembler.
llvm-svn: 112250
2010-08-27 01:30:14 +00:00
Devang Patel b12ff5999e Revert r112213. It is not needed.
llvm-svn: 112242
2010-08-26 23:35:15 +00:00
Jim Grosbach 6a77066913 Simplify eliminateFrameIndex() interface back down now that PEI doesn't need
to try to re-use scavenged frame index reference registers. rdar://8277890

llvm-svn: 112241
2010-08-26 23:32:16 +00:00
Devang Patel ea134f56b1 If the node is not available, then use FuncInfo.ValueMap to emit debug info for a byval parameter.
llvm-svn: 112238
2010-08-26 22:53:27 +00:00
Jim Grosbach 2a1915d04b Remove the now obsolete frame index virtual re-use algorithm from PEI. Pre-RA
virtual base registers handle this function, and more. A bit more cleanup
to do on the interface to eliminateFrameIndex() after this.

llvm-svn: 112237
2010-08-26 22:42:12 +00:00
Chris Lattner bfd2228182 optimize "integer extraction out of the middle of a vector" as produced
by SRoA.  This is part of rdar://7892780, but needs another xform to
expose this.
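
A hedged sketch of the pattern (assuming a little-endian target; not taken from
the commit):

   %b = bitcast <4 x i32> %v to i128
   %s = lshr i128 %b, 32
   %t = trunc i128 %s to i32   ; -> %t = extractelement <4 x i32> %v, i32 1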

llvm-svn: 112232
2010-08-26 22:14:59 +00:00
Jim Grosbach e82d5b4aaf tidy up a bit. no functional change.
llvm-svn: 112228
2010-08-26 21:56:30 +00:00
Chris Lattner d4ebd6df5a optimize bitcast(trunc(bitcast(x))) where the result is a float and 'x'
is a vector to be a vector element extraction.  This allows clang to
compile:

struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B+A.C+A.D; }

into:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	movd	%xmm0, %rax
	shrq	$32, %rax
	movd	%eax, %xmm2
	addss	%xmm0, %xmm2
	movapd	%xmm1, %xmm3
	addss	%xmm2, %xmm3
	movd	%xmm1, %rax
	shrq	$32, %rax
	movd	%eax, %xmm0
	addss	%xmm3, %xmm0
	ret

instead of:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	movd	%xmm0, %rax
	movd	%eax, %xmm0
	shrq	$32, %rax
	movd	%eax, %xmm2
	addss	%xmm0, %xmm2
	movd	%xmm1, %rax
	movd	%eax, %xmm1
	addss	%xmm2, %xmm1
	shrq	$32, %rax
	movd	%eax, %xmm0
	addss	%xmm1, %xmm0
	ret

... eliminating half of the horribleness.
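
For reference, a hedged sketch (not from the commit) of the IR pattern described
above, assuming a little-endian target:

   %b = bitcast <2 x float> %v to i64
   %t = trunc i64 %b to i32
   %f = bitcast i32 %t to float   ; -> %f = extractelement <2 x float> %v, i32 0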

llvm-svn: 112227
2010-08-26 21:55:42 +00:00
Jim Grosbach 17da935964 Turn off the scavenging based frame reg reuse briefly to measure whether it's
still having a significant effect. It shouldn't be now that the pre-RA
virtual base reg stuff is in. Assuming that's validated by the nightly
testers, we can simplify a lot of the PEI frame index code.

llvm-svn: 112220
2010-08-26 21:29:54 +00:00
Bruno Cardoso Lopes e25ba0c7c2 zap the now unused MVT::getIntVectorWithNumElements
llvm-svn: 112218
2010-08-26 20:53:12 +00:00
Devang Patel 42b4ac7ed3 Speculatively revert r112207.
llvm-svn: 112216
2010-08-26 20:33:42 +00:00
Devang Patel 977057f481 80 col.
llvm-svn: 112215
2010-08-26 20:32:32 +00:00
Devang Patel 384fa91deb Update DanglingDebugInfo so that it can be used to track llvm.dbg.declare also.
llvm-svn: 112213
2010-08-26 20:06:46 +00:00
Bob Wilson 97919e9c59 Use pseudo instructions for VST3.
llvm-svn: 112208
2010-08-26 18:51:29 +00:00
Devang Patel ab596a637c Do not forget to resolve dangling debug info in the case where the virtual register used for a value is initialized after a dbg intrinsic is seen.
llvm-svn: 112207
2010-08-26 18:36:14 +00:00
Bill Wendling a9c03f4fae Reapply r112176 without removing the other CMN patterns (that was unintentional).
llvm-svn: 112206
2010-08-26 18:33:51 +00:00
Benjamin Kramer 2c45f431fa MCELF: Fix a thinko of mine.
llvm-svn: 112203
2010-08-26 18:12:04 +00:00
Bob Wilson a967c42a3d Fix comment typos.
llvm-svn: 112202
2010-08-26 18:08:11 +00:00
Owen Anderson bd2ecc7e68 Make JumpThreading smart enough to properly thread StrSwitch when it's compiled with clang++.
llvm-svn: 112198
2010-08-26 17:40:24 +00:00
Benjamin Kramer 929cc7618f MCELF: Compensate for the addend on i386. Patch by Roman Divacky, with some cleanups.
llvm-svn: 112197
2010-08-26 17:23:02 +00:00
Jim Grosbach 074d22e1ac Restrict the register to tGPR to make sure the str instruction will be
encodable as a 16-bit wide instruction.

llvm-svn: 112195
2010-08-26 17:02:47 +00:00
Dan Gohman 10b20b2b81 Revert r112176; it broke test/CodeGen/Thumb2/thumb2-cmn.ll.
llvm-svn: 112191
2010-08-26 15:50:25 +00:00
Dan Gohman ca26f79051 Reapply r112091 and r111922, support for metadata linking, with a
fix: add a flag to MapValue and friends which indicates whether
any module-level mappings are being made. In the common case of
inlining, no module-level mappings are needed, so MapValue doesn't
need to examine non-function-local metadata, which can be very
expensive in the case of a large module with really deep metadata
(e.g. a large C++ program compiled with -g).

This flag is a little awkward; perhaps eventually it can be moved
into the ClonedCodeInfo class.

llvm-svn: 112190
2010-08-26 15:41:53 +00:00
Benjamin Kramer 9bf0380a54 StringRef::compare_numeric also differed from StringRef::compare for characters > 127.
llvm-svn: 112189
2010-08-26 15:25:35 +00:00
Benjamin Kramer b04d4af057 Do unsigned char comparisons in StringRef::compare_lower to be more consistent with compare in corner cases.
llvm-svn: 112185
2010-08-26 14:21:08 +00:00
Bill Wendling a9a0599b39 There seems to be a (potential) hardware bug with the CMN instruction and
comparison with 0. These two pieces of code should give identical results:

  rsbs r1, r1, 0
  cmp  r0, r1
  mov  r0, #0
  it   ls
  mov  r0, #1

and:

  cmn  r0, r1
  mov  r0, #0
  it   ls
  mov  r0, #1

However, the CMN gives the *opposite* result when r1 is 0. This is because the
carry flag is set in the CMP case but not in the CMN case. In short, the CMP
instruction doesn't perform a truncate of the (logical) NOT of 0 plus the value
of r0 and the carry bit (because the "carry bit" parameter to AddWithCarry is
defined as 1 in this case, the carry flag will always be set when r0 >= 0). The
CMN instruction doesn't perform a NOT of 0 so there is never a "carry" when this
AddWithCarry is performed (because the "carry bit" parameter to AddWithCarry is
defined as 0).

The AddWithCarry in the CMP case seems to be relying upon the identity:

  ~x + 1 = -x

However when x is 0 and unsigned, this doesn't hold:

   x = 0
  ~x = 0xFFFF FFFF
  ~x + 1 = 0x1 0000 0000
  (-x = 0) != (0x1 0000 0000 = ~x + 1)

Therefore, we should disable *all* versions of CMN, especially when comparing
against zero, until we can limit when the CMN instruction is used (when we know
that the RHS is not 0) or when we have a hardware fix for this.

(See the ARM docs for the "AddWithCarry" pseudo-code.)

This is related to <rdar://problem/7569620>.

llvm-svn: 112176
2010-08-26 09:07:33 +00:00
Chris Lattner af23e9a798 Add a hackaround for PR7993 which is causing failures on x86 builders that lack sse2.
llvm-svn: 112175
2010-08-26 06:57:07 +00:00
Chris Lattner eb2cc0ce0e implement SplitVecOp_CONCAT_VECTORS, fixing the included testcase with SSE1.
llvm-svn: 112171
2010-08-26 05:51:22 +00:00
Bob Wilson 4cec44975e Use pseudo instructions for VST1d64Q.
llvm-svn: 112170
2010-08-26 05:33:30 +00:00