Commit Graph

1594 Commits

Author SHA1 Message Date
Evan Cheng 5e5a63cf8f CodeGen still defaults to non-verbose asm, but llc now overrides it and defaults to verbose.
llvm-svn: 67668
2009-03-25 01:47:28 +00:00
Dan Gohman 9fc30d5c30 Add a testcase for the scheduling heuristic introduced in r67586.
llvm-svn: 67622
2009-03-24 16:38:27 +00:00
Evan Cheng a774a99245 Do not emit comments unless -asm-verbose.
llvm-svn: 67580
2009-03-24 00:17:40 +00:00
Evan Cheng 7fe1b0f50f Fix a bug in spill weight computation. If the alias is a super-register and the super-register is in the register class we are trying to allocate, add the weight to all sub-registers of the super-register even if they are not aliases.
e.g. allocating for GR32, bh is not used, updating bl spill weight.
     bl should get the same spill weight; otherwise it will be chosen
     as a spill candidate, since spilling bh doesn't make ebx available.
This fixes PR2866.

llvm-svn: 67574
2009-03-23 22:57:19 +00:00
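
A minimal sketch of the rule the commit above describes, with invented names (this is not the actual LLVM register allocator API): when an alias is a super-register in the class being allocated, the weight is propagated to every sub-register, because spilling a single sub-register does not free the whole super-register.

    #include <map>
    #include <vector>

    using Reg = unsigned;

    // Hypothetical helper: `spillWeights` and the sub-register list are
    // invented stand-ins for the allocator's real bookkeeping.
    void addWeightForSuperReg(Reg super, float weight,
                              const std::vector<Reg> &subRegs,
                              std::map<Reg, float> &spillWeights) {
      // e.g. super = ebx: bump bl as well as bh, so bl is not chosen as a
      // spill candidate when spilling bh would not make ebx available.
      for (Reg sub : subRegs)
        spillWeights[sub] += weight;
    }
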
Dale Johannesen 93eefa0043 Fix internal representation of fp80 to be the
same as a normal i80 {low64, high16} rather
than its own {high64, low16}.  A depressing number
of places know about this; I think I got them all.
Bitcode readers and writers convert back to the old
form to avoid breaking compatibility.

llvm-svn: 67562
2009-03-23 21:16:53 +00:00
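
A sketch of the layout change described above, with invented field names (illustrative only; the real representation is a pair of integer words inside the compiler, not a plain struct):

    #include <cstdint>

    struct Fp80Old {    // old internal form: {high64, low16}
      uint64_t high64;
      uint16_t low16;
    };

    struct Fp80New {    // new internal form, same as a normal i80: {low64, high16}
      uint64_t low64;
      uint16_t high16;
    };
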
Evan Cheng 4dc0c6697f Update test for PR3864.
llvm-svn: 67545
2009-03-23 18:27:36 +00:00
Evan Cheng f858466018 Fix PR3391 and PR3864. Reg allocator infinite looping.
llvm-svn: 67544
2009-03-23 18:24:37 +00:00
Evan Cheng 968c3b0d6e Model an inline asm constraint that ties an input to an output register as a machine operand TIED_TO constraint. This eliminates the need to pre-allocate registers for these. It also allows the register allocator to eliminate the unneeded copies.
llvm-svn: 67512
2009-03-23 08:01:15 +00:00
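
For context, this is the kind of source-level construct involved, sketched as GCC extended inline asm in C++ (example invented, x86 only): the matching constraint "0" ties the input to the output's register, which the commit above now models as a TIED_TO machine operand instead of pre-allocating a register.

    #include <cstdio>

    int increment(int x) {
      // "0" is a matching constraint: the input must be placed in the same
      // register as operand 0, i.e. the input is tied to the output.
      asm("incl %0" : "=r"(x) : "0"(x));
      return x;
    }

    int main() { std::printf("%d\n", increment(41)); }
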
Evan Cheng 47c9750f04 Do not fold away subreg_to_reg if the source register has a sub-register index; that means the source register is itself a sub-register of a larger register. e.g. on x86:
%RAX<def> = ...
%RAX<def> = SUBREG_TO_REG 0, %EAX:3<kill>, 3
The first def defines RAX, not EAX, so the top bits were not zero-extended.

llvm-svn: 67511
2009-03-23 07:19:58 +00:00
Rafael Espindola d2b64fc65b Add -relocation-model=pic so that the test works
on both Linux and Darwin.

llvm-svn: 67191
2009-03-18 09:38:28 +00:00
Mon P Wang 32c8074be6 Added missing support for widening when splitting a unary op (PR3683)
and expanding a bit convert (PR3711).  In both cases, we extract the
valid part of the widen vector and then do the conversion.

llvm-svn: 67175
2009-03-18 06:24:04 +00:00
Evan Cheng 8df898917f Add another test case for r64440.
llvm-svn: 67156
2009-03-18 02:43:01 +00:00
Chris Lattner a6bed3e950 Disable the "call to immediate" optimization on x86-64. It is
not safe in general because the immediate could be an arbitrary
value that does not fit in a 32-bit pcrel displacement.
Conservatively fall back to loading the value into a register
and calling through it.

We still do the optimization on X86-32.

llvm-svn: 67142
2009-03-18 00:43:52 +00:00
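
A sketch of why the fold is unsafe, as standalone C++ (address invented for illustration): a direct x86-64 CALL encodes only a 32-bit pc-relative displacement, so an arbitrary 64-bit immediate target has to be materialized in a register and called indirectly.

    using Fn = void (*)();

    void callFixedAddress() {
      // This address cannot be reached through a 32-bit pcrel displacement,
      // so the backend must emit a register load plus "call *%reg" rather
      // than a direct "call imm".
      Fn f = reinterpret_cast<Fn>(0x1122334455667788ULL);
      f();
    }
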
Bill Wendling 4eaeb4ef22 A more proper -mtriple.
llvm-svn: 67138
2009-03-18 00:19:44 +00:00
Bill Wendling 3aad86fa3f Temporary fix. I think Rafael wanted this to be Linux-only.
llvm-svn: 67137
2009-03-18 00:16:36 +00:00
Chris Lattner 42e9ca42ce LSR shouldn't ever try to hack on integer IVs larger than 64 bits. Right now
it is not APInt-clean, but even when it is, it needs to be evaluated carefully
to determine whether it is actually profitable.

This fixes a crash in PR3806.

llvm-svn: 67134
2009-03-17 23:58:30 +00:00
Rafael Espindola 4606b12108 Don't force promotion of return arguments on the callee.
Some architectures (like x86) don't require it.
This fixes PR3779.

llvm-svn: 67132
2009-03-17 23:43:59 +00:00
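
The ABI question at issue, sketched in C++ (example invented): when a callee returns a type narrower than a register, some ABIs make the callee promote (sign- or zero-extend) the value, while others, such as x86, may leave the widening to the caller.

    short narrow();   // returns a 16-bit value in a wider register

    int widen() {
      // On ABIs that don't force callee promotion, the extension is
      // emitted here in the caller; the callee leaves the upper bits
      // unspecified.
      return narrow();
    }
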
Chris Lattner 4359f3f26f this is apparently passing now. Evan/Dan, please check
whether this is producing the expected code or not; I'm
not sure what the test was intended to check.

llvm-svn: 67099
2009-03-17 20:23:43 +00:00
Chris Lattner 2363d0b8b9 Fix codegen to compute the size of an allocation by multiplying the
size by the array amount as an i32 value instead of promoting from
i32 to i64 and then doing the multiply.  Not doing this broke wrap-around
assumptions that the optimizers (validly) made.  The ultimate real
fix for this is to introduce an i64 version of alloca and remove MallocInst.

This fixes PR3829.

llvm-svn: 67093
2009-03-17 19:36:00 +00:00
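
A worked example of the wrap-around semantics at stake, as standalone C++ (not LLVM code): an i32 multiply wraps modulo 2^32, whereas promoting to i64 first never wraps for i32 inputs, which is what broke the optimizers' assumptions.

    #include <cstdint>

    uint32_t sizeWrapping(uint32_t elemSize, uint32_t count) {
      return elemSize * count;            // i32 multiply: wraps mod 2^32
    }

    uint64_t sizePromoted(uint32_t elemSize, uint32_t count) {
      return uint64_t(elemSize) * count;  // promote then multiply: no wrap
    }
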
Evan Cheng 8216785663 Add newline at end of file.
llvm-svn: 67085
2009-03-17 17:08:25 +00:00
Scott Michel df52d3d477 CellSPU:
Revert inadvertent mis-fix of fneg.

llvm-svn: 67084
2009-03-17 16:45:16 +00:00
Duncan Sands fb5c74ef4b Reapply r67049, with the test adjusted for darwin
(which produces "call L_f$stub" rather than "call f").

llvm-svn: 67079
2009-03-17 09:46:22 +00:00
Mon P Wang 523c0852c6 Fix a problem with DAGCombine where we were building an illegal build
vector shuffle mask. Forced the mask to be built using i32.  Note: this will
be irrelevant once vector_shuffle no longer takes a build vector for the
shuffle mask.

llvm-svn: 67076
2009-03-17 06:33:10 +00:00
Dan Gohman d6e571b202 Recognize bswapl as bswap too.
llvm-svn: 67072
2009-03-17 02:45:40 +00:00
Dan Gohman 77a9279d80 Recognize "bswapq" as an alternate spelling for the bswap instruction.
llvm-svn: 67071
2009-03-17 02:17:27 +00:00
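
An illustrative use of the suffixed spelling (example invented, x86-64 inline asm): AT&T syntax allows an explicit operand-size suffix, so "bswapq" on a 64-bit register is the same instruction as the unsuffixed "bswap".

    #include <cstdint>

    uint64_t byteswap64(uint64_t x) {
      // "bswapq" is the q-suffixed AT&T spelling now recognized alongside
      // plain "bswap" (and "bswapl" for 32-bit registers).
      asm("bswapq %0" : "+r"(x));
      return x;
    }
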
Evan Cheng 76f1b47ec9 The spiller may unfold load / mod / store instructions as an optimization when the would-be loaded value is already available in a register. It needs to check whether it is legal to clobber that register. Also, the register can contain values from multiple spill slots; make sure to check all of them instead of just the one being unfolded.
llvm-svn: 67068
2009-03-17 01:23:09 +00:00
Scott Michel 839ad0a5f3 CellSPU:
- Fix fabs, fneg for f32 and f64.
- Use BuildVectorSDNode.isConstantSplat, now that the functionality exists
- Continue to improve i64 constant lowering. Lower certain special constants
  to the constant pool when they correspond to SPU's shufb instruction's
  special mask values. This avoids the overhead of performing a shuffle on a
  zero-filled vector just to get the special constant when the memory load
  suffices.

llvm-svn: 67067
2009-03-17 01:15:45 +00:00
Bill Wendling dadaf54e09 --- Reverse-merging (from foreign repository) r67049 into '.':
U    test/CodeGen/X86/2009-03-13-PHIElimBug.ll
D    test/CodeGen/X86/2009-03-16-PHIElimInLPad.ll
U    lib/CodeGen/PHIElimination.cpp

r67049 was causing this failure:

Running /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/test/CodeGen/X86/dg.exp ...
FAIL: /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/test/CodeGen/X86/2009-03-13-PHIElimBug.ll for PR3784
Failed with exit(1) at line 1
while running:  llvm-as < /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/test/CodeGen/X86/2009-03-13-PHIElimBug.ll |  llc -march=x86 | /usr/bin/grep -A 2 {call f} | /usr/bin/grep movl
child process exited abnormally

llvm-svn: 67051
2009-03-16 20:27:20 +00:00
Duncan Sands d3e07c9d09 Tweak the fix for PR3784: be less sensitive about just
how invokes are set up.  The fix could be disturbed by
register copies coming after the EH_LABEL, and also didn't
behave quite right when it was the invoke result that
was used in a phi node.  Also (see new testcase) fix
another phi elimination bug while there: register copies
in the landing pad need to come after the EH_LABEL, because
that's where execution branches to when unwinding.  If they
come before the EH_LABEL then they will never be executed...
Also tweak the original testcase so it doesn't use a no-longer
existing counter.
The accumulated phi elimination changes fix two of seven Ada
testsuite failures that turned up after landing pad critical
edge splitting was turned off.  So there's probably more to come.

llvm-svn: 67049
2009-03-16 19:58:38 +00:00
Scott Michel d1db1aba66 CellSPU:
Incorporate Tilmann's 128-bit operation patch. Evidently, it gets the
llvm-gcc bootstrap a bit further along.

llvm-svn: 67048
2009-03-16 18:47:25 +00:00
Dan Gohman 6c28e72bb1 Add a testcase that covers a wide variety of ABI isel cases.
llvm-svn: 67003
2009-03-14 02:35:10 +00:00
Dan Gohman f98cd1b48a Use %rip-relative addressing on x86-64 whenever practical, as
it has a smaller encoding than absolute addressing.

llvm-svn: 67002
2009-03-14 02:33:41 +00:00
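
A sketch of the encoding difference (example invented; the asm in the comments is the gist, not compiler output):

    extern int counter;

    int readCounter() {
      // RIP-relative:  movl counter(%rip), %eax    -- 32-bit displacement
      // Absolute:      movabsq $counter, %rax      -- 64-bit immediate,
      //                movl (%rax), %eax              plus an extra insn
      // The RIP-relative form is smaller, hence preferred when practical.
      return counter;
    }
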
Dan Gohman 638e530509 Add a few more ptrtoint/inttoptr cast tests.
llvm-svn: 66989
2009-03-13 23:54:51 +00:00
Dan Gohman a62e4ab690 Improve FastISel's handling of truncates to i1, and implement
ptrtoint and inttoptr in X86FastISel. These casts aren't always
handled in the generic FastISel code because X86 sometimes needs
custom code to do truncation and zero-extension.

llvm-svn: 66988
2009-03-13 23:53:06 +00:00
Evan Cheng 94419d6fdd Fix PR3784: If the source of a phi comes from a BB that ends with an invoke, make sure the copy is inserted before the try range (unless it's used as an input to the invoke, in which case insert it after the last use), not at the end of the BB.
Also re-apply r66140 which was disabled as a workaround.

llvm-svn: 66976
2009-03-13 22:59:14 +00:00
Dan Gohman c0bb959591 Fix FastISel's assumption that i1 values are always zero-extended
by inserting explicit zero extensions where necessary. Included
is a testcase where SelectionDAG produces a virtual register
holding an i1 value which FastISel previously mistakenly assumed
to be zero-extended.

llvm-svn: 66941
2009-03-13 20:42:20 +00:00
Rafael Espindola 71144973f3 Improve sext and zext of TLS variables.
llvm-svn: 66922
2009-03-13 18:37:06 +00:00
Evan Cheng 1fb8aedd1e Fix some significant problems with constant pools that resulted in unnecessary padding between constant pool entries, larger-than-necessary alignments (e.g. 8-byte alignment for .literal4 sections), and potentially other issues.
1. The ConstantPoolSDNode alignment field is the log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. The MachineConstantPool alignment field is also a log2 value.
3. However, some places create ConstantPoolSDNodes with plain alignment values rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when the entries are created. However, the asm printer groups them by section, so the offsets are no longer valid; yet the asm printer uses them to determine the size of padding between entries.
5. The asm printer uses an expensive data structure, a multimap, to track constant pool entries by section.
6. The asm printer iterates over a SmallPtrSet when emitting constant pool entries, which is non-deterministic.

Solutions:
1. The ConstantPoolSDNode alignment field is changed to keep a non-log2 value.
2. The MachineConstantPool alignment field is also changed to keep a non-log2 value.
3. Functions that create ConstantPool nodes now pass in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field; it is replaced with an alignment field. Offsets are not computed when constant pool entries are created; they are computed on the fly in the asm printer and the JIT.
5. The asm printer uses a cheaper data structure to group constant pool entries.
6. The asm printer computes entry offsets after grouping is done.
7. The JIT code is changed to compute entry offsets on the fly.

llvm-svn: 66875
2009-03-13 07:51:59 +00:00
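
A standalone worked example of the log2 mix-up described in points 1-3 above (plain C++, not LLVM code): storing a byte alignment into a field that is decoded as a log2 value inflates the alignment exponentially.

    #include <cassert>

    unsigned log2u(unsigned x) {          // floor(log2(x)) for x > 0
      unsigned l = 0;
      while (x >>= 1) ++l;
      return l;
    }

    int main() {
      unsigned wanted = 16;               // 16-byte alignment for an SSE value
      unsigned correct = log2u(wanted);   // store 4, decoded as 1 << 4 == 16
      unsigned buggy = wanted;            // store 16 raw, decoded as 1 << 16
      assert((1u << correct) == 16);
      assert((1u << buggy) == 65536);     // an artificially large alignment
      return 0;
    }
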
Chris Lattner 99cc133710 generalize the previous code to use the full generality of LEA
for i32/i64 expressions (we could also do i16 on CPUs where
i16 LEA is fast, but I didn't add this).  On the example, we now
generate:

_test:
	movl	4(%esp), %eax
	cmpl	$42, (%eax)
	setl	%al
	movzbl	%al, %eax
	leal	4(%eax,%eax,8), %eax
	ret

instead of:

_test:
	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	movl	$4, %ecx
	movl	$13, %eax
	cmovg	%ecx, %eax
	ret

llvm-svn: 66869
2009-03-13 05:53:31 +00:00
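
The arithmetic behind the generated code above, as equivalent C++ source (reconstructed, not the actual test): for a flag c in {0,1}, 9*c + 4 selects between 4 and 13, and "leal 4(%eax,%eax,8)" computes exactly eax + 8*eax + 4 in a single instruction.

    int test(const int *p) {
      int c = (*p < 42);  // setl + movzbl: c is 0 or 1
      return 9 * c + 4;   // leal 4(%eax,%eax,8): 13 if *p < 42, else 4
    }
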
Chris Lattner 4be6df5d86 optimize the case of cond ? 42 : 41 and friends. This compiles the
example to:

_test:
	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	setg	%al
	movzbl	%al, %eax
	orl	$4294967294, %eax
	ret

instead of:

        movl    4(%esp), %eax
        cmpl    $41, (%eax)
	movl	$4294967294, %ecx
	movl	$4294967295, %eax
	cmova	%ecx, %eax
	ret

which is smaller in code size and faster. rdar://6668608

llvm-svn: 66868
2009-03-13 05:22:11 +00:00
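
The trick in source form (equivalent C++, reconstructed): when the two selected constants differ only in the low bit, the select reduces to a setcc plus an OR with the shared bits, with no branch or cmov.

    #include <cstdint>

    uint32_t test(const int *p) {
      uint32_t c = (*p > 41);   // setg + movzbl: c is 0 or 1
      return c | 0xFFFFFFFEu;   // 4294967294 if c == 0, 4294967295 if c == 1
    }
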
Dan Gohman a1d92423cf Enhance address-mode folding of ISD::ADD to handle cases where the
operands can't both be fully folded at the same time. For example,
in the included testcase, a global variable is being added with
an add of two values. The global variable wants RIP-relative
addressing, so it can't share the address with another base
register, but it's still possible to fold the initial add.

llvm-svn: 66865
2009-03-13 02:25:09 +00:00
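
A sketch of the situation (example invented): the global wants RIP-relative addressing, so its address cannot be combined with additional registers in one addressing mode, but the initial add of the index values can still be folded.

    extern int table[];

    int lookup(long i, long j) {
      // "table" needs a RIP-relative base, which leaves no room for two
      // index registers in the same addressing mode; the i + j add is
      // still folded into the final memory operand.
      return table[i + j];
    }
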
Evan Cheng 50a839e61f Add this test back.
llvm-svn: 66838
2009-03-12 23:01:35 +00:00
Duncan Sands 1f853d6a2a Revert commit 66140 since it caused several failures
in the Ada testcase.  Reverting this only covers up
the real problem, which is a nasty conceptual difficulty
in the phi elimination pass: when eliminating phi nodes
in landing pads, the register copies need to come before
the invoke, not at the end of the basic block which is
too late...  See PR3784.

llvm-svn: 66826
2009-03-12 21:13:42 +00:00
Evan Cheng 56f9f80bb1 Typo.
llvm-svn: 66797
2009-03-12 17:07:39 +00:00
Evan Cheng f16a991262 Fix test after Chris' select changes.
llvm-svn: 66795
2009-03-12 16:10:08 +00:00
Chris Lattner 4147f08e44 Move 3 "(add (select cc, 0, c), x) -> (select cc, x, (add x, c))"
related transformations out of target-specific DAG combine into the
ARM backend.  These were added by Evan in r37685 with no testcases
and only seem to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).

Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0  -> (zext(cond) << 3).  This happens frequently
with the recently added cp constant select optimization, but is a
very general xform.  For example, we now compile the second example
in const-select.ll to:

_test:
        movsd   LCPI2_0, %xmm0
        ucomisd 8(%esp), %xmm0
        seta    %al
        movzbl  %al, %eax
        movl    4(%esp), %ecx
        movsbl  (%ecx,%eax,4), %eax
        ret

instead of:

_test:
        movl    4(%esp), %eax
        leal    4(%eax), %ecx
        movsd   LCPI2_0, %xmm0
        ucomisd 8(%esp), %xmm0
        cmovbe  %eax, %ecx
        movsbl  (%ecx), %eax
        ret

This passes multisource and dejagnu.

llvm-svn: 66779
2009-03-12 06:52:53 +00:00
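
The named xform in source form (equivalent C++): a select between a power of two and zero becomes a zero-extension of the condition followed by a shift.

    unsigned eightOrZero(bool cond) {
      return unsigned(cond) << 3;  // cond ? 8 : 0, with no branch or cmov
    }
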
Evan Cheng ef0b7cc2d5 On x86, if the only use of an i64 load is an i64 store, generate a pair of double load and store instead.
llvm-svn: 66776
2009-03-12 05:59:15 +00:00
Chris Lattner 7b87e542dc add nounwind, remove duplicate run line.
llvm-svn: 66775
2009-03-12 05:56:37 +00:00
Chris Lattner 1d5cf4bcdd add nounwinds
llvm-svn: 66773
2009-03-12 05:35:33 +00:00
Dan Gohman 5637df37cd Revert r66024. The JIT encoding for CALLpcrel32 is wrong -- see PR3773, and the
assembly text output uses an indirect call ("call *") instead of a direct call.

llvm-svn: 66735
2009-03-11 23:01:47 +00:00