Commit Graph

4545 Commits

Author SHA1 Message Date
Eli Friedman 7335e8a720 Back out r131444 and r131438; they're breaking nightly tests. I'll look into
it more tomorrow.

llvm-svn: 131451
2011-05-17 02:36:59 +00:00
Eli Friedman e5f7f26df0 Fix test.
llvm-svn: 131444
2011-05-17 00:39:14 +00:00
Evan Cheng 54459240e3 Add target triple so test doesn't fail on Windows machines.
llvm-svn: 131439
2011-05-17 00:15:58 +00:00
Eli Friedman 83ba150f3a Add x86 fast-isel for calls returning first-class aggregates. rdar://9435872.
llvm-svn: 131438
2011-05-17 00:13:47 +00:00
Jakob Stoklund Olesen 4edf17d91f Teach LiveInterval::isZeroLength about null SlotIndexes.
When instructions are deleted, they leave tombstone SlotIndex entries.
The isZeroLength method should ignore these null indexes.

This causes RABasic to sometimes spill a callee-saved register in the
abi-isel.ll test, so don't run that test with -regalloc=basic.  Prioritizing
register allocation according to spill weight can cause more registers to be
used.

llvm-svn: 131436
2011-05-16 23:50:05 +00:00
Eli Friedman d4a3609d30 Remove dead code. Fix associated test to use FileCheck.
llvm-svn: 131424
2011-05-16 21:28:22 +00:00
Eli Friedman a4d4a0162d Make fast-isel work correctly s/uadd.with.overflow intrinsics.
llvm-svn: 131420
2011-05-16 21:06:17 +00:00
Eli Friedman 9ac944774f Basic fast-isel of extractvalue. Not too helpful on its own, given the IR clang generates for cases like this, but it should become more useful soon.
llvm-svn: 131417
2011-05-16 20:27:46 +00:00
Rafael Espindola df9db7ed92 Don't produce a vmovntdq if we don't have AVX support.
llvm-svn: 131330
2011-05-14 00:30:01 +00:00
Rafael Espindola e53b7d1a11 Make codegen able to handle values of empty types. This is one way
to fix PR9900. I will keep it open until sable is able to comment on it.

llvm-svn: 131294
2011-05-13 15:18:06 +00:00
Stuart Hastings aa02c0847d Since I can't reproduce the failures from 131261, re-trying with a
simplified version.  <rdar://problem/9298790>

llvm-svn: 131274
2011-05-13 00:51:54 +00:00
Stuart Hastings 8d57d8ea64 Revert 131266 and 131261 due to buildbot complaints.
rdar://problem/9298790

llvm-svn: 131269
2011-05-13 00:15:17 +00:00
Stuart Hastings ef4940254f Tweak 131261 (thumb2-cbnz.ll) to generate the intended cbnz.
rdar://problem/9298790

llvm-svn: 131266
2011-05-13 00:10:03 +00:00
Stuart Hastings 89f1b47e3a Non-fast-isel followup to 129634; correctly handle branches controlled
by non-CMP expressions.  The executable test case (129821) would test
this as well, if we had an "-O0 -disable-arm-fast-isel" LLVM-GCC
tester.  Alas, the ARM assembly would be very difficult to check with
FileCheck.

The thumb2-cbnz.ll test is affected; it generates larger code (tst.w
vs. cmp #0), but I believe the new version is correct.
rdar://problem/9298790

llvm-svn: 131261
2011-05-12 23:36:41 +00:00
Galina Kistanova 9e56e51fab Correction. Use explicit target triple in the test.
llvm-svn: 131252
2011-05-12 21:55:34 +00:00
Evan Cheng 43054e6159 Re-enable branchfolding common code hoisting optimization. Fixed a liveness test bug and also taught it to update liveins.
llvm-svn: 131241
2011-05-12 20:30:01 +00:00
Stuart Hastings 114ecbd0f4 Move this test to CodeGen/Thumb. rdar://problem/9416774
llvm-svn: 131196
2011-05-11 19:41:28 +00:00
Devang Patel 34a6620748 Identify end of prologue (and beginning of function body) using DW_LNS_set_prologue_end line table opcode.
llvm-svn: 131194
2011-05-11 19:22:19 +00:00
Nadav Rotem 8a7beb80f0 Fixes a bug in the DAGCombiner. LoadSDNodes have two values (data, chain).
If there is a store after the load node, then there is a chain, which means
that there is another user. Thus, asking hasOneUser would fail. Instead we
ask hasNUsesOfValue on the 'data' value.

llvm-svn: 131183
2011-05-11 14:40:50 +00:00
Nadav Rotem 8f971c27fb Add custom lowering of X86 vector SRA/SRL/SHL when the shift amount is a splat vector.
llvm-svn: 131179
2011-05-11 08:12:09 +00:00
Rafael Espindola 2a09d65979 Revert 131172 as it is causing clang to miscompile itself. I will try
to provide a reduced testcase.

llvm-svn: 131176
2011-05-11 03:27:17 +00:00
Evan Cheng 05fc35e275 Add a late optimization to BranchFolding that hoist common instruction sequences
at the start of basic blocks to their common predecessor. It's actually quite
common (e.g. about 50 times in JM/lencod) and has shown to be a nice code size
benefit. e.g.

        pushq   %rax
        testl   %edi, %edi
        jne     LBB0_2
## BB#1:
        xorb    %al, %al
        popq    %rdx
        ret
LBB0_2:
        xorb    %al, %al
        callq   _foo
        popq    %rdx
        ret

=>

        pushq   %rax
        xorb    %al, %al
        testl   %edi, %edi
        je      LBB0_2
## BB#1:
        callq   _foo
LBB0_2:
        popq    %rdx
        ret

rdar://9145558

llvm-svn: 131172
2011-05-11 01:03:01 +00:00
Rafael Espindola 19c1a56287 Produce a __debug_frame section on darwin ARM when appropriate.
llvm-svn: 131151
2011-05-10 21:04:45 +00:00
Justin Holewinski 3c0447259c PTX: add test cases for cvt, fneg, and selp
Patch by Dan Bailey

llvm-svn: 131128
2011-05-10 14:53:13 +00:00
Benjamin Kramer d724a590e5 X86: Add a bunch of peeps for add and sub of SETB.
"b + ((a < b) ? 1 : 0)" compiles into
	cmpl	%esi, %edi
	adcl	$0, %esi
instead of
	cmpl	%esi, %edi
	sbbl	%eax, %eax
	andl	$1, %eax
	addl	%esi, %eax

This saves a register, a false dependency on %eax
(Intel's CPUs still don't ignore it) and it's shorter.

llvm-svn: 131070
2011-05-08 18:36:07 +00:00
Jakob Stoklund Olesen a5c889982a Emit a proper error message when register allocators run out of registers.
This can't be just an assertion, users can always write impossible inline
assembly. Such an assembly statement should be included in the error message.

llvm-svn: 131024
2011-05-06 21:58:30 +00:00
Justin Holewinski 11d70b6b32 PTX: add PTX 2.3 language target
Patch by Wei-Ren Chen

llvm-svn: 130980
2011-05-06 11:40:36 +00:00
Eli Friedman 5401962643 Re-revert r130877; it's apparently causing a regression on 197.parser,
possibly related to cbnz formation.

llvm-svn: 130977
2011-05-06 05:23:07 +00:00
Rafael Espindola a4982bddf3 Don't produce a __debug_frame.
I tested both gdb on a bootstrapped clang and and the gdb testsuite on OS X (snow leopard)
and both are happy using __eh_frame.

llvm-svn: 130937
2011-05-05 18:43:39 +00:00
Eli Friedman 441a01a2b8 Avoid extra vreg copies for arguments passed in registers. Specifically, this can make MachineCSE more effective in some cases (especially in small functions). PR8361 / part of rdar://problem/8259436 .
llvm-svn: 130928
2011-05-05 16:53:34 +00:00
Jakob Stoklund Olesen 17d4f9bbcc Prepare remaining tests for -join-physreg going away.
llvm-svn: 130893
2011-05-04 23:54:59 +00:00
Jakob Stoklund Olesen 369bddf5ad Fix a batch of x86 tests to be coalescer independent.
Most of these tests require a single mov instruction that can come either before
or after a 2-addr instruction. -join-physregs changes the behavior, but the
results are equivalent.

llvm-svn: 130891
2011-05-04 23:54:51 +00:00
Dan Gohman dd550305e6 Give this test an explicit register allocator, so that it can work even if
the default register allocator is changed.

llvm-svn: 130883
2011-05-04 23:14:02 +00:00
Bill Wendling 2a40131f6b SjLj EH could produce a machine basic block that legitimately has more than one
landing pad as its successor.

SjLj exception handling jumps to the correct landing pad via a switch statement
that's generated right before code-gen. Loosen the constraint in the machine
instruction verifier to allow for this. Note, this isn't the most rigorous check
since we cannot determine where that switch statement came from. But it's
marginally better than turning this check off when SjLj exceptions are used.
<rdar://problem/9187612>

llvm-svn: 130881
2011-05-04 22:54:05 +00:00
Eli Friedman 0fe4608af2 Re-commit r130862 with a minor change to avoid an iterator running off the edge in some cases.
Original message:

Teach MachineCSE how to do simple cross-block CSE involving physregs.  This allows, for example, eliminating duplicate cmpl's on x86. Part of rdar://problem/8259436 .

llvm-svn: 130877
2011-05-04 22:10:36 +00:00
Galina Kistanova e53ae508ec This test fails on ARM. The test shouldn't explicitly specify alignment (and alignment 4 is wrong) and requires hard-float.
llvm-svn: 130875
2011-05-04 21:57:44 +00:00
Eli Friedman 3bd79ba856 Back out r130862; it appears to be breaking bootstrap.
llvm-svn: 130867
2011-05-04 20:48:42 +00:00
Eli Friedman a16fc2fec0 Teach MachineCSE how to do simple cross-block CSE involving physregs. This allows, for example, eliminating duplicate cmpl's on x86. Part of rdar://problem/8259436 .
llvm-svn: 130862
2011-05-04 19:54:24 +00:00
Jakob Stoklund Olesen 28a93a49bb Fix more register and coalescing dependencies.
llvm-svn: 130859
2011-05-04 19:02:11 +00:00
Jakob Stoklund Olesen d7fd7bfc31 Explicitly request physreg coalesing for a bunch of Thumb2 unit tests.
These tests all follow the same pattern:

	mov	r2, r0
	movs	r0, #0
	$CMP	r2, r1
	it	eq
	moveq	r0, #1
	bx	lr

The first 'mov' can be eliminated by rematerializing 'movs r0, #0' below the
test instruction:

	$CMP	r0, r1
	mov.w	r0, #0
	it	eq
	moveq	r0, #1
	bx	lr

So far, only physreg coalescing can do that. The register allocators won't yet
split live ranges just to eliminate copies. They can learn, but this particular
problem is not likely to show up in real code. It only appears because r0 is
used for both the function argument and return value.

llvm-svn: 130858
2011-05-04 19:02:07 +00:00
Jakob Stoklund Olesen e7528c45ea FileCheckize and break dependence on coalescing order.
llvm-svn: 130856
2011-05-04 19:02:01 +00:00
Jakob Stoklund Olesen 067ba3c23c Explicitly request -join-physregs for some tests that depend on it.
llvm-svn: 130855
2011-05-04 19:01:59 +00:00
Devang Patel 39ecf816c5 Do not emit location expression size twice.
llvm-svn: 130854
2011-05-04 19:00:57 +00:00
Akira Hatanaka 3bace5d223 Remove LLVM IR metadata in test case committed in r130847.
llvm-svn: 130849
2011-05-04 18:28:36 +00:00
Akira Hatanaka 23e8ecf125 Prevent instructions using $gp from being placed between a jalr and the instruction that restores the clobbered $gp.
llvm-svn: 130847
2011-05-04 17:54:27 +00:00
Jakob Stoklund Olesen f1b401800a Don't depend on the physreg coalescing order.
llvm-svn: 130818
2011-05-04 01:01:47 +00:00
Jakob Stoklund Olesen 5b5abb4ea1 Don't run this test through -regalloc=basic.
The basic allocator is really bad about hinting, so it doesn't eliminate all
copies when physreg joining is disabled.

llvm-svn: 130817
2011-05-04 01:01:44 +00:00
Jakob Stoklund Olesen d3b2f44c9d Fix register-dependent XCore tests
llvm-svn: 130816
2011-05-04 01:01:41 +00:00
Jakob Stoklund Olesen 7f7fc82141 Fix register-dependent test in MSP430.
llvm-svn: 130815
2011-05-04 01:01:39 +00:00
Jakob Stoklund Olesen 51b35f7bb1 Fix a bunch of ARM tests to be register allocation independent.
llvm-svn: 130800
2011-05-03 22:31:21 +00:00
Bill Wendling db0996c822 Replace the "movnt" intrinsics with a native store + nontemporal metadata bit.
<rdar://problem/8460511>

llvm-svn: 130791
2011-05-03 21:11:17 +00:00
Evan Cheng 93b5cdc5ab Make the test less likely to fail with minor changes.
llvm-svn: 130778
2011-05-03 19:09:32 +00:00
Bob Wilson c5242b0e78 Remove test for iOS divmod function, since that is disabled for now.
llvm-svn: 130769
2011-05-03 17:54:49 +00:00
Bruno Cardoso Lopes 168c9005b5 Add a few ARM coprocessor intrinsics. Testcases included
llvm-svn: 130763
2011-05-03 17:29:22 +00:00
Dan Gohman 6136e94897 Add an unfolded offset field to LSR's Formula record. This is used to
model constants which can be added to base registers via add-immediate
instructions which don't require an additional register to materialize
the immediate.

llvm-svn: 130743
2011-05-03 00:46:49 +00:00
Rafael Espindola 5164e6e8b2 Add 130690 back.
llvm-svn: 130693
2011-05-02 15:58:16 +00:00
Rafael Espindola a392475865 Revert while I debug the tests that use march but not mtriple.
llvm-svn: 130691
2011-05-02 15:42:31 +00:00
Rafael Espindola c2aad4f2a3 Move ppc OS X to cfi too. I am building it on an old ppc mini, but it will take some time.
llvm-svn: 130690
2011-05-02 15:00:52 +00:00
Rafael Espindola fc8223670a Add r130623 back now that ELF has been fixed to work with -fno-dwarf2-cfi-asm.
llvm-svn: 130658
2011-05-01 15:44:13 +00:00
Rafael Espindola 750cb61553 GCC uses a different encoding of pointers in the FDE when using
-fno-dwarf2-cfi-asm. Implement the same behavior.

llvm-svn: 130637
2011-05-01 04:49:54 +00:00
Rafael Espindola b7c2286055 Revert the previous patch while I figure out how to make llvm-gcc
less agressive about disabling cfi on linux :-(

llvm-svn: 130626
2011-04-30 23:03:44 +00:00
Rafael Espindola 5265bc483e Enable CFI on OS X.
Currently the output should be almost identical to the one produced by CodeGen
to make the transition easier.

The only two differences I know of are:

* Some files get an extra advance loc of size 0. This will be fixed when
relaxations are enabled.
* The optimization of declaring an EH symbol as an external variable is not
implemented. This is a subset of adding the nounwind attribute, so we if really
this at -O0 we should probably do it at the IL level.

llvm-svn: 130623
2011-04-30 22:29:54 +00:00
Jakob Stoklund Olesen f5eaa8dc62 Allow folded spills in test.
llvm-svn: 130599
2011-04-30 08:00:50 +00:00
Jakob Stoklund Olesen edfabc9aad Weekly fix of register allocation dependent unit tests.
llvm-svn: 130567
2011-04-30 01:37:52 +00:00
Eli Friedman 4105ed1523 Make FastEmit_ri_ try a bit harder to succeed for supported operations; FastEmit_i can fail for non-Thumb2 ARM. Makes ARMSimplifyAddress work correctly, and reduces the number of fast-isel bailouts on non-Thumb ARM.
llvm-svn: 130560
2011-04-29 23:34:52 +00:00
Eli Friedman 328bad02fa Switch to ImmLeaf (which can be used by FastISel) for a few more common ARM/Thumb2 patterns.
llvm-svn: 130552
2011-04-29 22:48:03 +00:00
Eli Friedman dd937843d3 Fix run-line, again. :(
llvm-svn: 130540
2011-04-29 21:33:03 +00:00
Eli Friedman 86caced370 Re-committing r130454, which does not in fact break anything.
Fix a rather obscure crash caused by ARM fast-isel generating code which redefines a register.
rdar://problem/9338332 .

llvm-svn: 130539
2011-04-29 21:22:56 +00:00
Eric Christopher 8d46b47787 Add trunc->branch support, this won't help with clang's i8->i1 truncations
for bools, but is a start.

llvm-svn: 130534
2011-04-29 20:02:39 +00:00
Rafael Espindola 697edc89a5 Change DwarfCFIException's member variables to track what it actually
emmits: .cfi_personality, .cfi_lsda and the moves.

llvm-svn: 130503
2011-04-29 14:48:51 +00:00
Andrew Trick e794e17524 Teach Thumb2 isel to fold and->rotr ==> ROR.
Generalization of Nate Begeman's patch!

llvm-svn: 130502
2011-04-29 14:18:15 +00:00
Andrew Trick 65266ed4d7 Combine thumb2-ror tests.
llvm-svn: 130498
2011-04-29 14:02:41 +00:00
Eli Friedman 517728b1ae Revert r130454; apparently this doesn't actually work.
llvm-svn: 130462
2011-04-28 23:55:14 +00:00
Eli Friedman 37b9ede969 Fix runline.
llvm-svn: 130455
2011-04-28 23:12:24 +00:00
Eli Friedman e4ecd42926 Fix a rather obscure crash caused by ARM fast-isel generating code which redefines a register.
rdar://problem/9338332 .

llvm-svn: 130454
2011-04-28 23:03:25 +00:00
Eli Friedman 7cd5101ad3 fast-isel sret calls, try 2. We actually do need to do something on x86-32. rdar://problem/9303592 .
llvm-svn: 130429
2011-04-28 20:19:12 +00:00
Eli Friedman 3cf6d4032a Actually revert r130348 correctly.
llvm-svn: 130418
2011-04-28 18:20:24 +00:00
Eli Friedman d5a80ca3c8 Revert r130348; causing buildbot issues on x86-32.
llvm-svn: 130412
2011-04-28 18:06:10 +00:00
Devang Patel 3e021533cd Teach dwarf writer to handle complex address expression for .debug_loc entries.
This fixes clang generated blocks' variables' debug info.
Radar 9279956.

llvm-svn: 130373
2011-04-28 02:22:40 +00:00
Eli Friedman 33c133919a Fix a silly mistake in r130338.
llvm-svn: 130360
2011-04-28 00:42:03 +00:00
Justin Holewinski 18e6ac83ea PTX: support for bitwise operations on predicates
- selection of bitwise preds (AND, OR, XOR)
- new bitwise.ll test

Patch by Dan Bailey

llvm-svn: 130353
2011-04-28 00:19:51 +00:00
Eli Friedman 8bd572fc58 fast-isel sret. We actually don't need to do anything special on x86. :) rdar://problem/9303592 .
llvm-svn: 130348
2011-04-27 23:58:52 +00:00
Eli Friedman 406c471b69 Make the fast-isel code for literal 0.0 a bit shorter/faster, since 0.0 is common. rdar://problem/9303592 .
llvm-svn: 130338
2011-04-27 22:41:55 +00:00
Evan Cheng 9808d31b9e If converter was being too cute. It look for root BBs (which don't have
successors) and use inverse depth first search to traverse the BBs. However
that doesn't work when the CFG has infinite loops. Simply do a linear
traversal of all BBs work just fine.

rdar://9344645

llvm-svn: 130324
2011-04-27 19:32:43 +00:00
Jakob Stoklund Olesen 71d3b895ba Also add <imp-def> operands for defined and dead super-registers when rewriting.
We cannot rely on the <imp-def> operands added by LiveIntervals in all cases as
demonstrated by the test case.

llvm-svn: 130313
2011-04-27 17:42:31 +00:00
Eli Friedman 0eea0293d9 Fix an edge case involving branches in fast-isel on x86.
rdar://problem/9303306 .

llvm-svn: 130272
2011-04-27 01:34:27 +00:00
Evan Cheng 1355bbdd11 Be careful about scheduling nodes above previous calls. It increase usages of
more callee-saved registers and introduce copies. Only allows it if scheduling
a node above calls would end up lessen register pressure.

Call operands also has added ABI restrictions for register allocation, so be
extra careful with hoisting them above calls.

rdar://9329627

llvm-svn: 130245
2011-04-26 21:31:35 +00:00
Evan Cheng dbb86b8108 This test should be in MC. It breaks with changes to scheduling / register allocation so it's being removed.
llvm-svn: 130243
2011-04-26 21:09:04 +00:00
Benjamin Kramer 1d4c835089 Force a triple on this test to unbreak windows buildbots.
llvm-svn: 130226
2011-04-26 18:47:43 +00:00
Dan Gohman 7da91aee83 Fast-isel support for simple inline asms.
llvm-svn: 130205
2011-04-26 17:18:34 +00:00
Rafael Espindola 580eebaa20 Add test for PR9743.
llvm-svn: 130198
2011-04-26 14:17:42 +00:00
Chris Lattner 189ca1498f don't emit the symbol name twice for local bss and common
symbols.  For example, don't emit:
        .comm   _i,4,2                  ## @i
                                        ## @i

instead emit:
        .comm   _i,4,2                  ## @i

llvm-svn: 130192
2011-04-26 06:14:13 +00:00
Eric Christopher 238a21f2d5 Make this test disable fast isel as it's not needed.
llvm-svn: 130165
2011-04-25 22:39:46 +00:00
Akira Hatanaka 0e7ee666b7 Lower BlockAddress node when relocation-model is static.
llvm-svn: 130131
2011-04-25 17:10:45 +00:00
Devang Patel 734f2218ac A dbg.declare may not be in entry block, even if it is referring to an incoming argument. However, It is appropriate to emit DBG_VALUE referring to this incoming argument in entry block in MachineFunction.
llvm-svn: 130129
2011-04-25 16:33:52 +00:00
Benjamin Kramer ba446cc12a Make tests more useful.
lit needs a linter ...

llvm-svn: 130126
2011-04-25 10:12:01 +00:00
Andrew Trick 76dca78cb4 Accidental function name mangling.
llvm-svn: 130050
2011-04-23 04:08:15 +00:00
Andrew Trick 0ed5778a1e Thumb2 and ARM add/subtract with carry fixes.
Fixes Thumb2 ADCS and SBCS lowering: <rdar://problem/9275821>.
t2ADCS/t2SBCS are now pseudo instructions, consistent with ARM, so the
assembly printer correctly prints the 's' suffix.

Fixes Thumb2 adde -> SBC matching to check for live/dead carry flags.

Fixes the internal ARM machine opcode mnemonic for ADCS/SBCS.
Fixes ARM SBC lowering to check for live carry (potential bug).

llvm-svn: 130048
2011-04-23 03:55:32 +00:00
Andrew Trick 1a1f8d4640 whitespace
llvm-svn: 130046
2011-04-23 03:24:11 +00:00
NAKAMURA Takumi 576273cf56 test/CodeGen/X86/shrink-compare.ll: Relax expressions for Win64.
llvm-svn: 130039
2011-04-23 00:15:45 +00:00
Chris Lattner 6d277517d1 Recommit the fix for rdar://9289512 with a couple tweaks to
fix bugs exposed by the gcc dejagnu testsuite:
1. The load may actually be used by a dead instruction, which
   would cause an assert.
2. The load may not be used by the current chain of instructions,
   and we could move it past a side-effecting instruction. Change
   how we process uses to define the problem away.

llvm-svn: 130018
2011-04-22 21:59:37 +00:00
Benjamin Kramer 341c11da3b DAGCombine: fold "(zext x) == C" into "x == (trunc C)" if the trunc is lossless.
On x86 this allows to fold a load into the cmp, greatly reducing register pressure.
  movzbl	(%rdi), %eax
  cmpl	$47, %eax
->
  cmpb	$47, (%rdi)

This shaves 8k off gcc.o on i386. I'll leave applying the patch in README.txt to Chris :)

llvm-svn: 130005
2011-04-22 18:47:44 +00:00
Benjamin Kramer 4c81624735 X86: Try to use a smaller encoding by transforming (X << C1) & C2 into (X & (C2 >> C1)) & C1. (Part of PR5039)
This tends to happen a lot with bitfield code generated by clang. A simple example for x86_64 is
uint64_t foo(uint64_t x) { return (x&1) << 42; }
which used to compile into bloated code:
	shlq	$42, %rdi               ## encoding: [0x48,0xc1,0xe7,0x2a]
	movabsq	$4398046511104, %rax    ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x00,0x04,0x00,0x00]
	andq	%rdi, %rax              ## encoding: [0x48,0x21,0xf8]
	ret                             ## encoding: [0xc3]

with this patch we can fold the immediate into the and:
	andq	$1, %rdi                ## encoding: [0x48,0x83,0xe7,0x01]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	shlq	$42, %rax               ## encoding: [0x48,0xc1,0xe0,0x2a]
	ret                             ## encoding: [0xc3]

It's possible to save another byte by using 'andl' instead of 'andq' but I currently see no way of doing
that without making this code even more complicated. See the TODOs in the code.

llvm-svn: 129990
2011-04-22 15:30:40 +00:00
Evan Cheng c0d2004e3c In Thumb2 mode, lower frame indix references to:
add <rd>, sp, #<imm8>
ldr <rd>, [sp, #<imm8>]
When the offset from sp is multiple of 4 and in range of 0-1020.
This saves code size by utilizing 16-bit instructions.

rdar://9321541

llvm-svn: 129971
2011-04-22 01:42:52 +00:00
Devang Patel 94ad6ac13c Fix DWARF description of Q registers.
llvm-svn: 129952
2011-04-21 23:22:35 +00:00
Devang Patel 3712c14be9 Fix DWARF description of S registers.
llvm-svn: 129947
2011-04-21 22:48:26 +00:00
Devang Patel be22131c28 Test case for r129922
llvm-svn: 129934
2011-04-21 20:16:43 +00:00
Daniel Dunbar 6309828206 Revert r1296656, "Fix rdar://9289512 - not folding load into compare at -O0...",
which broke a couple GCC test suite tests at -O0.

llvm-svn: 129914
2011-04-21 16:14:46 +00:00
Che-Liang Chiou 14c48e5d66 ptx: fix parameter ordering
This patch depends on the prior fix r129908 that changes to use std::find,
rather than std::binary_search, on unordered array.

Patch by Dan Bailey

llvm-svn: 129909
2011-04-21 10:56:58 +00:00
Evan Cheng 5f1ba4cd2d Remove -use-divmod-libcall. Let targets opt in when they are available.
llvm-svn: 129884
2011-04-20 22:20:12 +00:00
Stuart Hastings 1b06a10d62 Un-XFAIL this test for ARM. <rdar://problem/7662569>
llvm-svn: 129875
2011-04-20 21:47:45 +00:00
Justin Holewinski 7d8895e767 PTX: Add intrinsics to list of built-in intrinsics, which allows them to be
used by Clang.  To help Clang integration, the PTX target has been split
     into two targets: ptx32 and ptx64, depending on the desired pointer size.

- Add GCCBuiltin class to all intrinsics
- Split PTX target into ptx32 and ptx64

llvm-svn: 129851
2011-04-20 15:37:17 +00:00
Eric Christopher bcaedb5ce0 Rewrite the expander for umulo/smulo to remember to sign extend the input
manually and pass all (now) 4 arguments to the mul libcall. Add a new
ExpandLibCall for just this (copied gratuitously from type legalization).

Fixes rdar://9292577

llvm-svn: 129842
2011-04-20 01:19:45 +00:00
Daniel Dunbar ed3d5496dc llc: Eliminate a use of getDarwinMajorNumber().
- As before, there is a minor semantic change here (evidenced by the test
   change) for Darwin triples that have no version component. I debated changing
   the default behavior of isOSVersionLT, but decided it made more sense for
   triples to be explicit.

llvm-svn: 129805
2011-04-19 20:46:13 +00:00
Daniel Dunbar 4a7783b0c2 CodeGen: Eliminate a use of getDarwinMajorNumber().
- There is a minor semantic change here (evidenced by the test change) for
   Darwin triples that have no version component. I debated changing the default
   behavior of isOSVersionLT, but decided it made more sense for triples to be
   explicit.

llvm-svn: 129802
2011-04-19 20:32:39 +00:00
Bob Wilson 0858c3aaed This patch combines several changes from Evan Cheng for rdar://8659675.
Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
   additional pipeline stall. So it's frequently better to single codegen
   vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
   vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.

Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
   fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.

Enable these fp vmlx codegen changes for Cortex-A9.

llvm-svn: 129775
2011-04-19 18:11:57 +00:00
Bob Wilson d04a83f8f2 Add -mcpu=cortex-a9-mp. It's cortex-a9 with MP extension. rdar://8648637.
llvm-svn: 129774
2011-04-19 18:11:52 +00:00
Bob Wilson a2881ee8a4 Avoid some 's' 16-bit instruction which partially update CPSR
(and add false dependency) when it isn't dependent on last CPSR defining
instruction. rdar://8928208

llvm-svn: 129773
2011-04-19 18:11:49 +00:00
Bob Wilson df612ba006 Avoid write-after-write issue hazards for Cortex-A9.
Add a avoidWriteAfterWrite() target hook to identify register classes that
suffer from write-after-write hazards. For those register classes, try to avoid
writing the same register in two consecutive instructions.

This is currently disabled by default.  We should not spill to avoid hazards!
The command line flag -avoid-waw-hazard can be used to enable waw avoidance.

llvm-svn: 129772
2011-04-19 18:11:45 +00:00
Eli Friedman ee92a6b332 Add support for FastISel'ing varargs calls.
llvm-svn: 129765
2011-04-19 17:22:22 +00:00
Jakob Stoklund Olesen fb1249548f Tighten test case a bit.
Ideally, we would match an S-register to its containing D-register, but that
requires arithmetic (divide by 2).

llvm-svn: 129756
2011-04-19 06:14:45 +00:00
Chris Lattner 91328b317b Implement support for x86 fastisel of small fixed-sized memcpys, which are generated
en-mass for C++ PODs.  On my c++ test file, this cuts the fast isel rejects by 10x 
and shrinks the generated .s file by 5%

llvm-svn: 129755
2011-04-19 05:52:03 +00:00
Chris Lattner 5f4b783426 Implement support for fast isel of calls of i1 arguments, even though they are illegal,
when they are a truncate from something else.  This eliminates fully half of all the 
fastisel rejections on a test c++ file I'm working with, which should make a substantial
improvement for -O0 compile of c++ code.

This fixed rdar://9297003 - fast isel bails out on all functions taking bools

llvm-svn: 129752
2011-04-19 05:09:50 +00:00
Chris Lattner d7f7c93914 Handle i1/i8/i16 constant integer arguments to calls by prepromoting them.
Before we would bail out on i1 arguments all together, now we just bail on
non-constant ones.  Also, we used to emit extraneous code.  e.g. test12 was:

	movb	$0, %al
	movzbl	%al, %edi
	callq	_test12

and test13 was:
	movb	$0, %al
	xorl	%edi, %edi
	movb	%al, 7(%rsp)
	callq	_test13f

Now we get:

	movl	$0, %edi
	callq	_test12
and:
	movl	$0, %edi
	callq	_test13f

llvm-svn: 129751
2011-04-19 04:42:38 +00:00
Chris Lattner c59290a34c be layout aware, to produce:
testb	$1, %al
	je	LBB0_2
## BB#1:                                ## %if.then
	movb	$0, %al

instead of:

	testb	$1, %al
	jne	LBB0_1
	jmp	LBB0_2
LBB0_1:                                 ## %if.then
	movb	$0, %al

how 'bout that.

llvm-svn: 129749
2011-04-19 04:26:32 +00:00
Chris Lattner 2c8a4c3b1b fix rdar://9297006 - fast isel bails out on trunc to i1 -> bools cry,
a common cause of fast isel rejects on c++ code.

llvm-svn: 129748
2011-04-19 04:22:17 +00:00
Jakob Stoklund Olesen bf78618db6 Make tests register allocation independent again.
llvm-svn: 129739
2011-04-19 00:14:43 +00:00
Evan Cheng 4079133796 Do not lose mem_operands while lowering VLD / VST intrinsics.
llvm-svn: 129738
2011-04-19 00:04:03 +00:00
Eric Christopher c37aa0b26a Fix a bug where we were counting the alias sets as completely used
registers for fast allocation a different way. This has us updating
used registers only when we're using that exact register.

Fixes rdar://9207598

llvm-svn: 129711
2011-04-18 19:26:25 +00:00
Chris Lattner 48f75ad678 while we're at it, handle 'sdiv exact' of a power of 2 also,
this fixes a few rejects on c++ iterator loops.

llvm-svn: 129694
2011-04-18 07:00:40 +00:00
Chris Lattner 562d6e82bd fix rdar://9297011 - udiv by power of two causing fast-isel rejects
llvm-svn: 129693
2011-04-18 06:55:51 +00:00
Chris Lattner 07add49a4b Implement major new fastisel functionality: the matcher can now handle immediates with
value constraints on them (when defined as ImmLeaf's).  This is particularly important
for X86-64, where almost all reg/imm instructions take a i64immSExt32 immediate operand,
which has a value constraint.  Before this patch we ended up iseling the examples into
such amazing code as:

	movabsq	$7, %rax
	imulq	%rax, %rdi
	movq	%rdi, %rax
	ret

now we produce:

	imulq	$7, %rdi, %rax
	ret

This dramatically shrinks the generated code at -O0 on x86-64.

llvm-svn: 129691
2011-04-18 06:22:33 +00:00
Chris Lattner 353fda159d relax this test to just check that the lock prefix is encoded properly,
and to not rely on the register allocator's arbitrary operand choices.

llvm-svn: 129690
2011-04-18 06:15:35 +00:00
Chris Lattner b53ccb8e36 1. merge fast-isel-shift-imm.ll into fast-isel-x86-64.ll
2. implement rdar://9289501 - fast isel should fold trivial multiplies to shifts
3. teach tblgen to handle shift immediates that are different sizes than the 
   shifted operands, eliminating some code from the X86 fast isel backend.
4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function
   instead of FastEmit_ri to simplify code.

llvm-svn: 129666
2011-04-17 20:23:29 +00:00
Chris Lattner eb729d48ff fix an x86 fast isel issue where we'd completely give up on folding an address
when we have a global variable base an an index.  Instead, just give up on
folding the global variable.

Before we'd geenrate:

_test:                                  ## @test
## BB#0:
	movq	_rtx_length@GOTPCREL(%rip), %rax
	leaq	(%rax), %rax
	addq	%rdi, %rax
	movzbl	(%rax), %eax
	ret

now we generate:

_test:                                  ## @test
## BB#0:
	movq	_rtx_length@GOTPCREL(%rip), %rax
	movzbl	(%rax,%rdi), %eax
	ret

The difference is even more significant when there is a scale
involved.

This fixes rdar://9289558 - total fail with addr mode formation at -O0/x86-64

llvm-svn: 129664
2011-04-17 17:47:38 +00:00
Chris Lattner 4832660b4d fix an oversight which caused us to compile the testcase (and other
less trivial things) into a dummy lea.  Before we generated:

_test:                                  ## @test
	movq	_G@GOTPCREL(%rip), %rax
	leaq	(%rax), %rax
	ret

now we produce:

_test:                                  ## @test
	movq	_G@GOTPCREL(%rip), %rax
	ret

This is part of rdar://9289558

llvm-svn: 129662
2011-04-17 17:12:08 +00:00
Chris Lattner 045c43855c Fix rdar://9289512 - not folding load into compare at -O0
The basic issue here is that bottom-up isel is matching the branch
and compare, and was failing to fold the load into the branch/compare
combo.  Fixing this (by allowing folding into any instruction of a
sequence that is selected) allows us to produce things like:


cmpb    $0, 52(%rax)
je      LBB4_2

instead of:

movb    52(%rax), %cl
cmpb    $0, %cl
je      LBB4_2

This makes the generated -O0 code run a bit faster, but also speeds up
compile time by putting less pressure on the register allocator and 
generating less code.

This was one of the biggest classes of missing load folding.  Implementing
this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm)
line count.

llvm-svn: 129656
2011-04-17 06:35:44 +00:00
Eli Friedman 55f7bf3289 Remove working entry from README.
llvm-svn: 129654
2011-04-17 02:36:27 +00:00
Chris Lattner fba7ca63cc fix rdar://9289583 - fast isel should handle non-canonical commutative binops
allowing us to fold the immediate into the 'and' in this case:

int test1(int i) {
  return 8&i;
}

llvm-svn: 129653
2011-04-17 01:16:47 +00:00
Eli Friedman 55b0acd624 PR9055: extend the fix to PR4050 (r70179) to apply to zext and anyext.
Returning a new node makes the code try to replace the old node, which
in the included testcase is killed by CSE.

llvm-svn: 129650
2011-04-16 23:25:34 +00:00
Evan Cheng b14ce09fca Fix divmod libcall lowering. Convert to {S|U}DIVREM first and then expand the node to a libcall. rdar://9280991
llvm-svn: 129633
2011-04-16 03:08:26 +00:00
Akira Hatanaka 2cb3aa30dd Re-enable test o32_cc_vararg.ll.
llvm-svn: 129616
2011-04-15 22:23:09 +00:00
Cameron Zwarich 9c65e4d69c Add ORR and EOR to the CMP peephole optimizer. It's hard to get isel to generate
a case involving EOR, so I only added a test for ORR.

llvm-svn: 129610
2011-04-15 21:24:38 +00:00
Rafael Espindola 9fef721830 Add this test back for Darwin.
llvm-svn: 129607
2011-04-15 21:06:27 +00:00
Cameron Zwarich 0829b3065a The AND instruction leaves the V flag unmodified, so it falls victim to the same
problem as all of the other instructions we fold with CMPs.

llvm-svn: 129602
2011-04-15 20:45:00 +00:00
Cameron Zwarich 93eae1571c Add missing register forms of instructions to the ARM CMP-folding code. This
fixes <rdar://problem/9287901>.

llvm-svn: 129599
2011-04-15 20:28:28 +00:00
Akira Hatanaka 279169771b Add pass that expands pseudo instructions into target instructions after register allocation. Define pseudos that get expanded into mtc1 or mfc1 instructions.
llvm-svn: 129594
2011-04-15 19:52:08 +00:00
Rafael Espindola a01cdb0e37 Add 129518 back with a fix for when we are producing eh just because of debug info.
Change ELF systems to use CFI for producing the EH tables. This reduces the
size of the clang binary in Debug builds from 690MB to 679MB.

llvm-svn: 129571
2011-04-15 15:11:06 +00:00
NAKAMURA Takumi b5e3e9dd27 Revert r129518, "Change ELF systems to use CFI for producing the EH tables. This reduces the"
It broke several builds.

llvm-svn: 129557
2011-04-15 03:35:57 +00:00
Evan Cheng 12bb05b75b Fix another fcopysign lowering bug. If src is f64 and destination is f32, don't
forget to right shift the source by 32 first. rdar://9287902

llvm-svn: 129556
2011-04-15 01:31:00 +00:00