Commit Graph

4232 Commits

Author SHA1 Message Date
Jakob Stoklund Olesen ae044c06bf Pick a conservative register class when creating a small live range for remat.
The rematerialized instruction may require a more constrained register class
than the register being spilled. In the test case, the spilled register has been
inflated to the DPR register class, but we are rematerializing a load of the
ssub_0 sub-register, which only exists for DPR_VFP2 registers.

The register class is reinflated after spilling, so the conservative choice is
only temporary.

llvm-svn: 128610
2011-03-31 03:54:44 +00:00
Evan Cheng ee9d45dd55 Don't try to create zero-sized stack objects.
llvm-svn: 128586
2011-03-30 23:44:13 +00:00
Cameron Zwarich 53dd03d537 Add an ARM-specific SD node for VBSL so that forms with a constant first operand
can be recognized. This fixes <rdar://problem/9183078>.
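
For reference, VBSL is NEON's bitwise select. A one-lane C sketch of its
semantics (my own illustration, not code from this commit):

    #include <stdint.h>
    #include <stdio.h>

    /* Per-bit select: take bits from a where the mask bit is 1, from b
       where it is 0. A constant first operand (the mask) is the form the
       new SD node lets isel recognize. */
    static uint32_t vbsl_bit(uint32_t mask, uint32_t a, uint32_t b) {
        return (a & mask) | (b & ~mask);
    }

    int main(void) {
        /* prints 0xaa55aa55 */
        printf("0x%x\n", vbsl_bit(0xFF00FF00u, 0xAAAAAAAAu, 0x55555555u));
        return 0;
    }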

llvm-svn: 128584
2011-03-30 23:01:21 +00:00
Evan Cheng 18381b4257 Add intrinsics @llvm.arm.neon.vmulls and @llvm.arm.neon.vmullu.* back. Frontends
were lowering them to sext / zext + mul instructions. Unfortunately the
optimization passes may hoist the extensions out of the loop and separate them.
When that happens, the long multiplication instructions can be broken into
several scalar instructions, causing a significant performance issue.

Note the vmla and vmls intrinsics are not added back. The frontend will codegen them
as vmull* intrinsics + add / sub. Also note the isel optimizations for catching
mul + sext / zext are not changed either.
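
To illustrate the fragility, a sketch of the two source forms (my own
example, assuming an ARM target with NEON and arm_neon.h; compile-only):

    #include <arm_neon.h>

    /* Separated form: vmovl (widen) feeding vmul. This only fuses into a
       single vmull while the extends stay adjacent to the multiply; once
       they are hoisted out of a loop, codegen breaks the operation into
       scalar pieces. */
    uint32x4_t mul_widen_fragile(uint16x4_t a, uint16x4_t b) {
        return vmulq_u32(vmovl_u16(a), vmovl_u16(b));
    }

    /* The dedicated intrinsic keeps the widening multiply intact. */
    uint32x4_t mul_widen_robust(uint16x4_t a, uint16x4_t b) {
        return vmull_u16(a, b);
    }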

First part of rdar://8832507, rdar://9203134

llvm-svn: 128502
2011-03-29 23:06:19 +00:00
Cameron Zwarich 143f9aea2b Add Neon SINT_TO_FP and UINT_TO_FP lowering from v4i16 to v4f32. Fixes
<rdar://problem/8875309> and <rdar://problem/9057191>.

llvm-svn: 128492
2011-03-29 21:41:55 +00:00
Rafael Espindola 6b2fac21ca Reduce test case.
llvm-svn: 128445
2011-03-29 02:18:54 +00:00
Evan Cheng e2086e740f Optimize (zext A + zext B) * C to (VMULL A, C) + (VMULL B, C) during
isel lowering to fold the zero-extends and take advantage of no-stall
back-to-back vmul + vmla:
 vmull q0, d4, d6
 vmlal q0, d5, d6
is faster than
 vaddl q0, d4, d5
 vmovl q1, d6
 vmul  q0, q0, q1

This allows us to generate vmull + vmlal for:
    f = vmull_u8(   vget_high_u8(s), c);
    f = vmlal_u8(f, vget_low_u8(s),  c);

rdar://9197392

llvm-svn: 128444
2011-03-29 01:56:09 +00:00
Bill Wendling 96f962fdff In some cases, the "fail BB dominator" may be null after the BB was split (and
becomes reachable when it previously wasn't). Check to make sure that it's not null
before trying to use it.

llvm-svn: 128434
2011-03-28 23:02:18 +00:00
Jakob Stoklund Olesen 9a624fa993 Collect and coalesce DBG_VALUE instructions before emitting the function.
Correctly terminate the range of register DBG_VALUEs when the register is
clobbered or when the basic block ends.

The code is now ready to deal with variables that are sometimes in a register
and sometimes on the stack. We just need to teach emitDebugLoc to say 'stack
slot'.

llvm-svn: 128327
2011-03-26 02:19:36 +00:00
Eric Christopher d553096688 Fix the bfi handling for or (and a mask) (and b mask). We need the two
masks to match inversely for the code as-is to work. For the example given
we actually want:

bfi r0, r2, #1, #1

not #0; however, given the way the pattern is written, that's not possible
at the moment.
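
For reference, a small C model of bfi's insert semantics (my own sketch,
not code from this commit):

    #include <stdint.h>
    #include <stdio.h>

    /* BFI Rd, Rn, #lsb, #width: insert the low <width> bits of Rn into
       Rd starting at bit <lsb>, leaving all other bits of Rd unchanged. */
    static uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width) {
        uint32_t mask = ((1u << width) - 1u) << lsb;
        return (rd & ~mask) | ((rn << lsb) & mask);
    }

    int main(void) {
        /* bfi r0, r2, #1, #1 with r0=0, r2=1 sets bit 1: prints 0x2 */
        printf("0x%x\n", bfi(0x0u, 0x1u, 1, 1));
        return 0;
    }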

Fixes rdar://9177502

llvm-svn: 128320
2011-03-26 01:21:03 +00:00
Jakob Stoklund Olesen 1886a4c823 Emit fewer labels for debug info and stop emitting .loc directives for DBG_VALUEs.
The .loc directives don't need labels; that is a leftover from when we created
line number info manually.

Instructions following a DBG_VALUE can share its label since the DBG_VALUE
doesn't produce any code.

llvm-svn: 128284
2011-03-25 17:20:59 +00:00
Devang Patel 71536de752 Move test into the x86-specific area.
llvm-svn: 128245
2011-03-24 22:39:09 +00:00
Devang Patel e01b75cb89 Keep track of directory name and fix regression caused by Rafael's patch r119613.
A better approach would be to move source id handling inside MC.

llvm-svn: 128233
2011-03-24 20:30:50 +00:00
NAKAMURA Takumi 521eb7c11e Target/X86: [PR8777][PR8778] Tweak alloca/chkstk for Windows targets.
FIXME: Some cleanups would be needed.
llvm-svn: 128206
2011-03-24 07:07:00 +00:00
Cameron Zwarich 4649f17db1 Do early taildup of ret in CodeGenPrepare for potential tail calls that have a
void return type. This fixes PR9487.

llvm-svn: 128197
2011-03-24 04:52:10 +00:00
Devang Patel abc77347a7 Enable GlobalMerge on darwin.
llvm-svn: 128183
2011-03-23 23:34:19 +00:00
Andrew Trick 4ab9a16569 Revert r128175.
I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix.

llvm-svn: 128181
2011-03-23 23:11:02 +00:00
Evan Cheng 425489d397 Cmp peephole optimization isn't always safe for signed arithmetic.
int tries = INT_MAX;
while (tries > 0) {
      tries--;
}

The check should be:
        subs    r4, #1
        cmp     r4, #0
        bgt     LBB0_1

The subs can set the overflow V bit when r4 is INT_MAX+1 (a value which loop
canonicalization apparently produces in this case). cmp #0 would have cleared
it while not changing the N and Z bits. Since BGT is dependent on the V
bit, i.e. (N == V) && !Z, it is not safe to eliminate the cmp #0.
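
A standalone check of the flag arithmetic above (my own sketch; flag
formulas per the ARM ARM, computed in plain C):

    #include <limits.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        int32_t r4  = INT32_MIN;                    /* INT_MAX + 1, wrapped */
        int32_t res = (int32_t)((uint32_t)r4 - 1u); /* subs r4, #1 -> INT_MAX */
        int n = res < 0, z = res == 0;
        int v = (r4 < 0) && (res >= 0);  /* negative - positive gave positive */
        /* After subs: N=0, Z=0, V=1, so BGT ((N == V) && !Z) falls through. */
        printf("subs: BGT taken = %d\n", !z && (n == v));
        /* cmp res, #0 cannot overflow, so V=0 and BGT *is* taken. */
        int v_cmp = 0;
        printf("cmp:  BGT taken = %d\n", !z && (n == v_cmp));
        return 0;
    }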

rdar://9172742

llvm-svn: 128179
2011-03-23 22:52:04 +00:00
Eli Friedman 4c192305bf PR9535: add support for splitting and scalarizing vector ISD::FP_ROUND.
Also cleaning up some duplicated code while I'm here.

llvm-svn: 128176
2011-03-23 22:18:48 +00:00
Andrew Trick 4046a0de91 Reapply Eli's r127852 now that the pre-RA scheduler can spill EFLAGS.
(target-specific branchless method for double-width relational comparisons on x86)

llvm-svn: 128175
2011-03-23 22:16:02 +00:00
Jakob Stoklund Olesen ec0ac3ca40 Reapply r128045 and r128051 with fixes.
This will extend the ranges of debug info variables in registers until they are
clobbered.

Fix 1: Don't mistake DBG_VALUE instructions referring to incoming arguments on
the stack for DBG_VALUE instructions referring to variables in the frame
pointer. This fixes the gdb test-suite failure.

Fix 2: Don't trace through copies to physical registers setting up call
arguments. These registers are call clobbered, and the source register is more
likely to be a callee-saved register that can be extended through the call
instruction.

llvm-svn: 128114
2011-03-22 22:33:08 +00:00
Andrew Trick b0f98bb5e9 Revert r128045 and r128051, debug info enhancements.
Temporarily reverting these to see if we can get llvm-objdump to link. Hopefully this is not the problem.

llvm-svn: 128097
2011-03-22 19:18:42 +00:00
Che-Liang Chiou 7413080cea ptx: add analyze/insert/remove branch
llvm-svn: 128084
2011-03-22 14:12:00 +00:00
Jakob Stoklund Olesen 9c057ee440 Don't emit 'DBG_VALUE %noreg, ...' to terminate user variable ranges.
These ranges get completely jumbled by the post-ra scheduler, and it is not
really reasonable to expect it to make sense of them.

Instead, teach DwarfDebug to notice when user variables in registers are
clobbered, and terminate the ranges there.

llvm-svn: 128045
2011-03-22 00:21:41 +00:00
Dan Gohman c1783b31a4 Fix fast-isel address mode folding to avoid folding instructions
outside of the current basic block. This fixes PR9500, rdar://9156159.

llvm-svn: 128041
2011-03-22 00:04:35 +00:00
Rafael Espindola 1557fd6d39 Write the section table and the section data in the same order that
GNU as does. This makes it a lot easier to compare the output of both
as the addresses are now a lot closer.

llvm-svn: 127972
2011-03-20 18:44:20 +00:00
Daniel Dunbar 327cd36f74 Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors
to canonicalize IR", it broke a lot of things.

llvm-svn: 127954
2011-03-19 21:47:14 +00:00
Evan Cheng 824a711305 SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR
to have a single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
  switch(x) {
  case 1: return f1();
  case 2: return f2();
  case 3: return f3();
  case 4: return f4();
  case 5: return f5();
  case 6: return f6();
  }
}

=>
LBB0_2:                                 ## %sw.bb
  callq   _f1
  popq    %rbp
  ret
LBB0_3:                                 ## %sw.bb1
  callq   _f2
  popq    %rbp
  ret
LBB0_4:                                 ## %sw.bb3
  callq   _f3
  popq    %rbp
  ret

This patch teaches codegenprep to duplicate returns when the return value
is a phi whose operands are produced by tail calls followed by
an unconditional branch:

sw.bb7:                                           ; preds = %entry
  %call8 = tail call i32 @f5() nounwind
  br label %return
sw.bb9:                                           ; preds = %entry
  %call10 = tail call i32 @f6() nounwind
  br label %return
return:
  %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
  ret i32 %retval.0

This allows codegen to generate better code like this:

LBB0_2:                                 ## %sw.bb
        jmp     _f1                     ## TAILCALL
LBB0_3:                                 ## %sw.bb1
        jmp     _f2                     ## TAILCALL
LBB0_4:                                 ## %sw.bb3
        jmp     _f3                     ## TAILCALL

rdar://9147433

llvm-svn: 127953
2011-03-19 17:17:39 +00:00
Nadav Rotem e7a101ccab Add support for legalizing UINT_TO_FP of vectors on platforms which do
not have native support for this operation (such as X86).
The legalized code uses two vector INT_TO_FP operations and is faster
than scalarizing.
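
A scalar sketch of one way such a legalization can go (my own illustration;
the actual DAG expansion may differ): split the unsigned value into halves
that are each safe for a signed conversion, convert both, and recombine.

    #include <stdint.h>
    #include <stdio.h>

    /* u32 -> float without a native unsigned conversion: two signed
       conversions on 16-bit halves, then recombine. Both halves are
       non-negative, so the signed conversion handles them correctly. */
    static float uint_to_fp(uint32_t x) {
        float hi = (float)(int32_t)(x >> 16);
        float lo = (float)(int32_t)(x & 0xFFFFu);
        return hi * 65536.0f + lo;
    }

    int main(void) {
        printf("%f\n", uint_to_fp(0xFFFFFFFFu)); /* ~4294967296.0 */
        return 0;
    }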

llvm-svn: 127951
2011-03-19 13:09:10 +00:00
Andrew Trick e7537a0187 FileCheckize a test.
(one-by-one until valgrind is happy)

llvm-svn: 127925
2011-03-19 00:41:39 +00:00
Evan Cheng dc1d626a3d Match a few more obvious patterns to revsh. rdar://9147637.
llvm-svn: 127913
2011-03-18 21:52:42 +00:00
Eli Friedman 59721e3238 Revert r127852; it's apparently causing an ICE on mingw.
llvm-svn: 127909
2011-03-18 21:12:29 +00:00
Justin Holewinski 0984dcc077 PTX: Fix various codegen issues
- Emit mad instead of mad.rn for shader model 1.0
- Emit explicit mov.u32 instructions for reading global variables
  (most PTX instructions cannot take global variable immediates)

llvm-svn: 127895
2011-03-18 19:24:28 +00:00
Che-Liang Chiou b1df0fe1cc ptx: fix reversed parameter order
llvm-svn: 127874
2011-03-18 11:23:56 +00:00
Che-Liang Chiou ff9d938e33 ptx: add unconditional and conditional branch
llvm-svn: 127873
2011-03-18 11:08:52 +00:00
Eli Friedman 1a916a3c0c Add a target-specific branchless method for double-width relational
comparisons on x86.  Essentially, the way this works is that SUB+SBB sets
the relevant flags the same way a double-width CMP would.
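
A minimal C model of the idea (my own sketch, not the committed lowering):
the borrow propagating out of the high-half subtract-with-borrow is exactly
the carry a single double-width CMP would produce.

    #include <stdint.h>
    #include <stdio.h>

    /* "Is (hi1:lo1) unsigned-below (hi2:lo2)?" as SUB on the low halves
       followed by SBB on the high halves. */
    static int ult128(uint64_t hi1, uint64_t lo1, uint64_t hi2, uint64_t lo2) {
        int borrow = lo1 < lo2;          /* SUB: borrow out of the low half */
        /* SBB hi1, hi2 borrows out exactly when hi1 < hi2 + borrow. */
        return hi1 < hi2 || (hi1 == hi2 && borrow);
    }

    int main(void) {
        printf("%d\n", ult128(1, 4, 1, 5));     /* 1: equal highs, 4 < 5 */
        printf("%d\n", ult128(2, 0, 1, ~0ULL)); /* 0: high half decides */
        return 0;
    }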

This is a substantial improvement over the generic lowering in LLVM. The output
is also shorter than the gcc-generated output; I haven't done any detailed
benchmarking, though.

llvm-svn: 127852
2011-03-18 02:34:11 +00:00
Benjamin Kramer cfcea12fe2 BuildUDIV: If the divisor is even we can simplify the fixup of the multiplied value by introducing an early shift.
This allows us to compile "unsigned foo(unsigned x) { return x/28; }" into
	shrl	$2, %edi
	imulq	$613566757, %rdi, %rax
	shrq	$32, %rax
	ret

instead of
	movl    %edi, %eax
	imulq   $613566757, %rax, %rcx
	shrq    $32, %rcx
	subl    %ecx, %eax
	shrl    %eax
	addl    %ecx, %eax
	shrl    $4, %eax

on x86_64
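
A quick sanity check of the arithmetic (my own, not part of the commit):
after the early shift, x>>2 fits in 30 bits, so multiplying by
M = ceil(2^32 / 7) = 613566757 is exact with no fixup sequence.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        const uint64_t M = 613566757; /* ceil(2^32 / 7); note 7*M == 2^32 + 3 */
        for (uint64_t i = 0; i <= 0xFFFFFFFFu; i += 9973) {
            uint32_t x = (uint32_t)i;
            uint32_t q = (uint32_t)(((uint64_t)(x >> 2) * M) >> 32);
            if (q != x / 28) { printf("mismatch at %u\n", x); return 1; }
        }
        printf("ok\n");
        return 0;
    }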

llvm-svn: 127829
2011-03-17 20:39:14 +00:00
Richard Osborne 6120962d7d Add XCore intrinsic for setpsc.
llvm-svn: 127821
2011-03-17 18:42:05 +00:00
NAKAMURA Takumi bf9ff6f63b test/CodeGen/X86/h-registers-1.ll: Add explicit -mtriple=x86_64-linux. It does not need to be checked on x86_64-win32 (aka Win64).
llvm-svn: 127800
2011-03-17 04:24:40 +00:00
NAKAMURA Takumi 5b6198dfb9 test/CodeGen/X86/constant-pool-remat-0.ll: FileCheck-ize and add explicit -mtriple=x86_64-linux.
llvm-svn: 127775
2011-03-16 23:01:31 +00:00
Cameron Zwarich ac106273d4 The x86-64 ABI says that a bool is only guaranteed to be sign-extended to a byte
rather than an int. Thankfully, this only causes LLVM to miss optimizations, not
generate incorrect code.

This just fixes the zext at the return. We still insert an i32 ZextAssert when
reading a function's arguments, but it is followed by a truncate and another i8
ZextAssert so it is not optimized.

llvm-svn: 127766
2011-03-16 22:20:18 +00:00
Cameron Zwarich 40a9200357 Rename a test to be more inclusive.
llvm-svn: 127765
2011-03-16 22:20:12 +00:00
Daniel Dunbar fd95b016fb Revert r127757, "Patch to fix a dwarf relocation problem on ARM. One-line fix
plus the test where it used to break.", which broke Clang self-host of a
Debug+Asserts compiler, on OS X.

llvm-svn: 127763
2011-03-16 22:16:39 +00:00
Richard Osborne c871eff3f5 Add XCore intrinsics for setclk, setrdy.
llvm-svn: 127761
2011-03-16 21:56:00 +00:00
Renato Golin a3aeafeb35 Patch to fix a dwarf relocation problem on ARM. One-line fix plus the test where it used to break.
llvm-svn: 127757
2011-03-16 21:05:52 +00:00
Cameron Zwarich 49e354bcb6 Add a test for i1 zeroext arguments on x86-64. We currently generate code that
conforms to the ABI, but DAGCombine could in theory recognize the sequence of
zext asserts and truncates and generate incorrect code.

llvm-svn: 127754
2011-03-16 20:15:44 +00:00
Richard Osborne d4346f2388 Add checkevent intrinsic to check if any resources owned by the current thread
can event.

llvm-svn: 127741
2011-03-16 18:34:00 +00:00
NAKAMURA Takumi d60e4101e6 test/CodeGen/X86: FileCheck-ize and add actions for x86_64-linux and x86_64-win32.
llvm-svn: 127734
2011-03-16 13:53:07 +00:00
NAKAMURA Takumi 0b9e2b0257 test/CodeGen/X86: Add a pattern for Win64.
llvm-svn: 127733
2011-03-16 13:52:51 +00:00
NAKAMURA Takumi c10801e8a5 test/CodeGen/X86: FileCheck-ize and add explicit -mtriple=x86_64-linux. They are useless to the Win64 target.
llvm-svn: 127732
2011-03-16 13:52:38 +00:00