Commit Graph

39352 Commits

Author SHA1 Message Date
Vasileios Kalintiris 1ed49fd384 [mips] Disable the TImode shift libcalls for 32-bit targets.
Summary:
The o32 ABI doesn't not support the TImode helpers. For the time being,
disable just the shift libcalls as they break recursive builds on MIPS.

Reviewers: sdardis

Subscribers: llvm-commits, sdardis

Differential Revision: https://reviews.llvm.org/D24259

llvm-svn: 280798
2016-09-07 10:01:18 +00:00
James Molloy ec905a62ae [SimplifyCFG] Update workaround for PR30188 to also include loads
I should have realised this the first time around, but if we're avoiding sinking stores where the operands come from allocas so they don't create selects, we also have to do the same for loads because SROA will be just as defective looking at loads of selected addresses as stores.

Fixes PR30188 (again).

llvm-svn: 280792
2016-09-07 08:40:20 +00:00
James Molloy bf1837d9c9 [SimplifyCFG] Check PHI uses more accurately
PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG.

Fixes PR30292.

llvm-svn: 280790
2016-09-07 08:15:54 +00:00
Hal Finkel 42c83f131e [PowerPC] Fix address-offset folding for plain addi
When folding an addi into a memory access that can take an immediate offset, we
were implicitly assuming that the existing offset was zero. This was incorrect.
If we're dealing with an addi with a plain constant, we can add it to the
existing offset (assuming that doesn't overflow the immediate, etc.), but if we
have anything else (i.e. something that will become a relocation expression),
we'll go back to requiring the existing immediate offset to be zero (because we
don't know what the requirements on that relocation expression might be - e.g.
maybe it is paired with some addis in some relevant way).

On the other hand, when dealing with a plain addi with a regular constant
immediate, the alignment restrictions (from the TOC base pointer, etc.) are
irrelevant.

I've added the test case from PR30280, which demonstrated the bug, but also
demonstrates a missed optimization opportunity (i.e. we don't need the memory
accesses at all).

Fixes PR30280.

llvm-svn: 280789
2016-09-07 07:36:11 +00:00
Elena Demikhovsky f0ddd1b8b5 AVX512F: FMA intrinsic + FNEG - sequence optimization
The previous commit (r280368 - https://reviews.llvm.org/D23313) does not cover AVX-512F, KNL set.
FNEG(x) operation is lowered to (bitcast (vpxor (bitcast x), (bitcast constfp(0x80000000))).
It happens because FP XOR is not supported for 512-bit data types on KNL and we use integer XOR instead.
I added pattern match for integer XOR.

Differential Revision: https://reviews.llvm.org/D24221

llvm-svn: 280785
2016-09-07 06:54:28 +00:00
Craig Topper b880ad3a71 [AVX-512] Add support for commuting masked instructions in findCommutedOpIndices. The default implementation doesn't skip the mask input or the preserved input.
llvm-svn: 280781
2016-09-07 04:46:11 +00:00
Saleem Abdulrasool a7ade33d16 Revert "CodeGen: ensure that libcalls are always AAPCS CC"
This reverts SVN r280683.  Revert until I figure out why this is breaking lli
tests.

llvm-svn: 280778
2016-09-07 03:17:19 +00:00
Zachary Turner c998571493 Re-add "Make FieldList records print as a YAML sequence"
This was originally submitted in r280549, and reverted in r280577
due to breaking one MSVC buildbot.  The issue is that MSVC 2013
doesn't synthesize move constructors.  So even though i was
writing std::move(A) it was copying it, leading to a bogus ArrayRef.
The solution here is to simply remove the std::vector<> from the
type, since it is unused and unnecessary.  This way the ArrayRef
continues to point into the original memory backing the CVType.

llvm-svn: 280769
2016-09-06 23:45:47 +00:00
Hal Finkel 8ca2ed22b2 [DAGCombine] More fixups to SETCC legality checking (visitANDLike/visitORLike)
I might have called this "r246507, the sequel". It fixes the same issue, as the
issue has cropped up in a few more places. The underlying problem is that
isSetCCEquivalent can pick up select_cc nodes with a result type that is not
legal for a setcc node to have, and if we use that type to create new setcc
nodes, nothing fixes that (and so we've violated the contract that the
infrastructure has with the backend regarding setcc node types).

Fixes PR30276.

For convenience, here's the commit message from r246507, which explains the
problem is greater detail:

[DAGCombine] Fixup SETCC legality checking

SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.

The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.

The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:

  (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
     (which is possible for some combinations of (op1, op2))

If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.

To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).

Fixes PR24636.

llvm-svn: 280767
2016-09-06 23:02:23 +00:00
Ying Yi 24e91bd05f [llvm-cov] Add the project summary to the text coverage report for each source file.
This patch is a spin-off from https://reviews.llvm.org/D23922. It extends the text view to preserve the same feature as the html view.

Differential Revision: https://reviews.llvm.org/D24241

llvm-svn: 280756
2016-09-06 21:41:38 +00:00
Konstantin Zhuravlyov 864718c666 [AMDGPU] Wave and register controls
- Add missing test

llvm-svn: 280749
2016-09-06 20:29:10 +00:00
Konstantin Zhuravlyov 1d65026ca6 [AMDGPU] Wave and register controls
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov

Differential Revision: https://reviews.llvm.org/D21562

llvm-svn: 280747
2016-09-06 20:22:28 +00:00
Wei Ding 5e832e866e AMDGPU : Add XNACK feature to GPUs that support it.
Differential Revision: http://reviews.llvm.org/D24276

llvm-svn: 280742
2016-09-06 19:55:17 +00:00
Ying Yi d36b47c481 [llvm-cov] Add the "Go to first unexecuted line" feature.
This patch provides easy navigation to find the zero count lines, especially useful when the source file is very large.

Differential Revision: https://reviews.llvm.org/D23277

llvm-svn: 280739
2016-09-06 19:31:18 +00:00
Rafael Espindola b940b66c60 Add an c++ itanium demangler to llvm.
This adds a copy of the demangler in libcxxabi.

The code also has no dependencies on anything else in LLVM. To enforce
that I added it as another library. That way a BUILD_SHARED_LIBS will
fail if anyone adds an use of StringRef for example.

The no llvm dependency combined with the fact that this has to build
on linux, OS X and Windows required a few changes to the code. In
particular:

    No constexpr.
    No alignas

On OS X at least this library has only one global symbol:
__ZN4llvm16itanium_demangleEPKcPcPmPi

My current plan is:

    Commit something like this
    Change lld to use it
    Change lldb to use it as the fallback

    Add a few #ifdefs so that exactly the same file can be used in
    libcxxabi to export abi::__cxa_demangle.

Once the fast demangler in lldb can handle any names this
implementation can be replaced with it and we will have the one true
demangler.

llvm-svn: 280732
2016-09-06 19:16:48 +00:00
Krzysztof Parzyszek 7c9b012629 [RDF] Ignore undef use operands
llvm-svn: 280717
2016-09-06 17:03:13 +00:00
Simon Pilgrim 1b4462b7c1 [SelectionDAG] Simplify extract_subvector( insert_subvector ( Vec, In, Idx ), Idx ) -> In
If we are extracting a subvector that has just been inserted then we should just use the original inserted subvector.

This has come up in certain several x86 shuffle lowering cases where we are crossing 128-bit lanes.

Differential Revision: https://reviews.llvm.org/D24254

llvm-svn: 280715
2016-09-06 16:42:05 +00:00
Adam Nemet c520822dbf [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set).  However we could still have
regions of the CFG that does not have branch weights collected (e.g. a
cold region).  In this case we'd use static estimates.  Since static
estimates for branches are determined independently, they are
inconsistent.  Updating them can "randomly" inflate block frequencies.

I've run into this in a completely cold loop of h264ref from
SPEC.  -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).

The new testcase demonstrate the problem.  We check array elements
against 1, 2 and 3 in a loop.  The check against 3 is the loop-exiting
check.  The block names should be self-explanatory.

In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).

There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated.  These are the resulting branch and block
frequencies for the loop body:

                check_1 (16)
            (8) /  |
            eq_1   | (8)
                \  |
                check_2 (16)
            (8) /  |
            eq_2   | (8)
                \  |
                check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

First we thread eq_1 -> check_2 to check_3.  Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2.  Changed frequencies are highlighted with * *:

                check_1 (16)
            (8) /  |
           eq_1~   | (8)
           /       |
          /     check_2 (*8*)
         /  (8) /  |
         \  eq_2   | (*0*)
          \     \  |
           ` --- check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges.  Frequencies are updated to remove the frequency of eq_1 and
eq_3 from check_3 and then the false edge leaving check_3 (changed
frequencies are highlighted with * *):

                  check_1 (16)
              (8) /  |
             eq_1~   | (8)
             /       |
            /     check_2 (*8*)
           /  (8) /  |
          /-- eq_2~  | (*0*)
  (back edge)        |
                  check_3 (*0*)
            (*0*) /  |
         (loop exit) | (*0*)
                     |
                (back edge)

As a result, the loop exit edge ends up with 0 frequency which in turn makes
the loop header to have maximum frequency.

There are a few potential problems here:

1. The profile data seems odd.  There is a single profile sample of the
loop being entered.  On the other hand, there are no weights inside the
loop.

2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.

3. We shouldn't create profile metadata that is calculated from static
estimation.  I am not sure what policy is but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling.  Estimated probabilities should only be reflected in BPI/BFI.

Any one of these would probably fix the immediate problem.  I went for 3
because I think it's a good policy to have and added a FIXME about 2.

Differential Revision: https://reviews.llvm.org/D24118

llvm-svn: 280713
2016-09-06 16:08:33 +00:00
Simon Dardis b432a3ed7e [mips] Tighten FastISel restrictions
LLVM PR/29052 highlighted that FastISel for MIPS attempted to lower
arguments assuming that it was using the paired 32bit registers to
perform operations for f64. This mode of operation is not supported
for MIPSR6.

This patch resolves the reported issue by adding additional checks
for unsupported floating point unit configuration.

Thanks to mike.k for reporting this issue!

Reviewers: seanbruno, vkalintiris

Differential Review: https://reviews.llvm.org/D23795

llvm-svn: 280706
2016-09-06 12:36:24 +00:00
Krzysztof Parzyszek 020ec299bf [PPC] Claim stack frame before storing into it, if no red zone is present
Unlike PPC64, PPC32/SVRV4 does not have red zone. In the absence of it 
there is no guarantee that this part of the stack will not be modified 
by any interrupt. To avoid this, make sure to claim the stack frame first
before storing into it.

This fixes https://llvm.org/bugs/show_bug.cgi?id=26519.

Differential Revision: https://reviews.llvm.org/D24093

llvm-svn: 280705
2016-09-06 12:30:00 +00:00
Craig Topper 4fa3b50fc3 [AVX-512] Fix masked VPERMI2PS isel when the index comes from a bitcast.
We need to bitcast the index operand to a floating point type so that it matches the result type. If not then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type.

llvm-svn: 280696
2016-09-06 06:56:59 +00:00
Craig Topper cf9f1b8dfa [AVX-512] Add a test case to show that we don't select masked vpermi2ps when the index operand comes from a bitcast.
It doesn't work because we're looking for a bitcast from the v4i32 index operand to v4f32 for the passthru part of the DAG. But since the index is bitcasted from v2i64 and bitcasts fold, we actually have a bitcast from v2i64 to v4f32 in the passthru part of the DAG.

Taken from optimized output from clang's test case.

llvm-svn: 280695
2016-09-06 05:45:27 +00:00
Saleem Abdulrasool bfa25bd1ac ARM: workaround bundled operation predication
This is a Windows ARM specific issue.  If the code path in the if conversion
ends up using a relocation which will form a IMAGE_REL_ARM_MOV32T, we end up
with a bundle to ensure that the mov.w/mov.t pair is not split up.  This is
normally fine, however, if the branch is also predicated, then we end up trying
to predicate the bundle.

For now, report a bundle as being unpredicatable.  Although this is false, this
would trigger a failure case previously anyways, so this is no worse.  That is,
there should not be any code which would previously have been if converted and
predicated which would not be now.

Under certain circumstances, it may be possible to "predicate the bundle".  This
would require scanning all bundle instructions, and ensure that the bundle
contains only predicatable instructions, and converting the bundle into an IT
block sequence.  If the bundle is larger than the maximal IT block length (4
instructions), it would require materializing multiple IT blocks from the single
bundle.

llvm-svn: 280689
2016-09-06 04:00:12 +00:00
Craig Topper 62d0a5e7d3 [AVX-512] Fix v8i64 shift by immediate lowering on 32-bit targets.
llvm-svn: 280684
2016-09-06 00:31:10 +00:00
Saleem Abdulrasool a6519b1d54 CodeGen: ensure that libcalls are always AAPCS CC
All of the builtins are designed to be invoked with ARM AAPCS CC even on ARM
AAPCS VFP CC hosts.  Tweak the default initialisation to ARM AAPCS CC rather
than C CC for ARM/thumb targets.

The changes to the tests are necessary to ensure that the calling convention for
the lowered library calls are honoured.  Furthermore, these adjustments cause
certain branch invocations to change to branch-and-link since the returned value
needs to be moved across registers (d0 -> r0, r1).

llvm-svn: 280683
2016-09-06 00:28:43 +00:00
Craig Topper dfc4fc9f02 [AVX-512] Teach fastisel load/store handling to use EVEX encoded instructions for 128/256-bit vectors and scalar single/double.
Still need to fix the register classes to allow the extended range of registers.

llvm-svn: 280682
2016-09-05 23:58:40 +00:00
Craig Topper 70e1348031 [X86] Update fast-isel store test to have more 256 and 512-bit test cases. Add command lines for AVX and AVX512 feature sets.
llvm-svn: 280681
2016-09-05 23:58:37 +00:00
Craig Topper f54ebca2dd [X86] Update fast-isel vector load test to have more 256 and 512-bit test cases. Add a command line for SKX features too.
llvm-svn: 280680
2016-09-05 23:58:32 +00:00
Sanjay Patel e341c919b0 fix FileCheck variables for test added with r280677
The script (utils/update_test_checks.py) seems to have problems 
with variable names that start with the same string. 

llvm-svn: 280679
2016-09-05 23:49:32 +00:00
Gor Nishanov ccabaca273 [Coroutines] Part12: Handle alloca address-taken
Summary:
Move early uses of spilled variables after CoroBegin.

For example, if a parameter had address taken, we may end up with the code
like:
        define @f(i32 %n) {
          %n.addr = alloca i32
          store %n, %n.addr
          ...
          call @coro.begin

This patch fixes the problem by moving uses of spilled variables after CoroBegin.

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24234

llvm-svn: 280678
2016-09-05 23:45:45 +00:00
Sanjay Patel eea2ef7862 [InstCombine] don't assert that division-by-constant has been folded (PR30281)
This is effectively a revert of:
https://reviews.llvm.org/rL280115

And this should fix
https://llvm.org/bugs/show_bug.cgi?id=30281:

llvm-svn: 280677
2016-09-05 23:38:22 +00:00
Sanjay Patel 46f9df5b71 [InstCombine] revert r280637 because it causes test failures on an ARM bot
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/14952/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aicmp.ll

llvm-svn: 280676
2016-09-05 22:36:32 +00:00
Simon Pilgrim 6dd4b3526a [X86][SSE] Add test cases for PR29078
'Failure to recognise i64 sitofp/uitofp conversions that can be performed as i32'

llvm-svn: 280671
2016-09-05 18:11:17 +00:00
Simon Pilgrim efab350967 [X86][SSE] Add test cases for PR29079
'Failure to recognise uitofp conversions that can be performed as sitofp'

llvm-svn: 280670
2016-09-05 18:04:38 +00:00
Simon Pilgrim 5f14ff75ec [X86][SSE] Regenerate odd shuffle tests with common prefixes
llvm-svn: 280661
2016-09-05 14:15:38 +00:00
Oliver Stannard ef38d53a7e [SimplifyCFG] Add test for sinking inline asm in if/else
This test code previously caused a failure in the module verifier,
because SimplifyCFG created this invalid instruction, which tries to
take the address of inline asm:
  %.sink = select i1 %1, i64 ()* asm "mov $0, #1", "=r", i64 ()* asm %"mov $0, #2", "=r"

This has been fixed recently, presumably by James Molloy's patches that
re-wrote and changed parts of SimplifyCFG, so this patch just adds a
regression test for it.

Differential Revision: https://reviews.llvm.org/D24231

llvm-svn: 280660
2016-09-05 13:49:26 +00:00
James Molloy 728cf85950 [Thumb1] Add relocations for fixups fixup_arm_thumb_{br,bcc}
These need to be mapped through to R_ARM_THM_JUMP{11,8} respectively.

Fixes PR30279.

llvm-svn: 280651
2016-09-05 08:29:15 +00:00
Igor Breger a2f8ca9a34 [AVX512] Fix v8i1 /v16i1 zext + bitcast lowering pattern. Explicitly zero upper bits.
Differential Revision: http://reviews.llvm.org/D23983

llvm-svn: 280650
2016-09-05 08:26:51 +00:00
Craig Topper d9ca3d97ef [AVX-512] Simplify X86InstrInfo::copyPhysReg for 128/256-bit vectors with AVX512, but not VLX. We should use the VEX opcodes and trust the register allocator to not use the extended XMM/YMM register space.
Previously we were extending to copying the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration as the instructions to spill them aren't available.

llvm-svn: 280648
2016-09-05 06:43:06 +00:00
Gor Nishanov 0e18f75a92 [Coroutines] Part11: Add final suspend handling.
Summary:
A frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties:

* it is possible to check whether a suspended coroutine is at the final suspend point via coro.done intrinsic;
* a resumption of a coroutine stopped at the final suspend point leads to undefined behavior. The only possible action for a coroutine at a final suspend point is destroying it via coro.destroy intrinsic.

This patch adds final suspend handling logic to CoroEarly and CoroSplit passes.
Now, the final suspend point example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex5.ll).

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24068

llvm-svn: 280646
2016-09-05 04:44:30 +00:00
Craig Topper 42b69dd961 [X86] Add AVX and AVX512 command lines to the vec_ss_load_fold test.
llvm-svn: 280645
2016-09-05 02:20:53 +00:00
Sanjay Patel c641e9d6ff [InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors
The code to calculate 'UsesRemoved' could be simplified.
As-is, that code is a victim of PR30273:
https://llvm.org/bugs/show_bug.cgi?id=30273

llvm-svn: 280637
2016-09-04 20:58:27 +00:00
Simon Pilgrim cded03f163 [X86] Regenerate x64 mmx/f64 return value tests
llvm-svn: 280634
2016-09-04 18:14:45 +00:00
Craig Topper 4177345d7f [AVX-512] Remove 128-bit and 256-bit masked floating point add/sub/mul/div intrinsics and upgrade to native IR.
llvm-svn: 280633
2016-09-04 18:13:33 +00:00
Lang Hames 38c7927b6f [ORC] Clone module flags metadata into the globals module in the
CompileOnDemandLayer.

Also contains a tweak to the orc-lazy jit in LLI to enable the test case.

llvm-svn: 280632
2016-09-04 17:53:30 +00:00
Simon Pilgrim 128047fde5 [X86] Regenerate trunc-store legalization test
llvm-svn: 280631
2016-09-04 17:50:03 +00:00
Simon Pilgrim c228f5c195 [X86][SSE] Regenerate fcmp/uitofp combine tests
llvm-svn: 280629
2016-09-04 17:16:01 +00:00
Igor Breger 7e2a0dfa0c revert r279960.
https://llvm.org/bugs/show_bug.cgi?id=30249

llvm-svn: 280625
2016-09-04 14:03:52 +00:00
Simon Pilgrim 9a36318c54 EOL fixes
llvm-svn: 280624
2016-09-04 13:30:46 +00:00
Dorit Nuzman abd15f69b2 [InstCombine] Preserve llvm.mem.parallel_loop_access metadata when replacing
memcpy with ld/st.

When InstCombine replaces a memcpy with loads+stores it does not copy over the
llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes
that.

Differential Revision: https://reviews.llvm.org/D23499

llvm-svn: 280617
2016-09-04 07:49:39 +00:00