Commit Graph

29112 Commits

Author SHA1 Message Date
Chandler Carruth afe4b2507e [x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening
vector types to be legal and a ZERO_EXTEND node is encountered.

When we use widening to legalize vector types, extend nodes are a real
challenge. Either the input or output is likely to be legal, but in many
cases not both. As a consequence, we don't really have any way to
represent this situation and the prior code in the widening legalization
framework would just scalarize the extend operation completely.

This patch introduces a new DAG node to represent doing a zero extend of
a vector "in register". The core of the idea is to allow legal but
different vector types in the input and output. The output vector must
have fewer lanes but wider elements. The operation is defined to zero
extend the low elements of the input to the size of the output elements,
and drop all of the high elements which don't have a corresponding lane
in the output vector.

It also includes generic expansion of this node in terms of blending
a zero vector into the high elements of the vector and bitcasting
across. This in turn yields extremely nice code for x86 SSE2 when we use
the new widening legalization logic in conjunction with the new shuffle
lowering logic.

There is still more to do here. We need to support sign extension, any
extension, and potentially int-to-float conversions. My current plan is
to continue using similar synthetic nodes to model each of these
transitions with generic lowering code for each one.

However, with this patch LLVM already reaches performance parity with
GCC for the core C loops of the x264 code (assuming you disable the
hand-written assembly versions) when compiling for SSE2 and SSE3
architectures and enabling the new widening and lowering logic for
vectors.

Differential Revision: http://reviews.llvm.org/D4405

llvm-svn: 212610
2014-07-09 10:58:18 +00:00
Daniel Sanders e31155fd1a [mips][mips64r6] Correct select patterns that have the condition or true/false values backwards
Summary: This bug caused SingleSource/Regression/C/uint64_to_float and SingleSource/UnitTests/2002-05-02-CastTest3 to fail (among others).

Differential Revision: http://reviews.llvm.org/D4388

llvm-svn: 212608
2014-07-09 10:47:26 +00:00
Daniel Sanders dc06718e0b [mips][mips64r6] Correct cond names in the cmp.cond.[ds] instructions
Summary:
It seems we accidentally read the wrong column of the table MIPS64r6 spec
and used the names for c.cond.fmt instead of cmp.cond.fmt.

Differential Revision: http://reviews.llvm.org/D4387

llvm-svn: 212607
2014-07-09 10:40:20 +00:00
Chandler Carruth ef5dcf571e [x86] Initialize a pointer to null to fix a bug in r212602.
This should restore GCC hosts (which happen to put the bad stuff into
the pointer) and MSan, etc.

llvm-svn: 212606
2014-07-09 10:36:42 +00:00
Daniel Sanders f5a5fbd3f4 [mips][mips64r6] Use JALR for indirect branches instead of JR (which is not available on MIPS32r6/MIPS64r6)
Summary:
This completes the change to use JALR instead of JR on MIPS32r6/MIPS64r6.

Reviewers: jkolek, vmedic, zoran.jovanovic, dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4269

llvm-svn: 212605
2014-07-09 10:21:59 +00:00
Daniel Sanders 338513b3fa [mips][mips64r6] Use JALR for returns instead of JR (which is not available on MIPS32r6/MIPS64r6)
Summary:
RET, and RET_MM have been replaced by a pseudo named PseudoReturn.
In addition a version with a 64-bit GPR named PseudoReturn64 has been
added.

Instruction selection for a return matches RetRA, which is expanded post
register allocation to PseudoReturn/PseudoReturn64. During MipsAsmPrinter,
this PseudoReturn/PseudoReturn64 are emitted as:
- (JALR64 $zero, $rs) on MIPS64r6
- (JALR $zero, $rs) on MIPS32r6
- (JR_MM $rs) on microMIPS
- (JR $rs) otherwise

On MIPS32r6/MIPS64r6, 'jr $rs' is an alias for 'jalr $zero, $rs'. To aid
development and review (specifically, to ensure all cases of jr are
updated), these aliases are temporarily named 'r6.jr' instead of 'jr'.
A follow up patch will change them back to the correct mnemonic.

Added (JALR $zero, $rs) to MipsNaClELFStreamer's definition of an indirect
jump, and removed it from its definition of a call.
Note: I haven't accounted for MIPS64 in MipsNaClELFStreamer since it's
doesn't appear to account for any MIPS64-specifics.

The return instruction created as part of eh_return expansion is now expanded
using expandRetRA() so we use the right return instruction on MIPS32r6/MIPS64r6
('jalr $zero, $rs').

Also, fixed a misuse of isABI_N64() to detect 64-bit wide registers in
expandEhReturn().

Reviewers: jkolek, vmedic, mseaborn, zoran.jovanovic, dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4268

llvm-svn: 212604
2014-07-09 10:16:07 +00:00
Chandler Carruth 2ebc942683 [x86] Re-apply a variant of the x86 side of r212324 now that the rest
has settled without incident, removing the x86-specific and overly
strict 'isVectorSplat' routine in favor of generic and more powerful
splat detection.

The primary motivation and result of this is that the x86 backend can
now see through splats which contain undef elements. This is essential
if we are using a widening form of legalization and I've updated a test
case to also run in that mode as before this change the generated code
for the test case was completely scalarized.

This version of the patch much more carefully handles the undef lanes.
- We aren't overly conservative about them in the shift lowering
  (where we will never use the splat itself).
- One place where the splat would have been re-used by the existing code
  now explicitly constructs a new constant splat that will be safe.
- The broadcast lowering is much more reasonable with undefs by doing
  a correct check of whether the splat is the only user of a loaded
  value, checking that the splat actually crosses multiple lanes before
  using a broadcast, and handling broadcasts of non-constant splats.

As a consequence of the last bullet, the weird usage of vpshufd instead
of vbroadcast is gone, and we actually can lower an AVX splat with
vbroadcastss where before we emitted a really strange pattern of
a vector load and a manual splat across the vector.

llvm-svn: 212602
2014-07-09 10:06:58 +00:00
NAKAMURA Takumi 843c4cb401 MipsTargetStreamer.h: Avoid "using" to appease msc17.
llvm-svn: 212577
2014-07-08 23:48:22 +00:00
Jim Grosbach 04691a530d AArch64: Better codegen for loading from __fp16.
Loading will generally extend to an f32 or an 64, so make sure
to match those patterns directly to load into the FPR16 register
class directly rather than going through the integer GPRs.

This also eliminates an extra step in the convert-to-f64 path
which was first converting to f32 and then to f64 from there.

rdar://17594379

llvm-svn: 212573
2014-07-08 23:28:48 +00:00
Ulrich Weigand 862d8b8d06 [PowerPC] Implement atomic NAND operations as actual NAND
This changes the implementation of atomic NAND operations
from "a & ~b" (compatible with GCC < 4.4) to actual "~(a & b)"
(compatible with GCC >= 4.4).

This is in line with the common-code and ARM back-end change
implemented in r212433.

llvm-svn: 212547
2014-07-08 16:16:02 +00:00
Daniel Sanders 324ad956e0 [mips] Fixed struct/class mismatch introduced in r212522.
Clang emits a warning about this.

llvm-svn: 212528
2014-07-08 13:13:42 +00:00
Daniel Sanders 7201a3e3bb Fix r212522 - [mips] Improve encapsulation of the .MIPS.abiflags implementation and limit scope of related enums
Added two lines that should have been in r212522.

llvm-svn: 212523
2014-07-08 10:35:52 +00:00
Daniel Sanders c7dbc630e5 [mips] Improve encapsulation of the .MIPS.abiflags implementation and limit scope of related enums
Summary:
Follow on to r212519 to improve the encapsulation and limit the scope of the enums.

Also merged two very similar parser functions, fixed a bug where ASE's
were not being reported, and marked CPR1's as being 128-bit when MSA is
enabled.

Differential Revision: http://reviews.llvm.org/D4384

llvm-svn: 212522
2014-07-08 10:11:38 +00:00
Renato Golin b8a86c43c0 Revert "Refactor ARM subarchitecture parsing"
This reverts commit 7b4a6882467e7fef4516a0cbc418cbfce0fc6f6d.

llvm-svn: 212521
2014-07-08 10:06:16 +00:00
Arnaud A. de Grandmaison d7827606de Truncate the immediate in logical operation to the register width
And continue to produce an error if the 32 most significant bits are not all ones or zeros.

llvm-svn: 212520
2014-07-08 09:53:04 +00:00
Vladimir Medic fb8a2a95cd Mips.abiflags is a new implicitly generated section that will be present on all new modules. The section contains a versioned data structure which represents essentially information to allow a program loader to determine the requirements of the application. This patch implements mips.abiflags section and provides test cases for it.
llvm-svn: 212519
2014-07-08 08:59:22 +00:00
Chandler Carruth 142e966261 [x86,SDAG] Sink the logic for folding shuffles of splats more
aggressively from the x86 shuffle lowering to the generic SDAG vector
shuffle formation code.

This code already tried to fold away shuffles of splats! It just had
lots of bugs and couldn't handle the case my new x86 shuffle lowering
needed.

First, it failed to correctly compute whether N2 was undef because it
pre-computed this, then did transformations which could *make* N2 undef,
then failed to ever re-consider the precomputed state.

Second, it didn't look through bitcasts at all, even in the safe cases
where they are just element-type bitcasts with no change to the number
of elements.

Third, it didn't handle all-zero bit casts nicely the way my code in the
x86 side of things did, which is essential to getting good zext-shuffle
lowerings.

But all of these are generic. I just ported the code down to this layer
and fixed the surrounding bugs. Tests exercising this in the x86 backend
still pass and some silly code in widen_cast-6.ll gets better. I updated
that test to be a bit more precise but it's still pretty unclear what
the value of the test is in this day and age.

llvm-svn: 212517
2014-07-08 08:45:38 +00:00
Adam Nemet 79580db918 [X86] AVX512: Only allow k1-k7 as predicates to vpcmp*
As destination k0 is allowed but not as predicate/writemask.

I also modified the test to allow checking of error messages by the assembler.
I applied a similar approach to the test ret.s in the same directory.

llvm-svn: 212504
2014-07-08 00:22:32 +00:00
Andrea Di Biagio 2620b877b6 [x86] Fix assertion failure caused by a wrong combine of PSHUFD nodes with different types.
When combining a sequence of two PSHUFD dag nodes into a single PSHUFD,
make sure that we assign the correct type to the resulting PSHUFD.
X86ISD::PSHUFD dag nodes can be either MVT::v4i32 or MVT::v4f32.

Before this change, an assertion failure was triggered in method
'DAGCombinerInfo::CombineTo' when trying to combine the shuffles from the test
below into a single PSHUFD.

define <4 x float> @test1(<4 x float> %V) {
  %1 = shufflevector <4 x float> %V, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
  %2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
  ret <4 x float> %2
}

llvm-svn: 212498
2014-07-07 23:25:23 +00:00
Juergen Ributzka 665ea71fcd [FastISel][X86] Fix smul.with.overflow.i8 lowering.
Add custom lowering code for signed multiply instruction selection, because the
default FastISel instruction selection for ISD::MUL will use unsigned multiply
for the i8 type and signed multiply for all other types. This would set the
incorrect flags for the overflow check.

This fixes <rdar://problem/17549300>

llvm-svn: 212493
2014-07-07 21:52:21 +00:00
Louis Gerbarg 4c5b4054b2 Allow AArch64FastISel to degrade graceully in the presence of an MVT::i128
Currently AArch64FastISel crashes if it tries to extend an integer into an
MVT::i128. This can happen by creating 128 bit integers like so:

  typedef unsigned int uint128_t __attribute__((mode(TI)));
  typedef int sint128_t __attribute__((mode(TI)));

This patch makes EmitIntExt check for their presence and then falls back to
SelectionDAG.

Tests included.

rdar://17516686

llvm-svn: 212492
2014-07-07 21:37:51 +00:00
Renato Golin 1e9c282cd1 Refactor ARM subarchitecture parsing
According to a FIXME in ARMMCTargetDesc.cpp the ARM version parsing should be
in the Triple helper class.

Patch by: Gabor Ballabas

llvm-svn: 212479
2014-07-07 20:01:11 +00:00
Ulrich Weigand de8641bfde [PowerPC] Fix no-assert build
r212476 caused a compile failure (unused variable) in a non-assertion
build ...

llvm-svn: 212477
2014-07-07 19:39:44 +00:00
Ulrich Weigand ec2bf93895 [PowerPC] Fix "byval align" arguments
Arguments passed as "byval align" should get the specified alignment
in the parameter save area.  There was some code in PPCISelLowering.cpp
that attempted to implement this, but this didn't work correctly:
while code did update the ArgOffset value, it neglected to update
the PtrOff value (which was already computed from the old ArgOffset),
and it also neglected to update GPR_idx -- fields skipped due to
alignment in the save area must likewise be skipped in GPRs.

This patch fixes and simplifies this logic by:
- handling argument offset alignment right at the beginning
  of argument processing, using a new helper routine
  CalculateStackSlotAlignment (this avoids having to update
  PtrOff and other derived values later on)
- not tracking GPR_idx separately, but always computing the
  correct GPR_idx for each argument *from* its ArgOffset
- removing some redundant computation in LowerFormalArguments:
  MinReservedArea must equal ArgOffset after argument processing,
  so there's no use in computing it twice.

[This doesn't change the behavior of the current clang front-end,
since that never creates "byval align" arguments at the moment.
This will change with a follow-on patch, however.]

llvm-svn: 212476
2014-07-07 19:26:41 +00:00
Chandler Carruth beeacac0b3 [x86] Revert r212324 which was too aggressive w.r.t. allowing undef
lanes in vector splats.

The core problem here is that undef lanes can't *unilaterally* be
considered to contribute to splats. Their handling needs to be more
cautious. There is also a reported failure of the nightly testers
(thanks Tobias!) that may well stem from the same core issue. I'm going
to fix this theoretical issue, factor the APIs a bit better, and then
verify that I don't see anything bad with Tobias's reduction from the
test suite before recommitting.

Original commit message for r212324:
  [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
  any constant, constant FP, or undef splat and to tolerate any undef
  lanes in a splat, then replace all uses of isSplatVector in X86's
  lowering with it.

  This fixes issues where undef lanes in an otherwise splat vector would
  prevent the splat logic from firing. It is a touch more awkward to use
  this interface, but it is much more accurate. Suggestions for better
  interface structuring welcome.

  With this fix, the code generated with the widening legalization
  strategy for widen_cast-4.ll is *dramatically* improved as the special
  lowering strategies for a v16i8 SRA kick in even though the high lanes
  are undef.

  We also get a slightly different choice for broadcasting an aligned
  memory location, and use vpshufd instead of vbroadcastss. This looks
  like a minor win for pipelining and domain crossing, but a minor loss
  for the number of micro-ops. I suspect its a wash, but folks can
  easily tweak the lowering if they want.

llvm-svn: 212475
2014-07-07 19:03:32 +00:00
Matt Arsenault d2c9e08b63 R600: Fix mishandling of load / store chains.
Fixes various bugs with reordering loads and stores.
Scalarized vector loads weren't collecting the chains
at all.

llvm-svn: 212473
2014-07-07 18:34:45 +00:00
Matt Arsenault fda9dad17f Fix typo, weird indentation
llvm-svn: 212472
2014-07-07 18:34:42 +00:00
Tim Northover 3705283b24 X86: revert unintentional change to X86FastISel.
This crept in with r212443.

llvm-svn: 212459
2014-07-07 14:06:42 +00:00
Evgeniy Stepanov 6fa6c677cc [asan] Generate asm instrumentation in MC.
Generate entire ASan asm instrumentation in MC without
relying on runtime helper functions.

Patch by Yuri Gorshenin.

llvm-svn: 212455
2014-07-07 13:57:37 +00:00
Chandler Carruth 0dcb366268 [x86] Teach the new vector shuffle lowering code to handle what is
essentially a DAG combine that never gets a chance to run.

We might typically expect DAG combining to remove shuffles-of-splats and
other similar patterns, but we don't get a chance to run the DAG
combiner when we recursively form sub-shuffles during the lowering of
a shuffle. So instead hand-roll a really important combine directly into
the lowering code to detect shuffles-of-splats, especially shuffles of
an all-zero splat which needn't even have the same element width, etc.

This lets the new vector shuffle lowering handle shuffles which
implement things like zero-extension really nicely. This will become
even more important when I wire the legalization of zero-extension to
vector shuffles with the new widening legalization strategy.

llvm-svn: 212444
2014-07-07 09:06:58 +00:00
Tim Northover 55beb64bd0 CodeGen: it turns out that NAND is not the same thing as BIC. At all.
We've been performing the wrong operation on ARM for "atomicrmw nand" for
years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b".
This bled over into the generic expansion pass.

So I assume no-one has ever actually tried to do an atomic nand in the real
world. Oh well.

llvm-svn: 212443
2014-07-07 09:06:35 +00:00
Saleem Abdulrasool 763f9a50a5 ARM: properly lower dllimport'ed global values
This completes the handling for DLL import storage symbols when lowering
instructions.  A DLL import storage symbol must have an additional load
performed prior to use.  This is applicable to variables and functions.

This is particularly important for non-function symbols as it is possible to
handle function references by emitting a thunk which performs the translation
from the unprefixed __imp_ symbol to the proper symbol (although, this is a
non-optimal lowering).  For a variable symbol, no such thunk can be
accommodated.

llvm-svn: 212431
2014-07-07 05:18:35 +00:00
Saleem Abdulrasool 220a044888 ARM: correctly mangle dllimport symbols
Add support for tracking DLLImport storage class information on a per symbol
basis in the ARM instruction selection.  Use that information to correctly
mangle the symbol (dllimport symbols are referenced via *__imp_<name>).

llvm-svn: 212430
2014-07-07 05:18:30 +00:00
Saleem Abdulrasool 1eb4a28b44 ARM: unify symbol name retrieval
Ensure that all paths that retrieve the symbol name go through GetARMGVSymbol
rather than getSymbol.  This is desirable so that any global symbol mangling can
be centralised to this function.  The motivation for this is handling of symbols
that are marked as having dll import dll storage.  Such a symbol requires an
extra load that is currently handled in the backend and a __imp_ prefix on the
symbol name.

llvm-svn: 212429
2014-07-07 05:18:22 +00:00
Kevin Qin 4473c1943f [AArch64] Normalize all constants to build a vector.
The value of constant operands will be truncated to fit element width.

llvm-svn: 212428
2014-07-07 02:45:40 +00:00
Saleem Abdulrasool 97255a017b AArch64: whitespace cleanup
llvm-svn: 212420
2014-07-06 22:13:26 +00:00
Matt Arsenault 4261973548 Use cast<> instead of dyn_cast + assert
llvm-svn: 212380
2014-07-05 21:16:43 +00:00
Matt Arsenault 258c6e7cd9 Fix grammar
llvm-svn: 212379
2014-07-05 21:16:40 +00:00
Ehsan Akhgari 4103da6bfb Add support for parsing the not operator in Microsoft inline assembly
This fixes http://llvm.org/PR20202

llvm-svn: 212352
2014-07-04 19:13:05 +00:00
Daniel Sanders 950f48d3c7 [mips][mips64r6] Set ELF e_flags for MIPS32r6/MIPS64r6. Also do MIPS-I to MIPS-V
Differential Revision: http://reviews.llvm.org/D4386

llvm-svn: 212346
2014-07-04 15:21:53 +00:00
Tim Northover 1bc367a41b ARM: when falling back to scattered relocs, keep the type.
The linker relies on relocation type info (e.g. is it a branch?) to perform the
correct actions, so we should keep that even when we end up using a scattered
relocation for whatever reason.

rdar://problem/17553104

llvm-svn: 212333
2014-07-04 10:58:05 +00:00
Daniel Sanders 2e03d66453 [mips][mips64r6] Correct the encoding of dmuh, dmuhu, dmul, and dmulu.
We have detected a documentation bug in the encoding tables of the released
MIPS64r6 specification that has resulted in the wrong encodings being used for
these instructions in LLVM. This commit corrects them.

llvm-svn: 212330
2014-07-04 10:08:27 +00:00
Chandler Carruth 5d79bb5d32 [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
any constant, constant FP, or undef splat and to tolerate any undef
lanes in a splat, then replace all uses of isSplatVector in X86's
lowering with it.

This fixes issues where undef lanes in an otherwise splat vector would
prevent the splat logic from firing. It is a touch more awkward to use
this interface, but it is much more accurate. Suggestions for better
interface structuring welcome.

With this fix, the code generated with the widening legalization
strategy for widen_cast-4.ll is *dramatically* improved as the special
lowering strategies for a v16i8 SRA kick in even though the high lanes
are undef.

We also get a slightly different choice for broadcasting an aligned
memory location, and use vpshufd instead of vbroadcastss. This looks
like a minor win for pipelining and domain crossing, but a minor loss
for the number of micro-ops. I suspect its a wash, but folks can easily
tweak the lowering if they want.

llvm-svn: 212324
2014-07-04 08:11:49 +00:00
Alexey Volkov 302309f39f [X86] Limit maximum nop length on Silvermont
Silvermont can only decode one instruction per cycle if the instruction exceeds 8 bytes.
Also in Silvermont instructions with more than 3 prefixes will cause 3 cycle penalty.
Maximum nop length is limited to 7 bytes when used for padding on Silvermont.
For other x86 processors max nop length remains unchanged 15 bytes.

Differential Revision: http://reviews.llvm.org/D4374

llvm-svn: 212321
2014-07-04 07:14:56 +00:00
Robert Lytton 37d3fa7e36 XCore target: remove incorrect DebugLoc entries from prologue
Summary: This was causing the prologue_end to be incorrectly positioned.

Differential Revision: http://reviews.llvm.org/D4122

llvm-svn: 212318
2014-07-04 06:38:22 +00:00
Eric Christopher c1058df66f Move function dependent resetting of a subtarget variable out of the
subtarget. This involved having the movt predicate take the current
function - since we care about size in instruction selection for
whether or not to use movw/movt take the function so we can check
the attributes. This required adding the current MachineFunction to
FastISel and propagating through.

llvm-svn: 212309
2014-07-04 01:55:26 +00:00
Chandler Carruth 19cff8205e [x86] Clarify that this lowering only applies to vectors and is only
used when we have SSE2.

llvm-svn: 212300
2014-07-03 22:57:44 +00:00
Eric Christopher 2f991c9ee1 Remove caching of the target machine and initialization of the
subtarget from ARMISelDAGtoDAG. The former is unnecessary and the
latter is initialized on each runOnMachineFunction.

llvm-svn: 212297
2014-07-03 22:24:49 +00:00
Andrea Di Biagio c8e8bda58f [CostModel][x86] Improved cost model for alternate shuffles.
This patch:
 1) Improves the cost model for x86 alternate shuffles (originally
added at revision 211339);
 2) Teaches the Cost Model Analysis pass how to analyze alternate shuffles.

Alternate shuffles are a special kind of blend; on x86, we can often
easily lowered alternate shuffled into single blend
instruction (depending on the subtarget features).

The existing cost model didn't take into account subtarget features.
Also, it had a couple of "dead" entries for vector types that are never
legal (example: on x86 types v2i32 and v2f32 are not legal; those are
always either promoted or widened to 128-bit vector types).

The new x86 cost model takes into account what target features we have
before returning the shuffle cost (i.e. the number of instructions
after the blend is lowered/expanded).

This patch also teaches the Cost Model Analysis how to identify and analyze
alternate shuffles (i.e. 'SK_Alternate' shufflevector instructions):
 - added function 'isAlternateVectorMask';
 - added some logic to check if an instruction is a alternate shuffle and, in
   case, call the target specific TTI to get the corresponding shuffle cost;
 - added a test to verify the cost model analysis on alternate shuffles.

llvm-svn: 212296
2014-07-03 22:24:18 +00:00
Andrea Di Biagio a37a2fc81f [X86] Add ISel patterns to select 'f32_to_f16' and 'f16_to_f32' dag nodes.
This patch adds tablegen patterns to select F16C float-to-half-float
conversion instructions from 'f32_to_f16' and 'f16_to_f32' dag nodes.

If the target doesn't have F16C, then 'f32_to_f16' and 'f16_to_f32'
are expanded into library calls.

llvm-svn: 212293
2014-07-03 21:51:06 +00:00
Yi Kong 93e52da641 [ARM] Implement ISB memory barrier intrinsic
Adds support for __builtin_arm_isb. Also corrects DSB and ISB instructions
modelling by adding has-side-effects property.

llvm-svn: 212276
2014-07-03 16:00:41 +00:00
Chandler Carruth 739b6ada99 [x86] Fix crashes in lowering bitcast instructions with the widening
mode.

This also runs the test in that mode which would reproduce the crash.
What I love is that *every single FIXME* in the test is addressed by
switching to widening.

llvm-svn: 212254
2014-07-03 03:43:47 +00:00
Chandler Carruth 49a8b10d82 [x86] Based on a long conversation between myself, Jim Grosbach, Hal
Finkel, Eric Christopher, and a bunch of other people I'm probably
forgetting (sorry), add an option to the x86 backend to widen vectors
during type legalization rather than promote them.

This still would promote vNi1 vectors to get the masks right, but would
widen other vectors. A lot of experiments are piling up right now
showing that widening should probably be the default legalization
strategy outside of vNi1 cases, but it is very hard to test the
rammifications of that and fix bugs in widening-based legalization
without an option that enables it. I'll be checking in tests shortly
that use this option to exercise cases where widening doesn't work well
and hopefully we'll be able to switch fully to this soon.

llvm-svn: 212249
2014-07-03 02:11:29 +00:00
Eric Christopher f204208e4f Make these preprocessor directives match all of the others in the port.
llvm-svn: 212245
2014-07-03 00:44:31 +00:00
Eric Christopher ad4de684ea Remove dead code.
llvm-svn: 212244
2014-07-03 00:44:28 +00:00
Chandler Carruth 9d010fffe1 [codegen,aarch64] Add a target hook to the code generator to control
vector type legalization strategies in a more fine grained manner, and
change the legalization of several v1iN types and v1f32 to be widening
rather than scalarization on AArch64.

This fixes an assertion failure caused by scalarizing nodes like "v1i32
trunc v1i64". As v1i64 is legal it will fail to scalarize v1i32.

This also provides a foundation for other targets to have more granular
control over how vector types are legalized.

Patch by Hao Liu, reviewed by Tim Northover. I'm committing it to allow
some work to start taking place on top of this patch as it adds some
really important hooks to the backend that I'd like to immediately start
using. =]

http://reviews.llvm.org/D4322

llvm-svn: 212242
2014-07-03 00:23:43 +00:00
Eric Christopher daa9dbbbd5 Move subtarget dependent features into the subtarget from the target
machine. Includes a fix for a subtarget initialization for
hard floating point on mips16.

llvm-svn: 212240
2014-07-03 00:10:24 +00:00
Eric Christopher 4cdb3f9b6a So that we can include frame lowering in the subtarget, remove include
circular dependency with the subtarget by inlining accessor methods and
outlining a routine.

llvm-svn: 212236
2014-07-02 23:29:55 +00:00
Eric Christopher bf33a3cf70 So that we can include target lowering in the subtarget, remove include
circular dependency with the subtarget by inlining accessor methods and
outlining a routine.

llvm-svn: 212234
2014-07-02 23:18:40 +00:00
Eric Christopher 0eaa541ea5 Fix typos.
llvm-svn: 212228
2014-07-02 22:05:40 +00:00
Eric Christopher 5f9fd210b3 Move the data layout and selection dag info from the mips target machine
down to the subtarget.

llvm-svn: 212224
2014-07-02 21:29:23 +00:00
Adam Nemet 11dd5cf9f1 [X86] AVX512: Allow writemask argument in vpermt* intrinsics
llvm-svn: 212223
2014-07-02 21:26:01 +00:00
Adam Nemet efe9c98a16 [X86] AVX512: Generate Pat<>'s for the vpermt2* intrinsics via multiclass
This new multiclass, avx512_perm_table_3src derives from the current one and
provides the Pat<>.  The next patch will add another Pat<> that uses the
writemask.

Note that I dropped the type annotation from the intrinsic call, i.e.: (v16f32
VR512:$src1) -> R512:$src1.  I think that this should be fine (at least many
intrinsic calls don't provide them) and it greatly reduces the number of
template arguments.

llvm-svn: 212222
2014-07-02 21:25:58 +00:00
Adam Nemet 2415a497b5 [X86] AVX512: Add writemask variants for vperm*2*
This includes assembler and codegen support (see the new tests in
avx512-encodings.s and avx512-shuffle.ll).

<rdar://problem/17492620>

llvm-svn: 212221
2014-07-02 21:25:54 +00:00
Tom Stellard e9219e0026 R600: Add a comment that llvm.AMDGPU.trunc is a legacy intrinsic
llvm-svn: 212218
2014-07-02 20:53:57 +00:00
Tom Stellard 7c1838d797 R600/SI: Use a ComplexPattern for ADDR64 addressing of MUBUF loads
llvm-svn: 212217
2014-07-02 20:53:56 +00:00
Tom Stellard 10ae6a0e6a R600: Promote i64 loads to v2i32
llvm-svn: 212216
2014-07-02 20:53:54 +00:00
Tom Stellard b2de94e0c6 R600/SI: Adjsut SGPR live ranges before register allocation
SGPRs are written by instructions that sometimes will ignore control flow,
which means if you have code like:

if (VGPR0) {
  SGPR0 = S_MOV_B32 0
} else {
  SGPR0 = S_MOV_B32 1
}

The value of SGPR0 will 1 no matter what the condition is.

In order to deal with this situation correctly, we need to view the
program as if it were a single basic block when we calculate the
live ranges for the SGPRs.  They way we actually update the live
range is by iterating over all of the segments in each LiveRange
object and setting the end of each segment equal to the start of
the next segment.  So a live range like:

[3888r,9312r:0)[10032B,10384B:0)  0@3888r

will become:

[3888r,10032B:0)[10032B,10384B:0)  0@3888r

This change will allow us to use SALU instructions within branches.

llvm-svn: 212215
2014-07-02 20:53:48 +00:00
Tom Stellard a305f93d81 R600/SI: Add verifier check for immediates in register operands.
llvm-svn: 212214
2014-07-02 20:53:44 +00:00
Quentin Colombet 5caa6a2da1 [RegAllocGreedy] Provide a subtarget hook to disable the local reassignment
heuristic.
By default, no functionality change.
This is a follow-up of r212099.

This hook provides a finer grain to control the optimization.

<rdar://problem/17444599>

llvm-svn: 212204
2014-07-02 18:32:04 +00:00
Duncan P. N. Exon Smith de58870394 AArch64: Re-enable AArch64AddressTypePromotion
This reverts commits r212189 and r212190.

While this pass was accidentally disabled (until r212073), r205437
slipped in a use of `auto` that should have been `auto&`.

This fixes PR20188.

llvm-svn: 212201
2014-07-02 18:17:40 +00:00
Duncan P. N. Exon Smith 0945abc142 AArch64: Remove unnecessary parens
llvm-svn: 212199
2014-07-02 18:14:03 +00:00
Matt Arsenault c324b95c77 R600: Fix crashes when an illegal type load or store is not handled.
I don't think anything hits this now, but will be exposed in future
patches.

llvm-svn: 212197
2014-07-02 17:44:53 +00:00
Duncan P. N. Exon Smith c4db656221 AArch64: Merge isa with dyn_cast
llvm-svn: 212194
2014-07-02 17:26:39 +00:00
Duncan P. N. Exon Smith 6d1fc66e9b AArch64: Temporarily disable AArch64AddressTypePromotion
Temporarily disable AArch64AddressTypePromotion, which was effectively
re-enabled in r212073 and r212075, while I look into PR20188.

llvm-svn: 212189
2014-07-02 17:03:16 +00:00
Benjamin Kramer e739cf3eb5 X86: When combining shuffles just remove shuffles that are completely redundant.
CombineTo doesn't allow replacing a node with itself so this would crash if the
combined shuffle is the same as the input shuffle.

llvm-svn: 212181
2014-07-02 15:09:44 +00:00
Elena Demikhovsky 678bd5ba4a AVX-512: dec/inc instructions are slow on KNL
After Alexey Volkov, I'm adding the same property for KNL, that prefers ADD/SUB instead of INC/DEC.
Added a test.

llvm-svn: 212178
2014-07-02 14:11:05 +00:00
Saleem Abdulrasool 2e09c514c0 aarch64: support target-specific .req assembler directive
Based on the support for .req on ARM. The aarch64 variant has to keep track if
the alias register was a vector register (v0-31) or a general purpose or
VFP/Advanced SIMD ([bhsdq]0-31) register.

Patch by Janne Grunau!

llvm-svn: 212161
2014-07-02 04:50:23 +00:00
Eric Christopher 5b336a242c Break out subtarget initialization that dependent variables need into
a separate function and clean up calling convention for helper function.

llvm-svn: 212153
2014-07-02 01:14:43 +00:00
Eric Christopher a4d901f599 Unify these two lines.
llvm-svn: 212152
2014-07-02 01:02:28 +00:00
Eric Christopher 1f51ddda98 Move MipsJITInfo to the subtarget rather than the target machine.
llvm-svn: 212151
2014-07-02 00:54:12 +00:00
Eric Christopher 404c94c0fc Remove unnecessary include.
llvm-svn: 212150
2014-07-02 00:54:10 +00:00
Eric Christopher 4407ddefd0 Remove the cached InstrItineraryData on the TargetMachine, it's unnecessary.
llvm-svn: 212149
2014-07-02 00:54:07 +00:00
Eric Christopher 1b1cef2323 Move the subtarget dependent features from XCoreTargetMachine
down to the subtarget.

llvm-svn: 212147
2014-07-02 00:10:09 +00:00
Eric Christopher 99a5ba8b20 Make XCoreSelectionDAGInfo take a DataLayout since it only needs
that information.

llvm-svn: 212146
2014-07-02 00:10:05 +00:00
Tim Northover 334d8eebe5 X86: remove atomic instructions *after* we've iterated through them.
Otherwise they get freed and the implicit "isa<XYZ>" tests following
turn out badly (at least under sanitizers).

Also corrects the ordering of unordered atomic stores.

llvm-svn: 212136
2014-07-01 22:10:30 +00:00
Juergen Ributzka 3bd03c7099 [DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI.
The argument list vector is never used after it has been passed to the
CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and
pretty much as cheap as keeping a pointer to it.

llvm-svn: 212135
2014-07-01 22:01:54 +00:00
Tim Northover df58625e3c X86: delegate expanding atomic libcalls to generic code.
On targets without cmpxchg16b or cmpxchg8b, the borderline atomic
operations were slipping through the gaps.

X86AtomicExpand.cpp was delegating to ISelLowering. Generic
ISelLowering was delegating to X86ISelLowering and X86ISelLowering was
asserting. The correct behaviour is to expand to a libcall, preferably
in generic ISelLowering.

This can be achieved by X86ISelLowering deciding it doesn't want the
faff after all.

llvm-svn: 212134
2014-07-01 21:44:59 +00:00
Eric Christopher 5234995e80 Move the subtarget dependent features from SystemZTargetMachine
down to the subtarget. Add an initialization routine to assist.

llvm-svn: 212124
2014-07-01 20:19:02 +00:00
Eric Christopher f1bd22dfa4 Remove the use and initialization of the target machine and subtarget
from SystemZFrameLowering.

llvm-svn: 212123
2014-07-01 20:18:59 +00:00
Tim Northover 21feb2e1d2 AArch64: fix comment typo
llvm-svn: 212120
2014-07-01 19:47:09 +00:00
Tim Northover 277066ab43 X86: expand atomics in IR instead of as MachineInstrs.
The logic for expanding atomics that aren't natively supported in
terms of cmpxchg loops is much simpler to express at the IR level. It
also allows the normal optimisations and CodeGen improvements to help
out with atomics, instead of using a limited set of possible
instructions..

rdar://problem/13496295

llvm-svn: 212119
2014-07-01 18:53:31 +00:00
Adam Nemet 16de2486cb [X86] AVX512: Allow writemasks with vpcmp
For now I only updated the _alt variants.  The main variants are used by
codegen and that will need a bit more work to trigger.

<rdar://problem/17492620>

llvm-svn: 212114
2014-07-01 18:03:45 +00:00
Adam Nemet 1efcb90fcd [X86] AVX512: Factor generating the AsmString into avx512_icmp_cc
Adding a writemask variant would require a third asm string to be passed to
the template.  Generate the AsmString in the template instead.

No change in X86.td.expanded.

llvm-svn: 212113
2014-07-01 18:03:43 +00:00
Reid Kleckner b5dd9452b4 Fix .seh_stackalloc 0
seh_stackalloc 0 is not representable in Win64 SEH info, so emitting it
is a bug.

Reviewers: rnk

Differential Revision: http://reviews.llvm.org/D4334

Patch by Vadim Chugunov!

llvm-svn: 212081
2014-07-01 00:42:47 +00:00
Duncan P. N. Exon Smith 4f5e9f8207 AArch64: Follow-up to r212073
In r212073 I missed a call of `use_begin()` that assumed the wrong
semantics.  It's not clear to me at all what this code does without the
fix, so I'm not sure how to write a testcase.

llvm-svn: 212075
2014-07-01 00:05:37 +00:00
Duncan P. N. Exon Smith 7d7ae93139 AArch64: Actually do address type promotion
AArch64AddressTypePromotion was doing nothing because it was using the
old semantics of `Use` and `uses()`, when it really wanted to get at the
`users()`.

llvm-svn: 212073
2014-06-30 23:42:14 +00:00
Alp Toker cf21875d41 Fix 'platform-specific' hyphenations
llvm-svn: 212056
2014-06-30 18:57:16 +00:00
Matt Arsenault d0e0f0aea0 R600: Move mul combine to separate function
llvm-svn: 212052
2014-06-30 17:55:48 +00:00
Matt Arsenault 5d293eefda R600: Remove unused declarations leftover from AMDIL
llvm-svn: 212051
2014-06-30 17:37:17 +00:00