Commit Graph

9268 Commits

Author SHA1 Message Date
Jakob Stoklund Olesen 7fde8c4ef4 Annotate shifts and rotates with SchedRW lists.
llvm-svn: 177935
2013-03-25 23:07:32 +00:00
NAKAMURA Takumi f87f70a72d X86DisassemblerDecoder.c: Make this C89-compliant.
llvm-svn: 177910
2013-03-25 20:55:49 +00:00
NAKAMURA Takumi dde7fa838e Whitespace.
llvm-svn: 177909
2013-03-25 20:55:43 +00:00
Dave Zarzycki 656e8515fc x86 -- add the XTEST instruction
llvm-svn: 177888
2013-03-25 18:59:43 +00:00
Dave Zarzycki 07fabeecad x86 -- disassemble the REP/REPNE prefix when needed
This fixes Apple bug: 13493622

llvm-svn: 177887
2013-03-25 18:59:38 +00:00
Jakob Stoklund Olesen 5891cf9788 Add a WriteMicrocoded for ancient microcoded instructions.
llvm-svn: 177611
2013-03-21 00:07:17 +00:00
Jakob Stoklund Olesen 712f674880 Model prefetches and barriers as loads.
It's not yet clear if these instructions need a more careful model.

llvm-svn: 177599
2013-03-20 23:09:53 +00:00
Jakob Stoklund Olesen 5b535c965e Add a catch-all WriteSystem SchedWrite type.
This is used for all the expensive system instructions.

llvm-svn: 177598
2013-03-20 23:09:50 +00:00
Jakob Stoklund Olesen cd4ebb7639 Annotate the remaining SSE MOV instructions.
llvm-svn: 177592
2013-03-20 22:37:16 +00:00
Jakob Stoklund Olesen c6dc70d865 Annotate SSE horizontal and integer instructions.
llvm-svn: 177591
2013-03-20 22:37:13 +00:00
Michael Liao 70dd7f999d Correct cost model for vector shift on AVX2
- After moving the logic that recognizes vector shifts with a scalar
  (uniform) amount from DAG combining into DAG lowering, we declare all
  vector shifts as custom-lowered even though some vector shifts on AVX
  are legal. As a result, the cost model needs special tuning to identify
  these legal cases.

llvm-svn: 177586
2013-03-20 22:01:10 +00:00
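For context, the two shapes of vector shift the cost model has to tell apart can be written with AVX2 intrinsics as in the sketch below (assumes an AVX2-capable compiler and CPU; this only illustrates uniform-amount versus per-element shifts, not the cost tables themselves):

#include <immintrin.h>
#include <cstdio>

int main() {
  alignas(32) int data[8]   = {1, 2, 3, 4, 5, 6, 7, 8};
  alignas(32) int counts[8] = {0, 1, 2, 3, 0, 1, 2, 3};
  alignas(32) int out[8];

  __m256i v = _mm256_load_si256(reinterpret_cast<const __m256i *>(data));
  __m256i c = _mm256_load_si256(reinterpret_cast<const __m256i *>(counts));

  // Shift with a scalar (uniform) amount: every lane shifted by 3.
  __m256i uniform = _mm256_slli_epi32(v, 3);

  // Per-element shift amounts: a single instruction (vpsllvd) only with AVX2.
  __m256i variable = _mm256_sllv_epi32(v, c);

  _mm256_store_si256(reinterpret_cast<__m256i *>(out), uniform);
  std::printf("uniform[0]  = %d\n", out[0]);   // 8
  _mm256_store_si256(reinterpret_cast<__m256i *>(out), variable);
  std::printf("variable[3] = %d\n", out[3]);   // 32
  return 0;
}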
Jakob Stoklund Olesen 7a8bb72a3a Add some missing SSE annotations.
llvm-svn: 177540
2013-03-20 16:56:39 +00:00
Jakob Stoklund Olesen 50bd713b5e Annotate remaining IIC_BIN_* instructions.
llvm-svn: 177539
2013-03-20 16:56:36 +00:00
Michael Liao 0f4ea0c4a9 Fix PR15296
- Move SRA/SRL/SHL lowering support from DAG combining to DAG lowering
  to support extended 256-bit integers on AVX targets without AVX2.

llvm-svn: 177478
2013-03-20 02:33:21 +00:00
Michael Liao 5a4e81d2e8 Mark all variable shifts needing customizing
- Prepare moving logic from DAG combining into DAG lowering. There's no
  functionality change.

llvm-svn: 177477
2013-03-20 02:28:20 +00:00
Michael Liao 48e8a3727c Move scalar immediate shift lowering into a dedicated func
- no functionality change

llvm-svn: 177476
2013-03-20 02:20:36 +00:00
Jakob Stoklund Olesen 3a546156c7 Annotate various null idioms with SchedRW lists.
llvm-svn: 177461
2013-03-19 23:23:31 +00:00
Jakob Stoklund Olesen 24aac1dc92 Annotate SSE float conversions with SchedRW lists.
llvm-svn: 177460
2013-03-19 23:23:29 +00:00
Jakob Stoklund Olesen 050fa62fd6 Annotate X86InstrCMovSetCC.td with SchedRW lists.
llvm-svn: 177459
2013-03-19 23:23:26 +00:00
Chad Rosier f3c04f6a9f [ms-inline asm] Move the immediate asm rewrite into the target specific
logic as a QOI cleanup.  No functional change.  Tests already in place.
rdar://13456414

llvm-svn: 177446
2013-03-19 21:58:18 +00:00
Jakob Stoklund Olesen 9bd6b8bd96 Annotate X86InstrCompiler.td with SchedRW lists.
Add a new WriteZero SchedWrite type for the common dependency-breaking
instructions that clear a register.

llvm-svn: 177442
2013-03-19 21:16:56 +00:00
Chad Rosier 7ca135b25f [ms-inline asm] Create a helper function, CreateMemForInlineAsm, that creates
an X86Operand, but also performs a Sema lookup and adds the sizing directive
when appropriate.  Use this when parsing a bracketed statement.  This is
necessary to get the instruction matching correct as well.  Test case coming
on clang side.
rdar://13455408

llvm-svn: 177439
2013-03-19 21:11:56 +00:00
Ulrich Weigand 80d9ad398d Remove an invalid and unnecessary Pat pattern from the X86 backend:
def : Pat<(load (i64 (X86Wrapper tglobaltlsaddr :$dst))),
            (MOV64rm tglobaltlsaddr :$dst)>;

This pattern is invalid because the MOV64rm instruction expects a
source operand of type "i64mem", which is a subclass of X86MemOperand
and thus actually consists of five MI operands, but the Pat provides
only a single MI operand ("tglobaltlsaddr" matches an SDnode of
type ISD::TargetGlobalTLSAddress and provides a single output).

Thus, if the pattern were ever matched, subsequent uses of the MOV64rm
instruction pattern would access uninitialized memory.  In addition,
with the TableGen patch I'm about to check in, this would actually be
reported as a build-time error.

Fortunately, the pattern does in fact never match, for at least two
independent reasons.

First, the code generator actually never generates a pattern of the
form (load (X86Wrapper (tglobaltlsaddr))).  For most combinations of
TLS and code models, (tglobaltlsaddr) represents just an offset that
needs to be added to some base register, so it is never directly
dereferenced.  The only exception is the initial-exec model, where
(tglobaltlsaddr) refers to the (pc-relative) address of a GOT slot,
which *is* in fact directly dereferenced: but in that case, the
X86WrapperRIP node is used, not X86Wrapper, so the Pat doesn't match.

Second, even if some patterns along those lines *were* ever generated,
we should not need an extra Pat pattern to match it.  Instead, the
original MOV64rm instruction pattern ought to match directly, since
it uses an "addr" operand, which is implemented via the SelectAddr
C++ routine; this routine is supposed to accept the full range of
input DAGs that may be implemented by a single mov instruction,
including those cases involving ISD::TargetGlobalTLSAddress (and
actually does so e.g. in the initial-exec case as above).

To avoid build breaks (due to the above-mentioned error) after the
TableGen patch is checked in, I'm removing this Pat here.

llvm-svn: 177426
2013-03-19 19:49:52 +00:00
Nadav Rotem 0f1bc60d51 Optimize sext <4 x i8> and <4 x i16> to <4 x i64>.
Patch by Ahmad, Muhammad T <muhammad.t.ahmad@intel.com>

llvm-svn: 177421
2013-03-19 18:38:27 +00:00
Jakob Stoklund Olesen af39940bad Annotate X86InstrExtension.td with SchedRW lists.
llvm-svn: 177418
2013-03-19 18:03:58 +00:00
Jakob Stoklund Olesen caf3d89ff5 Annotate a lot of X86InstrInfo.td with SchedRW lists.
llvm-svn: 177417
2013-03-19 18:03:55 +00:00
Chad Rosier 120eefd104 [ms-inline asm] Move the size directive asm rewrite into the target specific
logic as a QOI cleanup.
rdar://13445327

llvm-svn: 177413
2013-03-19 17:32:17 +00:00
Chad Rosier 2707d534c1 [ms-inline asm] Avoid emitting a redundant sizing directive, if we've already
parsed one.  Test case coming shortly.
rdar://13446980

llvm-svn: 177347
2013-03-18 23:31:24 +00:00
Jakob Stoklund Olesen a5158c8f0a Add SchedRW annotations to most of X86InstrSSE.td.
We hitch a ride with the existing OpndItins class that was used to add
instruction itinerary classes in the many multiclasses in this file.

Use the link provided by the X86FoldableSchedWrite.Folded to find the
right SchedWrite for folded loads.

llvm-svn: 177326
2013-03-18 22:01:35 +00:00
Jakob Stoklund Olesen e2289b78df Annotate X86 arithmetic instructions with SchedRW lists.
This new-style scheduling information is going to replace the
instruction itineraries.

This also serves as a test case for Andy's fix in r177317.

llvm-svn: 177323
2013-03-18 21:32:39 +00:00
Anton Korobeynikov 3e7005f1c1 TLS support for MinGW targets.
MinGW is almost completely compatible with MSVC, with the exception of the _tls_array global not being available.

Patch by David Nadlinger!

llvm-svn: 177257
2013-03-18 08:12:28 +00:00
Craig Topper 0498b88d48 Post process ADC/SBB and use a shorter encoding if they use a sign extended immediate.
llvm-svn: 177243
2013-03-18 03:34:55 +00:00
Craig Topper 7e9a1cb199 Refactor some duplicated code into helper functions.
llvm-svn: 177242
2013-03-18 02:53:34 +00:00
Craig Topper 612f7bfa4d Add X86 code emitter support for AVX-encoded MRMDestReg instructions.
Previously we weren't skipping the VVVV-encoded register. Based on a patch by Michael Liao.

llvm-svn: 177221
2013-03-16 03:44:31 +00:00
Jakob Stoklund Olesen 63bff2eb39 Define more SchedWrites for annotating X86 instructions.
Since almost all X86 instructions can fold loads, use a multiclass to
define register/memory pairs of SchedWrites.

An X86FoldableSchedWrite represents the register version of an
instruction. It holds a reference to the SchedWrite to use when the
instruction folds a load.

This will be used inside multiclasses that define rr and rm instruction
versions together.

llvm-svn: 177210
2013-03-16 00:02:17 +00:00
Eric Christopher 8996c5d469 Silence anonymous type in anonymous union warnings.
llvm-svn: 177135
2013-03-15 00:42:55 +00:00
Nadav Rotem adfa5eaf8c Unaligned loads should use the VMOVUPS opcode.
llvm-svn: 177130
2013-03-14 23:49:44 +00:00
Jakob Stoklund Olesen 712366821a Prepare for adding InstrSchedModel annotations to X86 instructions.
The new InstrSchedModel is easier to use than the instruction
itineraries. It will be used to model instruction latency and throughput
in modern Intel microarchitectures like Sandy Bridge.

InstrSchedModel should be able to coexist with instruction itinerary
classes, but for cleanliness we should switch the Atom processor model
to the new InstrSchedModel as well.

llvm-svn: 177122
2013-03-14 22:42:17 +00:00
Chad Rosier 4b54f594b4 [fast-isel] The X86FastISel::FastLowerArguments function doesn't properly handle
the win64 calling convention.
rdar://13423768

llvm-svn: 177113
2013-03-14 21:25:04 +00:00
Craig Topper ba82429826 Fix the name of a variable to match its declaration. Fixes build failure from r177014.
llvm-svn: 177015
2013-03-14 07:47:43 +00:00
Craig Topper 872999737d Fix a bug in the calculation of the VEX.B bit for FMA4 rr with the VEX.W bit set. The VEX.B was being calculated from the wrong operand. Fixes at least some portion of PR14185.
llvm-svn: 177014
2013-03-14 07:40:52 +00:00
Craig Topper a66d81d521 Teach X86 MC instruction lowering that VMOVAPSrr and other VEX-encoded register to register moves should be switched from using the MRMSrcReg form to the MRMDestReg form if the source register is a 64-bit extended register and the destination register is not. This allows the instruction to be encoded using the 2-byte VEX form instead of the 3-byte VEX form. The GNU assembler has similar behavior.
llvm-svn: 177011
2013-03-14 07:09:57 +00:00
Michael Liao 20d287044c Fix PR15309
- Fix the typo on type checking

llvm-svn: 177010
2013-03-14 06:57:42 +00:00
Kevin Enderby f15856ebb4 Fixes disassembler crashes on 2013 Haswell RTM instructions.
rdar://13318048

llvm-svn: 176828
2013-03-11 21:17:13 +00:00
Tom Stellard b1588fc057 DAGCombiner: Use correct value type for checking legality of BR_CC v3
LegalizeDAG.cpp uses the value type of the comparison operands when checking
the legality of BR_CC, so DAGCombiner should do the same.

v2:
  - Expand more BR_CC value types for NVPTX

v3:
  - Expand correct BR_CC value types for Hexagon, Mips, and XCore.

llvm-svn: 176694
2013-03-08 15:36:57 +00:00
Benjamin Kramer 2c3d0df8ee X86: Fold EXTRACT_SUBVECTORs of a BUILD_VECTOR into a smaller BUILD_VECTOR.
That can usually be lowered efficiently and is common in sandybridge code.
It would be nice to do this in DAGCombiner but we can't insert arbitrary
BUILD_VECTORs this late.

Fixes PR15462.

llvm-svn: 176634
2013-03-07 18:48:40 +00:00
Michael Liao d5cac37dc5 Fix two remaining issue after fixing PR15355 when CMOV is not available
- Phi nodes should be replaced/updated after lowering CMOV into branches,
  because the 'mainMBB' operand being updated in the Phi node changes.
- Add EFLAGS to the live-ins before lowering the 2nd CMOV. This is
  necessary as we will reuse the EFLAGS generated before the 1st lowered
  CMOV, which won't clobber EFLAGS; however, we need to specify that
  explicitly.
- '-attr=-cmov' test cases are added.

llvm-svn: 176598
2013-03-07 01:01:29 +00:00
Michael Liao da22b30be5 Fix PR15355
- Clear the 'mayStore' flag when loading from the atomic variable before
  the spin loop.
- Clear the kill flag on registers forming the address of that atomic
  variable when they go from a single use to multiple uses.
- Don't use a physical register as a live-in register in a BB (neither
  entry nor landing pad); copy it into a virtual register instead.

(patch by Cameron Zwarich)

llvm-svn: 176538
2013-03-06 00:17:04 +00:00
David Sehr 4c8979cd4d The current X86 NOP padding uses one long NOP followed by the remainder in
one-byte NOPs.  If the processor actually executes those NOPs, as it sometimes
does with aligned bundling, this can have a performance impact.  In my
micro-benchmarks run on one machine, a 15-byte NOP followed by twelve
one-byte NOPs is about 20% worse than a 15-byte NOP followed by a 12-byte NOP.
This patch changes NOP emission to emit as many 15-byte NOPs (the maximum) as
possible, followed by at most one shorter NOP.

llvm-svn: 176464
2013-03-05 00:02:23 +00:00
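A minimal sketch of the padding policy described above (plain C++; emitPadding is a hypothetical helper for illustration, not the actual MCAsmBackend code):

#include <cstdio>

// Split N bytes of padding into as many 15-byte NOPs (the maximum x86 NOP
// encoding) as possible, followed by at most one shorter NOP.
static void emitPadding(unsigned N) {
  const unsigned MaxNop = 15;
  while (N >= MaxNop) {
    std::printf("  nop15\n");
    N -= MaxNop;
  }
  if (N > 0)
    std::printf("  nop%u\n", N); // single trailing shorter NOP
}

int main() {
  // 27 bytes: previously nop15 + twelve one-byte NOPs, now nop15 + nop12.
  emitPadding(27);
  return 0;
}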
Preston Gurd 485296d1e8 Bypass Slow Divides
* Only apply the divide bypass optimization when not optimizing for size.
* Fixed a bug caused by emitting the constant 0 with type Int32; use the
  dividend type to generate the constant instead.
* For Atom x86-64, apply the divide bypass to use 16-bit divides instead of
  64-bit divides when the operand values are small enough.
* Added lit tests for 64-bit divide bypass.

Patch by Tyler Nowicki!

llvm-svn: 176442
2013-03-04 18:13:57 +00:00
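A rough sketch of the bypass idea above (plain C++; div64MaybeBypass is a hypothetical helper illustrating the transformation, not the LLVM pass): when both operands of a 64-bit divide fit in 16 bits, a much cheaper 16-bit divide gives the same result.

#include <cstdint>
#include <cstdio>

static uint64_t div64MaybeBypass(uint64_t a, uint64_t b) {
  // Fast path: both operands fit in 16 bits, so a 16-bit hardware divide
  // (much faster on Atom) produces the identical quotient.
  if ((a >> 16) == 0 && (b >> 16) == 0)
    return static_cast<uint16_t>(a) / static_cast<uint16_t>(b);
  return a / b; // slow path: full 64-bit divide
}

int main() {
  std::printf("%llu\n", (unsigned long long)div64MaybeBypass(1000, 7));
  std::printf("%llu\n", (unsigned long long)div64MaybeBypass(1ULL << 40, 3));
  return 0;
}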
Arnold Schwaighofer 20ef54f4c1 X86 cost model: Adjust cost for custom lowered vector multiplies
This matters for example in following matrix multiply:

int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}

Taken from the test-suite benchmark Shootout.

We estimate the cost of the multiply to be 2 while we generate 9 instructions
for it and end up being quite a bit slower than the scalar version (48% on my
machine).

Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split the
vector into two 128-bit halves and handle the subvector multiplies like above
with 9 instructions.
Only on AVX2 will we have a cost of 9 for v4i64.

I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would indeed be more expensive when vectorized with 3
kernels:

for (i ...)
   r += a[i] * 3;
for (i ...)
  m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.

In each case the vectorized version was considerably slower.

radar://13304919

llvm-svn: 176403
2013-03-02 04:02:52 +00:00
Michael Liao 6af16fc3b7 Fix PR10475
- ISD::SHL/SRL/SRA must have either both scalar or both vector operands,
  but TLI.getShiftAmountTy() so far only returns a scalar type. As a
  result, backend logic assuming that it matches breaks.
- Rename the original TLI.getShiftAmountTy() to
  TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
  return the target-specific scalar type or the same vector type as the
  1st operand.
- Fix most TICG logic that assumes TLI.getShiftAmountTy() returns a
  simple scalar type.

llvm-svn: 176364
2013-03-01 18:40:30 +00:00
Duncan Sands 2cb41d372c GCC thinks that this variable might be used uninitialized (it isn't).
llvm-svn: 176341
2013-03-01 09:46:03 +00:00
Yiannis Tsiouris d4842e5ee9 Re-format comments (and check commit access)
llvm-svn: 176270
2013-02-28 16:59:10 +00:00
Nadav Rotem 08ab877cc7 Revert r176166 because it broke one of the lit tests.
llvm-svn: 176171
2013-02-27 05:56:20 +00:00
Nadav Rotem 85e1211fbf std::string to StringRef.
llvm-svn: 176166
2013-02-27 05:23:56 +00:00
Chad Rosier 1b33e8d63e [fast-isel] Make sure the FastLowerArguments function checks that the
argument types are simple types.
rdar://13290455

llvm-svn: 176066
2013-02-26 01:05:31 +00:00
Michael Liao 609a527286 Refine fix to PR10499, no functionality change
- Put the expensive check after the simple one

llvm-svn: 176060
2013-02-25 23:16:36 +00:00
Michael Liao ab97668061 Fix PR10499
- Check whether SSE is available before lowering an all-ones vector build with
  PCMPEQD, which is only available from SSE2.

llvm-svn: 176058
2013-02-25 23:01:03 +00:00
Chad Rosier a92ef4ba5b [fast-isel] Add X86FastIsel::FastLowerArguments to handle functions with 6 or
fewer scalar integer (i32 or i64) arguments. It completely eliminates the need
for SDISel for trivial functions.

Also, add the new llc -fast-isel-abort-args option, which is similar to
-fast-isel-abort option, but for formal argument lowering.

llvm-svn: 176052
2013-02-25 21:59:35 +00:00
Chad Rosier 669bb3ee77 [ms-inline asm] Add support for the pushad/popad mnemonics.
rdar://13254235

llvm-svn: 176036
2013-02-25 19:06:27 +00:00
Nadav Rotem b532fca92c Revert r169638 because it broke Mesa llvmpipe tests.
Fix PR15239.

llvm-svn: 175985
2013-02-24 07:09:35 +00:00
Benjamin Kramer ee23dcb461 X86: Disable cmov-memory patterns on subtargets without cmov.
Fixes PR15115.

llvm-svn: 175962
2013-02-23 10:40:58 +00:00
Peter Collingbourne 7b57621fb3 x86_64: designate most general purpose and SSE registers as callee save under coldcc
llvm-svn: 175911
2013-02-22 19:19:44 +00:00
Eli Bendersky 8da87163ca Move the eliminateCallFramePseudoInstr method from TargetRegisterInfo
to TargetFrameLowering, where it belongs. Incidentally, this allows us
to delete some duplicated (and slightly different!) code in TRI.

There are potentially other layering problems that can be cleaned up
as a result, or in a similar manner.

The refactoring was OK'd by Anton Korobeynikov on llvmdev.

Note: this touches the target interfaces, so out-of-tree targets may
be affected.

llvm-svn: 175788
2013-02-21 20:05:00 +00:00
Eli Bendersky e93249befa getX86SubSuperRegister has a special mode with High=true for i64 which
exists solely to enable it to call itself for i8 with some registers.
The proposed patch simplifies the function somewhat to make the High
bit only meaningful for the i8 mode, which makes sense. No functional
difference (getX86SubSuperRegister is not getting called from anywhere
outside with i64 and High=true).

llvm-svn: 175762
2013-02-21 16:40:18 +00:00
Jim Grosbach d2037eb1ee MCParser: Update method names per coding guidelines.
s/AddDirectiveHandler/addDirectiveHandler/
s/ParseMSInlineAsm/parseMSInlineAsm/
s/ParseIdentifier/parseIdentifier/
s/ParseStringToEndOfStatement/parseStringToEndOfStatement/
s/ParseEscapedString/parseEscapedString/
s/EatToEndOfStatement/eatToEndOfStatement/
s/ParseExpression/parseExpression/
s/ParseParenExpression/parseParenExpression/
s/ParseAbsoluteExpression/parseAbsoluteExpression/
s/CheckForValidSection/checkForValidSection/

http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

No functional change intended.

llvm-svn: 175675
2013-02-20 22:21:35 +00:00
Jim Grosbach 341ad3e72a Update TargetLowering ivars for name policy.
http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

ivars should be camel-case and start with an upper-case letter. A few in
TargetLowering were starting with a lower-case letter.

No functional change intended.

llvm-svn: 175667
2013-02-20 21:13:59 +00:00
Chad Rosier a018cfd10c [ms-inline asm] Make the comment a bit more verbose.
llvm-svn: 175641
2013-02-20 18:03:44 +00:00
Elena Demikhovsky 0ccdd1315b I optimized the following patterns:
sext <4 x i1> to <4 x i64>
 sext <4 x i8> to <4 x i64>
 sext <4 x i16> to <4 x i64>
 
I'm running a combine on SIGN_EXTEND_IN_REG and reverting SEXT patterns:
 (sext_in_reg (v4i64 anyext (v4i32 x )), ExtraVT) -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x , ExtraVT)))

The sext_in_reg (v4i32 x) may be lowered to shl+sar operations.
A vector "sar" does not exist for 64-bit elements, so lowering sext_in_reg (v4i64 x) has no vector solution.

I also added the cost of these operations to the AVX cost table.

llvm-svn: 175619
2013-02-20 12:42:54 +00:00
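As a reference point for the shl+sar lowering mentioned above, here is a scalar sketch of sext_in_reg on the low 8 bits of a 32-bit lane (plain C++, assuming arithmetic right shift of signed values as on x86 compilers); the vector form does the same per element, and the commit's point is that no 64-bit-element arithmetic shift exists to do this for v4i64:

#include <cstdint>
#include <cstdio>

static int32_t signExtendInReg8(int32_t x) {
  // shl by 24 (done in an unsigned type to avoid overflow), then sar by 24:
  // the sign bit of the low byte is replicated into the upper bits.
  return static_cast<int32_t>(static_cast<uint32_t>(x) << 24) >> 24;
}

int main() {
  std::printf("%d\n", signExtendInReg8(0x000000F0)); // -16
  std::printf("%d\n", signExtendInReg8(0x00000070)); // 112
  return 0;
}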
Chad Rosier 45a52fa097 [ms-inline asm] Force the use of a base pointer if the MachineFunction includes
MS-style inline assembly.

This is a follow-on to r175334.  Forcing a FP to be emitted doesn't ensure it
will be used.  Therefore, force the base pointer as well.  We now treat MS
inline assembly in the same way we treat functions with dynamic stack
realignment and VLAs.  This guarantees the BP will be used to reference 
parameters and locals.
rdar://13218191

llvm-svn: 175576
2013-02-19 23:50:45 +00:00
Jakub Staszak e167cf5c4d Add obvious constantness.
llvm-svn: 175560
2013-02-19 21:54:59 +00:00
Benjamin Kramer 1cb826b0ad Clean up HiPE prologue emission a bit and avoid signed arithmetic tricks.
No intended functionality change.

llvm-svn: 175536
2013-02-19 17:32:57 +00:00
Rafael Espindola 1c040b5788 Move LLVM_LIBRARY_VISIBILITY for consistency with what was done to
PPCJITInfo.cpp in r175394.

llvm-svn: 175531
2013-02-19 17:14:33 +00:00
Eli Bendersky b0b13b22a3 Make pass name more precise and fix comment.
llvm-svn: 175525
2013-02-19 16:38:32 +00:00
Craig Topper f371e89264 Fix capitalization in comment to match function name.
llvm-svn: 175497
2013-02-19 07:43:59 +00:00
Jakub Staszak 1f199a0ef2 Use array_pod_sort instead of std::sort.
llvm-svn: 175472
2013-02-18 23:18:22 +00:00
NAKAMURA Takumi 3a8002f61d X86FrameLowering.cpp: Fixup. Sorry for the breakage.
llvm-svn: 175467
2013-02-18 23:15:21 +00:00
NAKAMURA Takumi a614ec7e6f X86FrameLowering.cpp: Fix a warning in -Asserts. [-Wunused-variable]
llvm-svn: 175464
2013-02-18 23:08:49 +00:00
Chad Rosier 441e81287f Remove a useless assert.
llvm-svn: 175463
2013-02-18 22:20:16 +00:00
Benjamin Kramer 5c6e653b72 Fix a 32/64 bit incompatibility in the HiPE prologue generation.
llvm-svn: 175458
2013-02-18 21:45:01 +00:00
Benjamin Kramer 53bc37ca2a Support for HiPE-compatible code emission, patch by Yiannis Tsiouris.
llvm-svn: 175457
2013-02-18 20:55:12 +00:00
Benjamin Kramer 189fc5819a X86: Add a note.
llvm-svn: 175408
2013-02-17 23:34:14 +00:00
Jakub Staszak 74010cd9c2 Return false instead of 0.
llvm-svn: 175402
2013-02-17 18:35:25 +00:00
NAKAMURA Takumi f86d12caf0 [msvc x64] Update X86CompilationCallback_Win64.asm corresponding to r175267.
llvm-svn: 175363
2013-02-16 16:04:29 +00:00
Jakub Staszak d784d96074 Minor cleanups. No functionality change.
llvm-svn: 175359
2013-02-16 13:34:26 +00:00
Bill Wendling 61375d8953 Reinitialize the ivars in the subtarget so that they can be reset with the new features.
llvm-svn: 175336
2013-02-16 01:36:26 +00:00
Chad Rosier 925c9b499e [ms-inline asm] Do not omit the frame pointer if we have ms-inline assembly.
If the frame pointer is omitted, and any stack changes occur in the inline
assembly, e.g.: "pusha", then any C local variable or C argument references
will be incorrect.  

I pass no judgement on anyone who would do such a thing. ;)
rdar://13218191

llvm-svn: 175334
2013-02-16 01:25:28 +00:00
Bill Wendling e9434778f7 Temporary revert of 175320.
llvm-svn: 175322
2013-02-15 23:22:32 +00:00
Bill Wendling a060d0efd8 Reinitialize the ivars in the subtarget.
When we're recalculating the feature set of the subtarget, we need to have the
ivars in their initial state.

llvm-svn: 175320
2013-02-15 23:18:01 +00:00
Bill Wendling aef9c37c65 Use the 'target-features' and 'target-cpu' attributes to reset the subtarget features.
If two functions require different features (e.g., `-mno-sse' vs. `-msse') then
we want to honor that, especially during LTO. We can do that by resetting the
subtarget's features depending upon the 'target-feature' attribute.

llvm-svn: 175314
2013-02-15 22:31:27 +00:00
Chad Rosier a915bbf4a1 [ms-inline asm] Adjust the EndLoc to account for the ']'.
llvm-svn: 175312
2013-02-15 21:58:13 +00:00
Rafael Espindola 91cbcbb909 Give these callbacks hidden visibility. It is better to not export them more
than we need to and some ELF linkers complain about directly accessing symbols
with default visibility.

llvm-svn: 175268
2013-02-15 14:15:59 +00:00
Rafael Espindola 9b7d4004bc Don't make assumptions about the mangling of static functions in extern "C"
blocks. We still don't have consensus if we should try to change clang or
the standard, but llvm should work with compilers that implement the current
standard and mangle those functions.

llvm-svn: 175267
2013-02-15 14:08:43 +00:00
Benjamin Kramer 6ecb1e78a9 Make helpers static. Add missing include so LLVMInitializeObjCARCOpts gets C linkage.
llvm-svn: 175264
2013-02-15 12:30:38 +00:00
Eli Bendersky a1c6635ca3 The operand listing is very much outdated.
llvm-svn: 175220
2013-02-14 23:17:03 +00:00
Jakub Staszak 701cc97e92 Simplify code. Remove "else after return".
llvm-svn: 175212
2013-02-14 21:50:09 +00:00
Kay Tiong Khoo f809c6491d Added basic support for Intel ADX instructions:
feature flag, instruction definitions, test cases

llvm-svn: 175196
2013-02-14 19:08:21 +00:00
Nadav Rotem accb0c747c 80-col
llvm-svn: 175189
2013-02-14 18:20:48 +00:00
Elena Demikhovsky d0a0cc80cd Fixed a bug in X86TargetLowering::LowerVectorIntExtend() (assertion failure).
Added a test.

llvm-svn: 175144
2013-02-14 08:20:26 +00:00
Rafael Espindola 8868faac14 Revert r175120 and r175121. Clang is producing the expected asm names again.
llvm-svn: 175133
2013-02-14 03:33:34 +00:00
Rafael Espindola 3c818086f2 Don't assume the mangling of static functions.
llvm-svn: 175121
2013-02-14 02:49:18 +00:00
Nick Lewycky beba972659 Don't build tail calls to functions with three inreg arguments on x86-32 PIC.
Fixes PR15250!

llvm-svn: 175092
2013-02-13 21:59:15 +00:00
Chad Rosier 282edd7caa [ms-inline-asm] Add support for memory references that have non-immediate
displacements.
rdar://12974533

llvm-svn: 175083
2013-02-13 21:33:44 +00:00
Benjamin Kramer 8e2637e2b0 X86: Disable generation of rep;movsl when %esi is used as a base pointer.
This happens when there is both stack realignment and a dynamic alloca in the
function. If we overwrite %esi (rep;movsl uses fixed registers) we'll lose the
base pointer and the next register spill will write into oblivion.

Fixes PR15249 and unbreaks firefox on i386/freebsd. Mozilla uses dynamic allocas
and freebsd a 4 byte stack alignment.

llvm-svn: 175057
2013-02-13 13:40:35 +00:00
Elena Demikhovsky 9e0df7cb01 Prevent insertion of "vzeroupper" before a call that preserves YMM registers, since the caller uses the preserved registers across the call.
llvm-svn: 175043
2013-02-13 08:02:04 +00:00
Eric Christopher 389ee71b0a Check i1 as well as i8 variables for 8 bit registers for x86 inline
assembly.

llvm-svn: 175036
2013-02-13 06:01:05 +00:00
Kay Tiong Khoo ab588efe42 Added 0x0D to the 2-byte opcode extension table for prefetch* variants.
Fixed the decode of the existing 3dNow prefetchw instruction.
Intel is scheduled to add a compatible prefetchw (same encoding) to future CPUs.

llvm-svn: 174920
2013-02-12 00:19:12 +00:00
Kay Tiong Khoo d30b1a2ac7 * Fixed disassembly of some i386 system instructions with Intel syntax
* Added a file of test cases for i386 Intel syntax

llvm-svn: 174900
2013-02-11 19:46:36 +00:00
Eli Bendersky ef4558abd3 This is a follow-up on r174446, now taking Atom processors into
account. Atoms use LEA for updating SP in prologs/epilogs, and the
exact LEA opcode depends on the data model.

Also reapplying the test case which was added and then reverted
(because of Atom failures), this time specifying the CPU explicitly in
addition to the triple. The test case now checks all variations (data
model, CPU Atom vs. Core).

llvm-svn: 174542
2013-02-06 20:43:57 +00:00
Eli Bendersky 44a40ca143 Make sure the correct opcodes are used to SUB and ADD the stack
pointer in function prologs/epilogs. The opcodes should depend on the
data model (LP64 vs. ILP32) rather than the architecture bit-ness.

llvm-svn: 174446
2013-02-05 21:53:29 +00:00
Jakob Stoklund Olesen dc69f6fbca Move MRI liveouts to X86 return instructions.
llvm-svn: 174402
2013-02-05 17:59:48 +00:00
Eli Bendersky 530a3bc5fa Fix comments
llvm-svn: 174390
2013-02-05 16:53:11 +00:00
Benjamin Kramer 2c9da989c2 X86: Open up some opportunities for constant folding by postponing shift lowering.
Fixes PR15141.

llvm-svn: 174327
2013-02-04 15:19:33 +00:00
Benjamin Kramer 0611298446 X86: Simplify code. No functionality change.
llvm-svn: 174326
2013-02-04 15:19:25 +00:00
Evgeniy Stepanov 1f5a71492d More MSan/ASan annotations.
This change lets us bootstrap LLVM/Clang under ASan and MSan. It contains
fixes for 2 issues:

- X86JIT reads return address from stack, which MSan does not know is
  initialized.
- bugpoint tests run binaries with RLIMIT_AS. This does not work with certain
  Sanitizers.

We are no longer including config.h in Compiler.h with this change.

llvm-svn: 174306
2013-02-04 07:03:24 +00:00
David Sehr 8114a7a651 Two changes relevant to LEA and x32:
1) allows the use of RIP-relative addressing in 32-bit LEA instructions under
   x86-64 (ILP32 and LP64)
2) separates the size of address registers in 64-bit LEA instructions from
   being controlled by ILP32/LP64.

llvm-svn: 174208
2013-02-01 19:28:09 +00:00
Chad Rosier df782d2225 [PEI] Pass the frame index operand number to the eliminateFrameIndex function.
Each target implementation was needlessly recomputing the index.
Part of rdar://13076458

llvm-svn: 174083
2013-01-31 20:02:54 +00:00
Eric Christopher 258c867c0b Whitespace.
llvm-svn: 174009
2013-01-31 00:50:48 +00:00
Eric Christopher 4e3e94c13d Check and allow floating point registers to select the size of the
register for inline asm. This conforms to how gcc allows for effective
casting of inputs into gprs (fprs is already handled).

llvm-svn: 174008
2013-01-31 00:50:46 +00:00
Evan Cheng d2ca4e2ed9 Restrict sin/cos optimization to 64-bit only for now. 32-bit is a bit messy and less critical.
llvm-svn: 173987
2013-01-30 22:56:35 +00:00
Evan Cheng 27e41c9f70 Remove dead code.
llvm-svn: 173812
2013-01-29 18:08:22 +00:00
Hans Wennborg 5deecd9043 Fix typo in X86BaseInfo.h that I introduced in r157818.
llvm-svn: 173798
2013-01-29 14:05:57 +00:00
Craig Topper c048154b9b Merge SSE and AVX shuffle instructions in the comment printer.
llvm-svn: 173777
2013-01-29 07:54:31 +00:00
Evan Cheng 0e88c7d897 Teach SDISel to combine fsin / fcos into an fsincos node if the following
conditions are met:
1. They share the same operand and are in the same BB.
2. Both outputs are used.
3. The target has a native instruction that maps to ISD::FSINCOS node or
   the target provides a sincos library call.

Implemented the generic optimization in SDISel and enabled it for
Mac OS X. Also added an additional optimization for x86_64 Mac OS X by
using an alternative entry point, __sincos_stret, which returns the two
results in xmm0 / xmm1.

rdar://13087969
PR13204

llvm-svn: 173755
2013-01-29 02:32:37 +00:00
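The source-level pattern this combine targets looks like the sketch below: sin and cos of the same operand in the same basic block with both results used, which can then be merged into one sincos (or __sincos_stret) call:

#include <cmath>
#include <cstdio>

static void polarToCartesian(double r, double theta, double &x, double &y) {
  // cos and sin share the operand 'theta' and live in the same block,
  // so the two calls are candidates for a single fsincos/sincos call.
  x = r * std::cos(theta);
  y = r * std::sin(theta);
}

int main() {
  double x, y;
  polarToCartesian(2.0, 0.5, x, y);
  std::printf("x = %f, y = %f\n", x, y);
  return 0;
}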
Craig Topper 5c683972bc Fix 256-bit PALIGNR comment decoding to understand that it works on independent 128-bit lanes.
llvm-svn: 173674
2013-01-28 07:41:18 +00:00
Craig Topper 71d99ffe4a Add missing break in 256-bit palignr comment printing. No test case yet because the comment itself is still wrong.
llvm-svn: 173669
2013-01-28 07:19:11 +00:00
Craig Topper 8fb09f0abb Fix inconsistent usage of PALIGN and PALIGNR when referring to the same instruction.
llvm-svn: 173667
2013-01-28 06:48:25 +00:00
Benjamin Kramer 6a93596538 X86: Decode PALIGN operands so I don't have to do it in my head.
llvm-svn: 173572
2013-01-26 13:31:37 +00:00
Benjamin Kramer 99c68dd964 X86: Do splat promotion later, so the optimizer can chew on it first.
This catches many cases where we can emit a more efficient shuffle for a
specific mask or when the mask contains undefs. Once the splat is lowered to
unpacks we can't do that anymore.

There is a possibility of moving the promotion after pshufb matching, but I'm
not sure if pshufb with a mask loaded from memory is faster than 3 shuffles, so
I avoided that for now.

llvm-svn: 173569
2013-01-26 11:44:21 +00:00
Eli Bendersky 597fc1233a In this patch, we teach X86_64TargetMachine that it has an ILP32
(defined by the x32 ABI) mode, in which case its pointers are 32-bits
in size. This knowledge is also added to X86RegisterInfo that now
returns the appropriate registers in getPointerRegClass.

There are many outcomes to this change. In order to keep the patches
separate and manageable, we start by focusing on some simple testable
cases. The patch adds a test with passing a pointer to a function -
focusing on the difference between the two data models for x86-64.
Another test is added for handling of 'sret' arguments (and
functionality is added in X86ISelLowering to make it work).

A note on naming: the "x32 ABI" document refers to the AMD64
architecture (in LLVM it's distinguished by being is64Bits() in the
x86 subtarget) with two variations: the LP64 (default) data model, and
the ILP32 data model. This patch adds predicates to the subtarget
which are consistent with this naming scheme.

llvm-svn: 173503
2013-01-25 22:07:43 +00:00
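To make the LP64 vs. ILP32 distinction concrete, the sketch below prints the type sizes that differ between the two data models on the same AMD64 hardware (compile with clang -m64 for LP64 and clang -mx32 for ILP32/x32, where an x32 runtime is available):

#include <cstdio>

int main() {
  // LP64: pointers and long are 8 bytes. ILP32/x32: pointers, int, and long
  // are all 4 bytes, while the CPU still runs in 64-bit mode.
  std::printf("sizeof(void*) = %zu, sizeof(long) = %zu, sizeof(int) = %zu\n",
              sizeof(void *), sizeof(long), sizeof(int));
  return 0;
}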
Renato Golin d4c392e6ff Moving Cost Tables up to share with other targets
llvm-svn: 173382
2013-01-24 23:01:00 +00:00
Michael Liao 3dffc5e2b7 Fix an issue of pseudo atomic instruction DAG schedule
- Add the list of physical registers clobbered in pseudo atomic insts.
  Physical registers are clobbered when pseudo atomic instructions are
  expanded. Add them to the clobber list to prevent the DAG scheduler
  from mis-scheduling them once these insns are declared side-effect free.
- Add test case from Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 173200
2013-01-22 21:47:38 +00:00
Benjamin Kramer fee7d21ae7 X86: Make sure we account for the FMA4 register immediate value, otherwise rip-rel relocations will be off by one byte.
PR15040.

llvm-svn: 173176
2013-01-22 18:05:59 +00:00
Eli Bendersky 0893e1079d Initial patch for x32 ABI support.
Add the x32 environment kind to the triple, and separate the concept of
pointer size and callee save stack slot size, since they're not equal
on x32.

llvm-svn: 173175
2013-01-22 18:02:49 +00:00
Tim Northover 29178a348a Make APFloat constructor require explicit semantics.
Previously we tried to infer it from the bit width size, with an added
IsIEEE argument for the PPC/IEEE 128-bit case, which had a default
value. This default value allowed bugs to creep in, where it was
inappropriate.

llvm-svn: 173138
2013-01-22 09:46:31 +00:00
Craig Topper 66163a35ee Use <0 checks in place of ==-1 because it results in simpler code.
llvm-svn: 173010
2013-01-21 07:25:16 +00:00
Craig Topper 9b29486f42 Use MVT instead of EVT in LowerVECTOR_SHUFFLEtoBlend.
llvm-svn: 173009
2013-01-21 07:19:54 +00:00
Craig Topper 32c5406dcf Remove trailing whitespace.
llvm-svn: 173008
2013-01-21 06:57:59 +00:00
Craig Topper 5c84c25bf4 Fix some 80 column violations.
llvm-svn: 173006
2013-01-21 06:21:54 +00:00
Craig Topper 2cd375896a Make helper method static.
llvm-svn: 173005
2013-01-21 06:13:28 +00:00
Craig Topper cf93977920 Convert more EVT's to MVT's in the lowering methods.
llvm-svn: 172995
2013-01-20 21:50:27 +00:00
Craig Topper e65a08be64 Capitalize lowerTRUNCATE so that it matches the other lower functions in this file despite it not matching coding standards.
llvm-svn: 172994
2013-01-20 21:34:37 +00:00
Renato Golin e1fb059327 Revert CostTable algorithm, will re-write
llvm-svn: 172992
2013-01-20 20:57:20 +00:00
Craig Topper ce61fdf0a3 Make LowerVSETCC a static function and use MVT instead of EVT.
llvm-svn: 172969
2013-01-20 09:02:22 +00:00
Nadav Rotem 9450fcfff1 Revert 172708.
The optimization handles esoteric cases but adds a lot of complexity both to the X86 backend and to other backends.
This optimization disables an important canonicalization of chains of SEXT nodes and makes SEXT and ZEXT asymmetrical.
Disabling the canonicalization of consecutive SEXT nodes into a single node disables other DAG optimizations that assume
that there is only one SEXT node. The AVX mask optimization is one example. Additionally, this optimization does not update the cost model.

llvm-svn: 172968
2013-01-20 08:35:56 +00:00
Craig Topper 9976974cc6 Make some helper methods static.
llvm-svn: 172936
2013-01-20 00:50:58 +00:00
Craig Topper 4ac87da529 Remove DebugLoc argument from static function. It can easily be obtained from the SVOp passed in.
llvm-svn: 172935
2013-01-20 00:43:42 +00:00
Craig Topper 3da6507c41 Use MVT instead of EVT in more instruction lowering code.
llvm-svn: 172933
2013-01-20 00:38:18 +00:00
Craig Topper 53c7fbabbf Use MVT instead of EVT in more of the shuffle lowering code.
llvm-svn: 172930
2013-01-19 23:36:09 +00:00
Craig Topper bb772d27a7 Capitalize LowerVectorIntExtend to be consistent with all the other lower functions in this file.
llvm-svn: 172927
2013-01-19 23:14:09 +00:00
Nadav Rotem 7b3120b9ae On Sandybridge, split unaligned 256-bit stores into two XMM-sized stores.
llvm-svn: 172894
2013-01-19 08:38:41 +00:00
Craig Topper 84b01120bc Use MVT instead of EVT when computing shuffle immediates since they can only be for legal types. Keeps compiler from generating unneeded checks and handling for extended types.
llvm-svn: 172893
2013-01-19 08:27:45 +00:00
Nadav Rotem 7431211214 On Sandybridge, loading an unaligned 256-bit value using two XMM loads (vmovups and vinsertf128) is faster than using a single 256-bit vmovups instruction.
llvm-svn: 172868
2013-01-18 23:10:30 +00:00
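The two load shapes compared in the commit above can be written with AVX intrinsics as in this sketch (assumes an AVX-capable compiler; loadSingle/loadSplit are illustrative names): a single unaligned 256-bit vmovups versus two 128-bit loads combined with vinsertf128.

#include <immintrin.h>
#include <cstdio>

// One unaligned 256-bit load (a single vmovups).
static __m256 loadSingle(const float *p) {
  return _mm256_loadu_ps(p);
}

// Two unaligned 128-bit loads glued together with vinsertf128 -- the form
// preferred on Sandy Bridge.
static __m256 loadSplit(const float *p) {
  __m256 lo = _mm256_castps128_ps256(_mm_loadu_ps(p));
  return _mm256_insertf128_ps(lo, _mm_loadu_ps(p + 4), 1);
}

int main() {
  float buf[9] = {0, 1, 2, 3, 4, 5, 6, 7, 8};
  float out[8];
  _mm256_storeu_ps(out, loadSingle(buf + 1)); // deliberately misaligned source
  std::printf("single: %f\n", out[0]);
  _mm256_storeu_ps(out, loadSplit(buf + 1));
  std::printf("split:  %f\n", out[7]);
  return 0;
}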
Craig Topper 1cb8aa581b Calculate vector element size more directly for VINSERTF128/VEXTRACTF128 immediate handling. Also use MVT since this only called on legal types during pattern matching.
llvm-svn: 172797
2013-01-18 08:41:28 +00:00
Craig Topper e938138daf Minor formatting fix. No functional change.
llvm-svn: 172795
2013-01-18 07:27:20 +00:00
Craig Topper 908f7d14b5 Spelling fix: extened->extended. Trailing whitespace in same function.
llvm-svn: 172793
2013-01-18 06:50:59 +00:00
Craig Topper 01fcf2e2f2 Make more use of is128BitVector/is256BitVector in place of getSizeInBits() == 128/256.
llvm-svn: 172792
2013-01-18 06:44:29 +00:00
Chad Rosier 1e8f053bd1 [ms-inline asm] Make the error message more generic now that we support the
'SIZE' and 'LENGTH' operators.

llvm-svn: 172773
2013-01-18 00:50:59 +00:00
Chad Rosier d0ed73acb4 [ms-inline asm] Add support for the 'SIZE' and 'LENGTH' operators.
Part of rdar://12576868

llvm-svn: 172743
2013-01-17 19:21:48 +00:00
Elena Demikhovsky f6a30e05d5 Optimization for the following SIGN_EXTEND pairs:
v8i8  -> v8i64, 
v8i8  -> v8i32, 
v4i8  -> v4i64, 
v4i16 -> v4i64 
for AVX and AVX2.

Bug 14865.

llvm-svn: 172708
2013-01-17 09:59:53 +00:00
Craig Topper c7e6feee42 Combine AVX and SSE forms of MOVSS and MOVSD into the same multiclasses so they get instantiated together.
llvm-svn: 172704
2013-01-17 06:59:42 +00:00
Jakob Stoklund Olesen 213a2f8b3f Provide a place for targets to insert ILP optimization passes.
Move the early if-conversion pass into this group.

ILP optimizations usually need to find the right balance between
register pressure and ILP using the MachineTraceMetrics analysis to
identify critical paths and estimate other costs. Such passes should run
together so they can share dominator tree and loop info analyses.

Besides if-conversion, future passes to run here could include
expression height reduction and ARM's MLxExpansion pass.

llvm-svn: 172687
2013-01-17 00:58:38 +00:00
Renato Golin f104c4c4ca Change CostTable model to be global to all targets
Moving the X86CostTable to a common place, so that other back-ends
can share the code. Also simplifying it a bit and commoning up
tables with one and two types on operations.

llvm-svn: 172658
2013-01-16 21:29:55 +00:00
Chad Rosier 5c118fd2ec [ms-inline asm] Extend support for parsing Intel bracketed memory operands that
have an arbitrary ordering of the base register, index register and displacement.
rdar://12527141

llvm-svn: 172484
2013-01-14 22:31:35 +00:00
Craig Topper 0d2c29e807 Simplify nested strconcats in X86 td files since strconcat can take more than 2 arguments.
llvm-svn: 172379
2013-01-14 07:46:34 +00:00
Craig Topper 4c69a05d2d Create a single multiclass for SSE and AVX version of MOVL/MOVH. Prevents needing to specify everything twice. No functional change intended
llvm-svn: 172378
2013-01-14 07:26:58 +00:00
Nick Lewycky f41a80efd0 Fix typo in comment.
llvm-svn: 172364
2013-01-13 19:03:55 +00:00
Benjamin Kramer bcd14a0f26 X86: Add patterns for X86ISD::VSEXT in registers.
Those can occur when something between the sextload and the store is on the same
chain and blocks isel. Fixes PR14887.

llvm-svn: 172353
2013-01-13 11:37:04 +00:00
Preston Gurd 99c6990457 Update patch for the pad short functions pass for Intel Atom (only).
Adds a check for -Oz, changes the code to not re-visit BBs,
and skips over DBG_VALUE instrs.

Patch by Andy Zhang.

llvm-svn: 172258
2013-01-11 22:06:56 +00:00
NAKAMURA Takumi 7f25427686 X86AsmParser.cpp: Fix up r172148, to add initializer in another CreateMem().
llvm-svn: 172157
2013-01-11 01:13:54 +00:00
Jakub Staszak ab3d878f35 Remove heavy and unused #includes from X86TargetObjectFile.cpp.
llvm-svn: 172151
2013-01-10 23:43:56 +00:00
Chad Rosier 8c2a9c744e [ms-inline asm] Make sure we set a default value for AddressOf. Follow on to
r172121.

llvm-svn: 172148
2013-01-10 23:39:07 +00:00
Chad Rosier a4bc9437a2 [ms-inline asm] Add support for calling functions from inline assembly.
Part of rdar://12991541

llvm-svn: 172121
2013-01-10 22:10:27 +00:00
Nadav Rotem b1791a75cd ARM Cost model: Use the size of vector registers and widest vectorizable instruction to determine the max vectorization factor.
llvm-svn: 172010
2013-01-09 22:29:00 +00:00
Nadav Rotem 977e0be4a0 Efficient lowering of vector sdiv when the divisor is a splatted power of two constant.
PR 14848. The lowered sequence is based on the existing sequence the target-independent
DAG Combiner creates for the scalar case.

Patch by Zvi Rackover.

llvm-svn: 171953
2013-01-09 05:14:33 +00:00
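For reference, the scalar sequence the target-independent DAG combiner uses for a signed divide by a power of two, which the vector lowering above applies lane-wise, can be sketched in C++ as follows (assumes 1 <= k <= 31 and arithmetic right shift of signed values, as on x86 compilers):

#include <cstdint>
#include <cstdio>

static int32_t sdivPow2(int32_t x, unsigned k) {
  int32_t sign = x >> 31;                                   // sra: 0 or -1
  uint32_t bias = static_cast<uint32_t>(sign) >> (32 - k);  // srl: 0 or 2^k-1
  return (x + static_cast<int32_t>(bias)) >> k;             // add + sra
}

int main() {
  std::printf("%d %d\n", sdivPow2(-7, 2), -7 / 4); // both print -1
  std::printf("%d %d\n", sdivPow2(25, 3), 25 / 8); // both print 3
  return 0;
}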
Eric Christopher bf7bc4966c Last in the series of removing unnecessary '0' arguments for
address space. Reordered the EmitULEB128IntValue arguments to
make this easier.

llvm-svn: 171949
2013-01-09 03:52:05 +00:00
Andrew Trick 9f0b95f260 MIsched: add an ILP window property to machine model.
This was an experimental option, but needs to be defined
per-target. e.g. PPC A2 needs to aggressively hide latency.

I converted some in-order scheduling tests to A2. Hal is working on
more test cases.

llvm-svn: 171946
2013-01-09 03:36:49 +00:00
Eric Christopher e3ab3d0e2c These functions have default arguments of 0 for the last arg. Use
them.

llvm-svn: 171933
2013-01-09 01:57:54 +00:00
Nadav Rotem b696c36fcd Cost Model: Move the 'max unroll factor' variable to the TTI and add initial Cost Model support on ARM.
llvm-svn: 171928
2013-01-09 01:15:42 +00:00
Preston Gurd a01daace88 Pad Short Functions for Intel Atom
The current Intel Atom microarchitecture has a feature whereby
when a function returns early then it is slightly faster to execute
a sequence of NOP instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction until
the return address is ready.

When compiling for X86 Atom only, this patch will run a pass,
called "X86PadShortFunction" which will add NOP instructions where less
than four cycles elapse between function entry and return.

It includes tests.

This patch has been updated to address Nadav's review comments
- Optimize only at >= O1 and don't do optimization if -Os is set
- Stores MachineBasicBlock* instead of BBNum
- Uses DenseMap instead of std::map
- Fixes placement of braces

Patch by Andy Zhang.

llvm-svn: 171879
2013-01-08 18:27:24 +00:00
Eli Bendersky 4d9ada036c Renamed MCInstFragment to MCRelaxableFragment and added some comments.
No change in functionality.

llvm-svn: 171822
2013-01-08 00:22:56 +00:00
Jordan Rose e8f1eaea8a Change SMRange to be half-open (exclusive end) instead of closed (inclusive)
This is necessary not only for representing empty ranges, but for handling
multibyte characters in the input. (If the end pointer in a range refers to
a multibyte character, should it point to the beginning or the end of the
character in a char array?) Some of the code in the asm parsers was already
assuming this anyway.

llvm-svn: 171765
2013-01-07 19:00:49 +00:00
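A small sketch of why the half-open convention is convenient (plain C++, not the actual SMRange class): the length is simply End - Begin and an empty range is Begin == End, with no special cases.

#include <cstdio>

struct Range {
  const char *Begin; // first character of the range
  const char *End;   // one past the last character (exclusive)
};

static void dump(Range R) {
  std::printf("len=%td text=\"%.*s\"\n", R.End - R.Begin,
              static_cast<int>(R.End - R.Begin), R.Begin);
}

int main() {
  const char *Buf = "mov %eax, %ebx";
  dump({Buf, Buf + 3});     // "mov"
  dump({Buf + 3, Buf + 3}); // empty range, representable without a sentinel
  return 0;
}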
Craig Topper 25cdf92b34 Remove # from the beginning and end of def names.
llvm-svn: 171696
2013-01-07 05:26:58 +00:00
Craig Topper bd62d64cbf Remove unnecessary # tokens at the beginning and end of defm names.
llvm-svn: 171694
2013-01-07 05:04:39 +00:00
Chandler Carruth 2109f47d97 Fix the enumerator names for ShuffleKind to match the coding standards,
and make its comments doxygen comments.

llvm-svn: 171688
2013-01-07 03:20:02 +00:00
Chandler Carruth 50a36cd148 Make the popcnt support enums and methods have more clear names and
follow the coding conventions regarding enumerating a set of "kinds" of
things.

llvm-svn: 171687
2013-01-07 03:16:03 +00:00
Chandler Carruth d3e73556d6 Move TargetTransformInfo to live under the Analysis library. This no
longer would violate any dependency layering and it is in fact an
analysis. =]

llvm-svn: 171686
2013-01-07 03:08:10 +00:00
Chandler Carruth 664e354de7 Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.

The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least one
implementation.

The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but the
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.

The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.

The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.

The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.

This fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.

The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.

Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits, this one is clearly big enough.

Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.

Commits to update DragonEgg and Clang will be made presently.

llvm-svn: 171681
2013-01-07 01:37:14 +00:00
Craig Topper 4f1c7256f9 Fix suffix handling for parsing and printing of cvtsi2ss, cvtsi2sd, cvtss2si, cvttss2si, cvtsd2si, and cvttsd2si to match gas behavior.
cvtsi2* should parse with an 'l' or 'q' suffix or no suffix at all. No suffix should be treated the same as 'l' suffix. Printing should always print a suffix. Previously we didn't parse or print an 'l' suffix.
cvtt*2si/cvt*2si should parse with an 'l' or 'q' suffix or no suffix at all. No suffix should use the destination register size to choose the encoding. Printing should not print a suffix.

Original 'l' suffix issue with cvtsi2* pointed out by Michael Kuperstein.

llvm-svn: 171668
2013-01-06 20:39:29 +00:00
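The suffix behavior described above can be exercised from C++ with GCC/Clang extended inline asm on x86-64 (AT&T syntax; a sketch, assuming a toolchain that accepts the 'l' and 'q' suffixed mnemonics):

#include <cstdio>

int main() {
  long long i64 = 42;
  int i32 = -7;
  float f1, f2;
  // 'q' suffix: 64-bit GPR source; 'l' suffix (or no suffix): 32-bit source.
  asm("cvtsi2ssq %1, %0" : "=x"(f1) : "r"(i64));
  asm("cvtsi2ssl %1, %0" : "=x"(f2) : "r"(i32));
  std::printf("%f %f\n", f1, f2);
  return 0;
}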
Evan Cheng 3fb03e23a4 Fix for PR14739. It's not safe to fold a load into a call across a store. Thanks to Nick Lewycky for the initial patch.
llvm-svn: 171665
2013-01-06 19:00:15 +00:00
Craig Topper 92a70b1e65 Recommit r171461 which was incorrectly reverted. Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks.
llvm-svn: 171608
2013-01-05 07:39:25 +00:00
Nadav Rotem 478b6a47ec Revert revision 171524. Original message:
URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev
Log:
The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171603
2013-01-05 05:42:48 +00:00
Jakub Staszak 43fafaf496 Move 'break' to the right place to prevent fallthrough. There is no test case
because the conditions in the next case prevented it from doing anything nasty.

llvm-svn: 171549
2013-01-04 23:01:26 +00:00
Preston Gurd e36b685a94 The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171524
2013-01-04 20:54:54 +00:00
Nadav Rotem e1d5c4b8b9 LoopVectorizer:
1. Add code to estimate register pressure.
2. Add code to select the unroll factor based on register pressure.
3. Add bits to TargetTransformInfo to provide the number of registers.

llvm-svn: 171469
2013-01-04 17:48:25 +00:00
Nadav Rotem c616a5408a Revert revision: 171467. This transformation is incorrect and makes some tests fail. Original message:
Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1.
Added a test.

llvm-svn: 171468
2013-01-04 17:35:21 +00:00
Elena Demikhovsky 5f2f06d2d9 Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1.
Added a test.

llvm-svn: 171467
2013-01-03 08:48:33 +00:00
Michael Gottesman 820aac1c78 Revert "Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks."
This reverts commit r171461 since it breaks the following tests:

Clang :: Analysis/outofbound-notwork.c
Clang :: Analysis/string-fail.c
Clang :: CXX/basic/basic.lookup/basic.lookup.qual/p6-0x.cpp
Clang :: CXX/basic/basic.lookup/basic.lookup.unqual/p15.cpp
Clang :: CXX/dcl.dcl/dcl.spec/dcl.fct.spec/p4.cpp
Clang :: CXX/dcl.dcl/dcl.spec/dcl.stc/p10.cpp
Clang :: CXX/temp/temp.param/p14.cpp
Clang :: CXX/temp/temp.res/temp.dep.res/temp.point/p1.cpp
Clang :: CodeGen/2009-02-13-zerosize-union-field-ppc.c
Clang :: CodeGen/blocks-2.c
Clang :: CodeGen/libcalls-d.c
Clang :: CodeGen/libcalls-ld.c
Clang :: CodeGenCXX/conversion-function.cpp
Clang :: CodeGenCXX/debug-info-limit-type.cpp
Clang :: CodeGenCXX/inheriting-constructor.cpp
Clang :: FixIt/fixit-errors.c
Clang :: FixIt/fixit-pmem.cpp
Clang :: Modules/namespaces.cpp
Clang :: PCH/changed-files.c
Clang :: PCH/pr4489.c
Clang :: PCH/source-manager-stack.c
Clang :: Parser/cxx-ambig-decl-expr-xfail.cpp
Clang :: SemaCXX/switch-implicit-fallthrough-cxx98.cpp
Clang :: SemaTemplate/instantiate-function-1.mm

llvm-svn: 171466
2013-01-03 08:18:30 +00:00
Craig Topper 7c27cc9fd0 Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks.
llvm-svn: 171461
2013-01-03 06:40:20 +00:00