Commit Graph

13648 Commits

Author SHA1 Message Date
Johnny Chen c86256fa5d Add NVTBLFrm to represent A8.6.406 VTBL, VTBX Vector Table Lookup Instructions.
These instructions use byte index in a control vector (M:Vm) to lookup byte
values in a table and generate a new vector (D:Vd).  The table is specified via
a list of vectors, which can be:

{Dn}
{Dn D<n+1>}
{Dn D<n+1> D<n+2>}
{Dn D<n+1> D<n+2> D<n+3>}

llvm-svn: 99789
2010-03-29 01:14:22 +00:00
Chris Lattner 11f85ccf7d zap an extra line that Eli noticed!
llvm-svn: 99770
2010-03-28 18:52:28 +00:00
Chris Lattner b7c48433df fix a type contradition: XCoreISD::RETSP has one argument, not zero.
llvm-svn: 99760
2010-03-28 08:47:39 +00:00
Chris Lattner 505849d277 remove a pattern with no testcase that doesn't appear to be
matchable: it seems like it would always constant fold.

llvm-svn: 99758
2010-03-28 08:40:48 +00:00
Chris Lattner 3dad5fbeb9 fix integer negates to use the proper type for the zero vectors,
this also depends on the new "bitconvert dropping" behavior just
added to tblgen.

llvm-svn: 99757
2010-03-28 08:39:10 +00:00
Chris Lattner 240154e633 fix a typo, bitconvert from node to itself isn't valid.
llvm-svn: 99755
2010-03-28 08:36:45 +00:00
Chris Lattner 6c223ee0e9 fix vnot matching to explicitly specify the type of the
input to be v8i8 or v16i8, which buildvectors get canonicalized to.

This allows the patterns that were previously using a bare 'vnot' to
match, before they couldn't.

llvm-svn: 99754
2010-03-28 08:08:07 +00:00
Chris Lattner 1c85e3476d fix up vnot matching, eliminating a dead pattern, correcting a couple of
patterns that would never match because of bitcast, and eliminating use
of vnot_conv.

llvm-svn: 99753
2010-03-28 08:00:23 +00:00
Chris Lattner e549d9b1f2 stop using vnot_conv
llvm-svn: 99750
2010-03-28 07:48:17 +00:00
Chris Lattner 227a83d6ed revert r99743, this is saying that the repmovs instructinos have an
*input* of other type, which is the VT. 

llvm-svn: 99749
2010-03-28 07:38:39 +00:00
Chris Lattner be980f2df7 remove a bunch of dead patterns.
llvm-svn: 99748
2010-03-28 07:38:00 +00:00
Chris Lattner cba70c8162 claiming to return other is pointless.
llvm-svn: 99743
2010-03-28 05:57:36 +00:00
Chris Lattner a520b166dc Improve systemz to model cmp and ucmp nodes as returning
their flags correctly.

llvm-svn: 99738
2010-03-28 05:21:52 +00:00
Chris Lattner e83591c616 the FPCmp node returns an i32.
llvm-svn: 99737
2010-03-28 05:12:57 +00:00
Chris Lattner ec5fe65838 fix some modelling problems exposed by a patch I'm working on. bsr/bsf/ptest
nodes all have an EFLAGS result when made by isel lowering.

llvm-svn: 99736
2010-03-28 05:07:17 +00:00
Bob Wilson 0f8a02830a Fix indentation.
llvm-svn: 99705
2010-03-27 04:01:23 +00:00
Bob Wilson cf603fb1c5 Add a format argument to the N3V and N3VX classes, removing the N3Vf class.
llvm-svn: 99704
2010-03-27 03:56:52 +00:00
Chris Lattner 07943af506 eliminate the last of the parallel's!
llvm-svn: 99700
2010-03-27 02:47:14 +00:00
Johnny Chen 6094cdab9f Add NVMulSLFrm to represent "3-register multiply with scalar" operations and set
it as the format for the appropriate N3V*SL*<> classes.  These instructions
require special handling of the M:Vm field which encodes the restricted Dm and
the lane index within Dm.

Examples are A8.6.325 VMLA, VMLAL, VMLS, VMLSL (by scalar):

	vmlal.s32	q3, d2, d10[0]

llvm-svn: 99690
2010-03-27 01:03:13 +00:00
Chris Lattner c5e20d9031 eliminate almost all the rest of the x86-32 parallels.
llvm-svn: 99686
2010-03-27 00:45:04 +00:00
Jim Grosbach 44313db557 Thumb2 storeFrom/LoadToStackSlot() need to handle tGPR regs directly, not pass
through to the generic version. The generic functions use STR/LDR, but T2
needs the t2STR/t2LDR instead so we get the addressing mode correct.

llvm-svn: 99678
2010-03-27 00:09:12 +00:00
Johnny Chen 93acfbf441 Remove the duplicate multiclass N3VSh_QHSD and use N3VInt_QHSD which is modified
to now take a format argument.  N3VDInt<> and N3VQInt<> are modified to take a
format argument as well.

llvm-svn: 99676
2010-03-26 23:49:07 +00:00
Johnny Chen 0b57de3c4c Add NVExtFrm to represent NEON Vector Extract Instructions, that uses Inst{11-8}
to encode the byte location of the extracted result in the concatenation of the
operands, from the least significant end.

Modify VEXTd and VEXTq classes to use the format.

llvm-svn: 99659
2010-03-26 22:28:56 +00:00
Johnny Chen 2cf04957c2 Add N3RegVShFrm to represent 3-Register Vector Shift Instructions, which do not
follow the N3RegFrm's operand order of D:Vd N:Vn M:Vm.  The operand order of
N3RegVShFrm is D:Vd M:Vm N:Vn (notice that M:Vm is the first src operand).

Add a parent class N3Vf which requires passing a Format argument and which the
N3V class is modified to inherit from.  N3V class represents the "normal"
3-Register NEON Instructions with N3RegFrm.

Also add a multiclass N3VSh_QHSD to represent clusters of NEON 3-Register Shift
Instructions and replace 8 invocations with it.

llvm-svn: 99655
2010-03-26 21:26:28 +00:00
Jim Grosbach bf59859b2b vldm/vstm can only do up to 16 double-word registers at a time.
Radar 7797856

llvm-svn: 99630
2010-03-26 18:41:09 +00:00
Johnny Chen 8fc94d6362 Add N3RegFrm to represent "NEON 3 vector register format" instructions.
Examples are VABA (Vector Absolute Difference and Accumulate), VABAL (Vector
Absolute Difference and Accumulate Long), and VABD (Vector Absolute Difference).

llvm-svn: 99628
2010-03-26 18:32:20 +00:00
Evan Cheng 3365fb1412 Do not sibcall if stack needs to be dynamically aligned.
llvm-svn: 99620
2010-03-26 16:26:03 +00:00
Evan Cheng 00a620c61e Allow trivial sibcall of vararg callee when no arguments are being passed.
llvm-svn: 99598
2010-03-26 02:13:13 +00:00
Johnny Chen 5d4e917d9f Add N2RegVShLFrm and N2RegVShRFrm formats so that the disassembler can easily
dispatch to the appropriate routines to handle the different interpretations of
the shift amount encoded in the imm6 field.  The Vd, Vm fields are interpreted
the same between the two, though.

See, for example, A8.6.367 VQSHL, VQSHLU (immediate) for N2RegVShLFrm format and
A8.6.368 VQSHRN, VQSHRUN for N2RegVShRFrm format.

llvm-svn: 99590
2010-03-26 01:07:59 +00:00
Jim Grosbach 71fcb4fedd switch the flag for using NEON for SP floating point to a subtarget 'feature'.
Re-commit. This time complete with testsuite updates.

llvm-svn: 99570
2010-03-25 23:47:34 +00:00
Jim Grosbach 42bb89c7d9 need to fix 'make check' tests first. revert for a moment.
llvm-svn: 99569
2010-03-25 23:34:05 +00:00
Jim Grosbach 7fce4e39aa switch the flag for using NEON for SP floating point to a subtarget 'feature'
llvm-svn: 99568
2010-03-25 23:32:19 +00:00
Johnny Chen a3617ec88a Removed instruction class NI from ARMInstrFormats.td.
It doesn't seem to be used anywhere.

llvm-svn: 99566
2010-03-25 23:11:56 +00:00
Jim Grosbach a43386ba8f switch the use-vml[as] instructions flag to a subtarget 'feature'
llvm-svn: 99565
2010-03-25 23:11:16 +00:00
Johnny Chen 91d2774416 Add NVDupLnFrm and change NVDupLane class to use that format.
llvm-svn: 99557
2010-03-25 21:49:12 +00:00
Jim Grosbach 4b3b2ef65c ARM cortex-a8 doesn't do vmla/vmls well. disable them by default for that cpu
llvm-svn: 99549
2010-03-25 20:48:50 +00:00
Johnny Chen d82f9002e4 Add NVCVTFrm (NEON Convert with fractional bits immediate) and modify N2VImm to
expect a Format arg.  N2VCvtD/N2VCvtQ are modified to use the NVCVTFrm format.

llvm-svn: 99548
2010-03-25 20:39:04 +00:00
Daniel Dunbar d919276bc0 Fix -Asserts warning, again.
llvm-svn: 99542
2010-03-25 19:35:53 +00:00
Jakob Stoklund Olesen 3758ff917e Tag SSE2 integer instructions as SSEPackedInt.
llvm-svn: 99540
2010-03-25 18:52:04 +00:00
Jakob Stoklund Olesen f8d7eda663 Teach TableGen to understand X.Y notation in the TSFlagsFields strings.
Remove much horribleness from X86InstrFormats as a result. Similar
simplifications are probably possible for other targets.

llvm-svn: 99539
2010-03-25 18:52:01 +00:00
Jakob Stoklund Olesen 49e121d5e4 Add a late SSEDomainFix pass that twiddles SSE instructions to avoid domain crossings.
On Nehalem and newer CPUs there is a 2 cycle latency penalty on using a register
in a different domain than where it was defined. Some instructions have
equvivalents for different domains, like por/orps/orpd.

The SSEDomainFix pass tries to minimize the number of domain crossings by
changing between equvivalent opcodes where possible.

This is a work in progress, in particular the pass doesn't do anything yet. SSE
instructions are tagged with their execution domain in TableGen using the last
two bits of TSFlags. Note that not all instructions are tagged correctly. Life
just isn't that simple.

The SSE execution domain issue is very similar to the ARM NEON/VFP pipeline
issue handled by NEONMoveFixPass. This pass may become target independent to
handle both.

llvm-svn: 99524
2010-03-25 17:25:00 +00:00
Johnny Chen 45ab3f3ccf Added a new instruction class NVDupLane to be inherited by VDUPLND and VDUPLNQ,
instead of the current N2V.  Format of NVDupLane instances are set to NEONFrm
currently.

llvm-svn: 99518
2010-03-25 17:01:27 +00:00
Bob Wilson e543e7fcb1 Reapply Kevin's change 94440, now that Chris has fixed the limitation on
opcode values fitting in one byte (svn r99494).

llvm-svn: 99514
2010-03-25 16:36:14 +00:00
Chris Lattner 23bf99a97c eliminate a bunch more parallels now that scheduling
handles dead implicit results more aggressively.  More
to come, I think this is now just a data entry problem.

llvm-svn: 99486
2010-03-25 05:44:01 +00:00
Evan Cheng b07a29ecd4 Disable folding loads into tail call in 32-bit PIC mode. It can introduce illegal code like this:
addl    $12, %esp
        popl    %esi
        popl    %edi
        popl    %ebx
        popl    %ebp
        jmpl    *__Block_deallocator-L1$pb(%esi)  # TAILCALL

The problem is the global base register is assigned GR32 register class. TCRETURNmi needs the registers making up the address mode to have the GR32_TC register class.

The *proper* fix is for X86DAGToDAGISel::getGlobalBaseReg() to return a copy from the global base register of the machine function rather than returning the register itself. But that has the potential of causing it to be coalesced to a more restrictive register class: GR32_TC. It can introduce additional copies and spills. For something as important the PIC base, it's not worth it especially since this is not an issue on 64-bit.

llvm-svn: 99455
2010-03-25 00:10:31 +00:00
Bob Wilson 5b2da69f6d Speculatively revert this to see if it fixes buildbot failures.
--- Reverse-merging r99440 into '.':
U    test/MC/AsmParser/X86/x86_32-bit_cat.s
U    test/MC/AsmParser/X86/x86_32-encoding.s
U    include/llvm/IntrinsicsX86.td
U    include/llvm/CodeGen/SelectionDAGNodes.h
U    lib/Target/X86/X86InstrSSE.td
U    lib/Target/X86/X86ISelLowering.h

llvm-svn: 99450
2010-03-24 23:26:29 +00:00
Kevin Enderby f5584a7397 Added the Advanced Encryption Standard (AES) Instructions.
llvm-svn: 99440
2010-03-24 22:33:33 +00:00
Jim Grosbach 34de7768bf Make the use of the vmla and vmls VFP instructions controllable via cmd line.
Preliminary testing shows significant performance wins by not using these
instructions.

llvm-svn: 99436
2010-03-24 22:31:46 +00:00
Kevin Enderby b96eb68497 Fixed the SS42AI template for the SSE 4.2 instructions with TA prefix so it does
not get an "Unknown immediate size" assert failure when used.  All instructions 
of this form have an 8-bit immediate.  Also added a test case of an example
instruction that is of this form.

llvm-svn: 99435
2010-03-24 22:28:42 +00:00
Nate Begeman 2ceb288416 Per chris's request, add some comments.
llvm-svn: 99434
2010-03-24 22:19:06 +00:00