Commit Graph

3876 Commits

Author SHA1 Message Date
Hal Finkel 0d8db46799 [PowerPC] Add global named register support
Support for the intrinsics that read from and write to global named registers
is added for r1, r2 and r13 (depending on the subtarget).

llvm-svn: 208509
2014-05-11 19:29:11 +00:00
Hal Finkel c4c6c87666 [PowerPC] On PPC32, 128-bit shifts might be runtime calls
The counter-loops formation pass needs to know what operations might be
function calls (because they can't appear in counter-based loops). On PPC32,
128-bit shifts might be runtime calls (even though you can't use __int128 on
PPC32, it seems that SROA might form them).

Fixes PR19709.

llvm-svn: 208501
2014-05-11 16:23:29 +00:00
Rafael Espindola 566fcfe69b Remove the UseCFI option from createAsmStreamer.
We were already always passing true, this just removes the option.

llvm-svn: 208205
2014-05-07 13:00:43 +00:00
Rafael Espindola 3d082fa507 Fix pr19645.
The fix itself is fairly simple: move getAccessVariant to MCValue so that we
replace the old weak expression evaluation with the far more general
EvaluateAsRelocatable.

This then requires that EvaluateAsRelocatable stop when it finds a non
trivial reference kind. And that in turn requires the ELF writer to look
harder for weak references.

Last but not least, this found a case where we were being bug by bug
compatible with gas and accepting an invalid input. I reported pr19647
to track it.

llvm-svn: 207920
2014-05-03 19:57:04 +00:00
Craig Topper 2d2aa0ca1f Use makeArrayRef insted of calling ArrayRef<T> constructor directly. I introduced most of these recently.
llvm-svn: 207616
2014-04-30 07:17:30 +00:00
Craig Topper ee7b0f3956 De-virtualize or remove some methods that have no overrides nor override anything. In some cases remove all together if there are no callers either.
llvm-svn: 207610
2014-04-30 05:53:27 +00:00
Craig Topper 0d3fa92514 [C++11] Add 'override' keywords and remove 'virtual'. Additionally add 'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. PowerPC edition
llvm-svn: 207504
2014-04-29 07:57:37 +00:00
Eric Christopher 612bb69bf7 None of these targets actually define their own CFI_INSTRUCTION
opcode so there's no reason to use the target namespace for it
rather than TargetOpcode.

llvm-svn: 207475
2014-04-29 00:16:46 +00:00
Eric Christopher d17374919b 80-column, tab characters, comment fixups.
llvm-svn: 207473
2014-04-29 00:16:40 +00:00
Craig Topper 8c0b4d0791 Convert more SelectionDAG functions to use ArrayRef.
llvm-svn: 207397
2014-04-28 05:57:50 +00:00
Craig Topper e73658ddbb [C++] Use 'nullptr'.
llvm-svn: 207394
2014-04-28 04:05:08 +00:00
Craig Topper 481fb2879f Convert SelectionDAG::SelectNodeTo to use ArrayRef.
llvm-svn: 207377
2014-04-27 19:21:11 +00:00
Craig Topper 64941d9786 Convert SelectionDAG::getMergeValues to use ArrayRef.
llvm-svn: 207374
2014-04-27 19:20:57 +00:00
Craig Topper 206fcd450a Convert getMemIntrinsicNode to take ArrayRef of SDValue instead of pointer and size.
llvm-svn: 207329
2014-04-26 19:29:41 +00:00
Craig Topper 48d114bed1 Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>.
llvm-svn: 207327
2014-04-26 18:35:24 +00:00
Craig Topper 062a2baef0 [C++] Use 'nullptr'. Target edition.
llvm-svn: 207197
2014-04-25 05:30:21 +00:00
Reid Kleckner 5772b77789 Add 'musttail' marker to call instructions
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur.  It also comes with convervative IR
verification rules that ensure that tail call optimization is possible.

Reviewers: nicholas

Differential Revision: http://llvm-reviews.chandlerc.com/D3240

llvm-svn: 207143
2014-04-24 20:14:34 +00:00
David Blaikie 908f4d4bf5 Spread some const around for non-mutating uses of MCSymbolData.
I discovered this const-hole while attempting to coalesnce the Symbol
and SymbolMap data structures. There's some pending issues with that,
but I figured this change was easy to flush early.

llvm-svn: 207124
2014-04-24 16:59:40 +00:00
Evgeniy Stepanov 0a951b775e Create MCTargetOptions.
For now it contains a single flag, SanitizeAddress, which enables
AddressSanitizer instrumentation of inline assembly.

Patch by Yuri Gorshenin.

llvm-svn: 206971
2014-04-23 11:16:03 +00:00
Chandler Carruth 84e68b2994 [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Target/...
edition.

llvm-svn: 206842
2014-04-22 02:41:26 +00:00
Chandler Carruth d174b72a28 [cleanup] Lift using directives, DEBUG_TYPE definitions, and even some
system headers above the includes of generated '.inc' files that
actually contain code. In a few targets this was already done pretty
consistently, but it wasn't done *really* consistently anywhere. It is
strictly cleaner IMO and necessary in a bunch of places where the
DEBUG_TYPE is referenced from the generated code. Consistency with the
necessary places trumps. Hopefully the build bots are OK with the
movement of intrin.h...

llvm-svn: 206838
2014-04-22 02:03:14 +00:00
Chandler Carruth e96dd8975f [Modules] Make Support/Debug.h modular. This requires it to not change
behavior based on other files defining DEBUG_TYPE, which means it cannot
define DEBUG_TYPE at all. This is actually better IMO as it forces folks
to define relevant DEBUG_TYPEs for their files. However, it requires all
files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't
already. I've updated all such files in LLVM and will do the same for
other upstream projects.

This still leaves one important change in how LLVM uses the DEBUG_TYPE
macro going forward: we need to only define the macro *after* header
files have been #include-ed. Previously, this wasn't possible because
Debug.h required the macro to be pre-defined. This commit removes that.
By defining DEBUG_TYPE after the includes two things are fixed:

- Header files that need to provide a DEBUG_TYPE for some inline code
  can do so by defining the macro before their inline code and undef-ing
  it afterward so the macro does not escape.

- We no longer have rampant ODR violations due to including headers with
  different DEBUG_TYPE definitions. This may be mostly an academic
  violation today, but with modules these types of violations are easy
  to check for and potentially very relevant.

Where necessary to suppor headers with DEBUG_TYPE, I have moved the
definitions below the includes in this commit. I plan to move the rest
of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big
enough.

The comments in Debug.h, which were hilariously out of date already,
have been updated to reflect the recommended practice going forward.

llvm-svn: 206822
2014-04-21 22:55:11 +00:00
Nick Lewycky aad475b324 Break PseudoSourceValue out of the Value hierarchy. It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* and Value* is strongly encouraged to use a MachinePointerInfo instead.
llvm-svn: 206255
2014-04-15 07:22:52 +00:00
Lang Hames a1bc0f5662 [MC] Require an MCContext when constructing an MCDisassembler.
This patch re-introduces the MCContext member that was removed from
MCDisassembler in r206063, and requires that an MCContext be passed in at
MCDisassembler construction time. (Previously the MCContext member had been
initialized in an ad-hoc fashion after construction). The MCCContext member
can be used by MCDisassembler sub-classes to construct constant or
target-specific MCExprs.

This patch updates disassemblers for in-tree targets, and provides the
MCRegisterInfo instance that some disassemblers were using through the
MCContext (previously those backends were constructing their own
MCRegisterInfo instances).

llvm-svn: 206241
2014-04-15 04:40:56 +00:00
Hal Finkel 0192cbac66 [PowerPC] [Constant Hoisting] Enable constant hoisting on PPC
Implements the various TTI functions to enable constant hoisting on PPC. The
only significant test-suite change is this:

MultiSource/Benchmarks/VersaBench/bmm/bmm - 20% speedup
(which essentially reverses the slowdown from r206120).

llvm-svn: 206141
2014-04-13 23:02:40 +00:00
Hal Finkel d9963c75da [PowerPC] Fix rlwimi isel when mask is not constant
We had been using the known-zero values of the operand of the or to construct
the mask for an rlwimi; this is not quite correct, but fine when the mask is
constant. When the mask is constant, then the known zeros of the operand must
be a superset of the zeros in the mask. However, when the mask is not a
constant, then there might be bits in the operand that are not known to be zero
that, at runtime, might be zero in the mask. Therefore, we check that any bits
not known to be zero *are* known to be one in the mask. Otherwise, we can't
fold the mask with the or and shift.

This was revealed as a miscompile of
MultiSource/Benchmarks/BitBench/drop3/drop3 when I started experimenting with
constant hoisting.

llvm-svn: 206136
2014-04-13 17:10:58 +00:00
Hal Finkel 34974ed503 [PowerPC] Implement some additional TLI callbacks
Add implementations of:
  bool isLegalICmpImmediate(int64_t Imm) const
  bool isLegalAddImmediate(int64_t Imm) const
  bool isTruncateFree(Type *Ty1, Type *Ty2) const
  bool isTruncateFree(EVT VT1, EVT VT2) const
  bool shouldConvertConstantLoadToIntImm(const APInt &Imm, Type *Ty) const

Unfortunately, this regresses counter-register-based loop formation because
some of the loops now end up in forms were SE cannot compute loop counts.
However, nevertheless, the test-suite results favor committing:

SingleSource/Benchmarks/BenchmarkGame/puzzle: 26% speedup
MultiSource/Benchmarks/FreeBench/analyzer/analyzer: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: 20% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv: 19% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv: 15% speedup
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: 2% speedup

MultiSource/Benchmarks/VersaBench/bmm/bmm: 26% slowdown

llvm-svn: 206120
2014-04-12 21:52:38 +00:00
NAKAMURA Takumi 98905d3f85 LLVMBuild.txt: Reformat.
llvm-svn: 205961
2014-04-10 11:16:17 +00:00
Hal Finkel a775e51274 [PowerPC] Don't return false from PPC::isVSLDOIShuffleMask
PPC::isVSLDOIShuffleMask should return -1, not false, when the shuffle
predicate should be false.

Noticed by inspection; no test case (yet).

llvm-svn: 205787
2014-04-08 19:00:27 +00:00
Hal Finkel 41e9b1d559 [PowerPC] Remove unused TM member variable to unbreak build
Fix "error: private field 'TM' is not used [-Werror,-Wunused-private-field]"

llvm-svn: 205660
2014-04-05 00:16:28 +00:00
Hal Finkel de0b413ec0 [PowerPC] Adjust load/store costs in PPCTTI
This provides more realistic costs for the insert/extractelement instructions
(which are load/store pairs), accounts for the cheap unaligned Altivec load
sequence, and for unaligned VSX load/stores.

Bad news:
MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation)
SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized)
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown

Good news:
SingleSource/Benchmarks/Shootout/ary3 - 54% speedup
SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup
MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup
MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup

Unfortunately, estimating the costs of the stack-based scalarization sequences
is hard, and adjusting these costs is like a game of whac-a-mole :( I'll
revisit this again after we have better codegen for vector extloads and
truncstores and unaligned load/stores.

llvm-svn: 205658
2014-04-04 23:51:18 +00:00
Hal Finkel b1308d525c [PowerPC] PPCTTI Cleanup
Remove the declaration of an unimplemented function.

llvm-svn: 205657
2014-04-04 23:51:11 +00:00
Hal Finkel fbf7e2a1a1 [PowerPC] Add a full condition code register to make the "cc" clobber work
gcc inline asm supports specifying "cc" as a clobber of all condition
registers. Add just enough modeling of the full register to make this work.
Fixed PR19326.

llvm-svn: 205630
2014-04-04 15:15:57 +00:00
Craig Topper 840beec2d0 Make consistent use of MCPhysReg instead of uint16_t throughout the tree.
llvm-svn: 205610
2014-04-04 05:16:06 +00:00
Hal Finkel f823380a44 [PowerPC] Make PPCTTI::getMemoryOpCost call BasicTTI::getMemoryOpCost
PPCTTI::getMemoryOpCost will now make use of BasicTTI::getMemoryOpCost to
calculate the base cost of the memory access, and then adjust on top of that.
There is no functionality change from this modification, but it will become
important so that PPCTTI can take advantage of scalarization information for which
BasicTTI::getMemoryOpCost will account in the near future.

llvm-svn: 205476
2014-04-02 22:43:49 +00:00
Jim Grosbach 36c4953348 Simplify resolveFrameIndex() signature.
Just pass a MachineInstr reference rather than an MBB iterator.
Creating a MachineInstr& is the first thing every implementation did
anyway.

llvm-svn: 205453
2014-04-02 19:28:18 +00:00
Hal Finkel 9e0baa6d3a [PowerPC] Add some missing VSX bitcast patterns
llvm-svn: 205352
2014-04-01 19:24:27 +00:00
Hal Finkel b4240ca0f4 [PowerPC] Don't ever expand BUILD_VECTOR of v2i64 with shuffles
If we have two unique values for a v2i64 build vector, this will always result
in two vector loads if we expand using shuffles. Only one is necessary.

llvm-svn: 205231
2014-03-31 17:48:16 +00:00
Hal Finkel 4c8f634f23 [PowerPC] Correct P7 dispatch unit allocation for vector instructions
llvm-svn: 205222
2014-03-31 17:02:10 +00:00
Hal Finkel 5c0d1454d6 [PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG
sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node
(and similarly for v2i16 and v2i8). Even though there are no sign-extension (or
algebraic shifts) for v2i64 types, we can handle v2i32 sign extensions by
converting two and from v2i64. The small trick necessary here is to shift the
i32 elements into the right lanes before the i32 -> f64 step. This is because
of the big Endian nature of the system, we need the i32 portion in the high
word of the i64 elements.

For v2i16 and v2i8 we can do the same, but we first use the default Altivec
shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and
then apply the above procedure.

llvm-svn: 205146
2014-03-30 13:22:59 +00:00
Hal Finkel 777c9dd90a [PowerPC] Handle v2i64 comparisons
v2i64 is a legal type under VSX, however we don't have native vector
comparisons. We can handle eq/ne by casting it to an Altivec type, but
everything else must be expanded.

llvm-svn: 205106
2014-03-29 16:04:40 +00:00
Hal Finkel e8fba98735 [PowerPC] VSX instruction latency corrections
The vector divide and sqrt instructions have high latencies, and the scalar
comparisons are like all of the others. On the P7, permutations take an extra
cycle over purely-simple vector ops.

llvm-svn: 205096
2014-03-29 13:20:31 +00:00
Rafael Espindola 5904e12bfa Completely rewrite ELFObjectWriter::RecordRelocation.
I started trying to fix a small issue, but this code has seen a small fix too
many.

The old code was fairly convoluted. Some of the issues it had:

* It failed to check if a symbol difference was in the some section when
  converting a relocation to pcrel.
* It failed to check if the relocation was already pcrel.
* The pcrel value computation was wrong in some cases (relocation-pc.s)
* It was missing quiet a few cases where it should not convert symbol
  relocations to section relocations, leaving the backends to patch it up.
* It would not propagate the fact that it had changed a relocation to pcrel,
  requiring a quiet nasty work around in ARM.
* It was missing comments.

llvm-svn: 205076
2014-03-29 06:26:49 +00:00
Hal Finkel 19be506a5e [PowerPC] Add subregister classes for f64 VSX values
We had stored both f64 values and v2f64, etc. values in the VSX registers. This
worked, but was suboptimal because we would always spill 16-byte values even
through we almost always had scalar 8-byte values. This resulted in an
increase in stack-size use, extra memory bandwidth, etc. To fix this, I've
added 64-bit subregisters of the Altivec registers, and combined those with the
existing scalar floating-point registers to form a class of VSX scalar
floating-point registers. The ABI code has also been enhanced to use this
register class and some other necessary improvements have been made.

llvm-svn: 205075
2014-03-29 05:29:01 +00:00
Hal Finkel 2583b06310 [PowerPC] Fix VSX permutation isel
Not only did I invert the indices when I wrote the code, but I also did the
same thing when I wrote the regression test. Oops.

llvm-svn: 205046
2014-03-28 20:24:55 +00:00
Hal Finkel 7811c6188e [PowerPC] v2[fi]64 need to be explicitly passed in VSX registers
v2[fi]64 values need to be explicitly passed in VSX registers. This is because
the code in TRI that finds the minimal register class given a register and a
value type will assert if given an Altivec register and a non-Altivec type.

llvm-svn: 205041
2014-03-28 19:58:11 +00:00
Hal Finkel c6fc9b8960 [PowerPC] Use a small cleanup pass to remove VSX self copies
As explained in r204976, because of how the allocation of VSX registers
interacts with the call-lowering code, we sometimes end up generating self VSX
copies. Specifically, things like this:
  %VSL2<def> = COPY %F2, %VSL2<imp-use,kill>
(where %F2 is really a sub-register of %VSL2, and so this copy is a nop)

This adds a small cleanup pass to remove these prior to post-RA scheduling.

llvm-svn: 204980
2014-03-27 23:12:31 +00:00
Hal Finkel 9dcb3583d5 [PowerPC] Don't remove self VSX copies in PPCInstrInfo::copyPhysReg
Because of how the allocation of VSX registers interacts with the call-lowering
code, we sometimes end up generating self VSX copies. Specifically, things like
this:
  %VSL2<def> = COPY %F2, %VSL2<imp-use,kill>
(where %F2 is really a sub-register of %VSL2, and so this copy is a nop)

The problem is that ExpandPostRAPseudos always assumes that *some* instruction
has been inserted, and adds implicit defs to it. This is a problem if no copy
was inserted because it can cause subtle problems during post-RA scheduling.
These self copies will have to be removed some other way.

llvm-svn: 204976
2014-03-27 22:46:28 +00:00
Hal Finkel 82569b6366 [PowerPC] Fix v2f64 vector extract and related patterns
First, v2f64 vector extract had not been declared legal (and so the existing
patterns were not being used). Second, the patterns for that, and for
scalar_to_vector, should really be a regclass copy, not a subregister
operation, because the VSX registers directly hold both the vector and scalar data.

llvm-svn: 204971
2014-03-27 22:22:48 +00:00
Hal Finkel ad801b7459 [PowerPC] Expand v2i64 shifts
These operations need to be expanded during legalization so that isel does not
crash. In theory, we might be able to custom lower some of these. That,
however, would need to be follow-up work.

llvm-svn: 204963
2014-03-27 21:26:33 +00:00