Those two files got mixed up.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175746
Fixes for-loop.cl piglit test
Patch By: Vincent Lejeune
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
NOTE: This is a candidate for the Mesa stable branch.
llvm-svn: 175742
available.
With this commit there are no longer any assertion or verifier failures when
running 'make check' without LiveVariables. There are still 56 failing tests
with codegen differences and 1 unexpectedly passing test.
llvm-svn: 175719
The constructs %hi() and %lo() represent the high and low 16
bits of the address.
Because the 16 bit offset field of an LW instruction is
interpreted as signed, if bit 15 of the low part is 1 then the
low part will act as a negative and 1 needs to be added to the
high part.
Contributer: Vladimir Medic
llvm-svn: 175707
This patch implements the PPCDAGToDAGISel::PostprocessISelDAG virtual
method to perform post-selection peephole optimizations on the DAG
representation.
One optimization is implemented here: folds to clean up complex
addressing expressions for thread-local storage and medium code
model. It will also be useful for large code model sequences when
those are added later. I originally thought about doing this on the
MI representation prior to register assignment, but it's difficult to
do effective global dead code elimination at that point. DCE is
trivial on the DAG representation.
A typical example of a candidate code sequence in assembly:
addis 3, 2, globalvar@toc@ha
addi 3, 3, globalvar@toc@l
lwz 5, 0(3)
When the final instruction is a load or store with an immediate offset
of zero, the offset from the add-immediate can replace the zero,
provided the relocation information is carried along:
addis 3, 2, globalvar@toc@ha
lwz 5, globalvar@toc@l(3)
Since the addi can in general have multiple uses, we need to only
delete the instruction when the last use is removed.
llvm-svn: 175697
Rewrite value numbers directly in the 'Other' LiveInterval which is
moribund anyway. This avoids allocating the OtherAssignments vector.
llvm-svn: 175690
excluding visibility bits.
Mips specific standalone assembler directive "set at".
This directive changes the general purpose register
that the assembler will use when given the symbolic
register name $at.
This does not include negative testing. That will come
in a future patch.
A side affect of this patch recognizes the different
GPR register names for temporaries between old abi
and new abi so a test case for that is included.
Contributer: Vladimir Medic
llvm-svn: 175686
When findReachingDefs() finds that only one value can reach the basic
block, just copy the work list of visited blocks directly into the live
interval.
Sort the block list and use a LiveRangeUpdater to make the bulk add
fast.
When multiple reaching defs are found, transfer the work list to the
updateSSA() work list as before. Also use LiveRangeUpdater in
updateLiveIns() following updateSSA().
This makes live interval analysis more than 3x faster on one huge test
case.
llvm-svn: 175685
The slot that we're adding/removing the attribute from may not be the same as
the attribute coming in. Make sure that they match up before we try to
add/remove them.
PR15313
llvm-svn: 175684
related failures when running 'make check' without LiveVariables with the
verifier enabled. Some of the remaining failures elsewhere may still be fallout
from incorrect updating of LiveIntervals or the few missing cases left in the
two-address pass.
llvm-svn: 175672
(2xi32) (truncate ((2xi64) bitcast (buildvector i32 a, i32 x, i32 b, i32 y)))
can be folded into a (2xi32) (buildvector i32 a, i32 b).
Such a DAG would cause uneccessary vdup instructions followed by vmovn
instructions.
We generate this code on ARM NEON for a setcc olt, 2xf64, 2xf64. For example, in
the vectorized version of the code below.
double A[N];
double B[N];
void test_double_compare_to_double() {
int i;
for(i=0;i<N;i++)
A[i] = (double)(A[i] < B[i]);
}
radar://13191881
Fixes bug 15283.
llvm-svn: 175670
This handles the cases where the 6-bit splat element is odd, converting
to a three-instruction sequence to add or subtract two splats. With this
fix, the XFAIL in test/CodeGen/PowerPC/vec_constants.ll is removed.
llvm-svn: 175663
Adding new segments to large LiveIntervals can be expensive because the
LiveRange objects after the insertion point may need to be moved left or
right. This can cause quadratic behavior when adding a large number of
segments to a live range.
The LiveRangeUpdater class allows the LIveInterval to be in a temporary
invalid state while segments are being added. It maintains an internal
gap in the LiveInterval when it is shrinking, and it has a spill area
for new segments when the LiveInterval is growing.
The behavior is similar to the existing mergeIntervalRanges() function,
except it allocates less memory for the spill area, and the algorithm is
turned inside out so the loop is driven by the clients.
llvm-svn: 175644
- When extloading from a vector with non-byte-addressable element, e.g.
<4 x i1>, the current logic breaks. Extend the current logic to
fix the case where the element type is not byte-addressable by loading
all bytes, bit-extracting/packing each element.
llvm-svn: 175642
The PPC backend doesn't handle these correctly. This patch uses logic
similar to that in the X86 and ARM backends to track these arguments
properly.
llvm-svn: 175635
Add HexagonMCInst class which adds various Hexagon VLIW annotations.
In addition, this class also includes some APIs related to the
constant extenders.
llvm-svn: 175634
During lowering of a BUILD_VECTOR, we look for opportunities to use a
vector splat. When the splatted value fits in 5 signed bits, a single
splat does the job. When it doesn't fit in 5 bits but does fit in 6,
and is an even value, we can splat on half the value and add the result
to itself.
This last optimization hasn't been working recently because of improved
constant folding. To circumvent this, create a pseudo VADD_SPLAT that
can be expanded during instruction selection.
llvm-svn: 175632
sext <4 x i1> to <4 x i64>
sext <4 x i8> to <4 x i64>
sext <4 x i16> to <4 x i64>
I'm running Combine on SIGN_EXTEND_IN_REG and revert SEXT patterns:
(sext_in_reg (v4i64 anyext (v4i32 x )), ExtraVT) -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x , ExtraVT)))
The sext_in_reg (v4i32 x) may be lowered to shl+sar operations.
The "sar" does not exist on 64-bit operation, so lowering sext_in_reg (v4i64 x) has no vector solution.
I also added a cost of this operations to the AVX costs table.
llvm-svn: 175619
It is possible that frame pointer is not found in the
callee saved info, thus FramePtrSpillFI may be incorrect
if we don't check the result of hasFP(MF).
Besides, if we enable the stack coloring algorithm, there
will be an assertion to ensure the slot is live. But in
the test case, %var1 is not live in the prologue of the
function, and we will get the assertion failure.
Note: There is similar code in ARMFrameLowering.cpp.
llvm-svn: 175616
and removing instructions. The implementation seems more complicated than it
needs to be, but I couldn't find something simpler that dealt with all of the
corner cases.
Also add a call to repairIndexesInRange() from repairIntervalsInRange().
llvm-svn: 175601
SltCCRxRy16, SltiCCRxImmX16, SltiuCCRxImmX16, SltuCCRxRy16
$T8 shows up as register $24 when emitted from C++ code so we had
to change some tests that were already there for this functionality.
llvm-svn: 175593
MS-style inline assembly.
This is a follow-on to r175334. Forcing a FP to be emitted doesn't ensure it
will be used. Therefore, force the base pointer as well. We now treat MS
inline assembly in the same way we treat functions with dynamic stack
realignment and VLAs. This guarantees the BP will be used to reference
parameters and locals.
rdar://13218191
llvm-svn: 175576
excluding visibility bits.
Mips (o32 abi) specific e_header setting.
EF_MIPS_ABI_O32 needs to be set in the
ELF header flags for o32 abi output.
Contributer: Reed Kotler
llvm-svn: 175569
excluding visibility bits.
Mips (Mips16) specific e_header setting.
EF_MIPS_ARCH_ASE_M16 needs to be set in the
ELF header flags for Mips16.
Contributer: Reed Kotler
llvm-svn: 175566
excluding visibility bits.
Mips (MicroMips) specific STO handling .
The st_other field settig for STO_MIPS_MICROMIPS
Contributer: Zoran Jovanovic
llvm-svn: 175564
excluding visibility bits.
Generic STO handling at the Target level.
The st_other field of the ELF symbol table is one
byte in size. The first 2 bytes are used for generic
visibility and are currently handled by llvm.
The other six bits are processor specific and need
to be set at the target level.
A couple of notes:
The new static methods for accessing and setting the "other"
flags in include/llvm/MC/MCELF.h match the style guide
and not the other methods in the file. I don't like the
inconsistency, but feel I should follow the prescribed
lowerUpper() convention.
STO_ value definitions are not specified in gnu land as
consistently as the STT_ and STB_ fields. Probably because
the latter were defined in a standards doc and the former
defined partially in code. I have stuck with the full byte
definition of the flags.
Contributer: Zoran Jovanovic
llvm-svn: 175561
In my previous commit:
"Merge a f32 bitcast of a v2i32 extractelt
A vectorized sitfp on doubles will get scalarized to a sequence of an
extract_element of <2 x i32>, a bitcast to f32 and a sitofp.
Due to the the extract_element, and the bitcast we will uneccessarily generate
moves between scalar and vector registers."
I added a pattern containing a copy_to_regclass. The copy_to_regclass is
actually not needed.
radar://13191881
llvm-svn: 175555
/dev/stdin as an input when stdin is connected to a tty, for example.
No test, because it's difficult to write a reasonably portable test
for this. /dev/stdin isn't a character device when stdin is redirected
from a file or connected to a pipe.
llvm-svn: 175542
When creating an allocation hint for a register pair, make sure the hint
for the physical register reference is still in the allocation order.
rdar://13240556
llvm-svn: 175541
Target implementations of getRegAllocationHints() should use the
provided allocation order, and they can never return hints outside the
order. This is already documented in TargetRegisterInfo.h.
<rdar://problem/13240556>
llvm-svn: 175540
Due to the execution order of doFinalization functions, the GC information were
deleted before AsmPrinter::doFinalization was executed. Thus, the
GCMetadataPrinter::finishAssembly was never called.
The patch fixes that by moving the code of the GCInfoDeleter::doFinalization to
Printer::doFinalization.
llvm-svn: 175528
A vectorized sitfp on doubles will get scalarized to a sequence of an
extract_element of <2 x i32>, a bitcast to f32 and a sitofp.
Due to the the extract_element, and the bitcast we will uneccessarily generate
moves between scalar and vector registers.
The patch fixes this by using a COPY_TO_REGCLASS and a EXTRACT_SUBREG to extract
the element from the vector instead.
radar://13191881
llvm-svn: 175520
This stops the Machine Verifier from complaining about uses of undefined
physical registers.
NOTE: This is a candidate for the Mesa stable branch.
llvm-svn: 175518
Kernel function arguments are lowered to loads from the PARAM_I address
space. When creating these load instructions, we were initializing
their MachinePointerInfo with an Arguement object that was not attached
to any function. This was causing the MachineScheduler to crash when
it tried to access the parent of the Arguement.
This has been fixed by initializing the MachinePointerInfo with a
UndefValue instead.
NOTE: This is a candidate for the Mesa stable branch.
llvm-svn: 175517
In some cases, we were losing track of live implicit registers which
was creating dead defs and causing the scheduler to produce invalid
code.
NOTE: This is a candidate for the Mesa stable branch.
llvm-svn: 175516
This patch makes asan instrument memory accesses with unusual sizes (e.g. 5 bytes or 10 bytes), e.g. long double or
packed structures.
Instrumentation is done with two 1-byte checks
(first and last bytes) and if the error is found
__asan_report_load_n(addr, real_size) or
__asan_report_store_n(addr, real_size)
is called.
Also, call these two new functions in memset/memcpy
instrumentation.
asan-rt part will follow.
llvm-svn: 175507
fields were only ever set in the constructor. The create method retains
its consistent interface so that these bits can be re-threaded through
the emitter if they're ever needed.
This was found by the -Wunused-private-field Clang warning.
llvm-svn: 175482
Previously we seemed to be assuming that all functions were definitions and all
methods were declarations. This may be consistent with how Clang uses DIBuilder
but doesn't have to be true of all clients (such as DragonEgg).
llvm-svn: 175423
at this time, llvm is generating a different but equivalent pattern
that would lead to this instruction. I am trying to think of a way
to get it to generate this. If I can't, I may just remove the pseudo.
llvm-svn: 175419
This expansion will be moved to expandISelPseudos as soon as I can figure
out how to do that. There are other instructions which use this
ExpandFEXT_T8I816_ins and as soon as I have finished expanding them all,
I will delete the macro asm string text so it has no way to be used
in the future.
llvm-svn: 175413
This fixes PR15289. This bug was introduced (recently) in r175215; collecting
all std::vector references for candidate pairs to delete at once is invalid
because subsequent lookups in the owning DenseMap could invalidate the
references.
bugpoint was able to reduce a useful test case. Unfortunately, because whether
or not this asserts depends on memory layout, this test case will sometimes
appear to produce valid output. Nevertheless, running under valgrind will
reveal the error.
llvm-svn: 175397
GCC warns about the attribute being ignored if it occurs after void*.
There seems to be some kind of incompatibility between clang and gcc here, but
I can't fathom who's right.
void* LLVM_LIBRARY_VISIBILITY foo(); // clang: hidden, gcc: default
LLVM_LIBRARY_VISIBILITY void *bar(); // clang: hidden, gcc: hidden
void LLVM_LIBRARY_VISIBILITY qux(); // clang: hidden, gcc: hidden
llvm-svn: 175394
arguably better than forward iterators for this use case, they are confusing and
there are some implementation problems with reverse iterators and MI bundles.
llvm-svn: 175393
MachineBasicBlock::SplitCriticalEdge. Since this is an iterator rather than
an instr_iterator, the isBundled() check only passes if getFirstTerminator()
returned end() and the garbage memory happens to lean that way.
Multiple successors can be present without any terminator instructions in the
case of exception handling with a fallthrough.
llvm-svn: 175383
terminators that actually have register uses when splitting critical edges.
This commit also introduces a method repairIntervalsInRange() on LiveIntervals,
which allows for repairing LiveIntervals in a small range after an arbitrary
target hook modifies, inserts, and removes instructions. It's pretty limited
right now, but I hope to extend it to support all of the things that are done
by the convertToThreeAddress() target hooks.
llvm-svn: 175382
(or (bool?A:B),(bool?C:D)) --> (bool?(or A,C):(or B,D))
By the time the OR is visited, both the SELECTs have been visited and not
optimized and the OR itself hasn't been transformed so we do this transform in
the hopes that the new ORs will be optimized.
The transform is explicitly disabled for vector-selects until "codegen matures
to handle them better".
Patch by Muhammad Tauqir!
llvm-svn: 175380
Avoids malloc and is a lot denser. We lose iteration over target independent
attributes, but that's a strange interface anyways and didn't have any users
outside of AttrBuilder.
llvm-svn: 175370
as well as 16/32 bit variants to do and so I want this to look nice
when I do it. I've been experimenting with this. No new test cases
are needed.
llvm-svn: 175369