This fixes one immediate bug where an expression with side-effects
could be emitted twice during a NEON call.
It also prepares the way for folding CodeGen for many of the SISD
intrinsics into a table, reducing code size and hopefully increasing
performance eventually ("binary search + few switch cases" should be
better than "lots of switch cases").
llvm-svn: 201667
These instructions (well, the f32 ones) are supported on 32-bit ARMv8, not just
AArch64. Now that the arm_neon.td refactoring is complete, adding them is
surprisingly simple.
rdar://problem/16035743
llvm-svn: 201661
Previously, we made one traversal of the AST prior to codegen to assign
counters to the ASTs and then propagated the count values during codegen. This
patch now adds a separate AST traversal prior to codegen for the
-fprofile-instr-use option to propagate the count values. The counts are then
saved in a map from which they can be retrieved during codegen.
This new approach has several advantages:
1. It gets rid of a lot of extra PGO-related code that had previously been
added to codegen.
2. It fixes a serious bug. My original implementation (which was mailed to the
list but never committed) used 3 counters for every loop. Justin improved it to
move 2 of those counters into the less-frequently executed breaks and continues,
but that turned out to produce wrong count values in some cases. The solution
requires visiting a loop body before the condition so that the count for the
condition properly includes the break and continue counts. Changing codegen to
visit a loop body first would be a fairly invasive change, but with a separate
AST traversal, it is easy to control the order of traversal. I've added a
testcase (provided by Justin) to make sure this works correctly.
3. It improves the instrumentation overhead, reducing the number of counters for
a loop from 3 to 1. We no longer need dedicated counters for breaks and
continues, since we can just use the propagated count values when visiting
breaks and continues.
To make this work, I needed to make a change to the way we count case
statements, going back to my original approach of not including the fall-through
in the counter values. This was necessary because there isn't always an AST node
that can be used to record the fall-through count. Now case statements are
handled the same as default statements, with the fall-through paths branching
over the counter increments. While I was at it, I also went back to using this
approach for do-loops -- omitting the fall-through count into the loop body
simplifies some of the calculations and make them behave the same as other
loops. Whenever we start using this instrumentation for coverage, we'll need
to add the fall-through counts into the counter values.
llvm-svn: 201528
When a function has a single counter, we will offset the pointer by 1 when
parsing the next function. If a function has multiple counters, we are
okay after skipping rest of the counters.
llvm-svn: 201456
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.
Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
(fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
(should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
(should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
to be enabled regardless of default setting or -no-integrated-as.
(should fix SystemZ buildbots)
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
llvm-svn: 201333
useBitFieldTypeAlignment() and appears to ignore the special
bit-packing semantics of __attribute__((packed)).
Further flesh out an already-extensive comment.
llvm-svn: 201282
This test case doesn't belong in Clang (it's testing IndVarSimplify) but
in an effort to reproduce the test case this was intended to cover (by
essentially reverting r134441) I wasn't able to reproduce the failure
this test case should've produced. So I haven't ported this down to
LLVM, instead I'm just deleting it.
I suspect the test is just underconstrained, but I've no great interest
in trying hard to fix it right now - if anyone else wants to, I'd be
more than welcome to that.
llvm-svn: 201178
Xcore target ABI requires const data that is externally visible
to be handled differently if it has C-language linkage rather than
C++ language linkage.
llvm-svn: 201142
According to the AAPCS, we can split structs between GPRs and the stack,
except for when an argument has already been allocated on the stack. This
can occur when a large number of floating-point arguments fill up the VFP
registers, and are alllocated on the stack before the general-purpose argument
registers are full.
llvm-svn: 201137
This option has the following effects:
* It adds the sspstrong IR attribute to each function within the CU.
* It defines the macro __SSP_STRONG__ with the value of 2.
Differential Revision: http://llvm-reviews.chandlerc.com/D2717
llvm-svn: 201120
An HFA is defined as a struct containing floating point values of the
same machine type. In the 32-bit ABI, double and long double have the
same machine type, so a struct with a mixture of these types must be an
HFA (assuming it meets the other criteria).
llvm-svn: 200971
We collect a maximal function count among all functions in the pgo data file.
For functions that are hot, we set its InlineHint attribute. For functions that
are cold, we set its Cold attribute.
We currently treat functions with >= 30% of the maximal function count as hot
and functions with <= 1% of the maximal function count are treated as cold.
These two numbers are from preliminary tuning on SPEC.
This commit should not affect non-PGO builds and should boost performance on
instrumentation based PGO.
llvm-svn: 200874
Arguments and return values must always be marshalled as for the base
AAPCS when the callee is a variadic function.
Patch by Oliver Stannard!
llvm-svn: 200307
Due to statement expressions supported as GCC extension, it is possible
to put 'break' or 'continue' into a loop/switch statement but outside
its body, for example:
for ( ; ({ if (first) { first = 0; continue; } 0; }); )
This code is rejected by GCC if compiled in C mode but is accepted in C++
code. GCC bug 44715 tracks this discrepancy. Clang used code generation
that differs from GCC in both modes: only statement of the third
expression of 'for' behaves as if it was inside loop body.
This change makes code generation more close to GCC, considering 'break'
or 'continue' statement in condition and increment expressions of a
loop as it was inside the loop body. It also adds error for the cases
when 'break'/'continue' appear outside loop due to this syntax. If
code generation differ from GCC, warning is issued.
Differential Revision: http://llvm-reviews.chandlerc.com/D2518
llvm-svn: 199897
PNaCl and Emscripten can both handle va_arg IR instructions with
struct type.
Also add a test to cover generating a va_arg IR instruction from
va_arg in C on le32 (as already handled by VisitVAArgExpr() in
CGExprScalar.cpp), which was not covered by a test before.
(This fixes https://code.google.com/p/nativeclient/issues/detail?id=2381)
Differential Revision: http://llvm-reviews.chandlerc.com/D2539
llvm-svn: 199830
of the current compilation unit.
As a side effect this enables many more LTO uniquing opportunities.
This reapplies r199757 with a better testcase.
llvm-svn: 199760
Without them they can be merged with non unnamed_addr constants during LTO.
The resulting constant is not unnamed_addr and goes in a different section,
which causes ld64 to crash.
A testcase that would crash before:
* file1.mm:
void g(id notification) {
[notification valueForKey:@"name"];
}
* file2.cpp:
extern const char js_name_str[] = "name";
* file3.cpp
extern bool JS_GetProperty(const char *name);
extern const char js_name_str[];
bool js_ReportUncaughtException() { JS_GetProperty(js_name_str); }
run
clang file1.mm -o file1.o -c -w -emit-llvm
clang file2.cpp -o file2.o -c -w -emit-llvm
clang file3.cpp -o file3.o -c -w
ld -dylib -o XUL file1.o file2.o file3.o -undefined dynamic_lookup.
llvm-svn: 199688
marked as AlwaysInline or ForceInline.
This moves us to what gcc does with -fno-inline. The attribute approach
was discussed to be better than switching to InlineAlways inliner in presence
of LTO.
llvm-svn: 199324
a subprocess invocation which is pretty significant on Windows. It also
likely saves a bunch of thrashing the host machine needlessly. Finally
it makes the tests much more predictable and less dependent on the host.
For example 'header_lookup1.c' was passing '-fno-ms-extensions' just to
thwart the host detection adding it into the compilation. By runnig CC1
directly we don't have to deal with such oddities.
llvm-svn: 199308
This makes the C++ ABI depend entirely on the target: MS ABI for -win32 triples,
Itanium otherwise. It's no longer possible to do weird combinations.
To be able to run a test with a specific ABI without constraining it to a
specific triple, new substitutions are added to lit: %itanium_abi_triple and
%ms_abi_triple can be used to get the current target triple adjusted to the
desired ABI. For example, if the test suite is running with the i686-pc-win32
target, %itanium_abi_triple will expand to i686-pc-mingw32.
Differential Revision: http://llvm-reviews.chandlerc.com/D2545
llvm-svn: 199250
These functions have the same constness properties of the normal libm
functions, which allows LLVM to optimise code better in general. There
are also a couple of specific optimisations that only trigger when
these are properly marked.
rdar://problem/13729466
llvm-svn: 199249
With the old linkage types removed, set the linkage to external for both
dllimport and dllexport to reflect what's currently supported.
llvm-svn: 199220
In preparation for making the Win32 triple imply MS ABI mode,
make all tests pass in this mode, or make them use the Itanium
mode explicitly.
Differential Revision: http://llvm-reviews.chandlerc.com/D2401
llvm-svn: 199130