Commit Graph

226382 Commits

Author SHA1 Message Date
Hal Finkel 2e0ff2b244 [LoopVectorize] Don't vectorize loops when everything will be scalarized
This change prevents the loop vectorizer from vectorizing when all of the vector
types it generates will be scalarized. I've run into this problem on the PPC's QPX
vector ISA, which only holds floating-point vector types. The loop vectorizer
will, however, happily vectorize loops with purely integer computation. Here's
an example:

  LV: The Smallest and Widest types: 32 / 32 bits.
  LV: The Widest register is: 256 bits.
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 1 for VF 1 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 1 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 1 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 1 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Scalar loop costs: 3.
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 2 for VF 2 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 2 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 2 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 2 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Vector loop of width 2 costs: 2.
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 4 for VF 4 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 4 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 4 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 4 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Vector loop of width 4 costs: 1.
  ...
  LV: Selecting VF: 8.
  LV: The target has 32 registers
  LV(REG): Calculating max register usage:
  LV(REG): At #0 Interval # 0
  LV(REG): At #1 Interval # 1
  LV(REG): At #2 Interval # 2
  LV(REG): At #4 Interval # 1
  LV(REG): At #5 Interval # 1
  LV(REG): VF = 8

The problem is that the cost model here is not wrong, exactly. Since all of
these operations are scalarized, their cost (aside from the uniform ones) are
indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF
picked, the lower the relative overhead from the loop itself (and the
induction-variable update and check), and so in a sense, picking the largest VF
here is the right thing to do.

The problem is that vectorizing like this, where all of the vectors will be
scalarized in the backend, isn't really vectorizing, but rather interleaving.
By itself, this would be okay, but then the vectorizer itself also interleaves,
and that's where the problem manifests itself. There's aren't actually enough
scalar registers to support the normal interleave factor multiplied by a factor
of VF (8 in this example). In other words, the problem with this is that our
register-pressure heuristic does not account for scalarization.

While we might want to improve our register-pressure heuristic, I don't think
this is the right motivating case for that work. Here we have a more-basic
problem: The job of the vectorizer is to vectorize things (interleaving aside),
and if the IR it generates won't generate any actual vector code, then
something is wrong. Thus, if every type looks like it will be scalarized (i.e.
will be split into VF or more parts), then don't consider that VF.

This is not a problem specific to PPC/QPX, however. The problem comes up under
SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's
reduced test case from PR26837 to this commit.

Differential Revision: http://reviews.llvm.org/D18537

llvm-svn: 264904
2016-03-30 19:37:08 +00:00
Adhemerval Zanella 69b29b2b7c [lld] [ELF/AArch64] Add aarch64 TLS IE to LE relax for local symbol test
This patch add a TLS relax optimization test when transforming
Initial-Exec to Local-Exec for local symbols (which can not be preempted).

llvm-svn: 264903
2016-03-30 19:12:18 +00:00
Rong Xu b534166fd4 [PGO] PGOFuncName in LTO optimizations
PGOFuncNames are used as the key to retrieve the Function definition from the
MD5 stored in the profile. For internal linkage function, we prefix the source
file name to the PGOFuncNames. LTO's internalization privatizes many global linkage
symbols. This happens after value profile annotation, but those internal
linkage functions should not have a source prefix. To differentiate compiler
generated internal symbols from original ones, PGOFuncName meta data are
created and attached to the original internal symbols in the value profile
annotation step. If a symbol does not have the meta data, its original linkage
must be non-internal.

Also add a new map that maps PGOFuncName's MD5 value to the function definition.

Differential Revision: http://reviews.llvm.org/D17895

llvm-svn: 264902
2016-03-30 18:37:52 +00:00
Reid Kleckner 747dc2eb61 [cmake] Get the MSVC version by running cl rather than relying on MSVC_VERSION
MSVC_VERSION comes from the _MSC_VER macro, which won't correspond to
the STL version if the host compiler is clang-cl.

llvm-svn: 264901
2016-03-30 18:31:14 +00:00
Reid Kleckner 88ad225e94 [cmake] Instead of testing char16_t for MSVC compat, directly ask cl.exe its version
Credit to Aaron Ballman for thinking of this.

llvm-svn: 264886
2016-03-30 18:19:39 +00:00
Tobias Grosser 6deba4ea03 Revert 264782 and 264789
These caused LNT failures due to new assertions when running with
-polly-position=before-vectorizer -polly-process-unprofitable for:

FAIL: clamscan.compile_time
FAIL: cjpeg.compile_time
FAIL: consumer-jpeg.compile_time
FAIL: shapes.compile_time
FAIL: clamscan.execution_time
FAIL: cjpeg.execution_time
FAIL: consumer-jpeg.execution_time
FAIL: shapes.execution_time

The failures have been introduced by r264782, but r264789 had to be reverted
as it depended on the earlier patch.

llvm-svn: 264885
2016-03-30 18:18:31 +00:00
Teresa Johnson 83c517c44e Restore "[ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly"
This restores commit 264869, with a fix for windows bots to properly
escape '\' in the path when serializing out. Added test.

llvm-svn: 264884
2016-03-30 18:15:08 +00:00
Jim Ingham a7c5e1922d Fix header name.
llvm-svn: 264883
2016-03-30 18:14:36 +00:00
Chad Rosier f7ac5f28ab [AArch64] Fix warnings pointed out by Hal.
llvm-svn: 264882
2016-03-30 18:08:51 +00:00
Reid Kleckner 2b3db2c1bb [cmake] Add -fms-compatibility-version=19 when clang-cl gives errors about char16_t
What we are really trying to do here is to figure out if we are using
the 2015 STL. Unfortunately, so far as I know the MSVC STL does not
define a version macro that we can check directly. Instead I wrote a
check to see if char16_t works.

llvm-svn: 264881
2016-03-30 17:30:26 +00:00
Reid Kleckner 8c18019d50 [cmake] Allow EH usage with clang-cl
llvm-svn: 264880
2016-03-30 17:28:21 +00:00
Rong Xu 311ada11f8 [PGO] Use ArrayRef in annotateValueSite()
Using ArrayRef in annotateValueSite's parameter instead of using an array
and it's size.

Differential Revision: http://reviews.llvm.org/D18568

llvm-svn: 264879
2016-03-30 16:56:31 +00:00
Rui Ueyama 38dc83417b Include line number in error message for linker scripts.
This patch is based on http://reviews.llvm.org/D18545 written
by George Rimar.

llvm-svn: 264878
2016-03-30 16:51:57 +00:00
Tom Stellard 1d5e6d4bdc AMDGPU/SI: Improve MachineSchedModel definition
This patch contains a few improvements to the model, including:

- Using a single resource with a defined buffers size for each memory unit.
- Setting the IssueWidth correctly.
- Fixing latency values for memory instructions.

shader-db stats:

16429 shaders in 3231 tests
Totals:
SGPRS: 318232 -> 312328 (-1.86 %)
VGPRS: 208996 -> 209346 (0.17 %)
Code Size: 7147044 -> 7166440 (0.27 %) bytes
LDS: 83 -> 83 (0.00 %) blocks
Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave
Max Waves: 49182 -> 49243 (0.12 %)
Wait states: 0 -> 0 (0.00 %)A

Differential Revision: http://reviews.llvm.org/D18453

llvm-svn: 264877
2016-03-30 16:35:13 +00:00
Tom Stellard 0bc954e3bc AMDGPU/SI: Enable lanemask tracking in misched
Summary:
This results in higher register usage, but should make it easier for
the compiler to hide latency.

This pass is a prerequisite for some more scheduler improvements, and I
think the increase register usage with this patch is acceptable, because
when combined with the scheduler improvements, the total register usage
will decrease.

shader-db stats:

2382 shaders in 478 tests
Totals:
SGPRS: 48672 -> 49088 (0.85 %)
VGPRS: 34148 -> 34847 (2.05 %)
Code Size: 1285816 -> 1289128 (0.26 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 492544 -> 573440 (16.42 %) bytes per wave
Max Waves: 6856 -> 6846 (-0.15 %)
Wait states: 0 -> 0 (0.00 %)

Depends on D18451

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18452

llvm-svn: 264876
2016-03-30 16:35:09 +00:00
Jonas Paulsson f76123386a [SystemZ] Add nop and nopr InstAliases.
For compatability with GAS, nop and nopr are recognized as alises for
bc and bcr, respectively. A mask of 0 turns these instructions
effectively into no-operations.

Reviewed by Ulrich Weigand.

llvm-svn: 264875
2016-03-30 16:11:58 +00:00
Vedant Kumar b64d86ff8e [c-index-test] Delete dead function, NFC
llvm-svn: 264874
2016-03-30 16:03:02 +00:00
Jonas Paulsson 3ace74a414 [SystemZ] Specify required features for builtins.
BuiltinsSystemZ.def is extended to include the required processor
features per intrinsic.

New test test/CodeGen/builtins-systemz-error2.c that checks for
expected errors when instrinsics are used with a subtarget that does
not support the required feature (e.g. vector support).

Reviewed by Ulrich Weigand.

llvm-svn: 264873
2016-03-30 15:51:24 +00:00
Nirav Dave 8dd66e5753 Remove HasFnAttribute guards to getFnAttribute calls
These checks are redundant and can be removed

Reviewers: hans

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D18564

llvm-svn: 264872
2016-03-30 15:41:12 +00:00
Teresa Johnson 20beeea24a Revert "[ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly"
This reverts commit r264869. I am seeing Windows bot failures due to the
"\" in the path being mishandled at some point (seems to be interpreted
wrongly at some point and llvm-as | llvm-dis is yielding some junk
characters). Need to investigate.

llvm-svn: 264871
2016-03-30 15:16:04 +00:00
Simon Pilgrim b87ffe8519 [X86][XOP] BITREVERSE lowering using VPPERM
XOP's VPPERM has some great 'permute operations' that it can do as well as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction.

llvm-svn: 264870
2016-03-30 14:14:00 +00:00
Teresa Johnson 832a6790f6 [ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly
Summary:
This change serializes out and in the SourceFileName to LLVM assembly
so that it is preserved through "llvm-dis | llvm-as". This is
necessary to ensure that the global identifiers created for local values
in the module summary index are the same even if the bitcode is
streamed out and read back from LLVM assembly.

Serializing the summary itself to LLVM assembly is in progress.

Reviewers: joker.eph

Subscribers: llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D18588

llvm-svn: 264869
2016-03-30 14:00:02 +00:00
Teresa Johnson 0c7bb96533 Prepare tests for change to emit Module SourceFileName to LLVM assembly
Modify these tests to ignore the source file name when looking for the
expected string. It was already catching the source file name once via
the ModuleID, and will catch it another time with an impending change to
LLVM to serialize out the module's SourceFileName.

llvm-svn: 264868
2016-03-30 13:59:49 +00:00
Simon Pilgrim 9490b56a89 [X86][SSE] Test the legalization of vector comparison results
We are currently doing a REALLY bad job of packing results of vector comparisons into the legalized <X x i1> result equivalents - a mixture of PACKSS/PMOVMSKB would be much better here.

llvm-svn: 264867
2016-03-30 13:55:00 +00:00
Rafael Espindola 287e100db2 No relocation needs bot SA and ZA.
Pass only one of them to relocateOne.

llvm-svn: 264866
2016-03-30 13:27:50 +00:00
Rafael Espindola 8cc68c313b Implement getImplicitAddend for mips.
llvm-svn: 264865
2016-03-30 13:18:08 +00:00
Rafael Espindola abc9a12929 Simplify mips addend processing.
It is now added to the addend in the same way as a regular Elf_Rel
addend.

llvm-svn: 264864
2016-03-30 12:45:58 +00:00
Rafael Espindola da99df366d Fix handling of addends on i386.
Because of merge sections it is not sufficient to just add them while
applying a relocation.

llvm-svn: 264863
2016-03-30 12:40:38 +00:00
Alexander Kornienko b014596047 [clang-tidy] Fix MSVC build.
llvm-svn: 264862
2016-03-30 12:35:05 +00:00
Benjamin Kramer 9415e06da7 [NVPTX] Avoid temporary std::string and make single-use function local to the cpp file.
No functionality change intended.

llvm-svn: 264861
2016-03-30 12:31:51 +00:00
Marianne Mailhot-Sarrasin a5a750eaf1 gold-plugin: Fixed typo in an error message.
llvm-svn: 264860
2016-03-30 12:20:53 +00:00
Gabor Horvath 349c828bea [clang-tidy] Adjust dangling references check to ASTMatcher changes.
llvm-svn: 264859
2016-03-30 12:16:09 +00:00
Alexander Kornienko dbe0a1fd92 [docs] Added 3.8 clang-tidy release notes, fixed formatting.
llvm-svn: 264858
2016-03-30 12:05:33 +00:00
Simon Pilgrim ab305a9d4c [X86][SSE] Added tests for clearing upper bits of vector elements
Patterns based on PR6455

llvm-svn: 264857
2016-03-30 11:43:26 +00:00
Alexander Kornienko e3ae0c6f19 [clang-tidy] readability check for const params in declarations
Summary: Adds a clang-tidy warning for top-level consts in function declarations.

Reviewers: hokein, sbenza, alexfh

Subscribers: cfe-commits

Patch by Matt Kulukundis!

Differential Revision: http://reviews.llvm.org/D18408

llvm-svn: 264856
2016-03-30 11:31:33 +00:00
Gabor Horvath 1b654f2293 [ASTMatchers] Existing matcher hasAnyArgument fixed
Summary: A checker (will be uploaded after this patch) needs to check implicit casts. The checker needs matcher hasAnyArgument but it ignores implicit casts and parenthesized expressions which disables checking of implicit casts for arguments in the checker. However the documentation of the matcher contains a FIXME that this should be removed once separate matchers for ignoring implicit casts and parenthesized expressions are ready. Since these matchers were already there the fix could be executed. Only one Clang checker was affected which was also fixed (ignoreParenImpCasts added) and is separately uploaded. Third party checkers (not in the Clang repository) may be affected by this fix so the fix must be emphasized in the release notes.

Reviewers: klimek, sbenza, alexfh

Subscribers: alexfh, klimek, xazax.hun, cfe-commits

Differential Revision: http://reviews.llvm.org/D18243

llvm-svn: 264855
2016-03-30 11:22:14 +00:00
Kuba Brecka 058c302e0a Fix the ThreadSanitizer support to avoid creating empty SBThreads and to not crash when thread_id is unavailable. Plus a whitespace fix.
llvm-svn: 264854
2016-03-30 10:50:24 +00:00
Alexey Bataev 587e1de4ea [OPENMP 4.0] Initial support for '#pragma omp declare simd' directive.
Initial parsing/sema/serialization/deserialization support for '#pragma
omp declare simd' directive.
The 'declare simd' construct can be applied to a function to enable the
creation of one or more versions that can process multiple arguments
using SIMD instructions from a single invocation from a SIMD loop.
If the function has any declarations, then the declare simd construct
for any declaration that has one must be equivalent to the one specified
 for the definition. Otherwise, the result is unspecified.
This pragma can be applied many times to the same declaration.
Internally this pragma is represented as an attribute. But we need special processing for this pragma because it must be used before function declaration, this directive is applied to.
Differential Revision: http://reviews.llvm.org/D10599

llvm-svn: 264853
2016-03-30 10:43:55 +00:00
James Molloy 8e46cd05a1 [VectorUtils] Don't try and truncate PHIs to a smaller bitwidth
We already try not to truncate PHIs in computeMinimalBitwidths. LoopVectorize can't handle it and we really don't need to, because both induction and reduction PHIs are truncated by other means.

However, we weren't bailing out in all the places we should have, and we ended up by returning a PHI to be truncated, which has caused PR27018.

This fixes PR17018.

llvm-svn: 264852
2016-03-30 10:11:43 +00:00
Gabor Horvath b780c44eec [analyzer] Fix an assertion fail in hash generation.
In case the (uniqueing) location of the diagnostic is in a line that only
contains whitespaces there was an assertion fail during issue hash generation.

Unfortunately I am unable to reproduce this error with the built in checkers, 
so no there is no failing test case with this patch. It would be possible to
write a debug checker for that purpuse but it does not worth the effort.

Differential Revision: http://reviews.llvm.org/D18210

llvm-svn: 264851
2016-03-30 10:08:59 +00:00
Pavel Labath 021ccdb7cd Fix SocketAddressTest (again)
On some versions of Windows, the address is returned as "::1", while on others it's
"0:0:...:0:1". Accept both versions, as they represent the same address.

llvm-svn: 264850
2016-03-30 09:43:04 +00:00
Pavel Labath 1b46a72eb2 Fix warning in ThreadSanitizerRuntime
llvm-svn: 264849
2016-03-30 09:42:59 +00:00
Pavel Labath 6ce88f8d66 Fix warning in ClangExpressionParser
llvm-svn: 264847
2016-03-30 08:45:37 +00:00
Pavel Labath ec62c0559f Fix flakyness in TestWatchpointMultipleThreads
Summary:
the inferior in the test deliberately does not lock a mutex when accessing the watched variable.
The reason for that is unclear as, based on the logs, the original intention of the test was to
check whether watchpoints get propagated to newly created threads, which should work fine even
with a mutex. Furthermore, in the unlikely event (which I have still observed happening from time
to time) that two threads do manage the execute the "critical section" simultaneously, the test
will fail, as it is expecting the watchpoint "hit count" to be 1, but in this case it will be 2.

Given this, I have simply chose to lock the mutex always, so that we have more predictible
behavior. Watchpoints being hit simultaneously is still (and correctly!) tested by
TestConcurrentEvents.

Reviewers: clayborg, jingham

Subscribers: lldb-commits

Differential Revision: http://reviews.llvm.org/D18558

llvm-svn: 264846
2016-03-30 08:43:54 +00:00
Chandler Carruth 8e06a10d1f [x86] Fix a horrible bug in our lowering of x86 floating point atomic
operations.

Specifically, we had code that tried to badly approximate reconstructing
all of the possible variations on addressing modes in two x86
instructions based on those in one pseudo instruction. This is not the
first bug uncovered with doing this, so stop doing it altogether.
Instead generically and pedantically copy every operand from the address
over to both new instructions, and strip kill flags from any register
operands.

This fixes a subtle bug seen in the wild where we would mysteriously
drop parts of the addressing mode, causing for example the index
argument in the added test case to just be completely ignored.

Hypothetically, this was an extremely bad miscompile because it actually
caused a predictable and leveragable write of a 64bit quantity to an
unintended offset (the first element of the array intead of whatever
other element was intended). As a consequence, in theory this could even
have introduced security vulnerabilities.

However, this was only something that could happen with an atomic
floating point add. No other operation could trigger this bug, so it
seems extremely unlikely to have occured widely in the wild.

But it did in fact occur, and frequently in scientific applications
which were using relaxed atomic updates of a floating point value after
adding a delta. Those would end up being quite badly miscompiled by
LLVM, which is how we found this. Of course, this often looks like
a race condition in the code, but it was actually a miscompile.

I suspect that this whole RELEASE_FADD thing was a complete mistake.
There is no such operation, and I worry that anything other than add
will get remarkably worse codegeneration. But that's not for this
change....

llvm-svn: 264845
2016-03-30 08:41:59 +00:00
Ismail Donmez 22921c9cf5 Fix shared build after r264790
llvm-svn: 264844
2016-03-30 08:31:46 +00:00
George Rimar f1c0bf5b40 [ELF] - Do not keep undefined locals in .symtab
gold and bfd do not include the undefined locals in symtab.
We have no reasons to support that either.

That fixes PR27016

Differential revision: http://reviews.llvm.org/D18554

llvm-svn: 264843
2016-03-30 08:16:11 +00:00
Stephan Bergmann 17d7d14571 For MS ABI, emit dllexport friend functions defined inline in class
...as that is apparently what MSVC does.  This is an updated version of r263738,
which had to be reverted in r263740 due to test failures.  The original version
had erroneously emitted functions that are defined in class templates, too (see
the updated "Handle friend functions" code in EmitDeferredDecls,
lib/CodeGen/ModuleBuilder.cpp).  (The updated tests needed to be split out into
their own dllexport-ms-friend.cpp because of the CHECK-NOTs which would have
interfered with subsequent CHECK-DAGs in dllexport.cpp.)

Differential Revision: http://reviews.llvm.org/D18430

llvm-svn: 264841
2016-03-30 06:27:31 +00:00
Craig Topper e9ff01b2a7 [CodeGen] Mark EVT:getExtendedSizeInBits() as LLVM_READONLY.
I think I had tried this a long time back and some bots failed. Hoping that was with an older gcc and maybe now it will work.

llvm-svn: 264840
2016-03-30 05:26:43 +00:00
Jingyue Wu f190ed4355 [docs] Add gpucc publication and tutorial.
llvm-svn: 264839
2016-03-30 05:05:40 +00:00