Summary:
AVX512 added RCP14 and RSQRT instructions which improve accuracy over the legacy RCP and RSQRT instruction, but not enough accuracy to remove the need for a Newton Raphson refinement.
Currently we use these new instructions for the legacy packed SSE instrinics, but not the scalar instrinsics. And we use it for fast math optimization of division and reciprocal sqrt.
I think switching the legacy instrinsics maybe surprising to the user since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed.
As far at the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc and godbolt suggest they don't change which instructions they use here.
This patch adds new X86ISD nodes for the RCP14/RSQRT14 and uses those for the new intrinsics. Leaving the old intrinsics to use the old instructions.
Going forward I think our focus should be on
-Supporting 512-bit vectors, which will have to use the RCP14/RSQRT14.
-Using RSQRT28/RCP28 to remove the Newton Raphson step on processors with AVX512ER
-Supporting double precision.
Reviewers: zvi, DavidKreitzer, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39583
llvm-svn: 317413
Summary:
Posix core files sometime don't contain enough information to correctly
detect the OS. If that is the case we should use the OS from the target
instead as it will contain usable information in more cases and if the
target and the core contain different OS-es then we are already in a
pretty bad state so moving from an unknown OS to a known (but possibly
incorrect) OS will do no harm.
We already had similar code in place for MIPS. This change tries to make
it more generic by using ArchSpec::MergeFrom and extends it to all
architectures but some MIPS specific issue prevent us from getting rid
of special casing MIPS.
Reviewers: clayborg, nitesh.jain
Subscribers: aemerson, sdardis, arichardson, kristof.beyls, lldb-commits
Differential Revision: https://reviews.llvm.org/D36046
llvm-svn: 317411
AMDGPULibFunc hardcodes address space values of the old address space mapping,
which causes invalid addrspacecast instructions and undefined functions in
APPSDK sample MonteCarloAsianDP.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D39616
llvm-svn: 317409
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.
Originally commited as r317374, but reverted in r317395 to update some missed
tests.
Differential Revision: https://reviews.llvm.org/D35702
llvm-svn: 317408
Stop using SectionKey for creating output sections.
Initially SectionKey was designed because we merged section with
use of Flags and Alignment fields. Currently LLD merges them by name only,
except the case when -relocatable output is produced. In that case
we still merge sections only with the same flags and alignment.
There is probably no issue at all to stop using Flags and Alignment for -r and
just disable the merging in that case.
After doing that change we can get rid of using SectionKey. That is not only
simplifies the code, but also gives some perfomance boost.
I tried to link chrome and mozilla, results are next:
* chrome link time goes from 1,666750355s to 1,551585364s, that is about 7%.
* mozilla time changes from 3,210261947 to 3,153782940, or about 2%.
Differential revision: https://reviews.llvm.org/D39594
llvm-svn: 317406
Currently LLD tries to use information about functions and variables location
taking it from debug sections. When --strip-* is given we discard such sections
and that breaks error reporting.
Patch stops discarding such sections and just removes them from InputSections list.
Differential revision: https://reviews.llvm.org/D39550
llvm-svn: 317405
This allows masked operations to be used and allows the register allocator to use YMM16-31 if necessary.
As a follow up I'll look into teaching EVEX->VEX how to turn this back into PERM2X128 if any of the additional features don't work out.
llvm-svn: 317403
GNU frontends don't have options like /MT, /MD
This fixes a few link error regressions with libc++ and libc++abi
Reviewers: rnk, mstorsjo, compnerd
Differential Revision: https://reviews.llvm.org/D33620
llvm-svn: 317398
This is a re-apply of rL313082 which was reverted in rL313088
In rL289668 the ability to specify the default linker at compile time
was added but because the MinGW driver used custom detection we could
not take advantage of this new CMAKE flag CLANG_DEFAULT_LINKER.
rL289668 added no test cases and the mingw driver was either overlooked
or purposefully skipped because it has some custom linker tests
Removing them here because they are covered by the generic case.
Reviewers: rnk
Differntial Revision: https://reviews.llvm.org/D37727
llvm-svn: 317397
Add a mix of postive and negative tests to check that wrong Decls won't be
flagged in the diagnostic. Split the check everything test and moved the
pieces closer to where the related tests are.
llvm-svn: 317394
Summary: The PARENT_TARGET was correctly set under APPLE but not under linux.
Reviewers: kubamracek, samsonov
Subscribers: dberris, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D39621
llvm-svn: 317391
Summary:
Call NanoTime() in primary 64 bit allocator only when necessary,
otherwise the unwarranted syscall causes problems in sandbox environments.
ReleaseToOSIntervalMs() conditional allows them to turn the feature off
with allocator_release_to_os_interval_ms=-1 flag.
Reviewers: eugenis
Subscribers: kubamracek, llvm-commits
Differential Revision: https://reviews.llvm.org/D39624
llvm-svn: 317386
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.
This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.
llvm-svn: 317379
This preserves the debug info for the cast operation in the original location.
rdar://problem/33460652
Reapplied r317340 with the test moved into an ARM-specific directory.
llvm-svn: 317375
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.
Differential Revision: https://reviews.llvm.org/D35702
llvm-svn: 317374
Now that we have only SymbolBody as the symbol class. So, "SymbolBody"
is a bit strange name now. This is a mechanical change generated by
perl -i -pe s/SymbolBody/Symbol/g $(git grep -l SymbolBody lld/ELF lld/COFF)
nd clang-format-diff.
Differential Revision: https://reviews.llvm.org/D39459
llvm-svn: 317370
Merging conditional stores tries to check to see if the code is if convertible after the store is moved. But the store hasn't been moved yet so its being counted against the threshold.
The patch adds 1 to the threshold comparison to make sure we don't count the store. I've adjusted a test to use a lower threshold to ensure we still do that conversion with the lower threshold.
Differential Revision: https://reviews.llvm.org/D39570
llvm-svn: 317368
This class was split between libIR and libSupport, which breaks under
modular code generation. Move it into the one library that uses it,
ProfileData, to resolve this issue.
llvm-svn: 317366
Adds blacklist parsing behaviour for filtering results into four categories:
- Expected Protected: Things that are not in the blacklist and are protected.
- Unexpected Protected: Things that are in the blacklist and are protected.
- Expected Unprotected: Things that are in the blacklist and are unprotected.
- Unexpected Unprotected: Things that are not in the blacklist and are unprotected.
now can optionally be invoked with a second command line argument, which specifies the blacklist file that the binary was built with.
Current statistics for chromium:
Reviewers: vlad.tsyrklevich
Subscribers: mgorny, llvm-commits, pcc, kcc
Differential Revision: https://reviews.llvm.org/D39525
llvm-svn: 317364
Summary:
Stop using the Linux solution with pthread_key_create(3).
This approach does not work on NetBSD, because calling
the thread destructor is not the latest operation on a POSIX
thread entity. NetBSD's libpthread still calls at least
pthread_mutex_lock and pthread_mutex_unlock.
Detect _lwp_exit(2) call as it is really the latest operation
called from a detaching POSIX thread.
This resolves one set of crashes observed in
the Thread Sanitizer execution.
Sponsored by <The NetBSD Foundation>
Reviewers: joerg, kcc, vitalybuka, dvyukov, eugenis
Reviewed By: vitalybuka
Subscribers: llvm-commits, kubamracek, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D39618
llvm-svn: 317363
This recommit r317351 after fixing a buildbot failure.
Original commit message:
Summary:
This change add a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :
1) If a call site is dominated by an OR condition and if any of its arguments
are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.
Split from :
if (!ptr || c)
callee(ptr);
to :
if (!ptr)
callee(null ptr) // set the known constant value
else if (c)
callee(nonnull ptr) // set non-null attribute in the argument
2) We can also split a call-site based on constant incoming values of a PHI
For example,
from :
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2, label %BB1
BB1:
br label %BB2
BB2:
%p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
call void @bar(i32 %p)
to
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2-split0, label %BB1
BB1:
br label %BB2-split1
BB2-split0:
call void @bar(i32 0)
br label %BB2
BB2-split1:
call void @bar(i32 1)
br label %BB2
BB2:
%p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
llvm-svn: 317362
DenseMaps require the definition of a type to be available when using a
pointer to that type as a key to know how many bits are available for
tombstone/etc.
llvm-svn: 317360