Commit Graph

393019 Commits

Author SHA1 Message Date
Philip Reames 9ffa90d6c2 [LV] Disable epilogue vectorization for non-latch exits
When skimming through old review discussion, I noticed a post commit comment on an earlier patch which had gone unaddressed.  Better late (4 months), than never right?

I'm not aware of an active problem with the combination of non-latch exits and epilogue vectorization, but the interaction was not considered and I'm not modivated to make epilogue vectorization work with early exits. If there were a bug in the interaction, it would be pretty hard to hit right now (as we canonicalize towards bottom tested loops), but an upcoming change to allow multiple exit loops will greatly increase the chance for error.  Thus, let's play it safe for now.
2021-07-06 10:57:10 -07:00
Philip Reames 600624a103 [LoopVersion] Move an assert [nfc-ish] 2021-07-06 10:57:10 -07:00
Eli Friedman 74d6ce5d5f [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers.
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.

There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.

The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.

As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base.  This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.

This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes).  The resulting changes
seem okay to me, but suggestions welcome.  As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.

I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out
how to repair it to test what it was actually trying to test.

Differential Revision: https://reviews.llvm.org/D104806
2021-07-06 10:54:41 -07:00
Jonas Paulsson 458eac2573 [SystemZ] Support the 'N' code for the odd register in inline-asm.
The odd register of a (128 bit) register pair is accessed with the 'N' code
with an inline assembly operand.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D105502
2021-07-06 19:46:49 +02:00
Jeremy Morse 2b2ffb7bdc [DebugInfo][InstrRef][3/4] Produce DBG_INSTR_REFs for all variable locations
This patch emits DBG_INSTR_REFs for two remaining flavours of variable
locations that weren't supported: copies, and inter-block VRegs. There are
still some locations that must be represented by DBG_VALUE such as
constants, but they're mostly independent of optimisations.

For variable locations that refer to values defined in different blocks,
vregs are allocated before isel begins, but the defining instruction
might not exist until late in isel. To get around this, emit
DBG_INSTR_REFs in a "half done" state, where the first operand refers to a
VReg. Then at the end of isel, patch these back up to refer to
instructions, using the finalizeDebugInstrRefs method.

Copies are something that I complained about the original RFC, and I
really don't want to have to put instruction numbers on copies. They don't
define a value: they move them. To address this isel, salvageCopySSA
interprets:
 * COPYs,
 * SUBREG_TO_REG,
 * Anything that isCopyInstr thinks is a copy.
And follows chains of copies back to the defining instruction that they
read from. This relies on any physical registers that COPYs read being
defined in the same block, or being entry-block arguments. For the former
we can put an instruction number on the defining instruction; for the
latter we can drop a DBG_PHI that reads the incoming value.

Differential Revision: https://reviews.llvm.org/D88896
2021-07-06 18:31:38 +01:00
Craig Topper 2b5e53111a [RISCV] Add support for matching vwmul(u) and vwmacc(u) from fixed vectors.
This adds a DAG combine to detect sext/zext inputs and emit a
new ISD opcode. The extends will either be removed or replaced
with narrower extends.

Isel patterns are used to match add and widening mul to vwmacc
similar to the recently added vmacc patterns.

There's still some work to be to match vmulsu.
We should also rewrite splats that were extended as scalars and
then splatted.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D104802
2021-07-06 10:24:31 -07:00
Arnold Schwaighofer 846a530e7d Fix coro lowering of single predecessor phis
Code assumes that uses of single predecessor phis are not live accross
suspend points. Cleanup any single predecessor phis preceeding the code
making this assumption.

rdar://76020301

Differential Revision: https://reviews.llvm.org/D105488
2021-07-06 10:22:25 -07:00
Simon Pilgrim b298308ba2 [CostModel][X86] fptosi/fptoui to i8/i16 are truncated from fptosi to i32
Provide a generic fallback that performs the fptosi to i32 types, then truncates to sub-i32 scalars.

These numbers can be tweaked for specific sse levels, but we should get the default handling in place first.
2021-07-06 17:28:03 +01:00
ShihPo Hung f1cbea3e52 [RISCV] Remove Zvamo implication for v1.0-rc change
As v1.0-rc specs say Zvamo is removed from standard extension,
Zvamo has to be specified explicitly.

Reviewed By: evandro

Differential Revision: https://reviews.llvm.org/D105396
2021-07-07 00:14:58 +08:00
Jonas Paulsson 37a92f3b03 [SystemZ] Generate XC loop for memset 0 of variable length.
Benchmarking has shown that it is worthwhile to implement a variable length
memset of 0 with XC (exclusive or) like gcc does, instead of using a libcall.

This requires the use of the EXecute Relative Long (EXRL) instruction which
can now be done in a framework that can also be used with other target
instructions (not just XC).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D103865
2021-07-06 18:07:31 +02:00
Jon Chesterfield ddfb074a80 [libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global
[libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global

Folds some duplicates logic into a helper function, passes the new environment
struct into getLaunchVals which no longer reads the DeviceInfo global.

Implemented on top of D105237

Reviewed By: dhruvachak

Differential Revision: https://reviews.llvm.org/D105239
2021-07-06 17:06:38 +01:00
Alexey Bataev 4e1a0684f1 [SLP]Fix non-determinism in PHI sorting.
Compare type IDs and DFS numbering for basic block instead of addresses
to fix non-determinism.

Differential Revision: https://reviews.llvm.org/D105031
2021-07-06 08:45:45 -07:00
Arnold Schwaighofer 130ea3ceb4 Use swift mangling for resume functions
The resume partial functions generated for swift suspend points will now
use a Swift mangling suffix.

Await resume partial functions will use the suffix 'TQ'[0-9]+'_' (e.g "...TQ0_")
and suspend resume partial functions will use the suffix 'TY'[0-9]+'_'
(e.g "...TY1_").

Reviewed By: nate_chandler

Differential Revision: https://reviews.llvm.org/D104144
2021-07-06 08:27:46 -07:00
Nico Weber 3eb2fc4b50 [lld/mac] Partially implement -export_dynamic
This implements the part of -export_dynamic that adds external
symbols as dead strip roots even for executables.

It does not yet implement the effect -export_dynamic has for LTO.
I tried just replacing `config->outputType != MH_EXECUTE` with
`(config->outputType != MH_EXECUTE || config->exportDynamic)` in
LTO.cpp, but then local symbols make it into the symbol table too,
which is too much (and also doesn't match ld64). So punt on this
for now until I understand it better.
(D91583 may or may not be related too).

Differential Revision: https://reviews.llvm.org/D105482
2021-07-06 11:22:18 -04:00
Bradley Smith 5ab9000fbb [AArch64][SVE] Fix selection failures for scalable MLOAD nodes with passthru
Differential Revision: https://reviews.llvm.org/D105348
2021-07-06 14:17:23 +00:00
Louis Dionne 5ffa051447 [libc++] NFC: Remove outdated link to TS status 2021-07-06 10:04:54 -04:00
Louis Dionne cf005c4c50 [libc++] NFC: Move the status docs to their own subdirectory
This cleans up the libcxx/doc directory quite a bit and will avoid the
proliferation of status files in that directory as new standards are voted.
2021-07-06 09:48:34 -04:00
Florian Hahn ef0d147cdc
Recommit "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups.
This reverts commit 706bbfb35b.

The committed version moves the definition of VPReductionPHIRecipe out
of an ifdef only intended for ::print helpers. This should resolve the
build failures that caused the revert
2021-07-06 14:15:42 +01:00
Simon Pilgrim 6f3f9535fc [CostModel][X86] i8/i16 sitofp/uitofp are sext/zext to i32 for sitofp
Provide a generic fallback that extends sub-i32 scalars before using the existing sitofp instructions.

These numbers can be tweaked for specific sse levels, but we should get the default handling in place first.

We get the extension for free for non-vector loads.
2021-07-06 13:58:52 +01:00
Nico Weber f814cd7406 Revert "[profile][test] Improve coverage-linkage.cpp"
This reverts commit 36ba86fe8a.
Fails on some bots, see comments on
https://reviews.llvm.org/rG36ba86fe8a29cdf3251b786db7f342efde666cb2
2021-07-06 08:49:43 -04:00
Louis Dionne f7d8754312 [runtimes] Move enable_32bit to the DSL
This is necessary for from-scratch configurations to support the 32-bit
mode of the test suite.

Differential Revision: https://reviews.llvm.org/D105435
2021-07-06 08:42:07 -04:00
Kerry McLaughlin a7512401e5 [LV] Prevent vectorization with unsupported element types.
This patch adds a TTI function, isElementTypeLegalForScalableVector, to query
whether it is possible to vectorize a given element type. This is called by
isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if
any of the instruction types in the loop are unsupported, e.g:

  int foo(__int128_t* ptr, int N)
    #pragma clang loop vectorize_width(4, scalable)
    for (int i=0; i<N; ++i)
      ptr[i] = ptr[i] + 42;

This example currently crashes if we attempt to vectorize since i128 is not a
supported type for scalable vectorization.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D102253
2021-07-06 13:06:21 +01:00
Peter Waller c5dfee44b9 [CodeGen][AArch64][SVE] Use ld1r[bhsd] for vector splat from memory
This avoids the use of the vector unit for copying from scalar to
vector. There is an extra ptrue instruction, but a predicate register
with the ptrue pattern populated is likely to be free in the context of
real code.

Tests were generated from a template to cover the axes mentioned at the
top of the test file.

Co-authored-by: Francesco Petrogalli <francesco.petrogalli@arm.com>

Differential Revision: https://reviews.llvm.org/D103170
2021-07-06 12:03:54 +00:00
Florian Mayer 745758acf3 [hwasan] Fix incorrect candidate matching for stack OOB.
We would find an address with matching tag, only to discover in
ShowCandidate that it's very far away from [stack].

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105197
2021-07-06 12:24:07 +01:00
Florian Hahn 706bbfb35b
Revert "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups
This reverts commit 3fed6d443f,
bbcbf21ae6 and
6c3451cd76.

The changes causing build failures with certain configurations, e.g.
https://lab.llvm.org/buildbot/#/builders/67/builds/3365/steps/6/logs/stdio

    lib/libLLVMVectorize.a(LoopVectorize.cpp.o): In function `llvm::VPRecipeBuilder::tryToCreateWidenRecipe(llvm::Instruction*, llvm::ArrayRef<llvm::VPValue*>, llvm::VFRange&, std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan> >&) [clone .localalias.8]':
    LoopVectorize.cpp:(.text._ZN4llvm15VPRecipeBuilder22tryToCreateWidenRecipeEPNS_11InstructionENS_8ArrayRefIPNS_7VPValueEEERNS_7VFRangeERSt10unique_ptrINS_5VPlanESt14default_deleteISA_EE+0x63b): undefined reference to `vtable for llvm::VPReductionPHIRecipe'
    collect2: error: ld returned 1 exit status
2021-07-06 12:10:03 +01:00
Florian Hahn 3fed6d443f
[VPlan] Mark overriden function in VPWidenPHIRecipe as virtual.
VPReductionRecipe overrides those implementations. Mark them as virtual
in the VPWidenPHIRecipe to unbreak build in certain configurations.
2021-07-06 12:00:41 +01:00
Florian Hahn bbcbf21ae6
[VPlan] Add destructor to VPReductionRecipe to unbreak build.
Attempt to unbreak
https://lab.llvm.org/buildbot/#/builders/67/builds/3363/steps/6/logs/stdio
2021-07-06 11:41:20 +01:00
Jay Foad c9d747e9cd [AMDGPU] Remove outdated comment and tidy up. NFC.
This was left over from D94746.
2021-07-06 11:29:36 +01:00
Florian Hahn 6c3451cd76
[VPlan] Add VPReductionPHIRecipe (NFC).
This patch is a first step towards splitting up VPWidenPHIRecipe into
separate recipes for the 3 distinct cases they model:

    1. reduction phis,
    2. first-order recurrence phis,
    3. pointer induction phis.

This allows untangling the code generation and allows us to reduce the
reliance on LoopVectorizationCostModel during VPlan code generation.

Discussed/suggested in D100102, D100113, D104197.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104989
2021-07-06 11:25:28 +01:00
Florian Mayer a0b1f3aac5 [hwasan] Check for overflow when searching candidates.
If the fault address is at the boundary of memory regions, this could
cause us to segfault otherwise.

Ran test with old compiler_rt to make sure it fails.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105032
2021-07-06 11:22:13 +01:00
Sebastian Neubauer db646de3ee [AMDGPU] Set optional PAL metadata
Set informational fields in the .shader_functions table.

Also correct the documentation, .scratch_memory_size and .lds_size are
integers.

Differential Revision: https://reviews.llvm.org/D105116
2021-07-06 11:58:00 +02:00
Uday Bondhugula 0c29f45ac9 [MLIR] Fix dialect conversion cancelRootUpdate
Fix dialect conversion ConversionPatternRewriter::cancelRootUpdate: the
erasure of operations here from the list of root update was off by one.
Should have been:
```
rootUpdates.erase(rootUpdates.begin() + (rootUpdates.rend() - it - 1));
```
instead of
```
rootUpdates.erase(rootUpdates.begin() + (rootUpdates.rend() - it));
```

or more directly:
```
rootUpdates.erase(it.base() - 1)
```

While on this, add an assertion to improve dev experience when a cancel is
called on an op on which a root update hasn't been started.

Differential Revision: https://reviews.llvm.org/D105397
2021-07-06 15:19:49 +05:30
Kerry McLaughlin 17b701c43c [LV] Collect a list of all element types found in the loop (NFC)
Splits `getSmallestAndWidestTypes` into two functions, one of which now collects
a list of all element types found in the loop (`ElementTypesInLoop`). This ensures we do not
have to iterate over all instructions in the loop again in other places, such as in D102253
which disables scalable vectorization of a loop if any of the instructions use invalid types.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105437
2021-07-06 10:37:41 +01:00
patacca f482497c38 [Polly][Isl] Use isl::set::tuple_dim, isl::map::domain_tuple_dim and isl::map::range_tuple_dim. NFC
This is part of an effort to reduce the differences between the custom C++ bindings used right now by polly in `lib/External/isl/include/isl/isl-noxceptions.h` and the official isl C++ interface.

Changes made:
 - Use `isl::set::tuple_dim` instead of `isl::set::dim` and `isl::set::n_dim`
 - Use `isl::map::domain_tuple_dim` instead of `isl::map::dim`
 - Use `isl::map::range_tuple_dim` instead of `isl::map::dim`
 - isl-noexceptions.h has been generated by this 45576e1b42

Note that not all the usage of `isl::{set,map}::dim` where replaced

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D104994
2021-07-06 11:20:45 +02:00
Adrian Kuegel cbb09c5b2c Revert "[clang] fixes named return of variables with dependent alignment"
This reverts commit 21106388eb.
It causes a segfault in certain cases.
2021-07-06 10:31:39 +02:00
Raphael Isemann 51ab17b91d [lldb][docs] Fix reference warnings in python-reference.rst
References with a single '`' around them are interpreted as references instead
of text with monospaced font since the introduction of the new Python API
generator. This meant that all the single-quoted code in this document that
doesn't reference any Python class was throwing sphinx errors. This just adds
the neede extra ` around this code and fixed up the legitimate typos
(e.g. `SBframe` -> `SBFrame`).
2021-07-06 10:14:33 +02:00
Valeriy Savchenko 6017cb31bb [analyzer][solver] Use all sources of constraints
Prior to this patch, we always gave priority to constraints that we
actually know about symbols in question.  However, these can get
outdated and we can get better results if we look at all possible
sources of knowledge, including sub-expressions.

Differential Revision: https://reviews.llvm.org/D105436
2021-07-06 11:09:08 +03:00
Nico Weber 64be5b7d87 [lld/mac] Implement -arch_multiple
This is the other flag clang passes when calling clang with two -arch
flags (which means with this, `clang -arch x86_64 -arch arm64 -fuse-ld=lld ...`
now no longer prints any warnings \o/). Since clang calls the linker several
times in that setup, it's not clear to the user from which invocation the
errors are. The flag's help text is

    Specifies that the linker should augment error and warning messages
    with the architecture name.

In ld64, the only effect of the flag is that undefined symbols are prefaced
with

    Undefined symbols for architecture x86_64:

instead of the usual "Undefined symbols:". So for now, let's add this
only to undefined symbol errors too. That's probably the most common
linker diagnostic.

Another idea would be to prefix errors and warnings with "ld64.lld(x86_64):"
instead of the usual "ld64.lld:", but I'm not sure if people would
misunderstand that as a comment about the arch of ld itself.
But open to suggestions on what effect this flag should have :) And we
don't have to get it perfect now, we can iterate on it.

Differential Revision: https://reviews.llvm.org/D105450
2021-07-06 00:25:18 -04:00
Albion Fung 203b48c71a [PowerPC] Removed a test case meant for a later patch
A test case meant for a later patch was accidentally
included.

Original Patch: https://reviews.llvm.org/D105236

Differential revision: https://reviews.llvm.org/D105454
2021-07-05 22:31:17 -05:00
Albion Fung 7d10dd60ce [PowerPC] Implament Load and Reserve and Store Conditional Builtins
This patch implaments the load and reserve and store conditional
builtins for the PowerPC target, in order to have feature parody with
xlC on AIX.

Differential revision: https://reviews.llvm.org/D105236
2021-07-05 21:35:41 -05:00
Matheus Izvekov 21106388eb [clang] fixes named return of variables with dependent alignment
Named return of a variable with aligned attribute would
trip an assert in case alignment was dependent.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D105380
2021-07-06 02:30:44 +02:00
Nico Weber 2c25f39fcc [lld/mac] Implement -final_output
This is one of two flags clang passes to the linker when giving calling
clang with multiple -arch flags.

I think it'd make sense to also use finalOutput instead of outputFile
in CodeSignatureSection() and when replacing @executable_path, but
ld64 doesn't do that, so I'll at least put those in separate commits.

Differential Revision: https://reviews.llvm.org/D105449
2021-07-05 20:06:26 -04:00
Nico Weber db64306d99 [lld/mac] Implement -umbrella
I think this is an old way for doing what is done with
-reexport_library these days, but it's e.g. still used in libunwind's
build (the opensource.apple.com one, not the llvm one).

Differential Revision: https://reviews.llvm.org/D105448
2021-07-05 20:06:25 -04:00
Jez Ng 718c32175b [lld-macho] Only emit one BIND_OPCODE_SET_SYMBOL per symbol
Size-wise, BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM is the most
expensive opcode, since it comes with an associated symbol string. We
were previously emitting it once per binding, instead of once per
symbol. This diff groups all bindings for a given symbol together and
ensures we only emit one such opcode per symbol. This matches ld64's
behavior.

While this is a relatively small win on chromium_framework (-72KiB), for
programs that have more dynamic bindings, the difference can be quite
large.

This change is perf-neutral when linking chromium_framework.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D105075
2021-07-05 20:00:19 -04:00
Steven Wan e2904c8e0f [clang] unbreak Index/preamble-reparse-changed-module.m with LLVM_APPEND_VC_REV=NO after 9964b0e
See revision b8b7a9d for prior art.
2021-07-05 19:51:00 -04:00
Christopher Di Bella 873e8b96b1 [compiler-rt][iwyu] explicitly includes `<new>` in xray_utils.cpp
Compiling compiler-rt with Clang modules and libc++ revealed that the
global `operator new` is being called without including `<new>`.

Differential Revision: https://reviews.llvm.org/D105401
2021-07-05 21:46:42 +00:00
David Tenty 9964b0ef82 [clang] Add -fdump-record-layouts-canonical option
This option implies -fdump-record-layouts but dumps record layout information with canonical field types, which can be more useful in certain cases when comparing structure layouts.

Reviewed By: stevewan

Differential Revision: https://reviews.llvm.org/D105112
2021-07-05 17:35:37 -04:00
Benjamin Kramer 775cac4cca [Bazel] Fix build for 98f078324f 2021-07-05 22:40:18 +02:00
Benjamin Kramer b3f5d0639e [Bazel] Fix build for 35d4593e6b 2021-07-05 22:40:18 +02:00
Jake Egan 52f34673ea [AIX] Add _AIX73 version macro
This patch defines _AIX73 version macro for AIX 7.3.

It extends the following patch https://reviews.llvm.org/D61530.

Reviewed By: xgupta

Differential Revision: https://reviews.llvm.org/D105185
2021-07-05 16:28:48 -04:00