Commit Graph

2034 Commits

Author SHA1 Message Date
Tobias Grosser 75dfaa1dbe BlockGenerator: Do not redundantly reload from PHI-allocas in non-affine stmts
Before this change we created an additional reload in the copy of the incoming
block of a PHI node to reload the incoming value, even though the necessary
value has already been made available by the normally generated scalar loads.
In this change, we drop the code that generates this redundant reload and
instead just reuse the scalar value already available.

Besides making the generated code slightly cleaner, this change also makes sure
that scalar loads go through the normal logic, which means they can be remapped
(e.g. to array slots) and corresponding code is generated to load from the
remapped location. Without this change, the original scalar load at the
beginning of the non-affine region would have been remapped, but the redundant
scalar load would continue to load from the old PHI slot location.

It might be possible to further simplify the code in addOperandToPHI,
but this would not only mean to pull out getNewValue, but to also change the
insertion point update logic. As this did not work when trying it the first
time, this change is likely not trivial. To not introduce bugs last minute, we
postpone further simplications to a subsequent commit.

We also document the current behavior a little bit better.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D28892

llvm-svn: 292486
2017-01-19 14:12:45 +00:00
Tobias Grosser 943c369c60 BlockGenerator: remove obfuscating const and const casts
Making certain values 'const' to just cast it away a little later mainly
obfuscates the code. Hence, we just drop the 'const' parts.

Suggested-by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 292480
2017-01-19 13:25:52 +00:00
Tobias Grosser 97b8490982 Use range-based for loop [NFC]
llvm-svn: 292471
2017-01-19 05:09:23 +00:00
Tobias Grosser e1ff0cf2eb Relax assert when setting access functions with invariant base pointers
Summary:
Instead of forbidding such access functions completely, we verify that their
base pointer has been hoisted and only assert in case the base pointer was
not hoisted.

I was trying for a little while to get a test case that ensures the assert is
correctly fired in case of invariant load hoisting being disabled, but I could
not find a good way to do so, as llvm-lit immediately aborts if a command
yields a non-zero return value. As we do not generally test our asserts,
not having a test case here seems OK.

This resolves http://llvm.org/PR31494

Suggested-by: Michael Kruse <llvm@meinersbur.de>

Reviewers: efriedma, jdoerfert, Meinersbur, gareevroman, sebpop, zinob, huihuiz, pollydev

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D28798

llvm-svn: 292213
2017-01-17 12:00:42 +00:00
Eli Friedman 71329901ea Tidy up getFirstNonBoxedLoopFor [NFC]
Move the function getFirstNonBoxedLoopFor which is used in ScopBuilder
and in ScopInfo to Support/ScopHelpers to make it reusable in other
locations. No functionality change.

Patch by Sameer Abu Asal.

Differential Revision: https://reviews.llvm.org/D28754

llvm-svn: 292168
2017-01-16 22:54:29 +00:00
Tobias Grosser 0032d87337 ScopInfo: document base pointers in alias-checks must be invariant [NFC]
Before this change, this code has been mixed with a check for non-affine
loops (and when originally introduce was also duplicated). By creating
a separate loop and explicitly documenting this property, the current
behavior becomes a lot more clear.

llvm-svn: 292140
2017-01-16 15:49:14 +00:00
Tobias Grosser f3c145f2ab ScopInfo: Improve comments in buildAliasGroup [NFC]
llvm-svn: 292139
2017-01-16 15:49:09 +00:00
Tobias Grosser 77f3257b41 ScopInfo: split out construction of a single alias group [NFC]
The loop body in buildAliasGroups is still too large to easily scan it. Hence,
we split the loop body out into a separate function to improve readability.

llvm-svn: 292138
2017-01-16 15:49:07 +00:00
Tobias Grosser e95222343c ScopInfo: Do not modify the original alias group [NFC]
Instead of modifying the original alias group and repurposing it as read-write
access group when splitting accesses in read-only and read-write accesses, we
just keep all three groups: the original alias group, the set of read-only
accesses and the set of read-write accesses.  This allows us to remove some
complicated iterator handling and also allows for more code-reuse in
calculateMinMaxAccess.

llvm-svn: 292137
2017-01-16 15:49:04 +00:00
Tobias Grosser 457eb579dd ScopInfo: No need to keep ReadOnlyAccesses in an additional map [NFC]
It seems over time we added an additional map that maps from the base address
of a read-only access to the actual access. However this map is never used.
Drop the creation and use of this map to simplify our alias check generation
code.

llvm-svn: 292126
2017-01-16 14:24:48 +00:00
Tobias Grosser dba2206b65 ScopInfo: no need to clear alias group explicitly
The alias group will anyhow be cleared at the end of this function and is not
used afterwards. We avoid an explicit clear() call at multiple places to
improve readability of this code.

llvm-svn: 292125
2017-01-16 14:13:01 +00:00
Tobias Grosser 21a059af09 Adjust formatting to commit r292110 [NFC]
llvm-svn: 292123
2017-01-16 14:08:10 +00:00
Tobias Grosser 92fd612c84 ScopInfo: Fold SmallVectors used in alias check generation back into loop [NFC]
Hoisting small vectors out of a loop seems to be a pure performance
optimization, which is unlikely to have great impact in practice. As this
hoisting just increases code-complexity, we fold the SmallVectors back into
the loop.

In subsequent commits, we will further simplify and structure this code, but
we committed this change separately to provide an explanation to make clear
that we purposefully reverted this optimization.

llvm-svn: 292122
2017-01-16 14:08:02 +00:00
Tobias Grosser e39f9127f9 ScopInfo: Extract out splitAliasGroupsByDomain [NFC]
The function buildAliasGroups got very large. We extract out the splitting
of alias groups to reduce its size and to better document the current behavior.

llvm-svn: 292121
2017-01-16 14:08:00 +00:00
Tobias Grosser 9edcf07e83 ScopInfo: Extract out buildAliasGroupsForAccesses [NFC]
The function buildAliasGroups got very large. We extract out the actual
construction of alias groups to reduce its size and to better document the
current behavior.

llvm-svn: 292120
2017-01-16 14:07:57 +00:00
Tobias Grosser 2ef3887d28 Do not track the isl PDF manual in SVN
There is no point in regularly committing a binary file to the repository, as
this just unnecessarily increases the repository size. Interested people can
find the isl manual for example at isl.gforge.inria.fr/manual.pdf.

llvm-svn: 292105
2017-01-16 11:48:03 +00:00
Hongbin Zheng 6aded2a0e4 Fix compilation on MSVC, NFC
Differential Revision: https://reviews.llvm.org/D28739

llvm-svn: 292067
2017-01-15 16:47:26 +00:00
Tobias Grosser 4d5a917287 Use typed enums to model MemoryKind and move MemoryKind out of ScopArrayInfo
To benefit of the type safety guarantees of C++11 typed enums, which would have
caught the type mismatch fixed in r291960, we make MemoryKind a typed enum.
This change also allows us to drop the 'MK_' prefix and to instead use the more
descriptive full name of the enum as prefix. To reduce the amount of typing
needed, we use this opportunity to move MemoryKind from ScopArrayInfo to a
global scope, which means the ScopArrayInfo:: prefix is not needed. This move
also makes historically sense. In the beginning of Polly we had different
MemoryKind enums in both MemoryAccess and ScopArrayInfo, which were later
canonicalized to one. During this canonicalization we just choose the enum in
ScopArrayInfo, but did not consider to move this shared enum to global scope.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D28090

llvm-svn: 292030
2017-01-14 20:25:44 +00:00
Tobias Grosser 67e94fb435 ScheduleOptimizer: Allow to set register width in command line
We use this option to set a fixed register width in our test cases to make
sure the results are identical accross platforms.

llvm-svn: 292002
2017-01-14 07:14:54 +00:00
Tobias Grosser e29db2173b Update to recent clang-format changes
llvm-svn: 291810
2017-01-12 21:05:19 +00:00
Eli Friedman c6e3b6f156 Delete stray isl_map_dump call.
llvm-svn: 291521
2017-01-10 01:08:11 +00:00
Tobias Grosser cdbe5c9d6c Fix some typos in comments
llvm-svn: 291247
2017-01-06 17:30:34 +00:00
Tobias Grosser 94e5371dde Update to isl-0.18-43-g0b4256f
Even more isl coalesce changes.

llvm-svn: 290783
2016-12-31 07:46:11 +00:00
Tobias Grosser ba3ea97689 Update to isl-0.18-28-gccb9f33
Another set of isl coalesce changes.

llvm-svn: 290681
2016-12-28 19:35:49 +00:00
Tobias Grosser 600941351e Update to isl-0.18-17-g2844ebf
This update improves isl's ability to coalesce different convex sets/maps,
especially when the contain existentially quantified variables.

llvm-svn: 290538
2016-12-26 12:11:40 +00:00
Roman Gareev 1c2927b209 Specify the default values of the cache parameters
If the parameters of the target cache (i.e., cache level sizes, cache level
associativities) are not specified or have wrong values, we use ones for
parameters of the macro-kernel and do not perform data-layout optimizations of
the matrix multiplication. In this patch we specify the default values of the
cache parameters to be able to apply the pattern matching optimizations even in
this case. Since there is no typical values of this parameters, we use the
parameters of Intel Core i7-3820 SandyBridge that also help to attain the
high-performance on IBM POWER System S822 and IBM Power 730 Express server.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D28090

llvm-svn: 290518
2016-12-25 16:32:28 +00:00
Tobias Grosser 0791d5f5aa ScheduleOptimizer: Fix spelling of option '-polly-target-throughput-vector-fma'
througput -> throughput

llvm-svn: 290418
2016-12-23 07:33:39 +00:00
Tobias Grosser ccae1ee4df Update isl to isl-0.18-9-gd4734f3
llvm-svn: 290389
2016-12-22 23:08:57 +00:00
Roman Gareev be5299af0b Change the determination of parameters of macro-kernel
Typically processor architectures do not include an L3 cache, which means that
Nc, the parameter of the micro-kernel, is, for all practical purposes,
redundant ([1]). However, its small values can cause the redundant packing of
the same elements of the matrix A, the first operand of the matrix
multiplication. At the same time, big values of the parameter Nc can cause
segmentation faults in case the available stack is exceeded.

This patch adds an option to specify the parameter Nc as a multiple of
the parameter of the micro-kernel Nr.

In case of Intel Core i7-3820 SandyBridge and the following options,

clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=8

it helps to improve the performance from 11.303 GFlops/sec (39,247% of
theoretical peak) to 17.896 GFlops/sec (62,14% of theoretical peak).

Refs.:

[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D28019

llvm-svn: 290256
2016-12-21 12:51:12 +00:00
Roman Gareev bd5c6039c6 Align newly created arrays to the first level cache line boundary
Aligning data to cache lines boundaries helps to avoid overheads related to
an access to it ([1]). This patch aligns newly created arrays and adds an
option to specify the first level cache line size. By default we use 64 bytes,
which is a typical cache-line size ([2]).

In case of Intel Core i7-3820 SandyBridge and the following options,

clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=8

it helps to improve the performance from 11.303 GFlops/sec (39,247% of
theoretical peak) to 12.63 GFlops/sec (43,8542% of theoretical peak).

Refs.:

[1] - http://www.alexonlinux.com/aligned-vs-unaligned-memory-access
[2] - http://igoro.com/archive/gallery-of-processor-cache-effects/

Differential Revision: https://reviews.llvm.org/D28020

Reviewed-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 290253
2016-12-21 12:37:36 +00:00
Roman Gareev 92c446016a [Polly] Use three-dimensional arrays to store packed operands of the matrix
multiplication

Previously we had two-dimensional accesses to store packed operands of
the matrix multiplication for the sake of simplicity of the packed arrays.
However, addition of the third dimension helps to simplify the corresponding
memory access, reduce the execution time of isl operations applied to it, and
consequently reduce the compile-time of Polly. For example, in case of
Intel Core i7-3820 SandyBridge and the following options,

clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=7

it helps to reduce the compile-time from about 361.456 seconds to about 0.816
seconds.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>,
             Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D27878

llvm-svn: 290251
2016-12-21 11:18:42 +00:00
Tobias Grosser b6945e3301 Fix clang-format
llvm-svn: 290103
2016-12-19 14:06:40 +00:00
Daniel Jasper e5f3eba9c3 Fix format after recent clang-format change.
llvm-svn: 290085
2016-12-19 07:54:15 +00:00
Alexandre Isoard cbed3ce39f Add isl_multi_pw_aff to GICHelper
Add isl_multi_pw_aff* to GICHelper and add some missing isl_pw_multi_aff* handlers.

llvm-svn: 290007
2016-12-16 23:41:26 +00:00
Roman Gareev 2606c48a1d Restrict ranges of extension maps
To prevent copy statements from accessing arrays out of bounds, ranges of their
extension maps are restricted, according to the constraints of domains.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D25655

llvm-svn: 289815
2016-12-15 12:35:59 +00:00
Roman Gareev 15db81ef71 [NFC] Fix typos in getMacroKernelParams.
llvm-svn: 289808
2016-12-15 12:00:57 +00:00
Roman Gareev 8babe1a216 The order of the loops defines the data reused in the BLIS implementation of
gemm ([1]). In particular, elements of the matrix B, the second operand of
matrix multiplication, are reused between iterations of the innermost loop.
To keep the reused data in cache, only elements of matrix A, the first operand
of matrix multiplication, should be evicted during an iteration of the
innermost loop. To provide such a cache replacement policy, elements of the
matrix A can, in particular, be loaded first and, consequently, be
least-recently-used.

In our case matrices are stored in row-major order instead of column-major
order used in the BLIS implementation ([1]). One of the ways to address it is
to accordingly change the order of the loops of the loop nest. However, it
makes elements of the matrix A to be reused in the innermost loop and,
consequently, requires to load elements of the matrix B first. Since the LLVM
vectorizer always generates loads from the matrix A before loads from the
matrix B and we can not provide it. Consequently, we only change the BLIS micro
kernel and the computation of its parameters instead. In particular, reused
elements of the matrix B are successively multiplied by specific elements of
the matrix A .

Refs.:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D25653

llvm-svn: 289806
2016-12-15 11:47:38 +00:00
Michael Kruse 7037fde427 Remove references to AssumptionCache. NFC.
The AssumptionCache was removed in r289756 after being replaced by the an
addtional operand list of affected values in r289755. The absence of that cache
means that we have now have to manually search for llvm.assume intrinsics as
now done by other passes (LazyValueInfo, CodeMetrics) do not take into
account an llvm::Instruction's user lists (ScalarEvolution).

llvm-svn: 289791
2016-12-15 09:25:14 +00:00
Tobias Grosser b02b6a8404 Adjust clang-format formatting to r289531
clang-format has been updated in r289531 to keep labels and values on
the same line. This change updates Polly to the new formatting style.

llvm-svn: 289533
2016-12-13 12:44:00 +00:00
Michael Kruse 79c0173f53 [ScheduleOptimizer] Fix memory leak. NFC.
llvm-svn: 289434
2016-12-12 14:51:06 +00:00
Michael Kruse b9a683d75d Add more ISL foreachElt functions. NFC.
Add and implement foreachElt for isl_map, isl_set and isl_union_set. These are
used by an out-of-tree patch which is in process of being upstreamed.

llvm-svn: 288924
2016-12-07 17:47:57 +00:00
Michael Kruse 2ead2bfc12 Add IslPtr type traits. NFC.
Add traits for isl_id and isl_multi_aff, required by out-of-tree patches
currently in progress of upstreaming.

isl_union_pw_aff_dump has been added to ISL during one of the last ISL
updates, such that we can also enable its dump() trait.

llvm-svn: 288915
2016-12-07 16:17:59 +00:00
Michael Kruse 1b8eb4104b Update to isl-0.17.1-314-g3106e8d
This version includes an update for imath (isl-0.17.1-49-g2f1c129). It fixes
the compilation under windows, which does not know ssize_t.

In addition, isl-0.17.1-288-g0500299 changed the way isl_test finds the source
directory. It now generates a file isl_srcdir.c at configure-time, containing
the source path, to not require setting the environment variable "srcdir" at
test-time. The cmake build system had to be modified to also generate that file.

llvm-svn: 288811
2016-12-06 14:37:39 +00:00
Johannes Doerfert bda814350a Allow to disable unsigned operations (zext, icmp ugt, ...)
Unsigned operations are often useful to support but the heuristics are
not yet tuned. This options allows to disable them if necessary.

llvm-svn: 288521
2016-12-02 17:55:41 +00:00
Johannes Doerfert a94ae1aede Do not allow multiple possibly aliasing ptrs in an expression
Relational comparisons should not involve multiple potentially
  aliasing pointers. Similarly this should hold for switch conditions
  and the two conditions involved in equality comparisons (separately!).
  This is a heuristic based on the C semantics that does only allow such
  operations when the base pointers do point into the same object.
  Since this makes aliasing likely we will bail out early instead of
  producing a probably failing runtime check.

llvm-svn: 288516
2016-12-02 17:49:52 +00:00
Johannes Doerfert 2df9963fe3 Rerun mem2reg after the inliner
It did happen that after the inliner finished we end up with promotable
allocas in a function. We now run mem2reg to make sure everything is
promoted if possible.

llvm-svn: 288514
2016-12-02 17:43:57 +00:00
Tobias Grosser bedef00e2c [ScopInfo] Fold constant coefficients in array dimensions to the right
This allows us to delinearize code such as the one below, where the array
sizes are A[][2 * n] as there are n times two elements in the innermost
dimension. Alternatively, we could try to generate another dimension for the
struct in the innermost dimension, but as the struct has constant size,
recovering this dimension is easy.

   struct com {
     double Real;
     double Img;
   };

   void foo(long n, struct com A[][n]) {
     for (long i = 0; i < 100; i++)
       for (long j = 0; j < 1000; j++)
         A[i][j].Real += A[i][j].Img;
   }

   int main() {
     struct com A[100][1000];
     foo(1000, A);

llvm-svn: 288489
2016-12-02 08:10:56 +00:00
Tobias Grosser 491b799a4d [ScopInfo] Separate construction and finalization of memory accesses [NFC]
After having built memory accesses we perform some additional transformations
on them to increase the chances that our delinearization guesses the right
shape. Only after these transformations, we take the assumptions that the
array shape we predict is such that no out-of-bounds memory accesses arise.

Before this change, the construction of the memory access, the access folding
that improves the represenation for certain parametric subscripts, and taking
the assumption was all done right after a memory access was created. In this
change we split this now into three separate iterations over all memory
accesses. This means only after all memory accesses have been built, we start
to canonicalize accesses, and to take assumptions. This split prepares for
future canonicalizations that must consider all memory accesses for deriving
additional beneficial transformations.

llvm-svn: 288479
2016-12-02 05:21:22 +00:00
Johannes Doerfert b1d6608430 [NFC] Check for feasibility prior to the profitability check
Feasibility is checked late on its own but early it is hidden behind
the "PollyProcessUnprofitable" guard. This change will make sure we opt
out early if the runtime context is infeasible anyway.

llvm-svn: 288329
2016-12-01 11:12:14 +00:00
Johannes Doerfert b6c5a5dd01 [FIX] Do not try to hoist obviously overwritten loads
llvm-svn: 288328
2016-12-01 11:10:45 +00:00