Commit Graph

1053 Commits

Author SHA1 Message Date
Roman Gareev 23df27682a Map the new load to the base pointer of the invariant load hoisted load
Map the new load to the base pointer of the invariant load hoisted load
to be able to find the alias information for it.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30605

llvm-svn: 298507
2017-03-22 13:57:53 +00:00
Tobias Grosser b28f86e9e6 [CodeGen] Remove need for all parameters to be in scop context for load hoisting.
When not adding constraints on parameters using -polly-ignore-parameter-bounds,
the context may not necessarily list all parameter dimensions. To support code
generation in this situation, we now always iterate over the actual parameter
list, rather than relying on the context to list all parameter dimensions.

llvm-svn: 298197
2017-03-18 23:12:49 +00:00
Tobias Grosser 7693b116a1 [OpenMP] Do not emit lifetime markers for context
In commit r219005 lifetime markers have been introduced to mark the lifetime of
the OpenMP context data structure. However, their use seems incorrect and
recently caused a miscompile in ASC_Sequoia/CrystalMk after r298053 which was
not at all related to r298053. r298053 only caused a change in the loop order,
as this change resulted in a different isl internal representation which caused
the scheduler to derive a different schedule. This change then caused the IR to
change, which apparently created a pattern in which LLVM exploites the lifetime
markers. It seems we are using the OpenMP context outside of the lifetime
markers. Even though CrystalMk could probably be fixed by expanding the scope of
the lifetime markers, it is not clear what happens in case the OpenMP function
call is in a loop which will cause a sequence of starting and ending lifetimes.
As it is unlikely that the lifetime markers give any performance benefit, we
just drop them to remove complexity.

llvm-svn: 298192
2017-03-18 20:10:07 +00:00
Michael Kruse f3091bf4cf [PruneUnprofitable] Add -polly-prune-unprofitable pass.
ScopInfo's normal profitability heuristic considers SCoPs where all
statements have scalar writes as not profitably optimizable and
invalidate the SCoP in that case. However, -polly-delicm and
-polly-simplify may be able to remove some of the scalar writes such
that the flag -polly-unprofitable-scalar-accs=false allows disabling
that part of the heuristic.

In cases where DeLICM (or other passes after ScopInfo) are not
successful in removing scalar writes, the SCoP is still not profitably
optimizable. The schedule optimizer would again try computing another
schedule, resulting in slower compilation.

The -polly-prune-unprofitable pass applies the profitability heuristic
again before the schedule optimizer Polly can still bail out even with
-polly-unprofitable-scalar-accs=false.

Differential Revision: https://reviews.llvm.org/D31033

llvm-svn: 298080
2017-03-17 13:09:52 +00:00
Siddharth Bhat 65f3d5201e [DependenceInfo] Track may-writes and build flow information in
Dependences::calculateDependences.

This ensures that we handle may-writes correctly when building
dependence information. Also add a test case checking correctness of
may-write information. Not handling it before was an oversight.

Differential Revision: https://reviews.llvm.org/D31075

llvm-svn: 298074
2017-03-17 12:31:28 +00:00
Siddharth Bhat 65c4026992 Set Dependences::RED to be non-null once Dependences::calculateDependences()
occurs, even if there is no actual reduction. This ensures correctness
with isl operations.

llvm-svn: 297981
2017-03-16 20:06:49 +00:00
Tobias Grosser c9d4cb2f42 [ScheduleOptimizer] Allow tiling after fusion
In ScheduleOptimizer::isTileableBand(), allow the case in which
the band node's child is an isl_schedule_sequence_node and its
grandchildren isl_schedule_leaf_nodes. This case can arise when
two or more statements are fused by the isl scheduler.

The tile_after_fusion.ll test has two statements in separate
loop nests and checks whether they are tiled after being fused
when polly-opt-fusion equals "max".

Reviewers: grosser

Subscribers: gareevroman, pollydev

Tags: #polly

Contributed-by: Theodoros Theodoridis <theodort@student.ethz.ch>

Differential Revision: https://reviews.llvm.org/D30815

llvm-svn: 297587
2017-03-12 19:02:31 +00:00
Michael Kruse 0446d81e2d [Simplify] Add -polly-simplify pass.
This new pass removes unnecessary accesses and writes. It currently
supports 2 simplifications, but more are planned.

It removes write accesses that write a loaded value back to the location
it was loaded from. It is a typical artifact from DeLICM. Removing it
will get rid of bogus dependencies later in dependency analysis.

It also removes statements without side-effects. ScopInfo already
removes these, but the removal of unnecessary writes can result in
more side-effect free statements.

Differential Revision: https://reviews.llvm.org/D30820

llvm-svn: 297473
2017-03-10 16:05:24 +00:00
Tobias Grosser 8bd7f3c0a5 [ScopDetect/Info] Allow unconditional hoisting of loads from dereferenceable ptrs
In case LLVM pointers are annotated with !dereferencable attributes/metadata
or LLVM can look at the allocation from which a pointer is derived, we can know
that dereferencing pointers is safe and can be done unconditionally. We use this
information to proof certain pointers as save to hoist and then hoist them
unconditionally.

llvm-svn: 297375
2017-03-09 11:36:00 +00:00
Michael Kruse 9fb3ab1b19 [DeLICM] Add -polly-delicm-overapproximate-writes option.
One of the current limitations of DeLICM is that it only creates
PHI WRITEs that it knows are read by some PHI. Such writes may not span
all instances of a statement. Polly's code generator currently does not
support MemoryAccesses that are not executed in all instances
('partial accesses') and so has to give up on a possible mapping.

This workaround has once been suggested by Tobias Grosser: Try to
interpolate an arbitrary expansion to all instances. It will be checked
for possible conflicts with the existing Knowledge and can be applied if
the conflict checking result is that no semantics are changed.

Expansion is done by simplifying the mapping by coalescing with the hope
that coalescing will find a polyhedral 'rule' of the relevant map. It is
then 'gist'-ed using the domain of the relevant instances such that the
rule is expanded to the universe and finally intersected with the domain
of all statement instances.

The expansion makes conflicts become more likely, the found rule may
still not encompass all statement instances and the found rule exposes
internals of isl's implementation of coalesce and gist. The latter means
that the result depends on how much effort the implementation invests
into finding a rule which may change between versions of isl. Trivial
implementations of gist and coalesce just return the input arguments.

A patch that makes codegen support partial accesses is in preparation
as well.

Differential Revision: https://reviews.llvm.org/D30763

llvm-svn: 297373
2017-03-09 11:23:22 +00:00
Michael Kruse 6744efa8d8 [ScopDetection] Only allow SCoP-wide available base pointers.
Simplify ScopDetection::isInvariant(). Essentially deny everything that
is defined within the SCoP and is not load-hoisted.

The previous understanding of "invariant" has a few holes:

- Expressions without side-effects with only invariant arguments, but
  are defined withing the SCoP's region with the exception of selects
  and PHIs. These should be part of the index expression derived by
  ScalarEvolution and not of the base pointer.

- Function calls with that are !mayHaveSideEffects() (typically
  functions with "readnone nounwind" attributes). An example is given
  below.

      @C = external global i32
      declare float* @getNextBasePtr(float*) readnone nounwind
      ...
      %ptr = call float* @getNextBasePtr(float* %A, float %B)

  The call might return:

  * %A, so %ptr aliases with it in the SCoP
  * %B, so %ptr aliases with it in the SCoP
  * @C, so %ptr aliases with it in the SCoP
  * a new pointer everytime it is called, such as malloc()
  * a pointer into the allocated block of one of the aforementioned
  * any of the above, at random at each call

  Hence and contrast to a comment in the base_pointer.ll regression
  test, %ptr is not necessarily the same all the time. It might also
  alias with anything and no AliasAnalysis can tell otherwise if the
  definition is external. It is hence not suitable in the role of a
  base pointer.

The practical problem with base pointers defined in SCoP statements is
that it is not available globally in the SCoP. The statement instance
must be executed first before the base pointer can be used. This is no
problem if the base pointer is transferred as a scalar value between
statements. Uses of MemoryAccess::setNewAccessRelation may add a use of
the base pointer anywhere in the array. setNewAccessRelation is used by
JSONImporter, DeLICM and D28518. Indeed, BlockGenerator currently
assumes that base pointers are available globally and generates invalid
code for new access relation (referring to the base pointer of the
original code) if not, even if the base pointer would be available in
the statement.

This could be fixed with some added complexity and restrictions. The
ExprBuilder must lookup the local BBMap and code that call
setNewAccessRelation must check whether the base pointer is available
first.

The code would still be incorrect in the presence of aliasing. There
is the switch -polly-ignore-aliasing to explicitly allow this, but
it is hardly a justification for the additional complexity. It would
still be mostly useless because in most cases either getNextBasePtr()
has external linkage in which case the readnone nounwind attributes
cannot be derived in the translation unit itself, or is defined in the
same translation unit and gets inlined.

Reviewed By: grosser

Differential Revision: https://reviews.llvm.org/D30695

llvm-svn: 297281
2017-03-08 15:14:46 +00:00
Michael Kruse 5a4ec5c42b [ScopDetection] Require LoadInst base pointers to be hoisted.
Only when load-hoisted we can be sure the base pointer is invariant
during the SCoP's execution. Most of the time it would be added to
the required hoists for the alias checks anyway, except with
-polly-ignore-aliasing, -polly-use-runtime-alias-checks=0 or if
AliasAnalysis is already sure it doesn't alias with anything
(for instance if there is no other pointer to alias with).

Two more parts in Polly assume that this load-hoisting took place:
- setNewAccessRelation() which contains an assert which tests this.
- BlockGenerator which would use to the base ptr from the original
  code if not load-hoisted (if the access expression is regenerated)

Differential Revision: https://reviews.llvm.org/D30694

llvm-svn: 297195
2017-03-07 20:28:43 +00:00
Tobias Grosser 6c9958e0b3 [tests] Make sure tests do not end in 'unreachable' - Part III
There is no point in optimizing unreachable code, hence our test cases should
always return.

This commit is part of a series that makes Polly more robust on the presence of
unreachables.

llvm-svn: 297158
2017-03-07 16:28:53 +00:00
Tobias Grosser 2d233fb35d [tests] Update bounds-check elimination test cases
These test cases should work in combination with
https://reviews.llvm.org/D12676, but became outdated over time. Update them
in preparation of discussions with Daniel Berlin on how to represent unreachable
in the post-dominator tree.

llvm-svn: 297157
2017-03-07 16:17:58 +00:00
Tobias Grosser 134a572951 [ScopDetection] Do not detect scops that exit to an unreachable
Scops that exit with an unreachable are today still permitted, but make little
sense to optimize. We therefore can already skip them during scop detection.
This speeds up scop detection in certain cases and also ensures that bugpoint
does not introduce unreachables when reducing test cases.

In practice this change should have little impact, as the performance of
unreachable code is unlikely to matter.

This commit is part of a series that makes Polly more robust in the presence
of unreachables.

llvm-svn: 297151
2017-03-07 15:50:43 +00:00
Tobias Grosser 87dcd46aa7 [tests] Make sure tests do not end in 'unreachable' - Part II
There is no point in optimizing unreachable code, hence our test cases should
always return.

This commit is part of a series that makes Polly more robust on the presence of
unreachables.

llvm-svn: 297150
2017-03-07 15:23:30 +00:00
Tobias Grosser 2dc1f547ae [tests] Make sure tests do not end in 'unreachable'
There is no point in optimizing unreachable code, hence our test cases should
always return.

This commit is part of a series that makes Polly more robust on the presence of
unreachables.

llvm-svn: 297147
2017-03-07 15:17:23 +00:00
Sanjoy Das b641a90529 Adapt to llvm change r296992 to unbreak the bots
r296992 made ScalarEvolution's CompareValueComplexity less aggressive,
and that broke the polly test being fixed in this change.  This change
explicitly bumps CompareValueComplexity in said test case to make it
pass.

Can someone from the polly team please can give me an idea on if this
case is important enough to have
scalar-evolution-max-value-compare-depth be 3 by default?

llvm-svn: 296994
2017-03-06 01:12:16 +00:00
Tobias Grosser 7d136d952e [tests] Specify the dependence to NVPTX backend for Polly ACC test cases
Some Polly ACC test cases fail without a working NVPTX backend. We explicitly
specify this dependence in REQUIRES. Alternatively, we could have only marked
polly-acc as supported in case the NVPTX backend is available, but as we might
use other backends in the future, this does not seem to be the best choice.

For this to work, we also need to make the 'targets_to_build' information
available.

Suggested-by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 296853
2017-03-03 03:38:50 +00:00
Tobias Grosser 9d551da5c1 [test] Do not emit binary data to output
Suggested-by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 296852
2017-03-03 03:24:34 +00:00
Tobias Grosser 7a93d94a8f Revert "Currently broken by recent LLVM upstream changes"
This reverts commit r296579, which is not needed anymore as the relevant changes
in trunk have been reverted.

llvm-svn: 296817
2017-03-02 21:43:50 +00:00
Tobias Grosser 1c787e0b49 [ScopDetection] Do not allow required-invariant loads in non-affine region
These loads cannot be savely hoisted as the condition guarding the
non-affine region cannot be duplicated to also protect the hoisted load
later on. Today they are dropped in ScopInfo. By checking for this early, we
do not even try to model them and possibly can still optimize smaller regions
not containing this specific required-invariant load.

llvm-svn: 296744
2017-03-02 12:15:37 +00:00
Tobias Grosser c2f151084d [ScopInfo] Disable memory folding in case it results in multi-disjunct relations
Multi-disjunct access maps can easily result in inbound assumptions which
explode in case of many memory accesses and many parameters. This change reduces
compilation time of some larger kernel from over 15 minutes to less than 16
seconds.

Interesting is the test case test/ScopInfo/multidim_param_in_subscript.ll
which has a memory access

  [n] -> { Stmt_for_body3[i0, i1] -> MemRef_A[i0, -1 + n - i1] }

which requires folding, but where only a single disjunct remains. We can still
model this test case even when only using limited memory folding.

For people only reading commit messages, here the comment that explains what
memory folding is:

To recover memory accesses with array size parameters in the subscript
expression we post-process the delinearization results.

We would normally recover from an access A[exp0(i) * N + exp1(i)] into an
array A[][N] the 2D access A[exp0(i)][exp1(i)]. However, another valid
delinearization is A[exp0(i) - 1][exp1(i) + N] which - depending on the
range of exp1(i) - may be preferrable. Specifically, for cases where we
know exp1(i) is negative, we want to choose the latter expression.

As we commonly do not have any information about the range of exp1(i),
we do not choose one of the two options, but instead create a piecewise
access function that adds the (-1, N) offsets as soon as exp1(i) becomes
negative. For a 2D array such an access function is created by applying
the piecewise map:

[i,j] -> [i, j] :      j >= 0
[i,j] -> [i-1, j+N] :  j <  0

After this patch we generate only the first case, except for situations where
we can proove the first case to be invalid and can consequently select the
second without introducing disjuncts.

llvm-svn: 296679
2017-03-01 21:11:27 +00:00
Tobias Grosser 6f9b60cf38 Currently broken by recent LLVM upstream changes
We mark it as XFAIL to get buildbots back to green, until the upstream changes
have been addressed.

llvm-svn: 296579
2017-03-01 04:34:44 +00:00
Tobias Grosser d7c4975349 [ScopInfo] Simplify inbounds assumptions under domain constraints
Without this simplification for a loop nest:

  void foo(long n1_a, long n1_b, long n1_c, long n1_d,
           long p1_b, long p1_c, long p1_d,
           float A_1[][p1_b][p1_c][p1_d]) {
    for (long i = 0; i < n1_a; i++)
      for (long j = 0; j < n1_b; j++)
        for (long k = 0; k < n1_c; k++)
          for (long l = 0; l < n1_d; l++)
            A_1[i][j][k][l] += i + j + k + l;
 }

the assumption:

  n1_a <= 0 or (n1_a > 0 and n1_b <= 0) or
  (n1_a > 0 and n1_b > 0 and n1_c <= 0) or
  (n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d <= 0) or
  (n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d > 0 and
   p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d)

is taken rather than the simpler assumption:

  p9_b >= n9_b and p9_c >= n9_c and p9_d >= n9_d.

The former is less strict, as it allows arbitrary values of p1_* in case, the
loop is not executed at all. However, in practice these precise constraints
explode when combined across different accesses and loops. For now it seems
to make more sense to take less precise, but more scalable constraints by
default. In case we find a practical example where more precise constraints
are needed, we can think about allowing such precise constraints in specific
situations where they help.

This change speeds up the new test case from taking very long (waited at least
a minute, but it probably takes a lot more) to below a second.

llvm-svn: 296456
2017-02-28 09:45:54 +00:00
Michael Kruse 6469380daa [Cmake] Optionally use a system isl version.
This patch adds an option to build against a version of libisl already
installed on the system. The installation is autodetected using the
pkg-config file shipped with isl.

The detection of the library is in the FindISL.cmake module that creates
an imported target.

Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com>

Differential Revision: https://reviews.llvm.org/D30043

llvm-svn: 296361
2017-02-27 17:54:25 +00:00
Michael Kruse c4f61d2346 [DeLICM] Add nomap regressions tests. NFC.
These verify that some scalars are not mapped because it would be
incorrect to do so.

For these check we verify that no transformation has been executed from
output of the pass's '-analyze'. Adding optimization remarks is not useful
as it would result in too many messages, even repeated ones. I avoided
checking the '-debug-only=polly-delicm' output which is an antipattern.

llvm-svn: 296348
2017-02-27 15:53:18 +00:00
Roman Gareev 96e1119a96 Make optimizations based on pattern matching be enabled by default
Currently, pattern based optimizations of Polly can identify matrix
multiplication and optimize it according to BLIS matmul optimization pattern
(see ScheduleTreeOptimizer for details). This patch makes optimizations
based on pattern matching be enabled by default.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30293

llvm-svn: 295958
2017-02-23 11:44:12 +00:00
Michael Kruse d8d32bb3d1 [DeLICM] Regression test for skipping map targets.
Add optimization-remarks-missed for when mapping targets have been
skipped and add regression tests for them.

llvm-svn: 295953
2017-02-23 10:25:20 +00:00
Michael Kruse deb30e8278 [DeLICM] Add regression tests for DeLICM reject cases.
These tests were not included in the main DeLICM commit. These check the
cases where zone analysis cannot be successful because of assumption
violations.

We use the LLVM optimization remark infrastructure as it seems to be the
best fit for this kind of messages. I tried to make use if the
OptimizationRemarkEmitter. However, it would insert additional function
passes into the pass manager to get the hotness information. The pass
manager would insert them between the flatten pass and delicm, causing
the ScopInfo with the flattened schedule being thrown away.

Differential Revision: https://reviews.llvm.org/D30253

llvm-svn: 295846
2017-02-22 15:14:08 +00:00
Michael Kruse 9e52c39f0a [DeLICM] Map values hoisted by LICM back to the array.
Implement the -polly-delicm pass. The pass intends to undo the
effects of LoopInvariantCodeMotion (LICM) which adds additional scalar
dependencies into SCoPs. DeLICM will try to map those scalars back to
the array elements they were promoted from, as long as the array
element is unused.

The is the main patch from the DeLICM/DePRE patch series. It does not
yet undo GVN PRE for which additional information about known values
is needed and does not handle PHI write accesses that have have no
target. As such its usefulness is limited. Patches for these issues
including regression tests for error situatons will follow.

Reviewers: grosser

Differential Revision: https://reviews.llvm.org/D24716

llvm-svn: 295713
2017-02-21 10:20:54 +00:00
Tobias Grosser 079d511891 [ScopInfo] Count read-only arrays when computing complexity of alias check
Instead of counting the number of read-only accesses, we now count the number of
distinct read-only array references when checking if a run-time alias check
may be too complex. The run-time alias check is quadratic in the number of
base pointers, not the number of accesses.

Before this change we accidentally skipped SPEC's lbm test case.

llvm-svn: 295567
2017-02-18 20:51:29 +00:00
Tobias Grosser 41f0d81b31 [test] Add reduction sequence test case [NFC]
This test case is a mini performance test case that shows the time needed for a
couple of simple reductions. It takes today about 325ms on my machine to run
this test case through 'opt' with scop construction and reduction detection. It
can be used as mini-proxy for further tuning of the reduction code.

Generally we do not commit performance test cases, but as this is very
small and also very fast it seems OK to keep it in the lit test suite.

This test case will also help to verify that future changes to the reduction
code will not affect the ordering of the reduction sets and will consequently
not cause spurious performance changes that only result from reordering of
dependences in the reduction set.

llvm-svn: 295549
2017-02-18 16:38:58 +00:00
Tobias Grosser cd01a363d6 [ScopInfo] Add statistics to count loops after scop modeling
llvm-svn: 295431
2017-02-17 08:12:36 +00:00
Tobias Grosser 65ce9362b8 [ScopDetection] Compute the maximal loop depth correctly
Before this change, we obtained loop depth numbers that were deeper then the
actual loop depth.

llvm-svn: 295430
2017-02-17 08:08:54 +00:00
Tobias Grosser 72745c2ef5 Updated isl to isl-0.18-254-g6bc184d
This update includes a couple more coalescing changes as well as a large
number of isl-internal code cleanups (dead assigments, ...).

llvm-svn: 295419
2017-02-17 05:11:16 +00:00
Tobias Grosser ca2cfd0bd8 [ScopInfo] Do not try to fold array dimensions of size zero
Trying to fold such kind of dimensions will result in a division by zero,
which crashes the compiler. As such arrays are likely to invalidate the
scop anyhow (but are not illegal in LLVM-IR), there is no point in trying
to optimize the array layout. Hence, we just avoid the folding of
constant dimensions of size zero.

llvm-svn: 295415
2017-02-17 04:48:52 +00:00
Tobias Grosser 76ec194951 [tests] Fix some misspellings [NFC]
llvm-svn: 295361
2017-02-16 19:11:29 +00:00
Tobias Grosser c8a8276710 [ScopInfo] Bound the number of disjuncts in context
Before this change wrapping range metadata resulted in exponential growth of
the context, which made context construction of large scops very slow. Instead,
we now just do not model the range information precisely, in case the number
of disjuncts in the context has already reached a certain limit.

llvm-svn: 295360
2017-02-16 19:11:25 +00:00
Tobias Grosser 3281f601bb [ScopInfo] Always derive upper and lower bounds for parameters
Commit r230230 introduced the use of range metadata to derive bounds for
parameters, instead of just looking at the type of the parameter. As part of
this commit support for wrapping ranges was added, where the lower bound of a
parameter is larger than the upper bound:

  { 255 < p || p < 0 }

However, at the same time, for wrapping ranges support for adding bounds given
by the size of the containing type has acidentally been dropped. As a result,
the range of the parameters was not guaranteed to be bounded any more. This
change makes sure we always add the bounds given by the size of the type and
then additionally add bounds based on signed wrapping, if available. For a
parameter p with a type size of 32 bit, the valid range is then:

  { -2147483648 <= p <= 2147483647 and (255 < p or p < 0) }

llvm-svn: 295349
2017-02-16 18:39:14 +00:00
Tobias Grosser b3a85884f7 Do not use wrapping ranges to bound non-affine accesses
When deriving the range of valid values of a scalar evolution expression might
be a range [12, 8), where the upper bound is smaller than the lower bound and
where the range is expected to possibly wrap around. We theoretically could
model such a range as a union of two non-wrapping ranges, but do not do this
as of yet. Instead, we just do not derive any bounds. Before this change,
we could have obtained bounds where the maximal possible value is strictly
smaller than the minimal possible value, which is incorrect and also caused
assertions during scop modeling.

llvm-svn: 294891
2017-02-12 08:11:12 +00:00
Roman Gareev b196055c0c Check reduction dependencies in case of the matrix multiplication optimization
To determine parameters of the matrix multiplication, we check RAW dependencies
that can be expressed using only reduction dependencies. Consequently, we
should check the reduction dependencies, if this is the case.

Reviewed-by: Tobias Grosser <tobias@grosser.es>,
             Sven Verdoolaege <skimo-polly@kotnet.org>
             Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D29814

llvm-svn: 294836
2017-02-11 09:59:09 +00:00
Roman Gareev 3d4eae31ea Use the size of the widest type of the matrix multiplication operands
The size of the operands type is the one of the parameters required
to determine the BLIS micro-kernel. We get the size of the widest type
of the matrix multiplication operands in case there are several
different types.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D29269

llvm-svn: 294828
2017-02-11 07:00:05 +00:00
Tobias Grosser 4553463be4 [IRBuilder] Extract base pointers directly from ScopArray
Instead of iterating over statements and their memory accesses to extract the
set of available base pointers, just directly iterate over all ScopArray
objects. This reflects more the actual intend of the code: collect all arrays
(and their base pointers) to emit alias information that specifies that accesses
to different arrays cannot alias.

This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294574
2017-02-09 09:34:42 +00:00
Roman Gareev 028ba3702c [FIX] Disable the problematic run lines
There are problems with using the machine information to derive the precise
vector size on polly-amd64-linux and polly-arm-linux. We temporarily disable
the problematic run lines.

llvm-svn: 294571
2017-02-09 09:03:13 +00:00
Roman Gareev 2d0d294e3c [FIX] Specify the CPU to overwrite the machine info and set a fixed vector
size.

llvm-svn: 294569
2017-02-09 08:29:55 +00:00
Tobias Grosser 26fb7d7517 [IslAst] Print the ScopArray name to mark reductions
Before this change we used the name of the base pointer to mark reductions. This
is imprecise as the canonical reference is the ScopArray itself and not the
basepointer of a reduction. Using the base pointer of reductions is problematic
in cases where a single ScopArray is referenced through two different base
pointers.

This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294568
2017-02-09 08:06:15 +00:00
Roman Gareev 9989088ee9 Isolate a set of partial tile prefixes in case of the matrix multiplication
optimization

Isolate a set of partial tile prefixes to allow hoisting and sinking out of
the unrolled innermost loops produced by the optimization of the matrix
multiplication.

In case it cannot be proved that the number of loop iterations can be evenly
divided by tile sizes and we tile and unroll the point loop, the isl generates
conditional expressions. Subsequently, the conditional expressions can prevent
stores and loads of the unrolled loops from being sunk and hoisted.

The patch isolates a set of partial tile prefixes, which have exactly Mr x Nr
iterations of the two innermost loops, the result of the loop tiling performed
by the matrix multiplication optimization, where Mr and Mr are parameters of
the micro-kernel. This helps to get rid of the conditional expressions of
the unrolled innermost loops. Probably this approach can be replaced with
padding in future.

In case of, for example, the gemm from Polybench/C 3.2 and parametric loop
bounds, it helps to increase the performance from 7.98 GFlops (27.71% of
theoretical peak) to 21.47 GFlops (74.57% of theoretical peak). Hence, we
get the same performance as in case of scalar loops bounds.

It also cause compile time regression. The compile-time is increased from
0.795 seconds to 0.837 seconds in case of scalar loops bounds and from 1.222
seconds to 1.490 seconds in case of parametric loops bounds.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D29244

llvm-svn: 294564
2017-02-09 07:10:01 +00:00
Roman Gareev 772498dc68 [NFC] Make ScheduleTreeOptimizer::optimizeBand return a schedule node optimized
with optimizeMatMulPattern

This patch makes ScheduleTreeOptimizer::optimizeBand return a schedule node
optimized with optimizeMatMulPattern. Otherwise, it could not use the isolate
option, because standardBandOpts could try to tile a band node with anchored
subtree and get the error, since the use of the isolate option causes any tree
containing the node to be considered anchored. Furthermore, it is not intended
to apply standard optimizations, when the matrix multiplication has been
detected.

llvm-svn: 294444
2017-02-08 13:29:06 +00:00
Roman Gareev 98075fe181 A new algorithm for identification of a SCoP statement that implement a matrix
multiplication

The current identification of a SCoP statement that implement a matrix
multiplication does not help to identify different permutations of loops that
contain it and check for dependencies, which can prevent it from being
optimized. It also requires external determination of the operands of
the matrix multiplication. This patch contains the implementation of a new
algorithm that helps to avoid these issues. It also modifies the test cases
that generate matrix multiplications with linearized accesses, because
the new algorithm does not support them.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>,
             Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D28357

llvm-svn: 293890
2017-02-02 14:23:14 +00:00