Commit Graph

3028 Commits

Author SHA1 Message Date
Tobias Grosser 1f8b84094f Update isl bindings to latest version (+ Polly extensions)
After the isl C++ binding generator is now close to being upstreamed to isl, we
synchronize the latest changes to Polly. These are mostly formatting changes
plus a small interface change for the foreach callback function and some naming
changes in isl::boolean.

llvm-svn: 300398
2017-04-15 08:15:54 +00:00
Tobias Grosser 75aa1a9a49 Use isl C++ foreach implementation
This commit switches Polly over to the isl::obj::foreach_* implementation, which
is part of the new isl bindings and follows the foreach pattern established in
Polly by Michael Kruse.

The original isl C function:

  isl_stat isl_union_set_foreach_set(__isl_keep isl_union_set *uset,
      isl_stat (*fn)(__isl_take isl_set *set, void *user), void *user);

which required the user to define a static callback function to which all
interesting parameters are passed via a 'void *' user-pointer, is on the
C++ side available as a function that takes a std::function<>, which can
carry any additional arguments without the need for a user pointer:

  stat UnionSet::foreach_set(const std::function<stat(set)> &fn) const;

The following code illustrates the use of the new C++ interface:

  auto Lambda = [=, &Result](isl::set Set) -> isl::stat {
    auto Shifted = shiftDimension(Set, Pos, Amount);
    Result = Result.add(Shifted);
    return isl::stat::ok;
  }

  UnionSet.foreach_set(Lambda);

Polly had some specialized foreach functions which did not require the lambdas
to return a status flag. We remove these functions in this commit to move Polly
completely over to the new isl interface. We may in the future discuss if
functors without return values can be supported easily.

Another extension proposed by Michael Kruse is the use of C++ iterators to allow
the use of normal for loops to iterate over these sets. Such an extension would
allow us to further simplify the code.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D30620

llvm-svn: 300323
2017-04-14 13:39:40 +00:00
Michael Kruse a8e885d87c [DeLICM] Introduce unittesting infrastructure for Known and Written. NFC.
llvm-svn: 300212
2017-04-13 16:32:46 +00:00
Michael Kruse 72f3922534 [DeLICM] Export Known and Written to DeLICMTests. NFC.
This will allow unittesting of new functionality based on
Known and Written.

llvm-svn: 300211
2017-04-13 16:32:39 +00:00
Michael Kruse a2acc11949 [DeLICM] Add Knowledge::Known. NFC.
This field will later contain a ValInst that is known to be stored
in an occupied array element.

llvm-svn: 300210
2017-04-13 16:32:31 +00:00
Michael Kruse fa7c8cdfc6 [DeLICM] Make Knowledge::Written an isl::union_map. NFC.
The map will later point to a ValInst that is written.

llvm-svn: 300208
2017-04-13 16:32:25 +00:00
Michael Kruse 5e6456979b [DeLICM] Rename Knowledge to KnowledgeStr. NFC.
Some debuggers get confused by different class of the same name
defined independently in different translation units.

llvm-svn: 300207
2017-04-13 16:32:16 +00:00
Tobias Grosser 7b5a4dfd46 Exploit BasicBlock::getModule to shorten code
Suggested-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 299914
2017-04-11 04:59:13 +00:00
Tobias Grosser 67726b3260 SAdjust to recent change in constructor definition of AllocaInst
llvm-svn: 299913
2017-04-11 04:23:38 +00:00
Matt Arsenault b3e30c32ce Update for alloca construction changes
llvm-svn: 299905
2017-04-11 00:12:58 +00:00
Philip Pfaffe 78265cd237 Fix missing .git/indexloadPolly in ensure-correct-tile-sizes testcase
llvm-svn: 299765
2017-04-07 12:55:26 +00:00
Roman Gareev 9d4d91ca6a [FIX] Fix ScheduleTreeOptimizer::optimizeMatMulPattern
Use new values of the dimensions during their permutation.

llvm-svn: 299663
2017-04-06 17:25:08 +00:00
Roman Gareev e0d466342b Restore the initial ordering of dimensions before applying the pattern matching
Dimensions of band nodes can be implicitly permuted by the algorithm applied
during the schedule generation.

For example, in case of the following matrix-matrix multiplication,

for (i = 0; i < 1024; i++)
  for (k = 0; k < 1024; k++)
    for (j = 0; j < 1024; j++)
      C[i][j] += A[i][k] * B[k][j];

it can produce the following schedule tree

domain: "{ Stmt_for_body6[i0, i1, i2] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and
                                        0 <= i2 <= 1023 }"
child:
  schedule: "[{ Stmt_for_body6[i0, i1, i2] -> [(i0)] },
              { Stmt_for_body6[i0, i1, i2] -> [(i1)] },
              { Stmt_for_body6[i0, i1, i2] -> [(i2)] }]"
  permutable: 1
  coincident: [ 1, 1, 0 ]

The current implementation of the pattern matching optimizations relies on the
initial ordering of dimensions. Otherwise, it can produce the miscompilation
(e.g., [1]).

This patch helps to restore the initial ordering of dimensions by recreating
the band node when the corresponding conditions are satisfied.

Refs.:

[1] - https://bugs.llvm.org/show_bug.cgi?id=32500

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D31741

llvm-svn: 299662
2017-04-06 17:09:54 +00:00
Siddharth Bhat 5eeb1dd42e [Polly] [ScheduleOptimizer] Prevent incorrect tile size computation
Because Polly exposes parameters that directly influence tile size
calculations, one can setup situations like divide-by-zero.

Check against a possible divide-by-zero in getMacroKernelParams
and return early.

Also assert at the end of getMacroKernelParams that the block sizes
computed for matrices are positive (>= 1).

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31708

llvm-svn: 299633
2017-04-06 08:20:22 +00:00
Tobias Grosser 0d622a4bf9 Update to isl-0.18-417-gb9e7334
This is a regular maintenance update.

llvm-svn: 299617
2017-04-06 03:41:47 +00:00
Michael Kruse 895f5d8080 Remove llvm.lifetime.start/end in original region.
The current StackColoring algorithm does not correctly handle the
situation when some, but not all paths from a BB to the entry node
cross a llvm.lifetime.start. According to an interpretation of the
language reference at
http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic
this might be correct, but it would cost too much effort to handle
in StackColoring.

To be on the safe side, remove all lifetime markers even in the original
code version (they have never been copied to the optimized version)
to ensure that no path to the entry block will cross a
llvm.lifetime.start.

The same principle applies to paths the a function return and the
llvm.lifetime.end marker, so we remove them as well.

This fixes llvm.org/PR32251.

Also see the discussion at
http://lists.llvm.org/pipermail/llvm-dev/2017-March/111551.html

llvm-svn: 299585
2017-04-05 20:09:59 +00:00
Tobias Grosser 59e42b8f96 Add two Polly images
llvm-svn: 299534
2017-04-05 11:50:31 +00:00
Siddharth Bhat bcbfdade41 [Polly] [DependenceInfo] change WAR, WAW generation to correct semantics
= Change of WAR, WAW generation: =

- `buildFlow(Sink, MustSource, MaySource, Sink)` treates any flow of the form
    `sink <- may source <- must source` as a *may* dependence.

- we used to call:
```lang=cpp, name=old-flow-call.cpp
Flow = buildFlow(MustWrite, MustWrite, Read, Schedule);
WAW = isl_union_flow_get_must_dependence(Flow);
WAR = isl_union_flow_get_may_dependence(Flow);
```

- This caused some WAW dependences to be treated as WAR dependences.
- Incorrect semantics.

- Now, we call WAR and WAW correctly.

== Correct WAW: ==
```lang=cpp, name=new-waw-call.cpp
   Flow = buildFlow(Write, MustWrite, MayWrite, Schedule);
   WAW = isl_union_flow_get_may_dependence(Flow);
   isl_union_flow_free(Flow);
```

== Correct WAR: ==
```lang=cpp, name=new-war-call.cpp
    Flow = buildFlow(Write, Read, MustaWrite, Schedule);
    WAR = isl_union_flow_get_must_dependence(Flow);
    isl_union_flow_free(Flow);
```

- We want the "shortest" WAR possible (exact dependences).
- We mark all the *must-writes* as may-source, reads as must-souce.
- Then, we ask for *must* dependence.
- This removes all the reads that flow through a *must-write*
  before reaching a sink.
- Note that we only block ealier writes with *must-writes*. This is
  intuitively correct, as we do not want may-writes to block
  must-writes.
- Leaves us with direct (R -> W).

- This affects reduction generation since RED is built using WAW and WAR.

= New StrictWAW for Reductions: =

- We used to call:
```lang=cpp,name=old-waw-war-call.cpp
      Flow = buildFlow(MustWrite, MustWrite, Read, Schedule);
      WAW = isl_union_flow_get_must_dependence(Flow);
      WAR = isl_union_flow_get_may_dependence(Flow);
```

- This *is* the right model of WAW we need for reductions, just not in general.
- Reductions need to track only *strict* WAW, without any interfering reductions.

= Explanation: Why the new WAR dependences in tests are correct: =

- We no longer set WAR = WAR - WAW
- Hence, we will have WAR dependences that were originally removed.
- These may look incorrect, but in fact make sense.

== Code: ==
```lang=llvm, name=new-war-dependence.ll
  ;    void manyreductions(long *A) {
  ;      for (long i = 0; i < 1024; i++)
  ;        for (long j = 0; j < 1024; j++)
  ; S0:          *A += 42;
  ;
  ;      for (long i = 0; i < 1024; i++)
  ;        for (long j = 0; j < 1024; j++)
  ; S1:          *A += 42;
  ;
```
=== WAR dependence: ===
  {  S0[1023, 1023] -> S1[0, 0] }

- Between `S0[1023, 1023]` and `S1[0, 0]`, we will have the dependences:

```lang=cpp, name=dependence-incorrect, counterexample
        S0[1023, 1023]:
    *-- tmp = *A (load0)--*
WAR 2   add = tmp + 42    |
    *-> *A = add (store0) |
                         WAR 1
        S1[0, 0]:         |
        tmp = *A (load1)  |
        add = tmp + 42    |
        A = add (store1)<-*
```

- One may assume that WAR2 *hides* WAR1 (since store0 happens before
  store1). However, within a statement, Polly has no idea about the
  ordering of loads and stores.

- Hence, according to Polly, the code may have looked like this:
```lang=cpp, name=dependence-correct
    S0[1023, 1023]:
    A = add (store0)
    tmp = A (load0) ---*
    add = A + 42       |
                     WAR 1
    S1[0, 0]:          |
    tmp = A (load1)    |
    add = A + 42       |
    A = add (store1) <-*
```

- So, Polly  generates (correct) WAR dependences. It does not make sense
  to remove these dependences, since they are correct with respect to
  Polly's model.

    Reviewers: grosser, Meinersbur

    tags: #polly

    Differential revision: https://reviews.llvm.org/D31386

llvm-svn: 299429
2017-04-04 13:08:23 +00:00
Philip Pfaffe 447f175eb5 Fix formatting in LoopGenerators
llvm-svn: 299424
2017-04-04 10:22:17 +00:00
Philip Pfaffe 2d950f36ee [Polly][NewPM] Pull references to the legacy PM interface from utilities and helpers
Summary:
A couple of the utilities used to analyze or build IR make explicit use of the legacy PM on their interface, to access analysis results. This patch removes the legacy PM from the interface, and just passes the required results directly.

This shouldn't introduce any function changes, although the API technically allowed to obtain two different analysis results before, one passed by reference and one through the PM. I don't believe that was ever intended, however.

Reviewers: grosser, Meinersbur

Reviewed By: grosser

Subscribers: nemanjai, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31653

llvm-svn: 299423
2017-04-04 10:01:53 +00:00
Tobias Grosser 637be04b77 [PerfMonitor] Use Intrinsics::getDeclaration
Instead of creating the declaration ourselves, we obtain it directly from the
LLVM intrinsic definitions. This addresses a post-review comment for r299359.

Suggested-by: Hongzing Zheng <etherzhhb@gmail.com>
llvm-svn: 299360
2017-04-03 15:23:08 +00:00
Tobias Grosser 65371af2e1 [CodeGen] Add Performance Monitor
Add support for -polly-codegen-perf-monitoring. When performance monitoring
is enabled, we emit performance monitoring code during code generation that
prints after program exit statistics about the total number of cycles executed
as well as the number of cycles spent in scops. This gives an estimate on how
useful polyhedral optimizations might be for a given program.

Example output:

  Polly runtime information
  -------------------------
  Total: 783110081637
  Scops: 663718949365

In the future, we might also add functionality to measure how much time is spent
in optimized scops and how many cycles are spent in the fallback code.

Reviewers: bollu,sebpop

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31599

llvm-svn: 299359
2017-04-03 14:55:37 +00:00
Michael Kruse 0b8949e6ed [test] Fix two testcases. NFC.
Trivial fix for two testcases. When Polly isn't linked into opt,
independent of whether it's built in-tree or not, these testcases forget
to load the appropriate library.

Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com>

Differential Revision: https://reviews.llvm.org/D31596

llvm-svn: 299357
2017-04-03 12:37:10 +00:00
Michael Kruse 6e7854a560 [ScopInfo] Fix typos in option description.
llvm-svn: 299356
2017-04-03 12:03:38 +00:00
Tobias Grosser bd96c73a1a Add test case for r299352.
llvm-svn: 299353
2017-04-03 07:44:23 +00:00
Tobias Grosser 696a1ee99d [PollyIRBuilder] Bound size of alias metadata
No-alias metadata grows quadratic in the size of arrays involved, which can
become very costly for large programs. This commit bounds the number of arrays
for which we construct no-alias information to ten. This is conservatively
correct, as we just provide less information to LLVM and speeds up the compile
time of one of my internal test cases from 'does-not-terminate' to
'finishes-in-less-than-a-minute'. In the future we might try to be more clever
here, but this change should provide a good baseline.

llvm-svn: 299352
2017-04-03 07:42:50 +00:00
Tobias Grosser af940ae280 Update to isl-0.18-410-gc253447
This is a regular maintenance update to ensure latest isl changes are tested
in our buildbots.

llvm-svn: 299350
2017-04-03 06:46:16 +00:00
Huihui Zhang d6d6a3f2ee revert test commit r299024
llvm-svn: 299026
2017-03-29 20:23:56 +00:00
Huihui Zhang 9d19e9d232 test commit, add blank line
llvm-svn: 299024
2017-03-29 20:10:45 +00:00
Michael Kruse c3e9c1442d [ScopInfo] Introduce ScopStmt::contains(BB*). NFC.
Provide an common way for testing if a statement contains something
for region and block statements. First user is
RegionGenerator::addOperandToPHI.

Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 298617
2017-03-23 16:12:21 +00:00
Tobias Grosser 1f7e7d3d93 Update to isl-0.18-402-ga30c537
This is a regular maintenance update.

llvm-svn: 298595
2017-03-23 13:38:24 +00:00
Michael Kruse 9e4e7b467f [DeLICM] Add const qualifiers. NFC.
llvm-svn: 298546
2017-03-22 20:09:58 +00:00
Michael Kruse 174f483990 [Support] Add functions to ISLTools.
Add shiftDim and convertZoneToTimepoints overloads for isl maps.

Add distributeDomain, liftDomains and applyDomainRange functions.

These are going to be used in https://reviews.llvm.org/D31247
(Add known array contents to Knowledge)

llvm-svn: 298543
2017-03-22 19:31:06 +00:00
Michael Kruse d07d155ebb [DeLICM] Remove overloaded Knowledge constructor. NFC.
The isl C++ bindings now has implicit conversions from isl::set to
isl::union_set. Therefore the additional overload accepting isl::set
is not required anymore.

llvm-svn: 298529
2017-03-22 18:01:23 +00:00
Michael Kruse 29143ec3f7 [DeLICM] Remove AllElements. NFC.
It is not used and will not be used (anymore) in future commits.

llvm-svn: 298522
2017-03-22 17:18:39 +00:00
Roman Gareev cdfb57dc46 Introduce another level of metadata to distinguish non-aliasing accesses
Introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. It can be used to,
for example, distinguish different stores (loads) produced by unrolling of
the innermost loops and, subsequently, sink (hoist) them by LICM.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30606

llvm-svn: 298510
2017-03-22 14:25:24 +00:00
Roman Gareev 23df27682a Map the new load to the base pointer of the invariant load hoisted load
Map the new load to the base pointer of the invariant load hoisted load
to be able to find the alias information for it.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30605

llvm-svn: 298507
2017-03-22 13:57:53 +00:00
Siddharth Bhat 44b6cb4e63 [DependenceInfo] change name Write to MustWrite to remove ambiguity [NFC]
"Write" is an overloaded term. In collectInfo() till buildFlow(), it is
used to mean "must writes". However, within the memory based analysis,
it is used to mean "both may and must writes". Renaming the Write
variable helps clarify this difference.

Reviewers: grosser

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31181

llvm-svn: 298361
2017-03-21 11:54:08 +00:00
Tobias Grosser 29eaa16b7e Update isl to isl-0.18-395-g77701b3
This is a normal maintenance update.

llvm-svn: 298352
2017-03-21 09:12:11 +00:00
Michael Kruse 0d10696693 [DeLICM] Refector out parseSetOrNull. NFC.
Note that the isl::union_set(isl_ctx,std::string) constructor will
auto-convert the char* to an std::string. Converting a nullptr to
std::string is undefined in C++11 (sect. 21.4.2.9).

llvm-svn: 298259
2017-03-20 15:37:32 +00:00
Michael Kruse d75d56e9bf [DeLICM] Add forgotten isl_space_set_tuple_id in unittests.
Otherwise the isl_id NewId which ensures uniqueness of the
created space is unused. None of the tests currently uses an
nameless tuple, so there is not change in what is tested.

llvm-svn: 298258
2017-03-20 15:24:45 +00:00
Tobias Grosser b28f86e9e6 [CodeGen] Remove need for all parameters to be in scop context for load hoisting.
When not adding constraints on parameters using -polly-ignore-parameter-bounds,
the context may not necessarily list all parameter dimensions. To support code
generation in this situation, we now always iterate over the actual parameter
list, rather than relying on the context to list all parameter dimensions.

llvm-svn: 298197
2017-03-18 23:12:49 +00:00
Tobias Grosser 1be726a40d [IslExprBuilder] Print accessed memory locations with RuntimeDebugBuilder
After this change, enabling -polly-codegen-add-debug-printing in combination
with -polly-codegen-generate-expressions allows us to instrument the compiled
binaries to not only print the values stored and loaded to a given memory
access, but also to print the accessed location with array name and
per-dimension offset:

  MemRef_A[3][2]
  Store to  6299784: 5.000000
  MemRef_A[3][3]
  Load from 6299788: 0.000000
  MemRef_A[3][3]
  Store to  6299788: 6.000000

This can be very helpful for debugging.

llvm-svn: 298194
2017-03-18 20:54:43 +00:00
Tobias Grosser 7693b116a1 [OpenMP] Do not emit lifetime markers for context
In commit r219005 lifetime markers have been introduced to mark the lifetime of
the OpenMP context data structure. However, their use seems incorrect and
recently caused a miscompile in ASC_Sequoia/CrystalMk after r298053 which was
not at all related to r298053. r298053 only caused a change in the loop order,
as this change resulted in a different isl internal representation which caused
the scheduler to derive a different schedule. This change then caused the IR to
change, which apparently created a pattern in which LLVM exploites the lifetime
markers. It seems we are using the OpenMP context outside of the lifetime
markers. Even though CrystalMk could probably be fixed by expanding the scope of
the lifetime markers, it is not clear what happens in case the OpenMP function
call is in a loop which will cause a sequence of starting and ending lifetimes.
As it is unlikely that the lifetime markers give any performance benefit, we
just drop them to remove complexity.

llvm-svn: 298192
2017-03-18 20:10:07 +00:00
Siddharth Bhat 3e4a7d38ab [ScheduleOptimiser] fix typos in top comment [NFC]
coice -> choice
Transations -> Transactions

llvm-svn: 298095
2017-03-17 14:52:19 +00:00
Michael Kruse 89b1f94e64 Revert "Remove references to AssumptionCache. NFC."
The AssumptionCache removal of r289756 has been reverted in
r290086/r290087. A different solution has been implemented in r291671
which keeps the AssumptionCache. We can therefore use it again in Polly.

This reverts r289791.

llvm-svn: 298089
2017-03-17 13:56:53 +00:00
Siddharth Bhat 4fe11cf95f [DependenceInfo] Remove idempotent union: must-writes with may-writes [NFC]
Since may-writes are always a superset of the must-writes, there is no
point in taking a union of one with the other.

llvm-svn: 298085
2017-03-17 13:26:10 +00:00
Michael Kruse 9b91c62e3a [ScopInfo/PruneUnprofitable] Move default profitability check.
In the previous default ScopInfo applied the profitability heuristic for
scalar accesses (-polly-unprofitable-scalar-accs=true) and the
-polly-prune-unprofitable was disabled by default
(-polly-enable-prune-unprofitable=false) as that pruning was already done.

This changes switches the defaults to -polly-unprofitable-scalar-accs=true
-polly-enable-prune-unprofitable=false such that the scalar access
heuristic check is done by the pass. This allows passes between ScopInfo
and PruneUnprofitable to optimize away scalar accesses.

Without enabling such intermediate passes, there is no change in
behaviour of profitability checks in a PassManagerBuilder built
pass chain, but it allows us to cover this configuration with the
buildbots.

Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 298081
2017-03-17 13:10:05 +00:00
Michael Kruse f3091bf4cf [PruneUnprofitable] Add -polly-prune-unprofitable pass.
ScopInfo's normal profitability heuristic considers SCoPs where all
statements have scalar writes as not profitably optimizable and
invalidate the SCoP in that case. However, -polly-delicm and
-polly-simplify may be able to remove some of the scalar writes such
that the flag -polly-unprofitable-scalar-accs=false allows disabling
that part of the heuristic.

In cases where DeLICM (or other passes after ScopInfo) are not
successful in removing scalar writes, the SCoP is still not profitably
optimizable. The schedule optimizer would again try computing another
schedule, resulting in slower compilation.

The -polly-prune-unprofitable pass applies the profitability heuristic
again before the schedule optimizer Polly can still bail out even with
-polly-unprofitable-scalar-accs=false.

Differential Revision: https://reviews.llvm.org/D31033

llvm-svn: 298080
2017-03-17 13:09:52 +00:00
Tobias Grosser 5842dee251 [ScopInfo] Add option to not add parameter bounds to context [NFC]
For experiments it is sometimes helpful to provide parameter bound information
to polly and to not use these parameter bounds for simplification.
Add a new option "-polly-ignore-parameter-bounds" which does precisely this.

llvm-svn: 298077
2017-03-17 13:00:53 +00:00