Commit Graph

2587 Commits

Author SHA1 Message Date
Michael Kruse fbde435517 [CodeGen] Use MapVector instead of DenseMap.
The map is iterated over when generating the values escaping the SCoP. The
nondeterministic iteration order of DenseMap causes the output IR to change
from one compilation to the next, adding noise to comparisons.

Replace DenseMap with a MapVector to ensure the same iteration order at every
compilation.

llvm-svn: 277832
2016-08-05 16:45:51 +00:00
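
To illustrate why an insertion-ordered map gives a stable iteration order, here is a minimal sketch in standard C++. It is not LLVM's MapVector; the class name InsertionOrderedMap and the helper keysInOrder are hypothetical and only mimic the general idea of pairing a vector of entries with a key-to-position index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an insertion-ordered map: the entries live in a
// vector, and a hash map only records the position of each key in it.
template <typename K, typename V>
class InsertionOrderedMap {
public:
  using const_iterator = typename std::vector<std::pair<K, V>>::const_iterator;

  // Returns the value stored for Key, appending a default-constructed value
  // if the key has not been seen before.
  V &operator[](const K &Key) {
    auto It = Index.find(Key);
    if (It == Index.end()) {
      It = Index.emplace(Key, Entries.size()).first;
      Entries.emplace_back(Key, V());
    }
    assert(It->second < Entries.size() && "index out of sync with entries");
    return Entries[It->second].second;
  }

  // Iteration walks the vector, so it follows insertion order and does not
  // depend on hash values or pointer addresses.
  const_iterator begin() const { return Entries.begin(); }
  const_iterator end() const { return Entries.end(); }
  std::size_t size() const { return Entries.size(); }

private:
  std::unordered_map<K, std::size_t> Index;
  std::vector<std::pair<K, V>> Entries;
};

// Collects the keys in iteration order.
std::vector<std::string>
keysInOrder(const InsertionOrderedMap<std::string, int> &M) {
  std::vector<std::string> Keys;
  for (const auto &Entry : M)
    Keys.push_back(Entry.first);
  return Keys;
}
```

Updating an existing key leaves its position unchanged, so the order seen by a consumer depends only on the order of first insertion.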
Michael Kruse d82222fc1b [DependenceInfo] Reset operations counter when setting limit.
When entering the dependence computation with max_operations set, the
operations counter may have already exceeded the limit, thus aborting any ISL
computation from the start. The counter is reset at the end of the dependence
calculation such that a follow-up recomputation might succeed, i.e. the success
of the first dependence calculation depends on unrelated ISL operations that
happened before, putting it at a disadvantage compared to later calculations.

This patch resets the operations counter at the beginning of the dependence
calculation so that it does not depend on previous actions. Otherwise,
additional preprocessing of the Scop that aims to improve its schedulability
(e.g. DeLICM) makes DependenceInfo, and hence the scheduling, more likely to
fail, which is counterproductive to the goal of said preprocessing.

llvm-svn: 277810
2016-08-05 11:31:02 +00:00
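
The effect described in this commit can be modeled with a small sketch. OpBudget and computeWithLimit are hypothetical names, not the ISL or Polly API; the model only assumes a shared counter that unrelated work also increments and that is cleared at the end of each limited computation.

```cpp
#include <cassert>

// Hypothetical model of a solver context with a shared operations counter.
struct OpBudget {
  unsigned long Used = 0; // operations counted so far
  unsigned long Max = 0;  // 0 means "no limit"

  // Counts N operations; returns false once the limit is exceeded.
  bool charge(unsigned long N) {
    Used += N;
    return Max == 0 || Used <= Max;
  }
};

// Runs a computation that costs Cost operations under the given Limit.
// ResetFirst selects whether the counter is cleared before the limit applies.
bool computeWithLimit(OpBudget &B, unsigned long Limit, unsigned long Cost,
                      bool ResetFirst) {
  assert(Limit > 0 && "a limit of 0 would mean 'unlimited' in this model");
  if (ResetFirst)
    B.Used = 0; // the fix: ignore operations performed by earlier passes
  B.Max = Limit;
  bool Ok = B.charge(Cost);
  B.Max = 0;  // lift the limit again
  B.Used = 0; // the counter is cleared at the end, as the message describes
  return Ok;
}
```

Without the initial reset, a first call that follows expensive unrelated work fails while an identical second call succeeds; with the reset, both calls behave the same.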
Tobias Grosser 928d7573dd GPGPU: Sort dimension sizes of multi-dimensional shared memory arrays correctly
Before this commit we generated the array type in reverse order and also
added the outermost dimension size to the new array declaration. This is
incorrect because Polly already assumes an unsized outermost dimension, so we
had an off-by-one error in the linearization of access expressions.

llvm-svn: 277802
2016-08-05 08:27:24 +00:00
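
As a rough illustration of the linearization this commit refers to, the following sketch computes a row-major offset when the outermost dimension carries no size. The function linearize is a hypothetical helper, not Polly code; it assumes the size list holds only the inner dimensions, so adding the outermost size to that list would shift every factor by one position.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linearizes a multi-dimensional index in row-major order.
// InnerSizes holds the sizes of all dimensions except the outermost one,
// which is treated as unsized. Index therefore has exactly one more element
// than InnerSizes.
std::size_t linearize(const std::vector<std::size_t> &Index,
                      const std::vector<std::size_t> &InnerSizes) {
  assert(Index.size() == InnerSizes.size() + 1 &&
         "expected one index per dimension, outermost dimension unsized");
  std::size_t Offset = Index[0];
  for (std::size_t D = 0; D < InnerSizes.size(); ++D)
    Offset = Offset * InnerSizes[D] + Index[D + 1];
  return Offset;
}
```

For an array declared as A[][4][5], the access A[1][2][3] maps to (1 * 4 + 2) * 5 + 3 = 33.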
Tobias Grosser 470608e3e4 Add missing 'REQUIRES' line
llvm-svn: 277800
2016-08-05 07:08:45 +00:00
Tobias Grosser c1c6a2a61b GPGPU: Add CUDA annotations to specify the maximal number of threads per block
These annotations ensure that the NVIDIA PTX assembler limits the number of
registers used such that we can be certain the resulting kernel can be executed
for the number of threads in a thread block that we are planning to use.

llvm-svn: 277799
2016-08-05 06:47:43 +00:00
Tobias Grosser f919d8b360 GPGPU: Support scalars that are mapped to shared memory
llvm-svn: 277726
2016-08-04 13:57:29 +00:00
Tobias Grosser 8950cead7f GPGPU: Disable verbose debug output
llvm-svn: 277724
2016-08-04 12:44:03 +00:00
Tobias Grosser b0dd95bcd2 Remove leftover debug output
llvm-svn: 277723
2016-08-04 12:41:28 +00:00
Tobias Grosser 130ca30f92 GPGPU: Add private memory support
llvm-svn: 277722
2016-08-04 12:39:03 +00:00
Tobias Grosser b513b4916b GPGPU: Add support for shared memory
llvm-svn: 277721
2016-08-04 12:18:14 +00:00
Tobias Grosser b187515784 GPGPU: Cache PTX kernels
We always keep a number of already compiled kernels available to avoid
costly recompilation.

llvm-svn: 277707
2016-08-04 09:15:58 +00:00
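
A bounded cache of compiled kernels could look like the sketch below. The commit message does not state the capacity, the lookup key, or the eviction policy, so all of these are assumptions: KernelCache, CompiledKernel, keying by the kernel source text, and oldest-first eviction are hypothetical, and compile() is a placeholder for the expensive compilation step.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical handle for a compiled kernel.
struct CompiledKernel {
  int Handle;
};

// Keeps up to Capacity compiled kernels, keyed by their source text.
class KernelCache {
public:
  explicit KernelCache(std::size_t Capacity) : Capacity(Capacity) {
    assert(Capacity > 0 && "cache must hold at least one kernel");
  }

  // Returns the cached kernel for Source, compiling it only on a miss.
  CompiledKernel get(const std::string &Source) {
    for (const auto &Entry : Entries)
      if (Entry.first == Source)
        return Entry.second;
    CompiledKernel K = compile(Source);
    if (Entries.size() == Capacity)
      Entries.pop_front(); // assumed policy: evict the oldest entry
    Entries.emplace_back(Source, K);
    return K;
  }

  // Number of real compilations performed so far.
  int compileCount() const { return Compiles; }

private:
  // Placeholder for the costly compilation; it only counts invocations.
  CompiledKernel compile(const std::string &) {
    return CompiledKernel{++Compiles};
  }

  std::size_t Capacity;
  int Compiles = 0;
  std::deque<std::pair<std::string, CompiledKernel>> Entries;
};
```

Repeated requests for the same source hit the cache and leave compileCount() unchanged until the entry is evicted.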
Tobias Grosser 00bb5a99f5 GPGPU: Handle scalar array references
Pass the content of scalar array references to the alloca on the kernel side
and do not additionally pass them as normal LLVM scalar values.

llvm-svn: 277699
2016-08-04 06:55:59 +00:00
Tobias Grosser 3216f8546c BlockGenerator: Assert that we do not get alloca of array access
llvm-svn: 277698
2016-08-04 06:55:53 +00:00
Tobias Grosser 576932728d GPGPU: Pass subtree values correctly to the kernel
llvm-svn: 277697
2016-08-04 06:55:49 +00:00
Tobias Grosser 629109b633 GPGPU: Mark kernel functions as polly.skip
Otherwise, we would try to re-optimize them with Polly-ACC and possibly even
generate kernels that try to offload themselves, which does not work as the
GPURuntime is not available on the accelerator and also does not make any
sense.

llvm-svn: 277589
2016-08-03 12:00:07 +00:00
Tobias Grosser 2219d15748 Fix a couple of spelling mistakes
llvm-svn: 277569
2016-08-03 05:28:09 +00:00
Roman Gareev 0c09a3af00 Add missing prefixes.
llvm-svn: 277264
2016-07-30 11:15:00 +00:00
Roman Gareev d7754a1245 Extend the jscop interface to allow the user to declare new arrays and to reference these arrays from access expressions
Extend the jscop interface to allow the user to export arrays. Arrays that
already exist in the list of arrays must correspond to arrays of the SCoP.
Each array that is appended to the list will be newly created.
Furthermore, we allow the user to modify access expressions to reference
any array, provided it has the same element type.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D22828

llvm-svn: 277263
2016-07-30 09:25:51 +00:00
Tobias Grosser 8af38ecaa3 Add missing REQUIRES line
llvm-svn: 276964
2016-07-28 07:08:34 +00:00
Tobias Grosser d8b94bcac1 GPGPU: Pass context parameters to GPU kernel
llvm-svn: 276963
2016-07-28 06:47:59 +00:00
Tobias Grosser a490147c90 GPGPU: Pass host iterators to kernel
llvm-svn: 276962
2016-07-28 06:47:56 +00:00
Tobias Grosser 44143bb927 GPGPU: use current 'Index' to find slot in parameter array
Before this change we used the array index, which would result in us accessing
the parameter array out-of-bounds. This bug was visible for test cases where not
all arrays in a scop are passed to a given kernel.

llvm-svn: 276961
2016-07-28 06:47:53 +00:00
Tobias Grosser 4e18d71c71 GPGPU: Generate kernel parameter allocation with right size
Before this change we miscounted the number of function parameters.

llvm-svn: 276960
2016-07-28 06:47:50 +00:00
Tobias Grosser 79a947c233 GPGPU: Add basic support for kernel launches
llvm-svn: 276863
2016-07-27 13:20:16 +00:00
Tobias Grosser 5779359624 GPGPU: Load GPU kernels
We embed the PTX code into the host IR as a global variable and compile it
at run-time into a GPU kernel.

llvm-svn: 276645
2016-07-25 16:31:21 +00:00
Johannes Doerfert 8031238017 [GSoC] Add PolyhedralInfo pass - new interface to polly analysis
Adding a new pass PolyhedralInfo. This pass will be the interface to Polly.
  Initially, we will provide the following interface:
    - #IsParallel(Loop *L) - return a bool depending on whether the loop is
                             parallel or not for the given program order.

Patch by Utpal Bora <cs14mtech11017@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D21486

llvm-svn: 276637
2016-07-25 12:48:45 +00:00
Tobias Grosser 13c78e4d51 GPGPU: Emit data-transfer code
Also factor out getArraySize() to avoid code duplication and reorder some
function arguments to indicate the direction in which data is transferred.

llvm-svn: 276636
2016-07-25 12:47:39 +00:00
Tobias Grosser 7287aeddf1 GPGPU: Complete code to allocate and free device arrays
At the beginning of each SCoP, we allocate device arrays for all arrays
used on the GPU and we free such arrays after the SCoP has been executed.

llvm-svn: 276635
2016-07-25 12:47:33 +00:00
Tobias Grosser 19b8a0bbfb GPURuntime: Add missing debug output
llvm-svn: 276634
2016-07-25 12:47:28 +00:00
Tobias Grosser 9855e8bd80 GPURuntime: Fix typo in docu
llvm-svn: 276633
2016-07-25 12:47:25 +00:00
Tobias Grosser a71eedd4c5 GPURuntime: Drop polly_cleanupGPGPUResources
This function is currently unused and won't be used in this form again. Instead
of freeing many unrelated items at the same time, we will explicitly call a
specific free function from the host-IR we generate for each object we want to
free. These specific free functions will be added together with the
corresponding host-IR generation code.

llvm-svn: 276632
2016-07-25 12:47:22 +00:00
Johannes Doerfert 3b7ac0a691 [GSoC] Do not process SCoPs with infeasible runtime context
Do not process SCoPs with infeasible runtime context in the new
  ScopInfoWrapperPass. Do not compute dependences for such SCoPs in the new
  DependenceInfoWrapperPass.

Patch by Utpal Bora <cs14mtech11017@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D22402

llvm-svn: 276631
2016-07-25 12:40:59 +00:00
Roman Gareev 3a18a931a8 Apply all necessary tilings and interchangings to get a macro-kernel
This is the second patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus
two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update. In this change
we create the BLIS macro-kernel by applying a combination of tiling
and interchanging. In subsequent changes we will implement the packing
transformation.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D21491

llvm-svn: 276627
2016-07-25 09:42:53 +00:00
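
The loop structure described in this commit message can be sketched as a plain C++ loop nest. This is an illustration only, not the code Polly generates: the function name blisLikeGemm and the tile sizes NC, KC, MC, NR, and MR are hypothetical, the matrices are assumed to be row-major, and the packing routines are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile sizes; real values depend on the cache hierarchy.
constexpr std::size_t NC = 8, KC = 4, MC = 4, NR = 2, MR = 2;

// Computes C (M x N) += A (M x K) * B (K x N), all stored row-major.
void blisLikeGemm(std::size_t M, std::size_t N, std::size_t K,
                  const std::vector<double> &A, const std::vector<double> &B,
                  std::vector<double> &C) {
  assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
  // Three loops around the macro-kernel.
  for (std::size_t jc = 0; jc < N; jc += NC) {
    const std::size_t jcEnd = std::min(jc + NC, N);
    for (std::size_t pc = 0; pc < K; pc += KC) {
      const std::size_t pcEnd = std::min(pc + KC, K);
      for (std::size_t ic = 0; ic < M; ic += MC) {
        const std::size_t icEnd = std::min(ic + MC, M);
        // Macro-kernel: two loops around the micro-kernel.
        for (std::size_t jr = jc; jr < jcEnd; jr += NR) {
          const std::size_t jrEnd = std::min(jr + NR, jcEnd);
          for (std::size_t ir = ic; ir < icEnd; ir += MR) {
            const std::size_t irEnd = std::min(ir + MR, icEnd);
            // Micro-kernel: a loop over p, each iteration applying a
            // rank-1 (outer product) update to an MR x NR block of C.
            for (std::size_t p = pc; p < pcEnd; ++p)
              for (std::size_t i = ir; i < irEnd; ++i)
                for (std::size_t j = jr; j < jrEnd; ++j)
                  C[i * N + j] += A[i * K + p] * B[p * N + j];
          }
        }
      }
    }
  }
}
```

The std::min bounds handle matrix sizes that are not multiples of the tile sizes, so the nest computes the same result as an untiled triple loop.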
Tobias Grosser fa7b080218 GPGPU: initialize GPU context and simplify the corresponding GPURuntime interface.
There is no need to expose the selected device at the moment. We also pass back
pointers as return values, as this simplifies the interface.

llvm-svn: 276623
2016-07-25 09:16:01 +00:00
Tobias Grosser 8ed5e5999f IslNodeBuilder: Make finalize() virtual
This allows the finalization routine of the IslNodeBuilder to be overridden
by derived classes. While here, we also drop the unnecessary 'Scop' suffix
and the unnecessary 'Scop' parameter.

llvm-svn: 276622
2016-07-25 09:15:57 +00:00
Tobias Grosser 0a1a2720c8 GPURuntime: Check for debug-mode early on
Before this change, the debug statements in polly_initDevice would all be
skipped, as debug-mode would only be enabled _after_ they have already been run.

llvm-svn: 276621
2016-07-25 09:15:53 +00:00
Tobias Grosser dc816da455 GPURuntime: Drop timing functionality (some leftover II)
llvm-svn: 276617
2016-07-25 08:03:08 +00:00
Roman Gareev 2cb4d133f5 [NFC] Refactor creation of the BLIS micro-kernel and improve documentation
Reviewed-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 276616
2016-07-25 07:27:59 +00:00
Tobias Grosser 97aa23519e GPURuntime: Drop timing functionality (some leftover)
llvm-svn: 276612
2016-07-25 07:11:49 +00:00
Tobias Grosser 92713bea42 GPURuntime: Drop timing functionality
This functionality won't be used in the current iteration. Drop it for now to
reduce the surface of the library. We can always add it back in when we need
it again.

llvm-svn: 276611
2016-07-25 07:10:45 +00:00
Tobias Grosser 9a18d55947 GPGPU: Optimize kernel IR before generating assembly code
We optimize the kernel _after_ dumping the IR we generate, so that the dumped
IR is easier to read and independent of possible changes in the general
purpose LLVM optimizers.

llvm-svn: 276551
2016-07-24 06:43:21 +00:00
Tobias Grosser e1a98343a1 GPGPU: Verify kernel IR before generating assembly
llvm-svn: 276550
2016-07-24 06:43:17 +00:00
Michael Kruse 977d38bd87 Remove unused parameters from simplifySCoP(). NFC.
llvm-svn: 276444
2016-07-22 17:31:17 +00:00
Tobias Grosser 74dc3cb431 GPGPU: Generate PTX assembly code for the kernel modules
Run the NVPTX backend over the GPUModule IR and write the resulting assembly
code in a string.

To work correctly, it is important to invalidate analysis results that still
reference the IR in the kernel module. Hence, this change clears all references
to dominators, loop info, and scalar evolution.

Finally, the NVPTX backend has trouble generating code for various special
floating point types (not surprising), but also for uncommon integer types. This
commit does not resolve these issues, but pulls out problematic test cases into
separate files to XFAIL them individually and resolve them one by one in future
(not immediate) changes.

llvm-svn: 276396
2016-07-22 07:11:12 +00:00
Tobias Grosser edb885cb12 GPGPU: generate code for ScopStatements
This change introduces the actual compute code in the GPU kernels. To ensure
all values referenced from the statements in the GPU kernel are indeed available
we scan all ScopStmts in the GPU kernel for references to llvm::Values that
are not yet covered by already modeled outer loop iterators, parameters, or
array base pointers and also pass these additional llvm::Values to the
GPU kernel.

For arrays used in the GPU kernel we introduce a new ScopArrayInfo object, which
is referenced by the newly generated access functions within the GPU kernel and
which is used to help with code generation.

llvm-svn: 276270
2016-07-21 13:15:59 +00:00
Tobias Grosser 86083da0ec IslNodeBuilder: expose addReferencesFromStmt [NFC]
This will be used by Polly GPGPU to determine the values that need to be
passed to GPU kernels.

llvm-svn: 276269
2016-07-21 13:15:55 +00:00
Tobias Grosser 04b909fcca IslExprBuilder: allow to specify an external isl_id to ScopArrayInfo mapping
This is useful for external users using IslExprBuilder, in case they cannot
embed ScopArrayInfo data into their isl_ids, because the isl_ids either already
carry other information or the isl_ids have been created and their user pointers
cannot be updated any more.

llvm-svn: 276268
2016-07-21 13:15:51 +00:00
Tobias Grosser 9d12d8ade3 BlockGenerator: remove dead instructions in normal statements
This ensures that no trivially dead code is generated. This is not only cleaner,
but also avoids trouble when code is generated in a separate function and
some of this dead code contains references to values that are not available.
This issue may occur when the memory access functions have been updated
and old getelementptr instructions remain in the code. With normal Polly,
a test case is difficult to draft, but the upcoming GPU code generation can
possibly trigger such problems. We will later extend this dead-code elimination
to region and vector statements.

llvm-svn: 276263
2016-07-21 11:48:36 +00:00
Tobias Grosser 212469e0ed tests: make test cases more robust using regexp
llvm-svn: 276262
2016-07-21 11:48:31 +00:00
Tobias Grosser 903eefd1f2 tests: fix order of memory accesses to ensure import succeeds
It seems the order in which we generated memory accesses changed such that
the import of these updated memory accesses failed for the 'loop3' statement
in this test case. Unfortunately, the existing CHECK lines were not strict
enough to catch this. Hence, besides fixing the order of the memory access
lines, we also ensure that the memory access changes are both clearly visible
and well checked.

llvm-svn: 276247
2016-07-21 07:12:17 +00:00