[coroutines] Part 2 of N: Adding Coroutine Intrinsics

This is the second patch in the coroutine series. It adds coroutine
intrinsics and updates intrinsic cost in TargetTransformInfoImpl.h.

Patch by Gor Nishanov!

Differential Revision: https://reviews.llvm.org/D22659

llvm-svn: 276839
This commit is contained in:
David Majnemer 2016-07-27 05:12:35 +00:00
parent 18c964d7a4
commit 7855719c10
3 changed files with 72 additions and 27 deletions

View File

@ -215,15 +215,9 @@ RAII idiom and is suitable for allocation elision optimization which avoid
dynamic allocation by storing the coroutine frame as a static `alloca` in its
caller.
If a coroutine uses allocation and deallocation functions that are known to
LLVM, unused calls to `malloc` and calls to `free` with `null` argument will be
removed as dead code. However, if custom allocation functions are used, the
`coro.alloc` and `coro.free` intrinsics can be used to enable removal of custom
allocation and deallocation code when coroutine does not require dynamic
allocation of the coroutine frame.
In the entry block, we will call `coro.alloc`_ intrinsic that will return `null`
when dynamic allocation is required, and non-null otherwise:
when dynamic allocation is required, and an address of an alloca on the caller's
frame where coroutine frame can be stored if dynamic allocation is elided.
.. code-block:: llvm
@ -256,8 +250,7 @@ thus skipping the deallocation code:
...
With allocations and deallocations represented as described as above, after
coroutine heap allocation elision optimization, the resulting main will end up
looking just like it was when we used `malloc` and `free`:
coroutine heap allocation elision optimization, the resulting main will be:
.. code-block:: llvm
@ -419,12 +412,19 @@ store the current value produced by a coroutine.
entry:
%promise = alloca i32
%pv = bitcast i32* %promise to i8*
%elide = call i8* @llvm.coro.alloc()
%need.dyn.alloc = icmp ne i8* %elide, null
br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
dyn.alloc:
%size = call i32 @llvm.coro.size.i32()
%alloc = call i8* @malloc(i32 %size)
%hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* %pv, i8* null)
br label %coro.begin
coro.begin:
%phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
%hdl = call noalias i8* @llvm.coro.begin(i8* %phi, i32 0, i8* %pv, i8* null)
br label %loop
loop:
%n.val = phi i32 [ %n, %entry ], [ %inc, %loop ]
%n.val = phi i32 [ %n, %coro.begin ], [ %inc, %loop ]
%inc = add nsw i32 %n.val, 1
store i32 %n.val, i32* %promise
%0 = call i8 @llvm.coro.suspend(token none, i1 false)
@ -461,8 +461,7 @@ coroutine promise.
ret i32 0
}
After example in this section is compiled, result of the compilation will
exactly like the result of the very first example:
After example in this section is compiled, result of the compilation will be:
.. code-block:: llvm
@ -758,14 +757,13 @@ the coroutine frame.
Overview:
"""""""""
The '``llvm.coro.begin``' intrinsic returns an address of the
coroutine frame.
The '``llvm.coro.begin``' intrinsic returns an address of the coroutine frame.
Arguments:
""""""""""
The first argument is a pointer to a block of memory in which coroutine frame
may use if memory for the coroutine frame needs to be allocated dynamically.
The first argument is a pointer to a block of memory where coroutine frame
will be stored.
The second argument provides information on the alignment of the memory returned
by the allocation function and given to `coro.begin` by the first argument. If
@ -788,7 +786,7 @@ may be at offset to the `%mem` argument. (This could be beneficial if
instructions that express relative access to data can be more compactly encoded
with small positive and negative offsets).
Frontend should emit exactly one `coro.begin` intrinsic per coroutine.
A frontend should emit exactly one `coro.begin` intrinsic per coroutine.
.. _coro.free:
@ -861,10 +859,8 @@ Semantics:
If the coroutine is eligible for heap elision, this intrinsic is lowered to an
alloca storing the coroutine frame. Otherwise, it is lowered to constant `null`.
This intrinsic only needs to be used if a custom allocation function is used
(i.e. a function not recognized by LLVM as a memory allocation function) and the
language rules allow for custom allocation / deallocation to be elided when not
needed.
A frontend should emit at most one `coro.alloc` intrinsic per coroutine.
Example:
""""""""
@ -1076,7 +1072,7 @@ to the coroutine:
Overview:
"""""""""
The '``llvm.coro.param``' is used by the frontend to mark up the code used to
The '``llvm.coro.param``' is used by a frontend to mark up the code used to
construct and destruct copies of the parameters. If the optimizer discovers that
a particular parameter copy is not used after any suspends, it can remove the
construction and destruction of the copy by replacing corresponding coro.param
@ -1180,8 +1176,8 @@ earlier passes.
Upstreaming sequence (rough plan)
=================================
#. Add documentation. <= we are here
#. Add coroutine intrinsics.
#. Add documentation.
#. Add coroutine intrinsics. <= we are here
#. Add empty coroutine passes.
#. Add coroutine devirtualization + tests.
#. Add CGSCC restart trigger + tests.

View File

@ -152,6 +152,15 @@ public:
case Intrinsic::var_annotation:
case Intrinsic::experimental_gc_result:
case Intrinsic::experimental_gc_relocate:
case Intrinsic::coro_alloc:
case Intrinsic::coro_begin:
case Intrinsic::coro_free:
case Intrinsic::coro_end:
case Intrinsic::coro_frame:
case Intrinsic::coro_size:
case Intrinsic::coro_suspend:
case Intrinsic::coro_param:
case Intrinsic::coro_subfn_addr:
// These intrinsics don't actually represent code after lowering.
return TTI::TCC_Free;
}

View File

@ -597,7 +597,47 @@ def int_experimental_gc_relocate : Intrinsic<[llvm_any_ty],
[llvm_token_ty, llvm_i32_ty, llvm_i32_ty],
[IntrReadMem]>;
//===-------------------------- Other Intrinsics --------------------------===//
//===------------------------ Coroutine Intrinsics ---------------===//
// These are documented in docs/Coroutines.rst
// Coroutine Structure Intrinsics.
def int_coro_alloc : Intrinsic<[llvm_ptr_ty], [], []>;
def int_coro_begin : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty, llvm_i32_ty,
llvm_ptr_ty, llvm_ptr_ty],
[WriteOnly<0>, ReadNone<2>, ReadOnly<3>,
NoCapture<3>]>;
def int_coro_free : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty],
[IntrArgMemOnly, ReadOnly<0>, NoCapture<0>]>;
def int_coro_end : Intrinsic<[], [llvm_ptr_ty, llvm_i1_ty], []>;
def int_coro_frame : Intrinsic<[llvm_ptr_ty], [], [IntrNoMem]>;
def int_coro_size : Intrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;
def int_coro_save : Intrinsic<[llvm_token_ty], [llvm_ptr_ty], []>;
def int_coro_suspend : Intrinsic<[llvm_i8_ty], [llvm_token_ty, llvm_i1_ty], []>;
def int_coro_param : Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_ptr_ty],
[IntrNoMem, ReadNone<0>, ReadNone<1>]>;
// Coroutine Manipulation Intrinsics.
def int_coro_resume : Intrinsic<[], [llvm_ptr_ty], [Throws]>;
def int_coro_destroy : Intrinsic<[], [llvm_ptr_ty], [Throws]>;
def int_coro_done : Intrinsic<[llvm_i1_ty], [llvm_ptr_ty],
[IntrArgMemOnly, ReadOnly<0>, NoCapture<0>]>;
def int_coro_promise : Intrinsic<[llvm_ptr_ty],
[llvm_ptr_ty, llvm_i32_ty, llvm_i1_ty],
[IntrNoMem, NoCapture<0>]>;
// Coroutine Lowering Intrinsics. Used internally by coroutine passes.
def int_coro_subfn_addr : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty, llvm_i8_ty],
[IntrArgMemOnly, ReadOnly<0>,
NoCapture<0>]>;
///===-------------------------- Other Intrinsics --------------------------===//
//
def int_flt_rounds : Intrinsic<[llvm_i32_ty]>,
GCCBuiltin<"__builtin_flt_rounds">;