[coroutines] Part 2 of N: Adding Coroutine Intrinsics
This is the second patch in the coroutine series. It adds coroutine intrinsics
and updates intrinsic cost in TargetTransformInfoImpl.h.

Patch by Gor Nishanov!

Differential Revision: https://reviews.llvm.org/D22659

llvm-svn: 276839
parent 18c964d7a4
commit 7855719c10

@@ -215,15 +215,9 @@ RAII idiom and is suitable for allocation elision optimization which avoid
 dynamic allocation by storing the coroutine frame as a static `alloca` in its
 caller.

-If a coroutine uses allocation and deallocation functions that are known to
-LLVM, unused calls to `malloc` and calls to `free` with `null` argument will be
-removed as dead code. However, if custom allocation functions are used, the
-`coro.alloc` and `coro.free` intrinsics can be used to enable removal of custom
-allocation and deallocation code when coroutine does not require dynamic
-allocation of the coroutine frame.
-
 In the entry block, we will call `coro.alloc`_ intrinsic that will return `null`
-when dynamic allocation is required, and non-null otherwise:
+when dynamic allocation is required, and an address of an alloca on the caller's
+frame where coroutine frame can be stored if dynamic allocation is elided.

 .. code-block:: llvm

@@ -256,8 +250,7 @@ thus skipping the deallocation code:
     ...

 With allocations and deallocations represented as described as above, after
-coroutine heap allocation elision optimization, the resulting main will end up
-looking just like it was when we used `malloc` and `free`:
+coroutine heap allocation elision optimization, the resulting main will be:

 .. code-block:: llvm

@@ -419,12 +412,19 @@ store the current value produced by a coroutine.
   entry:
     %promise = alloca i32
     %pv = bitcast i32* %promise to i8*
+    %elide = call i8* @llvm.coro.alloc()
+    %need.dyn.alloc = icmp ne i8* %elide, null
+    br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
+  dyn.alloc:
     %size = call i32 @llvm.coro.size.i32()
     %alloc = call i8* @malloc(i32 %size)
-    %hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* %pv, i8* null)
+    br label %coro.begin
+  coro.begin:
+    %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
+    %hdl = call noalias i8* @llvm.coro.begin(i8* %phi, i32 0, i8* %pv, i8* null)
     br label %loop
   loop:
-    %n.val = phi i32 [ %n, %entry ], [ %inc, %loop ]
+    %n.val = phi i32 [ %n, %coro.begin ], [ %inc, %loop ]
     %inc = add nsw i32 %n.val, 1
     store i32 %n.val, i32* %promise
     %0 = call i8 @llvm.coro.suspend(token none, i1 false)
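Reviewer note: the control flow in the hunk above (use the storage reported by `coro.alloc` when it is non-null, otherwise fall through to `malloc`, then merge the two pointers with a phi) can be modelled in plain C++. This is only a sketch under stated assumptions: `FrameStorage`, `chooseFrameStorage`, and `releaseFrameStorage` are hypothetical names invented for illustration and are not part of LLVM or of this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical model of the frame-placement decision; not LLVM code.
struct FrameStorage {
  void *frame; // plays the role of %phi, the pointer handed to coro.begin
  bool onHeap; // true only when the dyn.alloc path ran
};

// `elided` plays the role of the coro.alloc result: non-null means the
// caller already provides storage, null means dynamic allocation is needed.
inline FrameStorage chooseFrameStorage(void *elided, std::size_t size) {
  if (elided != nullptr)
    return {elided, false};          // branch straight to coro.begin
  return {std::malloc(size), true};  // dyn.alloc block
}

// Only storage obtained on the dyn.alloc path must be freed.
inline void releaseFrameStorage(const FrameStorage &storage) {
  if (storage.onHeap)
    std::free(storage.frame);
}
```

Passing a caller-owned buffer models the elided case, and passing `nullptr` models the case where the frame has to be heap allocated.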
@@ -461,8 +461,7 @@ coroutine promise.
     ret i32 0
   }

-After example in this section is compiled, result of the compilation will
-exactly like the result of the very first example:
+After example in this section is compiled, result of the compilation will be:

 .. code-block:: llvm

@@ -758,14 +757,13 @@ the coroutine frame.
 Overview:
 """""""""

-The '``llvm.coro.begin``' intrinsic returns an address of the
-coroutine frame.
+The '``llvm.coro.begin``' intrinsic returns an address of the coroutine frame.

 Arguments:
 """"""""""

-The first argument is a pointer to a block of memory in which coroutine frame
-may use if memory for the coroutine frame needs to be allocated dynamically.
+The first argument is a pointer to a block of memory where coroutine frame
+will be stored.

 The second argument provides information on the alignment of the memory returned
 by the allocation function and given to `coro.begin` by the first argument. If
@@ -788,7 +786,7 @@ may be at offset to the `%mem` argument. (This could be beneficial if
 instructions that express relative access to data can be more compactly encoded
 with small positive and negative offsets).

-Frontend should emit exactly one `coro.begin` intrinsic per coroutine.
+A frontend should emit exactly one `coro.begin` intrinsic per coroutine.

 .. _coro.free:

@@ -861,10 +859,8 @@ Semantics:

 If the coroutine is eligible for heap elision, this intrinsic is lowered to an
 alloca storing the coroutine frame. Otherwise, it is lowered to constant `null`.
-This intrinsic only needs to be used if a custom allocation function is used
-(i.e. a function not recognized by LLVM as a memory allocation function) and the
-language rules allow for custom allocation / deallocation to be elided when not
-needed.
+
+A frontend should emit at most one `coro.alloc` intrinsic per coroutine.

 Example:
 """"""""
@@ -1076,7 +1072,7 @@ to the coroutine:
 Overview:
 """""""""

-The '``llvm.coro.param``' is used by the frontend to mark up the code used to
+The '``llvm.coro.param``' is used by a frontend to mark up the code used to
 construct and destruct copies of the parameters. If the optimizer discovers that
 a particular parameter copy is not used after any suspends, it can remove the
 construction and destruction of the copy by replacing corresponding coro.param
@@ -1180,8 +1176,8 @@ earlier passes.

 Upstreaming sequence (rough plan)
 =================================
-#. Add documentation. <= we are here
-#. Add coroutine intrinsics.
+#. Add documentation.
+#. Add coroutine intrinsics. <= we are here
 #. Add empty coroutine passes.
 #. Add coroutine devirtualization + tests.
 #. Add CGSCC restart trigger + tests.

@@ -152,6 +152,15 @@ public:
     case Intrinsic::var_annotation:
     case Intrinsic::experimental_gc_result:
     case Intrinsic::experimental_gc_relocate:
+    case Intrinsic::coro_alloc:
+    case Intrinsic::coro_begin:
+    case Intrinsic::coro_free:
+    case Intrinsic::coro_end:
+    case Intrinsic::coro_frame:
+    case Intrinsic::coro_size:
+    case Intrinsic::coro_suspend:
+    case Intrinsic::coro_param:
+    case Intrinsic::coro_subfn_addr:
       // These intrinsics don't actually represent code after lowering.
       return TTI::TCC_Free;
     }

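Reviewer note: the effect of the hunk above is that the listed coroutine intrinsics share the existing "free" case of the cost switch. The following is a minimal stand-alone sketch of that pattern, not the real LLVM code: `IntrinsicID`, `Cost`, `intrinsicCost`, and the `other_intrinsic` enumerator are hypothetical stand-ins, and the non-free default is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are NOT llvm::Intrinsic::ID or TTI types.
enum class IntrinsicID {
  coro_alloc, coro_begin, coro_free, coro_end, coro_frame,
  coro_size, coro_suspend, coro_param, coro_subfn_addr,
  other_intrinsic // hypothetical placeholder for anything not listed
};

enum class Cost { Free, Basic };

// The intrinsics listed in the hunk fall through to a single "free" return,
// mirroring the comment that they do not represent code after lowering.
inline Cost intrinsicCost(IntrinsicID id) {
  switch (id) {
  case IntrinsicID::coro_alloc:
  case IntrinsicID::coro_begin:
  case IntrinsicID::coro_free:
  case IntrinsicID::coro_end:
  case IntrinsicID::coro_frame:
  case IntrinsicID::coro_size:
  case IntrinsicID::coro_suspend:
  case IntrinsicID::coro_param:
  case IntrinsicID::coro_subfn_addr:
    return Cost::Free;
  default:
    return Cost::Basic; // assumed default for this sketch
  }
}
```

Grouping the cases this way keeps the cost decision in one place, so a later patch that adds another lowering-only intrinsic would only need one more `case` label.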
@@ -597,7 +597,47 @@ def int_experimental_gc_relocate : Intrinsic<[llvm_any_ty],
                                 [llvm_token_ty, llvm_i32_ty, llvm_i32_ty],
                                 [IntrReadMem]>;

-//===-------------------------- Other Intrinsics --------------------------===//
+//===------------------------ Coroutine Intrinsics ---------------===//
+// These are documented in docs/Coroutines.rst
+
+// Coroutine Structure Intrinsics.
+
+def int_coro_alloc : Intrinsic<[llvm_ptr_ty], [], []>;
+def int_coro_begin : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty, llvm_i32_ty,
+                               llvm_ptr_ty, llvm_ptr_ty],
+                               [WriteOnly<0>, ReadNone<2>, ReadOnly<3>,
+                                NoCapture<3>]>;
+
+def int_coro_free : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty],
+                              [IntrArgMemOnly, ReadOnly<0>, NoCapture<0>]>;
+def int_coro_end : Intrinsic<[], [llvm_ptr_ty, llvm_i1_ty], []>;
+
+def int_coro_frame : Intrinsic<[llvm_ptr_ty], [], [IntrNoMem]>;
+def int_coro_size : Intrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;
+
+def int_coro_save : Intrinsic<[llvm_token_ty], [llvm_ptr_ty], []>;
+def int_coro_suspend : Intrinsic<[llvm_i8_ty], [llvm_token_ty, llvm_i1_ty], []>;
+
+def int_coro_param : Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_ptr_ty],
+                               [IntrNoMem, ReadNone<0>, ReadNone<1>]>;
+
+// Coroutine Manipulation Intrinsics.
+
+def int_coro_resume : Intrinsic<[], [llvm_ptr_ty], [Throws]>;
+def int_coro_destroy : Intrinsic<[], [llvm_ptr_ty], [Throws]>;
+def int_coro_done : Intrinsic<[llvm_i1_ty], [llvm_ptr_ty],
+                              [IntrArgMemOnly, ReadOnly<0>, NoCapture<0>]>;
+def int_coro_promise : Intrinsic<[llvm_ptr_ty],
+                                 [llvm_ptr_ty, llvm_i32_ty, llvm_i1_ty],
+                                 [IntrNoMem, NoCapture<0>]>;
+
+// Coroutine Lowering Intrinsics. Used internally by coroutine passes.
+
+def int_coro_subfn_addr : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty, llvm_i8_ty],
+                                    [IntrArgMemOnly, ReadOnly<0>,
+                                     NoCapture<0>]>;
+
+///===-------------------------- Other Intrinsics --------------------------===//
 //
 def int_flt_rounds : Intrinsic<[llvm_i32_ty]>,
                      GCCBuiltin<"__builtin_flt_rounds">;