[OpenMP][NFC] Provide a new remark and documentation
If a GPU function is externally reachable we give up trying to find the
(unique) kernel it is called from. This can hinder optimizations. Emit a
remark and explain mitigation strategies.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D93439
parent e75fec2b23
commit 994bb6eb7d
@@ -4,7 +4,7 @@
 // host-no-diagnostics
 
-void bar1(void) {
+void bar1(void) { // all-remark {{[OMP100] Potentially unknown OpenMP target region caller}}
 #pragma omp parallel // #0
   // all-remark@#0 {{Found a parallel region that is called in a target region but not part of a combined target construct nor nesed inside a target construct without intermediate code. This can lead to excessive register usage for unrelated target regions in the same translation unit due to spurious call edges assumed by ptxas.}}
   // safe-remark@#0 {{Parallel region is not known to be called from a unique single target region, maybe the surrounding function has external linkage?; will not attempt to rewrite the state machine use.}}
@@ -13,7 +13,7 @@ void bar1(void) {
   {
   }
 }
-void bar2(void) {
+void bar2(void) { // all-remark {{[OMP100] Potentially unknown OpenMP target region caller}}
 #pragma omp parallel // #1
   // all-remark@#1 {{Found a parallel region that is called in a target region but not part of a combined target construct nor nesed inside a target construct without intermediate code. This can lead to excessive register usage for unrelated target regions in the same translation unit due to spurious call edges assumed by ptxas.}}
   // safe-remark@#1 {{Parallel region is not known to be called from a unique single target region, maybe the surrounding function has external linkage?; will not attempt to rewrite the state machine use.}}
@@ -4,7 +4,7 @@
 // host-no-diagnostics
 
-void bar(void) {
+void bar(void) { // expected-remark {{[OMP100] Potentially unknown OpenMP target region caller}}
 #pragma omp parallel // #1 \
   // expected-remark@#1 {{Found a parallel region that is called in a target region but not part of a combined target construct nor nesed inside a target construct without intermediate code. This can lead to excessive register usage for unrelated target regions in the same translation unit due to spurious call edges assumed by ptxas.}} \
   // expected-remark@#1 {{Parallel region is not known to be called from a unique single target region, maybe the surrounding function has external linkage?; will not attempt to rewrite the state machine use.}}
@@ -1469,8 +1469,16 @@ Kernel OpenMPOpt::getUniqueKernelFor(Function &F) {
     }
 
     CachedKernel = nullptr;
-    if (!F.hasLocalLinkage())
+    if (!F.hasLocalLinkage()) {
+
+      // See https://openmp.llvm.org/remarks/OptimizationRemarks.html
+      auto Remark = [&](OptimizationRemark OR) {
+        return OR << "[OMP100] Potentially unknown OpenMP target region caller";
+      };
+      emitRemarkOnFunction(&F, "OMP100", Remark);
+
       return nullptr;
+    }
   }
 
   auto GetUniqueKernelForUse = [&](const Use &U) -> Kernel {
@@ -1,2 +1,30 @@
 OpenMP Optimization Remarks
 ===========================
+
+
+.. _omp100:
+.. _omp_no_external_caller_in_target_region:
+
+`[OMP100]` Potentially unknown OpenMP target region caller
+----------------------------------------------------------
+
+A function remark that indicates the function, when compiled for a GPU, is
+potentially called from outside the translation unit. Note that a remark is
+only issued if we tried to perform an optimization which would require us to
+know all callers on the GPU.
+
+To facilitate OpenMP semantics on GPUs we provide a runtime mechanism through
+which the code that makes up the body of a parallel region is shared with the
+threads in the team. Generally we use the address of the outlined parallel
+region to identify the code that needs to be executed. If we know all target
+regions that reach the parallel region we can avoid this function pointer
+passing scheme and often improve the register usage on the GPU. However, if a
+parallel region on the GPU is in a function with external linkage we may not
+know all callers statically. If there are outside callers within target
+regions, this remark can be ignored. If there are no such callers, users can
+modify the linkage and thereby help optimization with a `static` or
+`__attribute__((internal_linkage))` function annotation. If changing the
+linkage is impossible, e.g., because there are outside callers on the host,
+one can split the function into an externally visible interface which is not
+compiled for the target and an internal implementation which is compiled for
+the target and should be called from within the target region.
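The mitigation described in the OMP100 documentation can be sketched in C as follows. This is a minimal illustration, not code from the commit: the names `scale_impl`, `scale`, and `scale_on_device` are hypothetical, and whether the remark actually disappears depends on the compiler version and on the rest of the translation unit. The implementation containing the parallel region gets internal linkage via `static`, the target region calls it directly, and host callers outside the translation unit go through a thin externally visible wrapper that is never referenced from a target region.

```c
#include <stddef.h>

/* Internal implementation (hypothetical name). `static` gives it internal
   linkage, so every caller is visible within this translation unit. */
static void scale_impl(double *data, size_t n, double factor) {
#pragma omp parallel for
  for (size_t i = 0; i < n; ++i)
    data[i] *= factor;
}

/* Externally visible interface for host callers in other translation
   units. It is not referenced from any target region in this file. */
void scale(double *data, size_t n, double factor) {
  scale_impl(data, n, factor);
}

/* The target region calls the internal implementation directly. */
void scale_on_device(double *data, size_t n, double factor) {
#pragma omp target map(tofrom : data[0:n])
  scale_impl(data, n, factor);
}
```

Built without OpenMP support the pragmas are ignored and both entry points run sequentially on the host, so the sketch can be checked for correctness independently of any offloading setup.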