[AMDGPU] Correct gfx940 memory model documentation.

Differential Revision: https://reviews.llvm.org/D121397
Stanislav Mekhanoshin 2022-03-10 11:22:16 -08:00
parent 2ebe971103
commit 3a37d08b35
1 changed file with 16 additions and 13 deletions

@@ -8712,12 +8712,17 @@ For GFX940:
work-group since they execute on the same CU. The exception is when in
tgsplit execution mode as wavefronts of the same work-group can be in
different CUs and so a ``buffer_inv sc0`` is required which will invalidate
- the L1 cache is in tgsplit mode.
+ the L1 cache.
- * A ``buffer_inv sc1`` is required to invalidate the L1 cache for coherence
+ * A ``buffer_inv sc0`` is required to invalidate the L1 cache for coherence
between wavefronts executing in different work-groups as they may be
executing on different CUs.
+ * Atomic read-modify-write instructions implicitly bypass the L1 cache.
+ Therefore, they do not use the sc0 bit for coherence and instead use it to
+ indicate if the instruction returns the original value being updated. They
+ do use sc1 to indicate system or agent scope coherence.
* The scalar memory operations access a scalar L1 cache shared by all wavefronts
on a group of CUs. The scalar and vector L1 caches are not coherent. However,
scalar operations are used in a restricted way so do not impact the memory
@@ -8891,8 +8896,6 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
- generic sc0=1 sc1=1
store atomic monotonic - singlethread - global 1. buffer/global/flat_store
- wavefront - generic
- store atomic monotonic - singlethread - global 1. buffer/global/flat_store
- - wavefront - generic
store atomic monotonic - workgroup - global 1. buffer/global/flat_store
- generic sc0=1
store atomic monotonic - agent - global 1. buffer/global/flat_store
@@ -9639,7 +9642,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
store that is being
released.
- 3. buffer/global/flat_store sc1=1
+ 3. buffer/global/flat_store sc1=1
store atomic release - system - global 1. buffer_wbl2 sc0=1 sc1=1
- generic
- Must happen before
@@ -9694,7 +9697,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
store that is being
released.
- 2. buffer/global/flat_store
+ 3. buffer/global/flat_store
sc0=1 sc1=1
atomicrmw release - singlethread - global 1. buffer/global/flat_atomic
- wavefront - generic
@@ -10878,7 +10881,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
------------------------------------------------------------------------------------
load atomic seq_cst - singlethread - global *Same as corresponding
- wavefront - local load atomic acquire,
- - generic except must generated
+ - generic except must generate
all instructions even
for OpenCL.*
load atomic seq_cst - workgroup - global 1. s_waitcnt lgkm/vmcnt(0)
@@ -10963,7 +10966,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
instructions same as
corresponding load
atomic acquire,
- except must generated
+ except must generate
all instructions even
for OpenCL.*
load atomic seq_cst - workgroup - local *If TgSplit execution mode,
@@ -10972,7 +10975,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
*Same as corresponding
load atomic acquire,
- except must generated
+ except must generate
all instructions even
for OpenCL.*
@@ -11066,22 +11069,22 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
instructions same as
corresponding load
atomic acquire,
- except must generated
+ except must generate
all instructions even
for OpenCL.*
store atomic seq_cst - singlethread - global *Same as corresponding
- wavefront - local store atomic release,
- - workgroup - generic except must generated
+ - workgroup - generic except must generate
- agent all instructions even
- system for OpenCL.*
atomicrmw seq_cst - singlethread - global *Same as corresponding
- wavefront - local atomicrmw acq_rel,
- - workgroup - generic except must generated
+ - workgroup - generic except must generate
- agent all instructions even
- system for OpenCL.*
fence seq_cst - singlethread *none* *Same as corresponding
- wavefront fence acq_rel,
- - workgroup except must generated
+ - workgroup except must generate
- agent all instructions even
- system for OpenCL.*
============ ============ ============== ========== ================================