[AMDGPU] Update AMDGOUUsage.rst descriptions

- Improve description of XNACK ELF flag. - Rename all uses of wave to wavefront to be consistent. Differential Revision: https://reviews.llvm.org/D43983 llvm-svn: 326989
2018-03-08 05:46:01 +00:00 · 2018-03-08 05:46:01 +00:00 · 5bbcca6967
parent 003be7cbf4
commit 5bbcca6967
1 changed files with 32 additions and 27 deletions
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@ -503,6 +503,11 @@ The AMDGPU backend uses the following ELF header:
                                                  target feature is
                                                  enabled for all code
                                                  contained in the code object.
+                                                  If the processor
+                                                  does not support the
+                                                  ``xnack`` target
+                                                  feature then must
+                                                  be 0.
                                                  See
                                                  :ref:`amdgpu-target-features`.
     ================================= ========== =============================
@ -1455,7 +1460,7 @@ address to physical address is:
 There are different ways that the wavefront scratch base address is determined
 by a wavefront (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). This
 memory can be accessed in an interleaved manner using buffer instruction with
-the scratch buffer descriptor and per wave scratch offset, by the scratch
+the scratch buffer descriptor and per wavefront scratch offset, by the scratch
 instructions, or by flat instructions. If each lane of a wavefront accesses the
 same private address, the interleaving results in adjacent dwords being accessed
 and hence requires fewer cache lines to be fetched. Multi-dword access is not
@ -1796,7 +1801,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
     Bits    Size    Field Name                      Description
     ======= ======= =============================== ===========================================================================
     0       1 bit   ENABLE_SGPR_PRIVATE_SEGMENT     Enable the setup of the
-                     _WAVE_OFFSET                    SGPR wave scratch offset
+                     _WAVEFRONT_OFFSET               SGPR wavefront scratch offset
                                                     system register (see
                                                     :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).

@ -1883,7 +1888,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                     exceptions exceptions
                                                     enabled which are generated
                                                     when a memory violation has
-                                                     occurred for this wave from
+                                                     occurred for this wavefront from
                                                     L1 or LDS
                                                     (write-to-read-only-memory,
                                                     mis-aligned atomic, LDS
@ -2007,10 +2012,10 @@ SGPR0, the next enabled register is SGPR1 etc.; disabled registers do not have
 an SGPR number.

 The initial SGPRs comprise up to 16 User SRGPs that are set by CP and apply to
-all waves of the grid. It is possible to specify more than 16 User SGPRs using
+all wavefronts of the grid. It is possible to specify more than 16 User SGPRs using
 the ``enable_sgpr_*`` bit fields, in which case only the first 16 are actually
 initialized. These are then immediately followed by the System SGPRs that are
-set up by ADC/SPI and can have different values for each wave of the grid
+set up by ADC/SPI and can have different values for each wavefront of the grid
 dispatch.

 SGPR register initial state is defined in
@ -2025,10 +2030,10 @@ SGPR register initial state is defined in
                field)                     SGPRs
     ========== ========================== ====== ==============================
     First      Private Segment Buffer     4      V# that can be used, together
-                (enable_sgpr_private              with Scratch Wave Offset as an
-                _segment_buffer)                  offset, to access the private
-                                                  memory space using a segment
-                                                  address.
+                (enable_sgpr_private              with Scratch Wavefront Offset
+                _segment_buffer)                  as an offset, to access the
+                                                  private memory space using a
+                                                  segment address.

                                                  CP uses the value provided by
                                                  the runtime.
@ -2068,7 +2073,7 @@ SGPR register initial state is defined in
                                                    address is
                                                    ``SH_HIDDEN_PRIVATE_BASE_VIMID``
                                                    plus this offset.) The value
-                                                    of Scratch Wave Offset must
+                                                    of Scratch Wavefront Offset must
                                                    be added to this offset by
                                                    the kernel machine code,
                                                    right shifted by 8, and
@ -2078,13 +2083,13 @@ SGPR register initial state is defined in
                                                    to SGPRn-4 on GFX7, and
                                                    SGPRn-6 on GFX8 (where SGPRn
                                                    is the highest numbered SGPR
-                                                    allocated to the wave).
+                                                    allocated to the wavefront).
                                                    FLAT_SCRATCH_HI is
                                                    multiplied by 256 (as it is
                                                    in units of 256 bytes) and
                                                    added to
                                                    ``SH_HIDDEN_PRIVATE_BASE_VIMID``
-                                                    to calculate the per wave
+                                                    to calculate the per wavefront
                                                    FLAT SCRATCH BASE in flat
                                                    memory instructions that
                                                    access the scratch
@ -2124,7 +2129,7 @@ SGPR register initial state is defined in
                                                    divides it if there are
                                                    multiple Shader Arrays each
                                                    with its own SPI). The value
-                                                    of Scratch Wave Offset must
+                                                    of Scratch Wavefront Offset must
                                                    be added by the kernel
                                                    machine code and the result
                                                    moved to the FLAT_SCRATCH
@ -2193,12 +2198,12 @@ SGPR register initial state is defined in
     then       Work-Group Id Z            1      32 bit work-group id in Z
                (enable_sgpr_workgroup_id         dimension of grid for
                _Z)                               wavefront.
-     then       Work-Group Info            1      {first_wave, 14'b0000,
+     then       Work-Group Info            1      {first_wavefront, 14'b0000,
                (enable_sgpr_workgroup            ordered_append_term[10:0],
-                _info)                            threadgroup_size_in_waves[5:0]}
-     then       Scratch Wave Offset        1      32 bit byte offset from base
+                _info)                            threadgroup_size_in_wavefronts[5:0]}
+     then       Scratch Wavefront Offset   1      32 bit byte offset from base
                (enable_sgpr_private              of scratch base of queue
-                _segment_wave_offset)             executing the kernel
+                _segment_wavefront_offset)        executing the kernel
                                                  dispatch. Must be used as an
                                                  offset with Private
                                                  segment address when using
@ -2244,8 +2249,8 @@ The setting of registers is is done by GPU CP/ADC/SPI hardware as follows:
   registers.
 2. Work-group Id registers X, Y, Z are set by ADC which supports any
   combination including none.
-3. Scratch Wave Offset is set by SPI in a per wave basis which is why its value
-   cannot included with the flat scratch init value which is per queue.
+3. Scratch Wavefront Offset is set by SPI in a per wavefront basis which is why
+   its value cannot included with the flat scratch init value which is per queue.
 4. The VGPRs are set by SPI which only supports specifying either (X), (X, Y)
   or (X, Y, Z).

@ -2293,7 +2298,7 @@ Flat Scratch

 If the kernel may use flat operations to access scratch memory, the prolog code
 must set up FLAT_SCRATCH register pair (FLAT_SCRATCH_LO/FLAT_SCRATCH_HI which
-are in SGPRn-4/SGPRn-3). Initialization uses Flat Scratch Init and Scratch Wave
+are in SGPRn-4/SGPRn-3). Initialization uses Flat Scratch Init and Scratch Wavefront
 Offset SGPR registers (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`):

 GFX6
@ -2304,7 +2309,7 @@ GFX7-GFX8
     ``SH_HIDDEN_PRIVATE_BASE_VIMID`` to the base of scratch backing memory
     being managed by SPI for the queue executing the kernel dispatch. This is
     the same value used in the Scratch Segment Buffer V# base address. The
-     prolog must add the value of Scratch Wave Offset to get the wave's byte
+     prolog must add the value of Scratch Wavefront Offset to get the wavefront's byte
     scratch backing memory offset from ``SH_HIDDEN_PRIVATE_BASE_VIMID``. Since
     FLAT_SCRATCH_LO is in units of 256 bytes, the offset must be right shifted
     by 8 before moving into FLAT_SCRATCH_LO.
@ -2318,7 +2323,7 @@ GFX7-GFX8
 GFX9
  The Flat Scratch Init is the 64 bit address of the base of scratch backing
  memory being managed by SPI for the queue executing the kernel dispatch. The
-  prolog must add the value of Scratch Wave Offset and moved to the FLAT_SCRATCH
+  prolog must add the value of Scratch Wavefront Offset and moved to the FLAT_SCRATCH
  pair for use as the flat scratch base in flat memory instructions.

 .. _amdgpu-amdhsa-memory-model:
@ -2384,12 +2389,12 @@ For GFX6-GFX9:
  global order and involve no caching. Completion is reported to a wavefront in
  execution order.
 * The LDS memory has multiple request queues shared by the SIMDs of a
-  CU. Therefore, the LDS operations performed by different waves of a work-group
+  CU. Therefore, the LDS operations performed by different wavefronts of a work-group
  can be reordered relative to each other, which can result in reordering the
  visibility of vector memory operations with respect to LDS operations of other
  wavefronts in the same work-group. A ``s_waitcnt lgkmcnt(0)`` is required to
  ensure synchronization between LDS operations and vector memory operations
-  between waves of a work-group, but not between operations performed by the
+  between wavefronts of a work-group, but not between operations performed by the
  same wavefront.
 * The vector memory operations are performed as wavefront wide operations and
  completion is reported to a wavefront in execution order. The exception is
@ -2399,7 +2404,7 @@ For GFX6-GFX9:
 * The vector memory operations access a single vector L1 cache shared by all
  SIMDs a CU. Therefore, no special action is required for coherence between the
  lanes of a single wavefront, or for coherence between wavefronts in the same
-  work-group. A ``buffer_wbinvl1_vol`` is required for coherence between waves
+  work-group. A ``buffer_wbinvl1_vol`` is required for coherence between wavefronts
  executing in different work-groups as they may be executing on different CUs.
 * The scalar memory operations access a scalar L1 cache shared by all wavefronts
  on a group of CUs. The scalar and vector L1 caches are not coherent. However,
@ -2410,7 +2415,7 @@ For GFX6-GFX9:
 * The L2 cache has independent channels to service disjoint ranges of virtual
  addresses.
 * Each CU has a separate request queue per channel. Therefore, the vector and
-  scalar memory operations performed by waves executing in different work-groups
+  scalar memory operations performed by wavefronts executing in different work-groups
  (which may be executing on different CUs) of an agent can be reordered
  relative to each other. A ``s_waitcnt vmcnt(0)`` is required to ensure
  synchronization between vector memory operations of different CUs. It ensures a
@ -2460,7 +2465,7 @@ case the AMDGPU backend ensures the memory location used to spill is never
 accessed by vector memory operations at the same time. If scalar writes are used
 then a ``s_dcache_wb`` is inserted before the ``s_endpgm`` and before a function
 return since the locations may be used for vector memory instructions by a
-future wave that uses the same scratch area, or a function call that creates a
+future wavefront that uses the same scratch area, or a function call that creates a
 frame at the same address, respectively. There is no need for a ``s_dcache_inv``
 as all scalar writes are write-before-read in the same thread.