AMDGPU: Improve documentation.

Summary:
Add links to ISA manuals and ABI.
Add text about assembler syntax.
Add info about instructions operands.
Add instruction examples for each encoding.
Update directives section, add missing .amdgpu_hsa_kernel.

Reviewers: tstellarAMD, SamWot, vpykhtin

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, artem.tamazov, llvm-commits

Differential Revision: https://reviews.llvm.org/D24724

llvm-svn: 281962
Nikolay Haustov 2016-09-20 09:04:51 +00:00
parent 02efef0525
commit 96a56bd0c6
2 changed files with 221 additions and 71 deletions


@ -8,6 +8,8 @@ Introduction
The AMDGPU back-end provides ISA code generation for AMD GPUs, starting with
the R600 family up until the current Volcanic Islands (GCN Gen 3).
Refer to `AMDGPU section in Architecture & Platform Information for Compiler Writers <CompilerWriterInfo.html#amdgpu>`_
for additional documentation.
Conventions
===========
@ -35,96 +37,241 @@ OpenCL standard.
Assembler
=========
The assembler is currently considered experimental.
The AMDGPU backend has an LLVM-MC based assembler, which is currently in development.
It supports the Southern Islands, Sea Islands and Volcanic Islands ISAs.
For syntax examples, look in test/MC/AMDGPU.
This document describes general syntax for instructions and operands. For more
information about instructions, their semantics and supported combinations
of operands, refer to one of the Instruction Set Architecture manuals.
Below are some of the currently supported features (modulo bugs). These
all apply to the Southern Islands ISA. Sea Islands and Volcanic Islands
are also supported but may be missing some instructions and have more bugs:
An instruction has the following syntax (register operands are
normally comma-separated while extra operands are space-separated):
DS Instructions
---------------
All DS instructions are supported.
*<opcode> <register_operand0>, ... <extra_operand0> ...*
FLAT Instructions
------------------
These instructions are only present in the Sea Islands and Volcanic Islands
instruction set. All FLAT instructions are supported for these architectures
MUBUF Instructions
------------------
All non-atomic MUBUF instructions are supported.
Operands
--------
SMRD Instructions
-----------------
Only the s_load_dword* SMRD instructions are supported.
The following syntax for register operands is supported:
SOP1 Instructions
-----------------
All SOP1 instructions are supported.
* SGPR registers: s0, ... or s[0], ...
* VGPR registers: v0, ... or v[0], ...
* TTMP registers: ttmp0, ... or ttmp[0], ...
* Special registers: exec (exec_lo, exec_hi), vcc (vcc_lo, vcc_hi), flat_scratch (flat_scratch_lo, flat_scratch_hi)
* Special trap registers: tba (tba_lo, tba_hi), tma (tma_lo, tma_hi)
* Register pairs, quads, etc: s[2:3], v[10:11], ttmp[5:6], s[4:7], v[12:15], ttmp[4:7], s[8:15], ...
* Register lists: [s0, s1], [ttmp0, ttmp1, ttmp2, ttmp3]
* Register index expressions: v[2*2], s[1-1:2-1]
* 'off' indicates that an operand is not enabled
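The register operand forms listed above can be illustrated with a small Python sketch. It is not part of LLVM: the helper name parse_register and its return shape are hypothetical, and it covers only the s/v/ttmp forms with integer index expressions built from +, - and \*.

```python
import re

# Hypothetical helper, not an LLVM API: turns a register operand string into
# (register file, first index, number of registers).
def _eval_index(expr):
    # Only digits, whitespace and + - * are accepted, so eval() sees nothing
    # but simple integer arithmetic such as 2*2 or 1-1.
    if not re.fullmatch(r"[0-9+\-*\s]+", expr):
        raise ValueError(f"unsupported index expression: {expr!r}")
    return eval(expr, {"__builtins__": {}})

def parse_register(operand):
    # Plain form: s0, v12, ttmp3
    m = re.fullmatch(r"(ttmp|s|v)(\d+)", operand)
    if m:
        return m.group(1), int(m.group(2)), 1
    # Bracketed form: s[0], v[2*2], s[2:3], s[1-1:2-1]
    m = re.fullmatch(r"(ttmp|s|v)\[([^\]:]+)(?::([^\]]+))?\]", operand)
    if not m:
        raise ValueError(f"not a recognized register operand: {operand!r}")
    first = _eval_index(m.group(2))
    last = _eval_index(m.group(3)) if m.group(3) else first
    if last < first:
        raise ValueError(f"empty register range: {operand!r}")
    return m.group(1), first, last - first + 1
```

For instance, parse_register("s[1-1:2-1]") describes the same pair of registers as parse_register("s[0:1]").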
SOP2 Instructions
-----------------
All SOP2 instructions are supported.
The following extra operands are supported:
SOPC Instructions
-----------------
All SOPC instructions are supported.
* offset, offset0, offset1
* idxen, offen bits
* glc, slc, tfe bits
* waitcnt: integer or combination of counter values
* VOP3 modifiers:
SOPP Instructions
-----------------
- abs (\| \|), neg (\-)
Unless otherwise mentioned, all SOPP instructions that have one or more
operands accept integer operands only. No verification is performed
on the operands, so it is up to the programmer to be familiar with the
* DPP modifiers:
- row_shl, row_shr, row_ror, row_rol
- row_mirror, row_half_mirror, row_bcast
- wave_shl, wave_shr, wave_ror, wave_rol, quad_perm
- row_mask, bank_mask, bound_ctrl
* SDWA modifiers:
- dst_sel, src0_sel, src1_sel (BYTE_N, WORD_M, DWORD)
- dst_unused (UNUSED_PAD, UNUSED_SEXT, UNUSED_PRESERVE)
- abs, neg, sext
DS Instruction Examples
-----------------------
.. code-block:: nasm
ds_add_u32 v2, v4 offset:16
ds_write_src2_b64 v2 offset0:4 offset1:8
ds_cmpst_f32 v2, v4, v6
ds_min_rtn_f64 v[8:9], v2, v[4:5]
For the full list of supported instructions, refer to "LDS/GDS instructions" in the ISA manual.
FLAT Instruction Examples
--------------------------
.. code-block:: nasm
flat_load_dword v1, v[3:4]
flat_store_dwordx3 v[3:4], v[5:7]
flat_atomic_swap v1, v[3:4], v5 glc
flat_atomic_cmpswap v1, v[3:4], v[5:6] glc slc
flat_atomic_fmax_x2 v[1:2], v[3:4], v[5:6] glc
For the full list of supported instructions, refer to "FLAT instructions" in the ISA manual.
MUBUF Instruction Examples
---------------------------
.. code-block:: nasm
buffer_load_dword v1, off, s[4:7], s1
buffer_store_dwordx4 v[1:4], v2, ttmp[4:7], s1 offen offset:4 glc tfe
buffer_store_format_xy v[1:2], off, s[4:7], s1
buffer_wbinvl1
buffer_atomic_inc v1, v2, s[8:11], s4 idxen offset:4 slc
For the full list of supported instructions, refer to "MUBUF Instructions" in the ISA manual.
SMRD/SMEM Instruction Examples
-------------------------------
.. code-block:: nasm
s_load_dword s1, s[2:3], 0xfc
s_load_dwordx8 s[8:15], s[2:3], s4
s_load_dwordx16 s[88:103], s[2:3], s4
s_dcache_inv_vol
s_memtime s[4:5]
For the full list of supported instructions, refer to "Scalar Memory Operations" in the ISA manual.
SOP1 Instruction Examples
--------------------------
.. code-block:: nasm
s_mov_b32 s1, s2
s_mov_b64 s[0:1], 0x80000000
s_cmov_b32 s1, 200
s_wqm_b64 s[2:3], s[4:5]
s_bcnt0_i32_b64 s1, s[2:3]
s_swappc_b64 s[2:3], s[4:5]
s_cbranch_join s[4:5]
For the full list of supported instructions, refer to "SOP1 Instructions" in the ISA manual.
SOP2 Instruction Examples
-------------------------
.. code-block:: nasm
s_add_u32 s1, s2, s3
s_and_b64 s[2:3], s[4:5], s[6:7]
s_cselect_b32 s1, s2, s3
s_andn2_b32 s2, s4, s6
s_lshr_b64 s[2:3], s[4:5], s6
s_ashr_i32 s2, s4, s6
s_bfm_b64 s[2:3], s4, s6
s_bfe_i64 s[2:3], s[4:5], s6
s_cbranch_g_fork s[4:5], s[6:7]
For the full list of supported instructions, refer to "SOP2 Instructions" in the ISA manual.
SOPC Instruction Examples
--------------------------
.. code-block:: nasm
s_cmp_eq_i32 s1, s2
s_bitcmp1_b32 s1, s2
s_bitcmp0_b64 s[2:3], s4
s_setvskip s3, s5
For the full list of supported instructions, refer to "SOPC Instructions" in the ISA manual.
SOPP Instruction Examples
--------------------------
.. code-block:: nasm
s_barrier
s_nop 2
s_endpgm
s_waitcnt 0 ; Wait for all counters to be 0
s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0) ; Equivalent to above
s_waitcnt vmcnt(1) ; Wait for vmcnt counter to be 1.
s_sethalt 9
s_sleep 10
s_sendmsg 0x1
s_sendmsg sendmsg(MSG_INTERRUPT)
s_trap 1
For the full list of supported instructions, refer to "SOPP Instructions" in the ISA manual.
Unless otherwise mentioned, little verification is performed on the operands
of SOPP instructions, so it is up to the programmer to be familiar with the
range of acceptable values.
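Because operand ranges are largely left to the programmer, it can help to work out an s_waitcnt immediate by hand before writing it as an integer. The Python sketch below is only an illustration: the function name encode_waitcnt is hypothetical, and the field layout (vmcnt in bits 3:0, expcnt in bits 6:4, lgkmcnt in bits 11:8) is an assumption that should be checked against the ISA manual for the target in use. Counters that are not named are left at their maximum value, meaning no wait on that counter.

```python
# Assumed SIMM16 layout for s_waitcnt: name -> (bit shift, field width).
# This is an assumption for illustration; verify it against the ISA manual.
FIELDS = {"vmcnt": (0, 4), "expcnt": (4, 3), "lgkmcnt": (8, 4)}

def encode_waitcnt(**counters):
    """Hypothetical helper: pack named counter values into one integer."""
    value = 0
    for name, (shift, width) in FIELDS.items():
        maximum = (1 << width) - 1
        # A counter that is not mentioned keeps its maximum (do not wait).
        count = counters.pop(name, maximum)
        if not 0 <= count <= maximum:
            raise ValueError(f"{name}({count}) is outside 0..{maximum}")
        value |= count << shift
    if counters:
        raise ValueError(f"unknown counter(s): {sorted(counters)}")
    return value
```

Under these assumptions, naming all three counters with value 0 gives the integer 0, matching the equivalence between "s_waitcnt 0" and "s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0)" shown in the examples.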
s_waitcnt
^^^^^^^^^
Vector ALU Instruction Examples
-------------------------------
s_waitcnt accepts named arguments to specify which memory counter(s) to
wait for.
For vector ALU instruction opcodes (VOP1, VOP2, VOP3, VOPC, VOP_DPP, VOP_SDWA),
the assembler automatically uses the optimal encoding based on the operands.
To force a specific encoding, add a suffix to the opcode of the instruction:
* _e32 for 32-bit VOP1/VOP2/VOPC
* _e64 for 64-bit VOP3
* _dpp for VOP_DPP
* _sdwa for VOP_SDWA
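The suffix convention above can be summarized with a short Python sketch. It is not assembler code: split_encoding_suffix is a hypothetical helper that only inspects the opcode text and reports which encoding, if any, the suffix requests.

```python
# Suffixes listed above, mapped to the encoding each one forces.
SUFFIXES = {
    "_e32": "VOP1/VOP2/VOPC (32-bit)",
    "_e64": "VOP3 (64-bit)",
    "_dpp": "VOP_DPP",
    "_sdwa": "VOP_SDWA",
}

def split_encoding_suffix(opcode):
    """Hypothetical helper: return (base opcode, forced encoding or None)."""
    for suffix, encoding in SUFFIXES.items():
        if opcode.endswith(suffix):
            return opcode[: -len(suffix)], encoding
    # No suffix: the assembler is free to choose the encoding itself.
    return opcode, None
```

For example, split_encoding_suffix("v_mov_b32_e32") yields the base opcode v_mov_b32 together with the 32-bit encoding, while a bare v_mov_b32 yields None for the encoding.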
VOP1/VOP2/VOP3/VOPC examples:
.. code-block:: nasm
; Wait for all counters to be 0
s_waitcnt 0
v_mov_b32 v1, v2
v_mov_b32_e32 v1, v2
v_nop
v_cvt_f64_i32_e32 v[1:2], v2
v_floor_f32_e32 v1, v2
v_bfrev_b32_e32 v1, v2
v_add_f32_e32 v1, v2, v3
v_mul_i32_i24_e64 v1, v2, 3
v_mul_i32_i24_e32 v1, -3, v3
v_mul_i32_i24_e32 v1, -100, v3
v_addc_u32 v1, s[0:1], v2, v3, s[2:3]
v_max_f16_e32 v1, v2, v3
; Equivalent to s_waitcnt 0. Counter names can also be delimited by
; '&' or ','.
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; Wait for vmcnt counter to be 1.
s_waitcnt vmcnt(1)
VOP1, VOP2, VOP3, VOPC Instructions
-----------------------------------
All 32-bit and 64-bit encodings should work.
The assembler will automatically detect which encoding size to use for
VOP1, VOP2, and VOPC instructions based on the operands. If you want to force
a specific encoding size, you can add an _e32 (for 32-bit encoding) or
_e64 (for 64-bit encoding) suffix to the instruction. Most, but not all
instructions support an explicit suffix. These are all valid assembly
strings:
VOP_DPP examples:
.. code-block:: nasm
v_mul_i32_i24 v1, v2, v3
v_mul_i32_i24_e32 v1, v2, v3
v_mul_i32_i24_e64 v1, v2, v3
v_mov_b32 v0, v0 quad_perm:[0,2,1,1]
v_sin_f32 v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
v_mov_b32 v0, v0 wave_shl:1
v_mov_b32 v0, v0 row_mirror
v_mov_b32 v0, v0 row_bcast:31
v_mov_b32 v0, v0 quad_perm:[1,3,0,1] row_mask:0xa bank_mask:0x1 bound_ctrl:0
v_add_f32 v0, v0, |v0| row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
v_max_f16 v1, v2, v3 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
Assembler Directives
--------------------
VOP_SDWA examples:
.. code-block:: nasm
v_mov_b32 v1, v2 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:DWORD
v_min_u32 v200, v200, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
v_sin_f32 v0, v0 dst_unused:UNUSED_PAD src0_sel:WORD_1
v_fract_f32 v0, |v0| dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
v_cmpx_le_u32 vcc, v1, v2 src0_sel:BYTE_2 src1_sel:WORD_0
For the full list of supported instructions, refer to "Vector ALU instructions" in the ISA manual.
HSA Code Object Directives
--------------------------
The AMDGPU ABI defines auxiliary data in the output code object. In assembly
source, this data can be specified with assembler directives.
.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler. This value will be stored
in an entry of the .note section.
object that will be generated by the assembler.
.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -135,12 +282,14 @@ set architecture (ISA) version of the assembly program.
*vendor* and *arch* are quoted strings. *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".
If no arguments are specified, then the assembler will derive the ISA version,
*vendor*, and *arch* from the value of the -mcpu option that is passed to the
assembler.
By default, the assembler will derive the ISA version, *vendor*, and *arch*
from the value of the -mcpu option that is passed to the assembler.
ISA version, *vendor*, and *arch* will all be stored in a single entry of the
.note section.
.amdgpu_hsa_kernel (name)
^^^^^^^^^^^^^^^^^^^^^^^^^
This directive specifies that the symbol with the given name is a kernel entry point
(label) and that the object should contain a corresponding symbol of type STT_AMDGPU_HSA_KERNEL.
.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^
@ -165,9 +314,8 @@ used. The default value for all keys is 0, with the following exceptions:
The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.
For a full list of amd_kernel_code_t keys, see the examples in
test/CodeGen/AMDGPU/hsa.s. For an explanation of the meanings of the different
keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h
For a full list of amd_kernel_code_t keys, refer to the AMDGPU ABI document,
the comments in lib/Target/AMDGPU/AmdKernelCodeT.h, and test/CodeGen/AMDGPU/hsa.s.
Here is an example of a minimal amd_kernel_code_t specification:


@ -78,8 +78,10 @@ AMDGPU
* `AMD Cayman/Trinity shader ISA <http://developer.amd.com/wordpress/media/2012/10/AMD_HD_6900_Series_Instruction_Set_Architecture.pdf>`_
* `AMD Southern Islands Series ISA <http://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf>`_
* `AMD Sea Islands Series ISA <http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture.pdf>`_
* `AMD GCN3 Instruction Set Architecture <http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf>`__
* `AMD GPU Programming Guide <http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf>`_
* `AMD Compute Resources <http://developer.amd.com/tools/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/documentation/>`_
* `AMDGPU Compute Application Binary Interface <https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc/blob/master/AMDGPU-ABI.md>`__
SPARC
-----