==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, from the R600
family up to the current Volcanic Islands (GCN Gen 3).


Assembler
=========

The assembler is currently considered experimental.

For syntax examples, look in test/MC/AMDGPU.

Below are some of the currently supported features (modulo bugs). These all
apply to the Southern Islands ISA. Sea Islands and Volcanic Islands are also
supported, but may be missing some instructions and have more bugs:

DS Instructions
---------------
All DS instructions are supported.

FLAT Instructions
-----------------
These instructions are only present in the Sea Islands and Volcanic Islands
instruction sets. All FLAT instructions are supported for these architectures.

MUBUF Instructions
------------------
All non-atomic MUBUF instructions are supported.

SMRD Instructions
-----------------
Only the s_load_dword* SMRD instructions are supported.

SOP1 Instructions
-----------------
All SOP1 instructions are supported.

SOP2 Instructions
-----------------
All SOP2 instructions are supported.

SOPC Instructions
-----------------
All SOPC instructions are supported.

SOPP Instructions
-----------------

Unless otherwise mentioned, all SOPP instructions that have one or more
operands accept integer operands only. No verification is performed
on the operands, so it is up to the programmer to be familiar with the
range of acceptable values.

s_waitcnt
^^^^^^^^^

s_waitcnt accepts named arguments to specify which memory counter(s) to
wait for.

.. code-block:: nasm

  // Wait for all counters to be 0
  s_waitcnt 0

  // Equivalent to s_waitcnt 0. Counter names can also be delimited by
  // '&' or ','.
  s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

  // Wait for vmcnt counter to be 1.
  s_waitcnt vmcnt(1)

VOP1, VOP2, VOP3, VOPC Instructions
-----------------------------------

All 32-bit and 64-bit encodings should work.

The assembler will automatically detect which encoding size to use for
VOP1, VOP2, and VOPC instructions based on the operands. If you want to force
a specific encoding size, you can add an _e32 (for 32-bit encoding) or
_e64 (for 64-bit encoding) suffix to the instruction. Most, but not all,
instructions support an explicit suffix. These are all valid assembly
strings:

.. code-block:: nasm

  v_mul_i32_i24 v1, v2, v3
  v_mul_i32_i24_e32 v1, v2, v3
  v_mul_i32_i24_e64 v1, v2, v3

Assembler Directives
--------------------

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler. This value will be stored
in an entry of the .note section.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

*vendor* and *arch* are quoted strings. *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

If no arguments are specified, then the assembler will derive the ISA version,
*vendor*, and *arch* from the value of the -mcpu option that is passed to the
assembler.

ISA version, *vendor*, and *arch* will all be stored in a single entry of the
.note section.

.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key / value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive. For
any amd_kernel_code_t values that are unspecified, a default value will be
used. The default value for all keys is 0, with the following exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4. Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2^ **n**.

The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, see the examples in
test/CodeGen/AMDGPU/hsa.s. For an explanation of the meanings of the different
keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: nasm

  .hsa_code_object_version 1,0
  .hsa_code_object_isa

  .text

  hello_world:

    .amd_kernel_code_t
       enable_sgpr_kernarg_segment_ptr = 1
       is_ptr64 = 1
       compute_pgm_rsrc1_vgprs = 0
       compute_pgm_rsrc1_sgprs = 0
       compute_pgm_rsrc2_user_sgpr = 2
       kernarg_segment_byte_size = 8
       wavefront_sgpr_count = 2
       workitem_vgpr_count = 3
    .end_amd_kernel_code_t

    s_load_dwordx2 s[0:1], s[0:1] 0x0
    v_mov_b32 v0, 3.14159
    s_waitcnt lgkmcnt(0)
    v_mov_b32 v1, s0
    v_mov_b32 v2, s1
    flat_store_dword v0, v[1:2]
    s_endpgm
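
The default rules for amd_kernel_code_t keys described above can be sketched
in a few lines of Python. This is only an illustration of the documented
behaviour, not the assembler's implementation: the names
``DOCUMENTED_DEFAULTS``, ``effective_value``, and ``alignment_from_exponent``
are hypothetical, and the *machine_version_* keys are left out because their
values are derived from the -mcpu option.

.. code-block:: python

  # Hypothetical helper names; only the default values come from this guide.
  # Keys derived from -mcpu (machine_version_*) are intentionally omitted.
  DOCUMENTED_DEFAULTS = {
      "kernel_code_version_major": 1,
      "machine_kind": 1,
      "kernel_code_entry_byte_offset": 256,
      "wavefront_size": 6,
      "kernarg_segment_alignment": 4,
      "group_segment_alignment": 4,
      "private_segment_alignment": 4,
  }


  def effective_value(key, specified):
      """Return the specified value, else the documented default, else 0."""
      if key in specified:
          return specified[key]
      return DOCUMENTED_DEFAULTS.get(key, 0)


  def alignment_from_exponent(n):
      """Alignment keys store n; the alignment they describe is 2**n."""
      return 1 << n


  # Keys taken from the minimal example in this section.
  specified = {"is_ptr64": 1, "kernarg_segment_byte_size": 8}

  exponent = effective_value("kernarg_segment_alignment", specified)
  print(exponent, alignment_from_exponent(exponent))

With the two keys above specified, *kernarg_segment_alignment* falls back to
its default of 4, which corresponds to an alignment of 16.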