========
TableGen
========

.. contents::
   :local:

.. toctree::
   :hidden:

   BackEnds
   LangRef
   LangIntro
   Deficiencies

Introduction
============

TableGen's purpose is to help a human develop and maintain records of
domain-specific information. Because there may be a large number of these
records, it is specifically designed to allow writing flexible descriptions and
for common features of these records to be factored out. This reduces the
amount of duplication in the description, reduces the chance of error, and makes
it easier to structure domain-specific information.

The core part of TableGen parses a file, instantiates the declarations, and
hands the result off to a domain-specific `backend`_ for processing.

The current major users of TableGen are :doc:`../CodeGenerator`
and the
`Clang diagnostics and attributes <http://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.

If you work on TableGen often and use emacs or vim, you can find
an emacs "TableGen mode" and a vim language file in the ``llvm/utils/emacs`` and
``llvm/utils/vim`` directories of your LLVM distribution, respectively.

.. _intro:


The TableGen program
====================

TableGen files are interpreted by the TableGen program, ``llvm-tblgen``, which is
available in the ``bin`` directory of your build tree. It is not installed in the
system (or where your sysroot is set to), since it has no use beyond LLVM's
build process.

Running TableGen
----------------

TableGen runs just like any other LLVM tool. The first (optional) argument
specifies the file to read. If a filename is not specified, ``llvm-tblgen``
reads from standard input.

To be useful, one of the `backends`_ must be used. These backends are
selectable on the command line (type '``llvm-tblgen -help``' for a list).
For example, to get a list of all of the definitions that subclass a particular type
(which can be useful for building up an enum list of these records), use the
``-print-enums`` option:

.. code-block:: bash

  $ llvm-tblgen X86.td -print-enums -class=Register
  AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
  ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
  MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
  R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
  R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
  RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
  XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
  XMM6, XMM7, XMM8, XMM9,

  $ llvm-tblgen X86.td -print-enums -class=Instruction
  ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
  ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
  ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
  ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
  ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...

The default backend prints out all of the records. There is also a general
backend which outputs all the records as a JSON data structure, enabled using
the ``-dump-json`` option.

If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
appropriate way. You can do this by extending TableGen itself in C++, or by
writing a script in any language that can consume the JSON output.

Example
-------

With no other arguments, ``llvm-tblgen`` parses the specified file and prints out all
of the classes, then all of the definitions.
This is a good way to see what the
various definitions expand to fully. Running this on the ``X86.td`` file prints
this (at the time of this writing):

.. code-block:: text

  ...
  def ADD32rr { // Instruction X86Inst I
    string Namespace = "X86";
    dag OutOperandList = (outs GR32:$dst);
    dag InOperandList = (ins GR32:$src1, GR32:$src2);
    string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
    list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
    list<Register> Uses = [];
    list<Register> Defs = [EFLAGS];
    list<Predicate> Predicates = [];
    int CodeSize = 3;
    int AddedComplexity = 0;
    bit isReturn = 0;
    bit isBranch = 0;
    bit isIndirectBranch = 0;
    bit isBarrier = 0;
    bit isCall = 0;
    bit canFoldAsLoad = 0;
    bit mayLoad = 0;
    bit mayStore = 0;
    bit isImplicitDef = 0;
    bit isConvertibleToThreeAddress = 1;
    bit isCommutable = 1;
    bit isTerminator = 0;
    bit isReMaterializable = 0;
    bit isPredicable = 0;
    bit hasDelaySlot = 0;
    bit usesCustomInserter = 0;
    bit hasCtrlDep = 0;
    bit isNotDuplicable = 0;
    bit hasSideEffects = 0;
    InstrItinClass Itinerary = NoItinerary;
    string Constraints = "";
    string DisableEncoding = "";
    bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
    Format Form = MRMDestReg;
    bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
    ImmType ImmT = NoImm;
    bits<3> ImmTypeBits = { 0, 0, 0 };
    bit hasOpSizePrefix = 0;
    bit hasAdSizePrefix = 0;
    bits<4> Prefix = { 0, 0, 0, 0 };
    bit hasREX_WPrefix = 0;
    FPFormat FPForm = ?;
    bits<3> FPFormBits = { 0, 0, 0 };
  }
  ...

This definition corresponds to the 32-bit register-register ``add`` instruction
of the x86 architecture. ``def ADD32rr`` defines a record named
``ADD32rr``, and the comment at the end of the line indicates the superclasses
of the definition.
The body of the record contains all of the data that
TableGen assembled for the record, indicating that the instruction is part of
the "X86" namespace, the pattern indicating how the instruction is selected by
the code generator, that it is a two-address instruction, has a particular
encoding, etc. The contents and semantics of the information in the record are
specific to the needs of the X86 backend, and are only shown as an example.

As you can see, a lot of information is needed for every instruction supported
by the code generator, and specifying it all manually would be unmaintainable,
prone to bugs, and tiring to do in the first place. Because we are using
TableGen, all of the information was derived from the following definition:

.. code-block:: text

  let Defs = [EFLAGS],
      isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
      isConvertibleToThreeAddress = 1 in // Can transform into LEA.
  def ADD32rr : I<0x01, MRMDestReg, (outs GR32:$dst),
                  (ins GR32:$src1, GR32:$src2),
                  "add{l}\t{$src2, $dst|$dst, $src2}",
                  [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

This definition makes use of the custom class ``I`` (extended from the custom
class ``X86Inst``), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share. A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.

Syntax
======

TableGen has a syntax that is loosely based on C++ templates, with built-in
types and specification. In addition, TableGen's syntax introduces some
automation concepts like multiclass, foreach, let, etc.

Basic concepts
--------------

TableGen files consist of two key parts: 'classes' and 'definitions', both of
which are considered 'records'.

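
As a minimal sketch of these two kinds of records, consider the following
file. All names in it (``Animal``, ``Sound``, ``Legs``, ``Dog``, ``Bird``) are
hypothetical and chosen only for illustration; they do not come from any LLVM
target description.

.. code-block:: text

  // Hypothetical example; none of these names exist in LLVM.
  // A class is an abstract record, parameterised by template arguments.
  class Animal<string sound, int legs> {
    string Sound = sound;
    int Legs = legs;
  }

  // Definitions are concrete records built from the class.
  def Dog  : Animal<"woof", 4>;
  def Bird : Animal<"tweet", 2>;

Running ``llvm-tblgen`` on such a file with no other options should print the
``Animal`` class followed by the ``Dog`` and ``Bird`` definitions, each with its
``Sound`` and ``Legs`` values filled in.
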

**TableGen records** have a unique name, a list of values, and a list of
superclasses. The list of values is the main data that TableGen builds for each
record; it is this that holds the domain-specific information for the
application. The interpretation of this data is left to a specific `backend`_,
but the structure and format rules are taken care of and are fixed by
TableGen.

**TableGen definitions** are the concrete form of 'records'. These generally do
not have any undefined values, and are marked with the '``def``' keyword.

.. code-block:: text

  def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
                                        "Enable ARMv8 FP">;

In this example, ``FeatureFPARMv8`` is a ``SubtargetFeature`` record initialised
with some values. Classes such as ``SubtargetFeature`` are defined with the
``class`` keyword, either in the same file or in an included one. Most target
TableGen files include the generic ones in ``include/llvm/Target``.

**TableGen classes** are abstract records that are used to build and describe
other records. These classes allow the end-user to build abstractions for
either the domain they are targeting (such as "Register", "RegisterClass", and
"Instruction" in the LLVM code generator) or for the implementor to help factor
out common properties of records (such as "FPInst", which is used to represent
floating point instructions in the X86 backend). TableGen keeps track of all of
the classes that are used to build up a definition, so the backend can find all
definitions of a particular class, such as "Instruction".

.. code-block:: text

  class ProcNoItin<string Name, list<SubtargetFeature> Features>
        : Processor<Name, NoItineraries, Features>;

Here, the class ``ProcNoItin``, which takes a parameter ``Name`` of type
``string`` and a list of target features, specializes the class ``Processor`` by
passing the arguments down and hard-coding ``NoItineraries``.

**TableGen multiclasses** are groups of abstract records that are instantiated
all at once. Each instantiation can result in multiple TableGen definitions.
If a multiclass inherits from another multiclass, the definitions in the
sub-multiclass become part of the current multiclass, as if they were declared
in the current multiclass.

.. code-block:: text

  multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
                            dag address, ValueType sty> {
    def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;

    def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;
  }

  defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
                        !foreach(decls.pattern, address,
                                 !subst(SHIFT, imm_eq0, decls.pattern)),
                        i8>;

See the :doc:`TableGen Language Introduction <LangIntro>` for more generic
information on the usage of the language, and the
:doc:`TableGen Language Reference <LangRef>` for a more in-depth description
of the formal language specification.

.. _backend:
.. _backends:

TableGen backends
=================

TableGen files have no real meaning without a back-end. The default operation
of running ``llvm-tblgen`` is to print the information in a textual format, but
that's only useful for debugging of the TableGen files themselves.
The real power of TableGen, however, lies in interpreting the source files into
an internal representation from which a back-end can generate anything you want.

TableGen is currently used to create huge include files with tables that you
can either include directly (if the output is in the language you're coding in)
or use in pre-processing via macros surrounding the include of the file.

Direct output can be used if the back-end already prints a table in C format
or if the output is just a list of strings (for error and warning messages).
Pre-processed output should be used if the same information needs to be used
in different contexts (like Instruction names), so your back-end should print
a meta-information list that can be shaped into different compile-time formats.

See the `TableGen BackEnds <BackEnds.html>`_ document for more information.

TableGen Deficiencies
=====================

Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times. The common theme is that, while TableGen allows
you to build Domain-Specific-Languages, the final languages that you create
lack the power of other DSLs, which in turn considerably increases the size
and complexity of TableGen files.

At the same time, TableGen allows you to give virtually any meaning to
the basic concepts via custom-made back-ends, which can pervert the original
design and make it very hard for newcomers to understand the evil TableGen
file.

Some are in favour of extending the semantics even more, while making sure
back-ends adhere to strict rules. Others suggest we should move to fewer,
more powerful DSLs designed with specific purposes, or even re-use existing
DSLs.

Either way, this is a discussion that will likely span several years,
if not decades. You can read more in the `TableGen Deficiencies <Deficiencies.html>`_
document.