========
TableGen
========

.. contents::
   :local:

.. toctree::
   :hidden:

   BackEnds
   LangRef
   LangIntro
   Deficiencies

Introduction
============

TableGen's purpose is to help a human develop and maintain records of
domain-specific information.  Because there may be a large number of these
records, TableGen is specifically designed to allow flexible descriptions in
which the common features of these records can be factored out.  This reduces
the amount of duplication in the description, reduces the chance of error, and
makes it easier to structure domain-specific information.

The core part of TableGen parses a file, instantiates the declarations, and
hands the result off to a domain-specific `backend`_ for processing.

The current major users of TableGen are :doc:`../CodeGenerator` and the
`Clang diagnostics and attributes <http://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.

If you work on TableGen a lot and use Emacs or Vim, note that you can find an
Emacs "TableGen mode" and a Vim language file in the ``llvm/utils/emacs`` and
``llvm/utils/vim`` directories of your LLVM distribution, respectively.

.. _intro:

The TableGen program
====================

TableGen files are interpreted by the TableGen program, ``llvm-tblgen``, which
is available in the ``bin`` subdirectory of your build directory.  It is not
installed on the system (or wherever your sysroot points), since it has no use
beyond LLVM's build process.

Running TableGen
----------------

TableGen runs just like any other LLVM tool.  The first (optional) argument
specifies the file to read.  If a filename is not specified, ``llvm-tblgen``
reads from standard input.

To be useful, one of the `backends`_ must be used.  These backends are
selectable on the command line (run ``llvm-tblgen -help`` for a list).  For
example, to get a list of all of the definitions that subclass a particular
class (which can be useful for building up an enum list of these records), use
the ``-print-enums`` option:

.. code-block:: bash

  $ llvm-tblgen X86.td -print-enums -class=Register
  AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
  ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
  MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
  R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
  R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
  RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
  XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
  XMM6, XMM7, XMM8, XMM9,

  $ llvm-tblgen X86.td -print-enums -class=Instruction
  ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
  ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
  ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
  ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
  ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...

The default backend prints out all of the records.  There is also a general
backend, enabled with the ``-dump-json`` option, which outputs all of the
records as a JSON data structure.

If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
appropriate way.  You can do this by extending TableGen itself in C++, or by
writing a script in any language that can consume the JSON output.
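
As a minimal sketch of the script route, the Python below works on records in
the shape that ``-dump-json`` produces, assuming the layout described for the
JSON backend: one top-level object per definition, keyed by its name, plus a
``"!instanceof"`` object mapping each class name to the names of the
definitions derived from it.  The ``SAMPLE`` data is a hand-written, heavily
trimmed stand-in rather than real ``X86.td`` output, and the helper names
``defs_of_class`` and ``field`` are hypothetical.

.. code-block:: python

  import json

  # Hand-written stand-in for `llvm-tblgen -dump-json` output.  Real output has
  # many more fields per definition and more bookkeeping keys; only the parts
  # used below are reproduced here.
  SAMPLE = """
  {
    "!instanceof": {
      "Register": ["EAX", "EBX", "AX"],
      "Instruction": ["ADD32rr"]
    },
    "AX": {"!name": "AX", "!superclasses": ["Register"]},
    "EAX": {"!name": "EAX", "!superclasses": ["Register"]},
    "EBX": {"!name": "EBX", "!superclasses": ["Register"]},
    "ADD32rr": {
      "!name": "ADD32rr",
      "!superclasses": ["Instruction", "X86Inst", "I"],
      "isCommutable": 1
    }
  }
  """


  def defs_of_class(records, cls):
      """Return the sorted names of all definitions derived from `cls`,
      similar in spirit to `-print-enums -class=<cls>`."""
      return sorted(records.get("!instanceof", {}).get(cls, []))


  def field(records, def_name, field_name):
      """Return the value of one field of the definition `def_name`."""
      return records[def_name][field_name]


  records = json.loads(SAMPLE)
  print(", ".join(defs_of_class(records, "Register")))  # AX, EAX, EBX
  print(field(records, "ADD32rr", "isCommutable"))      # 1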

Example
-------

With no other arguments, ``llvm-tblgen`` parses the specified file and prints
out all of the classes, then all of the definitions.  This is a good way to see
what the various definitions fully expand to.  Running this on the ``X86.td``
file prints this (at the time of this writing):

.. code-block:: text

  ...
  def ADD32rr {   // Instruction X86Inst I
    string Namespace = "X86";
    dag OutOperandList = (outs GR32:$dst);
    dag InOperandList = (ins GR32:$src1, GR32:$src2);
    string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
    list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
    list<Register> Uses = [];
    list<Register> Defs = [EFLAGS];
    list<Predicate> Predicates = [];
    int CodeSize = 3;
    int AddedComplexity = 0;
    bit isReturn = 0;
    bit isBranch = 0;
    bit isIndirectBranch = 0;
    bit isBarrier = 0;
    bit isCall = 0;
    bit canFoldAsLoad = 0;
    bit mayLoad = 0;
    bit mayStore = 0;
    bit isImplicitDef = 0;
    bit isConvertibleToThreeAddress = 1;
    bit isCommutable = 1;
    bit isTerminator = 0;
    bit isReMaterializable = 0;
    bit isPredicable = 0;
    bit hasDelaySlot = 0;
    bit usesCustomInserter = 0;
    bit hasCtrlDep = 0;
    bit isNotDuplicable = 0;
    bit hasSideEffects = 0;
    InstrItinClass Itinerary = NoItinerary;
    string Constraints = "";
    string DisableEncoding = "";
    bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
    Format Form = MRMDestReg;
    bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
    ImmType ImmT = NoImm;
    bits<3> ImmTypeBits = { 0, 0, 0 };
    bit hasOpSizePrefix = 0;
    bit hasAdSizePrefix = 0;
    bits<4> Prefix = { 0, 0, 0, 0 };
    bit hasREX_WPrefix = 0;
    FPFormat FPForm = ?;
    bits<3> FPFormBits = { 0, 0, 0 };
  }
  ...

This definition corresponds to the 32-bit register-register ``add`` instruction
of the x86 architecture.  ``def ADD32rr`` defines a record named
``ADD32rr``, and the comment at the end of the line indicates the superclasses
of the definition.  The body of the record contains all of the data that
TableGen assembled for the record: that the instruction is part of the "X86"
namespace, the pattern indicating how the code generator selects the
instruction, that it is a two-address instruction, that it has a particular
encoding, and so on.  The contents and semantics of the information in the
record are specific to the needs of the X86 backend, and are only shown as an
example.

As you can see, a lot of information is needed for every instruction supported
by the code generator, and specifying it all manually would be unmaintainable,
prone to bugs, and tiring to do in the first place.  Because we are using
TableGen, all of the information was derived from the following definition:

.. code-block:: text

  let Defs = [EFLAGS],
      isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
      isConvertibleToThreeAddress = 1 in // Can transform into LEA.
  def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst),
                                     (ins GR32:$src1, GR32:$src2),
                   "add{l}\t{$src2, $dst|$dst, $src2}",
                   [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

This definition makes use of the custom class ``I`` (extended from the custom
class ``X86Inst``), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share.  A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.
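
How the short definition expands into the full record can be illustrated with
a toy Python model; it is not how TableGen is implemented.  Each class is
modeled as a hypothetical function that calls its parent, records its own name
as a superclass, and fills in fields; the enclosing ``let ... in`` overrides are
applied once all inherited fields are in place.  The class bodies and the small
field subset are simplified stand-ins for the real ``X86Inst`` and ``I``
classes.

.. code-block:: python

  def x86_inst(rec, opcode, form, outs, ins, asm):
      """Simplified, hypothetical stand-in for `class X86Inst : Instruction`."""
      rec["superclasses"] += ["Instruction", "X86Inst"]
      rec["fields"].update({
          "Namespace": "X86",
          "Opcode": opcode,
          "Form": form,
          "OutOperandList": outs,
          "InOperandList": ins,
          "AsmString": asm,
          # Defaults that individual definitions may override with `let`.
          "Defs": [],
          "isCommutable": 0,
          "isConvertibleToThreeAddress": 0,
      })


  def i_class(rec, opcode, form, outs, ins, asm, pattern):
      """Hypothetical stand-in for `class I : X86Inst`: adds a pattern."""
      x86_inst(rec, opcode, form, outs, ins, asm)
      rec["superclasses"].append("I")
      rec["fields"]["Pattern"] = pattern


  def define(name, cls, args, lets):
      """Build a record from its class, then apply `let ... in` overrides."""
      rec = {"name": name, "superclasses": [], "fields": {}}
      cls(rec, *args)
      for key, value in lets.items():
          # `let` may only override a field that already exists; an unknown
          # field is an error here, as it is in TableGen.
          if key not in rec["fields"]:
              raise KeyError(f"let of unknown field {key!r}")
          rec["fields"][key] = value
      return rec


  add32rr = define(
      "ADD32rr", i_class,
      (0x01, "MRMDestReg", "(outs GR32:$dst)", "(ins GR32:$src1, GR32:$src2)",
       "add{l}\t{$src2, $dst|$dst, $src2}",
       ["(set GR32:$dst, (add GR32:$src1, GR32:$src2))"]),
      {"Defs": ["EFLAGS"], "isCommutable": 1, "isConvertibleToThreeAddress": 1},
  )
  print(add32rr["superclasses"])    # ['Instruction', 'X86Inst', 'I']
  print(add32rr["fields"]["Defs"])  # ['EFLAGS']

The superclass list built this way matches the ``// Instruction X86Inst I``
comment in the printed record, which is why the model records the parent's
name before its own.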

Syntax
======

TableGen's syntax is loosely based on C++ templates, with its own set of
built-in types.  In addition, TableGen's syntax introduces some automation
concepts such as ``multiclass``, ``foreach``, and ``let``.

Basic concepts
--------------

TableGen files consist of two key parts: 'classes' and 'definitions', both of
which are considered 'records'.

**TableGen records** have a unique name, a list of values, and a list of
superclasses.  The list of values is the main data that TableGen builds for each
record; it is this that holds the domain-specific information for the
application.  The interpretation of this data is left to a specific `backend`_,
but the structure and format rules are fixed by TableGen.

**TableGen definitions** are the concrete form of 'records'.  These generally do
not have any undefined values, and are marked with the '``def``' keyword.

.. code-block:: text

  def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
                                        "Enable ARMv8 FP">;

In this example, ``FeatureFPARMv8`` is a ``SubtargetFeature`` record initialised
with some values.  Classes are defined with the ``class`` keyword, either in the
same file or in an included one.  Most target TableGen files include the generic
ones in ``include/llvm/Target``.

**TableGen classes** are abstract records that are used to build and describe
other records.  These classes allow the end-user to build abstractions either
for the domain they are targeting (such as "Register", "RegisterClass", and
"Instruction" in the LLVM code generator) or to help the implementor factor out
common properties of records (such as "FPInst", which is used to represent
floating-point instructions in the X86 backend).  TableGen keeps track of all of
the classes that are used to build up a definition, so the backend can find all
definitions of a particular class, such as "Instruction".

.. code-block:: text

  class ProcNoItin<string Name, list<SubtargetFeature> Features>
        : Processor<Name, NoItineraries, Features>;

Here, the class ``ProcNoItin``, which takes a parameter ``Name`` of type
``string`` and a list of target features, specializes the class ``Processor``
by passing those arguments down while hard-coding ``NoItineraries``.

**TableGen multiclasses** are groups of abstract records that are instantiated
all at once (with ``defm``).  Each instantiation can result in multiple
TableGen definitions.  If a multiclass inherits from another multiclass, the
definitions in the inherited multiclass become part of the current multiclass,
as if they were declared in the current multiclass.

.. code-block:: text

  multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
                            dag address, ValueType sty> {
    def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;

    def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;
  }

  // Rm, Base, Offset, Extend and address are assumed to be bound by the
  // enclosing context (for example, an outer multiclass).  !foreach applies
  // the !subst to each argument of the `address` dag.
  defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
                        !foreach(pattern, address,
                                 !subst(SHIFT, imm_eq0, pattern)),
                        i8>;
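
The instantiation rules described above can be sketched with another toy
Python model (again, not TableGen's implementation).  It assumes two
hypothetical multiclasses, ``BinOpRR`` and ``BinOp``, where ``BinOp`` inherits
from ``BinOpRR``, and shows two things: the inner definitions of an inherited
multiclass are absorbed as if declared directly, and a named ``defm`` prefixes
its name to each inner ``def`` name, so ``defm ADD32 : BinOp;`` would yield
``ADD32rr`` and ``ADD32ri``.

.. code-block:: python

  # Hypothetical multiclasses: each lists the multiclasses it inherits from and
  # maps the name of each inner `def` to that record's fields.
  MULTICLASSES = {
      "BinOpRR": {"bases": [], "defs": {"rr": {"Kind": "reg-reg"}}},
      "BinOp": {"bases": ["BinOpRR"], "defs": {"ri": {"Kind": "reg-imm"}}},
  }


  def collect_defs(mc_name):
      """Gather a multiclass's inner defs, including those inherited from its
      base multiclasses, as if they were declared in it directly."""
      mc = MULTICLASSES[mc_name]
      defs = {}
      for base in mc["bases"]:
          defs.update(collect_defs(base))
      defs.update(mc["defs"])
      return defs


  def defm(prefix, mc_name):
      """Instantiate all inner defs at once, prefixing each with the defm name."""
      return {prefix + inner: dict(fields)
              for inner, fields in collect_defs(mc_name).items()}


  records = defm("ADD32", "BinOp")
  print(sorted(records))  # ['ADD32ri', 'ADD32rr']

The example above uses an anonymous ``defm`` and anonymous inner ``def``
statements, so TableGen generates names for the resulting records instead of
concatenating them.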

See the :doc:`TableGen Language Introduction <LangIntro>` for more generic
information on the usage of the language, and the
:doc:`TableGen Language Reference <LangRef>` for a more in-depth description
of the formal language specification.

.. _backend:
.. _backends:

TableGen backends
=================

TableGen files have no real meaning without a backend.  The default operation
of running ``llvm-tblgen`` is to print the information in a textual format, but
that is only useful for debugging the TableGen files themselves.  The power of
TableGen, however, lies in interpreting the source files into an internal
representation from which anything you want can be generated.

Currently, TableGen is mostly used to create large include files containing
tables that you can either include directly (if the output is in the language
you are coding in) or process with preprocessor macros defined around the
``#include`` of the file.

Direct output can be used if the backend already prints a table in C format,
or if the output is just a list of strings (for example, error and warning
messages).  Pre-processed output should be used if the same information needs
to be used in different contexts (such as instruction names); in that case,
your backend should print a meta-information list that can be shaped into
different compile-time formats.
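
A minimal sketch of the pre-processed style, written as a toy Python "backend"
rather than a real C++ TableGen backend, emits one macro invocation per record.
The macro name ``INSTR``, the helper ``emit_instr_list`` and the file layout
are assumptions for illustration and do not reproduce any particular LLVM
backend's output.

.. code-block:: python

  def emit_instr_list(names, macro="INSTR"):
      """Emit an X-macro style include file with one `MACRO(Name)` line per
      record.  A default empty definition lets the file be included without
      defining the macro, and the trailing #undef lets it be included again
      with a different definition."""
      lines = [
          "/* Automatically generated file, do not edit! */",
          f"#ifndef {macro}",
          f"#define {macro}(Name)",
          "#endif",
      ]
      lines += [f"{macro}({name})" for name in sorted(names)]
      lines.append(f"#undef {macro}")
      return "\n".join(lines) + "\n"


  print(emit_instr_list(["ADD32rr", "ADC32rr"]), end="")

A C++ consumer could then define ``INSTR(Name)`` as ``Name,`` before one
``#include`` of the generated file to build an enum, and as ``#Name,`` before
another to build a table of name strings, keeping both in sync from a single
list.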

See :doc:`TableGen BackEnds <BackEnds>` for more information.

TableGen Deficiencies
=====================

Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times.  The common theme is that, while TableGen allows
you to build domain-specific languages, the final languages that you create
lack the power of other DSLs, which in turn considerably increases the size
and complexity of TableGen files.

At the same time, TableGen allows you to give the basic concepts virtually any
meaning through custom-made backends, which can pervert the original design and
make the resulting TableGen files very hard for newcomers to understand.

Some are in favour of extending the semantics even more, while making sure that
backends adhere to strict rules.  Others suggest moving to fewer, more powerful
DSLs designed for specific purposes, or even reusing existing DSLs.

Either way, this is a discussion that will likely span several years, if not
decades.  You can read more in the :doc:`TableGen Deficiencies <Deficiencies>`
document.
