===========================
TableGen Language Reference
===========================

.. contents::
   :local:

.. warning::
   This document is extremely rough. If you find something lacking, please
   fix it, file a documentation bug, or ask about it on llvm-dev.

Introduction
============

This document is meant to be a normative spec about the TableGen language
in and of itself (i.e. how to understand a given construct in terms of how
it affects the final set of records represented by the TableGen file). If
you are unsure if this document is really what you are looking for, please
read the :doc:`introduction to TableGen <index>` first.

Notation
========

The lexical and syntax notation used here is intended to imitate
`Python's`_. In particular, for lexical definitions, the productions
operate at the character level and there is no implied whitespace between
elements. The syntax definitions operate at the token level, so there is
implied whitespace between tokens.

.. _`Python's`: http://docs.python.org/py3k/reference/introduction.html#notation

Lexical Analysis
================

TableGen supports BCPL (``// ...``) and nestable C-style (``/* ... */``)
comments.
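
A small sketch of both comment styles, including nesting; the record name
``AfterComments`` is made up for illustration::

   // A BCPL-style comment runs to the end of the line.

   /* A C-style comment /* may contain a nested comment */ and
      ends only once every opening delimiter has been closed. */
   def AfterComments;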

The following is a listing of the basic punctuation tokens::

   - + [ ] { } ( ) < > : ; .  = ? #

Numeric literals take one of the following forms:

.. TableGen actually will lex some pretty strange sequences and interpret
   them as numbers. What is shown here is an attempt to approximate what it
   "should" accept.

.. productionlist::
   TokInteger: `DecimalInteger` | `HexInteger` | `BinInteger`
   DecimalInteger: ["+" | "-"] ("0"..."9")+
   HexInteger: "0x" ("0"..."9" | "a"..."f" | "A"..."F")+
   BinInteger: "0b" ("0" | "1")+

One aspect to note is that the :token:`DecimalInteger` token *includes* the
``+`` or ``-``, as opposed to having ``+`` and ``-`` be unary operators as
most languages do.

Also note that :token:`BinInteger` creates a value of type ``bits<n>``
(where ``n`` is the number of bits).  This will implicitly convert to
integers when needed.
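
The record below is a minimal sketch of each literal form; the record and
field names are hypothetical::

   def NumericLiterals {
     int Dec = 42;            // DecimalInteger
     int Neg = -7;            // the sign is part of the token
     int Hex = 0x2A;          // HexInteger
     bits<4> Bin = 0b1010;    // BinInteger: four digits, so a bits<4> value
   }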

TableGen has identifier-like tokens:

.. productionlist::
   ualpha: "a"..."z" | "A"..."Z" | "_"
   TokIdentifier: ("0"..."9")* `ualpha` (`ualpha` | "0"..."9")*
   TokVarName: "$" `ualpha` (`ualpha` |  "0"..."9")*

Note that unlike most languages, TableGen allows :token:`TokIdentifier` to
begin with a number. In case of ambiguity, a token will be interpreted as a
numeric literal rather than an identifier.
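
As an illustration, the following sketch uses an identifier that begins with
digits and a :token:`TokVarName` inside a dag. All names here (``Operand``,
``ops``, ``32BitOperand``, ``Example``) are assumptions made for the
example::

   class Operand;
   def ops;                       // serves as a dag operator
   def 32BitOperand : Operand;    // one TokIdentifier, despite the digits

   def Example {
     Operand Op = 32BitOperand;
     dag Args = (ops 32BitOperand:$src);   // $src is a TokVarName
   }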

TableGen also has two string-like literals:

.. productionlist::
   TokString: '"' <non-'"' characters and C-like escapes> '"'
   TokCodeFragment: "[{" <shortest text not containing "}]"> "}]"

:token:`TokCodeFragment` is essentially a multiline string literal
delimited by ``[{`` and ``}]``.

.. note::
   The current implementation accepts the following C-like escapes::

      \\ \' \" \t \n
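
A short sketch of both literal kinds, using hypothetical record and field
names::

   def StringLiterals {
     string Escapes = "tab:\t quote:\" backslash:\\ newline:\n";
     code Body = [{
       // Everything up to the closing delimiter is kept as text.
       return Value + 1;
     }];
   }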

TableGen also has the following keywords::

   bit   bits      class   code         dag
   def   foreach   defm    field        in
   int   let       list    multiclass   string

TableGen also has "bang operators" which have a
wide variety of meanings:

.. productionlist::
   BangOperator: one of
               :!eq     !if      !head    !tail      !con
               :!add    !shl     !sra     !srl       !and
               :!or     !empty   !subst   !foreach   !strconcat
               :!cast   !listconcat       !size      !foldl
               :!isa    !dag     !le      !lt        !ge
               :!gt     !ne


Syntax
======

TableGen has an ``include`` mechanism. It does not play a role in the
syntax per se, since it is lexically replaced with the contents of the
included file.

.. productionlist::
   IncludeDirective: "include" `TokString`

TableGen's top-level production consists of "objects".

.. productionlist::
   TableGenFile: `Object`*
   Object: `Class` | `Def` | `Defm` | `Defset` | `Let` | `MultiClass` |
           `Foreach`

``class``\es
------------

.. productionlist::
   Class: "class" `TokIdentifier` [`TemplateArgList`] `ObjectBody`
   TemplateArgList: "<" `Declaration` ("," `Declaration`)* ">"

A ``class`` declaration creates a record which other records can inherit
from. A class can be parametrized by a list of "template arguments", whose
values can be used in the class body.

A given class can only be defined once. A ``class`` declaration is
considered to define the class if any of the following is true:

.. break ObjectBody into its constituents so that they are present here?

#. The :token:`TemplateArgList` is present.
#. The :token:`Body` in the :token:`ObjectBody` is present and is not empty.
#. The :token:`BaseClassList` in the :token:`ObjectBody` is present.

You can declare an empty class by giving an empty :token:`TemplateArgList`
and an empty :token:`ObjectBody`. This can serve as a restricted form of
forward declaration: note that records deriving from the forward-declared
class will inherit no fields from it since the record expansion is done
when the record is parsed.

Every class has an implicit template argument called ``NAME``, which is set
to the name of the instantiating ``def`` or ``defm``. The result is undefined
if the class is instantiated by an anonymous record.
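
The following sketch shows a parametrized class, a default template argument
value, and a use of ``NAME``. The class, field, and record names are
hypothetical::

   class Reg<string n, int enc = 0> {
     string AsmName = n;
     int Encoding = enc;
     string RecordName = NAME;   // implicit template argument
   }

   def R0 : Reg<"r0">;       // Encoding takes the default; NAME is "R0"
   def R1 : Reg<"r1", 1>;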

Declarations
------------

.. Omitting mention of arcane "field" prefix to discourage its use.

The declaration syntax is pretty much what you would expect as a C++
programmer.

.. productionlist::
   Declaration: `Type` `TokIdentifier` ["=" `Value`]

It declares an identifier of the given type and, if a :token:`Value` is
given, assigns that value to it.

Types
-----

.. productionlist::
   Type: "string" | "code" | "bit" | "int" | "dag"
       :| "bits" "<" `TokInteger` ">"
       :| "list" "<" `Type` ">"
       :| `ClassID`
   ClassID: `TokIdentifier`

Both ``string`` and ``code`` correspond to the string type; the difference
is purely to indicate programmer intention.

The :token:`ClassID` must identify a class that has been previously
declared or defined.
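
As a sketch, the record below declares one field of each type. The names
``Register``, ``R0``, ``ops``, and ``TypeSampler`` are assumptions made for
the example::

   class Register;
   def R0 : Register;
   def ops;                         // serves as a dag operator

   def TypeSampler {
     bit Flag = 1;
     int Count = 3;
     string Name = "sampler";
     code Snippet = [{ return 0; }];
     bits<4> Mask = 0b0101;
     list<int> Sizes = [8, 16, 32];
     dag Args = (ops R0:$dst);
     Register Reg = R0;             // a ClassID used as a type
   }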

Values
------

.. productionlist::
   Value: `SimpleValue` `ValueSuffix`*
   ValueSuffix: "{" `RangeList` "}"
              :| "[" `RangeList` "]"
              :| "." `TokIdentifier`
   RangeList: `RangePiece` ("," `RangePiece`)*
   RangePiece: `TokInteger`
             :| `TokInteger` "-" `TokInteger`
             :| `TokInteger` `TokInteger`

The peculiar last form of :token:`RangePiece` is due to the fact that the
"``-``" is included in the :token:`TokInteger`, hence ``1-5`` gets lexed as
two consecutive :token:`TokInteger`\s, with values ``1`` and ``-5``,
instead of ``1``, ``-``, and ``5``.
The :token:`RangeList` can be thought of as specifying a "list slice" in
some contexts.
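
The three suffix forms can be sketched as follows, using hypothetical record
and field names::

   def Slices {
     bits<8> Opcode = 0b10110001;
     bits<4> Low = Opcode{3-0};        // bits 3 down to 0
     list<int> Sizes = [8, 16, 32, 64];
     list<int> Small = Sizes[0-1];     // elements 0 and 1
   }

   def User {
     list<int> All = Slices.Sizes;     // field access on another record
   }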


:token:`SimpleValue` has a number of forms:


.. productionlist::
   SimpleValue: `TokIdentifier`

The value will be the variable referenced by the identifier. It can be one
of:

.. The code for this is exceptionally abstruse. These examples are a
   best-effort attempt.

* name of a ``def``, such as the use of ``Bar`` in::

     def Bar : SomeClass {
       int X = 5;
     }

     def Foo {
       SomeClass Baz = Bar;
     }

* value local to a ``def``, such as the use of ``Bar`` in::

     def Foo {
       int Bar = 5;
       int Baz = Bar;
     }

  Values defined in superclasses can be accessed the same way.

* a template arg of a ``class``, such as the use of ``Bar`` in::

     class Foo<int Bar> {
       int Baz = Bar;
     }

* value local to a ``class``, such as the use of ``Bar`` in::

     class Foo {
       int Bar = 5;
       int Baz = Bar;
     }

* a template arg to a ``multiclass``, such as the use of ``Bar`` in::

     multiclass Foo<int Bar> {
       def : SomeClass<Bar>;
     }

* the iteration variable of a ``foreach``, such as the use of ``i`` in::

     foreach i = 0-5 in
     def Foo#i;

* a variable defined by ``defset``

* the implicit template argument ``NAME`` in a ``class`` or ``multiclass``

.. productionlist::
   SimpleValue: `TokInteger`

This represents the numeric value of the integer.

.. productionlist::
   SimpleValue: `TokString`+

Multiple adjacent string literals are concatenated like in C/C++. The value
is the concatenation of the strings.

.. productionlist::
   SimpleValue: `TokCodeFragment`

The value is the string value of the code fragment.

.. productionlist::
   SimpleValue: "?"

``?`` represents an "unset" initializer.

.. productionlist::
   SimpleValue: "{" `ValueList` "}"
   ValueList: [`ValueListNE`]
   ValueListNE: `Value` ("," `Value`)*

This represents a sequence of bits, as would be used to initialize a
``bits<n>`` field (where ``n`` is the number of bits).

.. productionlist::
   SimpleValue: `ClassID` "<" `ValueListNE` ">"

This generates a new anonymous record definition (as would be created by an
unnamed ``def`` inheriting from the given class with the given template
arguments) and the value is the value of that record definition.

.. productionlist::
   SimpleValue: "[" `ValueList` "]" ["<" `Type` ">"]

A list initializer. The optional :token:`Type` can be used to indicate a
specific element type, otherwise the element type will be deduced from the
given values.

.. The initial `DagArg` of the dag must start with an identifier or
   !cast, but this is more of an implementation detail and so for now just
   leave it out.

.. productionlist::
   SimpleValue: "(" `DagArg` [`DagArgList`] ")"
   DagArgList: `DagArg` ("," `DagArg`)*
   DagArg: `Value` [":" `TokVarName`] | `TokVarName`

The initial :token:`DagArg` is called the "operator" of the dag.

.. productionlist::
   SimpleValue: `BangOperator` ["<" `Type` ">"] "(" `ValueListNE` ")"
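
The :token:`SimpleValue` forms that have no example above can be sketched
together in one record. The names ``Imm``, ``add``, and ``Forms`` are
hypothetical::

   class Imm<int v> {
     int Value = v;
   }
   def add;                                 // used as a dag operator

   def Forms {
     int Unset = ?;                         // "unset" initializer
     string Joined = "foo" "bar";           // adjacent string literals
     bits<4> Nibble = {1, 0, 1, 0};         // sequence of bits
     Imm Five = Imm<5>;                     // anonymous record definition
     list<int> Sizes = [8, 16, 32];         // element type deduced
     list<int> Empty = []<int>;             // element type given explicitly
     dag Pattern = (add 1:$lhs, 2:$rhs);    // "add" is the operator
     int Sum = !add(1, 2);                  // bang operator
     string Text = !strconcat("foo", "bar");
   }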

Bodies
------

.. productionlist::
   ObjectBody: `BaseClassList` `Body`
   BaseClassList: [":" `BaseClassListNE`]
   BaseClassListNE: `SubClassRef` ("," `SubClassRef`)*
   SubClassRef: (`ClassID` | `MultiClassID`) ["<" `ValueList` ">"]
   DefmID: `TokIdentifier`

The version with the :token:`MultiClassID` is only valid in the
:token:`BaseClassList` of a ``defm``.
The :token:`MultiClassID` should be the name of a ``multiclass``.

.. put this somewhere else

It is after parsing the base class list that the "let stack" (the bindings
of any enclosing top-level ``let``) is applied.

.. productionlist::
   Body: ";" | "{" `BodyList` "}"
   BodyList: `BodyItem`*
   BodyItem: `Declaration` ";"
           :| "let" `TokIdentifier` [ "{" `RangeList` "}" ] "=" `Value` ";"

The ``let`` form allows overriding the value of an inherited field.
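
A minimal sketch of both :token:`BodyItem` forms, assuming a hypothetical
``Inst`` class::

   class Inst {
     bits<8> Opcode = 0;
     bit IsBranch = 0;
   }

   def JMP : Inst {
     let Opcode = 0x29;    // replaces the inherited value
     let Opcode{7} = 1;    // the bit range form: sets only bit 7
     let IsBranch = 1;
     int Latency = 2;      // a new field introduced by a Declaration
   }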

``def``
-------

.. productionlist::
   Def: "def" [`Value`] `ObjectBody`

Defines a record whose name is given by the optional :token:`Value`. The value
is parsed in a special mode where global identifiers (records and variables
defined by ``defset``) are not recognized, and all unrecognized identifiers
are interpreted as strings.

If no name is given, the record is anonymous. The final name of anonymous
records is undefined, but globally unique.

Special handling occurs if this ``def`` appears inside a ``multiclass`` or
a ``foreach``.

When a non-anonymous record is defined in a multiclass and the given name
does not contain a reference to the implicit template argument ``NAME``, such
a reference will automatically be prepended. That is, the following are
equivalent inside a multiclass::

    def Foo;
    def NAME#Foo;

``defm``
--------

.. productionlist::
   Defm: "defm" [`Value`] ":" `BaseClassListNE` ";"

The :token:`BaseClassListNE` is a list of at least one ``multiclass`` and any
number of ``class``\es. The ``multiclass``\es must occur before any
``class``\es.

Instantiates all records defined in all given ``multiclass``\es and adds the
given ``class``\es as superclasses.

The name is parsed in the same special mode used by ``def``. If the name is
missing, a globally unique string is used instead (but instantiated records
are not considered to be anonymous, unless they were originally defined by an
anonymous ``def``). That is, the following have different semantics::

    defm : SomeMultiClass<...>;    // some globally unique name
    defm "" : SomeMultiClass<...>; // empty name string

When it occurs inside a multiclass, the second variant is equivalent to
``defm NAME : ...``. More generally, when ``defm`` occurs in a multiclass and
its name does not contain a reference to the implicit template argument
``NAME``, such a reference will automatically be prepended. That is, the
following are equivalent inside a multiclass::

    defm Foo : SomeMultiClass<...>;
    defm NAME#Foo : SomeMultiClass<...>;

``defset``
----------

.. productionlist::
   Defset: "defset" `Type` `TokIdentifier` "=" "{" `Object`* "}"

All records defined inside the braces via ``def`` and ``defm`` are collected
in a globally accessible list of the given name (in addition to being added
to the global collection of records as usual). Anonymous records created inside
initializer expressions using the ``Class<args...>`` syntax are never collected
in a defset.

The given type must be ``list<A>``, where ``A`` is some class. It is an error
to define a record (via ``def`` or ``defm``) inside the braces which doesn't
derive from ``A``.
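
For instance, the sketch below collects two records into a list and then
refers to that list by name. ``Reg``, ``AllRegs``, and ``RegInfo`` are
hypothetical names::

    class Reg;

    defset list<Reg> AllRegs = {
      def R0 : Reg;
      def R1 : Reg;
    }

    def RegInfo {
      list<Reg> Regs = AllRegs;   // the list collected by the defset
    }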

``foreach``
-----------

.. productionlist::
   Foreach: "foreach" `ForeachDeclaration` "in" "{" `Object`* "}"
          :| "foreach" `ForeachDeclaration` "in" `Object`
   ForeachDeclaration: `TokIdentifier` "=" ( "{" `RangeList` "}" | `RangePiece` | `Value` )

The value assigned to the variable in the declaration is iterated over and
the object or object list is reevaluated with the variable set at each
iterated value.

Note that the productions involving :token:`RangeList` and :token:`RangePiece`
have precedence over the more generic value parsing based on the first token.
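
A brief sketch of both forms, iterating over a range and over a list value.
The class and record names are hypothetical::

    class Reg<int n> {
      int Num = n;
    }

    // Single-object form over a RangePiece: defines R0 through R3.
    foreach i = 0-3 in
    def R#i : Reg<i>;

    // Braced form over a list value: defines SlotA and SlotB.
    foreach name = ["A", "B"] in {
      def Slot#name;
    }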

Top-Level ``let``
-----------------

.. productionlist::
   Let:  "let" `LetList` "in" "{" `Object`* "}"
      :| "let" `LetList` "in" `Object`
   LetList: `LetItem` ("," `LetItem`)*
   LetItem: `TokIdentifier` ["<" `RangeList` ">"] "=" `Value`

This is effectively equivalent to ``let`` inside the body of a record
except that it applies to multiple records at a time. The bindings are
applied at the end of parsing the base classes of a record.
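
As a sketch, assuming a hypothetical ``Inst`` class with two ``bit`` fields::

    class Inst {
      bit IsCall = 0;
      bit HasSideEffects = 0;
    }

    // Braced form: both bindings apply to every record in the braces.
    let IsCall = 1, HasSideEffects = 1 in {
      def CALL : Inst;
      def CALLR : Inst;
    }

    // Single-object form.
    let IsCall = 1 in
    def TAILCALL : Inst;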

``multiclass``
--------------

.. productionlist::
   MultiClass: "multiclass" `TokIdentifier` [`TemplateArgList`]
             : [":" `BaseMultiClassList`] "{" `MultiClassObject`+ "}"
   BaseMultiClassList: `MultiClassID` ("," `MultiClassID`)*
   MultiClassID: `TokIdentifier`
   MultiClassObject: `Def` | `Defm` | `Let` | `Foreach`
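
A minimal sketch of a ``multiclass`` and its instantiation with ``defm``.
The names ``Inst`` and ``RegImm`` are hypothetical; the generated record
names follow from the ``NAME`` prepending described for ``def``::

    class Inst<string asm> {
      string AsmString = asm;
    }

    multiclass RegImm<string mnemonic> {
      def rr : Inst<!strconcat(mnemonic, " $dst, $src")>;
      def ri : Inst<!strconcat(mnemonic, " $dst, $imm")>;
    }

    defm ADD : RegImm<"add">;   // defines ADDrr and ADDri
    defm SUB : RegImm<"sub">;   // defines SUBrr and SUBri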