# CDDL Compiler

This CDDL compiler takes a CDDL specification as input and produces a C++ header
and source file which contain structs, enums, encode functions, and decode
functions. This simplifies the process of taking a CDDL message definition from
the OSP control protocol spec and making it usable in C++. Additionally, it
simplifies adding new messages or changing existing messages during development.

This compiler is not intended to support all or even most of the CDDL spec.
CDDL allows many patterns that are not useful, practical, or efficient when
considering a C++ implementation of CDDL messages. Our specialization for enums
is a good example, but more details are given below.

## Usage Overview

This section gives some examples of CDDL syntax that is supported and what the
generated C++ looks like. For the complete set of messages currently supported
for OSP, see [//msgs/osp.cddl](../../msgs/osp.cddl).

### Maps

The following example shows a map in CDDL:
``` cddl
x = {
  alpha: uint,
  beta: text,
}
```

This translates into a normal C++ struct (i.e. not `std::map`):
``` c++
struct X {
  uint64_t alpha;
  std::string beta;
};
```

The string keys are handled only by the encoding and decoding functions.
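
To make the last point concrete, the sketch below hand-encodes the struct `X` above as a CBOR map. The names `AppendHead`, `AppendText`, and `EncodeX` are hypothetical and are not the functions this compiler generates; the generated code may well delegate to a CBOR library instead of writing bytes directly.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct X {
  uint64_t alpha;
  std::string beta;
};

// Appends a CBOR item head: a 3-bit major type plus an unsigned argument.
// (Hypothetical helper, written out only for illustration.)
void AppendHead(std::vector<uint8_t>& out, uint8_t major, uint64_t value) {
  const uint8_t type_bits = static_cast<uint8_t>(major << 5);
  if (value < 24) {
    // Small arguments are stored directly in the first byte.
    out.push_back(static_cast<uint8_t>(type_bits | value));
    return;
  }
  // Additional info 24..27 selects a 1-, 2-, 4-, or 8-byte big-endian argument.
  int bytes = 1;
  uint8_t info = 24;
  if (value > 0xFFFFFFFFull) {
    bytes = 8;
    info = 27;
  } else if (value > 0xFFFF) {
    bytes = 4;
    info = 26;
  } else if (value > 0xFF) {
    bytes = 2;
    info = 25;
  }
  out.push_back(static_cast<uint8_t>(type_bits | info));
  for (int i = bytes - 1; i >= 0; --i) {
    out.push_back(static_cast<uint8_t>(value >> (8 * i)));
  }
}

// Appends a CBOR text string (major type 3).
void AppendText(std::vector<uint8_t>& out, const std::string& text) {
  AppendHead(out, 3, text.size());
  out.insert(out.end(), text.begin(), text.end());
}

// The keys "alpha" and "beta" exist only here, never in the struct itself.
std::vector<uint8_t> EncodeX(const X& x) {
  std::vector<uint8_t> out;
  AppendHead(out, 5, 2);  // major type 5: map with two key/value pairs
  AppendText(out, "alpha");
  AppendHead(out, 0, x.alpha);  // major type 0: unsigned integer
  AppendText(out, "beta");
  AppendText(out, x.beta);
  return out;
}
```

Calling `EncodeX` on `X{10, "hi"}` yields a buffer that starts with the map head `0xA2`, followed by the text key `alpha`, the value `0x0A`, the text key `beta`, and the text value `hi`.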

### Heterogeneous Arrays

An array of heterogeneous types (or indeed a fixed number of homogeneous
types), such as
``` cddl
x = [
  alpha: uint,
  beta: text,
]
```
also translates into a plain C++ struct.
``` c++
struct X {
  uint64_t alpha;
  std::string beta;
};
```
In the array case, the field keys are only used as variable names and no strings
are used in encoding.

Because these must be implemented as a C++ struct and we don't want to define an
automatic naming scheme, all array fields must have a key. For example, CDDL
would allow this definition:
``` cddl
x = [
  uint,
  text,
]
```
but this is not allowed by our compiler.

### Homogeneous Arrays

An array of unspecified length containing only one type:
``` cddl
x = [* uint]
```
is translated to a `std::vector`. In this case, a key for the single array
field isn't necessary. Length constraints on the array (e.g. `x = [2*5 uint]`)
are not currently supported.

### Group Inclusion

If common fields are placed in a separate CDDL group (which is not a map or
array), it can be included directly in another map, array, or group type. So
``` cddl
x = (alpha: uint)
y = {
  x,
  beta: text,
}
```
will translate to the following C++ struct:
``` c++
struct Y {
  uint64_t alpha;
  std::string beta;
};
```
If you prefer that a group is included explicitly as its own struct type, you
should make it a map or array. For example,
``` cddl
x = {alpha: uint}
y = {
  x: x,
  beta: text,
}
```
will translate to the following C++ struct:
``` c++
struct X {
  uint64_t alpha;
};
struct Y {
  X x;
  std::string beta;
};
```

### Optional Fields

Fields that are not required are prefixed with a '?' in CDDL.
``` cddl
x = { ? alpha: uint }
```
These are translated to a bool flag and value pair:
``` c++
struct X {
  bool has_alpha;
  uint64_t alpha;
};
```
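As a small illustration of how calling code might consume the flag and value pair, the sketch below reads `alpha` only when `has_alpha` is set. `AlphaOr` and `MapEntryCount` are hypothetical helpers, and the assumption that an absent optional field is simply omitted from the encoded map follows from CDDL's meaning of `?` rather than from anything stated about this compiler's output.

```cpp
#include <cstdint>

struct X {
  bool has_alpha;
  uint64_t alpha;
};

// Returns alpha when it is present, otherwise the caller-supplied fallback.
// alpha is never read while has_alpha is false.
uint64_t AlphaOr(const X& x, uint64_t fallback) {
  return x.has_alpha ? x.alpha : fallback;
}

// Number of key/value pairs an encoder would write for this map, assuming
// absent optional fields are left out of the encoding entirely.
uint64_t MapEntryCount(const X& x) {
  return x.has_alpha ? 1 : 0;
}
```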

### Choice from a Group as an Enum

CDDL allows specifying a type as one of any member of a group:
``` cddl
x = &(
  alpha: 0,
  beta: 1,
)
```
This is implemented as an enum in C++:
``` c++
enum X {
  kAlpha = 0,
  kBeta = 1,
};
```
Recursive group inclusion in choices is handled by the simple fact that plain
enum constants are global and not type-checked. This leads to a global
definition caveat that is explained below, but here is an example of such an
inclusion:
``` cddl
x = ( alpha: 0, beta: 1 )
y = &( x, gamma: 2 )
```
``` c++
enum X {
  kAlpha = 0,
  kBeta = 1,
};
enum Y {
  // union: enum X
  kGamma = 2,
};
```
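The sketch below shows why plain enums make this inclusion workable: the constants of `X` are visible at namespace scope and convert implicitly to integers, so code dealing with `Y` can name them directly. `IsValidY` is a hypothetical helper for illustration, not something the compiler is known to emit.

```cpp
enum X {
  kAlpha = 0,
  kBeta = 1,
};
enum Y {
  // union: enum X
  kGamma = 2,
};

// Accepts every value that CDDL type y allows, including the constants that
// come from the included group x. With `enum class`, the kAlpha and kBeta
// labels would need X:: qualification and an explicit cast to int.
bool IsValidY(int value) {
  switch (value) {
    case kAlpha:
    case kBeta:
    case kGamma:
      return true;
    default:
      return false;
  }
}
```

Storing one of the included constants in a `Y` still needs an explicit conversion, for example `static_cast<Y>(kBeta)`.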

### Type Choice as a Discriminated Union

Specifying multiple possible types for a value in CDDL
``` cddl
x = { alpha: text / uint }
```
is translated to a discriminated union in C++:
``` c++
struct X {
  X();
  ~X();  // NOTE: This requires defining a ctor/dtor to deal with the union.
  enum class WhichAlpha {
    kString,
    kUint64,
  } which_alpha;
  union {
    std::string str;
    uint64_t uint;
  } alpha;
};
```
Currently, only `uint`, `text`, and `bytes` are allowed here. Additionally, as
an implementation note, a placeholder `bool` is also included in the union so it
can always be created as "uninitialized". This means that no destructor call is
necessary before the first proper member assignment.
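
The following is a minimal sketch of how the constructor, destructor, and placeholder member could fit together. It is not the generated code: the named union `Alpha`, the `kUninitialized` enumerator, the `placeholder` member name, and the `SetString`, `SetUint`, and `DestroyAlpha` helpers are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <new>
#include <string>
#include <utility>

struct X {
  // kUninitialized is an assumed extra state, paired with the placeholder.
  enum class WhichAlpha { kString, kUint64, kUninitialized };

  X() : which_alpha(WhichAlpha::kUninitialized) {}
  ~X() { DestroyAlpha(); }
  X(const X&) = delete;  // Copying would need per-member handling; omitted.
  X& operator=(const X&) = delete;

  void SetString(std::string value) {
    DestroyAlpha();
    // std::string is non-trivial, so it is constructed in place.
    new (&alpha.str) std::string(std::move(value));
    which_alpha = WhichAlpha::kString;
  }

  void SetUint(uint64_t value) {
    DestroyAlpha();
    alpha.uint = value;
    which_alpha = WhichAlpha::kUint64;
  }

  WhichAlpha which_alpha;
  union Alpha {
    // Starting on the trivial placeholder means nothing needs destroying
    // until a real member has been assigned.
    Alpha() : placeholder(false) {}
    ~Alpha() {}  // The owning struct destroys the active member.
    bool placeholder;
    std::string str;
    uint64_t uint;
  } alpha;

 private:
  // Runs the destructor of the active member if it has a non-trivial one.
  void DestroyAlpha() {
    if (which_alpha == WhichAlpha::kString) {
      using StringType = std::string;
      alpha.str.~StringType();
    }
    which_alpha = WhichAlpha::kUninitialized;
  }
};
```

A default-constructed `X` can be destroyed immediately, and switching from the string member to the integer member releases the string first.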

### Tagged Types

This example
``` cddl
x = #6.1234(uint)
```
translates to a single `uint64_t` variable. The 1234 tag is placed before it
during encoding and the same tag is checked during decoding.
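
The decoding side can be sketched as follows: read one CBOR item head, require it to be tag 1234, then read the unsigned integer that follows. `ReadHead` and `DecodeTaggedUint` are hypothetical names, and the hand-written parser stands in for whatever CBOR library the generated code actually uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads one CBOR item head starting at `pos` and advances `pos` past it.
// Returns false on truncated input or on additional-info values this sketch
// does not handle (28..31, i.e. reserved and indefinite-length encodings).
bool ReadHead(const std::vector<uint8_t>& in, size_t& pos, uint8_t& major,
              uint64_t& value) {
  if (pos >= in.size()) return false;
  const uint8_t first = in[pos];
  major = static_cast<uint8_t>(first >> 5);
  const uint8_t info = static_cast<uint8_t>(first & 0x1F);
  if (info < 24) {
    value = info;
    pos += 1;
    return true;
  }
  if (info > 27) return false;
  // Additional info 24..27 means a 1-, 2-, 4-, or 8-byte big-endian argument.
  const size_t extra = size_t{1} << (info - 24);
  if (in.size() - pos - 1 < extra) return false;
  value = 0;
  for (size_t i = 0; i < extra; ++i) {
    value = (value << 8) | in[pos + 1 + i];
  }
  pos += 1 + extra;
  return true;
}

// Decodes `#6.1234(uint)`: the tag must be present and must equal 1234.
bool DecodeTaggedUint(const std::vector<uint8_t>& in, uint64_t& out) {
  size_t pos = 0;
  uint8_t major = 0;
  uint64_t value = 0;
  // Major type 6 is a tag; its argument is the tag number.
  if (!ReadHead(in, pos, major, value) || major != 6 || value != 1234) {
    return false;
  }
  // Major type 0 is an unsigned integer.
  if (!ReadHead(in, pos, major, value) || major != 0) return false;
  out = value;
  return true;
}
```

For instance, the bytes `D9 04 D2 05` carry tag 1234 (`0x04D2`) followed by the integer 5, while an untagged `05` is rejected.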

### Caveats

In addition to completely unsupported aspects of CDDL, there are some places
where there are additional constraints placed on accepted CDDL forms. The
following sections describe these additional constraints.

#### Naming

CDDL allows identifiers to use characters from the set `[a-zA-Z0-9_-@$.]`, but
these do not correspond to valid C++ identifiers or typenames. As a result, we
need to either restrict the CDDL identifier character set or define a mapping to
C++ identifiers and typenames. We chose the latter, since CDDL prefers '-' over
'\_'. The mapping to C++ identifiers is done by converting '-' to '\_' and the
mapping to C++ typenames is done by converting to camel case on words delimited
by '-'. As a result, `[@$.]` are still disallowed in CDDL identifiers.
Additionally, the names `dead_beef` and `dead-beef` would translate to the same
C++ identifier/typename.
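
The two mappings can be sketched as below. The function names `ToUnderscoreId` and `ToCamelCase` are assumptions chosen for this example, and the sketch follows only the rules stated above (it does not reject `@`, `$`, or `.`).

```cpp
#include <cctype>
#include <string>

// Identifier mapping: every '-' becomes '_'.
std::string ToUnderscoreId(const std::string& cddl_id) {
  std::string result = cddl_id;
  for (char& c : result) {
    if (c == '-') c = '_';
  }
  return result;
}

// Typename mapping: capitalize the first letter of each '-'-delimited word
// and drop the delimiters.
std::string ToCamelCase(const std::string& cddl_id) {
  std::string result;
  bool capitalize_next = true;
  for (char c : cddl_id) {
    if (c == '-') {
      capitalize_next = true;
      continue;
    }
    if (capitalize_next) {
      c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
      capitalize_next = false;
    }
    result.push_back(c);
  }
  return result;
}
```

With these definitions `unknown-error` becomes the identifier `unknown_error` and the typename `UnknownError`, and `dead-beef` and `dead_beef` both map to the identifier `dead_beef`, which is the collision described above.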

#### Enums

In order to simplify the sharing of enumeration values across messages (see
example below), they are implemented in C++ as enums and not enum classes. As a
result, the enum constant names are global and cannot be defined more than once.
The examples below illustrate how to handle a case where you have odd enum set
intersections. In the first, two messages each define `invalid-input`, which
produces a redefinition error in the generated C++:

``` cddl
result = (
  success: 0,
  timeout: 1,
  unknown-error: 2,
)

message1 = {
  result: &(
    result,
    invalid-input: 10,
    internal-error: 20,
  )
}

message2 = {
  result: &(
    result,
    invalid-input: 10, ; ERROR - redefinition of enum constant in resulting C++
    cancelled: 30,
  )
}
```

Moving the shared constant into its own group, which both messages then
include, avoids the redefinition:

``` cddl
result = (
  success: 0,
  timeout: 1,
  unknown-error: 2,
)

invalid-input = (
  invalid-input: 10,
)

message1 = {
  result: &(
    result,
    invalid-input,
    internal-error: 20,
  )
}

message2 = {
  result: &(
    result,
    invalid-input, ; OK - reference existing enum in resulting C++
    cancelled: 30,
  )
}
```

As a corollary, care should be taken to not allow duplicate enum constant
_values_ in enums that are used together.

**TODO(btolsch): Make this a compiler check.**

## Implementation Overview

The implementation is broken up into the following files:
  - [main.cc](main.cc): Compiler driver. Command line arguments are:
    - `--header <filename>`: Specify the filename of the output header file.
      This is also the name that will be used for the include guard and as the
      include path in the source file.
    - `--cc <filename>`: Specify the filename of the output source file.
    - `--gen-dir <filename>`: Specify the directory prefix that should be added
      to the output header and source file.
    - A filename (in any position) without a preceding flag specifies the input
      file which contains the CDDL spec.
  - [cddl.py](cddl.py): Python adapter to allow the tool to be invoked as a GN
    action.
  - [parse.cc](parse.cc): Parser which produces a tree of `AstNode`s
    corresponding to the input's derivation in the grammar.
  - [sema.cc](sema.cc): "Semantic analysis" step (named for clang's semantic
    analysis layer) which generates a table of `CppType`s. `CppType` represents
    something that will become a C++ type in the final output.
  - [codegen.cc](codegen.cc): C++ generation step which outputs struct, enum,
    and function declarations to the specified header file and function
    definitions to the specified source file.

### Grammar

Since CDDL is still an IETF draft spec and the grammar has changed at least a
few times, the grammar used for this implementation is duplicated in
[grammar.md](grammar.md).