Writing a schema    {#flatbuffers_guide_writing_schema}
================

The syntax of the schema language (aka IDL, [Interface Definition Language][])
should look quite familiar to users of any of the C family of
languages, and also to users of other IDLs. Let's look at an example
first:

    // example IDL file

    namespace MyGame;

    attribute "priority";

    enum Color : byte { Red = 1, Green, Blue }

    union Any { Monster, Weapon, Pickup }

    struct Vec3 {
      x:float;
      y:float;
      z:float;
    }

    table Monster {
      pos:Vec3;
      mana:short = 150;
      hp:short = 100;
      name:string;
      friendly:bool = false (deprecated, priority: 1);
      inventory:[ubyte];
      color:Color = Blue;
      test:Any;
    }

    root_type Monster;

(`Weapon` and `Pickup` are not defined as part of this example.)

### Tables

Tables are the main way of defining objects in FlatBuffers, and consist
of a name (here `Monster`) and a list of fields. Each field has a name,
a type, and optionally a default value (if omitted, it defaults to `0` /
`NULL`).

Each field is optional: It does not have to appear in the wire
representation, and you can choose to omit fields for each individual
object. As a result, you have the flexibility to add fields without fear of
bloating your data. This design is also FlatBuffers' mechanism for forwards
and backwards compatibility. Note that:

-   You can add new fields in the schema ONLY at the end of a table
    definition. Older data will still
    read correctly, and give you the default value when read. Older code
    will simply ignore the new field.
    If you want to have flexibility to use any order for fields in your
    schema, you can manually assign ids (much like Protocol Buffers),
    see the `id` attribute below.

-   You cannot delete fields you don't use anymore from the schema,
    but you can simply
    stop writing them into your data for almost the same effect.
    Additionally you can mark them as `deprecated` as in the example
    above, which will prevent the generation of accessors in the
    generated C++, as a way to enforce the field not being used any more.
    (Careful: this may break code!)

-   You may change field names and table names, if you're ok with your
    code breaking until you've renamed them there too.

See "Schema evolution examples" below for more on this
topic.

### Structs

Similar to a table, only now none of the fields are optional (so no defaults
either), and fields may not be added or be deprecated. Structs may only contain
scalars or other structs. Use this for
simple objects where you are very sure no changes will ever be made
(as is quite clear in the example `Vec3`). Structs use less memory than
tables and are even faster to access (they are always stored in-line in their
parent object, and use no virtual table).

### Types

Built-in scalar types are:

-   8 bit: `byte`, `ubyte`, `bool`

-   16 bit: `short`, `ushort`

-   32 bit: `int`, `uint`, `float`

-   64 bit: `long`, `ulong`, `double`

Built-in non-scalar types:

-   Vector of any other type (denoted with `[type]`). Nesting vectors
    is not supported, instead you can wrap the inner vector in a table.

-   `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings
    or general binary data use vectors (`[byte]` or `[ubyte]`) instead.

-   References to other tables or structs, enums or unions (see
    below).

You can't change types of fields once they're used, with the exception
of same-size data where a `reinterpret_cast` would give you a desirable result,
e.g. you could change a `uint` to an `int` if no values in current data use the
high bit yet.

### (Default) Values

Values are a sequence of digits.
Values may be optionally followed by a decimal
point (`.`) and more digits, for float constants, or optionally prefixed by
a `-`. Floats may also be in scientific notation; optionally ending with an `e`
or `E`, followed by a `+` or `-` and more digits.

Only scalar values can have defaults, non-scalar (string/vector/table) fields
default to `NULL` when not present.

You generally do not want to change default values after they're initially
defined. Fields that have the default value are not actually stored in the
serialized data (see also Gotchas below) but are generated in code,
so when you change the default, you'd
now get a different value than from code generated from an older version of
the schema. There are situations, however, where this may be
desirable, especially if you can ensure a simultaneous rebuild of
all code.

### Enums

Define a sequence of named constants, each with a given value, or
increasing by one from the previous one. The default first value
is `0`. As you can see in the enum declaration, you specify the underlying
integral type of the enum with `:` (in this case `byte`), which then determines
the type of any fields declared with this enum type.

Typically, enum values should only ever be added, never removed (there is no
deprecation for enums). This requires code to handle forwards compatibility
itself, by handling unknown enum values.

### Unions

Unions share a lot of properties with enums, but instead of new names
for constants, you use names of tables. You can then declare
a union field, which can hold a reference to any of those types, and
additionally a hidden field with the suffix `_type` is generated that
holds the corresponding enum value, allowing you to know which type to
cast to at runtime.

Unions are a good way to be able to send multiple message types as a FlatBuffer.
Note that because a union field is really two fields, it must always be
part of a table, it cannot be the root of a FlatBuffer by itself.

If you have a need to distinguish between different FlatBuffers in a more
open-ended way, for example for use as files, see the file identification
feature below.

There is experimental support, only in C++, for a vector of unions
(and types). In the example IDL file above, use `[Any]` to add a
vector of `Any` to the `Monster` table.

### Namespaces

These will generate the corresponding namespace in C++ for all helper
code, and packages in Java. You can use `.` to specify nested namespaces /
packages.

### Includes

You can include other schema files in your current one, e.g.:

    include "mydefinitions.fbs";

This makes it easier to refer to types defined elsewhere. `include`
automatically ensures each file is parsed just once, even when referred to
more than once.

When using the `flatc` compiler to generate code for schema definitions,
only definitions in the current file will be generated, not those from the
included files (those you still generate separately).

### Root type

This declares what you consider to be the root table (or struct) of the
serialized data. This is particularly important for parsing JSON data,
which doesn't include object type information.

### File identification and extension

Typically, a FlatBuffer binary buffer is not self-describing, i.e. it
needs you to know its schema to parse it correctly. But if you
want to use a FlatBuffer as a file format, it would be convenient
to be able to have a "magic number" in there, like most file formats
have, to be able to do a sanity check to see if you're reading the
kind of file you're expecting.
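
As a rough illustration of such a sanity check, here is a minimal Python
sketch. It assumes the layout FlatBuffers uses when a schema declares a file
identifier: a 4-byte root offset at the start of the buffer, followed by the
4 identifier bytes at offsets 4-7. The function name `has_identifier` and the
hand-built `fake_buffer` are hypothetical and not part of any FlatBuffers API.

```python
import struct


def has_identifier(buf: bytes, ident: str) -> bool:
    """Return True if bytes 4..7 of `buf` equal the 4-character identifier."""
    raw = ident.encode("ascii")
    if len(raw) != 4:
        raise ValueError("identifier must be exactly 4 characters")
    # Bytes 0..3 hold the offset to the root table; the identifier follows.
    return len(buf) >= 8 and buf[4:8] == raw


# Hand-built stand-in for a real buffer (an assumption for illustration only):
# a little-endian 4-byte root offset, the identifier, then placeholder bytes.
fake_buffer = struct.pack("<I", 12) + b"MYFI" + bytes(8)

print(has_identifier(fake_buffer, "MYFI"))
print(has_identifier(fake_buffer, "NOPE"))
```

A check like this only tells you that the file claims to be of the expected
kind; it does not validate the rest of the buffer.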

Now, you can always prefix a FlatBuffer with your own file header,
but FlatBuffers has a built-in way to add an identifier to a
FlatBuffer that takes up minimal space, and keeps the buffer
compatible with buffers that don't have such an identifier.

You can specify in a schema, similar to `root_type`, that you intend
for this type of FlatBuffer to be used as a file format:

    file_identifier "MYFI";

Identifiers must always be exactly 4 characters long. These 4 characters
will end up as bytes at offsets 4-7 (inclusive) in the buffer.

For any schema that has such an identifier, `flatc` will automatically
add the identifier to any binaries it generates (with `-b`),
and generated calls like `FinishMonsterBuffer` also add the identifier.
If you have specified an identifier and wish to generate a buffer
without one, you can always still do so by calling
`FlatBufferBuilder::Finish` explicitly.

After loading a buffer, you can use a call like
`MonsterBufferHasIdentifier` to check if the identifier is present.

Note that this is best for open-ended uses such as files. If you simply wanted
to send one of a set of possible messages over a network for example, you'd
be better off with a union.

Additionally, by default `flatc` will output binary files as `.bin`.
This declaration in the schema will change that to whatever you want:

    file_extension "ext";

### RPC interface declarations

You can declare RPC calls in a schema, that define a set of functions
that take a FlatBuffer as an argument (the request) and return a FlatBuffer
as the response (both of which must be table types):

    rpc_service MonsterStorage {
      Store(Monster):StoreResponse;
      Retrieve(MonsterId):Monster;
    }

What code this produces and how it is used depends on the language and RPC
system used. There is preliminary support for GRPC through the `--grpc` code
generator, see `grpc/tests` for an example.

### Comments & documentation

May be written as in most C-based languages. Additionally, a triple
comment (`///`) on a line by itself signals that a comment is documentation
for whatever is declared on the line after it
(table/struct/field/enum/union/element), and the comment is output
in the corresponding C++ code. Multiple such lines per item are allowed.

### Attributes

Attributes may be attached to a declaration, behind a field, or after
the name of a table/struct/enum/union. These may either have a value or
not. Some attributes like `deprecated` are understood by the compiler;
user defined ones need to be declared with the attribute declaration
(like `priority` in the example above), and are
available to query if you parse the schema at runtime.
This is useful if you write your own code generators/editors etc., and
you wish to add additional information specific to your tool (such as a
help text).

Currently understood attributes:

-   `id: n` (on a table field): manually set the field identifier to `n`.
    If you use this attribute, you must use it on ALL fields of this table,
    and the numbers must be a contiguous range from 0 onwards.
    Additionally, since a union type effectively adds two fields, its
    id must be that of the second field (the first field is the type
    field and not explicitly declared in the schema).
    For example, if the last field before the union field had id 6,
    the union field should have id 8, and the union's type field will
    implicitly be 7.
    IDs allow the fields to be placed in any order in the schema.
    When a new field is added to the schema it must use the next available ID.
-   `deprecated` (on a field): do not generate accessors for this field
    anymore, code should stop using this data.
-   `required` (on a non-scalar table field): this field must always be set.
    By default, all fields are optional, i.e. may be left out. This is
    desirable, as it helps with forwards/backwards compatibility, and
    flexibility of data structures. It is also a burden on the reading code,
    since for non-scalar fields it requires you to check against NULL and
    take appropriate action. By specifying this attribute, you force code that
    constructs FlatBuffers to ensure this field is initialized, so the reading
    code may access it directly, without checking for NULL. If the constructing
    code does not initialize this field, it will trigger an assert, and also
    the verifier will fail on buffers that have missing required fields.
-   `force_align: size` (on a struct): force the alignment of this struct
    to be something higher than what it is naturally aligned to. Causes
    these structs to be aligned to that amount inside a buffer, IF that
    buffer is allocated with that alignment (which is not necessarily
    the case for buffers accessed directly inside a `FlatBufferBuilder`).
-   `bit_flags` (on an enum): the values of this field indicate bits,
    meaning that any value N specified in the schema will end up
    representing 1<<N, or if you don't specify values at all, you'll get
    the sequence 1, 2, 4, 8, ...
-   `nested_flatbuffer: "table_name"` (on a field): this indicates that the field
    (which must be a vector of ubyte) contains flatbuffer data, for which the
    root type is given by `table_name`. The generated code will then produce
    a convenient accessor for the nested FlatBuffer.
-   `key` (on a field): this field is meant to be used as a key when sorting
    a vector of the type of table it sits in. Can be used for in-place
    binary search.
-   `hash` (on a field). This is an (un)signed 32/64 bit integer field, whose
    value during JSON parsing is allowed to be a string, which will then be
    stored as its hash. The value of the attribute is the hashing algorithm to
    use, one of `fnv1_32` `fnv1_64` `fnv1a_32` `fnv1a_64`.
-   `original_order` (on a table): since elements in a table do not need
    to be stored in any particular order, they are often optimized for
    space by sorting them by size. This attribute stops that from happening.
    There should generally not be any reason to use this flag.
-   `native_*`. Several attributes have been added to support the [C++ object
    based API](@ref flatbuffers_cpp_object_based_api). All such attributes
    are prefixed with the term "native_".


## JSON Parsing

The same parser that parses the schema declarations above is also able
to parse JSON objects that conform to this schema. So, unlike other JSON
parsers, this parser is strongly typed, and parses directly into a FlatBuffer
(see the compiler documentation on how to do this from the command line, or
the C++ documentation on how to do this at runtime).

Besides needing a schema, there are a few other changes to how it parses
JSON:

-   It accepts field names with and without quotes, like many JSON parsers
    already do. It outputs them without quotes as well, though it can be made
    to output them with quotes using the `strict_json` flag.
-   If a field has an enum type, the parser will recognize symbolic enum
    values (with or without quotes) instead of numbers, e.g.
    `field: EnumVal`. If a field is of integral type, you can still use
    symbolic names, but values need to be prefixed with their type and
    need to be quoted, e.g. `field: "Enum.EnumVal"`. For enums
    representing flags, you may place multiple inside a string
    separated by spaces to OR them, e.g.
    `field: "EnumVal1 EnumVal2"` or `field: "Enum.EnumVal1 Enum.EnumVal2"`.
-   Similarly, for unions, these need to be specified with two fields much like
    you do when serializing from code. E.g. for a field `foo`, you must
    add a field `foo_type: FooOne` right before the `foo` field, where
    `FooOne` would be the table out of the union you want to use.
-   A field that has the value `null` (e.g. `field: null`) is intended to
    have the default value for that field (thus has the same effect as if
    that field wasn't specified at all).
-   It has some built in conversion functions, so you can write for example
    `rad(180)` wherever you'd normally write `3.14159`.
    Currently supports the following functions: `rad`, `deg`, `cos`, `sin`,
    `tan`, `acos`, `asin`, `atan`.

When parsing JSON, it recognizes the following escape codes in strings:

-   `\n` - linefeed.
-   `\t` - tab.
-   `\r` - carriage return.
-   `\b` - backspace.
-   `\f` - form feed.
-   `\"` - double quote.
-   `\\` - backslash.
-   `\/` - forward slash.
-   `\uXXXX` - 16-bit unicode code point, converted to the equivalent UTF-8
    representation.
-   `\xXX` - 8-bit binary hexadecimal number XX. This is the only one that is
    not in the JSON spec (see http://json.org/), but is needed to be able to
    encode arbitrary binary in strings to text and back without losing
    information (e.g. the byte 0xFF can't be represented in standard JSON).
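
To make the difference between `\uXXXX` and `\xXX` concrete, here is a small
Python sketch of a decoder for the escape codes listed above. It is a toy
written for this guide, not the parser `flatc` uses; the function name
`unescape` is hypothetical, and surrogate pairs are deliberately not handled.

```python
def unescape(text: str) -> bytes:
    """Decode the escape codes listed above into raw bytes (toy sketch)."""
    simple = {"n": b"\n", "t": b"\t", "r": b"\r", "b": b"\b", "f": b"\f",
              '"': b'"', "\\": b"\\", "/": b"/"}
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling backslash")
        code = text[i + 1]
        if code in simple:
            out += simple[code]
            i += 2
        elif code == "u":
            digits = text[i + 2:i + 6]
            if len(digits) != 4:
                raise ValueError("\\u needs 4 hex digits")
            # A 16-bit code point, stored as its UTF-8 encoding.
            out += chr(int(digits, 16)).encode("utf-8")
            i += 6
        elif code == "x":
            digits = text[i + 2:i + 4]
            if len(digits) != 2:
                raise ValueError("\\x needs 2 hex digits")
            # One raw byte, written as-is (the result may not be valid UTF-8).
            out.append(int(digits, 16))
            i += 4
        else:
            raise ValueError(f"unknown escape \\{code}")
    return bytes(out)


print(unescape(r"\u00e9"))  # two bytes: the UTF-8 encoding of U+00E9
print(unescape(r"\xFF"))    # one byte: 0xFF
```

Note how `\u00e9` expands to two bytes while `\xFF` stays a single byte, which
is why the latter can carry arbitrary binary data.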

It also generates these escape codes back again when generating JSON from a
binary representation.

## Guidelines

### Efficiency

FlatBuffers is all about efficiency, but to realize that efficiency you
require an efficient schema. There are usually multiple choices on
how to represent data that have vastly different size characteristics.

It is very common nowadays to represent any kind of data as dictionaries
(as in e.g. JSON), because of its flexibility and extensibility. While
it is possible to emulate this in FlatBuffers (as a vector
of tables with key and value(s)), this is a bad match for a strongly
typed system like FlatBuffers, leading to relatively large binaries.
FlatBuffer tables are more flexible than classes/structs in most systems,
since having a large number of fields only a few of which are actually
used is still efficient. You should thus try to organize your data
as much as possible such that you can use tables where you might be
tempted to use a dictionary.

Similarly, strings as values should only be used when they are
truly open-ended. If you can, always use an enum instead.

FlatBuffers doesn't have inheritance, so the way to represent a set
of related data structures is a union. Unions do have a cost however,
so an alternative to a union is to have a single table that has
all the fields of all the data structures you are trying to
represent, if they are relatively similar / share many fields.
Again, this is efficient because optional fields are cheap.

FlatBuffers supports the full range of integer sizes, so try to pick
the smallest size needed, rather than defaulting to int/long.

Remember that you can share data (refer to the same string/table
within a buffer), so factoring out repeating data into its own
data structure may be worth it.
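
The advice on integer sizes can be turned into a quick check. The Python
sketch below picks the narrowest schema integer type for a known value range,
using the type names and widths from the "Types" section. The helper
`smallest_int_type` is hypothetical and exists only for illustration; it is
not part of `flatc`.

```python
# (name, bits, signed) for the integer scalars listed in the "Types" section.
INT_TYPES = [
    ("ubyte", 8, False), ("byte", 8, True),
    ("ushort", 16, False), ("short", 16, True),
    ("uint", 32, False), ("int", 32, True),
    ("ulong", 64, False), ("long", 64, True),
]


def smallest_int_type(lo: int, hi: int) -> str:
    """Return the narrowest schema integer type holding every value in [lo, hi]."""
    for name, bits, signed in INT_TYPES:
        tmin = -(1 << (bits - 1)) if signed else 0
        tmax = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
        if tmin <= lo and hi <= tmax:
            return name
    raise ValueError("range does not fit in 64 bits")


print(smallest_int_type(0, 150))         # ubyte
print(smallest_int_type(-40000, 40000))  # int
```

Since field types generally cannot be changed once data exists, it is worth
feeding this kind of check the largest range you expect in the future, not
just the values you have today.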

### Style guide

Identifiers in a schema are meant to translate to many different programming
languages, so using the style of your "main" language is generally a bad idea.

For this reason, below is a suggested style guide to adhere to, to keep schemas
consistent for interoperation regardless of the target language.

Where possible, the code generators for specific languages will generate
identifiers that adhere to the language style, based on the schema identifiers.

-   Table, struct, enum and rpc names (types): UpperCamelCase.
-   Table and struct field names: snake_case. This is translated to lowerCamelCase
    automatically for some languages, e.g. Java.
-   Enum values: UpperCamelCase.
-   Namespaces: UpperCamelCase.

Formatting (this is less important, but still worth adhering to):

-   Opening brace: on the same line as the start of the declaration.
-   Spacing: Indent by 2 spaces. None around `:` for types, on both sides for `=`.

For an example, see the schema at the top of this file.

## Gotchas

### Schemas and version control

FlatBuffers relies on new field declarations being added at the end, and earlier
declarations to not be removed, but be marked deprecated when needed. We think
this is an improvement over the manual number assignment that happens in
Protocol Buffers (and which is still an option using the `id` attribute
mentioned above).

One place where this is possibly problematic however is source control. If user
A adds a field, generates new binary data with this new schema, then tries to
commit both to source control after user B already committed a new field also,
and just auto-merges the schema, the binary files are now invalid compared to
the new schema.

The solution of course is that you should not be generating binary data before
your schema changes have been committed, ensuring consistency with the rest of
the world.
If this is not practical for you, use explicit field ids, which
should always generate a merge conflict if two people try to allocate the same
id.

### Schema evolution examples

Some examples to clarify what happens as you change a schema:

If we have the following original schema:

    table { a:int; b:int; }

And we extend it:

    table { a:int; b:int; c:int; }

This is ok. Code compiled with the old schema reading data generated with the
new one will simply ignore the presence of the new field. Code compiled with the
new schema reading old data will get the default value for `c` (which is 0
in this case, since it is not specified).

    table { a:int (deprecated); b:int; }

This is also ok. Code compiled with the old schema reading newer data will now
always get the default value for `a` since it is not present. Code compiled
with the new schema now cannot read nor write `a` anymore (any existing code
that tries to do so will result in compile errors), but can still read
old data (it will ignore the field).

    table { c:int; a:int; b:int; }

This is NOT ok, as this makes the schemas incompatible. Old code reading newer
data will interpret `c` as if it was `a`, and new code reading old data
accessing `a` will instead receive `b`.

    table { c:int (id: 2); a:int (id: 0); b:int (id: 1); }

This is ok. If your intent was to order/group fields in a way that makes sense
semantically, you can do so using explicit id assignment. Now we are compatible
with the original schema, and the fields can be ordered in any way, as long as
we keep the sequence of ids.

    table { b:int; }

NOT ok. We can only remove a field by deprecation, regardless of whether we use
explicit ids or not.

    table { a:uint; b:uint; }

This is MAYBE ok, and only in the case where the type change is the same size,
like here.
If old data never contained any negative numbers, this will be
safe to do.

    table { a:int = 1; b:int = 2; }

Generally NOT ok. Any older data written that had 0 values were not written to
the buffer, and rely on the default value to be recreated. These will now have
those values appear to be `1` and `2` instead. There may be cases in which this
is ok, but care must be taken.

    table { aa:int; bb:int; }

Occasionally ok. You've renamed fields, which will break all code (and JSON
files!) that use this schema, but as long as the change is obvious, this is not
incompatible with the actual binary buffers, since those only ever address
fields by id/offset.

### Testing whether a field is present in a table

Most serialization formats (e.g. JSON or Protocol Buffers) make it very
explicit in the format whether a field is present in an object or not,
allowing you to use this as "extra" information.

In FlatBuffers, this also holds for everything except scalar values.

FlatBuffers by default will not write fields that are equal to the default
value (for scalars), sometimes resulting in significant space savings.

However, this also means testing whether a field is "present" is somewhat
meaningless, since it does not tell you if the field was actually written by
calling `add_field` style calls, unless you're only interested in this
information for non-default values.

Some `FlatBufferBuilder` implementations have an option called `force_defaults`
that circumvents this behavior, and writes fields even if they are equal to
the default. You can then use `IsFieldPresent` to query this.

Another option that works in all languages is to wrap a scalar field in a
struct. This way it will return null if it is not present. The cool thing
is that structs don't take up any more space than the scalar they represent.
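
The interaction between defaults and presence can be modelled in a few lines.
The Python sketch below is a toy stand-in, not the FlatBuffers API: the class
`ToyTableBuilder` and its methods are hypothetical, and the defaults are taken
from the `Monster` example at the top of this file.

```python
class ToyTableBuilder:
    """Toy model of a table writer: scalars equal to their default are skipped."""

    def __init__(self, defaults, force_defaults=False):
        self.defaults = defaults              # field name -> schema default
        self.force_defaults = force_defaults
        self.stored = {}                      # fields actually "written"

    def add(self, name, value):
        # Mirror the behaviour described above: a value equal to the default
        # is not written unless force_defaults is set.
        if self.force_defaults or value != self.defaults[name]:
            self.stored[name] = value

    def is_present(self, name):
        return name in self.stored

    def get(self, name):
        # Readers fall back to the schema default for fields that are absent.
        return self.stored.get(name, self.defaults[name])


monster_defaults = {"hp": 100, "mana": 150}

plain = ToyTableBuilder(monster_defaults)
plain.add("hp", 100)    # equals the default, so nothing is stored
plain.add("mana", 75)   # differs from the default, so it is stored

forced = ToyTableBuilder(monster_defaults, force_defaults=True)
forced.add("hp", 100)   # stored even though it equals the default

print(plain.is_present("hp"), plain.get("hp"))
print(forced.is_present("hp"), forced.get("hp"))
```

In the default mode, `hp` reads back as 100 either way, so a reader cannot
tell an explicit `add` of the default apart from an omitted field.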

   [Interface Definition Language]: https://en.wikipedia.org/wiki/Interface_description_language