FlatBuffers white paper    {#flatbuffers_white_paper}
=======================

This document tries to shed some light on the "why" of FlatBuffers, a
new serialization library.

## Motivation

Back in the good old days, performance was all about instructions and
cycles. Nowadays, processing units have run so far ahead of the memory
subsystem that making an efficient application should start and finish
with thinking about memory. How much of it you use. How you lay it out
and access it. How you allocate it. When you copy it.

Serialization is a pervasive activity in a lot of programs, and a common
source of memory inefficiency, with lots of temporary data structures
needed to parse and represent data, and poor allocation patterns and
locality.

If it were possible to do serialization with no temporary objects,
no additional allocation, no copying, and good locality, it would be
of great value. Serialization systems usually don't manage this because
it runs counter to forwards/backwards compatibility and platform
specifics like endianness and alignment.

FlatBuffers is what you get if you try anyway.

In particular, FlatBuffers' focus is on mobile hardware (where memory
size and memory bandwidth are even more constrained than on desktop
hardware), and on applications that have the highest performance needs:
games.

## FlatBuffers

*This is a summary of FlatBuffers functionality, with some rationale.
A more detailed description can be found in the FlatBuffers
documentation.*

### Summary

A FlatBuffer is a binary buffer containing nested objects (structs,
tables, vectors, etc.) organized using offsets so that the data can be
traversed in place, just like any pointer-based data structure. Unlike
most in-memory data structures, however, it uses strict rules of
alignment and endianness (always little) to ensure these buffers are
cross-platform. Additionally, for objects that are tables, FlatBuffers
provides forwards/backwards compatibility and general optionality of
fields, to support most forms of format evolution.

You define your object types in a schema, which can then be compiled to
C++ or Java for low to zero overhead reading and writing.
Optionally, JSON data can be dynamically parsed into buffers.
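
The in-place traversal described in the summary can be illustrated with a
minimal C++ sketch. It is not the FlatBuffers API: `ReadU32LE` and
`FollowOffset` are hypothetical helpers, and the sketch assumes that an
offset is a 32-bit little-endian value measured from the position where the
offset itself is stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read a 32-bit unsigned integer stored in little-endian byte order.
// Assembling it byte by byte yields the same value on any host, whatever
// the host's native endianness.
inline std::uint32_t ReadU32LE(const std::uint8_t* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Follow the offset stored at byte position `pos` and return the position
// of the object it refers to. Assumption of this sketch: the stored value
// is the distance from `pos` to the target. No data is copied or parsed;
// the caller keeps working directly on the original buffer.
inline std::size_t FollowOffset(const std::uint8_t* buf, std::size_t buf_size,
                                std::size_t pos) {
  assert(pos + 4 <= buf_size);  // the offset itself must lie in the buffer
  const std::size_t target = pos + ReadU32LE(buf + pos);
  assert(target < buf_size);    // and so must the object it points at
  return target;
}
```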

### Tables

Tables are the cornerstone of FlatBuffers, since format evolution is
essential for most applications of serialization. Most serialization
solutions deal with format changes transparently during parsing.
But a FlatBuffer isn't parsed before it is accessed.

Tables get around this by using an extra indirection to access fields,
through a *vtable*. Each table comes with a vtable (which may be shared
between multiple tables with the same layout) that records where the
fields of this particular kind of table instance are stored.
The vtable may also indicate that a field is not present (because this
FlatBuffer was written with an older version of the software, or simply
because the information was not necessary for this instance, or deemed
deprecated), in which case a default value is returned.
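
The vtable lookup described above can be sketched in C++ as follows. This is
a simplified model and not the actual FlatBuffers binary layout or API: the
`VTable` and `TableView` types and `GetInt32Field` are hypothetical, and the
sketch assumes that a vtable entry of 0 marks an absent field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a vtable: one 16-bit entry per field id, giving the
// field's byte offset inside the table data, or 0 if the field is absent.
struct VTable {
  std::vector<std::uint16_t> field_offsets;
};

// A table instance: a (possibly shared) vtable plus this instance's data.
struct TableView {
  const VTable* vtable;
  const std::uint8_t* data;
};

// Decode a little-endian 32-bit integer without relying on host endianness.
inline std::int32_t ReadI32LE(const std::uint8_t* p) {
  const std::uint32_t bits = static_cast<std::uint32_t>(p[0]) |
                             (static_cast<std::uint32_t>(p[1]) << 8) |
                             (static_cast<std::uint32_t>(p[2]) << 16) |
                             (static_cast<std::uint32_t>(p[3]) << 24);
  return static_cast<std::int32_t>(bits);
}

// Field access costs one extra indirection through the vtable.
inline std::int32_t GetInt32Field(const TableView& t, std::size_t field_id,
                                  std::int32_t default_value) {
  // A field added by a newer schema lies past the end of an older vtable.
  if (field_id >= t.vtable->field_offsets.size()) return default_value;
  const std::uint16_t offset = t.vtable->field_offsets[field_id];
  // Offset 0 means the writer did not store this field.
  if (offset == 0) return default_value;
  return ReadI32LE(t.data + offset);
}
```

In this model, a reader built against a newer schema can ask for a field
that an older writer never knew about and simply receives the default.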

Tables have a low overhead in memory (since vtables are small and
shared) and in access cost (an extra indirection), but provide great
flexibility. Tables may even cost less memory than the equivalent
struct, since fields do not need to be stored when they are equal to
their default.

FlatBuffers additionally offers "naked" structs, which do not offer
forwards/backwards compatibility, but can be even smaller (useful for
very small objects that are unlikely to change, such as a coordinate
pair or an RGBA color).
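
For contrast with tables, a "naked" struct can be pictured as a fixed block
of bytes with no vtable and no optional fields. The following C++ sketch
uses a hypothetical `Color` type and `ReadColor` helper; it illustrates the
idea and is not code generated by FlatBuffers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical fixed-layout type, in the spirit of the RGBA color example.
// Every field is always present and always at the same position.
struct Color {
  std::uint8_t r, g, b, a;
};

// The layout is part of the format, so check it at compile time.
static_assert(sizeof(Color) == 4, "Color must occupy exactly four bytes");
static_assert(std::is_trivially_copyable<Color>::value,
              "Color must be safe to copy with memcpy");

// Read a Color from its position in a buffer. There is no vtable lookup
// and no default handling; the four bytes are the whole object.
inline Color ReadColor(const std::uint8_t* p) {
  assert(p != nullptr);
  Color c;
  std::memcpy(&c, p, sizeof(c));
  return c;
}
```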

### Schemas

While schemas reduce some generality (you can't just read any data
without having its schema), they have a lot of upsides:

-   Most information about the format can be factored into the generated
    code, reducing memory needed to store data, and time to access it.

-   The strong typing of the data definitions means less error
    checking/handling at runtime (less can go wrong).

-   A schema enables us to access a buffer without parsing.

FlatBuffer schemas are fairly similar to those of the incumbent,
Protocol Buffers, and generally should be readable to those familiar
with the C family of languages. We chose to improve upon the features
offered by .proto files in the following ways:

-   Deprecation of fields instead of manual field id assignment.
    Extending an object in a .proto means hunting for a free slot among
    the numbers (preferring lower numbers since they have a more compact
    representation). Besides being inconvenient, it also makes removing
    fields problematic: you either keep a field, which doesn't make it
    obvious that it shouldn't be read or written anymore and still
    generates accessors, or you remove it, and risk that old data using
    that field is still around by the time someone reuses its field id,
    with nasty consequences.

-   Differentiating between tables and structs (see above). Effectively
    all table fields are `optional`, and all struct fields are
    `required`.

-   Having a native vector type instead of `repeated`. This gives you a
    length without having to collect all items, and in the case of
    scalars provides for a more compact representation, and one that
    guarantees adjacency.

-   Having a native `union` type instead of using a series of `optional`
    fields, all of which must be checked individually.

-   Being able to define defaults for all scalars, instead of having to
    deal with their optionality at each access.

-   A parser that can deal with both schemas and data definitions (JSON
    compatible) uniformly.
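
To make the native vector point from the list above concrete, here is a
minimal C++ sketch of a length-prefixed vector of scalars. The
`U16VectorView` class and its helpers are hypothetical, and the layout (a
32-bit little-endian element count followed directly by the elements) is an
assumption of this sketch rather than something this document specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Little-endian decoders that do not depend on the host's byte order.
inline std::uint16_t LoadU16(const std::uint8_t* p) {
  return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t LoadU32(const std::uint8_t* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Read-only view over a vector of 16-bit scalars stored in a buffer.
class U16VectorView {
 public:
  explicit U16VectorView(const std::uint8_t* start) : start_(start) {}

  // The length is stored up front, so no items need to be visited.
  std::uint32_t size() const { return LoadU32(start_); }

  // Elements are adjacent, so element i sits at a fixed, computable position.
  std::uint16_t at(std::uint32_t i) const {
    assert(i < size());
    return LoadU16(start_ + 4 + 2 * static_cast<std::size_t>(i));
  }

 private:
  const std::uint8_t* start_;
};
```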