μpb Design
----------

μpb has the following design goals:

- C89 compatible.
- small code size (both for the core library and generated messages).
- fast performance (hundreds of MB/s).
- idiomatic for C programs.
- easy to wrap in high-level languages (Python, Ruby, Lua, etc.) with
  good performance and all standard protobuf features.
- hands-off about memory management, allowing for easy integration
  with existing VMs and/or garbage collectors.
- offers binary ABI compatibility between apps, generated messages, and
  the core library (doesn't require re-generating messages or recompiling
  your application when the core library changes).
- provides all features that users expect from a protobuf library
  (generated messages in C, reflection, text format, etc.).
- layered, so the core is small and doesn't require descriptors.
- tidy about symbol references, so that any messages or features that
  aren't used by a C program can have their code GC'd by the linker.
- possible to use protobuf binary format without leaking message/field
  names into the binary.

μpb accomplishes these goals by keeping a very small core that does not contain
descriptors. We need some way of knowing what fields are in each message and
where they live, but instead of descriptors, we keep a small/lightweight summary
of the .proto file. We call this a `upb_msglayout`. It contains the bare
minimum of what we need to know to parse and serialize protobuf binary format
into our internal representation for messages, `upb_msg`.

The core then contains functions to parse/serialize a message, given a `upb_msg*`
and a `const upb_msglayout*`.

This approach is similar to [nanopb](https://github.com/nanopb/nanopb), which
also compiles message definitions to a compact, internal representation without
names. However, nanopb does not aim to be a fully-featured library, and has no
support for text format, JSON, or descriptors. μpb is unique in that it has a
small core similar to nanopb (though not quite as small), but also offers a
full-featured protobuf library for applications that want reflection, text
format, JSON format, etc.

Without descriptors, the core doesn't have access to field names, so it cannot
parse/serialize to protobuf text format or JSON. Instead, this functionality
lives in separate modules that depend on the module implementing descriptors.
With the descriptor module we can parse/serialize binary descriptors and
validate that they follow all the rules of protobuf schemas.

To provide binary compatibility, we version the structs that generated messages
use to create a `upb_msglayout*`. The current initializers are
`upb_msglayout_msginit_v1`, `upb_msglayout_fieldinit_v1`, etc. Then
`upb_msglayout*` uses these as its internal representation. If upb changes its
internal representation for a `upb_msglayout*`, it will also include code to
convert the old representation to the new representation. This will use some
more memory/CPU at runtime to convert between the two, but apps that statically
link μpb will never need to worry about this.

TODO
----

1. Revise our generated code until it is in a state where we feel comfortable
   committing to API/ABI stability for it. In particular there is an open
   question of whether non-ABI-compatible field accesses should have a
   fastpath different from the ABI-compatible field access.
1. Add missing features (maps, extensions, unknown fields).
1. Flesh out C++ wrappers.
1. *(lower-priority)*: Revise all of the existing encoders/decoders and
   handlers. We probably will want to keep handlers, since they let us decouple
   encoders/decoders from `upb_msg`, but we need to simplify all of that a LOT.
   Likely we will want to make handlers only per-message instead of per-field,
   except for variable-length fields.
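
Example: a descriptor-free layout table
---------------------------------------

To make the idea of a nameless layout summary from the design section more concrete, here is a minimal sketch of a table-driven decoder. Everything in it is hypothetical: `demo_field`, `demo_layout`, `demo_point`, and `demo_decode` are illustration-only names, not μpb's actual types or functions, and the real `upb_msglayout` very likely carries more information (field types, submessages, and so on). The sketch assumes a message `Point { uint32 x = 1; uint32 y = 2; }` and handles only varint-encoded fields.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <string.h>  /* memcpy */

/* Hypothetical names throughout; these are NOT upb's real types. */

/* One entry per field: the field number and where its value lives. */
typedef struct {
  unsigned long number;  /* protobuf field number */
  size_t offset;         /* byte offset of the value inside the message */
} demo_field;

/* The "layout": a summary of one message type that contains no names. */
typedef struct {
  const demo_field *fields;
  size_t field_count;
} demo_layout;

/* In-memory form of: message Point { uint32 x = 1; uint32 y = 2; } */
typedef struct {
  unsigned long x;
  unsigned long y;
} demo_point;

static const demo_field demo_point_fields[] = {
  {1, offsetof(demo_point, x)},
  {2, offsetof(demo_point, y)}
};

static const demo_layout demo_point_layout = {demo_point_fields, 2};

/* Reads a base-128 varint of at most 5 bytes.
 * Returns the number of bytes consumed, or 0 on error. */
static size_t demo_read_varint(const unsigned char *p, size_t len,
                               unsigned long *out) {
  unsigned long val = 0;
  size_t i;
  for (i = 0; i < len && i < 5; i++) {
    val |= (unsigned long)(p[i] & 0x7f) << (7 * i);
    if (!(p[i] & 0x80)) {
      *out = val;
      return i + 1;
    }
  }
  return 0;
}

/* Decodes `len` bytes at `buf` into `msg`, guided only by `layout`.
 * Only wire type 0 (varint) is handled. Returns 1 on success, 0 on error. */
static int demo_decode(const unsigned char *buf, size_t len, void *msg,
                       const demo_layout *layout) {
  while (len > 0) {
    unsigned long tag, val;
    size_t n, i;

    n = demo_read_varint(buf, len, &tag);
    if (n == 0) return 0;
    buf += n;
    len -= n;

    if ((tag & 7) != 0) return 0;  /* other wire types are out of scope here */

    n = demo_read_varint(buf, len, &val);
    if (n == 0) return 0;
    buf += n;
    len -= n;

    /* Look the field number up in the layout and store the value at its
     * offset. Field numbers missing from the layout are silently dropped. */
    for (i = 0; i < layout->field_count; i++) {
      if (layout->fields[i].number == (tag >> 3)) {
        memcpy((char *)msg + layout->fields[i].offset, &val, sizeof(val));
        break;
      }
    }
  }
  return 1;
}
```

Calling `demo_decode(buf, len, &point, &demo_point_layout)` on the bytes `08 96 01 10 07` would set `x` to 150 and `y` to 7. The decoder never sees the strings "x" or "y", which is the property that keeps field names out of the binary; text format and JSON need those names, so they have to live in a separate layer that has descriptors.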