======================
Nanopb: Basic concepts
======================

.. include:: menu.rst

This page outlines the underlying concepts of the nanopb design.

.. contents::

Proto files
===========
All Protocol Buffers implementations use .proto files to describe the message
format. The point of these files is to be a portable interface description
language.

Compiling .proto files for nanopb
---------------------------------
Nanopb uses Google's protoc compiler to parse the .proto file, and then a
Python script to generate the C header and source code from it::

    user@host:~$ protoc -omessage.pb message.proto
    user@host:~$ python ../generator/nanopb_generator.py message.pb
    Writing to message.h and message.c
    user@host:~$

Modifying generator behaviour
-----------------------------
Using generator options, you can set maximum sizes for fields in order to
allocate them statically. The preferred way to do this is to create an .options
file with the same name as your .proto file::

   // Foo.proto
   message Foo {
      required string name = 1;
   }

::

   # Foo.options
   Foo.name max_size:16

For more information on this, see the `Proto file options`_ section in the
reference manual.

.. _`Proto file options`: reference.html#proto-file-options

Streams
=======

Nanopb uses streams for accessing the data in encoded format.
The stream abstraction is very lightweight, and consists of a structure (*pb_ostream_t* or *pb_istream_t*) which contains a pointer to a callback function.

There are a few generic rules for callback functions:

#) Return false on IO errors. The encoding or decoding process will abort immediately.
#) Use *state* to store your own data, such as a file descriptor.
#) *bytes_written* and *bytes_left* are updated by *pb_write* and *pb_read*.
#) Your callback may be used with substreams. In this case *bytes_left*, *bytes_written* and *max_size* have smaller values than the original stream. Don't use these values to calculate pointers.
#) Always read or write the full requested length of data. For example, POSIX *recv()* needs the *MSG_WAITALL* flag to accomplish this.
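
Rule 5 matters most for file descriptors and sockets, where a single *read()* may return fewer bytes than requested. The following is a minimal sketch, assuming a POSIX system; the helper name *read_full* is hypothetical and not part of nanopb. An input stream callback could call it with a descriptor kept in *state*::

    #include <errno.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <unistd.h>

    /* Hypothetical helper: read exactly count bytes from fd.
     * Returns false on EOF or on an I/O error, as rule 1 requires. */
    static bool read_full(int fd, uint8_t *buf, size_t count)
    {
        while (count > 0)
        {
            ssize_t n = read(fd, buf, count);
            if (n < 0 && errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            if (n <= 0)
                return false;       /* EOF (n == 0) or I/O error */
            buf += n;               /* short read: keep going */
            count -= (size_t)n;
        }
        return true;
    }

The loop only returns true once every requested byte has arrived, so the decoder never sees a partially filled buffer.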

Output streams
--------------

::

 struct _pb_ostream_t
 {
    bool (*callback)(pb_ostream_t *stream, const uint8_t *buf, size_t count);
    void *state;
    size_t max_size;
    size_t bytes_written;
 };

The *callback* for an output stream may be NULL, in which case the stream simply counts the number of bytes written. In this case, *max_size* is ignored.

Otherwise, if *bytes_written* + bytes_to_be_written is larger than *max_size*, *pb_write* returns false before doing anything else. If you don't want to limit the size of the stream, pass SIZE_MAX.

**Example 1:**

This is the way to get the size of the message without storing it anywhere::

 Person myperson = ...;             /* the message to measure */
 pb_ostream_t sizestream = {0};     /* NULL callback: only count bytes */
 if (pb_encode(&sizestream, Person_fields, &myperson))
     printf("Encoded size is %zu\n", sizestream.bytes_written);

**Example 2:**

Writing to stdout::

 bool callback(pb_ostream_t *stream, const uint8_t *buf, size_t count)
 {
    FILE *file = (FILE*) stream->state;   /* state holds the output FILE* */
    return fwrite(buf, 1, count, file) == count;
 }

 pb_ostream_t stdoutstream = {&callback, stdout, SIZE_MAX, 0};
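
Output to a fixed memory buffer is another common case; nanopb ships *pb_ostream_from_buffer()* for it. The sketch below models the rules above in plain C so that it compiles without the nanopb headers: *sketch_ostream* mirrors the *pb_ostream_t* declaration shown earlier, and *sketch_write* is a hypothetical stand-in for *pb_write* that applies the NULL-callback counting mode and the *max_size* check::

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Stand-in for pb_ostream_t, copied from the declaration above. */
    typedef struct sketch_ostream sketch_ostream;
    struct sketch_ostream
    {
        bool (*callback)(sketch_ostream *stream, const uint8_t *buf, size_t count);
        void *state;
        size_t max_size;
        size_t bytes_written;
    };

    /* Hypothetical model of pb_write: a NULL callback only counts bytes;
     * otherwise a write past max_size fails before anything is written. */
    static bool sketch_write(sketch_ostream *stream, const uint8_t *buf, size_t count)
    {
        if (stream->callback != NULL)
        {
            if (count > stream->max_size - stream->bytes_written)
                return false;
            if (!stream->callback(stream, buf, count))
                return false;
        }
        stream->bytes_written += count;
        return true;
    }

    /* Callback that appends to a caller-supplied array kept in state. */
    static bool buffer_callback(sketch_ostream *stream, const uint8_t *buf, size_t count)
    {
        uint8_t *dest = (uint8_t *)stream->state;
        memcpy(dest + stream->bytes_written, buf, count);
        return true;
    }

Because the size check happens before the callback runs, a rejected write leaves the buffer and *bytes_written* untouched.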

Input streams
-------------
For input streams, there is one extra rule:

#) You don't need to know the length of the message in advance. After getting an EOF error when reading, set *bytes_left* to 0 and return false. *pb_decode* will detect this and, if the EOF was in a proper position, return true.

Here is the structure::

 struct _pb_istream_t
 {
    bool (*callback)(pb_istream_t *stream, uint8_t *buf, size_t count);
    void *state;
    size_t bytes_left;
 };

The *callback* must always be a function pointer. *bytes_left* is an upper limit on the number of bytes that will be read. You can use SIZE_MAX if your callback handles EOF as described above.

**Example:**

This function binds an input stream to stdin:

::

 bool callback(pb_istream_t *stream, uint8_t *buf, size_t count)
 {
    FILE *file = (FILE*)stream->state;
    bool status;

    if (buf == NULL)
    {
        /* Skip count bytes; fail if EOF comes first */
        while (count > 0 && fgetc(file) != EOF)
            count--;
        return count == 0;
    }

    status = (fread(buf, 1, count, file) == count);

    if (feof(file))
        stream->bytes_left = 0;   /* signal EOF to pb_decode */

    return status;
 }

 pb_istream_t stdinstream = {&callback, stdin, SIZE_MAX};
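
Reading from a memory buffer follows the same pattern; nanopb provides *pb_istream_from_buffer()* for it. As a self-contained sketch, *sketch_istream* below mirrors *pb_istream_t*, while *memory_source* and *memory_callback* are hypothetical names. When the data runs out, the callback signals EOF by setting *bytes_left* to 0 and returning false, as the extra rule for input streams requires::

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Stand-in for pb_istream_t, copied from the declaration above. */
    typedef struct sketch_istream sketch_istream;
    struct sketch_istream
    {
        bool (*callback)(sketch_istream *stream, uint8_t *buf, size_t count);
        void *state;
        size_t bytes_left;
    };

    /* Hypothetical data source: a byte array plus a read position. */
    typedef struct
    {
        const uint8_t *data;
        size_t size;
        size_t pos;
    } memory_source;

    static bool memory_callback(sketch_istream *stream, uint8_t *buf, size_t count)
    {
        memory_source *src = (memory_source *)stream->state;
        size_t available = src->size - src->pos;

        if (count > available)
        {
            /* Not enough data left: report EOF */
            stream->bytes_left = 0;
            return false;
        }
        if (buf != NULL)            /* NULL means "skip count bytes" */
            memcpy(buf, src->data + src->pos, count);
        src->pos += count;
        return true;
    }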

Data types
==========

Most Protocol Buffers datatypes have directly corresponding C datatypes: for example, int32 maps to int32_t, float to float and bool to bool. However, the variable-length datatypes are more complex:

1) Strings, bytes and repeated fields of any type map to callback functions by default.
2) If there is a special option *(nanopb).max_size* specified in the .proto file, string maps to a null-terminated char array and bytes map to a structure containing a byte array and a size field.
3) If *(nanopb).type* is set to *FT_INLINE* and *(nanopb).max_size* is also set, then bytes map to an inline byte array of fixed size.
4) If there is a special option *(nanopb).max_count* specified on a repeated field, it maps to an array of whatever type is being repeated. Another field will be created for the actual number of entries stored.

=============================================================================== =======================
      field in .proto                                                           autogenerated in .h
=============================================================================== =======================
required string name = 1;                                                       pb_callback_t name;
required string name = 1 [(nanopb).max_size = 40];                              char name[40];
repeated string name = 1 [(nanopb).max_size = 40];                              pb_callback_t name;
repeated string name = 1 [(nanopb).max_size = 40, (nanopb).max_count = 5];      | size_t name_count;
                                                                                | char name[5][40];
required bytes data = 1 [(nanopb).max_size = 40];                               | typedef struct {
                                                                                |    size_t size;
                                                                                |    pb_byte_t bytes[40];
                                                                                | } Person_data_t;
                                                                                | Person_data_t data;
required bytes data = 1 [(nanopb).max_size = 40, (nanopb).type = FT_INLINE];    | pb_byte_t data[40];
=============================================================================== =======================

The maximum lengths are checked at runtime. If a string, bytes or array field exceeds the allocated length, *pb_decode* will return false.

Note: for the *bytes* datatype, the field length checking may not be exact.
The compiler may add some padding to the *pb_bytes_t* structure, and the nanopb runtime doesn't know how much of the structure size is padding. Therefore it uses the whole length of the structure for storing data, which is not very smart but shouldn't cause problems. In practice, this means that if you specify *(nanopb).max_size=5* on a *bytes* field, you may be able to store 6 bytes there. For the *string* field type, the length limit is exact.
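
For a *string* field such as *char name[40]*, the exact check can be pictured as follows. This is a sketch, not nanopb's code: *store_string* is a hypothetical helper, and it assumes the null terminator must also fit in the array, so a 40-byte array holds at most 39 characters::

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical model of the runtime length check for a string field:
     * the decoded length plus the null terminator must fit in dest. */
    static bool store_string(char *dest, size_t dest_size,
                             const uint8_t *src, size_t len)
    {
        if (len >= dest_size)
            return false;           /* would overflow: reject the message */
        memcpy(dest, src, len);
        dest[len] = '\0';
        return true;
    }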

Field callbacks
===============
When a field has dynamic length, nanopb cannot statically allocate storage for it. Instead, it allows you to handle the field in whatever way you want, using a callback function.

The `pb_callback_t`_ structure contains a function pointer and a *void* pointer called *arg* that you can use for passing data to the callback. If the function pointer is NULL, the field will be skipped. A pointer to *arg* is passed to the function, so that it can modify *arg* and retrieve the value.

The actual behavior of the callback function is different in encoding and decoding modes. In encoding mode, a single call to the callback should write out everything, including field tags. In decoding mode, the callback is called once for every data item.

.. _`pb_callback_t`: reference.html#pb-callback-t

Encoding callbacks
------------------
::

    bool (*encode)(pb_ostream_t *stream, const pb_field_t *field, void * const *arg);

When encoding, the callback should write out complete fields, including the wire type and field number tag. It can write as many or as few fields as it likes. For example, if you want to write out an array as a *repeated* field, you should do it all in a single call.

Usually you can use `pb_encode_tag_for_field`_ to encode the wire type and tag number of the field. However, if you want to encode a repeated field as a packed array, you must call `pb_encode_tag`_ instead to specify a wire type of *PB_WT_STRING*.

If the callback is used in a submessage, it will be called multiple times during a single call to `pb_encode`_, because the submessage is first encoded once just to calculate its length. In this case, it must produce the same amount of data every time. If the callback is directly in the main message, it is called only once.

.. _`pb_encode`: reference.html#pb-encode
.. _`pb_encode_tag_for_field`: reference.html#pb-encode-tag-for-field
.. _`pb_encode_tag`: reference.html#pb-encode-tag

This callback writes out a dynamically sized string::

    bool write_string(pb_ostream_t *stream, const pb_field_t *field, void * const *arg)
    {
        /* get_string_from_somewhere() stands for your own data source */
        const char *str = get_string_from_somewhere();

        /* Write the tag first; pb_encode_string adds the length and data */
        if (!pb_encode_tag_for_field(stream, field))
            return false;

        return pb_encode_string(stream, (const uint8_t*)str, strlen(str));
    }
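
To illustrate the packed case described above, this is what a packed repeated field looks like on the wire: a tag with wire type 2 (*PB_WT_STRING*), the payload length, then the values as varints. In a real callback you would produce it with *pb_encode_tag()* and *pb_encode_varint()*; the sketch below writes plain bytes instead so that it runs without nanopb, and *put_varint* and *encode_packed* are hypothetical names. Calling it with *out == NULL* only computes the size, much like the sizing pass that makes submessage callbacks run more than once::

    #include <stddef.h>
    #include <stdint.h>

    /* Writes v as a base-128 varint into out (when out is non-NULL)
     * and returns the number of bytes it occupies. */
    static size_t put_varint(uint8_t *out, uint64_t v)
    {
        size_t n = 0;
        do
        {
            uint8_t byte = (uint8_t)(v & 0x7F);
            v >>= 7;
            if (v != 0)
                byte |= 0x80;       /* more bytes follow */
            if (out != NULL)
                out[n] = byte;
            n++;
        } while (v != 0);
        return n;
    }

    /* Encodes values[] as a packed repeated field and returns its size.
     * With out == NULL nothing is written; only the size is computed. */
    static size_t encode_packed(uint8_t *out, uint32_t field_number,
                                const uint32_t *values, size_t count)
    {
        size_t payload = 0, n = 0, i;
        for (i = 0; i < count; i++)
            payload += put_varint(NULL, values[i]);

        n += put_varint(out ? out + n : NULL, ((uint64_t)field_number << 3) | 2);
        n += put_varint(out ? out + n : NULL, payload);
        for (i = 0; i < count; i++)
            n += put_varint(out ? out + n : NULL, values[i]);
        return n;
    }

For the values 1 and 150 in field 4, the result is the five bytes ``22 03 01 96 01``.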

Decoding callbacks
------------------
::

    bool (*decode)(pb_istream_t *stream, const pb_field_t *field, void **arg);

When decoding, the callback receives a length-limited substream that reads the contents of a single field. The field tag has already been read. For *string* and *bytes*, the length value has already been parsed, and is available at *stream->bytes_left*.

The callback will be called multiple times for repeated fields. For packed fields, you can either read multiple values until the stream ends, or leave it to `pb_decode`_ to call your function over and over until all values have been read.

.. _`pb_decode`: reference.html#pb-decode

This callback reads multiple integers and prints them::

    bool read_ints(pb_istream_t *stream, const pb_field_t *field, void **arg)
    {
        /* Read varints until the length-limited substream is exhausted */
        while (stream->bytes_left)
        {
            uint64_t value;
            if (!pb_decode_varint(stream, &value))
                return false;
            printf("%" PRIu64 "\n", value);   /* PRIu64 is from <inttypes.h> */
        }
        return true;
    }
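
The loop in *read_ints* relies on the substream ending exactly at the end of the packed field. The sketch below reproduces that logic without nanopb: *substream* is a hypothetical stand-in for a length-limited *pb_istream_t*, and *get_varint* plays the role of *pb_decode_varint*::

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-in for a length-limited substream. */
    typedef struct
    {
        const uint8_t *data;
        size_t bytes_left;
    } substream;

    /* Reads one varint; fails on truncated or over-long input. */
    static bool get_varint(substream *s, uint64_t *value)
    {
        uint64_t result = 0;
        unsigned shift = 0;
        for (;;)
        {
            uint8_t byte;
            if (s->bytes_left == 0 || shift >= 64)
                return false;
            byte = *s->data++;
            s->bytes_left--;
            result |= (uint64_t)(byte & 0x7F) << shift;
            if (!(byte & 0x80))
                break;              /* last byte of this varint */
            shift += 7;
        }
        *value = result;
        return true;
    }

    /* Consumes values until the substream is empty, like read_ints. */
    static bool read_packed(substream *s, uint64_t *out, size_t max_out, size_t *count)
    {
        *count = 0;
        while (s->bytes_left > 0)
        {
            if (*count == max_out || !get_varint(s, &out[*count]))
                return false;
            (*count)++;
        }
        return true;
    }

Decoding the bytes ``01 96 01`` yields the two values 1 and 150; a truncated varint makes *read_packed* return false, as a real decode callback should.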

Field description array
=======================

To use the *pb_encode* and *pb_decode* functions, you need an array of pb_field_t constants describing the structure you wish to encode or decode. This description is usually autogenerated from the .proto file.

For example, this submessage in the Person.proto file::

 message Person {
    message PhoneNumber {
        required string number = 1 [(nanopb).max_size = 40];
        optional PhoneType type = 2 [default = HOME];
    }
 }

generates this field description array for the structure *Person_PhoneNumber*::

 const pb_field_t Person_PhoneNumber_fields[3] = {
    PB_FIELD(  1, STRING  , REQUIRED, STATIC, Person_PhoneNumber, number, number, 0),
    PB_FIELD(  2, ENUM    , OPTIONAL, STATIC, Person_PhoneNumber, type, number, &Person_PhoneNumber_type_default),
    PB_LAST_FIELD
 };
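
Conceptually, each entry records a tag, a type, and where the field lives inside the C structure. The sketch below is a deliberately simplified model, not nanopb's actual *pb_field_t* layout: *sketch_field*, *PhoneNumberSketch* and *find_field* are hypothetical, and the zero-tag terminator plays the role of *PB_LAST_FIELD*::

    #include <stddef.h>
    #include <stdint.h>

    typedef enum { SKETCH_STRING, SKETCH_ENUM } sketch_type;

    /* Simplified descriptor: tag, type, position and size in the struct */
    typedef struct
    {
        uint32_t tag;
        sketch_type type;
        size_t offset;
        size_t size;
    } sketch_field;

    typedef struct
    {
        char number[40];
        int type;
    } PhoneNumberSketch;

    static const sketch_field PhoneNumberSketch_fields[] = {
        {1, SKETCH_STRING, offsetof(PhoneNumberSketch, number),
         sizeof(((PhoneNumberSketch *)0)->number)},
        {2, SKETCH_ENUM, offsetof(PhoneNumberSketch, type), sizeof(int)},
        {0, SKETCH_STRING, 0, 0}    /* terminator */
    };

    /* Looks up the descriptor for a tag read from the wire. */
    static const sketch_field *find_field(const sketch_field *fields, uint32_t tag)
    {
        for (; fields->tag != 0; fields++)
            if (fields->tag == tag)
                return fields;
        return NULL;                /* unknown field */
    }

With the offset and size in hand, a generic encoder or decoder can reach any field of any message without message-specific code.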

Oneof
=====
Protocol Buffers supports `oneof`_ sections. Here is an example of ``oneof`` usage::

 message MsgType1 {
     required int32 value = 1;
 }

 message MsgType2 {
     required bool value = 1;
 }

 message MsgType3 {
     required int32 value1 = 1;
     required int32 value2 = 2;
 }

 message MyMessage {
     required uint32 uid = 1;
     required uint32 pid = 2;
     required uint32 utime = 3;

     oneof payload {
         MsgType1 msg1 = 4;
         MsgType2 msg2 = 5;
         MsgType3 msg3 = 6;
     }
 }

Nanopb will generate ``payload`` as a C union and add an additional field ``which_payload``::

  typedef struct _MyMessage {
    uint32_t uid;
    uint32_t pid;
    uint32_t utime;
    pb_size_t which_payload;
    union {
        MsgType1 msg1;
        MsgType2 msg2;
        MsgType3 msg3;
    } payload;
  /* @@protoc_insertion_point(struct:MyMessage) */
  } MyMessage;

``which_payload`` indicates which of the ``oneof`` fields is actually set.
The user is expected to set the field manually using the correct field tag::

  MyMessage msg = MyMessage_init_zero;
  msg.payload.msg2.value = true;
  msg.which_payload = MyMessage_msg2_tag;

Notice that neither the ``which_payload`` field nor the unused fields in ``payload``
will consume any space in the resulting encoded message.

.. _`oneof`: https://developers.google.com/protocol-buffers/docs/reference/proto2-spec#oneof_and_oneof_field
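
Code that reads a decoded *MyMessage* should check ``which_payload`` before touching the union, because only the member it names holds valid data. In the sketch below, the type and tag definitions are stand-ins for the generated header (in a real project you would include the generated file instead), and *payload_as_int* is a hypothetical helper::

    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-ins for the generated types; tag values are the field numbers */
    typedef struct { int32_t value; } MsgType1;
    typedef struct { bool value; } MsgType2;
    typedef struct { int32_t value1; int32_t value2; } MsgType3;

    #define MyMessage_msg1_tag 4
    #define MyMessage_msg2_tag 5
    #define MyMessage_msg3_tag 6

    typedef struct
    {
        uint32_t uid;
        uint32_t pid;
        uint32_t utime;
        uint16_t which_payload;     /* pb_size_t in the generated header */
        union
        {
            MsgType1 msg1;
            MsgType2 msg2;
            MsgType3 msg3;
        } payload;
    } MyMessage;

    /* Reads only the union member selected by which_payload. */
    static int32_t payload_as_int(const MyMessage *msg)
    {
        switch (msg->which_payload)
        {
            case MyMessage_msg1_tag: return msg->payload.msg1.value;
            case MyMessage_msg2_tag: return msg->payload.msg2.value ? 1 : 0;
            case MyMessage_msg3_tag: return msg->payload.msg3.value1 + msg->payload.msg3.value2;
            default:                 return 0;   /* no payload set */
        }
    }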

Extension fields
================
Protocol Buffers supports a concept of `extension fields`_, which are
additional fields to a message, but defined outside the actual message.
The definition can even be in a completely separate .proto file.

The base message is declared as extensible by the keyword *extensions* in
the .proto file::

 message MyMessage {
     // ... fields ...
     extensions 100 to 199;
 }

For each extensible message, *nanopb_generator.py* declares an additional
callback field called *extensions*. The field and the associated datatype
*pb_extension_t* form a linked list of handlers. When an unknown field is
encountered, the decoder calls each handler in turn until either one of them
handles the field, or the list is exhausted.

The actual extensions are declared using the *extend* keyword in the .proto,
and are in the global namespace::

 extend MyMessage {
     optional int32 myextension = 100;
 }

For each extension, *nanopb_generator.py* creates a constant of type
*pb_extension_type_t*. To link together the base message and the extension,
you have to:

1. Allocate storage for your field, matching the datatype in the .proto.
   For example, for an *int32* field, you need an *int32_t* variable to store
   the value.
2. Create a *pb_extension_t* constant, with pointers to your variable and
   to the generated *pb_extension_type_t*.
3. Set the *message.extensions* pointer to point to the *pb_extension_t*.

An example of this is available in *tests/test_encode_extensions.c* and
*tests/test_decode_extensions.c*.

.. _`extension fields`: https://developers.google.com/protocol-buffers/docs/proto#extensions
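
The handler chain can be pictured as a plain linked list. This sketch is a simplified model, not nanopb's *pb_extension_t*: *sketch_extension* and *handle_unknown_field* are hypothetical, and each handler here accepts a single *int32* field number, like *myextension* above::

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Simplified handler: the field number it accepts, the storage the
     * user allocated for the value (step 1), and the next handler. */
    typedef struct sketch_extension sketch_extension;
    struct sketch_extension
    {
        uint32_t tag;
        int32_t *dest;
        sketch_extension *next;
    };

    /* Offers an unknown field to each handler in turn. */
    static bool handle_unknown_field(sketch_extension *ext, uint32_t tag, int32_t value)
    {
        for (; ext != NULL; ext = ext->next)
        {
            if (ext->tag == tag)
            {
                *ext->dest = value;
                return true;
            }
        }
        return false;               /* list exhausted: field not handled */
    }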

Message framing
===============
Protocol Buffers does not specify a method of framing the messages for transmission.
This is something that must be provided by the library user, as there is no one-size-fits-all
solution. Typical needs for a framing format are to:

1. Encode the message length.
2. Encode the message type.
3. Perform any synchronization and error checking that may be needed depending on the application.

For example, UDP packets already fulfill all the requirements, and TCP streams typically only
need a way to identify the message length and type. Lower-level interfaces such as serial ports
may need a more robust frame format, such as HDLC (high-level data link control).

Nanopb provides a few helpers to facilitate implementing framing formats:

1. Functions *pb_encode_delimited* and *pb_decode_delimited* prefix the message data with a varint-encoded length.
2. Union messages and oneofs are supported in order to implement top-level container messages.
3. Message IDs can be specified using the *(nanopb_msgopt).msgid* option and can then be accessed from the header.
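
The length prefix described in item 1 is a varint holding the message size, followed by the message bytes. The sketch below frames and unframes a buffer that way in plain C; *frame_write* and *frame_read* are hypothetical helpers, and the message type (requirement 2) is left out::

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Writes a varint length prefix followed by the message.
     * Returns the total frame size, or 0 if out is too small. */
    static size_t frame_write(uint8_t *out, size_t out_size,
                              const uint8_t *msg, size_t len)
    {
        size_t n = 0, v = len;
        do
        {
            if (n == out_size)
                return 0;
            out[n++] = (uint8_t)((v & 0x7F) | (v > 0x7F ? 0x80u : 0u));
            v >>= 7;
        } while (v != 0);
        if (out_size - n < len)
            return 0;
        memcpy(out + n, msg, len);
        return n + len;
    }

    /* Parses one frame. Returns the bytes consumed, or 0 if incomplete. */
    static size_t frame_read(const uint8_t *in, size_t in_size,
                             const uint8_t **msg, size_t *len)
    {
        size_t n = 0, value = 0;
        unsigned shift = 0;
        for (;;)
        {
            if (n == in_size || shift >= 8 * sizeof(size_t))
                return 0;
            value |= (size_t)(in[n] & 0x7F) << shift;
            if (!(in[n++] & 0x80))
                break;              /* end of the length prefix */
            shift += 7;
        }
        if (in_size - n < value)
            return 0;               /* body not fully received yet */
        *msg = in + n;
        *len = value;
        return n + value;
    }

Returning 0 for an incomplete frame lets a TCP receiver simply wait for more data and retry.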

Return values and error handling
================================

Most functions in nanopb return bool: *true* means success, *false* means failure. There is also some support for error messages for debugging purposes: the error messages go in *stream->errmsg*.

The error messages help in guessing the underlying cause of the error. The most common error conditions are:

1) Running out of memory, i.e. stack overflow.
2) Invalid field descriptors (would usually mean a bug in the generator).
3) IO errors in your own stream callbacks.
4) Errors that happen in your callback functions.
5) Exceeding the max_size or bytes_left of a stream.
6) Exceeding the max_size of a string or array field.
7) An invalid Protocol Buffers binary message.
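
In application code the usual pattern is to test the *bool* result and print *stream->errmsg* on failure; nanopb's *pb.h* also defines a *PB_GET_ERROR()* macro for this. Inside your own helpers, the same convention propagates errors by returning false as soon as a step fails. Below is a self-contained sketch of that convention, where *sketch_stream*, *SKETCH_RETURN_ERROR* and *consume* are hypothetical stand-ins (the macro is modelled loosely on nanopb's *PB_RETURN_ERROR*)::

    #include <stdbool.h>
    #include <stddef.h>

    /* Stand-in for the errmsg and bytes_left members of a nanopb stream. */
    typedef struct
    {
        const char *errmsg;
        size_t bytes_left;
    } sketch_stream;

    /* Record the first error message and report failure. */
    #define SKETCH_RETURN_ERROR(stream, msg) \
        do { if ((stream)->errmsg == NULL) (stream)->errmsg = (msg); \
             return false; } while (0)

    static bool consume(sketch_stream *s, size_t count)
    {
        if (count > s->bytes_left)
            SKETCH_RETURN_ERROR(s, "end-of-stream");
        s->bytes_left -= count;
        return true;
    }

    /* Each step returns false on failure; the caller just passes it up. */
    static bool decode_two_fields(sketch_stream *s)
    {
        return consume(s, 2) && consume(s, 4);
    }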