======================
Nanopb: Basic concepts
======================

.. include :: menu.rst

The things outlined here are the underlying concepts of the nanopb design.

.. contents::

Proto files
===========
All Protocol Buffers implementations use .proto files to describe the message
format. The point of these files is to be a portable interface description
language.

Compiling .proto files for nanopb
---------------------------------
Nanopb uses Google's protoc compiler to parse the .proto file, and then a
Python script to generate the C header and source code from it::

    user@host:~$ protoc -omessage.pb message.proto
    user@host:~$ python ../generator/nanopb_generator.py message.pb
    Writing to message.h and message.c
    user@host:~$

Modifying generator behaviour
-----------------------------
Using generator options, you can set maximum sizes for fields in order to
allocate them statically. The preferred way to do this is to create an .options
file with the same name as your .proto file::

    # Foo.proto
    message Foo {
        required string name = 1;
    }

::

    # Foo.options
    Foo.name max_size:16

For more information on this, see the `Proto file options`_ section in the
reference manual.

.. _`Proto file options`: reference.html#proto-file-options

Streams
=======

Nanopb uses streams for accessing the data in encoded format.
The stream abstraction is very lightweight, and consists of a structure (*pb_ostream_t* or *pb_istream_t*) which contains a pointer to a callback function.

There are a few generic rules for callback functions:

#) Return false on IO errors. The encoding or decoding process will abort immediately.
#) Use state to store your own data, such as a file descriptor.
#) *bytes_written* and *bytes_left* are updated by pb_write and pb_read.
#) Your callback may be used with substreams.
   In this case *bytes_left*, *bytes_written* and *max_size* have smaller values than the original stream. Don't use these values to calculate pointers.
#) Always read or write the full requested length of data. For example, POSIX *recv()* needs the *MSG_WAITALL* parameter to accomplish this.

Output streams
--------------

::

    struct _pb_ostream_t
    {
        bool (*callback)(pb_ostream_t *stream, const uint8_t *buf, size_t count);
        void *state;
        size_t max_size;
        size_t bytes_written;
    };

The *callback* for an output stream may be NULL, in which case the stream simply counts the number of bytes written. In this case, *max_size* is ignored.

Otherwise, if *bytes_written* + bytes_to_be_written is larger than *max_size*, pb_write returns false before doing anything else. If you don't want to limit the size of the stream, pass SIZE_MAX.

**Example 1:**

This is the way to get the size of the message without storing it anywhere::

    Person myperson = ...;
    pb_ostream_t sizestream = {0};
    pb_encode(&sizestream, Person_fields, &myperson);
    /* bytes_written is a size_t, so print it with %zu */
    printf("Encoded size is %zu\n", sizestream.bytes_written);

**Example 2:**

Writing to stdout::

    bool callback(pb_ostream_t *stream, const uint8_t *buf, size_t count)
    {
        FILE *file = (FILE*) stream->state;
        return fwrite(buf, 1, count, file) == count;
    }

    pb_ostream_t stdoutstream = {&callback, stdout, SIZE_MAX, 0};

Input streams
-------------
For input streams, there is one extra rule:

#) You don't need to know the length of the message in advance. After getting an EOF error when reading, set bytes_left to 0 and return false. pb_decode will detect this and, if the EOF was in a proper position, it will return true.

Here is the structure::

    struct _pb_istream_t
    {
        bool (*callback)(pb_istream_t *stream, uint8_t *buf, size_t count);
        void *state;
        size_t bytes_left;
    };

The *callback* must always be a function pointer.
*bytes_left* is an upper limit on the number of bytes that will be read. You can use SIZE_MAX if your callback handles EOF as described above.

**Example:**

This function binds an input stream to stdin:

::

    bool callback(pb_istream_t *stream, uint8_t *buf, size_t count)
    {
        FILE *file = (FILE*)stream->state;
        bool status;

        if (buf == NULL)
        {
            /* Skip count bytes; fail if EOF comes first. */
            while (count && fgetc(file) != EOF)
                count--;
            return count == 0;
        }

        status = (fread(buf, 1, count, file) == count);

        if (feof(file))
            stream->bytes_left = 0;

        return status;
    }

    pb_istream_t stdinstream = {&callback, stdin, SIZE_MAX};

Data types
==========

Most Protocol Buffers datatypes have directly corresponding C datatypes: for example, int32 is int32_t, float is float and bool is bool. However, the variable-length datatypes are more complex:

1) Strings, bytes and repeated fields of any type map to callback functions by default.
2) If there is a special option *(nanopb).max_size* specified in the .proto file, string maps to a null-terminated char array and bytes map to a structure containing a char array and a size field.
3) If *(nanopb).fixed_length* is set to *true* and *(nanopb).max_size* is also set, then bytes map to an inline byte array of fixed size.
4) If there is a special option *(nanopb).max_count* specified on a repeated field, it maps to an array of whatever type is being repeated. Another field will be created for the actual number of entries stored.
5) If *(nanopb).fixed_count* is set to *true* and *(nanopb).max_count* is also set, the field for the actual number of entries will not be created, as the count is always assumed to be max count.


=============================================================================== =======================
field in .proto                                                                 autogenerated in .h
=============================================================================== =======================
required string name = 1;                                                       pb_callback_t name;
required string name = 1 [(nanopb).max_size = 40];                              char name[40];
repeated string name = 1 [(nanopb).max_size = 40];                              pb_callback_t name;
repeated string name = 1 [(nanopb).max_size = 40, (nanopb).max_count = 5];      | size_t name_count;
                                                                                | char name[5][40];
required bytes data = 1 [(nanopb).max_size = 40];                               | typedef struct {
                                                                                |    size_t size;
                                                                                |    pb_byte_t bytes[40];
                                                                                | } Person_data_t;
                                                                                | Person_data_t data;
required bytes data = 1 [(nanopb).max_size = 40, (nanopb).fixed_length = true]; | pb_byte_t data[40];
repeated int32 data = 1 [(nanopb).max_count = 5, (nanopb).fixed_count = true];  | int32_t data[5];
=============================================================================== =======================

The maximum lengths are checked at runtime. If a string, bytes field or array exceeds the allocated length, *pb_decode* will return false.

Note: For the *bytes* datatype, the field length checking may not be exact.
The compiler may add some padding to the *pb_bytes_t* structure, and the nanopb runtime doesn't know how much of the structure size is padding. Therefore it uses the whole length of the structure for storing data, which is not very smart but shouldn't cause problems. In practice, this means that if you specify *(nanopb).max_size=5* on a *bytes* field, you may be able to store 6 bytes there. For the *string* field type, the length limit is exact.

Note: When using the *fixed_count* option, the decoder assumes the repeated elements are
received sequentially or that repeated elements for a non-packed field will not be interleaved with
another *fixed_count* non-packed field.

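
The padding effect described in the note on *bytes* fields can be seen with a small stand-alone sketch. The struct below is only a hypothetical stand-in for what the generator might emit for a *bytes* field with *(nanopb).max_size = 5*; the names *demo_bytes5_t*, *demo_declared_capacity* and *demo_capacity_with_padding* are made up for illustration and are not part of nanopb:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a generated bytes struct with max_size = 5.
 * This is not taken from a real nanopb header. */
typedef struct {
    size_t size;
    uint8_t bytes[5];
} demo_bytes5_t;

/* Number of bytes the declaration asks for. */
size_t demo_declared_capacity(void)
{
    return sizeof(((demo_bytes5_t *)0)->bytes);
}

/* Number of bytes available if everything after the size field,
 * trailing padding included, is treated as storage. */
size_t demo_capacity_with_padding(void)
{
    return sizeof(demo_bytes5_t) - offsetof(demo_bytes5_t, bytes);
}
```

On targets where the compiler pads the struct to a multiple of the *size_t* alignment, the second function reports more than the 5 declared bytes. The exact figure depends on the target ABI, which is why the limit for *bytes* fields is described as inexact.
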

Field callbacks
===============
When a field has dynamic length, nanopb cannot statically allocate storage for it. Instead, it allows you to handle the field in whatever way you want, using a callback function.

The `pb_callback_t`_ structure contains a function pointer and a *void* pointer called *arg* you can use for passing data to the callback. If the function pointer is NULL, the field will be skipped. A pointer to the *arg* is passed to the function, so that it can modify it and retrieve the value.

The actual behavior of the callback function is different in encoding and decoding modes. In encoding mode, the callback is called once and should write out everything, including field tags. In decoding mode, the callback is called repeatedly for every data item.

.. _`pb_callback_t`: reference.html#pb-callback-t

Encoding callbacks
------------------
::

    bool (*encode)(pb_ostream_t *stream, const pb_field_t *field, void * const *arg);

When encoding, the callback should write out complete fields, including the wire type and field number tag. It can write as many or as few fields as it likes. For example, if you want to write out an array as a *repeated* field, you should do it all in a single call.

Usually you can use `pb_encode_tag_for_field`_ to encode the wire type and tag number of the field. However, if you want to encode a repeated field as a packed array, you must call `pb_encode_tag`_ instead to specify a wire type of *PB_WT_STRING*.

If the callback is used in a submessage, it will be called multiple times during a single call to `pb_encode`_. In this case, it must produce the same amount of data every time. If the callback is directly in the main message, it is called only once.

.. _`pb_encode`: reference.html#pb-encode
.. _`pb_encode_tag_for_field`: reference.html#pb-encode-tag-for-field
.. _`pb_encode_tag`: reference.html#pb-encode-tag

This callback writes out a dynamically sized string::

    bool write_string(pb_ostream_t *stream, const pb_field_t *field, void * const *arg)
    {
        char *str = get_string_from_somewhere();
        if (!pb_encode_tag_for_field(stream, field))
            return false;

        return pb_encode_string(stream, (uint8_t*)str, strlen(str));
    }

Decoding callbacks
------------------
::

    bool (*decode)(pb_istream_t *stream, const pb_field_t *field, void **arg);

When decoding, the callback receives a length-limited substream that reads the contents of a single field. The field tag has already been read. For *string* and *bytes*, the length value has already been parsed, and is available at *stream->bytes_left*.

The callback will be called multiple times for repeated fields. For packed fields, you can either read multiple values until the stream ends, or leave it to `pb_decode`_ to call your function over and over until all values have been read.

.. _`pb_decode`: reference.html#pb-decode

This callback reads multiple integers and prints them::

    bool read_ints(pb_istream_t *stream, const pb_field_t *field, void **arg)
    {
        while (stream->bytes_left)
        {
            uint64_t value;
            if (!pb_decode_varint(stream, &value))
                return false;
            /* Cast so the argument matches the %lld conversion. */
            printf("%lld\n", (long long)value);
        }
        return true;
    }

Field description array
=======================

To use the *pb_encode* and *pb_decode* functions, you need an array of pb_field_t constants describing the structure you wish to encode. This description is usually autogenerated from the .proto file.

For example, this submessage in the Person.proto file::

    message Person {
        message PhoneNumber {
            required string number = 1 [(nanopb).max_size = 40];
            optional PhoneType type = 2 [default = HOME];
        }
    }

generates this field description array for the structure *Person_PhoneNumber*::

    const pb_field_t Person_PhoneNumber_fields[3] = {
        PB_FIELD(  1, STRING  , REQUIRED, STATIC, Person_PhoneNumber, number, number, 0),
        PB_FIELD(  2, ENUM    , OPTIONAL, STATIC, Person_PhoneNumber, type, number, &Person_PhoneNumber_type_default),
        PB_LAST_FIELD
    };

Oneof
=====
Protocol Buffers supports `oneof`_ sections. Here is an example of ``oneof`` usage::

    message MsgType1 {
        required int32 value = 1;
    }

    message MsgType2 {
        required bool value = 1;
    }

    message MsgType3 {
        required int32 value1 = 1;
        required int32 value2 = 2;
    }

    message MyMessage {
        required uint32 uid = 1;
        required uint32 pid = 2;
        required uint32 utime = 3;

        oneof payload {
            MsgType1 msg1 = 4;
            MsgType2 msg2 = 5;
            MsgType3 msg3 = 6;
        }
    }

Nanopb will generate ``payload`` as a C union and add an additional field ``which_payload``::

    typedef struct _MyMessage {
        uint32_t uid;
        uint32_t pid;
        uint32_t utime;
        pb_size_t which_payload;
        union {
            MsgType1 msg1;
            MsgType2 msg2;
            MsgType3 msg3;
        } payload;
    /* @@protoc_insertion_point(struct:MyMessage) */
    } MyMessage;

``which_payload`` indicates which of the ``oneof`` fields is actually set.
The user is expected to set the field manually using the correct field tag::

    MyMessage msg = MyMessage_init_zero;
    msg.payload.msg2.value = true;
    msg.which_payload = MyMessage_msg2_tag;

Notice that neither the ``which_payload`` field nor the unused fields in ``payload``
will consume any space in the resulting encoded message.

.. _`oneof`: https://developers.google.com/protocol-buffers/docs/reference/proto2-spec#oneof_and_oneof_field

Extension fields
================
Protocol Buffers supports a concept of `extension fields`_, which are
additional fields to a message, but defined outside the actual message.
The definition can even be in a completely separate .proto file.

The base message is declared as extensible by the keyword *extensions* in
the .proto file::

    message MyMessage {
        .. fields ..
        extensions 100 to 199;
    }

For each extensible message, *nanopb_generator.py* declares an additional
callback field called *extensions*. The field and associated datatype
*pb_extension_t* form a linked list of handlers. When an unknown field is
encountered, the decoder calls each handler in turn until either one of them
handles the field, or the list is exhausted.

The actual extensions are declared using the *extend* keyword in the .proto,
and are in the global namespace::

    extend MyMessage {
        optional int32 myextension = 100;
    }

For each extension, *nanopb_generator.py* creates a constant of type
*pb_extension_type_t*. To link together the base message and the extension,
you have to:

1. Allocate storage for your field, matching the datatype in the .proto.
   For example, for an *int32* field, you need an *int32_t* variable to store
   the value.
2. Create a *pb_extension_t* constant, with pointers to your variable and
   to the generated *pb_extension_type_t*.
3. Set the *message.extensions* pointer to point to the *pb_extension_t*.

An example of this is available in *tests/test_encode_extensions.c* and
*tests/test_decode_extensions.c*.

.. _`extension fields`: https://developers.google.com/protocol-buffers/docs/proto#extensions

Default values
==============
Protobuf has two syntax variants, proto2 and proto3.
Of these, proto2 has user-definable default values that can be given in the .proto file::

    message MyMessage {
        optional bytes foo = 1 [default = "ABC\x01\x02\x03"];
        optional string bar = 2 [default = "åäö"];
    }

Nanopb will generate both static and runtime initialization for the default
values. In ``myproto.pb.h`` there will be a ``#define MyMessage_init_default`` that
can be used to initialize the whole message to its default values::

    MyMessage msg = MyMessage_init_default;

In addition to this, ``pb_decode()`` will initialize message fields to defaults
at runtime. If this is not desired, ``pb_decode_noinit()`` can be used instead.

Message framing
===============
Protocol Buffers does not specify a method of framing the messages for transmission.
This is something that must be provided by the library user, as there is no one-size-fits-all
solution. Typical needs for a framing format are to:

1. Encode the message length.
2. Encode the message type.
3. Perform any synchronization and error checking that may be needed depending on the application.

For example, UDP packets already fulfill all the requirements, and TCP streams typically only
need a way to identify the message length and type. Lower level interfaces such as serial ports
may need a more robust frame format, such as HDLC (high-level data link control).

Nanopb provides a few helpers to facilitate implementing framing formats:

1. Functions *pb_encode_delimited* and *pb_decode_delimited* prefix the message data with a varint-encoded length.
2. Union messages and oneofs are supported in order to implement top-level container messages.
3. Message IDs can be specified using the *(nanopb_msgopt).msgid* option and can then be accessed from the header.

Return values and error handling
================================

Most functions in nanopb return bool: *true* means success, *false* means failure.
There is also some support for error messages for debugging purposes: the error messages go in *stream->errmsg*.

The error messages help in guessing the underlying cause of the error. The most common error conditions are:

1) Running out of memory, i.e. stack overflow.
2) Invalid field descriptors (would usually mean a bug in the generator).
3) IO errors in your own stream callbacks.
4) Errors that happen in your callback functions.
5) Exceeding the max_size or bytes_left of a stream.
6) Exceeding the max_size/max_count of a string or array field.
7) Invalid protocol buffers binary message.
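
The convention described in this section (a *bool* result plus an optional message in *stream->errmsg*) can be sketched without the library. The following is a minimal stand-alone illustration and not nanopb code: *demo_stream_t*, *demo_write* and *demo_get_error* are hypothetical names, and the struct only imitates the fields of an output stream that matter here. Keeping the first recorded message is a choice made for this sketch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an output stream; only the fields used here. */
typedef struct {
    size_t max_size;
    size_t bytes_written;
    const char *errmsg;   /* NULL until something fails */
} demo_stream_t;

/* Accounts for count bytes, or fails with a message if the limit is hit.
 * Relies on bytes_written never exceeding max_size. */
bool demo_write(demo_stream_t *stream, size_t count)
{
    if (count > stream->max_size - stream->bytes_written) {
        if (stream->errmsg == NULL)   /* keep the first error recorded */
            stream->errmsg = "stream full";
        return false;
    }
    stream->bytes_written += count;
    return true;
}

/* Returns the recorded message, or a placeholder if nothing failed. */
const char *demo_get_error(const demo_stream_t *stream)
{
    return stream->errmsg ? stream->errmsg : "(none)";
}
```

A caller would check the boolean first and consult the message only on failure, for example ``if (!demo_write(&s, n)) fprintf(stderr, "%s\n", demo_get_error(&s));``. This corresponds to condition 5 in the list above.
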