:mod:`struct` --- Interpret bytes as packed binary data
=======================================================

.. module:: struct
   :synopsis: Interpret bytes as packed binary data.

**Source code:** :source:`Lib/struct.py`

.. index::
   pair: C; structures
   triple: packing; binary; data

--------------

This module performs conversions between Python values and C structs represented
as Python :class:`bytes` objects. This can be used in handling binary data
stored in files or from network connections, among other sources. It uses
:ref:`struct-format-strings` as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.

.. note::

   By default, the result of packing a given C struct includes pad bytes in
   order to maintain proper alignment for the C types involved; similarly,
   alignment is taken into account when unpacking. This behavior is chosen so
   that the bytes of a packed struct correspond exactly to the layout in memory
   of the corresponding C struct. To handle platform-independent data formats
   or omit implicit pad bytes, use ``standard`` size and alignment instead of
   ``native`` size and alignment: see :ref:`struct-alignment` for details.

Several :mod:`struct` functions (and methods of :class:`Struct`) take a *buffer*
argument. This refers to objects that implement the :ref:`bufferobjects` and
provide either a readable or read-writable buffer. The most common types used
for that purpose are :class:`bytes` and :class:`bytearray`, but many other types
that can be viewed as an array of bytes implement the buffer protocol, so that
they can be read/filled without additional copying from a :class:`bytes` object.


Functions and Exceptions
------------------------

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; argument is a string describing what
   is wrong.
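
As a minimal sketch of the *buffer* argument and the :exc:`error` exception
described above, the following packs two integers and unpacks them from
several buffer types. The format ``'<HI'`` and the helper ``try_unpack`` are
hypothetical names chosen for illustration only; they are not part of the
module.

```python
import struct

# '<' selects little-endian standard sizes, so the layout is platform-independent.
FMT = '<HI'  # hypothetical layout: unsigned short + unsigned int = 6 bytes

packed = struct.pack(FMT, 1, 2)

# Any object exposing a readable buffer can be passed to unpack().
from_bytes = struct.unpack(FMT, packed)
from_bytearray = struct.unpack(FMT, bytearray(packed))
from_view = struct.unpack(FMT, memoryview(packed))


def try_unpack(fmt, data):
    """Return the unpacked tuple, or None if struct.error is raised."""
    try:
        return struct.unpack(fmt, data)
    except struct.error:
        return None


# One byte short of calcsize(FMT), so unpack() raises struct.error.
too_short = try_unpack(FMT, packed[:-1])
```

Catching :exc:`error` this way is only one possible policy; code that cannot
recover from a malformed buffer may prefer to let the exception propagate.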


.. function:: pack(format, v1, v2, ...)

   Return a bytes object containing the values *v1*, *v2*, ... packed according
   to the format string *format*. The arguments must match the values required by
   the format exactly.


.. function:: pack_into(format, buffer, offset, v1, v2, ...)

   Pack the values *v1*, *v2*, ... according to the format string *format* and
   write the packed bytes into the writable buffer *buffer* starting at
   position *offset*. Note that *offset* is a required argument.


.. function:: unpack(format, buffer)

   Unpack from the buffer *buffer* (presumably packed by ``pack(format, ...)``)
   according to the format string *format*. The result is a tuple even if it
   contains exactly one item. The buffer's size in bytes must match the
   size required by the format, as reflected by :func:`calcsize`.


.. function:: unpack_from(format, buffer, offset=0)

   Unpack from *buffer* starting at position *offset*, according to the format
   string *format*. The result is a tuple even if it contains exactly one
   item. The buffer's size in bytes, minus *offset*, must be at least
   the size required by the format, as reflected by :func:`calcsize`.


.. function:: iter_unpack(format, buffer)

   Iteratively unpack from the buffer *buffer* according to the format
   string *format*. This function returns an iterator which will read
   equally-sized chunks from the buffer until all its contents have been
   consumed. The buffer's size in bytes must be a multiple of the size
   required by the format, as reflected by :func:`calcsize`.

   Each iteration yields a tuple as specified by the format string.

   .. versionadded:: 3.4


.. function:: calcsize(format)

   Return the size of the struct (and hence of the bytes object produced by
   ``pack(format, ...)``) corresponding to the format string *format*.


.. _struct-format-strings:

Format Strings
--------------

Format strings are the mechanism used to specify the expected layout when
packing and unpacking data. They are built up from :ref:`format-characters`,
which specify the type of data being packed/unpacked. In addition, there are
special characters for controlling the :ref:`struct-alignment`.


.. _struct-alignment:

Byte Order, Size, and Alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, C types are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

.. index::
   single: @ (at); in struct format strings
   single: = (equals); in struct format strings
   single: < (less); in struct format strings
   single: > (greater); in struct format strings
   single: ! (exclamation); in struct format strings

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+----------+-----------+
| Character | Byte order             | Size     | Alignment |
+===========+========================+==========+===========+
| ``@``     | native                 | native   | native    |
+-----------+------------------------+----------+-----------+
| ``=``     | native                 | standard | none      |
+-----------+------------------------+----------+-----------+
| ``<``     | little-endian          | standard | none      |
+-----------+------------------------+----------+-----------+
| ``>``     | big-endian             | standard | none      |
+-----------+------------------------+----------+-----------+
| ``!``     | network (= big-endian) | standard | none      |
+-----------+------------------------+----------+-----------+

If the first character is not one of these, ``'@'`` is assumed.
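
The effect of the first character in the table above can be sketched as
follows. This is an illustrative example rather than part of the reference
text; the variable names are arbitrary, and the native results depend on the
host platform, so they are compared against ``sys.byteorder`` instead of
being hard-coded.

```python
import struct
import sys

value = 0x01020304

little = struct.pack('<I', value)    # least significant byte first
big = struct.pack('>I', value)       # most significant byte first
network = struct.pack('!I', value)   # network order, identical to '>'

# '=' keeps the host byte order but uses standard sizes and no alignment.
native_standard = struct.pack('=I', value)
expected_native = little if sys.byteorder == 'little' else big

# Standard modes never add pad bytes; native mode ('@', the default) may.
size_standard = struct.calcsize('<ci')   # 1 + 4 bytes, no padding
size_native = struct.calcsize('@ci')     # platform-dependent, may include padding
```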

Native byte order is big-endian or little-endian, depending on the host
system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
switchable endianness (bi-endian). Use ``sys.byteorder`` to check the
endianness of your system.

Native size and alignment are determined using the C compiler's
``sizeof`` expression. This is always combined with native byte order.

Standard size depends only on the format character; see the table in
the :ref:`format-characters` section.

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter are standardized.

The form ``'!'`` is available for those poor souls who claim they can't remember
whether network byte order is big-endian or little-endian.

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

Notes:

(1) Padding is only automatically added between successive structure members.
    No padding is added at the beginning or the end of the encoded struct.

(2) No padding is added when using non-native size and alignment, e.g.
    with ``'<'``, ``'>'``, ``'='``, and ``'!'``.

(3) To align the end of a structure to the alignment requirement of a
    particular type, end the format with the code for that type with a repeat
    count of zero. See :ref:`struct-examples`.


.. _format-characters:

Format Characters
^^^^^^^^^^^^^^^^^

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types. The 'Standard size' column
refers to the size of the packed value in bytes when using standard size; that
is, when the format string starts with one of ``'<'``, ``'>'``, ``'!'`` or
``'='``.
When using native size, the size of the packed value is
platform-dependent.

+--------+--------------------------+--------------------+----------------+------------+
| Format | C Type                   | Python type        | Standard size  | Notes      |
+========+==========================+====================+================+============+
| ``x``  | pad byte                 | no value           |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``c``  | :c:type:`char`           | bytes of length 1  | 1              |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``b``  | :c:type:`signed char`    | integer            | 1              | \(1), \(3) |
+--------+--------------------------+--------------------+----------------+------------+
| ``B``  | :c:type:`unsigned char`  | integer            | 1              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``?``  | :c:type:`_Bool`          | bool               | 1              | \(1)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``h``  | :c:type:`short`          | integer            | 2              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``H``  | :c:type:`unsigned short` | integer            | 2              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``i``  | :c:type:`int`            | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``I``  | :c:type:`unsigned int`   | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``l``  | :c:type:`long`           | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``L``  | :c:type:`unsigned long`  | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``q``  | :c:type:`long long`      | integer            | 8              | \(2), \(3) |
+--------+--------------------------+--------------------+----------------+------------+
| ``Q``  | :c:type:`unsigned long   | integer            | 8              | \(2), \(3) |
|        | long`                    |                    |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``n``  | :c:type:`ssize_t`        | integer            |                | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``N``  | :c:type:`size_t`         | integer            |                | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``e``  | \(7)                     | float              | 2              | \(5)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``f``  | :c:type:`float`          | float              | 4              | \(5)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``d``  | :c:type:`double`         | float              | 8              | \(5)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``s``  | :c:type:`char[]`         | bytes              |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``p``  | :c:type:`char[]`         | bytes              |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``P``  | :c:type:`void \*`        | integer            |                | \(6)       |
+--------+--------------------------+--------------------+----------------+------------+

.. versionchanged:: 3.3
   Added support for the ``'n'`` and ``'N'`` formats.

.. versionchanged:: 3.6
   Added support for the ``'e'`` format.


Notes:

(1)
   .. index:: single: ? (question mark); in struct format strings

   The ``'?'`` conversion code corresponds to the :c:type:`_Bool` type defined by
   C99. If this type is not available, it is simulated using a :c:type:`char`. In
   standard mode, it is always represented by one byte.

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :c:type:`long long`, or, on Windows,
   :c:type:`__int64`. They are always available in standard modes.

(3)
   When attempting to pack a non-integer using any of the integer conversion
   codes, if the non-integer has a :meth:`__index__` method then that method is
   called to convert the argument to an integer before packing.

   .. versionchanged:: 3.2
      Use of the :meth:`__index__` method for non-integers is new in 3.2.

(4)
   The ``'n'`` and ``'N'`` conversion codes are only available for the native
   size (selected as the default or with the ``'@'`` byte order character).
   For the standard size, you can use whichever of the other integer formats
   fits your application.

(5)
   For the ``'f'``, ``'d'`` and ``'e'`` conversion codes, the packed
   representation uses the IEEE 754 binary32, binary64 or binary16 format (for
   ``'f'``, ``'d'`` or ``'e'`` respectively), regardless of the floating-point
   format used by the platform.

(6)
   The ``'P'`` format character is only available for the native byte ordering
   (selected as the default or with the ``'@'`` byte order character). The byte
   order character ``'='`` chooses to use little- or big-endian ordering based
   on the host system. The struct module does not interpret this as native
   ordering, so the ``'P'`` format is not available.

(7)
   The IEEE 754 binary16 "half precision" type was introduced in the 2008
   revision of the `IEEE 754 standard <ieee 754 standard_>`_. It has a sign
   bit, a 5-bit exponent and 11-bit precision (with 10 bits explicitly stored),
   and can represent numbers between approximately ``6.1e-05`` and ``6.5e+04``
   at full precision.
   This type is not widely supported by C compilers: on a
   typical machine, an unsigned short can be used for storage, but not for math
   operations. See the Wikipedia page on the `half-precision floating-point
   format <half precision format_>`_ for more information.


A format character may be preceded by an integral repeat count. For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; a count and its format must
not contain whitespace though.

For the ``'s'`` format character, the count is interpreted as the length of the
bytes, not a repeat count like for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
If a count is not given, it defaults to 1. For packing, the string is
truncated or padded with null bytes as appropriate to make it fit. For
unpacking, the resulting bytes object always has exactly the specified number
of bytes. As a special case, ``'0s'`` means a single, empty string (while
``'0c'`` means 0 characters).

When packing a value ``x`` using one of the integer formats (``'b'``,
``'B'``, ``'h'``, ``'H'``, ``'i'``, ``'I'``, ``'l'``, ``'L'``,
``'q'``, ``'Q'``), if ``x`` is outside the valid range for that format
then :exc:`struct.error` is raised.

.. versionchanged:: 3.1
   In 3.0, some of the integer formats wrapped out-of-range values and
   raised :exc:`DeprecationWarning` instead of :exc:`struct.error`.

The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a *fixed number of bytes*, given by the count.
The first byte stored is the length of the string, or 255, whichever is
smaller. The bytes of the string follow. If the string passed in to
:func:`pack` is too long (longer than the count minus 1), only the leading
``count-1`` bytes of the string are stored.
If the string is shorter than
``count-1``, it is padded with null bytes so that exactly count bytes in all
are used. Note that for :func:`unpack`, the ``'p'`` format character consumes
``count`` bytes, but that the string returned can never contain more than 255
bytes.

.. index:: single: ? (question mark); in struct format strings

For the ``'?'`` format character, the return value is either :const:`True` or
:const:`False`. When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be ``True`` when unpacking.



.. _struct-examples:

Examples
^^^^^^^^

.. note::
   All examples assume a native byte order, size, and alignment with a
   big-endian machine.

A basic example of packing/unpacking three integers::

   >>> from struct import *
   >>> pack('hhl', 1, 2, 3)
   b'\x00\x01\x00\x02\x00\x00\x00\x03'
   >>> unpack('hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
   (1, 2, 3)
   >>> calcsize('hhl')
   8

Unpacked fields can be named by assigning them to variables or by wrapping
the result in a named tuple::

   >>> record = b'raymond   \x32\x12\x08\x01\x08'
   >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

   >>> from collections import namedtuple
   >>> Student = namedtuple('Student', 'name serialnum school gradelevel')
   >>> Student._make(unpack('<10sHHb', record))
   Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)

The ordering of format characters may have an impact on size since the padding
needed to satisfy alignment requirements is different::

   >>> pack('ci', b'*', 0x12131415)
   b'*\x00\x00\x00\x12\x13\x14\x15'
   >>> pack('ic', 0x12131415, b'*')
   b'\x12\x13\x14\x15*'
   >>> calcsize('ci')
   8
   >>> calcsize('ic')
   5

The following format ``'llh0l'`` specifies two pad bytes at the
end, assuming longs are aligned on 4-byte boundaries::

   >>> pack('llh0l', 1, 2, 3)
   b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

This only works when native size and alignment are in effect; standard size and
alignment do not enforce any alignment.


.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Classes
-------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to
   the format string *format*. Creating a Struct object once and calling its
   methods is more efficient than calling the :mod:`struct` functions with the
   same format since the format string only needs to be compiled once.

   .. note::

      The compiled versions of the most recent format strings passed to
      :class:`Struct` and the module-level functions are cached, so programs
      that use only a few format strings needn't worry about reusing a single
      :class:`Struct` instance.

   Compiled Struct objects support the following methods and attributes:

   .. method:: pack(v1, v2, ...)

      Identical to the :func:`pack` function, using the compiled format.
      (``len(result)`` will equal :attr:`size`.)


   .. method:: pack_into(buffer, offset, v1, v2, ...)

      Identical to the :func:`pack_into` function, using the compiled format.


   .. method:: unpack(buffer)

      Identical to the :func:`unpack` function, using the compiled format.
      The buffer's size in bytes must equal :attr:`size`.


   .. method:: unpack_from(buffer, offset=0)

      Identical to the :func:`unpack_from` function, using the compiled format.
      The buffer's size in bytes, minus *offset*, must be at least
      :attr:`size`.


   .. method:: iter_unpack(buffer)

      Identical to the :func:`iter_unpack` function, using the compiled format.
      The buffer's size in bytes must be a multiple of :attr:`size`.

      .. versionadded:: 3.4

   .. attribute:: format

      The format string used to construct this Struct object.

      .. versionchanged:: 3.7
         The format string type is now :class:`str` instead of :class:`bytes`.

   .. attribute:: size

      The calculated size of the struct (and hence of the bytes object produced
      by the :meth:`pack` method) corresponding to :attr:`format`.


.. _half precision format: https://en.wikipedia.org/wiki/Half-precision_floating-point_format

.. _ieee 754 standard: https://en.wikipedia.org/wiki/IEEE_floating_point#IEEE_754-2008
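
The :class:`Struct` methods described above can be combined as in the
following sketch, which writes three fixed-size records into one
:class:`bytearray` and reads them back. The record layout ``'<Ih'`` and the
sample values are hypothetical and chosen only for illustration.

```python
import struct

# Hypothetical record: 4-byte unsigned id followed by a 2-byte signed delta.
# '<' selects standard sizes, so each record occupies exactly 6 bytes.
record = struct.Struct('<Ih')

rows = [(1, -1), (2, 0), (3, 5)]
buf = bytearray(record.size * len(rows))

# pack_into() writes in place at the given byte offset.
for index, (ident, delta) in enumerate(rows):
    record.pack_into(buf, index * record.size, ident, delta)

# unpack_from() reads a single record starting at an offset.
second = record.unpack_from(buf, record.size)

# iter_unpack() walks the whole buffer in chunks of record.size bytes.
all_rows = list(record.iter_unpack(buf))
```

Reusing one :class:`Struct` instance here mainly keeps the format and its
:attr:`size` in one place; the module-level functions would give the same
results.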