1:mod:`pickle` --- Python object serialization
2=============================================
3
4.. index::
5   single: persistence
6   pair: persistent; objects
7   pair: serializing; objects
8   pair: marshalling; objects
9   pair: flattening; objects
10   pair: pickling; objects
11
12.. module:: pickle
13   :synopsis: Convert Python objects to streams of bytes and back.
14.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
15.. sectionauthor:: Barry Warsaw <barry@zope.com>
16
The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure.  "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy.  Pickling (and unpickling) is alternatively known as
"serialization", "marshalling", [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".
24
25This documentation describes both the :mod:`pickle` module and the
26:mod:`cPickle` module.
27
28.. warning::
29
30   The :mod:`pickle` module is not secure against erroneous or maliciously
31   constructed data.  Never unpickle data received from an untrusted or
32   unauthenticated source.
33
34
35Relationship to other Python modules
36------------------------------------
37
The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
module.  As its name implies, :mod:`cPickle` is written in C, so it can be up to
1000 times faster than :mod:`pickle`.  However, it does not support subclassing
of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
these are functions, not classes.  Most applications have no need for this
functionality and can benefit from the improved performance of :mod:`cPickle`.
44Other than that, the interfaces of the two modules are nearly identical; the
45common interface is described in this manual and differences are pointed out
46where necessary.  In the following discussions, we use the term "pickle" to
47collectively describe the :mod:`pickle` and :mod:`cPickle` modules.
48
49The data streams the two modules produce are guaranteed to be interchangeable.
50
51Python has a more primitive serialization module called :mod:`marshal`, but in
52general :mod:`pickle` should always be the preferred way to serialize Python
53objects.  :mod:`marshal` exists primarily to support Python's :file:`.pyc`
54files.
55
56The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:
57
58* The :mod:`pickle` module keeps track of the objects it has already serialized,
59  so that later references to the same object won't be serialized again.
60  :mod:`marshal` doesn't do this.
61
62  This has implications both for recursive objects and object sharing.  Recursive
63  objects are objects that contain references to themselves.  These are not
64  handled by marshal, and in fact, attempting to marshal recursive objects will
65  crash your Python interpreter.  Object sharing happens when there are multiple
66  references to the same object in different places in the object hierarchy being
67  serialized.  :mod:`pickle` stores such objects only once, and ensures that all
68  other references point to the master copy.  Shared objects remain shared, which
69  can be very important for mutable objects.
70
* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances.  :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.
75
76* The :mod:`marshal` serialization format is not guaranteed to be portable
77  across Python versions.  Because its primary job in life is to support
78  :file:`.pyc` files, the Python implementers reserve the right to change the
79  serialization format in non-backwards compatible ways should the need arise.
80  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
81  across Python releases.
82
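The object-sharing and recursion behavior described in the first point can be
checked with a short sketch.  The variable names are illustrative only, and the
in-memory :func:`dumps`/:func:`loads` round trip is used purely for brevity.

```python
import pickle

# Two dictionary entries refer to the *same* list object.
shared = [1, 2, 3]
container = {'first': shared, 'second': shared}

restored = pickle.loads(pickle.dumps(container))

# Sharing survives the round trip: both entries are one list again.
shared_preserved = restored['first'] is restored['second']

# A list that contains itself is a recursive object.
recursive = [1, 2, 3]
recursive.append(recursive)

restored_recursive = pickle.loads(pickle.dumps(recursive))

# The restored list refers to itself, not to a copy of itself.
recursion_preserved = restored_recursive[3] is restored_recursive
```

The restored objects are new objects, distinct from the originals; only the
relationships among them are reproduced.
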
83Note that serialization is a more primitive notion than persistence; although
84:mod:`pickle` reads and writes file objects, it does not handle the issue of
85naming persistent objects, nor the (even more complicated) issue of concurrent
86access to persistent objects.  The :mod:`pickle` module can transform a complex
87object into a byte stream and it can transform the byte stream into an object
88with the same internal structure.  Perhaps the most obvious thing to do with
89these byte streams is to write them onto a file, but it is also conceivable to
90send them across a network or store them in a database.  The module
91:mod:`shelve` provides a simple interface to pickle and unpickle objects on
92DBM-style database files.
93
94
95Data stream format
96------------------
97
98.. index::
99   single: XDR
100   single: External Data Representation
101
The data format used by :mod:`pickle` is Python-specific.  This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.
106
107By default, the :mod:`pickle` data format uses a printable ASCII representation.
108This is slightly more voluminous than a binary representation.  The big
109advantage of using printable ASCII (and of some other characteristics of
110:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
111possible for a human to read the pickled file with a standard text editor.
112
113There are currently 3 different protocols which can be used for pickling.
114
115* Protocol version 0 is the original ASCII protocol and is backwards compatible
116  with earlier versions of Python.
117
118* Protocol version 1 is the old binary format which is also compatible with
119  earlier versions of Python.
120
121* Protocol version 2 was introduced in Python 2.3.  It provides much more
122  efficient pickling of :term:`new-style class`\es.
123
124Refer to :pep:`307` for more information.
125
126If a *protocol* is not specified, protocol 0 is used. If *protocol* is specified
127as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
128available will be used.
129
130.. versionchanged:: 2.3
131   Introduced the *protocol* parameter.
132
133A binary format, which is slightly more efficient, can be chosen by specifying a
134*protocol* version >= 1.
135
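As a rough illustration of the *protocol* values described above, the following
sketch pickles one object with several protocols and compares the results.  The
sample data is made up, and the actual stream sizes depend on the data and on
the Python version.

```python
import pickle

data = {'name': 'example', 'values': list(range(20))}

ascii_stream = pickle.dumps(data, 0)                          # original ASCII protocol
binary_stream = pickle.dumps(data, 2)                         # protocol 2 (binary)
highest_stream = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)  # newest available
negative_stream = pickle.dumps(data, -1)                      # also selects the newest

sizes = {'protocol 0': len(ascii_stream), 'protocol 2': len(binary_stream)}

# No protocol argument is needed when loading; it is detected from the stream.
all_equal = all(pickle.loads(stream) == data
                for stream in (ascii_stream, binary_stream,
                               highest_stream, negative_stream))
```
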
136
137Usage
138-----
139
140To serialize an object hierarchy, you first create a pickler, then you call the
141pickler's :meth:`dump` method.  To de-serialize a data stream, you first create
142an unpickler, then you call the unpickler's :meth:`load` method.  The
143:mod:`pickle` module provides the following constant:
144
145
146.. data:: HIGHEST_PROTOCOL
147
148   The highest protocol version available.  This value can be passed as a
149   *protocol* value.
150
151   .. versionadded:: 2.3
152
153.. note::
154
155   Be sure to always open pickle files created with protocols >= 1 in binary mode.
156   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
157   mode as long as you stay consistent.
158
159   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
160   as line terminators and therefore will look "funny" when viewed in Notepad or
161   other editors which do not support this format.
162
163The :mod:`pickle` module provides the following functions to make the pickling
164process more convenient:
165
166
167.. function:: dump(obj, file[, protocol])
168
169   Write a pickled representation of *obj* to the open file object *file*.  This is
170   equivalent to ``Pickler(file, protocol).dump(obj)``.
171
172   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
173   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
174   version will be used.
175
176   .. versionchanged:: 2.3
177      Introduced the *protocol* parameter.
178
179   *file* must have a :meth:`write` method that accepts a single string argument.
180   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
181   any other custom object that meets this interface.
182
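The *file* requirement for :func:`dump` is only that a :meth:`write` method
exists.  The sketch below uses a hypothetical ``ChunkCollector`` class (not part
of the module) to show a minimal custom object that meets this interface.

```python
import pickle


class ChunkCollector(object):
    """Hypothetical file-like object that implements only write()."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        # The pickler passes each piece of the data stream to this method.
        self.chunks.append(data)


collector = ChunkCollector()
pickle.dump({'answer': 42}, collector, 2)

# Joining the collected pieces gives the complete pickle data stream.
stream = b''.join(collector.chunks)
restored = pickle.loads(stream)
```
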
183
184.. function:: load(file)
185
186   Read a string from the open file object *file* and interpret it as a pickle data
187   stream, reconstructing and returning the original object hierarchy.  This is
188   equivalent to ``Unpickler(file).load()``.
189
190   *file* must have two methods, a :meth:`read` method that takes an integer
191   argument, and a :meth:`readline` method that requires no arguments.  Both
192   methods should return a string.  Thus *file* can be a file object opened for
193   reading, a :mod:`StringIO` object, or any other custom object that meets this
194   interface.
195
196   This function automatically determines whether the data stream was written in
197   binary mode or not.
198
199
200.. function:: dumps(obj[, protocol])
201
202   Return the pickled representation of the object as a string, instead of writing
203   it to a file.
204
205   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
206   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
207   version will be used.
208
209   .. versionchanged:: 2.3
210      The *protocol* parameter was added.
211
212
213.. function:: loads(string)
214
215   Read a pickled object hierarchy from a string.  Characters in the string past
216   the pickled object's representation are ignored.
217
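A minimal sketch of the string-based functions follows.  The sample tuple and
the trailing text are arbitrary; the point is that :func:`loads` stops reading
at the end of the pickled object.

```python
import pickle

pickled = pickle.dumps(('spam', 1, 2.5), 2)
restored = pickle.loads(pickled)

# Data after the end of the pickled representation is ignored by loads().
restored_with_trailer = pickle.loads(pickled + b'trailing data')
```
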
218The :mod:`pickle` module also defines three exceptions:
219
220
221.. exception:: PickleError
222
223   A common base class for the other exceptions defined below.  This inherits from
224   :exc:`Exception`.
225
226
227.. exception:: PicklingError
228
229   This exception is raised when an unpicklable object is passed to the
230   :meth:`dump` method.
231
232
233.. exception:: UnpicklingError
234
235   This exception is raised when there is a problem unpickling an object. Note that
236   other exceptions may also be raised during unpickling, including (but not
237   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
238   :exc:`ImportError`, and :exc:`IndexError`.
239
240The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
241:class:`Unpickler`:
242
243
244.. class:: Pickler(file[, protocol])
245
246   This takes a file-like object to which it will write a pickle data stream.
247
248   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
249   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
250   protocol version will be used.
251
252   .. versionchanged:: 2.3
253      Introduced the *protocol* parameter.
254
255   *file* must have a :meth:`write` method that accepts a single string argument.
256   It can thus be an open file object, a :mod:`StringIO` object, or any other
257   custom object that meets this interface.
258
259   :class:`Pickler` objects define one (or two) public methods:
260
261
262   .. method:: dump(obj)
263
264      Write a pickled representation of *obj* to the open file object given in the
265      constructor.  Either the binary or ASCII format will be used, depending on the
266      value of the *protocol* argument passed to the constructor.
267
268
269   .. method:: clear_memo()
270
      Clears the pickler's "memo".  The memo is the data structure that remembers
      which objects the pickler has already seen, so that shared or recursive objects
      are pickled by reference and not by value.  This method is useful when re-using
      picklers.
275
276      .. note::
277
278         Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
279         created by :mod:`cPickle`.  In the :mod:`pickle` module, picklers have an
280         instance variable called :attr:`memo` which is a Python dictionary.  So to clear
281         the memo for a :mod:`pickle` module pickler, you could do the following::
282
            mypickler.memo.clear()
284
285         Code that does not need to support older versions of Python should simply use
286         :meth:`clear_memo`.
287
It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance.  These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance.  If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object. [#]_
293
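The pairing of :meth:`dump` and :meth:`load` calls might look like the
following sketch.  It uses :class:`io.BytesIO` as the in-memory file, which is
an assumption made for portability; a :mod:`StringIO` object would serve the
same purpose.

```python
import io
import pickle

buf = io.BytesIO()
pickler = pickle.Pickler(buf, 2)

shared = ['shared', 'list']
pickler.dump(shared)             # first dump() call
pickler.dump({'ref': shared})    # second dump() call refers to the same list

buf.seek(0)
unpickler = pickle.Unpickler(buf)
first = unpickler.load()         # matches the first dump()
second = unpickler.load()        # matches the second dump()

# Both load() calls yield references to one and the same list object.
same_object = second['ref'] is first
```

Calling :meth:`clear_memo` between the two :meth:`dump` calls would make the
second pickle self-contained instead of referring back to the first.
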
294:class:`Unpickler` objects are defined as:
295
296
297.. class:: Unpickler(file)
298
299   This takes a file-like object from which it will read a pickle data stream.
300   This class automatically determines whether the data stream was written in
301   binary mode or not, so it does not need a flag as in the :class:`Pickler`
302   factory.
303
304   *file* must have two methods, a :meth:`read` method that takes an integer
305   argument, and a :meth:`readline` method that requires no arguments.  Both
306   methods should return a string.  Thus *file* can be a file object opened for
307   reading, a :mod:`StringIO` object, or any other custom object that meets this
308   interface.
309
310   :class:`Unpickler` objects have one (or two) public methods:
311
312
313   .. method:: load()
314
315      Read a pickled object representation from the open file object given in
316      the constructor, and return the reconstituted object hierarchy specified
317      therein.
318
319      This method automatically determines whether the data stream was written
320      in binary mode or not.
321
322
323   .. method:: noload()
324
325      This is just like :meth:`load` except that it doesn't actually create any
326      objects.  This is useful primarily for finding what's called "persistent
327      ids" that may be referenced in a pickle data stream.  See section
328      :ref:`pickle-protocol` below for more details.
329
330      **Note:** the :meth:`noload` method is currently only available on
331      :class:`Unpickler` objects created with the :mod:`cPickle` module.
332      :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
333      method.
334
335
336What can be pickled and unpickled?
337----------------------------------
338
339The following types can be pickled:
340
341* ``None``, ``True``, and ``False``
342
343* integers, long integers, floating point numbers, complex numbers
344
345* normal and Unicode strings
346
347* tuples, lists, sets, and dictionaries containing only picklable objects
348
349* functions defined at the top level of a module
350
351* built-in functions defined at the top level of a module
352
353* classes that are defined at the top level of a module
354
355* instances of such classes whose :attr:`~object.__dict__` or the result of
356  calling :meth:`__getstate__` is picklable  (see section :ref:`pickle-protocol`
357  for details).
358
Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file.  Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth, in which case a
:exc:`RuntimeError` will be raised.  You can carefully raise this limit with
:func:`sys.setrecursionlimit`.
365
366Note that functions (built-in and user-defined) are pickled by "fully qualified"
367name reference, not by value.  This means that only the function name is
368pickled, along with the name of the module the function is defined in.  Neither
369the function's code, nor any of its function attributes are pickled.  Thus the
370defining module must be importable in the unpickling environment, and the module
371must contain the named object, otherwise an exception will be raised. [#]_
372
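To make the "by name reference" point concrete, the sketch below pickles a
function from the standard :mod:`json` module (chosen arbitrarily) with the
printable protocol 0 and inspects the result.

```python
import json
import pickle

stream = pickle.dumps(json.dumps, 0)

# Protocol 0 is printable, so the module and function names are visible
# in the data stream; the function's code is not stored.
names_in_stream = b'json' in stream and b'dumps' in stream

# Unpickling looks the name up again and returns the existing function object.
restored = pickle.loads(stream)
same_function = restored is json.dumps
```
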
373Similarly, classes are pickled by named reference, so the same restrictions in
374the unpickling environment apply.  Note that none of the class's code or data is
375pickled, so in the following example the class attribute ``attr`` is not
376restored in the unpickling environment::
377
   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)
382
383These restrictions are why picklable functions and classes must be defined in
384the top level of a module.
385
386Similarly, when class instances are pickled, their class's code and data are not
387pickled along with them.  Only the instance data are pickled.  This is done on
388purpose, so you can fix bugs in a class or add methods to the class and still
389load objects that were created with an earlier version of the class.  If you
390plan to have long-lived objects that will see many versions of a class, it may
391be worthwhile to put a version number in the objects so that suitable
392conversions can be made by the class's :meth:`__setstate__` method.
393
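One possible way to carry a version number in pickled instances is sketched
below.  The ``Account`` class, its ``currency`` field, and the
``_state_version`` key are all hypothetical names chosen for illustration; the
:meth:`__getstate__` and :meth:`__setstate__` methods are described in section
:ref:`pickle-protocol`.

```python
import pickle


class Account(object):
    """Hypothetical class whose second version added a ``currency`` field."""

    STATE_VERSION = 2

    def __init__(self, owner, balance, currency='USD'):
        self.owner = owner
        self.balance = balance
        self.currency = currency

    def __getstate__(self):
        # Store the instance data plus a marker for the class version.
        state = self.__dict__.copy()
        state['_state_version'] = self.STATE_VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop('_state_version', 1)
        if version < 2:
            # Version 1 pickles predate ``currency``; supply a default.
            state['currency'] = 'USD'
        self.__dict__.update(state)


# Simulate state that was saved by version 1 of the class (no marker).
old = Account.__new__(Account)
old.__setstate__({'owner': 'alice', 'balance': 10})

# A round trip with the current version keeps the stored currency.
current = pickle.loads(pickle.dumps(Account('bob', 5, 'EUR')))
```
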
394
395.. _pickle-protocol:
396
397The pickle protocol
398-------------------
399
400.. currentmodule:: None
401
402This section describes the "pickling protocol" that defines the interface
403between the pickler/unpickler and the objects that are being serialized.  This
404protocol provides a standard way for you to define, customize, and control how
405your objects are serialized and de-serialized.  The description in this section
406doesn't cover specific customizations that you can employ to make the unpickling
407environment slightly safer from untrusted pickle data streams; see section
408:ref:`pickle-sub` for more details.
409
410
411.. _pickle-inst:
412
413Pickling and unpickling normal class instances
414^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
415
416.. method:: object.__getinitargs__()
417
418   When a pickled class instance is unpickled, its :meth:`__init__` method is
419   normally *not* invoked.  If it is desirable that the :meth:`__init__` method
420   be called on unpickling, an old-style class can define a method
421   :meth:`__getinitargs__`, which should return a *tuple* of positional
422   arguments to be passed to the class constructor (:meth:`__init__` for
423   example).  Keyword arguments are not supported.  The :meth:`__getinitargs__`
424   method is called at pickle time; the tuple it returns is incorporated in the
425   pickle for the instance.
426
427.. method:: object.__getnewargs__()
428
429   New-style types can provide a :meth:`__getnewargs__` method that is used for
430   protocol 2.  Implementing this method is needed if the type establishes some
431   internal invariants when the instance is created, or if the memory allocation
432   is affected by the values passed to the :meth:`__new__` method for the type
433   (as it is for tuples and strings).  Instances of a :term:`new-style class`
434   ``C`` are created using ::
435
      obj = C.__new__(C, *args)
437
438   where *args* is the result of calling :meth:`__getnewargs__` on the original
439   object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.
440
441.. method:: object.__getstate__()
442
   Classes can further influence how their instances are pickled; if the class
   defines the method :meth:`__getstate__`, it is called and the returned state
   is pickled as the contents for the instance, instead of the contents of the
   instance's dictionary.  If there is no :meth:`__getstate__` method, the
   instance's :attr:`~object.__dict__` is pickled.
448
449.. method:: object.__setstate__(state)
450
451   Upon unpickling, if the class also defines the method :meth:`__setstate__`,
452   it is called with the unpickled state. [#]_ If there is no
453   :meth:`__setstate__` method, the pickled state must be a dictionary and its
454   items are assigned to the new instance's dictionary.  If a class defines both
455   :meth:`__getstate__` and :meth:`__setstate__`, the state object needn't be a
456   dictionary and these methods can do what they want. [#]_
457
458   .. note::
459
460      For :term:`new-style class`\es, if :meth:`__getstate__` returns a false
461      value, the :meth:`__setstate__` method will not be called.
462
463.. note::
464
465   At unpickling time, some methods like :meth:`__getattr__`,
466   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
467   instance.  In case those methods rely on some internal invariant being
468   true, the type should implement either :meth:`__getinitargs__` or
469   :meth:`__getnewargs__` to establish such an invariant; otherwise, neither
470   :meth:`__new__` nor :meth:`__init__` will be called.
471
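As a sketch of when :meth:`__getnewargs__` matters, consider a hypothetical
immutable ``Pair`` type whose :meth:`__new__` requires arguments.  Without the
method, unpickling under protocol 2 would call ``Pair.__new__(Pair)`` with no
arguments and fail.

```python
import pickle


class Pair(tuple):
    """Hypothetical immutable type whose __new__ requires two arguments."""

    def __new__(cls, left, right):
        return tuple.__new__(cls, (left, right))

    def __getnewargs__(self):
        # Arguments handed to Pair.__new__ when unpickling under protocol 2.
        return (self[0], self[1])


original = Pair('a', 'b')
restored = pickle.loads(pickle.dumps(original, 2))
```
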
472
473Pickling and unpickling extension types
474^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
475
476.. method:: object.__reduce__()
477
478   When the :class:`Pickler` encounters an object of a type it knows nothing
479   about --- such as an extension type --- it looks in two places for a hint of
480   how to pickle it.  One alternative is for the object to implement a
481   :meth:`__reduce__` method.  If provided, at pickling time :meth:`__reduce__`
482   will be called with no arguments, and it must return either a string or a
483   tuple.
484
485   If a string is returned, it names a global variable whose contents are
486   pickled as normal.  The string returned by :meth:`__reduce__` should be the
487   object's local name relative to its module; the pickle module searches the
488   module namespace to determine the object's module.
489
490   When a tuple is returned, it must be between two and five elements long.
491   Optional elements can either be omitted, or ``None`` can be provided as their
492   value.  The contents of this tuple are pickled as normal and used to
493   reconstruct the object at unpickling time.  The semantics of each element
494   are:
495
496   * A callable object that will be called to create the initial version of the
497     object.  The next element of the tuple will provide arguments for this
498     callable, and later elements provide additional state information that will
499     subsequently be used to fully reconstruct the pickled data.
500
501     In the unpickling environment this object must be either a class, a
502     callable registered as a "safe constructor" (see below), or it must have an
503     attribute :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
504     :exc:`UnpicklingError` will be raised in the unpickling environment.  Note
505     that as usual, the callable itself is pickled by name.
506
507   * A tuple of arguments for the callable object.
508
509     .. versionchanged:: 2.5
510        Formerly, this argument could also be ``None``.
511
512   * Optionally, the object's state, which will be passed to the object's
513     :meth:`__setstate__` method as described in section :ref:`pickle-inst`.  If
514     the object has no :meth:`__setstate__` method, then, as above, the value
515     must be a dictionary and it will be added to the object's
516     :attr:`~object.__dict__`.
517
518   * Optionally, an iterator (and not a sequence) yielding successive list
519     items.  These list items will be pickled, and appended to the object using
520     either ``obj.append(item)`` or ``obj.extend(list_of_items)``.  This is
521     primarily used for list subclasses, but may be used by other classes as
522     long as they have :meth:`append` and :meth:`extend` methods with the
523     appropriate signature.  (Whether :meth:`append` or :meth:`extend` is used
524     depends on which pickle protocol version is used as well as the number of
525     items to append, so both must be supported.)
526
527   * Optionally, an iterator (not a sequence) yielding successive dictionary
528     items, which should be tuples of the form ``(key, value)``.  These items
529     will be pickled and stored to the object using ``obj[key] = value``. This
530     is primarily used for dictionary subclasses, but may be used by other
531     classes as long as they implement :meth:`__setitem__`.
532
533.. method:: object.__reduce_ex__(protocol)
534
535   It is sometimes useful to know the protocol version when implementing
536   :meth:`__reduce__`.  This can be done by implementing a method named
537   :meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`,
538   when it exists, is called in preference over :meth:`__reduce__` (you may
539   still provide :meth:`__reduce__` for backwards compatibility).  The
540   :meth:`__reduce_ex__` method will be called with a single integer argument,
541   the protocol version.
542
543   The :class:`object` class implements both :meth:`__reduce__` and
544   :meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__`
545   but not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation
546   detects this and calls :meth:`__reduce__`.
547
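The tuple form of :meth:`__reduce__` can be illustrated with a small sketch.
The ``Interval`` class is hypothetical; it returns only the first three tuple
elements (callable, arguments, state) and omits the two optional iterators.

```python
import pickle


class Interval(object):
    """Hypothetical class that is rebuilt by calling its constructor."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self.label = None

    def __reduce__(self):
        # (callable, argument tuple, state).  There is no __setstate__, so
        # the state dictionary is added to the new object's __dict__.
        return (Interval, (self.low, self.high), {'label': self.label})


original = Interval(2, 5)
original.label = 'working range'
restored = pickle.loads(pickle.dumps(original))
```

Unlike the default instance pickling described in section :ref:`pickle-inst`,
this approach does call :meth:`__init__` when the object is unpickled.
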
An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled is to register the callable with the :mod:`copy_reg` module.  This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types.  Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.
554
555The registered constructor is deemed a "safe constructor" for purposes of
556unpickling as described above.
557
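Registering a reduction function might look like the sketch below.  The
``Color`` class and ``reduce_color`` function are hypothetical, and the fallback
import is an assumption for interpreters where the module is named
``copyreg`` rather than :mod:`copy_reg`.

```python
import pickle

try:
    import copy_reg                 # module name used in this documentation
except ImportError:
    import copyreg as copy_reg      # assumption: the module's name on Python 3


class Color(object):
    """Hypothetical type that has no pickling methods of its own."""

    def __init__(self, red, green, blue):
        self.rgb = (red, green, blue)


def reduce_color(color):
    # Same return value as __reduce__, but the object is passed as an argument.
    return Color, color.rgb


# From now on, Color instances are pickled through reduce_color().
copy_reg.pickle(Color, reduce_color)

restored = pickle.loads(pickle.dumps(Color(255, 128, 0)))
```
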
558
559Pickling and unpickling external objects
560^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
561
562.. index::
563   single: persistent_id (pickle protocol)
564   single: persistent_load (pickle protocol)
565
566For the benefit of object persistence, the :mod:`pickle` module supports the
567notion of a reference to an object outside the pickled data stream.  Such
568objects are referenced by a "persistent id", which is just an arbitrary string
569of printable ASCII characters. The resolution of such names is not defined by
570the :mod:`pickle` module; it will delegate this resolution to user defined
571functions on the pickler and unpickler. [#]_
572
573To define external persistent id resolution, you need to set the
574:attr:`~Pickler.persistent_id` attribute of the pickler object and the
575:attr:`~Unpickler.persistent_load` attribute of the unpickler object.
576
577To pickle objects that have an external persistent id, the pickler must have a
578custom :func:`~Pickler.persistent_id` method that takes an object as an
579argument and returns either ``None`` or the persistent id for that object.
580When ``None`` is returned, the pickler simply pickles the object as normal.
581When a persistent id string is returned, the pickler will pickle that string,
582along with a marker so that the unpickler will recognize the string as a
583persistent id.
584
585To unpickle external objects, the unpickler must have a custom
586:func:`~Unpickler.persistent_load` function that takes a persistent id string
587and returns the referenced object.
588
589Here's a silly example that *might* shed more light::
590
   import pickle
   from cStringIO import StringIO

   src = StringIO()
   p = pickle.Pickler(src)

   # Return a persistent id for objects that have an ``x`` attribute;
   # returning None tells the pickler to pickle the object as normal.
   def persistent_id(obj):
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print i
   p.dump(i)

   datastream = src.getvalue()
   print repr(datastream)
   dst = StringIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   # Resolve a persistent id string back into an object.
   def persistent_load(persid):
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError('Invalid persistent id')

   up.persistent_load = persistent_load

   j = up.load()
   print j
636
637In the :mod:`cPickle` module, the unpickler's :attr:`~Unpickler.persistent_load`
638attribute can also be set to a Python list, in which case, when the unpickler
639reaches a persistent id, the persistent id string will simply be appended to
640this list.  This functionality exists so that a pickle data stream can be
641"sniffed" for object references without actually instantiating all the objects
642in a pickle.
643[#]_  Setting :attr:`~Unpickler.persistent_load` to a list is usually used in
644conjunction with the :meth:`~Unpickler.noload` method on the Unpickler.
645
646.. BAW: Both pickle and cPickle support something called inst_persistent_id()
647   which appears to give unknown types a second shot at producing a persistent
648   id.  Since Jim Fulton can't remember why it was added or what it's for, I'm
649   leaving it undocumented.
650
651
652.. _pickle-sub:
653
654Subclassing Unpicklers
655----------------------
656
657.. index::
658   single: load_global() (pickle protocol)
659   single: find_global() (pickle protocol)
660
661By default, unpickling will import any class that it finds in the pickle data.
662You can control exactly what gets unpickled and what gets called by customizing
663your unpickler.  Unfortunately, exactly how you do this is different depending
664on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_
665
In the :mod:`pickle` module, you need to derive a subclass from
:class:`Unpickler`, overriding the :meth:`load_global` method.
:meth:`load_global` should read two lines from the pickle data stream, where the
first line will be the name of the module containing the class and the second
line will be the name of the instance's class.  It then looks up the class,
possibly importing the module and digging out the attribute, and appends what
it finds to the unpickler's stack.  Later on, this class will be assigned to the
:attr:`__class__` attribute of an empty class, as a way of magically creating an
instance without calling its class's :meth:`__init__`.  Your job (should you
choose to accept it) would be to have :meth:`load_global` push onto the
unpickler's stack a known safe version of any class you deem safe to unpickle.
It is up to you to produce such a class.  Or you could raise an error if you
want to disallow all unpickling of instances.  If this sounds like a hack,
you're right.  Refer to the source code to make this work.
680
681Things are a little cleaner with :mod:`cPickle`, but not by much. To control
682what gets unpickled, you can set the unpickler's :attr:`~Unpickler.find_global`
683attribute to a function or ``None``.  If it is ``None`` then any attempts to
684unpickle instances will raise an :exc:`UnpicklingError`.  If it is a function,
685then it should accept a module name and a class name, and return the
686corresponding class object.  It is responsible for looking up the class and
687performing any necessary imports, and it may raise an error to prevent
688instances of the class from being unpickled.
689
690The moral of the story is that you should be really careful about the source of
691the strings your application unpickles.
692
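A rough sketch of a restricting unpickler follows.  It does not override
:meth:`load_global` as described above; instead it assumes the
:meth:`find_class` method of the pure-Python :class:`Unpickler`, which is not
covered in this section, so check the source of your Python version before
relying on it.  The ``ALLOWED`` set, ``RestrictedUnpickler``, and
``restricted_loads`` are hypothetical names.

```python
import collections
import io
import pickle

# Hypothetical whitelist of (module name, class name) pairs.
ALLOWED = set([('collections', 'OrderedDict')])


class RestrictedUnpickler(pickle.Unpickler):
    """Refuse to look up any global that is not explicitly allowed."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return pickle.Unpickler.find_class(self, module, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads(), but using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers need no global lookup; OrderedDict is on the whitelist.
plain = restricted_loads(pickle.dumps([1, 2, 3]))
ordered = restricted_loads(pickle.dumps(collections.OrderedDict([('a', 1)])))

# A pickle that names any other global is rejected.
try:
    restricted_loads(pickle.dumps(len))
    rejected = False
except pickle.UnpicklingError:
    rejected = True
```

A whitelist of this kind narrows what a pickle can reference, but it does not
make unpickling untrusted data safe.
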
693
694.. _pickle-example:
695
696Example
697-------
698
699For the simplest code, use the :func:`dump` and :func:`load` functions.  Note
700that a self-referencing list is pickled and restored correctly. ::
701
   import pickle

   data1 = {'a': [1, 2.0, 3, 4+6j],
            'b': ('string', u'Unicode string'),
            'c': None}

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   output = open('data.pkl', 'wb')

   # Pickle dictionary using protocol 0.
   pickle.dump(data1, output)

   # Pickle the list using the highest protocol available.
   pickle.dump(selfref_list, output, -1)

   output.close()
720
721The following example reads the resulting pickled data.  When reading a
722pickle-containing file, you should open the file in binary mode because you
723can't be sure if the ASCII or binary format was used. ::
724
725   import pprint, pickle
726
727   pkl_file = open('data.pkl', 'rb')
728
729   data1 = pickle.load(pkl_file)
730   pprint.pprint(data1)
731
732   data2 = pickle.load(pkl_file)
733   pprint.pprint(data2)
734
735   pkl_file.close()
736
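To confirm that the self-reference survives, the same list can be round-tripped
in memory with :func:`dumps` and :func:`loads` instead of a file.  This is only
a sketch, and the variable names are illustrative. ::

   import pickle

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   # Round-trip through a string using the highest protocol available.
   restored = pickle.loads(pickle.dumps(selfref_list, -1))

   # The restored list is a new object ...
   is_new_object = restored is not selfref_list
   # ... whose last element is the restored list itself, not a copy of it.
   is_self_referencing = restored[3] is restored

Both ``is_new_object`` and ``is_self_referencing`` are true, because the
pickler records the list before it descends into the list's elements.
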
737Here's a larger example that shows how to modify pickling behavior for a class.
738The :class:`TextReader` class opens a text file, and returns the line number and
739line contents each time its :meth:`!readline` method is called. If a
740:class:`TextReader` instance is pickled, all attributes *except* the file object
741member are saved. When the instance is unpickled, the file is reopened, and
742reading resumes from the last location. The :meth:`__setstate__` and
743:meth:`__getstate__` methods are used to implement this behavior. ::
744
745   #!/usr/local/bin/python
746
747   class TextReader:
748       """Print and number lines in a text file."""
749       def __init__(self, file):
750           self.file = file
751           self.fh = open(file)
752           self.lineno = 0
753
754       def readline(self):
755           self.lineno = self.lineno + 1
756           line = self.fh.readline()
757           if not line:
758               return None
759           if line.endswith("\n"):
760               line = line[:-1]
761           return "%d: %s" % (self.lineno, line)
762
763       def __getstate__(self):
764           odict = self.__dict__.copy() # copy the dict since we change it
765           del odict['fh']              # remove filehandle entry
766           return odict
767
       def __setstate__(self, state):
           fh = open(state['file'])     # reopen file
           count = state['lineno']      # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(state)  # update attributes
           self.fh = fh                 # save the file object
776
777A sample usage might be something like this::
778
779   >>> import TextReader
780   >>> obj = TextReader.TextReader("TextReader.py")
781   >>> obj.readline()
782   '1: #!/usr/local/bin/python'
783   >>> obj.readline()
784   '2: '
785   >>> obj.readline()
786   '3: class TextReader:'
787   >>> import pickle
788   >>> pickle.dump(obj, open('save.p', 'wb'))
789
If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing.  What follows can happen from either
the same process or a new process. ::
793
794   >>> import pickle
795   >>> reader = pickle.load(open('save.p', 'rb'))
796   >>> reader.readline()
797   '4:     """Print and number lines in a text file."""'
798
799
800.. seealso::
801
802   Module :mod:`copy_reg`
803      Pickle interface constructor registration for extension types.
804
805   Module :mod:`shelve`
806      Indexed databases of objects; uses :mod:`pickle`.
807
808   Module :mod:`copy`
809      Shallow and deep object copying.
810
811   Module :mod:`marshal`
812      High-performance serialization of built-in types.
813
814
815:mod:`cPickle` --- A faster :mod:`pickle`
816=========================================
817
818.. module:: cPickle
819   :synopsis: Faster version of pickle, but not subclassable.
820.. moduleauthor:: Jim Fulton <jim@zope.com>
821.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
822
823
824.. index:: module: pickle
825
826The :mod:`cPickle` module supports serialization and de-serialization of Python
827objects, providing an interface and functionality nearly identical to the
828:mod:`pickle` module.  There are several differences, the most important being
829performance and subclassability.
830
831First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
832the former is implemented in C.  Second, in the :mod:`cPickle` module the
833callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
834This means that you cannot use them to derive custom pickling and unpickling
835subclasses.  Most applications have no need for this functionality and should
836benefit from the greatly improved performance of the :mod:`cPickle` module.
837
The pickle data streams produced by :mod:`pickle` and :mod:`cPickle` are
identical, so it is possible to use :mod:`pickle` and :mod:`cPickle`
interchangeably with existing pickles. [#]_
841
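The interchangeability can be checked with a short sketch such as the following.
The helper ``crosses_over`` and the sample data are hypothetical, and the
fallback import is an assumption made so that the code still runs on an
interpreter without :mod:`cPickle`. ::

   import pickle

   try:
       import cPickle
   except ImportError:
       # Assumption: no cPickle here; fall back to pickle so the sketch runs.
       cPickle = pickle

   data = {'a': [1, 2.0, 3, 4+6j], 'b': ('string', None)}

   def crosses_over(obj, protocol):
       """Return True if each module can read the other's pickle of obj."""
       written_by_pickle = cPickle.loads(pickle.dumps(obj, protocol)) == obj
       written_by_cpickle = pickle.loads(cPickle.dumps(obj, protocol)) == obj
       return written_by_pickle and written_by_cpickle

   results = [crosses_over(data, protocol) for protocol in (0, 1, 2)]

Each entry of ``results`` is expected to be true, one per protocol tried.
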
There are additional minor differences in API between :mod:`cPickle` and
:mod:`pickle`, but for most applications they are interchangeable.  More
documentation is provided in the :mod:`pickle` module documentation, which
includes a list of the documented differences.
846
847.. rubric:: Footnotes
848
.. [#] Don't confuse this with the :mod:`marshal` module.
850
851.. [#] In the :mod:`pickle` module these callables are classes, which you could
852   subclass to customize the behavior.  However, in the :mod:`cPickle` module these
853   callables are factory functions and so cannot be subclassed.  One common reason
854   to subclass is to control what objects can actually be unpickled.  See section
855   :ref:`pickle-sub` for more details.
856
857.. [#] *Warning*: this is intended for pickling multiple objects without intervening
858   modifications to the objects or their parts.  If you modify an object and then
859   pickle it again using the same :class:`Pickler` instance, the object is not
860   pickled again --- a reference to it is pickled and the :class:`Unpickler` will
   return the old value, not the modified one.  There are two problems here: (1)
   detecting changes, and (2) marshalling a minimal set of changes.  Garbage
   collection may also become a problem here.
864
865.. [#] The exception raised will likely be an :exc:`ImportError` or an
866   :exc:`AttributeError` but it could be something else.
867
868.. [#] These methods can also be used to implement copying class instances.
869
870.. [#] This protocol is also used by the shallow and deep copying operations defined in
871   the :mod:`copy` module.
872
.. [#] The actual mechanism for associating these user-defined functions is slightly
874   different for :mod:`pickle` and :mod:`cPickle`.  The description given here
875   works the same for both implementations.  Users of the :mod:`pickle` module
876   could also use subclassing to effect the same results, overriding the
877   :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
878   classes.
879
880.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
881   in their living rooms.
882
883.. [#] A word of caution: the mechanisms described here use internal attributes and
884   methods, which are subject to change in future versions of Python.  We intend to
885   someday provide a common interface for controlling this behavior, which will
886   work in either :mod:`pickle` or :mod:`cPickle`.
887
888.. [#] Since the pickle data format is actually a tiny stack-oriented programming
889   language, and some freedom is taken in the encodings of certain objects, it is
890   possible that the two modules produce different data streams for the same input
   objects.  However, it is guaranteed that they will always be able to read each
892   other's data streams.
893
894