.. _pyporting-howto:

*********************************
Porting Python 2 Code to Python 3
*********************************

:author: Brett Cannon

.. topic:: Abstract

   With Python 3 being the future of Python while Python 2 is still in active
   use, it is good to have your project available for both major releases of
   Python. This guide is meant to help you figure out how best to support both
   Python 2 & 3 simultaneously.

   If you are looking to port an extension module instead of pure Python code,
   please see :ref:`cporting-howto`.

   If you would like to read one core Python developer's take on why Python 3
   came into existence, you can read Nick Coghlan's `Python 3 Q & A`_ or
   Brett Cannon's `Why Python 3 exists`_.

   For help with porting, you can email the python-porting_ mailing list with
   questions.

The Short Explanation
=====================

To make your project be single-source Python 2/3 compatible, the basic steps
are:

#. Only worry about supporting Python 2.7
#. Make sure you have good test coverage (coverage.py_ can help;
   ``pip install coverage``)
#. Learn the differences between Python 2 & 3
#. Use Futurize_ (or Modernize_) to update your code (e.g. ``pip install future``)
#. Use Pylint_ to help make sure you don't regress on your Python 3 support
   (``pip install pylint``)
#. Use caniusepython3_ to find out which of your dependencies are blocking your
   use of Python 3 (``pip install caniusepython3``)
#. Once your dependencies are no longer blocking you, use continuous integration
   to make sure you stay compatible with Python 2 & 3 (tox_ can help test
   against multiple versions of Python; ``pip install tox``)
#. Consider using optional static type checking to make sure your type usage
   works in both Python 2 & 3 (e.g. use mypy_ to check your typing under both
   Python 2 & Python 3).


Details
=======

A key point about supporting Python 2 & 3 simultaneously is that you can start
**today**! Even if your dependencies are not supporting Python 3 yet, that does
not mean you can't modernize your code **now** to support Python 3. Most changes
required to support Python 3 lead to cleaner code using newer practices even in
Python 2 code.

Another key point is that modernizing your Python 2 code to also support
Python 3 is largely automated for you. While you might have to make some API
decisions thanks to Python 3 clarifying text data versus binary data, the
lower-level work is now mostly done for you, and thus you can at least benefit
from the automated changes immediately.

Keep those key points in mind while you read on about the details of porting
your code to support Python 2 & 3 simultaneously.


Drop support for Python 2.6 and older
-------------------------------------

While you can make Python 2.5 work with Python 3, it is **much** easier if you
only have to work with Python 2.7. If dropping Python 2.5 is not an
option then the six_ project can help you support Python 2.5 & 3 simultaneously
(``pip install six``). Do realize, though, that nearly all the projects listed
in this HOWTO will not be available to you.

If you are able to skip Python 2.5 and older, then the required changes
to your code should continue to look and feel like idiomatic Python code. At
worst you will have to use a function instead of a method in some instances or
have to import a function instead of using a built-in one, but otherwise the
overall transformation should not feel foreign to you.

But you should aim for only supporting Python 2.7. Python 2.6 is no longer
freely supported and thus is not receiving bugfixes. This means **you** will have
to work around any issues you come across with Python 2.6. There are also some
tools mentioned in this HOWTO which do not support Python 2.6 (e.g., Pylint_),
and this will become more commonplace as time goes on. It will simply be easier
for you if you only support the versions of Python that you have to support.


Make sure you specify the proper version support in your ``setup.py`` file
--------------------------------------------------------------------------

In your ``setup.py`` file you should have the proper `trove classifier`_
specifying what versions of Python you support. As your project does not support
Python 3 yet you should at least have
``Programming Language :: Python :: 2 :: Only`` specified. Ideally you should
also specify each major/minor version of Python that you do support, e.g.
``Programming Language :: Python :: 2.7``.


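As a minimal sketch, the classifiers for a project that so far only supports
Python 2.7 might be collected as shown below. The ``CLASSIFIERS`` name is
hypothetical; the list would be passed as the ``classifiers`` argument of the
``setup()`` call in your ``setup.py``::

  # Hypothetical excerpt from setup.py for a project that only supports
  # Python 2.7 so far.
  CLASSIFIERS = [
      "Programming Language :: Python :: 2",
      "Programming Language :: Python :: 2 :: Only",
      "Programming Language :: Python :: 2.7",
  ]

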
Have good test coverage
-----------------------

Once you have your code supporting the oldest version of Python 2 you want it
to, you will want to make sure your test suite has good coverage. A good rule of
thumb is that you want to be confident enough in your test suite that any
failures that appear after having tools rewrite your code are actual bugs in the
tools and not in your code. If you want a number to aim for, try to get over 80%
coverage (and don't feel bad if you find it hard to get better than 90%
coverage). If you don't already have a tool to measure test coverage then
coverage.py_ is recommended.


Learn the differences between Python 2 & 3
-------------------------------------------

Once you have your code well-tested you are ready to begin porting your code to
Python 3! But to fully understand how your code is going to change and what
you want to look out for while you code, you will want to learn what changes
Python 3 makes relative to Python 2. Typically the two best ways of doing that
are reading the `"What's New"`_ doc for each release of Python 3 and the
`Porting to Python 3`_ book (which is free online). There is also a handy
`cheat sheet`_ from the Python-Future project.


Update your code
----------------

Once you feel like you know what is different in Python 3 compared to Python 2,
it's time to update your code! You have a choice between two tools in porting
your code automatically: Futurize_ and Modernize_. Which tool you choose will
depend on how much like Python 3 you want your code to be. Futurize_ does its
best to make Python 3 idioms and practices exist in Python 2, e.g. backporting
the ``bytes`` type from Python 3 so that you have semantic parity between the
major versions of Python. Modernize_,
on the other hand, is more conservative and targets a Python 2/3 subset of
Python, directly relying on six_ to help provide compatibility. As Python 3 is
the future, it might be best to consider Futurize to begin adjusting to any new
practices that Python 3 introduces which you are not accustomed to yet.

Regardless of which tool you choose, they will update your code to run under
Python 3 while staying compatible with the version of Python 2 you started with.
Depending on how conservative you want to be, you may want to run the tool over
your test suite first and visually inspect the diff to make sure the
transformation is accurate. After you have transformed your test suite and
verified that all the tests still pass as expected, then you can transform your
application code knowing that any test which fails is a translation failure.

Unfortunately the tools can't automate everything to make your code work under
Python 3 and so there are a handful of things you will need to update manually
to get full Python 3 support (which of these steps are necessary varies between
the tools). Read the documentation for the tool you choose to use to see what it
fixes by default and what it can do optionally to know what will (not) be fixed
for you and what you may have to fix on your own (e.g. using ``io.open()`` over
the built-in ``open()`` function is off by default in Modernize). Luckily,
though, there are only a couple of things to watch out for which can be
considered large issues that may be hard to debug if not watched for.


Division
++++++++

In Python 3, ``5 / 2 == 2.5`` and not ``2``; all division between ``int`` values
results in a ``float``. This change has actually been planned since Python 2.2
which was released in 2001. Since then users have been encouraged to add
``from __future__ import division`` to any and all files which use the ``/`` and
``//`` operators or to be running the interpreter with the ``-Q`` flag. If you
have not been doing this then you will need to go through your code and do two
things:

#. Add ``from __future__ import division`` to your files
#. Update any division operator as necessary to either use ``//`` to use floor
   division or continue using ``/`` and expect a float

The reason that ``/`` isn't simply translated to ``//`` automatically is that if
an object defines a ``__truediv__`` method but not ``__floordiv__`` then your
code would begin to fail (e.g. a user-defined class that uses ``/`` to
signify some operation but not ``//`` for the same thing or at all).


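The following sketch illustrates both points under the assumption that the
``__future__`` import is the first statement of the module. ``Ratio`` is a
hypothetical class invented for this example; it defines ``__truediv__`` but not
``__floordiv__``, so blindly rewriting ``/`` to ``//`` would break it::

  from __future__ import division  # no-op on Python 3, needed on Python 2


  class Ratio(object):
      """Hypothetical class that supports ``/`` but not ``//``."""

      def __init__(self, value):
          self.value = value

      def __truediv__(self, other):
          return Ratio(self.value / other)


  true_quotient = 5 / 2    # true division of two ints yields a float
  floor_quotient = 5 // 2  # floor division keeps the result an int

  half = Ratio(3) / 2      # works because Ratio defines __truediv__
  try:
      Ratio(3) // 2        # Ratio defines no __floordiv__
  except TypeError:
      floor_supported = False
  else:
      floor_supported = True

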
Text versus binary data
+++++++++++++++++++++++

In Python 2 you could use the ``str`` type for both text and binary data.
Unfortunately this conflation of two different concepts could lead to brittle
code which sometimes worked for either kind of data, sometimes not. It also
could lead to confusing APIs if people didn't explicitly state that something
that accepted ``str`` accepted either text or binary data instead of one
specific type. This complicated the situation especially for anyone supporting
multiple languages as APIs wouldn't bother explicitly supporting ``unicode``
when they claimed text data support.

To make the distinction between text and binary data clearer and more
pronounced, Python 3 did what most languages created in the age of the internet
have done and made text and binary data distinct types that cannot blindly be
mixed together (Python predates widespread access to the internet). For any code
that deals only with text or only binary data, this separation doesn't pose an
issue. But for code that has to deal with both, it does mean you might have to
now care about when you are using text compared to binary data, which is why
this cannot be entirely automated.

To start, you will need to decide which APIs take text and which take binary
(it is **highly** recommended you don't design APIs that can take both due to
the difficulty of keeping the code working; as stated earlier it is difficult to
do well). In Python 2 this means making sure the APIs that take text can work
with ``unicode`` and those that work with binary data work with the
``bytes`` type from Python 3 (which is a subset of ``str`` in Python 2, where
``bytes`` acts as an alias for ``str``). Usually the biggest issue is
realizing which methods exist on which types in Python 2 & 3 simultaneously
(for text that's ``unicode`` in Python 2 and ``str`` in Python 3, for binary
that's ``str``/``bytes`` in Python 2 and ``bytes`` in Python 3). The following
table lists the **unique** methods of each data type across Python 2 & 3
(e.g., the ``decode()`` method is usable on the equivalent binary data type in
either Python 2 or 3, but it can't be used by the textual data type consistently
between Python 2 and 3 because ``str`` in Python 3 doesn't have the method). Do
note that as of Python 3.5 the ``__mod__`` method was added to the bytes type.

======================== =====================
**Text data**            **Binary data**
------------------------ ---------------------
\                        decode
------------------------ ---------------------
encode
------------------------ ---------------------
format
------------------------ ---------------------
isdecimal
------------------------ ---------------------
isnumeric
======================== =====================

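As a small illustration of the table, the sketch below only calls each method
on the data type that offers it in both major versions. The sample values and
variable names are made up for the example::

  text = u'caf\xe9'        # textual data: unicode on Python 2, str on Python 3
  binary = b'caf\xc3\xa9'  # binary data: str on Python 2, bytes on Python 3

  # Methods that are safe to call on both major versions.
  decoded = binary.decode('utf-8')
  encoded = text.encode('utf-8')
  formatted = u'{}!'.format(text)
  is_number = u'42'.isdecimal() and u'42'.isnumeric()

  # The binary type has no isdecimal() method on either version.
  binary_has_isdecimal = hasattr(binary, 'isdecimal')
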
Making the distinction easier to handle can be accomplished by encoding and
decoding between binary data and text at the edge of your code. This means that
when you receive text in binary data, you should immediately decode it. And if
your code needs to send text as binary data then encode it as late as possible.
This allows your code to work with only text internally and thus eliminates
having to keep track of what type of data you are working with.

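A minimal sketch of this "decode early, encode late" pattern follows. The
``shout()`` helper and the choice of UTF-8 as the default encoding are
assumptions made for illustration only::

  def shout(raw_line, encoding='utf-8'):
      """Hypothetical helper: bytes in, bytes out, text in between."""
      text = raw_line.decode(encoding)    # decode as soon as the data arrives
      text = text.strip().upper() + u'!'  # all internal work happens on text
      return text.encode(encoding)        # encode only when sending data out


  reply = shout(b'  caf\xc3\xa9\n')

Because the function body only ever sees text, it never has to track which
kind of data it is holding.
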
The next issue is making sure you know whether the string literals in your code
represent text or binary data. You should add a ``b`` prefix to any
literal that represents binary data. For text you should add a ``u`` prefix to
the text literal. (There is a :mod:`__future__` import to force all unspecified
literals to be Unicode, but usage has shown it isn't as effective as adding a
``b`` or ``u`` prefix to all literals explicitly.)

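For instance, with explicit prefixes the type of each literal is the same kind
of data on both major versions (the ``u`` prefix is accepted again as of
Python 3.3). The constant names below are purely illustrative::

  PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'  # binary data, marked with ``b``
  GREETING = u'Hello, world'            # text, marked with ``u``

  signature_is_binary = isinstance(PNG_SIGNATURE, bytes)
  greeting_is_binary = isinstance(GREETING, bytes)
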
As part of this dichotomy you also need to be careful about opening files.
Unless you have been working on Windows, there is a chance you have not always
bothered to add the ``b`` mode when opening a binary file (e.g., ``rb`` for
binary reading). Under Python 3, binary files and text files are clearly
distinct and mutually incompatible; see the :mod:`io` module for details.
Therefore, you **must** make a decision of whether a file will be used for
binary access (allowing binary data to be read and/or written) or textual access
(allowing text data to be read and/or written). You should also use :func:`io.open`
for opening files instead of the built-in :func:`open` function as the :mod:`io`
module is consistent from Python 2 to 3 while the built-in :func:`open` function
is not (in Python 3 it's actually :func:`io.open`). Do not bother with the
outdated practice of using :func:`codecs.open` as that's only necessary for
keeping compatibility with Python 2.5.

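The sketch below shows textual and binary access to the same file through
:func:`io.open`. The temporary directory, the file name and the UTF-8 encoding
are assumptions chosen for the example::

  import io
  import os
  import shutil
  import tempfile

  # A throwaway location used only for this illustration.
  directory = tempfile.mkdtemp()
  path = os.path.join(directory, 'example.txt')

  # Textual access: io.open() accepts and returns text on both versions.
  with io.open(path, 'w', encoding='utf-8') as f:
      f.write(u'caf\xe9\n')

  with io.open(path, 'r', encoding='utf-8') as f:
      text = f.read()

  # Binary access: the ``b`` mode returns the raw, still-encoded bytes.
  with io.open(path, 'rb') as f:
      raw = f.read()

  shutil.rmtree(directory)
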
The constructors of both ``str`` and ``bytes`` have different semantics for the
same arguments between Python 2 & 3. Passing an integer to ``bytes`` in Python 2
will give you the string representation of the integer: ``bytes(3) == '3'``.
But in Python 3, an integer argument to ``bytes`` will give you a bytes object
as long as the integer specified, filled with null bytes:
``bytes(3) == b'\x00\x00\x00'``. A similar worry is necessary when passing a
bytes object to ``str``. In Python 2 you just get the bytes object back:
``str(b'3') == b'3'``. But in Python 3 you get the string representation of the
bytes object: ``str(b'3') == "b'3'"``.

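One way to sidestep these differences, sketched here as a suggestion rather
than a rule, is to spell out the intended result instead of relying on the
constructors. The variable names are illustrative::

  count = 3

  # Spellings that give the same result on both major versions.
  digits = str(count).encode('ascii')  # the digits of the integer, as bytes
  padding = b'\x00' * count            # ``count`` null bytes
  decoded = b'3'.decode('ascii')       # bytes to text without a repr
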
Finally, the indexing of binary data requires careful handling (slicing does
**not** require any special handling). In Python 2,
``b'123'[1] == b'2'`` while in Python 3 ``b'123'[1] == 50``. Because binary data
is simply a collection of binary numbers, Python 3 returns the integer value for
the byte you index on. But in Python 2 because ``bytes == str``, indexing
returns a one-item slice of bytes. The six_ project has a function
named ``six.indexbytes()`` which will return an integer like in Python 3:
``six.indexbytes(b'123', 1)``.

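If you would rather not depend on six_ for this, the following sketch shows
standard-library spellings that behave the same on both major versions. These
are alternatives to ``six.indexbytes()``, not what that function does
internally::

  data = b'123'

  one_byte = data[1:2]                # slicing yields a bytes object
  as_int = ord(data[1:2])             # integer value of that single byte
  via_bytearray = bytearray(data)[1]  # bytearray indexing yields an integer
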
To summarize:

#. Decide which of your APIs take text and which take binary data
#. Make sure that your code that works with text also works with ``unicode`` and
   code for binary data works with ``bytes`` in Python 2 (see the table above
   for what methods you cannot use for each type)
#. Mark all binary literals with a ``b`` prefix, textual literals with a ``u``
   prefix
#. Decode binary data to text as soon as possible, encode text as binary data as
   late as possible
#. Open files using :func:`io.open` and make sure to specify the ``b`` mode when
   appropriate
#. Be careful when indexing into binary data


Use feature detection instead of version detection
++++++++++++++++++++++++++++++++++++++++++++++++++

Inevitably you will have code that has to choose what to do based on what
version of Python is running. The best way to do this is with feature detection
of whether the version of Python you're running under supports what you need.
If for some reason that doesn't work then you should make the version check be
against Python 2 and not Python 3. To help explain this, let's look at an
example.

Let's pretend that you need access to a feature of importlib_ that
is available in Python's standard library since Python 3.3 and available for
Python 2 through importlib2_ on PyPI. You might be tempted to write code to
access e.g. the ``importlib.abc`` module by doing the following::

  import sys

  if sys.version_info[0] == 3:
      from importlib import abc
  else:
      from importlib2 import abc

The problem with this code is: what happens when Python 4 comes out? It would
be better to treat Python 2 as the exceptional case instead of Python 3 and
assume that future Python versions will be more compatible with Python 3 than
Python 2::

  import sys

  if sys.version_info[0] > 2:
      from importlib import abc
  else:
      from importlib2 import abc

The best solution, though, is to do no version detection at all and instead rely
on feature detection. That avoids any potential issues of getting the version
detection wrong and helps keep you future-compatible::

  try:
      from importlib import abc
  except ImportError:
      from importlib2 import abc


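The same pattern applies to standard library modules that moved between
Python 2 and 3. As an illustrative sketch that needs no third-party package,
``urlparse()`` can be imported from whichever location exists::

  try:
      from urllib.parse import urlparse  # Python 3 location
  except ImportError:
      from urlparse import urlparse      # Python 2 location

  host = urlparse('https://docs.python.org/3/howto/').netloc

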
Prevent compatibility regressions
---------------------------------

Once you have fully translated your code to be compatible with Python 3, you
will want to make sure your code doesn't regress and stop working under
Python 3. This is especially true if you have a dependency which is blocking you
from actually running under Python 3 at the moment.

To help with staying compatible, any new modules you create should have
at least the following block of code at the top of it::

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

You can also run Python 2 with the ``-3`` flag to be warned about various
compatibility issues your code triggers during execution. If you turn warnings
into errors with ``-Werror`` then you can make sure that you don't accidentally
miss a warning.

You can also use the Pylint_ project and its ``--py3k`` flag to lint your code
to receive warnings when your code begins to deviate from Python 3
compatibility. This also prevents you from having to run Modernize_ or Futurize_
over your code regularly to catch compatibility regressions. This does require
you only support Python 2.7 and Python 3.4 or newer as that is Pylint's
minimum Python version support.


Check which dependencies block your transition
----------------------------------------------

**After** you have made your code compatible with Python 3 you should begin to
care about whether your dependencies have also been ported. The caniusepython3_
project was created to help you determine which projects
-- directly or indirectly -- are blocking you from supporting Python 3. There
is both a command-line tool and a web interface at
https://caniusepython3.com.

The project also provides code which you can integrate into your test suite so
that you will have a failing test when you no longer have dependencies blocking
you from using Python 3. This allows you to avoid having to manually check your
dependencies and to be notified quickly when you can start running on Python 3.


Update your ``setup.py`` file to denote Python 3 compatibility
--------------------------------------------------------------

Once your code works under Python 3, you should update the classifiers in
your ``setup.py`` to contain ``Programming Language :: Python :: 3`` and to not
specify sole Python 2 support. This will tell anyone using your code that you
support Python 2 **and** 3. Ideally you will also want to add classifiers for
each major/minor version of Python you now support.


Use continuous integration to stay compatible
---------------------------------------------

Once you are able to fully run under Python 3 you will want to make sure your
code always works under both Python 2 & 3. Probably the best tool for running
your tests under multiple Python interpreters is tox_. You can then integrate
tox with your continuous integration system so that you never accidentally break
Python 2 or 3 support.

You may also want to use the ``-bb`` flag with the Python 3 interpreter to
trigger an exception when you are comparing bytes to strings or bytes to an int
(the latter is available starting in Python 3.5). By default type-differing
comparisons simply return ``False``, but if you made a mistake in your
separation of text/binary data handling or indexing on bytes you wouldn't easily
find the mistake. This flag will raise an exception when these kinds of
comparisons occur, making the mistake much easier to track down.

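To make the failure mode concrete, here is a small sketch of the kind of
comparison the flag catches. The ``matches()`` helper is hypothetical;
``sys.flags.bytes_warning`` is ``0`` by default and ``2`` when the interpreter
is started with ``-bb``::

  import sys


  def matches(binary, text):
      """Hypothetical check that accidentally mixes bytes and text."""
      return binary == text


  if sys.flags.bytes_warning < 2:
      # Without -bb the mismatch goes unnoticed on Python 3: the comparison
      # simply evaluates to False.
      result = matches(b'abc', u'abc')
  else:
      # With -bb Python 3 raises BytesWarning, pointing at the mistake.
      try:
          result = matches(b'abc', u'abc')
      except BytesWarning:
          result = 'BytesWarning raised'
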
And that's mostly it! At this point your code base is compatible with both
Python 2 and 3 simultaneously. Your testing will also be set up so that you
don't accidentally break Python 2 or 3 compatibility regardless of which version
you typically run your tests under while developing.


Consider using optional static type checking
--------------------------------------------

Another way to help port your code is to use a static type checker like
mypy_ or pytype_ on your code. These tools can be used to analyze your code as
if it's being run under Python 2, then you can run the tool a second time as if
your code is running under Python 3. By running a static type checker twice like
this you can discover if you're e.g. misusing a binary data type in one version
of Python compared to another. If you add optional type hints to your code you
can also explicitly state whether your APIs use textual or binary data, helping
to make sure everything functions as expected in both versions of Python.


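As a sketch of what such hints could look like in single-source code, the
example below uses comment-style annotations, which are also valid syntax in
Python 2. Both functions are hypothetical, and the example assumes the
``typing`` module is available (it is part of the standard library as of
Python 3.5 and installable from PyPI for Python 2.7)::

  from typing import Text


  def greet(name):
      # type: (Text) -> Text
      """Hypothetical API that accepts and returns textual data only."""
      return u'Hello, ' + name


  def checksum(payload):
      # type: (bytes) -> int
      """Hypothetical API that accepts binary data only."""
      return sum(bytearray(payload)) % 256

``Text`` stands for ``unicode`` on Python 2 and ``str`` on Python 3, so a
checker can flag a call such as ``greet(b'Ada')`` on Python 3.

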
.. _2to3: https://docs.python.org/3/library/2to3.html
.. _caniusepython3: https://pypi.org/project/caniusepython3
.. _cheat sheet: http://python-future.org/compatible_idioms.html
.. _coverage.py: https://pypi.org/project/coverage
.. _Futurize: http://python-future.org/automatic_conversion.html
.. _importlib: https://docs.python.org/3/library/importlib.html#module-importlib
.. _importlib2: https://pypi.org/project/importlib2
.. _Modernize: https://python-modernize.readthedocs.org/en/latest/
.. _mypy: http://mypy-lang.org/
.. _Porting to Python 3: http://python3porting.com/
.. _Pylint: https://pypi.org/project/pylint

.. _Python 3 Q & A: https://ncoghlan-devs-python-notes.readthedocs.org/en/latest/python3/questions_and_answers.html

.. _pytype: https://github.com/google/pytype
.. _python-future: http://python-future.org/
.. _python-porting: https://mail.python.org/mailman/listinfo/python-porting
.. _six: https://pypi.org/project/six
.. _tox: https://pypi.org/project/tox
.. _trove classifier: https://pypi.org/classifiers

.. _"What's New": https://docs.python.org/3/whatsnew/index.html

.. _Why Python 3 exists: http://www.snarky.ca/why-python-3-exists