1:mod:`gettext` --- Multilingual internationalization services
2=============================================================
3
4.. module:: gettext
5   :synopsis: Multilingual internationalization services.
6
7.. moduleauthor:: Barry A. Warsaw <barry@python.org>
8.. sectionauthor:: Barry A. Warsaw <barry@python.org>
9
10**Source code:** :source:`Lib/gettext.py`
11
12--------------
13
The :mod:`gettext` module provides internationalization (I18N) and localization
(L10N) services for your Python modules and applications. It supports both the
GNU ``gettext`` message catalog API and a higher-level, class-based API that may
be more appropriate for Python files.  The interface described below allows you
to write your module and application messages in one natural language, and
provide a catalog of translated messages for running under different natural
languages.
21
22Some hints on localizing your Python modules and applications are also given.
23
24
25GNU :program:`gettext` API
26--------------------------
27
28The :mod:`gettext` module defines the following API, which is very similar to
29the GNU :program:`gettext` API.  If you use this API you will affect the
30translation of your entire application globally.  Often this is what you want if
31your application is monolingual, with the choice of language dependent on the
32locale of your user.  If you are localizing a Python module, or if your
33application needs to switch languages on the fly, you probably want to use the
34class-based API instead.
35
36
37.. function:: bindtextdomain(domain, localedir=None)
38
   Bind the *domain* to the locale directory *localedir*.  More concretely,
   :mod:`gettext` will look for binary :file:`.mo` files for the given domain using
   the path (on Unix): :file:`localedir/language/LC_MESSAGES/domain.mo`, where
   *language* is searched for in the environment variables :envvar:`LANGUAGE`,
   :envvar:`LC_ALL`, :envvar:`LC_MESSAGES`, and :envvar:`LANG`, in that order.
44
45   If *localedir* is omitted or ``None``, then the current binding for *domain* is
46   returned. [#]_
47
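   A minimal sketch of the query behaviour, assuming a made-up domain
   ``example_domain`` and a made-up directory (it does not have to exist for
   the binding to be recorded)::

      import gettext

      # Bind the hypothetical domain to a hypothetical locale directory.
      bound = gettext.bindtextdomain('example_domain', '/tmp/example_locale')

      # Omitting localedir only queries the binding that was just recorded.
      queried = gettext.bindtextdomain('example_domain')

   Both calls return the bound directory, so ``bound`` and ``queried`` are equal.
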
48
49.. function:: bind_textdomain_codeset(domain, codeset=None)
50
51   Bind the *domain* to *codeset*, changing the encoding of byte strings
52   returned by the :func:`lgettext`, :func:`ldgettext`, :func:`lngettext`
53   and :func:`ldngettext` functions.
54   If *codeset* is omitted, then the current binding is returned.
55
56
57.. function:: textdomain(domain=None)
58
59   Change or query the current global domain.  If *domain* is ``None``, then the
60   current global domain is returned, otherwise the global domain is set to
61   *domain*, which is returned.
62
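   For illustration, a small sketch that queries, changes, and then restores
   the global domain; ``example_domain`` is a made-up name::

      import gettext

      previous = gettext.textdomain()                  # query only
      current = gettext.textdomain('example_domain')   # set; returns the new domain
      restored = gettext.textdomain(previous)          # put the old domain back
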
63
64.. index:: single: _ (underscore); gettext
65.. function:: gettext(message)
66
67   Return the localized translation of *message*, based on the current global
68   domain, language, and locale directory.  This function is usually aliased as
69   :func:`_` in the local namespace (see examples below).
70
71
72.. function:: dgettext(domain, message)
73
74   Like :func:`.gettext`, but look the message up in the specified *domain*.
75
76
77.. function:: ngettext(singular, plural, n)
78
79   Like :func:`.gettext`, but consider plural forms. If a translation is found,
80   apply the plural formula to *n*, and return the resulting message (some
81   languages have more than two plural forms). If no translation is found, return
82   *singular* if *n* is 1; return *plural* otherwise.
83
   The plural formula is taken from the catalog header. It is a C or Python
85   expression that has a free variable *n*; the expression evaluates to the index
86   of the plural in the catalog. See
87   `the GNU gettext documentation <https://www.gnu.org/software/gettext/manual/gettext.html>`__
88   for the precise syntax to be used in :file:`.po` files and the
89   formulas for a variety of languages.
90
91
92.. function:: dngettext(domain, singular, plural, n)
93
94   Like :func:`ngettext`, but look the message up in the specified *domain*.
95
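   The fallback rule can be sketched as follows, assuming that no catalog is
   installed for the made-up domain ``example_domain_without_catalog``::

      import gettext

      def describe(count):
          # No catalog exists for this hypothetical domain, so the singular
          # string is used only when count is 1 and the plural otherwise.
          template = gettext.dngettext('example_domain_without_catalog',
                                       '%(num)d file', '%(num)d files', count)
          return template % {'num': count}

      labels = [describe(0), describe(1), describe(2)]

   Note that zero takes the plural string under this default rule.
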
96
97.. function:: lgettext(message)
98.. function:: ldgettext(domain, message)
99.. function:: lngettext(singular, plural, n)
100.. function:: ldngettext(domain, singular, plural, n)
101
102   Equivalent to the corresponding functions without the ``l`` prefix
103   (:func:`.gettext`, :func:`dgettext`, :func:`ngettext` and :func:`dngettext`),
104   but the translation is returned as a byte string encoded in the preferred
105   system encoding if no other encoding was explicitly set with
106   :func:`bind_textdomain_codeset`.
107
108   .. warning::
109
110      These functions should be avoided in Python 3, because they return
111      encoded bytes.  It's much better to use alternatives which return
112      Unicode strings instead, since most Python applications will want to
113      manipulate human readable text as strings instead of bytes.  Further,
114      it's possible that you may get unexpected Unicode-related exceptions
115      if there are encoding problems with the translated strings.  It is
116      possible that the ``l*()`` functions will be deprecated in future Python
117      versions due to their inherent problems and limitations.
118
119
Note that GNU :program:`gettext` also defines a :func:`dcgettext` function, but
this was deemed not useful and so it is currently unimplemented.
122
123Here's an example of typical usage for this API::
124
   import gettext
   gettext.bindtextdomain('myapplication', '/path/to/my/language/directory')
   gettext.textdomain('myapplication')
   _ = gettext.gettext
   # ...
   print(_('This is a translatable string.'))
131
132
133Class-based API
134---------------
135
136The class-based API of the :mod:`gettext` module gives you more flexibility and
137greater convenience than the GNU :program:`gettext` API.  It is the recommended
138way of localizing your Python applications and modules.  :mod:`!gettext` defines
139a "translations" class which implements the parsing of GNU :file:`.mo` format
140files, and has methods for returning strings. Instances of this "translations"
141class can also install themselves in the built-in namespace as the function
142:func:`_`.
143
144
145.. function:: find(domain, localedir=None, languages=None, all=False)
146
   This function implements the standard :file:`.mo` file search algorithm.  It
   takes a *domain*, identical to what :func:`textdomain` takes.  Optional
   *localedir* is as in :func:`bindtextdomain`.  Optional *languages* is a list of
   strings, where each string is a language code.
151
   If *localedir* is not given, then the default system locale directory is used.
   [#]_  If *languages* is not given, then the following environment variables are
   searched: :envvar:`LANGUAGE`, :envvar:`LC_ALL`, :envvar:`LC_MESSAGES`, and
   :envvar:`LANG`.  The first one returning a non-empty value is used for the
   *languages* variable. The environment variables should contain a colon-separated
   list of languages, which will be split on the colon to produce the expected list
   of language code strings.
159
160   :func:`find` then expands and normalizes the languages, and then iterates
161   through them, searching for an existing file built of these components:
162
163   :file:`{localedir}/{language}/LC_MESSAGES/{domain}.mo`
164
   The first such file name that exists is returned by :func:`find`. If no such
   file is found, then ``None`` is returned. If *all* is true, it returns a list
   of all file names, in the order in which they appear in the languages list or
   the environment variables.
169
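   The search can be tried out in a temporary directory.  In this sketch the
   domain ``example`` and the directory layout are made up, and the catalog is
   an empty placeholder file, which is enough because :func:`find` only checks
   that the file exists::

      import gettext
      import os
      import tempfile

      with tempfile.TemporaryDirectory() as localedir:
          # Lay out <localedir>/fr/LC_MESSAGES/example.mo as an empty file.
          catalog_dir = os.path.join(localedir, 'fr', 'LC_MESSAGES')
          os.makedirs(catalog_dir)
          catalog_path = os.path.join(catalog_dir, 'example.mo')
          open(catalog_path, 'wb').close()

          # First match for French, no match for German.
          found = gettext.find('example', localedir, languages=['fr'])
          missing = gettext.find('example', localedir, languages=['de'])

          # With all=True a list of every match is returned instead.
          every = gettext.find('example', localedir,
                               languages=['de', 'fr'], all=True)
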
170
171.. function:: translation(domain, localedir=None, languages=None, class_=None, fallback=False, codeset=None)
172
   Return a :class:`*Translations` instance based on the *domain*, *localedir*,
   and *languages*, which are first passed to :func:`find` to get a list of the
   associated :file:`.mo` file paths.  Instances with identical :file:`.mo` file
   names are cached.  The actual class instantiated is *class_* if
   provided, otherwise :class:`GNUTranslations`.  The class's constructor must
   take a single :term:`file object` argument.  If provided, *codeset* will change
   the charset used to encode translated strings in the
   :meth:`~NullTranslations.lgettext` and :meth:`~NullTranslations.lngettext`
   methods.
182
183   If multiple files are found, later files are used as fallbacks for earlier ones.
184   To allow setting the fallback, :func:`copy.copy` is used to clone each
185   translation object from the cache; the actual instance data is still shared with
186   the cache.
187
188   If no :file:`.mo` file is found, this function raises :exc:`OSError` if
189   *fallback* is false (which is the default), and returns a
190   :class:`NullTranslations` instance if *fallback* is true.
191
192   .. versionchanged:: 3.3
193      :exc:`IOError` used to be raised instead of :exc:`OSError`.
194
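   A short sketch of the two outcomes when no catalog is found, using an empty
   temporary directory and a made-up domain name so that the lookup is
   guaranteed to fail::

      import gettext
      import tempfile

      with tempfile.TemporaryDirectory() as empty_localedir:
          try:
              gettext.translation('example_domain', empty_localedir,
                                  languages=['fr'])
          except OSError:
              raised = True    # default behaviour: fallback is false
          else:
              raised = False

          t = gettext.translation('example_domain', empty_localedir,
                                  languages=['fr'], fallback=True)

      # The NullTranslations fallback hands every message back unchanged.
      message = t.gettext('Hello')
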
195
196.. function:: install(domain, localedir=None, codeset=None, names=None)
197
198   This installs the function :func:`_` in Python's builtins namespace, based on
199   *domain*, *localedir*, and *codeset* which are passed to the function
200   :func:`translation`.
201
202   For the *names* parameter, please see the description of the translation
203   object's :meth:`~NullTranslations.install` method.
204
   As seen below, you usually mark the strings in your application that are
   candidates for translation by wrapping them in a call to the :func:`_`
   function, like this::

      print(_('This string will be translated.'))
210
211   For convenience, you want the :func:`_` function to be installed in Python's
212   builtins namespace, so it is easily accessible in all modules of your
213   application.
214
215
216The :class:`NullTranslations` class
217^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
218
219Translation classes are what actually implement the translation of original
220source file message strings to translated message strings. The base class used
221by all translation classes is :class:`NullTranslations`; this provides the basic
222interface you can use to write your own specialized translation classes.  Here
223are the methods of :class:`!NullTranslations`:
224
225
226.. class:: NullTranslations(fp=None)
227
228   Takes an optional :term:`file object` *fp*, which is ignored by the base class.
229   Initializes "protected" instance variables *_info* and *_charset* which are set
230   by derived classes, as well as *_fallback*, which is set through
231   :meth:`add_fallback`.  It then calls ``self._parse(fp)`` if *fp* is not
232   ``None``.
233
234   .. method:: _parse(fp)
235
      A no-op in the base class.  Derived classes override this method to read
      the data from the file object *fp* and initialize the message catalog.
      If you have an unsupported message catalog file format, you should
      override this method to parse your format.
240
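      As an illustration, the sketch below overrides :meth:`_parse` for a
      hypothetical line-based format with one ``msgid=translation`` pair per
      line.  The class name ``LineTranslations`` and the format itself are
      made up for this example and are not part of the module::

         import gettext
         import io

         class LineTranslations(gettext.NullTranslations):
             """Reads a made-up format: UTF-8 'msgid=translation' lines."""

             _catalog = {}   # used when no file object is passed in

             def _parse(self, fp):
                 self._catalog = {}
                 for line in fp.read().decode('utf-8').splitlines():
                     msgid, separator, msgstr = line.partition('=')
                     if separator:
                         self._catalog[msgid] = msgstr

             def gettext(self, message):
                 if message in self._catalog:
                     return self._catalog[message]
                 # Unknown message: the base class consults the fallback, if any.
                 return super().gettext(message)

         catalog = LineTranslations(io.BytesIO('Hello=Bonjour\n'.encode('utf-8')))
         greeting = catalog.gettext('Hello')
         unknown = catalog.gettext('Goodbye')
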
241
242   .. method:: add_fallback(fallback)
243
244      Add *fallback* as the fallback object for the current translation object.
245      A translation object should consult the fallback if it cannot provide a
246      translation for a given message.
247
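      A small sketch of the forwarding behaviour; ``Shouting`` is a made-up
      translation class used only to make the fallback visible::

         import gettext

         class Shouting(gettext.NullTranslations):
             """Made-up translation class that upper-cases every message."""

             def gettext(self, message):
                 return message.upper()

         primary = gettext.NullTranslations()
         before = primary.gettext('hello')   # no fallback yet: returned as is
         primary.add_fallback(Shouting())
         after = primary.gettext('hello')    # now forwarded to the fallback
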
248
249   .. method:: gettext(message)
250
251      If a fallback has been set, forward :meth:`!gettext` to the fallback.
252      Otherwise, return *message*.  Overridden in derived classes.
253
254
255   .. method:: ngettext(singular, plural, n)
256
257      If a fallback has been set, forward :meth:`!ngettext` to the fallback.
258      Otherwise, return *singular* if *n* is 1; return *plural* otherwise.
259      Overridden in derived classes.
260
261
262   .. method:: lgettext(message)
263   .. method:: lngettext(singular, plural, n)
264
265      Equivalent to :meth:`.gettext` and :meth:`.ngettext`, but the translation
266      is returned as a byte string encoded in the preferred system encoding
267      if no encoding was explicitly set with :meth:`set_output_charset`.
268      Overridden in derived classes.
269
270      .. warning::
271
272         These methods should be avoided in Python 3.  See the warning for the
273         :func:`lgettext` function.
274
275
276   .. method:: info()
277
278      Return the "protected" :attr:`_info` variable.
279
280
281   .. method:: charset()
282
283      Return the encoding of the message catalog file.
284
285
286   .. method:: output_charset()
287
288      Return the encoding used to return translated messages in :meth:`.lgettext`
289      and :meth:`.lngettext`.
290
291
292   .. method:: set_output_charset(charset)
293
294      Change the encoding used to return translated messages.
295
296
297   .. method:: install(names=None)
298
299      This method installs :meth:`.gettext` into the built-in namespace,
300      binding it to ``_``.
301
302      If the *names* parameter is given, it must be a sequence containing the
303      names of functions you want to install in the builtins namespace in
304      addition to :func:`_`.  Supported names are ``'gettext'``, ``'ngettext'``,
305      ``'lgettext'`` and ``'lngettext'``.
306
307      Note that this is only one way, albeit the most convenient way, to make
308      the :func:`_` function available to your application.  Because it affects
309      the entire application globally, and specifically the built-in namespace,
310      localized modules should never install :func:`_`. Instead, they should use
311      this code to make :func:`_` available to their module::
312
         import gettext
         t = gettext.translation('mymodule', ...)
         _ = t.gettext
316
317      This puts :func:`_` only in the module's global namespace and so only
318      affects calls within this module.
319
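      To see what installation does, here is a small sketch using the base
      class, which returns messages unchanged.  It removes the installed names
      again afterwards, which a real application would normally not do::

         import builtins
         import gettext

         t = gettext.NullTranslations()
         t.install(names=['ngettext'])

         # Both names now resolve through the built-in namespace.
         single = builtins._('spam')
         several = builtins.ngettext('egg', 'eggs', 3)

         # Undo the global change so that later code is not affected.
         del builtins._
         del builtins.ngettext
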
320
321The :class:`GNUTranslations` class
322^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
323
324The :mod:`gettext` module provides one additional class derived from
325:class:`NullTranslations`: :class:`GNUTranslations`.  This class overrides
326:meth:`_parse` to enable reading GNU :program:`gettext` format :file:`.mo` files
327in both big-endian and little-endian format.
328
329:class:`GNUTranslations` parses optional meta-data out of the translation
330catalog.  It is convention with GNU :program:`gettext` to include meta-data as
331the translation for the empty string.  This meta-data is in :rfc:`822`\ -style
332``key: value`` pairs, and should contain the ``Project-Id-Version`` key.  If the
333key ``Content-Type`` is found, then the ``charset`` property is used to
334initialize the "protected" :attr:`_charset` instance variable, defaulting to
335``None`` if not found.  If the charset encoding is specified, then all message
336ids and message strings read from the catalog are converted to Unicode using
337this encoding, else ASCII encoding is assumed.
338
Since message ids are read as Unicode strings too, all :meth:`*gettext` methods
assume that message ids are Unicode strings, not byte strings.
341
The entire set of key/value pairs is placed into a dictionary and set as the
"protected" :attr:`_info` instance variable.
344
345If the :file:`.mo` file's magic number is invalid, the major version number is
346unexpected, or if other problems occur while reading the file, instantiating a
347:class:`GNUTranslations` class can raise :exc:`OSError`.
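
The metadata handling described above can be tried out without any files on
disk.  The sketch below assembles a tiny little-endian :file:`.mo` image in
memory (a seven-word header, two tables of length/offset pairs, then
NUL-terminated strings) and feeds it to :class:`GNUTranslations`.  The helper
``build_mo`` and the sample catalog contents are made up for this example;
real catalogs are normally produced by :program:`msgfmt`::

   import gettext
   import io
   import struct

   def build_mo(messages):
       """Return little-endian .mo bytes for a {msgid: msgstr} dict of strings."""
       # The format stores entries sorted by msgid; the empty msgid that
       # carries the metadata therefore comes first.
       items = sorted((key.encode('utf-8'), value.encode('utf-8'))
                      for key, value in messages.items())
       count = len(items)
       ids = b''.join(key + b'\0' for key, value in items)
       strs = b''.join(value + b'\0' for key, value in items)
       key_table_start = 7 * 4                          # right after the header
       value_table_start = key_table_start + 8 * count
       id_offset = value_table_start + 8 * count        # where the msgids begin
       str_offset = id_offset + len(ids)                # where the msgstrs begin
       key_table, value_table = [], []
       for key, value in items:
           key_table += [len(key), id_offset]
           value_table += [len(value), str_offset]
           id_offset += len(key) + 1
           str_offset += len(value) + 1
       header = struct.pack('<7I', 0x950412de, 0, count,
                            key_table_start, value_table_start, 0, 0)
       tables = struct.pack('<%dI' % (4 * count), *(key_table + value_table))
       return header + tables + ids + strs

   data = build_mo({
       '': 'Project-Id-Version: example 1.0\n'
           'Content-Type: text/plain; charset=UTF-8\n',
       'Hello': 'Bonjour',
   })
   cat = gettext.GNUTranslations(io.BytesIO(data))
   info = cat.info()           # CPython stores the keys in lower case
   charset = cat.charset()     # taken from the Content-Type metadata
   greeting = cat.gettext('Hello')

In CPython the metadata keys end up lower-cased, so the project version is
looked up as ``info['project-id-version']``.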
348
349.. class:: GNUTranslations
350
351   The following methods are overridden from the base class implementation:
352
353   .. method:: gettext(message)
354
355      Look up the *message* id in the catalog and return the corresponding message
356      string, as a Unicode string.  If there is no entry in the catalog for the
357      *message* id, and a fallback has been set, the look up is forwarded to the
358      fallback's :meth:`~NullTranslations.gettext` method.  Otherwise, the
359      *message* id is returned.
360
361
362   .. method:: ngettext(singular, plural, n)
363
364      Do a plural-forms lookup of a message id.  *singular* is used as the message id
365      for purposes of lookup in the catalog, while *n* is used to determine which
366      plural form to use.  The returned message string is a Unicode string.
367
368      If the message id is not found in the catalog, and a fallback is specified,
369      the request is forwarded to the fallback's :meth:`~NullTranslations.ngettext`
370      method.  Otherwise, when *n* is 1 *singular* is returned, and *plural* is
371      returned in all other cases.
372
373      Here is an example::
374
         import os
         from gettext import GNUTranslations

         n = len(os.listdir('.'))
         # somefile stands for a binary file object open on a .mo catalog.
         cat = GNUTranslations(somefile)
         message = cat.ngettext(
             'There is %(num)d file in this directory',
             'There are %(num)d files in this directory',
             n) % {'num': n}
381
382
383   .. method:: lgettext(message)
384   .. method:: lngettext(singular, plural, n)
385
386      Equivalent to :meth:`.gettext` and :meth:`.ngettext`, but the translation
387      is returned as a byte string encoded in the preferred system encoding
388      if no encoding  was explicitly set with
389      :meth:`~NullTranslations.set_output_charset`.
390
391      .. warning::
392
393         These methods should be avoided in Python 3.  See the warning for the
394         :func:`lgettext` function.
395
396
397Solaris message catalog support
398^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
399
400The Solaris operating system defines its own binary :file:`.mo` file format, but
401since no documentation can be found on this format, it is not supported at this
402time.
403
404
405The Catalog constructor
406^^^^^^^^^^^^^^^^^^^^^^^
407
408.. index:: single: GNOME
409
410GNOME uses a version of the :mod:`gettext` module by James Henstridge, but this
411version has a slightly different API.  Its documented usage was::
412
   import gettext
   cat = gettext.Catalog(domain, localedir)
   _ = cat.gettext
   print(_('hello world'))
417
418For compatibility with this older module, the function :func:`Catalog` is an
419alias for the :func:`translation` function described above.
420
421One difference between this module and Henstridge's: his catalog objects
422supported access through a mapping API, but this appears to be unused and so is
423not currently supported.
424
425
426Internationalizing your programs and modules
427--------------------------------------------
428
429Internationalization (I18N) refers to the operation by which a program is made
430aware of multiple languages.  Localization (L10N) refers to the adaptation of
431your program, once internationalized, to the local language and cultural habits.
432In order to provide multilingual messages for your Python programs, you need to
433take the following steps:
434
435#. prepare your program or module by specially marking translatable strings
436
#. run a suite of tools over your marked files to generate raw message catalogs

#. create language-specific translations of the message catalogs
440
441#. use the :mod:`gettext` module so that message strings are properly translated
442
443In order to prepare your code for I18N, you need to look at all the strings in
444your files.  Any string that needs to be translated should be marked by wrapping
445it in ``_('...')`` --- that is, a call to the function :func:`_`.  For example::
446
   filename = 'mylog.txt'
   message = _('writing a log message')
   # The with statement closes the file even if the write fails.
   with open(filename, 'w') as fp:
       fp.write(message)
452
453In this example, the string ``'writing a log message'`` is marked as a candidate
454for translation, while the strings ``'mylog.txt'`` and ``'w'`` are not.
455
456There are a few tools to extract the strings meant for translation.
457The original GNU :program:`gettext` only supported C or C++ source
458code but its extended version :program:`xgettext` scans code written
459in a number of languages, including Python, to find strings marked as
460translatable.  `Babel <http://babel.pocoo.org/>`__ is a Python
461internationalization library that includes a :file:`pybabel` script to
462extract and compile message catalogs.  François Pinard's program
463called :program:`xpot` does a similar job and is available as part of
464his `po-utils package <https://github.com/pinard/po-utils>`__.
465
466(Python also includes pure-Python versions of these programs, called
467:program:`pygettext.py` and :program:`msgfmt.py`; some Python distributions
468will install them for you.  :program:`pygettext.py` is similar to
469:program:`xgettext`, but only understands Python source code and
470cannot handle other programming languages such as C or C++.
471:program:`pygettext.py` supports a command-line interface similar to
472:program:`xgettext`; for details on its use, run ``pygettext.py
473--help``.  :program:`msgfmt.py` is binary compatible with GNU
474:program:`msgfmt`.  With these two programs, you may not need the GNU
475:program:`gettext` package to internationalize your Python
476applications.)
477
478:program:`xgettext`, :program:`pygettext`, and similar tools generate
479:file:`.po` files that are message catalogs.  They are structured
480human-readable files that contain every marked string in the source
481code, along with a placeholder for the translated versions of these
482strings.
483
484Copies of these :file:`.po` files are then handed over to the
485individual human translators who write translations for every
486supported natural language.  They send back the completed
487language-specific versions as a :file:`<language-name>.po` file that's
488compiled into a machine-readable :file:`.mo` binary catalog file using
489the :program:`msgfmt` program.  The :file:`.mo` files are used by the
490:mod:`gettext` module for the actual translation processing at
491run-time.
492
493How you use the :mod:`gettext` module in your code depends on whether you are
494internationalizing a single module or your entire application. The next two
495sections will discuss each case.
496
497
498Localizing your module
499^^^^^^^^^^^^^^^^^^^^^^
500
501If you are localizing your module, you must take care not to make global
502changes, e.g. to the built-in namespace.  You should not use the GNU ``gettext``
503API but instead the class-based API.
504
505Let's say your module is called "spam" and the module's various natural language
506translation :file:`.mo` files reside in :file:`/usr/share/locale` in GNU
507:program:`gettext` format.  Here's what you would put at the top of your
508module::
509
   import gettext
   t = gettext.translation('spam', '/usr/share/locale')
   _ = t.gettext
513
514
515Localizing your application
516^^^^^^^^^^^^^^^^^^^^^^^^^^^
517
518If you are localizing your application, you can install the :func:`_` function
519globally into the built-in namespace, usually in the main driver file of your
520application.  This will let all your application-specific files just use
521``_('...')`` without having to explicitly install it in each file.
522
523In the simple case then, you need only add the following bit of code to the main
524driver file of your application::
525
   import gettext
   gettext.install('myapplication')
528
529If you need to set the locale directory, you can pass it into the
530:func:`install` function::
531
   import gettext
   gettext.install('myapplication', '/usr/share/locale')
534
535
536Changing languages on the fly
537^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
538
539If your program needs to support many languages at the same time, you may want
540to create multiple translation instances and then switch between them
541explicitly, like so::
542
   import gettext

   lang1 = gettext.translation('myapplication', languages=['en'])
   lang2 = gettext.translation('myapplication', languages=['fr'])
   lang3 = gettext.translation('myapplication', languages=['de'])

   # start by using language 1
   lang1.install()

   # ... time goes by, user selects language 2
   lang2.install()

   # ... more time goes by, user selects language 3
   lang3.install()
557
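If replacing the built-in :func:`_` repeatedly is undesirable, one alternative
(a sketch, not something the module provides) is to keep the active translation
object behind a small wrapper.  The class name ``LanguageSwitcher`` is made up,
and ``fallback=True`` is used so that the sketch also runs where no catalogs
for ``'myapplication'`` are installed::

   import gettext

   class LanguageSwitcher:
       """Hypothetical helper that forwards gettext() to the active language."""

       def __init__(self, domain, localedir=None):
           self.domain = domain
           self.localedir = localedir
           self.active = gettext.NullTranslations()

       def use(self, language):
           # fallback=True avoids OSError when no catalog exists for language.
           self.active = gettext.translation(
               self.domain, self.localedir, languages=[language], fallback=True)

       def gettext(self, message):
           return self.active.gettext(message)

   switcher = LanguageSwitcher('myapplication')
   _ = switcher.gettext    # module-level alias; builtins are left untouched

   switcher.use('fr')
   first = _('a message id that no catalog contains')
   switcher.use('de')
   second = _('a message id that no catalog contains')

Because ``_`` is bound to the wrapper rather than to one translation object,
existing references to it keep working after each switch.
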
558
559Deferred translations
560^^^^^^^^^^^^^^^^^^^^^
561
In most coding situations, strings are translated where they are coded.
Occasionally, however, you need to mark strings for translation, but defer actual
translation until later.  A classic example is::
565
   animals = ['mollusk',
              'albatross',
              'rat',
              'penguin',
              'python', ]
   # ...
   for a in animals:
       print(a)
574
575Here, you want to mark the strings in the ``animals`` list as being
576translatable, but you don't actually want to translate them until they are
577printed.
578
579Here is one way you can handle this situation::
580
   def _(message): return message

   animals = [_('mollusk'),
              _('albatross'),
              _('rat'),
              _('penguin'),
              _('python'), ]

   del _

   # ...
   for a in animals:
       print(_(a))
594
This works because the dummy definition of :func:`_` simply returns the string
unchanged.  And this dummy definition will temporarily override any definition
of :func:`_` in the built-in namespace (until the :keyword:`del` statement). Take
care, though, if you have a previous definition of :func:`_` in the local
namespace.
600
601Note that the second use of :func:`_` will not identify "a" as being
602translatable to the :program:`gettext` program, because the parameter
603is not a string literal.
604
605Another way to handle this is with the following example::
606
   def N_(message): return message

   animals = [N_('mollusk'),
              N_('albatross'),
              N_('rat'),
              N_('penguin'),
              N_('python'), ]

   # ...
   for a in animals:
       print(_(a))
618
619In this case, you are marking translatable strings with the function
620:func:`N_`, which won't conflict with any definition of :func:`_`.
621However, you will need to teach your message extraction program to
622look for translatable strings marked with :func:`N_`. :program:`xgettext`,
623:program:`pygettext`, ``pybabel extract``, and :program:`xpot` all
624support this through the use of the :option:`!-k` command-line switch.
625The choice of :func:`N_` here is totally arbitrary; it could have just
626as easily been :func:`MarkThisStringForTranslation`.
627
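To watch the :func:`N_` approach work end to end, the sketch below substitutes
a made-up in-memory translation class (``DemoTranslations``, with invented
French entries) for a real catalog; only the marking and the deferred lookup
matter here::

   import gettext

   def N_(message):
       return message          # mark only; nothing is translated here

   animals = [N_('mollusk'), N_('rat'), N_('python')]

   class DemoTranslations(gettext.NullTranslations):
       """Stand-in for a real catalog, with invented entries."""

       entries = {'mollusk': 'mollusque', 'rat': 'rat', 'python': 'python'}

       def gettext(self, message):
           return self.entries.get(message, message)

   _ = DemoTranslations().gettext

   # Translation happens only now, when the strings are actually used.
   translated = [_(a) for a in animals]

The ``animals`` list itself still holds the original message ids, so it can be
translated again later under a different language.
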
628
629Acknowledgements
630----------------
631
632The following people contributed code, feedback, design suggestions, previous
633implementations, and valuable experience to the creation of this module:
634
635* Peter Funk
636
637* James Henstridge
638
639* Juan David Ibáñez Palomar
640
641* Marc-André Lemburg
642
643* Martin von Löwis
644
645* François Pinard
646
647* Barry Warsaw
648
649* Gustavo Niemeyer
650
651.. rubric:: Footnotes
652
.. [#] The default locale directory is system-dependent; for example, on Red Hat Linux
   it is :file:`/usr/share/locale`, but on Solaris it is :file:`/usr/lib/locale`.
   The :mod:`gettext` module does not try to support these system-dependent
   defaults; instead its default is :file:`sys.prefix/share/locale`. For this
   reason, it is always best to call :func:`bindtextdomain` with an explicit
   absolute path at the start of your application.
659
660.. [#] See the footnote for :func:`bindtextdomain` above.
661