Lines Matching full:unicode
4 Unicode HOWTO
9 This HOWTO discusses Python's support for the Unicode specification
11 people commonly encounter when trying to work with Unicode.
14 Introduction to Unicode
26 Python's string type uses the Unicode Standard for representing
30 Unicode (https://www.unicode.org/) is a specification that aims to
32 its own unique code. The Unicode specifications are continually
42 The Unicode standard describes how characters are represented by
49 The Unicode standard contains a lot of tables listing characters and
88 To summarize the previous section: a Unicode string is a sequence of
92 to 8-bit bytes. The rules for translating a Unicode string into a
126 defaults to using it. UTF stands for "Unicode Transformation Format",
137 1. It can handle any Unicode code point.
138 2. A Unicode string is turned into a sequence of bytes that contains embedded zero bytes only where they represent the null character (U+0000).
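To make the difference between code points and bytes concrete, the sketch below encodes a few arbitrarily chosen characters with UTF-8. It uses only the built-in :meth:`str.encode` and :meth:`bytes.decode` methods. ASCII characters keep their single-byte values, while other code points take two to four bytes::

    # Arbitrary sample characters spanning the 1- to 4-byte UTF-8 ranges.
    samples = ['A', '\u00e9', '\u20ac', '\U0001F600']

    for ch in samples:
        encoded = ch.encode('utf-8')
        print(f'U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded!r}')

    # Pure ASCII text encodes to the identical byte values.
    ascii_bytes = 'plain ASCII'.encode('utf-8')

    # Decoding reverses the transformation exactly.
    text = ''.join(samples)
    round_trip = text.encode('utf-8').decode('utf-8')

The loop reports 1, 2, 3 and 4 bytes for the four samples, in that order.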
154 The `Unicode Consortium site <http://www.unicode.org>`_ has character charts, a
155 glossary, and PDF versions of the Unicode specification. Be prepared for some
156 difficult reading. `A chronology <http://www.unicode.org/history/>`_ of the
157 origin and development of Unicode is also available on the site.
160 `discusses the history of Unicode and UTF-8 <https://www.youtube.com/watch?v=MijmeoH9LT4>`_
164 guide <http://jkorpela.fi/unicode/guide.html>`_ to reading the
165 Unicode character tables.
167 …e-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-set…
177 Python's Unicode Support
180 Now that you've learned the rudiments of Unicode, we can look at Python's
181 Unicode features.
186 Since Python 3.0, the language's :class:`str` type contains Unicode
187 characters, meaning any string created using ``"unicode rocks!"``, ``'unicode
188 rocks!'``, or the triple-quoted string syntax is stored as Unicode.
191 include a Unicode character in a string literal::
200 Side note: Python 3 also supports using Unicode characters in identifiers::
226 character out of the Unicode result), or ``'backslashreplace'`` (inserts a
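The decoding error handlers can be compared side by side. This sketch uses an arbitrary input, ``b'\x80abc'``, whose first byte cannot start a UTF-8 sequence. The expected results are noted in comments::

    raw = b'\x80abc'  # 0x80 is a continuation byte, invalid at the start

    strict_reason = None
    try:
        raw.decode('utf-8')  # errors='strict' is the default
    except UnicodeDecodeError as exc:
        strict_reason = exc.reason
        print('strict:', strict_reason)

    replaced = raw.decode('utf-8', errors='replace')          # '\ufffdabc'
    ignored = raw.decode('utf-8', errors='ignore')            # 'abc'
    escaped = raw.decode('utf-8', errors='backslashreplace')  # '\\x80abc'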
248 One-character Unicode strings can also be created with the :func:`chr`
249 built-in function, which takes integers and returns a Unicode string of length 1
251 built-in :func:`ord` function that takes a one-character Unicode string and
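A quick round trip between :func:`chr` and :func:`ord`, using an arbitrary code point from the Private Use Area::

    ch = chr(57344)          # integer code point -> one-character string
    print(repr(ch))          # '\ue000'
    code = ord(ch)           # one-character string -> integer
    print(code, hex(code))   # 57344 0xe000

    # ord() rejects strings that are not exactly one character long.
    ord_error = None
    try:
        ord('ab')
    except TypeError as exc:
        ord_error = exc
        print('TypeError:', exc)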
263 which returns a :class:`bytes` representation of the Unicode string, encoded in the
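:meth:`str.encode` accepts a similar *errors* argument, which applies when a character cannot be represented in the target encoding. The sketch below uses an arbitrary string containing a euro sign, which ASCII cannot encode::

    s = 'price: 5\u20ac'

    utf8 = s.encode('utf-8')                                 # b'price: 5\xe2\x82\xac'
    question = s.encode('ascii', errors='replace')           # b'price: 5?'
    dropped = s.encode('ascii', errors='ignore')             # b'price: 5'
    xmlref = s.encode('ascii', errors='xmlcharrefreplace')   # b'price: 5&#8364;'
    escaped = s.encode('ascii', errors='backslashreplace')   # b'price: 5\\u20ac'

    bad_chars = None
    try:
        s.encode('ascii')  # strict by default
    except UnicodeEncodeError as exc:
        bad_chars = exc.object[exc.start:exc.end]
        print('cannot encode', repr(bad_chars))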
303 Unicode Literals in Python Source Code
306 In Python source code, specific Unicode code points can be written using the
313 ... # ^^^^^^ four-digit Unicode escape
314 ... # ^^^^^^^^^^ eight-digit Unicode escape
349 Unicode Properties
352 The Unicode specification includes a database of information about
391 `the General Category Values section of the Unicode Character Database documentation <http://www.un…
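The database is exposed through the :mod:`unicodedata` module. The sketch below looks up the name, General Category, and numeric value of a few arbitrarily chosen characters::

    import unicodedata

    # Arbitrary samples: a letter, an accented letter,
    # a Tamil number sign, and a vulgar fraction.
    for ch in ['A', '\u00e9', '\u0bf2', '\u00bd']:
        print(f'U+{ord(ch):04X}',
              unicodedata.category(ch),        # two-letter General Category code
              unicodedata.name(ch),            # official character name
              unicodedata.numeric(ch, None))   # numeric value, or None if absent

The last two characters are both in category ``No`` ("Number, other") and carry the numeric values 1000.0 and 0.5.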
398 Unicode adds some complication to comparing strings, because the same
408 case-insensitive form following an algorithm described by the Unicode
455 The Unicode Standard also specifies how to do caseless comparisons::
474 section 3.13 of the Unicode Standard for a discussion and an example.)
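Normalization and case folding can be combined into one comparison that ignores both case and composition. In the sketch below, the helper name ``same_text`` and the choice of NFD are assumptions made for illustration::

    import unicodedata

    def same_text(a: str, b: str) -> bool:
        """Hypothetical helper: compare strings ignoring case and composition."""
        def key(s: str) -> str:
            # Normalize, fold case, then normalize again, because case folding
            # can produce sequences that are no longer in normal form.
            return unicodedata.normalize('NFD', unicodedata.normalize('NFD', s).casefold())
        return key(a) == key(b)

    composed = '\u00ea'      # single code point LATIN SMALL LETTER E WITH CIRCUMFLEX
    decomposed = 'e\u0302'   # 'e' followed by COMBINING CIRCUMFLEX ACCENT

    print(composed == decomposed)               # False: different code points
    print(same_text(composed, decomposed))      # True
    print(same_text('Stra\u00dfe', 'STRASSE'))  # True: 'ß' folds to 'ss'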
477 Unicode Regular Expressions
501 Similarly, ``\w`` matches a wide variety of Unicode characters but
503 and ``\s`` will match either Unicode whitespace characters or
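For example, ``\d`` and ``\w`` in a :class:`str` pattern match characters from any script unless the :const:`re.ASCII` flag is given. The sample strings below, including the Devanagari digits, are arbitrary::

    import re

    # '\u0967\u0968\u0969' are DEVANAGARI DIGIT ONE, TWO, THREE.
    s = 'order \u0967\u0968\u0969 or 123'

    unicode_digits = re.findall(r'\d+', s)            # ['१२३', '123']
    ascii_digits = re.findall(r'\d+', s, re.ASCII)    # ['123']
    value = int(unicode_digits[0])                    # int() accepts these digits: 123

    words = 'na\u00efve caf\u00e9'
    unicode_words = re.findall(r'\w+', words)             # ['naïve', 'café']
    ascii_words = re.findall(r'\w+', words, re.ASCII)     # ['na', 've', 'caf']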
510 .. comment should these be mentioned earlier, e.g. at the start of the "introduction to Unicode" fi…
512 Some good alternative discussions of Python's Unicode support are:
515 * `Pragmatic Unicode <https://nedbatchelder.com/text/unipain.html>`_, a PyCon 2012 presentation by …
524 Marc-André Lemburg gave `a presentation titled "Python and Unicode" (PDF slides)
525 <https://downloads.egenix.com/python/Unicode-EPC2002-Talk.pdf>`_ at
527 2's Unicode features (where the Unicode string type is called ``unicode`` and
531 Reading and Writing Unicode Data
534 Once you've written some code that works with Unicode data, the next problem is
535 input/output. How do you get Unicode strings into your program, and how do you
536 convert Unicode into a form suitable for storage or transmission?
540 your application support Unicode natively. XML parsers often return Unicode
541 data, for example. Many relational databases also support Unicode-valued
542 columns and can return Unicode values from an SQL query.
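As an illustration of a library doing the decoding for you, :mod:`xml.etree.ElementTree` reads the encoding declaration of a byte document and returns :class:`str` values. The two sample documents below are made up for this sketch::

    import xml.etree.ElementTree as ET

    # The same text stored in two different encodings.
    utf8_doc = b'<?xml version="1.0" encoding="utf-8"?><item>caf\xc3\xa9</item>'
    latin1_doc = b'<?xml version="1.0" encoding="iso-8859-1"?><item>caf\xe9</item>'

    for doc in (utf8_doc, latin1_doc):
        text = ET.fromstring(doc).text
        print(type(text).__name__, text)   # str café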
544 Unicode data is usually converted to a particular encoding before it gets
549 One problem is the multi-byte nature of encodings; one Unicode character can be
552 where only part of the bytes encoding a single Unicode character are read at the
557 string and its Unicode version in memory.)
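The partial-read problem can be reproduced by splitting encoded bytes in the middle of a character. One standard-library remedy, shown here as a sketch, is an incremental decoder from :mod:`codecs`. It buffers an incomplete sequence until the rest arrives. The chunk boundary is arbitrary and is chosen to split the three-byte euro sign::

    import codecs

    data = '\u20acuro'.encode('utf-8')   # b'\xe2\x82\xacuro'
    chunks = [data[:2], data[2:]]        # first chunk ends mid-character

    # Decoding a chunk on its own fails on the truncated sequence.
    naive_reason = None
    try:
        chunks[0].decode('utf-8')
    except UnicodeDecodeError as exc:
        naive_reason = exc.reason
        print('naive decode failed:', naive_reason)

    # An incremental decoder keeps the partial bytes between calls.
    decoder = codecs.getincrementaldecoder('utf-8')()
    pieces = [decoder.decode(chunk) for chunk in chunks]
    pieces.append(decoder.decode(b'', final=True))  # flush; raises if bytes remain
    text = ''.join(pieces)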
562 that assumes the file's contents are in a specified encoding and accepts Unicode
568 Reading Unicode from a file is therefore simple::
570 with open('unicode.txt', encoding='utf-8') as f:
582 The Unicode character ``U+FEFF`` is used as a byte-order mark (BOM), and is often
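The effect of the BOM on :func:`open` can be seen by writing a file with the ``'utf-8-sig'`` codec and reading it back with and without that codec. On writing, ``'utf-8-sig'`` prepends the UTF-8 encoding of ``U+FEFF``. The temporary directory and file name in this sketch are arbitrary::

    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'bom-demo.txt')

        # 'utf-8-sig' writes the BOM (b'\xef\xbb\xbf') before the text.
        with open(path, 'w', encoding='utf-8-sig') as f:
            f.write('caf\u00e9\n')

        with open(path, 'rb') as f:
            raw = f.read()

        # Plain 'utf-8' keeps the BOM as a leading U+FEFF character ...
        with open(path, encoding='utf-8') as f:
            with_bom = f.read()

        # ... while 'utf-8-sig' strips it when it is present.
        with open(path, encoding='utf-8-sig') as f:
            without_bom = f.read()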
597 Unicode filenames
601 that contain arbitrary Unicode characters. Usually this is
602 implemented by converting the Unicode string into some encoding that
613 usually just provide the Unicode string as the filename, and it will be
620 Functions in the :mod:`os` module such as :func:`os.stat` will also accept Unicode
624 the Unicode version of filenames, or should it return bytes containing
626 provided the directory path as bytes or a Unicode string. If you pass a
627 Unicode string as the path, filenames will be decoded using the filesystem's
628 encoding and a list of Unicode strings will be returned, while passing a byte
650 the Unicode versions.
653 Unicode with these APIs. The bytes APIs should only be used on
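The sketch below runs both flavours of :func:`os.listdir` against a throwaway directory. It assumes the filesystem can store the arbitrary CJK character used in the file name, which holds for typical UTF-8 setups. :func:`os.fsencode` and :func:`os.fsdecode` convert between the two forms using the filesystem encoding::

    import os
    import tempfile

    name = 'filename\u4500abc'   # arbitrary non-ASCII file name

    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, name), 'w', encoding='utf-8') as f:
            f.write('blah\n')

        str_names = os.listdir(tmp)                  # str path -> list of str
        bytes_names = os.listdir(os.fsencode(tmp))   # bytes path -> list of bytes

    print(str_names)
    print(bytes_names)
    # fsdecode() recovers the str form from the bytes form.
    decoded = [os.fsdecode(b) for b in bytes_names]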
658 Tips for Writing Unicode-aware Programs
662 Unicode.
666 Software should only work with Unicode strings internally, decoding the input
669 If you attempt to write processing functions that accept both Unicode and byte
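The "decode early, encode late" rule is sometimes called a "Unicode sandwich". In the minimal sketch below, ``handle_request`` is a hypothetical function, and UTF-8 is assumed as the external encoding::

    def handle_request(payload: bytes, encoding: str = 'utf-8') -> bytes:
        """Hypothetical handler showing where decoding and encoding belong."""
        text = payload.decode(encoding)   # input boundary: bytes -> str
        result = text.strip().upper()     # all processing happens on str
        return result.encode(encoding)    # output boundary: str -> bytes

    reply = handle_request(' stra\u00dfe\n'.encode('utf-8'))
    print(reply)   # b'STRASSE'

    # Mixing the two types fails loudly instead of silently misbehaving.
    mix_error = None
    try:
        'text' + b'bytes'
    except TypeError as exc:
        mix_error = exc
        print('TypeError:', exc)

Keeping the bytes/str conversion at the two edges means the processing code never has to guess which type it was given.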
735 The `PDF slides for Marc-André Lemburg's presentation "Writing Unicode-aware
737 <https://downloads.egenix.com/python/LSM2005-Developing-Unicode-aware-applications-in-Python.pdf>`_
741 `The Guts of Unicode in Python
742 <http://pyvideo.org/video/1768/the-guts-of-unicode-in-python>`_
743 is a PyCon 2013 talk by Benjamin Peterson that discusses the internal Unicode