unicode.rst - OpenGrok cross reference for /external/python/cpython3/Doc/c-api/unicode.rst

Lines Matching full:unicode
5 Unicode Objects and Codecs
11 Unicode Objects
14 Since the implementation of :pep:`393` in Python 3.3, Unicode objects internally
16 of Unicode characters while staying memory efficient.  There are special cases
18 points must be below 1114112 (which is the full Unicode range).
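As a rough illustration of the storage widths behind these special cases, the sketch below picks a per-code-point width from the largest code point in a string. It follows the PEP 393 thresholds: below 256 uses 1 byte, below 65536 uses 2 bytes, and below 1114112 (0x110000) uses 4 bytes. The helpers ``bytes_per_code_point`` and ``data_size`` are hypothetical and are not CPython's allocation code. The sketch also ignores the separate compact-ASCII case for code points below 128 and does not count any terminating null.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes per code point that a PEP 393 style string
 * needs, given the largest code point it will hold.
 * Returns 1, 2 or 4, or 0 if maxchar is outside the Unicode range. */
static int bytes_per_code_point(uint32_t maxchar)
{
    if (maxchar <= 0xFF)
        return 1;          /* all code points below 256 */
    if (maxchar <= 0xFFFF)
        return 2;          /* all code points below 65536 */
    if (maxchar <= 0x10FFFF)
        return 4;          /* anything else in the Unicode range */
    return 0;              /* 1114112 (0x110000) and above are invalid */
}

/* Hypothetical helper: bytes of character data for `length` code points
 * whose largest code point is `maxchar` (terminator not counted). */
static size_t data_size(size_t length, uint32_t maxchar)
{
    int width = bytes_per_code_point(maxchar);
    assert(width != 0);    /* caller must pass a valid code point */
    return length * (size_t)width;
}
```

For example, in this model a three-character string that contains U+20AC needs 2 bytes per code point, or 6 bytes of character data.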
21 in the Unicode object.  The :c:type:`Py_UNICODE*` representation is deprecated
25 Due to the transition between the old APIs and the new APIs, Unicode objects
28 * "canonical" Unicode objects are all objects created by a non-deprecated
29   Unicode API.  They use the most efficient representation allowed by the
32 * "legacy" Unicode objects have been created through one of the deprecated
38 Unicode Type
41 These are the basic Unicode object types used for the Unicode implementation in
50    single Unicode characters, use :c:type:`Py_UCS4`.
62       whether you selected a "narrow" or "wide" Unicode version of Python at
70    These subtypes of :c:type:`PyObject` represent a Python Unicode object.  In
72    that deal with Unicode objects take and return :c:type:`PyObject` pointers.
79    This instance of :c:type:`PyTypeObject` represents the Python Unicode type.  It
84 access internal read-only data of Unicode objects:
88    Return true if the object *o* is a Unicode object or an instance of a Unicode
94    Return true if the object *o* is a Unicode object, but not an instance of a
113    Return the length of the Unicode string, in code points.  *o* has to be a
114    Unicode object in the "canonical" representation (not checked).
145    bytes per character this Unicode object uses to store its data.  *o* has to
146    be a Unicode object in the "canonical" representation (not checked).
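The following minimal sketch shows how the kind reported by :c:func:`PyUnicode_KIND` determines how the raw data buffer is indexed, in the spirit of the ``PyUnicode_READ`` macro. ``read_code_point`` is a hypothetical stand-in. It assumes the kind values 1, 2 and 4, which are the values of ``PyUnicode_1BYTE_KIND``, ``PyUnicode_2BYTE_KIND`` and ``PyUnicode_4BYTE_KIND``. Like the macro, it performs no bounds checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for reading index `i` from a canonical buffer whose
 * kind is 1, 2 or 4 bytes per code point. No bounds check is performed. */
static uint32_t read_code_point(int kind, const void *data, size_t i)
{
    assert(kind == 1 || kind == 2 || kind == 4);
    switch (kind) {
    case 1:
        return ((const uint8_t *)data)[i];   /* 1 byte per code point */
    case 2:
        return ((const uint16_t *)data)[i];  /* 2 bytes per code point */
    default:
        return ((const uint32_t *)data)[i];  /* 4 bytes per code point */
    }
}
```

The caller must pair the right kind with the right buffer. Reading a 2-byte buffer as kind 1, for instance, silently returns individual bytes instead of code points.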
155    Return a void pointer to the raw Unicode buffer.  *o* has to be a Unicode
184    Read a character from a Unicode object *o*, which must be in the "canonical"
209    Unicode object (not checked).
212       Part of the old-style Unicode API, please migrate to using
219    bytes.  *o* has to be a Unicode object (not checked).
222       Part of the old-style Unicode API, please migrate to using
234    a Unicode object (not checked).
244       Part of the old-style Unicode API, please migrate to using the
248 Unicode Character Properties
251 Unicode provides many different character properties. The most often needed ones
309    Nonprintable characters are those characters defined in the Unicode character
383 Creating and accessing Unicode strings
386 To create Unicode objects and access their basic sequence properties, use these
391    Create a new Unicode object.  *maxchar* should be the true maximum code point
395    This is the recommended way to allocate a new Unicode object.  Objects
404    Create a new Unicode object with the given *kind* (possible values are
414    Create a Unicode object from the char buffer *u*.  The bytes will be
426    Create a Unicode object from a UTF-8 encoded null-terminated char buffer
433    arguments, calculate the size of the resulting Python Unicode string and return
507    | :attr:`%U`        | PyObject\*          | A Unicode object.              |
509    | :attr:`%V`        | PyObject\*,         | A Unicode object (which may be |
556    Decode an encoded object *obj* to a Unicode object.
564    All other objects, including Unicode objects, cause a :exc:`TypeError` to be
571 .. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)
573    Return the length of the Unicode object, in code points.
584    Copy characters from one Unicode object into another.  This function performs
592 .. c:function:: Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \
596    ``unicode[start:start+length]``.
607 .. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
611    :c:func:`PyUnicode_New`.  Since Unicode strings are supposed to be immutable,
614    This function checks that *unicode* is a Unicode object, that the index is
621 .. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)
623    Read a character from a string.  This function checks that *unicode* is a
624    Unicode object and the index is not out of bounds, in contrast to the macro
672    Create a Unicode object from the :c:type:`Py_UNICODE` buffer *u* of the given size. *u*
678    Therefore, modification of the resulting Unicode object is only allowed when
689 .. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
691    Return a read-only pointer to the Unicode object's internal
706    Create a Unicode object by replacing all decimal digits in
711 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
722 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
724    Create a copy of a Unicode string ending with a null code point. Return *NULL*
736 .. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
746    Copy an instance of a Unicode subtype to a new true Unicode object if
747    necessary. If *obj* is already a true Unicode object (not a subtype),
750    Objects other than Unicode or its subtypes will cause a :exc:`TypeError`.
797 .. c:function:: PyObject* PyUnicode_EncodeLocale(PyObject *unicode, const char *errors)
799    Encode a Unicode object to UTF-8 on Android, or to the current locale
803    *errors* is ``NULL``. Return a :class:`bytes` object. *unicode* cannot
902 .. c:function:: PyObject* PyUnicode_EncodeFSDefault(PyObject *unicode)
904    Encode a Unicode object to :c:data:`Py_FileSystemDefaultEncoding` with the
932    Create a Unicode object from the :c:type:`wchar_t` buffer *w* of the given *size*.
938 .. c:function:: Py_ssize_t PyUnicode_AsWideChar(PyObject *unicode, wchar_t *w, Py_ssize_t size)
940    Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*.  At most
951 .. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
953    Convert the Unicode object to a wide character string. The output string
1010    Create a Unicode object by decoding *size* bytes of the encoded string *s*.
1017 .. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, \
1020    Encode a Unicode object and return the result as a Python bytes object.
1022    name in the Unicode :meth:`~str.encode` method. The codec to be used is looked up
1032    parameters of the same name in the Unicode :meth:`~str.encode` method.  The codec
1049    Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
1062 .. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
1064    Encode a Unicode object using UTF-8 and return the result as a Python bytes
1069 .. c:function:: const char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
1071    Return a pointer to the UTF-8 encoding of the Unicode object, and
1080    This caches the UTF-8 representation of the string in the Unicode object, and
1090 .. c:function:: const char* PyUnicode_AsUTF8(PyObject *unicode)
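The UTF-8 functions above, such as :c:func:`PyUnicode_AsUTF8String` and :c:func:`PyUnicode_AsUTF8AndSize`, produce the standard UTF-8 byte layout. The following sketch shows that layout for a single code point. ``utf8_encode`` is a hypothetical helper, not CPython's encoder. It models only the strict behaviour of refusing lone surrogates (U+D800 to U+DFFF) and values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the UTF-8 encoding rule for one code point. Writes 1 to 4 bytes
 * into `out` and returns the count. Returns 0 for lone surrogates and for
 * values above 0x10FFFF, which a strict encoder rejects. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                          /* surrogates are not encodable */
    if (cp < 0x10000) {                    /* 1110xxxx + 2 continuation bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

For example, U+20AC (the euro sign) encodes to the three bytes ``E2 82 AC``.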
1122    corresponding Unicode object.  *errors* (if non-*NULL*) defines the error
1134    not copied into the resulting Unicode string.  If ``*byteorder`` is ``-1`` or
1155 .. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
1165    Return a Python bytes object holding the UTF-32 encoded value of the Unicode
1172    If *byteorder* is ``0``, the output string will always start with the Unicode BOM
1195    corresponding Unicode object.  *errors* (if non-*NULL*) defines the error
1207    not copied into the resulting Unicode string.  If ``*byteorder`` is ``-1`` or
1229 .. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
1239    Return a Python bytes object holding the UTF-16 encoded value of the Unicode
1246    If *byteorder* is ``0``, the output string will always start with the Unicode BOM
1268    Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string
1298 Unicode-Escape Codecs
1301 These are the "Unicode Escape" codec APIs:
1307    Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
1311 .. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
1313    Encode a Unicode object using Unicode-Escape and return the result as a
1320    Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Unicode-Escape and
1328 Raw-Unicode-Escape Codecs
1331 These are the "Raw Unicode Escape" codec APIs:
1337    Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
1341 .. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
1343    Encode a Unicode object using Raw-Unicode-Escape and return the result as
1351    Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
1363 These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
1369    Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
1373 .. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
1375    Encode a Unicode object using Latin-1 and return the result as a Python bytes
1401    Create a Unicode object by decoding *size* bytes of the ASCII encoded string
1405 .. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
1407    Encode a Unicode object using ASCII and return the result as a Python bytes
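Both single-byte codecs above accept only a prefix of the code point range: the first 256 ordinals for Latin-1 and the first 128 for ASCII. The following minimal sketch models that strict check. ``encode_single_byte`` is a hypothetical helper that stops at the first unencodable code point, which is where the strict error handler would raise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a strict single-byte encoder. `limit` is 256 for Latin-1 and
 * 128 for ASCII. Writes one byte per code point into `out` and returns the
 * index of the first code point that cannot be encoded, or `n` on success. */
static size_t encode_single_byte(const uint32_t *cps, size_t n,
                                 uint32_t limit, unsigned char *out)
{
    assert(limit == 128 || limit == 256);
    for (size_t i = 0; i < n; i++) {
        if (cps[i] >= limit)
            return i;                      /* unencodable: stop here */
        out[i] = (unsigned char)cps[i];    /* ordinal maps directly to a byte */
    }
    return n;
}
```

With the input ``"café"``, a limit of 256 encodes all four code points. A limit of 128 stops at index 3, the ``é`` (U+00E9).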
1438    Create a Unicode object by decoding *size* bytes of the encoded string *s*
1444    to Unicode strings, integers (which are then interpreted as Unicode
1451 .. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
1453    Encode a Unicode object using the given *mapping* object and return the
1457    The *mapping* object must map Unicode ordinal integers to bytes objects,
1476 The following codec API is special in that it maps Unicode to Unicode.
1478 .. c:function:: PyObject* PyUnicode_Translate(PyObject *unicode, \
1481    Translate a Unicode object using the given *mapping* object and return the
1482    resulting Unicode object.  Return *NULL* if an exception was raised by the
1485    The *mapping* object must map Unicode ordinal integers to Unicode strings,
1486    integers (which are then interpreted as Unicode ordinals) or ``None``
1495    character *mapping* table to it and return the resulting Unicode object.
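The following is a simplified model of the mapping rules for :c:func:`PyUnicode_Translate`. It assumes a lookup table that maps each covered code point to exactly one code point. The real *mapping* can be any object that supports lookup and may also map to multi-character strings. ``translate`` and ``DELETE_CHAR`` are hypothetical names, and ``DELETE_CHAR`` stands in for ``None``. Code points the table does not cover are copied unchanged, matching how unmapped ordinals are left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DELETE_CHAR (-1)   /* hypothetical marker playing the role of None */

/* Sketch of table-driven translation: table[c] gives the replacement code
 * point for c, or DELETE_CHAR to drop it. Code points at or beyond
 * table_len are copied unchanged. Returns the number of code points
 * written to `out`, which must have room for at least `n` entries. */
static size_t translate(const uint32_t *in, size_t n,
                        const int32_t *table, size_t table_len,
                        uint32_t *out)
{
    assert(in != NULL || n == 0);
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = in[i];
        if (c < table_len) {
            if (table[c] == DELETE_CHAR)
                continue;                  /* mapped to "None": drop it */
            out[k++] = (uint32_t)table[c]; /* mapped to another code point */
        } else {
            out[k++] = c;                  /* unmapped: copy as-is */
        }
    }
    return k;
}
```

A table stored as a plain array must list identity entries for characters it does not change. The final branch covers only code points beyond the end of the table.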
1514    Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
1527 .. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
1529    Encode a Unicode object using MBCS and return the result as a Python bytes
1534 .. c:function:: PyObject* PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *er…
1536    Encode the Unicode object using the specified code page and return a Python
1564 The following APIs are capable of handling Unicode objects and strings on input
1565 (we refer to them as strings in the descriptions) and return Unicode objects or
1573    Concatenate two strings, giving a new Unicode string.
1578    Split a string, giving a list of Unicode strings.  If *sep* is *NULL*, splitting
1586    Split a Unicode string at line breaks, returning a list of Unicode strings.
1595    resulting Unicode object.
1597    The mapping table must map Unicode ordinal integers to Unicode ordinal integers
1611    Unicode string.
1658    return the resulting Unicode object. *maxcount* == ``-1`` means replace all
1673    Compare a Unicode object, *uni*, with *string* and return ``-1``, ``0``, ``1`` for less
1683    Rich compare two Unicode strings and return one of the following:
1704    *element* has to coerce to a one-element Unicode string. ``-1`` is returned
1711    pointer variable pointing to a Python Unicode string object.  If there is an
1724    :c:func:`PyUnicode_InternInPlace`, returning either a new Unicode string