Lines Matching full:unicode
5 Unicode Objects and Codecs
11 Unicode Objects
14 Since the implementation of :pep:`393` in Python 3.3, Unicode objects internally
16 of Unicode characters while staying memory efficient. There are special cases
18 points must be below 1114112 (which is the full Unicode range).
21 in the Unicode object. The :c:type:`Py_UNICODE*` representation is deprecated
25 Due to the transition between the old APIs and the new APIs, Unicode objects
28 * "canonical" Unicode objects are all objects created by a non-deprecated
29 Unicode API. They use the most efficient representation allowed by the
32 * "legacy" Unicode objects have been created through one of the deprecated
38 Unicode Type
41 These are the basic Unicode object types used for the Unicode implementation in
50 single Unicode characters, use :c:type:`Py_UCS4`.
62 whether you selected a "narrow" or "wide" Unicode version of Python at
70 These subtypes of :c:type:`PyObject` represent a Python Unicode object. In
72 that deal with Unicode objects take and return :c:type:`PyObject` pointers.
79 This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
84 access internal read-only data of Unicode objects:
88 Return true if the object *o* is a Unicode object or an instance of a Unicode
94 Return true if the object *o* is a Unicode object, but not an instance of a
113 Return the length of the Unicode string, in code points. *o* has to be a
114 Unicode object in the "canonical" representation (not checked).
145 bytes per character this Unicode object uses to store its data. *o* has to
146 be a Unicode object in the "canonical" representation (not checked).
155 Return a void pointer to the raw Unicode buffer. *o* has to be a Unicode
184 Read a character from a Unicode object *o*, which must be in the "canonical"
209 Unicode object (not checked).
212 Part of the old-style Unicode API, please migrate to using
219 bytes. *o* has to be a Unicode object (not checked).
222 Part of the old-style Unicode API, please migrate to using
234 a Unicode object (not checked).
244 Part of the old-style Unicode API, please migrate to using the
248 Unicode Character Properties
251 Unicode provides many different character properties. The most often needed ones
309 Nonprintable characters are those characters defined in the Unicode character
383 Creating and accessing Unicode strings
386 To create Unicode objects and access their basic sequence properties, use these
391 Create a new Unicode object. *maxchar* should be the true maximum code point
395 This is the recommended way to allocate a new Unicode object. Objects
404 Create a new Unicode object with the given *kind* (possible values are
414 Create a Unicode object from the char buffer *u*. The bytes will be
426 Create a Unicode object from a UTF-8 encoded null-terminated char buffer
433 arguments, calculate the size of the resulting Python Unicode string and return
507 | :attr:`%U` | PyObject\* | A Unicode object. |
509 | :attr:`%V` | PyObject\*, | A Unicode object (which may be |
556 Decode an encoded object *obj* to a Unicode object.
564 All other objects, including Unicode objects, cause a :exc:`TypeError` to be
571 .. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)
573 Return the length of the Unicode object, in code points.
584 Copy characters from one Unicode object into another. This function performs
592 .. c:function:: Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \
596 ``unicode[start:start+length]``.
607 .. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
611 :c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable,
614 This function checks that *unicode* is a Unicode object, that the index is
621 .. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)
623 Read a character from a string. This function checks that *unicode* is a
624 Unicode object and the index is not out of bounds, in contrast to the macro
672 Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
678 Therefore, modification of the resulting Unicode object is only allowed when
689 .. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
691 Return a read-only pointer to the Unicode object's internal
706 Create a Unicode object by replacing all decimal digits in
711 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
722 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
724 Create a copy of a Unicode string ending with a null code point. Return *NULL*
736 .. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
746 Copy an instance of a Unicode subtype to a new true Unicode object if
747 necessary. If *obj* is already a true Unicode object (not a subtype),
750 Objects other than Unicode or its subtypes will cause a :exc:`TypeError`.
797 .. c:function:: PyObject* PyUnicode_EncodeLocale(PyObject *unicode, const char *errors)
799 Encode a Unicode object to UTF-8 on Android, or to the current locale
803 *errors* is ``NULL``. Return a :class:`bytes` object. *unicode* cannot
902 .. c:function:: PyObject* PyUnicode_EncodeFSDefault(PyObject *unicode)
904 Encode a Unicode object to :c:data:`Py_FileSystemDefaultEncoding` with the
932 Create a Unicode object from the :c:type:`wchar_t` buffer *w* of the given *size*.
938 .. c:function:: Py_ssize_t PyUnicode_AsWideChar(PyObject *unicode, wchar_t *w, Py_ssize_t size)
940 Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At most
951 .. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
953 Convert the Unicode object to a wide character string. The output string
1010 Create a Unicode object by decoding *size* bytes of the encoded string *s*.
1017 .. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, \
1020 Encode a Unicode object and return the result as a Python bytes object.
1022 name in the Unicode :meth:`~str.encode` method. The codec to be used is looked up
1032 parameters of the same name in the Unicode :meth:`~str.encode` method. The codec
1049 Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
1062 .. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
1064 Encode a Unicode object using UTF-8 and return the result as Python bytes
1069 .. c:function:: const char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
1071 Return a pointer to the UTF-8 encoding of the Unicode object, and
1080 This caches the UTF-8 representation of the string in the Unicode object, and
1090 .. c:function:: const char* PyUnicode_AsUTF8(PyObject *unicode)
1122 corresponding Unicode object. *errors* (if non-*NULL*) defines the error
1134 not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
1155 .. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
1165 Return a Python bytes object holding the UTF-32 encoded value of the Unicode
1172 If byteorder is ``0``, the output string will always start with the Unicode BOM
1195 corresponding Unicode object. *errors* (if non-*NULL*) defines the error
1207 not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
1229 .. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
1239 Return a Python bytes object holding the UTF-16 encoded value of the Unicode
1246 If byteorder is ``0``, the output string will always start with the Unicode BOM
1268 Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string
1298 Unicode-Escape Codecs
1301 These are the "Unicode Escape" codec APIs:
1307 Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
1311 .. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
1313 Encode a Unicode object using Unicode-Escape and return the result as a
1320 Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Unicode-Escape and
1328 Raw-Unicode-Escape Codecs
1331 These are the "Raw Unicode Escape" codec APIs:
1337 Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
1341 .. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
1343 Encode a Unicode object using Raw-Unicode-Escape and return the result as
1351 Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
1363 These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
1369 Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
1373 .. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
1375 Encode a Unicode object using Latin-1 and return the result as Python bytes
1401 Create a Unicode object by decoding *size* bytes of the ASCII encoded string
1405 .. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
1407 Encode a Unicode object using ASCII and return the result as Python bytes
1438 Create a Unicode object by decoding *size* bytes of the encoded string *s*
1444 to Unicode strings, integers (which are then interpreted as Unicode
1451 .. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
1453 Encode a Unicode object using the given *mapping* object and return the
1457 The *mapping* object must map Unicode ordinal integers to bytes objects,
1476 The following codec API is special in that it maps Unicode to Unicode.
1478 .. c:function:: PyObject* PyUnicode_Translate(PyObject *unicode, \
1481 Translate a Unicode object using the given *mapping* object and return the
1482 resulting Unicode object. Return *NULL* if an exception was raised by the
1485 The *mapping* object must map Unicode ordinal integers to Unicode strings,
1486 integers (which are then interpreted as Unicode ordinals) or ``None``
1495 character *mapping* table to it and return the resulting Unicode object.
1514 Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
1527 .. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
1529 Encode a Unicode object using MBCS and return the result as Python bytes
1534 .. c:function:: PyObject* PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *er…
1536 Encode the Unicode object using the specified code page and return a Python
1564 The following APIs are capable of handling Unicode objects and strings on input
1565 (we refer to them as strings in the descriptions) and return Unicode objects or
1573 Concatenate two strings, giving a new Unicode string.
1578 Split a string giving a list of Unicode strings. If *sep* is *NULL*, splitting
1586 Split a Unicode string at line breaks, returning a list of Unicode strings.
1595 resulting Unicode object.
1597 The mapping table must map Unicode ordinal integers to Unicode ordinal integers
1611 Unicode string.
1658 return the resulting Unicode object. *maxcount* == ``-1`` means replace all
1673 Compare a Unicode object, *uni*, with *string* and return ``-1``, ``0``, ``1`` for less
1683 Rich compare two Unicode strings and return one of the following:
1704 *element* has to coerce to a one element Unicode string. ``-1`` is returned
1711 pointer variable pointing to a Python Unicode string object. If there is an
1724 :c:func:`PyUnicode_InternInPlace`, returning either a new Unicode string