uchar.h - OpenGrok cross reference for /external/icu/icu4c/source/common/unicode/uchar.h

Lines Matching full:unicode
16 *   8/19/1999   srl         Upgraded scripts to Unicode 3.0
26 #include "unicode/utypes.h"
31 /* Unicode version number                                                   */
34  * Unicode version number, default for the current ICU version.
35  * The actual Unicode Character Database (UCD) data is stored in uprops.dat
36  * and may be generated from UCD files from a different Unicode version.
37  * Call u_getUnicodeVersion to get the actual Unicode version of the data.
46  * \brief C API: Unicode Properties
48  * This C API provides low-level access to the Unicode Character Database.
52  * Unicode assigns each code point (not just assigned character) values for
58  * "About the Unicode Character Database" (http://www.unicode.org/ucd/)
70  * Instead, Unicode properties should be used directly.
82  * Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions
83  * (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
110  * - u_isUWhiteSpace=UCHAR_WHITE_SPACE: Unicode White_Space property;
123 /** The lowest Unicode code point value. Code points are non-negative. @stable ICU 2.0 */
127  * The highest Unicode code point value (scalar value) according to
128  * The Unicode Standard. This is a 21-bit value (20.1 bits, rounded up).
143  * Selection constants for Unicode properties.
145  * one of the Unicode properties.
147  * The properties APIs are intended to reflect Unicode properties as defined
148  * in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
149  * For details about the properties see http://www.unicode.org/ucd/ .
150  * For names of Unicode properties see the UCD file PropertyAliases.txt.
152  * Important: If ICU is built with UCD files from Unicode versions below, e.g., 3.2,
153  * then properties marked with "new in Unicode 3.2" are not or not fully available.
165      *     UCHAR_<Unicode property name>=<integer>,
176     /** First constant for binary Unicode properties. @stable ICU 2.1 */
191     /** Binary property Default_Ignorable_Code_Point (new in Unicode 3.2).
195     /** Binary property Deprecated (new in Unicode 3.2).
209     /** Binary property Grapheme_Base (new in Unicode 3.2).
213     /** Binary property Grapheme_Extend (new in Unicode 3.2).
217     /** Binary property Grapheme_Link (new in Unicode 3.2).
238     /** Binary property IDS_Binary_Operator (new in Unicode 3.2).
242     /** Binary property IDS_Trinary_Operator (new in Unicode 3.2).
249     /** Binary property Logical_Order_Exception (new in Unicode 3.2).
264     /** Binary property Radical (new in Unicode 3.2).
268     /** Binary property Soft_Dotted (new in Unicode 3.2).
277     /** Binary property Unified_Ideograph (new in Unicode 3.2).
299     /** Binary property STerm (new in Unicode 4.0.1).
301         (http://www.unicode.org/reports/tr29/)
304     /** Binary property Variation_Selector (new in Unicode 4.0.1).
340         Unicode normalization and combining character sequences.
349     /** Binary property Pattern_Syntax (new in Unicode 4.1).
351         (http://www.unicode.org/reports/tr31/)
354     /** Binary property Pattern_White_Space (new in Unicode 4.1).
356         (http://www.unicode.org/reports/tr31/)
400     /** One more than the last constant for binary Unicode properties. @stable ICU 2.1 */
406     /** First constant for enumerated/integer Unicode properties. @stable ICU 2.2 */
418         See http://www.unicode.org/reports/tr11/
439     /** Enumerated property Hangul_Syllable_Type, new in Unicode 4.
458         see UNORM_FCD and http://www.unicode.org/notes/tn5/#FCD .
465         see UNORM_FCD and http://www.unicode.org/notes/tn5/#FCD .
468     /** Enumerated property Grapheme_Cluster_Break (new in Unicode 4.1).
470         (http://www.unicode.org/reports/tr29/)
473     /** Enumerated property Sentence_Break (new in Unicode 4.1).
475         (http://www.unicode.org/reports/tr29/)
478     /** Enumerated property Word_Break (new in Unicode 4.1).
480         (http://www.unicode.org/reports/tr29/)
483     /** Enumerated property Bidi_Paired_Bracket_Type (new in Unicode 6.3).
484         Used in UAX #9: Unicode Bidirectional Algorithm
485         (http://www.unicode.org/reports/tr9/)
488 …  /** One more than the last constant for enumerated/integer Unicode properties. @stable ICU 2.2 */
500     /** First constant for bit-mask Unicode properties. @stable ICU 2.4 */
502     /** One more than the last constant for bit-mask Unicode properties. @stable ICU 2.4 */
508     /** First constant for double Unicode properties. @stable ICU 2.4 */
510     /** One more than the last constant for double Unicode properties. @stable ICU 2.4 */
516     /** First constant for string Unicode properties. @stable ICU 2.4 */
560     /** String property Bidi_Paired_Bracket (new in Unicode 6.3).
563     /** One more than the last constant for string Unicode properties. @stable ICU 2.4 */
566     /** Miscellaneous property Script_Extensions (new in Unicode 6.0).
568         For more information, see UAX #24: http://www.unicode.org/reports/tr24/.
572     /** First constant for Unicode properties with unusual value types. @stable ICU 4.6 */
574     /** One more than the last constant for Unicode properties with unusual value types.
582  * Data for enumerated Unicode general category types.
583  * See http://www.unicode.org/Public/UNIDATA/UnicodeData.html .
591      *     / ** <Unicode 2-letter General_Category value> comment... * /
662  * U_GC_XX_MASK constants are bit flags corresponding to Unicode
781      *     / ** <Unicode 1..3-letter Bidi_Class value> comment... * /
845      *     U_BPT_<Unicode Bidi_Paired_Bracket_Type value name>
859  * Constants for Unicode blocks, see the Unicode Data file Blocks.txt
866      *     UBLOCK_<Unicode Block value name> = <integer>,
869     /** New No_Block value in Unicode 4. @stable ICU 2.6 */
894      * Unicode 3.2 renames this block to "Greek and Coptic".
1002      * Unicode 3.2 renames this block to "Combining Diacritical Marks for Symbols".
1111      * Until Unicode 3.1.1, the corresponding block name was "Private Use",
1113      * Unicode 3.2 renames the block for the BMP PUA to "Private Use Area" and
1121      * Until Unicode 3.1.1, the corresponding block name was "Private Use",
1123      * Unicode 3.2 renames the block for the BMP PUA to "Private Use Area" and
1157     /* New blocks in Unicode 3.1 */
1178     /* New blocks in Unicode 3.2 */
1183      * Unicode 4.0.1 renames the "Cyrillic Supplementary" block to "Cyrillic Supplement".
1214     /* New blocks in Unicode 4 */
1247     /* New blocks in Unicode 4.1 */
1290     /* New blocks in Unicode 5.0 */
1311     /* New blocks in Unicode 5.1 */
1348     /* New blocks in Unicode 5.2 */
1403     /* New blocks in Unicode 6.0 */
1430     /* New blocks in Unicode 6.1 */
1455     /* New blocks in Unicode 7.0 */
1522     /* New blocks in Unicode 8.0 */
1566      *     U_EA_<Unicode East_Asian_Width value name>
1581  * Unicode character; or the name that was defined in
1582  * Unicode version 1.0, before the Unicode standard merged
1584  * Unicode code point a unique name.
1590     /** Unicode character name (Name property). @stable ICU 2.0 */
1613  * Unicode allows for additional names, beyond the long and short
1637      *     U_DT_<Unicode Decomposition_Type value name>
1671      *     U_JT_<Unicode Joining_Type value name>
1693      *     U_JG_<Unicode Joining_Group value name>
1796      *     U_GCB_<Unicode Grapheme_Cluster_Break value name>
1809     U_GCB_SPACING_MARK = 10,    /*[SM]*/ /* from here on: new in Unicode 5.1/ICU 4.0 */
1811     U_GCB_REGIONAL_INDICATOR = 12,  /*[RI]*/ /* new in Unicode 6.2/ICU 50 */
1826      *     U_WB_<Unicode Word_Break value name>
1837     U_WB_CR = 8,                /*[CR]*/ /* from here on: new in Unicode 5.1/ICU 4.0 */
1842     U_WB_REGIONAL_INDICATOR = 13,   /*[RI]*/ /* new in Unicode 6.2/ICU 50 */
1843     U_WB_HEBREW_LETTER = 14,    /*[HL]*/ /* from here on: new in Unicode 6.3/ICU 52 */
1859      *     U_SB_<Unicode Sentence_Break value name>
1873     U_SB_CR = 11,               /*[CR]*/ /* from here on: new in Unicode 5.1/ICU 4.0 */
1890      *     U_LB_<Unicode Line_Break value name>
1908     /** Renamed from the misspelled "inseperable" in Unicode 4.0.1/ICU 3.0 @stable ICU 3.0 */
1924     U_LB_NEXT_LINE = 29,         /*[NL]*/ /* from here on: new in Unicode 4/ICU 2.6 */
1926     U_LB_H2 = 31,                /*[H2]*/ /* from here on: new in Unicode 4.1/ICU 3.4 */
1931     U_LB_CLOSE_PARENTHESIS = 36, /*[CP]*/ /* new in Unicode 5.2/ICU 4.4 */
1932     U_LB_CONDITIONAL_JAPANESE_STARTER = 37,/*[CJ]*/ /* new in Unicode 6.1/ICU 49 */
1933     U_LB_HEBREW_LETTER = 38,     /*[HL]*/ /* new in Unicode 6.1/ICU 49 */
1934     U_LB_REGIONAL_INDICATOR = 39,/*[RI]*/ /* new in Unicode 6.2/ICU 50 */
1948      *     U_NT_<Unicode Numeric_Type value name>
1968      *     U_HST_<Unicode Hangul_Syllable_Type value name>
1981  * Check a binary Unicode property for a code point.
1983  * Unicode, especially in version 3.2, defines many more properties than the
1986  * The properties APIs are intended to reflect Unicode properties as defined
1987  * in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
1988  * For details about the properties see http://www.unicode.org/ucd/ .
1989  * For names of Unicode properties see the UCD file PropertyAliases.txt.
1991  * Important: If ICU is built with UCD files from Unicode versions below 3.2,
1992  * then properties marked with "new in Unicode 3.2" are not or not fully available.
1997  * @return TRUE or FALSE according to the binary Unicode property value for c.
1998  *         Also FALSE if 'which' is out of bounds or if the Unicode version
2010  * Check if a code point has the Alphabetic Unicode property.
2014  * @return true if the code point has the Alphabetic Unicode property, false otherwise
2025  * Check if a code point has the Lowercase Unicode property.
2029  * @return true if the code point has the Lowercase Unicode property, false otherwise
2040  * Check if a code point has the Uppercase Unicode property.
2044  * @return true if the code point has the Uppercase Unicode property, false otherwise
2055  * Check if a code point has the White_Space Unicode property.
2063  * @return true if the code point has the White_Space Unicode property, false otherwise.
2076  * Get the property value for an enumerated or integer Unicode property for a code point.
2079  * Unicode, especially in version 3.2, defines many more properties than the
2082  * The properties APIs are intended to reflect Unicode properties as defined
2083  * in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
2084  * For details about the properties see http://www.unicode.org/ .
2085  * For names of Unicode properties see the UCD file PropertyAliases.txt.
2100  *         Returns 0 or 1 (for FALSE/TRUE) for binary Unicode properties.
2102  *         Returns 0 if 'which' is out of bounds or if the Unicode version
2116  * Get the minimum value for an enumerated/integer/binary Unicode property.
2123  * @return Minimum value returned by u_getIntPropertyValue for a Unicode property.
2137  * Get the maximum value for an enumerated/integer/binary Unicode property.
2141  * Examples for min/max values (for Unicode 3.2):
2152  * @return Maximum value returned by u_getIntPropertyValue for a Unicode property.
2166  * Get the numeric value for a Unicode code point as defined in the
2167  * Unicode Character Database.
2172  * For characters without any numeric values in the Unicode Character Database,
2174  * Note: This is different from the Unicode Standard which specifies NaN as the default value.
2272  * Beginning with Unicode 4, this is the same as
2395  * TRUE for Unicode White_Space characters except for "vertical space controls"
2491  * - It is a Unicode Separator character (categories "Z" = "Zs" or "Zl" or "Zp"), but is not
2505  * the exact same results because of the Unicode version
2508  * Note: Unicode 4.0.1 changed U+200B ZERO WIDTH SPACE from a Space Separator (Zs)
2510  * See http://www.unicode.org/versions/Unicode4.0.1/
2588  * Note that this is different from the Unicode definition in
2606  * which is used in the Unicode bidirectional algorithm
2607  * (UAX #9 http://www.unicode.org/reports/tr9/).
2644  * sometimes need a "poor man's" mapping to another Unicode
2652  * @return another Unicode code point that may serve as a mirror-image
2667  * See http://www.unicode.org/reports/tr9/
2714  * with the same Unicode general category ("character type").
2732  * Enumerate efficiently all code points with their Unicode general categories.
2740  * The Unicode Standard guarantees that the numeric value of the type is 0..31.
2776  * Unicode 4 explicitly assigns Han number characters the Numeric_Type
2781  * for complete numeric Unicode properties.
2794  * Returns the Unicode allocation block that contains the character.
2806  * Retrieve the name of a Unicode character.
2809  * in Unicode version 1.0.
2812  * Unicode 1.0 names are only retrieved if they are different from the modern
2846  * The Unicode ISO_Comment property is deprecated and has no values.
2868  * Find a Unicode character by its name and return its code point value.
2872  * A Unicode 1.0 name is matched only if it differs from the modern name.
2873  * Unicode names are all uppercase. Extended names are lowercase followed
2879  * @return The Unicode value of the code point with the given name,
2894  * for each Unicode character with the code point value and
2899  * @param code The Unicode code point for the character with this name.
2916  * Enumerate all assigned Unicode characters between the start and limit
2919  * For Unicode 1.0 names, only those are enumerated that differ from the
2944  * Return the Unicode name for a given property, as given in the
2945  * Unicode database file PropertyAliases.txt.
2957  *         have a short name, but some do not.  Unicode allows for
2980  * in the Unicode database file PropertyAliases.txt.  Short, long, and
3001  * Return the Unicode name for a given property value, as given in the
3002  * Unicode database file PropertyValueAliases.txt.
3030  *         a short name, but some do not.  Unicode allows for
3054  * specified in the Unicode database file PropertyValueAliases.txt.
3089  * first character in an identifier according to Unicode
3090  * (The Unicode Standard, Version 3.0, chapter 5.16 Identifiers).
3116  * Almost the same as Unicode's ID_Continue (UCHAR_ID_CONTINUE)
3117  * except that Unicode recommends to ignore Cf which is less than
3141  * Note that Unicode just recommends to ignore Cf (format controls).
3278  * Before Unicode 3.2, CaseFolding.txt contains mappings marked with 'I' that
3282  * Unicode 3.2 CaseFolding.txt instead contains mappings marked with 'T' that
3389  * The "age" is the Unicode version when the code point was first
3397  * @param versionArray The Unicode version number array, to be filled in.
3405  * Gets the Unicode version information.
3407  * for the Unicode standard that is currently used by ICU.
3408  * For example, Unicode version 3.1.1 is represented as an array with
3412  *                     the Unicode version number
3421  * See Unicode Standard Annex #15 for details, search for "FC_NFKC_Closure"
3422  * or for "FNC": http://www.unicode.org/reports/tr15/
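
The hits at lines 123-128 of the header describe the code point range: the lowest value is 0 and the highest is a 21-bit scalar value. The following is a minimal, standalone sketch of that range check in C. It does not use ICU itself; the names CP_MIN, CP_MAX, is_code_point, bit_width and check_range are hypothetical and chosen only for illustration, and the upper bound 0x10FFFF is the value defined by the Unicode Standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; these are not the ICU macros. */
#define CP_MIN 0x0       /* lowest Unicode code point */
#define CP_MAX 0x10FFFF  /* highest Unicode code point */

/* Return true if c lies in the Unicode code point range 0..0x10FFFF. */
static bool is_code_point(int32_t c) {
    return c >= CP_MIN && c <= CP_MAX;
}

/* Return the number of bits needed to represent v (0 for v == 0). */
static int bit_width(uint32_t v) {
    int bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Sanity checks for the range described in the header comments. */
static void check_range(void) {
    assert(is_code_point(CP_MIN));
    assert(is_code_point(CP_MAX));
    assert(!is_code_point(CP_MAX + 1));  /* just past the last code point */
    assert(!is_code_point(-1));          /* code points are non-negative */
    assert(bit_width(CP_MAX) == 21);     /* "a 21-bit value" */
}
```

Calling check_range() from a test or from main exercises both boundaries. Code that links against ICU would normally use the constants and functions declared in uchar.h rather than local definitions like these.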