uchar.h - OpenGrok cross reference for /external/icu/icu4c/source/common/unicode/uchar.h

Lines Matching full:unicode
1 // Copyright (C) 2016 and later: Unicode, Inc. and others.
2 // License & terms of use: http://www.unicode.org/copyright.html
18 *   8/19/1999   srl         Upgraded scripts to Unicode 3.0
28 #include "unicode/utypes.h"
33 /* Unicode version number                                                   */
36  * Unicode version number, default for the current ICU version.
37  * The actual Unicode Character Database (UCD) data is stored in uprops.dat
38  * and may be generated from UCD files from a different Unicode version.
39  * Call u_getUnicodeVersion to get the actual Unicode version of the data.
48  * \brief C API: Unicode Properties
50  * This C API provides low-level access to the Unicode Character Database.
54  * Unicode assigns each code point (not just assigned character) values for
60  * "About the Unicode Character Database" (http://www.unicode.org/ucd/)
72  * Instead, Unicode properties should be used directly.
84  * Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions
85  * (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
112  * - u_isUWhiteSpace=UCHAR_WHITE_SPACE: Unicode White_Space property;
125 /** The lowest Unicode code point value. Code points are non-negative. @stable ICU 2.0 */
129  * The highest Unicode code point value (scalar value) according to
130  * The Unicode Standard. This is a 21-bit value (20.1 bits, rounded up).
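As a quick check of the "21-bit" remark, here is a minimal standalone sketch that does not call ICU. `MIN_CP` and `MAX_CP` are hypothetical local stand-ins assumed to equal `UCHAR_MIN_VALUE` and `UCHAR_MAX_VALUE`. `bit_width` counts how many bits 0x10FFFF occupies, and `is_code_point` applies the plain range test. Since log2(0x110000) is about 20.09, rounding up gives 21 bits.

```c
#include <stdint.h>

/* Hypothetical stand-ins, assumed equal to UCHAR_MIN_VALUE and UCHAR_MAX_VALUE. */
#define MIN_CP 0x0
#define MAX_CP 0x10FFFF

/* Number of bits needed to represent v in binary (0 needs 0 bits). */
static int bit_width(uint32_t v) {
    int n = 0;
    while (v != 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Range test only: surrogates U+D800..U+DFFF still count as code points here. */
static int is_code_point(int32_t c) {
    return c >= MIN_CP && c <= MAX_CP;
}
```

With these definitions, `bit_width(0x10FFFF)` is 21 and `is_code_point(0x110000)` is 0.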
145  * Selection constants for Unicode properties.
147  * one of the Unicode properties.
149  * The properties APIs are intended to reflect Unicode properties as defined
150  * in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
151  * For details about the properties see http://www.unicode.org/ucd/ .
152  * For names of Unicode properties see the UCD file PropertyAliases.txt.
154  * Important: If ICU is built with UCD files from Unicode versions below 3.2,
155  * then properties marked with "new in Unicode 3.2" are unavailable or only partially available.
167      *     UCHAR_<Unicode property name>=<integer>,
178     /** First constant for binary Unicode properties. @stable ICU 2.1 */
193     /** Binary property Default_Ignorable_Code_Point (new in Unicode 3.2).
197     /** Binary property Deprecated (new in Unicode 3.2).
211     /** Binary property Grapheme_Base (new in Unicode 3.2).
215     /** Binary property Grapheme_Extend (new in Unicode 3.2).
219     /** Binary property Grapheme_Link (new in Unicode 3.2).
240     /** Binary property IDS_Binary_Operator (new in Unicode 3.2).
244     /** Binary property IDS_Trinary_Operator (new in Unicode 3.2).
251     /** Binary property Logical_Order_Exception (new in Unicode 3.2).
266     /** Binary property Radical (new in Unicode 3.2).
270     /** Binary property Soft_Dotted (new in Unicode 3.2).
279     /** Binary property Unified_Ideograph (new in Unicode 3.2).
301     /** Binary property STerm (new in Unicode 4.0.1).
303         (http://www.unicode.org/reports/tr29/)
306     /** Binary property Variation_Selector (new in Unicode 4.0.1).
342         Unicode normalization and combining character sequences.
351     /** Binary property Pattern_Syntax (new in Unicode 4.1).
353         (http://www.unicode.org/reports/tr31/)
356     /** Binary property Pattern_White_Space (new in Unicode 4.1).
358         (http://www.unicode.org/reports/tr31/)
405      * See http://www.unicode.org/reports/tr51/#Emoji_Properties
412      * See http://www.unicode.org/reports/tr51/#Emoji_Properties
419      * See http://www.unicode.org/reports/tr51/#Emoji_Properties
426      * See http://www.unicode.org/reports/tr51/#Emoji_Properties
434      * One more than the last constant for binary Unicode properties.
443     /** First constant for enumerated/integer Unicode properties. @stable ICU 2.2 */
455         See http://www.unicode.org/reports/tr11/
476     /** Enumerated property Hangul_Syllable_Type, new in Unicode 4.
495         see UNORM_FCD and http://www.unicode.org/notes/tn5/#FCD .
502         see UNORM_FCD and http://www.unicode.org/notes/tn5/#FCD .
505     /** Enumerated property Grapheme_Cluster_Break (new in Unicode 4.1).
507         (http://www.unicode.org/reports/tr29/)
510     /** Enumerated property Sentence_Break (new in Unicode 4.1).
512         (http://www.unicode.org/reports/tr29/)
515     /** Enumerated property Word_Break (new in Unicode 4.1).
517         (http://www.unicode.org/reports/tr29/)
520     /** Enumerated property Bidi_Paired_Bracket_Type (new in Unicode 6.3).
521         Used in UAX #9: Unicode Bidirectional Algorithm
522         (http://www.unicode.org/reports/tr9/)
527      * One more than the last constant for enumerated/integer Unicode properties.
542     /** First constant for bit-mask Unicode properties. @stable ICU 2.4 */
546      * One more than the last constant for bit-mask Unicode properties.
555     /** First constant for double Unicode properties. @stable ICU 2.4 */
559      * One more than the last constant for double Unicode properties.
568     /** First constant for string Unicode properties. @stable ICU 2.4 */
612     /** String property Bidi_Paired_Bracket (new in Unicode 6.3).
617      * One more than the last constant for string Unicode properties.
623     /** Miscellaneous property Script_Extensions (new in Unicode 6.0).
625         For more information, see UAX #24: http://www.unicode.org/reports/tr24/.
629     /** First constant for Unicode properties with unusual value types. @stable ICU 4.6 */
633      * One more than the last constant for Unicode properties with unusual value types.
644  * Data for enumerated Unicode general category types.
645  * See http://www.unicode.org/Public/UNIDATA/UnicodeData.html .
653      *     / ** <Unicode 2-letter General_Category value> comment... * /
722      * http://www.unicode.org/policies/stability_policy.html#Property_Value
730  * U_GC_XX_MASK constants are bit flags corresponding to Unicode
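The bit-flag idea can be sketched in plain C. The enum names, `GC_MASK`, and `GC_LETTER_MASK` below are hypothetical. Their numeric values are assumed to match ICU's `UCharCategory` (Lu=1 through Lo=5, Nd=9). Because general category values stay within 0..31, one `uint32_t` can hold any set of categories, and a membership test is a single AND.

```c
#include <stdint.h>

/* Hypothetical local copies of a few category values, assumed to match
   ICU's UCharCategory numbering. */
enum {
    GC_LU = 1, GC_LL = 2, GC_LT = 3, GC_LM = 4, GC_LO = 5, GC_ND = 9
};

/* One bit per category value, in the spirit of ICU's U_MASK(x). */
#define GC_MASK(gc) ((uint32_t)1 << (gc))

/* Union of the letter categories, analogous in spirit to U_GC_L_MASK. */
#define GC_LETTER_MASK \
    (GC_MASK(GC_LU) | GC_MASK(GC_LL) | GC_MASK(GC_LT) | GC_MASK(GC_LM) | GC_MASK(GC_LO))

/* Is category value gc a member of the category set 'set'? */
static int gc_in_set(int gc, uint32_t set) {
    return (GC_MASK(gc) & set) != 0;
}
```

`GC_LETTER_MASK` works out to 0x3E (bits 1 through 5), so `gc_in_set(GC_LL, GC_LETTER_MASK)` is true and `gc_in_set(GC_ND, GC_LETTER_MASK)` is false.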
849      *     / ** <Unicode 1..3-letter Bidi_Class value> comment... * /
920      *     U_BPT_<Unicode Bidi_Paired_Bracket_Type value name>
941  * Constants for Unicode blocks, see the Unicode Data file Blocks.txt
948      *     UBLOCK_<Unicode Block value name> = <integer>,
951     /** New No_Block value in Unicode 4. @stable ICU 2.6 */
976      * Unicode 3.2 renames this block to "Greek and Coptic".
1084      * Unicode 3.2 renames this block to "Combining Diacritical Marks for Symbols".
1193      * Until Unicode 3.1.1, the corresponding block name was "Private Use",
1195      * Unicode 3.2 renames the block for the BMP PUA to "Private Use Area" and
1203      * Until Unicode 3.1.1, the corresponding block name was "Private Use",
1205      * Unicode 3.2 renames the block for the BMP PUA to "Private Use Area" and
1239     /* New blocks in Unicode 3.1 */
1260     /* New blocks in Unicode 3.2 */
1265      * Unicode 4.0.1 renames the "Cyrillic Supplementary" block to "Cyrillic Supplement".
1296     /* New blocks in Unicode 4 */
1329     /* New blocks in Unicode 4.1 */
1372     /* New blocks in Unicode 5.0 */
1393     /* New blocks in Unicode 5.1 */
1430     /* New blocks in Unicode 5.2 */
1485     /* New blocks in Unicode 6.0 */
1512     /* New blocks in Unicode 6.1 */
1537     /* New blocks in Unicode 7.0 */
1604     /* New blocks in Unicode 8.0 */
1627     /* New blocks in Unicode 9.0 */
1680      *     U_EA_<Unicode East_Asian_Width value name>
1703  * Unicode character; or the name that was defined in
1704  * Unicode version 1.0, before the Unicode standard merged
1706  * Unicode code point a unique name.
1712     /** Unicode character name (Name property). @stable ICU 2.0 */
1740  * Unicode allows for additional names, beyond the long and short
1770      *     U_DT_<Unicode Decomposition_Type value name>
1812      *     U_JT_<Unicode Joining_Type value name>
1842      *     U_JG_<Unicode Joining_Group value name>
1956      *     U_GCB_<Unicode Grapheme_Cluster_Break value name>
1970     U_GCB_SPACING_MARK = 10,    /*[SM]*/ /* from here on: new in Unicode 5.1/ICU 4.0 */
1974     U_GCB_REGIONAL_INDICATOR = 12,  /*[RI]*/ /* new in Unicode 6.2/ICU 50 */
1976     U_GCB_E_BASE = 13,          /*[EB]*/ /* from here on: new in Unicode 9.0/ICU 58 */
2007      *     U_WB_<Unicode Word_Break value name>
2019     U_WB_CR = 8,                /*[CR]*/ /* from here on: new in Unicode 5.1/ICU 4.0 */
2029     U_WB_REGIONAL_INDICATOR = 13,   /*[RI]*/ /* new in Unicode 6.2/ICU 50 */
2031     U_WB_HEBREW_LETTER = 14,    /*[HL]*/ /* from here on: new in Unicode 6.3/ICU 52 */
2037     U_WB_E_BASE = 17,           /*[EB]*/ /* from here on: new in Unicode 9.0/ICU 58 */
2067      *     U_SB_<Unicode Sentence_Break value name>
2081     U_SB_CR = 11,               /*[CR]*/ /* from here on: new in Unicode 5.1/ICU 4.0 */
2106      *     U_LB_<Unicode Line_Break value name>
2124     /** Renamed from the misspelled "inseperable" in Unicode 4.0.1/ICU 3.0 @stable ICU 3.0 */
2141     U_LB_NEXT_LINE = 29,         /*[NL]*/ /* from here on: new in Unicode 4/ICU 2.6 */
2145     U_LB_H2 = 31,                /*[H2]*/ /* from here on: new in Unicode 4.1/ICU 3.4 */
2155     U_LB_CLOSE_PARENTHESIS = 36, /*[CP]*/ /* new in Unicode 5.2/ICU 4.4 */
2157     U_LB_CONDITIONAL_JAPANESE_STARTER = 37,/*[CJ]*/ /* new in Unicode 6.1/ICU 49 */
2159     U_LB_HEBREW_LETTER = 38,     /*[HL]*/ /* new in Unicode 6.1/ICU 49 */
2161     U_LB_REGIONAL_INDICATOR = 39,/*[RI]*/ /* new in Unicode 6.2/ICU 50 */
2163     U_LB_E_BASE = 40,            /*[EB]*/ /* from here on: new in Unicode 9.0/ICU 58 */
2189      *     U_NT_<Unicode Numeric_Type value name>
2217      *     U_HST_<Unicode Hangul_Syllable_Type value name>
2238  * Check a binary Unicode property for a code point.
2240  * Unicode, especially in version 3.2, defines many more properties than the
2243  * The properties APIs are intended to reflect Unicode properties as defined
2244  * in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
2245  * For details about the properties see http://www.unicode.org/ucd/ .
2246  * For names of Unicode properties see the UCD file PropertyAliases.txt.
2248  * Important: If ICU is built with UCD files from Unicode versions below 3.2,
2249  * then properties marked with "new in Unicode 3.2" are unavailable or only partially available.
2254  * @return TRUE or FALSE according to the binary Unicode property value for c.
2255  *         Also FALSE if 'which' is out of bounds or if the Unicode version
2267  * Check if a code point has the Alphabetic Unicode property.
2271  * @return true if the code point has the Alphabetic Unicode property, false otherwise
2282  * Check if a code point has the Lowercase Unicode property.
2286  * @return true if the code point has the Lowercase Unicode property, false otherwise
2297  * Check if a code point has the Uppercase Unicode property.
2301  * @return true if the code point has the Uppercase Unicode property, false otherwise
2312  * Check if a code point has the White_Space Unicode property.
2320  * @return true if the code point has the White_Space Unicode property, false otherwise.
2333  * Get the property value for an enumerated or integer Unicode property for a code point.
2336  * Unicode, especially in version 3.2, defines many more properties than the
2339  * The properties APIs are intended to reflect Unicode properties as defined
2340  * in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
2341  * For details about the properties see http://www.unicode.org/ .
2342  * For names of Unicode properties see the UCD file PropertyAliases.txt.
2357  *         Returns 0 or 1 (for FALSE/TRUE) for binary Unicode properties.
2359  *         Returns 0 if 'which' is out of bounds or if the Unicode version
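The fallback rules described for `u_hasBinaryProperty` and `u_getIntPropertyValue` (FALSE or 0 for an out-of-bounds `which`, and 0/1 for binary properties queried as integers) can be illustrated with a toy dispatcher. Everything here is hypothetical: the property ids, their range layout, and the ASCII-only "properties" stand in for ICU's real `UProperty` constants and UCD data.

```c
#include <stdint.h>

/* Hypothetical property ids grouped into ranges, in the style of the
   UCHAR_*_START / UCHAR_*_LIMIT constants (the values are made up). */
enum {
    PROP_BINARY_START = 0,
    PROP_IS_ASCII_DIGIT = PROP_BINARY_START,
    PROP_IS_ASCII_UPPER,
    PROP_BINARY_LIMIT,
    PROP_INT_START = 0x1000,
    PROP_ASCII_DIGIT_VALUE = PROP_INT_START,
    PROP_INT_LIMIT
};

/* Binary lookup: false for any 'which' that is not a known binary property. */
static int has_binary_property(int32_t c, int which) {
    switch (which) {
    case PROP_IS_ASCII_DIGIT: return c >= '0' && c <= '9';
    case PROP_IS_ASCII_UPPER: return c >= 'A' && c <= 'Z';
    default: return 0;
    }
}

/* Integer lookup: binary properties come back as 0/1, unknown 'which' as 0. */
static int32_t get_int_property_value(int32_t c, int which) {
    if (which >= PROP_BINARY_START && which < PROP_BINARY_LIMIT) {
        return has_binary_property(c, which);
    }
    if (which == PROP_ASCII_DIGIT_VALUE) {
        return (c >= '0' && c <= '9') ? c - '0' : 0;
    }
    return 0;  /* out of bounds */
}
```

Grouping ids into ranges is what lets a single comparison route a request to the binary or the integer branch.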
2373  * Get the minimum value for an enumerated/integer/binary Unicode property.
2380  * @return Minimum value returned by u_getIntPropertyValue for a Unicode property.
2394  * Get the maximum value for an enumerated/integer/binary Unicode property.
2398  * Examples for min/max values (for Unicode 3.2):
2409  * @return Maximum value returned by u_getIntPropertyValue for a Unicode property.
2423  * Get the numeric value for a Unicode code point as defined in the
2424  * Unicode Character Database.
2429  * For characters without any numeric values in the Unicode Character Database,
2431  * Note: This is different from the Unicode Standard, which specifies NaN as the default value.
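One practical consequence of using a finite sentinel instead of NaN is that callers can test for "no numeric value" with a plain `==`. The following is a minimal sketch. It assumes a local `NO_NUMERIC_VALUE` constant that mirrors ICU's `U_NO_NUMERIC_VALUE`, taken here to be -123456789.0. `toy_numeric_value` is a hypothetical ASCII-only lookup, not UCD data.

```c
/* Hypothetical sentinel, assumed to mirror ICU's U_NO_NUMERIC_VALUE. */
#define NO_NUMERIC_VALUE ((double)-123456789.)

/* Toy lookup over ASCII digits only; every other character has no value. */
static double toy_numeric_value(int c) {
    if (c >= '0' && c <= '9') {
        return (double)(c - '0');
    }
    return NO_NUMERIC_VALUE;
}

/* The sentinel compares equal to itself, so != works directly.
   A NaN default would need isnan() instead, because NaN != NaN. */
static int has_numeric_value(int c) {
    return toy_numeric_value(c) != NO_NUMERIC_VALUE;
}
```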
2529  * Beginning with Unicode 4, this is the same as
2652  * TRUE for Unicode White_Space characters except for "vertical space controls"
2748  * - It is a Unicode Separator character (categories "Z" = "Zs" or "Zl" or "Zp"), but is not
2762  * the exact same results because of the Unicode version
2765  * Note: Unicode 4.0.1 changed U+200B ZERO WIDTH SPACE from a Space Separator (Zs)
2767  * See http://www.unicode.org/versions/Unicode4.0.1/
2845  * Note that this is different from the Unicode definition in
2863  * which is used in the Unicode bidirectional algorithm
2864  * (UAX #9 http://www.unicode.org/reports/tr9/).
2901  * sometimes need a "poor man's" mapping to another Unicode
2909  * @return another Unicode code point that may serve as a mirror-image
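A "poor man's" mirror mapping amounts to a lookup in a table of paired code points. The sketch below is hypothetical and covers only four ASCII bracket pairs; real data comes from the UCD file BidiMirroring.txt. Following the convention ICU documents for `u_charMirror`, it returns `c` unchanged when no partner is known.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of mirrored pairs (ASCII brackets only). */
static const int32_t mirror_pairs[][2] = {
    {'(', ')'}, {'<', '>'}, {'[', ']'}, {'{', '}'}
};

/* Return the mirror-image partner of c, or c itself if none is known. */
static int32_t toy_char_mirror(int32_t c) {
    size_t n = sizeof mirror_pairs / sizeof mirror_pairs[0];
    for (size_t i = 0; i < n; i++) {
        if (mirror_pairs[i][0] == c) return mirror_pairs[i][1];
        if (mirror_pairs[i][1] == c) return mirror_pairs[i][0];
    }
    return c;
}
```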
2924  * See http://www.unicode.org/reports/tr9/
2971  * with the same Unicode general category ("character type").
2989  * Enumerate efficiently all code points with their Unicode general categories.
2997  * The Unicode Standard guarantees that the numeric value of the type is 0..31.
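Enumerating general categories "efficiently" means reporting maximal runs of code points that share one category, with one callback per run rather than per code point. The sketch below shows that range-coalescing contract: half-open `[start, limit)` ranges, and a zero return from the callback stops the walk. It uses a hypothetical ASCII-only classifier. The callback type is modeled loosely on ICU's `UCharEnumTypeRange`, which takes a `const void *context`, simplified here to `void *` so the example callback can count ranges. A real implementation would walk its compact data tables rather than classify every code point, as this toy does.

```c
#include <stdint.h>

/* Hypothetical callback shape: return nonzero to continue, zero to stop. */
typedef int (*type_range_fn)(void *context, int32_t start, int32_t limit, int type);

/* Toy classifier for ASCII only (NOT real UCD data):
   0 = other, 1 = uppercase letter, 2 = lowercase letter, 9 = decimal digit. */
static int toy_char_type(int32_t c) {
    if (c >= 'A' && c <= 'Z') return 1;
    if (c >= 'a' && c <= 'z') return 2;
    if (c >= '0' && c <= '9') return 9;
    return 0;
}

/* Walk [0, limit) and report each maximal run of one type as [start, end). */
static void enum_type_ranges(int32_t limit, type_range_fn fn, void *context) {
    int32_t start = 0;
    while (start < limit) {
        int type = toy_char_type(start);
        int32_t end = start + 1;
        while (end < limit && toy_char_type(end) == type) {
            end++;
        }
        if (!fn(context, start, end, type)) {
            return;  /* the callback asked to stop */
        }
        start = end;
    }
}

/* Example callback: count the ranges seen. */
static int count_ranges(void *context, int32_t start, int32_t limit, int type) {
    (void)start; (void)limit; (void)type;
    ++*(int *)context;
    return 1;
}
```

Over [0, 128) the toy classifier yields 7 runs: controls and punctuation, digits, punctuation, uppercase letters, punctuation, lowercase letters, and the trailing punctuation plus DEL.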
3033  * Unicode 4 explicitly assigns Han number characters the Numeric_Type
3038  * for complete numeric Unicode properties.
3051  * Returns the Unicode allocation block that contains the character.
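Looking up the block for a character is a search over sorted, non-overlapping ranges. Here is a minimal sketch with a hypothetical four-entry excerpt of Blocks.txt and made-up block ids. It returns `BLOCK_NONE`, analogous to `UBLOCK_NO_BLOCK`, when no range matches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block ids. */
enum { BLOCK_NONE = 0, BLOCK_BASIC_LATIN, BLOCK_LATIN_1_SUPPLEMENT,
       BLOCK_LATIN_EXTENDED_A, BLOCK_LATIN_EXTENDED_B };

struct block_range { int32_t first, last; int id; };

/* Small excerpt of Blocks.txt ranges, sorted by first code point. */
static const struct block_range blocks[] = {
    {0x0000, 0x007F, BLOCK_BASIC_LATIN},
    {0x0080, 0x00FF, BLOCK_LATIN_1_SUPPLEMENT},
    {0x0100, 0x017F, BLOCK_LATIN_EXTENDED_A},
    {0x0180, 0x024F, BLOCK_LATIN_EXTENDED_B},
};

/* Binary search; BLOCK_NONE if c falls outside every listed range. */
static int toy_block_code(int32_t c) {
    size_t lo = 0, hi = sizeof blocks / sizeof blocks[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c < blocks[mid].first) {
            hi = mid;
        } else if (c > blocks[mid].last) {
            lo = mid + 1;
        } else {
            return blocks[mid].id;
        }
    }
    return BLOCK_NONE;
}
```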
3063  * Retrieve the name of a Unicode character.
3066  * in Unicode version 1.0.
3069  * Unicode 1.0 names are only retrieved if they are different from the modern
3103  * The Unicode ISO_Comment property is deprecated and has no values.
3125  * Find a Unicode character by its name and return its code point value.
3129  * A Unicode 1.0 name is matched only if it differs from the modern name.
3130  * Unicode names are all uppercase. Extended names are lowercase followed
3136  * @return The Unicode value of the code point with the given name,
3151  * for each Unicode character with the code point value and
3156  * @param code The Unicode code point for the character with this name.
3173  * Enumerate all assigned Unicode characters between the start and limit
3176  * For Unicode 1.0 names, only those are enumerated that differ from the
3201  * Return the Unicode name for a given property, as given in the
3202  * Unicode database file PropertyAliases.txt.
3214  *         have a short name, but some do not.  Unicode allows for
3237  * in the Unicode database file PropertyAliases.txt.  Short, long, and
3258  * Return the Unicode name for a given property value, as given in the
3259  * Unicode database file PropertyValueAliases.txt.
3287  *         a short name, but some do not.  Unicode allows for
3311  * specified in the Unicode database file PropertyValueAliases.txt.
3346  * first character in an identifier according to Unicode
3347  * (The Unicode Standard, Version 3.0, chapter 5.16 Identifiers).
3373  * Almost the same as Unicode's ID_Continue (UCHAR_ID_CONTINUE)
3374  * except that Unicode recommends ignoring Cf, which is less than
3398  * Note that Unicode merely recommends ignoring Cf (format controls).
3535  * Before Unicode 3.2, CaseFolding.txt contains mappings marked with 'I' that
3539  * Unicode 3.2 CaseFolding.txt instead contains mappings marked with 'T' that
3646  * The "age" is the Unicode version when the code point was first
3654  * @param versionArray The Unicode version number array, to be filled in.
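Because the "age" is a version number, checking whether a character existed in a given Unicode version reduces to comparing version arrays. This sketch assumes the four-byte, major-first layout of ICU's `UVersionInfo`. With that layout, `memcmp` gives the same order as a field-by-field comparison. `compare_versions` and `assigned_in` are hypothetical helpers, and treating an all-zero age as "unassigned" is an assumption of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Four version fields, major first, like ICU's UVersionInfo (uint8_t[4]). */
typedef uint8_t version_info[4];

/* Negative, zero, or positive like strcmp; valid because each field is an
   unsigned byte and the most significant field comes first. */
static int compare_versions(const version_info a, const version_info b) {
    return memcmp(a, b, sizeof(version_info));
}

/* Hypothetical helper: was a character with this age assigned by 'target'?
   Assumption: an all-zero age marks an unassigned code point. */
static int assigned_in(const version_info age, const version_info target) {
    static const version_info unassigned = {0, 0, 0, 0};
    if (compare_versions(age, unassigned) == 0) {
        return 0;
    }
    return compare_versions(age, target) <= 0;
}
```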
3662  * Gets the Unicode version information.
3664  * for the Unicode standard that is currently used by ICU.
3665  * For example, Unicode version 3.1.1 is represented as an array with
3669  *                     the Unicode version number
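To make the array representation concrete (version 3.1.1 stored as four byte fields), here is a hypothetical formatter that turns such an array back into text. It always prints major.minor and drops trailing zero fields after that. This is an illustrative choice, not a description of ICU's `u_versionToString`.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Four version fields, major first, like ICU's UVersionInfo (uint8_t[4]). */
typedef uint8_t version_info[4];

/* Hypothetical formatter: writes major.minor, then further fields up to the
   last nonzero one. 'size' must be at least 1. Returns the string length. */
static int format_version(const version_info v, char *buf, size_t size) {
    int last = 3;
    while (last > 1 && v[last] == 0) {
        last--;  /* drop trailing zero fields, but keep major.minor */
    }
    int pos = snprintf(buf, size, "%u.%u", (unsigned)v[0], (unsigned)v[1]);
    for (int i = 2; i <= last && pos >= 0 && (size_t)pos < size; i++) {
        pos += snprintf(buf + pos, size - (size_t)pos, ".%u", (unsigned)v[i]);
    }
    return (int)strlen(buf);
}
```

With this choice, `{3, 1, 1, 0}` formats as "3.1.1" and `{9, 0, 0, 0}` as "9.0".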
3678  * See Unicode Standard Annex #15 for details, search for "FC_NFKC_Closure"
3679  * or for "FNC": http://www.unicode.org/reports/tr15/