Lines Matching full:unicode

16 *   8/19/1999   srl         Upgraded scripts to Unicode 3.0
26 #include "unicode/utypes.h"
31 /* Unicode version number */
34 * Unicode version number, default for the current ICU version.
35 * The actual Unicode Character Database (UCD) data is stored in uprops.dat
36 * and may be generated from UCD files from a different Unicode version.
37 * Call u_getUnicodeVersion to get the actual Unicode version of the data.
46 * \brief C API: Unicode Properties
48 * This C API provides low-level access to the Unicode Character Database.
52 * Unicode assigns each code point (not just assigned character) values for
58 * "About the Unicode Character Database" (http://www.unicode.org/ucd/)
70 * Instead, Unicode properties should be used directly.
82 * Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions
83 * (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
110 * - u_isUWhiteSpace=UCHAR_WHITE_SPACE: Unicode White_Space property;
123 /** The lowest Unicode code point value. Code points are non-negative. @stable ICU 2.0 */
127 * The highest Unicode code point value (scalar value) according to
128 * The Unicode Standard. This is a 21-bit value (20.1 bits, rounded up).
143 * Selection constants for Unicode properties.
145 * one of the Unicode properties.
147 * The properties APIs are intended to reflect Unicode properties as defined
148 * in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
149 * For details about the properties see http://www.unicode.org/ucd/ .
150 * For names of Unicode properties see the UCD file PropertyAliases.txt.
152 * Important: If ICU is built with UCD files from Unicode versions below 3.2,
153 * then properties marked with "new in Unicode 3.2" are not or not fully available.
165 * UCHAR_<Unicode property name>=<integer>,
176 /** First constant for binary Unicode properties. @stable ICU 2.1 */
191 /** Binary property Default_Ignorable_Code_Point (new in Unicode 3.2).
195 /** Binary property Deprecated (new in Unicode 3.2).
209 /** Binary property Grapheme_Base (new in Unicode 3.2).
213 /** Binary property Grapheme_Extend (new in Unicode 3.2).
217 /** Binary property Grapheme_Link (new in Unicode 3.2).
238 /** Binary property IDS_Binary_Operator (new in Unicode 3.2).
242 /** Binary property IDS_Trinary_Operator (new in Unicode 3.2).
249 /** Binary property Logical_Order_Exception (new in Unicode 3.2).
264 /** Binary property Radical (new in Unicode 3.2).
268 /** Binary property Soft_Dotted (new in Unicode 3.2).
277 /** Binary property Unified_Ideograph (new in Unicode 3.2).
299 /** Binary property STerm (new in Unicode 4.0.1).
301 (http://www.unicode.org/reports/tr29/)
304 /** Binary property Variation_Selector (new in Unicode 4.0.1).
340 Unicode normalization and combining character sequences.
349 /** Binary property Pattern_Syntax (new in Unicode 4.1).
351 (http://www.unicode.org/reports/tr31/)
354 /** Binary property Pattern_White_Space (new in Unicode 4.1).
356 (http://www.unicode.org/reports/tr31/)
400 /** One more than the last constant for binary Unicode properties. @stable ICU 2.1 */
406 /** First constant for enumerated/integer Unicode properties. @stable ICU 2.2 */
418 See http://www.unicode.org/reports/tr11/
439 /** Enumerated property Hangul_Syllable_Type, new in Unicode 4.
458 see UNORM_FCD and http://www.unicode.org/notes/tn5/#FCD .
465 see UNORM_FCD and http://www.unicode.org/notes/tn5/#FCD .
468 /** Enumerated property Grapheme_Cluster_Break (new in Unicode 4.1).
470 (http://www.unicode.org/reports/tr29/)
473 /** Enumerated property Sentence_Break (new in Unicode 4.1).
475 (http://www.unicode.org/reports/tr29/)
478 /** Enumerated property Word_Break (new in Unicode 4.1).
480 (http://www.unicode.org/reports/tr29/)
483 /** Enumerated property Bidi_Paired_Bracket_Type (new in Unicode 6.3).
484 Used in UAX #9: Unicode Bidirectional Algorithm
485 (http://www.unicode.org/reports/tr9/)
488 … /** One more than the last constant for enumerated/integer Unicode properties. @stable ICU 2.2 */
500 /** First constant for bit-mask Unicode properties. @stable ICU 2.4 */
502 /** One more than the last constant for bit-mask Unicode properties. @stable ICU 2.4 */
508 /** First constant for double Unicode properties. @stable ICU 2.4 */
510 /** One more than the last constant for double Unicode properties. @stable ICU 2.4 */
516 /** First constant for string Unicode properties. @stable ICU 2.4 */
560 /** String property Bidi_Paired_Bracket (new in Unicode 6.3).
563 /** One more than the last constant for string Unicode properties. @stable ICU 2.4 */
566 /** Miscellaneous property Script_Extensions (new in Unicode 6.0).
568 For more information, see UAX #24: http://www.unicode.org/reports/tr24/.
572 /** First constant for Unicode properties with unusual value types. @stable ICU 4.6 */
574 /** One more than the last constant for Unicode properties with unusual value types.
582 * Data for enumerated Unicode general category types.
583 * See http://www.unicode.org/Public/UNIDATA/UnicodeData.html .
591 * / ** <Unicode 2-letter General_Category value> comment... * /
662 * U_GC_XX_MASK constants are bit flags corresponding to Unicode
781 * / ** <Unicode 1..3-letter Bidi_Class value> comment... * /
845 * U_BPT_<Unicode Bidi_Paired_Bracket_Type value name>
859 * Constants for Unicode blocks, see the Unicode Data file Blocks.txt
866 * UBLOCK_<Unicode Block value name> = <integer>,
869 /** New No_Block value in Unicode 4. @stable ICU 2.6 */
894 * Unicode 3.2 renames this block to "Greek and Coptic".
1002 * Unicode 3.2 renames this block to "Combining Diacritical Marks for Symbols".
1111 * Until Unicode 3.1.1, the corresponding block name was "Private Use",
1113 * Unicode 3.2 renames the block for the BMP PUA to "Private Use Area" and
1121 * Until Unicode 3.1.1, the corresponding block name was "Private Use",
1123 * Unicode 3.2 renames the block for the BMP PUA to "Private Use Area" and
1157 /* New blocks in Unicode 3.1 */
1178 /* New blocks in Unicode 3.2 */
1183 * Unicode 4.0.1 renames the "Cyrillic Supplementary" block to "Cyrillic Supplement".
1214 /* New blocks in Unicode 4 */
1247 /* New blocks in Unicode 4.1 */
1290 /* New blocks in Unicode 5.0 */
1311 /* New blocks in Unicode 5.1 */
1348 /* New blocks in Unicode 5.2 */
1403 /* New blocks in Unicode 6.0 */
1430 /* New blocks in Unicode 6.1 */
1455 /* New blocks in Unicode 7.0 */
1522 /* New blocks in Unicode 8.0 */
1566 * U_EA_<Unicode East_Asian_Width value name>
1581 * Unicode character; or the name that was defined in
1582 * Unicode version 1.0, before the Unicode standard merged
1584 * Unicode code point a unique name.
1590 /** Unicode character name (Name property). @stable ICU 2.0 */
1613 * Unicode allows for additional names, beyond the long and short
1637 * U_DT_<Unicode Decomposition_Type value name>
1671 * U_JT_<Unicode Joining_Type value name>
1693 * U_JG_<Unicode Joining_Group value name>
1796 * U_GCB_<Unicode Grapheme_Cluster_Break value name>
1809 U_GCB_SPACING_MARK = 10, /*[SM]*/ /* from here on: new in Unicode 5.1/ICU 4.0 */
1811 U_GCB_REGIONAL_INDICATOR = 12, /*[RI]*/ /* new in Unicode 6.2/ICU 50 */
1826 * U_WB_<Unicode Word_Break value name>
1837 U_WB_CR = 8, /*[CR]*/ /* from here on: new in Unicode 5.1/ICU 4.0 */
1842 U_WB_REGIONAL_INDICATOR = 13, /*[RI]*/ /* new in Unicode 6.2/ICU 50 */
1843 U_WB_HEBREW_LETTER = 14, /*[HL]*/ /* from here on: new in Unicode 6.3/ICU 52 */
1859 * U_SB_<Unicode Sentence_Break value name>
1873 U_SB_CR = 11, /*[CR]*/ /* from here on: new in Unicode 5.1/ICU 4.0 */
1890 * U_LB_<Unicode Line_Break value name>
1908 /** Renamed from the misspelled "inseperable" in Unicode 4.0.1/ICU 3.0 @stable ICU 3.0 */
1924 U_LB_NEXT_LINE = 29, /*[NL]*/ /* from here on: new in Unicode 4/ICU 2.6 */
1926 U_LB_H2 = 31, /*[H2]*/ /* from here on: new in Unicode 4.1/ICU 3.4 */
1931 U_LB_CLOSE_PARENTHESIS = 36, /*[CP]*/ /* new in Unicode 5.2/ICU 4.4 */
1932 U_LB_CONDITIONAL_JAPANESE_STARTER = 37,/*[CJ]*/ /* new in Unicode 6.1/ICU 49 */
1933 U_LB_HEBREW_LETTER = 38, /*[HL]*/ /* new in Unicode 6.1/ICU 49 */
1934 U_LB_REGIONAL_INDICATOR = 39,/*[RI]*/ /* new in Unicode 6.2/ICU 50 */
1948 * U_NT_<Unicode Numeric_Type value name>
1968 * U_HST_<Unicode Hangul_Syllable_Type value name>
1981 * Check a binary Unicode property for a code point.
1983 * Unicode, especially in version 3.2, defines many more properties than the
1986 * The properties APIs are intended to reflect Unicode properties as defined
1987 * in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
1988 * For details about the properties see http://www.unicode.org/ucd/ .
1989 * For names of Unicode properties see the UCD file PropertyAliases.txt.
1991 * Important: If ICU is built with UCD files from Unicode versions below 3.2,
1992 * then properties marked with "new in Unicode 3.2" are not or not fully available.
1997 * @return TRUE or FALSE according to the binary Unicode property value for c.
1998 * Also FALSE if 'which' is out of bounds or if the Unicode version
2010 * Check if a code point has the Alphabetic Unicode property.
2014 * @return true if the code point has the Alphabetic Unicode property, false otherwise
2025 * Check if a code point has the Lowercase Unicode property.
2029 * @return true if the code point has the Lowercase Unicode property, false otherwise
2040 * Check if a code point has the Uppercase Unicode property.
2044 * @return true if the code point has the Uppercase Unicode property, false otherwise
2055 * Check if a code point has the White_Space Unicode property.
2063 * @return true if the code point has the White_Space Unicode property, false otherwise.
2076 * Get the property value for an enumerated or integer Unicode property for a code point.
2079 * Unicode, especially in version 3.2, defines many more properties than the
2082 * The properties APIs are intended to reflect Unicode properties as defined
2083 * in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
2084 * For details about the properties see http://www.unicode.org/ .
2085 * For names of Unicode properties see the UCD file PropertyAliases.txt.
2100 * Returns 0 or 1 (for FALSE/TRUE) for binary Unicode properties.
2102 * Returns 0 if 'which' is out of bounds or if the Unicode version
2116 * Get the minimum value for an enumerated/integer/binary Unicode property.
2123 * @return Minimum value returned by u_getIntPropertyValue for a Unicode property.
2137 * Get the maximum value for an enumerated/integer/binary Unicode property.
2141 * Examples for min/max values (for Unicode 3.2):
2152 * @return Maximum value returned by u_getIntPropertyValue for a Unicode property.
2166 * Get the numeric value for a Unicode code point as defined in the
2167 * Unicode Character Database.
2172 * For characters without any numeric values in the Unicode Character Database,
2174 * Note: This is different from the Unicode Standard which specifies NaN as the default value.
2272 * Beginning with Unicode 4, this is the same as
2395 * TRUE for Unicode White_Space characters except for "vertical space controls"
2491 * - It is a Unicode Separator character (categories "Z" = "Zs" or "Zl" or "Zp"), but is not
2505 * the exact same results because of the Unicode version
2508 * Note: Unicode 4.0.1 changed U+200B ZERO WIDTH SPACE from a Space Separator (Zs)
2510 * See http://www.unicode.org/versions/Unicode4.0.1/
2588 * Note that this is different from the Unicode definition in
2606 * which is used in the Unicode bidirectional algorithm
2607 * (UAX #9 http://www.unicode.org/reports/tr9/).
2644 * sometimes need a "poor man's" mapping to another Unicode
2652 * @return another Unicode code point that may serve as a mirror-image
2667 * See http://www.unicode.org/reports/tr9/
2714 * with the same Unicode general category ("character type").
2732 * Enumerate efficiently all code points with their Unicode general categories.
2740 * The Unicode Standard guarantees that the numeric value of the type is 0..31.
2776 * Unicode 4 explicitly assigns Han number characters the Numeric_Type
2781 * for complete numeric Unicode properties.
2794 * Returns the Unicode allocation block that contains the character.
2806 * Retrieve the name of a Unicode character.
2809 * in Unicode version 1.0.
2812 * Unicode 1.0 names are only retrieved if they are different from the modern
2846 * The Unicode ISO_Comment property is deprecated and has no values.
2868 * Find a Unicode character by its name and return its code point value.
2872 * A Unicode 1.0 name is matched only if it differs from the modern name.
2873 * Unicode names are all uppercase. Extended names are lowercase followed
2879 * @return The Unicode value of the code point with the given name,
2894 * for each Unicode character with the code point value and
2899 * @param code The Unicode code point for the character with this name.
2916 * Enumerate all assigned Unicode characters between the start and limit
2919 * For Unicode 1.0 names, only those are enumerated that differ from the
2944 * Return the Unicode name for a given property, as given in the
2945 * Unicode database file PropertyAliases.txt.
2957 * have a short name, but some do not. Unicode allows for
2980 * in the Unicode database file PropertyAliases.txt. Short, long, and
3001 * Return the Unicode name for a given property value, as given in the
3002 * Unicode database file PropertyValueAliases.txt.
3030 * a short name, but some do not. Unicode allows for
3054 * specified in the Unicode database file PropertyValueAliases.txt.
3089 * first character in an identifier according to Unicode
3090 * (The Unicode Standard, Version 3.0, chapter 5.16 Identifiers).
3116 * Almost the same as Unicode's ID_Continue (UCHAR_ID_CONTINUE)
3117 * except that Unicode recommends to ignore Cf which is less than
3141 * Note that Unicode just recommends to ignore Cf (format controls).
3278 * Before Unicode 3.2, CaseFolding.txt contains mappings marked with 'I' that
3282 * Unicode 3.2 CaseFolding.txt instead contains mappings marked with 'T' that
3389 * The "age" is the Unicode version when the code point was first
3397 * @param versionArray The Unicode version number array, to be filled in.
3405 * Gets the Unicode version information.
3407 * for the Unicode standard that is currently used by ICU.
3408 * For example, Unicode version 3.1.1 is represented as an array with
3412 * the Unicode version number
3421 * See Unicode Standard Annex #15 for details, search for "FC_NFKC_Closure"
3422 * or for "FNC": http://www.unicode.org/reports/tr15/