Name | | Date | Size | #Lines | LOC
---|---|---|---|---|---
GraphemeBreakTest.html | D | 23-Nov-2023 | 48.8 KiB | 226 | 225
GraphemeBreakTest.txt | D | 23-Nov-2023 | 105.4 KiB | 785 | 784
readme.txt | D | 23-Nov-2023 | 2 KiB | 46 | 31
readme.txt
CLDR Segmentation data
# Copyright © 1991-2020 Unicode, Inc.
# For terms of use, see http://www.unicode.org/copyright.html
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
The segments directory contains files used to customize the default segmentation data in the UCD.

Currently this applies only to the Grapheme Cluster Break (GCB) algorithm (https://unicode.org/reports/tr29/),
to add support for not splitting Indic aksaras.

The modifications are:

1. Adding 3 new character categories to https://unicode.org/reports/tr29/#Grapheme_Cluster_Break_Property_Values

    Virama=[\p{Gujr}\p{sc=Telu}\p{sc=Mlym}\p{sc=Orya}\p{sc=Beng}\p{sc=Deva}&\p{Indic_Syllabic_Category=Virama}]

    LinkingConsonant=[\p{Gujr}\p{sc=Telu}\p{sc=Mlym}\p{sc=Orya}\p{sc=Beng}\p{sc=Deva}&\p{Indic_Syllabic_Category=Consonant}]

    ExtCccZwj=[[\p{gcb=Extend}-\p{ccc=0}] \p{gcb=ZWJ}]

Note that these categories are not GCB property values; in fact, they overlap the GCB property values.
It is not necessary for the rules to have disjoint categories.
The list of scripts can be extended over time, as test files for them become available.

2. Adding a rule to https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules

    9.3) LinkingConsonant ExtCccZwj* Virama ExtCccZwj* × LinkingConsonant

3. Adding test files supplied by India to org.unicode.cldr.unittest.data.graphemeCluster/*

    TestSegmenter-Bengali.txt
    TestSegmenter-Devanagari.txt
    TestSegmenter-Gujarati.txt
    TestSegmenter-Malayalam.txt
    TestSegmenter-Odia.txt
    TestSegmenter-Telugu.txt

4. Adding modified files in this directory, which can be used in place of the default files from
   https://unicode.org/Public/12.0.0/ucd/auxiliary/

    GraphemeBreakTest.html
    GraphemeBreakTest.txt

Note: The GraphemeBreakProperty.txt file is unmodified, as those properties don't change.
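
Illustration (not part of the CLDR data): the Python sketch below shows how rule 9.3 could be checked at a single candidate boundary. It rests on several assumptions. The helper names (is_linking_consonant, is_virama, is_ext_ccc_zwj, rule_9_3_applies) are hypothetical. LinkingConsonant is reduced to the Devanagari letters KA..HA, and Virama to U+094D, rather than the full six-script sets defined above. ExtCccZwj is approximated as "nonzero canonical combining class, or ZWJ", because Python's unicodedata module does not expose the GCB property. A real implementation would use the CLDR rule data through a library such as ICU.

```python
import unicodedata

ZWJ = "\u200d"     # ZERO WIDTH JOINER
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA


def is_linking_consonant(ch):
    # Assumption: Devanagari KA..HA only; the readme's set covers six scripts.
    return "\u0915" <= ch <= "\u0939"


def is_virama(ch):
    # Assumption: only the Devanagari virama is recognised in this sketch.
    return ch == VIRAMA


def is_ext_ccc_zwj(ch):
    # Approximation of [[\p{gcb=Extend}-\p{ccc=0}] \p{gcb=ZWJ}]:
    # a nonzero combining class stands in for "Extend with ccc != 0".
    return ch == ZWJ or unicodedata.combining(ch) != 0


def rule_9_3_applies(text, i):
    """Return True if rule 9.3 forbids a break between text[i-1] and text[i]."""
    if not (0 < i < len(text)) or not is_linking_consonant(text[i]):
        return False
    j = i - 1
    seen_virama = False
    # Walk left over the run "ExtCccZwj* Virama ExtCccZwj*". The virama itself
    # has a nonzero combining class, so the categories overlap, as the readme
    # notes; the run only needs to contain at least one virama.
    while j >= 0 and (is_virama(text[j]) or is_ext_ccc_zwj(text[j])):
        seen_virama = seen_virama or is_virama(text[j])
        j -= 1
    return seen_virama and j >= 0 and is_linking_consonant(text[j])


if __name__ == "__main__":
    conjunct = "\u0915\u094d\u0937"        # KA + VIRAMA + SSA
    print(rule_9_3_applies(conjunct, 2))   # True: no break before SSA
    print(rule_9_3_applies("\u0915\u0937", 1))  # False: no virama between
```

The function answers only whether rule 9.3 suppresses a break at one position; the remaining TR29 rules still decide every other boundary.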