3 * Copyright (C) 2004-2016, International Business Machines
7 * encoding: US-ASCII
25 ---------------------------------------------------------------------------- ***
44 and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html
50 ---------------------------------------------------------------------------- ***
54 * Command-line environment setup
67 https://unicode-org.atlassian.net/browse/ICU-8966 InPC & InSC
68 https://unicode-org.atlassian.net/browse/ICU-12850 vo
73 - for each of the three new enumerated properties
84 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
86 + It also writes tools/unicode/c/genprops/pnames_data.h with property and value
89 the tool has an --only_ppucd option:
90 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
93 - add new property short names (uppercase) to _prop_and_value_re
97 so that the tools build can pick up the new definitions from the installed header files.
99 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
101 * build Unicode tools using CMake+make
103 $ICU_SRC/tools/unicode/c/icudefs.txt:
105 # Location (--prefix) of where ICU was installed.
111 mkdir -p tools/unicode/c
112 cd tools/unicode/c
114 $ICU_ROOT/dbg/tools/unicode/c$
115 cmake ../../../../../src/tools/unicode/c
119 $ICU_ROOT/dbg/tools/unicode/c$
121 - rebuild ICU (make install) & tools
124 - add genprops/layoutpropsbuilder.cpp with pieces from sibling files
125 - generate new icu4c/source/common/ulayout_props_data.h
126 - for each of the three new enumerated properties
128 + small, 8-bit UCPTrie
129 (A small 16-bit trie with bit fields for these three properties
133 - uprops.cpp: #include ulayout_props_data.h
134 - uprops.cpp: add getInPC() etc. functions
135 - uprops.cpp: add lines to intProps[], include max values
136 - uprops.h: add UPropertySource constants
137 - uprops.cpp: add uprops_addPropertyStarts(src)
138 - uniset_props.cpp: add to UnicodeSet_initInclusion()
139 - intltest/ucdtest.cpp: write unit tests
142 - refresh just the pnames.icu file with the new property [value] names, just to be safe
143 - see $ICU_SRC/icu4c/source/data/icu4j-readme.txt
144 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
145 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
146 - copy the big-endian Unicode data files to another location,
151 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
154 - UCharacterProperty.java: add new SRC_INPC etc. constants as in C++
155 - UCharacterProperty.java: for each new property
161 - UCharacterProperty.java: add ulayout_addPropertyStarts(src, set)
162 - UnicodeSet.java: add to getInclusions()
163 - UCharacterTest.java: write unit tests
165 ---------------------------------------------------------------------------- ***
170 http://unicode.org/versions/beta-11.0.0.html
172 http://www.unicode.org/reports/uax-proposed-updates.html
173 http://www.unicode.org/reports/tr44/tr44-21.html
175 * Command-line environment setup
188 - ticket:13630: Unicode 11
189 - ^/branches/markus/uni11
193 - cldrbug 10978: Unicode 11
194 - ^/branches/markus/uni11
197 - makedata.mak
198 - uchar.h
199 - com.ibm.icu.util.VersionInfo
200 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
202 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
208 - mkdir -p $UNICODE_DATA
209 - download Unicode files into $UNICODE_DATA
213 * for manual diffs and for Unicode Tools input data updates:
219 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
222 the tool has an --only_ppucd option:
223 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
225 - cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
228 so that the tools build can pick up the new definitions from the installed header files.
230 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
233 - fix other errors
235 -> add Extended_Pictographic binary property
236 -> add new short names for all Emoji properties
239 - preparseucd.py error:
260 -> add to uchar.h
263 -> add to UCharacter.UnicodeBlock IDs
264 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
266 -> add to UCharacter.UnicodeBlock objects
267 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
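The Eclipse find pattern above captures three groups from each new block line in uchar.h: the block name, its numeric value, and the trailing comment. The replacement side is not shown in these notes, so the following is only a sketch of the same transformation in Python. The output template (`public static final int ..._ID`) and the sample input line are assumptions made for illustration, not copied from uchar.h or UCharacter.java.

```python
import re

# Same pattern as the Eclipse "find" expression in the notes.
UBLOCK_RE = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")


def to_java_block_id(line):
    """Turn one uchar.h UBLOCK_ line into a Java ID constant line.

    The output template is an assumption for illustration; adjust it to
    whatever UCharacter.UnicodeBlock actually uses.
    """
    match = UBLOCK_RE.search(line)
    if match is None:
        return None
    name, value, comment = match.groups()
    return f"public static final int {name}_ID = {value}; {comment}"


# Hypothetical input line, shaped like an enum entry in uchar.h.
sample = "    UBLOCK_EXAMPLE_BLOCK = 999, /*[ABCD]*/"
print(to_java_block_id(sample))
```

The second Eclipse pattern drops the capture group around the number, which suggests the block objects only need the name and the comment.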
272 -> uchar.h & UCharacter.GraphemeClusterBreak
273 -> these two later removed again: http://www.unicode.org/L2/L2018/18115.htm#155-A76
276 -> ignore: ICU does not yet support this property
280 -> uchar.h & UCharacter.JoiningGroup
289 -> uscript.h & com.ibm.icu.lang.UScript
290 -> Nushu had been added already
291 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
295 -> uchar.h & UCharacter.WordBreak
298 - see UTS #51
299 - short names set in preparseucd.py
302 - boolean emoji property Extended_Pictographic
303 -> added in preparseucd.py
304 -> uchar.h & UProperty.java
305 - misc. property Equivalent_Unified_Ideograph (EqUIdeo)
307 -> ignore for now
311 …$ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $…
316 - make sure that the Unicode Tools tree contains the latest security data files
317 - go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
318 - update the hardcoded version number there in the DIRECTORY path
319 - run the tool (no special environment variables needed)
320 - copy & paste from the Console output into the .cpp & .java files
324 …bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --cs…
325 bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
326 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
327 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
328 bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
331 so that the tools build can pick up the new definitions from the installed header files.
333 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
335 * build Unicode tools using CMake+make
337 $ICU_SRC/tools/unicode/c/icudefs.txt:
339 # Location (--prefix) of where ICU was installed.
345 mkdir -p tools/unicode/c
346 cd tools/unicode/c
348 $ICU_ROOT/dbg/tools/unicode/c$
349 cmake ../../../../src/tools/unicode/c
353 $ICU_ROOT/dbg/tools/unicode/c$
355 genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
356 genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
357 - rebuild ICU (make install) & tools
361 genprops error: failure finalizing the data - U_BUFFER_OVERFLOW_ERROR
362 - With the addition of Georgian Mtavruli capital letters,
365 - Changing the data structure (now formatVersion 4),
366 adding one bit for no-simple-case-folding (for Cherokee), and
371 - Further changes to gain one more bit for the exceptions index,
375 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
376 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
377 - Unicode 6.0..11.0: U+2260, U+226E, U+226F
378 - nothing new in this Unicode version, no test file to update
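The grep step above can be sketched as a small filter. This assumes the IdnaMappingTable.txt layout of semicolon-separated fields (code point or `first..last` range, then status) with `#` comments; it does not handle ICU's uts46.txt, which uses a different syntax. The sample lines are illustrative and only reuse the three code points named in the notes.

```python
def non_ascii_std3_valid(lines):
    """Return (first, last) ranges with status disallowed_STD3_valid
    that reach beyond ASCII (U+007F)."""
    hits = []
    for line in lines:
        data = line.split("#", 1)[0]              # drop the comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if last > 0x7F:
            hits.append((first, last))
    return hits


# Illustrative lines in the assumed file layout.
sample = [
    "0000..002C    ; disallowed_STD3_valid   # ASCII range",
    "00E0          ; valid                   # not relevant",
    "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
]
for first, last in non_ascii_std3_valid(sample):
    print(f"U+{first:04X}..U+{last:04X}")
```

If the result is still only U+2260, U+226E and U+226F, there is nothing new and no test file to update.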
381 - Andy handles RBBI & spoof check test failures
383 - Errors in char.txt, word.txt, word_POSIX.txt like
386 -> Temporary(!) workaround: Add an arbitrary code point to these sets to make them
388 -> Intermediate workaround: Remove $E_Base_GAZ and other now-unused variables
390 -> Andy adjusts the rule sets further to sync with
395 - UCA DUCET goes into Mark's Unicode tools, see
396 https://sites.google.com/site/unicodetools/home#TOC-UCA
399 …s/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/uca/11.0.0/Colla…
400 ~/svn.unitools/trunk$ meld ../frac-10.txt ../frac-11.txt
402 - CLDR root data files are checked into $CLDR_SRC/common/uca/
403 cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
405 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
407 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
408 cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
411 - restore TODO diffs in UCARules.txt
412 meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
413 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
419 - if CLDR common/uca/unihan-index.txt changes, then update
420 CLDR common/collation/root.xml <collation type="private-unihan">
423 - run genuca, see command line above;
425 …Error: Unknown script for first-primary sample character U+1180B on line 28649 of /usr/local/googl…
431 (in case the script sample characters flip-flop)
434 - rebuild ICU4C
438 - run Unicode Tools
441 -ea
442 -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
443 -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
444 -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
445 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
446 -DUVERSION=11.0.0
447 - run Unicode Tools
450 - check CLDR diffs
453 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
454 - copy to CLDR
457 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
458 - run CLDR unit tests, commit to CLDR
459 - generate ICU zh collation data: run CLDR
462 -t collation
463 -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
464 -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
465 -d /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/coll
466 -p /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/xml/collation
469 -ea
470 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
471 - rebuild ICU4C
474 - run all tests with the collation test data *_SHORT.txt or the full files
476 - note on intltest: if collate/UCAConformanceTest fails, then
478 fix the conformance test before looking into the multi-thread test
481 - refresh just the UCD/UCA-related/derived files, just to be safe
482 - see (ICU4C)/source/data/icu4j-readme.txt
483 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
484 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
488 echo timestamp > uni-core-data
489 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt61b
490 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b
492 …tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt61l.dat ./out/icu4j/icudt61b.dat -a .…
494 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt61b/
495 mkdir -p /tmp/icu4j/main/shared/data
497 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt61b/
498 mkdir -p /tmp/icu4j/main/shared/data
501 - copy the big-endian Unicode data files to another location,
505 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
506 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
513 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
516 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
517 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
519 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
527 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
537 - send notice to icu-design about new born-@stable API (enum constants etc.)
540 - look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
546 Unicode 9: http://unicode.org/cldr/trac/ticket/9692
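One way to list the decimal-digit sets is to find every character with gc=Nd and nv=4 and step back four code points to the set's zero digit. The sketch below uses Python's `unicodedata` module, so it reflects whatever Unicode version that Python build bundles (see `unicodedata.unidata_version`), which may be older than the version being integrated. It also assumes that each set of decimal digits occupies ten contiguous code points.

```python
import sys
import unicodedata


def decimal_digit_zeros():
    """Return the code points of the zero digit of every gc=Nd digit set.

    Each set is located through its digit four (nv=4); the zero digit is
    assumed to sit four code points below it.
    """
    zeros = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Nd" and unicodedata.decimal(ch, None) == 4:
            zeros.append(cp - 4)
    return zeros


print("Unicode version of this Python:", unicodedata.unidata_version)
for zero in decimal_digit_zeros():
    print(f"U+{zero:04X}", unicodedata.name(chr(zero), "<unnamed>"))
```

Comparing this list between the old and the new Unicode version shows which digit sets still need numbering-system entries in CLDR.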
549 - do not merge the icudata.jar and testdata.jar,
551 - make sure that changes to Unicode tools are checked in:
554 ---------------------------------------------------------------------------- ***
559 http://www.unicode.org/versions/beta-10.0.0.html
560 http://blog.unicode.org/2017/03/unicode-100-beta-review.html
562 http://www.unicode.org/reports/uax-proposed-updates.html
563 http://www.unicode.org/reports/tr44/tr44-19.html
565 * Command-line environment setup
578 - ticket:12985: Unicode 10
579 - ticket:13061: undo hacks from emoji 5.0 update
580 - ticket:13062: add Emoji_Component property
581 - ^/branches/markus/uni10
585 - cldrbug 10055: Unicode 10
586 - cldrbug 9882: Unicode 10 script metadata
587 - cldrbug 10219: numbering systems for Unicode 10
590 - makedata.mak
591 - uchar.h
592 - com.ibm.icu.util.VersionInfo
593 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
595 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
601 - mkdir -p $UNICODE_DATA
602 - download Unicode 10.0 files into $UNICODE_DATA
605 - download emoji 5.0 files into $UNICODE_DATA/emoji
612 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
615 the tool has an --only_ppucd option:
616 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
618 - cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
621 so that the tools build can pick up the new definitions from the installed header files.
623 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
626 - remove or add new Unicode scripts from/to the
627 only-in-ISO-15924 list according to the error messages:
629 -> adjust _scripts_only_in_iso15924 as indicated
630 - fix other errors
632 -> add vo=Vertical_Orientation to _ignored_properties
633 -> later removed again, parsing the file, even though we do not yet store data for runtime use
636 - preparseucd.py error:
652 -> add to uchar.h
655 -> add to UCharacter.UnicodeBlock IDs
656 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
658 -> add to UCharacter.UnicodeBlock objects
659 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
673 -> uchar.h & UCharacter.JoiningGroup
679 -> uscript.h & com.ibm.icu.lang.UScript
680 -> Nushu had been added already
681 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
685 - boolean Emoji_Component from emoji 5
686 -> uchar.h & UProperty.java
687 - boolean
692 -> uchar.h & UProperty.java
693 -> single immutable range, to be hardcoded
694 - boolean
699 -> was new in Unicode 9
700 -> uchar.h & UProperty.java
701 - enumerated
708 -> only pre-parsed for now, but not yet stored for runtime use
712 …$ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $…
716 …bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --cs…
717 bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
718 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
719 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
720 bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
723 so that the tools build can pick up the new definitions from the installed header files.
725 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
727 * build Unicode tools using CMake+make
729 $ICU_SRC/tools/unicode/c/icudefs.txt:
731 # Location (--prefix) of where ICU was installed.
736 $ICU_ROOT/dbg/tools/unicode/c$
737 cmake ../../../../src/tools/unicode/c
741 $ICU_ROOT/dbg/tools/unicode/c$
743 genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
744 genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
745 - rebuild ICU (make install) & tools
748 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
749 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
750 - Unicode 6.0..10.0: U+2260, U+226E, U+226F
751 - nothing new in this Unicode version, no test file to update
754 - Andy handles RBBI & spoof check test failures
758 - UCA DUCET goes into Mark's Unicode tools, see
759 https://sites.google.com/site/unicodetools/home#TOC-UCA
760 - CLDR root data files are checked into $CLDR_SRC/common/uca/
761 cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
763 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
765 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
766 cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
769 - restore TODO diffs in UCARules.txt
770 meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
771 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
777 - if CLDR common/uca/unihan-index.txt changes, then update
778 CLDR common/collation/root.xml <collation type="private-unihan">
781 - run genuca, see command line above;
783 …Error: Unknown script for first-primary sample character U+11D10 on line 28117 of /usr/local/googl…
789 (in case the script sample characters flip-flop)
792 - rebuild ICU4C
796 - run Unicode Tools
799 -ea
800 -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
801 -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
802 -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
803 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
804 -DUVERSION=10.0.0
805 - run Unicode Tools
808 - check CLDR diffs
811 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
812 - copy to CLDR
815 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
816 - run CLDR unit tests, commit to CLDR
817 - generate ICU zh collation data: run CLDR
820 -t collation
821 -s /usr/local/google/home/mscherer/svn.cldr/uni10/common/collation
822 -m /usr/local/google/home/mscherer/svn.cldr/uni10/common/supplemental
823 -d /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/coll
824 -p /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/xml/collation
827 -ea
828 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
829 - rebuild ICU4C
832 - run all tests with the collation test data *_SHORT.txt or the full files
834 - note on intltest: if collate/UCAConformanceTest fails, then
836 fix the conformance test before looking into the multi-thread test
839 - refresh just the UCD/UCA-related/derived files, just to be safe
840 - see (ICU4C)/source/data/icu4j-readme.txt
841 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
842 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
846 echo timestamp > uni-core-data
847 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt60b
848 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b
850 …tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt60l.dat ./out/icu4j/icudt60b.dat -a .…
852 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt60b/
853 mkdir -p /tmp/icu4j/main/shared/data
855 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt60b/
856 mkdir -p /tmp/icu4j/main/shared/data
859 - copy the big-endian Unicode data files to another location,
863 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
864 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
871 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
874 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
875 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
877 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
885 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
895 - send notice to icu-design about new born-@stable API (enum constants etc.)
898 - look for new sets of decimal digits (gc=Nd & nv=4) and submit a CLDR ticket
900 Unicode 9: http://unicode.org/cldr/trac/ticket/9692
903 - do not merge the icudata.jar and testdata.jar,
905 - make sure that changes to Unicode tools are checked in:
908 ---------------------------------------------------------------------------- ***
911 - ICU 59 mostly remains on Unicode 9.0
912 - except that it updates bidi and segmentation data to Unicode 10 beta
914 First run of tools on combined icu4c/icu4j/tools trunk after svn repository reorg.
916 * Command-line environment setup
928 - ticket:12900: take Emoji 5.0 properties data into ICU 59 once it's released
929 - changes directly on trunk
935 - download Unicode 9.0 files into a uni90e50 folder: ucd, idna, security (skip uca)
936 - download emoji 5.0 beta files into the same uni90e50 folder
937 - download Unicode 10.0 beta files: ucd
949 - adjust for combined trunks
950 - write new copyright lines
951 - ignore new Emoji_Component property for now
954 - ~/svn.icu/trunk/src/tools/unicode$ py/preparseucd.py ~/unidata/uni90e50/20170322 $ICU_SRC_DIR
957 - cp ~/unidata/uni90e50/20170322/security/confusables.txt $UNIDATA
960 so that the tools build can pick up the new definitions from the installed header files.
962 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
964 * build Unicode tools using CMake+make
966 ~/svn.icu/trunk/src/tools/unicode/c/icudefs.txt:
968 # Location (--prefix) of where ICU was installed.
973 ~/svn.icu/trunk/dbg/tools/unicode/c$
974 cmake ../../../../src/tools/unicode/c
978 ~/svn.icu/trunk/dbg/tools/unicode/c$
980 - rebuild ICU (make install) & tools
983 - Andy handles RBBI & spoof check test failures
986 - refresh just the UCD/UCA-related/derived files, just to be safe
987 - see (ICU4C)/source/data/icu4j-readme.txt
988 - mkdir /tmp/icu4j
989 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
993 echo timestamp > uni-core-data
994 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt59b
995 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b
997 …tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt59l.dat ./out/icu4j/icudt59b.dat -a .…
999 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt59b/
1000 mkdir -p /tmp/icu4j/main/shared/data
1002 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt59b/
1003 mkdir -p /tmp/icu4j/main/shared/data
1006 - copy the big-endian Unicode data files to another location,
1010 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1015 …jar uvf ~/svn.icu/trunk/src/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data…
1018 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1019 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu/trunk/src/icu4j/main/shared/data
1021 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=~/svn.icu/trunk/src/icu4j icu4j-data-install
1024 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1033 ---------------------------------------------------------------------------- ***
1037 * Command-line environment setup
1046 http://www.unicode.org/review/pri323/ -- beta review
1047 http://www.unicode.org/reports/uax-proposed-updates.html
1048 http://www.unicode.org/versions/beta-9.0.0.html
1050 http://www.unicode.org/reports/tr44/tr44-17.html
1054 - ticket:12526: integrate Unicode 9
1055 - C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
1056 - Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
1060 - cldrbug 9414: UCA 9
1061 - ^/branches/markus/uni90 at r11518 from trunk at r11517
1063 - cldrbug 8745: Unicode 9.0 script metadata
1066 - makedata.mak
1067 - uchar.h
1068 - com.ibm.icu.util.VersionInfo
1069 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1071 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1078 - download UCD & IDNA files
1079 - make sure that the Unicode data folder passed into preparseucd.py
1081 - only for manual diffs: remove version suffixes from the file names
1084 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1085 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.i…
1086 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1088 - also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
1093 - remove or add new Unicode scripts from/to the
1094 only-in-ISO-15924 list according to the error messages:
1099 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1101 - DerivedNumericValues.txt new numeric values
1102 0D58 ; 0.00625 ; ; 1/160 # No MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
1107 -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
1110 - adjust preparseucd.py for Tangut algorithmic names
1112 algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
1113 ->
1114 algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
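The corrected `algnamesrange` line describes an algorithmic name: a fixed prefix followed by the code point in hexadecimal. The sketch below shows how such a line could be interpreted. The parsing helper and the reading of the `han` type as "prefix plus uppercase hex code point" are assumptions for illustration, not code taken from preparseucd.py or genprops.

```python
def parse_algnamesrange(line):
    """Split an 'algnamesrange;start..end;type;prefix' line into its parts."""
    keyword, cp_range, kind, prefix = line.split(";")
    if keyword != "algnamesrange":
        raise ValueError(f"not an algnamesrange line: {line!r}")
    start, end = (int(part, 16) for part in cp_range.split(".."))
    return start, end, kind, prefix


def algorithmic_name(line, cp):
    """Return the algorithmic name for cp, or None if cp is out of range.

    Assumes that type 'han' means prefix + uppercase hex code point.
    """
    start, end, kind, prefix = parse_algnamesrange(line)
    if kind != "han" or not start <= cp <= end:
        return None
    return f"{prefix}{cp:04X}"


tangut = "algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-"
print(algorithmic_name(tangut, 0x17000))
```

With the uncorrected line, the same code point would have been named with the CJK UNIFIED IDEOGRAPH- prefix, which is the error the adjustment removes.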
1115 - avoid block-compressing most String/Miscellaneous property values,
1116 triggered by genprops not coping with a multi-code point Case_Folding on
1118 keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
1121 - 1 new property PCM=Prepended_Concatenation_Mark
1137 -> add to uchar.h
1139 -> add to UCharacter.UnicodeBlock IDs
1140 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1142 -> add to UCharacter.UnicodeBlock objects
1143 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1151 -> uchar.h & UCharacter.GraphemeClusterBreak
1156 -> uchar.h & UCharacter.JoiningGroup
1161 -> uchar.h & UCharacter.LineBreak
1169 -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
1176 -> uchar.h & UCharacter.WordBreak
1184 bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
1185 bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
1186 bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
1187 bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1188 bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
1191 so that the tools build can pick up the new definitions from the installed header files.
1193 $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
1195 * build Unicode tools using CMake+make
1199 # Location (--prefix) of where ICU was installed.
1211 genuca/genuca --hanOrder implicit $ICU_SRC_DIR
1212 genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
1213 - rebuild ICU (make install) & tools
1216 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1217 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1218 - Unicode 6.0..9.0: U+2260, U+226E, U+226F
1219 - nothing new in 9.0, no test file to update
1222 - Andy handles RBBI & spoof check test failures
1226 - UCA DUCET goes into Mark's Unicode tools, see
1227 https://sites.google.com/site/unicodetools/home#TOC-UCA
1228 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
1231 - cd (CLDR UCA branch)/common/uca/
1232 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1234 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1235 cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
1238 - restore TODO diffs in UCARules.txt
1239 meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
1240 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1246 - if CLDR common/uca/unihan-index.txt changes, then update
1247 CLDR common/collation/root.xml <collation type="private-unihan">
1250 - run genuca, see command line above;
1252 …Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/s…
1258 (in case the script sample characters flip-flop)
1261 - rebuild ICU4C
1264 - run Unicode Tools
1267 -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
1268 -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
1269 -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
1270 -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
1271 -DUVERSION=9.0.0
1272 -ea
1273 - run Unicode Tools
1276 - check CLDR diffs
1279 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1280 - copy to CLDR
1283 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1284 - commit to CLDR
1285 - generate ICU zh collation data: run CLDR
1288 -t collation
1289 -s /home/mscherer/svn.cldr/trunk/common/collation
1290 -m /home/mscherer/svn.cldr/trunk/common/supplemental
1291 -d /home/mscherer/svn.icu/trunk/src/source/data/coll
1292 -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
1295 -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
1296 - rebuild ICU4C
1299 - run all tests with the collation test data *_SHORT.txt or the full files
1301 - note on intltest: if collate/UCAConformanceTest fails, then
1303 fix the conformance test before looking into the multi-thread test
1306 - refresh just the UCD/UCA-related/derived files, just to be safe
1307 - see (ICU4C)/source/data/icu4j-readme.txt
1308 - mkdir /tmp/icu4j
1309 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1313 echo timestamp > uni-core-data
1314 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
1315 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
1317 …tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a .…
1319 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
1320 mkdir -p /tmp/icu4j/main/shared/data
1322 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
1323 mkdir -p /tmp/icu4j/main/shared/data
1326 - copy the big-endian Unicode data files to another location,
1330 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1331 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1338 …jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$IC…
1341 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1342 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1344 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1352 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1363 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1369 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
1379 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1380 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
1381 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
1384 - send notice to icu-design about new born-@stable API (enum constants etc.)
1387 - do not merge the icudata.jar and testdata.jar,
1389 - make sure that changes to Unicode tools & ICU tools are checked in
1391 http://bugs.icu-project.org/trac/log/tools/trunk
1393 ---------------------------------------------------------------------------- ***
1395 New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764
1398 - new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
1399 - new combination/alias codes: Hanb, Jamo
1400 - used in CLDR 29 and in spoof checker
1401 - new Z* code: Zsye
1404 -> com.ibm.icu.lang.UScript
1405 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1410 "Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
1412 Note: If we have to run preparseucd.py again before the Unicode 9 update,
1423 see http://bugs.icu-project.org/trac/ticket/12141
1432 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1438 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
1448 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1449 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
1450 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
1452 ---------------------------------------------------------------------------- ***
1454 Emoji properties added in ICU 57: http://bugs.icu-project.org/trac/ticket/11802
1459 Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
1469 Add binary-property constants to uchar.h enum UProperty & UProperty.java.
1481 ---------------------------------------------------------------------------- ***
1485 * Command-line environment setup
1494 http://www.unicode.org/review/pri297/ -- beta review
1495 http://www.unicode.org/reports/uax-proposed-updates.html
1496 http://unicode.org/versions/beta-8.0.0.html
1498 http://www.unicode.org/reports/tr44/tr44-15.html
1502 - ticket:11574: Unicode 8
1503 - C++ branches/markus/uni80 at r37351 from trunk at r37343
1504 - Java branches/markus/uni80 at r37352 from trunk at r37338
1508 - cldrbug 8311: UCA 8
1509 - branches/markus/uni80 at r11518 from trunk at r11517
1511 - cldrbug 8109: Unicode 8.0 script metadata
1512 - cldrbug 8418: Updated segmentation for Unicode 8.0
1515 - makedata.mak
1516 - uchar.h
1517 - com.ibm.icu.util.VersionInfo
1518 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1520 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1527 - download UCD & IDNA files
1528 - make sure that the Unicode data folder passed into preparseucd.py
1530 - only for manual diffs: remove version suffixes from the file names
1533 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1534 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.i…
1535 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1537 - also: from http://unicode.org/Public/security/8.0.0/ download new
1544 - remove new Unicode scripts from the
1545 only-in-ISO-15924 list according to the error message:
1548 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1550 - property and file name change:
1551 IndicMatraCategory -> IndicPositionalCategory
1552 - UnicodeData.txt unusual numeric values (fractions not in lowest terms)
1561 109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
1563 -> change preparseucd.py to map them to fractions in lowest terms (e.g., 1/6)
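A minimal sketch of the numeric-value mapping described in the item above. This is not the actual preparseucd.py code. The function name normalize_numeric_value is hypothetical, and the only assumption about the input is the standard UnicodeData.txt layout, where the numeric value is the ninth semicolon-separated field (index 8).

```python
from fractions import Fraction

def normalize_numeric_value(nv: str) -> str:
    """Return nv with any fraction reduced to lowest terms (hypothetical helper)."""
    if "/" not in nv:
        return nv  # integers and empty fields pass through unchanged
    value = Fraction(nv)  # Fraction normalizes on construction, e.g. 9/12 becomes 3/4
    if value.denominator == 1:
        return str(value.numerator)
    return f"{value.numerator}/{value.denominator}"

# The UnicodeData.txt line quoted above; field index 8 holds the numeric value.
sample = "109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;"
fields = sample.split(";")
fields[8] = normalize_numeric_value(fields[8])
normalized_line = ";".join(fields)
```

Using fractions.Fraction avoids hand-written gcd code and leaves values that are already in lowest terms untouched.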
1568 - 10 new Block (blk) values:
1579 -> add to uchar.h
1581 -> add to UCharacter.UnicodeBlock IDs
1582 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1584 -> add to UCharacter.UnicodeBlock objects
1585 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1587 - 6 new Script (sc) values:
1594 -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
1602 bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
1603 bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
1604 bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
1605 bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1606 bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
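The four .nrm commands above differ only in the output name and the list of input files, which a small table makes explicit. The sketch below only builds the argument lists and does not run anything. The names NORM2_OUTPUTS and gennorm2_commands are hypothetical, and only the -o and -s options shown in the commands above are used.

```python
# Output file -> input files, copied from the commands listed above.
NORM2_OUTPUTS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}

def gennorm2_commands(src_data_in: str, unidata: str) -> list:
    """Build one argv list per .nrm file (hypothetical helper, nothing is executed)."""
    commands = []
    for output, inputs in NORM2_OUTPUTS.items():
        commands.append(
            ["bin/gennorm2", "-o", f"{src_data_in}/{output}",
             "-s", f"{unidata}/norm2", *inputs]
        )
    return commands

# Placeholder paths standing in for $SRC_DATA_IN and $UNIDATA.
commands = gennorm2_commands("/src/data/in", "/src/data/unidata")
```

Every output starts from nfc.txt. The nfkc_cf data additionally builds on nfkc.txt, while uts46 is layered directly on nfc.txt.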
1609 so that the tools build can pick up the new definitions from the installed header files.
1611 $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
1613 * build Unicode tools using CMake+make
1617 # Location (--prefix) of where ICU was installed.
1626 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
1627 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
1628 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
1629 - rebuild ICU (make install) & tools
1630 - run genuca again (see step above) so that it picks up the new nfc.nrm
1631 - rebuild ICU (make install) & tools
1634 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1635 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1636 - Unicode 6.0..8.0: U+2260, U+226E, U+226F
1637 - nothing new in 8.0, no test file to update
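The grep step above can be sketched as a small filter. It assumes the usual IdnaMappingTable.txt layout: a code point or range, then ";", then the status, with "#" starting a comment. The function name non_ascii_std3_valid and the two sample lines are illustrative, not copied from a specific data file version.

```python
def non_ascii_std3_valid(lines):
    """Return code points/ranges with status disallowed_STD3_valid that reach beyond ASCII."""
    hits = []
    for line in lines:
        data = line.split("#", 1)[0]  # drop the trailing comment
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        last = fields[0].split("..")[-1]  # end of a range, or the single code point
        if int(last, 16) > 0x7F:
            hits.append(fields[0])
    return hits

# Illustrative sample lines in the assumed format.
sample_lines = [
    "0000..002C    ; disallowed_STD3_valid                  # 1.1  <control-0000>..COMMA",
    "2260          ; disallowed_STD3_valid                  # 1.1  NOT EQUAL TO",
    "00E0          ; valid                                  # 1.1  LATIN SMALL LETTER A WITH GRAVE",
]
hits = non_ascii_std3_valid(sample_lines)
```

If the result matches the known set (U+2260, U+226E, U+226F), there is nothing new and no test file to update.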
1640 - bad Cherokee case folding due to difference in fallbacks:
1645 - Andy handles RBBI & spoof check test failures
1649 - UCA DUCET goes into Mark's Unicode tools, see
1650 https://sites.google.com/site/unicodetools/home#TOC-UCA
1651 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
1652 - cd (CLDR UCA branch)/common/uca/
1653 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1655 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1656 cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
1659 - restore TODO diffs in UCARules.txt
1660 meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
1661 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1667 - if CLDR common/uca/unihan-index.txt changes, then update
1668 CLDR common/collation/root.xml <collation type="private-unihan">
1670 - run genuca, see command line above;
1672 …Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/sv…
1677 (in case the script sample characters flip-flop)
1680 - rebuild ICU4C
1683 - run all tests with the collation test data *_SHORT.txt or the full files
1685 - note on intltest: if collate/UCAConformanceTest fails, then
1687 fix the conformance test before looking into the multi-thread test
1688 - fixed bug in CollationWeights::getWeightRanges()
1692 - refresh just the UCD/UCA-related/derived files, just to be safe
1693 - see (ICU4C)/source/data/icu4j-readme.txt
1694 - mkdir /tmp/icu4j
1695 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1699 echo timestamp > uni-core-data
1700 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
1701 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
1703 …tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a .…
1705 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
1706 mkdir -p /tmp/icu4j/main/shared/data
1708 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
1709 mkdir -p /tmp/icu4j/main/shared/data
1712 - copy the big-endian Unicode data files to another location,
1716 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1717 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1724 …jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICU…
1727 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1728 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1730 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1738 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1754 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1760 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
1770 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1771 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
1772 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
1775 - send notice to icu-design about new born-@stable API (enum constants etc.)
1778 - do not merge the icudata.jar and testdata.jar,
1780 - make sure that changes to Unicode tools & ICU tools are checked in
1782 http://bugs.icu-project.org/trac/log/tools/trunk
1784 ---------------------------------------------------------------------------- ***
1788 http://www.unicode.org/review/pri271/ -- beta review
1789 http://www.unicode.org/reports/uax-proposed-updates.html
1790 http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
1791 http://www.unicode.org/reports/tr44/tr44-13.html
1795 - ticket 10821: Unicode 7.0, UCA 7.0
1796 - C++ branches/markus/uni70 at r35584 from trunk at r35580
1797 - Java branches/markus/uni70 at r35587 from trunk at r35545
1801 - ticket 7195: UCA 7.0 CLDR root collation
1802 - branches/markus/uni70 at r10062 from trunk at r10061
1804 - ticket 6762: script metadata for Unicode 7.0 new scripts
1807 - makedata.mak
1808 - uchar.h
1809 - com.ibm.icu.util.VersionInfo
1810 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1812 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1819 - download UCD & IDNA files
1820 - make sure that the Unicode data folder passed into preparseucd.py
1822 - only for manual diffs: remove version suffixes from the file names
1825 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1826 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.i…
1827 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1828 - Restore TODO diffs in source/data/unidata/UCARules.txt
1831 - Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
1833 - also: from http://unicode.org/Public/security/7.0.0/ download new
1838 - remove new Unicode scripts from the
1839 only-in-ISO-15924 list according to the error message:
1844 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1846 - NamesList.txt now has a heading with a non-ASCII character
1848 + escape non-ASCII characters in heading comments
1849 - gets Unicode copyright line from PropertyAliases.txt which is currently still at 2013
1853 - 32 new Block (blk) values:
1886 -> add to uchar.h
1888 -> add to UCharacter.UnicodeBlock IDs
1889 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1891 -> add to UCharacter.UnicodeBlock objects
1892 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1894 - 28 new Joining_Group (jg) values:
1923 -> uchar.h & UCharacter.JoiningGroup
1924 - 23 new Script (sc) values:
1948 -> uscript.h (many were added before)
1951 -> com.ibm.icu.lang.UScript
1952 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1954 - 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1955 (added 2012-11-01)
1959 (added 2013-10-12)
1963 -> uscript.h (some overlap with additions from Unicode)
1964 -> com.ibm.icu.lang.UScript
1965 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1967 -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
1968 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
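The Eclipse find patterns above can be tried out in Python before running them over the headers. The find patterns are copied from the notes. The replacement strings and the sample enum lines are assumptions for illustration only, since the notes do not show the replace side at this point.

```python
import re

# Find patterns as given in the notes above.
UBLOCK_RE = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
USCRIPT_RE = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")

def ublock_to_java(line: str) -> str:
    """Rewrite a C UBLOCK_ enum line as a Java int constant (assumed output shape)."""
    return UBLOCK_RE.sub(r"public static final int \1_ID = \2; \3", line.strip())

def uscript_to_java(line: str) -> str:
    """Rewrite a C USCRIPT_ enum line as a Java int constant (assumed output shape)."""
    return USCRIPT_RE.sub(r"public static final int \1 = \2;\3", line.strip())

# Illustrative input lines in the style of uchar.h and uscript.h.
java_block = ublock_to_java("    UBLOCK_AHOM = 253, /*[11700]*/")
java_script = uscript_to_java("      USCRIPT_AHOM                          = 161,/* Ahom */")
```

The " *=" in the script pattern absorbs the column padding before the equals sign, which is why it differs from the block pattern.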
1976 - cd $ICU_ROOT/dbg
1977 - export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
1978 - SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
1979 - UNIDATA=$ICU_SRC_DIR/source/data/unidata
1980 - bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
1981 - bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
1982 - bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
1983 - bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1984 - bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
1987 so that the tools build can pick up the new definitions from the installed header files.
1989 ~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
1991 * build Unicode tools using CMake+make
1995 # Location (--prefix) of where ICU was installed.
2004 - new code point range for Joining_Group values: 10AC0..10AFF Manichaean
2011 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
2012 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
2013 - rebuild ICU (make install) & tools
2014 - run genuca again (see step above) so that it picks up the new nfc.nrm
2015 - rebuild ICU (make install) & tools
2018 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2019 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2020 - Unicode 6.0..7.0: U+2260, U+226E, U+226F
2021 - nothing new in 7.0, no test file to update
2026 - refresh just the UCD-related files, just to be safe
2027 - see (ICU4C)/source/data/icu4j-readme.txt
2028 - mkdir /tmp/icu4j
2029 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2033 echo timestamp > uni-core-data
2034 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
2035 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
2037 …tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a .…
2039 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
2040 mkdir -p /tmp/icu4j/main/shared/data
2042 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
2043 mkdir -p /tmp/icu4j/main/shared/data
2046 - copy the big-endian Unicode data files to another location,
2049 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2050 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2058 - refresh ICU4J
2059 …~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /…
2067 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2076 - download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
2077 - run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
2078 - update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
2079 - run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
2080 - output files are in ~/svn.unitools/Generated/uca/7.0.0/
2081 - review data; compare files, use blankweights.sed or similar
2082 …~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.tx…
2083 - cd ~/svn.unitools/Generated/uca/7.0.0/
2084 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2086 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2089 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
2091 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
2095 - run genuca, see command line above
2096 - rebuild ICU4C
2097 - refresh ICU4J collation data:
2100 ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2101 ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2103 …~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /…
2104 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for deb…
2105 - note on intltest: if collate/UCAConformanceTest fails, then
2107 fix the conformance test before looking into the multi-thread test
2108 - copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
2109 - copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
2113 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2114 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2116 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2124 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2130 ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
2134 - diff current <icu>/source/layout files vs. generated ones
2135 … ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2138 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
2139 - if you just copy the above files, then
2141 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
2144 - send notice to icu-design about new born-@stable API (enum constants etc.)
2147 - do not merge the icudata.jar and testdata.jar,
2150 ---------------------------------------------------------------------------- ***
2154 http://www.unicode.org/review/pri249/ -- beta review
2155 http://www.unicode.org/reports/uax-proposed-updates.html
2156 http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
2157 http://www.unicode.org/reports/tr44/tr44-11.html
2161 - ticket 10128: update ICU to Unicode 6.3 beta
2162 - ticket 10168: update ICU to Unicode 6.3 final
2163 - C++ branches/markus/uni63 at r33552 from trunk at r33551
2164 - Java branches/markus/uni63 at r33550 from trunk at r33553
2166 - ticket 10142: implement Unicode 6.3 bidi algorithm additions
2169 - makedata.mak
2170 - uchar.h
2172 - com.ibm.icu.util.VersionInfo
2173 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2175 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
2182 - download UCD, UCA & IDNA files
2183 - make sure that the Unicode data folder passed into preparseucd.py
2185 - modify preparseucd.py:
2188 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src …
2189 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2190 - Check test file diffs for previously commented-out, known-failing data lines;
2194 - 1 new Enumerated Property
2196 -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
2197 -> ubidi_props.h & .c & UBiDiProps.java
2198 -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
2199 -> uprops.cpp
2200 -> change ubidi.icu format version from 2.0 to 2.1
2201 - 1 new Miscellaneous Property
2203 -> uchar.h & UProperty.java
2204 -> ppucd.h & .cpp
2207 - 3 Bidi_Paired_Bracket_Type (bpt) values:
2211 -> uchar.h & UCharacter.BidiPairedBracketType
2212 -> ubidi_props.h & .c & UBiDiProps.java
2213 -> change ubidi.icu format version from 2.0 to 2.1
2214 - 4 new Bidi_Class (bc) values:
2219 -> uchar.h & UCharacterEnums.ECharacterDirection
2220 -> until the bidi code gets updated,
2222 - 3 new Word_Break (WB) values:
2226 -> uchar.h & UCharacter.WordBreak
2227 -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
2228 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2229 (added 2012-10-16)
2232 -> uscript.h
2233 -> com.ibm.icu.lang.UScript
2234 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2236 -> preparseucd.py _scripts_only_in_iso15924
2237 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2239 -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
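The Word_Break note above (17 values no longer fit in 4 bits) follows from a one-line calculation: n values numbered from 0 need enough bits to hold n-1. A small sketch, with bits_needed as a hypothetical helper name:

```python
def bits_needed(num_values: int) -> int:
    """Number of bits required to store constants 0..num_values-1."""
    if num_values < 1:
        raise ValueError("need at least one value")
    return max(1, (num_values - 1).bit_length())

bits_for_16 = bits_needed(16)  # largest constant is 15 (binary 1111)
bits_for_17 = bits_needed(17)  # largest constant is 16 (binary 10000)
```

With 16 values the largest constant is 15 and fits in 4 bits. The 17th value pushes the maximum to 16, so a 5-bit field is required.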
2243 - ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
2244 - ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
2245 - ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
2246 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
2247 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
2248 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt …
2249 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
2252 so that the tools build can pick up the new definitions from the installed header files.
2254 ~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
2256 * build Unicode tools using CMake+make
2260 # Location (--prefix) of where ICU was installed.
2269 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
2270 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l …
2271 - rebuild ICU (make install) & tools
2272 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
2273 - rebuild ICU (make install) & tools
2276 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2277 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2278 - Unicode 6.0..6.3: U+2260, U+226E, U+226F
2279 - nothing new in 6.3, no test file to update
2282 - refresh just the UCD-related files, just to be safe
2283 - see (ICU4C)/source/data/icu4j-readme.txt
2284 - mkdir /tmp/icu4j
2285 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2289 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
2290 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
2292 …tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a .…
2294 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
2295 mkdir -p /tmp/icu4j/main/shared/data
2297 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
2298 mkdir -p /tmp/icu4j/main/shared/data
2301 - copy the big-endian Unicode data files to another location,
2303 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
2304 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
2310 - refresh ICU4J
2311 …~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /…
2314 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2316 * UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
2318 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
2319 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
2320 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2321 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2323 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
2325 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
2326 - check test file diffs for previously commented-out, known-failing data lines;
2328 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
2329 - run genuca, see command line above
2330 - rebuild ICU4C
2331 - refresh ICU4J collation data:
2333 ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2334 ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
2336 …~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /…
2337 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for deb…
2338 - note on intltest: if collate/UCAConformanceTest fails, then
2340 fix the conformance test before looking into the multi-thread test
2345 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2346 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2348 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2351 - skipped for Unicode 6.3: no new scripts
2354 - do not merge the icudata.jar and testdata.jar,
2357 ---------------------------------------------------------------------------- ***
2362 http://www.unicode.org/versions/beta-6.2.0.html
2363 http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
2367 http://www.unicode.org/reports/tr46/tr46-8.html IDNA
2372 - ticket 9515: Unicode 6.2: final ICU update
2374 - ticket 9514: UCA 6.2: fix UCARules.txt
2376 - ticket 9437: update ICU to Unicode 6.2
2377 - C++ branches/markus/uni62 at r32050 from trunk at r32041
2378 - Java branches/markus/uni62 at r32068 from trunk at r32066
2381 - makedata.mak
2382 - uchar.h
2384 - com.ibm.icu.util.VersionInfo
2385 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2391 - download UCD, UCA & IDNA files
2392 - make sure that the Unicode data folder passed into preparseucd.py
2394 - modify preparseucd.py: NamesList.txt is now in UTF-8
2395 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.i…
2396 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2397 - Check test file diffs for previously commented-out, known-failing data lines;
2401 - 1 new Line_Break (lb) value:
2403 -> uchar.h & UCharacter.LineBreak
2404 - 1 new Word_Break (WB) value:
2406 -> uchar.h & UCharacter.WordBreak
2407 - 1 new Grapheme_Cluster_Break (GCB) value:
2409 -> uchar.h & UCharacter.GraphemeClusterBreak
2412 The new value -1, which was really supposed to be NaN but that would have required
2413 new UnicodeData.txt syntax, can already be represented as a "fraction" of -1/1,
2415 cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
2416 cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
2420 -> uprops.h, uchar.c & UCharacterProperty.java
2421 -> cucdtst.c & UCharacterTest.java
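A minimal sketch of reading one of the ppucd.txt lines quoted above and interpreting nv=-1 as the fraction -1/1. The parser is deliberately simplified and is not the ICU implementation. It assumes semicolon-separated fields, key=value pairs for valued properties, and bare or "-"-prefixed names for binary properties. The name parse_ppucd_line is hypothetical.

```python
from fractions import Fraction

def parse_ppucd_line(line: str):
    """Split a ppucd line into (line type, code point or range, properties)."""
    fields = line.strip().split(";")
    line_type, target = fields[0], fields[1]
    props = {}
    for field in fields[2:]:
        if "=" in field:
            key, value = field.split("=", 1)  # split once, so nv=-1 keeps its sign
            props[key] = value
        elif field.startswith("-"):
            props[field[1:]] = False  # assumed: "-Name" marks a false binary property
        elif field:
            props[field] = True
    return line_type, target, props

line_type, code_point, props = parse_ppucd_line(
    "cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1"
)
numeric_value = Fraction(props["nv"])  # -1 is representable as the "fraction" -1/1
```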
2424 - ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
2425 - ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
2426 - ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
2427 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
2428 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
2429 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt …
2430 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
2433 so that the tools build can pick up the new definitions from the installed header files.
2434 * build Unicode tools using CMake+make
2437 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
2438 - in initial bootstrapping, change the UCA version
2440 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l…
2441 - rebuild ICU (make install) & tools
2445 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
2446 - rebuild ICU (make install) & tools
2449 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2450 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2451 - Unicode 6.0..6.2: U+2260, U+226E, U+226F
2452 - nothing new in 6.2, no test file to update
2455 - refresh just the UCD-related files, just to be safe
2456 - see (ICU4C)/source/data/icu4j-readme.txt
2457 - mkdir /tmp/icu4j
2458 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2462 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
2463 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
2465 …tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a .…
2467 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
2468 mkdir -p /tmp/icu4j/main/shared/data
2470 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
2471 mkdir -p /tmp/icu4j/main/shared/data
2474 - copy the big-endian Unicode data files to another location,
2476 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
2477 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
2483 - refresh ICU4J
2484 …~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /…
2487 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2491 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
2492 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
2493 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2494 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2496 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
2498 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
2499 - check test file diffs for previously commented-out, known-failing data lines;
2501 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
2502 - run genuca, see command line above
2503 - rebuild ICU4C
2504 - refresh ICU4J collation data:
2506 ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2507 ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
2509 …~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /…
2510 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for deb…
2511 - note on intltest: if collate/UCAConformanceTest fails, then
2513 fix the conformance test before looking into the multi-thread test
2518 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2519 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2521 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2524 - skipped for Unicode 6.2: no new scripts
2527 - do not merge the icudata.jar and testdata.jar,
2530 ---------------------------------------------------------------------------- ***
2534 Tools simplified since the Unicode 6.1 update. See
2535 - http://site.icu-project.org/design/props/ppucd
2536 - http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
2539 - icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
2542 - ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
2543 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.i…
2544 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2545 - Check test file diffs for previously commented-out, known-failing data lines;
2549 - Script codes that are in ISO 15924 but not in Unicode are now listed in
2555 - No more manual changes for CJK ranges for algorithmic names;
2559 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
2562 - it is now generated by preparseucd.py
2565 - it is now generated by preparseucd.py
2566 - make sure that the Unicode data folder passed into preparseucd.py
2571 - ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
2572 - ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
2573 - ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
2574 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
2575 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
2576 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt …
2577 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
2580 * build Unicode tools using CMake+make
2583 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l…
2585 ---------------------------------------------------------------------------- ***
2591 - ticket 8995 final update to Unicode 6.1
2592 - ticket 8994 regenerate source/layout/CanonData.cpp
2594 - ticket 8961 support Unicode "Age" value *names*
2595 - ticket 8963 support multiple character name aliases & types
2597 - ticket 8827 "update ICU to Unicode 6.1"
2598 - C++ branches/markus/uni61 at r30864 from trunk at r30843
2599 - Java branches/markus/uni61 at r30865 from trunk at r30863
2602 - makedata.mak
2603 - uchar.h
2605 - com.ibm.icu.util.VersionInfo
2606 - icutools/unicode/makedefs.sh
2614 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/proces…
2615 - This prepares both unidata and testdata files in respective output subfolders.
2616 - Check test file diffs for previously commented-out, known-failing data lines;
2620 - 11 new block names:
2632 -> add to uchar.h
2633 -> add to UCharacter.UnicodeBlock IDs
2634 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2636 -> add to UCharacter.UnicodeBlock objects
2637 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
2639 - 1 new Joining_Group (jg) value:
2641 -> uchar.h & UCharacter.JoiningGroup
2642 - 2 new Line_Break (lb) values:
2645 -> uchar.h & UCharacter.LineBreak
2646 - 7 new scripts:
2654 -> remove these from SyntheticPropertyValueAliases.txt
2655 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2657 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2658 (added 2011-06-21)
2661 and another one added 2011-12-09
2663 -> uscript.h
2664 -> com.ibm.icu.lang.UScript
2665 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2667 -> SyntheticPropertyValueAliases.txt
2668 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2672 - the last Unihan code point changes from U+9FCB to U+9FCC
2673 search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
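The search described above can be sketched with Python's re module. The pattern is the one given in the note. The helper name find_unihan_range_ends and the sample source lines are hypothetical.

```python
import re

# Pattern from the note above: matches the old end (9FCB) and the old limit (9FCC).
UNIHAN_END_RE = re.compile(r"9FC[BC]", re.IGNORECASE)

def find_unihan_range_ends(text: str):
    """Return (line number, matched text) pairs for every hit, counting lines from 1."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in UNIHAN_END_RE.finditer(line):
            hits.append((number, match.group(0)))
    return hits

# Hypothetical source snippet; only the first two lines should match.
sample_source = "if (c <= 0x9fcb) {\n#define CJK_LIMIT 0x9FCC\nint other = 0x9FCA;"
hits = find_unihan_range_ends(sample_source)
```

Searching for both values matters because some code stores an inclusive end and other code stores an exclusive limit, and both need the same one-step bump.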
2678 - 2 new default-AL blocks:
2679 # Arabic Extended-A: U+08A0 - U+08FF (was default-R)
2681 # U+1EE00 - U+1EEFF (was default-R)
2682 - 2 new default-R blocks:
2684 # U+10980 - U+1099F
2685 # Meroitic Cursive: U+109A0 - U+109FF
2686 -> should be picked up by the explicit data in the file
2689 - from
2693 - to
2699 - Also, the file previously allowed multiple aliases but only now does it
2704 - This breaks our gennames parser, unames.icu data structure, and API.
2709 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
2715 so that the tools build can pick up the new definitions from the installed header files.
2716 * build Unicode tools (at least genpname) using CMake+make
2720 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
2721 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --…
2724 * build Unicode tools using CMake+make
2727 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
2730 - download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
2731 to ~/svn.icu/tools/trunk/src/unicode/py
2732 - adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
2733 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
2734 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
2737 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2738 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2739 - Unicode 6.0..6.1: U+2260, U+226E, U+226F
2740 - nothing new in 6.1, no test file to update
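The grep step above can also be done with a small script that expands ranges and drops ASCII. This is a sketch under the assumption that the file uses the usual UCD-style layout "code point or range ; status ; ... # comment"; the sample lines are abridged illustrations, not copied from IdnaMappingTable.txt.

```python
def non_ascii_std3_valid(lines):
    """Collect non-ASCII code points whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0]           # drop the trailing comment
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first  # single code point or range
        found.extend(cp for cp in range(first, last + 1) if cp > 0x7F)
    return found


# Abridged, illustrative lines in the assumed layout.
sample = [
    "0000..002C    ; disallowed_STD3_valid                  # <control>..COMMA",
    "0041          ; mapped                 ; 0061          # LATIN CAPITAL LETTER A",
    "2260          ; disallowed_STD3_valid                  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN",
]
print(["U+%04X" % cp for cp in non_ascii_std3_valid(sample)])
```

Comparing the result between two Unicode versions shows whether the test file needs new entries.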
2743 - in initial bootstrapping, change the UCA version
2745 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2746 - rebuild ICU & tools
2750 - run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
2751 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2752 - rebuild ICU & tools
2755 - refresh just the UCD-related files, just to be safe
2756 - see (ICU4C)/source/data/icu4j-readme.txt
2757 - mkdir /tmp/icu4j
2758 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2762 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
2763 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
2765 …tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a .…
2767 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
2768 mkdir -p /tmp/icu4j/main/shared/data
2770 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
2771 mkdir -p /tmp/icu4j/main/shared/data
2774 - copy the big-endian Unicode data files to another location,
2776 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
2777 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
2783 - refresh ICU4J
2784 …~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /…
2787 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2790 - temporarily ignore collation issues that look like UCA/UCD mismatches,
2795 - get output from Mark's tools; look in
2796 http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
2797 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2798 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2800 - update (ICU)/source/test/testdata/CollationTest_*.txt
2802 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
2803 - check test file diffs for previously commented-out, known-failing data lines;
2805 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
2806 - run makeuca.sh:
2807 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2808 - rebuild ICU4C
2809 - refresh ICU4J collation data:
2811 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2812 ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
2814 …~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /…
2815 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for deb…
2816 - note on intltest: if collate/UCAConformanceTest fails, then
2818 fix the conformance test before looking into the multi-thread test
2821 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2822 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2824 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2830 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2837 - diff current <icu>/source/layout files vs. generated ones
2838 …~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/too…
2841 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
2842 - if you just copy the above files, then
2844 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
2847 - do not merge the icudata.jar and testdata.jar,
2850 ---------------------------------------------------------------------------- ***
2854 * 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2855 (added 2010-12-21)
2865 -> uscript.h
2866 -> com.ibm.icu.lang.UScript
2867 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2869 -> genpname/SyntheticPropertyValueAliases.txt
2870 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2874 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
2879 * rebuild Unicode tools (at least genpname) using make
2880 - You might first need to "make install" ICU so that the tools build can pick
2885 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
2886 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --…
2887 - rebuild ICU & tools
2890 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~…
2891 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --cso…
2892 - rebuild ICU & tools
2895 - refresh just the UCD-related files, just to be safe
2896 - see (ICU4C)/source/data/icu4j-readme.txt
2897 - mkdir /tmp/icu4j
2898 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2899 - copy the big-endian Unicode data files to another location,
2901 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
2904 - refresh ICU4J
2905 …~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /…
2909 ---------------------------------------------------------------------------- ***
2918 - makedata.mak
2919 - uchar.h
2921 - com.ibm.icu.util.VersionInfo
2927 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/proces…
2928 - This now prepares both unidata and testdata files in respective output subfolders.
2931 - new Script_Extensions property defined in the new ScriptExtensions.txt file
2933 -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
2935 -> uchar.h with new UProperty section
2936 -> com.ibm.icu.lang.UProperty, parallel with uchar.h
2939 - 12 new block names:
2952 -> add to uchar.h
2953 -> add to UCharacter.UnicodeBlock
2954 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
2956 - Joining_Group (jg) values:
2958 -> uchar.h & UCharacter.JoiningGroup
2959 - 3 new scripts:
2963 -> remove these from SyntheticPropertyValueAliases.txt
2964 -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
2965 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2967 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2968 (added 2009-11-11..2010-07-18)
2982 -> uscript.h
2983 -> com.ibm.icu.lang.UScript
2984 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2986 -> SyntheticPropertyValueAliases.txt
2987 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2989 - ISO 15924 name change
2991 -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
2992 - property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
2995 - new CJK block:
2998 -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
3000 * build Unicode tools using CMake+make
3003 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
3008 * rebuild Unicode tools (at least genpname) using make
3009 - You might first need to "make install" ICU so that the tools build can pick
3013 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
3014 - rebuild ICU & tools
3017 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
3020 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
3021 to ~/svn.icu/tools/trunk/src/unicode/py
3022 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
3023 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
3024 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
3027 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3028 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3029 - Unicode 6.0: U+2260, U+226E, U+226F
3032 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3033 - rebuild ICU & tools
3034 - run makeuca.sh so that genuca picks up the new nfc.nrm:
3035 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3036 - rebuild ICU & tools
3039 - parser & generator: genprops & uprops.icu
3040 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/use…
3041 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
3044 - (one-time change)
3045 - genbidi/gencase/genprops tools changes
3046 - re-run makeprops.sh (see above)
3047 - UCharacterProperty.java, UCharacterTypeIterator.java,
3052 - refresh just the UCD-related files, just to be safe
3053 - see (ICU4C)/source/data/icu4j-readme.txt
3054 - mkdir /tmp/icu4j
3055 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3059 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
3061 …tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a .…
3062 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
3063 mkdir -p /tmp/icu4j/main/shared/data
3065 - copy the big-endian Unicode data files to another location,
3067 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
3068 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
3074 - refresh ICU4J
3075 …~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /…
3078 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3080 * un-hardcode normalization skippable (NF*_Inert) test data
3081 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
3084 - now handled by early ucdcopy.py and
3087 copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
3089 - they are not used in ICU4J
3093 - get output from Mark's tools; look in
3095 http://www.macchiato.com/unicode/utc/additional-uca-files
3098 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3099 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3100 - update Han-implicit ranges for new CJK extensions:
3102 - genuca: allow bytes 02 for U+FFFE, new merge-sort character;
3103 do not add it into invuca so that tailoring primary-after an ignorable works
3104 - genuca: permit space between [variable top] bytes
3105 - ucol.cpp: treat noncharacters like unassigned rather than ignorable
3106 - run makeuca.sh:
3107 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3108 - rebuild ICU4C
3109 - refresh ICU4J collation data:
3111 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3112 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
3114 …~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /…
3115 - update (ICU)/source/test/testdata/CollationTest_*.txt
3117 with output from Mark's Unicode tools
3118 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
3119 - note on intltest: if collate/UCAConformanceTest fails, then
3121 fix the conformance test before looking into the multi-thread test
3124 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3125 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3127 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3142 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
3143 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
3145 ---------------------------------------------------------------------------- ***
3163 - makedata.mak
3164 - uchar.h
3165 - configure.in & configure
3166 - update ucdVersion in gennames.c if an algorithmic range changes
3172 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unico…
3173 - includes finding files regardless of version numbers,
3175 ucdstrip and ucdmerge tools on the desired set of files
3178 - PropertyAliases.txt
3194 - PropertyValueAliases.txt
3199 ->
3206 - DerivedBidiClass.txt
3207 new default-R range: U+1E800 - U+1EFFF
3208 - UnicodeData.txt
3211 9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
3217 - run preparse.pl
3218 + cd \svn\icuproj\icu\trunk\source\tools\genpname
3226 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
3231 - new block & script values
3235 find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
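The find pattern above appears to use the older Visual Studio syntax, where {...} tags a group. A rough Python equivalent for lines in the Blocks.txt style is sketched below. The rule for deriving a UBLOCK_ constant name (uppercase, runs of non-alphanumerics become one underscore) is an assumption for illustration; the actual replacement string is not shown in these notes.

```python
import re

# Python spelling of the pattern: (...) replaces the {...} tagged groups.
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def parse_block_line(line):
    """Return (start, end, name, constant) for a block line, else None."""
    match = BLOCK_LINE.match(line)
    if match is None:
        return None
    start, end, name = match.groups()
    # Assumed naming rule, for illustration only.
    constant = "UBLOCK_" + re.sub(r"[^A-Za-z0-9]+", "_", name).upper()
    return int(start, 16), int(end, 16), name, constant


print(parse_block_line("0000..007F; Basic Latin"))
print(parse_block_line("# a comment line"))
```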
3242 - new Joining Group (JG) values: Farsi_Yeh, Nya
3243 - new Line_Break (lb) value:
3247 - Unihan range end moves from 9FC3 to 9FCB
3248 search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
3253 - Verified that definitions for Cased and Case_Ignorable are unchanged.
3258 - new numeric values that didn't exist in Unicode data before:
3259 1/7, 1/9, 1/10, 3/10, 1/16, 3/16
3260 the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
3264 and fractions with numerators -1..17 and denominators 1..16
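As a quick check of the limits quoted above, the sketch below tests the six new values against the two rules stated in the notes: a denominator above 9 does not fit formatVersion 5, and the new ranges are numerators -1..17 with denominators 1..16. It checks only those ranges and says nothing about the actual bit layout in uprops.icu.

```python
# The new numeric values listed in the notes.
NEW_VALUES = ["1/7", "1/9", "1/10", "3/10", "1/16", "3/16"]


def split_fraction(text):
    """Split "n/d" into integer numerator and denominator."""
    numerator, denominator = (int(part) for part in text.split("/"))
    return numerator, denominator


def exceeds_old_format(text):
    """True if the denominator is above 9, the old limit given in the notes."""
    return split_fraction(text)[1] > 9


def fits_new_ranges(text):
    """True if the fraction lies within the new ranges given in the notes."""
    numerator, denominator = split_fraction(text)
    return -1 <= numerator <= 17 and 1 <= denominator <= 16


for value in NEW_VALUES:
    print(value, exceeds_old_format(value), fits_new_ranges(value))
```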
3269 - the old code assumed that all Jamo characters are in the 11xx block
3270 - Unicode 5.2 fills holes there and adds new Jamo characters in
3271 A960..A97F; Hangul Jamo Extended-A
3273 D7B0..D7FF; Hangul Jamo Extended-B
3274 - Hangul_Syllable_Type can be trivially derived from a subset of
3278 …ata>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
3282 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
3283 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
3284 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
3285 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
3286 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
3287 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
3288 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
3289 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
3298 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
3303 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
3304 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
3305 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
3308 …- generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/gentest…
3310 …python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHO…
3311 …python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt…
3313 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
3315 - note on intltest: if collate/UCAConformanceTest fails, then
3317 fix the conformance test before looking into the multi-thread test
3320 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
3321 - Problem: These properties should be disjoint, but aren't
3322 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
3323 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable
3326 - without stored data
3329 - add it as another name field in unames.icu
3330 - make it available via u_charName() and UCharNameChoice and
3331 - consider it in u_charFromName()
3339 - review format and data
3340 - copy BidiTest.txt to source/test/testdata
3341 - write test code using this data
3342 - fix ICU code where it fails the conformance test
3345 - generally, find and update code corresponding to C/C++
3346 - UCharacter.UnicodeBlock constants:
3350 find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
3352 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
3354 - port test changes to Java
3358 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
3366 -> Eric Mader wrote in email on 20090930:
3375 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
3380 -> Eric Mader wrote in email on 20090930:
3381 "This is just a matter of making sure that all the per-script tables have
3390 - Update User Guide
3391 + Jamo_Short_Name, sfc->scf, binary property value aliases
3393 ---------------------------------------------------------------------------- ***
3402 - makedata.mak
3403 - uchar.h
3404 - configure.in & configure
3405 - update ucdVersion in gennames.c if an algorithmic range changes
3410 - ucdstrip:
3419 - ucdstrip and ucdmerge:
3450 - run preparse.pl
3451 + cd \svn\icuproj\icu\uni51\source\tools\genpname
3459 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
3463 -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
3467 - new block & script values
3473 - uprops.icu (uprops.h) only provides 7 bits for script codes.
3478 However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
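The arithmetic behind this is small enough to check directly. A minimal sketch: a 7-bit field holds values up to 127, so the maximum script value 129 needs an 8-bit field.

```python
def bits_needed(max_value):
    """Smallest number of bits that can hold values 0..max_value."""
    return max(1, max_value.bit_length())


def fits(max_value, bits):
    """True if max_value can be stored in an unsigned field of that width."""
    return max_value < (1 << bits)


MAX_SCRIPT_VALUE = 129  # USCRIPT_CODE_LIMIT-1, as stated in the notes

print(fits(MAX_SCRIPT_VALUE, 7))
print(fits(MAX_SCRIPT_VALUE, 8))
print(bits_needed(MAX_SCRIPT_VALUE))
```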
3484 Block (current count: 172) grows from 8 to 9 bits,
3486 - renamed property Simple_Case_Folding (sfc->scf)
3488 - new property JSN Jamo_Short_Name
3490 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
3491 - new Joining Group (JG) value: Burushashki_Yeh_Barree
3492 - new Sentence_Break (SB) values:
3497 - new Word_Break (WB) values:
3503 * Further changes in the 2008-02-29 update:
3504 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
3506 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h'…
3507 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend
3508 - new Word_Break (WB) value: NL=Newline
3511 - Unihan range end moves from 9FBB to 9FC3
3512 search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
3516 …urce\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
3520 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
3521 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
3522 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
3523 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
3524 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
3525 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
3526 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
3534 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
3546 - Test that APIs using Unicode property value aliases (like UnicodeSet)
3548 -> TestBinaryValues() tests in both cintltst and intltest
3565 - Update User Guide
3566 + Jamo_Short_Name, sfc->scf, binary property value aliases
3568 ---------------------------------------------------------------------------- ***
3579 - ucdstrip:
3588 - ucdstrip and ucdmerge:
3621 - run preparse.pl
3626 - new block & script values
3630 …oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
3634 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
3643 - copy the .c source files to C:\cvs\oss\icu\source\common
3647 - makedata.mak
3648 - uchar.h
3649 - configure.in
3665 ---------------------------------------------------------------------------- ***
3677 - ucdstrip:
3684 - ucdstrip and ucdmerge:
3696 - handle new enumerated properties in sub read_uchar
3697 - run preparse.pl
3700 - new binary properties
3703 - new enumerated properties
3707 - new block & script & line break values
3710 - case-ignorable changes
3715 - makedata.mak
3716 - uchar.h
3717 - configure.in
3720 - verify that u_charMirror() round-trips
3721 - test all new properties and some new values of old properties
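One way to read the round-trip check is that mirroring a character twice returns the original character. The sketch below applies that idea to a mapping parsed from lines in the "source; target # comment" style of BidiMirroring.txt. It is an assumption that this matches what the ICU test does; char_mirror() here is a hypothetical stand-in, not the ICU function, and the sample lines are illustrative.

```python
def parse_mirroring(lines):
    """Build a {code point: mirrored code point} map from "XXXX; YYYY # ..." lines."""
    pairs = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank or comment-only line
        source, target = (int(field.strip(), 16) for field in data.split(";"))
        pairs[source] = target
    return pairs


def char_mirror(pairs, c):
    """Hypothetical stand-in: return the mirrored code point, or c itself."""
    return pairs.get(c, c)


def non_round_tripping(pairs):
    """Code points c for which mirroring twice does not return c."""
    return [c for c in pairs
            if char_mirror(pairs, char_mirror(pairs, c)) != c]


# Illustrative sample lines.
sample = [
    "0028; 0029 # LEFT PARENTHESIS",
    "0029; 0028 # RIGHT PARENTHESIS",
    "# comment-only line",
]
print(non_round_tripping(parse_mirroring(sample)))
```

An empty result means every mapped character round-trips; a one-directional entry would show up in the list.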
3726 - Unihan range end moves from 9FA5 to 9FBB
3727 search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
3734 + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
3739 …urce\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C …
3740 …ource\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C …
3741 source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,
3744 - compare new special casing context conditions with previous ones
3748 - consider storing only the short name if it is the same as the long name
3751 - UAX #29 changes (grapheme/word/sentence breaks)
3752 - UAX #14 changes (line breaks)
3753 - Pattern_Syntax & Pattern_White_Space
3755 ---------------------------------------------------------------------------- ***
3768 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
3769 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
3772 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
3774 http://www.unicode.org/review/resolved-pri.html#pri26
3775 - undone again because no corrigendum in sight;
3779 - update from http://www.unicode.org/copyright.html
3783 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
3784 - add U_LB_INSEPARABLE due to a spelling fix
3787 - new binary properties
3792 - fix genpname perl script so that it doesn't choke on more than 2 names per property value
3793 - perl script: correctly calculate the maximum number of fields per row
3796 - new script code Hrkt=Katakana_Or_Hiragana
3799 - "FNC" -> "FC_NFKC"
3800 - single field "NFD_NO" -> two fields "NFD_QC; N" etc.
3803 - changed from 3 columns to 2, dropping the numeric type
3808 - makedata.mak
3809 - uchar.h
3810 - configure.in
3813 - update test of default bidi classes according to PRI #28
3815 http://www.unicode.org/review/resolved-pri.html#pri28
3816 - bidi tests: change exemplar character for ES depending on Unicode version
3817 - change hardcoded expected property values where they change
3822 - read UCD.html
3825 - use new Hrkt=Katakana_Or_Hiragana
3828 - are now part of combining character sequences
3829 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ