# Copyright (C) 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html
# Copyright (C) 2010-2014, International Business Machines Corporation and others.
# All Rights Reserved.
#
# Commands for regenerating ICU4C locale data (.txt files) from CLDR.
#
# The process requires local copies of
#    - CLDR (the source of most of the data, and some Java tools)
#    - ICU4J (used only for checking the converted data)
#    - ICU4C (the destination for the new data, and the source for some of it)
#            (Either check out ICU4C from Subversion, or download the additional
#            icu4c-*-data.zip file so that the icu/source/data/ directory is fully
#            populated.)
#
# For an official CLDR data integration into ICU, these should be clean, freshly
# checked-out. For released CLDR sources, an alternative to checking out sources
# for a given version is downloading the zipped sources for the common (core.zip)
# and tools (tools.zip) directory subtrees from the Data column in
# [http://cldr.unicode.org/index/downloads].
#
# The versions of each of these must match. Included with the release notes for
# ICU is the version number and/or a CLDR svn tag name for the revision of CLDR
# that was the source of the data for that release of ICU.
#
# Note: Some versions of the OpenJDK will not build the CLDR java utilities.
#   If you see compilation errors complaining about type incompatibilities with
#   functions on generic classes, try switching to the Sun JDK.
#
# Besides a standard JDK, the process also requires ant
# (http://ant.apache.org/),
# plus the xml-apis.jar from the Apache xalan package
# (http://xml.apache.org/xalan-j/downloads.html).
#
# Note: Enough things can (and will) fail in this process that it is best to
#   run the commands separately from an interactive shell. They should all
#   copy and paste without problems.
#
# It is often useful to save logs of the output of many of the steps in this
# process. The commands below put log files in /tmp; you may want to put them
# somewhere else.
#
#----
#
# There are several environment variables that need to be defined.
#
# a) Java- and ant-related variables
#
# JAVA_HOME:     Path to JDK (a directory, containing e.g. bin/java, bin/javac,
#                etc.); on many systems this can be set using
#                `/usr/libexec/java_home`.
#
# ANT_OPTS:      You may want to set:
#
#                -Xmx1024m, to give Java more memory; otherwise it may run out
#                of heap.
#
# b) CLDR-related variables
#
# CLDR_DIR:      Path to root of CLDR sources, below which are the common and
#                tools directories.
# CLDR_CLASSES:  Defined relative to CLDR_DIR. It only needs to be set if you
#                are not running ant jar for CLDR and have a non-default output
#                folder for cldr-tools classes.
#
# c) ICU-related variables
# These variables only need to be set if you're directly reusing the
# commands below.
#
# ICU4C_DIR:     Path to root of ICU4C sources, below which is the source dir.
#
# ICU4J_ROOT:    Path to root of ICU4J sources, below which is the main dir.
#
#----
#
# If you are adding or removing locales, or specific kinds of locale data,
# there are some xml files in the ICU sources that need to be updated (these xml
# files are used in addition to the CLDR files as inputs to the CLDR data build
# process for ICU):
#
#    icu/trunk/source/data/icu-config.xml - Update <locales> to add or remove
#       CLDR locales for inclusion in ICU. Update <paths> to prefer
#       alt forms for certain paths, or to exclude certain paths; note
#       that <paths> items can only have draft or alt attributes.
#
#       Note that if a language-only locale (e.g. "de") is included in
#       <locales>, then all region sublocales for that language that
#       are present in CLDR data (e.g. "de_AT", "de_BE", "de_CH", etc.)
#       should also be included in <locales>, per PMC policy decision
#       2012-05-02 (see http://bugs.icu-project.org/trac/ticket/9298).
#
#    icu/trunk/source/data/build.xml - If you are adding or removing break
#       iterators, you need to update <fileset id="brkitr" ...> under
#       <target name="clean" ...> to clean the correct set of files.
#
#    icu/trunk/source/data/xml/ - If you are adding a new locale, break
#       iterator, collation tailoring, or rule-based number formatter,
#       you may need to add a corresponding xml file in (respectively)
#       the main/, brkitr/, collation/, or rbnf/ subdirectory here.
#
#----
#
# For an official CLDR data integration into ICU, there are some additional
# considerations:
#
# a) Don't commit anything in ICU sources (and possibly any changes in CLDR
#    sources, depending on their nature) until you have finished testing and
#    resolving build issues and test failures for both ICU4C and ICU4J.
#
# b) There are version numbers that may need manual updating in CLDR (other
#    version numbers get updated automatically, based on these):
#
#    common/dtd/ldml.dtd                            - update cldrVersion
#    common/dtd/ldmlBCP47.dtd                       - update cldrVersion
#    common/dtd/ldmlSupplemental.dtd                - update cldrVersion
#    tools/java/org/unicode/cldr/util/CLDRFile.java - update GEN_VERSION
#
# c) After everything is committed, you will need to tag the CLDR, ICU4J, and
#    ICU4C sources that ended up being used for the integration; see step 16
#    below.
#
################################################################################

# 1a. Java and ant variables, adjust for your system

export JAVA_HOME=`/usr/libexec/java_home`
export ANT_OPTS="-Xmx1024m"

# 1b. CLDR variables, adjust for your setup; with cygwin it might be e.g.
# CLDR_DIR=`cygpath -wp /build/cldr`

export CLDR_DIR=$HOME/cldr/trunk
#export CLDR_CLASSES=$CLDR_DIR/tools/java/classes

# 1c. ICU variables

export ICU4C_DIR=$HOME/icu/icu/trunk
export ICU4J_ROOT=$HOME/icu/icu4j/trunk

# 2. Build the CLDR Java tools

cd $CLDR_DIR/tools/java
#cd $CLDR_DIR/cldr-tools
ant jar

# 3. Configure ICU4C, build and test without new data first, to verify that
# there are no pre-existing errors (configure shown here for MacOSX, adjust
# for your platform).

cd $ICU4C_DIR/source
./runConfigureICU MacOSX
make all 2>&1 | tee /tmp/icu4c-oldData-makeAll.txt
make check 2>&1 | tee /tmp/icu4c-oldData-makeCheck.txt

# 4. Build the new ICU4C data files; these include .txt files and .mk files.
# These new files will replace whatever was already present in the ICU4C sources.
# This process uses ant with ICU's data/build.xml and data/icu-config.xml to
# operate (via CLDR's ant/CLDRConverterTool.java and ant/CLDRBuild.java) the
# necessary CLDR tools including LDML2ICUConverter, ConvertTransforms, etc.
# This process will take several minutes.
# Keep a log so you can investigate anything that looks suspicious.

cd $ICU4C_DIR/source/data
ant clean
ant all 2>&1 | tee /tmp/cldrNN-buildLog.txt

# 5. Check which data files have modifications, which have been added or removed
# (if there are no changes, you may not need to proceed further). Make sure the
# list seems reasonable.

svn status

# 6. Fix any errors, investigate any warnings. Some warnings are expected,
# including warnings for missing versions in locale names which specify some
# collation variants, e.g.
#   [cldr-build] WARNING (ja_JP_TRADITIONAL): No version #??
#   [cldr-build] WARNING (zh_TW_STROKE): No version #??
# and warnings for some empty collation bundles, e.g.
#   [cldr-build] WARNING (en): warning: No collations found. Bundle will ...
#   [cldr-build] WARNING (to): warning: No collations found. Bundle will ...
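#
# A minimal sketch for triaging the build log against the expected warnings
# listed above. The function name unexpected_warnings is hypothetical, and the
# two filter patterns are taken only from the example messages shown here, so
# adjust them if the wording in your log differs.

```shell
# Print WARNING lines from a build log, skipping the two expected kinds
# (missing versions for collation variants, and empty collation bundles).
unexpected_warnings() {
    grep 'WARNING' "$1" | grep -v -e 'No version #' -e 'No collations found'
}

# Example: unexpected_warnings /tmp/cldrNN-buildLog.txt
```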
#
# Fixing may entail modifying CLDR source data or tools - for example,
# updating the validSubLocales for collation data (file a bug if appropriate).
# Repeat steps 4-5 until there are no build errors and no unexpected
# warnings.

# 7. Now rebuild ICU4C with the new data and run make check tests.
# Again, keep a log so you can investigate the errors.

cd $ICU4C_DIR/source
make check 2>&1 | tee /tmp/icu4c-newData-makeCheck.txt

# 8. Investigate each test case failure. The first run processing new CLDR data
# from the Survey Tool can result in thousands of failures (in many cases, one
# CLDR data fix can resolve hundreds of test failures). If the error is caused
# by bad CLDR data, then file a CLDR bug, fix the data, and regenerate from
# step 4. If the data is OK but the testcase needs to be updated because the
# data has legitimately changed, then update the testcase. You will check in
# the updated testcases along with the new ICU data at the end of this process.
# Note that if the new data has any differences in structure, you will have to
# update test/testdata/structLocale.txt or /tsutil/cldrtest/TestLocaleStructure
# may fail.
# Repeat steps 4-7 until there are no errors.

# 9. Now run the make check tests in exhaustive mode:

cd $ICU4C_DIR/source
export INTLTEST_OPTS="-e"
export CINTLTST_OPTS="-e"
make check 2>&1 | tee /tmp/icu4c-newData-makeCheckEx.txt

# 10. Again, investigate each failure, fixing CLDR data or ICU test cases as
# appropriate, and repeating steps 4-7 and 9 until there are no errors.

# 11. Now with ICU4J, build and test without new data first, to verify that
# there are no pre-existing errors (or at least to have the pre-existing errors
# as a base for comparison):

cd $ICU4J_ROOT
ant all 2>&1 | tee /tmp/icu4j-oldData-antAll.txt
ant check 2>&1 | tee /tmp/icu4j-oldData-antCheck.txt

# 12. Now build the new data for ICU4J

cd $ICU4C_DIR/source/data
make icu4j-data-install

# 13. Now rebuild ICU4J with the new data and run tests:
# Keep a log so you can investigate the errors.

cd $ICU4J_ROOT
ant check 2>&1 | tee /tmp/icu4j-newData-antCheck.txt

# 14. Investigate test case failures; fix test cases and repeat from step 12,
# or fix CLDR data and repeat from step 4, as appropriate, until there are no
# more failures in ICU4C or ICU4J (except failures that were present before you
# began testing the new CLDR data).

# 15. Check the file changes; then svn add or svn remove as necessary, and
# commit the changes.

cd $ICU4C_DIR/source
svn status
# add or remove as necessary, then commit

cd $ICU4J_ROOT
svn status
# add or remove as necessary, then commit

# 16. For an official CLDR data integration into ICU, now tag the CLDR, ICU4J,
# and ICU4C sources with an appropriate CLDR milestone (you can check previous
# tags for format), e.g.:

svn copy svn+ssh://unicode.org/repos/cldr/trunk \
svn+ssh://unicode.org/repos/cldr/tags/release-NNN \
--parents -m "cldrbug nnnn: tag cldr sources for NNN"

svn copy svn+ssh://source.icu-project.org/repos/icu/icu4j/trunk \
svn+ssh://source.icu-project.org/repos/icu/icu4j/tags/cldr-NNN \
--parents -m 'ticket:mmmm: tag the version used for integrating CLDR NNN'

svn copy svn+ssh://source.icu-project.org/repos/icu/icu/trunk \
svn+ssh://source.icu-project.org/repos/icu/icu/tags/cldr-NNN \
--parents -m 'ticket:mmmm: tag the version used for integrating CLDR NNN'
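
# Appendix to step 14: a minimal sketch of one way to separate new failures
# from pre-existing ones by comparing an old-data log with a new-data log.
# The function name new_failures is hypothetical, and the failure pattern is
# an assumption you must supply yourself, since this readme does not specify
# how failures are marked in the test logs.

```shell
# Usage: new_failures OLD_LOG NEW_LOG PATTERN
# Prints lines matching PATTERN that occur in NEW_LOG but not in OLD_LOG.
new_failures() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || return 1
    # comm requires sorted input; sort -u also drops repeated lines.
    grep -e "$3" "$1" | sort -u > "$old_sorted"
    grep -e "$3" "$2" | sort -u > "$new_sorted"
    # -13 hides lines unique to the old log and lines common to both.
    comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

# Example (the pattern 'FAIL' is only a placeholder):
# new_failures /tmp/icu4j-oldData-antCheck.txt /tmp/icu4j-newData-antCheck.txt 'FAIL'
```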