Internet Engineering Task Force                                 M. Davis
Internet-Draft                                                    Google
Intended status: BCP                                         A. Phillips
Expires: July 17, 2010                                            Lab126
                                                               Y. Umaoka
                                                                     IBM
                                                        January 13, 2010


                           BCP 47 Extension U
                      draft-davis-u-langtag-ext-00

Abstract

   This document specifies an Extension to BCP 47 which provides subtags
   that specify language and/or locale-based behavior or refinements to
   language tags, according to work done by the Unicode Consortium.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 17, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents



Davis, et al.             Expires July 17, 2010                 [Page 1]

Internet-Draft       BCP 47 Unicode Locale Extension        January 2010


   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.


Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  BCP47 Required Information
     2.1.  Summary
       2.1.1.  Canonicalization
     2.2.  Registration Form
   3.  Acknowledgements
   4.  IANA Considerations
   5.  Security Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Authors' Addresses


1.  Introduction

   [BCP47] permits the definition and registration of language tag
   extensions "that contain a language component and are compatible with
   applications that understand language tags".  This document defines
   an extension for identifying Unicode locale-based variations using
   language tags.  The "singleton" identifier for this extension is 'u'.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.


2.  BCP47 Required Information

   Language tags, as defined by [BCP47], are useful for identifying the
   language of content.  They are also used as locale identifiers (or
   can be mapped to locales) in many operating environments and APIs.
   However, most such locale identifiers also provide additional
   "tailorings" or options for specific values within a language,
   culture, region, or other variation.  This extension provides a
   mechanism for using these additional tailorings within language tags
   for general interchange.

   The maintaining authority for this extension's registry is the
   Unicode Consortium.  Unicode defines common locale data and
   identifiers for this data:

   +---------------+---------------------------------------------------+
   | Item          | Value                                             |
   +---------------+---------------------------------------------------+
   | Name          | Unicode Consortium                                |
   | Contact Email | cldr@unicode.org                                  |
   | Discussion    | cldr-users@unicode.org                            |
   | List Email    |                                                   |
   | URL Location  | cldr.unicode.org                                  |
   | Specification | Unicode Technical Standard #35 Unicode Locale     |
   |               | Data Markup Language (LDML),                      |
   |               | http://unicode.org/reports/tr35/                  |
   | Section       | Section 3.2 BCP 47 Tag Conversion                 |
   +---------------+---------------------------------------------------+

   The specification of extension subtags is provided by Section 3 of
   Unicode Technical Standard #35 Unicode Locale Data Markup Language
   [LDML].  As required by BCP 47, subtags follow the language tag ABNF
   and other rules for the formation of language tags and subtags, are
   restricted to the ASCII letters and digits, are not case sensitive,
   and do not exceed eight characters in length.

   [LDML] specifies a canonical representation.  LDML is available over
   the Internet and at no cost, and is available via a royalty-free
   license at http://unicode.org/copyright.html.  LDML is versioned, and
   each version of LDML is numbered, dated, and stable.  Extension
   subtags, once defined by LDML, are never retracted and never change
   in meaning in a substantial way.

2.1.  Summary

   The subtags available for use in the 'u' extension consist of a set
   of attributes, keys, and types.  Attributes, keys, types, and their
   respective meanings are defined in Section 3 (Unicode Language and
   Locale Identifiers) of [LDML].  The following is a summary of that
   definition (for details see Section 3 of [LDML]):

   o  An 'attribute' is a subtag with a length of three or more
      characters following the singleton and preceding any 'keyword'
      sequences.  No attributes were defined at the time of this
      document's publication.

   o  A 'keyword' is a sequence of subtags consisting of a 'key' subtag,
      followed by zero or more 'type' subtags.  Each 'key' MUST be
      unique within the extension.  The order of the 'type' subtags
      within a 'keyword' is sometimes significant to their
      interpretation.  Note that 'keys' can appear without a subsequent
      'type' subtag.

      A.  A 'key' is a subtag with a length of exactly two characters.
          Each 'key' is followed by zero or more 'type' subtags.

      B.  A 'type' is a subtag with a length of three or more characters
          following a key.  'Type' subtags are specific to a particular
          'key' and the order of the 'type' subtags MAY be significant
          to the interpretation of the 'keyword'.
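
   The structure summarized above can be sketched in code.  The
   following Python fragment is an illustration only, not part of this
   specification or of [LDML]: the function name parse_u_extension and
   its return shape are hypothetical, and the sketch checks only subtag
   length, character set, and key uniqueness.  It does not check that
   attributes, keys, or types are actually defined by [LDML].

```python
import re
from typing import Dict, List, Tuple

# Extension subtags: ASCII letters and digits, two to eight characters.
_SUBTAG = re.compile(r"[a-z0-9]{2,8}")


def parse_u_extension(tag: str) -> Tuple[str, List[str], Dict[str, List[str]]]:
    """Split a language tag into (base tag, attributes, keywords).

    Hypothetical helper for illustration. Subtags that follow the 'u'
    extension (for example another extension) are ignored here.
    """
    parts = tag.split("-")
    lower = [p.lower() for p in parts]  # case is not significant

    # Locate the 'u' singleton; it cannot be the first subtag, and a
    # 'u' that appears after 'x' belongs to the private-use section.
    start = None
    for i in range(1, len(lower)):
        if lower[i] == "x":
            break
        if lower[i] == "u":
            start = i
            break
    if start is None:
        return tag, [], {}

    # The extension runs until the next one-character subtag or the end.
    end = start + 1
    while end < len(lower) and len(lower[end]) > 1:
        end += 1
    ext = lower[start + 1:end]
    if not ext:
        raise ValueError("'u' singleton is not followed by any subtag")

    attributes: List[str] = []
    keywords: Dict[str, List[str]] = {}
    current_key = None
    for subtag in ext:
        if not _SUBTAG.fullmatch(subtag):
            raise ValueError("malformed subtag: %r" % subtag)
        if len(subtag) == 2:
            # A two-character subtag is a key and starts a new keyword.
            if subtag in keywords:
                raise ValueError("duplicate key: %r" % subtag)
            keywords[subtag] = []
            current_key = subtag
        elif current_key is None:
            # Longer subtags before the first key are attributes.
            attributes.append(subtag)
        else:
            # Longer subtags after a key are its types, in order.
            keywords[current_key].append(subtag)

    return "-".join(parts[:start]), attributes, keywords
```

   Under these assumptions, parse_u_extension("de-DE-u-attr-co-phonebk")
   returns ("de-DE", ["attr"], {"co": ["phonebk"]}).  A key that appears
   twice raises ValueError, reflecting the rule that each 'key' MUST be
   unique within the extension.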

   For example, the language tag "de-DE-u-attr-co-phonebk" consists of:

   o  The base language tag "de-DE" (German as used in Germany), exactly
      as defined by [BCP47] using subtags from the IANA Language Subtag
      Registry.

   o  The singleton 'u', identifying this extension.

   o  The attribute 'attr', which is an example for illustration (no
      attributes were defined at the time this document was published).

   o  The keyword 'co-phonebk', consisting of the key 'co' (Collation)
      and the type 'phonebk' (Phonebook collation order).

   With successive versions of [LDML], additional attributes, keys, and
   types MAY be defined.  Once defined, attributes, keys, and types will
   never be removed.  Machine-readable files listing the valid
   attributes, keys, and types are available in the CLDR repository for
   each version.  For example, for version 1.7.2, the files are located
   at http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/.
   These files can also contain aliases that were used in previous
   versions of [LDML].

2.1.1.  Canonicalization

   As required by [BCP47], case is not significant.  The canonical form
   for all subtags in the extension is lowercase.  The canonical order
   of attributes is in [US-ASCII] order (that is, numbers before
   letters, with letters sorted as lowercase US-ASCII code points).  The
   canonical order of keywords is in [US-ASCII] order by key.  The order
   of subtags within a keyword is significant; the meaning of this
   extension is altered if those subtags are rearranged.  Thus, the
   canonical form of the extension never reorders the subtags within a
   keyword.

2.2.  Registration Form

   Per [RFC5646], Section 3.7:
   %%
   Identifier: u
   Description: Unicode Locale
   Comments: Subtags for the identification of language and cultural
   variations.  Used to set behavior in locale APIs.
   Added: 2009-mm-dd
   RFC: [TBD]
   Authority: Unicode Consortium
   Contact_Email: cldr@unicode.org
   Mailing_List: cldr-users@unicode.org
   URL: http://cldr.unicode.org
   %%


3.  Acknowledgements

   Thanks to John Emmons and the rest of the Unicode CLDR Technical
   Committee for their work in developing the BCP 47 subtags for LDML.


4.  IANA Considerations

   This document will require IANA to insert the record in Section 2.2
   into the Language Extensions Registry, according to Section 3.7
   (Extensions and the Extensions Registry) of "Tags for Identifying
   Languages" in [BCP47].  There might be occasional maintenance of this
   record.  This document does not require IANA to create or maintain a
   new registry or otherwise impact IANA.


5.  Security Considerations

   The security considerations for this extension are the same as those
   for [RFC5646] (or its successors).  See Section 6 (Security
   Considerations) of [RFC5646].


6.  References

6.1.  Normative References

   [BCP47]    Davis, M., Ed., "Tags for the Identification of Language
              (BCP47)", September 2009.

   [LDML]     Davis, M., "Unicode Technical Standard #35: Locale Data
              Markup Language (LDML)", December 2007,
              <http://www.unicode.org/reports/tr35/>.

   [RFC5646]  Phillips, A. and M. Davis, "Tags for Identifying
              Languages", BCP 47, RFC 5646, September 2009.

   [US-ASCII]
              International Organization for Standardization, "ISO/IEC
              646:1991, Information technology -- ISO 7-bit coded
              character set for information interchange.", 1991.

6.2.  Informative References

   [ldml-registry]
              "Registry for Common Locale Data Repository tag elements",
              September 2009.


Authors' Addresses

   Mark Davis
   Google

   Email: mark@macchiato.com


   Addison Phillips
   Lab126

   Email: addison@inter-locale.com


   Yoshito Umaoka
   IBM

   Email: yoshito_umaoka@us.ibm.com
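
   The canonicalization rules of Section 2.1.1 can be illustrated with a
   short Python sketch.  It is a non-normative example under stated
   assumptions: the function name canonicalize_u_extension is
   hypothetical, the input is assumed to be the extension alone
   (starting at the 'u' singleton), and the attribute and key names in
   the usage note are placeholders chosen for illustration.

```python
import re
from typing import Dict, List

# Extension subtags: ASCII letters and digits, two to eight characters.
_SUBTAG = re.compile(r"[a-z0-9]{2,8}")


def canonicalize_u_extension(extension: str) -> str:
    """Return the canonical form of a 'u' extension string.

    Hypothetical helper for illustration; it expects only the extension
    itself, for example "u-attr-co-phonebk".
    """
    # Canonical form is lowercase for every subtag.
    subtags = extension.lower().split("-")
    if len(subtags) < 2 or subtags[0] != "u":
        raise ValueError("expected 'u' followed by at least one subtag")

    attributes: List[str] = []
    keywords: Dict[str, List[str]] = {}
    current_key = None
    for subtag in subtags[1:]:
        if not _SUBTAG.fullmatch(subtag):
            raise ValueError("malformed subtag: %r" % subtag)
        if len(subtag) == 2:
            if subtag in keywords:
                raise ValueError("duplicate key: %r" % subtag)
            keywords[subtag] = []
            current_key = subtag
        elif current_key is None:
            attributes.append(subtag)
        else:
            keywords[current_key].append(subtag)

    # Attributes are sorted in US-ASCII order. Comparing lowercase ASCII
    # strings by code point puts digits before letters.
    result = ["u"] + sorted(attributes)

    # Keywords are sorted by key; the types inside each keyword keep
    # their original order because that order can be significant.
    for key in sorted(keywords):
        result.append(key)
        result.extend(keywords[key])
    return "-".join(result)
```

   With the placeholder attributes 'foo' and 'bar',
   canonicalize_u_extension("U-foo-Bar-nu-latn-CO-phonebk") returns
   "u-bar-foo-co-phonebk-nu-latn".  The types of a keyword are never
   reordered, so "u-kx-zzz-aaa" is returned unchanged ('kx' is a
   placeholder key).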