Internet Engineering Task Force                                 M. Davis
Internet-Draft                                                    Google
Intended status: BCP                                         A. Phillips
Expires: July 17, 2010                                            Lab126
                                                               Y. Umaoka
                                                                     IBM
                                                        January 13, 2010


                           BCP 47 Extension U
                      draft-davis-u-langtag-ext-00

Abstract

   This document specifies an extension to BCP 47 that provides subtags
   specifying language and/or locale-based behavior or refinements to
   language tags, according to work done by the Unicode Consortium.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 17, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents

Davis, et al.             Expires July 17, 2010                 [Page 1]

Internet-Draft       BCP 47 Unicode Locale Extension        January 2010

   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . 3
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . . . 3
   2.  BCP47 Required Information  . . . . . . . . . . . . . . . . . . 3
     2.1.  Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 4
       2.1.1.  Canonicalization  . . . . . . . . . . . . . . . . . . . 5
     2.2.  Registration Form . . . . . . . . . . . . . . . . . . . . . 5
   3.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . . 5
   4.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6
   5.  Security Considerations . . . . . . . . . . . . . . . . . . . . 6
   6.  References  . . . . . . . . . . . . . . . . . . . . . . . . . . 6
     6.1.  Normative References  . . . . . . . . . . . . . . . . . . . 6
     6.2.  Informative References  . . . . . . . . . . . . . . . . . . 6
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . . 6


1.  Introduction

   [BCP47] permits the definition and registration of language tag
   extensions "that contain a language component and are compatible with
   applications that understand language tags".  This document defines
   an extension for identifying Unicode locale-based variations using
   language tags.  The "singleton" identifier for this extension is 'u'.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.


2.  BCP47 Required Information

   Language tags, as defined by [BCP47], are useful for identifying the
   language of content.  They are also used as locale identifiers (or
   can be mapped to locales) in many operating environments and APIs.
   However, most such locale identifiers also provide additional
   "tailorings" or options for specific values within a language,
   culture, region, or other variation.  This extension provides a
   mechanism for using these additional tailorings within language tags
   for general interchange.

   The maintaining authority for this extension's registry is the
   Unicode Consortium.  Unicode defines common locale data and
   identifiers for this data:

   +---------------+---------------------------------------------------+
   | Item          | Value                                             |
   +---------------+---------------------------------------------------+
   | Name          | Unicode Consortium                                |
   | Contact Email | cldr@unicode.org                                  |
   | Discussion    | cldr-users@unicode.org                            |
   | List Email    |                                                   |
   | URL Location  | cldr.unicode.org                                  |
   | Specification | Unicode Technical Standard #35 Unicode Locale     |
   |               | Data Markup Language (LDML),                      |
   |               | http://unicode.org/reports/tr35/                  |
   | Section       | Section 3.2 BCP 47 Tag Conversion                 |
   +---------------+---------------------------------------------------+

   The specification of extension subtags is provided by Section 3 of
   Unicode Technical Standard #35 Unicode Locale Data Markup Language
   [LDML].  As required by BCP 47, subtags follow the language tag ABNF
   and other rules for the formation of language tags and subtags, are
   restricted to the ASCII letters and digits, are not case sensitive,
   and do not exceed eight characters in length.

   [LDML] specifies a canonical representation.  LDML is available over
   the Internet at no cost, under the royalty-free license at
   http://unicode.org/copyright.html.  LDML is versioned, and each
   version of LDML is numbered, dated, and stable.  Extension subtags,
   once defined by LDML, are never retracted and never change in meaning
   in a substantial way.

2.1.  Summary

   The subtags available for use in the 'u' extension consist of a set
   of attributes, keys, and types.  Attributes, keys, types, and their
   respective meanings are defined in Section 3 (Unicode Language and
   Locale Identifiers) of [LDML].  The following is a summary of that
   definition (for details, see Section 3 of [LDML]):

   o  An 'attribute' is a subtag with a length of three or more
      characters following the singleton and preceding any 'keyword'
      sequences.  No attributes were defined at the time of this
      document's publication.

   o  A 'keyword' is a sequence of subtags consisting of a 'key' subtag,
      followed by zero or more 'type' subtags.  Each 'key' MUST be
      unique within the extension.  The order of the 'type' subtags
      within a 'keyword' is sometimes significant to their
      interpretation.  Note that a 'key' can appear without a subsequent
      'type' subtag.

      A.  A 'key' is a subtag with a length of exactly two characters.
          Each 'key' is followed by zero or more 'type' subtags.

      B.  A 'type' is a subtag with a length of three or more characters
          following a key.  'Type' subtags are specific to a particular
          'key' and the order of the 'type' subtags MAY be significant
          to the interpretation of the 'keyword'.

   For example, the language tag "de-DE-u-attr-co-phonebk" consists of:

   o  The base language tag "de-DE" (German as used in Germany), exactly
      as defined by [BCP47] using subtags from the IANA Language Subtag
      Registry.

   o  The singleton 'u', identifying this extension.

   o  The attribute 'attr', which is an example for illustration (no
      attributes were defined at the time this document was published).

   o  The keyword 'co-phonebk', consisting of the key 'co' (Collation)
      and the type 'phonebk' (Phonebook collation order).

   With successive versions of [LDML], additional attributes, keys, and
   types MAY be defined.  Once defined, attributes, keys, and types will
   never be removed.  Machine-readable files listing the valid
   attributes, keys, and types are available in the CLDR repository for
   each version.  For example, for version 1.7.2, the files are located
   at http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/.
   These files can also contain aliases that were used in previous
   versions of [LDML].

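   As a non-normative illustration, the structure summarized in this
   section can be sketched in Python.  This is a minimal sketch under
   stated assumptions, not an implementation defined by [LDML] or
   [BCP47]: the function name parse_u_extension is hypothetical, the
   eight-character limit is taken from Section 2, and validation of the
   base language tag is assumed to happen elsewhere.

```python
def parse_u_extension(tag):
    """Return (attributes, keywords) for the 'u' extension of a tag.

    Hypothetical helper: attributes is a list of subtags; keywords maps
    each two-character key to the ordered list of its type subtags.
    """
    subtags = tag.lower().split("-")  # case is not significant

    # Locate the singleton 'u'; anything after 'x' is private use.
    start = None
    for index, subtag in enumerate(subtags):
        if subtag == "x":
            break
        if subtag == "u" and index > 0:
            start = index + 1
            break
    if start is None:
        return [], {}

    attributes, keywords, key = [], {}, None
    for subtag in subtags[start:]:
        if len(subtag) == 1:
            break  # the next singleton ends the 'u' extension
        well_formed = subtag.isascii() and subtag.isalnum()
        if not well_formed or len(subtag) > 8:
            raise ValueError("malformed subtag: %r" % subtag)
        if len(subtag) == 2:
            if subtag in keywords:
                raise ValueError("duplicate key: %r" % subtag)
            key = subtag
            keywords[key] = []
        elif key is None:
            attributes.append(subtag)  # precedes every keyword
        else:
            keywords[key].append(subtag)  # order of types is preserved
    return attributes, keywords
```

   For the example tag "de-DE-u-attr-co-phonebk", this sketch returns
   the attribute list ['attr'] and the keyword map {'co': ['phonebk']}.
   A key with no type, as in "en-u-co", maps to an empty list.
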
2.1.1.  Canonicalization

   As required by [BCP47], case is not significant.  The canonical form
   for all subtags in the extension is lowercase.  The canonical order
   of attributes is in [US-ASCII] order (that is, numbers before
   letters, with letters sorted as lowercase US-ASCII code points).  The
   canonical order of keywords is in [US-ASCII] order by key.  The order
   of subtags within a keyword is significant; the meaning of this
   extension is altered if those subtags are rearranged.  Thus, the
   canonical form of the extension never reorders the subtags within a
   keyword.

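   The canonicalization rules above can be sketched as follows.  This
   is a non-normative sketch: the function name
   canonicalize_u_extension is hypothetical, the input is assumed to be
   a well-formed 'u' extension on its own (without the base language
   tag), and the attributes 'zeta' and 'alpha' and the keyword
   'ca-gregory' in the usage note are illustrative assumptions.

```python
def canonicalize_u_extension(extension):
    """Return the canonical form of a 'u' extension such as 'u-co-phonebk'.

    Hypothetical helper; assumes the input is already well-formed.
    """
    subtags = extension.lower().split("-")  # canonical form is lowercase
    if subtags[0] != "u":
        raise ValueError("expected an extension starting with 'u'")

    attributes = []
    keywords = []  # each entry is [key, type, type, ...]
    for subtag in subtags[1:]:
        if len(subtag) == 2:
            keywords.append([subtag])    # a two-character subtag is a key
        elif keywords:
            keywords[-1].append(subtag)  # a type of the most recent key
        else:
            attributes.append(subtag)    # no key seen yet: an attribute

    # For lowercase ASCII subtags, code point order places digits
    # before letters, which is the order described above.
    attributes.sort()
    # Keywords are ordered by key only; the types inside each keyword
    # keep their original order.
    keywords.sort(key=lambda keyword: keyword[0])

    parts = ["u"] + attributes
    for keyword in keywords:
        parts.extend(keyword)
    return "-".join(parts)
```

   Under these assumptions, "U-Zeta-Alpha-CO-Phonebk-CA-Gregory"
   becomes "u-alpha-zeta-ca-gregory-co-phonebk": attributes and
   keywords are sorted, while the subtags inside each keyword are left
   in place.
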
2.2.  Registration Form

   Per [RFC5646], Section 3.7:
   %%
   Identifier: u
   Description: Unicode Locale
   Comments: Subtags for the identification of language and cultural
       variations. Used to set behavior in locale APIs.
   Added: 2009-mm-dd
   RFC: [TBD]
   Authority: Unicode Consortium
   Contact_Email: cldr@unicode.org
   Mailing_List: cldr-users@unicode.org
   URL: http://cldr.unicode.org
   %%


3.  Acknowledgements

   Thanks to John Emmons and the rest of the Unicode CLDR Technical
   Committee for their work in developing the BCP 47 subtags for LDML.


4.  IANA Considerations

   This document will require IANA to insert the record in Section 2.2
   into the Language Extensions Registry, according to Section 3.7
   (Extensions and the Extensions Registry) of "Tags for Identifying
   Languages" in [BCP47].  There might be occasional maintenance of this
   record.  This document does not require IANA to create or maintain a
   new registry or otherwise impact IANA.


5.  Security Considerations

   The security considerations for this extension are the same as those
   for [RFC5646] (or its successors).  See Section 6 (Security
   Considerations) of [RFC5646].


6.  References

6.1.  Normative References

   [BCP47]    Davis, M., Ed., "Tags for the Identification of Language
              (BCP47)", September 2009.

   [LDML]     Davis, M., "Unicode Technical Standard #35: Locale Data
              Markup Language (LDML)", December 2007,
              <http://www.unicode.org/reports/tr35/>.

   [RFC5646]  Phillips, A. and M. Davis, "Tags for Identifying
              Languages", BCP 47, RFC 5646, September 2009.

   [US-ASCII]
              International Organization for Standardization, "ISO/IEC
              646:1991, Information technology -- ISO 7-bit coded
              character set for information interchange.", 1991.

6.2.  Informative References

   [ldml-registry]
              "Registry for Common Locale Data Repository tag elements",
              September 2009.


Authors' Addresses

   Mark Davis
   Google

   Email: mark@macchiato.com


   Addison Phillips
   Lab126

   Email: addison@inter-locale.com


   Yoshito Umaoka
   IBM

   Email: yoshito_umaoka@us.ibm.com