1<?xml version="1.0" encoding="US-ASCII"?>
2<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
3<!ENTITY RFC4646 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4646.xml">
<!ENTITY rfc5646 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5646.xml">
]>
6<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
7<?rfc strict="yes" ?>
8<?rfc toc="yes"?>
9<?rfc tocdepth="4"?>
10<?rfc symrefs="yes"?>
11<?rfc sortrefs="yes" ?>
12<?rfc compact="yes" ?>
13<?rfc subcompact="no" ?>
14<rfc category="info" docName="draft-davis-t-langtag-ext-08" ipr="trust200902"
	submissionType="independent">
17	<front>
20		<title abbrev="BCP 47 Extension T">BCP 47 Extension T - Transformed Content</title>
22		<author fullname="Mark Davis" initials="M.E." surname="Davis">
23			<organization>Google</organization>
24			<address>
25				<email>mark@macchiato.com</email>
26			</address>
27		</author>
29		<author fullname="Addison Phillips" initials="A" surname="Phillips">
30			<organization>Lab126</organization>
31			<address>
32				<email>addison@lab126.com</email>
33			</address>
34		</author>
36		<author initials="Y" surname="Umaoka" fullname="Yoshito Umaoka">
37			<organization abbrev="IBM">IBM</organization>
38			<address>
39				<email>yoshito_umaoka@us.ibm.com</email>
40			</address>
41		</author>
43        <author initials="C" surname="Falk" fullname="Courtney Falk">
44            <organization abbrev="Infinite Automata">Infinite Automata</organization>
45            <address>
46                <email>court@infiauto.com</email>
47            </address>
48        </author>
50		<date month="December" year="2011" day="6" />
54		<!-- Meta-data Declarations -->
56		<area>General</area>
58		<workgroup>Internet Engineering Task Force</workgroup>
62		<keyword>locale</keyword>
63		<keyword>bcp 47</keyword>
65		<!-- Keywords will be incorporated into HTML output files in a meta tag
66			but they have no effect on text or nroff output. If you submit your draft
67			to the RFC Editor, the keywords will be used for the search engine. -->
69		<abstract>
70			<t>
71				This document specifies an Extension to BCP 47
72				which provides
73				subtags
74				for specifying the source language or script of transformed
75				content,
76				including content
77				that
78				has been transliterated, transcribed, or
79				translated, or in some other way influenced by the source. It also provides for additional information used for
80				identification.
81			</t>
82		</abstract>
83	</front>
85	<middle>
86		<section title="Introduction">
87			<t>
88				<xref target="BCP47"></xref>
89				permits the definition and registration of language tag extensions
90				"that contain a language component and are compatible with
91				applications that
92				understand language tags". This document defines an
93				extension for
94				specifying the source of content that has been transformed,
95				including text that has been transliterated, transcribed, or
96				translated, or in some other way influenced by the source.
97				It may be used in queries to request content that has been
98				transformed.
99				The "singleton" identifier for this extension is 't'.
100			</t>
101			<t>
102				Language tags, as defined by
103				<xref target="BCP47"></xref>, are useful for identifying the language of content.
104				There are
105				mechanisms for specifying variant subtags for special purposes.
106				However, these variants are insufficient for specifying content that has
107				undergone
108				transformations,
109				including content that has been
110				transliterated,
111				transcribed, or
112				translated.
113				The correct interpretation of the content may depend upon knowledge of the conventions used for the transformation.
114			</t>
115			<t>
116			   Suppose that Italian or Russian
117			   cities on a map are transcribed for Japanese users. Each name needs to be
118			   transliterated into katakana using rules appropriate for the specific
119			   source and target language.   When tagging such data, it is important
120			   to be able to indicate not only the resulting content language ("ja"
121			   in this case), but also the source language.</t>
						<t>Transforms such as transliterations may vary depending not only on the
			   source and target script, but also on the source and target language.
124			   Thus the
125			   Russian &lt;U+041F U+0443 U+0442 U+0438 U+043D> (which corresponds to
126			   the Cyrillic &lt;PE, U, TE, I, EN>) transliterates into "Putin" in
127			   English but "Poutine" in French.  The identifier could be used to indicate
128			   a desired mechanical transformation in an API, or could be used to tag
129			   data that has been converted (mechanically or by hand) according to a
130			   transliteration method.</t>
131				<t>
132				In addition, many different conventions have arisen for how to transform text, even between the same languages and scripts.
133                For example, "Gaddafi" is commonly transliterated from Arabic to English as any of (G/Q/K/Kh)a(d/dh/dd/dhdh/th/zz)af(i/y).
				Some examples of standardized conventions used for transcribing or transliterating text include:
135                <list style="letters">
136					<t>United Nations Group of Experts on Geographical Names (UNGEGN)</t>
137					<t>US Library of Congress (LOC)</t>
138					<t>US Board on Geographic Names (BGN)</t>
139					<t>Korean Ministry of Culture, Sports and Tourism (MCST)</t>
140					<t>International Organization for Standardization (ISO)</t>
141			     </list>
142				</t>
143				<t>The usage of this extension is not limited to formal transformations,
144				and may include other instances where the content is in some other way influenced by the source.
145				For example, this extension could be used to designate a request for a speech recognizer
146				that is tailored specifically for 2nd-language speakers who are
147				1st-language speakers of a particular language (e.g. a recognizer for "English spoken with a Chinese accent").</t>
148			<section title="Requirements Language">
				<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
					NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
					in this
					document are to be interpreted as described in RFC 2119.</t>
154			</section>
155		</section>
159    <?rfc needLines="8" ?>
161		<section title="BCP47 Required Information">
162            <section title="Overview">
163				<t>
164					Identification of transformed content can be done using the 't' extension
165					defined in this document.
166					This extension is formed by the 't'
167					singleton followed by a sequence of subtags that would form a
168					language tag as defined by
169					<xref target="BCP47"></xref>.
170					This allows for the source language or script to be specified to
171					the degree of precision required.
172					There are restrictions on the
173					sequence of subtags.
174					They MUST form a regular, valid, canonical
175					language
176					tag, and MUST neither include extensions nor private use
177					sequences introduced by the
178					singleton
179					'x'.
180					Where only the script is
181					relevant (such as identifying
182					a
183					script-script
184					transliteration) then
185					'und' is used for the primary language subtag.
186				</t>
187				<t>For example:</t>
188				<texttable>
189					<ttcol>Language Tag</ttcol>
191					<ttcol>Description</ttcol>
193					<c>ja-t-it</c>
195					<c>The content is Japanese, transformed from Italian.</c>
197					<c>ja-Kana-t-it</c>
199					<c>The content is Japanese Katakana, transformed from Italian.</c>
201					<c>und-Latn-t-und-cyrl</c>
203					<c>The content is in the Latin script, transformed from the Cyrillic
204						script.</c>
206				</texttable>
207				<t>
208					Note that the sequence of subtags governed by 't' cannot contain a
209					singleton (a single-character subtag), because that would start a
210					new extension.
211					For example, the tag "ja-t-i-ami"
212					does not indicate
213					that the source is in "i-ami", because "i-ami" is not a
214					regular
215					language tag in
216					<xref target="BCP47"></xref>. That tag would express an empty 't' extension followed by an 'i'
217					extension.
218				</t>
219				<t>The 't' extension is not intended for use in structured data that already provides
220				separate source and target language identifiers.
221				For example, this is the case in localization interchange formats such as XLIFF.
222				In such cases, it would be inappropriate to use "ja-t-it" for the target language tag because the source language tag
223				"it" would already be present in the data. Instead one would use the language tag "ja".
224				</t>
225				<t>As noted earlier, it is sometimes necessary to indicate additional
226					information about a transformation.
227					This additional information is optionally supplied after the source in a series of one or more fields,
228					where each field consists of a field separator subtag followed by one or more non-separator subtags.
229					Each field separator subtag consists of a single letter followed by a single digit.
230					</t>
				<t>A transformation mechanism is an optional field that indicates
					the specification used for the transformation, such as "UNGEGN" for
					the United Nations Group of Experts on Geographical Names
					transliterations and transcriptions. It uses the 'm0' field separator
					followed by one or more mechanism subtags.
239				</t>
240				<t>For example:</t>
241				<texttable>
242					<ttcol>Language Tag</ttcol>
244					<ttcol>Description</ttcol>
246					<c>und-Cyrl-t-und-latn-m0-ungegn-2007</c>
					<c>The content is in Cyrillic, transformed from Latin, according
						to a
						UNGEGN specification dated 2007.</c>
252				</texttable>
253				<t>The field separator subtags such as 'm0' were chosen because they are
254					short, visually distinctive,
255					and cannot occur in a language subtag
256					(outside of an extension and
257					after 'x'),
258					thus eliminating the
259					potential for collision or confusion with the
260					source language tag.</t>
261				<t>
262					The field subtags are defined by
263					<eref target="http://unicode.org/reports/tr35/">Section 3</eref>
264					of
265					<xref target="UTS35">Unicode
266						Technical Standard #35: Unicode Locale Data
267						Markup Language</xref> (LDML), the main specification for the Unicode
268                    Common Locale Data Repository (CLDR) project.
269                    As required by BCP 47, subtags follow the language tag ABNF and
270					other rules for the formation of language tags and subtags, are
271					restricted to the ASCII letters and digits, are not case sensitive,
272					and do not exceed eight characters in length.
273				</t>
274				<t>
275					EDITORIAL NOTE: This new facility has been accepted by the Unicode
276				    CLDR committee for incorporation into the next versions of CLDR and LDML, parallel
277					with the structure of the 'u' extension
278					<xref target="RFC6067"></xref>,
279					for which it is already the maintaining authority.
					The data and
					specification will be available by the time this Internet-Draft
					has been approved.
285				</t>
286				<t>The LDML specification is available over the Internet and at no cost, and
287					is
288					available via a royalty-free license at
289					http://unicode.org/copyright.html. LDML is versioned, and each
290					version of LDML is numbered, dated, and stable. Extension subtags,
291					once
292					defined by LDML, are never retracted or substantially changed in meaning. </t>
293				<t>The maintaining authority for the 't' extension is
294					the Unicode
295					Consortium:</t>
297				<texttable>
298					<ttcol>Item</ttcol>
300					<ttcol>Value</ttcol>
302					<c>Name</c>
304					<c>Unicode Consortium</c>
306					<c>Contact Email</c>
308					<c>cldr-contact@unicode.org</c>
310					<c>Discussion List Email</c>
312					<c>cldr-users@unicode.org</c>
314					<c>URL Location</c>
316					<c>cldr.unicode.org</c>
318					<c>Specification</c>
320					<c>Unicode Technical Standard #35 Unicode Locale Data Markup
321						Language (LDML), http://unicode.org/reports/tr35/</c>
322					<c>Section</c>
324					<c>Section 3 Unicode Language and Locale Identifiers</c>
325				</texttable>
326            </section>
327			<section title="Structure" anchor="structure">
328				<t>The subtags in the 't' extension are of the following form:</t>
<figure>
<artwork type='abnf'>
t-ext=    "t"                      ; Extension
          (("-" lang *("-" field)) ; Source + optional field(s)
          / 1*("-" field))         ; Field(s) only (no source)

lang=     language                 ; BCP47, with restrictions
          ["-" script]
          ["-" region]
          *("-" variant)

field=    sep 1*("-" 3*8alphanum)  ; With restrictions

sep=      ALPHA DIGIT              ; Subtag separators
alphanum= ALPHA / DIGIT
</artwork>
</figure>
                <t>where the &lt;language>, &lt;script>, &lt;region>, and &lt;variant> rules are specified in <xref target="BCP47"></xref>,
                and the &lt;ALPHA> and &lt;DIGIT> rules are specified in <xref target="RFC5234"></xref>.</t>
348				<t>Description and restrictions:
349					<list style="letters">
350						<t>The 't' extension MUST have at least one subtag.</t>
351						<t>
352							The 't' extension normally starts with a source language tag,
353							which MUST be a regular, canonical language tag as specified by
354							<xref target="BCP47"></xref>.
355							Tags described by the 'irregular' production in BCP 47 MUST NOT
356							be
357							used to form the language tag.
358							The source language tag MAY be
359							omitted: some field values do not
360							require it.
361						</t>
362						<t>There is optionally a sequence of fields, where each field has a
363							separator followed by a sequence of one or more subtags.
364							Two identical field
365							separators MUST NOT be present in the language tag.</t>
366						<t>
367							The order of the fields in a 't' extension is not significant. The order of subtags within a field is significant.
368							(See
369							<xref target='canonicalization' />
370							Canonicalization.)
371						</t>
372		                <t>
373		                    The 't' subtag fields are defined by
374		                    <eref target="http://unicode.org/reports/tr35/">Section 3</eref>
375		                    of
376		                    <xref target="UTS35">Unicode
377		                        Technical Standard #35: Unicode Locale
378		                        Data Markup Language</xref>.
379		                </t>
380					</list>
381				</t>
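				<t>The grammar and restrictions above can be illustrated with a short Python sketch.
				It is not part of this specification. The function name parse_t_extension is
				hypothetical, and the sketch assumes that the input is already a well-formed
				language tag: it checks the shape of the 't' extension only and does not
				validate the source language tag against
				<xref target="BCP47"></xref>.</t>
				<figure><artwork type="python"><![CDATA[
import re

# Field separator per the ABNF above: one letter followed by one digit.
SEP = re.compile(r"^[a-z][0-9]$")
# Field value subtag per the ABNF above: 3 to 8 alphanumerics.
VALUE = re.compile(r"^[a-z0-9]{3,8}$")


def parse_t_extension(tag):
    """Split a language tag into (source, fields) for its 't' extension.

    Returns None if the tag has no 't' extension. Raises ValueError if
    the extension is empty, repeats a separator, or has a malformed
    field. The source language tag is NOT validated here.
    """
    subtags = tag.lower().split("-")
    # Look for the 't' singleton after the primary language subtag;
    # stop at 'x', which starts a private use sequence.
    start = None
    for i, s in enumerate(subtags[1:], 1):
        if s == "x":
            break
        if s == "t":
            start = i + 1
            break
    if start is None:
        return None
    # The extension ends at the next singleton (one-character subtag).
    end = start
    while end < len(subtags) and len(subtags[end]) > 1:
        end += 1
    body = subtags[start:end]
    if not body:
        raise ValueError("the 't' extension MUST have at least one subtag")
    # Everything before the first separator is the source language tag.
    first_sep = next(
        (i for i, s in enumerate(body) if SEP.match(s)), len(body))
    source = "-".join(body[:first_sep]) or None
    fields = {}
    key = None
    for s in body[first_sep:]:
        if SEP.match(s):
            if s in fields:
                raise ValueError("duplicate field separator: " + s)
            key = s
            fields[key] = []
        elif VALUE.match(s):
            fields[key].append(s)
        else:
            raise ValueError("malformed field subtag: " + s)
    for k, v in fields.items():
        if not v:
            raise ValueError("field has no subtags: " + k)
    return source, fields
]]></artwork></figure>
				<t>With this sketch, "und-Cyrl-t-und-latn-m0-ungegn-2007" yields the source
				"und-latn" and a single 'm0' field holding the subtags "ungegn" and "2007",
				while "ja-t-i-ami" is rejected because its 't' extension is empty.</t>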
382			</section>
383            <section title="Canonicalization" anchor="canonicalization">
384                <t>As required by
385                    <xref target="BCP47"></xref>, the use of uppercase or lowercase letters is not significant in
386                    the subtags used in this extension. The canonical form for all
387                    subtags in the extension is lowercase, with the fields ordered by
388                    the separators, alphabetically.
389                    The order of subtags within a field is significant, and MUST NOT be changed in the process of canonicalizing.</t>
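                <t>As a non-normative illustration, the canonicalization rule can be sketched in Python.
                The function name canonicalize_t_extension is hypothetical. The sketch assumes
                that its input is a syntactically valid 't' extension, and it does not apply any
                further BCP 47 canonicalization to the source language tag.</t>
                <figure><artwork type="python"><![CDATA[
import re

SEP = re.compile(r"^[a-z][0-9]$")  # field separator: letter + digit


def canonicalize_t_extension(ext):
    """Canonicalize a 't' extension such as "t-it-m0-ungegn-2007".

    Lowercases every subtag and sorts the fields by separator. The
    order of subtags inside each field, and of the subtags in the
    source language tag, is left untouched.
    """
    subtags = ext.lower().split("-")
    if subtags[0] != "t":
        raise ValueError("expected an extension starting with 't'")
    source, fields = [], []
    for s in subtags[1:]:
        if SEP.match(s):
            fields.append([s])       # start a new field
        elif fields:
            fields[-1].append(s)     # subtag inside the current field
        else:
            source.append(s)         # part of the source language tag
    fields.sort(key=lambda f: f[0])  # order fields by separator
    return "-".join(["t"] + source + [s for f in fields for s in f])
]]></artwork></figure>
                <t>For example, "T-IT-M0-UNGEGN-2007" becomes "t-it-m0-ungegn-2007". If a second
                field existed (the separator 'z9' is used here purely as a made-up example),
                "t-und-Latn-z9-bbb-aaa-m0-ungegn" would become "t-und-latn-m0-ungegn-z9-bbb-aaa":
                the fields are reordered, but "bbb-aaa" keeps its internal order.</t>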
390            </section>
391            <section title="BCP47 Registration Form" anchor="regform">
392                <t>
393                    Per
394                    <xref target="BCP47">RFC 5646, Section 3.7</xref>:
395                </t>
396                <figure>
397                    <artwork>
Identifier: t
Description: Specifying Transformed Content
Comments: Subtags for the identification of content that has been
transformed, including but not limited to:
transliteration, transcription, and translation.
Added: 2010-mm-dd
RFC: [TBD]
Authority: Unicode Consortium
Contact_Email: cldr-contact@unicode.org
Mailing_List: cldr-users@unicode.org
URL: http://www.unicode.org/Public/cldr/latest/core.zip
%% </artwork>
411                </figure>
413            </section>
414            <section title="Field Definitions" anchor="summary">
415                <t>Assignment of 't' field subtags is determined by the Unicode CLDR
416                    Technical Committee, in accordance with the policies and procedures
417                    in
418                    <eref target="http://www.unicode.org/consortium/tc-procedures.html">http://www.unicode.org/consortium/tc-procedures.html</eref>,
419                    and subject to the Unicode Consortium Policies on
420                    <eref target="http://www.unicode.org/policies/policies.html">http://www.unicode.org/policies/policies.html</eref>.</t>
421                <t>
422                    Assignments that can be made by successive versions of
423                    <xref target="UTS35">LDML</xref>
424                    by the Unicode Consortium without requiring a new RFC include:
425                    <list style="symbols">
426                    <t>The
427                    allocation of new field separator subtags for use after the 't' extension.</t>
428                    <t>The allocation of subtags valid after a field separator subtag.</t>
429                    <t>The addition of subtag aliases and descriptions. </t>
430                    <t>The modification of subtag descriptions.</t>
431                    </list>
432                    Changes to the syntax or meaning of the 't' extension would require a new
433                    RFC that obsoletes this document; such an RFC would break stability, and
434                    would thus be contrary to the policies of the Unicode Consortium.
435                </t>
436				<t>
437				  At the time this document was published, one field was specified in
438				  <xref target="UTS35"></xref>: the transform mechanism.
439                  That field is summarized here:
440					<list style="letters">
441						<t>
442							The transform mechanism consists of a sequence of
443							subtags
444							starting
445							with the 'm0' separator followed by one or more
446							mechanism subtags.
447							Each mechanism subtag has a length of 3 to 8
448							alphanumeric
449							characters.
450							The sequence as a whole provides an
451							identification of the
452							specification
453							for the transform,
454							such as the
455							mechanism subtag 'ungegn' in
456							"und-Cyrl-t-und-latn-m0-ungegn".
457							In
458							many cases, only one mechanism subtag is necessary, but
459							multiple
460							subtags MAY be defined in
461							<xref target="UTS35"></xref>
462							where necessary.
463						</t>
464						<t>
465							Any purely numeric subtag is a representation of a date in the
466							Gregorian calendar.
467							It MAY occur in any mechanism field, but it SHOULD only be used where necessary.
468							If it does occur:
469							<list style="symbols">
470								<t>it MUST occur as the final subtag in the field</t>
471								<t>it MUST NOT be the only subtag in the field</t>
472								<t>it MUST only consist of a sequence of digits of the form YYYY,
473									YYYYMM, or YYYYMMDD</t>
474                                <t>it SHOULD be as short as possible</t>
                            <t>Note: The format is related to that of <xref target="RFC3339"></xref>, but is not the same.
                            The RFC 3339 full-date cannot be used because it contains hyphens. The offset ("Z") is not used
                            because the date is a publication date (also known as a 'floating date'). For more information, see
                             Section 3.3, Floating Time in
                             <xref target="W3C-TimeZones"></xref>.</t>
480							</list>
481							Examples:
482							<list style="symbols">
483							<t>20110623 represents June 23rd, 2011.</t>
484							<t>There are 3 dated versions of the UNGEGN transliteration
485                            specification for Hebrew to Latin. They can be represented by the following language tags:
486                            <list style="symbols">
487                                <t>und-Hebr-t-und-Latn-m0-ungegn-1972</t>
488                                <t>und-Hebr-t-und-Latn-m0-ungegn-1977</t>
489                                <t>und-Hebr-t-und-Latn-m0-ungegn-2007</t>
490                            </list>
491							</t>
492							<t>Suppose that the BGN transliteration
493							specification for Cyrillic to Latin had three versions,
494							dated
495							June 11th, 1999; Dec 30th, 1999; and May 1st, 2011.
496							In that
497							case, the corresponding first two DATE subtags would require
498							months
499							to be distinctive (199906 and 199912), but the last
500							subtag
501							would only
502							require the year (2011).</t>
503							</list>
504						</t>
505						<t>
506							Some mechanisms may use a versioning system that is not
507							distinguished by date, or not by date alone.
508							In the latter case,
509							the version will be of a form specified by
510							<xref target="UTS35"></xref>
511							for that mechanism.
512							For example, if the mechanism XXX uses
513							versions of the form v21a,
514							then a tag could look like
515							"ja-t-it-m0-xxx-v21a". If there are
516							multiple subversions
517							distinguished by date,
518							then a tag could look like
519							"ja-t-it-m0-xxx-v21a-2007".
520						</t>
521					</list>
523				</t>
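				<t>The date rules for the mechanism field can be checked mechanically. The following
				Python sketch is illustrative only. The function name check_mechanism_dates is
				hypothetical, and rejecting values that are not real calendar dates (such as
				month 13) is an assumption drawn from the statement that a numeric subtag
				represents a Gregorian date. The "as short as possible" recommendation cannot
				be checked without knowing the other registered versions, so it is not covered.</t>
				<figure><artwork type="python"><![CDATA[
import re
from datetime import date

NUMERIC = re.compile(r"^[0-9]+$")
DATE = re.compile(r"^[0-9]{4}([0-9]{2}([0-9]{2})?)?$")


def check_mechanism_dates(subtags):
    """Check the date rules for the subtags of one 'm0' field.

    `subtags` excludes the separator, e.g. ["ungegn", "2007"].
    Returns the date subtag, or None if the field has none.
    Raises ValueError if a rule is violated.
    """
    numeric = [i for i, s in enumerate(subtags) if NUMERIC.match(s)]
    if not numeric:
        return None
    if numeric != [len(subtags) - 1]:
        raise ValueError("a date MUST be the final subtag in the field")
    if len(subtags) == 1:
        raise ValueError("a date MUST NOT be the only subtag")
    value = subtags[-1]
    if not DATE.match(value):
        raise ValueError("a date MUST be YYYY, YYYYMM, or YYYYMMDD")
    # Assumption: fill a missing month or day with 1 so that date()
    # rejects impossible values such as month 13 or February 30.
    year = int(value[0:4])
    month = int(value[4:6] or 1)
    day = int(value[6:8] or 1)
    date(year, month, day)  # raises ValueError if not a real date
    return value
]]></artwork></figure>
				<t>Under these assumptions, the fields "ungegn-2007" and "xxx-v21a-2007" pass and
				report the date "2007", a field without a numeric subtag reports no date, and
				fields such as "2007" alone or "2007-ungegn" are rejected.</t>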
524				<t>A language tag with the 't' extension MAY be used to request a specific transform of content.
525				In such a case, the recipient SHOULD return content that corresponds
526				as closely as feasible to the requested transform, including the specification of the mechanism.
527				For example, if the request is ja-t-it-m0-xxx-v21a-2007,
528				and the recipient has content corresponding to both ja-t-it-m0-xxx-v21a and ja-t-it-m0-xxx-v21b-2009, then the v21a version would be preferred.
529				As is the case for language matching as discussed in <xref target="BCP47"></xref>,
530				different implementations MAY have different measures of "closeness".</t>
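				<t>This document does not define a measure of closeness. Purely as an illustration,
				one simple measure is the number of leading subtags that a candidate shares with
				the request. The Python sketch below uses that measure. The function names
				common_prefix_length and best_match are hypothetical, and an implementation
				may well choose a different measure.</t>
				<figure><artwork type="python"><![CDATA[
def common_prefix_length(a, b):
    """Count the leading subtags shared by two tags, ignoring case."""
    n = 0
    for x, y in zip(a.lower().split("-"), b.lower().split("-")):
        if x != y:
            break
        n += 1
    return n


def best_match(request, available):
    """Return the available tag sharing the longest prefix with the
    request. On a tie, the earliest candidate in `available` wins."""
    return max(available,
               key=lambda tag: common_prefix_length(request, tag))
]]></artwork></figure>
				<t>For the request ja-t-it-m0-xxx-v21a-2007, the candidate ja-t-it-m0-xxx-v21a shares
				six leading subtags and ja-t-it-m0-xxx-v21b-2009 shares five, so this measure
				selects the v21a content, in line with the example above.</t>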
531			</section>
532			<section title="Registration of Field Subtags" anchor="registration">
533				<t>Registration of transform mechanisms is requested by filing a ticket at
534					<eref target="http://cldr.unicode.org/">cldr.unicode.org</eref>.
535					The proposal in the ticket MUST contain the following information:</t>
536				<texttable>
537                    <ttcol>Item</ttcol>
538                    <ttcol>Description</ttcol>
539                    <c>Subtag</c>
540                    <c>The proposed mechanism subtag (or subtag sequence).</c>
541                    <c>Description</c>
542                    <c>A description of the proposed mechanism; that description MUST be sufficient to distinguish it from other mechanisms in use.</c>
543                    <c>Version</c>
544                    <c>If versioning for the mechanism is not done according to date, then a description of the versioning conventions used for the mechanism.</c>
545				</texttable>
546                <t>Proposals for clarifications of descriptions or additional aliases may also be requested by filing a ticket.</t>
547                <t>The committee MAY define a template for submissions that requests more information,
548                 if it is found that such information would be useful in evaluating proposals.</t>
549			</section>
550            <section title="Registration of Additional Fields" anchor="field-registration">
551                <t>In the event that it proves necessary to add an additional field (such as 'm2'),
552                it can be requested by filing a ticket at
553                    <eref target="http://cldr.unicode.org/">cldr.unicode.org</eref>.
554                    The proposal in the ticket MUST contain a full description of the
555                    proposed field semantics and subtag syntax,
                    and MUST conform to the ABNF syntax for "field" presented in <xref target="structure" />.</t>
557            </section>
558            <section title="Committee Responses to Registration Proposals" anchor="committee-responses">
                <t>The committee MUST post each proposal publicly within 2 weeks after receipt,
                to allow for comments. The committee MUST respond publicly to each proposal within 4 weeks after receipt.</t>
561                <t>The response MAY:
562                    <list style="symbols">
563                        <t>request more information or clarification</t>
564                        <t>accept the proposal, optionally with modifications to the subtag or description</t>
565                        <t>reject the proposal, because of significant objections raised on the mailing list or
566                        due to problems with constraints in this document or in <xref target="UTS35"></xref></t>
567                    </list>
568                </t>
569                <t>Accepted tickets result in a new entry in the machine-readable CLDR BCP47 data,
570                or in the case of a clarified description,
571                modifications to the description attribute value for an existing entry.</t>
572            </section>
573            <section title="Machine-Readable Data" anchor="machine-readable">
574				<t>
575					EDITORIAL NOTE: The following parallels the structure used for the
576					'u' extension
577					<xref target="RFC6067"></xref>,
578					for which the Unicode Consortium is the maintaining authority.
					The
					data and
					specification will be available by the time this Internet-Draft
					has been approved. The description field is in the process of being added to CLDR.
585				</t>
586				<t>
587					Beginning with CLDR version 1.7.2, machine-readable files are
588					available listing the data defined for BCP47 extensions for each
589					successive version of
590					<xref target="UTS35"></xref>. These releases are listed on
591					<eref target="http://cldr.unicode.org/index/downloads">http://cldr.unicode.org/index/downloads</eref>.
592					Each release has an associated data directory of the form
593					"http://unicode.org/Public/cldr/&lt;version&gt;", where
594					"&lt;version&gt;" is replaced by the release number. For example,
595					for version 1.7.2, the "core.zip" file is located at
596					<eref target="http://unicode.org/Public/cldr/1.7.2/">http://unicode.org/Public/cldr/1.7.2/core.zip</eref>.
597					The most
598                    recent version is always identified by the version "latest" and can
599                    be accessed by the URL in
600                    <xref target="regform"></xref>.</t>
601			     <t>Inside the "core.zip" file, the directory "common/bcp47" contains the
602					data files listing the valid attributes, keys, and types for each successive version of <xref target="UTS35"></xref>.
					Each data file lists the keys and types relevant to that topic. For example, transform.xml contains the subtags (types) for the 't' mechanisms.</t>
604					<t>The XML structure lists the keys, such as &lt;key extension="t" name="m0" alias="collation" description="Transliteration extension mechanism">, with subelements for the types,
605					such as &lt;type name="ungegn" description="United Nations Group of Experts on Geographical Names"/>. The currently defined attributes for the mechanisms include:</t>
606			     <texttable>
607                    <ttcol>Attribute</ttcol>
608                    <ttcol>Description</ttcol>
609                    <ttcol>Examples</ttcol>
611                    <c>name</c>
612                    <c>The name of the mechanism, limited to 3-8 characters (or sequences of them).</c>
                    <c>UNGEGN, ALALOC</c>
615                    <c>description</c>
616                    <c>A description of the name, with all and only that information necessary to distinguish one name
617                     from others with which it might be confused.  Descriptions are not intended to provide general background information.</c>
618                    <c>United Nations Group of Experts on Geographical Names; American Library Association-Library of Congress</c>
620                    <c>since</c>
621                    <c>Indicates the first version of CLDR where the name appears. (Required for new items.)</c>
622                    <c>1.9, 2.0.1</c>
624                    <c>alias</c>
625                    <c>Alternative name of the key or type, not limited in number of characters. Aliases are intended for backwards compatibility,
626                    not to provide all possible alternate names or designations. (Optional)</c>
627                    <c></c>
629				</texttable>
630				<t>The file for the transform extension is "transform.xml".
631				The initial version of that file contains the following information.</t>
632				<figure><artwork>
&lt;key extension="t" name="m0" description=
      "Transliteration extension mechanism"/>
   &lt;type name="ungegn" description=
      "United Nations Group of Experts on Geographical Names"/>
   &lt;type name="alaloc" description=
      "American Library Association-Library of Congress"/>
   &lt;type name="bgn" description=
      "US Board on Geographic Names"/>
   &lt;type name="mcst" description=
      "Korean Ministry of Culture, Sports and Tourism"/>
   &lt;type name="iso" description=
      "International Organization for Standardization"/>
   &lt;type name="din" description=
      "Deutsches Institut fuer Normung"/>
   &lt;type name="gost" description=
      "Euro-Asian Council for Standardization, Metrology
       and Certification"/>
651				</artwork></figure>
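				<t>As a rough, non-normative sketch, data of this shape can be read with a standard
				XML library. The Python example below uses a small inline sample in place of the
				real file. The wrapping "keyword" element and the nesting of the types inside the
				key are assumptions made for the sample, and the function name load_t_fields is
				hypothetical. The parser used here is non-validating, so the sketch does not
				retrieve version information.</t>
				<figure><artwork type="python"><![CDATA[
import xml.etree.ElementTree as ET

# Hypothetical excerpt; the wrapping <keyword> element and the
# nesting of <type> inside <key> are assumptions for this sample.
SAMPLE = """
<keyword>
  <key extension="t" name="m0"
       description="Transliteration extension mechanism">
    <type name="ungegn"
      description="United Nations Group of Experts on Geographical Names"/>
    <type name="bgn" description="US Board on Geographic Names"/>
  </key>
</keyword>
"""


def load_t_fields(xml_text):
    """Map each 't' field separator to {type name: description}."""
    root = ET.fromstring(xml_text.strip())
    fields = {}
    for key in root.iter("key"):
        if key.get("extension") != "t":
            continue  # skip keys that belong to other extensions
        fields[key.get("name")] = {
            t.get("name"): t.get("description", "")
            for t in key.iter("type")
        }
    return fields
]]></artwork></figure>
				<t>Calling load_t_fields(SAMPLE) produces one entry for the 'm0' separator, mapping
				"ungegn" and "bgn" to their descriptions. Such a table could be used to check
				that a mechanism subtag in a tag is one that has been registered.</t>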
652				<t>
653					To get the version information in XML when working with the data
654					files, the XML parser must be validating. When the 'core.zip' file
655					is unzipped, the 'dtd' directory will be at the same level as the
656					'bcp47' directory; that is required for correct validation. For
657					each release after CLDR 1.8, types introduced in that release are
658					also marked in the data files by the XML attribute "since", such as
659					in the following example:
660					<figure>
661						<artwork>&lt;type name="adp" since="1.9"/&gt; </artwork>
662					</figure>
663				</t>
664				<t>
665					The data is also currently maintained in a source code repository,
666					with each release tagged, for viewing directly without unzipping.
667					For example, see:
668					<list style="symbols">
669						<t>http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/</t>
670						<t>http://unicode.org/repos/cldr/tags/release-1-8/common/bcp47/</t>
671					</list>
672				</t>
673				<t>For more information, see
674				<eref target="http://cldr.unicode.org/index/bcp47-extension">http://cldr.unicode.org/index/bcp47-extension</eref>.</t>
675			</section>
676		</section>
677		<section anchor="Acknowledgements" title="Acknowledgements">
678			<t>Thanks to John Emmons and the rest of the Unicode
679				CLDR Technical
680				Committee for their work in developing the BCP 47 subtags
681				for LDML.</t>
682		</section>
684		<section anchor="IANA" title="IANA Considerations">
685			<t>
686				This document will require IANA to insert the record of
687				<xref target="regform"></xref>
688				into the Language Extensions Registry, according to
689				Section 3.7,
690				Extensions and the Extensions Registry of "Tags for
691				Identifying
692				Languages" in
693				<xref target="BCP47"></xref>. Per Section 5.2 of
694				<xref target="BCP47"></xref>, there might be occasional (rare) requests by the Unicode
695				Consortium (the "Authority" listed in the record) for maintenance of
696				this record. Changes that can be submitted to IANA without the
697				publication of a new RFC are limited to modification of the
698				Comments, Contact_Email, Mailing_List, and URL fields. Any such
699				requested changes MUST use the domain 'unicode.org' in any new
700				addresses or URIs, MUST explicitly cite this document (so that IANA
701				can reference these requirements), and MUST originate from the
702				'unicode.org' domain. The domain or authority can only be changed
703				via a new RFC.
704			</t>
705			<t>This document does not require IANA to create or maintain a new
706				registry or otherwise impact IANA.</t>
707		</section>
709		<section anchor="Security" title="Security Considerations">
710			<t>
711				The security considerations for this extension are the same as those
712				for
713				<xref target="BCP47"></xref>. See
714				<xref target="BCP47">RFC 5646, Section 6, Security Considerations</xref>.
715			</t>
716		</section>
717	</middle>
721	<back>
722		<references title="Normative References">
723			<reference anchor="UTS35" target="http://www.unicode.org/reports/tr35/">
724				<front>
725					<title abbrev="LDML">
726						Unicode Technical Standard #35: Locale Data
727						Markup Language (LDML)
728						</title>
729					<author initials="M" surname="Davis" fullname="Mark Davis">
730						<organization>Unicode Consortium</organization>
731					</author>
732					<date day="21" month="December" year="2007" />
733				</front>
734			</reference>
735			<reference anchor="BCP47">
736				<front>
					<title abbrev="BCP47">Tags for Identifying Languages</title>
738					<author initials="M.E." surname="Davis" fullname="Mark Davis"
739						role="editor">
740						<organization>Google</organization>
741					</author>
742                    <author initials="A." surname="Phillips" fullname="Addison Phillips"
743                        role="editor">
744                        <organization>Lab126</organization>
745                    </author>
746					<date month="September" year="2009" />
747				</front>
748			</reference>
749			<reference anchor="RFC6067">
750				<front>
751					<title abbrev="RFC6067">BCP 47 Extension U</title>
752					<author initials="M.E." surname="Davis" fullname="Mark Davis"
753						role="editor">
						<organization>Google</organization>
756					</author>
757                    <author initials="A." surname="Phillips" fullname="Addison Phillips"
758                        role="editor">
759                        <organization>Lab126</organization>
760                    </author>
761                    <author initials="Y." surname="Umaoka" fullname="Yoshito Umaoka"
762                        role="editor">
763                        <organization>IBM</organization>
764                    </author>
					<date month="December" year="2010" />
766				</front>
767			</reference>
768			<reference anchor="RFC5234">
769				<front>
770					<title>Augmented BNF for Syntax Specifications: ABNF</title>
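					<author initials="D." surname="Crocker" fullname="Dave Crocker"
                        role="editor">
						<organization>Brandenburg InternetWorking</organization>
					</author>
					<author initials="P." surname="Overell" fullname="Paul Overell">
						<organization>THUS plc.</organization>
					</author>
					<date month="January" year="2008" />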
776					<abstract>
777						<t>   Internet technical specifications often need to define a formal
778   syntax.  Over the years, a modified version of Backus-Naur Form
779   (BNF), called Augmented BNF (ABNF), has been popular among many
780   Internet specifications.  The current specification documents ABNF.
781   It balances compactness and simplicity with reasonable
782   representational power.  The differences between standard BNF and
783   ABNF involve naming rules, repetition, alternatives, order-
784   independence, and value ranges.  This specification also supplies
785   additional rule definitions and encoding for a core lexical analyzer
786   of the type common to several Internet specifications.</t>
787					</abstract>
788				</front>
789			</reference>
790		</references>
791		<references title="Informative References">
792			<reference anchor="ldml-registry">
793				<front>
794					<title>Registry for Common Locale Data Repository tag elements</title>
795					<author fullname="Unicode Consortium"></author>
796					<date year="2009" month="September" />
797				</front>
798			</reference>
799            <reference anchor="W3C-TimeZones" target="http://www.w3.org/TR/2011/NOTE-timezone-20110705/">
800                <front>
801                    <title>W3C Working Group Note: Working with Time Zones</title>
802                    <author surname="Phillips" fullname="Addison Phillips" role="editor">
803                        <organization>W3C</organization>
804                    </author>
805                    <date year="2011" month="July" />
806                </front>
807            </reference>
808            <reference anchor="RFC3339">
809                <front>
810                    <title>Date and Time on the Internet: Timestamps</title>
811                    <author surname="Klyne" fullname="Graham Klyne"
812                        role="editor">
813                        <organization>Clearswift Corporation</organization>
814                    </author>
815                    <author surname="Newman" fullname="Chris Newman"
816                        role="editor">
817                        <organization>Sun Microsystems</organization>
818                    </author>
                    <date month="July" year="2002" />
                    <abstract>
                        <t>   This document defines a date and time format for use in Internet
   protocols that is a profile of the ISO 8601 standard for
   representation of dates and times using the Gregorian calendar.
                        </t>
                    </abstract>
828                </front>
829            </reference>
830		</references>
833	</back>