Lines Matching +full:use +full:- +full:external +full:- +full:names

1 <?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
3 <!-- LAST TOUCHED BY: Tim Bray, 8 February 1997 --><!-- The words 'FINAL EDIT' in comments mark pla…
5 publication. --><!ENTITY XML.version "1.0">
8 <!ENTITY w3c.doc.date "02-Feb-1998">
19 <!ENTITY mdash "--">
20 <!-- &#x2014, but nsgmls doesn't grok hex --><!ENTITY com "--">
21 <!ENTITY como "--">
22 <!ENTITY comc "--">
24 <!-- <!ENTITY nbsp " "> --><!ENTITY nbsp "&#160;">
30 <!-- audience and distribution status: for use at publication time --><!ENTITY doc.audience "publi…
34 <!-- for Panorama *-->
40 <w3c-designation>REC-xml-&iso6.doc.date;</w3c-designation>
41 <w3c-doctype>W3C Recommendation</w3c-doctype>
45 <loc href="http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;">
46 http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;</loc>
47 <loc href="http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.xml">
48 http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.xml</loc>
49 <loc href="http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.html">
50 http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.html</loc>
51 <loc href="http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.pdf">
52 http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.pdf</loc>
53 <loc href="http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.ps">
54 http://www.w3.org/TR/1998/REC-xml-&iso6.doc.date;.ps</loc>
57 <loc href="http://www.w3.org/TR/REC-xml">
58 http://www.w3.org/TR/REC-xml</loc>
61 <loc href="http://www.w3.org/TR/PR-xml-971208">
62 http://www.w3.org/TR/PR-xml-971208</loc>
63 <!--
64 <loc href='http://www.w3.org/TR/WD-xml-961114'>
65 http://www.w3.org/TR/WD-xml-961114</loc>
66 <loc href='http://www.w3.org/TR/WD-xml-lang-970331'>
67 http://www.w3.org/TR/WD-xml-lang-970331</loc>
68 <loc href='http://www.w3.org/TR/WD-xml-lang-970630'>
69 http://www.w3.org/TR/WD-xml-lang-970630</loc>
70 <loc href='http://www.w3.org/TR/WD-xml-970807'>
71 http://www.w3.org/TR/WD-xml-970807</loc>
72 <loc href='http://www.w3.org/TR/WD-xml-971117'>
73 http://www.w3.org/TR/WD-xml-971117</loc>-->
82 <author><name>C. M. Sperberg-McQueen</name>
108 corrected) for use on the World Wide Web. It is a product of the W3C
113 <p>This specification uses the term URI, which is defined by <bibref ref="Berners-Lee"/>, a work in…
117 <loc href="http://www.w3.org/XML/xml-19980210-errata">http://www.w3.org/XML/xml-19980210-errata</lo…
119 <loc href="mailto:xml-editor@w3.org">xml-editor@w3.org</loc>.
126 World-Wide Web Consortium, XML Working Group, 1996, 1997.</p>
133 <language id="ebnf">Extended Backus-Naur Form (formal grammar)</language>
137 <sitem>1997-12-03 : CMSMcQ : yet further changes</sitem>
138 <sitem>1997-12-02 : TB : further changes (see TB to XML WG,
140 <sitem>1997-12-02 : CMSMcQ : deal with as many corrections and
142 entify hard-coded document date in pubdate element,
145 about reference to Berners-Lee et al.),
149 re-order back matter so normative appendices come first,
150 re-tag back matter so informative appendices are tagged informdiv1,
156 add reference to 'Fielding draft' (Berners-Lee et al.),
158 drop URIchar non-terminal and use SkipLit instead,
175 <sitem>1997-12-01 : JB : add some column-width parameters</sitem>
176 <sitem>1997-12-01 : CMSMcQ : begin round of changes to incorporate
184 change grammar's handling of internal subset (drop non-terminal markupdecls),
186 add integral-declaration constraint on internal subset,
191 add description of how to generate our name-space rules from
194 <sitem>1997-10-08 : TB : Removed %-constructs again, new rules
196 <sitem>1997-10-01 : TB : Case-sensitive markup; cleaned up
197 element-type defs, lotsa little edits for style</sitem>
198 <sitem>1997-09-25 : TB : Change to elm's new DTD, with
199 substantial detail cleanup as a side-effect</sitem>
200 <sitem>1997-07-24 : CMSMcQ : correct error (lost *) in definition
202 <sitem>Allow all empty elements to have end-tags, consistent with
204 <sitem>1997-07-23 : CMSMcQ : pre-emptive strike on pending corrections:
205 introduce the term 'empty-element tag', note that all empty elements
206 may use it, and elements declared EMPTY must use it.
214 <sitem>1997-06-30 : CMSMcQ : change date, some cosmetic changes,
221 <sitem>1997-06-29 : TB : various edits</sitem>
222 <sitem>1997-06-29 : CMSMcQ : further changes:
228 <sitem>1997-06-28 : CMSMcQ : Various changes for 1 July draft:
239 <sitem>1997-04-02 : CMSMcQ : final corrections of editorial errors
241 well-formed: Webster's Second hyphenates it, and that's enough
243 <sitem>1997-04-01 : CMSMcQ : corrections from JJC, EM, HT, and self</sitem>
244 <sitem>1997-03-31 : Tim Bray : many changes</sitem>
245 <sitem>1997-03-29 : CMSMcQ : some Henry Thompson (on entity handling),
250 <sitem>1997-03-28 : CMSMcQ : make as many corrections as possible, from
257 but 8879 uses that name for both internal and external entities.)</sitem>
258 <sitem>1997-03-26 : CMSMcQ : resynch the two forks of this draft, reapply
259 my changes dated 03-20 and 03-21. Normalize old 'may not' to 'must not'
261 <sitem>1997-03-21 : TB : massive changes on plane flight from Chicago
263 <sitem>1997-03-21 : CMSMcQ : correct as many reported errors as possible.
265 <sitem>1997-03-20 : CMSMcQ : correct typos listed in CMSMcQ hand copy of spec.</sitem>
266 <sitem>1997-03-20 : CMSMcQ : cosmetic changes preparatory to revision for
271 <sitem>1996-11-12 : CMSMcQ : revise using Tim's edits:
274 Suppress QuotedNames, Names (not used).
275 Correct trivial-grammar doc type decl.
279 Charref should use just [0-9] not Digit.
283 Clarify discussion of encoding names.
288 Reserve entity names of the form u-NNNN.
294 <sitem>1996-11-11 : CMSMcQ : revise for style.
296 <sitem>1996-11-10 : CMSMcQ : revise for style.
297 Fix / complete section on names, characters.
301 <sitem>1996-10-31 : TB : Add Entity Handling section</sitem>
302 <sitem>1996-10-30 : TB : Clean up term &amp; termdef. Slip in
304 <sitem>1996-10-28 : TB : Change DTD. Implement some of Michael's
306 XML namespace reservation. Add section on white-space handling.
308 <sitem>1996-10-24 : CMSMcQ : quick tweaks, implement some ERB
312 in marked sections. Call them attribute-value pairs not
313 name-value pairs, except once. Internal subset is optional, needs
316 <sitem>1996-10-16 : TB : track down &amp; excise all DSD references;
318 <sitem>1996-10-?? : TB : consistency check, fix up scraps so
320 <sitem>1996-10-10/11 : CMSMcQ : various maintenance, stylistic, and
332 section on partial-DTD summary PIs to end of Logical Structures
334 Revise DSD syntax section to use Tim's subset-in-a-PI
336 <sitem>1996-10-10 : TB : eliminate name recognizers (and more?)</sitem>
337 <sitem>1996-10-09 : CMSMcQ : revise for style, consistency through 2.3
339 <sitem>1996-10-09 : CMSMcQ : re-unite everything for convenience,
341 <sitem>1996-10-08 : TB : first major homogenization pass</sitem>
342 <sitem>1996-10-08 : TB : turn "current" attribute on div type into
344 <sitem>1996-10-02 : TB : remould into skeleton + entities</sitem>
345 <sitem>1996-09-30 : CMSMcQ : add a few more sections prior to exchange
347 <sitem>1996-09-20 : CMSMcQ : finish transcribing notes.</sitem>
348 <sitem>1996-09-19 : CMSMcQ : begin transcribing notes for draft.</sitem>
349 <sitem>1996-09-13 : CMSMcQ : made outline from notes of 09-06,
355 <div1 id="sec-intro">
358 data objects called <termref def="dt-xml-doc">XML documents</termref> and
366 <p>XML documents are made up of storage units called <termref def="dt-entity">entities</termref>, w…
368 Parsed data is made up of <termref def="dt-character">characters</termref>,
370 of which form <termref def="dt-chardata">character data</termref>,
371 and some of which form <termref def="dt-markup">markup</termref>.
375 <p><termdef id="dt-xml-proc" term="XML Processor">A software module
377 and provide access to their content and structure.</termdef> <termdef id="dt-app" term="Application…
383 <div2 id="sec-origin-goals">
403 <item><p>XML documents should be human-legible and reasonably
420 <!-- is for &doc.audience;.-->
428 <div2 id="sec-terminology">
438 <def><p><termdef id="dt-may" term="May">Conforming documents and XML
446 <!-- do NOT change this! this is what defines a violation of
447 a 'must' clause as 'an error'. -MSM -->
452 <def><p><termdef id="dt-error" term="Error">A violation of the rules of this
459 <def><p><termdef id="dt-fatal" term="Fatal Error">An error
460 which a conforming <termref def="dt-xml-proc">XML processor</termref>
484 <termref def="dt-valid">valid</termref> XML documents.
487 <termref def="dt-validating">validating XML processors</termref>.</p></def>
490 <label>well-formedness constraint</label>
491 <def><p>A rule which applies to all <termref def="dt-wellformed">well-formed</termref> XML document…
492 Violations of well-formedness constraints are
493 <termref def="dt-fatal">fatal errors</termref>.</p></def>
498 <def><p><termdef id="dt-match" term="match">(Of strings or names:)
499 Two strings or names being compared must be identical.
519 <def><p><termdef id="dt-compat" term="For Compatibility">A feature of
525 <def><p><termdef id="dt-interop" term="For interoperability">A
526 non-binding recommendation included to increase the chances that XML
537 <!-- &Docs; -->
539 <div1 id="sec-documents">
542 <p><termdef id="dt-xml-doc" term="XML Document">
545 <termref def="dt-wellformed">well-formed</termref>, as
547 A well-formed XML document may in addition be
548 <termref def="dt-valid">valid</termref> if it meets certain further
552 …t is composed of units called <termref def="dt-entity">entities</termref>. An entity may <termref…
553 inclusion in the document. A document begins in a "root" or <termref def="dt-docent">document enti…
561 in <specref ref="wf-entities"/>.
564 <div2 id="sec-well-formed">
565 <head>Well-Formed XML Documents</head>
567 <p><termdef id="dt-wellformed" term="Well-Formed">
569 a well-formed XML document if:</termdef>
572 matches the production labeled <nt def="NT-document">document</nt>.</p></item>
574 meets all the well-formedness constraints given in this specification.</p>
576 <item><p>Each of the <termref def="dt-parsedent">parsed entities</termref>
578 <titleref href="wf-entities">well-formed</titleref>.</p></item>
583 <prod id="NT-document"><lhs>document</lhs>
584 <rhs><nt def="NT-prolog">prolog</nt>
585 <nt def="NT-element">element</nt>
586 <nt def="NT-Misc">Misc</nt>*</rhs></prod>
589 <p>Matching the <nt def="NT-document">document</nt> production
593 <termref def="dt-element">elements</termref>.</p>
595 <!--* N.B. some readers (notably JC) find the following
601 could however use some recasting when the editors are feeling
602 stronger. -MSM *-->
603 <item><p><termdef id="dt-root" term="Root Element">There is exactly
605 part of which appears in the <termref def="dt-content">content</termref> of any other element.</ter…
606 For all other elements, if the start-tag is in the content of another
607 element, the end-tag is in the content of the same element. More
608 simply stated, the elements, delimited by start- and end-tags, nest
613 <p><termdef id="dt-parentchild" term="Parent/Child">As a consequence
615 for each non-root element
628 <p><termdef id="dt-text" term="Text">A parsed entity contains
630 <termref def="dt-character">characters</termref>,
632 <termdef id="dt-character" term="Character">A <term>character</term>
637 The use of "compatibility characters", as defined in section 6.8
643 <prod id="NT-Char"><lhs>Char</lhs>
644 <rhs>#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
645 | [#x10000-#x10FFFF]</rhs>
653 vary from entity to entity. All XML processors must accept the UTF-8
654 and UTF-16 encodings of 10646; the mechanisms for signaling which of
655 the two is in use, or for bringing other encodings into play, are
658 <!--
662 UCS-4 code value.
663 </p>-->
666 <div2 id="sec-common-syn">
670 <p><nt def="NT-S">S</nt> (white space) consists of one or more space (#x20)
676 <prod id="NT-S"><lhs>S</lhs>
688 <p><termdef id="dt-name" term="Name">A <term>Name</term> is a token
692 Names beginning with the string "<code>xml</code>", or any string
698 <p>The colon character within XML names is reserved for experimentation with
703 (There is no guarantee that any name-space mechanism
704 adopted for XML will in fact use the colon as a name-space delimiter.)
705 In practice, this means that authors should not use the colon in XML
706 names except as part of name-space experiments, but that XML processors
710 <nt def="NT-Nmtoken">Nmtoken</nt> (name token) is any mixture of
713 <head>Names and Tokens</head>
714 <prod id="NT-NameChar"><lhs>NameChar</lhs>
715 <rhs><nt def="NT-Letter">Letter</nt>
716 | <nt def="NT-Digit">Digit</nt>
717 | '.' | '-' | '_' | ':'
718 | <nt def="NT-CombiningChar">CombiningChar</nt>
719 | <nt def="NT-Extender">Extender</nt></rhs>
721 <prod id="NT-Name"><lhs>Name</lhs>
722 <rhs>(<nt def="NT-Letter">Letter</nt> | '_' | ':')
723 (<nt def="NT-NameChar">NameChar</nt>)*</rhs></prod>
724 <prod id="NT-Names"><lhs>Names</lhs>
725 <rhs><nt def="NT-Name">Name</nt>
726 (<nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt>)*</rhs></prod>
727 <prod id="NT-Nmtoken"><lhs>Nmtoken</lhs>
728 <rhs>(<nt def="NT-NameChar">NameChar</nt>)+</rhs></prod>
729 <prod id="NT-Nmtokens"><lhs>Nmtokens</lhs>
730 <rhs><nt def="NT-Nmtoken">Nmtoken</nt> (<nt def="NT-S">S</nt> <nt def="NT-Nmtoken">Nmtoken</nt>)*</…
737 (<nt def="NT-EntityValue">EntityValue</nt>),
738 the values of attributes (<nt def="NT-AttValue">AttValue</nt>),
739 and external identifiers
740 (<nt def="NT-SystemLiteral">SystemLiteral</nt>).
741 Note that a <nt def="NT-SystemLiteral">SystemLiteral</nt>
745 <prod id="NT-EntityValue"><lhs>EntityValue</lhs>
748 | <nt def="NT-PEReference">PEReference</nt>
749 | <nt def="NT-Reference">Reference</nt>)*
755 | <nt def="NT-PEReference">PEReference</nt>
756 | <nt def="NT-Reference">Reference</nt>)*
759 <prod id="NT-AttValue"><lhs>AttValue</lhs>
762 | <nt def="NT-Reference">Reference</nt>)*
768 | <nt def="NT-Reference">Reference</nt>)*
771 <prod id="NT-SystemLiteral"><lhs>SystemLiteral</lhs>
775 <prod id="NT-PubidLiteral"><lhs>PubidLiteral</lhs>
776 <rhs>'"' <nt def="NT-PubidChar">PubidChar</nt>*
778 | "'" (<nt def="NT-PubidChar">PubidChar</nt> - "'")* "'"</rhs>
780 <prod id="NT-PubidChar"><lhs>PubidChar</lhs>
782 |&nbsp;[a-zA-Z0-9]
783 |&nbsp;[-'()+,./:=?;!*#@$_%]</rhs>
793 <p><termref def="dt-text">Text</termref> consists of intermingled
794 <termref def="dt-chardata">character
796 <termdef id="dt-markup" term="Markup"><term>Markup</term> takes the form of
797 <termref def="dt-stag">start-tags</termref>,
798 <termref def="dt-etag">end-tags</termref>,
799 <termref def="dt-empty">empty-element tags</termref>,
800 <termref def="dt-entref">entity references</termref>,
801 <termref def="dt-charref">character references</termref>,
802 <termref def="dt-comment">comments</termref>,
803 <termref def="dt-cdsection">CDATA section</termref> delimiters,
804 <termref def="dt-doctype">document type declarations</termref>, and
805 <termref def="dt-pi">processing instructions</termref>.
808 <p><termdef id="dt-chardata" term="Character Data">All text that is not markup
813 delimiters, or within a <termref def="dt-comment">comment</termref>, a
814 <termref def="dt-pi">processing instruction</termref>,
815 or a <termref def="dt-cdsection">CDATA section</termref>.
817 They are also legal within the <termref def="dt-litentval">literal entity
819 <specref ref="wf-entities"/>.
820 <!-- FINAL EDIT: restore internal entity decl or leave it out. -->
822 they must be <termref def="dt-escape">escaped</termref>
823 using either <termref def="dt-charref">numeric character references</termref>
828 "<code>&amp;gt;</code>", and must, <termref def="dt-compat">for
836 a <termref def="dt-cdsection">CDATA section</termref>.
841 not contain the start-delimiter of any markup.
843 is any string of characters not including the CDATA-section-close
847 apostrophe or single-quote character (') may be represented as
848 "<code>&amp;apos;</code>", and the double-quote character (") as
852 <prod id="NT-CharData">
854 <rhs>[^&lt;&amp;]* - ([^&lt;&amp;]* ']]&gt;' [^&lt;&amp;]*)</rhs>
860 <div2 id="sec-comments">
863 <p><termdef id="dt-comment" term="Comment"><term>Comments</term> may
865 <termref def="dt-markup">markup</termref>; in addition,
868 They are not part of the document's <termref def="dt-chardata">character
872 <termref def="dt-compat">For compatibility</termref>, the string
873 "<code>--</code>" (double-hyphen) must not occur within
877 <prod id="NT-Comment"><lhs>Comment</lhs>
878 <rhs>'&lt;!--'
879 ((<nt def="NT-Char">Char</nt> - '-')
880 | ('-' (<nt def="NT-Char">Char</nt> - '-')))*
881 '--&gt;'</rhs>
890 <div2 id="sec-pi">
893 <p><termdef id="dt-pi" term="Processing instruction"><term>Processing
899 <prod id="NT-PI"><lhs>PI</lhs>
900 <rhs>'&lt;?' <nt def="NT-PITarget">PITarget</nt>
901 (<nt def="NT-S">S</nt>
902 (<nt def="NT-Char">Char</nt>* -
903 (<nt def="NT-Char">Char</nt>* &pic; <nt def="NT-Char">Char</nt>*)))?
905 <prod id="NT-PITarget"><lhs>PITarget</lhs>
906 <rhs><nt def="NT-Name">Name</nt> -
910 PIs are not part of the document's <termref def="dt-chardata">character
912 PI begins with a target (<nt def="NT-PITarget">PITarget</nt>) used
914 The target names "<code>XML</code>", "<code>xml</code>", and so on are
918 XML <termref def="dt-notation">Notation</termref> mechanism
924 <div2 id="sec-cdata-sect">
927 <p><termdef id="dt-cdsection" term="CDATA Section"><term>CDATA sections</term>
936 <prod id="NT-CDSect"><lhs>CDSect</lhs>
937 <rhs><nt def="NT-CDStart">CDStart</nt>
938 <nt def="NT-CData">CData</nt>
939 <nt def="NT-CDEnd">CDEnd</nt></rhs></prod>
940 <prod id="NT-CDStart"><lhs>CDStart</lhs>
943 <prod id="NT-CData"><lhs>CData</lhs>
944 <rhs>(<nt def="NT-Char">Char</nt>* -
945 (<nt def="NT-Char">Char</nt>* ']]&gt;' <nt def="NT-Char">Char</nt>*))
948 <prod id="NT-CDEnd"><lhs>CDEnd</lhs>
953 Within a CDATA section, only the <nt def="NT-CDEnd">CDEnd</nt> string is
962 are recognized as <termref def="dt-chardata">character data</termref>, not
963 <termref def="dt-markup">markup</termref>:
968 <div2 id="sec-prolog-dtd">
971 <p><termdef id="dt-xmldecl" term="XML Declaration">XML documents
976 For example, the following is a complete XML document, <termref def="dt-wellformed">well-formed</te…
977 <termref def="dt-valid">valid</termref>:
988 for a document to use the value "<code>1.0</code>"
995 use any particular numbering scheme.
1003 storage and logical structure and to associate attribute-value pairs
1004 with its logical structures. XML provides a mechanism, the <termref def="dt-doctype">document type…
1005 constraints on the logical structure and to support the use of
1008 <termdef id="dt-valid" term="Validity">An XML document is
1013 the first <termref def="dt-element">element</termref> in the document.
1017 <prod id="NT-prolog"><lhs>prolog</lhs>
1018 <rhs><nt def="NT-XMLDecl">XMLDecl</nt>?
1019 <nt def="NT-Misc">Misc</nt>*
1020 (<nt def="NT-doctypedecl">doctypedecl</nt>
1021 <nt def="NT-Misc">Misc</nt>*)?</rhs></prod>
1022 <prod id="NT-XMLDecl"><lhs>XMLDecl</lhs>
1024 <nt def="NT-VersionInfo">VersionInfo</nt>
1025 <nt def="NT-EncodingDecl">EncodingDecl</nt>?
1026 <nt def="NT-SDDecl">SDDecl</nt>?
1027 <nt def="NT-S">S</nt>?
1030 <prod id="NT-VersionInfo"><lhs>VersionInfo</lhs>
1031 <rhs><nt def="NT-S">S</nt> 'version' <nt def="NT-Eq">Eq</nt>
1032 (' <nt def="NT-VersionNum">VersionNum</nt> '
1033 | " <nt def="NT-VersionNum">VersionNum</nt> ")</rhs>
1035 <prod id="NT-Eq"><lhs>Eq</lhs>
1036 <rhs><nt def="NT-S">S</nt>? '=' <nt def="NT-S">S</nt>?</rhs></prod>
1037 <prod id="NT-VersionNum">
1039 <rhs>([a-zA-Z0-9_.:] | '-')+</rhs>
1041 <prod id="NT-Misc"><lhs>Misc</lhs>
1042 <rhs><nt def="NT-Comment">Comment</nt> | <nt def="NT-PI">PI</nt> |
1043 <nt def="NT-S">S</nt></rhs></prod>
1047 <p><termdef id="dt-doctype" term="Document Type Declaration">The XML
1050 <termref def="dt-markupdecl">markup declarations</termref>
1055 The document type declaration can point to an external subset (a
1057 <termref def="dt-extent">external entity</termref>) containing markup
1064 <p><termdef id="dt-markupdecl" term="markup declaration">
1066 an <termref def="dt-eldecl">element type declaration</termref>,
1067 an <termref def="dt-attdecl">attribute-list declaration</termref>,
1068 an <termref def="dt-entdecl">entity declaration</termref>, or
1069 a <termref def="dt-notdecl">notation declaration</termref>.
1072 within <termref def="dt-PE">parameter entities</termref>,
1073 as described in the well-formedness and validity constraints below.
1075 <specref ref="sec-physical-struct"/>.</p>
1079 <prod id="NT-doctypedecl"><lhs>doctypedecl</lhs>
1080 <rhs>'&lt;!DOCTYPE' <nt def="NT-S">S</nt>
1081 <nt def="NT-Name">Name</nt> (<nt def="NT-S">S</nt>
1082 <nt def="NT-ExternalID">ExternalID</nt>)?
1083 <nt def="NT-S">S</nt>? ('['
1084 (<nt def="NT-markupdecl">markupdecl</nt>
1085 | <nt def="NT-PEReference">PEReference</nt>
1086 | <nt def="NT-S">S</nt>)*
1088 <nt def="NT-S">S</nt>?)? '&gt;'</rhs>
1089 <vc def="vc-roottype"/>
1091 <prod id="NT-markupdecl"><lhs>markupdecl</lhs>
1092 <rhs><nt def="NT-elementdecl">elementdecl</nt>
1093 | <nt def="NT-AttlistDecl">AttlistDecl</nt>
1094 | <nt def="NT-EntityDecl">EntityDecl</nt>
1095 | <nt def="NT-NotationDecl">NotationDecl</nt>
1096 | <nt def="NT-PI">PI</nt>
1097 | <nt def="NT-Comment">Comment</nt>
1099 <vc def="vc-PEinMarkupDecl"/>
1100 <wfc def="wfc-PEinInternalSubset"/>
1107 the <termref def="dt-repltext">replacement text</termref> of
1108 <termref def="dt-PE">parameter entities</termref>.
1110 individual nonterminals (<nt def="NT-elementdecl">elementdecl</nt>,
1111 <nt def="NT-AttlistDecl">AttlistDecl</nt>, and so on) describe
1113 <termref def="dt-include">included</termref>.</p>
1115 <vcnote id="vc-roottype">
1118 The <nt def="NT-Name">Name</nt> in the document type declaration must
1119 match the element type of the <termref def="dt-root">root element</termref>.
1123 <vcnote id="vc-PEinMarkupDecl">
1125 <p>Parameter-entity
1126 <termref def="dt-repltext">replacement text</termref> must be properly nested
1130 declaration (<nt def="NT-markupdecl">markupdecl</nt> above)
1132 <termref def="dt-PERef">parameter-entity reference</termref>,
1135 <wfcnote id="wfc-PEinInternalSubset">
1138 <termref def="dt-PERef">parameter-entity references</termref>
1142 external parameter entities or to the external subset.)
1146 Like the internal subset, the external subset and
1147 any external parameter entities referred to in the DTD
1149 allowed by the non-terminal symbol
1150 <nt def="NT-markupdecl">markupdecl</nt>, interspersed with white space
1151 or <termref def="dt-PERef">parameter-entity references</termref>.
1154 external subset or of external parameter entities may conditionally be ignored
1156 the <termref def="dt-cond-section">conditional section</termref>
1159 <scrap id="ext-Subset">
1160 <head>External Subset</head>
1162 <prod id="NT-extSubset"><lhs>extSubset</lhs>
1163 <rhs><nt def="NT-TextDecl">TextDecl</nt>?
1164 <nt def="NT-extSubsetDecl">extSubsetDecl</nt></rhs></prod>
1165 <prod id="NT-extSubsetDecl"><lhs>extSubsetDecl</lhs>
1167 <nt def="NT-markupdecl">markupdecl</nt>
1168 | <nt def="NT-conditionalSect">conditionalSect</nt>
1169 | <nt def="NT-PEReference">PEReference</nt>
1170 | <nt def="NT-S">S</nt>
1175 <p>The external subset and external parameter entities also differ
1177 <termref def="dt-PERef">parameter-entity references</termref>
1185 The <termref def="dt-sysid">system identifier</termref>
1189 <eg><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
1195 If both the external and internal subsets are used, the
1196 internal subset is considered to occur before the external subset.
1197 <!-- 'is considered to'? boo. whazzat mean? -->
1198 This has the effect that entity and attribute-list declarations in the
1199 internal subset take precedence over those in the external subset.
1203 <div2 id="sec-rmd">
1206 as passed from an <termref def="dt-xml-proc">XML processor</termref>
1211 whether or not there are such declarations which appear external to
1212 the <termref def="dt-docent">document entity</termref>.
1216 <prod id="NT-SDDecl"><lhs>SDDecl</lhs>
1218 <nt def="NT-S">S</nt>
1219 'standalone' <nt def="NT-Eq">Eq</nt>
1222 <vc def="vc-check-rmd"/></prod>
1228 are no markup declarations external to the <termref def="dt-docent">document
1229 entity</termref> (either in the DTD external subset, or in an
1230 external parameter entity referenced from the internal subset)
1234 external markup declarations.
1236 denotes the presence of external <emph>declarations</emph>; the presence, in a
1238 references to external <emph>entities</emph>, when those entities are
1241 <p>If there are no external markup declarations, the standalone document
1243 If there are external markup declarations but there is no standalone
1248 <vcnote id="vc-check-rmd">
1251 the value "<code>no</code>" if any external markup declarations
1253 <item><p>attributes with <termref def="dt-default">default</termref> values, if
1258 if <termref def="dt-entref">references</termref> to those
1267 <p>element types with <termref def="dt-elemcontent">element content</termref>,
1276 <div2 id="sec-white-space">
1279 <p>In editing XML documents, it is often convenient to use "white space"
1281 <nt def="NT-S">S</nt> in this specification) to
1287 <p>An <termref def="dt-xml-proc">XML processor</termref>
1289 markup through to the application. A <termref def="dt-validating">
1292 in <termref def="dt-elemcontent">element content</termref>.
1294 <p>A special <termref def="dt-attr">attribute</termref>
1299 <termref def="dt-attdecl">declared</termref> if it is used.
1301 <termref def="dt-enumerated">enumerated type</termref> whose only
1305 default white-space processing modes are acceptable for this element; the
1312 <p>The <termref def="dt-root">root element</termref> of any document
1319 <div2 id="sec-line-ends">
1320 <head>End-of-Line Handling</head>
1321 <p>XML <termref def="dt-parsedent">parsed entities</termref> are often stored in
1324 carriage-return (#xD) and line-feed (#xA).</p>
1325 <p>To simplify the tasks of <termref def="dt-app">applications</termref>,
1326 wherever an external parsed entity or the literal entity value
1328 two-character sequence "#xD#xA" or a standalone literal
1329 #xD, an <termref def="dt-xml-proc">XML processor</termref> must
1336 <div2 id="sec-lang-tag">
1342 A special <termref def="dt-attr">attribute</termref> named
1348 <termref def="dt-attdecl">declared</termref> if it is used.
1353 <prod id="NT-LanguageID"><lhs>LanguageID</lhs>
1354 <rhs><nt def="NT-Langcode">Langcode</nt>
1355 ('-' <nt def="NT-Subcode">Subcode</nt>)*</rhs></prod>
1356 <prod id="NT-Langcode"><lhs>Langcode</lhs>
1357 <rhs><nt def="NT-ISO639Code">ISO639Code</nt> |
1358 <nt def="NT-IanaCode">IanaCode</nt> |
1359 <nt def="NT-UserCode">UserCode</nt></rhs>
1361 <prod id="NT-ISO639Code"><lhs>ISO639Code</lhs>
1362 <rhs>([a-z] | [A-Z]) ([a-z] | [A-Z])</rhs></prod>
1363 <prod id="NT-IanaCode"><lhs>IanaCode</lhs>
1364 <rhs>('i' | 'I') '-' ([a-z] | [A-Z])+</rhs></prod>
1365 <prod id="NT-UserCode"><lhs>UserCode</lhs>
1366 <rhs>('x' | 'X') '-' ([a-z] | [A-Z])+</rhs></prod>
1367 <prod id="NT-Subcode"><lhs>Subcode</lhs>
1368 <rhs>([a-z] | [A-Z])+</rhs></prod>
1370 The <nt def="NT-Langcode">Langcode</nt> may be any of the following:
1372 <item><p>a two-letter language code as defined by
1374 for the representation of names of languages"</p></item>
1377 prefix "<code>i-</code>" (or "<code>I-</code>")</p></item>
1379 between parties in private use; these must begin with the
1380 prefix "<code>x-</code>" or "<code>X-</code>" in order to ensure that they do not conflict
1381 with names later standardized or registered with IANA</p></item>
1383 <p>There may be any number of <nt def="NT-Subcode">Subcode</nt> segments; if
1388 for the representation of names of countries."
1392 unless the <nt def="NT-Langcode">Langcode</nt> begins with the prefix
1393 "<code>x-</code>" or
1394 "<code>X-</code>". </p>
1397 Note that these values, unlike other names in XML documents,
1401 <p xml:lang="en-GB">What colour is it?</p>
1402 <p xml:lang="en-US">What color is it?</p>
1409 <!--<p>The xml:lang value is considered to apply both to the contents of an
1412 values of all of its attributes with free-text (CDATA) values. -->
1417 <!--
1431 -->
1445 <!-- &Elements; -->
1447 <div1 id="sec-logical-struct">
1450 <p><termdef id="dt-element" term="Element">Each <termref def="dt-xml-doc">XML document</termref> co…
1452 either delimited by <termref def="dt-stag">start-tags</termref>
1453 …<termref def="dt-etag">end-tags</termref>, or, for <termref def="dt-empty">empty</termref> element…
1457 has a <termref def="dt-attrname">name</termref> and a <termref def="dt-attrval">value</termref>.
1460 <prod id="NT-element"><lhs>element</lhs>
1461 <rhs><nt def="NT-EmptyElemTag">EmptyElemTag</nt></rhs>
1462 <rhs>| <nt def="NT-STag">STag</nt> <nt def="NT-content">content</nt>
1463 <nt def="NT-ETag">ETag</nt></rhs>
1468 <p>This specification does not constrain the semantics, use, or (beyond
1469 syntax) names of the element types and attributes, except that names
1477 The <nt def="NT-Name">Name</nt> in an element's end-tag must match
1479 the start-tag.
1487 <nt def="NT-elementdecl">elementdecl</nt> where the
1488 <nt def="NT-Name">Name</nt> matches the element type, and
1492 <termref def="dt-content">content</termref>.</p></item>
1493 <item><p>The declaration matches <nt def="NT-children">children</nt> and
1495 <termref def="dt-parentchild">child elements</termref>
1498 matching the nonterminal <nt def="NT-S">S</nt>) between each pair
1500 <item><p>The declaration matches <nt def="NT-Mixed">Mixed</nt> and
1501 the content consists of <termref def="dt-chardata">character
1502 data</termref> and <termref def="dt-parentchild">child elements</termref>
1503 whose types match names in the content model.</p></item>
1505 of any <termref def="dt-parentchild">child elements</termref> have
1510 <div2 id="sec-starttags">
1511 <head>Start-Tags, End-Tags, and Empty-Element Tags</head>
1513 <p><termdef id="dt-stag" term="Start-Tag">The beginning of every
1514 non-empty XML element is marked by a <term>start-tag</term>.
1516 <head>Start-tag</head>
1518 <prod id="NT-STag"><lhs>STag</lhs>
1519 <rhs>'&lt;' <nt def="NT-Name">Name</nt>
1520 (<nt def="NT-S">S</nt> <nt def="NT-Attribute">Attribute</nt>)*
1521 <nt def="NT-S">S</nt>? '&gt;'</rhs>
1524 <prod id="NT-Attribute"><lhs>Attribute</lhs>
1525 <rhs><nt def="NT-Name">Name</nt> <nt def="NT-Eq">Eq</nt>
1526 <nt def="NT-AttValue">AttValue</nt></rhs>
1532 The <nt def="NT-Name">Name</nt> in
1533 the start- and end-tags gives the
1535 <termdef id="dt-attr" term="Attribute">
1536 The <nt def="NT-Name">Name</nt>-<nt def="NT-AttValue">AttValue</nt> pairs are
1539 <termdef id="dt-attrname" term="Attribute Name">with the
1540 <nt def="NT-Name">Name</nt> in each pair
1542 <termdef id="dt-attrval" term="Attribute Value">the content of the
1543 <nt def="NT-AttValue">AttValue</nt> (the text between the
1550 No attribute name may appear more than once in the same start-tag
1551 or empty-element tag.
1563 <head>No External Entity References</head>
1566 to external entities.
1571 <p>The <termref def="dt-repltext">replacement text</termref> of any entity
1576 <p>An example of a start-tag:
1577 <eg>&lt;termdef id="dt-dog" term="dog"&gt;</eg></p>
1578 <p><termdef id="dt-etag" term="End Tag">The end of every element
1579 that begins with a start-tag must
1580 be marked by an <term>end-tag</term>
1582 start-tag:
1584 <head>End-tag</head>
1586 <prod id="NT-ETag"><lhs>ETag</lhs>
1587 <rhs>'&lt;/' <nt def="NT-Name">Name</nt>
1588 <nt def="NT-S">S</nt>? '&gt;'</rhs></prod>
1592 <p>An example of an end-tag:<eg>&lt;/termdef&gt;</eg></p>
1593 <p><termdef id="dt-content" term="Content">The
1594 <termref def="dt-text">text</termref> between the start-tag and
1595 end-tag is called the element's
1600 <prod id="NT-content"><lhs>content</lhs>
1601 <rhs>(<nt def="NT-element">element</nt> | <nt def="NT-CharData">CharData</nt>
1602 | <nt def="NT-Reference">Reference</nt> | <nt def="NT-CDSect">CDSect</nt>
1603 | <nt def="NT-PI">PI</nt> | <nt def="NT-Comment">Comment</nt>)*</rhs>
1608 <p><termdef id="dt-empty" term="Empty">If an element is <term>empty</term>,
1609 it must be represented either by a start-tag immediately followed
1610 by an end-tag or by an empty-element tag.</termdef>
1611 <termdef id="dt-eetag" term="empty-element tag">An
1612 <term>empty-element tag</term> takes a special form:
1616 <prod id="NT-EmptyElemTag"><lhs>EmptyElemTag</lhs>
1617 <rhs>'&lt;' <nt def="NT-Name">Name</nt> (<nt def="NT-S">S</nt>
1618 <nt def="NT-Attribute">Attribute</nt>)* <nt def="NT-S">S</nt>?
1625 <p>Empty-element tags may be used for any element which has no
1628 <termref def="dt-interop">For interoperability</termref>, the empty-element
1630 <termref def="dt-eldecl">declared</termref> <kw>EMPTY</kw>.</p>
1641 <p>The <termref def="dt-element">element</termref> structure of an
1642 <termref def="dt-xml-doc">XML document</termref> may, for
1643 <termref def="dt-valid">validation</termref> purposes,
1645 using element type and attribute-list declarations.
1647 <termref def="dt-content">content</termref>.
1651 appear as <termref def="dt-parentchild">children</termref> of the element.
1655 <p><termdef id="dt-eldecl" term="Element Type declaration">An <term>element
1660 <prod id="NT-elementdecl"><lhs>elementdecl</lhs>
1661 <rhs>'&lt;!ELEMENT' <nt def="NT-S">S</nt>
1662 <nt def="NT-Name">Name</nt>
1663 <nt def="NT-S">S</nt>
1664 <nt def="NT-contentspec">contentspec</nt>
1665 <nt def="NT-S">S</nt>? '&gt;'</rhs>
1667 <prod id="NT-contentspec"><lhs>contentspec</lhs>
1670 | <nt def="NT-Mixed">Mixed</nt>
1671 | <nt def="NT-children">children</nt>
1676 where the <nt def="NT-Name">Name</nt> gives the element type
1693 <div3 id="sec-element-content">
1696 <p><termdef id="dt-elemcontent" term="Element content">An element <termref def="dt-stag">type</term…
1698 type must contain only <termref def="dt-parentchild">child</termref>
1701 <nt def="NT-S">S</nt>).
1708 content particles (<nt def="NT-cp">cp</nt>s), which consist of names,
1712 <head>Element-content Models</head>
1714 <prod id="NT-children"><lhs>children</lhs>
1715 <rhs>(<nt def="NT-choice">choice</nt>
1716 | <nt def="NT-seq">seq</nt>)
1718 <prod id="NT-cp"><lhs>cp</lhs>
1719 <rhs>(<nt def="NT-Name">Name</nt>
1720 | <nt def="NT-choice">choice</nt>
1721 | <nt def="NT-seq">seq</nt>)
1723 <prod id="NT-choice"><lhs>choice</lhs>
1724 <rhs>'(' <nt def="NT-S">S</nt>? cp
1725 ( <nt def="NT-S">S</nt>? '|' <nt def="NT-S">S</nt>? <nt def="NT-cp">cp</nt> )*
1726 <nt def="NT-S">S</nt>? ')'</rhs>
1727 <vc def="vc-PEinGroup"/></prod>
1728 <prod id="NT-seq"><lhs>seq</lhs>
1729 <rhs>'(' <nt def="NT-S">S</nt>? cp
1730 ( <nt def="NT-S">S</nt>? ',' <nt def="NT-S">S</nt>? <nt def="NT-cp">cp</nt> )*
1731 <nt def="NT-S">S</nt>? ')'</rhs>
1732 <vc def="vc-PEinGroup"/></prod>
1736 where each <nt def="NT-Name">Name</nt> is the type of an element which may
1737 appear as a <termref def="dt-parentchild">child</termref>.
1739 particle in a choice list may appear in the <termref def="dt-elemcontent">element content</termref>…
1742 appear in the <termref def="dt-elemcontent">element content</termref> in the
1757 the content against an element type in the content model. <termref def="dt-compat">For compatibili…
1761 <!-- appendix <specref ref="determinism"/>. -->
1762 <!-- appendix on deterministic content models. -->
1764 <vcnote id="vc-PEinGroup">
1766 <p>Parameter-entity
1767 <termref def="dt-repltext">replacement text</termref> must be properly nested
1770 in a <nt def="NT-choice">choice</nt>, <nt def="NT-seq">seq</nt>, or
1771 <nt def="NT-Mixed">Mixed</nt> construct
1773 <termref def="dt-PERef">parameter entity</termref>,
1775 <p><termref def="dt-interop">For interoperability</termref>,
1776 if a parameter-entity reference appears in a
1777 <nt def="NT-choice">choice</nt>, <nt def="NT-seq">seq</nt>, or
1778 <nt def="NT-Mixed">Mixed</nt> construct, its replacement text
1780 neither the first nor last non-blank
1785 <p>Examples of element-content models:
1788 &lt;!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*&gt;</eg></p>
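<p>As an additional sketch, using hypothetical element types that are not part of any DTD defined here, the declaration below combines a sequence, an optional choice, and a repeated particle: a <code>memo</code> must contain a <code>to</code>, then a <code>from</code>, then at most one of <code>subject</code> or <code>re</code>, then one or more <code>para</code> elements:
<eg>&lt;!-- hypothetical element types, for illustration only --&gt;
&lt;!ELEMENT memo (to, from, (subject | re)?, para+)&gt;</eg></p>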
1791 <div3 id="sec-mixed-content">
1794 <p><termdef id="dt-mixed" term="Mixed Content">An element
1795 <termref def="dt-stag">type</termref> has
1798 <termref def="dt-parentchild">child</termref> elements.</termdef>
1802 <head>Mixed-content Declaration</head>
1804 <prod id="NT-Mixed"><lhs>Mixed</lhs>
1805 <rhs>'(' <nt def="NT-S">S</nt>?
1807 (<nt def="NT-S">S</nt>?
1809 <nt def="NT-S">S</nt>?
1810 <nt def="NT-Name">Name</nt>)*
1811 <nt def="NT-S">S</nt>?
1813 <rhs>| '(' <nt def="NT-S">S</nt>? '#PCDATA' <nt def="NT-S">S</nt>? ')'
1814 </rhs><vc def="vc-PEinGroup"/>
1815 <vc def="vc-MixedChildrenUnique"/>
1820 where the <nt def="NT-Name">Name</nt>s give the types of elements
1823 <vcnote id="vc-MixedChildrenUnique">
1825 <p>The same name must not appear more than once in a single mixed-content
1836 <head>Attribute-List Declarations</head>
1838 <p><termref def="dt-attr">Attributes</termref> are used to associate
1839 name-value pairs with <termref def="dt-element">elements</termref>.
1840 Attribute specifications may appear only within <termref def="dt-stag">start-tags</termref>
1841 and <termref def="dt-eetag">empty-element tags</termref>;
1843 recognize them appear in <specref ref="sec-starttags"/>.
1844 Attribute-list
1851 <item><p>To provide <termref def="dt-default">default values</termref>
1855 <p><termdef id="dt-attdecl" term="Attribute-List Declaration">
1856 <term>Attribute-list declarations</term> specify the name, data type, and default
1859 <head>Attribute-list Declaration</head>
1860 <prod id="NT-AttlistDecl"><lhs>AttlistDecl</lhs>
1861 <rhs>'&lt;!ATTLIST' <nt def="NT-S">S</nt>
1862 <nt def="NT-Name">Name</nt>
1863 <nt def="NT-AttDef">AttDef</nt>*
1864 <nt def="NT-S">S</nt>? '&gt;'</rhs>
1866 <prod id="NT-AttDef"><lhs>AttDef</lhs>
1867 <rhs><nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt>
1868 <nt def="NT-S">S</nt> <nt def="NT-AttType">AttType</nt>
1869 <nt def="NT-S">S</nt> <nt def="NT-DefaultDecl">DefaultDecl</nt></rhs>
1872 The <nt def="NT-Name">Name</nt> in the
1873 <nt def="NT-AttlistDecl">AttlistDecl</nt> rule is the type of an element. At
1876 error. The <nt def="NT-Name">Name</nt> in the
1877 <nt def="NT-AttDef">AttDef</nt> rule is
1880 When more than one <nt def="NT-AttlistDecl">AttlistDecl</nt> is provided for a
1885 <termref def="dt-interop">For interoperability,</termref> writers of DTDs
1886 may choose to provide at most one attribute-list declaration
1889 in each attribute-list declaration.
1891 issue a warning when more than one attribute-list declaration is
1897 <div3 id="sec-attribute-types">
1907 <prod id="NT-AttType"><lhs>AttType</lhs>
1908 <rhs><nt def="NT-StringType">StringType</nt>
1909 | <nt def="NT-TokenizedType">TokenizedType</nt>
1910 | <nt def="NT-EnumeratedType">EnumeratedType</nt>
1913 <prod id="NT-StringType"><lhs>StringType</lhs>
1916 <prod id="NT-TokenizedType"><lhs>TokenizedType</lhs>
1919 <vc def="one-id-per-el"/>
1920 <vc def="id-default"/>
1940 <nt def="NT-Name">Name</nt> production.
1946 <vcnote id="one-id-per-el">
1950 <vcnote id="id-default">
1959 the <nt def="NT-Name">Name</nt> production, and
1961 <nt def="NT-Names">Names</nt>;
1962 each <nt def="NT-Name">Name</nt> must match the value of an ID attribute on
1971 must match the <nt def="NT-Name">Name</nt> production,
1973 <nt def="NT-Names">Names</nt>;
1974 each <nt def="NT-Name">Name</nt> must
1976 name of an <termref def="dt-unparsed">unparsed entity</termref> declared in the
1977 <termref def="dt-doctype">DTD</termref>.
1984 <nt def="NT-Nmtoken">Nmtoken</nt> production;
1986 match <nt def="NT-Nmtokens">Nmtokens</nt>.
1989 <!-- why?
1992 <specref ref="AVNormalize"/>.</p>-->
1993 <p><termdef id="dt-enumerated" term="Enumerated Attribute Values"><term>Enumerated attributes</term…
1998 <prod id="NT-EnumeratedType"><lhs>EnumeratedType</lhs>
1999 <rhs><nt def="NT-NotationType">NotationType</nt>
2000 | <nt def="NT-Enumeration">Enumeration</nt>
2002 <prod id="NT-NotationType"><lhs>NotationType</lhs>
2004 <nt def="NT-S">S</nt>
2006 <nt def="NT-S">S</nt>?
2007 <nt def="NT-Name">Name</nt>
2008 (<nt def="NT-S">S</nt>? '|' <nt def="NT-S">S</nt>?
2009 <nt def="NT-Name">Name</nt>)*
2010 <nt def="NT-S">S</nt>? ')'
2013 <prod id="NT-Enumeration"><lhs>Enumeration</lhs>
2014 <rhs>'(' <nt def="NT-S">S</nt>?
2015 <nt def="NT-Nmtoken">Nmtoken</nt>
2016 (<nt def="NT-S">S</nt>? '|'
2017 <nt def="NT-S">S</nt>?
2018 <nt def="NT-Nmtoken">Nmtoken</nt>)*
2019 <nt def="NT-S">S</nt>?
2024 <termref def="dt-notation">notation</termref>, declared in the
2034 one of the <titleref href="Notations">notation</titleref> names included in
2035 the declaration; all notation names in the declaration must
2043 must match one of the <nt def="NT-Nmtoken">Nmtoken</nt> tokens in the
2047 <p><termref def="dt-interop">For interoperability,</termref> the same
2048 <nt def="NT-Nmtoken">Nmtoken</nt> should not occur more than once in the
2053 <div3 id="sec-attr-defaults">
2056 <p>An <termref def="dt-attdecl">attribute declaration</termref> provides
2063 <prod id="NT-DefaultDecl"><lhs>DefaultDecl</lhs>
2066 <rhs>| (('#FIXED' S)? <nt def="NT-AttValue">AttValue</nt>)</rhs>
2079 <!-- not any more!!
2084 of the application. -->
2085 <termdef id="dt-default" term="Attribute Default">If the
2088 <nt def="NT-AttValue">AttValue</nt> value contains the declared
2099 all elements of the type in the attribute-list declaration.
2115 <p>Examples of attribute-list declarations:
2125 <head>Attribute-Value Normalization</head>
2136 is appended for a "#xD#xA" sequence that is part of an external
2150 by a non-validating parser as if declared
2155 <div2 id="sec-condition-sect">
2157 <p><termdef id="dt-cond-section" term="conditional section">
2159 <termref def="dt-doctype">document type declaration external subset</termref>
2166 <prod id="NT-conditionalSect"><lhs>conditionalSect</lhs>
2167 <rhs><nt def="NT-includeSect">includeSect</nt>
2168 | <nt def="NT-ignoreSect">ignoreSect</nt>
2171 <prod id="NT-includeSect"><lhs>includeSect</lhs>
2174 <nt def="NT-extSubsetDecl">extSubsetDecl</nt>
2178 <prod id="NT-ignoreSect"><lhs>ignoreSect</lhs>
2180 <nt def="NT-ignoreSectContents">ignoreSectContents</nt>*
2184 <prod id="NT-ignoreSectContents"><lhs>ignoreSectContents</lhs>
2185 <rhs><nt def="NT-Ignore">Ignore</nt>
2186 ('&lt;![' <nt def="NT-ignoreSectContents">ignoreSectContents</nt> ']]&gt;'
2187 <nt def="NT-Ignore">Ignore</nt>)*</rhs></prod>
2188 <prod id="NT-Ignore"><lhs>Ignore</lhs>
2189 <rhs><nt def="NT-Char">Char</nt>* -
2190 (<nt def="NT-Char">Char</nt>* ('&lt;![' | ']]&gt;')
2191 <nt def="NT-Char">Char</nt>*)
2197 <p>Like the internal and external DTD subsets, a conditional section
2217 parameter-entity reference, the parameter entity must be replaced by its
2235 <!--
2236 <div2 id='sec-pass-to-app'>
2238 <p>When an XML processor encounters a start-tag, it must make
2245 <p>the names of attributes known to apply to this element type
2246 (validating processors must make available names of all attributes
2247 declared for the element type; non-validating processors must
2248 make available at least the names of the attributes for which
2255 -->
2258 <!-- &Entities; -->
2260 <div1 id="sec-physical-struct">
2263 <p><termdef id="dt-entity" term="Entity">An XML document may consist
2267 the <termref def="dt-doctype">external DTD subset</termref>)
2271 called the <termref def="dt-docent">document entity</termref>, which serves
2272 as the starting point for the <termref def="dt-xml-proc">XML
2275 <termdef id="dt-parsedent" term="Text Entity">A <term>parsed entity's</term>
2277 <termref def="dt-repltext">replacement text</termref>;
2278 this <termref def="dt-text">text</termref> is considered an
2281 <p><termdef id="dt-unparsed" term="Unparsed Entity">An
2284 <termref def="dt-text">text</termref>, and if text, may not be XML.
2286 has an associated <termref def="dt-notation">notation</termref>, identified by name.
2297 <p><termdef id="gen-entity" term="general entity"><term>General entities</term>
2298 are entities for use within the document content.
2302 <termdef id="dt-PE" term="Parameter entity">Parameter entities
2303 are parsed entities for use within the DTD.</termdef>
2304 These two types of entities use different forms of reference and
2310 <div2 id="sec-references">
2312 <p><termdef id="dt-charref" term="Character Reference">
2318 <prod id="NT-CharRef"><lhs>CharRef</lhs>
2319 <rhs>'&amp;#' [0-9]+ ';' </rhs>
2320 <rhs>| '&hcro;' [0-9a-fA-F]+ ';'</rhs>
2321 <wfc def="wf-Legalchar"/>
2324 <wfcnote id="wf-Legalchar">
2328 <nt def="NT-Char">Char</nt>.</p>
2338 <p><termdef id="dt-entref" term="Entity Reference">An <term>entity
2340 <termdef id="dt-GERef" term="General Entity Reference">References to
2342 use ampersand (<code>&amp;</code>) and semicolon (<code>;</code>) as
2344 <termdef id="dt-PERef" term="Parameter-entity reference">
2345 <term>Parameter-entity references</term> use percent-sign (<code>%</code>) and
2351 <prod id="NT-Reference"><lhs>Reference</lhs>
2352 <rhs><nt def="NT-EntityRef">EntityRef</nt>
2353 | <nt def="NT-CharRef">CharRef</nt></rhs></prod>
2354 <prod id="NT-EntityRef"><lhs>EntityRef</lhs>
2355 <rhs>'&amp;' <nt def="NT-Name">Name</nt> ';'</rhs>
2356 <wfc def="wf-entdeclared"/>
2357 <vc def="vc-entdeclared"/>
2361 <prod id="NT-PEReference"><lhs>PEReference</lhs>
2362 <rhs>'%' <nt def="NT-Name">Name</nt> ';'</rhs>
2363 <vc def="vc-entdeclared"/>
2369 <wfcnote id="wf-entdeclared">
2374 the <nt def="NT-Name">Name</nt> given in the entity reference must
2375 <termref def="dt-match">match</termref> that in an
2376 <titleref href="sec-entity-decl">entity declaration</titleref>, except that
2377 well-formed documents need not declare
2381 reference to it which appears in a default value in an attribute-list
2383 <p>Note that if entities are declared in the external subset or in
2384 external parameter entities, a non-validating processor is
2385 <titleref href="include-if-valid">not obligated to</titleref> read
2387 an entity must be declared is a well-formedness constraint only
2388 if <titleref href="sec-rmd">standalone='yes'</titleref>.</p>
2390 <vcnote id="vc-entdeclared">
2392 <p>In a document with an external subset or external parameter
2394 the <nt def="NT-Name">Name</nt> given in the entity reference must <termref def="dt-match">match</t…
2395 <titleref href="sec-entity-decl">entity declaration</titleref>.
2398 specified in <specref ref="sec-predefined-ent"/>.
2401 reference to it which appears in a default value in an attribute-list
2404 <!-- FINAL EDIT: is this duplication too clumsy? -->
2408 An entity reference must not contain the name of an <termref def="dt-unparsed">unparsed entity</ter…
2409 to only in <termref def="dt-attrval">attribute values</termref> declared to
2423 Parameter-entity references may only appear in the
2424 <termref def="dt-doctype">DTD</termref>.
2428 <eg>Type &lt;key&gt;less-than&lt;/key&gt; (&hcro;3C;) to save options.
2430 is classified &amp;security-level;.</eg></p>
2431 <p>Example of a parameter-entity reference:
2432 <eg><![CDATA[<!-- declare the parameter entity "ISOLat2"... -->
2434 SYSTEM "http://www.xml.com/iso/isolat2-xml.entities" >
2435 <!-- ... now reference it. -->
2439 <div2 id="sec-entity-decl">
2442 <p><termdef id="dt-entdecl" term="entity declaration">
2447 <prod id="NT-EntityDecl"><lhs>EntityDecl</lhs>
2448 <rhs><nt def="NT-GEDecl">GEDecl</nt><!--</rhs><com>General entities</com>
2449 <rhs>--> | <nt def="NT-PEDecl">PEDecl</nt></rhs>
2450 <!--<com>Parameter entities</com>-->
2452 <prod id="NT-GEDecl"><lhs>GEDecl</lhs>
2453 <rhs>'&lt;!ENTITY' <nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt>
2454 <nt def="NT-S">S</nt> <nt def="NT-EntityDef">EntityDef</nt>
2455 <nt def="NT-S">S</nt>? '&gt;'</rhs>
2457 <prod id="NT-PEDecl"><lhs>PEDecl</lhs>
2458 <rhs>'&lt;!ENTITY' <nt def="NT-S">S</nt> '%' <nt def="NT-S">S</nt>
2459 <nt def="NT-Name">Name</nt> <nt def="NT-S">S</nt>
2460 <nt def="NT-PEDef">PEDef</nt> <nt def="NT-S">S</nt>? '&gt;'</rhs>
2461 <!--<com>Parameter entities</com>-->
2463 <prod id="NT-EntityDef"><lhs>EntityDef</lhs>
2464 <rhs><nt def="NT-EntityValue">EntityValue</nt>
2465 <!--</rhs>
2466 <rhs>-->| (<nt def="NT-ExternalID">ExternalID</nt>
2467 <nt def="NT-NDataDecl">NDataDecl</nt>?)</rhs>
2468 <!-- <nt def='NT-ExternalDef'>ExternalDef</nt></rhs> -->
2470 <!-- FINAL EDIT: what happened to WFs here? -->
2471 <prod id="NT-PEDef"><lhs>PEDef</lhs>
2472 <rhs><nt def="NT-EntityValue">EntityValue</nt>
2473 | <nt def="NT-ExternalID">ExternalID</nt></rhs></prod>
2476 The <nt def="NT-Name">Name</nt> identifies the entity in an
2477 <termref def="dt-entref">entity reference</termref> or, in the case of an
2485 <div3 id="sec-internal-ent">
2488 <p><termdef id="dt-internent" term="Internal Entity Replacement Text">If
2490 <nt def="NT-EntityValue">EntityValue</nt>,
2496 <termref def="dt-litentval">literal entity value</termref> may be required to
2497 produce the correct <termref def="dt-repltext">replacement
2498 text</termref>: see <specref ref="intern-replacement"/>.
2500 <p>An internal entity is a <termref def="dt-parsedent">parsed
2503 <eg>&lt;!ENTITY Pub-Status "This is a pre-release of the
2507 <div3 id="sec-external-ent">
2508 <head>External Entities</head>
2510 <p><termdef id="dt-extent" term="External Entity">If the entity is not
2511 internal, it is an <term>external
2514 <head>External Entity Declaration</head>
2515 <!--
2516 <prod id='NT-ExternalDef'><lhs>ExternalDef</lhs>
2517 <rhs></prod> -->
2518 <prod id="NT-ExternalID"><lhs>ExternalID</lhs>
2519 <rhs>'SYSTEM' <nt def="NT-S">S</nt>
2520 <nt def="NT-SystemLiteral">SystemLiteral</nt></rhs>
2521 <rhs>| 'PUBLIC' <nt def="NT-S">S</nt>
2522 <nt def="NT-PubidLiteral">PubidLiteral</nt>
2523 <nt def="NT-S">S</nt>
2524 <nt def="NT-SystemLiteral">SystemLiteral</nt>
2527 <prod id="NT-NDataDecl"><lhs>NDataDecl</lhs>
2528 <rhs><nt def="NT-S">S</nt> 'NDATA' <nt def="NT-S">S</nt>
2529 <nt def="NT-Name">Name</nt></rhs>
2530 <vc def="not-declared"/></prod>
2532 If the <nt def="NT-NDataDecl">NDataDecl</nt> is present, this is a
2533 general <termref def="dt-unparsed">unparsed
2535 <vcnote id="not-declared">
2538 The <nt def="NT-Name">Name</nt> must match the declared name of a
2539 <termref def="dt-notation">notation</termref>.
2542 <p><termdef id="dt-sysid" term="System Identifier">The
2543 <nt def="NT-SystemLiteral">SystemLiteral</nt>
2556 <termref def="dt-docent">document entity</termref>, to the entity
2557 containing the <termref def="dt-doctype">external DTD subset</termref>,
2558 or to some other <termref def="dt-extent">external parameter entity</termref>.
2560 <p>An XML processor should handle a non-ASCII character in a URI by
2561 representing the character in UTF-8 as one or more bytes, and then
2565 <p><termdef id="dt-pubid" term="Public identifier">
2566 In addition to a system identifier, an external identifier may
2568 An XML processor attempting to retrieve the entity's content may use the public
2570 is unable to do so, it must use the URI specified in the system
2574 <p>Examples of external entity declarations:
2575 <eg>&lt;!ENTITY open-hatch
2577 &lt;!ENTITY open-hatch
2578 PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN"
2580 &lt;!ENTITY hatch-pic
2589 <div3 id="sec-TextDecl">
2591 <p>External parsed entities may each begin with a <term>text
2596 <prod id="NT-TextDecl"><lhs>TextDecl</lhs>
2598 <nt def="NT-VersionInfo">VersionInfo</nt>?
2599 <nt def="NT-EncodingDecl">EncodingDecl</nt>
2600 <nt def="NT-S">S</nt>? &pic;</rhs>
2608 an external parsed entity.</p>
2610 <div3 id="wf-entities">
2611 <head>Well-Formed Parsed Entities</head>
2612 <p>The document entity is well-formed if it matches the production labeled
2613 <nt def="NT-document">document</nt>.
2614 An external general
2615 parsed entity is well-formed if it matches the production labeled
2616 <nt def="NT-extParsedEnt">extParsedEnt</nt>.
2617 An external parameter
2618 entity is well-formed if it matches the production labeled
2619 <nt def="NT-extPE">extPE</nt>.
2621 <head>Well-Formed External Parsed Entity</head>
2622 <prod id="NT-extParsedEnt"><lhs>extParsedEnt</lhs>
2623 <rhs><nt def="NT-TextDecl">TextDecl</nt>?
2624 <nt def="NT-content">content</nt></rhs>
2626 <prod id="NT-extPE"><lhs>extPE</lhs>
2627 <rhs><nt def="NT-TextDecl">TextDecl</nt>?
2628 <nt def="NT-extSubsetDecl">extSubsetDecl</nt></rhs>
2631 An internal general parsed entity is well-formed if its replacement text
2633 <nt def="NT-content">content</nt>.
2634 All internal parameter entities are well-formed by definition.
2636 <p>A consequence of well-formedness in entities is that the logical
2638 <termref def="dt-stag">start-tag</termref>,
2639 <termref def="dt-etag">end-tag</termref>,
2640 <termref def="dt-empty">empty-element tag</termref>,
2641 <termref def="dt-element">element</termref>,
2642 <termref def="dt-comment">comment</termref>,
2643 <termref def="dt-pi">processing instruction</termref>,
2644 <termref def="dt-charref">character
2646 <termref def="dt-entref">entity reference</termref>
2652 <p>Each external parsed entity in an XML document may use a different
2654 entities in either UTF-8 or UTF-16.
2657 <p>Entities encoded in UTF-16 must
2659 Unicode Appendix B (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF).
2662 XML processors must be able to use this character to
2663 differentiate between UTF-8 and UTF-16 encoded documents.</p>
2665 the UTF-8 and UTF-16 encodings, it is recognized that other encodings are
2667 to read entities that use them.
2669 UTF-8 or UTF-16 must begin with a <titleref href="TextDecl">text
2673 <prod id="NT-EncodingDecl"><lhs>EncodingDecl</lhs>
2674 <rhs><nt def="NT-S">S</nt>
2675 'encoding' <nt def="NT-Eq">Eq</nt>
2676 ('"' <nt def="NT-EncName">EncName</nt> '"' |
2677 "'" <nt def="NT-EncName">EncName</nt> "'" )
2680 <prod id="NT-EncName"><lhs>EncName</lhs>
2681 <rhs>[A-Za-z] ([A-Za-z0-9._] | '-')*</rhs>
2685 In the <termref def="dt-docent">document entity</termref>, the encoding
2686 declaration is part of the <termref def="dt-xmldecl">XML declaration</termref>.
2687 The <nt def="NT-EncName">EncName</nt> is the name of the encoding used.
2689 <!-- FINAL EDIT: check name of IANA and charset names -->
2691 "<code>UTF-8</code>",
2692 "<code>UTF-16</code>",
2693 "<code>ISO-10646-UCS-2</code>", and
2694 "<code>ISO-10646-UCS-4</code>" should be
2697 "<code>ISO-8859-1</code>",
2698 "<code>ISO-8859-2</code>", ...
2699 "<code>ISO-8859-9</code>" should be used for the parts of ISO 8859, and
2701 "<code>ISO-2022-JP</code>",
2703 "<code>EUC-JP</code>"
2704 should be used for the various encoded forms of JIS X-0208-1997. XML
2710 using their registered names.
2711 Note that these registered names are defined to be
2712 case-insensitive, so processors wishing to match against them
2713 should do so in a case-insensitive
2715 <p>In the absence of information provided by an external
2717 it is an <termref def="dt-error">error</termref> for an entity including
2721 of an external entity, or for
2723 declaration to use an encoding other than UTF-8.
2725 is a subset of UTF-8, ordinary ASCII entities do not strictly need
2728 <p>It is a <termref def="dt-fatal">fatal error</termref> when an XML processor
2731 <eg>&lt;?xml encoding='UTF-8'?&gt;
2732 &lt;?xml encoding='EUC-JP'?&gt;</eg></p>
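<p>As an illustrative sketch, an external parsed entity stored in an encoding other than UTF-8 or UTF-16 (ISO-8859-1 is assumed here purely for the sake of example) could begin with a text declaration such as:
<eg>&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;</eg></p>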
2739 required behavior of an <termref def="dt-xml-proc">XML processor</termref> in
2745 anywhere after the <termref def="dt-stag">start-tag</termref> and
2746 before the <termref def="dt-etag">end-tag</termref> of an element; corresponds
2747 to the nonterminal <nt def="NT-content">content</nt>.</p></def>
2752 <termref def="dt-stag">start-tag</termref>, or a default
2753 value in an <termref def="dt-attdecl">attribute declaration</termref>;
2755 <nt def="NT-AttValue">AttValue</nt>.</p></def></gitem>
2758 <def><p>as a <nt def="NT-Name">Name</nt>, not a reference, appearing either as
2761 the space-separated tokens in the value of an attribute which has been
2767 <termref def="dt-litentval">literal entity value</termref> in
2769 <nt def="NT-EntityValue">EntityValue</nt>.</p></def></gitem>
2771 <def><p>as a reference within either the internal or external subsets of the
2772 <termref def="dt-doctype">DTD</termref>, but outside
2773 of an <nt def="NT-EntityValue">EntityValue</nt> or
2774 <nt def="NT-AttValue">AttValue</nt>.</p></def>
2787 <td bgcolor="&cellback;">External Parsed
2795 <td bgcolor="&cellback;"><titleref href="not-recognized">Not recognized</titleref></td>
2797 <td bgcolor="&cellback;"><titleref href="include-if-valid">Included if validating</titleref></td>
2804 <td bgcolor="&cellback;"><titleref href="not-recognized">Not recognized</titleref></td>
2813 <td bgcolor="&cellback;"><titleref href="not-recognized">Not recognized</titleref></td>
2814 <td bgcolor="&cellback;"><titleref href="forbidden">Forbidden</titleref></td>
2815 <td bgcolor="&cellback;"><titleref href="forbidden">Forbidden</titleref></td>
2831 <td bgcolor="&cellback;"><titleref href="as-PE">Included as PE</titleref></td>
2839 <div3 id="not-recognized">
2843 DTD are not recognized as markup in <nt def="NT-content">content</nt>.
2844 Similarly, the names of unparsed entities are not recognized except
2850 <p><termdef id="dt-include" term="Include">An entity is
2852 <termref def="dt-repltext">replacement text</termref> is retrieved
2857 <termref def="dt-chardata">character data</termref>
2858 and (except for parameter entities) <termref def="dt-markup">markup</termref>,
2864 as an entity-reference delimiter.)
2869 <div3 id="include-if-valid">
2872 to <termref def="dt-valid">validate</termref>
2874 <termref def="dt-include">include</termref> its
2876 If the entity is external, and the processor is not
2878 processor <termref def="dt-may">may</termref>, but need not,
2880 If a non-validating parser does not include the replacement text,
2887 Browsers, for example, when encountering an external parsed entity reference,
2895 <termref def="dt-fatal">fatal</termref> errors:
2898 <termref def="dt-unparsed">unparsed entity</termref>.
2900 <item><p>the appearance of any character or general-entity reference in the
2901 DTD except within an <nt def="NT-EntityValue">EntityValue</nt> or
2902 <nt def="NT-AttValue">AttValue</nt>.</p></item>
2903 <item><p>a reference to an external entity in an attribute value.</p>
2910 <p>When an <termref def="dt-entref">entity reference</termref> appears in an
2912 value, its <termref def="dt-repltext">replacement text</termref> is
2918 For example, this is well-formed:
2923 &lt;element attribute='a-&amp;EndAttr;&gt;</eg>
2927 <p>When the name of an <termref def="dt-unparsed">unparsed
2931 application of the <termref def="dt-sysid">system</termref>
2932 and <termref def="dt-pubid">public</termref> (if any)
2934 <termref def="dt-notation">notation</termref>.</p>
2939 <nt def="NT-EntityValue">EntityValue</nt> in an entity declaration,
2942 <div3 id="as-PE">
2944 <p>Just as with external parsed entities, parameter entities
2945 need only be <titleref href="include-if-valid">included if
2947 When a parameter-entity reference is recognized in the DTD
2949 <termref def="dt-repltext">replacement
2958 <div2 id="intern-replacement">
2963 <termdef id="dt-litentval" term="Literal Entity Value">The <term>literal
2966 non-terminal <nt def="NT-EntityValue">EntityValue</nt>.</termdef>
2967 <termdef id="dt-repltext" term="Replacement Text">The <term>replacement
2969 replacement of character references and parameter-entity
2975 (<nt def="NT-EntityValue">EntityValue</nt>) may contain character,
2976 parameter-entity, and general-entity references.
2980 <termref def="dt-include">included</termref> as described above
2985 general-entity references must be left as-is, unexpanded.
2995 The general-entity reference "<code>&amp;rights;</code>" would be expanded
3000 <specref ref="sec-entexpand"/>.
3004 <div2 id="sec-predefined-ent">
3006 <p><termdef id="dt-escape" term="escape">Entity and character
3018 <termref def="dt-interop">For interoperability</termref>,
3034 be well-formed.
3041 <p><termdef id="dt-notation" term="Notation"><term>Notations</term> identify by
3042 name the format of <termref def="dt-extent">unparsed
3046 a <termref def="dt-pi">processing instruction</termref> is
3048 <p><termdef id="dt-notdecl" term="Notation Declaration">
3050 provide a name for the notation, for use in
3051 entity and attribute-list declarations and in attribute specifications,
3052 and an external identifier for the notation which may allow an XML
3057 <prod id="NT-NotationDecl"><lhs>NotationDecl</lhs>
3058 <rhs>'&lt;!NOTATION' <nt def="NT-S">S</nt> <nt def="NT-Name">Name</nt>
3059 <nt def="NT-S">S</nt>
3060 (<nt def="NT-ExternalID">ExternalID</nt> |
3061 <nt def="NT-PublicID">PublicID</nt>)
3062 <nt def="NT-S">S</nt>? '&gt;'</rhs></prod>
3063 <prod id="NT-PublicID"><lhs>PublicID</lhs>
3064 <rhs>'PUBLIC' <nt def="NT-S">S</nt>
3065 <nt def="NT-PubidLiteral">PubidLiteral</nt>
3069 <p>XML processors must provide applications with the name and external
3072 additionally resolve the external identifier into the
3073 <termref def="dt-sysid">system identifier</termref>,
3077 notations for which notation-specific applications are not available on
3082 <div2 id="sec-doc-entity">
3085 <p><termdef id="dt-docent" term="Document Entity">The <term>document
3087 tree and a starting-point for an <termref def="dt-xml-proc">XML
3098 <!-- &Conformance; -->
3100 <div1 id="sec-conformance">
3103 <div2 id="proc-types">
3104 <head>Validating and Non-Validating Processors</head>
3105 <p>Conforming <termref def="dt-xml-proc">XML processors</termref> fall into two
3106 classes: validating and non-validating.</p>
3107 <p>Validating and non-validating processors alike must report
3108 violations of this specification's well-formedness constraints
3110 <termref def="dt-docent">document entity</termref> and any
3111 other <termref def="dt-parsedent">parsed entities</termref> that
3113 <p><termdef id="dt-validating" term="Validating Processor">
3116 <termref def="dt-doctype">DTD</termref>, and
3121 DTD and all external parsed entities referenced in the document.
3123 <p>Non-validating processors are required to check only the
3124 <termref def="dt-docent">document entity</termref>, including
3125 the entire internal DTD subset, for well-formedness.
3126 <termdef id="dt-use-mdecl" term="Process Declarations">
3134 use the information in those declarations to
3138 <titleref href="sec-attr-defaults">default attribute values</titleref>.
3140 They must not <termref def="dt-use-mdecl">process</termref>
3141 <termref def="dt-entdecl">entity declarations</termref> or
3142 <termref def="dt-attdecl">attribute-list declarations</termref>
3147 <div2 id="safe-behavior">
3150 must read every piece of a document and report all well-formedness and
3152 Less is required of a non-validating processor; it need not read any
3156 <item><p>Certain well-formedness errors, specifically those that require
3157 reading external entities, may not be detected by a non-validating processor.
3159 <titleref href="wf-entdeclared">Entity Declared</titleref>,
3160 <titleref href="wf-textent">Parsed Entity</titleref>, and
3161 <titleref href="wf-norecursion">No Recursion</titleref>, as well
3167 parameter and external entities.
3168 For example, a non-validating processor may not
3172 <titleref href="sec-attr-defaults">default attribute values</titleref>,
3174 external or parameter entities.</p></item>
3178 processors, applications which use non-validating processors should not
3180 Applications which require facilities such as the use of default
3181 attributes or internal entities which are declared in external
3182 entities should use validating XML processors.</p>
3186 <div1 id="sec-notation">
3190 Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines
3200 <p>Within the expression on the right-hand side of a rule, the following
3207 (UCS-4)
3213 encoding in use and is not significant for XML.</p></def>
3216 <label><code>[a-zA-Z]</code>, <code>[#xN-#xN]</code></label>
3217 <def><p>matches any <termref def="dt-character">character</termref>
3221 <label><code>[^a-z]</code>, <code>[^#xN-#xN]</code></label>
3222 <def><p>matches any <termref def="dt-character">character</termref>
3228 <def><p>matches any <termref def="dt-character">character</termref>
3233 <def><p>matches a literal string <termref def="dt-match">matching</termref>
3238 <def><p>matches a literal string <termref def="dt-match">matching</termref>
3263 <label><code>A - B</code></label>
3286 <def><p>well-formedness constraint; this identifies by name a
3288 <termref def="dt-wellformed">well-formed</termref> documents
3294 <termref def="dt-valid">valid</termref> documents associated with
3302 <!-- &SGML; -->
3305 <!-- &Biblio; -->
3306 <div1 id="sec-bibliography">
3309 <div2 id="sec-existing-stds">
3314 (Internet Assigned Numbers Authority) <emph>Official Names for
3317 …oc href="ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets">ftp://ftp.isi.edu/in-notes/ia…
3330 Code for the representation of names of languages.</emph>
3336 <emph>ISO 3166-1:1997 (E).
3337 Codes for the representation of names of countries and their subdivisions
3344 <emph>ISO/IEC 10646-1993 (E). Information technology &mdash; Universal
3345 Multiple-Octet Coded Character Set (UCS) &mdash; Part 1:
3353 Reading, Mass.: Addison-Wesley Developers Press, 1996.</bibl>
3366 Reading: Addison-Wesley, 1986, rpt. corr. 1988.</bibl>
3368 <bibl id="Berners-Lee" xml-link="simple" key="Berners-Lee et al.">
3369 Berners-Lee, T., R. Fielding, and L. Masinter.
3375 <bibl id="ABK" key="Brüggemann-Klein">Brüggemann-Klein, Anne.
3378 S. 97-98. Springer-Verlag, Berlin 1992.
3379 Full Version in Theoretical Computer Science 120: 197-213, 1993.
3383 <bibl id="ABKDW" key="Brüggemann-Klein and Wood">Brüggemann-Klein, Anne,
3392 <loc href="http://www.w3.org/TR/NOTE-sgml-xml-971215">http://www.w3.org/TR/NOTE-sgml-xml-971215</lo…
3394 <bibl id="RFC1738" xml-link="simple" key="IETF RFC1738">
3397 ed. T. Berners-Lee, L. Masinter, M. McCahill.
3401 <bibl id="RFC1808" xml-link="simple" key="IETF RFC1808">
3408 <bibl id="RFC2141" xml-link="simple" key="IETF RFC2141">
3419 edition &mdash; 1986-10-15. [Geneva]: International Organization for
3426 <emph>ISO/IEC 10744-1992 (E). Information technology &mdash;
3427 Hypermedia/Time-based Structuring Language (HyTime).
3453 <prod id="NT-Letter"><lhs>Letter</lhs>
3454 <rhs><nt def="NT-BaseChar">BaseChar</nt>
3455 | <nt def="NT-Ideographic">Ideographic</nt></rhs> </prod>
3456 <prod id="NT-BaseChar"><lhs>BaseChar</lhs>
3457 <rhs>[#x0041-#x005A]
3458 |&nbsp;[#x0061-#x007A]
3459 |&nbsp;[#x00C0-#x00D6]
3460 |&nbsp;[#x00D8-#x00F6]
3461 |&nbsp;[#x00F8-#x00FF]
3462 |&nbsp;[#x0100-#x0131]
3463 |&nbsp;[#x0134-#x013E]
3464 |&nbsp;[#x0141-#x0148]
3465 |&nbsp;[#x014A-#x017E]
3466 |&nbsp;[#x0180-#x01C3]
3467 |&nbsp;[#x01CD-#x01F0]
3468 |&nbsp;[#x01F4-#x01F5]
3469 |&nbsp;[#x01FA-#x0217]
3470 |&nbsp;[#x0250-#x02A8]
3471 |&nbsp;[#x02BB-#x02C1]
3473 |&nbsp;[#x0388-#x038A]
3475 |&nbsp;[#x038E-#x03A1]
3476 |&nbsp;[#x03A3-#x03CE]
3477 |&nbsp;[#x03D0-#x03D6]
3482 |&nbsp;[#x03E2-#x03F3]
3483 |&nbsp;[#x0401-#x040C]
3484 |&nbsp;[#x040E-#x044F]
3485 |&nbsp;[#x0451-#x045C]
3486 |&nbsp;[#x045E-#x0481]
3487 |&nbsp;[#x0490-#x04C4]
3488 |&nbsp;[#x04C7-#x04C8]
3489 |&nbsp;[#x04CB-#x04CC]
3490 |&nbsp;[#x04D0-#x04EB]
3491 |&nbsp;[#x04EE-#x04F5]
3492 |&nbsp;[#x04F8-#x04F9]
3493 |&nbsp;[#x0531-#x0556]
3495 |&nbsp;[#x0561-#x0586]
3496 |&nbsp;[#x05D0-#x05EA]
3497 |&nbsp;[#x05F0-#x05F2]
3498 |&nbsp;[#x0621-#x063A]
3499 |&nbsp;[#x0641-#x064A]
3500 |&nbsp;[#x0671-#x06B7]
3501 |&nbsp;[#x06BA-#x06BE]
3502 |&nbsp;[#x06C0-#x06CE]
3503 |&nbsp;[#x06D0-#x06D3]
3505 |&nbsp;[#x06E5-#x06E6]
3506 |&nbsp;[#x0905-#x0939]
3508 |&nbsp;[#x0958-#x0961]
3509 |&nbsp;[#x0985-#x098C]
3510 |&nbsp;[#x098F-#x0990]
3511 |&nbsp;[#x0993-#x09A8]
3512 |&nbsp;[#x09AA-#x09B0]
3514 |&nbsp;[#x09B6-#x09B9]
3515 |&nbsp;[#x09DC-#x09DD]
3516 |&nbsp;[#x09DF-#x09E1]
3517 |&nbsp;[#x09F0-#x09F1]
3518 |&nbsp;[#x0A05-#x0A0A]
3519 |&nbsp;[#x0A0F-#x0A10]
3520 |&nbsp;[#x0A13-#x0A28]
3521 |&nbsp;[#x0A2A-#x0A30]
3522 |&nbsp;[#x0A32-#x0A33]
3523 |&nbsp;[#x0A35-#x0A36]
3524 |&nbsp;[#x0A38-#x0A39]
3525 |&nbsp;[#x0A59-#x0A5C]
3527 |&nbsp;[#x0A72-#x0A74]
3528 |&nbsp;[#x0A85-#x0A8B]
3530 |&nbsp;[#x0A8F-#x0A91]
3531 |&nbsp;[#x0A93-#x0AA8]
3532 |&nbsp;[#x0AAA-#x0AB0]
3533 |&nbsp;[#x0AB2-#x0AB3]
3534 |&nbsp;[#x0AB5-#x0AB9]
3537 |&nbsp;[#x0B05-#x0B0C]
3538 |&nbsp;[#x0B0F-#x0B10]
3539 |&nbsp;[#x0B13-#x0B28]
3540 |&nbsp;[#x0B2A-#x0B30]
3541 |&nbsp;[#x0B32-#x0B33]
3542 |&nbsp;[#x0B36-#x0B39]
3544 |&nbsp;[#x0B5C-#x0B5D]
3545 |&nbsp;[#x0B5F-#x0B61]
3546 |&nbsp;[#x0B85-#x0B8A]
3547 |&nbsp;[#x0B8E-#x0B90]
3548 |&nbsp;[#x0B92-#x0B95]
3549 |&nbsp;[#x0B99-#x0B9A]
3551 |&nbsp;[#x0B9E-#x0B9F]
3552 |&nbsp;[#x0BA3-#x0BA4]
3553 |&nbsp;[#x0BA8-#x0BAA]
3554 |&nbsp;[#x0BAE-#x0BB5]
3555 |&nbsp;[#x0BB7-#x0BB9]
3556 |&nbsp;[#x0C05-#x0C0C]
3557 |&nbsp;[#x0C0E-#x0C10]
3558 |&nbsp;[#x0C12-#x0C28]
3559 |&nbsp;[#x0C2A-#x0C33]
3560 |&nbsp;[#x0C35-#x0C39]
3561 |&nbsp;[#x0C60-#x0C61]
3562 |&nbsp;[#x0C85-#x0C8C]
3563 |&nbsp;[#x0C8E-#x0C90]
3564 |&nbsp;[#x0C92-#x0CA8]
3565 |&nbsp;[#x0CAA-#x0CB3]
3566 |&nbsp;[#x0CB5-#x0CB9]
3568 |&nbsp;[#x0CE0-#x0CE1]
3569 |&nbsp;[#x0D05-#x0D0C]
3570 |&nbsp;[#x0D0E-#x0D10]
3571 |&nbsp;[#x0D12-#x0D28]
3572 |&nbsp;[#x0D2A-#x0D39]
3573 |&nbsp;[#x0D60-#x0D61]
3574 |&nbsp;[#x0E01-#x0E2E]
3576 |&nbsp;[#x0E32-#x0E33]
3577 |&nbsp;[#x0E40-#x0E45]
3578 |&nbsp;[#x0E81-#x0E82]
3580 |&nbsp;[#x0E87-#x0E88]
3583 |&nbsp;[#x0E94-#x0E97]
3584 |&nbsp;[#x0E99-#x0E9F]
3585 |&nbsp;[#x0EA1-#x0EA3]
3588 |&nbsp;[#x0EAA-#x0EAB]
3589 |&nbsp;[#x0EAD-#x0EAE]
3591 |&nbsp;[#x0EB2-#x0EB3]
3593 |&nbsp;[#x0EC0-#x0EC4]
3594 |&nbsp;[#x0F40-#x0F47]
3595 |&nbsp;[#x0F49-#x0F69]
3596 |&nbsp;[#x10A0-#x10C5]
3597 |&nbsp;[#x10D0-#x10F6]
3599 |&nbsp;[#x1102-#x1103]
3600 |&nbsp;[#x1105-#x1107]
3602 |&nbsp;[#x110B-#x110C]
3603 |&nbsp;[#x110E-#x1112]
3610 |&nbsp;[#x1154-#x1155]
3612 |&nbsp;[#x115F-#x1161]
3617 |&nbsp;[#x116D-#x116E]
3618 |&nbsp;[#x1172-#x1173]
3623 |&nbsp;[#x11AE-#x11AF]
3624 |&nbsp;[#x11B7-#x11B8]
3626 |&nbsp;[#x11BC-#x11C2]
3630 |&nbsp;[#x1E00-#x1E9B]
3631 |&nbsp;[#x1EA0-#x1EF9]
3632 |&nbsp;[#x1F00-#x1F15]
3633 |&nbsp;[#x1F18-#x1F1D]
3634 |&nbsp;[#x1F20-#x1F45]
3635 |&nbsp;[#x1F48-#x1F4D]
3636 |&nbsp;[#x1F50-#x1F57]
3640 |&nbsp;[#x1F5F-#x1F7D]
3641 |&nbsp;[#x1F80-#x1FB4]
3642 |&nbsp;[#x1FB6-#x1FBC]
3644 |&nbsp;[#x1FC2-#x1FC4]
3645 |&nbsp;[#x1FC6-#x1FCC]
3646 |&nbsp;[#x1FD0-#x1FD3]
3647 |&nbsp;[#x1FD6-#x1FDB]
3648 |&nbsp;[#x1FE0-#x1FEC]
3649 |&nbsp;[#x1FF2-#x1FF4]
3650 |&nbsp;[#x1FF6-#x1FFC]
3652 |&nbsp;[#x212A-#x212B]
3654 |&nbsp;[#x2180-#x2182]
3655 |&nbsp;[#x3041-#x3094]
3656 |&nbsp;[#x30A1-#x30FA]
3657 |&nbsp;[#x3105-#x312C]
3658 |&nbsp;[#xAC00-#xD7A3]
3660 <prod id="NT-Ideographic"><lhs>Ideographic</lhs>
3661 <rhs>[#x4E00-#x9FA5]
3663 |&nbsp;[#x3021-#x3029]
3665 <prod id="NT-CombiningChar"><lhs>CombiningChar</lhs>
3666 <rhs>[#x0300-#x0345]
3667 |&nbsp;[#x0360-#x0361]
3668 |&nbsp;[#x0483-#x0486]
3669 |&nbsp;[#x0591-#x05A1]
3670 |&nbsp;[#x05A3-#x05B9]
3671 |&nbsp;[#x05BB-#x05BD]
3673 |&nbsp;[#x05C1-#x05C2]
3675 |&nbsp;[#x064B-#x0652]
3677 |&nbsp;[#x06D6-#x06DC]
3678 |&nbsp;[#x06DD-#x06DF]
3679 |&nbsp;[#x06E0-#x06E4]
3680 |&nbsp;[#x06E7-#x06E8]
3681 |&nbsp;[#x06EA-#x06ED]
3682 |&nbsp;[#x0901-#x0903]
3684 |&nbsp;[#x093E-#x094C]
3686 |&nbsp;[#x0951-#x0954]
3687 |&nbsp;[#x0962-#x0963]
3688 |&nbsp;[#x0981-#x0983]
3692 |&nbsp;[#x09C0-#x09C4]
3693 |&nbsp;[#x09C7-#x09C8]
3694 |&nbsp;[#x09CB-#x09CD]
3696 |&nbsp;[#x09E2-#x09E3]
3701 |&nbsp;[#x0A40-#x0A42]
3702 |&nbsp;[#x0A47-#x0A48]
3703 |&nbsp;[#x0A4B-#x0A4D]
3704 |&nbsp;[#x0A70-#x0A71]
3705 |&nbsp;[#x0A81-#x0A83]
3707 |&nbsp;[#x0ABE-#x0AC5]
3708 |&nbsp;[#x0AC7-#x0AC9]
3709 |&nbsp;[#x0ACB-#x0ACD]
3710 |&nbsp;[#x0B01-#x0B03]
3712 |&nbsp;[#x0B3E-#x0B43]
3713 |&nbsp;[#x0B47-#x0B48]
3714 |&nbsp;[#x0B4B-#x0B4D]
3715 |&nbsp;[#x0B56-#x0B57]
3716 |&nbsp;[#x0B82-#x0B83]
3717 |&nbsp;[#x0BBE-#x0BC2]
3718 |&nbsp;[#x0BC6-#x0BC8]
3719 |&nbsp;[#x0BCA-#x0BCD]
3721 |&nbsp;[#x0C01-#x0C03]
3722 |&nbsp;[#x0C3E-#x0C44]
3723 |&nbsp;[#x0C46-#x0C48]
3724 |&nbsp;[#x0C4A-#x0C4D]
3725 |&nbsp;[#x0C55-#x0C56]
3726 |&nbsp;[#x0C82-#x0C83]
3727 |&nbsp;[#x0CBE-#x0CC4]
3728 |&nbsp;[#x0CC6-#x0CC8]
3729 |&nbsp;[#x0CCA-#x0CCD]
3730 |&nbsp;[#x0CD5-#x0CD6]
3731 |&nbsp;[#x0D02-#x0D03]
3732 |&nbsp;[#x0D3E-#x0D43]
3733 |&nbsp;[#x0D46-#x0D48]
3734 |&nbsp;[#x0D4A-#x0D4D]
3737 |&nbsp;[#x0E34-#x0E3A]
3738 |&nbsp;[#x0E47-#x0E4E]
3740 |&nbsp;[#x0EB4-#x0EB9]
3741 |&nbsp;[#x0EBB-#x0EBC]
3742 |&nbsp;[#x0EC8-#x0ECD]
3743 |&nbsp;[#x0F18-#x0F19]
3749 |&nbsp;[#x0F71-#x0F84]
3750 |&nbsp;[#x0F86-#x0F8B]
3751 |&nbsp;[#x0F90-#x0F95]
3753 |&nbsp;[#x0F99-#x0FAD]
3754 |&nbsp;[#x0FB1-#x0FB7]
3756 |&nbsp;[#x20D0-#x20DC]
3758 |&nbsp;[#x302A-#x302F]
3762 <prod id="NT-Digit"><lhs>Digit</lhs>
3763 <rhs>[#x0030-#x0039]
3764 |&nbsp;[#x0660-#x0669]
3765 |&nbsp;[#x06F0-#x06F9]
3766 |&nbsp;[#x0966-#x096F]
3767 |&nbsp;[#x09E6-#x09EF]
3768 |&nbsp;[#x0A66-#x0A6F]
3769 |&nbsp;[#x0AE6-#x0AEF]
3770 |&nbsp;[#x0B66-#x0B6F]
3771 |&nbsp;[#x0BE7-#x0BEF]
3772 |&nbsp;[#x0C66-#x0C6F]
3773 |&nbsp;[#x0CE6-#x0CEF]
3774 |&nbsp;[#x0D66-#x0D6F]
3775 |&nbsp;[#x0E50-#x0E59]
3776 |&nbsp;[#x0ED0-#x0ED9]
3777 |&nbsp;[#x0F20-#x0F29]
3779 <prod id="NT-Extender"><lhs>Extender</lhs>
3788 |&nbsp;[#x3031-#x3035]
3789 |&nbsp;[#x309D-#x309E]
3790 |&nbsp;[#x30FC-#x30FE]
3804 <p>Name characters other than Name-start characters
3810 names.</p>
3814 with a "compatibility formatting tag" in field 5 of the database --
3818 <p>The following characters are treated as name-start characters
3820 them as Alphabetic: [#x02BB-#x02C1], #x0559, #x06E5, #x06E6.</p>
3823 <p>Characters #x20DD-#x20E0 are excluded (in accordance with
3835 <p>Characters ':' and '_' are allowed as name-start characters.</p>
3838 <p>Characters '-' and '.' are allowed as name characters.</p>
3843 <inform-div1 id="sec-xml-and-sgml">
3847 <termref def="dt-valid">valid</termref> XML document should also be a
3852 </inform-div1>
3853 <inform-div1 id="sec-entexpand">
3856 sequence of entity- and character-reference recognition and
3874 start- and end-tags of the "<code>p</code>" element will be recognized
3890 5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
3906 "<code>&lt;!ENTITY tricky "error-prone" &gt;</code>",
3907 which is a well-formed entity declaration.</p></item>
3912 ("<code>&lt;!ENTITY tricky "error-prone" &gt;</code>") is parsed.
3914 declared, with the replacement text "<code>error-prone</code>".</p></item>
3918 "<code>test</code>" element is the self-describing (and ungrammatical) string
3919 <emph>This sample shows a error-prone method.</emph>
3923 </inform-div1>
3924 <inform-div1 id="determinism">
3926 <p><termref def="dt-compat">For compatibility</termref>, it is
3930 <!-- FINAL EDIT: WebSGML allows ambiguity? -->
3934 flag non-deterministic content models as errors.</p>
3936 non-deterministic, because given an initial <code>b</code> the parser
3960 <p>Algorithms exist which allow many but not all non-deterministic
3962 models; see Brüggemann-Klein 1991 <bibref ref="ABK"/>.</p>
3963 </inform-div1>
3964 <inform-div1 id="sec-guessing">
3967 entity, indicating which character encoding is in use. Before an XML
3969 know what character encoding is in use&mdash;which is what the internal label
3975 make it feasible to autodetect the character encoding in use in each
3981 (external) information. We consider the first case first.
3984 Because each XML entity not in UTF-8 or UTF-16 format <emph>must</emph>
3988 In reading this list, it may help to know that in UCS-4, '&lt;' is
3990 Order Mark required of UTF-16 data streams is "<code>#xFEFF</code>".</p>
3994 <p><code>00 00 00 3C</code>: UCS-4, big-endian machine (1234 order)</p>
3997 <p><code>3C 00 00 00</code>: UCS-4, little-endian machine (4321 order)</p>
4000 <p><code>00 00 3C 00</code>: UCS-4, unusual octet order (2143)</p>
4003 <p><code>00 3C 00 00</code>: UCS-4, unusual octet order (3412)</p>
4006 <p><code>FE FF</code>: UTF-16, big-endian</p>
4009 <p><code>FF FE</code>: UTF-16, little-endian</p>
4012 <p><code>00 3C 00 3F</code>: UTF-16, big-endian, no Byte Order Mark
4016 <p><code>3C 00 3F 00</code>: UTF-16, little-endian, no Byte Order Mark
4020 <p><code>3C 3F 78 6D</code>: UTF-8, ISO 646, ASCII, some part of ISO 8859,
4021 Shift-JIS, EUC, or any other 7-bit, 8-bit, or mixed-width encoding
4026 use the same bit patterns for the ASCII characters, the encoding
4033 use)</p>
4036 <p>other: UTF-8 without an encoding declaration, or else
4044 declaration and parse the character-encoding identifier, which is
4046 of encodings (e.g. to tell UTF-8 from 8859, and the parts of 8859
4048 use, and so on).
4054 use. Since in practice, all widely used character encodings fall into
4056 reasonably reliable in-band labeling of character encodings, even when
4057 external sources of information at the operating-system or
4058 transport-protocol level are unreliable.
4061 Once the processor has detected the character encoding in use, it can
4067 Like any self-labeling system, the XML encoding declaration will not
4070 character-encoding routines should be careful to ensure the accuracy
4071 of the internal and external information used to label the entity.
4080 specified as part of the higher-level protocol used to deliver XML.
4082 MIME-type label in an external header, for example, should be part of the
4087 <item><p>If an XML entity is in a file, the Byte-Order Mark
4088 and encoding-declaration PI are used (if present) to determine the
4100 MIME type of application/xml, then the Byte-Order Mark and
4101 encoding-declaration PI are used (if present) to determine the
4106 These rules apply only in the absence of protocol-level documentation;
4112 </inform-div1>
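The byte-signature list in the autodetection appendix can be sketched as a small lookup. This is a minimal illustration, not part of the specification: the function name `guess_encoding_family` and the table `_SIGNATURES` are hypothetical, only the signatures visible in this listing are included, and the returned strings are descriptive labels rather than registered encoding names. A real processor would go on to read the encoding declaration to pick the exact encoding within the family.

```python
# Hypothetical helper: maps the leading octets of an XML entity to the
# encoding family named in the autodetection list. Only the signatures
# shown in this listing are covered; others are deliberately left out.
_SIGNATURES = [
    (b"\x00\x00\x00\x3c", "UCS-4, big-endian (1234 order)"),
    (b"\x3c\x00\x00\x00", "UCS-4, little-endian (4321 order)"),
    (b"\x00\x00\x3c\x00", "UCS-4, unusual octet order (2143)"),
    (b"\x00\x3c\x00\x00", "UCS-4, unusual octet order (3412)"),
    (b"\xfe\xff", "UTF-16, big-endian"),
    (b"\xff\xfe", "UTF-16, little-endian"),
    (b"\x00\x3c\x00\x3f", "UTF-16, big-endian, no Byte Order Mark"),
    (b"\x3c\x00\x3f\x00", "UTF-16, little-endian, no Byte Order Mark"),
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible; read the encoding declaration"),
]


def guess_encoding_family(data: bytes) -> str:
    """Return a label for the first signature that prefixes `data`."""
    for signature, family in _SIGNATURES:
        if data.startswith(signature):
            return family
    # No signature matched: the list's final "other" case.
    return "other"


if __name__ == "__main__":
    for sample in (b"<?xml version='1.0'?>",
                   "<?xml".encode("utf-16-be"),
                   b"<doc/>"):
        print(sample[:4].hex(" "), "->", guess_encoding_family(sample))
```

The table is scanned in order, and none of the listed signatures is a prefix of another, so the first match is unambiguous. An entity that matches nothing falls through to the "other" case, which the list describes as UTF-8 without an encoding declaration, among other possibilities.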
4114 <inform-div1 id="sec-xml-wg">
4125 <member><name>Tim Bray, Textuality and Netscape</name><role>XML Co-editor</role></member>
4126 <member><name>Jean Paoli, Microsoft</name><role>XML Co-editor</role></member>
4127 <member><name>C. M. Sperberg-McQueen, U. of Ill.</name><role>XML
4128 Co-editor</role></member>
4144 </inform-div1>
4147 <!-- Keep this comment at the end of the file
4150 sgml-default-dtd-file:"~/sgml/spec.ced"
4151 sgml-omittag:t
4152 sgml-shorttag:t
4154 -->