<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html">
  <style type="text/css"></style>
<!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
  </style>
-->
  <title>XML resources publication guidelines</title>
</head>

<body bgcolor="#fffacd" text="#000000">
<h1 align="center">XML resources publication guidelines</h1>

<p></p>

<p>The goal of this document is to provide a set of guidelines and tips
to help with the publication and deployment of <a
href="http://www.w3.org/XML/">XML</a> resources for the <a
href="http://www.gnome.org/">GNOME project</a>. However, it is not tied to
GNOME and may be helpful more generally. I welcome <a
href="mailto:veillard@redhat.com">feedback</a> on this document.</p>

<p>The intended audience is software developers who have started using XML
for some of the resources of their project, as a storage format or for data
exchange, checking or transformations. An increasing number of new XML
formats have been defined, but not all the steps needed to truly gain the
benefits of XML have been taken, possibly for lack of documentation.
These guidelines aim to improve the situation and give a better overview of
XML processing as a whole and of the steps needed to deploy it
successfully.</p>

<p>Table of contents:</p>
<ol>
  <li><a href="#Design">Design guidelines</a></li>
  <li><a href="#Canonical">Canonical URL</a></li>
  <li><a href="#Catalog">Catalog setup</a></li>
  <li><a href="#Package">Package integration</a></li>
</ol>

<h2><a name="Design">Design guidelines</a></h2>

<p>This part focuses on the design of the XML format itself. It may arrive
a bit too late, since the structure of the document may already be cast in
existing and deployed code. Still, here are a few rules which may be helpful
when designing a new XML vocabulary or revising an existing
format:</p>

<h3>Reuse existing formats:</h3>

<p>This may sound a bit simplistic, but before designing your own format,
try to look up existing XML vocabularies for similar data. Ideally this allows
you to reuse them, in which case many existing tools such as DTDs, schemas
and stylesheets may already be available. If you are looking for a
documentation format, <a href="http://www.docbook.org/">DocBook</a> should
handle your needs. If reuse is not possible because some semantic or use case
aspects are too different, studying those vocabularies will still help you
avoid design errors such as targeting the vocabulary at the wrong abstraction
level. In this format design phase, try to be concise, make sure you express
the real content of your data, and use the XML structure to express the
semantics and context of that data.</p>

<h3>DTD rules:</h3>

<p>Building a DTD (Document Type Definition) or a Schema describing the
structure allowed in instances is the core of the design process of the
vocabulary.
Here are a few tips:</p>
<ul>
  <li>use meaningful words for element and attribute names.</li>
  <li>do not use attributes for general textual content: the parser
    normalizes attribute values before they reach the application, so
    spaces and line breaks will be modified.</li>
  <li>use a separate element for every string that might be subject to
    localization. The canonical way to localize XML content is to use
    sibling elements carrying different xml:lang attributes, as in the
    following:
    <pre>&lt;welcome&gt;
  &lt;msg xml:lang="en"&gt;hello&lt;/msg&gt;
  &lt;msg xml:lang="fr"&gt;bonjour&lt;/msg&gt;
&lt;/welcome&gt;</pre>
  </li>
  <li>use attributes to refine the content of an element but avoid them for
    more complex tasks: parsing an attribute is not cheaper than parsing an
    element, and it is far easier to make element content more complex later,
    while attributes have to remain very simple.</li>
</ul>

<h3>Versioning:</h3>

<p>As part of the design, make sure the structure you define remains usable
for future extensions that you may not be considering for the current version.
There are two parts to this:</p>
<ul>
  <li>Make sure the instance contains a version number, which makes
    backward compatibility easy to handle. Something as simple as having a
    <code>version="1.0"</code> attribute on the root element of the instance
    is sufficient.</li>
  <li>When designing the code that analyzes the data provided by the
    XML parser, make sure it can cope with unknown versions: if the version
    attribute is not in the recognized set, generate a UI warning and process
    only the tags your version recognizes, without breaking on unknown
    elements.</li>
</ul>

<h3>Other design parts:</h3>

<p>While defining your vocabulary, try to think in terms of other uses of your
data, for example how XSLT stylesheets could be used to produce an HTML
view of your data, or to convert it into a different format.
Looking at XML Schemas and defining one for your vocabulary is also
important: a schema provides more complete validation and datatyping of your
data structures, which helps avoid some mistakes in the design phase.</p>

<h3>Namespace:</h3>

<p>If you expect your XML vocabulary to be used or recognized outside of your
application (for example to bind specific processing in a graphical shell
like Nautilus to an instance of your data) then you should really define an <a
href="http://www.w3.org/TR/REC-xml-names/">XML namespace</a> for your
vocabulary. A namespace name is a URL (an absolute URI, more precisely). It is
generally recommended to anchor it as an HTTP resource on a server associated
with the software project; see the next section about this. In practice this
means that XML parsers will not handle your element names as-is but as a
pair made of the namespace name and the element name. This allows applications
to recognize your elements and disambiguate their processing. Uniqueness of
the namespace name can be guaranteed for the most part by the use of the DNS
registry. Namespaces can also be used to carry versioning information, for
example:</p>

<p><code>"http://www.gnome.org/project/projectname/1.0/"</code></p>

<p>An easy way to use a namespace is to make it the default namespace on the
root element of the XML instance, like this:</p>
<pre>&lt;structure xmlns="http://www.gnome.org/project/projectname/1.0/"&gt;
  &lt;data&gt;
  ...
  &lt;/data&gt;
&lt;/structure&gt;</pre>

<p>In that document, structure and all descendant elements such as data are in
the given namespace.</p>

<h2><a name="Canonical">Canonical URL</a></h2>

<p>As seen in the previous namespace section, while XML processing is not
tied to the Web, there is a natural synergy between the two. XML was designed
to be available on the Web, and keeping the infrastructure that way helps when
deploying XML resources.
The core of this issue is the notion of the
"Canonical URL" of an XML resource. The resource can be an XML document, a
DTD, a stylesheet, a schema, or even non-XML data associated with an XML
resource; the canonical URL is the URL where the "master" copy of that
resource is expected to be present on the Web. Usually when processing XML a
copy of the resource will be present on the local disk, maybe in
/usr/share/xml or /usr/share/sgml, maybe in /opt, or even in C:\projectname\
(horror!). The key point is that the way to name that resource should be
independent of the actual place where it resides on disk, if it is available
there, and that the processing should still work if there is no local copy
(provided the machine doing the processing is connected to the Internet).</p>

<p>What this really means is that one should never use the local name of a
resource to reference it but always use the canonical URL. For example in a
DocBook instance the following should not be used:</p>
<pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
    "/usr/share/xml/docbook/4.2/docbookx.dtd"&gt;</pre>

<p>Instead, always reference the canonical URL for the DTD:</p>
<pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
    "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"&gt;</pre>

<p>Similarly, the document instance may reference the <a
href="http://www.w3.org/TR/xslt">XSLT</a> stylesheets needed to process it to
generate HTML, and the canonical URL should be used:</p>
<pre>&lt;?xml-stylesheet
  href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"
  type="text/xsl"?&gt;</pre>

<p>Defining the canonical URL for the resources needed should obey a few
simple rules similar to those used to design namespace names:</p>
<ul>
  <li>use a DNS name you know is associated with the project and will be
    available in the long term</li>
  <li>within
that server space, reserve the subtree where you
    intend to keep that data</li>
  <li>version the URL so that multiple concurrent versions of the resources
    can be hosted simultaneously</li>
</ul>

<h2><a name="Catalog">Catalog setup</a></h2>

<h3>How catalogs work:</h3>

<p>Catalogs are the technical mechanism which allows XML processing
tools to use a local copy of a resource if it is available, even if the
instance document references the canonical URL. <a
href="http://www.oasis-open.org/committees/entity/">XML Catalogs</a> are
anchored in the root catalog (usually <code>/etc/xml/catalog</code> or
defined by the user). They form a tree of XML documents defining the mappings
between the canonical naming space and the locally installed resources; this
can be seen as a static cache structure.</p>

<p>When the XML processor is asked to process a resource, it will
automatically look for a locally available version in the catalog, starting
from the root catalog and possibly fetching sub-catalog resources, until it
determines whether the catalog has that resource or not. If not, the default
processing of fetching the resource from the Web is done, which in most
cases allows recovery from a catalog miss. The key point is that the document
instances are totally independent of the availability of a catalog and of
the actual place where the local resources they reference may be installed.
This greatly improves the management of the documents in the long run, making
them independent of the platform or toolchain used to process them. The
figure below tries to express that mechanism:<img src="catalog.gif"
alt="Picture describing the catalog "></p>

<h3>Usual catalog setup:</h3>

<p>Usually catalogs for a project are set up as a 2-level hierarchical cache,
the root catalog containing only "delegates" indicating a separate subcatalog
dedicated to the project.
The goal is to keep the root catalog clean and
simplify the maintenance of the catalog by using separate catalogs per
project. For example when creating a catalog for the <a
href="http://www.w3.org/TR/xhtml1">XHTML1</a> DTDs, only 3 items are added to
the root catalog:</p>
<pre>  &lt;delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
  &lt;delegateSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
  &lt;delegateURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
               catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;</pre>

<p>They are all "delegates", meaning that if the catalog system is asked to
resolve a reference matching one of them, it has to look up a subcatalog.
Here the subcatalog was installed as
<code>/usr/share/sgml/xhtml1/xmlcatalog</code> in the local tree. That
decision is left to the sysadmin or the packager for that system and may
obey different rules, but the actual place on the filesystem (or in a
resource cache on the local network) will not influence the processing as
long as it is available. The first rule indicates that if the reference uses a
PUBLIC identifier beginning with the</p>

<p><code>"-//W3C//DTD XHTML 1.0"</code></p>

<p>substring, then the catalog lookup should be limited to the given
subcatalog. Similarly the second and third entries give the
delegation rules for SYSTEM, DOCTYPE or normal URI references when the URL
starts with the <code>"http://www.w3.org/TR/xhtml1/DTD"</code> substring,
which indicates the location on the W3C server where the XHTML1 resources are
stored. This is the beginning of all Canonical URLs for XHTML1 resources.
Those three rules are sufficient in practice to capture all references to XHTML1
resources and direct the processing tools to the right subcatalog.</p>

<h3>A subcatalog example:</h3>

<p>Here is the complete subcatalog used for XHTML1:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
          "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="xhtml1-20020801/DTD/xhtml1-transitional.dtd"/&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Frameset//EN"
          uri="xhtml1-20020801/DTD/xhtml1-frameset.dtd"/&gt;
  &lt;rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                 rewritePrefix="xhtml1-20020801/DTD"/&gt;
  &lt;rewriteURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
              rewritePrefix="xhtml1-20020801/DTD"/&gt;
&lt;/catalog&gt;</pre>

<p>There are a few things to notice:</p>
<ul>
  <li>this is an XML resource: it points to its DTD using a Canonical URL, and
    the root element defines a namespace (but based on a URN, not an HTTP
    URL).</li>
  <li>it contains 5 rules: the first 3 are direct mappings for the 3
    PUBLIC identifiers defined by the XHTML1 specification, associating
    them with the local resource containing the DTD; the last 2 are
    rewrite rules which build the local filename for any URL based on
    "http://www.w3.org/TR/xhtml1/DTD". The local cache simplifies the rules by
    keeping the same structure as the on-line server at the Canonical URL</li>
  <li>the local resources are designated using URI references (the uri or
    rewritePrefix attributes), the base being the containing sub-catalog URL,
    which means that in practice the copy of the XHTML1 strict DTD is stored
    locally in
    <code>/usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-strict.dtd</code></li>
</ul>

<p>Those 5 rules are sufficient to cover all references to the resources held
at the Canonical URL for the XHTML1 DTDs.</p>

<h2><a name="Package">Package integration</a></h2>

<p>Creating and removing catalogs should be handled as part of the process of
(un)installing the local copy of the resources. The catalog files, being XML
resources, should be processed with XML-based tools to avoid problems with the
generated files; the xmlcatalog command that comes with libxml2 allows you to
create catalogs and to add or remove rules at that time. Here is a complete
example taken from the post-install script of the RPM for the XHTML1 DTDs.
While this example is platform and packaging specific, it can be useful as an
example in other contexts:</p>
<pre>%post
CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
#
# Register it in the super catalog with the appropriate delegates
#
ROOTCATALOG=/etc/xml/catalog

if [ ! -r $ROOTCATALOG ]
then
    /usr/bin/xmlcatalog --noout --create $ROOTCATALOG
fi

if [ -w $ROOTCATALOG ]
then
    /usr/bin/xmlcatalog --noout --add "delegatePublic" \
        "-//W3C//DTD XHTML 1.0" \
        "file://$CATALOG" $ROOTCATALOG
    /usr/bin/xmlcatalog --noout --add "delegateSystem" \
        "http://www.w3.org/TR/xhtml1/DTD" \
        "file://$CATALOG" $ROOTCATALOG
    /usr/bin/xmlcatalog --noout --add "delegateURI" \
        "http://www.w3.org/TR/xhtml1/DTD" \
        "file://$CATALOG" $ROOTCATALOG
fi</pre>

<p>The XHTML1 subcatalog is not created on-the-fly in that case; it is
installed as part of the files of the package.
So the only work needed is to
make sure the root catalog exists and register the delegate rules.</p>

<p>Similarly, the post-uninstall script just removes the rules from the
catalog:</p>
<pre>%postun
#
# On removal, unregister the xmlcatalog from the supercatalog
#
if [ "$1" = 0 ]; then
    CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
    ROOTCATALOG=/etc/xml/catalog

    if [ -w $ROOTCATALOG ]
    then
        /usr/bin/xmlcatalog --noout --del \
            "-//W3C//DTD XHTML 1.0" $ROOTCATALOG
        /usr/bin/xmlcatalog --noout --del \
            "http://www.w3.org/TR/xhtml1/DTD" $ROOTCATALOG
        /usr/bin/xmlcatalog --noout --del \
            "http://www.w3.org/TR/xhtml1/DTD" $ROOTCATALOG
    fi
fi</pre>

<p>Note the test against $1: it is needed to avoid removing the delegate rules
when the package is upgraded.</p>

<p>Following the guidelines and tips provided in this document should
help deploy XML resources in the GNOME framework without much pain and
ensure a smooth evolution of the resources and instances.</p>

<p><a href="mailto:veillard@redhat.com">Daniel Veillard</a></p>

<p>$Id$</p>

<p></p>
</body>
</html>