<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html">
  <style type="text/css">
<!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }-->

  </style>
  <title>XML resources publication guidelines</title>
</head>

<body bgcolor="#fffacd" text="#000000">
<h1 align="center">XML resources publication guidelines</h1>

<p></p>

<p>The goal of this document is to provide a set of guidelines and tips
to help with the publication and deployment of <a
href="http://www.w3.org/XML/">XML</a> resources for the <a
href="http://www.gnome.org/">GNOME project</a>. However, it is not tied to
GNOME and may be helpful more generally. I welcome <a
href="mailto:veillard@redhat.com">feedback</a> on this document.</p>

<p>The intended audience is software developers who have started using XML
for some of the resources of their project, as a storage format or for data
exchange, checking or transformations. An increasing number of new XML
formats have been defined, but not all the steps needed to truly gain the
benefits of using XML have been taken, possibly for lack of documentation.
These guidelines hope to improve matters and to provide a better overview of
the overall XML processing and the associated steps needed to deploy it
successfully:</p>

<p>Table of contents:</p>
<ol>
  <li><a href="#Design">Design guidelines</a></li>
  <li><a href="#Canonical">Canonical URL</a></li>
  <li><a href="#Catalog">Catalog setup</a></li>
  <li><a href="#Package">Package integration</a></li>
</ol>

<h2><a name="Design">Design guidelines</a></h2>

<p>This part focuses on the design of the XML format itself. These rules may
arrive a bit too late, since the structure of the documents may already be
cast in existing and deployed code. Still, here are a few rules which may be
helpful when designing a new XML vocabulary or revising an existing
format:</p>

<h3>Reuse existing formats:</h3>

<p>This may sound a bit simplistic, but before designing your own format,
try to look up existing XML vocabularies for similar data. Ideally this allows
you to reuse them, in which case a lot of existing tools like DTDs, schemas
and stylesheets may already be available. If you are looking for a
documentation format, <a href="http://www.docbook.org/">DocBook</a> should
handle your needs. If reuse is not possible because some semantic or use-case
aspects are too different, the survey will still help you avoid design errors
such as targeting the vocabulary at the wrong abstraction level. In this
format design phase, try to be synthetic: be sure to express the real content
of your data, and use the XML structure to express the semantics and context
of those data.</p>

<h3>DTD rules:</h3>

<p>Building a DTD (Document Type Definition) or a Schema describing the
structure allowed for instances is the core of the design process of the
vocabulary.
Here are a few tips:</p>
<ul>
  <li>use meaningful words for element and attribute names</li>
  <li>do not use attributes for textual content: attribute values are
    normalized (and thus possibly modified) by the parser before reaching the
    application</li>
  <li>use a single element for every string which might be subject to
    localization. The canonical way to localize XML content is to use
    sibling elements carrying different xml:lang attributes, as in the
    following:
    <pre>&lt;welcome&gt;
  &lt;msg xml:lang="en"&gt;hello&lt;/msg&gt;
  &lt;msg xml:lang="fr"&gt;bonjour&lt;/msg&gt;
&lt;/welcome&gt;</pre>
  </li>
  <li>use attributes to refine the content of an element but avoid them for
    more complex tasks: parsing an attribute is not cheaper than parsing an
    element, and it is far easier to make element content more complex, while
    attributes have to remain very simple.</li>
</ul>

<h3>Versioning:</h3>

<p>As part of the design, make sure the structure you define will be usable
for future extensions that you may not be considering for the current
version. There are 2 parts to this:</p>
<ul>
  <li>make sure the instance contains a version number, which makes
    backward compatibility easy to handle; something as simple as having a
    <code>version="1.0"</code> attribute on the root element of the instance
    is sufficient</li>
  <li>while designing the code that analyzes the data provided by the
    XML parser, make sure it can work with unknown versions: generate a UI
    warning and process only the tags recognized by your version, and keep in
    mind that you should not break on unknown elements if the version
    attribute was not in the recognized set.</li>
</ul>

<h3>Other design parts:</h3>

<p>While defining your vocabulary, try to think in terms of other uses of
your data, for example how XSLT stylesheets could be used to make an HTML
view of your data, or to convert it into a different format.
Looking at XML Schemas and defining an XML Schema providing more complete
validation and datatyping of your data structures is also important: it helps
avoid some mistakes in the design phase.</p>

<h3>Namespace:</h3>

<p>If you expect your XML vocabulary to be used or recognized outside of your
application (for example to bind specific processing from a graphical shell
like Nautilus to instances of your data) then you should really define an <a
href="http://www.w3.org/TR/REC-xml-names/">XML namespace</a> for your
vocabulary. A namespace name is a URL (an absolute URI, more precisely). It is
generally recommended to anchor it as an HTTP resource on a server associated
with the software project; see the next section about this. In practice this
means that XML parsers will not handle your element names as-is but as a
pair made of the namespace name and the element name. This allows
processing to be recognized and disambiguated. Uniqueness of the namespace
name can, for the most part, be guaranteed by the use of the DNS registry.
Namespaces can also be used to carry versioning information, like:</p>

<p><code>"http://www.gnome.org/project/projectname/1.0/"</code></p>

<p>An easy way to use them is to make them the default namespace on the
root element of the XML instance, like:</p>
<pre>&lt;structure xmlns="http://www.gnome.org/project/projectname/1.0/"&gt;
  &lt;data&gt;
  ...
  &lt;/data&gt;
&lt;/structure&gt;</pre>

<p>In that document, structure and all descendant elements like data are in
the given namespace.</p>

<h2><a name="Canonical">Canonical URL</a></h2>

<p>As seen in the previous namespace section, while XML processing is not
tied to the Web there is a natural synergy between the two: XML was designed
to be available on the Web, and keeping the infrastructure that way helps
in deploying XML resources.
The core of this issue is the notion of the
"Canonical URL" of an XML resource. The resource can be an XML document, a
DTD, a stylesheet, a schema, or even non-XML data associated with an XML
resource; the canonical URL is the URL where the "master" copy of that
resource is expected to be present on the Web. Usually when processing XML a
copy of the resource will be present on the local disk, maybe in
/usr/share/xml or /usr/share/sgml, maybe in /opt or even in C:\projectname\
(horror!). The key point is that the way to name that resource should be
independent of the actual place where it resides on disk, if it is available
there, and that the processing should still work if there is no local copy
(provided the machine where the processing takes place is connected to the
Internet).</p>

<p>What this really means is that one should never use the local name of a
resource to reference it, but always use the canonical URL. For example, in a
DocBook instance the following should not be used:</p>
<pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"<br>
         "/usr/share/xml/docbook/4.2/docbookx.dtd"&gt;</pre>

<p>Instead, always reference the canonical URL for the DTD:</p>
<pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"<br>
         "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"&gt;   </pre>

<p>Similarly, the document instance may reference the <a
href="http://www.w3.org/TR/xslt">XSLT</a> stylesheets needed to process it to
generate HTML, and the canonical URL should be used:</p>
<pre>&lt;?xml-stylesheet
  href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"
  type="text/xsl"?&gt;</pre>

<p>Defining the canonical URL for the resources needed should obey a few
simple rules similar to those used to design namespace names:</p>
<ul>
  <li>use a DNS name you know is associated with the project and will be
    available in the long term</li>
  <li>within
    that server space, reserve the right to the subtree where you
    intend to keep those data</li>
  <li>version the URL so that multiple concurrent versions of the resources
    can be hosted simultaneously</li>
</ul>

<h2><a name="Catalog">Catalog setup</a></h2>

<h3>How catalogs work:</h3>

<p>Catalogs are the technical mechanism which allows XML processing
tools to use a local copy of the resources, if it is available, even if the
instance document references the canonical URL. <a
href="http://www.oasis-open.org/committees/entity/">XML Catalogs</a> are
anchored in the root catalog (usually <code>/etc/xml/catalog</code> or
defined by the user). They are a tree of XML documents defining the mappings
between the canonical naming space and the locally installed resources; this
can be seen as a static cache structure.</p>

<p>When the XML processor is asked to process a resource it will
automatically test for a locally available version in the catalog, starting
from the root catalog, and possibly fetching sub-catalog resources, until it
finds out whether the catalog has that resource or not. If not, the default
processing of fetching the resource from the Web is done, which in most
cases allows recovery from a catalog miss. The key point is that the document
instances are totally independent of the availability of a catalog and of
the actual place where the local resources they reference may be installed.
This greatly improves the management of the documents in the long run, making
them independent of the platform or toolchain used to process them. The
figure below tries to express that mechanism:<img src="catalog.gif"
alt="Picture describing the catalog "></p>

<h3>Usual catalog setup:</h3>

<p>Usually catalogs for a project are set up as a 2-level hierarchical cache,
the root catalog containing only "delegates" pointing to a separate subcatalog
dedicated to the project.
The goal is to keep the root catalog clean and
simplify the maintenance of the catalog by using separate catalogs per
project. For example when creating a catalog for the <a
href="http://www.w3.org/TR/xhtml1">XHTML1</a> DTDs, only 3 items are added to
the root catalog:</p>
<pre>  &lt;delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
  &lt;delegateSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
  &lt;delegateURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;</pre>

<p>They are all "delegates", meaning that if the catalog system is asked to
resolve a reference corresponding to them, it has to look up a subcatalog.
Here the subcatalog was installed as
<code>/usr/share/sgml/xhtml1/xmlcatalog</code> in the local tree. That
decision is left to the sysadmin or the packager for that system and may
obey different rules, but the actual place on the filesystem (or in a
resource cache on the local network) will not influence the processing as
long as it is available. The first rule indicates that if the reference uses a
PUBLIC identifier beginning with the</p>

<p><code>"-//W3C//DTD XHTML 1.0"</code></p>

<p>substring, then the catalog lookup should be limited to the specific given
lookup catalog. Similarly, the second and third entries indicate the
delegation rules for SYSTEM, DOCTYPE or normal URI references when the URL
starts with the <code>"http://www.w3.org/TR/xhtml1/DTD"</code> substring,
which indicates the location on the W3C server where the XHTML1 resources are
stored; this is the beginning of all Canonical URLs for XHTML1 resources.
Those 3 rules are sufficient in practice to capture all references to XHTML1
resources and direct the processing tools to the right subcatalog.</p>

<h3>A subcatalog example:</h3>

<p>Here is the complete subcatalog used for XHTML1:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
          "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="xhtml1-20020801/DTD/xhtml1-transitional.dtd"/&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Frameset//EN"
          uri="xhtml1-20020801/DTD/xhtml1-frameset.dtd"/&gt;
  &lt;rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/&gt;
  &lt;rewriteURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/&gt;
&lt;/catalog&gt;</pre>

<p>There are a few things to notice:</p>
<ul>
  <li>this is an XML resource; it points to the DTD using Canonical URLs, and
    the root element defines a namespace (but based on a URN, not an HTTP
    URL).</li>
  <li>it contains 5 rules: the first 3 are direct mappings for the 3
    PUBLIC identifiers defined by the XHTML1 specification, associating
    them with the local resource containing the DTD; the last 2 are
    rewrite rules which allow building the local filename for any URL based on
    "http://www.w3.org/TR/xhtml1/DTD". The local cache simplifies the rules by
    keeping the same structure as the on-line server at the Canonical URL</li>
  <li>the local resources are designated using URI references (the uri or
    rewritePrefix attributes), the base being the containing sub-catalog URL,
    which means that in practice the copy of the XHTML1 strict DTD is stored
    locally in
    <code>/usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-strict.dtd</code></li>
</ul>

<p>Those 5 rules are sufficient to cover all references to the resources held
at the Canonical URL for the XHTML1 DTDs.</p>

<h2><a name="Package">Package integration</a></h2>

<p>Creating and removing catalogs should be handled as part of the process of
(un)installing the local copy of the resources. The catalog files, being XML
resources, should be processed with XML-based tools to avoid problems with the
generated files. The xmlcatalog command coming with libxml2 allows creating
catalogs and adding or removing rules at that time. Here is a complete example
coming from the RPM post-install script for the XHTML1 DTDs. While this
example is platform and packaging specific, it can be useful as an example in
other contexts:</p>
<pre>%post
CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
#
# Register it in the super catalog with the appropriate delegates
#
ROOTCATALOG=/etc/xml/catalog

# Create the root catalog if it does not exist yet
if [ ! -r "$ROOTCATALOG" ]
then
        /usr/bin/xmlcatalog --noout --create "$ROOTCATALOG"
fi

# Add the three delegate rules pointing to the XHTML1 subcatalog
if [ -w "$ROOTCATALOG" ]
then
        /usr/bin/xmlcatalog --noout --add "delegatePublic" \
                "-//W3C//DTD XHTML 1.0" \
                "file://$CATALOG" "$ROOTCATALOG"
        /usr/bin/xmlcatalog --noout --add "delegateSystem" \
                "http://www.w3.org/TR/xhtml1/DTD" \
                "file://$CATALOG" "$ROOTCATALOG"
        /usr/bin/xmlcatalog --noout --add "delegateURI" \
                "http://www.w3.org/TR/xhtml1/DTD" \
                "file://$CATALOG" "$ROOTCATALOG"
fi</pre>

<p>The XHTML1 subcatalog is not created on-the-fly in that case; it is
installed as part of the files of the package.
So the only work needed is to
make sure the root catalog exists and to register the delegate rules.</p>

<p>Similarly, the post-uninstall script just removes the rules from the
catalog:</p>
<pre>%postun
#
# On removal, unregister the xmlcatalog from the supercatalog
#
if [ "$1" = 0 ]; then
    CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
    ROOTCATALOG=/etc/xml/catalog

    if [ -w "$ROOTCATALOG" ]
    then
            /usr/bin/xmlcatalog --noout --del \
                    "-//W3C//DTD XHTML 1.0" "$ROOTCATALOG"
            /usr/bin/xmlcatalog --noout --del \
                    "http://www.w3.org/TR/xhtml1/DTD" "$ROOTCATALOG"
            /usr/bin/xmlcatalog --noout --del \
                    "http://www.w3.org/TR/xhtml1/DTD" "$ROOTCATALOG"
    fi
fi</pre>

<p>Note the test against $1: it is needed to avoid removing the delegate rules
when the package is upgraded.</p>

<p>Following the set of guidelines and tips provided in this document should
help deploy XML resources in the GNOME framework without much pain and
ensure a smooth evolution of the resources and instances.</p>

<p><a href="mailto:veillard@redhat.com">Daniel Veillard</a></p>

<p>$Id$</p>

<p></p>
</body>
</html>