catalog.html revision 52dcab3999cc9c480b275bc3ddf66dbb66f92d68
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<style type="text/css"><!--
TD {font-size: 10pt; font-family: Verdana,Arial,Helvetica}
BODY {font-size: 10pt; font-family: Verdana,Arial,Helvetica; margin-top: 5pt; margin-left: 0pt; margin-right: 0pt}
H1 {font-size: 16pt; font-family: Verdana,Arial,Helvetica}
H2 {font-size: 14pt; font-family: Verdana,Arial,Helvetica}
H3 {font-size: 12pt; font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>Catalog support</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>Catalog support</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XML.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://www.cs.unibo.it/~casarini/gdome2/">DOM gdome2</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a
href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://pages.eidosnet.co.uk/~garypen/libxml/">Solaris binaries</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Table of Contents:</p>
<ol>
<li><a href="#General2">General overview</a></li>
<li><a href="#definition">The definitions</a></li>
<li><a href="#Simple">Using catalogs</a></li>
<li><a href="#Some">Some examples</a></li>
<li><a href="#reference">How to tune catalog usage</a></li>
<li><a href="#validate">How to debug catalog processing</a></li>
<li><a href="#Declaring">How to create and maintain catalogs</a></li>
<li><a href="#implemento">The implementor's corner, a quick review of the
    API</a></li>
<li><a href="#Other">Other resources</a></li>
</ol>
<h3><a name="General2">General overview</a></h3>
<p>What is a catalog? Basically it's a lookup mechanism used when an entity
(a file or a remote resource) references another entity. The catalog lookup
is inserted between the moment the reference is recognized by the software
(XML parser, stylesheet processing, or even images referenced for inclusion
in a rendering) and the moment when loading of that resource actually
starts.</p>
<p>It is basically used for three things:</p>
<ul>
<li>mapping "logical" names, the public identifiers, to a more
    concrete name usable for download (a URI).
For example it can associate
    the logical name
    <p>"-//OASIS//DTD DocBook XML V4.1.2//EN"</p>
<p>of the DocBook 4.1.2 XML DTD with the actual URL where it can be
    downloaded</p>
<p>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd</p>
</li>
<li>remapping from a given URL to another one, like an HTTP indirection
    saying that
    <p>"http://www.oasis-open.org/committes/tr.xsl"</p>
<p>should really be looked up at</p>
<p>"http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"</p>
</li>
<li>providing a local cache mechanism that allows loading the entities
    associated with public identifiers or remote resources. This is a really
    important feature for any significant deployment of XML or SGML since it
    avoids the hazards and delays associated with fetching remote
    resources.</li>
</ul>
<h3><a name="definition">The definitions</a></h3>
<p>Libxml, as of 2.4.3, implements 2 kinds of catalogs:</p>
<ul>
<li>the older SGML catalogs: the official spec is SGML Open Technical
    Resolution TR9401:1997, but it is better understood by reading <a href="http://www.jclark.com/sp/catalog.htm">the SP Catalog page</a> from
    James Clark. This is relatively old and not the preferred mode of
    operation of libxml.</li>
<li>
<a href="http://www.oasis-open.org/committees/entity/spec.html">XML
    Catalogs</a>
    are far more flexible, more recent, use an XML syntax and should scale
    much better. This is the default option of libxml.</li>
</ul>
<p>
<h3><a name="Simple">Using catalogs</a></h3>
<p>In a normal environment libxml will by default check for the presence of a
catalog in /etc/xml/catalog, and assuming it has been correctly populated,
the processing is completely transparent to the document user.
To take a
concrete example, suppose you are authoring a DocBook document which
starts with the following DOCTYPE definition:</p>
<pre>&lt;?xml version='1.0'?&gt;
&lt;!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"&gt;</pre>
<p>When validating the document with libxml, the catalog will be
automatically consulted to look up the public identifier "-//Norman Walsh//DTD
DocBk XML V3.1.4//EN" and the system identifier
"http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd", and if these entities have
been installed on your system and the catalogs actually point to them, libxml
will fetch them from the local disk.</p>
<p style="font-size: 10pt">
<strong>Note</strong>: Don't actually use this
DOCTYPE, it refers to a really old version, but it is fine as an example.</p>
<p>Libxml will check the catalog each time that it is requested to load an
entity; this includes DTDs, external parsed entities, stylesheets, etc.
If
your system is correctly configured, all the authoring and processing phases
should use only local files, while your document stays portable because it
uses the canonical public and system IDs, referencing the remote document.</p>
<h3><a name="Some">Some examples:</a></h3>
<p>Here are a couple of fragments from XML Catalogs used in libxml's early
regression tests in <code>test/catalogs</code>:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC
   "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
   uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
...</pre>
<p>This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are
written in XML, and there is a specific namespace for catalog elements,
"urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this
catalog is a <code>public</code> mapping, which associates a Public
Identifier with a URI.</p>
<pre>...
    &lt;rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                   rewritePrefix="file:///usr/share/xml/docbook/"/&gt;
...</pre>
<p>A <code>rewriteSystem</code> is a very powerful instruction: it says that
any URI starting with a given prefix should be looked up at another URI
constructed by replacing the prefix with a new one. In effect this acts like
a cache system for a full area of the Web. In practice it is extremely useful
with a file prefix if you have installed a copy of those resources on your
local system.</p>
<pre>...
&lt;delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegateURI uriStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
...</pre>
<p>Delegation is the core feature which allows building a tree of catalogs,
easier to maintain than a single catalog: based on Public Identifier, System
Identifier or URI prefixes, it instructs the catalog software to look up
entries in another resource. This feature allows building hierarchies of
catalogs. The set of entries presented should be sufficient to redirect the
resolution of all DocBook references to the specific catalog in
<code>/usr/share/xml/docbook.xml</code>, which in turn could delegate all
references for DocBook 4.2.1 to a specific catalog installed at the same time
as the DocBook resources on the local machine.</p>
<h3><a name="reference">How to tune catalog usage:</a></h3>
<p>The user can change the default catalog behaviour by redirecting queries
to their own set of catalogs. This can be done by setting the
<code>XML_CATALOG_FILES</code> environment variable to a list of catalogs; an
empty one should deactivate loading the default <code>/etc/xml/catalog</code>
catalog.</p>
<h3><a name="validate">How to debug catalog processing:</a></h3>
<p>Setting the <code>XML_DEBUG_CATALOG</code> environment variable will
make libxml output debugging information for each catalog operation, for
example:</p>
<pre>orchis:~/XML -&gt; xmllint --memory --noout test/ent2
warning: failed to
load external entity "title.xml"
orchis:~/XML -&gt; export XML_DEBUG_CATALOG=
orchis:~/XML -&gt; xmllint --memory --noout test/ent2
Failed to parse catalog /etc/xml/catalog
Failed to parse catalog /etc/xml/catalog
warning: failed to load external entity "title.xml"
Catalogs cleanup
orchis:~/XML -&gt; </pre>
<p>The test/ent2 file references an entity; running the parser from memory makes
the base URI unavailable and the "title.xml" entity cannot be loaded.
Setting the debug environment variable reveals that an attempt is
made to load <code>/etc/xml/catalog</code>, but since it's not present the
resolution fails.</p>
<p>But the most advanced way to debug XML catalog processing is to use the
<strong>xmlcatalog</strong> command shipped with libxml2, which can load
catalogs and make resolution queries to see what is going on. This is also
used for the regression tests:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog test/catalogs/docbook.xml \
                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
orchis:~/XML -&gt; </pre>
<p>For debugging what is going on, adding one -v flag increases the verbosity
level to indicate the processing done (adding a second flag also indicates
what elements are recognized at parsing):</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog -v test/catalogs/docbook.xml \
                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
Parsing catalog test/catalogs/docbook.xml's content
Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
Catalogs cleanup
orchis:~/XML -&gt; </pre>
<p>A shell interface is also available to debug and process multiple queries
(and for regression tests):</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog -shell test/catalogs/docbook.xml \
                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
&gt; help
Commands available:
public PublicID: make a PUBLIC identifier lookup
system
SystemID: make a SYSTEM identifier lookup
resolve PublicID SystemID: do a full resolver lookup
add 'type' 'orig' 'replace' : add an entry
del 'values' : remove values
dump: print the current catalog state
debug: increase the verbosity level
quiet: decrease the verbosity level
exit: quit the shell
&gt; public "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
&gt; quit
orchis:~/XML -&gt; </pre>
<p>This should be sufficient for most debugging purposes; it was actually
used heavily to debug the XML Catalog implementation itself.</p>
<h3>
<a name="Declaring">How to create and maintain</a> catalogs:</h3>
<p>Basically XML Catalogs are XML files, so you can either use XML tools to
manage them or use <strong>xmlcatalog</strong> for this. The basic step is
to create a catalog; the <code>--create</code> option provides this facility:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --create tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
  "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
orchis:~/XML -&gt; </pre>
<p>By default xmlcatalog does not overwrite the original catalog and saves the
result on the standard output; this can be overridden using the <code>--noout</code>
option.
The <code>--add</code> option adds entries to the
catalog:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --noout --create --add "public" \
  "-//OASIS//DTD DocBook XML V4.1.2//EN" \
  http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml
orchis:~/XML -&gt; cat tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
  "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
&lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
        uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
&lt;/catalog&gt;
orchis:~/XML -&gt; </pre>
<p>The <code>--add</code> option always takes 3 parameters, even though some of
the XML Catalog constructs (like nextCatalog) have only a single
argument; just pass an empty string as the third one, it will be ignored.</p>
<p>Similarly the <code>--del</code> option removes matching entries from the
catalog:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --del \
  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
  "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
orchis:~/XML -&gt; </pre>
<p>The catalog is now empty.
Note that the matching of <code>--del</code> is
exact and would have worked in a similar fashion with the Public ID
string.</p>
<p>This is rudimentary but should be sufficient to manage a not too complex
catalog tree of resources.</p>
<h3><a name="implemento">The implementor's corner, a quick review of the
API:</a></h3>
<p>First, as for every other module of libxml, there is an
automatically generated <a href="html/libxml-catalog.html">API page for
catalog support</a>.</p>
<p>The header for the catalog interfaces should be included as:</p>
<pre>#include &lt;libxml/catalog.h&gt;</pre>
<p>The API is deliberately kept very simple. First, it is not obvious that
applications really need access to it since it is the default behaviour of
libxml (Note: it is possible to completely override the libxml default catalog by
using <a href="html/libxml-parser.html">xmlSetExternalEntityLoader</a> to
plug in an application-specific resolver).</p>
<p>Basically libxml supports 2 catalog lists:</p>
<ul>
<li>the default one, global and shared by the whole application</li>
<li>a per-document catalog, which is built if the document uses the
    <code>oasis-xml-catalog</code> PIs to specify its own catalog list; it is
    associated with the parser context and destroyed when the parsing context
    is destroyed.</li>
</ul>
<p>The document one will be used first if it exists.</p>
<h4>Initialization routines:</h4>
<p>xmlInitializeCatalog(), xmlLoadCatalog() and xmlLoadCatalogs() should be
used at startup to initialize the catalog. If the catalog should be
initialized with specific values, xmlLoadCatalog() or xmlLoadCatalogs()
should be called before xmlInitializeCatalog(), which would otherwise do a
default initialization first.</p>
<p>The xmlCatalogAddLocal() call is used by the parser to grow the document's
own catalog list if needed.</p>
<h4>Preferences setup:</h4>
<p>The XML Catalog spec requires the possibility to
select default
preferences between public and system delegation;
xmlCatalogSetDefaultPrefer() allows this. xmlCatalogSetDefaults() and
xmlCatalogGetDefaults() control whether XML Catalogs resolution should
be forbidden, allowed for the global catalog, for the document catalog or both; the
default is to allow both.</p>
<p>And of course xmlCatalogSetDebug() enables the generation of debug messages
(through the xmlGenericError() mechanism).</p>
<h4>Querying routines:</h4>
<p>xmlCatalogResolve(), xmlCatalogResolveSystem(), xmlCatalogResolvePublic()
and xmlCatalogResolveURI() are relatively explicit if you read the XML
Catalog specification: they correspond to the section 7 algorithms. They should
also work, with simplified semantics, if you have loaded an SGML catalog.</p>
<p>xmlCatalogLocalResolve() and xmlCatalogLocalResolveURI() are the same but
operate on the document catalog list.</p>
<h4>Cleanup and Miscellaneous:</h4>
<p>xmlCatalogCleanup() frees the global catalog; xmlCatalogFreeLocal() is
the per-document equivalent.</p>
<p>xmlCatalogAdd() and xmlCatalogRemove() are used to dynamically modify the
first catalog in the global list, and xmlCatalogDump() dumps a
catalog's state. Those routines are primarily designed for xmlcatalog; I'm not
sure that exposing more complex interfaces (like navigation ones) would be
really useful.</p>
<p>xmlParseCatalogFile() is a function used to load XML Catalog files;
it's similar to xmlParseFile() except that it bypasses all catalog lookups. It's
provided because this functionality may be useful for client tools.</p>
<h4>Threaded environments:</h4>
<p>Since the catalog tree is built progressively, some care has been taken to
try to avoid trouble in multithreaded environments. The code is now thread
The code is now thread 367safe assuming that the libxml library has been compiled with threads 368support.</p> 369<p> 370<h3><a name="Other">Other resources</a></h3> 371<p>The XML Catalog specification is relatively recent so there isn't much 372literature to point at:</p> 373<ul> 374<li>You can find an good rant from Norm Walsh about <a href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the 375 need for catalogs</a>, it provides a lot of context informations even if 376 I don't agree with everything presented.</li> 377<li>An <a href="http://home.ccil.org/~cowan/XML/XCatalog.html">old XML 378 catalog proposal</a> from John Cowan</li> 379<li>The <a href="http://www.rddl.org/">Resource Directory Description 380 Language</a> (RDDL) another catalog system but more oriented toward 381 providing metadata for XML namespaces.</li> 382<li>the page from the OASIS Technical <a href="http://www.oasis-open.org/committees/entity/">Committee on Entity 383 Resolution</a> who maintains XML Catalog, you will find pointers to the 384 specification update, some background and pointers to others tools 385 providing XML Catalog support</li> 386<li>I have uploaded <a href="ftp://xmlsoft.org/test/dbk412catalog.tar.gz">a 387 mall tarball</a> containing XML Catalogs for DocBook 4.1.2 which seems to 388 work fine for me</li> 389<li>The <a href="http://www.xmlsoft.org/xmlcatalog_man.html">xmlcatalog 390 manual page</a> 391</li> 392</ul> 393<p>If you have suggestions for corrections or additions, simply contact 394me:</p> 395<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p> 396</td></tr></table></td></tr></table></td></tr></table></td> 397</tr></table></td></tr></table> 398</body> 399</html> 400