xmldtd.html revision 64e739068d00607b0702e582e8a18f82d20bad88
<html>
<head>
  <title>Libxml DTD support</title>
  <meta name="GENERATOR" content="amaya V4.1">
  <meta http-equiv="Content-Type" content="text/html">
</head>

<body bgcolor="#ffffff">
<h1 align="center">Libxml DTD support</h1>

<p>Location: <a
href="http://xmlsoft.org/xmldtd.html">http://xmlsoft.org/xmldtd.html</a></p>

<p>Libxml home page: <a href="http://xmlsoft.org/">http://xmlsoft.org/</a></p>

<p>Mailing-list archive: <a
href="http://xmlsoft.org/messages/">http://xmlsoft.org/messages/</a></p>

<p>Version: $Revision$</p>

<p>Table of Contents:</p>
<ol>
  <li><a href="#General">General overview</a></li>
  <li><a href="#definition">The definition</a></li>
  <li><a href="#Simple">Simple rules</a>
    <ol>
      <li><a href="#reference">How to reference a DTD from a document</a></li>
      <li><a href="#Declaring">Declaring elements</a></li>
      <li><a href="#Declaring1">Declaring attributes</a></li>
    </ol>
  </li>
  <li><a href="#Some">Some examples</a></li>
  <li><a href="#validate">How to validate</a></li>
  <li><a href="#Other">Other resources</a></li>
</ol>

<h2><a name="General">General overview</a></h2>

<p>DTD is the acronym for Document Type Definition. This is a description of
the content for a family of XML files.
It is part of the XML 1.0
specification, and makes it possible to describe and check that a given
document instance conforms to a set of rules detailing its structure and
content.</p>

<h2><a name="definition">The definition</a></h2>

<p>The <a href="http://www.w3.org/TR/REC-xml">W3C XML Recommendation</a> (<a
href="http://www.xml.com/axml/axml.html">Tim Bray's annotated version of
Rev1</a>):</p>
<ul>
  <li><a href="http://www.w3.org/TR/REC-xml#elemdecls">Declaring
    elements</a></li>
  <li><a href="http://www.w3.org/TR/REC-xml#attdecls">Declaring
    attributes</a></li>
</ul>

<p>(Unfortunately) all this is inherited from the SGML world; the syntax is
ancient...</p>

<h2><a name="Simple">Simple rules</a></h2>

<p>DTDs can be written in multiple ways; the rules for building one can be
radically different depending on whether you need something fixed or something
which can evolve over time. Really complex DTDs like the DocBook ones are
flexible but much harder to design. I will just focus on DTDs for formats with
a fixed, simple structure.
What follows is just a set of basic rules; it is definitely not exhaustive,
nor usable for complex DTD design.</p>

<h3><a name="reference">How to reference a DTD from a document</a>:</h3>

<p>Assuming the top element of the document is <code>spec</code> and the DTD
is placed in the file <code>mydtd</code> in the subdirectory <code>dtds</code>
of the directory from which the document was loaded:</p>

<p><code>&lt;!DOCTYPE spec SYSTEM "dtds/mydtd"&gt;</code></p>

<p>Notes:</p>
<ul>
  <li>the system string is actually a URI-Reference (as defined in <a
    href="http://www.ietf.org/rfc/rfc2396.txt">RFC 2396</a>), so you can use a
    full URL string indicating the location of your DTD on the Web. This is a
    really good thing to do if you want others to be able to validate your
    document</li>
  <li>it is also possible to associate a <code>PUBLIC</code> identifier (a
    magic string) so that the DTD is looked up in catalogs on the client side
    without having to locate it on the Web</li>
  <li>a DTD contains a set of element and attribute declarations, but it
    doesn't define what the root of the document should be. This is explicitly
    told to the parser/validator by the first name given in the
    <code>DOCTYPE</code> declaration.</li>
</ul>

<h3><a name="Declaring">Declaring elements</a>:</h3>

<p>The following declares an element <code>spec</code>:</p>

<p><code>&lt;!ELEMENT spec (front, body, back?)&gt;</code></p>

<p>It also expresses that the spec element contains one <code>front</code>,
one <code>body</code> and one optional <code>back</code> child element, in
this order. An element of the structure and its content are declared in a
single declaration.
Similarly, the following declares
<code>div1</code> elements:</p>

<p><code>&lt;!ELEMENT div1 (head, (p | list | note)*, div2*)&gt;</code></p>

<p>This means div1 contains one <code>head</code>, then any number (possibly
zero) of <code>p</code>, <code>list</code> and <code>note</code> elements in
any order, and then zero or more <code>div2</code> elements. And last but not
least, an element can contain text:</p>

<p><code>&lt;!ELEMENT b (#PCDATA)&gt;</code></p>

<p>Here <code>b</code> contains only text. An element can also have mixed
content (text and elements in no particular order):</p>

<p><code>&lt;!ELEMENT p (#PCDATA|a|ul|b|i|em)*&gt;</code></p>

<p><code>p</code> can contain text or <code>a</code>, <code>ul</code>,
<code>b</code>, <code>i</code> or <code>em</code> elements in no particular
order.</p>

<h3><a name="Declaring1">Declaring attributes</a>:</h3>

<p>Again, an attribute declaration includes the definition of the attribute's
content:</p>

<p><code>&lt;!ATTLIST termdef name CDATA #IMPLIED&gt;</code></p>

<p>This means that the element <code>termdef</code> can have a
<code>name</code> attribute which contains text (<code>CDATA</code>) and is
optional (<code>#IMPLIED</code>). The attribute value can also be restricted
to a set of values:</p>

<p><code>&lt;!ATTLIST list type (bullets|ordered|glossary)
"ordered"&gt;</code></p>

<p>This means that <code>list</code> elements have a <code>type</code>
attribute with 3 allowed values, "bullets", "ordered" or "glossary", which
defaults to "ordered" if the attribute is not explicitly specified.</p>

<p>The content type of an attribute can be text (<code>CDATA</code>),
anchor/reference/references
(<code>ID</code>/<code>IDREF</code>/<code>IDREFS</code>), entity(ies)
(<code>ENTITY</code>/<code>ENTITIES</code>) or name(s)
(<code>NMTOKEN</code>/<code>NMTOKENS</code>).
The following defines that a
<code>chapter</code> element can have an optional <code>id</code> attribute of
type <code>ID</code>, usable as a target for references from attributes of
type <code>IDREF</code>:</p>

<p><code>&lt;!ATTLIST chapter id ID #IMPLIED&gt;</code></p>

<p>The last value of an attribute definition can be <code>#REQUIRED</code>,
meaning that the attribute has to be given, <code>#IMPLIED</code>,
meaning that it is optional, or the default value (possibly prefixed by
<code>#FIXED</code> if it is the only value allowed).</p>

<p>Notes:</p>
<ul>
  <li>usually the attributes pertaining to a given element are declared in a
    single expression, but this is just a convention adopted by a lot of DTD
    writers:
    <pre>&lt;!ATTLIST termdef
          id      ID      #REQUIRED
          name    CDATA   #IMPLIED&gt;</pre>
    <p>The previous construct defines both the <code>id</code> and
    <code>name</code> attributes for the element <code>termdef</code>.</p>
  </li>
</ul>

<h2><a name="Some">Some examples</a></h2>

<p>The directory <code>test/valid/dtds/</code> in the libxml distribution
contains some complex DTD examples. The <code>test/valid/dia.xml</code>
example shows an XML file where a simple DTD is directly included within the
document.</p>

<h2><a name="validate">How to validate</a></h2>

<p>The simplest way is to use the xmllint program that comes with libxml.
The
<code>--valid</code> option turns on validation of the files given as input.
For example, the following validates a copy of the first revision of the XML
1.0 specification:</p>

<p><code>xmllint --valid --noout test/valid/REC-xml-19980210.xml</code></p>

<p>The <code>--noout</code> option is used so that the resulting tree is not
output.</p>

<p>The <code>--dtdvalid dtd</code> option validates the document(s) against
a given DTD.</p>

<p>Libxml exports an API to handle DTDs and validation; see the <a
href="http://xmlsoft.org/html/gnome-xml-valid.html">associated
description</a>.</p>

<h2><a name="Other">Other resources</a></h2>

<p>DTDs are as old as SGML, so there may be a number of examples online. I
will just list one for now; other pointers are welcome:</p>
<ul>
  <li><a href="http://www.xml101.com:8081/dtd/">XML-101 DTD</a></li>
</ul>

<p></p>

<p><a href="mailto:Daniel.Veillard@w3.org">Daniel Veillard</a></p>

<p>$Id$</p>
</body>
</html>