<html>
<head>
  <title>Libxml DTD support</title>
  <meta name="GENERATOR" content="amaya V4.1">
  <meta http-equiv="Content-Type" content="text/html">
</head>

<body bgcolor="#ffffff">
<h1 align="center">Libxml DTD support</h1>

<p>Location: <a
href="http://xmlsoft.org/xmldtd.html">http://xmlsoft.org/xmldtd.html</a></p>

<p>Libxml home page: <a href="http://xmlsoft.org/">http://xmlsoft.org/</a></p>

<p>Mailing-list archive: <a
href="http://xmlsoft.org/messages/">http://xmlsoft.org/messages/</a></p>

<p>Version: $Revision$</p>

<p>Table of Contents:</p>
<ol>
  <li><a href="#General">General overview</a></li>
  <li><a href="#definition">The definition</a></li>
  <li><a href="#Simple">Simple rules</a>
    <ol>
      <li><a href="#reference">How to reference a DTD from a document</a></li>
      <li><a href="#Declaring">Declaring elements</a></li>
      <li><a href="#Declaring1">Declaring attributes</a></li>
    </ol>
  </li>
  <li><a href="#Some">Some examples</a></li>
  <li><a href="#validate">How to validate</a></li>
  <li><a href="#Other">Other resources</a></li>
</ol>

<h2><a name="General">General overview</a></h2>

<p>DTD is the acronym for Document Type Definition. It is a description of
the content for a family of XML files. DTDs are part of the XML 1.0
specification, and allow one to describe and check that a given document
instance conforms to a set of rules detailing its structure and content.</p>

<h2><a name="definition">The definition</a></h2>

<p>The <a href="http://www.w3.org/TR/REC-xml">W3C XML Recommendation</a> (<a
href="http://www.xml.com/axml/axml.html">Tim Bray's annotated version of
Rev1</a>):</p>
<ul>
  <li><a href="http://www.w3.org/TR/REC-xml#elemdecls">Declaring
  elements</a></li>
  <li><a href="http://www.w3.org/TR/REC-xml#attdecls">Declaring
  attributes</a></li>
</ul>

<p>(Unfortunately) all of this is inherited from the SGML world, and the
syntax is ancient...</p>

<h2><a name="Simple">Simple rules</a></h2>

<p>DTDs can be written in multiple ways, and the rules for building them can
be radically different depending on whether you need something fixed or
something which can evolve over time. Really complex DTDs like the Docbook
ones are flexible but much harder to design. I will just focus on DTDs for
formats with a fixed, simple structure. What follows is just a set of basic
rules; it is definitely neither exhaustive nor usable for complex DTD
design.</p>

<h3><a name="reference">How to reference a DTD from a document</a>:</h3>

<p>Assuming the top element of the document is <code>spec</code> and the DTD
is placed in the file <code>mydtd</code> in the subdirectory <code>dtds</code>
of the directory from which the document was loaded:</p>

<p><code>&lt;!DOCTYPE spec SYSTEM "dtds/mydtd"&gt;</code></p>

<p>Notes:</p>
<ul>
  <li>The system string is actually a URI-Reference (as defined in <a
    href="http://www.ietf.org/rfc/rfc2396.txt">RFC 2396</a>), so you can use a
    full URL string indicating the location of your DTD on the Web. This is a
    really good thing to do if you want others to validate your document.</li>
  <li>It is also possible to associate a <code>PUBLIC</code> identifier (a
    magic string) so that the DTD is looked up in catalogs on the client side
    without having to locate it on the Web.</li>
  <li>A DTD contains a set of element and attribute declarations, but they
    don't define what the root of the document should be. This is explicitly
    told to the parser/validator by the name that comes first in the
    <code>DOCTYPE</code> declaration.</li>
</ul>
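<p>As a rough illustration of the first two notes, the same reference could
be written in either of the following alternative forms (a document carries
only one <code>DOCTYPE</code> declaration). The host
<code>example.org</code> and the public identifier string are made up for
this sketch and do not refer to a real DTD:</p>
<pre>&lt;!-- alternative 1: system identifier given as a full URL (hypothetical) --&gt;
&lt;!DOCTYPE spec SYSTEM "http://example.org/dtds/mydtd"&gt;

&lt;!-- alternative 2: PUBLIC identifier (hypothetical), followed by a
     system identifier that is typically used when no catalog entry matches --&gt;
&lt;!DOCTYPE spec PUBLIC "-//Example//DTD Spec V1.0//EN"
               "http://example.org/dtds/mydtd"&gt;</pre>
<p>In both forms <code>spec</code> is the name of the root element.</p>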

<h3><a name="Declaring">Declaring elements</a>:</h3>

<p>The following declares an element <code>spec</code>:</p>

<p><code>&lt;!ELEMENT spec (front, body, back?)&gt;</code></p>

<p>It also expresses that the <code>spec</code> element contains one
<code>front</code>, one <code>body</code> and one optional <code>back</code>
child element, in this order. The declaration of an element and of its
content is done in a single declaration. Similarly the following declares
<code>div1</code> elements:</p>

<p><code>&lt;!ELEMENT div1 (head, (p | list | note)*, div2*)&gt;</code></p>

<p>This means <code>div1</code> contains one <code>head</code>, then any
number (possibly zero) of <code>p</code>, <code>list</code> and
<code>note</code> elements in any order, and then any number (possibly zero)
of <code>div2</code> elements. And last but not least an element can contain
text:</p>

<p><code>&lt;!ELEMENT b (#PCDATA)&gt;</code></p>

<p><code>b</code> contains only text. An element can also be of mixed content
(text and elements in no particular order):</p>

<p><code>&lt;!ELEMENT p (#PCDATA|a|ul|b|i|em)*&gt;</code></p>

<p><code>p</code> can contain text or <code>a</code>, <code>ul</code>,
<code>b</code>, <code>i</code> or <code>em</code> elements in no particular
order.</p>
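<p>To make these content models more concrete, here is a small instance
fragment that should satisfy the <code>div1</code> and <code>p</code>
declarations above. It is only a sketch: it assumes that <code>head</code>,
<code>note</code> and <code>em</code>, which are not declared in this text,
are declared to hold plain text, and it omits <code>div2</code>, which is
allowed since <code>div2*</code> accepts zero occurrences:</p>
<pre>&lt;div1&gt;
  &lt;head&gt;Declaring elements&lt;/head&gt;
  &lt;p&gt;Text mixed with &lt;b&gt;bold&lt;/b&gt; and &lt;em&gt;emphasis&lt;/em&gt; in any order.&lt;/p&gt;
  &lt;note&gt;A note may follow a paragraph.&lt;/note&gt;
  &lt;p&gt;And another paragraph may follow the note.&lt;/p&gt;
&lt;/div1&gt;</pre>
<p>Moving <code>head</code> after the first <code>p</code>, or leaving it
out, would make the fragment invalid, because the content model requires
exactly one <code>head</code> in first position.</p>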

<h3><a name="Declaring1">Declaring attributes</a>:</h3>

<p>Again, the attribute declaration includes the definition of the
attribute's content:</p>

<p><code>&lt;!ATTLIST termdef name CDATA #IMPLIED&gt;</code></p>

<p>This means that the element <code>termdef</code> can have a
<code>name</code> attribute containing text (<code>CDATA</code>) and which is
optional (<code>#IMPLIED</code>). The attribute value can also be restricted
to a set of allowed values:</p>

<p><code>&lt;!ATTLIST list type (bullets|ordered|glossary)
"ordered"&gt;</code></p>

<p>This means that <code>list</code> elements have a <code>type</code>
attribute with 3 allowed values, "bullets", "ordered" or "glossary", which
defaults to "ordered" if the attribute is not explicitly specified.</p>

<p>The content type of an attribute can be text (<code>CDATA</code>),
anchor/reference/references
(<code>ID</code>/<code>IDREF</code>/<code>IDREFS</code>), entity(ies)
(<code>ENTITY</code>/<code>ENTITIES</code>) or name(s)
(<code>NMTOKEN</code>/<code>NMTOKENS</code>). The following defines that a
<code>chapter</code> element can have an optional <code>id</code> attribute of
type <code>ID</code>, usable as a target for references from attributes of
type <code>IDREF</code>:</p>

<p><code>&lt;!ATTLIST chapter id ID #IMPLIED&gt;</code></p>

<p>The last value of an attribute definition can be <code>#REQUIRED</code>,
meaning that the attribute has to be given, <code>#IMPLIED</code>,
meaning that it is optional, or the default value (possibly prefixed by
<code>#FIXED</code> if it is the only value allowed).</p>

<p>Notes:</p>
<ul>
  <li>Usually the attributes pertaining to a given element are declared in a
    single expression, but it is just a convention adopted by a lot of DTD
    writers:
    <pre>&lt;!ATTLIST termdef
          id      ID      #REQUIRED
          name    CDATA   #IMPLIED&gt;</pre>
    <p>The previous construct defines both the <code>id</code> and
    <code>name</code> attributes for the element <code>termdef</code>.</p>
  </li>
</ul>
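<p>A small sketch may help to see how these attribute declarations interact.
The <code>chapter</code> and <code>list</code> declarations are the ones
given above; the <code>xref</code> element with its <code>target</code>
attribute and the <code>version</code> attribute on <code>spec</code> are
hypothetical additions made up for this example:</p>
<pre>&lt;!ATTLIST chapter id      ID    #IMPLIED&gt;
&lt;!ATTLIST list    type    (bullets|ordered|glossary) "ordered"&gt;
&lt;!-- hypothetical: xref must point to an ID present in the document --&gt;
&lt;!ELEMENT xref    EMPTY&gt;
&lt;!ATTLIST xref    target  IDREF #REQUIRED&gt;
&lt;!-- hypothetical: version, when given, can only be "1.0" --&gt;
&lt;!ATTLIST spec    version CDATA #FIXED "1.0"&gt;</pre>
<p>With these declarations, an instance fragment along the following lines
would be expected to pass validation (element contents are elided):</p>
<pre>&lt;chapter id="intro"&gt;...&lt;/chapter&gt;
&lt;xref target="intro"/&gt;   &lt;!-- refers to the chapter above --&gt;
&lt;list&gt;...&lt;/list&gt;         &lt;!-- treated as type="ordered" --&gt;</pre>
<p>An <code>xref</code> without a <code>target</code>, or with a
<code>target</code> matching no <code>ID</code> in the document, would be
reported as a validity error.</p>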

<h2><a name="Some">Some examples</a></h2>

<p>The directory <code>test/valid/dtds/</code> in the libxml distribution
contains some complex DTD examples. The <code>test/valid/dia.xml</code>
example shows an XML file where a simple DTD is directly included within the
document.</p>
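<p>To give a rough idea of what a DTD included within the document looks
like, here is a minimal sketch. It is not the content of
<code>dia.xml</code>; it reuses the <code>spec</code> declaration from the
section on declaring elements, and the text-only declarations of
<code>front</code>, <code>body</code> and <code>back</code> are assumptions
made for this example:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE spec [
  &lt;!-- the declarations sit between [ and ] instead of in a separate file --&gt;
  &lt;!ELEMENT spec  (front, body, back?)&gt;
  &lt;!ELEMENT front (#PCDATA)&gt;
  &lt;!ELEMENT body  (#PCDATA)&gt;
  &lt;!ELEMENT back  (#PCDATA)&gt;
]&gt;
&lt;spec&gt;
  &lt;front&gt;Title page&lt;/front&gt;
  &lt;body&gt;Main text&lt;/body&gt;
&lt;/spec&gt;</pre>
<p>Since the declarations travel with the document, no <code>SYSTEM</code>
or <code>PUBLIC</code> identifier is needed in this case.</p>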

<h2><a name="validate">How to validate</a></h2>

<p>The simplest way is to use the xmllint program that comes with libxml. The
<code>--valid</code> option turns on validation of the files given as input.
For example, the following validates a copy of the first revision of the XML
1.0 specification:</p>

<p><code>xmllint --valid --noout test/valid/REC-xml-19980210.xml</code></p>

<p>The <code>--noout</code> option is used to suppress output of the
resulting tree.</p>

<p>The <code>--dtdvalid dtd</code> option validates the document(s) against
a given DTD.</p>

<p>Libxml exports an API to handle DTDs and validation; check the <a
href="http://xmlsoft.org/html/gnome-xml-valid.html">associated
description</a>.</p>

<h2><a name="Other">Other resources</a></h2>

<p>DTDs are as old as SGML, so there may be a number of examples on-line. I
will just list one for now; other pointers are welcome:</p>
<ul>
  <li><a href="http://www.xml101.com:8081/dtd/">XML-101 DTD</a></li>
</ul>

<p></p>

<p><a href="mailto:Daniel.Veillard@w3.org">Daniel Veillard</a></p>

<p>$Id$</p>
</body>
</html>
