<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<link rel="SHORTCUT ICON" href="/favicon.ico">
<style type="text/css"><!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>A real example</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>A real example</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XMLinfo.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="python.html">Python and bindings</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="APIchunk0.html">Alphabetic</a></li>
<li><a href="APIconstructors.html">Constructors</a></li>
<li><a href="APIfunctions.html">Functions/Types</a></li>
<li><a href="APIfiles.html">Modules</a></li>
<li><a href="APIsymbols.html">Symbols</a></li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li>
<li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li>
<li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml&product=libxml2">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Here is a real-size example, where the application data is not kept in
the DOM tree but in the application's own internal structures. It is based
on a proposal to keep a database of jobs related to Gnome, with an XML-based
storage structure. Here is an <a href="gjobs.xml">XML-encoded jobs
base</a>:</p>
<pre>&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;gjob:Helping xmlns:gjob=&quot;http://www.gnome.org/some-location&quot;&gt;
  &lt;gjob:Jobs&gt;

    &lt;gjob:Job&gt;
      &lt;gjob:Project ID=&quot;3&quot;/&gt;
      &lt;gjob:Application&gt;GBackup&lt;/gjob:Application&gt;
      &lt;gjob:Category&gt;Development&lt;/gjob:Category&gt;

      &lt;gjob:Update&gt;
        &lt;gjob:Status&gt;Open&lt;/gjob:Status&gt;
        &lt;gjob:Modified&gt;Mon, 07 Jun 1999 20:27:45 -0400 MET DST&lt;/gjob:Modified&gt;
        &lt;gjob:Salary&gt;USD 0.00&lt;/gjob:Salary&gt;
      &lt;/gjob:Update&gt;

      &lt;gjob:Developers&gt;
        &lt;gjob:Developer&gt;
        &lt;/gjob:Developer&gt;
      &lt;/gjob:Developers&gt;

      &lt;gjob:Contact&gt;
        &lt;gjob:Person&gt;Nathan Clemons&lt;/gjob:Person&gt;
        &lt;gjob:Email&gt;nathan@windsofstorm.net&lt;/gjob:Email&gt;
        &lt;gjob:Company&gt;
        &lt;/gjob:Company&gt;
        &lt;gjob:Organisation&gt;
        &lt;/gjob:Organisation&gt;
        &lt;gjob:Webpage&gt;
        &lt;/gjob:Webpage&gt;
        &lt;gjob:Snailmail&gt;
        &lt;/gjob:Snailmail&gt;
        &lt;gjob:Phone&gt;
        &lt;/gjob:Phone&gt;
      &lt;/gjob:Contact&gt;

      &lt;gjob:Requirements&gt;
      The program should be released as free software, under the GPL.
      &lt;/gjob:Requirements&gt;

      &lt;gjob:Skills&gt;
      &lt;/gjob:Skills&gt;

      &lt;gjob:Details&gt;
      A GNOME based system that will allow a superuser to configure
      compressed and uncompressed files and/or file systems to be backed
      up with a supported media in the system.  This should be able to
      perform via find commands generating a list of files that are passed
      to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine
      or via operations performed on the filesystem itself. Email
      notification and GUI status display very important.
      &lt;/gjob:Details&gt;

    &lt;/gjob:Job&gt;

  &lt;/gjob:Jobs&gt;
&lt;/gjob:Helping&gt;</pre>
<p>While loading the XML file into an internal DOM tree only takes a couple
of function calls, browsing the tree to gather the data and build the
internal structures is harder and more error-prone.</p>
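<p>As a concrete illustration of those couple of function calls, here is a
minimal sketch that parses a jobs document held in memory and looks up the
application namespace that the parsing functions below take as their
<em>ns</em> argument. It assumes libxml2 (for <em>xmlReadMemory</em>); the
helper name <em>loadJobsDoc</em> is hypothetical, and the namespace URI is
the placeholder used in the sample file above.</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

/* Placeholder namespace URI, as in the sample gjobs.xml above */
#define GJOB_NS &quot;http://www.gnome.org/some-location&quot;

/*
 * Hypothetical helper: parse an in-memory jobs document and store the
 * gjob namespace in *nsOut. Returns NULL (after freeing the document)
 * if parsing fails or the root is not a gjob:Helping element.
 */
xmlDocPtr loadJobsDoc(const char *buf, int len, xmlNsPtr *nsOut) {
    xmlDocPtr doc;
    xmlNodePtr root;
    xmlNsPtr ns;

    *nsOut = NULL;
    doc = xmlReadMemory(buf, len, NULL, NULL, 0);
    if (doc == NULL) {
        fprintf(stderr, &quot;document not parsed successfully\n&quot;);
        return NULL;
    }

    root = xmlDocGetRootElement(doc);
    if (root == NULL) {
        fprintf(stderr, &quot;empty document\n&quot;);
        xmlFreeDoc(doc);
        return NULL;
    }

    /* Find the declaration of the gjob namespace in scope at the root */
    ns = xmlSearchNsByHref(doc, root, (const xmlChar *) GJOB_NS);
    if ((ns == NULL) || (root-&gt;ns != ns) ||
        (xmlStrcmp(root-&gt;name, (const xmlChar *) &quot;Helping&quot;) != 0)) {
        fprintf(stderr, &quot;document of the wrong type\n&quot;);
        xmlFreeDoc(doc);
        return NULL;
    }

    *nsOut = ns;
    return doc;
}</pre>
<p>A caller would then pass <em>doc</em> and <em>ns</em> down to the
per-element parsing functions and release the tree with
<em>xmlFreeDoc</em> once the internal structures have been built. Because
the parser makes every gjob element point at the namespace declared on the
root, the pointer returned here can be compared directly with
<em>cur-&gt;ns</em>.</p>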
<p>The suggested principle is to be tolerant with respect to the input
structure. For example, the ordering of attributes is not significant; the
XML specification is clear about that. It is also usually a good idea not to
depend on the order of the children of a given node, unless tolerating an
arbitrary order really makes things harder. Here is some code to parse the
information for a person:</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;

/* Simple trace macro used by the parsing functions */
#define DEBUG(x) printf(x)

/*
 * A person record
 * An xmlChar * is a UTF-8 encoded, 0 terminated string
 */
typedef struct person {
    xmlChar *name;
    xmlChar *email;
    xmlChar *company;
    xmlChar *organisation;
    xmlChar *smail;
    xmlChar *webPage;
    xmlChar *phone;
} person, *personPtr;

/*
 * And the code needed to parse it
 */
personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    personPtr ret = NULL;

    DEBUG(&quot;parsePerson\n&quot;);
    /*
     * allocate the struct
     */
    ret = (personPtr) malloc(sizeof(person));
    if (ret == NULL) {
        fprintf(stderr,&quot;out of memory\n&quot;);
        return(NULL);
    }
    memset(ret, 0, sizeof(person));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {
        /* xmlStrcmp compares xmlChar strings; the ns test skips foreign elements */
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Person&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;name = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Email&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;email = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre>
<p>Here are a couple of things to notice:</p>
<ul>
<li>A recursive parsing style is usually the most convenient one: XML data
    is by nature made of repetitive constructs and usually exhibits highly
    structured patterns.</li>
<li>Note the two arguments of type <em>xmlDocPtr</em> and <em>xmlNsPtr</em>:
    the pointer to the global XML document and the namespace reserved for
    the application. Document-wide information is needed, for example, to
    decode entities, and it is good coding practice to define a namespace
    for your application's data set and to test that the elements and
    attributes you're analyzing actually pertain to your application space.
    This is done by a simple pointer equality test (cur-&gt;ns == ns).</li>
<li>To retrieve text and attribute values, you can use the function
    <em>xmlNodeListGetString</em>, which gathers all the text and entity
    reference nodes generated by the DOM output and produces a single text
    string. The returned string is allocated by the library and should be
    released with <em>xmlFree</em> when no longer needed.</li>
</ul>
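<p>The last two points can be illustrated with a small sketch. The
hypothetical <em>isAppElement</em> helper bundles the node type, name and
namespace checks, and <em>getChildText</em> (also hypothetical) reads the
text of the first matching child with <em>xmlNodeListGetString</em>. With
its last argument set to 1, text and CDATA nodes are concatenated while
nodes such as comments are skipped, so text split by a comment still comes
back as one string.</p>
<pre>#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

/* Hypothetical helper: is cur an element called name in namespace ns? */
static int isAppElement(xmlNodePtr cur, xmlNsPtr ns, const char *name) {
    return (cur-&gt;type == XML_ELEMENT_NODE) &amp;&amp; (cur-&gt;ns == ns) &amp;&amp;
           (xmlStrcmp(cur-&gt;name, (const xmlChar *) name) == 0);
}

/*
 * Hypothetical helper: return the text of the first child of parent
 * named name in namespace ns, or NULL if there is none.
 * The caller must release the result with xmlFree().
 */
xmlChar *getChildText(xmlDocPtr doc, xmlNodePtr parent, xmlNsPtr ns,
                      const char *name) {
    xmlNodePtr cur;

    for (cur = parent-&gt;children; cur != NULL; cur = cur-&gt;next) {
        if (isAppElement(cur, ns, name))
            return xmlNodeListGetString(doc, cur-&gt;children, 1);
    }
    return NULL;
}</pre>
<p>For example, if a contact element holds an unqualified
<em>&lt;Email&gt;</em> followed by an application-namespace
<em>&lt;gjob:Email&gt;nathan@&lt;!-- split --&gt;windsofstorm.net&lt;/gjob:Email&gt;</em>,
asking for "Email" in the application namespace skips the first one and
returns "nathan@windsofstorm.net".</p>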
<p>Here is another piece of code used to parse another level of the
structure:</p>
<pre>#include &lt;libxml/tree.h&gt;
/*
 * a Description for a Job
 */
typedef struct job {
    xmlChar *projectID;
    xmlChar *application;
    xmlChar *category;
    personPtr contact;
    int nbDevelopers;
    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
} job, *jobPtr;

/*
 * And the code needed to parse it
 */
jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    jobPtr ret = NULL;

    DEBUG(&quot;parseJob\n&quot;);
    /*
     * allocate the struct
     */
    ret = (jobPtr) malloc(sizeof(job));
    if (ret == NULL) {
        fprintf(stderr,&quot;out of memory\n&quot;);
        return(NULL);
    }
    memset(ret, 0, sizeof(job));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Project&quot;)) &amp;&amp; (cur-&gt;ns == ns)) {
            ret-&gt;projectID = xmlGetProp(cur, (const xmlChar *) &quot;ID&quot;);
            if (ret-&gt;projectID == NULL) {
                fprintf(stderr, &quot;Project has no ID\n&quot;);
            }
        }
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Application&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;application = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Category&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;category = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        /* Recursive step: the Contact sub-tree is handled by parsePerson */
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Contact&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;contact = parsePerson(doc, ns, cur);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre>
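<p>The job structure above caps the number of developers at 100 and leaves
dynamic allocation as an exercise. One possible approach, sketched here
under the assumption that the array is grown with <em>realloc()</em>, is a
small list that doubles its capacity when full. The names
<em>devList</em>, <em>devListAppend</em> and <em>devListClear</em> are
hypothetical and not part of libxml.</p>
<pre>#include &lt;stdlib.h&gt;

/* Only pointers are stored, so an incomplete type is enough here;
   struct person is the record defined in the example above. */
typedef struct person *personPtr;

/* Hypothetical growable list replacing the fixed developers[100] array */
typedef struct devList {
    int nbDevelopers;       /* entries in use */
    int maxDevelopers;      /* allocated capacity */
    personPtr *developers;  /* NULL until the first append */
} devList;

/*
 * Append dev, doubling the capacity when the array is full.
 * Returns 0 on success, or -1 if realloc() fails, in which case
 * the list is left unchanged.
 */
int devListAppend(devList *list, personPtr dev) {
    if (list-&gt;nbDevelopers &gt;= list-&gt;maxDevelopers) {
        int newMax = (list-&gt;maxDevelopers == 0) ? 4 : 2 * list-&gt;maxDevelopers;
        personPtr *tmp = realloc(list-&gt;developers, newMax * sizeof(personPtr));

        if (tmp == NULL)
            return -1;
        list-&gt;developers = tmp;
        list-&gt;maxDevelopers = newMax;
    }
    list-&gt;developers[list-&gt;nbDevelopers++] = dev;
    return 0;
}

/* Free the array itself; the person records it points to are not freed. */
void devListClear(devList *list) {
    free(list-&gt;developers);
    list-&gt;developers = NULL;
    list-&gt;nbDevelopers = 0;
    list-&gt;maxDevelopers = 0;
}</pre>
<p>In <em>parseJob</em>, each <em>gjob:Developer</em> child of a
<em>gjob:Developers</em> element could then be handed to
<em>parsePerson</em> and the result appended with <em>devListAppend</em>.
Doubling keeps the number of reallocations logarithmic in the number of
developers.</p>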
<p>Once you are used to it, writing this kind of code is quite simple, but
boring. Ultimately, it would be possible to write stub generators that take
either C data structure definitions, a set of XML examples, or an XML DTD
and produce the code needed to import and export the content between C data
and XML storage. This is left as an exercise to the reader :-)</p>
<p>Feel free to use <a href="example/gjobread.c">the code for the full C
parsing example</a> as a template. It is also available, with a Makefile, in
the Gnome CVS base under gnome-xml/example.</p>
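<p>The code above covers the Person and Job levels; the outer levels (the
<em>Helping</em> root and its <em>Jobs</em> container) can be walked in the
same tolerant style. Here is a minimal sketch, assuming the caller has
already checked the root element and obtained <em>ns</em>; the function
<em>walkJobs</em> is hypothetical and only counts <em>Job</em> elements
where a real program would call <em>parseJob</em>.</p>
<pre>#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

/*
 * Walk root -&gt; Jobs -&gt; Job. Text, comments and elements that are
 * not in the application namespace are skipped rather than treated
 * as errors. Returns the number of Job elements found.
 */
int walkJobs(xmlDocPtr doc, xmlNsPtr ns) {
    xmlNodePtr root = xmlDocGetRootElement(doc);
    xmlNodePtr cur, job;
    int count = 0;

    if (root == NULL)
        return 0;
    for (cur = root-&gt;children; cur != NULL; cur = cur-&gt;next) {
        if ((cur-&gt;type != XML_ELEMENT_NODE) || (cur-&gt;ns != ns) ||
            xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Jobs&quot;))
            continue;
        for (job = cur-&gt;children; job != NULL; job = job-&gt;next) {
            if ((job-&gt;type == XML_ELEMENT_NODE) &amp;&amp; (job-&gt;ns == ns) &amp;&amp;
                !xmlStrcmp(job-&gt;name, (const xmlChar *) &quot;Job&quot;)) {
                /* a real program would call parseJob(doc, ns, job) here */
                count++;
            }
        }
    }
    return count;
}</pre>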
<p><a href="bugs.html">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>