example.html revision 63d83142ffbff50f2c33c73415aa400ca920042c
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<link rel="SHORTCUT ICON" href="/favicon.ico">
<style type="text/css"><!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>A real example</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>A real example</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XMLinfo.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="python.html">Python and bindings</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="APIchunk0.html">Alphabetic</a></li>
<li><a href="APIconstructors.html">Constructors</a></li>
<li><a href="APIfunctions.html">Functions/Types</a></li>
<li><a href="APIfiles.html">Modules</a></li>
<li><a href="APIsymbols.html">Symbols</a></li>
</ul></td></tr>
</table>
<table width="100%"
border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li>
<li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li>
<li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml&amp;product=libxml2">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Here is a real-size example, where the actual content of the application
data is not kept in the DOM tree but in internal structures. It is based on
a proposal to keep a database of jobs related to Gnome, with an XML-based
storage structure.
Here is an <a href="gjobs.xml">XML-encoded jobs
base</a>:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;gjob:Helping xmlns:gjob="http://www.gnome.org/some-location"&gt;
  &lt;gjob:Jobs&gt;

    &lt;gjob:Job&gt;
      &lt;gjob:Project ID="3"/&gt;
      &lt;gjob:Application&gt;GBackup&lt;/gjob:Application&gt;
      &lt;gjob:Category&gt;Development&lt;/gjob:Category&gt;

      &lt;gjob:Update&gt;
        &lt;gjob:Status&gt;Open&lt;/gjob:Status&gt;
        &lt;gjob:Modified&gt;Mon, 07 Jun 1999 20:27:45 -0400 MET DST&lt;/gjob:Modified&gt;
        &lt;gjob:Salary&gt;USD 0.00&lt;/gjob:Salary&gt;
      &lt;/gjob:Update&gt;

      &lt;gjob:Developers&gt;
        &lt;gjob:Developer&gt;
        &lt;/gjob:Developer&gt;
      &lt;/gjob:Developers&gt;

      &lt;gjob:Contact&gt;
        &lt;gjob:Person&gt;Nathan Clemons&lt;/gjob:Person&gt;
        &lt;gjob:Email&gt;nathan@windsofstorm.net&lt;/gjob:Email&gt;
        &lt;gjob:Company&gt;
        &lt;/gjob:Company&gt;
        &lt;gjob:Organisation&gt;
        &lt;/gjob:Organisation&gt;
        &lt;gjob:Webpage&gt;
        &lt;/gjob:Webpage&gt;
        &lt;gjob:Snailmail&gt;
        &lt;/gjob:Snailmail&gt;
        &lt;gjob:Phone&gt;
        &lt;/gjob:Phone&gt;
      &lt;/gjob:Contact&gt;

      &lt;gjob:Requirements&gt;
      The program should be released as free software, under the GPL.
      &lt;/gjob:Requirements&gt;

      &lt;gjob:Skills&gt;
      &lt;/gjob:Skills&gt;

      &lt;gjob:Details&gt;
      A GNOME based system that will allow a superuser to configure
      compressed and uncompressed files and/or file systems to be backed
      up with a supported media in the system. This should be able to
      perform via find commands generating a list of files that are passed
      to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine
      or via operations performed on the filesystem itself. Email
      notification and GUI status display very important.
      &lt;/gjob:Details&gt;

    &lt;/gjob:Job&gt;

  &lt;/gjob:Jobs&gt;
&lt;/gjob:Helping&gt;</pre>
<p>While loading the XML file into an internal DOM tree is a matter of
calling only a couple of functions, browsing the tree to gather the data and
generate the internal structures is harder and more error-prone.</p>
<p>The suggested principle is to be tolerant with respect to the input
structure.
For example, the order of attributes is not significant;
the XML specification is clear about it. It is also usually a good idea not to
depend on the order of the children of a given node, unless that really makes
things harder. Here is some code to parse the information for a person:</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/tree.h&gt;

/*
 * A person record. The strings are xmlChar buffers returned by libxml2
 * and should be released with xmlFree().
 */
typedef struct person {
    xmlChar *name;
    xmlChar *email;
    xmlChar *company;
    xmlChar *organisation;
    xmlChar *smail;
    xmlChar *webPage;
    xmlChar *phone;
} person, *personPtr;

/*
 * And the code needed to parse it
 */
personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    personPtr ret = NULL;

/* DEBUG is a tracing macro assumed to be defined elsewhere */
DEBUG("parsePerson\n");
    /*
     * allocate the struct
     */
    ret = (personPtr) malloc(sizeof(person));
    if (ret == NULL) {
        fprintf(stderr,"out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(person));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {
        /* node names are xmlChar strings, so compare them with xmlStrcmp */
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) "Person")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;name = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) "Email")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;email = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre>
<p>Here are a couple of things to notice:</p>
<ul>
<li>Usually a recursive parsing style is the most convenient one: XML data
  is by nature subject to repetitive constructs and usually exhibits highly
  structured patterns.</li>
<li>The parsing functions take two arguments of type <em>xmlDocPtr</em> and <em>xmlNsPtr</em>,
  i.e. the pointer to the global XML document and the namespace reserved for
  the application.
Document-wide information is needed, for example, to
  decode entities, and it is good coding practice to define a namespace for
  your application's set of data and test that the elements and attributes
  you are analyzing actually pertain to your application space. This is
  done by a simple equality test (cur-&gt;ns == ns).</li>
<li>To retrieve text and attribute values, you can use the function
  <em>xmlNodeListGetString</em> to gather all the text and entity reference
  nodes generated by the DOM output and produce a single text string.</li>
</ul>
<p>Here is another piece of code used to parse another level of the
structure:</p>
<pre>#include &lt;libxml/tree.h&gt;
/*
 * a Description for a Job
 */
typedef struct job {
    xmlChar *projectID;
    xmlChar *application;
    xmlChar *category;
    personPtr contact;
    int nbDevelopers;
    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
} job, *jobPtr;

/*
 * And the code needed to parse it
 */
jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    jobPtr ret = NULL;

DEBUG("parseJob\n");
    /*
     * allocate the struct
     */
    ret = (jobPtr) malloc(sizeof(job));
    if (ret == NULL) {
        fprintf(stderr,"out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(job));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {

        /* node and attribute names are xmlChar strings: use xmlStrcmp */
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) "Project")) &amp;&amp; (cur-&gt;ns == ns)) {
            ret-&gt;projectID = xmlGetProp(cur, (const xmlChar *) "ID");
            if (ret-&gt;projectID == NULL) {
                fprintf(stderr, "Project has no ID\n");
            }
        }
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) "Application")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;application = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) "Category")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;category = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) "Contact")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;contact = parsePerson(doc, ns, cur);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre>
<p>Once you are used to it, writing this kind of code is quite simple, but
boring. Ultimately, it should be possible to write stub generators that take either C
data structure definitions, a set of XML examples or an XML DTD and produce
the code needed to import and export the content between C data and XML
storage. This is left as an exercise to the reader :-)</p>
<p>Feel free to use <a href="example/gjobread.c">the code for the full C
parsing example</a> as a template; it is also available, with a Makefile, in the
Gnome CVS base under gnome-xml/example.</p>
<p><a href="bugs.html">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>