xml.html revision a2679fa4215ba10722c89ec71d5a395e81ec66fa
1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" 2 "http://www.w3.org/TR/REC-html40/loose.dtd"> 3<html> 4<head> 5 <title>The XML library for Gnome</title> 6 <meta name="GENERATOR" content="amaya V3.2"> 7 <meta http-equiv="Content-Type" content="text/html"> 8</head> 9 10<body bgcolor="#ffffff"> 11<p><a href="http://www.gnome.org/"><img src="smallfootonly.gif" 12alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" 13alt="W3C Logo"></a></p> 14 15<h1 align="center">The XML library for Gnome</h1> 16 17<h2 style="text-align: center">libxml, a.k.a. gnome-xml</h2> 18 19<p></p> 20<ul> 21 <li><a href="#Introducti">Introduction</a></li> 22 <li><a href="#Documentat">Documentation</a></li> 23 <li><a href="#Downloads">Downloads</a></li> 24 <li><a href="#News">News</a></li> 25 <li><a href="#XML">XML</a></li> 26 <li><a href="#tree">The tree output</a></li> 27 <li><a href="#interface">The SAX interface</a></li> 28 <li><a href="#library">The XML library interfaces</a> 29 <ul> 30 <li><a href="#Invoking">Invoking the parser: the pull way</a></li> 31 <li><a href="#Invoking1">Invoking the parser: the push way</a></li> 32 <li><a href="#Invoking2">Invoking the parser: the SAX interface</a></li> 33 <li><a href="#Building">Building a tree from scratch</a></li> 34 <li><a href="#Traversing">Traversing the tree</a></li> 35 <li><a href="#Modifying">Modifying the tree</a></li> 36 <li><a href="#Saving">Saving the tree</a></li> 37 <li><a href="#Compressio">Compression</a></li> 38 </ul> 39 </li> 40 <li><a href="#Entities">Entities or no entities</a></li> 41 <li><a href="#Namespaces">Namespaces</a></li> 42 <li><a href="#Validation">Validation</a></li> 43 <li><a href="#Principles">DOM principles</a></li> 44 <li><a href="#real">A real example</a></li> 45 <li><a href="#Contributi">Contribution</a></li> 46</ul> 47 48<h2><a name="Introducti">Introduction</a></h2> 49 50<p>This document describes libxml, the <a 51href="http://www.w3.org/XML/">XML</a> library provided in 
the <a 52href="http://www.gnome.org/">Gnome</a> framework. XML is a standard for 53building tag-based structured documents/data.</p> 54 55<p>Here are some key points about libxml:</p> 56<ul> 57 <li>It is written in plain C, making as few assumptions as possible and 58 sticking closely to ANSI C for easy embedding.</li> 59 <li>The internal document representation is as close as possible to the <a 60 href="http://www.w3.org/DOM/">DOM</a> interfaces.</li> 61 <li>Libxml also has a <a href="http://www.megginson.com/SAX/index.html">SAX 62 like interface</a>; the interface is designed to be compatible with <a 63 href="http://www.jclark.com/xml/expat.html">Expat</a>.</li> 64 <li>Libxml now includes a nearly complete <a 65 href="http://www.w3.org/TR/xpath">XPath</a> implementation.</li> 66 <li>Libxml exports Push and Pull type parser interfaces for both XML and 67 HTML.</li> 68 <li>This library is released both under the <a 69 href="http://www.w3.org/Consortium/Legal/copyright-software-19980720.html">W3C 70 IPR</a> and the GNU LGPL. 
Use either at your convenience; basically this 71 should make everybody happy. If not, drop me a mail.</li> 72 <li>There is <a href="upgrade.html">a first set of instructions</a> 73 concerning the upgrade from libxml-1.x to libxml-2.x.</li> 74</ul> 75 76<h2><a name="Documentat">Documentation</a></h2> 77 78<p>There are some on-line resources about using libxml:</p> 79<ol> 80 <li>Check the <a href="FAQ.html">FAQ</a></li> 81 <li>Check the <a href="http://xmlsoft.org/html/libxml-lib.html">extensive 82 documentation</a> automatically extracted from code comments.</li> 83 <li>Look at the documentation about <a href="encoding.html">libxml 84 internationalization support</a></li> 85 <li>This page provides a global overview and <a href="#real">some 86 examples</a> on how to use libxml.</li> 87 <li><a href="mailto:james@daa.com.au">James Henstridge</a> wrote <a 88 href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">some nice 89 documentation</a> explaining how to use the libxml SAX interface.</li> 90 <li>George Lebl wrote <a 91 href="http://www-4.ibm.com/software/developer/library/gnome3/">an article 92 for IBM developerWorks</a> about using libxml.</li> 93 <li>It is also a good idea to check <a href="mailto:raph@levien.com">Raph 94 Levien</a>'s <a href="http://levien.com/gnome/">web site</a> since he is 95 building the <a href="http://levien.com/gnome/gdome.html">DOM interface 96 gdome</a> on top of the libxml result tree and an implementation of <a 97 href="http://www.w3.org/Graphics/SVG/">SVG</a> called <a 98 href="http://www.levien.com/svg/">gill</a>. Check his <a 99 href="http://www.levien.com/gnome/domination.html">DOMination 100 paper</a>.</li> 101 <li>Check <a href="http://cvs.gnome.org/lxr/source/gnome-xml/TODO">the TODO 102 file</a></li> 103 <li>Read the <a href="upgrade.html">1.x to 2.x upgrade path</a>. 
If you are 104 starting a new project using libxml you should really use the 2.x 105 version.</li> 106 <li>And don't forget to look at the <a href="/messages/">mailing-list 107 archive</a>.</li> 108</ol> 109 110<h3>Reporting bugs and getting help</h3> 111 112<p>Well, bugs or missing features are always possible, and I will make a point 113of fixing them in a timely fashion. The best way to report a bug is to <a 114href="http://bugs.gnome.org/db/pa/lgnome-xml.html">use the Gnome bug tracking 115database</a>. I look at reports there regularly and it's good to have a 116reminder when a bug is still open. Check the <a 117href="http://bugs.gnome.org/Reporting.html">instructions on reporting bugs</a> 118and be sure to specify that the bug is for the package gnome-xml.</p> 119 120<p>There is also a mailing-list <a 121href="mailto:xml@rufus.w3.org">xml@rufus.w3.org</a> for libxml, with an <a 122href="http://xmlsoft.org/messages">on-line archive</a>. To subscribe to this 123majordomo-based list, send a mail message to <a 124href="mailto:majordomo@rufus.w3.org">majordomo@rufus.w3.org</a> with 125"subscribe xml" in the <strong>content</strong> of the message.</p> 126 127<p>Alternatively, you can just send the bug to the <a 128href="mailto:xml@rufus.w3.org">xml@rufus.w3.org</a> list.</p> 129 130<p>Of course, bug reports with a suggested patch for fixing them will 131probably be processed faster.</p> 132 133<p>If you're looking for help, a quick look at <a 134href="http://xmlsoft.org/messages/#407">the list archive</a> may actually 135provide the answer; I usually send source samples when answering libxml usage 136questions. 
The <a href="http://xmlsoft.org/html/book1.html">auto-generated 137documentation</a> is not as polished as I would like (I need to learn more 138about Docbook), but it's a good starting point.</p> 139 140<h2><a name="Downloads">Downloads</a></h2> 141 142<p>The latest versions of libxml can be found on <a 143href="ftp://rpmfind.net/pub/libxml/">rpmfind.net</a> or on the <a 144href="ftp://ftp.gnome.org/pub/GNOME/MIRRORS.html">Gnome FTP server</a> either 145as a <a href="ftp://ftp.gnome.org/pub/GNOME/stable/sources/libxml/">source 146archive</a> or <a 147href="ftp://ftp.gnome.org/pub/GNOME/contrib/redhat/SRPMS/">RPM packages</a>. 148(NOTE that you need both the <a 149href="http://rpmfind.net/linux/RPM/libxml2.html">libxml(2)</a> and <a 150href="http://rpmfind.net/linux/RPM/libxml2-devel.html">libxml(2)-devel</a> 151packages installed to compile applications using libxml.)</p> 152 153<p><a name="Snapshot">Snapshot:</a></p> 154<ul> 155 <li>Code from the W3C cvs base libxml <a 156 href="ftp://rpmfind.net/pub/libxml/cvs-snapshot.tar.gz">cvs-snapshot.tar.gz</a></li> 157 <li>Docs, content of the web site, the list archive included <a 158 href="ftp://rpmfind.net/pub/libxml/libxml-docs.tar.gz">libxml-docs.tar.gz</a></li> 159</ul> 160 161<p><a name="Contribs">Contribs:</a></p> 162 163<p>I do accept external contributions, especially builds for other 164platforms; get in touch with me to upload the package. 
I will keep them in the 165<a href="ftp://rpmfind.net/pub/libxml/contribs/">contrib directory</a>.</p> 166 167<p>Libxml is also available from 2 CVS bases:</p> 168<ul> 169 <li><p>The <a href="http://dev.w3.org/cvsweb/XML/">W3C CVS base</a>, 170 available read-only using the CVS pserver authentication (I tend to use 171 this base for my own development, so it's updated more regularly, but the 172 content may not be as stable):</p> 173 <pre>CVSROOT=:pserver:anonymous@dev.w3.org:/sources/public 174 password: anonymous 175 module: XML</pre> 176 </li> 177 <li><p>The <a 178 href="http://cvs.gnome.org/bonsai/rview.cgi?cvsroot=/cvs/gnome&dir=gnome-xml">Gnome 179 CVS base</a>. Check the <a 180 href="http://developer.gnome.org/tools/cvs.html">Gnome CVS Tools</a> page; 181 the CVS module is <b>gnome-xml</b>.</p> 182 </li> 183</ul> 184 185<h2><a name="News">News</a></h2> 186 187<h3>CVS only : check the <a 188href="http://cvs.gnome.org/lxr/source/gnome-xml/ChangeLog">Changelog</a> file 189for a really accurate description</h3> 190<ul> 191 <li>working on HTML and XML link recognition layers, get in touch with me 192 if you want to test those.</li> 193</ul> 194 195<h2>2.2.1: July 21 2000</h2> 196<ul> 197 <li>a pure bug-fix release</li> 198 <li>fixed an encoding support problem when parsing from a memory block</li> 199 <li>fixed a DOCTYPE parsing problem</li> 200 <li>fixed a bug in the function that allows overriding the memory allocation 201 routines</li> 202</ul> 203 204<h2>2.2.0: July 14 2000</h2> 205<ul> 206 <li>applied a lot of portability fixes</li> 207 <li>better encoding support/cleanup and saving (content is now always 208 encoded in UTF-8)</li> 209 <li>the HTML parser now correctly handles encodings</li> 210 <li>added xmlHasProp()</li> 211 <li>fixed a serious problem with &#38;</li> 212 <li>propagated the fix to the FTP client</li> 213 <li>cleanup, bugfixes, etc ...</li> 214 <li>Added a page about <a href="encoding.html">libxml Internationalization 215 support</a></li> 
216</ul> 217 218<h3>1.8.9: July 9 2000</h3> 219<ul> 220 <li>fixed the spec file; the RPMs should be better</li> 221 <li>fixed a serious bug in the FTP implementation, released 1.8.9 to solve 222 a problem affecting rpmfind users</li> 223</ul> 224 225<h3>2.1.1: July 1 2000</h3> 226<ul> 227 <li>fixes a couple of bugs in the 2.1.0 packaging</li> 228 <li>improvements on the HTML parser</li> 229</ul> 230 231<h3>2.1.0 and 1.8.8: June 29 2000</h3> 232<ul> 233 <li>1.8.8 is mostly a convenience package for upgrading to libxml2 according to 234 <a href="upgrade.html">new instructions</a>. It fixes a nasty problem 235 about &#38; charref parsing</li> 236 <li>2.1.0 also eases the upgrade from libxml v1 to the recent version. It 237 also contains numerous fixes and enhancements: 238 <ul> 239 <li>added xmlStopParser() to stop parsing</li> 240 <li>greatly improved parsing speed when there are large CDATA blocks</li> 241 <li>includes XPath patches provided by Picdar Technology</li> 242 <li>tried to fix as many DTD validation and namespace 243 related problems as possible</li> 244 <li>output to a given encoding has been added/tested</li> 245 <li>lots of various fixes</li> 246 </ul> 247 </li> 248</ul> 249 250<h3>2.0.0: Apr 12 2000</h3> 251<ul> 252 <li>First public release of libxml2. If you are using libxml, it's a good 253 idea to check the 1.x to 2.x upgrade instructions. 
NOTE: while initially 254 scheduled for Apr 3 the release occurred only on Apr 12 due to massive 255 workload.</li> 256 <li>The includes are now located under $prefix/include/libxml (instead of 257 $prefix/include/gnome-xml); they are also referenced by 258 <pre>#include <libxml/xxx.h></pre> 259 <p>instead of</p> 260 <pre>#include "xxx.h"</pre> 261 </li> 262 <li>a new URI module for parsing URIs and following strictly RFC 2396</li> 263 <li>the memory allocation routines used by libxml can now be overloaded 264 dynamically by using xmlMemSetup()</li> 265 <li>The previously CVS only tool tester has been renamed 266 <strong>xmllint</strong> and is now installed as part of the libxml2 267 package</li> 268 <li>The I/O interface has been revamped. There are now ways to plug in 269 specific I/O modules, either at the URI scheme detection level using 270 xmlRegisterInputCallbacks() or by passing I/O functions when creating a 271 parser context using xmlCreateIOParserCtxt()</li> 272 <li>there is a C preprocessor macro LIBXML_VERSION providing the version 273 number of the libxml module in use</li> 274 <li>a number of optional features of libxml can now be excluded at configure 275 time (FTP/HTTP/HTML/XPath/Debug)</li> 276</ul> 277 278<h3>2.0.0beta: Mar 14 2000</h3> 279<ul> 280 <li>This is a first Beta release of libxml version 2</li> 281 <li>It's available only from <a href="ftp://rpmfind.net/pub/libxml/"> 282 rpmfind.net FTP</a>, it's packaged as libxml2-2.0.0beta and available as 283 tar and RPMs</li> 284 <li>This version is now the head in the Gnome CVS base, the old one is 285 available under the tag LIB_XML_1_X</li> 286 <li>This includes a very large set of changes. From a programmatic point of 287 view applications should not have to be modified too much, check the <a 288 href="upgrade.html">upgrade page</a></li> 289 <li>Some interfaces may change (especially a bit about encoding).</li> 290 <li>the updates include: 291 <ul> 292 <li>fix I18N support. 
ISO-Latin-x/UTF-8/UTF-16 seem (nearly) correctly 293 handled now</li> 294 <li>Better handling of entities, especially well formedness checking and 295 proper PEref extensions in external subsets</li> 296 <li>DTD conditional sections</li> 297 <li>Validation now correctly handles entity content</li> 298 <li><a href="http://rpmfind.net/tools/gdome/messages/0039.html">change 299 structures to accommodate DOM</a></li> 300 </ul> 301 </li> 302 <li>Serious progress was made toward compliance, <a 303 href="conf/result.html">here are the results of the tests</a> against the 304 OASIS testsuite (except the Japanese tests since I don't support that 305 encoding yet). This URL is rebuilt every couple of hours using the CVS 306 head version.</li> 307</ul> 308 309<h3>1.8.7: Mar 6 2000</h3> 310<ul> 311 <li>This is a bug fix release:</li> 312 <li>It is possible to disable the ignorable blanks heuristic used by 313 libxml-1.x, a new function xmlKeepBlanksDefault(0) will allow this. Note 314 that for adherence to XML spec, this behaviour will be disabled by default 315 in 2.x . The same function will allow keeping compatibility with old 316 code.</li> 317 <li>Blanks in <a> </a> constructs are not ignored anymore, 318 avoiding heuristic is really the Right Way :-\</li> 319 <li>The unchecked use of snprintf which was breaking libxml-1.8.6 320 compilation on some platforms has been fixed</li> 321 <li>nanoftp.c nanohttp.c: Fixed '#' and '?' 
stripping when processing 322 URIs</li> 323</ul> 324 325<h3>1.8.6: Jan 31 2000</h3> 326<ul> 327 <li>added a nanoFTP transport module, debugged until the new version of <a 328 href="http://rpmfind.net/linux/rpm2html/rpmfind.html">rpmfind</a> can use 329 it without troubles</li> 330</ul> 331 332<h3>1.8.5: Jan 21 2000</h3> 333<ul> 334 <li>adding APIs to parse a well balanced chunk of XML (production <a 335 href="http://www.w3.org/TR/REC-xml#NT-content">[43] content</a> of the XML 336 spec)</li> 337 <li>fixed a hideous bug in xmlGetProp pointed out by Rune.Djurhuus@fast.no</li> 338 <li>Jody Goldberg <jgoldberg@home.com> provided another patch trying 339 to solve the zlib checks problems</li> 340 <li>The current state in gnome CVS base is expected to ship as 1.8.5 with 341 gnumeric soon</li> 342</ul> 343 344<h3>1.8.4: Jan 13 2000</h3> 345<ul> 346 <li>bug fixes, reintroduced xmlNewGlobalNs(), fixed xmlNewNs()</li> 347 <li>all exit() calls should have been removed from libxml</li> 348 <li>fixed a problem with INCLUDE_WINSOCK on WIN32 platform</li> 349 <li>added newDocFragment()</li> 350</ul> 351 352<h3>1.8.3: Jan 5 2000</h3> 353<ul> 354 <li>a Push interface for the XML and HTML parsers</li> 355 <li>a shell-like interface to the document tree (try tester --shell :-)</li> 356 <li>lots of bug fixes and improvements added over the Xmas holidays</li> 357 <li>fixed the DTD parsing code to work with the xhtml DTD</li> 358 <li>added xmlRemoveProp(), xmlRemoveID() and xmlRemoveRef()</li> 359 <li>Fixed bugs in xmlNewNs()</li> 360 <li>External entity loading code has been revamped, now it uses 361 xmlLoadExternalEntity(), some fixes to entity processing were added</li> 362 <li>cleaned up WIN32 includes of socket stuff</li> 363</ul> 364 365<h3>1.8.2: Dec 21 1999</h3> 366<ul> 367 <li>I got another problem with includes and C++, I hope this issue is fixed 368 for good this time</li> 369 <li>Added a few tree modification functions: xmlReplaceNode, 370 xmlAddPrevSibling, xmlAddNextSibling, 
xmlNodeSetName and 371 xmlDocSetRootElement</li> 372 <li>Tried to improve the HTML output with help from <a 373 href="mailto:clahey@umich.edu">Chris Lahey</a></li> 374</ul> 375 376<h3>1.8.1: Dec 18 1999</h3> 377<ul> 378 <li>various patches to avoid troubles when using libxml with C++ compilers: 379 the "namespace" keyword and C escaping in include files</li> 380 <li>a problem in one of the core macros IS_CHAR was corrected</li> 381 <li>fixed a bug introduced in 1.8.0 breaking default namespace processing, 382 and more specifically the Dia application</li> 383 <li>fixed a posteriori validation (validation after parsing, or by using a 384 DTD not specified in the original document)</li> 385 <li>fixed a bug in</li> 386</ul> 387 388<h3>1.8.0: Dec 12 1999</h3> 389<ul> 390 <li>cleanup, especially memory wise</li> 391 <li>the parser should be more reliable, especially the HTML one; it should 392 not crash, whatever the input!</li> 393 <li>Integrated various patches, especially a speedup improvement for large 394 datasets from <a href="mailto:cnygard@bellatlantic.net">Carl Nygard</a>, 395 configure with --with-buffers to enable them.</li> 396 <li>attribute normalization, oops should have been added long ago!</li> 397 <li>attributes defaulted from DTDs should be available, xmlSetProp() now 398 does entity escaping by default.</li> 399</ul> 400 401<h3>1.7.4: Oct 25 1999</h3> 402<ul> 403 <li>Lots of HTML improvements</li> 404 <li>Fixed some errors when saving both XML and HTML</li> 405 <li>More examples, the regression tests should now look clean</li> 406 <li>Fixed a bug with contiguous charref</li> 407</ul> 408 409<h3>1.7.3: Sep 29 1999</h3> 410<ul> 411 <li>portability problems fixed</li> 412 <li>snprintf was used unconditionally, leading to link problems on systems 413 where it's not available, fixed</li> 414</ul> 415 416<h3>1.7.1: Sep 24 1999</h3> 417<ul> 418 <li>The basic type for strings manipulated by libxml has been renamed in 419 1.7.1 from <strong>CHAR</strong> to 
<strong>xmlChar</strong>. The reason 420 is that CHAR was conflicting with a predefined type on Windows. However in 421 non-WIN32 environments, compatibility is provided by way of a 422 <strong>#define </strong>.</li> 423 <li>Fixed another error: the use of a structure field called errno, 424 leading to troubles on platforms where it's a macro</li> 425</ul> 426 427<h3>1.7.0: Sep 23 1999</h3> 428<ul> 429 <li>Added the ability to fetch remote DTD or parsed entities, see the <a 430 href="html/gnome-xml-nanohttp.html">nanohttp</a> module.</li> 431 <li>Added an errno to report errors by a means other than a simple printf 432 like callback</li> 433 <li>Finished ID/IDREF support and checking when validating</li> 434 <li>Serious memory leaks fixed (there is now a <a 435 href="html/gnome-xml-xmlmemory.html">memory wrapper</a> module)</li> 436 <li>Improvement of <a href="http://www.w3.org/TR/xpath">XPath</a> 437 implementation</li> 438 <li>Added an HTML parser front-end</li> 439</ul> 440 441<h2><a name="XML">XML</a></h2> 442 443<p><a href="http://www.w3.org/TR/REC-xml">XML is a standard</a> for 444markup-based structured documents. Here is <a name="example">an example XML 445document</a>:</p> 446<pre><?xml version="1.0"?> 447<EXAMPLE prop1="gnome is great" prop2="&amp; linux too"> 448 <head> 449 <title>Welcome to Gnome</title> 450 </head> 451 <chapter> 452 <title>The Linux adventure</title> 453 <p>bla bla bla ...</p> 454 <image href="linus.gif"/> 455 <p>...</p> 456 </chapter> 457</EXAMPLE></pre> 458 459<p>The first line specifies that it's an XML document and gives useful 460information about its encoding. Then the document is a text format whose 461structure is specified by tags between brackets. <strong>Each tag opened has 462to be closed</strong>. XML is pedantic about this. However, if a tag is empty 463(no content), a single tag can serve as both the opening and closing tag if it 464ends with <code>/></code> rather than with <code>></code>. 
Note that, 465for example, the image tag has no content (just an attribute) and is closed by 466ending the tag with <code>/></code>.</p> 467 468<p>XML can be applied successfully to a wide range of uses, from long term 469structured document maintenance (where it follows the steps of SGML) to simple 470data encoding mechanisms like configuration file formatting (glade), 471spreadsheets (gnumeric), or even shorter lived documents such as WebDAV where 472it is used to encode remote calls between a client and a server.</p> 473 474<h2><a name="tree">The tree output</a></h2> 475 476<p>The parser returns a tree built during the document analysis. The value 477returned is an <strong>xmlDocPtr</strong> (i.e., a pointer to an 478<strong>xmlDoc</strong> structure). This structure contains information such 479as the file name, the document type, and a <strong>children</strong> pointer 480which is the root of the document (or more exactly the first child under the 481root which is the document). The tree is made of <strong>xmlNode</strong>s, 482chained in doubly linked lists of siblings and with children<->parent 483relationships. An xmlNode can also carry properties (a chain of xmlAttr 484structures). An attribute may have a value which is a list of TEXT or 485ENTITY_REF nodes.</p> 486 487<p>Here is an example (erroneous with respect to the XML spec since there 488should be only one ELEMENT under the root):</p> 489 490<p><img src="structure.gif" alt=" structure.gif "></p> 491 492<p>In the source package there is a small program (not installed by default) 493called <strong>xmllint</strong> which parses XML files given as argument and 494prints them back as parsed. This is useful for detecting errors both in XML 495code and in the XML parser itself. 
It has an option <strong>--debug</strong> 496which prints the actual in-memory structure of the document. Here is the 497result with the <a href="#example">example</a> given before:</p> 498<pre>DOCUMENT 499version=1.0 500standalone=true 501 ELEMENT EXAMPLE 502 ATTRIBUTE prop1 503 TEXT 504 content=gnome is great 505 ATTRIBUTE prop2 506 ENTITY_REF 507 TEXT 508 content= linux too 509 ELEMENT head 510 ELEMENT title 511 TEXT 512 content=Welcome to Gnome 513 ELEMENT chapter 514 ELEMENT title 515 TEXT 516 content=The Linux adventure 517 ELEMENT p 518 TEXT 519 content=bla bla bla ... 520 ELEMENT image 521 ATTRIBUTE href 522 TEXT 523 content=linus.gif 524 ELEMENT p 525 TEXT 526 content=...</pre> 527 528<p>This should be useful for learning the internal representation model.</p> 529 530<h2><a name="interface">The SAX interface</a></h2> 531 532<p>Sometimes the DOM tree output is just too large to fit reasonably into 533memory. In that case (and if you don't expect to save back the XML document 534loaded using libxml), it's better to use the SAX interface of libxml. SAX is a 535<strong>callback-based interface</strong> to the parser. Before parsing, the 536application layer registers a customized set of callbacks which are called by 537the library as it progresses through the XML input.</p> 538 539<p>To get more detailed step-by-step guidance on using the SAX interface of 540libxml, see the <a 541href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">nice 542documentation</a> written by <a href="mailto:james@daa.com.au">James 543Henstridge</a>.</p> 544 545<p>You can debug the SAX behaviour by using the <strong>testSAX</strong> 546program located in the gnome-xml module (it's usually not shipped in the 547binary packages of libxml, but you can find it in the tar source 548distribution). 
Here is the sequence of callbacks that would be reported by 549testSAX when parsing the example XML document shown earlier:</p> 550<pre>SAX.setDocumentLocator() 551SAX.startDocument() 552SAX.getEntity(amp) 553SAX.startElement(EXAMPLE, prop1='gnome is great', prop2='&amp; linux too') 554SAX.characters( , 3) 555SAX.startElement(head) 556SAX.characters( , 4) 557SAX.startElement(title) 558SAX.characters(Welcome to Gnome, 16) 559SAX.endElement(title) 560SAX.characters( , 3) 561SAX.endElement(head) 562SAX.characters( , 3) 563SAX.startElement(chapter) 564SAX.characters( , 4) 565SAX.startElement(title) 566SAX.characters(The Linux adventure, 19) 567SAX.endElement(title) 568SAX.characters( , 4) 569SAX.startElement(p) 570SAX.characters(bla bla bla ..., 15) 571SAX.endElement(p) 572SAX.characters( , 4) 573SAX.startElement(image, href='linus.gif') 574SAX.endElement(image) 575SAX.characters( , 4) 576SAX.startElement(p) 577SAX.characters(..., 3) 578SAX.endElement(p) 579SAX.characters( , 3) 580SAX.endElement(chapter) 581SAX.characters( , 1) 582SAX.endElement(EXAMPLE) 583SAX.endDocument()</pre> 584 585<p>Most of the other functionalities of libxml are based on the DOM 586tree-building facility, so nearly everything up to the end of this document 587presupposes the use of the standard DOM tree build. Note that the DOM tree 588itself is built by a set of registered default callbacks, without internal 589specific interface.</p> 590 591<h2><a name="library">The XML library interfaces</a></h2> 592 593<p>This section is directly intended to help programmers getting bootstrapped 594using the XML library from the C language. It is not intended to be extensive. 595I hope the automatically generated documents will provide the completeness 596required, but as a separate set of documents. The interfaces of the XML 597library are by principle low level, there is nearly zero abstraction. 
Those 598interested in a higher level API should <a href="#DOM">look at DOM</a>.</p> 599 600<p>The <a href="html/gnome-xml-parser.html">parser interfaces for XML</a> are 601separated from the <a href="html/gnome-xml-htmlparser.html">HTML parser 602interfaces</a>. Let's have a look at how the XML parser can be called:</p> 603 604<h3><a name="Invoking">Invoking the parser : the pull method</a></h3> 605 606<p>Usually, the first thing to do is to read an XML input. The parser accepts 607documents either from in-memory strings or from files. The functions are 608defined in "parser.h":</p> 609<dl> 610 <dt><code>xmlDocPtr xmlParseMemory(char *buffer, int size);</code></dt> 611 <dd><p>Parse a null-terminated string containing the document.</p> 612 </dd> 613</dl> 614<dl> 615 <dt><code>xmlDocPtr xmlParseFile(const char *filename);</code></dt> 616 <dd><p>Parse an XML document contained in a (possibly compressed) 617 file.</p> 618 </dd> 619</dl> 620 621<p>The parser returns a pointer to the document structure (or NULL in case of 622failure).</p> 623 624<h3 id="Invoking1">Invoking the parser: the push method</h3> 625 626<p>In order for the application to keep control while the document is being 627fetched (which is common for GUI based programs), libxml also provides a push 628interface, as of version 1.8.3. 
Here are the interface functions:</p> 629<pre>xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax, 630 void *user_data, 631 const char *chunk, 632 int size, 633 const char *filename); 634int xmlParseChunk (xmlParserCtxtPtr ctxt, 635 const char *chunk, 636 int size, 637 int terminate);</pre> 638 639<p>and here is a simple example showing how to use the interface:</p> 640<pre> FILE *f; 641 642 f = fopen(filename, "r"); 643 if (f != NULL) { 644 int res, size = 1024; 645 char chars[1024]; 646 xmlParserCtxtPtr ctxt; 647 648 res = fread(chars, 1, 4, f); 649 if (res > 0) { 650 ctxt = xmlCreatePushParserCtxt(NULL, NULL, 651 chars, res, filename); 652 while ((res = fread(chars, 1, size, f)) > 0) { 653 xmlParseChunk(ctxt, chars, res, 0); 654 } 655 xmlParseChunk(ctxt, chars, 0, 1); 656 doc = ctxt->myDoc; 657 xmlFreeParserCtxt(ctxt); 658 } 659 }</pre> 660 661<p>Note that the HTML parser embedded into libxml also has a push 662interface; the functions are just prefixed by "html" rather than "xml".</p> 663 664<h3 id="Invoking2">Invoking the parser: the SAX interface</h3> 665 666<p>A couple of comments can be made. First, building a tree means that the parser is 667memory-hungry: it needs memory first to load the document, then to build the tree. 668Reading a document without building the tree is possible using the SAX 669interfaces (see SAX.h and <a 670href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">James 671Henstridge's documentation</a>). Note also that the push interface can be 672limited to SAX. Just use the first two arguments of 673<code>xmlCreatePushParserCtxt()</code>.</p> 674 675<h3><a name="Building">Building a tree from scratch</a></h3> 676 677<p>The other way to get an XML tree in memory is by building it. Basically 678there is a set of functions dedicated to building new elements. (These are 679also described in <libxml/tree.h>.) 
For example, here is a piece of code 680that produces the XML document used in the previous examples:</p> 681<pre> #include <libxml/tree.h> 682 xmlDocPtr doc; 683 xmlNodePtr tree, subtree; 684 685 doc = xmlNewDoc("1.0"); 686 doc->children = xmlNewDocNode(doc, NULL, "EXAMPLE", NULL); 687 xmlSetProp(doc->children, "prop1", "gnome is great"); 688 xmlSetProp(doc->children, "prop2", "& linux too"); 689 tree = xmlNewChild(doc->children, NULL, "head", NULL); 690 subtree = xmlNewChild(tree, NULL, "title", "Welcome to Gnome"); 691 tree = xmlNewChild(doc->children, NULL, "chapter", NULL); 692 subtree = xmlNewChild(tree, NULL, "title", "The Linux adventure"); 693 subtree = xmlNewChild(tree, NULL, "p", "bla bla bla ..."); 694 subtree = xmlNewChild(tree, NULL, "image", NULL); 695 xmlSetProp(subtree, "href", "linus.gif");</pre> 696 697<p>Not really rocket science ...</p> 698 699<h3><a name="Traversing">Traversing the tree</a></h3> 700 701<p>Basically by <a href="html/gnome-xml-tree.html">including "tree.h"</a> your 702code has access to the internal structure of all the elements of the tree. The 703names should be somewhat simple like <strong>parent</strong>, 704<strong>children</strong>, <strong>next</strong>, <strong>prev</strong>, 705<strong>properties</strong>, etc... 
For example, still with the previous 706example:</p> 707<pre><code>doc->children->children->children</code></pre> 708 709<p>points to the title element,</p> 710<pre>doc->children->children->next->children->children</pre> 711 712<p>points to the text node containing the chapter title "The Linux 713adventure".</p> 714 715<p><strong>NOTE</strong>: XML allows <em>PI</em>s and <em>comments</em> to be 716present before the document root, so <code>doc->children</code> may point 717to a node which is not the document root element; the function 718<code>xmlDocGetRootElement()</code> was added for this purpose.</p> 719 720<h3><a name="Modifying">Modifying the tree</a></h3> 721 722<p>Functions are provided for reading and writing the document content. Here 723is an excerpt from the <a href="html/gnome-xml-tree.html">tree API</a>:</p> 724<dl> 725 <dt><code>xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const 726 xmlChar *value);</code></dt> 727 <dd><p>This sets (or changes) an attribute carried by an ELEMENT node. The 728 value can be NULL.</p> 729 </dd> 730</dl> 731<dl> 732 <dt><code>const xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar 733 *name);</code></dt> 734 <dd><p>This function returns a pointer to the property content. Note that 735 no extra copy is made.</p> 736 </dd> 737</dl> 738 739<p>Two functions are provided for reading and writing the text associated with 740elements:</p> 741<dl> 742 <dt><code>xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar 743 *value);</code></dt> 744 <dd><p>This function takes an "external" string and converts it to one text 745 node or possibly to a list of entity and text nodes. 
All non-predefined
      entity references like &amp;Gnome; will be stored internally as entity
      nodes, hence the result of the function may not be a single node.</p>
    </dd>
</dl>
<dl>
  <dt><code>xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int
    inLine);</code></dt>
    <dd><p>This function is the inverse of
      <code>xmlStringGetNodeList()</code>. It generates a new string
      containing the content of the text and entity nodes. Note the extra
      argument inLine. If this argument is set to 1, the function will expand
      entity references. For example, instead of returning the &amp;Gnome;
      XML encoding in the string, it will substitute it with its value (say,
      "GNU Network Object Model Environment"). Set this argument if you want
      to use the string outside of XML, for example in a user interface.</p>
    </dd>
</dl>

<h3><a name="Saving">Saving a tree</a></h3>

<p>Three options are available:</p>
<dl>
  <dt><code>void xmlDocDumpMemory(xmlDocPtr cur, xmlChar**mem, int
    *size);</code></dt>
    <dd><p>Returns a buffer into which the document has been saved.</p>
    </dd>
</dl>
<dl>
  <dt><code>extern void xmlDocDump(FILE *f, xmlDocPtr doc);</code></dt>
    <dd><p>Dumps a document to an open file stream (a <code>FILE *</code>).</p>
    </dd>
</dl>
<dl>
  <dt><code>int xmlSaveFile(const char *filename, xmlDocPtr cur);</code></dt>
    <dd><p>Saves the document to a file. In this case, the compression
      interface is triggered if it has been turned on.</p>
    </dd>
</dl>

<h3><a name="Compressio">Compression</a></h3>

<p>The library transparently handles compression when doing file-based
accesses.
The compression level used on saves can be set either globally
or individually for one document:</p>
<dl>
  <dt><code>int xmlGetDocCompressMode (xmlDocPtr doc);</code></dt>
    <dd><p>Gets the document compression level (0-9).</p>
    </dd>
</dl>
<dl>
  <dt><code>void xmlSetDocCompressMode (xmlDocPtr doc, int mode);</code></dt>
    <dd><p>Sets the document compression level.</p>
    </dd>
</dl>
<dl>
  <dt><code>int xmlGetCompressMode(void);</code></dt>
    <dd><p>Gets the default compression level.</p>
    </dd>
</dl>
<dl>
  <dt><code>void xmlSetCompressMode(int mode);</code></dt>
    <dd><p>Sets the default compression level.</p>
    </dd>
</dl>

<h2><a name="Entities">Entities or no entities</a></h2>

<p>Entities in principle are similar to simple C macros. An entity defines an
abbreviation for a given string that you can reuse many times throughout the
content of your document. Entities are especially useful when a given string
occurs frequently within a document, or to confine the changes needed in a
document to a restricted area in the internal subset of the document (at the
beginning). Example:</p>
<pre>1 &lt;?xml version="1.0"?&gt;
2 &lt;!DOCTYPE EXAMPLE SYSTEM "example.dtd" [
3 &lt;!ENTITY xml "Extensible Markup Language"&gt;
4 ]&gt;
5 &lt;EXAMPLE&gt;
6    &amp;xml;
7 &lt;/EXAMPLE&gt;</pre>

<p>Line 3 declares the xml entity. Line 6 uses the xml entity, by prefixing
its name with '&amp;' and following it by ';' without any spaces added.
There
are 5 predefined entities in libxml allowing you to escape characters with
predefined meaning in some parts of the XML document content:
<strong>&amp;lt;</strong> for the character '&lt;', <strong>&amp;gt;</strong>
for the character '&gt;', <strong>&amp;apos;</strong> for the character ''',
<strong>&amp;quot;</strong> for the character '"', and
<strong>&amp;amp;</strong> for the character '&amp;'.</p>

<p>One of the problems related to entities is that you may want the parser to
substitute an entity's content so that you can see the replacement text in
your application. Or you may prefer to keep entity references as such in the
content to be able to save the document back without losing this usually
precious information (if the user went through the pain of explicitly defining
entities, they may have a rather negative attitude if you blindly substitute
them at saving time). The <a
href="html/gnome-xml-parser.html#XMLSUBSTITUTEENTITIESDEFAULT">xmlSubstituteEntitiesDefault()</a>
function allows you to check and change the behaviour, which is to not
substitute entities by default.</p>

<p>Here is the DOM tree built by libxml for the previous document in the
default case:</p>
<pre>/gnome/src/gnome-xml -&gt; /xmllint --debug test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=
     ENTITY_REF
       INTERNAL_GENERAL_ENTITY xml
       content=Extensible Markup Language
     TEXT
     content=</pre>

<p>And here is the result when substituting entities:</p>
<pre>/gnome/src/gnome-xml -&gt; /tester --debug --noent test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content= Extensible Markup Language</pre>

<p>So, entities or no entities? Basically, it depends on your use case.
I
suggest that you keep the non-substituting default behaviour and avoid using
entities in your XML document or data if you are not willing to handle the
entity reference nodes in the DOM tree.</p>

<p>Note that at save time libxml enforces the conversion of the predefined
entities where necessary to prevent well-formedness problems, and at parse
time it transparently replaces them with the corresponding characters (i.e.,
it will not generate entity reference nodes in the DOM tree or call the
reference() SAX callback when finding them in the input).</p>

<h2><a name="Namespaces">Namespaces</a></h2>

<p>The libxml library implements <a
href="http://www.w3.org/TR/REC-xml-names/">XML namespaces</a> support by
recognizing namespace constructs in the input, and does namespace lookup
automatically when building the DOM tree. A namespace declaration is
associated with an in-memory structure and all elements or attributes within
that namespace point to it. Hence testing the namespace is a simple and fast
equality operation at the user level.</p>

<p>I suggest that people using libxml use a namespace, and declare it in the
root element of their document as the default namespace. Then they don't need
to use the prefix in the content but we will have a basis for future semantic
refinement and merging of data from different sources. This doesn't
significantly increase the size of the XML output, but it significantly
increases its value in the long term. Example:</p>
<pre>&lt;mydoc xmlns="http://mydoc.example.org/schemas/"&gt;
   &lt;elem1&gt;...&lt;/elem1&gt;
   &lt;elem2&gt;...&lt;/elem2&gt;
&lt;/mydoc&gt;</pre>

<p>Concerning the namespace value, this has to be a URL, but the URL doesn't
have to point to any existing resource on the Web. It will bind all the
elements and attributes with that URL. I suggest using a URL within a domain
you control, and that the URL should contain some kind of version information
if possible.
For example, <code>"http://www.gnome.org/gnumeric/1.0/"</code> is
a good namespace scheme.</p>

<p>Then when you load a file, make sure that a namespace whose URL starts with
the version-independent prefix is installed on the root element of your
document, and if the version information doesn't match something you know,
warn the user and be liberal in what you accept as the input. Also do *not*
try to base namespace checking on the prefix value. &lt;foo:text&gt; may be
exactly the same as &lt;bar:text&gt; in another document. What really matters
is the URI associated with the element or the attribute, not the prefix string
(which is just a shortcut for the full URI). In libxml, elements and attributes
have a <code>ns</code> field pointing to an xmlNs structure detailing the
namespace prefix and its URI.</p>

<p>@@Interfaces@@</p>

<p>@@Examples@@</p>

<p>People usually object to using namespaces when validation is involved. I
disagree, and will make sure that using namespaces won't break validity
checking, so even if you plan to use or currently are using validation I
strongly suggest adding namespaces to your document. A default namespace scheme
<code>xmlns="http://...."</code> should not break validity even on less
flexible parsers. However, using namespaces to mix and differentiate content
coming from multiple DTDs will certainly break current validation schemes. I
will try to provide ways to do this, but this may not be portable or
standardized.</p>

<h2><a name="Validation">Validation, or are you afraid of DTDs?</a></h2>

<p>Well, what is validation and what is a DTD?</p>

<p>Validation is the process of checking a document against a set of
construction rules; a <strong>DTD</strong> (Document Type Definition) is such
a set of rules.</p>

<p>The validation process and building DTDs are the two most difficult parts
of the XML life cycle.
Briefly, a DTD defines all the possible elements to be
found within your document and the formal shape of your document tree (by
defining the allowed content of an element: either text, a regular expression
for the allowed list of children, or mixed content, i.e. both text and
children). The DTD also defines the allowed attributes for all elements and
the types of the attributes. For more detailed information, I suggest reading
the related parts of the XML specification, the examples found under
gnome-xml/test/valid/dtd and the large number of books available on XML. The
dia example in gnome-xml/test/valid should be both simple and complete enough
to allow you to build your own.</p>

<p>A word of warning: building a good DTD which will fit the needs of your
application in the long term is far from trivial. However, the extra level of
quality it can ensure is well worth the price for some sets of applications, or
if you already have a DTD defined for your application field.</p>

<p>Validation is not completely finished, but it is in a (very IMHO) usable
state. Until a real validation interface is defined, the way to enable it is to
declare the <strong>xmlDoValidityCheckingDefaultValue</strong> external
variable and set it to 1; this will of course be changed at some point:</p>

<p><code>extern int xmlDoValidityCheckingDefaultValue;</code></p>

<p>...</p>

<p><code>xmlDoValidityCheckingDefaultValue = 1;</code></p>

<p></p>

<p>To handle external entities, use the function
<strong>xmlSetExternalEntityLoader</strong>(xmlExternalEntityLoader f); to
link your HTTP/FTP/Entities database library to the standard libxml
core.</p>

<p>@@interfaces@@</p>

<h2><a name="DOM"></a><a name="Principles">DOM Principles</a></h2>

<p><a href="http://www.w3.org/DOM/">DOM</a> stands for the <em>Document Object
Model</em>; this is an API for accessing XML or HTML structured documents.
Native support for DOM in Gnome is on the way (module gnome-dom), and it will
be based on gnome-xml. This will be a far cleaner interface to manipulate XML
files within Gnome since it won't expose the internal structure. DOM defines a
set of IDL (or Java) interfaces for traversing and manipulating a
document. The DOM library will allow accessing and modifying "live" documents
present in other programs like this:</p>

<p><img src="DOM.gif" alt=" DOM.gif "></p>

<p>This should greatly help with things like modifying a gnumeric spreadsheet
embedded in a GWP document, for example.</p>

<p>The current DOM implementation on top of libxml is the <a
href="http://cvs.gnome.org/lxr/source/gdome/">gdome Gnome module</a>; this is
a full DOM interface, thanks to <a href="mailto:raph@levien.com">Raph
Levien</a>.</p>

<p>The gnome-dom module in the Gnome CVS base is obsolete.</p>

<h2><a name="Example"></a><a name="real">A real example</a></h2>

<p>Here is a real-size example, where the actual content of the application
data is not kept in the DOM tree but in internal structures. It is based on
a proposal to keep a database of jobs related to Gnome, with an XML-based
storage structure.
Here is an <a href="gjobs.xml">XML encoded jobs
base</a>:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;gjob:Helping xmlns:gjob="http://www.gnome.org/some-location"&gt;
  &lt;gjob:Jobs&gt;

    &lt;gjob:Job&gt;
      &lt;gjob:Project ID="3"/&gt;
      &lt;gjob:Application&gt;GBackup&lt;/gjob:Application&gt;
      &lt;gjob:Category&gt;Development&lt;/gjob:Category&gt;

      &lt;gjob:Update&gt;
        &lt;gjob:Status&gt;Open&lt;/gjob:Status&gt;
        &lt;gjob:Modified&gt;Mon, 07 Jun 1999 20:27:45 -0400 MET DST&lt;/gjob:Modified&gt;
        &lt;gjob:Salary&gt;USD 0.00&lt;/gjob:Salary&gt;
      &lt;/gjob:Update&gt;

      &lt;gjob:Developers&gt;
        &lt;gjob:Developer&gt;
        &lt;/gjob:Developer&gt;
      &lt;/gjob:Developers&gt;

      &lt;gjob:Contact&gt;
        &lt;gjob:Person&gt;Nathan Clemons&lt;/gjob:Person&gt;
        &lt;gjob:Email&gt;nathan@windsofstorm.net&lt;/gjob:Email&gt;
        &lt;gjob:Company&gt;
        &lt;/gjob:Company&gt;
        &lt;gjob:Organisation&gt;
        &lt;/gjob:Organisation&gt;
        &lt;gjob:Webpage&gt;
        &lt;/gjob:Webpage&gt;
        &lt;gjob:Snailmail&gt;
        &lt;/gjob:Snailmail&gt;
        &lt;gjob:Phone&gt;
        &lt;/gjob:Phone&gt;
      &lt;/gjob:Contact&gt;

      &lt;gjob:Requirements&gt;
      The program should be released as free software, under the GPL.
      &lt;/gjob:Requirements&gt;

      &lt;gjob:Skills&gt;
      &lt;/gjob:Skills&gt;

      &lt;gjob:Details&gt;
      A GNOME based system that will allow a superuser to configure
      compressed and uncompressed files and/or file systems to be backed
      up with a supported media in the system.  This should be able to
      perform via find commands generating a list of files that are passed
      to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine
      or via operations performed on the filesystem itself. Email
      notification and GUI status display very important.
      &lt;/gjob:Details&gt;

    &lt;/gjob:Job&gt;

  &lt;/gjob:Jobs&gt;
&lt;/gjob:Helping&gt;</pre>

<p>While loading the XML file into an internal DOM tree is a matter of calling
only a couple of functions, browsing the tree to gather the information and
generate the internal structures is harder, and more error prone.</p>

<p>The suggested principle is to be tolerant with respect to the input
structure. For example, the ordering of the attributes is not significant;
the XML specification is clear about it. It's also usually a good idea not to
depend on the order of the children of a given node, unless it really
makes things harder. Here is some code to parse the information for a
person:</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/tree.h&gt;

/* simple trace macro used by the parsing functions */
#define DEBUG(x) printf(x)

/*
 * A person record
 */
typedef struct person {
    char *name;
    char *email;
    char *company;
    char *organisation;
    char *smail;
    char *webPage;
    char *phone;
} person, *personPtr;

/*
 * And the code needed to parse it
 */
personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    personPtr ret = NULL;

    DEBUG("parsePerson\n");
    /*
     * allocate the struct
     */
    ret = (personPtr) malloc(sizeof(person));
    if (ret == NULL) {
        fprintf(stderr,"out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(person));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {
        /* match on both the element name and the application namespace */
        if ((!strcmp((const char *) cur-&gt;name, "Person")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;name = (char *) xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!strcmp((const char *) cur-&gt;name, "Email")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;email = (char *) xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre>

<p>Here are a couple of things to notice:</p>
<ul>
  <li>Usually a recursive parsing style is the most convenient one, XML data
    being by
nature subject to repetitive constructs and usually exhibiting highly
    structured patterns.</li>
  <li>The two arguments of type <em>xmlDocPtr</em> and <em>xmlNsPtr</em>, i.e.
    the pointer to the global XML document and the namespace reserved to the
    application. Document-wide information is needed, for example, to decode
    entities, and it's good coding practice to define a namespace for your
    application's set of data and test that the elements and attributes you're
    analyzing actually pertain to your application space. This is done by a
    simple equality test (cur-&gt;ns == ns).</li>
  <li>To retrieve text and attribute values, it is suggested to use the
    function <em>xmlNodeListGetString</em> to gather all the text and entity
    reference nodes generated by the DOM output and produce a single text
    string.</li>
</ul>

<p>Here is another piece of code used to parse another level of the
structure:</p>
<pre>#include &lt;libxml/tree.h&gt;
/*
 * a Description for a Job
 */
typedef struct job {
    char *projectID;
    char *application;
    char *category;
    personPtr contact;
    int nbDevelopers;
    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
} job, *jobPtr;

/*
 * And the code needed to parse it
 */
jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    jobPtr ret = NULL;

    DEBUG("parseJob\n");
    /*
     * allocate the struct
     */
    ret = (jobPtr) malloc(sizeof(job));
    if (ret == NULL) {
        fprintf(stderr,"out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(job));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {

        if ((!strcmp((const char *) cur-&gt;name, "Project")) &amp;&amp; (cur-&gt;ns == ns)) {
            /* the project identifier is stored in the ID attribute */
            ret-&gt;projectID = (char *) xmlGetProp(cur, (const xmlChar *) "ID");
            if (ret-&gt;projectID == NULL) {
                fprintf(stderr, "Project has no ID\n");
            }
        }
        if ((!strcmp((const char *) cur-&gt;name, "Application")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;application = (char *) xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!strcmp((const char *) cur-&gt;name, "Category")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;category = (char *) xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        /* the contact is a nested structure, parsed recursively */
        if ((!strcmp((const char *) cur-&gt;name, "Contact")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;contact = parsePerson(doc, ns, cur);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre>

<p>One can notice that, once you are used to it, writing this kind of code is
quite simple, but boring. Ultimately, it could be possible to write stub
generators taking either C data structure definitions, a set of XML examples
or an XML DTD and producing the code needed to import and export the content
between C data and XML storage. This is left as an exercise to the reader
:-)</p>

<p>Feel free to use <a href="example/gjobread.c">the code for the full C
parsing example</a> as a template; it is also available with a Makefile in the
Gnome CVS base under gnome-xml/example.</p>

<h2><a name="Contributi">Contributions</a></h2>
<ul>
  <li><a href="mailto:ari@btigate.com">Ari Johnson</a> provides a C++ wrapper
    for libxml:
    <p>Website: <a
    href="http://lusis.org/~ari/xml++/">http://lusis.org/~ari/xml++/</a></p>
    <p>Download: <a
    href="http://lusis.org/~ari/xml++/libxml++.tar.gz">http://lusis.org/~ari/xml++/libxml++.tar.gz</a></p>
  </li>
  <li><a href="mailto:doolin@cs.utk.edu">David Doolin</a> provides a
    precompiled Windows version:
    <p><a
    href="http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/">http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/</a></p>
  </li>
  <li><a href="mailto:fnatter@gmx.net">Felix Natter</a> provided <a
    href="libxml-doc.el">an emacs module</a> to look up libxml function
    documentation</li>
  <li><a href="mailto:sherwin@nlm.nih.gov">Ziying Sherwin</a> provided <a
    href="http://xmlsoft.org/messages/0488.html">man pages</a> (not yet
    integrated in the distribution)</li>
</ul>

<p></p>

<p><a href="mailto:Daniel.Veillard@w3.org">Daniel Veillard</a></p>

<p>$Id: xml.html,v 1.43 2000/07/17 14:38:19 veillard Exp $</p>
</body>
</html>