xml.html revision 2adbb514c38dfb4a5649a928aa50900482edddb1
1<html> 2<head> 3 <title>The XML C library for Gnome</title> 4 <meta name="GENERATOR" content="amaya V4.1"> 5 <meta http-equiv="Content-Type" content="text/html"> 6</head> 7 8<body bgcolor="#ffffff"> 9<p><a href="http://www.gnome.org/"><img src="smallfootonly.gif" 10alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" 11alt="W3C Logo"></a><a href="http://www.redhat.com"><img src="redhat.gif" 12alt="Red Hat Logo"></a></p> 13 14<h1 align="center">The XML C library for Gnome</h1> 15 16<h2 style="text-align: center">libxml, a.k.a. gnome-xml</h2> 17 18<p></p> 19<ul> 20 <li><a href="#Introducti">Introduction</a></li> 21 <li><a href="#Documentat">Documentation</a></li> 22 <li><a href="#Reporting">Reporting bugs and getting help</a></li> 23 <li><a href="#help">how to help</a></li> 24 <li><a href="#Downloads">Downloads</a></li> 25 <li><a href="#News">News</a></li> 26 <li><a href="#XML">XML</a></li> 27 <li><a href="#XSLT">XSLT</a></li> 28 <li><a href="#tree">The tree output</a></li> 29 <li><a href="#interface">The SAX interface</a></li> 30 <li><a href="#library">The XML library interfaces</a> 31 <ul> 32 <li><a href="#Invoking">Invoking the parser: the pull way</a></li> 33 <li><a href="#Invoking">Invoking the parser: the push way</a></li> 34 <li><a href="#Invoking2">Invoking the parser: the SAX interface</a></li> 35 <li><a href="#Building">Building a tree from scratch</a></li> 36 <li><a href="#Traversing">Traversing the tree</a></li> 37 <li><a href="#Modifying">Modifying the tree</a></li> 38 <li><a href="#Saving">Saving the tree</a></li> 39 <li><a href="#Compressio">Compression</a></li> 40 </ul> 41 </li> 42 <li><a href="#Entities">Entities or no entities</a></li> 43 <li><a href="#Namespaces">Namespaces</a></li> 44 <li><a href="#Validation">Validation</a></li> 45 <li><a href="#Principles">DOM principles</a></li> 46 <li><a href="#real">A real example</a></li> 47 <li><a href="#Contributi">Contributions</a></li> 48</ul> 49 50<p>Separate documents:</p> 51<ul> 
52 <li><a href="upgrade.html">upgrade instructions for migrating to 53 libxml2</a></li> 54 <li><a href="encoding.html">libxml Internationalization support</a></li> 55 <li><a href="xmlio.html">libxml Input/Output interfaces</a></li> 56 <li><a href="xmlmem.html">libxml Memory interfaces</a></li> 57 <li><a href="xmldtd.html">a short introduction about DTDs and 58 libxml</a></li> 59 <li><a href="http://xmlsoft.org/XSLT/">the libxslt page</a></li> 60 <li><a href="http://www.cs.unibo.it/~casarini/gdome2/">the gdome2 page: a 61 standard DOM interface for libxml2</a></li> 62</ul> 63 64<h2><a name="Introducti">Introduction</a></h2> 65 66<p>This document describes libxml, the <a 67href="http://www.w3.org/XML/">XML</a> C library developed for the <a 68href="http://www.gnome.org/">Gnome</a> project. <a 69href="http://www.w3.org/XML/">XML is a standard</a> for building tag-based 70structured documents/data.</p> 71 72<p>Here are some key points about libxml:</p> 73<ul> 74 <li>Libxml exports Push and Pull type parser interfaces for both XML and 75 HTML.</li> 76 <li>Libxml can do DTD validation at parse time, using a parsed document 77 instance, or with an arbitrary DTD.</li> 78 <li>Libxml now includes nearly complete <a 79 href="http://www.w3.org/TR/xpath">XPath</a> and <a 80 href="http://www.w3.org/TR/xptr">XPointer</a> implementations.</li> 81 <li>It is written in plain C, making as few assumptions as possible, and 82 sticking closely to ANSI C/POSIX for easy embedding. 
It works on 83 Linux/Unix/Windows and has been ported to a number of other platforms.</li> 84 <li>Basic HTTP and FTP client support, allowing applications to fetch 85 remote resources.</li> 86 <li>The design is modular; most of the extensions can be compiled out.</li> 87 <li>The internal document representation is as close as possible to the <a 88 href="http://www.w3.org/DOM/">DOM</a> interfaces.</li> 89 <li>Libxml also has a <a href="http://www.megginson.com/SAX/index.html">SAX 90 like interface</a>; the interface is designed to be compatible with <a 91 href="http://www.jclark.com/xml/expat.html">Expat</a>.</li> 92 <li>This library is released both under the <a 93 href="http://www.w3.org/Consortium/Legal/copyright-software-19980720.html">W3C 94 IPR</a> and the <a href="http://www.gnu.org/copyleft/lesser.html">GNU 95 LGPL</a>. Use either at your convenience; this should make 96 everybody happy. If not, drop me a mail.</li> 97</ul> 98 99<p>Warning: unless you are forced to because your application links with a 100Gnome library requiring it, <strong><span 101style="background-color: #FF0000">Do Not Use libxml1</span></strong>; use 102libxml2 instead.</p> 103 104<h2><a name="Documentat">Documentation</a></h2> 105 106<p>There are some on-line resources about using libxml:</p> 107<ol> 108 <li>Check the <a href="FAQ.html">FAQ</a></li> 109 <li>Check the <a href="http://xmlsoft.org/html/libxml-lib.html">extensive 110 documentation</a> automatically extracted from code comments (using <a 111 href="http://cvs.gnome.org/bonsai/rview.cgi?cvsroot=/cvs/gnome&dir=gtk-doc">gtk 112 doc</a>).</li> 113 <li>Look at the documentation about <a href="encoding.html">libxml 114 internationalization support</a></li> 115 <li>This page provides a global overview and <a href="#real">some 116 examples</a> on how to use libxml.</li> 117 <li><a href="mailto:james@daa.com.au">James Henstridge</a> wrote <a 118 href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">some nice 119 documentation</a> 
explaining how to use the libxml SAX interface.</li> 120 <li>George Lebl wrote <a 121 href="http://www-4.ibm.com/software/developer/library/gnome3/">an article 122 for IBM developerWorks</a> about using libxml.</li> 123 <li>Check <a href="http://cvs.gnome.org/lxr/source/gnome-xml/TODO">the TODO 124 file</a></li> 125 <li>Read the <a href="upgrade.html">1.x to 2.x upgrade path</a>. If you are 126 starting a new project using libxml you should really use the 2.x 127 version.</li> 128 <li>And don't forget to look at the <a href="/messages/">mailing-list 129 archive</a>.</li> 130</ol> 131 132<h2><a name="Reporting">Reporting bugs and getting help</a></h2> 133 134<p>Well, bugs or missing features are always possible, and I will make a point 135of fixing them in a timely fashion. The best way to report a bug is to use the 136<a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">Gnome bug 137tracking database</a> (make sure to use the "libxml" module name). I look at 138reports there regularly and it's good to have a reminder when a bug is still 139open. Check the <a 140href="http://bugzilla.gnome.org/bugwritinghelp.html">instructions on reporting 141bugs</a> and be sure to specify that the bug is for the package libxml.</p> 142 143<p>There is also a mailing-list <a 144href="mailto:xml@gnome.org">xml@gnome.org</a> for libxml, with an <a 145href="http://mail.gnome.org/archives/xml/">on-line archive</a> (<a 146href="http://xmlsoft.org/messages">old</a>). To subscribe to this list, please 147visit the <a href="http://mail.gnome.org/mailman/listinfo/xml">associated 148Web</a> page and follow the instructions. 
<strong>Do not send code, I won't 149debug it</strong> (but patches are really appreciated!). Make sure you can 150reproduce the bug with xmllint or one of the test programs found in the source 151distribution, and send the command showing the error as well as the input 152(as an attachment). Thanks.</p> 153 154<p>Alternatively, you can just send the bug to the <a 155href="mailto:xml@gnome.org">xml@gnome.org</a> list; if it's really libxml 156related I will approve it. Please do not send me mail directly, especially for 157portability problems; it makes things much harder to track, and in some cases 158I'm not the best person to answer a given question. Ask the list instead.</p> 159 160<p>Of course, bugs reported with a suggested patch for fixing them will 161probably be processed faster.</p> 162 163<p>If you're looking for help, a quick look at <a 164href="http://xmlsoft.org/messages/#407">the list archive</a> may actually 165provide the answer; I usually send source samples when answering libxml usage 166questions. The <a href="http://xmlsoft.org/html/book1.html">auto-generated 167documentation</a> is not as polished as I would like (I need to learn more 168about DocBook), but it's a good starting point.</p> 169 170<h2><a name="help">How to help</a></h2> 171 172<p>You can help the project in various ways. The best thing to do first is to 173subscribe to the mailing-list as explained before, and check the <a 174href="http://xmlsoft.org/messages/">archives</a> and the <a 175href="http://bugs.gnome.org/db/pa/lgnome-xml.html">Gnome bug 176database</a>:</p> 177<ol> 178 <li>provide patches when you find problems</li> 179 <li>provide the diffs when you port libxml to a new platform. 
They may not 180 be integrated in all cases but help pinpointing portability problems 181 and</li> 182 <li>provide documentation fixes (either as patches to the code comments or 183 as HTML diffs).</li> 184 <li>provide new documentations pieces (translations, examples, etc ...)</li> 185 <li>Check the TODO file and try to close one of the items</li> 186 <li>take one of the points raised in the archive or the bug database and 187 provide a fix. <a href="mailto:Daniel.Veillard@imag.fr">Get in touch with 188 me </a>before to avoid synchronization problems and check that the 189 suggested fix will fit in nicely :-)</li> 190</ol> 191 192<h2><a name="Downloads">Downloads</a></h2> 193 194<p>The latest versions of libxml can be found on <a 195href="ftp://xmlsoft.org/">xmlsoft.org</a> (<a 196href="ftp://speakeasy.rpmfind.net/pub/libxml/">Seattle</a>, <a 197href="ftp://fr.rpmfind.net/pub/libxml/">France</a>) or on the <a 198href="ftp://ftp.gnome.org/pub/GNOME/MIRRORS.html">Gnome FTP server</a> either 199as a <a href="ftp://ftp.gnome.org/pub/GNOME/stable/sources/libxml/">source 200archive</a> or <a 201href="ftp://ftp.gnome.org/pub/GNOME/stable/redhat/i386/libxml/">RPM 202packages</a>. (NOTE that you need both the <a 203href="http://rpmfind.net/linux/RPM/libxml2.html">libxml(2)</a> and <a 204href="http://rpmfind.net/linux/RPM/libxml2-devel.html">libxml(2)-devel</a> 205packages installed to compile applications using libxml.)</p> 206 207<p><a name="Snapshot">Snapshot:</a></p> 208<ul> 209 <li>Code from the W3C cvs base libxml <a 210 href="ftp://xmlsoft.org/cvs-snapshot.tar.gz">cvs-snapshot.tar.gz</a></li> 211 <li>Docs, content of the web site, the list archive included <a 212 href="ftp://xmlsoft.org/libxml-docs.tar.gz">libxml-docs.tar.gz</a></li> 213</ul> 214 215<p><a name="Contribs">Contribs:</a></p> 216 217<p>I do accept external contributions, especially if compiling on another 218platform, get in touch with me to upload the package. 
I will keep them in the 219<a href="ftp://xmlsoft.org/contribs/">contrib directory</a></p> 220 221<p>Libxml is also available from CVS:</p> 222<ul> 223 <li><p>The <a 224 href="http://cvs.gnome.org/bonsai/rview.cgi?cvsroot=/cvs/gnome&dir=gnome-xml">Gnome 225 CVS base</a>. Check the <a 226 href="http://developer.gnome.org/tools/cvs.html">Gnome CVS Tools</a> page; 227 the CVS module is <b>gnome-xml</b>.</p> 228 </li> 229 <li>The <strong>libxslt</strong> module is also present there</li> 230</ul> 231 232<h2><a name="News">News</a></h2> 233 234<h3>CVS only : check the <a 235href="http://cvs.gnome.org/lxr/source/gnome-xml/ChangeLog">Changelog</a> file 236for a really accurate description</h3> 237 238<p>Items floating around but not actively worked on, get in touch with me if 239you want to test those</p> 240<ul> 241 <li>Implementing <a href="http://xmlsoft.org/XSLT">XSLT</a>, this is done as 242 a separate C library on top of libxml called libxslt</li> 243 <li>Finishing up <a href="http://www.w3.org/TR/xptr">XPointer</a> and <a 244 href="http://www.w3.org/TR/xinclude">XInclude</a></li> 245 <li>(seeems working but delayed from release) parsing/import of Docbook SGML 246 docs</li> 247</ul> 248 249<h3>2.3.13: June 28 2001</h3> 250<ul> 251 <li>2.3.12 configure.in was broken as well as the push mode XML parser</li> 252 <li>a few more fixes for compilation on Windows MSC by Yon Derek</li> 253</ul> 254 255<h3>1.8.14: June 28 2001</h3> 256<ul> 257 <li>Zbigniew Chyla gave a patch to use the old XML parser in push mode</li> 258 <li>Small Makefile fix</li> 259</ul> 260 261<h3>2.3.12: June 26 2001</h3> 262<ul> 263 <li>lots of cleanup</li> 264 <li>a couple of validation fix</li> 265 <li>fixed line number counting</li> 266 <li>fixed serious problems in the XInclude processing</li> 267 <li>added support for UTF8 BOM at beginning of entities</li> 268 <li>fixed a strange gcc optimizer bugs in xpath handling of float, gcc-3.0 269 miscompile uri.c (William), Thomas Leitner provided a fix 
for the 270 optimizer on Tru64</li> 271 <li>incorporated Yon Derek and Igor Zlatkovic fixes and improvements for 272 compilation on Windows MSC</li> 273 <li>update of libxml-doc.el (Felix Natter)</li> 274 <li>fixed 2 bugs in URI normalization code</li> 275</ul> 276 277<h3>2.3.11: June 17 2001</h3> 278<ul> 279 <li>updates to trio, Makefiles and configure should fix some portability 280 problems (alpha)</li> 281 <li>fixed some HTML serialization problems (pre, script, and block/inline 282 handling), added encoding aware APIs, cleanup of this code</li> 283 <li>added xmlHasNsProp()</li> 284 <li>implemented a specific PI for encoding support in the DocBook SGML 285 parser</li> 286 <li>some XPath fixes (-Infinity, / as a function parameter and namespaces 287 node selection)</li> 288 <li>fixed a performance problem and an error in the validation code</li> 289 <li>fixed XInclude routine to implement the recursive behaviour</li> 290 <li>fixed xmlFreeNode problem when libxml is included statically twice</li> 291 <li>added --version to xmllint for bug reports</li> 292</ul> 293 294<h3>2.3.10: June 1 2001</h3> 295<ul> 296 <li>fixed the SGML catalog support</li> 297 <li>a number of reported bugs got fixed, in XPath, iconv detection, XInclude 298 processing</li> 299 <li>XPath string functions should now handle Unicode correctly</li> 300</ul> 301 302<h3>2.3.9: May 19 2001</h3> 303 304<p>Lots of bugfixes, and added a basic SGML catalog support:</p> 305<ul> 306 <li>HTML push bugfix #54891 and another patch from Jonas Borgstr&ouml;m</li> 307 <li>some serious speed optimisation again</li> 308 <li>some documentation cleanups</li> 309 <li>trying to get better linking on Solaris (-R)</li> 310 <li>XPath API cleanup from Thomas Broyer</li> 311 <li>Validation bug fixed #54631, added a patch from Gary Pennington, fixed 312 xmlValidGetValidElements()</li> 313 <li>Added an INSTALL file</li> 314 <li>Attribute removal added to API: #54433</li> 315 <li>added a basic support for SGML catalogs</li> 316 
<li>fixed xmlKeepBlanksDefault(0) API</li> 317 <li>bugfix in xmlNodeGetLang()</li> 318 <li>fixed a small configure portability problem</li> 319 <li>fixed an inversion of SYSTEM and PUBLIC identifiers in HTML documents</li> 320</ul> 321 322<h3>1.8.13: May 14 2001</h3> 323<ul> 324 <li>bug-fix release of the old libxml1 branch used by Gnome</li> 325</ul> 326 327<h3>2.3.8: May 3 2001</h3> 328<ul> 329 <li>Integrated an SGML DocBook parser for the Gnome project</li> 330 <li>Fixed a few things in the HTML parser</li> 331 <li>Fixed some XPath bugs raised by XSLT use, tried to fix the floating 332 point portability issue</li> 333 <li>Speed improvement (8M/s for SAX, 3M/s for DOM, 1.5M/s for DOM+validation 334 using the XML REC as input and a 700MHz celeron).</li> 335 <li>incorporated more Windows cleanup</li> 336 <li>added xmlSaveFormatFile()</li> 337 <li>fixed problems in copying nodes with entity references (gdome)</li> 338 <li>removed some troubles surrounding the new validation module</li> 339</ul> 340 341<h3>2.3.7: April 22 2001</h3> 342<ul> 343 <li>lots of small bug fixes, corrected XPointer</li> 344 <li>Non-deterministic content model validation support</li> 345 <li>added xmlDocCopyNode for gdome2</li> 346 <li>revamped the way the HTML parser handles end of tags</li> 347 <li>XPath: corrections of namespaces support and number formatting</li> 348 <li>Windows: Igor Zlatkovic patches for MSC compilation</li> 349 <li>HTML output fixes from P C Chow and William M. Brack</li> 350 <li>Improved validation speed, noticeable for DocBook</li> 351 <li>fixed a big bug with ID declared in external parsed entities</li> 352 <li>portability fixes, update of Trio from Bjorn Reese</li> 353</ul> 354 355<h3>2.3.6: April 8 2001</h3> 356<ul> 357 <li>Code cleanup using extreme gcc compiler warning options, found and 358 cleared half a dozen potential problems</li> 359 <li>the Eazel team found an XML parser bug</li> 360 <li>cleaned up the use of some of the string formatting functions. 
used the 361 trio library code to provide the one needed when the platform is missing 362 them</li> 363 <li>xpath: removed a memory leak and fixed the predicate evaluation problem, 364 extended the testsuite and cleaned up the result. XPointer seems broken 365 ...</li> 366</ul> 367 368<h3>2.3.5: Mar 23 2001</h3> 369<ul> 370 <li>Biggest change is separate parsing and evaluation of XPath expressions, 371 there is some new APIs for this too</li> 372 <li>included a number of bug fixes(XML push parser, 51876, notations, 373 52299)</li> 374 <li>Fixed some portability issues</li> 375</ul> 376 377<h3>2.3.4: Mar 10 2001</h3> 378<ul> 379 <li>Fixed bugs #51860 and #51861</li> 380 <li>Added a global variable xmlDefaultBufferSize to allow default buffer 381 size to be application tunable.</li> 382 <li>Some cleanup in the validation code, still a bug left and this part 383 should probably be rewritten to support ambiguous content model :-\</li> 384 <li>Fix a couple of serious bugs introduced or raised by changes in 2.3.3 385 parser</li> 386 <li>Fixed another bug in xmlNodeGetContent()</li> 387 <li>Bjorn fixed XPath node collection and Number formatting</li> 388 <li>Fixed a loop reported in the HTML parsing</li> 389 <li>blank space are reported even if the Dtd content model proves that they 390 are formatting spaces, this is for XmL conformance</li> 391</ul> 392 393<h3>2.3.3: Mar 1 2001</h3> 394<ul> 395 <li>small change in XPath for XSLT</li> 396 <li>documentation cleanups</li> 397 <li>fix in validation by Gary Pennington</li> 398 <li>serious parsing performances improvements</li> 399</ul> 400 401<h3>2.3.2: Feb 24 2001</h3> 402<ul> 403 <li>chasing XPath bugs, found a bunch, completed some TODO</li> 404 <li>fixed a Dtd parsing bug</li> 405 <li>fixed a bug in xmlNodeGetContent</li> 406 <li>ID/IDREF support partly rewritten by Gary Pennington</li> 407</ul> 408 409<h3>2.3.1: Feb 15 2001</h3> 410<ul> 411 <li>some XPath and HTML bug fixes for XSLT</li> 412 <li>small extension of the 
hash table interfaces for DOM gdome2 413 implementation</li> 414 <li>A few bug fixes</li> 415</ul> 416 417<h3>2.3.0: Feb 8 2001 (2.2.12 was on 25 Jan but I didn't kept track)</h3> 418<ul> 419 <li>Lots of XPath bug fixes</li> 420 <li>Add a mode with Dtd lookup but without validation error reporting for 421 XSLT</li> 422 <li>Add support for text node without escaping (XSLT)</li> 423 <li>bug fixes for xmlCheckFilename</li> 424 <li>validation code bug fixes from Gary Pennington</li> 425 <li>Patch from Paul D. Smith correcting URI path normalization</li> 426 <li>Patch to allow simultaneous install of libxml-devel and 427 libxml2-devel</li> 428 <li>the example Makefile is now fixed</li> 429 <li>added HTML to the RPM packages</li> 430 <li>tree copying bugfixes</li> 431 <li>updates to Windows makefiles</li> 432 <li>optimisation patch from Bjorn Reese</li> 433</ul> 434 435<h3>2.2.11: Jan 4 2001</h3> 436<ul> 437 <li>bunch of bug fixes (memory I/O, xpath, ftp/http, ...)</li> 438 <li>added htmlHandleOmittedElem()</li> 439 <li>Applied Bjorn Reese's IPV6 first patch</li> 440 <li>Applied Paul D. 
Smith patches for validation of XInclude results</li> 441 <li>added XPointer xmlns() new scheme support</li> 442</ul> 443 444<h3>2.2.10: Nov 25 2000</h3> 445<ul> 446 <li>Fix the Windows problems of 2.2.8</li> 447 <li>integrate OpenVMS patches</li> 448 <li>better handling of some nasty HTML input</li> 449 <li>Improved the XPointer implementation</li> 450 <li>integrate a number of provided patches</li> 451</ul> 452 453<h3>2.2.9: Nov 25 2000</h3> 454<ul> 455 <li>erroneous release :-(</li> 456</ul> 457 458<h3>2.2.8: Nov 13 2000</h3> 459<ul> 460 <li>First version of <a href="http://www.w3.org/TR/xinclude">XInclude</a> 461 support</li> 462 <li>Patch in conditional section handling</li> 463 <li>updated MS compiler project</li> 464 <li>fixed some XPath problems</li> 465 <li>added an URI escaping function</li> 466 <li>some other bug fixes</li> 467</ul> 468 469<h3>2.2.7: Oct 31 2000</h3> 470<ul> 471 <li>added message redirection</li> 472 <li>XPath improvements (thanks TOM !)</li> 473 <li>xmlIOParseDTD() added</li> 474 <li>various small fixes in the HTML, URI, HTTP and XPointer support</li> 475 <li>some cleanup of the Makefile, autoconf and the distribution content</li> 476</ul> 477 478<h3>2.2.6: Oct 25 2000:</h3> 479<ul> 480 <li>Added an hash table module, migrated a number of internal structure to 481 those</li> 482 <li>Fixed a posteriori validation problems</li> 483 <li>HTTP module cleanups</li> 484 <li>HTML parser improvements (tag errors, script/style handling, attribute 485 normalization)</li> 486 <li>coalescing of adjacent text nodes</li> 487 <li>couple of XPath bug fixes, exported the internal API</li> 488</ul> 489 490<h3>2.2.5: Oct 15 2000:</h3> 491<ul> 492 <li>XPointer implementation and testsuite</li> 493 <li>Lot of XPath fixes, added variable and functions registration, more 494 tests</li> 495 <li>Portability fixes, lots of enhancements toward an easy Windows build and 496 release</li> 497 <li>Late validation fixes</li> 498 <li>Integrated a lot of contributed 
patches</li> 499 <li>added memory management docs</li> 500 <li>a performance problem when using large buffer seems fixed</li> 501</ul> 502 503<h3>2.2.4: Oct 1 2000:</h3> 504<ul> 505 <li>main XPath problem fixed</li> 506 <li>Integrated portability patches for Windows</li> 507 <li>Serious bug fixes on the URI and HTML code</li> 508</ul> 509 510<h3>2.2.3: Sep 17 2000</h3> 511<ul> 512 <li>bug fixes</li> 513 <li>cleanup of entity handling code</li> 514 <li>overall review of all loops in the parsers, all sprintf usage has been 515 checked too</li> 516 <li>Far better handling of larges Dtd. Validating against Docbook XML Dtd 517 works smoothly now.</li> 518</ul> 519 520<h3>1.8.10: Sep 6 2000</h3> 521<ul> 522 <li>bug fix release for some Gnome projects</li> 523</ul> 524 525<h3>2.2.2: August 12 2000</h3> 526<ul> 527 <li>mostly bug fixes</li> 528 <li>started adding routines to access xml parser context options</li> 529</ul> 530 531<h3>2.2.1: July 21 2000</h3> 532<ul> 533 <li>a purely bug fixes release</li> 534 <li>fixed an encoding support problem when parsing from a memory block</li> 535 <li>fixed a DOCTYPE parsing problem</li> 536 <li>removed a bug in the function allowing to override the memory allocation 537 routines</li> 538</ul> 539 540<h3>2.2.0: July 14 2000</h3> 541<ul> 542 <li>applied a lot of portability fixes</li> 543 <li>better encoding support/cleanup and saving (content is now always 544 encoded in UTF-8)</li> 545 <li>the HTML parser now correctly handles encodings</li> 546 <li>added xmlHasProp()</li> 547 <li>fixed a serious problem with &#38;</li> 548 <li>propagated the fix to FTP client</li> 549 <li>cleanup, bugfixes, etc ...</li> 550 <li>Added a page about <a href="encoding.html">libxml Internationalization 551 support</a></li> 552</ul> 553 554<h3>1.8.9: July 9 2000</h3> 555<ul> 556 <li>fixed the spec the RPMs should be better</li> 557 <li>fixed a serious bug in the FTP implementation, released 1.8.9 to solve 558 rpmfind users problem</li> 559</ul> 560 
561<h3>2.1.1: July 1 2000</h3> 562<ul> 563 <li>fixes a couple of bugs in the 2.1.0 packaging</li> 564 <li>improvements on the HTML parser</li> 565</ul> 566 567<h3>2.1.0 and 1.8.8: June 29 2000</h3> 568<ul> 569 <li>1.8.8 is mostly a convenience package for upgrading to libxml2 according to 570 <a href="upgrade.html">new instructions</a>. It fixes a nasty problem 571 about &amp;#38; charref parsing</li> 572 <li>2.1.0 also eases the upgrade from libxml v1 to the recent version. It 573 also contains numerous fixes and enhancements: 574 <ul> 575 <li>added xmlStopParser() to stop parsing</li> 576 <li>greatly improved parsing speed when there are large CDATA blocks</li> 577 <li>includes XPath patches provided by Picdar Technology</li> 578 <li>tried to fix as much as possible DTD validation and namespace 579 related problems</li> 580 <li>output to a given encoding has been added/tested</li> 581 <li>lots of various fixes</li> 582 </ul> 583 </li> 584</ul> 585 586<h3>2.0.0: Apr 12 2000</h3> 587<ul> 588 <li>First public release of libxml2. If you are using libxml, it's a good 589 idea to check the 1.x to 2.x upgrade instructions. NOTE: while initially 590 scheduled for Apr 3 the release occurred only on Apr 12 due to massive 591 workload.</li> 592 <li>The include files are now located under $prefix/include/libxml (instead of 593 $prefix/include/gnome-xml); they are also referenced by 594 <pre>#include &lt;libxml/xxx.h&gt;</pre> 595 <p>instead of</p> 596 <pre>#include "xxx.h"</pre> 597 </li> 598 <li>a new URI module for parsing URIs, strictly following RFC 2396</li> 599 <li>the memory allocation routines used by libxml can now be overloaded 600 dynamically by using xmlMemSetup()</li> 601 <li>The previously CVS-only tool tester has been renamed 602 <strong>xmllint</strong> and is now installed as part of the libxml2 603 package</li> 604 <li>The I/O interface has been revamped. 
There are now ways to plug in 605 specific I/O modules, either at the URI scheme detection level using 606 xmlRegisterInputCallbacks() or by passing I/O functions when creating a 607 parser context using xmlCreateIOParserCtxt()</li> 608 <li>there is a C preprocessor macro LIBXML_VERSION providing the version 609 number of the libxml module in use</li> 610 <li>a number of optional features of libxml can now be excluded at configure 611 time (FTP/HTTP/HTML/XPath/Debug)</li> 612</ul> 613 614<h3>2.0.0beta: Mar 14 2000</h3> 615<ul> 616 <li>This is a first Beta release of libxml version 2</li> 617 <li>It's available only from <a href="ftp://xmlsoft.org/">xmlsoft.org 618 FTP</a>; it's packaged as libxml2-2.0.0beta and available as tar and 619 RPMs</li> 620 <li>This version is now the head in the Gnome CVS base, the old one is 621 available under the tag LIB_XML_1_X</li> 622 <li>This includes a very large set of changes. From a programmatic point of 623 view applications should not have to be modified too much; check the <a 624 href="upgrade.html">upgrade page</a></li> 625 <li>Some interfaces may change (especially a bit about encoding).</li> 626 <li>the updates include: 627 <ul> 628 <li>fix I18N support. ISO-Latin-x/UTF-8/UTF-16 (nearly) seems correctly 629 handled now</li> 630 <li>Better handling of entities, especially well-formedness checking and 631 proper PEref extensions in external subsets</li> 632 <li>DTD conditional sections</li> 633 <li>Validation now correctly handles entity content</li> 634 <li><a href="http://rpmfind.net/tools/gdome/messages/0039.html">change 635 structures to accommodate DOM</a></li> 636 </ul> 637 </li> 638 <li>Serious progress was made toward compliance; <a 639 href="conf/result.html">here are the results of the tests</a> against the 640 OASIS testsuite (except the Japanese tests since I don't support that 641 encoding yet). 
This URL is rebuilt every couple of hours using the CVS 642 head version.</li> 643</ul> 644 645<h3>1.8.7: Mar 6 2000</h3> 646<ul> 647 <li>This is a bug fix release:</li> 648 <li>It is possible to disable the ignorable blanks heuristic used by 649 libxml-1.x; a new function xmlKeepBlanksDefault(0) will allow this. Note 650 that for adherence to the XML spec, this behaviour will be disabled by default 651 in 2.x. The same function will allow keeping compatibility with old 652 code.</li> 653 <li>Blanks in &lt;a&gt; &lt;/a&gt; constructs are not ignored anymore; 654 avoiding heuristics is really the Right Way :-\</li> 655 <li>The unchecked use of snprintf which was breaking libxml-1.8.6 656 compilation on some platforms has been fixed</li> 657 <li>nanoftp.c nanohttp.c: Fixed '#' and '?' stripping when processing 658 URIs</li> 659</ul> 660 661<h3>1.8.6: Jan 31 2000</h3> 662<ul> 663 <li>added a nanoFTP transport module, debugged until the new version of <a 664 href="http://rpmfind.net/linux/rpm2html/rpmfind.html">rpmfind</a> can use 665 it without troubles</li> 666</ul> 667 668<h3>1.8.5: Jan 21 2000</h3> 669<ul> 670 <li>adding APIs to parse a well balanced chunk of XML (production <a 671 href="http://www.w3.org/TR/REC-xml#NT-content">[43] content</a> of the XML 672 spec)</li> 673 <li>fixed a hideous bug in xmlGetProp pointed out by Rune.Djurhuus@fast.no</li> 674 <li>Jody Goldberg &lt;jgoldberg@home.com&gt; provided another patch trying 675 to solve the zlib checks problems</li> 676 <li>The current state in gnome CVS base is expected to ship as 1.8.5 with 677 gnumeric soon</li> 678</ul> 679 680<h3>1.8.4: Jan 13 2000</h3> 681<ul> 682 <li>bug fixes, reintroduced xmlNewGlobalNs(), fixed xmlNewNs()</li> 683 <li>all exit() calls should have been removed from libxml</li> 684 <li>fixed a problem with INCLUDE_WINSOCK on WIN32 platform</li> 685 <li>added newDocFragment()</li> 686</ul> 687 688<h3>1.8.3: Jan 5 2000</h3> 689<ul> 690 <li>a Push interface for the XML and HTML parsers</li> 691 <li>a shell-like 
interface to the document tree (try tester --shell :-)</li> 692 <li>lots of bug fixes and improvement added over XMas hollidays</li> 693 <li>fixed the DTD parsing code to work with the xhtml DTD</li> 694 <li>added xmlRemoveProp(), xmlRemoveID() and xmlRemoveRef()</li> 695 <li>Fixed bugs in xmlNewNs()</li> 696 <li>External entity loading code has been revamped, now it uses 697 xmlLoadExternalEntity(), some fix on entities processing were added</li> 698 <li>cleaned up WIN32 includes of socket stuff</li> 699</ul> 700 701<h3>1.8.2: Dec 21 1999</h3> 702<ul> 703 <li>I got another problem with includes and C++, I hope this issue is fixed 704 for good this time</li> 705 <li>Added a few tree modification functions: xmlReplaceNode, 706 xmlAddPrevSibling, xmlAddNextSibling, xmlNodeSetName and 707 xmlDocSetRootElement</li> 708 <li>Tried to improve the HTML output with help from <a 709 href="mailto:clahey@umich.edu">Chris Lahey</a></li> 710</ul> 711 712<h3>1.8.1: Dec 18 1999</h3> 713<ul> 714 <li>various patches to avoid troubles when using libxml with C++ compilers 715 the "namespace" keyword and C escaping in include files</li> 716 <li>a problem in one of the core macros IS_CHAR was corrected</li> 717 <li>fixed a bug introduced in 1.8.0 breaking default namespace processing, 718 and more specifically the Dia application</li> 719 <li>fixed a posteriori validation (validation after parsing, or by using a 720 Dtd not specified in the original document)</li> 721 <li>fixed a bug in</li> 722</ul> 723 724<h3>1.8.0: Dec 12 1999</h3> 725<ul> 726 <li>cleanup, especially memory wise</li> 727 <li>the parser should be more reliable, especially the HTML one, it should 728 not crash, whatever the input !</li> 729 <li>Integrated various patches, especially a speedup improvement for large 730 dataset from <a href="mailto:cnygard@bellatlantic.net">Carl Nygard</a>, 731 configure with --with-buffers to enable them.</li> 732 <li>attribute normalization, oops should have been added long ago !</li> 
733 <li>attributes defaulted from Dtds should be available, xmlSetProp() now 734 does entity escaping by default.</li> 735</ul> 736 737<h3>1.7.4: Oct 25 1999</h3> 738<ul> 739 <li>Lots of HTML improvements</li> 740 <li>Fixed some errors when saving both XML and HTML</li> 741 <li>More examples, the regression tests should now look clean</li> 742 <li>Fixed a bug with contiguous charrefs</li> 743</ul> 744 745<h3>1.7.3: Sep 29 1999</h3> 746<ul> 747 <li>portability problems fixed</li> 748 <li>snprintf was used unconditionally, leading to link problems on systems 749 where it's not available, fixed</li> 750</ul> 751 752<h3>1.7.1: Sep 24 1999</h3> 753<ul> 754 <li>The basic type for strings manipulated by libxml has been renamed in 755 1.7.1 from <strong>CHAR</strong> to <strong>xmlChar</strong>. The reason 756 is that CHAR was conflicting with a predefined type on Windows. However, on 757 non-WIN32 environments, compatibility is provided by way of a 758 <strong>#define</strong>.</li> 759 <li>Fixed another error: the use of a structure field called errno, which 760 led to trouble on platforms where it's a macro</li> 761</ul> 762 763<h3>1.7.0: Sep 23 1999</h3> 764<ul> 765 <li>Added the ability to fetch remote DTD or parsed entities, see the <a 766 href="html/libxml-nanohttp.html">nanohttp</a> module.</li> 767 <li>Added an errno to report errors by means other than a simple printf 768 like callback</li> 769 <li>Finished ID/IDREF support and checking when validating</li> 770 <li>Serious memory leaks fixed (there is now a <a 771 href="html/libxml-xmlmemory.html">memory wrapper</a> module)</li> 772 <li>Improvement of <a href="http://www.w3.org/TR/xpath">XPath</a> 773 implementation</li> 774 <li>Added an HTML parser front-end</li> 775</ul> 776 777<h2><a name="XML">XML</a></h2> 778 779<p><a href="http://www.w3.org/TR/REC-xml">XML is a standard</a> for 780markup-based structured documents. 
Here is <a name="example">an example XML
document</a>:</p>
<pre><?xml version="1.0"?>
<EXAMPLE prop1="gnome is great" prop2="&amp; linux too">
  <head>
   <title>Welcome to Gnome</title>
  </head>
  <chapter>
   <title>The Linux adventure</title>
   <p>bla bla bla ...</p>
   <image href="linus.gif"/>
   <p>...</p>
  </chapter>
</EXAMPLE></pre>

<p>The first line specifies that it's an XML document and gives useful
information about its encoding. Then the document is a text format whose
structure is specified by tags between brackets. <strong>Each tag opened has
to be closed</strong>. XML is pedantic about this. However, if a tag is empty
(no content), a single tag can serve as both the opening and closing tag if it
ends with <code>/></code> rather than with <code>></code>. Note that,
for example, the image tag has no content (just an attribute) and is closed by
ending the tag with <code>/></code>.</p>

<p>XML can be applied successfully to a wide range of uses, from long term
structured document maintenance (where it follows in the footsteps of SGML) to
simple data encoding mechanisms like configuration file formatting (glade),
spreadsheets (gnumeric), or even shorter lived documents such as WebDAV where
it is used to encode remote calls between a client and a server.</p>

<h2><a name="XSLT">XSLT</a></h2>

<p>Check <a href="http://xmlsoft.org/XSLT">the separate libxslt page</a></p>

<p><a href="http://www.w3.org/TR/xslt">XSL Transformations</a> is a language
for transforming XML documents into other XML documents (or HTML/textual
output).</p>

<p>A separate library called libxslt is being built on top of libxml2.
This
module "libxslt" can be found in the Gnome CVS base too.</p>

<p>You can check the <a
href="http://cvs.gnome.org/lxr/source/libxslt/FEATURES">features</a> supported
and the progress in the <a
href="http://cvs.gnome.org/lxr/source/libxslt/ChangeLog">Changelog</a></p>

<h2>An overview of libxml architecture</h2>

<p>Libxml is made of multiple components; some of them are optional, and most
of the block interfaces are public. The main components are:</p>
<ul>
  <li>an Input/Output layer</li>
  <li>FTP and HTTP client layers (optional)</li>
  <li>an Internationalization layer managing the encodings support</li>
  <li>a URI module</li>
  <li>the XML parser and its basic SAX interface</li>
  <li>an HTML parser using the same SAX interface (optional)</li>
  <li>a SAX tree module to build an in-memory DOM representation</li>
  <li>a tree module to manipulate the DOM representation</li>
  <li>a validation module using the DOM representation (optional)</li>
  <li>an XPath module for global lookup in a DOM representation
    (optional)</li>
  <li>a debug module (optional)</li>
</ul>

<p>Graphically this gives the following:</p>

<p><img src="libxml.gif" alt="a graphical view of the various"></p>

<p></p>

<h2><a name="tree">The tree output</a></h2>

<p>The parser returns a tree built during the document analysis. The value
returned is an <strong>xmlDocPtr</strong> (i.e., a pointer to an
<strong>xmlDoc</strong> structure). This structure contains information such
as the file name, the document type, and a <strong>children</strong> pointer
which is the root of the document (or more exactly the first child under the
root which is the document). The tree is made of <strong>xmlNode</strong>s,
chained in double-linked lists of siblings and with a children<->parent
relationship. An xmlNode can also carry properties (a chain of xmlAttr
structures).
An attribute may have a value which is a list of TEXT or
ENTITY_REF nodes.</p>

<p>Here is an example (erroneous with respect to the XML spec since there
should be only one ELEMENT under the root):</p>

<p><img src="structure.gif" alt=" structure.gif "></p>

<p>In the source package there is a small program (not installed by default)
called <strong>xmllint</strong> which parses XML files given as argument and
prints them back as parsed. This is useful for detecting errors both in XML
code and in the XML parser itself. It has an option <strong>--debug</strong>
which prints the actual in-memory structure of the document; here is the
result with the <a href="#example">example</a> given before:</p>
<pre>DOCUMENT
version=1.0
standalone=true
  ELEMENT EXAMPLE
    ATTRIBUTE prop1
      TEXT
      content=gnome is great
    ATTRIBUTE prop2
      ENTITY_REF
      TEXT
      content= linux too
    ELEMENT head
      ELEMENT title
        TEXT
        content=Welcome to Gnome
    ELEMENT chapter
      ELEMENT title
        TEXT
        content=The Linux adventure
      ELEMENT p
        TEXT
        content=bla bla bla ...
      ELEMENT image
        ATTRIBUTE href
          TEXT
          content=linus.gif
      ELEMENT p
        TEXT
        content=...</pre>

<p>This should be useful for learning the internal representation model.</p>

<h2><a name="interface">The SAX interface</a></h2>

<p>Sometimes the DOM tree output is just too large to fit reasonably into
memory. In that case (and if you don't expect to save back the XML document
loaded using libxml), it's better to use the SAX interface of libxml. SAX is a
<strong>callback-based interface</strong> to the parser.
Before parsing, the
application layer registers a customized set of callbacks which are called by
the library as it progresses through the XML input.</p>

<p>To get more detailed step-by-step guidance on using the SAX interface of
libxml, see the <a
href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">nice
documentation</a> written by <a href="mailto:james@daa.com.au">James
Henstridge</a>.</p>

<p>You can debug the SAX behaviour by using the <strong>testSAX</strong>
program located in the gnome-xml module (it's usually not shipped in the
binary packages of libxml, but you can find it in the tar source
distribution). Here is the sequence of callbacks that would be reported by
testSAX when parsing the example XML document shown earlier:</p>
<pre>SAX.setDocumentLocator()
SAX.startDocument()
SAX.getEntity(amp)
SAX.startElement(EXAMPLE, prop1='gnome is great', prop2='&amp; linux too')
SAX.characters( , 3)
SAX.startElement(head)
SAX.characters( , 4)
SAX.startElement(title)
SAX.characters(Welcome to Gnome, 16)
SAX.endElement(title)
SAX.characters( , 3)
SAX.endElement(head)
SAX.characters( , 3)
SAX.startElement(chapter)
SAX.characters( , 4)
SAX.startElement(title)
SAX.characters(The Linux adventure, 19)
SAX.endElement(title)
SAX.characters( , 4)
SAX.startElement(p)
SAX.characters(bla bla bla ..., 15)
SAX.endElement(p)
SAX.characters( , 4)
SAX.startElement(image, href='linus.gif')
SAX.endElement(image)
SAX.characters( , 4)
SAX.startElement(p)
SAX.characters(..., 3)
SAX.endElement(p)
SAX.characters( , 3)
SAX.endElement(chapter)
SAX.characters( , 1)
SAX.endElement(EXAMPLE)
SAX.endDocument()</pre>

<p>Most of the other interfaces of libxml are based on the DOM tree-building
facility, so nearly everything up to the end of this document presupposes the
use of the standard DOM tree build.
Note that the DOM tree itself is built by
a set of registered default callbacks, without any specific internal
interface.</p>

<h2><a name="library">The XML library interfaces</a></h2>

<p>This section is directly intended to help programmers get started
using the XML library from the C language. It is not intended to be extensive.
I hope the automatically generated documents will provide the completeness
required, but as a separate set of documents. The interfaces of the XML
library are by principle low level, there is nearly zero abstraction. Those
interested in a higher level API should <a href="#DOM">look at DOM</a>.</p>

<p>The <a href="html/libxml-parser.html">parser interfaces for XML</a> are
separated from the <a href="html/libxml-htmlparser.html">HTML parser
interfaces</a>. Let's have a look at how the XML parser can be called:</p>

<h3><a name="Invoking">Invoking the parser: the pull method</a></h3>

<p>Usually, the first thing to do is to read an XML input. The parser accepts
documents either from in-memory strings or from files. The functions are
defined in "parser.h":</p>
<dl>
  <dt><code>xmlDocPtr xmlParseMemory(char *buffer, int size);</code></dt>
    <dd><p>Parse a null-terminated string containing the document.</p>
    </dd>
</dl>
<dl>
  <dt><code>xmlDocPtr xmlParseFile(const char *filename);</code></dt>
    <dd><p>Parse an XML document contained in a (possibly compressed)
      file.</p>
    </dd>
</dl>

<p>The parser returns a pointer to the document structure (or NULL in case of
failure).</p>

<h3 id="Invoking1">Invoking the parser: the push method</h3>

<p>In order for the application to keep control while the document is being
fetched (which is common for GUI based programs) libxml provides a push
interface, too, as of version 1.8.3.
Here are the interface functions:</p>
<pre>xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax,
                                         void *user_data,
                                         const char *chunk,
                                         int size,
                                         const char *filename);
int              xmlParseChunk          (xmlParserCtxtPtr ctxt,
                                         const char *chunk,
                                         int size,
                                         int terminate);</pre>

<p>and here is a simple example showing how to use the interface:</p>
<pre>            FILE *f;
            xmlDocPtr doc = NULL;   /* receives the parsed tree */

            f = fopen(filename, "r");
            if (f != NULL) {
                int res, size = 1024;
                char chars[1024];
                xmlParserCtxtPtr ctxt;

                /* hand the first 4 bytes to the parser when creating it */
                res = fread(chars, 1, 4, f);
                if (res > 0) {
                    ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                                chars, res, filename);
                    /* feed the rest of the file chunk by chunk */
                    while ((res = fread(chars, 1, size, f)) > 0) {
                        xmlParseChunk(ctxt, chars, res, 0);
                    }
                    /* an empty last chunk with terminate set ends the parse */
                    xmlParseChunk(ctxt, chars, 0, 1);
                    doc = ctxt->myDoc;
                    xmlFreeParserCtxt(ctxt);
                }
                fclose(f);
            }</pre>

<p>The HTML parser embedded into libxml also has a push interface; the
functions are just prefixed by "html" rather than "xml".</p>

<h3 id="Invoking2">Invoking the parser: the SAX interface</h3>

<p>The tree-building interface makes the parser memory-hungry, first loading
the document in memory and then building the tree itself. Reading a document
without building the tree is possible using the SAX interfaces (see SAX.h and
<a href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">James
Henstridge's documentation</a>). Note also that the push interface can be
limited to SAX: just use the first two arguments of
<code>xmlCreatePushParserCtxt()</code>.</p>

<h3><a name="Building">Building a tree from scratch</a></h3>

<p>The other way to get an XML tree in memory is by building it. Basically
there is a set of functions dedicated to building new elements. (These are
also described in <libxml/tree.h>.)
For example, here is a piece of code
that produces the XML document used in the previous examples:</p>
<pre>    #include <libxml/tree.h>
    xmlDocPtr doc;
    xmlNodePtr tree, subtree;

    doc = xmlNewDoc("1.0");
    doc->children = xmlNewDocNode(doc, NULL, "EXAMPLE", NULL);
    xmlSetProp(doc->children, "prop1", "gnome is great");
    xmlSetProp(doc->children, "prop2", "& linux too");
    tree = xmlNewChild(doc->children, NULL, "head", NULL);
    subtree = xmlNewChild(tree, NULL, "title", "Welcome to Gnome");
    tree = xmlNewChild(doc->children, NULL, "chapter", NULL);
    subtree = xmlNewChild(tree, NULL, "title", "The Linux adventure");
    subtree = xmlNewChild(tree, NULL, "p", "bla bla bla ...");
    subtree = xmlNewChild(tree, NULL, "image", NULL);
    xmlSetProp(subtree, "href", "linus.gif");
    /* the second, trailing paragraph of the example document */
    subtree = xmlNewChild(tree, NULL, "p", "...");</pre>

<p>Not really rocket science ...</p>

<h3><a name="Traversing">Traversing the tree</a></h3>

<p>Basically by <a href="html/libxml-tree.html">including "tree.h"</a> your
code has access to the internal structure of all the elements of the tree. The
names should be somewhat simple like <strong>parent</strong>,
<strong>children</strong>, <strong>next</strong>, <strong>prev</strong>,
<strong>properties</strong>, etc...
For example, still with the previous
example:</p>
<pre><code>doc->children->children->children</code></pre>

<p>points to the title element,</p>
<pre>doc->children->children->next->children->children</pre>

<p>points to the text node containing the chapter title "The Linux
adventure".</p>

<p><strong>NOTE</strong>: XML allows <em>PI</em>s and <em>comments</em> to be
present before the document root, so <code>doc->children</code> may point
to a node which is not the document Root Element; a function
<code>xmlDocGetRootElement()</code> was added for this purpose.</p>

<h3><a name="Modifying">Modifying the tree</a></h3>

<p>Functions are provided for reading and writing the document content. Here
is an excerpt from the <a href="html/libxml-tree.html">tree API</a>:</p>
<dl>
  <dt><code>xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const
  xmlChar *value);</code></dt>
    <dd><p>This sets (or changes) an attribute carried by an ELEMENT node. The
      value can be NULL.</p>
    </dd>
</dl>
<dl>
  <dt><code>xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar
  *name);</code></dt>
    <dd><p>This function returns a pointer to a new copy of the property
      content. Note that the user must deallocate the result.</p>
    </dd>
</dl>

<p>Two functions are provided for reading and writing the text associated with
elements:</p>
<dl>
  <dt><code>xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar
  *value);</code></dt>
    <dd><p>This function takes an "external" string and converts it to one
      text node or possibly to a list of entity and text nodes.
      All
      non-predefined entity references like &Gnome; will be stored
      internally as entity nodes, hence the result of the function may not be
      a single node.</p>
    </dd>
</dl>
<dl>
  <dt><code>xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int
  inLine);</code></dt>
    <dd><p>This function is the inverse of
      <code>xmlStringGetNodeList()</code>. It generates a new string
      containing the content of the text and entity nodes. Note the extra
      argument inLine. If this argument is set to 1, the function will expand
      entity references. For example, instead of returning the &Gnome;
      XML encoding in the string, it will substitute it with its value (say,
      "GNU Network Object Model Environment").</p>
    </dd>
</dl>

<h3><a name="Saving">Saving a tree</a></h3>

<p>Basically three options are possible:</p>
<dl>
  <dt><code>void xmlDocDumpMemory(xmlDocPtr cur, xmlChar**mem, int
  *size);</code></dt>
    <dd><p>Returns a buffer into which the document has been saved.</p>
    </dd>
</dl>
<dl>
  <dt><code>extern void xmlDocDump(FILE *f, xmlDocPtr doc);</code></dt>
    <dd><p>Dumps a document to an open file stream.</p>
    </dd>
</dl>
<dl>
  <dt><code>int xmlSaveFile(const char *filename, xmlDocPtr cur);</code></dt>
    <dd><p>Saves the document to a file. In this case, the compression
      interface is triggered if it has been turned on.</p>
    </dd>
</dl>

<h3><a name="Compressio">Compression</a></h3>

<p>The library transparently handles compression when doing file-based
accesses.
The level of compression on saves can be turned on either globally
or individually for one file:</p>
<dl>
  <dt><code>int xmlGetDocCompressMode (xmlDocPtr doc);</code></dt>
    <dd><p>Gets the document compression ratio (0-9).</p>
    </dd>
</dl>
<dl>
  <dt><code>void xmlSetDocCompressMode (xmlDocPtr doc, int mode);</code></dt>
    <dd><p>Sets the document compression ratio.</p>
    </dd>
</dl>
<dl>
  <dt><code>int xmlGetCompressMode(void);</code></dt>
    <dd><p>Gets the default compression ratio.</p>
    </dd>
</dl>
<dl>
  <dt><code>void xmlSetCompressMode(int mode);</code></dt>
    <dd><p>Sets the default compression ratio.</p>
    </dd>
</dl>

<h2><a name="Entities">Entities or no entities</a></h2>

<p>Entities in principle are similar to simple C macros. An entity defines an
abbreviation for a given string that you can reuse many times throughout the
content of your document. Entities are especially useful when a given string
may occur frequently within a document, or to confine the change needed to a
document to a restricted area in the internal subset of the document (at the
beginning). Example:</p>
<pre>1 <?xml version="1.0"?>
2 <!DOCTYPE EXAMPLE SYSTEM "example.dtd" [
3 <!ENTITY xml "Extensible Markup Language">
4 ]>
5 <EXAMPLE>
6    &xml;
7 </EXAMPLE></pre>

<p>Line 3 declares the xml entity. Line 6 uses the xml entity, by prefixing
its name with '&' and following it by ';' without any spaces added.
There
are 5 predefined entities in libxml allowing you to escape characters with
predefined meaning in some parts of the xml document content:
<strong>&lt;</strong> for the character '<', <strong>&gt;</strong>
for the character '>', <strong>&apos;</strong> for the character ''',
<strong>&quot;</strong> for the character '"', and
<strong>&amp;</strong> for the character '&'.</p>

<p>One of the problems related to entities is that you may want the parser to
substitute an entity's content so that you can see the replacement text in
your application. Or you may prefer to keep entity references as such in the
content to be able to save the document back without losing this usually
precious information (if the user went through the pain of explicitly defining
entities, they may have a rather negative attitude if you blindly substitute
them at save time). The <a
href="html/libxml-parser.html#XMLSUBSTITUTEENTITIESDEFAULT">xmlSubstituteEntitiesDefault()</a>
function allows you to check and change the behaviour, which is to not
substitute entities by default.</p>

<p>Here is the DOM tree built by libxml for the previous document in the
default case:</p>
<pre>/gnome/src/gnome-xml -> ./xmllint --debug test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=
     ENTITY_REF
       INTERNAL_GENERAL_ENTITY xml
       content=Extensible Markup Language
     TEXT
     content=</pre>

<p>And here is the result when substituting entities:</p>
<pre>/gnome/src/gnome-xml -> ./tester --debug --noent test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=     Extensible Markup Language</pre>

<p>So, entities or no entities? Basically, it depends on your use case.
I
suggest that you keep the non-substituting default behaviour and avoid using
entities in your XML document or data if you are not willing to handle the
entity reference elements in the DOM tree.</p>

<p>Note that at save time libxml enforces the conversion of the predefined
entities where necessary to prevent well-formedness problems, and will also
transparently replace those with chars (i.e. it will not generate entity
reference elements in the DOM tree or call the reference() SAX callback when
finding them in the input).</p>

<p><span style="background-color: #FF0000">WARNING</span>: handling entities
on top of the libxml SAX interface is difficult!!! If you plan to use
non-predefined entities in your documents, then the learning curve to handle
them using the SAX API may be long. If you plan to use complex documents, I
strongly suggest you consider using the DOM interface instead and let libxml
deal with the complexity rather than trying to do it yourself.</p>

<h2><a name="Namespaces">Namespaces</a></h2>

<p>The libxml library implements <a
href="http://www.w3.org/TR/REC-xml-names/">XML namespaces</a> support by
recognizing namespace constructs in the input, and does namespace lookup
automatically when building the DOM tree. A namespace declaration is
associated with an in-memory structure and all elements or attributes within
that namespace point to it. Hence testing the namespace is a simple and fast
equality operation at the user level.</p>

<p>I suggest that people using libxml use a namespace, and declare it in the
root element of their document as the default namespace. Then they don't need
to use the prefix in the content but we will have a basis for future semantic
refinement and merging of data from different sources.
This doesn't increase
the size of the XML output significantly, but significantly increases its
value in the long-term. Example:</p>
<pre><mydoc xmlns="http://mydoc.example.org/schemas/">
   <elem1>...</elem1>
   <elem2>...</elem2>
</mydoc></pre>

<p>The namespace value has to be an absolute URL, but the URL doesn't have to
point to any existing resource on the Web. It will bind all the elements and
attributes with that URL. I suggest using a URL within a domain you control,
and that the URL should contain some kind of version information if possible.
For example, <code>"http://www.gnome.org/gnumeric/1.0/"</code> is a good
namespace scheme.</p>

<p>Then when you load a file, make sure that a namespace carrying the
version-independent prefix is installed on the root element of your document,
and if the version information doesn't match something you know, warn the user
and be liberal in what you accept as the input. Also do *not* try to base
namespace checking on the prefix value. <foo:text> may be exactly the
same as <bar:text> in another document. What really matters is the URI
associated with the element or the attribute, not the prefix string (which is
just a shortcut for the full URI). In libxml, elements and attributes have an
<code>ns</code> field pointing to an xmlNs structure detailing the namespace
prefix and its URI.</p>

<p>@@Interfaces@@</p>

<p>@@Examples@@</p>

<p>Usually people object to using namespaces together with validity checking.
I will try to make sure that using namespaces won't break validity checking,
so even if you plan to use or currently are using validation I strongly
suggest adding namespaces to your document. A default namespace scheme
<code>xmlns="http://...."</code> should not break validity even on less
flexible parsers.
Using namespaces to mix and differentiate content coming
from multiple DTDs will certainly break current validation schemes. I will try
to provide ways to do this, but this may not be portable or standardized.</p>

<h2><a name="Validation">Validation, or are you afraid of DTDs?</a></h2>

<p>Well, what is validation and what is a DTD?</p>

<p>Validation is the process of checking a document against a set of
construction rules; a <strong>DTD</strong> (Document Type Definition) is such
a set of rules.</p>

<p>The validation process and building DTDs are the two most difficult parts
of the XML life cycle. Briefly, a DTD defines all the possible elements to be
found within your document and the formal shape of your document tree (by
defining the allowed content of an element: either text, a regular expression
for the allowed list of children, or mixed content, i.e. both text and
children). The DTD also defines the allowed attributes for all elements and
the types of the attributes. For more detailed information, I suggest that you
read the related parts of the XML specification, the examples found under
gnome-xml/test/valid/dtd and any of the large number of books available on
XML. The dia example in gnome-xml/test/valid should be both simple and
complete enough to allow you to build your own.</p>

<p>A word of warning: building a good DTD which will fit the needs of your
application in the long-term is far from trivial; however, the extra level of
quality it can ensure is well worth the price for some sets of applications or
if you already have a DTD defined for your application field.</p>

<p>The validation is not completely finished but in a (very IMHO) usable
state.
Until a real validation interface is defined the way to do it is to
define and set the <strong>xmlDoValidityCheckingDefaultValue</strong> external
variable to 1, this will of course be changed at some point:</p>

<p>extern int xmlDoValidityCheckingDefaultValue;</p>

<p>...</p>

<p>xmlDoValidityCheckingDefaultValue = 1;</p>

<p></p>

<p>To handle external entities, use the function
<strong>xmlSetExternalEntityLoader</strong>(xmlExternalEntityLoader f); to
link in your HTTP/FTP/Entities database library to the standard libxml
core.</p>

<p>@@interfaces@@</p>

<h2><a name="DOM"></a><a name="Principles">DOM Principles</a></h2>

<p><a href="http://www.w3.org/DOM/">DOM</a> stands for the <em>Document Object
Model</em>; this is an API for accessing XML or HTML structured documents.
Native support for DOM in Gnome is on the way (module gnome-dom), and will be
based on gnome-xml. This will be a far cleaner interface to manipulate XML
files within Gnome since it won't expose the internal structure.</p>

<p>The current DOM implementation on top of libxml is the <a
href="http://cvs.gnome.org/lxr/source/gdome2/">gdome2 Gnome module</a>, this
is a full DOM interface, thanks to Paolo Casarini, check the <a
href="http://www.cs.unibo.it/~casarini/gdome2/">Gdome2 homepage</a> for more
information.</p>

<h2><a name="Example"></a><a name="real">A real example</a></h2>

<p>Here is a real size example, where the actual content of the application
data is not kept in the DOM tree but uses internal structures. It is based on
a proposal to keep a database of jobs related to Gnome, with an XML based
storage structure.
Here is an <a href="gjobs.xml">XML encoded jobs
base</a>:</p>
<pre><?xml version="1.0"?>
<gjob:Helping xmlns:gjob="http://www.gnome.org/some-location">
  <gjob:Jobs>

    <gjob:Job>
      <gjob:Project ID="3"/>
      <gjob:Application>GBackup</gjob:Application>
      <gjob:Category>Development</gjob:Category>

      <gjob:Update>
        <gjob:Status>Open</gjob:Status>
        <gjob:Modified>Mon, 07 Jun 1999 20:27:45 -0400 MET DST</gjob:Modified>
        <gjob:Salary>USD 0.00</gjob:Salary>
      </gjob:Update>

      <gjob:Developers>
        <gjob:Developer>
        </gjob:Developer>
      </gjob:Developers>

      <gjob:Contact>
        <gjob:Person>Nathan Clemons</gjob:Person>
        <gjob:Email>nathan@windsofstorm.net</gjob:Email>
        <gjob:Company>
        </gjob:Company>
        <gjob:Organisation>
        </gjob:Organisation>
        <gjob:Webpage>
        </gjob:Webpage>
        <gjob:Snailmail>
        </gjob:Snailmail>
        <gjob:Phone>
        </gjob:Phone>
      </gjob:Contact>

      <gjob:Requirements>
      The program should be released as free software, under the GPL.
      </gjob:Requirements>

      <gjob:Skills>
      </gjob:Skills>

      <gjob:Details>
      A GNOME based system that will allow a superuser to configure
      compressed and uncompressed files and/or file systems to be backed
      up with a supported media in the system.  This should be able to
      perform via find commands generating a list of files that are passed
      to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine
      or via operations performed on the filesystem itself. Email
      notification and GUI status display very important.
      </gjob:Details>

    </gjob:Job>

  </gjob:Jobs>
</gjob:Helping></pre>

<p>While loading the XML file into an internal DOM tree is a matter of calling
only a couple of functions, browsing the tree to gather the data and generate
the internal structures is harder, and more error prone.</p>

<p>The suggested principle is to be tolerant with respect to the input
structure. For example, the ordering of the attributes is not significant, the
XML specification is clear about it. It's also usually a good idea not to
depend on the order of the children of a given node, unless it really makes
things harder. Here is some code to parse the information for a person:</p>
<pre>/*
 * A person record
 */
typedef struct person {
    char *name;
    char *email;
    char *company;
    char *organisation;
    char *smail;
    char *webPage;
    char *phone;
} person, *personPtr;

/*
 * And the code needed to parse it
 */
personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    personPtr ret = NULL;

DEBUG("parsePerson\n");
    /*
     * allocate the struct
     */
    ret = (personPtr) malloc(sizeof(person));
    if (ret == NULL) {
        fprintf(stderr,"out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(person));

    /* We don't care what the top level element name is */
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {
        if ((!strcmp(cur->name, "Person")) && (cur->ns == ns))
            ret->name = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        if ((!strcmp(cur->name, "Email")) && (cur->ns == ns))
            ret->email = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        cur = cur->next;
    }

    return(ret);
}</pre>

<p>Here are a couple of things to notice:</p>
<ul>
  <li>Usually a recursive parsing style is the more convenient one: XML data
    is by nature subject to repetitive
    constructs and usually exhibits highly
    structured patterns.</li>
  <li>Note the two arguments of type <em>xmlDocPtr</em> and <em>xmlNsPtr</em>,
    i.e. the pointer to the global XML document and the namespace reserved for
    the application. Document-wide information is needed, for example, to
    decode entities, and it's good coding practice to define a namespace for
    your application's set of data and to test that the elements and
    attributes you're analyzing actually pertain to your application space.
    This is done by a simple equality test (cur->ns == ns).</li>
  <li>To retrieve text and attribute values, you can use the function
    <em>xmlNodeListGetString</em> to gather all the text and entity reference
    nodes generated by the DOM output and produce a single text string.</li>
</ul>

<p>Here is another piece of code used to parse another level of the
structure:</p>
<pre>#include <libxml/tree.h>
/*
 * a Description for a Job
 */
typedef struct job {
    char *projectID;
    char *application;
    char *category;
    personPtr contact;
    int nbDevelopers;
    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
} job, *jobPtr;

/*
 * And the code needed to parse it
 */
jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    jobPtr ret = NULL;

DEBUG("parseJob\n");
    /*
     * allocate the struct
     */
    ret = (jobPtr) malloc(sizeof(job));
    if (ret == NULL) {
        fprintf(stderr,"out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(job));

    /* We don't care what the top level element name is */
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {

        if ((!strcmp(cur->name, "Project")) && (cur->ns == ns)) {
            ret->projectID = xmlGetProp(cur, "ID");
            if (ret->projectID == NULL) {
                fprintf(stderr, "Project has no ID\n");
            }
        }
        if ((!strcmp(cur->name, "Application")) && (cur->ns == ns))
            ret->application = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        if ((!strcmp(cur->name, "Category")) && (cur->ns == ns))
            ret->category = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        if ((!strcmp(cur->name, "Contact")) && (cur->ns == ns))
            ret->contact = parsePerson(doc, ns, cur);
        cur = cur->next;
    }

    return(ret);
}</pre>

<p>Once you are used to it, writing this kind of code is quite simple, but
boring. Ultimately, it could be possible to write stubbers taking either C data
structure definitions, a set of XML examples or an XML DTD and producing the
code needed to import and export the content between C data and XML storage.
This is left as an exercise to the reader :-)</p>

<p>Feel free to use <a href="example/gjobread.c">the code for the full C
parsing example</a> as a template; it is also available with a Makefile in the
Gnome CVS base under gnome-xml/example.</p>

<h2><a name="Contributi">Contributions</a></h2>
<ul>
  <li><a href="mailto:ari@lusis.org">Ari Johnson</a> provides a C++ wrapper
    for libxml:
    <p>Website: <a
    href="http://lusis.org/~ari/xml++/">http://lusis.org/~ari/xml++/</a></p>
    <p>Download: <a
    href="http://lusis.org/~ari/xml++/libxml++.tar.gz">http://lusis.org/~ari/xml++/libxml++.tar.gz</a></p>
  </li>
  <li><a href="mailto:doolin@cs.utk.edu">David Doolin</a> provides a
    precompiled Windows version
    <p><a
    href="http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/">http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/</a>
    (older).
    The distribution now includes projects and makefiles for Windows
    compilers contributed by various people.</p>
  </li>
  <li><a
    href="http://mail.gnome.org/archives/xml/2001-March/msg00014.html">Matt
    Sergeant</a> developed <a
    href="http://axkit.org/download/">XML::LibXSLT</a>, a Perl wrapper for
    libxml2/libxslt as part of the <a href="http://axkit.com/">AxKit XML
    application server</a></li>
  <li><a href="mailto:fnatter@gmx.net">Felix Natter</a> and <a
    href="mailto:geertk@ai.rug.nl">Geert Kloosterman</a> provide <a
    href="libxml-doc.el">an Emacs module</a> to look up libxml(2) function
    documentation</li>
  <li><a href="mailto:sherwin@nlm.nih.gov">Ziying Sherwin</a> provided <a
    href="http://xmlsoft.org/messages/0488.html">man pages</a></li>
  <li>It seems <a href="http://www.arsdigita.com/">ArsDigita</a> wrote a Tcl
    wrapper for libxml called <a
    href="http://www.linuxgazette.com/issue63/washington.html">ns_xml</a></li>
</ul>

<p></p>

<p><a href="mailto:Daniel.Veillard@imag.fr">Daniel Veillard</a></p>

<p>$Id: xml.html,v 1.96 2001/06/26 23:07:32 veillard Exp $</p>
</body>
</html>