xml.html revision b8cfbd12680cbd28c9eaafea2642b8f1cbd52a48
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>The XML C library for Gnome</title>
  <meta name="GENERATOR" content="amaya V5.0">
  <meta http-equiv="Content-Type" content="text/html">
</head>

<body bgcolor="#ffffff">
<h1 align="center">The XML C library for Gnome</h1>

<h1 style="text-align: center">libxml, a.k.a. gnome-xml</h1>

<p></p>
<ul>
  <li><a href="#Introducti">Introduction</a></li>
  <li><a href="#Documentat">Documentation</a></li>
  <li><a href="#Reporting">Reporting bugs and getting help</a></li>
  <li><a href="#help">How to help</a></li>
  <li><a href="#Downloads">Downloads</a></li>
  <li><a href="#News">News</a></li>
  <li><a href="#XML">XML</a></li>
  <li><a href="#XSLT">XSLT</a></li>
  <li><a href="#tree">The tree output</a></li>
  <li><a href="#interface">The SAX interface</a></li>
  <li><a href="#library">The XML library interfaces</a></li>
  <li><a href="#Entities">Entities or no entities</a></li>
  <li><a href="#Namespaces">Namespaces</a></li>
  <li><a href="#Validation">Validation</a></li>
  <li><a href="#Principles">DOM principles</a></li>
  <li><a href="#real">A real example</a></li>
  <li><a href="#Contributi">Contributions</a></li>
</ul>

<p>Separate documents:</p>
<ul>
  <li><a href="http://xmlsoft.org/XSLT/">the libxslt page</a></li>
  <li><a href="http://www.cs.unibo.it/~casarini/gdome2/">the gdome2 page: a
    standard DOM interface for libxml2</a></li>
</ul>

<h2><a name="Introducti">Introduction</a></h2>

<p>This document describes libxml, the <a
href="http://www.w3.org/XML/">XML</a> C library developed for the <a
href="http://www.gnome.org/">Gnome</a> project.
<a
href="http://www.w3.org/XML/">XML is a standard</a> for building tag-based
structured documents/data.</p>

<p>Here are some key points about libxml:</p>
<ul>
  <li>Libxml exports Push and Pull type parser interfaces for both XML and
    HTML.</li>
  <li>Libxml can do DTD validation at parse time, using a parsed document
    instance, or with an arbitrary DTD.</li>
  <li>Libxml now includes nearly complete <a
    href="http://www.w3.org/TR/xpath">XPath</a>, <a
    href="http://www.w3.org/TR/xptr">XPointer</a> and <a
    href="http://www.w3.org/TR/xinclude">XInclude</a> implementations.</li>
  <li>It is written in plain C, making as few assumptions as possible, and
    sticking closely to ANSI C/POSIX for easy embedding. It works on
    Linux/Unix/Windows and has been ported to a number of other
    platforms.</li>
  <li>Basic HTTP and FTP client support allows applications to fetch remote
    resources.</li>
  <li>The design is modular: most of the extensions can be compiled
  out.</li>
  <li>The internal document representation is as close as possible to the <a
    href="http://www.w3.org/DOM/">DOM</a> interfaces.</li>
  <li>Libxml also has a <a href="http://www.megginson.com/SAX/index.html">SAX
    like interface</a>; the interface is designed to be compatible with <a
    href="http://www.jclark.com/xml/expat.html">Expat</a>.</li>
  <li>This library is released both under the <a
    href="http://www.w3.org/Consortium/Legal/copyright-software-19980720.html">W3C
    IPR</a> and the <a href="http://www.gnu.org/copyleft/lesser.html">GNU
    LGPL</a>.
Use either at your convenience; this should make
    everybody happy. If not, drop me a mail.</li>
</ul>

<p>Warning: unless you are forced to because your application links with a
Gnome library requiring it, <strong><span
style="background-color: #FF0000">Do Not Use libxml1</span></strong>, use
libxml2.</p>

<h2><a name="FAQ">FAQ</a></h2>

<p>Table of Contents:</p>
<ul>
  <li><a href="FAQ.html#Licence">Licence(s)</a></li>
  <li><a href="FAQ.html#Installati">Installation</a></li>
  <li><a href="FAQ.html#Compilatio">Compilation</a></li>
  <li><a href="FAQ.html#Developer">Developer corner</a></li>
</ul>

<h3><a name="Licence">Licence</a>(s)</h3>
<ol>
  <li><em>Licensing Terms for libxml</em>
    <p>libxml is released under 2 (compatible) licences:</p>
    <ul>
      <li>the <a href="http://www.gnu.org/copyleft/lgpl.html">LGPL</a>: GNU
        Library General Public License</li>
      <li>the <a
        href="http://www.w3.org/Consortium/Legal/copyright-software-19980720.html">W3C
        IPR</a>: very similar to the X Window licence</li>
    </ul>
  </li>
  <li><em>Can I embed libxml in a proprietary application ?</em>
    <p>Yes. The W3C IPR also allows you to keep the changes you made to
    libxml proprietary, but it would be graceful to provide bugfixes and
    improvements back as patches for possible incorporation in the main
    development tree.</p>
  </li>
</ol>

<h3><a name="Installati">Installation</a></h3>
<ol>
  <li>Unless you are forced to because your application links with a Gnome
    library requiring it, <strong><span style="background-color: #FF0000">Do
    Not Use libxml1</span></strong>, use libxml2.</li>
  <li><em>Where can I get libxml ?</em>
    <p>The original distribution comes from <a
    href="ftp://rpmfind.net/pub/libxml/">rpmfind.net</a> or <a
    href="ftp://ftp.gnome.org/pub/GNOME/stable/sources/libxml/">gnome.org</a></p>
    <p>Most Linux and BSD distributions include libxml; this is probably the
    safest way for end-users.</p>
    <p>David Doolin provides precompiled Windows versions at <a
    href="http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/">http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/</a></p>
  </li>
  <li><em>I see libxml and libxml2 releases, which one should I install ?</em>
    <ul>
      <li>If you are not concerned about backward compatibility with
        existing applications, install libxml2 only.</li>
      <li>If you are not doing development, you can safely install both.
        Usually the packages <a
        href="http://rpmfind.net/linux/RPM/libxml.html">libxml</a> and <a
        href="http://rpmfind.net/linux/RPM/libxml2.html">libxml2</a> are
        compatible (this is not the case for development packages).</li>
      <li>If you are a developer and your system provides separate packaging
        for shared libraries and the development components, it is possible
        to install libxml and libxml2, and also <a
        href="http://rpmfind.net/linux/RPM/libxml-devel.html">libxml-devel</a>
        and <a
        href="http://rpmfind.net/linux/RPM/libxml2-devel.html">libxml2-devel</a>
        too for libxml2 &gt;= 2.3.0.</li>
      <li>If you are developing a new application, please develop against
        libxml2(-devel).</li>
    </ul>
  </li>
  <li><em>I can't install the libxml package, it conflicts with libxml0</em>
    <p>You probably have an old libxml0 package used to provide the shared
    library for libxml.so.0; you can probably safely remove it.
Anyway, the
    libxml packages provided on <a
    href="ftp://rpmfind.net/pub/libxml/">rpmfind.net</a> provide
    libxml.so.0.</p>
  </li>
  <li><em>I can't install the libxml(2) RPM package due to failed
    dependencies</em>
    <p>The most generic solution is to fetch the latest src.rpm again and
    rebuild it locally with</p>
    <p><code>rpm --rebuild libxml(2)-xxx.src.rpm</code></p>
    <p>If everything goes well it will generate two binary RPMs (one providing
    the shared libs and xmllint, and the other one, the -devel package,
    providing includes, static libraries and scripts needed to build
    applications with libxml(2)) that you can install locally.</p>
  </li>
</ol>

<h3><a name="Compilatio">Compilation</a></h3>
<ol>
  <li><em>What is the process to compile libxml ?</em>
    <p>Like most UNIX libraries, libxml follows the "standard":</p>
    <p><code>gunzip -c xxx.tar.gz | tar xvf -</code></p>
    <p><code>cd libxml-xxxx</code></p>
    <p><code>./configure --help</code></p>
    <p>to see the options, then the compilation/installation proper</p>
    <p><code>./configure [possible options]</code></p>
    <p><code>make</code></p>
    <p><code>make install</code></p>
    <p>At that point you may have to rerun ldconfig or a similar utility to
    update your list of installed shared libs.</p>
  </li>
  <li><em>What other libraries are needed to compile/install libxml ?</em>
    <p>Libxml does not require any other library; the normal ANSI C API
    should be sufficient (please report any violation of this rule you may
    find).</p>
    <p>However, if found at configuration time, libxml will detect and use
    the following libs:</p>
    <ul>
      <li><a href="http://www.info-zip.org/pub/infozip/zlib/">libz</a>
        : a highly portable and widely available compression library</li>
      <li>iconv: a powerful character encoding conversion library.
It's
        included by default in recent glibc libraries, so it doesn't need to
        be installed specifically on Linux. It seems it's now <a
        href="http://www.opennc.org/onlinepubs/7908799/xsh/iconv.html">part
        of the official UNIX</a> specification. Here is one <a
        href="http://clisp.cons.org/~haible/packages-libiconv.html">implementation
        of the library</a> whose source can be found <a
        href="ftp://ftp.ilog.fr/pub/Users/haible/gnu/">here</a>.</li>
    </ul>
  </li>
  <li><em>libxml does not compile with HP-UX's optional ANSI-C compiler</em>
    <p>This is due to macro limitations. Try adding " -Wp,-H16800 -Ae" to the
    CFLAGS.</p>
    <p>You can also install and use gcc instead, or use a precompiled version
    of libxml, both available from the <a
    href="http://hpux.cae.wisc.edu/hppd/auto/summary_all.html">HP-UX Porting
    and Archive Centre</a>.</p>
  </li>
  <li><em>make check fails on some platforms</em>
    <p>Sometimes the regression test results don't completely match the value
    produced by the parser, and the makefile uses diff to print the delta. On
    some platforms the diff return code breaks the compilation process; if the
    diff is small this is probably not a serious problem.</p>
  </li>
  <li><em>I use the CVS version and there is no configure script</em>
    <p>The configure script and the Makefiles are generated. Use the
    autogen.sh script to regenerate them, like:</p>
    <p><code>./autogen.sh --prefix=/usr --disable-shared</code></p>
  </li>
  <li><em>I have trouble when running make tests with gcc-3.0</em>
    <p>It seems the initial release of gcc-3.0 has a problem with the
    optimizer which miscompiles the URI module.
Please use another
    compiler.</p>
  </li>
</ol>

<h3><a name="Developer">Developer</a> corner</h3>
<ol>
  <li><em>xmlDocDump() generates output on one line</em>
    <p>libxml will not <strong>invent</strong> spaces in the content of a
    document since <strong>all spaces in the content of a document are
    significant</strong>. If you build a tree from the API and want
    indentation:</p>
    <ol>
      <li>the correct way is to generate those yourself too</li>
      <li>the dangerous way is to ask libxml to add those blanks to your
        content, <strong>modifying the content of your document in the
        process</strong>. The result may not be what you expect. There is
        <strong>NO</strong> way to guarantee that such a modification won't
        impact other parts of the content of your document. See <a
        href="http://xmlsoft.org/html/libxml-parser.html#XMLKEEPBLANKSDEFAULT">xmlKeepBlanksDefault
        ()</a> and <a
        href="http://xmlsoft.org/html/libxml-tree.html#XMLSAVEFORMATFILE">xmlSaveFormatFile
        ()</a></li>
    </ol>
  </li>
  <li>Extra nodes in the document:
    <p><em>For an XML file as below:</em></p>
    <pre>&lt;?xml version="1.0"?&gt;
&lt;PLAN xmlns="http://www.argus.ca/autotest/1.0/"&gt;
&lt;NODE CommFlag="0"/&gt;
&lt;NODE CommFlag="1"/&gt;
&lt;/PLAN&gt;</pre>
    <p><em>after parsing it with the function
    pxmlDoc=xmlParseFile(...);</em></p>
    <p><em>I want to get the content of the first node (node with the
    CommFlag="0")</em></p>
    <p><em>so I did it as follows:</em></p>
    <pre>xmlNodePtr pnode;
pnode=pxmlDoc-&gt;children-&gt;children;</pre>
    <p><em>but it does not work. If I change it to</em></p>
    <pre>pnode=pxmlDoc-&gt;children-&gt;children-&gt;next;</pre>
    <p><em>then it works.
Can someone explain it to me?</em></p>
    <p></p>
    <p>In XML all characters in the content of the document are significant,
    <strong>including blanks and formatting line breaks</strong>.</p>
    <p>The extra nodes you are wondering about are just that: text nodes with
    the formatting spaces which are part of the document but that people tend
    to forget. There is a function <a
    href="http://xmlsoft.org/html/libxml-parser.html">xmlKeepBlanksDefault
    ()</a> to remove those at parse time, but that's a heuristic, and its
    use should be limited to cases where you are sure there is no
    mixed-content in the document.</p>
  </li>
  <li><em>I get compilation errors in existing code when accessing the
    <strong>root</strong> or <strong>childs</strong> fields of nodes</em>
    <p>You are compiling code developed for libxml version 1 and using a
    libxml2 development environment. Either switch back to libxml v1 devel or,
    even better, fix the code to compile with libxml2 (or both) by <a
    href="upgrade.html">following the instructions</a>.</p>
  </li>
  <li><em>I get compilation errors about non-existing
    <strong>xmlRootNode</strong> or <strong>xmlChildrenNode</strong>
    fields</em>
    <p>The source code you are using has been <a
    href="upgrade.html">upgraded</a> to be able to compile with both libxml
    and libxml2, but you need to install a more recent version:
    libxml(-devel) &gt;= 1.8.8 or libxml2(-devel) &gt;= 2.1.0</p>
  </li>
  <li><em>XPath implementation looks seriously broken</em>
    <p>The XPath implementation prior to 2.3.0 was really incomplete. Upgrade
    to a recent version: the implementation and debugging of libxslt
    generated fixes for most obvious problems.</p>
  </li>
  <li><em>The example provided in the web page does not compile</em>
    <p>It's hard to maintain the documentation in sync with the code
    &lt;grin/&gt; ...</p>
    <p>Check the previous points 1/ and 2/ raised before, and send
    patches.</p>
  </li>
  <li><em>Where can I get more examples and information than in the web
    page ?</em>
    <p>Ideally a libxml book would be nice. I have no such plan ... But you
    can:</p>
    <ul>
      <li>look more deeply at the <a href="html/libxml-lib.html">existing
        generated doc</a></li>
      <li>look for examples of libxml function use in the Gnome code; for
        example, the following will query the full Gnome CVS base for the
        use of the <strong>xmlAddChild()</strong> function:
        <p><a
        href="http://cvs.gnome.org/lxr/search?string=xmlAddChild">http://cvs.gnome.org/lxr/search?string=xmlAddChild</a></p>
        <p>This may be slow; a large hardware donation to the Gnome project
        could cure this :-)</p>
      </li>
      <li><a
        href="http://cvs.gnome.org/bonsai/rview.cgi?cvsroot=/cvs/gnome&amp;dir=gnome-xml">Browse
        the libxml source</a>; I try to write code as clean and documented
        as possible, so looking at it may be helpful</li>
    </ul>
  </li>
  <li>What about C++ ?
    <p>libxml is written in pure C in order to allow easy reuse on a number
    of platforms, including embedded systems. I don't intend to convert to
    C++.</p>
    <p>There is however a C++ wrapper provided by Ari Johnson
    &lt;ari@btigate.com&gt; which may fulfill your needs:</p>
    <p>Website: <a
    href="http://lusis.org/~ari/xml++/">http://lusis.org/~ari/xml++/</a></p>
    <p>Download: <a
    href="http://lusis.org/~ari/xml++/libxml++.tar.gz">http://lusis.org/~ari/xml++/libxml++.tar.gz</a></p>
  </li>
  <li>How to validate a document a posteriori ?
    <p>It is possible to validate documents which had not been validated at
    initial parsing time, or documents which have been built from scratch
    using the API. Use the <a
    href="http://xmlsoft.org/html/libxml-valid.html#XMLVALIDATEDTD">xmlValidateDtd()</a>
    function.
It is also possible to simply add a DTD to an existing
    document:</p>
    <pre>xmlDocPtr doc; /* your existing document */
xmlDtdPtr dtd = xmlParseDTD(NULL, filename_of_dtd); /* parse the DTD */

if (dtd != NULL) {
    dtd-&gt;name = xmlStrdup((xmlChar*)"root_name"); /* use the given root */
    doc-&gt;intSubset = dtd;
    /* make the DTD node the first child of the document */
    if (doc-&gt;children == NULL) xmlAddChild((xmlNodePtr)doc, (xmlNodePtr)dtd);
    else xmlAddPrevSibling(doc-&gt;children, (xmlNodePtr)dtd);
}
</pre>
  </li>
  <li>etc ...</li>
</ol>

<p></p>

<h2><a name="Documentat">Documentation</a></h2>

<p>There are some on-line resources about using libxml:</p>
<ol>
  <li>Check the <a href="FAQ.html">FAQ</a></li>
  <li>Check the <a href="http://xmlsoft.org/html/libxml-lib.html">extensive
    documentation</a> automatically extracted from code comments (using <a
    href="http://cvs.gnome.org/bonsai/rview.cgi?cvsroot=/cvs/gnome&amp;dir=gtk-doc">gtk-doc</a>).</li>
  <li>Look at the documentation about <a href="encoding.html">libxml
    internationalization support</a></li>
  <li>This page provides a global overview and <a href="#real">some
    examples</a> on how to use libxml.</li>
  <li><a href="mailto:james@daa.com.au">James Henstridge</a>
    wrote <a
    href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">some nice
    documentation</a> explaining how to use the libxml SAX interface.</li>
  <li>George Lebl wrote <a
    href="http://www-4.ibm.com/software/developer/library/gnome3/">an article
    for IBM developerWorks</a> about using libxml.</li>
  <li>Check <a href="http://cvs.gnome.org/lxr/source/gnome-xml/TODO">the TODO
    file</a></li>
  <li>Read the <a href="upgrade.html">1.x to 2.x upgrade path</a>.
If you are
    starting a new project using libxml you should really use the 2.x
    version.</li>
  <li>And don't forget to look at the <a href="/messages/">mailing-list
    archive</a>.</li>
</ol>

<h2><a name="Reporting">Reporting bugs and getting help</a></h2>

<p>Well, bugs or missing features are always possible, and I will make a
point of fixing them in a timely fashion. The best way to report a bug is to
use the <a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">Gnome
bug tracking database</a> (make sure to use the "libxml" module name). I look
at reports there regularly and it's good to have a reminder when a bug is
still open. Check the <a
href="http://bugzilla.gnome.org/bugwritinghelp.html">instructions on
reporting bugs</a> and be sure to specify that the bug is for the package
libxml.</p>

<p>There is also a mailing-list <a
href="mailto:xml@gnome.org">xml@gnome.org</a> for libxml, with an <a
href="http://mail.gnome.org/archives/xml/">on-line archive</a> (<a
href="http://xmlsoft.org/messages">old</a>). To subscribe to this list,
please visit the <a
href="http://mail.gnome.org/mailman/listinfo/xml">associated Web</a> page and
follow the instructions.
<strong>Do not send code, I won't debug it</strong>
(but patches are really appreciated!).</p>

<p>Check the following <strong><span style="color: #FF0000">before
posting</span></strong>:</p>
<ul>
  <li>read the <a href="FAQ.html">FAQ</a></li>
  <li>make sure you are <a href="ftp://xmlsoft.org/">using a recent
    version</a>, and that the problem still shows up in it</li>
  <li>check the <a href="http://mail.gnome.org/archives/xml/">list
    archives</a> to see if the problem was reported already (in this case
    there is probably a fix available); similarly, check the <a
    href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">registered
    open bugs</a></li>
  <li>make sure you can reproduce the bug with xmllint or one of the test
    programs found in source in the distribution</li>
  <li>Please send the command showing the error as well as the input (as an
    attachment)</li>
</ul>

<p>Then send the bug with the associated information to reproduce it to the <a
href="mailto:xml@gnome.org">xml@gnome.org</a> list; if it's really libxml
related I will approve it. Please do not send me mail directly: it makes
things much harder to track, and in some cases I'm not the best person to
answer a given question, so ask the list instead.</p>

<p>Of course, bugs reported with a suggested patch for fixing them will
probably be processed faster.</p>

<p>If you're looking for help, a quick look at <a
href="http://mail.gnome.org/archives/xml/">the list archive</a> may actually
provide the answer; I usually send source samples when answering libxml usage
questions.
The <a href="http://xmlsoft.org/html/book1.html">auto-generated
documentation</a> is not as polished as I would like (I need to learn more
about DocBook), but it's a good starting point.</p>

<h2><a name="help">How to help</a></h2>

<p>You can help the project in various ways. The best thing to do first is to
subscribe to the mailing-list as explained before and check the <a
href="http://mail.gnome.org/archives/xml/">archives</a> and the <a
href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">Gnome bug
database</a>. Then you can:</p>
<ol>
  <li>provide patches when you find problems</li>
  <li>provide the diffs when you port libxml to a new platform. They may not
    be integrated in all cases, but they help pinpoint portability
    problems.</li>
  <li>provide documentation fixes (either as patches to the code comments or
    as HTML diffs).</li>
  <li>provide new documentation pieces (translations, examples, etc ...)</li>
  <li>Check the TODO file and try to close one of the items</li>
  <li>take one of the points raised in the archive or the bug database and
    provide a fix. <a href="mailto:daniel@veillard.com">Get in touch with
    me</a> beforehand to avoid synchronization problems and to check that the
    suggested fix will fit in nicely :-)</li>
</ol>

<h2><a name="Downloads">Downloads</a></h2>

<p>The latest versions of libxml can be found on <a
href="ftp://xmlsoft.org/">xmlsoft.org</a> (<a
href="ftp://speakeasy.rpmfind.net/pub/libxml/">Seattle</a>, <a
href="ftp://fr.rpmfind.net/pub/libxml/">France</a>) or on the <a
href="ftp://ftp.gnome.org/pub/GNOME/MIRRORS.html">Gnome FTP server</a> either
as a <a href="ftp://ftp.gnome.org/pub/GNOME/stable/sources/libxml/">source
archive</a> or <a
href="ftp://ftp.gnome.org/pub/GNOME/stable/redhat/i386/libxml/">RPM
packages</a>.
(NOTE that you need both the <a
href="http://rpmfind.net/linux/RPM/libxml2.html">libxml(2)</a> and <a
href="http://rpmfind.net/linux/RPM/libxml2-devel.html">libxml(2)-devel</a>
packages installed to compile applications using libxml.) <a
href="mailto:izlatkovic@daenet.de">Igor Zlatkovic</a> is now the maintainer
of the Windows port; <a
href="http://www.fh-frankfurt.de/~igor/projects/libxml/index.html">he
provides binaries</a>.</p>

<p><a name="Snapshot">Snapshot:</a></p>
<ul>
  <li>Code from the W3C cvs base libxml <a
    href="ftp://xmlsoft.org/cvs-snapshot.tar.gz">cvs-snapshot.tar.gz</a></li>
  <li>Docs, content of the web site, the list archive included <a
    href="ftp://xmlsoft.org/libxml-docs.tar.gz">libxml-docs.tar.gz</a></li>
</ul>

<p><a name="Contribs">Contribs:</a></p>

<p>I do accept external contributions, especially ports to other platforms;
get in touch with me to upload the package. I will keep them in the
<a href="ftp://xmlsoft.org/contribs/">contrib directory</a>.</p>

<p>Libxml is also available from CVS:</p>
<ul>
  <li><p>The <a
    href="http://cvs.gnome.org/bonsai/rview.cgi?cvsroot=/cvs/gnome&amp;dir=gnome-xml">Gnome
    CVS base</a>.
Check the <a
    href="http://developer.gnome.org/tools/cvs.html">Gnome CVS Tools</a>
    page; the CVS module is <b>gnome-xml</b>.</p>
  </li>
  <li>The <strong>libxslt</strong> module is also present there</li>
</ul>

<h2><a name="News">News</a></h2>

<h3>CVS only : check the <a
href="http://cvs.gnome.org/lxr/source/gnome-xml/ChangeLog">Changelog</a> file
for a really accurate description</h3>

<p>Items floating around but not actively worked on; get in touch with me if
you want to test those:</p>
<ul>
  <li>Implementing <a href="http://xmlsoft.org/XSLT">XSLT</a>, this is done
    as a separate C library on top of libxml called libxslt</li>
  <li>Finishing up <a href="http://www.w3.org/TR/xptr">XPointer</a> and <a
    href="http://www.w3.org/TR/xinclude">XInclude</a></li>
  <li>(seems to be working but delayed from release) parsing/import of
    Docbook SGML docs</li>
</ul>

<h3>2.4.6: Oct 10 2001</h3>
<ul>
  <li>added and updated man pages by John Fleck</li>
  <li>portability and configure fixes</li>
  <li>an infinite loop in the HTML parser was removed (William)</li>
  <li>Windows makefile patches from Igor</li>
  <li>fixed half a dozen bugs reported for libxml or libxslt</li>
  <li>updated xmlcatalog to be able to modify SGML super catalogs</li>
</ul>

<h3>2.4.5: Sep 14 2001</h3>
<ul>
  <li>Remove a few annoying bugs in 2.4.4</li>
  <li>forces the HTML serializer to output decimal charrefs since some
    versions of Netscape can't handle hexadecimal ones</li>
</ul>

<h3>1.8.16: Sep 14 2001</h3>
<ul>
  <li>maintenance release of the old libxml1 branch, couple of bug and
    portability fixes</li>
</ul>

<h3>2.4.4: Sep 12 2001</h3>
<ul>
  <li>added --convert to xmlcatalog, bug fixes and cleanups of XML
    Catalog</li>
  <li>a few bug fixes and some portability changes</li>
  <li>some documentation cleanups</li>
</ul>

<h3>2.4.3: Aug 23 2001</h3>
<ul>
  <li>XML Catalog support, see the doc</li>
  <li>New NaN/Infinity floating point code</li>
  <li>A few bug fixes</li>
</ul>

<h3>2.4.2: Aug 15 2001</h3>
<ul>
  <li>adds xmlLineNumbersDefault() to control line number generation</li>
  <li>lots of bug fixes</li>
  <li>the Microsoft MSC project files should now be up to date</li>
  <li>inheritance of namespaces from DTD defaulted attributes</li>
  <li>fixes a serious potential security bug</li>
  <li>added a --format option to xmllint</li>
</ul>

<h3>2.4.1: July 24 2001</h3>
<ul>
  <li>possibility to keep line numbers in the tree</li>
  <li>some computation NaN fixes</li>
  <li>extension of the XPath API</li>
  <li>cleanup for alpha and ia64 targets</li>
  <li>patch to allow saving through HTTP PUT or POST</li>
</ul>

<h3>2.4.0: July 10 2001</h3>
<ul>
  <li>Fixed a few bugs in XPath, validation, and tree handling.</li>
  <li>Fixed XML Base implementation, added a couple of examples to the
    regression tests</li>
  <li>A bit of cleanup</li>
</ul>

<h3>2.3.14: July 5 2001</h3>
<ul>
  <li>fixed some entities problems and reduced memory requirements when
    substituting them</li>
  <li>lots of improvements in the XPath query interpreter, which can be
    substantially faster</li>
  <li>Makefiles and configure cleanups</li>
  <li>Fixes to XPath variable eval, and compare on empty node set</li>
  <li>HTML tag closing bug fixed</li>
  <li>Fixed a URI reference computation problem when validating</li>
</ul>

<h3>2.3.13: June 28 2001</h3>
<ul>
  <li>2.3.12 configure.in was broken as well as the push mode XML parser</li>
  <li>a few more fixes for compilation on Windows MSC by Yon Derek</li>
</ul>

<h3>1.8.14: June 28 2001</h3>
<ul>
  <li>Zbigniew Chyla gave a patch to use the old XML parser in push mode</li>
  <li>Small Makefile fix</li>
</ul>

<h3>2.3.12: June 26 2001</h3>
<ul>
  <li>lots
of cleanup</li>
  <li>a couple of validation fixes</li>
  <li>fixed line number counting</li>
  <li>fixed serious problems in the XInclude processing</li>
  <li>added support for UTF8 BOM at beginning of entities</li>
  <li>fixed a strange gcc optimizer bug in xpath handling of float, gcc-3.0
    miscompiles uri.c (William), Thomas Leitner provided a fix for the
    optimizer on Tru64</li>
  <li>incorporated Yon Derek and Igor Zlatkovic fixes and improvements for
    compilation on Windows MSC</li>
  <li>update of libxml-doc.el (Felix Natter)</li>
  <li>fixed 2 bugs in URI normalization code</li>
</ul>

<h3>2.3.11: June 17 2001</h3>
<ul>
  <li>updates to trio, Makefiles and configure should fix some portability
    problems (alpha)</li>
  <li>fixed some HTML serialization problems (pre, script, and block/inline
    handling), added encoding aware APIs, cleanup of this code</li>
  <li>added xmlHasNsProp()</li>
  <li>implemented a specific PI for encoding support in the DocBook SGML
    parser</li>
  <li>some XPath fixes (-Infinity, / as a function parameter and namespaces
    node selection)</li>
  <li>fixed a performance problem and an error in the validation code</li>
  <li>fixed XInclude routine to implement the recursive behaviour</li>
  <li>fixed xmlFreeNode problem when libxml is included statically twice</li>
  <li>added --version to xmllint for bug reports</li>
</ul>

<h3>2.3.10: June 1 2001</h3>
<ul>
  <li>fixed the SGML catalog support</li>
  <li>a number of reported bugs got fixed, in XPath, iconv detection,
    XInclude processing</li>
  <li>XPath string function should now handle unicode correctly</li>
</ul>

<h3>2.3.9: May 19 2001</h3>

<p>Lots of bugfixes, and added basic SGML catalog support:</p>
<ul>
  <li>HTML push bugfix #54891 and another patch from Jonas Borgström</li>
  <li>some serious speed optimisation again</li>
  <li>some documentation cleanups</li>
  <li>trying to get better linking on Solaris (-R)</li>
  <li>XPath API cleanup from Thomas Broyer</li>
  <li>Validation bug fixed #54631, added a patch from Gary Pennington, fixed
    xmlValidGetValidElements()</li>
  <li>Added an INSTALL file</li>
  <li>Attribute removal added to API: #54433</li>
  <li>added basic support for SGML catalogs</li>
  <li>fixed xmlKeepBlanksDefault(0) API</li>
  <li>bugfix in xmlNodeGetLang()</li>
  <li>fixed a small configure portability problem</li>
  <li>fixed an inversion of SYSTEM and PUBLIC identifiers in HTML
  documents</li>
</ul>

<h3>1.8.13: May 14 2001</h3>
<ul>
  <li>bugfixes release of the old libxml1 branch used by Gnome</li>
</ul>

<h3>2.3.8: May 3 2001</h3>
<ul>
  <li>Integrated an SGML DocBook parser for the Gnome project</li>
  <li>Fixed a few things in the HTML parser</li>
  <li>Fixed some XPath bugs raised by XSLT use, tried to fix the floating
    point portability issue</li>
  <li>Speed improvement (8M/s for SAX, 3M/s for DOM, 1.5M/s for
    DOM+validation using the XML REC as input and a 700MHz celeron).</li>
  <li>incorporated more Windows cleanup</li>
  <li>added xmlSaveFormatFile()</li>
  <li>fixed problems in copying nodes with entity references (gdome)</li>
  <li>removed some troubles surrounding the new validation module</li>
</ul>

<h3>2.3.7: April 22 2001</h3>
<ul>
  <li>lots of small bug fixes, corrected XPointer</li>
  <li>Non-deterministic content model validation support</li>
  <li>added xmlDocCopyNode for gdome2</li>
  <li>revamped the way the HTML parser handles end of tags</li>
  <li>XPath: corrections of namespaces support and number formatting</li>
  <li>Windows: Igor Zlatkovic patches for MSC compilation</li>
  <li>HTML output fixes from P C Chow and William M.
Brack</li>
  <li>Improved validation speed, noticeably for DocBook</li>
  <li>fixed a big bug with ID declared in external parsed entities</li>
  <li>portability fixes, update of Trio from Bjorn Reese</li>
</ul>

<h3>2.3.6: April 8 2001</h3>
<ul>
  <li>Code cleanup using extreme gcc compiler warning options, found and
    cleared half a dozen potential problems</li>
  <li>the Eazel team found an XML parser bug</li>
  <li>cleaned up the use of some of the string formatting functions; used
    the trio library code to provide the ones needed when the platform is
    missing them</li>
  <li>xpath: removed a memory leak and fixed the predicate evaluation
    problem, extended the testsuite and cleaned up the result. XPointer seems
    broken ...</li>
</ul>

<h3>2.3.5: Mar 23 2001</h3>
<ul>
  <li>Biggest change is separate parsing and evaluation of XPath expressions,
    there are some new APIs for this too</li>
  <li>included a number of bug fixes (XML push parser, 51876, notations,
    52299)</li>
  <li>Fixed some portability issues</li>
</ul>

<h3>2.3.4: Mar 10 2001</h3>
<ul>
  <li>Fixed bugs #51860 and #51861</li>
  <li>Added a global variable xmlDefaultBufferSize to allow default buffer
    size to be application tunable.</li>
  <li>Some cleanup in the validation code, still a bug left and this part
    should probably be rewritten to support ambiguous content model :-\</li>
  <li>Fix a couple of serious bugs introduced or raised by changes in 2.3.3
    parser</li>
  <li>Fixed another bug in xmlNodeGetContent()</li>
  <li>Bjorn fixed XPath node collection and Number formatting</li>
  <li>Fixed a loop reported in the HTML parsing</li>
  <li>blank spaces are reported even if the Dtd content model proves that
    they are formatting spaces, this is for XML conformance</li>
</ul>

<h3>2.3.3: Mar 1 2001</h3>
<ul>
  <li>small change in XPath for XSLT</li>
  <li>documentation cleanups</li>
  <li>fix in validation by Gary Pennington</li>
  <li>serious parsing performance improvements</li>
</ul>

<h3>2.3.2: Feb 24 2001</h3>
<ul>
  <li>chasing XPath bugs, found a bunch, completed some TODOs</li>
  <li>fixed a DTD parsing bug</li>
  <li>fixed a bug in xmlNodeGetContent</li>
  <li>ID/IDREF support partly rewritten by Gary Pennington</li>
</ul>

<h3>2.3.1: Feb 15 2001</h3>
<ul>
  <li>some XPath and HTML bug fixes for XSLT</li>
  <li>small extension of the hash table interfaces for the DOM gdome2
    implementation</li>
  <li>A few bug fixes</li>
</ul>

<h3>2.3.0: Feb 8 2001 (2.2.12 was on 25 Jan but I didn't keep track)</h3>
<ul>
  <li>Lots of XPath bug fixes</li>
  <li>Added a mode with DTD lookup but without validation error reporting for
    XSLT</li>
  <li>Added support for text nodes without escaping (XSLT)</li>
  <li>bug fixes for xmlCheckFilename</li>
  <li>validation code bug fixes from Gary Pennington</li>
  <li>Patch from Paul D. Smith correcting URI path normalization</li>
  <li>Patch to allow simultaneous install of libxml-devel and
    libxml2-devel</li>
  <li>the example Makefile is now fixed</li>
  <li>added HTML to the RPM packages</li>
  <li>tree copying bugfixes</li>
  <li>updates to Windows makefiles</li>
  <li>optimisation patch from Bjorn Reese</li>
</ul>

<h3>2.2.11: Jan 4 2001</h3>
<ul>
  <li>bunch of bug fixes (memory I/O, XPath, FTP/HTTP, ...)</li>
  <li>added htmlHandleOmittedElem()</li>
  <li>Applied Bjorn Reese's first IPv6 patch</li>
  <li>Applied Paul D.
Smith patches for validation of XInclude results</li>
  <li>added support for the new XPointer xmlns() scheme</li>
</ul>

<h3>2.2.10: Nov 25 2000</h3>
<ul>
  <li>Fixed the Windows problems of 2.2.8</li>
  <li>integrated OpenVMS patches</li>
  <li>better handling of some nasty HTML input</li>
  <li>Improved the XPointer implementation</li>
  <li>integrated a number of provided patches</li>
</ul>

<h3>2.2.9: Nov 25 2000</h3>
<ul>
  <li>erroneous release :-(</li>
</ul>

<h3>2.2.8: Nov 13 2000</h3>
<ul>
  <li>First version of <a href="http://www.w3.org/TR/xinclude">XInclude</a>
    support</li>
  <li>Patch in conditional section handling</li>
  <li>updated MS compiler project</li>
  <li>fixed some XPath problems</li>
  <li>added a URI escaping function</li>
  <li>some other bug fixes</li>
</ul>

<h3>2.2.7: Oct 31 2000</h3>
<ul>
  <li>added message redirection</li>
  <li>XPath improvements (thanks TOM!)</li>
  <li>xmlIOParseDTD() added</li>
  <li>various small fixes in the HTML, URI, HTTP and XPointer support</li>
  <li>some cleanup of the Makefile, autoconf and the distribution content</li>
</ul>

<h3>2.2.6: Oct 25 2000</h3>
<ul>
  <li>Added a hash table module, migrated a number of internal structures to
    it</li>
  <li>Fixed a posteriori validation problems</li>
  <li>HTTP module cleanups</li>
  <li>HTML parser improvements (tag errors, script/style handling, attribute
    normalization)</li>
  <li>coalescing of adjacent text nodes</li>
  <li>couple of XPath bug fixes, exported the internal API</li>
</ul>

<h3>2.2.5: Oct 15 2000</h3>
<ul>
  <li>XPointer implementation and testsuite</li>
  <li>Lots of XPath fixes, added variable and function registration, more
    tests</li>
  <li>Portability fixes, lots of enhancements toward an easy Windows build
    and release</li>
  <li>Late validation fixes</li>
  <li>Integrated a lot of contributed
patches</li>
  <li>added memory management docs</li>
  <li>a performance problem when using large buffers seems fixed</li>
</ul>

<h3>2.2.4: Oct 1 2000</h3>
<ul>
  <li>main XPath problem fixed</li>
  <li>Integrated portability patches for Windows</li>
  <li>Serious bug fixes in the URI and HTML code</li>
</ul>

<h3>2.2.3: Sep 17 2000</h3>
<ul>
  <li>bug fixes</li>
  <li>cleanup of entity handling code</li>
  <li>overall review of all loops in the parsers; all sprintf usage has been
    checked too</li>
  <li>Far better handling of large DTDs. Validating against the DocBook XML
    DTD works smoothly now.</li>
</ul>

<h3>1.8.10: Sep 6 2000</h3>
<ul>
  <li>bug fix release for some Gnome projects</li>
</ul>

<h3>2.2.2: August 12 2000</h3>
<ul>
  <li>mostly bug fixes</li>
  <li>started adding routines to access XML parser context options</li>
</ul>

<h3>2.2.1: July 21 2000</h3>
<ul>
  <li>a pure bug fix release</li>
  <li>fixed an encoding support problem when parsing from a memory block</li>
  <li>fixed a DOCTYPE parsing problem</li>
  <li>removed a bug in the function allowing the memory allocation routines
    to be overridden</li>
</ul>

<h3>2.2.0: July 14 2000</h3>
<ul>
  <li>applied a lot of portability fixes</li>
  <li>better encoding support/cleanup and saving (content is now always
    encoded in UTF-8)</li>
  <li>the HTML parser now correctly handles encodings</li>
  <li>added xmlHasProp()</li>
  <li>fixed a serious problem with &amp;#38;</li>
  <li>propagated the fix to the FTP client</li>
  <li>cleanup, bugfixes, etc ...</li>
  <li>Added a page about <a href="encoding.html">libxml Internationalization
    support</a></li>
</ul>

<h3>1.8.9: July 9 2000</h3>
<ul>
  <li>fixed the spec file so the RPMs should be better</li>
  <li>fixed a serious bug in the FTP implementation, released 1.8.9 to solve
    the rpmfind users' problem</li>
</ul>
<h3>2.1.1: July 1 2000</h3>
<ul>
  <li>fixed a couple of bugs in the 2.1.0 packaging</li>
  <li>improvements to the HTML parser</li>
</ul>

<h3>2.1.0 and 1.8.8: June 29 2000</h3>
<ul>
  <li>1.8.8 is mostly a commodity package for upgrading to libxml2 according
    to the <a href="upgrade.html">new instructions</a>. It fixes a nasty
    problem with &amp;#38; charref parsing</li>
  <li>2.1.0 also eases the upgrade from libxml v1 to the recent version. It
    also contains numerous fixes and enhancements:
    <ul>
      <li>added xmlStopParser() to stop parsing</li>
      <li>greatly improved parsing speed when there are large CDATA
        blocks</li>
      <li>includes XPath patches provided by Picdar Technology</li>
      <li>tried to fix as many DTD validation and namespace related
        problems as possible</li>
      <li>output to a given encoding has been added/tested</li>
      <li>lots of various fixes</li>
    </ul>
  </li>
</ul>

<h3>2.0.0: Apr 12 2000</h3>
<ul>
  <li>First public release of libxml2. If you are using libxml, it's a good
    idea to check the 1.x to 2.x upgrade instructions. NOTE: while initially
    scheduled for Apr 3, the release occurred only on Apr 12 due to a massive
    workload.</li>
  <li>The includes are now located under $prefix/include/libxml (instead of
    $prefix/include/gnome-xml); they are also referenced by
    <pre>#include &lt;libxml/xxx.h&gt;</pre>
    <p>instead of</p>
    <pre>#include "xxx.h"</pre>
  </li>
  <li>a new URI module for parsing URIs, strictly following RFC 2396</li>
  <li>the memory allocation routines used by libxml can now be overloaded
    dynamically by using xmlMemSetup()</li>
  <li>The previously CVS-only tool tester has been renamed
    <strong>xmllint</strong> and is now installed as part of the libxml2
    package</li>
  <li>The I/O interface has been revamped.
There are now ways to plug in
    specific I/O modules, either at the URI scheme detection level using
    xmlRegisterInputCallbacks() or by passing I/O functions when creating a
    parser context using xmlCreateIOParserCtxt()</li>
  <li>there is a C preprocessor macro LIBXML_VERSION providing the version
    number of the libxml module in use</li>
  <li>a number of optional features of libxml can now be excluded at
    configure time (FTP/HTTP/HTML/XPath/Debug)</li>
</ul>

<h3>2.0.0beta: Mar 14 2000</h3>
<ul>
  <li>This is the first beta release of libxml version 2</li>
  <li>It's available only from <a href="ftp://xmlsoft.org/">xmlsoft.org
    FTP</a>; it's packaged as libxml2-2.0.0beta and available as tar and
    RPMs</li>
  <li>This version is now the head in the Gnome CVS base; the old one is
    available under the tag LIB_XML_1_X</li>
  <li>This includes a very large set of changes. From a programmatic point of
    view, applications should not have to be modified too much; check the <a
    href="upgrade.html">upgrade page</a></li>
  <li>Some interfaces may change (especially a bit about encoding).</li>
  <li>the updates include:
    <ul>
      <li>fixed I18N support. ISO-Latin-x/UTF-8/UTF-16 (nearly) seem
        correctly handled now</li>
      <li>Better handling of entities, especially well-formedness checking
        and proper PEref extensions in external subsets</li>
      <li>DTD conditional sections</li>
      <li>Validation now correctly handles entity content</li>
      <li><a href="http://rpmfind.net/tools/gdome/messages/0039.html">changed
        structures to accommodate DOM</a></li>
    </ul>
  </li>
  <li>Serious progress was made toward compliance; <a
    href="conf/result.html">here are the results of the test</a> against the
    OASIS testsuite (except the Japanese tests since I don't support that
    encoding yet).
This URL is rebuilt every couple of hours using the CVS
    head version.</li>
</ul>

<h3>1.8.7: Mar 6 2000</h3>
<ul>
  <li>This is a bug fix release:</li>
  <li>It is possible to disable the ignorable blanks heuristic used by
    libxml-1.x; a new function xmlKeepBlanksDefault(0) will allow this. Note
    that for adherence to the XML spec, this behaviour will be disabled by
    default in 2.x. The same function will allow keeping compatibility for
    old code.</li>
  <li>Blanks in &lt;a&gt; &lt;/a&gt; constructs are not ignored anymore;
    avoiding heuristics is really the Right Way :-\</li>
  <li>The unchecked use of snprintf which was breaking libxml-1.8.6
    compilation on some platforms has been fixed</li>
  <li>nanoftp.c nanohttp.c: Fixed '#' and '?' stripping when processing
    URIs</li>
</ul>

<h3>1.8.6: Jan 31 2000</h3>
<ul>
  <li>added a nanoFTP transport module, debugged until the new version of <a
    href="http://rpmfind.net/linux/rpm2html/rpmfind.html">rpmfind</a> could
    use it without trouble</li>
</ul>

<h3>1.8.5: Jan 21 2000</h3>
<ul>
  <li>added APIs to parse a well-balanced chunk of XML (production <a
    href="http://www.w3.org/TR/REC-xml#NT-content">[43] content</a> of the
    XML spec)</li>
  <li>fixed a hideous bug in xmlGetProp pointed out by
    Rune.Djurhuus@fast.no</li>
  <li>Jody Goldberg &lt;jgoldberg@home.com&gt; provided another patch trying
    to solve the zlib checks problems</li>
  <li>The current state in the Gnome CVS base is expected to ship as 1.8.5
    with gnumeric soon</li>
</ul>

<h3>1.8.4: Jan 13 2000</h3>
<ul>
  <li>bug fixes, reintroduced xmlNewGlobalNs(), fixed xmlNewNs()</li>
  <li>all exit() calls should have been removed from libxml</li>
  <li>fixed a problem with INCLUDE_WINSOCK on the WIN32 platform</li>
  <li>added newDocFragment()</li>
</ul>

<h3>1.8.3: Jan 5 2000</h3>
<ul>
  <li>a Push interface for the XML and
HTML parsers</li>
  <li>a shell-like interface to the document tree (try tester --shell :-)</li>
  <li>lots of bug fixes and improvements added over the Xmas holidays</li>
  <li>fixed the DTD parsing code to work with the XHTML DTD</li>
  <li>added xmlRemoveProp(), xmlRemoveID() and xmlRemoveRef()</li>
  <li>Fixed bugs in xmlNewNs()</li>
  <li>External entity loading code has been revamped; it now uses
    xmlLoadExternalEntity(), and some fixes to entity processing were
    added</li>
  <li>cleaned up WIN32 includes of socket stuff</li>
</ul>

<h3>1.8.2: Dec 21 1999</h3>
<ul>
  <li>I got another problem with includes and C++; I hope this issue is fixed
    for good this time</li>
  <li>Added a few tree modification functions: xmlReplaceNode,
    xmlAddPrevSibling, xmlAddNextSibling, xmlNodeSetName and
    xmlDocSetRootElement</li>
  <li>Tried to improve the HTML output with help from <a
    href="mailto:clahey@umich.edu">Chris Lahey</a></li>
</ul>

<h3>1.8.1: Dec 18 1999</h3>
<ul>
  <li>various patches to avoid trouble when using libxml with C++ compilers
    (the "namespace" keyword and C escaping in include files)</li>
  <li>a problem in one of the core macros, IS_CHAR, was corrected</li>
  <li>fixed a bug introduced in 1.8.0 breaking default namespace processing,
    and more specifically the Dia application</li>
  <li>fixed a posteriori validation (validation after parsing, or by using a
    DTD not specified in the original document)</li>
  <li>fixed a bug in</li>
</ul>

<h3>1.8.0: Dec 12 1999</h3>
<ul>
  <li>cleanup, especially memory-wise</li>
  <li>the parser should be more reliable, especially the HTML one; it should
    not crash, whatever the input!</li>
  <li>Integrated various patches, especially a speedup improvement for large
    datasets from <a href="mailto:cnygard@bellatlantic.net">Carl Nygard</a>;
    configure with --with-buffers to enable
them.</li>
  <li>attribute normalization, oops, should have been added long ago!</li>
  <li>attributes defaulted from DTDs should be available; xmlSetProp() now
    does entity escaping by default.</li>
</ul>

<h3>1.7.4: Oct 25 1999</h3>
<ul>
  <li>Lots of HTML improvements</li>
  <li>Fixed some errors when saving both XML and HTML</li>
  <li>More examples; the regression tests should now look clean</li>
  <li>Fixed a bug with contiguous charrefs</li>
</ul>

<h3>1.7.3: Sep 29 1999</h3>
<ul>
  <li>portability problems fixed</li>
  <li>snprintf was used unconditionally, leading to link problems on systems
    where it's not available; fixed</li>
</ul>

<h3>1.7.1: Sep 24 1999</h3>
<ul>
  <li>The basic type for strings manipulated by libxml has been renamed in
    1.7.1 from <strong>CHAR</strong> to <strong>xmlChar</strong>. The reason
    is that CHAR was conflicting with a predefined type on Windows. However,
    on non-WIN32 environments, compatibility is provided by way of a
    <strong>#define</strong>.</li>
  <li>Fixed another error: the use of a structure field called errno,
    leading to trouble on platforms where it's a macro</li>
</ul>

<h3>1.7.0: Sep 23 1999</h3>
<ul>
  <li>Added the ability to fetch remote DTDs or parsed entities; see the <a
    href="html/libxml-nanohttp.html">nanohttp</a> module.</li>
  <li>Added an errno to report errors by another means than a simple
    printf-like callback</li>
  <li>Finished ID/IDREF support and checking when validating</li>
  <li>Serious memory leaks fixed (there is now a <a
    href="html/libxml-xmlmemory.html">memory wrapper</a> module)</li>
  <li>Improvement of the <a href="http://www.w3.org/TR/xpath">XPath</a>
    implementation</li>
  <li>Added an HTML parser front-end</li>
</ul>

<h2><a name="XML">XML</a></h2>

<p><a href="http://www.w3.org/TR/REC-xml">XML is a
standard</a> for
markup-based structured documents. Here is <a name="example">an example XML
document</a>:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;EXAMPLE prop1="gnome is great" prop2="&amp;amp; linux too"&gt;
  &lt;head&gt;
   &lt;title&gt;Welcome to Gnome&lt;/title&gt;
  &lt;/head&gt;
  &lt;chapter&gt;
   &lt;title&gt;The Linux adventure&lt;/title&gt;
   &lt;p&gt;bla bla bla ...&lt;/p&gt;
   &lt;image href="linus.gif"/&gt;
   &lt;p&gt;...&lt;/p&gt;
  &lt;/chapter&gt;
&lt;/EXAMPLE&gt;</pre>

<p>The first line is the XML declaration: it specifies that this is an XML
document and can also carry information about its encoding (none is declared
here, so the default UTF-8 encoding applies). The rest of the document is a
text format whose structure is specified by tags between angle brackets.
<strong>Each tag opened has to be closed</strong>. XML is pedantic about
this. However, if a tag is empty (no content), a single tag can serve as both
the opening and closing tag if it ends with <code>/&gt;</code> rather than
with <code>&gt;</code>. Note that, for example, the image tag has no content
(just an attribute) and is closed by ending the tag with
<code>/&gt;</code>.</p>

<p>XML can be applied successfully to a wide range of uses, from long-term
structured document maintenance (where it follows in the steps of SGML) to
simple data encoding mechanisms like configuration file formatting (glade),
spreadsheets (gnumeric), or even shorter-lived documents such as WebDAV, where
it is used to encode remote calls between a client and a server.</p>

<h2><a name="XSLT">XSLT</a></h2>

<p>Check <a href="http://xmlsoft.org/XSLT">the separate libxslt page</a></p>

<p><a href="http://www.w3.org/TR/xslt">XSL Transformations</a> is a
language for transforming XML documents into other XML documents (or
HTML/textual output).</p>

<p>A separate library called libxslt is being built on top of libxml2.
This
module "libxslt" can be found in the Gnome CVS base too.</p>

<p>You can check the <a
href="http://cvs.gnome.org/lxr/source/libxslt/FEATURES">features</a>
supported and the progress in the <a
href="http://cvs.gnome.org/lxr/source/libxslt/ChangeLog">Changelog</a>.</p>

<h2><a name="architecture">libxml architecture</a></h2>

<p>Libxml is made of multiple components; some of them are optional, and most
of the block interfaces are public. The main components are:</p>
<ul>
  <li>an Input/Output layer</li>
  <li>FTP and HTTP client layers (optional)</li>
  <li>an Internationalization layer managing encoding support</li>
  <li>a URI module</li>
  <li>the XML parser and its basic SAX interface</li>
  <li>an HTML parser using the same SAX interface (optional)</li>
  <li>a SAX tree module to build an in-memory DOM representation</li>
  <li>a tree module to manipulate the DOM representation</li>
  <li>a validation module using the DOM representation (optional)</li>
  <li>an XPath module for global lookup in a DOM representation
    (optional)</li>
  <li>a debug module (optional)</li>
</ul>

<p>Graphically this gives the following:</p>

<p><img src="libxml.gif" alt="a graphical view of the various components"></p>

<h2><a name="tree">The tree output</a></h2>

<p>The parser returns a tree built during the document analysis. The value
returned is an <strong>xmlDocPtr</strong> (i.e., a pointer to an
<strong>xmlDoc</strong> structure). This structure contains information such
as the file name, the document type, and a <strong>children</strong> pointer
to the top of the document content (more exactly, the first child of the
document node, which is itself the root of the tree). The tree is made of
<strong>xmlNode</strong>s, chained in doubly linked lists of siblings and
with a children&lt;-&gt;parent relationship.
An xmlNode can also carry properties (a chain of xmlAttr
structures). An attribute may have a value which is a list of TEXT or
ENTITY_REF nodes.</p>

<p>Here is an example (erroneous with respect to the XML spec, since there
should be only one ELEMENT under the root):</p>

<p><img src="structure.gif" alt=" structure.gif "></p>

<p>In the source package there is a small program (not installed by default)
called <strong>xmllint</strong> which parses XML files given as arguments and
prints them back as parsed. This is useful for detecting errors both in XML
code and in the XML parser itself. It has an option <strong>--debug</strong>
which prints the actual in-memory structure of the document; here is the
result with the <a href="#example">example</a> given before:</p>
<pre>DOCUMENT
version=1.0
standalone=true
  ELEMENT EXAMPLE
    ATTRIBUTE prop1
      TEXT
      content=gnome is great
    ATTRIBUTE prop2
      ENTITY_REF
      TEXT
      content= linux too
    ELEMENT head
      ELEMENT title
        TEXT
        content=Welcome to Gnome
    ELEMENT chapter
      ELEMENT title
        TEXT
        content=The Linux adventure
      ELEMENT p
        TEXT
        content=bla bla bla ...
      ELEMENT image
        ATTRIBUTE href
          TEXT
          content=linus.gif
      ELEMENT p
        TEXT
        content=...</pre>

<p>This should be useful for learning the internal representation model.</p>

<h2><a name="interface">The SAX interface</a></h2>

<p>Sometimes the DOM tree output is just too large to fit reasonably into
memory. In that case (and if you don't expect to save back the XML document
loaded using libxml), it's better to use the SAX interface of libxml. SAX is
a <strong>callback-based interface</strong> to the parser.
Before parsing,
the application layer registers a customized set of callbacks which are
called by the library as it progresses through the XML input.</p>

<p>To get more detailed step-by-step guidance on using the SAX interface of
libxml, see the <a
href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">nice
documentation</a> written by <a href="mailto:james@daa.com.au">James
Henstridge</a>.</p>

<p>You can debug the SAX behaviour by using the <strong>testSAX</strong>
program located in the gnome-xml module (it's usually not shipped in the
binary packages of libxml, but you can find it in the tar source
distribution). Here is the sequence of callbacks that would be reported by
testSAX when parsing the example XML document shown earlier:</p>
<pre>SAX.setDocumentLocator()
SAX.startDocument()
SAX.getEntity(amp)
SAX.startElement(EXAMPLE, prop1='gnome is great', prop2='&amp;amp; linux too')
SAX.characters( , 3)
SAX.startElement(head)
SAX.characters( , 4)
SAX.startElement(title)
SAX.characters(Welcome to Gnome, 16)
SAX.endElement(title)
SAX.characters( , 3)
SAX.endElement(head)
SAX.characters( , 3)
SAX.startElement(chapter)
SAX.characters( , 4)
SAX.startElement(title)
SAX.characters(The Linux adventure, 19)
SAX.endElement(title)
SAX.characters( , 4)
SAX.startElement(p)
SAX.characters(bla bla bla ..., 15)
SAX.endElement(p)
SAX.characters( , 4)
SAX.startElement(image, href='linus.gif')
SAX.endElement(image)
SAX.characters( , 4)
SAX.startElement(p)
SAX.characters(..., 3)
SAX.endElement(p)
SAX.characters( , 3)
SAX.endElement(chapter)
SAX.characters( , 1)
SAX.endElement(EXAMPLE)
SAX.endDocument()</pre>

<p>Most of the other interfaces of libxml are based on the DOM tree-building
facility, so nearly everything up to the end of this document presupposes the
use of the
standard DOM tree builder. Note that the DOM tree itself is built by
a set of registered default callbacks, without any specific internal
interface.</p>

<h2><a name="Validation">Validation &amp; DTDs</a></h2>

<p>Table of Contents:</p>
<ol>
  <li><a href="#General5">General overview</a></li>
  <li><a href="#definition">The definition</a></li>
  <li><a href="#Simple">Simple rules</a>
    <ol>
      <li><a href="#reference">How to reference a DTD from a
      document</a></li>
      <li><a href="#Declaring">Declaring elements</a></li>
      <li><a href="#Declaring1">Declaring attributes</a></li>
    </ol>
  </li>
  <li><a href="#Some">Some examples</a></li>
  <li><a href="#validate">How to validate</a></li>
  <li><a href="#Other">Other resources</a></li>
</ol>

<h3><a name="General5">General overview</a></h3>

<p>Well, what is validation and what is a DTD?</p>

<p>DTD is the acronym for Document Type Definition. This is a description of
the content of a family of XML files. It is part of the XML 1.0
specification, and allows one to describe and check that a given document
instance conforms to a set of rules detailing its structure and content.</p>

<p>Validation is the process of checking a document against a DTD (more
generally, against a set of construction rules).</p>

<p>The validation process and building DTDs are the two most difficult parts
of the XML life cycle. Briefly, a DTD defines all the possible elements to be
found within your document and what the formal shape of your document tree
is (by defining the allowed content of an element: either text, a regular
expression for the allowed list of children, or mixed content, i.e. both text
and children).
The DTD also defines the allowed attributes for all elements
and the types of the attributes.</p>

<h3><a name="definition1">The definition</a></h3>

<p>The <a href="http://www.w3.org/TR/REC-xml">W3C XML Recommendation</a> (<a
href="http://www.xml.com/axml/axml.html">Tim Bray's annotated version of
Rev1</a>):</p>
<ul>
  <li><a href="http://www.w3.org/TR/REC-xml#elemdecls">Declaring
  elements</a></li>
  <li><a href="http://www.w3.org/TR/REC-xml#attdecls">Declaring
  attributes</a></li>
</ul>

<p>Unfortunately, all this is inherited from the SGML world; the syntax is
ancient...</p>

<h3><a name="Simple1">Simple rules</a></h3>

<p>Writing DTDs can be done in multiple ways, and the rules to build them can
be radically different depending on whether you need something fixed or
something which can evolve over time. Really complex DTDs like the DocBook
ones are flexible but much harder to design. I will just focus on DTDs for
formats with a fixed, simple structure.
It is just a set of basic rules, and definitely neither exhaustive nor
usable for complex DTD design.</p>

<h4><a name="reference1">How to reference a DTD from a document</a>:</h4>

<p>Assuming the top element of the document is <code>spec</code> and the DTD
is placed in the file <code>mydtd</code> in the subdirectory
<code>dtds</code> of the directory from where the document was loaded:</p>

<p><code>&lt;!DOCTYPE spec SYSTEM "dtds/mydtd"&gt;</code></p>

<p>Notes:</p>
<ul>
  <li>the system string is actually a URI reference (as defined in <a
    href="http://www.ietf.org/rfc/rfc2396.txt">RFC 2396</a>) so you can use a
    full URL string indicating the location of your DTD on the Web; this is a
    really good thing to do if you want others to validate your document</li>
  <li>it is also possible to associate a <code>PUBLIC</code> identifier (a
    magic string) so that the DTD is looked up in catalogs on the client side
    without having to locate it on the web</li>
  <li>a DTD contains a set of element and attribute declarations, but they
    don't define what the root of the document should be. This is explicitly
    told to the parser/validator as the first element of the
    <code>DOCTYPE</code> declaration.</li>
</ul>

<h4><a name="Declaring2">Declaring elements</a>:</h4>

<p>The following declares an element <code>spec</code>:</p>

<p><code>&lt;!ELEMENT spec (front, body, back?)&gt;</code></p>

<p>It also expresses that the spec element contains one <code>front</code>,
one <code>body</code> and one optional <code>back</code> child element, in
this order. An element and its content model are thus defined in a single
declaration.
Similarly, the following declares
<code>div1</code> elements:</p>

<p><code>&lt;!ELEMENT div1 (head, (p | list | note)*, div2*)&gt;</code></p>

<p>This means div1 contains one <code>head</code>, then a series of optional
<code>p</code>, <code>list</code> and <code>note</code> elements in any order,
and then zero or more <code>div2</code> elements. And last but not least, an
element can contain text:</p>

<p><code>&lt;!ELEMENT b (#PCDATA)&gt;</code></p>

<p>This means <code>b</code> contains only text. An element can also have
mixed content (text and elements in no particular order):</p>

<p><code>&lt;!ELEMENT p (#PCDATA|a|ul|b|i|em)*&gt;</code></p>

<p><code>p</code> can contain text or <code>a</code>, <code>ul</code>,
<code>b</code>, <code>i</code> or <code>em</code> elements in no particular
order.</p>

<h4><a name="Declaring1">Declaring attributes</a>:</h4>

<p>Again, the attribute declaration includes its content definition:</p>

<p><code>&lt;!ATTLIST termdef name CDATA #IMPLIED&gt;</code></p>

<p>This means that the element <code>termdef</code> can have a
<code>name</code> attribute containing text (<code>CDATA</code>) which is
optional (<code>#IMPLIED</code>). The attribute value can also be restricted
to a set:</p>

<p><code>&lt;!ATTLIST list type (bullets|ordered|glossary)
"ordered"&gt;</code></p>

<p>This means the <code>list</code> element has a <code>type</code> attribute
with 3 allowed values, "bullets", "ordered" or "glossary", which defaults to
"ordered" if the attribute is not explicitly specified.</p>

<p>The content type of an attribute can be text (<code>CDATA</code>),
anchor/reference/references
(<code>ID</code>/<code>IDREF</code>/<code>IDREFS</code>), entity(ies)
(<code>ENTITY</code>/<code>ENTITIES</code>) or name token(s)
(<code>NMTOKEN</code>/<code>NMTOKENS</code>).
The following defines that a
<code>chapter</code> element can have an optional <code>id</code> attribute
of type <code>ID</code>, usable for reference from attributes of type
IDREF:</p>

<p><code>&lt;!ATTLIST chapter id ID #IMPLIED&gt;</code></p>

<p>The last value of an attribute definition can be <code>#REQUIRED</code>,
meaning that the attribute has to be given, <code>#IMPLIED</code>, meaning
that it is optional, or the default value (possibly prefixed by
<code>#FIXED</code> if it is the only value allowed).</p>

<p>Notes:</p>
<ul>
  <li>usually the attributes pertaining to a given element are declared in a
    single expression, but this is just a convention adopted by a lot of DTD
    writers:
    <pre>&lt;!ATTLIST termdef
          id      ID      #REQUIRED
          name    CDATA   #IMPLIED&gt;</pre>
    <p>The previous construct defines both <code>id</code> and
    <code>name</code> attributes for the element <code>termdef</code>.</p>
  </li>
</ul>

<h3><a name="Some1">Some examples</a></h3>

<p>The directory <code>test/valid/dtds/</code> in the libxml distribution
contains some complex DTD examples. The <code>test/valid/dia.xml</code>
example shows an XML file where the simple DTD is directly included within
the document.</p>

<h3><a name="validate1">How to validate</a></h3>

<p>The simplest way is to use the xmllint program that comes with libxml.
The
<code>--valid</code> option turns on validation of the files given as input;
for example, the following validates a copy of the first revision of the XML
1.0 specification:</p>

<p><code>xmllint --valid --noout test/valid/REC-xml-19980210.xml</code></p>

<p>The <code>--noout</code> option suppresses the output of the resulting
tree.</p>

<p>The <code>--dtdvalid dtd</code> option allows validating the document(s)
against a given DTD.</p>

<p>Libxml exports an API to handle DTDs and validation; check the <a
href="http://xmlsoft.org/html/libxml-valid.html">associated
description</a>.</p>

<h3><a name="Other1">Other resources</a></h3>

<p>DTDs are as old as SGML, so there may be a number of examples online. I
will just list one for now; other pointers are welcome:</p>
<ul>
  <li><a href="http://www.xml101.com:8081/dtd/">XML-101 DTD</a></li>
</ul>

<p>I suggest looking at the examples found under test/valid/dtds and any of
the large number of books available on XML.
The dia example in test/valid
should be both simple and complete enough to allow you to build your own.</p>

<h2><a name="Memory">Memory Management</a></h2>

<p>Table of Contents:</p>
<ol>
  <li><a href="#General3">General overview</a></li>
  <li><a href="#setting">Setting the libxml memory routines</a></li>
  <li><a href="#cleanup">Cleaning up after parsing</a></li>
  <li><a href="#Debugging">Debugging routines</a></li>
  <li><a href="#General4">General memory requirements</a></li>
</ol>

<h3><a name="General3">General overview</a></h3>

<p>The module <code><a
href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlmemory.h</a></code>
provides the interfaces to the libxml memory system:</p>
<ul>
  <li>libxml does not use the libc memory allocator directly but xmlFree(),
    xmlMalloc() and xmlRealloc()</li>
  <li>those routines can be redirected to a specific set of routines; by
    default they map to the libc ones, i.e. free(), malloc() and
    realloc()</li>
  <li>the xmlmemory.c module includes a set of debugging routines</li>
</ul>

<h3><a name="setting">Setting the libxml memory routines</a></h3>

<p>It is sometimes useful not to use the default memory allocator, either for
debugging, analysis or to implement a specific memory management behaviour
(like on embedded systems).
Two function calls are available to do so:</p>
<ul>
  <li><a href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMemGet()</a>
    which returns the current set of functions in use by the parser</li>
  <li><a
    href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMemSetup()</a>
    which allows you to set up a new set of memory allocation functions</li>
</ul>

<p>Of course, the call to xmlMemSetup() should be made before calling
any other libxml routine (unless you are sure your allocation routines are
compatible).</p>

<h3><a name="cleanup">Cleaning up after parsing</a></h3>

<p>Libxml is not stateless: a few memory structures need to be
allocated before the parser is fully functional (some encoding structures,
for example). This also means that once parsing is finished there is a tiny
amount of memory (a few hundred bytes) which can be reclaimed if you don't
reuse the parser immediately:</p>
<ul>
  <li><a href="http://xmlsoft.org/html/libxml-parser.html">xmlCleanupParser()</a>
    is a centralized routine to free the parsing states. Note that it won't
Note that it won't 1572 deallocate any produced tree if any (use the xmlFreeDoc() and related 1573 routines for this).</li> 1574 <li><a href="http://xmlsoft.org/html/libxml-parser.html">xmlInitParser 1575 ()</a> 1576 is the dual routine allowing to preallocate the parsing state which can 1577 be useful for example to avoid initialization reentrancy problems when 1578 using libxml in multithreaded applications</li> 1579</ul> 1580 1581<p>Generally xmlCleanupParser() is safe, if needed the state will be rebuild 1582at the next invocation of parser routines, but be careful of the consequences 1583in multithreaded applications.</p> 1584 1585<h3><a name="Debugging">Debugging routines</a></h3> 1586 1587<p>When configured using --with-mem-debug flag (off by default), libxml uses 1588a set of memory allocation debugging routineskeeping track of all allocated 1589blocks and the location in the code where the routine was called. A couple of 1590other debugging routines allow to dump the memory allocated infos to a file 1591or call a specific routine when a given block number is allocated:</p> 1592<ul> 1593 <li><a 1594 href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMallocLoc()</a> 1595 <a 1596 href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlReallocLoc()</a> 1597 and <a 1598 href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMemStrdupLoc()</a> 1599 are the memory debugging replacement allocation routines</li> 1600 <li><a href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMemoryDump 1601 ()</a> 1602 dumps all the informations about the allocated memory block lefts in the 1603 <code>.memdump</code> file</li> 1604</ul> 1605 1606<p>When developping libxml memory debug is enabled, the tests programs call 1607xmlMemoryDump () and the "make test" regression tests will check for any 1608memory leak during the full regression test sequence, this helps a lot 1609ensuring that libxml does not leak memory and bullet proof memory 1610allocations use (some libc 
implementations are known to be far too permissive,
resulting in major portability problems!).</p>

<p>If the .memdump reports a leak, it displays the allocation function and
also tries to give some information about the content and structure of the
allocated blocks left. This is sufficient in most cases to find the culprit,
but not always. Assuming the allocation problem is reproducible, it is
possible to find it more easily:</p>
<ol>
  <li>write down the number xxxx of the block that was not freed</li>
  <li>export the environment variable XML_MEM_BREAKPOINT=xxxx</li>
  <li>run the program under a debugger and set a breakpoint on
    xmlMallocBreakpoint(), a specific function called when this precise block
    is allocated</li>
  <li>when the breakpoint is reached, you can do a fine analysis of the
    allocation and step through the code to find the condition resulting in the
    missing deallocation.</li>
</ol>

<p>I used to use a commercial tool to debug libxml memory problems, but after
noticing that it was not detecting memory leaks, I switched to this simple
mechanism, which has proved extremely efficient so far.</p>

<h3><a name="General4">General memory requirements</a></h3>

<p>How much memory does libxml require? It's hard to tell; on average it depends
on a number of things:</p>
<ul>
  <li>the parser itself should work in a fixed amount of memory, except for
    information maintained about the stacks of names and entity locations.
    The I/O and encoding handlers will probably account for a few KBytes.
    This is true for both the XML and HTML parsers (though the HTML parser
    needs more state).</li>
  <li>If you are generating the DOM tree, then memory requirements will grow
    nearly linearly with the size of the data.
In general, for a balanced
    textual document the internal memory requirement is about 4 times the
    size of the UTF-8 serialization of this document (for example, the XML 1.0
    recommendation is a bit more than 150 KBytes and takes 650 KBytes of main
    memory when parsed). Validation will add the amount of memory required for
    maintaining the external DTD state, which should be linear with the
    complexity of the content model defined by the DTD.</li>
  <li>If you don't care about the advanced features of libxml like
    validation, DOM, XPath or XPointer, but really need to work within fixed
    memory requirements, then the SAX interface should be used.</li>
</ul>

<p></p>

<h2><a name="Encodings">Encodings support</a></h2>

<p>Table of Contents:</p>
<ol>
  <li><a href="encoding.html#What">What does internationalization support
    mean?</a></li>
  <li><a href="encoding.html#internal">The internal encoding, how and
    why</a></li>
  <li><a href="encoding.html#implemente">How is it implemented?</a></li>
  <li><a href="encoding.html#Default">Default supported encodings</a></li>
  <li><a href="encoding.html#extend">How to extend the existing
    support</a></li>
</ol>

<h3><a name="What">What does internationalization support mean?</a></h3>

<p>XML was designed from the start to allow the support of any character set
by using Unicode. Any conformant XML parser has to support the UTF-8 and
UTF-16 default encodings, which can both express the full Unicode range. UTF-8
is a variable-length encoding whose greatest advantages are to reuse the same
encoding for ASCII and to save space for Western encodings, but it is a bit
more complex to handle in practice. UTF-16 uses 2 bytes per character (and
sometimes combines two pairs); it makes implementation easier, but looks a
bit overkill for encoding Western languages.
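</p>

<p>To make the size trade-off concrete, here is a minimal standalone C sketch. It is not part of the libxml API, and the function names <code>utf8_encode()</code> and <code>utf16_length()</code> are hypothetical. It encodes one Unicode code point as UTF-8 and reports how many bytes UTF-16 would need for the same character: ASCII stays a single byte in UTF-8 and most accented Western letters take two, while UTF-16 never uses fewer than two.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 for a value that is not a
 * valid character (UTF-16 surrogates, or anything above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                        /* ASCII: one byte, unchanged */
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx + three continuation bytes */
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Bytes needed for the same code point in UTF-16: one 16-bit unit in the
 * Basic Multilingual Plane, a surrogate pair (two units) above it. */
size_t utf16_length(uint32_t cp)
{
    return cp < 0x10000 ? 2 : 4;
}
```

<p>For example, "&egrave;" (U+00E8) becomes the two bytes 0xC3 0xA8 in UTF-8, the same size as its UTF-16 form, whereas plain ASCII text takes half the space it would need in UTF-16.</p>
<p>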
Moreover, the XML specification
allows documents to be encoded in other encodings, on the condition that they
are clearly labelled as such. For example, the following is a well-formed XML
document encoded in ISO-8859-1 and using accented letters that we French
like for both markup and content:</p>
<pre>&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
&lt;tr&egrave;s&gt;l&agrave;&lt;/tr&egrave;s&gt;</pre>

<p>Having internationalization support in libxml means the following:</p>
<ul>
  <li>the document is properly parsed</li>
  <li>information about its encoding is saved</li>
  <li>it can be modified</li>
  <li>it can be saved in its original encoding</li>
  <li>it can also be saved in another encoding supported by libxml (for
    example straight UTF-8 or even an ASCII form)</li>
</ul>

<p>Another very important point is that the whole libxml API, with the
exception of a few routines to read with a specific encoding or save to a
specific encoding, is completely agnostic about the original encoding of the
document.</p>

<p>It should also be noted that the HTML parser embedded in libxml now obeys
the same rules: the following document will be (as of 2.2.2) handled in
an internationalized fashion by libxml:</p>
<pre>&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
                      "http://www.w3.org/TR/REC-html40/loose.dtd"&gt;
&lt;html lang="fr"&gt;
&lt;head&gt;
  &lt;META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;W3C cr&eacute;e des standards pour le Web.&lt;/body&gt;
&lt;/html&gt;</pre>

<h3><a name="internal">The internal encoding, how and why</a></h3>

<p>One of the core decisions was to force all documents to be converted to a
default internal encoding, and that encoding to be UTF-8. Here is the
rationale for those choices:</p>
<ul>
  <li>keeping the native encoding in the internal form would force the libxml
    users (or the code
associated) to be fully aware of the encoding of the
    original document. For example, when adding a text node to a document,
    the content would have to be provided in the document encoding, i.e. the
    client code would have to check it beforehand, make sure it conforms
    to the encoding, etc. This is very hard in practice, though in some specific
    cases it may make sense.</li>
  <li>the second decision was which encoding. From the XML spec, only UTF-8 and
    UTF-16 really make sense, being the only two encodings for which there
    is mandatory support. UCS-4 (a 32-bit fixed-size encoding) could be
    considered an intelligent choice too, since it maps Unicode code points
    directly. I selected UTF-8 on the basis of efficiency and compatibility
    with surrounding software:
    <ul>
      <li>UTF-8, while a bit more complex to convert from/to (i.e. slightly
        more costly to import and export CPU-wise), is also far more compact
        than UTF-16 (and UCS-4) for a majority of the documents I see it used
        for right now (RPM RDF catalogs, advogato data, various configuration
        file formats, etc.), and the key point for today's computer
        architecture is efficient use of caches.
If one nearly doubles the
        memory requirement to store the same amount of data, this will thrash
        caches (main memory/external caches/internal caches), and my take is
        that this harms the system far more than the CPU requirements needed
        for the conversion to UTF-8.</li>
      <li>Most libxml version 1 users were using it with straight ASCII
        most of the time; doing the conversion with an internal encoding
        requiring all their code to be rewritten was a serious show-stopper
        for using UTF-16 or UCS-4.</li>
      <li>UTF-8 is being used as the de facto internal encoding standard for
        related code like the <a href="http://www.pango.org/">pango</a>
        upcoming Gnome text widget, and a lot of Unix code (yet another place
        where the Unix programmer base takes a different approach from
        Microsoft, which uses UTF-16).</li>
    </ul>
  </li>
</ul>

<p>What does this mean in practice for the libxml user?</p>
<ul>
  <li>xmlChar, the libxml data type, is a byte; those bytes must be assembled
    into valid UTF-8 strings. The proper way to terminate an xmlChar * string
    is simply to append a 0 byte, as usual.</li>
  <li>One just needs to make sure that when using characters outside the ASCII
    set, the values have been properly converted to UTF-8.</li>
</ul>

<h3><a name="implemente">How is it implemented?</a></h3>

<p>Let's describe how all this works within libxml. Basically, the I18N
(internationalization) support gets triggered only during I/O operations, i.e.
when reading or saving a document. Let's look first at the reading
sequence:</p>
<ol>
  <li>when a document is processed, we usually don't know the encoding; a
    simple heuristic allows detecting UTF-16 and UCS-4 from encodings where
    the ASCII range (0-0x7F) maps to ASCII</li>
  <li>the XML declaration, if available, is parsed, including the encoding
    declaration.
At that point, if the autodetected encoding is different
    from the declared one, a call to xmlSwitchEncoding() is issued.</li>
  <li>If there is no encoding declaration, then the input has to be in either
    UTF-8 or UTF-16. If it is not, then at some point when processing the
    input, the UTF-8 converter/checker will raise an encoding error.
    You may end up with a garbled document, or no document at all! Example:
    <pre>~/XML -&gt; /xmllint err.xml
err.xml:1: error: Input is not proper UTF-8, indicate encoding !
&lt;tr&egrave;s&gt;l&agrave;&lt;/tr&egrave;s&gt;
   ^
err.xml:1: error: Bytes: 0xE8 0x73 0x3E 0x6C
&lt;tr&egrave;s&gt;l&agrave;&lt;/tr&egrave;s&gt;
   ^</pre>
  </li>
  <li>xmlSwitchEncoding() does an encoding name lookup, canonicalizes it, and
    then searches the default registered encoding converters for that encoding.
    If it's not within the default set and iconv() support has been compiled
    in, it will ask iconv for such an encoder. If this fails, the parser
    will report an error and stop processing:
    <pre>~/XML -&gt; /xmllint err2.xml
err2.xml:1: error: Unsupported encoding UnsupportedEnc
&lt;?xml version="1.0" encoding="UnsupportedEnc"?&gt;
                                             ^</pre>
  </li>
  <li>From that point, the encoder progressively processes the input (it is
    plugged in as a front-end to the I/O module) for that entity. It captures
    the document to be parsed and converts it on the fly to UTF-8. The parser
    itself just does UTF-8 checking of this input and processes it
    transparently. The only difference is that the encoding information has
    been added to the parsing context (more precisely, to the input
    corresponding to this entity).</li>
  <li>The result (when using DOM) is an internal form completely in UTF-8,
    with just the encoding information stored on the document node.</li>
</ol>

<p>OK, so what happens when saving the document (assuming you
collected/built an xmlDoc DOM-like structure)? It depends on the function
It depends on the function 1817called, xmlSaveFile() will just try to save in the original encoding, while 1818xmlSaveFileTo() and xmlSaveFileEnc() can optionally save to a given 1819encoding:</p> 1820<ol> 1821 <li>if no encoding is given, libxml will look for an encoding value 1822 associated to the document and if it exists will try to save to that 1823 encoding, 1824 <p>otherwise everything is written in the internal form, i.e. UTF-8</p> 1825 </li> 1826 <li>so if an encoding was specified, either at the API level or on the 1827 document, libxml will again canonalize the encoding name, lookup for a 1828 converter in the registered set or through iconv. If not found the 1829 function will return an error code</li> 1830 <li>the converter is placed before the I/O buffer layer, as another kind of 1831 buffer, then libxml will simply push the UTF-8 serialization to through 1832 that buffer, which will then progressively be converted and pushed onto 1833 the I/O layer.</li> 1834 <li>It is possible that the converter code fails on some input, for example 1835 trying to push an UTF-8 encoded chinese character through the UTF-8 to 1836 ISO-8859-1 converter won't work. Since the encoders are progressive they 1837 will just report the error and the number of bytes converted, at that 1838 point libxml will decode the offending character, remove it from the 1839 buffer and replace it with the associated charRef encoding &#123; and 1840 resume the convertion. This guarante that any document will be saved 1841 without losses (except for markup names where this is not legal, this is 1842 a problem in the current version, in pactice avoid using non-ascci 1843 characters for tags or attributes names @@). 
A special "ascii" encoding
    name can be used to save documents in a pure ASCII form when
    portability is really crucial.</li>
</ol>

<p>Here are a few examples based on the same test document:</p>
<pre>~/XML -&gt; /xmllint isolat1
&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
&lt;tr&egrave;s&gt;l&agrave;&lt;/tr&egrave;s&gt;
~/XML -&gt; /xmllint --encode UTF-8 isolat1
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;tr&egrave;s&gt;l&agrave;&lt;/tr&egrave;s&gt;
~/XML -&gt; </pre>

<p>The same processing is applied (reusing most of the code) for HTML I18N
processing. Looking up and modifying the content encoding is a bit more
difficult, since it is located in a &lt;meta&gt; tag under the &lt;head&gt;,
so a couple of functions, htmlGetMetaEncoding() and htmlSetMetaEncoding(), have
been provided. The parser also attempts to switch encoding on the fly when
detecting such a tag on input. Except for that, the processing is the same
(and again reuses the same code).</p>

<h3><a name="Default">Default supported encodings</a></h3>

<p>libxml has a set of default converters for the following encodings
(located in encoding.c):</p>
<ol>
  <li>UTF-8 is supported by default (null handlers)</li>
  <li>UTF-16, both little and big endian</li>
  <li>ISO-Latin-1 (ISO-8859-1), covering most Western languages</li>
  <li>ASCII, useful mostly for saving</li>
  <li>HTML, a specific handler for the conversion of UTF-8 to ASCII with HTML
    predefined entities like &amp;copy; for the copyright sign.</li>
</ol>

<p>Moreover, when compiled on a Unix platform with iconv support, the full set
of encodings supported by iconv can be used by libxml instantly.
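</p>

<p>As an illustration of the kind of conversion iconv provides, the following minimal sketch calls the POSIX <code>iconv_open()</code>/<code>iconv()</code> API directly. It is not libxml code, and the helper name <code>latin1_to_utf8()</code> is hypothetical. Encoding names accepted by <code>iconv_open()</code> vary between systems; "UTF-8" and "ISO-8859-1" are assumed to be available, as they are with glibc.</p>

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert an ISO-8859-1 buffer to UTF-8 using the POSIX iconv API.
 * Returns the number of bytes written to out, or (size_t) -1 if the
 * conversion cannot be set up or fails (e.g. out is too small). */
size_t latin1_to_utf8(const char *in, size_t inlen, char *out, size_t outlen)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");   /* (to, from) */
    if (cd == (iconv_t) -1)
        return (size_t) -1;

    char *inp = (char *) in;   /* iconv() takes char **, not const char ** */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outlen;

    /* Each Latin-1 byte >= 0x80 becomes two UTF-8 bytes; ASCII is copied. */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t) -1)
        return (size_t) -1;
    return outlen - outleft;
}
```

<p>Converting the ISO-8859-1 bytes of "tr&egrave;s" this way yields five UTF-8 bytes, since "&egrave;" expands from one byte to two.</p>
<p>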
On a
Linux machine with glibc 2.1, the list of supported encodings and aliases fills
3 full pages, and includes UCS-4, the full set of ISO-Latin encodings, and the
various Japanese ones.</p>

<h4>Encoding aliases</h4>

<p>From 2.2.3 on, libxml supports registering aliases for encoding names. The
goal is to be able to parse documents whose encoding is supported but whose
name differs (for example, from the default set of names accepted by
iconv). The following functions allow you to register and handle new aliases
for existing encodings. Once they are registered, libxml will automatically
look up the aliases when handling a document:</p>
<ul>
  <li>int xmlAddEncodingAlias(const char *name, const char *alias);</li>
  <li>int xmlDelEncodingAlias(const char *alias);</li>
  <li>const char * xmlGetEncodingAlias(const char *alias);</li>
  <li>void xmlCleanupEncodingAliases(void);</li>
</ul>

<h3><a name="extend">How to extend the existing support</a></h3>

<p>Adding support for a new encoding, or overriding one of the encoders
(assuming it is buggy), should not be hard: just write input and output
conversion routines to/from UTF-8, and register them using
xmlNewCharEncodingHandler(name, xxxToUTF8, UTF8Toxxx). They will then be
called automatically if the parser(s) encounter such an encoding name
(registering it in uppercase will help). The encoders, their arguments and
their expected return values are described in the encoding.h
header.</p>

<p>A quick note on the topic of subverting the parser to use a different
internal encoding than UTF-8: in some cases people will absolutely want to
keep the internal encoding different. I think it's still possible (but the
encoding must be compatible with ASCII on the same subrange), though I haven't
tried it.
The key is to override the default conversion routines (by
registering null encoders/decoders for your charsets), and to bypass the UTF-8
checking of the parser by setting the parser context charset
(<code>ctxt-&gt;charset</code>) to something other than XML_CHAR_ENCODING_UTF8,
but there is no guarantee that this will work. You may also have some trouble
saving back.</p>

<p>Basically, proper I18N support is important. It requires at least
libxml-2.0.0, but a lot of features and corrections are really available only
starting with 2.2.</p>

<h2><a name="IO">I/O Interfaces</a></h2>

<p>Table of Contents:</p>
<ol>
  <li><a href="#General1">General overview</a></li>
  <li><a href="#basic">The basic buffer type</a></li>
  <li><a href="#Input">Input I/O handlers</a></li>
  <li><a href="#Output">Output I/O handlers</a></li>
  <li><a href="#entities">The entities loader</a></li>
  <li><a href="#Example2">Example of customized I/O</a></li>
</ol>

<h3><a name="General1">General overview</a></h3>

<p>The module <code><a
href="http://xmlsoft.org/html/libxml-xmlio.html">xmlIO.h</a></code> provides
the interfaces to the libxml I/O system. This consists of 4 main parts:</p>
<ul>
  <li>Entities loader: this is a routine which tries to fetch the entities
    (files) based on their PUBLIC and SYSTEM identifiers. The default loader
    doesn't look at the public identifier, since libxml does not maintain a
    catalog. You can redefine your own entity loader by using
    <code>xmlGetExternalEntityLoader()</code> and
    <code>xmlSetExternalEntityLoader()</code>. <a
    href="#entities">Check the example</a>.</li>
  <li>Input I/O buffers, which are a convenience structure used by the
    parser(s) input layer to handle fetching the information that feeds the
    parser.
This
    provides buffering and is also a placeholder where the encoding
    converters to UTF-8 are piggy-backed.</li>
  <li>Output I/O buffers are similar to the Input ones and fulfill a similar
    task, but when generating a serialization from a tree.</li>
  <li>A mechanism to register sets of I/O callbacks and associate them with
    specific naming schemes, like the protocol part of the URIs.
    <p>This affects the default I/O operations and allows the use of specific
    I/O handlers for certain names.</p>
  </li>
</ul>

<p>The general mechanism used when loading http://rpmfind.net/xml.html, for
example, in the HTML parser is the following:</p>
<ol>
  <li>The default entity loader calls <code>xmlNewInputFromFile()</code> with
    the parsing context and the URI string.</li>
  <li>the URI string is checked against the existing registered handlers
    using their match() callback function; if the HTTP module was compiled
    in, it is registered and its match() function will succeed</li>
  <li>the open() function of the handler is called and, if successful, will
    return an I/O Input buffer</li>
  <li>the parser will then start reading from this buffer and progressively
    fetch information from the resource, calling the read() function of the
    handler until the resource is exhausted</li>
  <li>if an encoding change is detected, it will be installed on the input
    buffer, providing buffering and efficient use of the conversion
    routines</li>
  <li>once the parser has finished, the close() function of the handler is
    called once and the Input buffer and associated resources are
    deallocated.</li>
</ol>

<p>The user-defined callbacks are checked first, to allow overriding of the
default libxml I/O routines.</p>

<h3><a name="basic">The basic buffer type</a></h3>

<p>All the buffer manipulation handling is done using the
<code>xmlBuffer</code> type defined in
<code><a
href="http://xmlsoft.org/html/libxml-tree.html">tree.h</a></code>, which is a
resizable memory buffer. The buffer allocation strategy can be selected to be
either best-fit or exponential doubling (a CPU vs. memory use
trade-off). The values are <code>XML_BUFFER_ALLOC_EXACT</code> and
<code>XML_BUFFER_ALLOC_DOUBLEIT</code>, and can be set individually or on a
system-wide basis using <code>xmlBufferSetAllocationScheme()</code>. A number
of functions, with names starting with the <code>xmlBuffer...</code> prefix,
allow you to manipulate buffers.</p>

<h3><a name="Input">Input I/O handlers</a></h3>

<p>An Input I/O handler is a simple structure,
<code>xmlParserInputBuffer</code>, containing a context associated with the
resource (file descriptor, or pointer to a protocol handler), the read() and
close() callbacks to use, and an xmlBuffer. An extra xmlBuffer and a charset
encoding handler are also present to support charset conversion when
needed.</p>

<h3><a name="Output">Output I/O handlers</a></h3>

<p>An Output handler, <code>xmlOutputBuffer</code>, is completely similar to an
Input one, except that the callbacks are write() and close().</p>

<h3><a name="entities">The entities loader</a></h3>

<p>The entity loader resolves requests for new entities and creates inputs for
the parser. Creating an input from a filename or a URI string is done
through the xmlNewInputFromFile() routine. The default entity loader does not
handle the PUBLIC identifier associated with an entity (if any).
So it just
calls xmlNewInputFromFile() with the SYSTEM identifier (which is mandatory in
XML).</p>

<p>If you want to hook up a catalog mechanism, you simply need to
override the default entity loader. Here is an example:</p>
<pre>#include &lt;libxml/parser.h&gt;
#include &lt;libxml/parserInternals.h&gt;
#include &lt;libxml/xmlIO.h&gt;

xmlExternalEntityLoader defaultLoader = NULL;

xmlParserInputPtr
xmlMyExternalEntityLoader(const char *URL, const char *ID,
                          xmlParserCtxtPtr ctxt) {
    xmlParserInputPtr ret = NULL;
    const char *fileID = NULL;

    /* look up the fileID depending on ID (your catalog lookup goes here) */

    if (fileID != NULL) {
        ret = xmlNewInputFromFile(ctxt, fileID);
        if (ret != NULL)
            return(ret);
    }
    /* not found locally: fall back to the previously installed loader */
    if (defaultLoader != NULL)
        ret = defaultLoader(URL, ID, ctxt);
    return(ret);
}

int main(int argc, char **argv) {
    /* ... */

    /*
     * Install our own entity loader
     */
    defaultLoader = xmlGetExternalEntityLoader();
    xmlSetExternalEntityLoader(xmlMyExternalEntityLoader);

    /* ... */
    return(0);
}</pre>

<h3><a name="Example2">Example of customized I/O</a></h3>

<p>This example comes from <a href="http://xmlsoft.org/messages/0708.html">a
real use case</a>: xmlDocDump() closes the FILE * passed by the application,
and this was a problem.
The <a
href="http://xmlsoft.org/messages/0711.html">solution</a> was to define a
new output handler with the closing call deactivated:</p>
<ol>
  <li>First, define a new I/O output allocator where the output doesn't close
    the file:
    <pre>xmlOutputBufferPtr
xmlOutputBufferCreateOwn(FILE *file, xmlCharEncodingHandlerPtr encoder) {
    xmlOutputBufferPtr ret;

    if (xmlOutputCallbackInitialized == 0)
        xmlRegisterDefaultOutputCallbacks();

    if (file == NULL) return(NULL);
    ret = xmlAllocOutputBuffer(encoder);
    if (ret != NULL) {
        ret-&gt;context = file;
        ret-&gt;writecallback = xmlFileWrite;
        ret-&gt;closecallback = NULL;  /* No close callback */
    }
    return(ret);
}</pre>
  </li>
  <li>And then use it to save the document:
    <pre>FILE *f;
xmlOutputBufferPtr output;
xmlDocPtr doc;
int res;

f = ...
doc = ...

output = xmlOutputBufferCreateOwn(f, NULL);
res = xmlSaveFileTo(output, doc, NULL);
/* f is still open here: the application remains responsible for closing it */</pre>
  </li>
</ol>

<h2><a name="Catalog">Catalog support</a></h2>

<p>Table of Contents:</p>
<ol>
  <li><a href="#General2">General overview</a></li>
  <li><a href="#definition">The definitions</a></li>
  <li><a href="#Simple">Using catalogs</a></li>
  <li><a href="#Some">Some examples</a></li>
  <li><a href="#reference">How to tune catalog usage</a></li>
  <li><a href="#validate">How to debug catalog processing</a></li>
  <li><a href="#Declaring">How to create and maintain catalogs</a></li>
  <li><a href="#implemento">The implementor's corner: quick review of the
    API</a></li>
  <li><a href="#Other">Other resources</a></li>
</ol>

<h3><a name="General2">General overview</a></h3>

<p>What is a catalog? Basically, it's a lookup mechanism used when an entity
(a file or a remote resource) references another entity.
The catalog lookup
is inserted between the moment the reference is recognized by the software
(XML parser, stylesheet processing, or even images referenced for inclusion
in a rendering) and the time when loading that resource actually
starts.</p>

<p>It is basically used for 3 things:</p>
<ul>
  <li>mapping from "logical" names, the public identifiers, to a more
    concrete name usable for download (a URI). For example, it can associate
    the logical name
    <p>"-//OASIS//DTD DocBook XML V4.1.2//EN"</p>
    <p>of the DocBook 4.1.2 XML DTD with the actual URL where it can be
    downloaded</p>
    <p>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd</p>
  </li>
  <li>remapping from a given URL to another one, like an HTTP indirection
    saying that
    <p>"http://www.oasis-open.org/committes/tr.xsl"</p>
    <p>should really be looked up at</p>
    <p>"http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"</p>
  </li>
  <li>providing a local cache mechanism that allows loading the entities
    associated with public identifiers or remote resources. This is a really
    important feature for any significant deployment of XML or SGML, since it
    avoids the hazards and delays associated with fetching remote
    resources.</li>
</ul>

<h3><a name="definition">The definitions</a></h3>

<p>Libxml, as of 2.4.3, implements 2 kinds of catalogs:</p>
<ul>
  <li>the older SGML catalogs: the official spec is SGML Open Technical
    Resolution TR9401:1997, but it is better understood by reading <a
    href="http://www.jclark.com/sp/catalog.htm">the SP Catalog page</a> from
    James Clark. This is relatively old and not the preferred mode of
    operation of libxml.</li>
  <li><a href="http://www.oasis-open.org/committees/entity/spec.html">XML
    Catalogs</a>
    are far more flexible and more recent, use an XML syntax, and should scale
    much better.
This is the default option of libxml.</li>
</ul>

<p></p>

<h3><a name="Simple">Using catalogs</a></h3>

<p>In a normal environment, libxml will by default check for the presence of a
catalog in /etc/xml/catalog and, assuming it has been correctly populated,
the processing is completely transparent to the document user. To take a
concrete example, suppose you are authoring a DocBook document that
starts with the following DOCTYPE definition:</p>
<pre>&lt;?xml version='1.0'?&gt;
&lt;!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"&gt;</pre>

<p>When validating the document with libxml, the catalog will be
automatically consulted to look up the public identifier "-//Norman Walsh//DTD
DocBk XML V3.1.4//EN" and the system identifier
"http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd". If these entities have
been installed on your system and the catalogs actually point to them, libxml
will fetch them from the local disk.</p>

<p style="font-size: 10pt"><strong>Note</strong>: Really don't use this
DOCTYPE; it's a really old version, but it is fine as an example.</p>

<p>Libxml will check the catalog each time it is requested to load an
entity; this includes DTDs, external parsed entities, stylesheets, etc.
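</p>

<p>Conceptually, the lookup performed before each load can be sketched as follows. This is a simplified standalone illustration, not libxml's catalog code: the <code>struct catalog_entry</code> table, the <code>resolve_entity()</code> helper and the local paths used are hypothetical, and trying the public identifier before the system identifier is a simplification (the XML Catalogs specification defines its own precedence rules).</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory catalog entry: an identifier (public or system)
 * mapped to the URI of a locally installed copy. */
struct catalog_entry {
    const char *id;
    const char *local_uri;
};

/* Return the local URI registered for id, or NULL if there is none. */
static const char *catalog_lookup(const struct catalog_entry *cat, size_t n,
                                  const char *id)
{
    assert(cat != NULL || n == 0);
    if (id == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        if (strcmp(cat[i].id, id) == 0)
            return cat[i].local_uri;
    return NULL;
}

/* Resolve an external identifier before loading it: try the public
 * identifier, then the system identifier, and fall back to the system
 * identifier itself (the remote resource) when nothing matches. */
const char *resolve_entity(const struct catalog_entry *cat, size_t n,
                           const char *public_id, const char *system_id)
{
    const char *uri = catalog_lookup(cat, n, public_id);
    if (uri == NULL)
        uri = catalog_lookup(cat, n, system_id);
    return uri != NULL ? uri : system_id;
}
```

<p>When no entry matches, the sketch returns the system identifier unchanged, so the resource would be fetched from its original location.</p>
<p>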
If
your system is correctly configured, the whole authoring and processing phase
should use only local files, even though your document stays portable because
it uses the canonical public and system IDs referencing the remote
document.</p>

<h3><a name="Some">Some examples:</a></h3>

<p>Here are a couple of fragments from XML Catalogs used in libxml's early
regression tests in <code>test/catalogs</code>:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC
   "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
   uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
...</pre>

<p>This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are
written in XML, and there is a specific namespace for catalog elements,
"urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this
catalog is a <code>public</code> mapping; it allows associating a Public
Identifier with a URI.</p>
<pre>...
    &lt;rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                   rewritePrefix="file:///usr/share/xml/docbook/"/&gt;
...</pre>

<p>A <code>rewriteSystem</code> is a very powerful instruction: it says that
any URI starting with a given prefix should be looked up at another URI,
constructed by replacing the prefix with a new one. In effect, this acts like
a cache system for a full area of the Web. In practice, it is extremely useful
with a file prefix if you have installed a copy of those resources on your
local system.</p>
<pre>...
<delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegateURI uriStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/>
...</pre>

<p>Delegation is the core feature that allows building a tree of catalogs,
which is easier to maintain than a single catalog. Based on Public
Identifier, System Identifier or URI prefixes, it instructs the catalog
software to look up entries in another resource. The set of entries
presented here should be sufficient to redirect the resolution of all DocBook
references to the specific catalog in
<code>/usr/share/xml/docbook.xml</code>; that one could in turn delegate all
references for DocBook 4.1.2 to a specific catalog installed at the same time
as the DocBook resources on the local machine.</p>

<h3><a name="reference">How to tune catalog usage:</a></h3>

<p>The user can change the default catalog behaviour by redirecting queries
to their own set of catalogs. This is done by setting the
<code>XML_CATALOG_FILES</code> environment variable to a space-separated list
of catalogs; an empty value should deactivate loading of the default
<code>/etc/xml/catalog</code> catalog.</p>

<h3><a name="validate">How to debug catalog processing:</a></h3>

<p>Setting the <code>XML_DEBUG_CATALOG</code> environment variable makes
libxml output debugging information for each catalog operation, for
example:</p>
<pre>orchis:~/XML -> xmllint --memory --noout test/ent2
warning: failed to load external entity "title.xml"
orchis:~/XML -> export XML_DEBUG_CATALOG=
orchis:~/XML -> xmllint --memory --noout test/ent2
Failed to parse catalog /etc/xml/catalog
Failed to parse catalog /etc/xml/catalog
warning: failed to load external entity "title.xml"
Catalogs cleanup
orchis:~/XML -> </pre>

<p>The test/ent2 document references an entity; running the parser from
memory makes the base URI unavailable, so the "title.xml" entity cannot be
loaded. Setting the debug environment variable shows that an attempt is made
to load <code>/etc/xml/catalog</code>, but since that file is not present,
the resolution fails.</p>

<p>The most advanced way to debug XML catalog processing, however, is the
<strong>xmlcatalog</strong> command shipped with libxml2: it loads catalogs
and runs resolution queries so you can see what is going on. It is also used
for the regression tests:</p>
<pre>orchis:~/XML -> ./xmlcatalog test/catalogs/docbook.xml \
                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
orchis:~/XML -> </pre>

<p>To see what is going on, add one -v flag to increase the verbosity level
and report the processing done (a second flag also reports which elements
are recognized while parsing):</p>
<pre>orchis:~/XML -> ./xmlcatalog -v test/catalogs/docbook.xml \
                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
Parsing catalog test/catalogs/docbook.xml's content
Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
Catalogs cleanup
orchis:~/XML -> </pre>

<p>A shell interface is also available to debug and process multiple queries
(and for regression tests):</p>
<pre>orchis:~/XML -> ./xmlcatalog -shell test/catalogs/docbook.xml \
                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
> help
Commands available:
public PublicID: make a PUBLIC identifier lookup
system SystemID: make a SYSTEM identifier lookup
resolve PublicID SystemID: do a full resolver lookup
add 'type' 'orig' 'replace' : add an entry
del 'values' : remove values
dump: print the current catalog state
debug: increase the verbosity level
quiet: decrease the verbosity level
exit:  quit the shell
> public "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
> quit
orchis:~/XML -> </pre>

<p>This should be sufficient for most debugging purposes; it was actually
used heavily to debug the XML Catalog implementation itself.</p>

<h3><a name="Declaring">How to create and maintain</a> catalogs:</h3>

<p>XML Catalogs are XML files, so you can either use XML tools to manage them
or use <strong>xmlcatalog</strong>. The basic step is to create a catalog;
the --create option provides this facility:</p>
<pre>orchis:~/XML -> ./xmlcatalog --create tst.xml
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
         "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/>
orchis:~/XML -> </pre>

<p>By default xmlcatalog does not overwrite the original catalog and saves the
result on the standard output; this can be overridden using the --noout
option.
The <code>--add</code> command adds entries to the catalog:</p>
<pre>orchis:~/XML -> ./xmlcatalog --noout --create --add "public" \
    "-//OASIS//DTD DocBook XML V4.1.2//EN" \
    http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml
orchis:~/XML -> cat tst.xml
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
         "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
        uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
</catalog>
orchis:~/XML -> </pre>

<p>The <code>--add</code> option always takes 3 parameters, even though some
XML Catalog constructs (like nextCatalog) have only a single argument; in
that case just pass an empty string as the third parameter, and it will be
ignored.</p>

<p>Similarly, the <code>--del</code> option removes matching entries from the
catalog:</p>
<pre>orchis:~/XML -> ./xmlcatalog --del \
    "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
         "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/>
orchis:~/XML -> </pre>

<p>The catalog is now empty.
Note that <code>--del</code> matching is exact; it would have worked in the
same way with the Public ID string.</p>

<p>This is rudimentary but should be sufficient to manage a moderately
complex catalog tree of resources.</p>

<h3><a name="implemento">The implementor's corner, a quick review of the
API:</a></h3>

<p>First, as for every other module of libxml, there is an automatically
generated <a href="html/libxml-catalog.html">API page for catalog
support</a>.</p>

<p>The header for the catalog interfaces should be included as:</p>
<pre>#include <libxml/catalog.h></pre>

<p>The API is deliberately kept very simple. It is not obvious that
applications really need access to it, since catalog lookup is the default
behaviour of libxml (note: it is possible to completely override libxml's
default catalog handling by using <a
href="html/libxml-parser.html">xmlSetExternalEntityLoader</a> to plug in an
application-specific resolver).</p>

<p>Libxml supports two catalog lists:</p>
<ul>
  <li>the default one, global and shared by the whole application;</li>
  <li>a per-document catalog, built if the document uses the
    <code>oasis-xml-catalog</code> PIs to specify its own catalog list; it is
    associated with the parser context and destroyed when the parsing
    context is destroyed.</li>
</ul>

<p>The document catalog is used first, if it exists.</p>

<h4>Initialization routines:</h4>

<p>xmlInitializeCatalog(), xmlLoadCatalog() and xmlLoadCatalogs() should be
used at startup to initialize the catalog. If the catalog should be
initialized with specific values, call xmlLoadCatalog() or xmlLoadCatalogs()
before xmlInitializeCatalog(); otherwise xmlInitializeCatalog() performs a
default initialization first.</p>

<p>The xmlCatalogAddLocal() call is used by the parser to grow the
document's own catalog list if needed.</p>
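<p>The lookup order described above (document list first, global list as the
fallback) can be illustrated with a toy model in plain C. This is only a
sketch under stated assumptions, not libxml's code: the
<code>toy_entry</code> and <code>toy_catalog</code> types and the
<code>toy_lookup()</code> and <code>toy_resolve_public()</code> functions are
hypothetical, matching is an exact comparison on Public Identifiers, and
delegation, rewriting and the preference settings are ignored.</p>
<pre>#include <stddef.h>
#include <string.h>

/* Hypothetical toy types, NOT libxml's internal structures:
 * a catalog is modelled as an array of public ID -> URI pairs. */
typedef struct {
    const char *public_id;
    const char *uri;
} toy_entry;

typedef struct {
    const toy_entry *entries;
    size_t count;
} toy_catalog;

/* Exact-match lookup in one catalog; a NULL catalog matches nothing. */
static const char *toy_lookup(const toy_catalog *cat, const char *public_id) {
    size_t i;

    if (cat == NULL)
        return NULL;
    for (i = 0; i < cat->count; i++)
        if (strcmp(cat->entries[i].public_id, public_id) == 0)
            return cat->entries[i].uri;
    return NULL;
}

/* Consult the per-document catalog first (pass NULL when the document
 * declared none), then fall back to the global default catalog.
 * Returns NULL when neither catalog knows the identifier. */
const char *toy_resolve_public(const toy_catalog *document,
                               const toy_catalog *global,
                               const char *public_id) {
    const char *uri = toy_lookup(document, public_id);

    return (uri != NULL) ? uri : toy_lookup(global, public_id);
}</pre>

<p>With this model, an identifier listed in both catalogs resolves to the
document's entry, an identifier listed only in the global catalog still
resolves through the fallback, and an unknown identifier yields NULL.</p>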

<h4>Preferences setup:</h4>

<p>The XML Catalog specification requires the ability to select a default
preference between public and system delegation;
xmlCatalogSetDefaultPrefer() provides this. xmlCatalogSetDefaults() and
xmlCatalogGetDefaults() control whether XML Catalog resolution is forbidden,
or allowed for the global catalog, for the document catalog, or for both; the
default is to allow both.</p>

<p>And of course xmlCatalogSetDebug() enables debug messages (through the
xmlGenericError() mechanism).</p>

<h4>Querying routines:</h4>

<p>xmlCatalogResolve(), xmlCatalogResolveSystem(), xmlCatalogResolvePublic()
and xmlCatalogResolveURI() are relatively self-explanatory if you have read
the XML Catalog specification: they correspond to the algorithms of section
7. They should also work if you have loaded an SGML catalog, with simplified
semantics.</p>

<p>xmlCatalogLocalResolve() and xmlCatalogLocalResolveURI() are the same but
operate on the document catalog list.</p>

<h4>Cleanup and Miscellaneous:</h4>

<p>xmlCatalogCleanup() frees the global catalog; xmlCatalogFreeLocal() is the
per-document equivalent.</p>

<p>xmlCatalogAdd() and xmlCatalogRemove() are used to dynamically modify the
first catalog in the global list, and xmlCatalogDump() dumps a catalog's
state. These routines are primarily designed for xmlcatalog; I'm not sure
that exposing more complex interfaces (like navigation ones) would be really
useful.</p>

<p>xmlParseCatalogFile() is used to load XML Catalog files. It is similar to
xmlParseFile() except that it bypasses all catalog lookups; it is provided
because this functionality may be useful for client tools.</p>

<h4>Threaded environments:</h4>

<p>Since the catalog tree is built progressively, some care has been taken to
avoid trouble in multithreaded environments.
The code is now thread-safe, assuming that the libxml library has been
compiled with thread support.</p>

<p></p>

<h3><a name="Other">Other resources</a></h3>

<p>The XML Catalog specification is relatively recent, so there isn't much
literature to point to:</p>
<ul>
  <li>You can find a good rant from Norm Walsh about <a
    href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the
    need for catalogs</a>; it provides a lot of context, even if I don't
    agree with everything presented.</li>
  <li>An <a href="http://home.ccil.org/~cowan/XML/XCatalog.html">old XML
    catalog proposal</a> from John Cowan.</li>
  <li>The <a href="http://www.rddl.org/">Resource Directory Description
    Language</a> (RDDL), another catalog system, more oriented toward
    providing metadata for XML namespaces.</li>
  <li>The page of the OASIS Technical <a
    href="http://www.oasis-open.org/committees/entity/">Committee on Entity
    Resolution</a>, which maintains XML Catalog; you will find pointers to
    the specification update, some background, and pointers to other tools
    providing XML Catalog support.</li>
  <li>I have uploaded <a href="ftp://xmlsoft.org/test/dbk412catalog.tar.gz">a
    small tarball</a> containing XML Catalogs for DocBook 4.1.2, which seems
    to work fine for me.</li>
  <li>The <a href="http://www.xmlsoft.org/xmlcatalog_man.html">xmlcatalog
    manual page</a>.</li>
</ul>

<p>If you have suggestions for corrections or additions, simply contact
me:</p>

<h2><a name="library">The parser interfaces</a></h2>

<p>This section is intended to help programmers get bootstrapped using the
XML library from C. It is not intended to be exhaustive; I hope the
automatically generated documents will provide the required completeness,
as a separate set of documents.
The interfaces of the XML library are low-level by principle; there is nearly
zero abstraction. Those interested in a higher-level API should <a
href="#DOM">look at DOM</a>.</p>

<p>The <a href="html/libxml-parser.html">parser interfaces for XML</a> are
separated from the <a href="html/libxml-htmlparser.html">HTML parser
interfaces</a>. Let's have a look at how the XML parser can be called:</p>

<h3><a name="Invoking">Invoking the parser: the pull method</a></h3>

<p>Usually, the first thing to do is to read an XML input. The parser accepts
documents either from in-memory buffers or from files. The functions are
defined in "parser.h":</p>
<dl>
  <dt><code>xmlDocPtr xmlParseMemory(const char *buffer, int size);</code></dt>
    <dd><p>Parse an in-memory buffer of <code>size</code> bytes containing
      the document.</p>
    </dd>
</dl>
<dl>
  <dt><code>xmlDocPtr xmlParseFile(const char *filename);</code></dt>
    <dd><p>Parse an XML document contained in a (possibly compressed)
      file.</p>
    </dd>
</dl>

<p>The parser returns a pointer to the document structure (or NULL in case of
failure).</p>

<h3 id="Invoking1">Invoking the parser: the push method</h3>

<p>To let the application keep control while the document is being fetched
(which is common for GUI-based programs), libxml also provides a push
interface, as of version 1.8.3.
Here are the interface functions:</p>
<pre>xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax,
                                         void *user_data,
                                         const char *chunk,
                                         int size,
                                         const char *filename);
int              xmlParseChunk          (xmlParserCtxtPtr ctxt,
                                         const char *chunk,
                                         int size,
                                         int terminate);</pre>

<p>and here is a simple example showing how to use the interface:</p>
<pre>#include <stdio.h>
#include <libxml/parser.h>

/* Illustrative wrapper (the name is arbitrary): parse a file through the
 * push interface; returns NULL on failure. */
xmlDocPtr parseWithPush(const char *filename) {
    xmlDocPtr doc = NULL;
    FILE *f;

    f = fopen(filename, "rb");
    if (f != NULL) {
        int res, size = 1024;
        char chars[1024];
        xmlParserCtxtPtr ctxt;

        /* the first 4 bytes let the parser guess the encoding */
        res = fread(chars, 1, 4, f);
        if (res > 0) {
            ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                        chars, res, filename);
            if (ctxt != NULL) {
                /* feed the rest of the file chunk by chunk */
                while ((res = fread(chars, 1, size, f)) > 0) {
                    xmlParseChunk(ctxt, chars, res, 0);
                }
                /* terminate = 1 signals the end of the input */
                xmlParseChunk(ctxt, chars, 0, 1);
                doc = ctxt->myDoc;
                if (!ctxt->wellFormed) {
                    xmlFreeDoc(doc);  /* drop a partial tree */
                    doc = NULL;
                }
                xmlFreeParserCtxt(ctxt);
            }
        }
        fclose(f);
    }
    return doc;
}</pre>

<p>The HTML parser embedded into libxml also has a push interface; the
functions are just prefixed by "html" rather than "xml".</p>

<h3 id="Invoking2">Invoking the parser: the SAX interface</h3>

<p>The tree-building interface makes the parser memory-hungry, first loading
the document in memory and then building the tree itself. Reading a document
without building the tree is possible using the SAX interfaces (see SAX.h and
<a href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">James
Henstridge's documentation</a>). Note also that the push interface can be
limited to SAX: just use the first two arguments of
<code>xmlCreatePushParserCtxt()</code>.</p>

<h3><a name="Building">Building a tree from scratch</a></h3>

<p>The other way to get an XML tree in memory is by building it. There is a
set of functions dedicated to building new elements (also described in
<libxml/tree.h>).
For example, here is a piece of code that produces the XML document used in
the previous examples:</p>
<pre>#include <libxml/tree.h>

/* Illustrative function name: build the example document;
 * the caller frees it with xmlFreeDoc(). */
xmlDocPtr buildExample(void) {
    xmlDocPtr doc;
    xmlNodePtr root, tree, subtree;

    /* BAD_CAST turns char * literals into the xmlChar * the API expects */
    doc = xmlNewDoc(BAD_CAST "1.0");
    root = xmlNewDocNode(doc, NULL, BAD_CAST "EXAMPLE", NULL);
    xmlDocSetRootElement(doc, root);
    xmlSetProp(root, BAD_CAST "prop1", BAD_CAST "gnome is great");
    xmlSetProp(root, BAD_CAST "prop2", BAD_CAST "& linux too");
    tree = xmlNewChild(root, NULL, BAD_CAST "head", NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST "title", BAD_CAST "Welcome to Gnome");
    tree = xmlNewChild(root, NULL, BAD_CAST "chapter", NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST "title", BAD_CAST "The Linux adventure");
    subtree = xmlNewChild(tree, NULL, BAD_CAST "p", BAD_CAST "bla bla bla ...");
    subtree = xmlNewChild(tree, NULL, BAD_CAST "image", NULL);
    xmlSetProp(subtree, BAD_CAST "href", BAD_CAST "linus.gif");
    return doc;
}</pre>

<p>Not really rocket science ...</p>

<h3><a name="Traversing">Traversing the tree</a></h3>

<p>By <a href="html/libxml-tree.html">including "tree.h"</a>, your code has
access to the internal structure of all the elements of the tree. The field
names are meant to be simple: <strong>parent</strong>,
<strong>children</strong>, <strong>next</strong>, <strong>prev</strong>,
<strong>properties</strong>, etc.
For example, still with the previous example:</p>
<pre><code>doc->children->children->children</code></pre>

<p>points to the title element,</p>
<pre>doc->children->children->next->children->children</pre>

<p>points to the text node containing the chapter title "The Linux
adventure".</p>

<p><strong>NOTE</strong>: XML allows <em>PI</em>s and <em>comments</em> to be
present before the document root, so <code>doc->children</code> may point
to a node which is not the document's root element; the function
<code>xmlDocGetRootElement()</code> was added for this purpose.</p>

<h3><a name="Modifying">Modifying the tree</a></h3>

<p>Functions are provided for reading and writing the document content. Here
is an excerpt from the <a href="html/libxml-tree.html">tree API</a>:</p>
<dl>
  <dt><code>xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const
  xmlChar *value);</code></dt>
    <dd><p>This sets (or changes) an attribute carried by an ELEMENT node.
      The value can be NULL.</p>
    </dd>
</dl>
<dl>
  <dt><code>xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar
  *name);</code></dt>
    <dd><p>This function returns a pointer to a new copy of the property
      content. Note that the caller must deallocate the result with
      <code>xmlFree()</code>.</p>
    </dd>
</dl>

<p>Two functions are provided for reading and writing the text associated
with elements:</p>
<dl>
  <dt><code>xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar
  *value);</code></dt>
    <dd><p>This function takes an "external" string and converts it to one
      text node or possibly to a list of entity and text nodes.
All non-predefined entity references like &Gnome; will be stored
      internally as entity nodes, hence the result of the function may not be
      a single node.</p>
    </dd>
</dl>
<dl>
  <dt><code>xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int
  inLine);</code></dt>
    <dd><p>This function is the inverse of
      <code>xmlStringGetNodeList()</code>. It generates a new string
      containing the content of the text and entity nodes. Note the extra
      argument inLine: if it is set to 1, the function expands entity
      references. For example, instead of returning the &Gnome; XML
      encoding in the string, it will substitute it with its value (say,
      "GNU Network Object Model Environment").</p>
    </dd>
</dl>

<h3><a name="Saving">Saving a tree</a></h3>

<p>Three options are available:</p>
<dl>
  <dt><code>void xmlDocDumpMemory(xmlDocPtr cur, xmlChar **mem, int
  *size);</code></dt>
    <dd><p>Saves the document into a newly allocated buffer returned through
      <code>mem</code>, with its length stored in <code>size</code>; the
      caller must free the buffer with <code>xmlFree()</code>.</p>
    </dd>
</dl>
<dl>
  <dt><code>extern void xmlDocDump(FILE *f, xmlDocPtr doc);</code></dt>
    <dd><p>Dumps a document to an open stdio stream.</p>
    </dd>
</dl>
<dl>
  <dt><code>int xmlSaveFile(const char *filename, xmlDocPtr cur);</code></dt>
    <dd><p>Saves the document to a file. In this case, the compression
      interface is triggered if it has been turned on.</p>
    </dd>
</dl>

<h3><a name="Compressio">Compression</a></h3>

<p>The library transparently handles compression when doing file-based
accesses.
The level of compression on saves can be set either globally or individually
for one file:</p>
<dl>
  <dt><code>int xmlGetDocCompressMode (xmlDocPtr doc);</code></dt>
    <dd><p>Gets the document compression ratio (0-9).</p>
    </dd>
</dl>
<dl>
  <dt><code>void xmlSetDocCompressMode (xmlDocPtr doc, int mode);</code></dt>
    <dd><p>Sets the document compression ratio.</p>
    </dd>
</dl>
<dl>
  <dt><code>int xmlGetCompressMode(void);</code></dt>
    <dd><p>Gets the default compression ratio.</p>
    </dd>
</dl>
<dl>
  <dt><code>void xmlSetCompressMode(int mode);</code></dt>
    <dd><p>Sets the default compression ratio.</p>
    </dd>
</dl>

<h2><a name="Entities">Entities or no entities</a></h2>

<p>Entities in principle are similar to simple C macros. An entity defines an
abbreviation for a given string that you can reuse many times throughout the
content of your document. Entities are especially useful when a given string
may occur frequently within a document, or to confine the change needed to a
document to a restricted area in the internal subset of the document (at the
beginning). Example:</p>
<pre>1 <?xml version="1.0"?>
2 <!DOCTYPE EXAMPLE SYSTEM "example.dtd" [
3 <!ENTITY xml "Extensible Markup Language">
4 ]>
5 <EXAMPLE>
6    &xml;
7 </EXAMPLE></pre>

<p>Line 3 declares the xml entity. Line 6 uses the xml entity, by prefixing
its name with '&' and following it by ';' without any spaces added.
There are 5 entities predefined by XML, which let you escape characters with
a predefined meaning in some parts of the XML document content:
<strong>&lt;</strong> for the character '<', <strong>&gt;</strong>
for the character '>', <strong>&apos;</strong> for the character ''',
<strong>&quot;</strong> for the character '"', and
<strong>&amp;</strong> for the character '&'.</p>

<p>One of the problems related to entities is that you may want the parser to
substitute an entity's content so that you can see the replacement text in
your application. Or you may prefer to keep entity references as such in the
content, to be able to save the document back without losing this usually
precious information (if the user went through the pain of explicitly
defining entities, they may have a rather negative attitude if you blindly
substitute them to save time). The <a
href="html/libxml-parser.html#XMLSUBSTITUTEENTITIESDEFAULT">xmlSubstituteEntitiesDefault()</a>
function allows you to check and change the behaviour, which is to not
substitute entities by default.</p>

<p>Here is the DOM tree built by libxml for the previous document in the
default case:</p>
<pre>/gnome/src/gnome-xml -> ./xmllint --debug test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=
     ENTITY_REF
       INTERNAL_GENERAL_ENTITY xml
       content=Extensible Markup Language
     TEXT
     content=</pre>

<p>And here is the result when substituting entities:</p>
<pre>/gnome/src/gnome-xml -> ./tester --debug --noent test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content= Extensible Markup Language</pre>

<p>So, entities or no entities? It depends on your use case.
I suggest that you keep the non-substituting default behaviour and avoid
using entities in your XML document or data if you are not willing to handle
the entity reference nodes in the DOM tree.</p>

<p>Note that at save time libxml enforces the conversion of the predefined
entities where necessary to prevent well-formedness problems, and will also
transparently replace those with characters (i.e. it will not generate entity
reference nodes in the DOM tree or call the reference() SAX callback when
finding them in the input).</p>

<p><span style="background-color: #FF0000">WARNING</span>: handling entities
on top of the libxml SAX interface is difficult! If you plan to use
non-predefined entities in your documents, then the learning curve to handle
them using the SAX API may be long. If you plan to use complex documents, I
strongly suggest you consider using the DOM interface instead and let libxml
deal with the complexity rather than trying to do it yourself.</p>

<h2><a name="Namespaces">Namespaces</a></h2>

<p>The libxml library implements <a
href="http://www.w3.org/TR/REC-xml-names/">XML namespaces</a> support by
recognizing namespace constructs in the input, and does namespace lookup
automatically when building the DOM tree. A namespace declaration is
associated with an in-memory structure and all elements or attributes within
that namespace point to it. Hence testing the namespace is a simple and fast
equality operation at the user level.</p>

<p>I suggest that people using libxml use a namespace, and declare it in the
root element of their document as the default namespace. Then they don't need
to use the prefix in the content, but we will have a basis for future
semantic refinement and merging of data from different sources.
This doesn't increase the size of the XML output significantly, but
significantly increases its value in the long term. Example:</p>
<pre><mydoc xmlns="http://mydoc.example.org/schemas/">
   <elem1>...</elem1>
   <elem2>...</elem2>
</mydoc></pre>

<p>The namespace value has to be an absolute URL, but the URL doesn't have to
point to any existing resource on the Web. It will bind all the elements and
attributes with that URL. I suggest using a URL within a domain you control,
and that the URL contain some kind of version information if possible. For
example, <code>"http://www.gnome.org/gnumeric/1.0/"</code> is a good
namespace scheme.</p>

<p>Then when you load a file, make sure that a namespace carrying the
version-independent prefix is installed on the root element of your document,
and if the version information doesn't match something you know, warn the
user and be liberal in what you accept as the input. Also do *not* try to
base namespace checking on the prefix value. <foo:text> may be exactly the
same as <bar:text> in another document. What really matters is the URI
associated with the element or the attribute, not the prefix string (which is
just a shortcut for the full URI). In libxml, elements and attributes have an
<code>ns</code> field pointing to an xmlNs structure detailing the namespace
prefix and its URI.</p>

<p>People usually object to using namespaces together with validity checking.
I will try to make sure that using namespaces won't break validity checking,
so even if you plan to use or currently are using validation I strongly
suggest adding namespaces to your document. A default namespace scheme
<code>xmlns="http://...."</code> should not break validity even on less
flexible parsers.
Using namespaces to mix and differentiate content coming from multiple DTDs
will certainly break current validation schemes. I will try to provide ways
to do this, but this may not be portable or standardized.</p>

<h2><a name="Upgrading">Upgrading 1.x code</a></h2>

<p>Incompatible changes:</p>

<p>Version 2 of libxml is the first version introducing serious
backward-incompatible changes. The main goals were:</p>
<ul>
  <li>a general cleanup. A number of mistakes inherited from the very early
    versions couldn't be changed due to compatibility constraints, for
    example the "childs" field in the nodes.</li>
  <li>uniformization of the various nodes, at least for their header and link
    parts (doc, parent, children, prev, next); the goal is a simpler
    programming model and an easier task for DOM implementors.</li>
  <li>better conformance to the XML specification. For example, version 1.x
    had a heuristic to try to detect ignorable white spaces. As a result, the
    SAX events generated were ignorableWhitespace() while the specification
    requires characters() in that case. This also means that the DOM tree
    may now contain a number of nodes with blank text which were not present
    before.</li>
</ul>

<h3>How to fix libxml-1.x code:</h3>

<p>Client code designed to run with libxml 1.x may have to be changed to
compile against version 2.x. Here is a list of the changes I have collected;
it may not be complete, so if you find other required changes, <a
href="mailto:Daniel.Veillard@w3.org">drop me a mail</a>:</p>
<ol>
  <li>The package name has changed from libxml to libxml2, and the library
    name is now -lxml2.
There is a new xml2-config script which should be used to
    select the right parameters for libxml2.</li>
  <li>The node <strong>childs</strong> field has been renamed
    <strong>children</strong>, so s/childs/children/g should be applied
    (the probability of having "childs" anywhere else is close to 0).</li>
  <li>The document no longer has a <strong>root</strong> element; it has
    been replaced by <strong>children</strong>, and usually you will get a
    list of nodes there. For example, a DTD node for the internal subset
    (and its declarations) may be found in that list, as well as processing
    instructions or comments found before or after the document root element.
    Use <strong>xmlDocGetRootElement(doc)</strong> to get the root element of
    a document. Alternatively, if you are sure not to reference DTDs nor to
    have PIs or comments before or after the root element,
    s/->root/->children/g will probably do it.</li>
  <li>The white space issue is more complex: except in the special case of
    validating parsing, the line breaks and spaces usually used for indenting
    and formatting the document content become significant. So they are
    reported by SAX, and if you are using the DOM tree, corresponding nodes
    are generated. Two approaches can be taken:
    <ol>
      <li>the lazy one: use the compatibility call
        <strong>xmlKeepBlanksDefault(0)</strong>, but be aware that you are
        relying on a special (and possibly broken) set of heuristics of
        libxml to detect ignorable blanks. Don't complain if it breaks or
        makes your application not 100% clean w.r.t. its input.</li>
      <li>the Right Way: change your code to accept possibly insignificant
        blank characters, i.e. a tree populated with weird blank text
        nodes.
You can spot them using the convenience function
        <strong>xmlIsBlankNode(node)</strong>, which returns 1 for such blank
        nodes.</li>
    </ol>
    <p>Note also that with the new default the output functions don't add any
    extra indentation when saving a tree, in order to be able to round-trip
    (read and save) without inflating the document with extra formatting
    characters.</p>
  </li>
  <li>The include path has changed to $prefix/libxml/ and the includes
    themselves use this new prefix in include instructions. If you are
    using (as expected) the
    <pre>xml2-config --cflags</pre>
    <p>output to generate your compile commands, this will probably work out
    of the box.</p>
  </li>
  <li>xmlDetectCharEncoding takes an extra argument indicating the length in
    bytes of the head of the document available for character detection.</li>
</ol>

<h3>Ensuring both libxml-1.x and libxml-2.x compatibility</h3>

<p>Two new versions of libxml (1.8.11) and libxml2 (2.3.4) have been released
to allow a smooth upgrade of existing libxml v1 code while retaining
compatibility.
They offer the following:</p>
<ol>
  <li>similar include naming: one should use
    <strong>#include <libxml/...></strong> in both cases.</li>
  <li>similar identifiers, defined via macros, for the child and root fields:
    respectively <strong>xmlChildrenNode</strong> and
    <strong>xmlRootNode</strong>.</li>
  <li>a new macro, <strong>LIBXML_TEST_VERSION</strong>, which should be
    inserted once in the client code.</li>
</ol>

<p>So the roadmap to upgrade your existing libxml applications is the
following:</p>
<ol>
  <li>install the libxml-1.8.8 (and libxml-devel-1.8.8) packages</li>
  <li>find all occurrences where the xmlDoc <strong>root</strong> field is
    used and change it to <strong>xmlRootNode</strong></li>
  <li>similarly, find all occurrences where the xmlNode <strong>childs</strong>
    field is used and change it to <strong>xmlChildrenNode</strong></li>
  <li>add a <strong>LIBXML_TEST_VERSION</strong> macro somewhere in your
    <strong>main()</strong> or in the library init entry point</li>
  <li>Recompile and check compatibility; it should still work</li>
  <li>Change your configure script to look first for xml2-config and fall back
    to xml-config.
Use the --cflags and --libs output of the command as
    the include and linking parameters needed to use libxml.</li>
  <li>install libxml2-2.3.x and libxml2-devel-2.3.x (libxml-1.8.y and
    libxml-devel-1.8.y can be kept installed simultaneously)</li>
  <li>remove your config.cache, relaunch your configuration mechanism, and
    recompile; if steps 2 and 3 were done right, it should compile as-is</li>
  <li>Test that your application is still running correctly. If not, this may
    be due to extra empty nodes caused by formatting spaces being kept in
    libxml2, contrary to libxml1. In that case, insert
    <strong>xmlKeepBlanksDefault(0)</strong> in your code before calling the
    parser (next to <strong>LIBXML_TEST_VERSION</strong> is a fine place).</li>
</ol>

<p>Following those steps should work. It worked for some of my own code.</p>

<p>Let me emphasize that there are far more changes from libxml 1.x to 2.x
than the ones you may have to patch for. The overall code has been
considerably cleaned up, and conformance to the XML specification has been
drastically improved too. Don't take those changes as an excuse not to
upgrade; it may cost a lot in the long term.</p>

<h2><a name="DOM"></a><a name="Principles">DOM Principles</a></h2>

<p><a href="http://www.w3.org/DOM/">DOM</a> stands for the <em>Document
Object Model</em>; this is an API for accessing XML or HTML structured
documents. Native support for DOM in Gnome is on the way (module gnome-dom),
and will be based on gnome-xml.
This will be a far cleaner interface for
manipulating XML files within Gnome, since it won't expose the internal
structure.</p>

<p>The current DOM implementation on top of libxml is the <a
href="http://cvs.gnome.org/lxr/source/gdome2/">gdome2 Gnome module</a>, a
full DOM interface, thanks to Paolo Casarini; check the <a
href="http://www.cs.unibo.it/~casarini/gdome2/">Gdome2 homepage</a> for more
information.</p>

<h2><a name="Example"></a><a name="real">A real example</a></h2>

<p>Here is a real-size example, where the actual application data is not
kept in the DOM tree but in internal structures. It is based on a proposal to
keep a database of jobs related to Gnome, with an XML-based storage
structure. Here is an <a href="gjobs.xml">XML encoded jobs base</a>:</p>
<pre><?xml version="1.0"?>
<gjob:Helping xmlns:gjob="http://www.gnome.org/some-location">
  <gjob:Jobs>

    <gjob:Job>
      <gjob:Project ID="3"/>
      <gjob:Application>GBackup</gjob:Application>
      <gjob:Category>Development</gjob:Category>

      <gjob:Update>
        <gjob:Status>Open</gjob:Status>
        <gjob:Modified>Mon, 07 Jun 1999 20:27:45 -0400 MET DST</gjob:Modified>
        <gjob:Salary>USD 0.00</gjob:Salary>
      </gjob:Update>

      <gjob:Developers>
        <gjob:Developer>
        </gjob:Developer>
      </gjob:Developers>

      <gjob:Contact>
        <gjob:Person>Nathan Clemons</gjob:Person>
        <gjob:Email>nathan@windsofstorm.net</gjob:Email>
        <gjob:Company>
        </gjob:Company>
        <gjob:Organisation>
        </gjob:Organisation>
        <gjob:Webpage>
        </gjob:Webpage>
        <gjob:Snailmail>
        </gjob:Snailmail>
        <gjob:Phone>
        </gjob:Phone>
      </gjob:Contact>

      <gjob:Requirements>
      The program should be released as free software, under the GPL.
      </gjob:Requirements>

      <gjob:Skills>
      </gjob:Skills>

      <gjob:Details>
      A GNOME based system that will allow a superuser to configure
      compressed and uncompressed files and/or file systems to be backed
      up with a supported media in the system. This should be able to
      perform via find commands generating a list of files that are passed
      to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine
      or via operations performed on the filesystem itself. Email
      notification and GUI status display very important.
      </gjob:Details>

    </gjob:Job>

  </gjob:Jobs>
</gjob:Helping></pre>

<p>While loading the XML file into an internal DOM tree is a matter of
calling only a couple of functions, browsing the tree to gather the data and
generate the internal structures is harder and more error-prone.</p>

<p>The suggested principle is to be tolerant with respect to the input
structure. For example, the ordering of the attributes is not significant;
the XML specification is clear about it. It's also usually a good idea not to
depend on the order of the children of a given node, unless it really makes
things harder.
Here is some code to parse the information for a person:</p>
<pre>#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/tree.h>

/* trace helper used below; redefine as empty to silence the output */
#define DEBUG(x) printf(x)

/*
 * A person record; the strings are allocated by libxml
 * and should be released with xmlFree()
 */
typedef struct person {
    xmlChar *name;
    xmlChar *email;
    xmlChar *company;
    xmlChar *organisation;
    xmlChar *smail;
    xmlChar *webPage;
    xmlChar *phone;
} person, *personPtr;

/*
 * And the code needed to parse it
 */
personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    personPtr ret = NULL;

    DEBUG("parsePerson\n");
    /*
     * allocate the struct
     */
    ret = (personPtr) malloc(sizeof(person));
    if (ret == NULL) {
        fprintf(stderr,"out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(person));

    /* We don't care what the top level element name is */
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {
        /* only accept elements from the application namespace */
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Person")) && (cur->ns == ns))
            ret->name = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Email")) && (cur->ns == ns))
            ret->email = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        cur = cur->next;
    }

    return(ret);
}</pre>

<p>Here are a couple of things to notice:</p>
<ul>
  <li>Usually a recursive parsing style is the most convenient one: XML data
    is by nature subject to repetitive constructs and usually exhibits highly
    structured patterns.</li>
  <li>The two arguments of type <em>xmlDocPtr</em> and <em>xmlNsPtr</em> are
    the pointer to the global XML document and the namespace reserved to
    the application. Document-wide information is needed, for example, to
    decode entities, and it's good coding practice to define a namespace for
    your application's set of data and to test that the elements and
    attributes you're analyzing actually pertain to your application space.
This is done by a simple equality test (cur->ns == ns).</li>
  <li>To retrieve text and attribute values, you can use the function
    <em>xmlNodeListGetString</em> to gather all the text and entity reference
    nodes generated by the DOM output and produce a single text string.</li>
</ul>

<p>Here is another piece of code used to parse another level of the
structure:</p>
<pre>#include <libxml/tree.h>
/*
 * a Description for a Job
 */
typedef struct job {
    xmlChar *projectID;
    xmlChar *application;
    xmlChar *category;
    personPtr contact;
    int nbDevelopers;
    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
} job, *jobPtr;

/*
 * And the code needed to parse it
 */
jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    jobPtr ret = NULL;

    DEBUG("parseJob\n");
    /*
     * allocate the struct
     */
    ret = (jobPtr) malloc(sizeof(job));
    if (ret == NULL) {
        fprintf(stderr,"out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(job));

    /* We don't care what the top level element name is */
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {

        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Project")) && (cur->ns == ns)) {
            /* the project ID is carried by an attribute, not by text content */
            ret->projectID = xmlGetProp(cur, (const xmlChar *) "ID");
            if (ret->projectID == NULL) {
                fprintf(stderr, "Project has no ID\n");
            }
        }
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Application")) && (cur->ns == ns))
            ret->application = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Category")) && (cur->ns == ns))
            ret->category = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        /* the Contact subtree is handled by parsePerson() */
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Contact")) && (cur->ns == ns))
            ret->contact = parsePerson(doc, ns, cur);
        cur = cur->next;
    }

    return(ret);
}</pre>

<p>Once you are used to it, writing this kind of code is quite simple, but
boring.
Ultimately, it could be possible to write stub generators taking either C
data structure definitions, a set of XML examples or an XML DTD and producing
the code needed to import and export the content between C data and XML
storage. This is left as an exercise to the reader :-)</p>

<p>Feel free to use <a href="example/gjobread.c">the code for the full C
parsing example</a> as a template; it is also available with a Makefile in the
Gnome CVS base under gnome-xml/example.</p>

<h2><a name="Contributi">Contributions</a></h2>
<ul>
  <li>Bjorn Reese, William Brack and Thomas Broyer have provided a number of
    patches; Gary Pennington worked on the validation API, threading support
    and the Solaris port.</li>
  <li>John Fleck helps maintain the documentation and man pages.</li>
  <li><a href="mailto:ari@lusis.org">Ari Johnson</a>
     provides a C++ wrapper for libxml:
    <p>Website: <a
    href="http://lusis.org/~ari/xml++/">http://lusis.org/~ari/xml++/</a></p>
    <p>Download: <a
    href="http://lusis.org/~ari/xml++/libxml++.tar.gz">http://lusis.org/~ari/xml++/libxml++.tar.gz</a></p>
  </li>
  <li><a href="mailto:izlatkovic@daenet.de">Igor Zlatkovic</a>
     is now the maintainer of the Windows port; <a
    href="http://www.fh-frankfurt.de/~igor/projects/libxml/index.html">he
    provides binaries</a></li>
  <li><a href="mailto:Gary.Pennington@sun.com">Gary Pennington</a>
     provides <a href="http://pages.eidosnet.co.uk/~garypen/libxml/">Solaris
    binaries</a></li>
  <li><a
    href="http://mail.gnome.org/archives/xml/2001-March/msg00014.html">Matt
    Sergeant</a>
     developed <a href="http://axkit.org/download/">XML::LibXSLT</a>, a Perl
    wrapper for libxml2/libxslt as part of the <a
    href="http://axkit.com/">AxKit XML application server</a></li>
  <li><a href="mailto:fnatter@gmx.net">Felix Natter</a>
     and <a href="mailto:geertk@ai.rug.nl">Geert Kloosterman</a> provide <a
    href="libxml-doc.el">an Emacs module</a> to look up libxml(2) function
    documentation</li>
  <li><a href="mailto:sherwin@nlm.nih.gov">Ziying Sherwin</a>
    provided <a href="http://xmlsoft.org/messages/0488.html">man
    pages</a></li>
  <li>There is a module for <a
    href="http://acs-misc.sourceforge.net/nsxml.html">libxml/libxslt support
    in OpenNSD/AOLServer</a></li>
  <li><a href="mailto:dkuhlman@cutter.rexx.com">Dave Kuhlman</a>
    provides libxml/libxslt <a href="http://www.rexx.com/~dkuhlman">wrappers
    for Python</a></li>
</ul>

<p></p>

<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>

<p>$Id: xml.html,v 1.114 2001/10/24 12:35:52 veillard Exp $</p>
</body>
</html>