<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<link rel="SHORTCUT ICON" href="/favicon.ico">
<style type="text/css"><!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>The parser interfaces</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="gnome2.png" alt="Gnome2 Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a><div align="left"><a href="http://xmlsoft.org/"><img src="Libxml2-Logo-180x168.gif" alt="Made with Libxml2 Logo"></a></div>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>The parser interfaces</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1"
align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XMLinfo.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="python.html">Python and bindings</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li><a href="tutorial/index.html">Tutorial</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr>
<tr><td bgcolor="#fffacd">
<form action="search.php" enctype="application/x-www-form-urlencoded" method="GET">
<input name="query" type="TEXT" size="20"
value=""><input name="submit" type="submit" value="Search ...">
</form>
<ul>
<li><a href="APIchunk0.html">Alphabetic</a></li>
<li><a href="APIconstructors.html">Constructors</a></li>
<li><a href="APIfunctions.html">Functions/Types</a></li>
<li><a href="APIfiles.html">Modules</a></li>
<li><a href="APIsymbols.html">Symbols</a></li>
</ul>
</td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li>
<li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li>
<li><a href="http://www.zveno.com/open_source/libxml2xslt.html">MacOsX binaries</a></li>
<li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml&amp;product=libxml2">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>This section is intended to help programmers get started with
the XML library from the C language. It is not intended to be
exhaustive. I hope the automatically generated documents will provide the
completeness required, but as a separate set of documents.
The interfaces of
the XML library are low level by design: there is nearly zero abstraction.
Those interested in a higher level API should <a href="#DOM">look at
DOM</a>.</p>
<p>The <a href="html/libxml-parser.html">parser interfaces for XML</a> are
separated from the <a href="html/libxml-htmlparser.html">HTML parser
interfaces</a>. Let's have a look at how the XML parser can be called:</p>
<h3><a name="Invoking">Invoking the parser: the pull method</a></h3>
<p>Usually, the first thing to do is to read an XML input. The parser accepts
documents either from in-memory buffers or from files. The functions are
defined in "parser.h":</p>
<dl>
<dt><code>xmlDocPtr xmlParseMemory(const char *buffer, int size);</code></dt>
  <dd>
<p>Parse a document held in a memory buffer of <code>size</code> bytes.</p>
  </dd>
</dl>
<dl>
<dt><code>xmlDocPtr xmlParseFile(const char *filename);</code></dt>
  <dd>
<p>Parse an XML document contained in a (possibly compressed)
    file.</p>
  </dd>
</dl>
<p>The parser returns a pointer to the document structure (or NULL in case of
failure).</p>
<h3 id="Invoking1">Invoking the parser: the push method</h3>
<p>In order for the application to keep control while the document is
being fetched (which is common for GUI based programs) libxml provides a push
interface, too, as of version 1.8.3.
Here are the interface functions:</p>
<pre>xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax,
                                         void *user_data,
                                         const char *chunk,
                                         int size,
                                         const char *filename);
int              xmlParseChunk          (xmlParserCtxtPtr ctxt,
                                         const char *chunk,
                                         int size,
                                         int terminate);</pre>
<p>and here is a simple example showing how to use the interface:</p>
<pre>    #include &lt;stdio.h&gt;
    #include &lt;libxml/parser.h&gt;

    xmlDocPtr doc = NULL;   /* receives the parsed tree */
    FILE *f;

    f = fopen(filename, "r");
    if (f != NULL) {
        int res, size = 1024;
        char chars[1024];
        xmlParserCtxtPtr ctxt;

        /* the first 4 bytes let the parser detect the encoding */
        res = fread(chars, 1, 4, f);
        if (res &gt; 0) {
            ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                                           chars, res, filename);
            if (ctxt != NULL) {
                /* feed the rest of the file chunk by chunk */
                while ((res = fread(chars, 1, size, f)) &gt; 0) {
                    xmlParseChunk(ctxt, chars, res, 0);
                }
                /* an empty chunk with terminate set to 1 ends the parse */
                xmlParseChunk(ctxt, chars, 0, 1);
                doc = ctxt-&gt;myDoc;
                xmlFreeParserCtxt(ctxt);
            }
        }
        fclose(f);
    }</pre>
<p>The HTML parser embedded into libxml also has a push interface; the
functions are just prefixed by "html" rather than "xml".</p>
<h3 id="Invoking2">Invoking the parser: the SAX interface</h3>
<p>The tree-building interface makes the parser memory-hungry, first loading
the document in memory and then building the tree itself. Reading a document
without building the tree is possible using the SAX interfaces (see SAX.h and
<a href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">James
Henstridge's documentation</a>). Note also that the push interface can be
limited to SAX: just use the first two arguments of
<code>xmlCreatePushParserCtxt()</code>.</p>
<h3><a name="Building">Building a tree from scratch</a></h3>
<p>The other way to get an XML tree in memory is by building it. Basically
there is a set of functions dedicated to building new elements. (These are
also described in &lt;libxml/tree.h&gt;.)
For example, here is a piece of
code that produces the XML document used in the previous examples:</p>
<pre>    #include &lt;libxml/tree.h&gt;
    xmlDocPtr doc;
    xmlNodePtr root, tree, subtree;

    /* BAD_CAST converts C string literals to the xmlChar * the API expects */
    doc = xmlNewDoc(BAD_CAST "1.0");
    root = xmlNewDocNode(doc, NULL, BAD_CAST "EXAMPLE", NULL);
    xmlDocSetRootElement(doc, root);   /* doc-&gt;children now points to root */
    xmlSetProp(root, BAD_CAST "prop1", BAD_CAST "gnome is great");
    xmlSetProp(root, BAD_CAST "prop2", BAD_CAST "&amp; linux too");
    tree = xmlNewChild(root, NULL, BAD_CAST "head", NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST "title", BAD_CAST "Welcome to Gnome");
    tree = xmlNewChild(root, NULL, BAD_CAST "chapter", NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST "title", BAD_CAST "The Linux adventure");
    subtree = xmlNewChild(tree, NULL, BAD_CAST "p", BAD_CAST "bla bla bla ...");
    subtree = xmlNewChild(tree, NULL, BAD_CAST "image", NULL);
    xmlSetProp(subtree, BAD_CAST "href", BAD_CAST "linus.gif");</pre>
<p>Not really rocket science ...</p>
<h3><a name="Traversing">Traversing the tree</a></h3>
<p>Basically by <a href="html/libxml-tree.html">including "tree.h"</a> your
code has access to the internal structure of all the elements of the tree.
The field names are straightforward: <strong>parent</strong>,
<strong>children</strong>, <strong>next</strong>, <strong>prev</strong>,
<strong>properties</strong>, etc.
For example, still with the previous
example:</p>
<pre><code>doc-&gt;children-&gt;children-&gt;children</code></pre>
<p>points to the title element inside head,</p>
<pre>doc-&gt;children-&gt;children-&gt;next-&gt;children-&gt;children</pre>
<p>points to the text node containing the chapter title "The Linux
adventure".</p>
<p>
<strong>NOTE</strong>: XML allows <em>PI</em>s and <em>comments</em> to be
present before the document root, so <code>doc-&gt;children</code> may point
to a node which is not the document root element; the function
<code>xmlDocGetRootElement()</code> was added to retrieve the root element
reliably.</p>
<h3><a name="Modifying">Modifying the tree</a></h3>
<p>Functions are provided for reading and writing the document content. Here
is an excerpt from the <a href="html/libxml-tree.html">tree API</a>:</p>
<dl>
<dt><code>xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const
    xmlChar *value);</code></dt>
  <dd>
<p>This sets (or changes) an attribute carried by an ELEMENT node.
    The value can be NULL.</p>
  </dd>
</dl>
<dl>
<dt><code>xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar
    *name);</code></dt>
  <dd>
<p>This function returns a pointer to a new copy of the property
    content. Note that the user must deallocate the result with
    <code>xmlFree()</code>.</p>
  </dd>
</dl>
<p>Two functions are provided for reading and writing the text associated
with elements:</p>
<dl>
<dt><code>xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar
    *value);</code></dt>
  <dd>
<p>This function takes an "external" string and converts it to one
    text node or possibly to a list of entity and text nodes.
All
    non-predefined entity references like &amp;Gnome; will be stored
    internally as entity nodes, hence the result of the function may not be
    a single node.</p>
  </dd>
</dl>
<dl>
<dt><code>xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int
    inLine);</code></dt>
  <dd>
<p>This function is the inverse of
    <code>xmlStringGetNodeList()</code>. It generates a new string
    containing the content of the text and entity nodes. Note the extra
    argument inLine. If this argument is set to 1, the function will expand
    entity references. For example, instead of returning the &amp;Gnome;
    XML encoding in the string, it will substitute it with its value (say,
    "GNU Network Object Model Environment").</p>
  </dd>
</dl>
<h3><a name="Saving">Saving a tree</a></h3>
<p>Three options are available:</p>
<dl>
<dt><code>void xmlDocDumpMemory(xmlDocPtr cur, xmlChar **mem, int
    *size);</code></dt>
  <dd>
<p>Saves the document into a newly allocated buffer, returned through
    <code>mem</code> with its length in <code>size</code>. The caller must
    free the buffer with <code>xmlFree()</code>.</p>
  </dd>
</dl>
<dl>
<dt><code>int xmlDocDump(FILE *f, xmlDocPtr doc);</code></dt>
  <dd>
<p>Dumps a document to an open file stream.</p>
  </dd>
</dl>
<dl>
<dt><code>int xmlSaveFile(const char *filename, xmlDocPtr cur);</code></dt>
  <dd>
<p>Saves the document to a file. In this case, the compression
    interface is triggered if it has been turned on.</p>
  </dd>
</dl>
<h3><a name="Compressio">Compression</a></h3>
<p>The library transparently handles compression when doing file-based
accesses.
The level of compression on saves can be set either globally
or individually for one document:</p>
<dl>
<dt><code>int xmlGetDocCompressMode (xmlDocPtr doc);</code></dt>
  <dd>
<p>Gets the document compression level (0-9).</p>
  </dd>
</dl>
<dl>
<dt><code>void xmlSetDocCompressMode (xmlDocPtr doc, int mode);</code></dt>
  <dd>
<p>Sets the document compression level.</p>
  </dd>
</dl>
<dl>
<dt><code>int xmlGetCompressMode(void);</code></dt>
  <dd>
<p>Gets the default compression level.</p>
  </dd>
</dl>
<dl>
<dt><code>void xmlSetCompressMode(int mode);</code></dt>
  <dd>
<p>Sets the default compression level.</p>
  </dd>
</dl>
<p><a href="bugs.html">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>