<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<style type="text/css"><!--
TD {font-size: 10pt; font-family: Verdana,Arial,Helvetica}
BODY {font-size: 10pt; font-family: Verdana,Arial,Helvetica; margin-top: 5pt; margin-left: 0pt; margin-right: 0pt}
H1 {font-size: 16pt; font-family: Verdana,Arial,Helvetica}
H2 {font-size: 14pt; font-family: Verdana,Arial,Helvetica}
H3 {font-size: 12pt; font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>The parser interfaces</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>The parser interfaces</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XML.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://www.cs.unibo.it/~casarini/gdome2/">DOM gdome2</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://pages.eidosnet.co.uk/~garypen/libxml/">Solaris binaries</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>This section is intended to help programmers get started with the XML
library from the C language. It is not meant to be exhaustive; I hope the
automatically generated documents, which form a separate set, will provide the
required completeness. The interfaces of the XML library are low level by
design, with nearly zero abstraction. Those interested in a higher-level API
should <a href="#DOM">look at DOM</a>.</p>
<p>The <a href="html/libxml-parser.html">parser interfaces for XML</a> are
separate from the <a href="html/libxml-htmlparser.html">HTML parser
interfaces</a>. Let's have a look at how the XML parser can be called:</p>
<h3><a name="Invoking">Invoking the parser: the pull method</a></h3>
<p>Usually, the first thing to do is to read XML input. The parser accepts
documents either from in-memory buffers or from files. The functions are
defined in &quot;parser.h&quot;:</p>
<dl>
<dt><code>xmlDocPtr xmlParseMemory(char *buffer, int size);</code></dt>
<dd><p>Parse a document held in an in-memory buffer of <code>size</code>
      bytes.</p></dd>
</dl>
<dl>
<dt><code>xmlDocPtr xmlParseFile(const char *filename);</code></dt>
<dd><p>Parse an XML document contained in a (possibly compressed)
      file.</p></dd>
</dl>
<p>Both functions return a pointer to the document structure, or NULL in case
of failure.</p>
<h3 id="Invoking1">Invoking the parser: the push method</h3>
<p>So that the application can keep control while the document is being
fetched (which is common in GUI-based programs), libxml also provides a push
interface, available since version 1.8.3. Here are the interface functions:</p>
<pre>xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax,
                                         void *user_data,
                                         const char *chunk,
                                         int size,
                                         const char *filename);
int              xmlParseChunk          (xmlParserCtxtPtr ctxt,
                                         const char *chunk,
                                         int size,
                                         int terminate);</pre>
<p>and here is a simple example showing how to use the interface:</p>
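<pre>            #include &lt;stdio.h&gt;
            #include &lt;libxml/parser.h&gt;

            xmlDocPtr doc = NULL;
            FILE *f;

            f = fopen(filename, &quot;rb&quot;);
            if (f != NULL) {
                int res, size = 1024;
                char chars[1024];
                xmlParserCtxtPtr ctxt;

                /* Read 4 bytes first so the parser can detect the encoding. */
                res = fread(chars, 1, 4, f);
                if (res &gt; 0) {
                    ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                                chars, res, filename);
                    if (ctxt != NULL) {
                        /* Feed the rest of the file chunk by chunk. */
                        while ((res = fread(chars, 1, size, f)) &gt; 0) {
                            xmlParseChunk(ctxt, chars, res, 0);
                        }
                        /* terminate = 1 tells the parser the input is complete. */
                        xmlParseChunk(ctxt, chars, 0, 1);
                        doc = ctxt-&gt;myDoc;
                        if (!ctxt-&gt;wellFormed) {
                            xmlFreeDoc(doc);   /* discard a malformed document */
                            doc = NULL;
                        }
                        xmlFreeParserCtxt(ctxt);
                    }
                }
                fclose(f);
            }</pre>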
<p>The HTML parser embedded into libxml also has a push interface; the
functions are simply prefixed with &quot;html&quot; rather than &quot;xml&quot;.</p>
<h3 id="Invoking2">Invoking the parser: the SAX interface</h3>
<p>The tree-building interface makes the parser memory-hungry: it first loads
the document in memory and then builds the tree itself. Reading a document
without building the tree is possible using the SAX interfaces (see SAX.h and
<a href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">James
Henstridge's documentation</a>). Note also that the push interface can be
limited to SAX: just pass your SAX handler and user data as the first two
arguments of <code>xmlCreatePushParserCtxt()</code>.</p>
<h3><a name="Building">Building a tree from scratch</a></h3>
<p>The other way to get an XML tree in memory is to build it. There is a set
of functions dedicated to building new nodes (these are also described in
&lt;libxml/tree.h&gt;). For example, here is a piece of code that produces a
small XML document; <code>BAD_CAST</code> casts the C string literals to
<code>xmlChar *</code>:</p>
<pre>    #include &lt;libxml/tree.h&gt;

    xmlDocPtr doc;
    xmlNodePtr root, tree, subtree;

    doc = xmlNewDoc(BAD_CAST &quot;1.0&quot;);
    root = xmlNewDocNode(doc, NULL, BAD_CAST &quot;EXAMPLE&quot;, NULL);
    xmlDocSetRootElement(doc, root);   /* doc-&gt;children now points to root */
    xmlSetProp(root, BAD_CAST &quot;prop1&quot;, BAD_CAST &quot;gnome is great&quot;);
    xmlSetProp(root, BAD_CAST &quot;prop2&quot;, BAD_CAST &quot;&amp; linux too&quot;);
    tree = xmlNewChild(root, NULL, BAD_CAST &quot;head&quot;, NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST &quot;title&quot;,
                          BAD_CAST &quot;Welcome to Gnome&quot;);
    tree = xmlNewChild(root, NULL, BAD_CAST &quot;chapter&quot;, NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST &quot;title&quot;,
                          BAD_CAST &quot;The Linux adventure&quot;);
    subtree = xmlNewChild(tree, NULL, BAD_CAST &quot;p&quot;,
                          BAD_CAST &quot;bla bla bla ...&quot;);
    subtree = xmlNewChild(tree, NULL, BAD_CAST &quot;image&quot;, NULL);
    xmlSetProp(subtree, BAD_CAST &quot;href&quot;, BAD_CAST &quot;linus.gif&quot;);</pre>
<p>Not really rocket science ...</p>
<h3><a name="Traversing">Traversing the tree</a></h3>
<p>By <a href="html/libxml-tree.html">including &quot;tree.h&quot;</a>, your
code has access to the internal structure of all the nodes of the tree. The
field names are meant to be simple: <strong>parent</strong>,
<strong>children</strong>, <strong>next</strong>, <strong>prev</strong>,
<strong>properties</strong>, etc. For example, still with the document built
in the previous example:</p>
<pre><code>doc-&gt;children-&gt;children-&gt;children</code></pre>
<p>points to the title element inside <code>head</code>,</p>
<pre><code>doc-&gt;children-&gt;children-&gt;next-&gt;children-&gt;children</code></pre>
<p>points to the text node containing the chapter title &quot;The Linux
adventure&quot;.</p>
<p>
<strong>NOTE</strong>: XML allows <em>PI</em>s and <em>comments</em> to be
present before the document root, so <code>doc-&gt;children</code> may point
to a node which is not the document's root element; use
<code>xmlDocGetRootElement()</code> to find the root element reliably.</p>
<h3><a name="Modifying">Modifying the tree</a></h3>
<p>Functions are provided for reading and writing the document content. Here
is an excerpt from the <a href="html/libxml-tree.html">tree API</a>:</p>
<dl>
<dt><code>xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const
  xmlChar *value);</code></dt>
<dd><p>This sets (or changes) an attribute carried by an ELEMENT node.
      The value can be NULL.</p></dd>
</dl>
<dl>
<dt><code>xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar
  *name);</code></dt>
<dd><p>This function returns a pointer to a new copy of the property
      content, or NULL if the attribute is not present. The caller must
      deallocate the result with <code>xmlFree()</code>.</p></dd>
</dl>
<p>Two functions are provided for reading and writing the text associated
with elements:</p>
<dl>
<dt><code>xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar
  *value);</code></dt>
<dd><p>This function takes an &quot;external&quot; string and converts it to one
      text node or possibly to a list of entity and text nodes. All
      non-predefined entity references like &amp;Gnome; will be stored
      internally as entity nodes, hence the result of the function may not be
      a single node.</p></dd>
</dl>
<dl>
<dt><code>xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int
  inLine);</code></dt>
<dd><p>This function is the inverse of
      <code>xmlStringGetNodeList()</code>. It generates a new string
      containing the content of the text and entity nodes; the caller must
      free it. Note the extra argument <code>inLine</code>. If this argument
      is set to 1, the function will expand entity references. For example,
      instead of returning the &amp;Gnome; entity reference in the string, it
      will substitute it with its value (say, &quot;GNU Network Object Model
      Environment&quot;).</p></dd>
</dl>
<h3><a name="Saving">Saving a tree</a></h3>
<p>There are basically three options:</p>
<dl>
<dt><code>void xmlDocDumpMemory(xmlDocPtr cur, xmlChar **mem, int
  *size);</code></dt>
<dd><p>Saves the document into a newly allocated buffer, returned through
      <code>mem</code>, and stores its length in <code>size</code>. The caller
      must free the buffer with <code>xmlFree()</code>.</p></dd>
</dl>
<dl>
<dt><code>extern void xmlDocDump(FILE *f, xmlDocPtr doc);</code></dt>
<dd><p>Dumps a document to an open <code>FILE</code> stream.</p></dd>
</dl>
<dl>
<dt><code>int xmlSaveFile(const char *filename, xmlDocPtr cur);</code></dt>
<dd><p>Saves the document to a file. In this case, the compression
      interface is triggered if it has been turned on.</p></dd>
</dl>
<h3><a name="Compressio">Compression</a></h3>
<p>The library transparently handles compression when doing file-based
accesses. Compression of saved files can be turned on either globally or
individually for one document:</p>
<dl>
<dt><code>int  xmlGetDocCompressMode (xmlDocPtr doc);</code></dt>
<dd><p>Gets the document compression level (0-9, where 0 means no
      compression).</p></dd>
</dl>
<dl>
<dt><code>void xmlSetDocCompressMode (xmlDocPtr doc, int mode);</code></dt>
<dd><p>Sets the document compression level.</p></dd>
</dl>
<dl>
<dt><code>int  xmlGetCompressMode(void);</code></dt>
<dd><p>Gets the default compression level.</p></dd>
</dl>
<dl>
<dt><code>void xmlSetCompressMode(int mode);</code></dt>
<dd><p>Sets the default compression level.</p></dd>
</dl>
<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>