<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<style type="text/css"><!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>I/O Interfaces</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>I/O Interfaces</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XMLinfo.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="python.html">Python and bindings</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="APIchunk0.html">Alphabetic</a></li>
<li><a href="APIconstructors.html">Constructors</a></li>
<li><a href="APIfunctions.html">Functions/Types</a></li>
<li><a href="APIfiles.html">Modules</a></li>
<li><a href="APIsymbols.html">Symbols</a></li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li>
<li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml&amp;product=libxml2">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Table of Contents:</p>
<ol>
<li><a href="#General1">General overview</a></li>
<li><a href="#basic">The basic buffer type</a></li>
<li><a href="#Input">Input I/O handlers</a></li>
<li><a href="#Output">Output I/O handlers</a></li>
<li><a href="#entities">The entities loader</a></li>
<li><a href="#Example2">Example of customized I/O</a></li>
</ol>
<h3><a name="General1">General overview</a></h3>
<p>The module <code><a href="http://xmlsoft.org/html/libxml-xmlio.html">xmlIO.h</a></code> provides
the interfaces to the libxml I/O system. This consists of four main parts:</p>
<ul>
<li>The entity loader: a routine which tries to fetch entities
    (files) based on their PUBLIC and SYSTEM identifiers. The default loader
    does not look at the public identifier since libxml does not maintain a
    catalog. You can install your own entity loader by using
    <code>xmlGetExternalEntityLoader()</code> and
    <code>xmlSetExternalEntityLoader()</code>. <a href="#entities">Check the
    example</a>.</li>
<li>Input I/O buffers: a convenience structure used by the parsers'
    input layer to fetch the information that feeds the parser. It
    provides buffering and is also where the encoding
    converters to UTF-8 are attached.</li>
<li>Output I/O buffers: similar to the input ones, they fulfill the same
    role when generating a serialization from a tree.</li>
<li>A mechanism to register sets of I/O callbacks and associate them with
    specific naming schemes, such as the protocol part of a URI.
    <p>This affects the default I/O operations and allows specific I/O
    handlers to be used for certain names.</p>
</li>
</ul>
<p>The general mechanism used when loading http://rpmfind.net/xml.html, for
example in the HTML parser, is the following:</p>
<ol>
<li>The default entity loader calls <code>xmlNewInputFromFile()</code> with
    the parsing context and the URI string.</li>
<li>The URI string is checked against the registered handlers
    using their match() callback function. If the HTTP module was compiled
    in, it is registered and its match() function will succeed.</li>
<li>The open() function of the handler is called and, if successful,
    returns an I/O Input buffer.</li>
<li>The parser then starts reading from this buffer and progressively
    fetches information from the resource, calling the read() function of the
    handler until the resource is exhausted.</li>
<li>If an encoding change is detected, it is installed on the input
    buffer, providing buffering and efficient use of the conversion
    routines.</li>
<li>Once the parser has finished, the close() function of the handler is
    called once and the Input buffer and associated resources are
    deallocated.</li>
</ol>
<p>The user-defined callbacks are checked first, which allows overriding the
default libxml I/O routines.</p>
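<p>The matching step above can be pictured with a small toy model. This is
only a sketch, not libxml code: every <code>toy_</code> name is hypothetical,
and scanning the table from the newest registration to the oldest is just one
way, assumed here, to let user-defined handlers take precedence over the
defaults.</p>

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: these names are hypothetical, not libxml types. */
typedef int (*toy_match_fn)(const char *uri);

typedef struct {
    const char  *name;   /* label for the handler set */
    toy_match_fn match;  /* returns 1 if the handler accepts the URI */
} toy_handler;

#define TOY_MAX_HANDLERS 8
static toy_handler toy_table[TOY_MAX_HANDLERS];
static int toy_count = 0;

/* Register a handler; returns its index, or -1 if the table is full. */
int toy_register(const char *name, toy_match_fn match) {
    if (toy_count >= TOY_MAX_HANDLERS)
        return -1;
    toy_table[toy_count].name = name;
    toy_table[toy_count].match = match;
    return toy_count++;
}

/* Scan from the most recent registration back to the oldest, so that
 * handlers registered later (by the user) override the defaults. */
const char *toy_find_handler(const char *uri) {
    for (int i = toy_count - 1; i >= 0; i--)
        if (toy_table[i].match(uri))
            return toy_table[i].name;
    return NULL;
}

/* Three sample match() callbacks: a catch-all and two URI schemes. */
int toy_match_any(const char *uri)  { (void)uri; return 1; }
int toy_match_http(const char *uri) { return strncmp(uri, "http://", 7) == 0; }
int toy_match_app(const char *uri)  { return strncmp(uri, "app://", 6) == 0; }
```

<p>Registering <code>toy_match_any</code> first and the scheme-specific
callbacks afterwards makes the catch-all handler a fallback that is only
reached when no later registration claims the URI.</p>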
<h3><a name="basic">The basic buffer type</a></h3>
<p>All buffer manipulation is done using the
<code>xmlBuffer</code> type defined in <code><a href="http://xmlsoft.org/html/libxml-tree.html">tree.h</a></code>, which is a
resizable memory buffer. The buffer allocation strategy can be either
best-fit or exponential doubling (a CPU vs. memory use
tradeoff). The values are <code>XML_BUFFER_ALLOC_EXACT</code> and
<code>XML_BUFFER_ALLOC_DOUBLEIT</code>, and can be set individually or on a
system-wide basis using <code>xmlBufferSetAllocationScheme()</code>. A number
of functions for manipulating buffers are available, with names starting with the
<code>xmlBuffer...</code> prefix.</p>
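<p>To make the CPU vs. memory tradeoff concrete, here is a minimal sketch of
the two growth strategies. It is not the <code>xmlBuffer</code>
implementation: the <code>toy_</code> names and the initial size of 16 bytes
are assumptions made for illustration.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; not the xmlBuffer API. */
typedef enum { TOY_ALLOC_EXACT, TOY_ALLOC_DOUBLEIT } toy_scheme;

typedef struct {
    char      *data;
    size_t     used;      /* bytes stored */
    size_t     size;      /* bytes allocated */
    toy_scheme scheme;
    int        reallocs;  /* number of times the block had to grow */
} toy_buffer;

/* Append len bytes; returns 0 on success, -1 on allocation failure. */
int toy_buffer_add(toy_buffer *buf, const char *src, size_t len) {
    size_t needed = buf->used + len;
    if (needed > buf->size) {
        size_t newsize = needed;               /* best fit */
        if (buf->scheme == TOY_ALLOC_DOUBLEIT) {
            newsize = buf->size ? buf->size : 16;
            while (newsize < needed)
                newsize *= 2;                  /* exponential growth */
        }
        char *tmp = realloc(buf->data, newsize);
        if (tmp == NULL)
            return -1;
        buf->data = tmp;
        buf->size = newsize;
        buf->reallocs++;
    }
    memcpy(buf->data + buf->used, src, len);
    buf->used = needed;
    return 0;
}

/* Append n single bytes and report how many reallocations occurred. */
int toy_count_reallocs(toy_scheme scheme, int n) {
    toy_buffer buf = { NULL, 0, 0, scheme, 0 };
    for (int i = 0; i < n; i++)
        if (toy_buffer_add(&buf, "x", 1) != 0)
            break;
    int count = buf.reallocs;
    free(buf.data);
    return count;
}
```

<p>In this model the best-fit scheme never holds unused memory but grows the
block on every append, while the doubling scheme grows it rarely at the cost
of some slack space.</p>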
<h3><a name="Input">Input I/O handlers</a></h3>
<p>An Input I/O handler is a simple structure,
<code>xmlParserInputBuffer</code>, containing a context associated with the
resource (file descriptor, or pointer to a protocol handler), the read() and
close() callbacks to use, and an xmlBuffer. An extra xmlBuffer and a charset
encoding handler are also present to support charset conversion when
needed.</p>
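<p>The shape of such a handler can be sketched as follows. This is a
simplified model with hypothetical <code>toy_</code> names, not the real
<code>xmlParserInputBuffer</code> layout, and it leaves out the buffers and
the encoding handler. It shows a context, a read() callback that is called
until the resource is exhausted, and a close() callback that is called
once.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model; the real structure is xmlParserInputBuffer. */
typedef int (*toy_read_fn)(void *context, char *buffer, int len);
typedef int (*toy_close_fn)(void *context);

typedef struct {
    void        *context;   /* resource: file descriptor, handler, ... */
    toy_read_fn  readcb;
    toy_close_fn closecb;
} toy_input;

/* Example resource: a string consumed progressively. */
typedef struct {
    const char *text;
    size_t      pos;
    int         closed;     /* counts calls to the close callback */
} toy_mem;

int toy_mem_read(void *context, char *buffer, int len) {
    toy_mem *m = context;
    size_t left = strlen(m->text) - m->pos;
    size_t n = left < (size_t)len ? left : (size_t)len;
    memcpy(buffer, m->text + m->pos, n);
    m->pos += n;
    return (int)n;           /* 0 signals that the resource is exhausted */
}

int toy_mem_close(void *context) {
    ((toy_mem *)context)->closed++;
    return 0;
}

/* Read in small chunks until read() returns 0, then close once.
 * Returns the total number of bytes fetched, or -1 on a read error. */
int toy_drain(toy_input *in) {
    char chunk[4];
    int total = 0, n;
    while ((n = in->readcb(in->context, chunk, (int)sizeof(chunk))) > 0)
        total += n;
    if (in->closecb != NULL)
        in->closecb(in->context);
    return n < 0 ? -1 : total;
}
```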
<h3><a name="Output">Output I/O handlers</a></h3>
<p>An Output handler, <code>xmlOutputBuffer</code>, is entirely analogous to an
Input one, except that the callbacks are write() and close().</p>
<h3><a name="entities">The entities loader</a></h3>
<p>The entity loader resolves requests for new entities and creates inputs for
the parser. Creating an input from a filename or a URI string is done
through the xmlNewInputFromFile() routine. The default entity loader does not
handle the PUBLIC identifier associated with an entity (if any), so it just
calls xmlNewInputFromFile() with the SYSTEM identifier (which is mandatory in
XML).</p>
<p>If you want to hook up a catalog mechanism, you simply need to
override the default entity loader. Here is an example:</p>
<pre>#include &lt;libxml/parser.h&gt;
#include &lt;libxml/parserInternals.h&gt;
#include &lt;libxml/xmlIO.h&gt;

xmlExternalEntityLoader defaultLoader = NULL;

xmlParserInputPtr
xmlMyExternalEntityLoader(const char *URL, const char *ID,
                          xmlParserCtxtPtr ctxt) {
    xmlParserInputPtr ret = NULL;
    const char *fileID = NULL;
    /* look up fileID from the public identifier ID here */

    /* try the looked-up file first, if the lookup found one */
    if (fileID != NULL)
        ret = xmlNewInputFromFile(ctxt, fileID);
    if (ret != NULL)
        return(ret);
    /* otherwise fall back to the loader that was installed before */
    if (defaultLoader != NULL)
        ret = defaultLoader(URL, ID, ctxt);
    return(ret);
}

int main(..) {
    ...

    /*
     * Install our own entity loader
     */
    defaultLoader = xmlGetExternalEntityLoader();
    xmlSetExternalEntityLoader(xmlMyExternalEntityLoader);

    ...
}</pre>
<h3><a name="Example2">Example of customized I/O</a></h3>
<p>This example comes from <a href="http://xmlsoft.org/messages/0708.html">a
real use case</a>: xmlDocDump() closes the FILE * passed by the application,
and this was a problem. The <a href="http://xmlsoft.org/messages/0711.html">solution</a> was to define a
new output handler with the closing call deactivated:</p>
<ol>
<li>First define a new I/O output allocator where the output does not close the
    file:
    <pre>xmlOutputBufferPtr
xmlOutputBufferCreateOwn(FILE *file, xmlCharEncodingHandlerPtr encoder) {
    xmlOutputBufferPtr ret;

    if (xmlOutputCallbackInitialized == 0)
        xmlRegisterDefaultOutputCallbacks();

    if (file == NULL) return(NULL);
    ret = xmlAllocOutputBuffer(encoder);
    if (ret != NULL) {
        ret-&gt;context = file;
        ret-&gt;writecallback = xmlFileWrite;
        ret-&gt;closecallback = NULL;  /* No close callback */
    }
    return(ret);
}</pre>
</li>
<li>And then use it to save the document:
    <pre>FILE *f;
xmlOutputBufferPtr output;
xmlDocPtr doc;
int res;

f = ...
doc = ....

output = xmlOutputBufferCreateOwn(f, NULL);
res = xmlSaveFileTo(output, doc, NULL);
    </pre>
</li>
</ol>
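<p>The idea behind this fix, a handler whose close callback is optional, can
also be shown in isolation. The sketch below uses only the C standard library
and hypothetical <code>toy_</code> names. It is a model of the technique, not
the libxml structures, and it assumes that a NULL close callback means the
caller keeps ownership of the <code>FILE *</code>.</p>

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of an output handler; not the libxml structure. */
typedef int (*toy_write_fn)(void *context, const char *buffer, int len);
typedef int (*toy_oclose_fn)(void *context);

typedef struct {
    void         *context;
    toy_write_fn  writecb;
    toy_oclose_fn closecb;   /* NULL: the caller keeps ownership */
} toy_output;

int toy_file_write(void *context, const char *buffer, int len) {
    return (int)fwrite(buffer, 1, (size_t)len, (FILE *)context);
}

/* A closing callback, shown for contrast; not installed below. */
int toy_file_close(void *context) {
    return fclose((FILE *)context);
}

/* Allocate an output handler that writes to file but never closes it. */
toy_output *toy_output_create_own(FILE *file) {
    if (file == NULL)
        return NULL;
    toy_output *out = malloc(sizeof(*out));
    if (out != NULL) {
        out->context = file;
        out->writecb = toy_file_write;
        out->closecb = NULL;   /* no close callback */
    }
    return out;
}

/* Release the handler, invoking the close callback only if one is set. */
int toy_output_close(toy_output *out) {
    int ret = 0;
    if (out->closecb != NULL)
        ret = out->closecb(out->context);
    free(out);
    return ret;
}
```

<p>After <code>toy_output_close()</code> returns, the application can keep
writing to the same <code>FILE *</code> and close it when it chooses.</p>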
<p><a href="bugs.html">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>