xmlio.html revision 7216cfd6622d947695c67b7b430edef8cc0af967
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<link rel="SHORTCUT ICON" href="/favicon.ico">
<style type="text/css"><!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>I/O Interfaces</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="gnome2.png" alt="Gnome2 Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a><div align="left"><a href="http://xmlsoft.org/"><img src="Libxml2-Logo-180x168.gif" alt="Made with Libxml2 Logo"></a></div>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>I/O Interfaces</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1"
align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XMLinfo.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="python.html">Python and bindings</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li><a href="tutorial/index.html">Tutorial</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr>
<tr><td bgcolor="#fffacd">
<form action="search.php" enctype="application/x-www-form-urlencoded" method="GET">
<input name="query" type="TEXT" size="20"
value=""><input name="submit" type="submit" value="Search ...">
</form>
<ul>
<li><a href="APIchunk0.html">Alphabetic</a></li>
<li><a href="APIconstructors.html">Constructors</a></li>
<li><a href="APIfunctions.html">Functions/Types</a></li>
<li><a href="APIfiles.html">Modules</a></li>
<li><a href="APIsymbols.html">Symbols</a></li>
</ul>
</td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li>
<li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li>
<li><a href="http://www.zveno.com/open_source/libxml2xslt.html">MacOsX binaries</a></li>
<li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml&amp;product=libxml2">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Table of Contents:</p>
<ol>
<li><a href="#General1">General overview</a></li>
  <li><a href="#basic">The basic buffer type</a></li>
  <li><a href="#Input">Input I/O handlers</a></li>
  <li><a href="#Output">Output I/O handlers</a></li>
  <li><a href="#entities">The entities
loader</a></li>
  <li><a href="#Example2">Example of customized I/O</a></li>
</ol>
<h3><a name="General1">General overview</a></h3>
<p>The module <code><a href="http://xmlsoft.org/html/libxml-xmlio.html">xmlIO.h</a></code> provides
the interfaces to the libxml I/O system. This consists of 4 main parts:</p>
<ul>
<li>The entity loader, a routine which tries to fetch the entities
    (files) based on their PUBLIC and SYSTEM identifiers. The default loader
    doesn't look at the public identifier since libxml does not maintain a
    catalog. You can define your own entity loader by using
    <code>xmlGetExternalEntityLoader()</code> and
    <code>xmlSetExternalEntityLoader()</code>. <a href="#entities">Check the
    example</a>.</li>
  <li>Input I/O buffers, a convenience structure used by the parser(s)
    input layer to fetch the information that feeds the parser. This
    provides buffering and is also the place where the encoding
    converters to UTF-8 are attached.</li>
  <li>Output I/O buffers, which are similar to the input ones and fulfill a
    similar task, but when generating a serialization from a tree.</li>
  <li>A mechanism to register sets of I/O callbacks and associate them with
    specific naming schemes like the protocol part of the URIs.
    <p>This affects the default I/O operations and allows the use of specific I/O
    handlers for certain names.</p>
  </li>
</ul>
<p>The general mechanism used when loading http://rpmfind.net/xml.html, for
example in the HTML parser, is the following:</p>
<ol>
<li>The default entity loader calls <code>xmlNewInputFromFile()</code> with
    the parsing context and the URI string.</li>
  <li>The URI string is checked against the existing registered handlers
    using their match() callback function. If the HTTP module was compiled
    in, it is registered and its match() function will succeed.</li>
  <li>The open() function of the handler is called and, if successful, will
    return an I/O Input buffer.</li>
  <li>The parser will then start reading from this buffer and progressively
    fetch information from the resource, calling the read() function of the
    handler until the resource is exhausted.</li>
  <li>If an encoding change is detected, the corresponding converter is
    installed on the input buffer, providing buffering and efficient use of
    the conversion routines.</li>
  <li>Once the parser has finished, the close() function of the handler is
    called once and the Input buffer and associated resources are
    deallocated.</li>
</ol>
<p>User-defined callbacks are checked first, which allows overriding the
default libxml I/O routines.</p>
<h3><a name="basic">The basic buffer type</a></h3>
<p>All buffer manipulation is done using the
<code>xmlBuffer</code> type defined in <code><a href="http://xmlsoft.org/html/libxml-tree.html">tree.h</a></code>, which is a
resizable memory buffer. The buffer allocation strategy can be either
best-fit or exponential doubling (a CPU vs. memory use
trade-off). The values are <code>XML_BUFFER_ALLOC_EXACT</code> and
<code>XML_BUFFER_ALLOC_DOUBLEIT</code>; the strategy can be set for an
individual buffer using <code>xmlBufferSetAllocationScheme()</code> or on a
system-wide basis using <code>xmlSetBufferAllocationScheme()</code>.
A number of functions whose names start with the
<code>xmlBuffer...</code> prefix are available to manipulate buffers.</p>
<h3><a name="Input">Input I/O handlers</a></h3>
<p>An Input I/O handler is a simple structure,
<code>xmlParserInputBuffer</code>, containing a context associated with the
resource (file descriptor, or pointer to a protocol handler), the read() and
close() callbacks to use, and an xmlBuffer. An extra xmlBuffer and a charset
encoding handler are also present to support charset conversion when
needed.</p>
<h3><a name="Output">Output I/O handlers</a></h3>
<p>An Output handler, <code>xmlOutputBuffer</code>, is very similar to an
Input one except that the callbacks are write() and close().</p>
<h3><a name="entities">The entities loader</a></h3>
<p>The entity loader resolves requests for new entities and creates inputs for
the parser. Creating an input from a filename or a URI string is done
through the xmlNewInputFromFile() routine. The default entity loader does not
handle the PUBLIC identifier associated with an entity (if any), so it just
calls xmlNewInputFromFile() with the SYSTEM identifier (which is mandatory in
XML).</p>
<p>If you want to hook up a catalog mechanism, you simply need to
override the default entity loader. Here is an example:</p>
<pre>#include &lt;libxml/xmlIO.h&gt;
#include &lt;libxml/parserInternals.h&gt; /* declares xmlNewInputFromFile() */

xmlExternalEntityLoader defaultLoader = NULL;

xmlParserInputPtr
xmlMyExternalEntityLoader(const char *URL, const char *ID,
                          xmlParserCtxtPtr ctxt) {
    xmlParserInputPtr ret = NULL;
    const char *fileID = NULL;

    /* look up the fileID depending on ID, e.g. in a catalog */

    /* try the file found by the lookup first, if any */
    if (fileID != NULL)
        ret = xmlNewInputFromFile(ctxt, fileID);
    if (ret != NULL)
        return(ret);
    /* otherwise fall back to the previously installed loader */
    if (defaultLoader != NULL)
        ret = defaultLoader(URL, ID, ctxt);
    return(ret);
}

int main(int argc, char **argv) {
    ...

    /*
     * Install our own entity loader
     */
    defaultLoader = xmlGetExternalEntityLoader();
    xmlSetExternalEntityLoader(xmlMyExternalEntityLoader);

    ...
}</pre>
<h3><a name="Example2">Example of customized I/O</a></h3>
<p>This example comes from <a href="http://xmlsoft.org/messages/0708.html">a
real use case</a>: xmlDocDump() closes the FILE * passed by the application,
and this was a problem. The <a href="http://xmlsoft.org/messages/0711.html">solution</a> was to define a
new output handler with the closing call deactivated:</p>
<ol>
<li>First define a new I/O output allocator where the output doesn't close
    the file:
    <pre>xmlOutputBufferPtr
xmlOutputBufferCreateOwn(FILE *file, xmlCharEncodingHandlerPtr encoder) {
    xmlOutputBufferPtr ret;

    if (xmlOutputCallbackInitialized == 0)
        xmlRegisterDefaultOutputCallbacks();

    if (file == NULL) return(NULL);
    ret = xmlAllocOutputBuffer(encoder);
    if (ret != NULL) {
        ret-&gt;context = file;
        ret-&gt;writecallback = xmlFileWrite;
        ret-&gt;closecallback = NULL;  /* No close callback */
    }
    return(ret);
} </pre>
  </li>
  <li>And then use it to save the document:
    <pre>FILE *f;
xmlOutputBufferPtr output;
xmlDocPtr doc;
int res;

f = ...
doc = ....

output = xmlOutputBufferCreateOwn(f, NULL);
res = xmlSaveFileTo(output, doc, NULL);
    </pre>
  </li>
</ol>
<p><a href="bugs.html">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>