1<?xml version="1.0"?>
2<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
3    "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
4<!ENTITY KEYWORD SYSTEM "includekeyword.c">
5<!ENTITY XPATH SYSTEM "includexpath.c">
6<!ENTITY STORY SYSTEM "includestory.xml">
7<!ENTITY ADDKEYWORD SYSTEM "includeaddkeyword.c">
8<!ENTITY ADDATTRIBUTE SYSTEM "includeaddattribute.c">
9<!ENTITY GETATTRIBUTE SYSTEM "includegetattribute.c">
10<!ENTITY CONVERT SYSTEM "includeconvert.c">
11]>
12<article lang="en">
13  <articleinfo>
14    <title>Libxml Tutorial</title>
15    <author>
16      <firstname>John</firstname>
17      <surname>Fleck</surname>
18      <email>jfleck@inkstain.net</email>
19    </author>
20    <copyright>
21      <year>2002, 2003</year>
22      <holder>John Fleck</holder>
23    </copyright>
24    <revhistory>
25      <revision>
26	<revnumber>1</revnumber>
27	<date>June 4, 2002</date>
28	<revremark>Initial draft</revremark>
29      </revision>
30      <revision>
31	<revnumber>2</revnumber>
32	<date>June 12, 2002</date>
33	<revremark>retrieving attribute value added</revremark>
34      </revision>
35      <revision>
36	<revnumber>3</revnumber>
37	<date>Aug. 31, 2002</date>
38	<revremark>freeing memory fix</revremark>
39      </revision>
40      <revision>
41	<revnumber>4</revnumber>
42	<date>Nov. 10, 2002</date>
43	<revremark>encoding discussion added</revremark>
44      </revision>
45      <revision>
46	<revnumber>5</revnumber>
47	<date>Dec. 15, 2002</date>
48	<revremark>more memory freeing changes</revremark>
49      </revision>
50      <revision>
51	<revnumber>6</revnumber>
	<date>Jan. 26, 2003</date>
53	<revremark>add index</revremark>
54      </revision>
55      <revision>
56	<revnumber>7</revnumber>
57	<date>April 25, 2003</date>
58	<revremark>add compilation appendix</revremark>
59      </revision>
60      <revision>
61	<revnumber>8</revnumber>
62	<date>July 24, 2003</date>
63	<revremark>add XPath example</revremark>
64      </revision>
65    </revhistory>
66  </articleinfo>
67  <abstract>
68    <para>Libxml is a freely licensed C language library for handling
69    <acronym>XML</acronym>, portable across a large number of platforms. This
70    tutorial provides examples of its basic functions.</para>
71  </abstract>
72  <sect1 id="introduction">
73    <title>Introduction</title>
74    <para>Libxml is a C language library implementing functions for reading,
75      creating and manipulating <acronym>XML</acronym> data. This tutorial
76    provides example code and explanations of its basic functionality.</para>
77    <para>Libxml and more details about its use are available on <ulink
78									url="http://www.xmlsoft.org/">the project home page</ulink>. Included there is complete <ulink url="http://xmlsoft.org/html/libxml-lib.html">
79	<acronym>API</acronym> documentation</ulink>. This tutorial is not meant
80    to substitute for that complete documentation, but to illustrate the
81    functions needed to use the library to perform basic operations.
82<!--
83 Links to
84      other resources can be found in <xref linkend="furtherresources" />.
85-->
86</para>
87    <para>The tutorial is based on a simple <acronym>XML</acronym> application I
88    use for articles I write. The format includes metadata and the body
89    of the article.</para>
90    <para>The example code in this tutorial demonstrates how to:
91      <itemizedlist>
92	<listitem>
93	  <para>Parse the document.</para>
94	</listitem>
95	<listitem>
96	  <para>Extract the text within a specified element.</para>
97	</listitem>
98	<listitem>
99	  <para>Add an element and its content.</para>
100	</listitem>
101	<listitem>
102	  <para>Add an attribute.</para>
103	</listitem>      
104	<listitem>
105	  <para>Extract the value of an attribute.</para>
106	</listitem>
107      </itemizedlist>
108    </para>
109    <para>Full code for the examples is included in the appendices.</para>
110
111  </sect1>
112
113  <sect1 id="xmltutorialdatatypes">
114    <title>Data Types</title>
115    <para><application>Libxml</application> declares a number of data types we
116    will encounter repeatedly, hiding the messy stuff so you do not have to deal
117    with it unless you have some specific need.</para>
118    <para>
119      <variablelist>
120	<varlistentry>
121	  <term><indexterm>
122	      <primary>xmlChar</primary>
123	    </indexterm>
124<ulink
125	  url="http://xmlsoft.org/html/libxml-tree.html#XMLCHAR">xmlChar</ulink></term>
126	  <listitem>
127	    <para>A basic replacement for char, a byte in a UTF-8 encoded
128	    string. If your data uses another encoding, it must be converted to
129	      UTF-8 for use with <application>libxml's</application>
130	      functions. More information on encoding is available on the <ulink
131		url="http://www.xmlsoft.org/encoding.html"><application>libxml</application> encoding support web page</ulink>.</para>
132	  </listitem>
133	</varlistentry>
134	<varlistentry>
135	  <term><indexterm>
136	      <primary>xmlDoc</primary>
137	    </indexterm>
138	    <ulink url="http://xmlsoft.org/html/libxml-tree.html#XMLDOC">xmlDoc</ulink></term>
139	  <listitem>
140	    <para>A structure containing the tree created by a parsed doc. <ulink
141	  url="http://xmlsoft.org/html/libxml-tree.html#XMLDOCPTR">xmlDocPtr</ulink>
142	  is a pointer to the structure.</para>
143	  </listitem>
144	</varlistentry>
145	<varlistentry>
146	  <term><indexterm>
147	      <primary>xmlNodePtr</primary>
148	    </indexterm>
149<ulink
150	  url="http://xmlsoft.org/html/libxml-tree.html#XMLNODEPTR">xmlNodePtr</ulink>
151	    and <ulink url="http://xmlsoft.org/html/libxml-tree.html#XMLNODE">xmlNode</ulink></term>
152	  <listitem>
153	    <para>A structure containing a single node. <ulink
154	  url="http://xmlsoft.org/html/libxml-tree.html#XMLNODEPTR">xmlNodePtr</ulink>
155	  is a pointer to the structure, and is used in traversing the document tree.</para>
156	  </listitem>
157	</varlistentry>
158      </variablelist>
159    </para>
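    <para>Because an xmlChar is a byte rather than a character, the byte
      length of a UTF-8 string and its character count can differ. The sketch
      below uses only the standard C library to show the difference. The
      function name <function>utf8_char_count</function> is made up for this
      illustration and is not part of <application>libxml</application>.</para>
    <programlisting><![CDATA[
```c
#include <stddef.h>

/* Hypothetical helper, not part of libxml: count the characters in a
 * NUL-terminated UTF-8 string. Continuation bytes have the bit pattern
 * 10xxxxxx, so every byte that does NOT match it starts a character. */
size_t
utf8_char_count(const unsigned char *s) {
	size_t count = 0;

	while (*s != '\0') {
		if ((*s & 0xC0) != 0x80)
			count++;
		s++;
	}
	return count;
}
```
]]></programlisting>
    <para>For plain ASCII text the two counts agree. A string such as
      "caf&#233;" occupies five bytes in UTF-8 but holds four characters,
      which matters whenever you size buffers for xmlChar data.</para>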
160
161  </sect1>
162
163  <sect1 id="xmltutorialparsing">
    <title>Parsing the File</title>
165    <para><indexterm id="fileparsing" class="startofrange">
166	<primary>file</primary>
167	<secondary>parsing</secondary>
168      </indexterm>
169Parsing the file requires only the name of the file and a single
170      function call, plus error checking. Full code: <xref
    linkend="keywordappendix" />.</para>
172    <para>
173    <programlisting>
174        <co id="declaredoc" /> xmlDocPtr doc;
175	<co id="declarenode" /> xmlNodePtr cur;
176
177	<co id="parsefile" /> doc = xmlParseFile(docname);
178	
179	<co id="checkparseerror" /> if (doc == NULL ) {
180		fprintf(stderr,"Document not parsed successfully. \n");
181		return;
182	}
183
184	<co id="getrootelement" /> cur = xmlDocGetRootElement(doc);
185	
186	<co id="checkemptyerror" /> if (cur == NULL) {
187		fprintf(stderr,"empty document\n");
188		xmlFreeDoc(doc);
189		return;
190	}
191	
192	<co id="checkroottype" /> if (xmlStrcmp(cur->name, (const xmlChar *) "story")) {
		fprintf(stderr,"document of the wrong type, root node != story\n");
194		xmlFreeDoc(doc);
195		return;
196	}
197
198    </programlisting>
199      <calloutlist>
200	<callout arearefs="declaredoc">
201	  <para>Declare the pointer that will point to your parsed document.</para>
202	</callout>
203	<callout arearefs="declarenode">
204	  <para>Declare a node pointer (you'll need this in order to
205	  interact with individual nodes).</para>
206	</callout>
	<callout arearefs="parsefile">
	  <para>Parse the file named by <varname>docname</varname>.
	    <function>xmlParseFile</function> returns a pointer to the parsed
	    document, or NULL if the file could not be parsed.</para>
	</callout>
207	<callout arearefs="checkparseerror">
	  <para>Check to see that the document was successfully parsed. If it
	    was not, <application>libxml</application> reports an error and
	    <function>xmlParseFile</function> returns NULL, so we stop here.
211	    <note>
212	      <para><indexterm>
213		  <primary>encoding</primary>
214		</indexterm>
215One common example of an error at this point is improper
216	    handling of encoding. The <acronym>XML</acronym> standard requires
217	    documents stored with an encoding other than UTF-8 or UTF-16 to
218	    contain an explicit declaration of their encoding. If the
219	    declaration is there, <application>libxml</application> will
220	    automatically perform the necessary conversion to UTF-8 for
221		you. More information on <acronym>XML's</acronym> encoding
222		requirements is contained in the <ulink
223		  url="http://www.w3.org/TR/REC-xml#charencoding">standard</ulink>.</para>
224	    </note>
225	  </para>
226	</callout>
227	<callout arearefs="getrootelement">
228	  <para>Retrieve the document's root element.</para>
229	</callout>
230	<callout arearefs="checkemptyerror">
231	  <para>Check to make sure the document actually contains something.</para>
232	</callout>
233	<callout arearefs="checkroottype">
234	  <para>In our case, we need to make sure the document is the right
235	  type. &quot;story&quot; is the root type of the documents used in this
236	  tutorial.</para>
237	</callout>
238      </calloutlist>
239      <indexterm startref="fileparsing" class="endofrange" />
240    </para>
241  </sect1>
242
243  <sect1 id="xmltutorialgettext">
244    <title>Retrieving Element Content</title>
245    <para><indexterm>
246	<primary>element</primary>
247	<secondary>retrieving content</secondary>
248      </indexterm>
249Retrieving the content of an element involves traversing the document
250    tree until you find what you are looking for. In this case, we are looking
    for an element called &quot;keyword&quot; contained within an element called &quot;story&quot;. The
    process to find the node we are interested in involves tediously walking the
    tree. We assume you already have an xmlDocPtr called <varname>doc</varname>
    and an xmlNodePtr called <varname>cur</varname>.</para>
255
256    <para>
257      <programlisting>
	<co id="getchildnode" />cur = cur->xmlChildrenNode;
	<co id="huntstoryinfo" />while (cur != NULL) {
		if ((!xmlStrcmp(cur->name, (const xmlChar *)"storyinfo"))){
			parseStory (doc, cur);
		}

		cur = cur->next;
	}
266      </programlisting>
267
268      <calloutlist>
269	<callout arearefs="getchildnode">
270	  <para>Get the first child node of <varname>cur</varname>. At this
271	    point, <varname>cur</varname> points at the document root, which is
272	    the element &quot;story&quot;.</para>
273	</callout>
274	<callout arearefs="huntstoryinfo">
275	  <para>This loop iterates through the elements that are children of
276	  &quot;story&quot;, looking for one called &quot;storyinfo&quot;. That
277	  is the element that will contain the &quot;keywords&quot; we are
278	    looking for. It uses the <application>libxml</application> string
279	  comparison
280	    function, <function><ulink
281				       url="http://xmlsoft.org/html/libxml-parser.html#XMLSTRCMP">xmlStrcmp</ulink></function>. If there is a match, it calls the function <function>parseStory</function>.</para>
282	</callout>
283      </calloutlist>
284    </para>
285
286    <para>
287      <programlisting>
void
parseStory (xmlDocPtr doc, xmlNodePtr cur) {

	xmlChar *key;
	<co id="anothergetchild" /> cur = cur->xmlChildrenNode;
	<co id="findkeyword" /> while (cur != NULL) {
	    if ((!xmlStrcmp(cur->name, (const xmlChar *)"keyword"))) {
	<co id="foundkeyword" />	    key = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
		    printf("keyword: %s\n", key);
		    xmlFree(key);
	    }
	    cur = cur->next;
	}
	return;
}
303      </programlisting>
304      <calloutlist>
305	<callout arearefs="anothergetchild">
306	  <para>Again we get the first child node.</para>
307	</callout>
308	<callout arearefs="findkeyword">
309	  <para>Like the loop above, we then iterate through the nodes, looking
310	  for one that matches the element we're interested in, in this case
311	  &quot;keyword&quot;.</para>
312	</callout>
313	<callout arearefs="foundkeyword">
314	  <para>When we find the &quot;keyword&quot; element, we need to print
315	    its contents. Remember that in <acronym>XML</acronym>, the text
316	    contained within an element is a child node of that element, so we
317	    turn to <varname>cur-&gt;xmlChildrenNode</varname>. To retrieve it, we
318	    use the function <function><ulink
319					      url="http://xmlsoft.org/html/libxml-tree.html#XMLNODELISTGETSTRING">xmlNodeListGetString</ulink></function>, which also takes the <varname>doc</varname> pointer as an argument. In this case, we just print it out.</para>
320	  <note>
321	    <para>Because <function>xmlNodeListGetString</function> allocates
322	      memory for the string it returns, you must use
323	      <function>xmlFree</function> to free it.</para>
324	  </note>
325	</callout>
326      </calloutlist>
327    </para>
328
329  </sect1>
330  <sect1 id="xmltutorialxpath">
331    <title>Using XPath to Retrieve Element Content</title>
332    <para>In addition to walking the document tree to find an element,
333    <application>Libxml2</application> includes support for
334      use of <application>XPath</application> expressions to retrieve sets of
      nodes that match specified criteria. Full documentation of the
336      <application>XPath</application> <acronym>API</acronym> is <ulink
337	url="http://xmlsoft.org/html/libxml-xpath.html">here</ulink>.
338    </para>
339    <para><application>XPath</application> allows searching through a document
340    for nodes that match specified criteria. In the example below we search
341      through a document for the contents of all <varname>keyword</varname>
342    elements.
343      <note>
344	<para>A full discussion of <application>XPath</application> is beyond
345	  the scope of this document. For details on its use, see the <ulink
346	    url="http://www.w3.org/TR/xpath">XPath specification</ulink>.</para>
347      </note>
348      Full code for this example is at <xref linkend="xpathappendix" />.
349    </para>
350    <para>Using <application>XPath</application> requires setting up an
351      xmlXPathContext and then supplying the <application>XPath</application>
352      expression and the context to the
353      <function>xmlXPathEvalExpression</function> function. The function returns
354      an xmlXPathObjectPtr, which includes the set of nodes satisfying the
355      <application>XPath</application> expression.</para>
356    <para>
357      <programlisting>
	xmlXPathObjectPtr
	getnodeset (xmlDocPtr doc, xmlChar *xpath){
	
	<co id="cocontext" />xmlXPathContextPtr context;
	xmlXPathObjectPtr result;

	<co id="cocreatecontext" />context = xmlXPathNewContext(doc);
	if (context == NULL) {
		printf("Error in xmlXPathNewContext\n");
		return NULL;
	}
	<co id="corunxpath" />result = xmlXPathEvalExpression(xpath, context);
	/* The context is no longer needed once the expression is evaluated. */
	xmlXPathFreeContext(context);
	if (result == NULL) {
		printf("Error in xmlXPathEvalExpression\n");
		return NULL;
	}
	<co id="cocheckxpathresult" />if(xmlXPathNodeSetIsEmpty(result->nodesetval)){
		xmlXPathFreeObject(result);
		printf("No result\n");
		return NULL;
	}
	return result;
	}
372      </programlisting>
373      <calloutlist>
374	<callout arearefs="cocontext">
375	  <para>First we declare our variables.</para>
376	</callout>
377	<callout arearefs="cocreatecontext">
378	  <para>Initialize the <varname>context</varname> variable.</para>
379	</callout>
380	<callout arearefs="corunxpath">
381	  <para>Apply the <application>XPath</application> expression.</para>
382	</callout>
383	<callout arearefs="cocheckxpathresult">
384	  <para>Check the result.</para>
385	</callout>
386      </calloutlist>
387    </para>
    <para>The xmlXPathObjectPtr returned by the function contains a set of nodes
    and other information needed to iterate through the set and act on the
      results. For this example, our function returns the
391    <varname>xmlXPathObjectPtr</varname>. We use it to print the contents of
392      <varname>keyword</varname> nodes in our document. The node set object
393      includes the number of elements in the set (<varname>nodeNr</varname>) and
394      an array of nodes (<varname>nodeTab</varname>):
395      <programlisting>
	<co id="conodesetcounter" />for (i=0; i &lt; nodeset->nodeNr; i++) {
	<co id="coprintkeywords" />keyword = xmlNodeListGetString(doc, nodeset->nodeTab[i]->xmlChildrenNode, 1);
		printf("keyword: %s\n", keyword);
		xmlFree(keyword);
	}
400      </programlisting>
401      <calloutlist>
402	<callout arearefs="conodesetcounter">
	  <para>The value of <varname>nodeset->nodeNr</varname> holds the number of
404	  elements in the node set. Here we use it to iterate through the array.</para>
405	</callout>
406	<callout arearefs="coprintkeywords">
407	  <para>Here we print the contents of each of the nodes returned.
408	    <note>
409	      <para>Note that we are printing the child node of the node that is
410		returned, because the contents of the <varname>keyword</varname>
411		element are a child text node.</para>
412	    </note>
413	  </para>
414	</callout>
415      </calloutlist>
416    </para>
417  </sect1>
418<sect1 id="xmltutorialwritingcontent">
    <title>Writing Element Content</title>
420    <para><indexterm>
421	<primary>element</primary>
422	<secondary>writing content</secondary>
423      </indexterm>
424      Writing element content uses many of the same steps we used above
425      &mdash; parsing the document and walking the tree. We parse the document,
426      then traverse the tree to find the place we want to insert our element. For
427      this example, we want to again find the &quot;storyinfo&quot; element and
428      this time insert a keyword. Then we'll write the file to disk. Full code:
      <xref linkend="addkeywordappendix" />.</para>
430    <para>
431      The main difference in this example is in
432      <function>parseStory</function>:
433
434      <programlisting>
435void
436parseStory (xmlDocPtr doc, xmlNodePtr cur, char *keyword) {
437
	<co id="addkeyword" /> xmlNewTextChild (cur, NULL, (const xmlChar *) "keyword", (const xmlChar *) keyword);
439    return;
440}
441      </programlisting>
442      <calloutlist>
443	<callout arearefs="addkeyword">
444	  <para>The <function><ulink
445				     url="http://xmlsoft.org/html/libxml-tree.html#XMLNEWTEXTCHILD">xmlNewTextChild</ulink></function>
446				     function adds a new child element at the
447				     current node pointer's location in the
448	    tree, specified by <varname>cur</varname>.</para>
449	</callout>
450      </calloutlist>
451         </para>
452
453    <para>
454      <indexterm>
455	<primary>file</primary>
456	<secondary>saving</secondary>
457      </indexterm>
      If you want the new element to have a namespace, pass it as the second
      argument to <function>xmlNewTextChild</function>. In our case, the
      namespace is NULL. Once the node has been added, we would like to write
      the document to a file:
461      <programlisting>
462	xmlSaveFormatFile (docname, doc, 1);
463      </programlisting>
464      The first parameter is the name of the file to be written. You'll notice
465      it is the same as the file we just read. In this case, we just write over
466      the old file. The second parameter is a pointer to the xmlDoc
467      structure. Setting the third parameter equal to one ensures indenting on output.
468    </para>
469  </sect1>
470
471  <sect1 id="xmltutorialwritingattribute">
    <title>Writing an Attribute</title>
473    <para><indexterm>
474	<primary>attribute</primary>
475	<secondary>writing</secondary>
476      </indexterm>
477Writing an attribute is similar to writing text to a new element. In
478      this case, we'll add a reference <acronym>URI</acronym> to our
      document. Full code: <xref linkend="addattributeappendix" />.</para>
480    <para>
481      A <sgmltag>reference</sgmltag> is a child of the <sgmltag>story</sgmltag>
482      element, so finding the place to put our new element and attribute is
483      simple. As soon as we do the error-checking test in our
484      <function>parseDoc</function>, we are in the right spot to add our
485      element. But before we do that, we need to make a declaration using a
486      data type we have not seen yet:
487      <programlisting>
488	xmlAttrPtr newattr;
489      </programlisting>
490      We also need an extra xmlNodePtr:
491      <programlisting>
492	xmlNodePtr newnode;
493      </programlisting>
494    </para>
495    <para>
496      The rest of <function>parseDoc</function> is the same as before until we
497      check to see if our root element is <sgmltag>story</sgmltag>. If it is,
498      then we know we are at the right spot to add our element:
499
500      <programlisting>
	<co id="addreferencenode" /> newnode = xmlNewTextChild (cur, NULL, (const xmlChar *) "reference", NULL);
	<co id="addattributenode" /> newattr = xmlNewProp (newnode, (const xmlChar *) "uri", (const xmlChar *) uri);
503      </programlisting>
504      <calloutlist>
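	<callout arearefs="addreferencenode">
	  <para>First we add a new node at the location of the current node
	    pointer, <varname>cur</varname>, using the <ulink
							      url="http://xmlsoft.org/html/libxml-tree.html#XMLNEWTEXTCHILD">xmlNewTextChild</ulink> function.</para>
	</callout>
	<callout arearefs="addattributenode">
	  <para>Then we attach the attribute to the new node with
	    <function>xmlNewProp</function>, passing the attribute's name and
	    its value.</para>
	</callout>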
510      </calloutlist>
511   </para>
512
513    <para>Once the node is added, the file is written to disk just as in the
514    previous example in which we added an element with text content.</para>
515
516  </sect1>
517
518  <sect1 id="xmltutorialattribute">
519    <title>Retrieving Attributes</title>
520    <para><indexterm>
521	<primary>attribute</primary>
522	<secondary>retrieving value</secondary>
523      </indexterm>
524Retrieving the value of an attribute is similar to the previous
525    example in which we retrieved a node's text contents. In this case we'll
526      extract the value of the <acronym>URI</acronym> we added in the previous
527      section. Full code: <xref linkend="getattributeappendix" />.</para>
528    <para>
529      The initial steps for this example are similar to the previous ones: parse
530      the doc, find the element you are interested in, then enter a function to
531      carry out the specific task required. In this case, we call
532      <function>getReference</function>:
533      <programlisting>
534void
535getReference (xmlDocPtr doc, xmlNodePtr cur) {
536
537	xmlChar *uri;
538	cur = cur->xmlChildrenNode;
539	while (cur != NULL) {
540	    if ((!xmlStrcmp(cur->name, (const xmlChar *)"reference"))) {
		   <co id="getattributevalue" /> uri = xmlGetProp(cur, (const xmlChar *) "uri");
542		    printf("uri: %s\n", uri);
543		    xmlFree(uri);
544	    }
545	    cur = cur->next;
546	}
547	return;
548}
549      </programlisting>
550    
551      <calloutlist>
552	<callout arearefs="getattributevalue">
553	  <para>
	    The key function is <function><ulink
					   url="http://xmlsoft.org/html/libxml-tree.html#XMLGETPROP">xmlGetProp</ulink></function>, which returns a
      newly allocated <varname>xmlChar</varname> string containing the attribute's value, or NULL if the
      attribute is not present. In this case,
					   we just print it out and then release it with <function>xmlFree</function>.
558      <note>
559	<para>
560	  If you are using a <acronym>DTD</acronym> that declares a fixed or
561	  default value for the attribute, this function will retrieve it.
562	</para>
563	    </note>
564	  </para>
565	</callout>
566      </calloutlist>
567     
568    </para>
569  </sect1>
570
571  <sect1 id="xmltutorialconvert">
572    <title>Encoding Conversion</title>
573
574    <para><indexterm>
575	<primary>encoding</primary>
576      </indexterm>
577Data encoding compatibility problems are one of the most common
578      difficulties encountered by programmers new to <acronym>XML</acronym> in
579      general and <application>libxml</application> in particular. Thinking
580      through the design of your application in light of this issue will help
581      avoid difficulties later. Internally, <application>libxml</application>
582      stores and manipulates data in the UTF-8 format. Data used by your program
583      in other formats, such as the commonly used ISO-8859-1 encoding, must be
584      converted to UTF-8 before passing it to <application>libxml</application>
585      functions. If you want your program's output in an encoding other than
586      UTF-8, you also must convert it.</para>
587
588      <para><application>Libxml</application> uses
589      <application>iconv</application> if it is available to convert
590    data. Without <application>iconv</application>, only UTF-8, UTF-16 and
591    ISO-8859-1 can be used as external formats. With
592    <application>iconv</application>, any format can be used provided
593    <application>iconv</application> is able to convert it to and from
    UTF-8. At the time of writing, <application>iconv</application> supports
    about 150 different character formats and can convert between any two of
    them. The actual number of supported formats varies between
    implementations, but <application>iconv</application> implementations
    generally support the formats in common use.</para>
599
600    <warning>
601      <para>A common mistake is to use different formats for the internal data
602	in different parts of one's code. The most common case is an application
603	that assumes ISO-8859-1 to be the internal data format, combined with
604	<application>libxml</application>, which assumes UTF-8 to be the
605	internal data format. The result is an application that treats internal
	data differently, depending on which code section is executing. One
	part of the code or the other will then, naturally, misinterpret the data.
608      </para>
609    </warning>
610
611    <para>This example constructs a simple document, then adds content provided
612    at the command line to the document's root element and outputs the results
613    to <filename>stdout</filename> in the proper encoding. For this example, we
614    use ISO-8859-1 encoding. The encoding of the string input at the command
615    line is converted from ISO-8859-1 to UTF-8. Full code: <xref
    linkend="convertappendix" />.</para>
617
618    <para>The conversion, encapsulated in the example code in the
619      <function>convert</function> function, uses
620      <application>libxml's</application>
621    <function>xmlFindCharEncodingHandler</function> function:
622      <programlisting>
623	<co id="handlerdatatype" />xmlCharEncodingHandlerPtr handler;
624        <co id="calcsize" />size = (int)strlen(in)+1; 
625        out_size = size*2-1; 
626        out = malloc((size_t)out_size); 
627
628&hellip;
629	<co id="findhandlerfunction" />handler = xmlFindCharEncodingHandler(encoding);
630&hellip;
631	<co id="callconversionfunction" />handler->input(out, &amp;out_size, in, &amp;temp);
632&hellip;	
633	<co id="outputencoding" />xmlSaveFormatFileEnc("-", doc, encoding, 1);
634      </programlisting>
635      <calloutlist>
	<callout arearefs="handlerdatatype">
	  <para><varname>handler</varname> is declared as a pointer to an
	    <structname>xmlCharEncodingHandler</structname> structure, which
	    holds the conversion functions for one encoding.</para>
	</callout>
	<callout arearefs="calcsize">
	  <para>The handler's conversion function needs
	  to be given the size of the input and output strings, which are
	    calculated here for strings <varname>in</varname> and
	  <varname>out</varname>. Each ISO-8859-1 byte converts to at most two
	  UTF-8 bytes, so an output buffer of twice the input length is large
	  enough.</para>
	</callout>
	<callout arearefs="findhandlerfunction">
	  <para><function>xmlFindCharEncodingHandler</function> takes as its
	    argument the data's initial encoding and searches
	    <application>libxml's</application> built-in set of conversion
	    handlers, returning a pointer to the handler or NULL if none is
	    found.</para>
	</callout>
653	<callout arearefs="callconversionfunction">
654	  <para>The conversion function identified by <varname>handler</varname>
655	  requires as its arguments pointers to the input and output strings,
656	  along with the length of each. The lengths must be determined
657	  separately by the application.</para>
658	</callout>
659	<callout arearefs="outputencoding">
660	  <para>To output in a specified encoding rather than UTF-8, we use
661	    <function>xmlSaveFormatFileEnc</function>, specifying the
662	    encoding.</para>
663	</callout>
664      </calloutlist>
665    </para>
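    <para>To make the buffer arithmetic concrete, the sketch below performs
      the ISO-8859-1 to UTF-8 conversion by hand using only the standard C
      library. It is a stand-in for illustration, not the code
      <application>libxml</application> runs internally, and the function
      name <function>latin1_to_utf8</function> is hypothetical. It works only
      for ISO-8859-1, where every byte value is also the character's Unicode
      code point.</para>
    <programlisting><![CDATA[
```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a conversion handler, limited to
 * ISO-8859-1 input. Returns a newly malloc'ed, NUL-terminated UTF-8
 * string (release it with free), or NULL if allocation fails. */
unsigned char *
latin1_to_utf8(const char *in) {
	size_t size = strlen(in) + 1;	/* input bytes plus NUL */
	size_t out_size = size * 2 - 1;	/* worst case: every byte doubles */
	unsigned char *out = malloc(out_size);
	unsigned char *p = out;
	const unsigned char *s = (const unsigned char *)in;

	if (out == NULL)
		return NULL;

	for (; *s != '\0'; s++) {
		if (*s < 0x80) {
			*p++ = *s;	/* ASCII is unchanged */
		} else {
			/* 0x80..0xFF become a two-byte sequence */
			*p++ = (unsigned char)(0xC0 | (*s >> 6));
			*p++ = (unsigned char)(0x80 | (*s & 0x3F));
		}
	}
	*p = '\0';
	return out;
}
```
]]></programlisting>
    <para>The buffer size mirrors the <varname>size*2-1</varname> calculation
      in the listing above: two output bytes for each input byte in the worst
      case, plus one terminating NUL.</para>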
666  </sect1>
667
668  <appendix id="compilation">
669    <title>Compilation</title>
670    <para><indexterm>
671	<primary>compiler flags</primary>
672      </indexterm>
673      <application>Libxml</application> includes a script,
674    <application>xml2-config</application>, that can be used to generate
675    flags for compilation and linking of programs written with the
676      library. For pre-processor and compiler flags, use <command>xml2-config
677	--cflags</command>. For library linking flags, use <command>xml2-config
678	--libs</command>. Other options are available using <command>xml2-config
679    --help</command>.</para>   
680  </appendix>
681
682  <appendix id="sampledoc">
683    <title>Sample Document</title>
684    <programlisting>&STORY;</programlisting>
685  </appendix>
686  <appendix id="keywordappendix">
687    <title>Code for Keyword Example</title>
688    <para>
689      <programlisting>&KEYWORD;</programlisting>
690    </para>
691  </appendix>
692  <appendix id="xpathappendix">
693    <title>Code for XPath Example</title>
694    <para>
695      <programlisting>&XPATH;</programlisting>
696    </para>
697  </appendix>
698<appendix id="addkeywordappendix">
699    <title>Code for Add Keyword Example</title>
700    <para>
701      <programlisting>&ADDKEYWORD;</programlisting>
702    </para>
703  </appendix>
704<appendix id="addattributeappendix">
705    <title>Code for Add Attribute Example</title>
706    <para>
707      <programlisting>&ADDATTRIBUTE;</programlisting>
708    </para>
709  </appendix>
710<appendix id="getattributeappendix">
711    <title>Code for Retrieving Attribute Value Example</title>
712    <para>
713      <programlisting>&GETATTRIBUTE;</programlisting>
714    </para>
715  </appendix>
716  <appendix id="convertappendix">
717    <title>Code for Encoding Conversion Example</title>
718    <para>
719      <programlisting>&CONVERT;</programlisting>
720    </para>
721  </appendix>
722  <appendix>
723    <title>Acknowledgements</title>
724    <para>A number of people have generously offered feedback, code and
725    suggested improvements to this tutorial. In no particular order:
726      <simplelist type="inline">
727	<member>Daniel Veillard</member>
728	<member>Marcus Labib Iskander</member>
729	<member>Christopher R. Harris</member>
730	<member>Igor Zlatkovic</member>
731	<member>Niraj Tolia</member>
732	<member>David Turover</member>
733      </simplelist>
734    </para>
735  </appendix>
736  <index />
737</article>
738