<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<style type="text/css"><!--
TD {font-size: 14pt; font-family: Verdana,Arial,Helvetica}
BODY {font-size: 14pt; font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-size: 20pt; font-family: Verdana,Arial,Helvetica}
H2 {font-size: 18pt; font-family: Verdana,Arial,Helvetica}
H3 {font-size: 16pt; font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>Entities or no entities</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>Entities or no entities</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td
bgcolor="#fffacd"><ul>
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XMLinfo.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="APIconstructors.html">Constructors</a></li>
<li><a href="APIfunctions.html">Functions/Types</a></li>
<li><a href="APIfiles.html">Modules</a></li>
<li><a href="APIsymbols.html">Symbols</a></li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>In principle, entities are similar to simple C macros. An entity defines an
abbreviation for a given string that you can reuse many times throughout the
content of your document. Entities are especially useful when a given string
occurs frequently within a document, or when you want to confine future
changes to a restricted area: the internal subset at the beginning of the
document. Example:</p>
<pre>1 &lt;?xml version="1.0"?&gt;
2 &lt;!DOCTYPE EXAMPLE SYSTEM "example.dtd" [
3 &lt;!ENTITY xml "Extensible Markup Language"&gt;
4 ]&gt;
5 &lt;EXAMPLE&gt;
6    &amp;xml;
7 &lt;/EXAMPLE&gt;</pre>
<p>Line 3 declares the xml entity. Line 6 uses the xml entity, by prefixing
its name with '&amp;' and following it by ';' without any spaces added.
There
are 5 predefined entities in libxml allowing you to escape characters with
predefined meaning in some parts of the XML document content:
<strong>&amp;lt;</strong> for the character '&lt;', <strong>&amp;gt;</strong>
for the character '&gt;', <strong>&amp;apos;</strong> for the character ''',
<strong>&amp;quot;</strong> for the character '"', and
<strong>&amp;amp;</strong> for the character '&amp;'.</p>
<p>One of the problems related to entities is that you may want the parser to
substitute an entity's content so that you can see the replacement text in
your application. Or you may prefer to keep entity references as such in the
content, so that you can save the document back without losing this usually
precious information (if the user went through the pain of explicitly
defining entities, he may have a rather negative attitude if you blindly
substitute them at save time). The <a href="html/libxml-parser.html#XMLSUBSTITUTEENTITIESDEFAULT">xmlSubstituteEntitiesDefault()</a>
function lets you check and change this behaviour; by default, entities are
not substituted.</p>
<p>Here is the DOM tree built by libxml for the previous document in the
default case:</p>
<pre>/gnome/src/gnome-xml -&gt; /xmllint --debug test/ent1
DOCUMENT
version=1.0
  ELEMENT EXAMPLE
    TEXT
    content=
    ENTITY_REF
      INTERNAL_GENERAL_ENTITY xml
      content=Extensible Markup Language
    TEXT
    content=</pre>
<p>And here is the result when substituting entities:</p>
<pre>/gnome/src/gnome-xml -&gt; /tester --debug --noent test/ent1
DOCUMENT
version=1.0
  ELEMENT EXAMPLE
    TEXT
    content= Extensible Markup Language</pre>
<p>So, entities or no entities? Basically, it depends on your use case.
I
suggest that you keep the non-substituting default behaviour and avoid using
entities in your XML document or data if you are not willing to handle the
entity reference nodes in the DOM tree.</p>
<p>Note that at save time libxml enforces the conversion of the predefined
entities where necessary to prevent well-formedness problems. When parsing,
it also transparently replaces the predefined entities with the corresponding
characters (i.e. it will not generate entity reference nodes in the DOM tree
or call the reference() SAX callback when finding them in the input).</p>
<p>
<span style="background-color: #FF0000">WARNING</span>: handling entities
on top of the libxml SAX interface is difficult!!! If you plan to use
non-predefined entities in your documents, then the learning curve to handle
them using the SAX API may be long. If you plan to use complex documents, I
strongly suggest you consider using the DOM interface instead and let libxml
deal with the complexity rather than trying to do it yourself.</p>
<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>