/*
**********************************************************************
*   Copyright (C) 1999-2007, International Business Machines
*   Corporation and others.  All Rights Reserved.
**********************************************************************
*   Date        Name        Description
*   11/17/99    aliu        Creation.
**********************************************************************
*/
#ifndef RBT_H
#define RBT_H

#include "unicode/utypes.h"

#if !UCONFIG_NO_TRANSLITERATION

#include "unicode/translit.h"
#include "unicode/utypes.h"
#include "unicode/parseerr.h"
#include "unicode/udata.h"

#define U_ICUDATA_TRANSLIT U_ICUDATA_NAME U_TREE_SEPARATOR_STRING "translit"
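
/*
 * Usage sketch (hedged; illustrative only). RuleBasedTransliterator is
 * marked @internal, so client code would normally build a rule-based
 * transliterator through the public factory Transliterator::createFromRules(),
 * passing rule text in the syntax documented for RuleBasedTransliterator
 * below. The helper name demoRuleBasedTransliteration() and the ID "Demo" are
 * hypothetical. The rule string is the three example rules from the class
 * documentation. The UnicodeString(const char*) constructor converts via the
 * default codepage, so this sketch assumes an ASCII-compatible platform and
 * conversion support (!UCONFIG_NO_CONVERSION).
 *
 *   #include "unicode/translit.h"
 *   #include "unicode/parseerr.h"
 *
 *   U_NAMESPACE_USE
 *
 *   // Hypothetical helper: applies example rules 1-3 to "adefabcdefz".
 *   UnicodeString demoRuleBasedTransliteration(UErrorCode& status) {
 *       UParseError parseError;
 *       Transliterator* t = Transliterator::createFromRules(
 *           UnicodeString("Demo"),                        // hypothetical ID
 *           UnicodeString("abc{def}>x|y; xyz>r; yz>q;"),  // rules 1-3
 *           UTRANS_FORWARD, parseError, status);
 *       UnicodeString text("adefabcdefz");
 *       if (U_SUCCESS(status)) {
 *           t->transliterate(text);   // rewrites the whole string in place
 *       }
 *       delete t;                     // deleting a NULL pointer is harmless
 *       return text;
 *   }
 *
 * The class documentation traces this same input step by step and arrives
 * at "adefabcxq".
 */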

U_NAMESPACE_BEGIN

class TransliterationRuleData;

/**
 * <p><code>RuleBasedTransliterator</code> is a transliterator
 * that reads a set of rules in order to determine how to perform
 * translations. Rule sets are stored in resource bundles indexed by
 * name. Rules within a rule set are separated by semicolons (';').
 * To include a literal semicolon, prefix it with a backslash ('\').
 * Whitespace, as defined by <code>Character.isWhitespace()</code>,
 * is ignored. If the first non-blank character on a line is '#',
 * the entire line is ignored as a comment. </p>
 *
 * <p>Each set of rules consists of two groups, one forward, and one
 * reverse. This is a convention that is not enforced; rules for one
 * direction may be omitted, with the result that translations in
 * that direction will not modify the source text.
 * In addition, bidirectional forward-reverse rules may be specified for
 * symmetrical transformations.</p>
 *
 * <p><b>Rule syntax</b> </p>
 *
 * <p>Rule statements take one of the following forms: </p>
 *
 * <dl>
 * <dt><code>$alefmadda=\u0622;</code></dt>
 * <dd><strong>Variable definition.</strong> The name on the
 * left is assigned the text on the right. In this example,
 * after this statement, instances of the left-hand name,
 * "<code>$alefmadda</code>", will be replaced by
 * the Unicode character U+0622. Variable names must begin
 * with a letter and consist only of letters, digits, and
 * underscores. Case is significant. Duplicate names are
 * reported as an error; that is, variables cannot be
 * redefined. The right-hand side may contain well-formed
 * text of any length, including no text at all ("<code>$empty=;</code>").
 * The right-hand side may contain embedded <code>UnicodeSet</code>
 * patterns, for example, "<code>$softvowel=[eiyEIY]</code>".</dd>
 * <dd> </dd>
 * <dt><code>ai>$alefmadda;</code></dt>
 * <dd><strong>Forward translation rule.</strong> This rule
 * states that the string on the left will be changed to the
 * string on the right when performing forward
 * transliteration.</dd>
 * <dt> </dt>
 * <dt><code>ai<$alefmadda;</code></dt>
 * <dd><strong>Reverse translation rule.</strong> This rule
 * states that the string on the right will be changed to
 * the string on the left when performing reverse
 * transliteration.</dd>
 * </dl>
 *
 * <dl>
 * <dt><code>ai<>$alefmadda;</code></dt>
 * <dd><strong>Bidirectional translation rule.</strong> This
 * rule states that the string on the left will be changed
 * to the string on the right when performing forward
 * transliteration, and vice versa when performing reverse
 * transliteration.</dd>
 * </dl>
 *
 * <p>Translation rules consist of a <em>match pattern</em> and an <em>output
 * string</em>. The match pattern consists of literal characters,
 * optionally preceded by context, and optionally followed by
 * context. Context characters, like literal pattern characters,
 * must be matched in the text being transliterated. However, unlike
 * literal pattern characters, they are not replaced by the output
 * text. For example, the pattern "<code>abc{def}</code>"
 * indicates the characters "<code>def</code>" must be
 * preceded by "<code>abc</code>" for a successful match.
 * If there is a successful match, "<code>def</code>" will
 * be replaced, but not "<code>abc</code>".
 * The final '<code>}</code>' is optional, so "<code>abc{def</code>" is equivalent to
 * "<code>abc{def}</code>". Another example is "<code>{123}456</code>"
 * (or "<code>123}456</code>") in which the literal
 * pattern "<code>123</code>" must be followed by "<code>456</code>".
 * </p>
 *
 * <p>The output string of a forward or reverse rule consists of
 * characters to replace the literal pattern characters. If the
 * output string contains the character '<code>|</code>', this is
 * taken to indicate the location of the <em>cursor</em> after
 * replacement. The cursor is the point in the text at which the
 * next replacement, if any, will be applied. The cursor is usually
 * placed within the replacement text; however, it can actually be
 * placed into the preceding or following context by using the
 * special character '<code>@</code>'.
 * Examples:</p>
 *
 * <blockquote>
 * <p><code>a {foo} z > | @ bar; # foo -> bar, move cursor before a<br>
 * {foo} xyz > bar @@|; # foo -> bar, cursor between y and z</code></p>
 * </blockquote>
 *
 * <p><b>UnicodeSet</b></p>
 *
 * <p><code>UnicodeSet</code> patterns may appear anywhere that
 * makes sense. They may appear in variable definitions.
 * Contrariwise, <code>UnicodeSet</code> patterns may themselves
 * contain variable references, such as "<code>$a=[a-z];$not_a=[^$a]</code>",
 * or "<code>$range=a-z;$ll=[$range]</code>".</p>
 *
 * <p><code>UnicodeSet</code> patterns may also be embedded directly
 * into rule strings.
 * Thus, the following two rules are equivalent:</p>
 *
 * <blockquote>
 * <p><code>$vowel=[aeiou]; $vowel>'*'; # One way to do this<br>
 * [aeiou]>'*'; # Another way</code></p>
 * </blockquote>
 *
 * <p>See {@link UnicodeSet} for more documentation and examples.</p>
 *
 * <p><b>Segments</b></p>
 *
 * <p>Segments of the input string can be matched and copied to the
 * output string. This makes certain sets of rules simpler and more
 * general, and makes reordering possible.
 * For example:</p>
 *
 * <blockquote>
 * <p><code>([a-z]) > $1 $1; # double lowercase letters<br>
 * ([:Lu:]) ([:Ll:]) > $2 $1; # reverse order of Lu-Ll pairs</code></p>
 * </blockquote>
 *
 * <p>The segment of the input string to be copied is delimited by
 * "<code>(</code>" and "<code>)</code>". Up to
 * nine segments may be defined. Segments may not overlap. In the
 * output string, "<code>$1</code>" through "<code>$9</code>"
 * represent the input string segments, in left-to-right order of
 * definition.</p>
 *
 * <p><b>Anchors</b></p>
 *
 * <p>Patterns can be anchored to the beginning or the end of the text. This is done with the
 * special characters '<code>^</code>' and '<code>$</code>'.
 * For example:</p>
 *
 * <blockquote>
 * <p><code>^ a > 'BEG_A'; # match 'a' at start of text<br>
 * a > 'A'; # match other instances of 'a'<br>
 * z $ > 'END_Z'; # match 'z' at end of text<br>
 * z > 'Z'; # match other instances of 'z'</code></p>
 * </blockquote>
 *
 * <p>It is also possible to match the beginning or the end of the text using a <code>UnicodeSet</code>.
 * This is done by including a virtual anchor character '<code>$</code>' at the end of the
 * set pattern. Although this is usually the match character for the end anchor, the set will
 * match either the beginning or the end of the text, depending on its placement.
 * For example:</p>
 *
 * <blockquote>
 * <p><code>$x = [a-z$]; # match 'a' through 'z' OR anchor<br>
 * $x 1 > 2; # match '1' after a-z or at the start<br>
 * 3 $x > 4; # match '3' before a-z or at the end</code></p>
 * </blockquote>
 *
 * <p><b>Example</b> </p>
 *
 * <p>The following example rules illustrate many of the features of
 * the rule language.
 * </p>
 *
 * <table border="0" cellpadding="4">
 * <tr>
 * <td valign="top">Rule 1.</td>
 * <td valign="top" nowrap><code>abc{def}>x|y</code></td>
 * </tr>
 * <tr>
 * <td valign="top">Rule 2.</td>
 * <td valign="top" nowrap><code>xyz>r</code></td>
 * </tr>
 * <tr>
 * <td valign="top">Rule 3.</td>
 * <td valign="top" nowrap><code>yz>q</code></td>
 * </tr>
 * </table>
 *
 * <p>Applying these rules to the string "<code>adefabcdefz</code>"
 * yields the following results: </p>
 *
 * <table border="0" cellpadding="4">
 * <tr>
 * <td valign="top" nowrap><code>|adefabcdefz</code></td>
 * <td valign="top">Initial state, no rules match. Advance
 * cursor.</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>a|defabcdefz</code></td>
 * <td valign="top">Still no match. Rule 1 does not match
 * because the preceding context is not present.</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>ad|efabcdefz</code></td>
 * <td valign="top">Still no match. Keep advancing until
 * there is a match...</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>ade|fabcdefz</code></td>
 * <td valign="top">...</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>adef|abcdefz</code></td>
 * <td valign="top">...</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>adefa|bcdefz</code></td>
 * <td valign="top">...</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>adefab|cdefz</code></td>
 * <td valign="top">...</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>adefabc|defz</code></td>
 * <td valign="top">Rule 1 matches; replace "<code>def</code>"
 * with "<code>xy</code>" and back up the cursor
 * to before the '<code>y</code>'.</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>adefabcx|yz</code></td>
 * <td valign="top">Although "<code>xyz</code>" is
 * present, rule 2 does not match because the cursor is
 * before the '<code>y</code>', not before the '<code>x</code>'.
 * Rule 3 does match.
 * Replace "<code>yz</code>" with "<code>q</code>".</td>
 * </tr>
 * <tr>
 * <td valign="top" nowrap><code>adefabcxq|</code></td>
 * <td valign="top">The cursor is at the end;
 * transliteration is complete.</td>
 * </tr>
 * </table>
 *
 * <p>The order of rules is significant. If multiple rules may match
 * at some point, the first matching rule is applied. </p>
 *
 * <p>Forward and reverse rules may have an empty output string.
 * Otherwise, an empty left-hand or right-hand side of any statement is a
 * syntax error. </p>
 *
 * <p>Single quotes are used to quote any character other than a
 * digit or letter. To specify a single quote itself, inside or
 * outside of quotes, use two single quotes in a row.
 * For example, the rule "<code>'>'>o''clock</code>" changes the
 * string "<code>></code>" to the string "<code>o'clock</code>".
 * </p>
 *
 * <p><b>Notes</b> </p>
 *
 * <p>While a RuleBasedTransliterator is being built, it checks that
 * the rules are added in proper order. For example, if the rule
 * "a>x" is followed by the rule "ab>y",
 * then the second rule causes an error (<code>U_RULE_MASK_ERROR</code>).
 * The reason is that the second rule can never be triggered, since the
 * first rule always matches any text the second rule matches. In other
 * words, the first rule <em>masks</em> the second rule. </p>
 *
 * @author Alan Liu
 * @internal Use transliterator factory methods instead since this class will be removed in a future release.
 */
class RuleBasedTransliterator : public Transliterator {
private:
    /**
     * The data object is immutable, so we can freely share it with
     * other instances of RBT, as long as we do NOT own this object.
     * TODO: data is no longer immutable. See bugs #1866, 2155
     */
    TransliterationRuleData* fData;

    /**
     * If true, we own the data object and must delete it.
     */
    UBool isDataOwned;

public:

    /**
     * Constructs a new transliterator from the given rules.
     * @param id the id for the transliterator.
     * @param rules rules, separated by ';'
     * @param direction either UTRANS_FORWARD or UTRANS_REVERSE.
     * @param adoptedFilter the filter for the transliterator; this object takes ownership of it.
     * @param parseError receives the location of a syntax error in the rules, if any.
     * @param status input-output error code; set to a failure code if the rules are malformed.
     * @internal Use transliterator factory methods instead since this class will be removed in a future release.
     */
    RuleBasedTransliterator(const UnicodeString& id,
                            const UnicodeString& rules,
                            UTransDirection direction,
                            UnicodeFilter* adoptedFilter,
                            UParseError& parseError,
                            UErrorCode& status);

    /**
     * Constructs a new transliterator from the given rules.
     * @param rules rules, separated by ';'
     * @param direction either UTRANS_FORWARD or UTRANS_REVERSE.
     * @param status set to a failure code if the rules are malformed.
     * @internal Use transliterator factory methods instead since this class will be removed in a future release.
     */
    /*RuleBasedTransliterator(const UnicodeString& id,
                              const UnicodeString& rules,
                              UTransDirection direction,
                              UnicodeFilter* adoptedFilter,
                              UErrorCode& status);*/

    /**
     * Convenience constructor with no filter.
     * @internal Use transliterator factory methods instead since this class will be removed in a future release.
     */
    /*RuleBasedTransliterator(const UnicodeString& id,
                              const UnicodeString& rules,
                              UTransDirection direction,
                              UErrorCode& status);*/

    /**
     * Convenience constructor with no filter and FORWARD direction.
     * @internal Use transliterator factory methods instead since this class will be removed in a future release.
     */
    /*RuleBasedTransliterator(const UnicodeString& id,
                            const UnicodeString& rules,
                            UErrorCode& status);*/

    /**
     * Convenience constructor with FORWARD direction.
     * @internal Use transliterator factory methods instead, since this class will be removed in a future release.
     */
    /*RuleBasedTransliterator(const UnicodeString& id,
                            const UnicodeString& rules,
                            UnicodeFilter* adoptedFilter,
                            UErrorCode& status);*/
private:

    friend class TransliteratorRegistry; // to access TransliterationRuleData convenience ctor
    /**
     * Convenience constructor.
     * @param id the id for the transliterator.
     * @param theData the rule data for the transliterator.
     * @param adoptedFilter the filter for the transliterator.
     */
    RuleBasedTransliterator(const UnicodeString& id,
                            const TransliterationRuleData* theData,
                            UnicodeFilter* adoptedFilter = 0);


    friend class Transliterator; // to access following ctor

    /**
     * Internal constructor.
     * @param id the id for the transliterator.
     * @param data the rule data for the transliterator.
     * @param isDataAdopted determines who owns the 'data' object. If TRUE, the transliterator adopts 'data' and the caller must not delete it.
     */
    RuleBasedTransliterator(const UnicodeString& id,
                            TransliterationRuleData* data,
                            UBool isDataAdopted);

public:

    /**
     * Copy constructor.
     * @internal Use transliterator factory methods instead, since this class will be removed in a future release.
     */
    RuleBasedTransliterator(const RuleBasedTransliterator&);

    virtual ~RuleBasedTransliterator();

    /**
     * Implements the Transliterator API.
     * @internal Use transliterator factory methods instead, since this class will be removed in a future release.
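     *
     * <p>A minimal usage sketch; <code>t</code> is a hypothetical,
     * already-constructed transliterator. The caller owns the returned
     * copy and is responsible for deleting it:
     * <pre>
     * Transliterator* copy = t->clone();
     * if (copy != NULL) {
     *     // ... use copy independently of t ...
     *     delete copy;
     * }
     * </pre>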
     */
    virtual Transliterator* clone(void) const;

protected:
    /**
     * Implements {@link Transliterator#handleTransliterate}.
     * @internal Use transliterator factory methods instead, since this class will be removed in a future release.
     */
    virtual void handleTransliterate(Replaceable& text, UTransPosition& offsets,
                                     UBool isIncremental) const;

public:
    /**
     * Returns a representation of this transliterator as source rules.
     * These rules will produce an equivalent transliterator if used
     * to construct a new transliterator.
     * @param result the string to receive the rules.  Previous
     * contents will be deleted.
     * @param escapeUnprintable if TRUE then convert unprintable
     * characters to their hex escape representations, \uxxxx or
     * \Uxxxxxxxx.  Unprintable characters are those other than
     * U+000A, U+0020..U+007E.
     * @internal Use transliterator factory methods instead, since this class will be removed in a future release.
     */
    virtual UnicodeString& toRules(UnicodeString& result,
                                   UBool escapeUnprintable) const;

protected:
    /**
     * Implements the Transliterator framework.
     */
    virtual void handleGetSourceSet(UnicodeSet& result) const;

public:
    /**
     * Overrides the Transliterator framework.
     */
    virtual UnicodeSet& getTargetSet(UnicodeSet& result) const;

    /**
     * Returns the class ID for this class.  This is useful only for
     * comparing to a return value from getDynamicClassID().  For example:
     * <pre>
     * .   Base* polymorphic_pointer = createPolymorphicObject();
     * .   if (polymorphic_pointer->getDynamicClassID() ==
     * .       Derived::getStaticClassID()) ...
     * </pre>
     * @return The class ID for all objects of this class.
     * @internal Use transliterator factory methods instead, since this class will be removed in a future release.
     */
    U_I18N_API static UClassID U_EXPORT2 getStaticClassID(void);

    /**
     * Returns a unique class ID <b>polymorphically</b>.  This method
     * implements a simple version of RTTI, since not all C++
     * compilers support genuine RTTI.  Polymorphic operator==() and
     * clone() methods call this method.
     *
     * @return The class ID for this object.  All objects of a given
     * class have the same class ID.  Objects of other classes have
     * different class IDs.
     */
    virtual UClassID getDynamicClassID(void) const;

private:

    void _construct(const UnicodeString& rules,
                    UTransDirection direction,
                    UParseError& parseError,
                    UErrorCode& status);
};


U_NAMESPACE_END

#endif /* #if !UCONFIG_NO_TRANSLITERATION */

#endif