/*
 ********************************************************************
 * COPYRIGHT:
 * Copyright (c) 1996-2010, International Business Machines Corporation and
 * others. All Rights Reserved.
 ********************************************************************
 */

#ifndef NORMLZR_H
#define NORMLZR_H

#include "unicode/utypes.h"

/**
 * \file
 * \brief C++ API: Unicode Normalization
 */

#if !UCONFIG_NO_NORMALIZATION

#include "unicode/chariter.h"
#include "unicode/normalizer2.h"
#include "unicode/unistr.h"
#include "unicode/unorm.h"
#include "unicode/uobject.h"

U_NAMESPACE_BEGIN
/**
 * The Normalizer class supports the standard normalization forms described in
 * <a href="http://www.unicode.org/unicode/reports/tr15/" target="unicode">
 * Unicode Standard Annex #15: Unicode Normalization Forms</a>.
 *
 * Note: This API has been replaced by the Normalizer2 class and is only available
 * for backward compatibility. This class simply delegates to the Normalizer2 class.
 * There is one exception: The new API does not provide a replacement for Normalizer::compare().
 *
 * The Normalizer class consists of two parts:
 * - static functions that normalize strings or test if strings are normalized
 * - a Normalizer object is an iterator that takes any kind of text and
 *   provides iteration over its normalized form
 *
 * The Normalizer class is not suitable for subclassing.
 *
 * For basic information about normalization forms and details about the C API,
 * please see the documentation in unorm.h.
 *
 * The iterator API with the Normalizer constructors and the non-static functions
 * uses a CharacterIterator as input. It is possible to pass a string, which
 * is then internally wrapped in a CharacterIterator.
 * The input text is not normalized all at once, but incrementally where needed
 * (providing efficient random access).
 * This allows passing in a large text but spending only a small amount of time
 * normalizing a small part of that text.
 * However, if the entire text is normalized, then the iterator will be
 * slower than normalizing the entire text at once and iterating over the result.
 * Another possible use of the Normalizer iterator is to report an index into the
 * original text that is close to where the normalized characters come from.
 *
 * <em>Important:</em> The iterator API was cleaned up significantly for ICU 2.0.
 * The earlier implementation reported the getIndex() inconsistently,
 * and previous() could not be used after setIndex(), next(), first(), and current().
 *
 * Normalizer allows normalization to start anywhere in the input text by
 * calling setIndexOnly(), first(), or last().
 * Without calling any of these, the iterator will start at the beginning of the text.
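 *
 * A minimal sketch of starting in the middle of the text, assuming the
 * <code>icu</code> namespace alias and a caller-supplied index <code>start</code>;
 * the helper name firstNFCAt() is hypothetical:
 * \code
 * // Returns the first NFC code point at or after input index start,
 * // normalizing only the piece of text that begins there.
 * UChar32 firstNFCAt(const icu::UnicodeString &text, int32_t start) {
 *     icu::Normalizer n(text, UNORM_NFC);
 *     n.setIndexOnly(start);  // repositions without normalizing
 *     return n.next();        // icu::Normalizer::DONE if nothing is left
 * }
 * \endcode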
 *
 * At any time, next() returns the next normalized code point (UChar32),
 * with post-increment semantics (like CharacterIterator::next32PostInc()).
 * previous() returns the previous normalized code point (UChar32),
 * with pre-decrement semantics (like CharacterIterator::previous32()).
 *
 * current() returns the current code point without moving
 * the getIndex(). Note that if the text at the current position
 * needs to be normalized, then current() will do that.
 * (This is why current() is not const.)
 * To move to a new position, it is more efficient to call setIndexOnly(),
 * which does not normalize.
 *
 * getIndex() always refers to the position in the input text where the normalized
 * code points are returned from. It does not always change with each returned
 * code point.
 * The code point that is returned from any of the functions
 * corresponds to text at or after getIndex(), according to the
 * function's iteration semantics (post-increment or pre-decrement).
 *
 * next() returns a code point from at or after the getIndex()
 * from before the next() call. After the next() call, the getIndex()
 * might have moved to where the next code point will be returned from
 * (from a next() or current() call).
 * This is semantically equivalent to array access with array[index++]
 * (post-increment semantics).
 *
 * previous() returns a code point from at or after the getIndex()
 * from after the previous() call.
 * This is semantically equivalent to array access with array[--index]
 * (pre-decrement semantics).
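 *
 * A minimal sketch of both iteration directions, assuming the <code>icu</code>
 * namespace alias; the helper name countBothWays() is hypothetical:
 * \code
 * // Counts the NFD code points of text once with first()/next()
 * // (post-increment) and once with last()/previous() (pre-decrement).
 * // Each loop ends when icu::Normalizer::DONE (0xffff) is returned, so an
 * // input containing U+FFFF would also end the loop at that point.
 * void countBothWays(const icu::UnicodeString &text,
 *                    int32_t &forward, int32_t &backward) {
 *     icu::Normalizer n(text, UNORM_NFD);
 *     forward = 0;
 *     for (UChar32 c = n.first(); c != icu::Normalizer::DONE; c = n.next()) {
 *         ++forward;
 *     }
 *     backward = 0;
 *     for (UChar32 c = n.last(); c != icu::Normalizer::DONE; c = n.previous()) {
 *         ++backward;
 *     }
 * }
 * \endcode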
 *
 * Internally, the Normalizer iterator normalizes a small piece of text
 * starting at the getIndex() and ending at a following "safe" index.
 * The normalized result is stored in an internal string buffer, and
 * the code points are iterated from there.
 * Successive iteration calls read from this buffer until the next piece
 * of text needs to be normalized, at which point the getIndex() moves.
 *
 * The following "safe" index, the internal buffer, and the secondary
 * iteration index into that buffer are not exposed through the API.
 * This also means that it is currently not practical to return to
 * a particular, arbitrary position in the text, because one would need to
 * know, and be able to set, not only the getIndex() but at least also the
 * current index into the internal buffer.
 * It is currently only possible to observe when getIndex() changes
 * (with careful consideration of the iteration semantics),
 * at which time the internal index will be 0.
 * For example, if getIndex() is different after next() than before it,
 * then the internal index is 0 and one can return to this getIndex()
 * later with setIndexOnly().
 *
 * Note: While setIndex() and getIndex() refer to indices in the
 * underlying Unicode input text, the next() and previous() methods
 * iterate through characters in the normalized output.
 * This means that there is not necessarily a one-to-one correspondence
 * between characters returned by next() and previous() and the indices
 * passed to and returned from setIndex() and getIndex().
 * It is for this reason that Normalizer does not implement the CharacterIterator interface.
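 *
 * A minimal sketch of the getIndex() observation described above, assuming the
 * <code>icu</code> namespace alias. It collects input indices that should be
 * safe to pass back to setIndexOnly() later. The helper name
 * collectResumePoints() is hypothetical:
 * \code
 * #include <vector>
 *
 * std::vector<int32_t> collectResumePoints(const icu::UnicodeString &text) {
 *     icu::Normalizer n(text, UNORM_NFC);
 *     std::vector<int32_t> points;
 *     int32_t before = n.getIndex();
 *     points.push_back(before);  // the start of the text
 *     while (n.next() != icu::Normalizer::DONE) {
 *         int32_t after = n.getIndex();
 *         if (after != before) {
 *             // getIndex() moved during next(), so the internal buffer
 *             // index is 0 and this position can be resumed later.
 *             points.push_back(after);
 *             before = after;
 *         }
 *     }
 *     return points;
 * }
 * \endcode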
 *
 * @author Laura Werner, Mark Davis, Markus Scherer
 * @stable ICU 2.0
 */
class U_COMMON_API Normalizer : public UObject {
public:
  /**
   * If DONE is returned from an iteration function that returns a code point,
   * then there are no more normalization results available.
   * @stable ICU 2.0
   */
  enum {
      DONE=0xffff
  };

  // Constructors

  /**
   * Creates a new <code>Normalizer</code> object for iterating over the
   * normalized form of a given string.
   * <p>
   * @param str The string to be normalized. The normalization
   *            will start at the beginning of the string.
   *
   * @param mode The normalization mode.
   * @stable ICU 2.0
   */
  Normalizer(const UnicodeString& str, UNormalizationMode mode);

  /**
   * Creates a new <code>Normalizer</code> object for iterating over the
   * normalized form of a given string.
   * <p>
   * @param str The string to be normalized. The normalization
   *            will start at the beginning of the string.
   *
   * @param length Length of the string, or -1 if NUL-terminated.
   * @param mode The normalization mode.
   * @stable ICU 2.0
   */
  Normalizer(const UChar* str, int32_t length, UNormalizationMode mode);

  /**
   * Creates a new <code>Normalizer</code> object for iterating over the
   * normalized form of the given text.
   * <p>
   * @param iter The input text to be normalized. The normalization
   *             will start at the beginning of the text.
   *
   * @param mode The normalization mode.
   * @stable ICU 2.0
   */
  Normalizer(const CharacterIterator& iter, UNormalizationMode mode);

  /**
   * Copy constructor.
   * @param copy The object to be copied.
   * @stable ICU 2.0
   */
  Normalizer(const Normalizer& copy);

  /**
   * Destructor
   * @stable ICU 2.0
   */
  virtual ~Normalizer();


  //-------------------------------------------------------------------------
  // Static utility methods
  //-------------------------------------------------------------------------

  /**
   * Normalizes a <code>UnicodeString</code> according to the specified normalization mode.
   * This is a wrapper for unorm_normalize(), using UnicodeStrings.
   *
   * The <code>options</code> parameter specifies which optional
   * <code>Normalizer</code> features are to be enabled for this operation.
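   *
   * A minimal sketch, assuming the <code>icu</code> namespace alias; the helper
   * name composesAcuteE() is hypothetical:
   * \code
   * // Normalizes "e" + U+0301 COMBINING ACUTE ACCENT to NFC and checks
   * // that the result is the single precomposed code point U+00E9.
   * UBool composesAcuteE() {
   *     UErrorCode status = U_ZERO_ERROR;
   *     icu::UnicodeString source((UChar32)0x65);
   *     source.append((UChar32)0x301);
   *     icu::UnicodeString result;
   *     icu::Normalizer::normalize(source, UNORM_NFC, 0, result, status);
   *     return U_SUCCESS(status) && result == icu::UnicodeString((UChar32)0xE9);
   * }
   * \endcode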
   *
   * @param source the input string to be normalized.
   * @param mode the normalization mode
   * @param options the optional features to be enabled (0 for no options)
   * @param result The normalized string (on output).
   * @param status The error code.
   * @stable ICU 2.0
   */
  static void U_EXPORT2 normalize(const UnicodeString& source,
                        UNormalizationMode mode, int32_t options,
                        UnicodeString& result,
                        UErrorCode &status);

  /**
   * Compose a <code>UnicodeString</code>.
   * This is equivalent to normalize() with mode UNORM_NFC or UNORM_NFKC.
   * This is a wrapper for unorm_normalize(), using UnicodeStrings.
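   *
   * A minimal sketch of the effect of <code>compat</code>, assuming the
   * <code>icu</code> namespace alias. It uses U+FB01 LATIN SMALL LIGATURE FI,
   * which has only a compatibility decomposition. The helper name
   * composeLigature() is hypothetical:
   * \code
   * void composeLigature(icu::UnicodeString &canonical,
   *                      icu::UnicodeString &compatible,
   *                      UErrorCode &status) {
   *     icu::UnicodeString source((UChar32)0xFB01);
   *     // compat=FALSE behaves like UNORM_NFC: the ligature is left unchanged.
   *     icu::Normalizer::compose(source, FALSE, 0, canonical, status);
   *     // compat=TRUE behaves like UNORM_NFKC: the result is "fi" (U+0066 U+0069).
   *     icu::Normalizer::compose(source, TRUE, 0, compatible, status);
   * }
   * \endcode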
   *
   * The <code>options</code> parameter specifies which optional
   * <code>Normalizer</code> features are to be enabled for this operation.
   *
   * @param source the string to be composed.
   * @param compat Perform compatibility decomposition before composition.
   *               If this argument is <code>FALSE</code>, only canonical
   *               decomposition will be performed.
   * @param options the optional features to be enabled (0 for no options)
   * @param result The composed string (on output).
   * @param status The error code.
   * @stable ICU 2.0
   */
  static void U_EXPORT2 compose(const UnicodeString& source,
                        UBool compat, int32_t options,
                        UnicodeString& result,
                        UErrorCode &status);

  /**
   * Static method to decompose a <code>UnicodeString</code>.
   * This is equivalent to normalize() with mode UNORM_NFD or UNORM_NFKD.
   * This is a wrapper for unorm_normalize(), using UnicodeStrings.
   *
   * The <code>options</code> parameter specifies which optional
   * <code>Normalizer</code> features are to be enabled for this operation.
   *
   * @param source the string to be decomposed.
   * @param compat Perform compatibility decomposition.
   *               If this argument is <code>FALSE</code>, only canonical
   *               decomposition will be performed.
   * @param options the optional features to be enabled (0 for no options)
   * @param result The decomposed string (on output).
   * @param status The error code.
   * @stable ICU 2.0
   */
  static void U_EXPORT2 decompose(const UnicodeString& source,
                        UBool compat, int32_t options,
                        UnicodeString& result,
                        UErrorCode &status);

  /**
   * Performs a quick check on a string to determine quickly whether it is
   * in a particular normalization form.
   * This is a wrapper for unorm_quickCheck(), using a UnicodeString.
   *
   * Three results are possible: UNORM_YES, UNORM_NO, or UNORM_MAYBE.
   * UNORM_YES indicates that the argument string is in the desired
   * normalization form, and UNORM_NO indicates that it is not.
   * UNORM_MAYBE indicates that a more thorough check is required;
   * the caller may have to normalize the string and compare the result
   * with the original.
   * @param source the string to check
   * @param mode the normalization form to check for
   * @param status A reference to a UErrorCode to receive any errors
   * @return UNORM_YES, UNORM_NO or UNORM_MAYBE
   *
   * @see isNormalized
   * @stable ICU 2.0
   */
  static inline UNormalizationCheckResult
  quickCheck(const UnicodeString &source, UNormalizationMode mode, UErrorCode &status);

  /**
   * Performs a quick check on a string; same as the other version of quickCheck
   * but takes an extra options parameter like most normalization functions.
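   *
   * A minimal sketch of resolving a UNORM_MAYBE result with isNormalized(),
   * assuming the <code>icu</code> namespace alias; the helper name isNFC() is
   * hypothetical:
   * \code
   * UBool isNFC(const icu::UnicodeString &s, UErrorCode &status) {
   *     UNormalizationCheckResult qc =
   *         icu::Normalizer::quickCheck(s, UNORM_NFC, 0, status);
   *     if (qc == UNORM_MAYBE) {
   *         // Only the slower, definitive test can decide this case.
   *         return icu::Normalizer::isNormalized(s, UNORM_NFC, 0, status);
   *     }
   *     return qc == UNORM_YES;
   * }
   * \endcode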
   *
   * @param source the string to check
   * @param mode the normalization form to check for
   * @param options the optional features to be enabled (0 for no options)
   * @param status A reference to a UErrorCode to receive any errors
   * @return UNORM_YES, UNORM_NO or UNORM_MAYBE
   *
   * @see isNormalized
   * @stable ICU 2.6
   */
  static UNormalizationCheckResult
  quickCheck(const UnicodeString &source, UNormalizationMode mode, int32_t options, UErrorCode &status);

  /**
   * Test if a string is in a given normalization form.
   * This is semantically equivalent to src.equals(normalize(src, mode)).
   *
   * Unlike unorm_quickCheck(), this function returns a definitive result,
   * never a "maybe".
   * For NFD, NFKD, and FCD, both functions work exactly the same.
   * For NFC and NFKC, where quickCheck may return "maybe", this function will
   * perform further tests to arrive at a TRUE/FALSE result.
   *
   * @param src The string to test.
   * @param mode Which normalization form to test for.
   * @param errorCode ICU error code in/out parameter.
   *                  Must fulfill U_SUCCESS before the function call.
   * @return Boolean value indicating whether the source string is in the
   *         "mode" normalization form.
   *
   * @see quickCheck
   * @stable ICU 2.2
   */
  static inline UBool
  isNormalized(const UnicodeString &src, UNormalizationMode mode, UErrorCode &errorCode);

  /**
   * Test if a string is in a given normalization form; same as the other version of isNormalized()
   * but takes an extra options parameter like most normalization functions.
   *
   * @param src String to be tested for the given normalization form.
   * @param mode Which normalization form to test for.
   * @param options the optional features to be enabled (0 for no options)
   * @param errorCode ICU error code in/out parameter.
   *                  Must fulfill U_SUCCESS before the function call.
   * @return Boolean value indicating whether the source string is in the
   *         "mode" normalization form.
   *
   * @see quickCheck
   * @stable ICU 2.6
   */
  static UBool
  isNormalized(const UnicodeString &src, UNormalizationMode mode, int32_t options, UErrorCode &errorCode);

  /**
   * Concatenate normalized strings, making sure that the result is normalized as well.
   *
   * If both the left and the right strings are in
   * the normalization form according to "mode/options",
   * then the result is the same as
   *
   * \code
   *     result = normalize(left + right, mode, options)
   * \endcode
   *
   * For details see unorm_concatenate in unorm.h.
   *
   * @param left Left source string.
   * @param right Right source string.
   * @param result The output string.
   * @param mode The normalization mode.
   * @param options A bit set of normalization options.
   * @param errorCode ICU error code in/out parameter.
   *                  Must fulfill U_SUCCESS before the function call.
   * @return a reference to result
   *
   * @see unorm_concatenate
   * @see normalize
   * @see unorm_next
   * @see unorm_previous
   *
   * @stable ICU 2.1
   */
  static UnicodeString &
  U_EXPORT2 concatenate(UnicodeString &left, UnicodeString &right,
              UnicodeString &result,
              UNormalizationMode mode, int32_t options,
              UErrorCode &errorCode);

  /**
   * Compare two strings for canonical equivalence.
   * Further options include case-insensitive comparison and
   * code point order (as opposed to code unit order).
   *
   * Canonical equivalence between two strings is defined as their normalized
   * forms (NFD or NFC) being identical.
   * This function compares strings incrementally instead of normalizing
   * (and optionally case-folding) both strings entirely,
   * improving performance significantly.
   *
   * Bulk normalization is only necessary if the strings do not fulfill the FCD
   * conditions. Only in this case, and only if the strings are relatively long,
   * is memory allocated temporarily.
   * For FCD strings and short non-FCD strings there is no memory allocation.
   *
   * Semantically, this is equivalent to
   *   strcmp[CodePointOrder](NFD(foldCase(s1)), NFD(foldCase(s2)))
   * where code point order and foldCase are both optional.
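   *
   * Example (an illustrative sketch, not taken from the ICU documentation;
   * the variable names are hypothetical): U+00C5 and the sequence
   * U+0041 U+030A are canonically equivalent, so compare() is expected to
   * return 0 for them, both case-sensitively and with U_COMPARE_IGNORE_CASE
   * against the lowercase U+00E5.
   * \code
   *     UErrorCode errorCode = U_ZERO_ERROR;
   *     UnicodeString precomposed((UChar32)0x00C5);  // A WITH RING ABOVE
   *     UnicodeString decomposed((UChar32)0x0041);   // A
   *     decomposed.append((UChar32)0x030A);          // + COMBINING RING ABOVE
   *
   *     // Default options: case-sensitive, code unit order, FCD quick check.
   *     int32_t result = Normalizer::compare(precomposed, decomposed,
   *                                          U_FOLD_CASE_DEFAULT, errorCode);
   *     // If U_SUCCESS(errorCode), result == 0 (canonically equivalent).
   *
   *     // Case-insensitive: U+00E5 is the lowercase form of U+00C5.
   *     UnicodeString lower((UChar32)0x00E5);
   *     result = Normalizer::compare(lower, decomposed,
   *                                  U_COMPARE_IGNORE_CASE, errorCode);
   *     // If U_SUCCESS(errorCode), result == 0 here as well.
   * \endcode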
   *
   * UAX #21, section 2.5 "Caseless Matching", specifies that for a canonical
   * caseless match the case folding must be performed first, then the normalization.
   *
   * @param s1 First source string.
   * @param s2 Second source string.
   *
   * @param options A bit set of options:
   *   - U_FOLD_CASE_DEFAULT or 0 is used for default options:
   *     Case-sensitive comparison in code unit order, and the input strings
   *     are quick-checked for FCD.
   *
   *   - UNORM_INPUT_IS_FCD
   *     Set if the caller knows that both s1 and s2 fulfill the FCD conditions.
   *     If not set, the function will quickCheck for FCD
   *     and normalize if necessary.
   *
   *   - U_COMPARE_CODE_POINT_ORDER
   *     Set to choose code point order instead of code unit order
   *     (see u_strCompare for details).
   *
   *   - U_COMPARE_IGNORE_CASE
   *     Set to compare strings case-insensitively using case folding,
   *     instead of case-sensitively.
   *     If set, then the following case folding options are used.
   *
   *   - Options as used with case-insensitive comparisons, currently:
   *
   *     - U_FOLD_CASE_EXCLUDE_SPECIAL_I
   *       (see u_strCaseCompare for details)
   *
   *   - regular normalization options shifted left by UNORM_COMPARE_NORM_OPTIONS_SHIFT
   *
   * @param errorCode ICU error code in/out parameter.
   *                  Must fulfill U_SUCCESS before the function call.
   * @return <0 or 0 or >0 as usual for string comparisons
   *
   * @see unorm_compare
   * @see normalize
   * @see UNORM_FCD
   * @see u_strCompare
   * @see u_strCaseCompare
   *
   * @stable ICU 2.2
   */
  static inline int32_t
  compare(const UnicodeString &s1, const UnicodeString &s2,
          uint32_t options,
          UErrorCode &errorCode);

  //-------------------------------------------------------------------------
  // Iteration API
  //-------------------------------------------------------------------------

  /**
   * Return the current character in the normalized text.
   * current() may need to normalize some text at getIndex().
   * The value of getIndex() is not changed.
   *
   * @return the current normalized code point
   * @stable ICU 2.0
   */
  UChar32 current(void);

  /**
   * Return the first character in the normalized text.
   * This is equivalent to setIndexOnly(startIndex()) followed by next().
   * (Post-increment semantics.)
   *
   * @return the first normalized code point
   * @stable ICU 2.0
   */
  UChar32 first(void);

  /**
   * Return the last character in the normalized text.
   * This is equivalent to setIndexOnly(endIndex()) followed by previous().
   * (Pre-decrement semantics.)
   *
   * @return the last normalized code point
   * @stable ICU 2.0
   */
  UChar32 last(void);

  /**
   * Return the next character in the normalized text.
   * (Post-increment semantics.)
   * If the end of the text has already been reached, DONE is returned.
   * The DONE value could be confused with a U+FFFF non-character code point
   * in the text. If this is possible, you can test getIndex()<endIndex()
   * before calling next(), or (getIndex()<endIndex() || last()!=DONE)
   * after calling next(). (Calling last() will change the iterator state!)
   *
   * The C API unorm_next() is more efficient and does not have this ambiguity.
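   *
   * A sketch (not taken from the ICU documentation; the names text and norm
   * are hypothetical) of a forward loop that applies the getIndex()<endIndex()
   * pre-check described above, so that a genuine U+FFFF in the input is not
   * mistaken for DONE. With NFC, this input is expected to yield U+00C5
   * followed by U+FFFF.
   * \code
   *     UnicodeString text((UChar32)0x0041);         // A
   *     text.append((UChar32)0x030A)                 // + COMBINING RING ABOVE
   *         .append((UChar32)0xFFFF);                // + noncharacter U+FFFF
   *     Normalizer norm(text, UNORM_NFC);
   *     while (norm.getIndex() < norm.endIndex()) {
   *         UChar32 c = norm.next();  // may legitimately equal DONE (0xFFFF)
   *         // ... process c ...
   *     }
   * \endcode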
   *
   * @return the next normalized code point
   * @stable ICU 2.0
   */
  UChar32 next(void);

  /**
   * Return the previous character in the normalized text and decrement.
   * (Pre-decrement semantics.)
   * If the beginning of the text has already been reached, DONE is returned.
   * The DONE value could be confused with a U+FFFF non-character code point
   * in the text. If this is possible, you can test
   * (getIndex()>startIndex() || first()!=DONE) after calling previous().
   * (Calling first() will change the iterator state!)
   *
   * The C API unorm_previous() is more efficient and does not have this ambiguity.
   *
   * @return the previous normalized code point
   * @stable ICU 2.0
   */
  UChar32 previous(void);

  /**
   * Set the iteration position in the input text that is being normalized,
   * without any immediate normalization.
   * After setIndexOnly(), getIndex() will return the same index that is
   * specified here.
   *
   * @param index the desired index in the input text.
   * @stable ICU 2.0
   */
  void setIndexOnly(int32_t index);

  /**
   * Reset the index to the beginning of the text.
   * This is equivalent to setIndexOnly(startIndex()).
   * @stable ICU 2.0
   */
  void reset(void);

  /**
   * Retrieve the current iteration position in the input text that is
   * being normalized.
   *
   * A following call to next() will return a normalized code point from
   * the input text at or after this index.
   *
   * After a call to previous(), getIndex() will point at or before the
   * position in the input text where the normalized code point
   * returned by previous() originated.
   *
   * @return the current index in the input text
   * @stable ICU 2.0
   */
  int32_t getIndex(void) const;

  /**
   * Retrieve the index of the start of the input text. This is the begin index
   * of the <code>CharacterIterator</code> or the start (i.e., index 0) of the string
   * over which this <code>Normalizer</code> is iterating.
   *
   * @return the smallest index in the input text where the Normalizer operates
   * @stable ICU 2.0
   */
  int32_t startIndex(void) const;

  /**
   * Retrieve the index of the end of the input text. This is the end index
   * of the <code>CharacterIterator</code> or the length of the string
   * over which this <code>Normalizer</code> is iterating.
   * This end index is exclusive, i.e., the Normalizer operates only on characters
   * before this index.
   *
   * @return the first index in the input text where the Normalizer does not operate
   * @stable ICU 2.0
   */
  int32_t endIndex(void) const;

  /**
   * Returns TRUE when both iterators refer to the same character in the same
   * input text.
   *
   * @param that a Normalizer object to compare this one to
   * @return comparison result
   * @stable ICU 2.0
   */
  UBool operator==(const Normalizer& that) const;

  /**
   * Returns FALSE when both iterators refer to the same character in the same
   * input text.
   *
   * @param that a Normalizer object to compare this one to
   * @return comparison result
   * @stable ICU 2.0
   */
  inline UBool operator!=(const Normalizer& that) const;

  /**
   * Returns a pointer to a new Normalizer that is a clone of this one.
   * The caller is responsible for deleting the new clone.
   * @return a pointer to a new Normalizer
   * @stable ICU 2.0
   */
  Normalizer* clone(void) const;

  /**
   * Generates a hash code for this iterator.
   *
   * @return the hash code
   * @stable ICU 2.0
   */
  int32_t hashCode(void) const;

  //-------------------------------------------------------------------------
  // Property access methods
  //-------------------------------------------------------------------------

  /**
   * Set the normalization mode for this object.
   * <p>
   * <b>Note:</b> If the normalization mode is changed while iterating
   * over a string, calls to {@link #next() } and {@link #previous() } may
   * return previously buffered characters in the old normalization mode
   * until the iteration is able to re-sync at the next base character.
   * It is safest to call {@link #setIndexOnly }, {@link #reset() },
   * {@link #setText }, {@link #first() },
   * {@link #last() }, etc. after calling <code>setMode</code>.
   * <p>
   * @param newMode the new mode for this <code>Normalizer</code>.
   * @see #getUMode
   * @stable ICU 2.0
   */
  void setMode(UNormalizationMode newMode);

  /**
   * Return the normalization mode for this object.
   *
   * This is an unusual name because there used to be a getMode() that
   * returned a different type.
   *
   * @return the mode for this <code>Normalizer</code>
   * @see #setMode
   * @stable ICU 2.0
   */
  UNormalizationMode getUMode(void) const;

  /**
   * Set options that affect this <code>Normalizer</code>'s operation.
   * Options do not change the basic composition or decomposition operation
   * that is being performed, but they control whether
   * certain optional portions of the operation are done.
   * Currently the only available option is obsolete.
   *
   * Multiple options can be specified in one call; they are then all
   * turned on or all turned off together.
   *
   * @param   option  the option(s) whose value is/are to be set.
   * @param   value   the new setting for the option. Use <code>TRUE</code> to
   *                  turn the option(s) on and <code>FALSE</code> to turn it/them off.
   *
   * @see #getOption
   * @stable ICU 2.0
   */
  void setOption(int32_t option,
                 UBool value);

  /**
   * Determine whether an option is turned on or off.
   * If multiple options are specified, then the result is TRUE if any
   * of them are set.
   * <p>
   * @param option the option(s) that are to be checked
   * @return TRUE if any of the option(s) are set
   * @see #setOption
   * @stable ICU 2.0
   */
  UBool getOption(int32_t option) const;

  /**
   * Set the input text over which this <code>Normalizer</code> will iterate.
   * The iteration position is set to the beginning.
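   *
   * A sketch of replacing the text and then iterating over the normalized
   * result (<code>n</code> is a hypothetical, already-constructed Normalizer):
   * \code
   *   UErrorCode status = U_ZERO_ERROR;
   *   n.setText(UNICODE_STRING_SIMPLE("abc"), status);
   *   if (U_SUCCESS(status)) {
   *     for (UChar32 c = n.first(); c != Normalizer::DONE; c = n.next()) {
   *       // c is the next code point of the normalized text
   *     }
   *   }
   * \endcode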
   *
   * @param newText a string that replaces the current input text
   * @param status a UErrorCode
   * @stable ICU 2.0
   */
  void setText(const UnicodeString& newText,
               UErrorCode &status);

  /**
   * Set the input text over which this <code>Normalizer</code> will iterate.
   * The iteration position is set to the beginning.
   *
   * @param newText a CharacterIterator object that replaces the current input text
   * @param status a UErrorCode
   * @stable ICU 2.0
   */
  void setText(const CharacterIterator& newText,
               UErrorCode &status);

  /**
   * Set the input text over which this <code>Normalizer</code> will iterate.
   * The iteration position is set to the beginning.
   *
   * @param newText a string that replaces the current input text
   * @param length the length of the string, or -1 if NUL-terminated
   * @param status a UErrorCode
   * @stable ICU 2.0
   */
  void setText(const UChar* newText,
               int32_t length,
               UErrorCode &status);

  /**
   * Copies the input text into the UnicodeString argument.
   *
   * @param result Receives a copy of the text under iteration.
   * @stable ICU 2.0
   */
  void getText(UnicodeString& result);

  /**
   * ICU "poor man's RTTI", returns a UClassID for this class.
   * @returns a UClassID for this class.
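   *
   * A sketch of the RTTI check (<code>obj</code> is a hypothetical
   * reference to some UObject):
   * \code
   *   if (obj.getDynamicClassID() == Normalizer::getStaticClassID()) {
   *     // the dynamic type of obj is Normalizer
   *   }
   * \endcode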
   * @stable ICU 2.2
   */
  static UClassID U_EXPORT2 getStaticClassID();

  /**
   * ICU "poor man's RTTI", returns a UClassID for the actual class.
   * @return a UClassID for the actual class.
   * @stable ICU 2.2
   */
  virtual UClassID getDynamicClassID() const;

private:
  //-------------------------------------------------------------------------
  // Private functions
  //-------------------------------------------------------------------------

  Normalizer(); // default constructor not implemented
  Normalizer &operator=(const Normalizer &that); // assignment operator not implemented

  // Private utility methods for iteration
  // For documentation, see the source code
  UBool nextNormalize();
  UBool previousNormalize();

  void init();
  void clearBuffer(void);

  //-------------------------------------------------------------------------
  // Private data
  //-------------------------------------------------------------------------

  FilteredNormalizer2 *fFilteredNorm2;  // owned if not NULL
  const Normalizer2 *fNorm2;  // not owned; may be equal to fFilteredNorm2
  UNormalizationMode fUMode;
  int32_t fOptions;

  // The input text and our position in it
  CharacterIterator *text;

  // The normalization buffer is the result of normalization
  // of the source in [currentIndex..nextIndex[ .
  int32_t currentIndex, nextIndex;

  // A buffer for holding intermediate results
  UnicodeString buffer;
  int32_t bufferPos;
};

//-------------------------------------------------------------------------
// Inline implementations
//-------------------------------------------------------------------------

inline UBool
Normalizer::operator!= (const Normalizer& other) const
{ return ! operator==(other); }

inline UNormalizationCheckResult
Normalizer::quickCheck(const UnicodeString& source,
                       UNormalizationMode mode,
                       UErrorCode &status) {
    return quickCheck(source, mode, 0, status);
}

inline UBool
Normalizer::isNormalized(const UnicodeString& source,
                         UNormalizationMode mode,
                         UErrorCode &status) {
    return isNormalized(source, mode, 0, status);
}

inline int32_t
Normalizer::compare(const UnicodeString &s1, const UnicodeString &s2,
                    uint32_t options,
                    UErrorCode &errorCode) {
  // all argument checking is done in unorm_compare
  return unorm_compare(s1.getBuffer(), s1.length(),
                       s2.getBuffer(), s2.length(),
                       options,
                       &errorCode);
}

U_NAMESPACE_END

#endif /* #if !UCONFIG_NO_NORMALIZATION */

#endif // NORMLZR_H