/*
 *******************************************************************************
 * Copyright (C) 1996-2004, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.text;

/**
 * Interface that defines an API for forward-only iteration
 * on text objects.
 * This is a minimal interface for iteration without random access
 * or backwards iteration. It is especially useful for wrapping
 * streams with converters into an object for collation or
 * normalization.
 *
 * <p>Characters can be accessed in two ways: as code units or as
 * code points.
 * Unicode code points are 21-bit integers and are the scalar values
 * of Unicode characters. ICU uses the type <code>int</code> for them.
 * Unicode code units are the storage units of a given
 * Unicode/UCS Transformation Format (a character encoding scheme).
 * With UTF-16, every code point is represented by either one code unit
 * or a pair of code units ("surrogates").
 * String storage is typically based on code units, while properties
 * of characters are typically determined using code point values.
 * Some processes may be designed to work with sequences of code units,
 * or it may be known that all characters that are important to an
 * algorithm can be represented with single code units.
 * Other processes will need to use the code point access functions.</p>
 *
 * <p>UForwardCharacterIterator provides next() to access
 * a code unit and advance an internal position into the text object,
 * similar to <code>return text[position++]</code>.<br>
 * It provides nextCodePoint() to access a code point and advance an internal
 * position.</p>
 *
 * <p>nextCodePoint() assumes that the current position is that of
 * the beginning of a code point, i.e., of its first code unit.
 * After nextCodePoint(), this will be true again.
 * In general, access to code units and code points in the same
 * iteration loop should not be mixed. In UTF-16, if the current position
 * is on a second code unit (a low surrogate), then only that code unit
 * is returned, even by nextCodePoint().</p>
 *
 * <p>Usage:</p>
 * <pre>
 *  public void function1(UForwardCharacterIterator it) {
 *      int c;
 *      while ((c = it.next()) != UForwardCharacterIterator.DONE) {
 *          // use c
 *      }
 *  }
 * </pre>
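 *
 * <p>The same loop shape applies to code points. The following is only a
 * minimal sketch: the method name <code>countCodePoints</code> is
 * hypothetical and not part of this API, and it assumes the iterator starts
 * on a code point boundary.</p>
 * <pre>
 *  public int countCodePoints(UForwardCharacterIterator it) {
 *      int count = 0;
 *      // nextCodePoint() consumes one or two code units per call and
 *      // returns DONE once the limit of the text is reached
 *      while (it.nextCodePoint() != UForwardCharacterIterator.DONE) {
 *          ++count;
 *      }
 *      return count;
 *  }
 * </pre>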
 * @stable ICU 2.4
 *
 */

public interface UForwardCharacterIterator {

    /**
     * Indicator that the end of the UTF-16 text has been reached.
     * @stable ICU 2.4
     */
    public static final int DONE = -1;
    /**
     * Returns the UTF-16 code unit at the current index, and increments to
     * the next code unit (post-increment semantics).  If the index is out of
     * range, DONE is returned, and the iterator is reset to the limit
     * of the text.
     * @return the next UTF-16 code unit, or DONE if the index is at the limit
     *         of the text.
     * @stable ICU 2.4
     */
    public int next();

    /**
     * Returns the code point at the current index, and increments to the
     * next code point (post-increment semantics).  If the index does not
     * point to a valid surrogate pair, the behavior is the same as
     * <code>next()</code>.  Otherwise the iterator is incremented past
     * the surrogate pair, and the code point represented by the pair
     * is returned.
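     *
     * <p>As a worked illustration of the pair case (this is the standard
     * UTF-16 decoding arithmetic, shown here as an aid and not as a
     * description of any particular implementation): for the code units
     * <code>0xD83D 0xDE00</code>, one call would return
     * <code>0x10000 + ((0xD83D - 0xD800) &lt;&lt; 10) + (0xDE00 - 0xDC00)
     * = 0x1F600</code> and leave the iterator past both code units.
     * If the iterator were instead positioned on the lone
     * <code>0xDE00</code>, that code unit alone would be returned.</p>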
     * @return the next code point in the text, or DONE if the index is at
     *         the limit of the text.
     * @stable ICU 2.4
     */
    public int nextCodePoint();

}
