1f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*
2f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)******************************************************************************
3f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*
4f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   Copyright (C) 2002-2010, International Business Machines
5f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   Corporation and others.  All Rights Reserved.
6f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*
7f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)******************************************************************************
8f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   file name:  bocu1tst.c
9f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   encoding:   US-ASCII
10f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   tab size:   8 (not used)
11f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   indentation:4
12f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*
13f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   created on: 2002may27
14f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   created by: Markus W. Scherer
15f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*
16f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   This is the reference implementation of BOCU-1,
17f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   the MIME-friendly form of the Binary Ordered Compression for Unicode,
18f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   taken directly from ### http://source.icu-project.org/repos/icu/icuhtml/trunk/design/conversion/bocu1/
19f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   The files bocu1.h and bocu1.c from the design folder are taken
20f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   verbatim (minus copyright and #include) and copied together into this file.
21f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   The reference code and some of the reference bocu1tst.c
22f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   is modified to run as part of the ICU cintltst
23f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   test framework (minus main(), log_ln() etc. instead of printf()).
24f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*
25f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   This reference implementation is used here to verify
26f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   the ICU BOCU-1 implementation, which is
27f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   adapted for ICU conversion APIs and optimized.
28f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*   ### links in design doc to here and to ucnvbocu.c
29f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)*/
30f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
31f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#include "unicode/utypes.h"
32f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#include "unicode/ustring.h"
33f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#include "unicode/ucnv.h"
34f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#include "cmemory.h"
35f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#include "cintltst.h"
36f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
37f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define LENGTHOF(array) (sizeof(array)/sizeof((array)[0]))
38f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
39f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* icuhtml/design/conversion/bocu1/bocu1.h ---------------------------------- */
40f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
41f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* BOCU-1 constants and macros ---------------------------------------------- */
42f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
43f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*
44f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * BOCU-1 encodes the code points of a Unicode string as
45f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * a sequence of byte-encoded differences (slope detection),
46f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * preserving lexical order.
47f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
48f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Optimize the difference-taking for runs of Unicode text within
49f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * small scripts:
50f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
51f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Most small scripts are allocated within aligned 128-blocks of Unicode
52f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * code points. Lexical order is preserved if the "previous code point" state
53f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * is always moved into the middle of such a block.
54f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
55f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Additionally, "prev" is moved from anywhere in the Unihan and Hangul
56f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * areas into the middle of those areas.
57f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
58f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * C0 control codes and space are encoded with their US-ASCII bytes.
59f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * "prev" is reset for C0 controls but not for space.
60f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
61f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
62f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* initial value for "prev": middle of the ASCII range */
63f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_ASCII_PREV        0x40
64f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
65f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* bounding byte values for differences */
66f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_MIN               0x21
67f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_MIDDLE            0x90
68f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_MAX_LEAD          0xfe
69f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
70f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* add the L suffix to make computations with BOCU1_MAX_TRAIL work on 16-bit compilers */
71f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_MAX_TRAIL         0xffL
72f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_RESET             0xff
73f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
74f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* number of lead bytes */
75f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_COUNT             (BOCU1_MAX_LEAD-BOCU1_MIN+1)
76f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
77f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* adjust trail byte counts for the use of some C0 control byte values */
78f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_TRAIL_CONTROLS_COUNT  20
79f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_TRAIL_BYTE_OFFSET     (BOCU1_MIN-BOCU1_TRAIL_CONTROLS_COUNT)
80f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
81f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* number of trail bytes */
82f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_TRAIL_COUNT       ((BOCU1_MAX_TRAIL-BOCU1_MIN+1)+BOCU1_TRAIL_CONTROLS_COUNT)
83f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
84f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*
85f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * number of positive and negative single-byte codes
86f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * (counting 0==BOCU1_MIDDLE among the positive ones)
87f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
88f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_SINGLE            64
89f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
90f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* number of lead bytes for positive and negative 2/3/4-byte sequences */
91f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_LEAD_2            43
92f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_LEAD_3            3
93f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_LEAD_4            1
94f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
95f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* The difference value range for single-byters. */
96f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_REACH_POS_1   (BOCU1_SINGLE-1)
97f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_REACH_NEG_1   (-BOCU1_SINGLE)
98f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
99f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* The difference value range for double-byters. */
100f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_REACH_POS_2   (BOCU1_REACH_POS_1+BOCU1_LEAD_2*BOCU1_TRAIL_COUNT)
101f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_REACH_NEG_2   (BOCU1_REACH_NEG_1-BOCU1_LEAD_2*BOCU1_TRAIL_COUNT)
102f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
103f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* The difference value range for 3-byters. */
104f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_REACH_POS_3   \
105f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    (BOCU1_REACH_POS_2+BOCU1_LEAD_3*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT)
106f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
107f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_REACH_NEG_3   (BOCU1_REACH_NEG_2-BOCU1_LEAD_3*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT)
108f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
109f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* The lead byte start values. */
110f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_START_POS_2   (BOCU1_MIDDLE+BOCU1_REACH_POS_1+1)
111f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_START_POS_3   (BOCU1_START_POS_2+BOCU1_LEAD_2)
112f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_START_POS_4   (BOCU1_START_POS_3+BOCU1_LEAD_3)
113f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     /* ==BOCU1_MAX_LEAD */
114f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
115f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_START_NEG_2   (BOCU1_MIDDLE+BOCU1_REACH_NEG_1)
116f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_START_NEG_3   (BOCU1_START_NEG_2-BOCU1_LEAD_2)
117f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_START_NEG_4   (BOCU1_START_NEG_3-BOCU1_LEAD_3)
118f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     /* ==BOCU1_MIN+1 */
119f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
120f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* The length of a byte sequence, according to the lead byte (!=BOCU1_RESET). */
121f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_LENGTH_FROM_LEAD(lead) \
122f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    ((BOCU1_START_NEG_2<=(lead) && (lead)<BOCU1_START_POS_2) ? 1 : \
123f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     (BOCU1_START_NEG_3<=(lead) && (lead)<BOCU1_START_POS_3) ? 2 : \
124f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     (BOCU1_START_NEG_4<=(lead) && (lead)<BOCU1_START_POS_4) ? 3 : 4)
125f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
126f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* The length of a byte sequence, according to its packed form. */
127f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_LENGTH_FROM_PACKED(packed) \
128f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    ((uint32_t)(packed)<0x04000000 ? (packed)>>24 : 4)
129f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
130f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*
131f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * 12 commonly used C0 control codes (and space) are only used to encode
132f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * themselves directly,
133f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * which makes BOCU-1 MIME-usable and reasonably safe for
134f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * ASCII-oriented software.
135f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
136f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * These controls are
137f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *  0   NUL
138f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
139f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *  7   BEL
140f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *  8   BS
141f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
142f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *  9   TAB
143f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *  a   LF
144f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *  b   VT
145f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *  c   FF
146f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *  d   CR
147f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
148f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *  e   SO
149f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *  f   SI
150f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
151f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * 1a   SUB
152f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * 1b   ESC
153f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
154f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * The other 20 C0 controls are also encoded directly (to preserve order)
155f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * but are also used as trail bytes in difference encoding
156f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * (for better compression).
157f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
158f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define BOCU1_TRAIL_TO_BYTE(t) ((t)>=BOCU1_TRAIL_CONTROLS_COUNT ? (t)+BOCU1_TRAIL_BYTE_OFFSET : bocu1TrailToByte[t])
159f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
160f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*
161f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Byte value map for control codes,
162f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * from external byte values 0x00..0x20
163f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * to trail byte values 0..19 (0..0x13) as used in the difference calculation.
164f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * External byte values that are illegal as trail bytes are mapped to -1.
165f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
166f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const int8_t
167f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)bocu1ByteToTrail[BOCU1_MIN]={
168f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*  0     1     2     3     4     5     6     7    */
169f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    -1,   0x00, 0x01, 0x02, 0x03, 0x04, 0x05, -1,
170f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
171f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*  8     9     a     b     c     d     e     f    */
172f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    -1,   -1,   -1,   -1,   -1,   -1,   -1,   -1,
173f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
174f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*  10    11    12    13    14    15    16    17   */
175f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d,
176f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
177f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*  18    19    1a    1b    1c    1d    1e    1f   */
178f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    0x0e, 0x0f, -1,   -1,   0x10, 0x11, 0x12, 0x13,
179f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
180f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*  20   */
181f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    -1
182f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)};
183f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
184f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*
185f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Byte value map for control codes,
186f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * from trail byte values 0..19 (0..0x13) as used in the difference calculation
187f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * to external byte values 0x00..0x20.
188f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
189f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const int8_t
190f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)bocu1TrailToByte[BOCU1_TRAIL_CONTROLS_COUNT]={
191f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*  0     1     2     3     4     5     6     7    */
192f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x10, 0x11,
193f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
194f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*  8     9     a     b     c     d     e     f    */
195f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19,
196f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
197f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*  10    11    12    13   */
198f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    0x1c, 0x1d, 0x1e, 0x1f
199f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)};
200f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
201f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/**
202f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Integer division and modulo with negative numerators
203f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * yields negative modulo results and quotients that are one more than
204f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * what we need here.
205f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * This macro adjust the results so that the modulo-value m is always >=0.
206f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
207f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * For positive n, the if() condition is always FALSE.
208f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
209f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param n Number to be split into quotient and rest.
210f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *          Will be modified to contain the quotient.
211f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param d Divisor.
212f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param m Output variable for the rest (modulo result).
213f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
214f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define NEGDIVMOD(n, d, m) { \
215f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    (m)=(n)%(d); \
216f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    (n)/=(d); \
217f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if((m)<0) { \
218f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        --(n); \
219f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        (m)+=(d); \
220f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } \
221f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
222f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
223f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* State for BOCU-1 decoder function. */
224f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)struct Bocu1Rx {
225f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t prev, count, diff;
226f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)};
227f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
228f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)typedef struct Bocu1Rx Bocu1Rx;
229f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
230f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* Function prototypes ------------------------------------------------------ */
231f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
232f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* see bocu1.c */
233f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)U_CFUNC int32_t
234f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)packDiff(int32_t diff);
235f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
236f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)U_CFUNC int32_t
237f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)encodeBocu1(int32_t *pPrev, int32_t c);
238f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
239f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)U_CFUNC int32_t
240f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)decodeBocu1(Bocu1Rx *pRx, uint8_t b);
241f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
242f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* icuhtml/design/conversion/bocu1/bocu1.c ---------------------------------- */
243f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
244f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* BOCU-1 implementation functions ------------------------------------------ */
245f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
246f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/**
247f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Compute the next "previous" value for differencing
248f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * from the current code point.
249f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
250f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param c current code point, 0..0x10ffff
251f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @return "previous code point" state value
252f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
253f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static U_INLINE int32_t
254f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)bocu1Prev(int32_t c) {
255f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* compute new prev */
256f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(0x3040<=c && c<=0x309f) {
257f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* Hiragana is not 128-aligned */
258f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return 0x3070;
259f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } else if(0x4e00<=c && c<=0x9fa5) {
260f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* CJK Unihan */
261f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return 0x4e00-BOCU1_REACH_NEG_2;
262f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } else if(0xac00<=c && c<=0xd7a3) {
263f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* Korean Hangul (cast to int32_t to avoid wraparound on 16-bit compilers) */
264f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return ((int32_t)0xd7a3+(int32_t)0xac00)/2;
265f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } else {
266f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* mostly small scripts */
267f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return (c&~0x7f)+BOCU1_ASCII_PREV;
268f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
269f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
270f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
271f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/**
272f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Encode a difference -0x10ffff..0x10ffff in 1..4 bytes
273f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * and return a packed integer with them.
274f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
275f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * The encoding favors small absolut differences with short encodings
276f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * to compress runs of same-script characters.
277f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
278f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param diff difference value -0x10ffff..0x10ffff
279f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @return
280f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *      0x010000zz for 1-byte sequence zz
281f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *      0x0200yyzz for 2-byte sequence yy zz
282f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *      0x03xxyyzz for 3-byte sequence xx yy zz
283f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *      0xwwxxyyzz for 4-byte sequence ww xx yy zz (ww>0x03)
284f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
285f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)U_CFUNC int32_t
286f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)packDiff(int32_t diff) {
287f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t result, m, lead, count, shift;
288f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
289f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(diff>=BOCU1_REACH_NEG_1) {
290f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* mostly positive differences, and single-byte negative ones */
291f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        if(diff<=BOCU1_REACH_POS_1) {
292f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* single byte */
293f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            return 0x01000000|(BOCU1_MIDDLE+diff);
294f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        } else if(diff<=BOCU1_REACH_POS_2) {
295f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* two bytes */
296f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            diff-=BOCU1_REACH_POS_1+1;
297f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            lead=BOCU1_START_POS_2;
298f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            count=1;
299f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        } else if(diff<=BOCU1_REACH_POS_3) {
300f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* three bytes */
301f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            diff-=BOCU1_REACH_POS_2+1;
302f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            lead=BOCU1_START_POS_3;
303f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            count=2;
304f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        } else {
305f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* four bytes */
306f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            diff-=BOCU1_REACH_POS_3+1;
307f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            lead=BOCU1_START_POS_4;
308f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            count=3;
309f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        }
310f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } else {
311f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* two- and four-byte negative differences */
312f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        if(diff>=BOCU1_REACH_NEG_2) {
313f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* two bytes */
314f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            diff-=BOCU1_REACH_NEG_1;
315f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            lead=BOCU1_START_NEG_2;
316f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            count=1;
317f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        } else if(diff>=BOCU1_REACH_NEG_3) {
318f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* three bytes */
319f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            diff-=BOCU1_REACH_NEG_2;
320f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            lead=BOCU1_START_NEG_3;
321f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            count=2;
322f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        } else {
323f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* four bytes */
324f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            diff-=BOCU1_REACH_NEG_3;
325f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            lead=BOCU1_START_NEG_4;
326f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            count=3;
327f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        }
328f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
329f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
330f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* encode the length of the packed result */
331f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(count<3) {
332f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        result=(count+1)<<24;
333f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } else /* count==3, MSB used for the lead byte */ {
334f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        result=0;
335f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
336f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
337f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* calculate trail bytes like digits in itoa() */
338f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    shift=0;
339f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    do {
340f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        NEGDIVMOD(diff, BOCU1_TRAIL_COUNT, m);
341f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        result|=BOCU1_TRAIL_TO_BYTE(m)<<shift;
342f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        shift+=8;
343f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } while(--count>0);
344f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
345f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* add lead byte */
346f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    result|=(lead+diff)<<shift;
347f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
348f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    return result;
349f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
350f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
351f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/**
352f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * BOCU-1 encoder function.
353f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
354f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param pPrev pointer to the integer that holds
355f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *        the "previous code point" state;
356f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *        the initial value should be 0 which
357f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *        encodeBocu1 will set to the actual BOCU-1 initial state value
358f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param c the code point to encode
359f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @return the packed 1/2/3/4-byte encoding, see packDiff(),
360f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *         or 0 if an error occurs
361f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
362f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @see packDiff
363f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
364f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)U_CFUNC int32_t
365f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)encodeBocu1(int32_t *pPrev, int32_t c) {
366f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t prev;
367f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
368f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(pPrev==NULL || c<0 || c>0x10ffff) {
369f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* illegal argument */
370f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return 0;
371f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
372f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
373f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    prev=*pPrev;
374f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(prev==0) {
375f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* lenient handling of initial value 0 */
376f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        prev=*pPrev=BOCU1_ASCII_PREV;
377f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
378f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
379f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(c<=0x20) {
380f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /*
381f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         * ISO C0 control & space:
382f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         * Encode directly for MIME compatibility,
383f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         * and reset state except for space, to not disrupt compression.
384f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         */
385f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        if(c!=0x20) {
386f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            *pPrev=BOCU1_ASCII_PREV;
387f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        }
388f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return 0x01000000|c;
389f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
390f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
391f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /*
392f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     * all other Unicode code points c==U+0021..U+10ffff
393f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     * are encoded with the difference c-prev
394f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     *
395f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     * a new prev is computed from c,
396f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     * placed in the middle of a 0x80-block (for most small scripts) or
397f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     * in the middle of the Unihan and Hangul blocks
398f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     * to statistically minimize the following difference
399f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     */
400f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    *pPrev=bocu1Prev(c);
401f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    return packDiff(c-prev);
402f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
403f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
404f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/**
405f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Function for BOCU-1 decoder; handles multi-byte lead bytes.
406f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
407f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param pRx pointer to the decoder state structure
408f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param b lead byte;
409f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *          BOCU1_MIN<=b<BOCU1_START_NEG_2 or BOCU1_START_POS_2<=b<=BOCU1_MAX_LEAD
410f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @return -1 (state change only)
411f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
412f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @see decodeBocu1
413f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
414f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static int32_t
415f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)decodeBocu1LeadByte(Bocu1Rx *pRx, uint8_t b) {
416f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t c, count;
417f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
418f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(b>=BOCU1_START_NEG_2) {
419f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* positive difference */
420f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        if(b<BOCU1_START_POS_3) {
421f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* two bytes */
422f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            c=((int32_t)b-BOCU1_START_POS_2)*BOCU1_TRAIL_COUNT+BOCU1_REACH_POS_1+1;
423f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            count=1;
424f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        } else if(b<BOCU1_START_POS_4) {
425f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* three bytes */
426f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            c=((int32_t)b-BOCU1_START_POS_3)*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT+BOCU1_REACH_POS_2+1;
427f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            count=2;
428f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        } else {
429f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* four bytes */
430f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            c=BOCU1_REACH_POS_3+1;
431f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            count=3;
432f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        }
433f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } else {
434f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* negative difference */
435f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        if(b>=BOCU1_START_NEG_3) {
436f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* two bytes */
437f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            c=((int32_t)b-BOCU1_START_NEG_2)*BOCU1_TRAIL_COUNT+BOCU1_REACH_NEG_1;
438f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            count=1;
439f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        } else if(b>BOCU1_MIN) {
440f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* three bytes */
441f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            c=((int32_t)b-BOCU1_START_NEG_3)*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT+BOCU1_REACH_NEG_2;
442f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            count=2;
443f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        } else {
444f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* four bytes */
445f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            c=-BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT+BOCU1_REACH_NEG_3;
446f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            count=3;
447f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        }
448f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
449f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
450f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* set the state for decoding the trail byte(s) */
451f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    pRx->diff=c;
452f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    pRx->count=count;
453f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    return -1;
454f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
455f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
456f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/**
457f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Function for BOCU-1 decoder; handles multi-byte trail bytes.
458f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
459f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param pRx pointer to the decoder state structure
460f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param b trail byte
461f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @return result value, same as decodeBocu1
462f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
463f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @see decodeBocu1
464f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
465f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static int32_t
466f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)decodeBocu1TrailByte(Bocu1Rx *pRx, uint8_t b) {
467f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t t, c, count;
468f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
469f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(b<=0x20) {
470f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* skip some C0 controls and make the trail byte range contiguous */
471f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        t=bocu1ByteToTrail[b];
472f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        if(t<0) {
473f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* illegal trail byte value */
474f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            pRx->prev=BOCU1_ASCII_PREV;
475f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            pRx->count=0;
476f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            return -99;
477f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        }
478f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#if BOCU1_MAX_TRAIL<0xff
479f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } else if(b>BOCU1_MAX_TRAIL) {
480f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return -99;
481f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#endif
482f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } else {
483f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        t=(int32_t)b-BOCU1_TRAIL_BYTE_OFFSET;
484f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
485f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
486f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* add trail byte into difference and decrement count */
487f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    c=pRx->diff;
488f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    count=pRx->count;
489f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
490f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(count==1) {
491f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* final trail byte, deliver a code point */
492f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        c=pRx->prev+c+t;
493f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        if(0<=c && c<=0x10ffff) {
494f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* valid code point result */
495f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            pRx->prev=bocu1Prev(c);
496f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            pRx->count=0;
497f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            return c;
498f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        } else {
499f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* illegal code point result */
500f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            pRx->prev=BOCU1_ASCII_PREV;
501f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            pRx->count=0;
502f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            return -99;
503f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        }
504f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
505f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
506f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* intermediate trail byte */
507f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(count==2) {
508f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        pRx->diff=c+t*BOCU1_TRAIL_COUNT;
509f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } else /* count==3 */ {
510f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        pRx->diff=c+t*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT;
511f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
512f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    pRx->count=count-1;
513f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    return -1;
514f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
515f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
516f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/**
517f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * BOCU-1 decoder function.
518f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
519f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param pRx pointer to the decoder state structure;
520f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *        the initial values should be 0 which
521f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *        decodeBocu1 will set to actual initial state values
522f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param b an input byte
523f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @return
524f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *      0..0x10ffff for a result code point
525f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *      -1 if only the state changed without code point output
526f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *     <-1 if an error occurs
527f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
528f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)U_CFUNC int32_t
529f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)decodeBocu1(Bocu1Rx *pRx, uint8_t b) {
530f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t prev, c, count;
531f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
532f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(pRx==NULL) {
533f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* illegal argument */
534f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return -99;
535f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
536f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
537f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    prev=pRx->prev;
538f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(prev==0) {
539f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* lenient handling of initial 0 values */
540f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        prev=pRx->prev=BOCU1_ASCII_PREV;
541f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        count=pRx->count=0;
542f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } else {
543f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        count=pRx->count;
544f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
545f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
546f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(count==0) {
547f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* byte in lead position */
548f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        if(b<=0x20) {
549f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /*
550f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)             * Direct-encoded C0 control code or space.
551f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)             * Reset prev for C0 control codes but not for space.
552f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)             */
553f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            if(b!=0x20) {
554f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)                pRx->prev=BOCU1_ASCII_PREV;
555f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            }
556f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            return b;
557f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        }
558f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
559f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /*
560f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         * b is a difference lead byte.
561f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         *
562f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         * Return a code point directly from a single-byte difference.
563f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         *
564f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         * For multi-byte difference lead bytes, set the decoder state
565f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         * with the partial difference value from the lead byte and
566f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         * with the number of trail bytes.
567f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         *
568f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         * For four-byte differences, the signedness also affects the
569f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         * first trail byte, which has special handling farther below.
570f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)         */
571f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        if(b>=BOCU1_START_NEG_2 && b<BOCU1_START_POS_2) {
572f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* single-byte difference */
573f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            c=prev+((int32_t)b-BOCU1_MIDDLE);
574f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            pRx->prev=bocu1Prev(c);
575f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            return c;
576f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        } else if(b==BOCU1_RESET) {
577f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            /* only reset the state, no code point */
578f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            pRx->prev=BOCU1_ASCII_PREV;
579f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            return -1;
580f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        } else {
581f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            return decodeBocu1LeadByte(pRx, b);
582f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        }
583f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } else {
584f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* trail byte in any position */
585f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return decodeBocu1TrailByte(pRx, b);
586f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
587f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
588f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
589f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* icuhtml/design/conversion/bocu1/bocu1tst.c ------------------------------- */
590f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
591f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* test code ---------------------------------------------------------------- */
592f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
593f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* test code options */
594f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
595f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* ignore comma when processing name lists in testText() */
596f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)#define TEST_IGNORE_COMMA       1
597f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
598f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/**
599f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Write a packed BOCU-1 byte sequence into a byte array,
600f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * without overflow check.
601f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Test function.
602f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
603f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param packed packed BOCU-1 byte sequence, see packDiff()
604f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param p pointer to byte array
605f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @return number of bytes
606f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
607f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @see packDiff
608f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
609f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static int32_t
610f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)writePacked(int32_t packed, uint8_t *p) {
611f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t count=BOCU1_LENGTH_FROM_PACKED(packed);
612f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    switch(count) {
613f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    case 4:
614f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        *p++=(uint8_t)(packed>>24);
615f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    case 3:
616f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        *p++=(uint8_t)(packed>>16);
617f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    case 2:
618f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        *p++=(uint8_t)(packed>>8);
619f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    case 1:
620f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        *p++=(uint8_t)packed;
621f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    default:
622f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        break;
623f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
624f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
625f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    return count;
626f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
627f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
628f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/**
629f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Unpack a packed BOCU-1 non-C0/space byte sequence and get
630f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * the difference to initialPrev.
631f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Used only for round-trip testing of the difference encoding and decoding.
632f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Test function.
633f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
634f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param initialPrev bogus "previous code point" value to make sure that
635f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *                    the resulting code point is in the range 0..0x10ffff
636f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param packed packed BOCU-1 byte sequence
637f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @return the difference to initialPrev
638f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
639f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @see packDiff
640f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @see writeDiff
641f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
642f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static int32_t
643f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)unpackDiff(int32_t initialPrev, int32_t packed) {
644f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    Bocu1Rx rx={ 0, 0, 0 };
645f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t count;
646f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
647f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    rx.prev=initialPrev;
648f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    count=BOCU1_LENGTH_FROM_PACKED(packed);
649f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    switch(count) {
650f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    case 4:
651f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        decodeBocu1(&rx, (uint8_t)(packed>>24));
652f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    case 3:
653f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        decodeBocu1(&rx, (uint8_t)(packed>>16));
654f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    case 2:
655f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        decodeBocu1(&rx, (uint8_t)(packed>>8));
656f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    case 1:
657f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* subtract initial prev */
658f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return decodeBocu1(&rx, (uint8_t)packed)-initialPrev;
659f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    default:
660f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return -0x7fffffff;
661f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
662f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
663f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
664f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/**
665f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Encode one difference value -0x10ffff..+0x10ffff in 1..4 bytes,
666f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * preserving lexical order.
667f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Also checks for roundtripping of the difference encoding.
668f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Test function.
669f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
670f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param diff difference value to test, -0x10ffff..0x10ffff
671f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param p pointer to output byte array
672f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @return p advanced by number of bytes output
673f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
674f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @see unpackDiff
675f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
676f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static uint8_t *
677f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)writeDiff(int32_t diff, uint8_t *p) {
678f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* generate the difference as a packed value and serialize it */
679f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t packed, initialPrev;
680f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
681f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    packed=packDiff(diff);
682f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
683f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /*
684f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     * bogus initial "prev" to work around
685f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     * code point range check in decodeBocu1()
686f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)     */
687f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(diff<=0) {
688f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        initialPrev=0x10ffff;
689f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } else {
690f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        initialPrev=-1;
691f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
692f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
693f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(diff!=unpackDiff(initialPrev, packed)) {
694f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        log_err("error: unpackDiff(packDiff(diff=%ld)=0x%08lx)=%ld!=diff\n",
695f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)                diff, packed, unpackDiff(initialPrev, packed));
696f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
697f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    return p+writePacked(packed, p);
698f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
699f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
700f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/**
701f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Encode a UTF-16 string in BOCU-1.
702f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Does not check for overflows, but otherwise useful function.
703f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
704f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param s input UTF-16 string
705f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param length number of UChar code units in s
706f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param p pointer to output byte array
707f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @return number of bytes output
708f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
709f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static int32_t
710f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)writeString(const UChar *s, int32_t length, uint8_t *p) {
711f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    uint8_t *p0;
712f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t c, prev, i;
713f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
714f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    prev=0;
715f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    p0=p;
716f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    i=0;
717f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    while(i<length) {
718f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        UTF_NEXT_CHAR(s, i, length, c);
719f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        p+=writePacked(encodeBocu1(&prev, c), p);
720f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
721f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    return (int32_t)(p-p0);
722f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
723f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
724f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/**
725f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Decode a BOCU-1 byte sequence to a UTF-16 string.
726f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Does not check for overflows, but otherwise useful function.
727f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
728f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param p pointer to input BOCU-1 bytes
729f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param length number of input bytes
730f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param s point to output UTF-16 string array
731f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @return number of UChar code units output
732f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
733f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static int32_t
734f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)readString(const uint8_t *p, int32_t length, UChar *s) {
735f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    Bocu1Rx rx={ 0, 0, 0 };
736f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t c, i, sLength;
737f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
738f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    i=sLength=0;
739f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    while(i<length) {
740f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        c=decodeBocu1(&rx, p[i++]);
741f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        if(c<-1) {
742f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            log_err("error: readString detects encoding error at string index %ld\n", i);
743f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            return -1;
744f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        }
745f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        if(c>=0) {
746f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            UTF_APPEND_CHAR_UNSAFE(s, sLength, c);
747f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        }
748f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
749f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    return sLength;
750f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
751f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
752f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static U_INLINE char
753f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)hexDigit(uint8_t digit) {
754f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    return digit<=9 ? (char)('0'+digit) : (char)('a'-10+digit);
755f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
756f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
757f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/**
758f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Pretty-print 0-terminated byte values.
759f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Helper function for test output.
760f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) *
761f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * @param bytes 0-terminated byte array to print
762f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
763f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static void
764f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)printBytes(uint8_t *bytes, char *out) {
765f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int i;
766f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    uint8_t b;
767f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
768f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    i=0;
769f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    while((b=*bytes++)!=0) {
770f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        *out++=' ';
771f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        *out++=hexDigit((uint8_t)(b>>4));
772f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        *out++=hexDigit((uint8_t)(b&0xf));
773f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        ++i;
774f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
775f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    i=3*(5-i);
776f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    while(i>0) {
777f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        *out++=' ';
778f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        --i;
779f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
780f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    *out=0;
781f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
782f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
783f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/**
784f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Basic BOCU-1 test function, called when there are no command line arguments.
785f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Prints some of the #define values and performs round-trip tests of the
786f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * difference encoding and decoding.
787f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
788f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static void
789f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)TestBOCU1RefDiff(void) {
790f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    char buf1[80], buf2[80];
791f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    uint8_t prev[5], level[5];
792f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t i, cmp, countErrors;
793f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
794f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    log_verbose("reach of single bytes: %ld\n", 1+BOCU1_REACH_POS_1-BOCU1_REACH_NEG_1);
795f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    log_verbose("reach of 2 bytes     : %ld\n", 1+BOCU1_REACH_POS_2-BOCU1_REACH_NEG_2);
796f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    log_verbose("reach of 3 bytes     : %ld\n\n", 1+BOCU1_REACH_POS_3-BOCU1_REACH_NEG_3);
797f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
798f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    log_verbose("    BOCU1_REACH_NEG_1 %8ld    BOCU1_REACH_POS_1 %8ld\n", BOCU1_REACH_NEG_1, BOCU1_REACH_POS_1);
799f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    log_verbose("    BOCU1_REACH_NEG_2 %8ld    BOCU1_REACH_POS_2 %8ld\n", BOCU1_REACH_NEG_2, BOCU1_REACH_POS_2);
800f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    log_verbose("    BOCU1_REACH_NEG_3 %8ld    BOCU1_REACH_POS_3 %8ld\n\n", BOCU1_REACH_NEG_3, BOCU1_REACH_POS_3);
801f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
802f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    log_verbose("    BOCU1_MIDDLE      0x%02x\n", BOCU1_MIDDLE);
803f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    log_verbose("    BOCU1_START_NEG_2 0x%02x    BOCU1_START_POS_2 0x%02x\n", BOCU1_START_NEG_2, BOCU1_START_POS_2);
804f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    log_verbose("    BOCU1_START_NEG_3 0x%02x    BOCU1_START_POS_3 0x%02x\n\n", BOCU1_START_NEG_3, BOCU1_START_POS_3);
805f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
806f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* test packDiff() & unpackDiff() with some specific values */
807f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    writeDiff(0, level);
808f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    writeDiff(1, level);
809f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    writeDiff(65, level);
810f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    writeDiff(130, level);
811f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    writeDiff(30000, level);
812f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    writeDiff(1000000, level);
813f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    writeDiff(-65, level);
814f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    writeDiff(-130, level);
815f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    writeDiff(-30000, level);
816f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    writeDiff(-1000000, level);
817f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
818f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* test that each value is smaller than any following one */
819f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    countErrors=0;
820f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    i=-0x10ffff;
821f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    *writeDiff(i, prev)=0;
822f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
823f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* show first number and bytes */
824f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    printBytes(prev, buf1);
825f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    log_verbose("              wD(%8ld)                    %s\n", i, buf1);
826f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
827f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    for(++i; i<=0x10ffff; ++i) {
828f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        *writeDiff(i, level)=0;
829f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        cmp=strcmp((const char *)prev, (const char *)level);
830f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        if(BOCU1_LENGTH_FROM_LEAD(level[0])!=(int32_t)strlen((const char *)level)) {
831f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            log_verbose("BOCU1_LENGTH_FROM_LEAD(0x%02x)=%ld!=%ld=strlen(writeDiff(%ld))\n",
832f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)                   level[0], BOCU1_LENGTH_FROM_LEAD(level[0]), strlen((const char *)level), i);
833f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        }
834f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        if(cmp<0) {
835f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            if(i==0 || i==1 || strlen((const char *)prev)!=strlen((const char *)level)) {
836f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)                /*
837f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)                 * if the result is good, then print only if the length changed
838f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)                 * to get little but interesting output
839f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)                 */
840f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)                printBytes(prev, buf1);
841f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)                printBytes(level, buf2);
842f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)                log_verbose("ok:    strcmp(wD(%8ld), wD(%8ld))=%2d  %s%s\n", i-1, i, cmp, buf1, buf2);
843f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            }
844f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        } else {
845f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            ++countErrors;
846f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            printBytes(prev, buf1);
847f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            printBytes(level, buf2);
848f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            log_verbose("wrong: strcmp(wD(%8ld), wD(%8ld))=%2d  %s%s\n", i-1, i, cmp, buf1, buf2);
849f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        }
850f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        /* remember the previous bytes */
851f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        memcpy(prev, level, 4);
852f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
853f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
854f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* show last number and bytes */
855f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    printBytes((uint8_t *)"", buf1);
856f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    printBytes(prev, buf2);
857f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    log_verbose("                            wD(%8ld)      %s%s\n", i-1, buf1, buf2);
858f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
859f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(countErrors==0) {
860f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        log_verbose("writeDiff(-0x10ffff..0x10ffff) works fine\n");
861f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    } else {
862f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        log_err("writeDiff(-0x10ffff..0x10ffff) violates lexical ordering in %d cases\n", countErrors);
863f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
864f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
865f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* output signature byte sequence */
866f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    i=0;
867f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    writePacked(encodeBocu1(&i, 0xfeff), level);
868f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    log_verbose("\nBOCU-1 signature byte sequence: %02x %02x %02x\n",
869f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            level[0], level[1], level[2]);
870f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
871f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
872f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* cintltst code ------------------------------------------------------------ */
873f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
874f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const int32_t DEFAULT_BUFFER_SIZE = 30000;
875f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
876f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
877f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/* test one string with the ICU and the reference BOCU-1 implementations */
878f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static void
879f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)roundtripBOCU1(UConverter *bocu1, int32_t number, const UChar *text, int32_t length) {
880f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    UChar *roundtripRef, *roundtripICU;
881f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    char *bocu1Ref, *bocu1ICU;
882f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
883f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t bocu1RefLength, bocu1ICULength, roundtripRefLength, roundtripICULength;
884f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    UErrorCode errorCode;
885f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
886f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    roundtripRef = malloc(DEFAULT_BUFFER_SIZE * sizeof(UChar));
887f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    roundtripICU = malloc(DEFAULT_BUFFER_SIZE * sizeof(UChar));
888f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    bocu1Ref = malloc(DEFAULT_BUFFER_SIZE);
889f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    bocu1ICU = malloc(DEFAULT_BUFFER_SIZE);
890f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
891f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* Unicode -> BOCU-1 */
892f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    bocu1RefLength=writeString(text, length, (uint8_t *)bocu1Ref);
893f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
894f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    errorCode=U_ZERO_ERROR;
895f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    bocu1ICULength=ucnv_fromUChars(bocu1, bocu1ICU, DEFAULT_BUFFER_SIZE, text, length, &errorCode);
896f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(U_FAILURE(errorCode)) {
897f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        log_err("ucnv_fromUChars(BOCU-1, text(%d)[%d]) failed: %s\n", number, length, u_errorName(errorCode));
898f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return;
899f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
900f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
901f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(bocu1RefLength!=bocu1ICULength || 0!=uprv_memcmp(bocu1Ref, bocu1ICU, bocu1RefLength)) {
902f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        log_err("Unicode(%d)[%d] -> BOCU-1: reference[%d]!=ICU[%d]\n", number, length, bocu1RefLength, bocu1ICULength);
903f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return;
904f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
905f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
906f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* BOCU-1 -> Unicode */
907f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    roundtripRefLength=readString((uint8_t *)bocu1Ref, bocu1RefLength, roundtripRef);
908f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(roundtripRefLength<0) {
909f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        free(roundtripICU);
910f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return; /* readString() found an error and reported it */
911f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
912f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
913f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    roundtripICULength=ucnv_toUChars(bocu1, roundtripICU, DEFAULT_BUFFER_SIZE, bocu1ICU, bocu1ICULength, &errorCode);
914f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(U_FAILURE(errorCode)) {
915f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        log_err("ucnv_toUChars(BOCU-1, text(%d)[%d]) failed: %s\n", number, length, u_errorName(errorCode));
916f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return;
917f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
918f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
919f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(length!=roundtripRefLength || 0!=u_memcmp(text, roundtripRef, length)) {
920f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        log_err("BOCU-1 -> Unicode: original(%d)[%d]!=reference[%d]\n", number, length, roundtripRefLength);
921f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return;
922f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
923f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(roundtripRefLength!=roundtripICULength || 0!=u_memcmp(roundtripRef, roundtripICU, roundtripRefLength)) {
924f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        log_err("BOCU-1 -> Unicode: reference(%d)[%d]!=ICU[%d]\n", number, roundtripRefLength, roundtripICULength);
925f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return;
926f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
927f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    free(roundtripRef);
928f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    free(roundtripICU);
929f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    free(bocu1Ref);
930f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    free(bocu1ICU);
931f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
932f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
933f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar feff[]={ 0xfeff };
934f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar ascii[]={ 0x61, 0x62, 0x20, 0x63, 0x61 };
935f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar crlf[]={ 0xd, 0xa, 0x20 };
936f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar nul[]={ 0 };
937f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar latin[]={ 0xdf, 0xe6 };
938f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar devanagari[]={ 0x930, 0x20, 0x918, 0x909 };
939f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar hiragana[]={ 0x3086, 0x304d, 0x20, 0x3053, 0x4000 };
940f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar unihan[]={ 0x4e00, 0x7777, 0x20, 0x9fa5, 0x4e00 };
941f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar hangul[]={ 0xac00, 0xbcde, 0x20, 0xd7a3 };
942f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar surrogates[]={ 0xdc00, 0xd800 }; /* single surrogates, unmatched! */
943f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar plane1[]={ 0xd800, 0xdc00 };
944f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar plane2[]={ 0xd845, 0xdddd };
945f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar plane15[]={ 0xdbbb, 0xddee, 0x20 };
946f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar plane16[]={ 0xdbff, 0xdfff };
947f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const UChar c0[]={ 1, 0xe40, 0x20, 9 };
948f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
949f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static const struct {
950f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    const UChar *s;
951f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t length;
952f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)} strings[]={
953f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { feff,         LENGTHOF(feff) },
954f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { ascii,        LENGTHOF(ascii) },
955f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { crlf,         LENGTHOF(crlf) },
956f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { nul,          LENGTHOF(nul) },
957f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { latin,        LENGTHOF(latin) },
958f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { devanagari,   LENGTHOF(devanagari) },
959f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { hiragana,     LENGTHOF(hiragana) },
960f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { unihan,       LENGTHOF(unihan) },
961f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { hangul,       LENGTHOF(hangul) },
962f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { surrogates,   LENGTHOF(surrogates) },
963f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { plane1,       LENGTHOF(plane1) },
964f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { plane2,       LENGTHOF(plane2) },
965f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { plane15,      LENGTHOF(plane15) },
966f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { plane16,      LENGTHOF(plane16) },
967f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    { c0,           LENGTHOF(c0) }
968f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)};
969f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
970f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)/*
971f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Verify that the ICU BOCU-1 implementation produces the same results as
972f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * the reference implementation from the design folder.
973f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * Generate some texts and convert them with both converters, verifying
974f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) * identical results and roundtripping.
975f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles) */
976f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)static void
977f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)TestBOCU1(void) {
978f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    UChar *text;
979f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    int32_t i, length;
980f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
981f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    UConverter *bocu1;
982f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    UErrorCode errorCode;
983f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
984f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    errorCode=U_ZERO_ERROR;
985f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    bocu1=ucnv_open("BOCU-1", &errorCode);
986f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    if(U_FAILURE(errorCode)) {
987f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        log_err("error: unable to open BOCU-1 converter: %s\n", u_errorName(errorCode));
988f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        return;
989f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
990f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
991f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    text = malloc(DEFAULT_BUFFER_SIZE * sizeof(UChar));
992f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
993f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* text 1: each of strings[] once */
994f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    length=0;
995f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    for(i=0; i<LENGTHOF(strings); ++i) {
996f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        u_memcpy(text+length, strings[i].s, strings[i].length);
997f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        length+=strings[i].length;
998f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
999f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    roundtripBOCU1(bocu1, 1, text, length);
1000f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
1001f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* text 2: each of strings[] twice */
1002f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    length=0;
1003f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    for(i=0; i<LENGTHOF(strings); ++i) {
1004f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        u_memcpy(text+length, strings[i].s, strings[i].length);
1005f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        length+=strings[i].length;
1006f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        u_memcpy(text+length, strings[i].s, strings[i].length);
1007f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        length+=strings[i].length;
1008f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
1009f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    roundtripBOCU1(bocu1, 2, text, length);
1010f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
1011f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    /* text 3: each of strings[] many times (set step vs. |strings| so that all strings are used) */
1012f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    length=0;
1013f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    for(i=1; length<5000; i+=7) {
1014f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        if(i>=LENGTHOF(strings)) {
1015f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)            i-=LENGTHOF(strings);
1016f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        }
1017f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        u_memcpy(text+length, strings[i].s, strings[i].length);
1018f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)        length+=strings[i].length;
1019f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    }
1020f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    roundtripBOCU1(bocu1, 3, text, length);
1021f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
1022f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    ucnv_close(bocu1);
1023f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    free(text);
1024f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
1025f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
1026f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)U_CFUNC void addBOCU1Tests(TestNode** root);
1027f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)
1028f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)U_CFUNC void
1029f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)addBOCU1Tests(TestNode** root) {
1030f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    addTest(root, TestBOCU1RefDiff, "tsconv/bocu1tst/TestBOCU1RefDiff");
1031f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)    addTest(root, TestBOCU1, "tsconv/bocu1tst/TestBOCU1");
1032f4ed1cf5d184064c4cf0e4359c6d5d8aadb50afaTorne (Richard Coles)}
1033