/*
******************************************************************************
*
*   Copyright (C) 2002-2010, International Business Machines
*   Corporation and others.  All Rights Reserved.
*
******************************************************************************
*   file name:  bocu1tst.c
*   encoding:   US-ASCII
*   tab size:   8 (not used)
*   indentation:4
*
*   created on: 2002may27
*   created by: Markus W. Scherer
*
*   This is the reference implementation of BOCU-1,
*   the MIME-friendly form of the Binary Ordered Compression for Unicode,
*   taken directly from ### http://source.icu-project.org/repos/icu/icuhtml/trunk/design/conversion/bocu1/
*   The files bocu1.h and bocu1.c from the design folder are taken
*   verbatim (minus copyright and #include) and copied together into this file.
*   The reference code and some of the reference bocu1tst.c
*   are modified to run as part of the ICU cintltst
*   test framework (minus main(), log_ln() etc. instead of printf()).
*
*   This reference implementation is used here to verify
*   the ICU BOCU-1 implementation, which is
*   adapted for ICU conversion APIs and optimized.
*   ### links in design doc to here and to ucnvbocu.c
*/

#include "unicode/utypes.h"
#include "unicode/ustring.h"
#include "unicode/ucnv.h"
#include "cmemory.h"
#include "cintltst.h"

#define LENGTHOF(array) (sizeof(array)/sizeof((array)[0]))

/* icuhtml/design/conversion/bocu1/bocu1.h ---------------------------------- */

/* BOCU-1 constants and macros ---------------------------------------------- */

/*
 * BOCU-1 encodes the code points of a Unicode string as
 * a sequence of byte-encoded differences (slope detection),
 * preserving lexical order.
 *
 * Optimize the difference-taking for runs of Unicode text within
 * small scripts:
 *
 * Most small scripts are allocated within aligned 128-blocks of Unicode
 * code points. Lexical order is preserved if the "previous code point" state
 * is always moved into the middle of such a block.
 *
 * Additionally, "prev" is moved from anywhere in the Unihan and Hangul
 * areas into the middle of those areas.
 *
 * C0 control codes and space are encoded with their US-ASCII bytes.
 * "prev" is reset for C0 controls but not for space.
 */

/* initial value for "prev": middle of the ASCII range */
#define BOCU1_ASCII_PREV        0x40

/* bounding byte values for differences */
#define BOCU1_MIN               0x21
#define BOCU1_MIDDLE            0x90
#define BOCU1_MAX_LEAD          0xfe

/* add the L suffix to make computations with BOCU1_MAX_TRAIL work on 16-bit compilers */
#define BOCU1_MAX_TRAIL         0xffL
#define BOCU1_RESET             0xff

/* number of lead bytes */
#define BOCU1_COUNT             (BOCU1_MAX_LEAD-BOCU1_MIN+1)

/* adjust trail byte counts for the use of some C0 control byte values */
#define BOCU1_TRAIL_CONTROLS_COUNT  20
#define BOCU1_TRAIL_BYTE_OFFSET     (BOCU1_MIN-BOCU1_TRAIL_CONTROLS_COUNT)

/* number of trail bytes */
#define BOCU1_TRAIL_COUNT       ((BOCU1_MAX_TRAIL-BOCU1_MIN+1)+BOCU1_TRAIL_CONTROLS_COUNT)

/*
 * number of positive and negative single-byte codes
 * (counting 0==BOCU1_MIDDLE among the positive ones)
 */
#define BOCU1_SINGLE            64

/* number of lead bytes for positive and negative 2/3/4-byte sequences */
#define BOCU1_LEAD_2            43
#define BOCU1_LEAD_3            3
#define BOCU1_LEAD_4            1

/* The difference value range for single-byters. */
#define BOCU1_REACH_POS_1   (BOCU1_SINGLE-1)
#define BOCU1_REACH_NEG_1   (-BOCU1_SINGLE)

/* The difference value range for double-byters. */
#define BOCU1_REACH_POS_2   (BOCU1_REACH_POS_1+BOCU1_LEAD_2*BOCU1_TRAIL_COUNT)
#define BOCU1_REACH_NEG_2   (BOCU1_REACH_NEG_1-BOCU1_LEAD_2*BOCU1_TRAIL_COUNT)

/* The difference value range for 3-byters. */
#define BOCU1_REACH_POS_3   \
    (BOCU1_REACH_POS_2+BOCU1_LEAD_3*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT)

#define BOCU1_REACH_NEG_3   (BOCU1_REACH_NEG_2-BOCU1_LEAD_3*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT)

/* The lead byte start values. */
#define BOCU1_START_POS_2   (BOCU1_MIDDLE+BOCU1_REACH_POS_1+1)
#define BOCU1_START_POS_3   (BOCU1_START_POS_2+BOCU1_LEAD_2)
#define BOCU1_START_POS_4   (BOCU1_START_POS_3+BOCU1_LEAD_3)
     /* ==BOCU1_MAX_LEAD */

#define BOCU1_START_NEG_2   (BOCU1_MIDDLE+BOCU1_REACH_NEG_1)
#define BOCU1_START_NEG_3   (BOCU1_START_NEG_2-BOCU1_LEAD_2)
#define BOCU1_START_NEG_4   (BOCU1_START_NEG_3-BOCU1_LEAD_3)
     /* ==BOCU1_MIN+1 */

/* The length of a byte sequence, according to the lead byte (!=BOCU1_RESET). */
#define BOCU1_LENGTH_FROM_LEAD(lead) \
    ((BOCU1_START_NEG_2<=(lead) && (lead)<BOCU1_START_POS_2) ? 1 : \
     (BOCU1_START_NEG_3<=(lead) && (lead)<BOCU1_START_POS_3) ? 2 : \
     (BOCU1_START_NEG_4<=(lead) && (lead)<BOCU1_START_POS_4) ? 3 : 4)

/* The length of a byte sequence, according to its packed form. */
#define BOCU1_LENGTH_FROM_PACKED(packed) \
    ((uint32_t)(packed)<0x04000000 ? (packed)>>24 : 4)

/*
 * 12 commonly used C0 control codes (and space) are only used to encode
 * themselves directly,
 * which makes BOCU-1 MIME-usable and reasonably safe for
 * ASCII-oriented software.
 *
 * These controls are
 *  0   NUL
 *
 *  7   BEL
 *  8   BS
 *
 *  9   TAB
 *  a   LF
 *  b   VT
 *  c   FF
 *  d   CR
 *
 *  e   SO
 *  f   SI
 *
 * 1a   SUB
 * 1b   ESC
 *
 * The other 20 C0 controls are also encoded directly (to preserve order)
 * but are also used as trail bytes in difference encoding
 * (for better compression).
 */
#define BOCU1_TRAIL_TO_BYTE(t) ((t)>=BOCU1_TRAIL_CONTROLS_COUNT ? (t)+BOCU1_TRAIL_BYTE_OFFSET : bocu1TrailToByte[t])

/*
 * Byte value map for control codes,
 * from external byte values 0x00..0x20
 * to trail byte values 0..19 (0..0x13) as used in the difference calculation.
 * External byte values that are illegal as trail bytes are mapped to -1.
 */
static const int8_t
bocu1ByteToTrail[BOCU1_MIN]={
/*  0     1     2     3     4     5     6     7    */
    -1,   0x00, 0x01, 0x02, 0x03, 0x04, 0x05, -1,

/*  8     9     a     b     c     d     e     f    */
    -1,   -1,   -1,   -1,   -1,   -1,   -1,   -1,

/*  10    11    12    13    14    15    16    17   */
    0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d,

/*  18    19    1a    1b    1c    1d    1e    1f   */
    0x0e, 0x0f, -1,   -1,   0x10, 0x11, 0x12, 0x13,

/*  20   */
    -1
};

/*
 * Byte value map for control codes,
 * from trail byte values 0..19 (0..0x13) as used in the difference calculation
 * to external byte values 0x00..0x20.
 */
static const int8_t
bocu1TrailToByte[BOCU1_TRAIL_CONTROLS_COUNT]={
/*  0     1     2     3     4     5     6     7    */
    0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x10, 0x11,

/*  8     9     a     b     c     d     e     f    */
    0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19,

/*  10    11    12    13   */
    0x1c, 0x1d, 0x1e, 0x1f
};

/**
 * Integer division and modulo with negative numerators
 * yield negative modulo results and quotients that are one more than
 * what we need here.
 * This macro adjusts the results so that the modulo-value m is always >=0.
 *
 * For positive n, the if() condition is always FALSE.
 *
 * @param n Number to be split into quotient and remainder.
 *          Will be modified to contain the quotient.
 * @param d Divisor.
 * @param m Output variable for the remainder (modulo result).
 */
#define NEGDIVMOD(n, d, m) { \
    (m)=(n)%(d); \
    (n)/=(d); \
    if((m)<0) { \
        --(n); \
        (m)+=(d); \
    } \
}

/* State for BOCU-1 decoder function. */
struct Bocu1Rx {
    int32_t prev, count, diff;
};

typedef struct Bocu1Rx Bocu1Rx;

/* Function prototypes ------------------------------------------------------ */

/* see bocu1.c */
U_CFUNC int32_t
packDiff(int32_t diff);

U_CFUNC int32_t
encodeBocu1(int32_t *pPrev, int32_t c);

U_CFUNC int32_t
decodeBocu1(Bocu1Rx *pRx, uint8_t b);

/* icuhtml/design/conversion/bocu1/bocu1.c ---------------------------------- */

/* BOCU-1 implementation functions ------------------------------------------ */

/**
 * Compute the next "previous" value for differencing
 * from the current code point.
 *
 * @param c current code point, 0..0x10ffff
 * @return "previous code point" state value
 */
static U_INLINE int32_t
bocu1Prev(int32_t c) {
    /* compute new prev */
    if(0x3040<=c && c<=0x309f) {
        /* Hiragana is not 128-aligned */
        return 0x3070;
    } else if(0x4e00<=c && c<=0x9fa5) {
        /* CJK Unihan */
        return 0x4e00-BOCU1_REACH_NEG_2;
    } else if(0xac00<=c && c<=0xd7a3) {
        /* Korean Hangul (cast to int32_t to avoid wraparound on 16-bit compilers) */
        return ((int32_t)0xd7a3+(int32_t)0xac00)/2;
    } else {
        /* mostly small scripts */
        return (c&~0x7f)+BOCU1_ASCII_PREV;
    }
}

/**
 * Encode a difference -0x10ffff..0x10ffff in 1..4 bytes
 * and return a packed integer with them.
 *
 * The encoding favors small absolute differences with short encodings
 * to compress runs of same-script characters.
 *
 * @param diff difference value -0x10ffff..0x10ffff
 * @return
 *      0x010000zz for 1-byte sequence zz
 *      0x0200yyzz for 2-byte sequence yy zz
 *      0x03xxyyzz for 3-byte sequence xx yy zz
 *      0xwwxxyyzz for 4-byte sequence ww xx yy zz (ww>0x03)
 */
U_CFUNC int32_t
packDiff(int32_t diff) {
    int32_t result, m, lead, count, shift;

    if(diff>=BOCU1_REACH_NEG_1) {
        /* mostly positive differences, and single-byte negative ones */
        if(diff<=BOCU1_REACH_POS_1) {
            /* single byte */
            return 0x01000000|(BOCU1_MIDDLE+diff);
        } else if(diff<=BOCU1_REACH_POS_2) {
            /* two bytes */
            diff-=BOCU1_REACH_POS_1+1;
            lead=BOCU1_START_POS_2;
            count=1;
        } else if(diff<=BOCU1_REACH_POS_3) {
            /* three bytes */
            diff-=BOCU1_REACH_POS_2+1;
            lead=BOCU1_START_POS_3;
            count=2;
        } else {
            /* four bytes */
            diff-=BOCU1_REACH_POS_3+1;
            lead=BOCU1_START_POS_4;
            count=3;
        }
    } else {
        /* two-, three-, and four-byte negative differences */
        if(diff>=BOCU1_REACH_NEG_2) {
            /* two bytes */
            diff-=BOCU1_REACH_NEG_1;
            lead=BOCU1_START_NEG_2;
            count=1;
        } else if(diff>=BOCU1_REACH_NEG_3) {
            /* three bytes */
            diff-=BOCU1_REACH_NEG_2;
            lead=BOCU1_START_NEG_3;
            count=2;
        } else {
            /* four bytes */
            diff-=BOCU1_REACH_NEG_3;
            lead=BOCU1_START_NEG_4;
            count=3;
        }
    }

    /* encode the length of the packed result */
    if(count<3) {
        result=(count+1)<<24;
    } else /* count==3, MSB used for the lead byte */ {
        result=0;
    }

    /* calculate trail bytes like digits in itoa() */
    shift=0;
    do {
        NEGDIVMOD(diff, BOCU1_TRAIL_COUNT, m);
        result|=BOCU1_TRAIL_TO_BYTE(m)<<shift;
        shift+=8;
    } while(--count>0);

    /* add lead byte */
    result|=(lead+diff)<<shift;

    return result;
}

/**
 * BOCU-1 encoder function.
 *
 * @param pPrev pointer to the integer that holds
 *        the "previous code point" state;
 *        the initial value should be 0 which
 *        encodeBocu1 will set to the actual BOCU-1 initial state value
 * @param c the code point to encode
 * @return the packed 1/2/3/4-byte encoding, see packDiff(),
 *         or 0 if an error occurs
 *
 * @see packDiff
 */
U_CFUNC int32_t
encodeBocu1(int32_t *pPrev, int32_t c) {
    int32_t prev;

    if(pPrev==NULL || c<0 || c>0x10ffff) {
        /* illegal argument */
        return 0;
    }

    prev=*pPrev;
    if(prev==0) {
        /* lenient handling of initial value 0 */
        prev=*pPrev=BOCU1_ASCII_PREV;
    }

    if(c<=0x20) {
        /*
         * ISO C0 control & space:
         * Encode directly for MIME compatibility,
         * and reset state except for space, to not disrupt compression.
         */
        if(c!=0x20) {
            *pPrev=BOCU1_ASCII_PREV;
        }
        return 0x01000000|c;
    }

    /*
     * all other Unicode code points c==U+0021..U+10ffff
     * are encoded with the difference c-prev
     *
     * a new prev is computed from c,
     * placed in the middle of a 0x80-block (for most small scripts) or
     * in the middle of the Unihan and Hangul blocks
     * to statistically minimize the following difference
     */
    *pPrev=bocu1Prev(c);
    return packDiff(c-prev);
}

/**
 * Function for BOCU-1 decoder; handles multi-byte lead bytes.
 *
 * @param pRx pointer to the decoder state structure
 * @param b lead byte;
 *          BOCU1_MIN<=b<BOCU1_START_NEG_2 or BOCU1_START_POS_2<=b<=BOCU1_MAX_LEAD
 * @return -1 (state change only)
 *
 * @see decodeBocu1
 */
static int32_t
decodeBocu1LeadByte(Bocu1Rx *pRx, uint8_t b) {
    int32_t c, count;

    if(b>=BOCU1_START_NEG_2) {
        /* positive difference */
        if(b<BOCU1_START_POS_3) {
            /* two bytes */
            c=((int32_t)b-BOCU1_START_POS_2)*BOCU1_TRAIL_COUNT+BOCU1_REACH_POS_1+1;
            count=1;
        } else if(b<BOCU1_START_POS_4) {
            /* three bytes */
            c=((int32_t)b-BOCU1_START_POS_3)*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT+BOCU1_REACH_POS_2+1;
            count=2;
        } else {
            /* four bytes */
            c=BOCU1_REACH_POS_3+1;
            count=3;
        }
    } else {
        /* negative difference */
        if(b>=BOCU1_START_NEG_3) {
            /* two bytes */
            c=((int32_t)b-BOCU1_START_NEG_2)*BOCU1_TRAIL_COUNT+BOCU1_REACH_NEG_1;
            count=1;
        } else if(b>BOCU1_MIN) {
            /* three bytes */
            c=((int32_t)b-BOCU1_START_NEG_3)*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT+BOCU1_REACH_NEG_2;
            count=2;
        } else {
            /* four bytes */
            c=-BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT+BOCU1_REACH_NEG_3;
            count=3;
        }
    }

    /* set the state for decoding the trail byte(s) */
    pRx->diff=c;
    pRx->count=count;
    return -1;
}

/**
 * Function for BOCU-1 decoder; handles multi-byte trail bytes.
 *
 * @param pRx pointer to the decoder state structure
 * @param b trail byte
 * @return result value, same as decodeBocu1
 *
 * @see decodeBocu1
 */
static int32_t
decodeBocu1TrailByte(Bocu1Rx *pRx, uint8_t b) {
    int32_t t, c, count;

    if(b<=0x20) {
        /* skip some C0 controls and make the trail byte range contiguous */
        t=bocu1ByteToTrail[b];
        if(t<0) {
            /* illegal trail byte value */
            pRx->prev=BOCU1_ASCII_PREV;
            pRx->count=0;
            return -99;
        }
#if BOCU1_MAX_TRAIL<0xff
    } else if(b>BOCU1_MAX_TRAIL) {
        return -99;
#endif
    } else {
        t=(int32_t)b-BOCU1_TRAIL_BYTE_OFFSET;
    }

    /* add trail byte into difference and decrement count */
    c=pRx->diff;
    count=pRx->count;

    if(count==1) {
        /* final trail byte, deliver a code point */
        c=pRx->prev+c+t;
        if(0<=c && c<=0x10ffff) {
            /* valid code point result */
            pRx->prev=bocu1Prev(c);
            pRx->count=0;
            return c;
        } else {
            /* illegal code point result */
            pRx->prev=BOCU1_ASCII_PREV;
            pRx->count=0;
            return -99;
        }
    }

    /* intermediate trail byte */
    if(count==2) {
        pRx->diff=c+t*BOCU1_TRAIL_COUNT;
    } else /* count==3 */ {
        pRx->diff=c+t*BOCU1_TRAIL_COUNT*BOCU1_TRAIL_COUNT;
    }
    pRx->count=count-1;
    return -1;
}

/**
 * BOCU-1 decoder function.
 *
 * @param pRx pointer to the decoder state structure;
 *        the initial values should be 0 which
 *        decodeBocu1 will set to actual initial state values
 * @param b an input byte
 * @return
 *      0..0x10ffff for a result code point
 *      -1 if only the state changed without code point output
 *     <-1 if an error occurs
 */
U_CFUNC int32_t
decodeBocu1(Bocu1Rx *pRx, uint8_t b) {
    int32_t prev, c, count;

    if(pRx==NULL) {
        /* illegal argument */
        return -99;
    }

    prev=pRx->prev;
    if(prev==0) {
        /* lenient handling of initial 0 values */
        prev=pRx->prev=BOCU1_ASCII_PREV;
        count=pRx->count=0;
    } else {
        count=pRx->count;
    }

    if(count==0) {
        /* byte in lead position */
        if(b<=0x20) {
            /*
             * Direct-encoded C0 control code or space.
             * Reset prev for C0 control codes but not for space.
             */
            if(b!=0x20) {
                pRx->prev=BOCU1_ASCII_PREV;
            }
            return b;
        }

        /*
         * b is a difference lead byte.
         *
         * Return a code point directly from a single-byte difference.
         *
         * For multi-byte difference lead bytes, set the decoder state
         * with the partial difference value from the lead byte and
         * with the number of trail bytes.
         *
         * For four-byte differences, the signedness also affects the
         * first trail byte, which has special handling farther below.
         */
        if(b>=BOCU1_START_NEG_2 && b<BOCU1_START_POS_2) {
            /* single-byte difference */
            c=prev+((int32_t)b-BOCU1_MIDDLE);
            pRx->prev=bocu1Prev(c);
            return c;
        } else if(b==BOCU1_RESET) {
            /* only reset the state, no code point */
            pRx->prev=BOCU1_ASCII_PREV;
            return -1;
        } else {
            return decodeBocu1LeadByte(pRx, b);
        }
    } else {
        /* trail byte in any position */
        return decodeBocu1TrailByte(pRx, b);
    }
}

/* icuhtml/design/conversion/bocu1/bocu1tst.c ------------------------------- */

/* test code ---------------------------------------------------------------- */

/* test code options */

/* ignore comma when processing name lists in testText() */
#define TEST_IGNORE_COMMA 1

/**
 * Write a packed BOCU-1 byte sequence into a byte array,
 * without overflow check.
 * Test function.
 *
 * @param packed packed BOCU-1 byte sequence, see packDiff()
 * @param p pointer to byte array
 * @return number of bytes
 *
 * @see packDiff
 */
static int32_t
writePacked(int32_t packed, uint8_t *p) {
    int32_t count=BOCU1_LENGTH_FROM_PACKED(packed);
    switch(count) {
    case 4:
        *p++=(uint8_t)(packed>>24);
        /* fall through */
    case 3:
        *p++=(uint8_t)(packed>>16);
        /* fall through */
    case 2:
        *p++=(uint8_t)(packed>>8);
        /* fall through */
    case 1:
        *p++=(uint8_t)packed;
        /* fall through */
    default:
        break;
    }

    return count;
}

/**
 * Unpack a packed BOCU-1 non-C0/space byte sequence and get
 * the difference to initialPrev.
 * Used only for round-trip testing of the difference encoding and decoding.
 * Test function.
 *
 * @param initialPrev bogus "previous code point" value to make sure that
 *                    the resulting code point is in the range 0..0x10ffff
 * @param packed packed BOCU-1 byte sequence
 * @return the difference to initialPrev
 *
 * @see packDiff
 * @see writeDiff
 */
static int32_t
unpackDiff(int32_t initialPrev, int32_t packed) {
    Bocu1Rx rx={ 0, 0, 0 };
    int32_t count;

    rx.prev=initialPrev;
    count=BOCU1_LENGTH_FROM_PACKED(packed);
    switch(count) {
    case 4:
        decodeBocu1(&rx, (uint8_t)(packed>>24));
        /* fall through */
    case 3:
        decodeBocu1(&rx, (uint8_t)(packed>>16));
        /* fall through */
    case 2:
        decodeBocu1(&rx, (uint8_t)(packed>>8));
        /* fall through */
    case 1:
        /* subtract initial prev */
        return decodeBocu1(&rx, (uint8_t)packed)-initialPrev;
    default:
        return -0x7fffffff;
    }
}

/**
 * Encode one difference value -0x10ffff..+0x10ffff in 1..4 bytes,
 * preserving lexical order.
 * Also checks for roundtripping of the difference encoding.
 * Test function.
 *
 * @param diff difference value to test, -0x10ffff..0x10ffff
 * @param p pointer to output byte array
 * @return p advanced by number of bytes output
 *
 * @see unpackDiff
 */
static uint8_t *
writeDiff(int32_t diff, uint8_t *p) {
    /* generate the difference as a packed value and serialize it */
    int32_t packed, initialPrev;

    packed=packDiff(diff);

    /*
     * bogus initial "prev" to work around
     * code point range check in decodeBocu1()
     */
    if(diff<=0) {
        initialPrev=0x10ffff;
    } else {
        initialPrev=-1;
    }

    if(diff!=unpackDiff(initialPrev, packed)) {
        /* cast explicitly: int32_t is not necessarily long, but the format uses %ld/%lx */
        log_err("error: unpackDiff(packDiff(diff=%ld)=0x%08lx)=%ld!=diff\n",
                (long)diff, (unsigned long)(uint32_t)packed, (long)unpackDiff(initialPrev, packed));
    }
    return p+writePacked(packed, p);
}

/**
 * Encode a UTF-16 string in BOCU-1.
 * Does not check for output buffer overflow, but is otherwise a useful function.
 *
 * @param s input UTF-16 string
 * @param length number of UChar code units in s
 * @param p pointer to output byte array
 * @return number of bytes output
 */
static int32_t
writeString(const UChar *s, int32_t length, uint8_t *p) {
    uint8_t *p0;
    int32_t c, prev, i;

    prev=0;
    p0=p;
    i=0;
    while(i<length) {
        UTF_NEXT_CHAR(s, i, length, c);
        p+=writePacked(encodeBocu1(&prev, c), p);
    }
    return (int32_t)(p-p0);
}

/**
 * Decode a BOCU-1 byte sequence to a UTF-16 string.
 * Does not check for output buffer overflow, but is otherwise a useful function.
 *
 * @param p pointer to input BOCU-1 bytes
 * @param length number of input bytes
 * @param s pointer to output UTF-16 string array
 * @return number of UChar code units output
 */
static int32_t
readString(const uint8_t *p, int32_t length, UChar *s) {
    Bocu1Rx rx={ 0, 0, 0 };
    int32_t c, i, sLength;

    i=sLength=0;
    while(i<length) {
        c=decodeBocu1(&rx, p[i++]);
        if(c<-1) {
            log_err("error: readString detects encoding error at string index %ld\n", i);
            return -1;
        }
        if(c>=0) {
            UTF_APPEND_CHAR_UNSAFE(s, sLength, c);
        }
    }
    return sLength;
}

static U_INLINE char
hexDigit(uint8_t digit) {
    return digit<=9 ? (char)('0'+digit) : (char)('a'-10+digit);
}

/**
 * Pretty-print 0-terminated byte values.
 * Helper function for test output.
 *
 * @param bytes 0-terminated byte array to print
 * @param out output buffer for the resulting NUL-terminated text
 */
static void
printBytes(uint8_t *bytes, char *out) {
    int i;
    uint8_t b;

    i=0;
    while((b=*bytes++)!=0) {
        *out++=' ';
        *out++=hexDigit((uint8_t)(b>>4));
        *out++=hexDigit((uint8_t)(b&0xf));
        ++i;
    }
    i=3*(5-i);
    while(i>0) {
        *out++=' ';
        --i;
    }
    *out=0;
}

/**
 * Basic BOCU-1 test function, called when there are no command line arguments.
 * Prints some of the #define values and performs round-trip tests of the
 * difference encoding and decoding.
 */
static void
TestBOCU1RefDiff(void) {
    char buf1[80], buf2[80];
    uint8_t prev[5], level[5];
    int32_t i, cmp, countErrors;

    log_verbose("reach of single bytes: %ld\n", 1+BOCU1_REACH_POS_1-BOCU1_REACH_NEG_1);
    log_verbose("reach of 2 bytes : %ld\n", 1+BOCU1_REACH_POS_2-BOCU1_REACH_NEG_2);
    log_verbose("reach of 3 bytes : %ld\n\n", 1+BOCU1_REACH_POS_3-BOCU1_REACH_NEG_3);

    log_verbose(" BOCU1_REACH_NEG_1 %8ld BOCU1_REACH_POS_1 %8ld\n", BOCU1_REACH_NEG_1, BOCU1_REACH_POS_1);
    log_verbose(" BOCU1_REACH_NEG_2 %8ld BOCU1_REACH_POS_2 %8ld\n", BOCU1_REACH_NEG_2, BOCU1_REACH_POS_2);
    log_verbose(" BOCU1_REACH_NEG_3 %8ld BOCU1_REACH_POS_3 %8ld\n\n", BOCU1_REACH_NEG_3, BOCU1_REACH_POS_3);

    log_verbose(" BOCU1_MIDDLE 0x%02x\n", BOCU1_MIDDLE);
    log_verbose(" BOCU1_START_NEG_2 0x%02x BOCU1_START_POS_2 0x%02x\n", BOCU1_START_NEG_2, BOCU1_START_POS_2);
    log_verbose(" BOCU1_START_NEG_3 0x%02x BOCU1_START_POS_3 0x%02x\n\n", BOCU1_START_NEG_3, BOCU1_START_POS_3);

    /* test packDiff() & unpackDiff() with some specific values */
    writeDiff(0, level);
    writeDiff(1, level);
    writeDiff(65, level);
    writeDiff(130, level);
    writeDiff(30000, level);
    writeDiff(1000000, level);
    writeDiff(-65, level);
    writeDiff(-130, level);
    writeDiff(-30000, level);
    writeDiff(-1000000, level);

    /* test that each value is smaller than any following one */
    countErrors=0;
    i=-0x10ffff;
    *writeDiff(i, prev)=0;

    /* show first number and bytes */
    printBytes(prev, buf1);
    log_verbose(" wD(%8ld) %s\n", i, buf1);

    for(++i; i<=0x10ffff; ++i) {
        *writeDiff(i, level)=0;
        cmp=strcmp((const char *)prev, (const char *)level);
        if(BOCU1_LENGTH_FROM_LEAD(level[0])!=(int32_t)strlen((const char *)level)) {
            log_verbose("BOCU1_LENGTH_FROM_LEAD(0x%02x)=%ld!=%ld=strlen(writeDiff(%ld))\n",
                        level[0], BOCU1_LENGTH_FROM_LEAD(level[0]), strlen((const char *)level), i);
        }
        if(cmp<0) {
            if(i==0 || i==1 || strlen((const char *)prev)!=strlen((const char *)level)) {
                /*
                 * if the result is good, then print only if the length changed
                 * to get little but interesting output
                 */
                printBytes(prev, buf1);
                printBytes(level, buf2);
                log_verbose("ok: strcmp(wD(%8ld), wD(%8ld))=%2d %s%s\n", i-1, i, cmp, buf1, buf2);
            }
        } else {
            ++countErrors;
            printBytes(prev, buf1);
            printBytes(level, buf2);
            log_verbose("wrong: strcmp(wD(%8ld), wD(%8ld))=%2d %s%s\n", i-1, i, cmp, buf1, buf2);
        }
        /* remember the previous bytes */
        memcpy(prev, level, 4);
    }

    /* show last number and bytes */
    printBytes((uint8_t *)"", buf1);
    printBytes(prev, buf2);
    log_verbose(" wD(%8ld) %s%s\n", i-1, buf1, buf2);

    if(countErrors==0) {
        log_verbose("writeDiff(-0x10ffff..0x10ffff) works fine\n");
    } else {
        log_err("writeDiff(-0x10ffff..0x10ffff) violates lexical ordering in %d cases\n", countErrors);
    }

    /* output signature byte sequence */
    i=0;
    writePacked(encodeBocu1(&i, 0xfeff), level);
    log_verbose("\nBOCU-1 signature byte sequence: %02x %02x %02x\n",
                level[0], level[1], level[2]);
}

/* cintltst code ------------------------------------------------------------ */

static const int32_t DEFAULT_BUFFER_SIZE = 30000;


/* test one string with the ICU and the reference BOCU-1 implementations */
static void
roundtripBOCU1(UConverter *bocu1, int32_t number, const UChar *text, int32_t length) {
    UChar *roundtripRef, *roundtripICU;
    char *bocu1Ref, *bocu1ICU;

    int32_t bocu1RefLength, bocu1ICULength, roundtripRefLength, roundtripICULength;
    UErrorCode errorCode;

    roundtripRef = malloc(DEFAULT_BUFFER_SIZE * sizeof(UChar));
    roundtripICU = malloc(DEFAULT_BUFFER_SIZE * sizeof(UChar));
    bocu1Ref = malloc(DEFAULT_BUFFER_SIZE);
    bocu1ICU = malloc(DEFAULT_BUFFER_SIZE);

    /* Unicode -> BOCU-1 */
    bocu1RefLength=writeString(text, length, (uint8_t *)bocu1Ref);

    errorCode=U_ZERO_ERROR;
    bocu1ICULength=ucnv_fromUChars(bocu1, bocu1ICU, DEFAULT_BUFFER_SIZE, text, length, &errorCode);
    if(U_FAILURE(errorCode)) {
        log_err("ucnv_fromUChars(BOCU-1, text(%d)[%d]) failed: %s\n", number, length, u_errorName(errorCode));
        goto cleanup;
    }

    if(bocu1RefLength!=bocu1ICULength || 0!=uprv_memcmp(bocu1Ref, bocu1ICU, bocu1RefLength)) {
        log_err("Unicode(%d)[%d] -> BOCU-1: reference[%d]!=ICU[%d]\n", number, length, bocu1RefLength, bocu1ICULength);
        goto cleanup;
    }

    /* BOCU-1 -> Unicode */
    roundtripRefLength=readString((uint8_t *)bocu1Ref, bocu1RefLength, roundtripRef);
    if(roundtripRefLength<0) {
        goto cleanup; /* readString() found an error and reported it */
    }

    roundtripICULength=ucnv_toUChars(bocu1, roundtripICU, DEFAULT_BUFFER_SIZE, bocu1ICU, bocu1ICULength, &errorCode);
    if(U_FAILURE(errorCode)) {
        log_err("ucnv_toUChars(BOCU-1, text(%d)[%d]) failed: %s\n", number, length, u_errorName(errorCode));
        goto cleanup;
    }

    if(length!=roundtripRefLength || 0!=u_memcmp(text, roundtripRef, length)) {
        log_err("BOCU-1 -> Unicode: original(%d)[%d]!=reference[%d]\n", number, length, roundtripRefLength);
        goto cleanup;
    }
    if(roundtripRefLength!=roundtripICULength || 0!=u_memcmp(roundtripRef, roundtripICU, roundtripRefLength)) {
        log_err("BOCU-1 -> Unicode: reference(%d)[%d]!=ICU[%d]\n", number, roundtripRefLength, roundtripICULength);
        goto cleanup;
    }

cleanup:
    /* release all four buffers on every exit path, including the error paths */
    free(roundtripRef);
    free(roundtripICU);
    free(bocu1Ref);
    free(bocu1ICU);
}

static const UChar feff[]={ 0xfeff };
static const UChar ascii[]={ 0x61, 0x62, 0x20, 0x63, 0x61 };
static const UChar crlf[]={ 0xd, 0xa, 0x20 };
static const UChar nul[]={ 0 };
static const UChar latin[]={ 0xdf, 0xe6 };
static const UChar devanagari[]={ 0x930, 0x20, 0x918, 0x909 };
static const UChar hiragana[]={ 0x3086, 0x304d, 0x20, 0x3053, 0x4000 };
static const UChar unihan[]={ 0x4e00, 0x7777, 0x20, 0x9fa5, 0x4e00 };
static const UChar hangul[]={ 0xac00, 0xbcde, 0x20, 0xd7a3 };
static const UChar surrogates[]={ 0xdc00, 0xd800 }; /* single surrogates, unmatched! */
static const UChar plane1[]={ 0xd800, 0xdc00 };
static const UChar plane2[]={ 0xd845, 0xdddd };
static const UChar plane15[]={ 0xdbbb, 0xddee, 0x20 };
static const UChar plane16[]={ 0xdbff, 0xdfff };
static const UChar c0[]={ 1, 0xe40, 0x20, 9 };

static const struct {
    const UChar *s;
    int32_t length;
} strings[]={
    { feff, LENGTHOF(feff) },
    { ascii, LENGTHOF(ascii) },
    { crlf, LENGTHOF(crlf) },
    { nul, LENGTHOF(nul) },
    { latin, LENGTHOF(latin) },
    { devanagari, LENGTHOF(devanagari) },
    { hiragana, LENGTHOF(hiragana) },
    { unihan, LENGTHOF(unihan) },
    { hangul, LENGTHOF(hangul) },
    { surrogates, LENGTHOF(surrogates) },
    { plane1, LENGTHOF(plane1) },
    { plane2, LENGTHOF(plane2) },
    { plane15, LENGTHOF(plane15) },
    { plane16, LENGTHOF(plane16) },
    { c0, LENGTHOF(c0) }
};

/*
 * Verify that the ICU BOCU-1 implementation produces the same results as
 * the reference implementation from the design folder.
 * Generate some texts and convert them with both converters, verifying
 * identical results and roundtripping.
 */
static void
TestBOCU1(void) {
    UChar *text;
    int32_t i, length;

    UConverter *bocu1;
    UErrorCode errorCode;

    errorCode=U_ZERO_ERROR;
    bocu1=ucnv_open("BOCU-1", &errorCode);
    if(U_FAILURE(errorCode)) {
        log_err("error: unable to open BOCU-1 converter: %s\n", u_errorName(errorCode));
        return;
    }

    text = malloc(DEFAULT_BUFFER_SIZE * sizeof(UChar));

    /* text 1: each of strings[] once */
    length=0;
    for(i=0; i<LENGTHOF(strings); ++i) {
        u_memcpy(text+length, strings[i].s, strings[i].length);
        length+=strings[i].length;
    }
    roundtripBOCU1(bocu1, 1, text, length);

    /* text 2: each of strings[] twice */
    length=0;
    for(i=0; i<LENGTHOF(strings); ++i) {
        u_memcpy(text+length, strings[i].s, strings[i].length);
        length+=strings[i].length;
        u_memcpy(text+length, strings[i].s, strings[i].length);
        length+=strings[i].length;
    }
    roundtripBOCU1(bocu1, 2, text, length);

    /* text 3: each of strings[] many times (set step vs. |strings| so that all strings are used) */
    length=0;
    for(i=1; length<5000; i+=7) {
        if(i>=LENGTHOF(strings)) {
            i-=LENGTHOF(strings);
        }
        u_memcpy(text+length, strings[i].s, strings[i].length);
        length+=strings[i].length;
    }
    roundtripBOCU1(bocu1, 3, text, length);

    ucnv_close(bocu1);
    free(text);
}

U_CFUNC void addBOCU1Tests(TestNode** root);

U_CFUNC void
addBOCU1Tests(TestNode** root) {
    addTest(root, TestBOCU1RefDiff, "tsconv/bocu1tst/TestBOCU1RefDiff");
    addTest(root, TestBOCU1, "tsconv/bocu1tst/TestBOCU1");
}