/* GENERATED SOURCE. DO NOT MODIFY. */
// © 2016 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html#License
/*
*******************************************************************************
*   Copyright (C) 2001-2016, International Business Machines
*   Corporation and others.  All Rights Reserved.
*******************************************************************************
*/

/* FOOD FOR THOUGHT: currently the reordering modes are a mixture of
 * algorithm for direct BiDi, algorithm for inverse Bidi and the bizarre
 * concept of RUNS_ONLY which is a double operation.
 * It could be advantageous to divide this into 3 concepts:
 * a) Operation: direct / inverse / RUNS_ONLY
 * b) Direct algorithm: default / NUMBERS_SPECIAL / GROUP_NUMBERS_WITH_R
 * c) Inverse algorithm: default / INVERSE_LIKE_DIRECT / NUMBERS_SPECIAL
 * This would allow combinations not possible today like RUNS_ONLY with
 * NUMBERS_SPECIAL.
 * It would also allow setting INSERT_MARKS for the direct step of RUNS_ONLY
 * and REMOVE_CONTROLS for the inverse step.
 * Not all combinations would be supported, and probably not all make sense.
 * The documentation would need to state which ones are supported and what
 * the fallbacks are for unsupported combinations.
 */

//TODO: make sample program do something simple but real and complete

package android.icu.text;

import java.awt.font.NumericShaper;
import java.awt.font.TextAttribute;
import java.lang.reflect.Array;
import java.text.AttributedCharacterIterator;
import java.util.Arrays;

import android.icu.impl.UBiDiProps;
import android.icu.lang.UCharacter;
import android.icu.lang.UCharacterDirection;
import android.icu.lang.UProperty;

/**
 *
 * <h2>Bidi algorithm for ICU</h2>
 *
 * This is an implementation of the Unicode Bidirectional Algorithm.
 * The algorithm is defined in the <a
 * href="http://www.unicode.org/unicode/reports/tr9/">Unicode Standard Annex #9</a>.
 * <p>
 *
 * Note: Libraries that perform a bidirectional algorithm and reorder strings
 * accordingly are sometimes called "Storage Layout Engines". ICU's Bidi and
 * shaping (ArabicShaping) classes can be used at the core of such "Storage
 * Layout Engines".
 *
 * <h3>General remarks about the API:</h3>
 *
 * The "limit" of a sequence of characters is the position just after
 * their last character, i.e., one more than that position.
 * <p>
 *
 * Some of the API methods provide access to "runs". Such a
 * "run" is defined as a sequence of characters that are at the same
 * embedding level after performing the Bidi algorithm.
 *
 * <h3>Basic concept: paragraph</h3>
 * A piece of text can be divided into several paragraphs by characters
 * with the Bidi class <code>Block Separator</code>. For handling of
 * paragraphs, see:
 * <ul>
 * <li>{@link #countParagraphs}
 * <li>{@link #getParaLevel}
 * <li>{@link #getParagraph}
 * <li>{@link #getParagraphByIndex}
 * </ul>
 *
 * <h3>Basic concept: text direction</h3>
 * The direction of a piece of text may be:
 * <ul>
 * <li>{@link #LTR}
 * <li>{@link #RTL}
 * <li>{@link #MIXED}
 * <li>{@link #NEUTRAL}
 * </ul>
 *
 * <h3>Basic concept: levels</h3>
 *
 * Levels in this API represent embedding levels according to the Unicode
 * Bidirectional Algorithm.
 * Their low-order bit (even/odd value) indicates the visual direction.<p>
 *
 * Levels can be abstract values when used for the
 * <code>paraLevel</code> and <code>embeddingLevels</code>
 * arguments of <code>setPara()</code>; there:
 * <ul>
 * <li>the high-order bit of an <code>embeddingLevels[]</code>
 * value indicates whether the using application is
 * specifying the level of a character to <i>override</i> whatever the
 * Bidi implementation would resolve it to.</li>
 * <li><code>paraLevel</code> can be set to the
 * pseudo-level values <code>LEVEL_DEFAULT_LTR</code>
 * and <code>LEVEL_DEFAULT_RTL</code>.</li>
 * </ul>
 *
 * <p>The related constants are not real, valid level values.
 * <code>LEVEL_DEFAULT_XXX</code> can be used to specify
 * a default for the paragraph level, to be used
 * when the <code>setPara()</code> method
 * shall determine the level but there is no
 * character with a strong directional type in the input.<p>
 *
 * Note that the value for <code>LEVEL_DEFAULT_LTR</code> is even
 * and the one for <code>LEVEL_DEFAULT_RTL</code> is odd,
 * just like with normal LTR and RTL level values -
 * these special values are designed that way. Also, the implementation
 * assumes that MAX_EXPLICIT_LEVEL is odd.
 *
 * <b>See Also:</b>
 * <ul>
 * <li>{@link #LEVEL_DEFAULT_LTR}
 * <li>{@link #LEVEL_DEFAULT_RTL}
 * <li>{@link #LEVEL_OVERRIDE}
 * <li>{@link #MAX_EXPLICIT_LEVEL}
 * <li>{@link #setPara}
 * </ul>
 *
 * <h3>Basic concept: Reordering Mode</h3>
 * Reordering mode values indicate which variant of the Bidi algorithm to
 * use.
 *
 * <b>See Also:</b>
 * <ul>
 * <li>{@link #setReorderingMode}
 * <li>{@link #REORDER_DEFAULT}
 * <li>{@link #REORDER_NUMBERS_SPECIAL}
 * <li>{@link #REORDER_GROUP_NUMBERS_WITH_R}
 * <li>{@link #REORDER_RUNS_ONLY}
 * <li>{@link #REORDER_INVERSE_NUMBERS_AS_L}
 * <li>{@link #REORDER_INVERSE_LIKE_DIRECT}
 * <li>{@link #REORDER_INVERSE_FOR_NUMBERS_SPECIAL}
 * </ul>
 *
 * <h3>Basic concept: Reordering Options</h3>
 * Reordering options can be applied during Bidi text transformations.
 *
 * <b>See Also:</b>
 * <ul>
 * <li>{@link #setReorderingOptions}
 * <li>{@link #OPTION_DEFAULT}
 * <li>{@link #OPTION_INSERT_MARKS}
 * <li>{@link #OPTION_REMOVE_CONTROLS}
 * <li>{@link #OPTION_STREAMING}
 * </ul>
 *
 * <h4> Sample code for the ICU Bidi API </h4>
 *
 * <h5>Rendering a paragraph with the ICU Bidi API</h5>
 *
 * This is (hypothetical) sample code that illustrates how the ICU Bidi API
 * could be used to render a paragraph of text. Rendering code depends highly on
 * the graphics system, therefore this sample code must make a lot of
 * assumptions, which may or may not match any existing graphics system's
 * properties.
 *
 * <p>
 * The basic assumptions are:
 *
 * <ul>
 * <li>Rendering is done from left to right on a horizontal line.</li>
 * <li>A run of single-style, unidirectional text can be rendered at once.
 * </li>
 * <li>Such a run of text is passed to the graphics system with characters
 * (code units) in logical order.</li>
 * <li>The line-breaking algorithm is very complicated and Locale-dependent -
 * and therefore its implementation is omitted from this sample code.</li>
 * </ul>
 *
 * <pre>
 *
 *  package android.icu.dev.test.bidi;
 *
 *  import android.icu.text.Bidi;
 *  import android.icu.text.BidiRun;
 *
 *  public class Sample {
 *
 *      static final int styleNormal = 0;
 *      static final int styleSelected = 1;
 *      static final int styleBold = 2;
 *      static final int styleItalics = 4;
 *      static final int styleSuper = 8;
 *      static final int styleSub = 16;
 *
 *      static class StyleRun {
 *          int limit;
 *          int style;
 *
 *          public StyleRun(int limit, int style) {
 *              this.limit = limit;
 *              this.style = style;
 *          }
 *      }
 *
 *      static class Bounds {
 *          int start;
 *          int limit;
 *
 *          public Bounds(int start, int limit) {
 *              this.start = start;
 *              this.limit = limit;
 *          }
 *      }
 *
 *      static int getTextWidth(String text, int start, int limit,
 *                              StyleRun[] styleRuns, int styleRunCount) {
 *          // simplistic way to compute the width
 *          return limit - start;
 *      }
 *
 *      // set limit and StyleRun limit for a line
 *      // from text[start] and from styleRuns[styleRunStart]
 *      // using Bidi.getLogicalRun(...)
 *      // returns line width
 *      static int getLineBreak(String text, Bounds line, Bidi para,
 *                              StyleRun styleRuns[], Bounds styleRun) {
 *          // dummy return
 *          return 0;
 *      }
 *
 *      // render runs on a line sequentially, always from left to right
 *
 *      // prepare rendering a new line
 *      static void startLine(byte textDirection, int lineWidth) {
 *          System.out.println();
 *      }
 *
 *      // render a run of text and advance to the right by the run width
 *      // the text[start..limit-1] is always in logical order
 *      static void renderRun(String text, int start, int limit,
 *                            byte textDirection, int style) {
 *      }
 *
 *      // We could compute a cross-product
 *      // from the style runs with the directional runs
 *      // and then reorder it.
 *      // Instead, here we iterate over each run type
 *      // and render the intersections -
 *      // with shortcuts in simple (and common) cases.
 *      // renderParagraph() is the main function.
 *
 *      // render a directional run with
 *      // (possibly) multiple style runs intersecting with it
 *      static void renderDirectionalRun(String text, int start, int limit,
 *                                       byte direction, StyleRun styleRuns[],
 *                                       int styleRunCount) {
 *          int i;
 *
 *          // iterate over style runs
 *          if (direction == Bidi.LTR) {
 *              int styleLimit;
 *              for (i = 0; i < styleRunCount; ++i) {
 *                  styleLimit = styleRuns[i].limit;
 *                  if (start < styleLimit) {
 *                      if (styleLimit > limit) {
 *                          styleLimit = limit;
 *                      }
 *                      renderRun(text, start, styleLimit,
 *                                direction, styleRuns[i].style);
 *                      if (styleLimit == limit) {
 *                          break;
 *                      }
 *                      start = styleLimit;
 *                  }
 *              }
 *          } else {
 *              int styleStart;
 *
 *              for (i = styleRunCount-1; i >= 0; --i) {
 *                  if (i > 0) {
 *                      styleStart = styleRuns[i-1].limit;
 *                  } else {
 *                      styleStart = 0;
 *                  }
 *                  if (limit >= styleStart) {
 *                      if (styleStart < start) {
 *                          styleStart = start;
 *                      }
 *                      renderRun(text, styleStart, limit, direction,
 *                                styleRuns[i].style);
 *                      if (styleStart == start) {
 *                          break;
 *                      }
 *                      limit = styleStart;
 *                  }
 *              }
 *          }
 *      }
 *
 *      // the line object represents text[start..limit-1]
 *      static void renderLine(Bidi line, String text, int start, int limit,
 *                             StyleRun styleRuns[], int styleRunCount) {
 *          byte direction = line.getDirection();
 *          if (direction != Bidi.MIXED) {
 *              // unidirectional
 *              if (styleRunCount <= 1) {
 *                  renderRun(text, start, limit, direction, styleRuns[0].style);
 *              } else {
 *                  renderDirectionalRun(text, start, limit, direction,
 *                                       styleRuns, styleRunCount);
 *              }
 *          } else {
 *              // mixed-directional
 *              int count, i;
 *              BidiRun run;
 *
 *              try {
 *                  count = line.countRuns();
 *              } catch (IllegalStateException e) {
 *                  e.printStackTrace();
 *                  return;
 *              }
 *              if (styleRunCount <= 1) {
 *                  int style = styleRuns[0].style;
 *
 *                  // iterate over directional runs
 *                  for (i = 0; i < count; ++i) {
 *                      run = line.getVisualRun(i);
 *                      renderRun(text, run.getStart(), run.getLimit(),
 *                                run.getDirection(), style);
 *                  }
 *              } else {
 *                  // iterate over both directional and style runs
 *                  for (i = 0; i < count; ++i) {
 *                      run = line.getVisualRun(i);
 *                      renderDirectionalRun(text, run.getStart(),
 *                                           run.getLimit(), run.getDirection(),
 *                                           styleRuns, styleRunCount);
 *                  }
 *              }
 *          }
 *      }
 *
 *      static void renderParagraph(String text, byte textDirection,
 *                                  StyleRun styleRuns[], int styleRunCount,
 *                                  int lineWidth) {
 *          int length = text.length();
 *          Bidi para = new Bidi();
 *          try {
 *              para.setPara(text,
 *                           textDirection != 0 ?
 *                               Bidi.LEVEL_DEFAULT_RTL
 *                               : Bidi.LEVEL_DEFAULT_LTR,
 *                           null);
 *          } catch (Exception e) {
 *              e.printStackTrace();
 *              return;
 *          }
 *          byte paraLevel = (byte)(1 & para.getParaLevel());
 *          StyleRun styleRun = new StyleRun(length, styleNormal);
 *
 *          if (styleRuns == null || styleRunCount <= 0) {
 *              styleRuns = new StyleRun[1];
 *              styleRunCount = 1;
 *              styleRuns[0] = styleRun;
 *          }
 *          // assume styleRuns[styleRunCount-1].limit>=length
 *
 *          int width = getTextWidth(text, 0, length, styleRuns, styleRunCount);
 *          if (width <= lineWidth) {
 *              // everything fits onto one line
 *
 *              // prepare rendering a new line from either left or right
 *              startLine(paraLevel, width);
 *
 *              renderLine(para, text, 0, length, styleRuns, styleRunCount);
 *          } else {
 *              // we need to render several lines
 *              Bidi line = new Bidi(length, 0);
 *              int start = 0, limit;
 *              int styleRunStart = 0, styleRunLimit;
 *
 *              for (;;) {
 *                  limit = length;
 *                  styleRunLimit = styleRunCount;
 *                  width = getLineBreak(text, new Bounds(start, limit),
 *                                       para, styleRuns,
 *                                       new Bounds(styleRunStart, styleRunLimit));
 *                  try {
 *                      line = para.setLine(start, limit);
 *                  } catch (Exception e) {
 *                      e.printStackTrace();
 *                      return;
 *                  }
 *                  // prepare rendering a new line
 *                  // from either left or right
 *                  startLine(paraLevel, width);
 *
 *                  if (styleRunStart > 0) {
 *                      int newRunCount = styleRuns.length - styleRunStart;
 *                      StyleRun[] newRuns = new StyleRun[newRunCount];
 *                      System.arraycopy(styleRuns, styleRunStart, newRuns, 0,
 *                                       newRunCount);
 *                      renderLine(line, text, start, limit, newRuns,
 *                                 styleRunLimit - styleRunStart);
 *                  } else {
 *                      renderLine(line, text, start, limit, styleRuns,
 *                                 styleRunLimit - styleRunStart);
 *                  }
 *                  if (limit == length) {
 *                      break;
 *                  }
 *                  start = limit;
 *                  styleRunStart = styleRunLimit - 1;
 *                  if (start >=
 *                          styleRuns[styleRunStart].limit) {
 *                      ++styleRunStart;
 *                  }
 *              }
 *          }
 *      }
 *
 *      public static void main(String[] args)
 *      {
 *          renderParagraph("Some Latin text...", Bidi.LTR, null, 0, 80);
 *          renderParagraph("Some Hebrew text...", Bidi.RTL, null, 0, 60);
 *      }
 *  }
 *
 * </pre>
 *
 * @hide Only a subset of ICU is exposed in Android
 */

/*
 * General implementation notes:
 *
 * Throughout the implementation, there are comments like (W2) that refer to
 * rules of the BiDi algorithm, in this example to the second rule of the
 * resolution of weak types.
 *
 * For handling surrogate pairs, where two UChar's form one "abstract" (or UTF-32)
 * character according to UTF-16, the second UChar gets the directional property of
 * the entire character assigned, while the first one gets a BN, a boundary
 * neutral, type, which is ignored by most of the algorithm according to
 * rule (X9) and the implementation suggestions of the BiDi algorithm.
 *
 * Later, adjustWSLevels() will set the level for each BN to that of the
 * following character (UChar), which results in surrogate pairs getting the
 * same level on each of their surrogates.
 *
 * In a UTF-8 implementation, the same thing could be done: the last byte of
 * a multi-byte sequence would get the "real" property, while all previous
 * bytes of that sequence would get BN.
 *
 * It is not possible to assign all those parts of a character the same real
 * property because this would fail in the resolution of weak types with rules
 * that look at immediately surrounding types.
 *
 * As a related topic, this implementation does not remove Boundary Neutral
 * types from the input, but ignores them wherever this is relevant.
 * For example, the loop for the resolution of the weak types reads
 * types until it finds a non-BN.
 * Also, explicit embedding codes are neither changed into BN nor removed.
 * They are only treated the same way real BNs are.
 * As stated before, adjustWSLevels() takes care of them at the end.
 * For the purpose of conformance, the levels of all these codes
 * do not matter.
 *
 * Note that this implementation modifies the dirProps
 * after the initial setup, when applying X5c (replace FSI by LRI or RLI),
 * X6, N0 (replace paired brackets by L or R).
 *
 * In this implementation, the resolution of weak types (W1 to W6),
 * neutrals (N1 and N2), and the assignment of the resolved level (In)
 * are all done in one single loop, in resolveImplicitLevels().
 * Changes of dirProp values are done on the fly, without writing
 * them back to the dirProps array.
 *
 *
 * This implementation contains code that allows bypassing steps of the
 * algorithm that are not needed on the specific paragraph
 * in order to speed up the most common cases considerably,
 * like text that is entirely LTR, or RTL text without numbers.
 *
 * Most of this is done by setting a bit for each directional property
 * in a flags variable and later checking for whether there are
 * any LTR characters or any RTL characters, or both, whether
 * there are any explicit embedding codes, etc.
 *
 * If the (Xn) steps are performed, then the flags are re-evaluated,
 * because they will then not contain the embedding codes any more
 * and will be adjusted for override codes, so that subsequently
 * more bypassing may be possible than what the initial flags suggested.
 *
 * If the text is not mixed-directional, then the
 * algorithm steps for the weak type resolution are not performed,
 * and all levels are set to the paragraph level.
 *
 * If there are no explicit embedding codes, then the (Xn) steps
 * are not performed.
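 *
 * The flag-based bypass described above can be sketched roughly as follows.
 * This is an illustration only, not the code used in this class: it assumes
 * the JDK's Character.getDirectionality() in place of UBiDiProps, it looks
 * only at strong L, R and AL characters (the real check also has to account
 * for numbers and explicit codes), and the names DirFlagsSketch, MASK_LTR,
 * MASK_RTL, bit(), flagsOf() and isMixed() are hypothetical.
 *

```java
public class DirFlagsSketch {
    // One bit per directionality value reported by java.lang.Character.
    static int bit(byte dir) {
        return dir < 0 ? 0 : 1 << dir;   // DIRECTIONALITY_UNDEFINED is -1
    }

    // Assumed masks for this sketch: strong left-to-right and strong right-to-left classes.
    static final int MASK_LTR = bit(Character.DIRECTIONALITY_LEFT_TO_RIGHT);
    static final int MASK_RTL = bit(Character.DIRECTIONALITY_RIGHT_TO_LEFT)
            | bit(Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC);

    // Accumulate the class bits of every code point in the text.
    static int flagsOf(String text) {
        int flags = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            flags |= bit(Character.getDirectionality(cp));
            i += Character.charCount(cp);
        }
        return flags;
    }

    // Only text with both kinds of strong characters needs the full resolution steps.
    static boolean isMixed(int flags) {
        return (flags & MASK_LTR) != 0 && (flags & MASK_RTL) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isMixed(flagsOf("abc")));
        System.out.println(isMixed(flagsOf("abc \u05D0\u05D1")));
    }
}
```

 * In this sketch a single pass over the text is enough to decide whether the
 * more expensive steps can be skipped; the flags are plain int bit masks so
 * that several conditions can be tested with one AND each.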
 *
 * If embedding levels are supplied as a parameter, then all
 * explicit embedding codes are ignored, and the (Xn) steps
 * are not performed.
 *
 * White Space types could get the level of the run they belong to,
 * and are checked with a test of (flags&MASK_EMBEDDING) to
 * consider if the paragraph direction should be considered in
 * the flags variable.
 *
 * If there are no White Space types in the paragraph, then
 * (L1) is not necessary in adjustWSLevels().
 */

public class Bidi {

    static class Point {
        int pos;    /* position in text */
        int flag;   /* flag for LRM/RLM, before/after */
    }

    static class InsertPoints {
        int size;
        int confirmed;
        Point[] points = new Point[0];
    }

    static class Opening {
        int   position;                 /* position of opening bracket */
        int   match;                    /* matching char or -position of closing bracket */
        int   contextPos;               /* position of last strong char found before opening */
        short flags;                    /* bits for L or R/AL found within the pair */
        byte  contextDir;               /* L or R according to last strong char before opening */
    }

    static class IsoRun {
        int   contextPos;               /* position of char determining context */
        short start;                    /* index of first opening entry for this run */
        short limit;                    /* index after last opening entry for this run */
        byte  level;                    /* level of this run */
        byte  lastStrong;               /* bidi class of last strong char found in this run */
        byte  lastBase;                 /* bidi class of last base char found in this run */
        byte  contextDir;               /* L or R to use as context for following openings */
    }

    static class BracketData {
        Opening[] openings = new Opening[SIMPLE_OPENINGS_COUNT];
        int   isoRunLast;               /* index of last used entry */
        /* array of nested isolated sequence entries; can never exceed UBIDI_MAX_EXPLICIT_LEVEL
           + 1 for index 0, + 1 for before the first isolated sequence */
        IsoRun[] isoRuns = new
                IsoRun[MAX_EXPLICIT_LEVEL+2];
        boolean isNumbersSpecial;       /* reordering mode for NUMBERS_SPECIAL */
    }

    static class Isolate {
        int   startON;
        int   start1;
        short stateImp;
        short state;
    }

    /** Paragraph level setting<p>
     *
     * Constant indicating that the base direction depends on the first strong
     * directional character in the text according to the Unicode Bidirectional
     * Algorithm. If no strong directional character is present,
     * then set the paragraph level to 0 (left-to-right).<p>
     *
     * If this value is used in conjunction with reordering modes
     * <code>REORDER_INVERSE_LIKE_DIRECT</code> or
     * <code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code>, the text to reorder
     * is assumed to be visual LTR, and the text after reordering is required
     * to be the corresponding logical string with appropriate contextual
     * direction. The direction of the result string will be RTL if either
     * the rightmost or leftmost strong character of the source text is RTL
     * or Arabic Letter, the direction will be LTR otherwise.<p>
     *
     * If reordering option <code>OPTION_INSERT_MARKS</code> is set, an RLM may
     * be added at the beginning of the result string to ensure round trip
     * (that the result string, when reordered back to visual, will produce
     * the original source text).
     * @see #REORDER_INVERSE_LIKE_DIRECT
     * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
     */
    public static final byte LEVEL_DEFAULT_LTR = (byte)0x7e;

    /** Paragraph level setting<p>
     *
     * Constant indicating that the base direction depends on the first strong
     * directional character in the text according to the Unicode Bidirectional
     * Algorithm.
     * If no strong directional character is present,
     * then set the paragraph level to 1 (right-to-left).<p>
     *
     * If this value is used in conjunction with reordering modes
     * <code>REORDER_INVERSE_LIKE_DIRECT</code> or
     * <code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code>, the text to reorder
     * is assumed to be visual LTR, and the text after reordering is required
     * to be the corresponding logical string with appropriate contextual
     * direction. The direction of the result string will be RTL if either
     * the rightmost or leftmost strong character of the source text is RTL
     * or Arabic Letter, or if the text contains no strong character;
     * the direction will be LTR otherwise.<p>
     *
     * If reordering option <code>OPTION_INSERT_MARKS</code> is set, an RLM may
     * be added at the beginning of the result string to ensure round trip
     * (that the result string, when reordered back to visual, will produce
     * the original source text).
     * @see #REORDER_INVERSE_LIKE_DIRECT
     * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
     */
    public static final byte LEVEL_DEFAULT_RTL = (byte)0x7f;

    /**
     * Maximum explicit embedding level.
     * (The maximum resolved level can be up to <code>MAX_EXPLICIT_LEVEL+1</code>).
     */
    public static final byte MAX_EXPLICIT_LEVEL = 125;

    /**
     * Bit flag for level input.
     * Overrides directional properties.
     */
    public static final byte LEVEL_OVERRIDE = (byte)0x80;

    /**
     * Special value which can be returned by the mapping methods when a
     * logical index has no corresponding visual index or vice-versa. This may
     * happen for the logical-to-visual mapping of a Bidi control when option
     * <code>OPTION_REMOVE_CONTROLS</code> is
     * specified. This can also happen for the visual-to-logical mapping of a
     * Bidi mark (LRM or RLM) inserted by option
     * <code>OPTION_INSERT_MARKS</code>.
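     *
     * <p>A caller that walks an index map has to skip such entries. The
     * following is a minimal sketch of that check, assuming a hand-written
     * map rather than one returned by this class; MapNowhereSketch and
     * countMapped() are hypothetical names, and the constant only mirrors
     * the value declared here.
     *

```java
public class MapNowhereSketch {
    // Same value as MAP_NOWHERE in this class.
    static final int MAP_NOWHERE = -1;

    // Count the logical positions that have a visual counterpart.
    static int countMapped(int[] logicalToVisual) {
        int mapped = 0;
        for (int visual : logicalToVisual) {
            if (visual != MAP_NOWHERE) {
                ++mapped;
            }
        }
        return mapped;
    }

    public static void main(String[] args) {
        // Hand-written map: logical index 1 stands for a removed Bidi control.
        int[] map = {0, MAP_NOWHERE, 2, 1};
        System.out.println(countMapped(map));
    }
}
```
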
     * @see #getVisualIndex
     * @see #getVisualMap
     * @see #getLogicalIndex
     * @see #getLogicalMap
     * @see #OPTION_INSERT_MARKS
     * @see #OPTION_REMOVE_CONTROLS
     */
    public static final int MAP_NOWHERE = -1;

    /**
     * Left-to-right text.
     * <ul>
     * <li>As return value for <code>getDirection()</code>, it means
     *     that the source string contains no right-to-left characters, or
     *     that the source string is empty and the paragraph level is even.
     * <li>As return value for <code>getBaseDirection()</code>, it
     *     means that the first strong character of the source string has
     *     a left-to-right direction.
     * </ul>
     */
    public static final byte LTR = 0;

    /**
     * Right-to-left text.
     * <ul>
     * <li>As return value for <code>getDirection()</code>, it means
     *     that the source string contains no left-to-right characters, or
     *     that the source string is empty and the paragraph level is odd.
     * <li>As return value for <code>getBaseDirection()</code>, it
     *     means that the first strong character of the source string has
     *     a right-to-left direction.
     * </ul>
     */
    public static final byte RTL = 1;

    /**
     * Mixed-directional text.
     * <p>As return value for <code>getDirection()</code>, it means
     *    that the source string contains both left-to-right and
     *    right-to-left characters.
     */
    public static final byte MIXED = 2;

    /**
     * No strongly directional text.
     * <p>As return value for <code>getBaseDirection()</code>, it means
     *    that the source string is missing or empty, or contains neither
     *    left-to-right nor right-to-left characters.
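     *
     * <p>The first-strong rule behind the <code>getBaseDirection()</code>
     * return values can be sketched as follows. This is an illustration
     * under stated assumptions, not this class's implementation: it uses the
     * JDK's Character.getDirectionality() for the character classes, the
     * names BaseDirectionSketch and baseDirection() are hypothetical, and
     * the constants only mirror the values declared in this class.
     *

```java
public class BaseDirectionSketch {
    // Same values as the LTR, RTL and NEUTRAL constants of this class.
    static final byte LTR = 0;
    static final byte RTL = 1;
    static final byte NEUTRAL = 3;

    // Return the direction of the first strong character, or NEUTRAL if there is none.
    static byte baseDirection(CharSequence text) {
        if (text == null) {
            return NEUTRAL;
        }
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            switch (Character.getDirectionality(cp)) {
                case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                    return LTR;
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:
                    return RTL;
                default:
                    break;   // weak or neutral character: keep scanning
            }
            i += Character.charCount(cp);
        }
        return NEUTRAL;
    }

    public static void main(String[] args) {
        System.out.println(baseDirection("123 abc"));
    }
}
```
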
     */
    public static final byte NEUTRAL = 3;

    /**
     * option bit for writeReordered():
     * keep combining characters after their base characters in RTL runs
     *
     * @see #writeReordered
     */
    public static final short KEEP_BASE_COMBINING = 1;

    /**
     * option bit for writeReordered():
     * replace characters with the "mirrored" property in RTL runs
     * by their mirror-image mappings
     *
     * @see #writeReordered
     */
    public static final short DO_MIRRORING = 2;

    /**
     * option bit for writeReordered():
     * surround the run with LRMs if necessary;
     * this is part of the approximate "inverse Bidi" algorithm
     *
     * <p>This option does not imply corresponding adjustment of the index
     * mappings.
     *
     * @see #setInverse
     * @see #writeReordered
     */
    public static final short INSERT_LRM_FOR_NUMERIC = 4;

    /**
     * option bit for writeReordered():
     * remove Bidi control characters
     * (this does not affect INSERT_LRM_FOR_NUMERIC)
     *
     * <p>This option does not imply corresponding adjustment of the index
     * mappings.
     *
     * @see #writeReordered
     * @see #INSERT_LRM_FOR_NUMERIC
     */
    public static final short REMOVE_BIDI_CONTROLS = 8;

    /**
     * option bit for writeReordered():
     * write the output in reverse order
     *
     * <p>This has the same effect as calling <code>writeReordered()</code>
     * first without this option, and then calling
     * <code>writeReverse()</code> without mirroring.
     * Doing this in the same step is faster and avoids a temporary buffer.
     * An example for using this option is output to a character terminal that
     * is designed for RTL scripts and stores text in reverse order.
     *
     * @see #writeReordered
     */
    public static final short OUTPUT_REVERSE = 16;

    /** Reordering mode: Regular Logical to Visual Bidi algorithm according to Unicode.
     * @see #setReorderingMode
     */
    public static final short REORDER_DEFAULT = 0;

    /** Reordering mode: Logical to Visual algorithm which handles numbers in
     * a way which mimics the behavior of Windows XP.
     * @see #setReorderingMode
     */
    public static final short REORDER_NUMBERS_SPECIAL = 1;

    /** Reordering mode: Logical to Visual algorithm grouping numbers with
     * adjacent R characters (reversible algorithm).
     * @see #setReorderingMode
     */
    public static final short REORDER_GROUP_NUMBERS_WITH_R = 2;

    /** Reordering mode: Reorder runs only to transform a Logical LTR string
     * to the logical RTL string with the same display, or vice-versa.<br>
     * If this mode is set together with option
     * <code>OPTION_INSERT_MARKS</code>, some Bidi controls in the source
     * text may be removed and other controls may be added to produce the
     * minimum combination which has the required display.
     * @see #OPTION_INSERT_MARKS
     * @see #setReorderingMode
     */
    public static final short REORDER_RUNS_ONLY = 3;

    /** Reordering mode: Visual to Logical algorithm which handles numbers
     * like L (same algorithm as selected by <code>setInverse(true)</code>).
     * @see #setInverse
     * @see #setReorderingMode
     */
    public static final short REORDER_INVERSE_NUMBERS_AS_L = 4;

    /** Reordering mode: Visual to Logical algorithm equivalent to the regular
     * Logical to Visual algorithm.
     * @see #setReorderingMode
     */
    public static final short REORDER_INVERSE_LIKE_DIRECT = 5;

    /** Reordering mode: Inverse Bidi (Visual to Logical) algorithm for the
     * <code>REORDER_NUMBERS_SPECIAL</code> Bidi algorithm.
     * @see #setReorderingMode
     */
    public static final short REORDER_INVERSE_FOR_NUMBERS_SPECIAL = 6;

    /* Number of values for reordering mode.
     */
    static final short REORDER_COUNT = 7;

    /* Reordering mode values must be ordered so that all the regular logical to
     * visual modes come first, and all inverse Bidi modes come last.
     */
    static final short REORDER_LAST_LOGICAL_TO_VISUAL =
            REORDER_NUMBERS_SPECIAL;

    /**
     * Option value for <code>setReorderingOptions</code>:
     * disable all the options which can be set with this method
     * @see #setReorderingOptions
     */
    public static final int OPTION_DEFAULT = 0;

    /**
     * Option bit for <code>setReorderingOptions</code>:
     * insert Bidi marks (LRM or RLM) when needed to ensure correct result of
     * a reordering to a Logical order
     *
     * <p>This option must be set or reset before calling
     * <code>setPara</code>.
     *
     * <p>This option is significant only with reordering modes which generate
     * a result with Logical order, specifically:
     * <ul>
     *   <li><code>REORDER_RUNS_ONLY</code></li>
     *   <li><code>REORDER_INVERSE_NUMBERS_AS_L</code></li>
     *   <li><code>REORDER_INVERSE_LIKE_DIRECT</code></li>
     *   <li><code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code></li>
     * </ul>
     *
     * <p>If this option is set in conjunction with reordering mode
     * <code>REORDER_INVERSE_NUMBERS_AS_L</code> or with calling
     * <code>setInverse(true)</code>, it implies option
     * <code>INSERT_LRM_FOR_NUMERIC</code> in calls to method
     * <code>writeReordered()</code>.
     *
     * <p>For other reordering modes, a minimum number of LRM or RLM characters
     * will be added to the source text after reordering it so as to ensure
     * round trip, i.e.
     * when applying the inverse reordering mode on the
     * resulting logical text with removal of Bidi marks
     * (option <code>OPTION_REMOVE_CONTROLS</code> set before calling
     * <code>setPara()</code> or option
     * <code>REMOVE_BIDI_CONTROLS</code> in
     * <code>writeReordered</code>), the result will be identical to the
     * source text in the first transformation.
     *
     * <p>This option will be ignored if specified together with option
     * <code>OPTION_REMOVE_CONTROLS</code>. It inhibits option
     * <code>REMOVE_BIDI_CONTROLS</code> in calls to method
     * <code>writeReordered()</code> and it implies option
     * <code>INSERT_LRM_FOR_NUMERIC</code> in calls to method
     * <code>writeReordered()</code> if the reordering mode is
     * <code>REORDER_INVERSE_NUMBERS_AS_L</code>.
     *
     * @see #setReorderingMode
     * @see #setReorderingOptions
     * @see #INSERT_LRM_FOR_NUMERIC
     * @see #REMOVE_BIDI_CONTROLS
     * @see #OPTION_REMOVE_CONTROLS
     * @see #REORDER_RUNS_ONLY
     * @see #REORDER_INVERSE_NUMBERS_AS_L
     * @see #REORDER_INVERSE_LIKE_DIRECT
     * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
     */
    public static final int OPTION_INSERT_MARKS = 1;

    /**
     * Option bit for <code>setReorderingOptions</code>:
     * remove Bidi control characters
     *
     * <p>This option must be set or reset before calling
     * <code>setPara</code>.
     *
     * <p>This option nullifies option
     * <code>OPTION_INSERT_MARKS</code>. It inhibits option
     * <code>INSERT_LRM_FOR_NUMERIC</code> in calls to method
     * <code>writeReordered()</code> and it implies option
     * <code>REMOVE_BIDI_CONTROLS</code> in calls to that method.
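     *
     * <p>The precedence between the two options can be sketched as plain bit
     * arithmetic. This is a minimal illustration of the rule stated above,
     * not the body of <code>setReorderingOptions</code>;
     * ReorderingOptionsSketch and effectiveOptions() are hypothetical names,
     * and the constants only mirror the values declared in this class.
     *

```java
public class ReorderingOptionsSketch {
    // Same values as the OPTION_* constants of this class.
    static final int OPTION_DEFAULT = 0;
    static final int OPTION_INSERT_MARKS = 1;
    static final int OPTION_REMOVE_CONTROLS = 2;
    static final int OPTION_STREAMING = 4;

    // Drop OPTION_INSERT_MARKS whenever OPTION_REMOVE_CONTROLS is requested.
    static int effectiveOptions(int requested) {
        if ((requested & OPTION_REMOVE_CONTROLS) != 0) {
            return requested & ~OPTION_INSERT_MARKS;
        }
        return requested;
    }

    public static void main(String[] args) {
        int requested = OPTION_INSERT_MARKS | OPTION_REMOVE_CONTROLS;
        System.out.println(effectiveOptions(requested));
    }
}
```
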
     *
     * @see #setReorderingMode
     * @see #setReorderingOptions
     * @see #OPTION_INSERT_MARKS
     * @see #INSERT_LRM_FOR_NUMERIC
     * @see #REMOVE_BIDI_CONTROLS
     */
    public static final int OPTION_REMOVE_CONTROLS = 2;

    /**
     * Option bit for <code>setReorderingOptions</code>:
     * process the output as part of a stream to be continued
     *
     * <p>This option must be set or reset before calling
     * <code>setPara</code>.
     *
     * <p>This option specifies that the caller is interested in processing
     * a large text object in parts. The results of the successive calls are
     * expected to be concatenated by the caller. Only the call for the last
     * part will have this option bit off.
     *
     * <p>When this option bit is on, <code>setPara()</code> may process
     * less than the full source text in order to truncate the text at a
     * meaningful boundary. The caller should call
     * <code>getProcessedLength()</code> immediately after calling
     * <code>setPara()</code> in order to determine how much of the source
     * text has been processed. Source text beyond that length should be
     * resubmitted in following calls to <code>setPara</code>.
     * The processed length may be less than the length of the source text if a
     * character preceding the last character of the source text constitutes a
     * reasonable boundary (like a block separator) for text to be continued.<br>
     * If the last character of the source text constitutes a reasonable
     * boundary, the whole text will be processed at once.<br>
     * If no such reasonable boundary exists anywhere in the source text,
     * the processed length will be zero.<br>
     * The caller should check for such an occurrence and do one of the following:
     * <ul><li>submit a larger amount of text with a better chance to include
     *         a reasonable boundary.</li>
     *     <li>resubmit the same text after turning off option
     *         <code>OPTION_STREAMING</code>.</li></ul>
     * In all cases, this option should be turned off before processing the last
     * part of the text.
     *
     * <p>When the <code>OPTION_STREAMING</code> option is used, it is
     * recommended to call <code>orderParagraphsLTR(true)</code> before calling
     * <code>setPara()</code> so that later paragraphs may be concatenated to
     * previous paragraphs on the right.
     *
     * @see #setReorderingMode
     * @see #setReorderingOptions
     * @see #getProcessedLength
     */
    public static final int OPTION_STREAMING = 4;

    /*
     *   Comparing the description of the Bidi algorithm with this implementation
     *   is easier with the same names for the Bidi types in the code as there.
     *   See UCharacterDirection
     */
    static final byte L   = UCharacterDirection.LEFT_TO_RIGHT;              /*  0 */
    static final byte R   = UCharacterDirection.RIGHT_TO_LEFT;              /*  1 */
    static final byte EN  = UCharacterDirection.EUROPEAN_NUMBER;            /*  2 */
    static final byte ES  = UCharacterDirection.EUROPEAN_NUMBER_SEPARATOR;  /*  3 */
    static final byte ET  = UCharacterDirection.EUROPEAN_NUMBER_TERMINATOR; /*  4 */
    static final byte AN  = UCharacterDirection.ARABIC_NUMBER;              /*  5 */
    static final byte CS  = UCharacterDirection.COMMON_NUMBER_SEPARATOR;    /*  6 */
    static final byte B   = UCharacterDirection.BLOCK_SEPARATOR;            /*  7 */
    static final byte S   = UCharacterDirection.SEGMENT_SEPARATOR;          /*  8 */
    static final byte WS  = UCharacterDirection.WHITE_SPACE_NEUTRAL;        /*  9 */
    static final byte ON  = UCharacterDirection.OTHER_NEUTRAL;              /* 10 */
    static final byte LRE = UCharacterDirection.LEFT_TO_RIGHT_EMBEDDING;    /* 11 */
    static final byte LRO = UCharacterDirection.LEFT_TO_RIGHT_OVERRIDE;     /* 12 */
    static final byte AL  = UCharacterDirection.RIGHT_TO_LEFT_ARABIC;       /* 13 */
    static final byte RLE = UCharacterDirection.RIGHT_TO_LEFT_EMBEDDING;    /* 14 */
    static final byte RLO = UCharacterDirection.RIGHT_TO_LEFT_OVERRIDE;     /* 15 */
    static final byte PDF = UCharacterDirection.POP_DIRECTIONAL_FORMAT;     /* 16 */
    static final byte NSM = UCharacterDirection.DIR_NON_SPACING_MARK;       /* 17 */
    static final byte BN  = UCharacterDirection.BOUNDARY_NEUTRAL;           /* 18 */
    static final byte FSI = UCharacterDirection.FIRST_STRONG_ISOLATE;       /* 19 */
    static final byte LRI = UCharacterDirection.LEFT_TO_RIGHT_ISOLATE;      /* 20 */
    static final byte RLI = UCharacterDirection.RIGHT_TO_LEFT_ISOLATE;      /* 21 */
    static final byte PDI = UCharacterDirection.POP_DIRECTIONAL_ISOLATE;    /* 22 */
    static final byte ENL = PDI + 1;    /* EN after W7 */                   /* 23 */
    static final byte ENR = ENL + 1;    /* EN not subject to W7 */          /* 24 */

    /**
     * Value returned by
     * <code>BidiClassifier</code> when there is no need to
     * override the standard Bidi class for a given code point.
     *
     * <p>This constant is deprecated; use UCharacter.getIntPropertyMaxValue(UProperty.BIDI_CLASS)+1 instead.
     *
     * @see BidiClassifier
     * @deprecated ICU 58 The numeric value may change over time, see ICU ticket #12420.
     */
    @Deprecated
    public static final int CLASS_DEFAULT = UCharacterDirection.CHAR_DIRECTION_COUNT;

    /* number of paras entries allocated initially */
    static final int SIMPLE_PARAS_COUNT = 10;
    /* number of isolate run entries for paired brackets allocated initially */
    static final int SIMPLE_OPENINGS_COUNT = 20;

    private static final char CR = '\r';
    private static final char LF = '\n';

    static final int LRM_BEFORE = 1;
    static final int LRM_AFTER = 2;
    static final int RLM_BEFORE = 4;
    static final int RLM_AFTER = 8;

    /* flags for Opening.flags */
    static final byte FOUND_L = (byte)DirPropFlag(L);
    static final byte FOUND_R = (byte)DirPropFlag(R);

    /*
     * The following bit is used for the directional isolate status.
     * Stack entries corresponding to isolate sequences are greater than ISOLATE.
     */
    static final int ISOLATE = 0x0100;


    /*
     * reference to parent paragraph object (reference to self if this object is
     * a paragraph object); set to null in a newly opened object; set to a
     * real value after a successful execution of setPara or setLine
     */
    Bidi paraBidi;

    final UBiDiProps bdp;

    /* character array representing the current text */
    char[] text;

    /* length of the current text */
    int originalLength;

    /* if the option OPTION_STREAMING is set, this is the length of
     * text actually processed by <code>setPara</code>, which may be shorter
     * than the original length. Otherwise, it is identical to the original
     * length.
     */
    int length;

    /* if option OPTION_REMOVE_CONTROLS is set, and/or Bidi
     * marks are allowed to be inserted in one of the reordering modes, the
     * length of the result string may be different from the processed length.
     */
    int resultLength;

    /* indicators for whether memory may be allocated after construction */
    boolean mayAllocateText;
    boolean mayAllocateRuns;

    /* arrays with one value per text-character */
    byte[] dirPropsMemory = new byte[1];
    byte[] levelsMemory = new byte[1];
    byte[] dirProps;
    byte[] levels;

    /* are we performing an approximation of the "inverse Bidi" algorithm? */
    boolean isInverse;

    /* are we using the basic algorithm or its variation? */
    int reorderingMode;

    /* bitmask for reordering options */
    int reorderingOptions;

    /* must block separators receive level 0? */
    boolean orderParagraphsLTR;

    /* the paragraph level */
    byte paraLevel;
    /* original paraLevel when contextual */
    /* must be one of DEFAULT_xxx or 0 if not contextual */
    byte defaultParaLevel;

    /* context data */
    String prologue;
    String epilogue;

    /* the following is set in setPara, used in processPropertySeq */

    ImpTabPair impTabPair; /* reference to levels state table pair */
    /* the overall paragraph or line directionality */
    byte direction;

    /* flags is a bit set for which directional properties are in the text */
    int flags;

    /* lastArabicPos is index to the last AL in the text, -1 if none */
    int lastArabicPos;

    /* characters after trailingWSStart are WS and are */
    /* implicitly at the paraLevel (rule (L1)) - levels may not reflect that */
    int trailingWSStart;

    /* fields for paragraph handling, set in getDirProps() */
    int paraCount;
    int[] paras_limit = new int[SIMPLE_PARAS_COUNT];
    byte[] paras_level =
            new byte[SIMPLE_PARAS_COUNT];

    /* fields for line reordering */
    int runCount; /* ==-1: runs not set up yet */
    BidiRun[] runsMemory = new BidiRun[0];
    BidiRun[] runs;

    /* for non-mixed text, we only need a tiny array of runs (no allocation) */
    BidiRun[] simpleRuns = {new BidiRun()};

    /* fields for managing isolate sequences */
    Isolate[] isolates;
    /* maximum or current nesting depth of isolate sequences */
    /* Within resolveExplicitLevels() and checkExplicitLevels(), this is the maximal
       nesting encountered.
       Within resolveImplicitLevels(), this is the index of the current isolates
       stack entry. */
    int isolateCount;

    /* mapping of runs in logical order to visual order */
    int[] logicalToVisualRunsMap;
    /* flag to indicate that the map has been updated */
    boolean isGoodLogicalToVisualRunsMap;

    /* customized class provider */
    BidiClassifier customClassifier = null;

    /* for inverse Bidi with insertion of directional marks */
    InsertPoints insertPoints = new InsertPoints();

    /* for option OPTION_REMOVE_CONTROLS */
    int controlCount;

    /*
     * Sometimes, bit values are more appropriate
     * to deal with directionality properties.
     * Abbreviations in these method names refer to names
     * used in the Bidi algorithm.
     */
    static int DirPropFlag(byte dir) {
        return (1 << dir);
    }

    boolean testDirPropFlagAt(int flag, int index) {
        return ((DirPropFlag(dirProps[index]) & flag) != 0);
    }

    static final int DirPropFlagMultiRuns = DirPropFlag((byte)31);

    /* to avoid some conditional statements, use tiny constant arrays */
    static final int DirPropFlagLR[] = { DirPropFlag(L), DirPropFlag(R) };
    static final int DirPropFlagE[] = { DirPropFlag(LRE), DirPropFlag(RLE) };
    static final int DirPropFlagO[] = { DirPropFlag(LRO), DirPropFlag(RLO) };

    static final int DirPropFlagLR(byte level) { return DirPropFlagLR[level & 1]; }
    static final int DirPropFlagE(byte level) { return DirPropFlagE[level & 1]; }
    static final int DirPropFlagO(byte level) { return DirPropFlagO[level & 1]; }
    static final byte DirFromStrong(byte strong) { return strong == L ? L : R; }
    static final byte NoOverride(byte level) { return (byte)(level & ~LEVEL_OVERRIDE); }

    /* are there any characters that are LTR or RTL?
     */
    static final int MASK_LTR =
        DirPropFlag(L)|DirPropFlag(EN)|DirPropFlag(ENL)|DirPropFlag(ENR)|DirPropFlag(AN)|DirPropFlag(LRE)|DirPropFlag(LRO)|DirPropFlag(LRI);
    static final int MASK_RTL = DirPropFlag(R)|DirPropFlag(AL)|DirPropFlag(RLE)|DirPropFlag(RLO)|DirPropFlag(RLI);

    static final int MASK_R_AL = DirPropFlag(R)|DirPropFlag(AL);
    static final int MASK_STRONG_EN_AN = DirPropFlag(L)|DirPropFlag(R)|DirPropFlag(AL)|DirPropFlag(EN)|DirPropFlag(AN);
    /* explicit embedding codes */
    static final int MASK_EXPLICIT = DirPropFlag(LRE)|DirPropFlag(LRO)|DirPropFlag(RLE)|DirPropFlag(RLO)|DirPropFlag(PDF);
    static final int MASK_BN_EXPLICIT = DirPropFlag(BN)|MASK_EXPLICIT;

    /* explicit isolate codes */
    static final int MASK_ISO = DirPropFlag(LRI)|DirPropFlag(RLI)|DirPropFlag(FSI)|DirPropFlag(PDI);

    /* paragraph and segment separators */
    static final int MASK_B_S = DirPropFlag(B)|DirPropFlag(S);

    /* all types that are counted as White Space or Neutral in some steps */
    static final int MASK_WS = MASK_B_S|DirPropFlag(WS)|MASK_BN_EXPLICIT|MASK_ISO;

    /* types that are neutrals or could become neutrals in (Wn) */
    static final int MASK_POSSIBLE_N = DirPropFlag(ON)|DirPropFlag(CS)|DirPropFlag(ES)|DirPropFlag(ET)|MASK_WS;

    /*
     * These types may be changed to "e",
     * the embedding type (L or R) of the run,
     * in the Bidi algorithm (N2)
     */
    static final int MASK_EMBEDDING = DirPropFlag(NSM)|MASK_POSSIBLE_N;

    /*
     *  the dirProp's L and R are defined to 0 and 1 values in UCharacterDirection.java
     */
    static byte GetLRFromLevel(byte level)
    {
        return (byte)(level & 1);
    }

    static boolean IsDefaultLevel(byte level)
    {
        return ((level & LEVEL_DEFAULT_LTR) == LEVEL_DEFAULT_LTR);
    }

    static boolean IsBidiControlChar(int c)
    {
        /* check for range 0x200c to 0x200f (ZWNJ, ZWJ, LRM, RLM) or
                           0x202a to 0x202e (LRE, RLE, PDF, LRO, RLO) or
                           0x2066 to 0x2069 (LRI, RLI, FSI, PDI) */
        return (((c & 0xfffffffc) == 0x200c) || ((c >= 0x202a) && (c <= 0x202e))
                || ((c >= 0x2066) && (c <= 0x2069)));
    }

    void verifyValidPara()
    {
        if (!(this == this.paraBidi)) {
            throw new IllegalStateException();
        }
    }

    void verifyValidParaOrLine()
    {
        Bidi para = this.paraBidi;
        /* verify Para */
        if (this == para) {
            return;
        }
        /* verify Line */
        if ((para == null) || (para != para.paraBidi)) {
            throw new IllegalStateException();
        }
    }

    void verifyRange(int index, int start, int limit)
    {
        if (index < start || index >= limit) {
            throw new IllegalArgumentException("Value " + index +
                      " is out of range " + start + " to " + limit);
        }
    }

    /**
     * Allocate a <code>Bidi</code> object.
     * Such an object is initially empty. It is assigned
     * the Bidi properties of a piece of text containing one or more paragraphs
     * by <code>setPara()</code>
     * or the Bidi properties of a line within a paragraph by
     * <code>setLine()</code>.<p>
     * This object can be reused.<p>
     * <code>setPara()</code> and <code>setLine()</code> will allocate
     * additional memory for internal structures as necessary.
     */
    public Bidi()
    {
        this(0, 0);
    }

    /**
     * Allocate a <code>Bidi</code> object with preallocated memory
     * for internal structures.
     * This method provides a <code>Bidi</code> object like the default constructor
     * but it also preallocates memory for internal structures
     * according to the sizings supplied by the caller.<p>
     * The preallocation can be limited to some of the internal memory
     * by setting some values to 0 here.
     * That means that if, e.g.,
     * <code>maxRunCount</code> cannot be reasonably predetermined and should not
     * be set to <code>maxLength</code> (the only failproof value) to avoid
     * wasting memory, then <code>maxRunCount</code> could be set to 0 here
     * and the internal structures that are associated with it will be allocated
     * on demand, just like with the default constructor.
     *
     * @param maxLength is the maximum text or line length that internal memory
     *        will be preallocated for. An attempt to associate this object with a
     *        longer text will fail, unless this value is 0, which leaves the allocation
     *        up to the implementation.
     *
     * @param maxRunCount is the maximum anticipated number of same-level runs
     *        that internal memory will be preallocated for. An attempt to access
     *        visual runs on an object that was not preallocated for as many runs
     *        as the text was actually resolved to will fail,
     *        unless this value is 0, which leaves the allocation up to the implementation.<br><br>
     *        The number of runs depends on the actual text and may be anywhere between
     *        1 and <code>maxLength</code>. It is typically small.
     *
     * @throws IllegalArgumentException if maxLength or maxRunCount is less than 0
     */
    public Bidi(int maxLength, int maxRunCount)
    {
        /* check the argument values */
        if (maxLength < 0 || maxRunCount < 0) {
            throw new IllegalArgumentException();
        }

        /* reset the object, all reference variables null, all flags false,
           all sizes 0.
           In fact, we don't need to do anything, since class members are
           initialized as zero when an instance is created.
         */
        /*
        mayAllocateText = false;
        mayAllocateRuns = false;
        orderParagraphsLTR = false;
        paraCount = 0;
        runCount = 0;
        trailingWSStart = 0;
        flags = 0;
        paraLevel = 0;
        defaultParaLevel = 0;
        direction = 0;
        */
        /* get Bidi properties */
        bdp = UBiDiProps.INSTANCE;

        /* allocate memory for arrays as requested */
        if (maxLength > 0) {
            getInitialDirPropsMemory(maxLength);
            getInitialLevelsMemory(maxLength);
        } else {
            mayAllocateText = true;
        }

        if (maxRunCount > 0) {
            // if maxRunCount == 1, use simpleRuns[]
            if (maxRunCount > 1) {
                getInitialRunsMemory(maxRunCount);
            }
        } else {
            mayAllocateRuns = true;
        }
    }

    /*
     * We are allowed to allocate memory if object==null or
     * mayAllocate==true for each array that we need.
     *
     * Assume sizeNeeded>0.
     * If object != null, then assume size > 0.
     */
    private Object getMemory(String label, Object array, Class<?> arrayClass,
            boolean mayAllocate, int sizeNeeded)
    {
        int len = Array.getLength(array);

        /* the array already has exactly the needed size: reuse it as is */
        if (sizeNeeded == len) {
            return array;
        }
        if (!mayAllocate) {
            /* we must not allocate: reuse the array if it is large enough */
            if (sizeNeeded <= len) {
                return array;
            }
            throw new OutOfMemoryError("Failed to allocate memory for "
                                       + label);
        }
        /* we may try to grow or shrink */
        /* FOOD FOR THOUGHT: when shrinking it should be possible to avoid
           the allocation altogether and rely on this.length */
        try {
            return Array.newInstance(arrayClass, sizeNeeded);
        } catch (Exception e) {
            throw new OutOfMemoryError("Failed to allocate memory for "
                                       + label);
        }
    }

    /* helper methods for each allocated array */
    private void getDirPropsMemory(boolean mayAllocate, int len)
    {
        Object array = getMemory("DirProps",
                dirPropsMemory, Byte.TYPE, mayAllocate, len);
        dirPropsMemory = (byte[]) array;
    }

    void getDirPropsMemory(int len)
    {
        getDirPropsMemory(mayAllocateText, len);
    }

    private void getLevelsMemory(boolean mayAllocate, int len)
    {
        Object array = getMemory("Levels", levelsMemory, Byte.TYPE, mayAllocate, len);
        levelsMemory = (byte[]) array;
    }

    void getLevelsMemory(int len)
    {
        getLevelsMemory(mayAllocateText, len);
    }

    private void getRunsMemory(boolean mayAllocate, int len)
    {
        Object array = getMemory("Runs", runsMemory, BidiRun.class, mayAllocate, len);
        runsMemory = (BidiRun[]) array;
    }

    void getRunsMemory(int len)
    {
        getRunsMemory(mayAllocateRuns, len);
    }

    /* additional methods used by constructor - always allow allocation */
    private void getInitialDirPropsMemory(int len)
    {
        getDirPropsMemory(true, len);
    }

    private void getInitialLevelsMemory(int len)
    {
        getLevelsMemory(true, len);
    }

    private void getInitialRunsMemory(int len)
    {
        getRunsMemory(true, len);
    }

    /**
     * Modify the operation of the Bidi algorithm such that it
     * approximates an "inverse Bidi" algorithm. This method
     * must be called before <code>setPara()</code>.
     *
     * <p>The normal operation of the Bidi algorithm as described
     * in Unicode Standard Annex #9 is to take text stored in logical
     * (keyboard, typing) order and to determine the reordering of it for visual
     * rendering.
     * Some legacy systems store text in visual order, and for operations
     * with standard, Unicode-based algorithms, the text needs to be transformed
     * to logical order. This is effectively the inverse algorithm of the
     * described Bidi algorithm.
     * Note that there is no standard algorithm for
     * this "inverse Bidi" and that the current implementation provides only an
     * approximation of "inverse Bidi".
     *
     * <p>With <code>isInverse</code> set to <code>true</code>,
     * this method changes the behavior of some of the subsequent methods
     * in a way that they can be used for the inverse Bidi algorithm.
     * Specifically, runs of text with numeric characters will be treated in a
     * special way and may need to be surrounded with LRM characters when they are
     * written in reordered sequence.
     *
     * <p>Output runs should be retrieved using <code>getVisualRun()</code>.
     * Since the actual input for "inverse Bidi" is visually ordered text and
     * <code>getVisualRun()</code> gets the reordered runs, these are actually
     * the runs of the logically ordered output.
     *
     * <p>Calling this method with argument <code>isInverse</code> set to
     * <code>true</code> is equivalent to calling <code>setReorderingMode</code>
     * with argument <code>reorderingMode</code>
     * set to <code>REORDER_INVERSE_NUMBERS_AS_L</code>.<br>
     * Calling this method with argument <code>isInverse</code> set to
     * <code>false</code> is equivalent to calling <code>setReorderingMode</code>
     * with argument <code>reorderingMode</code>
     * set to <code>REORDER_DEFAULT</code>.
     *
     * @param isInverse specifies "forward" or "inverse" Bidi operation.
     *
     * @see #setPara
     * @see #writeReordered
     * @see #setReorderingMode
     * @see #REORDER_INVERSE_NUMBERS_AS_L
     * @see #REORDER_DEFAULT
     */
    public void setInverse(boolean isInverse) {
        this.isInverse = (isInverse);
        this.reorderingMode = isInverse ? REORDER_INVERSE_NUMBERS_AS_L
                                        : REORDER_DEFAULT;
    }

    /**
     * Is this <code>Bidi</code> object set to perform the inverse Bidi
     * algorithm?
     * <p>Note: calling this method after setting the reordering mode with
     * <code>setReorderingMode</code> will return <code>true</code> if the
     * reordering mode was set to
     * <code>REORDER_INVERSE_NUMBERS_AS_L</code>, <code>false</code>
     * for all other values.
     *
     * @return <code>true</code> if the <code>Bidi</code> object is set to
     * perform the inverse Bidi algorithm by handling numbers as L.
     *
     * @see #setInverse
     * @see #setReorderingMode
     * @see #REORDER_INVERSE_NUMBERS_AS_L
     */
    public boolean isInverse() {
        return isInverse;
    }

    /**
     * Modify the operation of the Bidi algorithm such that it implements some
     * variant to the basic Bidi algorithm or approximates an "inverse Bidi"
     * algorithm, depending on different values of the "reordering mode".
     * This method must be called before <code>setPara()</code>, and stays in
     * effect until called again with a different argument.
     *
     * <p>The normal operation of the Bidi algorithm as described in the Unicode
     * Standard Annex #9 is to take text stored in logical (keyboard, typing)
     * order and to determine how to reorder it for visual rendering.
     *
     * <p>With the reordering mode set to a value other than
     * <code>REORDER_DEFAULT</code>, this method changes the behavior of some of
     * the subsequent methods in a way such that they implement an inverse Bidi
     * algorithm or some other algorithm variants.
     *
     * <p>Some legacy systems store text in visual order, and for operations
     * with standard, Unicode-based algorithms, the text needs to be transformed
     * into logical order. This is effectively the inverse algorithm of the
     * described Bidi algorithm. Note that there is no standard algorithm for
     * this "inverse Bidi", so a number of variants are implemented here.
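     *
     * <p>A minimal usage sketch for such a visual to logical conversion
     * (illustrative only: <code>visualText</code> is a hypothetical
     * <code>String</code> holding visually ordered text, and the choice of
     * mode and of the <code>DO_MIRRORING</code> write option is an
     * assumption, not a recommendation):
     * <pre>
     * Bidi bidi = new Bidi();
     * // the mode must be selected before setPara() is called
     * bidi.setReorderingMode(Bidi.REORDER_INVERSE_LIKE_DIRECT);
     * bidi.setPara(visualText, Bidi.LTR, null);
     * // in an "inverse Bidi" mode the reordered output is the logical text
     * String logicalText = bidi.writeReordered(Bidi.DO_MIRRORING);
     * </pre>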
     *
     * <p>In other cases, it may be desirable to emulate some variant of the
     * Logical to Visual algorithm (e.g. one used in MS Windows), or perform a
     * Logical to Logical transformation.
     *
     * <ul>
     * <li>When the reordering mode is set to
     * <code>REORDER_DEFAULT</code>,
     * the standard Bidi Logical to Visual algorithm is applied.</li>
     *
     * <li>When the reordering mode is set to
     * <code>REORDER_NUMBERS_SPECIAL</code>,
     * the algorithm used to perform Bidi transformations when calling
     * <code>setPara</code> should approximate the algorithm used in Microsoft
     * Windows XP rather than strictly conform to the Unicode Bidi algorithm.
     * <br>
     * The differences between the basic algorithm and the algorithm addressed
     * by this option are as follows:
     * <ul>
     *   <li>Within text at an even embedding level, the sequence "123AB"
     *   (where AB represent R or AL letters) is transformed to "123BA" by the
     *   Unicode algorithm and to "BA123" by the Windows algorithm.</li>
     *
     *   <li>Arabic-Indic numbers (AN) are handled by the Windows algorithm just
     *   like regular numbers (EN).</li>
     * </ul></li>
     *
     * <li>When the reordering mode is set to
     * <code>REORDER_GROUP_NUMBERS_WITH_R</code>,
     * numbers located between LTR text and RTL text are associated with the RTL
     * text. For instance, an LTR paragraph with content "abc 123 DEF" (where
     * upper case letters represent RTL characters) will be transformed to
     * "abc FED 123" (and not "abc 123 FED"), "DEF 123 abc" will be transformed
     * to "123 FED abc" and "123 FED abc" will be transformed to "DEF 123 abc".
     * This makes the algorithm reversible and makes it useful when round trip
     * (from visual to logical and back to visual) must be achieved without
     * adding LRM characters.
     * However, this is a variation from the standard
     * Unicode Bidi algorithm.<br>
     * The source text should not contain Bidi control characters other than LRM
     * or RLM.</li>
     *
     * <li>When the reordering mode is set to
     * <code>REORDER_RUNS_ONLY</code>,
     * a "Logical to Logical" transformation must be performed:
     * <ul>
     * <li>If the default text level of the source text (argument
     * <code>paraLevel</code> in <code>setPara</code>) is even, the source text
     * will be handled as LTR logical text and will be transformed to the RTL
     * logical text which has the same LTR visual display.</li>
     * <li>If the default level of the source text is odd, the source text
     * will be handled as RTL logical text and will be transformed to the
     * LTR logical text which has the same LTR visual display.</li>
     * </ul>
     * This mode may be needed when logical text which is basically Arabic or
     * Hebrew, with possible included numbers or phrases in English, has to be
     * displayed as if it had an even embedding level (this can happen if the
     * displaying application treats all text as if it was basically LTR).
     * <br>
     * This mode may also be needed in the reverse case, when logical text which
     * is basically English, with possible included phrases in Arabic or Hebrew,
     * has to be displayed as if it had an odd embedding level.
     * <br>
     * Both cases could be handled by adding LRE or RLE at the head of the
     * text, if the display subsystem supports these formatting controls. If it
     * does not, the problem may be handled by transforming the source text in
     * this mode before displaying it, so that it will be displayed properly.
     * <br>
     * The source text should not contain Bidi control characters other than LRM
     * or RLM.</li>
     *
     * <li>When the reordering mode is set to
     * <code>REORDER_INVERSE_NUMBERS_AS_L</code>, an "inverse Bidi"
     * algorithm is applied.
     * Runs of text with numeric characters will be treated like LTR letters and
     * may need to be surrounded with LRM characters when they are written in
     * reordered sequence (the option <code>INSERT_LRM_FOR_NUMERIC</code> can
     * be used with method <code>writeReordered</code> to this end). This mode
     * is equivalent to calling <code>setInverse()</code> with
     * argument <code>isInverse</code> set to <code>true</code>.</li>
     *
     * <li>When the reordering mode is set to
     * <code>REORDER_INVERSE_LIKE_DIRECT</code>, the "direct" Logical to
     * Visual Bidi algorithm is used as an approximation of an "inverse Bidi"
     * algorithm. This mode is similar to mode
     * <code>REORDER_INVERSE_NUMBERS_AS_L</code> but is closer to the
     * regular Bidi algorithm.
     * <br>
     * For example, an LTR paragraph with the content "FED 123 456 CBA" (where
     * upper case represents RTL characters) will be transformed to
     * "ABC 456 123 DEF", as opposed to "DEF 123 456 ABC"
     * with mode <code>REORDER_INVERSE_NUMBERS_AS_L</code>.<br>
     * When used in conjunction with option
     * <code>OPTION_INSERT_MARKS</code>, this mode generally
     * adds Bidi marks to the output significantly more sparingly than mode
     * <code>REORDER_INVERSE_NUMBERS_AS_L</code> with option
     * <code>INSERT_LRM_FOR_NUMERIC</code> in calls to
     * <code>writeReordered</code>.</li>
     *
     * <li>When the reordering mode is set to
     * <code>REORDER_INVERSE_FOR_NUMBERS_SPECIAL</code>, the Logical to Visual
     * Bidi algorithm used in Windows XP is used as an approximation of an "inverse
     * Bidi" algorithm.
     * <br>
     * For example, an LTR paragraph with the content "abc FED123" (where
     * upper case represents RTL characters) will be transformed to
     * "abc 123DEF".</li>
     * </ul>
     *
     * <p>In all the reordering modes specifying an "inverse Bidi" algorithm
     * (i.e. those with a name starting with <code>REORDER_INVERSE</code>),
     * output runs should be retrieved using <code>getVisualRun()</code>, and
     * the output text with <code>writeReordered()</code>. The caller should
     * keep in mind that in "inverse Bidi" modes the input is actually visually
     * ordered text, so the reordered output returned by <code>getVisualRun()</code>
     * or <code>writeReordered()</code> consists of the runs or the character
     * string of the logically ordered output.<br>
     * For all the "inverse Bidi" modes, the source text should not contain
     * Bidi control characters other than LRM or RLM.
     *
     * <p>Note that option <code>OUTPUT_REVERSE</code> of
     * <code>writeReordered</code> has no useful meaning and should not be used
     * in conjunction with any value of the reordering mode specifying "inverse
     * Bidi" or with value <code>REORDER_RUNS_ONLY</code>.
     *
     * @param reorderingMode specifies the required variant of the Bidi
     *                       algorithm.
     *
     * @see #setInverse
     * @see #setPara
     * @see #writeReordered
     * @see #INSERT_LRM_FOR_NUMERIC
     * @see #OUTPUT_REVERSE
     * @see #REORDER_DEFAULT
     * @see #REORDER_NUMBERS_SPECIAL
     * @see #REORDER_GROUP_NUMBERS_WITH_R
     * @see #REORDER_RUNS_ONLY
     * @see #REORDER_INVERSE_NUMBERS_AS_L
     * @see #REORDER_INVERSE_LIKE_DIRECT
     * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL
     */
    public void setReorderingMode(int reorderingMode) {
        if ((reorderingMode < REORDER_DEFAULT) ||
            (reorderingMode >= REORDER_COUNT))
            return;                     /* don't accept a wrong value */
        this.reorderingMode = reorderingMode;
        this.isInverse =
            reorderingMode == REORDER_INVERSE_NUMBERS_AS_L;
    }

    /**
     * What is the requested reordering mode for a given Bidi object?
     *
     * @return the current reordering mode of the Bidi object
     *
     * @see #setReorderingMode
     */
    public int getReorderingMode() {
        return this.reorderingMode;
    }

    /**
     * Specify which of the reordering options should be applied during Bidi
     * transformations.
     *
     * @param options A combination of zero or more of the following
     * reordering options:
     * <code>OPTION_DEFAULT</code>, <code>OPTION_INSERT_MARKS</code>,
     * <code>OPTION_REMOVE_CONTROLS</code>, <code>OPTION_STREAMING</code>.
     *
     * @see #getReorderingOptions
     * @see #OPTION_DEFAULT
     * @see #OPTION_INSERT_MARKS
     * @see #OPTION_REMOVE_CONTROLS
     * @see #OPTION_STREAMING
     */
    public void setReorderingOptions(int options) {
        if ((options & OPTION_REMOVE_CONTROLS) != 0) {
            this.reorderingOptions = options & ~OPTION_INSERT_MARKS;
        } else {
            this.reorderingOptions = options;
        }
    }

    /**
     * What are the reordering options applied to a given Bidi object?
     *
     * @return the current reordering options of the Bidi object
     *
     * @see #setReorderingOptions
     */
    public int getReorderingOptions() {
        return this.reorderingOptions;
    }

    /**
     * Get the base direction of the text provided according to the Unicode
     * Bidirectional Algorithm. The base direction is derived from the first
     * character in the string with bidirectional character type L, R, or AL.
     * If the first such character has type L, LTR is returned. If the first
     * such character has type R or AL, RTL is returned. If the string does
     * not contain any character of these types, then NEUTRAL is returned.
     * This is a lightweight function for use when only the base direction is
     * needed and no further bidi processing of the text is needed.
     * @param paragraph the text whose paragraph level direction is needed.
     * @return LTR, RTL, NEUTRAL
     * @see #LTR
     * @see #RTL
     * @see #NEUTRAL
     */
    public static byte getBaseDirection(CharSequence paragraph) {
        if (paragraph == null || paragraph.length() == 0) {
            return NEUTRAL;
        }

        int length = paragraph.length();
        int c;// codepoint
        byte direction;

        for (int i = 0; i < length; ) {
            // U16_NEXT(paragraph, i, length, c) for C++
            c = UCharacter.codePointAt(paragraph, i);
            direction = UCharacter.getDirectionality(c);
            if (direction == UCharacterDirection.LEFT_TO_RIGHT) {
                return LTR;
            } else if (direction == UCharacterDirection.RIGHT_TO_LEFT
                || direction == UCharacterDirection.RIGHT_TO_LEFT_ARABIC) {
                return RTL;
            }

            i = UCharacter.offsetByCodePoints(paragraph, i, 1);// set i to the head index of next codepoint
        }
        return NEUTRAL;
    }

/* perform (P2)..(P3) ------------------------------------------------------- */

    /**
     * Returns the directionality of the first strong character
     * after the last B in prologue,
     * if any.
     * Requires prologue!=null.
     */
    private byte firstL_R_AL() {
        byte result = ON;
        for (int i = 0; i < prologue.length(); ) {
            int uchar = prologue.codePointAt(i);
            i += Character.charCount(uchar);
            byte dirProp = (byte)getCustomizedClass(uchar);
            if (result == ON) {
                if (dirProp == L || dirProp == R || dirProp == AL) {
                    result = dirProp;
                }
            } else {
                if (dirProp == B) {
                    result = ON;
                }
            }
        }
        return result;
    }

    /*
     * Check that there are enough entries in the arrays paras_limit and paras_level
     */
    private void checkParaCount() {
        int[] saveLimits;
        byte[] saveLevels;
        int count = paraCount;
        if (count <= paras_level.length)
            return;
        int oldLength = paras_level.length;
        saveLimits = paras_limit;
        saveLevels = paras_level;
        try {
            paras_limit = new int[count * 2];
            paras_level = new byte[count * 2];
        } catch (Exception e) {
            throw new OutOfMemoryError("Failed to allocate memory for paras");
        }
        System.arraycopy(saveLimits, 0, paras_limit, 0, oldLength);
        System.arraycopy(saveLevels, 0, paras_level, 0, oldLength);
    }

    /*
     * Get the directional properties for the text, calculate the flags bit-set, and
     * determine the paragraph level if necessary (in paras_level[i]).
     * FSI initiators are also resolved and their dirProp replaced with LRI or RLI.
     * When encountering an FSI, it is initially replaced with an LRI, which is the
     * default. Only if a strong R or AL is found within its scope will the LRI be
     * replaced by an RLI.
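     *
     * Illustration only (the sample strings are hypothetical; upper case stands
     * for RTL letters, as in the examples elsewhere in this file):
     *   FSI abc PDI      first strong character is L: the initiator stays LRI
     *   FSI 123 ABC PDI  digits are skipped, first strong is R: LRI becomes RLI
     *   FSI 123 PDI      no strong character in scope: LRI is kept as the default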
     */
    static final int NOT_SEEKING_STRONG = 0;        /* 0: not contextual paraLevel, not after FSI */
    static final int SEEKING_STRONG_FOR_PARA = 1;   /* 1: looking for first strong char in para */
    static final int SEEKING_STRONG_FOR_FSI = 2;    /* 2: looking for first strong after FSI */
    static final int LOOKING_FOR_PDI = 3;           /* 3: found strong after FSI, looking for PDI */

    private void getDirProps()
    {
        int i = 0, i0, i1;
        flags = 0;          /* collect all directionalities in the text */
        int uchar;
        byte dirProp;
        byte defaultParaLevel = 0;   /* initialize to avoid compiler warnings */
        boolean isDefaultLevel = IsDefaultLevel(paraLevel);
        /* for inverse Bidi, the default para level is set to RTL if there is a
           strong R or AL character at either end of the text */
        boolean isDefaultLevelInverse=isDefaultLevel &&
                (reorderingMode == REORDER_INVERSE_LIKE_DIRECT ||
                 reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL);
        lastArabicPos = -1;
        int controlCount = 0;
        boolean removeBidiControls = (reorderingOptions & OPTION_REMOVE_CONTROLS) != 0;

        byte state;
        byte lastStrong = ON;           /* for default level & inverse Bidi */
        /* The following stacks are used to manage isolate sequences. Those
           sequences may be nested, but obviously never more deeply than the
           maximum explicit embedding level.
           stackLast is the index of the last used entry in the stack. A value of -1
           means that there is no open isolate sequence.
           stackLast is reset to -1 on paragraph boundaries.
         */
        /* The following stack contains the position of the initiator of
           each open isolate sequence */
        int[] isolateStartStack= new int[MAX_EXPLICIT_LEVEL+1];
        /* The following stack contains the last known state before
           encountering the initiator of an isolate sequence */
        byte[] previousStateStack = new byte[MAX_EXPLICIT_LEVEL+1];
        int stackLast=-1;

        if ((reorderingOptions & OPTION_STREAMING) != 0)
            length = 0;
        defaultParaLevel = (byte)(paraLevel & 1);

        if (isDefaultLevel) {
            paras_level[0] = defaultParaLevel;
            lastStrong = defaultParaLevel;
            if (prologue != null &&                        /* there is a prologue */
                (dirProp = firstL_R_AL()) != ON) {         /* with a strong character */
                if (dirProp == L)
                    paras_level[0] = 0;    /* set the default para level */
                else
                    paras_level[0] = 1;    /* set the default para level */
                state = NOT_SEEKING_STRONG;
            } else {
                state = SEEKING_STRONG_FOR_PARA;
            }
        } else {
            paras_level[0] = paraLevel;
            state = NOT_SEEKING_STRONG;
        }
        /* count paragraphs and determine the paragraph level (P2..P3) */
        /*
         * see comment on constant fields:
         * the LEVEL_DEFAULT_XXX values are designed so that
         * their low-order bit alone yields the intended default
         */

        for (i = 0; i < originalLength; /* i is incremented in the loop */) {
            i0 = i;                     /* index of first code unit */
            uchar = UTF16.charAt(text, 0, originalLength, i);
            i += UTF16.getCharCount(uchar);
            i1 = i - 1; /* index of last code unit, gets the directional property */

            dirProp = (byte)getCustomizedClass(uchar);
            flags |= DirPropFlag(dirProp);
            dirProps[i1] = dirProp;
            if (i1 > i0) {     /* set previous code units' properties to BN */
                flags |= DirPropFlag(BN);
                do {
                    dirProps[--i1] = BN;
                } while (i1 > i0);
            }
            if (removeBidiControls && IsBidiControlChar(uchar)) {
                controlCount++;
            }
            if (dirProp == L) {
                if (state == SEEKING_STRONG_FOR_PARA) {
                    paras_level[paraCount - 1] = 0;
                    state = NOT_SEEKING_STRONG;
                }
                else if (state == SEEKING_STRONG_FOR_FSI) {
                    if (stackLast <= MAX_EXPLICIT_LEVEL) {
                        /* no need for next statement, already set by default */
                        /* dirProps[isolateStartStack[stackLast]] = LRI; */
                        flags |= DirPropFlag(LRI);
                    }
                    state = LOOKING_FOR_PDI;
                }
                lastStrong = L;
                continue;
            }
            if (dirProp == R || dirProp == AL) {
                if (state == SEEKING_STRONG_FOR_PARA) {
                    paras_level[paraCount - 1] = 1;
                    state = NOT_SEEKING_STRONG;
                }
                else if (state == SEEKING_STRONG_FOR_FSI) {
                    if (stackLast <= MAX_EXPLICIT_LEVEL) {
                        dirProps[isolateStartStack[stackLast]] = RLI;
                        flags |= DirPropFlag(RLI);
                    }
                    state = LOOKING_FOR_PDI;
                }
                lastStrong = R;
                if (dirProp == AL)
                    lastArabicPos = i - 1;
                continue;
            }
            if (dirProp >= FSI && dirProp <= RLI) { /* FSI, LRI or RLI */
                stackLast++;
                if (stackLast <= MAX_EXPLICIT_LEVEL) {
                    isolateStartStack[stackLast] = i - 1;
                    previousStateStack[stackLast] = state;
                }
                if (dirProp == FSI) {
                    dirProps[i-1] = LRI;    /* default if no strong char */
                    state = SEEKING_STRONG_FOR_FSI;
                }
                else
                    state = LOOKING_FOR_PDI;
                continue;
            }
            if (dirProp == PDI) {
                if (state == SEEKING_STRONG_FOR_FSI) {
                    if (stackLast <= MAX_EXPLICIT_LEVEL) {
                        /* no need for next statement, already set by default */
                        /* dirProps[isolateStartStack[stackLast]] = LRI; */
                        flags |= DirPropFlag(LRI);
                    }
                }
                if (stackLast >= 0) {
                    if (stackLast <= MAX_EXPLICIT_LEVEL)
                        state = previousStateStack[stackLast];
                    stackLast--;
                }
                continue;
            }
            if (dirProp == B) {
                if (i < originalLength && uchar == CR && text[i] == LF) /* do nothing on the CR */
                    continue;
                paras_limit[paraCount - 1] = i;
                if (isDefaultLevelInverse && lastStrong ==
                        R)
                    paras_level[paraCount - 1] = 1;
                if ((reorderingOptions & OPTION_STREAMING) != 0) {
                    /* When streaming, we only process whole paragraphs
                       thus some updates are only done on paragraph boundaries */
                    length = i;         /* i is index to next character */
                    this.controlCount = controlCount;
                }
                if (i < originalLength) {       /* B not last char in text */
                    paraCount++;
                    checkParaCount();   /* check that there is enough memory for a new para entry */
                    if (isDefaultLevel) {
                        paras_level[paraCount - 1] = defaultParaLevel;
                        state = SEEKING_STRONG_FOR_PARA;
                        lastStrong = defaultParaLevel;
                    } else {
                        paras_level[paraCount - 1] = paraLevel;
                        state = NOT_SEEKING_STRONG;
                    }
                    stackLast = -1;
                }
                continue;
            }
        }
        /* Ignore still open isolate sequences with overflow */
        if (stackLast > MAX_EXPLICIT_LEVEL) {
            stackLast = MAX_EXPLICIT_LEVEL;
            state=SEEKING_STRONG_FOR_FSI;   /* to be on the safe side */
        }
        /* Resolve direction of still unresolved open FSI sequences */
        while (stackLast >= 0) {
            if (state == SEEKING_STRONG_FOR_FSI) {
                /* no need for next statement, already set by default */
                /* dirProps[isolateStartStack[stackLast]] = LRI; */
                flags |= DirPropFlag(LRI);
                break;
            }
            state = previousStateStack[stackLast];
            stackLast--;
        }
        /* When streaming, ignore text after the last paragraph separator */
        if ((reorderingOptions & OPTION_STREAMING) != 0) {
            if (length < originalLength)
                paraCount--;
        } else {
            paras_limit[paraCount - 1] = originalLength;
            this.controlCount = controlCount;
        }
        /* For inverse bidi, default para direction is RTL if there is
           a strong R or AL at either end of the paragraph */
        if (isDefaultLevelInverse && lastStrong == R) {
            paras_level[paraCount - 1] = 1;
        }
        if (isDefaultLevel) {
            paraLevel = paras_level[0];
        }
        /* The following is needed to
           resolve the text direction for default level
           paragraphs containing no strong character */
        for (i = 0; i < paraCount; i++)
            flags |= DirPropFlagLR(paras_level[i]);

        if (orderParagraphsLTR && (flags & DirPropFlag(B)) != 0) {
            flags |= DirPropFlag(L);
        }
    }

    /* determine the paragraph level at position index */
    byte GetParaLevelAt(int pindex)
    {
        if (defaultParaLevel == 0 || pindex < paras_limit[0])
            return paraLevel;
        int i;
        for (i = 1; i < paraCount; i++)
            if (pindex < paras_limit[i])
                break;
        if (i >= paraCount)
            i = paraCount - 1;
        return paras_level[i];
    }

    /* Functions for handling paired brackets ----------------------------------- */

    /* In the isoRuns array, the first entry is used for text outside of any
       isolate sequence.  Higher entries are used for each more deeply nested
       isolate sequence. isoRunLast is the index of the last used entry.  The
       openings array is used to note the data of opening brackets not yet
       matched by a closing bracket, or matched but still susceptible to change
       level.
       Each isoRun entry contains the index of the first and
       one-after-last openings entries for pending opening brackets it
       contains.  The next openings entry to use is the one-after-last of the
       most deeply nested isoRun entry.
       isoRun entries also contain their current embedding level and the last
       encountered strong character, since these will be needed to resolve
       the level of paired brackets.
     */

    private void bracketInit(BracketData bd) {
        bd.isoRunLast = 0;
        bd.isoRuns[0] = new IsoRun();
        bd.isoRuns[0].start = 0;
        bd.isoRuns[0].limit = 0;
        bd.isoRuns[0].level = GetParaLevelAt(0);
        bd.isoRuns[0].lastStrong = bd.isoRuns[0].lastBase = bd.isoRuns[0].contextDir = (byte)(GetParaLevelAt(0) & 1);
        bd.isoRuns[0].contextPos = 0;
        bd.openings = new Opening[SIMPLE_OPENINGS_COUNT];
        bd.isNumbersSpecial = reorderingMode == REORDER_NUMBERS_SPECIAL ||
                              reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL;
    }

    /* paragraph boundary */
    private void bracketProcessB(BracketData bd, byte level) {
        bd.isoRunLast = 0;
        bd.isoRuns[0].limit = 0;
        bd.isoRuns[0].level = level;
        bd.isoRuns[0].lastStrong = bd.isoRuns[0].lastBase = bd.isoRuns[0].contextDir = (byte)(level & 1);
        bd.isoRuns[0].contextPos = 0;
    }

    /* LRE, LRO, RLE, RLO, PDF */
    private void bracketProcessBoundary(BracketData bd, int lastCcPos,
                                        byte contextLevel, byte embeddingLevel) {
        IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];
        if ((DirPropFlag(dirProps[lastCcPos]) & MASK_ISO) != 0) /* after an isolate */
            return;
        if (NoOverride(embeddingLevel) > NoOverride(contextLevel))  /* not a PDF */
            contextLevel = embeddingLevel;
        pLastIsoRun.limit = pLastIsoRun.start;
        pLastIsoRun.level = embeddingLevel;
        pLastIsoRun.lastStrong = pLastIsoRun.lastBase = pLastIsoRun.contextDir = (byte)(contextLevel & 1);
        pLastIsoRun.contextPos = lastCcPos;
    }

    /* LRI or RLI */
    private void bracketProcessLRI_RLI(BracketData bd, byte level) {
        IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];
        short lastLimit;
        pLastIsoRun.lastBase = ON;
        lastLimit = pLastIsoRun.limit;
        bd.isoRunLast++;
        pLastIsoRun = bd.isoRuns[bd.isoRunLast];
        if (pLastIsoRun == null)
            pLastIsoRun = bd.isoRuns[bd.isoRunLast] = new IsoRun();
        pLastIsoRun.start =
            pLastIsoRun.limit = lastLimit;
        pLastIsoRun.level = level;
        pLastIsoRun.lastStrong = pLastIsoRun.lastBase = pLastIsoRun.contextDir = (byte)(level & 1);
        pLastIsoRun.contextPos = 0;
    }

    /* PDI */
    private void bracketProcessPDI(BracketData bd) {
        IsoRun pLastIsoRun;
        bd.isoRunLast--;
        pLastIsoRun = bd.isoRuns[bd.isoRunLast];
        pLastIsoRun.lastBase = ON;
    }

    /* newly found opening bracket: create an openings entry */
    private void bracketAddOpening(BracketData bd, char match, int position) {
        IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];
        Opening pOpening;
        if (pLastIsoRun.limit >= bd.openings.length) {  /* no available new entry */
            Opening[] saveOpenings = bd.openings;
            int count;
            try {
                count = bd.openings.length;
                bd.openings = new Opening[count * 2];
            } catch (Exception e) {
                throw new OutOfMemoryError("Failed to allocate memory for openings");
            }
            System.arraycopy(saveOpenings, 0, bd.openings, 0, count);
        }
        pOpening = bd.openings[pLastIsoRun.limit];
        if (pOpening == null)
            pOpening = bd.openings[pLastIsoRun.limit]= new Opening();
        pOpening.position = position;
        pOpening.match = match;
        pOpening.contextDir = pLastIsoRun.contextDir;
        pOpening.contextPos = pLastIsoRun.contextPos;
        pOpening.flags = 0;
        pLastIsoRun.limit++;
    }

    /* change N0c1 to N0c2 when a preceding bracket is assigned the embedding level */
    private void fixN0c(BracketData bd, int openingIndex, int newPropPosition, byte newProp) {
        /* This function calls itself recursively */
        IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];
        Opening qOpening;
        int k, openingPosition, closingPosition;
        for (k = openingIndex+1; k < pLastIsoRun.limit; k++) {
            qOpening = bd.openings[k];
            if (qOpening.match >= 0)    /* not an N0c match */
                continue;
            if (newPropPosition < qOpening.contextPos)
                break;
            if
                (newPropPosition >= qOpening.position)
                continue;
            if (newProp == qOpening.contextDir)
                break;
            openingPosition = qOpening.position;
            dirProps[openingPosition] = newProp;
            closingPosition = -(qOpening.match);
            dirProps[closingPosition] = newProp;
            qOpening.match = 0;                 /* prevent further changes */
            fixN0c(bd, k, openingPosition, newProp);
            fixN0c(bd, k, closingPosition, newProp);
        }
    }

    /* process closing bracket; return L or R if N0b or N0c, ON if N0d */
    private byte bracketProcessClosing(BracketData bd, int openIdx, int position) {
        IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast];
        Opening pOpening, qOpening;
        byte direction;
        boolean stable;
        byte newProp;
        pOpening = bd.openings[openIdx];
        direction = (byte)(pLastIsoRun.level & 1);
        stable = true;          /* assume stable until proved otherwise */

        /* The stable flag is set when brackets are paired and their
           level is resolved and cannot be changed by what will be
           found later in the source string.
           An unstable match can occur only when applying N0c, where
           the resolved level depends on the preceding context, and
           this context may be affected by text occurring later.
           Example: RTL paragraph containing:  abc[(latin) HEBREW]
           When the closing parenthesis is encountered, it appears
           that N0c1 must be applied since 'abc' sets an opposite
           direction context and both parentheses receive level 2.
           However, when the closing square bracket is processed,
           N0b applies because of 'HEBREW' being included within the
           brackets, thus the square brackets are treated like R and
           receive level 1. However, this changes the preceding
           context of the opening parenthesis, and it now appears
           that N0c2 must be applied to the parentheses rather than
           N0c1.
         */

        if ((direction == 0 && (pOpening.flags & FOUND_L) > 0) ||
            (direction == 1 && (pOpening.flags & FOUND_R) > 0)) {   /* N0b */
            newProp = direction;
        }
        else if ((pOpening.flags & (FOUND_L | FOUND_R)) != 0) {     /* N0c */
            /* it is stable if there is no preceding text or in
               conditions too complicated and not worth checking */
            stable = (openIdx == pLastIsoRun.start);
            if (direction != pOpening.contextDir)
                newProp = pOpening.contextDir;                      /* N0c1 */
            else
                newProp = direction;                                /* N0c2 */
        } else {
            /* forget this and any brackets nested within this pair */
            pLastIsoRun.limit = (short)openIdx;
            return ON;                                              /* N0d */
        }
        dirProps[pOpening.position] = newProp;
        dirProps[position] = newProp;
        /* Update nested N0c pairs that may be affected */
        fixN0c(bd, openIdx, pOpening.position, newProp);
        if (stable) {
            pLastIsoRun.limit = (short)openIdx; /* forget any brackets nested within this pair */
            /* remove lower located synonyms if any */
            while (pLastIsoRun.limit > pLastIsoRun.start &&
                   bd.openings[pLastIsoRun.limit - 1].position == pOpening.position)
                pLastIsoRun.limit--;
        } else {
            int k;
            pOpening.match = -position;
            /* neutralize lower located synonyms if any */
            k = openIdx - 1;
            while (k >= pLastIsoRun.start &&
                   bd.openings[k].position == pOpening.position)
                bd.openings[k--].match = 0;
            /* neutralize any unmatched opening between the current pair;
               this will also neutralize higher located synonyms if any */
            for (k = openIdx + 1; k < pLastIsoRun.limit; k++) {
                qOpening =bd.openings[k];
                if (qOpening.position >= position)
                    break;
                if (qOpening.match > 0)
                    qOpening.match = 0;
            }
        }
        return newProp;
    }

    /* handle strong characters, digits and candidates for closing brackets */
    private void bracketProcessChar(BracketData bd, int position) {
        IsoRun pLastIsoRun =
            bd.isoRuns[bd.isoRunLast];
        byte dirProp, newProp;
        byte level;
        dirProp = dirProps[position];
        if (dirProp == ON) {
            char c, match;
            int idx;
            /* First see if it is a matching closing bracket. Hopefully, this is
               more efficient than checking if it is a closing bracket at all */
            c = text[position];
            for (idx = pLastIsoRun.limit - 1; idx >= pLastIsoRun.start; idx--) {
                if (bd.openings[idx].match != c)
                    continue;
                /* We have a match */
                newProp = bracketProcessClosing(bd, idx, position);
                if(newProp == ON) {         /* N0d */
                    c = 0;          /* prevent handling as an opening */
                    break;
                }
                pLastIsoRun.lastBase = ON;
                pLastIsoRun.contextDir = newProp;
                pLastIsoRun.contextPos = position;
                level = levels[position];
                if ((level & LEVEL_OVERRIDE) != 0) {    /* X4, X5 */
                    short flag;
                    int i;
                    newProp = (byte)(level & 1);
                    pLastIsoRun.lastStrong = newProp;
                    flag = (short)DirPropFlag(newProp);
                    for (i = pLastIsoRun.start; i < idx; i++)
                        bd.openings[i].flags |= flag;
                    /* matching brackets are not overridden by LRO/RLO */
                    levels[position] &= ~LEVEL_OVERRIDE;
                }
                /* matching brackets are not overridden by LRO/RLO */
                levels[bd.openings[idx].position] &= ~LEVEL_OVERRIDE;
                return;
            }
            /* We get here only if the ON character is not a matching closing
               bracket or it is a case of N0d */
            /* Now see if it is an opening bracket */
            if (c != 0)
                match = (char)UCharacter.getBidiPairedBracket(c); /* get the matching char */
            else
                match = 0;
            if (match != c &&               /* has a matching char */
                UCharacter.getIntPropertyValue(c, UProperty.BIDI_PAIRED_BRACKET_TYPE) ==
                    /* opening bracket */         UCharacter.BidiPairedBracketType.OPEN) {
                /* special case: process synonyms
                   create an opening entry for each synonym */
                if (match == 0x232A) {      /* RIGHT-POINTING ANGLE BRACKET */
                    bracketAddOpening(bd, (char)0x3009, position);
                }
                else if (match == 0x3009) { /* RIGHT ANGLE BRACKET */
                    bracketAddOpening(bd, (char)0x232A, position);
                }
                bracketAddOpening(bd, match, position);
            }
        }
        level = levels[position];
        if ((level & LEVEL_OVERRIDE) != 0) {    /* X4, X5 */
            newProp = (byte)(level & 1);
            if (dirProp != S && dirProp != WS && dirProp != ON)
                dirProps[position] = newProp;
            pLastIsoRun.lastBase = newProp;
            pLastIsoRun.lastStrong = newProp;
            pLastIsoRun.contextDir = newProp;
            pLastIsoRun.contextPos = position;
        }
        else if (dirProp <= R || dirProp == AL) {
            newProp = DirFromStrong(dirProp);
            pLastIsoRun.lastBase = dirProp;
            pLastIsoRun.lastStrong = dirProp;
            pLastIsoRun.contextDir = newProp;
            pLastIsoRun.contextPos = position;
        }
        else if(dirProp == EN) {
            pLastIsoRun.lastBase = EN;
            if (pLastIsoRun.lastStrong == L) {
                newProp = L;                    /* W7 */
                if (!bd.isNumbersSpecial)
                    dirProps[position] = ENL;
                pLastIsoRun.contextDir = L;
                pLastIsoRun.contextPos = position;
            }
            else {
                newProp = R;                    /* N0 */
                if (pLastIsoRun.lastStrong == AL)
                    dirProps[position] = AN;    /* W2 */
                else
                    dirProps[position] = ENR;
                pLastIsoRun.contextDir = R;
                pLastIsoRun.contextPos = position;
            }
        }
        else if (dirProp == AN) {
            newProp = R;                        /* N0 */
            pLastIsoRun.lastBase = AN;
            pLastIsoRun.contextDir = R;
            pLastIsoRun.contextPos = position;
        }
        else if (dirProp == NSM) {
            /* if the last real char was ON, change NSM to ON so that it
               will stay ON even if the last real char is a bracket which
               may be changed to L or R */
            newProp = pLastIsoRun.lastBase;
            if (newProp == ON)
                dirProps[position] = newProp;
        }
        else {
            newProp = dirProp;
            pLastIsoRun.lastBase = dirProp;
        }
        if (newProp <= R || newProp == AL) {
            int i;
            short flag = (short)DirPropFlag(DirFromStrong(newProp));
            for (i =
                     pLastIsoRun.start; i < pLastIsoRun.limit; i++)
                if (position > bd.openings[i].position)
                    bd.openings[i].flags |= flag;
        }
    }

    /* perform (X1)..(X9) ------------------------------------------------------- */

    /* determine if the text is mixed-directional or single-directional */
    private byte directionFromFlags() {
        /* if the text contains AN and neutrals, then some neutrals may become RTL */
        if (!((flags & MASK_RTL) != 0 ||
              ((flags & DirPropFlag(AN)) != 0 &&
               (flags & MASK_POSSIBLE_N) != 0))) {
            return LTR;
        } else if ((flags & MASK_LTR) == 0) {
            return RTL;
        } else {
            return MIXED;
        }
    }

    /*
     * Resolve the explicit levels as specified by explicit embedding codes.
     * Recalculate the flags to have them reflect the real properties
     * after taking the explicit embeddings into account.
     *
     * The BiDi algorithm is designed to result in the same behavior whether embedding
     * levels are externally specified (from "styled text", supposedly the preferred
     * method) or set by explicit embedding codes (LRx, RLx, PDF, FSI, PDI) in the plain text.
     * That is why (X9) instructs to remove all not-isolate explicit codes (and BN).
     * However, in a real implementation, the removal of these codes and their index
     * positions in the plain text is undesirable since it would result in
     * reallocated, reindexed text.
     * Instead, this implementation leaves the codes in there and just ignores them
     * in the subsequent processing.
     * In order to get the same reordering behavior, positions with a BN or a not-isolate
     * explicit embedding code just get the same level assigned as the last "real"
     * character.
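     *
     * Illustration only (a hypothetical LTR paragraph, upper case for RTL): for
     * the sequence  a RLE B PDF c  the levels assigned in this pass are expected
     * to be  0 0 1 1 0 : RLE inherits level 0 from 'a' and PDF inherits level 1
     * from 'B'. Later steps may still adjust the levels of such codes.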
     *
     * Some implementations, not this one, then overwrite some of these
     * directionality properties at "real" same-level-run boundaries by
     * L or R codes so that the resolution of weak types can be performed on the
     * entire paragraph at once instead of having to parse it once more and
     * perform that resolution on same-level-runs.
     * This limits the scope of the implicit rules in effectively
     * the same way as the run limits.
     *
     * Instead, this implementation does not modify these codes, except for
     * paired brackets whose properties (ON) may be replaced by L or R.
     * On one hand, the paragraph has to be scanned for same-level-runs, but
     * on the other hand, this saves another loop to reset these codes,
     * or saves making and modifying a copy of dirProps[].
     *
     *
     * Note that (Pn) and (Xn) changed significantly from version 4 of the BiDi algorithm.
     *
     *
     * Handling the stack of explicit levels (Xn):
     *
     * With the BiDi stack of explicit levels, as pushed with each
     * LRE, RLE, LRO, RLO, LRI, RLI and FSI and popped with each PDF and PDI,
     * the explicit level must never exceed MAX_EXPLICIT_LEVEL.
     *
     * In order to have a correct push-pop semantics even in the case of overflows,
     * overflow counters and a valid isolate counter are used as described in UAX#9
     * section 3.3.2 "Explicit Levels and Directions".
     *
     * This implementation assumes that MAX_EXPLICIT_LEVEL is odd.
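     *
     * Worked illustration of the "least greater even/odd level" arithmetic used
     * for the pushes (override and isolate flags left aside):
     *   from level 0:  LRE/LRI -> (0 + 2) & ~1 = 2,   RLE/RLI -> (0 + 1) | 1 = 1
     *   from level 1:  LRE/LRI -> (1 + 2) & ~1 = 2,   RLE/RLI -> (1 + 1) | 1 = 3
     *   from level 2:  LRE/LRI -> (2 + 2) & ~1 = 4,   RLE/RLI -> (2 + 1) | 1 = 3
     * A push whose result would exceed MAX_EXPLICIT_LEVEL is not performed; the
     * overflow counters keep the later pops balanced.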
     *
     * Returns the direction
     *
     */
    private byte resolveExplicitLevels() {
        int i = 0;
        byte dirProp;
        byte level = GetParaLevelAt(0);
        byte dirct;
        isolateCount = 0;

        /* determine if the text is mixed-directional or single-directional */
        dirct = directionFromFlags();

        /* we may not need to resolve any explicit levels */
        if (dirct != MIXED) {
            /* not mixed directionality: levels don't matter - trailingWSStart will be 0 */
            return dirct;
        }
        if (reorderingMode > REORDER_LAST_LOGICAL_TO_VISUAL) {
            /* inverse BiDi: mixed, but all characters are at the same embedding level */
            /* set all levels to the paragraph level */
            int paraIndex, start, limit;
            for (paraIndex = 0; paraIndex < paraCount; paraIndex++) {
                if (paraIndex == 0)
                    start = 0;
                else
                    start = paras_limit[paraIndex - 1];
                limit = paras_limit[paraIndex];
                level = paras_level[paraIndex];
                for (i = start; i < limit; i++)
                    levels[i] =level;
            }
            return dirct;               /* no bracket matching for inverse BiDi */
        }
        if ((flags & (MASK_EXPLICIT | MASK_ISO)) == 0) {
            /* no embeddings, set all levels to the paragraph level */
            /* we still have to perform bracket matching */
            int paraIndex, start, limit;
            BracketData bracketData = new BracketData();
            bracketInit(bracketData);
            for (paraIndex = 0; paraIndex < paraCount; paraIndex++) {
                if (paraIndex == 0)
                    start = 0;
                else
                    start = paras_limit[paraIndex-1];
                limit = paras_limit[paraIndex];
                level = paras_level[paraIndex];
                for (i = start; i < limit; i++) {
                    levels[i] = level;
                    dirProp = dirProps[i];
                    if (dirProp == BN)
                        continue;
                    if (dirProp == B) {
                        if ((i + 1) < length) {
                            if (text[i] == CR && text[i + 1] == LF)
                                continue;   /* skip CR when followed by LF */
                            bracketProcessB(bracketData, level);
                        }
                        continue;
                    }
                    bracketProcessChar(bracketData, i);
                }
            }
            return dirct;
        }
        /* continue to perform (Xn) */

        /* (X1) level is set for all codes, embeddingLevel keeps track of the push/pop operations */
        /* both variables may carry the LEVEL_OVERRIDE flag to indicate the override status */
        byte embeddingLevel = level, newLevel;
        byte previousLevel = level; /* previous level for regular (not CC) characters */
        int lastCcPos = 0;          /* index of last effective LRx,RLx, PDx */

        /* The following stack remembers the embedding level and the ISOLATE flag of level runs.
           stackLast points to its current entry. */
        short[] stack = new short[MAX_EXPLICIT_LEVEL + 2];  /* we never push anything >= MAX_EXPLICIT_LEVEL
                                                               but we need one more entry as base */
        int stackLast = 0;
        int overflowIsolateCount = 0;
        int overflowEmbeddingCount = 0;
        int validIsolateCount = 0;
        BracketData bracketData = new BracketData();
        bracketInit(bracketData);
        stack[0] = level;       /* initialize base entry to para level, no override, no isolate */

        /* recalculate the flags */
        flags = 0;

        for (i = 0; i < length; i++) {
            dirProp = dirProps[i];
            switch (dirProp) {
            case LRE:
            case RLE:
            case LRO:
            case RLO:
                /* (X2, X3, X4, X5) */
                flags |= DirPropFlag(BN);
                levels[i] = previousLevel;
                if (dirProp == LRE || dirProp == LRO)
                    /* least greater even level */
                    newLevel = (byte)((embeddingLevel+2) & ~(LEVEL_OVERRIDE | 1));
                else
                    /* least greater odd level */
                    newLevel = (byte)((NoOverride(embeddingLevel) + 1) | 1);
                if (newLevel <= MAX_EXPLICIT_LEVEL && overflowIsolateCount == 0 &&
                                                      overflowEmbeddingCount == 0) {
                    lastCcPos = i;
                    embeddingLevel = newLevel;
                    if (dirProp == LRO || dirProp == RLO)
                        embeddingLevel |= LEVEL_OVERRIDE;
                    stackLast++;
                    stack[stackLast] = embeddingLevel;
                    /* we don't need to set LEVEL_OVERRIDE off for
                       LRE and RLE
                       since this has already been done for newLevel which is
                       the source for embeddingLevel.
                     */
                } else {
                    if (overflowIsolateCount == 0)
                        overflowEmbeddingCount++;
                }
                break;
            case PDF:
                /* (X7) */
                flags |= DirPropFlag(BN);
                levels[i] = previousLevel;
                /* handle all the overflow cases first */
                if (overflowIsolateCount > 0) {
                    break;
                }
                if (overflowEmbeddingCount > 0) {
                    overflowEmbeddingCount--;
                    break;
                }
                if (stackLast > 0 && stack[stackLast] < ISOLATE) {   /* not an isolate entry */
                    lastCcPos = i;
                    stackLast--;
                    embeddingLevel = (byte)stack[stackLast];
                }
                break;
            case LRI:
            case RLI:
                flags |= DirPropFlag(ON) | DirPropFlagLR(embeddingLevel);
                levels[i] = NoOverride(embeddingLevel);
                if (NoOverride(embeddingLevel) != NoOverride(previousLevel)) {
                    bracketProcessBoundary(bracketData, lastCcPos,
                                           previousLevel, embeddingLevel);
                    flags |= DirPropFlagMultiRuns;
                }
                previousLevel = embeddingLevel;
                /* (X5a, X5b) */
                if (dirProp == LRI)
                    /* least greater even level */
                    newLevel=(byte)((embeddingLevel+2)&~(LEVEL_OVERRIDE|1));
                else
                    /* least greater odd level */
                    newLevel=(byte)((NoOverride(embeddingLevel)+1)|1);
                if (newLevel <= MAX_EXPLICIT_LEVEL && overflowIsolateCount == 0
                                                   && overflowEmbeddingCount == 0) {
                    flags |= DirPropFlag(dirProp);
                    lastCcPos = i;
                    validIsolateCount++;
                    if (validIsolateCount > isolateCount)
                        isolateCount = validIsolateCount;
                    embeddingLevel = newLevel;
                    /* we can increment stackLast without checking because newLevel
                       will exceed MAX_EXPLICIT_LEVEL before stackLast overflows */
                    stackLast++;
                    stack[stackLast] = (short)(embeddingLevel + ISOLATE);
                    bracketProcessLRI_RLI(bracketData, embeddingLevel);
                } else {
                    /* make it WS so that it is handled by adjustWSLevels() */
                    dirProps[i] = WS;
                    overflowIsolateCount++;
                }
                break;
            case PDI:
                if (NoOverride(embeddingLevel) != NoOverride(previousLevel)) {
                    bracketProcessBoundary(bracketData, lastCcPos,
                                           previousLevel, embeddingLevel);
                    flags |= DirPropFlagMultiRuns;
                }
                /* (X6a) */
                if (overflowIsolateCount > 0) {
                    overflowIsolateCount--;
                    /* make it WS so that it is handled by adjustWSLevels() */
                    dirProps[i] = WS;
                }
                else if (validIsolateCount > 0) {
                    flags |= DirPropFlag(PDI);
                    lastCcPos = i;
                    overflowEmbeddingCount = 0;
                    while (stack[stackLast] < ISOLATE)  /* pop embedding entries */
                        stackLast--;                    /* until the last isolate entry */
                    stackLast--;                        /* pop also the last isolate entry */
                    validIsolateCount--;
                    bracketProcessPDI(bracketData);
                } else
                    /* make it WS so that it is handled by adjustWSLevels() */
                    dirProps[i] = WS;
                embeddingLevel = (byte)(stack[stackLast] & ~ISOLATE);
                flags |= DirPropFlag(ON) | DirPropFlagLR(embeddingLevel);
                previousLevel = embeddingLevel;
                levels[i] = NoOverride(embeddingLevel);
                break;
            case B:
                flags |= DirPropFlag(B);
                levels[i] = GetParaLevelAt(i);
                if ((i + 1) < length) {
                    if (text[i] == CR && text[i + 1] == LF)
                        break;          /* skip CR when followed by LF */
                    overflowEmbeddingCount = overflowIsolateCount = 0;
                    validIsolateCount = 0;
                    stackLast = 0;
                    previousLevel = embeddingLevel = GetParaLevelAt(i + 1);
                    stack[0] = embeddingLevel;  /* initialize base entry to para level, no override, no isolate */
                    bracketProcessB(bracketData, embeddingLevel);
                }
                break;
            case BN:
                /* BN, LRE, RLE, and PDF are supposed to be removed (X9) */
                /* they will get their levels set correctly in adjustWSLevels() */
                levels[i] = previousLevel;
                flags |= DirPropFlag(BN);
                break;
            default:
                /* all other types are normal characters and get the "real" level */
                if (NoOverride(embeddingLevel) !=
                        NoOverride(previousLevel)) {
                    bracketProcessBoundary(bracketData, lastCcPos,
                                           previousLevel, embeddingLevel);
                    flags |= DirPropFlagMultiRuns;
                    if ((embeddingLevel & LEVEL_OVERRIDE) != 0)
                        flags |= DirPropFlagO(embeddingLevel);
                    else
                        flags |= DirPropFlagE(embeddingLevel);
                }
                previousLevel = embeddingLevel;
                levels[i] = embeddingLevel;
                bracketProcessChar(bracketData, i);
                /* the dirProp may have been changed in bracketProcessChar() */
                flags |= DirPropFlag(dirProps[i]);
                break;
            }
        }
        if ((flags & MASK_EMBEDDING) != 0) {
            flags |= DirPropFlagLR(paraLevel);
        }
        if (orderParagraphsLTR && (flags & DirPropFlag(B)) != 0) {
            flags |= DirPropFlag(L);
        }
        /* again, determine if the text is mixed-directional or single-directional */
        dirct = directionFromFlags();

        return dirct;
    }

    /*
     * Use a pre-specified embedding levels array:
     *
     * Adjust the directional properties for overrides (->LEVEL_OVERRIDE),
     * ignore all explicit codes (X9),
     * and check all the preset levels.
     *
     * Recalculate the flags to have them reflect the real properties
     * after taking the explicit embeddings into account.
     */
    private byte checkExplicitLevels() {
        byte dirProp;
        int i;
        int isolateCount = 0;

        this.flags = 0;     /* collect all directionalities in the text */
        byte level;
        this.isolateCount = 0;

        for (i = 0; i < length; ++i) {
            level = levels[i];
            dirProp = dirProps[i];
            if (dirProp == LRI || dirProp == RLI) {
                isolateCount++;
                if (isolateCount > this.isolateCount)
                    this.isolateCount = isolateCount;
            }
            else if (dirProp == PDI)
                isolateCount--;
            else if (dirProp == B)
                isolateCount = 0;
            if ((level & LEVEL_OVERRIDE) != 0) {
                /* keep the override flag in levels[i] but adjust the flags */
                level &= ~LEVEL_OVERRIDE;     /* make the range check below simpler */
                flags |= DirPropFlagO(level);
            } else {
                /* set the flags */
                flags |= DirPropFlagE(level) | DirPropFlag(dirProp);
            }
            if ((level < GetParaLevelAt(i) &&
                    !((0 == level) && (dirProp == B))) ||
                    (MAX_EXPLICIT_LEVEL < level)) {
                /* level out of bounds */
                throw new IllegalArgumentException("level " + level +
                                                   " out of bounds at " + i);
            }
        }
        if ((flags & MASK_EMBEDDING) != 0)
            flags |= DirPropFlagLR(paraLevel);
        /* determine if the text is mixed-directional or single-directional */
        return directionFromFlags();
    }

    /*********************************************************************/
    /* The Properties state machine table                                */
    /*********************************************************************/
    /*                                                                   */
    /* All table cells are 8 bits:                                       */
    /*      bits 0..4:  next state                                       */
    /*      bits 5..7:  action to perform (if > 0)                       */
    /*                                                                   */
    /* Cells may be of format "n" where n represents the next state      */
    /* (except for the rightmost column).                                */
    /* Cells may also be of format "_(x,y)" where x represents an action */
    /* to perform and y represents the next state.
    */
    /*                                                                   */
    /*********************************************************************/
    /* Definitions and type for properties state tables                  */
    /*********************************************************************/
    private static final int IMPTABPROPS_COLUMNS = 16;
    private static final int IMPTABPROPS_RES = IMPTABPROPS_COLUMNS - 1;
    private static short GetStateProps(short cell) {
        return (short)(cell & 0x1f);
    }
    private static short GetActionProps(short cell) {
        return (short)(cell >> 5);
    }

    private static final short groupProp[] =          /* dirProp regrouped */
    {
        /*  L   R   EN  ES  ET  AN  CS  B   S   WS  ON  LRE LRO AL  RLE RLO PDF NSM BN  FSI LRI RLI PDI ENL ENR */
            0,  1,  2,  7,  8,  3,  9,  6,  5,  4,  4,  10, 10, 12, 10, 10, 10, 11, 10, 4,  4,  4,  4,  13, 14
    };
    private static final short _L  = 0;
    private static final short _R  = 1;
    private static final short _EN = 2;
    private static final short _AN = 3;
    private static final short _ON = 4;
    private static final short _S  = 5;
    private static final short _B  = 6; /* reduced dirProp */

    /*********************************************************************/
    /*                                                                   */
    /*                   PROPERTIES STATE TABLE                          */
    /*                                                                   */
    /* In table impTabProps,                                             */
    /* - the ON column regroups ON and WS, FSI, RLI, LRI and PDI         */
    /* - the BN column regroups BN, LRE, RLE, LRO, RLO, PDF              */
    /* - the Res column is the reduced property assigned to a run        */
    /*                                                                   */
    /* Action 1: process current run1, init new run1                     */
    /*        2: init new run2                                           */
    /*        3: process run1, process run2, init new run1               */
    /*        4: process run1, set run1=run2, init new run2              */
    /*                                                                   */
    /* Notes:                                                            */
    /*  1) This table is used in resolveImplicitLevels().                */
    /*  2) This table triggers actions when there is a change in the Bidi*/
    /*     property of incoming characters (action 1).
    */
    /*  3) Most such property sequences are processed immediately (in    */
    /*     fact, passed to processPropertySeq()).                        */
    /*  4) However, numbers are assembled as one sequence. This means    */
    /*     that undefined situations (like CS following digits, until    */
    /*     it is known if the next char will be a digit) are held until  */
    /*     following chars define them.                                  */
    /*     Example: digits followed by CS, then comes another CS or ON;  */
    /*              the digits will be processed, then the CS assigned   */
    /*              as the start of an ON sequence (action 3).           */
    /*  5) There are cases where more than one sequence must be          */
    /*     processed, for instance digits followed by CS followed by L:  */
    /*     the digits must be processed as one sequence, and the CS      */
    /*     must be processed as an ON sequence, all this before starting */
    /*     assembling chars for the opening L sequence.                  */
    /*                                                                   */
    /*                                                                   */
    private static final short impTabProps[][] =
    {
/*                        L,     R,    EN,    AN,    ON,     S,     B,    ES,    ET,    CS,    BN,   NSM,    AL,   ENL,   ENR,   Res */
/* 0 Init        */ {     1,     2,     4,     5,     7,    15,    17,     7,     9,     7,     0,     7,     3,    18,    21,   _ON },
/* 1 L           */ {     1,  32+2,  32+4,  32+5,  32+7, 32+15, 32+17,  32+7,  32+9,  32+7,     1,     1,  32+3, 32+18, 32+21,    _L },
/* 2 R           */ {  32+1,     2,  32+4,  32+5,  32+7, 32+15, 32+17,  32+7,  32+9,  32+7,     2,     2,  32+3, 32+18, 32+21,    _R },
/* 3 AL          */ {  32+1,  32+2,  32+6,  32+6,  32+8, 32+16, 32+17,  32+8,  32+8,  32+8,     3,     3,     3, 32+18, 32+21,    _R },
/* 4 EN          */ {  32+1,  32+2,     4,  32+5,  32+7, 32+15, 32+17, 64+10,    11, 64+10,     4,     4,  32+3,    18,    21,   _EN },
/* 5 AN          */ {  32+1,  32+2,  32+4,     5,  32+7, 32+15, 32+17,  32+7,  32+9, 64+12,     5,     5,  32+3, 32+18, 32+21,   _AN },
/* 6 AL:EN/AN    */ {  32+1,  32+2,     6,     6,  32+8, 32+16, 32+17,  32+8,  32+8, 64+13,     6,     6,  32+3,    18,    21,   _AN },
/* 7 ON          */ {  32+1,  32+2,  32+4,  32+5,     7, 32+15, 32+17,     7, 64+14,     7,     7,     7,  32+3, 32+18, 32+21,   _ON },
/* 8 AL:ON       */ {  32+1,  32+2,  32+6,  32+6,     8, 32+16, 32+17,     8,     8,     8,     8,     8,  32+3, 32+18, 32+21,   _ON },
/* 9 ET          */ {
                       32+1,  32+2,     4,  32+5,     7, 32+15, 32+17,     7,     9,     7,     9,     9,  32+3,    18,    21,   _ON },
/*10 EN+ES/CS    */ {  96+1,  96+2,     4,  96+5, 128+7, 96+15, 96+17, 128+7,128+14, 128+7,    10, 128+7,  96+3,    18,    21,   _EN },
/*11 EN+ET       */ {  32+1,  32+2,     4,  32+5,  32+7, 32+15, 32+17,  32+7,    11,  32+7,    11,    11,  32+3,    18,    21,   _EN },
/*12 AN+CS       */ {  96+1,  96+2,  96+4,     5, 128+7, 96+15, 96+17, 128+7,128+14, 128+7,    12, 128+7,  96+3, 96+18, 96+21,   _AN },
/*13 AL:EN/AN+CS */ {  96+1,  96+2,     6,     6, 128+8, 96+16, 96+17, 128+8, 128+8, 128+8,    13, 128+8,  96+3,    18,    21,   _AN },
/*14 ON+ET       */ {  32+1,  32+2, 128+4,  32+5,     7, 32+15, 32+17,     7,    14,     7,    14,    14,  32+3,128+18,128+21,   _ON },
/*15 S           */ {  32+1,  32+2,  32+4,  32+5,  32+7,    15, 32+17,  32+7,  32+9,  32+7,    15,  32+7,  32+3, 32+18, 32+21,    _S },
/*16 AL:S        */ {  32+1,  32+2,  32+6,  32+6,  32+8,    16, 32+17,  32+8,  32+8,  32+8,    16,  32+8,  32+3, 32+18, 32+21,    _S },
/*17 B           */ {  32+1,  32+2,  32+4,  32+5,  32+7, 32+15,    17,  32+7,  32+9,  32+7,    17,  32+7,  32+3, 32+18, 32+21,    _B },
/*18 ENL         */ {  32+1,  32+2,    18,  32+5,  32+7, 32+15, 32+17, 64+19,    20, 64+19,    18,    18,  32+3,    18,    21,    _L },
/*19 ENL+ES/CS   */ {  96+1,  96+2,    18,  96+5, 128+7, 96+15, 96+17, 128+7,128+14, 128+7,    19, 128+7,  96+3,    18,    21,    _L },
/*20 ENL+ET      */ {  32+1,  32+2,    18,  32+5,  32+7, 32+15, 32+17,  32+7,    20,  32+7,    20,    20,  32+3,    18,    21,    _L },
/*21 ENR         */ {  32+1,  32+2,    21,  32+5,  32+7, 32+15, 32+17, 64+22,    23, 64+22,    21,    21,  32+3,    18,    21,   _AN },
/*22 ENR+ES/CS   */ {  96+1,  96+2,    21,  96+5, 128+7, 96+15, 96+17, 128+7,128+14, 128+7,    22, 128+7,  96+3,    18,    21,   _AN },
/*23 ENR+ET      */ {  32+1,  32+2,    21,  32+5,  32+7, 32+15, 32+17,  32+7,    23,  32+7,    23,    23,  32+3,    18,    21,   _AN }
    };

    /*********************************************************************/
    /* The levels state machine tables                                   */
    /*********************************************************************/
    /*                                                                   */
    /* All table cells are 8 bits:                                       */
    /*      bits 0..3:  next state                                       */
    /*      bits 4..7:  action to perform (if > 0)                       */
    /*                                                                   */
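    /* For example, a cell of 0x14 decodes as next state 4 (0x14 & 0x0f) */
    /* and action 1 (0x14 >> 4).                                         */
    /*                                                                   */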
    /* Cells may be of format "n" where n represents the next state      */
    /* (except for the rightmost column).                                */
    /* Cells may also be of format "_(x,y)" where x represents an action */
    /* to perform and y represents the next state.                       */
    /*                                                                   */
    /* This format limits each table to 16 states and to 15 actions.     */
    /*                                                                   */
    /*********************************************************************/
    /* Definitions and type for levels state tables                      */
    /*********************************************************************/
    private static final int IMPTABLEVELS_COLUMNS = _B + 2;
    private static final int IMPTABLEVELS_RES = IMPTABLEVELS_COLUMNS - 1;
    private static short GetState(byte cell) { return (short)(cell & 0x0f); }
    private static short GetAction(byte cell) { return (short)(cell >> 4); }

    private static class ImpTabPair {
        byte[][][] imptab;
        short[][] impact;

        ImpTabPair(byte[][] table1, byte[][] table2,
                   short[] act1, short[] act2) {
            imptab = new byte[][][] {table1, table2};
            impact = new short[][] {act1, act2};
        }
    }

    /*********************************************************************/
    /*                                                                   */
    /*                   LEVELS STATE TABLES                             */
    /*                                                                   */
    /* In all levels state tables,                                       */
    /* - state 0 is the initial state                                    */
    /* - the Res column is the increment to add to the text level        */
    /*   for this property sequence.                                     */
    /*                                                                   */
    /* The impact arrays for each table of a pair map the local action   */
    /* numbers of the table to the total list of actions. For instance,  */
    /* action 2 in a given table corresponds to the action number which  */
    /* appears in entry [2] of the impact array for that table.          */
    /* The first entry of all impact arrays must be 0.
    */
    /*                                                                   */
    /* Action 1: init conditional sequence                               */
    /*        2: prepend conditional sequence to current sequence        */
    /*        3: set ON sequence to new level - 1                        */
    /*        4: init EN/AN/ON sequence                                  */
    /*        5: fix EN/AN/ON sequence followed by R                     */
    /*        6: set previous level sequence to level 2                  */
    /*                                                                   */
    /* Notes:                                                            */
    /*  1) These tables are used in processPropertySeq(). The input      */
    /*     is property sequences as determined by resolveImplicitLevels. */
    /*  2) Most such property sequences are processed immediately        */
    /*     (levels are assigned).                                        */
    /*  3) However, some sequences cannot be assigned a final level till */
    /*     one or more following sequences are received. For instance,   */
    /*     ON following an R sequence within an even-level paragraph.    */
    /*     If the following sequence is R, the ON sequence will be       */
    /*     assigned basic run level+1, and so will the R sequence.       */
    /*  4) S is generally handled like ON, since its level will be fixed */
    /*     to paragraph level in adjustWSLevels().                       */
    /*                                                                   */

    private static final byte impTabL_DEFAULT[][] = /* Even paragraph level */
        /*  In this table, conditional sequences receive the lower possible level
            until proven otherwise.
        */
    {
        /*                         L,     R,    EN,    AN,    ON,     S,     B, Res */
        /* 0 : init       */ {     0,     1,     0,     2,     0,     0,     0,   0 },
        /* 1 : R          */ {     0,     1,     3,     3,  0x14,  0x14,     0,   1 },
        /* 2 : AN         */ {     0,     1,     0,     2,  0x15,  0x15,     0,   2 },
        /* 3 : R+EN/AN    */ {     0,     1,     3,     3,  0x14,  0x14,     0,   2 },
        /* 4 : R+ON       */ {     0,  0x21,  0x33,  0x33,     4,     4,     0,   0 },
        /* 5 : AN+ON      */ {     0,  0x21,     0,  0x32,     5,     5,     0,   0 }
    };

    private static final byte impTabR_DEFAULT[][] = /* Odd paragraph level */
        /*  In this table, conditional sequences receive the lower possible level
            until proven otherwise.
        */
    {
        /*                         L,     R,    EN,    AN,    ON,     S,     B, Res */
        /* 0 : init       */ {     1,     0,     2,     2,     0,     0,     0,   0 },
        /* 1 : L          */ {     1,     0,     1,     3,  0x14,  0x14,     0,   1 },
        /* 2 : EN/AN      */ {     1,     0,     2,     2,     0,     0,     0,   1 },
        /* 3 : L+AN       */ {     1,     0,     1,     3,     5,     5,     0,   1 },
        /* 4 : L+ON       */ {  0x21,     0,  0x21,     3,     4,     4,     0,   0 },
        /* 5 : L+AN+ON    */ {     1,     0,     1,     3,     5,     5,     0,   0 }
    };

    private static final short[] impAct0 = {0,1,2,3,4};

    private static final ImpTabPair impTab_DEFAULT = new ImpTabPair(
            impTabL_DEFAULT, impTabR_DEFAULT, impAct0, impAct0);

    private static final byte impTabL_NUMBERS_SPECIAL[][] = { /* Even paragraph level */
        /* In this table, conditional sequences receive the lower possible
           level until proven otherwise.
        */
        /*                         L,     R,    EN,    AN,    ON,     S,     B, Res */
        /* 0 : init       */ {     0,     2,  0x11,  0x11,     0,     0,     0,   0 },
        /* 1 : L+EN/AN    */ {     0,  0x42,     1,     1,     0,     0,     0,   0 },
        /* 2 : R          */ {     0,     2,     4,     4,  0x13,  0x13,     0,   1 },
        /* 3 : R+ON       */ {     0,  0x22,  0x34,  0x34,     3,     3,     0,   0 },
        /* 4 : R+EN/AN    */ {     0,     2,     4,     4,  0x13,  0x13,     0,   2 }
    };
    private static final ImpTabPair impTab_NUMBERS_SPECIAL = new ImpTabPair(
            impTabL_NUMBERS_SPECIAL, impTabR_DEFAULT, impAct0, impAct0);

    private static final byte impTabL_GROUP_NUMBERS_WITH_R[][] = {
        /* In this table, EN/AN+ON sequences receive levels as if associated with R
           until proven that there is L or sor/eor on both sides. AN is handled like EN.
        */
        /*                         L,     R,    EN,    AN,    ON,     S,     B, Res */
        /* 0 init         */ {     0,     3,  0x11,  0x11,     0,     0,     0,   0 },
        /* 1 EN/AN        */ {  0x20,     3,     1,     1,     2,  0x20,  0x20,   2 },
        /* 2 EN/AN+ON     */ {  0x20,     3,     1,     1,     2,  0x20,  0x20,   1 },
        /* 3 R            */ {     0,     3,     5,     5,  0x14,     0,     0,   1 },
        /* 4 R+ON         */ {  0x20,     3,     5,     5,     4,  0x20,  0x20,   1 },
        /* 5 R+EN/AN      */ {     0,     3,     5,     5,  0x14,     0,     0,   2 }
    };
    private static final byte impTabR_GROUP_NUMBERS_WITH_R[][] = {
        /* In this table, EN/AN+ON sequences receive levels as if associated with R
           until proven that there is L on both sides. AN is handled like EN.
        */
        /*                         L,     R,    EN,    AN,    ON,     S,     B, Res */
        /* 0 init         */ {     2,     0,     1,     1,     0,     0,     0,   0 },
        /* 1 EN/AN        */ {     2,     0,     1,     1,     0,     0,     0,   1 },
        /* 2 L            */ {     2,     0,  0x14,  0x14,  0x13,     0,     0,   1 },
        /* 3 L+ON         */ {  0x22,     0,     4,     4,     3,     0,     0,   0 },
        /* 4 L+EN/AN      */ {  0x22,     0,     4,     4,     3,     0,     0,   1 }
    };
    private static final ImpTabPair impTab_GROUP_NUMBERS_WITH_R = new
            ImpTabPair(impTabL_GROUP_NUMBERS_WITH_R,
                       impTabR_GROUP_NUMBERS_WITH_R, impAct0, impAct0);

    private static final byte impTabL_INVERSE_NUMBERS_AS_L[][] = {
        /* This table is identical to the Default LTR table except that EN and AN
           are handled like L.
        */
        /*                         L,     R,    EN,    AN,    ON,     S,     B, Res */
        /* 0 : init       */ {     0,     1,     0,     0,     0,     0,     0,   0 },
        /* 1 : R          */ {     0,     1,     0,     0,  0x14,  0x14,     0,   1 },
        /* 2 : AN         */ {     0,     1,     0,     0,  0x15,  0x15,     0,   2 },
        /* 3 : R+EN/AN    */ {     0,     1,     0,     0,  0x14,  0x14,     0,   2 },
        /* 4 : R+ON       */ {  0x20,     1,  0x20,  0x20,     4,     4,  0x20,   1 },
        /* 5 : AN+ON      */ {  0x20,     1,  0x20,  0x20,     5,     5,  0x20,   1 }
    };
    private static final byte impTabR_INVERSE_NUMBERS_AS_L[][] = {
        /* This table is identical to the Default RTL table except that EN and AN
           are handled like L.
        */
        /*                         L,     R,    EN,    AN,    ON,     S,     B, Res */
        /* 0 : init       */ {     1,     0,     1,     1,     0,     0,     0,   0 },
        /* 1 : L          */ {     1,     0,     1,     1,  0x14,  0x14,     0,   1 },
        /* 2 : EN/AN      */ {     1,     0,     1,     1,     0,     0,     0,   1 },
        /* 3 : L+AN       */ {     1,     0,     1,     1,     5,     5,     0,   1 },
        /* 4 : L+ON       */ {  0x21,     0,  0x21,  0x21,     4,     4,     0,   0 },
        /* 5 : L+AN+ON    */ {     1,     0,     1,     1,     5,     5,     0,   0 }
    };
    private static final ImpTabPair impTab_INVERSE_NUMBERS_AS_L = new ImpTabPair
            (impTabL_INVERSE_NUMBERS_AS_L, impTabR_INVERSE_NUMBERS_AS_L,
             impAct0, impAct0);

    private static final byte impTabR_INVERSE_LIKE_DIRECT[][] = {  /* Odd paragraph level */
        /*  In this table, conditional sequences receive the lower possible level
            until proven otherwise.
        */
        /*                         L,     R,    EN,    AN,    ON,     S,     B, Res */
        /* 0 : init       */ {     1,     0,     2,     2,     0,     0,     0,   0 },
        /* 1 : L          */ {     1,     0,     1,     2,  0x13,  0x13,     0,   1 },
        /* 2 : EN/AN      */ {     1,     0,     2,     2,     0,     0,     0,   1 },
        /* 3 : L+ON       */ {  0x21,  0x30,     6,     4,     3,     3,  0x30,   0 },
        /* 4 : L+ON+AN    */ {  0x21,  0x30,     6,     4,     5,     5,  0x30,   3 },
        /* 5 : L+AN+ON    */ {  0x21,  0x30,     6,     4,     5,     5,  0x30,   2 },
        /* 6 : L+ON+EN    */ {  0x21,  0x30,     6,     4,     3,     3,  0x30,   1 }
    };
    private static final short[] impAct1 = {0,1,13,14};
    private static final ImpTabPair impTab_INVERSE_LIKE_DIRECT = new ImpTabPair(
            impTabL_DEFAULT, impTabR_INVERSE_LIKE_DIRECT, impAct0, impAct1);

    private static final byte impTabL_INVERSE_LIKE_DIRECT_WITH_MARKS[][] = {
        /* The case handled in this table is (visually):  R EN L
        */
        /*                         L,     R,    EN,    AN,    ON,     S,     B, Res */
        /* 0 : init       */ {     0,  0x63,     0,     1,     0,     0,     0,   0 },
        /* 1 : L+AN       */ {     0,  0x63,     0,     1,  0x12,  0x30,     0,   4 },
        /* 2 : L+AN+ON    */ {  0x20,  0x63,  0x20,     1,     2,  0x30,  0x20,   3 },
        /* 3 : R          */ {     0,  0x63,  0x55,  0x56,  0x14,  0x30,     0,   3 },
        /* 4 : R+ON       */ {  0x30,  0x43,  0x55,  0x56,     4,  0x30,  0x30,   3 },
        /* 5 : R+EN       */ {  0x30,  0x43,     5,  0x56,  0x14,  0x30,  0x30,   4 },
        /* 6 : R+AN       */ {  0x30,  0x43,  0x55,     6,  0x14,  0x30,
                                                                            0x30,   4 }
    };
    private static final byte impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS[][] = {
        /* The cases handled in this table are (visually):  R EN L
                                                            R L AN L
        */
        /*                          L,     R,    EN,    AN,    ON,     S,     B, Res */
        /* 0 : init        */ {  0x13,     0,     1,     1,     0,     0,     0,   0 },
        /* 1 : R+EN/AN     */ {  0x23,     0,     1,     1,     2,  0x40,     0,   1 },
        /* 2 : R+EN/AN+ON  */ {  0x23,     0,     1,     1,     2,  0x40,     0,   0 },
        /* 3 : L           */ {     3,     0,     3,  0x36,  0x14,  0x40,     0,   1 },
        /* 4 : L+ON        */ {  0x53,  0x40,     5,  0x36,     4,  0x40,  0x40,   0 },
        /* 5 : L+ON+EN     */ {  0x53,  0x40,     5,  0x36,     4,  0x40,  0x40,   1 },
        /* 6 : L+AN        */ {  0x53,  0x40,     6,     6,     4,  0x40,  0x40,   3 }
    };
    private static final short[] impAct2 = {0,1,2,5,6,7,8};
    private static final short[] impAct3 = {0,1,9,10,11,12};
    private static final ImpTabPair impTab_INVERSE_LIKE_DIRECT_WITH_MARKS =
            new ImpTabPair(impTabL_INVERSE_LIKE_DIRECT_WITH_MARKS,
                           impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS, impAct2, impAct3);

    private static final ImpTabPair impTab_INVERSE_FOR_NUMBERS_SPECIAL = new ImpTabPair(
            impTabL_NUMBERS_SPECIAL, impTabR_INVERSE_LIKE_DIRECT, impAct0, impAct1);

    private static final byte impTabL_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS[][] = {
        /*  The case handled in this table is (visually):  R EN L
        */
        /*                         L,     R,    EN,    AN,    ON,     S,     B, Res */
        /* 0 : init       */ {     0,  0x62,     1,     1,     0,     0,     0,   0 },
        /* 1 : L+EN/AN    */ {     0,  0x62,     1,     1,     0,  0x30,     0,   4 },
        /* 2 : R          */ {     0,  0x62,  0x54,  0x54,  0x13,  0x30,     0,   3 },
        /* 3 : R+ON       */ {  0x30,  0x42,  0x54,  0x54,     3,  0x30,  0x30,   3 },
        /* 4 : R+EN/AN    */ {  0x30,  0x42,     4,     4,  0x13,  0x30,  0x30,   4 }
    };
    private static final ImpTabPair impTab_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS = new
            ImpTabPair(impTabL_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS,
                       impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS, impAct2, impAct3);

    private static class LevState {
        byte[][] impTab;                /* level table pointer          */
        short[] impAct;                 /* action map array             */
        int startON;                    /* start of ON sequence         */
        int startL2EN;                  /* start of level 2 sequence    */
        int lastStrongRTL;              /* index of last found R or AL  */
        int runStart;                   /* start position of the run    */
        short state;                    /* current state                */
        byte runLevel;                  /* run level before implicit solving */
    }

    /*------------------------------------------------------------------------*/

    static final int FIRSTALLOC = 10;
    /*
     *  param pos:     position where to insert
     *  param flag:    one of LRM_BEFORE, LRM_AFTER, RLM_BEFORE, RLM_AFTER
     */
    private void addPoint(int pos, int flag)
    {
        Point point = new Point();

        int len = insertPoints.points.length;
        if (len == 0) {
            insertPoints.points = new Point[FIRSTALLOC];
            len = FIRSTALLOC;
        }
        if (insertPoints.size >= len) { /* no room for new point */
            Point[] savePoints = insertPoints.points;
            insertPoints.points = new Point[len * 2];
            System.arraycopy(savePoints, 0, insertPoints.points, 0, len);
        }
        point.pos = pos;
        point.flag = flag;
        insertPoints.points[insertPoints.size] = point;
        insertPoints.size++;
    }

    private void setLevelsOutsideIsolates(int start, int limit, byte level)
    {
        byte dirProp;
        int isolateCount = 0, k;
        for (k = start; k < limit; k++) {
            dirProp = dirProps[k];
            if (dirProp == PDI)
                isolateCount--;
            if (isolateCount == 0)
                levels[k] = level;
            if (dirProp == LRI || dirProp == RLI)
                isolateCount++;
        }
    }

    /* perform rules (Wn), (Nn), and (In) on a run of the text ------------------ */

    /*
     * This implementation of the (Wn) rules applies all rules in one pass.
     * In order to do so, it needs a look-ahead of typically 1 character
     * (except for W5: sequences of ET) and keeps track of changes
     * in a rule Wp that affect a later Wq (p<q).
     *
     * The (Nn) and (In) rules are also performed in that same single loop,
     * but effectively one iteration behind for white space.
     *
     * Since all implicit rules are performed in one step, it is not necessary
     * to actually store the intermediate directional properties in dirProps[].
     */

    private void processPropertySeq(LevState levState, short _prop,
            int start, int limit) {
        byte cell;
        byte[][] impTab = levState.impTab;
        short[] impAct = levState.impAct;
        short oldStateSeq,actionSeq;
        byte level, addLevel;
        int start0, k;

        start0 = start;                 /* save original start position */
        oldStateSeq = levState.state;
        cell = impTab[oldStateSeq][_prop];
        levState.state = GetState(cell);        /* isolate the new state */
        actionSeq = impAct[GetAction(cell)];    /* isolate the action */
        addLevel = impTab[levState.state][IMPTABLEVELS_RES];

        if (actionSeq != 0) {
            switch (actionSeq) {
            case 1:                     /* init ON seq */
                levState.startON = start0;
                break;

            case 2:                     /* prepend ON seq to current seq */
                start = levState.startON;
                break;

            case 3:                     /* EN/AN after R+ON */
                level = (byte)(levState.runLevel + 1);
                setLevelsOutsideIsolates(levState.startON, start0, level);
                break;

            case 4:                     /* EN/AN before R for NUMBERS_SPECIAL */
                level = (byte)(levState.runLevel + 2);
                setLevelsOutsideIsolates(levState.startON, start0, level);
                break;

            case 5:                     /* L or S after possible relevant EN/AN */
                /* check if we had EN after R/AL */
                if (levState.startL2EN >= 0) {
                    addPoint(levState.startL2EN, LRM_BEFORE);
                }
                levState.startL2EN = -1;    /* not within previous if since could also be -2 */
                /* check if we had any relevant EN/AN after R/AL */
                if ((insertPoints.points.length == 0) ||
                        (insertPoints.size <= insertPoints.confirmed)) {
                    /* nothing, just clean up */
                    levState.lastStrongRTL = -1;
                    /* check if we have a pending conditional segment */
                    level = impTab[oldStateSeq][IMPTABLEVELS_RES];
                    if ((level & 1) != 0 && levState.startON > 0) { /* after ON */
                        start = levState.startON;   /* reset to basic run level */
                    }
                    if (_prop == _S) {              /* add LRM before S */
                        addPoint(start0, LRM_BEFORE);
                        insertPoints.confirmed = insertPoints.size;
                    }
                    break;
                }
                /* reset previous RTL cont to level for LTR text */
                for (k = levState.lastStrongRTL + 1; k < start0; k++) {
                    /* reset odd level, leave runLevel+2 as is */
                    levels[k] = (byte)((levels[k] - 2) & ~1);
                }
                /* mark insert points as confirmed */
                insertPoints.confirmed = insertPoints.size;
                levState.lastStrongRTL = -1;
                if (_prop == _S) {          /* add LRM before S */
                    addPoint(start0, LRM_BEFORE);
                    insertPoints.confirmed = insertPoints.size;
                }
                break;

            case 6:                     /* R/AL after possible relevant EN/AN */
                /* just clean up */
                if (insertPoints.points.length > 0)
                    /* remove all non confirmed insert points */
                    insertPoints.size = insertPoints.confirmed;
                levState.startON = -1;
                levState.startL2EN = -1;
                levState.lastStrongRTL = limit - 1;
                break;

            case 7:                     /* EN/AN after R/AL + possible cont */
                /* check for real AN */
                if ((_prop == _AN) && (dirProps[start0] == AN) &&
                        (reorderingMode != REORDER_INVERSE_FOR_NUMBERS_SPECIAL))
                {
                    /* real AN */
                    if (levState.startL2EN == -1) { /* if no relevant EN already found */
                        /* just note the rightmost digit as a strong RTL */
                        levState.lastStrongRTL = limit - 1;
                        break;
                    }
                    if (levState.startL2EN >= 0) {  /* after EN, no AN */
                        addPoint(levState.startL2EN, LRM_BEFORE);
                        levState.startL2EN = -2;
                    }
                    /* note AN */
                    addPoint(start0, LRM_BEFORE);
                    break;
                }
                /* if first EN/AN after R/AL */
                if (levState.startL2EN == -1) {
                    levState.startL2EN = start0;
                }
                break;

            case 8:                     /* note location of latest R/AL */
                levState.lastStrongRTL = limit - 1;
                levState.startON = -1;
                break;

            case 9:                     /* L after R+ON/EN/AN */
                /* include possible adjacent number on the left */
                for (k = start0-1; k >= 0 && ((levels[k] & 1) == 0); k--) {
                }
                if (k >= 0) {
                    addPoint(k, RLM_BEFORE);    /* add RLM before */
                    insertPoints.confirmed = insertPoints.size; /* confirm it */
                }
                levState.startON = start0;
                break;

            case 10:                    /* AN after L */
                /* AN numbers between L text on both sides may be trouble. */
                /* tentatively bracket with LRMs; will be confirmed if followed by L */
                addPoint(start0, LRM_BEFORE);   /* add LRM before */
                addPoint(start0, LRM_AFTER);    /* add LRM after */
                break;

            case 11:                    /* R after L+ON/EN/AN */
                /* false alert, infirm LRMs around previous AN */
                insertPoints.size=insertPoints.confirmed;
                if (_prop == _S) {          /* add RLM before S */
                    addPoint(start0, RLM_BEFORE);
                    insertPoints.confirmed = insertPoints.size;
                }
                break;

            case 12:                    /* L after L+ON/AN */
                level = (byte)(levState.runLevel + addLevel);
                for (k=levState.startON; k < start0; k++) {
                    if (levels[k] < level) {
                        levels[k] = level;
                    }
                }
                insertPoints.confirmed = insertPoints.size; /* confirm inserts */
                levState.startON = start0;
                break;

            case 13:                    /* L after L+ON+EN/AN/ON */
                level = levState.runLevel;
                for (k = start0-1; k >= levState.startON; k--) {
                    if (levels[k] == level+3) {
                        while (levels[k] == level+3) {
                            levels[k--] -= 2;
                        }
                        while (levels[k] == level) {
                            k--;
                        }
                    }
                    if (levels[k] == level+2) {
                        levels[k] = level;
                        continue;
                    }
                    levels[k] = (byte)(level+1);
                }
                break;

            case 14:                    /* R after L+ON+EN/AN/ON */
                level = (byte)(levState.runLevel+1);
                for (k = start0-1; k >= levState.startON; k--) {
                    if (levels[k] >
                            level) {
                        levels[k] -= 2;
                    }
                }
                break;

            default:                    /* we should never get here */
                throw new IllegalStateException("Internal ICU error in processPropertySeq");
            }
        }
        if ((addLevel) != 0 || (start < start0)) {
            level = (byte)(levState.runLevel + addLevel);
            if (start >= levState.runStart) {
                for (k = start; k < limit; k++) {
                    levels[k] = level;
                }
            } else {
                setLevelsOutsideIsolates(start, limit, level);
            }
        }
    }

    /**
     * Returns the directionality of the last strong character at the end of the prologue, if any.
     * Requires prologue!=null.
     */
    private byte lastL_R_AL() {
        for (int i = prologue.length(); i > 0; ) {
            int uchar = prologue.codePointBefore(i);
            i -= Character.charCount(uchar);
            byte dirProp = (byte)getCustomizedClass(uchar);
            if (dirProp == L) {
                return _L;
            }
            if (dirProp == R || dirProp == AL) {
                return _R;
            }
            if(dirProp == B) {
                return _ON;
            }
        }
        return _ON;
    }

    /**
     * Returns the directionality of the first strong character, or digit, in the epilogue, if any.
     * Requires epilogue!=null.
     */
    private byte firstL_R_AL_EN_AN() {
        for (int i = 0; i < epilogue.length(); ) {
            int uchar = epilogue.codePointAt(i);
            i += Character.charCount(uchar);
            byte dirProp = (byte)getCustomizedClass(uchar);
            if (dirProp == L) {
                return _L;
            }
            if (dirProp == R || dirProp == AL) {
                return _R;
            }
            if (dirProp == EN) {
                return _EN;
            }
            if (dirProp == AN) {
                return _AN;
            }
        }
        return _ON;
    }

    private void resolveImplicitLevels(int start, int limit, short sor, short eor)
    {
        byte dirProp;
        LevState levState = new LevState();
        int i, start1, start2;
        short oldStateImp, stateImp, actionImp;
        short gprop, resProp, cell;
        boolean inverseRTL;
        short nextStrongProp = R;
        int nextStrongPos = -1;

        /* check for RTL inverse Bidi mode */
        /* FOOD FOR THOUGHT: in case of RTL inverse Bidi, it would make sense to
         * loop on the text characters from end to start.
         * This would need a different properties state table (at least different
         * actions) and different levels state tables (maybe very similar to the
         * LTR corresponding ones).
         */
        inverseRTL=((start<lastArabicPos) && ((GetParaLevelAt(start) & 1)>0) &&
                    (reorderingMode == REORDER_INVERSE_LIKE_DIRECT ||
                     reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL));
        /* initialize for property and levels state table */
        levState.startL2EN = -1;        /* used for INVERSE_LIKE_DIRECT_WITH_MARKS */
        levState.lastStrongRTL = -1;    /* used for INVERSE_LIKE_DIRECT_WITH_MARKS */
        levState.runStart = start;
        levState.runLevel = levels[start];
        levState.impTab = impTabPair.imptab[levState.runLevel & 1];
        levState.impAct = impTabPair.impact[levState.runLevel & 1];
        if (start == 0 && prologue != null) {
            byte lastStrong = lastL_R_AL();
            if (lastStrong != _ON) {
                sor = lastStrong;
            }
        }
        /* The isolates[] entries contain enough information to
           resume the bidi algorithm in the same state as it was
           when it was interrupted by an isolate sequence. */
        if (dirProps[start] == PDI) {
            levState.startON = isolates[isolateCount].startON;
            start1 = isolates[isolateCount].start1;
            stateImp = isolates[isolateCount].stateImp;
            levState.state = isolates[isolateCount].state;
            isolateCount--;
        } else {
            levState.startON = -1;
            start1 = start;
            if (dirProps[start] == NSM)
                stateImp = (short)(1 + sor);
            else
                stateImp = 0;
            levState.state = 0;
            processPropertySeq(levState, sor, start, start);
        }
        start2 = start;                 /* to make the Java compiler happy */

        for (i = start; i <= limit; i++) {
            if (i >= limit) {
                int k;
                for (k = limit - 1;
                     k > start &&
                         (DirPropFlag(dirProps[k]) & MASK_BN_EXPLICIT) != 0;
                     k--);
                dirProp = dirProps[k];
                if (dirProp == LRI || dirProp == RLI)
                    break;  /* no forced closing for sequence ending with LRI/RLI */
                gprop = eor;
            } else {
                byte prop, prop1;
                prop = dirProps[i];
                if (prop == B)
                    isolateCount = -1;  /* current isolates stack entry == none
                                         */
                if (inverseRTL) {
                    if (prop == AL) {
                        /* AL before EN does not make it AN */
                        prop = R;
                    } else if (prop == EN) {
                        if (nextStrongPos <= i) {
                            /* look for next strong char (L/R/AL) */
                            int j;
                            nextStrongProp = R;     /* set default */
                            nextStrongPos = limit;
                            for (j = i+1; j < limit; j++) {
                                prop1 = dirProps[j];
                                if (prop1 == L || prop1 == R || prop1 == AL) {
                                    nextStrongProp = prop1;
                                    nextStrongPos = j;
                                    break;
                                }
                            }
                        }
                        if (nextStrongProp == AL) {
                            prop = AN;
                        }
                    }
                }
                gprop = groupProp[prop];
            }
            oldStateImp = stateImp;
            cell = impTabProps[oldStateImp][gprop];
            stateImp = GetStateProps(cell);     /* isolate the new state */
            actionImp = GetActionProps(cell);   /* isolate the action */
            if ((i == limit) && (actionImp == 0)) {
                /* there is an unprocessed sequence if its property == eor */
                actionImp = 1;                  /* process the last sequence */
            }
            if (actionImp != 0) {
                resProp = impTabProps[oldStateImp][IMPTABPROPS_RES];
                switch (actionImp) {
                case 1:             /* process current seq1, init new seq1 */
                    processPropertySeq(levState, resProp, start1, i);
                    start1 = i;
                    break;
                case 2:             /* init new seq2 */
                    start2 = i;
                    break;
                case 3:             /* process seq1, process seq2, init new seq1 */
                    processPropertySeq(levState, resProp, start1, start2);
                    processPropertySeq(levState, _ON, start2, i);
                    start1 = i;
                    break;
                case 4:             /* process seq1, set seq1=seq2, init new seq2 */
                    processPropertySeq(levState, resProp, start1, start2);
                    start1 = start2;
                    start2 = i;
                    break;
                default:            /* we should never get here */
                    throw new IllegalStateException("Internal ICU error in resolveImplicitLevels");
                }
            }
        }

        /* flush possible pending sequence, e.g.
           ON */
        if (limit == length && epilogue != null) {
            byte firstStrong = firstL_R_AL_EN_AN();
            if (firstStrong != _ON) {
                eor = firstStrong;
            }
        }

        /* look for the last char not a BN or LRE/RLE/LRO/RLO/PDF */
        for (i = limit - 1;
             i > start &&
                 (DirPropFlag(dirProps[i]) & MASK_BN_EXPLICIT) != 0;
             i--);
        dirProp = dirProps[i];
        if ((dirProp == LRI || dirProp == RLI) && limit < length) {
            isolateCount++;
            if (isolates[isolateCount] == null)
                isolates[isolateCount] = new Isolate();
            isolates[isolateCount].stateImp = stateImp;
            isolates[isolateCount].state = levState.state;
            isolates[isolateCount].start1 = start1;
            isolates[isolateCount].startON = levState.startON;
        }
        else
            processPropertySeq(levState, eor, limit, limit);
    }

    /* perform (L1) and (X9) ---------------------------------------------------- */

    /*
     * Reset the embedding levels for some non-graphic characters (L1).
     * This method also sets appropriate levels for BN, and
     * explicit embedding types that are supposed to have been removed
     * from the paragraph in (X9).
     */
    private void adjustWSLevels() {
        int i;

        if ((flags & MASK_WS) != 0) {
            int flag;
            i = trailingWSStart;
            while (i > 0) {
                /* reset a sequence of WS/BN before eop and B/S to the paragraph paraLevel */
                while (i > 0 && ((flag = DirPropFlag(dirProps[--i])) & MASK_WS) != 0) {
                    if (orderParagraphsLTR && (flag & DirPropFlag(B)) != 0) {
                        levels[i] = 0;
                    } else {
                        levels[i] = GetParaLevelAt(i);
                    }
                }

                /* reset BN to the next character's paraLevel until B/S, which restarts above loop */
                /* here, i+1 is guaranteed to be <length */
                while (i > 0) {
                    flag = DirPropFlag(dirProps[--i]);
                    if ((flag & MASK_BN_EXPLICIT) != 0) {
                        levels[i] = levels[i + 1];
                    } else if (orderParagraphsLTR && (flag & DirPropFlag(B)) != 0) {
                        levels[i] = 0;
                        break;
                    } else if ((flag & MASK_B_S) != 0){
                        levels[i] = GetParaLevelAt(i);
                        break;
                    }
                }
            }
        }
    }

    /**
     * Set the context before a call to setPara().<p>
     *
     * setPara() computes the left-right directionality for a given piece
     * of text which is supplied as one of its arguments. Sometimes this piece
     * of text (the "main text") should be considered in context, because text
     * appearing before ("prologue") and/or after ("epilogue") the main text
     * may affect the result of this computation.<p>
     *
     * This function specifies the prologue and/or the epilogue for the next
     * call to setPara(). If successive calls to setPara()
     * all need specification of a context, setContext() must be called
     * before each call to setPara().
     * In other words, a context is not
     * "remembered" after the following successful call to setPara().<p>
     *
     * If a call to setPara() specifies LEVEL_DEFAULT_LTR or
     * LEVEL_DEFAULT_RTL as paraLevel and is preceded by a call to
     * setContext() which specifies a prologue, the paragraph level will
     * be computed taking into consideration the text in the prologue.<p>
     *
     * When setPara() is called without a previous call to
     * setContext(), the main text is handled as if preceded and followed
     * by strong directional characters at the current paragraph level.
     * Calling setContext() with specification of a prologue will change
     * this behavior by handling the main text as if preceded by the last
     * strong character appearing in the prologue, if any.
     * Calling setContext() with specification of an epilogue will change
     * the behavior of setPara() by handling the main text as if followed
     * by the first strong character or digit appearing in the epilogue, if any.<p>
     *
     * Note 1: if <code>setContext</code> is called repeatedly without
     * calling <code>setPara</code>, the earlier calls have no effect,
     * only the last call will be remembered for the next call to
     * <code>setPara</code>.<p>
     *
     * Note 2: calling <code>setContext(null, null)</code>
     * cancels any previous setting of non-empty prologue or epilogue.
     * The next call to <code>setPara()</code> will process no
     * prologue or epilogue.<p>
     *
     * Note 3: users must be aware that even after setting the context
     * before a call to setPara() to perform e.g.
     * a logical to visual
     * transformation, the resulting string may not be identical to what it
     * would have been if all the text, including prologue and epilogue, had
     * been processed together.<br>
     * Example (upper case letters represent RTL characters):<br>
     * prologue = "<code>abc DE</code>"<br>
     * epilogue = none<br>
     * main text = "<code>FGH xyz</code>"<br>
     * paraLevel = LTR<br>
     * display without prologue = "<code>HGF xyz</code>"
     * ("HGF" is adjacent to "xyz")<br>
     * display with prologue = "<code>abc HGFED xyz</code>"
     * ("HGF" is not adjacent to "xyz")<br>
     *
     * @param prologue is the text which precedes the text that
     *        will be specified in a coming call to setPara().
     *        If there is no prologue to consider,
     *        this parameter can be <code>null</code>.
     *
     * @param epilogue is the text which follows the text that
     *        will be specified in a coming call to setPara().
     *        If there is no epilogue to consider,
     *        this parameter can be <code>null</code>.
     *
     * @see #setPara
     */
    public void setContext(String prologue, String epilogue) {
        this.prologue = prologue != null && prologue.length() > 0 ? prologue : null;
        this.epilogue = epilogue != null && epilogue.length() > 0 ? epilogue : null;
    }

    private void setParaSuccess() {
        prologue = null;                /* forget the last context */
        epilogue = null;
        paraBidi = this;                /* mark successful setPara */
    }

    int Bidi_Min(int x, int y) {
        return x < y ? x : y;
    }

    int Bidi_Abs(int x) {
        return x >= 0 ?
               x : -x;
    }

    void setParaRunsOnly(char[] parmText, byte parmParaLevel) {
        int[] visualMap;
        String visualText;
        int saveLength, saveTrailingWSStart;
        byte[] saveLevels;
        byte saveDirection;
        int i, j, visualStart, logicalStart,
            oldRunCount, runLength, addedRuns, insertRemove,
            start, limit, step, indexOddBit, logicalPos,
            index, index1;
        int saveOptions;

        reorderingMode = REORDER_DEFAULT;
        int parmLength = parmText.length;
        if (parmLength == 0) {
            setPara(parmText, parmParaLevel, null);
            reorderingMode = REORDER_RUNS_ONLY;
            return;
        }
        /* obtain memory for mapping table and visual text */
        saveOptions = reorderingOptions;
        if ((saveOptions & OPTION_INSERT_MARKS) > 0) {
            reorderingOptions &= ~OPTION_INSERT_MARKS;
            reorderingOptions |= OPTION_REMOVE_CONTROLS;
        }
        parmParaLevel &= 1;             /* accept only 0 or 1 */
        setPara(parmText, parmParaLevel, null);
        /* we cannot access directly levels since it is not yet set if
         * direction is not MIXED
         */
        saveLevels = new byte[this.length];
        System.arraycopy(getLevels(), 0, saveLevels, 0, this.length);
        saveTrailingWSStart = trailingWSStart;

        /* FOOD FOR THOUGHT: instead of writing the visual text, we could use
         * the visual map and the dirProps array to drive the second call
         * to setPara (but must make provision for possible removal of
         * Bidi controls). Alternatively, only use the dirProps array via
         * a customized classifier callback.
         */
        visualText = writeReordered(DO_MIRRORING);
        visualMap = getVisualMap();
        this.reorderingOptions = saveOptions;
        saveLength = this.length;
        saveDirection=this.direction;

        this.reorderingMode = REORDER_INVERSE_LIKE_DIRECT;
        parmParaLevel ^= 1;
        setPara(visualText, parmParaLevel, null);
        BidiLine.getRuns(this);
        /* check if some runs must be split, count how many splits */
        addedRuns = 0;
        oldRunCount = this.runCount;
        visualStart = 0;
        for (i = 0; i < oldRunCount; i++, visualStart += runLength) {
            runLength = runs[i].limit - visualStart;
            if (runLength < 2) {
                continue;
            }
            logicalStart = runs[i].start;
            for (j = logicalStart+1; j < logicalStart+runLength; j++) {
                index = visualMap[j];
                index1 = visualMap[j-1];
                if ((Bidi_Abs(index-index1)!=1) || (saveLevels[index]!=saveLevels[index1])) {
                    addedRuns++;
                }
            }
        }
        if (addedRuns > 0) {
            getRunsMemory(oldRunCount + addedRuns);
            if (runCount == 1) {
                /* because we switch from UBiDi.simpleRuns to UBiDi.runs */
                runsMemory[0] = runs[0];
            } else {
                System.arraycopy(runs, 0, runsMemory, 0, runCount);
            }
            runs = runsMemory;
            runCount += addedRuns;
            for (i = oldRunCount; i < runCount; i++) {
                if (runs[i] == null) {
                    runs[i] = new BidiRun(0, 0, (byte)0);
                }
            }
        }
        /* split runs which are not consecutive in source text */
        int newI;
        for (i = oldRunCount-1; i >= 0; i--) {
            newI = i + addedRuns;
            runLength = i==0 ?
                        runs[0].limit :
                        runs[i].limit - runs[i-1].limit;
            logicalStart = runs[i].start;
            indexOddBit = runs[i].level & 1;
            if (runLength < 2) {
                if (addedRuns > 0) {
                    runs[newI].copyFrom(runs[i]);
                }
                logicalPos = visualMap[logicalStart];
                runs[newI].start = logicalPos;
                runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit);
                continue;
            }
            if (indexOddBit > 0) {
                start = logicalStart;
                limit = logicalStart + runLength - 1;
                step = 1;
            } else {
                start = logicalStart + runLength - 1;
                limit = logicalStart;
                step = -1;
            }
            for (j = start; j != limit; j += step) {
                index = visualMap[j];
                index1 = visualMap[j+step];
                if ((Bidi_Abs(index-index1)!=1) || (saveLevels[index]!=saveLevels[index1])) {
                    logicalPos = Bidi_Min(visualMap[start], index);
                    runs[newI].start = logicalPos;
                    runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit);
                    runs[newI].limit = runs[i].limit;
                    runs[i].limit -= Bidi_Abs(j - start) + 1;
                    insertRemove = runs[i].insertRemove & (LRM_AFTER|RLM_AFTER);
                    runs[newI].insertRemove = insertRemove;
                    runs[i].insertRemove &= ~insertRemove;
                    start = j + step;
                    addedRuns--;
                    newI--;
                }
            }
            if (addedRuns > 0) {
                runs[newI].copyFrom(runs[i]);
            }
            logicalPos = Bidi_Min(visualMap[start], visualMap[limit]);
            runs[newI].start = logicalPos;
            runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit);
        }

//      cleanup1:
        /* restore initial paraLevel */
        this.paraLevel ^= 1;
//      cleanup2:
        /* restore real text */
        this.text = parmText;
        this.length = saveLength;
        this.originalLength = parmLength;
        this.direction=saveDirection;
        this.levels = saveLevels;
        this.trailingWSStart = saveTrailingWSStart;
        if (runCount > 1) {
            this.direction = MIXED;
        }
//      cleanup3:
        this.reorderingMode = REORDER_RUNS_ONLY;
    }

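    /*
     * Illustration only, not part of the ICU implementation: a minimal sketch of
     * the context scan that setContext() describes, i.e. finding the last strong
     * character of a prologue and the first strong character or digit of an
     * epilogue. It assumes the JDK's Character.getDirectionality() as a stand-in
     * for getCustomizedClass(); the class name, method names and result codes
     * are hypothetical and were chosen for this sketch. It is kept inside a
     * comment so that it does not affect compilation of this file.
     *
```java
final class ContextScanSketch {
    // Hypothetical result codes used only by this sketch (not the ICU constants).
    static final int NONE = 0;
    static final int LEFT = 1;
    static final int RIGHT = 2;
    static final int EUROPEAN_DIGIT = 3;
    static final int ARABIC_DIGIT = 4;

    // Walks the prologue backwards, one code point at a time, and reports the
    // last strong character (L, or R/AL). A paragraph separator stops the scan.
    static int lastStrong(String prologue) {
        for (int i = prologue.length(); i > 0; ) {
            int cp = prologue.codePointBefore(i);
            i -= Character.charCount(cp);
            byte dir = Character.getDirectionality(cp);
            if (dir == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                return LEFT;
            }
            if (dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return RIGHT;
            }
            if (dir == Character.DIRECTIONALITY_PARAGRAPH_SEPARATOR) {
                return NONE;
            }
        }
        return NONE;
    }

    // Walks the epilogue forwards and reports the first strong character
    // (L, or R/AL) or the first European or Arabic digit, whichever comes first.
    static int firstStrongOrDigit(String epilogue) {
        for (int i = 0; i < epilogue.length(); ) {
            int cp = epilogue.codePointAt(i);
            i += Character.charCount(cp);
            byte dir = Character.getDirectionality(cp);
            if (dir == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                return LEFT;
            }
            if (dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return RIGHT;
            }
            if (dir == Character.DIRECTIONALITY_EUROPEAN_NUMBER) {
                return EUROPEAN_DIGIT;
            }
            if (dir == Character.DIRECTIONALITY_ARABIC_NUMBER) {
                return ARABIC_DIGIT;
            }
        }
        return NONE;
    }

    public static void main(String[] args) {
        // A prologue ending in two Hebrew letters, and an epilogue starting with digits.
        System.out.println(lastStrong("abc \u05D3\u05D4"));
        System.out.println(firstStrongOrDigit(" 12 abc"));
    }
}
```
     *
     * In this sketch a result of NONE plays the role of "no usable context":
     * the caller would then keep the sor/eor value derived from the paragraph
     * level, as it would when no prologue or epilogue has been set.
     */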
    /**
     * Perform the Unicode Bidi algorithm. It is defined in the
     * <a href="http://www.unicode.org/unicode/reports/tr9/">Unicode Standard Annex #9</a>,
     * version 13,
     * also described in The Unicode Standard, Version 4.0.<p>
     *
     * This method takes a piece of plain text containing one or more paragraphs,
     * with or without externally specified embedding levels from <i>styled</i>
     * text and computes the left-right-directionality of each character.<p>
     *
     * If the entire text is all of the same directionality, then
     * the method may not perform all the steps described by the algorithm,
     * i.e., some levels may not be the same as if all steps were performed.
     * This is not relevant for unidirectional text.<br>
     * For example, in pure LTR text with numbers the numbers would get
     * a resolved level of 2 higher than the surrounding text according to
     * the algorithm. This implementation may set all resolved levels to
     * the same value in such a case.<p>
     *
     * The text can be composed of multiple paragraphs. Occurrence of a block
     * separator in the text terminates a paragraph, and whatever comes next starts
     * a new paragraph. The exception to this rule is when a Carriage Return (CR)
     * is followed by a Line Feed (LF). Both CR and LF are block separators, but
     * in that case, the pair of characters is considered as terminating the
     * preceding paragraph, and a new paragraph will be started by a character
     * coming after the LF.<p>
     *
     * Although the text is passed here as a <code>String</code>, it is
     * stored internally as an array of characters. Therefore the
     * documentation will refer to indexes of the characters in the text.
     *
     * @param text contains the text that the Bidi algorithm will be performed
     *        on.
     *        This text can be retrieved with <code>getText()</code> or
     *        <code>getTextAsString()</code>.<br>
     *
     * @param paraLevel specifies the default level for the text;
     *        it is typically 0 (LTR) or 1 (RTL).
     *        If the method shall determine the paragraph level from the text,
     *        then <code>paraLevel</code> can be set to
     *        either <code>LEVEL_DEFAULT_LTR</code>
     *        or <code>LEVEL_DEFAULT_RTL</code>; if the text contains multiple
     *        paragraphs, the paragraph level shall be determined separately for
     *        each paragraph; if a paragraph does not include any strongly typed
     *        character, then the desired default is used (0 for LTR or 1 for RTL).
     *        Any other value between 0 and <code>MAX_EXPLICIT_LEVEL</code>
     *        is also valid, with odd levels indicating RTL.
     *
     * @param embeddingLevels (in) may be used to preset the embedding and override levels,
     *        ignoring characters like LRE and PDF in the text.
     *        A level overrides the directional property of its corresponding
     *        (same index) character if the level has the
     *        <code>LEVEL_OVERRIDE</code> bit set.<br><br>
     *        Except for that bit, it must be
     *        <code>paraLevel&lt;=embeddingLevels[]&lt;=MAX_EXPLICIT_LEVEL</code>,
     *        with one exception: a level of zero may be specified for a
     *        paragraph separator even if <code>paraLevel&gt;0</code> when multiple
     *        paragraphs are submitted in the same call to <code>setPara()</code>.<br><br>
     *        <strong>Caution: </strong>A reference to this array, not a copy
     *        of the levels, will be stored in the <code>Bidi</code> object;
     *        the <code>embeddingLevels</code>
     *        should not be modified to avoid unexpected results on subsequent
     *        Bidi operations.
     *        However, the <code>setPara()</code> and
     *        <code>setLine()</code> methods may modify some or all of the
     *        levels.<br><br>
     *        <strong>Note:</strong> the <code>embeddingLevels</code> array must
     *        have one entry for each character in <code>text</code>.
     *
     * @throws IllegalArgumentException if the values in embeddingLevels are
     *         not within the allowed range
     *
     * @see #LEVEL_DEFAULT_LTR
     * @see #LEVEL_DEFAULT_RTL
     * @see #LEVEL_OVERRIDE
     * @see #MAX_EXPLICIT_LEVEL
     */
    public void setPara(String text, byte paraLevel, byte[] embeddingLevels)
    {
        if (text == null) {
            setPara(new char[0], paraLevel, embeddingLevels);
        } else {
            setPara(text.toCharArray(), paraLevel, embeddingLevels);
        }
    }

    /**
     * Perform the Unicode Bidi algorithm. It is defined in the
     * <a href="http://www.unicode.org/unicode/reports/tr9/">Unicode Standard Annex #9</a>,
     * version 13,
     * also described in The Unicode Standard, Version 4.0.<p>
     *
     * This method takes a piece of plain text containing one or more paragraphs,
     * with or without externally specified embedding levels from <i>styled</i>
     * text and computes the left-right-directionality of each character.<p>
     *
     * If the entire text is all of the same directionality, then
     * the method may not perform all the steps described by the algorithm,
     * i.e., some levels may not be the same as if all steps were performed.
     * This is not relevant for unidirectional text.<br>
     * For example, in pure LTR text with numbers the numbers would get
     * a resolved level of 2 higher than the surrounding text according to
     * the algorithm. This implementation may set all resolved levels to
     * the same value in such a case.<p>
     *
     * The text can be composed of multiple paragraphs.
     * Occurrence of a block
     * separator in the text terminates a paragraph, and whatever comes next starts
     * a new paragraph. The exception to this rule is when a Carriage Return (CR)
     * is followed by a Line Feed (LF). Both CR and LF are block separators, but
     * in that case, the pair of characters is considered as terminating the
     * preceding paragraph, and a new paragraph will be started by a character
     * coming after the LF.<p>
     *
     * The text is stored internally as an array of characters. Therefore the
     * documentation will refer to indexes of the characters in the text.
     *
     * @param chars contains the text that the Bidi algorithm will be performed
     *        on. This text can be retrieved with <code>getText()</code> or
     *        <code>getTextAsString()</code>.<br>
     *
     * @param paraLevel specifies the default level for the text;
     *        it is typically 0 (LTR) or 1 (RTL).
     *        If the method shall determine the paragraph level from the text,
     *        then <code>paraLevel</code> can be set to
     *        either <code>LEVEL_DEFAULT_LTR</code>
     *        or <code>LEVEL_DEFAULT_RTL</code>; if the text contains multiple
     *        paragraphs, the paragraph level shall be determined separately for
     *        each paragraph; if a paragraph does not include any strongly typed
     *        character, then the desired default is used (0 for LTR or 1 for RTL).
     *        Any other value between 0 and <code>MAX_EXPLICIT_LEVEL</code>
     *        is also valid, with odd levels indicating RTL.
     *
     * @param embeddingLevels (in) may be used to preset the embedding and
     *        override levels, ignoring characters like LRE and PDF in the text.
     *        A level overrides the directional property of its corresponding
     *        (same index) character if the level has the
     *        <code>LEVEL_OVERRIDE</code> bit set.<br><br>
     *        Except for that bit, it must be
     *        <code>paraLevel&lt;=embeddingLevels[]&lt;=MAX_EXPLICIT_LEVEL</code>,
     *        with one exception: a level of zero may be specified for a
     *        paragraph separator even if <code>paraLevel&gt;0</code> when multiple
     *        paragraphs are submitted in the same call to <code>setPara()</code>.<br><br>
     *        <strong>Caution: </strong>A reference to this array, not a copy
     *        of the levels, will be stored in the <code>Bidi</code> object;
     *        the <code>embeddingLevels</code>
     *        should not be modified to avoid unexpected results on subsequent
     *        Bidi operations. However, the <code>setPara()</code> and
     *        <code>setLine()</code> methods may modify some or all of the
     *        levels.<br><br>
     *        <strong>Note:</strong> the <code>embeddingLevels</code> array must
     *        have one entry for each character in <code>chars</code>.
     *
     * @throws IllegalArgumentException if the values in embeddingLevels are
     *         not within the allowed range
     *
     * @see #LEVEL_DEFAULT_LTR
     * @see #LEVEL_DEFAULT_RTL
     * @see #LEVEL_OVERRIDE
     * @see #MAX_EXPLICIT_LEVEL
     */
    public void setPara(char[] chars, byte paraLevel, byte[] embeddingLevels)
    {
        /* check the argument values */
        if (paraLevel < LEVEL_DEFAULT_LTR) {
            verifyRange(paraLevel, 0, MAX_EXPLICIT_LEVEL + 1);
        }
        if (chars == null) {
            chars = new char[0];
        }

        /* special treatment for RUNS_ONLY mode */
        if (reorderingMode == REORDER_RUNS_ONLY) {
            setParaRunsOnly(chars, paraLevel);
            return;
        }

        /* initialize the Bidi object */
        this.paraBidi = null;           /* mark unfinished setPara */
        this.text = chars;
        this.length = this.originalLength = this.resultLength = text.length;
        this.paraLevel = paraLevel;
        this.direction = (byte)(paraLevel & 1);
        this.paraCount = 1;

        /* Allocate zero-length arrays instead of setting to null here; then
         * checks for null in various places can be eliminated.
         */
        dirProps = new byte[0];
        levels = new byte[0];
        runs = new BidiRun[0];
        isGoodLogicalToVisualRunsMap = false;
        insertPoints.size = 0;          /* clean up from last call */
        insertPoints.confirmed = 0;     /* clean up from last call */

        /*
         * Save the original paraLevel if contextual; otherwise, set to 0.
         */
        defaultParaLevel = IsDefaultLevel(paraLevel) ? paraLevel : 0;

        if (length == 0) {
            /*
             * For an empty paragraph, create a Bidi object with the paraLevel and
             * the flags and the direction set but without allocating zero-length arrays.
             * There is nothing more to do.
             */
            if (IsDefaultLevel(paraLevel)) {
                this.paraLevel &= 1;
                defaultParaLevel = 0;
            }
            flags = DirPropFlagLR(paraLevel);
            runCount = 0;
            paraCount = 0;
            setParaSuccess();
            return;
        }

        runCount = -1;

        /*
         * Get the directional properties,
         * the flags bit-set, and
         * determine the paragraph level if necessary.
         */
        getDirPropsMemory(length);
        dirProps = dirPropsMemory;
        getDirProps();
        /* the processed length may have changed if OPTION_STREAMING is set */
        trailingWSStart = length;       /* the levels[] will reflect the WS run */

        /* are explicit levels specified? */
        if (embeddingLevels == null) {
            /* no: determine explicit levels according to the (Xn) rules */
            getLevelsMemory(length);
            levels = levelsMemory;
            direction = resolveExplicitLevels();
        } else {
            /* set BN for all explicit codes, check that all levels are 0 or paraLevel..MAX_EXPLICIT_LEVEL */
            levels = embeddingLevels;
            direction = checkExplicitLevels();
        }

        /* allocate isolate memory */
        if (isolateCount > 0) {
            if (isolates == null || isolates.length < isolateCount)
                isolates = new Isolate[isolateCount + 3];   /* keep some reserve */
        }
        isolateCount = -1;              /* current isolates stack entry == none */

        /*
         * The steps after (X9) in the Bidi algorithm are performed only if
         * the paragraph text has mixed directionality!
         */
        switch (direction) {
        case LTR:
            /* all levels are implicitly at paraLevel (important for getLevels()) */
            trailingWSStart = 0;
            break;
        case RTL:
            /* all levels are implicitly at paraLevel (important for getLevels()) */
            trailingWSStart = 0;
            break;
        default:
            /*
             * Choose the right implicit state table
             */
            switch(reorderingMode) {
            case REORDER_DEFAULT:
                this.impTabPair = impTab_DEFAULT;
                break;
            case REORDER_NUMBERS_SPECIAL:
                this.impTabPair = impTab_NUMBERS_SPECIAL;
                break;
            case REORDER_GROUP_NUMBERS_WITH_R:
                this.impTabPair = impTab_GROUP_NUMBERS_WITH_R;
                break;
            case REORDER_RUNS_ONLY:
                /* we should never get here */
                throw new InternalError("Internal ICU error in setPara");
                /* break; */
            case REORDER_INVERSE_NUMBERS_AS_L:
                this.impTabPair = impTab_INVERSE_NUMBERS_AS_L;
                break;
            case REORDER_INVERSE_LIKE_DIRECT:
                if ((reorderingOptions & OPTION_INSERT_MARKS) != 0) {
                    this.impTabPair = impTab_INVERSE_LIKE_DIRECT_WITH_MARKS;
                } else {
                    this.impTabPair = impTab_INVERSE_LIKE_DIRECT;
                }
                break;
            case REORDER_INVERSE_FOR_NUMBERS_SPECIAL:
                if ((reorderingOptions & OPTION_INSERT_MARKS) != 0) {
                    this.impTabPair = impTab_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS;
                } else {
                    this.impTabPair = impTab_INVERSE_FOR_NUMBERS_SPECIAL;
                }
                break;
            }
            /*
             * If there are no external levels specified and there
             * are no significant explicit level codes in the text,
             * then we can treat the entire paragraph as one run.
             * Otherwise, we need to perform the following rules on runs of
             * the text with the same embedding levels. (X10)
             * "Significant" explicit level codes are ones that actually
             * affect non-BN characters.
             * Examples for "insignificant" ones are empty embeddings
             * LRE-PDF, LRE-RLE-PDF-PDF, etc.
             */
            if (embeddingLevels == null && paraCount <= 1 &&
                (flags & DirPropFlagMultiRuns) == 0) {
                resolveImplicitLevels(0, length,
                        GetLRFromLevel(GetParaLevelAt(0)),
                        GetLRFromLevel(GetParaLevelAt(length - 1)));
            } else {
                /* sor, eor: start and end types of same-level-run */
                int start, limit = 0;
                byte level, nextLevel;
                short sor, eor;

                /* determine the first sor and set eor to it because of the loop body (sor=eor there) */
                level = GetParaLevelAt(0);
                nextLevel = levels[0];
                if (level < nextLevel) {
                    eor = GetLRFromLevel(nextLevel);
                } else {
                    eor = GetLRFromLevel(level);
                }

                do {
                    /* determine start and limit of the run (end points just behind the run) */

                    /* the values for this run's start are the same as for the previous run's end */
                    start = limit;
                    level = nextLevel;
                    if ((start > 0) && (dirProps[start - 1] == B)) {
                        /* except if this is a new paragraph, then set sor = para level */
                        sor = GetLRFromLevel(GetParaLevelAt(start));
                    } else {
                        sor = eor;
                    }

                    /* search for the limit of this run */
                    while ((++limit < length) &&
                           ((levels[limit] == level) ||
                            ((DirPropFlag(dirProps[limit]) & MASK_BN_EXPLICIT) != 0))) {}

                    /* get the correct level of the next run */
                    if (limit < length) {
                        nextLevel = levels[limit];
                    } else {
                        nextLevel = GetParaLevelAt(length - 1);
                    }

                    /* determine eor from max(level, nextLevel); sor is last run's eor */
                    if (NoOverride(level) < NoOverride(nextLevel)) {
                        eor = GetLRFromLevel(nextLevel);
                    } else {
                        eor = GetLRFromLevel(level);
                    }

                    /* if the run consists of overridden directional types, then there
                       are no implicit types to be resolved */
                    if ((level & LEVEL_OVERRIDE) == 0) {
                        resolveImplicitLevels(start, limit, sor, eor);
                    } else {
                        /* remove the LEVEL_OVERRIDE flags */
                        do {
                            levels[start++] &= ~LEVEL_OVERRIDE;
                        } while (start < limit);
                    }
                } while (limit < length);
            }

            /* reset the embedding levels for some non-graphic characters (L1), (X9) */
            adjustWSLevels();

            break;
        }
        /* add RLM for inverse Bidi with contextual orientation resolving
         * to RTL which would not round-trip otherwise
         */
        if ((defaultParaLevel > 0) &&
            ((reorderingOptions & OPTION_INSERT_MARKS) != 0) &&
            ((reorderingMode == REORDER_INVERSE_LIKE_DIRECT) ||
             (reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL))) {
            int start, last;
            byte level;
            byte dirProp;
            for (int i = 0; i < paraCount; i++) {
                last = paras_limit[i] - 1;
                level = paras_level[i];
                if (level == 0)
                    continue;           /* LTR paragraph */
                start = i == 0 ? 0 : paras_limit[i - 1];
                for (int j = last; j >= start; j--) {
                    dirProp = dirProps[j];
                    if (dirProp == L) {
                        if (j < last) {
                            while (dirProps[last] == B) {
                                last--;
                            }
                        }
                        addPoint(last, RLM_BEFORE);
                        break;
                    }
                    if ((DirPropFlag(dirProp) & MASK_R_AL) != 0) {
                        break;
                    }
                }
            }
        }

        if ((reorderingOptions & OPTION_REMOVE_CONTROLS) != 0) {
            resultLength -= controlCount;
        } else {
            resultLength += insertPoints.size;
        }
        setParaSuccess();
    }

    /**
     * Perform the Unicode Bidi algorithm on a given paragraph, as defined in the
     * <a href="http://www.unicode.org/unicode/reports/tr9/">Unicode Standard Annex #9</a>,
     * version 13,
     * also described in The Unicode Standard, Version 4.0.<p>
     *
     * This method takes a paragraph of text and computes the
     * left-right-directionality of each character. The text should not
     * contain any Unicode block separators.<p>
     *
     * The RUN_DIRECTION attribute in the text, if present, determines the base
     * direction (left-to-right or right-to-left).
     * If not present, the base
     * direction is computed using the Unicode Bidirectional Algorithm,
     * defaulting to left-to-right if there are no strong directional characters
     * in the text. This attribute, if present, must be applied to all the text
     * in the paragraph.<p>
     *
     * The BIDI_EMBEDDING attribute in the text, if present, represents
     * embedding level information. Negative values from -1 to -62 indicate
     * overrides at the absolute value of the level. Positive values from 1 to
     * 62 indicate embeddings. Where values are zero or not defined, the base
     * embedding level as determined by the base direction is assumed.<p>
     *
     * The NUMERIC_SHAPING attribute in the text, if present, converts European
     * digits to other decimal digits before running the bidi algorithm. This
     * attribute, if present, must be applied to all the text in the paragraph.
     *
     * If the entire text is all of the same directionality, then
     * the method may not perform all the steps described by the algorithm,
     * i.e., some levels may not be the same as if all steps were performed.
     * This is not relevant for unidirectional text.<br>
     * For example, in pure LTR text with numbers the numbers would get
     * a resolved level of 2 higher than the surrounding text according to
     * the algorithm. This implementation may set all resolved levels to
     * the same value in such a case.<p>
     *
     * @param paragraph a paragraph of text with optional character and
     *        paragraph attribute information
     */
    public void setPara(AttributedCharacterIterator paragraph)
    {
        byte paraLvl;
        Boolean runDirection = (Boolean) paragraph.getAttribute(TextAttribute.RUN_DIRECTION);
        if (runDirection == null) {
            paraLvl = LEVEL_DEFAULT_LTR;
        } else {
            paraLvl = (runDirection.equals(TextAttribute.RUN_DIRECTION_LTR)) ?
                        LTR : RTL;
        }

        byte[] lvls = null;
        int len = paragraph.getEndIndex() - paragraph.getBeginIndex();
        byte[] embeddingLevels = new byte[len];
        char[] txt = new char[len];
        int i = 0;
        char ch = paragraph.first();
        while (ch != AttributedCharacterIterator.DONE) {
            txt[i] = ch;
            Integer embedding = (Integer) paragraph.getAttribute(TextAttribute.BIDI_EMBEDDING);
            if (embedding != null) {
                byte level = embedding.byteValue();
                if (level == 0) {
                    /* no-op */
                } else if (level < 0) {
                    lvls = embeddingLevels;
                    embeddingLevels[i] = (byte)((0 - level) | LEVEL_OVERRIDE);
                } else {
                    lvls = embeddingLevels;
                    embeddingLevels[i] = level;
                }
            }
            ch = paragraph.next();
            ++i;
        }

        NumericShaper shaper = (NumericShaper) paragraph.getAttribute(TextAttribute.NUMERIC_SHAPING);
        if (shaper != null) {
            shaper.shape(txt, 0, len);
        }
        setPara(txt, paraLvl, lvls);
    }

    /**
     * Specify whether block separators must be allocated level zero,
     * so that successive paragraphs will progress from left to right.
     * This method must be called before <code>setPara()</code>.
     * Paragraph separators (B) may appear in the text. Setting them to level zero
     * means that all paragraph separators (including one possibly appearing
     * in the last text position) are kept in the reordered text after the text
     * that they follow in the source text.
     * When this feature is not enabled, a paragraph separator at the last
     * position of the text before reordering will go to the first position
     * of the reordered text when the paragraph level is odd.
     *
     * @param ordarParaLTR specifies whether paragraph separators (B) must
     * receive level 0, so that successive paragraphs progress from left to right.
     *
     * @see #setPara
     */
    public void orderParagraphsLTR(boolean ordarParaLTR) {
        orderParagraphsLTR = ordarParaLTR;
    }

    /**
     * Is this <code>Bidi</code> object set to allocate level 0 to block
     * separators so that successive paragraphs progress from left to right?
     *
     * @return <code>true</code> if the <code>Bidi</code> object is set to
     *         allocate level 0 to block separators.
     */
    public boolean isOrderParagraphsLTR() {
        return orderParagraphsLTR;
    }

    /**
     * Get the directionality of the text.
     *
     * @return a value of <code>LTR</code>, <code>RTL</code> or <code>MIXED</code>
     *         that indicates if the entire text
     *         represented by this object is unidirectional,
     *         and which direction, or if it is mixed-directional.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     *
     * @see #LTR
     * @see #RTL
     * @see #MIXED
     */
    public byte getDirection()
    {
        verifyValidParaOrLine();
        return direction;
    }

    /**
     * Get the text.
     *
     * @return A <code>String</code> containing the text that the
     *         <code>Bidi</code> object was created for.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     *
     * @see #setPara
     * @see #setLine
     */
    public String getTextAsString()
    {
        verifyValidParaOrLine();
        return new String(text);
    }

    /**
     * Get the text.
     *
     * @return A <code>char</code> array containing the text that the
     *         <code>Bidi</code> object was created for.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     *
     * @see #setPara
     * @see #setLine
     */
    public char[] getText()
    {
        verifyValidParaOrLine();
        return text;
    }

    /**
     * Get the length of the text.
     *
     * @return The length of the text that the <code>Bidi</code> object was
     *         created for.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     */
    public int getLength()
    {
        verifyValidParaOrLine();
        return originalLength;
    }

    /**
     * Get the length of the source text processed by the last call to
     * <code>setPara()</code>. This length may be different from the length of
     * the source text if option <code>OPTION_STREAMING</code> has been
     * set.
     * <br>
     * Note that whenever the length of the text affects the execution or the
     * result of a method, it is the processed length which must be considered,
     * except for <code>setPara</code> (which receives unprocessed source text)
     * and <code>getLength</code> (which returns the original length of the
     * source text).<br>
     * In particular, the processed length is the one to consider in the
     * following cases:
     * <ul>
     * <li>maximum value of the <code>limit</code> argument of
     *       <code>setLine</code></li>
     * <li>maximum value of the <code>charIndex</code> argument of
     *       <code>getParagraph</code></li>
     * <li>maximum value of the <code>charIndex</code> argument of
     *       <code>getLevelAt</code></li>
     * <li>number of elements in the array returned by <code>getLevels</code>
     *       </li>
     * <li>maximum value of the <code>logicalStart</code> argument of
     *       <code>getLogicalRun</code></li>
     * <li>maximum value of the <code>logicalIndex</code> argument of
     *       <code>getVisualIndex</code></li>
     * <li>number of elements returned by <code>getLogicalMap</code></li>
     * <li>length of text processed by <code>writeReordered</code></li>
     * </ul>
     *
     * @return The length of the part of the source text processed by
     *         the last call to <code>setPara</code>.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     *
     * @see #setPara
     * @see #OPTION_STREAMING
     */
    public int getProcessedLength() {
        verifyValidParaOrLine();
        return length;
    }

    /**
     * Get the length of the reordered text resulting from the last call to
     * <code>setPara()</code>. This length may be different from the length
     * of the source text if option <code>OPTION_INSERT_MARKS</code>
     * or option <code>OPTION_REMOVE_CONTROLS</code> has been set.
     * <br>
     * This resulting length is the one to consider in the following cases:
     * <ul>
     * <li>maximum value of the <code>visualIndex</code> argument of
     *       <code>getLogicalIndex</code></li>
     * <li>number of elements returned by <code>getVisualMap</code></li>
     * </ul>
     * Note that this length stays identical to the source text length if
     * Bidi marks are inserted or removed using option bits of
     * <code>writeReordered</code>, or if option
     * <code>REORDER_INVERSE_NUMBERS_AS_L</code> has been set.
     *
     * @return The length of the reordered text resulting from
     *         the last call to <code>setPara</code>.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     *
     * @see #setPara
     * @see #OPTION_INSERT_MARKS
     * @see #OPTION_REMOVE_CONTROLS
     * @see #REORDER_INVERSE_NUMBERS_AS_L
     */
    public int getResultLength() {
        verifyValidParaOrLine();
        return resultLength;
    }

    /* paragraphs API methods ------------------------------------------------- */

    /**
     * Get the paragraph level of the text.
     *
     * @return The paragraph level. If there are multiple paragraphs, their
     *         level may vary if the required paraLevel is LEVEL_DEFAULT_LTR or
     *         LEVEL_DEFAULT_RTL.  In that case, the level of the first paragraph
     *         is returned.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     *
     * @see #LEVEL_DEFAULT_LTR
     * @see #LEVEL_DEFAULT_RTL
     * @see #getParagraph
     * @see #getParagraphByIndex
     */
    public byte getParaLevel()
    {
        verifyValidParaOrLine();
        return paraLevel;
    }

    /**
     * Get the number of paragraphs.
     *
     * @return The number of paragraphs.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     */
    public int countParagraphs()
    {
        verifyValidParaOrLine();
        return paraCount;
    }

    /**
     * Get a paragraph, given the index of this paragraph.
     *
     * This method returns information about a paragraph.<p>
     *
     * @param paraIndex is the number of the paragraph, in the
     *        range <code>[0..countParagraphs()-1]</code>.
     *
     * @return a BidiRun object with the details of the paragraph:<br>
     *        <code>start</code> will receive the index of the first character
     *        of the paragraph in the text.<br>
     *        <code>limit</code> will receive the limit of the paragraph.<br>
     *        <code>embeddingLevel</code> will receive the level of the paragraph.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if paraIndex is not in the range
     *        <code>[0..countParagraphs()-1]</code>
     *
     * @see android.icu.text.BidiRun
     */
    public BidiRun getParagraphByIndex(int paraIndex)
    {
        verifyValidParaOrLine();
        verifyRange(paraIndex, 0, paraCount);

        Bidi bidi = paraBidi;             /* get Para object if Line object */
        int paraStart;
        if (paraIndex == 0) {
            paraStart = 0;
        } else {
            paraStart = bidi.paras_limit[paraIndex - 1];
        }
        BidiRun bidiRun = new BidiRun();
        bidiRun.start = paraStart;
        bidiRun.limit = bidi.paras_limit[paraIndex];
        bidiRun.level = GetParaLevelAt(paraStart);
        return bidiRun;
    }

    /**
     * Get a paragraph, given a position within the text.
     * This method returns information about a paragraph.<br>
     * Note: if the paragraph index is known, it is more efficient to
     * retrieve the paragraph information using getParagraphByIndex().<p>
     *
     * @param charIndex is the index of a character within the text, in the
     *        range <code>[0..getProcessedLength()-1]</code>.
     *
     * @return a BidiRun object with the details of the paragraph:<br>
     *        <code>start</code> will receive the index of the first character
     *        of the paragraph in the text.<br>
     *        <code>limit</code> will receive the limit of the paragraph.<br>
     *        <code>embeddingLevel</code> will receive the level of the paragraph.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if charIndex is not within the legal range
     *
     * @see android.icu.text.BidiRun
     * @see #getParagraphByIndex
     * @see #getProcessedLength
     */
    public BidiRun getParagraph(int charIndex)
    {
        verifyValidParaOrLine();
        Bidi bidi = paraBidi;             /* get Para object if Line object */
        verifyRange(charIndex, 0, bidi.length);
        int paraIndex;
        for (paraIndex = 0; charIndex >= bidi.paras_limit[paraIndex]; paraIndex++) {
        }
        return getParagraphByIndex(paraIndex);
    }

    /**
     * Get the index of a paragraph, given a position within the text.<p>
     *
     * @param charIndex is the index of a character within the text, in the
     *        range <code>[0..getProcessedLength()-1]</code>.
     *
     * @return The index of the paragraph containing the specified position,
     *         starting from 0.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if charIndex is not within the legal range
     *
     * @see android.icu.text.BidiRun
     * @see #getProcessedLength
     */
    public int getParagraphIndex(int charIndex)
    {
        verifyValidParaOrLine();
        Bidi bidi = paraBidi;             /* get Para object if Line object */
        verifyRange(charIndex, 0, bidi.length);
        int paraIndex;
        for (paraIndex = 0; charIndex >= bidi.paras_limit[paraIndex]; paraIndex++) {
        }
        return paraIndex;
    }

    /**
     * Set a custom Bidi classifier used by the UBA implementation for Bidi
     * class determination.
     *
     * @param classifier A new custom classifier. This can be null.
     *
     * @see #getCustomClassifier
     */
    public void setCustomClassifier(BidiClassifier classifier) {
        this.customClassifier = classifier;
    }

    /**
     * Gets the current custom class classifier used for Bidi class
     * determination.
     *
     * @return An instance of class <code>BidiClassifier</code>
     *
     * @see #setCustomClassifier
     */
    public BidiClassifier getCustomClassifier() {
        return this.customClassifier;
    }

    /**
     * Retrieves the Bidi class for a given code point.
     * <p>If a <code>BidiClassifier</code> is defined and returns a value
     * other than <code>CLASS_DEFAULT=UCharacter.getIntPropertyMaxValue(UProperty.BIDI_CLASS)+1</code>,
     * that value is used; otherwise the default class determination mechanism is invoked.
     *
     * @param c The code point to get a Bidi class for.
     *
     * @return The Bidi class for the character <code>c</code> that is in effect
     *         for this <code>Bidi</code> instance.
     *
     * @see BidiClassifier
     */
    public int getCustomizedClass(int c) {
        int dir;

        if (customClassifier == null ||
                (dir = customClassifier.classify(c)) == Bidi.CLASS_DEFAULT) {
            dir = bdp.getClass(c);
        }
        if (dir >= UCharacterDirection.CHAR_DIRECTION_COUNT)
            dir = ON;
        return dir;
    }

    /**
     * <code>setLine()</code> returns a <code>Bidi</code> object to
     * contain the reordering information, especially the resolved levels,
     * for all the characters in a line of text.
     * This line of text is
     * specified by referring to a <code>Bidi</code> object representing
     * this information for a piece of text containing one or more paragraphs,
     * and by specifying a range of indexes in this text.<p>
     * In the new line object, the indexes will range from 0 to <code>limit-start-1</code>.<p>
     *
     * This is used after calling <code>setPara()</code>
     * for a piece of text, and after line-breaking on that text.
     * It is not necessary if each paragraph is treated as a single line.<p>
     *
     * After line-breaking, rules (L1) and (L2) for the treatment of
     * trailing WS and for reordering are performed on
     * a <code>Bidi</code> object that represents a line.<p>
     *
     * <strong>Important: </strong>the line <code>Bidi</code> object may
     * reference data within the global text <code>Bidi</code> object.
     * You should not alter the content of the global text object until
     * you are finished using the line object.
     *
     * @param start is the line's first index into the text.
     *
     * @param limit is just behind the line's last index into the text
     *        (its last index +1).
     *
     * @return a <code>Bidi</code> object that will now represent a line of the text.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code>
     * @throws IllegalArgumentException if start and limit are not in the range
     *         <code>0<=start<limit<=getProcessedLength()</code>,
     *         or if the specified line crosses a paragraph boundary
     *
     * @see #setPara
     * @see #getProcessedLength
     */
    public Bidi setLine(int start, int limit)
    {
        verifyValidPara();
        verifyRange(start, 0, limit);
        verifyRange(limit, 0, length+1);
        if (getParagraphIndex(start) != getParagraphIndex(limit - 1)) {
            /* the line crosses a paragraph boundary */
            throw new IllegalArgumentException();
        }
        return BidiLine.setLine(this, start, limit);
    }

    /**
     * Get the level for one character.
     *
     * @param charIndex the index of a character.
     *
     * @return The level for the character at <code>charIndex</code>.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if charIndex is not in the range
     *         <code>0<=charIndex<getProcessedLength()</code>
     *
     * @see #getProcessedLength
     */
    public byte getLevelAt(int charIndex)
    {
        verifyValidParaOrLine();
        verifyRange(charIndex, 0, length);
        return BidiLine.getLevelAt(this, charIndex);
    }

    /**
     * Get an array of levels for each character.<p>
     *
     * Note that this method may allocate memory under some
     * circumstances, unlike <code>getLevelAt()</code>.
     *
     * @return The levels array for the text,
     *         or <code>null</code> if an error occurs.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     */
    public byte[] getLevels()
    {
        verifyValidParaOrLine();
        if (length <= 0) {
            return new byte[0];
        }
        return BidiLine.getLevels(this);
    }

    /**
     * Get a logical run.
     * This method returns information about a run and is used
     * to retrieve runs in logical order.<p>
     * This is especially useful for line-breaking on a paragraph.
     *
     * @param logicalPosition is a logical position within the source text.
     *
     * @return a BidiRun object filled with <code>start</code> containing
     *        the first character of the run, <code>limit</code> containing
     *        the limit of the run, and <code>embeddingLevel</code> containing
     *        the level of the run.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if logicalPosition is not in the range
     *         <code>0<=logicalPosition<getProcessedLength()</code>
     *
     * @see android.icu.text.BidiRun
     * @see android.icu.text.BidiRun#getStart()
     * @see android.icu.text.BidiRun#getLimit()
     * @see android.icu.text.BidiRun#getEmbeddingLevel()
     */
    public BidiRun getLogicalRun(int logicalPosition)
    {
        verifyValidParaOrLine();
        verifyRange(logicalPosition, 0, length);
        return BidiLine.getLogicalRun(this, logicalPosition);
    }

    /**
     * Get the number of runs.
     * This method may invoke the actual reordering on the
     * <code>Bidi</code> object, after <code>setPara()</code>
     * may have resolved only the levels of the text. Therefore,
     * <code>countRuns()</code> may have to allocate memory,
     * and may throw an exception if it fails to do so.
     *
     * @return The number of runs.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     */
    public int countRuns()
    {
        verifyValidParaOrLine();
        BidiLine.getRuns(this);
        return runCount;
    }

    /**
     *
     * Get a <code>BidiRun</code> object according to its index. BidiRun methods
     * may be used to retrieve the run's logical start, length and level,
     * which can be even for an LTR run or odd for an RTL run.
     * In an RTL run, the character at the logical start is
     * visually on the right of the displayed run.
     * The length is the number of characters in the run.<p>
     * <code>countRuns()</code> is normally called
     * before the runs are retrieved.
     *
     * <p>
     * Example:
     * <pre>
     *  Bidi bidi = new Bidi();
     *  String text = "abc 123 DEFG xyz";
     *  bidi.setPara(text, Bidi.RTL, null);
     *  int i, count=bidi.countRuns(), logicalStart, visualIndex=0, length;
     *  BidiRun run;
     *  for (i = 0; i < count; ++i) {
     *      run = bidi.getVisualRun(i);
     *      logicalStart = run.getStart();
     *      length = run.getLength();
     *      // test the run direction (level parity), not the raw level:
     *      // an LTR run may have any even level, e.g. 2
     *      if (Bidi.LTR == run.getDirection()) {
     *          do { // LTR
     *              show_char(text.charAt(logicalStart++), visualIndex++);
     *          } while (--length > 0);
     *      } else {
     *          logicalStart += length;  // logicalLimit
     *          do { // RTL
     *              show_char(text.charAt(--logicalStart), visualIndex++);
     *          } while (--length > 0);
     *      }
     *  }
     * </pre>
     * <p>
     * Note that in right-to-left runs, code like this places
     * second surrogates before first ones (which is generally a bad idea)
     * and combining characters before base characters.
     * <p>
     * Use of <code>{@link #writeReordered}</code>, optionally with the
     * <code>{@link #KEEP_BASE_COMBINING}</code> option, can be considered in
     * order to avoid these issues.
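     * <p>
     * A minimal sketch of that alternative, reusing the sample text from
     * the example above (the variable name <code>visual</code> is
     * illustrative only):
     * <pre>
     *  Bidi bidi = new Bidi();
     *  bidi.setPara("abc 123 DEFG xyz", Bidi.RTL, null);
     *  // let the Bidi object build the visually ordered text itself,
     *  // keeping combining characters after their base characters
     *  String visual = bidi.writeReordered(Bidi.KEEP_BASE_COMBINING);
     * </pre>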
     *
     * @param runIndex is the number of the run in visual order, in the
     *        range <code>[0..countRuns()-1]</code>.
     *
     * @return a BidiRun object containing the details of the run. The
     *         directionality of the run is
     *         <code>LTR==0</code> or <code>RTL==1</code>,
     *         never <code>MIXED</code>.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if <code>runIndex</code> is not in
     *         the range <code>0<=runIndex<countRuns()</code>
     *
     * @see #countRuns()
     * @see android.icu.text.BidiRun
     * @see android.icu.text.BidiRun#getStart()
     * @see android.icu.text.BidiRun#getLength()
     * @see android.icu.text.BidiRun#getEmbeddingLevel()
     */
    public BidiRun getVisualRun(int runIndex)
    {
        verifyValidParaOrLine();
        BidiLine.getRuns(this);
        verifyRange(runIndex, 0, runCount);
        return BidiLine.getVisualRun(this, runIndex);
    }

    /**
     * Get the visual position from a logical text position.
     * If such a mapping is used many times on the same
     * <code>Bidi</code> object, then calling
     * <code>getLogicalMap()</code> is more efficient.
     * <p>
     * The value returned may be <code>MAP_NOWHERE</code> if there is no
     * visual position because the corresponding text character is a Bidi
     * control removed from output by the option
     * <code>OPTION_REMOVE_CONTROLS</code>.
     * <p>
     * When the visual output is altered by using options of
     * <code>writeReordered()</code> such as <code>INSERT_LRM_FOR_NUMERIC</code>,
     * <code>KEEP_BASE_COMBINING</code>, <code>OUTPUT_REVERSE</code>,
     * <code>REMOVE_BIDI_CONTROLS</code>, the visual position returned may not
     * be correct. It is advised to use, when possible, reordering options
     * such as {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS}.
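     * <p>
     * A minimal sketch of that advice, assuming <code>text</code> and
     * <code>logicalIndex</code> are supplied by the caller (both names are
     * illustrative only):
     * <pre>
     *  Bidi bidi = new Bidi();
     *  // request removal of Bidi controls before the text is analyzed,
     *  // so that the index mapping already accounts for it
     *  bidi.setReorderingOptions(Bidi.OPTION_REMOVE_CONTROLS);
     *  bidi.setPara(text, Bidi.LEVEL_DEFAULT_LTR, null);
     *  int visualIndex = bidi.getVisualIndex(logicalIndex);
     *  if (visualIndex == Bidi.MAP_NOWHERE) {
     *      // the character at logicalIndex is a removed Bidi control
     *  }
     * </pre>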
     * <p>
     * Note that in right-to-left runs, this mapping places
     * second surrogates before first ones (which is generally a bad idea)
     * and combining characters before base characters.
     * Use of <code>{@link #writeReordered}</code>, optionally with the
     * <code>{@link #KEEP_BASE_COMBINING}</code> option can be considered instead
     * of using the mapping, in order to avoid these issues.
     *
     * @param logicalIndex is the index of a character in the text.
     *
     * @return The visual position of this character.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if <code>logicalIndex</code> is not in
     *         the range <code>0<=logicalIndex<getProcessedLength()</code>
     *
     * @see #getLogicalMap
     * @see #getLogicalIndex
     * @see #getProcessedLength
     * @see #MAP_NOWHERE
     * @see #OPTION_REMOVE_CONTROLS
     * @see #writeReordered
     */
    public int getVisualIndex(int logicalIndex)
    {
        verifyValidParaOrLine();
        verifyRange(logicalIndex, 0, length);
        return BidiLine.getVisualIndex(this, logicalIndex);
    }


    /**
     * Get the logical text position from a visual position.
     * If such a mapping is used many times on the same
     * <code>Bidi</code> object, then calling
     * <code>getVisualMap()</code> is more efficient.
     * <p>
     * The value returned may be <code>MAP_NOWHERE</code> if there is no
     * logical position because the corresponding text character is a Bidi
     * mark inserted in the output by option
     * <code>OPTION_INSERT_MARKS</code>.
     * <p>
     * This is the inverse method to <code>getVisualIndex()</code>.
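     * <p>
     * As a sketch of what "inverse" means here, assuming no marks are
     * inserted and no controls are removed (so neither method returns
     * <code>MAP_NOWHERE</code>), a round trip is expected to return the
     * starting index; <code>bidi</code> stands for any object on which
     * <code>setPara()</code> or <code>setLine()</code> has succeeded:
     * <pre>
     *  for (int i = 0; i < bidi.getProcessedLength(); i++) {
     *      int visual = bidi.getVisualIndex(i);
     *      int logical = bidi.getLogicalIndex(visual);
     *      // under the assumptions above, logical == i
     *  }
     * </pre>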
     * <p>
     * When the visual output is altered by using options of
     * <code>writeReordered()</code> such as <code>INSERT_LRM_FOR_NUMERIC</code>,
     * <code>KEEP_BASE_COMBINING</code>, <code>OUTPUT_REVERSE</code>,
     * <code>REMOVE_BIDI_CONTROLS</code>, the logical position returned may not
     * be correct. It is advised to use, when possible, reordering options
     * such as {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS}.
     *
     * @param visualIndex is the visual position of a character.
     *
     * @return The index of this character in the text.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if <code>visualIndex</code> is not in
     *         the range <code>0<=visualIndex<getResultLength()</code>
     *
     * @see #getVisualMap
     * @see #getVisualIndex
     * @see #getResultLength
     * @see #MAP_NOWHERE
     * @see #OPTION_INSERT_MARKS
     * @see #writeReordered
     */
    public int getLogicalIndex(int visualIndex)
    {
        verifyValidParaOrLine();
        verifyRange(visualIndex, 0, resultLength);
        /* we can do the trivial cases without the runs array */
        if (insertPoints.size == 0 && controlCount == 0) {
            if (direction == LTR) {
                return visualIndex;
            }
            else if (direction == RTL) {
                return length - visualIndex - 1;
            }
        }
        BidiLine.getRuns(this);
        return BidiLine.getLogicalIndex(this, visualIndex);
    }

    /**
     * Get a logical-to-visual index map (array) for the characters in the
     * <code>Bidi</code> (paragraph or line) object.
     * <p>
     * Some values in the map may be <code>MAP_NOWHERE</code> if the
     * corresponding text characters are Bidi controls removed from the visual
     * output by the option <code>OPTION_REMOVE_CONTROLS</code>.
     * <p>
     * When the visual output is altered by using options of
     * <code>writeReordered()</code> such as <code>INSERT_LRM_FOR_NUMERIC</code>,
     * <code>KEEP_BASE_COMBINING</code>, <code>OUTPUT_REVERSE</code>,
     * <code>REMOVE_BIDI_CONTROLS</code>, the visual positions returned may not
     * be correct. It is advised to use, when possible, reordering options
     * such as {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS}.
     * <p>
     * Note that in right-to-left runs, this mapping places
     * second surrogates before first ones (which is generally a bad idea)
     * and combining characters before base characters.
     * Use of <code>{@link #writeReordered}</code>, optionally with the
     * <code>{@link #KEEP_BASE_COMBINING}</code> option can be considered instead
     * of using the mapping, in order to avoid these issues.
     *
     * @return an array of <code>getProcessedLength()</code>
     *        indexes which will reflect the reordering of the characters.<br><br>
     *        The index map will result in
     *        <code>indexMap[logicalIndex]==visualIndex</code>, where
     *        <code>indexMap</code> represents the returned array.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     *
     * @see #getVisualMap
     * @see #getVisualIndex
     * @see #getProcessedLength
     * @see #MAP_NOWHERE
     * @see #OPTION_REMOVE_CONTROLS
     * @see #writeReordered
     */
    public int[] getLogicalMap()
    {
        /* countRuns() checks successful call to setPara/setLine */
        countRuns();
        if (length <= 0) {
            return new int[0];
        }
        return BidiLine.getLogicalMap(this);
    }

    /**
     * Get a visual-to-logical index map (array) for the characters in the
     * <code>Bidi</code> (paragraph or line) object.
     * <p>
     * Some values in the map may be <code>MAP_NOWHERE</code> if the
     * corresponding text characters are Bidi marks inserted in the visual
     * output by the option <code>OPTION_INSERT_MARKS</code>.
     * <p>
     * When the visual output is altered by using options of
     * <code>writeReordered()</code> such as <code>INSERT_LRM_FOR_NUMERIC</code>,
     * <code>KEEP_BASE_COMBINING</code>, <code>OUTPUT_REVERSE</code>,
     * <code>REMOVE_BIDI_CONTROLS</code>, the logical positions returned may not
     * be correct. It is advised to use, when possible, reordering options
     * such as {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS}.
     *
     * @return an array of <code>getResultLength()</code>
     *        indexes which will reflect the reordering of the characters.<br><br>
     *        The index map will result in
     *        <code>indexMap[visualIndex]==logicalIndex</code>, where
     *        <code>indexMap</code> represents the returned array.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     *
     * @see #getLogicalMap
     * @see #getLogicalIndex
     * @see #getResultLength
     * @see #MAP_NOWHERE
     * @see #OPTION_INSERT_MARKS
     * @see #writeReordered
     */
    public int[] getVisualMap()
    {
        /* countRuns() checks successful call to setPara/setLine */
        countRuns();
        if (resultLength <= 0) {
            return new int[0];
        }
        return BidiLine.getVisualMap(this);
    }

    /**
     * This is a convenience method that does not use a <code>Bidi</code> object.
     * It is intended to be used for when an application has determined the levels
     * of objects (character sequences) and just needs to have them reordered (L2).
     * This is equivalent to using <code>getLogicalMap()</code> on a
     * <code>Bidi</code> object.
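     * <p>
     * A minimal sketch with hand-picked levels (the array contents are
     * illustrative only): the three level-1 entries form one
     * right-to-left run, which rule (L2) reverses as a block.
     * <pre>
     *  byte[] levels = { 0, 0, 1, 1, 1, 0 };
     *  int[] indexMap = Bidi.reorderLogical(levels);
     *  // indexMap[logicalIndex] == visualIndex;
     *  // for these levels the map should be { 0, 1, 4, 3, 2, 5 }
     * </pre>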
     *
     * @param levels is an array of levels that have been determined by
     *        the application.
     *
     * @return an array of <code>levels.length</code>
     *        indexes which will reflect the reordering of the characters.<p>
     *        The index map will result in
     *        <code>indexMap[logicalIndex]==visualIndex</code>, where
     *        <code>indexMap</code> represents the returned array.
     */
    public static int[] reorderLogical(byte[] levels)
    {
        return BidiLine.reorderLogical(levels);
    }

    /**
     * This is a convenience method that does not use a <code>Bidi</code> object.
     * It is intended for use when an application has already determined the levels
     * of objects (character sequences) and just needs to have them reordered (L2).
     * This is equivalent to using <code>getVisualMap()</code> on a
     * <code>Bidi</code> object.
     *
     * @param levels is an array of levels that have been determined by
     *        the application.
     *
     * @return an array of <code>levels.length</code>
     *        indexes which will reflect the reordering of the characters.<p>
     *        The index map will result in
     *        <code>indexMap[visualIndex]==logicalIndex</code>, where
     *        <code>indexMap</code> represents the returned array.
     */
    public static int[] reorderVisual(byte[] levels)
    {
        return BidiLine.reorderVisual(levels);
    }

    /**
     * Invert an index map.
     * The index mapping of the argument map is inverted and returned as
     * an array of indexes that we will call the inverse map.
     *
     * @param srcMap is an array whose elements define the original mapping
     *        from a source array to a destination array.
     *        Some elements of the source array may have no mapping in the
     *        destination array. In that case, their value will be
     *        the special value <code>MAP_NOWHERE</code>.
     *        All elements must be &gt;=0 or equal to <code>MAP_NOWHERE</code>.
     *        Some elements in the source map may have a value greater than
     *        <code>srcMap.length</code> if the destination array has more
     *        elements than the source array.
     *        There must be no duplicate indexes (two or more elements with the
     *        same value except <code>MAP_NOWHERE</code>).
     *
     * @return an array representing the inverse map, or <code>null</code> if
     *        <code>srcMap</code> is <code>null</code>.
     *        This array has a number of elements equal to 1 + the highest
     *        value in <code>srcMap</code>.
     *        If element with index i in <code>srcMap</code> has a value k different
     *        from <code>MAP_NOWHERE</code>, this means that element i of
     *        the source array maps to element k in the destination array.
     *        The inverse map will have value i in its k-th element.
     *        For all elements of the destination array which do not map to
     *        an element in the source array, the corresponding element in the
     *        inverse map will have a value equal to <code>MAP_NOWHERE</code>.
     *
     * @see #MAP_NOWHERE
     */
    public static int[] invertMap(int[] srcMap)
    {
        if (srcMap == null) {
            return null;
        } else {
            return BidiLine.invertMap(srcMap);
        }
    }

    /*
     * Fields and methods for compatibility with java.text.Bidi (Sun implementation)
     */

    /**
     * Constant indicating base direction is left-to-right.
     */
    public static final int DIRECTION_LEFT_TO_RIGHT = LTR;

    /**
     * Constant indicating base direction is right-to-left.
     */
    public static final int DIRECTION_RIGHT_TO_LEFT = RTL;

    /**
     * Constant indicating that the base direction depends on the first strong
     * directional character in the text according to the Unicode Bidirectional
     * Algorithm.
     * If no strong directional character is present, the base
     * direction is left-to-right.
     */
    public static final int DIRECTION_DEFAULT_LEFT_TO_RIGHT = LEVEL_DEFAULT_LTR;

    /**
     * Constant indicating that the base direction depends on the first strong
     * directional character in the text according to the Unicode Bidirectional
     * Algorithm. If no strong directional character is present, the base
     * direction is right-to-left.
     */
    public static final int DIRECTION_DEFAULT_RIGHT_TO_LEFT = LEVEL_DEFAULT_RTL;

    /**
     * Create Bidi from the given paragraph of text and base direction.
     *
     * @param paragraph a paragraph of text
     * @param flags a collection of flags that control the algorithm. The
     *        algorithm understands the flags DIRECTION_LEFT_TO_RIGHT,
     *        DIRECTION_RIGHT_TO_LEFT, DIRECTION_DEFAULT_LEFT_TO_RIGHT, and
     *        DIRECTION_DEFAULT_RIGHT_TO_LEFT. Other values are reserved.
     * @see #DIRECTION_LEFT_TO_RIGHT
     * @see #DIRECTION_RIGHT_TO_LEFT
     * @see #DIRECTION_DEFAULT_LEFT_TO_RIGHT
     * @see #DIRECTION_DEFAULT_RIGHT_TO_LEFT
     */
    public Bidi(String paragraph, int flags)
    {
        this(paragraph.toCharArray(), 0, null, 0, paragraph.length(), flags);
    }

    /**
     * Create Bidi from the given paragraph of text.<p>
     *
     * The RUN_DIRECTION attribute in the text, if present, determines the base
     * direction (left-to-right or right-to-left). If not present, the base
     * direction is computed using the Unicode Bidirectional Algorithm,
     * defaulting to left-to-right if there are no strong directional characters
     * in the text. This attribute, if present, must be applied to all the text
     * in the paragraph.<p>
     *
     * The BIDI_EMBEDDING attribute in the text, if present, represents
     * embedding level information. Negative values from -1 to -62 indicate
     * overrides at the absolute value of the level.
     * Positive values from 1 to
     * 62 indicate embeddings. Where values are zero or not defined, the base
     * embedding level as determined by the base direction is assumed.<p>
     *
     * The NUMERIC_SHAPING attribute in the text, if present, converts European
     * digits to other decimal digits before running the bidi algorithm. This
     * attribute, if present, must be applied to all the text in the paragraph.<p>
     *
     * Note: this constructor calls setPara() internally.
     *
     * @param paragraph a paragraph of text with optional character and
     *        paragraph attribute information
     */
    public Bidi(AttributedCharacterIterator paragraph)
    {
        this();
        setPara(paragraph);
    }

    /**
     * Create Bidi from the given text, embedding, and direction information.
     * The embeddings array may be null. If present, the values represent
     * embedding level information. Negative values from -1 to -61 indicate
     * overrides at the absolute value of the level. Positive values from 1 to
     * 61 indicate embeddings. Where values are zero, the base embedding level
     * as determined by the base direction is assumed.<p>
     *
     * Note: this constructor calls setPara() internally.
     *
     * @param text an array containing the paragraph of text to process.
     * @param textStart the index into the text array of the start of the
     *        paragraph.
     * @param embeddings an array containing embedding values for each character
     *        in the paragraph. This can be null, in which case it is assumed
     *        that there is no external embedding information.
     * @param embStart the index into the embedding array of the start of the
     *        paragraph.
     * @param paragraphLength the length of the paragraph in the text and
     *        embeddings arrays.
     * @param flags a collection of flags that control the algorithm.
     *        The algorithm understands the flags DIRECTION_LEFT_TO_RIGHT,
     *        DIRECTION_RIGHT_TO_LEFT, DIRECTION_DEFAULT_LEFT_TO_RIGHT, and
     *        DIRECTION_DEFAULT_RIGHT_TO_LEFT. Other values are reserved.
     *
     * @throws IllegalArgumentException if the values in embeddings are
     *         not within the allowed range
     *
     * @see #DIRECTION_LEFT_TO_RIGHT
     * @see #DIRECTION_RIGHT_TO_LEFT
     * @see #DIRECTION_DEFAULT_LEFT_TO_RIGHT
     * @see #DIRECTION_DEFAULT_RIGHT_TO_LEFT
     */
    public Bidi(char[] text,
            int textStart,
            byte[] embeddings,
            int embStart,
            int paragraphLength,
            int flags)
    {
        this();
        // Translate the flag into a paragraph level; any unrecognized
        // value falls back to left-to-right.
        byte paraLvl;
        switch (flags) {
        case DIRECTION_LEFT_TO_RIGHT:
        default:
            paraLvl = LTR;
            break;
        case DIRECTION_RIGHT_TO_LEFT:
            paraLvl = RTL;
            break;
        case DIRECTION_DEFAULT_LEFT_TO_RIGHT:
            paraLvl = LEVEL_DEFAULT_LTR;
            break;
        case DIRECTION_DEFAULT_RIGHT_TO_LEFT:
            paraLvl = LEVEL_DEFAULT_RTL;
            break;
        }
        byte[] paraEmbeddings;
        if (embeddings == null) {
            paraEmbeddings = null;
        } else {
            paraEmbeddings = new byte[paragraphLength];
            byte lev;
            for (int i = 0; i < paragraphLength; i++) {
                lev = embeddings[i + embStart];
                if (lev < 0) {
                    // Negative value: override at the absolute value of the level.
                    lev = (byte)((- lev) | LEVEL_OVERRIDE);
                } else if (lev == 0) {
                    // Zero: use the base level. A default paragraph level
                    // is reduced to its low bit (0 or 1).
                    lev = paraLvl;
                    if (paraLvl > MAX_EXPLICIT_LEVEL) {
                        lev &= 1;
                    }
                }
                paraEmbeddings[i] = lev;
            }
        }
        // Copy the text only when the paragraph is a proper slice of the array.
        if (textStart == 0 && embStart == 0 && paragraphLength == text.length) {
            setPara(text, paraLvl, paraEmbeddings);
        } else {
            char[] paraText = new char[paragraphLength];
            System.arraycopy(text, textStart, paraText, 0, paragraphLength);
            setPara(paraText, paraLvl, paraEmbeddings);
        }
    }

    /**
     * Create a Bidi object representing the bidi information on a line of text
     * within the paragraph represented by the current Bidi.
     * This call is not
     * required if the entire paragraph fits on one line.
     *
     * @param lineStart the offset from the start of the paragraph to the start
     *        of the line.
     * @param lineLimit the offset from the start of the paragraph to the limit
     *        of the line.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code>
     * @throws IllegalArgumentException if lineStart and lineLimit are not in the range
     *         <code>0&lt;=lineStart&lt;lineLimit&lt;=getProcessedLength()</code>,
     *         or if the specified line crosses a paragraph boundary
     */
    public Bidi createLineBidi(int lineStart, int lineLimit)
    {
        return setLine(lineStart, lineLimit);
    }

    /**
     * Return true if the line is not left-to-right or right-to-left. This means
     * it either has mixed runs of left-to-right and right-to-left text, or the
     * base direction differs from the direction of the only run of text.
     *
     * @return true if the line is not left-to-right or right-to-left.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code>
     */
    public boolean isMixed()
    {
        return (!isLeftToRight() && !isRightToLeft());
    }

    /**
     * Return true if the line is all left-to-right text and the base direction
     * is left-to-right.
     *
     * @return true if the line is all left-to-right text and the base direction
     *         is left-to-right.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code>
     */
    public boolean isLeftToRight()
    {
        return (getDirection() == LTR && (paraLevel & 1) == 0);
    }

    /**
     * Return true if the line is all right-to-left text, and the base direction
     * is right-to-left.
     *
     * @return true if the line is all right-to-left text, and the base
     *         direction is right-to-left
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code>
     */
    public boolean isRightToLeft()
    {
        return (getDirection() == RTL && (paraLevel & 1) == 1);
    }

    /**
     * Return true if the base direction is left-to-right.
     *
     * @return true if the base direction is left-to-right
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     */
    public boolean baseIsLeftToRight()
    {
        return (getParaLevel() == LTR);
    }

    /**
     * Return the base level (0 if left-to-right, 1 if right-to-left).
     *
     * @return the base level
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     */
    public int getBaseLevel()
    {
        return getParaLevel();
    }

    /**
     * Return the number of level runs.
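     * <p>
     * A small sketch of how the direction queries and the run count fit together.
     * It uses <code>java.text.Bidi</code> from the JDK as a stand-in so that it
     * runs without ICU on the classpath; that the compatibility methods of this
     * class give the same answers is an assumption, and
     * <code>DirectionQuerySketch</code> is a hypothetical name.

```java
import java.text.Bidi;

final class DirectionQuerySketch {
    // Summarizes a paragraph as "<direction> base=<level> runs=<count>".
    static String describe(String text, int flags) {
        Bidi bidi = new Bidi(text, flags);
        // Neither predicate holds when the text has runs in both directions
        // or when its single run disagrees with the base direction.
        String direction = bidi.isLeftToRight() ? "LTR"
                : bidi.isRightToLeft() ? "RTL"
                : "MIXED";
        return direction + " base=" + bidi.getBaseLevel()
                + " runs=" + bidi.getRunCount();
    }
}
```

     * With a left-to-right base direction, the text
     * <code>"abc \u05D0\u05D1\u05D2 def"</code> is described as
     * <code>MIXED base=0 runs=3</code>.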
     *
     * @return the number of level runs
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     */
    public int getRunCount()
    {
        return countRuns();
    }

    /**
     * Compute the logical to visual run mapping.
     */
    void getLogicalToVisualRunsMap()
    {
        if (isGoodLogicalToVisualRunsMap) {
            return;
        }
        int count = countRuns();
        if ((logicalToVisualRunsMap == null) ||
            (logicalToVisualRunsMap.length < count)) {
            logicalToVisualRunsMap = new int[count];
        }
        int i;
        // Pack each run's logical start (high 32 bits) together with its
        // visual index (low 32 bits), so that sorting the keys orders the
        // runs by logical position.
        long[] keys = new long[count];
        for (i = 0; i < count; i++) {
            keys[i] = ((long)(runs[i].start)<<32) + i;
        }
        Arrays.sort(keys);
        for (i = 0; i < count; i++) {
            // The low 32 bits hold the visual index of the run.
            logicalToVisualRunsMap[i] = (int)(keys[i] & 0xFFFFFFFFL);
        }
        isGoodLogicalToVisualRunsMap = true;
    }

    /**
     * Return the level of the nth logical run in this line.
     *
     * @param run the index of the run, between 0 and <code>countRuns()-1</code>
     *
     * @return the level of the run
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if <code>run</code> is not in
     *         the range <code>0&lt;=run&lt;countRuns()</code>
     */
    public int getRunLevel(int run)
    {
        verifyValidParaOrLine();
        BidiLine.getRuns(this);
        verifyRange(run, 0, runCount);
        getLogicalToVisualRunsMap();
        return runs[logicalToVisualRunsMap[run]].level;
    }

    /**
     * Return the index of the character at the start of the nth logical run in
     * this line, as an offset from the start of the line.
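     * <p>
     * The run accessors are typically used together in a loop over the logical
     * runs. The following is a minimal sketch that uses
     * <code>java.text.Bidi</code> from the JDK as a stand-in so that it runs
     * without ICU; equivalent behavior of the compatibility methods of this class
     * is an assumption, and <code>RunListingSketch</code> is a hypothetical name.

```java
import java.text.Bidi;

final class RunListingSketch {
    // Lists every logical run as "[start,limit)@level", separated by spaces.
    static String listRuns(String text, int flags) {
        Bidi bidi = new Bidi(text, flags);
        StringBuilder out = new StringBuilder();
        for (int run = 0; run < bidi.getRunCount(); run++) {
            if (run > 0) {
                out.append(' ');
            }
            out.append('[').append(bidi.getRunStart(run))
               .append(',').append(bidi.getRunLimit(run))
               .append(")@").append(bidi.getRunLevel(run));
        }
        return out.toString();
    }
}
```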
     *
     * @param run the index of the run, between 0 and <code>countRuns()-1</code>
     *
     * @return the start of the run
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if <code>run</code> is not in
     *         the range <code>0&lt;=run&lt;countRuns()</code>
     */
    public int getRunStart(int run)
    {
        verifyValidParaOrLine();
        BidiLine.getRuns(this);
        verifyRange(run, 0, runCount);
        getLogicalToVisualRunsMap();
        return runs[logicalToVisualRunsMap[run]].start;
    }

    /**
     * Return the index of the character past the end of the nth logical run in
     * this line, as an offset from the start of the line. For example, this
     * will return the length of the line for the last run on the line.
     *
     * @param run the index of the run, between 0 and <code>countRuns()-1</code>
     *
     * @return the limit of the run
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     * @throws IllegalArgumentException if <code>run</code> is not in
     *         the range <code>0&lt;=run&lt;countRuns()</code>
     */
    public int getRunLimit(int run)
    {
        verifyValidParaOrLine();
        BidiLine.getRuns(this);
        verifyRange(run, 0, runCount);
        getLogicalToVisualRunsMap();
        int idx = logicalToVisualRunsMap[run];
        // The run length is the difference between this run's limit and the
        // limit of the run that precedes it in the runs array.
        int len = idx == 0 ? runs[idx].limit :
                                runs[idx].limit - runs[idx-1].limit;
        return runs[idx].start + len;
    }

    /**
     * Return true if the specified text requires bidi analysis. If this returns
     * false, the text will display left-to-right. Clients can then avoid
     * constructing a Bidi object.
     * Text in the Arabic Presentation Forms area of
     * Unicode is presumed to already be shaped and ordered for display, and so
     * will not cause this method to return true.
     *
     * @param text the text containing the characters to test
     * @param start the start of the range of characters to test
     * @param limit the limit of the range of characters to test
     *
     * @return true if the range of characters requires bidi analysis
     */
    public static boolean requiresBidi(char[] text,
            int start,
            int limit)
    {
        final int RTLMask = (1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT |
                1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC |
                1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING |
                1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE |
                1 << UCharacter.DIRECTIONALITY_ARABIC_NUMBER);

        for (int i = start; i < limit; ++i) {
            if (((1 << UCharacter.getDirection(text[i])) & RTLMask) != 0) {
                return true;
            }
        }
        return false;
    }

    /**
     * Reorder the objects in the array into visual order based on their levels.
     * This is a utility method to use when you have a collection of objects
     * representing runs of text in logical order, each run containing text at a
     * single level. The elements at <code>index</code> from
     * <code>objectStart</code> up to <code>objectStart + count</code> in the
     * objects array will be reordered into visual order assuming
     * each run of text has the level indicated by the corresponding element in
     * the levels array (at <code>index - objectStart + levelStart</code>).
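     * <p>
     * A short sketch of the two static helpers in use. It relies on
     * <code>java.text.Bidi</code> from the JDK, which declares methods with the
     * same names and parameter lists, so that it runs without ICU; that the
     * methods of this class behave identically is an assumption, and
     * <code>ReorderVisuallySketch</code> is a hypothetical name.

```java
import java.text.Bidi;

final class ReorderVisuallySketch {
    // Returns a copy of the run objects in visual order. Only the order of the
    // objects changes; the content of each object is left untouched.
    static String[] visualOrder(String[] logicalRuns, byte[] levels) {
        String[] copy = logicalRuns.clone();
        Bidi.reorderVisually(levels, 0, copy, 0, copy.length);
        return copy;
    }

    // Cheap pre-check that lets a caller skip bidi processing altogether.
    static boolean needsBidi(String text) {
        return Bidi.requiresBidi(text.toCharArray(), 0, text.length());
    }
}
```

     * For the objects <code>{"a", "B", "C", "d"}</code> with levels
     * <code>{0, 1, 1, 0}</code> the visual order is
     * <code>{"a", "C", "B", "d"}</code>.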
     *
     * @param levels an array representing the bidi level of each object
     * @param levelStart the start position in the levels array
     * @param objects the array of objects to be reordered into visual order
     * @param objectStart the start position in the objects array
     * @param count the number of objects to reorder
     */
    public static void reorderVisually(byte[] levels,
            int levelStart,
            Object[] objects,
            int objectStart,
            int count)
    {
        byte[] reorderLevels = new byte[count];
        System.arraycopy(levels, levelStart, reorderLevels, 0, count);
        int[] indexMap = reorderVisual(reorderLevels);
        Object[] temp = new Object[count];
        System.arraycopy(objects, objectStart, temp, 0, count);
        for (int i = 0; i < count; ++i) {
            objects[objectStart + i] = temp[indexMap[i]];
        }
    }

    /**
     * Take a <code>Bidi</code> object containing the reordering
     * information for a piece of text (one or more paragraphs) set by
     * <code>setPara()</code> or for a line of text set by <code>setLine()</code>
     * and return a string containing the reordered text.
     *
     * <p>The text may have been aliased (only a reference was stored
     * without copying the contents), thus it must not have been modified
     * since the <code>setPara()</code> call.
     *
     * This method preserves the integrity of characters with multiple
     * code units and (optionally) combining characters.
     * Characters in RTL runs can be replaced by mirror-image characters
     * in the returned string. Note that "real" mirroring has to be done in a
     * rendering engine by glyph selection and that for many "mirrored"
     * characters there are no Unicode characters as mirror-image equivalents.
     * There are also options to insert or remove Bidi control
     * characters; see the descriptions of the return value and the
     * <code>options</code> parameter, and of the option bit flags.
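     * <p>
     * To give a feel for what reordering produces, here is a simplified sketch
     * built only on JDK classes. It is not how this method is implemented: it
     * ignores mirroring, combining characters and Bidi controls, uses
     * <code>java.text.Bidi</code> as a stand-in for this class, and
     * <code>VisualStringSketch</code> is a hypothetical name.

```java
import java.text.Bidi;

final class VisualStringSketch {
    // Builds a visually ordered string from the logical runs of one paragraph.
    static String toVisual(String text, int flags) {
        Bidi bidi = new Bidi(text, flags);
        int count = bidi.getRunCount();
        byte[] levels = new byte[count];
        String[] pieces = new String[count];
        for (int run = 0; run < count; run++) {
            levels[run] = (byte) bidi.getRunLevel(run);
            String piece = text.substring(bidi.getRunStart(run), bidi.getRunLimit(run));
            // Odd levels are right-to-left: reverse the run's characters.
            // StringBuilder.reverse() keeps surrogate pairs in order.
            pieces[run] = (levels[run] & 1) == 0
                    ? piece
                    : new StringBuilder(piece).reverse().toString();
        }
        // Then put the runs themselves into visual order.
        Bidi.reorderVisually(levels, 0, pieces, 0, count);
        return String.join("", pieces);
    }
}
```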
     *
     * @param options A bit set of options for the reordering that control
     *                how the reordered text is written.
     *                The options include mirroring the characters on a code
     *                point basis and inserting LRM characters, which is used
     *                especially for transforming visually stored text
     *                to logically stored text (although this is still an
     *                imperfect implementation of an "inverse Bidi" algorithm
     *                because it uses the "forward Bidi" algorithm at its core).
     *                The available options are:
     *                <code>DO_MIRRORING</code>,
     *                <code>INSERT_LRM_FOR_NUMERIC</code>,
     *                <code>KEEP_BASE_COMBINING</code>,
     *                <code>OUTPUT_REVERSE</code>,
     *                <code>REMOVE_BIDI_CONTROLS</code>,
     *                <code>STREAMING</code>
     *
     * @return The reordered text.
     *         If the <code>INSERT_LRM_FOR_NUMERIC</code> option is set, then
     *         the length of the returned string could be as large as
     *         <code>getLength()+2*countRuns()</code>.<br>
     *         If the <code>REMOVE_BIDI_CONTROLS</code> option is set, then the
     *         length of the returned string may be less than
     *         <code>getLength()</code>.<br>
     *         If none of these options is set, then the length of the returned
     *         string will be exactly <code>getProcessedLength()</code>.
     *
     * @throws IllegalStateException if this call is not preceded by a successful
     *         call to <code>setPara</code> or <code>setLine</code>
     *
     * @see #DO_MIRRORING
     * @see #INSERT_LRM_FOR_NUMERIC
     * @see #KEEP_BASE_COMBINING
     * @see #OUTPUT_REVERSE
     * @see #REMOVE_BIDI_CONTROLS
     * @see #OPTION_STREAMING
     * @see #getProcessedLength
     */
    public String writeReordered(int options)
    {
        verifyValidParaOrLine();
        if (length == 0) {
            /* nothing to do */
            return "";
        }
        return BidiWriter.writeReordered(this, options);
    }

    /**
     * Reverse a Right-To-Left run of Unicode text.
     *
     * This method preserves the integrity of characters with multiple
     * code units and (optionally) combining characters.
     * Characters can be replaced by mirror-image characters
     * in the destination buffer. Note that "real" mirroring has
     * to be done in a rendering engine by glyph selection
     * and that for many "mirrored" characters there are no
     * Unicode characters as mirror-image equivalents.
     * There are also options to insert or remove Bidi control
     * characters.
     *
     * This method is the implementation for reversing RTL runs as part
     * of <code>writeReordered()</code>. For detailed descriptions
     * of the parameters, see there.
     * Since no Bidi controls are inserted here, the output string length
     * will never exceed <code>src.length()</code>.
     *
     * @see #writeReordered
     *
     * @param src The RTL run text.
     *
     * @param options A bit set of options for the reordering that control
     *                how the reordered text is written.
     *                See the <code>options</code> parameter in <code>writeReordered()</code>.
     *
     * @return The reordered text.
     *         If the <code>REMOVE_BIDI_CONTROLS</code> option
     *         is set, then the length of the returned string may be less than
     *         <code>src.length()</code>. If this option is not set,
     *         then the length of the returned string will be exactly
     *         <code>src.length()</code>.
     *
     * @throws IllegalArgumentException if <code>src</code> is null.
     */
    public static String writeReverse(String src, int options)
    {
        /* error checking */
        if (src == null) {
            throw new IllegalArgumentException();
        }

        if (src.length() > 0) {
            return BidiWriter.writeReverse(src, options);
        } else {
            /* nothing to do */
            return "";
        }
    }

}