History log of /frameworks/minikin/libs/minikin/WordBreaker.cpp
Revision	Date	Author	Comments
d99530157d0d7bb621ffd89727583a089311b032	17-Jul-2017	Fredrik Roubert <roubert@google.com>	Update JNI code in preparation for ICU 59 switching to C++11 char16_t. am: a515fd19f9 Change-Id: I2f5de9a1d39691dd81e09d26d1f42ea93bbba108
a515fd19f9607a5abffaaec3db5bee1fc34f9a79	06-Apr-2017	Fredrik Roubert <roubert@google.com>	Update JNI code in preparation for ICU 59 switching to C++11 char16_t. ICU 59 (update pending on the aosp/icu59 branch) has switched to using the C++11 char16_t data type, which is a distinct type from uint16_t (which is what JNI's jchar is typedef'd as), even though they are bitwise identical. All code that passes UTF-16 data between ICU4C and JNI must therefore be updated with typecasts in the appropriate places before ICU 59 is merged to aosp/master. Bug: 37554848 Test: make Change-Id: I537d4abaf9a04033af5910e1252751ee363aeb9a /frameworks/minikin/libs/minikin/WordBreaker.cpp
bab3b98ceb29fa3fc5d8832284312859d7f32cc7	31-Mar-2017	Roozbeh Pournader <roozbeh@google.com>	Override the bidi properties of new emojis Test: new Minikin tests are added, and pass Bug: 32952475 Change-Id: Ibcae60d18d0cd5efd7556aaf58a716b6b59c8ee0 /frameworks/minikin/libs/minikin/WordBreaker.cpp
c97689439cb98ddf46fa279d8088b8c4a5f7b2f4	17-Mar-2017	Roozbeh Pournader <roozbeh@google.com>	Remove workaround for line breaks around currency symbols This is now done properly in ICU so we no longer need to do it ourselves. Also updated some comments about emoji line-breaking. Test: Existing tests for this in Minikin continue to pass. Bug: 24959657 Bug: 27365282 Change-Id: I865ea9ba1e79a64409d84d2d30c121f740e35ad6 /frameworks/minikin/libs/minikin/WordBreaker.cpp
8bdd9b948fc8a55ade32c2d84ff1a6b5be5659e1	16-Mar-2017	Roozbeh Pournader <roozbeh@google.com>	Refactor WordBreaker Refactor WordBreaker to make it ready for more complex behavior. Test: existing unit tests continue to pass Change-Id: Ife758f3e2cf48922ab56109e6c5d3cffa3673feb /frameworks/minikin/libs/minikin/WordBreaker.cpp
c7ef4000c1e840c3d3b66e85a40ebd34a5a2a8ee	18-Feb-2017	Roozbeh Pournader <roozbeh@google.com>	Correct hyphenation for various complex cases This adds better support for Arabic script languages, Armenian, Catalan, Hebrew, Kannada, Malayalam, Polish, Tamil, and Telugu by adding various hyphenation types and edits appropriate for the locales. For Arabic script languages, soft hyphens act transparently with regard to joining: If a line is broken at a soft hyphen where the two characters around the soft hyphen were joining each other before, they will continue to appear joining if the line is broken at the soft hyphen and a hyphen glyph is inserted. This is needed for Central Asian languages such as Uighur. For Armenian, U+058A ARMENIAN HYPHEN is used for line breaks caused by either automatic hyphenation or soft hyphens. For Catalan, nonstandard line breaks are implemented for "l·l", which hyphenates as "l-/l". For Polish, when there is a line break at a hyphen, the hyphen is repeated at the next line. For the South Indic languages, when breaks happen due to soft breaks or automatic hyphenation, no visible hyphen is inserted, although a penalty is added. For Hebrew, support for using U+05BE HEBREW PUNCTUATION MAQAF has been implemented, but it's turned off pending confirmation of desirability. Also, hard hyphens, which previously had no penalty added for breaking the line after them, now have the same penalty as an automatic or soft break, with the difference that no hyphen is inserted when they break. Finally, some bugs have been fixed with hyphenating multiscript and multi-font words. Bug: 19950445 Bug: 19955011 Bug: 25623243 Bug: 26154469 Bug: 26154471 Bug: 33387871 Bug: 33560754 Bug: 33752592 Bug: 33754204 Test: Unit tests added, plus thorough manual testing Change-Id: Iaccf776ce8d1d434ee8b1c534ff3659d80fdc338 /frameworks/minikin/libs/minikin/WordBreaker.cpp
32c12c0ac825453f4c3bba07718cfe684ec90bec	28-Dec-2016	Mark Salyzyn <salyzyn@google.com>	resolve merge conflicts of 2377a00 to master Test: build Bug: 26552300 Bug: 31289077 Change-Id: I6181ae7e84f9bdcbed50841c70d07f6906a10eb7
555d84c6f98eafcbe677cdcb8e9605760acd8ce5	29-Sep-2016	Mark Salyzyn <salyzyn@google.com>	minikin: Replace cutils/log.h with android/log.h or log/log.h - replace cutils/log.h with android/log.h (main buffer logging) - replace cutils/log.h with log/log.h (+SafetyNet logging) - define LOG_TAG before use. Test: compile Bug: 26552300 Bug: 31289077 Change-Id: I7a4803dd66f31b7103e09e5ff5b8fa523fa0fd60 /frameworks/minikin/libs/minikin/WordBreaker.cpp
14e2d136aaef271ba131f917cf5f27baa31ae5ad	09-Jun-2016	Seigo Nonaka <nona@google.com>	Always use minikin namespace. Here is a new policy of the namespace of minikin. - All components should be in minikin namespace. - All tests are also in minikin namespace and no anonymous namespace. Bug: 29233740 Change-Id: I71a8a35049bb8d624f7a78797231e90fed1e2b8c /frameworks/minikin/libs/minikin/WordBreaker.cpp
74b56175e5d41c1c1dc992208842b5576973d452	26-May-2016	Roozbeh Pournader <roozbeh@google.com>	Do not break after Myanmar viramas This is to work around a bug in ICU's line breaker, which thinks there is a valid line break between a Myanmar kinzi and a consonant. See http://bugs.icu-project.org/trac/ticket/12561 for the ICU bug. Bug: 28964845 Change-Id: I076ac15077e5627cbccf6732900bcc60d8596dda /frameworks/minikin/libs/minikin/WordBreaker.cpp
77f488345316fba46c271fc04bea470819ae1712	19-Apr-2016	Seigo Nonaka <nona@google.com>	Do not break before and after ZWJ. The emoji list is generated from external/unicode/emoji-data.txt Bug: 28248662 Change-Id: Ie49b3782505665d62c24371ca23d317ae5e9c5f7 /frameworks/minikin/libs/minikin/WordBreaker.cpp
d8917c69a9f7b7ca52f7ac850922dab4322113f5	16-Mar-2016	Roozbeh Pournader <roozbeh@google.com>	Do not allow line breaks before currency symbols Implement the change proposed in UTC document L2/16-043R (http://www.unicode.org/L2/L2016/16043r-line-break-pr-po.txt) to make sure we do not break between letters and currency symbols. Bug: 24959657 Change-Id: Ia29d0e5625f84870bd910d0c6e19036d17206704 /frameworks/minikin/libs/minikin/WordBreaker.cpp
56840e8006ca2b822adb401fc8a65f3c075cde10	25-Feb-2016	Raph Levien <raph@google.com>	Suppress line breaks in emoji + modifier An emoji base with an emoji modifier renders as a single glyph and thus should not be a line break. Current (Unicode 8) logic does indicate a line break, so we override the results of the ICU line break iterator. The code references a proposal to improve Unicode behavior; when that is adopted and we upgrade ICU accordingly, the special-case code should be deleted, but the tests can remain. Bug: 27343378 Change-Id: I5de9c53e9a34c503816f9131e3d894e6f7a57d13 /frameworks/minikin/libs/minikin/WordBreaker.cpp
d3f45892c721fb1738bf02fe19a5143a320ca4bf	19-Feb-2016	Raph Levien <raph@google.com>	Suppress linebreaks in emoji ZWJ sequences Due to the way emoji ZWJ sequences are defined, the ICU line breaking algorithm determines that there are valid line breaks inside the sequence. This patch suppresses these line breaks. This is an adaptation of I225ebebc0f4186e4b8f48fee399c4a62b3f0218a into the nyc-dev branch. Bug: 25433289 Change-Id: I84b50b1e6ef13d436965eab389659d02a30d100f /frameworks/minikin/libs/minikin/WordBreaker.cpp
c88ef135fcc2661ec7addc171ebc60787df38aff	09-Sep-2015	Raph Levien <raph@google.com>	Add penalty for breaks in URLs and email addresses Recent changes have added special cases for line breaks within URLs and email addresses. Such breaks are undesirable when they can be avoided, but at other times are needed to avoid huge gaps, or indeed to make the line fit at all. This patch assigns a penalty for such breaks, equal to the hyphenation penalty. The mechanism is currently very simple, but would be easy to fine-tune based on more detailed information about break quality. Bug: 20126487 Bug: 20566159 Change-Id: I0d3323897737a2850f1e734fa17b96b065eabd9c /frameworks/minikin/libs/minikin/WordBreaker.cpp
6d15657e4a3826d4d47d5358f1dde211484527e9	09-Sep-2015	Raph Levien <raph@google.com>	Add line breaks to email addresses and URLs This change adds acceptable line breaks according to sections 7.42 (Dividing URLs and e-mail addresses) and 14.12 (URLs or DOIs and line breaks) of the Chicago Manual of Style (16th ed.). In general, these place breaks before punctuation symbols and suppress them after hyphens. Bug: 20126487 Bug: 20566159 Change-Id: I2d07d516b920a506a2f718c38fb435c5eb1ee1f8 /frameworks/minikin/libs/minikin/WordBreaker.cpp
9c4cc648abcae144f3b99d612e58ef01d5e52cce	09-Sep-2015	Raph Levien <raph@google.com>	Special-case URLs and email addresses for line breaking Detect URLs and email addresses, and suppress both line breaking and hyphenation within them. Bug: 20126487 Bug: 20566159 Change-Id: I43629347a063dcf579e355e5b678d7195f453ad9 /frameworks/minikin/libs/minikin/WordBreaker.cpp
57b6dae9894b9362ef04517ff477fd491f9d433b	05-Sep-2015	Raph Levien <raph@google.com>	Refine hyphenation around punctuation Implement a WordBreaker that defines our concept of valid word boundaries, customizing the ICU behavior. Currently, we suppress line breaks at soft hyphens (these are handled specially). Also, the new WordBreaker class has methods that determine the start and end of the word (punctuation stripped) for the purpose of hyphenation. This patch, in its current form, doesn't handle email addresses and URLs specially, but the WordBreaker class is the correct place to do so. Also, special case handling of hyphens and dashes is still done in LineBreaker, but all of that should be moved to WordBreaker. Bug: 20126487 Bug: 20566159 Change-Id: I492cbad963f9b74a2915f010dad46bb91f97b2fe /frameworks/minikin/libs/minikin/WordBreaker.cpp