History log of /frameworks/minikin/libs/minikin/WordBreaker.cpp
Revision Date Author Comments
bab3b98ceb29fa3fc5d8832284312859d7f32cc7 31-Mar-2017 Roozbeh Pournader <roozbeh@google.com> Override the bidi properties of new emojis

Test: new Minikin tests are added, and pass
Bug: 32952475
Change-Id: Ibcae60d18d0cd5efd7556aaf58a716b6b59c8ee0
/frameworks/minikin/libs/minikin/WordBreaker.cpp
c97689439cb98ddf46fa279d8088b8c4a5f7b2f4 17-Mar-2017 Roozbeh Pournader <roozbeh@google.com> Remove workaround for line breaks around currency symbols

This is now done properly in ICU so we no longer need to do it ourselves.

Also updated some comments about emoji line-breaking.

Test: Existing tests for this in Minikin continue to pass.
Bug: 24959657
Bug: 27365282
Change-Id: I865ea9ba1e79a64409d84d2d30c121f740e35ad6
/frameworks/minikin/libs/minikin/WordBreaker.cpp
8bdd9b948fc8a55ade32c2d84ff1a6b5be5659e1 16-Mar-2017 Roozbeh Pournader <roozbeh@google.com> Refactor WordBreaker

Refactor WordBreaker to make it ready for more complex behavior.

Test: existing unit tests continue to pass
Change-Id: Ife758f3e2cf48922ab56109e6c5d3cffa3673feb
/frameworks/minikin/libs/minikin/WordBreaker.cpp
c7ef4000c1e840c3d3b66e85a40ebd34a5a2a8ee 18-Feb-2017 Roozbeh Pournader <roozbeh@google.com> Correct hyphenation for various complex cases

This adds better support for Arabic script languages, Armenian,
Catalan, Hebrew, Kannada, Malayalam, Polish, Tamil, and Telugu by
adding various hyphenation types and edits appropriate for the
locales.

For Arabic script languages, soft hyphens act transparently with
regard to joining: if the two characters around a soft hyphen were
joining each other before, they continue to appear joined when the
line is broken at the soft hyphen and a hyphen glyph is inserted.
This is needed for Central Asian languages such as Uighur.

For Armenian, U+058A ARMENIAN HYPHEN is used for line breaks caused
by either automatic hyphenation or soft hyphens.

For Catalan, nonstandard line breaks are implemented for "l·l", which
hyphenates as "l-/l".
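
The Catalan rule can be sketched as follows. This is only an illustration, not the Minikin code: the function name is hypothetical, it works on UTF-32 text, and it handles only the lowercase sequence and its first occurrence.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not part of Minikin): break a word at the Catalan
// "l·l" sequence. The middle dot (U+00B7) is dropped, the first line ends in
// "l-" and the next line starts with "l". Both parts are empty when the word
// contains no "l·l".
std::pair<std::u32string, std::u32string> breakCatalanEllGeminada(
        const std::u32string& word) {
    const std::u32string pattern = U"l\u00B7l";
    const std::size_t pos = word.find(pattern);
    if (pos == std::u32string::npos) {
        return {};
    }
    std::u32string first = word.substr(0, pos + 1);  // up to the first 'l'
    first += U'-';                                   // visible hyphen replaces the dot
    std::u32string second = word.substr(pos + 2);    // starts at the second 'l'
    return {first, second};
}
```

For "col·lecció" this yields "col-" and "lecció".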

For Polish, when there is a line break at a hyphen, the hyphen is
repeated at the start of the next line.
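
A minimal sketch of the Polish behavior, assuming the caller already knows the index of the hard hyphen. The function name and signature are hypothetical and are not taken from Minikin.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not part of Minikin): split a word at a hard hyphen
// the Polish way. The hyphen stays at the end of the first line and is
// repeated at the start of the next line. If hyphenIndex does not point at a
// hyphen, the word is returned unbroken.
std::pair<std::u32string, std::u32string> breakAtHyphenPolish(
        const std::u32string& word, std::size_t hyphenIndex) {
    if (hyphenIndex >= word.size() || word[hyphenIndex] != U'-') {
        return {word, std::u32string()};
    }
    std::u32string first = word.substr(0, hyphenIndex + 1);  // keeps the hyphen
    std::u32string second = U"-";                            // repeated hyphen
    second += word.substr(hyphenIndex + 1);
    return {first, second};
}
```

For "biało-czerwony" broken at index 5 this yields "biało-" and "-czerwony".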

For the South Indic languages, when breaks happen due to soft hyphens
or automatic hyphenation, no visible hyphen is inserted, although a
penalty is added.

For Hebrew, support for using U+05BE HEBREW PUNCTUATION MAQAF has
been implemented, but it's turned off pending confirmation of
desirability.

Also, hard hyphens, which previously had no penalty added for
breaking the line after them, now have the same penalty as an
automatic or soft break, with the difference that no hyphen is
inserted when they break.

Finally, some bugs have been fixed with hyphenating multiscript and
multi-font words.

Bug: 19950445
Bug: 19955011
Bug: 25623243
Bug: 26154469
Bug: 26154471
Bug: 33387871
Bug: 33560754
Bug: 33752592
Bug: 33754204
Test: Unit tests added, plus thorough manual testing
Change-Id: Iaccf776ce8d1d434ee8b1c534ff3659d80fdc338
/frameworks/minikin/libs/minikin/WordBreaker.cpp
32c12c0ac825453f4c3bba07718cfe684ec90bec 28-Dec-2016 Mark Salyzyn <salyzyn@google.com> resolve merge conflicts of 2377a00 to master

Test: build
Bug: 26552300
Bug: 31289077
Change-Id: I6181ae7e84f9bdcbed50841c70d07f6906a10eb7
555d84c6f98eafcbe677cdcb8e9605760acd8ce5 29-Sep-2016 Mark Salyzyn <salyzyn@google.com> minikin: Replace cutils/log.h with android/log.h or log/log.h

- replace cutils/log.h with android/log.h (main buffer logging)
- replace cutils/log.h with log/log.h (+SafetyNet logging)
- define LOG_TAG before use.

Test: compile
Bug: 26552300
Bug: 31289077
Change-Id: I7a4803dd66f31b7103e09e5ff5b8fa523fa0fd60
/frameworks/minikin/libs/minikin/WordBreaker.cpp
14e2d136aaef271ba131f917cf5f27baa31ae5ad 09-Jun-2016 Seigo Nonaka <nona@google.com> Always use minikin namespace.

This is the new namespace policy for minikin:
- All components should be in the minikin namespace.
- All tests are also in the minikin namespace, with no anonymous namespaces.

Bug: 29233740
Change-Id: I71a8a35049bb8d624f7a78797231e90fed1e2b8c
/frameworks/minikin/libs/minikin/WordBreaker.cpp
74b56175e5d41c1c1dc992208842b5576973d452 26-May-2016 Roozbeh Pournader <roozbeh@google.com> Do not break after Myanmar viramas

This is to work around a bug in ICU's line breaker, which thinks
there is a valid line break between a Myanmar kinzi and a consonant.
See http://bugs.icu-project.org/trac/ticket/12561 for the ICU bug.
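
The workaround can be pictured as a filter over the break offsets reported by the line breaker. The sketch below is an assumption about the general shape, not the actual patch: the function name is hypothetical and the offsets are taken to be UTF-16 indices.

```cpp
#include <cstddef>
#include <string>
#include <vector>

constexpr char16_t kMyanmarSignVirama = 0x1039;  // U+1039 MYANMAR SIGN VIRAMA

// Hypothetical filter (not the Minikin implementation): drop every candidate
// break that falls directly after a Myanmar virama. `breaks` holds UTF-16
// offsets into `text`, assumed to come from a line break iterator.
std::vector<std::size_t> dropBreaksAfterVirama(
        const std::u16string& text, const std::vector<std::size_t>& breaks) {
    std::vector<std::size_t> kept;
    for (std::size_t offset : breaks) {
        const bool afterVirama = offset > 0 && offset <= text.size() &&
                                 text[offset - 1] == kMyanmarSignVirama;
        if (!afterVirama) {
            kept.push_back(offset);
        }
    }
    return kept;
}
```

With a kinzi (U+1004 U+103A U+1039) followed by a consonant, a reported break at offset 3 is dropped and the break at the end of the text is kept.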

Bug: 28964845
Change-Id: I076ac15077e5627cbccf6732900bcc60d8596dda
/frameworks/minikin/libs/minikin/WordBreaker.cpp
77f488345316fba46c271fc04bea470819ae1712 19-Apr-2016 Seigo Nonaka <nona@google.com> Do not break before and after ZWJ.

The emoji list is generated from external/unicode/emoji-data.txt
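
The rule in the commit title amounts to a check on the two code units around a candidate break. A minimal sketch, assuming UTF-16 text and a hypothetical function name:

```cpp
#include <cstddef>
#include <string>

constexpr char16_t kZeroWidthJoiner = 0x200D;  // U+200D ZERO WIDTH JOINER

// Hypothetical predicate (not the Minikin implementation): a break at
// `offset` is rejected when the code unit before or after it is a ZWJ.
// Offsets at the very start or end of the text are left alone.
bool isBreakAllowedAroundZwj(const std::u16string& text, std::size_t offset) {
    if (offset == 0 || offset >= text.size()) {
        return true;
    }
    return text[offset - 1] != kZeroWidthJoiner && text[offset] != kZeroWidthJoiner;
}
```

For U+1F468 U+200D U+1F469 (five UTF-16 code units), breaks at offsets 2 and 3 are rejected, so the sequence stays on one line.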

Bug: 28248662
Change-Id: Ie49b3782505665d62c24371ca23d317ae5e9c5f7
/frameworks/minikin/libs/minikin/WordBreaker.cpp
d8917c69a9f7b7ca52f7ac850922dab4322113f5 16-Mar-2016 Roozbeh Pournader <roozbeh@google.com> Do not allow line breaks before currency symbols

Implement the change proposed in UTC document L2/16-043R
(http://www.unicode.org/L2/L2016/16043r-line-break-pr-po.txt) to make
sure we do not break between letters and currency symbols.

Bug: 24959657
Change-Id: Ia29d0e5625f84870bd910d0c6e19036d17206704
/frameworks/minikin/libs/minikin/WordBreaker.cpp
56840e8006ca2b822adb401fc8a65f3c075cde10 25-Feb-2016 Raph Levien <raph@google.com> Suppress line breaks in emoji + modifier

An emoji base with an emoji modifier renders as a single glyph and
thus should not contain a line break. Current (Unicode 8) logic does
indicate a line break, so we override the results of the ICU line
break iterator. The code references a proposal to improve Unicode
behavior; when that is adopted and we upgrade ICU accordingly, the
special-case code should be deleted, but the tests can remain.
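
The override can be sketched as a check for a modifier preceded by a base. The names below are hypothetical, and the base list is a deliberately tiny subset chosen for illustration; the real property data is much larger.

```cpp
#include <cstddef>
#include <string>

// Emoji modifiers (skin tones) occupy U+1F3FB..U+1F3FF.
bool isEmojiModifier(char32_t c) {
    return c >= 0x1F3FB && c <= 0x1F3FF;
}

// Illustrative subset only: a few code points that are emoji modifier bases.
bool isEmojiBaseSubset(char32_t c) {
    return c == 0x261D || c == 0x1F44B || c == 0x1F44D;
}

// Hypothetical predicate (not the Minikin implementation): true when a break
// at `offset` would separate an emoji base from its modifier.
bool isBreakInsideEmojiModifierPair(const std::u32string& text, std::size_t offset) {
    if (offset == 0 || offset >= text.size()) {
        return false;
    }
    return isEmojiBaseSubset(text[offset - 1]) && isEmojiModifier(text[offset]);
}
```

A caller would discard any break reported by the iterator for which this predicate returns true.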

Bug: 27343378
Change-Id: I5de9c53e9a34c503816f9131e3d894e6f7a57d13
/frameworks/minikin/libs/minikin/WordBreaker.cpp
d3f45892c721fb1738bf02fe19a5143a320ca4bf 19-Feb-2016 Raph Levien <raph@google.com> Suppress linebreaks in emoji ZWJ sequences

Due to the way emoji ZWJ sequences are defined, the ICU line breaking
algorithm determines that there are valid line breaks inside the
sequence. This patch suppresses these line breaks.

This is an adaptation of I225ebebc0f4186e4b8f48fee399c4a62b3f0218a
into the nyc-dev branch.

Bug: 25433289
Change-Id: I84b50b1e6ef13d436965eab389659d02a30d100f
/frameworks/minikin/libs/minikin/WordBreaker.cpp
c88ef135fcc2661ec7addc171ebc60787df38aff 09-Sep-2015 Raph Levien <raph@google.com> Add penalty for breaks in URLs and email addresses

Recent changes have added special cases for line breaks within URLs
and email addresses. Such breaks are undesirable when they can be
avoided, but at other times are needed to avoid huge gaps, or indeed
to make the line fit at all.

This patch assigns a penalty for such breaks, equal to the hyphenation
penalty. The mechanism is currently very simple, but would be easy to
fine-tune based on more detailed information about break quality.
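
As a rough sketch of the mechanism described above, under the assumption that each break candidate carries a flag saying whether it lies inside a URL or email address (the struct and function names are hypothetical):

```cpp
#include <cstddef>

// Hypothetical break candidate (not the Minikin data structure).
struct BreakCandidate {
    std::size_t offset;      // position of the break in the text
    bool insideUrlOrEmail;   // true if the break falls within a URL or email
};

// Breaks inside a URL or email address cost the same as a hyphenation break;
// ordinary word breaks cost nothing in this simplified model.
float breakPenalty(const BreakCandidate& candidate, float hyphenPenalty) {
    return candidate.insideUrlOrEmail ? hyphenPenalty : 0.0f;
}
```

A line-breaking optimizer would add this penalty to the cost of any layout that uses the candidate, so such breaks are chosen only when the alternatives are worse.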

Bug: 20126487
Bug: 20566159
Change-Id: I0d3323897737a2850f1e734fa17b96b065eabd9c
/frameworks/minikin/libs/minikin/WordBreaker.cpp
6d15657e4a3826d4d47d5358f1dde211484527e9 09-Sep-2015 Raph Levien <raph@google.com> Add line breaks to email addresses and URLs

This change adds acceptable line breaks according to sections 7.42
(Dividing URLs and e-mail addresses) and 14.12 (URLs or DOIs and line
breaks) of the Chicago Manual of Style (16th ed.). In general, these
place breaks before punctuation symbols and suppress them after
hyphens.
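
A simplified sketch of the two general rules named above: break before punctuation, never directly after a hyphen. The punctuation set and function names are assumptions made for illustration, and the sketch ignores the special handling a scheme such as "http://" would need.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed punctuation set for illustration; the real rules are more detailed.
bool isUrlBreakPunctuation(char c) {
    return std::string("./,_?#%~-").find(c) != std::string::npos;
}

// Hypothetical helper (not the Minikin implementation): returns the offsets
// at which a URL may be broken. A break is allowed before a punctuation
// character unless the preceding character is a hyphen, so that a hyphen at
// the end of a line is never mistaken for one inserted by hyphenation.
std::vector<std::size_t> urlBreakOffsets(const std::string& url) {
    std::vector<std::size_t> breaks;
    for (std::size_t i = 1; i < url.size(); ++i) {
        const bool beforePunctuation = isUrlBreakPunctuation(url[i]);
        const bool afterHyphen = url[i - 1] == '-';
        if (beforePunctuation && !afterHyphen) {
            breaks.push_back(i);
        }
    }
    return breaks;
}
```

For "example.com/a-b/c" the allowed offsets are 7, 11, 13 and 15. In "a-/b" only offset 1 is allowed, because the break before the slash would fall after a hyphen.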

Bug: 20126487
Bug: 20566159
Change-Id: I2d07d516b920a506a2f718c38fb435c5eb1ee1f8
/frameworks/minikin/libs/minikin/WordBreaker.cpp
9c4cc648abcae144f3b99d612e58ef01d5e52cce 09-Sep-2015 Raph Levien <raph@google.com> Special-case URLs and email addresses for line breaking

Detect URLs and email addresses, and suppress both line breaking and
hyphenation within them.
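
The commit message does not describe how detection works. One plausible heuristic, shown purely as an assumption, is to look for an "@" or a "://" inside the word:

```cpp
#include <string>

// Hypothetical heuristic (an assumption, not the Minikin implementation):
// treat a word as an email address if it contains '@', and as a URL if it
// contains "://".
bool looksLikeUrlOrEmail(const std::string& word) {
    return word.find('@') != std::string::npos ||
           word.find("://") != std::string::npos;
}
```

A caller would then skip both line-break candidates and hyphenation for any word where this returns true.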

Bug: 20126487
Bug: 20566159

Change-Id: I43629347a063dcf579e355e5b678d7195f453ad9
/frameworks/minikin/libs/minikin/WordBreaker.cpp
57b6dae9894b9362ef04517ff477fd491f9d433b 05-Sep-2015 Raph Levien <raph@google.com> Refine hyphenation around punctuation

Implement a WordBreaker that defines our concept of valid word
boundaries, customizing the ICU behavior. Currently, we suppress line
breaks at soft hyphens (these are handled specially). Also, the
new WordBreaker class has methods that determine the start and end
of the word (punctuation stripped) for the purpose of hyphenation.
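
The idea of a word range with punctuation stripped can be sketched as below. The function name is hypothetical and, to keep the sketch short, it handles ASCII punctuation only.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not the WordBreaker API): given a segment
// [start, end) of `text` between two break opportunities, return the
// sub-range that remains after stripping leading and trailing ASCII
// punctuation. Requires start <= end <= text.size().
std::pair<std::size_t, std::size_t> wordRangeWithoutPunctuation(
        const std::string& text, std::size_t start, std::size_t end) {
    while (start < end && std::ispunct(static_cast<unsigned char>(text[start]))) {
        ++start;
    }
    while (end > start && std::ispunct(static_cast<unsigned char>(text[end - 1]))) {
        --end;
    }
    return {start, end};
}
```

For the segment `"hello,"` (with the quotes) at offsets 0 to 8, the result is the range 1 to 6, which covers just `hello`; only that range would be passed to the hyphenator.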

This patch, in its current form, doesn't handle email addresses and
URLs specially, but the WordBreaker class is the correct place to do
so. Also, special case handling of hyphens and dashes is still done
in LineBreaker, but all of that should be moved to WordBreaker.

Bug: 20126487
Bug: 20566159
Change-Id: I492cbad963f9b74a2915f010dad46bb91f97b2fe
/frameworks/minikin/libs/minikin/WordBreaker.cpp