hb-ot-shape-normalize.cc revision b00321ea78793d9b3592b5173a9800e6322424fe
/*
 * Copyright © 2011,2012  Google, Inc.
 *
 *  This is part of HarfBuzz, a text shaping library.
 *
 * Permission is hereby granted, without written agreement and without
 * license or royalty fees, to use, copy, modify, and distribute this
 * software and its documentation for any purpose, provided that the
 * above copyright notice and the following two paragraphs appear in
 * all copies of this software.
 *
 * IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE TO ANY PARTY FOR
 * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES
 * ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN
 * IF THE COPYRIGHT HOLDER HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
 * DAMAGE.
 *
 * THE COPYRIGHT HOLDER SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING,
 * BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
 * FITNESS FOR A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS
 * ON AN "AS IS" BASIS, AND THE COPYRIGHT HOLDER HAS NO OBLIGATION TO
 * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
 *
 * Google Author(s): Behdad Esfahbod
 */

#include "hb-ot-shape-normalize-private.hh"
#include "hb-ot-shape-private.hh"


/*
 * HIGHLEVEL DESIGN:
 *
 * This file exports one main function: _hb_ot_shape_normalize().
 *
 * This function closely reflects the Unicode Normalization Algorithm,
 * yet it's different.
 *
 * Each shaper specifies whether it prefers decomposed (NFD) or composed (NFC).
 * The logic however tries to use whatever the font can support.
 *
 * In general what happens is that each grapheme is decomposed in a chain
 * of 1:2 decompositions, marks are reordered, and the result is then
 * recomposed if desired.  So far it's like Unicode Normalization.  However,
 * decomposition and recomposition only happen if the font supports the
 * resulting characters.
 *
 * The goals are:
 *
 *   - Try to render all canonically equivalent strings similarly.  To really
 *     achieve this we have to always do the full decomposition and then
 *     selectively recompose from there.  It's kinda too expensive though, so
 *     we skip some cases.  For example, if composed is desired, we simply
 *     don't touch 1-character clusters that are supported by the font, even
 *     though their NFC may be different.
 *
 *   - When a font has a precomposed character for a sequence but the 'ccmp'
 *     feature in the font is not adequate, use the precomposed character
 *     which typically has better mark positioning.
 *
 *   - When a font does not support a combining mark, but supports it precomposed
 *     with previous base, use that.  This needs the itemizer to have this
 *     knowledge too.  We need to provide assistance to the itemizer.
 *
 *   - When a font does not support a character but supports its decomposition,
 *     well, use the decomposition (preferring the canonical decomposition, but
 *     falling back to the compatibility decomposition if necessary).  The
 *     compatibility decomposition is really nice to have, for characters like
 *     ellipsis, or various-sized space characters.
 *
 *   - The complex shapers can customize the compose and decompose functions to
 *     offload some of their requirements to the normalizer.  For example, the
 *     Indic shaper may want to disallow recomposing of two matras.
 *
 *   - We try compatibility decomposition if decomposing through canonical
 *     decomposition alone failed to find a sequence that the font supports.
 *     We don't try compatibility decomposition recursively during the canonical
 *     decomposition phase.  This has minimal impact.  There are only a handful
 *     of Greek letters that have canonical decompositions that include characters
 *     with compatibility decomposition.  Those can be found using this command:
 *
 *     egrep  "`echo -n ';('; grep ';<' UnicodeData.txt | cut -d';' -f1 | tr '\n' '|'; echo ') '`" UnicodeData.txt
 */

static hb_bool_t
decompose_func (hb_unicode_funcs_t *unicode,
                hb_codepoint_t  ab,
                hb_codepoint_t *a,
                hb_codepoint_t *b)
{
  /* XXX FIXME, move these to complex shapers and propagate to normalizer. */
  switch (ab) {
    case 0x0AC9  : return false;

    case 0x0931  : return false;
    case 0x0B94  : return false;

    /* These ones have Unicode decompositions, but we do it
     * this way to be close to what Uniscribe does. */
    case 0x0DDA  : *a = 0x0DD9; *b = 0x0DDA; return true;
    case 0x0DDC  : *a = 0x0DD9; *b = 0x0DDC; return true;
    case 0x0DDD  : *a = 0x0DD9; *b = 0x0DDD; return true;
    case 0x0DDE  : *a = 0x0DD9; *b = 0x0DDE; return true;

    case 0x0F77  : *a = 0x0FB2; *b = 0x0F81; return true;
    case 0x0F79  : *a = 0x0FB3; *b = 0x0F81; return true;
    case 0x17BE  : *a = 0x17C1; *b = 0x17BE; return true;
    case 0x17BF  : *a = 0x17C1; *b = 0x17BF; return true;
    case 0x17C0  : *a = 0x17C1; *b = 0x17C0; return true;
    case 0x17C4  : *a = 0x17C1; *b = 0x17C4; return true;
    case 0x17C5  : *a = 0x17C1; *b = 0x17C5; return true;
    case 0x1925  : *a = 0x1920; *b = 0x1923; return true;
    case 0x1926  : *a = 0x1920; *b = 0x1924; return true;
    case 0x1B3C  : *a = 0x1B42; *b = 0x1B3C; return true;
    case 0x1112E : *a = 0x11127; *b = 0x11131; return true;
    case 0x1112F : *a = 0x11127; *b = 0x11132; return true;
#if 0
    case 0x0B57  : *a = 0xno decomp, -> RIGHT; return true;
    case 0x1C29  : *a = 0xno decomp, -> LEFT; return true;
    case 0xA9C0  : *a = 0xno decomp, -> RIGHT; return true;
    case 0x111BF : *a = 0xno decomp, -> ABOVE; return true;
#endif
  }
  return unicode->decompose (ab, a, b);
}

static hb_bool_t
compose_func (hb_unicode_funcs_t *unicode,
              hb_codepoint_t  a,
              hb_codepoint_t  b,
              hb_codepoint_t *ab)
{
  /* XXX, this belongs to indic normalizer. */
  if ((FLAG (unicode->general_category (a)) &
       (FLAG (HB_UNICODE_GENERAL_CATEGORY_SPACING_MARK) |
        FLAG (HB_UNICODE_GENERAL_CATEGORY_ENCLOSING_MARK) |
        FLAG (HB_UNICODE_GENERAL_CATEGORY_NON_SPACING_MARK))))
    return false;
  /* XXX, add composition-exclusion exceptions to Indic shaper. */
  if (a == 0x09AF && b == 0x09BC) { *ab = 0x09DF; return true; }

  /* XXX, these belong to the Hebrew / default shaper. */
  /* Hebrew presentation-form shaping.
   * https://bugzilla.mozilla.org/show_bug.cgi?id=728866 */
  // Hebrew presentation forms with dagesh, for characters 0x05D0..0x05EA;
  // note that some letters do not have a dagesh presForm encoded
  static const hb_codepoint_t sDageshForms[0x05EA - 0x05D0 + 1] = {
    0xFB30, // ALEF
    0xFB31, // BET
    0xFB32, // GIMEL
    0xFB33, // DALET
    0xFB34, // HE
    0xFB35, // VAV
    0xFB36, // ZAYIN
    0, // HET
    0xFB38, // TET
    0xFB39, // YOD
    0xFB3A, // FINAL KAF
    0xFB3B, // KAF
    0xFB3C, // LAMED
    0, // FINAL MEM
    0xFB3E, // MEM
    0, // FINAL NUN
    0xFB40, // NUN
    0xFB41, // SAMEKH
    0, // AYIN
    0xFB43, // FINAL PE
    0xFB44, // PE
    0, // FINAL TSADI
    0xFB46, // TSADI
    0xFB47, // QOF
    0xFB48, // RESH
    0xFB49, // SHIN
    0xFB4A // TAV
  };

  hb_bool_t found = unicode->compose (a, b, ab);

  if (!found && (b & ~0x7F) == 0x0580) {
    // special-case Hebrew presentation forms that are excluded from
    // standard normalization, but wanted for old fonts
    switch (b) {
      case 0x05B4: // HIRIQ
        if (a == 0x05D9) { // YOD
          *ab = 0xFB1D;
          found = true;
        }
        break;
      case 0x05B7: // patah
        if (a == 0x05F2) { // YIDDISH YOD YOD
          *ab = 0xFB1F;
          found = true;
        } else if (a == 0x05D0) { // ALEF
          *ab = 0xFB2E;
          found = true;
        }
        break;
      case 0x05B8: // QAMATS
        if (a == 0x05D0) { // ALEF
          *ab = 0xFB2F;
          found = true;
        }
        break;
      case 0x05B9: // HOLAM
        if (a == 0x05D5) { // VAV
          *ab = 0xFB4B;
          found = true;
        }
        break;
      case 0x05BC: // DAGESH
        if (a >= 0x05D0 && a <= 0x05EA) {
          *ab = sDageshForms[a - 0x05D0];
          found = (*ab != 0);
        } else if (a == 0xFB2A) { // SHIN WITH SHIN DOT
          *ab = 0xFB2C;
          found = true;
        } else if (a == 0xFB2B) { // SHIN WITH SIN DOT
          *ab = 0xFB2D;
          found = true;
        }
        break;
      case 0x05BF: // RAFE
        switch (a) {
          case 0x05D1: // BET
            *ab = 0xFB4C;
            found = true;
            break;
          case 0x05DB: // KAF
            *ab = 0xFB4D;
            found = true;
            break;
          case 0x05E4: // PE
            *ab = 0xFB4E;
            found = true;
            break;
        }
        break;
      case 0x05C1: // SHIN DOT
        if (a == 0x05E9) { // SHIN
          *ab = 0xFB2A;
          found = true;
        } else if (a == 0xFB49) { // SHIN WITH DAGESH
          *ab = 0xFB2C;
          found = true;
        }
        break;
      case 0x05C2: // SIN DOT
        if (a == 0x05E9) { // SHIN
          *ab = 0xFB2B;
          found = true;
        } else if (a == 0xFB49) { // SHIN WITH DAGESH
          *ab = 0xFB2D;
          found = true;
        }
        break;
    }
  }

  return found;
}


static inline void
set_glyph (hb_glyph_info_t &info, hb_font_t *font)
{
  hb_font_get_glyph (font, info.codepoint, 0, &info.glyph_index());
}

static inline void
output_char (hb_buffer_t *buffer, hb_codepoint_t unichar, hb_codepoint_t glyph)
{
  buffer->cur().glyph_index() = glyph;
  buffer->output_glyph (unichar);
  _hb_glyph_info_set_unicode_props (&buffer->prev(), buffer->unicode);
}

static inline void
next_char (hb_buffer_t *buffer, hb_codepoint_t glyph)
{
  buffer->cur().glyph_index() = glyph;
  buffer->next_glyph ();
}

static inline void
skip_char (hb_buffer_t *buffer)
{
  buffer->skip_glyph ();
}

static bool
decompose (hb_font_t *font, hb_buffer_t *buffer,
           bool shortest,
           hb_codepoint_t ab)
{
  hb_codepoint_t a, b, a_glyph, b_glyph;

  if (!decompose_func (buffer->unicode, ab, &a, &b) ||
      (b && !font->get_glyph (b, 0, &b_glyph)))
    return false;

  bool has_a = font->get_glyph (a, 0, &a_glyph);
  if (shortest && has_a) {
    /* Output a and b */
    output_char (buffer, a, a_glyph);
    if (b)
      output_char (buffer, b, b_glyph);
    return true;
  }

  if (decompose (font, buffer, shortest, a)) {
    if (b)
      output_char (buffer, b, b_glyph);
    return true;
  }

  if (has_a) {
    output_char (buffer, a, a_glyph);
    if (b)
      output_char (buffer, b, b_glyph);
    return true;
  }

  return false;
}

static bool
decompose_compatibility (hb_font_t *font, hb_buffer_t *buffer,
                         hb_codepoint_t u)
{
  unsigned int len, i;
  hb_codepoint_t decomposed[HB_UNICODE_MAX_DECOMPOSITION_LEN];
  hb_codepoint_t glyphs[HB_UNICODE_MAX_DECOMPOSITION_LEN];

  len = buffer->unicode->decompose_compatibility (u, decomposed);
  if (!len)
    return false;

  for (i = 0; i < len; i++)
    if (!font->get_glyph (decomposed[i], 0, &glyphs[i]))
      return false;

  for (i = 0; i < len; i++)
    output_char (buffer, decomposed[i], glyphs[i]);

  return true;
}
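
/*
 * Illustration only, not part of HarfBuzz: a minimal standalone sketch of the
 * font-aware recursion used by decompose() above.  It assumes a toy "font"
 * that is just a set of supported code points and a hypothetical 1:2
 * decomposition table supplied by the caller.  The names toy_font_t,
 * toy_decomp_table_t and toy_decompose are made up for this sketch; output
 * goes to a std::vector rather than an hb_buffer_t.
 */

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

typedef uint32_t toy_codepoint_t;

/* Hypothetical stand-in for a font: the set of code points it has glyphs for. */
struct toy_font_t
{
  std::set<toy_codepoint_t> supported;
  bool has_glyph (toy_codepoint_t u) const { return supported.count (u) != 0; }
};

/* Hypothetical 1:2 decomposition table: ab -> (a, b); b == 0 means 1:1. */
typedef std::map<toy_codepoint_t,
                 std::pair<toy_codepoint_t, toy_codepoint_t> > toy_decomp_table_t;

/* Same control flow as decompose(): append to `out` only on success. */
static bool
toy_decompose (const toy_font_t &font, const toy_decomp_table_t &table,
               bool shortest, toy_codepoint_t ab,
               std::vector<toy_codepoint_t> &out)
{
  toy_decomp_table_t::const_iterator it = table.find (ab);
  if (it == table.end ())
    return false;

  toy_codepoint_t a = it->second.first, b = it->second.second;
  if (b && !font.has_glyph (b))
    return false; /* The trailing mark must be supported. */

  bool has_a = font.has_glyph (a);
  if (shortest && has_a) {
    /* Stop at the first level the font can render. */
    out.push_back (a);
    if (b) out.push_back (b);
    return true;
  }

  /* Otherwise try to decompose the leading part further. */
  if (toy_decompose (font, table, shortest, a, out)) {
    if (b) out.push_back (b);
    return true;
  }

  /* Leading part does not decompose; use it directly if supported. */
  if (has_a) {
    out.push_back (a);
    if (b) out.push_back (b);
    return true;
  }

  return false;
}
```

/*
 * With a table holding U+1EAD -> (U+1EA1, U+0302) and U+1EA1 -> (U+0061,
 * U+0323), and a toy font supporting all four parts, shortest=true stops at
 * <U+1EA1, U+0302> while shortest=false goes all the way down to
 * <U+0061, U+0323, U+0302>.
 */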

static void
decompose_current_character (hb_font_t *font, hb_buffer_t *buffer,
                             bool shortest)
{
  hb_codepoint_t glyph;

  /* Kind of a cute waterfall here... */
  if (shortest && font->get_glyph (buffer->cur().codepoint, 0, &glyph))
    next_char (buffer, glyph);
  else if (decompose (font, buffer, shortest, buffer->cur().codepoint))
    skip_char (buffer);
  else if (!shortest && font->get_glyph (buffer->cur().codepoint, 0, &glyph))
    next_char (buffer, glyph);
  else if (decompose_compatibility (font, buffer, buffer->cur().codepoint))
    skip_char (buffer);
  else {
    /* A glyph-not-found case... */
    font->get_glyph (buffer->cur().codepoint, 0, &glyph);
    next_char (buffer, glyph);
  }
}

static inline void
handle_variation_selector_cluster (hb_font_t *font, hb_buffer_t *buffer,
                                   unsigned int end)
{
  for (; buffer->idx < end - 1;) {
    if (unlikely (buffer->unicode->is_variation_selector (buffer->cur(+1).codepoint))) {
      /* The next two lines are some ugly lines... But work. */
      font->get_glyph (buffer->cur().codepoint, buffer->cur(+1).codepoint, &buffer->cur().glyph_index());
      buffer->replace_glyphs (2, 1, &buffer->cur().codepoint);
    } else {
      set_glyph (buffer->cur(), font);
      buffer->next_glyph ();
    }
  }
  if (likely (buffer->idx < end)) {
    set_glyph (buffer->cur(), font);
    buffer->next_glyph ();
  }
}

static void
decompose_multi_char_cluster (hb_font_t *font, hb_buffer_t *buffer,
                              unsigned int end)
{
  /* TODO Currently if there's a variation-selector we give up; it's just too hard. */
  for (unsigned int i = buffer->idx; i < end; i++)
    if (unlikely (buffer->unicode->is_variation_selector (buffer->info[i].codepoint))) {
      handle_variation_selector_cluster (font, buffer, end);
      return;
    }

  while (buffer->idx < end)
    decompose_current_character (font, buffer, false);
}

static int
compare_combining_class (const hb_glyph_info_t *pa, const hb_glyph_info_t *pb)
{
  unsigned int a = _hb_glyph_info_get_modified_combining_class (pa);
  unsigned int b = _hb_glyph_info_get_modified_combining_class (pb);

  return a < b ? -1 : a == b ?
0 : +1; 41145d6f29f15f1d2323bcaa2498aed23ff0c8a1567Behdad Esfahbod} 41245d6f29f15f1d2323bcaa2498aed23ff0c8a1567Behdad Esfahbod 41345412523dc295cb5ee12e096bfacb282cc925843Behdad Esfahbodvoid 41411138ccff71f442da1fcf64faa0e1d22e083e775Behdad Esfahbod_hb_ot_shape_normalize (hb_font_t *font, hb_buffer_t *buffer, 41511138ccff71f442da1fcf64faa0e1d22e083e775Behdad Esfahbod hb_ot_shape_normalization_mode_t mode) 416655586fe5e1fadf2a2ef7826e61ee9a445ffa37aBehdad Esfahbod{ 41711138ccff71f442da1fcf64faa0e1d22e083e775Behdad Esfahbod bool recompose = mode != HB_OT_SHAPE_NORMALIZATION_MODE_DECOMPOSED; 4180594a2448440208efa0acac9a5d8d52d43108289Behdad Esfahbod bool has_multichar_clusters = false; 41934c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod unsigned int count; 4205d90a342e319068716429bf7af76c3896b61a0e5Behdad Esfahbod 421c311d852080b50ffc85e80168de62abb05a6be59Behdad Esfahbod /* We do a fairly straightforward yet custom normalization process in three 4225389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod * separate rounds: decompose, reorder, recompose (if desired). Currently 4235389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod * this makes two buffer swaps. We can make it faster by moving the last 4245389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod * two rounds into the inner loop for the first round, but it's more readable 4255389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod * this way. 
*/ 4265d90a342e319068716429bf7af76c3896b61a0e5Behdad Esfahbod 42734c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod 4284ff0d2d9dfc4f7e4880a4e964ca9872624508ea0Behdad Esfahbod /* First round, decompose */ 4294ff0d2d9dfc4f7e4880a4e964ca9872624508ea0Behdad Esfahbod 4305389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod buffer->clear_output (); 43134c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod count = buffer->len; 432468e9cb25c9bc14781b7013e447d763f93bf76a3Behdad Esfahbod for (buffer->idx = 0; buffer->idx < count;) 4335d90a342e319068716429bf7af76c3896b61a0e5Behdad Esfahbod { 434655586fe5e1fadf2a2ef7826e61ee9a445ffa37aBehdad Esfahbod unsigned int end; 435468e9cb25c9bc14781b7013e447d763f93bf76a3Behdad Esfahbod for (end = buffer->idx + 1; end < count; end++) 43699c2695759a6af855d565f4994bbdf220570bb48Behdad Esfahbod if (buffer->cur().cluster != buffer->info[end].cluster) 437655586fe5e1fadf2a2ef7826e61ee9a445ffa37aBehdad Esfahbod break; 4385d90a342e319068716429bf7af76c3896b61a0e5Behdad Esfahbod 439468e9cb25c9bc14781b7013e447d763f93bf76a3Behdad Esfahbod if (buffer->idx + 1 == end) 440378d279bbf692195c4654e312dae854ab3be04cfBehdad Esfahbod decompose_current_character (font, buffer, recompose); 4414ff0d2d9dfc4f7e4880a4e964ca9872624508ea0Behdad Esfahbod else { 44211138ccff71f442da1fcf64faa0e1d22e083e775Behdad Esfahbod decompose_multi_char_cluster (font, buffer, end); 4430594a2448440208efa0acac9a5d8d52d43108289Behdad Esfahbod has_multichar_clusters = true; 4444ff0d2d9dfc4f7e4880a4e964ca9872624508ea0Behdad Esfahbod } 445655586fe5e1fadf2a2ef7826e61ee9a445ffa37aBehdad Esfahbod } 446468e9cb25c9bc14781b7013e447d763f93bf76a3Behdad Esfahbod buffer->swap_buffers (); 4474ff0d2d9dfc4f7e4880a4e964ca9872624508ea0Behdad Esfahbod 44834c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod 449968318455304804dc53045e8ba0cd4d76800c02dBehdad Esfahbod if (mode != HB_OT_SHAPE_NORMALIZATION_MODE_COMPOSED_FULL && !has_multichar_clusters) 
4504ff0d2d9dfc4f7e4880a4e964ca9872624508ea0Behdad Esfahbod return; /* Done! */ 4514ff0d2d9dfc4f7e4880a4e964ca9872624508ea0Behdad Esfahbod 45234c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod 4534ff0d2d9dfc4f7e4880a4e964ca9872624508ea0Behdad Esfahbod /* Second round, reorder (inplace) */ 4544ff0d2d9dfc4f7e4880a4e964ca9872624508ea0Behdad Esfahbod 45534c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod count = buffer->len; 45634c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod for (unsigned int i = 0; i < count; i++) 45734c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod { 458d1deaa2f5bd028e8076265cba92cffa4fa2834acBehdad Esfahbod if (_hb_glyph_info_get_modified_combining_class (&buffer->info[i]) == 0) 45934c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod continue; 46034c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod 46134c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod unsigned int end; 46234c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod for (end = i + 1; end < count; end++) 463d1deaa2f5bd028e8076265cba92cffa4fa2834acBehdad Esfahbod if (_hb_glyph_info_get_modified_combining_class (&buffer->info[end]) == 0) 46434c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod break; 46534c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod 46634c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod /* We are going to do a bubble-sort. Only do this if the 46734c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod * sequence is short. Doing it on long sequences can result 46834c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod * in an O(n^2) DoS. 
*/ 46934c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod if (end - i > 10) { 47034c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod i = end; 47134c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod continue; 47234c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod } 47334c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod 47445d6f29f15f1d2323bcaa2498aed23ff0c8a1567Behdad Esfahbod hb_bubble_sort (buffer->info + i, end - i, compare_combining_class); 47534c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod 47634c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod i = end; 47734c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod } 47834c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod 4794ff0d2d9dfc4f7e4880a4e964ca9872624508ea0Behdad Esfahbod 4805389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod if (!recompose) 4815389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod return; 4825389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod 4834ff0d2d9dfc4f7e4880a4e964ca9872624508ea0Behdad Esfahbod /* Third round, recompose */ 48434c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod 4855389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod /* As noted in the comment earlier, we don't try to combine 4865389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod * ccc=0 chars with their previous Starter. 
*/ 4874ff0d2d9dfc4f7e4880a4e964ca9872624508ea0Behdad Esfahbod 4885389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod buffer->clear_output (); 4895389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod count = buffer->len; 4905389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod unsigned int starter = 0; 491b00321ea78793d9b3592b5173a9800e6322424feBehdad Esfahbod buffer->next_glyph (); 4925389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod while (buffer->idx < count) 4935389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod { 4945389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod hb_codepoint_t composed, glyph; 495968318455304804dc53045e8ba0cd4d76800c02dBehdad Esfahbod if (/* If mode is NOT COMPOSED_FULL (ie. it's COMPOSED_DIACRITICS), we don't try to 496968318455304804dc53045e8ba0cd4d76800c02dBehdad Esfahbod * compose a CCC=0 character with it's preceding starter. */ 497968318455304804dc53045e8ba0cd4d76800c02dBehdad Esfahbod (mode == HB_OT_SHAPE_NORMALIZATION_MODE_COMPOSED_FULL || 49899c2695759a6af855d565f4994bbdf220570bb48Behdad Esfahbod _hb_glyph_info_get_modified_combining_class (&buffer->cur()) != 0) && 499968318455304804dc53045e8ba0cd4d76800c02dBehdad Esfahbod /* If there's anything between the starter and this char, they should have CCC 500968318455304804dc53045e8ba0cd4d76800c02dBehdad Esfahbod * smaller than this character's. */ 501968318455304804dc53045e8ba0cd4d76800c02dBehdad Esfahbod (starter == buffer->out_len - 1 || 50299c2695759a6af855d565f4994bbdf220570bb48Behdad Esfahbod _hb_glyph_info_get_modified_combining_class (&buffer->prev()) < _hb_glyph_info_get_modified_combining_class (&buffer->cur())) && 503968318455304804dc53045e8ba0cd4d76800c02dBehdad Esfahbod /* And compose. 
*/ 504428dfcab6634ff264570a0a5d715efb8048c3db5Behdad Esfahbod compose_func (buffer->unicode, 505428dfcab6634ff264570a0a5d715efb8048c3db5Behdad Esfahbod buffer->out_info[starter].codepoint, 506428dfcab6634ff264570a0a5d715efb8048c3db5Behdad Esfahbod buffer->cur().codepoint, 507428dfcab6634ff264570a0a5d715efb8048c3db5Behdad Esfahbod &composed) && 508968318455304804dc53045e8ba0cd4d76800c02dBehdad Esfahbod /* And the font has glyph for the composite. */ 5098fbfda920e0b3bb4ab7afb732826026964b79be9Behdad Esfahbod font->get_glyph (composed, 0, &glyph)) 5105389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod { 511bc8357ea7b4c0d7c715aae353176434fb9460205Behdad Esfahbod /* Composes. */ 512b00321ea78793d9b3592b5173a9800e6322424feBehdad Esfahbod buffer->next_glyph (); /* Copy to out-buffer. */ 513bc8357ea7b4c0d7c715aae353176434fb9460205Behdad Esfahbod if (unlikely (buffer->in_error)) 514bc8357ea7b4c0d7c715aae353176434fb9460205Behdad Esfahbod return; 515bc8357ea7b4c0d7c715aae353176434fb9460205Behdad Esfahbod buffer->merge_out_clusters (starter, buffer->out_len); 5168d1eef3f32fb539de2a72804fa3834acc18daab5Behdad Esfahbod buffer->out_len--; /* Remove the second composable. */ 517bc8357ea7b4c0d7c715aae353176434fb9460205Behdad Esfahbod buffer->out_info[starter].codepoint = composed; /* Modify starter and carry on. */ 518b00321ea78793d9b3592b5173a9800e6322424feBehdad Esfahbod set_glyph (buffer->out_info[starter], font); 519d1deaa2f5bd028e8076265cba92cffa4fa2834acBehdad Esfahbod _hb_glyph_info_set_unicode_props (&buffer->out_info[starter], buffer->unicode); 520e02d9257863b49e33ab5942971266349d3c548f6Behdad Esfahbod 5215389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod continue; 5225389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod } 5235389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod 524e02d9257863b49e33ab5942971266349d3c548f6Behdad Esfahbod /* Blocked, or doesn't compose. 
*/ 525b00321ea78793d9b3592b5173a9800e6322424feBehdad Esfahbod buffer->next_glyph (); 526968318455304804dc53045e8ba0cd4d76800c02dBehdad Esfahbod 52799c2695759a6af855d565f4994bbdf220570bb48Behdad Esfahbod if (_hb_glyph_info_get_modified_combining_class (&buffer->prev()) == 0) 528968318455304804dc53045e8ba0cd4d76800c02dBehdad Esfahbod starter = buffer->out_len - 1; 5294ff0d2d9dfc4f7e4880a4e964ca9872624508ea0Behdad Esfahbod } 5305389ff4dbc46c76c9483e3c95f22524b60e21166Behdad Esfahbod buffer->swap_buffers (); 53134c22f816808d061a980cffca12de03beb437fa0Behdad Esfahbod 532655586fe5e1fadf2a2ef7826e61ee9a445ffa37aBehdad Esfahbod} 533