== Notes on {kddi,docomo,softbank}-*.ucm mappings.

kddi-jisx-208 and softbank-jisx-208 are variants of JIS X 0208 used by
KDDI and SoftBank, Japanese cell phone carriers.

kddi-shift_jis, docomo-shift_jis, and softbank-shift_jis are variants
of Shift_JIS used by KDDI, DoCoMo, and SoftBank.

 - kddi-jisx-208 contains Emoji (emoticon) code points in
   0x75xx, 0x76xx, 0x77xx, 0x78xx, 0x79xx, 0x7Axx, and 0x7Bxx,
   where xx is 21-7E.

 - kddi-shift_jis contains Emoji code points in
   0xEBxx, 0xECxx, 0xEDxx, 0xEExx, 0xF3xx, 0xF4xx, 0xF6xx, and 0xF7xx,
   where xx is 40-7E or 80-FC.

 - docomo-shift_jis contains Emoji code points in
   0xF8xx and 0xF9xx, where xx is 40-7E or 80-FC.

 - softbank-shift_jis contains Emoji code points in
   0xF7xx, 0xF9xx, and 0xFBxx, where xx is 40-7E or 80-FC.

 - softbank-jisx-208 contains Emoji code points in
   0x75xx, 0x76xx, 0x77xx, 0x78xx, 0x79xx, 0x7Axx, 0x7Bxx, and 0x7Dxx,
   where xx is 21-7E.


== How the -2012.ucm tables were modified in April 2013

The -2012 versions were regenerated by
  http://code.google.com/p/emoji4unicode/source/browse/trunk/src/gen_conversion_files.py

using each of the older 2012 versions as the base table files
to avoid non-Emoji changes:

# gen_google_ucm.sh
icu_mappings=/google/src/cloud/mscherer/icubranch/google_vendor_src_branch/icu/source/data/mappings
dest=/home/mscherer/www/no_crawl/emoji
./gen_conversion_files.py $icu_mappings/docomo-shift_jis-2012.ucm
cp ../generated/docomo-shift_jis-2012.ucm $dest
./gen_conversion_files.py $icu_mappings/kddi-shift_jis-2012.ucm
cp ../generated/kddi-shift_jis-2012.ucm $dest
./gen_conversion_files.py $icu_mappings/softbank-shift_jis-2012.ucm
cp ../generated/softbank-shift_jis-2012.ucm $dest
./gen_conversion_files.py

The only differences from the September 2012 versions are in mappings
for symbols that have Unicode Variation Selector (VS) sequences.
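As a concrete picture of these mappings: for a symbol with VS sequences, the Unicode side of a mapping can be the bare code point, the code point followed by U+FE0F (emoji variation selector), or the code point followed by U+FE0E (text variation selector). The Python sketch below is a toy model only, not ICU's converter and not code from gen_conversion_files.py; the function names, the placeholder carrier code, and the use of U+2600 as the example symbol are assumptions for illustration. It gives the emoji-VS form precision |0 and the other two forms |4, and treats |0 and |4 as usable from Unicode whether or not fallbacks are enabled, while |1 fallbacks are used only when they are (a simplification of ICU's behavior).

```python
from typing import Optional

EMOJI_VS = 0xFE0F  # VARIATION SELECTOR-16: emoji presentation
TEXT_VS = 0xFE0E   # VARIATION SELECTOR-15: text presentation

# (Unicode-side code points, charset bytes, .ucm precision)
Mapping = tuple[tuple[int, ...], bytes, int]


def vs_mappings(cp: int, code: bytes) -> list[Mapping]:
    """Hypothetical: the three mappings for one roundtrip symbol with VS sequences."""
    return [
        ((cp, EMOJI_VS), code, 0),  # |0 roundtrip
        ((cp,), code, 4),           # |4 good one-way, no VS
        ((cp, TEXT_VS), code, 4),   # |4 good one-way, text VS
    ]


def from_unicode(table: list[Mapping], seq: tuple[int, ...],
                 use_fallback: bool = False) -> Optional[bytes]:
    """Unicode -> charset. In this model, |0 and |4 always apply,
    and |1 applies only when fallbacks are on."""
    for useq, code, prec in table:
        if useq == seq and (prec in (0, 4) or (prec == 1 and use_fallback)):
            return code
    return None


def to_unicode(table: list[Mapping], code: bytes) -> Optional[tuple[int, ...]]:
    """Charset -> Unicode, using only |0 roundtrip mappings."""
    for useq, c, prec in table:
        if c == code and prec == 0:
            return useq
    return None


if __name__ == "__main__":
    placeholder = b"\xAA\xBB"  # hypothetical stand-in for a real carrier code
    table = vs_mappings(0x2600, placeholder)  # U+2600 BLACK SUN WITH RAYS
    # With fallbacks off, all three Unicode forms still reach the carrier code.
    for seq in [(0x2600,), (0x2600, TEXT_VS), (0x2600, EMOJI_VS)]:
        assert from_unicode(table, seq) == placeholder
    # The carrier code converts back to the emoji VS sequence only.
    print(" ".join(f"U+{c:04X}" for c in to_unicode(table, placeholder)))
```

Run as a script, it prints U+2600 U+FE0F. In this model, all three Unicode forms convert to the carrier code with fallbacks off, which is the effect the older tables got from the ICU hack described next.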

The older tables relied on a hack in the ICU conversion code that
ignored the "use fallback" flag for fallbacks from sequences with VS.

The new tables rely on a new feature in ICU4C 51:
For the relevant symbols that have roundtrip mappings,
- the mappings with the emoji variation selector (U+FE0F)
  use the |0 roundtrip precision
- the other mappings (no VS, or the text variation selector U+FE0E)
  use the |4 "good one-way" precision

See http://bugs.icu-project.org/trac/ticket/9602

== How the -2012.ucm tables were created in September 2012

The 2012 versions were created by
  http://code.google.com/p/emoji4unicode/source/browse/trunk/src/gen_conversion_files.py

using each of the 2007 versions as the base table files
to avoid non-Emoji changes:

icu_mappings=~/p4/emoji/google_vendor_src_branch/icu/source/data/mappings
dest=~/www/no_crawl/emoji
./gen_conversion_files.py $icu_mappings/docomo-shift_jis-2007.ucm
cp ../generated/docomo-shift_jis-2012.ucm $dest
./gen_conversion_files.py $icu_mappings/kddi-shift_jis-2007.ucm
cp ../generated/kddi-shift_jis-2012.ucm $dest
./gen_conversion_files.py $icu_mappings/softbank-shift_jis-2007.ucm
cp ../generated/softbank-shift_jis-2012.ucm $dest
./gen_conversion_files.py

The emoji4unicode code uses the mappings that were established during the
Unicode Emoji standardization process.
The new conversion tables round-trip carrier Emoji symbol codes
to and from Unicode 6.0 standard code points
and also include fallback mappings from the Google PUA code points
to the carrier codes.

The trailing "|0" etc.
on the mapping table lines specify the mapping type:
  |0  round-trip          Unicode <-> charset
  |1  fallback            Unicode  -> charset
  |3  "reverse fallback"  Unicode <-  charset
  |4  "good one-way"      Unicode  -> charset (ICU4C 51 and later)

For details about the .ucm file format, see
http://userguide.icu-project.org/conversion/data#TOC-.ucm-File-Format

== How the -2007.ucm tables were created

So far, we haven't obtained "official" conversion tables from the cell
phone carriers. However, we know empirically that their clients support
the vendor-defined characters (VDCs) in MS932, such as U+2460 (CIRCLED
DIGIT ONE). Hence we use MS932 as the base table for them.

kddi-jisx-208-2007.ucm is based on jisx-208.ucm in this directory.
The original table's mappings to codes 0x75xx to 0x7Bxx are excluded
to avoid collisions with emoji.

kddi-shift_jis-2007.ucm is based on windows-932-2000.ucm.
The original table's mappings to codes 0xEBxx to 0xEExx, and 0xF0xx to
0xF9xx (EUDC block), are excluded to avoid collisions with emoji.

docomo-shift_jis-2007.ucm is based on windows-932-2000.ucm.
The original table's mappings to codes 0xF0xx to 0xF9xx (EUDC block)
are excluded to avoid collisions with emoji.

softbank-shift_jis-2007.ucm is based on windows-932-2000.ucm.
The original table's mappings to codes 0xF0xx to 0xF9xx (EUDC block),
and 0xFBxx, are excluded to avoid collisions with emoji.

softbank-jisx-208-2007.ucm is based on jisx-208.ucm in this directory.
The original table's mappings to codes 0x75xx to 0x7Bxx, and 0x7Dxx,
are excluded to avoid collisions with emoji.

== Google Standard Emoji Unicode Mapping

The Google standard emoji Unicode mapping can be found at:

  /home/build/google3/i18n/encodings/emoji/emoji_unicode_mapping.txt


TODO(mscherer): Use <icu:base> to share most standard JIS mappings
among *-shift_jis-2007.ucm files.
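The exclusions described above for the -2007 tables amount to dropping every base-table mapping whose two-byte charset code starts with one of the listed lead bytes. A minimal Python sketch of such a filter follows. It is not the procedure that actually produced these files; the names EXCLUDED_LEAD_BYTES, charset_bytes, and filter_mappings are hypothetical, and it assumes single-code-point .ucm mapping lines of the form <U2460> \x87\x40 |0 (any other line passes through unchanged).

```python
import re
from typing import Iterable, Iterator, Optional

# Lead bytes excluded from each base table, per the -2007 notes.
EXCLUDED_LEAD_BYTES = {
    "kddi-shift_jis-2007": set(range(0xEB, 0xEF)) | set(range(0xF0, 0xFA)),
    "docomo-shift_jis-2007": set(range(0xF0, 0xFA)),
    "softbank-shift_jis-2007": set(range(0xF0, 0xFA)) | {0xFB},
    "kddi-jisx-208-2007": set(range(0x75, 0x7C)),
    "softbank-jisx-208-2007": set(range(0x75, 0x7C)) | {0x7D},
}

# Assumed shape of a single-code-point mapping line: <Uhhhh> \xHH\xHH |p
MAPPING_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)")


def charset_bytes(line: str) -> Optional[bytes]:
    """Return the charset bytes of a mapping line, or None for other lines."""
    m = MAPPING_LINE.match(line)
    if m is None:
        return None
    # "\x87\x40" -> ["", "87", "40"] -> b"\x87\x40"
    return bytes(int(h, 16) for h in m.group(2).split("\\x")[1:])


def filter_mappings(lines: Iterable[str], excluded: set[int]) -> Iterator[str]:
    """Yield lines, skipping two-byte mappings whose lead byte is excluded."""
    for line in lines:
        code = charset_bytes(line)
        if code is not None and len(code) == 2 and code[0] in excluded:
            continue
        yield line


if __name__ == "__main__":
    sample = [
        r"<U2460> \x87\x40 |0",  # CIRCLED DIGIT ONE: kept
        r"<UE000> \xF0\x40 |0",  # first EUDC code: dropped for DoCoMo
    ]
    for line in filter_mappings(sample, EXCLUDED_LEAD_BYTES["docomo-shift_jis-2007"]):
        print(line)
```

Run as a script, it prints only the U+2460 line. Filtering on the lead byte alone matches how the notes describe the excluded ranges (0xF0xx to 0xF9xx and so on).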