Copyright (c) 2002-2010, International Business Machines Corporation and others. All Rights Reserved.


IMPORTANT:

This sample was originally intended as an exercise for the ICU Workshop (September 2000).
The code currently provided in the solution file is the answer to the exercises; each step can still be found in the 'answers' subdirectory.



  http://www.icu-project.org/docs/workshop_2000/agenda.html

  Day 2: September 12th 2000
  Prerequisites:
  1. All the hardware and software requirements from Day 1.
  2. Attendance at Day 1, or a full understanding of its material.
  3. A read-through of the ICU User's Guide at
  http://www.icu-project.org/userguide/.

  #Transformation Support
  10:45am - 12:00pm
  Alan Liu

  Topics:
  1. What is Unicode normalization?
  2. What kind of case mapping support is available in ICU?
  3. What is transliteration, and how do I use a Transliterator on a document?
  4. How do I add my own Transliterator?


INSTRUCTIONS
------------

This exercise was developed and tested on ICU release 1.6.0, Win32,
Microsoft Visual C++ 6.0.  It should work on other ICU releases and
other platforms as well.

 MSVC:
   Open the file "translit.sln" in Microsoft Visual C++.

 Unix:
   - Build and install ICU with a prefix, for example '--prefix=/home/srl/ICU'
   - Set the variable ICU_PREFIX=/home/srl/ICU and use GNU make in
        this directory.
   - You may use 'make check' to invoke this sample.


PROBLEMS
--------

Problem 0:

  To start with, the program prints out a series of dates formatted in
  Greek.  Set up the program, build it, and run it.

Problem 1: Basic Transliterator (Easy)

  The Greek text shows up almost entirely as Unicode escapes.  These
  are unreadable on a US machine.  Use an existing system
  transliterator to transliterate the Greek text to Latin so it can be
  phonetically read on a US machine.  If you don't know the names of
  the system transliterators, use Transliterator::getAvailableID() and
  Transliterator::countAvailableIDs(), or look directly in the index
  table icu/data/translit_index.txt.

Problem 2: RuleBasedTransliterator (Medium)

  Some of the text is still unreadable and shows up as Unicode escape
  sequences.  Create a RuleBasedTransliterator to change the
  unreadable characters to close ASCII equivalents.  For example, the
  rule "\u00C0 > A;" will change an 'A' with a grave accent to a plain
  'A'.

  To save typing, use UnicodeSets to handle ranges of characters.

  See the included file "U0080.pdf" for a table of the Latin-1
  Supplement block (U+0080 to U+00FF); the accented letters lie in the
  range U+00C0 to U+00FF.

Problem 3: Transliterator subclassing; Normalizer (Difficult)

  The rule-based approach is flexible and, in most cases, the best
  choice for creating a new transliterator.  Sometimes, however, a
  more elegant algorithmic solution is available.  Instead of typing
  in a list of rules, you can write C++ code to accomplish the desired
  transliteration.

  Use a Normalizer to remove accents from characters.  You will need
  to convert each character to a sequence of base and combining
  characters by applying canonical decomposition (NFD).  Then discard
  the combining characters (the accents, etc.), leaving the base
  character.  Wrap this all up in a subclass of the Transliterator
  class that overrides the pure virtual handleTransliterate() method.


ANSWERS
-------

The exercise includes answers.  These are in the "answers" directory,
and are numbered 1, 2, etc.  In some cases new files that the user
needs to create are included in the answers directory.

If you get stuck and you want to move to the next step, copy the
answers file into the main directory in order to proceed.  E.g.,
"main_1.cpp" contains the original "main.cpp" file.  "main_2.cpp"
contains the "main.cpp" file after problem 1.  Etc.


Have fun!