<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta http-equiv="content-type"
 content="text/html; charset=ISO-8859-1">
  <title>README For RBBI Tools</title>
<!-- Copyright (C) 2003-2004, International Business Machines Corporation and
     others. All Rights Reserved.
-->
</head>
<body>
<h3>What Are These Tools?</h3>
This directory contains two tools: WriteTablesToFiles, which converts
the Java BreakIterators into .brk files for ICU4C, and
BuildDictionaryFile, which builds the binary Thai word break
dictionary from a Unicode text file containing a list of Thai words.
The rest of this document describes how to use these tools.<br>
<h3>How To Build The ICU4C BreakIterator Files</h3>
The RuleBasedBreakIterator code was originally developed for ICU4J and
then ported to ICU4C. For various reasons, the code that compiled the
state tables from the rule text was hard to port. Instead, the
WriteTablesToFiles tool was written to read in the Java data and write
the .brk files that ICU4C reads. Later, the RBBI code was rewritten
for ICU4C, including the ability to compile the state tables from rules
stored in text files. This means that the WriteTablesToFiles tool is
now obsolete.<br>
<br>
<h3>How To Build The Thai Word Break Dictionary</h3>
The Thai word break code was originally developed for ICU4J and then
ported to ICU4C. The dictionary builder tool was never ported, so you
have to use the Java tool to build the dictionary file for ICU4C. On
the other hand, the rest of the ICU locale data was originally
developed for ICU4C, and a tool was written to convert the ICU4C locale
data to Java resource bundles for use by ICU4J. Consequently, the
process of building the Thai word break dictionary for ICU4C and
ICU4J is a bit convoluted. Here are the steps:<br>
<div style="margin-left: 40px;">
<ol>
  <li>Download and build both ICU4C and ICU4J on a <span
 style="font-weight: bold;">Big Endian</span> machine.<br>
  </li>
  <li>Run the following command line to build the Thai dictionary file:<br>
java -classpath $icu4j_root/classes
com.ibm.icu.dev.tool.rbbi.BuildDictionaryFile
$icu4j_root/src/com/ibm/icu/dev/data/thai6.ucs Unicode
$icu_root/source/data/brkitr/thai_dict.brk</li>
  <li>Rebuild the ICU4C resources.</li>
  <li>Rebuild the ICU4J ICULocaleData.jar file. (See <a
 href="/readme.html">the ICU4J readme file</a> for
instructions.)</li>
  <li>Move ICULocaleData.jar from $icu_root/source/data/locales/java to
$icu4j_root/src/com/ibm/icu/impl/data</li>
  <li>Build ICU4J's _resources target to unjar the new files.<br>
  </li>
</ol>
</div>
In the above, $icu_root is the root of your ICU4C source tree (for
example, "~/dev/icu") and $icu4j_root is the root of your ICU4J source
tree (for example, "~/dev/icu4j").<br>
<br>
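As a rough sketch of steps 1 and 2, the shell script below reports the
host byte order and then prints the dictionary build command without
running it, so the paths can be reviewed first. The function names
host_byte_order and dictionary_command are made up for this
illustration, the default paths are only the example locations
mentioned above, and the script assumes a POSIX shell with printf, od,
and tr available.<br>
<pre>
```shell
#!/bin/sh
# Assumed locations (the examples from this readme); override them by
# setting icu_root and icu4j_root in the environment.
icu_root="${icu_root:-$HOME/dev/icu}"
icu4j_root="${icu4j_root:-$HOME/dev/icu4j}"

# Report the host byte order by reading the bytes 01 00 as one 16-bit
# unit: a little-endian host sees 1, a big-endian host sees 256.
host_byte_order() {
    value=$(printf '\001\000' | od -An -tu2 | tr -d ' \n')
    if [ "$value" = "256" ]; then
        echo big
    else
        echo little
    fi
}

# Print (do not run) the command line from step 2.
dictionary_command() {
    echo java -classpath "$icu4j_root/classes" \
        com.ibm.icu.dev.tool.rbbi.BuildDictionaryFile \
        "$icu4j_root/src/com/ibm/icu/dev/data/thai6.ucs" Unicode \
        "$icu_root/source/data/brkitr/thai_dict.brk"
}

# Step 1 asks for a big-endian machine, so flag anything else.
if [ "$(host_byte_order)" != "big" ]; then
    echo "warning: this host is little-endian; step 1 asks for a big-endian machine"
fi

dictionary_command
```
</pre>
Once the printed command looks right, it can be pasted into the shell
to run it. Steps 3 through 6 depend on the ICU4C and ICU4J build
systems and are not covered by this sketch.<br>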
</body>
</html>