1# ******************************************************************************
2# *
3# *   Copyright (C) 1995-2013, International Business Machines
4# *   Corporation and others.  All Rights Reserved.
5# *
6# ******************************************************************************
7
8# If this converter alias table looks very confusing, a much easier to
9# understand view can be found at this demo:
10# http://demo.icu-project.org/icu-bin/convexp
11
12# IMPORTANT NOTE
13#
14# This file is not read directly by ICU. If you change it, you need to
15# run gencnval, and eventually run pkgdata to update the representation that
16# ICU uses for aliases. The gencnval tool will normally compile this file into
17# cnvalias.icu. The gencnval -v verbose option will help you when you edit
18# this file.
19
# Please be friendly to the rest of us who edit this table by
# keeping it free of tabs.
22
23# This is an alias file used by the character set converter.
24# A lot of converter information can be found in unicode/ucnv.h, but here
25# is more information about this file.
26# 
27# If you are adding a new converter to this list and want to include it in the
28# icu data library, please be sure to add an entry to the appropriate ucm*.mk file
29# (see ucmfiles.mk for more information).
30# 
31# Here is the file format using BNF-like syntax:
32#
33# converterTable ::= tags { converterLine* }
34# converterLine ::= converterName [ tags ] { taggedAlias* }'\n'
35# taggedAlias ::= alias [ tags ]
36# tags ::= '{' { tag+ } '}'
37# tag ::= standard['*']
38# converterName ::= [0-9a-zA-Z:_'-']+
39# alias ::= converterName
40#
41# Except for the converter name, aliases are case insensitive.
42# Names are separated by whitespace.
# Line continuation and comment syntax are similar to the GNU make syntax.
44# Any lines beginning with whitespace (e.g. U+0020 SPACE or U+0009 HORIZONTAL
45# TABULATION) are presumed to be a continuation of the previous line.
46# The # symbol starts a comment and the comment continues till the end of
47# the line.
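#
# As an illustration, a hypothetical entry (these names are made up and are
# not part of this table) using a comment and continuation lines might be:
#
#   example-converter   first-alias     # a comment runs to the end of the line
#                       second-alias    # leading whitespace continues the entry
#                       third-alias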
48#
49# The converter
50#
51# All names can be tagged by including a space-separated list of tags in
52# curly braces, as in ISO_8859-1:1987{IANA*} iso-8859-1 { MIME* } or
53# some-charset{MIME* IANA*}. The order of tags does not matter, and
54# whitespace is allowed between the tagged name and the tags list.
55#
56# The tags can be used to get standard names using ucnv_getStandardName().
57#
58# The complete list of recognized tags used in this file is defined in
59# the affinity list near the beginning of the file.
60#
# The * after the standard tag denotes that the previous alias is the
# preferred (default) charset name for that standard. Each converter can
# have only one such default charset name per standard.
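#
# A sketch with hypothetical names (not real converters or aliases):
#
#   example-converter { UTR22* }
#           Example-Charset { IANA* MIME* }  # default IANA and MIME name
#           example_cs { IANA }              # IANA alias, but not the default
#           x-example { JAVA* }              # default Java name
#
# Assuming such an entry existed, a call like
# ucnv_getStandardName("example_cs", "IANA", &status) would be expected to
# return "Example-Charset", the name marked with IANA*.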
64
65
66
67# The world is getting more complicated...
68# Supporting XML parsers, HTML, MIME, and similar applications
69# that mark encodings with a charset name can be difficult.
70# Many of these applications and operating systems will update
71# their codepages over time.
72
73# It means that a new codepage, one that differs from an
74# old one by changing a code point, e.g., to the Euro sign,
75# must not get an old alias, because it would mean that
76# old files with this alias would be interpreted differently.
77
# If a codepage gets updated by assigning characters to previously
79# unassigned code points, then a new name is not necessary.
80# Also, some codepages map unassigned codepage byte values
81# to the same numbers in Unicode for roundtripping. It may be
82# industry practice to keep the encoding name in such a case, too
83# (example: Windows codepages).
84
85# The aliases listed in the list of character sets
86# that is maintained by the IANA (http://www.iana.org/) must
87# not be changed to mean encodings different from what this
88# list shows. Currently, the IANA list is at
89# http://www.iana.org/assignments/character-sets
90# It should also be mentioned that the exact mapping table used for each
# IANA name usually isn't specified. This means that some other applications
92# and operating systems are left to interpret the exact mappings for the
93# underspecified aliases. For instance, Shift-JIS on a Solaris platform
94# may be different from Shift-JIS on a Windows platform. This is why
95# some of the aliases can be tagged to differentiate different mapping
96# tables with the same alias. If an alias is given to more than one converter,
97# it is considered to be an ambiguous alias, and the affinity list will
98# choose the converter to use when a standard isn't specified with the alias.
99
100# Name matching is case-insensitive. Also, dashes '-', underscores '_'
101# and spaces ' ' are ignored in names (thus cs-iso_latin-1, csisolatin1
102# and "cs iso latin 1" are the same).
103# However, the names in the left column are directly file names
104# or names of algorithmic converters, and their case must not
105# be changed - or else code and/or file names must also be changed.
106# For example, the converter ibm-921 is expected to be the file ibm-921.cnv.
107
108
109
110# The immediately following list is the affinity list of supported standard tags.
111# When multiple converters have the same alias under different standards,
112# the standard nearest to the top of this list with that alias will
113# be the first converter that will be opened. The ordering of the aliases
114# after this affinity list does not affect the preferred alias, but it may
115# affect the order of the returned list of aliases for a given converter.
116#
117# The general ordering is from specific and frequently used to more general
118# or rarely used at the bottom.
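#
# For example, assuming two hypothetical entries (not part of this table)
#
#   converter-a     shared-name { IBM }
#   converter-b     shared-name { MIME }
#
# the alias shared-name is ambiguous. IBM appears above MIME in the list
# below, so opening shared-name without specifying a standard would be
# expected to select converter-a.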
119{   UTR22           # Name format specified by http://www.unicode.org/unicode/reports/tr22/
120    # ICU             # Can also use ICU_FEATURE
121    IBM             # The IBM CCSID number is specified by ibm-*
122    WINDOWS         # The Microsoft code page identifier number is specified by windows-*. The rest are recognized IE names.
123    JAVA            # Source: Sun JDK. Alias name case is ignored, but dashes are not ignored.
124    # GLIBC
125    # AIX
126    # DB2
127    # SOLARIS
128    # APPLE
129    # HPUX
130    IANA            # Source: http://www.iana.org/assignments/character-sets
131    MIME            # Source: http://www.iana.org/assignments/character-sets
132    # MSIE            # MSIE is Internet Explorer, which can be different from Windows (From the IMultiLanguage COM interface)
133    # ZOS_USS         # z/OS (os/390) Unix System Services (USS), which has NL<->LF swapping. They have the same format as the IBM tag.
134    }
135
136
137
138# Fully algorithmic converters
139
140UTF-8 { IANA* MIME* JAVA* WINDOWS }
141                                ibm-1208 { IBM* } # UTF-8 with IBM PUA
142                                ibm-1209 { IBM }  # UTF-8
143                                ibm-5304 { IBM }  # Unicode 2.0, UTF-8 with IBM PUA
144                                ibm-5305 { IBM }  # Unicode 2.0, UTF-8
145                                ibm-13496 { IBM } # Unicode 3.0, UTF-8 with IBM PUA
146                                ibm-13497 { IBM } # Unicode 3.0, UTF-8
147                                ibm-17592 { IBM } # Unicode 4.0, UTF-8 with IBM PUA
148                                ibm-17593 { IBM } # Unicode 4.0, UTF-8
149                                windows-65001 { WINDOWS* }
150                                cp1208
151                                x-UTF_8J
152                                unicode-1-1-utf-8
153                                unicode-2-0-utf-8
154
155# The ICU 2.2 UTF-16/32 converters detect and write a BOM.
156UTF-16 { IANA* MIME* JAVA* }    ISO-10646-UCS-2 { IANA }
157                                ibm-1204 { IBM* } # UTF-16 with IBM PUA and BOM sensitive
158                                ibm-1205 { IBM }  # UTF-16 BOM sensitive
159                                unicode
160                                csUnicode
161                                ucs-2
162# The following Unicode CCSIDs (IBM) are not valid in ICU because they are
163# considered pure DBCS (exactly 2 bytes) of Unicode,
164# and they are a subset of Unicode. ICU does not support their encoding structures.
165# 1400 1401 1402 1410 1414 1415 1446 1447 1448 1449 64770 64771 65520 5496 5497 5498 9592 13688
166UTF-16BE { IANA* MIME* JAVA* }  x-utf-16be { JAVA }
167                                UnicodeBigUnmarked { JAVA } # java.io name
168                                ibm-1200 { IBM* } # UTF-16 BE with IBM PUA
169                                ibm-1201 { IBM }  # UTF-16 BE
170                                ibm-13488 { IBM } # Unicode 2.0, UTF-16 BE with IBM PUA
171                                ibm-13489 { IBM } # Unicode 2.0, UTF-16 BE
172                                ibm-17584 { IBM } # Unicode 3.0, UTF-16 BE with IBM PUA
173                                ibm-17585 { IBM } # Unicode 3.0, UTF-16 BE
174                                ibm-21680 { IBM } # Unicode 4.0, UTF-16 BE with IBM PUA
175                                ibm-21681 { IBM } # Unicode 4.0, UTF-16 BE
176                                ibm-25776 { IBM } # Unicode 4.1, UTF-16 BE with IBM PUA
177                                ibm-25777 { IBM } # Unicode 4.1, UTF-16 BE
178                                ibm-29872 { IBM } # Unicode 5.0, UTF-16 BE with IBM PUA
179                                ibm-29873 { IBM } # Unicode 5.0, UTF-16 BE
180                                ibm-61955 { IBM } # UTF-16BE with Gaidai University (Japan) PUA
181                                ibm-61956 { IBM } # UTF-16BE with Microsoft HKSCS-Big 5 PUA
182                                windows-1201 { WINDOWS* }
183                                cp1200
184                                cp1201
185                                UTF16_BigEndian
186                                # ibm-5297 { IBM }  # Unicode 2.0, UTF-16 (BE) (reserved, never used)
187                                # iso-10646-ucs-2 { JAVA } # This is ambiguous
188                                # ibm-61952 is not a valid CCSID because it's Unicode 1.1
189                                # ibm-61953 is not a valid CCSID because it's Unicode 1.0
190UTF-16LE { IANA* MIME* JAVA* }  x-utf-16le { JAVA }
191                                UnicodeLittleUnmarked { JAVA } # java.io name
192                                ibm-1202 { IBM* } # UTF-16 LE with IBM PUA
193                                ibm-1203 { IBM }  # UTF-16 LE
194                                ibm-13490 { IBM } # Unicode 2.0, UTF-16 LE with IBM PUA
195                                ibm-13491 { IBM } # Unicode 2.0, UTF-16 LE
196                                ibm-17586 { IBM } # Unicode 3.0, UTF-16 LE with IBM PUA
197                                ibm-17587 { IBM } # Unicode 3.0, UTF-16 LE
198                                ibm-21682 { IBM } # Unicode 4.0, UTF-16 LE with IBM PUA
199                                ibm-21683 { IBM } # Unicode 4.0, UTF-16 LE
200                                ibm-25778 { IBM } # Unicode 4.1, UTF-16 LE with IBM PUA
201                                ibm-25779 { IBM } # Unicode 4.1, UTF-16 LE
202                                ibm-29874 { IBM } # Unicode 5.0, UTF-16 LE with IBM PUA
203                                ibm-29875 { IBM } # Unicode 5.0, UTF-16 LE
204                                UTF16_LittleEndian
205                                windows-1200 { WINDOWS* }
206
207UTF-32 { IANA* MIME* }          ISO-10646-UCS-4 { IANA }
208                                ibm-1236 { IBM* } # UTF-32 with IBM PUA and BOM sensitive
209                                ibm-1237 { IBM }  # UTF-32 BOM sensitive
210                                csUCS4
211                                ucs-4
212UTF-32BE { IANA* }              UTF32_BigEndian
213                                ibm-1232 { IBM* } # UTF-32 BE with IBM PUA
214                                ibm-1233 { IBM }  # UTF-32 BE
215                                ibm-9424 { IBM }  # Unicode 4.1, UTF-32 BE with IBM PUA
216UTF-32LE { IANA* }              UTF32_LittleEndian
217                                ibm-1234 { IBM* } # UTF-32 LE, with IBM PUA
218                                ibm-1235 { IBM }  # UTF-32 LE
219
220# ICU-specific names for special uses
221UTF16_PlatformEndian
222UTF16_OppositeEndian
223
224UTF32_PlatformEndian
225UTF32_OppositeEndian
226
227
228# Java-specific, non-Unicode-standard UTF-16 variants.
229# These are in the Java "Basic Encoding Set (contained in lib/rt.jar)".
230# See the "Supported Encodings" at
231# http://java.sun.com/javase/6/docs/technotes/guides/intl/encoding.doc.html
232# or a newer version of this document.
233#
234# Aliases marked with { JAVA* } are canonical names for java.io and java.lang APIs.
235# Aliases marked with { JAVA } are canonical names for the java.nio API.
236#
237# "BOM" means the Unicode Byte Order Mark, which is the encoding-scheme-specific
238# byte sequence for U+FEFF.
239# "Reverse BOM" means the BOM for the sibling encoding scheme with the
240# opposite endianness. (LE<->BE)
241
242# "Sixteen-bit Unicode (or UCS) Transformation Format, big-endian byte order,
243# with byte-order mark"
244#
245# From Unicode: Writes BOM.
246# To Unicode: Detects and consumes BOM. 
247#   If there is a "reverse BOM", Java throws
248#   MalformedInputException: Incorrect byte-order mark.
249#   In this case, ICU4C sets a U_ILLEGAL_ESCAPE_SEQUENCE UErrorCode value
250#   and a UCNV_ILLEGAL UConverterCallbackReason.
UTF-16BE,version=1      UnicodeBig { JAVA* }
252
253# "Sixteen-bit Unicode (or UCS) Transformation Format, little-endian byte order,
254# with byte-order mark"
255#
256# From Unicode: Writes BOM.
257# To Unicode: Detects and consumes BOM. 
258#   If there is a "reverse BOM", Java throws
259#   MalformedInputException: Incorrect byte-order mark.
260#   In this case, ICU4C sets a U_ILLEGAL_ESCAPE_SEQUENCE UErrorCode value
261#   and a UCNV_ILLEGAL UConverterCallbackReason.
UTF-16LE,version=1      UnicodeLittle { JAVA* }  x-UTF-16LE-BOM { JAVA }
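
# As a byte-level illustration of the two entries above, assuming the behavior
# described in their comments, converting the single character U+0041 from
# Unicode would be expected to produce:
#
#   UTF-16BE,version=1:  FE FF 00 41   (BE BOM, then U+0041)
#   UTF-16LE,version=1:  FF FE 41 00   (LE BOM, then U+0041)
#
# When converting to Unicode, input that starts with FF FE given to
# UTF-16BE,version=1 begins with a "reverse BOM" and is reported as an error.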
263
264# This one is not mentioned on the "Supported Encodings" page
265# but is available in Java.
266# In Java, this is called "Unicode" but we cannot give it that alias
267# because the standard UTF-16 converter already has a "unicode" alias.
268#
269# From Unicode: Writes BOM.
270# To Unicode: Detects and consumes BOM.
271#   If there is no BOM, rather than defaulting to BE, Java throws
272#   MalformedInputException: Missing byte-order mark.
273#   In this case, ICU4C sets a U_ILLEGAL_ESCAPE_SEQUENCE UErrorCode value
274#   and a UCNV_ILLEGAL UConverterCallbackReason.
275UTF-16,version=1
276
277# This is the same as standard UTF-16 but always writes a big-endian byte stream,
278# regardless of the platform endianness, as expected by the Java compatibility tests.
279# See the java.nio.charset.Charset API documentation at
280# http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html
281# or a newer version of this document.
282#
283# From Unicode: Write BE BOM and BE bytes
284# To Unicode: Detects and consumes BOM. Defaults to BE.
285UTF-16,version=2
286
287# Note: ICU does not currently support Java-specific, non-Unicode-standard UTF-32 variants.
288# Presumably, these behave analogously to the UTF-16 variants with similar names.
289# UTF_32BE_BOM  x-UTF-32BE-BOM
290# UTF_32LE_BOM  x-UTF-32LE-BOM
291
292# End of Java-specific, non-Unicode-standard UTF variants.
293
294
295# Chrome: Remove all the entries for UTF-7, SCSU, BOCU, CESU-8.
296
297# Standard iso-8859-1, which does not have the Euro update.
298# See iso-8859-15 (latin9) for the Euro update
299ISO-8859-1 { MIME* IANA JAVA* }
    ibm-819 { IBM* JAVA }    # This is not truly ibm-819 because it's missing the fallbacks.
301    IBM819 { IANA }
302    cp819 { IANA JAVA }
303    latin1 { IANA JAVA }
304    8859_1 { JAVA }
305    csISOLatin1 { IANA JAVA }
306    iso-ir-100 { IANA JAVA }
307    ISO_8859-1:1987 { IANA* JAVA }
308    l1 { IANA JAVA }
309    819 { JAVA }
310    # windows-28591 { WINDOWS* } # This has odd behavior because it has the Euro update, which isn't correct.
311    # LATIN_1     # Old ICU name
312    # ANSI_X3.110-1983  # This is for a different IANA alias.  This isn't iso-8859-1.
313
314US-ASCII { MIME* IANA JAVA WINDOWS }
315    ASCII { JAVA* IANA WINDOWS }
316    ANSI_X3.4-1968 { IANA* WINDOWS }
317    ANSI_X3.4-1986 { IANA WINDOWS }
318    ISO_646.irv:1991 { IANA WINDOWS }
319    iso_646.irv:1983 { JAVA }
320    ISO646-US { JAVA IANA WINDOWS }
321    us { IANA }
322    csASCII { IANA WINDOWS }
323    iso-ir-6 { IANA }
324    cp367 { IANA WINDOWS }
325    ascii7 { JAVA }
326    646 { JAVA }
327    windows-20127 { WINDOWS* }
    ibm-367 { IBM* } IBM367 { IANA WINDOWS } # This is not truly ibm-367 because it's missing the fallbacks.
329
330# GB 18030 is partly algorithmic, using the MBCS converter
331# Chrome: HTML5 GBK an alias for GB18030
332#         TODO(jshin): Decide if Chrome should follow spec. crbug.com/339862
333gb18030 { IANA* }       ibm-1392 { IBM* } windows-54936 { WINDOWS* } gb18030 { MIME* }
334
335# Table-based interchange codepages
336
337# Central Europe
338ibm-912_P100-1995 { UTR22* }
339                        ibm-912 { IBM* JAVA }
340                        ISO-8859-2 { MIME* IANA JAVA* WINDOWS }
341                        ISO_8859-2:1987 { IANA* WINDOWS JAVA }
342                        latin2 { IANA WINDOWS JAVA }
343                        csISOLatin2 { IANA WINDOWS JAVA }
344                        iso-ir-101 { IANA WINDOWS JAVA }
345                        l2 { IANA WINDOWS JAVA }
346                        8859_2 { JAVA }
347                        cp912 { JAVA }
348                        912 { JAVA }
349                        windows-28592 { WINDOWS* }
350
351# Maltese Esperanto
352ibm-913_P100-2000 { UTR22* }
353                        ibm-913 { IBM* JAVA }
354                        ISO-8859-3 { MIME* IANA WINDOWS JAVA* }
355                        ISO_8859-3:1988 { IANA* WINDOWS JAVA }
356                        latin3 { IANA JAVA WINDOWS }
357                        csISOLatin3 { IANA WINDOWS }
358                        iso-ir-109 { IANA WINDOWS JAVA }
359                        l3 { IANA WINDOWS JAVA }
360                        8859_3 { JAVA }
361                        cp913 { JAVA }
362                        913 { JAVA }
363                        windows-28593 { WINDOWS* }
364
365# Baltic
366ibm-914_P100-1995 { UTR22* }
367                        ibm-914 { IBM* JAVA }
368                        ISO-8859-4 { MIME* IANA WINDOWS JAVA* }
369                        latin4 { IANA WINDOWS JAVA }
370                        csISOLatin4 { IANA WINDOWS JAVA }
371                        iso-ir-110 { IANA WINDOWS JAVA }
372                        ISO_8859-4:1988 { IANA* WINDOWS JAVA }
373                        l4 { IANA WINDOWS JAVA }
374                        8859_4 { JAVA }
375                        cp914 { JAVA }
376                        914 { JAVA }
377                        windows-28594 { WINDOWS* }
378
379# Cyrillic
380ibm-915_P100-1995 { UTR22* }
381                        ibm-915 { IBM* JAVA }
382                        ISO-8859-5 { MIME* IANA WINDOWS JAVA* }
383                        cyrillic { IANA WINDOWS JAVA }
384                        csISOLatinCyrillic { IANA WINDOWS JAVA }
385                        iso-ir-144 { IANA WINDOWS JAVA }
386                        ISO_8859-5:1988 { IANA* WINDOWS JAVA }
387                        8859_5 { JAVA }
388                        cp915 { JAVA }
389                        915 { JAVA }
390                        windows-28595 { WINDOWS* }
391
392# Arabic
393# ISO_8859-6-E and ISO_8859-6-I are similar to this charset, but BiDi is done differently
394# From a narrow mapping point of view, there is no difference.
395# -E means explicit. -I means implicit.
396# -E requires the client to handle the ISO 6429 bidirectional controls
397ibm-1089_P100-1995 { UTR22* }
398                        ibm-1089 { IBM* JAVA }
399                        ISO-8859-6 { MIME* IANA WINDOWS JAVA* }
400                        arabic { IANA WINDOWS JAVA }
401                        csISOLatinArabic { IANA WINDOWS JAVA }
402                        iso-ir-127 { IANA WINDOWS JAVA }
403                        ISO_8859-6:1987 { IANA* WINDOWS JAVA }
404                        ECMA-114 { IANA JAVA }
405                        ASMO-708 { IANA JAVA }
406                        8859_6 { JAVA }
407                        cp1089 { JAVA }
408                        1089 { JAVA }
409                        windows-28596 { WINDOWS* }
410                        ISO-8859-6-I { IANA MIME } # IANA considers this alias different and BiDi needs to be applied.
411                        ISO-8859-6-E { IANA MIME } # IANA considers this alias different and BiDi needs to be applied.
412                        x-ISO-8859-6S { JAVA }
413
414# ISO Greek (with euro update). This is really ISO_8859-7:2003
415ibm-9005_X110-2007 { UTR22* }
416                        ibm-9005 { IBM* }
417                        ISO-8859-7 { MIME* IANA JAVA* WINDOWS }
418                        8859_7 { JAVA }
419                        greek { IANA JAVA WINDOWS }
420                        greek8 { IANA JAVA WINDOWS }
421                        ELOT_928 { IANA JAVA WINDOWS }
422                        ECMA-118 { IANA JAVA WINDOWS }
423                        csISOLatinGreek { IANA JAVA WINDOWS }
424                        iso-ir-126 { IANA JAVA WINDOWS }
425                        ISO_8859-7:1987 { IANA* JAVA WINDOWS }
426                        windows-28597 { WINDOWS* }
427                        sun_eu_greek # For Solaris
428
# Hebrew
430# ISO_8859-8-E and ISO_8859-8-I are similar to this charset, but BiDi is done differently
431# From a narrow mapping point of view, there is no difference.
432# -E means explicit. -I means implicit.
433# -E requires the client to handle the ISO 6429 bidirectional controls
434# This matches the official mapping on unicode.org
435ibm-5012_P100-1999 { UTR22* }
436                        ibm-5012 { IBM* }
437                        ISO-8859-8 { MIME* IANA WINDOWS JAVA* }
438                        hebrew { IANA WINDOWS JAVA }
439                        csISOLatinHebrew { IANA WINDOWS JAVA }
440                        iso-ir-138 { IANA WINDOWS JAVA }
441                        ISO_8859-8:1988 { IANA* WINDOWS JAVA }
                        ISO-8859-8-I { IANA MIME } # IANA and Windows consider this alias different and BiDi needs to be applied.
                        ISO-8859-8-E { IANA MIME } # IANA and Windows consider this alias different and BiDi needs to be applied.
444                        8859_8 { JAVA }
445                        windows-28598 { WINDOWS* } # Hebrew (ISO-Visual). A hybrid between ibm-5012 and ibm-916 with extra PUA mappings.
446                        hebrew8 # Reflect HP-UX code page update
447
448# Turkish
449# Chrome: ISO-8859-9 and its aliases are moved to windows-1254 per
450# HTML5. 
451ibm-920_P100-1995 { UTR22* }
452                        ibm-920 { IBM* JAVA }
453                        ISO-8859-9
454                        latin5
455                        csISOLatin5
456                        iso-ir-148 
457                        ISO_8859-9:1989
458                        l5
459                        cp920 { JAVA }
460                        920 { JAVA }
461                        windows-28599 { WINDOWS* }
462                        ECMA-128    # IANA doesn't have this alias 6/24/2002
463                        turkish8    # Reflect HP-UX codepage update 8/1/2008
464                        turkish     # Reflect HP-UX codepage update 8/1/2008
465
466# Nordic languages
467iso-8859_10-1998 { UTR22* } ISO-8859-10 { MIME* IANA* }
468                        iso-ir-157 { IANA }
469                        l6 { IANA }
470                        ISO_8859-10:1992 { IANA }
471                        csISOLatin6 { IANA }
472                        latin6 { IANA }
473
474# Thai
# Be warned. There are several iso-8859-11 codepage variants, and they are all incompatible.
476# ISO-8859-11 is a superset of TIS-620. The difference is that ISO-8859-11 contains the C1 control codes.
477iso-8859_11-2001 { UTR22* } ISO-8859-11
478                        thai8 # HP-UX alias. HP-UX says TIS-620, but it's closer to ISO-8859-11.
479                        x-iso-8859-11 { JAVA* }
480
481# iso-8859-13, PC Baltic (w/o euro update)
482ibm-921_P100-1995 { UTR22* }
483                        ibm-921 { IBM* }
484                        ISO-8859-13 { IANA* MIME* JAVA* }
485                        8859_13 { JAVA }
486                        windows-28603 { WINDOWS* }
487                        cp921
488                        921
489                        x-IBM921 { JAVA }
490
491# Celtic
492iso-8859_14-1998 { UTR22* } ISO-8859-14 { IANA* }
493                        iso-ir-199 { IANA }
494                        ISO_8859-14:1998 { IANA }
495                        latin8 { IANA }
496                        iso-celtic { IANA }
497                        l8 { IANA }
498
499# Latin 9
500ibm-923_P100-1998 { UTR22* }
501                        ibm-923 { IBM* JAVA }
502                        ISO-8859-15 { IANA* MIME* WINDOWS JAVA* }
503                        Latin-9 { IANA WINDOWS }
504                        l9 { WINDOWS }
505                        8859_15 { JAVA }
506                        latin0 { JAVA }
507                        csisolatin0 { JAVA }
508                        csisolatin9 { JAVA }
509                        iso8859_15_fdis { JAVA }
510                        cp923 { JAVA }
511                        923 { JAVA }
512                        windows-28605 { WINDOWS* }
513
514# CJK encodings
515
# Chrome: Instead of ibm-943_P15A-2003, we use what's specified in the WHATWG
# encoding standard (HTML5) for Shift_JIS. Keep all the aliases (even though
# not all of them are required by the encoding spec) for now.
519
520shift_jis-html5 { UTR22* }
521                        ibm-943 # Leave untagged because this isn't the default
522                        Shift_JIS { IANA* MIME* WINDOWS JAVA }
523                        MS_Kanji { IANA WINDOWS JAVA }
524                        csShiftJIS { IANA WINDOWS JAVA }
525                        windows-31j { IANA JAVA } # A further extension of Shift_JIS to include NEC special characters (Row 13)
526                        csWindows31J { IANA WINDOWS JAVA } # A further extension of Shift_JIS to include NEC special characters (Row 13)
527                        x-sjis { WINDOWS JAVA }
528                        x-ms-cp932 { WINDOWS }
529                        cp932 { WINDOWS }
530                        windows-932 { WINDOWS* }
531                        cp943c { JAVA* }    # This is slightly different, but the backslash mapping is the same.
                        IBM-943C #{ AIX* } # Add this tag once AIX aliases become available
533                        ms932
534                        pck     # Probably SOLARIS
535                        sjis    # This might be for ibm-1351
536                        ibm-943_VSUB_VPUA
537                        x-MS932_0213 { JAVA }
538                        x-JISAutoDetect { JAVA }
539
540# Chrome: Instead of ibm-33722_P*, we use what's specified in the WHATWG
# encoding standard (HTML5). All the 3-byte sequences in the normative
# EUC-JP are now decode-only.
543euc-jp-html5 { UTR22* }
                        EUC-JP { MIME* IANA JAVA* WINDOWS* }
545                        Extended_UNIX_Code_Packed_Format_for_Japanese { IANA* JAVA WINDOWS }
546                        csEUCPkdFmtJapanese { IANA JAVA WINDOWS }
547                        windows-51932 { WINDOWS }
548                        X-EUC-JP { MIME JAVA WINDOWS }   # Japan EUC. x-euc-jp is a MIME name
                        eucjis { JAVA }
550                        ujis # Linux sometimes uses this name. This is an unfortunate generic and rarely used name. Its use is discouraged.
551
552
553windows-950-2000 { UTR22* }
554                        Big5 { IANA* MIME* JAVA* WINDOWS }
555                        csBig5 { IANA WINDOWS }
556                        windows-950 { WINDOWS* }
557                        x-windows-950 { JAVA }
558                        x-big5
559                        ms950
560ibm-1375_P100-2007 { UTR22* }   # Big5-HKSCS-2004 with Unicode 3.1 mappings. This uses supplementary characters.
561                        ibm-1375 { IBM* }
562                        Big5-HKSCS { IANA* JAVA* }
563                        big5hk { JAVA }
564                        HKSCS-BIG5  # From http://www.openi18n.org/localenameguide/
565
566# Chrome: HTML5 has big5-hkscs as an alias for big5
567#         TODO(jshin): Decide if Chrome should follow spec. crbug.com/277040
568ibm-5471_P100-2006 { UTR22* }   # Big5-HKSCS-2001 with Unicode 3.0 mappings. This uses many PUA characters.
569                        ibm-5471 { IBM* }
570                        Big5-HKSCS
571                        MS950_HKSCS { JAVA* }
572                        hkbig5 # from HP-UX 11i, which can't handle supplementary characters.
573                        big5-hkscs:unicode3.0
574                        x-MS950-HKSCS { JAVA }
575                        # windows-950 # Windows-950 can be w/ or w/o HKSCS extensions. By default it's not.
576                        # windows-950_hkscs
577# GBK
578# Chrome: Added 4 GB2312 aliases and EUC-CN to Windows-936 to reflect the
579#         reality of the web (GB2312 is treated synonymously with its
580#         superset, Windows-936/GBK)
581#         All the aliases listed for this converter (windows-936-2000)
582#         are removed from the list of aliases for other simplified Chinese
583#         converters above.
584#         HTML5 makes GBK an alias for GB18030
585#         TODO(jshin): Decide if Chrome should follow spec. crbug.com/339862
586windows-936-2000 { UTR22* }
587                        GB2312 { IANA MIME }
588                        GBK { IANA* MIME* WINDOWS JAVA* }
589                        CP936 { IANA JAVA }
590                        MS936 { IANA }  # In JDK 1.5, this goes to x-mswin-936. This is an IANA name split.
591                        windows-936 { IANA WINDOWS* JAVA }
592                        chinese { IANA }
593                        iso-ir-58 { IANA }
594                        gb2312-1980
595                        EUC-CN
596                        csGB2312 { IANA }
597                        GB_2312-80 { IANA }
598
599
600# Chrome: ibm-5478 and ibm-949 are replaced by noop-gb2312_gl and windows-949
601# (ksc_5601), respectively, in ucnv2022.c
602
603# Korean EUC.
604
605# Chrome: Windows-949 is not EUC-KR, but a superset of EUC-KR with 8,822
606#         additional Hangul syllables. However, the reality of the web
#         and HTML5 require that we treat EUC-KR as a
#         synonym of windows-949.
609#         All the aliases listed for this converter (windows-949-2000)
610#         are removed from the list of aliases for other Korean converters
611#         above.
612windows-949-2000 { UTR22* }
613                        windows-949 { JAVA* WINDOWS* }
614                        EUC-KR { IANA* MIME* WINDOWS }
615                        KS_C_5601-1987 { WINDOWS IANA }
616                        KS_C_5601-1989 { WINDOWS IANA }
617                        KSC_5601 { IANA WINDOWS } # Needed by iso-2022
618                        csKSC56011987 { WINDOWS }
619                        korean { IANA WINDOWS }
620                        iso-ir-149 { IANA WINDOWS }
621                        csEUCKR { IANA WINDOWS }
622
# Chrome: TIS-620, ISO-8859-11 and Windows-874 are slightly different from
624# each other, but they're used as if they're identical on the web. This is
625# also per HTML5.
626windows-874-2000 { UTR22* }   # Thai (w/ euro update)
627                        TIS-620 { IANA* WINDOWS MIME* }
628                        windows-874 { JAVA* WINDOWS* MIME }
629                        MS874 { JAVA }
630                        x-windows-874 { JAVA }
631                        iso-8859-11 { IANA WINDOWS MIME } # iso-8859-11 is similar to TIS-620. ibm-13162 is a closer match.
632
633# Platform codepages
634# Chrome: Only keep ibm-878 for KOI8-R, ibm-1168 for KOI8-U and ibm-866
635ibm-878_P100-1996 { UTR22* }    ibm-878 { IBM* } KOI8-R { IANA* MIME* WINDOWS JAVA* } koi8 { WINDOWS JAVA } csKOI8R { IANA WINDOWS JAVA } windows-20866 { WINDOWS* } cp878   # Russian internet
636# Chrome: Use the table from the WHATWG encoding standard (HTML5).
637ibm-866_html5-2012 { UTR22* }    ibm-866 { IBM* } IBM866 { IANA* MIME* JAVA } cp866 { IANA MIME WINDOWS JAVA* } 866 { IANA JAVA } csIBM866 { IANA JAVA } # PC Russian (w/o euro update)
638ibm-1168_P100-2002 { UTR22* }   ibm-1168 { IBM* } KOI8-U { IANA* WINDOWS } windows-21866 { WINDOWS* } # Ukrainian KOI8. koi8-ru != KOI8-U and Microsoft is wrong for aliasing them as the same.
639
640# The cp aliases in this section aren't really Windows aliases, but they were used by ICU for Windows.
641# cp is usually used to denote IBM in Java, and that is why we don't do that anymore.
642# The windows-* aliases mean windows codepages.
643ibm-5346_P100-1998 { UTR22* }   ibm-5346 { IBM* } windows-1250 { IANA* JAVA* WINDOWS* } cp1250 { WINDOWS JAVA } # Windows Latin2 (w/ euro update)
644ibm-5347_P100-1998 { UTR22* }   ibm-5347 { IBM* } windows-1251 { IANA* JAVA* WINDOWS* } cp1251 { WINDOWS JAVA } ANSI1251 # Windows Cyrillic (w/ euro update). ANSI1251 is from Solaris
645ibm-5348_P100-1997 { UTR22* }   ibm-5348 { IBM* } windows-1252 { IANA* JAVA* WINDOWS* } cp1252 { JAVA }         # Windows Latin1 (w/ euro update)
646ibm-5349_P100-1998 { UTR22* }   ibm-5349 { IBM* } windows-1253 { IANA* JAVA* WINDOWS* } cp1253 { JAVA }         # Windows Greek (w/ euro update)
647
648# Chrome: Make ISO-8859-9 an alias to windows-1254 per HTML5. Move
649#         other IANA aliases for ISO-8859-9 as well.
650ibm-5350_P100-1998 { UTR22* }   ibm-5350 { IBM* } windows-1254 { MIME* IANA* JAVA* WINDOWS* } cp1254 { JAVA }         # Windows Turkish (w/ euro update)
651                        ISO-8859-9 { MIME }
652                        latin5 { IANA }
653                        csISOLatin5 { IANA }
654                        iso-ir-148 { IANA }
655                        ISO_8859-9:1989 { IANA }
656                        l5 { IANA }
657                        8859_9 { JAVA }
658ibm-9447_P100-2002 { UTR22* }   ibm-9447 { IBM* } windows-1255 { IANA* JAVA* WINDOWS* } cp1255 { JAVA }         # Windows Hebrew (w/ euro update)
659ibm-9448_X100-2005 { UTR22* }   ibm-9448 { IBM* } windows-1256 { IANA* JAVA* WINDOWS* } cp1256 { WINDOWS JAVA } x-windows-1256S { JAVA } # Windows Arabic (w/ euro update)
660ibm-9449_P100-2002 { UTR22* }   ibm-9449 { IBM* } windows-1257 { IANA* JAVA* WINDOWS* } cp1257 { JAVA }         # Windows Baltic (w/ euro update)
661ibm-5354_P100-1998 { UTR22* }   ibm-5354 { IBM* } windows-1258 { IANA* JAVA* WINDOWS* } cp1258 { JAVA }         # Windows Vietnamese (w/ euro update)
662
663# Chrome: Only MacRoman and MacCyrillic are necessary for HTML5.
664macos-0_2-10.2 { UTR22* }       macintosh { IANA* MIME* WINDOWS } mac { IANA } csMacintosh { IANA } windows-10000 { WINDOWS* } macroman { JAVA } x-macroman { JAVA* } # Apple latin 1
665macos-7_3-10.2 { UTR22* }       x-mac-cyrillic { MIME* WINDOWS } windows-10007 { WINDOWS* } mac-cyrillic maccy x-MacCyrillic { JAVA } x-MacUkraine { JAVA* } # Apple Cyrillic
666
667# Partially algorithmic converters
668
669# [U_ENABLE_GENERIC_ISO_2022]
670# The _generic_ ISO-2022 converter is disabled starting 2003-dec-03 (ICU 2.8).
671# For details see the icu mailing list from 2003-dec-01 and the ucnv2022.c file.
672# Language-specific variants of ISO-2022 continue to be available as listed below.
673# ISO_2022                         ISO-2022
674
675# Chrome: The encoding standard only supports ISO-2022-JP and HZ-GB.
676# Keep ISO-2022-{KR,CN,CN-Ext} until we're sure what to do about
677# replacement encodings. See crbug.com/277037
678# TODO(jshin): Remove them when the bug is resolved.
679ISO_2022,locale=ja,version=0    ISO-2022-JP { IANA* MIME* JAVA* } csISO2022JP { IANA JAVA } x-windows-iso2022jp { JAVA } x-windows-50220 { JAVA }
680ISO_2022,locale=ko,version=0    ISO-2022-KR { IANA* MIME* JAVA* } csISO2022KR { IANA JAVA } # This uses ibm-949
681ISO_2022,locale=zh,version=0    ISO-2022-CN { IANA* JAVA* } csISO2022CN { JAVA } x-ISO-2022-CN-GB { JAVA }
682ISO_2022,locale=zh,version=1    ISO-2022-CN-EXT { IANA* }
683HZ                              HZ-GB-2312 { IANA* }
684
685# Chrome: HTML5 does not need ISCII.
686#         Remove all Lotus entries as well.
687
688# EBCDIC codepages according to the CDRA
689# Chrome: Removed all EBCDIC code pages.
690
691# These are not installed by default. They are rarely used.
692# Many of them can be added through the online ICU Data Library Customization tool.
693# Chrome: Removed all these entries except for ISO-8859-16 required by HTML5.
694
695iso-8859_16-2001 { UTR22* }     ISO-8859-16 { IANA* } iso-ir-226 { IANA } ISO_8859-16:2001 { IANA } latin10 { IANA } l10 { IANA }
696
697