README.txt

This directory contains source code for the SQLite "ICU" extension, an
integration of the "International Components for Unicode" library with
SQLite. Documentation follows.

    1. Features

        1.1  SQL Scalars upper() and lower()
        1.2  Unicode Aware LIKE Operator
        1.3  ICU Collation Sequences
        1.4  SQL REGEXP Operator

    2. Compilation and Usage

    3. Bugs, Problems and Security Issues

        3.1  The "case_sensitive_like" Pragma
        3.2  The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro
        3.3  Collation Sequence Security Issue


1. FEATURES

  1.1  SQL Scalars upper() and lower()

    SQLite's built-in implementations of these two functions only
    provide case mapping for the 26 letters used in the English
    language. The ICU based functions provided by this extension
    provide case mapping, where defined, for the full range of
    Unicode characters.

    ICU provides two types of case mapping, "general" case mapping and
    "language specific". Refer to ICU documentation for the differences
    between the two. Specifically:

       http://www.icu-project.org/userguide/caseMappings.html
       http://www.icu-project.org/userguide/posix.html#case_mappings

    To utilise "general" case mapping, the upper() or lower() scalar
    functions are invoked with one argument:

        upper('abc') -> 'ABC'
        lower('ABC') -> 'abc'

    To access ICU "language specific" case mapping, upper() or lower()
    should be invoked with two arguments. The second argument is the name
    of the locale to use. Passing an empty string ("") or SQL NULL value
    as the second argument is the same as invoking the one-argument
    version of upper() or lower():

        lower('I', 'en_us') -> 'i'
        lower('I', 'tr_tr') -> 'ı' (small dotless i)
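
    The one-argument form can be tried without ICU. The following is a
    minimal sketch, assuming Python's standard sqlite3 module; it
    overrides the built-in upper() and lower() with Python's own Unicode
    case mapping as a stand-in for ICU "general" case mapping. It is not
    the extension's code, and it does not cover the two-argument
    (locale) form.

```python
import sqlite3

def unicode_upper(value):
    # NULL in, NULL out; everything else uses Python's Unicode tables.
    return None if value is None else str(value).upper()

def unicode_lower(value):
    return None if value is None else str(value).lower()

conn = sqlite3.connect(":memory:")
# Stand-in for the ICU extension: replace the one-argument built-ins.
conn.create_function("upper", 1, unicode_upper)
conn.create_function("lower", 1, unicode_lower)

row = conn.execute("SELECT upper('école'), lower('ÉCOLE')").fetchone()
print(row)
```

    With the overrides in place, non-English letters such as 'é' are
    case mapped as well, which is the behaviour this section describes
    for the ICU based functions.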

  1.2  Unicode Aware LIKE Operator

    Similarly to the upper() and lower() functions, the built-in SQLite LIKE
    operator only understands case equivalence for the 26 letters of the
    English language alphabet. The implementation of LIKE included in this
    extension uses the ICU function u_foldCase() to provide case
    independent comparisons for the full range of Unicode characters.

    The U_FOLD_CASE_DEFAULT flag is passed to u_foldCase(), meaning the
    dotless 'I' character used in the Turkish language is considered
    to be in the same equivalence class as the dotted 'I' character
    used by many languages (including English).
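
    The idea of folding case on both operands before comparing can be
    sketched as follows. This is an illustration under stated
    assumptions: it uses Python's sqlite3 module, and str.casefold()
    stands in for u_foldCase(), so edge cases may differ from ICU. The
    helper names like_to_regex and folded_like are hypothetical.

```python
import re
import sqlite3

def like_to_regex(pattern):
    """Translate LIKE wildcards ('%' and '_') into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")      # any run of characters
        elif ch == "_":
            parts.append(".")       # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def folded_like(pattern, text):
    """LIKE stand-in: case-fold both operands, then match the whole string."""
    if pattern is None or text is None:
        return None
    regex = like_to_regex(str(pattern).casefold())
    return 1 if regex.fullmatch(str(text).casefold()) else 0

conn = sqlite3.connect(":memory:")
# SQLite evaluates "X LIKE Y" as like(Y, X), so the pattern arrives first.
conn.create_function("like", 2, folded_like)

matches = conn.execute(
    "SELECT 'ÉCOLE' LIKE 'éco%', 'ÉCOLE' LIKE 'eco%'"
).fetchone()
print(matches)
```

    The first comparison succeeds because 'É' and 'é' fold to the same
    character; the second fails because 'e' and 'é' remain distinct
    after folding.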

  1.3  ICU Collation Sequences

    A special SQL scalar function, icu_load_collation(), is provided that
    may be used to register ICU collation sequences with SQLite. It
    is always called with exactly two arguments: the ICU locale
    identifying the collation sequence to ICU, and the name of the
    SQLite collation sequence to create. For example, to create an
    SQLite collation sequence named "turkish" using Turkish language
    sorting rules, use the SQL statement:

        SELECT icu_load_collation('tr_TR', 'turkish');

    Or, for Australian English:

        SELECT icu_load_collation('en_AU', 'australian');

    The identifiers "turkish" and "australian" may then be used
    as collation sequence identifiers in SQL statements:

        CREATE TABLE aust_turkish_penpals(
          australian_penpal_name TEXT COLLATE australian,
          turkish_penpal_name    TEXT COLLATE turkish
        );
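
    To show how a named collation changes sort order once it is
    registered, here is a small sketch that does not need ICU. It
    assumes Python's sqlite3 module and registers a hand-written
    "turkish" collation based on the lower-case Turkish alphabet. The
    table, the sample names and the comparison function are
    hypothetical stand-ins for what icu_load_collation('tr_TR',
    'turkish') would provide; only lower-case input is handled.

```python
import sqlite3

# Lower-case Turkish alphabet in dictionary order (hand-written stand-in).
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
RANK = {ch: i for i, ch in enumerate(TURKISH_ALPHABET)}

def sort_key(text):
    # Characters outside the alphabet sort after it, by code point.
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in text]

def turkish_compare(a, b):
    """Collation callback: negative, zero or positive, like strcmp()."""
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(":memory:")
conn.create_collation("turkish", turkish_compare)
conn.execute("CREATE TABLE penpals(name TEXT COLLATE turkish)")
conn.executemany(
    "INSERT INTO penpals VALUES (?)",
    [("zeki",), ("çağan",), ("cem",), ("ılgaz",), ("hakan",)],
)

# The column's declared collation drives ORDER BY.
by_turkish = [r[0] for r in conn.execute(
    "SELECT name FROM penpals ORDER BY name")]
# Byte-wise ordering, for comparison.
by_binary = [r[0] for r in conn.execute(
    "SELECT name FROM penpals ORDER BY name COLLATE BINARY")]
print(by_turkish)
print(by_binary)
```

    Under the stand-in collation 'ç' sorts directly after 'c' and 'ı'
    directly after 'h', whereas the binary ordering pushes both behind
    'z'.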

  1.4  SQL REGEXP Operator

    This extension provides an implementation of the SQL binary
    comparison operator "REGEXP", based on the regular expression functions
    provided by the ICU library. The syntax of the operator is as described
    in SQLite documentation:

        <string> REGEXP <re-pattern>

    This extension uses the ICU defaults for regular expression matching
    behaviour. Specifically, this means that:

        * Matching is case-sensitive,
        * Regular expression comments are not allowed within patterns, and
        * The '^' and '$' characters match the beginning and end of the
          <string> argument, not the beginning and end of lines within
          the <string> argument.

    Even more specifically, the value passed to the "flags" parameter
    of ICU C function uregex_open() is 0.
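
    The three defaults listed above can be illustrated with a stand-in
    REGEXP implementation. The sketch assumes Python's sqlite3 and re
    modules; Python's regular expression dialect is not ICU's, and the
    use of re.search() (match anywhere in the string) is an assumption
    of this sketch, not a statement about the extension.

```python
import re
import sqlite3

def regexp(pattern, text):
    """REGEXP stand-in using flags=0: case-sensitive, no comment
    syntax, and '^'/'$' anchored to the string rather than to lines."""
    if pattern is None or text is None:
        return None
    return 1 if re.search(pattern, text, flags=0) else 0

conn = sqlite3.connect(":memory:")
# SQLite evaluates "X REGEXP Y" as regexp(Y, X), so the pattern arrives first.
conn.create_function("regexp", 2, regexp)

results = conn.execute(
    "SELECT 'abc' REGEXP '^a.c$', 'ABC' REGEXP 'abc', ? REGEXP '^def'",
    ("abc\ndef",),
).fetchone()
print(results)
```

    The second comparison fails because matching is case-sensitive, and
    the third fails because '^' only matches at the start of the whole
    string, not at the start of its second line.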


2. COMPILATION AND USAGE

  The easiest way to compile and use the ICU extension is to build
  and use it as a dynamically loadable SQLite extension. To do this
  using gcc on *nix:

    gcc -shared icu.c `icu-config --ldflags` -o libSqliteIcu.so

  You may need to add "-I" flags so that gcc can find sqlite3ext.h
  and sqlite3.h. The resulting shared lib, libSqliteIcu.so, may be
  loaded into sqlite in the same way as any other dynamically loadable
  extension.
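
  As one example of loading the shared library from an application,
  the sketch below assumes Python's sqlite3 module and a Python build
  with extension loading enabled. The helper name load_icu_extension
  and the path "./libSqliteIcu.so" are assumptions; adjust the path to
  wherever the library was built.

```python
import sqlite3

def load_icu_extension(conn, path):
    """Try to load the shared library at `path`; return True on success."""
    if not hasattr(conn, "enable_load_extension"):
        # This Python build was compiled without extension loading.
        return False
    conn.enable_load_extension(True)
    try:
        conn.load_extension(path)
    except sqlite3.Error:
        # Library missing, or its initialisation failed.
        return False
    finally:
        # Switch loading back off so later SQL cannot load more libraries.
        conn.enable_load_extension(False)
    return True

conn = sqlite3.connect(":memory:")
if load_icu_extension(conn, "./libSqliteIcu.so"):
    conn.execute("SELECT icu_load_collation('tr_TR', 'turkish')")
else:
    print("ICU extension not available")
```

  Disabling extension loading again after the call is a precaution of
  this sketch, not a requirement of the extension.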


3. BUGS, PROBLEMS AND SECURITY ISSUES

  3.1  The "case_sensitive_like" Pragma

    This extension does not work well with the "case_sensitive_like"
    pragma. If this pragma is used before the ICU extension is loaded,
    then the pragma has no effect. If the pragma is used after the ICU
    extension is loaded, then SQLite ignores the ICU implementation and
    always uses the built-in LIKE operator.

    The ICU extension LIKE operator is always case insensitive.
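
    The ordering problem can be reproduced with any LIKE override, not
    only the ICU one. The sketch below assumes Python's sqlite3 module
    and uses a hypothetical marker function, always_match, in place of
    the ICU implementation, so that it is obvious which LIKE
    implementation answered a query.

```python
import sqlite3

def always_match(pattern, text):
    # Marker override: reports a match for everything.
    return 1

# Case 1: override first, pragma second. The pragma re-registers the
# built-in like(), which displaces the override.
conn = sqlite3.connect(":memory:")
conn.create_function("like", 2, always_match)
before = conn.execute("SELECT 'a' LIKE 'b'").fetchall()[0][0]
conn.execute("PRAGMA case_sensitive_like = OFF")
after = conn.execute("SELECT 'a' LIKE 'b'").fetchall()[0][0]

# Case 2: pragma first, override second. The override wins, so the
# pragma setting has no visible effect.
late = sqlite3.connect(":memory:")
late.execute("PRAGMA case_sensitive_like = ON")
late.create_function("like", 2, always_match)
pragma_first = late.execute("SELECT 'a' LIKE 'b'").fetchall()[0][0]

print(before, after, pragma_first)
```

    Whichever of the two registers like() last is the one SQLite calls,
    which is why the pragma and the extension cannot be combined.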

  3.2  The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro

    Passing very long patterns to the built-in SQLite LIKE operator can
    cause excessive CPU usage. To curb this problem, SQLite defines the
    SQLITE_MAX_LIKE_PATTERN_LENGTH macro as the maximum length of a
    pattern in bytes (irrespective of encoding). The default value is
    defined in internal header file "limits.h".

    The ICU extension LIKE implementation suffers from the same
    problem and uses the same solution. However, since the ICU extension
    code does not include the SQLite file "limits.h", modifying
    the default value therein does not affect the ICU extension.
    The default value of SQLITE_MAX_LIKE_PATTERN_LENGTH used by
    the ICU extension LIKE operator is 50000, defined in source
    file "icu.c".
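
    The guard itself is simple: measure the pattern and refuse to match
    when it is too long. A minimal sketch, assuming Python's sqlite3
    module, a byte count taken over the UTF-8 encoding, and a
    placeholder comparison in place of real wildcard matching; the
    names guarded_like and try_like are hypothetical.

```python
import sqlite3

MAX_LIKE_PATTERN_LENGTH = 50000  # bytes; mirrors the default quoted above

def guarded_like(pattern, text):
    """Reject over-long patterns before doing any matching work."""
    if pattern is None or text is None:
        return None
    if len(str(pattern).encode("utf-8")) > MAX_LIKE_PATTERN_LENGTH:
        raise ValueError("LIKE pattern too long")
    # Placeholder comparison; a real implementation would handle '%' and '_'.
    return 1 if str(pattern).casefold() == str(text).casefold() else 0

conn = sqlite3.connect(":memory:")
conn.create_function("like", 2, guarded_like)

def try_like(pattern):
    """Run 'a' LIKE <pattern>; report 'rejected' if the guard fired."""
    try:
        return conn.execute("SELECT 'a' LIKE ?", (pattern,)).fetchone()[0]
    except sqlite3.OperationalError:
        # sqlite3 surfaces an exception raised inside the function this way.
        return "rejected"

print(try_like("A"), try_like("a" * 50000), try_like("a" * 50001))
```

    Because the limit lives in the function itself, changing a limit
    elsewhere has no effect on it, which mirrors the situation with
    "icu.c" described above.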

  3.3  Collation Sequence Security Issue

    Internally, SQLite assumes that indices stored in database files
    are sorted according to the collation sequence indicated by the
    SQL schema. Changing the definition of a collation sequence after
    an index has been built is therefore equivalent to database
    corruption. The SQLite library is not very well tested under
    these conditions, and may contain potential buffer overruns
    or other programming errors that could be exploited by a malicious
    programmer.

    If the ICU extension is used in an environment where potentially
    malicious users may execute arbitrary SQL (e.g. Gears), they
    should be prevented from invoking the icu_load_collation() function,
    possibly using the authorisation callback.
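
    One way to apply the authorisation callback is sketched below. It
    assumes Python's sqlite3 module, whose set_authorizer() wraps
    SQLite's authoriser hook. The callback name is hypothetical, and a
    do-nothing function named icu_load_collation is registered only so
    that the sketch runs without the extension loaded.

```python
import sqlite3

def deny_icu_load_collation(action, arg1, arg2, db_name, source):
    """Authoriser callback: refuse any call to icu_load_collation()."""
    # For SQLITE_FUNCTION checks, SQLite passes the function name in arg2.
    if (action == sqlite3.SQLITE_FUNCTION
            and (arg2 or "").lower() == "icu_load_collation"):
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK

conn = sqlite3.connect(":memory:")
# Stand-in with the same name and argument count as the real function.
conn.create_function("icu_load_collation", 2, lambda locale, name: None)
conn.set_authorizer(deny_icu_load_collation)

try:
    conn.execute("SELECT icu_load_collation('tr_TR', 'turkish')")
    blocked = False
except sqlite3.DatabaseError:
    # The statement is rejected while it is being prepared.
    blocked = True

# Other functions are still permitted.
still_ok = conn.execute("SELECT upper('a')").fetchone()[0]
print(blocked, still_ok)
```

    Trusted setup code can register the collations it needs before the
    authoriser is installed, so that only untrusted SQL is restricted.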