README.utf8 revision e46c9386c4f79aa40185f79a19fc5b2a7ef528b3
  Here is a description of how you can use STLport to read/write utf8 files.
utf8 is a way of encoding wide characters. In the C++ Standard library,
encoding is handled by the codecvt locale facet, which is part of the ctype
category. However, utf8 only describes how encoding must be performed; it
cannot be used to classify characters, so it does not provide enough
information to generate all the ctype category facets of a locale instance.

In C++ it means that the following code will throw an exception to
signal that creation failed:

#include <locale>
// Will throw a std::runtime_error exception.
std::locale loc(".utf8");

For the same reason, building a locale with the ctype facets based on
utf8 is also wrong:

// Will throw a std::runtime_error exception:
std::locale loc(std::locale::classic(), ".utf8", std::locale::ctype);

The only way to get a locale instance that will handle utf8 encoding
is to explicitly request that the codecvt facet be based on utf8
encoding:

// Will succeed if the necessary platform support is available.
std::locale loc(std::locale::classic(),
                new std::codecvt_byname<wchar_t, char, std::mbstate_t>(".utf8"));

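Since construction fails with std::runtime_error when the platform lacks
support, it can be convenient to wrap it. The following is only a sketch:
make_codecvt_locale is a hypothetical helper name, not part of STLport, and
the ".utf8" name is only expected to work where STLport supports it.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>

typedef std::codecvt_byname<wchar_t, char, std::mbstate_t> codecvt_byname_t;

// Hypothetical helper (not an STLport API).
// On success, stores in 'result' a copy of the classic locale whose codecvt
// facet is the one named by 'name', and returns true.
// Returns false, leaving 'result' untouched, if the name is not supported.
bool make_codecvt_locale(const char* name, std::locale& result) {
    try {
        // The locale takes ownership of the facet allocated here.
        result = std::locale(std::locale::classic(), new codecvt_byname_t(name));
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

A caller would pass ".utf8" as the name and fall back to another strategy
when the function returns false.
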
  Once you have obtained a locale instance you can imbue a wide file stream
with it to read/write utf8 files:

#include <fstream>
// Wide stream: narrow streams do not use the wchar_t/char codecvt facet.
std::wfstream fstr("file.utf8");
fstr.imbue(loc);

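As an illustration, here is a minimal sketch of a write/read round trip
through an imbued stream. The helper names write_wide_file and read_wide_file
are hypothetical. Wide streams are used because only they convert through the
codecvt<wchar_t, char, mbstate_t> facet, and the locale is imbued before the
file is opened so that every byte goes through the facet.

```cpp
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: writes 'text' to 'path', encoded by the codecvt
// facet of 'loc'. Returns false on any I/O or conversion failure.
bool write_wide_file(const char* path, const std::locale& loc,
                     const std::wstring& text) {
    std::wofstream out;
    out.imbue(loc);  // imbue before opening the file
    out.open(path, std::ios::out | std::ios::trunc);
    if (!out) return false;
    out << text;
    out.close();
    return !out.fail();
}

// Hypothetical helper: reads the whole file at 'path' into 'text', decoded
// by the codecvt facet of 'loc'. Returns false if the file cannot be opened
// or if reading stops before the end of the file.
bool read_wide_file(const char* path, const std::locale& loc,
                    std::wstring& text) {
    std::wifstream in;
    in.imbue(loc);
    in.open(path);
    if (!in) return false;
    text.clear();
    wchar_t ch;
    while (in.get(ch)) text += ch;
    return in.eof();
}
```

With a locale built as described above, both helpers would be called with the
same locale instance and the same file name.
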
You can also access the facet directly to perform utf8 encoding/decoding operations:

typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_t;
const codecvt_t& encoding = std::use_facet<codecvt_t>(loc);

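For example, encoding a wide string could look like the sketch below. The
function name encode_with is hypothetical; the sketch uses only the standard
codecvt::out and codecvt::max_length members and, for brevity, does not call
unshift, which a state-dependent encoding would need.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_t;

// Hypothetical helper: encodes 'text' into bytes with the codecvt facet
// of 'loc'. Throws std::runtime_error if the conversion does not complete.
std::string encode_with(const std::locale& loc, const std::wstring& text) {
    if (text.empty()) return std::string();
    const codecvt_t& cvt = std::use_facet<codecvt_t>(loc);
    std::mbstate_t state = std::mbstate_t();
    // max_length() bounds the number of bytes produced per wide character.
    std::vector<char> buf(text.size() * static_cast<std::size_t>(cvt.max_length()));
    const wchar_t* from_next = 0;
    char* to_next = 0;
    const codecvt_t::result res =
        cvt.out(state, text.data(), text.data() + text.size(), from_next,
                &buf[0], &buf[0] + buf.size(), to_next);
    if (res != codecvt_t::ok) throw std::runtime_error("encoding failed");
    return std::string(&buf[0], to_next);
}
```

Decoding works the same way in the other direction with codecvt::in.
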
Notes:

1. The dot ('.') is mandatory in front of utf8. This is a POSIX convention: locale
names have the following format:
language[_country[.encoding]]

Ex: 'fr_FR'
    'french'
    'ru_RU.koi8r'

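To make the format concrete, here is a simplified sketch that splits a name
into its three parts. LocaleName and parse_locale_name are hypothetical names;
this is not how STLport parses locale names internally, and it handles only
the three parts shown in the format above.

```cpp
#include <string>

// Hypothetical holder for the three parts of a locale name.
struct LocaleName {
    std::string language;
    std::string country;
    std::string encoding;
};

// Splits 'name' at the first '.' (encoding) and then at the first '_'
// (country). Missing parts are left empty.
LocaleName parse_locale_name(const std::string& name) {
    LocaleName parts;
    std::string head = name;
    const std::string::size_type dot = name.find('.');
    if (dot != std::string::npos) {
        parts.encoding = name.substr(dot + 1);
        head = name.substr(0, dot);
    }
    const std::string::size_type underscore = head.find('_');
    if (underscore != std::string::npos) {
        parts.country = head.substr(underscore + 1);
        head = head.substr(0, underscore);
    }
    parts.language = head;
    return parts;
}
```

With this sketch, '.utf8' yields an empty language and country and the
encoding 'utf8', which shows why the leading dot is what marks utf8 as an
encoding rather than a language.
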
2. For the moment, utf8 encoding is only supported under Windows. The less common
utf7 encoding is also supported.
