History log of /libcore/libart/src/main/java/java/lang/StringFactory.java
Revision Date Author Comments
2259a71ba68c8b2d0d74396277fd25cdc0692350 22-Dec-2017 Victor Chang <vichang@google.com> Make the Android fast-path UTF-8 decoder follow the Unicode Standard and the W3C Encoding standard.

The UTF-8 decoder in the RI has strictly followed the Unicode standard
since OpenJDK 8 (JDK-7096080). Essentially, it
1. rejects a 3-byte surrogate / 6-byte surrogate pair (CESU-8 sequence)
2. treats an ill-formed sequence, e.g. a surrogate, as individual ill-formed bytes.

This change updates Android's fast-path UTF-8 decoder to
- follow the Unicode standard
- behave more like the RI (OpenJDK 8)
- behave consistently with java.nio.charset.CharsetDecoder

It implements the W3C recommended UTF-8 decoder.
https://www.w3.org/TR/encoding/#utf-8-decoder
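
For readers unfamiliar with the W3C decoder, the following is a minimal sketch of
its state machine in plain Java. It is not the code in StringFactory.java; the class
name Utf8DecoderSketch and the method decode are hypothetical and exist only for
illustration. It assumes the whole input is available as a byte array and that every
error is replaced with U+FFFD.

```java
class Utf8DecoderSketch {
    /** Decodes UTF-8 bytes, replacing each ill-formed subpart with U+FFFD. */
    static String decode(byte[] input) {
        StringBuilder out = new StringBuilder(input.length);
        int codePoint = 0, bytesSeen = 0, bytesNeeded = 0;
        int lower = 0x80, upper = 0xBF; // allowed range for the next continuation byte
        int i = 0;
        while (i < input.length) {
            int b = input[i] & 0xFF;
            if (bytesNeeded == 0) {
                if (b <= 0x7F) {
                    out.append((char) b);
                } else if (b >= 0xC2 && b <= 0xDF) {
                    bytesNeeded = 1;
                    codePoint = b & 0x1F;
                } else if (b >= 0xE0 && b <= 0xEF) {
                    if (b == 0xE0) lower = 0xA0; // excludes overlong 3-byte forms
                    if (b == 0xED) upper = 0x9F; // excludes surrogates U+D800..U+DFFF
                    bytesNeeded = 2;
                    codePoint = b & 0x0F;
                } else if (b >= 0xF0 && b <= 0xF4) {
                    if (b == 0xF0) lower = 0x90; // excludes overlong 4-byte forms
                    if (b == 0xF4) upper = 0x8F; // excludes code points above U+10FFFF
                    bytesNeeded = 3;
                    codePoint = b & 0x07;
                } else {
                    // 0x80..0xC1 and 0xF5..0xFF can never start a sequence.
                    out.append('\uFFFD');
                }
                i++;
                continue;
            }
            if (b < lower || b > upper) {
                // The bytes consumed so far form one maximal subpart: emit a single
                // U+FFFD, reset the state, and reprocess b as a potential lead byte.
                codePoint = 0;
                bytesSeen = 0;
                bytesNeeded = 0;
                lower = 0x80;
                upper = 0xBF;
                out.append('\uFFFD');
                continue;
            }
            lower = 0x80;
            upper = 0xBF;
            codePoint = (codePoint << 6) | (b & 0x3F);
            i++;
            if (++bytesSeen == bytesNeeded) {
                out.appendCodePoint(codePoint);
                codePoint = 0;
                bytesSeen = 0;
                bytesNeeded = 0;
            }
        }
        if (bytesNeeded != 0) {
            out.append('\uFFFD'); // input ended in the middle of a sequence
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {0x41, (byte) 0xC0, (byte) 0xAF, 0x41,
                        (byte) 0xF4, (byte) 0x80, (byte) 0x80, 0x41};
        System.out.println(decode(bytes).equals("A\uFFFD\uFFFDA\uFFFDA"));
    }
}
```

The per-lead-byte lower/upper bounds are what reject overlong forms and surrogates
without a separate validation pass, and not consuming the offending byte on error is
what yields one U+FFFD per maximal subpart.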

Behavior changes of the fast-path UTF-8 decoder:
- It no longer behaves like a decoder for Modified UTF-8 and CESU-8 sequences.
-- If an app needs to decode a Modified UTF-8 / CESU-8 sequence,
the app can use the public API DataInputStream.readUTF or the JNI function NewStringUTF.
See the example at StringTest.decodeModifiedUTF8.
- It treats an overlong sequence as ill-formed.
For example, the byte sequence "c0 b1" is an overlong form of the character '1' (U+0031).
- It treats a surrogate (U+D800..U+DFFF) as ill-formed.
- It replaces each maximal subpart of an ill-formed sequence with a single U+FFFD.
For example, in the byte sequence "41 C0 AF 41 F4 80 80 41", the maximal subparts are
"C0", "AF", and "F4 80 80". "F4 80 80" can be the initial subsequence of "F4 80 80 80",
but "C0 AF" can't be the initial subsequence of any well-formed code unit sequence.
Thus, the output should be "A\ufffd\ufffdA\ufffdA".
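
As a rough illustration of the DataInputStream.readUTF route mentioned in the first
point above, the sketch below decodes a Modified UTF-8 payload. It is not the code of
StringTest.decodeModifiedUTF8; the class ModifiedUtf8Example and the helper
decodeModifiedUtf8 are hypothetical names. It assumes the payload is at most 65535
bytes, because readUTF expects a 2-byte big-endian length prefix, which the helper
prepends.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ModifiedUtf8Example {
    /** Decodes a Modified UTF-8 payload via DataInputStream.readUTF. */
    static String decodeModifiedUtf8(byte[] payload) {
        if (payload.length > 0xFFFF) {
            throw new IllegalArgumentException("payload too long for readUTF");
        }
        // readUTF reads an unsigned 16-bit length first, then that many bytes.
        byte[] framed = new byte[payload.length + 2];
        framed[0] = (byte) (payload.length >>> 8);
        framed[1] = (byte) payload.length;
        System.arraycopy(payload, 0, framed, 2, payload.length);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            return in.readUTF();
        } catch (IOException e) {
            // Includes UTFDataFormatException for malformed Modified UTF-8.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "C0 80" is the Modified UTF-8 form of U+0000.
        byte[] nul = {(byte) 0xC0, (byte) 0x80};
        // "ED A0 BD ED B8 80" is U+1F600 encoded as a 6-byte surrogate pair.
        byte[] pair = {(byte) 0xED, (byte) 0xA0, (byte) 0xBD,
                       (byte) 0xED, (byte) 0xB8, (byte) 0x80};
        System.out.println(decodeModifiedUtf8(nul).equals("\u0000"));
        System.out.println(decodeModifiedUtf8(pair).equals("\uD83D\uDE00"));
    }
}
```

Both inputs are ill-formed for a strict UTF-8 decoder, which is why code that relied
on the old fast-path behavior needs an explicit Modified UTF-8 decoder like this one.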

Test changes:
- CharsetEncoder2Test.testUtf8Encoding: a UTF-8 encoded surrogate is now treated as invalid.
- X500PrincipalTest.testValidDN: an overlong sequence is now treated as invalid.
According to my testing, Android Conscrypt (and BoringSSL) has rejected certificates with
such an overlong sequence in the CN since OC MR1. Thus, there is little use case for creating
an X500Principal with an overlong UTF-8 sequence.
Also, the RI doesn't pass this test either.
Context: As I understand it, certificates and X500 principals are stored
in ASN.1 format. The RFC standards quoted in X500Principal don't
prohibit overlong UTF-8 sequences, but the newer standards, RFC 5280 for X.509
and RFC 3629 for UTF-8, explicitly prohibit any overlong UTF-8 sequence.

Performance change:
The performance of the fast-path decoder is similar before and after the change.

=== Before the change ===
CharsetBenchmark
Experiment {instrument=runtime, benchmarkMethod=time_new_String_BString, vm=default, parameters={length=10000, name=UTF-8}}
Results:
runtime(ns): min=574795.84, 1st qu.=574795.84, median=574795.84, mean=574795.84, 3rd qu.=574795.84, max=574795.84

CharsetUtf8Benchmark
Trial Report (1 of 4):
Experiment {instrument=runtime, benchmarkMethod=time_ascii, vm=default, parameters={}}
Results:
runtime(ns): min=58290943.00, 1st qu.=58290943.00, median=58290943.00, mean=58290943.00, 3rd qu.=58290943.00, max=58290943.00
Trial Report (2 of 4):
Experiment {instrument=runtime, benchmarkMethod=time_bmp2, vm=default, parameters={}}
Results:
runtime(ns): min=77581414.00, 1st qu.=77581414.00, median=77581414.00, mean=77581414.00, 3rd qu.=77581414.00, max=77581414.00
Trial Report (3 of 4):
Experiment {instrument=runtime, benchmarkMethod=time_bmp3, vm=default, parameters={}}
Results:
runtime(ns): min=57457297.00, 1st qu.=57457297.00, median=57457297.00, mean=57457297.00, 3rd qu.=57457297.00, max=57457297.00
Trial Report (4 of 4):
Experiment {instrument=runtime, benchmarkMethod=time_supplementary, vm=default, parameters={}}
Results:
runtime(ns): min=60723183.00, 1st qu.=60723183.00, median=60723183.00, mean=60723183.00, 3rd qu.=60723183.00, max=60723183.00

=== After the change ===
CharsetBenchmark
Experiment {instrument=runtime, benchmarkMethod=time_new_String_BString, vm=default, parameters={length=10000, name=UTF-8}}
Results:
runtime(ns): min=523638.25, 1st qu.=523638.25, median=523638.25, mean=523638.25, 3rd qu.=523638.25, max=523638.25
CharsetUtf8Benchmark
Trial Report (1 of 4):
Experiment {instrument=runtime, benchmarkMethod=time_ascii, vm=default, parameters={}}
Results:
runtime(ns): min=57101725.00, 1st qu.=57101725.00, median=57101725.00, mean=57101725.00, 3rd qu.=57101725.00, max=57101725.00
Trial Report (2 of 4):
Experiment {instrument=runtime, benchmarkMethod=time_bmp2, vm=default, parameters={}}
Results:
runtime(ns): min=76573080.00, 1st qu.=76573080.00, median=76573080.00, mean=76573080.00, 3rd qu.=76573080.00, max=76573080.00
Trial Report (3 of 4):
Experiment {instrument=runtime, benchmarkMethod=time_bmp3, vm=default, parameters={}}
Results:
runtime(ns): min=59655214.00, 1st qu.=59655214.00, median=59655214.00, mean=59655214.00, 3rd qu.=59655214.00, max=59655214.00
Trial Report (4 of 4):
Experiment {instrument=runtime, benchmarkMethod=time_supplementary, vm=default, parameters={}}
Results:
runtime(ns): min=67283548.00, 1st qu.=67283548.00, median=67283548.00, mean=67283548.00, 3rd qu.=67283548.00, max=67283548.00

Test: cts-tradefed run cts-dev -m CtsLibcoreTestCases
Test: cts-tradefed run cts-dev -m CtsLibcoreOjTestCases
Bug: 69599767
Bug: 70511691
Change-Id: I2c3e84808b19c969905813f6654ba552b6745354
fa5b565a3f6c6d7cbd6106ee8d360304c3a939a3 17-Feb-2017 Igor Murashkin <iam@google.com> jni: Switch to @FastNative for all JNI functions.

Switches all 248 methods in art/libcore that previously used !bang JNI
to @FastNative.

As a nice benefit, those method calls should be about 1.5x faster than before.
This measures out to a 3% startup time improvement for system_server.

Test: make test-art-host
Bug: 34955272
Change-Id: I0881f401c7660c79f275235362777bfa58241deb
04b80a24b7d6c36367529c606ca28b9a3729eeb6 29-Feb-2016 Roland Levillain <rpl@google.com> Improve documentation about StringFactory.newStringFromChars.

Make it clear that the native method requires its third
argument to be non-null.

Bug: 27378573
Change-Id: I4c42d5cb8f8f4ff20c42dbdf1e600f40be10607a
9c733c7f353c888889651beae9f30c7c42d73e05 26-Feb-2016 Roland Levillain <rpl@google.com> Fix a typo in a comment.

Change-Id: I8969820fb38776422d6d00b7f8c0f1f658ec4591
83c7414449bc406b581f0cb81ae06e7bce91403c 15-Jan-2014 Jeff Hao <jeffhao@google.com> Removed offset and value from String and added StringFactory.

Change-Id: I55314ceb906d0bf7e78545dcd9bc3489a5baf03f