c15a2f4f45661a7f5f542e406282c146ea1a968d |
|
21-Apr-2017 |
Andreas Gampe <agampe@google.com> |
ART: Add object-readbarrier-inl.h Move some read-barrier code into a new header. This prunes the include tree for the concurrent-copying collector. Clean up other related includes. Test: mmma art Change-Id: I40ce4e74f2e5d4c692529ffb4df933230b6fd73e
|
c6ea7d00ad069a2736f603daa3d8eaa9a1f8ea11 |
|
02-Feb-2017 |
Andreas Gampe <agampe@google.com> |
ART: Clean up art_method.h Clean up the header. Fix up other headers including the -inl file, in an effort to prune the include graph. Fix broken transitive includes by making includes explicit. Introduce new -inl files for method handles and reference visiting. Test: source build/envsetup.sh && lunch aosp_angler-userdebug && mmma art Test: source build/envsetup.sh && lunch aosp_mips64-userdebug && mmma art Change-Id: I8f60f1160c2a702fdf3598149dae38f6fa6bc851
|
9cc0ea8140e0106e132efc3c1c5c458fa196ae41 |
|
16-Mar-2017 |
Roland Levillain <rpl@google.com> |
Refactor SystemArrayCopy intrinsics. Test: m test-art-host Test: m test-art-target Change-Id: I2f9ccdbb831030e670996b97e0c422f505b3abf6
|
0bd97173fab66572c95ce18fa785e00271adc014 |
|
16-Mar-2017 |
Colin Cross <ccross@android.com> |
Fix sign extension issues in x86_64 code generation movl expects an Immediate int64_t that is in the range -2GB to 2GB. Cast uint32_t addresses to int32_t before passing as an Immediate to movl. In VisitIntegerValueOf, the base address may not fit in the disp32 field. Fall back to storing the base address in a temporary register if it is larger than 2GB. Bug: 36281983 Test: m -j test-art-host with LibartImgHostBaseAddress == 0xa0000000 Change-Id: I5f8cc4f5a6220afc577707e3831113b0ead1d2b2
|
331605a7ba842573b3876e14c933175382b923c8 |
|
01-Mar-2017 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Revert "Intrinsify Integer.valueOf."" Fix heap poisoning. LOG INFO instead of ERROR to avoid run-test failures with --no-image. bug:30933338 Test: ART_HEAP_POISONING=true test-art-host test-art-target This reverts commit db7b44ac3ea80a722aaed12e913ebc1661a57998. Change-Id: I0b7d4f1eb11c62c9a3df8e0de0b1a5d8af760181
|
db7b44ac3ea80a722aaed12e913ebc1661a57998 |
|
28-Feb-2017 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Intrinsify Integer.valueOf." Heap poisoning missing jit-gcstress not optimizing it. bug:30933338 This reverts commit cd0b27287843cfd904dd163056322579ab4bbf27. Change-Id: I5ece1818afbca5214babb6803f62614a649aedeb
|
cd0b27287843cfd904dd163056322579ab4bbf27 |
|
23-Feb-2017 |
Nicolas Geoffray <ngeoffray@google.com> |
Intrinsify Integer.valueOf. Improves performance of ArrayListStress and Ritz by ~10% and ~3%. Test: test-art-host test-art-target bug: 30933338 Change-Id: I639046e3a18dae50069d3a7ecb538a900bb590a1
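For context, a hedged Java-level sketch of the caching behavior this intrinsic exploits (the JLS only guarantees the cache for values in [-128, 127]):

    Integer a = Integer.valueOf(100);
    Integer b = Integer.valueOf(100);
    // a == b: in-range values come from Integer's cache, so the intrinsic
    // can load the cached instance directly instead of making the call
    Integer c = Integer.valueOf(1000);
    Integer d = Integer.valueOf(1000);
    // c == d is not guaranteed: out-of-range values may be freshly boxed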
|
71bf7b43380eb445973f32a7f789d9670f8cc97d |
|
16-Nov-2016 |
Aart Bik <ajcbik@google.com> |
Optimizations around escape analysis. With tests. Details: (1) added new intrinsics (2) implemented optimizations: more !can-be-null information; more null check removals; replace return-this uses with the incoming parameter; remove dead StringBuffer/Builder calls (with escape analysis) (3) fixed an exposed bug in CanBeMoved() Performance gain: this improves CaffeineString by about 360% (removes the null check from the first loop, eliminates the second loop completely) Test: test-art-host Change-Id: Iaf16a1b9cab6a7386f43d71c6b51dd59600e81c1
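A hedged illustration (hypothetical code, not from the CL) of a dead StringBuilder that escape analysis lets the compiler remove:

    static String greet(String name) {
        StringBuilder sb = new StringBuilder();   // never escapes this method
        sb.append("Hello, ").append(name);        // result is never observed
        return name;                              // so the builder and its appends are dead
    }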
|
ff7d89c0364f6ebd0f0798eb18ef8bd62917de6a |
|
07-Nov-2016 |
Aart Bik <ajcbik@google.com> |
Allow read side effects for removing dead instructions. Rationale: Instructions that only have the harmless read side effect may be removed when dead as well; we were previously too strict. As proof of concept, this CL also provides more accurate information on a few string-related intrinsics. This removes the dead indexOf from CaffeineString (17% performance improvement; the big bottleneck of the StringBuffer's toString() still remains in the loop). Test: test-art-host Change-Id: Id835a8e287e13e1f09be6b46278a039b8865802e
|
fdaf0f45510374d3a122fdc85d68793e2431175e |
|
13-Oct-2016 |
Vladimir Marko <vmarko@google.com> |
Change string compression encoding. Encode the string compression flag as the least significant bit of the "count" field, with 0 meaning compressed and 1 meaning uncompressed. The main vdex file is a tiny bit larger (+28B for prebuilt boot images, +32B for on-device built images) and the oat file sizes change. Measured on Nexus 9, AOSP ToT, these changes are insignificant when string compression is disabled (-200B for the 32-bit boot*.oat for prebuilt boot image, -4KiB when built on the device attributable to rounding, -16B for 64-bit boot*.oat for prebuilt boot image, no change when built on device) but with string compression enabled we get significant differences: prebuilt multi-part boot image: - 32-bit boot*.oat: -28KiB - 64-bit boot*.oat: -24KiB on-device built single boot image: - 32-bit boot.oat: -32KiB - 64-bit boot.oat: -28KiB The boot image oat file overhead for string compression: prebuilt multi-part boot image: - 32-bit boot*.oat: before: ~80KiB after: ~52KiB - 64-bit boot*.oat: before: ~116KiB after: ~92KiB on-device built single boot image: - 32-bit boot.oat: before: 92KiB after: 60KiB - 64-bit boot.oat: before: 116KiB after: 92KiB The differences in the SplitStringBenchmark seem to be lost in the noise. Test: Run ART test suite on host and Nexus 9 with Optimizing. Test: Run ART test suite on host and Nexus 9 with interpreter. Test: All of the above with string compression enabled. Bug: 31040547 Change-Id: I7570c2b700f1a31004a2d3c18b1cc30046d35a74
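A minimal sketch of the new encoding, assuming the length occupies the remaining upper bits of "count" as the description implies (names are illustrative, not ART's actual mirror::String code):

    // "count" packs (length << 1) | flag, where flag 0 = compressed, 1 = uncompressed
    static int length(int count)           { return count >>> 1; }
    static boolean isCompressed(int count) { return (count & 1) == 0; }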
|
12b58b23de974232e991c650405f929f8b0dcc9f |
|
01-Nov-2016 |
Hiroshi Yamauchi <yamauchi@google.com> |
Clean up the runtime read barrier and fix fake address dependency. - Rename GetReadBarrierPointer to GetReadBarrierState. - Change its return type to uint32_t. - Fix the runtime fake address dependency for arm/arm64 using inline asm. - Drop ReadBarrier::black_ptr_ and some brooks code. Bug: 12687968 Test: test-art with CC, Ritz EAAC, libartd boot on N9. Change-Id: I595970db825db5be2e98ee1fcbd7696d5501af55
|
a1aa3b1f40e496d6f8b3b305a4f956ddf2e425fc |
|
26-Oct-2016 |
Roland Levillain <rpl@google.com> |
Add support for Baker read barriers in UnsafeCASObject intrinsics. Prior to doing the compare-and-swap operation, ensure the expected reference stored in the holding object's field is in the to-space by loading it, emitting a read barrier and updating that field with a strong compare-and-set operation with relaxed memory synchronization ordering (if needed). Test: ART host and target tests and Nexus 5X boot test with Baker read barriers. Bug: 29516905 Bug: 12687968 Change-Id: I480f6a9b59547f11d0a04777406b9bfeb905bfd2
|
4877b7986c9ba5c69be8f80692c260b4952f69be |
|
09-Sep-2016 |
jessicahandojo <jessicahandojo@google.com> |
String compression on intrinsics x86 and x86_64 Changes on intrinsics and Code Generation (x86 and x86_64) for the string compression feature. Currently the feature is off. The sizes of boot.oat and boot.art for x86 before and after the changes (feature OFF) are unchanged. When the feature is ON, boot.oat increased by 0.83% and boot.art decreased by 19.32%. Likewise for x86_64, the sizes of boot.oat and boot.art before and after the changes (feature OFF) are unchanged. When the feature is ON, boot.oat increased by 0.87% and boot.art decreased by 6.59%. Turn the feature on: runtime/mirror/string.h (kUseStringCompression = true) runtime/asm_support.h (STRING_COMPRESSION_FEATURE 1) Test: m -j31 test-art-host All tests passed both when mirror::kUseStringCompression is ON and OFF. jni_internal_test was changed to assert that an empty string's length equals -(1 << 31), as it is compressed. Bug: 31040547 Change-Id: Ia447c9b147cabb6a69e6ded86be1fe0c46d9638d
|
804b03ffb9b9dc6cc3153e004c2cd38667508b13 |
|
14-Sep-2016 |
Vladimir Marko <vmarko@google.com> |
Change remaining slow path throw entrypoints to save everything. Change DivZeroCheck, BoundsCheck and explicit NullCheck slow path entrypoints to conform to kSaveEverything. On Nexus 9, AOSP ToT, the boot.oat size reduction is prebuilt multi-part boot image: - 32-bit boot.oat: -12KiB (-0.04%) - 64-bit boot.oat: -24KiB (-0.06%) on-device built single boot image: - 32-bit boot.oat: -8KiB (-0.03%) - 64-bit boot.oat: -16KiB (-0.04%) Test: Run ART test suite including gcstress on host and Nexus 9. Test: Manually disable implicit null checks and test as above. Change-Id: If82a8082ea9ae571c5d03b5e545e67fcefafb163
|
70e97462116a47ef2e582ea29a037847debcc029 |
|
09-Aug-2016 |
Vladimir Marko <vmarko@google.com> |
Avoid excessive spill slots for slow paths. Reducing the frame size makes stack maps smaller as we need fewer bits for stack masks and some dex register locations may use short location kind rather than long. On Nexus 9, AOSP ToT, the boot.oat size reduction is prebuilt multi-part boot image: - 32-bit boot.oat: -416KiB (-0.6%) - 64-bit boot.oat: -635KiB (-0.9%) prebuilt multi-part boot image with read barrier: - 32-bit boot.oat: -483KiB (-0.7%) - 64-bit boot.oat: -703KiB (-0.9%) on-device built single boot image: - 32-bit boot.oat: -380KiB (-0.6%) - 64-bit boot.oat: -632KiB (-0.9%) on-device built single boot image with read barrier: - 32-bit boot.oat: -448KiB (-0.6%) - 64-bit boot.oat: -692KiB (-0.9%) The other benefit is that at runtime, threads may need fewer pages for their stacks, reducing overall memory usage. We defer the calculation of the maximum spill size from the main register allocator (linear scan or graph coloring) to the RegisterAllocationResolver and do it based on the live registers at slow path safepoints. The old notion of an artificial slow path safepoint interval is removed as it is no longer needed. Test: Run ART test suite on host and Nexus 9. Bug: 30212852 Change-Id: I40b3d114e278e2c5807982904fa49bf6642c6275
|
ba45db072c48783e19a2a73ab4e45ae143c1c7c9 |
|
12-Jul-2016 |
Serban Constantinescu <serban.constantinescu@linaro.org> |
Extend the InvokeRuntime() changes to x86 and x86_64. Also fix the LocationSummary for intrinsics that call on both the main path and the slow path. Test: test-art-host Change-Id: I437ffd433ee87b1754dbd8c075ec54f00d7d4ccb
|
953437bd51059801d92079295f728d0260efca31 |
|
24-Aug-2016 |
Vladimir Marko <vmarko@google.com> |
Revert "Revert "x86/x86-64: Avoid temporary for read barrier field load."" Fixed the fault handler recognizing the TEST instruction and fault address within the lock word. Added tests to 439-npe. Bug: 29966877 Bug: 12687968 Test: Tested with ART_USE_READ_BARRIER=true on host. Test: Tested with ART_USE_READ_BARRIER=true ART_HEAP_POISONING=true on host. This reverts commit ccf15bca330f9a23337b1a4b5850f7fcc6c1bf15. Change-Id: I8990def5f719c9205bf6e5fdba32027fa82bec50
|
ccf15bca330f9a23337b1a4b5850f7fcc6c1bf15 |
|
23-Aug-2016 |
Vladimir Marko <vmarko@google.com> |
Revert "x86/x86-64: Avoid temporary for read barrier field load." Fault handler does not recognize the instruction F6 /0 ib TEST r/m8, imm8 so we get crashes instead of NPEs. Bug: 29966877 Bug: 12687968 This reverts commit ccf06d8f19a37432de4a3b768747090adfbd18ec. Change-Id: Ib7db3b59f44c0d3ed5e24a20b6c6ee596a89d709
|
ccf06d8f19a37432de4a3b768747090adfbd18ec |
|
12-Aug-2016 |
Vladimir Marko <vmarko@google.com> |
x86/x86-64: Avoid temporary for read barrier field load. Add TEST instructions for memory and immediate. Use the byte version to avoid a temporary in read barrier field load. Test: Tested with ART_USE_READ_BARRIER=true on host. Test: Tested with ART_USE_READ_BARRIER=true ART_HEAP_POISONING=true on host. Bug: 29966877 Bug: 12687968 Change-Id: Ia415d3c2e1ae1ff6dff11d72bbb7d96d5deed6ee
|
0b671c0408e98824e1f92b1ee951b210c090fe7a |
|
19-Aug-2016 |
Roland Levillain <rpl@google.com> |
Add support for Baker read barriers in SystemArrayCopy intrinsics. Benchmarks (ARM64) score variations on Nexus 5X with CPU cores clamped at 960000 Hz (aosp_bullhead-userdebug build): - Ritzperf - average (lower is better): -3.03% (slightly better) - CaffeineMark - average (higher is better): +1.26% (slightly better) - DeltaBlue (lower is better): -10.50% (better) - Richards - average (lower is better): -3.36% (slightly better) - SciMark2 - average (higher is better): +0.26% (virtually unchanged) Details about Ritzperf benchmarks with meaningful variations (lower is better): - FormulaEvaluationActions.EvaluateAndApplyChanges: -13.26% (better) - FormulaEvaluationActions.EvaluateCascadingSums: -10.94% (better) - FormulaEvaluationActions.EvaluateComplexFormulas: -15.50% (better) - FormulaEvaluationActions.EvaluateFibonacci: -10.41% (better) - FormulaEvaluationActions.EvaluateLargeSums: +6.02% (worse) Boot image code size variation on Nexus 5X (aosp_bullhead-userdebug build): - total ARM64 framework Oat files size change: 107047632 bytes -> 107154128 bytes (+0.10%) - total ARM framework Oat files size change: 90932028 bytes -> 91009852 bytes (+0.09%) Test: ART host and target (ARM, ARM64) tests + Nexus 5X boot. Bug: 29516905 Bug: 29506760 Bug: 12687968 Change-Id: I85431368d09965687a0301ae2eb3c991f276ce5d
|
0cf8d9c08bb77b0f527121b83e6a9dbb36d602f3 |
|
10-Aug-2016 |
Aart Bik <ajcbik@google.com> |
Fully enable the new round implementation on x86/x86_64 Rationale: Running JIT on Fugu does not always provide a constant area. In such cases, we need to construct FP constants through the stack. This only applies to x86. Test: 580-checker-round BUG=26327751 Change-Id: I7e2c80dafbafbe647cfe9ecb039920bb534c666a
|
7ad310dabb9c9b60f35f72c7b0736da602e7541a |
|
04-Aug-2016 |
Aart Bik <ajcbik@google.com> |
Temporarily disable the new round implementation on x86/x86_64 Rationale: FUGU is not happy Test: 580-checker-round BUG=26327751 Change-Id: If0ddea47a88e14b86d37080b8a18a6f8defcc8e6
|
349f388b8c75c674f3337e6474affb2bff91d507 |
|
03-Aug-2016 |
Aart Bik <ajcbik@google.com> |
Implement single-/double-precision round intrinsic on x86_64 Rationale: x86_64 does not provide a direct instruction for the required rounding, and NaN and large positive numbers must be dealt with too. This CL generates code that correctly implements SP and DP round. Test: 580-checker-round BUG=26327751 Change-Id: Ia7518e2c30afafba4e037e2d0c21e0ce926f0425
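For reference, the Java-level corner cases such generated code must honor (these follow from Math.round's specification):

    Math.round(Double.NaN);   // 0L: NaN rounds to zero
    Math.round(1.0e300);      // Long.MAX_VALUE: clamped for large positive inputs
    Math.round(-1.0e300);     // Long.MIN_VALUE: clamped for large negative inputs
    Math.round(2.5);          // 3L: halfway cases round toward positive infinity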
|
542451cc546779f5c67840e105c51205a1b0a8fd |
|
26-Jul-2016 |
Andreas Gampe <agampe@google.com> |
ART: Convert pointer size to enum Move away from size_t to dedicated enum (class). Bug: 30373134 Bug: 30419309 Test: m test-art-host Change-Id: Id453c330f1065012e7d4f9fc24ac477cc9bb9269
|
806f0122e923581f559043e82cf958bab5defc87 |
|
09-Mar-2016 |
Serban Constantinescu <serban.constantinescu@linaro.org> |
Add support for CallKind::kCallOnMainAndSlowPath Some of the intrinsics call on both the main path and the slow path. This patch adds support for such a CallKind and marks the intrinsics accordingly. This will be exercised by a later patch that refactors all the runtime calls to use InvokeRuntime(). Please note that without this patch, the calls to ValidateInvokeRuntime() exercised by the following patches would fail. Change-Id: I450571b8b47280a004b714996189ba6db13fb57d
|
54ff482710910929900f8348a19c5b875e519237 |
|
07-Jul-2016 |
Serban Constantinescu <serban.constantinescu@linaro.org> |
Rename kCall to kCallOnMainOnly This patch renames kCall to kCallOnMainOnly in preparation for the next patch in this series which will be adding kCallOnMainAndSlowPath. Note: With this patch there will be places where we use kCallOnMainOnly even though we call on the slow path too. The next patch in this series will fix that. Test: ART host tests. Change-Id: Iabfdb0901990d163be5d780f3bdd2fab6fa17b32
|
b198b013ae7bd2da85e007414fc028cd51a13883 |
|
07-Jul-2016 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix System.arraycopy when doing same array copying. At compile time, if constant source < constant destination, and we don't know if the arrays are the same, then we must emit code that checks if the two arrays are the same. If so, we jump to the slow path. test: 610-arraycopy bug: 30030084 (cherry picked from commit 9f65db89353c46f6b189656f7f55a99054e5cfce) Change-Id: Ida67993d472b0ba4056d9c21c68f6e5239421f7d
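A hedged example of the overlapping case the emitted check guards against:

    int[] a = {0, 1, 2, 3, 4, 5, 6, 7};
    // src == dst and srcPos (0) < dstPos (2): a naive forward copy would
    // overwrite source elements before reading them, so the copy must behave
    // as if it went through a temporary array; hence the slow-path jump
    // whenever the two references might be the same object
    System.arraycopy(a, 0, a, 2, 5);   // a is now {0, 1, 0, 1, 2, 3, 4, 7}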
|
fea1abd660cc89b31d121c8700fae8d804178391 |
|
06-Jul-2016 |
Nicolas Geoffray <ngeoffray@google.com> |
Implement System.arraycopy intrinsic on x86. Also remove wrong comments when doing the raw copying. test: run-test, 537-checker-arraycopy, 610-arraycopy Change-Id: I2495bc03cde8ccad03c93f7722dd29bf85138525
|
9f65db89353c46f6b189656f7f55a99054e5cfce |
|
07-Jul-2016 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix System.arraycopy when doing same array copying. At compile time, if constant source < constant destination, and we don't know if the arrays are the same, then we must emit code that checks if the two arrays are the same. If so, we jump to the slow path. test: 610-arraycopy Change-Id: Ida67993d472b0ba4056d9c21c68f6e5239421f7d
|
7f7f6dae98b0b1959348ffc84d8e347a26561226 |
|
21-Jun-2016 |
Pavel Vyssotski <pavel.n.vyssotski@intel.com> |
ART: OneBit intrinsics should use 1ULL for 64-bit shift Change-Id: I91cbe769081045e6a45a95154a8a8acf1ec352ef Signed-off-by: Pavel Vyssotski <pavel.n.vyssotski@intel.com>
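The Java-level analogue of the bug being fixed (illustrative only; the actual fix is in the intrinsic's generated code):

    int shift = 40;
    long good = 1L << shift;  // 64-bit shift: 0x0000010000000000, as intended
    long bad  = 1  << shift;  // 32-bit shift: the amount is masked to 5 bits,
                              // so this computes 1 << 8 = 256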
|
3d31242300c3525e5c85665d9e34acf87002db63 |
|
23-Jun-2016 |
Roland Levillain <rpl@google.com> |
Re-enable most intrinsics with read barriers. Also extend sun.misc.Unsafe test coverage to exercise sun.misc.Unsafe.{get,put}{Int,Long,Object}Volatile. Bug: 26205973 Bug: 29516905 Change-Id: I4d8da7cee5c8a310c8825c1631f71e5cb2b80b30 Test: Covered by ART's run-tests.
|
0fcd2b84210db2bcf8b2d7a2b98a1a2bca367cac |
|
05-Apr-2016 |
Sang, Chunlei <chunlei.sang@intel.com> |
Fix x86 & x86-64 UnsafeGetObject intrinsics with read barriers. The implementation was incorrectly interpreting the 'offset' input as an index in a (4-byte) object reference array, whereas it is a (1-byte) offset to an object reference field within the 'base' (object) input. Bug: 29516905 Change-Id: Idfbead8289222b55069816a81284401eff791e85 Test: Covered by test/004-UnsafeTest.
|
87f3fcbd0db352157fc59148e94647ef21b73bce |
|
28-Apr-2016 |
Vladimir Marko <vmarko@google.com> |
Replace String.charAt() with HIR. Replace String.charAt() with HArrayLength, HBoundsCheck and HArrayGet. This allows GVN on the HArrayLength and BCE on the HBoundsCheck as well as using the infrastructure for HArrayGet, i.e. better handling of constant indexes than the old intrinsic and using the HArm64IntermediateAddress. Bug: 28330359 Change-Id: I32bf1da7eeafe82537a60416abf6ac412baa80dc
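Conceptually the replacement expands charAt into the same HIR a plain array access produces; a hedged sketch (value stands for the string's backing character storage):

    // s.charAt(i) becomes the equivalent of:
    int len = s.length();                             // HArrayLength
    if (i < 0 || i >= len)                            // HBoundsCheck
        throw new StringIndexOutOfBoundsException(i);
    char c = value[i];                                // HArrayGet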
|
53b52005ddae649e6f1cf475de7c259bc6885e6d |
|
24-May-2016 |
Vladimir Marko <vmarko@google.com> |
Apply String.equals() optimizations on arm, arm64 and x86-64. This is a follow-up to https://android-review.googlesource.com/174192 Change-Id: Ie71197df22548d6eb0ca773de6f19fcbb975f065
|
da05108b17b020b28555a8d5db5caa42e4435981 |
|
17-May-2016 |
Vladimir Marko <vmarko@google.com> |
Clean up String.indexOf() intrinsics. Additional cleanup after https://android-review.googlesource.com/223260 Bug: 28330359 Change-Id: I88def196babec70123896ef581ec8d61bb1b9a9a
|
288c7a8664e516d7486ab85267050e676e84cc39 |
|
16-May-2016 |
Serguei Katkov <serguei.i.katkov@intel.com> |
Revert "Revert "ART: Reference.getReferent intrinsic for x86 and x86_64"" This reverts commit 0997d24e67d78f2146ebae2888eda0d7d254789a. ART_HEAP_POISONING=true mode is fixed. Change-Id: I83f6d5c101ea6a86802753f81b3e4348a263fb21 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
0997d24e67d78f2146ebae2888eda0d7d254789a |
|
13-May-2016 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "ART: Reference.getReferent intrinsic for x86 and x86_64" Fails heap poisoning configuration. This reverts commit afdc97ebcb4e58afb7cf54d846d30314e6499d83. Change-Id: I50e53756a2b85059b89cfb8950f8c9e2b032743c
|
afdc97ebcb4e58afb7cf54d846d30314e6499d83 |
|
05-May-2016 |
Serguei Katkov <serguei.i.katkov@intel.com> |
ART: Reference.getReferent intrinsic for x86 and x86_64 Change-Id: I7a7ac9244847dd80d9fa4e4b5ebc5bf451c628ff Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
fb6c90a1d7c8333c74c09f64f7eb2f402c3ea002 |
|
06-May-2016 |
Vladimir Marko <vmarko@google.com> |
Improve String.indexOf() intrinsics. If the code point input is a char, we don't need the slow path. Also improve the slow-path check (if we do need it) on arm and arm64 to avoid loading 0xffff into a register. Bug: 28330359 Change-Id: Ie6514c16126717bb0b11e3c7ab2b60eaa70fed4c
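A hedged illustration of when the slow path is needed: it handles supplementary code points, which cannot occur when the argument is a char value:

    "abc".indexOf('b');      // argument fits in a char (<= 0xFFFF): fast path only
    "abc".indexOf(0x1F600);  // supplementary code point (> 0xFFFF): requires the slow path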
|
ebea3d2cce6aa34216502bb6b83d155d4c92e4ff |
|
12-Apr-2016 |
Roland Levillain <rpl@google.com> |
Small changes in ARM and x86-64 SystemArrayCopy intrinsics. Have these intrinsics share a more uniform style with the ARM64 SystemArrayCopy intrinsic. Also make some changes/improvements in: - art::IntrinsicOptimizations - art::arm64::GenSystemArrayCopyAddresses Change-Id: Ieeb224795229580f8e5f7219c586d04786d8c705
|
fa3912edfac60a9f0a9b95a5862c7361b403fcc2 |
|
01-Apr-2016 |
Roland Levillain <rpl@google.com> |
Fix BitCount intrinsics assertions. Bug: 27852035 Change-Id: Iba43039aadd9ba288b476d53cc2306a58356465f
|
f969a209c30e3af636342d2fb7851d82a2529bf7 |
|
09-Mar-2016 |
Roland Levillain <rpl@google.com> |
Fix and enable java.lang.StringFactory intrinsics. The following intrinsics were not considered by the intrinsics recognizer: - StringNewStringFromBytes - StringNewStringFromChars - StringNewStringFromString This CL enables them and add tests for them. This CL also: - Fixes the locations of the ARM64 & MIPS64 StringNewStringFromString intrinsics. - Fixes the definitions of the FOUR_ARG_DOWNCALL macros on ARM and x86, which are used to implement the art_quick_alloc_string_from_bytes* runtime entry points. - Fixes PC info (stack maps) recording in the StringNewStringFromBytes, StringNewStringFromChars and StringNewStringFromString ARM, ARM64 & MIPS64 intrinsics. Bug: 27425743 Change-Id: I38c00d3f0b2e6b64f7d3fe9146743493bef9e45c
|
1193259cb37c9763a111825aa04718a409d07145 |
|
08-Mar-2016 |
Aart Bik <ajcbik@google.com> |
Implement the 1.8 unsafe memory fences directly in HIR. Rationale: More efficient since it exposes full semantics to all operations on the graph and allows for proper code generation for all architectures. bug=26264765 Change-Id: Ic435886cf0645927a101a8502f0623fa573989ff
|
0e54c0160c84894696c05af6cad9eae3690f9496 |
|
04-Mar-2016 |
Aart Bik <ajcbik@google.com> |
Unsafe: Recognize intrinsics for 1.8 java.util.concurrent With unit test. Rationale: Recognizing the 1.8 methods as intrinsics is the first step towards providing efficient implementation on all architectures. Where not implemented (everywhere for now), the methods fall back to the JNI native or reference implementation. NOTE: needs iam's CL first! bug=26264765 Change-Id: Ife65e81689821a16cbcdd2bb2d35641c6de6aeb6
|
2f9fcc999fab4ba6cd86c30e664325b47b9618e5 |
|
02-Mar-2016 |
Aart Bik <ajcbik@google.com> |
Simplified intrinsic macro mechanism. Rationale: Reduces boiler-plate code in all intrinsics code generators. Also, the newly introduced "unreachable" macro provides a static verifier that we do not have unreachable and thus redundant code in the generators. In fact, this change exposes that the MIPS32 and MIPS64 rotation intrinsics (IntegerRotateRight, LongRotateRight, IntegerRotateLeft, LongRotateLeft) are unreachable, since they are handled as HIR constructs for all architectures. Thus the code can be removed. Change-Id: I0309799a0db580232137ded72bb8a7bbd45440a8
|
cc3839c15555a2751e13980638fc40e4d3da633e |
|
29-Feb-2016 |
Roland Levillain <rpl@google.com> |
Improve documentation about StringFactory.newStringFromChars. Make it clear that the native method requires its third argument to be non-null, and therefore that the intrinsics do not need a null check for it. Bug: 27378573 Change-Id: Id2f78ceb0f7674f1066bc3f216b738358ca25542
|
2a6aad9d388bd29bff04aeec3eb9429d436d1873 |
|
25-Feb-2016 |
Aart Bik <ajcbik@google.com> |
Implement fp to bits methods as intrinsics. Rationale: Better optimization, better performance. Results on the libcore benchmark: Most of the gain is from moving the invariant call out of the loop after we detect everything is a side-effect-free intrinsic, but the generated code in the general case is much cleaner too. Before: timeFloatToIntBits() in 181 ms. timeFloatToRawIntBits() in 35 ms. timeDoubleToLongBits() in 208 ms. timeDoubleToRawLongBits() in 35 ms. After: timeFloatToIntBits() in 36 ms. timeFloatToRawIntBits() in 35 ms. timeDoubleToLongBits() in 35 ms. timeDoubleToRawLongBits() in 34 ms. bug=11548336 Change-Id: I6e001bd3708e800bd75a82b8950fb3a0fc01766e
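The semantic difference between the raw and non-raw forms that the intrinsics must preserve (per the Float/Double API; Double behaves analogously):

    int canonical = Float.floatToIntBits(Float.NaN);    // always 0x7fc00000: all NaNs collapse
    int exact     = Float.floatToRawIntBits(Float.NaN); // preserves the NaN's actual bit pattern
    int one       = Float.floatToIntBits(1.0f);         // 0x3f800000: identical for non-NaN input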
|
75a38b24801bd4d27c95acef969930f626dd11da |
|
17-Feb-2016 |
Aart Bik <ajcbik@google.com> |
Implement isNaN intrinsic through HIR equivalent. Rationale: Efficient implementation on all platforms. Subject to better compiler optimizations. Change-Id: Ie8876bf5943cbe1138491a25d32ee9fee554043c
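The HIR equivalent relies on the IEEE-754 property that NaN compares unequal to itself:

    static boolean isNaN(double d) {
        return d != d;   // true only for NaN; this is what the intrinsic lowers to
    }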
|
a19616e3363276e7f2c471eb2839fb16f1d43f27 |
|
02-Feb-2016 |
Aart Bik <ajcbik@google.com> |
Implemented compare/signum intrinsics as HCompare (with code generation for all architectures) Rationale: At the HIR level, many more optimizations are possible, while the ultimately generated code can take advantage of the full semantics. Change-Id: I6e2ee0311784e5e336847346f7f3c4faef4fd17e
|
c5d4754198aadb2ada2d3f5daacd10d79bc13f38 |
|
28-Jan-2016 |
Aart Bik <ajcbik@google.com> |
Implementation of integer intrinsics on x86_64 Rationale: Efficient implementations of common integer operations. Already tested in: 564-checker-bitcount 565-checker-rotate 566-checker-signum 567-checker-compare 568-checker-onebit (extended to deal with run-time zero) Change-Id: Ib48c76eee751e7925056d7f26797e9a9b5ae60dd
|
59c9454b92c2096a30a2bbdffb64edf33dbdd916 |
|
25-Jan-2016 |
Aart Bik <ajcbik@google.com> |
Recognize common utilities as intrinsics. Rationale: Recognizing these method calls as intrinsics already has major advantages (compiler knows about no-side-effects/no-throw properties). Next step is, of course, to implement these with native instructions on each architecture. Change-Id: I06fd12973238caec00d67b31b195d7f8807a538e
|
3f67e692860d281858485d48a4f1f81b907f1444 |
|
15-Jan-2016 |
Aart Bik <ajcbik@google.com> |
Implemented BitCount as an intrinsic. With unit test. Rationale: Recognizing this important operation as an intrinsic has various advantages: (1) having the no-side-effects/no-throw properties allows for much more GVN/LICM/BCE. (2) Some architectures, like x86_64, provide direct support for this operation. Performance improvements on x86_64: CheckersEvalBench (32-bit bitboard): 27,210KNS -> 36,798KNS = +35% ReversiEvalBench (64-bit bitboard): 52,562KNS -> 89,086KNS = +69% Change-Id: I65d549b0469b7909b12c6611cdc34a8640a5751f
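A hedged sketch of the bitboard-style use the cited benchmarks exercise, where population count dominates:

    static int pieces(long bitboard) {
        // with the intrinsic this lowers to a single popcnt on x86_64
        // (when the CPU supports it) instead of a method call
        return Long.bitCount(bitboard);
    }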
|
e6d0d8de85f79c8702ee722a04cd89ee7e89aeb7 |
|
28-Dec-2015 |
Andreas Gampe <agampe@google.com> |
ART: Disable Math.round intrinsics The move to OpenJDK means that Android has caught up with the definition change of Math.round. Disable intrinsics. Bug: 26327751 Change-Id: I00dc6cfca12bd7c95e56a4ab76ffee707d3822dc
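A hedged example of the definitional difference (the well-known OpenJDK halfway-boundary fix):

    // 0.49999999999999994 is the largest double below 0.5:
    // old floor(x + 0.5)-style rounding returns 1, because x + 0.5 rounds
    // up to 1.0 in double arithmetic; the OpenJDK definition returns 0
    long r = Math.round(0.49999999999999994);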
|
391b866ce55b8e78b1f9a6b98321d837256e8d66 |
|
18-Dec-2015 |
Roland Levillain <rpl@google.com> |
Disable the UnsafeCASObject intrinsic with read barriers. The current implementations of the UnsafeCASObject intrinsics are missing a read barrier. Temporarily disable them when read barriers are enabled. Also re-enable the jsr166.LinkedTransferQueueTest tests that were failing on the concurrent collector configuration, as the UnsafeCASObject JNI implementation now correctly implements the read barrier which was missing. Bug: 25883050 Bug: 26205973 Change-Id: Iaf5d515532949662d0ac6702c9452a00aa0a23e6
|
17077d888a6752a2e5f8161eee1b2c3285783d12 |
|
16-Dec-2015 |
Mark P Mendell <mark.p.mendell@intel.com> |
Revert "Revert "X86: Use locked add rather than mfence"" This reverts commit 0da3b9117706760e8722029f407da6d0297cc943. Fix a compilation failure that slipped in somehow. Change-Id: Ide8681cdc921febb296ea47aa282cc195f154049
|
0da3b9117706760e8722029f407da6d0297cc943 |
|
16-Dec-2015 |
Aart Bik <ajcbik@google.com> |
Revert "X86: Use locked add rather than mfence" This reverts commit 7b3e4f99b25c31048a33a08688557b133ad345ab. Reason: build error on sdk (linux) in git_mirror-aosp-master-with-vendor , please fix first art/compiler/optimizing/code_generator_x86_64.cc:4032:7: error: use of undeclared identifier 'codegen_' codegen_->MemoryFence(); Change-Id: I91f8542cfd944b7425d1981c35872dcdcb901e18
|
7b3e4f99b25c31048a33a08688557b133ad345ab |
|
19-Nov-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
X86: Use locked add rather than mfence Java semantics for memory ordering can be satisfied using lock addl $0,0(SP) rather than mfence. The locked add synchronizes the memory caches, but doesn't affect device memory. Timing on a micro benchmark with a mfence or lock add $0,0(sp) in a loop with 600000000 iterations: time ./mfence real 0m5.411s user 0m5.408s sys 0m0.000s time ./locked_add real 0m3.552s user 0m3.550s sys 0m0.000s Implement this as an instruction-set-feature lock_add. This is off by default (uses mfence), and enabled for atom & silvermont variants. Generation of mfence can be forced by a parameter to MemoryFence. Change-Id: I5cb4fded61f4cbbd7b7db42a1b6902e43e458911 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
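A hedged Java-level example of where such a fence is emitted: a volatile store needs a StoreLoad barrier after it, and on x86 that barrier can be either mfence or the cheaper locked add:

    class Publisher {
        int data;
        volatile boolean ready;

        void publish() {
            data = 42;
            ready = true;  // the compiler emits the StoreLoad fence here;
                           // with the lock_add feature: lock addl $0,0(SP)
                           // instead of mfence
        }
    }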
|
1e7f8db01a929ac816ca122868edc067c3c6cd17 |
|
15-Dec-2015 |
Roland Levillain <rpl@google.com> |
x86-64 Baker's read barrier fast path implementation. Introduce an x86-64 fast path implementation in Optimizing for Baker's read barriers (for both heap reference loads and GC root loads). The marking phase of the read barrier is performed by a slow path, invoking the runtime entry point artReadBarrierMark. Other read barrier algorithms continue to use the original slow path based implementation, which has been renamed as GenerateReadBarrierSlow/GenerateReadBarrierForRootSlow. Bug: 12687968 Change-Id: I9329293ddca7f9bcb512132bde6675aa202b98b2
|
40a04bf64e5837fa48aceaffe970c9984c94084a |
|
11-Dec-2015 |
Scott Wakeling <scott.wakeling@linaro.org> |
Replace rotate patterns and invokes with HRor IR. Replace constant and register version bitfield rotate patterns, and rotateRight/Left intrinsic invokes, with new HRor IR. Where k is constant and r is a register, with the UShr and Shl on either side of a |, +, or ^, the following patterns are replaced: x >>> #k OP x << #(reg_size - k) x >>> #k OP x << #-k x >>> r OP x << (#reg_size - r) x >>> (#reg_size - r) OP x << r x >>> r OP x << -r x >>> -r OP x << r Implemented for ARM/ARM64 & X86/X86_64. Tests changed to not be inlined to prevent optimization from folding them out. Additional tests added for constant rotate amounts. Change-Id: I5847d104c0a0348e5792be6c5072ce5090ca2c34
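The same patterns in Java form (hedged illustration; a Java int shift amount is masked to 5 bits, which is why the negated-amount variants are equivalent):

    static int rotr(int x, int k) {
        return (x >>> k) | (x << (32 - k));  // matched and replaced by HRor
    }

    static int rotrNeg(int x, int k) {
        return (x >>> k) | (x << -k);        // -k masks to (32 - k) mod 32: same HRor
    }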
|
a4f1220c1518074db18ca1044e9201492975750b |
|
06-Aug-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
Optimizing: Add direct calls to math intrinsics Support the double forms of: cos, sin, acos, asin, atan, atan2, cbrt, cosh, exp, expm1, hypot, log, log10, nextAfter, sinh, tan, tanh Add these entries to the vector addressed off the thread pointer. Call the libc routines directly, which means that we have to implement the native ABI, not the ART one. For x86_64, that includes saving XMM12-15 as the native ABI considers them caller-save, while the ART ABI considers them callee-save. We save them by marking them as used by the call to the math function. For x86, this is not an issue, as all the XMM registers are caller-save. Other architectures will call Java as before until they are ready to implement the new intrinsics. Bump the OAT version since we are incompatible with old boot.oat files. Change-Id: Ic6332c3555c09393a17d1ad4daf62932488722fb Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
bf84a3d2aa29c0975b4ac0f6f983d56724b2cc57 |
|
04-Dec-2015 |
Roland Levillain <rpl@google.com> |
Annotate Boolean literals more uniformly in Optimizing's intrinsics. Change-Id: Ida40309b4bc170a18b4e5db552b77f021a7b89df
|
0d5a281c671444bfa75d63caf1427a8c0e6e1177 |
|
13-Nov-2015 |
Roland Levillain <rpl@google.com> |
x86/x86-64 read barrier support for concurrent GC in Optimizing. This first implementation uses slow paths to instrument heap reference loads and GC root loads for the concurrent copying collector, respectively calling the artReadBarrierSlow and artReadBarrierForRootSlow (new) runtime entry points. Notes: - This implementation does not instrument HInvokeVirtual nor HInvokeInterface instructions (for class reference loads), as the corresponding read barriers are not strictly required with the current concurrent copying collector. - Intrinsics which may eventually call (on slow path) are disabled when read barriers are enabled, as the current slow path infrastructure does not support this case. - When read barriers are enabled, the code generated for a HArraySet instruction always goes into the array set slow path for object arrays (delegating the operation to the runtime), as we are lacking a mechanism to keep a temporary register live across a runtime call (needed for the instrumentation of type checking code, which requires two successive read barriers). Bug: 12687968 Change-Id: I14cd6107233c326389120336f93955b28ffbb329
|
ea5af68d6dda832bdfb5978a0c5d6f86a3f67e80 |
|
22-Oct-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
X86-64: Split long/double constant array/field set A long constant needs to be in a register to store to memory. By allowing stores of constants that are outside of the range of int32_t, we reduce register usage. Also support sets of float/double constants by using integer stores. Rename RegisterOrInt32LongConstant to RegisterOrInt32Constant as it now handles any type of constant. Change-Id: I025d9ef889a5a433e45aa03b376bae40f14197d2 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
b488b7864b7bf9cade82d45c8bdda2372f48a10c |
|
22-Oct-2015 |
Roland Levillain <rpl@google.com> |
Fix heap poisoning in UnsafeCASObject x86/x86-64 intrinsic. Properly handle the case when the same object is passed to sun.misc.Unsafe.compareAndSwapObject for the `obj` and `newValue` arguments (named `base` and `value` in the intrinsic implementation) and re-enable this intrinsic. Also convert some reinterpret_casts to down_casts. Bug: 12687968 Change-Id: I82167cfa77840ae2cdb45b9f19f5f530858fe7e8
|
cfea7d54dc8902d93c3fd535294d6c364f823887 |
|
20-Oct-2015 |
Roland Levillain <rpl@google.com> |
Disable the x86 & x86-64 UnsafeCASObject intrinsic with heap poisoning. The current heap poisoning instrumentation of this intrinsic does not always work properly when heap poisoning in enabled, hence this quick fix to let the build & test infrastructure turn green again. Bug: 12687968 Change-Id: I03702a057fb6f07134e926e2c1c2780f47e3a50a
|
5bd05a5c9492189ec28edaf6396d6a39ddf03367 |
|
13-Oct-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Implement System.arraycopy intrinsic for arm. Change-Id: I58ae1af5103e281fe59fbe022b718d6d8f293a5e
|
ee3cf0731d0ef0787bc2947c8e3ca432b513956b |
|
06-Oct-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Intrinsify System.arraycopy. Currently x86-64 only; the other architectures will be done in separate changes. Change-Id: I15fbbadb450dd21787809759a8b14b21b1e42624
|
cde4d272fdb1ac4d4eb8a0b58090b375a1fb50b5 |
|
18-Sep-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
Fix x86_64 round intrinsic duplicate load When I changed the code to use Load64BitValue, I forgot to delete the original load instruction(s). Remove them now. Change-Id: I76aeccf88576507f2fbcf463ae1e503827a20fe2 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
85b62f23fc6dfffe2ddd3ddfa74611666c9ff41d |
|
09-Sep-2015 |
Andreas Gampe <agampe@google.com> |
ART: Refactor intrinsics slow-paths Refactor slow paths so that there is a default implementation for common cases (only arm64 with vixl is special). Write a generic intrinsic slow-path that can be reused for the specific architectures. Move helper functions into CodeGenerator so that they are accessible. Change-Id: Ibd788dce432601c6a9f7e6f13eab31f28dcb8550
|
8f8926a5c7ea332ab387c2b3ebc6fd378a5761bc |
|
17-Aug-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
Implement StringGetCharsNoCheck intrinsic for X86 Generate inline code for String.GetChars internal no checking form for X86 and X86_64. Use REP MOVSW to copy the characters, rather than memcpy as Quick does. Change-Id: Ia67aff248461b394f97c48053f216880381945ff Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
2d554795420be0be88bb4600ea81d1ec293217c4 |
|
16-Sep-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
X86/X86_64: Intrinsics - numberOfTrailingZeros, rotateLeft, rotateRight Implement {Long,Integer}NumberOfTrailingZeros and {Long,Integer}Rotate{Left,Right}. X86 32 bit mode doesn't implement the LongRotate{Left,Right} intrinsics at this time. Change-Id: Ie25c1dca15ee2d17fbdf0c15c758bde431034d35 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
9ee23f4273efed8d6378f6ad8e63c65e30a17139 |
|
23-Jul-2015 |
Scott Wakeling <scott.wakeling@linaro.org> |
ARM/ARM64: Intrinsics - numberOfTrailingZeros, rotateLeft, rotateRight Change-Id: I2a07c279756ee804fb7c129416bdc4a3962e93ed
|
bfb5ba90cd6425ce49c2125a87e3b12222cc2601 |
|
01-Sep-2015 |
Andreas Gampe <agampe@google.com> |
Revert "Revert "Do a second check for testing intrinsic types."" This reverts commit a14b9fef395b94fa9a32147862c198fe7c22e3d7. When an intrinsic with invoke-type virtual is recognized, replace the instruction with a new HInvokeStaticOrDirect. Minimal update for dex-cache rework. Fix includes. Change-Id: I1c8e735a2fa7cda4419f76ca0717125ef236d332
|
4ab02352db4051d590b793f34d166a0b5c633c4a |
|
12-Aug-2015 |
Serban Constantinescu <serban.constantinescu@linaro.org> |
Use CodeGenerator::RecordPcInfo instead of SlowPathCode::RecordPcInfo. Part of a clean-up and refactoring series. SlowPathCode::RecordPcInfo is currently just a wrapper around CodeGenerator::RecordPcInfo. Change-Id: Iffabef4ef37c365051130bf98a6aa6dc0a0fb254 Signed-off-by: Serban Constantinescu <serban.constantinescu@linaro.org>
|
0c9497da9485ba688c592e5f452b7b1305a519c0 |
|
21-Aug-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
X86: Use short forward jumps if possible The optimizing compiler uses 32 bit relative jumps for all forward jumps, just in case the offset is too large to fit in one byte. Some of the generated code knows that the jumps will in fact fit. Use the 'NearLabel' class for the code generator and intrinsics. Use the jecxz/jrcxz instruction for string intrinsics. Unfortunately, conditional jumps to basic blocks don't know enough to use this, as we don't know how much code will be generated. This saves a whopping 0.24% for core.oat and boot.oat sizes, but every little bit helps, and it reduces icache footprint slightly. Change-Id: I633fe3b2e0e810b4ce12fdad8c02135644b63506 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
d5897678eb555a797d4e84e07814d79f0e0bb465 |
|
13-Aug-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
Implement CountLeadingZeros for x86 Generate Long and Integer numberOfLeadingZeros for x86 and x86_64. Uses 'bsr' instruction to find the first one bit, and then corrects the result. Added some more tests with constant values to test constant folding. Also add a runtime test with 0 as the input. Change-Id: I920b21bb00069bccf5f921f8f87a77e334114926 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
f8cfb20cfa00f8987227204211e99486bc38572f |
|
14-Aug-2015 |
Agi Csaki <agicsaki@google.com> |
Optimizing String.Equals as an intrinsic (x86_64) The fourth implementation of String.Equals. I added an intrinsic in x86_64 which is similar to the original Java implementation of String.equals: an instanceof check, null check, length check, and reference equality check, followed by a loop comparing strings four characters at a time. Interesting Benchmarking Values: Optimizing Compiler on 64-bit Emulator Intrinsic 1-5 Character Strings: 48 ns Original 1-5 Character Strings: 56 ns Intrinsic 1000+ Character Strings: 4009 ns Original 1000+ Character Strings: 4704 ns Intrinsic Non-String Argument: 35 ns Original Non-String Argument: 42 ns Bug: 21481923 Change-Id: I17d0d2e24a670a898ab1729669d3990403b9a853
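A hedged sketch of the check sequence described above, modeled on java.lang.String.equals (the intrinsic performs the final loop four characters at a time):

    static boolean stringEquals(String self, Object other) {
        if (self == other) return true;                 // reference equality check
        if (!(other instanceof String)) return false;   // instanceof check (also rejects null)
        String s = (String) other;
        if (self.length() != s.length()) return false;  // length check
        for (int i = 0; i < self.length(); i++) {       // character comparison loop
            if (self.charAt(i) != s.charAt(i)) return false;
        }
        return true;
    }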
|
6bc53a9d884265e0a0b14c4383bef0aa47824e64 |
|
01-Jul-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
Support X86 intrinsic System.arraycopy char This is an implementation of arraycopy for X86 and X86_64 versions of intrinsic System.arraycopy(char[], int, char[], int, int). The implementations use rep movsw to copy the chars. Change-Id: Icf9d0efb9986bc3e0794238a74f94fe02f9b42be Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
7da072feb160079734331e994ea52760cb2a3243 |
|
13-Aug-2015 |
agicsaki <agicsaki@google.com> |
Structure for String.Equals intrinsic Added structure for implementing String.Equals intrinsics. There is no functional change at this point; the intrinsic is marked as unimplemented for all instruction sets and compilers. Bug: 21481923 Change-Id: Ic2a1e22a113ff6091581126f12e926478c011340
|
cfa410b0ea561318f74a76c5323f0f6cd8eaaa50 |
|
25-May-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
[optimizing] More x86_64 code improvements Use the constant area some more, use 32-bit immediates in movq instructions when possible, and other small tweaks. Remove the commented out code for Math.Abs(float/double) as it would fail for baseline compiler due to the output being the same as the input. Change-Id: Ifa39f1865b94cec2e1c0a99af3066a645e9d3618 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
ce4b1329ca903d6b98734a27a46b54bb9cfd6d5b |
|
31-Jul-2015 |
Pavel Vyssotski <pavel.n.vyssotski@intel.com> |
ART: x86_64 RoundDouble/Float intrinsics should initialize out value. The x86_64 RoundDouble intrinsic should initialize the output register for the case of "inPlusPointFive >= maxLong" as expected. The same holds for the RoundFloat intrinsic. Also fixed the out register type in CreateSSE41FPToIntLocations, which provoked a DCHECK failure. Signed-off-by: Pavel Vyssotski <pavel.n.vyssotski@intel.com> (cherry picked from commit 9ca257196b46fd7629bce0b338580e571e4113a8) Bug: 22973442 Change-Id: If974e79d33311587d0b541a01ca8a4c9c11b9468
|
9ca257196b46fd7629bce0b338580e571e4113a8 |
|
31-Jul-2015 |
Pavel Vyssotski <pavel.n.vyssotski@intel.com> |
ART: x86_64 RoundDouble/Float intrinsics should initialize out value. The x86_64 RoundDouble intrinsic should initialize the output register for the case of "inPlusPointFive >= maxLong" as expected. The same holds for the RoundFloat intrinsic. Also fixed the out register type in CreateSSE41FPToIntLocations, which provoked a DCHECK failure. Change-Id: I0a910682e2917214861683c678ffba8e0f4bfed8 Signed-off-by: Pavel Vyssotski <pavel.n.vyssotski@intel.com>
|
611d3395e9efc0ab8dbfa4a197fa022fbd8c7204 |
|
10-Jul-2015 |
Scott Wakeling <scott.wakeling@linaro.org> |
ARM/ARM64: Implement numberOfLeadingZeros intrinsic. Change-Id: I4042fb7a0b75140475dcfca23e8f79d310f5333b
|
aabdf8ad2e8d3de953dff5c7591e7b3df4d4f60b |
|
03-Aug-2015 |
Roland Levillain <rpl@google.com> |
Revert "Optimizing String.Equals as an intrinsic (x86)" Reverted as it breaks the compilation of boot.{oat,art} on x86 (although this CL may not be the culprit, as the issue seems to come from Optimizing's register allocator). This reverts commit 8ab7bd6c8b10ad58758c33a1dc9326212bd200e9. Change-Id: If7c8b6258d1e690f4d2a06bcc82c92563ac6cdef
|
8ab7bd6c8b10ad58758c33a1dc9326212bd200e9 |
|
27-Jul-2015 |
agicsaki <agicsaki@google.com> |
Optimizing String.Equals as an intrinsic (x86) The third implementation of String.Equals. I added an intrinsic in x86 which is similar to the original java implementation of String.equals: an instanceof check, null check, length check, and reference equality check followed by a loop comparing strings character by character. Interesting Benchmarking Values: Optimizing Compiler on Nexus Player Intrinsic 15-30 Character Strings: 177 ns Original 15-30 Character Strings: 275 ns Intrinsic Null Argument: 59 ns Original Null Argument: 137 ns Intrinsic 100-1000 Character Strings: 1812 ns Original 100-1000 Character Strings: 6334 ns Bug: 21481923 Change-Id: Ia386e19b9dbfe0dac688b20ec93d8f90f67af47e
|
4d02711ea578dbb789abb30cbaf12f9926e13d81 |
|
01-Jul-2015 |
Roland Levillain <rpl@google.com> |
Implement heap poisoning in ART's Optimizing compiler. - Instrument ARM, ARM64, x86 and x86-64 code generators. - Note: To turn heap poisoning on in Optimizing, set the environment variable `ART_HEAP_POISONING' to "true" before compiling ART. Bug: 12687968 Change-Id: Ib3120b38cf805a8a50207a314b9ccc90c8d93740
|
9931f319cf86c56c2855d800339a3410697633a6 |
|
19-Jun-2015 |
Alexandre Rames <alexandre.rames@linaro.org> |
Opt compiler: Add a description to slow paths. Change-Id: I22160d90de3fe0ab3e6a2acc440bda8daa00e0f0
|
94015b939060f5041d408d48717f22443e55b6ad |
|
04-Jun-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Revert "Use HCurrentMethod in HInvokeStaticOrDirect."" Fix was to special case baseline for x86, which does not have enough registers to allocate the current method. This reverts commit c345f141f11faad177aa9635a78088d00cf66086. Change-Id: I5997aa52f8d4df373ae5ff4d4150dac0c44c4c10
|
c345f141f11faad177aa9635a78088d00cf66086 |
|
04-Jun-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Use HCurrentMethod in HInvokeStaticOrDirect." Fails on baseline/x86. This reverts commit 38207af82afb6f99c687f64b15601ed20d82220a. Change-Id: Ib71018367eb7c6046965494a7e996c22af3de403
|
38207af82afb6f99c687f64b15601ed20d82220a |
|
01-Jun-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Use HCurrentMethod in HInvokeStaticOrDirect. Change-Id: I0d15244b6b44c8b10079398c55da5071a3e3af66
|
3d21bdf8894e780d349c481e5c9e29fe1556051c |
|
22-Apr-2015 |
Mathieu Chartier <mathieuc@google.com> |
Move mirror::ArtMethod to native Optimizing + quick tests are passing, devices boot. TODO: Test and fix bugs in mips64. Saves 16 bytes per most ArtMethod, 7.5MB reduction in system PSS. Some of the savings are from removal of virtual methods and direct methods object arrays. Bug: 19264997 (cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33) Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d Fix some ArtMethod related bugs Added root visiting for runtime methods, not currently required since the GcRoots in these methods are null. Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes --trace run-tests 005, 044. Fixed optimizing compiler bug where we used a normal stack location instead of double on ARM64, this fixes the debuggable tests. TODO: Fix JDWP tests. Bug: 19264997 Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3 ART: Fix casts for 64-bit pointers on 32-bit compiler. Bug: 19264997 Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457 Fix JDWP tests after ArtMethod change Fixes Throwable::GetStackDepth for exception event detection after internal stack trace representation change. Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of proxy method. Bug: 19264997 Change-Id: I363e293796848c3ec491c963813f62d868da44d2 Fix accidental IMT and root marking regression Was always using the conflict trampoline. Also included fix for regression in GC time caused by extra roots. Most of the regression was IMT. Fixed bug in DumpGcPerformanceInfo where we would get SIGABRT due to detached thread. EvaluateAndApplyChanges: From ~2500 -> ~1980 GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots Bug: 19264997 Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0 Fix bogus image test assert Previously we were comparing the size of the non moving space to size of the image file. Now we properly compare the size of the image space against the size of the image file. Bug: 19264997 Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a [MIPS64] Fix art_quick_invoke_stub argument offsets. ArtMethod reference's size got bigger, so we need to move other args and leave enough space for ArtMethod* and 'this' pointer. This fixes mips64 boot. Bug: 19264997 Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
|
e401d146407d61eeb99f8d6176b2ac13c4df1e33 |
|
22-Apr-2015 |
Mathieu Chartier <mathieuc@google.com> |
Move mirror::ArtMethod to native Optimizing + quick tests are passing, devices boot. TODO: Test and fix bugs in mips64. Saves 16 bytes per most ArtMethod, 7.5MB reduction in system PSS. Some of the savings are from removal of virtual methods and direct methods object arrays. Bug: 19264997 Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d
|
07276db28d654594e0e86e9e467cad393f752e6e |
|
18-May-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Don't do a null test in MarkGCCard if the value cannot be null. Change-Id: I45687f6d3505178e2fc3689eac9cb6ab1b2c1e29
|
21030dd59b1e350f6f43de39e3c4ce0886ff539c |
|
07-May-2015 |
Andreas Gampe <agampe@google.com> |
ART: x86 indexOf intrinsics for the optimizing compiler Add intrinsics implementations for indexOf in the optimizing compiler. These are mostly ported from Quick. Add instruction support to assemblers where necessary. Change-Id: Ife90ed0245532a5c436a26fe84715dc357f353c8
|
92e83bf8c0b2df8c977ffbc527989631d94b1819 |
|
07-May-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
[optimizing] Tune some x86_64 moves Generate Moves of constant FP values by loading from the constant table. Use 'movl' to load a 64 bit register for positive 32-bit values, saving a byte in the generated code by taking advantage of the implicit zero extension. Change a couple of xorq(reg, reg) to xorl to (potentially) save a byte of code per xor. Change-Id: I5b2a807f0d3b29294fd4e7b8ef6d654491fa0b01 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
ec525fc30848189051b888da53ba051bc0878b78 |
|
28-Apr-2015 |
Roland Levillain <rpl@google.com> |
Factor MoveArguments methods in Optimizing's intrinsics handlers. Also add a precondition similar to the one present in code generators, regarding static invoke related explicit clinit check elimination in non-baseline compilations. Change-Id: I26f4dcb5d02824d7556f90b4b0c85b08b737fa53
|
2d27c8e338af7262dbd4aaa66127bb8fa1758b86 |
|
28-Apr-2015 |
Roland Levillain <rpl@google.com> |
Refactor InvokeDexCallingConventionVisitor in Optimizing. Change-Id: I7ede0f59d5109644887bf5d39201d4e1bf043f34
|
3e3d73349a2de81d14e2279f60ffbd9ab3f3ac28 |
|
28-Apr-2015 |
Roland Levillain <rpl@google.com> |
Have HInvoke instructions know their number of actual arguments. Add an art::HInvoke::GetNumberOfArguments routine so that art::HInvoke and its subclasses can return the number of actual arguments of the called method. Use it in code generators and intrinsics handlers. Consequently, no longer remove a clinit check as last input of a static invoke if it is still present during baseline code generation, but ensure that static invokes have no such check as last input in optimized compilations. Change-Id: Iaf9e07d1057a3b15b83d9638538c02b70211e476
|
848f70a3d73833fc1bf3032a9ff6812e429661d9 |
|
15-Jan-2014 |
Jeff Hao <jeffhao@google.com> |
Replace String CharArray with internal uint16_t array. Summary of high level changes: - Adds compiler inliner support to identify string init methods - Adds compiler support (quick & optimizing) with new invoke code path that calls method off the thread pointer - Adds thread entrypoints for all string init methods - Adds map to verifier to log when receiver of string init has been copied to other registers. used by compiler and interpreter Change-Id: I797b992a8feb566f9ad73060011ab6f51eb7ce01
|
4c0eb42259d790fddcd9978b66328dbb3ab65615 |
|
24-Apr-2015 |
Roland Levillain <rpl@google.com> |
Ensure inlined static calls perform clinit checks in Optimizing. Calls to static methods have implicit class initialization (clinit) checks of the method's declaring class in Optimizing. However, when such a static call is inlined, the implicit clinit check vanishes, possibly leading to an incorrect behavior. To ensure that inlining static methods does not change the behavior of a program, add explicit class initialization checks (art::HClinitCheck) as well as load class instructions (art::HLoadClass) as last input of static calls (art::HInvokeStaticOrDirect) in Optimizing' control flow graphs, when the declaring class is reachable and not known to be already initialized. Then when considering the inlining of a static method call, proceed only if the method has no implicit clinit check requirement. The added explicit clinit checks are already removed by the art::PrepareForRegisterAllocation visitor. This CL also extends this visitor to turn explicit clinit checks from static invokes into implicit ones after the inlining step, by removing the added art::HLoadClass nodes mentioned hereinbefore. Change-Id: I9ba452b8bd09ae1fdd9a3797ef556e3e7e19c651
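A hedged example of the behavior the explicit checks preserve:

    class Config {
        static final long START = System.nanoTime();  // runs in Config's <clinit>
        static long start() { return START; }
    }
    // Even if a caller's invoke of Config.start() is inlined, the first call
    // must still trigger Config's initialization; the HLoadClass/HClinitCheck
    // inputs keep that semantics once the invoke (and its implicit check) is gone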
|
641547a5f18ca2ea54469cceadcfef64f132e5e0 |
|
21-Apr-2015 |
Calin Juravle <calin@google.com> |
[optimizing] Fix a bug in moving the null check to the user. When taking the decision to move a null check to the user we did not verify if the next instruction checks the same object. Change-Id: I2f4533a4bb18aa4b0b6d5e419f37dcccd60354d2
|
40741f394b2737e503f2c08be0ae9dd490fb106b |
|
21-Apr-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
[optimizing] Use more X86_64 addressing modes Allow constant and memory addresses to more X86_64 instructions. Add memory formats to X86_64 instructions to match. Fix a bug in cmpq(CpuRegister, const Address&). Allow mov <addr>,immediate (instruction 0xC7) to be a valid faulting instruction. Change-Id: I5b8a409444426633920cd08e09f687a7afc88a39 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
9021825d1e73998b99c81e89c73796f6f2845471 |
|
15-Apr-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Type MoveOperands. The ParallelMoveResolver implementation needs to know if a move is for 64bits or not, to handle swaps correctly. Bug found, and test case courtesy of Serguei I. Katkov. Change-Id: I9a0917a1cfed398c07e57ad6251aea8c9b0b8506
|
39dcf55a56da746e04f477f89e7b00ba1de03880 |
|
10-Apr-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
[optimizing] Address x86_64 RIP patch comments Nicolas had some comments after the patch https://android-review.googlesource.com/#/c/144100 had merged. Fix the problems that he found. Change-Id: I40e8a4273997860db7511dc8f1986281b72bead2 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
f55c3e0825cdfc4c5a27730031177d1a0198ec5a |
|
27-Mar-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
[optimizing] Add RIP support for x86_64 Support a constant area addressed using RIP on x86_64. Use it for FP operations to avoid loading constants into a CPU register and moving to a XMM register. Change-Id: I58421759ef2a8475538876c20e696ec787015a72 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
58d25fd052e999a24734b0cf856a1563e3d1b2d0 |
|
03-Apr-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
[optimizing] Implement more x86/x86_64 intrinsics Implement CAS and bit reverse and byte reverse intrinsics that were missing from x86 and x86_64 implementations. Add assembler tests and compareAndSwapLong test. Change-Id: Iabb2ff46036645df0a91f640288ef06090a64ee3 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
fb8d279bc011b31d0765dc7ca59afea324fd0d0c |
|
01-Apr-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
[optimizing] Implement x86/x86_64 math intrinsics Implement floor/ceil/round/RoundFloat on x86 and x86_64. Implement RoundDouble on x86_64. Add support for roundss and roundsd on both architectures. Support them in the disassembler as well. Add the instruction set features for x86, as the 'round' instruction is only supported if SSE4.1 is supported. Fix the tests to handle the addition of passing the instruction set features to x86 and x86_64. Add assembler tests for roundsd and roundss to x86_64 assembler tests. Change-Id: I9742d5930befb0bbc23f3d6c83ce0183ed9fe04f Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
3e90a96f403cbc353731e6687fe12a088f996cee |
|
27-Mar-2015 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
[optimizing] Do not inline intrinsics The intrinsics generally have specialized code, and the code for them may be faster than what can be achieved with inlining. Thus the inliner should skip intrinsics. At the same time, easy methods are not worth intrinsifying, e.g. String length and isEmpty. Those can be handled by the inliner with no problem and can actually lead to better code, since the call is not kept around through all of the optimizations. Change-Id: Iab38e6c33f79efa54d845d4871cf26fa9b235ab0 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
512e04d1ea7fb33e3992715fe55be8a834d4a79c |
|
27-Mar-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix typos spotted by Andreas. Change-Id: I564b4bc5995d91f4c6c4e4f2427ed7c279cb8740
|
d75948ac93a4a317feaf136cae78823071234ba5 |
|
27-Mar-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Intrinsify String.compareTo. Change-Id: Ia540df98755ac493fe61bd63f0bd94f6d97fbb57
|
a8ac9130b872c080299afacf5dcaab513d13ea87 |
|
13-Mar-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Refactor code in preparation of correct stack maps in slow path. Move the logic of saving/restoring live registers in slow path in the SlowPathCode method. Also add a RecordPcInfo helper to SlowPathCode, that will act as the placeholder of saving correct stack maps. Change-Id: I25c2bc7a642ef854bbc8a3eb570e5c8c8d2d030c
|
878d58cbaf6b17a9e3dcab790754527f3ebc69e5 |
|
16-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Arm64 optimizing compiler intrinsics Implement most intrinsics for the optimizing compiler for Arm64. Change-Id: Idb459be09f0524cb9aeab7a5c7fccb1c6b65a707
|
42d1f5f006c8bdbcbf855c53036cd50f9c69753e |
|
16-Jan-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Do not use register pair in a parallel move. The ParallelMoveResolver does not work with pairs. Instead, decompose the pair into two individual moves. Change-Id: Ie9d3f0b078cef8dc20640c98b20bb20cc4971a7f
|
71fb52fee246b7d511f520febbd73dc7a9bbca79 |
|
30-Dec-2014 |
Andreas Gampe <agampe@google.com> |
ART: Optimizing compiler intrinsics Add intrinsics infrastructure to the optimizing compiler. Add almost all intrinsics supported by Quick to the x86-64 backend. Further intrinsics require more assembler support. Change-Id: I48de9b44c82886bb298d16e74e12a9506b8e8807
|