cd09e1f4f9902b82fa62cb2da984ea499e3b2d70 |
|
24-Nov-2017 |
Vladimir Marko <vmarko@google.com> |
Fix stats reporting over 100% methods compiled. Add statistics for intrinsic and native stub compilation and JIT failing to allocate memory for committing the code. Clean up recording of compilation statistics. New statistics when building aosp_taimen-userdebug boot image with --dump-stats: Attempted compilation of 94304 methods: 99.99% (94295) compiled. OptStat#AttemptBytecodeCompilation: 89487 OptStat#AttemptIntrinsicCompilation: 160 OptStat#CompiledNativeStub: 4733 OptStat#CompiledIntrinsic: 84 OptStat#CompiledBytecode: 89478 ... where 94304=89487+4733+84 and 94295=89478+4733+84. Test: testrunner.py -b --host --optimizing Test: Manually inspect output of building boot image with --dump-stats. Bug: 69627511 Change-Id: I15eb2b062a96f09a7721948bcc77b83ee4f18efd
|
33bff25bcd7a02d35c54f63740eadb1a4833fc92 |
|
01-Nov-2017 |
Vladimir Marko <vmarko@google.com> |
ART: Make InstructionSet an enum class and add kLast. Adding InstructionSet::kLast shall make it easier to encode the InstructionSet in fewer bits using BitField<>. However, introducing `kLast` into the `art` namespace is not a good idea, so we change the InstructionSet to an enum class. This also uncovered a case of InstructionSet::kNone being erroneously used instead of vixl32::Condition::None(), so it's good to remove `kNone` from the `art` namespace. Test: m test-art-host-gtest Test: testrunner.py --host --optimizing Change-Id: I6fa6168dfba4ed6da86d021a69c80224f09997a6
|
ca6fff898afcb62491458ae8bcd428bfb3043da1 |
|
03-Oct-2017 |
Vladimir Marko <vmarko@google.com> |
ART: Use ScopedArenaAllocator for pass-local data. Passes using local ArenaAllocator were hiding their memory usage from the allocation counting, making it difficult to track down where memory was used. Using ScopedArenaAllocator reveals the memory usage. This changes the HGraph constructor which requires a lot of changes in tests. Refactor these tests to limit the amount of work needed the next time we change that constructor. Test: m test-art-host-gtest Test: testrunner.py --host Test: Build with kArenaAllocatorCountAllocations = true. Bug: 64312607 Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
|
0f689e773c49536208d40a2e23410deea4acc184 |
|
02-Oct-2017 |
Vladimir Marko <vmarko@google.com> |
ARM/ARM64: Move simplifier visitors to .cc files. Test: Rely on TreeHugger. Change-Id: Ib2cad20a4d6252812aaf6fa09a576bdfca423b70
|
0ebe0d83138bba1996e9c8007969b5381d972b32 |
|
21-Sep-2017 |
Vladimir Marko <vmarko@google.com> |
ART: Introduce compiler data type. Replace most uses of the runtime's Primitive in compiler with a new class DataType. This prepares for introducing new types, such as Uint8, that the runtime does not need to know about. Test: m test-art-host-gtest Test: testrunner.py --host Bug: 23964345 Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
|
bc5460b850a0fa2d8dcf6c8d36b0eb86f8fe46a8 |
|
20-Jul-2017 |
Lena Djokic <Lena.Djokic@imgtec.com> |
MIPS: Support MultiplyAccumulate for SIMD. Moved support for multiply accumulate from arm64-specific to general instruction simplification. Also extended 550-checker-multiply-accumulate test. Test: test-art-host, test-art-target Change-Id: If113f0f0d5cb48e8a76273c919cfa2f49fce667d
|
e1811ed6b57a54dc8ebd327e4bd2c4422092a3a0 |
|
27-Apr-2017 |
Artem Serov <artem.serov@linaro.org> |
ARM64: Share address computation across SIMD LDRs/STRs. For array accesses the element address has the following structure: Address = CONST_OFFSET + base_addr + index << ELEM_SHIFT Taking into account ARM64 LDR/STR addressing modes address part (CONST_OFFSET + index << ELEM_SHIFT) can be shared across array access with the same data type and index. For example, for the following loop 5 accesses can share address computation: void foo(int[] a, int[] b, int[] c) { for (i...) { a[i] = a[i] + 5; b[i] = b[i] + c[i]; } } Test: test-art-host, test-art-target Change-Id: I46af3b4e4a55004336672cdba3296b7622d815ca
|
f34dd206d0073fb3949be872224420a8488f551f |
|
10-Apr-2017 |
Artem Serov <artem.serov@linaro.org> |
ARM64: Support MultiplyAccumulate for SIMD. Test: test-art-host, test-art-target. Change-Id: I06af8415e15352d09d176cae828163cbe99ae7a7
|
74234daabb28a4b9c804bf8bf908e7334bd4d400 |
|
13-Jan-2017 |
Anton Kirilov <anton.kirilov@linaro.org> |
ARM: Merge data-processing instructions and shifts/(un)signed extensions This commit mirrors the work that has already been done for ARM64. Test: m test-art-target-run-test-551-checker-shifter-operand Change-Id: Iec8c1563b035f40f0e18dcffde28d91dc21922f8
|
fdaf0f45510374d3a122fdc85d68793e2431175e |
|
13-Oct-2016 |
Vladimir Marko <vmarko@google.com> |
Change string compression encoding. Encode the string compression flag as the least significant bit of the "count" field, with 0 meaning compressed and 1 meaning uncompressed. The main vdex file is a tiny bit larger (+28B for prebuilt boot images, +32 for on-device built images) and the oat file sizes change. Measured on Nexus 9, AOSP ToT, these changes are insignificant when string compression is disabled (-200B for the 32-bit boot*.oat for prebuilt boot image, -4KiB when built on the device attributable to rounding, -16B for 64-bit boot*.oat for prebuilt boot image, no change when built on device) but with string compression enabled we get significant differences: prebuilt multi-part boot image: - 32-bit boot*.oat: -28KiB - 64-bit boot*.oat: -24KiB on-device built single boot image: - 32-bit boot.oat: -32KiB - 64-bit boot.oat: -28KiB The boot image oat file overhead for string compression: prebuilt multi-part boot image: - 32-bit boot*.oat: before: ~80KiB after: ~52KiB - 64-bit boot*.oat: before: ~116KiB after: ~92KiB on-device built single boot image: - 32-bit boot.oat: before: 92KiB after: 60KiB - 64-bit boot.oat: before: 116KiB after: 92KiB The differences in the SplitStringBenchmark seem to be lost in the noise. Test: Run ART test suite on host and Nexus 9 with Optimizing. Test: Run ART test suite on host and Nexus 9 with interpreter. Test: All of the above with string compression enabled. Bug: 31040547 Change-Id: I7570c2b700f1a31004a2d3c18b1cc30046d35a74
|
0576575d075e97a227010b4adf74ad5c8a920bde |
|
10-Sep-2016 |
jessicahandojo <jessicahandojo@google.com> |
String Compression for ARM and ARM64 Changes on intrinsics and Code Generation on ARM and ARM64 for string compression feature. Currently the feature is off. The size of boot.oat and boot.art for ARM before and after the changes (feature OFF) are still. When the feature ON, boot.oat increased by 0.60% and boot.art decreased by 9.38%. Meanwhile for ARM64, size of boot.oat and boot.art before and after changes (feature OFF) are still. When the feature ON, boot.oat increased by 0.48% and boot.art decreased by 6.58%. Turn feature on: runtime/mirror/string.h (kUseStringCompression = true) runtime/asm_support.h (STRING_COMPRESSION_FEATURE 1) Test: m -j31 test-art-target All tests passed both when the mirror::kUseStringCompression is ON and OFF. Bug: 31040547 Change-Id: I24e86b99391df33ba27df747779b648c5a820649
|
328429ff48d06e2cad4ebdd3568ab06de916a10a |
|
06-Jul-2016 |
Artem Serov <artem.serov@linaro.org> |
ARM: Port instr simplification of array accesses. After changing the addressing mode for array accesses (in https://android-review.googlesource.com/248406) the 'add' instruction that calculates the base address for the array can be shared across accesses to the same array. Before https://android-review.googlesource.com/248406: add IP, r[Array], r[Index0], LSL #2 ldr r0, [IP, #12] add IP, r[Array], r[Index1], LSL #2 ldr r0, [IP, #12] Before this CL: add IP. r[Array], #12 ldr r0, [IP, r[Index0], LSL #2] add IP. r[Array], #12 ldr r0, [IP, r[Index1], LSL #2] After this CL: add IP. r[Array], #12 ldr r0, [IP, r[Index0], LSL #2] ldr r0, [IP, r[Index1], LSL #2] Link to the original optimization: https://android-review.googlesource.com/#/c/127310/ Test: Run ART test suite on Nexus 6. Change-Id: Iee26f9a0a7ca46abb90e3f60d19d22dc8dee4d8f
|
87f3fcbd0db352157fc59148e94647ef21b73bce |
|
28-Apr-2016 |
Vladimir Marko <vmarko@google.com> |
Replace String.charAt() with HIR. Replace String.charAt() with HArrayLength, HBoundsCheck and HArrayGet. This allows GVN on the HArrayLength and BCE on the HBoundsCheck as well as using the infrastructure for HArrayGet, i.e. better handling of constant indexes than the old intrinsic and using the HArm64IntermediateAddress. Bug: 28330359 Change-Id: I32bf1da7eeafe82537a60416abf6ac412baa80dc
|
d59f3b1b7f5c1ab9f0731ff9dc60611e8d9a6ede |
|
29-Mar-2016 |
Vladimir Marko <vmarko@google.com> |
Use iterators "before" the use node in HUserRecord<>. Create a new template class IntrusiveForwardList<> that mimicks std::forward_list<> except that all allocations are handled externally. This is essentially the same as boost::intrusive::slist<> but since we're not using Boost we have to reinvent the wheel. Use the new container to replace the HUseList and use the iterators to "before" use nodes in HUserRecord<> to avoid the extra pointer to the previous node which was used exclusively for removing nodes from the list. This reduces the size of the HUseListNode by 25%, 32B to 24B in 64-bit compiler, 16B to 12B in 32-bit compiler. This translates directly to overall memory savings for the 64-bit compiler but due to rounding up of the arena allocations to 8B, we do not get any improvement in the 32-bit compiler. Compiling the Nexus 5 boot image with the 64-bit dex2oat on host this CL reduces the memory used for compiling the most hungry method, BatteryStats.dumpLocked(), by ~3.3MiB: Before: MEM: used: 47829200, allocated: 48769120, lost: 939920 Number of arenas allocated: 345, Number of allocations: 815492, avg size: 58 ... UseListNode 13744640 ... After: MEM: used: 44393040, allocated: 45361248, lost: 968208 Number of arenas allocated: 319, Number of allocations: 815492, avg size: 54 ... UseListNode 10308480 ... Note that while we do not ship the 64-bit dex2oat to the device, the JIT compilation for 64-bit processes is using the 64-bit libart-compiler. Bug: 28173563 Bug: 27856014 (cherry picked from commit 46817b876ab00d6b78905b80ed12b4344c522b6c) Change-Id: Ifb2d7b357064b003244e92c0d601d81a05e56a7b
|
46817b876ab00d6b78905b80ed12b4344c522b6c |
|
29-Mar-2016 |
Vladimir Marko <vmarko@google.com> |
Use iterators "before" the use node in HUserRecord<>. Create a new template class IntrusiveForwardList<> that mimicks std::forward_list<> except that all allocations are handled externally. This is essentially the same as boost::intrusive::slist<> but since we're not using Boost we have to reinvent the wheel. Use the new container to replace the HUseList and use the iterators to "before" use nodes in HUserRecord<> to avoid the extra pointer to the previous node which was used exclusively for removing nodes from the list. This reduces the size of the HUseListNode by 25%, 32B to 24B in 64-bit compiler, 16B to 12B in 32-bit compiler. This translates directly to overall memory savings for the 64-bit compiler but due to rounding up of the arena allocations to 8B, we do not get any improvement in the 32-bit compiler. Compiling the Nexus 5 boot image with the 64-bit dex2oat on host this CL reduces the memory used for compiling the most hungry method, BatteryStats.dumpLocked(), by ~3.3MiB: Before: MEM: used: 47829200, allocated: 48769120, lost: 939920 Number of arenas allocated: 345, Number of allocations: 815492, avg size: 58 ... UseListNode 13744640 ... After: MEM: used: 44393040, allocated: 45361248, lost: 968208 Number of arenas allocated: 319, Number of allocations: 815492, avg size: 54 ... UseListNode 10308480 ... Note that while we do not ship the 64-bit dex2oat to the device, the JIT compilation for 64-bit processes is using the 64-bit libart-compiler. Bug: 28173563 Change-Id: I985eabd4816f845372d8aaa825a1489cf9569208
|
7fc6350f6f1ab04b52b9cd7542e0790528296cbe |
|
09-Feb-2016 |
Artem Serov <artem.serov@linaro.org> |
Integrate BitwiseNegated into shared framework. Share implementation between arm and arm64. Change-Id: I0dd12e772cb23b4c181fd0b1e2a447470b1d8702
|
9ff0d205fd60cba6753a91f613b198ca2d67f04d |
|
11-Jan-2016 |
Kevin Brodsky <kevin.brodsky@linaro.org> |
Optimizing: ARM64 negated bitwise operations simplification Use negated instructions on ARM64 to replace [bitwise operation + not] patterns, that is: a & ~b (BIC) a | ~b (ORN) a ^ ~b (EON) The simplification only happens if the Not is only used by the bitwise operation. It does not happen if both inputs are Not's (this should be handled by a generic simplification applying De Morgan's laws). Change-Id: I0e112b23fd8b8e10f09bfeff5994508a8ff96e9c
|
4a0dad67867f389e01a5a6c0fe381d210f687c0d |
|
25-Jan-2016 |
Artem Udovichenko <artem.u@samsung.com> |
Revert "Revert "ARM/ARM64: Extend support of instruction combining."" This reverts commit 6b5afdd144d2bb3bf994240797834b5666b2cf98. Change-Id: Ic27a10f02e21109503edd64e6d73d1bb0c6a8ac6
|
6b5afdd144d2bb3bf994240797834b5666b2cf98 |
|
22-Jan-2016 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "ARM/ARM64: Extend support of instruction combining." The test fails its checker parts. This reverts commit debeb98aaa8950caf1a19df490f2ac9bf563075b. Change-Id: I49929e15950c7814da6c411ecd2b640d12de80df
|
debeb98aaa8950caf1a19df490f2ac9bf563075b |
|
11-Dec-2015 |
Ilmir Usmanov <i.usmanov@samsung.com> |
ARM/ARM64: Extend support of instruction combining. Combine multiply instructions in the following way: ARM64: MUL/NEG -> MNEG ARM32 (32-bit integers only): MUL/ADD -> MLA MUL/SUB -> MLS Change-Id: If20f2d8fb060145ab6fbceeb5a8f1a3d02e0ecdb
|
cd3d0fb5a4c113cfdb610454d133762a2ab0e6de |
|
15-Jan-2016 |
Roland Levillain <rpl@google.com> |
Do not use HArm64IntermediateAddress with read barriers. This ARM64 instruction simplification does not yet work correctly with the read barrier compiler instrumentation. Bug: 26601270 Bug: 12687968 Change-Id: I0c3c5d0043ebd936e00984740efbae8b3025c7ca
|
295abc1a3aec98868544dfd4e0eeab797c3d60c2 |
|
31-Dec-2015 |
David Brazdil <dbrazdil@google.com> |
ART: Set RTI of HArm64IntermediateAddress Change-Id: I2145bc249cc940d7b133fd6cbbd133cc62fee187
|
dce90b9198d523488b8f9a04dfb3834311ff3554 |
|
16-Dec-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "ART: Set RTI of Arm64IntermediateAddress" This reverts commit e36ae9435da21542891ceeebb3328f5066c8301e. Change-Id: If675b02db04bee78cc95da4ed58e545da5085da1
|
e36ae9435da21542891ceeebb3328f5066c8301e |
|
14-Dec-2015 |
David Brazdil <dbrazdil@google.com> |
ART: Set RTI of Arm64IntermediateAddress Fixes the arm64 build after I7a3aee1ff66c82d64b4846611c547af17e91d260. Change-Id: Ic2c72df59e0ddbdf2edc8519a6954d078a5ef596
|
8626b741716390a0119ffeb88b5b9fcf08e13010 |
|
25-Nov-2015 |
Alexandre Rames <alexandre.rames@linaro.org> |
ARM64: Use the shifter operands. This introduces architecture-specific instruction simplification. On ARM64 we try to merge shifts and sign-extension operations into arithmetic and logical instructions. For example for the Java code int res = a + (b << 5); we would generate lsl w3, w2, #5 add w0, w1, w3 and we now generate add w0, w1, w2, lsl #5 Change-Id: Ic03bdff44a1c12e21ddff1b0513bd32a730742b7
|
418318f4d50e0cfc2d54330d7623ee030d4d727d |
|
20-Nov-2015 |
Alexandre Rames <alexandre.rames@linaro.org> |
ARM64: Add support for multiply-accumulate. Change-Id: I88dc313df520480f3fd16bbabda27f9435d25368
|
e6dbf48d7a549e58a3d798bbbdc391e4d091b432 |
|
19-Oct-2015 |
Alexandre Rames <alexandre.rames@linaro.org> |
ARM64: Instruction simplification for array accesses. HArrayGet and HArraySet with variable indexes generate two instructions on arm64, like add temp, obj, #data_offset ldr out, [temp, index LSL #shift_amount] When we have multiple accesses to the same array, the initial `add` instruction is redundant. This patch introduces the first instruction simplification in the arm64-specific instruction simplification pass. It splits HArrayGet and HArraySet using the new arm64-specific IR HIntermediateAddress. After that we run GVN again to squash the multiple occurrences of HIntermediateAddress. Change-Id: I2e3d12fbb07fed07b2cb2f3f47f99f5a032f8312
|
44b9cf937836bb33139123e15ca8b586b5853268 |
|
19-Aug-2015 |
Alexandre Rames <alexandre.rames@linaro.org> |
Put in place the ARM64 instruction simplification framework. This commit introduces and runs the empty InstructionSimplifierArm64 pass. Further commits will introduce arm64-specific transformations in that pass. Change-Id: I458f8a2b15470297b87fc1f7ff85bd52155d93ef
|