History log of /art/compiler/optimizing/instruction_simplifier_arm64.cc
Revision	Date	Author	Comments
cd09e1f4f9902b82fa62cb2da984ea499e3b2d70	24-Nov-2017	Vladimir Marko <vmarko@google.com>	Fix stats reporting over 100% methods compiled. Add statistics for intrinsic and native stub compilation and JIT failing to allocate memory for committing the code. Clean up recording of compilation statistics. New statistics when building aosp_taimen-userdebug boot image with --dump-stats: Attempted compilation of 94304 methods: 99.99% (94295) compiled. OptStat#AttemptBytecodeCompilation: 89487 OptStat#AttemptIntrinsicCompilation: 160 OptStat#CompiledNativeStub: 4733 OptStat#CompiledIntrinsic: 84 OptStat#CompiledBytecode: 89478 ... where 94304=89487+4733+84 and 94295=89478+4733+84. Test: testrunner.py -b --host --optimizing Test: Manually inspect output of building boot image with --dump-stats. Bug: 69627511 Change-Id: I15eb2b062a96f09a7721948bcc77b83ee4f18efd
33bff25bcd7a02d35c54f63740eadb1a4833fc92	01-Nov-2017	Vladimir Marko <vmarko@google.com>	ART: Make InstructionSet an enum class and add kLast. Adding InstructionSet::kLast shall make it easier to encode the InstructionSet in fewer bits using BitField<>. However, introducing `kLast` into the `art` namespace is not a good idea, so we change the InstructionSet to an enum class. This also uncovered a case of InstructionSet::kNone being erroneously used instead of vixl32::Condition::None(), so it's good to remove `kNone` from the `art` namespace. Test: m test-art-host-gtest Test: testrunner.py --host --optimizing Change-Id: I6fa6168dfba4ed6da86d021a69c80224f09997a6
ca6fff898afcb62491458ae8bcd428bfb3043da1	03-Oct-2017	Vladimir Marko <vmarko@google.com>	ART: Use ScopedArenaAllocator for pass-local data. Passes using local ArenaAllocator were hiding their memory usage from the allocation counting, making it difficult to track down where memory was used. Using ScopedArenaAllocator reveals the memory usage. This changes the HGraph constructor which requires a lot of changes in tests. Refactor these tests to limit the amount of work needed the next time we change that constructor. Test: m test-art-host-gtest Test: testrunner.py --host Test: Build with kArenaAllocatorCountAllocations = true. Bug: 64312607 Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
0f689e773c49536208d40a2e23410deea4acc184	02-Oct-2017	Vladimir Marko <vmarko@google.com>	ARM/ARM64: Move simplifier visitors to .cc files. Test: Rely on TreeHugger. Change-Id: Ib2cad20a4d6252812aaf6fa09a576bdfca423b70
0ebe0d83138bba1996e9c8007969b5381d972b32	21-Sep-2017	Vladimir Marko <vmarko@google.com>	ART: Introduce compiler data type. Replace most uses of the runtime's Primitive in compiler with a new class DataType. This prepares for introducing new types, such as Uint8, that the runtime does not need to know about. Test: m test-art-host-gtest Test: testrunner.py --host Bug: 23964345 Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
bc5460b850a0fa2d8dcf6c8d36b0eb86f8fe46a8	20-Jul-2017	Lena Djokic <Lena.Djokic@imgtec.com>	MIPS: Support MultiplyAccumulate for SIMD. Moved support for multiply accumulate from arm64-specific to general instruction simplification. Also extended 550-checker-multiply-accumulate test. Test: test-art-host, test-art-target Change-Id: If113f0f0d5cb48e8a76273c919cfa2f49fce667d
e1811ed6b57a54dc8ebd327e4bd2c4422092a3a0	27-Apr-2017	Artem Serov <artem.serov@linaro.org>	ARM64: Share address computation across SIMD LDRs/STRs. For array accesses the element address has the following structure: Address = CONST_OFFSET + base_addr + (index << ELEM_SHIFT). Taking into account the ARM64 LDR/STR addressing modes, the address part (CONST_OFFSET + (index << ELEM_SHIFT)) can be shared across array accesses with the same data type and index. For example, in the following loop 5 accesses can share the address computation: `void foo(int[] a, int[] b, int[] c) { for (i...) { a[i] = a[i] + 5; b[i] = b[i] + c[i]; } }`. Test: test-art-host, test-art-target Change-Id: I46af3b4e4a55004336672cdba3296b7622d815ca
f34dd206d0073fb3949be872224420a8488f551f	10-Apr-2017	Artem Serov <artem.serov@linaro.org>	ARM64: Support MultiplyAccumulate for SIMD. Test: test-art-host, test-art-target. Change-Id: I06af8415e15352d09d176cae828163cbe99ae7a7
74234daabb28a4b9c804bf8bf908e7334bd4d400	13-Jan-2017	Anton Kirilov <anton.kirilov@linaro.org>	ARM: Merge data-processing instructions and shifts/(un)signed extensions. This commit mirrors the work that has already been done for ARM64. Test: m test-art-target-run-test-551-checker-shifter-operand Change-Id: Iec8c1563b035f40f0e18dcffde28d91dc21922f8
fdaf0f45510374d3a122fdc85d68793e2431175e	13-Oct-2016	Vladimir Marko <vmarko@google.com>	Change string compression encoding. Encode the string compression flag as the least significant bit of the "count" field, with 0 meaning compressed and 1 meaning uncompressed. The main vdex file is a tiny bit larger (+28B for prebuilt boot images, +32B for on-device built images) and the oat file sizes change. Measured on Nexus 9, AOSP ToT, these changes are insignificant when string compression is disabled (-200B for the 32-bit boot.oat for prebuilt boot image, -4KiB when built on the device attributable to rounding, -16B for 64-bit boot.oat for prebuilt boot image, no change when built on device) but with string compression enabled we get significant differences: prebuilt multi-part boot image: 32-bit boot.oat: -28KiB, 64-bit boot.oat: -24KiB; on-device built single boot image: 32-bit boot.oat: -32KiB, 64-bit boot.oat: -28KiB. The boot image oat file overhead for string compression: prebuilt multi-part boot image: 32-bit boot.oat: before ~80KiB, after ~52KiB; 64-bit boot.oat: before ~116KiB, after ~92KiB; on-device built single boot image: 32-bit boot.oat: before 92KiB, after 60KiB; 64-bit boot.oat: before 116KiB, after 92KiB. The differences in the SplitStringBenchmark seem to be lost in the noise. Test: Run ART test suite on host and Nexus 9 with Optimizing. Test: Run ART test suite on host and Nexus 9 with interpreter. Test: All of the above with string compression enabled. Bug: 31040547 Change-Id: I7570c2b700f1a31004a2d3c18b1cc30046d35a74
0576575d075e97a227010b4adf74ad5c8a920bde	10-Sep-2016	jessicahandojo <jessicahandojo@google.com>	String Compression for ARM and ARM64. Changes to intrinsics and code generation on ARM and ARM64 for the string compression feature. Currently the feature is off. For ARM, the sizes of boot.oat and boot.art are unchanged by this change when the feature is OFF. When the feature is ON, boot.oat increases by 0.60% and boot.art decreases by 9.38%. For ARM64, the sizes of boot.oat and boot.art are likewise unchanged when the feature is OFF. When the feature is ON, boot.oat increases by 0.48% and boot.art decreases by 6.58%. To turn the feature on: runtime/mirror/string.h (kUseStringCompression = true), runtime/asm_support.h (STRING_COMPRESSION_FEATURE 1). Test: m -j31 test-art-target. All tests passed both when mirror::kUseStringCompression is ON and OFF. Bug: 31040547 Change-Id: I24e86b99391df33ba27df747779b648c5a820649
328429ff48d06e2cad4ebdd3568ab06de916a10a	06-Jul-2016	Artem Serov <artem.serov@linaro.org>	ARM: Port instr simplification of array accesses. After changing the addressing mode for array accesses (in https://android-review.googlesource.com/248406) the 'add' instruction that calculates the base address for the array can be shared across accesses to the same array. Before https://android-review.googlesource.com/248406: `add IP, r[Array], r[Index0], LSL #2; ldr r0, [IP, #12]; add IP, r[Array], r[Index1], LSL #2; ldr r0, [IP, #12]`. Before this CL: `add IP, r[Array], #12; ldr r0, [IP, r[Index0], LSL #2]; add IP, r[Array], #12; ldr r0, [IP, r[Index1], LSL #2]`. After this CL: `add IP, r[Array], #12; ldr r0, [IP, r[Index0], LSL #2]; ldr r0, [IP, r[Index1], LSL #2]`. Link to the original optimization: https://android-review.googlesource.com/#/c/127310/ Test: Run ART test suite on Nexus 6. Change-Id: Iee26f9a0a7ca46abb90e3f60d19d22dc8dee4d8f
87f3fcbd0db352157fc59148e94647ef21b73bce	28-Apr-2016	Vladimir Marko <vmarko@google.com>	Replace String.charAt() with HIR. Replace String.charAt() with HArrayLength, HBoundsCheck and HArrayGet. This allows GVN on the HArrayLength and BCE on the HBoundsCheck as well as using the infrastructure for HArrayGet, i.e. better handling of constant indexes than the old intrinsic and using the HArm64IntermediateAddress. Bug: 28330359 Change-Id: I32bf1da7eeafe82537a60416abf6ac412baa80dc
d59f3b1b7f5c1ab9f0731ff9dc60611e8d9a6ede	29-Mar-2016	Vladimir Marko <vmarko@google.com>	Use iterators "before" the use node in HUserRecord<>. Create a new template class IntrusiveForwardList<> that mimics std::forward_list<> except that all allocations are handled externally. This is essentially the same as boost::intrusive::slist<> but since we're not using Boost we have to reinvent the wheel. Use the new container to replace the HUseList and use the iterators to "before" use nodes in HUserRecord<> to avoid the extra pointer to the previous node which was used exclusively for removing nodes from the list. This reduces the size of the HUseListNode by 25%, 32B to 24B in 64-bit compiler, 16B to 12B in 32-bit compiler. This translates directly to overall memory savings for the 64-bit compiler but due to rounding up of the arena allocations to 8B, we do not get any improvement in the 32-bit compiler. Compiling the Nexus 5 boot image with the 64-bit dex2oat on host this CL reduces the memory used for compiling the most hungry method, BatteryStats.dumpLocked(), by ~3.3MiB: Before: MEM: used: 47829200, allocated: 48769120, lost: 939920 Number of arenas allocated: 345, Number of allocations: 815492, avg size: 58 ... UseListNode 13744640 ... After: MEM: used: 44393040, allocated: 45361248, lost: 968208 Number of arenas allocated: 319, Number of allocations: 815492, avg size: 54 ... UseListNode 10308480 ... Note that while we do not ship the 64-bit dex2oat to the device, the JIT compilation for 64-bit processes is using the 64-bit libart-compiler. Bug: 28173563 Bug: 27856014 (cherry picked from commit 46817b876ab00d6b78905b80ed12b4344c522b6c) Change-Id: Ifb2d7b357064b003244e92c0d601d81a05e56a7b
46817b876ab00d6b78905b80ed12b4344c522b6c	29-Mar-2016	Vladimir Marko <vmarko@google.com>	Use iterators "before" the use node in HUserRecord<>. Create a new template class IntrusiveForwardList<> that mimics std::forward_list<> except that all allocations are handled externally. This is essentially the same as boost::intrusive::slist<> but since we're not using Boost we have to reinvent the wheel. Use the new container to replace the HUseList and use the iterators to "before" use nodes in HUserRecord<> to avoid the extra pointer to the previous node which was used exclusively for removing nodes from the list. This reduces the size of the HUseListNode by 25%, 32B to 24B in 64-bit compiler, 16B to 12B in 32-bit compiler. This translates directly to overall memory savings for the 64-bit compiler but due to rounding up of the arena allocations to 8B, we do not get any improvement in the 32-bit compiler. Compiling the Nexus 5 boot image with the 64-bit dex2oat on host this CL reduces the memory used for compiling the most hungry method, BatteryStats.dumpLocked(), by ~3.3MiB: Before: MEM: used: 47829200, allocated: 48769120, lost: 939920 Number of arenas allocated: 345, Number of allocations: 815492, avg size: 58 ... UseListNode 13744640 ... After: MEM: used: 44393040, allocated: 45361248, lost: 968208 Number of arenas allocated: 319, Number of allocations: 815492, avg size: 54 ... UseListNode 10308480 ... Note that while we do not ship the 64-bit dex2oat to the device, the JIT compilation for 64-bit processes is using the 64-bit libart-compiler. Bug: 28173563 Change-Id: I985eabd4816f845372d8aaa825a1489cf9569208
7fc6350f6f1ab04b52b9cd7542e0790528296cbe	09-Feb-2016	Artem Serov <artem.serov@linaro.org>	Integrate BitwiseNegated into shared framework. Share implementation between arm and arm64. Change-Id: I0dd12e772cb23b4c181fd0b1e2a447470b1d8702
9ff0d205fd60cba6753a91f613b198ca2d67f04d	11-Jan-2016	Kevin Brodsky <kevin.brodsky@linaro.org>	Optimizing: ARM64 negated bitwise operations simplification. Use negated instructions on ARM64 to replace [bitwise operation + not] patterns, that is: a & ~b (BIC), a | ~b (ORN), a ^ ~b (EON). The simplification only happens if the Not is only used by the bitwise operation. It does not happen if both inputs are Not's (this should be handled by a generic simplification applying De Morgan's laws). Change-Id: I0e112b23fd8b8e10f09bfeff5994508a8ff96e9c
4a0dad67867f389e01a5a6c0fe381d210f687c0d	25-Jan-2016	Artem Udovichenko <artem.u@samsung.com>	Revert "Revert "ARM/ARM64: Extend support of instruction combining."" This reverts commit 6b5afdd144d2bb3bf994240797834b5666b2cf98. Change-Id: Ic27a10f02e21109503edd64e6d73d1bb0c6a8ac6
6b5afdd144d2bb3bf994240797834b5666b2cf98	22-Jan-2016	Nicolas Geoffray <ngeoffray@google.com>	Revert "ARM/ARM64: Extend support of instruction combining." The test fails its checker parts. This reverts commit debeb98aaa8950caf1a19df490f2ac9bf563075b. Change-Id: I49929e15950c7814da6c411ecd2b640d12de80df
debeb98aaa8950caf1a19df490f2ac9bf563075b	11-Dec-2015	Ilmir Usmanov <i.usmanov@samsung.com>	ARM/ARM64: Extend support of instruction combining. Combine multiply instructions in the following way: ARM64: MUL/NEG -> MNEG; ARM32 (32-bit integers only): MUL/ADD -> MLA, MUL/SUB -> MLS. Change-Id: If20f2d8fb060145ab6fbceeb5a8f1a3d02e0ecdb
cd3d0fb5a4c113cfdb610454d133762a2ab0e6de	15-Jan-2016	Roland Levillain <rpl@google.com>	Do not use HArm64IntermediateAddress with read barriers. This ARM64 instruction simplification does not yet work correctly with the read barrier compiler instrumentation. Bug: 26601270 Bug: 12687968 Change-Id: I0c3c5d0043ebd936e00984740efbae8b3025c7ca
295abc1a3aec98868544dfd4e0eeab797c3d60c2	31-Dec-2015	David Brazdil <dbrazdil@google.com>	ART: Set RTI of HArm64IntermediateAddress Change-Id: I2145bc249cc940d7b133fd6cbbd133cc62fee187
dce90b9198d523488b8f9a04dfb3834311ff3554	16-Dec-2015	Nicolas Geoffray <ngeoffray@google.com>	Revert "ART: Set RTI of Arm64IntermediateAddress" This reverts commit e36ae9435da21542891ceeebb3328f5066c8301e. Change-Id: If675b02db04bee78cc95da4ed58e545da5085da1
e36ae9435da21542891ceeebb3328f5066c8301e	14-Dec-2015	David Brazdil <dbrazdil@google.com>	ART: Set RTI of Arm64IntermediateAddress Fixes the arm64 build after I7a3aee1ff66c82d64b4846611c547af17e91d260. Change-Id: Ic2c72df59e0ddbdf2edc8519a6954d078a5ef596
8626b741716390a0119ffeb88b5b9fcf08e13010	25-Nov-2015	Alexandre Rames <alexandre.rames@linaro.org>	ARM64: Use the shifter operands. This introduces architecture-specific instruction simplification. On ARM64 we try to merge shifts and sign-extension operations into arithmetic and logical instructions. For example, for the Java code `int res = a + (b << 5);` we would generate `lsl w3, w2, #5; add w0, w1, w3` and we now generate `add w0, w1, w2, lsl #5`. Change-Id: Ic03bdff44a1c12e21ddff1b0513bd32a730742b7
418318f4d50e0cfc2d54330d7623ee030d4d727d	20-Nov-2015	Alexandre Rames <alexandre.rames@linaro.org>	ARM64: Add support for multiply-accumulate. Change-Id: I88dc313df520480f3fd16bbabda27f9435d25368
e6dbf48d7a549e58a3d798bbbdc391e4d091b432	19-Oct-2015	Alexandre Rames <alexandre.rames@linaro.org>	ARM64: Instruction simplification for array accesses. HArrayGet and HArraySet with variable indexes generate two instructions on arm64, like `add temp, obj, #data_offset; ldr out, [temp, index LSL #shift_amount]`. When we have multiple accesses to the same array, the initial `add` instruction is redundant. This patch introduces the first instruction simplification in the arm64-specific instruction simplification pass. It splits HArrayGet and HArraySet using the new arm64-specific IR HIntermediateAddress. After that we run GVN again to squash the multiple occurrences of HIntermediateAddress. Change-Id: I2e3d12fbb07fed07b2cb2f3f47f99f5a032f8312
44b9cf937836bb33139123e15ca8b586b5853268	19-Aug-2015	Alexandre Rames <alexandre.rames@linaro.org>	Put in place the ARM64 instruction simplification framework. This commit introduces and runs the empty InstructionSimplifierArm64 pass. Further commits will introduce arm64-specific transformations in that pass. Change-Id: I458f8a2b15470297b87fc1f7ff85bd52155d93ef
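
As an illustration of the ARM64 negated bitwise simplification described in revision 9ff0d205fd60 (11-Jan-2016), the following is a minimal C++ sketch of the selection rule stated in that commit message: merge only when exactly one input is a Not and that Not has a single user. The names `BitwiseOp`, `Input`, and `SelectMnemonic` are hypothetical and exist only for this example; they are not ART's IR classes, and this is not the pass's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical toy model; these types are NOT ART's HInstruction classes.
enum class BitwiseOp { kAnd, kOr, kXor };

struct Input {
  bool is_not;   // True if the input is produced by a Not node.
  int not_uses;  // Number of users of that Not node; ignored when !is_not.
};

// Picks an ARM64 mnemonic for `lhs op rhs` under the two conditions quoted
// in the commit message:
//   1. the Not must be used only by this bitwise operation, and
//   2. no merge happens when both inputs are Nots.
// And/Or/Xor are commutative, so the Not may appear on either side.
std::string SelectMnemonic(BitwiseOp op, const Input& lhs, const Input& rhs) {
  assert(!lhs.is_not || lhs.not_uses >= 1);
  assert(!rhs.is_not || rhs.not_uses >= 1);
  const bool exactly_one_not = lhs.is_not != rhs.is_not;
  const Input& negated = lhs.is_not ? lhs : rhs;
  const bool merge = exactly_one_not && negated.not_uses == 1;
  switch (op) {
    case BitwiseOp::kAnd: return merge ? "bic" : "and";
    case BitwiseOp::kOr:  return merge ? "orn" : "orr";
    case BitwiseOp::kXor: return merge ? "eon" : "eor";
  }
  return "";  // Unreachable for valid enum values.
}

// Reference semantics of the merged instructions on 32-bit values.
uint32_t Bic(uint32_t a, uint32_t b) { return a & ~b; }
uint32_t Orn(uint32_t a, uint32_t b) { return a | ~b; }
uint32_t Eon(uint32_t a, uint32_t b) { return a ^ ~b; }
```

In this sketch, `SelectMnemonic(BitwiseOp::kAnd, {false, 0}, {true, 1})` selects the merged form, while a Not with two users, or a Not on both inputs, falls back to the plain bitwise instruction and leaves the Not in place.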