History log of /art/compiler/optimizing/instruction_simplifier_arm64.cc
Revision Date Author Comments
cd09e1f4f9902b82fa62cb2da984ea499e3b2d70 24-Nov-2017 Vladimir Marko <vmarko@google.com> Fix stats reporting over 100% methods compiled.

Add statistics for intrinsic and native stub compilation
and JIT failing to allocate memory for committing the
code. Clean up recording of compilation statistics.

New statistics when building aosp_taimen-userdebug boot
image with --dump-stats:
Attempted compilation of 94304 methods: 99.99% (94295) compiled.
OptStat#AttemptBytecodeCompilation: 89487
OptStat#AttemptIntrinsicCompilation: 160
OptStat#CompiledNativeStub: 4733
OptStat#CompiledIntrinsic: 84
OptStat#CompiledBytecode: 89478
...
where 94304=89487+4733+84 and 94295=89478+4733+84.

Test: testrunner.py -b --host --optimizing
Test: Manually inspect output of building boot image
with --dump-stats.
Bug: 69627511
Change-Id: I15eb2b062a96f09a7721948bcc77b83ee4f18efd
33bff25bcd7a02d35c54f63740eadb1a4833fc92 01-Nov-2017 Vladimir Marko <vmarko@google.com> ART: Make InstructionSet an enum class and add kLast.

Adding InstructionSet::kLast shall make it easier to encode
the InstructionSet in fewer bits using BitField<>. However,
introducing `kLast` into the `art` namespace is not a good
idea, so we change the InstructionSet to an enum class.
This also uncovered a case of InstructionSet::kNone being
erroneously used instead of vixl32::Condition::None(), so
it's good to remove `kNone` from the `art` namespace.

Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I6fa6168dfba4ed6da86d021a69c80224f09997a6
ca6fff898afcb62491458ae8bcd428bfb3043da1 03-Oct-2017 Vladimir Marko <vmarko@google.com> ART: Use ScopedArenaAllocator for pass-local data.

Passes using local ArenaAllocator were hiding their memory
usage from the allocation counting, making it difficult to
track down where memory was used. Using ScopedArenaAllocator
reveals the memory usage.

This changes the HGraph constructor which requires a lot of
changes in tests. Refactor these tests to limit the amount
of work needed the next time we change that constructor.

Test: m test-art-host-gtest
Test: testrunner.py --host
Test: Build with kArenaAllocatorCountAllocations = true.
Bug: 64312607
Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
0f689e773c49536208d40a2e23410deea4acc184 02-Oct-2017 Vladimir Marko <vmarko@google.com> ARM/ARM64: Move simplifier visitors to .cc files.

Test: Rely on TreeHugger.
Change-Id: Ib2cad20a4d6252812aaf6fa09a576bdfca423b70
0ebe0d83138bba1996e9c8007969b5381d972b32 21-Sep-2017 Vladimir Marko <vmarko@google.com> ART: Introduce compiler data type.

Replace most uses of the runtime's Primitive in compiler
with a new class DataType. This prepares for introducing
new types, such as Uint8, that the runtime does not need
to know about.

Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 23964345
Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
bc5460b850a0fa2d8dcf6c8d36b0eb86f8fe46a8 20-Jul-2017 Lena Djokic <Lena.Djokic@imgtec.com> MIPS: Support MultiplyAccumulate for SIMD.

Moved support for multiply accumulate from arm64-specific to
general instruction simplification.
Also extended 550-checker-multiply-accumulate test.

Test: test-art-host, test-art-target

Change-Id: If113f0f0d5cb48e8a76273c919cfa2f49fce667d
e1811ed6b57a54dc8ebd327e4bd2c4422092a3a0 27-Apr-2017 Artem Serov <artem.serov@linaro.org> ARM64: Share address computation across SIMD LDRs/STRs.

For array accesses the element address has the following structure:
Address = CONST_OFFSET + base_addr + index << ELEM_SHIFT

Taking into account ARM64 LDR/STR addressing modes address part
(CONST_OFFSET + index << ELEM_SHIFT) can be shared across array
access with the same data type and index.

For example, for the following loop 5 accesses can share address
computation:

void foo(int[] a, int[] b, int[] c) {
for (i...) {
a[i] = a[i] + 5;
b[i] = b[i] + c[i];
}
}

Test: test-art-host, test-art-target

Change-Id: I46af3b4e4a55004336672cdba3296b7622d815ca
f34dd206d0073fb3949be872224420a8488f551f 10-Apr-2017 Artem Serov <artem.serov@linaro.org> ARM64: Support MultiplyAccumulate for SIMD.

Test: test-art-host, test-art-target.

Change-Id: I06af8415e15352d09d176cae828163cbe99ae7a7
74234daabb28a4b9c804bf8bf908e7334bd4d400 13-Jan-2017 Anton Kirilov <anton.kirilov@linaro.org> ARM: Merge data-processing instructions and shifts/(un)signed extensions

This commit mirrors the work that has already been done for ARM64.

Test: m test-art-target-run-test-551-checker-shifter-operand
Change-Id: Iec8c1563b035f40f0e18dcffde28d91dc21922f8
fdaf0f45510374d3a122fdc85d68793e2431175e 13-Oct-2016 Vladimir Marko <vmarko@google.com> Change string compression encoding.

Encode the string compression flag as the least significant
bit of the "count" field, with 0 meaning compressed and 1
meaning uncompressed.

The main vdex file is a tiny bit larger (+28B for prebuilt
boot images, +32 for on-device built images) and the oat
file sizes change. Measured on Nexus 9, AOSP ToT, these
changes are insignificant when string compression is
disabled (-200B for the 32-bit boot*.oat for prebuilt boot
image, -4KiB when built on the device attributable to
rounding, -16B for 64-bit boot*.oat for prebuilt boot image,
no change when built on device) but with string compression
enabled we get significant differences:
prebuilt multi-part boot image:
- 32-bit boot*.oat: -28KiB
- 64-bit boot*.oat: -24KiB
on-device built single boot image:
- 32-bit boot.oat: -32KiB
- 64-bit boot.oat: -28KiB
The boot image oat file overhead for string compression:
prebuilt multi-part boot image:
- 32-bit boot*.oat: before: ~80KiB after: ~52KiB
- 64-bit boot*.oat: before: ~116KiB after: ~92KiB
on-device built single boot image:
- 32-bit boot.oat: before: 92KiB after: 60KiB
- 64-bit boot.oat: before: 116KiB after: 92KiB

The differences in the SplitStringBenchmark seem to be lost
in the noise.

Test: Run ART test suite on host and Nexus 9 with Optimizing.
Test: Run ART test suite on host and Nexus 9 with interpreter.
Test: All of the above with string compression enabled.
Bug: 31040547

Change-Id: I7570c2b700f1a31004a2d3c18b1cc30046d35a74
0576575d075e97a227010b4adf74ad5c8a920bde 10-Sep-2016 jessicahandojo <jessicahandojo@google.com> String Compression for ARM and ARM64

Changes on intrinsics and Code Generation on ARM and ARM64
for string compression feature. Currently the feature is off.

The size of boot.oat and boot.art for ARM before and after the
changes (feature OFF) are still. When the feature ON,
boot.oat increased by 0.60% and boot.art decreased by 9.38%.

Meanwhile for ARM64, size of boot.oat and boot.art before and
after changes (feature OFF) are still. When the feature ON,
boot.oat increased by 0.48% and boot.art decreased by 6.58%.

Turn feature on: runtime/mirror/string.h (kUseStringCompression = true)
runtime/asm_support.h (STRING_COMPRESSION_FEATURE 1)

Test: m -j31 test-art-target
All tests passed both when the mirror::kUseStringCompression
is ON and OFF.

Bug: 31040547
Change-Id: I24e86b99391df33ba27df747779b648c5a820649
328429ff48d06e2cad4ebdd3568ab06de916a10a 06-Jul-2016 Artem Serov <artem.serov@linaro.org> ARM: Port instr simplification of array accesses.

After changing the addressing mode for array accesses (in
https://android-review.googlesource.com/248406) the 'add'
instruction that calculates the base address for the array can be
shared across accesses to the same array.

Before https://android-review.googlesource.com/248406:
add IP, r[Array], r[Index0], LSL #2
ldr r0, [IP, #12]
add IP, r[Array], r[Index1], LSL #2
ldr r0, [IP, #12]

Before this CL:
add IP. r[Array], #12
ldr r0, [IP, r[Index0], LSL #2]
add IP. r[Array], #12
ldr r0, [IP, r[Index1], LSL #2]

After this CL:
add IP. r[Array], #12
ldr r0, [IP, r[Index0], LSL #2]
ldr r0, [IP, r[Index1], LSL #2]

Link to the original optimization:
https://android-review.googlesource.com/#/c/127310/

Test: Run ART test suite on Nexus 6.
Change-Id: Iee26f9a0a7ca46abb90e3f60d19d22dc8dee4d8f
87f3fcbd0db352157fc59148e94647ef21b73bce 28-Apr-2016 Vladimir Marko <vmarko@google.com> Replace String.charAt() with HIR.

Replace String.charAt() with HArrayLength, HBoundsCheck and
HArrayGet. This allows GVN on the HArrayLength and BCE on
the HBoundsCheck as well as using the infrastructure for
HArrayGet, i.e. better handling of constant indexes than
the old intrinsic and using the HArm64IntermediateAddress.

Bug: 28330359
Change-Id: I32bf1da7eeafe82537a60416abf6ac412baa80dc
d59f3b1b7f5c1ab9f0731ff9dc60611e8d9a6ede 29-Mar-2016 Vladimir Marko <vmarko@google.com> Use iterators "before" the use node in HUserRecord<>.

Create a new template class IntrusiveForwardList<> that
mimicks std::forward_list<> except that all allocations
are handled externally. This is essentially the same as
boost::intrusive::slist<> but since we're not using Boost
we have to reinvent the wheel.

Use the new container to replace the HUseList and use the
iterators to "before" use nodes in HUserRecord<> to avoid
the extra pointer to the previous node which was used
exclusively for removing nodes from the list. This reduces
the size of the HUseListNode by 25%, 32B to 24B in 64-bit
compiler, 16B to 12B in 32-bit compiler. This translates
directly to overall memory savings for the 64-bit compiler
but due to rounding up of the arena allocations to 8B, we
do not get any improvement in the 32-bit compiler.

Compiling the Nexus 5 boot image with the 64-bit dex2oat
on host this CL reduces the memory used for compiling the
most hungry method, BatteryStats.dumpLocked(), by ~3.3MiB:

Before:
MEM: used: 47829200, allocated: 48769120, lost: 939920
Number of arenas allocated: 345,
Number of allocations: 815492, avg size: 58
...
UseListNode 13744640
...
After:
MEM: used: 44393040, allocated: 45361248, lost: 968208
Number of arenas allocated: 319,
Number of allocations: 815492, avg size: 54
...
UseListNode 10308480
...

Note that while we do not ship the 64-bit dex2oat to the
device, the JIT compilation for 64-bit processes is using
the 64-bit libart-compiler.

Bug: 28173563
Bug: 27856014

(cherry picked from commit 46817b876ab00d6b78905b80ed12b4344c522b6c)

Change-Id: Ifb2d7b357064b003244e92c0d601d81a05e56a7b
46817b876ab00d6b78905b80ed12b4344c522b6c 29-Mar-2016 Vladimir Marko <vmarko@google.com> Use iterators "before" the use node in HUserRecord<>.

Create a new template class IntrusiveForwardList<> that
mimicks std::forward_list<> except that all allocations
are handled externally. This is essentially the same as
boost::intrusive::slist<> but since we're not using Boost
we have to reinvent the wheel.

Use the new container to replace the HUseList and use the
iterators to "before" use nodes in HUserRecord<> to avoid
the extra pointer to the previous node which was used
exclusively for removing nodes from the list. This reduces
the size of the HUseListNode by 25%, 32B to 24B in 64-bit
compiler, 16B to 12B in 32-bit compiler. This translates
directly to overall memory savings for the 64-bit compiler
but due to rounding up of the arena allocations to 8B, we
do not get any improvement in the 32-bit compiler.

Compiling the Nexus 5 boot image with the 64-bit dex2oat
on host this CL reduces the memory used for compiling the
most hungry method, BatteryStats.dumpLocked(), by ~3.3MiB:

Before:
MEM: used: 47829200, allocated: 48769120, lost: 939920
Number of arenas allocated: 345,
Number of allocations: 815492, avg size: 58
...
UseListNode 13744640
...
After:
MEM: used: 44393040, allocated: 45361248, lost: 968208
Number of arenas allocated: 319,
Number of allocations: 815492, avg size: 54
...
UseListNode 10308480
...

Note that while we do not ship the 64-bit dex2oat to the
device, the JIT compilation for 64-bit processes is using
the 64-bit libart-compiler.

Bug: 28173563
Change-Id: I985eabd4816f845372d8aaa825a1489cf9569208
7fc6350f6f1ab04b52b9cd7542e0790528296cbe 09-Feb-2016 Artem Serov <artem.serov@linaro.org> Integrate BitwiseNegated into shared framework.

Share implementation between arm and arm64.

Change-Id: I0dd12e772cb23b4c181fd0b1e2a447470b1d8702
9ff0d205fd60cba6753a91f613b198ca2d67f04d 11-Jan-2016 Kevin Brodsky <kevin.brodsky@linaro.org> Optimizing: ARM64 negated bitwise operations simplification

Use negated instructions on ARM64 to replace [bitwise operation + not]
patterns, that is:
a & ~b (BIC)
a | ~b (ORN)
a ^ ~b (EON)

The simplification only happens if the Not is only used by the bitwise
operation. It does not happen if both inputs are Not's (this should be
handled by a generic simplification applying De Morgan's laws).

Change-Id: I0e112b23fd8b8e10f09bfeff5994508a8ff96e9c
4a0dad67867f389e01a5a6c0fe381d210f687c0d 25-Jan-2016 Artem Udovichenko <artem.u@samsung.com> Revert "Revert "ARM/ARM64: Extend support of instruction combining.""

This reverts commit 6b5afdd144d2bb3bf994240797834b5666b2cf98.

Change-Id: Ic27a10f02e21109503edd64e6d73d1bb0c6a8ac6
6b5afdd144d2bb3bf994240797834b5666b2cf98 22-Jan-2016 Nicolas Geoffray <ngeoffray@google.com> Revert "ARM/ARM64: Extend support of instruction combining."

The test fails its checker parts.

This reverts commit debeb98aaa8950caf1a19df490f2ac9bf563075b.

Change-Id: I49929e15950c7814da6c411ecd2b640d12de80df
debeb98aaa8950caf1a19df490f2ac9bf563075b 11-Dec-2015 Ilmir Usmanov <i.usmanov@samsung.com> ARM/ARM64: Extend support of instruction combining.

Combine multiply instructions in the following way:
ARM64:
MUL/NEG -> MNEG
ARM32 (32-bit integers only):
MUL/ADD -> MLA
MUL/SUB -> MLS

Change-Id: If20f2d8fb060145ab6fbceeb5a8f1a3d02e0ecdb
cd3d0fb5a4c113cfdb610454d133762a2ab0e6de 15-Jan-2016 Roland Levillain <rpl@google.com> Do not use HArm64IntermediateAddress with read barriers.

This ARM64 instruction simplification does not yet work
correctly with the read barrier compiler instrumentation.

Bug: 26601270
Bug: 12687968
Change-Id: I0c3c5d0043ebd936e00984740efbae8b3025c7ca
295abc1a3aec98868544dfd4e0eeab797c3d60c2 31-Dec-2015 David Brazdil <dbrazdil@google.com> ART: Set RTI of HArm64IntermediateAddress

Change-Id: I2145bc249cc940d7b133fd6cbbd133cc62fee187
dce90b9198d523488b8f9a04dfb3834311ff3554 16-Dec-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "ART: Set RTI of Arm64IntermediateAddress"

This reverts commit e36ae9435da21542891ceeebb3328f5066c8301e.

Change-Id: If675b02db04bee78cc95da4ed58e545da5085da1
e36ae9435da21542891ceeebb3328f5066c8301e 14-Dec-2015 David Brazdil <dbrazdil@google.com> ART: Set RTI of Arm64IntermediateAddress

Fixes the arm64 build after I7a3aee1ff66c82d64b4846611c547af17e91d260.

Change-Id: Ic2c72df59e0ddbdf2edc8519a6954d078a5ef596
8626b741716390a0119ffeb88b5b9fcf08e13010 25-Nov-2015 Alexandre Rames <alexandre.rames@linaro.org> ARM64: Use the shifter operands.

This introduces architecture-specific instruction simplification.
On ARM64 we try to merge shifts and sign-extension operations into
arithmetic and logical instructions.

For example for the Java code

int res = a + (b << 5);

we would generate

lsl w3, w2, #5
add w0, w1, w3

and we now generate

add w0, w1, w2, lsl #5

Change-Id: Ic03bdff44a1c12e21ddff1b0513bd32a730742b7
418318f4d50e0cfc2d54330d7623ee030d4d727d 20-Nov-2015 Alexandre Rames <alexandre.rames@linaro.org> ARM64: Add support for multiply-accumulate.

Change-Id: I88dc313df520480f3fd16bbabda27f9435d25368
e6dbf48d7a549e58a3d798bbbdc391e4d091b432 19-Oct-2015 Alexandre Rames <alexandre.rames@linaro.org> ARM64: Instruction simplification for array accesses.

HArrayGet and HArraySet with variable indexes generate two
instructions on arm64, like

add temp, obj, #data_offset
ldr out, [temp, index LSL #shift_amount]

When we have multiple accesses to the same array, the initial `add`
instruction is redundant.

This patch introduces the first instruction simplification in the
arm64-specific instruction simplification pass. It splits HArrayGet
and HArraySet using the new arm64-specific IR HIntermediateAddress.
After that we run GVN again to squash the multiple occurrences of
HIntermediateAddress.

Change-Id: I2e3d12fbb07fed07b2cb2f3f47f99f5a032f8312
44b9cf937836bb33139123e15ca8b586b5853268 19-Aug-2015 Alexandre Rames <alexandre.rames@linaro.org> Put in place the ARM64 instruction simplification framework.

This commit introduces and runs the empty InstructionSimplifierArm64
pass. Further commits will introduce arm64-specific transformations in
that pass.

Change-Id: I458f8a2b15470297b87fc1f7ff85bd52155d93ef