History log of /art/compiler/optimizing/code_generator_vector_arm64.cc
Revision Date Author Comments
66c158ef6b2a16257f1590b3ace78848a7c2407b 31-Jan-2018 Aart Bik <ajcbik@google.com> Clean up signed/unsigned in vectorizer.

Rationale:
Currently we have some remaining ugliness around signed and unsigned
SIMD operations due to lack of kUint32 and kUint64 in the HIR. By
"softly" introducing these types, ABS/MIN/MAX/HALVING_ADD/SAD_ACCUMULATE
operations can solely rely on the packed data types to distinguish
between signed and unsigned operations. Cleaner, and also allows for
some code removal in the current loop optimizer.

Bug: 72709770

Test: test-art-host test-art-target
Change-Id: I68e4cdfba325f622a7256adbe649735569cab2a3
61b922847403ac0e74b6477114c81a28ac2e01a0 11-Oct-2017 Vladimir Marko <vmarko@google.com> ART: Introduce Uint8 loads in compiled code.

Some vectorization patterns are not recognized anymore.
This shall be fixed later.

Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: testrunner.py --target --optimizing on Nexus 5X
Test: Nexus 5X boots.
Bug: 23964345
Bug: 67935418
Change-Id: I587a328d4799529949c86fa8045c6df21e3a8617
e764d2e50c544c2cb98ee61a15d613161ac6bd17 05-Oct-2017 Vladimir Marko <vmarko@google.com> Use ScopedArenaAllocator for register allocation.

Memory needed to compile the two most expensive methods for
aosp_angler-userdebug boot image:
BatteryStats.dumpCheckinLocked() : 25.1MiB -> 21.1MiB
BatteryStats.dumpLocked(): 49.6MiB -> 42.0MiB
This is because all the memory previously used by Scheduler
is reused by the register allocator; the register allocator
has a higher peak usage of the ArenaStack.

And continue the "arena"->"allocator" renaming.

Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 64312607
Change-Id: Idfd79a9901552b5147ec0bf591cb38120de86b01
ca6fff898afcb62491458ae8bcd428bfb3043da1 03-Oct-2017 Vladimir Marko <vmarko@google.com> ART: Use ScopedArenaAllocator for pass-local data.

Passes using local ArenaAllocator were hiding their memory
usage from the allocation counting, making it difficult to
track down where memory was used. Using ScopedArenaAllocator
reveals the memory usage.

This changes the HGraph constructor which requires a lot of
changes in tests. Refactor these tests to limit the amount
of work needed the next time we change that constructor.

Test: m test-art-host-gtest
Test: testrunner.py --host
Test: Build with kArenaAllocatorCountAllocations = true.
Bug: 64312607
Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
d5d2f2ce627aa0f6920d7ae05197abd1a396e035 26-Sep-2017 Vladimir Marko <vmarko@google.com> ART: Introduce Uint8 compiler data type.

This CL adds all the necessary codegen for the Uint8 type
but does not add code transformations that use that code.
Vectorization codegens are modified to use Uint8 as the
packed type when appropriate. The side effects are now
disconnected from the instruction's type after the graph has
been built to allow changing HArrayGet/H*FieldGet/HVecLoad
to use a type different from the underlying field or array.

Note: HArrayGet for String.charAt() is modified to have
no side effects whatsoever; Strings are immutable.

Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Test: testrunner.py --target --optimizing on Nexus 6P
Test: Nexus 6P boots.
Bug: 23964345
Change-Id: If2dfffedcfb1f50db24570a1e9bd517b3f17bfd0
0ebe0d83138bba1996e9c8007969b5381d972b32 21-Sep-2017 Vladimir Marko <vmarko@google.com> ART: Introduce compiler data type.

Replace most uses of the runtime's Primitive in compiler
with a new class DataType. This prepares for introducing
new types, such as Uint8, that the runtime does not need
to know about.

Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 23964345
Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
dbbac8f812a866b1b53f3007721f66038d208549 01-Sep-2017 Aart Bik <ajcbik@google.com> Implement Sum-of-Abs-Differences idiom recognition.

Rationale:
Currently just on ARM64 (x86 lacks proper support),
using the SAD idiom yields great speedup on loops
that compute the sum-of-abs-difference operation.
Also includes some refinements around type conversions.
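
For illustration, a minimal sketch of the kind of scalar loop a sum-of-abs-differences idiom refers to. The class name, method name, and the choice of byte inputs with an int accumulator are assumptions made for this example, not taken from the CL.

```java
final class SadSketch {
    // Accumulates |a[i] - b[i]| into a wider (int) sum.
    // The byte operands are sign-extended to int before the subtraction.
    static int sumAbsDiff(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}
```

Whether a particular loop of this shape is actually vectorized depends on the recognizer's type-conversion rules, which this sketch does not model.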

Speedup ExoPlayerAudio (golem run):
1.3x on ARM64
1.1x on x86

Test: test-art-host test-art-target

Bug: 64091002

Change-Id: Ia2b711d2bc23609a2ed50493dfe6719eedfe0130
0148de41a5c77c2f61252c219f1a02413c7c4a32 05-Sep-2017 Aart Bik <ajcbik@google.com> Basic SIMD reduction support.

Rationale:
Enables vectorization of x += .... for very basic (simple, same-type)
constructs. Paves the way for more complex (narrower and/or mixed-type)
constructs, which will be handled by the next CL.
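
As a rough sketch of what a simple, same-type reduction looks like, the code below pairs the scalar "x += a[i]" loop with a hand-simulated lane-wise version. The four-lane width and all names are assumptions for illustration; the real vectorizer operates on the HIR, not on Java source.

```java
final class ReductionSketch {
    // Scalar form: a plain "x += a[i]" accumulation over an int array.
    static int scalarSum(int[] a) {
        int x = 0;
        for (int i = 0; i < a.length; i++) {
            x += a[i];
        }
        return x;
    }

    // Simulated SIMD form: per-lane partial sums, a horizontal add,
    // then a scalar cleanup loop for the remaining elements.
    static int laneSum(int[] a) {
        final int lanes = 4;  // assumed vector length, for illustration only
        int[] acc = new int[lanes];
        int i = 0;
        for (; i + lanes <= a.length; i += lanes) {
            for (int l = 0; l < lanes; l++) {
                acc[l] += a[i + l];
            }
        }
        int x = 0;
        for (int l = 0; l < lanes; l++) {
            x += acc[l];  // horizontal add of the lane accumulators
        }
        for (; i < a.length; i++) {
            x += a[i];    // cleanup for elements that do not fill a vector
        }
        return x;
    }
}
```

Because int addition wraps and is associative, both forms return the same value for any input, which is what makes this reordering safe for integer sums.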

This is a revert of Icb5d6c805516db0a1d911c3ede9a246ccef89a22
and thus a revert^2 of I2454778dd0ef1da915c178c7274e1cf33e271d0f
and thus a revert^3 of I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc
and thus a revert^4 of I7880c135aee3ed0a39da9ae5b468cbf80e613766

PS1-2 shows what needed to change

Test: test-art-host test-art-target

Bug: 64091002
Change-Id: I647889e0da0959ca405b70081b79c7d3c9bcb2e9
982334cef17d47ef2477d88a97203a9587a4b86f 02-Sep-2017 Nicolas Geoffray <ngeoffray@google.com> Revert "Basic SIMD reduction support."

Fails 530-checker-lse on arm64.

Bug: 64091002, 65212948

This reverts commit cfa59b49cde265dc5329a7e6956445f9f7a75f15.

Change-Id: Icb5d6c805516db0a1d911c3ede9a246ccef89a22
cfa59b49cde265dc5329a7e6956445f9f7a75f15 31-Aug-2017 Aart Bik <ajcbik@google.com> Basic SIMD reduction support.

Rationale:
Enables vectorization of x += .... for very basic (simple, same-type)
constructs. Paves the way for more complex (narrower and/or mixed-type)
constructs, which will be handled by the next CL.

This is a revert^2 of I7880c135aee3ed0a39da9ae5b468cbf80e613766
and thus a revert of I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc

PS1-2 shows what needed to change, with regression tests

Test: test-art-host test-art-target

Bug: 64091002, 65212948
Change-Id: I2454778dd0ef1da915c178c7274e1cf33e271d0f
a57b4ee7b15ce6abfb5fa88c8dc8a516fe40e0d9 30-Aug-2017 Aart Bik <ajcbik@google.com> Revert "Basic SIMD reduction support."

This reverts commit 9879d0eac8fe2aae19ca6a4a2a83222d6383afc2.

Getting these type check failures in some builds. Need time to look at this better, so reverting for now :-(


dex2oatd F 08-30 21:14:29 210122 226218
code_generator.cc:115] Check failed: CheckType(instruction->GetType(), locations->InAt(0)) PrimDouble C

Change-Id: I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc
9879d0eac8fe2aae19ca6a4a2a83222d6383afc2 15-Aug-2017 Aart Bik <ajcbik@google.com> Basic SIMD reduction support.

Rationale:
Enables vectorization of x += .... for very basic (simple, same-type)
constructs. Paves the way for more complex (narrower and/or mixed-type)
constructs, which will be handled by the next CL.

Test: test-art-host test-art-target

Bug: 64091002

Change-Id: I7880c135aee3ed0a39da9ae5b468cbf80e613766
895f92218f705ff8ad9c47b8be0c093130d9fbbc 05-Jul-2017 Andreas Gampe <agampe@google.com> ART: Fix up small header includes

Test: m
Change-Id: I6978d6eb4b95a6ee810e5a48ca6f5d6c590d4ce1
8dfe746dc969b61416a2906bea8c176427457efc 01-Jun-2017 Artem Serov <artem.serov@linaro.org> ARM64: Encode constants when it is possible.

A small optimization that improves HVecReplicateScalar by encoding
immediates directly into the NEON instruction when possible, instead of
generating the constant in a GPR and transferring it into a NEON register.

Test: test-art-target, test-art-host.
Change-Id: I2113bbd98c0dc8433d2b7048921b9ed7c35ef1c5
c8e93c736c149ce41be073dd24324fb08afb9ae4 10-May-2017 Aart Bik <ajcbik@ajcbik2.mtv.corp.google.com> Min/max SIMDization support.

Rationale:
The more vectorized, the better!

Test: test-art-target, test-art-host

Change-Id: I758becca5beaa5b97fab2ab70f2e00cb53458703
e1811ed6b57a54dc8ebd327e4bd2c4422092a3a0 27-Apr-2017 Artem Serov <artem.serov@linaro.org> ARM64: Share address computation across SIMD LDRs/STRs.

For array accesses, the element address has the following structure:
Address = CONST_OFFSET + base_addr + (index << ELEM_SHIFT)

Given the ARM64 LDR/STR addressing modes, the address part
(CONST_OFFSET + (index << ELEM_SHIFT)) can be shared across array
accesses with the same data type and index.

For example, for the following loop 5 accesses can share address
computation:

  void foo(int[] a, int[] b, int[] c) {
    for (i...) {
      a[i] = a[i] + 5;
      b[i] = b[i] + c[i];
    }
  }
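
The address split described above can be sketched in plain arithmetic as follows. The concrete CONST_OFFSET value and the helper names are hypothetical; only the shape of the formula comes from the commit message.

```java
final class SharedAddressSketch {
    // Hypothetical values for illustration; the real data offset is defined by the runtime.
    static final long CONST_OFFSET = 12;
    static final int ELEM_SHIFT = 2;  // log2 of the element size for int

    // The part that depends only on the index and element type.
    // It can be computed once and reused for every array indexed by the same i.
    static long sharedPart(long index) {
        return CONST_OFFSET + (index << ELEM_SHIFT);
    }

    // The full element address: the array base plus the shared part.
    static long elementAddress(long baseAddr, long index) {
        return baseAddr + sharedPart(index);
    }
}
```

In the loop from the commit message, a[i], b[i], and c[i] would each add a different base to one shared value, so the shift and the offset addition are performed once per iteration rather than once per access.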

Test: test-art-host, test-art-target

Change-Id: I46af3b4e4a55004336672cdba3296b7622d815ca
472821b210a7fc7a4d2e3d45762c7b5b9628a35b 28-Apr-2017 Aart Bik <ajcbik@google.com> Enable string "array get" vectorization.

Rationale:
Like its scalar counterpart, the SIMD implementation of array get from
a string needs to deal with compressed and uncompressed cases.
Microbenchmarks show a 2x to 3x speedup for just copying data!
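
To make the two cases concrete, here is a toy model of a string with a compressed (one byte per character) and an uncompressed (one char per character) representation. The class, its fields, and the "all characters fit in 7 bits" rule are assumptions for illustration and do not describe the runtime's actual layout.

```java
final class CompressedStringSketch {
    private final byte[] bytes;  // non-null only in the compressed case
    private final char[] chars;  // non-null only in the uncompressed case

    CompressedStringSketch(String s) {
        boolean fits = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7f) {  // assumed compression rule
                fits = false;
                break;
            }
        }
        if (fits) {
            bytes = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                bytes[i] = (byte) s.charAt(i);
            }
            chars = null;
        } else {
            bytes = null;
            chars = s.toCharArray();
        }
    }

    boolean isCompressed() {
        return bytes != null;
    }

    // An "array get" has to pick the load width based on the representation.
    char charAt(int i) {
        return isCompressed() ? (char) (bytes[i] & 0xff) : chars[i];
    }
}
```

A vectorized load faces the same choice: it reads bytes and widens them in the compressed case, and reads 16-bit elements directly otherwise.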

Test: test-art-target, test-art-host
Change-Id: I2fd714e50715b263123c215cd181f19194456d2b
0225b7712202d95ac7ba40ec96e95e14c4ce0895 19-Apr-2017 Artem Serov <artem.serov@linaro.org> ARM64: Improve SIMD LDR/STR.

Test: 640-checker-*-simd
Test: test-art-target, test-art-host

Change-Id: I2bcdef3f5cb7c0e7d1b3d02910fbf89ac694d89a
f34dd206d0073fb3949be872224420a8488f551f 10-Apr-2017 Artem Serov <artem.serov@linaro.org> ARM64: Support MultiplyAccumulate for SIMD.

Test: test-art-host, test-art-target.

Change-Id: I06af8415e15352d09d176cae828163cbe99ae7a7
f3e61ee363fe7f82ef56704f06d753e2034a67dd 13-Apr-2017 Aart Bik <ajcbik@google.com> Implement halving add idiom (with checker tests).

Rationale:
First of several idioms that map to very efficient SIMD instructions.
Note that the is-zero-ext and is-sign-ext are general-purpose utilities
that will be widely used in the vectorizer to detect low precision
idioms, so expect that code to be shared with many CLs to come.
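
A minimal sketch of what a halving add looks like in scalar form, and why the operand extension matters. The loop shapes and names are assumptions for illustration; the exact patterns the recognizer accepts are defined by the CL and its checker tests.

```java
final class HalvingAddSketch {
    // Signed case: byte operands are sign-extended to int before the add.
    static void halvingAddSigned(byte[] a, byte[] b, byte[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) ((a[i] + b[i]) >> 1);
        }
    }

    // Unsigned case: the & 0xff masks zero-extend the operands instead.
    static void halvingAddUnsigned(byte[] a, byte[] b, byte[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (((a[i] & 0xff) + (b[i] & 0xff)) >> 1);
        }
    }
}
```

The same bit patterns give different results in the two loops, so telling a sign extension apart from a zero extension is what lets the sum be mapped to a narrow signed or unsigned SIMD operation.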

Test: test-art-host, test-art-target
Change-Id: If7dc2926c72a2e4b5cea15c44ef68cf5503e9be9
b31f91fd1811c9047591282dd003cf22b54938a1 05-Apr-2017 Artem Serov <artem.serov@linaro.org> ARM64: Support vectorization for double and long.

Test: test-art-host, test-art-target
Change-Id: I1d4db1763b64737766f9756e5d0f85c5736e3522
d4bccf1ece319a3a99e03ecbcbbf40bb82b9e331 03-Apr-2017 Artem Serov <artem.serov@linaro.org> ARM64: Support 128-bit registers for SIMD.

Test: test-art-host, test-art-target

Change-Id: Ifb931a99d34ea77602a0e0781040ed092de9faaa
6daebeba6ceab4e7dff5a3d65929eeac9a334004 03-Apr-2017 Aart Bik <ajcbik@google.com> Implemented ABS vectorization.

Rationale:
This CL adds the concept of vectorizing intrinsics
to the ART vectorizer. More can follow (MIN, MAX, etc).

Test: test-art-host, test-art-target (angler)
Change-Id: Ieed8aa83ec64c1250ac0578570249cce338b5d36
f8f5a16ed7bad1e18179e38453e59c96a944de10 07-Feb-2017 Aart Bik <ajcbik@google.com> ART vectorizer.

Rationale:
Make SIMD great again with a retargetable and easily extendable vectorizer.

Provides a full x86/x86_64 and a proof-of-concept ARM implementation. Sample
improvement (without any perf tuning yet) for Linpack on x86 is about 20% to 50%.

Test: test-art-host, test-art-target (angler)
Bug: 34083438, 30933338

Change-Id: Ifb77a0f25f690a87cd65bf3d5e9f6be7ea71d6c1