66c158ef6b2a16257f1590b3ace78848a7c2407b |
|
31-Jan-2018 |
Aart Bik <ajcbik@google.com> |
Clean up signed/unsigned in vectorizer. Rationale: Currently we have some remaining ugliness around signed and unsigned SIMD operations due to lack of kUint32 and kUint64 in the HIR. By "softly" introducing these types, ABS/MIN/MAX/HALVING_ADD/SAD_ACCUMULATE operations can solely rely on the packed data types to distinguish between signed and unsigned operations. Cleaner, and also allows for some code removal in the current loop optimizer. Bug: 72709770 Test: test-art-host test-art-target Change-Id: I68e4cdfba325f622a7256adbe649735569cab2a3
|
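The signed/unsigned distinction this commit describes can be illustrated at the source level. This is a minimal sketch, not the compiler's code: kUint32 is an internal HIR type with no Java counterpart, and the class and method names below are hypothetical. It only shows why the same 32 bits need a type tag to pick the right MIN.

```java
// Hypothetical illustration: the same bit patterns give different MIN results
// depending on whether they are interpreted as signed or unsigned.
final class SignedUnsignedMinSketch {
    // Signed interpretation: -1 is smaller than 1.
    static int signedMin(int a, int b) {
        return Math.min(a, b);
    }

    // Unsigned interpretation: -1 is 0xFFFFFFFF, which is larger than 1.
    static int unsignedMin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }
}
```

With the operand type carrying the signedness, a single MIN operation can cover both cases, which is the simplification the commit message points at.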
61b922847403ac0e74b6477114c81a28ac2e01a0 |
|
11-Oct-2017 |
Vladimir Marko <vmarko@google.com> |
ART: Introduce Uint8 loads in compiled code. Some vectorization patterns are not recognized anymore. This shall be fixed later. Test: m test-art-host-gtest Test: testrunner.py --host --optimizing Test: testrunner.py --target --optimizing on Nexus 5X Test: Nexus 5X boots. Bug: 23964345 Bug: 67935418 Change-Id: I587a328d4799529949c86fa8045c6df21e3a8617
|
e764d2e50c544c2cb98ee61a15d613161ac6bd17 |
|
05-Oct-2017 |
Vladimir Marko <vmarko@google.com> |
Use ScopedArenaAllocator for register allocation. Memory needed to compile the two most expensive methods for aosp_angler-userdebug boot image: BatteryStats.dumpCheckinLocked(): 25.1MiB -> 21.1MiB; BatteryStats.dumpLocked(): 49.6MiB -> 42.0MiB. This is because all the memory previously used by Scheduler is reused by the register allocator; the register allocator has a higher peak usage of the ArenaStack. Also continues the "arena"->"allocator" renaming. Test: m test-art-host-gtest Test: testrunner.py --host Bug: 64312607 Change-Id: Idfd79a9901552b5147ec0bf591cb38120de86b01
|
ca6fff898afcb62491458ae8bcd428bfb3043da1 |
|
03-Oct-2017 |
Vladimir Marko <vmarko@google.com> |
ART: Use ScopedArenaAllocator for pass-local data. Passes using local ArenaAllocator were hiding their memory usage from the allocation counting, making it difficult to track down where memory was used. Using ScopedArenaAllocator reveals the memory usage. This changes the HGraph constructor which requires a lot of changes in tests. Refactor these tests to limit the amount of work needed the next time we change that constructor. Test: m test-art-host-gtest Test: testrunner.py --host Test: Build with kArenaAllocatorCountAllocations = true. Bug: 64312607 Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
|
d5d2f2ce627aa0f6920d7ae05197abd1a396e035 |
|
26-Sep-2017 |
Vladimir Marko <vmarko@google.com> |
ART: Introduce Uint8 compiler data type. This CL adds all the necessary codegen for the Uint8 type but does not add code transformations that use that code. Vectorization codegens are modified to use Uint8 as the packed type when appropriate. The side effects are now disconnected from the instruction's type after the graph has been built to allow changing HArrayGet/H*FieldGet/HVecLoad to use a type different from the underlying field or array. Note: HArrayGet for String.charAt() is modified to have no side effects whatsoever; Strings are immutable. Test: m test-art-host-gtest Test: testrunner.py --host --optimizing --jit Test: testrunner.py --target --optimizing on Nexus 6P Test: Nexus 6P boots. Bug: 23964345 Change-Id: If2dfffedcfb1f50db24570a1e9bd517b3f17bfd0
|
0ebe0d83138bba1996e9c8007969b5381d972b32 |
|
21-Sep-2017 |
Vladimir Marko <vmarko@google.com> |
ART: Introduce compiler data type. Replace most uses of the runtime's Primitive in compiler with a new class DataType. This prepares for introducing new types, such as Uint8, that the runtime does not need to know about. Test: m test-art-host-gtest Test: testrunner.py --host Bug: 23964345 Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
|
dbbac8f812a866b1b53f3007721f66038d208549 |
|
01-Sep-2017 |
Aart Bik <ajcbik@google.com> |
Implement Sum-of-Abs-Differences idiom recognition. Rationale: Currently just on ARM64 (x86 lacks proper support), using the SAD idiom yields great speedup on loops that compute the sum-of-abs-difference operation. Also includes some refinements around type conversions. Speedup ExoPlayerAudio (golem run): 1.3x on ARM64 1.1x on x86 Test: test-art-host test-art-target Bug: 64091002 Change-Id: Ia2b711d2bc23609a2ed50493dfe6719eedfe0130
|
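The loop shape behind the sum-of-abs-differences idiom can be sketched as follows. The class and method names are hypothetical, and the exact patterns the vectorizer accepts are not stated in this log; this is only the plain scalar form of the operation.

```java
// Hypothetical scalar form of a sum-of-abs-differences (SAD) loop.
final class SadSketch {
    static int sumOfAbsDifferences(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int sum = 0;
        for (int i = 0; i < n; i++) {
            // Operands are widened to int before the subtraction,
            // so the difference cannot overflow the byte range.
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}
```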
0148de41a5c77c2f61252c219f1a02413c7c4a32 |
|
05-Sep-2017 |
Aart Bik <ajcbik@google.com> |
Basic SIMD reduction support. Rationale: Enables vectorization of x += .... for very basic (simple, same-type) constructs. Paves the way for more complex (narrower and/or mixed-type) constructs, which will be handled by the next CL. This is a revert of Icb5d6c805516db0a1d911c3ede9a246ccef89a22 and thus a revert^2 of I2454778dd0ef1da915c178c7274e1cf33e271d0f and thus a revert^3 of I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc and thus a revert^4 of I7880c135aee3ed0a39da9ae5b468cbf80e613766 PS1-2 shows what needed to change Test: test-art-host test-art-target Bug: 64091002 Change-Id: I647889e0da0959ca405b70081b79c7d3c9bcb2e9
|
982334cef17d47ef2477d88a97203a9587a4b86f |
|
02-Sep-2017 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Basic SIMD reduction support." Fails 530-checker-lse on arm64. Bug: 64091002, 65212948 This reverts commit cfa59b49cde265dc5329a7e6956445f9f7a75f15. Change-Id: Icb5d6c805516db0a1d911c3ede9a246ccef89a22
|
cfa59b49cde265dc5329a7e6956445f9f7a75f15 |
|
31-Aug-2017 |
Aart Bik <ajcbik@google.com> |
Basic SIMD reduction support. Rationale: Enables vectorization of x += .... for very basic (simple, same-type) constructs. Paves the way for more complex (narrower and/or mixed-type) constructs, which will be handled by the next CL. This is a revert^2 of I7880c135aee3ed0a39da9ae5b468cbf80e613766 and thus a revert of I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc PS1-2 shows what needed to change, with regression tests Test: test-art-host test-art-target Bug: 64091002, 65212948 Change-Id: I2454778dd0ef1da915c178c7274e1cf33e271d0f
|
a57b4ee7b15ce6abfb5fa88c8dc8a516fe40e0d9 |
|
30-Aug-2017 |
Aart Bik <ajcbik@google.com> |
Revert "Basic SIMD reduction support." This reverts commit 9879d0eac8fe2aae19ca6a4a2a83222d6383afc2. Getting these type check failures in some builds. Need time to look at this better, so reverting for now :-( dex2oatd F 08-30 21:14:29 210122 226218 code_generator.cc:115] Check failed: CheckType(instruction->GetType(), locations->InAt(0)) PrimDouble C Change-Id: I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc
|
9879d0eac8fe2aae19ca6a4a2a83222d6383afc2 |
|
15-Aug-2017 |
Aart Bik <ajcbik@google.com> |
Basic SIMD reduction support. Rationale: Enables vectorization of x += .... for very basic (simple, same-type) constructs. Paves the way for more complex (narrower and/or mixed-type) constructs, which will be handled by the next CL. Test: test-art-host test-art-target Bug: 64091002 Change-Id: I7880c135aee3ed0a39da9ae5b468cbf80e613766
|
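A same-type reduction of the form x += a[i] can be pictured as lane-wise partial sums followed by a horizontal add. The sketch below emulates that in scalar Java; LANES and all names are assumptions made for illustration, not values or identifiers from the ART sources.

```java
// Hypothetical emulation of a SIMD sum reduction using per-lane accumulators.
final class ReductionSketch {
    static final int LANES = 4; // assumed number of int lanes per vector

    // Reference scalar loop: x += a[i].
    static int scalarSum(int[] a) {
        int x = 0;
        for (int v : a) {
            x += v;
        }
        return x;
    }

    static int laneWiseSum(int[] a) {
        int[] acc = new int[LANES];
        int i = 0;
        // Main loop: each lane accumulates every LANES-th element.
        for (; i + LANES <= a.length; i += LANES) {
            for (int lane = 0; lane < LANES; lane++) {
                acc[lane] += a[i + lane];
            }
        }
        // Horizontal add of the lane accumulators.
        int x = 0;
        for (int lane = 0; lane < LANES; lane++) {
            x += acc[lane];
        }
        // Scalar cleanup loop for the remaining elements.
        for (; i < a.length; i++) {
            x += a[i];
        }
        return x;
    }
}
```

Because int addition in Java wraps and is associative, both methods return the same value for any input, which is what makes this reordering safe for integer sums.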
895f92218f705ff8ad9c47b8be0c093130d9fbbc |
|
05-Jul-2017 |
Andreas Gampe <agampe@google.com> |
ART: Fix up small header includes Test: m Change-Id: I6978d6eb4b95a6ee810e5a48ca6f5d6c590d4ce1
|
8dfe746dc969b61416a2906bea8c176427457efc |
|
01-Jun-2017 |
Artem Serov <artem.serov@linaro.org> |
ARM64: Encode constants when it is possible. Small optimization which improves HVecReplicateScalar by encoding immediates directly into NEON instruction when possible instead of generating constant in GPR and transferring it into NEON register. Test: test-art-target, test-art-host. Change-Id: I2113bbd98c0dc8433d2b7048921b9ed7c35ef1c5
|
c8e93c736c149ce41be073dd24324fb08afb9ae4 |
|
10-May-2017 |
Aart Bik <ajcbik@ajcbik2.mtv.corp.google.com> |
Min/max SIMDization support. Rationale: The more vectorized, the better! Test: test-art-target, test-art-host Change-Id: I758becca5beaa5b97fab2ab70f2e00cb53458703
|
e1811ed6b57a54dc8ebd327e4bd2c4422092a3a0 |
|
27-Apr-2017 |
Artem Serov <artem.serov@linaro.org> |
ARM64: Share address computation across SIMD LDRs/STRs. For array accesses the element address has the following structure: Address = CONST_OFFSET + base_addr + (index << ELEM_SHIFT). Taking into account the ARM64 LDR/STR addressing modes, the address part (CONST_OFFSET + (index << ELEM_SHIFT)) can be shared across array accesses with the same data type and index. For example, in the following loop, 5 accesses can share the address computation: void foo(int[] a, int[] b, int[] c) { for (i...) { a[i] = a[i] + 5; b[i] = b[i] + c[i]; } } Test: test-art-host, test-art-target Change-Id: I46af3b4e4a55004336672cdba3296b7622d815ca
|
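The address split described in this commit can be sketched with plain arithmetic. The constant values below are assumptions chosen for illustration (the log does not give the real data offset), and the names are hypothetical. The point is that the index-dependent part is computed once and then combined with each array's base address.

```java
// Hypothetical arithmetic sketch of sharing the index-dependent address part.
final class SharedAddressSketch {
    static final long CONST_OFFSET = 12; // assumed array data offset, illustration only
    static final int ELEM_SHIFT = 2;     // log2 of a 4-byte int element

    // Part shared by all accesses with the same element type and index.
    // The parentheses matter: in Java, << binds more loosely than +.
    static long sharedPart(long index) {
        return CONST_OFFSET + (index << ELEM_SHIFT);
    }

    // Full element address: base address of one array plus the shared part.
    static long elementAddress(long baseAddr, long shared) {
        return baseAddr + shared;
    }
}
```

For the loop in the commit message, sharedPart(i) would be computed once per iteration and reused for the accesses to a, b, and c.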
472821b210a7fc7a4d2e3d45762c7b5b9628a35b |
|
28-Apr-2017 |
Aart Bik <ajcbik@google.com> |
Enable string "array get" vectorization. Rationale: Like its scalar counterpart, the SIMD implementation of array get from a string needs to deal with compressed and uncompressed cases. Micro benchmarks show 2x to 3x speedup for just copying data! Test: test-art-target, test-art-host Change-Id: I2fd714e50715b263123c215cd181f19194456d2b
|
0225b7712202d95ac7ba40ec96e95e14c4ce0895 |
|
19-Apr-2017 |
Artem Serov <artem.serov@linaro.org> |
ARM64: Improve SIMD LDR/STR. Test: 640-checker-*-simd Test: test-art-target, test-art-host Change-Id: I2bcdef3f5cb7c0e7d1b3d02910fbf89ac694d89a
|
f34dd206d0073fb3949be872224420a8488f551f |
|
10-Apr-2017 |
Artem Serov <artem.serov@linaro.org> |
ARM64: Support MultiplyAccumulate for SIMD. Test: test-art-host, test-art-target. Change-Id: I06af8415e15352d09d176cae828163cbe99ae7a7
|
f3e61ee363fe7f82ef56704f06d753e2034a67dd |
|
13-Apr-2017 |
Aart Bik <ajcbik@google.com> |
Implement halving add idiom (with checker tests). Rationale: First of several idioms that map to very efficient SIMD instructions. Note that the is-zero-ext and is-sign-ext are general-purpose utilities that will be widely used in the vectorizer to detect low precision idioms, so expect that code to be shared with many CLs to come. Test: test-art-host, test-art-target Change-Id: If7dc2926c72a2e4b5cea15c44ef68cf5503e9be9
|
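A halving add averages two narrow values by computing the sum in a wider type and shifting right by one. The sketch below shows an unsigned truncating variant and a signed rounding variant on byte arrays. The class and method names are hypothetical, and the precise source patterns the vectorizer matches are not spelled out in this log.

```java
// Hypothetical scalar forms of the halving add idiom on byte data.
final class HalvingAddSketch {
    // Unsigned, truncating: zero-extend both operands, add, shift right by one.
    static void halvingAddUnsigned(byte[] a, byte[] b, byte[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (((a[i] & 0xff) + (b[i] & 0xff)) >> 1);
        }
    }

    // Signed, rounding: sign-extend both operands, add one, shift right by one.
    static void roundingHalvingAddSigned(byte[] a, byte[] b, byte[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) ((a[i] + b[i] + 1) >> 1);
        }
    }
}
```

The masks with 0xff stand in for the zero extension and the implicit byte-to-int promotion for the sign extension; these are the two widening forms that the is-zero-ext and is-sign-ext utilities in the commit message are named after.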
b31f91fd1811c9047591282dd003cf22b54938a1 |
|
05-Apr-2017 |
Artem Serov <artem.serov@linaro.org> |
ARM64: Support vectorization for double and long. Test: test-art-host, test-art-target Change-Id: I1d4db1763b64737766f9756e5d0f85c5736e3522
|
d4bccf1ece319a3a99e03ecbcbbf40bb82b9e331 |
|
03-Apr-2017 |
Artem Serov <artem.serov@linaro.org> |
ARM64: Support 128-bit registers for SIMD. Test: test-art-host, test-art-target Change-Id: Ifb931a99d34ea77602a0e0781040ed092de9faaa
|
6daebeba6ceab4e7dff5a3d65929eeac9a334004 |
|
03-Apr-2017 |
Aart Bik <ajcbik@google.com> |
Implemented ABS vectorization. Rationale: This CL adds the concept of vectorizing intrinsics to the ART vectorizer. More can follow (MIN, MAX, etc). Test: test-art-host, test-art-target (angler) Change-Id: Ieed8aa83ec64c1250ac0578570249cce338b5d36
|
f8f5a16ed7bad1e18179e38453e59c96a944de10 |
|
07-Feb-2017 |
Aart Bik <ajcbik@google.com> |
ART vectorizer. Rationale: Make SIMD great again with a retargetable and easily extendable vectorizer. Provides a full x86/x86_64 and a proof-of-concept ARM implementation. Sample improvement (without any perf tuning yet) for Linpack on x86 is about 20% to 50%. Test: test-art-host, test-art-target (angler) Bug: 34083438, 30933338 Change-Id: Ifb77a0f25f690a87cd65bf3d5e9f6be7ea71d6c1
|