History log of /art/compiler/optimizing/code_generator_vector_arm64.cc
Revision	Date	Author	Comments
66c158ef6b2a16257f1590b3ace78848a7c2407b	31-Jan-2018	Aart Bik <ajcbik@google.com>	Clean up signed/unsigned in vectorizer. Rationale: Currently we have some remaining ugliness around signed and unsigned SIMD operations due to the lack of kUint32 and kUint64 in the HIR. By "softly" introducing these types, ABS/MIN/MAX/HALVING_ADD/SAD_ACCUMULATE operations can rely solely on the packed data types to distinguish between signed and unsigned operations. Cleaner, and also allows for some code removal in the current loop optimizer. Bug: 72709770 Test: test-art-host test-art-target Change-Id: I68e4cdfba325f622a7256adbe649735569cab2a3
61b922847403ac0e74b6477114c81a28ac2e01a0	11-Oct-2017	Vladimir Marko <vmarko@google.com>	ART: Introduce Uint8 loads in compiled code. Some vectorization patterns are not recognized anymore. This shall be fixed later. Test: m test-art-host-gtest Test: testrunner.py --host --optimizing Test: testrunner.py --target --optimizing on Nexus 5X Test: Nexus 5X boots. Bug: 23964345 Bug: 67935418 Change-Id: I587a328d4799529949c86fa8045c6df21e3a8617
e764d2e50c544c2cb98ee61a15d613161ac6bd17	05-Oct-2017	Vladimir Marko <vmarko@google.com>	Use ScopedArenaAllocator for register allocation. Memory needed to compile the two most expensive methods for aosp_angler-userdebug boot image: BatteryStats.dumpCheckinLocked(): 25.1MiB -> 21.1MiB; BatteryStats.dumpLocked(): 49.6MiB -> 42.0MiB. This is because all the memory previously used by Scheduler is reused by the register allocator; the register allocator has a higher peak usage of the ArenaStack. And continue the "arena"->"allocator" renaming. Test: m test-art-host-gtest Test: testrunner.py --host Bug: 64312607 Change-Id: Idfd79a9901552b5147ec0bf591cb38120de86b01
ca6fff898afcb62491458ae8bcd428bfb3043da1	03-Oct-2017	Vladimir Marko <vmarko@google.com>	ART: Use ScopedArenaAllocator for pass-local data. Passes using local ArenaAllocator were hiding their memory usage from the allocation counting, making it difficult to track down where memory was used. Using ScopedArenaAllocator reveals the memory usage. This changes the HGraph constructor which requires a lot of changes in tests. Refactor these tests to limit the amount of work needed the next time we change that constructor. Test: m test-art-host-gtest Test: testrunner.py --host Test: Build with kArenaAllocatorCountAllocations = true. Bug: 64312607 Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
d5d2f2ce627aa0f6920d7ae05197abd1a396e035	26-Sep-2017	Vladimir Marko <vmarko@google.com>	ART: Introduce Uint8 compiler data type. This CL adds all the necessary codegen for the Uint8 type but does not add code transformations that use that code. Vectorization codegens are modified to use Uint8 as the packed type when appropriate. The side effects are now disconnected from the instruction's type after the graph has been built to allow changing HArrayGet/H*FieldGet/HVecLoad to use a type different from the underlying field or array. Note: HArrayGet for String.charAt() is modified to have no side effects whatsoever; Strings are immutable. Test: m test-art-host-gtest Test: testrunner.py --host --optimizing --jit Test: testrunner.py --target --optimizing on Nexus 6P Test: Nexus 6P boots. Bug: 23964345 Change-Id: If2dfffedcfb1f50db24570a1e9bd517b3f17bfd0
0ebe0d83138bba1996e9c8007969b5381d972b32	21-Sep-2017	Vladimir Marko <vmarko@google.com>	ART: Introduce compiler data type. Replace most uses of the runtime's Primitive in compiler with a new class DataType. This prepares for introducing new types, such as Uint8, that the runtime does not need to know about. Test: m test-art-host-gtest Test: testrunner.py --host Bug: 23964345 Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
dbbac8f812a866b1b53f3007721f66038d208549	01-Sep-2017	Aart Bik <ajcbik@google.com>	Implement Sum-of-Abs-Differences idiom recognition. Rationale: Currently just on ARM64 (x86 lacks proper support), using the SAD idiom yields a great speedup on loops that compute the sum-of-abs-difference operation. Also includes some refinements around type conversions. Speedup ExoPlayerAudio (golem run): 1.3x on ARM64, 1.1x on x86. Test: test-art-host test-art-target Bug: 64091002 Change-Id: Ia2b711d2bc23609a2ed50493dfe6719eedfe0130
0148de41a5c77c2f61252c219f1a02413c7c4a32	05-Sep-2017	Aart Bik <ajcbik@google.com>	Basic SIMD reduction support. Rationale: Enables vectorization of x += .... for very basic (simple, same-type) constructs. Paves the way for more complex (narrower and/or mixed-type) constructs, which will be handled by the next CL. This is a revert of Icb5d6c805516db0a1d911c3ede9a246ccef89a22 and thus a revert^2 of I2454778dd0ef1da915c178c7274e1cf33e271d0f and thus a revert^3 of I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc and thus a revert^4 of I7880c135aee3ed0a39da9ae5b468cbf80e613766 PS1-2 shows what needed to change Test: test-art-host test-art-target Bug: 64091002 Change-Id: I647889e0da0959ca405b70081b79c7d3c9bcb2e9
982334cef17d47ef2477d88a97203a9587a4b86f	02-Sep-2017	Nicolas Geoffray <ngeoffray@google.com>	Revert "Basic SIMD reduction support." Fails 530-checker-lse on arm64. Bug: 64091002, 65212948 This reverts commit cfa59b49cde265dc5329a7e6956445f9f7a75f15. Change-Id: Icb5d6c805516db0a1d911c3ede9a246ccef89a22
cfa59b49cde265dc5329a7e6956445f9f7a75f15	31-Aug-2017	Aart Bik <ajcbik@google.com>	Basic SIMD reduction support. Rationale: Enables vectorization of x += .... for very basic (simple, same-type) constructs. Paves the way for more complex (narrower and/or mixed-type) constructs, which will be handled by the next CL. This is a revert^2 of I7880c135aee3ed0a39da9ae5b468cbf80e613766 and thus a revert of I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc PS1-2 shows what needed to change, with regression tests Test: test-art-host test-art-target Bug: 64091002, 65212948 Change-Id: I2454778dd0ef1da915c178c7274e1cf33e271d0f
a57b4ee7b15ce6abfb5fa88c8dc8a516fe40e0d9	30-Aug-2017	Aart Bik <ajcbik@google.com>	Revert "Basic SIMD reduction support." This reverts commit 9879d0eac8fe2aae19ca6a4a2a83222d6383afc2. Getting these type check failures in some builds. Need time to look at this better, so reverting for now :-( dex2oatd F 08-30 21:14:29 210122 226218 code_generator.cc:115] Check failed: CheckType(instruction->GetType(), locations->InAt(0)) PrimDouble C Change-Id: I1c1c87b6323e01442e8fbd94869ddc9e760ea1fc
9879d0eac8fe2aae19ca6a4a2a83222d6383afc2	15-Aug-2017	Aart Bik <ajcbik@google.com>	Basic SIMD reduction support. Rationale: Enables vectorization of x += .... for very basic (simple, same-type) constructs. Paves the way for more complex (narrower and/or mixed-type) constructs, which will be handled by the next CL. Test: test-art-host test-art-target Bug: 64091002 Change-Id: I7880c135aee3ed0a39da9ae5b468cbf80e613766
895f92218f705ff8ad9c47b8be0c093130d9fbbc	05-Jul-2017	Andreas Gampe <agampe@google.com>	ART: Fix up small header includes Test: m Change-Id: I6978d6eb4b95a6ee810e5a48ca6f5d6c590d4ce1
8dfe746dc969b61416a2906bea8c176427457efc	01-Jun-2017	Artem Serov <artem.serov@linaro.org>	ARM64: Encode constants when it is possible. Small optimization which improves HVecReplicateScalar by encoding immediates directly into NEON instruction when possible instead of generating constant in GPR and transferring it into NEON register. Test: test-art-target, test-art-host. Change-Id: I2113bbd98c0dc8433d2b7048921b9ed7c35ef1c5
c8e93c736c149ce41be073dd24324fb08afb9ae4	10-May-2017	Aart Bik <ajcbik@ajcbik2.mtv.corp.google.com>	Min/max SIMDization support. Rationale: The more vectorized, the better! Test: test-art-target, test-art-host Change-Id: I758becca5beaa5b97fab2ab70f2e00cb53458703
e1811ed6b57a54dc8ebd327e4bd2c4422092a3a0	27-Apr-2017	Artem Serov <artem.serov@linaro.org>	ARM64: Share address computation across SIMD LDRs/STRs. For array accesses the element address has the following structure: Address = CONST_OFFSET + base_addr + (index << ELEM_SHIFT). Given the ARM64 LDR/STR addressing modes, the part (CONST_OFFSET + (index << ELEM_SHIFT)) can be shared across array accesses with the same data type and index. For example, in the following loop 5 accesses can share the address computation: void foo(int[] a, int[] b, int[] c) { for (i...) { a[i] = a[i] + 5; b[i] = b[i] + c[i]; } } Test: test-art-host, test-art-target Change-Id: I46af3b4e4a55004336672cdba3296b7622d815ca
472821b210a7fc7a4d2e3d45762c7b5b9628a35b	28-Apr-2017	Aart Bik <ajcbik@google.com>	Enable string "array get" vectorization. Rationale: Like its scalar counterpart, the SIMD implementation of array get from a string needs to deal with compressed and uncompressed cases. Microbenchmarks show a 2x to 3x speedup for just copying data! Test: test-art-target, test-art-host Change-Id: I2fd714e50715b263123c215cd181f19194456d2b
0225b7712202d95ac7ba40ec96e95e14c4ce0895	19-Apr-2017	Artem Serov <artem.serov@linaro.org>	ARM64: Improve SIMD LDR/STR. Test: 640-checker-*-simd Test: test-art-target, test-art-host Change-Id: I2bcdef3f5cb7c0e7d1b3d02910fbf89ac694d89a
f34dd206d0073fb3949be872224420a8488f551f	10-Apr-2017	Artem Serov <artem.serov@linaro.org>	ARM64: Support MultiplyAccumulate for SIMD. Test: test-art-host, test-art-target. Change-Id: I06af8415e15352d09d176cae828163cbe99ae7a7
f3e61ee363fe7f82ef56704f06d753e2034a67dd	13-Apr-2017	Aart Bik <ajcbik@google.com>	Implement halving add idiom (with checker tests). Rationale: First of several idioms that map to very efficient SIMD instructions. Note that the is-zero-ext and is-sign-ext are general-purpose utilities that will be widely used in the vectorizer to detect low precision idioms, so expect that code to be shared with many CLs to come. Test: test-art-host, test-art-target Change-Id: If7dc2926c72a2e4b5cea15c44ef68cf5503e9be9
b31f91fd1811c9047591282dd003cf22b54938a1	05-Apr-2017	Artem Serov <artem.serov@linaro.org>	ARM64: Support vectorization for double and long. Test: test-art-host, test-art-target Change-Id: I1d4db1763b64737766f9756e5d0f85c5736e3522
d4bccf1ece319a3a99e03ecbcbbf40bb82b9e331	03-Apr-2017	Artem Serov <artem.serov@linaro.org>	ARM64: Support 128-bit registers for SIMD. Test: test-art-host, test-art-target Change-Id: Ifb931a99d34ea77602a0e0781040ed092de9faaa
6daebeba6ceab4e7dff5a3d65929eeac9a334004	03-Apr-2017	Aart Bik <ajcbik@google.com>	Implemented ABS vectorization. Rationale: This CL adds the concept of vectorizing intrinsics to the ART vectorizer. More can follow (MIN, MAX, etc). Test: test-art-host, test-art-target (angler) Change-Id: Ieed8aa83ec64c1250ac0578570249cce338b5d36
f8f5a16ed7bad1e18179e38453e59c96a944de10	07-Feb-2017	Aart Bik <ajcbik@google.com>	ART vectorizer. Rationale: Make SIMD great again with a retargetable and easily extendable vectorizer. Provides a full x86/x86_64 and a proof-of-concept ARM implementation. Sample improvement (without any perf tuning yet) for Linpack on x86 is about 20% to 50%. Test: test-art-host, test-art-target (angler) Bug: 34083438, 30933338 Change-Id: Ifb77a0f25f690a87cd65bf3d5e9f6be7ea71d6c1
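Commit e1811ed6 (27-Apr-2017) describes splitting an array element address into a shared part, CONST_OFFSET + (index << ELEM_SHIFT), and a per-array base address. The sketch below illustrates only that arithmetic; it is not ART code. The constants kConstOffset (16) and kElemShift (2, for 4-byte int elements), the helper names SharedIndexPart and ElementAddress, and the base addresses in the checks are all hypothetical values chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical constants (assumptions, not ART's real values): CONST_OFFSET is
// taken to be a 16-byte array data offset, and ELEM_SHIFT = 2 matches 4-byte
// int elements.
constexpr uint64_t kConstOffset = 16;
constexpr unsigned kElemShift = 2;

// Shared part of the address: CONST_OFFSET + (index << ELEM_SHIFT).
// It depends only on the element type and the index, so a single computation
// can serve every array accessed with that type and index.
constexpr uint64_t SharedIndexPart(uint64_t index) {
  return kConstOffset + (index << kElemShift);
}

// Per-access part: one base + offset addition, the shape that the ARM64
// register-offset addressing mode [Xbase, Xoffset] can absorb into LDR/STR.
constexpr uint64_t ElementAddress(uint64_t base_addr, uint64_t shared) {
  return base_addr + shared;
}

// Compile-time checks mirroring foo(): with made-up base addresses for
// a, b and c, element i = 3 of each array reuses the same shared offset (28).
static_assert(SharedIndexPart(3) == 28, "16 + (3 << 2)");
static_assert(ElementAddress(0x1000, SharedIndexPart(3)) == 0x101C, "a[3]");
static_assert(ElementAddress(0x2000, SharedIndexPart(3)) == 0x201C, "b[3]");
static_assert(ElementAddress(0x3000, SharedIndexPart(3)) == 0x301C, "c[3]");
```

In foo() the five accesses (load and store of a[i], load and store of b[i], load of c[i]) would all reuse the one shared value, leaving only the base register to differ between them. The log does not say exactly which instruction sequence ART emits for this, so treat the mapping to [Xbase, Xoffset] as an assumption.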