0ebe0d83138bba1996e9c8007969b5381d972b32 |
|
21-Sep-2017 |
Vladimir Marko <vmarko@google.com> |
ART: Introduce compiler data type. Replace most uses of the runtime's Primitive in compiler with a new class DataType. This prepares for introducing new types, such as Uint8, that the runtime does not need to know about. Test: m test-art-host-gtest Test: testrunner.py --host Bug: 23964345 Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
|
91d65e024846717fce3572106cffe9b957b8902c |
|
19-Jan-2016 |
Roland Levillain <rpl@google.com> |
Fix various typos in ART's comments and string literals. Change-Id: I85d628055b1a61647a77fef730c9631c234e22a2
|
c928591f5b2c544751bb3fb26dc614d3c2e67bef |
|
18-Dec-2015 |
Roland Levillain <rpl@google.com> |
ARM Baker's read barrier fast path implementation. Introduce an ARM fast path implementation in Optimizing for Baker's read barriers (for both heap reference loads and GC root loads). The marking phase of the read barrier is performed by a slow path, invoking the runtime entry point artReadBarrierMark. Other read barrier algorithms continue to use the original slow path based implementation, which has been renamed as GenerateReadBarrierSlow/GenerateReadBarrierForRootSlow. Bug: 12687968 Change-Id: Ie7ee85b1b4c0564148270cebdd3cbd4c3da51b3a
|
3e3e4a762c23c3de66436b30e9fc65f35dad344c |
|
17-Dec-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix braino in parallel move resolver. Reiterating over the moves needs to set i to -1, not 0. bug:26241132 Change-Id: Iaae7eac5b421b0ee1b1ce89577c8b951b2d4dae8
|
ec7802a102d49ab5c17495118d4fe0bcc7287beb |
|
01-Oct-2015 |
Vladimir Marko <vmarko@google.com> |
Add DCHECKs to ArenaVector and ScopedArenaVector. Implement dchecked_vector<> template that DCHECK()s element access and insert()/emplace()/erase() positions. Change the ArenaVector<> and ScopedArenaVector<> aliases to use the new template instead of std::vector<>. Remove DCHECK()s that have now become unnecessary from the Optimizing compiler. Change-Id: Ib8506bd30d223f68f52bd4476c76d9991acacadc
|
225b6464a58ebe11c156144653f11a1c6607f4eb |
|
28-Sep-2015 |
Vladimir Marko <vmarko@google.com> |
Optimizing: Tag arena allocations in code generators. And completely remove the deprecated GrowableArray. Replace GrowableArray with ArenaVector in code generators and related classes and tag arena allocations. Label arrays use direct allocations from ArenaAllocator because Label is non-copyable and non-movable and as such cannot be really held in a container. The GrowableArray never actually constructed them, instead relying on the zero-initialized storage from the arena allocator to be correct. We now actually construct the labels. Also avoid StackMapStream::ComputeDexRegisterMapSize() being passed null references, even though unused. Change-Id: I26a46fdd406b23a3969300a67739d55528df8bf4
|
6e18dcb5d2c35c646f2c95cb776abb79799f52ae |
|
28-Jul-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
Parallel Move Resolver: Perform Stack/Stack first On machines like x86, by the time other parallel moves are done, there may be no free registers available to move/swap without having to save and restore a register. To avoid this, perform stack/stack first, while there is a good chance that there is a destination register that we can use. On the X86, this avoids a lot of push eax/pop eax code. Change-Id: I57076271b5672c931a93888ff23e30b2567f43b8 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
ad4450e5c3ffaa9566216cc6fafbf5c11186c467 |
|
17-Apr-2015 |
Zheng Xu <zheng.xu@arm.com> |
Opt compiler: Implement parallel move resolver without using swap. The algorithm of ParallelMoveResolverNoSwap() is almost the same with ParallelMoveResolverWithSwap(), except the way we resolve the circular dependency. NoSwap() uses additional scratch register to resolve the circular dependency. For example, (0->1) (1->2) (2->0) will be performed as (2->scratch) (1->2) (0->1) (scratch->0). On architectures without swap register support, NoSwap() can reduce the number of moves from 3x(N-1) to (N+1) when there is circular dependency with N moves. And also, NoSwap() algorithm does not depend on architecture register layout information, which means it can support register pairs on arm32 and X/W, D/S registers on arm64 without additional modification. Change-Id: Idf56bd5469bb78c0e339e43ab16387428a082318
|
e14590bdfed24df30e6b7545fc819ba03ff8bba1 |
|
15-Apr-2015 |
Guillaume Sanchez <guillaumesa@google.com> |
Revert "[optimizing] Improve x86 parallel moves/swaps" This reverts commit a5c19ce8d200d68a528f2ce0ebff989106c4a933. This commit introduces a performance regression on CaffeineLogic of 30%. Change-Id: I917e206e249d44e1748537bc1b2d31054ea4959d
|
9021825d1e73998b99c81e89c73796f6f2845471 |
|
15-Apr-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Type MoveOperands. The ParallelMoveResolver implementation needs to know if a move is for 64bits or not, to handle swaps correctly. Bug found, and test case courtesy of Serguei I. Katkov. Change-Id: I9a0917a1cfed398c07e57ad6251aea8c9b0b8506
|
a5c19ce8d200d68a528f2ce0ebff989106c4a933 |
|
01-Apr-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
[optimizing] Improve x86 parallel moves/swaps Add a new constructor to ScratchRegisterScope that will supply a register if there is a free one, but not spill to force one. Use this to generated alternate code that doesn't use a temporary, as the spill/restore of a register generates extra instructions that aren't necessary on x86. Here is the benefit for a 32 bit memory-to-memory exchange with no free registers: < 50 push eax < 53 push ebx < 8B44244C mov eax, [esp + 76] < 8B5C246C mov ebx, [esp + 108] < 8944246C mov [esp + 108], eax < 895C244C mov [esp + 76], ebx < 5B pop ebx < 58 pop eax --- > FF742444 push [esp + 68] > FF742468 push [esp + 104] > 8F44244C pop [esp + 72] > 8F442468 pop [esp + 100] Avoid using xchg instruction, as it is slow on smaller processors. Change-Id: Id29ee3abd998577baaee552d55d23e60ae0c7871 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
a2d15b5e486021bef330b70c21e99557cb116ee5 |
|
31-Mar-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix wrong assumptions about ParallelMove. Registers involved in single and double operations can drag stack locations as well, so it is possible to update a single stack location with a slot from a double stack location. bug:19999189 Change-Id: Ibeec7d6f1b3126c4ae226fca56e84dccf798d367
|
f7a0c4e421b5edaad5b7a15bfff687da28d0b287 |
|
10-Feb-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Improve ParallelMoveResolver to work with pairs. Change-Id: Ie2a540ffdb78f7f15d69c16a08ca2d3e794f65b9
|
42d1f5f006c8bdbcbf855c53036cd50f9c69753e |
|
16-Jan-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Do not use register pair in a parallel move. The ParallelMoveResolver does not work with pairs. Instead, decompose the pair into two individual moves. Change-Id: Ie9d3f0b078cef8dc20640c98b20bb20cc4971a7f
|
48c310c431b110f6ab54907da20c4fa39a8f76b8 |
|
14-Jan-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Remove constant moves after emitting them in parallel resolver. This fixes the case where a constant move requires a scratch register. Note that there is no backend that needs this for now, but X86 might with the move to hard float. Change-Id: I37f6b8961b48f2cf6fbc0cd281e70d58466d018e
|
277ccbd200ea43590dfc06a93ae184a765327ad0 |
|
04-Nov-2014 |
Andreas Gampe <agampe@google.com> |
ART: More warnings Enable -Wno-conversion-null, -Wredundant-decls and -Wshadow in general, and -Wunused-but-set-parameter for GCC builds. Change-Id: I81bbdd762213444673c65d85edae594a523836e5
|
56b9ee6fe1d6880c5fca0e7feb28b25a1ded2e2f |
|
09-Oct-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Stop converting from Location to ManagedRegister. Now the source of truth is the Location object that knows which register (core, pair, fpu) it needs to refer to. Change-Id: I62401343d7479ecfb24b5ed161ec7829cda5a0b1
|
e27f31a81636ad74bd3376ee39cf215941b85c0e |
|
12-Jun-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Enable the register allocator on ARM. - Also fixes a few bugs/wrong assumptions in code not hit by x86. - We need to differentiate between moves due to connecting siblings within a block, and moves due to control flow resolution. Change-Id: Idd05cf138a71c8f36f5531c473de613c0166fe38
|
86dbb9a12119273039ce272b41c809fa548b37b6 |
|
04-Jun-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Final CL to enable register allocation on x86. This CL implements: 1) Resolution after allocation: connecting the locations allocated to an interval within a block and between blocks. 2) Handling of fixed registers: some instructions require inputs/output to be at a specific location, and the allocator needs to deal with them in a special way. 3) ParallelMoveResolver::EmitNativeCode for x86. Change-Id: I0da6bd7eb66877987148b87c3be6a983b4e3f858
|
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 |
|
03-Jun-2014 |
Tim Murray <timmurray@google.com> |
DO NOT MERGE Merge ART from AOSP to lmp-preview-dev. Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
|
4e3d23aa1523718ea1fdf3a32516d2f9d81e84fe |
|
22-May-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Import Dart's parallel move resolver. And write a few tests while at it. A parallel move resolver will be needed for performing multiple moves that are conceptually parallel, for example moves at a block exit that branches to a block with phi nodes. Change-Id: Ib95b247b4fc3f2c2fcab3b8c8d032abbd6104cd7
|