History log of /art/compiler/optimizing/parallel_move_resolver.h
Revision Date Author Comments
225b6464a58ebe11c156144653f11a1c6607f4eb 28-Sep-2015 Vladimir Marko <vmarko@google.com> Optimizing: Tag arena allocations in code generators.

And completely remove the deprecated GrowableArray.

Replace GrowableArray with ArenaVector in code generators
and related classes and tag arena allocations.

Label arrays use direct allocations from ArenaAllocator
because Label is non-copyable and non-movable and as such
cannot be really held in a container. The GrowableArray
never actually constructed them, instead relying on the
zero-initialized storage from the arena allocator to be
correct. We now actually construct the labels.

Also avoid StackMapStream::ComputeDexRegisterMapSize() being
passed null references, even though unused.

Change-Id: I26a46fdd406b23a3969300a67739d55528df8bf4
41b175aba41c9365a1c53b8a1afbd17129c87c14 19-May-2015 Vladimir Marko <vmarko@google.com> ART: Clean up arm64 kNumberOfXRegisters usage.

Avoid undefined behavior for arm64 stemming from 1u << 32 in
loops with upper bound kNumberOfXRegisters.

Create iterators for enumerating bits in an integer either
from high to low or from low to high and use them for
<arch>Context::FillCalleeSaves() on all architectures.

Refactor runtime/utils.{h,cc} by moving all bit-fiddling
functions to runtime/base/bit_utils.{h,cc} (together with
the new bit iterators) and all time-related functions to
runtime/base/time_utils.{h,cc}. Improve test coverage and
fix some corner cases for the bit-fiddling functions.

Bug: 13925192

(cherry picked from commit 80afd02024d20e60b197d3adfbb43cc303cf29e0)

Change-Id: I905257a21de90b5860ebe1e39563758f721eab82
80afd02024d20e60b197d3adfbb43cc303cf29e0 19-May-2015 Vladimir Marko <vmarko@google.com> ART: Clean up arm64 kNumberOfXRegisters usage.

Avoid undefined behavior for arm64 stemming from 1u << 32 in
loops with upper bound kNumberOfXRegisters.

Create iterators for enumerating bits in an integer either
from high to low or from low to high and use them for
<arch>Context::FillCalleeSaves() on all architectures.

Refactor runtime/utils.{h,cc} by moving all bit-fiddling
functions to runtime/base/bit_utils.{h,cc} (together with
the new bit iterators) and all time-related functions to
runtime/base/time_utils.{h,cc}. Improve test coverage and
fix some corner cases for the bit-fiddling functions.

Bug: 13925192
Change-Id: I704884dab15b41ecf7a1c47d397ab1c3fc7ee0f7
ad4450e5c3ffaa9566216cc6fafbf5c11186c467 17-Apr-2015 Zheng Xu <zheng.xu@arm.com> Opt compiler: Implement parallel move resolver without using swap.

The algorithm of ParallelMoveResolverNoSwap() is almost the same with
ParallelMoveResolverWithSwap(), except the way we resolve the circular
dependency. NoSwap() uses additional scratch register to resolve the
circular dependency. For example, (0->1) (1->2) (2->0) will be performed
as (2->scratch) (1->2) (0->1) (scratch->0).

On architectures without swap register support, NoSwap() can reduce the
number of moves from 3x(N-1) to (N+1) when there is circular dependency
with N moves.

And also, NoSwap() algorithm does not depend on architecture register
layout information, which means it can support register pairs on arm32
and X/W, D/S registers on arm64 without additional modification.

Change-Id: Idf56bd5469bb78c0e339e43ab16387428a082318
e14590bdfed24df30e6b7545fc819ba03ff8bba1 15-Apr-2015 Guillaume Sanchez <guillaumesa@google.com> Revert "[optimizing] Improve x86 parallel moves/swaps"

This reverts commit a5c19ce8d200d68a528f2ce0ebff989106c4a933.

This commit introduces a performance regression on CaffeineLogic of 30%.

Change-Id: I917e206e249d44e1748537bc1b2d31054ea4959d
9021825d1e73998b99c81e89c73796f6f2845471 15-Apr-2015 Nicolas Geoffray <ngeoffray@google.com> Type MoveOperands.

The ParallelMoveResolver implementation needs to know if a move
is for 64bits or not, to handle swaps correctly.

Bug found, and test case courtesy of Serguei I. Katkov.

Change-Id: I9a0917a1cfed398c07e57ad6251aea8c9b0b8506
a5c19ce8d200d68a528f2ce0ebff989106c4a933 01-Apr-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Improve x86 parallel moves/swaps

Add a new constructor to ScratchRegisterScope that will supply a
register if there is a free one, but not spill to force one. Use this
to generated alternate code that doesn't use a temporary, as the
spill/restore of a register generates extra instructions that aren't
necessary on x86.

Here is the benefit for a 32 bit memory-to-memory exchange with no
free registers:
< 50 push eax
< 53 push ebx
< 8B44244C mov eax, [esp + 76]
< 8B5C246C mov ebx, [esp + 108]
< 8944246C mov [esp + 108], eax
< 895C244C mov [esp + 76], ebx
< 5B pop ebx
< 58 pop eax
---
> FF742444 push [esp + 68]
> FF742468 push [esp + 104]
> 8F44244C pop [esp + 72]
> 8F442468 pop [esp + 100]

Avoid using xchg instruction, as it is slow on smaller processors.

Change-Id: Id29ee3abd998577baaee552d55d23e60ae0c7871
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
f7a0c4e421b5edaad5b7a15bfff687da28d0b287 10-Feb-2015 Nicolas Geoffray <ngeoffray@google.com> Improve ParallelMoveResolver to work with pairs.

Change-Id: Ie2a540ffdb78f7f15d69c16a08ca2d3e794f65b9
48c310c431b110f6ab54907da20c4fa39a8f76b8 14-Jan-2015 Nicolas Geoffray <ngeoffray@google.com> Remove constant moves after emitting them in parallel resolver.

This fixes the case where a constant move requires a scratch
register. Note that there is no backend that needs this for now,
but X86 might with the move to hard float.

Change-Id: I37f6b8961b48f2cf6fbc0cd281e70d58466d018e
0279ebb3efd653e6bb255470c99d26949c7bcd95 09-Oct-2014 Ian Rogers <irogers@google.com> Tidy ELF builder.

Don't do "if (ptr)". Use const. Use DISALLOW_COPY_AND_ASSIGN. Avoid public
member variables.
Move ValueObject to base and use in ELF builder.
Tidy VectorOutputStream to not use non-const reference arguments.

Change-Id: I2c727c3fc61769c3726de7cfb68b2d6eb4477e53
e27f31a81636ad74bd3376ee39cf215941b85c0e 12-Jun-2014 Nicolas Geoffray <ngeoffray@google.com> Enable the register allocator on ARM.

- Also fixes a few bugs/wrong assumptions in code not hit by x86.
- We need to differentiate between moves due to connecting siblings within
a block, and moves due to control flow resolution.

Change-Id: Idd05cf138a71c8f36f5531c473de613c0166fe38
86dbb9a12119273039ce272b41c809fa548b37b6 04-Jun-2014 Nicolas Geoffray <ngeoffray@google.com> Final CL to enable register allocation on x86.

This CL implements:
1) Resolution after allocation: connecting the locations
allocated to an interval within a block and between blocks.
2) Handling of fixed registers: some instructions require
inputs/output to be at a specific location, and the allocator
needs to deal with them in a special way.
3) ParallelMoveResolver::EmitNativeCode for x86.

Change-Id: I0da6bd7eb66877987148b87c3be6a983b4e3f858
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 03-Jun-2014 Tim Murray <timmurray@google.com> DO NOT MERGE

Merge ART from AOSP to lmp-preview-dev.

Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
aa037b52a2eef463dab7b3a7e3c7cb441d28533a 23-May-2014 Nicolas Geoffray <ngeoffray@google.com> Add virtual destructor to please one of our compilers.

Change-Id: I931d130caa75ab90b677e14f1a2d0c438c43ed4f
4e3d23aa1523718ea1fdf3a32516d2f9d81e84fe 22-May-2014 Nicolas Geoffray <ngeoffray@google.com> Import Dart's parallel move resolver.

And write a few tests while at it.

A parallel move resolver will be needed for performing multiple moves
that are conceptually parallel, for example moves at a block
exit that branches to a block with phi nodes.

Change-Id: Ib95b247b4fc3f2c2fcab3b8c8d032abbd6104cd7