History log of /art/compiler/dex/quick/arm/utility_arm.cc
Revision Date Author Comments
2cebb24bfc3247d3e9be138a3350106737455918 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Replace NULL with nullptr

Also fixed some lines that were too long, and a few other minor
details.

Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb
1961b609bfefaedb71cee3651c4f931cc3e7393d 08-Apr-2015 Vladimir Marko <vmarko@google.com> Quick: PC-relative loads from dex cache arrays on x86.

Rewrite all PC-relative addressing on x86 and implement
PC-relative loads from dex cache arrays. Don't adjust the
base to point to the start of the method, let it point to
the anchor, i.e. the target of the "call +0" insn.

Change-Id: Ic22544a8bc0c5e49eb00a75154dc8f3ead816989
cc23481b66fd1f2b459d82da4852073e32f033aa 07-Apr-2015 Vladimir Marko <vmarko@google.com> Promote pointer to dex cache arrays on arm.

Do the use-count analysis on temps (ArtMethod* and the new
PC-relative temp) in Mir2Lir, rather than MIRGraph. MIRGraph
isn't really supposed to know how the ArtMethod* is used by
the backend.

Change-Id: Iaf56a46ae203eca86281b02b54f39a80fe5cc2dd
0b9203e7996ee1856f620f95d95d8a273c43a3df 23-Jan-2015 Andreas Gampe <agampe@google.com> ART: Some Quick cleanup

Make several fields const in CompilationUnit. May benefit some Mir2Lir
code that repeats tests, and in general immutability is good.

Remove compiler_internals.h and refactor some other headers to reduce
overly broad imports (and thus forced recompiles on changes).

Change-Id: I898405907c68923581373b5981d8a85d2e5d185a
e7227c628e6f22a823945cc76e554eb2a8b0d925 13-Jan-2015 Vladimir Marko <vmarko@google.com> Fix wide volatile IGET/IPUT on ARM without atomic ldrd/strd.

If ldrd/strd isn't atomic, IPUT_WIDE uses ldrexd+strexd and
we need to record the safepoint for the ldrexd rather than
strexd. IGET_WIDE was simply missing the memory barrier.

Bug: 18993519

(cherry picked from commit ee5e273e4d0dd91b480c8d5dbcccad15c1b7353c)

Change-Id: I4e9270b994f413c1a047c1c4bb9cce5f29e42cb4
ee5e273e4d0dd91b480c8d5dbcccad15c1b7353c 13-Jan-2015 Vladimir Marko <vmarko@google.com> Fix wide volatile IGET/IPUT on ARM without atomic ldrd/strd.

If ldrd/strd isn't atomic, IPUT_WIDE uses ldrexd+strexd and
we need to record the safepoint for the ldrexd rather than
strexd. IGET_WIDE was simply missing the memory barrier.

Bug: 18993519
Change-Id: I4e9270b994f413c1a047c1c4bb9cce5f29e42cb4
aed3ad734c47fdccf179ff65971284a0d38583cd 03-Dec-2014 Vladimir Marko <vmarko@google.com> Quick: Use fewer insns for ARM LDR/STR with large offsets.

LDR with large offset is frequently used for reading from
DexCache arrays, for example for static and direct invokes.
STR with large offset is rarely used but it's updated for
consistency.

Change-Id: I75871416cecbfd7fe7de590922cea0376a2f4019
a29f698b1754ee0ea2f46b6f5900e0da840dff79 25-Nov-2014 Vladimir Marko <vmarko@google.com> Implement InexpensiveConstantInt(., opcode) for ARM.

Fix kThumb2{Add,Sub}RRI12 to be used for their full range.
Add ORN for completeness.

Change-Id: I49a51541fa9ea085d4674b9131d8dd94da5337f3
174636dad59068fc6e879b147ae02ac932f38c6f 26-Nov-2014 Vladimir Marko <vmarko@google.com> Quick: Use 16-bit conditional branch in Thumb2.

We were using the 32-bit version because the compilation
time impact of having to change the instruction length and
reassemble instructions when the target is out of range was
too high. However, the assembly phase has been rewritten
since making that decision and the compile time impact is
now insignificant, so we prefer to save space.

Change-Id: Ib90f90d3f4e0c4e310267af272e3b16611026bbe
d582fa4ea62083a7598dded5b82dc2198b3daac7 06-Nov-2014 Ian Rogers <irogers@google.com> Instruction set features for ARM64, MIPS and X86.

Also, refactor how feature strings are handled so they are additive or
subtractive.
Make MIPS have features for FPU 32-bit and MIPS v2. Use in the quick compiler
rather than #ifdefs that wouldn't have worked in cross-compilation.
Add SIMD features for x86/x86-64 proposed in:
https://android-review.googlesource.com/#/c/112370/

Bug: 18056890

Change-Id: Ic88ff84a714926bd277beb74a430c5c7d5ed7666
b28c1c06236751aa5c9e64dcb68b3c940341e496 08-Nov-2014 Ian Rogers <irogers@google.com> Tidy RegStorage for X86.

Don't use global variables initialized in constructors to hold onto constant
values, instead use the TargetReg32 helper. Improve this helper with the use
of lookup tables. Elsewhere prefer to use constexpr values as they will have
less runtime cost.
Add an ostream operator to RegStorage for CHECK_EQ and use.

Change-Id: Ib8d092d46c10dac5909ecdff3cc1e18b7e9b1633
277ccbd200ea43590dfc06a93ae184a765327ad0 04-Nov-2014 Andreas Gampe <agampe@google.com> ART: More warnings

Enable -Wno-conversion-null, -Wredundant-decls and -Wshadow in general,
and -Wunused-but-set-parameter for GCC builds.

Change-Id: I81bbdd762213444673c65d85edae594a523836e5
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f 31-Oct-2014 Ian Rogers <irogers@google.com> Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags.

Fix associated errors about unused parameters and implicit sign conversions.
For sign conversion this was largely in the area of enums, so add ostream
operators for the affected enums and fix tools/generate-operator-out.py.
Tidy arena allocation code and arena allocated data types, rather than fixing
new and delete operators.
Remove dead code.

Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
5667fdbb6e441dee7534ade18b628ed396daf593 23-Oct-2014 Zheng Xu <zheng.xu@arm.com> ARM: Use hardfp calling convention between java to java call.

This patch defaults to the hardfp calling convention. Softfp can be enabled
by setting kArm32QuickCodeUseSoftFloat to true.

We get a performance change of about -1% to +5% across different benchmark
tests. We should be able to get more performance by addressing the remaining
TODOs, as some parts of the code still rely on the original assumption, which
is not optimal.

DONE:
1. Interpreter to quick code
2. Quick code to interpreter
3. Transition assembly and callee-saves
4. Trampoline(generic jni, resolution, invoke with access check and etc.)
5. Pass fp arg reg following aapcs(gpr and stack do not follow aapcs)
6. Quick helper assembly routines to handle ABI differences
7. Quick code method entry
8. Quick code method invocation
9. JNI compiler

TODO:
10. Rework ArgMap, FlushIn, GenDalvikArgs and affected common code.
11. Rework CallRuntimeHelperXXX().

Change-Id: I9965d8a007f4829f2560b63bcbbde271bdcf6ec2
677cd61ad05d993c4d3b22656675874f06d6aabc 15-Oct-2014 Ian Rogers <irogers@google.com> Make ART compile with GCC -O0 again.

Tidy up InstructionSetFeatures so that it has a type hierarchy dependent on
architecture.
Add to instruction_set_test to warn when InstructionSetFeatures don't agree
with ones from system properties, AT_HWCAP and /proc/cpuinfo.
Clean-up class linker entry point logic to not return entry points but to
test whether the passed code is the particular entrypoint. This works around
image trampolines that replicate entrypoints.
Bug: 17993736

(cherry picked from commit 6f3dbbadf4ce66982eb3d400e0a74cb73eb034f3)

Change-Id: I3e7595f437db4828072589d475a5453b7f31003e
6f3dbbadf4ce66982eb3d400e0a74cb73eb034f3 15-Oct-2014 Ian Rogers <irogers@google.com> Make ART compile with GCC -O0 again.

Tidy up InstructionSetFeatures so that it has a type hierarchy dependent on
architecture.
Add to instruction_set_test to warn when InstructionSetFeatures don't agree
with ones from system properties, AT_HWCAP and /proc/cpuinfo.
Clean-up class linker entry point logic to not return entry points but to
test whether the passed code is the particular entrypoint. This works around
image trampolines that replicate entrypoints.
Bug: 17993736

Change-Id: I5f4b49e88c3b02a79f9bee04f83395146ed7be23
fc787ecd91127b2c8458afd94e5148e2ae51a1f5 10-Oct-2014 Ian Rogers <irogers@google.com> Enable -Wimplicit-fallthrough.

Falling through switch cases on a clang build must now annotate the fallthrough
with the FALLTHROUGH_INTENDED macro.
Bug: 17731372

Change-Id: I836451cd5f96b01d1ababdbf9eef677fe8fa8324
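
Editorial sketch of what an annotated fallthrough looks like. The macro definition below is a hypothetical stand-in built on the standard C++17 `[[fallthrough]]` attribute; the real FALLTHROUGH_INTENDED definition in the ART tree may differ, and CountBytes is an invented example function.

```cpp
#include <cassert>

// Hypothetical stand-in for the macro named in the commit message
// (assumption: the real ART definition may use a compiler-specific attribute).
#define FALLTHROUGH_INTENDED [[fallthrough]]

// Invented example: a "wide" operand (kind 2) occupies two 4-byte slots,
// a "narrow" operand (kind 1) occupies one, anything else occupies none.
int CountBytes(int kind) {
  int bytes = 0;
  switch (kind) {
    case 2:
      bytes += 4;
      FALLTHROUGH_INTENDED;  // Deliberately continue into the narrow case.
    case 1:
      bytes += 4;
      break;
    default:
      break;
  }
  return bytes;
}
```

Without the annotation, a build with -Wimplicit-fallthrough would warn at the boundary between `case 2` and `case 1`.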
63999683329612292d534e6be09dbde9480f1250 15-Jul-2014 Serban Constantinescu <serban.constantinescu@arm.com> Revert "Revert "Enable Load Store Elimination for ARM and ARM64""

This patch refactors the implementation of the LoadStoreElimination
optimisation pass. Please note that this pass was disabled and not
functional for any of the backends.

The current implementation tracks aliases and handles DalvikRegs as well
as Heap memory regions. It has been tested and it is known to optimise
out the following:
* Load - Load
* Store - Load
* Store - Store
* Load Literals

Change-Id: I3aadb12a787164146a95bc314e85fa73ad91e12b
c32447bcc8c36ee8ff265ed678c7df86936a9ebe 27-Jul-2014 Bill Buzbee <buzbee@android.com> Revert "Enable Load Store Elimination for ARM and ARM64"

On extended testing, I'm seeing a CHECK failure at utility_arm.cc:1201.

This reverts commit fcc36ba2a2b8fd10e6eebd21ecb6329606443ded.

Change-Id: Icae3d49cd7c8fcab09f2f989cbcb1d7e5c6d137a
fcc36ba2a2b8fd10e6eebd21ecb6329606443ded 15-Jul-2014 Serban Constantinescu <serban.constantinescu@arm.com> Enable Load Store Elimination for ARM and ARM64

This patch refactors the implementation of the LoadStoreElimination
optimisation pass. Please note that this pass was disabled and not
functional for any of the backends.

The current implementation tracks aliases and handles DalvikRegs as well
as Heap memory regions. It has been tested and it is known to optimise
out the following:
* Load - Load
* Store - Load
* Store - Store
* Load Literals

Change-Id: Iefae9b696f87f833ef35c451ed4d49c5a1b6fde0
984305917bf57b3f8d92965e4715a0370cc5bcfb 28-Jul-2014 Andreas Gampe <agampe@google.com> ART: Rework quick entrypoint code in Mir2Lir, cleanup

To reduce the complexity of calling trampolines in generic code,
introduce an enumeration for entrypoints. Introduce a header that lists
the entrypoint enum and exposes a templatized method that translates an
enum value to the corresponding thread offset value.

Call helpers are rewritten to have an enum parameter instead of the
thread offset. Also rewrite LoadHelper and GenConversionCall this way.
It is now LoadHelper's duty to select the right thread offset size.

Introduce an InvokeTrampoline virtual method in Mir2Lir. This allows
further simplification of the call helpers, and makes OpThreadMem specific
to X86 (removed from Mir2Lir).

Make GenInlinedCharAt virtual, move a copy to X86 backend, and simplify
both copies. Remove LoadBaseIndexedDisp and OpRegMem from Mir2Lir, as they
are now specific to X86 only.

Remove StoreBaseIndexedDisp from Mir2Lir, as it was only ever used in the
X86 backend.

Remove OpTlsCmp from Mir2Lir, as it was only ever used in the X86 backend.

Remove OpLea from Mir2Lir, as it was only ever defined in the X86 backend.

Remove GenImmedCheck from Mir2Lir as it was neither used nor implemented.

Change-Id: If0a6182288c5d57653e3979bf547840a4c47626e
9ee4519afd97121f893f82d41d23164fc6c9ed34 17-Jul-2014 Serguei Katkov <serguei.i.katkov@intel.com> x86: GenSelect utility update

This is a follow-up to https://android-review.googlesource.com/#/c/101396/
to make x86 GenSelectConst32 implementation complete.

Change-Id: I69f318e18093f9a5b00f8f00f0f1c2e4ff7a9ab2
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
48f5c47907654350ce30a8dfdda0e977f5d3d39f 27-Jun-2014 Hans Boehm <hboehm@google.com> Replace memory barriers to better reflect Java needs.

Replaces barriers that enforce ordering of one access type
(e.g. Load) with respect to another (e.g. store) with more general
ones that better reflect both Java requirements and actual hardware
barrier/fence instructions. The old code was inconsistent and
unclear about which barriers implied which others. Sometimes
multiple barriers were generated and then eliminated;
sometimes it was assumed that certain barriers implied others.
The new barriers closely parallel those in C++11, though, for now,
we use something closer to the old naming.

Bug: 14685856

Change-Id: Ie1c80afe3470057fc6f2b693a9831dfe83add831
b5860fb459f1ed71f39d8a87b45bee6727d79fe8 22-Jun-2014 buzbee <buzbee@google.com> Register promotion support for 64-bit targets

Not sufficiently tested for 64-bit targets, but should be
fairly close.

A significant amount of refactoring could still be done (in
later CLs).

With this change we are not making any changes to the vmap
scheme. As a result, it is a requirement that if a vreg
is promoted to both a 32-bit view and the low half of a
64-bit view it must share the same physical register. We
may change this restriction later on to allow for more flexibility
for 32-bit Arm.

For example, if v4, v5, v4/v5 and v5/v6 are all hot enough to
promote, we'd end up with something like:

v4 (as an int) -> r10
v4/v5 (as a long) -> r10
v5 (as an int) -> r11
v5/v6 (as a long) -> r11

Fix a couple of ARM64 bugs on the way...

Change-Id: I6a152b9c164d9f1a053622266e165428045362f3
de68676b24f61a55adc0b22fe828f036a5925c41 24-Jun-2014 Andreas Gampe <agampe@google.com> Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter"

This reverts commit 2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d.

Breaks the build.

Change-Id: I9faad4e9a83b32f5f38b2ef95d6f9a33345efa33
3c12c512faf6837844d5465b23b9410889e5eb11 24-Jun-2014 Andreas Gampe <agampe@google.com> Revert "Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter""

This reverts commit de68676b24f61a55adc0b22fe828f036a5925c41.

Fixes an API comment, and differentiates between inserting and appending.

Change-Id: I0e9a21bb1d25766e3cbd802d8b48633ae251a6bf
2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d 23-Jun-2014 Andreas Gampe <agampe@google.com> ART: Split out more cases of Load/StoreRef, volatile as parameter

Splits out more cases of ref registers being loaded or stored. For
code clarity, adds volatile as a flag parameter instead of a separate
method.

On ARM64, continue cleanup. Add flags to print/fatal on size mismatches.

Change-Id: I30ed88433a6b4ff5399aefffe44c14a5e6f4ca4e
8dea81ca9c0201ceaa88086b927a5838a06a3e69 06-Jun-2014 Vladimir Marko <vmarko@google.com> Rewrite use/def masks to support 128 bits.

Reduce LIR memory usage by holding masks by pointers in the
LIR rather than directly and using pre-defined const masks
for the common cases, allocating very few on the arena.

Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
37573977769e9068874506050c62acd4e324d246 16-Jun-2014 Vladimir Marko <vmarko@google.com> Clean up ARM load/store with offset imm8 << 2.

Change-Id: I95ed6860131b99eef7ed727f54745976949cbcb3
db9d523ff305721d4ca3f1470d1b2ce64c736e0a 10-Jun-2014 Vladimir Marko <vmarko@google.com> Clean up ArmMirToLir::LoadDispBody()/StoreDispBody().

Refactor the 64-bit load and store code to use a shared
helper function that can be used for any opcode with the
displacement limited to 8-bit value shifted by 2 (i.e. max
1020). Use that function also for 32-bit float load and
store as it is actually better than the old code for
offsets exceeding the 1020 byte limit.

Change-Id: I7dec38bae8cd9891420d2e92b1bac6138af5d64e
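
Editorial sketch of the "8-bit value shifted by 2" limit described above. This is not the ART helper; the names are invented, and the sketch assumes a non-negative displacement that is a multiple of 4. It splits a displacement into a part that must be added to the base register and a remainder that fits the imm8 << 2 field (at most 1020).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: adjust the base by base_adjust, then encode
// encoded_disp directly in the instruction.
struct SplitDisp {
  std::int32_t base_adjust;
  std::int32_t encoded_disp;
};

// True if disp can be encoded as an 8-bit value shifted left by 2.
// Assumption: only non-negative displacements are considered here.
constexpr bool FitsImm8Shl2(std::int32_t disp) {
  return disp >= 0 && disp <= 1020 && (disp & 3) == 0;
}

// Keep bits [9:2] for the instruction and move the rest into the base.
constexpr SplitDisp SplitForImm8Shl2(std::int32_t disp) {
  return SplitDisp{disp - (disp & 0x3fc), disp & 0x3fc};
}

static_assert(FitsImm8Shl2(1020), "1020 is the largest encodable offset");
static_assert(!FitsImm8Shl2(1024), "1024 needs a base adjustment");
```

For example, a displacement of 4100 would be split into a base adjustment of 4096 and an encoded offset of 4.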
082833c8d577db0b2bebc100602f31e4e971613e 18-May-2014 buzbee <buzbee@google.com> Quick compiler, out of registers fix

It turns out that the register pool sanity checker was not
working as expected, leaving some inconsistencies unreported.
This could result in "out of registers" failures, as well
as other more subtle problems.

This CL fixes the sanity checker, adds a lot more checks and cleans
up the previously undetected episodes of insanity.

Cherry-pick of internal change 468162

Change-Id: Id2da97e99105a4c272c5fd256205a94b904ecea8
05d3aeb33683b16837741f9348d6fba9a8432068 18-May-2014 buzbee <buzbee@google.com> Quick compiler, out of registers fix

Fixes b/15024623

It turns out that the register pool sanity checker was not
working as expected, leaving some inconsistencies unreported.
This CL fixes the sanity checker, adds a lot more checks and cleans
up the previously undetected episodes of insanity.

Change-Id: I4d67db864ca5926a1975db251e7e631b65a86275
fe8cf8b1c1b4af0f8b4bb639576f7a5fc59f52ea 15-May-2014 Bill Buzbee <buzbee@google.com> Quick Compiler: fix Arm cts failures

Fixes move_wide_16#testN1, move_wide_16#testN2

Two bugs for the price of one (thanks CTS!)

First, the new stack overflow checking code was broken for very
large frames. For Arm on method entry, we only have 1 available
temp register, r12, until argument registers are flushed.
Previously, for explicit checks on large frames,
r12 was immediately loaded with the stack_end value. However,
later on when the frame is extended, if the frame size exceeds
the range of a reg-reg-imm subtract, the codegen utilities will
allocate a new temporary register to complete the operation. r12
was getting clobbered. Similarly, for medium-large frames r12
could get clobbered during frame creation.

What we should always do when directly using fixed registers like
this is to lock them to prevent them from being allocated as a
temp. The other half of the first bug is easily solved by delaying
the load of stack_end until after the new sp is computed. We'll
increase the stall cost, but this is an uncommon case.

The second bug was likely a typo in LoadValueDisp(). I'm a bit
surprised we hadn't hit this one earlier - but perhaps it was
recently introduced. The wrong base register was being used in
the non-float, wide, excessive offset case (which I suppose is also
somewhat uncommon).

Cherry-pick of internal commit If5b30f729e31d86db604045dd7581fd4626e0b55

Change-Id: If5b30f729e31d86db604045dd7581fd4626e0b55
56e86eaf73eb3efa029f2dd53b2d21e3597d8e5f 15-May-2014 Bill Buzbee <buzbee@google.com> Revert "Revert "Quick Compiler: fix Arm cts failures""

It turns out the medium-large frame explicit stack
overflow check was also broken in a similar way: r12
was live going into the frame extension, but the frame
extension code sometimes needs a free temp.

This reverts commit 9cf44af1a223f905457688931317a4e4cb086a84.

Change-Id: If5b30f729e31d86db604045dd7581fd4626e0b55
9cf44af1a223f905457688931317a4e4cb086a84 15-May-2014 Bill Buzbee <buzbee@google.com> Revert "Quick Compiler: fix Arm cts failures"

Error detected on further testing.

This reverts commit 06a4809f271c44ec1491e0b07ae9974aa35bc8ad.

Change-Id: Ia7b6b463f6422abac432f1a9484e4e080d003148
06a4809f271c44ec1491e0b07ae9974aa35bc8ad 14-May-2014 buzbee <buzbee@google.com> Quick Compiler: fix Arm cts failures

Fixes move_wide_16#testN1, move_wide_16#testN2

Two bugs for the price of one (thanks CTS!)

First, the new stack overflow checking code was broken for very
large frames. For Arm on method entry, we only have 1 available
temp register, r12, until argument registers are flushed.
Previously, for explicit checks on large frames,
r12 was immediately loaded with the stack_end value. However,
later on when the frame is extended, if the frame size exceeds
the range of a reg-reg-imm subtract, the codegen utilities will
allocate a new temporary register to complete the operation. r12
was getting clobbered.

What we should always do when directly using fixed registers like
this is to lock them to prevent them from being allocated as a
temp. The other half of the first bug is easily solved by delaying
the load of stack_end until after the new sp is computed. We'll
increase the stall cost, but this is an uncommon case.

The second bug was likely a typo in LoadValueDisp(). I'm a bit
surprised we hadn't hit this one earlier - but perhaps it was
recently introduced. The wrong base register was being used in
the non-float, wide, excessive offset case (which I suppose is also
somewhat uncommon).

Change-Id: I2c5074c9570b022af680f472deac9fe72a2e827e
2f244e9faccfcca68af3c5484c397a01a1c3a342 08-May-2014 Andreas Gampe <agampe@google.com> ART: Add more ThreadOffset in Mir2Lir and backends

This duplicates all methods with ThreadOffset parameters, so that
both ThreadOffset<4> and ThreadOffset<8> can be handled. Dynamic
checks against the compilation unit's instruction set determine
which pointer size to use and therefore which methods to call.

Methods with unsupported pointer sizes should fatally fail, as
this indicates an issue during method selection.

Change-Id: Ifdb445b3732d3dc5e6a220db57374a55e91e1bf6
674744e635ddbdfb311fbd25b5a27356560d30c3 24-Apr-2014 Vladimir Marko <vmarko@google.com> Use atomic load/store for volatile IGET/IPUT/SGET/SPUT.

Bug: 14112919
Change-Id: I79316f438dd3adea9b2653ffc968af83671ad282
3bf7c60a86d49bf8c05c5d2ac5ca8e9f80bd9824 07-May-2014 Vladimir Marko <vmarko@google.com> Cleanup ARM load/store wide and remove unused param s_reg.

Use a single LDRD/VLDR instruction for wide load/store on
ARM, adjust the base pointer if needed. Remove unused
parameter s_reg from LoadBaseDisp(), LoadBaseIndexedDisp()
and StoreBaseIndexedDisp() on all architectures.

Change-Id: I25a9a42d523a68addbc11abe44ddc55a4401df98
455759b5702b9435b91d1b4dada22c4cce7cae3c 06-May-2014 Vladimir Marko <vmarko@google.com> Remove LoadBaseDispWide and StoreBaseDispWide.

Just pass k64 or kDouble to non-wide versions.

Change-Id: I000619c3b78d3a71db42edc747c8a0ba1ee229be
091cc408e9dc87e60fb64c61e186bea568fc3d3a 31-Mar-2014 buzbee <buzbee@google.com> Quick compiler: allocate doubles as doubles

Significant refactoring of register handling to unify usage across
all targets & 32/64 backends.

Reworked RegStorage encoding to allow expanded use of
x86 xmm registers; removed vector registers as a separate
register type. Reworked RegisterInfo to describe aliased
physical registers. Eliminated quite a bit of target-specific code
and generalized common code.

Use of RegStorage instead of int for registers now propagated down
to the NewLIRx() level. In future CLs, the NewLIRx() routines will
be replaced with versions that are explicit about what kind of
operand they expect (RegStorage, displacement, etc.). The goal
is to eventually use RegStorage all the way to the assembly phase.

TBD: MIPS needs verification.
TBD: Re-enable liveness tracking.

Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
fd698e67953e40e804d7c9d1a3e8460e9d67382a 28-Apr-2014 buzbee <buzbee@google.com> Quick compiler: fix DCHECKS

The recent change to introduce k32, k64 and kReference operand
sizes missed updating a few DCHECKS.

Change-Id: I66eb617b07766e781b38962dc862fc5b023c2fbd
695d13a82d6dd801aaa57a22a9d4b3f6db0d0fdb 19-Apr-2014 buzbee <buzbee@google.com> Update load/store utilities for 64-bit backends

This CL replaces the typical use of LoadWord/StoreWord
utilities (which, in practice, were 32-bit load/store) in
favor of a new set that make the size explicit. We now have:

LoadWordDisp/StoreWordDisp:
32 or 64 depending on target. Load or store the natural
word size. Expect this to be used infrequently - generally
when we know we're dealing with a native pointer or flushed
register not holding a Dalvik value (Dalvik values will flush
to home location sizes based on Dalvik, rather than the target).

Load32Disp/Store32Disp:
Load or store 32 bits, regardless of target.

Load64Disp/Store64Disp:
Load or store 64 bits, regardless of target.

LoadRefDisp:
Load a 32-bit compressed reference, and expand it to the
natural word size in the target register.

StoreRefDisp:
Compress a reference held in a register of the natural word
size and store it as a 32-bit compressed reference.

Change-Id: I50fcbc8684476abd9527777ee7c152c61ba41c6f
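
Editorial sketch of the LoadRefDisp/StoreRefDisp idea at the level of plain memory operations. It is not compiler code: the function names are invented, and it assumes that "compression" means keeping the low 32 bits of a reference and that expansion is zero-extension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Store a natural-word-size reference as a 32-bit compressed reference.
// Assumption: the reference value fits in 32 bits.
inline void SketchStoreRef(void* base, std::size_t disp, std::uintptr_t ref) {
  const std::uint32_t compressed = static_cast<std::uint32_t>(ref);
  std::memcpy(static_cast<unsigned char*>(base) + disp, &compressed,
              sizeof(compressed));
}

// Load a 32-bit compressed reference and zero-extend it to the natural
// word size of the target.
inline std::uintptr_t SketchLoadRef(const void* base, std::size_t disp) {
  std::uint32_t compressed = 0;
  std::memcpy(&compressed, static_cast<const unsigned char*>(base) + disp,
              sizeof(compressed));
  return static_cast<std::uintptr_t>(compressed);
}
```

On a 32-bit target both operations are ordinary 32-bit accesses; the distinction only matters where the natural word is 64 bits wide.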
d6ed642458c8820e1beca72f3d7b5f0be4a4b64b 10-Apr-2014 Dave Allison <dallison@google.com> Revert "Revert "Revert "Use trampolines for calls to helpers"""

This reverts commit f9487c039efb4112616d438593a2ab02792e0304.

Change-Id: Id48a4aae4ecce73db468587967968a3f7618b700
f9487c039efb4112616d438593a2ab02792e0304 09-Apr-2014 Dave Allison <dallison@google.com> Revert "Revert "Use trampolines for calls to helpers""

This reverts commit 081f73e888b3c246cf7635db37b7f1105cf1a2ff.

Change-Id: Ibd777f8ce73cf8ed6c4cb81d50bf6437ac28cb61

Conflicts:
compiler/dex/quick/mir_to_lir.h
081f73e888b3c246cf7635db37b7f1105cf1a2ff 07-Apr-2014 Dave Allison <dallison@google.com> Revert "Use trampolines for calls to helpers"

This reverts commit 754ddad084ccb610d0cf486f6131bdc69bae5bc6.

Change-Id: Icd979adee1d8d781b40a5e75daf3719444cb72e8
754ddad084ccb610d0cf486f6131bdc69bae5bc6 19-Feb-2014 Dave Allison <dallison@google.com> Use trampolines for calls to helpers

This is an ARM specific optimization to the compiler
that uses trampoline islands to make calls to runtime
helper functions. The intention is to reduce the size
of the generated code (by 2 bytes per call) without
affecting performance.

By default this is on when generating an OAT file. It is
off when compiling to memory.

To switch this off in dex2oat, use the command line option:
--no-helper-trampolines

Enhances disassembler to print the trampoline entry on the
BL instruction like this:

0xb6a850c0: f7ffff9e bl -196 (0xb6a85000) ; pTestSuspend

Bug: 12607709
Change-Id: I9202bdb7cf21252ad807bd48701f1f6ce8e3d0fe
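
Editorial sketch showing how the offset and target in the sample disassembly line can be recovered from the instruction word. It follows the Thumb-2 BL (immediate) encoding as commonly documented (S, imm10, J1, J2, imm11); the function names are invented and this is not the ART disassembler.

```cpp
#include <cassert>
#include <cstdint>

// Decode the signed byte offset of a Thumb-2 BL from its two halfwords.
constexpr std::int32_t DecodeThumb2BlOffset(std::uint16_t hw1, std::uint16_t hw2) {
  const std::uint32_t s = (hw1 >> 10) & 1u;
  const std::uint32_t imm10 = hw1 & 0x3ffu;
  const std::uint32_t j1 = (hw2 >> 13) & 1u;
  const std::uint32_t j2 = (hw2 >> 11) & 1u;
  const std::uint32_t imm11 = hw2 & 0x7ffu;
  const std::uint32_t i1 = (~(j1 ^ s)) & 1u;
  const std::uint32_t i2 = (~(j2 ^ s)) & 1u;
  // 25-bit value S:I1:I2:imm10:imm11:0.
  const std::uint32_t imm25 =
      (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1);
  // Sign-extend from bit 24 without relying on signed shifts.
  return static_cast<std::int32_t>(imm25) - (s != 0u ? (1 << 25) : 0);
}

// In Thumb state the branch is relative to the instruction address plus 4.
constexpr std::uint32_t BlTarget(std::uint32_t insn_addr, std::int32_t offset) {
  return insn_addr + 4u + static_cast<std::uint32_t>(offset);
}
```

Applied to the sample line, halfwords 0xf7ff and 0xff9e at address 0xb6a850c0 decode to the -196 offset and the 0xb6a85000 target that the disassembler prints.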
dd7624d2b9e599d57762d12031b10b89defc9807 15-Mar-2014 Ian Rogers <irogers@google.com> Allow mixing of thread offsets between 32 and 64bit architectures.

Begin a fuller implementation of x86-64 REX prefixes.
Doesn't implement 64bit thread offset support for the JNI compiler.

Change-Id: If9af2f08a1833c21ddb4b4077f9b03add1a05147
f943914730db8ad2ff03d49a2cacd31885d08fd7 27-Mar-2014 Dave Allison <dallison@google.com> Implement implicit stack overflow checks

This also fixes some failing run tests due to missing
null pointer markers.

The implementation of the implicit stack overflow checks introduces
the ability to have a gap in the stack that is skipped during
stack walk backs. This gap is protected against read/write and
is used to trigger a SIGSEGV at function entry if the stack
will overflow.

Change-Id: I0c3e214c8b87dc250cf886472c6d327b5d58653e
e2143c0a4af68c08e811885eb2f3ea5bfdb21ab6 28-Mar-2014 Ian Rogers <irogers@google.com> Revert "Revert "Optimize easy multiply and easy div remainder.""

This reverts commit 3654a6f50a948ead89627f398aaf86a2c2db0088.
Remove the part of the change that confused !is_div with being multiply rather
than implying remainder.

Change-Id: I202610069c69351259a320e8852543cbed4c3b3e
3441512d61ac192c1bf0b9b1eb696d5a8a8d677e 28-Mar-2014 Brian Carlstrom <bdc@google.com> Revert "Optimize easy multiply and easy div remainder."

This reverts commit 08df4b3da75366e5db37e696eaa7e855cba01deb.

(cherry picked from commit 3654a6f50a948ead89627f398aaf86a2c2db0088)

Change-Id: If8befd7c7135b9dfe3d3e9111768aba89aaa0863
3654a6f50a948ead89627f398aaf86a2c2db0088 28-Mar-2014 Brian Carlstrom <bdc@google.com> Revert "Optimize easy multiply and easy div remainder."

This reverts commit 08df4b3da75366e5db37e696eaa7e855cba01deb.
08df4b3da75366e5db37e696eaa7e855cba01deb 25-Mar-2014 Zheng Xu <zheng.xu@arm.com> Optimize easy multiply and easy div remainder.

Update OpRegRegShift and OpRegRegRegShift to use RegStorage parameters.
Add special cases for *0 and *1. Add more easy multiply special cases for
Arm.
Reuse easy multiply in SmallLiteralDivRem() to support remainder cases.

Change-Id: Icd76a993d3ac8d4988e9653c19eab4efca14fad0
2700f7e1edbcd2518f4978e4cd0e05a4149f91b6 07-Mar-2014 buzbee <buzbee@google.com> Continuing register cleanup

Ready for review.

Continue the process of using RegStorage rather than
ints to hold register value in the top layers of codegen.
Given the huge number of changes in this CL, I've attempted
to minimize the number of actual logic changes. With this
CL, the use of ints for registers has largely been eliminated
except in the lowest utility levels. "Wide" utility routines
have been updated to take a single RegStorage rather than
a pair of ints representing low and high registers.

Upcoming CLs will be smaller and more targeted. My expectations:
o Allocate float double registers as a single double rather than
a pair of float single registers.
o Refactor to push code which assumes long and double Dalvik
values are held in a pair of registers to the target dependent
layer.
o Clean-up of the xxx_mir.h files to reduce the amount of #defines
for registers. May also do a register renumbering to bring all
of our targets' register naming more consistent. Possibly
introduce a target-independent float/non-float test at the
RegStorage level.

Change-Id: I646de7392bdec94595dd2c6f76e0f1c4331096ff
40bbb39b85c063cd6a9f4ab00ff70372370e08cf 19-Mar-2014 buzbee <buzbee@google.com> Fix Quick compiler "out of registers"

There are a few places in the Arm backend that expect to be
able to survive on a single temp register - in particular
entry code generation and argument passing. However, in the
case of a very large frame and floating point ld/st, the
existing code could end up using 2 temps.

In short, if there is a displacement overflow we try to use
indexed load/store instructions (slightly more efficient).
However, there are none for floating point - so we ended up
burning yet another register to construct a direct pointer.
This CL detects this case and doesn't try to use the indexed
load/store mechanism for floats.

Fix for https://code.google.com/p/android/issues/detail?id=67349

Change-Id: I1ea596ea660e4add89fd4fddb8cbf99a54fbd343
60d7a65f7fb60f502160a2e479e86014c7787553 14-Mar-2014 Brian Carlstrom <bdc@google.com> Fix stack overflow for mutual recursion.

There was an error where we would have a pc that was in the method
which generated the stack overflow. This didn't work however
because the stack overflow check was before we stored the method in
the stack. The result was that the stack overflow handler had a PC
which wasn't necessarily in the method at the top of the stack. This
is now fixed by always restoring the link register before branching
to the throw entrypoint.

Slight code size regression on ARM/Mips (unmeasured). Regression on ARM
is 4 bytes of code per stack overflow check. Some of this regression is
mitigated by having one less GC safepoint.

Also adds test case for StackOverflowError issue (from bdc).

Tests passing: ARM, X86, Mips
Phone booting: ARM

Bug: https://code.google.com/p/android/issues/detail?id=66411
Bug: 12967914
Change-Id: I96fe667799458b58d1f86671e051968f7be78d5d

(cherry-picked from c0f96d03a1855fda7d94332331b94860404874dd)
c0f96d03a1855fda7d94332331b94860404874dd 14-Mar-2014 Brian Carlstrom <bdc@google.com> Fix stack overflow for mutual recursion.

There was an error where we would have a pc that was in the method
which generated the stack overflow. This didn't work however
because the stack overflow check was before we stored the method in
the stack. The result was that the stack overflow handler had a PC
which wasn't necessarily in the method at the top of the stack. This
is now fixed by always restoring the link register before branching
to the throw entrypoint.

Slight code size regression on ARM/Mips (unmeasured). Regression on ARM
is 4 bytes of code per stack overflow check. Some of this regression is
mitigated by having one less GC safepoint.

Also adds test case for StackOverflowError issue (from bdc).

Tests passing: ARM, X86, Mips
Phone booting: ARM

Bug: https://code.google.com/p/android/issues/detail?id=66411
Bug: 12967914
Change-Id: I96fe667799458b58d1f86671e051968f7be78d5d
0f6784737882199197796b67b99e5f1ded383bee 11-Mar-2014 Ian Rogers <irogers@google.com> Unify 64bit int constant definitions.

LL and ULL suffixes are word size dependent, use the INT64_C and UINT64_C
macros instead.

Change-Id: I5b70027651898814fc0b3e9e22a18a1047e76cb9
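
Editorial sketch of the style of constant definition this change moves to. INT64_C and UINT64_C are the standard macros from <cstdint>; the constant and function names below are invented for illustration and do not come from the ART sources.

```cpp
#include <cassert>
#include <cstdint>

// The macros attach whatever suffix matches the platform's 64-bit types.
constexpr std::int64_t kSketchMinInt64 = -INT64_C(0x7fffffffffffffff) - 1;
constexpr std::uint64_t kSketchSignBit = UINT64_C(1) << 63;
constexpr std::uint64_t kSketchLowWordMask = UINT64_C(0xffffffff);

// Split a 64-bit value into the two 32-bit halves a 32-bit backend
// would hold in a register pair.
constexpr std::uint32_t HighWord(std::uint64_t v) {
  return static_cast<std::uint32_t>(v >> 32);
}
constexpr std::uint32_t LowWord(std::uint64_t v) {
  return static_cast<std::uint32_t>(v & kSketchLowWordMask);
}

static_assert(sizeof(INT64_C(1)) >= 8, "INT64_C yields at least 64 bits");
```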
dbb8c49d540edd2a39076093163c7218f03aa502 28-Feb-2014 Vladimir Marko <vmarko@google.com> Remove non-existent ARM insn kThumb2SubsRRI12.

For kOpSub/kOpAdd, prefer modified immediate encodings
because they set flags.

Change-Id: I41dcd2d43ba1e62120c99eaf9106edc61c41e157
2c498d1f28e62e81fbdb477ff93ca7454e7493d7 30-Jan-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Specializing x86 range argument copying

The ARM implementation of range argument copying was specialized in some cases.
For all other architectures, it would fall back to generating memcpy. This patch
updates the x86 implementation so it does not call memcpy and instead generates
loads and stores, favoring movement of 128-bit chunks.

Change-Id: Ic891e5609a4b0e81a47c29cc5a9b301bd10a1933
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
bd288c2c1206bc99fafebfb9120a83f13cf9723b 21-Dec-2013 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Add conditional move support to x86 and allow GenMinMax to use it

X86 supports conditional moves which is useful for reducing branchiness.
This patch adds support to the x86 backend to generate conditional reg
to reg operations. Both encoder and decoder support was added for cmov.

The x86 version of GenMinMax used for generating inlined version Math.min/max
has been updated to make use of the conditional move support.
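
A rough C++ illustration of the select shape that a conditional move
implements. The helper names are hypothetical; this is not the GenMinMax
code, and whether a compiler emits cmov for it is not guaranteed.

```cpp
#include <cstdint>

// Hypothetical helpers: min/max written as a single select.
// On x86 this shape can be lowered to a compare followed by a
// conditional register-to-register move, with no branch.
constexpr int32_t MinSelectSketch(int32_t a, int32_t b) {
  return (b < a) ? b : a;
}

constexpr int32_t MaxSelectSketch(int32_t a, int32_t b) {
  return (a < b) ? b : a;
}
```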

Change-Id: I92c5428e40aa8ff88bd3071619957ac3130efae7
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
58af1f9385742f70aca4fcb5e13aba53b8be2ef4 19-Dec-2013 Vladimir Marko <vmarko@google.com> Clean up usage of carry flag condition codes.

On X86, kCondUlt and kCondUge are bound to CS and CC,
respectively, while on ARM it's the other way around. The
explicit binding in ConditionCode was wrong and misleading
and could lead to subtle bugs. Therefore, we detach those
constants and clean up usage. The CS and CC conditions are
now effectively unused but we keep them around as they may
eventually be useful.
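
A small model of why the binding differs, as a sketch: after an unsigned
compare (a - b), x86 sets the carry flag on borrow, while ARM sets it
when there is no borrow. Function names below are hypothetical.

```cpp
#include <cstdint>

// Carry flag after "cmp a, b", which computes a - b.
// x86: CF is set when the subtraction borrows, i.e. a < b (unsigned).
constexpr bool X86CarryAfterCmp(uint32_t a, uint32_t b) { return a < b; }
// ARM: C is set when the subtraction does not borrow, i.e. a >= b (unsigned).
constexpr bool ArmCarryAfterCmp(uint32_t a, uint32_t b) { return a >= b; }

// Unsigned less-than, expressed through each architecture's carry flag.
constexpr bool UltOnX86(uint32_t a, uint32_t b) {
  return X86CarryAfterCmp(a, b);   // carry set
}
constexpr bool UltOnArm(uint32_t a, uint32_t b) {
  return !ArmCarryAfterCmp(a, b);  // carry clear
}

static_assert(UltOnX86(1u, 2u) && UltOnArm(1u, 2u), "1 < 2 on both");
static_assert(!UltOnX86(2u, 2u) && !UltOnArm(2u, 2u), "2 < 2 on neither");
```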

And some minor cleanup and comments.

Change-Id: Ic5ed81d86b6c7f9392dd8fe9474b3ff718fee595
2247984899247b1402408d39731ff64048f0e274 19-Nov-2013 Vladimir Marko <vmarko@google.com> Clean up kOpCmp on ARM.

kThumb2CmnRI8M is now used.

Change-Id: I300299258ed99d86c300dee45c904c360dd44638
332b7aa6220124dc638b9f7e59611c376473f128 18-Nov-2013 Vladimir Marko <vmarko@google.com> Improve Thumb2 instructions' use of constant operands.

Rename instructions using modified immediate to use suffix
I8M. Many were using I8 which may lead to confusion with
Thumb I8 instructions and some were using other suffixes.

Add and use CmnRI8M, increase constant range of AddRRI12 and
SubRRI12 and use BicRRI8M for applicable kOpAnd constants.
In particular, this should marginally improve Math.abs(float)
and Math.abs(double) by converting x & 0x7fffffff to BIC.
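
A sketch of why BIC helps here, assuming the usual Thumb2 modified
immediate rules: 0x7fffffff is not encodable as a modified immediate,
but its complement 0x80000000 is, so x & 0x7fffffff can be emitted as a
bit clear of 0x80000000. The helpers below are hypothetical and are not
the ART implementation.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical check for Thumb2 modified immediate encodability.
bool IsThumb2ModifiedImmediateSketch(uint32_t v) {
  const uint32_t b0 = v & 0xFFu;
  const uint32_t b1 = (v >> 8) & 0xFFu;
  if (v == b0) return true;                                          // 0x000000XY
  if (v == ((b0 << 16) | b0)) return true;                           // 0x00XY00XY
  if (v == ((b1 << 24) | (b1 << 8))) return true;                    // 0xXY00XY00
  if (v == ((b0 << 24) | (b0 << 16) | (b0 << 8) | b0)) return true;  // 0xXYXYXYXY
  // Remaining form: an 8-bit value with its top bit set, placed higher up.
  // v is non-zero here, so the loop terminates.
  while ((v & 0x80000000u) == 0) v <<= 1;  // move the leading 1 to bit 31
  return (v & 0x00FFFFFFu) == 0;           // all set bits fit in 8 bits
}

// abs() on the raw float bits: clearing the sign bit equals x & 0x7fffffff.
float AbsViaBitClearSketch(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits &= ~UINT32_C(0x80000000);  // the operation a BIC with 0x80000000 performs
  std::memcpy(&x, &bits, sizeof(x));
  return x;
}
```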

Bug: 11579369

Change-Id: I0f17a9eb80752d2625730a60555152cdffed50ba
7020278bce98a0735dc6abcbd33bdf1ed2634f1d 23-Oct-2013 Dave Allison <dallison@google.com> Support hardware divide instruction

Bug: 11299025

Uses sdiv for division and a combo of sdiv, mul and sub for modulus.
Only does this on processors that are capable of the sdiv instruction, as determined
by the build system.
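
The modulus sequence corresponds to the identity r = a - (a / b) * b.
A minimal sketch with a hypothetical helper name, assuming b != 0 and
ignoring the INT32_MIN / -1 case:

```cpp
#include <cstdint>

// Remainder built from a truncating quotient, mirroring sdiv + mul + sub:
//   q = a / b      (sdiv, rounds toward zero)
//   t = q * b      (mul)
//   r = a - t      (sub)
constexpr int32_t RemViaDivSketch(int32_t a, int32_t b) {
  return a - (a / b) * b;
}

static_assert(RemViaDivSketch(7, 3) == 7 % 3, "matches the % operator");
```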

Also provides a command line arg --instruction-set-features= to allow cross compilation.
Makefile adds the --instruction-set-features= arg to build-time dex2oat runs and defaults
it to something obtained from the target architecture.

Provides a GetInstructionSetFeatures() function on CompilerDriver that can be
queried for various features. The only feature supported right now is hasDivideInstruction().

Also adds a few more instructions to the ARM disassembler

b/11535253 is an addition to this CL to be done later.

Change-Id: Ia8aaf801fd94bc71e476902749cf20f74eba9f68
a8b4caf7526b6b66a8ae0826bd52c39c66e3c714 24-Oct-2013 Vladimir Marko <vmarko@google.com> Add byte swap instructions for ARM and x86.

Change-Id: I03fdd61ffc811ae521141f532b3e04dda566c77d
0d82948094d9a198e01aa95f64012bdedd5b6fc9 12-Oct-2013 buzbee <buzbee@google.com> 64-bit prep

Preparation for 64-bit roll.
o Eliminated storing pointers in 32-bit int slots in LIR.
o General size reductions of common structures to reduce impact
of doubled pointer sizes:
- BasicBlock struct was 72 bytes, now is 48.
- MIR struct was 72 bytes, now is 64.
- RegLocation was 12 bytes, now is 8.
o Generally replaced uses of BasicBlock* pointers with 16-bit Ids.
o Replaced several doubly-linked lists with singly-linked to save
one stored pointer per node.
o We had quite a few uses of uintptr_t's that were a holdover from
the JIT (which used pointers to mapped dex & actual code cache
addresses rather than trace-relative offsets). Replaced those with
uint32_t's.
o Clean up handling of embedded data for switch tables and array data.
o Miscellaneous cleanup.

I anticipate one or two additional CLs to reduce the size of MIR and LIR
structs.

Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
409fe94ad529d9334587be80b9f6a3d166805508 11-Oct-2013 buzbee <buzbee@google.com> Quick assembler fix

This CL re-instates the select pattern optimization disabled by
CL 374310, and fixes the underlying problem: improper handling of
the kPseudoBarrier LIR opcode. The bug was introduced in the
recent assembler restructuring. In short, LIR pseudo opcodes (which
have values < 0), should always have size 0 - and thus cause no
bits to be emitted during assembly. In this case, bad logic caused
us to set the size of a kPseudoBarrier opcode via lookup through the
EncodingMap.

Because all pseudo ops are < 0, this meant we did an array underflow
load, picking up whatever garbage was located before the EncodingMap.
This explains why this error showed up recently - we'd previously just
gotten a lucky layout.

This CL corrects the faulty logic, and adds DCHECKs to uses of
the EncodingMap to ensure that we don't try to access w/ a
pseudo op. Additionally, the existing is_pseudo_op() macro is
replaced with IsPseudoLirOp(), named similar to the existing
IsPseudoMirOp().
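
A miniature of the guard described above, as a sketch. The enum, table
and function names are hypothetical stand-ins, with assert() in place
of DCHECK.

```cpp
#include <cassert>

// Hypothetical opcode set: pseudo ops are negative, real ops index the table.
enum OpcodeSketch : int {
  kPseudoBarrierSketch = -2,
  kPseudoLabelSketch = -1,
  kOpNarrowSketch = 0,
  kOpWideSketch = 1,
  kOpCountSketch = 2
};

struct EncodingEntrySketch { int size_in_bytes; };
static const EncodingEntrySketch kEncodingTableSketch[kOpCountSketch] = {{2}, {4}};

inline bool IsPseudoOpSketch(int opcode) { return opcode < 0; }

inline int InsnSizeSketch(int opcode) {
  // Pseudo ops emit no bits, so they never reach the table lookup.
  if (IsPseudoOpSketch(opcode)) return 0;
  // Guard the lookup: a negative index would read before the table.
  assert(opcode >= 0 && opcode < kOpCountSketch);
  return kEncodingTableSketch[opcode].size_in_bytes;
}
```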

Change-Id: I46761a0275a923d85b545664cadf052e1ab120dc
b48819db07f9a0992a72173380c24249d7fc648a 15-Sep-2013 buzbee <buzbee@google.com> Compile-time tuning: assembly phase

Not as much compile-time gain from reworking the assembly phase as I'd
hoped, but still worthwhile. Should see ~2% improvement thanks to
the assembly rework. On the other hand, expect some huge gains for some
application thanks to better detection of large machine-generated init
methods. Thinkfree shows a 25% improvement.

The major assembly change was to thread the LIR nodes that
require fixup into a fixup chain. Only those are processed during the
final assembly pass(es). This doesn't help for methods which only
require a single pass to assemble, but does speed up the larger methods
which required multiple assembly passes.
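
A sketch of the threading idea with a hypothetical node type: nodes
that need pc-relative fixup get linked into a second chain, so later
passes can walk only that chain instead of the whole instruction list.

```cpp
#include <cstdint>

// Hypothetical stand-in for a LIR node.
struct NodeSketch {
  uint32_t offset = 0;
  bool needs_fixup = false;
  NodeSketch* next = nullptr;        // full instruction list
  NodeSketch* next_fixup = nullptr;  // only nodes that need fixup
};

// Threads the fixup nodes into their own chain and returns its head.
inline NodeSketch* BuildFixupChainSketch(NodeSketch* head) {
  NodeSketch* first = nullptr;
  NodeSketch* last = nullptr;
  for (NodeSketch* n = head; n != nullptr; n = n->next) {
    if (!n->needs_fixup) continue;
    if (last == nullptr) {
      first = n;
    } else {
      last->next_fixup = n;
    }
    last = n;
  }
  return first;
}

// A follow-on pass visits only the fixup chain.
inline int CountFixupsSketch(const NodeSketch* fixup_head) {
  int count = 0;
  for (const NodeSketch* n = fixup_head; n != nullptr; n = n->next_fixup) ++count;
  return count;
}
```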

Also replaced the block_map_ basic block lookup table (which contained
space for a BasicBlock* for each dex instruction unit) with a block id
map - cutting its space requirements by half in a 32-bit pointer
environment.

Changes:
o Reduce size of LIR struct by 12.5% (one of the big memory users)
o Repurpose the use/def portion of the LIR after optimization complete.
o Encode instruction bits to LIR
o Thread LIR nodes requiring pc fixup
o Change follow-on assembly passes to only consider fixup LIRs
o Switch on pc-rel fixup kind
o Fast-path for small methods - single pass assembly
o Avoid using cb[n]z for null checks (almost always exceed displacement)
o Improve detection of large initialization methods.
o Rework def/use flag setup.
o Remove a sequential search from FindBlock using lookup table of 16-bit
block ids rather than full block pointers.
o Eliminate pcRelFixup and use fixup kind instead.
o Add check for 16-bit overflow on dex offset.

Change-Id: I4c6615f83fed46f84629ad6cfe4237205a9562b4
468532ea115657709bc32ee498e701a4c71762d4 05-Aug-2013 Ian Rogers <irogers@google.com> Entry point clean up.

Create set of entry points needed for image methods to avoid fix-up at load time:
- interpreter - bridge to interpreter, bridge to compiled code
- jni - dlsym lookup
- quick - resolution and bridge to interpreter
- portable - resolution and bridge to interpreter

Fix JNI work around to use JNI work around argument rewriting code that'd been
accidentally disabled.
Remove abstract method error stub, use interpreter bridge instead.
Consolidate trampoline (previously stub) generation in generic helper.
Simplify trampolines to jump directly into assembly code, keeps stack crawlable.
Dex: replace use of int with ThreadOffset for values that are thread offsets.
Tidy entry point routines between interpreter, jni, quick and portable.

Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e
(cherry picked from commit 848871b4d8481229c32e0d048a9856e5a9a17ef9)
848871b4d8481229c32e0d048a9856e5a9a17ef9 05-Aug-2013 Ian Rogers <irogers@google.com> Entry point clean up.

Create set of entry points needed for image methods to avoid fix-up at load time:
- interpreter - bridge to interpreter, bridge to compiled code
- jni - dlsym lookup
- quick - resolution and bridge to interpreter
- portable - resolution and bridge to interpreter

Fix JNI work around to use JNI work around argument rewriting code that'd been
accidentally disabled.
Remove abstract method error stub, use interpreter bridge instead.
Consolidate trampoline (previously stub) generation in generic helper.
Simplify trampolines to jump directly into assembly code, keeps stack crawlable.
Dex: replace use of int with ThreadOffset for values that are thread offsets.
Tidy entry point routines between interpreter, jni, quick and portable.

Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e
7934ac288acfb2552bb0b06ec1f61e5820d924a4 26-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/comments issues

Change-Id: Iae286862c85fb8fd8901eae1204cd6d271d69496
6f485c62b9cfce3ab71020c646ab9f48d9d29d6d 19-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/indent issues

Change-Id: I7c1647f0c39e1e065ca5820f9b79998691ba40b1
9b7085a4e7c40e7fa01932ea1647a4a33ac1c585 19-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint readability/braces issues

Change-Id: I56b88956510077b0e13aad4caee8898313fab55b
38f85e4892f6504971bde994fec81fd61780ac30 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/operators issues

Change-Id: I730bd87b476bfa36e93b42e816ef358006b69ba5
df62950e7a32031b82360c407d46a37b94188fbb 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/parens issues

Change-Id: Ifc678d59a8bed24ffddde5a0e543620b17b0aba9
2ce745c06271d5223d57dbf08117b20d5b60694a 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/braces issues

Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
7940e44f4517de5e2634a7e07d58d0fb26160513 12-Jul-2013 Brian Carlstrom <bdc@google.com> Create separate Android.mk for main build targets

The runtime, compiler, dex2oat, and oatdump now are in separate trees
to prevent dependency creep. They can now be individually built
without rebuilding the rest of the art projects. dalvikvm and jdwpspy
were already this way. Builds in the art directory should behave as
before, building everything including tests.

Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81