History log of /art/compiler/dex/quick/arm/int_arm.cc
Revision Date Author Comments
56e7666fe0df2c1a16b2cc6d12562c3e82199c0d 06-Jan-2015 Andreas Gampe <agampe@google.com> ART: Fix divide-by-zero for ARM

There was an infinite loop in the code generation for a divide
by literal zero.

Bug: 18887754

(cherry picked from commit cfe71e59c667abb35bc2363c49af7f8b549c44d0)

Change-Id: Ibd481918d3c6d7bc62fdd1a6807042009f561d95
c375e52e813857ad4c726bfb19ef4114e11095df 01-Dec-2014 Vladimir Marko <vmarko@google.com> Quick: Fix neg-long on ARM for overlapping regs.

Bug: 18569347

(cherry picked from commit 2f340a843ea5b3413c901f8c2365243b68864468)

Change-Id: Icde3bfdd7c90d51548823ce1f81caf9484de2be5
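
A minimal sketch of the overlap hazard this fix addresses, with the register
pair modeled as two lvalues (the function and names are illustrative, not
ART's code): -(hi:lo) is ~lo+1 in the low word plus a borrow into the high
word, so if the destination low register aliases the source high register,
the high input must be read before the low result is written.

  #include <cstdint>

  void NegLong(uint32_t& dst_lo, uint32_t& dst_hi,
               uint32_t& src_lo, uint32_t& src_hi) {
    uint32_t hi_in = src_hi;           // read first: dst_lo may alias src_hi
    uint32_t lo_out = ~src_lo + 1u;    // rsbs lo, lo, #0
    uint32_t borrow = (src_lo == 0u) ? 1u : 0u;  // carry out of the negate
    dst_lo = lo_out;
    dst_hi = ~hi_in + borrow;          // high word of the two's complement
  }
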
02ff2d4187249d26fabe8e5eacc27b99984ee353 04-Sep-2014 Serguei Katkov <serguei.i.katkov@intel.com> AddIntrinsicSlowPath with resume requires clobbering

AddIntrinsicSlowPath with resume results in a call.
So all temps must be clobbered at the point where
AddIntrinsicSlowPath returns.

(cherry-picked from 9863daf4fdc1a08339edac794452dbc719aef4f1)
Change-Id: If9eb887e295ff5e59920f4da1cef63258ad490b0
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
fa9c8ec37c66574654e448513e1bb59af7cb9365 07-Aug-2014 Zheng Xu <zheng.xu@arm.com> Add arraycopy intrinsic for arm and arm64.

Implement intrinsic for
java.lang.System.arraycopy(char[], int, char[], int, int).

Bug: 16241558

(cherry picked from commit 947717a2b085f36ea007ac64f728e19ff1c8db0b)

Change-Id: I8199f5c9ce9827f869f0f93aaff7ec359a84d922
c76c614d681d187d815760eb909e5faf488a3c35 05-Aug-2014 Andreas Gampe <agampe@google.com> ART: Refactor long ops in quick compiler

Make GenArithOpLong virtual. Let the implementation in gen_common be
very basic, without instruction-set checks, and meant as a fall-back.
Backends should override it and dispatch to their own, better implementations.
This allows us to remove the GenXXXLong virtual methods from Mir2Lir and
clean up the backends (especially removing some LOG(FATAL) implementations).

Change-Id: I6366443c0c325c1999582d281608b4fa229343cf
984305917bf57b3f8d92965e4715a0370cc5bcfb 28-Jul-2014 Andreas Gampe <agampe@google.com> ART: Rework quick entrypoint code in Mir2Lir, cleanup

To reduce the complexity of calling trampolines in generic code,
introduce an enumeration for entrypoints. Introduce a header that lists
the entrypoint enum and exposes a templatized method that translates an
enum value to the corresponding thread offset value.

Call helpers are rewritten to have an enum parameter instead of the
thread offset. Also rewrite LoadHelper and GenConversionCall this way.
It is now LoadHelper's duty to select the right thread offset size.

Introduce an InvokeTrampoline virtual method in Mir2Lir. This allows us to
further simplify the call helpers, as well as make OpThreadMem specific
to X86 only (removed from Mir2Lir).

Make GenInlinedCharAt virtual, move a copy to X86 backend, and simplify
both copies. Remove LoadBaseIndexedDisp and OpRegMem from Mir2Lir, as they
are now specific to X86 only.

Remove StoreBaseIndexedDisp from Mir2Lir, as it was only ever used in the
X86 backend.

Remove OpTlsCmp from Mir2Lir, as it was only ever used in the X86 backend.

Remove OpLea from Mir2Lir, as it was only ever defined in the X86 backend.

Remove GenImmedCheck from Mir2Lir as it was neither used nor implemented.

Change-Id: If0a6182288c5d57653e3979bf547840a4c47626e
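
A sketch of the enum-to-offset translation described above; the enum values,
base offset, and slot layout below are invented for illustration, assuming
one pointer-wide slot per entrypoint in the Thread object.

  #include <cstddef>
  #include <cstdint>

  enum class QuickEntrypointEnum : uint32_t { kThrowDivZero = 0, kAllocObject = 1 };

  template <std::size_t kPointerSize>  // 4 or 8, per target instruction set
  uint32_t EntrypointToThreadOffset(QuickEntrypointEnum ep) {
    const uint32_t kFirstSlotOffset = 128;  // assumed start of the table
    return kFirstSlotOffset + static_cast<uint32_t>(ep) * kPointerSize;
  }
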
bebee4fd10e5db6cb07f59bc0f73297c900ea5f0 16-Jul-2014 Andreas Gampe <agampe@google.com> ART: Refactor GenSelect, refactor gen_common accordingly

This adds a GenSelect method meant for selection of constants. The
general-purpose GenInstanceof code is refactored to take advantage of
this. This cleans up code and squashes a branch-over on ARM64 to a
cset.

Also add a slow-path for type initialization in GenInstanceof.

Bug: 16241558

(cherry picked from commit 90969af6deb19b1dbe356d62fe68d8f5698d3d8f)

Change-Id: Ie4494858bb8c26d386cf2e628172b81bba911ae5
90969af6deb19b1dbe356d62fe68d8f5698d3d8f 16-Jul-2014 Andreas Gampe <agampe@google.com> ART: Refactor GenSelect, refactor gen_common accordingly

This adds a GenSelect method meant for selection of constants. The
general-purpose GenInstanceof code is refactored to take advantage of
this. This cleans up code and squashes a branch-over on ARM64 to a
cset.

Also add a slow-path for type initialization in GenInstanceof.

Change-Id: Ie4494858bb8c26d386cf2e628172b81bba911ae5
9522af985466b2a05ef5cdede0808777dea7236e 15-Jul-2014 Andreas Gampe <agampe@google.com> ART: Squash a cmp w/ zero and b.ls to cbz (ARM/ARM64)

In case of array bounds checks at constant index 0 we generate a
compare and a branch. Squash into a cbz.

Change-Id: I1c6a6e37a7a2356b2c4580a3387cedb55436e251
48f5c47907654350ce30a8dfdda0e977f5d3d39f 27-Jun-2014 Hans Boehm <hboehm@google.com> Replace memory barriers to better reflect Java needs.

Replaces barriers that enforce ordering of one access type
(e.g. Load) with respect to another (e.g. store) with more general
ones that better reflect both Java requirements and actual hardware
barrier/fence instructions. The old code was inconsistent and
unclear about which barriers implied which others. Sometimes
multiple barriers were generated and then eliminated;
sometimes it was assumed that certain barriers implied others.
The new barriers closely parallel those in C++11, though, for now,
we use something closer to the old naming.

Bug: 14685856

Change-Id: Ie1c80afe3470057fc6f2b693a9831dfe83add831
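
A sketch of the direction described: barrier kinds paralleling C++11 fences,
lowered per target. The enum names below follow the text's intent but should
be treated as assumed; the dmb variants are a plausible ARM lowering, where
only the store-store case can use the cheaper ishst option.

  enum MemBarrierKind { kAnyStore, kLoadAny, kStoreStore, kAnyAny };

  const char* ArmBarrier(MemBarrierKind kind) {
    switch (kind) {
      case kStoreStore: return "dmb ishst";  // store-store: cheaper variant
      default:          return "dmb ish";    // the rest need a full barrier
    }
  }
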
b5860fb459f1ed71f39d8a87b45bee6727d79fe8 22-Jun-2014 buzbee <buzbee@google.com> Register promotion support for 64-bit targets

Not sufficiently tested for 64-bit targets, but should be
fairly close.

A significant amount of refactoring could still be done (in
later CLs).

With this change we are not making any changes to the vmap
scheme. As a result, it is a requirement that if a vreg
is promoted to both a 32-bit view and the low half of a
64-bit view it must share the same physical register. We
may change this restriction later on to allow for more flexibility
for 32-bit Arm.

For example, if v4, v5, v4/v5 and v5/v6 are all hot enough to
promote, we'd end up with something like:

v4 (as an int) -> r10
v4/v5 (as a long) -> r10
v5 (as an int) -> r11
v5/v6 (as a long) -> r11

Fix a couple of ARM64 bugs on the way...

Change-Id: I6a152b9c164d9f1a053622266e165428045362f3
23abec955e2e733999a1e2c30e4e384e46e5dde4 02-Jul-2014 Serban Constantinescu <serban.constantinescu@arm.com> AArch64: Add few more inline functions

This patch adds inlining support for the following functions:
* Math.max/min(long, long)
* Math.max/min(float, float)
* Math.max/min(double, double)
* Integer.reverse(int)
* Long.reverse(long)

Change-Id: Ia2b1619fd052358b3a0d23e5fcbfdb823d2029b9
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
de68676b24f61a55adc0b22fe828f036a5925c41 24-Jun-2014 Andreas Gampe <agampe@google.com> Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter"

This reverts commit 2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d.

Breaks the build.

Change-Id: I9faad4e9a83b32f5f38b2ef95d6f9a33345efa33
3c12c512faf6837844d5465b23b9410889e5eb11 24-Jun-2014 Andreas Gampe <agampe@google.com> Revert "Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter""

This reverts commit de68676b24f61a55adc0b22fe828f036a5925c41.

Fixes an API comment, and differentiates between inserting and appending.

Change-Id: I0e9a21bb1d25766e3cbd802d8b48633ae251a6bf
2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d 23-Jun-2014 Andreas Gampe <agampe@google.com> ART: Split out more cases of Load/StoreRef, volatile as parameter

Splits out more cases of ref registers being loaded or stored. For
code clarity, adds volatile as a flag parameter instead of a separate
method.

On ARM64, continue cleanup. Add flags to print/fatal on size mismatches.

Change-Id: I30ed88433a6b4ff5399aefffe44c14a5e6f4ca4e
8dea81ca9c0201ceaa88086b927a5838a06a3e69 06-Jun-2014 Vladimir Marko <vmarko@google.com> Rewrite use/def masks to support 128 bits.

Reduce LIR memory usage by holding masks by pointers in the
LIR rather than directly and using pre-defined const masks
for the common cases, allocating very few on the arena.

Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
04f4d8abe45d6e79eca983e057de76aea24b7df9 30-May-2014 Wei Jin <wejin@google.com> Add an optimization for removing redundant suspend tests in ART

This CL:
(1) eliminates redundant suspend checks (dominated by another check),

(2) removes the special treatment of the R4 register, which got
reset on every native call, possibly yielding long execution
sequences without any suspend checks, and

(3) fixes the absence of suspend checks in leaf methods.

(2) and (3) increase the frequency of suspend checks, which improves
the performance of GC and the accuracy of profile data. To
compensate for the increased number of checks, we implemented an
optimization that leverages dominance information to remove
redundant suspend checks on back edges. Based on the results of
running the Caffeine benchmark on Nexus 7, the patch performs
roughly 30% more useful suspend checks, spreading them much more
evenly along the execution trace, while incurring less than 1%
overhead. For flexibility consideration, this CL defines two flags
to control the enabling of optimizations. The original
implementation is the default.

Change-Id: I31e81a5b3c53030444dbe0434157274c9ab8640f
Signed-off-by: Wei Jin <wejin@google.com>
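
A sketch of the dominance-based redundancy test described in (1); the
dominator sets are assumed precomputed as bitsets, and all names here are
illustrative rather than ART's.

  #include <cstddef>
  #include <vector>

  // A suspend check in `block` is redundant if a different block that
  // dominates it already performs one.
  bool SuspendCheckRedundant(int block,
                             const std::vector<std::vector<bool>>& dominators,
                             const std::vector<bool>& has_check) {
    for (std::size_t d = 0; d < dominators[block].size(); ++d) {
      if (static_cast<int>(d) != block && dominators[block][d] && has_check[d]) {
        return true;
      }
    }
    return false;
  }
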
a0cd2d701f29e0bc6275f1b13c0edfd4ec391879 01-Jun-2014 buzbee <buzbee@google.com> Quick compiler: reference cleanup

For 32-bit targets, object references are 32 bits wide both in
Dalvik virtual registers and in core physical registers. Because of
this, object references and non-floating point values were both
handled as if they had the same register class (kCoreReg).

However, for 64-bit systems, references are 32 bits in Dalvik vregs, but
64 bits in physical registers. Although the same underlying physical
core registers will still be used for object reference and non-float
values, different register class views will be used to represent them.
For example, an object reference in arm64 might be held in x3 at some
point, while the same underlying physical register, w3, would be used
to hold a 32-bit int.

This CL breaks apart the handling of object reference and non-float values
to allow the proper register class (or register view) to be used. A
new register class, kRefReg, is introduced which will map to a 32-bit
core register on 32-bit targets, and 64-bit core registers on 64-bit
targets. From this point on, object references should be allocated
registers in the kRefReg class rather than kCoreReg.

Change-Id: I6166827daa8a0ea3af326940d56a6a14874f5810
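
A sketch of the register-class selection this CL introduces (the enum names
come from the text above; the selection helper itself is illustrative):

  enum RegisterClass { kCoreReg, kFPReg, kRefReg };

  // kRefReg maps to a 32-bit core register on 32-bit targets and a
  // 64-bit core register on 64-bit targets.
  RegisterClass RegClassForValue(bool is_fp, bool is_ref) {
    if (is_fp) return kFPReg;
    return is_ref ? kRefReg : kCoreReg;
  }
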
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 03-Jun-2014 Tim Murray <timmurray@google.com> DO NOT MERGE

Merge ART from AOSP to lmp-preview-dev.

Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
ed65c5e982705defdb597d94d1aa3f2997239c9b 22-May-2014 Serban Constantinescu <serban.constantinescu@arm.com> AArch64: Enable LONG_* and INT_* opcodes.

This patch fixes some of the issues with LONG and INT opcodes. The patch
has been tested and passes all the dalvik tests except for 018 and 107.

Change-Id: Idd1923ed935ee8236ab0c7e5fa969eaefeea8708
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
082833c8d577db0b2bebc100602f31e4e971613e 18-May-2014 buzbee <buzbee@google.com> Quick compiler, out of registers fix

It turns out that the register pool sanity checker was not
working as expected, leaving some inconsistencies unreported.
This could result in "out of registers" failures, as well
as other more subtle problems.

This CL fixes the sanity checker, adds a lot more checks and cleans
up the previously undetected episodes of insanity.

Cherry-pick of internal change 468162

Change-Id: Id2da97e99105a4c272c5fd256205a94b904ecea8
05d3aeb33683b16837741f9348d6fba9a8432068 18-May-2014 buzbee <buzbee@google.com> Quick compiler, out of registers fix

Fixes b/15024623

It turns out that the register pool sanity checker was not
working as expected, leaving some inconsistencies unreported.
This CL fixes the sanity checker, adds a lot more checks and cleans
up the previously undetected episodes of insanity.

Change-Id: I4d67db864ca5926a1975db251e7e631b65a86275
b14329f90f725af0f67c45dfcb94933a426d63ce 15-May-2014 Andreas Gampe <agampe@google.com> ART: Fix MonitorExit code on ARM

We do not emit barriers on non-SMP systems. But on ARM, we have
places that need to conditionally execute, which is done through
an IT instruction. The guard of said instruction thus changes
between SMP and non-SMP systems.

To cleanly approach this, change the API so that GenMemBarrier
returns whether it generated an instruction. ARM will have to
query the result and update any dependent IT.

Throw a build system error if TARGET_CPU_SMP is not set.

Fix runtime/Android.mk to work with new multilib host.

Bug: 14989275
Change-Id: I9e611b770e8a1cd4ca19367d7dae0573ec08dc61
2f244e9faccfcca68af3c5484c397a01a1c3a342 08-May-2014 Andreas Gampe <agampe@google.com> ART: Add more ThreadOffset in Mir2Lir and backends

This duplicates all methods with ThreadOffset parameters, so that
both ThreadOffset<4> and ThreadOffset<8> can be handled. Dynamic
checks against the compilation unit's instruction set determine
which pointer size to use and therefore which methods to call.

Methods with unsupported pointer sizes should fatally fail, as
this indicates an issue during method selection.

Change-Id: Ifdb445b3732d3dc5e6a220db57374a55e91e1bf6
3bf7c60a86d49bf8c05c5d2ac5ca8e9f80bd9824 07-May-2014 Vladimir Marko <vmarko@google.com> Cleanup ARM load/store wide and remove unused param s_reg.

Use a single LDRD/VLDR instruction for wide load/store on
ARM, adjust the base pointer if needed. Remove unused
parameter s_reg from LoadBaseDisp(), LoadBaseIndexedDisp()
and StoreBaseIndexedDisp() on all architectures.

Change-Id: I25a9a42d523a68addbc11abe44ddc55a4401df98
455759b5702b9435b91d1b4dada22c4cce7cae3c 06-May-2014 Vladimir Marko <vmarko@google.com> Remove LoadBaseDispWide and StoreBaseDispWide.

Just pass k64 or kDouble to non-wide versions.

Change-Id: I000619c3b78d3a71db42edc747c8a0ba1ee229be
091cc408e9dc87e60fb64c61e186bea568fc3d3a 31-Mar-2014 buzbee <buzbee@google.com> Quick compiler: allocate doubles as doubles

Significant refactoring of register handling to unify usage across
all targets & 32/64 backends.

Reworked RegStorage encoding to allow expanded use of
x86 xmm registers; removed vector registers as a separate
register type. Reworked RegisterInfo to describe aliased
physical registers. Eliminated quite a bit of target-specific code
and generalized common code.

Use of RegStorage instead of int for registers now propagated down
to the NewLIRx() level. In future CLs, the NewLIRx() routines will
be replaced with versions that are explicit about what kind of
operand they expect (RegStorage, displacement, etc.). The goal
is to eventually use RegStorage all the way to the assembly phase.

TBD: MIPS needs verification.
TBD: Re-enable liveness tracking.

Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
7a11ab09f93f54b1c07c0bf38dd65ed322e86bc6 29-Apr-2014 buzbee <buzbee@google.com> Quick compiler: debugging assists

A few minor assists to ease A/B debugging in the Quick
compiler:
1. To save time, the assemblers for some targets only
update the object code offsets on instructions involved with
pc-relative fixups. We add code to fix up all offsets when
doing a verbose codegen listing.
2. Temp registers are normally allocated in a round-robin
fashion. When disabling liveness tracking, we now reset the
round-robin pool to 0 on each instruction boundary. This makes
it easier to spot real codegen differences.
3. Self-register copies were previously emitted, but
marked as nops. Minor change to avoid generating them in the
first place and reduce clutter.

Change-Id: I7954bba3b9f16ee690d663be510eac7034c93723
695d13a82d6dd801aaa57a22a9d4b3f6db0d0fdb 19-Apr-2014 buzbee <buzbee@google.com> Update load/store utilities for 64-bit backends

This CL replaces the typical use of LoadWord/StoreWord
utilities (which, in practice, were 32-bit load/store) in
favor of a new set that make the size explicit. We now have:

LoadWordDisp/StoreWordDisp:
32 or 64 depending on target. Load or store the natural
word size. Expect this to be used infrequently - generally
when we know we're dealing with a native pointer or flushed
register not holding a Dalvik value (Dalvik values will flush
to home location sizes based on Dalvik, rather than the target).

Load32Disp/Store32Disp:
Load or store 32 bits, regardless of target.

Load64Disp/Store64Disp:
Load or store 64 bits, regardless of target.

LoadRefDisp:
Load a 32-bit compressed reference, and expand it to the
natural word size in the target register.

StoreRefDisp:
Compress a reference held in a register of the natural word
size and store it as a 32-bit compressed reference.

Change-Id: I50fcbc8684476abd9527777ee7c152c61ba41c6f
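
What LoadRefDisp/StoreRefDisp accomplish, modeled in plain C++ for a 64-bit
target (the compressed-reference layout here is an assumption for
illustration, not ART's object format):

  #include <cstdint>
  #include <cstring>

  uint64_t LoadRefModel(const uint8_t* base, int32_t disp) {
    uint32_t compressed;
    std::memcpy(&compressed, base + disp, sizeof(compressed));  // 32-bit load
    return compressed;  // zero-extended to the natural word size
  }

  void StoreRefModel(uint8_t* base, int32_t disp, uint64_t ref) {
    uint32_t compressed = static_cast<uint32_t>(ref);  // compress to 32 bits
    std::memcpy(base + disp, &compressed, sizeof(compressed));
  }
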
3a74d15ccc9a902874473ac9632e568b19b91b1c 22-Apr-2014 Mingyao Yang <mingyao@google.com> Delete throw launchpads.

Bug: 13170824

Change-Id: I9d5834f5a66f5eb00f2ac80774e8c27dea99949e
80365d9bb947edef0eae0bfe62b9f7a239416e6b 18-Apr-2014 Mingyao Yang <mingyao@google.com> Revert "Revert "Use LIRSlowPath for throwing ArrayOutOfBoundsException.""

This adds back using LIRSlowPath for ArrayIndexOutOfBoundsException.
And fix the host test crash.

Change-Id: Idbb602f4bb2c5ce59233feb480a0ff1b216e4887
7fff544c38f0dec3a213236bb785c3ca13d21a0f 18-Apr-2014 Brian Carlstrom <bdc@google.com> Revert "Use LIRSlowPath for throwing ArrayOutOfBoundsException."

This reverts commit 9d46314a309aff327f9913789b5f61200c162609.
9d46314a309aff327f9913789b5f61200c162609 18-Apr-2014 Mingyao Yang <mingyao@google.com> Use LIRSlowPath for throwing ArrayOutOfBoundsException.

Get rid of launchpads for throwing ArrayOutOfBoundsException
and use LIRSlowPath instead.

Bug: 13170824
Change-Id: I0e27f7a261a6a7fb5c0645e6113a957e098f699e
e643a179cf5585ba6bafdd4fa51730d9f50c06f6 08-Apr-2014 Mingyao Yang <mingyao@google.com> Use LIRSlowPath for throwing NPE.

Get rid of launchpads for throwing NPE and use LIRSlowPath instead.
Also clean up some code of using LIRSlowPath for checking div
by zero.

Bug: 13170824

Change-Id: I0c20a49c39feff3eb1f147755e557d9bc0ff15bb
4289456fa265b833434c2a8eee9e7a16da31c524 07-Apr-2014 Mingyao Yang <mingyao@google.com> Use LIRSlowPath for throwing div by zero exception.

Get rid of launchpads for throwing div by zero exception and
use LIRSlowPath instead. Add a CallRuntimeHelper that takes no
argument for the runtime function.

Bug: 13170824
Change-Id: I7e0563e736c6f92bd63e3fbdfe3a777ad333e338
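
A self-contained model of the slow-path scheme these CLs migrate to (not
ART's actual classes): the fast path emits one conditional branch to an
out-of-line block, and all slow-path blocks are compiled after the method
body so the hot path stays short.

  #include <functional>
  #include <vector>

  struct SlowPath {
    int branch_id;               // the fast-path branch to bind later
    std::function<void()> emit;  // emits the out-of-line throw sequence
  };

  struct CodeGen {
    std::vector<SlowPath> slow_paths_;

    void GenDivZeroCheck(int divisor_reg) {
      int b = EmitBranchIfZero(divisor_reg);  // fast path: one cmp + branch
      slow_paths_.push_back({b, [] { /* call the throw-div-zero helper */ }});
    }

    void CompileSlowPaths() {  // run after the main body is emitted
      for (auto& sp : slow_paths_) {
        BindTarget(sp.branch_id);
        sp.emit();
      }
    }

    // Stubs so the sketch stands alone.
    int EmitBranchIfZero(int) { return 0; }
    void BindTarget(int) {}
  };
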
a1983d4dab10b0cc51e9d1b6bcafa9a723fabcd9 07-Apr-2014 buzbee <buzbee@google.com> Quick compiler: fix CmpLong pair handling

OpCmpLong wasn't properly extracting the low register of a
pair.

Change-Id: I6d6cc3de1f543f4316e561648f371f793502fddb
3da67a558f1fd3d8a157d8044d521753f3f99ac8 03-Apr-2014 Dave Allison <dallison@google.com> Add OpEndIT() for marking the end of OpIT blocks

In ARM we need to prevent code motion to the inside of an
IT block. This was done using a GenBarrier() to mark the end, but
it wasn't obvious that this is what was happening. This CL adds
an explicit OpEndIT() that takes the LIR of the OpIT for future
checks.

Bug: 13751744
Change-Id: If41d2adea1f43f11ebb3b72906bd308252ce3d01
f9719f9abbea060e086fe1304d72be50cbc8808e 02-Apr-2014 Zheng Xu <zheng.xu@arm.com> ARM: enable optimisation for easy multiply, add modulus pattern.

Fix the issue when src/dest registers overlap in easy multiply.

Change-Id: Ie8cc098c29c74fd06c1b67359ef94f2c6b88a71e
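
A sketch of the "easy multiply" idea (helper names are mine): a literal k is
easy when it is 0, a power of two, or a power of two plus or minus one, so
x*k reduces to shifts and adds/subs instead of a mul; the remainder support
then reuses the same expansion for the q*k term of x - (x/k)*k.

  #include <cstdint>

  bool IsPowerOfTwo(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

  bool IsEasyMultiply(uint32_t k) {
    return k == 0 || IsPowerOfTwo(k) || IsPowerOfTwo(k - 1) || IsPowerOfTwo(k + 1);
  }

  int32_t EasyMultiply(int32_t x, uint32_t k) {  // requires IsEasyMultiply(k)
    uint32_t ux = static_cast<uint32_t>(x);      // shift unsigned to avoid UB
    if (k == 0) return 0;
    if (IsPowerOfTwo(k)) return static_cast<int32_t>(ux << __builtin_ctz(k));
    if (IsPowerOfTwo(k - 1))                     // k = 2^n + 1: add + lsl
      return static_cast<int32_t>((ux << __builtin_ctz(k - 1)) + ux);
    return static_cast<int32_t>((ux << __builtin_ctz(k + 1)) - ux);  // 2^n - 1
  }
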
43a065ce1dda78e963868f9753a6e263721af927 02-Apr-2014 Dave Allison <dallison@google.com> Add GenBarrier() calls to terminate all IT blocks.

This is needed to prevent things like load hoisting from putting
instructions inside the IT block.

Bug: 13749123
Change-Id: I98a010453b163ac20a90f626144f798fc06e65a9
dd7624d2b9e599d57762d12031b10b89defc9807 15-Mar-2014 Ian Rogers <irogers@google.com> Allow mixing of thread offsets between 32 and 64bit architectures.

Begin a more full implementation x86-64 REX prefixes.
Doesn't implement 64bit thread offset support for the JNI compiler.

Change-Id: If9af2f08a1833c21ddb4b4077f9b03add1a05147
e2143c0a4af68c08e811885eb2f3ea5bfdb21ab6 28-Mar-2014 Ian Rogers <irogers@google.com> Revert "Revert "Optimize easy multiply and easy div remainder.""

This reverts commit 3654a6f50a948ead89627f398aaf86a2c2db0088.
Remove the part of the change that confused !is_div with being multiply rather
than implying remainder.

Change-Id: I202610069c69351259a320e8852543cbed4c3b3e
3441512d61ac192c1bf0b9b1eb696d5a8a8d677e 28-Mar-2014 Brian Carlstrom <bdc@google.com> Revert "Optimize easy multiply and easy div remainder."

This reverts commit 08df4b3da75366e5db37e696eaa7e855cba01deb.

(cherry picked from commit 3654a6f50a948ead89627f398aaf86a2c2db0088)

Change-Id: If8befd7c7135b9dfe3d3e9111768aba89aaa0863
3654a6f50a948ead89627f398aaf86a2c2db0088 28-Mar-2014 Brian Carlstrom <bdc@google.com> Revert "Optimize easy multiply and easy div remainder."

This reverts commit 08df4b3da75366e5db37e696eaa7e855cba01deb.
08df4b3da75366e5db37e696eaa7e855cba01deb 25-Mar-2014 Zheng Xu <zheng.xu@arm.com> Optimize easy multiply and easy div remainder.

Update OpRegRegShift and OpRegRegRegShift to use RegStorage parameters.
Add special cases for *0 and *1. Add more easy multiply special cases for
Arm.
Reuse easy multiply in SmallLiteralDivRem() to support remainder cases.

Change-Id: Icd76a993d3ac8d4988e9653c19eab4efca14fad0
2700f7e1edbcd2518f4978e4cd0e05a4149f91b6 07-Mar-2014 buzbee <buzbee@google.com> Continuing register cleanup

Ready for review.

Continue the process of using RegStorage rather than
ints to hold register value in the top layers of codegen.
Given the huge number of changes in this CL, I've attempted
to minimize the number of actual logic changes. With this
CL, the use of ints for registers has largely been eliminated
except in the lowest utility levels. "Wide" utility routines
have been updated to take a single RegStorage rather than
a pair of ints representing low and high registers.

Upcoming CLs will be smaller and more targeted. My expectations:
o Allocate float double registers as a single double rather than
a pair of float single registers.
o Refactor to push code which assumes long and double Dalvik
values are held in a pair of register to the target dependent
layer.
o Clean-up of the xxx_mir.h files to reduce the amount of #defines
for registers. May also do a register renumbering to bring all
of our targets' register naming more consistent. Possibly
introduce a target-independent float/non-float test at the
RegStorage level.

Change-Id: I646de7392bdec94595dd2c6f76e0f1c4331096ff
99ad7230ccaace93bf323dea9790f35fe991a4a2 26-Feb-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Relaxed memory barriers for x86

X86 provides stronger memory guarantees and thus the memory barriers can be
optimized. This patch ensures that all memory barriers for x86 are treated
as scheduling barriers. And in cases where a barrier is needed (StoreLoad case),
an mfence is used.

Change-Id: I13d02bf3f152083ba9f358052aedb583b0d48640
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
d7f8e02041e9d16160bc81bd1fa19189bffc04b3 13-Mar-2014 Zheng Xu <zheng.xu@arm.com> ARM: Do not allocate temp registers in MulLong if possible.

Just use rl_result if we have enough registers and it is *not* either operand.

Change-Id: I5a6f3ec09653b97e41bbc6dce823aa8534f98a13
b373e091eac39b1a79c11f2dcbd610af01e9e8a9 21-Feb-2014 Dave Allison <dallison@google.com> Implicit null/suspend checks (oat version bump)

This adds the ability to use SEGV signals
to throw NullPointerException exceptions from Java code rather
than having the compiler generate explicit comparisons and
branches. It does this by using sigaction to trap SIGSEGV and when triggered
makes sure it's in compiled code and if so, sets the return
address to the entry point to throw the exception.

It also uses this signal mechanism to determine whether to check
for thread suspension. Instead of the compiler generating calls
to a function to check for threads being suspended, the compiler
will now load indirect via an address in the TLS area. To trigger
a suspend, the contents of this address are changed from something
valid to 0. A SIGSEGV will occur and the handler will check
for a valid instruction pattern before invoking the thread
suspension check code.

If a user program traps SIGSEGV it will prevent our signal handler
from working. This will cause a failure in the runtime.

There are two signal handlers at present. You can control them
individually using the flag -implicit-checks: on the runtime
command line. This takes a string parameter: a comma-separated
set of strings. Each can be one of:

none switch off
null null pointer checks
suspend suspend checks
all all checks

So to switch only suspend checks on, pass:
-implicit-checks:suspend

There is also -explicit-checks to provide the reverse once
we change the default.

For dalvikvm, pass --runtime-arg -implicit-checks:foo,bar

The default is -implicit-checks:none

There is also a property 'dalvik.vm.implicit_checks' whose value is the same
string as the command option. The default is 'none'. For example to switch on
null checks using the option:

setprop dalvik.vm.implicit_checks null

It only works for ARM right now.

Bumps OAT version number due to change to Thread offsets.

Bug: 13121132
Change-Id: If743849138162f3c7c44a523247e413785677370
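
A minimal sketch of the signal plumbing described above (the handler body is
illustrative; the real runtime decodes the faulting context and checks
compiled-code maps before redirecting anything):

  #include <csignal>
  #include <ucontext.h>

  static void ArtFaultHandler(int sig, siginfo_t* info, void* raw_ctx) {
    ucontext_t* ctx = static_cast<ucontext_t*>(raw_ctx);
    // 1. Verify the faulting PC lies inside compiled code (omitted).
    // 2. Null check: point the return address at the NPE-throwing entrypoint.
    // 3. Suspend check: verify the expected load-from-TLS instruction
    //    pattern before invoking the suspension code.
    (void)sig; (void)info; (void)ctx;
  }

  void InstallFaultHandler() {
    struct sigaction sa = {};
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = ArtFaultHandler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, nullptr);
  }
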
a1a7074eb8256d101f7b5d256cda26d7de6ce6ce 03-Mar-2014 Vladimir Marko <vmarko@google.com> Rewrite kMirOpSelect for all IF_ccZ opcodes.

Also improve special cases for ARM and add tests.

Change-Id: I06f575b9c7b547dbc431dbfadf2b927151fe16b9
00e1ec6581b5b7b46ca4c314c2854e9caa647dd2 28-Feb-2014 Bill Buzbee <buzbee@android.com> Revert "Revert "Rework Quick compiler's register handling""

This reverts commit 86ec520fc8b696ed6f164d7b756009ecd6e4aace.

Ready. Fixed the original typo, plus some mechanical changes
for rebasing.

Still needs additional testing, but the problem with the original
CL appears to have been a typo in the definition of the x86
double return template RegLocation.

Change-Id: I828c721f91d9b2546ef008c6ea81f40756305891
dbb8c49d540edd2a39076093163c7218f03aa502 28-Feb-2014 Vladimir Marko <vmarko@google.com> Remove non-existent ARM insn kThumb2SubsRRI12.

For kOpSub/kOpAdd, prefer modified immediate encodings
because they set flags.

Change-Id: I41dcd2d43ba1e62120c99eaf9106edc61c41e157
86ec520fc8b696ed6f164d7b756009ecd6e4aace 26-Feb-2014 Bill Buzbee <buzbee@android.com> Revert "Rework Quick compiler's register handling"

This reverts commit 2c1ed456dcdb027d097825dd98dbe48c71599b6c.

Change-Id: If88d69ba88e0af0b407ff2240566d7e4545d8a99
2c1ed456dcdb027d097825dd98dbe48c71599b6c 20-Feb-2014 buzbee <buzbee@google.com> Rework Quick compiler's register handling

For historical reasons, the Quick backend found it convenient
to consider all 64-bit Dalvik values held in registers
to be contained in a pair of 32-bit registers. Though this
worked well for ARM (with double-precision registers also
treated as a pair of 32-bit single-precision registers) it doesn't
play well with other targets. And, it is somewhat problematic
for 64-bit architectures.

This is the first of several CLs that will rework the way the
Quick backend deals with physical registers. The goal is to
eliminate the "64-bit value backed with 32-bit register pair"
requirement from the target-independent portions of the backend
and support 64-bit registers throughout.

The key RegLocation struct, which describes the location of
Dalvik virtual register & register pairs, previously contained
fields for high and low physical registers. The low_reg and
high_reg fields are being replaced with a new type: RegStorage.
There will be a single instance of RegStorage for each RegLocation.
Note that RegStorage does not increase the space used. It is
16 bits wide, the same as the sum of the 8-bit low_reg and
high_reg fields.

At a target-independent level, it will describe whether the physical
register storage associated with the Dalvik value is a single 32
bit, single 64 bit, pair of 32 bit or vector. The actual register
number encoding is left to the target-dependent code layer.

Because physical register handling is pervasive throughout the
backend, this restructuring necessarily involves large CLs with
lots of changes. I'm going to roll these out in stages, and
attempt to segregate the CLs with largely mechanical changes from
those which restructure or rework the logic.

This CL is of the mechanical change variety - it replaces low_reg
and high_reg from RegLocation and introduces RegStorage. It also
includes a lot of new code (such as many calls to GetReg())
that should go away in upcoming CLs.

The tentative plan for the subsequent CLs is:

o Rework standard register utilities such as AllocReg() and
FreeReg() to use RegStorage instead of ints.
o Rework the target-independent GenXXX, OpXXX, LoadValue,
StoreValue, etc. routines to take RegStorage rather than
int register encodings.
o Take advantage of the vector representation and eliminate
the current vector field in RegLocation.
o Replace the "wide" variants of codegen utilities that take
low_reg/high_reg pairs with versions that use RegStorage.
o Add 64-bit register target independent codegen utilities
where possible, and where not virtualize with 32-bit general
register and 64-bit general register variants in the target
dependent layer.
o Expand/rework the LIR def/use flags to allow for more registers
(currently, we lose out on 16 MIPS floating point regs as
well as ARM's D16..D31 for lack of space in the masks).
o [Possibly] move the float/non-float determination of a register
from the target-dependent encoding to RegStorage. In other
words, replace IsFpReg(register_encoding_bits).

At the end of the day, all code in the target independent layer
should be using RegStorage, as should much of the target dependent
layer. Ideally, we won't be using the physical register number
encoding extracted from RegStorage (i.e. GetReg()) until the
NewLIRx() layer.

Change-Id: Idc5c741478f720bdd1d7123b94e4288be5ce52cb
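
A toy version of the 16-bit RegStorage described above; only the overall
width and the shape/number split follow the text, while the exact bit layout
below is invented for illustration.

  #include <cstdint>

  class RegStorage {
   public:
    enum Shape : uint16_t { k32Bit = 0x1000, k64Bit = 0x2000, kPair = 0x3000 };

    constexpr RegStorage(Shape shape, uint16_t reg)
        : bits_(static_cast<uint16_t>(shape | (reg & 0x00FF))) {}

    uint16_t GetReg() const { return bits_ & 0x00FF; }  // target encoding
    bool Is64Bit() const { return (bits_ & 0xF000) == k64Bit; }

   private:
    uint16_t bits_;  // 16 bits total, the same budget as low_reg + high_reg
  };
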
80b7f4f217958df6950291a5ae861249bf5b943d 13-Feb-2014 Vladimir Marko <vmarko@google.com> am 47c42cae: am 76559681: Merge "Generate ARM special methods from InlineMethod data."

* commit '47c42caef35bdc29229b2714d78e48fbd7dc57e6':
Generate ARM special methods from InlineMethod data.
502c2a84888b7da075049dcaaeb0156602304f65 06-Feb-2014 Vladimir Marko <vmarko@google.com> Generate ARM special methods from InlineMethod data.

Change-Id: I204b01660a1e515879524018d1371e31f41da59b
4708dcd68eebf1173aef1097dad8ab13466059aa 22-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Improve x86 long multiply and shifts

Generate inline code for long shifts by constants and do long
multiplication inline. Convert multiplication by a constant to a
shift when we can. Fix some x86 assembler problems and add the new
instructions that were needed (64 bit shifts).

Change-Id: I6237a31c36159096e399d40d01eb6bfa22ac2772
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
2bf31e67694da24a19fc1f328285cebb1a4b9964 23-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Improve x86 long divide

Implement inline division for literal and variable divisors. Use the
general case for dividing by a literal by using a double length multiply
by the appropriate constant with fixups. This is the Hacker's Delight
algorithm.

Change-Id: I563c250f99d89fca5ff8bcbf13de74de13815cfe
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
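
The "double length multiply by the appropriate constant with fixups" is the
classic Hacker's Delight signed magic-number division; the following is the
textbook algorithm, not ART's exact code (requires 2 <= |d| and d != INT32_MIN):

  #include <cstdint>
  #include <cstdlib>

  struct Magic { int32_t m; int s; };

  Magic ComputeSignedMagic(int32_t d) {
    const uint32_t two31 = 0x80000000u;
    uint32_t ad = static_cast<uint32_t>(std::abs(d));
    uint32_t t = two31 + (static_cast<uint32_t>(d) >> 31);
    uint32_t anc = t - 1 - t % ad;  // absolute value of the "nc" bound
    uint32_t q1 = two31 / anc, r1 = two31 - q1 * anc;
    uint32_t q2 = two31 / ad,  r2 = two31 - q2 * ad;
    int p = 31;
    uint32_t delta;
    do {
      ++p;
      q1 *= 2; r1 *= 2;
      if (r1 >= anc) { ++q1; r1 -= anc; }
      q2 *= 2; r2 *= 2;
      if (r2 >= ad)  { ++q2; r2 -= ad; }
      delta = ad - r2;
    } while (q1 < delta || (q1 == delta && r1 == 0));
    int32_t m = static_cast<int32_t>(q2 + 1);
    return Magic{d < 0 ? -m : m, p - 32};
  }

  int32_t DivByConst(int32_t n, int32_t d) {  // reference use of the pair
    Magic mg = ComputeSignedMagic(d);
    int32_t q = static_cast<int32_t>((static_cast<int64_t>(mg.m) * n) >> 32);
    if (d > 0 && mg.m < 0) q += n;  // fixups per Hacker's Delight
    if (d < 0 && mg.m > 0) q -= n;
    q >>= mg.s;
    return q + static_cast<int32_t>(static_cast<uint32_t>(q) >> 31);
  }
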
e02d48fb24747f90fd893e1c3572bb3c500afced 15-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Optimize x86 long arithmetic

Be smarter about taking advantage of a constant operand for x86 long
add/sub/and/or/xor. Using instructions with immediates and generating
results directly into memory reduces the number of temporary registers
and avoids hardcoded register usage.

Also rewrite the existing non-const x86 arithmetic to avoid fixed
register use, and use the fact that x86 instructions are two operand.
Pass the opcode to the XXXLong() routines to easily detect two operand
DEX opcodes.

Add a new StoreFinalValueWide() routine, which is similar to StoreValueWide,
but doesn't do an EvalLoc to allocate registers. The src operand must
already be in registers, and it just updates the dest location, and
calls the right live/dirty routines to get the src into the dest
properly.

Change-Id: Iefc16e7bc2236a73dc780d3d5137ae8343171f62
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
a894607bca7eb623bc957363e4b36f44cfeea1b6 22-Jan-2014 Vladimir Marko <vmarko@google.com> Move fused cmp branch ccode to MIR::meta.

This a small refactoring towards removing the large
DecodedInstruction from the MIR class.

Change-Id: I10f9ed5eaac42511d864c71d20a8ff6360292cec
58af1f9385742f70aca4fcb5e13aba53b8be2ef4 19-Dec-2013 Vladimir Marko <vmarko@google.com> Clean up usage of carry flag condition codes.

On X86, kCondUlt and kCondUge are bound to CS and CC,
respectively, while on ARM it's the other way around. The
explicit binding in ConditionCode was wrong and misleading
and could lead to subtle bugs. Therefore, we detach those
constants and clean up usage. The CS and CC conditions are
now effectively unused but we keep them around as they may
eventually be useful.

And some minor cleanup and comments.

Change-Id: Ic5ed81d86b6c7f9392dd8fe9474b3ff718fee595
b122a4bbed34ab22b4c1541ee25e5cf22f12a926 20-Nov-2013 Ian Rogers <irogers@google.com> Tidy up memory barriers.

Change-Id: I937ea93e6df1835ecfe2d4bb7d84c24fe7fc097b
3e5af82ae1a2cd69b7b045ac008ac3b394d17f41 21-Nov-2013 Vladimir Marko <vmarko@google.com> Intrinsic Unsafe.CompareAndSwapLong() for ARM.

(cherry picked from cb53fcd79b1a5ce608208ec454b5c19f64aaba37)

Change-Id: Iadd3cc8b4ed390670463b80f8efd579ce6ece226
1c282e2b9a9b432e132b2c332f861cad9feb4a73 21-Nov-2013 Vladimir Marko <vmarko@google.com> Refactor intrinsic CAS, prepare for 64-bit version.

Bug: 11391018
Change-Id: Ic0f740e0cd0eb47f2c915f81be02f52f7721f8a3
2247984899247b1402408d39731ff64048f0e274 19-Nov-2013 Vladimir Marko <vmarko@google.com> Clean up kOpCmp on ARM.

kThumb2CmnRI8M is now used.

Change-Id: I300299258ed99d86c300dee45c904c360dd44638
332b7aa6220124dc638b9f7e59611c376473f128 18-Nov-2013 Vladimir Marko <vmarko@google.com> Improve Thumb2 instructions' use of constant operands.

Rename instructions using modified immediate to use suffix
I8M. Many were using I8 which may lead to confusion with
Thumb I8 instructions and some were using other suffixes.

Add and use CmnRI8M, increase constant range of AddRRI12 and
SubRRI12 and use BicRRI8M for applicable kOpAnd constants.
In particular, this should marginally improve Math.abs(float)
and Math.abs(double) by converting x & 0x7fffffff to BIC.

Bug: 11579369

Change-Id: I0f17a9eb80752d2625730a60555152cdffed50ba
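
The BIC conversion mentioned above, in C terms (a sketch, not ART's
emitter): 0x7fffffff is not a Thumb2 modified immediate, but its complement
0x80000000 is, so the AND becomes a single BIC with no constant load.

  #include <cstdint>

  // x & 0x7fffffff  ==  x & ~0x80000000u  ->  bic rd, rn, #0x80000000
  uint32_t AbsBits(uint32_t float_bits) { return float_bits & ~0x80000000u; }
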
7020278bce98a0735dc6abcbd33bdf1ed2634f1d 23-Oct-2013 Dave Allison <dallison@google.com> Support hardware divide instruction

Bug: 11299025

Uses sdiv for division and a combo of sdiv, mul and sub for modulus.
Only does this on processors that are capable of the sdiv instruction, as determined
by the build system.

Also provides a command line arg --instruction-set-features= to allow cross compilation.
Makefile adds the --instruction-set-features= arg to build-time dex2oat runs and defaults
it to something obtained from the target architecture.

Provides a GetInstructionSetFeatures() function on CompilerDriver that can be
queried for various features. The only feature supported right now is hasDivideInstruction().

Also adds a few more instructions to the ARM disassembler

b/11535253 is an addition to this CL to be done later.

Change-Id: Ia8aaf801fd94bc71e476902749cf20f74eba9f68
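
The modulus combo described above, written out (assuming d != 0): with a
hardware sdiv but no hardware remainder, the compiler emits a divide, a
multiply and a subtract (mls on ARM folds the last two).

  #include <cstdint>

  int32_t HwRem(int32_t n, int32_t d) {
    int32_t q = n / d;  // sdiv
    return n - q * d;   // mls: n - q*d
  }
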
e508a2090b19fe705fbc6b99d76474037a74bbfb 04-Nov-2013 Vladimir Marko <vmarko@google.com> Fix unaligned Memory peek/poke intrinsics.

Change-Id: Id454464d0b28aa37f5239f1c6589ceb0b3bbbdea
0d82948094d9a198e01aa95f64012bdedd5b6fc9 12-Oct-2013 buzbee <buzbee@google.com> 64-bit prep

Preparation for 64-bit roll.
o Eliminated storing pointers in 32-bit int slots in LIR.
o General size reductions of common structures to reduce impact
of doubled pointer sizes:
- BasicBlock struct was 72 bytes, now is 48.
- MIR struct was 72 bytes, now is 64.
- RegLocation was 12 bytes, now is 8.
o Generally replaced uses of BasicBlock* pointers with 16-bit Ids.
o Replaced several doubly-linked lists with singly-linked to save
one stored pointer per node.
o We had quite a few uses of uintptr_t's that were a holdover from
the JIT (which used pointers to mapped dex & actual code cache
addresses rather than trace-relative offsets). Replaced those with
uint32_t's.
o Clean up handling of embedded data for switch tables and array data.
o Miscellaneous cleanup.

I anticipate one or two additional CLs to reduce the size of MIR and LIR
structs.

Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
379067c3970fb225332cca25301743f5010d3ef9 16-Oct-2013 Ian Rogers <irogers@google.com> Don't clobber array reg if it's needed for card marking

Change-Id: I4377717a2431ffd7e8fafc2e2cca7c1285b38668
773aab1e8992b2834153eb23c976a4eb0da51a71 14-Oct-2013 Ian Rogers <irogers@google.com> Correct free-ing of temp register.

Bug 11199874.
The card mark was potentially using a register freed just before. Make the
free-ing of temps strongly correspond to their allocation.

Change-Id: I3d1e8c923b7fd8b3666e841d3ff9a46e6eb58318
a9a8254c920ce8e22210abfc16c9842ce0aea28f 04-Oct-2013 Ian Rogers <irogers@google.com> Improve quick codegen for aput-object.

1) don't type check known null.
2) if we know types in verify don't check at runtime.
3) if we're runtime checking then move all the code out-of-line.

Also, don't set up a callee-save frame for check-cast, do an instance-of test
then throw an exception if that fails.
Tidy the quick entry point Ldivmod to Lmod, which it already is on x86 and mips.
Fix monitor-enter/exit NPE for MIPS.
Fix benign bug in mirror::Class::CannotBeAssignedFromOtherTypes, a byte[]
cannot be assigned to from other types.

Change-Id: I9cb3859ec70cca71ed79331ec8df5bec969d6745
d9c4fc94fa618617f94e1de9af5f034549100753 02-Oct-2013 Ian Rogers <irogers@google.com> Inflate contended lock word by suspending owner.

Bug 6961405.
Don't inflate monitors for Notify and NotifyAll.
Tidy lock word, handle recursive lock case alongside unlocked case and move
assembly out of line (except for ARM quick). Also handle null in out-of-line
assembly as the test is quick and the enter/exit code is already a safepoint.
To gain ownership of a monitor on behalf of another thread, monitor contenders
must not hold the monitor_lock_, so they wait on a condition variable.
Reduce size of per mutex contention log.
Be consistent in calling thin lock thread ids just thread ids.
Fix potential thread death races caused by the use of FindThreadByThreadId,
make it invariant that returned threads are either self or suspended now.

Code size reduction on ARM boot.oat 0.2%.
Old nexus 7 speedup 0.25%, new nexus 7 speedup 1.4%, nexus 10 speedup 2.24%,
nexus 4 speedup 2.09% on DeltaBlue.

Change-Id: Id52558b914f160d9c8578fdd7fc8199a9598576a
b48819db07f9a0992a72173380c24249d7fc648a 15-Sep-2013 buzbee <buzbee@google.com> Compile-time tuning: assembly phase

Not as much compile-time gain from reworking the assembly phase as I'd
hoped, but still worthwhile. Should see ~2% improvement thanks to
the assembly rework. On the other hand, expect some huge gains for some
applications thanks to better detection of large machine-generated init
methods. Thinkfree shows a 25% improvement.

The major assembly change was to thread the LIR nodes that
require fixup into a fixup chain. Only those are processed during the
final assembly pass(es). This doesn't help for methods which only
require a single pass to assemble, but does speed up the larger methods
which required multiple assembly passes.

Also replaced the block_map_ basic block lookup table (which contained
space for a BasicBlock* for each dex instruction unit) with a block id
map - cutting its space requirements by half in a 32-bit pointer
environment.

Changes:
o Reduce size of LIR struct by 12.5% (one of the big memory users)
o Repurpose the use/def portion of the LIR after optimization complete.
o Encode instruction bits to LIR
o Thread LIR nodes requiring pc fixup
o Change follow-on assembly passes to only consider fixup LIRs
o Switch on pc-rel fixup kind
o Fast-path for small methods - single pass assembly
o Avoid using cb[n]z for null checks (displacement is almost always exceeded)
o Improve detection of large initialization methods.
o Rework def/use flag setup.
o Remove a sequential search from FindBlock using lookup table of 16-bit
block ids rather than full block pointers.
o Eliminate pcRelFixup and use fixup kind instead.
o Add check for 16-bit overflow on dex offset.

Change-Id: I4c6615f83fed46f84629ad6cfe4237205a9562b4
4858e1dbfbccb287ad57868230a9e79011483c2a 13-Sep-2013 Jeff Hao <jeffhao@google.com> Make inlined CAS32 loop until store is successful if values match.

The native implementation of compareAndSwap uses android_atomic_cas,
which will repeat the strex until it succeeds. The compiled version
was changed to do the same.

Bug: 10530407
Change-Id: I7efb3f92d0d0610fcc5a885e2c97f1d701b5a4ea
(cherry picked from commit 2de2aa1a96dfa5bebc004f29b5dbfafd37039cee)
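
The retry semantics described above map directly onto C++'s weak
compare-exchange, where spurious failure plays the role of a failing strex;
a portable model of the fixed behavior (not the generated code):

  #include <atomic>
  #include <cstdint>

  bool Cas32(std::atomic<int32_t>& word, int32_t expected, int32_t desired) {
    int32_t observed = expected;
    while (!word.compare_exchange_weak(observed, desired)) {
      if (observed != expected) {
        return false;  // values really don't match: give up
      }
      // observed == expected: the store (strex) failed spuriously -> retry
    }
    return true;
  }
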
31fb301f2b711607cfaf399f4163d48b6e2cf649 12-Sep-2013 Jeff Hao <jeffhao@google.com> Revert "Fix CAS intrinsic to clear exclusive if values don't match."

Ian is correct. I can still see this bug even with this change.

This reverts commit 3a0831507637028a439712dedaaddd7cd0893995.

Change-Id: I780f2de926f1ff7576adc679c56a6cf491dad127
6fc9251ae4f34c31351d9d902dd6c6cbc7baba1c 13-Sep-2013 Jeff Hao <jeffhao@google.com> Make inlined CAS32 loop until store is successful if values match.

The native implementation of compareAndSwap uses android_atomic_cas,
which will repeat the strex until it succeeds. The compiled version
was changed to do the same.

Bug: 10530407
Change-Id: I7efb3f92d0d0610fcc5a885e2c97f1d701b5a4ea
(cherry picked from commit 2de2aa1a96dfa5bebc004f29b5dbfafd37039cee)
2de2aa1a96dfa5bebc004f29b5dbfafd37039cee 13-Sep-2013 Jeff Hao <jeffhao@google.com> Make inlined CAS32 loop until store is successful if values match.

The native implementation of compareAndSwap uses android_atomic_cas,
which will repeat the strex until it succeeds. The compiled version
was changed to do the same.

Bug: 10530407
Change-Id: I7efb3f92d0d0610fcc5a885e2c97f1d701b5a4ea
95848d01adae14c6a9ba433f6789a9462edb8e7d 12-Sep-2013 Jeff Hao <jeffhao@google.com> Revert "Fix CAS intrinsic to clear exclusive if values don't match."

Ian is correct. I can still see this bug even with this change.

This reverts commit 3a0831507637028a439712dedaaddd7cd0893995.

Change-Id: I780f2de926f1ff7576adc679c56a6cf491dad127
3a0831507637028a439712dedaaddd7cd0893995 12-Sep-2013 Jeff Hao <jeffhao@google.com> Fix CAS intrinsic to clear exclusive if values don't match.

The LDREX has a matching STREX if the values match, but it needed
a CLREX for the case where they didn't.

Bug: 10530407
Change-Id: I46b474cca326a251536e7f214c80486694431386
(cherry picked from commit 78765e84a3654357a03f84b76985556cf7d9731a)
78765e84a3654357a03f84b76985556cf7d9731a 12-Sep-2013 Jeff Hao <jeffhao@google.com> Fix CAS intrinsic to clear exclusive if values don't match.

The LDREX has a matching STREX if the values match, but it needed
a CLREX for the case where they didn't.

Bug: 10530407
Change-Id: I46b474cca326a251536e7f214c80486694431386
252254b130067cd7a5071865e793966871ae0246 09-Sep-2013 buzbee <buzbee@google.com> More Quick compile-time tuning: labels & branches

This CL represents a roughly 3.5% performance improvement for the
compile phase of dex2oat. Most of the gain comes from avoiding
the generation of dex boundary LIR labels unless a debug listing
is requested. The other significant change is moving from a basic block
ending branch model of "always generate a fall-through branch, and then
delete it if we can" to an "only generate a fall-through branch if we need
it" model.

The data motivating these changes follow. Note that two areas of
potentially attractive gain remain: restructuring the assembler model and
reworking the register handling utilities. These will be addressed
in subsequent CLs.

--- data follows

The Quick compiler's assembler has shown up on profile reports a bit
more than seems reasonable. We've tried a few quick fixes to apparently
hot portions of the code, but without much gain. So, I've been looking at
the assembly process at a somewhat higher level. There look to be several
potentially good opportunities.

First, an analysis of the makeup of the LIR graph showed a surprisingly
high proportion of LIR pseudo ops. Using the boot classpath as a basis,
we get:

32.8% of all LIR nodes are pseudo ops.
10.4% are LIR instructions which require pc-relative fixups.
11.8% are LIR instructions that have been nop'd by the various
optimization passes.

Looking only at the LIR pseudo ops, we get:
kPseudoDalvikByteCodeBoundary 43.46%
kPseudoNormalBlockLabel 21.14%
kPseudoSafepointPC 20.20%
kPseudoThrowTarget 6.94%
kPseudoTarget 3.03%
kPseudoSuspendTarget 1.95%
kPseudoMethodExit 1.26%
kPseudoMethodEntry 1.26%
kPseudoExportedPC 0.37%
kPseudoCaseLabel 0.30%
kPseudoBarrier 0.07%
kPseudoIntrinsicRetry 0.02%
Total LIR count: 10167292

The standout here is the Dalvik opcode boundary marker. This is just a
label inserted at the beginning of the codegen for each Dalvik bytecode.
If we're also doing a verbose listing, this is also where we hang the
pretty-print disassembly string. However, this label was also
being used as a convenient way to find the target of switch case
statements (and, I think at one point was used in the Mir->GBC conversion
process).

This CL moves the use of kPseudoDalvikByteCodeBoundary labels to only
verbose listing runs, and replaces the codegen uses of the label with
the kPseudoNormalBlockLabel attached to the basic block that contains the
switch case target. Great savings here - 14.3% reduction in the number of
LIR nodes needed. After this CL, our LIR pseudo proportions drop to 21.6%
of all LIR. That's still a lot, but much better. Possible further
improvements via combining normal labels with kPseudoSafepointPC labels
where appropriate, and also perhaps reduce memory usage by using a
short-hand form for labels rather than a full LIR node. Also, many
of the basic block labels are no longer branch targets by the time
we get to assembly - cheaper to delete, or just ignore?

Here's the "after" LIR pseudo op breakdown:

kPseudoNormalBlockLabel 37.39%
kPseudoSafepointPC 35.72%
kPseudoThrowTarget 12.28%
kPseudoTarget 5.36%
kPseudoSuspendTarget 3.45%
kPseudoMethodEntry 2.24%
kPseudoMethodExit 2.22%
kPseudoExportedPC 0.65%
kPseudoCaseLabel 0.53%
kPseudoBarrier 0.12%
kPseudoIntrinsicRetry 0.04%
Total LIR count: 5748232

Not done in this CL, but it will be worth experimenting with actually
deleting LIR nodes from the graph when they are optimized away, rather
than just setting the NOP bit. Keeping them around is invaluable
during debugging - but when not debugging it may pay off if the cost of
node removal is less than the cost of traversing through dead nodes
in subsequent passes.

Next up (and partially in this CL - but mostly to be done in follow-on
CLs) is the overall assembly process. Inherited from the trace JIT,
the Quick compiler has a fairly simple-minded approach to instruction
assembly. First, a pass is made over the LIR list to assign offsets
to each instruction. Then, the assembly pass is made - which generates
the actual machine instruction bit patterns and pushes the instruction
data into the code_buffer. However, the code generator takes the "always
optimistic" approach to instruction selection and emits the shortest
instruction. If, during assembly, we find that a branch or load doesn't
reach, that short-form instruction is replaced with a longer sequence.

Of course, this invalidates the previously-computed offset calculations.
Assembly thus is an iterative process: compute offsets and then assemble
until we survive an assembly pass without invalidation. This seems
like a likely candidate for improvement. First, I analyzed the
number of retries required, and the reason for invalidation over the
boot classpath load.

The results: more than half of methods don't require a retry, and
very few require more than 1 extra pass:

5 or more: 6 of 96334
4 or more: 22 of 96334
3 or more: 140 of 96334
2 or more: 1794 of 96334 - 2%
1 or more: 40911 of 96334 - 40%
0 retries: 55423 of 96334 - 58%

The interesting group here is the one that requires 1 retry. Looking
at the reason, we see three typical reasons:

1. A cbnz/cbz doesn't reach (only 7 bits of offset)
2. A 16-bit Thumb1 unconditional branch doesn't reach.
3. An unconditional branch which branches to the next instruction
is encountered, and deleted.

The first 2 cases are the cost of the optimistic strategy - nothing
much to change there. However, the interesting case is #3 - dead
branch elimination. A further analysis of the single retry group showed
that 42% of the methods (16305) that required a single retry did so
*only* because of dead branch elimination. The big question here is
why so many dead branches survive to the assembly stage. We have
a dead branch elimination pass which is supposed to catch these - perhaps
it's not working correctly, should be moved later in the optimization
process, or perhaps run multiple times.

Other things to consider:

o Combine the offset generation pass with the assembly pass. Skip
pc-relative fixup assembly (other than assigning offset), but push
LIR* for them into work list. Following the main pass, zip through
the work list and assemble the pc-relative instructions (now that we
know the offsets). This would significantly cut back on traversal
costs.

o Store the assembled bits into both the code buffer and the LIR.
In the event we have to retry, only the pc-relative instructions
would need to be assembled, and we'd finish with a pass over the
LIR just to dump the bits into the code buffer.

Change-Id: I50029d216fa14f273f02b6f1c8b6a0dde5a7d6a6
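
The optimistic assemble/retry loop this analysis centers on, in miniature
(names invented): emit short forms first; whenever one must be widened, all
later offsets are stale, so recompute them and run another pass.

  struct Assembler {
    void AssignOffsets() { /* walk the LIR, assign instruction offsets */ }
    bool AssembleOnce() {
      // Encode each instruction; return false if any short form had to be
      // widened (e.g. a branch or load that didn't reach).
      return true;
    }
    void Assemble() {
      AssignOffsets();
      while (!AssembleOnce()) {  // survived a pass without invalidation?
        AssignOffsets();         // widening moved everything after it
      }
    }
  };
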
11b63d13f0a3be0f74390b66b58614a37f9aa6c1 27-Aug-2013 buzbee <buzbee@google.com> Quick compiler: division by literal fix

The constant propagation optimization pass attempts to identify
constants in Dalvik virtual registers and handle them more efficiently.
The use of small constants in division, though, was handled incorrectly
in that the high level code correctly detected the use of a constant,
but the actual code generation routine was only expecting the use of
a special constant form opcode.

see b/10503566

Change-Id: I88aa4d2eafebb2b1af1a1e88049f1845aefae261
468532ea115657709bc32ee498e701a4c71762d4 05-Aug-2013 Ian Rogers <irogers@google.com> Entry point clean up.

Create set of entry points needed for image methods to avoid fix-up at load time:
- interpreter - bridge to interpreter, bridge to compiled code
- jni - dlsym lookup
- quick - resolution and bridge to interpreter
- portable - resolution and bridge to interpreter

Fix JNI work around to use JNI work around argument rewriting code that'd been
accidentally disabled.
Remove abstact method error stub, use interpreter bridge instead.
Consolidate trampoline (previously stub) generation in generic helper.
Simplify trampolines to jump directly into assembly code, keeps stack crawlable.
Dex: replace use of int with ThreadOffset for values that are thread offsets.
Tidy entry point routines between interpreter, jni, quick and portable.

Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e
(cherry picked from commit 848871b4d8481229c32e0d048a9856e5a9a17ef9)
848871b4d8481229c32e0d048a9856e5a9a17ef9 05-Aug-2013 Ian Rogers <irogers@google.com> Entry point clean up.

Create set of entry points needed for image methods to avoid fix-up at load time:
- interpreter - bridge to interpreter, bridge to compiled code
- jni - dlsym lookup
- quick - resolution and bridge to interpreter
- portable - resolution and bridge to interpreter

Fix JNI work around to use JNI work around argument rewriting code that'd been
accidentally disabled.
Remove abstract method error stub, use interpreter bridge instead.
Consolidate trampoline (previously stub) generation in generic helper.
Simplify trampolines to jump directly into assembly code, keeps stack crawlable.
Dex: replace use of int with ThreadOffset for values that are thread offsets.
Tidy entry point routines between interpreter, jni, quick and portable.

Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e
834b394ee759ed31c5371d8093d7cd8cd90014a8 31-Jul-2013 Brian Carlstrom <bdc@google.com> Merge remote-tracking branch 'goog/dalvik-dev' into merge-art-to-dalvik-dev

Change-Id: I323e9e8c29c3e39d50d9aba93121b26266c52a46
7655f29fabc0a12765de828914a18314382e5a35 29-Jul-2013 Ian Rogers <irogers@google.com> Portable refactorings.

Separate quick from portable entrypoints.
Move architectural dependencies into arch.

Change-Id: I9adbc0a9782e2959fdc3308215f01e3107632b7c
166db04e259ca51838c311891598664deeed85ad 26-Jul-2013 Ian Rogers <irogers@google.com> Move assembler out of runtime into compiler/utils.

Other directory layout bits of clean up. There is still work to separate quick
and portable in some files (e.g. argument visitor, proxy..).

Change-Id: If8fecffda8ba5c4c47a035f0c622c538c6b58351
7934ac288acfb2552bb0b06ec1f61e5820d924a4 26-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/comments issues

Change-Id: Iae286862c85fb8fd8901eae1204cd6d271d69496
4274889d48ef82369bf2c1ca70d84689b4f9e93a 19-Jul-2013 Brian Carlstrom <bdc@google.com> Fixing cpplint readability/check issues

Change-Id: Ia81db7238b4a13ff2e585aaac9d5e3e91df1e3e0
df62950e7a32031b82360c407d46a37b94188fbb 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/parens issues

Change-Id: Ifc678d59a8bed24ffddde5a0e543620b17b0aba9
b1eba213afaf7fa6445de863ddc9680ab99762ea 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/comma issues

Change-Id: I456fc8d80371d6dfc07e6d109b7f478c25602b65
2ce745c06271d5223d57dbf08117b20d5b60694a 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/braces issues

Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
7940e44f4517de5e2634a7e07d58d0fb26160513 12-Jul-2013 Brian Carlstrom <bdc@google.com> Create separate Android.mk for main build targets

The runtime, compiler, dex2oat, and oatdump are now in separate trees
to prevent dependency creep. They can now be individually built
without rebuilding the rest of the art projects. dalvikvm and jdwpspy
were already this way. Builds in the art directory should behave as
before, building everything including tests.

Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81