History log of /art/compiler/dex/quick/arm/call_arm.cc
Revision Date Author Comments
3d21bdf8894e780d349c481e5c9e29fe1556051c 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Move mirror::ArtMethod to native

Optimizing + quick tests are passing, devices boot.

TODO: Test and fix bugs in mips64.

Saves 16 bytes for most ArtMethods, a 7.5MB reduction in system PSS.
Some of the savings come from the removal of the virtual-method and
direct-method object arrays.

Bug: 19264997

(cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33)

Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d

Fix some ArtMethod related bugs

Added root visiting for runtime methods, not currently required
since the GcRoots in these methods are null.

Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes
--trace run-tests 005, 044.

Fixed optimizing compiler bug where we used a normal stack location
instead of double on ARM64, this fixes the debuggable tests.

TODO: Fix JDWP tests.

Bug: 19264997

Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3

ART: Fix casts for 64-bit pointers on 32-bit compiler.

Bug: 19264997
Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457

Fix JDWP tests after ArtMethod change

Fixes Throwable::GetStackDepth for exception event detection after
internal stack trace representation change.

Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of
proxy method.

Bug: 19264997
Change-Id: I363e293796848c3ec491c963813f62d868da44d2

Fix accidental IMT and root marking regression

Was always using the conflict trampoline. Also included a fix for a
regression in GC time caused by extra roots. Most of the regression
was IMT.

Fixed bug in DumpGcPerformanceInfo where we would get SIGABRT due to
detached thread.

EvaluateAndApplyChanges:
From ~2500 -> ~1980
GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots

Bug: 19264997
Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0

Fix bogus image test assert

Previously we were comparing the size of the non-moving space to the
size of the image file.

Now we properly compare the size of the image space against the size
of the image file.

Bug: 19264997
Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a

[MIPS64] Fix art_quick_invoke_stub argument offsets.

The size of an ArtMethod reference got bigger, so we need to move the
other args and leave enough space for the ArtMethod* and the 'this' pointer.

This fixes mips64 boot.

Bug: 19264997
Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
41b175aba41c9365a1c53b8a1afbd17129c87c14 19-May-2015 Vladimir Marko <vmarko@google.com> ART: Clean up arm64 kNumberOfXRegisters usage.

Avoid undefined behavior for arm64 stemming from 1u << 32 in
loops with upper bound kNumberOfXRegisters.

Create iterators for enumerating bits in an integer either
from high to low or from low to high and use them for
<arch>Context::FillCalleeSaves() on all architectures.
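
A minimal sketch of such bit iteration (illustrative helpers using
GCC/Clang builtins, not ART's actual iterator API); clearing bits
arithmetically means the problematic 1u << 32 is never evaluated:

  #include <cstdint>
  #include <cstdio>

  // Visit set bits from lowest to highest; __builtin_ctz is well-defined
  // here because the mask is checked for zero first.
  template <typename F>
  void ForEachBitLowToHigh(uint32_t mask, F&& visit) {
    while (mask != 0u) {
      visit(__builtin_ctz(mask));
      mask &= mask - 1u;  // Clear the lowest set bit.
    }
  }

  // Visit set bits from highest to lowest using __builtin_clz.
  template <typename F>
  void ForEachBitHighToLow(uint32_t mask, F&& visit) {
    while (mask != 0u) {
      int bit = 31 - __builtin_clz(mask);
      visit(bit);
      mask &= ~(1u << bit);  // bit is at most 31, so the shift is defined.
    }
  }

  int main() {
    // Example: a callee-save mask with bits 0-3 and 31 set.
    ForEachBitHighToLow(0x8000000fu, [](int bit) { printf("%d ", bit); });
    printf("\n");  // Prints: 31 3 2 1 0
    return 0;
  }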

Refactor runtime/utils.{h,cc} by moving all bit-fiddling
functions to runtime/base/bit_utils.{h,cc} (together with
the new bit iterators) and all time-related functions to
runtime/base/time_utils.{h,cc}. Improve test coverage and
fix some corner cases for the bit-fiddling functions.

Bug: 13925192

(cherry picked from commit 80afd02024d20e60b197d3adfbb43cc303cf29e0)

Change-Id: I905257a21de90b5860ebe1e39563758f721eab82
848f70a3d73833fc1bf3032a9ff6812e429661d9 15-Jan-2014 Jeff Hao <jeffhao@google.com> Replace String CharArray with internal uint16_t array.

Summary of high level changes:
- Adds compiler inliner support to identify string init methods
- Adds compiler support (quick & optimizing) with new invoke code path
that calls method off the thread pointer
- Adds thread entrypoints for all string init methods
- Adds map to verifier to log when receiver of string init has been
copied to other registers; used by compiler and interpreter

Change-Id: I797b992a8feb566f9ad73060011ab6f51eb7ce01
2cebb24bfc3247d3e9be138a3350106737455918 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Replace NULL with nullptr

Also fixed some lines that were too long, and a few other minor
details.

Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb
1109fb3cacc8bb667979780c2b4b12ce5bb64549 07-Apr-2015 David Srbecky <dsrbecky@google.com> Implement CFI for Quick.

CFI is necessary for stack unwinding in gdb, lldb, and libunwind.

Change-Id: Ic3b84c9dc91c4bae80e27cda02190f3274e95ae8
cc23481b66fd1f2b459d82da4852073e32f033aa 07-Apr-2015 Vladimir Marko <vmarko@google.com> Promote pointer to dex cache arrays on arm.

Do the use-count analysis on temps (ArtMethod* and the new
PC-relative temp) in Mir2Lir, rather than MIRGraph. MIRGraph
isn't really supposed to know how the ArtMethod* is used by
the backend.

Change-Id: Iaf56a46ae203eca86281b02b54f39a80fe5cc2dd
e5c76c515a481074aaa6b869aa16490a47ba98bc 06-Apr-2015 Vladimir Marko <vmarko@google.com> PC-relative loads from dex cache arrays for arm.

Change-Id: Ic25df4b51a901ff1d2ca356b5eec71d4acc5d9b7
6f7158927fee233255f8e96719c374694b10cad3 30-Mar-2015 David Srbecky <dsrbecky@google.com> Write .debug_line section using the new DWARF library.

Also simplify dex to java mapping and handle mapping
in prologues and epilogues.

Change-Id: I410f06024580f2a8788f2c93fe9bca132805029a
20f85597828194c12be10d3a927999def066555e 19-Mar-2015 Vladimir Marko <vmarko@google.com> Fixed layout for dex caches in boot image.

Define a fixed layout for dex cache arrays (type, method,
string and field arrays) for dex caches in the boot image.
This gives those arrays fixed offsets from the boot image
code and allows PC-relative addressing of their elements.

Use the PC-relative load on arm64 for relevant instructions,
i.e. invoke-static, invoke-direct, const-string,
const-class, check-cast and instance-of. This reduces the
arm64 boot.oat on Nexus 9 by 1.1MiB.

This CL provides the infrastructure and shows on the arm64
the gains that we can achieve by having fixed dex cache
arrays' layout. To fully use this for the boot images, we
need to implement the PC-relative addressing for other
architectures. To achieve similar gains for apps, we need
to move the dex cache arrays to a .bss section of the oat
file. These changes will be implemented in subsequent CLs.

(Also remove some compiler_driver.h dependencies to reduce
incremental build times.)

Change-Id: Ib1859fa4452d01d983fd92ae22b611f45a85d69b
f6737f7ed741b15cfd60c2530dab69f897540735 23-Mar-2015 Vladimir Marko <vmarko@google.com> Quick: Clean up Mir2Lir codegen.

Clean up WrapPointer()/UnwrapPointer() and OpPcRelLoad().

Change-Id: I1a91f01e1e779599c77f3f6efcac2a6ad34629cf
0b40ecf156e309aa17c72a28cd1b0237dbfb8746 20-Mar-2015 Vladimir Marko <vmarko@google.com> Quick: Clean up slow paths.

Change-Id: I278d42be77b02778c4a419ae9024b37929915b64
e15ea086439b41a805d164d2beb07b4ba96aaa97 10-Feb-2015 Hiroshi Yamauchi <yamauchi@google.com> Reserve bits in the lock word for read barriers.

This prepares for the CC collector to use the standard object header
model by storing the read barrier state in the lock word.

Bug: 19355854
Bug: 12687968
Change-Id: Ia7585662dd2cebf0479a3e74f734afe5059fb70f
6ce3eba0f2e6e505ed408cdc40d213c8a512238d 16-Feb-2015 Vladimir Marko <vmarko@google.com> Add suspend checks to special methods.

Generate suspend checks at the beginning of special methods.
If we need to call to runtime, go to the slow path where we
create a simplified but valid frame, spill all arguments,
call art_quick_test_suspend, restore necessary arguments and
return back to the fast path. This keeps the fast path
overhead to a minimum.

Bug: 19245639
Change-Id: I3de5aee783943941322a49c4cf2c4c94411dbaa2
72f53af0307b9109a1cfc0671675ce5d45c66d3a 12-Nov-2014 Chao-ying Fu <chao-ying.fu@intel.com> ART: Remove MIRGraph::dex_pc_to_block_map_

This patch removes MIRGraph::dex_pc_to_block_map_, adds a local
variable dex_pc_to_block_map inside MIRGraph::InlineMethod(), and
updates several functions to pass dex_pc_to_block_map.
The goal is to limit the scope of dex_pc_to_block_map and
the usage of FindBlock, so that various compiler optimizations
cannot rely on a dex pc to look up basic blocks, avoiding
duplicated-dex-pc issues.
Also, this patch changes quick targets to use successor blocks
for switch case target generation at Mir2Lir::InstallSwitchTables().

Change-Id: I9f571efebd2706b4e1606279bd61f3b406ecd1c4
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
0b9203e7996ee1856f620f95d95d8a273c43a3df 23-Jan-2015 Andreas Gampe <agampe@google.com> ART: Some Quick cleanup

Make several fields const in CompilationUnit. May benefit some Mir2Lir
code that repeats tests, and in general immutability is good.

Remove compiler_internals.h and refactor some other headers to reduce
overly broad imports (and thus forced recompiles on changes).

Change-Id: I898405907c68923581373b5981d8a85d2e5d185a
7e499925f8b4da46ae51040e9322690f3df992e6 06-Jan-2015 Andreas Gampe <agampe@google.com> ART: Remove LowestSetBit and IsPowerOfTwo

Remove those functions from Mir2Lir and replace with functionality
from utils.h.

Change-Id: Ieb67092b22d5d460b5241c7c7931c15b9faf2815
9d5c25acdd1e9635fde8f8bf52a126b4d371dabd 26-Nov-2014 Vladimir Marko <vmarko@google.com> Quick: Use 16-bit Thumb2 PUSH/POP when possible.

Generate correct PUSH/POP in Gen{Entry,Exit}Sequence()
to avoid extra processing during insn fixup.

Change-Id: I396168e2a42faee6980d40779c7de9657531867b
bf535be514570fc33fc0a6347a87dcd9097d9bfd 19-Nov-2014 Vladimir Marko <vmarko@google.com> Add card mark to filled-new-array.

Bug: 18032332
Change-Id: I35576b27f9115e4d0b02a11afc5e483b9e93a04a
2d7210188805292e463be4bcf7a133b654d7e0ea 10-Nov-2014 Mathieu Chartier <mathieuc@google.com> Change 64 bit ArtMethod fields to be pointer sized

Changed the 64 bit entrypoint and gc map fields in ArtMethod to be
pointer sized. This saves a large amount of memory on 32 bit systems.
Reduces ArtMethod size by 16 bytes on 32 bit.

Total number of ArtMethod on low memory mako: 169957
Image size: 49203 methods -> 787248 image size reduction.
Zygote space size: 1070 methods -> 17120 size reduction.
App methods: ~120k -> 2 MB savings.

Savings per app on low memory mako: 125K+ per app
(less active apps -> more image methods per app).

Savings depend on how often the shared methods end up on dirty pages
vs. remaining shared.

TODO in another CL, delete gc map field from ArtMethod since we
should be able to get it from the Oat method header.

Bug: 17643507

Change-Id: Ie9508f05907a9f693882d4d32a564460bf273ee8

(cherry picked from commit e832e64a7e82d7f72aedbd7d798fb929d458ee8f)
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f 31-Oct-2014 Ian Rogers <irogers@google.com> Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags.

Fix associated errors about unused parameters and implicit sign conversions.
For sign conversion this was largely in the area of enums, so add ostream
operators for the affected enums and fix tools/generate-operator-out.py.
Tidy arena allocation code and arena allocated data types, rather than fixing
new and delete operators.
Remove dead code.

Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
832336b3c9eb892045a8de1bb12c9361112ca3c5 09-Oct-2014 Ian Rogers <irogers@google.com> Don't copy fill array data to quick literal pool.

Currently quick copies the fill array data from the dex file to the literal
pool. It then has to go through hoops to pass this PC relative address down
to out-of-line code. Instead, pass the offset of the table to the out-of-line
code and use the CodeItem data associated with the ArtMethod. This reduces
the size of oat code while greatly simplifying it.
Unify the FillArrayData implementation in quick, portable and the interpreters.

Change-Id: I9c6971cf46285fbf197856627368c0185fdc98ca
f4da675bbc4615c5f854c81964cac9dd1153baea 01-Aug-2014 Vladimir Marko <vmarko@google.com> Implement method calls using relative BL on ARM.

Store the linker patches with each CompiledMethod instead of
keeping them in CompilerDriver. Reorganize oat file creation
to apply the patches as we're writing the method code. Add
framework for platform-specific relative call patches in the
OatWriter. Implement relative call patches for ARM.

Change-Id: Ie2effb3d92b61ac8f356140eba09dc37d62290f8
e39c54ea575ec710d5e84277fcdcc049f8acb3c9 22-Sep-2014 Vladimir Marko <vmarko@google.com> Deprecate GrowableArray, use ArenaVector instead.

Purge GrowableArray from Quick and Portable.
Remove GrowableArray<T>::Iterator.

Change-Id: I92157d3a6ea5975f295662809585b2dc15caa1c6
8d0d03e24325463f0060abfd05dba5598044e9b1 07-Jun-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> ART: Change temporaries to positive names

Changes compiler temporaries to have positive names. The numbering now
puts them above the code VRs (locals + ins, in that order). The patch also
introduces APIs to query the number of temporaries, locals and ins.

The compiler temp infrastructure suffered from several issues
which are also addressed by this patch:
-There is no longer a queue of compiler temps. This would be polluted
with Method* when post opts were called multiple times.
-Sanity checks have been added to allow requesting of temps from BE
and to prevent temps after frame is committed.
-None of the structures holding temps can overflow because they are
allocated to allow holding maximum temps. Thus temps can be requested
by BE with no problem.
-Since the queue of compiler temps is no longer maintained, it is no
longer possible to refer to a temp that has invalid ssa (because it
was requested before ssa was run).
-The BE can now request temps after all ME allocations and it is guaranteed
to actually receive them.
-ME temps are now treated like normal VRs in all cases with no special
handling. Only the BE temps are handled specially because there are no
references to them from MIRs.
-Deprecated and removed several fields in CompilationUnit that saved
register information and updated callsites to call the new interface from
MIRGraph.

Change-Id: Ia8b1fec9384a1a83017800a59e5b0498dfb2698c
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
b038ba66a166fb264ca121632f447712e0973b5b 14-Aug-2014 Dave Allison <dallison@google.com> Revert "Revert "Reduce stack usage for overflow checks""

Fixes stack protection issue.
Fixes mac build issue.

This reverts commit 83b1940e6482b9d8feba5c492507735686650ea5.

Change-Id: I7ba17252882b23a740bcda2ea94aacf398255406
4cf00ba324f5f6884059796a6ba41937f32e1844 14-Aug-2014 Dave Allison <dallison@google.com> Revert "Reduce stack usage for overflow checks"

This reverts commit 63c051a540e6dfc806f656b88ac3a63e99395429.

Change-Id: I282a048994fcd130fe73842b16c21680053c592f
03c9785a8a6d712775cf406c4371d0227c44148f 14-Aug-2014 Dave Allison <dallison@google.com> Revert "Revert "Reduce stack usage for overflow checks""

Fixes stack protection issue.
Fixes mac build issue.

This reverts commit 83b1940e6482b9d8feba5c492507735686650ea5.

Change-Id: I7ba17252882b23a740bcda2ea94aacf398255406
83b1940e6482b9d8feba5c492507735686650ea5 14-Aug-2014 Dave Allison <dallison@google.com> Revert "Reduce stack usage for overflow checks"

This reverts commit 63c051a540e6dfc806f656b88ac3a63e99395429.

Change-Id: I282a048994fcd130fe73842b16c21680053c592f
63c051a540e6dfc806f656b88ac3a63e99395429 26-Jul-2014 Dave Allison <dallison@google.com> Reduce stack usage for overflow checks

This reduces the stack space reserved for overflow checks to 12K, split
into an 8K gap and a 4K protected region. GC needs over 8K when running
in a stack overflow situation.

Also prevents signal runaway by detecting a signal inside code that
resulted from a signal handler invocation. And adds a max signal count to
the SignalTest to prevent it from running forever.

Also reduces the number of iterations for the InterfaceTest as this was
taking (almost) forever with the --trace option on run-test.

Bug: 15435566

Change-Id: Id4fd46f22d52d42a9eb431ca07948673e8fda694

Conflicts:
compiler/optimizing/code_generator_x86_64.cc
runtime/arch/x86/fault_handler_x86.cc
runtime/arch/x86_64/quick_entrypoints_x86_64.S
648d7112609dd19c38131b3e71c37bcbbd19d11e 26-Jul-2014 Dave Allison <dallison@google.com> Reduce stack usage for overflow checks

This reduces the stack space reserved for overflow checks to 12K, split
into an 8K gap and a 4K protected region. GC needs over 8K when running
in a stack overflow situation.

Also prevents signal runaway by detecting a signal inside code that
resulted from a signal handler invocation. And adds a max signal count to
the SignalTest to prevent it from running forever.

Also reduces the number of iterations for the InterfaceTest as this was
taking (almost) forever with the --trace option on run-test.

Bug: 15435566

Change-Id: Id4fd46f22d52d42a9eb431ca07948673e8fda694
8c18c2aaedb171f9b03ec49c94b0e33449dc411b 06-Aug-2014 Andreas Gampe <agampe@google.com> ART: Generate chained compare-and-branch for short switches

Refactor Mir2Lir to generate chained compare-and-branch sequences
for short switches on all architectures.

Bug: 16241558

(cherry picked from commit 48971b3242e5126bcd800cc9c68df64596b43d13)

Change-Id: I0bb3071b8676523e90e0258e9b0e3fd69c1237f4
48971b3242e5126bcd800cc9c68df64596b43d13 06-Aug-2014 Andreas Gampe <agampe@google.com> ART: Generate chained compare-and-branch for short switches

Refactor Mir2Lir to generate chained compare-and-branch sequences
for short switches on all architectures.

Change-Id: Ie2a572ae69d462ba68a119e9fb93ae538cddd08f
0f45f22eb3c52f0ece4c56989180e79c6680d825 15-Jul-2014 Andreas Gampe <agampe@google.com> ART: Throw StackOverflowError in native code

Initialize stack-overflow errors in native code to be able to reduce
the preserved area size of the stack.

Includes a refactoring away from constexpr in instruction_set.h to allow
for easy changing of the values.

Bug: 16256184

(cherry picked from commit 7ea6f79bbddd69d5db86a8656a31aaaf64ae2582)

Change-Id: I117cc8485f43da5f0a470f0f5e5b3dc3b5a06246
7ea6f79bbddd69d5db86a8656a31aaaf64ae2582 15-Jul-2014 Andreas Gampe <agampe@google.com> ART: Throw StackOverflowError in native code

Initialize stack-overflow errors in native code to be able to reduce
the preserved area size of the stack.

Includes a refactoring away from constexpr in instruction_set.h to allow
for easy changing of the values.

Change-Id: I117cc8485f43da5f0a470f0f5e5b3dc3b5a06246
147eb41b53729ec8d5c188d1cac90964a51afb8a 11-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86""""

This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1.

Bug: 16256184
Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73

Conflicts:
compiler/dex/quick/arm64/target_arm64.cc
compiler/image_test.cc
runtime/fault_handler.cc
69dfe51b684dd9d510dbcb63295fe180f998efde 11-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86""""

This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1.

Bug: 16256184
Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73
48f5c47907654350ce30a8dfdda0e977f5d3d39f 27-Jun-2014 Hans Boehm <hboehm@google.com> Replace memory barriers to better reflect Java needs.

Replaces barriers that enforce ordering of one access type
(e.g. load) with respect to another (e.g. store) with more general
ones that better reflect both Java requirements and actual hardware
barrier/fence instructions. The old code was inconsistent and
unclear about which barriers implied which others. Sometimes
multiple barriers were generated and then eliminated;
sometimes it was assumed that certain barriers implied others.
The new barriers closely parallel those in C++11, though, for now,
we use something closer to the old naming.
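
Illustrative sketch of the idea (the enum name and members below are
assumptions, not ART's actual MemBarrierKind): barrier kinds that
parallel C++11 fences, each mapped to an ARM dmb option that satisfies it.

  // Illustrative only; these names are assumptions, not ART's enum.
  enum class BarrierKind {
    kAnyStore,    // Prior loads and stores complete before later stores
                  // (release-like).
    kLoadAny,     // Prior loads complete before later loads and stores
                  // (acquire-like).
    kStoreStore,  // Prior stores complete before later stores.
    kAnyAny,      // Full two-way fence.
  };

  const char* ArmDmbOption(BarrierKind kind) {
    // Only a pure store-store barrier can use the cheaper "ishst" option;
    // the other kinds need a full "dmb ish".
    return kind == BarrierKind::kStoreStore ? "dmb ishst" : "dmb ish";
  }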

Bug: 14685856

Change-Id: Ie1c80afe3470057fc6f2b693a9831dfe83add831
7fb36ded9cd5b1d254b63b3091f35c1e6471b90e 10-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Add implicit null and stack checks for x86""

Fixes x86_64 cross compile issue. Removes command line options
and property to set implicit checks - this is hard coded now.

This reverts commit 3d14eb620716e92c21c4d2c2d11a95be53319791.

Change-Id: I5404473b5aaf1a9c68b7181f5952cb174d93a90d
0025a86411145eb7cd4971f9234fc21c7b4aced1 11-Jul-2014 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Revert "Add implicit null and stack checks for x86"""

Broke the build.

This reverts commit 7fb36ded9cd5b1d254b63b3091f35c1e6471b90e.

Change-Id: I9df0e7446ff0913a0e1276a558b2ccf6c8f4c949
de68676b24f61a55adc0b22fe828f036a5925c41 24-Jun-2014 Andreas Gampe <agampe@google.com> Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter"

This reverts commit 2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d.

Breaks the build.

Change-Id: I9faad4e9a83b32f5f38b2ef95d6f9a33345efa33
3c12c512faf6837844d5465b23b9410889e5eb11 24-Jun-2014 Andreas Gampe <agampe@google.com> Revert "Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter""

This reverts commit de68676b24f61a55adc0b22fe828f036a5925c41.

Fixes an API comment, and differentiates between inserting and appending.

Change-Id: I0e9a21bb1d25766e3cbd802d8b48633ae251a6bf
2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d 23-Jun-2014 Andreas Gampe <agampe@google.com> ART: Split out more cases of Load/StoreRef, volatile as parameter

Splits out more cases of ref registers being loaded or stored. For
code clarity, adds volatile as a flag parameter instead of a separate
method.

On ARM64, continue cleanup. Add flags to print/fatal on size mismatches.

Change-Id: I30ed88433a6b4ff5399aefffe44c14a5e6f4ca4e
5655e84e8d71697d8ef3ea901a0b853af42c559e 18-Jun-2014 Andreas Gampe <agampe@google.com> ART: Implicit checks in the compiler are independent from Runtime

When cross-compiling, those flags are independent. This is an
initial CL that helps bypass fatal failures when cross-compiling,
as not all architectures support (and have turned on) implicit
checks.

The actual transport for the target architecture when it is
different from the runtime needs to be implemented in a follow-up
CL.

Bug: 15703710
Change-Id: Idc881a9a4abfd38643b862a491a5af9b8841f693
7cd26f355ba83be75b72ed628ed5ee84a3245c4f 19-Jun-2014 Andreas Gampe <agampe@google.com> ART: Target-dependent stack overflow, less check elision

Refactor the separate stack overflow reserved sizes from thread.h
into instruction_set.h and make sure they're used in the compiler.

Refactor the decision on when to elide stack overflow checks:
especially with large interpreter stack frames, it is not a good
idea to elide checks when the frame size is even close to the
reserved size. We currently enforce checks when the frame size is
>= 2KB, but make sure that frame sizes of 1KB and below elide
the checks (numbers from experience).
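
A minimal sketch of that elision policy (the handling of sizes between
1KB and 2KB is an assumption; the commit leaves it unspecified):

  #include <cstddef>

  constexpr size_t kAlwaysCheckBytes = 2 * 1024;  // From the commit message.
  constexpr size_t kNeverCheckBytes  = 1 * 1024;  // From the commit message.

  bool CanElideOverflowCheck(size_t frame_size) {
    if (frame_size >= kAlwaysCheckBytes) {
      return false;  // Frame is close to the reserved size: must check.
    }
    return frame_size <= kNeverCheckBytes;  // Small frame: safe to elide.
  }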

Bug: 15728765
Change-Id: I016bfd3d8218170cbccbd123ed5e2203db167c06
8dea81ca9c0201ceaa88086b927a5838a06a3e69 06-Jun-2014 Vladimir Marko <vmarko@google.com> Rewrite use/def masks to support 128 bits.

Reduce LIR memory usage by holding masks by pointers in the
LIR rather than directly and using pre-defined const masks
for the common cases, allocating very few on the arena.

Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
576ca0cd692c0b6ae70e776de91015b8ff000a08 07-Jun-2014 Ian Rogers <irogers@google.com> Reduce header files including header files.

Main focus is getting heap.h out of runtime.h.

Change-Id: I8d13dce8512816db2820a27b24f5866cc871a04b
a0cd2d701f29e0bc6275f1b13c0edfd4ec391879 01-Jun-2014 buzbee <buzbee@google.com> Quick compiler: reference cleanup

For 32-bit targets, object references are 32 bits wide both in
Dalvik virtual registers and in core physical registers. Because of
this, object references and non-floating point values were both
handled as if they had the same register class (kCoreReg).

However, for 64-bit systems, references are 32 bits in Dalvik vregs, but
64 bits in physical registers. Although the same underlying physical
core registers will still be used for object reference and non-float
values, different register class views will be used to represent them.
For example, an object reference in arm64 might be held in x3 at some
point, while the same underlying physical register, w3, would be used
to hold a 32-bit int.

This CL breaks apart the handling of object reference and non-float values
to allow the proper register class (or register view) to be used. A
new register class, kRefReg, is introduced which will map to a 32-bit
core register on 32-bit targets, and 64-bit core registers on 64-bit
targets. From this point on, object references should be allocated
registers in the kRefReg class rather than kCoreReg.

Change-Id: I6166827daa8a0ea3af326940d56a6a14874f5810
fe8cf8b1c1b4af0f8b4bb639576f7a5fc59f52ea 15-May-2014 Bill Buzbee <buzbee@google.com> Quick Compiler: fix Arm cts failures

Fixes move_wide_16#testN1, move_wide_16#testN2

Two bugs for the price of one (thanks CTS!)

First, the new stack overflow checking code was broken for very
large frames. For Arm on method entry, we only have 1 available
temp register, r12, until argument registers are flushed.
Previously, for explicit checks on large frames,
r12 was immediately loaded with the stack_end value. However,
later on when the frame is extended, if the frame size exceeds
the range of a reg-reg-imm subtract, the codegen utilities will
allocate a new temporary register to complete the operation. r12
was getting clobbered. Similarly, for medium-large frames r12
could get clobbered during frame creation.

What we should always do when directly using fixed registers like
this is to lock them to prevent them from being allocated as a
temp. The other half of the first bug is easily solved by delaying
the load of stack_end until after the new sp is computed. We'll
increase the stall cost, but this is an uncommon case.

The second bug was likely a typo in LoadValueDisp(). I'm a bit
surprised we hadn't hit this one earlier - but perhaps it was
recently introduced. The wrong base register was being used in
the non-float, wide, excessive offset case (which I suppose is also
somewhat uncommon).

Cherry-pick of internal commit If5b30f729e31d86db604045dd7581fd4626e0b55

Change-Id: If5b30f729e31d86db604045dd7581fd4626e0b55
b14329f90f725af0f67c45dfcb94933a426d63ce 15-May-2014 Andreas Gampe <agampe@google.com> ART: Fix MonitorExit code on ARM

We do not emit barriers on non-SMP systems. But on ARM, we have
places that need to conditionally execute, which is done through
an IT instruction. The guide of said instruction thus changes
between SMP and non-SMP systems.

To cleanly approach this, change the API so that GenMemBarrier
returns whether it generated an instruction. ARM will have to
query the result and update any dependent IT.
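
A sketch of the changed contract (names and the SMP guard are
assumptions): the caller learns whether a barrier instruction was
actually emitted, so ARM can widen a pending IT block to cover it.

  constexpr bool kIsTargetSmp = true;  // Assumed configuration flag.

  bool GenMemBarrier(/* BarrierKind kind */) {
    if (!kIsTargetSmp) {
      return false;  // Non-SMP: nothing emitted; the IT guide is unchanged.
    }
    // A real backend emits the dmb instruction here.
    return true;     // Caller must extend any dependent IT block by one slot.
  }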

Throw a build system error if TARGET_CPU_SMP is not set.

Fix runtime/Android.mk to work with new multilib host.

Bug: 14989275
Change-Id: I9e611b770e8a1cd4ca19367d7dae0573ec08dc61
56e86eaf73eb3efa029f2dd53b2d21e3597d8e5f 15-May-2014 Bill Buzbee <buzbee@google.com> Revert "Revert "Quick Compiler: fix Arm cts failures""

It turns out the medium-large frame explicit stack
overflow check was also broken in a similar way: r12
was live going into the frame extension, but the frame
extension code sometimes needs a free temp.

This reverts commit 9cf44af1a223f905457688931317a4e4cb086a84.

Change-Id: If5b30f729e31d86db604045dd7581fd4626e0b55
9cf44af1a223f905457688931317a4e4cb086a84 15-May-2014 Bill Buzbee <buzbee@google.com> Revert "Quick Compiler: fix Arm cts failures"

Error detected on further testing.

This reverts commit 06a4809f271c44ec1491e0b07ae9974aa35bc8ad.

Change-Id: Ia7b6b463f6422abac432f1a9484e4e080d003148
06a4809f271c44ec1491e0b07ae9974aa35bc8ad 14-May-2014 buzbee <buzbee@google.com> Quick Compiler: fix Arm cts failures

Fixes move_wide_16#testN1, move_wide_16#testN2

Two bugs for the price of one (thanks CTS!)

First, the new stack overflow checking code was broken for very
large frames. For Arm on method entry, we only have 1 available
temp register, r12, until argument registers are flushed.
Previously, for explicit checks on large frames,
r12 was immediately loaded with the stack_end value. However,
later on when the frame is extended, if the frame size exceeds
the range of a reg-reg-imm subtract, the codegen utilities will
allocate a new temporary register to complete the operation. r12
was getting clobbered.

What we should always do when directly using fixed registers like
this is to lock them to prevent them from being allocated as a
temp. The other half of the first bug is easily solved by delaying
the load of stack_end until after the new sp is computed. We'll
increase the stall cost, but this is an uncommon case.

The second bug was likely a typo in LoadValueDisp(). I'm a bit
surprised we hadn't hit this one earlier - but perhaps it was
recently introduced. The wrong base register was being used in
the non-float, wide, excessive offset case (which I suppose is also
somewhat uncommon).

Change-Id: I2c5074c9570b022af680f472deac9fe72a2e827e
9b9dec8bbcb812315eb0b68b3465c6c567f09527 14-May-2014 Andreas Gampe <agampe@google.com> ART: Fix ARM dmb placement in monitor-exit

This moves the dmb in quick-compiled monitor-exit before the str
performing the unlock.

Change-Id: I231f98ff21eb7bac45b4a1b7ff57316deeb858cc
5cd33753b96d92c03e3cb10cb802e68fb6ef2f21 16-Apr-2014 Dave Allison <dallison@google.com> Handle implicit stack overflow without affecting stack walks

This changes the way in which implicit stack overflows are handled
to satisfy concerns about changes to the stack walk code.

Instead of creating a gap in the stack and checking for it in
the stack walker, use the ManagedStack infrastructure to concoct
an invisible gap that will never be seen by a stack walk.

Also, this uses madvise to tell the kernel that the main stack's
protected region will probably never be accessed, and instead
of using memset to map the pages in, use memcpy to read from
them. This will save 32K on the main stack.

Also adds a 'signals' verbosity level as per a review request.

Bug: 14066862
Change-Id: I5257305feeaea241d11e6aa6f021d2a81da20b81
091cc408e9dc87e60fb64c61e186bea568fc3d3a 31-Mar-2014 buzbee <buzbee@google.com> Quick compiler: allocate doubles as doubles

Significant refactoring of register handling to unify usage across
all targets & 32/64 backends.

Reworked RegStorage encoding to allow expanded use of
x86 xmm registers; removed vector registers as a separate
register type. Reworked RegisterInfo to describe aliased
physical registers. Eliminated quite a bit of target-specific code
and generalized common code.

Use of RegStorage instead of int for registers now propagated down
to the NewLIRx() level. In future CLs, the NewLIRx() routines will
be replaced with versions that are explicit about what kind of
operand they expect (RegStorage, displacement, etc.). The goal
is to eventually use RegStorage all the way to the assembly phase.

TBD: MIPS needs verification.
TBD: Re-enable liveness tracking.

Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
6ffcfa04ebb2660e238742a6000f5ccebdd5df15 25-Apr-2014 Mingyao Yang <mingyao@google.com> Rewrite suspend test check with LIRSlowPath.

Change-Id: I2dc17d079655586bfc588349c7a04afc2c6879af
695d13a82d6dd801aaa57a22a9d4b3f6db0d0fdb 19-Apr-2014 buzbee <buzbee@google.com> Update load/store utilities for 64-bit backends

This CL replaces the typical use of LoadWord/StoreWord
utilities (which, in practice, were 32-bit load/store) in
favor of a new set that make the size explicit. We now have:

LoadWordDisp/StoreWordDisp:
32 or 64 depending on target. Load or store the natural
word size. Expect this to be used infrequently - generally
when we know we're dealing with a native pointer or flushed
register not holding a Dalvik value (Dalvik values will flush
to home location sizes based on Dalvik, rather than the target).

Load32Disp/Store32Disp:
Load or store 32 bits, regardless of target.

Load64Disp/Store64Disp:
Load or store 64 bits, regardless of target.

LoadRefDisp:
Load a 32-bit compressed reference, and expand it to the
natural word size in the target register.

StoreRefDisp:
Compress a reference held in a register of the natural word
size and store it as a 32-bit compressed reference.
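
A minimal sketch of the compressed-reference idea behind
LoadRefDisp/StoreRefDisp (names and the zero-extension scheme here are
illustrative assumptions):

  #include <cstdint>

  // Heap references are stored as 32-bit values even when native
  // registers are 64 bits wide.
  using HeapReference = uint32_t;

  uintptr_t LoadRef(const HeapReference* slot) {
    // Zero-extend the 32-bit compressed reference to the natural word size.
    return static_cast<uintptr_t>(*slot);
  }

  void StoreRef(HeapReference* slot, uintptr_t ref) {
    // Compress back to 32 bits; assumes the heap fits in the low 4GiB.
    *slot = static_cast<HeapReference>(ref);
  }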

Change-Id: I50fcbc8684476abd9527777ee7c152c61ba41c6f
d6ed642458c8820e1beca72f3d7b5f0be4a4b64b 10-Apr-2014 Dave Allison <dallison@google.com> Revert "Revert "Revert "Use trampolines for calls to helpers"""

This reverts commit f9487c039efb4112616d438593a2ab02792e0304.

Change-Id: Id48a4aae4ecce73db468587967968a3f7618b700
f9487c039efb4112616d438593a2ab02792e0304 09-Apr-2014 Dave Allison <dallison@google.com> Revert "Revert "Use trampolines for calls to helpers""

This reverts commit 081f73e888b3c246cf7635db37b7f1105cf1a2ff.

Change-Id: Ibd777f8ce73cf8ed6c4cb81d50bf6437ac28cb61

Conflicts:
compiler/dex/quick/mir_to_lir.h
081f73e888b3c246cf7635db37b7f1105cf1a2ff 07-Apr-2014 Dave Allison <dallison@google.com> Revert "Use trampolines for calls to helpers"

This reverts commit 754ddad084ccb610d0cf486f6131bdc69bae5bc6.

Change-Id: Icd979adee1d8d781b40a5e75daf3719444cb72e8
754ddad084ccb610d0cf486f6131bdc69bae5bc6 19-Feb-2014 Dave Allison <dallison@google.com> Use trampolines for calls to helpers

This is an ARM specific optimization to the compiler
that uses trampoline islands to make calls to runtime
helper functions. The intention is to reduce the size
of the generated code (by 2 bytes per call) without
affecting performance.

By default this is on when generating an OAT file. It is
off when compiling to memory.

To switch this off in dex2oat, use the command line option:
--no-helper-trampolines

Enhances disassembler to print the trampoline entry on the
BL instruction like this:

0xb6a850c0: f7ffff9e bl -196 (0xb6a85000) ; pTestSuspend

Bug: 12607709
Change-Id: I9202bdb7cf21252ad807bd48701f1f6ce8e3d0fe
3da67a558f1fd3d8a157d8044d521753f3f99ac8 03-Apr-2014 Dave Allison <dallison@google.com> Add OpEndIT() for marking the end of OpIT blocks

In ARM we need to prevent code motion to the inside of an
IT block. This was done using a GenBarrier() to mark the end, but
it wasn't obvious that this is what was happening. This CL adds
an explicit OpEndIT() that takes the LIR of the OpIT for future
checks.

Bug: 13751744
Change-Id: If41d2adea1f43f11ebb3b72906bd308252ce3d01
dd7624d2b9e599d57762d12031b10b89defc9807 15-Mar-2014 Ian Rogers <irogers@google.com> Allow mixing of thread offsets between 32 and 64bit architectures.

Begin a more full implementation x86-64 REX prefixes.
Doesn't implement 64bit thread offset support for the JNI compiler.

Change-Id: If9af2f08a1833c21ddb4b4077f9b03add1a05147
f943914730db8ad2ff03d49a2cacd31885d08fd7 27-Mar-2014 Dave Allison <dallison@google.com> Implement implicit stack overflow checks

This also fixes some failing run tests due to missing
null pointer markers.

The implementation of the implicit stack overflow checks introduces
the ability to have a gap in the stack that is skipped during
stack walk backs. This gap is protected against read/write and
is used to trigger a SIGSEGV at function entry if the stack
will overflow.

Change-Id: I0c3e214c8b87dc250cf886472c6d327b5d58653e
05a48b1f8e62564abb7c2fe674e3234d5861647f 01-Apr-2014 Mathieu Chartier <mathieuc@google.com> Fix stack overflow slow path error.

The frame size without spill was being passed into the slow path
instead of the spill size. This was incorrect since only the spills
will have been pushed at the point of the overflow check.

Also addressed another comment.

Change-Id: Ic6e455122473a8f796b291d71f945bcf72788662
2700f7e1edbcd2518f4978e4cd0e05a4149f91b6 07-Mar-2014 buzbee <buzbee@google.com> Continuing register cleanup

Ready for review.

Continue the process of using RegStorage rather than
ints to hold register value in the top layers of codegen.
Given the huge number of changes in this CL, I've attempted
to minimize the number of actual logic changes. With this
CL, the use of ints for registers has largely been eliminated
except in the lowest utility levels. "Wide" utility routines
have been updated to take a single RegStorage rather than
a pair of ints representing low and high registers.

Upcoming CLs will be smaller and more targeted. My expectations:
o Allocate float double registers as a single double rather than
a pair of float single registers.
o Refactor to push code which assumes long and double Dalvik
values are held in a pair of registers to the target-dependent
layer.
o Clean-up of the xxx_mir.h files to reduce the amount of #defines
for registers. May also do a register renumbering to bring all
of our targets' register naming more consistent. Possibly
introduce a target-independent float/non-float test at the
RegStorage level.

Change-Id: I646de7392bdec94595dd2c6f76e0f1c4331096ff
0d507d1e0441e6bd6f3affca3a60774ea920f317 19-Mar-2014 Mathieu Chartier <mathieuc@google.com> Optimize stack overflow handling.

We now subtract the frame size from the stack pointer for methods
which have a frame smaller than a certain size. Also changed code to
use slow paths instead of launchpads.

Delete kStackOverflow launchpad since it is no longer needed.

ARM optimizations:
One less move per stack overflow check (without fault handler for
stack overflows). Use ldr pc instead of ldr r12, b r12.
Code size (boot.oat):
Before: 58405348
After: 57803236

TODO: X86 doesn't have the case for large frames. This could cause an
incoming signal to go past the end of the stack (unlikely however).

Change-Id: Ie3a5635cd6fb09de27960e1f8cee45bfae38fb33
b373e091eac39b1a79c11f2dcbd610af01e9e8a9 21-Feb-2014 Dave Allison <dallison@google.com> Implicit null/suspend checks (oat version bump)

This adds the ability to use SEGV signals
to throw NullPointerException exceptions from Java code rather
than having the compiler generate explicit comparisons and
branches. It does this by using sigaction to trap SIGSEGV; when triggered,
the handler makes sure the fault is in compiled code and, if so, sets the
return address to the entry point that throws the exception.

It also uses this signal mechanism to determine whether to check
for thread suspension. Instead of the compiler generating calls
to a function to check for threads being suspended, the compiler
will now load indirect via an address in the TLS area. To trigger
a suspend, the contents of this address are changed from something
valid to 0. A SIGSEGV will occur and the handler will check
for a valid instruction pattern before invoking the thread
suspension check code.
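
A minimal user-space sketch of the trap mechanism described above (not
ART's actual fault handler, which verifies the faulting PC and rewrites
the return address instead of exiting):

  #include <csignal>
  #include <unistd.h>

  // A real handler checks that the faulting PC lies in compiled code and
  // redirects execution to a throw/suspend entrypoint; this one just
  // reports and exits, using only async-signal-safe calls.
  static void FaultHandler(int, siginfo_t*, void*) {
    const char msg[] = "implicit check fired\n";
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
    _exit(1);
  }

  int main() {
    struct sigaction sa = {};
    sa.sa_sigaction = FaultHandler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, nullptr);
    volatile int* poisoned = nullptr;
    return *poisoned;  // Faults: the handler runs in place of an explicit
                       // compare-and-branch null check.
  }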

If a user program taps SIGSEGV it will prevent our signal handler
working. This will cause a failure in the runtime.

There are two signal handlers at present. You can control them
individually using the flag -implicit-checks: on the runtime
command line. This takes a string parameter, a comma-separated
set of strings. Each can be one of:

none switch off
null null pointer checks
suspend suspend checks
all all checks

So to switch only suspend checks on, pass:
-implicit-checks:suspend

There is also -explicit-checks to provide the reverse once
we change the default.

For dalvikvm, pass --runtime-arg -implicit-checks:foo,bar

The default is -implicit-checks:none

There is also a property 'dalvik.vm.implicit_checks' whose value is the same
string as the command option. The default is 'none'. For example to switch on
null checks using the option:

setprop dalvik.vm.implicit_checks null

It only works for ARM right now.

Bumps OAT version number due to change to Thread offsets.

Bug: 13121132
Change-Id: If743849138162f3c7c44a523247e413785677370
83cc7ae96d4176533dd0391a1591d321b0a87f4f 12-Feb-2014 Vladimir Marko <vmarko@google.com> Create a scoped arena allocator and use that for LVN.

This saves more than 0.5s of boot.oat compilation time
on Nexus 5.

TODO: Move other stuff to the scoped allocator. This CL
alone increases the peak memory allocation. By reusing
the memory for other parts of the compilation we should
reduce this overhead.

Change-Id: Ifbc00aab4f3afd0000da818dfe68b96713824a08
00e1ec6581b5b7b46ca4c314c2854e9caa647dd2 28-Feb-2014 Bill Buzbee <buzbee@android.com> Revert "Revert "Rework Quick compiler's register handling""

This reverts commit 86ec520fc8b696ed6f164d7b756009ecd6e4aace.

Ready. Fixed the original typo, plus some mechanical changes
for rebasing.

Still needs additional testing, but the problem with the original
CL appears to have been a typo in the definition of the x86
double return template RegLocation.

Change-Id: I828c721f91d9b2546ef008c6ea81f40756305891
dbb8c49d540edd2a39076093163c7218f03aa502 28-Feb-2014 Vladimir Marko <vmarko@google.com> Remove non-existent ARM insn kThumb2SubsRRI12.

For kOpSub/kOpAdd, prefer modified immediate encodings
because they set flags.

Change-Id: I41dcd2d43ba1e62120c99eaf9106edc61c41e157
86ec520fc8b696ed6f164d7b756009ecd6e4aace 26-Feb-2014 Bill Buzbee <buzbee@android.com> Revert "Rework Quick compiler's register handling"

This reverts commit 2c1ed456dcdb027d097825dd98dbe48c71599b6c.

Change-Id: If88d69ba88e0af0b407ff2240566d7e4545d8a99
2c1ed456dcdb027d097825dd98dbe48c71599b6c 20-Feb-2014 buzbee <buzbee@google.com> Rework Quick compiler's register handling

For historical reasons, the Quick backend found it convenient
to consider all 64-bit Dalvik values held in registers
to be contained in a pair of 32-bit registers. Though this
worked well for ARM (with double-precision registers also
treated as a pair of 32-bit single-precision registers) it doesn't
play well with other targets. And, it is somewhat problematic
for 64-bit architectures.

This is the first of several CLs that will rework the way the
Quick backend deals with physical registers. The goal is to
eliminate the "64-bit value backed with 32-bit register pair"
requirement from the target-independent portions of the backend
and support 64-bit registers throughout.

The key RegLocation struct, which describes the location of
Dalvik virtual register & register pairs, previously contained
fields for high and low physical registers. The low_reg and
high_reg fields are being replaced with a new type: RegStorage.
There will be a single instance of RegStorage for each RegLocation.
Note that RegStorage does not increase the space used. It is
16 bits wide, the same as the sum of the 8-bit low_reg and
high_reg fields.

At a target-independent level, it will describe whether the physical
register storage associated with the Dalvik value is a single 32
bit, single 64 bit, pair of 32 bit or vector. The actual register
number encoding is left to the target-dependent code layer.
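
An illustrative 16-bit encoding in the spirit of RegStorage (the field
layout is an assumption, not ART's actual bit assignment):

  #include <cstdint>

  // A shape kind plus a target-specific register number, replacing the
  // separate low_reg/high_reg ints while keeping the same footprint.
  class RegStorage {
   public:
    enum Shape : uint16_t { kInvalid, k32Bit, k64Bit, k64BitPair, kVector };

    RegStorage(Shape shape, uint16_t reg_num)
        : bits_(static_cast<uint16_t>((shape << 12) | (reg_num & 0x0fff))) {}

    Shape GetShape() const { return static_cast<Shape>(bits_ >> 12); }
    uint16_t GetRegNum() const { return bits_ & 0x0fff; }

   private:
    uint16_t bits_;  // Same 16 bits as the old pair of 8-bit reg fields.
  };

  static_assert(sizeof(RegStorage) == 2, "RegStorage stays 16 bits wide");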

Because physical register handling is pervasive throughout the
backend, this restructuring necessarily involves large CLs with
lots of changes. I'm going to roll these out in stages, and
attempt to segregate the CLs with largely mechanical changes from
those which restructure or rework the logic.

This CL is of the mechanical change variety - it replaces low_reg
and high_reg from RegLocation and introduces RegStorage. It also
includes a lot of new code (such as many calls to GetReg())
that should go away in upcoming CLs.

The tentative plan for the subsequent CLs is:

o Rework standard register utilities such as AllocReg() and
FreeReg() to use RegStorage instead of ints.
o Rework the target-independent GenXXX, OpXXX, LoadValue,
StoreValue, etc. routines to take RegStorage rather than
int register encodings.
o Take advantage of the vector representation and eliminate
the current vector field in RegLocation.
o Replace the "wide" variants of codegen utilities that take
low_reg/high_reg pairs with versions that use RegStorage.
o Add 64-bit register target independent codegen utilities
where possible, and where not virtualize with 32-bit general
register and 64-bit general register variants in the target
dependent layer.
o Expand/rework the LIR def/use flags to allow for more registers
(currently, we lose out on 16 MIPS floating point regs as
well as ARM's D16..D31 for lack of space in the masks).
o [Possibly] move the float/non-float determination of a register
from the target-dependent encoding to RegStorage. In other
words, replace IsFpReg(register_encoding_bits).

At the end of the day, all code in the target independent layer
should be using RegStorage, as should much of the target dependent
layer. Ideally, we won't be using the physical register number
encoding extracted from RegStorage (i.e. GetReg()) until the
NewLIRx() layer.

Change-Id: Idc5c741478f720bdd1d7123b94e4288be5ce52cb
3bc01748ef1c3e43361bdf520947a9d656658bf8 06-Feb-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> GenSpecialCase support for x86

Moved GenSpecialCase from being ARM specific to common code to allow
it to be used by x86 quick as well.

Change-Id: I728733e8f4c4da99af6091ef77e5c76ae0fee850
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
80b7f4f217958df6950291a5ae861249bf5b943d 13-Feb-2014 Vladimir Marko <vmarko@google.com> am 47c42cae: am 76559681: Merge "Generate ARM special methods from InlineMethod data."

* commit '47c42caef35bdc29229b2714d78e48fbd7dc57e6':
Generate ARM special methods from InlineMethod data.
502c2a84888b7da075049dcaaeb0156602304f65 06-Feb-2014 Vladimir Marko <vmarko@google.com> Generate ARM special methods from InlineMethod data.

Change-Id: I204b01660a1e515879524018d1371e31f41da59b
c9bf407643329fee7eb2603fdace46eebf618cc6 10-Feb-2014 Vladimir Marko <vmarko@google.com> Fix special getter/setter generation.

Change-Id: I381618bdcc46c51b50e94042f332db99c3a71a38
2bc47809febcf36369dd40877b8226318642b428 10-Feb-2014 Vladimir Marko <vmarko@google.com> Revert "Revert "Check FastInstance() early for special getters and setters.""

This reverts commit 632e458dc267fadfb8120be3ab02701e09e64875.

Change-Id: I5098c41ee84fbbb39397133a7ecfd367fecebe42
632e458dc267fadfb8120be3ab02701e09e64875 08-Feb-2014 Ian Rogers <irogers@google.com> Revert "Check FastInstance() early for special getters and setters."

This reverts commit 5dc5727261e87ba8a418e2d0e970c75f67e4ab79.

Change-Id: I3299c8ca5c3ce3f2de994bab61ea16a734f1de33
f33ffde784fdbbc123dc7d2a5470f2b8ff6c3de4 08-Feb-2014 Ian Rogers <irogers@google.com> Revert "Generate ARM special methods from InlineMethod data."

This reverts commit 1ca62346583210f64092a44a74b5947d51826e7a.

Change-Id: I9589fc6ff15fbea36edb99f6d336040a8e0b4e71
1ca62346583210f64092a44a74b5947d51826e7a 06-Feb-2014 Vladimir Marko <vmarko@google.com> Generate ARM special methods from InlineMethod data.

Change-Id: Icd3af7fae67f9bd33d218056509a23549d1eeba2
5dc5727261e87ba8a418e2d0e970c75f67e4ab79 05-Feb-2014 Vladimir Marko <vmarko@google.com> Check FastInstance() early for special getters and setters.

Perform the FastInstance() check for getters and setters
when they are detected by the inliner. This will help avoid
the FastInstance() check for inlining.

We also record the field offset and whether the field is
volatile and whether the method is static for use when
inlining or generating the special accessors.

Change-Id: I3f832fc9ae263883b8a984be89a3b7793398b55a
58af1f9385742f70aca4fcb5e13aba53b8be2ef4 19-Dec-2013 Vladimir Marko <vmarko@google.com> Clean up usage of carry flag condition codes.

On X86, kCondUlt and kCondUge are bound to CS and CC,
respectively, while on ARM it's the other way around. The
explicit binding in ConditionCode was wrong and misleading
and could lead to subtle bugs. Therefore, we detach those
constants and clean up usage. The CS and CC conditions are
now effectively unused but we keep them around as they may
eventually be useful.

And some minor cleanup and comments.
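
An illustrative mapping (assumed helper names, not ART's actual tables)
showing why a fixed binding was misleading:

  enum ConditionCode { kCondUlt, kCondUge };

  const char* ArmCondition(ConditionCode cc) {
    // ARM: unsigned-lower is LO (carry clear), unsigned >= is HS (carry set).
    return cc == kCondUlt ? "lo" : "hs";
  }

  const char* X86Condition(ConditionCode cc) {
    // x86: below is B (CF=1, i.e. carry set), above-or-equal is AE (CF=0).
    return cc == kCondUlt ? "b" : "ae";
  }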

Change-Id: Ic5ed81d86b6c7f9392dd8fe9474b3ff718fee595
5816ed48bc339c983b40dc493e96b97821ce7966 27-Nov-2013 Vladimir Marko <vmarko@google.com> Detect special methods at the end of verification.

This moves special method handling to method inliner
and prepares for eventual inlining of these methods.

Change-Id: I51c51b940fb7bc714e33135cd61be69467861352
31c2aac7137b69d5622eea09597500731fbee2ef 09-Dec-2013 Vladimir Marko <vmarko@google.com> Rename ClobberCalleeSave to *Caller*, fix it for x86.

Change-Id: I6a72703a11985e2753fa9b4520c375a164301433
0d82948094d9a198e01aa95f64012bdedd5b6fc9 12-Oct-2013 buzbee <buzbee@google.com> 64-bit prep

Preparation for 64-bit roll.
o Eliminated storing pointers in 32-bit int slots in LIR.
o General size reductions of common structures to reduce impact
of doubled pointer sizes:
- BasicBlock struct was 72 bytes, now is 48.
- MIR struct was 72 bytes, now is 64.
- RegLocation was 12 bytes, now is 8.
o Generally replaced uses of BasicBlock* pointers with 16-bit Ids.
o Replaced several doubly-linked lists with singly-linked to save
one stored pointer per node.
o We had quite a few uses of uintptr_t's that were a holdover from
the JIT (which used pointers to mapped dex & actual code cache
addresses rather than trace-relative offsets). Replaced those with
uint32_t's.
o Clean up handling of embedded data for switch tables and array data.
o Miscellaneous cleanup.

I anticipate one or two additional CLs to reduce the size of MIR and LIR
structs.

Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
d9c4fc94fa618617f94e1de9af5f034549100753 02-Oct-2013 Ian Rogers <irogers@google.com> Inflate contended lock word by suspending owner.

Bug 6961405.
Don't inflate monitors for Notify and NotifyAll.
Tidy lock word, handle recursive lock case alongside unlocked case and move
assembly out of line (except for ARM quick). Also handle null in out-of-line
assembly as the test is quick and the enter/exit code is already a safepoint.
To gain ownership of a monitor on behalf of another thread, monitor contenders
must not hold the monitor_lock_, so they wait on a condition variable.
Reduce size of per mutex contention log.
Be consistent in calling thin lock thread ids just thread ids.
Fix potential thread death races caused by the use of FindThreadByThreadId;
returned threads are now guaranteed to be either self or suspended.

Code size reduction on ARM boot.oat 0.2%.
Old nexus 7 speedup 0.25%, new nexus 7 speedup 1.4%, nexus 10 speedup 2.24%,
nexus 4 speedup 2.09% on DeltaBlue.

Change-Id: Id52558b914f160d9c8578fdd7fc8199a9598576a
252254b130067cd7a5071865e793966871ae0246 09-Sep-2013 buzbee <buzbee@google.com> More Quick compile-time tuning: labels & branches

This CL represents a roughly 3.5% performance improvement for the
compile phase of dex2oat. Most of the gain comes from avoiding
the generation of dex boundary LIR labels unless a debug listing
is requested. The other significant change is moving from a basic block
ending branch model of "always generate a fall-through branch, and then
delete it if we can" to a "only generate a fall-through branch if we need
it" model.

The data motivating these changes follow. Note that two areas of
potentially attractive gain remain: restructuring the assembler model and
reworking the register handling utilities. These will be addressed
in subsequent CLs.

--- data follows

The Quick compiler's assembler has shown up on profile reports a bit
more than seems reasonable. We've tried a few quick fixes to apparently
hot portions of the code, but without much gain. So, I've been looking at
the assembly process at a somewhat higher level. There look to be several
potentially good opportunities.

First, an analysis of the makeup of the LIR graph showed a surprisingly
high proportion of LIR pseudo ops. Using the boot classpath as a basis,
we get:

32.8% of all LIR nodes are pseudo ops.
10.4% are LIR instructions which require pc-relative fixups.
11.8% are LIR instructions that have been nop'd by the various
optimization passes.

Looking only at the LIR pseudo ops, we get:
kPseudoDalvikByteCodeBoundary 43.46%
kPseudoNormalBlockLabel 21.14%
kPseudoSafepointPC 20.20%
kPseudoThrowTarget 6.94%
kPseudoTarget 3.03%
kPseudoSuspendTarget 1.95%
kPseudoMethodExit 1.26%
kPseudoMethodEntry 1.26%
kPseudoExportedPC 0.37%
kPseudoCaseLabel 0.30%
kPseudoBarrier 0.07%
kPseudoIntrinsicRetry 0.02%
Total LIR count: 10167292

The standout here is the Dalvik opcode boundary marker. This is just a
label inserted at the beginning of the codegen for each Dalvik bytecode.
If we're also doing a verbose listing, this is also where we hang the
pretty-print disassembly string. However, this label was also
being used as a convenient way to find the target of switch case
statements (and, I think at one point was used in the Mir->GBC conversion
process).

This CL moves the use of kPseudoDalvikByteCodeBoundary labels to only
verbose listing runs, and replaces the codegen uses of the label with
the kPseudoNormalBlockLabel attached to the basic block that contains the
switch case target. Great savings here - 14.3% reduction in the number of
LIR nodes needed. After this CL, our LIR pseudo proportions drop to 21.6%
of all LIR. That's still a lot, but much better. Possible further
improvements via combining normal labels with kPseudoSafepointPC labels
where appropriate, and also perhaps reduce memory usage by using a
short-hand form for labels rather than a full LIR node. Also, many
of the basic block labels are no longer branch targets by the time
we get to assembly - cheaper to delete, or just ignore?

Here's the "after" LIR pseudo op breakdown:

kPseudoNormalBlockLabel 37.39%
kPseudoSafepointPC 35.72%
kPseudoThrowTarget 12.28%
kPseudoTarget 5.36%
kPseudoSuspendTarget 3.45%
kPseudoMethodEntry 2.24%
kPseudoMethodExit 2.22%
kPseudoExportedPC 0.65%
kPseudoCaseLabel 0.53%
kPseudoBarrier 0.12%
kPseudoIntrinsicRetry 0.04%
Total LIR count: 5748232

Not done in this CL, but it will be worth experimenting with actually
deleting LIR nodes from the graph when they are optimized away, rather
than just setting the NOP bit. Keeping them around is invaluable
during debugging - but when not debugging it may pay off if the cost of
node removal is less than the cost of traversing through dead nodes
in subsequent passes.

Next up (and partially in this CL - but mostly to be done in follow-on
CLs) is the overall assembly process. Inherited from the trace JIT,
the Quick compiler has a fairly simple-minded approach to instruction
assembly. First, a pass is made over the LIR list to assign offsets
to each instruction. Then, the assembly pass is made - which generates
the actual machine instruction bit patterns and pushes the instruction
data into the code_buffer. However, the code generator takes the "always
optimistic" approach to instruction selection and emits the shortest
instruction. If, during assembly, we find that a branch or load doesn't
reach, that short-form instruction is replaced with a longer sequence.

Of course, this invalidates the previously-computed offset calculations.
Assembly thus is an iterative process: compute offsets and then assemble
until we survive an assembly pass without invalidation. This seems
like a likely candidate for improvement. First, I analyzed the
number of retries required, and the reason for invalidation over the
boot classpath load.
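
A minimal sketch of that retry loop (structure and names assumed, not
the actual Mir2Lir assembler):

  enum class AssemblerStatus { kSuccess, kRetry };

  // Stub standing in for the offset-assignment pass over the LIR list.
  void AssignOffsets() { /* assign an offset to each LIR node */ }

  AssemblerStatus AssembleInstructions() {
    // Emit bits optimistically; if a short branch or load doesn't reach,
    // replace it with a longer sequence and request a retry, because all
    // downstream offsets are now stale.
    return AssemblerStatus::kSuccess;
  }

  int main() {
    AssemblerStatus status;
    do {
      AssignOffsets();
      status = AssembleInstructions();
    } while (status == AssemblerStatus::kRetry);
  }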

The results: more than half of methods don't require a retry, and
very few require more than 1 extra pass:

5 or more: 6 of 96334
4 or more: 22 of 96334
3 or more: 140 of 96334
2 or more: 1794 of 96334 - 2%
1 or more: 40911 of 96334 - 40%
0 retries: 55423 of 96334 - 58%

The interesting group here is the one that requires 1 retry. Looking
at the reason, we see three typical reasons:

1. A cbnz/cbz doesn't reach (only 7 bits of offset)
2. A 16-bit Thumb1 unconditional branch doesn't reach.
3. An unconditional branch which branches to the next instruction
is encountered, and deleted.

The first 2 cases are the cost of the optimistic strategy - nothing
much to change there. However, the interesting case is #3 - dead
branch elimination. A further analysis of the single retry group showed
that 42% of the methods (16305) that required a single retry did so
*only* because of dead branch elimination. The big question here is
why so many dead branches survive to the assembly stage. We have
a dead branch elimination pass which is supposed to catch these - perhaps
it's not working correctly, should be moved later in the optimization
process, or perhaps run multiple times.

Other things to consider:

o Combine the offset generation pass with the assembly pass. Skip
pc-relative fixup assembly (other than assigning offset), but push
LIR* for them into work list. Following the main pass, zip through
the work list and assemble the pc-relative instructions (now that we
know the offsets). This would significantly cut back on traversal
costs.

o Store the assembled bits into both the code buffer and the LIR.
In the event we have to retry, only the pc-relative instructions
would need to be assembled, and we'd finish with a pass over the
LIR just to dump the bits into the code buffer.

Change-Id: I50029d216fa14f273f02b6f1c8b6a0dde5a7d6a6
9b297bfc588c7d38efd12a6f38cd2710fc513ee3 06-Sep-2013 Ian Rogers <irogers@google.com> Refactor CompilerDriver::Compute..FieldInfo

Don't use non-const reference arguments.
Move ins before outs.

Change-Id: I7b251156388d8f07513b3da62ebfd29e5fd9ff76
f6c4b3ba3825de1dbb3e747a68b809c6cc8eb4db 25-Aug-2013 Mathieu Chartier <mathieuc@google.com> New arena memory allocator.

Before we were creating arenas for each method. The issue with doing this
is that we needed to memset each memory allocation. This can be improved
if you start out with arenas that contain all zeroed memory and recycle
them for each method. When you give memory back to the arena pool you do
a single memset to zero out all of the memory that you used.

Always inlined the fast path of the allocation code.

Removed the "zero" parameter since the new arena allocator always returns
zeroed memory.
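
A simplified sketch of the recycling scheme (class and method names
assumed): allocation is a bump of a cursor into pre-zeroed storage, and
release does one memset over only the bytes actually used.

  #include <cstddef>
  #include <cstring>
  #include <vector>

  class Arena {
   public:
    explicit Arena(size_t size) : storage_(size, 0), used_(0) {}

    void* Alloc(size_t bytes) {
      if (used_ + bytes > storage_.size()) {
        return nullptr;  // A real arena pool would chain a new arena here.
      }
      void* result = storage_.data() + used_;
      used_ += bytes;
      return result;  // Already zeroed: no per-allocation memset needed.
    }

    void Release() {
      memset(storage_.data(), 0, used_);  // One memset for the whole arena.
      used_ = 0;  // Ready for reuse by the next method's compilation.
    }

   private:
    std::vector<char> storage_;
    size_t used_;
  };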

Host dex2oat time on target oat apks (2 samples each).
Before:
real 1m11.958s
user 4m34.020s
sys 1m28.570s

After:
real 1m9.690s
user 4m17.670s
sys 1m23.960s

Target device dex2oat samples (Mako, Thinkfree.apk):
Without new arena allocator:
0m26.47s real 0m54.60s user 0m25.85s system
0m25.91s real 0m54.39s user 0m26.69s system
0m26.61s real 0m53.77s user 0m27.35s system
0m26.33s real 0m54.90s user 0m25.30s system
0m26.34s real 0m53.94s user 0m27.23s system

With new arena allocator:
0m25.02s real 0m54.46s user 0m19.94s system
0m25.17s real 0m55.06s user 0m20.72s system
0m24.85s real 0m55.14s user 0m19.30s system
0m24.59s real 0m54.02s user 0m20.07s system
0m25.06s real 0m55.00s user 0m20.42s system

Correctness of Thinkfree.apk.oat verified by diffing both of the oat files.

Change-Id: I5ff7b85ffe86c57d3434294ca7a621a695bf57a9
468532ea115657709bc32ee498e701a4c71762d4 05-Aug-2013 Ian Rogers <irogers@google.com> Entry point clean up.

Create set of entry points needed for image methods to avoid fix-up at load time:
- interpreter - bridge to interpreter, bridge to compiled code
- jni - dlsym lookup
- quick - resolution and bridge to interpreter
- portable - resolution and bridge to interpreter

Fix the JNI workaround to use the JNI workaround argument-rewriting code that had been
accidentally disabled.
Remove abstract method error stub, use interpreter bridge instead.
Consolidate trampoline (previously stub) generation in generic helper.
Simplify trampolines to jump directly into assembly code, keeps stack crawlable.
Dex: replace use of int with ThreadOffset for values that are thread offsets.
Tidy entry point routines between interpreter, jni, quick and portable.

Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e
(cherry picked from commit 848871b4d8481229c32e0d048a9856e5a9a17ef9)
848871b4d8481229c32e0d048a9856e5a9a17ef9 05-Aug-2013 Ian Rogers <irogers@google.com> Entry point clean up.

Create set of entry points needed for image methods to avoid fix-up at load time:
- interpreter - bridge to interpreter, bridge to compiled code
- jni - dlsym lookup
- quick - resolution and bridge to interpreter
- portable - resolution and bridge to interpreter

Fix the JNI workaround to use the JNI workaround argument-rewriting code that had been
accidentally disabled.
Remove abstract method error stub, use interpreter bridge instead.
Consolidate trampoline (previously stub) generation in generic helper.
Simplify trampolines to jump directly into assembly code, keeps stack crawlable.
Dex: replace use of int with ThreadOffset for values that are thread offsets.
Tidy entry point routines between interpreter, jni, quick and portable.

Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e
834b394ee759ed31c5371d8093d7cd8cd90014a8 31-Jul-2013 Brian Carlstrom <bdc@google.com> Merge remote-tracking branch 'goog/dalvik-dev' into merge-art-to-dalvik-dev

Change-Id: I323e9e8c29c3e39d50d9aba93121b26266c52a46
7655f29fabc0a12765de828914a18314382e5a35 29-Jul-2013 Ian Rogers <irogers@google.com> Portable refactorings.

Separate quick from portable entrypoints.
Move architectural dependencies into arch.

Change-Id: I9adbc0a9782e2959fdc3308215f01e3107632b7c
166db04e259ca51838c311891598664deeed85ad 26-Jul-2013 Ian Rogers <irogers@google.com> Move assembler out of runtime into compiler/utils.

Other directory layout bits of clean up. There is still work to separate quick
and portable in some files (e.g. argument visitor, proxy..).

Change-Id: If8fecffda8ba5c4c47a035f0c622c538c6b58351
7934ac288acfb2552bb0b06ec1f61e5820d924a4 26-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/comments issues

Change-Id: Iae286862c85fb8fd8901eae1204cd6d271d69496
6f485c62b9cfce3ab71020c646ab9f48d9d29d6d 19-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/indent issues

Change-Id: I7c1647f0c39e1e065ca5820f9b79998691ba40b1
2ce745c06271d5223d57dbf08117b20d5b60694a 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/braces issues

Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
7940e44f4517de5e2634a7e07d58d0fb26160513 12-Jul-2013 Brian Carlstrom <bdc@google.com> Create separate Android.mk for main build targets

The runtime, compiler, dex2oat, and oatdump are now in separate trees
to prevent dependency creep. They can now be individually built
without rebuilding the rest of the art projects. dalvikvm and jdwpspy
were already this way. Builds in the art directory should behave as
before, building everything including tests.

Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81