History log of /art/compiler/dex/quick/codegen_util.cc
Revision Date Author Comments
a26cb57f46fd3f27a930d9d688fe8670c1f24754 23-Apr-2015 David Srbecky <dsrbecky@google.com> ART stack unwinding fixes for libunwind/gdb/lldb.

dex2oat can already generate unwinding and symbol information, which
allows tools to create backtraces of mixed native and Java code.

This is a cherry pick from aosp/master which fixes several issues.
Most notably:
* It enables generation of ELF-64 on 64-bit systems in dex2oat (C
compilers already produce ELF-64). Libunwind requires ELF-64 on
64-bit systems for backtraces to work.
* It enables loading of ELF files with dlopen. This is required for
libunwind to be able to generate a backtrace of the current process
(i.e. the process requesting a backtrace of itself).
* It adds a unit test covering the above (32 vs 64 bit, in-process vs
out-of-process, application code vs framework code).
* Some other fixes and clean-ups of little significance on their own,
included because they make the important CLs cherry-pick cleanly.

This is a squash of the following commits from aosp/master:
7381010 ART: CFI Test
e1bbed2 ART: Blacklist CFI test for non-compiled run-tests
aab9f73 ART: Blacklist CFI test for JIT
4437219 ART: Blacklist CFI test for Heap Poisoning
a3a49fe Switch to using ELF-64 for 64-bit architectures.
297ed22 Write 64-bit address in DWARF if we are on 64-bit architecture.
24981a1 Set correct size of PT_PHDR ELF segment.
1a146bf Link .dynamic to .dynstr
67a0653 Make some parts of ELF more (pointer) aligned.
f50fa82 Enable 64-bit CFI tests.
49e1fab Use dlopen to load oat files.
5dedb80 Add more logging output for dlopen.
aa03870 Find the dlopened file using address rather than file path.
82e73dc Release dummy MemMaps corresponding to dlopen.
5c40961 Test that we can unwind framework code.
020c543 Add more log output to the CFI test.
88da3b0 ART: Fix CFI test wrt/ PIC
a70e5b9 CFI test: kill the other process in native code.
ad5fa8c Support generation of CFI in .debug_frame format.
90688ae Fix build - large frame size of ElfWriterQuick<ElfTypes>::Write.
97dabb7 Fix build breakage in dwarf_test.
388d286 Generate just single ARM mapping symbol.
f898087 Split .oat_patches to multiple sections.
491a7fe Fix build - large frame size of ElfWriterQuick<ElfTypes>::Write (again).
8363c77 Add --generate-debug-info flag and remove the other two flags.
461d72a Generate debug info for core.oat files.

Bug: 21924613
Change-Id: I3f944a08dd2ed1df4d8a807da4fee423fdd35eb7
b7fd412dd21eb362931b3a0716c94fd189a66295 04-Jun-2015 Vladimir Marko <vmarko@google.com> Revert "Quick: Create GC map based on compiler data. DO NOT MERGE"

This reverts commit 7cc8f9aa1349fd6cb0814a653ee2d1164a7fb9f7.

Change-Id: Iadb4462bf8e834c6a847c01ee6eb332a325de22c
c8d000a12d853a72999c96e3b73587bad2be6954 04-Jun-2015 Vladimir Marko <vmarko@google.com> Revert "Quick: Fix "select" pattern to update data used for GC maps. DO NOT MERGE"

This reverts commit fad2cbf97c71b9742ccd88cc1a5ba13fa918e677.

Change-Id: I175dd9e49014b71a300d987678032bd624a99cf1
fad2cbf97c71b9742ccd88cc1a5ba13fa918e677 25-Mar-2015 Vladimir Marko <vmarko@google.com> Quick: Fix "select" pattern to update data used for GC maps. DO NOT MERGE

Follow-up to
https://android-review.googlesource.com/143222

(cherry picked from commit 6e07183e822a32856da9eb60006989496e06a9cc)

Change-Id: I916743c845d9568063cd6a4b2ef71e9cbc43dee8
7cc8f9aa1349fd6cb0814a653ee2d1164a7fb9f7 20-Mar-2015 Vladimir Marko <vmarko@google.com> Quick: Create GC map based on compiler data. DO NOT MERGE

The Quick compiler and verifier sometimes disagree on dalvik
register types (fp/core/ref) for 0/null constants and merged
registers involving 0/null constants. Since the verifier is
more lenient, it can mark a register as a reference for GC
where Quick considers it a floating point register or a dead
register (which would have a ref/fp conflict if not dead).
If the compiler used an fp register to hold the zero value,
the core register or stack location used by GC based on the
verifier data can hold an invalid value.

Previously, as a workaround, we stored the fp zero value also
in the stack location or core register where GC would look
for it. This wasn't precise and may have missed some cases.

To fix this properly, we now generate GC maps based on the
compiler's notion of references if register promotion is
enabled.

Bug: https://code.google.com/p/android/issues/detail?id=147187

(cherry picked from commit 767c752fddc64e280dba507457e4f06002b5f678)

Change-Id: Id75428fd0a2f6bdd2ccb20ce75cdeab01150e455
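As an illustration of the data involved, a minimal sketch (hypothetical
names, not the actual ART code) of building a per-safepoint reference
bitmap from the compiler's, rather than the verifier's, notion of which
dalvik registers hold references:

    #include <cstdint>
    #include <vector>

    // One bit per dalvik register, set only when the *compiler* considers
    // the register a live reference at this safepoint.
    std::vector<uint8_t> BuildReferenceBitmap(const std::vector<bool>& is_reference) {
      std::vector<uint8_t> bitmap((is_reference.size() + 7) / 8, 0);
      for (size_t vreg = 0; vreg < is_reference.size(); ++vreg) {
        if (is_reference[vreg]) {
          bitmap[vreg / 8] |= static_cast<uint8_t>(1u << (vreg % 8));
        }
      }
      return bitmap;
    }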
3d21bdf8894e780d349c481e5c9e29fe1556051c 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Move mirror::ArtMethod to native

Optimizing + quick tests are passing, devices boot.

TODO: Test and fix bugs in mips64.

Saves 16 bytes per ArtMethod in most cases, a 7.5MB reduction in system PSS.
Some of the savings are from the removal of the virtual methods and direct
methods object arrays.

Bug: 19264997

(cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33)

Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d

Fix some ArtMethod related bugs

Added root visiting for runtime methods, not currently required
since the GcRoots in these methods are null.

Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes
--trace run-tests 005, 044.

Fixed an optimizing compiler bug where we used a normal stack location
instead of a double on ARM64; this fixes the debuggable tests.

TODO: Fix JDWP tests.

Bug: 19264997

Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3

ART: Fix casts for 64-bit pointers on 32-bit compiler.

Bug: 19264997
Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457

Fix JDWP tests after ArtMethod change

Fixes Throwable::GetStackDepth for exception event detection after
the internal stack trace representation change.

Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of
proxy method.

Bug: 19264997
Change-Id: I363e293796848c3ec491c963813f62d868da44d2

Fix accidental IMT and root marking regression

Was always using the conflict trampoline. Also included a fix for a
regression in GC time caused by extra roots. Most of the regression
was IMT.

Fixed a bug in DumpGcPerformanceInfo where we would get SIGABRT due to
a detached thread.

EvaluateAndApplyChanges:
From ~2500 -> ~1980
GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots

Bug: 19264997
Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0

Fix bogus image test assert

Previously we were comparing the size of the non-moving space to the
size of the image file.

Now we properly compare the size of the image space against the size
of the image file.

Bug: 19264997
Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a

[MIPS64] Fix art_quick_invoke_stub argument offsets.

The ArtMethod reference's size got bigger, so we need to move the other
args and leave enough space for the ArtMethod* and the 'this' pointer.

This fixes mips64 boot.

Bug: 19264997
Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
c2d3221037734567442c5fb9b11a7967b5c87b79 07-May-2015 Vladimir Marko <vmarko@google.com> Quick: Abolish kMirOpCheckPart2.

The tricks played with kMirOpCheckPart2 are making the
native GC map generation unnecessarily complex. They have
caused problems in the past and now there is a bad interaction
with the DCE. Rather than fixing it time and again, remove
the pseudo-insn.

(The whole purpose of those tricks seems to be to allow the
register tracking to be used for the throwing insn before
resetting the tracking for the next block. However, it's
questionable whether that's better than processing the
throwing insn with the subsequent instructions.)

Bug: 20736048

(cherry picked from commit e299f167c9559401548eab71678d4b779e46c2fb)

Change-Id: I8a60d26c5e6b6b608d68b8bb6b66d411f9a28f90
f80552b7e5f627a5dd07af017b7d65dec010ca48 07-May-2015 Vladimir Marko <vmarko@google.com> Quick: Abolish kMirOpCheckPart2.

The tricks played with kMirOpCheckPart2 are making the
native GC map generation unnecessarily complex. They have
caused problems in the past and now there is a bad interaction
with the DCE. Rather than fixing it time and again, remove
the pseudo-insn.

(The whole purpose of those tricks seems to be to allow the
register tracking to be used for the throwing insn before
resetting the tracking for the next block. However, it's
questionable whether that's better than processing the
throwing insn with the subsequent instructions.)

Bug: 20736048

(cherry picked from commit e299f167c9559401548eab71678d4b779e46c2fb)

Change-Id: I8a60d26c5e6b6b608d68b8bb6b66d411f9a28f90
0b49f0234e5c01de23501b6a70f09f491fe7b3e0 07-May-2015 Vladimir Marko <vmarko@google.com> Quick: Abolish kMirOpCheckPart2.

The tricks played with kMirOpCheckPart2 are making the
native GC map generation unnecessarily complex. They have
caused problems in the past and now there is a bad interaction
with the DCE. Rather than fixing it time and again, remove
the pseudo-insn.

(The whole purpose of those tricks seems to be to allow the
register tracking to be used for the throwing insn before
resetting the tracking for the next block. However, it's
questionable whether that's better than processing the
throwing insn with the subsequent instructions.)

Bug: 20736048

(cherry picked from commit e299f167c9559401548eab71678d4b779e46c2fb)

Change-Id: Ifae6c5bd961a2619b50fd3440261762cb9151460
e299f167c9559401548eab71678d4b779e46c2fb 07-May-2015 Vladimir Marko <vmarko@google.com> Quick: Abolish kMirOpCheckPart2.

The tricks played with kMirOpCheckPart2 are making the
native GC map generation unnecessarily complex. They have
caused problems in the past and now there is a bad interaction
with the DCE. Rather than fixing it time and again, remove
the pseudo-insn.

(The whole purpose of those tricks seems to be to allow the
register tracking to be used for the throwing insn before
resetting the tracking for the next block. However, it's
questionable whether that's better than processing the
throwing insn with the subsequent instructions.)

Bug: 20736048
Change-Id: I4767e4609914d3b6990da4416e5093e4ca209780
2cebb24bfc3247d3e9be138a3350106737455918 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Replace NULL with nullptr

Also fixed some lines that were too long, and a few other minor
details.

Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb
8dc7324da5bd0f2afd2ab558ab04882329a61fe8 12-Apr-2015 David Srbecky <dsrbecky@google.com> Add --include-cfi compiler option.

Decouple generation of CFI from the rest of the debug symbols.
This makes it possible to generate an oat file with CFI but without
the rest of the debug symbols.

This is in line with the intention of the .eh_frame section.
The section does not have the .debug_ prefix because it
is considered somewhat different from the rest of the debug symbols.

Change-Id: I32816ecd4f30ac4e0dc69d69a4993e349c737f96
c6b4dd8980350aaf250f0185f73e9c42ec17cd57 07-Apr-2015 David Srbecky <dsrbecky@google.com> Implement CFI for Optimizing.

CFI is necessary for stack unwinding in gdb, lldb, and libunwind.

Change-Id: I1a3480e3a4a99f48bf7e6e63c4e83a80cfee40a2
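To give a flavor of what CFI means concretely, here is an illustrative
sketch of a DWARF call-frame opcode writer; the opcode values come from
the DWARF spec, but the CfiWriter class itself is invented, not ART's
actual emitter:

    #include <cstdint>
    #include <vector>

    struct CfiWriter {
      std::vector<uint8_t> buf;
      void Uleb(uint32_t v) {  // unsigned LEB128, used for DWARF operands
        do { uint8_t b = v & 0x7f; v >>= 7; buf.push_back(v != 0 ? (b | 0x80) : b); } while (v != 0);
      }
      void AdvanceLoc(uint8_t delta) { buf.push_back(0x40 | (delta & 0x3f)); }  // DW_CFA_advance_loc
      void DefCfaOffset(uint32_t off) { buf.push_back(0x0e); Uleb(off); }       // DW_CFA_def_cfa_offset
      void RegAtCfa(uint8_t reg, uint32_t factored_off) {                       // DW_CFA_offset
        buf.push_back(0x80 | (reg & 0x3f)); Uleb(factored_off);
      }
    };

A prologue like "sub sp, #16; str lr, [sp, #12]" would be described as
AdvanceLoc(4); DefCfaOffset(16); RegAtCfa(14, 1) (with a data alignment
factor of -4), which is exactly what an unwinder needs to walk the frame.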
1961b609bfefaedb71cee3651c4f931cc3e7393d 08-Apr-2015 Vladimir Marko <vmarko@google.com> Quick: PC-relative loads from dex cache arrays on x86.

Rewrite all PC-relative addressing on x86 and implement
PC-relative loads from dex cache arrays. Don't adjust the
base to point to the start of the method, let it point to
the anchor, i.e. the target of the "call +0" insn.

Change-Id: Ic22544a8bc0c5e49eb00a75154dc8f3ead816989
1109fb3cacc8bb667979780c2b4b12ce5bb64549 07-Apr-2015 David Srbecky <dsrbecky@google.com> Implement CFI for Quick.

CFI is necessary for stack unwinding in gdb, lldb, and libunwind.

Change-Id: Ic3b84c9dc91c4bae80e27cda02190f3274e95ae8
8c57831b2b07185ee1986b9af68a351e1ca584c3 07-Apr-2015 David Srbecky <dsrbecky@google.com> Remove the old CFI infrastructure.

Change-Id: I12a17a8a1c39ffccaa499c328ebac36e4d74dc4e
cc23481b66fd1f2b459d82da4852073e32f033aa 07-Apr-2015 Vladimir Marko <vmarko@google.com> Promote pointer to dex cache arrays on arm.

Do the use-count analysis on temps (ArtMethod* and the new
PC-relative temp) in Mir2Lir, rather than MIRGraph. MIRGraph
isn't really supposed to know how the ArtMethod* is used by
the backend.

Change-Id: Iaf56a46ae203eca86281b02b54f39a80fe5cc2dd
b207e1473dda1730604a28db2b4fa52f2998aeae 02-Apr-2015 Vladimir Marko <vmarko@google.com> Pass linker patches around as const.

Change-Id: I0eabd713d29475db9eb6e186f331dbfb00e0cf6b
6f7158927fee233255f8e96719c374694b10cad3 30-Mar-2015 David Srbecky <dsrbecky@google.com> Write .debug_line section using the new DWARF library.

Also simplify dex to java mapping and handle mapping
in prologues and epilogues.

Change-Id: I410f06024580f2a8788f2c93fe9bca132805029a
20f85597828194c12be10d3a927999def066555e 19-Mar-2015 Vladimir Marko <vmarko@google.com> Fixed layout for dex caches in boot image.

Define a fixed layout for dex cache arrays (type, method,
string and field arrays) for dex caches in the boot image.
This gives those arrays fixed offsets from the boot image
code and allows PC-relative addressing of their elements.

Use the PC-relative load on arm64 for relevant instructions,
i.e. invoke-static, invoke-direct, const-string,
const-class, check-cast and instance-of. This reduces the
arm64 boot.oat on Nexus 9 by 1.1MiB.

This CL provides the infrastructure and shows on the arm64
the gains that we can achieve by having fixed dex cache
arrays' layout. To fully use this for the boot images, we
need to implement the PC-relative addressing for other
architectures. To achieve similar gains for apps, we need
to move the dex cache arrays to a .bss section of the oat
file. These changes will be implemented in subsequent CLs.

(Also remove some compiler_driver.h dependencies to reduce
incremental build times.)

Change-Id: Ib1859fa4452d01d983fd92ae22b611f45a85d69b
356a1811f2f79d98194475fdbfb5f6b7768455b5 27-Mar-2015 Pavel Vyssotski <pavel.n.vyssotski@intel.com> Quick: Finding upper half of kMirOpCheckPart2 should pass through empty blocks

Mir2Lir::InitReferenceVRegs, when trying to find the throwing instruction
for kMirOpCheckPart2, should traverse possibly empty blocks that compiler
optimizations could generate between them.

Change-Id: I2ab29dd36635fd4c4ef2dd81b51e571e206775e6
Signed-off-by: Pavel Vyssotski <pavel.n.vyssotski@intel.com>
6e07183e822a32856da9eb60006989496e06a9cc 25-Mar-2015 Vladimir Marko <vmarko@google.com> Quick: Fix "select" pattern to update data used for GC maps.

Follow-up to
https://android-review.googlesource.com/143222

Change-Id: I1c12af9a19f76e64fd209f6cc2eaec5587b3083b
f6737f7ed741b15cfd60c2530dab69f897540735 23-Mar-2015 Vladimir Marko <vmarko@google.com> Quick: Clean up Mir2Lir codegen.

Clean up WrapPointer()/UnwrapPointer() and OpPcRelLoad().

Change-Id: I1a91f01e1e779599c77f3f6efcac2a6ad34629cf
767c752fddc64e280dba507457e4f06002b5f678 20-Mar-2015 Vladimir Marko <vmarko@google.com> Quick: Create GC map based on compiler data.

The Quick compiler and verifier sometimes disagree on dalvik
register types (fp/core/ref) for 0/null constants and merged
registers involving 0/null constants. Since the verifier is
more lenient, it can mark a register as a reference for GC
where Quick considers it a floating point register or a dead
register (which would have a ref/fp conflict if not dead).
If the compiler used an fp register to hold the zero value,
the core register or stack location used by GC based on the
verifier data can hold an invalid value.

Previously, as a workaround, we stored the fp zero value also
in the stack location or core register where GC would look
for it. This wasn't precise and may have missed some cases.

To fix this properly, we now generate GC maps based on the
compiler's notion of references if register promotion is
enabled.

Bug: https://code.google.com/p/android/issues/detail?id=147187
Change-Id: Id3a2f863b16bdb8969df7004c868773084aec421
6ea651f0f4c7de4580beb2e887d86802c1ae0738 24-Feb-2015 Maja Gagic <maja.gagic@imgtec.com> Initial support for quick compiler on MIPS64r6.

Change-Id: I6f43027b84e4a98ea320cddb972d9cf39bf7c4f8
d37f91902048b23ad5fe5b20aba0ebc92e0b4896 04-Mar-2015 Andreas Gampe <agampe@google.com> ART: Do not produce CFI when not asked for

Insignificant time savings on the host, but also reduces native
allocation size.

Change-Id: Iea3d335e5375a0076306059d094e5b994e24b9e6
80b96d1a76790527f72a660ac03d9c215eed17ce 19-Feb-2015 Vladimir Marko <vmarko@google.com> Replace a few std::vector with ArenaVector in Mir2Lir.

Change-Id: I7867d60afc60f57cdbbfd312f02883854d65c805
a78ef44266c38cc4895554e973156a7c7896dd87 12-Feb-2015 Chao-ying Fu <chao-ying.fu@intel.com> ART: Fix InsertCaseLabel to return boundary_lir always

This patch doesn't return new_label when cu_->verbose is set, because
we will not assign offsets to new_label at this stage.

Change-Id: Ie7f625848b0cf7cabfbba694b5c20b0784bc8501
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
72f53af0307b9109a1cfc0671675ce5d45c66d3a 12-Nov-2014 Chao-ying Fu <chao-ying.fu@intel.com> ART: Remove MIRGraph::dex_pc_to_block_map_

This patch removes MIRGraph::dex_pc_to_block_map_, adds a local
variable dex_pc_to_block_map inside MIRGraph::InlineMethod(), and
updates several functions to pass dex_pc_to_block_map.
The goal is to limit the scope of dex_pc_to_block_map and
the usage of FindBlock, so that various compiler optimizations
cannot rely on dex pcs to look up basic blocks, which avoids
duplicated dex pc issues.
Also, this patch changes quick targets to use successor blocks
for switch case target generation at Mir2Lir::InstallSwitchTables().

Change-Id: I9f571efebd2706b4e1606279bd61f3b406ecd1c4
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
9c462086269324350516b3394d478f1d71a4b5d1 27-Jan-2015 Andreas Gampe <agampe@google.com> ART: Even more Quick cleanup

Remove Backend.

Change-Id: I247cc65ccda6a362ba1a8f5e73e7f12ecd980a87
0b9203e7996ee1856f620f95d95d8a273c43a3df 23-Jan-2015 Andreas Gampe <agampe@google.com> ART: Some Quick cleanup

Make several fields const in CompilationUnit. May benefit some Mir2Lir
code that repeats tests, and in general immutability is good.

Remove compiler_internals.h and refactor some other headers to reduce
overly broad imports (and thus forced recompiles on changes).

Change-Id: I898405907c68923581373b5981d8a85d2e5d185a
7e499925f8b4da46ae51040e9322690f3df992e6 06-Jan-2015 Andreas Gampe <agampe@google.com> ART: Remove LowestSetBit and IsPowerOfTwo

Remove those functions from Mir2Lir and replace with functionality
from utils.h.

Change-Id: Ieb67092b22d5d460b5241c7c7931c15b9faf2815
e21dc3db191df04c100620965bee4617b3b24397 09-Dec-2014 Andreas Gampe <agampe@google.com> ART: Swap-space in the compiler

Introduce a swap-space and corresponding allocator to transparently
switch native allocations to memory backed by a file.

Bug: 18596910

(cherry picked from commit 62746d8d9c4400e4764f162b22bfb1a32be287a9)

Change-Id: I131448f3907115054a592af73db86d2b9257ea33
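The underlying idea can be sketched in a few lines; the function below is
illustrative only (not dex2oat's actual SwapSpace API) and assumes a
POSIX environment:

    #include <cstddef>
    #include <cstdlib>
    #include <sys/mman.h>
    #include <unistd.h>

    // Back an allocation with an unlinked temporary file so its pages can
    // be written out to disk instead of pinning anonymous memory.
    void* AllocInSwapFile(size_t size) {
      char path[] = "/tmp/dex2oat-swap-XXXXXX";
      int fd = mkstemp(path);
      if (fd < 0) return nullptr;
      unlink(path);  // keep the fd, drop the name
      if (ftruncate(fd, static_cast<off_t>(size)) != 0) { close(fd); return nullptr; }
      void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      close(fd);     // the mapping keeps the file alive
      return (p == MAP_FAILED) ? nullptr : p;
    }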
62746d8d9c4400e4764f162b22bfb1a32be287a9 09-Dec-2014 Andreas Gampe <agampe@google.com> ART: Swap-space in the compiler

Introduce a swap-space and corresponding allocator to transparently
switch native allocations to memory backed by a file.

Bug: 18596910
Change-Id: I131448f3907115054a592af73db86d2b9257ea33
717a3e447c6f7a922cf9c3efe522747a187a045d 13-Nov-2014 Serguei Katkov <serguei.i.katkov@intel.com> Re-factor Quick ABI support

Now every architecture must provide a mapper between
VR parameters and physical registers. Additionally, an
architecture can provide a bulk copy helper for the
GenDalvikArgs utility.
Everything else becomes common code:
GetArgMappingToPhysicalReg, GenDalvikArgsNoRange,
GenDalvikArgsRange, FlushIns.

The mapper now uses the shorty representation of the input
parameters. This is required because the location alone is not
enough to detect the type of a parameter (fp or core).
For the details
see https://android-review.googlesource.com/#/c/113936/.

Change-Id: Ie762b921e0acaa936518ee6b63c9a9d25f83e434
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
27dee8bcd7b4a53840b60818da8d2c819ef199bd 02-Dec-2014 Mark Mendell <mark.p.mendell@intel.com> X86_64 QBE: use RIP addressing

Take advantage of RIP addressing in 64 bit mode to improve the code
generation for accesses to the constant area as well as packed switches.
Avoid computing the address of the start of the method, which is needed
in 32 bit mode.

To do this, we add a new 'pseudo-register' kRIPReg to minimize the
changes needed to get the new addressing mode to be generated.

Change-Id: Ia28c93f98b09939806d91ff0bd7392e58996d108
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
ab972ef472001fa113d54486d7592979e33480b3 04-Dec-2014 Mathieu Chartier <mathieuc@google.com> Remove method verification results right after compiling a method

This saves memory since it allows the code arrays of methods
compiled later to reuse the RAM we just freed from the
verification results.

GmsCore.apk:
Before: dex2oat took 77.383s (threads: 2) arena alloc=6MB java alloc=30MB native alloc=77MB free=13KB
After: dex2oat took 72.180s (threads: 2) arena alloc=6MB java alloc=30MB native alloc=60MB free=13KB

Bug: 18596910
Change-Id: I5d6df380e4fe58751a2b304202083f4d30b33b7c
(cherry picked from commit 25fda92083d5b93b38cc1f6b12ac6a44d992d6a4)
25fda92083d5b93b38cc1f6b12ac6a44d992d6a4 04-Dec-2014 Mathieu Chartier <mathieuc@google.com> Remove method verification results right after compiling a method

This saves memory since it allows the code arrays of methods
compiled later to reuse the RAM we just freed from the
verification results.

GmsCore.apk:
Before: dex2oat took 77.383s (threads: 2) arena alloc=6MB java alloc=30MB native alloc=77MB free=13KB
After: dex2oat took 72.180s (threads: 2) arena alloc=6MB java alloc=30MB native alloc=60MB free=13KB

Bug: 18596910
Change-Id: I5d6df380e4fe58751a2b304202083f4d30b33b7c
7ab2fce83cd72c0963128b098a78606e77ea15d5 28-Nov-2014 Vladimir Marko <vmarko@google.com> Refactor handling of conditional branches with known result.

Detect IF_cc and IF_ccZ instructions with known results in
the basic block optimization phase (instead of the codegen
phase) and replace them with GOTO/NOP. Kill blocks that are
unreachable as a result.

Change-Id: I169c2fa6f1e8af685f4f3a7fe622f5da862ce329
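As a minimal illustration of the replacement (invented names, not the
actual MIR representation):

    #include <cstdint>

    enum class Opcode { kIfEqz, kGoto, kNop };

    // An IF_EQZ whose operand is a known constant degenerates to a GOTO
    // (branch always taken) or a NOP (branch never taken).
    Opcode FoldIfEqz(bool operand_is_constant, int32_t constant_value) {
      if (!operand_is_constant) return Opcode::kIfEqz;  // result unknown, keep the branch
      return constant_value == 0 ? Opcode::kGoto : Opcode::kNop;
    }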
743b98cd3d7db1cfd6b3d7f7795e8abd9d07a42d 24-Nov-2014 Vladimir Marko <vmarko@google.com> Skip null check in MarkGCCard() for known non-null values.

Use GVN's knowledge of non-null values to set a new MIR flag
for IPUT/SPUT/APUT to skip the value null check.

Change-Id: I97a8d1447acb530c9bbbf7b362add366d1486ee1
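For context, a sketch of a card-marking write barrier and the null check
being skipped; the shift and marker values here are assumptions for
illustration, not ART's exact constants:

    #include <cstdint>

    constexpr uintptr_t kCardShift = 10;  // assumption: 1 KiB cards
    constexpr uint8_t kCardDirty = 0x70;  // assumption: "dirty" marker value

    // Storing null creates no old-to-new pointer, hence the check; when GVN
    // proves the stored value non-null, the check can be elided.
    inline void MarkGCCard(uint8_t* card_table, const void* stored_value, const void* holder) {
      if (stored_value != nullptr) {
        card_table[reinterpret_cast<uintptr_t>(holder) >> kCardShift] = kCardDirty;
      }
    }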
807140048f82a2b87ee5bcf337f23b6a3d1d5269 21-Nov-2014 Mathieu Chartier <mathieuc@google.com> Add fast string sharpening

String sharpening changes const strings to PC-relative loads instead
of always going through the dex cache. This saves code size and
probably improves performance slightly.

Before: 49602992 system@framework@boot.oat
After: 49385904 system@framework@boot.oat

Pre-cursor to removing dex_cache_strings_ field from ArtMethod.

Bug: 17643507

Change-Id: I1787f48774631eee0accafeea257aa8d0e91e8d6
bf535be514570fc33fc0a6347a87dcd9097d9bfd 19-Nov-2014 Vladimir Marko <vmarko@google.com> Add card mark to filled-new-array.

Bug: 18032332
Change-Id: I35576b27f9115e4d0b02a11afc5e483b9e93a04a
785d2f2116bb57418d81bb55b55a087afee11053 04-Nov-2014 Andreas Gampe <agampe@google.com> ART: Replace COMPILE_ASSERT with static_assert (compiler)

Replace all occurrences of COMPILE_ASSERT in the compiler tree.

Change-Id: Icc40a38c8bdeaaf7305ab3352a838a2cd7e7d840
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f 31-Oct-2014 Ian Rogers <irogers@google.com> Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags.

Fix associated errors about unused parameters and implicit sign conversions.
For sign conversion this was largely in the area of enums, so add ostream
operators for the affected enums and fix tools/generate-operator-out.py.
Tidy arena allocation code and arena allocated data types, rather than fixing
new and delete operators.
Remove dead code.

Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
d8c3e3608a7b47e82186e4f8118541ef06d9eab2 08-Oct-2014 Alexei Zavjalov <alexei.zavjalov@intel.com> ART: X86: GenLongArith should handle overlapped VRs

In a case, when src and dest VRs are overlapped when we called
GenLongArith it may cause the incorrect use of regs.

The solution is to map src to an physical reg and work with this
reg instead of mem.

Renamed BadOverlap() to PartiallyIntersects() for consistency.

Change-Id: Ia3fc7f741f0a92556e1b2a1b084506662ef04c9d
Signed-off-by: Katkov, Serguei I <serguei.i.katkov@intel.com>
Signed-off-by: Alexei Zavjalov <alexei.zavjalov@intel.com>
27cc09337cdff14f592f4e22fd235809ebe0d6a7 08-Sep-2014 Matteo Franchin <matteo.franchin@arm.com> AArch64: oat patches should be 32-bit ints.

This makes the arm64 backend consistent with the behaviour of the code
in oat_writer.cc and in the patchoat tool.
It also reduces the size of boot.oat by 1.6% (aosp_arm64-eng build).

Change-Id: Ia0b96737159c08955cd7b776ee396ff578cd58f6
750359753444498d509a756fa9a042e9f3c432df 12-Sep-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> ART: Deprecate CompilationUnit's code_item

The code_item field is tracked in both the CompilationUnit and the MIRGraph.
However, the existence of this field in CompilationUnit promotes bad practice
because it creates the assumption that only a single code_item can be part of a method.

This patch deprecates this field and updates MIRGraph methods to make it
easy to get the same information as before. Part of this is the update to
the GetNumDalvikInsn interface, which ensures that all code_items in the MIRGraph are counted.

Some dead code was also removed because it was not friendly to these updates.

Change-Id: Ie979be73cc56350321506cfea58f06d688a7fe99
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
f4da675bbc4615c5f854c81964cac9dd1153baea 01-Aug-2014 Vladimir Marko <vmarko@google.com> Implement method calls using relative BL on ARM.

Store the linker patches with each CompiledMethod instead of
keeping them in CompilerDriver. Reorganize oat file creation
to apply the patches as we're writing the method code. Add
framework for platform-specific relative call patches in the
OatWriter. Implement relative call patches for ARM.

Change-Id: Ie2effb3d92b61ac8f356140eba09dc37d62290f8
e39c54ea575ec710d5e84277fcdcc049f8acb3c9 22-Sep-2014 Vladimir Marko <vmarko@google.com> Deprecate GrowableArray, use ArenaVector instead.

Purge GrowableArray from Quick and Portable.
Remove GrowableArray<T>::Iterator.

Change-Id: I92157d3a6ea5975f295662809585b2dc15caa1c6
589e046c483ca0dbee6c28fb617997f43ee28b94 05-Sep-2014 Serguei Katkov <serguei.i.katkov@intel.com> Slow path should break def tracking

A slow path usually results in an invocation of the runtime. The
runtime must be able to assume that all VRs on the stack are up to
date. To do this, we reset the def tracking system at the moment a
slow path is added, and as a result the actual writes to the stack
for all VRs will not be optimized away.

The decision is conservative to be safe; however, probably not all
runtime calls actually require VRs to be on the stack. In that case
we would need to insert the def tracking reset in all places where a
dangerous slow path is used.

Change-Id: I2cb7698a12c17354060fdbb944e1da1fb922c23b
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
de0b996661351450fa4d918706c5322e001c29c9 27-Aug-2014 Andreas Gampe <agampe@google.com> ART: Fix read-out-of-bounds in the compiler

In case of a wide dalvik register, asking for the constant value
can lead to a read out of bounds.

Bug: 17302671

(cherry picked from commit ade731854d18839823e57fb2d3d67238c5467d15)

Change-Id: Ie1849cd67cc418c97cbd7a8524f027f9b66e4c96
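The failure mode is easy to sketch; the shape below is hypothetical, not
the actual MIRGraph code:

    #include <cstdint>
    #include <vector>

    // The high half of a wide value lives in vreg + 1, so querying the
    // constant of the last register slot reads past the end of the array.
    int64_t ConstantValueWide(const std::vector<int32_t>& constants, size_t vreg) {
      return (static_cast<int64_t>(constants[vreg + 1]) << 32) |  // out of bounds for the last vreg
             static_cast<uint32_t>(constants[vreg]);
    }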
ade731854d18839823e57fb2d3d67238c5467d15 27-Aug-2014 Andreas Gampe <agampe@google.com> ART: Fix read-out-of-bounds in the compiler

In case of a wide dalvik register, asking for the constant value
can lead to a read out of bounds.

Bug: 17302671
Change-Id: Ie1849cd67cc418c97cbd7a8524f027f9b66e4c96
8d0d03e24325463f0060abfd05dba5598044e9b1 07-Jun-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> ART: Change temporaries to positive names

Changes compiler temporaries to have positive names. The numbering now
puts them above the code VRs (locals + ins, in that order). The patch also
introduces APIs to query the number of temporaries, locals and ins.

The compiler temp infrastructure suffered from several issues
which are also addressed by this patch:
-There is no longer a queue of compiler temps. This would be polluted
with Method* when post opts were called multiple times.
-Sanity checks have been added to allow requesting of temps from BE
and to prevent temps after frame is committed.
-None of the structures holding temps can overflow because they are
allocated to hold the maximum number of temps. Thus temps can be
requested by the BE with no problem.
-Since the queue of compiler temps is no longer maintained, it is no
longer possible to refer to a temp that has invalid ssa (because it
was requested before ssa was run).
-The BE can now request temps after all ME allocations and it is guaranteed
to actually receive them.
-ME temps are now treated like normal VRs in all cases with no special
handling. Only the BE temps are handled specially because there are no
references to them from MIRs.
-Deprecated and removed several fields in CompilationUnit that saved
register information and updated callsites to call the new interface from
MIRGraph.

Change-Id: Ia8b1fec9384a1a83017800a59e5b0498dfb2698c
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
e3ea83811d47152c00abea24a9b420651a33b496 08-Aug-2014 Yevgeny Rouban <yevgeny.y.rouban@intel.com> ART source line debug info in OAT files

OAT files have enough source line information for ART runtime needs like
jumping to/from the interpreter and thread suspension. But this information
is not enough for finer-grained source-level debugging and low-level
profiling (VTune or perf).

This patch adds to OAT files two additional sections:
.debug_line - DWARF formatted Elf32 section with detailed source line
information (mapping from native PC to Java source lines).

In addition to the debugging symbols added using the dex2oat option
--include-debug-symbols, the source line information is added to
the section .debug_line.

The source line info can be read by many Elf reading tools like objdump,
readelf, dwarfdump, gdb, perf, VTune, ...

gdb can use this debug line information on x86. In 64-bit mode
the information can be used if the oat file is mapped in the lower
address space (addresses have the upper 32 bits zeroed). Relocation works.

Testing:
1. art/test/run-test --host --gdb [--64] 001-HelloWorld
2. in gdb: break Main.java:19
3. in gdb: break Runtime.java:111
4. in gdb: run - stops at void java.lang.Runtime.<init>()
5. in gdb: backtrace - shows call stack down to main()
6. in gdb: continue - stops at void Main.main() (only in 32-bit mode)
7. in gdb: backtrace - shows call stack down to main()
8. objdump -W <oat-file> - addresses are from VMA range of .text
section reported by objdump -h <file>
9. dwarfdump -ka <oat-file> - no errors expected

Size of aosp-x86-eng boot.oat increased by 11% from 80.5Mb to 89.2Mb
with the two added sections .debug_line (7.2Mb) and .rel.debug (1.5Mb).

Change-Id: Ib8828832686e49782a63d5529008ff4814ed9cda
Signed-off-by: Yevgeny Rouban <yevgeny.y.rouban@intel.com>
4fc785398707ede68f29768748b7fe5fa39dde24 07-Aug-2014 Fred Shih <ffred@google.com> Fixed build breakage due to incorrect class TypeId.

Fixed an incorrect type id being inserted into the code buffer and got rid of
inefficient pointer wrapping in LoadClassType.

Change-Id: I7ee1d957ebcd816445c26199723ac50787d926d7
e7f82e2515f47f3c3292281312d7031a34a58ffc 06-Aug-2014 Fred Shih <ffred@google.com> Added support for patching classes from different dex files.

Added support for class patching from different dex files and moved
ScopedObjectAccess from the quick compiler to driver. Slight refactoring
for clarity.

Bug: 16656190
Change-Id: I107fcbce75db42ca61321ea1c5d5f236680a1b3d
547cdfd21ee21e4ab9ca8692d6ef47c62ee7ea52 05-Aug-2014 Tong Shen <endlessroad@google.com> Emit CFI for x86 & x86_64 JNI compiler.

Now for host-side x86 & x86_64 ART, we are able to get a complete stack trace even with mixed C/C++ & Java stack frames.

Testing:
1. art/test/run-test --host --gdb [--64] --no-relocate 005
2. In gdb, run 'b art::Class_classForName' which is implementation of a Java native method, then 'r'
3. In gdb, run 'bt'. You should see stack frames down to main()

Change-Id: I2d17e9aa0f6d42d374b5362a15ea35a2fce96302
8081d2b8d7a743729557051d0294e040e61c747a 31-Jul-2014 Vladimir Marko <vmarko@google.com> Create allocator adapter for using Arena in std containers.

Create ArenaAllocatorAdapter, similar to the existing
ScopedArenaAllocatorAdapter, for allocating memory for
standard containers via the ArenaAllocator. Add the ability
to specify allocation kind rather than just kArenaAllocSTL
to both adapters. Move the scoped arena allocator to the
scoped_arena_containers.h header file.

Define template aliases for containers using the new adapter
and change a few MIRGraph and Mir2Lir members to use them.

Change-Id: I9bbc50248e0fed81729497b848cb29bf68444268
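A minimal sketch of the adapter idea; the toy bump-pointer Arena stands in
for ART's ArenaAllocator and the signatures are illustrative only:

    #include <cstddef>
    #include <new>
    #include <vector>

    class Arena {
     public:
      void* Alloc(size_t bytes) {
        size_t aligned = (bytes + 15) & ~size_t{15};
        if (used_ + aligned > sizeof(buffer_)) throw std::bad_alloc();
        void* p = buffer_ + used_;
        used_ += aligned;
        return p;
      }
     private:
      alignas(16) unsigned char buffer_[1 << 16];
      size_t used_ = 0;
    };

    template <typename T>
    class ArenaAllocatorAdapter {
     public:
      using value_type = T;
      explicit ArenaAllocatorAdapter(Arena* arena) : arena_(arena) {}
      template <typename U>
      ArenaAllocatorAdapter(const ArenaAllocatorAdapter<U>& other) : arena_(other.arena_) {}
      T* allocate(size_t n) { return static_cast<T*>(arena_->Alloc(n * sizeof(T))); }
      void deallocate(T*, size_t) {}  // arena memory is released wholesale, not per object
      Arena* arena_;
    };

    template <typename T, typename U>
    bool operator==(const ArenaAllocatorAdapter<T>& a, const ArenaAllocatorAdapter<U>& b) {
      return a.arena_ == b.arena_;
    }
    template <typename T, typename U>
    bool operator!=(const ArenaAllocatorAdapter<T>& a, const ArenaAllocatorAdapter<U>& b) {
      return !(a == b);
    }

    // Usage: Arena arena;
    //        std::vector<int, ArenaAllocatorAdapter<int>> v{ArenaAllocatorAdapter<int>(&arena)};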
147eb41b53729ec8d5c188d1cac90964a51afb8a 11-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86""""

This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1.

Bug: 16256184
Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73

Conflicts:
compiler/dex/quick/arm64/target_arm64.cc
compiler/image_test.cc
runtime/fault_handler.cc
69dfe51b684dd9d510dbcb63295fe180f998efde 11-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86""""

This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1.

Bug: 16256184
Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73
ccc60264229ac96d798528d2cb7dbbdd0deca993 05-Jul-2014 Andreas Gampe <agampe@google.com> ART: Rework TargetReg(symbolic_reg, wide)

Make the standard implementation in Mir2Lir and the specialized one
in the x86 backend return a pair when wide = "true". Introduce
WideKind enumeration to improve code readability. Simplify generic
code based on this implementation.

Change-Id: I670d45aa2572eedfdc77ac763e6486c83f8e26b4
7fb36ded9cd5b1d254b63b3091f35c1e6471b90e 10-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Add implicit null and stack checks for x86""

Fixes an x86_64 cross-compile issue. Removes the command line options
and the property to set implicit checks - this is hard-coded now.

This reverts commit 3d14eb620716e92c21c4d2c2d11a95be53319791.

Change-Id: I5404473b5aaf1a9c68b7181f5952cb174d93a90d
c380191f3048db2a3796d65db8e5d5a5e7b08c65 08-Jul-2014 Serguei Katkov <serguei.i.katkov@intel.com> x86_64: Enable fp-reg promotion

The patch introduces 4 registers, XMM12-15, available for promotion of
fp virtual registers.

Change-Id: I3f89ad07fc8ae98b70f550eada09be7b693ffb67
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
0025a86411145eb7cd4971f9234fc21c7b4aced1 11-Jul-2014 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Revert "Add implicit null and stack checks for x86"""

Broke the build.

This reverts commit 7fb36ded9cd5b1d254b63b3091f35c1e6471b90e.

Change-Id: I9df0e7446ff0913a0e1276a558b2ccf6c8f4c949
34e826ccc80dc1cf7c4c045de6b7f8360d504ccf 29-May-2014 Dave Allison <dallison@google.com> Add implicit null and stack checks for x86

This adds compiler and runtime changes for x86
implicit checks. 32 bit only.

Both host and target are supported.
By default, on the host, the implicit checks are null pointer and
stack overflow. Suspend is implemented but not switched on.

Change-Id: I88a609e98d6bf32f283eaa4e6ec8bbf8dc1df78a
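The runtime half of the mechanism can be sketched as follows; this is
illustrative only (the real handler lives in fault_handler.cc and rewrites
the faulting thread's context):

    #include <signal.h>

    // With implicit null checks the compiler emits no explicit
    // "if (obj == nullptr) throw" test; the faulting load itself raises
    // SIGSEGV and the handler turns it into a NullPointerException.
    void FaultHandler(int, siginfo_t* info, void* /*ucontext*/) {
      // If info->si_addr lies in the first (unmapped) page, this was a null
      // dereference: redirect the PC in the ucontext to the throw stub.
      (void)info;
    }

    void InstallImplicitCheckHandler() {
      struct sigaction sa = {};
      sa.sa_sigaction = FaultHandler;
      sa.sa_flags = SA_SIGINFO;
      sigaction(SIGSEGV, &sa, nullptr);
    }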
3d14eb620716e92c21c4d2c2d11a95be53319791 10-Jul-2014 Dave Allison <dallison@google.com> Revert "Add implicit null and stack checks for x86"

It breaks cross compilation with x86_64.

This reverts commit 34e826ccc80dc1cf7c4c045de6b7f8360d504ccf.

Change-Id: I34ba07821fc0a022fda33a7ae21850957bbec5e7
a77ee5103532abb197f492c14a9e6fb437054e2a 02-Jul-2014 Chao-ying Fu <chao-ying.fu@intel.com> x86_64: TargetReg update for x86

Also includes changes in common code. Elimination of use of TargetReg
with one parameter and direct access to special target registers.

Change-Id: Ied2c1f87d4d1e4345248afe74bca40487a46a371
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
b5860fb459f1ed71f39d8a87b45bee6727d79fe8 22-Jun-2014 buzbee <buzbee@google.com> Register promotion support for 64-bit targets

Not sufficiently tested for 64-bit targets, but should be
fairly close.

A significant amount of refactoring could still be done (in
later CLs).

With this change we are not making any changes to the vmap
scheme. As a result, it is a requirement that if a vreg
is promoted to both a 32-bit view and the low half of a
64-bit view it must share the same physical register. We
may change this restriction later on to allow for more flexibility
for 32-bit Arm.

For example, if v4, v5, v4/v5 and v5/v6 are all hot enough to
promote, we'd end up with something like:

v4 (as an int) -> r10
v4/v5 (as a long) -> r10
v5 (as an int) -> r11
v5/v6 (as a long) -> r11

Fix a couple of ARM64 bugs on the way...

Change-Id: I6a152b9c164d9f1a053622266e165428045362f3
4b537a851b686402513a7c4a4e60f5457bb8d7c1 01-Jul-2014 Andreas Gampe <agampe@google.com> ART: Quick compiler: More size checks, add TargetReg variants

Add variants for TargetReg for requesting specific register usage,
e.g., wide and ref. More register size checks.

With code adapted from https://android-review.googlesource.com/#/c/98605/.

Change-Id: I852d3be509d4dcd242c7283da702a2a76357278d
de68676b24f61a55adc0b22fe828f036a5925c41 24-Jun-2014 Andreas Gampe <agampe@google.com> Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter"

This reverts commit 2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d.

Breaks the build.

Change-Id: I9faad4e9a83b32f5f38b2ef95d6f9a33345efa33
3c12c512faf6837844d5465b23b9410889e5eb11 24-Jun-2014 Andreas Gampe <agampe@google.com> Revert "Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter""

This reverts commit de68676b24f61a55adc0b22fe828f036a5925c41.

Fixes an API comment, and differentiates between inserting and appending.

Change-Id: I0e9a21bb1d25766e3cbd802d8b48633ae251a6bf
2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d 23-Jun-2014 Andreas Gampe <agampe@google.com> ART: Split out more cases of Load/StoreRef, volatile as parameter

Splits out more cases of ref registers being loaded or stored. For
code clarity, adds volatile as a flag parameter instead of a separate
method.

On ARM64, continue cleanup. Add flags to print/fatal on size mismatches.

Change-Id: I30ed88433a6b4ff5399aefffe44c14a5e6f4ca4e
8dea81ca9c0201ceaa88086b927a5838a06a3e69 06-Jun-2014 Vladimir Marko <vmarko@google.com> Rewrite use/def masks to support 128 bits.

Reduce LIR memory usage by holding masks by pointers in the
LIR rather than directly and using pre-defined const masks
for the common cases, allocating very few on the arena.

Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 03-Jun-2014 Tim Murray <timmurray@google.com> DO NOT MERGE

Merge ART from AOSP to lmp-preview-dev.

Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
85089dd28a39dd20f42ac258398b2a08668f9ef1 26-May-2014 buzbee <buzbee@google.com> Quick compiler: generalize NarrowRegLoc()

Some of the RegStorage utilities (DoubleToLowSingle(),
DoubleToHighSingle(), etc.) worked only for targets
which treat double precision registers as a pair of aliased
single precision registers.

This CL eliminates those utilities and replaces them with
a new RegisterInfo utility that will search an aliased register
set and return the member matching the required storage
configuration (if it exists).

Change-Id: Iff5de10f467d20a56e1a89df9fbf30d1cf63c240
a51a0b0300268b605e3ad71b0e87ff394032c5e7 21-May-2014 Vladimir Marko <vmarko@google.com> Method inlining across dex files in boot image.

Fix LoadCodeAddress() and LoadMethodAddress() to use the dex
file in addition to the method index to uniquely identify
the literal. With that fix in place, when we have both the
direct code and the direct method, we can safely pass the
actual target method id instead of the method id from the
same dex file in the method lowering info. This was already
done for calls from apps into boot image (and thus there was
a bug with a tiny risk of the wrong literal being used) and
now we also do that for calls within the boot image. The
latter allows the inlining pass to inline many more methods
than before in the boot image.

Bug: 15021903
Change-Id: Ic765ce9809b43ef07e7db32b8e3fbc9acb09147f
b01bf15d18f9b08d77e7a3c6e2897af0e02bf8ca 14-May-2014 buzbee <buzbee@google.com> 64-bit temp register support.

Add a 64-bit temp register allocation path. The recent physical
register handling rework supports multiple views of the same
physical register (or, such as for Arm's float/double regs,
different parts of the same physical register).

This CL adds a 64-bit core register view for 64-bit targets. In
short, each core register will have a 64-bit name, and a 32-bit
name. The different views will be kept in separate register pools,
but aliasing will be tracked. The core temp register allocation
routines will be largely identical - except for 32-bit targets,
which will continue to use pairs of 32-bit core registers for holding
long values.

Change-Id: I8f118e845eac7903ad8b6dcec1952f185023c053
700a402244a1a423da4f3ba8032459f4b65fa18f 20-May-2014 Ian Rogers <irogers@google.com> Now we have a proper C++ library, use std::unique_ptr.

Also remove the Android.libcxx.mk and other bits of stlport compatibility
mechanics.

Change-Id: Icdf7188ba3c79cdf5617672c1cfd0a68ae596a61
d65c51a556e6649db4e18bd083c8fec37607a442 29-Apr-2014 Mark Mendell <mark.p.mendell@intel.com> ART: Add support for constant vector literals

Add in some vector instructions. Implement the ConstVector
instruction, which takes 4 words of data and loads them into
an XMM register.

Initially, only the ConstVector MIR opcode is implemented. Others will
be added after this one goes in.

Change-Id: I5c79bc8b7de9030ef1c213fc8b227debc47f6337
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
e45fb9e7976c8462b94a58ad60b006b0eacec49f 06-May-2014 Matteo Franchin <matteo.franchin@arm.com> AArch64: Change arm64 backend to produce A64 code.

The arm backend clone is changed to produce A64 code. At the moment
this backend can only compile simple methods (both leaf and non-leaf).

Most of the work on the assembler (assembler_arm64.cc) has been done.
Some work on the LIR generation layer (functions such as OpRegRegImm
& friends) is still necessary. The register allocator still needs to
be adapted to the A64 instruction set (it is mostly unchanged from
the arm backend). Offsets for helpers in gen_invoke.cc still need to
be changed to work on 64-bit.

Change-Id: I388f99eeb832857981c7d9d5cb5b71af64a4b921
72d32629303f8f39362a4099481f48646aed042f 07-May-2014 Ian Rogers <irogers@google.com> Give Compiler a back reference to the driver.

The compiler driver is a single object delegating work to the compiler; rather
than passing it through to every Compiler call, make it a member of Compiler so
that it may be queried. This simplifies the Compiler API and makes the
relationship to CompilerDriver more explicit.
Remove reference arguments that contravene code style.

Change-Id: Iba47f2e3cbda679a7ec7588f26188d77643aa2c6
660188264dee3c8f3510e2e24c11816c6b60f197 06-May-2014 Andreas Gampe <agampe@google.com> ART: Use utils.h::RoundUp instead of explicit bit-fiddling

Change-Id: I249a2cfeb044d3699d02e13d42b8e72518571640
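For reference, a power-of-two RoundUp of the kind utils.h provides boils
down to a single mask operation:

    #include <cstddef>

    // Round x up to the next multiple of n, where n is a power of two.
    constexpr size_t RoundUp(size_t x, size_t n) {
      return (x + n - 1) & ~(n - 1);
    }
    static_assert(RoundUp(13, 8) == 16, "13 rounded up to an 8-byte boundary");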
f29a4244bbc278843237f0ae242de077e093b580 05-May-2014 Dmitry Petrochenko <dmitry.petrochenko@intel.com> x86_64: Fix frame size calculation for 64-bit

Calculate the frame size in the same way as it is calculated in the patch
"64bit changes to the stack walker for the Quick ABI"

Change-Id: I8c2458f5973536a84f3fd6ad56167b5cfafa9ab4
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
091cc408e9dc87e60fb64c61e186bea568fc3d3a 31-Mar-2014 buzbee <buzbee@google.com> Quick compiler: allocate doubles as doubles

Significant refactoring of register handling to unify usage across
all targets & 32/64 backends.

Reworked RegStorage encoding to allow expanded use of
x86 xmm registers; removed vector registers as a separate
register type. Reworked RegisterInfo to describe aliased
physical registers. Eliminated quite a bit of target-specific code
and generalized common code.

Use of RegStorage instead of int for registers now propagated down
to the NewLIRx() level. In future CLs, the NewLIRx() routines will
be replaced with versions that are explicit about what kind of
operand they expect (RegStorage, displacement, etc.). The goal
is to eventually use RegStorage all the way to the assembly phase.

TBD: MIPS needs verification.
TBD: Re-enable liveness tracking.

Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
8194963098247be6bca9cc4a54dbfa65c73e8ccc 02-May-2014 Vladimir Marko <vmarko@google.com> Replace CountOneBits and __builtin_popcount with POPCOUNT.

Clean up utils.h, make some functions constexpr.

Change-Id: I2399100280cbce81c3c4f5765f0680c1ddcb5883
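A portable fallback of the kind POPCOUNT abstracts over (real
implementations defer to __builtin_popcount or a hardware POPCNT
instruction) looks like:

    #include <cstdint>

    // Kernighan's trick: clear the lowest set bit on each iteration.
    constexpr int PopCount(uint32_t x) {
      int count = 0;
      for (; x != 0; x &= x - 1) {
        ++count;
      }
      return count;
    }
    static_assert(PopCount(0x16u) == 3, "0b10110 has three bits set");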
ff093b31d75658c3404f9b51ee45760f346f06d9 01-May-2014 Ian Rogers <irogers@google.com> Fix a few 64-bit compilation of 32-bit code issues.

Bug: 13423943

Change-Id: I939389413af0a68c0d95b23cd598b7c42afa4383
6ffcfa04ebb2660e238742a6000f5ccebdd5df15 25-Apr-2014 Mingyao Yang <mingyao@google.com> Rewrite suspend test check with LIRSlowPath.

Change-Id: I2dc17d079655586bfc588349c7a04afc2c6879af
7a11ab09f93f54b1c07c0bf38dd65ed322e86bc6 29-Apr-2014 buzbee <buzbee@google.com> Quick compiler: debugging assists

A few minor assists to ease A/B debugging in the Quick
compiler:
1. To save time, the assemblers for some targets only
update the object code offsets on instructions involved with
pc-relative fixups. We add code to fix up all offsets when
doing a verbose codegen listing.
2. Temp registers are normally allocated in a round-robin
fashion. When disabling liveness tracking, we now reset the
round-robin pool to 0 on each instruction boundary. This makes
it easier to spot real codegen differences.
3. Self-register copies were previously emitted, but
marked as nops. Minor change to avoid generating them in the
first place and reduce clutter.

Change-Id: I7954bba3b9f16ee690d663be510eac7034c93723
695d13a82d6dd801aaa57a22a9d4b3f6db0d0fdb 19-Apr-2014 buzbee <buzbee@google.com> Update load/store utilities for 64-bit backends

This CL replaces the typical use of LoadWord/StoreWord
utilities (which, in practice, were 32-bit load/store) in
favor of a new set that make the size explicit. We now have:

LoadWordDisp/StoreWordDisp:
32 or 64 depending on target. Load or store the natural
word size. Expect this to be used infrequently - generally
when we know we're dealing with a native pointer or flushed
register not holding a Dalvik value (Dalvik values will flush
to home location sizes based on Dalvik, rather than the target).

Load32Disp/Store32Disp:
Load or store 32 bits, regardless of target.

Load64Disp/Store64Disp:
Load or store 64 bits, regardless of target.

LoadRefDisp:
Load a 32-bit compressed reference, and expand it to the
natural word size in the target register.

StoreRefDisp:
Compress a reference held in a register of the natural word
size and store it as a 32-bit compressed reference.

Change-Id: I50fcbc8684476abd9527777ee7c152c61ba41c6f
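As an illustration of the LoadRefDisp semantics above (not the actual
Mir2Lir code, which emits target instructions rather than performing the
load itself):

    #include <cstdint>
    #include <cstring>

    // A 32-bit compressed heap reference is widened to the natural
    // register size of a 64-bit target.
    uint64_t LoadRefDisp(const uint8_t* base, int32_t displacement) {
      uint32_t compressed;  // references are stored in the heap as 32 bits
      std::memcpy(&compressed, base + displacement, sizeof(compressed));
      return static_cast<uint64_t>(compressed);  // zero-extend into the register
    }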
3a74d15ccc9a902874473ac9632e568b19b91b1c 22-Apr-2014 Mingyao Yang <mingyao@google.com> Delete throw launchpads.

Bug: 13170824

Change-Id: I9d5834f5a66f5eb00f2ac80774e8c27dea99949e
d6ed642458c8820e1beca72f3d7b5f0be4a4b64b 10-Apr-2014 Dave Allison <dallison@google.com> Revert "Revert "Revert "Use trampolines for calls to helpers"""

This reverts commit f9487c039efb4112616d438593a2ab02792e0304.

Change-Id: Id48a4aae4ecce73db468587967968a3f7618b700
f9487c039efb4112616d438593a2ab02792e0304 09-Apr-2014 Dave Allison <dallison@google.com> Revert "Revert "Use trampolines for calls to helpers""

This reverts commit 081f73e888b3c246cf7635db37b7f1105cf1a2ff.

Change-Id: Ibd777f8ce73cf8ed6c4cb81d50bf6437ac28cb61

Conflicts:
compiler/dex/quick/mir_to_lir.h
081f73e888b3c246cf7635db37b7f1105cf1a2ff 07-Apr-2014 Dave Allison <dallison@google.com> Revert "Use trampolines for calls to helpers"

This reverts commit 754ddad084ccb610d0cf486f6131bdc69bae5bc6.

Change-Id: Icd979adee1d8d781b40a5e75daf3719444cb72e8
754ddad084ccb610d0cf486f6131bdc69bae5bc6 19-Feb-2014 Dave Allison <dallison@google.com> Use trampolines for calls to helpers

This is an ARM specific optimization to the compiler
that uses trampoline islands to make calls to runtime
helper functions. The intention is to reduce the size
of the generated code (by 2 bytes per call) without
affecting performance.

By default this is on when generating an OAT file. It is
off when compiling to memory.

To switch this off in dex2oat, use the command line option:
--no-helper-trampolines

Enhances disassembler to print the trampoline entry on the
BL instruction like this:

0xb6a850c0: f7ffff9e bl -196 (0xb6a85000) ; pTestSuspend

Bug: 12607709
Change-Id: I9202bdb7cf21252ad807bd48701f1f6ce8e3d0fe
6a58cb16d803c9a7b3a75ccac8be19dd9d4e520d 02-Apr-2014 Dmitry Petrochenko <dmitry.petrochenko@intel.com> art: Handle x86_64 architecture equal to x86

This patch forces FE/ME to treat x86_64 as x86 exactly.
The x86_64 logic will be revised later when the assembly is ready.

Change-Id: I4a92477a6eeaa9a11fd710d35c602d8d6f88cbb6
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
f943914730db8ad2ff03d49a2cacd31885d08fd7 27-Mar-2014 Dave Allison <dallison@google.com> Implement implicit stack overflow checks

This also fixes some failing run tests due to missing
null pointer markers.

The implementation of the implicit stack overflow checks introduces
the ability to have a gap in the stack that is skipped during
stack walk backs. This gap is protected against read/write and
is used to trigger a SIGSEGV at function entry if the stack
will overflow.

Change-Id: I0c3e214c8b87dc250cf886472c6d327b5d58653e
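The protected gap itself can be sketched in a couple of lines (assuming a
page-aligned gap; illustrative only):

    #include <cstddef>
    #include <sys/mman.h>

    // Make the gap inaccessible so a probe at function entry faults
    // (SIGSEGV) before the stack actually overflows into other memory.
    bool ProtectStackGap(void* gap_start, size_t gap_size) {
      return mprotect(gap_start, gap_size, PROT_NONE) == 0;
    }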
2700f7e1edbcd2518f4978e4cd0e05a4149f91b6 07-Mar-2014 buzbee <buzbee@google.com> Continuing register cleanup

Ready for review.

Continue the process of using RegStorage rather than
ints to hold register value in the top layers of codegen.
Given the huge number of changes in this CL, I've attempted
to minimize the number of actual logic changes. With this
CL, the use of ints for registers has largely been eliminated
except in the lowest utility levels. "Wide" utility routines
have been updated to take a single RegStorage rather than
a pair of ints representing low and high registers.

Upcoming CLs will be smaller and more targeted. My expectations:
o Allocate float double registers as a single double rather than
a pair of float single registers.
o Refactor to push code which assumes long and double Dalvik
values are held in a pair of register to the target dependent
layer.
o Clean-up of the xxx_mir.h files to reduce the amount of #defines
for registers. May also do a register renumbering to make all
of our targets' register naming more consistent. Possibly
introduce a target-independent float/non-float test at the
RegStorage level.

Change-Id: I646de7392bdec94595dd2c6f76e0f1c4331096ff
92cf83e001357329cbf41fa15a6e053fab6f4933 18-Mar-2014 Nicolas Geoffray <ngeoffray@google.com> Run Java tests with the optimizing compiler.

Also fix a vector.reserve -> vector.resize braino, and build
a GC map that dex2oat expects.

Change-Id: I6acf2f90a4c32f90b79bf7709bf2e43931b98757
3bc8615332b7848dec8c2297a40f7e4d176c0efb 13-Mar-2014 Vladimir Marko <vmarko@google.com> Use LIRSlowPath for intrinsics, improve String.indexOf().

Rewrite intrinsic launchpads to use the LIRSlowPath.
Improve String.indexOf for constant chars by avoiding
the check for code points over 0xFFFF.

Change-Id: I7fd5583214c5b4ab9c38ee36c5d6f003dd6345a8
49161cef10a308aedada18e9aa742498d6e6c8c7 12-Mar-2014 Jeff Hao <jeffhao@google.com> Allow patching between dex files in the boot classpath.

Change-Id: I53f219a5382d0fcd580e96e50025fdad4fc399df
83cc7ae96d4176533dd0391a1591d321b0a87f4f 12-Feb-2014 Vladimir Marko <vmarko@google.com> Create a scoped arena allocator and use that for LVN.

This saves more than 0.5s of boot.oat compilation time
on Nexus 5.

TODO: Move other stuff to the scoped allocator. This CL
alone increases the peak memory allocation. By reusing
the memory for other parts of the compilation we should
reduce this overhead.

Change-Id: Ifbc00aab4f3afd0000da818dfe68b96713824a08
a1a7074eb8256d101f7b5d256cda26d7de6ce6ce 03-Mar-2014 Vladimir Marko <vmarko@google.com> Rewrite kMirOpSelect for all IF_ccZ opcodes.

Also improve special cases for ARM and add tests.

Change-Id: I06f575b9c7b547dbc431dbfadf2b927151fe16b9
2da882315a61072664f7ce3c212307342e907207 27-Feb-2014 Andreas Gampe <agampe@google.com> Initial changes towards Generic JNI option

Some initial changes that lead to an UNIMPLEMENTED. Works
by not compiling for JNI right now and tracking native methods
which have neither quick nor portable code. Uses new trampoline.

Change-Id: I5448654044eb2717752fd7359f4ef8bd5c17be6e
be0e546730e532ef0987cd4bde2c6f5a1b14dd2a 26-Feb-2014 Vladimir Marko <vmarko@google.com> Cache field lowering info in mir_graph.

Change-Id: I9f9d76e3ae6c31e88bdf3f59820d31a625da020f
ae9fd93c39a341e2dffe15c61cc7d9e841fa92c4 11-Feb-2014 Mark Mendell <mark.p.mendell@intel.com> Tell GDB about Quick ART generated code

This is actually a lot of work. To do this, we need:
.debug_info
.debug_abbrev
.debug_frame
.debug_str

These are generated into the OAT file by OatWriter and ElfWriterQuick.

Since the Quick ART runtime doesn't use dlopen to load the OAT files,
GDB can't find this information. Use the alternate GDB JIT interface,
which can be invoked at runtime. To use this interface, an ELF image
needs to be built in memory. Read the information from the OAT file,
fixup the addresses to point to the real locations, add a symbol table
to hold the .text symbol, and then let GDB know about the information,
which will be read from the runtime address space.

This is quite primitive now, and could be cleaned up considerably. It
probably needs symbol table entries for the methods, and descriptions of
parameters and return types.

Currently only supported for X86.

This defaults to enabled for debug builds. Added dex2oat --gen-gdb-info
and --no-gen-gdb-info flags to override.

Change-Id: I4d18b2370f6dfaa00c8cc1925f10717be3bd1a62
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
2e589aa58a1372909f95e731fd6b8895f6359c3a 25-Feb-2014 Vladimir Marko <vmarko@google.com> Encode VmapTable entries offset by 2 to reduce size.

We're using special values 0xffff and 0xfffe for an
fp register marker and for method pointer, respectively.
These values were being encoded as 3 bytes each and
this changes their encoding to 1 byte.

Bug: 9437697
Change-Id: Ic1720e898b131a5d3f6ca87d8e1ecdf76fb4160a
3bc01748ef1c3e43361bdf520947a9d656658bf8 06-Feb-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> GenSpecialCase support for x86

Moved GenSpecialCase from being ARM specific to common code to allow
it to be used by x86 quick as well.

Change-Id: I728733e8f4c4da99af6091ef77e5c76ae0fee850
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
55d0eac918321e0525f6e6491f36a80977e0d416 06-Feb-2014 Mark Mendell <mark.p.mendell@intel.com> Support Direct Method/Type access for X86

Thumb generates code to optimize calls to methods within core.oat.
Implement this for X86 as well, but take advantage of mov with 32 bit
immediate and call relative with 32 bit immediate.

Fix some incorrect return locations for long inlines.

Change-Id: I1907bdfc7574f3d0aa76c7fad13dc537acdf1ed3
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
bcec6fba95ee7974d3f7b81c3c02e7eb3ca3df00 17-Jan-2014 Dave Allison <dallison@google.com> Make slow paths easier to write

This adds a class LIRSlowPath that allows for deferred compilation
of slow paths. Using this object you can add code that will be
invoked out of line using a forward branch. The intention is to
move the slow paths out of the main flow and avoid branch-over
constructs that will almost always trigger. The forward branch
to the slow path code will be predicted false and this will
be correct most of the time. The slow path code returns to the
instruction after the original branch using an unconditional branch.

This is used in the following opcodes: sput, sget, const-string,
check-cast, const-class.

Others will follow.

Bug: 10864890
Change-Id: I17130c5dc20d369bc6bbf50b8cf04343263e888e
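
The pattern above boils down to a small abstract class: each slow path
remembers the forward branch that enters it and the instruction to resume
at, and its body is emitted after the main method code. A simplified sketch
with hypothetical names (the real LIRSlowPath declaration differs):

    // Forward declarations standing in for the real codegen types.
    struct LIR;
    class Mir2Lir;

    class SlowPath {
     public:
      SlowPath(Mir2Lir* codegen, LIR* fork, LIR* resume)
          : codegen_(codegen), fork_(fork), resume_(resume) {}
      virtual ~SlowPath() {}

      // Called after the main body has been generated: emit the out-of-line
      // code at the current (end-of-method) position, bind the target of the
      // forward branch 'fork_' to it, and finish with an unconditional branch
      // back to 'resume_', the instruction after the original branch.
      virtual void Compile() = 0;

     protected:
      Mir2Lir* const codegen_;
      LIR* const fork_;    // forward branch into the slow path (predicted false)
      LIR* const resume_;  // where the slow path branches back to
    };
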
d69835d841cb7663faaa2f1996e73e8c0b3f6d76 03-Feb-2014 buzbee <buzbee@google.com> Art Compiler: fix compiler temps

AOSP CL 78835 "Enable compiler temporaries" built on some earlier
work to enable the compiler to add temps in the style of Dalvik's
vRegs during MIR optimizations. However, it missed an existing
fixed-size array whose size depended on the number of temps allocated.
The allocation of this array must be delayed until after the
number of compiler temps is known.

The result was array overrun, and strange failures.

Change-Id: I986a3b557e2323e00ba852584de03a02931b3c78
21caf91a53d34be300fee7aef4589990dce0fbde 03-Feb-2014 buzbee <buzbee@google.com> Art Compiler: fix compiler temps

Fix for b/12871909 "Art compiler segfaults in several top apps"

AOSP CL 78835 "Enable compiler temporaries" built on some earlier
work to enable the compiler to add temps in the style of Dalvik's
vRegs during MIR optimizations. However, it missed an existing
fixed-size array whose size depended on the number of temps allocated.
The allocation of this array must be delayed until after the
number of compiler temps is known.

The result was array overrun, and strange failures.

Change-Id: I986a3b557e2323e00ba852584de03a02931b3c78
da7a69b3fa7bb22d087567364b7eb5a75824efd8 09-Jan-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Enable compiler temporaries

Compiler temporaries are a facility for having virtual register sized space
for dealing with intermediate values during MIR transformations. They receive
explicit space in managed frames so they can have a home location in case they
need to be spilled. The facility also supports "special" temporaries which
have specific semantic purpose and their location in frame must be tracked.

The compiler temporaries are treated in the same way as virtual registers
so that the MIR level transformations do not need to have special logic. However,
generated code needs to know stack layout so that it can distinguish between
home locations.

MIRGraph has received an interface for dealing with compiler temporaries. This
interface allows allocation of wide and non-wide virtual register temporaries.

The information about how temporaries are kept on stack has been moved to
stack.h. This was necessary because stack layout is dependent on where the
temporaries are placed.

Change-Id: Iba5cf095b32feb00d3f648db112a00209c8e5f55
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
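
Because the temps are addressed exactly like virtual registers, allocation
is essentially handing out register numbers past the dex-declared ones and
growing the frame to match. A toy sketch of such an interface (hypothetical
and much simpler than MIRGraph's; the Seal() step mirrors why the
size-dependent array in the two "fix compiler temps" entries above had to be
allocated late):

    #include <cassert>

    // Toy allocator: compiler temps are just extra virtual register numbers
    // appended after the method's dex-declared vRegs.
    class TempAllocator {
     public:
      explicit TempAllocator(int num_dalvik_vregs)
          : num_dalvik_vregs_(num_dalvik_vregs), num_temp_slots_(0),
            sealed_(false) {}

      // Allocate a narrow or wide temp; wide temps take two adjacent slots.
      int NewCompilerTemp(bool wide) {
        assert(!sealed_ && "frame layout already fixed");
        int reg = num_dalvik_vregs_ + num_temp_slots_;
        num_temp_slots_ += wide ? 2 : 1;
        return reg;
      }

      // Once codegen starts, the temp count feeds the frame layout and any
      // size-dependent arrays, so no further temps may be allocated.
      int Seal() { sealed_ = true; return num_temp_slots_; }

     private:
      int num_dalvik_vregs_;
      int num_temp_slots_;
      bool sealed_;
    };
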
2730db03beee4d6687ddfb5000c33c0370fbc6eb 27-Jan-2014 Vladimir Marko <vmarko@google.com> Add VerifiedMethod to DexCompilationUnit.

Avoid some mutex locking and map lookups.

Change-Id: I8e0486af77e38dcd065569572a6b985eb57f4f63
c7f832061fea59fd6abd125f26c8ca1faec695a5 24-Jan-2014 Vladimir Marko <vmarko@google.com> Refactor verification results.

Rename VerificationMethodsData to VerificationResults.
Create new class VerifiedMethod to hold all the data for
a given method.

Change-Id: Ife1ac67cede20f3a2f9c7f5345f08a851cf1ed20
766e9295d2c34cd1846d81610c9045b5d5093ddd 27-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Improve GenConstString, GenS{get,put} for x86

Rewrite GenConstString for x86 to skip calling ResolveString when the
string is already resolved. Also try to avoid a register copy if the
Method* is in a promoted register.

Implement the TODO for GenS{get,put} to use compare to memory for x86 by
adding a new codegen function to compare directly to memory. Provide
a default implementation that uses a temporary register for RISC
architectures.

Change-Id: Ie163cca3d3d841aa10c50dc6592ec30af7a7cbc9
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
4708dcd68eebf1173aef1097dad8ab13466059aa 22-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Improve x86 long multiply and shifts

Generate inline code for long shifts by constants and do long
multiplication inline. Convert multiplication by a constant to a
shift when we can. Fix some x86 assembler problems and add the new
instructions that were needed (64 bit shifts).

Change-Id: I6237a31c36159096e399d40d01eb6bfa22ac2772
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
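
The multiply-to-shift conversion rests on a simple power-of-two test; a
minimal sketch (positive constants only; the actual x86 codegen emits the
shift as LIR rather than computing the value):

    #include <cstdint>

    // True when c is a positive power of two.
    static bool IsPowerOfTwo(int64_t c) {
      return c > 0 && (c & (c - 1)) == 0;
    }

    // Strength reduction: x * c == x << log2(c) when c is a power of two,
    // so a long multiply becomes a cheap shift.
    int64_t MulByConst(int64_t x, int64_t c) {
      if (IsPowerOfTwo(c)) {
        return x << __builtin_ctzll(static_cast<uint64_t>(c));  // log2(c)
      }
      return x * c;  // otherwise fall back to a real multiply
    }
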
107c31e598b649a8bb8d959d6a0377937e63e624 24-Jan-2014 Ian Rogers <irogers@google.com> 64bit friendly printf modifiers in LIR dumping.

Also correct header file inclusion ordering.

Change-Id: I8fb99e80cf1487e8b2278d4c1d110d14ed18c086
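
The usual 64-bit-friendly modifiers are the <cinttypes> macros, which expand
to the correct length modifier for the target word size. A small example
(illustrative, not the actual LIR dumper):

    #include <cinttypes>
    #include <cstdio>

    void DumpOffset(uintptr_t code_ptr, int64_t offset) {
      // PRIxPTR / PRId64 pick the right conversion for the target word
      // size, so one format string works for 32-bit and 64-bit builds.
      printf("code at 0x%" PRIxPTR ", offset %" PRId64 "\n", code_ptr, offset);
    }
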
be1ca55db3362f5b100c4c65da5342fd299520bb 15-Jan-2014 Hiroshi Yamauchi <yamauchi@google.com> Use direct class pointers at allocation sites in the compiled code.

- Rather than looking up a class from its type ID (and checking if
it's resolved/initialized, resolving/initializing if not), use
direct class pointers, if possible (boot-code-to-boot-class pointers
and app-code-to-boot-class pointers.)
- This results in a 1-2% speedup in Ritz MemAllocTest on Nexus 4.
- Embedding the object size (along with class pointers) caused a 1-2%
slowdown in MemAllocTest and isn't implemented in this change.
- TODO: do the same for array allocations.
- TODO: when/if an application gets its own image, implement
app-code-to-app-class pointers.
- Fix a -XX:gc bug.
cf. https://android-review.googlesource.com/79460/
- Add /tmp/android-data/dalvik-cache to the list of locations to
remove oat files in clean-oat-host.
cf. https://android-review.googlesource.com/79550
- Add back a dropped UNLIKELY in FindMethodFromCode().
cf. https://android-review.googlesource.com/74205

Bug: 9986565
Change-Id: I590b96bd21f7a7472f88e36752e675547559a5b1
5115473c81ec855a5646a5f755afb26aa7f2b1e9 02-Jan-2014 Vladimir Marko <vmarko@google.com> Fix oatdump "compilercallbacks" option for runtime.

The "compilercallbacks" runtime option replaced "compiler"
in I708ca13227c809e07917ff3879a89722017e83a9.

Fix a comment in codegen_util.cc.

Change-Id: I2c5ebd56dd96f0ee8e62b602bfe45357565471ff
5816ed48bc339c983b40dc493e96b97821ce7966 27-Nov-2013 Vladimir Marko <vmarko@google.com> Detect special methods at the end of verification.

This moves special method handling to method inliner
and prepares for eventual inlining of these methods.

Change-Id: I51c51b940fb7bc714e33135cd61be69467861352
2b5eaa2b49f7489bafdadc4b4463ae27e4261817 13-Dec-2013 Vladimir Marko <vmarko@google.com> Move compiler code out of method verifier.

We want to detect small methods for inlining at the end of
the method verification. Instead of adding more compiler
code to the runtime, we create a callback from the runtime
into the compiler, so that we can keep the code there.
Additionally, we move the compiler-related code that was
already in the method verifier to the compiler since it
doesn't really belong to the runtime in the first place.

Change-Id: I708ca13227c809e07917ff3879a89722017e83a9
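
The resulting boundary can be pictured as a small abstract interface that
the compiler implements and registers with the runtime, which invokes it at
the end of verification. A hedged sketch (names and signatures approximate,
not the exact ART declaration):

    #include <cstdint>

    namespace verifier { class MethodVerifier; }

    struct ClassReference {  // stand-in for the real type
      const void* dex_file;
      uint32_t class_def_index;
    };

    // The runtime calls these when verification finishes; the compiler-side
    // implementation does its bookkeeping (e.g. inliner candidate detection)
    // without any compiler code living in the runtime.
    class CompilerCallbacks {
     public:
      virtual ~CompilerCallbacks() {}
      virtual bool MethodVerified(verifier::MethodVerifier* verifier) = 0;
      virtual void ClassRejected(ClassReference ref) = 0;
    };
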
8171fc34bf74ed0df02385787d916bc13eb7f160 26-Nov-2013 Vladimir Marko <vmarko@google.com> Don't prefix GC map by length.

Bug: 11767815
Change-Id: I063917aefdf7674ee1a77736db059c9ee95ea075
06606b9c4a1c00154ed15f719ad8ea994e54ee8e 02-Dec-2013 Vladimir Marko <vmarko@google.com> Performance improvement for mapping table creation.

Avoid the raw mapping tables altogether.

Change-Id: I6d1c786325d369e899a75f15701edbafdd14363f
1e6cb63d77090ddc6aa19c755d7066f66e9ff87e 28-Nov-2013 Vladimir Marko <vmarko@google.com> Delta-encoding of mapping tables.

Both PC offsets and dalvik offsets are delta-encoded. Since
PC offsets are increasing, the deltas are then compressed as
unsigned LEB128. Dalvik offsets are not monotonic, so their
deltas are compressed as signed LEB128.

This reduces the size of the mapping tables by about 30%
on average, 25% from the PC offset and 5% from the dalvik
offset delta encoding.

Bug: 9437697
Change-Id: I600ab9c22dec178088d4947a811cca3bc8bd4cf4
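
A self-contained sketch of the scheme (not the actual table writer): PC
deltas are non-negative, so they take unsigned LEB128; dalvik-offset deltas
may be negative, so they take signed LEB128.

    #include <cstdint>
    #include <utility>
    #include <vector>

    // Append v as unsigned LEB128 (7 bits per byte, high bit = continuation).
    void PushUnsignedLeb128(std::vector<uint8_t>* out, uint32_t v) {
      while (v >= 0x80) { out->push_back((v & 0x7f) | 0x80); v >>= 7; }
      out->push_back(v);
    }

    // Append v as signed LEB128.
    void PushSignedLeb128(std::vector<uint8_t>* out, int32_t v) {
      while (true) {
        uint8_t byte = v & 0x7f;
        v >>= 7;  // arithmetic shift keeps the sign
        bool done = (v == 0 && (byte & 0x40) == 0) ||
                    (v == -1 && (byte & 0x40) != 0);
        if (!done) byte |= 0x80;
        out->push_back(byte);
        if (done) break;
      }
    }

    // Delta-encode one (pc_offset, dalvik_offset) table: PC offsets only
    // increase, so their deltas are unsigned; dalvik offsets jump around,
    // so their deltas are signed.
    void EncodeMappingTable(
        const std::vector<std::pair<uint32_t, int32_t>>& entries,
        std::vector<uint8_t>* out) {
      uint32_t prev_pc = 0;
      int32_t prev_dex = 0;
      for (const auto& e : entries) {
        PushUnsignedLeb128(out, e.first - prev_pc);
        PushSignedLeb128(out, e.second - prev_dex);
        prev_pc = e.first;
        prev_dex = e.second;
      }
    }
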
5c96e6b4dc354a7439b211b93462fbe8edea5e57 14-Nov-2013 Vladimir Marko <vmarko@google.com> Rewrite intrinsics detection.

Intrinsic methods should be treated as a special case of
inline methods. They should be detected early and used to
guide other optimizations. This CL rewrites the intrinsics
detection so that it can be moved to any compilation phase.

Change-Id: I4424a6a869bd98b9c478953c9e3bcaf1c6de2b33
0b1191cfece83f6f8d4101575a06555a2d13387a 28-Oct-2013 Bill Buzbee <buzbee@google.com> Revert "Revert "Null check elimination improvement""

This reverts commit 31aa97cfec5ee76b2f2496464e1b6f9e11d21a29.

...and thereby brings back change 380165, which was reverted
because it was buggy.

Three problems with the original CL:

1. The author ran the pre-submit tests, but used -j24 and
failed to search the output for fail messages.
2. The new null check analysis pass uses an iterative
approach to identify whether a null check is needed. It
is possible that the null-check-required state may
oscillate, and a logic error caused it to stick in the
"no check needed" state.
3. Our old nemesis, Dalvik untyped constants, in which 0 values
can be used both as object references and non-object references.
This CL conservatively treats all CONST definitions as
potential object definitions for the purposes of null
check elimination.

Change-Id: I3c1744e44318276e42989502a314585e56ac57a0
31aa97cfec5ee76b2f2496464e1b6f9e11d21a29 26-Oct-2013 Ian Rogers <irogers@google.com> Revert "Null check elimination improvement"

This reverts commit 4db179d1821a9e78819d5adc8057a72f49e2aed8.

Change-Id: I059c15c85860c6c9f235b5dabaaef2edebaf1de2
a61f49539a59b610e557b5513695295639496750 23-Aug-2013 buzbee <buzbee@google.com> Add timing logger to Quick compiler

Current Quick compiler breakdown for compiling the boot class path:

MIR2LIR: 29.674%
MIROpt:SSATransform: 17.656%
MIROpt:BBOpt: 11.508%
BuildMIRGraph: 7.815%
Assemble: 6.898%
MIROpt:ConstantProp: 5.151%
Cleanup: 4.916%
MIROpt:NullCheckElimination: 4.085%
RegisterAllocation: 3.972%
GcMap: 2.359%
Launchpads: 2.147%
PcMappingTable: 2.145%
MIROpt:CodeLayout: 0.697%
LiteralData: 0.654%
SpecialMIR2LIR: 0.323%

Change-Id: I9f77e825faf79e6f6b214bb42edcc4b36f55d291
4db179d1821a9e78819d5adc8057a72f49e2aed8 23-Oct-2013 buzbee <buzbee@google.com> Null check elimination improvement

See b/10862777

Improves the null check elimination pass by tracking visibility
of object definitions, rather than successful uses of object
dereferences. For boot class path, increases static null
check elimination success rate from 98.4% to 98.6%. Reduces
size of boot.oat by ~300K bytes.

Fixes loop nesting depth computation, which is used by register
promotion, and tweaks the heuristics.

Fixes a bug in verbose listing output in which a basic block
id is directly dereferenced, rather than first being converted
to a pointer.

Change-Id: Id01c20b533cdb12ea8fc4be576438407d0a34cec
0d82948094d9a198e01aa95f64012bdedd5b6fc9 12-Oct-2013 buzbee <buzbee@google.com> 64-bit prep

Preparation for 64-bit roll.
o Eliminated storing pointers in 32-bit int slots in LIR.
o General size reductions of common structures to reduce impact
of doubled pointer sizes:
- BasicBlock struct was 72 bytes, now is 48.
- MIR struct was 72 bytes, now is 64.
- RegLocation was 12 bytes, now is 8.
o Generally replaced uses of BasicBlock* pointers with 16-bit Ids.
o Replaced several doubly-linked lists with singly-linked to save
one stored pointer per node.
o We had quite a few uses of uintptr_t's that were a holdover from
the JIT (which used pointers to mapped dex & actual code cache
addresses rather than trace-relative offsets). Replaced those with
uint32_t's.
o Clean up handling of embedded data for switch tables and array data.
o Miscellaneous cleanup.

I anticipate one or two additional CLs to reduce the size of MIR and LIR
structs.

Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
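
The pointer-to-id swap is the main space lever here: a BasicBlock* doubles
from 4 to 8 bytes on a 64-bit build, while a 16-bit id stays at 2. A sketch
of the idea (names hypothetical):

    #include <cstdint>
    #include <vector>

    typedef uint16_t BasicBlockId;  // 2 bytes on every target
    struct BasicBlock;              // real definition elsewhere

    // Structs store a 16-bit id instead of a BasicBlock* (8 bytes on
    // 64-bit); blocks are recovered through a per-method table.
    struct BlockIdTable {
      std::vector<BasicBlock*> blocks;
      BasicBlock* Get(BasicBlockId id) const { return blocks[id]; }
    };
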
b48819db07f9a0992a72173380c24249d7fc648a 15-Sep-2013 buzbee <buzbee@google.com> Compile-time tuning: assembly phase

Not as much compile-time gain from reworking the assembly phase as I'd
hoped, but still worthwhile. Should see ~2% improvement thanks to
the assembly rework. On the other hand, expect some huge gains for some
application thanks to better detection of large machine-generated init
methods. Thinkfree shows a 25% improvement.

The major assembly change was to thread the LIR nodes that
require fixup into a fixup chain. Only those are processed during the
final assembly pass(es). This doesn't help for methods which only
require a single pass to assemble, but does speed up the larger methods
which required multiple assembly passes.

Also replaced the block_map_ basic block lookup table (which contained
space for a BasicBlock* for each dex instruction unit) with a block id
map - cutting its space requirements by half in a 32-bit pointer
environment.

Changes:
o Reduce size of LIR struct by 12.5% (one of the big memory users)
o Repurpose the use/def portion of the LIR after optimization complete.
o Encode instruction bits to LIR
o Thread LIR nodes requiring pc fixup
o Change follow-on assembly passes to only consider fixup LIRs
o Switch on pc-rel fixup kind
o Fast-path for small methods - single pass assembly
o Avoid using cb[n]z for null checks (almost always exceed displacement)
o Improve detection of large initialization methods.
o Rework def/use flag setup.
o Remove a sequential search from FindBlock using lookup table of 16-bit
block ids rather than full block pointers.
o Eliminate pcRelFixup and use fixup kind instead.
o Add check for 16-bit overflow on dex offset.

Change-Id: I4c6615f83fed46f84629ad6cfe4237205a9562b4
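
A sketch of the fixup-chain idea (field names hypothetical): while emitting,
any LIR whose encoding may still change is linked into a side chain, and
later assembly passes walk only that chain instead of the full instruction
list.

    // Minimal sketch: LIR nodes needing pc-relative fixup are threaded into
    // a singly-linked side chain as they are emitted.
    struct LIR {
      LIR* next;        // full instruction list
      LIR* next_fixup;  // chain of fixup-requiring LIRs only
      bool needs_fixup;
    };

    struct FixupChain {
      LIR* head = nullptr;
      LIR* tail = nullptr;
      // Append lir to the chain if its encoding may still change; follow-on
      // assembly passes then traverse head->next_fixup->... instead of
      // every node in the method.
      void MaybeAdd(LIR* lir) {
        if (!lir->needs_fixup) return;
        lir->next_fixup = nullptr;
        if (tail != nullptr) tail->next_fixup = lir; else head = lir;
        tail = lir;
      }
    };
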
d91d6d6a80748f277fd938a412211e5af28913b1 26-Sep-2013 Ian Rogers <irogers@google.com> Introduce Signature type to avoid string comparisons.

Method resolution currently creates strings to then compare with strings formed
from methods in other dex files. The temporary strings are purely created for
the sake of comparisons. This change creates a new Signature type that
represents a method signature but not as a string. This type supports
comparisons and so can be used when searching for methods in resolution.

With this change malloc is no longer the hottest method during dex2oat (now it's
memset) and allocations during verification have been reduced. The verifier is
commonly what is populating the dex cache for methods and fields not declared
in the dex file itself.

Change-Id: I5ef0542823fbcae868aaa4a2457e8da7df0e9dae
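
The gist is comparing the raw signature bytes in place instead of
materializing strings. A toy stand-in (the real art::Signature compares dex
proto data rather than a flat byte span):

    #include <cstddef>
    #include <cstring>

    // Non-owning view of signature data that supports comparison without
    // allocating a temporary std::string.
    struct Signature {
      const char* data;  // points into the dex file's own bytes
      std::size_t length;

      bool operator==(const Signature& other) const {
        return length == other.length &&
               std::memcmp(data, other.data, length) == 0;
      }
    };
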
ee39a10e45a6a0880e8b829525c40d6055818560 19-Sep-2013 Ian Rogers <irogers@google.com> Use class def index from java.lang.Class.

Bug: 10244719
This removes the computation of the dex file index; when necessary it is
computed by searching the dex file. It's only necessary in
dalvik.system.DexFile.defineClassNative and DexFile::FindInClassPath, the
latter not showing up significantly in profiling with this change.

(cherry-picked from 8b2c0b9abc3f520495f4387ea040132ba85cae69)
Change-Id: I20c73a3b17d86286428ab0fd21bc13f51f36c85c
8b2c0b9abc3f520495f4387ea040132ba85cae69 19-Sep-2013 Ian Rogers <irogers@google.com> Use class def index from java.lang.Class.

Bug: 10244719
Depends on:
https://googleplex-android-review.git.corp.google.com/362363
This removes the computation of the dex file index; when necessary it is
computed by searching the dex file. It's only necessary in
dalvik.system.DexFile.defineClassNative and DexFile::FindInClassPath, the
latter not showing up significantly in profiling with this change.

Change-Id: I20c73a3b17d86286428ab0fd21bc13f51f36c85c
bd663de599b16229085759366c56e2ed5a1dc7ec 11-Sep-2013 buzbee <buzbee@google.com> Compile-time tuning: register/bb utilities

This CL yields about a 4% improvement in the compilation phase
of dex2oat (single-threaded; multi-threaded compilation is
more difficult to accurately measure). The register utilities
could stand to be completely rewritten, but this gets most of the
easy benefit.

Next up: the assembly phase.

Change-Id: Ife5a474e9b1a6d9e501e888dda6749d34eb77e96
252254b130067cd7a5071865e793966871ae0246 09-Sep-2013 buzbee <buzbee@google.com> More Quick compile-time tuning: labels & branches

This CL represents a roughly 3.5% performance improvement for the
compile phase of dex2oat. Most of the gain comes from avoiding
the generation of dex boundary LIR labels unless a debug listing
is requested. The other significant change is moving from a basic block
ending branch model of "always generate a fall-through branch, and then
delete it if we can" to an "only generate a fall-through branch if we need
it" model.

The data motivating these changes follow. Note that two areas of
potentially attractive gain remain: restructuring the assembler model and
reworking the register handling utilities. These will be addressed
in subsequent CLs.

--- data follows

The Quick compiler's assembler has shown up on profile reports a bit
more than seems reasonable. We've tried a few quick fixes to apparently
hot portions of the code, but without much gain. So, I've been looking at
the assembly process at a somewhat higher level. There look to be several
potentially good opportunities.

First, an analysis of the makeup of the LIR graph showed a surprisingly
high proportion of LIR pseudo ops. Using the boot classpath as a basis,
we get:

32.8% of all LIR nodes are pseudo ops.
10.4% are LIR instructions which require pc-relative fixups.
11.8% are LIR instructions that have been nop'd by the various
optimization passes.

Looking only at the LIR pseudo ops, we get:
kPseudoDalvikByteCodeBoundary 43.46%
kPseudoNormalBlockLabel 21.14%
kPseudoSafepointPC 20.20%
kPseudoThrowTarget 6.94%
kPseudoTarget 3.03%
kPseudoSuspendTarget 1.95%
kPseudoMethodExit 1.26%
kPseudoMethodEntry 1.26%
kPseudoExportedPC 0.37%
kPseudoCaseLabel 0.30%
kPseudoBarrier 0.07%
kPseudoIntrinsicRetry 0.02%
Total LIR count: 10167292

The standout here is the Dalvik opcode boundary marker. This is just a
label inserted at the beginning of the codegen for each Dalvik bytecode.
If we're also doing a verbose listing, this is also where we hang the
pretty-print disassembly string. However, this label was also
being used as a convenient way to find the target of switch case
statements (and, I think at one point was used in the Mir->GBC conversion
process).

This CL moves the use of kPseudoDalvikByteCodeBoundary labels to only
verbose listing runs, and replaces the codegen uses of the label with
the kPseudoNormalBlockLabel attached to the basic block that contains the
switch case target. Great savings here - 14.3% reduction in the number of
LIR nodes needed. After this CL, our LIR pseudo proportions drop to 21.6%
of all LIR. That's still a lot, but much better. Possible further
improvements via combining normal labels with kPseudoSafepointPC labels
where appropriate, and also perhaps reduce memory usage by using a
short-hand form for labels rather than a full LIR node. Also, many
of the basic block labels are no longer branch targets by the time
we get to assembly - cheaper to delete, or just ignore?

Here's the "after" LIR pseudo op breakdown:

kPseudoNormalBlockLabel 37.39%
kPseudoSafepointPC 35.72%
kPseudoThrowTarget 12.28%
kPseudoTarget 5.36%
kPseudoSuspendTarget 3.45%
kPseudoMethodEntry 2.24%
kPseudoMethodExit 2.22%
kPseudoExportedPC 0.65%
kPseudoCaseLabel 0.53%
kPseudoBarrier 0.12%
kPseudoIntrinsicRetry 0.04%
Total LIR count: 5748232

Not done in this CL, but it will be worth experimenting with actually
deleting LIR nodes from the graph when they are optimized away, rather
than just setting the NOP bit. Keeping them around is invaluable
during debugging - but when not debugging it may pay off if the cost of
node removal is less than the cost of traversing through dead nodes
in subsequent passes.

Next up (and partially in this CL - but mostly to be done in follow-on
CLs) is the overall assembly process. Inherited from the trace JIT,
the Quick compiler has a fairly simple-minded approach to instruction
assembly. First, a pass is made over the LIR list to assign offsets
to each instruction. Then, the assembly pass is made - which generates
the actual machine instruction bit patterns and pushes the instruction
data into the code_buffer. However, the code generator takes the "always
optimistic" approach to instruction selection and emits the shortest
instruction. If, during assembly, we find that a branch or load doesn't
reach, that short-form instruction is replaced with a longer sequence.

Of course, this invalidates the previously-computed offset calculations.
Assembly thus is an iterative process: compute offsets and then assemble
until we survive an assembly pass without invalidation. This seems
like a likely candidate for improvement. First, I analyzed the
number of retries required, and the reason for invalidation over the
boot classpath load.

The results: more than half of methods don't require a retry, and
very few require more than 1 extra pass:

5 or more: 6 of 96334
4 or more: 22 of 96334
3 or more: 140 of 96334
2 or more: 1794 of 96334 - 2%
1 or more: 40911 of 96334 - 40%
0 retries: 55423 of 96334 - 58%

The interesting group here is the one that requires 1 retry. Looking
at the reason, we see three typical reasons:

1. A cbnz/cbz doesn't reach (only 7 bits of offset)
2. A 16-bit Thumb1 unconditional branch doesn't reach.
3. An unconditional branch which branches to the next instruction
is encountered, and deleted.

The first 2 cases are the cost of the optimistic strategy - nothing
much to change there. However, the interesting case is #3 - dead
branch elimination. A further analysis of the single retry group showed
that 42% of the methods (16305) that required a single retry did so
*only* because of dead branch elimination. The big question here is
why so many dead branches survive to the assembly stage. We have
a dead branch elimination pass which is supposed to catch these - perhaps
it's not working correctly, should be moved later in the optimization
process, or perhaps run multiple times.

Other things to consider:

o Combine the offset generation pass with the assembly pass. Skip
pc-relative fixup assembly (other than assigning offset), but push
LIR* for them into work list. Following the main pass, zip through
the work list and assemble the pc-relative instructions (now that we
know the offsets). This would significantly cut back on traversal
costs.

o Store the assembled bits into both the code buffer and the LIR.
In the event we have to retry, only the pc-relative instructions
would need to be assembled, and we'd finish with a pass over the
LIR just to dump the bits into the code buffer.

Change-Id: I50029d216fa14f273f02b6f1c8b6a0dde5a7d6a6
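
The optimistic assemble-and-retry scheme described above reduces, in
outline, to the loop below: a toy model in which widening an instruction
invalidates the offsets computed so far and forces another pass (all types
and bounds are illustrative):

    #include <vector>

    // Each instruction starts in its short form; if its pc-relative operand
    // turns out not to reach, it is widened to the long form.
    struct Insn {
      int size;       // current encoding size (bytes)
      int long_size;  // size after widening to the long form
      int target;     // index of the branch target, or -1
      int max_reach;  // how far the short form can reach (bytes)
    };

    bool AssemblePass(std::vector<Insn>& insns) {
      // Step 1: assign offsets using the current sizes.
      std::vector<int> offset(insns.size() + 1, 0);
      for (size_t i = 0; i < insns.size(); ++i) {
        offset[i + 1] = offset[i] + insns[i].size;
      }
      // Step 2: widen any short-form branch that does not reach.
      bool invalidated = false;
      for (size_t i = 0; i < insns.size(); ++i) {
        Insn& in = insns[i];
        if (in.target < 0 || in.size == in.long_size) continue;
        int dist = offset[in.target] - offset[i + 1];
        if (dist > in.max_reach || dist < -in.max_reach) {
          in.size = in.long_size;  // replace with the longer sequence
          invalidated = true;      // offsets past this point are now stale
        }
      }
      return invalidated;
    }

    void Assemble(std::vector<Insn>& insns) {
      // Iterate until a pass survives without invalidation; per the data
      // above, most methods converge in one pass and few need more than
      // one retry.
      while (AssemblePass(insns)) {}
    }
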
28c2300d9a85f4e7288fb5d94280332f923b4df3 07-Sep-2013 buzbee <buzbee@google.com> More compile-time tuning

Small, but measurable, improvement.

Change-Id: Ie3c7180f9f9cbfb1729588e7a4b2cf6c6d291c95
56c717860df2d71d66fb77aa77f29dd346e559d3 06-Sep-2013 buzbee <buzbee@google.com> Compile-time tuning

Specialized the dataflow iterators and did a few other minor tweaks.
Showing ~5% compile-time improvement in a single-threaded environment;
less in multi-threaded (presumably because we're blocked by something
else).

Change-Id: I2e2ed58d881414b9fc97e04cd0623e188259afd2
9b297bfc588c7d38efd12a6f38cd2710fc513ee3 06-Sep-2013 Ian Rogers <irogers@google.com> Refactor CompilerDriver::Compute..FieldInfo

Don't use non-const reference arguments.
Move ins before outs.

Change-Id: I7b251156388d8f07513b3da62ebfd29e5fd9ff76
193bad9b9cfd10642043fa2ebbfc68bd5f9ede4b 30-Aug-2013 Mathieu Chartier <mathieuc@google.com> Multi threaded hashed deduplication during compilation.

Moved deduplication to be in the compiler driver instead of oat
writer. This enables deduplication to be performed on multiple
threads. Also added a hash function to avoid excessive comparison
of byte arrays.

Improvements:
Before (alloats host):
real 1m6.967s
user 4m22.940s
sys 1m22.610s

Thinkfree.apk (target mako):
0m23.74s real 0m50.95s user 0m9.50s system
0m24.62s real 0m50.61s user 0m10.07s system
0m24.22s real 0m51.44s user 0m10.09s system
0m23.70s real 0m51.05s user 0m9.97s system
0m23.50s real 0m50.74s user 0m10.63s system

After (alloats host):
real 1m5.705s
user 4m44.030s
sys 1m29.990s

Thinkfree.apk (target mako):
0m23.32s real 0m51.38s user 0m10.00s system
0m23.49s real 0m51.20s user 0m9.80s system
0m23.18s real 0m50.80s user 0m9.77s system
0m23.52s real 0m51.22s user 0m10.02s system
0m23.50s real 0m51.55s user 0m9.46s system

Bug: 10552630

Change-Id: Ia6d06a747b86b0bfc4473b3cd68f8ce1a1c7eb22
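
A sketch of hash-keyed deduplication (simplified: the real code lives in the
compiler driver, interns several kinds of byte arrays, and its hash function
may differ):

    #include <cstdint>
    #include <mutex>
    #include <unordered_map>
    #include <vector>

    typedef std::vector<uint8_t> CodeBlob;

    class Dedupe {
     public:
      // Returns the canonical copy of 'blob'; identical blobs share storage.
      const CodeBlob* Add(const CodeBlob& blob) {
        size_t hash = Hash(blob);
        std::lock_guard<std::mutex> lock(lock_);
        auto range = map_.equal_range(hash);
        for (auto it = range.first; it != range.second; ++it) {
          // Full byte-by-byte compare happens only on a hash hit.
          if (*it->second == blob) return it->second;
        }
        const CodeBlob* stored = new CodeBlob(blob);
        map_.emplace(hash, stored);
        return stored;
      }

     private:
      // Cheap hash (FNV-1a here) so that distinct blobs rarely reach the
      // expensive comparison above.
      static size_t Hash(const CodeBlob& blob) {
        size_t h = 1469598103934665603ull;
        for (uint8_t b : blob) { h ^= b; h *= 1099511628211ull; }
        return h;
      }

      std::mutex lock_;
      std::unordered_multimap<size_t, const CodeBlob*> map_;
    };
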
f6c4b3ba3825de1dbb3e747a68b809c6cc8eb4db 25-Aug-2013 Mathieu Chartier <mathieuc@google.com> New arena memory allocator.

Before we were creating arenas for each method. The issue with doing this
is that we needed to memset each memory allocation. This can be improved
if you start out with arenas that contain all zeroed memory and recycle
them for each method. When you give memory back to the arena pool you do
a single memset to zero out all of the memory that you used.

Always inlined the fast path of the allocation code.

Removed the "zero" parameter since the new arena allocator always returns
zeroed memory.

Host dex2oat time on target oat apks (2 samples each).
Before:
real 1m11.958s
user 4m34.020s
sys 1m28.570s

After:
real 1m9.690s
user 4m17.670s
sys 1m23.960s

Target device dex2oat samples (Mako, Thinkfree.apk):
Without new arena allocator:
0m26.47s real 0m54.60s user 0m25.85s system
0m25.91s real 0m54.39s user 0m26.69s system
0m26.61s real 0m53.77s user 0m27.35s system
0m26.33s real 0m54.90s user 0m25.30s system
0m26.34s real 0m53.94s user 0m27.23s system

With new arena allocator:
0m25.02s real 0m54.46s user 0m19.94s system
0m25.17s real 0m55.06s user 0m20.72s system
0m24.85s real 0m55.14s user 0m19.30s system
0m24.59s real 0m54.02s user 0m20.07s system
0m25.06s real 0m55.00s user 0m20.42s system

Correctness of Thinkfree.apk.oat verified by diffing both of the oat files.

Change-Id: I5ff7b85ffe86c57d3434294ca7a621a695bf57a9
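
A sketch of the recycled, pre-zeroed arena idea (collapsed to a single arena
size and no thread safety; not the actual ArenaPool):

    #include <cstdint>
    #include <cstdlib>
    #include <cstring>
    #include <vector>

    // One arena: a fixed, zero-initialized block handed out bump-pointer
    // style.
    class Arena {
     public:
      static const size_t kSize = 128 * 1024;
      Arena() : memory_(static_cast<uint8_t*>(calloc(1, kSize))), used_(0) {}
      // Fast path: just bump the pointer. The memory is already zeroed, so
      // no per-allocation memset is needed.
      void* Alloc(size_t bytes) {
        if (used_ + bytes > kSize) return nullptr;  // caller takes a new arena
        void* p = memory_ + used_;
        used_ += bytes;
        return p;
      }
      // Re-zero only the bytes actually used, making the arena reusable.
      void Reset() { memset(memory_, 0, used_); used_ = 0; }
     private:
      uint8_t* memory_;
      size_t used_;
    };

    // Pool: a method takes arenas while compiling and gives them back when
    // done, so one memset per recycled arena replaces a memset per
    // allocation.
    class ArenaPool {
     public:
      Arena* Take() {
        if (free_.empty()) return new Arena();
        Arena* a = free_.back();
        free_.pop_back();
        return a;
      }
      void Give(Arena* a) { a->Reset(); free_.push_back(a); }
     private:
      std::vector<Arena*> free_;
    };
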
96faf5b363d922ae91cf25404dee0e87c740c7c5 10-Aug-2013 Ian Rogers <irogers@google.com> Uleb128 compression of vmap and mapping table.

Bug 9437697.

Change-Id: I30bcb97d12cd8b46d3b2cdcbdd358f08fbb9947a
(cherry picked from commit 1809a72a66d245ae598582d658b93a24ac3bf01e)
1809a72a66d245ae598582d658b93a24ac3bf01e 10-Aug-2013 Ian Rogers <irogers@google.com> Uleb128 compression of vmap and mapping table.

Bug 9437697.

Change-Id: I30bcb97d12cd8b46d3b2cdcbdd358f08fbb9947a
7934ac288acfb2552bb0b06ec1f61e5820d924a4 26-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/comments issues

Change-Id: Iae286862c85fb8fd8901eae1204cd6d271d69496
479f83c196d5a95e36196eac548dc6019e70a5be 19-Jul-2013 buzbee <buzbee@google.com> Dex compiler: re-enable method pattern matching

The dex compiler's mechanism to detect simple methods and emit
streamlined code was disabled during the last big restructuring
(there was a question of how to make it useful for Portable as
well as Quick). This CL does not address the Portable question,
but turns the optimization back on for Quick.

See b/9428200

Change-Id: I9f25b41219d7a243ec64efb18278e5a874766f4d
2d88862f0752a7a0e65145b088f49dabd49d4284 19-Jul-2013 Brian Carlstrom <bdc@google.com> Fixing cpplint readability/casting issues

Change-Id: I6821da0e23737995a9b884a04e9b63fac640cd05
02c8cc6d1312a2b55533f02f6369dc7c94672f90 19-Jul-2013 Brian Carlstrom <bdc@google.com> Fixing cpplint whitespace/blank_line, whitespace/end_of_line, whitespace/labels, whitespace/semicolon issues

Change-Id: Ide4f8ea608338b3fed528de7582cfeb2011997b6
df62950e7a32031b82360c407d46a37b94188fbb 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/parens issues

Change-Id: Ifc678d59a8bed24ffddde5a0e543620b17b0aba9
0cd7ec2dcd8d7ba30bf3ca420b40dac52849876c 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/blank_line issues

Change-Id: Ice937e95e23dd622c17054551d4ae4cebd0ef8a2
f69863b3039fc621ff4250e262d2a024d5e79ec8 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/newline issues

Change-Id: Ie2049d9f667339e41f36c4f5d09f0d10d8d2c762
2ce745c06271d5223d57dbf08117b20d5b60694a 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/braces issues

Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
7940e44f4517de5e2634a7e07d58d0fb26160513 12-Jul-2013 Brian Carlstrom <bdc@google.com> Create separate Android.mk for main build targets

The runtime, compiler, dex2oat, and oatdump are now in separate trees
to prevent dependency creep. They can now be individually built
without rebuilding the rest of the art projects. dalvikvm and jdwpspy
were already this way. Builds in the art directory should behave as
before, building everything including tests.

Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81