History log of /art/compiler/dex/quick/x86/codegen_x86.h
Revision Date Author Comments
3d21bdf8894e780d349c481e5c9e29fe1556051c 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Move mirror::ArtMethod to native

Optimizing + quick tests are passing, devices boot.

TODO: Test and fix bugs in mips64.

Saves 16 bytes for most ArtMethods, a 7.5MB reduction in system PSS.
Some of the savings are from removal of virtual methods and direct
methods object arrays.

Bug: 19264997

(cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33)

Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d

Fix some ArtMethod related bugs

Added root visiting for runtime methods, not currently required
since the GcRoots in these methods are null.

Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes
--trace run-tests 005, 044.

Fixed optimizing compiler bug where we used a normal stack location
instead of double on ARM64; this fixes the debuggable tests.

TODO: Fix JDWP tests.

Bug: 19264997

Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3

ART: Fix casts for 64-bit pointers on 32-bit compiler.

Bug: 19264997
Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457

Fix JDWP tests after ArtMethod change

Fixes Throwable::GetStackDepth for exception event detection after
internal stack trace representation change.

Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of
proxy method.

Bug: 19264997
Change-Id: I363e293796848c3ec491c963813f62d868da44d2

Fix accidental IMT and root marking regression

Was always using the conflict trampoline. Also included fix for
regression in GC time caused by extra roots. Most of the regression
was IMT.

Fixed bug in DumpGcPerformanceInfo where we would get SIGABRT due to
detached thread.

EvaluateAndApplyChanges:
From ~2500 -> ~1980
GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots

Bug: 19264997
Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0

Fix bogus image test assert

Previously we were comparing the size of the non-moving space to the
size of the image file.

Now we properly compare the size of the image space against the size
of the image file.

Bug: 19264997
Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a

[MIPS64] Fix art_quick_invoke_stub argument offsets.

ArtMethod reference's size got bigger, so we need to move other args
and leave enough space for ArtMethod* and 'this' pointer.

This fixes mips64 boot.

Bug: 19264997
Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
c4013ea00d9e63533f3badeed0131bb2eb859c90 22-Apr-2015 Chao-ying Fu <chao-ying.fu@intel.com> ART: Fix addpd opcode, add Quick x86 assembler test

This patch fixes the addpd opcode that may be used by vectorizations,
and adds an assembler test for the Quick x86 assembler, currently
lightly testing addpd, subpd and mulpd.

Change-Id: I29455a86212829c75fd75737679280f167da7b5b
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
1961b609bfefaedb71cee3651c4f931cc3e7393d 08-Apr-2015 Vladimir Marko <vmarko@google.com> Quick: PC-relative loads from dex cache arrays on x86.

Rewrite all PC-relative addressing on x86 and implement
PC-relative loads from dex cache arrays. Don't adjust the
base to point to the start of the method, let it point to
the anchor, i.e. the target of the "call +0" insn.

Change-Id: Ic22544a8bc0c5e49eb00a75154dc8f3ead816989
8c57831b2b07185ee1986b9af68a351e1ca584c3 07-Apr-2015 David Srbecky <dsrbecky@google.com> Remove the old CFI infrastructure.

Change-Id: I12a17a8a1c39ffccaa499c328ebac36e4d74dc4e
dc56cc509d8e1718ad321f7a91661dbe85ec8cef 27-Mar-2015 Vladimir Marko <vmarko@google.com> PC-relative loads from dex cache arrays for x86-64.

Change-Id: I6cfe22c7e69512b3c0f95b073aaa572db74ec189
f6737f7ed741b15cfd60c2530dab69f897540735 23-Mar-2015 Vladimir Marko <vmarko@google.com> Quick: Clean up Mir2Lir codegen.

Clean up WrapPointer()/UnwrapPointer() and OpPcRelLoad().

Change-Id: I1a91f01e1e779599c77f3f6efcac2a6ad34629cf
6ce3eba0f2e6e505ed408cdc40d213c8a512238d 16-Feb-2015 Vladimir Marko <vmarko@google.com> Add suspend checks to special methods.

Generate suspend checks at the beginning of special methods.
If we need to call to runtime, go to the slow path where we
create a simplified but valid frame, spill all arguments,
call art_quick_test_suspend, restore necessary arguments and
return back to the fast path. This keeps the fast path
overhead to a minimum.

Bug: 19245639
Change-Id: I3de5aee783943941322a49c4cf2c4c94411dbaa2
72f53af0307b9109a1cfc0671675ce5d45c66d3a 12-Nov-2014 Chao-ying Fu <chao-ying.fu@intel.com> ART: Remove MIRGraph::dex_pc_to_block_map_

This patch removes MIRGraph::dex_pc_to_block_map_, adds a local
variable dex_pc_to_block_map inside MIRGraph::InlineMethod(), and
updates several functions to pass dex_pc_to_block_map.
The goal is to limit the scope of dex_pc_to_block_map and
the usage of FindBlock, so that compiler optimizations cannot
rely on the dex pc to look up basic blocks. This avoids
duplicated dex pc issues.
Also, this patch changes quick targets to use successor blocks
for switch case target generation at Mir2Lir::InstallSwitchTables().

Change-Id: I9f571efebd2706b4e1606279bd61f3b406ecd1c4
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
966c3ae95d3c699ee9fbdbccc1acdaaf02325faf 27-Jan-2015 Mark P Mendell <mark.p.mendell@intel.com> Revert "Revert "ART: Implement X86 hard float (Quick/JNI/Baseline)""

This reverts commit 949c91fb91f40a4a80b2b492913cf8541008975e.

This time, don't clobber EBX before saving it.

Redo some of the macros to make register usage explicit.

Change-Id: I8db8662877cd006816e16a28f42444ab7c36bfef
949c91fb91f40a4a80b2b492913cf8541008975e 27-Jan-2015 Vladimir Marko <vmarko@google.com> Revert "ART: Implement X86 hard float (Quick/JNI/Baseline)"

And the 3 Mac build fixes. Fix conflicts in context_x86.* .

This reverts commits
3d2c8e74c27efee58e24ec31441124f3f21384b9 ,
34eda1dd66b92a361797c63d57fa19e83c08a1b4 ,
f601d1954348b71186fa160a0ae6a1f4f1c5aee6 ,
bc503348a1da573488503cc2819c9e30807bea31 .

Bug: 19150481
Change-Id: I6650ee30a7d261159380fe2119e14379e4dc9970
0b9203e7996ee1856f620f95d95d8a273c43a3df 23-Jan-2015 Andreas Gampe <agampe@google.com> ART: Some Quick cleanup

Make several fields const in CompilationUnit. May benefit some Mir2Lir
code that repeats tests, and in general immutability is good.

Remove compiler_internals.h and refactor some other headers to reduce
overly broad imports (and thus forced recompiles on changes).

Change-Id: I898405907c68923581373b5981d8a85d2e5d185a
3d2c8e74c27efee58e24ec31441124f3f21384b9 13-Jan-2015 Mark Mendell <mark.p.mendell@intel.com> ART: Implement X86 hard float (Quick/JNI/Baseline)

Use XMM0-XMM3 as parameter registers for float/double on X86. X86_64
already uses XMM0-XMM7 for parameters.

Change the 'hidden' argument register from XMM0 to XMM7 to avoid a
conflict.

Add support for FPR save/restore in runtime/arch/x86.

Minimal support for Optimizing baseline compiler.

Bump the version in runtime/oat.h because this is an ABI change.

Change-Id: Ia6fe150e8488b9e582b0178c0dda65fc81d5a8ba
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
0f9b03c65e0ee8bdc5ddf58af100f5fc356cc98b 12-Jan-2015 Vladimir Marko <vmarko@google.com> Revert "ART: Implement hard float for X86"

This reverts commit 59b9cf7ec0ccc13df91be0bd5c723b8c52410739.

Change-Id: I08333b528032480def474286dc368d916a07e17f
59b9cf7ec0ccc13df91be0bd5c723b8c52410739 09-Jan-2015 Mark Mendell <mark.p.mendell@intel.com> ART: Implement hard float for X86

Use XMM0-XMM3 as parameter registers for float/double on X86. X86_64
already uses XMM0-XMM7 for parameters.

Change the 'hidden' argument register from XMM0 to XMM7 to avoid a
conflict.

This change was requested to simplify the Optimizing compiler
implementation.

Change-Id: I89ba8ade99b9a8a5b1ad1ee5f5cbfd33d656bfaa
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
717a3e447c6f7a922cf9c3efe522747a187a045d 13-Nov-2014 Serguei Katkov <serguei.i.katkov@intel.com> Re-factor Quick ABI support

Every architecture must now provide a mapper between VR
parameters and physical registers. As an additional helper,
an architecture can provide a bulk copy helper for the
GenDalvikArgs utility.
Everything else becomes common code:
GetArgMappingToPhysicalReg, GenDalvikArgsNoRange,
GenDalvikArgsRange, FlushIns.

The mapper now uses the shorty representation of the input
parameters. This is required because locations alone are not
enough to detect the type of a parameter (fp or core).
For details, see
https://android-review.googlesource.com/#/c/113936/.

Change-Id: Ie762b921e0acaa936518ee6b63c9a9d25f83e434
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
6af820639c74e769ffc1f54930f6ebc11364f894 26-Nov-2014 Yevgeny Rouban <yevgeny.y.rouban@intel.com> ART: x86 specific clearing higher bits when converting long to int

The following problem description is taken from
https://android-review.googlesource.com/107261
If the destination and source of a long-to-int conversion are
the same physical register on 64-bit, then we do not emit any
instructions but treat the destination as a 32-bit view of the
source register. As a result, the high part contains garbage. If
the destination is later used as an index in an array access,
this garbage is used in the address computation because the
address is 64-bit. In all other cases the garbage is just ignored.

A generic solution (113023) for all hw platforms was suggested
but later rejected in favor of a HW-specific solution:
https://android-review.googlesource.com/113023
https://android-review.googlesource.com/114436

This patch is a rework of patch 113023 that sticks to x86_64
specific changes: for a 64-bit target, this patch forces generation
of a reg-to-reg copy if the src and dest are the same physical
register. The 32-bit move instruction then zeroes the higher
bits.

Change-Id: Id29af839506ff9319ffba08b2e86e240fef4dafd
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
Signed-off-by: Yevgeny Rouban <yevgeny.y.rouban@intel.com>
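
The effect described in the commit above can be illustrated with a small C++ model. This is only a sketch, not ART code: the function names are hypothetical, and a uint64_t stands in for a 64-bit physical register.

```cpp
#include <cassert>
#include <cstdint>

// Models "no instruction emitted": the 64-bit register is reused as-is,
// so bits 63..32 still hold whatever the long value left there.
uint64_t LongToIntNoCopy(uint64_t reg) { return reg; }

// Models a 32-bit reg-to-reg move on x86-64, which zeroes bits 63..32
// of the destination register.
uint64_t LongToIntWithMov32(uint64_t reg) {
  return static_cast<uint32_t>(reg);
}

// Models a 64-bit address computation: base + index * scale.
uint64_t ElementAddress(uint64_t base, uint64_t index_reg, uint64_t scale) {
  // x86 addressing modes only allow these scale factors.
  assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
  return base + index_reg * scale;
}
```

With a register whose low half is 2 and whose high half is non-zero, only the variant that models the 32-bit move yields the address of element 2; the other one folds the stale high bits into the address.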
da96aeda912ff317de2c41e5a49bd244427238ac 27-Oct-2014 Chao-ying Fu <chao-ying.fu@intel.com> ART: Generate switch targets from successor blocks

This patch relies on the successor blocks to generate switch targets
in GenSmallPackedSwitch and GenSmallSparseSwitch for all quick targets.
In x86, we create a new packed switch table by storing basic block
ids instead of dex offsets, and we override MarkPackedCaseLabels and
InsertCaseLabel to avoid calling FindBlock.

Change-Id: Ibb5983db582f0965aba787b520bd106522453564
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
bf535be514570fc33fc0a6347a87dcd9097d9bfd 19-Nov-2014 Vladimir Marko <vmarko@google.com> Add card mark to filled-new-array.

Bug: 18032332
Change-Id: I35576b27f9115e4d0b02a11afc5e483b9e93a04a
b28c1c06236751aa5c9e64dcb68b3c940341e496 08-Nov-2014 Ian Rogers <irogers@google.com> Tidy RegStorage for X86.

Don't use global variables initialized in constructors to hold onto constant
values, instead use the TargetReg32 helper. Improve this helper with the use
of lookup tables. Elsewhere prefer to use constexpr values as they will have
less runtime cost.
Add an ostream operator to RegStorage for CHECK_EQ and use.

Change-Id: Ib8d092d46c10dac5909ecdff3cc1e18b7e9b1633
675e09b2753c2fcd521bd8f0230a0abf06e9b0e9 23-Oct-2014 Ningsheng Jian <ningsheng.jian@arm.com> ARM: Strength reduction for floating-point division

For floating-point division by power of two constants, generate
multiplication by the reciprocal instead.

Change-Id: I39c79eeb26b60cc754ad42045362b79498c755be
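
Why this strength reduction is safe can be sketched in C++. The helper names below are hypothetical and the code is not taken from ART; it assumes IEEE 754 binary64 doubles. For a divisor of the form +/-2^k whose reciprocal is also a normal double, 1.0 / divisor is exact, so dividing and multiplying by the reciprocal round the same real number.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True if `divisor` is +/-2^k and both it and its reciprocal are normal
// doubles (assumes IEEE 754 binary64 layout).
bool ReciprocalIsExact(double divisor) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit double");
  uint64_t bits;
  std::memcpy(&bits, &divisor, sizeof(bits));
  const uint64_t mantissa = bits & ((UINT64_C(1) << 52) - 1);
  const int biased_exponent = static_cast<int>((bits >> 52) & 0x7ff);
  // A zero mantissa means a power of two. The reciprocal has biased
  // exponent 2046 - biased_exponent, so both are normal for 1..2045.
  return mantissa == 0 && biased_exponent >= 1 && biased_exponent <= 2045;
}

// Reciprocal of a divisor that passed the check above.
double ExactReciprocal(double divisor) {
  assert(ReciprocalIsExact(divisor));
  return 1.0 / divisor;
}

// Uses a multiplication when that is exact, a division otherwise.
double DivideByConstant(double x, double divisor) {
  if (ReciprocalIsExact(divisor)) {
    return x * ExactReciprocal(divisor);
  }
  return x / divisor;
}
```

In a compiler the check would run once at compile time on the constant operand, so only the multiplication remains in the generated code.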
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f 31-Oct-2014 Ian Rogers <irogers@google.com> Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags.

Fix associated errors about unused parameters and implicit sign conversions.
For sign conversion this was largely in the area of enums, so add ostream
operators for the affected enums and fix tools/generate-operator-out.py.
Tidy arena allocation code and arena allocated data types, rather than fixing
new and delete operators.
Remove dead code.

Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
5c5676b26a08454b3f0133783778991bbe5dd681 30-Sep-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> ART: Add div/rem zero check elimination flag

Just as with other throwing bytecodes, it is possible to prove in some cases
that a divide/remainder won't throw ArithmeticException. For example, if two
divides with the same denominator execute in sequence, the second one provably
cannot throw if the first one did not.

This patch adds the elimination flag and updates the signature of several
Mir2Lir methods to take the instruction optimization flags into account.

Change-Id: I0b078cf7f29899f0f059db1f14b65a37444b84e8
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
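
The idea behind the elimination flag can be sketched as a pass over one straight-line block. Everything here is hypothetical (the DivInsn struct, the field names, the pass itself) and is much simpler than the real MIR-level analysis; it only shows the "second divide with the same denominator" argument from the commit message.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model of a divide: dest = numerator / denominator,
// with operands given as virtual register numbers.
struct DivInsn {
  int dest;
  int numerator;
  int denominator;
  bool needs_zero_check;
};

// Clears needs_zero_check on divides whose denominator register was
// already checked earlier in the block and not overwritten since.
void EliminateRedundantZeroChecks(std::vector<DivInsn>& block) {
  std::set<int> known_nonzero;
  for (DivInsn& insn : block) {
    // Register numbers are non-negative in this model.
    assert(insn.dest >= 0 && insn.denominator >= 0);
    if (known_nonzero.count(insn.denominator) != 0) {
      insn.needs_zero_check = false;
    } else {
      // If this divide does not throw, its denominator is non-zero.
      known_nonzero.insert(insn.denominator);
    }
    // Writing dest invalidates what was known about that register.
    known_nonzero.erase(insn.dest);
  }
}
```

The erase step matters for non-SSA input: once a register is overwritten, an earlier check on it no longer proves anything.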
832336b3c9eb892045a8de1bb12c9361112ca3c5 09-Oct-2014 Ian Rogers <irogers@google.com> Don't copy fill array data to quick literal pool.

Currently quick copies the fill array data from the dex file to the literal
pool. It then has to jump through hoops to pass this PC relative address down
to out-of-line code. Instead, pass the offset of the table to the out-of-line
code and use the CodeItem data associated with the ArtMethod. This reduces
the size of oat code while greatly simplifying it.
Unify the FillArrayData implementation in quick, portable and the interpreters.

Change-Id: I9c6971cf46285fbf197856627368c0185fdc98ca
f4da675bbc4615c5f854c81964cac9dd1153baea 01-Aug-2014 Vladimir Marko <vmarko@google.com> Implement method calls using relative BL on ARM.

Store the linker patches with each CompiledMethod instead of
keeping them in CompilerDriver. Reorganize oat file creation
to apply the patches as we're writing the method code. Add
framework for platform-specific relative call patches in the
OatWriter. Implement relative call patches for ARM.

Change-Id: Ie2effb3d92b61ac8f356140eba09dc37d62290f8
e39c54ea575ec710d5e84277fcdcc049f8acb3c9 22-Sep-2014 Vladimir Marko <vmarko@google.com> Deprecate GrowableArray, use ArenaVector instead.

Purge GrowableArray from Quick and Portable.
Remove GrowableArray<T>::Iterator.

Change-Id: I92157d3a6ea5975f295662809585b2dc15caa1c6
0a1174efd81fc25110ad106a84063c62af9ce7e5 11-Sep-2014 Mark Mendell <mark.p.mendell@intel.com> X86 QBE: Make some X86 routines virtual

Add virtual in one place, and move some code into a virtual routine.
This allows subclassing and overriding for my purposes.

Change-Id: Ie415df943b17b56ad1f057513b2df2a31801a72f
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
6dccdc2511c9f22d3cc2ea83386ce9db2688fa19 18-Aug-2014 Maxim Kazantsev <maxim.kazantsev@intel.com> ART: Reduce LockCallTemps usage

Using FlushAllRegs/LockCallTemps in integer arithmetic causes
excess register flushing and clobbering. This patch adds an API that
flushes, clobbers and locks only those registers we really
need for calculations.

Change-Id: Idabaa4fff4d18a33e5040a80f66f2df6432f8be0
Signed-off-by: Max Kazantsev <maxim.kazantsev@intel.com>
b3a84e2f308b3ed7d17b8e96fc7adfcac36ebe77 28-Jul-2014 Lupusoru, Razvan A <razvan.a.lupusoru@intel.com> ART: Vectorization opcode implementation fixes

This patch fixes the implementation of the x86 vectorization opcodes.

Change-Id: I0028d54a9fa6edce791b7e3a053002d076798748
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
Signed-off-by: Philbert Lin <philbert.lin@intel.com>
8d0d03e24325463f0060abfd05dba5598044e9b1 07-Jun-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> ART: Change temporaries to positive names

Changes compiler temporaries to have positive names. The numbering now
puts them above the code VRs (locals + ins, in that order). The patch also
introduces APIs to query the number of temporaries, locals and ins.

The compiler temp infrastructure suffered from several issues
which are also addressed by this patch:
- There is no longer a queue of compiler temps. This would be polluted
  with Method* when post opts were called multiple times.
- Sanity checks have been added to allow requesting of temps from BE
  and to prevent temps after frame is committed.
- None of the structures holding temps can overflow because they are
  allocated to allow holding maximum temps. Thus temps can be requested
  by BE with no problem.
- Since the queue of compiler temps is no longer maintained, it is no
  longer possible to refer to a temp that has invalid ssa (because it
  was requested before ssa was run).
- The BE can now request temps after all ME allocations and it is guaranteed
  to actually receive them.
- ME temps are now treated like normal VRs in all cases with no special
  handling. Only the BE temps are handled specially because there are no
  references to them from MIRs.
- Deprecated and removed several fields in CompilationUnit that saved
  register information and updated callsites to call the new interface from
  MIRGraph.

Change-Id: Ia8b1fec9384a1a83017800a59e5b0498dfb2698c
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
b5bce7cc9f1130ab4932ba8e6917c362bf871f24 25-Jul-2014 Jean Christophe Beyler <jean.christophe.beyler@intel.com> ART: Add non-temporal store support

Added non-temporal store support as a hint from the ME.
Added the implementation of the memory barrier
extended instruction that supports non-temporal stores
by explicitly serializing all previous store-to-memory instructions.

Change-Id: I8205a92083f9725253d8ce893671a133a0b6849d
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
53c913bb71b218714823c8c87a1f92830c336f61 13-Aug-2014 Andreas Gampe <agampe@google.com> ART: Clean up compiler

Clean up the compiler: fewer extern functions, disentangle
compilers, hide some compiler specifics, lower global includes.

Change-Id: Ibaf88d02505d86994d7845cf0075be5041cc8438
f40f890ae3acd7b3275355ec90e2814bba8d4fd6 14-Aug-2014 Yixin Shou <yixin.shou@intel.com> Implement inlined shift long for 32bit

Added support for x86 inlined shift long for 32bit

Change-Id: I6caef60dd7d80227c3057fd6f64b0ecb11025afa
Signed-off-by: Yixin Shou <yixin.shou@intel.com>
8c914c02415d7673f75166e1f1efdcdc7fcadc65 28-Jul-2014 Yixin Shou <yixin.shou@intel.com> Implement GenInlinedReverseBits

Added support for x86 inlined version of reverse method of int and long

Change-Id: I7dbdc13b4afedd56557e9eff038a31517cdb1843
Signed-off-by: Yixin Shou <yixin.shou@intel.com>
8c18c2aaedb171f9b03ec49c94b0e33449dc411b 06-Aug-2014 Andreas Gampe <agampe@google.com> ART: Generate chained compare-and-branch for short switches

Refactor Mir2Lir to generate chained compare-and-branch sequences
for short switches on all architectures.

Bug: 16241558

(cherry picked from commit 48971b3242e5126bcd800cc9c68df64596b43d13)

Change-Id: I0bb3071b8676523e90e0258e9b0e3fd69c1237f4
79273802f2b788bcd3eb76edf4df1bcaa57f886f 06-Aug-2014 Andreas Gampe <agampe@google.com> ART: Rework CFA frame initialization and writing code

Move eh_frame initialization code and CFI writing code to
elf_writer_quick to remove hard-wired dependencies on specific
Quick-compiler backends.

Change-Id: I27ee8ce7245da33a20c90e0086b8d4fd0a2baf4d
e7f82e2515f47f3c3292281312d7031a34a58ffc 06-Aug-2014 Fred Shih <ffred@google.com> Added support for patching classes from different dex files.

Added support for class patching from different dex files and moved
ScopedObjectAccess from the quick compiler to driver. Slight refactoring
for clarity.

Bug: 16656190
Change-Id: I107fcbce75db42ca61321ea1c5d5f236680a1b3d
48971b3242e5126bcd800cc9c68df64596b43d13 06-Aug-2014 Andreas Gampe <agampe@google.com> ART: Generate chained compare-and-branch for short switches

Refactor Mir2Lir to generate chained compare-and-branch sequences
for short switches on all architectures.

Change-Id: Ie2a572ae69d462ba68a119e9fb93ae538cddd08f
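
What a chained compare-and-branch sequence does at run time can be modeled in a few lines of C++. This is a sketch under assumptions: the SwitchCase struct, the function name and the cutoff value are all hypothetical, and the real code emits LIR instead of executing a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SwitchCase {
  int32_t key;
  int target_block;
};

// Hypothetical cutoff for what counts as a "short" switch.
const size_t kMaxChainedCases = 5;

// Each loop iteration stands for one emitted compare plus conditional
// branch; falling out of the loop is the fall-through path.
int SelectTargetChained(const std::vector<SwitchCase>& cases, int32_t value,
                        int fallthrough_block) {
  assert(cases.size() <= kMaxChainedCases);
  for (const SwitchCase& c : cases) {
    if (value == c.key) {
      return c.target_block;
    }
  }
  return fallthrough_block;
}
```

For a handful of cases this avoids building and indexing a switch table, at the cost of a linear number of compares.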
547cdfd21ee21e4ab9ca8692d6ef47c62ee7ea52 05-Aug-2014 Tong Shen <endlessroad@google.com> Emit CFI for x86 & x86_64 JNI compiler.

Now for host-side x86 & x86_64 ART, we are able to get a complete stack trace, even with mixed C/C++ & Java stack frames.

Testing:
1. art/test/run-test --host --gdb [--64] --no-relocate 005
2. In gdb, run 'b art::Class_classForName', which is the implementation of a Java native method, then 'r'
3. In gdb, run 'bt'. You should see stack frames down to main()

Change-Id: I2d17e9aa0f6d42d374b5362a15ea35a2fce96302
c76c614d681d187d815760eb909e5faf488a3c35 05-Aug-2014 Andreas Gampe <agampe@google.com> ART: Refactor long ops in quick compiler

Make GenArithOpLong virtual. Let the implementation in gen_common be
very basic, without instruction-set checks, and meant as a fall-back.
Backends should implement and dispatch to code for better implementations.
This allows removing the GenXXXLong virtual methods from Mir2Lir and
cleaning up the backends (especially removing some LOG(FATAL) implementations).

Change-Id: I6366443c0c325c1999582d281608b4fa229343cf
6bbf0967d217ab2b7bdbb78bfd076b8fb07a44e8 14-Jul-2014 Alexei Zavjalov <alexei.zavjalov@intel.com> ART: Implement the easy long division/remainder by a constant

Also optimizes long/int divisions by power-of-two values.

Also do some clean-up.

Change-Id: Ie414e64aac251c81361ae107d157c14439e6dab5
Signed-off-by: Alexei Zavjalov <alexei.zavjalov@intel.com>
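
The usual trick for signed division by a power of two can be sketched as follows. This is a generic illustration, not the code from this commit: the function names are hypothetical, and the sketch assumes that >> on a negative int64_t is an arithmetic shift (guaranteed since C++20 and the behavior of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// Signed division by 2^k that rounds toward zero, as Java's long
// division does.
int64_t DivByPowerOfTwo(int64_t x, int k) {
  assert(k >= 1 && k <= 62);
  // A plain arithmetic shift rounds toward negative infinity, so negative
  // dividends first get a bias of 2^k - 1.
  const int64_t bias = (x < 0) ? ((INT64_C(1) << k) - 1) : 0;
  return (x + bias) >> k;
}

// Matching remainder; its sign follows the dividend.
int64_t RemByPowerOfTwo(int64_t x, int k) {
  // |quotient * 2^k| <= |x|, so the multiplication cannot overflow.
  return x - DivByPowerOfTwo(x, k) * (INT64_C(1) << k);
}
```

The bias is what distinguishes this from a bare shift: without it, -7 / 2 would come out as -4 instead of -3.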
35e1e6ad4b50f1adbe9f93fe467766f042491896 30-Jul-2014 Tong Shen <endlessroad@google.com> 1. Fix CFI for quick compiled code in x86 & x86_64;
2. Emit CFI in .eh_frame instead of .debug_frame.

With CFI, we can correctly unwind past quick generated code.
Now gdb should unwind to main() for both x86 & x86_64 host-side ART.

Note that it does not work with relocation yet.

Testing:
1. art/test/run-test --host --gdb [--64] --no-relocate 005
2. In gdb, run 'b art_quick_invoke_stub', then 'r', then 'c' a few times
3. In gdb, run 'bt'. You should see stack frames down to main()

Change-Id: I5350d4097dc3d360a60cb17c94f1d02b99bc58bb
984305917bf57b3f8d92965e4715a0370cc5bcfb 28-Jul-2014 Andreas Gampe <agampe@google.com> ART: Rework quick entrypoint code in Mir2Lir, cleanup

To reduce the complexity of calling trampolines in generic code,
introduce an enumeration for entrypoints. Introduce a header that lists
the entrypoint enum and exposes a templatized method that translates an
enum value to the corresponding thread offset value.

Call helpers are rewritten to have an enum parameter instead of the
thread offset. Also rewrite LoadHelper and GenConversionCall this way.
It is now LoadHelper's duty to select the right thread offset size.

Introduce InvokeTrampoline virtual method to Mir2Lir. This allows
further simplification of the call helpers, and makes OpThreadMem specific
to X86 only (removed from Mir2Lir).

Make GenInlinedCharAt virtual, move a copy to X86 backend, and simplify
both copies. Remove LoadBaseIndexedDisp and OpRegMem from Mir2Lir, as they
are now specific to X86 only.

Remove StoreBaseIndexedDisp from Mir2Lir, as it was only ever used in the
X86 backend.

Remove OpTlsCmp from Mir2Lir, as it was only ever used in the X86 backend.

Remove OpLea from Mir2Lir, as it was only ever defined in the X86 backend.

Remove GenImmedCheck from Mir2Lir as it was neither used nor implemented.

Change-Id: If0a6182288c5d57653e3979bf547840a4c47626e
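
The enum-to-offset translation described in the commit above might look roughly like the following sketch. All names, the three listed entrypoints and the contiguous table layout are assumptions made for illustration; the real header and its entrypoint list differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical subset of entrypoints; the real list is much longer.
enum class QuickEntrypoint : uint32_t {
  kAllocObject,
  kTestSuspend,
  kThrowNullPointer,
  kCount  // number of entries, not an entrypoint itself
};

// Assumed layout: function pointers stored contiguously in the thread
// object, starting `table_start` bytes from the thread base.
template <size_t kPointerSize>
uint32_t EntrypointOffset(QuickEntrypoint entrypoint, uint32_t table_start) {
  static_assert(kPointerSize == 4 || kPointerSize == 8,
                "unsupported pointer size");
  assert(entrypoint != QuickEntrypoint::kCount);
  return table_start +
         static_cast<uint32_t>(entrypoint) * static_cast<uint32_t>(kPointerSize);
}

// Stand-in for the point that the helper, not its callers, selects the
// thread offset size.
uint32_t LoadHelperOffset(QuickEntrypoint entrypoint, bool target_is_64bit,
                          uint32_t table_start) {
  return target_is_64bit ? EntrypointOffset<8>(entrypoint, table_start)
                         : EntrypointOffset<4>(entrypoint, table_start);
}
```

Callers then pass only the enum value, and the 32-bit versus 64-bit decision lives in one place.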
bebee4fd10e5db6cb07f59bc0f73297c900ea5f0 16-Jul-2014 Andreas Gampe <agampe@google.com> ART: Refactor GenSelect, refactor gen_common accordingly

This adds a GenSelect method meant for selection of constants. The
general-purpose GenInstanceof code is refactored to take advantage of
this. This cleans up code and squashes a branch-over on ARM64 to a
cset.

Also add a slow-path for type initialization in GenInstanceof.

Bug: 16241558

(cherry picked from commit 90969af6deb19b1dbe356d62fe68d8f5698d3d8f)

Change-Id: Ie4494858bb8c26d386cf2e628172b81bba911ae5
9ee4519afd97121f893f82d41d23164fc6c9ed34 17-Jul-2014 Serguei Katkov <serguei.i.katkov@intel.com> x86: GenSelect utility update

This is a follow-up to https://android-review.googlesource.com/#/c/101396/
that makes the x86 GenSelectConst32 implementation complete.

Change-Id: I69f318e18093f9a5b00f8f00f0f1c2e4ff7a9ab2
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
147eb41b53729ec8d5c188d1cac90964a51afb8a 11-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86""""

This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1.

Bug: 16256184
Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73

Conflicts:
compiler/dex/quick/arm64/target_arm64.cc
compiler/image_test.cc
runtime/fault_handler.cc
1222c96fafe98061cfc57d3bd115f46edb64e624 15-Jul-2014 Alexei Zavjalov <alexei.zavjalov@intel.com> ART: inline Math.Max/Min (float and double)

This implements the inlined version of Math.Max/Min intrinsics.

Change-Id: I2db8fa7603db3cdf01016ec26811a96f91b1e6ed
Signed-off-by: Alexei Zavjalov <alexei.zavjalov@intel.com>
Signed-off-by: Shou, Yixin <yixin.shou@intel.com>
90969af6deb19b1dbe356d62fe68d8f5698d3d8f 16-Jul-2014 Andreas Gampe <agampe@google.com> ART: Refactor GenSelect, refactor gen_common accordingly

This adds a GenSelect method meant for selection of constants. The
general-purpose GenInstanceof code is refactored to take advantage of
this. This cleans up code and squashes a branch-over on ARM64 to a
cset.

Also add a slow-path for type initialization in GenInstanceof.

Change-Id: Ie4494858bb8c26d386cf2e628172b81bba911ae5
69dfe51b684dd9d510dbcb63295fe180f998efde 11-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86""""

This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1.

Bug: 16256184
Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73
d9cb8ae2ed78f957a773af61759432d7a7bf78af 09-Jul-2014 Douglas Leung <douglas@mips.com> Fix art test failures for Mips.

This patch fixes the following art test failures for Mips:
003-omnibus-opcodes
030-bad-finalizer
041-narrowing
059-finalizer-throw

Change-Id: I4e0e9ff75f949c92059dd6b8d579450dc15f4467
Signed-off-by: Douglas Leung <douglas@mips.com>
ccc60264229ac96d798528d2cb7dbbdd0deca993 05-Jul-2014 Andreas Gampe <agampe@google.com> ART: Rework TargetReg(symbolic_reg, wide)

Make the standard implementation in Mir2Lir and the specialized one
in the x86 backend return a pair when wide = "true". Introduce
WideKind enumeration to improve code readability. Simplify generic
code based on this implementation.

Change-Id: I670d45aa2572eedfdc77ac763e6486c83f8e26b4
59a42afc2b23d2e241a7e301e2cd68a94fba51e5 04-Jul-2014 Serguei Katkov <serguei.i.katkov@intel.com> Update counting VR for promotion

For 64-bit it makes sense to compute VR uses together for
int and long because core reg is shared.

Change-Id: Ie8676ece12c928d090da2465dfb4de4e91411920
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
7fb36ded9cd5b1d254b63b3091f35c1e6471b90e 10-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Add implicit null and stack checks for x86""

Fixes x86_64 cross compile issue. Removes command line options
and property to set implicit checks - this is hard coded now.

This reverts commit 3d14eb620716e92c21c4d2c2d11a95be53319791.

Change-Id: I5404473b5aaf1a9c68b7181f5952cb174d93a90d
c380191f3048db2a3796d65db8e5d5a5e7b08c65 08-Jul-2014 Serguei Katkov <serguei.i.katkov@intel.com> x86_64: Enable fp-reg promotion

This patch makes 4 registers, XMM12-XMM15, available for promotion of
fp virtual registers.

Change-Id: I3f89ad07fc8ae98b70f550eada09be7b693ffb67
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
0025a86411145eb7cd4971f9234fc21c7b4aced1 11-Jul-2014 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Revert "Add implicit null and stack checks for x86"""

Broke the build.

This reverts commit 7fb36ded9cd5b1d254b63b3091f35c1e6471b90e.

Change-Id: I9df0e7446ff0913a0e1276a558b2ccf6c8f4c949
34e826ccc80dc1cf7c4c045de6b7f8360d504ccf 29-May-2014 Dave Allison <dallison@google.com> Add implicit null and stack checks for x86

This adds compiler and runtime changes for x86
implicit checks. 32 bit only.

Both host and target are supported.
By default, on the host, the implicit checks are null pointer and
stack overflow. Suspend is implemented but not switched on.

Change-Id: I88a609e98d6bf32f283eaa4e6ec8bbf8dc1df78a
3d14eb620716e92c21c4d2c2d11a95be53319791 10-Jul-2014 Dave Allison <dallison@google.com> Revert "Add implicit null and stack checks for x86"

It breaks cross compilation with x86_64.

This reverts commit 34e826ccc80dc1cf7c4c045de6b7f8360d504ccf.

Change-Id: I34ba07821fc0a022fda33a7ae21850957bbec5e7
60bfe7b3e8f00f0a8ef3f5d8716adfdf86b71f43 09-Jul-2014 Udayan Banerji <udayan.banerji@intel.com> X86 Backend support for vectorized float and byte 16x16 operations

Add support for reserving vector registers for the duration of vector loop.
Add support for 16x16 multiplication, shifts, and add reduce.

Changed the vectorization implementation to be able to use the dataflow
elements for SSA recreation and fixed a few implementation details.

Change-Id: I2f358f05f574fc4ab299d9497517b9906f234b98
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Signed-off-by: Olivier Come <olivier.come@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
407a9d2847161b843966a443b71760b1280bd396 04-Jul-2014 Serguei Katkov <serguei.i.katkov@intel.com> Clean-up call_x86.cc

Also adds some DCHECKs and fixes for the bugs found by them.

Change-Id: I455bbfe2c6018590cf491880cd9273edbe39c4c7
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
70c4f06f9965cdb9319a2c85f65acda20086d765 25-Jun-2014 DaniilSokolov <daniil.y.sokolov@intel.com> ART: Intrinsic implementation for java.lang.System.arraycopy.

Implements intrinsic for java.lang.System.arraycopy(char[], int, char[], int, int) -
this method is internal to android class libraries and used in such classes as StringBuffer and
StringBuilder. It is not possible to call it from application code. The intrinsic for
this method is implemented as inline method (assembly code is generated manually).

The intrinsic is x86 32 bit only.

Change-Id: Id1b1e0a20d5f6d5f5ebfe1fdc2447b6d8a515432
Signed-off-by: Daniil Sokolov <daniil.y.sokolov@intel.com>
a77ee5103532abb197f492c14a9e6fb437054e2a 02-Jul-2014 Chao-ying Fu <chao-ying.fu@intel.com> x86_64: TargetReg update for x86

Also includes changes in common code. Elimination of use of TargetReg
with one parameter and direct access to special target registers.

Change-Id: Ied2c1f87d4d1e4345248afe74bca40487a46a371
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
b5860fb459f1ed71f39d8a87b45bee6727d79fe8 22-Jun-2014 buzbee <buzbee@google.com> Register promotion support for 64-bit targets

Not sufficiently tested for 64-bit targets, but should be
fairly close.

A significant amount of refactoring could still be done (in
later CLs).

With this change we are not making any changes to the vmap
scheme. As a result, it is a requirement that if a vreg
is promoted to both a 32-bit view and the low half of a
64-bit view it must share the same physical register. We
may change this restriction later on to allow for more flexibility
for 32-bit Arm.

For example, if v4, v5, v4/v5 and v5/v6 are all hot enough to
promote, we'd end up with something like:

v4 (as an int) -> r10
v4/v5 (as a long) -> r10
v5 (as an int) -> r11
v5/v6 (as a long) -> r11

Fix a couple of ARM64 bugs on the way...

Change-Id: I6a152b9c164d9f1a053622266e165428045362f3
23abec955e2e733999a1e2c30e4e384e46e5dde4 02-Jul-2014 Serban Constantinescu <serban.constantinescu@arm.com> AArch64: Add few more inline functions

This patch adds inlining support for the following functions:
* Math.max/min(long, long)
* Math.max/min(float, float)
* Math.max/min(double, double)
* Integer.reverse(int)
* Long.reverse(long)

Change-Id: Ia2b1619fd052358b3a0d23e5fcbfdb823d2029b9
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
dd64450b37776f68b9bfc47f8d9a88bc72c95727 01-Jul-2014 Elena Sayapina <elena.v.sayapina@intel.com> x86_64: Unify 64-bit check in x86 compiler

Update x86-specific Gen64Bit() check with the CompilationUnit target64 field
which is set using unified Is64BitInstructionSet(InstructionSet) check.

Change-Id: Ic00ac863ed19e4543d7ea878d6c6c76d0bd85ce8
Signed-off-by: Elena Sayapina <elena.v.sayapina@intel.com>
de68676b24f61a55adc0b22fe828f036a5925c41 24-Jun-2014 Andreas Gampe <agampe@google.com> Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter"

This reverts commit 2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d.

Breaks the build.

Change-Id: I9faad4e9a83b32f5f38b2ef95d6f9a33345efa33
3c12c512faf6837844d5465b23b9410889e5eb11 24-Jun-2014 Andreas Gampe <agampe@google.com> Revert "Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter""

This reverts commit de68676b24f61a55adc0b22fe828f036a5925c41.

Fixes an API comment, and differentiates between inserting and appending.

Change-Id: I0e9a21bb1d25766e3cbd802d8b48633ae251a6bf
2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d 23-Jun-2014 Andreas Gampe <agampe@google.com> ART: Split out more cases of Load/StoreRef, volatile as parameter

Splits out more cases of ref registers being loaded or stored. For
code clarity, adds volatile as a flag parameter instead of a separate
method.

On ARM64, continue cleanup. Add flags to print/fatal on size mismatches.

Change-Id: I30ed88433a6b4ff5399aefffe44c14a5e6f4ca4e
7071c8d5885175a746723a3b38a347855965be08 05-Mar-2014 Yixin Shou <yixin.shou@intel.com> Add x86 inlined abs method for float/double

Add the optimized implementation of inlined abs method for
float/double for X86 side.

Change-Id: I2f367542f321d88a976129f9f7156fd3c2965c8a
Signed-off-by: Yixin Shou <yixin.shou@intel.com>
bd3682eada753de52975ae2b4a712bd87dc139a6 11-Jun-2014 Alexei Zavjalov <alexei.zavjalov@intel.com> ART: Implement rem_double/rem_float for x86/x86-64

This adds inlined version of the rem_double/rem_float bytecodes
for x86/x86-64 platforms. This patch also removes unnecessary
fmod and fmodf stubs from runtime.

Change-Id: I2311aa2adf08d6614527e0da070e3b6ce2343a20
Signed-off-by: Alexei Zavjalov <alexei.zavjalov@intel.com>
4c115b85cc48f4dfc8fc2b0484ddfeb29f02d658 17-Jun-2014 Vladimir Marko <vmarko@google.com> Revert "Add x86 inlined abs method for float/double"

This reverts commit e88b89ad1d1a583daf205c7a387ba13f549f95f1.

Change-Id: I2ba21b7442ba3696482d45001e6bd32e8baf9d1f
e88b89ad1d1a583daf205c7a387ba13f549f95f1 05-Mar-2014 Yixin Shou <yixin.shou@intel.com> Add x86 inlined abs method for float/double

Add the optimized implementation of inlined abs method for
float/double for X86 side.

Change-Id: I4e095644a90524354040174954c1e127c7bb4ee2
Signed-off-by: Yixin Shou <yixin.shou@intel.com>
5aa6e04061ced68cca8111af1e9c19781b8a9c5d 14-Jun-2014 Ian Rogers <irogers@google.com> Tidy x86 assembler.

Use helper functions to compute when the kind has a SIB, a ModRM and RegReg
form.

Change-Id: I86a5cb944eec62451c63281265e6974cd7a08e07
7e399fd3a99ba9c9dbfafdf14f75dd318fa7d454 11-Jun-2014 Chao-ying Fu <chao-ying.fu@intel.com> x86_64: Disable all optimizations and fix bugs

This disables all optimizations and ensures that art tests still pass.

Change-Id: I43217378d6889bb04f4d064f8d53cb3ff4c20aa0
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
8dea81ca9c0201ceaa88086b927a5838a06a3e69 06-Jun-2014 Vladimir Marko <vmarko@google.com> Rewrite use/def masks to support 128 bits.

Reduce LIR memory usage by holding masks by pointers in the
LIR rather than directly and using pre-defined const masks
for the common cases, allocating very few on the arena.
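
A minimal sketch of the idea, assuming a mask split into two 64-bit halves.
The struct layout, the Bit/Intersects helpers and LirMasks are illustrative
assumptions, not the actual ART definitions.

```cpp
#include <cstdint>

// Hypothetical 128-bit resource mask stored as two 64-bit halves.
struct ResourceMask {
  std::uint64_t lo;
  std::uint64_t hi;

  // Mask with exactly one of the 128 resource bits set (bit must be < 128).
  static constexpr ResourceMask Bit(unsigned bit) {
    return bit < 64 ? ResourceMask{std::uint64_t{1} << bit, 0}
                    : ResourceMask{0, std::uint64_t{1} << (bit - 64)};
  }

  // True if the two masks have at least one resource bit in common.
  constexpr bool Intersects(const ResourceMask& other) const {
    return ((lo & other.lo) | (hi & other.hi)) != 0;
  }
};

// Pre-defined constant masks for the common cases; shared, never allocated.
constexpr ResourceMask kEncodeNone{0, 0};
constexpr ResourceMask kEncodeAll{~std::uint64_t{0}, ~std::uint64_t{0}};

// A LIR node then holds two pointers rather than two inline masks.
struct LirMasks {
  const ResourceMask* use_mask;
  const ResourceMask* def_mask;
};
```

Most instructions can point at a shared constant; only unusual masks would
need a fresh allocation.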

Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
0f9b9c508814a62c6e21c6a06cfe4de39b5036c0 09-Jun-2014 Ian Rogers <irogers@google.com> Tidy up x86 assembler and fix byte register encoding.

Also fix reg storage int size issues.
Also fix bad use of byte registers in GenInlinedCas.

Change-Id: Id47424f36f9000e051110553e0b51816910e2fe8
a014776f4474579d4dfc72e3374ba45c6f6e5f35 07-Jun-2014 Chao-ying Fu <chao-ying.fu@intel.com> x86_64: Add long bytecode supports (2/2)

This patch adds implementation of math and complex long bytecodes,
and basic long arithmetic.

Change-Id: I811397d7e0ee8ad0d12b23d32ba58314d479d714
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
58994cdb00b323339bd83828eddc53976048006f 16-May-2014 Dmitry Petrochenko <dmitry.petrochenko@intel.com> x86_64: Hard Float ABI support in QCG

This patch shows our efforts on resolving the ART limitations:
- passing "float"/"double" arguments via FPR
- passing "long" arguments via single GPR, not pair
- passing more than 3 arguments via GPR.

Work done:
- Extended SpecialTargetRegister enum with kARG4, kARG5, fARG4..fARG7.
- Created initial LoadArgRegs/GenDalvikX/FlushIns version in X86Mir2Lir.
- Unlimited number of long/double/float arguments support
- Refactored (v2)

Change-Id: I5deadd320b4341d5b2f50ba6fa4a98031abc3902
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
a0cd2d701f29e0bc6275f1b13c0edfd4ec391879 01-Jun-2014 buzbee <buzbee@google.com> Quick compiler: reference cleanup

For 32-bit targets, object references are 32 bits wide both in
Dalvik virtual registers and in core physical registers. Because of
this, object references and non-floating point values were both
handled as if they had the same register class (kCoreReg).

However, for 64-bit systems, references are 32 bits in Dalvik vregs, but
64 bits in physical registers. Although the same underlying physical
core registers will still be used for object reference and non-float
values, different register class views will be used to represent them.
For example, an object reference in arm64 might be held in x3 at some
point, while the same underlying physical register, w3, would be used
to hold a 32-bit int.

This CL breaks apart the handling of object reference and non-float values
to allow the proper register class (or register view) to be used. A
new register class, kRefReg, is introduced which will map to a 32-bit
core register on 32-bit targets, and 64-bit core registers on 64-bit
targets. From this point on, object references should be allocated
registers in the kRefReg class rather than kCoreReg.
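
As a rough sketch of the rule described above (RegClass here is a reduced
stand-in for the real enum, and RegViewBits is a hypothetical helper), the
width of the register view for a 32-bit Dalvik value depends on both the
register class and the target:

```cpp
// Reduced stand-in for the register classes discussed above.
enum class RegClass { kCoreReg, kRefReg };

// Width in bits of the physical register view used for a value that is
// 32 bits wide in a Dalvik vreg. References widen on 64-bit targets.
constexpr int RegViewBits(RegClass reg_class, bool target_is_64bit) {
  return (reg_class == RegClass::kRefReg && target_is_64bit) ? 64 : 32;
}
```

On arm64 this corresponds to a reference living in x3 while an int uses the
w3 view of the same physical register.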

Change-Id: I6166827daa8a0ea3af326940d56a6a14874f5810
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 03-Jun-2014 Tim Murray <timmurray@google.com> DO NOT MERGE

Merge ART from AOSP to lmp-preview-dev.

Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
a20468c004264592f309a548fc71ba62a69b8742 30-Apr-2014 Dmitry Petrochenko <dmitry.petrochenko@intel.com> x86_64: Support r8-r15, xmm8-xmm15 in assembler

Added REX support. The TARGET_REX_SUPPORT should be used during build.

Change-Id: I82b457ff5085c8192ad873923bd939fbb91022ce
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
96992e8f2eddba05dc38a15cc7d4e705e8db4022 19-May-2014 Dmitry Petrochenko <dmitry.petrochenko@intel.com> x86_64: Add 64-bit version of instructions in asm

Add missing 64-bit versions of instructions.

Change-Id: I8151484d909dff487cb7e521494a0be249a42214
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
fe94578b63380f464c3abd5c156b7b31d068db6c 22-May-2014 Mark Mendell <mark.p.mendell@intel.com> Implement all vector instructions for X86

Add X86 code generation for the vector operations. Added support for
X86 disassembler for the new instructions.

Change-Id: I72b48f5efa3a516a16bb1dd4bdb5c9270a8db53a
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
ed65c5e982705defdb597d94d1aa3f2997239c9b 22-May-2014 Serban Constantinescu <serban.constantinescu@arm.com> AArch64: Enable LONG_* and INT_* opcodes.

This patch fixes some of the issues with LONG and INT opcodes. The patch
has been tested and passes all the dalvik tests except for 018 and 107.

Change-Id: Idd1923ed935ee8236ab0c7e5fa969eaefeea8708
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
b01bf15d18f9b08d77e7a3c6e2897af0e02bf8ca 14-May-2014 buzbee <buzbee@google.com> 64-bit temp register support.

Add a 64-bit temp register allocation path. The recent physical
register handling rework supports multiple views of the same
physical register (or, such as for Arm's float/double regs,
different parts of the same physical register).

This CL adds a 64-bit core register view for 64-bit targets. In
short, each core register will have a 64-bit name, and a 32-bit
name. The different views will be kept in separate register pools,
but aliasing will be tracked. The core temp register allocation
routines will be largely identical - except for 32-bit targets,
which will continue to use pairs of 32-bit core registers for holding
long values.

Change-Id: I8f118e845eac7903ad8b6dcec1952f185023c053
e87f9b5185379c8cf8392d65a63e7bf7e51b97e7 30-Apr-2014 Mark Mendell <mark.p.mendell@intel.com> Allow X86 QBE to be extended

Enhancements and updates to allow X86Mir2LIR Backend to be subclassed
for experimentation. Add virtual in a whole bunch of places, and make
some other changes to get this to work.

Change-Id: I0980a19bc5d5725f91660f98c95f1f51c17ee9b6
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
082833c8d577db0b2bebc100602f31e4e971613e 18-May-2014 buzbee <buzbee@google.com> Quick compiler, out of registers fix

It turns out that the register pool sanity checker was not
working as expected, leaving some inconsistencies unreported.
This could result in "out of registers" failures, as well
as other more subtle problems.

This CL fixes the sanity checker, adds many more checks and cleans
up the previously undetected episodes of insanity.

Cherry-pick of internal change 468162

Change-Id: Id2da97e99105a4c272c5fd256205a94b904ecea8
05d3aeb33683b16837741f9348d6fba9a8432068 18-May-2014 buzbee <buzbee@google.com> Quick compiler, out of registers fix

Fixes b/15024623

It turns out that the register pool sanity checker was not
working as expected, leaving some inconsistencies unreported.
This CL fixes the sanity checker, adds many more checks and cleans
up the previously undetected episodes of insanity.

Change-Id: I4d67db864ca5926a1975db251e7e631b65a86275
d65c51a556e6649db4e18bd083c8fec37607a442 29-Apr-2014 Mark Mendell <mark.p.mendell@intel.com> ART: Add support for constant vector literals

Add in some vector instructions. Implement the ConstVector
instruction, which takes 4 words of data and loads it into
an XMM register.

Initially, only the ConstVector MIR opcode is implemented. Others will
be added after this one goes in.

Change-Id: I5c79bc8b7de9030ef1c213fc8b227debc47f6337
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
b14329f90f725af0f67c45dfcb94933a426d63ce 15-May-2014 Andreas Gampe <agampe@google.com> ART: Fix MonitorExit code on ARM

We do not emit barriers on non-SMP systems. But on ARM, we have
places that need to conditionally execute, which is done through
an IT instruction. The guide of said instruction thus changes
between SMP and non-SMP systems.

To cleanly approach this, change the API so that GenMemBarrier
returns whether it generated an instruction. ARM will have to
query the result and update any dependent IT.

Throw a build system error if TARGET_CPU_SMP is not set.

Fix runtime/Android.mk to work with new multilib host.

Bug: 14989275
Change-Id: I9e611b770e8a1cd4ca19367d7dae0573ec08dc61
9ee801f5308aa3c62ae3bedae2658612762ffb91 12-May-2014 Dmitry Petrochenko <dmitry.petrochenko@intel.com> Add x86_64 code generation support

Utilizes r0..r7 in register allocator, implements spill/unspill
core regs as well as operations with stack pointer.

Change-Id: I973d5a1acb9aa735f6832df3d440185d9e896c67
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
2f244e9faccfcca68af3c5484c397a01a1c3a342 08-May-2014 Andreas Gampe <agampe@google.com> ART: Add more ThreadOffset in Mir2Lir and backends

This duplicates all methods with ThreadOffset parameters, so that
both ThreadOffset<4> and ThreadOffset<8> can be handled. Dynamic
checks against the compilation unit's instruction set determine
which pointer size to use and therefore which methods to call.

Methods with unsupported pointer sizes should fatally fail, as
this indicates an issue during method selection.
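
The dispatch can be pictured roughly as follows. This is a sketch under
assumptions: the InstructionSet enum is reduced to four values, and
EntrypointSlot, DisplacementFor and HelperDisplacement are hypothetical names
used only to show one overload per pointer size plus a dynamic selection.

```cpp
#include <cstddef>

// Simplified stand-in for the compiler's instruction-set enum.
enum class InstructionSet { kArm, kArm64, kX86, kX86_64 };

constexpr bool Is64BitInstructionSet(InstructionSet isa) {
  return isa == InstructionSet::kArm64 || isa == InstructionSet::kX86_64;
}

// Offset into the Thread object, tagged with the pointer size it assumes.
template <std::size_t kPointerSize>
struct ThreadOffset {
  std::size_t bytes;
};

// Hypothetical helper: offset of the n-th pointer-sized entrypoint slot.
template <std::size_t kPointerSize>
constexpr ThreadOffset<kPointerSize> EntrypointSlot(std::size_t n) {
  return ThreadOffset<kPointerSize>{n * kPointerSize};
}

// One overload per pointer size, mirroring the duplicated methods.
constexpr std::size_t DisplacementFor(ThreadOffset<4> offset) { return offset.bytes; }
constexpr std::size_t DisplacementFor(ThreadOffset<8> offset) { return offset.bytes; }

// A dynamic check on the target decides which overload is called.
constexpr std::size_t HelperDisplacement(InstructionSet isa, std::size_t slot) {
  return Is64BitInstructionSet(isa) ? DisplacementFor(EntrypointSlot<8>(slot))
                                    : DisplacementFor(EntrypointSlot<4>(slot));
}
```

Because the pointer size is part of the type, passing a ThreadOffset<8> to a
32-bit-only code path is a compile-time error rather than a silent bug.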

Change-Id: Ifdb445b3732d3dc5e6a220db57374a55e91e1bf6
30adc7383a74eb3cb6db3bf42cea3a5595055ce1 10-May-2014 buzbee <buzbee@google.com> Quick compiler: Fix liveness tracking

Rework temp register liveness tracking to play nicely with aliased
physical registers, and re-enable liveness tracking optimization.

Add a pair of x86 utility routines that act like UpdateLoc(),
but only show in-register live temps if they are of the expected
register class.

Change-Id: I92779e0da2554689103e7488025be281f1a58989
674744e635ddbdfb311fbd25b5a27356560d30c3 24-Apr-2014 Vladimir Marko <vmarko@google.com> Use atomic load/store for volatile IGET/IPUT/SGET/SPUT.

Bug: 14112919
Change-Id: I79316f438dd3adea9b2653ffc968af83671ad282
3bf7c60a86d49bf8c05c5d2ac5ca8e9f80bd9824 07-May-2014 Vladimir Marko <vmarko@google.com> Cleanup ARM load/store wide and remove unused param s_reg.

Use a single LDRD/VLDR instruction for wide load/store on
ARM, adjust the base pointer if needed. Remove unused
parameter s_reg from LoadBaseDisp(), LoadBaseIndexedDisp()
and StoreBaseIndexedDisp() on all architectures.

Change-Id: I25a9a42d523a68addbc11abe44ddc55a4401df98
455759b5702b9435b91d1b4dada22c4cce7cae3c 06-May-2014 Vladimir Marko <vmarko@google.com> Remove LoadBaseDispWide and StoreBaseDispWide.

Just pass k64 or kDouble to non-wide versions.

Change-Id: I000619c3b78d3a71db42edc747c8a0ba1ee229be
2637f2e9bf4fc5591994b7c0158afead88321a7c 30-Apr-2014 Mark Mendell <mark.p.mendell@intel.com> ART: Update and correct assemble_x86.cc

Correct the definition of some X86 instructions in the file.
Add some new instructions and the code to emit them properly.

Added EmitMemCond()

Change-Id: Icf4b70236cf0ca857c85dcb3edb218f26be458eb
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
091cc408e9dc87e60fb64c61e186bea568fc3d3a 31-Mar-2014 buzbee <buzbee@google.com> Quick compiler: allocate doubles as doubles

Significant refactoring of register handling to unify usage across
all targets & 32/64 backends.

Reworked RegStorage encoding to allow expanded use of
x86 xmm registers; removed vector registers as a separate
register type. Reworked RegisterInfo to describe aliased
physical registers. Eliminated quite a bit of target-specific code
and generalized common code.

Use of RegStorage instead of int for registers now propagated down
to the NewLIRx() level. In future CLs, the NewLIRx() routines will
be replaced with versions that are explicit about what kind of
operand they expect (RegStorage, displacement, etc.). The goal
is to eventually use RegStorage all the way to the assembly phase.

TBD: MIPS needs verification.
TBD: Re-enable liveness tracking.

Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
7a11ab09f93f54b1c07c0bf38dd65ed322e86bc6 29-Apr-2014 buzbee <buzbee@google.com> Quick compiler: debugging assists

A few minor assists to ease A/B debugging in the Quick
compiler:
1. To save time, the assemblers for some targets only
update the object code offsets on instructions involved with
pc-relative fixups. We add code to fix up all offsets when
doing a verbose codegen listing.
2. Temp registers are normally allocated in a round-robin
fashion. When disabling liveness tracking, we now reset the
round-robin pool to 0 on each instruction boundary. This makes
it easier to spot real codegen differences.
3. Self-register copies were previously emitted, but
marked as nops. Minor change to avoid generating them in the
first place and reduce clutter.

Change-Id: I7954bba3b9f16ee690d663be510eac7034c93723
3a74d15ccc9a902874473ac9632e568b19b91b1c 22-Apr-2014 Mingyao Yang <mingyao@google.com> Delete throw launchpads.

Bug: 13170824

Change-Id: I9d5834f5a66f5eb00f2ac80774e8c27dea99949e
80365d9bb947edef0eae0bfe62b9f7a239416e6b 18-Apr-2014 Mingyao Yang <mingyao@google.com> Revert "Revert "Use LIRSlowPath for throwing ArrayOutOfBoundsException.""

This adds back the use of LIRSlowPath for ArrayIndexOutOfBoundsException
and fixes the host test crash.

Change-Id: Idbb602f4bb2c5ce59233feb480a0ff1b216e4887
7fff544c38f0dec3a213236bb785c3ca13d21a0f 18-Apr-2014 Brian Carlstrom <bdc@google.com> Revert "Use LIRSlowPath for throwing ArrayOutOfBoundsException."

This reverts commit 9d46314a309aff327f9913789b5f61200c162609.
9d46314a309aff327f9913789b5f61200c162609 18-Apr-2014 Mingyao Yang <mingyao@google.com> Use LIRSlowPath for throwing ArrayOutOfBoundsException.

Get rid of launchpads for throwing ArrayOutOfBoundsException
and use LIRSlowPath instead.

Bug: 13170824
Change-Id: I0e27f7a261a6a7fb5c0645e6113a957e098f699e
e643a179cf5585ba6bafdd4fa51730d9f50c06f6 08-Apr-2014 Mingyao Yang <mingyao@google.com> Use LIRSlowPath for throwing NPE.

Get rid of launchpads for throwing NPE and use LIRSlowPath instead.
Also clean up some code of using LIRSlowPath for checking div
by zero.

Bug: 13170824

Change-Id: I0c20a49c39feff3eb1f147755e557d9bc0ff15bb
d6ed642458c8820e1beca72f3d7b5f0be4a4b64b 10-Apr-2014 Dave Allison <dallison@google.com> Revert "Revert "Revert "Use trampolines for calls to helpers"""

This reverts commit f9487c039efb4112616d438593a2ab02792e0304.

Change-Id: Id48a4aae4ecce73db468587967968a3f7618b700
f9487c039efb4112616d438593a2ab02792e0304 09-Apr-2014 Dave Allison <dallison@google.com> Revert "Revert "Use trampolines for calls to helpers""

This reverts commit 081f73e888b3c246cf7635db37b7f1105cf1a2ff.

Change-Id: Ibd777f8ce73cf8ed6c4cb81d50bf6437ac28cb61

Conflicts:
compiler/dex/quick/mir_to_lir.h
081f73e888b3c246cf7635db37b7f1105cf1a2ff 07-Apr-2014 Dave Allison <dallison@google.com> Revert "Use trampolines for calls to helpers"

This reverts commit 754ddad084ccb610d0cf486f6131bdc69bae5bc6.

Change-Id: Icd979adee1d8d781b40a5e75daf3719444cb72e8
754ddad084ccb610d0cf486f6131bdc69bae5bc6 19-Feb-2014 Dave Allison <dallison@google.com> Use trampolines for calls to helpers

This is an ARM specific optimization to the compiler
that uses trampoline islands to make calls to runtime
helper functions. The intention is to reduce the size
of the generated code (by 2 bytes per call) without
affecting performance.

By default this is on when generating an OAT file. It is
off when compiling to memory.

To switch this off in dex2oat, use the command line option:
--no-helper-trampolines

Enhances disassembler to print the trampoline entry on the
BL instruction like this:

0xb6a850c0: f7ffff9e bl -196 (0xb6a85000) ; pTestSuspend

Bug: 12607709
Change-Id: I9202bdb7cf21252ad807bd48701f1f6ce8e3d0fe
3da67a558f1fd3d8a157d8044d521753f3f99ac8 03-Apr-2014 Dave Allison <dallison@google.com> Add OpEndIT() for marking the end of OpIT blocks

In ARM we need to prevent code motion to the inside of an
IT block. This was done using a GenBarrier() to mark the end, but
it wasn't obvious that this is what was happening. This CL adds
an explicit OpEndIT() that takes the LIR of the OpIT for future
checks.

Bug: 13751744
Change-Id: If41d2adea1f43f11ebb3b72906bd308252ce3d01
dd7624d2b9e599d57762d12031b10b89defc9807 15-Mar-2014 Ian Rogers <irogers@google.com> Allow mixing of thread offsets between 32 and 64bit architectures.

Begin a fuller implementation of x86-64 REX prefixes.
Doesn't implement 64bit thread offset support for the JNI compiler.

Change-Id: If9af2f08a1833c21ddb4b4077f9b03add1a05147
e2143c0a4af68c08e811885eb2f3ea5bfdb21ab6 28-Mar-2014 Ian Rogers <irogers@google.com> Revert "Revert "Optimize easy multiply and easy div remainder.""

This reverts commit 3654a6f50a948ead89627f398aaf86a2c2db0088.
Remove the part of the change that confused !is_div with being multiply rather
than implying remainder.

Change-Id: I202610069c69351259a320e8852543cbed4c3b3e
3441512d61ac192c1bf0b9b1eb696d5a8a8d677e 28-Mar-2014 Brian Carlstrom <bdc@google.com> Revert "Optimize easy multiply and easy div remainder."

This reverts commit 08df4b3da75366e5db37e696eaa7e855cba01deb.

(cherry picked from commit 3654a6f50a948ead89627f398aaf86a2c2db0088)

Change-Id: If8befd7c7135b9dfe3d3e9111768aba89aaa0863
3654a6f50a948ead89627f398aaf86a2c2db0088 28-Mar-2014 Brian Carlstrom <bdc@google.com> Revert "Optimize easy multiply and easy div remainder."

This reverts commit 08df4b3da75366e5db37e696eaa7e855cba01deb.
08df4b3da75366e5db37e696eaa7e855cba01deb 25-Mar-2014 Zheng Xu <zheng.xu@arm.com> Optimize easy multiply and easy div remainder.

Update OpRegRegShift and OpRegRegRegShift to use RegStorage parameters.
Add special cases for *0 and *1. Add more easy multiply special cases for
Arm.
Reuse easy multiply in SmallLiteralDivRem() to support remainder cases.
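
To illustrate what "easy multiply" special-casing can look like, here is a
hedged sketch that classifies a literal multiplier. Only the *0 and *1 cases
come from this message; the shift-based cases are ordinary strength reduction
assumed for illustration, and MulStrategy/ClassifyMultiply are hypothetical
names.

```cpp
#include <cstdint>

enum class MulStrategy { kLoadZero, kCopy, kShift, kShiftAdd, kShiftSub, kGeneric };

constexpr bool IsPowerOfTwo(std::uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Picks a cheap lowering for x * lit when one exists.
constexpr MulStrategy ClassifyMultiply(std::int32_t lit) {
  if (lit == 0) return MulStrategy::kLoadZero;  // result is always 0
  if (lit == 1) return MulStrategy::kCopy;      // result is x itself
  if (lit < 0) return MulStrategy::kGeneric;    // keep the sketch simple
  const std::uint32_t u = static_cast<std::uint32_t>(lit);
  if (IsPowerOfTwo(u)) return MulStrategy::kShift;         // x << n
  if (IsPowerOfTwo(u - 1)) return MulStrategy::kShiftAdd;  // (x << n) + x
  if (IsPowerOfTwo(u + 1)) return MulStrategy::kShiftSub;  // (x << n) - x
  return MulStrategy::kGeneric;                            // real multiply
}
```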

Change-Id: Icd76a993d3ac8d4988e9653c19eab4efca14fad0
2700f7e1edbcd2518f4978e4cd0e05a4149f91b6 07-Mar-2014 buzbee <buzbee@google.com> Continuing register cleanup

Ready for review.

Continue the process of using RegStorage rather than
ints to hold register value in the top layers of codegen.
Given the huge number of changes in this CL, I've attempted
to minimize the number of actual logic changes. With this
CL, the use of ints for registers has largely been eliminated
except in the lowest utility levels. "Wide" utility routines
have been updated to take a single RegStorage rather than
a pair of ints representing low and high registers.

Upcoming CLs will be smaller and more targeted. My expectations:
o Allocate float double registers as a single double rather than
a pair of float single registers.
o Refactor to push code which assumes long and double Dalvik
values are held in a pair of register to the target dependent
layer.
o Clean-up of the xxx_mir.h files to reduce the amount of #defines
for registers. May also do a register renumbering to bring all
of our targets' register naming more consistent. Possibly
introduce a target-independent float/non-float test at the
RegStorage level.

Change-Id: I646de7392bdec94595dd2c6f76e0f1c4331096ff
99ad7230ccaace93bf323dea9790f35fe991a4a2 26-Feb-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Relaxed memory barriers for x86

X86 provides stronger memory guarantees and thus the memory barriers can be
optimized. This patch ensures that all memory barriers for x86 are treated
as scheduling barriers. And in cases where a barrier is needed (StoreLoad case),
an mfence is used.
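
The rule above amounts to a one-line decision. The enum below is a simplified
stand-in for the compiler's barrier kinds and NeedsMfenceOnX86 is a
hypothetical helper name:

```cpp
// Simplified stand-in for the four classic barrier kinds.
enum class MemBarrierKind { kLoadLoad, kLoadStore, kStoreStore, kStoreLoad };

// On x86 only a StoreLoad barrier needs a real instruction (mfence); the
// other kinds only constrain the compiler's own instruction scheduling.
constexpr bool NeedsMfenceOnX86(MemBarrierKind kind) {
  return kind == MemBarrierKind::kStoreLoad;
}
```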

Change-Id: I13d02bf3f152083ba9f358052aedb583b0d48640
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
b373e091eac39b1a79c11f2dcbd610af01e9e8a9 21-Feb-2014 Dave Allison <dallison@google.com> Implicit null/suspend checks (oat version bump)

This adds the ability to use SEGV signals
to throw NullPointerException exceptions from Java code rather
than having the compiler generate explicit comparisons and
branches. It does this by using sigaction to trap SIGSEGV and when triggered
makes sure it's in compiled code and if so, sets the return
address to the entry point to throw the exception.

It also uses this signal mechanism to determine whether to check
for thread suspension. Instead of the compiler generating calls
to a function to check for threads being suspended, the compiler
will now load indirect via an address in the TLS area. To trigger
a suspend, the contents of this address are changed from something
valid to 0. A SIGSEGV will occur and the handler will check
for a valid instruction pattern before invoking the thread
suspension check code.

If a user program traps SIGSEGV it will prevent our signal handler
from working. This will cause a failure in the runtime.

There are two signal handlers at present. You can control them
individually using the flags -implicit-checks: on the runtime
command line. This takes a string parameter, a comma
separated set of strings. Each can be one of:

none switch off
null null pointer checks
suspend suspend checks
all all checks

So to switch only suspend checks on, pass:
-implicit-checks:suspend
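
For illustration, the value after the colon could be interpreted as below.
This is a sketch, not the runtime's actual parser: ImplicitChecks,
TokenEquals and ParseImplicitChecks are hypothetical names, and treating an
unknown token as invalid is an assumption.

```cpp
// Result of parsing the comma-separated option value.
struct ImplicitChecks {
  bool null_checks;
  bool suspend_checks;
  bool valid;  // false if an unknown token was seen
};

// Compares the character range [begin, end) with a NUL-terminated word.
constexpr bool TokenEquals(const char* begin, const char* end, const char* word) {
  while (begin != end && *word != '\0') {
    if (*begin != *word) return false;
    ++begin;
    ++word;
  }
  return begin == end && *word == '\0';
}

// Parses e.g. "null,suspend"; tokens are applied left to right.
constexpr ImplicitChecks ParseImplicitChecks(const char* spec) {
  ImplicitChecks result{false, false, true};
  const char* token = spec;
  for (const char* p = spec;; ++p) {
    if (*p == ',' || *p == '\0') {
      if (TokenEquals(token, p, "none")) {
        result.null_checks = false;
        result.suspend_checks = false;
      } else if (TokenEquals(token, p, "null")) {
        result.null_checks = true;
      } else if (TokenEquals(token, p, "suspend")) {
        result.suspend_checks = true;
      } else if (TokenEquals(token, p, "all")) {
        result.null_checks = true;
        result.suspend_checks = true;
      } else {
        result.valid = false;
      }
      if (*p == '\0') break;
      token = p + 1;
    }
  }
  return result;
}
```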

There is also -explicit-checks to provide the reverse once
we change the default.

For dalvikvm, pass --runtime-arg -implicit-checks:foo,bar

The default is -implicit-checks:none

There is also a property 'dalvik.vm.implicit_checks' whose value is the same
string as the command option. The default is 'none'. For example to switch on
null checks using the option:

setprop dalvik.vm.implicit_checks null

It only works for ARM right now.

Bumps OAT version number due to change to Thread offsets.

Bug: 13121132
Change-Id: If743849138162f3c7c44a523247e413785677370
49161cef10a308aedada18e9aa742498d6e6c8c7 12-Mar-2014 Jeff Hao <jeffhao@google.com> Allow patching between dex files in the boot classpath.

Change-Id: I53f219a5382d0fcd580e96e50025fdad4fc399df
00e1ec6581b5b7b46ca4c314c2854e9caa647dd2 28-Feb-2014 Bill Buzbee <buzbee@android.com> Revert "Revert "Rework Quick compiler's register handling""

This reverts commit 86ec520fc8b696ed6f164d7b756009ecd6e4aace.

Ready. Fixed the original typo, plus some mechanical changes
for rebasing.

Still needs additional testing, but the problem with the original
CL appears to have been a typo in the definition of the x86
double return template RegLocation.

Change-Id: I828c721f91d9b2546ef008c6ea81f40756305891
ae9fd93c39a341e2dffe15c61cc7d9e841fa92c4 11-Feb-2014 Mark Mendell <mark.p.mendell@intel.com> Tell GDB about Quick ART generated code

This is actually a lot of work. To do this, we need:
.debug_info
.debug_abbrev
.debug_frame
.debug_str

These are generated into the OAT file by OatWriter and ElfWriterQuick.

Since the Quick ART runtime doesn't use dlopen to load the OAT files,
GDB can't find this information. Use the alternate GDB JIT interface,
which can be invoked at runtime. To use this interface, an ELF image
needs to be built in memory. Read the information from the OAT file,
fixup the addresses to point to the real locations, add a symbol table
to hold the .text symbol, and then let GDB know about the information,
which will be read from the runtime address space.

This is quite primitive now, and could be cleaned up considerably. It
probably needs symbol table entries for the methods, and descriptions of
parameters and return types.

Currently only supported for X86.

This defaults to enabled for debug builds. Added dex2oat --gen-gdb-info
and --no-gen-gdb-info flags to override.

Change-Id: I4d18b2370f6dfaa00c8cc1925f10717be3bd1a62
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
86ec520fc8b696ed6f164d7b756009ecd6e4aace 26-Feb-2014 Bill Buzbee <buzbee@android.com> Revert "Rework Quick compiler's register handling"

This reverts commit 2c1ed456dcdb027d097825dd98dbe48c71599b6c.

Change-Id: If88d69ba88e0af0b407ff2240566d7e4545d8a99
2c1ed456dcdb027d097825dd98dbe48c71599b6c 20-Feb-2014 buzbee <buzbee@google.com> Rework Quick compiler's register handling

For historical reasons, the Quick backend found it convenient
to consider all 64-bit Dalvik values held in registers
to be contained in a pair of 32-bit registers. Though this
worked well for ARM (with double-precision registers also
treated as a pair of 32-bit single-precision registers) it doesn't
play well with other targets. And, it is somewhat problematic
for 64-bit architectures.

This is the first of several CLs that will rework the way the
Quick backend deals with physical registers. The goal is to
eliminate the "64-bit value backed with 32-bit register pair"
requirement from the target-independent portions of the backend
and support 64-bit registers throughout.

The key RegLocation struct, which describes the location of
Dalvik virtual register & register pairs, previously contained
fields for high and low physical registers. The low_reg and
high_reg fields are being replaced with a new type: RegStorage.
There will be a single instance of RegStorage for each RegLocation.
Note that RegStorage does not increase the space used. It is
16 bits wide, the same as the sum of the 8-bit low_reg and
high_reg fields.

At a target-independent level, it will describe whether the physical
register storage associated with the Dalvik value is a single 32
bit, single 64 bit, pair of 32 bit or vector. The actual register
number encoding is left to the target-dependent code layer.
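
To make the size argument concrete, here is one way a 16-bit descriptor with
these four shapes could be packed. The bit layout, the Shape enum and the
RegStorageSketch class are purely hypothetical; the real RegStorage encoding
is not spelled out in this message.

```cpp
#include <cstdint>

// Storage shapes named in the description above.
enum class Shape : std::uint16_t {
  kInvalid = 0, k32BitSolo = 1, k64BitSolo = 2, k64BitPair = 3, kVector = 4
};

// Hypothetical layout: [15:13] shape, [9:5] high reg, [4:0] low reg.
class RegStorageSketch {
 public:
  constexpr RegStorageSketch(Shape shape, unsigned low, unsigned high = 0)
      : bits_(static_cast<std::uint16_t>((static_cast<unsigned>(shape) << 13) |
                                         ((high & 0x1Fu) << 5) | (low & 0x1Fu))) {}

  constexpr Shape GetShape() const { return static_cast<Shape>(bits_ >> 13); }
  constexpr unsigned GetLowReg() const { return bits_ & 0x1Fu; }
  constexpr unsigned GetHighReg() const { return (bits_ >> 5) & 0x1Fu; }
  constexpr bool IsPair() const { return GetShape() == Shape::k64BitPair; }

 private:
  std::uint16_t bits_;
};

// Same footprint as the two 8-bit low_reg/high_reg fields it replaces.
static_assert(sizeof(RegStorageSketch) == 2, "descriptor must stay 16 bits");
```

What the register numbers mean stays a matter for the target-dependent layer;
the target-independent code only needs the shape.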

Because physical register handling is pervasive throughout the
backend, this restructuring necessarily involves large CLs with
lots of changes. I'm going to roll these out in stages, and
attempt to segregate the CLs with largely mechanical changes from
those which restructure or rework the logic.

This CL is of the mechanical change variety - it replaces low_reg
and high_reg from RegLocation and introduces RegStorage. It also
includes a lot of new code (such as many calls to GetReg())
that should go away in upcoming CLs.

The tentative plan for the subsequent CLs is:

o Rework standard register utilities such as AllocReg() and
FreeReg() to use RegStorage instead of ints.
o Rework the target-independent GenXXX, OpXXX, LoadValue,
StoreValue, etc. routines to take RegStorage rather than
int register encodings.
o Take advantage of the vector representation and eliminate
the current vector field in RegLocation.
o Replace the "wide" variants of codegen utilities that take
low_reg/high_reg pairs with versions that use RegStorage.
o Add 64-bit register target independent codegen utilities
where possible, and where not virtualize with 32-bit general
register and 64-bit general register variants in the target
dependent layer.
o Expand/rework the LIR def/use flags to allow for more registers
(currently, we lose out on 16 MIPS floating point regs as
well as ARM's D16..D31 for lack of space in the masks).
o [Possibly] move the float/non-float determination of a register
from the target-dependent encoding to RegStorage. In other
words, replace IsFpReg(register_encoding_bits).

At the end of the day, all code in the target independent layer
should be using RegStorage, as should much of the target dependent
layer. Ideally, we won't be using the physical register number
encoding extracted from RegStorage (i.e. GetReg()) until the
NewLIRx() layer.

Change-Id: Idc5c741478f720bdd1d7123b94e4288be5ce52cb
4028a6c83a339036864999fdfd2855b012a9f1a7 20-Feb-2014 Mark Mendell <mark.p.mendell@intel.com> Inline x86 String.indexOf

Take advantage of the presence of a constant search char or start index
to tune the generated code.

Change-Id: I0adcf184fb91b899a95aa4d8ef044a14deb51d88
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
3bc01748ef1c3e43361bdf520947a9d656658bf8 06-Feb-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> GenSpecialCase support for x86

Moved GenSpecialCase from being ARM specific to common code to allow
it to be used by x86 quick as well.

Change-Id: I728733e8f4c4da99af6091ef77e5c76ae0fee850
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
614c2b4e219631e8c190fd9fd5d4d9cd343434e1 29-Jan-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Support to generate inline long to FP bytecodes for x86

long-to-float and long-to-double are now generated inline instead of calling
a helper routine. The conversion is done by using x87.

Change-Id: I196e526afec1be212898baceca8527549c3655b6
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
6607d97166984ce578817269f9775c15b9044190 10-Feb-2014 Mark Mendell <mark.p.mendell@intel.com> Tweak Mir2Lir::GenInstanceofCallingHelper for X86

Make this virtual, and split out the X86 logic. Take advantage of SETcc
instruction for X86.

I don't think I can do much more due to need to preserve arguments for
the calls.

Change-Id: I10e3eaa61b61ceac384267e3078bb6f75c37cee4
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
55d0eac918321e0525f6e6491f36a80977e0d416 06-Feb-2014 Mark Mendell <mark.p.mendell@intel.com> Support Direct Method/Type access for X86

Thumb generates code to optimize calls to methods within core.oat.
Implement this for X86 as well, but take advantage of mov with 32 bit
immediate and call relative with 32 bit immediate.

Fix some incorrect return locations for long inlines.

Change-Id: I1907bdfc7574f3d0aa76c7fad13dc537acdf1ed3
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
2c498d1f28e62e81fbdb477ff93ca7454e7493d7 30-Jan-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Specializing x86 range argument copying

The ARM implementation of range argument copying was specialized in some cases.
For all other architectures, it would fall back to generating memcpy. This patch
updates the x86 implementation so it does not call memcpy and instead generates
loads and stores, favoring movement of 128-bit chunks.

Change-Id: Ic891e5609a4b0e81a47c29cc5a9b301bd10a1933
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
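
Illustrative sketch (not part of the commit): one way to plan a copy that
favors 128-bit chunks. It assumes each argument occupies a 4-byte slot and
ignores alignment, which the real code generator may have to consider. The
name plan_copy is hypothetical.

```python
def plan_copy(num_args, slot_size=4, wide_size=16):
    """Return a list of (byte_offset, byte_count) moves covering all slots.

    Greedy plan: use 16-byte (128-bit) moves while enough bytes remain,
    then finish with single-slot moves.
    """
    remaining = num_args * slot_size
    offset = 0
    moves = []
    while remaining >= wide_size:
        moves.append((offset, wide_size))
        offset += wide_size
        remaining -= wide_size
    while remaining > 0:
        moves.append((offset, slot_size))
        offset += slot_size
        remaining -= slot_size
    return moves
```

For six arguments this plans one 16-byte move followed by two 4-byte moves.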
67c39c4aefca23cb136157b889c09ee200b3dec6 01-Feb-2014 Mark Mendell <mark.p.mendell@intel.com> Support Literal pools for x86

They are being used to store double constants, which are very
expensive to generate into XMM registers. Uses the 'Compiler
Temporary' support just added. The MIR instructions are scanned for
a reference to a double constant, a packed switch or a FillArray.
These all need the address of the start of the method, since 32
bit x86 doesn't have a PC-relative addressing mode.

If needed, a compiler temporary is allocated, and the address of
the base of the method is calculated, and stored. Later uses can
just refer to the saved value.

Trickiness comes when generating the load from the literal area,
as the offset is unknown before final assembly. Assume a 32 bit
displacement is needed, and fix this if it wasn't necessary.

Use LoadValue to load the 'base of method' pointer. Fix an incorrect
test in GetRegLocation.

Change-Id: I53ffaa725dabc370e9820c4e0e78664ede3563e6
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
feb2b4e2d1c6538777bb80b60f3a247537b6221d 28-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Redo x86 int arithmetic

Make Mir2Lir::GenArithOpInt virtual, and implement an x86 version of it
to allow use of memory operands and knowledge of the fact that x86 has
(mostly) two operand instructions. Remove x86 specific code from the
generic version.

Add StoreFinalValue (matches StoreFinalValueWide) to handle the non-wide
cases. Add some x86 helper routines to simplify generation.

Change-Id: I6c13689c6da981f2570ab5af7a97f9816108b7ae
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
df8ee2ea9908db3dde463fed68391b0040517653 28-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> x86 updates GenInlinedUnsafePut/GenInstanceofFinal

Allow x86 to inline GenInlinedUnsafePut by freeing up a temporary
register early. Make an x86 specific version of GenInstanceofFinal that
uses compare to memory and a setCC instruction.

Change-Id: I67788d7ae83776b0b9069fe4b379452190774992
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
766e9295d2c34cd1846d81610c9045b5d5093ddd 27-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Improve GenConstString, GenS{get,put} for x86

Rewrite GenConstString for x86 to skip calling ResolveString when the
string is already resolved. Also try to avoid a register copy if the
Method* is in a promoted register.

Implement the TODO for GenS{get,put} to use compare to memory for x86 by
adding a new codegen function to compare directly to memory. Implement
a default implementation that uses a temporary register for RISC
architectures.

Change-Id: Ie163cca3d3d841aa10c50dc6592ec30af7a7cbc9
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
4708dcd68eebf1173aef1097dad8ab13466059aa 22-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Improve x86 long multiply and shifts

Generate inline code for long shifts by constants and do long
multiplication inline. Convert multiplication by a constant to a
shift when we can. Fix some x86 assembler problems and add the new
instructions that were needed (64 bit shifts).

Change-Id: I6237a31c36159096e399d40d01eb6bfa22ac2772
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
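
Illustrative sketch (not part of the commit): turning a multiplication by a
power-of-two constant into a shift, with Java long (64-bit two's complement)
wraparound. The helper names are hypothetical, and the commit may handle
more constants than this sketch does.

```python
MASK64 = (1 << 64) - 1


def to_signed64(x):
    """Wrap an arbitrary Python int to a signed 64-bit value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def mul_long_by_const(value, const):
    """Multiply with long wraparound, using a shift for power-of-two constants."""
    if const > 0 and const & (const - 1) == 0:
        shift = const.bit_length() - 1   # log2(const)
        return to_signed64(value << shift)
    return to_signed64(value * const)    # general case: plain multiply
```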
2bf31e67694da24a19fc1f328285cebb1a4b9964 23-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Improve x86 long divide

Implement inline division for literal and variable divisors. Use the
general case for dividing by a literal by using a double length multiply
by the appropriate constant with fixups. This is the Hacker's Delight
algorithm.

Change-Id: I563c250f99d89fca5ff8bcbf13de74de13815cfe
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
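
Illustrative sketch (not part of the commit): division by a literal using a
double-length multiply by a "magic" constant plus fixups, in the style of
the Hacker's Delight algorithm. This simplified version only handles
divisors >= 2 and keeps the multiplier as an unbounded Python integer, so it
skips the extra fixup real code needs when the multiplier does not fit in a
signed word. The function names are hypothetical.

```python
def magic_for_divisor(d, bits=64):
    """Return (m, p) with n / d == floor(m * n / 2**p), plus 1 when n < 0."""
    if d < 2:
        raise ValueError("this sketch only handles divisors >= 2")
    half = 1 << (bits - 1)
    nc = half - (half % d) - 1           # largest in-range n with n % d == d - 1
    p = bits
    while (1 << p) <= nc * (d - (1 << p) % d):
        p += 1
    m = ((1 << p) + d - (1 << p) % d) // d
    return m, p


def div_by_literal(n, d, bits=64):
    """Divide signed n by the literal d, truncating toward zero."""
    m, p = magic_for_divisor(d, bits)
    hi = (m * n) >> bits                 # high word of the double-length product
    q = hi >> (p - bits)                 # arithmetic post-shift
    if n < 0:
        q += 1                           # fixup so the result rounds toward zero
    return q
```

With bits=32 the sketch reproduces the well-known multipliers 0x55555556
for 3 and 0x92492493 for 7.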
e02d48fb24747f90fd893e1c3572bb3c500afced 15-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Optimize x86 long arithmetic

Be smarter about taking advantage of a constant operand for x86 long
add/sub/and/or/xor. Using instructions with immediates and generating
results directly into memory reduces the number of temporary registers
and avoids hardcoded register usage.

Also rewrite the existing non-const x86 arithmetic to avoid fixed
register use, and use the fact that x86 instructions are two operand.
Pass the opcode to the XXXLong() routines to easily detect two operand
DEX opcodes.

Add a new StoreFinalValueWide() routine, which is similar to StoreValueWide,
but doesn't do an EvalLoc to allocate registers. The src operand must
already be in registers, and it just updates the dest location, and
calls the right live/dirty routines to get the src into the dest
properly.

Change-Id: Iefc16e7bc2236a73dc780d3d5137ae8343171f62
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
d61ba4ba6fcde666adb5d5c81b1c32f0534fb2c8 13-Jan-2014 Bill Buzbee <buzbee@android.com> Revert "Revert "Better support for x86 XMM registers""

This reverts commit 8ff67e3338952c70ccf3b609559bf8cc0f379cfd.

Fix applied to loc.fp usage.

Change-Id: I1eb3005392544fcf30c595923ed25bcee2dc4859
8ff67e3338952c70ccf3b609559bf8cc0f379cfd 11-Jan-2014 Bill Buzbee <buzbee@android.com> Revert "Better support for x86 XMM registers"

The invalid usage of loc.fp must be corrected before this change can be submitted.

This reverts commit 766a5e5940b469ab40e52770862c81cfec1d835b.

Change-Id: I1173a9bf829da89cccd9c2898f5e11164987a22b
766a5e5940b469ab40e52770862c81cfec1d835b 10-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Better support for x86 XMM registers

Currently, ART Quick mode assumes that a double FP register is composed
of two single consecutive FP registers. This is true for ARM and MIPS,
but not x86. This means that only half of the 8 XMM registers are
available for use by x86 doubles.

This patch breaks the assumption that a wide FP RegisterLocation must be
a paired set of FP registers. This is done by making some routines in
common code virtual and overriding them in the X86Mir2Lir class. For
these wide fp locations, the high register is set to the same value as
the low register, in order to minimize changes to common code. In a
couple of places, the common code checks for this case.

The changes are also supposed to allow the possibility of using the XMM
registers for vector operations, but that support is still WIP.

Change-Id: Ic6ef24ea764991c6f4d9fb88d483a619f5a468cb
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
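
Illustrative sketch (not part of the commit): why the paired-register
assumption halves the number of doubles that fit in the 8 XMM registers, and
how a wide FP location can instead set its high register equal to its low
register. WideFpLoc and the allocator functions are hypothetical stand-ins,
not the ART data structures.

```python
from dataclasses import dataclass

NUM_XMM = 8  # XMM0..XMM7, as stated in the commit message


@dataclass
class WideFpLoc:
    low_reg: int
    high_reg: int


def alloc_paired(free_regs):
    """Old assumption: a double occupies an even/odd pair of FP registers."""
    for reg in sorted(free_regs):
        if reg % 2 == 0 and reg + 1 in free_regs:
            free_regs.difference_update({reg, reg + 1})
            return WideFpLoc(reg, reg + 1)
    return None


def alloc_single(free_regs):
    """x86 after the patch: one XMM register holds the double, high == low."""
    if not free_regs:
        return None
    reg = min(free_regs)
    free_regs.remove(reg)
    return WideFpLoc(reg, reg)


def count_doubles(alloc):
    """Count how many doubles can be live at once under an allocator."""
    free_regs = set(range(NUM_XMM))
    count = 0
    while alloc(free_regs) is not None:
        count += 1
    return count
```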
bd288c2c1206bc99fafebfb9120a83f13cf9723b 21-Dec-2013 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Add conditional move support to x86 and allow GenMinMax to use it

X86 supports conditional moves, which are useful for reducing branchiness.
This patch adds support to the x86 backend to generate conditional reg
to reg operations. Both encoder and decoder support was added for cmov.

The x86 version of GenMinMax, used for generating inlined versions of
Math.min/max, has been updated to make use of the conditional move support.

Change-Id: I92c5428e40aa8ff88bd3071619957ac3130efae7
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
412d4f833d8c6b43ef9725cda15bc97012d9ecdf 18-Dec-2013 Mark Mendell <mark.p.mendell@intel.com> Improve x86 Fused long compare to literal

Generate better x86 code for the fused long comparison/branch
if one of the arguments is a literal. Use the algorithm from ARM,
tweaked for x86.

Change-Id: I872ba5dfaeeaaba6beff756d2eb6f9c6d018ce3e
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
343adb52d3f031b6b5e005ff51f9cb04df219b21 18-Dec-2013 Mark Mendell <mark.p.mendell@intel.com> Enhance GenArrayGet, GenArrayPut for x86

As pointed out by Ian Rogers, the x86 versions didn't optimize
handling of constant index expressions. Added that support,
simplified checking of constant indices, and removed the use of
a temporary register for the 'wide' cases by using x86 scaled
addressing mode.

Change-Id: I82174e4e3674752d00d7c4730496f59d69f5f173
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
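
Illustrative sketch (not part of the commit): with a constant index, the
element address can be folded into a single displacement from the array
base, and a negative constant index is known to be out of bounds without
looking at the array length. The names and the data_offset parameter are
hypothetical.

```python
def element_displacement(data_offset, index, elem_size):
    """Fold a constant index into a displacement from the array base."""
    if elem_size not in (1, 2, 4, 8):
        raise ValueError("unsupported element size")
    return data_offset + index * elem_size


def always_out_of_bounds(index):
    """A negative constant index fails the bounds check for any length."""
    return index < 0
```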
5816ed48bc339c983b40dc493e96b97821ce7966 27-Nov-2013 Vladimir Marko <vmarko@google.com> Detect special methods at the end of verification.

This moves special method handling to method inliner
and prepares for eventual inlining of these methods.

Change-Id: I51c51b940fb7bc714e33135cd61be69467861352
31c2aac7137b69d5622eea09597500731fbee2ef 09-Dec-2013 Vladimir Marko <vmarko@google.com> Rename ClobberCalleeSave to *Caller*, fix it for x86.

Change-Id: I6a72703a11985e2753fa9b4520c375a164301433
057c74a3a2d50d1247d4e6472763ca6f59060762 03-Dec-2013 Vladimir Marko <vmarko@google.com> Add support for emitting x86 kArray instructions.

And factor out a lot of common code.

Change-Id: Ib1f135e341404f8a6f92fcef0047ec04577d32cd
1c282e2b9a9b432e132b2c332f861cad9feb4a73 21-Nov-2013 Vladimir Marko <vmarko@google.com> Refactor intrinsic CAS, prepare for 64-bit version.

Bug: 11391018
Change-Id: Ic0f740e0cd0eb47f2c915f81be02f52f7721f8a3
e508a2090b19fe705fbc6b99d76474037a74bbfb 04-Nov-2013 Vladimir Marko <vmarko@google.com> Fix unaligned Memory peek/poke intrinsics.

Change-Id: Id454464d0b28aa37f5239f1c6589ceb0b3bbbdea
a8b4caf7526b6b66a8ae0826bd52c39c66e3c714 24-Oct-2013 Vladimir Marko <vmarko@google.com> Add byte swap instructions for ARM and x86.

Change-Id: I03fdd61ffc811ae521141f532b3e04dda566c77d
0d82948094d9a198e01aa95f64012bdedd5b6fc9 12-Oct-2013 buzbee <buzbee@google.com> 64-bit prep

Preparation for 64-bit roll.
o Eliminated storing pointers in 32-bit int slots in LIR.
o General size reductions of common structures to reduce impact
of doubled pointer sizes:
- BasicBlock struct was 72 bytes, now is 48.
- MIR struct was 72 bytes, now is 64.
- RegLocation was 12 bytes, now is 8.
o Generally replaced uses of BasicBlock* pointers with 16-bit Ids.
o Replaced several doubly-linked lists with singly-linked to save
one stored pointer per node.
o We had quite a few uses of uintptr_t's that were a holdover from
the JIT (which used pointers to mapped dex & actual code cache
addresses rather than trace-relative offsets). Replaced those with
uint32_t's.
o Clean up handling of embedded data for switch tables and array data.
o Miscellaneous cleanup.

I anticipate one or two additional CLs to reduce the size of MIR and LIR
structs.

Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
a9a8254c920ce8e22210abfc16c9842ce0aea28f 04-Oct-2013 Ian Rogers <irogers@google.com> Improve quick codegen for aput-object.

1) don't type check known null.
2) if we know types in verify don't check at runtime.
3) if we're runtime checking then move all the code out-of-line.

Also, don't set up a callee-save frame for check-cast, do an instance-of test
then throw an exception if that fails.
Rename the quick entry point Ldivmod to Lmod, which is what it is on x86 and mips.
Fix monitor-enter/exit NPE for MIPS.
Fix benign bug in mirror::Class::CannotBeAssignedFromOtherTypes, a byte[]
cannot be assigned to from other types.

Change-Id: I9cb3859ec70cca71ed79331ec8df5bec969d6745
d9c4fc94fa618617f94e1de9af5f034549100753 02-Oct-2013 Ian Rogers <irogers@google.com> Inflate contended lock word by suspending owner.

Bug 6961405.
Don't inflate monitors for Notify and NotifyAll.
Tidy lock word, handle recursive lock case alongside unlocked case and move
assembly out of line (except for ARM quick). Also handle null in out-of-line
assembly as the test is quick and the enter/exit code is already a safepoint.
To gain ownership of a monitor on behalf of another thread, monitor contenders
must not hold the monitor_lock_, so they wait on a condition variable.
Reduce size of per mutex contention log.
Be consistent in calling thin lock thread ids just thread ids.
Fix potential thread death races caused by the use of FindThreadByThreadId,
make it invariant that returned threads are either self or suspended now.

Code size reduction on ARM boot.oat 0.2%.
Old nexus 7 speedup 0.25%, new nexus 7 speedup 1.4%, nexus 10 speedup 2.24%,
nexus 4 speedup 2.09% on DeltaBlue.

Change-Id: Id52558b914f160d9c8578fdd7fc8199a9598576a
b48819db07f9a0992a72173380c24249d7fc648a 15-Sep-2013 buzbee <buzbee@google.com> Compile-time tuning: assembly phase

Not as much compile-time gain from reworking the assembly phase as I'd
hoped, but still worthwhile. Should see ~2% improvement thanks to
the assembly rework. On the other hand, expect some huge gains for some
applications thanks to better detection of large machine-generated init
methods. Thinkfree shows a 25% improvement.

The major assembly change was to thread the LIR nodes that
require fixup into a fixup chain. Only those are processed during the
final assembly pass(es). This doesn't help for methods which only
require a single pass to assemble, but does speed up the larger methods
which required multiple assembly passes.

Also replaced the block_map_ basic block lookup table (which contained
space for a BasicBlock* for each dex instruction unit) with a block id
map - cutting its space requirements by half in a 32-bit pointer
environment.

Changes:
o Reduce size of LIR struct by 12.5% (one of the big memory users)
o Repurpose the use/def portion of the LIR after optimization complete.
o Encode instruction bits to LIR
o Thread LIR nodes requiring pc fixup
o Change follow-on assembly passes to only consider fixup LIRs
o Switch on pc-rel fixup kind
o Fast-path for small methods - single pass assembly
o Avoid using cb[n]z for null checks (almost always exceed displacement)
o Improve detection of large initialization methods.
o Rework def/use flag setup.
o Remove a sequential search from FindBlock using lookup table of 16-bit
block ids rather than full block pointers.
o Eliminate pcRelFixup and use fixup kind instead.
o Add check for 16-bit overflow on dex offset.

Change-Id: I4c6615f83fed46f84629ad6cfe4237205a9562b4
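
Illustrative sketch (not part of the commit): threading only the nodes that
need fixup into a second chain, so that follow-on assembly passes can skip
every other node. The Lir class here is a hypothetical stand-in, not the ART
LIR struct.

```python
class Lir:
    def __init__(self, name, needs_fixup=False):
        self.name = name
        self.needs_fixup = needs_fixup
        self.next = None         # program order
        self.next_fixup = None   # next node that needs pc-relative fixup


def link(nodes):
    """Link nodes in program order and thread the fixup chain.

    Returns (first_node, first_fixup_node).
    """
    for prev, cur in zip(nodes, nodes[1:]):
        prev.next = cur
    first_fixup = last_fixup = None
    for node in nodes:
        if node.needs_fixup:
            if last_fixup is None:
                first_fixup = node
            else:
                last_fixup.next_fixup = node
            last_fixup = node
    return (nodes[0] if nodes else None), first_fixup


def fixup_names(first_fixup):
    """Walk only the fixup chain, as a later assembly pass would."""
    names = []
    node = first_fixup
    while node is not None:
        names.append(node.name)
        node = node.next_fixup
    return names
```

A method whose nodes need no fixup has an empty chain, which corresponds to
the single-pass fast path mentioned in the change list.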
bd663de599b16229085759366c56e2ed5a1dc7ec 11-Sep-2013 buzbee <buzbee@google.com> Compile-time tuning: register/bb utilities

This CL yields about a 4% improvement in the compilation phase
of dex2oat (single-threaded; multi-threaded compilation is
more difficult to accurately measure). The register utilities
could stand to be completely rewritten, but this gets most of the
easy benefit.

Next up: the assembly phase.

Change-Id: Ife5a474e9b1a6d9e501e888dda6749d34eb77e96
11b63d13f0a3be0f74390b66b58614a37f9aa6c1 27-Aug-2013 buzbee <buzbee@google.com> Quick compiler: division by literal fix

The constant propagation optimization pass attempts to identify
constants in Dalvik virtual registers and handle them more efficiently.
The use of small constants in division, though, was handled incorrectly
in that the high level code correctly detected the use of a constant,
but the actual code generation routine was only expecting the use of
a special constant form opcode.

see b/10503566

Change-Id: I88aa4d2eafebb2b1af1a1e88049f1845aefae261
468532ea115657709bc32ee498e701a4c71762d4 05-Aug-2013 Ian Rogers <irogers@google.com> Entry point clean up.

Create set of entry points needed for image methods to avoid fix-up at load time:
- interpreter - bridge to interpreter, bridge to compiled code
- jni - dlsym lookup
- quick - resolution and bridge to interpreter
- portable - resolution and bridge to interpreter

Fix JNI work around to use JNI work around argument rewriting code that'd been
accidentally disabled.
Remove abstract method error stub, use interpreter bridge instead.
Consolidate trampoline (previously stub) generation in generic helper.
Simplify trampolines to jump directly into assembly code, keeps stack crawlable.
Dex: replace use of int with ThreadOffset for values that are thread offsets.
Tidy entry point routines between interpreter, jni, quick and portable.

Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e
(cherry picked from commit 848871b4d8481229c32e0d048a9856e5a9a17ef9)
848871b4d8481229c32e0d048a9856e5a9a17ef9 05-Aug-2013 Ian Rogers <irogers@google.com> Entry point clean up.

Create set of entry points needed for image methods to avoid fix-up at load time:
- interpreter - bridge to interpreter, bridge to compiled code
- jni - dlsym lookup
- quick - resolution and bridge to interpreter
- portable - resolution and bridge to interpreter

Fix JNI work around to use JNI work around argument rewriting code that'd been
accidentally disabled.
Remove abstract method error stub, use interpreter bridge instead.
Consolidate trampoline (previously stub) generation in generic helper.
Simplify trampolines to jump directly into assembly code, keeps stack crawlable.
Dex: replace use of int with ThreadOffset for values that are thread offsets.
Tidy entry point routines between interpreter, jni, quick and portable.

Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e
0cd7ec2dcd8d7ba30bf3ca420b40dac52849876c 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/blank_line issues

Change-Id: Ice937e95e23dd622c17054551d4ae4cebd0ef8a2
fc0e3219edc9a5bf81b166e82fd5db2796eb6a0d 17-Jul-2013 Brian Carlstrom <bdc@google.com> Fix multiple inclusion guards to match new pathnames

Change-Id: Id7735be1d75bc315733b1773fba45c1deb8ace43
7940e44f4517de5e2634a7e07d58d0fb26160513 12-Jul-2013 Brian Carlstrom <bdc@google.com> Create separate Android.mk for main build targets

The runtime, compiler, dex2oat, and oatdump now are in separate trees
to prevent dependency creep. They can now be individually built
without rebuilding the rest of the art projects. dalvikvm and jdwpspy
were already this way. Builds in the art directory should behave as
before, building everything including tests.

Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81