History log of /art/compiler/optimizing/code_generator_x86.h
Revision Date Author Comments
c393d63aa2b8f6984672fdd4de631bbeff14b6a2 15-Apr-2016 Alexandre Rames <alexandre.rames@linaro.org> Fix: correctly destruct VIXL labels.

(cherry picked from commit c01a66465a398ad15da90ab2bdc35b7f4a609b17)

Bug: 27505766
Change-Id: I077465e3d308f4331e7a861902e05865f9d99835
dee58d6bb6d567fcd0c4f39d8d690c3acaf0e432 07-Apr-2016 David Brazdil <dbrazdil@google.com> Revert "Revert "Refactor HGraphBuilder and SsaBuilder to remove HLocals""

This patch merges the instruction-building phases from HGraphBuilder
and SsaBuilder into a single HInstructionBuilder class. As a result,
it is not necessary to generate HLocal, HLoadLocal and HStoreLocal
instructions any more, as the builder produces SSA form directly.

Saves 5-15% of arena-allocated memory (see bug for more data):
GMS 20.46MB => 19.26MB (-5.86%)
Maps 24.12MB => 21.47MB (-10.98%)
YouTube 28.60MB => 26.01MB (-9.05%)

This CL fixes an issue with parsing quickened instructions.

Bug: 27894376
Bug: 27998571
Bug: 27995065

Change-Id: I20dbe1bf2d0fe296377478db98cb86cba695e694
60328910cad396589474f8513391ba733d19390b 04-Apr-2016 David Brazdil <dbrazdil@google.com> Revert "Refactor HGraphBuilder and SsaBuilder to remove HLocals"

Bug: 27995065
This reverts commit e3ff7b293be2a6791fe9d135d660c0cffe4bd73f.

Change-Id: I5363c7ce18f47fd422c15eed5423a345a57249d8
e3ff7b293be2a6791fe9d135d660c0cffe4bd73f 02-Mar-2016 David Brazdil <dbrazdil@google.com> Refactor HGraphBuilder and SsaBuilder to remove HLocals

This patch merges the instruction-building phases from HGraphBuilder
and SsaBuilder into a single HInstructionBuilder class. As a result,
it is not necessary to generate HLocal, HLoadLocal and HStoreLocal
instructions any more, as the builder produces SSA form directly.

Saves 5-15% of arena-allocated memory (see bug for more data):
GMS 20.46MB => 19.26MB (-5.86%)
Maps 24.12MB => 21.47MB (-10.98%)
YouTube 28.60MB => 26.01MB (-9.05%)

Bug: 27894376
Change-Id: Iefe28d40600c169c5d306fd2c77034ae19476d90
cac5a7e871f1f346b317894359ad06fa7bd67fba 22-Feb-2016 Vladimir Marko <vmarko@google.com> Optimizing: Improve const-string code generation.

For strings in the boot image, use either direct pointers
or pc-relative addresses. For other strings, use PC-relative
access to the dex cache arrays for AOT and direct address of
the string's dex cache slot for JIT.

For aosp_flounder-userdebug:
- 32-bit boot.oat: -692KiB (-0.9%)
- 64-bit boot.oat: -948KiB (-1.1%)
- 32-bit dalvik cache total: -900KiB (-0.9%)
- 64-bit dalvik cache total: -3672KiB (-1.5%)
(contains more files than the 32-bit dalvik cache)
For aosp_flounder-userdebug forced to compile PIC:
- 32-bit boot.oat: -380KiB (-0.5%)
- 64-bit boot.oat: -928KiB (-1.0%)
- 32-bit dalvik cache total: -468KiB (-0.4%)
- 64-bit dalvik cache total: -1928KiB (-0.8%)
(contains more files than the 32-bit dalvik cache)

Bug: 26884697
Change-Id: Iec7266ce67e6fedc107be78fab2e742a8dab2696
2ae48182573da7087bffc2873730bc758ec29696 16-Mar-2016 Calin Juravle <calin@google.com> Clean up NullCheck generation and record stats about it.

This removes redundant code from the generators and allows for easier
stat recording.

Change-Id: Iccd4368f9e9d87a6fecb863dee4e2145c97851c4
c7098ff991bb4e00a800d315d1c36f52a9cb0149 09-Feb-2016 David Srbecky <dsrbecky@google.com> Remove HNativeDebugInfo from start of basic blocks.

We do not require a full environment at the start of a basic block.
The dex pc contained in the basic block is sufficient for line mapping.

Change-Id: I5ba9e5f5acbc4a783ad544769f9a73bb33e2bafa
0c5b18edd1308975804ccf29a02a130a7b6f7fa7 06-Feb-2016 Mark Mendell <mark.p.mendell@intel.com> Support CMOV for x86 Select

If possible, generate CMOV to implement HSelect. Tricky cases are a
long or FP condition (no single CC generated), FP inputs (no FP CMOV)
and when the condition is a boolean or not emitted at the use site.
In these cases, keep using the existing HSelect code.

Change-Id: I4ff1e152b8ef126fbbabeb3316e9e2b6a6b74aeb
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
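
For the simple case above (an integer HSelect whose boolean condition is already materialized in a register), a standalone sketch of the test+cmov lowering using GCC/Clang inline asm; the Select() helper is illustrative, not ART code:

    #include <cstdint>

    // Illustrative test+cmov lowering of an integer select; assumes the
    // boolean condition is already in a register (the easy case above).
    inline int32_t Select(bool condition, int32_t true_value, int32_t false_value) {
      int32_t result = false_value;
      int32_t cond = condition ? 1 : 0;
      __asm__("testl %1, %1\n\t"
              "cmovnel %2, %0"      // result = true_value when cond != 0
              : "+r"(result)
              : "r"(cond), "r"(true_value)
              : "cc");
      return result;
    }
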
6e332529c33be4d7dae5dad3609a839f4c0d3bfc 02-Feb-2016 David Brazdil <dbrazdil@google.com> ART: Remove HTemporary

Change-Id: I21b984224370a9ce7a4a13a9652503cfb03c5f03
a19616e3363276e7f2c471eb2839fb16f1d43f27 02-Feb-2016 Aart Bik <ajcbik@google.com> Implemented compare/signum intrinsics as HCompare
(with all code generation for all architectures)

Rationale:
At HIR level, many more optimizations are possible, while ultimately
generated code can take advantage of full semantics.

Change-Id: I6e2ee0311784e5e336847346f7f3c4faef4fd17e
2f10a5fb8c236a6786928f0323bd312c3ee9a4cc 25-Jan-2016 Mark P Mendell <mark.p.mendell@intel.com> Revert "Revert "X86: Use the constant area for more operations.""

This reverts commit cf8d1bb97e193e02b430d707d3b669565fababb4.

Handle the case of an intrinsic where CurrentMethod is still an input.
This will be the case when there are unresolved classes in the
hierarchy.

Add a test case to confirm that we don't crash when handling Math.abs,
which wants to add a pointer to the constant area for the bitmask to be
used to remove the sign bit.

Enhance 565-checker-condition-liveness to check for the case of deeply
nested EmitAtUseSite chains.

Change-Id: I022e8b96a32f5bf464331d0c318c56b9d0ac3c9a
cf8d1bb97e193e02b430d707d3b669565fababb4 25-Jan-2016 Nicolas Geoffray <ngeoffray@google.com> Revert "X86: Use the constant area for more operations."

Hits a DCHECK:

dex2oatd F 19461 20411 art/compiler/optimizing/pc_relative_fixups_x86.cc:196] Check failed: !invoke_static_or_direct->HasCurrentMethodInput()


This reverts commit dc00454f0b9a134f01f79b419200f4044c2af5c6.

Change-Id: Idfcacf12eb9e1dd7e68d95e880fda0f76f90e9ed
dc00454f0b9a134f01f79b419200f4044c2af5c6 30-Oct-2015 Mark Mendell <mark.p.mendell@intel.com> X86: Use the constant area for more operations.

Allow FP HNeg to use the constant area to hold the constant to flip the
sign bit.

Enhance some math intrinsics to allow the use of the constant
area: Abs{Float,Double}, {Min,Max}{FloatFloat,DoubleDouble}.

Allow compares of floats/doubles to constants using the constant area.

These eliminate almost all uses of loading constants from the stack.

Change-Id: Ic4b831565825cbe9f0801b1b53c1013be7c87ae4
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
95e7ffc28ea4d6deba356e636b16120ae49b62e2 22-Jan-2016 Roland Levillain <rpl@google.com> Improve documentation and assertions of read barrier instrumentation.

For ARM, x86, x86-64 back ends. The case of the ARM64 back
end is already handled in
https://android-review.googlesource.com/#/c/197870/.

Bug: 12687968
Change-Id: I6df1128cc100cbdb89020876e1a54de719508be3
e3f43ac79e50a4693ea4d46acf5cffca64910cee 19-Jan-2016 Roland Levillain <rpl@google.com> Some read barrier clean-up in Optimizing.

These changes make the read barrier compiler instrumentation
code more uniform among the ARM, ARM64, x86 and x86-64 back
ends.

Bug: 12687968
Change-Id: I6b1c0cf2bc22ed6cd6b14754136bef4a2a036ea5
58282f4510961317b8d5a364a6f740a78926716f 14-Jan-2016 David Brazdil <dbrazdil@google.com> ART: Remove Baseline compiler

We don't need Baseline any more and it hasn't been maintained for
a while anyway. Let's remove it.

Change-Id: I442ed26855527be2df3c79935403a25b1ee55df6
42249c3602c3d0243396ee3627ffb5906aa77c1e 08-Jan-2016 Aart Bik <ajcbik@google.com> Reduce code size by sharing slow paths.

Rationale:
Sharing identical slow path code reduces code size.

Background:
Currently, slow paths with the same dex-pc, same physical register
spilling code, and identical stack maps are shared (making this
only useful for deopt slow paths). The newly introduced mechanism
is sufficiently general to allow future improvements, e.g.
allowing different dex-pcs (by passing them to the runtime) or even
different kinds of slow paths (by passing runtime addresses to the slow path).

Change-Id: I819615c47b4fd98440a241f681f93e4fc22d12e0
152408f8c2188a7ed950cad04883b2f67dc74e84 31-Dec-2015 Mark Mendell <mark.p.mendell@intel.com> X86: templatize GenerateTestAndBranch and friends

Allow the use of NearLabel as well as Label. This will be used by the
HSelect patch.

Replace a couple of Label(s) with NearLabel(s) as well.

Change-Id: I8e674c89e691bcdbccf4a5cdc07ad13b29ec21dd
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
5f7b58ea1adfc0639dd605b65f59198d3763f801 23-Nov-2015 Vladimir Marko <vmarko@google.com> Rewrite HInstruction::Is/As<type>().

Make Is<type>() and As<type>() non-virtual for concrete
instruction types, relying on GetKind(), and mark GetKind()
as PURE to improve optimization opportunities. This reduces
the number of relocations in libart-compiler.so's .rel.dyn
section by ~4K, or ~44%, and in .data.rel.ro by ~18K, or
~65%. The file is 96KiB smaller for Nexus 5, including 8KiB
reduction of the .text section.

Unfortunately, the g++/clang++ __attribute__((pure)) is not
strong enough to avoid duplicated virtual calls and we would
need the C++ [[pure]] attribute proposed in n3744 instead.
To work around this deficiency, we introduce an extra
non-virtual indirection for GetKind(), so that the compiler
can optimize common expressions such as
instruction->IsAdd() || instruction->IsSub()
or
instruction->IsAdd() && instruction->AsAdd()->...
which contain two virtual calls to GetKind() after inlining.

Change-Id: I83787de0671a5cb9f5b0a5f4a536cef239d5b401
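
A minimal sketch of the shape described above (illustrative declarations, not the actual ART headers): the non-virtual Is<type>() helpers funnel through a single GetKind() so the compiler can fold repeated kind checks into one dispatch.

    class HInstruction {
     public:
      enum InstructionKind { kAdd, kSub };
      virtual ~HInstruction() {}

      // The extra non-virtual indirection mentioned above, letting the
      // compiler CSE the underlying virtual call across expressions like
      // instruction->IsAdd() || instruction->IsSub().
      InstructionKind GetKind() const { return GetKindInternal(); }

      bool IsAdd() const { return GetKind() == kAdd; }
      bool IsSub() const { return GetKind() == kSub; }
      class HAdd* AsAdd() {
        // Stand-in for ART's checked down_cast.
        return IsAdd() ? reinterpret_cast<HAdd*>(this) : nullptr;
      }

     private:
      // Marked PURE in the real code; one override per concrete instruction.
      virtual InstructionKind GetKindInternal() const = 0;
    };

    class HAdd : public HInstruction {
     private:
      InstructionKind GetKindInternal() const override { return kAdd; }
    };
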
f3e0ee27f46aa6434b900ab33f12cd3157578234 17-Dec-2015 Vladimir Marko <vmarko@google.com> Revert "Revert "ART: Reduce the instructions generated by packed switch.""

This reverts commit b4c137630fd2226ad07dfd178ab15725374220f1.

The underlying issue was fixed by https://android-review.googlesource.com/188271 .

Bug: 26121945
Change-Id: I58b08eb1a9f0a5c861f8cda93522af64bcf63920
17077d888a6752a2e5f8161eee1b2c3285783d12 16-Dec-2015 Mark P Mendell <mark.p.mendell@intel.com> Revert "Revert "X86: Use locked add rather than mfence""

This reverts commit 0da3b9117706760e8722029f407da6d0297cc943.

Fix a compilation failure that slipped in somehow.

Change-Id: Ide8681cdc921febb296ea47aa282cc195f154049
0da3b9117706760e8722029f407da6d0297cc943 16-Dec-2015 Aart Bik <ajcbik@google.com> Revert "X86: Use locked add rather than mfence"

This reverts commit 7b3e4f99b25c31048a33a08688557b133ad345ab.

Reason: build error on sdk (linux) in git_mirror-aosp-master-with-vendor, please fix first

art/compiler/optimizing/code_generator_x86_64.cc:4032:7: error: use of
undeclared identifier 'codegen_'
codegen_->MemoryFence();

Change-Id: I91f8542cfd944b7425d1981c35872dcdcb901e18
b4c137630fd2226ad07dfd178ab15725374220f1 16-Dec-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "ART: Reduce the instructions generated by packed switch."

This reverts commit 59f054d98f519a3efa992b1c688eb97bdd8bbf55.

bug:26121945

Change-Id: I8a5ad7ef1f1de8d44787c27528fa3f7f5c2e9cd3
7b3e4f99b25c31048a33a08688557b133ad345ab 19-Nov-2015 Mark Mendell <mark.p.mendell@intel.com> X86: Use locked add rather than mfence

Java semantics for memory ordering can be satisfied using
lock addl $0,0(SP)
rather than mfence. The locked add synchronizes the memory caches, but
doesn't affect device memory.

Timing on a microbenchmark with an mfence or lock add $0,0(sp) in a loop
with 600000000 iterations:
time ./mfence
real 0m5.411s
user 0m5.408s
sys 0m0.000s

time ./locked_add
real 0m3.552s
user 0m3.550s
sys 0m0.000s

Implement this as an instruction-set-feature lock_add. This is off by
default (uses mfence), and enabled for atom & silvermont variants.
Generation of mfence can be forced by a parameter to MemoryFence.

Change-Id: I5cb4fded61f4cbbd7b7db42a1b6902e43e458911
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
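
A hedged sketch of the MemoryFence idea (32-bit x86, GCC/Clang inline asm; the function name and flag mirror the message above but are illustrative):

    inline void MemoryFence(bool force_mfence = false) {
      if (force_mfence) {
        __asm__ __volatile__("mfence" ::: "memory");
      } else {
        // A locked no-op RMW on the stack has full-fence semantics but is
        // cheaper than mfence on atom/silvermont-class cores.
        __asm__ __volatile__("lock addl $0, (%%esp)" ::: "memory", "cc");
      }
    }
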
7c1559a06041c9c299d5ab514d54b2102f204a84 15-Dec-2015 Roland Levillain <rpl@google.com> x86 Baker's read barrier fast path implementation.

Introduce an x86 fast path implementation in Optimizing for
Baker's read barriers (for both heap reference loads and GC
root loads). The marking phase of the read barrier is
performed by a slow path, invoking a new runtime entry point
(artReadBarrierMark).

Other read barrier algorithms continue to use the original
slow path based implementation, which has been renamed as
GenerateReadBarrierSlow/GenerateReadBarrierForRootSlow.

Bug: 12687968
Change-Id: Ie610c4befc19ff22378a8cba38b422dcacb54320
59f054d98f519a3efa992b1c688eb97bdd8bbf55 07-Dec-2015 Zheng Xu <zheng.xu@linaro.org> ART: Reduce the instructions generated by packed switch.

Implement Vladimir Marko's suggestion. The new compare/jump series
reduces the number of instructions from (2*n+1) to (1.5*n+3).

Generate normal compare/jump series when numEntries <= 3.
Generate optimal compare/jump series when numEntries <= threshold.
Generate jump tables otherwise.

Change-Id: I425547b6787057c7fa84e71f17c145b63b208633
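
A toy illustration of where the 1.5*n comes from (the Assembler below just prints; it is not the ART x86 assembler): because packed-switch cases are consecutive, one cmp against the upper value of each pair serves both cases, assuming a prior bounds check has excluded values below the range.

    #include <cstdint>
    #include <cstdio>

    struct Assembler {  // printing stub, for illustration only
      void cmp(const char* reg, int32_t imm) { std::printf("  cmp %s, %d\n", reg, imm); }
      void jb(int t) { std::printf("  jb  case_%d\n", t); }
      void je(int t) { std::printf("  je  case_%d\n", t); }
    };

    // Emits 3 instructions per 2 cases (one cmp, two jumps) instead of 2 per
    // case, i.e. roughly 1.5*n plus the fixed bounds-check setup.
    void EmitCompareJumpSeries(Assembler& a, int32_t lo, int num_entries) {
      int i = 0;
      for (; i + 1 < num_entries; i += 2) {
        a.cmp("eax", lo + i + 1);
        a.jb(i);      // below lo+i+1 can only be lo+i once earlier cases are gone
        a.je(i + 1);
      }
      if (i < num_entries) {  // odd trailing case
        a.cmp("eax", lo + i);
        a.je(i);
      }
    }
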
0debae7bc89eb05f7a2bf7dccd223318fad7c88d 12-Nov-2015 David Brazdil <dbrazdil@google.com> ART: Refactor GenerateTestAndBranch

Each code generator implements a method for generating condition
evaluation and branching to arbitrary labels. This patch refactors
it for better clarity but also to generate fewer jumps when the true
branch is the fallthrough successor.

This is preliminary work for implementing HSelect.

Change-Id: Iaa545a5ecbacb761c5aa241fa69140cf6eb5952f
0d5a281c671444bfa75d63caf1427a8c0e6e1177 13-Nov-2015 Roland Levillain <rpl@google.com> x86/x86-64 read barrier support for concurrent GC in Optimizing.

This first implementation uses slow paths to instrument heap
reference loads and GC root loads for the concurrent copying
collector, respectively calling the artReadBarrierSlow and
artReadBarrierForRootSlow (new) runtime entry points.

Notes:
- This implementation does not instrument HInvokeVirtual
nor HInvokeInterface instructions (for class reference
loads), as the corresponding read barriers are not strictly
required with the current concurrent copying collector.
- Intrinsics which may eventually call (on slow path) are
disabled when read barriers are enabled, as the current
slow path infrastructure does not support this case.
- When read barriers are enabled, the code generated for a
HArraySet instruction always goes into the array set slow
path for object arrays (delegating the operation to the
runtime), as we are lacking a mechanism to keep a
temporary register live across a runtime call (needed for
the instrumentation of type checking code, which requires
two successive read barriers).

Bug: 12687968
Change-Id: I14cd6107233c326389120336f93955b28ffbb329
0f7dca4ca0be8d2f8776794d35edf8b51b5bc997 02-Nov-2015 Vladimir Marko <vmarko@google.com> Optimizing/X86: PC-relative dex cache array addressing.

Add PC-relative dex cache array addressing for X86 and use
it for better invoke-static/-direct dispatch. Also delay
the initialization to the PC-relative base until needed.

Change-Id: Ib8634d5edce4920cd70172fd13211809cf6948d1
dc151b2346bb8a4fdeed0c06e54c2fca21d59b5d 15-Oct-2015 Vladimir Marko <vmarko@google.com> Optimizing: Determine invoke-static/-direct dispatch early.

Determine the dispatch type of invoke-static/-direct in a
special pass right after the type inference. This allows the
inliner to pass the "needs dex cache" check and inline more.
It also allows the code generator to avoid requesting a
register location for the ArtMethod* for kDexCachePcRelative
and direct methods.

The supported dispatch check handles also situations that
the CompilerDriver currently doesn't allow. The cleanup of
the CompilerDriver and required changes to Quick will come
in a separate change.

Change-Id: I3f8e903a119949e95871d8ab0a995f4731a13a07
805b3b56c6eb542298db33e0181f135dc9fed3d9 18-Sep-2015 Mark Mendell <mark.p.mendell@intel.com> X86 jump tables for PackedSwitch

Implement X86PackedSwitch using a jump table of offsets to blocks. The
X86PackedSwitch version just adds an input to address the constant area.

Change-Id: Id2752a1ee79222493040c6fd0e59aee9a544b76a
Bug: 21119474
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
e460d1df1f789c7c8bb97024a8efbd713ac175e9 29-Sep-2015 Calin Juravle <calin@google.com> Revert "Revert "Support unresolved fields in optimizing""

The CL also changes the calling convention for 64-bit static field set
to use kArg2 instead of kArg1. This allows optimizing to keep
the assumptions:
- arm pairs are always of the form (even_reg, odd_reg)
- ecx_edx is not used as a register on x86.

This reverts commit e6f49b47b6a4dc9c7684e4483757872cfc7ff1a1.

Change-Id: I93159917565824084abc96775f31be1a4249f2f3
225b6464a58ebe11c156144653f11a1c6607f4eb 28-Sep-2015 Vladimir Marko <vmarko@google.com> Optimizing: Tag arena allocations in code generators.

And completely remove the deprecated GrowableArray.

Replace GrowableArray with ArenaVector in code generators
and related classes and tag arena allocations.

Label arrays use direct allocations from ArenaAllocator
because Label is non-copyable and non-movable and as such
cannot be really held in a container. The GrowableArray
never actually constructed them, instead relying on the
zero-initialized storage from the arena allocator to be
correct. We now actually construct the labels.

Also avoid StackMapStream::ComputeDexRegisterMapSize() being
passed null references, even though unused.

Change-Id: I26a46fdd406b23a3969300a67739d55528df8bf4
85b62f23fc6dfffe2ddd3ddfa74611666c9ff41d 09-Sep-2015 Andreas Gampe <agampe@google.com> ART: Refactor intrinsics slow-paths

Refactor slow paths so that there is a default implementation for
common cases (only arm64 with vixl is special). Write a generic
intrinsic slow-path that can be reused for the specific architectures.
Move helper functions into CodeGenerator so that they are accessible.

Change-Id: Ibd788dce432601c6a9f7e6f13eab31f28dcb8550
e6f49b47b6a4dc9c7684e4483757872cfc7ff1a1 17-Sep-2015 Calin Juravle <calin@google.com> Revert "Support unresolved fields in optimizing"
breaks debuggable tests.

This reverts commit 23a8e35481face09183a24b9d11e505597c75ebb.

Change-Id: I8e60b5c8f48525975f25d19e5e8066c1c94bd2e5
23a8e35481face09183a24b9d11e505597c75ebb 08-Sep-2015 Calin Juravle <calin@google.com> Support unresolved fields in optimizing

Change-Id: I9941fa5fcb6ef0a7a253c7a0b479a44a0210aad4
175dc732c80e6f2afd83209348124df349290ba8 25-Aug-2015 Calin Juravle <calin@google.com> Support unresolved methods in Optimizing

Change-Id: If2da02b50d2fa668cd58f134a005f1752e7746b1
fa6b93c4b69e6d7ddfa2a4ed0aff01b0608c5a3a 15-Sep-2015 Vladimir Marko <vmarko@google.com> Optimizing: Tag arena allocations in HGraph.

Replace GrowableArray with ArenaVector in HGraph and related
classes HEnvironment, HLoopInformation, HInvoke and HPhi,
and tag allocations with new arena allocation types.

Change-Id: I3d79897af405b9a1a5b98bfc372e70fe0b3bc40d
bfb5ba90cd6425ce49c2125a87e3b12222cc2601 01-Sep-2015 Andreas Gampe <agampe@google.com> Revert "Revert "Do a second check for testing intrinsic types.""

This reverts commit a14b9fef395b94fa9a32147862c198fe7c22e3d7.

When an intrinsic with invoke-type virtual is recognized, replace
the instruction with a new HInvokeStaticOrDirect.

Minimal update for dex-cache rework. Fix includes.

Change-Id: I1c8e735a2fa7cda4419f76ca0717125ef236d332
0616ae081e648f4b9b64b33e2624a943c5fce977 17-Apr-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Add support for x86 constant area

Use the Quick trick of finding the address of the method by calling the
next instruction and popping the return address into a register. This
trick is used because of the lack of PC-relative addressing in 32 bit
mode on the X86.

Add a HX86ComputeBaseMethodAddress instruction to trigger generation
of the method address, which is referenced by instructions needing
access to the constant area.

Add a HX86LoadFromConstantTable instruction that takes a
HX86ComputeBaseMethodAddress and a HConstant that will be used to load
the value when needed.

Change Add/Sub/Mul/Div to detect a HX86LoadFromConstantTable right hand
side, and generate code that directly references the constant area.
Other uses will be added later.

Change the inputs to HReturn and HInvoke(s), replacing the FP constants
with HX86LoadFromConstantTable instead. This allows values to be
loaded from the constant area into the right location.

Port the X86_64 assembler constant area handling to the X86.

Use the new per-backend optimization framework to do this conversion.

Change-Id: I6d235a72238262e4f9ec0f3c88319a187f865932
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
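
A standalone sketch of the call/pop trick described above (32-bit x86, GCC/Clang inline asm; the helper name is illustrative, the real base is materialized by HX86ComputeBaseMethodAddress):

    #include <cstdint>

    inline uintptr_t ComputeBaseAddress() {
      uintptr_t base;
      __asm__("call 1f\n\t"   // pushes the address of label 1
              "1: popl %0"    // pop it: we now hold a known code address
              : "=r"(base));
      // Constant-area entries can then be addressed relative to this base.
      return base;
    }
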
ecc4366670e12b4812ef1653f7c8d52234ca1b1f 13-Aug-2015 Serban Constantinescu <serban.constantinescu@linaro.org> Add OptimizingCompilerStats to the CodeGenerator class.

Just refactoring, not yet used, but will be used by the incoming patch
series and future CodeGen specific stats.

Change-Id: I7d20489907b82678120518a77bdab9c4cc58f937
Signed-off-by: Serban Constantinescu <serban.constantinescu@linaro.org>
581550137ee3a068a14224870e71aeee924a0646 19-Aug-2015 Vladimir Marko <vmarko@google.com> Revert "Revert "Optimizing: Better invoke-static/-direct dispatch.""

Fixed kCallArtMethod to use correct callee location for
kRecursive. This combination is used when compiling with
debuggable flag set.

This reverts commit b2c431e80e92eb6437788cc544cee6c88c3156df.

Change-Id: Idee0f2a794199ebdf24892c60f8a5dcf057db01c
b2c431e80e92eb6437788cc544cee6c88c3156df 19-Aug-2015 Vladimir Marko <vmarko@google.com> Revert "Optimizing: Better invoke-static/-direct dispatch."

Reverting due to failing ndebug tests.

This reverts commit 9b688a095afbae21112df5d495487ac5231b12d0.

Change-Id: Ie4f69da6609df3b7c8443412b6cf7f5c43c2c5d9
9b688a095afbae21112df5d495487ac5231b12d0 06-May-2015 Vladimir Marko <vmarko@google.com> Optimizing: Better invoke-static/-direct dispatch.

Add framework for different types of loading ArtMethod*
and code pointer retrieval. Implement invoke-static and
invoke-direct calls the same way as Quick. Document the
dispatch kinds in HInvokeStaticOrDirect's new enumerations
MethodLoadKind and CodePtrLocation.

PC-relative loads from dex cache arrays are used only for
x86-64 and arm64. The implementation for other architectures
will be done in separate CLs.

Change-Id: I468ca4d422dbd14748e1ba6b45289f0d31734d94
8158f28b6689314213eb4dbbe14166073be71f7e 07-Aug-2015 Alexandre Rames <alexandre.rames@linaro.org> Ensure coherency of call kinds for LocationSummary.

The coherency is enforced with checks added in the `InvokeRuntime`
helper, that we now also use on x86 and x86_64.

Change-Id: I8cb92b042f25dc3c5fd390e9c61a45b477d081f4
c470193cfc522fc818eb2eaab896aef9caf0c75a 10-Apr-2015 Mark Mendell <mark.p.mendell@intel.com> Fuse long and FP compare & condition on x86/x86-64 in Optimizing.

This is a preliminary implementation of fusing long/float/double
compares with conditions to avoid materializing the result from the
compare and condition.

The information from a HCompare is transferred to the HCondition if it
is legal. There must be only a single use of the HCompare, the HCompare
and HCondition must be in the same block, the HCondition must not need
materialization.

Added GetOppositeCondition() to HCondition to return the flipped
condition.

Bug: 21120453
Change-Id: I1f1db206e6dc336270cd71070ed3232dedc754d6
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
fc6a86ab2b70781e72b807c1798b83829ca7f931 26-Jun-2015 David Brazdil <dbrazdil@google.com> Revert "Revert "ART: Implement try/catch blocks in Builder""

This patch enables the GraphBuilder to generate blocks and edges which
represent the exceptional control flow when try/catch blocks are
present in the code. Actual compilation is still delegated to Quick
and Baseline ignores the additional code.

To represent the relationship between try and catch blocks, Builder
splits the edges which enter/exit a try block and links the newly
created blocks to the corresponding exception handlers. This layout
will later enable the SsaBuilder to correctly infer the dominators of
the catch blocks and to produce the appropriate reverse post ordering.
It will not, however, allow for building the complete SSA form of the
catch blocks and consequently optimizing such blocks.

To this end, a new TryBoundary control-flow instruction is introduced.
Codegen treats it the same as a Goto but it allows for additional
successors (the handlers).

This reverts commit 3e18738bd338e9f8363b26bc895f38c0ec682824.

Change-Id: I4f5ea961848a0b83d8db3673763861633e9bfcfb
3e18738bd338e9f8363b26bc895f38c0ec682824 26-Jun-2015 David Brazdil <dbrazdil@google.com> Revert "ART: Implement try/catch blocks in Builder"

Causes OutOfMemory issues, need to investigate.

This reverts commit 0b5c7d1994b76090afcc825e737f2b8c546da2f8.

Change-Id: I263e6cc4df5f9a56ad2ce44e18932ca51d7e349f
0b5c7d1994b76090afcc825e737f2b8c546da2f8 11-Jun-2015 David Brazdil <dbrazdil@google.com> ART: Implement try/catch blocks in Builder

This patch enables the GraphBuilder to generate blocks and edges which
represent the exceptional control flow when try/catch blocks are
present in the code. Actual compilation is still delegated to Quick
and Baseline ignores the additional code.

To represent the relationship between try and catch blocks, Builder
splits the edges which enter/exit a try block and links the newly
created blocks to the corresponding exception handlers. This layout
will later enable the SsaBuilder to correctly infer the dominators of
the catch blocks and to produce the appropriate reverse post ordering.
It will not, however, allow for building the complete SSA form of the
catch blocks and consequently optimizing such blocks.

To this end, a new TryBoundary control-flow instruction is introduced.
Codegen treats it the same as a Goto but it allows for additional
successors (the handlers).

Change-Id: I415b985596d5bebb7b1bb358a46e08b7b04bb53a
eb7b7399dbdb5e471b8ae00a567bf4f19edd3907 19-Jun-2015 Alexandre Rames <alexandre.rames@linaro.org> Opt compiler: Add disassembly to the '.cfg' output.

This is automatically added to the '.cfg' output when using the usual
`--dump-cfg` option.

Change-Id: I864bfc3a8299c042e72e451cc7730ad8271e4deb
ef20f71e16f035a39a329c8524d7e59ca6a11f04 09-Jun-2015 Alexandre Rames <alexandre.rames@linaro.org> Add boilerplate code for architecture-specific HInstructions.

Change-Id: I2723cd96e5f03012c840863dd38d7b2168117db8
69aa60163989c33a008115205d39732a76ecc1dc 09-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Pass current method to HNewInstance and HNewArray.""

Problem exposed by this change was fixed in:
https://android-review.googlesource.com/#/c/154031/

This reverts commit 7b0e353b49ac3f464c662f20e20e240f0231afff.

Change-Id: I680c13dc9db9ba223ab11c7af255222860b4e6d2
7b0e353b49ac3f464c662f20e20e240f0231afff 09-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Pass current method to HNewInstance and HNewArray."

082-inline-execute fails on x86.

This reverts commit e21aa42e1341d34250742abafdd83311ad9fa737.

Change-Id: Ib3fd25faee2e0128001e40d3d51a74f959bc4449
94015b939060f5041d408d48717f22443e55b6ad 04-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Use HCurrentMethod in HInvokeStaticOrDirect.""

The fix was to special-case baseline for x86, which does not have enough
registers to allocate the current method.

This reverts commit c345f141f11faad177aa9635a78088d00cf66086.

Change-Id: I5997aa52f8d4df373ae5ff4d4150dac0c44c4c10
e21aa42e1341d34250742abafdd83311ad9fa737 08-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Pass current method to HNewInstance and HNewArray.

Also remove unused CodeGenerator::LoadCurrentMethod.

Change-Id: I4b8d3f2a30b8e2c76b6b329a72555483c993cb73
c345f141f11faad177aa9635a78088d00cf66086 04-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Use HCurrentMethod in HInvokeStaticOrDirect."

Fails on baseline/x86.

This reverts commit 38207af82afb6f99c687f64b15601ed20d82220a.

Change-Id: Ib71018367eb7c6046965494a7e996c22af3de403
38207af82afb6f99c687f64b15601ed20d82220a 01-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Use HCurrentMethod in HInvokeStaticOrDirect.

Change-Id: I0d15244b6b44c8b10079398c55da5071a3e3af66
fd88f16100cceafbfde1b4f095f17e89444d6fa8 03-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Factorize code for common LocationSummary of HInvoke.

This is one step forward, we could factorize more, but
I wanted to get this out of the way first.

Change-Id: I6ae411a737eebaecb64974f47af507ce0cfbae85
3d21bdf8894e780d349c481e5c9e29fe1556051c 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Move mirror::ArtMethod to native

Optimizing + quick tests are passing, devices boot.

TODO: Test and fix bugs in mips64.

Saves 16 bytes for most ArtMethods, a 7.5MB reduction in system PSS.
Some of the savings are from removal of virtual methods and direct
methods object arrays.

Bug: 19264997

(cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33)

Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d

Fix some ArtMethod related bugs

Added root visiting for runtime methods, not currently required
since the GcRoots in these methods are null.

Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes
--trace run-tests 005, 044.

Fixed optimizing compiler bug where we used a normal stack location
instead of double on ARM64, this fixes the debuggable tests.

TODO: Fix JDWP tests.

Bug: 19264997

Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3

ART: Fix casts for 64-bit pointers on 32-bit compiler.

Bug: 19264997
Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457

Fix JDWP tests after ArtMethod change

Fixes Throwable::GetStackDepth for exception event detection after
internal stack trace representation change.

Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of
proxy method.

Bug: 19264997
Change-Id: I363e293796848c3ec491c963813f62d868da44d2

Fix accidental IMT and root marking regression

Was always using the conflict trampoline. Also included fix for
regression in GC time caused by extra roots. Most of the regression
was IMT.

Fixed bug in DumpGcPerformanceInfo where we would get SIGABRT due to
detached thread.

EvaluateAndApplyChanges:
From ~2500 -> ~1980
GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots

Bug: 19264997
Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0

Fix bogus image test assert

Previously we were comparing the size of the non moving space to
size of the image file.

Now we properly compare the size of the image space against the size
of the image file.

Bug: 19264997
Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a

[MIPS64] Fix art_quick_invoke_stub argument offsets.

ArtMethod reference's size got bigger, so we need to move other args
and leave enough space for ArtMethod* and 'this' pointer.

This fixes mips64 boot.

Bug: 19264997
Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
e401d146407d61eeb99f8d6176b2ac13c4df1e33 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Move mirror::ArtMethod to native

Optimizing + quick tests are passing, devices boot.

TODO: Test and fix bugs in mips64.

Saves 16 bytes for most ArtMethods, a 7.5MB reduction in system PSS.
Some of the savings are from removal of virtual methods and direct
methods object arrays.

Bug: 19264997
Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d
07276db28d654594e0e86e9e467cad393f752e6e 18-May-2015 Nicolas Geoffray <ngeoffray@google.com> Don't do a null test in MarkGCCard if the value cannot be null.

Change-Id: I45687f6d3505178e2fc3689eac9cb6ab1b2c1e29
7394569c9252b277710b2d7d3fc35fb0dd48fc4b 29-Apr-2015 Mark P Mendell <mark.p.mendell@intel.com> Revert "Revert "Revert "Revert "[optimizing] Improve x86 shifts""""

This reverts commit 2a7a1d7808f003bea908023ebd11eb442d2fca39.

Fix the problem that a long long >> 63 got the wrong answer. The
problem was that a shr was used instead of a sar.

Change-Id: I0327f79c718016ddec9272a605fc50ec15ec4566
2d27c8e338af7262dbd4aaa66127bb8fa1758b86 28-Apr-2015 Roland Levillain <rpl@google.com> Refactor InvokeDexCallingConventionVisitor in Optimizing.

Change-Id: I7ede0f59d5109644887bf5d39201d4e1bf043f34
2a7a1d7808f003bea908023ebd11eb442d2fca39 29-Apr-2015 Roland Levillain <rpl@google.com> Revert "Revert "Revert "[optimizing] Improve x86 shifts"""

This reverts commit 9b95a057ee20e4b1ca2e9c663726482172dc9ba3.

Reverting this CL as it breaks libcore tests:

org.apache.harmony.tests.java.lang.DoubleTest#test_compare
junit.framework.AssertionFailedError: compare() -0.0 should be less 0.0
at junit.framework.Assert.assertTrue(Assert.java:140)
at org.apache.harmony.tests.java.lang.DoubleTest.test_compare(DoubleTest.java:258)
org.apache.harmony.tests.java.lang.DoubleTest#test_compare FAIL (EXEC_FAILED)

org.apache.harmony.tests.java.lang.DoubleTest#test_compareToLjava_lang_Double
junit.framework.AssertionFailedError: Assert 2: compare() -0.0 should be less 0.0
at junit.framework.Assert.assertTrue(Assert.java:140)
at org.apache.harmony.tests.java.lang.DoubleTest.test_compareToLjava_lang_Double(DoubleTest.java:1320)
org.apache.harmony.tests.java.lang.DoubleTest#test_compareToLjava_lang_Double FAIL (EXEC_FAILED)

Change-Id: I10f0ec8cc9495cc225fef1940b3f1a9fe87d996f
9b95a057ee20e4b1ca2e9c663726482172dc9ba3 29-Apr-2015 Roland Levillain <rpl@google.com> Revert "Revert "[optimizing] Improve x86 shifts""

This reverts commit f9aac1e9f442c2486cd54f045d43e15791601205.

Don't use Location::Any() for the first input if the output is
Location::SameAsFirstInput().

Change-Id: I400834052b114abf0d616da1b4b6506f7bba10ab
232ade0b9401404ad4b61b1003551b58b96195a8 20-Apr-2015 Roland Levillain <rpl@google.com> Revert "Revert "Optimizing: Fix long-to-fp conversion on x86.""

This reverts commit 386ce406f150645158d6067c4e0a36565aefc44f.

Bug: 20413424
Change-Id: I6e93ff132907f2653f1ae12d6676ff2298f62ca1
ad4450e5c3ffaa9566216cc6fafbf5c11186c467 17-Apr-2015 Zheng Xu <zheng.xu@arm.com> Opt compiler: Implement parallel move resolver without using swap.

The algorithm of ParallelMoveResolverNoSwap() is almost the same as
ParallelMoveResolverWithSwap(), except the way we resolve the circular
dependency. NoSwap() uses additional scratch register to resolve the
circular dependency. For example, (0->1) (1->2) (2->0) will be performed
as (2->scratch) (1->2) (0->1) (scratch->0).

On architectures without swap register support, NoSwap() can reduce the
number of moves from 3x(N-1) to (N+1) when there is a circular dependency
with N moves.

And also, NoSwap() algorithm does not depend on architecture register
layout information, which means it can support register pairs on arm32
and X/W, D/S registers on arm64 without additional modification.

Change-Id: Idf56bd5469bb78c0e339e43ab16387428a082318
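
A standalone sketch of the scratch-based cycle resolution described above, modelling register contents as slots in an int array (not the actual ParallelMoveResolverNoSwap):

    #include <utility>
    #include <vector>

    // Resolves one cyclic dependency using a scratch location: N+1 moves for
    // an N-move cycle, e.g. (0->1)(1->2)(2->0) becomes
    // (2->scratch)(1->2)(0->1)(scratch->0).
    void ResolveCycle(std::vector<int>& regs,
                      const std::vector<std::pair<int, int>>& cycle) {
      int scratch = regs[cycle.back().first];   // break the cycle
      for (auto it = cycle.rbegin() + 1; it != cycle.rend(); ++it) {
        regs[it->second] = regs[it->first];     // each source is still live
      }
      regs[cycle.back().second] = scratch;      // finish the broken move
    }
    // e.g. regs {10, 20, 30} with cycle {{0,1},{1,2},{2,0}} ends as {30, 10, 20}.
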
e14590bdfed24df30e6b7545fc819ba03ff8bba1 15-Apr-2015 Guillaume Sanchez <guillaumesa@google.com> Revert "[optimizing] Improve x86 parallel moves/swaps"

This reverts commit a5c19ce8d200d68a528f2ce0ebff989106c4a933.

This commit introduces a performance regression on CaffeineLogic of 30%.

Change-Id: I917e206e249d44e1748537bc1b2d31054ea4959d
386ce406f150645158d6067c4e0a36565aefc44f 13-Apr-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Optimizing: Fix long-to-fp conversion on x86."

Test fails on arm.

This reverts commit 2d45b4df3838d9c0e5a213305ccd1d7009e01437.

Change-Id: Id2864917b52f7ffba459680303a2d15b34f16a4e
2d45b4df3838d9c0e5a213305ccd1d7009e01437 07-Apr-2015 Serguei Katkov <serguei.i.katkov@intel.com> Optimizing: Fix long-to-fp conversion on x86.

The long-to-fp conversion implemented using SSE loses precision.
A test is included. The CL uses the FPU to provide the correct result.

Change-Id: I8eaf3c46819a8cb52642a7e7d7c4e3e0edbc88db
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
f9aac1e9f442c2486cd54f045d43e15791601205 10-Apr-2015 Roland Levillain <rpl@google.com> Revert "[optimizing] Improve x86 shifts"

This reverts commit 222fcf96c9b73bbb739012575e7e413caf9348ec.

Reverting this CL as it is breaking a few tests (see http://build.chromium.org/p/client.art/builders/host-x86/builds/3251/steps/test%20optimizing/logs/stdio). Will investigate ASAP.

Change-Id: Iddd8363e83a24aa49fbdf0f0c9dc12e63b4848de
a5c19ce8d200d68a528f2ce0ebff989106c4a933 01-Apr-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Improve x86 parallel moves/swaps

Add a new constructor to ScratchRegisterScope that will supply a
register if there is a free one, but not spill to force one. Use this
to generate alternate code that doesn't use a temporary, as the
spill/restore of a register generates extra instructions that aren't
necessary on x86.

Here is the benefit for a 32 bit memory-to-memory exchange with no
free registers:
< 50 push eax
< 53 push ebx
< 8B44244C mov eax, [esp + 76]
< 8B5C246C mov ebx, [esp + 108]
< 8944246C mov [esp + 108], eax
< 895C244C mov [esp + 76], ebx
< 5B pop ebx
< 58 pop eax
---
> FF742444 push [esp + 68]
> FF742468 push [esp + 104]
> 8F44244C pop [esp + 72]
> 8F442468 pop [esp + 100]

Avoid using xchg instruction, as it is slow on smaller processors.

Change-Id: Id29ee3abd998577baaee552d55d23e60ae0c7871
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
222fcf96c9b73bbb739012575e7e413caf9348ec 30-Mar-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Improve x86 shifts

Support memory operands for integer shifts. Generate better code for
long shifts by constants.

Change-Id: Icc92fa1b59cc280d4894af6f054e19b01977d5ce
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
b19930c5cba3cf662dce5ee057fcc9829b4cbb9c 09-Apr-2015 Guillaume Sanchez <guillaumesa@google.com> Follow up of "div/rem on x86 and x86_64", to tidy up the code a little.

Change-Id: Ibf39cbc8ac1d773599d70be2cb1e941674b60f1d
0f88e87085b7cf6544dadff3f555773966a6853e 30-Mar-2015 Guillaume Sanchez <guillaumesa@google.com> Speedup div/rem by constants on x86 and x86_64

This is done using the algorithms in Hacker's Delight chapter 10.

Change-Id: I7bacefe10067569769ed31a1f7834f796fb41119
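
A standalone worked instance of the chapter 10 scheme for one divisor (signed 32-bit division by 7; magic constant 0x92492493 and shift 2 are the published values for this divisor):

    #include <cassert>
    #include <cstdint>

    // n / 7 as multiply-high + shift + sign fixup, replacing the idiv.
    int32_t DivBy7(int32_t n) {
      int32_t q = static_cast<int32_t>(
          (static_cast<int64_t>(n) * static_cast<int32_t>(0x92492493)) >> 32);
      q += n;                               // magic constant is negative
      q >>= 2;                              // shift amount for divisor 7
      q += static_cast<uint32_t>(q) >> 31;  // round toward zero when n < 0
      return q;
    }

    int main() {
      for (int32_t n = -100000; n <= 100000; ++n) assert(DivBy7(n) == n / 7);
    }
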
d43b3ac88cd46b8815890188c9c2b9a3f1564648 01-Apr-2015 Mingyao Yang <mingyao@google.com> Revert "Revert "Deoptimization-based bce.""

This reverts commit 0ba627337274ccfb8c9cb9bf23fffb1e1b9d1430.

Change-Id: I1ca10d15bbb49897a0cf541ab160431ec180a006
fb8d279bc011b31d0765dc7ca59afea324fd0d0c 01-Apr-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Implement x86/x86_64 math intrinsics

Implement floor/ceil/round/RoundFloat on x86 and x86_64.
Implement RoundDouble on x86_64.

Add support for roundss and roundsd on both architectures. Support them
in the disassembler as well.

Add the instruction set features for x86, as the 'round' instruction is
only supported if SSE4.1 is supported.

Fix the tests to handle the addition of passing the instruction set
features to x86 and x86_64.

Add assembler tests for roundsd and roundss to x86_64 assembler tests.

Change-Id: I9742d5930befb0bbc23f3d6c83ce0183ed9fe04f
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
d75948ac93a4a317feaf136cae78823071234ba5 27-Mar-2015 Nicolas Geoffray <ngeoffray@google.com> Intrinsify String.compareTo.

Change-Id: Ia540df98755ac493fe61bd63f0bd94f6d97fbb57
09ed1a3125849ec6ac07cb886e3c502e1dcfada2 25-Mar-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Implement X86 intrinsic support

Implement the supported intrinsics for X86.

Enhance the graph visualizer to print <U> for unallocated locations, to
allow calling the graph dumper from within register allocation for
debugging purposes.

Change-Id: I3b0319eb70a9a4ea228f67065b4c52d13a1ae775
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
0ba627337274ccfb8c9cb9bf23fffb1e1b9d1430 24-Mar-2015 Andreas Gampe <agampe@google.com> Revert "Deoptimization-based bce."

This breaks compiling the core image:

Error after BCE: art::SSAChecker: Instruction 219 in block 1 does not dominate use 221 in block 1.

This reverts commit e295e6ec5beaea31be5d7d3c996cd8cfa2053129.

Change-Id: Ieeb48797d451836ed506ccb940872f1443942e4e
e295e6ec5beaea31be5d7d3c996cd8cfa2053129 07-Mar-2015 Mingyao Yang <mingyao@google.com> Deoptimization-based bce.

A mechanism is introduced whereby a runtime method can be called
from code compiled with the optimizing compiler to deoptimize into the
interpreter. This can be used to establish invariants in the managed code.
If the invariant does not hold at runtime, we will deoptimize and continue
execution in the interpreter. This allows optimizing the managed code as
if the invariant had been proven during compile time. However, the exception
will be thrown according to the semantics demanded by the spec.

The invariant and optimization included in this patch are based on the
length of an array. Given a set of array accesses with constant indices
{c1, ..., cn}, we can optimize away all bounds checks iff 0 <= min(ci) and
max(ci) < array-length. The first condition can be proven statically. The second can be
established with a deoptimization-based invariant. This replaces n bounds
checks with one invariant check (plus slow-path code).

Change-Id: I8c6e34b56c85d25b91074832d13dba1db0a81569
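
A standalone illustration of the invariant above for constant indices {3, 5, 7}; Deoptimize() stands in for the runtime entry point that resumes in the interpreter:

    #include <stdexcept>
    #include <vector>

    [[noreturn]] void Deoptimize() { throw std::out_of_range("deoptimize"); }

    int SumFixedIndices(const std::vector<int>& a) {
      // One guard on max(ci) < length replaces three bounds checks;
      // min(ci) >= 0 is already provable statically.
      if (a.size() <= 7u) Deoptimize();
      return a[3] + a[5] + a[7];
    }
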
234d69d075d1608f80adb647f7935077b62b6376 09-Mar-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "[optimizing] Enable x86 long support.""

This reverts commit 154552e666347d41d95d7619c6ee56249ff4feca.

Change-Id: Idc726551c249a888b7ff5fde8508ae50e81b2e13
154552e666347d41d95d7619c6ee56249ff4feca 06-Mar-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "[optimizing] Enable x86 long support."

Few libcore failures.

This reverts commit b4ba354cf8d22b261205494875cc014f18587b50.

Change-Id: I4a28d853e730dff9b69aec9555505803cf2fcd63
b4ba354cf8d22b261205494875cc014f18587b50 05-Mar-2015 Nicolas Geoffray <ngeoffray@google.com> [optimizing] Enable x86 long support.

Change-Id: I9006972a65a1f191c45691104a960366747f9d16
dc23d8318db08cb42e20f1d16dbc416798951a8b 16-Feb-2015 Nicolas Geoffray <ngeoffray@google.com> Avoid generating jmp +0.

When a block branches to a non-following block, but blocks
in-between do branch to it, we can avoid doing the branch.

Change-Id: I9b343f662a4efc718cd4b58168f93162a24e1219
1cf95287364948689f6a1a320567acd7728e94a3 12-Dec-2014 Nicolas Geoffray <ngeoffray@google.com> Small optimization for recursive calls: avoid dex cache.

Change-Id: I044757a2f06e535cdc1480c4fc8182b89635baf6
7c8d009552545e6f1fd6036721e4e42e3fd14697 26-Jan-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing compiler] Support x86 hard float ABI

Add support for the new ABI passing FP parameters in XMM0-XMM3. This
allows us to optimize for x86 methods that don't use 'long'.

Change-Id: Ic79a24767173451e7d7095ccc2a00b307593a868
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
966c3ae95d3c699ee9fbdbccc1acdaaf02325faf 27-Jan-2015 Mark P Mendell <mark.p.mendell@intel.com> Revert "Revert "ART: Implement X86 hard float (Quick/JNI/Baseline)""

This reverts commit 949c91fb91f40a4a80b2b492913cf8541008975e.

This time, don't clobber EBX before saving it.

Redo some of the macros to make register usage explicit.

Change-Id: I8db8662877cd006816e16a28f42444ab7c36bfef
949c91fb91f40a4a80b2b492913cf8541008975e 27-Jan-2015 Vladimir Marko <vmarko@google.com> Revert "ART: Implement X86 hard float (Quick/JNI/Baseline)"

And the 3 Mac build fixes. Fix conflicts in context_x86.* .

This reverts commits
3d2c8e74c27efee58e24ec31441124f3f21384b9 ,
34eda1dd66b92a361797c63d57fa19e83c08a1b4 ,
f601d1954348b71186fa160a0ae6a1f4f1c5aee6 ,
bc503348a1da573488503cc2819c9e30807bea31 .

Bug: 19150481
Change-Id: I6650ee30a7d261159380fe2119e14379e4dc9970
3d2c8e74c27efee58e24ec31441124f3f21384b9 13-Jan-2015 Mark Mendell <mark.p.mendell@intel.com> ART: Implement X86 hard float (Quick/JNI/Baseline)

Use XMM0-XMM3 as parameter registers for float/double on X86. X86_64
already uses XMM0-XMM7 for parameters.

Change the 'hidden' argument register from XMM0 to XMM7 to avoid a
conflict.

Add support for FPR save/restore in runtime/arch/x86.

Minimal support for Optimizing baseline compiler.

Bump the version in runtime/oat.h because this is an ABI change.

Change-Id: Ia6fe150e8488b9e582b0178c0dda65fc81d5a8ba
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
d97dc40d186aec46bfd318b6a2026a98241d7e9c 22-Jan-2015 Nicolas Geoffray <ngeoffray@google.com> Support callee save floating point registers on x64.

- Share the computation of core_spill_mask and fpu_spill_mask
between backends.
- Remove explicit stack overflow check support: we need to adjust
them and since they are not tested, they will easily bitrot.

Change-Id: I0b619b8de4e1bdb169ea1ae7c6ede8df0d65837a
988939683c26c0b1c8808fc206add6337319509a 21-Jan-2015 Nicolas Geoffray <ngeoffray@google.com> Enable core callee-save on x64.

Will work on other architectures and FP support in other CLs.

Change-Id: I8cef0343eedc7202d206f5217fdf0349035f0e4d
24f2dfae084b2382c053f5d688fd6bb26cb8a328 15-Jan-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing compiler] Implement inline x86 FP '%'

Replace the calls to fmod/fmodf by inline code as is done in the Quick
compiler.

Remove the quick fmod/fmodf runtime entries, as they are no longer in
use.

64 bit code generator Move() routine needed to be enhanced to handle
constants, as Location::Any() allows them to be generated.

Change-Id: I6b6a42f6faeed4b0b3c940453e487daf5b25d184
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
cd6dffedf1bd8e6dfb3fb0c933551f9a90f7de3f 08-Jan-2015 Calin Juravle <calin@google.com> Add implicit null checks for the optimizing compiler

- for backends: arm, arm64, x86, x86_64
- fixed parameter passing for CodeGenerator
- 003-omnibus-opcodes test verifies that NullPointerExceptions work as
expected

Change-Id: I1b302acd353342504716c9169a80706cf3aba2c8
f85a9ca9859ad843dc03d3a2b600afbaf2e9bbdd 13-Jan-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing compiler] Compute live spill size

The current stack frame calculation assumes that each live register to
be saved/restored has the word size of the machine. This fails for X86,
where a double in an XMM register takes up 8 bytes. Change the
calculation to keep track of the number of core registers and number of
fp registers to handle this distinction.

This is slightly pessimal, as the registers may not be active at the
same time, but the only way to handle this would be to allocate both
classes of registers simultaneously, or remember all the active
intervals, matching them up and compute the size of each safepoint
interval.

Change-Id: If7860aa319b625c214775347728cdf49a56946eb
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
840e5461a85f8908f51e7f6cd562a9129ff0e7ce 07-Jan-2015 Nicolas Geoffray <ngeoffray@google.com> Implement double and float support for arm in register allocator.

The basic approach is:
- An instruction that needs two registers gets two intervals.
- When allocating the low part, we also allocate the high part.
- When splitting a low (or high) interval, we also split the high
(or low) equivalent.
- Allocation follows the (S/D register) requirement that low
registers are always even and the high equivalent is low + 1.

Change-Id: I06a5148e05a2ffc7e7555d08e871ed007b4c2797
52c489645b6e9ae33623f1ec24143cde5444906e 16-Dec-2014 Calin Juravle <calin@google.com> [optimizing compiler] Add support for volatile

- for backends: arm, x86, x86_64
- added necessary instructions to assemblies
- clean up code gen for field set/get
- fixed InstructionDataEquals for some instructions
- fixed comments in compiler_enums

* 003-opcode test verifies basic volatile functionality

Change-Id: I144393efa312dfb2c332cb84056b00edffee338a
9aec02fc5df5518c16f1e5a9b6cb198a192db973 19-Nov-2014 Calin Juravle <calin@google.com> [optimizing compiler] Add shifts

Added SHL, SHR, USHR for arm, x86, x86_64.

Change-Id: I971f594e270179457e6958acf1401ff7630df07e
86a8d7afc7f00ff0f5ea7b8aaf4d50514250a4e6 19-Nov-2014 Nicolas Geoffray <ngeoffray@google.com> Consistently use k{InstructionSet}WordSize.

These constants were defined prior to k{InstructionSet}PointerSize. So
use them consistently in optimizing as a first step. We can discuss
whether we should remove them in a second step.

Change-Id: If129de1a3bb8b65f8d9c816a8ad466815fb202e6
bacfec30ee9f2f6fdfd190f11b105b609938efca 14-Nov-2014 Calin Juravle <calin@google.com> [optimizing compiler] Add REM_INT, REM_LONG

- for arm, x86, x86_64
- minor cleanup/fix in div tests

Change-Id: I240874010206a5a9b3aaffbc81a885b94c248f93
f0e3937b87453234d0d7970b8712082062709b8d 12-Nov-2014 Nicolas Geoffray <ngeoffray@google.com> Do a parallel move in BoundsCheckSlowPath.

The two locations of the index and length could overlap,
so we need a parallel move. Also factorize the code for
doing a parallel move based on two locations.

Change-Id: Iee8b3459e2eed6704d45e9a564fb2cd050741ea4
9574c4b5f5ef039d694ac12c97e25ca02eca83c0 12-Nov-2014 Nicolas Geoffray <ngeoffray@google.com> Implement and/or/xor in optimizing.

Change-Id: I7cf6da1fd334a7177a5580931b8f174dd40b7cec
de58ab2c03ff8112b07ab827c8fa38f670dfc656 05-Nov-2014 Nicolas Geoffray <ngeoffray@google.com> Implement try/catch/throw in optimizing.

- We currently don't run optimizations in the presence of a try/catch.
- We therefore implement Quick's mapping table.
- Also fix a missing null check on array-length.

Change-Id: I6917dfcb868e75c1cf6eff32b7cbb60b6cfbd68f
424f676379f2f872acd1478672022f19f3240fc1 03-Nov-2014 Nicolas Geoffray <ngeoffray@google.com> Implement CONST_CLASS in optimizing compiler.

Change-Id: Ia8c8dfbef87cb2f7893bfb6e178466154eec9efd
19a19cffd197a28ae4c9c3e59eff6352fd392241 22-Oct-2014 Nicolas Geoffray <ngeoffray@google.com> Add support for static fields in optimizing compiler.

Change-Id: Id2f010589e2bd6faf42c05bb33abf6816ebe9fa9
102cbed1e52b7c5f09458b44903fe97bb3e14d5f 15-Oct-2014 Nicolas Geoffray <ngeoffray@google.com> Implement register allocator for floating point registers.

Also:
- Fix misuses of emitting the rex prefix in the x86_64 assembler.
- Fix movaps code generation in the x86_64 assembler.

Change-Id: Ib6dcf6e7c4a9c43368cfc46b02ba50f69ae69cbe
34bacdf7eb46c0ffbf24ba7aa14a904bc9176fb2 07-Oct-2014 Calin Juravle <calin@google.com> Add multiplication for integral types

This also fixes an issue where we could allocate a pair register even if
one of its parts was already blocked.

Change-Id: I4869175933409add2a56f1ccfb369c3d3dd3cb01
92a73aef279be78e3c2b04db1713076183933436 16-Oct-2014 Nicolas Geoffray <ngeoffray@google.com> Don't use assembler classes in code_generator.h.

The arm64 backend uses its own assembler and does not share
the same classes as the other backends. To avoid conflicts
or unnecessary mappings, just don't use those classes in the
shared part of the code generator.

Change-Id: I9e5fa40c1021d2e83a4ef14c52cd1ccd03f2f73d
71175b7f19a4f6cf9cc264feafd820dbafa371fb 09-Oct-2014 Nicolas Geoffray <ngeoffray@google.com> Cleanup baseline register allocator.

- Use three arrays for blocking registers instead of
one and computing offsets into that array.
- Don't pass blocked_registers_ to methods, just use the field.

Change-Id: Ib698564c31127c59b5a64c80f4262394b8394dc6
360231a056e796c36ffe62348507e904dc9efb9b 08-Oct-2014 Nicolas Geoffray <ngeoffray@google.com> Fix code generation of materialized conditions.

Move the logic for knowing if a condition needs to be materialized
in an optimization pass (so that the information does not change
as a side effect of another optimization).

Also clean-up arm and x86_64 codegen:
- arm: ldr and str are for power-users when a constant is
in play. We should use LoadFromOffset and StoreToOffset.
- x86_64: fix misuses of movq instead of movl.

Change-Id: I01a03b91803624be2281a344a13ad5efbf4f3ef3
56b9ee6fe1d6880c5fca0e7feb28b25a1ded2e2f 09-Oct-2014 Nicolas Geoffray <ngeoffray@google.com> Stop converting from Location to ManagedRegister.

Now the source of truth is the Location object that knows
which register (core, pair, fpu) it needs to refer to.

Change-Id: I62401343d7479ecfb24b5ed161ec7829cda5a0b1
7fb49da8ec62e8a10ed9419ade9f32c6b1174687 06-Oct-2014 Nicolas Geoffray <ngeoffray@google.com> Add support for floats and doubles.

- Follows Quick conventions.
- Currently only works with baseline register allocator.

Change-Id: Ie4b8e298f4f5e1cd82364da83e4344d4fc3621a3
3c04974a90b0e03f4b509010bff49f0b2a3da57f 24-Sep-2014 Nicolas Geoffray <ngeoffray@google.com> Optimize suspend checks in optimizing compiler.

- Remove the ones added during graph build (they were added
for the baseline code generator).
- Emit them at loop back edges after phi moves, so that the test
can directly jump to the loop header.
- Fix x86 and x86_64 suspend check by using cmpw instead of cmpl.

Change-Id: I6fad5795a55705d86c9e1cb85bf5d63dadfafa2a
3bca0df855f0e575c6ee020ed016999fc8f14122 19-Sep-2014 Nicolas Geoffray <ngeoffray@google.com> Support for saving and restoring live registers in a slow path.

And use it in suspend check slow paths.

Change-Id: I79caf28f334c145a36180c79a6e2fceae3990c31
e982f0b8e809cece6f460fa2d8df25873aa69de4 13-Aug-2014 Nicolas Geoffray <ngeoffray@google.com> Implement invoke virtual in optimizing compiler.

Also refactor 004 tests to make them work with both Quick and
Optimizing.

Change-Id: I87e275cb0ae0258fc3bb32b612140000b1d2adf8
3c7bb98698f77af10372cf31824d3bb115d9bf0f 23-Jul-2014 Nicolas Geoffray <ngeoffray@google.com> Implement array get and array put in optimizing.

Also fix a couple of assembler/disassembler issues.

Change-Id: I705c8572988c1a9c4df3172b304678529636d5f6
f12feb8e0e857f2832545b3f28d31bad5a9d3903 17-Jul-2014 Nicolas Geoffray <ngeoffray@google.com> Stack overflow checks and NPE checks for optimizing.

Change-Id: I59e97448bf29778769b79b51ee4ea43f43493d96
96f89a290eb67d7bf4b1636798fa28df14309cc7 11-Jul-2014 Nicolas Geoffray <ngeoffray@google.com> Add assembly operations with constants in optimizing compiler.

Change-Id: I5bcc35ab50d4457186effef5592a75d7f4e5b65f
ab032bc1ff57831106fdac6a91a136293609401f 15-Jul-2014 Nicolas Geoffray <ngeoffray@google.com> Fix a braino in the stack layout.

Also do some refactoring to have this code be just in CodeGenerator.

Change-Id: I88de109889138af8d60027973c12a64bee813cb7
e50383288a75244255d3ecedcc79ffe9caf774cb 04-Jul-2014 Nicolas Geoffray <ngeoffray@google.com> Support fields in optimizing compiler.

- Required support for temporaries, to be only used by baseline compiler.
- Also fixed a few invalid assumptions around locations and instructions
that don't need materialization. These instructions should not have an Out.

Change-Id: Idc4a30dd95dd18015137300d36bec55fc024cf62
412f10cfed002ab617c78f2621d68446ca4dd8bd 19-Jun-2014 Nicolas Geoffray <ngeoffray@google.com> Support longs in the register allocator for x86_64.

Change-Id: I7fb6dfb761bc5cf9e5705682032855a0a70ca867
86dbb9a12119273039ce272b41c809fa548b37b6 04-Jun-2014 Nicolas Geoffray <ngeoffray@google.com> Final CL to enable register allocation on x86.

This CL implements:
1) Resolution after allocation: connecting the locations
allocated to an interval within a block and between blocks.
2) Handling of fixed registers: some instructions require
inputs/output to be at a specific location, and the allocator
needs to deal with them in a special way.
3) ParallelMoveResolver::EmitNativeCode for x86.

Change-Id: I0da6bd7eb66877987148b87c3be6a983b4e3f858
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 03-Jun-2014 Tim Murray <timmurray@google.com> DO NOT MERGE

Merge ART from AOSP to lmp-preview-dev.

Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
a7062e05e6048c7f817d784a5b94e3122e25b1ec 22-May-2014 Nicolas Geoffray <ngeoffray@google.com> Add a linear scan register allocator to the optimizing compiler.

This is a "by-the-book" implementation. It currently only deals
with allocating registers, with no hint optimizations.

The changes remaining to make it functional are:
- Allocate spill slots.
- Resolution and placements of Move instructions.
- Connect it to the code generator.

Change-Id: Ie0b2f6ba1b98da85425be721ce4afecd6b4012a4
a7aca370a7d62ca04a1e24423d90e8020d6f1a58 28-Apr-2014 Nicolas Geoffray <ngeoffray@google.com> Setup policies for register allocation.

Change-Id: I857e77530fca3e2fb872fc142a916af1b48400dc
a747a392fb5f88d2ecc4c6021edf9f1f6615ba16 17-Apr-2014 Nicolas Geoffray <ngeoffray@google.com> Code cleanup in preparation for x64 backend.

- Use InvokeDexCallingConventionVisitor for setting
up HParameterValues
- Use kVregSize instead of kX86WordSize when dealing with
virtual registers.

Change-Id: Ia520223010194c70a3ff0ed659077f55cec4e7d8
01bc96d007b67fdb7fe349232a83e4b354ce3d08 11-Apr-2014 Nicolas Geoffray <ngeoffray@google.com> Long support in optimizing compiler.

- Add stack locations to the Location class.
- Change logic of parameter passing/setup by setting the
location of such instructions the ones for the calling
convention.

Change-Id: I4730ad58732813dcb9c238f44f55dfc0baa18799
707c809f661554713edfacf338365adca8dfd3a3 04-Apr-2014 Nicolas Geoffray <ngeoffray@google.com> Use target-specific word instead of runtime word.

Change-Id: Ia11dc3cc520a1a5c7bd017013e5699af9570ce91
4a34a428c6a2588e0857ef6baf88f1b73ce65958 03-Apr-2014 Nicolas Geoffray <ngeoffray@google.com> Support passing arguments to invoke-static* instructions.

- Stop using the frame pointer for accessing locals.
- Stop emulating a stack when doing code generation. Instead,
rely on dex register model, where instructions only reference
registers.

Change-Id: Id51bd7d33ac430cb87a53c9f4b0c864eeb1006f9
8ccc3f5d06fd217cdaabd37e743adab2031d3720 19-Mar-2014 Nicolas Geoffray <ngeoffray@google.com> Add support for invoke-static in optimizing compiler.

Support is limited to calls without parameters and returning
void. For simplicity, we currently follow the Quick ABI.

Change-Id: I54805161141b7eac5959f1cae0dc138dd0b2e8a5
787c3076635cf117eb646c5a89a9014b2072fb44 17-Mar-2014 Nicolas Geoffray <ngeoffray@google.com> Plug new optimizing compiler in compilation pipeline.

Also rename accessors to ART's conventions.

Change-Id: I344807055b98aa4b27215704ec362191464acecc
bab4ed7057799a4fadc6283108ab56f389d117d4 11-Mar-2014 Nicolas Geoffray <ngeoffray@google.com> More code generation for the optimizing compiler.

- Add HReturn instruction
- Generate code for locals/if/return
- Setup infrastructure for register allocation. Currently
emulate a stack.

Change-Id: Ib28c2dba80f6c526177ed9a7b09c0689ac8122fb
d4dd255db1d110ceb5551f6d95ff31fb57420994 28-Feb-2014 Nicolas Geoffray <ngeoffray@google.com> Add codegen support to the optimizing compiler.

Change-Id: I9aae76908ff1d6e64fb71a6718fc1426b67a5c28