History log of /art/compiler/optimizing/code_generator_arm64.h
Revision Date Author Comments
c393d63aa2b8f6984672fdd4de631bbeff14b6a2 15-Apr-2016 Alexandre Rames <alexandre.rames@linaro.org> Fix: correctly destruct VIXL labels.

(cherry picked from commit c01a66465a398ad15da90ab2bdc35b7f4a609b17)

Bug: 27505766
Change-Id: I077465e3d308f4331e7a861902e05865f9d99835
dee58d6bb6d567fcd0c4f39d8d690c3acaf0e432 07-Apr-2016 David Brazdil <dbrazdil@google.com> Revert "Revert "Refactor HGraphBuilder and SsaBuilder to remove HLocals""

This patch merges the instruction-building phases from HGraphBuilder
and SsaBuilder into a single HInstructionBuilder class. As a result,
it is not necessary to generate HLocal, HLoadLocal and HStoreLocal
instructions any more, as the builder produces SSA form directly.

Saves 5-15% of arena-allocated memory (see bug for more data):
GMS 20.46MB => 19.26MB (-5.86%)
Maps 24.12MB => 21.47MB (-10.98%)
YouTube 28.60MB => 26.01MB (-9.05%)

This CL fixed an issue with parsing quickened instructions.

Bug: 27894376
Bug: 27998571
Bug: 27995065

Change-Id: I20dbe1bf2d0fe296377478db98cb86cba695e694
60328910cad396589474f8513391ba733d19390b 04-Apr-2016 David Brazdil <dbrazdil@google.com> Revert "Refactor HGraphBuilder and SsaBuilder to remove HLocals"

Bug: 27995065
This reverts commit e3ff7b293be2a6791fe9d135d660c0cffe4bd73f.

Change-Id: I5363c7ce18f47fd422c15eed5423a345a57249d8
e3ff7b293be2a6791fe9d135d660c0cffe4bd73f 02-Mar-2016 David Brazdil <dbrazdil@google.com> Refactor HGraphBuilder and SsaBuilder to remove HLocals

This patch merges the instruction-building phases from HGraphBuilder
and SsaBuilder into a single HInstructionBuilder class. As a result,
it is not necessary to generate HLocal, HLoadLocal and HStoreLocal
instructions any more, as the builder produces SSA form directly.

Saves 5-15% of arena-allocated memory (see bug for more data):
GMS 20.46MB => 19.26MB (-5.86%)
Maps 24.12MB => 21.47MB (-10.98%)
YouTube 28.60MB => 26.01MB (-9.05%)

Bug: 27894376
Change-Id: Iefe28d40600c169c5d306fd2c77034ae19476d90
cac5a7e871f1f346b317894359ad06fa7bd67fba 22-Feb-2016 Vladimir Marko <vmarko@google.com> Optimizing: Improve const-string code generation.

For strings in the boot image, use either direct pointers
or pc-relative addresses. For other strings, use PC-relative
access to the dex cache arrays for AOT and direct address of
the string's dex cache slot for JIT.

For aosp_flounder-userdebug:
- 32-bit boot.oat: -692KiB (-0.9%)
- 64-bit boot.oat: -948KiB (-1.1%)
- 32-bit dalvik cache total: -900KiB (-0.9%)
- 64-bit dalvik cache total: -3672KiB (-1.5%)
(contains more files than the 32-bit dalvik cache)
For aosp_flounder-userdebug forced to compile PIC:
- 32-bit boot.oat: -380KiB (-0.5%)
- 64-bit boot.oat: -928KiB (-1.0%)
- 32-bit dalvik cache total: -468KiB (-0.4%)
- 64-bit dalvik cache total: -1928KiB (-0.8%)
(contains more files than the 32-bit dalvik cache)

Bug: 26884697
Change-Id: Iec7266ce67e6fedc107be78fab2e742a8dab2696
1a65388f1d86bb232c2e44fecb44cebe13105d2e 18-Mar-2016 Roland Levillain <rpl@google.com> Clean up art::HConstant predicates.

- Make the difference between arithmetic zero and zero-bit
pattern unambiguous.
- Introduce Boolean predicates in art::HIntConstant for when
they are used as Booleans.
- Introduce arithmetic positive and negative zero predicates
for floating-point constants.

Bug: 27639313
Change-Id: Ia04ecc6f6aa7450136028c5362ed429760c883bd
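
The distinction drawn above matters most for floating-point values: -0.0 is arithmetically zero but its bit pattern is not all zeros. The following is a minimal sketch of such predicates for `float`, assuming IEEE 754 binary32; the free-function names are illustrative for this sketch and are not presented as the actual art::HConstant interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Illustrative helper (assumption of this sketch): raw bits of a float.
inline uint32_t BitsOf(float value) {
  static_assert(sizeof(uint32_t) == sizeof(float), "float must be 32 bits wide");
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// True for both +0.0f and -0.0f: the value compares equal to zero.
inline bool IsArithmeticZero(float value) { return value == 0.0f; }

// True only for +0.0f: every bit of the representation is zero.
inline bool IsZeroBitPattern(float value) { return BitsOf(value) == 0u; }

// Arithmetic zero with the sign bit clear.
inline bool IsArithmeticPositiveZero(float value) {
  return IsArithmeticZero(value) && !std::signbit(value);
}

// Arithmetic zero with the sign bit set.
inline bool IsArithmeticNegativeZero(float value) {
  return IsArithmeticZero(value) && std::signbit(value);
}
```

With these definitions, `IsArithmeticZero(-0.0f)` holds while `IsZeroBitPattern(-0.0f)` does not, which is the ambiguity the commit sets out to remove.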
2ae48182573da7087bffc2873730bc758ec29696 16-Mar-2016 Calin Juravle <calin@google.com> Clean up NullCheck generation and record stats about it.

This removes redundant code from the generators and allows for easier
stat recording.

Change-Id: Iccd4368f9e9d87a6fecb863dee4e2145c97851c4
4a0dad67867f389e01a5a6c0fe381d210f687c0d 25-Jan-2016 Artem Udovichenko <artem.u@samsung.com> Revert "Revert "ARM/ARM64: Extend support of instruction combining.""

This reverts commit 6b5afdd144d2bb3bf994240797834b5666b2cf98.

Change-Id: Ic27a10f02e21109503edd64e6d73d1bb0c6a8ac6
9cd6d378bd573cdc14d049d32bdd22a97fa4d84a 09-Feb-2016 David Srbecky <dsrbecky@google.com> Associate slow paths with the instruction that they belong to.

Almost all slow paths already know the instruction they belong to,
this CL just moves the knowledge to the base class as well.

This is needed to be able to get the corresponding dex pc for a
slow path, which allows us to generate better native line numbers,
which in turn fixes some native debugging stepping issues.

Change-Id: I568dbe78a7cea6a43a4a71a014b3ad135782c270
c7098ff991bb4e00a800d315d1c36f52a9cb0149 09-Feb-2016 David Srbecky <dsrbecky@google.com> Remove HNativeDebugInfo from start of basic blocks.

We do not require full environment at the start of basic block.
The dex pc contained in basic block is sufficient for line mapping.

Change-Id: I5ba9e5f5acbc4a783ad544769f9a73bb33e2bafa
6e332529c33be4d7dae5dad3609a839f4c0d3bfc 02-Feb-2016 David Brazdil <dbrazdil@google.com> ART: Remove HTemporary

Change-Id: I21b984224370a9ce7a4a13a9652503cfb03c5f03
44015868a5ed9f6915d510ade42e84949b719e3a 22-Jan-2016 Roland Levillain <rpl@google.com> Revert "Revert "ARM64 Baker's read barrier fast path implementation.""

This reverts commit 28a2ff0bd6c30549f3f6465d8316f5707b1d072f.

Bug: 12687968
Change-Id: I6e25c70f303368629cdb1084f1d7039261cbb79a
6b5afdd144d2bb3bf994240797834b5666b2cf98 22-Jan-2016 Nicolas Geoffray <ngeoffray@google.com> Revert "ARM/ARM64: Extend support of instruction combining."

The test fails its checker parts.

This reverts commit debeb98aaa8950caf1a19df490f2ac9bf563075b.

Change-Id: I49929e15950c7814da6c411ecd2b640d12de80df
28a2ff0bd6c30549f3f6465d8316f5707b1d072f 21-Jan-2016 Mathieu Chartier <mathieuc@google.com> Revert "ARM64 Baker's read barrier fast path implementation."

This reverts commit c8f1df9965ca7f97ba9e6289f8c7a717765a59a9.

This breaks master.

Change-Id: Ic07f602af8732e2835bd11f65e3b9e766d3349c7
debeb98aaa8950caf1a19df490f2ac9bf563075b 11-Dec-2015 Ilmir Usmanov <i.usmanov@samsung.com> ARM/ARM64: Extend support of instruction combining.

Combine multiply instructions in the following way:
ARM64:
MUL/NEG -> MNEG
ARM32 (32-bit integers only):
MUL/ADD -> MLA
MUL/SUB -> MLS

Change-Id: If20f2d8fb060145ab6fbceeb5a8f1a3d02e0ecdb
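
As an illustration of why the combinations listed above preserve results, here is a small sketch that models the fused instructions and the two-instruction sequences they replace on 32-bit values. Unsigned arithmetic is used so that overflow wraps as it does in the hardware; the function names are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Fused forms (reference semantics of the single instructions).
inline uint32_t Mneg(uint32_t a, uint32_t b) { return 0u - a * b; }                  // -(a * b)
inline uint32_t Mla(uint32_t a, uint32_t b, uint32_t acc) { return acc + a * b; }   // acc + a * b
inline uint32_t Mls(uint32_t a, uint32_t b, uint32_t acc) { return acc - a * b; }   // acc - a * b

// Unfused forms: a multiply followed by a separate NEG, ADD or SUB.
inline uint32_t MulThenNeg(uint32_t a, uint32_t b) {
  uint32_t product = a * b;  // MUL
  return 0u - product;       // NEG
}
inline uint32_t MulThenAdd(uint32_t a, uint32_t b, uint32_t acc) {
  uint32_t product = a * b;  // MUL
  return acc + product;      // ADD
}
inline uint32_t MulThenSub(uint32_t a, uint32_t b, uint32_t acc) {
  uint32_t product = a * b;  // MUL
  return acc - product;      // SUB
}
```

Each fused function returns the same value as its unfused counterpart for all inputs, which is what allows the compiler to replace the pair when the intermediate product has no other use.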
c8f1df9965ca7f97ba9e6289f8c7a717765a59a9 20-Jan-2016 Roland Levillain <rpl@google.com> ARM64 Baker's read barrier fast path implementation.

Introduce an ARM64 fast path implementation in Optimizing
for Baker's read barriers (for both heap reference loads and
GC root loads). The marking phase of the read barrier is
performed by a slow path, invoking the runtime entry point
artReadBarrierMark.

Other read barrier algorithms continue to use the original
slow path based implementation, which has been renamed as
GenerateReadBarrierSlow/GenerateReadBarrierForRootSlow.

Bug: 12687968
Bug: 26601270
Change-Id: I60da15249b58a8ee1a065ed9be2c4e438ee17150
58282f4510961317b8d5a364a6f740a78926716f 14-Jan-2016 David Brazdil <dbrazdil@google.com> ART: Remove Baseline compiler

We don't need Baseline any more and it hasn't been maintained for
a while anyway. Let's remove it.

Change-Id: I442ed26855527be2df3c79935403a25b1ee55df6
42249c3602c3d0243396ee3627ffb5906aa77c1e 08-Jan-2016 Aart Bik <ajcbik@google.com> Reduce code size by sharing slow paths.

Rationale:
Sharing identical slow path code reduces code size.

Background:
Currently, slow paths with the same dex-pc, same physical register
spilling code, and identical stack maps are shared (making this
only useful for deopt slow paths). The newly introduced mechanism
is sufficiently general to allow future improvements by e.g.
allowing different dex-pc (by passing this to runtime) or even
the kind of slow paths (by passing runtime addresses to the slowpath).

Change-Id: I819615c47b4fd98440a241f681f93e4fc22d12e0
5f7b58ea1adfc0639dd605b65f59198d3763f801 23-Nov-2015 Vladimir Marko <vmarko@google.com> Rewrite HInstruction::Is/As<type>().

Make Is<type>() and As<type>() non-virtual for concrete
instruction types, relying on GetKind(), and mark GetKind()
as PURE to improve optimization opportunities. This reduces
the number of relocations in libart-compiler.so's .rel.dyn
section by ~4K, or ~44%, and in .data.rel.ro by ~18K, or
~65%. The file is 96KiB smaller for Nexus 5, including 8KiB
reduction of the .text section.

Unfortunately, the g++/clang++ __attribute__((pure)) is not
strong enough to avoid duplicated virtual calls and we would
need the C++ [[pure]] attribute proposed in n3744 instead.
To work around this deficiency, we introduce an extra
non-virtual indirection for GetKind(), so that the compiler
can optimize common expressions such as
instruction->IsAdd() || instruction->IsSub()
or
instruction->IsAdd() && instruction->AsAdd()->...
which contain two virtual calls to GetKind() after inlining.

Change-Id: I83787de0671a5cb9f5b0a5f4a536cef239d5b401
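
The shape of the change described above can be sketched as follows. This is a simplified, hypothetical class hierarchy, not the real HInstruction code: the only virtual call sits behind a non-virtual `GetKind()`, and the `Is`/`As` helpers are plain comparisons and casts. The compiler attribute mentioned in the message is omitted here.

```cpp
#include <cassert>

enum class Kind { kAdd, kSub };

class Add;  // Forward declaration for AsAdd().

class Instruction {
 public:
  virtual ~Instruction() {}

  // Non-virtual indirection: the single place that makes the virtual call.
  Kind GetKind() const { return GetKindInternal(); }

  // Non-virtual predicates that rely only on GetKind().
  bool IsAdd() const { return GetKind() == Kind::kAdd; }
  bool IsSub() const { return GetKind() == Kind::kSub; }

  // Checked downcast; returns nullptr when the kind does not match.
  const Add* AsAdd() const;

 private:
  virtual Kind GetKindInternal() const = 0;
};

class Add final : public Instruction {
 private:
  Kind GetKindInternal() const override { return Kind::kAdd; }
};

class Sub final : public Instruction {
 private:
  Kind GetKindInternal() const override { return Kind::kSub; }
};

inline const Add* Instruction::AsAdd() const {
  return IsAdd() ? static_cast<const Add*>(this) : nullptr;
}
```

An expression such as `instruction->IsAdd() || instruction->IsSub()` now goes through the same non-virtual `GetKind()` twice, which gives the compiler a chance to merge the two calls after inlining.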
22ccc3a93d32fa6991535eaebb17daf5abaf4ebf 24-Nov-2015 Roland Levillain <rpl@google.com> ARM64 read barrier support for concurrent GC in Optimizing.

This first implementation uses slow paths to instrument heap
reference loads and GC root loads for the concurrent copying
collector, respectively calling the artReadBarrierSlow and
artReadBarrierForRootSlow runtime entry points.

Notes:
- This implementation does not instrument HInvokeVirtual
nor HInvokeInterface instructions (for class reference
loads), as the corresponding read barriers are not strictly
required with the current concurrent copying collector.
- Intrinsics which may eventually call (on slow path) are
disabled when read barriers are enabled, as the current
slow path infrastructure does not support this case.
- When read barriers are enabled, the code generated for a
HArraySet instruction always goes into the array set slow
path for object arrays (delegating the operation to the
runtime), as we are lacking a mechanism to keep a
temporary register live across a runtime call (needed for
the instrumentation of type checking code, which requires
two successive read barriers).

Bug: 12687968
Change-Id: Icfb74f67bf23ae80e7723ee6a0c9ff34ba325d48
3927c8b8361336f1b16aae6eb2ed7577b20560f4 18-Nov-2015 Zheng Xu <zheng.xu@linaro.org> Opt compiler: Arm64 packed-switch jump tables.

In this patch, we set a rough threshold and only generate a jump table
when the graph contains a limited number of HIRs. This is because the
current VIXL can only handle Adr with a label in the range of +/-1MB.

Change-Id: I42bff2095ec26caeacc5efc90afebe34e229b518
0debae7bc89eb05f7a2bf7dccd223318fad7c88d 12-Nov-2015 David Brazdil <dbrazdil@google.com> ART: Refactor GenerateTestAndBranch

Each code generator implements a method for generating condition
evaluation and branching to arbitrary labels. This patch refactors
it for better clarity but also to generate fewer jumps when the true
branch is the fallthrough successor.

This is preliminary work for implementing HSelect.

Change-Id: Iaa545a5ecbacb761c5aa241fa69140cf6eb5952f
6dc01748c61a7ad41d4ab701d3e27897bd39a899 12-Nov-2015 Alexandre Rames <alexandre.rames@linaro.org> Minor fixes and cleaning of arm64 static and direct calls code.

Fixes:
The proper way to prevent the MacroAssembler from generating code before
or after an instruction is to block the pools (usually via
`vixl::BlockPoolsScope`). Here we can use
`vixl::SingleEmissionCheckScope`, which checks that we generate only one
instruction and also blocks the pools.
In practice the current code would have worked fine because VIXL would
not have generated anything after `Bl()` or `Ldr()`, but that was not
guaranteed.

Cleaning:
- `XRegisterFrom()` returns an X register. Calling `.X()` is not
required.
- Since we are sure (after the previous fixes) that nothing will be
emitted around the instructions we care about, update the code to
bind labels before the instructions for simplicity.

Change-Id: I42d49976721e380e66bcd7a5b345f1777009434a
0f7dca4ca0be8d2f8776794d35edf8b51b5bc997 02-Nov-2015 Vladimir Marko <vmarko@google.com> Optimizing/X86: PC-relative dex cache array addressing.

Add PC-relative dex cache array addressing for X86 and use
it for better invoke-static/-direct dispatch. Also delay
the initialization to the PC-relative base until needed.

Change-Id: Ib8634d5edce4920cd70172fd13211809cf6948d1
dc151b2346bb8a4fdeed0c06e54c2fca21d59b5d 15-Oct-2015 Vladimir Marko <vmarko@google.com> Optimizing: Determine invoke-static/-direct dispatch early.

Determine the dispatch type of invoke-static/-direct in a
special pass right after the type inference. This allows the
inliner to pass the "needs dex cache" check and inline more.
It also allows the code generator to avoid requesting a
register location for the ArtMethod* for kDexCachePcRelative
and direct methods.

The supported dispatch check also handles situations that
the CompilerDriver currently doesn't allow. The cleanup of
the CompilerDriver and required changes to Quick will come
in a separate change.

Change-Id: I3f8e903a119949e95871d8ab0a995f4731a13a07
e6dbf48d7a549e58a3d798bbbdc391e4d091b432 19-Oct-2015 Alexandre Rames <alexandre.rames@linaro.org> ARM64: Instruction simplification for array accesses.

HArrayGet and HArraySet with variable indexes generate two
instructions on arm64, like

add temp, obj, #data_offset
ldr out, [temp, index LSL #shift_amount]

When we have multiple accesses to the same array, the initial `add`
instruction is redundant.

This patch introduces the first instruction simplification in the
arm64-specific instruction simplification pass. It splits HArrayGet
and HArraySet using the new arm64-specific IR HIntermediateAddress.
After that we run GVN again to squash the multiple occurrences of
HIntermediateAddress.

Change-Id: I2e3d12fbb07fed07b2cb2f3f47f99f5a032f8312
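
The address arithmetic behind the two-instruction sequence above can be modelled as below. This is only a numeric sketch with hypothetical function names; it shows that the `add` result depends on the array alone, so several accesses to the same array can share it.

```cpp
#include <cassert>
#include <cstdint>

// Models `add temp, obj, #data_offset`: depends only on the array object.
inline uint64_t IntermediateAddress(uint64_t obj, uint64_t data_offset) {
  return obj + data_offset;
}

// Models the address operand of `ldr out, [temp, index LSL #shift_amount]`.
inline uint64_t ElementAddress(uint64_t temp, uint64_t index, unsigned shift_amount) {
  return temp + (index << shift_amount);
}
```

For two accesses `a[i]` and `a[j]`, `IntermediateAddress` is evaluated once and the result is passed to `ElementAddress` twice, which mirrors what squashing the repeated HIntermediateAddress nodes achieves.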
e460d1df1f789c7c8bb97024a8efbd713ac175e9 29-Sep-2015 Calin Juravle <calin@google.com> Revert "Revert "Support unresolved fields in optimizing""

The CL also changes the calling convention for 64-bit static field set
to use kArg2 instead of kArg1. This allows optimizing to keep
the assumptions:
- arm pairs are always of form (even_reg, odd_reg)
- ecx_edx is not used as a register on x86.

This reverts commit e6f49b47b6a4dc9c7684e4483757872cfc7ff1a1.

Change-Id: I93159917565824084abc96775f31be1a4249f2f3
225b6464a58ebe11c156144653f11a1c6607f4eb 28-Sep-2015 Vladimir Marko <vmarko@google.com> Optimizing: Tag arena allocations in code generators.

And completely remove the deprecated GrowableArray.

Replace GrowableArray with ArenaVector in code generators
and related classes and tag arena allocations.

Label arrays use direct allocations from ArenaAllocator
because Label is non-copyable and non-movable and as such
cannot be really held in a container. The GrowableArray
never actually constructed them, instead relying on the
zero-initialized storage from the arena allocator to be
correct. We now actually construct the labels.

Also avoid StackMapStream::ComputeDexRegisterMapSize() being
passed null references, even though unused.

Change-Id: I26a46fdd406b23a3969300a67739d55528df8bf4
85b62f23fc6dfffe2ddd3ddfa74611666c9ff41d 09-Sep-2015 Andreas Gampe <agampe@google.com> ART: Refactor intrinsics slow-paths

Refactor slow paths so that there is a default implementation for
common cases (only arm64 with vixl is special). Write a generic
intrinsic slow-path that can be reused for the specific architectures.
Move helper functions into CodeGenerator so that they are accessible.

Change-Id: Ibd788dce432601c6a9f7e6f13eab31f28dcb8550
e6f49b47b6a4dc9c7684e4483757872cfc7ff1a1 17-Sep-2015 Calin Juravle <calin@google.com> Revert "Support unresolved fields in optimizing"
breaks debuggable tests.

This reverts commit 23a8e35481face09183a24b9d11e505597c75ebb.

Change-Id: I8e60b5c8f48525975f25d19e5e8066c1c94bd2e5
23a8e35481face09183a24b9d11e505597c75ebb 08-Sep-2015 Calin Juravle <calin@google.com> Support unresolved fields in optimizing

Change-Id: I9941fa5fcb6ef0a7a253c7a0b479a44a0210aad4
175dc732c80e6f2afd83209348124df349290ba8 25-Aug-2015 Calin Juravle <calin@google.com> Support unresolved methods in Optimizing

Change-Id: If2da02b50d2fa668cd58f134a005f1752e7746b1
fa6b93c4b69e6d7ddfa2a4ed0aff01b0608c5a3a 15-Sep-2015 Vladimir Marko <vmarko@google.com> Optimizing: Tag arena allocations in HGraph.

Replace GrowableArray with ArenaVector in HGraph and related
classes HEnvironment, HLoopInformation, HInvoke and HPhi,
and tag allocations with new arena allocation types.

Change-Id: I3d79897af405b9a1a5b98bfc372e70fe0b3bc40d
bfb5ba90cd6425ce49c2125a87e3b12222cc2601 01-Sep-2015 Andreas Gampe <agampe@google.com> Revert "Revert "Do a second check for testing intrinsic types.""

This reverts commit a14b9fef395b94fa9a32147862c198fe7c22e3d7.

When an intrinsic with invoke-type virtual is recognized, replace
the instruction with a new HInvokeStaticOrDirect.

Minimal update for dex-cache rework. Fix includes.

Change-Id: I1c8e735a2fa7cda4419f76ca0717125ef236d332
ecc4366670e12b4812ef1653f7c8d52234ca1b1f 13-Aug-2015 Serban Constantinescu <serban.constantinescu@linaro.org> Add OptimizingCompilerStats to the CodeGenerator class.

Just refactoring, not yet used, but will be used by the incoming patch
series and future CodeGen specific stats.

Change-Id: I7d20489907b82678120518a77bdab9c4cc58f937
Signed-off-by: Serban Constantinescu <serban.constantinescu@linaro.org>
581550137ee3a068a14224870e71aeee924a0646 19-Aug-2015 Vladimir Marko <vmarko@google.com> Revert "Revert "Optimizing: Better invoke-static/-direct dispatch.""

Fixed kCallArtMethod to use correct callee location for
kRecursive. This combination is used when compiling with
debuggable flag set.

This reverts commit b2c431e80e92eb6437788cc544cee6c88c3156df.

Change-Id: Idee0f2a794199ebdf24892c60f8a5dcf057db01c
b2c431e80e92eb6437788cc544cee6c88c3156df 19-Aug-2015 Vladimir Marko <vmarko@google.com> Revert "Optimizing: Better invoke-static/-direct dispatch."

Reverting due to failing ndebug tests.

This reverts commit 9b688a095afbae21112df5d495487ac5231b12d0.

Change-Id: Ie4f69da6609df3b7c8443412b6cf7f5c43c2c5d9
9b688a095afbae21112df5d495487ac5231b12d0 06-May-2015 Vladimir Marko <vmarko@google.com> Optimizing: Better invoke-static/-direct dispatch.

Add framework for different types of loading ArtMethod*
and code pointer retrieval. Implement invoke-static and
invoke-direct calls the same way as Quick. Document the
dispatch kinds in HInvokeStaticOrDirect's new enumerations
MethodLoadKind and CodePtrLocation.

PC-relative loads from dex cache arrays are used only for
x86-64 and arm64. The implementation for other architectures
will be done in separate CLs.

Change-Id: I468ca4d422dbd14748e1ba6b45289f0d31734d94
3887c468d731420e929e6ad3acf190d5431e94fc 12-Aug-2015 Roland Levillain <rpl@google.com> Remove unnecessary `explicit` qualifiers on constructors.

Change-Id: Id12e392ad50f66a6e2251a68662b7959315dc567
fc6a86ab2b70781e72b807c1798b83829ca7f931 26-Jun-2015 David Brazdil <dbrazdil@google.com> Revert "Revert "ART: Implement try/catch blocks in Builder""

This patch enables the GraphBuilder to generate blocks and edges which
represent the exceptional control flow when try/catch blocks are
present in the code. Actual compilation is still delegated to Quick
and Baseline ignores the additional code.

To represent the relationship between try and catch blocks, Builder
splits the edges which enter/exit a try block and links the newly
created blocks to the corresponding exception handlers. This layout
will later enable the SsaBuilder to correctly infer the dominators of
the catch blocks and to produce the appropriate reverse post ordering.
It will not, however, allow for building the complete SSA form of the
catch blocks and consequently optimizing such blocks.

To this end, a new TryBoundary control-flow instruction is introduced.
Codegen treats it the same as a Goto but it allows for additional
successors (the handlers).

This reverts commit 3e18738bd338e9f8363b26bc895f38c0ec682824.

Change-Id: I4f5ea961848a0b83d8db3673763861633e9bfcfb
3e18738bd338e9f8363b26bc895f38c0ec682824 26-Jun-2015 David Brazdil <dbrazdil@google.com> Revert "ART: Implement try/catch blocks in Builder"

Causes OutOfMemory issues, need to investigate.

This reverts commit 0b5c7d1994b76090afcc825e737f2b8c546da2f8.

Change-Id: I263e6cc4df5f9a56ad2ce44e18932ca51d7e349f
0b5c7d1994b76090afcc825e737f2b8c546da2f8 11-Jun-2015 David Brazdil <dbrazdil@google.com> ART: Implement try/catch blocks in Builder

This patch enables the GraphBuilder to generate blocks and edges which
represent the exceptional control flow when try/catch blocks are
present in the code. Actual compilation is still delegated to Quick
and Baseline ignores the additional code.

To represent the relationship between try and catch blocks, Builder
splits the edges which enter/exit a try block and links the newly
created blocks to the corresponding exception handlers. This layout
will later enable the SsaBuilder to correctly infer the dominators of
the catch blocks and to produce the appropriate reverse post ordering.
It will not, however, allow for building the complete SSA form of the
catch blocks and consequently optimizing such blocks.

To this end, a new TryBoundary control-flow instruction is introduced.
Codegen treats it the same as a Goto but it allows for additional
successors (the handlers).

Change-Id: I415b985596d5bebb7b1bb358a46e08b7b04bb53a
eb7b7399dbdb5e471b8ae00a567bf4f19edd3907 19-Jun-2015 Alexandre Rames <alexandre.rames@linaro.org> Opt compiler: Add disassembly to the '.cfg' output.

This is automatically added to the '.cfg' output when using the usual
`--dump-cfg` option.

Change-Id: I864bfc3a8299c042e72e451cc7730ad8271e4deb
ef20f71e16f035a39a329c8524d7e59ca6a11f04 09-Jun-2015 Alexandre Rames <alexandre.rames@linaro.org> Add boilerplate code for architecture-specific HInstructions.

Change-Id: I2723cd96e5f03012c840863dd38d7b2168117db8
69aa60163989c33a008115205d39732a76ecc1dc 09-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Pass current method to HNewInstance and HNewArray.""

Problem exposed by this change was fixed in:
https://android-review.googlesource.com/#/c/154031/

This reverts commit 7b0e353b49ac3f464c662f20e20e240f0231afff.

Change-Id: I680c13dc9db9ba223ab11c7af255222860b4e6d2
7b0e353b49ac3f464c662f20e20e240f0231afff 09-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Pass current method to HNewInstance and HNewArray."

082-inline-execute fails on x86.

This reverts commit e21aa42e1341d34250742abafdd83311ad9fa737.

Change-Id: Ib3fd25faee2e0128001e40d3d51a74f959bc4449
94015b939060f5041d408d48717f22443e55b6ad 04-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Use HCurrentMethod in HInvokeStaticOrDirect.""

Fix was to special case baseline for x86, which does not have enough
registers to allocate the current method.

This reverts commit c345f141f11faad177aa9635a78088d00cf66086.

Change-Id: I5997aa52f8d4df373ae5ff4d4150dac0c44c4c10
e21aa42e1341d34250742abafdd83311ad9fa737 08-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Pass current method to HNewInstance and HNewArray.

Also remove unused CodeGenerator::LoadCurrentMethod.

Change-Id: I4b8d3f2a30b8e2c76b6b329a72555483c993cb73
c345f141f11faad177aa9635a78088d00cf66086 04-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Use HCurrentMethod in HInvokeStaticOrDirect."

Fails on baseline/x86.

This reverts commit 38207af82afb6f99c687f64b15601ed20d82220a.

Change-Id: Ib71018367eb7c6046965494a7e996c22af3de403
38207af82afb6f99c687f64b15601ed20d82220a 01-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Use HCurrentMethod in HInvokeStaticOrDirect.

Change-Id: I0d15244b6b44c8b10079398c55da5071a3e3af66
fd88f16100cceafbfde1b4f095f17e89444d6fa8 03-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Factorize code for common LocationSummary of HInvoke.

This is one step forward, we could factorize more, but
I wanted to get this out of the way first.

Change-Id: I6ae411a737eebaecb64974f47af507ce0cfbae85
3d21bdf8894e780d349c481e5c9e29fe1556051c 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Move mirror::ArtMethod to native

Optimizing + quick tests are passing, devices boot.

TODO: Test and fix bugs in mips64.

Saves 16 bytes for most ArtMethods, 7.5MB reduction in system PSS.
Some of the savings are from removal of virtual methods and direct
methods object arrays.

Bug: 19264997

(cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33)

Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d

Fix some ArtMethod related bugs

Added root visiting for runtime methods, not currently required
since the GcRoots in these methods are null.

Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes
--trace run-tests 005, 044.

Fixed optimizing compiler bug where we used a normal stack location
instead of double on ARM64, this fixes the debuggable tests.

TODO: Fix JDWP tests.

Bug: 19264997

Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3

ART: Fix casts for 64-bit pointers on 32-bit compiler.

Bug: 19264997
Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457

Fix JDWP tests after ArtMethod change

Fixes Throwable::GetStackDepth for exception event detection after
internal stack trace representation change.

Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of
proxy method.

Bug: 19264997
Change-Id: I363e293796848c3ec491c963813f62d868da44d2

Fix accidental IMT and root marking regression

Was always using the conflict trampoline. Also included fix for
regression in GC time caused by extra roots. Most of the regression
was IMT.

Fixed bug in DumpGcPerformanceInfo where we would get SIGABRT due to
detached thread.

EvaluateAndApplyChanges:
From ~2500 -> ~1980
GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots

Bug: 19264997
Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0

Fix bogus image test assert

Previously we were comparing the size of the non moving space to
size of the image file.

Now we properly compare the size of the image space against the size
of the image file.

Bug: 19264997
Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a

[MIPS64] Fix art_quick_invoke_stub argument offsets.

ArtMethod reference's size got bigger, so we need to move other args
and leave enough space for ArtMethod* and 'this' pointer.

This fixes mips64 boot.

Bug: 19264997
Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
e401d146407d61eeb99f8d6176b2ac13c4df1e33 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Move mirror::ArtMethod to native

Optimizing + quick tests are passing, devices boot.

TODO: Test and fix bugs in mips64.

Saves 16 bytes for most ArtMethods, 7.5MB reduction in system PSS.
Some of the savings are from removal of virtual methods and direct
methods object arrays.

Bug: 19264997
Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d
9bd88b0933a372e6a7b64b850868e6a7998567e2 22-Apr-2015 Serban Constantinescu <serban.constantinescu@linaro.org> ARM64: Move xSELF from x18 to x19.

This patch moves xSELF to callee saved x19 and removes support for
ETR (external thread register), previously used across native calls.

Change-Id: Icee07fbb9292425947f7de33d10a0ddf98c7899b
Signed-off-by: Serban Constantinescu <serban.constantinescu@linaro.org>
07276db28d654594e0e86e9e467cad393f752e6e 18-May-2015 Nicolas Geoffray <ngeoffray@google.com> Don't do a null test in MarkGCCard if the value cannot be null.

Change-Id: I45687f6d3505178e2fc3689eac9cb6ab1b2c1e29
c66671076b12a0ee8b9d1ae782732cc91beacb73 15-May-2015 Zheng Xu <zheng.xu@arm.com> Opt compiler: Speedup div/rem by constants on arm32 and arm64.

This patch also includes:
1. Add java test for div/rem negative constants.
2. Fix a thumb2 encoding issue where the last operand is
"reg, shift #amount" in some instructions.
3. Support a simple filter in arm32 assembler test to filter out
unsupported cases, such as "smull r0, r0, r1, r2".
4. Add smull arm32 assembler test.
5. Add smull/umull thumb2 test.
6. Add test for the thumb2 encoding issue which is fixed in this
patch.

Change-Id: I1601bc9c38f70f11909f2816fe3ec105a158951e
2d27c8e338af7262dbd4aaa66127bb8fa1758b86 28-Apr-2015 Roland Levillain <rpl@google.com> Refactor InvokeDexCallingConventionVisitor in Optimizing.

Change-Id: I7ede0f59d5109644887bf5d39201d4e1bf043f34
da40309f61f98c16d7d58e4c34cc0f5eef626f93 24-Apr-2015 Zheng Xu <zheng.xu@arm.com> Opt compiler: ARM64: Use ldp/stp on arm64 for slow paths.

It should be a bit faster than loading/storing single registers and
should reduce the code size.

Change-Id: I67b8302adf6174b7bb728f7c2afd2c237e34ffde
09a99965bb27649f5b1d373f76bfbec6a2500c9e 15-Apr-2015 Alexandre Rames <alexandre.rames@arm.com> Opt compiler: ARM64: Follow other archs for a few codegen stubs.

Code generation for HInstanceFieldGet, HInstanceFieldSet,
HStaticFieldGet, and HStaticFieldSet are refactored to follow the
structure used for other backends.

Change-Id: I34a3bd17effa042238c6bf199848cbc2ec26ac5d
ad4450e5c3ffaa9566216cc6fafbf5c11186c467 17-Apr-2015 Zheng Xu <zheng.xu@arm.com> Opt compiler: Implement parallel move resolver without using swap.

The algorithm of ParallelMoveResolverNoSwap() is almost the same as that of
ParallelMoveResolverWithSwap(), except for the way we resolve circular
dependencies. NoSwap() uses an additional scratch register to resolve the
circular dependency. For example, (0->1) (1->2) (2->0) will be performed
as (2->scratch) (1->2) (0->1) (scratch->0).

On architectures without swap register support, NoSwap() can reduce the
number of moves from 3x(N-1) to (N+1) when there is a circular dependency
with N moves.

Also, the NoSwap() algorithm does not depend on architecture register
layout information, which means it can support register pairs on arm32
and X/W, D/S registers on arm64 without additional modification.

Change-Id: Idf56bd5469bb78c0e339e43ab16387428a082318
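
The scratch-based cycle resolution described above can be sketched as follows. This is a simplified model, not the ParallelMoveResolverNoSwap() implementation: locations are plain integers, `kScratch` and `SequenceWithScratch` are names invented for this sketch, and it assumes every destination appears at most once in the input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Move {
  int src;
  int dst;
};

// Hypothetical identifier for the scratch location.
constexpr int kScratch = -1;

// A move is blocked while another pending move still has to read its destination.
inline bool IsBlocked(const std::vector<Move>& pending, std::size_t i) {
  for (std::size_t j = 0; j < pending.size(); ++j) {
    if (j != i && pending[j].src == pending[i].dst) return true;
  }
  return false;
}

// Turns a set of parallel moves into a sequence that is safe to execute in order.
inline std::vector<Move> SequenceWithScratch(std::vector<Move> pending) {
  std::vector<Move> emitted;
  while (!pending.empty()) {
    bool progressed = false;
    for (std::size_t i = 0; i < pending.size(); ++i) {
      if (!IsBlocked(pending, i)) {
        emitted.push_back(pending[i]);
        pending.erase(pending.begin() + static_cast<std::ptrdiff_t>(i));
        progressed = true;
        break;
      }
    }
    if (!progressed) {
      // Only cycles remain. Save the source of one move in the scratch
      // location and redirect that move to read from the scratch instead.
      Move& last = pending.back();
      assert(last.src != kScratch);  // The scratch must be free at this point.
      emitted.push_back({last.src, kScratch});
      last.src = kScratch;
    }
  }
  return emitted;
}
```

For the input (0->1) (1->2) (2->0), the sketch emits (2->scratch) (1->2) (0->1) (scratch->0): four moves for a cycle of three, in line with the N+1 figure quoted above.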
69a503050fb8a7b3a79b2cd2cdc2d8fbc594575d 14-Apr-2015 Zheng Xu <zheng.xu@arm.com> ARM64: Remove suspend register.

It also cleans up build/remove frame used by the JNI compiler and generates
stp/ldp instead of str/ldr. Also, x19 has been unblocked in both the quick and
optimizing compilers.

Change-Id: Idbeac0942265f493266b2ef9b7a65bb4054f0e2d
c6b4dd8980350aaf250f0185f73e9c42ec17cd57 07-Apr-2015 David Srbecky <dsrbecky@google.com> Implement CFI for Optimizing.

CFI is necessary for stack unwinding in gdb, lldb, and libunwind.

Change-Id: I1a3480e3a4a99f48bf7e6e63c4e83a80cfee40a2
d43b3ac88cd46b8815890188c9c2b9a3f1564648 01-Apr-2015 Mingyao Yang <mingyao@google.com> Revert "Revert "Deoptimization-based bce.""

This reverts commit 0ba627337274ccfb8c9cb9bf23fffb1e1b9d1430.

Change-Id: I1ca10d15bbb49897a0cf541ab160431ec180a006
82e52ce8364e3e1c644d0d3b3b4f61364bf7089a 26-Mar-2015 Serban Constantinescu <serban.constantinescu@arm.com> ARM64: Update to VIXL 1.9.

Update VIXL's interface to VIXL 1.9.

Change-Id: Iebae947539cbad65488b7195aaf01de284b71cbb
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
d75948ac93a4a317feaf136cae78823071234ba5 27-Mar-2015 Nicolas Geoffray <ngeoffray@google.com> Intrinsify String.compareTo.

Change-Id: Ia540df98755ac493fe61bd63f0bd94f6d97fbb57
0ba627337274ccfb8c9cb9bf23fffb1e1b9d1430 24-Mar-2015 Andreas Gampe <agampe@google.com> Revert "Deoptimization-based bce."

This breaks compiling the core image:

Error after BCE: art::SSAChecker: Instruction 219 in block 1 does not dominate use 221 in block 1.

This reverts commit e295e6ec5beaea31be5d7d3c996cd8cfa2053129.

Change-Id: Ieeb48797d451836ed506ccb940872f1443942e4e
e295e6ec5beaea31be5d7d3c996cd8cfa2053129 07-Mar-2015 Mingyao Yang <mingyao@google.com> Deoptimization-based bce.

This introduces a mechanism by which code compiled with the optimizing
compiler can call a runtime method to deoptimize into the interpreter.
This can be used to establish invariants in the managed code.
If the invariant does not hold at runtime, we will deoptimize and continue
execution in the interpreter. This allows us to optimize the managed code as
if the invariant was proven during compile time. However, the exception
will be thrown according to the semantics demanded by the spec.

The invariant and optimization included in this patch are based on the
length of an array. Given a set of array accesses with constant indices
{c1, ..., cn}, we can optimize away all bounds checks iff 0 <= min(ci) and
max(ci) < array-length. The first can be proven statically. The second can be
established with a deoptimization-based invariant. This replaces n bounds
checks with one invariant check (plus slow-path code).

Change-Id: I8c6e34b56c85d25b91074832d13dba1db0a81569
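
The condition described in the entry above can be sketched in a few lines of C++. This is only an illustration of the arithmetic, not the ART implementation: the names `BceDecision`, `AnalyzeConstantIndices` and `InvariantHolds` are hypothetical, and the deoptimization itself is reduced to a boolean result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of a group of constant-index accesses to one array.
struct BceDecision {
  bool lower_bound_proven;  // 0 <= min(ci), decided at compile time.
  int32_t max_index;        // max(ci), compared against the length at runtime.
};

// Compile-time part: inspect the constant indices {c1, ..., cn}.
BceDecision AnalyzeConstantIndices(const std::vector<int32_t>& indices) {
  assert(!indices.empty());
  auto mm = std::minmax_element(indices.begin(), indices.end());
  return BceDecision{*mm.first >= 0, *mm.second};
}

// Runtime part: the single invariant check that stands in for n bounds checks.
// A false result corresponds to taking the deoptimization slow path.
bool InvariantHolds(const BceDecision& d, int32_t array_length) {
  return d.lower_bound_proven && d.max_index < array_length;
}
```

For accesses at indices 3, 0 and 7, the one runtime check is `7 < array_length`; an array of length 8 passes and an array of length 7 would deoptimize.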
eeefa1276e83776f08704a3db4237423b0627e20 13-Mar-2015 Nicolas Geoffray <ngeoffray@google.com> Update locations of registers after slow paths spilling.

Change-Id: Id9aafcc13c1a085c17ce65d704c67b73f9de695d
579885a26d761f5ba9550f2a1cd7f0f598c2e1e3 22-Feb-2015 Serban Constantinescu <serban.constantinescu@arm.com> Opt Compiler: ARM64: Enable explicit memory barriers over acquire/release

Implement remaining explicit memory barrier code paths and temporarily
enable the use of explicit memory barriers for testing.

This CL also enables the use of instruction set features in the ARM64
backend. kUseAcquireRelease has been replaced with PreferAcquireRelease(),
which for now is statically set to false (prefer explicit memory barriers).

Please note that we still prefer acquire-release for the ARM64 Optimizing
Compiler, but we would like to exercise the explicit memory barrier code
path too.

Change-Id: I84e047ecd43b6fbefc5b82cf532e3f5c59076458
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
dc23d8318db08cb42e20f1d16dbc416798951a8b 16-Feb-2015 Nicolas Geoffray <ngeoffray@google.com> Avoid generating jmp +0.

When a block branches to a non-following block, but blocks
in-between do branch to it, we can avoid doing the branch.

Change-Id: I9b343f662a4efc718cd4b58168f93162a24e1219
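
One reading of the entry above, sketched in C++: a jump can be dropped when every block laid out between the source and the target does nothing but jump to that same target, so none of them needs to emit code. The `Block` struct and `CanFallThrough` are hypothetical names, and the assumption that such in-between blocks emit no code is this sketch's interpretation, not something the log states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block descriptor; blocks are indexed in layout order.
struct Block {
  bool only_jumps;     // Block contains nothing but an unconditional jump.
  size_t jump_target;  // Layout index of that jump's target.
};

// Returns true if a jump from blocks[from] to blocks[target] can be omitted:
// the target comes later in the layout and every block in between only jumps
// to the same target (and is assumed to emit no code of its own).
bool CanFallThrough(const std::vector<Block>& blocks, size_t from, size_t target) {
  if (target <= from) return false;
  for (size_t i = from + 1; i < target; ++i) {
    if (!blocks[i].only_jumps || blocks[i].jump_target != target) return false;
  }
  return true;
}
```

With no blocks in between, the check reduces to the usual "target is the next block" test.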
3d087decd1886b818adcccd4f16802e5e54dd03e 28-Jan-2015 Serban Constantinescu <serban.constantinescu@arm.com> Opt Compiler: ARM64: Enable Callee-saved register, as defined by AAPCS64.

For now we block kQuickSuspendRegister - x19, since Quick and the runtime
use this as a suspend counter register.

Change-Id: I090d386670e81e7924e4aa9a3864ef30d0580a30
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
1cf95287364948689f6a1a320567acd7728e94a3 12-Dec-2014 Nicolas Geoffray <ngeoffray@google.com> Small optimization for recursive calls: avoid dex cache.

Change-Id: I044757a2f06e535cdc1480c4fc8182b89635baf6
878d58cbaf6b17a9e3dcab790754527f3ebc69e5 16-Jan-2015 Andreas Gampe <agampe@google.com> ART: Arm64 optimizing compiler intrinsics

Implement most intrinsics for the optimizing compiler for Arm64.

Change-Id: Idb459be09f0524cb9aeab7a5c7fccb1c6b65a707
d97dc40d186aec46bfd318b6a2026a98241d7e9c 22-Jan-2015 Nicolas Geoffray <ngeoffray@google.com> Support callee save floating point registers on x64.

- Share the computation of core_spill_mask and fpu_spill_mask
between backends.
- Remove explicit stack overflow check support: we need to adjust
them and since they are not tested, they will easily bitrot.

Change-Id: I0b619b8de4e1bdb169ea1ae7c6ede8df0d65837a
988939683c26c0b1c8808fc206add6337319509a 21-Jan-2015 Nicolas Geoffray <ngeoffray@google.com> Enable core callee-save on x64.

Will work on other architectures and FP support in other CLs.

Change-Id: I8cef0343eedc7202d206f5217fdf0349035f0e4d
77520bca97ec44e3758510cebd0f20e3bb4584ea 12-Jan-2015 Calin Juravle <calin@google.com> Record implicit null checks at the actual invoke time.

ImplicitNullChecks are recorded only for instructions directly (see NB
below) preceded by NullChecks in the graph. This way we avoid recording
redundant safepoints and minimize the code size increase.

NB: ParallelMoves might be inserted by the register allocator between
the NullChecks and their uses. These modify the environment and the
correct action would be to reverse their modification. This will be
addressed in a follow-up CL.

Change-Id: Ie50006e5a4bd22932dcf11348f5a655d253cd898
cd6dffedf1bd8e6dfb3fb0c933551f9a90f7de3f 08-Jan-2015 Calin Juravle <calin@google.com> Add implicit null checks for the optimizing compiler

- for backends: arm, arm64, x86, x86_64
- fixed parameter passing for CodeGenerator
- 003-omnibus-opcodes test verifies that NullPointerExceptions work as
expected

Change-Id: I1b302acd353342504716c9169a80706cf3aba2c8
f85a9ca9859ad843dc03d3a2b600afbaf2e9bbdd 13-Jan-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing compiler] Compute live spill size

The current stack frame calculation assumes that each live register to
be saved/restored has the word size of the machine. This fails for X86,
where a double in an XMM register takes up 8 bytes. Change the
calculation to keep track of the number of core registers and number of
fp registers to handle this distinction.

This is slightly pessimal, as the registers may not be active at the
same time, but the only way to handle this would be to allocate both
classes of registers simultaneously, or remember all the active
intervals, matching them up and compute the size of each safepoint
interval.

Change-Id: If7860aa319b625c214775347728cdf49a56946eb
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
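
The difference between the two calculations in the entry above can be shown with a small C++ sketch. The function names and the byte sizes passed in are assumptions for illustration; the real computation lives in the register allocator and code generators.

```cpp
#include <cassert>
#include <cstddef>

// Old estimate: every live register is assumed to occupy one machine word.
size_t WordSizeEstimate(size_t live_core, size_t live_fp, size_t word_size) {
  return (live_core + live_fp) * word_size;
}

// New estimate: core and floating-point registers are counted separately and
// each class is scaled by its own spill slot size.
size_t PerClassEstimate(size_t max_live_core, size_t max_live_fp,
                        size_t core_size, size_t fp_size) {
  return max_live_core * core_size + max_live_fp * fp_size;
}
```

On 32-bit x86 with 3 live core registers and 2 live XMM registers holding doubles, the word-size estimate gives 20 bytes, while 28 bytes are actually needed. Taking the maximum of each class independently is what makes the result slightly pessimal, as the entry notes.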
840e5461a85f8908f51e7f6cd562a9129ff0e7ce 07-Jan-2015 Nicolas Geoffray <ngeoffray@google.com> Implement double and float support for arm in register allocator.

The basic approach is:
- An instruction that needs two registers gets two intervals.
- When allocating the low part, we also allocate the high part.
- When splitting a low (or high) interval, we also split the high
(or low) equivalent.
- Allocation follows the (S/D register) requirement that low
registers are always even and the high equivalent is low + 1.

Change-Id: I06a5148e05a2ffc7e7555d08e871ed007b4c2797
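
The pairing constraint in the last bullet above (low register even, high register equal to low + 1) can be sketched as a search over a bitmask of blocked registers. `FindFreePair` is a hypothetical helper, not the allocator's actual code, and it assumes at most 32 registers.

```cpp
#include <cassert>
#include <cstdint>

// Returns the index of the low (even) register of the first free pair
// (low, low + 1), or -1 if no such pair exists. Bit i of blocked_mask is set
// when register i is unavailable. Assumes num_regs <= 32.
int FindFreePair(uint32_t blocked_mask, int num_regs) {
  assert(num_regs >= 0 && num_regs <= 32);
  for (int low = 0; low + 1 < num_regs; low += 2) {
    uint32_t pair = (1u << low) | (1u << (low + 1));
    if ((blocked_mask & pair) == 0) return low;
  }
  return -1;
}
```

With registers 0, 1 and 2 blocked, the pair (2, 3) is rejected because its low half is taken, and the search returns 4.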
02d81cc8d162a31f0664249535456775e397b608 05-Jan-2015 Serban Constantinescu <serban.constantinescu@arm.com> Opt Compiler: ARM64: Add support for rem-float, rem-double and volatile.

Add support for rem-float, rem-double and volatile memory accesses
using acquire-release and memory barriers.

Change-Id: I96a24dff66002c3b772c3d8e6ed792e3cb59048a
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
5b4b898ed8725242ee6b7229b94467c3ea3054c8 18-Dec-2014 Nicolas Geoffray <ngeoffray@google.com> Revert "Don't block quick callee saved registers for optimizing."

X64 has one libcore test failing, and codegen_test on
arm is failing.

This reverts commit 6004796d6c630696127df2494dcd4f30d1367a34.

Change-Id: I20e00431fa18e11ce4c0cb6fffa91977fa8e9b4f
6004796d6c630696127df2494dcd4f30d1367a34 15-Dec-2014 Nicolas Geoffray <ngeoffray@google.com> Don't block quick callee saved registers for optimizing.

This change builds on:
https://android-review.googlesource.com/#/c/118983/

- Also fix x86_64 assembler bug triggered by this change.
- Fix (and improve) x86's backend byte register usage.
- Fix a bug in baseline register allocator: a fixed
out register must prevent inputs from allocating it.

Change-Id: I4883862e29b4e4b6470f1823cf7eab7e7863d8ad
3e69f16ae3fddfd24f4f0e29deb106d564ab296c 10-Dec-2014 Alexandre Rames <alexandre.rames@arm.com> Opt compiler: Add arm64 support for register allocation.

Change-Id: Idc6e84eee66170de4a9c0a5844c3da038c083aa7
02164b352a1474c616771582ca9a73a2cc514c1f 13-Nov-2014 Serban Constantinescu <serban.constantinescu@arm.com> Opt Compiler: Arm64: Add support for more IRs plus various fixes.

Add support for more IRs and update others.

Change-Id: Iae1bef01dc3c0d238a46fbd2800e71c38288b1d2
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
32f5b4d2c8c9b52e9522941c159577b21752d0fa 25-Nov-2014 Serban Constantinescu <serban.constantinescu@arm.com> Vixl: Update the VIXL interface to VIXL 1.7 and enable VIXL debug.

This patch updates the interface to VIXL 1.7 and enables the debug version of
VIXL when ART is built in debug mode.

Change-Id: I443fb941bec3cffefba7038f93bb972e6b7d8db5
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
86a8d7afc7f00ff0f5ea7b8aaf4d50514250a4e6 19-Nov-2014 Nicolas Geoffray <ngeoffray@google.com> Consistently use k{InstructionSet}WordSize.

These constants were defined prior to k{InstructionSet}PointerSize. So
use them consistently in optimizing as a first step. We can discuss
whether we should remove them in a second step.

Change-Id: If129de1a3bb8b65f8d9c816a8ad466815fb202e6
67555f7e9a05a9d436e034f67ae683bbf02d072d 18-Nov-2014 Alexandre Rames <alexandre.rames@arm.com> Opt compiler: Add support for more IRs on arm64.

Change-Id: I4b6425135d1af74912a206411288081d2516f8bf
f0e3937b87453234d0d7970b8712082062709b8d 12-Nov-2014 Nicolas Geoffray <ngeoffray@google.com> Do a parallel move in BoundsCheckSlowPath.

The two locations of the index and length could overlap,
so we need a parallel move. Also factorize the code for
doing a parallel move based on two locations.

Change-Id: Iee8b3459e2eed6704d45e9a564fb2cd050741ea4
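
Why two plain moves are not enough when the locations overlap can be seen in a minimal C++ model of a two-element parallel move. Registers are modelled as array slots, and `ParallelMove2` is a hypothetical stand-in for the factored-out helper the entry mentions, not its real signature.

```cpp
#include <array>
#include <cassert>
#include <utility>

using Regs = std::array<int, 4>;  // Toy register file with four registers.

// Performs r[dst1] = old r[src1] and r[dst2] = old r[src2] as if both moves
// happened at once. The two destinations must differ.
void ParallelMove2(Regs& r, int src1, int dst1, int src2, int dst2) {
  assert(dst1 != dst2);
  if (dst1 == src2 && dst2 == src1) {
    // The moves form a cycle: exchange the two registers.
    std::swap(r[dst1], r[dst2]);
  } else if (dst1 == src2) {
    // The first move would clobber the second source: do the second first.
    r[dst2] = r[src2];
    r[dst1] = r[src1];
  } else {
    // The first move does not touch the second source: program order is safe.
    r[dst1] = r[src1];
    r[dst2] = r[src2];
  }
}
```

If the index lives in register 1 and the length in register 0, but the slow path wants them the other way round, sequential moves would lose one value; the cycle case above handles it with a swap.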
fc19de8b201475231751b9df08fce01a093e5c2b 07-Nov-2014 Alexandre Rames <alexandre.rames@arm.com> Opt compiler: Add arm64 support for a few more IRs.

Change-Id: I781ddcbc61eb2b04ae80b1c7697e1ed5694bd5b9
a89086e3be94fb262c4c4feb15241b30616c3b8f 07-Nov-2014 Alexandre Rames <alexandre.rames@arm.com> Opt compiler: Add arm64 support for floating-point.

Change-Id: I0d97ab0f5ab770fee62c819505743febbce8835e
de58ab2c03ff8112b07ab827c8fa38f670dfc656 05-Nov-2014 Nicolas Geoffray <ngeoffray@google.com> Implement try/catch/throw in optimizing.

- We currently don't run optimizations in the presence of a try/catch.
- We therefore implement Quick's mapping table.
- Also fix a missing null check on array-length.

Change-Id: I6917dfcb868e75c1cf6eff32b7cbb60b6cfbd68f
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f 31-Oct-2014 Ian Rogers <irogers@google.com> Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags.

Fix associated errors about unused parameters and implicit sign conversions.
For sign conversion this was largely in the area of enums, so add ostream
operators for the affected enums and fix tools/generate-operator-out.py.
Tidy arena allocation code and arena allocated data types, rather than fixing
new and delete operators.
Remove dead code.

Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
5319defdf502fc4569316473846b83180ec08035 23-Oct-2014 Alexandre Rames <alexandre.rames@arm.com> ART: optimizing compiler: initial support for ARM64.

The ARM64 port uses VIXL for code generation, to which it defers work
like label binding and branch resolving, register type coherency
checking, and immediate values handling.

Change-Id: I0a44508c0c991f472a63e67b3469cdd878fe1a68
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
Signed-off-by: Alexandre Rames <alexandre.rames@arm.com>