History log of /art/compiler/optimizing/nodes.cc
Revision Date Author Comments
d9e4d73b20d68aa387f5837e1535b6fc26b2859a 05-Feb-2018 Gupta Kumar, Sanjiv <sanjiv.kumar.gupta@intel.com> Fix iCache misses for GetKind on x86,x86_64

GetKind() takes about 2.6% of total compilation time on x86_64.
The primary reason is that the target call GetKindInternal() is often
beyond the page boundary causing frequent i-cache misses.
This patch removes the virtual call to GetKindInternal () and instead
keeps the InstructionKind into each constructed instruction.
Since we have about 121 instructions in total as of now,
it takes about 7 extra bits in each instruction.
dex2oat runs about 12% faster with --compiler-filter=everything on an
APK of 25MB.

Test: Tested the patch by running host art tests.

Rebased.

Change-Id: Ia7bbcd67180151e4565507164a718acbb6284885
Signed-off-by: Gupta Kumar, Sanjiv <sanjiv.kumar.gupta@intel.com>
bff7a52e2c6c9e988c3ed1f12a2da0fa5fd37cfb 25-Jan-2018 Nicolas Geoffray <ngeoffray@google.com> Revert "Compiler changes for bitstring based type checks."

Bug: 64692057
Bug: 71853552
Bug: 26687569

This reverts commit eb0ebed72432b3c6b8c7b38f8937d7ba736f4567.

Change-Id: I7daeaa077960ba41b2ed42bc47f17501621be4be
eb0ebed72432b3c6b8c7b38f8937d7ba736f4567 10-Jan-2018 Vladimir Marko <vmarko@google.com> Compiler changes for bitstring based type checks.

We guard the use of this feature with a compile-time flag,
set to true in this CL.

Boot image size for aosp_taimen-userdebug in AOSP master:
- before:
arm boot*.oat: 63604740
arm64 boot*.oat: 74237864
- after:
arm boot*.oat: 63531172 (-72KiB, -0.1%)
arm64 boot*.oat: 74135008 (-100KiB, -0.1%)

The new TypeCheckBenchmark yields the following changes
using the little cores of taimen fixed at 1.4016GHz:
32-bit 64-bit
timeCheckCastLevel1ToLevel1 11.48->15.80 11.47->15.78
timeCheckCastLevel2ToLevel1 15.08->15.79 15.08->15.79
timeCheckCastLevel3ToLevel1 19.01->15.82 17.94->15.81
timeCheckCastLevel9ToLevel1 42.55->15.79 42.63->15.81
timeCheckCastLevel9ToLevel2 39.70->14.36 39.70->14.35
timeInstanceOfLevel1ToLevel1 13.74->17.93 13.76->17.95
timeInstanceOfLevel2ToLevel1 17.02->17.95 16.99->17.93
timeInstanceOfLevel3ToLevel1 24.03->17.95 24.45->17.95
timeInstanceOfLevel9ToLevel1 47.13->17.95 47.14->18.00
timeInstanceOfLevel9ToLevel2 44.19->16.52 44.27->16.51
This suggests that the bitstring typecheck should not be
used for exact type checks which would be equivalent to the
"Level1ToLevel1" benchmark. Whether the implementation is
a beneficial replacement for the kClassHierarchyCheck and
kAbstractClassCheck on average depends on how many levels
from the target class (or Object for a negative result) is
a typical object's class.

Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Test: testrunner.py --host -t 670-bitstring-type-check
Test: Pixel 2 XL boots.
Test: testrunner.py --target --optimizing --jit
Test: testrunner.py --target -t 670-bitstring-type-check
Bug: 64692057
Bug: 71853552
Bug: 26687569
Change-Id: I538d7e036b5a8ae2cc3fe77662a5903d74854562
7f4aff6705f46f411874b5ca8c4856b8ed5bfb13 21-Jun-2017 Artem Serov <artem.serov@linaro.org> ART: Implement SuperblockCloner.

SuperblockCloner provides a feature of cloning subgraphs in a
smart, high level way without fine grain manipulation with IR;
data flow and graph properties are resolved/adjusted automatically.
The clone transformation is defined by specifying a set of basic
blocks to copy and a set of rules how to treat edges, remap their
successors. By using this approach such optimizations as Branch
Target Expansion, Loop Peeling, Loop Unrolling can be implemented.

Test: superblock_cloner_test.cc.
Change-Id: Ibeede38195376ca35f44ba9015491e50b3a5b87e
04366f382239f4bcf1f9c67bb1ff6975607cd8e4 14-Dec-2017 Nicolas Geoffray <ngeoffray@google.com> Revert "ART: Try to statically evaluate some conditions."

CL has an unwanted 10-15% compile-time impact.

This reverts commit 1de1e11ac90db9fad8916ac43d43714ccb8d978f.

Change-Id: I76b45aa95bbd24dd025d2ee6cf37d77fe17b8497
09faaea17b75269805b4857ed3c9cd04c7273959 07-Dec-2017 Artem Serov <artem.serov@linaro.org> ART: Fix single-preheader transformation.

Original implementation of "Make sure the loop has only one
pre-header" had an assumption that the header had no phi
functions since loops with multiple preheaders now only may exist
during graph building before ssa construction; all of the
optimizations preserve the single-preheader invariant. This code is
used by DCE; DCE was called multiple times but after graph building
preheader transformation was never executed. However if someone
introduces a optimization which might not keep the invariant
(e.g. loop peeling) the data flow adjustments must be performed.

Test: loop_optimization_test.cc
Test: test-art-target, test-art-host
Change-Id: I88bb0aad2dd5241addef7fe9cda474a6868bf532
1de1e11ac90db9fad8916ac43d43714ccb8d978f 20-Jul-2017 Artem Serov <artem.serov@linaro.org> ART: Try to statically evaluate some conditions.

If a condition 'cond' is evaluated in an HIf instruction then in
the successors of the this HIF_BLOCK we statically know the value
of the condition (TRUE in TRUE_SUCC, FALSE in FALSE_SUCC). Using
that we could replace another evaluation (use) EVAL of the same
'cond' with TRUE value (FALSE value) if every path from the
ENTRY_BLOCK to EVAL_BLOCK contains the edge HIF_BLOCK->TRUE_SUCC
(HIF_BLOCK->FALSE_SUCC).

if (cond) {
...
if (cond) {
...
}
...
int a = cond ? 5 : 105;
...
}

The patch is a prerequisite step for "Loop peeling to eliminate
invariant exits" however it brings some value on its own with
a tiny code size reduction in boot-framework.oat (-8Kb).

Test: 458-checker-instruct-simplification
Test: test-art-target, test-art-host.
Change-Id: Ifbe45097dc2b5f098176fa1a1d023ea90b76d396
28e012a4af2d710e5e5f824709ffd6432e4f549f 07-Dec-2017 Vladimir Marko <vmarko@google.com> Determine HLoadClass/String load kind early.

This helps save memory by avoiding the allocation of
HEnvironment and related objects for AOT references to
boot image strings and classes (kBootImage* load kinds)
and also for JIT references (kJitTableAddress).

Compiling aosp_taimen-userdebug boot image, the most memory
hungry method BatteryStats.dumpLocked() needs
- before:
Used 55105384 bytes of arena memory...
...
UseListNode 10009704
Environment 423248
EnvVRegs 20676560
...
- after:
Used 50559176 bytes of arena memory...
...
UseListNode 8568936
Environment 365680
EnvVRegs 17628704
...

Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Bug: 34053922
Change-Id: I68e73a438e6ac8e8908e6fccf53bbeea8a64a077
dbb9aef046301940d0b253c918a5c78b277330ba 23-Nov-2017 Nicolas Geoffray <ngeoffray@google.com> Log at places we fail to compile.

Useful when diagnosing some compiler issues / limitations.

Test: test.py

Change-Id: I8759d0e78b0682b300ddcadfe02793432cab2036
75bb2f3c85330b2aeba9e0a4a25f7eb059bcd754 30-Nov-2017 Mingyao Yang <mingyao@google.com> Type conversion elimination of constants

A better way of eliminating type conversion for constants.

Test: run-test on host. 711-checker-type-conversion.
Change-Id: I457bc091542a5ac4cc4e77cadb012ee7cb040ce8
46721ef33e8f5cd405c291d72e3f259e3085fb5f 05-Oct-2017 Mingyao Yang <mingyao@google.com> Don't merge values for exit block in LSE.

This enables some additional optimizations since exit block doesn't
really merge values.

Test: run-test on host.
Change-Id: I21ed7e0e43a3bc5d9ed2dabfad8462129b904eb7
cced8ba4245a061ab047a0a6882468d75d619dd9 19-Jul-2017 Artem Serov <artem.serov@linaro.org> ART: Introduce individual HInstruction cloning.

Introduce API for HInstruction cloning, support it for a few
instructions. add a gtest.

Test: cloner_test.cc, test-art-target, test-art-host.

Change-Id: I8b6299be5d04a26390d9ef13a20ce82ee5ae4afe
69d310e0317e2fce97bf8c9c133c5c2c0332e61d 09-Oct-2017 Vladimir Marko <vmarko@google.com> Use ScopedArenaAllocator for building HGraph.

Memory needed to compile the two most expensive methods for
aosp_angler-userdebug boot image:
BatteryStats.dumpCheckinLocked() : 21.1MiB -> 20.2MiB
BatteryStats.dumpLocked(): 42.0MiB -> 40.3MiB
This is because all the memory previously used by the graph
builder is reused by later passes.

And finish the "arena"->"allocator" renaming; make renamed
allocator pointers that are members of classes const when
appropriate (and make a few more members around them const).

Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 64312607
Change-Id: Ia50aafc80c05941ae5b96984ba4f31ed4c78255e
ca6fff898afcb62491458ae8bcd428bfb3043da1 03-Oct-2017 Vladimir Marko <vmarko@google.com> ART: Use ScopedArenaAllocator for pass-local data.

Passes using local ArenaAllocator were hiding their memory
usage from the allocation counting, making it difficult to
track down where memory was used. Using ScopedArenaAllocator
reveals the memory usage.

This changes the HGraph constructor which requires a lot of
changes in tests. Refactor these tests to limit the amount
of work needed the next time we change that constructor.

Test: m test-art-host-gtest
Test: testrunner.py --host
Test: Build with kArenaAllocatorCountAllocations = true.
Bug: 64312607
Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
d5d2f2ce627aa0f6920d7ae05197abd1a396e035 26-Sep-2017 Vladimir Marko <vmarko@google.com> ART: Introduce Uint8 compiler data type.

This CL adds all the necessary codegen for the Uint8 type
but does not add code transformations that use that code.
Vectorization codegens are modified to use Uint8 as the
packed type when appropriate. The side effects are now
disconnected from the instruction's type after the graph has
been built to allow changing HArrayGet/H*FieldGet/HVecLoad
to use a type different from the underlying field or array.

Note: HArrayGet for String.charAt() is modified to have
no side effects whatsoever; Strings are immutable.

Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Test: testrunner.py --target --optimizing on Nexus 6P
Test: Nexus 6P boots.
Bug: 23964345
Change-Id: If2dfffedcfb1f50db24570a1e9bd517b3f17bfd0
0ebe0d83138bba1996e9c8007969b5381d972b32 21-Sep-2017 Vladimir Marko <vmarko@google.com> ART: Introduce compiler data type.

Replace most uses of the runtime's Primitive in compiler
with a new class DataType. This prepares for introducing
new types, such as Uint8, that the runtime does not need
to know about.

Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 23964345
Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
94ec2db21332ee1dcdbbf254b99a9a999a304fe0 06-Sep-2017 Vladimir Marko <vmarko@google.com> Use mmapped boot image class table for PIC app HLoadClass.

Implement new HLoadClass load kind for boot image classes
referenced by PIC-compiled apps (i.e. prebuilts) that uses
PC-relative load from a boot image ClassTable mmapped into
the apps .bss. This reduces the size of the PIC prebuilts
that reference boot image classes compared to the kBssEntry
as we can completely avoid the slow path and stack map
unless we need to do the class initialization check.

Prebuilt services.odex for aosp_angler-userdebug (arm64):
- before: 20312800
- after: 19775352 (-525KiB)

Test: m test-art-host-gtest
Test: testrunner.py --host
Test: testrunner.py --host --pictest
Test: testrunner.py --target on Nexus 6P.
Test: testrunner.py --target --pictest on Nexus 6P.
Test: Nexus 6P boots.
Bug: 31951624
Change-Id: I13adb19a1fa7d095a72a41f09daa6101876e77a8
dd018df8a00e841fe38fabe38520b7d297a885c1 09-Aug-2017 Igor Murashkin <iam@google.com> optimizing: add block-scoped constructor fence merging pass

Introduce a new "Constructor Fence Redundancy Elimination" pass.
The pass currently performs local optimization only, i.e. within instructions
in the same basic block.

All constructor fences preceding a publish (e.g. store, invoke) get
merged into one instruction.

==============

OptStat#ConstructorFenceGeneratedNew: 43825
OptStat#ConstructorFenceGeneratedFinal: 17631 <+++
OptStat#ConstructorFenceRemovedLSE: 164
OptStat#ConstructorFenceRemovedPFRA: 9391
OptStat#ConstructorFenceRemovedCFRE: 16133 <---

Removes ~91.5% of the 'final' constructor fences in RitzBenchmark:

(We do not distinguish the exact reason that a fence was created, so
it's possible some "new" fences were also removed.)

==============

Test: art/test/run-test --host --optimizing 476-checker-ctor-fence-redun-elim
Bug: 36656456
Change-Id: I8020217b448ad96ce9b7640aa312ae784690ad99
6cfbdbc359ec5414d3e49f70d28f8c0e65b98d63 25-Jul-2017 Vladimir Marko <vmarko@google.com> Use mmapped boot image intern table for PIC app HLoadString.

Implement new HLoadString load kind for boot image strings
referenced by PIC-compiled apps (i.e. prebuilts) that uses
PC-relative load from a boot image InternTable mmapped into
the apps .bss. This reduces the size of the PIC prebuilts
that reference boot image strings compared to the kBssEntry
as we can completely avoid the slow path and stack map.

We separate the InternedStrings and ClassTable sections of
the boot image (.art) file from the rest, aligning the
start of the InternedStrings section to a page boundary.
This may actually increase the size of the boot image file
by a page but it also allows mprotecting() these tables as
read-only. The ClassTable section is included in
anticipation of a similar load kind for HLoadClass.

Prebuilt services.odex for aosp_angler-userdebug (arm64):
- before: 20862776
- after: 20308512 (-541KiB)
Note that 92KiB savings could have been achieved by simply
avoiding the read barrier, similar to the HLoadClass flag
IsInBootImage(). Such flag is now unnecessary.

Test: m test-art-host-gtest
Test: testrunner.py --host
Test: testrunner.py --host --pictest
Test: testrunner.py --target on Nexus 6P.
Test: testrunner.py --target --pictest on Nexus 6P.
Test: Nexus 6P boots.
Bug: 31951624
Change-Id: I5f2bf1fc0bb36a8483244317cfdfa69e192ef6c5
6ef45677305048c2bf0600f1c4b98a11b2cfaffb 08-Aug-2017 Igor Murashkin <iam@google.com> optimizing: Add statistics for # of constructor fences added/removed

Statistics are attributed as follows:

Added because:
* HNewInstances requires a HConstructorFence following it.
* HReturn requires a HConstructorFence (for final fields) preceding it.

Removed because:
* Optimized in Load-Store-Elimination.
* Optimized in Prepare-For-Register-Allocation.

Test: art/test.py
Bug: 36656456
Change-Id: Ic119441c5151a5a840fc6532b411340e2d68e5eb
16e528957869c7debb1f6758c9a364819e15ee1a 14-Jul-2017 Mads Ager <ager@google.com> RFC: Generate select instruction for conditional returns.

The select generator currently only inserts select instructions
if there is a diamond shape with a phi.

This change extends the select generator to also deal with the
pattern:

if (condition) {
movable instruction 0
return value0
} else {
movable instruction 1
return value1
}

which it turns into:

moveable instruction 0
moveable instruction 1
return select (value0, value1, condition)

Test: 592-checker-regression-bool-input
Change-Id: Iac50fb181dc2c9b7619f28977298662bc09fc0e1
c73ee37b76494253862ee17933acfe2b88de1a01 31-Jul-2017 Artem Serov <artem.serov@linaro.org> ART: Fix loop header's predecessors reordering in SimplifyLoops.

Fix the issue when after loop header's predecessors reordering in
SimplifyLoops phi inputs are not reordered correspondingly.

Test: loop_optimization_test.cc, test-art-host, test-art-target.

Change-Id: I8a251a0a953d751f9bb67da58181e47d225d90e6
21c7e6fbcabef2f22b467e1e89f4abe1aa43e459 27-Jul-2017 Artem Serov <artem.serov@linaro.org> ART: Fix SimplifyInduction for an instruction with HEnvironment.

After an instruction is removed during RemoveFromCycle its
environment isn't properly cleaned: it still has input instructions
present and registered (those instructions still hold records for
that).

Test: test-art-target, test-art-host.
Change-Id: Iea315bdf735d75fe477f43671f05b40dfecc63a8
8cf9cb386cd9286d67e879f1ee501ec00d72a4e1 19-Jul-2017 Andreas Gampe <agampe@google.com> ART: Include cleanup

Let clang-format reorder the header includes.

Derived with:

* .clang-format:
BasedOnStyle: Google
IncludeIsMainRegex: '(_test|-inl)?$'

* Steps:
find . -name '*.cc' -o -name '*.h' | xargs sed -i.bak -e 's/^#include/ #include/' ; git commit -a -m 'ART: Include cleanup'
git-clang-format -style=file HEAD^
manual inspection
git commit -a --amend

Test: mmma art
Change-Id: Ia963a8ce3ce5f96b5e78acd587e26908c7a70d02
a4b58ed1ef8dceafbffcfa88bf2c11144e302d18 23-Jun-2017 George Burgess IV <gbiv@google.com> Fix static analyzer warning

nodes.cc: warning: Access to field 'next_' results in a dereference of a
null pointer (loaded from field 'last_instruction_')

This was split from
https://android-review.googlesource.com/#/c/416101/. Please see the
discussion nodes.cc (patch set 1) for why this warning is triggered.

Bug: 32619234
Test: test-art-host. Rebuilt ART with the analyzer to verify that these
issues are gone.

Change-Id: Id5da00ceee0667441233153a7971d238ea8c8650
0eb882bfc5d260e8014c26adfda11602065aa5d8 15-May-2017 Vladimir Marko <vmarko@google.com> Use ArtMethod* .bss entries for HInvokeStaticOrDirect.

Test: m test-art-host-gtest
Test: testrunner.py --host
Test: testrunner.py --target
Test: Nexus 6P boots.
Test: Build aosp_mips64-userdebug.
Bug: 30627598
Change-Id: I0e54fdd2e91e983d475b7a04d40815ba89ae3d4f
e7197bf7d58c705a048e13e241d7ca320502cd40 02-Jun-2017 Vladimir Marko <vmarko@google.com> Replace invoke kind kDexCacheViaMethod with kRuntimeCall.

In preparation for replacing the dex cache method array
with a hash-based array, get rid of one unnecessary use.
This method load kind is currently used only on mips for
irreducible loops and OSR, so this should have no impact
on x86/x86-64/arm/arm64.

Test: m test-art-host-gtest
Test: testrunner.py --host
Test: Repeat the above tests with manually changing
kDexCachePcRelative to kRuntimeCall in sharpening.cc.
(Ignore failures in 552-checker-sharpening.)
Bug: 30627598
Change-Id: Ifce42645f2dcc350bbb88c2f4642e88fc5f98152
847e6ce98b4b822fd94c631975763845978ebaa3 02-Jun-2017 Vladimir Marko <vmarko@google.com> Rename kDexCacheViaMethod to kRuntimeCall for HLoadClass/String.

The old name does not reflect the actual code anymore.

Test: testrunner.py --host
Change-Id: I2e13cf727bba9d901c4d3fc821bb526d38a775b8
19d7d50c0ae23f2041b1aee4aa8a5b789e3d2020 24-May-2017 Vladimir Marko <vmarko@google.com> ARM64: Fix IsAdrpPatch().

kMethodRelative patch can be an ADRP patch. Note that this
minor bug had no real consequences as the patch would later
be filtered out in NeedsErratum843419Thunk() because we
generate the instruction sequence ADRP+ADD in this case.

Also change the output for kDirectAddress from "Direct" to
"DirectAddress". This kind is currently unused in checker
tests.

Test: Rely on TreeHugger.
Bug: 30627598
Change-Id: I4e7c033aa010bbff53a554bfadd2ba0f03f69528
6597946d29be9108e2cc51223553d3db9290a3d9 19-May-2017 Vladimir Marko <vmarko@google.com> Use PC-relative pointer to boot image methods.

In preparation for adding ArtMethod entries to the .bss
section, add direct PC-relative pointers to methods so that
the number of needed .bss entries for boot image is small.

Test: m test-art-host-gtest
Test: testrunner.py --host
Test: testrunner.py --target on Nexus 6P
Test: Nexus 6P boots.
Test: Build aosp_mips64-userdebug
Bug: 30627598
Change-Id: Ia89f5f9975b741ddac2816e1570077ba4b4c020f
79d8fa7c52c1810d4618c9bd1d43994be5abb53d 18-Apr-2017 Igor Murashkin <iam@google.com> optimizing: Build HConstructorFence for HNewArray/HNewInstance nodes

Also fixes:
* LSE, code_sinking to keep optimizing new-instance if it did so before
* Various tests to expect constructor fences after new-instance

Sidenote: new-instance String does not get a ConstructorFence; the
special StringFactory calls are assumed to be self-fencing.

Metric changes on go/lem:
* CodeSize -0.262% in ART-Compile (ARMv8)
* RunTime -0.747% for all (linux-armv8)

(No changes expected to x86, constructor fences are no-op).

The RunTime regression is temporary until art_quick_alloc_* entrypoints have their
DMBs removed in a follow up CL.

Test: art/test.py
Bug: 36656456
Change-Id: I6a936a6e51c623e1c6b5b22eee5c3c72bebbed35
764d454d1d51448deb81f6e8d2d7d317c7f4d1b4 16-May-2017 Vladimir Marko <vmarko@google.com> Remove LoadString/Class kind kBootImageLinkTimeAddress.

We no longer support non-PIC boot image compilation.

Also clean up some obsolete code for method patches
and make JIT correctly report itself as non-PIC.

Test: testrunner.py --host
Test: testrunner.py --target
Bug: 33192586
Change-Id: I593289c5c1b0e88b82b86a933038be97bbb15ad2
79efadfdd861584f1c47654ade975eae6c43c360 08-May-2017 Nicolas Geoffray <ngeoffray@google.com> Add runtime reasons for deopt.

Currently to help investigate. Also:
1) Log when deoptimization happens (which method and what reason)
2) Trace when deoptimization happens (to make it visible in systrace)

bug:37655083
Test: test-art-host test-art-target

(cherry picked from commit 4e92c3ce7ef354620a785553bbada554fca83a67)

Change-Id: I992398a1038ab61ea0e5106af6b6ad0a3305312e
4e92c3ce7ef354620a785553bbada554fca83a67 08-May-2017 Nicolas Geoffray <ngeoffray@google.com> Add runtime reasons for deopt.

Currently to help investigate. Also:
1) Log when deoptimization happens (which method and what reason)
2) Trace when deoptimization happens (to make it visible in systrace)

bug:37655083
Test: test-art-host test-art-target
Change-Id: I0c2d87b40db09e8e475cf97a7c784a034c585e97
d01745ef88bfd25df574a885d90a1a7785db5f5b 06-Apr-2017 Igor Murashkin <iam@google.com> optimizing: constructor fence redundancy elimination - remove dmb after LSE

Part one of a few upcoming CLs to optimize constructor fences.

This improves load-store-elimination; all singleton objects that are not
returned will have their associated constructor fence removed.

If the allocation is removed, so is the fence. Even if allocation is not
removed, fences can sometimes be removed.

This change is enabled by tracking the "this" object associated with the
constructor fence as an input. Fence inputs are considered weak; they do not keep
the "this" object alive; if the instructions for "this" are all deleted,
the fence can also be deleted.

Bug: 36656456
Test: art/test.py --host && art/test.py --target
Change-Id: I05659ab07e20d6e2ecd4be051b722726776f4ab1
c6ea7d00ad069a2736f603daa3d8eaa9a1f8ea11 02-Feb-2017 Andreas Gampe <agampe@google.com> ART: Clean up art_method.h

Clean up the header. Fix up other headers including the -inl file,
in an effort to prune the include graph. Fix broken transitive
includes by making includes explicit. Introduce new -inl files
for method handles and reference visiting.

Test: source build/envsetup.sh && lunch aosp_angler-userdebug && mmma art
Test: source build/envsetup.sh && lunch aosp_mips64-userdebug && mmma art
Change-Id: I8f60f1160c2a702fdf3598149dae38f6fa6bc851
b07d1bcf055d3eb8c6b4c45b359ad8ef30909af7 05-Apr-2017 Aart Bik <ajcbik@google.com> Ensure environment is ready when populating loop.

Rationale:
OSR requires the suspend check to already have an environment,
albeit just for testing irreducible loops. This CL fixes the
omission. Note, the error is spurious on OSR and writing a
unit or regression test for this is hard.

Test: test-art-host
Bug: 36950873
Change-Id: Ica89e18e10deb438dead79e2cc40dd00a60b529f
f8f5a16ed7bad1e18179e38453e59c96a944de10 07-Feb-2017 Aart Bik <ajcbik@google.com> ART vectorizer.

Rationale:
Make SIMD great again with a retargetable and easily extendable vectorizer.

Provides a full x86/x86_64 and a proof-of-concept ARM implementation. Sample
improvement (without any perf tuning yet) for Linpack on x86 is about 20% to 50%.

Test: test-art-host, test-art-target (angler)
Bug: 34083438, 30933338

Change-Id: Ifb77a0f25f690a87cd65bf3d5e9f6be7ea71d6c1
dd0fc0481cdc4e04abb7b7300e76edbd2f07f011 28-Mar-2017 Nicolas Geoffray <ngeoffray@google.com> Make data dependency around HDeoptimize correct.

We use HDeoptimize in a few places, but when it comes to data
dependency we either:
- don't have any (BCE, CHA), in which case we should make sure no
code that the deoptimzation guards moves before the HDeoptimize
- have one on the receiver (inline cache), in which case we can
update the dominated users with the HDeoptimize to get the data
dependency correct.

bug:35661819
bug:36371709
test: 644-checker-deopt

Change-Id: I30b750f97b656dede9e10e7e43ac02c8604d7b7a
(cherry picked from commit 6f8e2c9913b24f746a154dda700f609cee3095f9)
6f8e2c9913b24f746a154dda700f609cee3095f9 23-Mar-2017 Nicolas Geoffray <ngeoffray@google.com> Make data dependency around HDeoptimize correct.

We use HDeoptimize in a few places, but when it comes to data
dependency we either:
- don't have any (BCE, CHA), in which case we should make sure no
code that the deoptimzation guards moves before the HDeoptimize
- have one on the receiver (inline cache), in which case we can
update the dominated users with the HDeoptimize to get the data
dependency correct.

bug:35661819
bug:36371709
test: 644-checker-deopt
Change-Id: I4820c6710b06939e7f5a59606971693e995fb958
b13c65bb46544821a84ff2106d0710d77b0fb463 22-Mar-2017 Aart Bik <ajcbik@google.com> Saves full XMM state along suspend check's slow path.

Rationale:
Break-out CL of ART Vectorizer. We need to save 128-bit
of data (default ABI of ART runtime only saves 64-bit)
Note that this is *only* done for xmm registers that
are live, so overhead is not too big.

Bug: 34083438
Test: test-art-host
Change-Id: Ic89988b0acb0c104634271d0c6c3e29b6596d59b
1eede6ae9b08d305d0c1123284ff958373916474 02-Mar-2017 Nicolas Geoffray <ngeoffray@google.com> Don't inline methods that throw in graph with irreducible loops.

Re-computing the loop information is not supported in graphs
with irreducible loops, as it is not deterministic, and the
loop header of a loop could change. That would lead to having the
suspend check in the wrong block.

Test: test-art-host
Test: 641-irreducible-inline
bug: 35757766
Change-Id: I6a435885461fbeca035e4f5d94f055fc3262adca
69d75ffac23fe1e655b7e81f0454c2841280dc1f 07-Feb-2017 Mingyao Yang <mingyao@google.com> Skip loop optimization if there is no loop in the graph.

LinearizeGraph() does quite some allocations.
Also add some comments on the possible false positives of
some flags.

Test: m test-art-host
Change-Id: I80ef89a2dc031d601e7621d0b22060cd8c17fae3
fdb7d638c17dab47984e1d325d04796bb426d9b3 08-Feb-2017 Nicolas Geoffray <ngeoffray@google.com> Inline methods that throw.

Forked from https://android-review.googlesource.com/214292

test: 637-checker-throw-inline
bug: 30933338
Change-Id: I184be82dfab0710af3f3796e9e486c7817fa9c60
83c8e27a292e6e002fb3b3def75cf6d8653378e8 31-Jan-2017 Nicolas Geoffray <ngeoffray@google.com> Code refactoring around sharpening HLoadClass.

Even if the class is not accessible through the dex cache, we
can access it by other means (eg boot class, jit table). So rewrite
static field access instruction builder to not bail out if a class
cannot be accessed through the dex cache.

bug:34966607

test: test-art-host test-art-target
Change-Id: I88e4e09951a002b480eb8f271726b56f981291bd
22aa54bf8469689c7c6c33f15ff4df2ffba8fa15 18-Oct-2016 Alexandre Rames <alexandre.rames@linaro.org> AArch64: Add HInstruction scheduling support.

This commit adds a new `HInstructionScheduling` pass that performs
basic scheduling on the `HGraph`.

Currently, scheduling is performed at the block level, so no
`HInstruction` ever leaves its block in this pass.

The scheduling process iterates through blocks in the graph. For
blocks that we can and want to schedule:
1) Build a dependency graph for instructions. It includes data
dependencies (inputs/uses), but also environment dependencies and
side-effect dependencies.
2) Schedule the dependency graph. This is a topological sort of the
dependency graph, using heuristics to decide what node to schedule
first when there are multiple candidates. Currently the heuristics
only consider instruction latencies and schedule first the
instructions that are on the critical path.

Test: m test-art-host
Test: m test-art-target

Change-Id: Iec103177d4f059666d7c9626e5770531fbc5ccdc
1ea9efcbd22127b75865f9a7c2949e20f5553744 16-Jan-2017 Nicolas Geoffray <ngeoffray@google.com> Acquire the mutator lock before comparing classes/strings.

Scratch my initial thought we woudldn't need it because the
handlescope is visited during the pause: as the compiler thread
is in state native, the GC can concurrently update the handlescope,
leading to false negatives when doing class/string equality.

bug:34240874
test: test-art-host gcstress
Change-Id: Icda0722fb49300a7de57e1c5d1efaa9e8dbda83f
5247c08fb186a5a2ac02226827cf6b994f41a681 13-Jan-2017 Nicolas Geoffray <ngeoffray@google.com> Put the resolved class in HLoadClass.

To avoid repeated lookups in sharpening/rtp/inlining.

Test: test-art-host test-art-target
Change-Id: I08d0da36a4bb061cdaa490ea2af3a3217a875bbe
5d37c152f21a0807459c6f53bc25e2d84f56d259 12-Jan-2017 Nicolas Geoffray <ngeoffray@google.com> Put inlined ArtMethod pointer in stack maps.

Currently done for JIT. Can be extended for AOT and inlined boot
image methods.

Also refactor the lookup of a inlined method at runtime to not
rely on the dex cache, but look at the class loader tables.

bug: 30933338
test: test-art-host, test-art-target
Change-Id: I58bd4d763b82ab8ca3023742835ac388671d1794
6bec91c7d4670905cd67440991ec76fd54d0f000 09-Jan-2017 Vladimir Marko <vmarko@google.com> Store resolved types for AOT code in .bss.

Test: m test-art-host
Test: m test-art-target on Nexus 9.
Test: Nexus 9 boots.
Test: Build aosp_mips64-eng.
Bug: 30627598
Bug: 34193123
Change-Id: I8ec60a98eb488cb46ae3ea56341f5709dad4f623
48886c2ee655a16224870fee52dc8721a52babcf 06-Jan-2017 Vladimir Marko <vmarko@google.com> Remove HLoadClass::LoadKind::kDexCachePcRelative.

Test: m test-art-host
Test: m test-art-target-run-test-552-checker-sharpening
Bug: 30627598
Change-Id: Ic809b0f3a8ed0bd4dc7ab67aa64866f9cdff9bdb
6b69e0acb0e4c506ce2587e362c38e36e41e34ab 11-Jan-2017 Aart Bik <ajcbik@google.com> Complete unrolling of loops with small body and trip count one.

Rationale:
Avoids the unnecessary loop control overhead, suspend check,
and exposes more opportunities for constant folding in the
resulting loop body. Fully unrolls loop in execute() of
the Dhrystone benchmark (3% to 8% improvements).

Test: test-art-host

Change-Id: If30f38caea9e9f87a929df041dfb7ed1c227aba3
f0acfe7a812a332122011832074142718c278dae 09-Jan-2017 Nicolas Geoffray <ngeoffray@google.com> Keep resolved String in HLoadString.

For the following reasons:
- Avoids needing to do a lookup again in CodeGenerator::EmitJitRoots.
- Fixes races where we the string was GC'ed before CodeGenerator::EmitJitRoots.
- Makes it possible to do GVN on the same string but defined in different
dex files.

Test: test-art-host, test-art-target
Change-Id: If2b5d3079f7555427b1b96ab04546b3373fcf921
c1a42cf3873be202c8c0ca3c4e67500b470ab075 18-Dec-2016 Nicolas Geoffray <ngeoffray@google.com> Remove soon to be obsolete call kinds for direct calls.

And remove CompilerDriver::GetCodeAndMethodForDirectCall in
preparation of removing non-PIC prebuild and non-PIC on-device
boot image compilation.

Test: test-art-host test-art-target
bug:33192586
Change-Id: Ic48e3e8b9d7605dd0e66f31d458a182198ba9578
b0b051ad6c9fab511346882650d5d689f805a980 17-Nov-2016 Mingyao Yang <mingyao@google.com> CHA guard optimization (elimination/hoisting).

Test: manual by checking the dump-cfg output.

Change-Id: I254e168b9a85d2d3d23e02eea7e129c1bc9ab920
a9dbe8333d4df5447157fe575a805003172af047 15-Dec-2016 Mingyao Yang <mingyao@google.com> Add HVariableInputSizeInstruction.

Make HPhi and HInvoke subclasses of the new instruction.

Test: m test-art-host-run-test
Change-Id: I303c725876f1f4407b98702d92370be25193fc53
9b1583e7d799a3bb3c0036abb8a0b9fcbfad360a 13-Dec-2016 Nicolas Geoffray <ngeoffray@google.com> Support GVN for HLoadClass::LoadKind::kJitTableAddress.

Fixes performance regressions seen in eg Dhrystone.

Also add comment on why a class may not be found when sharpening.

Test: manual Dhrystone run, performance recovers
Test: ART_TEST_JIT=true test-art-host-run-test-jit
Change-Id: I8e879f1c390f83e8bc930f343beb7b4a41c2f190
22384aeab988df7fa5ccdc48a668589c5f602c39 12-Dec-2016 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Add kJitTableAddress for HLoadClass.""

This reverts commit d2d5262c8370309e1f2a009f00aafc24f1cf00a0.

Change-Id: I6149d5c7d5df0b0fc5cb646a802a2eea8d01ac08
d2d5262c8370309e1f2a009f00aafc24f1cf00a0 12-Dec-2016 Nicolas Geoffray <ngeoffray@google.com> Revert "Add kJitTableAddress for HLoadClass."

One test failure after merge.

This reverts commit 5b12f7973636bfea29da3956a9baa7a6bbe2b666.

Change-Id: I120c49e53274471fc1c82a10d52e99c83f5f85cc
5b12f7973636bfea29da3956a9baa7a6bbe2b666 09-Dec-2016 Nicolas Geoffray <ngeoffray@google.com> Add kJitTableAddress for HLoadClass.

This new kind loads classes from the root table associated with
JIT compiled code.

Also remove kDexCacheAddress, which is replaced by kJitTableAddress.

test: ART_TEST_JIT=true test-art-host-jit test-art-target-jit
Change-Id: Ia23029688d1a60c178bf2ffa7463927c5d5de4d0
be44dcf6b2338782c276483c6c79a4ab17ca98a9 30-Nov-2016 Mingyao Yang <mingyao@google.com> Add LoadString kind of kJitTableAddress for dump-cfg.

Test: manual

Change-Id: Ifcae7b26f666930766635d8b3ed7a495494cddf7
c757decb04d8535fd806b9bce1c2fe5e52c228dc 04-Nov-2016 David Sehr <sehr@google.com> Do not inline loops without exit edges

Fixes an issue with LinearOrder after inlining a function containing a
loop that has no exit edge. The failure is due to incorrect loop
information being computed for blocks that are not on a path to the
inlined function's return. They should not be considered part of the
caller's enclosing loop, but are today.

Bug: 32547653
Test: run-test --host 478-checker-inline-noreturn
Change-Id: I9694a1cb861430051c801d07f7ce29752332cba5
661b69bd7b19f90b0c0cba7f354e61f7ff9cc583 09-Nov-2016 Vladimir Marko <vmarko@google.com> Reduce arena memory usage when changing graph structure.

Use swap() to reuse previously allocated memory. Use
range insertion instead of individual push_back()s.

Test: m test-art-host.
Change-Id: I7174f8cc66937f34b452ff7268a8806ca94e7573
54d6a207341ad45cb5eceed71a344073ed6d4e31 09-Nov-2016 Vladimir Marko <vmarko@google.com> Fix 552-checker-sharpening for PIC test.

And remove obsolete HLoadString::LoadKind::kDexCacheAddress.

Test: m ART_TEST_PIC_TEST=true test-art-host
Change-Id: I3e7a1a98c2c7eba5ea10954d7efcf743a807c300
2c45bc9137c29f886e69923535aff31a74d90829 25-Oct-2016 Vladimir Marko <vmarko@google.com> Remove H[Reverse]PostOrderIterator and HInsertionOrderIterator.

Use range-based loops instead, introducing helper functions
ReverseRange() for iteration in reverse order in containers.
When the contents of the underlying container change inside
the loop, use an index-based loop that better exposes the
container data modifications, compared to the old iterator
interface that's hiding it which may lead to subtle bugs.

Test: m test-art-host
Change-Id: I2a4e6c508b854c37a697fc4b1e8423a8c92c5ea0
709b070044354d9f47641f273edacaeeb0240ab7 13-Oct-2016 David Sehr <sehr@google.com> Remove mirror:: and ArtMethod deps in utils.{h,cc}

The latest chapter in the ongoing saga of attempting to dump a DEX
file without having to start a whole runtime instance. This episode
finds us removing references to ArtMethod/ArtField/mirror.

One aspect of this change that I would like to call out specfically
is that the utils versions of the "Pretty*" functions all were written
to accept nullptr as an argument. I have split these functions up as
follows:
1) an instance method, such as PrettyClass that obviously requires
this != nullptr.
2) a static method, that behaves the same way as the util method, but
calls the instance method if p != nullptr.
This requires using a full class qualifier for the static methods,
which isn't exactly beautiful. I have tried to remove as many cases
as possible where it was clear p != nullptr.

Bug: 22322814
Test: test-art-host
Change-Id: I21adee3614aa697aa580cd1b86b72d9206e1cb24
e8a3c576301fd531d5f73a65fc8b84a63619d580 12-Oct-2016 Mathieu Chartier <mathieuc@google.com> Replace StackHandleScopeCollection with VariableSizedHandleScope

VariableSizedHandleScope's internal handle scopes are not pushed
directly on the thread. This means that it is safe to intermix with
other types of handle scopes.

Added test.

Test: clean-oat-host && test-art-host

Change-Id: Id2fd1155788428f394d49615d337d9134824c8f0
9620230700d4b451097c2163faa70627c9d8088a 05-Oct-2016 Aart Bik <ajcbik@google.com> Refactoring of graph linearization and linear order.

Rationale:
Ownership of graph's linear order and iterators was
a bit unclear now that other phases are using it.
New approach allows phases to compute their own
order, while ssa_liveness is sole owner for graph
(since it is not mutated afterwards).

Also shortens lifetime of loop's arena.

Test: test-art-host
Change-Id: Ib7137d1203a1e0a12db49868f4117d48a4277f30
aad75c6d5bfab2dc8e30fc99fafe8cd2dc8b74d8 03-Oct-2016 Vladimir Marko <vmarko@google.com> Revert "Revert "Store resolved Strings for AOT code in .bss.""

Fixed oat_test to keep dex files alive. Fixed mips build.
Rewritten the .bss GC root visiting and added write barrier
to the artResolveStringFromCode().

Test: build aosp_mips-eng
Test: m ART_DEFAULT_GC_TYPE=SS test-art-target-host-gtest-oat_test
Test: Run ART test suite on host and Nexus 9.
Bug: 20323084
Bug: 30627598

This reverts commit 5f926055cb88089d8ca27243f35a9dfd89d981f0.

Change-Id: I07fa2278d82b8eb64964c9a4b66cb93726ccda6b
281c681a0852c10f5ca99b351650b244e878aea3 26-Aug-2016 Aart Bik <ajcbik@google.com> A first implementation of a loop optimization framework.

Rationale:
We are planning to add more and more loop related optimizations
and this framework provides the basis to do so. For starters,
the framework optimizes dead induction, induction that can be
replaced with a simpler closed-form, and eliminates dead loops
completely (either pre-existing or as a result of induction
removal).

Speedup on e.g. Benchpress Loop is 73x (17.5us. -> 0.24us.)
[with the potential for more exploiting outer loop too]

Test: 618-checker-induction et al.

Change-Id: If80a809acf943539bf6726b0030dcabd50c9babc
5f926055cb88089d8ca27243f35a9dfd89d981f0 30-Sep-2016 Vladimir Marko <vmarko@google.com> Revert "Store resolved Strings for AOT code in .bss."

There are some issues with oat_test64 on host and aosp_mips-eng.

Also reverts "compiler_driver: Fix build."

Bug: 20323084
Bug: 30627598

This reverts commit 63dccbbefef3014c99c22748d18befcc7bcb3b41.
This reverts commit 04a44135ace10123f059373691594ae0f270a8a4.

Change-Id: I568ba3e58cf103987fdd63c8a21521010a9f27c4
0795f23920ee9aabf28e45c63cd592dcccf00216 28-Sep-2016 Mathieu Chartier <mathieuc@google.com> Clean up ScopedThreadStateChange to use ObjPtr

Also fixed inclusion of -inl.h files in .h files by adding
scoped_object_access-inl.h and scoped_fast_natvie_object_access-inl.h

Changed AddLocalReference / Decode to use ObjPtr.

Changed libartbenchmark to be debug to avoid linkage errors.

Bug: 31113334

Test: test-art-host

Change-Id: I4d2e160483a29d21e1e0e440585ed328b9811483
63dccbbefef3014c99c22748d18befcc7bcb3b41 21-Sep-2016 Vladimir Marko <vmarko@google.com> Store resolved Strings for AOT code in .bss.

And do some related refactorings.

Bug: 20323084
Bug: 30627598
Test: Run ART test suite including gcstress on host and Nexus 9.
Test: Run ART test suite including gcstress with baker CC on host and Nexus 9.
Test: Build aosp_mips64-eng.
Change-Id: I1b12c1570fee8e5da490b47f231050142afcbd1e
20e9db6db787e007e7032878c9899b28ec43e93f 14-Sep-2016 Aart Bik <ajcbik@google.com> Make LinearizeGraph() public (and move it to nodes files)

Rationale:
It is strange that HLinearOrderIterator is defined (and visible)
in nodes.h, but clients have no way to build this order. This CL
makes the building available at the usual place.

Change-Id: Ib66f2edf6dfc8edd6b429bd4bea3ac7e37440b28
Tests: m test-art
bdf7f1c3ab65ccb70f62db5ab31dba060632d458 31-Aug-2016 Andreas Gampe <agampe@google.com> ART: SHARED_REQUIRES to REQUIRES_SHARED

This coincides with the actual attribute name and upstream usage.
Preparation for deferring to libbase.

Test: m
Test: m test-art-host
Change-Id: Ia8986b5dfd926ba772bf00b0a35eaf83596d8518
75d2df275abedf90be8c8e31202389788c3b079b 28-Jul-2016 Andreas Gampe <agampe@google.com> ART: Use old operator<< for MemBarrierKind

To fix assumptions in the tests.

Test: m test-art-host
Change-Id: Ie2e738524ee8f75a3c41730f6a1449dab490e535
26de38bb7f2122417388809f4ff88a7cb5c4af5e 28-Jul-2016 Andreas Gampe <agampe@google.com> ART: Delete old compiler_enums.h

Holdover from the Quick days. Move the two enums that are still
used closer to the actual users (and prune no longer used cases).

Test: m test-art-host
Change-Id: I88aa49961a54635788cafac570ddc3125aa38262
e90049140fdfb89080e5cc9b000b0c9be8c18bcd 16-Jun-2016 Vladimir Marko <vmarko@google.com> Create a typedef for HInstruction::GetInputs() return type.

And some other cleanup after
https://android-review.googlesource.com/230742

Test: No new tests. ART test suite passed (tested on host).
Change-Id: I4743bf17544d0234c6ccb46dd0c1b9aae5c93e17
dbb7f5bef10138ade0fb202da1d61f562b2df649 30-Mar-2016 Vladimir Marko <vmarko@google.com> Improve HLoadClass code generation.

For classes in the boot image, use either direct pointers
or PC-relative addresses. For other classes, use PC-relative
access to the dex cache arrays for AOT and direct address of
the type's dex cache slot for JIT.

For aosp_flounder-userdebug:
- 32-bit boot.oat: -252KiB (-0.3%)
- 64-bit boot.oat: -412KiB (-0.4%)
- 32-bit dalvik cache total: -392KiB (-0.4%)
- 64-bit dalvik-cache total: -2312KiB (-1.0%)
(contains more files than the 32-bit dalvik cache)
For aosp_flounder-userdebug forced to compile PIC:
- 32-bit boot.oat: -124KiB (-0.2%)
- 64-bit boot.oat: -420KiB (-0.5%)
- 32-bit dalvik cache total: -136KiB (-0.1%)
- 64-bit dalvik-cache total: -1136KiB (-0.5%)
(contains more files than the 32-bit dalvik cache)

Bug: 27950288
Change-Id: I4da991a4b7e53c63c92558b97923d18092acf139
206fbf51ccd2dbbcf106e480cc6c5bb96fc7befb 10-Jun-2016 Nicolas Geoffray <ngeoffray@google.com> Remove too aggressive DCHECKs.

A class can move from a state greater or equal than resolved
to erroneous concurrently to the verifier or the compiler.

bug:29239283
Change-Id: I89f3fe1c1d9556c6c99b8e005b3ec02de7f01b85
(cherry picked from commit f7d994622aabcc689f62253a9253e0c67d9e787e)
f7d994622aabcc689f62253a9253e0c67d9e787e 10-Jun-2016 Nicolas Geoffray <ngeoffray@google.com> Remove too aggressive DCHECKs.

A class can move from a state greater or equal than resolved
to erroneous concurrently to the verifier or the compiler.

bug:29239283
Change-Id: I89f3fe1c1d9556c6c99b8e005b3ec02de7f01b85
d6c205eb9b04bcfa072cd5ffdd93deef167ec340 07-Jun-2016 David Brazdil <dbrazdil@google.com> ART: Remove redundant MoveInstructionBefore method

Change-Id: If53d7011197cc6b9c1702a3d98ef11b59eb76f0c
f479d7758dbe7e8740386fbf1d73e05b0277c5e3 16-May-2016 Anton Shamin <anton.shamin@intel.com> ART: ArrayGet hoisting restriction added.

Currently if we hoist ArrayGet from loop there is no guarantee
that insn will be executed at runtime. Because of that we could
face issues like crashes in generated code.

This patch introduces restriction for ArrayGet hoisting. We say
that ArrayGet execution is guaranteed at least one time if its bb
dominates all exit blocks.

Signed-off-by: Anton Shamin <anton.shamin@intel.com>

(cherry picked from commit f89381fed12faf96c45a83a989ae2fff82c05f3b)

BUG=29145171

Change-Id: Ia5664dedb1543d78a7b4038801b8372572f069f6
f89381fed12faf96c45a83a989ae2fff82c05f3b 16-May-2016 Anton Shamin <anton.shamin@intel.com> ART: ArrayGet hoisting restriction added.

Currently if we hoist ArrayGet from loop there is no guarantee
that insn will be executed at runtime. Because of that we could
face issues like crashes in generated code.

This patch introduces restriction for ArrayGet hoisting. We say
that ArrayGet execution is guaranteed at least one time if its bb
dominates all exit blocks.

Change-Id: I9f72c0f4c33b358341109238bea46cb5a82f490f
Signed-off-by: Anton Shamin <anton.shamin@intel.com>
372f10e5b0b34e2bb6e2b79aeba6c441e14afd1f 17-May-2016 Vladimir Marko <vmarko@google.com> Refactor handling of input records.

Introduce HInstruction::GetInputRecords(), a new virtual
function that returns an ArrayRef<> to all input records.
Implement all other functions dealing with input records as
wrappers around GetInputRecords(). Rewrite functions that
previously used multiple virtual calls to deal with input
records, especially in loops, to prefetch the ArrayRef<>
only once for each instruction. Besides avoiding all the
extra calls, this also allows the compiler (clang++) to
perform additional optimizations.

This speeds up the Nexus 5 boot image compilation by ~0.5s
(4% of "Compile Dex File", 2% of dex2oat time) on AOSP ToT.

Change-Id: Id8ebe0fb9405e38d918972a11bd724146e4ca578
d7c2fdc939bb7efb3e7204d62e54c6a3f7d77f9b 10-May-2016 Nicolas Geoffray <ngeoffray@google.com> Fix another case of live_in at irreducible loop entry.

GVN was implicitly extending the liveness of an instruction across
an irreducible loop.

Fix this problem by clearing the value set at loop entries that contain
an irreducible loop.

bug:28252896

(cherry picked from commit 77ce6430af2709432b22344ed656edd8ec80581b)

Change-Id: Ie0121e83b2dfe47bcd184b90a69c0194d13fce54
77ce6430af2709432b22344ed656edd8ec80581b 10-May-2016 Nicolas Geoffray <ngeoffray@google.com> Fix another case of live_in at irreducible loop entry.

GVN was implicitly extending the liveness of an instruction across
an irreducible loop.

Fix this problem by clearing the value set at loop entries that contain
an irreducible loop.

bug:28252896
Change-Id: I68823cb88dceb4c2b4545286ba54fd0c958a48b0
a26b3c51bfd97be1100d267f20c46535913e6bb7 09-May-2016 Vladimir Marko <vmarko@google.com> Attribute arena allocations previously marked as STL.

Bug: 28603175
Bug: 28684584

(cherry picked from commit 3ea5a97d27468cec846d958c38d0d706ef7ec67e)

Change-Id: I7f1bd22e7710cca74f4b10fd13cb8fa2c3b1b318
7e589feab1b35203fbb8c431213f1d2b2a4ad530 06-May-2016 David Brazdil <dbrazdil@google.com> ART: Fix dominance for irreducible loops

Computation of dominance was broken in the presence of irreducible
loops because the algorithm assumed that back edges are always
dominated by their respective headers and a fix-point iteration is
therefore unnecessary.

This is not true for irreducible loops, forcing us to revisit their
loop headers and all dependent blocks. This patch adds a fix-point
iteration if a back edge not dominated by its header is found.

Bug: 28611485
Change-Id: If84044e49d5b9c682949648033d2861628d7fe05
(cherry picked from commit 3f4a522cc39f5c651e7c718196e989bc81d8c6ef)
bf12e4d4209ac4e8fb98b4fd5193208adc7fe3ff 05-Apr-2016 Vladimir Marko <vmarko@google.com> Optimizing: LoadString may not have any side effects.

LoadString does not have any side effects if the string is
known to be in the dex cache or it's a boot image string
referenced directly, as specified by the string load kind.
We need to clear the side effects for these cases to avoid
a DCHECK() failure when such LoadString instruction ends up
between a ClinitCheck and an instruction to which we want to
merge that ClinitCheck. This may happen as a consequence of
inlining, LICM and DCE as shown by a regression test.

Bug: 27929914

(cherry picked from commit ace7a000a433ce4ecf94f30adea39c01a76fa936)

Change-Id: Iaf9c63b6e58aae1e246b43ca52eea0b47a6ad565
3ea5a97d27468cec846d958c38d0d706ef7ec67e 09-May-2016 Vladimir Marko <vmarko@google.com> Attribute arena allocations previously marked as STL.

Bug: 28603175
Change-Id: I488e39b23afb86f3ff5a2df16d2df59eb3adaa0f
e4a97f926882ffb12707655a755a1c8f96620f91 05-May-2016 David Brazdil <dbrazdil@google.com> Stop populating irreducible loop at header

Recent CL modified the (previously exponential) algorithm for
populating irreducible loops so as to not visit blocks multiple times.
However, the new condition for an early exit did not take into account
that it is not necessary to visit predecessors of the loop header.

The algorithm remains correct but does unnecessary work.

Bug: 27856014
Change-Id: If72f8be82e79838f0dd9678082a3bbaabb51e43b
(cherry picked from commit 5a620590e5cf6d6817693edffd661371555de88b)
3f4a522cc39f5c651e7c718196e989bc81d8c6ef 06-May-2016 David Brazdil <dbrazdil@google.com> ART: Fix dominance for irreducible loops

Computation of dominance was broken in the presence of irreducible
loops because the algorithm assumed that back edges are always
dominated by their respective headers and a fix-point iteration is
therefore unnecessary.

This is not true for irreducible loops, forcing us to revisit their
loop headers and all dependent blocks. This patch adds a fix-point
iteration if a back edge not dominated by its header is found.

Bug: 28611485
Change-Id: If84044e49d5b9c682949648033d2861628d7fe05
5a620590e5cf6d6817693edffd661371555de88b 05-May-2016 David Brazdil <dbrazdil@google.com> Stop populating irreducible loop at header

Recent CL modified the (previously exponential) algorithm for
populating irreducible loops so as to not visit blocks multiple times.
However, the new condition for an early exit did not take into account
that it is not necessary to visit predecessors of the loop header.

The algorithm remains correct but does unnecessary work.

Bug: 27856014
Change-Id: If72f8be82e79838f0dd9678082a3bbaabb51e43b
d1d7c40c8004303d1131ebb1956fd0ade54f8404 25-Apr-2016 Aart Bik <ajcbik@google.com> Test component type for errors too.
With regression test.

Rationale:
Moved erroneous check in convenience method, so we
put all the same logic in one place. When testing
for erroneous T[], check both the array type
as well at the component type T for errors
(it is possible T[] is not marked erroneous
even though T is eventually).

BUG=28358598

(cherry picked from commit f417ff44d1eb111854d7a213f106912b3dd9e3d4)

Change-Id: Ieba66aa4b55d8e7ebddf200239c7e4095dfd4678
f417ff44d1eb111854d7a213f106912b3dd9e3d4 25-Apr-2016 Aart Bik <ajcbik@google.com> Test component type for errors too.
With regression test.

Rationale:
Moved erroneous check in convenience method, so we
put all the same logic in one place. When testing
for erroneous T[], check both the array type
as well at the component type T for errors
(it is possible T[] is not marked erroneous
even though T is eventually).

BUG=28358598

Change-Id: I11339a976dc83e0493a99e6bb97f3a058ca3f796
9ef26544e6d9c438b150dfda39be61a7b4fd8b20 20-Apr-2016 Vladimir Marko <vmarko@google.com> Fix HInstruction::ReplaceInput(), allow no-op.

Allow HInstruction::ReplaceInput() to be called with
a `replacement` being the same as the old input and
do nothing in that case.

This is a follow-up to
https://android-review.googlesource.com/216923
where I erroneously assumed that it never happens.

Also adhere to the standard C++ std::forward_list<>
semantics in the single-element overload of
`IntrusiveForwardList<>::splice_after()`.

Bug: 28173563

(cherry picked from commit c6b5627c25ff5653e97ccff8c5ccf6ac967b6f83)

Change-Id: I66b7d4b48f629284d0bcb1d31100519e02a872e5
c6b5627c25ff5653e97ccff8c5ccf6ac967b6f83 20-Apr-2016 Vladimir Marko <vmarko@google.com> Fix HInstruction::ReplaceInput(), allow no-op.

Allow HInstruction::ReplaceInput() to be called with
a `replacement` being the same as the old input and
do nothing in that case.

This is a follow-up to
https://android-review.googlesource.com/216923
where I erroneously assumed that it never happens.

Also adhere to the standard C++ std::forward_list<>
semantics in the single-element overload of
`IntrusiveForwardList<>::splice_after()`.

Bug: 28173563
Change-Id: I5cea14c212b1083f90ffe6b5b53324ad663d57d8
fa7f58935491fad5cb8e5af9ee38d1a4f73e872d 19-Apr-2016 Vladimir Marko <vmarko@google.com> Reuse HUseListNode<>s when replacing instruction or input.

Compiling the Nexus 5 boot image with the 64-bit dex2oat
on host this CL reduces the memory used for compiling the
most hungry method, BatteryStats.dumpLocked(), by ~5.6MiB:

Before:
MEM: used: 44393040, allocated: 45361248, lost: 968208
Number of arenas allocated: 319,
Number of allocations: 815492, avg size: 54
...
UseListNode 10308480
...
After:
MEM: used: 38554536, allocated: 39463008, lost: 908472
Number of arenas allocated: 274,
Number of allocations: 572221, avg size: 67
...
UseListNode 4469976
...

With 32-bit dex2oat, the UseListNode would be 2/3 of the
values for 64-bit dex2oat (both before and after).

Bug: 28173563
Bug: 27856014

(cherry picked from commit 3c19d3e029a9fcc123d2c6fd1e5e13867d2cfe1f)

Change-Id: Iddd42d7545e4f97a13da63590b33711a5bbdd43b
d59f3b1b7f5c1ab9f0731ff9dc60611e8d9a6ede 29-Mar-2016 Vladimir Marko <vmarko@google.com> Use iterators "before" the use node in HUserRecord<>.

Create a new template class IntrusiveForwardList<> that
mimicks std::forward_list<> except that all allocations
are handled externally. This is essentially the same as
boost::intrusive::slist<> but since we're not using Boost
we have to reinvent the wheel.

Use the new container to replace the HUseList and use the
iterators to "before" use nodes in HUserRecord<> to avoid
the extra pointer to the previous node which was used
exclusively for removing nodes from the list. This reduces
the size of the HUseListNode by 25%, 32B to 24B in 64-bit
compiler, 16B to 12B in 32-bit compiler. This translates
directly to overall memory savings for the 64-bit compiler
but due to rounding up of the arena allocations to 8B, we
do not get any improvement in the 32-bit compiler.

Compiling the Nexus 5 boot image with the 64-bit dex2oat
on host this CL reduces the memory used for compiling the
most hungry method, BatteryStats.dumpLocked(), by ~3.3MiB:

Before:
MEM: used: 47829200, allocated: 48769120, lost: 939920
Number of arenas allocated: 345,
Number of allocations: 815492, avg size: 58
...
UseListNode 13744640
...
After:
MEM: used: 44393040, allocated: 45361248, lost: 968208
Number of arenas allocated: 319,
Number of allocations: 815492, avg size: 54
...
UseListNode 10308480
...

Note that while we do not ship the 64-bit dex2oat to the
device, the JIT compilation for 64-bit processes is using
the 64-bit libart-compiler.

Bug: 28173563
Bug: 27856014

(cherry picked from commit 46817b876ab00d6b78905b80ed12b4344c522b6c)

Change-Id: Ifb2d7b357064b003244e92c0d601d81a05e56a7b
7de2439d357710aaf8bfe02b8cf9c196d3c77705 06-Apr-2016 Aart Bik <ajcbik@google.com> Avoid constructing types with errors.

BUG=27626735

Rationale:
Do not construct classes with a link error. Without this,
the error type thought it was Object (mirror's method
IsObjectClass() returns true if there is no superclass).

(cherry picked from commit 8b3f9b246d5bdbf67faeb2b872b75b8d72777bc0)
(also contains follow-up commit 31244b4cde9156632a08103a8bf1cbff4cbae3cc)

Change-Id: I4443779dda47c320115975c1c71b22e118bd8252
3c19d3e029a9fcc123d2c6fd1e5e13867d2cfe1f 19-Apr-2016 Vladimir Marko <vmarko@google.com> Reuse HUseListNode<>s when replacing instruction or input.

Compiling the Nexus 5 boot image with the 64-bit dex2oat
on host this CL reduces the memory used for compiling the
most hungry method, BatteryStats.dumpLocked(), by ~5.6MiB:

Before:
MEM: used: 44393040, allocated: 45361248, lost: 968208
Number of arenas allocated: 319,
Number of allocations: 815492, avg size: 54
...
UseListNode 10308480
...
After:
MEM: used: 38554536, allocated: 39463008, lost: 908472
Number of arenas allocated: 274,
Number of allocations: 572221, avg size: 67
...
UseListNode 4469976
...

With 32-bit dex2oat, the UseListNode would be 2/3 of the
values for 64-bit dex2oat (both before and after).

Bug: 28173563
Change-Id: Ia4fabe03568f0e0dbf2cdf2b031863602aea3530
46817b876ab00d6b78905b80ed12b4344c522b6c 29-Mar-2016 Vladimir Marko <vmarko@google.com> Use iterators "before" the use node in HUserRecord<>.

Create a new template class IntrusiveForwardList<> that
mimicks std::forward_list<> except that all allocations
are handled externally. This is essentially the same as
boost::intrusive::slist<> but since we're not using Boost
we have to reinvent the wheel.

Use the new container to replace the HUseList and use the
iterators to "before" use nodes in HUserRecord<> to avoid
the extra pointer to the previous node which was used
exclusively for removing nodes from the list. This reduces
the size of the HUseListNode by 25%, 32B to 24B in 64-bit
compiler, 16B to 12B in 32-bit compiler. This translates
directly to overall memory savings for the 64-bit compiler
but due to rounding up of the arena allocations to 8B, we
do not get any improvement in the 32-bit compiler.

Compiling the Nexus 5 boot image with the 64-bit dex2oat
on host this CL reduces the memory used for compiling the
most hungry method, BatteryStats.dumpLocked(), by ~3.3MiB:

Before:
MEM: used: 47829200, allocated: 48769120, lost: 939920
Number of arenas allocated: 345,
Number of allocations: 815492, avg size: 58
...
UseListNode 13744640
...
After:
MEM: used: 44393040, allocated: 45361248, lost: 968208
Number of arenas allocated: 319,
Number of allocations: 815492, avg size: 54
...
UseListNode 10308480
...

Note that while we do not ship the 64-bit dex2oat to the
device, the JIT compilation for 64-bit processes is using
the 64-bit libart-compiler.

Bug: 28173563
Change-Id: I985eabd4816f845372d8aaa825a1489cf9569208
3563c44464ca55b2106373b35110e5ecaae38abf 18-Apr-2016 Vladimir Marko <vmarko@google.com> Fix inlining loops in OSR mode.

When compiling a method in OSR mode and the method does not
contain a loop (arguably, a very odd case) but we inline
another method with a loop and then the final DCE re-runs
the loop identification, the inlined loop would previously
be marked as irreducible. However, the SSA liveness analysis
expects irreducible loop to have extra loop Phis which were
already eliminated from the loop before the inner graph was
inlined to the outer graph, so we would fail a DCHECK().

We fix this by not marking inlined loops as irreducible when
compiling in OSR mode.

Bug: 28210356

(cherry picked from commit fd66c50d64c38e40bafde83b4872e27bbff7546d)

Change-Id: I149273b766d1c713c571baad6033c5f70e6dd960
fd66c50d64c38e40bafde83b4872e27bbff7546d 18-Apr-2016 Vladimir Marko <vmarko@google.com> Fix inlining loops in OSR mode.

When compiling a method in OSR mode and the method does not
contain a loop (arguably, a very odd case) but we inline
another method with a loop and then the final DCE re-runs
the loop identification, the inlined loop would previously
be marked as irreducible. However, the SSA liveness analysis
expects irreducible loop to have extra loop Phis which were
already eliminated from the loop before the inner graph was
inlined to the outer graph, so we would fail a DCHECK().

We fix this by not marking inlined loops as irreducible when
compiling in OSR mode.

Bug: 28210356
Change-Id: If10057ed883333c62a878ed2ae3fe01bb5280e33
8b3f9b246d5bdbf67faeb2b872b75b8d72777bc0 06-Apr-2016 Aart Bik <ajcbik@google.com> Avoid constructing types with errors.

BUG=27626735

Rationale:
Do not construct classes with a link error. Without this,
the error type thought it was Object (mirror's method
IsObjectClass() returns true if there is no superclass).

Change-Id: I55ca8cc8cfc042210edf748aab10da4c6e345980
c2e8af9659db7e456b26febb1b971900057ad427 05-Apr-2016 David Brazdil <dbrazdil@google.com> ART: Speed up HGraph::PopulateIrreducibleRecursive

Populating an irreducible loop can potentially traverse all possible
paths through the HGraph, leading to an exponential algorithm.
This patch adds a bit vector of nodes whose membership in the loop
has been decided and need not be revisited again.

Bug: 27856014
Change-Id: I3696f08c846e6f40e5de44cb771811bac7e3e08a
dee58d6bb6d567fcd0c4f39d8d690c3acaf0e432 07-Apr-2016 David Brazdil <dbrazdil@google.com> Revert "Revert "Refactor HGraphBuilder and SsaBuilder to remove HLocals""

This patch merges the instruction-building phases from HGraphBuilder
and SsaBuilder into a single HInstructionBuilder class. As a result,
it is not necessary to generate HLocal, HLoadLocal and HStoreLocal
instructions any more, as the builder produces SSA form directly.

Saves 5-15% of arena-allocated memory (see bug for more data):
GMS 20.46MB => 19.26MB (-5.86%)
Maps 24.12MB => 21.47MB (-10.98%)
YouTube 28.60MB => 26.01MB (-9.05%)

This CL fixed an issue with parsing quickened instructions.

Bug: 27894376
Bug: 27998571
Bug: 27995065

Change-Id: I20dbe1bf2d0fe296377478db98cb86cba695e694
ace7a000a433ce4ecf94f30adea39c01a76fa936 05-Apr-2016 Vladimir Marko <vmarko@google.com> Optimizing: LoadString may not have any side effects.

LoadString does not have any side effects if the string is
known to be in the dex cache or it's a boot image string
referenced directly, as specified by the string load kind.
We need to clear the side effects for these cases to avoid
a DCHECK() failure when such LoadString instruction ends up
between a ClinitCheck and an instruction to which we want to
merge that ClinitCheck. This may happen as a consequence of
inlining, LICM and DCE as shown by a regression test.

Bug: 27929914
Change-Id: I7b3bddf7d8c79ce1828a4a751f1270cf2e3d61f0
60328910cad396589474f8513391ba733d19390b 04-Apr-2016 David Brazdil <dbrazdil@google.com> Revert "Refactor HGraphBuilder and SsaBuilder to remove HLocals"

Bug: 27995065
This reverts commit e3ff7b293be2a6791fe9d135d660c0cffe4bd73f.

Change-Id: I5363c7ce18f47fd422c15eed5423a345a57249d8
e3ff7b293be2a6791fe9d135d660c0cffe4bd73f 02-Mar-2016 David Brazdil <dbrazdil@google.com> Refactor HGraphBuilder and SsaBuilder to remove HLocals

This patch merges the instruction-building phases from HGraphBuilder
and SsaBuilder into a single HInstructionBuilder class. As a result,
it is not necessary to generate HLocal, HLoadLocal and HStoreLocal
instructions any more, as the builder produces SSA form directly.

Saves 5-15% of arena-allocated memory (see bug for more data):
GMS 20.46MB => 19.26MB (-5.86%)
Maps 24.12MB => 21.47MB (-10.98%)
YouTube 28.60MB => 26.01MB (-9.05%)

Bug: 27894376
Change-Id: Iefe28d40600c169c5d306fd2c77034ae19476d90
86ea7eeabe30c98bbe1651a51d03cb89776724e7 16-Feb-2016 David Brazdil <dbrazdil@google.com> Build dominator tree before generating HInstructions

Second CL in the series of merging HGraphBuilder and SsaBuilder. This
patch refactors the builders so that dominator tree can be built
before any HInstructions are generated. This puts the SsaBuilder
removal of HLoadLocals/HStoreLocals straight after HGraphBuilder's
HInstruction generation phase. Next CL will therefore be able to
merge them.

This patch also adds util classes for iterating bytecode and switch
tables which allowed to simplify the code.

Bug: 27894376
Change-Id: Ic425d298b2e6e7980481ed697230b1a0b7904526
cac5a7e871f1f346b317894359ad06fa7bd67fba 22-Feb-2016 Vladimir Marko <vmarko@google.com> Optimizing: Improve const-string code generation.

For strings in the boot image, use either direct pointers
or pc-relative addresses. For other strings, use PC-relative
access to the dex cache arrays for AOT and direct address of
the string's dex cache slot for JIT.

For aosp_flounder-userdebug:
- 32-bit boot.oat: -692KiB (-0.9%)
- 64-bit boot.oat: -948KiB (-1.1%)
- 32-bit dalvik cache total: -900KiB (-0.9%)
- 64-bit dalvik cache total: -3672KiB (-1.5%)
(contains more files than the 32-bit dalvik cache)
For aosp_flounder-userdebug forced to compile PIC:
- 32-bit boot.oat: -380KiB (-0.5%)
- 64-bit boot.oat: -928KiB (-1.0%)
- 32-bit dalvik cache total: -468KiB (-0.4%)
- 64-bit dalvik cache total: -1928KiB (-0.8%)
(contains more files than the 32-bit dalvik cache)

Bug: 26884697
Change-Id: Iec7266ce67e6fedc107be78fab2e742a8dab2696
9eeebf646b118269b3f5bd6fc6c1b0a58c26c6d4 24-Mar-2016 David Brazdil <dbrazdil@google.com> ART: Fix order of operations in HBasicBlock::DisconnectAndDelete

The method would remove predecessors before successors. As a result,
instructions used by dead loop phis would see dangling uses, causing
a DCHECK to fail.

Steps were reordered to remove dependencies in post order.

Bug: 27683071
Change-Id: I8e0e976443fb410908321a065276f1340b757c41
c9b21f83c2eab8bdd16241992193c3049dc68e43 23-Mar-2016 Roland Levillain <rpl@google.com> Fix some typos in art/compiler/optimizing/nodes.cc.

Change-Id: I11be5a9b73da207c9eb497bcaffc49d614c1ca89
f6a35de9eeefb20f6446f1b4815b4dcb0161d09c 21-Mar-2016 Vladimir Marko <vmarko@google.com> Optimizing: Fix register allocator validation memory usage.

Also attribute ArenaBitVector allocations to appropriate
passes. This was used to track down the source of the
excessive memory alloactions.

Bug: 27690481

Change-Id: Ib895984cb7c04e24cbc7abbd8322079bab8ab100
1a65388f1d86bb232c2e44fecb44cebe13105d2e 18-Mar-2016 Roland Levillain <rpl@google.com> Clean up art::HConstant predicates.

- Make the difference between arithmetic zero and zero-bit
pattern non ambiguous.
- Introduce Boolean predicates in art::HIntConstant for when
they are used as Booleans.
- Introduce aritmetic positive and negative zero predicates
for floating-point constants.

Bug: 27639313
Change-Id: Ia04ecc6f6aa7450136028c5362ed429760c883bd
18401b748a3180f52e42547ede22d1b184fe8c43 11-Mar-2016 Nicolas Geoffray <ngeoffray@google.com> Fix invariant in reference type propagation.

Also some cleanups.

Change-Id: I7f0ec7d06b4bab10dbfa230c757447d311658f93
7ba9966b76bbf818513018fa1da72c89330fe384 02-Mar-2016 Serguei Katkov <serguei.i.katkov@intel.com> ART: cleanup exit_block_ in graph if exit block is removed

If we remove the exit block from the graph (for example method
contains infinite loop) we should also clean the field exit_block_
in graph. At least inliner expects it.

Change-Id: Icda668da2233cdd6cd673635a1949f5ed34cf270
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
3f52306b259caed1c654c4b3fd5b594d5ec8d46c 29-Feb-2016 David Brazdil <dbrazdil@google.com> ART: Fix overlapping instruction IDs in inliner

Inliner creates the inner graph so that it generates instruction IDs
higher than the outer graph. This was broken because the inliner
would create instructions in the outer graph before the inner graph
is inlined.

The bug cannot be triggered because the offending instruction would
share the same ID as the first inner HLocal, which is removed before
the inner graph is inlined. The added DCHECKs reveal the hidden problem
and make it safe for HLocals to be removed in the future.

Change-Id: I486eb0f3987e20c50cbec0fb06332229e07fbae9
d65925108fc99607f3b447d8505fe6713acda55c 29-Feb-2016 Nicolas Geoffray <ngeoffray@google.com> Bug fix for polymorphic inlining.

The code used to wrongly propagate try/catch information to
new blocks. Since it has the same logic as Hraph::InlineInto,
extract the code that updates loop and try/catch information
to blocks to a shared method.

bug:27330865
bug:27372101
bug:27360329

(cherry picked from commit a1d8ddfaf09545f99bc326dff97ab604d4574eb6)

Change-Id: Ice0373ec0a1c24d78121634a377f6f502e814cfb
a1d8ddfaf09545f99bc326dff97ab604d4574eb6 29-Feb-2016 Nicolas Geoffray <ngeoffray@google.com> Bug fix for polymorphic inlining.

The code used to wrongly propagate try/catch information to
new blocks. Since it has the same logic as Hraph::InlineInto,
extract the code that updates loop and try/catch information
to blocks to a shared method.

bug:27330865
bug:27372101
bug:27360329

Change-Id: I4386f724d8d412bde5bcc04fda6955bc3bacf5a9
a1de9188a05afdecca8cd04ecc4fefbac8b9880f 25-Feb-2016 Vladimir Marko <vmarko@google.com> Optimizing: Reduce memory usage of HInstructions.

Pack narrow fields and flags into a single 32-bit field.

Change-Id: Ib2f7abf987caee0339018d21f0d498f8db63542d
e53bd8160ad2892f33849108d3b1099992a311fd 24-Feb-2016 Roland Levillain <rpl@google.com> Remove unreachable code paths in constant folding.

Change-Id: I7ffb361711c87f6b1b98d172d2cfdf9b2ba65607
916cc1d504f10a24f43b384e035fdecbe6a74b4c 18-Feb-2016 Nicolas Geoffray <ngeoffray@google.com> Implement polymorphic inlining.

For example, before:
HInvokeVirtual

After:
If (receiver == Foo) {
// inlined code.
} else if (receiver == Bar) {
// inlined code
} else {
// HInvokeVirtual or HDeoptimize(receiver != Baz)
}

Change-Id: I5ce305aef8f39f8294bf2b2bcfe60e0dddcfdbec
31dd3d60491148d345c1edae1ccd090a1b67dd2b 16-Feb-2016 Roland Levillain <rpl@google.com> Extend constant folding to float and double operations.

Change-Id: I2837064b2ceea587bc171fc520507f13355292c6
55bd749991f9a0a73f612696e1a93e739380546b 16-Feb-2016 Nicolas Geoffray <ngeoffray@google.com> Refactor the inliner.

In preparation for more polymorphic inlining, refactor the inliner
a bit.

Change-Id: Ie3fd6c1ef205f1089989c67a527e6f57ff3c8b5d
badd826664896d4a9628a5a89b78016894aa414b 02-Feb-2016 David Brazdil <dbrazdil@google.com> ART: Run SsaBuilder from HGraphBuilder

First step towards merging the two passes, which will later result in
HGraphBuilder directly producing SSA form. This CL mostly just updates
tests broken by not being able to inspect the pre-SSA form.

Using HLocals outside the HGraphBuilder is now deprecated.

Bug: 27150508
Change-Id: I00fb6050580f409dcc5aa5b5aa3a536d6e8d759e
86503785cd6414b8692e5c83cadaa2972b6a099b 11-Feb-2016 Roland Levillain <rpl@google.com> Fix x86-64 Baker's read barrier fast path for CheckCast.

Use an art::x86_64::Label instead of an
art::x86_64::NearLabel as end label when emitting code for a
HCheckCast instruction, as the range of the latter may
sometimes be too short when Baker's read barriers are
enabled.

Bug: 12687968
Change-Id: Ia9742dce65be7d4fb104688f3c4717b65df1fb54
b331febbab8e916680faba722cc84b66b84218a3 05-Feb-2016 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Implement on-stack replacement for arm/arm64/x86/x86_64.""

This reverts commit bd89a5c556324062b7d841843b039392e84cfaf4.

Change-Id: I08d190431520baa7fcec8fbdb444519f25ac8d44
bd89a5c556324062b7d841843b039392e84cfaf4 05-Feb-2016 David Brazdil <dbrazdil@google.com> Revert "Implement on-stack replacement for arm/arm64/x86/x86_64."

DCHECK whether loop headers are covered fails.

This reverts commit 891bc286963892ed96134ca1adb7822737af9710.

Change-Id: I0f9a90630b014b16d20ba1dfba31ce63e6648021
891bc286963892ed96134ca1adb7822737af9710 29-Jan-2016 Nicolas Geoffray <ngeoffray@google.com> Implement on-stack replacement for arm/arm64/x86/x86_64.

High-level overview:
- osr_method_threshold is used to know when to compile a method
in osr mode (-> treat all loops as irreducible).
- branch instructions in the compiler query whether they can
jump to an osr method.
- An osr entry point is found through the stack maps: if a stack
map is duplicated in the CodeInfo, it is an osr entry point.

Change-Id: Ifb39338cd281e2c7eccce67f4e18d46428be71e4
3f1a8be7c9511afbc1ea0ce2e76a018269382336 03-Feb-2016 Aart Bik <ajcbik@google.com> Fixed bug on premature DCHECK.
With fail-before/pass-after test

bug=26947011

Rationale:
During BCE, the phi structure is under construction,
to be fixed by InsertPhiNodes() and carefully checked
with the SSA checker. So utilities should not overly
DCHECK on SSA consistency during the modifications.

Change-Id: Ia9df9ee5aac0c1dd2c3e3a447c730246d5e48bbb
74eb1b264691c4eb399d0858015a7fc13c476ac6 14-Dec-2015 David Brazdil <dbrazdil@google.com> ART: Implement HSelect

This patch adds a new HIR instruction to Optimizing. HSelect returns
one of two inputs based on the outcome of a condition.

This is only initial implementation which:
- defines the new instruction,
- repurposes BooleanSimplifier to emit it,
- extends InstructionSimplifier to statically resolve it,
- updates existing code and tests accordingly.

Code generators currently emit fallback if/then/else code and will be
updated in follow-up CLs to use platform-specific conditional moves
when possible.

Change-Id: Ib61b17146487ebe6b55350c2b589f0b971dcaaee
09e8d5ffe52c738c6a74984b1cbc7ad4bc8f5e2c 23-Jan-2016 Aart Bik <ajcbik@google.com> Some minor simplifications in code and tests.

Background:
This is actually a resubmit of an earlier cl that was
reverted because was test was less robust against
inlining changes (it assumed a virtual call would
never be inlined).

original cl: If8ada79dfd70bea991c11d2b18661b951b6c4cd4
revert cl: I739aaaccd0509d02a62ef01e797a6d45bfe941df

Change-Id: I952680d60ff488874907f066bfdf156a45b409ba
788f2f05c3e5b0e5bda247b00e34f0094585546f 22-Jan-2016 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Inline methods with loops.""

Bug: 26689526

This reverts commit 451ad8d1be9a1949ea3c3e3a713a9e76198a8b2d.

Change-Id: If484fe4c0744254dd7568fd5006e574d621a1855
69fd1b56021ac62c17e188bd0d4dd22fc911558e 22-Jan-2016 Nicolas Geoffray <ngeoffray@google.com> Revert "Some minor simplifications in code and tests."

Fails 530-checker-loops on arm

This reverts commit bf03fcd10a3ffa15468d335f26697b0473e45b36.

Change-Id: I739aaaccd0509d02a62ef01e797a6d45bfe941df
bf03fcd10a3ffa15468d335f26697b0473e45b36 04-Jan-2016 Aart Bik <ajcbik@google.com> Some minor simplifications in code and tests.

Rationale: fell through the cracks of previous "intrinsics" CL.

Change-Id: If8ada79dfd70bea991c11d2b18661b951b6c4cd4
a0ee77157ee6ceb72292c20bc299c1d24fe95a39 20-Jan-2016 Andreas Gampe <agampe@google.com> Revert "Inline methods with loops."

This reverts commit 82fc9bb45dbf8ff728122fb7ab72d1eb7b2f4869.

Loop inlining exposes issues with BCE.

Bug: 26689526

(cherry picked from commit 451ad8d1be9a1949ea3c3e3a713a9e76198a8b2d)

Change-Id: Ie6f260e6a224aeb7f5ed93df378b6cefba10d35f
451ad8d1be9a1949ea3c3e3a713a9e76198a8b2d 20-Jan-2016 Andreas Gampe <agampe@google.com> Revert "Inline methods with loops."

This reverts commit 82fc9bb45dbf8ff728122fb7ab72d1eb7b2f4869.

Loop inlining exposes issues with BCE.

Bug: 26689526
Change-Id: Id9983d7f9d3c5579d91e56e4699d4d939517b2dc
82fc9bb45dbf8ff728122fb7ab72d1eb7b2f4869 19-Jan-2016 Nicolas Geoffray <ngeoffray@google.com> Inline methods with loops.

Compiling Gms/Fb/Framework/Docs:
- Overall compilation-time increase: 2.2%
- Overall code size increase: 1.1%

Performance improvements:
- Richards with jit: +6%
- Takl: +11%

Change-Id: I0a6fcf2a360e5ad193cd95b5c4fe92227ac6bd96
09aa147f0891ef28a95d89e8ad61c429f82ddd5b 19-Jan-2016 Nicolas Geoffray <ngeoffray@google.com> Disable DCE when there are irreducible loops.

Also ensure an instruction that requires an environment
does have one.

Change-Id: I41a8460e05ef320f872197d3be7847e7ffaa6ee8
6de1938e562b0d06e462512dd806166e754035ea 08-Jan-2016 David Brazdil <dbrazdil@google.com> ART: Remove incorrect HFakeString optimization

Simplification of HFakeString assumes that it cannot be used until
String.<init> is called which is not true and causes different
behaviour between the compiler and the interpreter. This patch
removes the optimization together with the HFakeString instruction.

Instead, HNewInstance is generated and an empty String allocated
until it is replaced with the result of the StringFactory call. This
is consistent with the behaviour of the interpreter but is too
conservative. A follow-up CL will attempt to optimize out the initial
allocation when possible.

Bug: 26457745
Bug: 26486014

Change-Id: I7139e37ed00a880715bfc234896a930fde670c44
15bd22849ee6a1ffb3fb3630f686c2870bdf1bbc 05-Jan-2016 Nicolas Geoffray <ngeoffray@google.com> Implement irreducible loop support in optimizing.

So we don't fallback to the interpreter in the presence of
irreducible loops.

Implications:
- A loop pre-header does not necessarily dominate a loop header.
- Non-constant redundant phis will be kept in loop headers, to
satisfy our linear scan register allocation algorithm.
- while-graph optimizations, such as gvn, licm, lse, and dce
need to know when they are dealing with irreducible loops.

Change-Id: I2cea8934ce0b40162d215353497c7f77d6c9137e
a3eca2d7300f35c66cf4b696d788a8b7ba74eb99 12-Jan-2016 Nicolas Geoffray <ngeoffray@google.com> Do not leave intermediate addresses across Java calls.

bug:26472446
Change-Id: Ie4a9b5fe6f1d61a76c71eceaa2299fe55512c612
c928591f5b2c544751bb3fb26dc614d3c2e67bef 18-Dec-2015 Roland Levillain <rpl@google.com> ARM Baker's read barrier fast path implementation.

Introduce an ARM fast path implementation in Optimizing for
Baker's read barriers (for both heap reference loads and GC
root loads). The marking phase of the read barrier is
performed by a slow path, invoking the runtime entry point
artReadBarrierMark.

Other read barrier algorithms continue to use the original
slow path based implementation, which has been renamed as
GenerateReadBarrierSlow/GenerateReadBarrierForRootSlow.

Bug: 12687968
Change-Id: Ie7ee85b1b4c0564148270cebdd3cbd4c3da51b3a
f555258861aea7df8af9c2241ab761227fd2f66a 27-Dec-2015 David Brazdil <dbrazdil@google.com> ART: Create BoundType for CheckCast early

ReferenceTypePropagation creates a BoundType for each CheckCast and
replaces all dominated uses of the casted object with it. This does
not include Phi uses on the boundary of the dominated scope, reducing
typing precision. This patch creates the BoundType in Builder, causing
SsaBuilder to replace uses of the object automatically.

Bug: 26081304

Change-Id: I083979155cccb348071ff58cb9060a896ed7d2ac
4833f5a1990c76bc2be89504225fb13cca22bedf 16-Dec-2015 David Brazdil <dbrazdil@google.com> ART: Refactor SsaBuilder for more precise typing info

This reverts commit 68289a531484d26214e09f1eadd9833531a3bc3c.

Now uses Primitive::Is64BitType instead of Primitive::ComponentSize
because it was incorrectly optimized by GCC.

Bug: 26208284
Bug: 24252151
Bug: 24252100
Bug: 22538329
Bug: 25786318

Change-Id: Ib39f3da2b92bc5be5d76f4240a77567d82c6bebe
5d75afe333f57546786686d9bee16b52f1bbe971 14-Dec-2015 Aart Bik <ajcbik@google.com> Improved side-effects/can-throw information on intrinsics.

Rationale: improved side effect and exception analysis gives
many more opportunities for GVN/LICM/BCE.

Change-Id: I8aa9b757d77c7bd9d58271204a657c2c525195b5
5f7b58ea1adfc0639dd605b65f59198d3763f801 23-Nov-2015 Vladimir Marko <vmarko@google.com> Rewrite HInstruction::Is/As<type>().

Make Is<type>() and As<type>() non-virtual for concrete
instruction types, relying on GetKind(), and mark GetKind()
as PURE to improve optimization opportunities. This reduces
the number of relocations in libart-compiler.so's .rel.dyn
section by ~4K, or ~44%, and in .data.rel.ro by ~18K, or
~65%. The file is 96KiB smaller for Nexus 5, including 8KiB
reduction of the .text section.

Unfortunately, the g++/clang++ __attribute__((pure)) is not
strong enough to avoid duplicated virtual calls and we would
need the C++ [[pure]] attribute proposed in n3744 instead.
To work around this deficiency, we introduce an extra
non-virtual indirection for GetKind(), so that the compiler
can optimize common expressions such as
instruction->IsAdd() || instruction->IsSub()
or
instruction->IsAdd() && instruction->AsAdd()->...
which contain two virtual calls to GetKind() after inlining.

Change-Id: I83787de0671a5cb9f5b0a5f4a536cef239d5b401
68289a531484d26214e09f1eadd9833531a3bc3c 16-Dec-2015 Alex Light <allight@google.com> Revert "ART: Refactor SsaBuilder for more precise typing info"

This reverts commit d9510dfc32349eeb4f2145c801f7ba1d5bccfb12.

Bug: 26208284

Bug: 24252151
Bug: 24252100
Bug: 22538329
Bug: 25786318

Change-Id: I5f491becdf076ff51d437d490405ec4e1586c010
d9510dfc32349eeb4f2145c801f7ba1d5bccfb12 05-Nov-2015 David Brazdil <dbrazdil@google.com> ART: Refactor SsaBuilder for more precise typing info

This patch refactors the SsaBuilder to do the following:

1) All phis are constructed live and marked dead if not used or proved
to be conflicting.

2) Primitive type propagation, now not a separate pass, identifies
conflicting types and marks corresponding phis dead.

3) When compiling --debuggable, DeadPhiHandling used to revive phis
which had only environmental uses but did not attempt to resolve
conflicts. This pass was removed as obsolete and is now superseded
by primitive type propagation (identifying conflicting phis) and
SsaDeadPhiEliminiation (keeping phis live if debuggable + env use).

4) Resolving conflicts requires correct primitive type information
on all instructions. This was not the case for ArrayGet instructions
which can have ambiguous types in the bytecode. To this end,
SsaBuilder now runs reference type propagation and types ArrayGets
from the type of the input array.

5) With RTP being run inside the SsaBuilder, it is not necessary to
run it as a separate optimization pass. Optimizations can now assume
that all instructions of type kPrimNot have reference type info after
SsaBuilder (with the exception of NullConstant).

6) Graph now contains a reference type to be assigned to NullConstant.
All reference type instructions therefore have RTI, as now enforced
by the SsaChecker.

Bug: 24252151
Bug: 24252100
Bug: 22538329
Bug: 25786318

Change-Id: I7a3aee1ff66c82d64b4846611c547af17e91d260
04ff4e8463ac68752638d305eeb84b457fd8289c 10-Dec-2015 David Brazdil <dbrazdil@google.com> ART: Fix bug in DCE not removing phis from catch phi uses

Due to the missing edges between throwing instructions and catch phis
DCE needs to manually remove dead instructions from catch phi users,
being overly conservative if the inputs are not in the dead blocks.
DCE used to do this for normal instructions, but it needs to do the
same for phis.

Change-Id: I7edfcb84ec6ff7303945d5d5cd436b1d1e95df2a
4b467ed97bc5886fb800209c0ee94df10163b88d 20-Nov-2015 Mingyao Yang <mingyao@google.com> Simplify and rename IsLoopInvariant() test.

Simplify IsLoopInvariant() test. Also rename it to IsDefinedOutOfTheLoop()
so there is no ambiguity for example whether a instruction after the loop counts
as a loop invariant. It's up to the caller to make the interpretation.

Change-Id: I999139032b0e4d815dd1e2276f2bd428cf558686
f64242a30c6e05a8e4302a64eab4bcc28297dc9e 01-Dec-2015 Vladimir Marko <vmarko@google.com> Optimizing: Add checker tests for sharpening.

This is a follow-up to
https://android-review.googlesource.com/184116 .

Change-Id: Ib03c424fb673afc5ccce15d7d072b7572b47799a
fb337ea53d1e6fe68b24217a9ea95e9f544ef697 25-Nov-2015 Vladimir Marko <vmarko@google.com> Move PC-relative addressing bases to a better position.

Move the platform-specific HX86ComputeBaseMethodAddress and
HArmDexCacheArraysBase to the latest dominator of their uses
outside any loop. This brings the base closer to the first
use (previously, it was in the entry block) and relieves
some pressure on the register allocator while avoiding
recalculation of the base in a loop.

Change-Id: I231aa81eb5b4de9af2d0167054d06b65eb18a636
5c0048565e78ff53fd2b3a2e446c72ea2fffe239 23-Nov-2015 David Brazdil <dbrazdil@google.com> ART: Fix uninitialized variable

Change-Id: I906de334b3c3cb1e36eff4944457f4598b7c174f
3fc7f357170311689c4c31007a5e168ddea321d5 21-Nov-2015 Aart Bik <ajcbik@google.com> Accept synthetic phi nodes and general names for blocks.

Rationale: these changes were already approved as part of the dynamic
bce changes, but I am now sending them out separately.

Change-Id: I3564bac9f6a0b6a89466457836ff54ad09164faf
d59c70627cc42878cc30b46bd29ff497b4483b22 21-Nov-2015 Aart Bik <ajcbik@google.com> Revert "Dynamic BCE (based on induction range analysis)"

This reverts commit 0b5849be045c5683d4a6b6b6c306abadba5f0fcc.


Change-Id: Id33f5da42bbdfb1aff7e2281417c8a7aa492df05
Rationale: so close :-( but bullhead-userdebug (linux) build in git_mnc-dr-dev-plus-aosp reported a breakage with a type inconsistency (long vs int in probably the codegen of dynamic bce); no time to investigate and fix this fully before my trip, so rolling back for now
0b5849be045c5683d4a6b6b6c306abadba5f0fcc 19-Oct-2015 Aart Bik <ajcbik@google.com> Dynamic BCE (based on induction range analysis)

Rationale: A rewritten dynamic BCE that uses induction variable analysis
to generate the run-time tests before a loop in order to
eliminate bounds-checks from its body. This CL removes now
obsoleted induction related code inside the BCE module.
Also, the dynamic test generation is placed more strategically,
since we missed a few cases where static analysis does better.

Most significant performance improvements (after filtering noise) is about:
Linpack +20%
LU > +10%

Change-Id: I4e7b8bab0288beff6f98a14856e3536103d32742
f652917de5634b30c974c81d35a72871915b352a 17-Nov-2015 Mark Mendell <mark.p.mendell@intel.com> Simplify boolean condition compared to 0

CaffeineMarkRR Logic has some boolean flipping which can be helped by
some simplification.

Simplify non-FP (A COND_OP B) != 0 to A OPPOSITE_COND_OP B.
This is better than the original code, which would use a HBooleanNot
after the condition.

Also simplify non-FP (A COND_OP B) == 1 to A OPPOSITE_COND_OP B.

Move GetOppositeCondition to nodes.h/nodes.cc to share with Boolean
Simplification, renaming it to InsertOppositeCondition, as it inserts
the new HInstruction (unless it is a constant).

Change-Id: I34ded7758836e375de0d6fdba9239d2d451928d0
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
fbb184a1c6df22d9302b32b55206396c8278edcf 13-Nov-2015 Vladimir Marko <vmarko@google.com> Fix ClinitCheck pruning.

Make sure we merge the ClinitCheck only with LoadClass and
HInvokeStaticOrDirect that is a part of the very same dex
instruction. This fixes incorrect stack traces from class
initializers (wrong dex pcs).

Rewrite the pruning to do all the ClinitCheck merging when
we see the ClinitCheck, instead of merging ClinitCheck into
LoadClass and then LoadClass into HInvokeStaticOrDirect.
When we later see an HInvokeStaticOrDirect with an explicit
check (i.e. not merged), we know that some other instruction
is doing the check and the invoke doesn't need to, so we
mark it as not requiring the check at all. (Previously it
would have been marked as having an implicit check.)

Remove the restriction on merging with inlined invoke static
as this is not necessary anymore. This was a workaround for
X.test():
invoke-static C.foo() [1]
C.foo():
invoke-static C.bar() [2]
After inlining and GVN we have
X.test():
LoadClass C (from [1])
ClinitCheck C (from [1], to be merged to LoadClass)
InvokeStaticOrDirect C.bar() (from [2])
and the LoadClass must not be merged into the invoke as this
would cause the resolution trampoline to see an inlined
frame from the not-yet-loaded class C during the stack walk
and try to load the class. However, we're not allowed to
load new classes at that point, so an attempt to do so leads
to an assertion failure. With this CL, LoadClass is not
merged when it comes from a different instruction, so we can
guarantee that all inlined frames seen by the stack walk in
the resolution trampoline belong to already loaded classes.

Change-Id: I2b8da8d4f295355dce17141f0fab2dace126684d
0f7dca4ca0be8d2f8776794d35edf8b51b5bc997 02-Nov-2015 Vladimir Marko <vmarko@google.com> Optimizing/X86: PC-relative dex cache array addressing.

Add PC-relative dex cache array addressing for X86 and use
it for better invoke-static/-direct dispatch. Also delay
the initialization to the PC-relative base until needed.

Change-Id: Ib8634d5edce4920cd70172fd13211809cf6948d1
d26a411adee1e71b3f09dd604ab9b23018037138 10-Nov-2015 David Brazdil <dbrazdil@google.com> ART: Refactor iteration over normal/exceptional successors

Add helper methods on HBasicBlock which return ArrayRef with the
suitable sub-array of the `successors_` list.

Change-Id: I66c83bb56f2984d7550bf77c48110af4087515a8
9e23df5c21bed53ead79e3131b67105abc8871e4 10-Nov-2015 Vladimir Marko <vmarko@google.com> Optimizing: Improve constant folding + DCE for inlining.

Run constant folding before DCE in inliner to eliminate more
code that can prevent inlining. Improve the constant folding
to evaluate Equals and NotEquals for null inputs.

Change-Id: I876ffb903ef39484370b6c8793f0f8467a977362
59a850ee63a637660599f215901058b059f3a4b4 10-Nov-2015 David Brazdil <dbrazdil@google.com> ART: Improve comment about inlining into try/catch

Change-Id: I66a4fd3206847c8d5bb57b1678d9d3dc94331294
dc0d1eb8a4b033db4acf136cd563de865542098a 10-Nov-2015 David Brazdil <dbrazdil@google.com> ART: Add clarifying comment

Change-Id: I189ec4cae0aa1a5245a79e86d1ec0592e38eac4a
8a7c0fe837bb00b02dfcfc678910d81d07fb2136 02-Nov-2015 David Brazdil <dbrazdil@google.com> Revert "Revert "ART: Update DCE to work with try/catch""

The previous CL failed because it did not update inputs of catch phis.
Since phi input indices cannot be easily mapped back to throwing
instructions, this new implementation at least removes catch phi uses
of values defined in the removed blocks to preserve graph consistency.

This reverts commit fb552d7061746f7a90fdd5002696e255e2e15c35.

Change-Id: I63d95915d1ef50e71d3bcf0cd10aaded554035b4
391d01f3e8dedf3af3727bdf5d5b76ab35d52795 06-Nov-2015 Vladimir Marko <vmarko@google.com> Optimizing: Rewrite search for common dominators.

Provide a utility class that can be used to quickly search
for common dominators of two or more blocks. Change the
algorithm to avoid memory allocations.

Change-Id: Id72c975fc42377cb7622902f87c4262ea7b3cc38
db51efb3617d15f1cd9e5ff0cc2d934777014e9a 06-Nov-2015 David Brazdil <dbrazdil@google.com> ART: Fix critical edge splitting under try/catch

A critical edge would not be split if the predecessor ends with
TryBoundary. This would eventually trip liveness analysis because
a back edge block would have smaller liveness position than a nested
loop.

Another implication of this change is that an edge between a loop's
pre-header ending with TryBoundary and the header will be split,
guaranteeing that a pre-header always has just one successor.

Bug: 25493695
Bug: 25454012
Change-Id: I5a13b8bb74509b48f5d628906f7158af007f99ae
b554b5a5ae3cdc66969d61be20783a8af816206e 06-Nov-2015 Vladimir Marko <vmarko@google.com> Optimizing: Remove unused ArtMethod* input from HInvokeStaticOrDirect.

Change-Id: Iea99fa683440673ff517e246f35fade96600f229
9bc436160b4af99067973affb0b1008de9a2b04c 05-Nov-2015 David Brazdil <dbrazdil@google.com> ART: Fix simplification of catch blocks in the presence of dead code

Simplification of catch blocks transforms the code so that catch
blocks have only exceptional predecessors. However, it is invoked
before trivially dead code is eliminated which breaks simple
assumptions such as the fact that a catch block cannot start with
move-exception if it has non-exceptional predecessors. This patch
fixes the algorithm to work under these relaxed conditions.

Bug: 25494450
Bug: 25492628
Change-Id: Idc8d010102a4b8b9a6cd918b98d6e11d1838db0c
fb552d7061746f7a90fdd5002696e255e2e15c35 02-Nov-2015 David Brazdil <dbrazdil@google.com> Revert "ART: Update DCE to work with try/catch"

This reverts commit ce52901e2c8377fc1c331ae0faf7fbcb46b9da97.

Change-Id: I6b3a1f2a3dc036030b089b0df2005ecefa66b949
ce52901e2c8377fc1c331ae0faf7fbcb46b9da97 29-Oct-2015 David Brazdil <dbrazdil@google.com> ART: Update DCE to work with try/catch

Dead block elimination was previously disabled because it needed
to be updated. With this patch, try/catch blocks can be removed
as a result of a dead if/switch branch.

Change-Id: I3261060bf24fd5fe7bb0f989247f0ef62ec5fd7b
951779839f0d35ed5336f399c8f521fd9a6b7c27 30-Oct-2015 David Brazdil <dbrazdil@google.com> ART: Enable inlining under try/catch

This patch updates the inliner to set try/catch information
when inlining into a method with try/catch. It does not yet
allow inlining of methods with try/catch because that will
require generating catch stack maps with inline info.

Change-Id: I7d57e1454e7da537d75c5c7eda60b22f3a30fa60
771e5cc519665ce0cc76985bb4803f0dd50c3b40 29-Oct-2015 David Brazdil <dbrazdil@google.com> Revert "ART: Enable more passes under try/catch"

BCE does not set TryCatchInformation when creating new blocks.
Will be fixed with DynamicBCE CL.

This reverts commit 39fabd6bb6fcf7a712b370d3b6fd0ada83e2e5d8.

Change-Id: I76ae707ac132bb1a4a9f64f4916ffcd786ef730c
73f1f3be46652d3f6df61b4234c366ebbf81274a 28-Oct-2015 Aart Bik <ajcbik@google.com> Move loop invariant utility to more general place.

Change-Id: I15ebfbf9684f0fcce9e63d078ff8dc1381fd1ca3
39fabd6bb6fcf7a712b370d3b6fd0ada83e2e5d8 26-Oct-2015 David Brazdil <dbrazdil@google.com> ART: Enable more passes under try/catch

LICM, BCE, LSE are all safe under try/catch. Inliner and DCE
need updating and will be enabled in follow-up CLs.

Change-Id: I86db5f811257d5e765fea91666a2a2af0fb24ec3
dc151b2346bb8a4fdeed0c06e54c2fca21d59b5d 15-Oct-2015 Vladimir Marko <vmarko@google.com> Optimizing: Determine invoke-static/-direct dispatch early.

Determine the dispatch type of invoke-static/-direct in a
special pass right after the type inference. This allows the
inliner to pass the "needs dex cache" check and inline more.
It also allows the code generator to avoid requesting a
register location for the ArtMethod* for kDexCachePcRelative
and direct methods.

The supported dispatch check handles also situations that
the CompilerDriver currently doesn't allow. The cleanup of
the CompilerDriver and required changes to Quick will come
in a separate change.

Change-Id: I3f8e903a119949e95871d8ab0a995f4731a13a07
214bbcd1d7454197427c13cc082860619357d847 20-Oct-2015 Calin Juravle <calin@google.com> Inliner: make sure the returned value is in the outer graph.

The returned value may be a constant or a parameter value. If so, it
will be in the inlined entry_block and (before this CL) we would not
update its block or graph. This CL fixes this and makes sure that the
returned value belongs to the outer graph.

Change-Id: Ie296f0d5a320c33f39eb187df6d328371ccf6500
805b3b56c6eb542298db33e0181f135dc9fed3d9 18-Sep-2015 Mark Mendell <mark.p.mendell@intel.com> X86 jump tables for PackedSwitch

Implement X86PackedSwitch using a jump table of offsets to blocks. The
X86PackedSwitch version just adds an input to address the constant area.

Change-Id: Id2752a1ee79222493040c6fd0e59aee9a544b76a
Bug: 21119474
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
ec7802a102d49ab5c17495118d4fe0bcc7287beb 01-Oct-2015 Vladimir Marko <vmarko@google.com> Add DCHECKs to ArenaVector and ScopedArenaVector.

Implement dchecked_vector<> template that DCHECK()s element
access and insert()/emplace()/erase() positions. Change the
ArenaVector<> and ScopedArenaVector<> aliases to use the new
template instead of std::vector<>. Remove DCHECK()s that
have now become unnecessary from the Optimizing compiler.

Change-Id: Ib8506bd30d223f68f52bd4476c76d9991acacadc
a83a54d7f2322060f08480f8aabac5eb07268912 02-Oct-2015 Nicolas Geoffray <ngeoffray@google.com> Add support for intrinsic optimizations.

Change-Id: Ib5a4224022f9360e60c09a19ac8642270a7f3b64
225b6464a58ebe11c156144653f11a1c6607f4eb 28-Sep-2015 Vladimir Marko <vmarko@google.com> Optimizing: Tag arena allocations in code generators.

And completely remove the deprecated GrowableArray.

Replace GrowableArray with ArenaVector in code generators
and related classes and tag arena allocations.

Label arrays use direct allocations from ArenaAllocator
because Label is non-copyable and non-movable and as such
cannot be really held in a container. The GrowableArray
never actually constructed them, instead relying on the
zero-initialized storage from the arena allocator to be
correct. We now actually construct the labels.

Also avoid StackMapStream::ComputeDexRegisterMapSize() being
passed null references, even though unused.

Change-Id: I26a46fdd406b23a3969300a67739d55528df8bf4
d7558daaa86decf5a38f4f9bcd82267ab6e3e17f 22-Sep-2015 David Brazdil <dbrazdil@google.com> ART: Preserve loop headers with try/catch

Algorithm for inserting HTryBoundary instructions would generate a
non-natural loop when a loop header block was covered by a TryItem.
This patch changes the approach to fix the issue.

Bug: 23895756
Change-Id: I0e1ee6cf135cea326a96c97954907d202c9793cc
1f8695ca0c0d443a3d2754637ea5c9459147af55 24-Sep-2015 Vladimir Marko <vmarko@google.com> Optimizing: Rewrite HGraph::FindBackEdges().

Replace a recursive implementation with a loop using a work
list to avoid stack overflow for 702-LargeBranchOffset in
host debug build with -O0, 512KiB thread pool worker stack.

Change-Id: Iaa91f006fa1099913aeffc9c764879bd004d56de
d76d1390b04a4db9ca1f74eb4873d926643d979b 23-Sep-2015 Vladimir Marko <vmarko@google.com> Optimizing: Rewrite HGraph::ComputeDominanceInformation().

Replace a recursive implementation with a loop using a work
list to avoid stack overflow for 702-LargeBranchOffset in
host debug build with -O0.

Bug: 24133462
Change-Id: I444cc85733a9212403a071ea98b9ddfb52bfc402
fe57faa2e0349418dda38e77ef1c0ac29db75f4d 18-Sep-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Add basic PackedSwitch support

Add HPackedSwitch, and generate it from the builder. Code generators
convert this to a series of compare/branch tests. Better implementation
in the code generators as a real jump table will follow as separate CLs.

Change-Id: If14736fa4d62809b6ae95280148c55682e856911
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
b7d8e8cf7063fdec1cce6ebd33e33804976bd978 17-Sep-2015 Vladimir Marko <vmarko@google.com> Optimizing: Do not use range-based loop when inserting elements.

When we iterate over the elements of a container and we may
insert new elements into that container, it's wrong to use
the range-based loop.

Bug: 24133462
Change-Id: Iee35fbcf88ed3bcd6155cbeba09bd256032a16be
71bf8090663d02869cafafdd530976f7f2a9db7f 15-Sep-2015 Vladimir Marko <vmarko@google.com> Optimizing: Tag arena allocations in SsaBuilder.

Replace GrowableArray with ArenaVector in SsaBuilder and
tag allocations with a new arena allocation type.

Change-Id: I27312c51d7be9d2ad02a974cce93b365c65c5fc4
fa6b93c4b69e6d7ddfa2a4ed0aff01b0608c5a3a 15-Sep-2015 Vladimir Marko <vmarko@google.com> Optimizing: Tag arena allocations in HGraph.

Replace GrowableArray with ArenaVector in HGraph and related
classes HEnvironment, HLoopInformation, HInvoke and HPhi,
and tag allocations with new arena allocation types.

Change-Id: I3d79897af405b9a1a5b98bfc372e70fe0b3bc40d
77a48ae01bbc5b05ca009cf09e2fcb53e4c8ff23 15-Sep-2015 David Brazdil <dbrazdil@google.com> Revert "Revert "ART: Register allocation and runtime support for try/catch""

The original CL triggered b/24084144 which has been fixed
by Ib72e12a018437c404e82f7ad414554c66a4c6f8c.

This reverts commit 659562aaf133c41b8d90ec9216c07646f0f14362.

Change-Id: Id8980436172457d0fcb276349c4405f7c4110a55
baf89b8f2591df93686df545bc6ccc6eb3b128cc 15-Sep-2015 David Brazdil <dbrazdil@google.com> ART: Fix bug in reference type propagation

Reference type propagation assumed that type is exact if the class is
final. This does not hold for arrays which are always final and the
component type needs to be considered.

Bug: 24084144
Change-Id: Ib72e12a018437c404e82f7ad414554c66a4c6f8c
659562aaf133c41b8d90ec9216c07646f0f14362 14-Sep-2015 David Brazdil <dbrazdil@google.com> Revert "ART: Register allocation and runtime support for try/catch"

Breaks libcore test org.apache.harmony.security.tests.java.security.KeyStorePrivateKeyEntryTest#testGetCertificateChain. Need to investigate.

This reverts commit b022fa1300e6d78639b3b910af0cf85c43df44bb.

Change-Id: Ib24d3a80064d963d273e557a93469c95f37b1f6f
b022fa1300e6d78639b3b910af0cf85c43df44bb 20-Aug-2015 David Brazdil <dbrazdil@google.com> ART: Register allocation and runtime support for try/catch

This patch completes a series of CLs that add support for try/catch
in the Optimizing compiler. With it, Optimizing can compile all
methods containing try/catch, provided they don't contain catch loops.
Future work will focus on improving performance of the generated code.

SsaLivenessAnalysis was updated to propagate liveness information of
instructions live at catch blocks, and to keep location information on
instructions which may be caught by catch phis.

RegisterAllocator was extended to spill values used after catch, and
to allocate spill slots for catch phis. Catch phis generated for the
same vreg share a spill slot as the raw value must be the same.

Location builders and slow paths were updated to reflect the fact that
throwing an exception may not lead to escaping the method.

Instruction code generators are forbidden from using of implicit null
checks in try blocks as live registers need to be saved before handing
over to the runtime.

CodeGenerator emits a stack map for each catch block, storing locations
of catch phis. CodeInfo and StackMapStream recognize this new type of
stack map and store them separate from other stack maps to avoid dex_pc
conflicts.

After having found the target catch block to deliver an exception to,
QuickExceptionHandler looks up the dex register maps at the throwing
instruction and the catch block and copies the values over to their
respective locations.

The runtime-support approach was selected because it allows for the
best performance in the normal control-flow path, since no propagation
of catch phi values is necessary until the exception is thrown. In
addition, it also greatly simplifies the register allocation phase.

ConstantHoisting was removed from LICMTest because it instantiated
(now abstract) HConstant and was bogus anyway (constants are always in
the entry block).

Change-Id: Ie31038ad8e3ee0c13a5bbbbaf5f0b3e532310e4e
3ecfd65143d95bd7c6cbe4f58c33af517d3761e0 07-Sep-2015 Yevgeny Rouban <yevgeny.y.rouban@intel.com> Add dex_pc to all HInstructions in builder.

Optimizing compiler generates minimum debug line info that
is built using the dex_pc information about suspend points.
This is not enough for performance and debugging needs.

This patch makes all HInstructions contain
dex_pc and all allocations in the builder define this value.

Change-Id: I1d14aefe075189b7b1b41b4384c3499474c19afc
Signed-off-by: Yevgeny Rouban <yevgeny.y.rouban@intel.com>
Signed-off-by: Serdjuk, Nikolay Y <nikolay.y.serdjuk@intel.com>
6058455d486219994921b63a2d774dc9908415a2 03-Sep-2015 Vladimir Marko <vmarko@google.com> Optimizing: Tag basic block allocations with their source.

Replace GrowableArray with ArenaVector in HBasicBlock and,
to track the source of allocations, assign one new and two
Quick's arena allocation types to these vectors. Rename
kArenaAllocSuccessor to kArenaAllocSuccessors.

Bug: 23736311
Change-Id: Ib52e51698890675bde61f007fe6039338cf1a025
145acc5361deb769eed998f057bc23abaef6e116 03-Sep-2015 Vladimir Marko <vmarko@google.com> Revert "Optimizing: Tag basic block allocations with their source."

Reverting so that we can have more discussion about the STL API.

This reverts commit 91e11c0c840193c6822e66846020b6647de243d5.

Change-Id: I187fe52f2c16b6e7c5c9d49c42921eb6c7063dba
91e11c0c840193c6822e66846020b6647de243d5 02-Sep-2015 Vladimir Marko <vmarko@google.com> Optimizing: Tag basic block allocations with their source.

Replace GrowableArray with ArenaVector in HBasicBlock and,
to track the source of allocations, assign one new and two
Quick's arena allocation types to these vectors. Rename
kArenaAllocSuccessor to kArenaAllocSuccessors.

Bug: 23736311
Change-Id: I984aef6e615ae2380a532f5c6726af21015f43f5
bbd733e4ef277eff19bf9a6601032da081e9b68f 18-Aug-2015 David Brazdil <dbrazdil@google.com> ART: Enable basic optimizations for try/catch

Generating code for try/catch methods requires having run at least the
instruction simplifier to remove redundant suspend checks. This patch
enables the first group of optimizations when try/catch is present.

Enabled optimizations:
1) IntrinsicsRecognizer
Does not modify the graph, only sets HInvoke::intrinsic_.

2) ConstantFolding
Does not deal with throwing instructions.

3) InstructionSimplifier
May remove a throwing instruction (e.g. LoadClass in VisitCheckCast),
or may turn a throwing instruction into a non-throwing one (ArraySet).
Their corresponding catch phi inputs are not removed but correctness
is preserved.

4) ReferenceTypePropagation
Does not modify the graph, only sets type properties. Typing of
LoadException from catch handler information was added.

5) DeadCodeElimination
Removing individual instructions is fine (same as 3). Removal of dead
blocks was disabled for try/catch.

Change-Id: I2722c3229eb8aaf326391e07f522dbf5186774b8
ec16f79a4d0aeff319bf52139a0c82de3080d73c 19-Aug-2015 David Brazdil <dbrazdil@google.com> ART: Refactor try/catch block info, store exception type

This patch replaces HBasicBlock fields storing try/catch info with a
single TryCatchInformation data structure, saving memory for the
majority of non-try/catch blocks. It also changes builder to store
the exception type for catch blocks.

Change-Id: Ib3e43f7db247e6915d67c267fc62410420e230c9
29fc008c9689e9036a3f5e3bd186bbfb5de3cb82 18-Aug-2015 David Brazdil <dbrazdil@google.com> ART: Revert storing of exceptional predecessors

After change of the approach for try/catch register allocation, it is
no longer necessary to record instructions which might throw into a
catch block.

Change-Id: I7ef12ed06c49a35280029810975fa2a50fe4a424
9867bc722f7c41e07a95397bc08b790cd21dc758 05-Aug-2015 Roland Levillain <rpl@google.com> Have constant folding be more flexible.

- Have Evaluate methods take as argument(s) and return value
instances of HConstant (instead of built-in 32- or 64-bit
integer values), to let the evaluated instruction choose
the type of the statically evaluated node; for instance,
art::HEqual::Evaluate shall return a HIntConstant
node (as implementation of a Boolean constant) whatever
the type of its inputs (a pair of HIntConstant or a pair
of HLongConstant).
- Split the evaluation job from the operation logic: the
former is addressed by Evaluate methods, while the latter
is done by a generic Compute method.
- Adress valid BinOp(int, long) and BinOp(long, int) cases.
- Add a constructor to art::HIntConstant to build an integer
constant from a `bool` value.

Change-Id: If84b6fe8406bb94ddb1aa8b02e36628dff526db3
c90bc7c07f9bd24b5424cfb1e3f064fbae5334d6 11-Dec-2014 Roland Levillain <rpl@google.com> Add constant folding for long unary operations in opt. compiler.

Add tests to exercise the constant folding of these
instructions.

Also, prevent Java methods from run-tests exercising the
code generation of these instruction from being inlined,
so that they continue to check the generated code (and
not the code produced by the constant folding pass).

Change-Id: I28efca7cdb5142ac2b6d158ba296fb9136d62481
b618adebbc19e50d7b1aa2f11b84341beb3c64dc 29-Jul-2015 David Brazdil <dbrazdil@google.com> ART: Store and check exceptional predecessors

Future CL on register allocation for try/catch will require the
knowledge of instructions which throw into a catch block. This patch
stores that information with the basic block and verifies it in the
graph checker.

More checks on try catch also added to the graph checker and an order
of exception handlers is enforced in TryBoundary successors.

Change-Id: I3034c610791ea51d96724bcca97f49ec6ecf2af3
2e76830f0b3f23825677436c0633714402715099 28-Jul-2015 Calin Juravle <calin@google.com> Revert "Revert "Revert "Revert "Use the object class as top in reference type propagation""""

This reverts commit b734808d0c93af98ec4e3539fdb0a8c0787263b0.

Change-Id: Ifd925f166761bcb9be2268ff0fc9fa3a72f00c6f
b734808d0c93af98ec4e3539fdb0a8c0787263b0 28-Jul-2015 Calin Juravle <calin@google.com> Revert "Revert "Revert "Use the object class as top in reference type propagation"""

This reverts commit 80caa1478cf3df4eac1214d8a63a4da6f4fe622b.

Change-Id: I63b51ca418b19b2bfb5ede3f8444f8fbeb8a339d
80caa1478cf3df4eac1214d8a63a4da6f4fe622b 16-Jul-2015 Calin Juravle <calin@google.com> Revert "Revert "Use the object class as top in reference type propagation""

This reverts commit 7733bd644ac71f86d4b30a319624b23343882e53.

Change-Id: I7d393a808c01c084c18d632a54e0554b4b455f2c
7733bd644ac71f86d4b30a319624b23343882e53 22-Jul-2015 Calin Juravle <calin@google.com> Revert "Use the object class as top in reference type propagation"

This reverts commit 3fabec7a25d151b26ba7de13615bbead0dd615a6.

Change-Id: Id8614f6b6e3e0e4c9caeb9f771e4c145d9fec64f
3fabec7a25d151b26ba7de13615bbead0dd615a6 16-Jul-2015 Calin Juravle <calin@google.com> Use the object class as top in reference type propagation

This properly types all instructions, making it safe to query the type
at any time.

This also moves a few functions from class.h to class-inl.h to please
gcc linker when compiling for target.

Change-Id: I6b7ce965c10834c994b95529ab65a548515b4406
ffee3d33f3ea39aa6031c3d2ff29c4806c8dcc51 06-Jul-2015 David Brazdil <dbrazdil@google.com> ART: Build SSA form when try/catch is present

This patch implements support for try/catch in the SsaBuilder.
Values of locals are propagated from throwing sites inside try
blocks to their respective catch blocks and phis ("catch phis")
are created when necessary.

Change-Id: I0736565c2c4ff3f9f0924b6e3a785a50023f875a
beba9302bec33d72beb582970bf23d056f62641f 08-Jul-2015 Calin Juravle <calin@google.com> Revert "Use the object class as top in reference type propagation"

failing on the build bot on some targets but not locally. needs more investigation.

This reverts commit 20e6071362b84a9782b633a893c29ebde458205e.

Change-Id: I6965483f569fb862f9bdb66d459b747ded54de71
20e6071362b84a9782b633a893c29ebde458205e 01-Jul-2015 Calin Juravle <calin@google.com> Use the object class as top in reference type propagation

This properly types all instructions, making it safe to query the type
at any time.

Change-Id: I3ee2f0f79253cdf45b10ddab37ecb473345ca53a
c470193cfc522fc818eb2eaab896aef9caf0c75a 10-Apr-2015 Mark Mendell <mark.p.mendell@intel.com> Fuse long and FP compare & condition on x86/x86-64 in Optimizing.

This is a preliminary implementation of fusing long/float/double
compares with conditions to avoid materializing the result from the
compare and condition.

The information from a HCompare is transferred to the HCondition if it
is legal. There must be only a single use of the HCompare, the HCompare
and HCondition must be in the same block, the HCondition must not need
materialization.

Added GetOppositeCondition() to HCondition to return the flipped
condition.

Bug: 21120453
Change-Id: I1f1db206e6dc336270cd71070ed3232dedc754d6
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
56e1accf3966ae92e151567abf4561ef3f6466f4 30-Jun-2015 David Brazdil <dbrazdil@google.com> ART: Changes to try-catch in GraphBuilder

This patch adds an additional case into the insertion algorithm for
HTryBoundary inside HGraphBuilder in order to better handle catch
blocks covered by a TryItem.

Building SSA form also required to stop combining HTryBoundaries for
neighbouring TryItems because it was not clear which exception
handlers belong to which try block.

Change-Id: Ic68bd6ef98fee784609fa593cb08dca1f00a15e0
fc6a86ab2b70781e72b807c1798b83829ca7f931 26-Jun-2015 David Brazdil <dbrazdil@google.com> Revert "Revert "ART: Implement try/catch blocks in Builder""

This patch enables the GraphBuilder to generate blocks and edges which
represent the exceptional control flow when try/catch blocks are
present in the code. Actual compilation is still delegated to Quick
and Baseline ignores the additional code.

To represent the relationship between try and catch blocks, Builder
splits the edges which enter/exit a try block and links the newly
created blocks to the corresponding exception handlers. This layout
will later enable the SsaBuilder to correctly infer the dominators of
the catch blocks and to produce the appropriate reverse post ordering.
It will not, however, allow for building the complete SSA form of the
catch blocks and consequently optimizing such blocks.

To this end, a new TryBoundary control-flow instruction is introduced.
Codegen treats it the same as a Goto but it allows for additional
successors (the handlers).

This reverts commit 3e18738bd338e9f8363b26bc895f38c0ec682824.

Change-Id: I4f5ea961848a0b83d8db3673763861633e9bfcfb
3e18738bd338e9f8363b26bc895f38c0ec682824 26-Jun-2015 David Brazdil <dbrazdil@google.com> Revert "ART: Implement try/catch blocks in Builder"

Causes OutOfMemory issues, need to investigate.

This reverts commit 0b5c7d1994b76090afcc825e737f2b8c546da2f8.

Change-Id: I263e6cc4df5f9a56ad2ce44e18932ca51d7e349f
0b5c7d1994b76090afcc825e737f2b8c546da2f8 11-Jun-2015 David Brazdil <dbrazdil@google.com> ART: Implement try/catch blocks in Builder

This patch enables the GraphBuilder to generate blocks and edges which
represent the exceptional control flow when try/catch blocks are
present in the code. Actual compilation is still delegated to Quick
and Baseline ignores the additional code.

To represent the relationship between try and catch blocks, Builder
splits the edges which enter/exit a try block and links the newly
created blocks to the corresponding exception handlers. This layout
will later enable the SsaBuilder to correctly infer the dominators of
the catch blocks and to produce the appropriate reverse post ordering.
It will not, however, allow for building the complete SSA form of the
catch blocks and consequently optimizing such blocks.

To this end, a new TryBoundary control-flow instruction is introduced.
Codegen treats it the same as a Goto but it allows for additional
successors (the handlers).

Change-Id: I415b985596d5bebb7b1bb358a46e08b7b04bb53a
18b236e5261d2b1f312e632a4d3bb2273c8bf641 24-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Recompute dominator tree after DCE.

bug:22031382

(cherry picked from commit 1f82ecc6a0c9f88d03d6d1a6d95eeb8707bd06c1)

Change-Id: I9a74edb185cb806045903dfe9695d9cc1a02e86b
fe659462e7d58bb2585b1bd029f9e08fd9dd32ae 24-Jun-2015 David Brazdil <dbrazdil@google.com> ART: Stop creating a fallthrough block for Goto

Optimizing's Builder used to create a basic block after a Goto under
the assumption that control flow can fall through.

Bug: 19084197
Change-Id: Id85f31df98a4177466750d3cd0bc8bb74782ca2d
1f82ecc6a0c9f88d03d6d1a6d95eeb8707bd06c1 24-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Recompute dominator tree after DCE.

bug:22031382
Change-Id: Ifebe169897b76872015e3ce0ed7d0a9662f80cef
1e256bf257e8d97df9b2178ae8658b731ca2d662 19-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Be careful with predecessor/successor index.

When we simplify the CFG, we must preserve things that were already
simplified. For example, the index in the predecessor list or
successor list of a block must be preserved for ensuring the
first block is a loop pre header.

bug:21867463

(cherry picked from commit 8b20f88b0a8d1b374dd5eaae289d19734c77b8f8)

Change-Id: I2581b5a50942290da96cd9ec876f6f2573e0a6c4
25fde612b0df01a086cd4c801b7bd3a10e93a0e9 18-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Fix a bug in optimizing when the null constant has been DCE.

If it has been DCE, we should create a new one, instead of
using the old one.

Also move the first DCE to a place where it could actually
be useful.

bug:21870788

(cherry picked from commit 18e6873c469b48aaed22148451523479eece98e3)

Change-Id: I3b3ab2dafe8ce5fb60868fd1a6ef0eeefe666e0c
bca381a12965a98e3727e93986dd0a195db500a0 20-May-2015 Mingyao Yang <mingyao@google.com> Fix premature deoptimization if the loop body isn't entered.

Add a test between initial_ and end_ to see if the loop body is entered.
If the loop body isn't entered at all, we jump to the loop header. Loop header is
still executed and is going to test the condition again and loop body won't be
entered. This makes sure no deoptimization is triggered if the loop body isn't
even entered.

Bug: 21034044

(cherry picked from commit 3584bce5b1f45e5741d3a6ca24884a36320ecb6b)

Change-Id: I2b6de1f22fbc4568ca419f76382ebd87806d9694
8b20f88b0a8d1b374dd5eaae289d19734c77b8f8 19-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Be careful with predecessor/successor index.

When we simplify the CFG, we must preserve things that were already
simplified. For example, the index in the predecessor list or
successor list of a block must be preserved for ensuring the
first block is a loop pre header.

bug:21867463

Change-Id: Ic3fcb3eb2c3fb109d8a57ee2a6b6d4d65fdb9410
18e6873c469b48aaed22148451523479eece98e3 18-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Fix a bug in optimizing when the null constant has been DCE.

If it has been DCE, we should create a new one, instead of
using the old one.

Also move the first DCE to a place where it could actually
be useful.

bug:21870788

Change-Id: I28fc52ae481ef92cba45fc1b5abcf07c995f524c
f78848f2ced8466b5fb2d7148d608288ee88757b 17-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Don't special case HCurrentMethod in DCE.

Instead, re-create the HCurrentMethod if it is needed
after it has been removed.

Change-Id: Id3bf15ae87b00a1d7eb35bf36d58fe96f788fba4
3584bce5b1f45e5741d3a6ca24884a36320ecb6b 20-May-2015 Mingyao Yang <mingyao@google.com> Fix premature deoptimization if the loop body isn't entered.

Add a test between initial_ and end_ to see if the loop body is entered.
If the loop body isn't entered at all, we jump to the loop header. Loop header is
still executed and is going to test the condition again and loop body won't be
entered. This makes sure no deoptimization is triggered if the loop body isn't
even entered.

Bug: 21034044
Change-Id: I2b6de1f22fbc4568ca419f76382ebd87806d9694
e401d146407d61eeb99f8d6176b2ac13c4df1e33 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Move mirror::ArtMethod to native

Optimizing + quick tests are passing, devices boot.

TODO: Test and fix bugs in mips64.

Saves 16 bytes per most ArtMethod, 7.5MB reduction in system PSS.
Some of the savings are from removal of virtual methods and direct
methods object arrays.

Bug: 19264997
Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d
d23eeef3492b53102eb8093524cf37e2b4c296db 18-May-2015 Nicolas Geoffray <ngeoffray@google.com> Support for inlining methods that call/throw.

Mostly fixes here and there to make it working.

Change-Id: I1b535e895105d78b65634636d675b818551f783e
76b1e1799a713a19218de26b171b0aef48a59e98 27-May-2015 Nicolas Geoffray <ngeoffray@google.com> Add a HCurrentMethod node.

This enables register allocation for the current method, so
that users of it don't always load it from the stack.

Currently only used by HLoadClass. Will make follow-up
CLs for the other users.

Change-Id: If73324d85643102faba47fabbbd2755eb258c59c
41b175aba41c9365a1c53b8a1afbd17129c87c14 19-May-2015 Vladimir Marko <vmarko@google.com> ART: Clean up arm64 kNumberOfXRegisters usage.

Avoid undefined behavior for arm64 stemming from 1u << 32 in
loops with upper bound kNumberOfXRegisters.

Create iterators for enumerating bits in an integer either
from high to low or from low to high and use them for
<arch>Context::FillCalleeSaves() on all architectures.

Refactor runtime/utils.{h,cc} by moving all bit-fiddling
functions to runtime/base/bit_utils.{h,cc} (together with
the new bit iterators) and all time-related functions to
runtime/base/time_utils.{h,cc}. Improve test coverage and
fix some corner cases for the bit-fiddling functions.

Bug: 13925192

(cherry picked from commit 80afd02024d20e60b197d3adfbb43cc303cf29e0)

Change-Id: I905257a21de90b5860ebe1e39563758f721eab82
80afd02024d20e60b197d3adfbb43cc303cf29e0 19-May-2015 Vladimir Marko <vmarko@google.com> ART: Clean up arm64 kNumberOfXRegisters usage.

Avoid undefined behavior for arm64 stemming from 1u << 32 in
loops with upper bound kNumberOfXRegisters.

Create iterators for enumerating bits in an integer either
from high to low or from low to high and use them for
<arch>Context::FillCalleeSaves() on all architectures.

Refactor runtime/utils.{h,cc} by moving all bit-fiddling
functions to runtime/base/bit_utils.{h,cc} (together with
the new bit iterators) and all time-related functions to
runtime/base/time_utils.{h,cc}. Improve test coverage and
fix some corner cases for the bit-fiddling functions.

Bug: 13925192
Change-Id: I704884dab15b41ecf7a1c47d397ab1c3fc7ee0f7
c7af85dad0dc392cfc0b373b0c1cb4b4197c89f4 26-May-2015 David Brazdil <dbrazdil@google.com> ART: Update graph's exit block field if removed

Running DCE on an infinite loop will delete the exit block but the
corresponding field is currently not cleared in the parent graph.
This does not cause any problems at the moment as that information is
only used in codegens to DCHECK that a block is not the exit block.
However, it will be necessary to update the inliner once we start to
inline methods with loops.

With this patch, DCE will update the HGraph::exit_block_ field. DCHECK
was also added to HGraph::InlineInto to make sure that the inlined
graph does have an exit block.

Change-Id: Ia8ddca375bbc6830cd919af6059a52cc9b73a023
e82549b14c7def0a45461183964f7e6a34cbb70c 06-May-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Fold HTypeConversion of constants

While looking into optimizing long shifts on x86, I found that the
compiler wasn't folding HTypeConversion of constants. Add simple
conversions of constants, taking care of float/double values
with NaNs and small/large values, ensuring Java conversion semantics.

Add checker cases to see that constant folding of HTypeConversion is
done.

Ensure 422-type-conversion type conversion routiness don't get
inlined to avoid compile time folding.

Change-Id: I5a4eb376b64bc4e41bf908af5875bed312efb228
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
e8ff50df01c89e1b5264a5a900cfebdde87a9b44 07-May-2015 David Brazdil <dbrazdil@google.com> ART: Rediscover loops after deleting blocks in DCE

The way DCE currently updates loop information does not cover all
cases. This patch removes the logic, resets loop information of live
blocks to pre-SSA state and reanalyzes the affected loops.

Change-Id: I0b996a70235b95a8db0de9a23a03f71db57a21b8
(cherry picked from commit a4b8c21dae70ae34aee13628632c39a675c06022)
a4b8c21dae70ae34aee13628632c39a675c06022 07-May-2015 David Brazdil <dbrazdil@google.com> ART: Rediscover loops after deleting blocks in DCE

The way DCE currently updates loop information does not cover all
cases. This patch removes the logic, resets loop information of live
blocks to pre-SSA state and reanalyzes the affected loops.

Change-Id: I0b996a70235b95a8db0de9a23a03f71db57a21b8
0a23d74dc2751440822960eab218be4cb8843647 07-May-2015 Nicolas Geoffray <ngeoffray@google.com> Add a parent environment to HEnvironment.

This code has no functionality change. It adds a placeholder
for chaining inlined frames.

Change-Id: I5ec57335af76ee406052345b947aad98a6a4423a
8c0c91a845568624815df026cfdac8c42ecccdf6 07-May-2015 Nicolas Geoffray <ngeoffray@google.com> Use a growable array instead of an environment during SSA.

Using an environment was convenient because it contains
a growable array. But there's no need for the environment
abstraction when being used as a temporary holder for values
of locals.

Change-Id: Idf2883fe4b8f97a31ee70b3627c1bdd23ebfff0e
db216f4d49ea1561a74261c29f1264952232728a 05-May-2015 Nicolas Geoffray <ngeoffray@google.com> Relax the only one back-edge restriction.

The rule is in the way for better register allocation, as
it creates an artificial join point between multiple paths.

Change-Id: Ia4392890f95bcea56d143138f28ddce6c572ad58
6db49a74e8402d3b6c66536ea7ec988144c05d24 28-Apr-2015 Nicolas Geoffray <ngeoffray@google.com> Update the remaining input index of phis after deleting an input.

bug:20715803
bug:20690906

(cherry picked from commit 5d7b7f81ed5455893f984752c00571ef27cc97c5)

Change-Id: Ie55739601b8d6fedc830d6e19d8a053392047d34
5d7b7f81ed5455893f984752c00571ef27cc97c5 28-Apr-2015 Nicolas Geoffray <ngeoffray@google.com> Update the remaining input index of phis after deleting an input.

bug:20715803
bug:20690906

Change-Id: Iaf08f0c30d629e766be2b04815dc3e38b6e7ff35
395086f0a9e0658a2d33eeade7121db55c1f5dc8 29-Apr-2015 David Brazdil <dbrazdil@google.com> ART: Fix loop information after dead code elimination

Compilation failed when only some blocks of a loop were removed during
dead code elimination.

Bug: 20680703
(cherry picked from commit 69a2804c3bb48cf4fd00a66080f613a4fd96c422)

Change-Id: If9988381236e4d8d8c3b508dfce1376b27c20d75
69a2804c3bb48cf4fd00a66080f613a4fd96c422 29-Apr-2015 David Brazdil <dbrazdil@google.com> ART: Fix loop information after dead code elimination

Compilation failed when only some blocks of a loop were removed during
dead code elimination.

Bug: 20680703
Change-Id: If31025169ca493f0d7f7f2788576e98d05f03394
2b1c622d5db941fe06b3ea9c1a5366358fa298c6 27-Apr-2015 David Brazdil <dbrazdil@google.com> ART: Fix removing a Phi with RemoveInstruction

Boolean simplifier might attempt to remove a Phi from the Instruction
list.

(cherry picked from commit c7508e93fa3df3a3890f6b62550cbd5e35bdd8df)

Change-Id: Ic8ad31967aa3e47c1fb1c67553d08681b6063a16
f213e05cef6d38166cfe0cce8f3b0a53225a1b39 27-Apr-2015 Nicolas Geoffray <ngeoffray@google.com> Add support for caching float and double constants.

Change-Id: Ib5205bad1006bc5e3c9cc86bc82a6b4b1ce9bef9
c7508e93fa3df3a3890f6b62550cbd5e35bdd8df 27-Apr-2015 David Brazdil <dbrazdil@google.com> ART: Fix removing a Phi with RemoveInstruction

Boolean simplifier might attempt to remove a Phi from the Instruction
list.

Change-Id: I698cc616549bd88dac96395cb2e5d09b5433d157
2967ec6c3dad1c1dc15fc827188bd5ecfa75493b 24-Apr-2015 Guillaume "Vermeille" Sanchez <guillaumesa@google.com> Add InsertInstructionAfter in HBasicBlock.

Change-Id: I56e4e6edb39d1aab747877b7e517e94f0393f296
206d6fd6cae5ba8c4d5f0e230111fe77b9d5c0a5 14-Apr-2015 Mingyao Yang <mingyao@google.com> Deoptimization-based BCE for unknown loop bounds.

For loop like:
for (int i = start; i < end; i++) {
array[i] = 1;
}
We add the following to the loop pre-header:
if (start < 0) deoptimize();
if (end > array.length) deoptimize();
Then we can eliminate bounds-check of array[i] inside the loop.

We also take care of indexing with induction variable plus some offsets,
like array[i - 1]/array[i + 1] inside the loop, and adjust the condition
for deoptimization accordingly.

Change-Id: I9e24c6b5e134ff95eff5b5605ff8f95d6546616f
067cae2c86627d2edcf01b918ee601774bc76aeb 26-Apr-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "[optimizing] Replace FP divide by power of 2"

Fails compiling docs.

This reverts commit b0bd8915cb257cdaf46ba663c450a6543bca75af.

Change-Id: I47d32525c83a73118e2163eb58c68bbb7a28bb38
4c0eb42259d790fddcd9978b66328dbb3ab65615 24-Apr-2015 Roland Levillain <rpl@google.com> Ensure inlined static calls perform clinit checks in Optimizing.

Calls to static methods have implicit class initialization
(clinit) checks of the method's declaring class in
Optimizing. However, when such a static call is inlined,
the implicit clinit check vanishes, possibly leading to an
incorrect behavior.

To ensure that inlining static methods does not change the
behavior of a program, add explicit class initialization
checks (art::HClinitCheck) as well as load class
instructions (art::HLoadClass) as last input of static
calls (art::HInvokeStaticOrDirect) in Optimizing' control
flow graphs, when the declaring class is reachable and not
known to be already initialized. Then when considering the
inlining of a static method call, proceed only if the method
has no implicit clinit check requirement.

The added explicit clinit checks are already removed by the
art::PrepareForRegisterAllocation visitor. This CL also
extends this visitor to turn explicit clinit checks from
static invokes into implicit ones after the inlining step,
by removing the added art::HLoadClass nodes mentioned
hereinbefore.

Change-Id: I9ba452b8bd09ae1fdd9a3797ef556e3e7e19c651
2d7352ba5311b8f57427b91b7a891e61497373c1 20-Apr-2015 David Brazdil <dbrazdil@google.com> ART: Dead block removal

Adds a new pass which finds all unreachable blocks, typically due to
simplifying an if-condition to a constant, and removes them from the
graph. The patch also slightly generalizes the graph-transforming
operations.

Change-Id: Iff7c97f1d10b52886f3cd7401689ebe1bfdbf456
2cebb24bfc3247d3e9be138a3350106737455918 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Replace NULL with nullptr

Also fixed some lines that were too long, and a few other minor
details.

Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb
c3d743fa2a26effcb35627d8a1338029c86e582a 22-Apr-2015 David Brazdil <dbrazdil@google.com> ART: Update last_instruction when adding Phis

HBasicBlock::InsertPhiAfter would not update the last_instruction
pointer when adding at the end of the list. This could cause problems
when iterating over phis backwards. Fortunately, we don't do that
anywhere in the existing code.

Change-Id: I4487265bf2cf3d3819623fafd7ce7c359bac190e
7d275379bf490a87805852129e3fe2e8afe961e7 21-Apr-2015 David Brazdil <dbrazdil@google.com> ART: Update loop info of all nested loops when inlining

When inlining into a nested loop, the inliner would only add the new
blocks into the innermost loop info object. This patch fixes that and
modifies SsaChecker to verify the property.

Change-Id: I21d343a6f7d972f5b7420701f816c65ab3f20566
b0bd8915cb257cdaf46ba663c450a6543bca75af 16-Apr-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Replace FP divide by power of 2

Replace a floating point division by a power of two by a multiplication
of the reciprocal. This is guarenteed to have the exact same result as
it is exactly representable.

Add routines to allow generation of float and double constants after the
SSA Builder. I was unsure if float and double caches should be
implemented. Under the assumption that there is probably not a lot of
repetition of FP values. Please let me know.

Change-Id: I3a6c3847b49b4e747a7e7e8843ca32bb174b1584
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
05144f4322eed049f4878015bf1f0381d419b785 16-Apr-2015 David Brazdil <dbrazdil@google.com> ART: Hot fix for an inliner issue

Change 147130 caused a problem with duplicit instruction ids when
inlining methods with constants. This is a hot fix to unblock build.

Change-Id: Ieddadcd94135930a1f29ad64ad57349a384da07f
4a3faecbe4157225a3fe83a9ef7f4992dfc9c19d 16-Apr-2015 David Brazdil <dbrazdil@google.com> ART: Don't duplicate null/int/long constants when inlining

Change-Id: I7e6a3393fcbbcf76b4ba2000915ba6bbbfb7c70e
f776b92a0d52bb522043812dacb9c21ac11858e2 15-Apr-2015 Nicolas Geoffray <ngeoffray@google.com> Remove dead blocks for the blocks_ array.

This prevents crashing because of structurally incorrect
blocks. Also we now don't need to remove its instructions.

Test case courtesy of Serguei I Katkov.

Change-Id: Ia3ef9580549fc3546e8cd5f346079b1f0ceb2a61
d43b3ac88cd46b8815890188c9c2b9a3f1564648 01-Apr-2015 Mingyao Yang <mingyao@google.com> Revert "Revert "Deoptimization-based bce.""

This reverts commit 0ba627337274ccfb8c9cb9bf23fffb1e1b9d1430.

Change-Id: I1ca10d15bbb49897a0cf541ab160431ec180a006
8d5b8b295930aaa43255c4f0b74ece3ee8b43a47 24-Mar-2015 David Brazdil <dbrazdil@google.com> ART: Force constants into the entry block

Optimizations such as GVN and BCE make the assumption that all
constants are located in the entry block of the CFG, but not all
passes adhere to this rule.

This patch makes constructors of constants private and only accessible
to friend classes - HGraph for int/long constants and SsaBuilder for
float/double - which ensure that they are placed correctly and not
duplicated.

Note that the ArenaAllocatorAdapter was modified to not increment
the ArenaAllocator's internal reference counter in order to allow
for use of ArenaSafeMap inside an arena-allocated objects. Because
their destructor is not called, the counter does not get decremented.

Change-Id: I36a4fa29ae34fb905cdefd482ccbf386cff14166
b2bd1c5f9171f35fa5b71ada42d1a9e11189428d 25-Mar-2015 David Brazdil <dbrazdil@google.com> ART: Formatting and comments in BooleanSimplifier

Change-Id: I9a5aa3f2aa8b0a29d7b0f1e5e247397cf8e9e379
46e2a3915aa68c77426b71e95b9f3658250646b7 16-Mar-2015 David Brazdil <dbrazdil@google.com> ART: Boolean simplifier

The optimization recognizes the negation pattern generated by 'javac'
and replaces it with a single condition. To this end, boolean values
are now consistently assumed to be represented by an integer.

This is a first optimization which deletes blocks from the HGraph and
does so by replacing the corresponding entries with null. Hence,
existing code can continue indexing the list of blocks with the block
ID, but must check for null when iterating over the list.

Change-Id: I7779da69cfa925c6521938ad0bcc11bc52335583
0ba627337274ccfb8c9cb9bf23fffb1e1b9d1430 24-Mar-2015 Andreas Gampe <agampe@google.com> Revert "Deoptimization-based bce."

This breaks compiling the core image:

Error after BCE: art::SSAChecker: Instruction 219 in block 1 does not dominate use 221 in block 1.

This reverts commit e295e6ec5beaea31be5d7d3c996cd8cfa2053129.

Change-Id: Ieeb48797d451836ed506ccb940872f1443942e4e
e295e6ec5beaea31be5d7d3c996cd8cfa2053129 07-Mar-2015 Mingyao Yang <mingyao@google.com> Deoptimization-based bce.

A mechanism is introduced that a runtime method can be called
from code compiled with optimizing compiler to deoptimize into
interpreter. This can be used to establish invariants in the managed code
If the invariant does not hold at runtime, we will deoptimize and continue
execution in the interpreter. This allows to optimize the managed code as
if the invariant was proven during compile time. However, the exception
will be thrown according to the semantics demanded by the spec.

The invariant and optimization included in this patch are based on the
length of an array. Given a set of array accesses with constant indices
{c1, ..., cn}, we can optimize away all bounds checks iff all 0 <= min(ci) and
max(ci) < array-length. The first can be proven statically. The second can be
established with a deoptimization-based invariant. This replaces n bounds
checks with one invariant check (plus slow-path code).

Change-Id: I8c6e34b56c85d25b91074832d13dba1db0a81569
4f1a384762bf49fe8f3ecae8dd2bcb0e19d044a9 12-Mar-2015 Nicolas Geoffray <ngeoffray@google.com> Give an expected type to phis created for multiple returns.

When inlining, we used to take the type of the inlined method
for the phi in case of multiple returns. I recently changed the
logic of phi types to only be of int/float/double/ref, so we
need to call ToPhiType when creating the phi.

Change-Id: I960067ca8a8814509c2a7c52c08387d892ebf4a3
b2fd7bca70b580921eebf7c45769c39d2dfd8a5a 11-Mar-2015 Alexandre Rames <alexandre.rames@arm.com> Opt compiler: Basic simplification for arithmetic operations.

The optimisations in this patch do not look further than the
inputs of each operation.

Change-Id: Iddd0ab6b360b9e7bb042db22086d51a31be85530
817bce7658918b7a70c17b70aa5e6a46b1ae8b3d 24-Feb-2015 Nicolas Geoffray <ngeoffray@google.com> Fix inlining in the presence of multiple returns.

One return could actually return a phi, so doing a phi check for
knowing if the result phi was already created was bogus.

Bug: 19454010

Change-Id: Iee703a2d1071ae263092354465eda368e5d6770d
1abb4191a2e56d8dbf518efcaeefb266c1acdf2b 17-Feb-2015 David Brazdil <dbrazdil@google.com> Optimizing: Speed up HInstruction use removal

Similarly to a previous commit on HEnvironment use removal, this patch
adds links from instructions to their respective inputs' use lists for
contant-time removal at the cost of doubling the size of input lists
(from one pointer per entry to two). Manual testing shows that this
significantly reduces the time required to transform HGraph to SSA
form for some huge methods.

Change-Id: I8dc3e4b0c48a50ac1481eb55c31093b99f4dc29f
acf735c13998ad2a175f5a17e7bfce220073279d 12-Feb-2015 Calin Juravle <calin@google.com> Reference type propagation

- propagate reference types between instructions
- remove checked casts when possible
- add StackHandleScopeCollection to manage an arbitrary number of stack
handles (see comments)

Change-Id: I31200067c5e7375a5ea8e2f873c4374ebdb5ee60
d6138ef1ea13d07ae555542f8898b30d89e9ac9a 18-Feb-2015 Nicolas Geoffray <ngeoffray@google.com> Ensure the graph is correctly typed.

We used to be forgiving because of HIntConstant(0) also being
used for null. We now create a special HNullConstant for such uses.

Also, we need to run the dead phi elimination twice during ssa
building to ensure the correctness.

Change-Id: If479efa3680d3358800aebb1cca692fa2d94f6e5
be31ff94d66a0037c445eb57dc82f2a51bb46d9e 04-Feb-2015 Nicolas Geoffray <ngeoffray@google.com> Fix a bug in the inliner.

Code did not work in the presence of multiple returns.
Spotted by Mark P. Mendell.

Change-Id: I237050a0d79c0cfaa479e9b886f7450879e84713
276d9daaedfbff716339f94d55e6eff98b7434c6 02-Feb-2015 Nicolas Geoffray <ngeoffray@google.com> Inline methods with multiple blocks.

Change-Id: I3431af60e97fae230e0b6e98bcf0acc0ee9abf8c
82091dad38f3e5bfaf3b6984c9ab73069fb68310 26-Jan-2015 Nicolas Geoffray <ngeoffray@google.com> Implement LICM in optimizing compiler.

Change-Id: I9c8afb0a58ef45e568576015473cbfd5f011c242
ed59619b370ef23ffbb25d1d01f615e60a9262b6 23-Jan-2015 David Brazdil <dbrazdil@google.com> Optimizing: Speed up HEnvironment use removal

Removal of use records from HEnvironment vregs involved iterating over
potentially large linked lists which made compilation of huge methods
very slow. This patch turns use lists into doubly-linked lists, stores
pointers to the relevant nodes inside HEnvironment and subsequently
turns the removals into constant-time operations.

Change-Id: I0e1d4d782fd624e7b8075af75d4adf0a0634a1ee
0ada95d8de4b04b5f201b4b7e9c3c2fd2cc321ae 04-Dec-2014 Jean Christophe Beyler <jean.christophe.beyler@intel.com> ART: Replace NULL to nullptr in the optimizing compiler

Replace macro NULL to the nullptr variation for C++.

Change-Id: Ib6e48dd4bb3c254343383011b67372622578ca76
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
6c2dff8ff8e1440fa4d9e1b2ba2a44d036882801 21-Jan-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Fully support pairs in the register allocator.""

This reverts commit c399fdc442db82dfda66e6c25518872ab0f1d24f.

Change-Id: I19f8215c4b98f2f0827e04bf7806c3ca439794e5
77520bca97ec44e3758510cebd0f20e3bb4584ea 12-Jan-2015 Calin Juravle <calin@google.com> Record implicit null checks at the actual invoke time.

ImplicitNullChecks are recorded only for instructions directly (see NB
below) preceeded by NullChecks in the graph. This way we avoid recording
redundant safepoints and minimize the code size increase.

NB: ParallalelMoves might be inserted by the register allocator between
the NullChecks and their uses. These modify the environment and the
correct action would be to reverse their modification. This will be
addressed in a follow-up CL.

Change-Id: Ie50006e5a4bd22932dcf11348f5a655d253cd898
c399fdc442db82dfda66e6c25518872ab0f1d24f 21-Jan-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Fully support pairs in the register allocator."

Libcore tests fail.

This reverts commit 41aedbb684ccef76ff8373f39aba606ce4cb3194.

Change-Id: I2572f120d4bbaeb7a4d4cbfd47ab00c9ea39ac6c
41aedbb684ccef76ff8373f39aba606ce4cb3194 14-Jan-2015 Nicolas Geoffray <ngeoffray@google.com> Fully support pairs in the register allocator.

Enabled on ARM for longs and doubles.

Change-Id: Id8792d08bd7ca9fb049c5db8a40ae694bafc2d8b
9ee66183d8e046ea661f642ba884626f16b46e06 16-Jan-2015 Nicolas Geoffray <ngeoffray@google.com> Constant fold after inlining.

- Inlining opens up new opportunities for constant folding.
- Fix a bug in constant folder where the result type was not
correctly set for the folding of a HCompare.
- Improve graph checker's coverage.

Change-Id: I0943bf8ff65505c4addc4a555a526b55e00b5268
7c5367badfe61b96c5836d495d286cee64861579 17-Dec-2014 Nicolas Geoffray <ngeoffray@google.com> Fix ids and remove invoke when inlining.

Bugs found by Razvan Lupusoru.

Change-Id: I3a5a9af280d8700d18f52abb4a2cff0e3a9aac74
53d9da8507a1b68f036ce8669ad3f2ae9fc3d225 04-Dec-2014 Jean Christophe Beyler <jean.christophe.beyler@intel.com> ART: Create a RemoveBlock method

The RemoveDeadBlocks should be separated into a utility function to remove
a single block so that it can be used as a future utility method.

Change-Id: I4c67113fff24e92a66a81bc0e8edf9fbdda08cdf
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
e53798a7e3267305f696bf658e418c92e63e0834 01-Dec-2014 Nicolas Geoffray <ngeoffray@google.com> Inlining support in optimizing.

Currently only inlines simple things that don't require an
environment, such as:
- Returning a constant.
- Returning a parameter.
- Returning an arithmetic operation.

Change-Id: Ie844950cb44f69e104774a3cf7a8dea66bc85661
fc600dccd7797a9a10cdd457034ea8e148ccd631 02-Dec-2014 Roland Levillain <rpl@google.com> Fix a compiler bug related to a catch-less try-finally statement.

Ensure a dead basic block produced in this case is properly
removed.

Change-Id: I7c88e26aaa6c6378892f7c7c299494fa42312db2
f537012ceb6cba8a78b36a5065beb9588451a250 02-Dec-2014 Nicolas Geoffray <ngeoffray@google.com> Treat SSA transformation special, as we may have to bailout.

We forgot to bailout when we found a non-natural loop (on which
our optimizations don't work).

Change-Id: I11976b5af4c98f4f29267a74c74d34b5ad81e20c
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f 31-Oct-2014 Ian Rogers <irogers@google.com> Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags.

Fix associated errors about unused paramenters and implict sign conversions.
For sign conversion this was largely in the area of enums, so add ostream
operators for the effected enums and fix tools/generate-operator-out.py.
Tidy arena allocation code and arena allocated data types, rather than fixing
new and delete operators.
Remove dead code.

Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
b762d2ebf9dc604561d9915c96b377235c94960c 22-Oct-2014 Roland Levillain <rpl@google.com> Various fixes related to integer negate operations.

- Emit an RSB instruction for HNeg nodes in the ARM code
generator instead of RSBS, as we do not need to update the
condition code flags in this case.
- Simply punt when trying to statically evaluate a long
unary operation, instead of aborting.
- Move a test case to the right place.

Change-Id: I35eb8dea58ed35258d4d8df77181159c3ab07b6f
102cbed1e52b7c5f09458b44903fe97bb3e14d5f 15-Oct-2014 Nicolas Geoffray <ngeoffray@google.com> Implement register allocator for floating point registers.

Also:
- Fix misuses of emitting the rex prefix in the x86_64 assembler.
- Fix movaps code generation in the x86_64 assembler.

Change-Id: Ib6dcf6e7c4a9c43368cfc46b02ba50f69ae69cbe
9240d6a2baa9ed1e18ee08744b461fe49a1ee269 20-Oct-2014 Roland Levillain <rpl@google.com> Constant folding on unary operations in the optimizing compiler.

Change-Id: I4b77afa2a89f5ad2eedd4d6c0c6c382585419349
6c82d40eb142771086f5531998de2273ba5cc08c 13-Oct-2014 Roland Levillain <rpl@google.com> Have HInstruction::StrictlyDominates compute strict dominance.

Change-Id: I3a4fa133268615fb4ce54a0bcb43e0c2458cc865
633021e6ff6b9a57a374a994e74cfd69275ce100 01-Oct-2014 Roland Levillain <rpl@google.com> Implement default traversals in CFG & SSA graph checkers.

- Check CFG graphs using an insertion order traversal.
- Check SSA form graphs using a reverse post-order traversal.

Change-Id: Ib9062599bdbf3c17b9f213b743274b2d71a9fa90
476df557fed5f0b3f32f8d11a654674bb403a8f8 09-Oct-2014 Roland Levillain <rpl@google.com> Use Is*() helpers to shorten code in the optimizing compiler.

Change-Id: I79f31833bc9a0aa2918381aa3fb0b05d45f75689
360231a056e796c36ffe62348507e904dc9efb9b 08-Oct-2014 Nicolas Geoffray <ngeoffray@google.com> Fix code generation of materialized conditions.

Move the logic for knowing if a condition needs to be materialized
in an optimization pass (so that the information does not change
as a side effect of another optimization).

Also clean-up arm and x86_64 codegen:
- arm: ldr and str are for power-users when a constant is
in play. We should use LoadFromOffset and StoreToOffset.
- x86_64: fix misuses of movq instead of movl.

Change-Id: I01a03b91803624be2281a344a13ad5efbf4f3ef3
191c4b1372aef7c0272f8fa3985b55513029e728 07-Oct-2014 Nicolas Geoffray <ngeoffray@google.com> Inserting a node must also update its inputs users.

Change-Id: I55357564b81efcc0cf52fffdf23289696fe27dd1
3c04974a90b0e03f4b509010bff49f0b2a3da57f 24-Sep-2014 Nicolas Geoffray <ngeoffray@google.com> Optimize suspend checks in optimizing compiler.

- Remove the ones added during graph build (they were added
for the baseline code generator).
- Emit them at loop back edges after phi moves, so that the test
can directly jump to the loop header.
- Fix x86 and x86_64 suspend check by using cmpw instead of cmpl.

Change-Id: I6fad5795a55705d86c9e1cb85bf5d63dadfafa2a
6b46923ff0197c95f1e7ea0bc730961df6725cc9 25-Sep-2014 Roland Levillain <rpl@google.com> Optimizing compiler: check inputs & uses definitions in CFG.

Ensure each input and each use of an instruction is defined
in a block of the control-flow graph.

Change-Id: If4a83b02825230329b0b4fd84255dcb7c3219684
18efde5017369e005f1e8bcd3bbfb04e85053640 22-Sep-2014 Nicolas Geoffray <ngeoffray@google.com> Fix code generation with materialized conditions.

Change-Id: I8630af3c13fc1950d3fa718d7488407b00898796
724c96326dea6ec33287a0076279c136abb0208a 22-Sep-2014 Nicolas Geoffray <ngeoffray@google.com> Also remove environment links to removed instructions.

Change-Id: I505163fb8683269c7d3fe21b34df92337d244552
d31cf3d55a0847c018c4eaa2b349b8eea509de64 08-Sep-2014 Nicolas Geoffray <ngeoffray@google.com> First optimization in new compiler: simple GVN.

Change-Id: Ibe0efa4e84fd020a53ded310a92e0b4363f91b12
556c3d193134f6461f3e1fe17c032b087c5931a0 18-Sep-2014 Roland Levillain <rpl@google.com> Initiate a constant propagation pass in the optimizing compiler.

- Perform constant folding on int and long additions and subtractions in
the optimizing compiler.
- Apply constant folding to conditions and comparisons.

Change-Id: Ic88783a3c975fda777c74c531e257fa777be42eb
ccc07a9579c554443cd03a306ca9b4f943fd2a93 16-Sep-2014 Roland Levillain <rpl@google.com> Add CFG and SSA form checkers in the optimizing compiler.

Checks performed on control-flow graphs:
- Ensure that the predecessors and successors of a basic block are
consistent within a control-flow graph.
- Ensure basic blocks end with a branch instruction.
- Detect phi functions listed in non-phi instruction lists and vice
versa.
- Ensure a block's instructions (and phi functions) are associated
with this very block.

Checks performed on SSA form graphs:
- Ensure an instruction dominates all its uses.
- Ensure there are no critical edges.

Change-Id: I1c12b4a61ecf608682152c897980ababa7eca847
604c6e4764edb2fd244e9f47626868cda5644a7a 17-Sep-2014 Nicolas Geoffray <ngeoffray@google.com> Ensure the first predecessor of a loop is the pre header.

Note that the check in ssa_phi_elimination.cc was very defensive:
it does not affect the outcome of the algorithm whether the
loop phi takes itself as the first input.

It makes things consistent to always have the pre header as first
input.

Change-Id: Ic86248c1f38af67f7432782f6deefae1f4bf1ab6
065bf77b43c39da315b974ea08a5ed25e9049681 03-Sep-2014 Nicolas Geoffray <ngeoffray@google.com> Add (simple) side effects flags and equality methods on nodes.

This is in preparation of doing GVN and LICM.

Change-Id: I43050ff846755f9387a62b893d548ecdb54e7e95
3946844c34ad965515f677084b07d663d70ad1b8 02-Sep-2014 Nicolas Geoffray <ngeoffray@google.com> Runtime support for the new stack maps for the opt compiler.

Now most of the methods supported by the compiler can be optimized,
instead of using the baseline.

Change-Id: I80ab36a34913fa4e7dd576c7bf55af63594dc1fa
20dfc797dc631bf8d655dcf123f46f13332d3074 17-Jun-2014 Dave Allison <dallison@google.com> Add some more instruction support to optimizing compiler.

This adds a few more DEX instructions to the optimizing compiler's
builder (constants, moves, if_xx, etc).

Also:
* Changes the codegen for IF_XX instructions to use a condition
rather than comparing a value against 0.
* Fixes some instructions in the ARM disassembler.
* Fixes PushList and PopList in the thumb2 assembler.
* Switches the assembler for the optimizing compiler to thumb2
rather than ARM.

Change-Id: Iaafcd02243ccc5b03a054ef7a15285b84c06740f
ec7e4727e99aa1416398ac5a684f5024817a25c7 06-Jun-2014 Nicolas Geoffray <ngeoffray@google.com> Fix some bugs in graph construction/simplification methods.

Also fix a brano during SSA construction. The code should
not have been commented out. Added a test to cover what the code
intends.

Change-Id: Ia00ae79dcf75eb0d412f07649d73e7f94dbfb6f0
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 03-Jun-2014 Tim Murray <timmurray@google.com> DO NOT MERGE

Merge ART from AOSP to lmp-preview-dev.

Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
a7062e05e6048c7f817d784a5b94e3122e25b1ec 22-May-2014 Nicolas Geoffray <ngeoffray@google.com> Add a linear scan register allocator to the optimizing compiler.

This is a "by-the-book" implementation. It currently only deals
with allocating registers, with no hint optimizations.

The changes remaining to make it functional are:
- Allocate spill slots.
- Resolution and placements of Move instructions.
- Connect it to the code generator.

Change-Id: Ie0b2f6ba1b98da85425be721ce4afecd6b4012a4
4e3d23aa1523718ea1fdf3a32516d2f9d81e84fe 22-May-2014 Nicolas Geoffray <ngeoffray@google.com> Import Dart's parallel move resolver.

And write a few tests while at it.

A parallel move resolver will be needed for performing multiple moves
that are conceptually parallel, for example moves at a block
exit that branches to a block with phi nodes.

Change-Id: Ib95b247b4fc3f2c2fcab3b8c8d032abbd6104cd7
f635e63318447ca04731b265a86a573c9ed1737c 14-May-2014 Nicolas Geoffray <ngeoffray@google.com> Add a compilation tracing mechanism to the new compiler.

Code mostly imported from: https://android-review.googlesource.com/#/c/81653/.

Change-Id: I150fe942be0fb270e03fabb19032180f7a065d13
622d9c31febd950255b36a48b47e1f630197c5fe 12-May-2014 Nicolas Geoffray <ngeoffray@google.com> Add loop recognition and CFG simplifications in new compiler.

We do three simplifications:
- Split critical edges, for code generation from SSA (new).
- Ensure one back edge per loop, to simplify loop recognition (new).
- Ensure only one pre header for a loop, to simplify SSA creation (existing).

Change-Id: I9bfccd4b236a00486a261078627b091c8a68be33
804d09372cc3d80d537da1489da4a45e0e19aa5d 02-May-2014 Nicolas Geoffray <ngeoffray@google.com> Build live-in, live-out and kill sets for each block.

This information will be used when computing live ranges of
instructions.

Change-Id: I345ee833c1ccb4a8e725c7976453f6d58d350d74
c32e770f21540e4e9eda6dc7f770e745d33f1b9f 24-Apr-2014 Nicolas Geoffray <ngeoffray@google.com> Add a Transform to SSA phase to the optimizing compiler.

Change-Id: Ia9700756a0396d797a00b529896487d52c989329
43c86422e210a3883729ab90997711e79f26bccc 18-Mar-2014 Nicolas Geoffray <ngeoffray@google.com> Fix lint error, and Makefile that could be confused with local files.

Change-Id: I780cc0d6593eadd6f82e1126d7ad445894af666c
787c3076635cf117eb646c5a89a9014b2072fb44 17-Mar-2014 Nicolas Geoffray <ngeoffray@google.com> Plug new optimizing compiler in compilation pipeline.

Also rename accessors to ART's conventions.

Change-Id: I344807055b98aa4b27215704ec362191464acecc
bab4ed7057799a4fadc6283108ab56f389d117d4 11-Mar-2014 Nicolas Geoffray <ngeoffray@google.com> More code generation for the optimizing compiler.

- Add HReturn instruction
- Generate code for locals/if/return
- Setup infrastructure for register allocation. Currently
emulate a stack.

Change-Id: Ib28c2dba80f6c526177ed9a7b09c0689ac8122fb
3ff386aafefd5282bb76c8a50506a70a4321e698 04-Mar-2014 Nicolas Geoffray <ngeoffray@google.com> Add register support to the optimizing compiler.

Also make if take an input and build the use list for instructions.

Change-Id: I1938cee7dce5bd4c66b259fa2b431d2c79b3cf82
d4dd255db1d110ceb5551f6d95ff31fb57420994 28-Feb-2014 Nicolas Geoffray <ngeoffray@google.com> Add codegen support to the optimizing compiler.

Change-Id: I9aae76908ff1d6e64fb71a6718fc1426b67a5c28
be9a92aa804c0d210f80966b74ef8ed3987f335a 25-Feb-2014 Nicolas Geoffray <ngeoffray@google.com> Add conditional branches, and build dominator tree.

Change-Id: I4b151a07b72692961235a1419b54b6b45cf54e63
818f2107e6d2d9e80faac8ae8c92faffa83cbd11 18-Feb-2014 Nicolas Geoffray <ngeoffray@google.com> Re-apply: Initial check-in of an optimizing compiler.

The classes and the names are very much inspired by V8/Dart.
It currently only supports the RETURN_VOID dex instruction,
and there is a pretty printer to check if the building of the
graph is correct.

Change-Id: I28e125dfee86ae6ec9b3fec6aa1859523b92a893
1af0c0b88a956813eb0ad282664cedc391e2938f 19-Feb-2014 Nicolas Geoffray <ngeoffray@google.com> Revert "Initial check-in of an optimizing compiler."

g++ warnings turned into errors.

This reverts commit 68a5fefa90f03fdf5a238ac85c9439c6b03eae96.

Change-Id: I09bb95d9cc13764ca8a266c41af04801a34b9fd0
68a5fefa90f03fdf5a238ac85c9439c6b03eae96 18-Feb-2014 Nicolas Geoffray <ngeoffray@google.com> Initial check-in of an optimizing compiler.

The classes and the names are very much inspired by V8/Dart.
It currently only supports the RETURN_VOID dex instruction,
and there is a pretty printer to check if the building of the
graph is correct.

Change-Id: Id5ef1b317ab997010d4e3888e456c26bef1ab9c0