History log of /art/compiler/dex/mir_dataflow.cc
Revision Date Author Comments
2ab40eb3c23559205ac7b9b039bd749458e8a761 02-Jun-2014 Jean Christophe Beyler <jean.christophe.beyler@intel.com> ART: Add Invokes to DecodedInstruction

Add an Invokes method to test for the kInvoke flag.
Also moved IsPseudoMirOp to DecodedInstruction so it can be used by the various
query methods.
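
A minimal, self-contained sketch of the kind of predicate this describes
(hypothetical stand-in types and a made-up flags table, not the actual ART code):

    #include <cstdint>

    // Hypothetical stand-ins for the real ART types, for illustration only.
    enum InstructionFlags : uint32_t {
      kInvoke = 1u << 0,
      kBranch = 1u << 1,
    };

    struct DecodedInstruction {
      uint16_t opcode;

      // Extended (pseudo) MIR opcodes are assumed to start past the Dalvik range.
      static constexpr uint16_t kMirOpFirst = 0x100;

      static bool IsPseudoMirOp(uint16_t op) {
        return op >= kMirOpFirst;
      }

      // True if the instruction is any flavor of invoke, per its flags entry.
      bool Invokes(const uint32_t* flags_table) const {
        return !IsPseudoMirOp(opcode) && (flags_table[opcode] & kInvoke) != 0;
      }
    };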

Change-Id: I59a2056b7b802b8393fa2b0d977304d252b38c89
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
f2466a766947f952c5c41186a6c5e26fb862bb37 10-Jul-2014 Udayan Banerji <udayan.banerji@intel.com> ART: Handle Extended MIRs in a uniform manner

The special handling is needed because some extended MIRs can hold values in
the args array, and we may want to handle the dataflow for those in a
specialized manner. The current dataflow attributes may not be able to describe
it for the extended MIRs.

Change-Id: I8b64f3142a4304282bb31f1d4686eba72284d97d
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
60bfe7b3e8f00f0a8ef3f5d8716adfdf86b71f43 09-Jul-2014 Udayan Banerji <udayan.banerji@intel.com> X86 Backend support for vectorized float and byte 16x16 operations

Add support for reserving vector registers for the duration of vector loop.
Add support for 16x16 multiplication, shifts, and add reduce.

Changed the vectorization implementation to be able to use the dataflow
elements for SSA recreation and fixed a few implementation details.

Change-Id: I2f358f05f574fc4ab299d9497517b9906f234b98
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Signed-off-by: Olivier Come <olivier.come@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 03-Jun-2014 Tim Murray <timmurray@google.com> DO NOT MERGE

Merge ART from AOSP to lmp-preview-dev.

Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
35ba7f3a78d38885ec54e61ed060d2771eeceea7 31-May-2014 buzbee <buzbee@google.com> Quick compiler: fix array overrun.

MIRGraph::InlineCalls() was using the MIR opcode to recover
Dalvik instruction flags - something that is only valid for
Dalvik opcodes and not the set of extended MIR opcodes.
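
A hedged sketch of the shape of such a guard (hypothetical names and table size,
standing in for the real ART helpers): only consult the Dalvik flags table when
the opcode is a real Dalvik opcode.

    #include <cstdint>

    // Hypothetical: flags are only defined for real Dalvik opcodes.
    constexpr uint16_t kNumDalvikOpcodes = 0x100;
    const uint32_t kDalvikFlags[kNumDalvikOpcodes] = {};  // placeholder table

    inline bool IsPseudoMirOp(uint16_t opcode) {
      // Extended MIR opcodes live past the end of the Dalvik flags table.
      return opcode >= kNumDalvikOpcodes;
    }

    // Safe lookup: never index the Dalvik table with an extended MIR opcode.
    inline uint32_t FlagsOf(uint16_t opcode) {
      return IsPseudoMirOp(opcode) ? 0u : kDalvikFlags[opcode];
    }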

This is probably the 3rd or 4th time we've had a bug using
the MIR opcode in situations that are only valid for the Dalvik
opcode subset. I took the opportunity to scan the code for
other cases of this (didn't find any), and did some cleanup while
I was in the neighborhood.

We should probably rework the DalvikOpcode/MirOpcode model whenever we
get around to removing DalvikInstruction from MIR.

Internal bug b/15352667: out-of-bound access in mir_optimization.cc

Change-Id: I75f06780468880892151e3cdd313e14bfbbaa489
2469e60e6ff08c2a0b4cd1e209246c5d91027679 07-May-2014 Jean Christophe Beyler <jean.christophe.beyler@intel.com> ART: Setting up cleanup

- Moved code around to put the clean-up code in a PassDriver format.
This allows us to better control what is being called after an optimization.
It also allows the use of a centralized pass system for both optimizations
and cleanup.

Change-Id: I9d21e9bb9ee663739722f440d82adf04f73e380c
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
Signed-off-by: Yixin Shou <yixin.shou@intel.com>
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
4896d7b6fb75add25f2d6ba84346ac83d8ba9d51 02-May-2014 Jean Christophe Beyler <jean.christophe.beyler@intel.com> ART: Better SSA Allocation when recreating SSA

The SSA calculation is not allocation friendly. This change makes the
SSARepresentation remember how much has been allocated so it does not
reallocate when SSA has to be recalculated.

Also added some allocation friendly code for the dominance code.
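
A generic sketch of the allocation-reuse idea (hypothetical, simplified names;
the real SSARepresentation differs): remember the previously allocated capacity
and only reallocate when the recomputed SSA needs more room.

    #include <cstdint>
    #include <cstring>

    // Hypothetical, simplified stand-in for an SSA representation node.
    struct SsaRepSketch {
      int32_t* uses = nullptr;
      int num_uses = 0;
      int num_uses_allocated = 0;  // remembered capacity

      // Arena-style allocation stand-in (arena memory is not individually freed).
      static int32_t* AllocInts(int n) { return new int32_t[n](); }

      void SetNumUses(int new_num_uses) {
        if (new_num_uses > num_uses_allocated) {
          // Only reallocate when the remembered capacity is too small.
          uses = AllocInts(new_num_uses);
          num_uses_allocated = new_num_uses;
        } else {
          // Reuse the existing buffer when SSA is recalculated.
          std::memset(uses, 0, new_num_uses * sizeof(int32_t));
        }
        num_uses = new_num_uses;
      }
    };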

Change-Id: Icd5586b7e2fefae8e1535975ab400b1ca95b500f
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
d4750f202170a448119c1813a964574bfea0dded 24-May-2014 Bill Buzbee <buzbee@android.com> Revert "ART: Better SSA Allocation when recreating SSA"

Temporarily reverting until the memory footprint cost of adding a vreg-to-SSA entrance map on every applicable MIR node can be assessed.

This reverts commit cb73fb35e5f7c575ed491c0c8e2d2b1a0a22ea2e.

Change-Id: Ia9c03bfc5d365ad8d8b949e870f1e3bcda7f9a54
0add77a86599260aba3ea4b56e9db3da6bb881a8 06-May-2014 Mark Mendell <mark.p.mendell@intel.com> ART: Ensure use counts updated when adding SSA reg

Ensure that matching data structures are updated when adding SSA
registers late in the compile.

Change-Id: I8e664dddf52c1a9095ba5b7a8df84e5a733bbc43
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
29a2648821ea4d0b5d3aecb9f835822fdfe6faa1 03-May-2014 Ian Rogers <irogers@google.com> Move DecodedInstruction into MIR.

Change-Id: I188dc7fef4f4033361c78daf2015b869242191c6
cb73fb35e5f7c575ed491c0c8e2d2b1a0a22ea2e 02-May-2014 Jean Christophe Beyler <jean.christophe.beyler@intel.com> ART: Better SSA Allocation when recreating SSA

The SSA calculation is not allocation friendly. This change makes the
SSARepresentation remember how much has been allocated so it does not
reallocate when SSA has to be recalculated.

Also added some allocation friendly code for the dominance code.

Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Change-Id: I6418b402434bd850b45771c75b7631b7b84a8f66
cc794c3dc5b45601da23fb0d7bc16f9b4ef04065 02-May-2014 Jean Christophe Beyler <jean.christophe.beyler@intel.com> ART: Move oat_data_flow_attributes_ to private and put an API

The oat_data_flow_attributes table had no checking mechanism to ensure bounds
correctness.

This fix adds bounds checking and also offers two functions to retrieve the
attributes: one taking the MIR and one taking the DecodedInstruction.
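
A minimal sketch of the accessor pattern described (hypothetical class, table
size, and signature; not the real MIRGraph API): a private table plus a getter
that bounds-checks the opcode before indexing.

    #include <cassert>
    #include <cstddef>
    #include <cstdint>

    // Hypothetical sketch; the real MIRGraph table and accessors differ.
    class MirGraphSketch {
     public:
      // Accessor with a bounds check; callers no longer index the table directly.
      uint64_t GetDataFlowAttributes(uint16_t opcode) const {
        assert(opcode < kNumOpcodes && "opcode out of bounds");
        return oat_data_flow_attributes_[opcode];
      }

     private:
      static constexpr size_t kNumOpcodes = 0x100;  // placeholder table size
      uint64_t oat_data_flow_attributes_[kNumOpcodes] = {};
    };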

Change-Id: Ib4f1f749efb923a803d364a4eea83a174527a644
Signed-Off-By: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
9820b7c1dc70e75ad405b9e6e63578fa9fe94e94 02-Jan-2014 Vladimir Marko <vmarko@google.com> Early inlining of simple methods.

Inlining "special" methods: empty methods, methods returning
constants or their arguments, simple getters and setters.

Bug: 8164439
Change-Id: I8c7fa9c14351fbb2470000b378a22974daaef236
3d73ba2ce682edfaf41f29521bd6039c6499a1c5 06-Mar-2014 Vladimir Marko <vmarko@google.com> Avoid Cache*LoweringInfo pass when there's no GET/PUT/INVOKE.

Add new data flow flags indicating instance/static field
access. Record merged flags of all insns and use them to skip
the CacheFieldLoweringInfo pass if the method uses no fields
and the CacheMethodLoweringInfo pass if it has no invokes.
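
A sketch of the merged-flags idea (hypothetical flag names and helpers): OR
together per-instruction attribute bits while scanning the method, then gate
the expensive passes on the merged result.

    #include <cstdint>
    #include <vector>

    // Hypothetical attribute bits, for illustration only.
    enum : uint64_t {
      DF_IFIELD = 1ull << 0,  // instance field access
      DF_SFIELD = 1ull << 1,  // static field access
      DF_INVOKE = 1ull << 2,  // any invoke
    };

    // OR together the attributes of every instruction in the method.
    uint64_t MergeAttributes(const std::vector<uint64_t>& insn_attrs) {
      uint64_t merged = 0;
      for (uint64_t attrs : insn_attrs) {
        merged |= attrs;
      }
      return merged;
    }

    // Pass gates: skip the passes when the merged flags show no fields/invokes.
    bool NeedsCacheFieldLoweringInfo(uint64_t merged) {
      return (merged & (DF_IFIELD | DF_SFIELD)) != 0;
    }
    bool NeedsCacheMethodLoweringInfo(uint64_t merged) {
      return (merged & DF_INVOKE) != 0;
    }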

Change-Id: I36a36b438ca9b0f104a7baddc0497d736495cc3c
83cc7ae96d4176533dd0391a1591d321b0a87f4f 12-Feb-2014 Vladimir Marko <vmarko@google.com> Create a scoped arena allocator and use that for LVN.

This saves more than 0.5s of boot.oat compilation time
on Nexus 5.

TODO: Move other stuff to the scoped allocator. This CL
alone increases the peak memory allocation. By reusing
the memory for other parts of the compilation we should
reduce this overhead.
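
A generic sketch of the scoped-arena idea (not the ART implementation):
allocations are bump-pointer, and destroying the scope rewinds the arena to
where the scope began, so pass-local memory is reclaimed without per-object frees.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Generic mark/rewind arena; the real ScopedArenaAllocator is more involved.
    class ArenaSketch {
     public:
      explicit ArenaSketch(size_t capacity) : storage_(capacity), top_(0) {}

      void* Alloc(size_t bytes) {
        // Bump-pointer allocation; no per-allocation free.
        if (top_ + bytes > storage_.size()) return nullptr;
        void* result = &storage_[top_];
        top_ += bytes;
        return result;
      }

      size_t Mark() const { return top_; }
      void Rewind(size_t mark) { top_ = mark; }

     private:
      std::vector<uint8_t> storage_;
      size_t top_;
    };

    // RAII scope: everything allocated inside is released when the scope ends.
    class ArenaScope {
     public:
      explicit ArenaScope(ArenaSketch* arena) : arena_(arena), mark_(arena->Mark()) {}
      ~ArenaScope() { arena_->Rewind(mark_); }

     private:
      ArenaSketch* arena_;
      size_t mark_;
    };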

Change-Id: Ifbc00aab4f3afd0000da818dfe68b96713824a08
da7a69b3fa7bb22d087567364b7eb5a75824efd8 09-Jan-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Enable compiler temporaries

Compiler temporaries are a facility for having virtual-register-sized space
for dealing with intermediate values during MIR transformations. They receive
explicit space in managed frames so they can have a home location in case they
need to be spilled. The facility also supports "special" temporaries which
have a specific semantic purpose and whose location in the frame must be tracked.

The compiler temporaries are treated in the same way as virtual registers
so that the MIR level transformations do not need to have special logic. However,
generated code needs to know stack layout so that it can distinguish between
home locations.

MIRGraph has received an interface for dealing with compiler temporaries. This
interface allows allocation of wide and non-wide virtual register temporaries.
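
A hypothetical sketch of such an interface (names invented for illustration;
the real MIRGraph bookkeeping differs): the graph hands out temp virtual
registers, with wide temps consuming two v-reg slots.

    #include <cstdint>

    // Hypothetical sketch; not the actual MIRGraph compiler-temp interface.
    struct CompilerTempSketch {
      int32_t v_reg;  // home virtual register number
      bool wide;      // occupies two virtual register slots
    };

    class MirGraphTempsSketch {
     public:
      // Reserve a compiler temporary; a wide temp takes a (vN, vN+1) pair.
      CompilerTempSketch GetNewCompilerTemp(bool wide) {
        CompilerTempSketch temp{next_temp_v_reg_, wide};
        next_temp_v_reg_ += wide ? 2 : 1;
        ++num_compiler_temps_;
        return temp;
      }

      int NumCompilerTemps() const { return num_compiler_temps_; }

     private:
      // Would start past the method's real virtual registers in practice.
      int32_t next_temp_v_reg_ = 0;
      int num_compiler_temps_ = 0;
    };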

The information about how temporaries are kept on the stack has been moved to
stack.h. This was necessary because the stack layout depends on where the
temporaries are placed.

Change-Id: Iba5cf095b32feb00d3f648db112a00209c8e5f55
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
4e97c539408f47145526f0062c1c06df99146a73 07-Jan-2014 Jean Christophe Beyler <jean.christophe.beyler@intel.com> Added pass framework

The patch adds a Middle-End pass system and normalizes the current
passes into the pass framework.

Passes have (see the sketch after this list):
- Start, work, and end functions.
- A gate to determine whether to apply the pass.
- An optional CFG dump folder.
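
A minimal sketch of what such a pass interface can look like (hypothetical
class and method names, not the actual pass.h / pass_driver.h code):

    #include <vector>

    struct CompilationUnitSketch;  // stand-in for the real per-method state

    // Hypothetical pass interface; the real Pass/PassDriver differ.
    class PassSketch {
     public:
      virtual ~PassSketch() {}
      // Gate: decides whether the pass should run at all for this unit.
      virtual bool Gate(const CompilationUnitSketch*) const { return true; }
      // Hooks called before, during, and after the traversal of the method.
      virtual void Start(CompilationUnitSketch*) const {}
      virtual void Worker(CompilationUnitSketch*) const {}
      virtual void End(CompilationUnitSketch*) const {}
      // Optional folder name for dumping the CFG after the pass.
      virtual const char* DumpCFGFolder() const { return nullptr; }
    };

    class PassDriverSketch {
     public:
      void InsertPass(const PassSketch* pass) { passes_.push_back(pass); }

      void Launch(CompilationUnitSketch* cu) const {
        for (const PassSketch* pass : passes_) {
          if (!pass->Gate(cu)) continue;
          pass->Start(cu);
          pass->Worker(cu);  // a real driver would iterate basic blocks here
          pass->End(cu);
        }
      }

     private:
      std::vector<const PassSketch*> passes_;
    };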

mir_dataflow.cc, mir_graph.cc, mir_optimization.cc, ssa_transformation.cc:
- Changed due to moving code into bb_optimizations.cc.
- Moved certain functions from private to public because they are needed by the passes.

pass.cc, pass.h:
- Pass base class

pass_driver.cc, pass_driver.h:
- The pass driver implementation.

frontend.cc:
- Replace the function calls to the passes with the pass driver.

Change-Id: I88cd82efbf6499df9e6c7f135d7e294dd724a079
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
1da1e2fceb0030b4b76b43510b1710a9613e0c2e 15-Nov-2013 buzbee <buzbee@google.com> More compile-time tuning

Another round of compile-time tuning, this time yielding in the
vicinity of 3% total reduction in compile time (which means about
double that for the Quick Compile portion).

Primary improvements are skipping the basic block combine optimization
pass when using Quick (because we already have big blocks), combining
the null check elimination and type inference passes, and limiting
expensive local value number analysis to only those blocks which
might benefit from it.

Following this CL, the actual compile phase consumes roughly 60%
of the total dex2oat time on the host, and 55% on the target (Note,
I'm subtracting out the Deduping time here, which the timing logger
normally counts against the compiler).

A sample breakdown of the compilation time follows (this taken on
PlusOne.apk w/ a Nexus 4):

39.00% -> MIR2LIR: 1374.90 (Note: includes local optimization & scheduling)
10.25% -> MIROpt:SSATransform: 361.31
8.45% -> BuildMIRGraph: 297.80
7.55% -> Assemble: 266.16
6.87% -> MIROpt:NCE_TypeInference: 242.22
5.56% -> Dedupe: 196.15
3.45% -> MIROpt:BBOpt: 121.53
3.20% -> RegisterAllocation: 112.69
3.00% -> PcMappingTable: 105.65
2.90% -> GcMap: 102.22
2.68% -> Launchpads: 94.50
1.16% -> MIROpt:InitRegLoc: 40.94
1.16% -> Cleanup: 40.93
1.10% -> MIROpt:CodeLayout: 38.80
0.97% -> MIROpt:ConstantProp: 34.35
0.96% -> MIROpt:UseCount: 33.75
0.86% -> MIROpt:CheckFilters: 30.28
0.44% -> SpecialMIR2LIR: 15.53
0.44% -> MIROpt:BBCombine: 15.41

(cherry pick of 9e8e234af4430abe8d144414e272cd72d215b5f3)

Change-Id: I86c665fa7e88b75eb75629a99fd292ff8c449969
17189ac098b2f156713db1821b49db7b2f018bbe 08-Nov-2013 buzbee <buzbee@google.com> Quick compiler compile-time/memory use improvement

This CL delivers a surprisingly large reduction in compile time,
as well as a significant reduction in memory usage by conditionally
removing a CFG construction feature introduced to support LLVM
bitcode generation.

In short, bitcode requires all potential exception edges to be
explicitly present in the CFG. The Quick compiler (based on the
old JIT), can ignore, at least for the purposes of dataflow
analysis, potential throw points that do not have a corresponding
catch block.

To support LLVM, we create a target basic block for every
potentially throwing instruction to give us a destination for
the exception edge. Then, following the check elimination pass,
we remove blocks whose edges have gone away.

However, if we're not using LLVM, we can skip the creation of
those things in the first place. The savings are significant.

Single-threaded compilation time on the host looks to be reduced
by something in the vicinity of 10%. We create roughly 60% fewer
basic blocks (and, importantly, the creation of fewer basic
block nodes has a multiplying effect on memory use reduction
because it results in fewer dataflow bitmaps).

Some basic block reduction stats:

boot: 2325802 before, 844846 after.
Phonesky: 485472 before, 156014 after.
PlusOne: 806232 before, 243156 after.
Thinkfree: 864498 before, 264858 after.

Another nice side effect of this change is giving the basic
block optimization pass generally larger scope.

For arena memory usage in the boot class path (compiled on the host):

              Average         Max
Before:        50,863  88,017,820
After:         41,964   4,914,208

The huge reduction in max arena memory usage is due to the
collapsing of a large initialization method. Specifically, with complete
exception edges org.ccil.cowan.tagsoup.Scheme requires 13,802
basic blocks. With exception edges collapsed, it requires 4.

This change also caused 2 latent bugs to surface.

1) The dex parsing code did not expect that the target of a switch statement
could reside in the middle of the same basic block ended by that same switch
statement.

2) The x86 backend introduced a 5-operand LIR instruction for indexed memops.
However, there was no corresponding change to the use/def mask creation code.
Thus, the 5th operand was never included in the use/def mask. This allowed
the instruction scheduling code to incorrectly move a use above a definition.
We didn't see this before because the affected x86 instructions were only used
for aget/aput, and prior to this CL those Dalvik opcodes caused a basic
block break because of the implied exception edge - which prevented the code
motion.

And finally, also included is some minor tuning of register use weighting.

Change-Id: I3f2cab7136dba2bded71e9e33b452b95e8fffc0e
0b1191cfece83f6f8d4101575a06555a2d13387a 28-Oct-2013 Bill Buzbee <buzbee@google.com> Revert "Revert "Null check elimination improvement""

This reverts commit 31aa97cfec5ee76b2f2496464e1b6f9e11d21a29.

..and thereby brings back change 380165, which was reverted
because it was buggy.

Three problems with the original CL:

1. The author ran the pre-submit tests, but used -j24 and
failed to search the output for fail messages.
2. The new null check analysis pass uses an iterative
approach to identify whether a null check is needed. It
is possible that the null-check-required state may
oscillate, and a logic error caused it to stick in the
"no check needed" state.
3. Our old nemesis, Dalvik untyped constants, in which 0 values
can be used both as object references and as non-object references.
This CL conservatively treats all CONST definitions as
potential object definitions for the purposes of null
check elimination.

Change-Id: I3c1744e44318276e42989502a314585e56ac57a0
31aa97cfec5ee76b2f2496464e1b6f9e11d21a29 26-Oct-2013 Ian Rogers <irogers@google.com> Revert "Null check elimination improvement"

This reverts commit 4db179d1821a9e78819d5adc8057a72f49e2aed8.

Change-Id: I059c15c85860c6c9f235b5dabaaef2edebaf1de2
4db179d1821a9e78819d5adc8057a72f49e2aed8 23-Oct-2013 buzbee <buzbee@google.com> Null check elimination improvement

See b/10862777

Improves the null check elimination pass by tracking visibility
of object definitions, rather than successful uses of object
dereferences. For the boot class path, this increases the static null
check elimination success rate from 98.4% to 98.6% and reduces the
size of boot.oat by ~300K bytes.

Fixes loop nesting depth computation, which is used by register
promotion, and tweaks the heuristics.

Fixes a bug in verbose listing output in which a basic block
id is directly dereferenced, rather than first being converted
to a pointer.

Change-Id: Id01c20b533cdb12ea8fc4be576438407d0a34cec
0d82948094d9a198e01aa95f64012bdedd5b6fc9 12-Oct-2013 buzbee <buzbee@google.com> 64-bit prep

Preparation for 64-bit roll.
o Eliminated storing pointers in 32-bit int slots in LIR.
o General size reductions of common structures to reduce impact
of doubled pointer sizes:
- BasicBlock struct was 72 bytes, now is 48.
- MIR struct was 72 bytes, now is 64.
- RegLocation was 12 bytes, now is 8.
o Generally replaced uses of BasicBlock* pointers with 16-bit Ids.
o Replaced several doubly-linked lists with singly-linked to save
one stored pointer per node.
o We had quite a few uses of uintptr_t's that were a holdover from
the JIT (which used pointers to mapped dex & actual code cache
addresses rather than trace-relative offsets). Replaced those with
uint32_t's.
o Clean up handling of embedded data for switch tables and array data.
o Miscellaneous cleanup.

I anticipate one or two additional CLs to reduce the size of MIR and LIR
structs.

Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
56c717860df2d71d66fb77aa77f29dd346e559d3 06-Sep-2013 buzbee <buzbee@google.com> Compile-time tuning

Specialized the dataflow iterators and did a few other minor tweaks.
Showing ~5% compile-time improvement in a single-threaded environment;
less in multi-threaded (presumably because we're blocked by something
else).

Change-Id: I2e2ed58d881414b9fc97e04cd0623e188259afd2
65ec92cf13c9d11c83711443a02e4249163d47f1 06-Sep-2013 Ian Rogers <irogers@google.com> Refactor CompilerDriver::ComputeInvokeInfo

Don't use non-const reference arguments.
Move ins before outs.

Change-Id: I4a7b8099abe91ea60f93a56077f4989303fa4876
1e54d68ce8e77dfe63340275d11a072c5184c89a 06-Sep-2013 Sebastien Hertz <shertz@google.com> Disable devirtualization detection in DEX-to-DEX compiler.

This CL allows the DEX-to-DEX compiler to disable devirtualization detection.
This allows quickening of invoke-virtual/range instructions that used to be
eligible for devirtualization.

Bug: 10632943
Change-Id: I6c9f4d3249cf42b47f004be5825b3186fa83501e
f6c4b3ba3825de1dbb3e747a68b809c6cc8eb4db 25-Aug-2013 Mathieu Chartier <mathieuc@google.com> New arena memory allocator.

Before we were creating arenas for each method. The issue with doing this
is that we needed to memset each memory allocation. This can be improved
if you start out with arenas that contain all zeroed memory and recycle
them for each method. When you give memory back to the arena pool you do
a single memset to zero out all of the memory that you used.

Always inlined the fast path of the allocation code.

Removed the "zero" parameter since the new arena allocator always returns
zeroed memory.
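
A rough sketch of the recycling scheme (hypothetical classes and names, not the
real arena allocator API): arenas come out of the pool already zeroed, the
allocation fast path is a bump-pointer, and the single memset happens when an
arena is returned to the pool.

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    class ArenaChunk {
     public:
      explicit ArenaChunk(size_t size) : memory_(size, 0), top_(0) {}

      // Fast path: bump-pointer allocation, memory is already zeroed.
      void* Alloc(size_t bytes) {
        if (top_ + bytes > memory_.size()) return nullptr;
        void* result = &memory_[top_];
        top_ += bytes;
        return result;
      }

      // Single memset over the used portion when the chunk goes back to the pool.
      void Reset() {
        std::memset(memory_.data(), 0, top_);
        top_ = 0;
      }

     private:
      std::vector<uint8_t> memory_;
      size_t top_;
    };

    class ArenaPool {
     public:
      // Hand out a pre-zeroed chunk, recycling a returned one when available.
      ArenaChunk* Obtain(size_t size) {
        if (!free_.empty()) {
          ArenaChunk* chunk = free_.back();
          free_.pop_back();
          return chunk;  // assumes uniform chunk sizes for simplicity
        }
        return new ArenaChunk(size);
      }

      void Release(ArenaChunk* chunk) {
        chunk->Reset();  // the one zeroing memset per method
        free_.push_back(chunk);
      }

     private:
      std::vector<ArenaChunk*> free_;
    };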

Host dex2oat time on target oat apks (2 samples each).
Before:
real 1m11.958s
user 4m34.020s
sys 1m28.570s

After:
real 1m9.690s
user 4m17.670s
sys 1m23.960s

Target device dex2oat samples (Mako, Thinkfree.apk):
Without new arena allocator:
0m26.47s real 0m54.60s user 0m25.85s system
0m25.91s real 0m54.39s user 0m26.69s system
0m26.61s real 0m53.77s user 0m27.35s system
0m26.33s real 0m54.90s user 0m25.30s system
0m26.34s real 0m53.94s user 0m27.23s system

With new arena allocator:
0m25.02s real 0m54.46s user 0m19.94s system
0m25.17s real 0m55.06s user 0m20.72s system
0m24.85s real 0m55.14s user 0m19.30s system
0m24.59s real 0m54.02s user 0m20.07s system
0m25.06s real 0m55.00s user 0m20.42s system

Correctness of Thinkfree.apk.oat verified by diffing both of the oat files.

Change-Id: I5ff7b85ffe86c57d3434294ca7a621a695bf57a9
38f85e4892f6504971bde994fec81fd61780ac30 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/operators issues

Change-Id: I730bd87b476bfa36e93b42e816ef358006b69ba5
df62950e7a32031b82360c407d46a37b94188fbb 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/parens issues

Change-Id: Ifc678d59a8bed24ffddde5a0e543620b17b0aba9
2ce745c06271d5223d57dbf08117b20d5b60694a 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/braces issues

Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
7940e44f4517de5e2634a7e07d58d0fb26160513 12-Jul-2013 Brian Carlstrom <bdc@google.com> Create separate Android.mk for main build targets

The runtime, compiler, dex2oat, and oatdump are now in separate trees
to prevent dependency creep. They can now be individually built
without rebuilding the rest of the art projects. dalvikvm and jdwpspy
were already this way. Builds in the art directory should behave as
before, building everything including tests.

Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81