2ab40eb3c23559205ac7b9b039bd749458e8a761 |
|
02-Jun-2014 |
Jean Christophe Beyler <jean.christophe.beyler@intel.com> |
ART: Add Invokes to DecodedInstruction Add a method Invokes to test for the kInvoke flag. Also moved IsPseudoMirOp to DecodedInstruction to use it for the various query methods. Change-Id: I59a2056b7b802b8393fa2b0d977304d252b38c89 Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
|
f2466a766947f952c5c41186a6c5e26fb862bb37 |
|
10-Jul-2014 |
Udayan Banerji <udayan.banerji@intel.com> |
ART: Handle Extended MIRs in a uniform manner The special handling is needed since some extended MIRs can hold values in the args array, and we might want to handle the dataflow for those in a specialized manner. Current dataflow attributes may not be able to describe it for the extended MIRs. Change-Id: I8b64f3142a4304282bb31f1d4686eba72284d97d Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com> Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
|
60bfe7b3e8f00f0a8ef3f5d8716adfdf86b71f43 |
|
09-Jul-2014 |
Udayan Banerji <udayan.banerji@intel.com> |
X86 Backend support for vectorized float and byte 16x16 operations Add support for reserving vector registers for the duration of the vector loop. Add support for 16x16 multiplication, shifts, and add reduce. Changed the vectorization implementation to be able to use the dataflow elements for SSA recreation and fixed a few implementation details. Change-Id: I2f358f05f574fc4ab299d9497517b9906f234b98 Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com> Signed-off-by: Olivier Come <olivier.come@intel.com> Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
|
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 |
|
03-Jun-2014 |
Tim Murray <timmurray@google.com> |
DO NOT MERGE Merge ART from AOSP to lmp-preview-dev. Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
|
35ba7f3a78d38885ec54e61ed060d2771eeceea7 |
|
31-May-2014 |
buzbee <buzbee@google.com> |
Quick compiler: fix array overrun. MIRGraph::InlineCalls() was using the MIR opcode to recover Dalvik instruction flags - something that is only valid for Dalvik opcodes and not the set of extended MIR opcodes. This is probably the 3rd or 4th time we've had a bug using the MIR opcode in situations that are only valid for the Dalvik opcode subset. I took the opportunity to scan the code for other cases of this (didn't find any), and did some cleanup while I was in the neighborhood. We should probably rework the DalvikOpcode/MirOpcode model whenever we get around to removing DalvikInstruction from MIR. Internal bug b/15352667: out-of-bound access in mir_optimization.cc Change-Id: I75f06780468880892151e3cdd313e14bfbbaa489
|
2469e60e6ff08c2a0b4cd1e209246c5d91027679 |
|
07-May-2014 |
Jean Christophe Beyler <jean.christophe.beyler@intel.com> |
ART: Setting up cleanup - Moved code around to actually have the clean-up code in a PassDriver format. This allows us to better control what is being called after an optimization. It also allows the use of a centralized pass system for both optimizations and cleanup. Change-Id: I9d21e9bb9ee663739722f440d82adf04f73e380c Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com> Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Signed-off-by: Yixin Shou <yixin.shou@intel.com> Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com> Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
|
4896d7b6fb75add25f2d6ba84346ac83d8ba9d51 |
|
02-May-2014 |
Jean Christophe Beyler <jean.christophe.beyler@intel.com> |
ART: Better SSA Allocation when recreating SSA The SSA calculation is not allocation friendly. This makes the SSARepresentation remember how much is allocated and not reallocate if SSA should be recalculated. Also added some allocation friendly code for the dominance code. Change-Id: Icd5586b7e2fefae8e1535975ab400b1ca95b500f Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
|
d4750f202170a448119c1813a964574bfea0dded |
|
24-May-2014 |
Bill Buzbee <buzbee@android.com> |
Revert "ART: Better SSA Allocation when recreating SSA" Temporarily reverting until memory footprint cost of adding a vreg to ssa entrance map on every applicable MIR node can be assessed. This reverts commit cb73fb35e5f7c575ed491c0c8e2d2b1a0a22ea2e. Change-Id: Ia9c03bfc5d365ad8d8b949e870f1e3bcda7f9a54
|
0add77a86599260aba3ea4b56e9db3da6bb881a8 |
|
06-May-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
ART: Ensure use counts updated when adding SSA reg Ensure that matching data structures are updated when adding SSA registers late in the compile. Change-Id: I8e664dddf52c1a9095ba5b7a8df84e5a733bbc43 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
29a2648821ea4d0b5d3aecb9f835822fdfe6faa1 |
|
03-May-2014 |
Ian Rogers <irogers@google.com> |
Move DecodedInstruction into MIR. Change-Id: I188dc7fef4f4033361c78daf2015b869242191c6
|
cb73fb35e5f7c575ed491c0c8e2d2b1a0a22ea2e |
|
02-May-2014 |
Jean Christophe Beyler <jean.christophe.beyler@intel.com> |
ART: Better SSA Allocation when recreating SSA The SSA calculation is not allocation friendly. This makes the SSARepresentation remember how much is allocated and not reallocate if SSA should be recalculated. Also added some allocation friendly code for the dominance code. Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com> Change-Id: I6418b402434bd850b45771c75b7631b7b84a8f66
|
cc794c3dc5b45601da23fb0d7bc16f9b4ef04065 |
|
02-May-2014 |
Jean Christophe Beyler <jean.christophe.beyler@intel.com> |
ART: Move oat_data_flow_attributes_ to private and put an API The oat_data_flow_attributes had no checking mechanism to ensure bounds correctness. This fix adds one and also offers two functions to retrieve the attributes: one using the MIR and one using the DecodedInstruction. Change-Id: Ib4f1f749efb923a803d364a4eea83a174527a644 Signed-Off-By: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
|
9820b7c1dc70e75ad405b9e6e63578fa9fe94e94 |
|
02-Jan-2014 |
Vladimir Marko <vmarko@google.com> |
Early inlining of simple methods. Inlining "special" methods: empty methods, methods returning constants or their arguments, simple getters and setters. Bug: 8164439 Change-Id: I8c7fa9c14351fbb2470000b378a22974daaef236
|
3d73ba2ce682edfaf41f29521bd6039c6499a1c5 |
|
06-Mar-2014 |
Vladimir Marko <vmarko@google.com> |
Avoid Cache*LoweringInfo pass when there's no GET/PUT/INVOKE. Add new data flow flags indicating instance/static field access. Record merged flags of all insns and use them to skip the CacheFieldLoweringInfo pass if the method uses no fields and the CacheMethodLoweringInfo pass if it has no invokes. Change-Id: I36a36b438ca9b0f104a7baddc0497d736495cc3c
|
83cc7ae96d4176533dd0391a1591d321b0a87f4f |
|
12-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Create a scoped arena allocator and use that for LVN. This saves more than 0.5s of boot.oat compilation time on Nexus 5. TODO: Move other stuff to the scoped allocator. This CL alone increases the peak memory allocation. By reusing the memory for other parts of the compilation we should reduce this overhead. Change-Id: Ifbc00aab4f3afd0000da818dfe68b96713824a08
|
da7a69b3fa7bb22d087567364b7eb5a75824efd8 |
|
09-Jan-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
Enable compiler temporaries Compiler temporaries are a facility for having virtual register sized space for dealing with intermediate values during MIR transformations. They receive explicit space in managed frames so they can have a home location in case they need to be spilled. The facility also supports "special" temporaries which have specific semantic purpose and their location in frame must be tracked. The compiler temporaries are treated in the same way as virtual registers so that the MIR level transformations do not need to have special logic. However, generated code needs to know stack layout so that it can distinguish between home locations. MIRGraph has received an interface for dealing with compiler temporaries. This interface allows allocation of wide and non-wide virtual register temporaries. The information about how temporaries are kept on stack has been moved to stack.h. This was necessary because stack layout is dependent on where the temporaries are placed. Change-Id: Iba5cf095b32feb00d3f648db112a00209c8e5f55 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
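The entry above describes temporaries as extra virtual-register-sized slots appended after the method's real vregs, each with a home location in the frame. A minimal sketch of such an allocation interface (all names here are hypothetical, not ART's actual MIRGraph API):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: temporaries are numbered directly after the
// method's real virtual registers, so MIR-level passes can treat them
// exactly like ordinary vregs.
class TempAllocator {
 public:
  explicit TempAllocator(uint32_t num_vregs)
      : num_vregs_(num_vregs), num_temp_slots_(0) {}

  // Allocate a single-slot temporary; returns its virtual-register number.
  uint32_t AllocTemp() { return num_vregs_ + num_temp_slots_++; }

  // Allocate a wide (64-bit) temporary; reserves two consecutive slots,
  // occupying v and v+1.
  uint32_t AllocWideTemp() {
    uint32_t v = num_vregs_ + num_temp_slots_;
    num_temp_slots_ += 2;
    return v;
  }

  // Home location: each slot gets a fixed byte offset in the frame's
  // vreg area so the temporary can be spilled and reloaded.
  uint32_t HomeOffset(uint32_t vreg) const { return vreg * 4; }

  uint32_t NumTempSlots() const { return num_temp_slots_; }

 private:
  uint32_t num_vregs_;       // real vregs declared by the method
  uint32_t num_temp_slots_;  // slots handed out for temporaries
};
```

With 10 real vregs, the first temp is numbered 10, a following wide temp takes 11 and 12, and the frame layout code only needs NumTempSlots() to size the extra area.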
|
4e97c539408f47145526f0062c1c06df99146a73 |
|
07-Jan-2014 |
Jean Christophe Beyler <jean.christophe.beyler@intel.com> |
Added pass framework The patch adds a Middle-End pass system and normalizes the current passes into the pass framework. Passes have: - Start, work, and end functions. - A gate to determine whether to apply the pass. - An optional CFG dump folder. mir_dataflow.cc, mir_graph.cc, mir_optimization.cc, ssa_transformation.cc: - Changed due to moving code into bb_optimizations.cc. - Moved certain functions from private to public because they are needed by the passes. pass.cc, pass.h: - Pass base class pass_driver.cc, pass_driver.h: - The pass driver implementation. frontend.cc: - Replace the function calls to the passes with the pass driver. Change-Id: I88cd82efbf6499df9e6c7f135d7e294dd724a079 Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
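The shape described above - a pass base class with gate/start/work/end hooks and a driver that runs passes in order - can be sketched as follows (hypothetical names and signatures, not ART's actual pass.h/pass_driver.h):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a compilation unit handed to each pass.
struct Unit { int value = 0; };

// Hypothetical pass base class: a gate decides whether the pass applies,
// and start/work/end hooks run in that order when it does.
class Pass {
 public:
  explicit Pass(const std::string& name) : name_(name) {}
  virtual ~Pass() {}
  virtual bool Gate(const Unit& u) const { return true; }
  virtual void Start(Unit& u) {}
  virtual void Work(Unit& u) {}
  virtual void End(Unit& u) {}
  const std::string& name() const { return name_; }

 private:
  std::string name_;
};

// Hypothetical driver: runs registered passes in order, skipping any
// whose gate declines the current unit.
class PassDriver {
 public:
  void Add(Pass* p) { passes_.push_back(p); }
  void Run(Unit& u) {
    for (Pass* p : passes_) {
      if (!p->Gate(u)) continue;  // gate rejected: skip this pass entirely
      p->Start(u);
      p->Work(u);
      p->End(u);
    }
  }

 private:
  std::vector<Pass*> passes_;
};

// Example concrete pass: only runs on non-negative units.
class AddOnePass : public Pass {
 public:
  AddOnePass() : Pass("AddOne") {}
  bool Gate(const Unit& u) const override { return u.value >= 0; }
  void Work(Unit& u) override { u.value += 1; }
};
```

Replacing direct calls to each optimization with a driver like this is what lets frontend.cc run both optimizations and cleanup through one centralized list.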
|
1da1e2fceb0030b4b76b43510b1710a9613e0c2e |
|
15-Nov-2013 |
buzbee <buzbee@google.com> |
More compile-time tuning Another round of compile-time tuning, this time yielding in the vicinity of 3% total reduction in compile time (which means about double that for the Quick Compile portion). Primary improvements are skipping the basic block combine optimization pass when using Quick (because we already have big blocks), combining the null check elimination and type inference passes, and limiting expensive local value number analysis to only those blocks which might benefit from it. Following this CL, the actual compile phase consumes roughly 60% of the total dex2oat time on the host, and 55% on the target (Note, I'm subtracting out the Deduping time here, which the timing logger normally counts against the compiler). A sample breakdown of the compilation time follows (taken on PlusOne.apk with a Nexus 4):
39.00% -> MIR2LIR: 1374.90 (Note: includes local optimization & scheduling)
10.25% -> MIROpt:SSATransform: 361.31
8.45% -> BuildMIRGraph: 297.80
7.55% -> Assemble: 266.16
6.87% -> MIROpt:NCE_TypeInference: 242.22
5.56% -> Dedupe: 196.15
3.45% -> MIROpt:BBOpt: 121.53
3.20% -> RegisterAllocation: 112.69
3.00% -> PcMappingTable: 105.65
2.90% -> GcMap: 102.22
2.68% -> Launchpads: 94.50
1.16% -> MIROpt:InitRegLoc: 40.94
1.16% -> Cleanup: 40.93
1.10% -> MIROpt:CodeLayout: 38.80
0.97% -> MIROpt:ConstantProp: 34.35
0.96% -> MIROpt:UseCount: 33.75
0.86% -> MIROpt:CheckFilters: 30.28
0.44% -> SpecialMIR2LIR: 15.53
0.44% -> MIROpt:BBCombine: 15.41
(cherry pick of 9e8e234af4430abe8d144414e272cd72d215b5f3) Change-Id: I86c665fa7e88b75eb75629a99fd292ff8c449969
|
17189ac098b2f156713db1821b49db7b2f018bbe |
|
08-Nov-2013 |
buzbee <buzbee@google.com> |
Quick compiler compile-time/memory use improvement This CL delivers a surprisingly large reduction in compile time, as well as a significant reduction in memory usage by conditionally removing a CFG construction feature introduced to support LLVM bitcode generation. In short, bitcode requires all potential exception edges to be explicitly present in the CFG. The Quick compiler (based on the old JIT), can ignore, at least for the purposes of dataflow analysis, potential throw points that do not have a corresponding catch block. To support LLVM, we create a target basic block for every potentially throwing instruction to give us a destination for the exception edge. Then, following the check elimination pass, we remove blocks whose edges have gone away. However, if we're not using LLVM, we can skip the creation of those things in the first place. The savings are significant. Single-threaded compilation time on the host looks to be reduced by something in the vicinity of 10%. We create roughly 60% fewer basic blocks (and, importantly, the creation of fewer basic block nodes has a multiplying effect on memory use reduction because it results in fewer dataflow bitmaps). Some basic block reduction stats: boot: 2325802 before, 844846 after. Phonesky: 485472 before, 156014 after. PlusOne: 806232 before, 243156 after. Thinkfree: 864498 before, 264858 after. Another nice side effect of this change is giving the basic block optimization pass generally larger scope. For arena memory usage in the boot class path (compiled on the host):
        Average     Max
Before: 50,863      88,017,820
After:  41,964      4,914,208
The huge reduction in max arena memory usage is due to the collapsing of a large initialization method. Specifically, with complete exception edges org.ccil.cowan.tagsoup.Scheme requires 13,802 basic blocks. With exception edges collapsed, it requires 4. This change also caused 2 latent bugs to surface.
1) The dex parsing code did not expect that the target of a switch statement could reside in the middle of the same basic block ended by that same switch statement. 2) The x86 backend introduced a 5-operand LIR instruction for indexed memops. However, there was no corresponding change to the use/def mask creation code. Thus, the 5th operand was never included in the use/def mask. This allowed the instruction scheduling code to incorrectly move a use above a definition. We didn't see this before because the affected x86 instructions were only used for aget/aput, and prior to this CL those Dalvik opcodes caused a basic block break because of the implied exception edge - which prevented the code motion. And finally, also included is some minor tuning of register use weighting. Change-Id: I3f2cab7136dba2bded71e9e33b452b95e8fffc0e
|
0b1191cfece83f6f8d4101575a06555a2d13387a |
|
28-Oct-2013 |
Bill Buzbee <buzbee@google.com> |
Revert "Revert "Null check elimination improvement"" This reverts commit 31aa97cfec5ee76b2f2496464e1b6f9e11d21a29. ..and thereby brings back change 380165, which was reverted because it was buggy. Three problems with the original CL: 1. The author ran the pre-submit tests, but used -j24 and failed to search the output for fail messages. 2. The new null check analysis pass uses an iterative approach to identify whether a null check is needed. It is possible that the null-check-required state may oscillate, and a logic error caused it to stick in the "no check needed" state. 3. Our old nemesis Dalvik untyped constants, in which 0 values can be used both as object reference and non-object references. This CL conservatively treats all CONST definitions as potential object definitions for the purposes of null check elimination. Change-Id: I3c1744e44318276e42989502a314585e56ac57a0
|
31aa97cfec5ee76b2f2496464e1b6f9e11d21a29 |
|
26-Oct-2013 |
Ian Rogers <irogers@google.com> |
Revert "Null check elimination improvement" This reverts commit 4db179d1821a9e78819d5adc8057a72f49e2aed8. Change-Id: I059c15c85860c6c9f235b5dabaaef2edebaf1de2
|
4db179d1821a9e78819d5adc8057a72f49e2aed8 |
|
23-Oct-2013 |
buzbee <buzbee@google.com> |
Null check elimination improvement See b/10862777 Improves the null check elimination pass by tracking visibility of object definitions, rather than successful uses of object dereferences. For boot class path, increases static null check elimination success rate from 98.4% to 98.6%. Reduces size of boot.oat by ~300K bytes. Fixes loop nesting depth computation, which is used by register promotion, and tweaked the heuristics. Fixes a bug in verbose listing output in which a basic block id is directly dereferenced, rather than first being converted to a pointer. Change-Id: Id01c20b533cdb12ea8fc4be576438407d0a34cec
|
0d82948094d9a198e01aa95f64012bdedd5b6fc9 |
|
12-Oct-2013 |
buzbee <buzbee@google.com> |
64-bit prep Preparation for 64-bit roll.
o Eliminated storing pointers in 32-bit int slots in LIR.
o General size reductions of common structures to reduce impact of doubled pointer sizes: - BasicBlock struct was 72 bytes, now is 48. - MIR struct was 72 bytes, now is 64. - RegLocation was 12 bytes, now is 8.
o Generally replaced uses of BasicBlock* pointers with 16-bit Ids.
o Replaced several doubly-linked lists with singly-linked to save one stored pointer per node.
o We had quite a few uses of uintptr_t's that were a holdover from the JIT (which used pointers to mapped dex & actual code cache addresses rather than trace-relative offsets). Replaced those with uint32_t's.
o Clean up handling of embedded data for switch tables and array data.
o Miscellaneous cleanup.
I anticipate one or two additional CLs to reduce the size of MIR and LIR structs. Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
|
56c717860df2d71d66fb77aa77f29dd346e559d3 |
|
06-Sep-2013 |
buzbee <buzbee@google.com> |
Compile-time tuning Specialized the dataflow iterators and did a few other minor tweaks. Showing ~5% compile-time improvement in a single-threaded environment; less in multi-threaded (presumably because we're blocked by something else). Change-Id: I2e2ed58d881414b9fc97e04cd0623e188259afd2
|
65ec92cf13c9d11c83711443a02e4249163d47f1 |
|
06-Sep-2013 |
Ian Rogers <irogers@google.com> |
Refactor CompilerDriver::ComputeInvokeInfo Don't use non-const reference arguments. Move ins before outs. Change-Id: I4a7b8099abe91ea60f93a56077f4989303fa4876
|
1e54d68ce8e77dfe63340275d11a072c5184c89a |
|
06-Sep-2013 |
Sebastien Hertz <shertz@google.com> |
Disable devirtualization detection in DEX-to-DEX compiler. This CL allows the DEX-to-DEX compiler to disable devirtualization detection, allowing it to quicken invoke-virtual/range instructions that used to be eligible for devirtualization. Bug: 10632943 Change-Id: I6c9f4d3249cf42b47f004be5825b3186fa83501e
|
f6c4b3ba3825de1dbb3e747a68b809c6cc8eb4db |
|
25-Aug-2013 |
Mathieu Chartier <mathieuc@google.com> |
New arena memory allocator. Before we were creating arenas for each method. The issue with doing this is that we needed to memset each memory allocation. This can be improved if you start out with arenas that contain all zeroed memory and recycle them for each method. When you give memory back to the arena pool you do a single memset to zero out all of the memory that you used. Always inlined the fast path of the allocation code. Removed the "zero" parameter since the new arena allocator always returns zeroed memory. Host dex2oat time on target oat apks (2 samples each):
Before: real 1m11.958s user 4m34.020s sys 1m28.570s
After: real 1m9.690s user 4m17.670s sys 1m23.960s
Target device dex2oat samples (Mako, Thinkfree.apk), without new arena allocator:
0m26.47s real 0m54.60s user 0m25.85s system
0m25.91s real 0m54.39s user 0m26.69s system
0m26.61s real 0m53.77s user 0m27.35s system
0m26.33s real 0m54.90s user 0m25.30s system
0m26.34s real 0m53.94s user 0m27.23s system
With new arena allocator:
0m25.02s real 0m54.46s user 0m19.94s system
0m25.17s real 0m55.06s user 0m20.72s system
0m24.85s real 0m55.14s user 0m19.30s system
0m24.59s real 0m54.02s user 0m20.07s system
0m25.06s real 0m55.00s user 0m20.42s system
Correctness of Thinkfree.apk.oat verified by diffing both of the oat files. Change-Id: I5ff7b85ffe86c57d3434294ca7a621a695bf57a9
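The scheme the entry describes - hand out arenas that are already zeroed, bump-allocate without per-allocation memset, and zero only the used region with a single memset when the arena is returned to the pool - can be sketched roughly like this (hypothetical names, not ART's actual ArenaAllocator/ArenaPool classes):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical sketch of a pooled, pre-zeroed arena.
class Arena {
 public:
  explicit Arena(size_t size) : storage_(size, 0), used_(0) {}

  // Fast-path bump allocation: the memory is already zeroed, so there is
  // no memset on this path.
  void* Alloc(size_t bytes) {
    if (used_ + bytes > storage_.size()) return nullptr;
    void* p = &storage_[used_];
    used_ += bytes;
    return p;
  }

  // On release, zero only the bytes this method actually used - the one
  // memset that replaces per-allocation clearing.
  void Reset() {
    std::memset(storage_.data(), 0, used_);
    used_ = 0;
  }

 private:
  std::vector<unsigned char> storage_;
  size_t used_;
};

// Hypothetical pool that recycles zeroed arenas across methods.
class ArenaPool {
 public:
  Arena* Obtain(size_t size) {
    if (!free_.empty()) {
      Arena* a = free_.back();
      free_.pop_back();
      return a;  // already zeroed by Release()
    }
    owned_.emplace_back(new Arena(size));
    return owned_.back().get();
  }

  void Release(Arena* a) {
    a->Reset();          // single memset over the used region
    free_.push_back(a);  // recycle for the next method
  }

 private:
  std::vector<std::unique_ptr<Arena>> owned_;
  std::vector<Arena*> free_;
};
```

Because every allocation comes back zeroed either way, callers no longer need a "zero" flag - which is exactly why the commit removes that parameter.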
|
38f85e4892f6504971bde994fec81fd61780ac30 |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/operators issues Change-Id: I730bd87b476bfa36e93b42e816ef358006b69ba5
|
df62950e7a32031b82360c407d46a37b94188fbb |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/parens issues Change-Id: Ifc678d59a8bed24ffddde5a0e543620b17b0aba9
|
2ce745c06271d5223d57dbf08117b20d5b60694a |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/braces issues Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
|
7940e44f4517de5e2634a7e07d58d0fb26160513 |
|
12-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Create separate Android.mk for main build targets The runtime, compiler, dex2oat, and oatdump now are in separate trees to prevent dependency creep. They can now be individually built without rebuilding the rest of the art projects. dalvikvm and jdwpspy were already this way. Builds in the art directory should behave as before, building everything including tests. Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81
|