2ab40eb3c23559205ac7b9b039bd749458e8a761 |
|
02-Jun-2014 |
Jean Christophe Beyler <jean.christophe.beyler@intel.com> |
ART: Add Invokes to DecodedInstruction Add a method Invokes to test for the kInvoke flag. Also moved IsPseudoMirOp to DecodedInstruction to use it for the various query methods. Change-Id: I59a2056b7b802b8393fa2b0d977304d252b38c89 Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
|
f2466a766947f952c5c41186a6c5e26fb862bb37 |
|
10-Jul-2014 |
Udayan Banerji <udayan.banerji@intel.com> |
ART: Handle Extended MIRs in a uniform manner The special handling is needed since some extended MIRs can hold values in the args array, and we might want to handle the dataflow for those in a specialized manner. Current dataflow attributes may not be able to describe it for the extended MIRs. Change-Id: I8b64f3142a4304282bb31f1d4686eba72284d97d Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com> Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
|
60bfe7b3e8f00f0a8ef3f5d8716adfdf86b71f43 |
|
09-Jul-2014 |
Udayan Banerji <udayan.banerji@intel.com> |
X86 Backend support for vectorized float and byte 16x16 operations Add support for reserving vector registers for the duration of the vector loop. Add support for 16x16 multiplication, shifts, and add reduce. Changed the vectorization implementation to be able to use the dataflow elements for SSA recreation and fixed a few implementation details. Change-Id: I2f358f05f574fc4ab299d9497517b9906f234b98 Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com> Signed-off-by: Olivier Come <olivier.come@intel.com> Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
|
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 |
|
03-Jun-2014 |
Tim Murray <timmurray@google.com> |
DO NOT MERGE Merge ART from AOSP to lmp-preview-dev. Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
|
35ba7f3a78d38885ec54e61ed060d2771eeceea7 |
|
31-May-2014 |
buzbee <buzbee@google.com> |
Quick compiler: fix array overrun. MIRGraph::InlineCalls() was using the MIR opcode to recover Dalvik instruction flags - something that is only valid for Dalvik opcodes and not the set of extended MIR opcodes. This is probably the 3rd or 4th time we've had a bug using the MIR opcode in situations that are only valid for the Dalvik opcode subset. I took the opportunity to scan the code for other cases of this (didn't find any), and did some cleanup while I was in the neighborhood. We should probably rework the DalvikOpcode/MirOpcode model whenever we get around to removing DalvikInstruction from MIR. Internal bug b/15352667: out-of-bound access in mir_optimization.cc Change-Id: I75f06780468880892151e3cdd313e14bfbbaa489
|
2469e60e6ff08c2a0b4cd1e209246c5d91027679 |
|
07-May-2014 |
Jean Christophe Beyler <jean.christophe.beyler@intel.com> |
ART: Setting up cleanup - Moved code around to actually have the clean-up code in a PassDriver format. This allows us to better control what is being called after an optimization. It also allows the use of a centralized pass system for both optimizations and cleanup. Change-Id: I9d21e9bb9ee663739722f440d82adf04f73e380c Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com> Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Signed-off-by: Yixin Shou <yixin.shou@intel.com> Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com> Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
|
4896d7b6fb75add25f2d6ba84346ac83d8ba9d51 |
|
02-May-2014 |
Jean Christophe Beyler <jean.christophe.beyler@intel.com> |
ART: Better SSA Allocation when recreating SSA The SSA calculation is not allocation friendly. This makes the SSARepresentation remember how much is allocated and not reallocate if SSA should be recalculated. Also added some allocation friendly code for the dominance code. Change-Id: Icd5586b7e2fefae8e1535975ab400b1ca95b500f Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
|
d4750f202170a448119c1813a964574bfea0dded |
|
24-May-2014 |
Bill Buzbee <buzbee@android.com> |
Revert "ART: Better SSA Allocation when recreating SSA" Temporarily reverting until memory footprint cost of adding a vreg to ssa entrance map on every applicable MIR node can be assessed. This reverts commit cb73fb35e5f7c575ed491c0c8e2d2b1a0a22ea2e. Change-Id: Ia9c03bfc5d365ad8d8b949e870f1e3bcda7f9a54
|
0add77a86599260aba3ea4b56e9db3da6bb881a8 |
|
06-May-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
ART: Ensure use counts updated when adding SSA reg Ensure that matching data structures are updated when adding SSA registers late in the compile. Change-Id: I8e664dddf52c1a9095ba5b7a8df84e5a733bbc43 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
29a2648821ea4d0b5d3aecb9f835822fdfe6faa1 |
|
03-May-2014 |
Ian Rogers <irogers@google.com> |
Move DecodedInstruction into MIR. Change-Id: I188dc7fef4f4033361c78daf2015b869242191c6
|
cb73fb35e5f7c575ed491c0c8e2d2b1a0a22ea2e |
|
02-May-2014 |
Jean Christophe Beyler <jean.christophe.beyler@intel.com> |
ART: Better SSA Allocation when recreating SSA The SSA calculation is not allocation friendly. This makes the SSARepresentation remember how much is allocated and not reallocate if SSA should be recalculated. Also added some allocation friendly code for the dominance code. Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com> Change-Id: I6418b402434bd850b45771c75b7631b7b84a8f66
|
cc794c3dc5b45601da23fb0d7bc16f9b4ef04065 |
|
02-May-2014 |
Jean Christophe Beyler <jean.christophe.beyler@intel.com> |
ART: Move oat_data_flow_attributes_ to private and put an API The oat_data_flow_attributes had no checking mechanism to ensure bounds correctness. This fix adds one and also offers two functions to retrieve the attributes: one using the MIR and one using the DecodedInstruction. Change-Id: Ib4f1f749efb923a803d364a4eea83a174527a644 Signed-Off-By: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
|
9820b7c1dc70e75ad405b9e6e63578fa9fe94e94 |
|
02-Jan-2014 |
Vladimir Marko <vmarko@google.com> |
Early inlining of simple methods. Inlining "special" methods: empty methods, methods returning constants or their arguments, simple getters and setters. Bug: 8164439 Change-Id: I8c7fa9c14351fbb2470000b378a22974daaef236
|
3d73ba2ce682edfaf41f29521bd6039c6499a1c5 |
|
06-Mar-2014 |
Vladimir Marko <vmarko@google.com> |
Avoid Cache*LoweringInfo pass when there's no GET/PUT/INVOKE. Add new data flow flags indicating instance/static field access. Record merged flags of all insns and use them to skip the CacheFieldLoweringInfo pass if the method uses no fields and the CacheMethodLoweringInfo pass if it has no invokes. Change-Id: I36a36b438ca9b0f104a7baddc0497d736495cc3c
|
83cc7ae96d4176533dd0391a1591d321b0a87f4f |
|
12-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Create a scoped arena allocator and use that for LVN. This saves more than 0.5s of boot.oat compilation time on Nexus 5. TODO: Move other stuff to the scoped allocator. This CL alone increases the peak memory allocation. By reusing the memory for other parts of the compilation we should reduce this overhead. Change-Id: Ifbc00aab4f3afd0000da818dfe68b96713824a08
|
da7a69b3fa7bb22d087567364b7eb5a75824efd8 |
|
09-Jan-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
Enable compiler temporaries Compiler temporaries are a facility for having virtual register sized space for dealing with intermediate values during MIR transformations. They receive explicit space in managed frames so they can have a home location in case they need to be spilled. The facility also supports "special" temporaries which have specific semantic purpose and their location in frame must be tracked. The compiler temporaries are treated in the same way as virtual registers so that the MIR level transformations do not need to have special logic. However, generated code needs to know stack layout so that it can distinguish between home locations. MIRGraph has received an interface for dealing with compiler temporaries. This interface allows allocation of wide and non-wide virtual register temporaries. The information about how temporaries are kept on stack has been moved to stack.h. This was necessary because stack layout is dependent on where the temporaries are placed. Change-Id: Iba5cf095b32feb00d3f648db112a00209c8e5f55 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
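The entry above describes temporaries as extra virtual-register-sized slots appended after the method's real vregs, each with a home location in the frame. A minimal sketch of such an allocation interface (all names here are hypothetical, not ART's actual MIRGraph API):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: temporaries are numbered directly after the
// method's real virtual registers, so MIR-level passes can treat them
// exactly like ordinary vregs.
class TempAllocator {
 public:
  explicit TempAllocator(uint32_t num_vregs)
      : num_vregs_(num_vregs), num_temp_slots_(0) {}

  // Allocate a single-slot temporary; returns its virtual-register number.
  uint32_t AllocTemp() { return num_vregs_ + num_temp_slots_++; }

  // Allocate a wide (64-bit) temporary; reserves two consecutive slots,
  // occupying v and v+1.
  uint32_t AllocWideTemp() {
    uint32_t v = num_vregs_ + num_temp_slots_;
    num_temp_slots_ += 2;
    return v;
  }

  // Home location: each slot gets a fixed byte offset in the frame's
  // vreg area so the temporary can be spilled and reloaded.
  uint32_t HomeOffset(uint32_t vreg) const { return vreg * 4; }

  uint32_t NumTempSlots() const { return num_temp_slots_; }

 private:
  uint32_t num_vregs_;       // real vregs declared by the method
  uint32_t num_temp_slots_;  // slots handed out for temporaries
};
```

With 10 real vregs, the first temp is numbered 10, a following wide temp takes 11 and 12, and the frame layout code only needs NumTempSlots() to size the extra area.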
|
4e97c539408f47145526f0062c1c06df99146a73 |
|
07-Jan-2014 |
Jean Christophe Beyler <jean.christophe.beyler@intel.com> |
Added pass framework The patch adds a Middle-End pass system and normalizes the current passes into the pass framework. Passes have: - Start, work, and end functions. - A gate to determine whether to apply the pass. - An optional CFG dump folder. mir_dataflow.cc, mir_graph.cc, mir_optimization.cc, ssa_transformation.cc: - Changed due to moving code into bb_optimizations.cc. - Moved certain functions from private to public because they are needed by the passes. pass.cc, pass.h: - Pass base class pass_driver.cc, pass_driver.h: - The pass driver implementation. frontend.cc: - Replace the function calls to the passes with the pass driver. Change-Id: I88cd82efbf6499df9e6c7f135d7e294dd724a079 Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
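The shape described above - a pass base class with gate/start/work/end hooks and a driver that runs passes in order - can be sketched as follows (hypothetical names and signatures, not ART's actual pass.h/pass_driver.h):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a compilation unit handed to each pass.
struct Unit { int value = 0; };

// Hypothetical pass base class: a gate decides whether the pass applies,
// and start/work/end hooks run in that order when it does.
class Pass {
 public:
  explicit Pass(const std::string& name) : name_(name) {}
  virtual ~Pass() {}
  virtual bool Gate(const Unit& u) const { return true; }
  virtual void Start(Unit& u) {}
  virtual void Work(Unit& u) {}
  virtual void End(Unit& u) {}
  const std::string& name() const { return name_; }

 private:
  std::string name_;
};

// Hypothetical driver: runs registered passes in order, skipping any
// whose gate declines the current unit.
class PassDriver {
 public:
  void Add(Pass* p) { passes_.push_back(p); }
  void Run(Unit& u) {
    for (Pass* p : passes_) {
      if (!p->Gate(u)) continue;  // gate rejected: skip this pass entirely
      p->Start(u);
      p->Work(u);
      p->End(u);
    }
  }

 private:
  std::vector<Pass*> passes_;
};

// Example concrete pass: only runs on non-negative units.
class AddOnePass : public Pass {
 public:
  AddOnePass() : Pass("AddOne") {}
  bool Gate(const Unit& u) const override { return u.value >= 0; }
  void Work(Unit& u) override { u.value += 1; }
};
```

Replacing direct calls to each optimization with a driver like this is what lets frontend.cc run both optimizations and cleanup through one centralized list.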
|
1da1e2fceb0030b4b76b43510b1710a9613e0c2e |
|
15-Nov-2013 |
buzbee <buzbee@google.com> |
More compile-time tuning Another round of compile-time tuning, this time yielding in the vicinity of 3% total reduction in compile time (which means about double that for the Quick Compile portion). Primary improvements are skipping the basic block combine optimization pass when using Quick (because we already have big blocks), combining the null check elimination and type inference passes, and limiting expensive local value number analysis to only those blocks which might benefit from it. Following this CL, the actual compile phase consumes roughly 60% of the total dex2oat time on the host, and 55% on the target (Note, I'm subtracting out the Deduping time here, which the timing logger normally counts against the compiler). A sample breakdown of the compilation time follows (taken on PlusOne.apk with a Nexus 4):
39.00% -> MIR2LIR: 1374.90 (Note: includes local optimization & scheduling)
10.25% -> MIROpt:SSATransform: 361.31
8.45% -> BuildMIRGraph: 297.80
7.55% -> Assemble: 266.16
6.87% -> MIROpt:NCE_TypeInference: 242.22
5.56% -> Dedupe: 196.15
3.45% -> MIROpt:BBOpt: 121.53
3.20% -> RegisterAllocation: 112.69
3.00% -> PcMappingTable: 105.65
2.90% -> GcMap: 102.22
2.68% -> Launchpads: 94.50
1.16% -> MIROpt:InitRegLoc: 40.94
1.16% -> Cleanup: 40.93
1.10% -> MIROpt:CodeLayout: 38.80
0.97% -> MIROpt:ConstantProp: 34.35
0.96% -> MIROpt:UseCount: 33.75
0.86% -> MIROpt:CheckFilters: 30.28
0.44% -> SpecialMIR2LIR: 15.53
0.44% -> MIROpt:BBCombine: 15.41
(cherry pick of 9e8e234af4430abe8d144414e272cd72d215b5f3) Change-Id: I86c665fa7e88b75eb75629a99fd292ff8c449969
|
17189ac098b2f156713db1821b49db7b2f018bbe |
|
08-Nov-2013 |
buzbee <buzbee@google.com> |
Quick compiler compile-time/memory use improvement This CL delivers a surprisingly large reduction in compile time, as well as a significant reduction in memory usage by conditionally removing a CFG construction feature introduced to support LLVM bitcode generation. In short, bitcode requires all potential exception edges to be explicitly present in the CFG. The Quick compiler (based on the old JIT), can ignore, at least for the purposes of dataflow analysis, potential throw points that do not have a corresponding catch block. To support LLVM, we create a target basic block for every potentially throwing instruction to give us a destination for the exception edge. Then, following the check elimination pass, we remove blocks whose edges have gone away. However, if we're not using LLVM, we can skip the creation of those things in the first place. The savings are significant. Single-threaded compilation time on the host looks to be reduced by something in the vicinity of 10%. We create roughly 60% fewer basic blocks (and, importantly, the creation of fewer basic block nodes has a multiplying effect on memory use reduction because it results in fewer dataflow bitmaps). Some basic block reduction stats: boot: 2325802 before, 844846 after. Phonesky: 485472 before, 156014 after. PlusOne: 806232 before, 243156 after. Thinkfree: 864498 before, 264858 after. Another nice side effect of this change is giving the basic block optimization pass generally larger scope. For arena memory usage in the boot class path (compiled on the host):
        Average     Max
Before: 50,863      88,017,820
After:  41,964      4,914,208
The huge reduction in max arena memory usage is due to the collapsing of a large initialization method. Specifically, with complete exception edges org.ccil.cowan.tagsoup.Scheme requires 13,802 basic blocks. With exception edges collapsed, it requires 4. This change also caused 2 latent bugs to surface.
1) The dex parsing code did not expect that the target of a switch statement could reside in the middle of the same basic block ended by that same switch statement. 2) The x86 backend introduced a 5-operand LIR instruction for indexed memops. However, there was no corresponding change to the use/def mask creation code. Thus, the 5th operand was never included in the use/def mask. This allowed the instruction scheduling code to incorrectly move a use above a definition. We didn't see this before because the affected x86 instructions were only used for aget/aput, and prior to this CL those Dalvik opcodes caused a basic block break because of the implied exception edge - which prevented the code motion. And finally, also included is some minor tuning of register use weighting. Change-Id: I3f2cab7136dba2bded71e9e33b452b95e8fffc0e
|
0b1191cfece83f6f8d4101575a06555a2d13387a |
|
28-Oct-2013 |
Bill Buzbee <buzbee@google.com> |
Revert "Revert "Null check elimination improvement"" This reverts commit 31aa97cfec5ee76b2f2496464e1b6f9e11d21a29. ..and thereby brings back change 380165, which was reverted because it was buggy. Three problems with the original CL: 1. The author ran the pre-submit tests, but used -j24 and failed to search the output for fail messages. 2. The new null check analysis pass uses an iterative approach to identify whether a null check is needed. It is possible that the null-check-required state may oscillate, and a logic error caused it to stick in the "no check needed" state. 3. Our old nemesis Dalvik untyped constants, in which 0 values can be used both as object reference and non-object references. This CL conservatively treats all CONST definitions as potential object definitions for the purposes of null check elimination. Change-Id: I3c1744e44318276e42989502a314585e56ac57a0
|
31aa97cfec5ee76b2f2496464e1b6f9e11d21a29 |
|
26-Oct-2013 |
Ian Rogers <irogers@google.com> |
Revert "Null check elimination improvement" This reverts commit 4db179d1821a9e78819d5adc8057a72f49e2aed8. Change-Id: I059c15c85860c6c9f235b5dabaaef2edebaf1de2
|
4db179d1821a9e78819d5adc8057a72f49e2aed8 |
|
23-Oct-2013 |
buzbee <buzbee@google.com> |
Null check elimination improvement See b/10862777 Improves the null check elimination pass by tracking visibility of object definitions, rather than successful uses of object dereferences. For boot class path, increases static null check elimination success rate from 98.4% to 98.6%. Reduces size of boot.oat by ~300K bytes. Fixes loop nesting depth computation, which is used by register promotion, and tweaked the heuristics. Fixes a bug in verbose listing output in which a basic block id is directly dereferenced, rather than first being converted to a pointer. Change-Id: Id01c20b533cdb12ea8fc4be576438407d0a34cec
|
0d82948094d9a198e01aa95f64012bdedd5b6fc9 |
|
12-Oct-2013 |
buzbee <buzbee@google.com> |
64-bit prep Preparation for 64-bit roll.
o Eliminated storing pointers in 32-bit int slots in LIR.
o General size reductions of common structures to reduce impact of doubled pointer sizes: - BasicBlock struct was 72 bytes, now is 48. - MIR struct was 72 bytes, now is 64. - RegLocation was 12 bytes, now is 8.
o Generally replaced uses of BasicBlock* pointers with 16-bit Ids.
o Replaced several doubly-linked lists with singly-linked to save one stored pointer per node.
o We had quite a few uses of uintptr_t's that were a holdover from the JIT (which used pointers to mapped dex & actual code cache addresses rather than trace-relative offsets). Replaced those with uint32_t's.
o Clean up handling of embedded data for switch tables and array data.
o Miscellaneous cleanup.
I anticipate one or two additional CLs to reduce the size of MIR and LIR structs. Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
|
56c717860df2d71d66fb77aa77f29dd346e559d3 |
|
06-Sep-2013 |
buzbee <buzbee@google.com> |
Compile-time tuning Specialized the dataflow iterators and did a few other minor tweaks. Showing ~5% compile-time improvement in a single-threaded environment; less in multi-threaded (presumably because we're blocked by something else). Change-Id: I2e2ed58d881414b9fc97e04cd0623e188259afd2
|
65ec92cf13c9d11c83711443a02e4249163d47f1 |
|
06-Sep-2013 |
Ian Rogers <irogers@google.com> |
Refactor CompilerDriver::ComputeInvokeInfo Don't use non-const reference arguments. Move ins before outs. Change-Id: I4a7b8099abe91ea60f93a56077f4989303fa4876
|
1e54d68ce8e77dfe63340275d11a072c5184c89a |
|
06-Sep-2013 |
Sebastien Hertz <shertz@google.com> |
Disable devirtualization detection in DEX-to-DEX compiler. This CL allows the DEX-to-DEX compiler to disable devirtualization detection, allowing it to quicken invoke-virtual/range instructions that used to be eligible for devirtualization. Bug: 10632943 Change-Id: I6c9f4d3249cf42b47f004be5825b3186fa83501e
|
f6c4b3ba3825de1dbb3e747a68b809c6cc8eb4db |
|
25-Aug-2013 |
Mathieu Chartier <mathieuc@google.com> |
New arena memory allocator. Before we were creating arenas for each method. The issue with doing this is that we needed to memset each memory allocation. This can be improved if you start out with arenas that contain all zeroed memory and recycle them for each method. When you give memory back to the arena pool you do a single memset to zero out all of the memory that you used. Always inlined the fast path of the allocation code. Removed the "zero" parameter since the new arena allocator always returns zeroed memory. Host dex2oat time on target oat apks (2 samples each):
Before: real 1m11.958s user 4m34.020s sys 1m28.570s
After: real 1m9.690s user 4m17.670s sys 1m23.960s
Target device dex2oat samples (Mako, Thinkfree.apk), without new arena allocator:
0m26.47s real 0m54.60s user 0m25.85s system
0m25.91s real 0m54.39s user 0m26.69s system
0m26.61s real 0m53.77s user 0m27.35s system
0m26.33s real 0m54.90s user 0m25.30s system
0m26.34s real 0m53.94s user 0m27.23s system
With new arena allocator:
0m25.02s real 0m54.46s user 0m19.94s system
0m25.17s real 0m55.06s user 0m20.72s system
0m24.85s real 0m55.14s user 0m19.30s system
0m24.59s real 0m54.02s user 0m20.07s system
0m25.06s real 0m55.00s user 0m20.42s system
Correctness of Thinkfree.apk.oat verified by diffing both of the oat files. Change-Id: I5ff7b85ffe86c57d3434294ca7a621a695bf57a9
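The scheme the entry describes - hand out arenas that are already zeroed, bump-allocate without per-allocation memset, and zero only the used region with a single memset when the arena is returned to the pool - can be sketched roughly like this (hypothetical names, not ART's actual ArenaAllocator/ArenaPool classes):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical sketch of a pooled, pre-zeroed arena.
class Arena {
 public:
  explicit Arena(size_t size) : storage_(size, 0), used_(0) {}

  // Fast-path bump allocation: the memory is already zeroed, so there is
  // no memset on this path.
  void* Alloc(size_t bytes) {
    if (used_ + bytes > storage_.size()) return nullptr;
    void* p = &storage_[used_];
    used_ += bytes;
    return p;
  }

  // On release, zero only the bytes this method actually used - the one
  // memset that replaces per-allocation clearing.
  void Reset() {
    std::memset(storage_.data(), 0, used_);
    used_ = 0;
  }

 private:
  std::vector<unsigned char> storage_;
  size_t used_;
};

// Hypothetical pool that recycles zeroed arenas across methods.
class ArenaPool {
 public:
  Arena* Obtain(size_t size) {
    if (!free_.empty()) {
      Arena* a = free_.back();
      free_.pop_back();
      return a;  // already zeroed by Release()
    }
    owned_.emplace_back(new Arena(size));
    return owned_.back().get();
  }

  void Release(Arena* a) {
    a->Reset();          // single memset over the used region
    free_.push_back(a);  // recycle for the next method
  }

 private:
  std::vector<std::unique_ptr<Arena>> owned_;
  std::vector<Arena*> free_;
};
```

Because every allocation comes back zeroed either way, callers no longer need a "zero" flag - which is exactly why the commit removes that parameter.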
|
38f85e4892f6504971bde994fec81fd61780ac30 |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/operators issues Change-Id: I730bd87b476bfa36e93b42e816ef358006b69ba5
|
df62950e7a32031b82360c407d46a37b94188fbb |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/parens issues Change-Id: Ifc678d59a8bed24ffddde5a0e543620b17b0aba9
|
2ce745c06271d5223d57dbf08117b20d5b60694a |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/braces issues Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
|
7940e44f4517de5e2634a7e07d58d0fb26160513 |
|
12-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Create separate Android.mk for main build targets The runtime, compiler, dex2oat, and oatdump now are in separate trees to prevent dependency creep. They can now be individually built without rebuilding the rest of the art projects. dalvikvm and jdwpspy were already this way. Builds in the art directory should behave as before, building everything including tests. Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81
|