2cebb24bfc3247d3e9be138a3350106737455918 |
|
22-Apr-2015 |
Mathieu Chartier <mathieuc@google.com> |
Replace NULL with nullptr Also fixed some lines that were too long, and a few other minor details. Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb
|
1095793154d2ff33323ba9edaa4f83373bdb6c8e |
|
24-Mar-2015 |
Goran Jakovljevic <Goran.Jakovljevic@imgtec.com> |
[MIPS] Refactoring code for quick compiler Code from compiler/dex/quick/mips64 is merged with code in mips folder. Change-Id: I785983c21549141306484647da86a0bb4815daaa
|
f6737f7ed741b15cfd60c2530dab69f897540735 |
|
23-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Clean up Mir2Lir codegen. Clean up WrapPointer()/UnwrapPointer() and OpPcRelLoad(). Change-Id: I1a91f01e1e779599c77f3f6efcac2a6ad34629cf
|
027f0ff64c2512b9a5f1f54f3fea1bec481eb0f5 |
|
28-Feb-2015 |
Douglas Leung <douglas.leung@imgtec.com> |
ART: Add Mips32r6 backend support Add Mips32r6 compiler support. Don't use deprecated Mips32r2 instructions if running in Mips32r6 mode. Change-Id: I54e689aa8c026ccb75c4af515aa2794f471c9f67
|
a2e18ed9397f21c96eae4a26df9ca35a6a97341d |
|
27-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Fix inlining of Mir2Lir Missed -inl includes. Change-Id: I39e6d603c7f5d36693aca3816653594488bff63f
|
0b9203e7996ee1856f620f95d95d8a273c43a3df |
|
23-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Some Quick cleanup Make several fields const in CompilationUnit. May benefit some Mir2Lir code that repeats tests, and in general immutability is good. Remove compiler_internals.h and refactor some other headers to reduce overly broad imports (and thus forced recompiles on changes). Change-Id: I898405907c68923581373b5981d8a85d2e5d185a
|
8ebdc2bdbbae5dd014bce8d438f0eca02bad9ff9 |
|
14-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Fix indentation in Mips backend Fix the indentation to be standard. Change-Id: I39a16716be3429dfef6df0a585e24423b46363a2
|
8d36591d93920e7b7830c3ffee3759b561f5339e |
|
13-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Use jalr instead of jr for Mips Use the jalr instruction instead of jr in stubs and compiled code. Change-Id: Idacc5167a5bb0113dc2e7716e4767e5ed07b5e0b
|
d582fa4ea62083a7598dded5b82dc2198b3daac7 |
|
06-Nov-2014 |
Ian Rogers <irogers@google.com> |
Instruction set features for ARM64, MIPS and X86. Also, refactor how feature strings are handled so they are additive or subtractive. Make MIPS have features for FPU 32-bit and MIPS v2. Use in the quick compiler rather than #ifdefs that wouldn't have worked in cross-compilation. Add SIMD features for x86/x86-64 proposed in: https://android-review.googlesource.com/#/c/112370/ Bug: 18056890 Change-Id: Ic88ff84a714926bd277beb74a430c5c7d5ed7666
|
277ccbd200ea43590dfc06a93ae184a765327ad0 |
|
04-Nov-2014 |
Andreas Gampe <agampe@google.com> |
ART: More warnings Enable -Wno-conversion-null, -Wredundant-decls and -Wshadow in general, and -Wunused-but-set-parameter for GCC builds. Change-Id: I81bbdd762213444673c65d85edae594a523836e5
|
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f |
|
31-Oct-2014 |
Ian Rogers <irogers@google.com> |
Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags. Fix associated errors about unused paramenters and implict sign conversions. For sign conversion this was largely in the area of enums, so add ostream operators for the effected enums and fix tools/generate-operator-out.py. Tidy arena allocation code and arena allocated data types, rather than fixing new and delete operators. Remove dead code. Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
|
fc787ecd91127b2c8458afd94e5148e2ae51a1f5 |
|
10-Oct-2014 |
Ian Rogers <irogers@google.com> |
Enable -Wimplicit-fallthrough. Falling through switch cases on a clang build must now annotate the fallthrough with the FALLTHROUGH_INTENDED macro. Bug: 17731372 Change-Id: I836451cd5f96b01d1ababdbf9eef677fe8fa8324
|
5aa6e04061ced68cca8111af1e9c19781b8a9c5d |
|
14-Jun-2014 |
Ian Rogers <irogers@google.com> |
Tidy x86 assembler. Use helper functions to compute when the kind has a SIB, a ModRM and RegReg form. Change-Id: I86a5cb944eec62451c63281265e6974cd7a08e07
|
660188264dee3c8f3510e2e24c11816c6b60f197 |
|
06-May-2014 |
Andreas Gampe <agampe@google.com> |
ART: Use utils.h::RoundUp instead of explicit bit-fiddling Change-Id: I249a2cfeb044d3699d02e13d42b8e72518571640
|
091cc408e9dc87e60fb64c61e186bea568fc3d3a |
|
31-Mar-2014 |
buzbee <buzbee@google.com> |
Quick compiler: allocate doubles as doubles Significant refactoring of register handling to unify usage across all targets & 32/64 backends. Reworked RegStorage encoding to allow expanded use of x86 xmm registers; removed vector registers as a separate register type. Reworked RegisterInfo to describe aliased physical registers. Eliminated quite a bit of target-specific code and generalized common code. Use of RegStorage instead of int for registers now propagated down to the NewLIRx() level. In future CLs, the NewLIRx() routines will be replaced with versions that are explicit about what kind of operand they expect (RegStorage, displacement, etc.). The goal is to eventually use RegStorage all the way to the assembly phase. TBD: MIPS needs verification. TBD: Re-enable liveness tracking. Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
|
9da5c1013215176f2a4dbe7a804be899e12d5f68 |
|
28-Mar-2014 |
buzbee <buzbee@google.com> |
Quick compiler, MIPS resource cleanup MIPS architecture includes internal registers HI and LO. Similar to condition codes in other architectures, these internal resouces must be accounted for during instruction scheduling. Previously, the Quick backend for MIPS dealt with them by defining rHI and rLO pseudo registers - treating them as actual registers for def/use masks. This CL changes the handling of these resources to be in line with how condition codes are used elsewhere - leaving register definitions to be used for registers. Change-Id: Idcd77f3107b0c9b081ad05b1aab663fb9f41492d
|
2700f7e1edbcd2518f4978e4cd0e05a4149f91b6 |
|
07-Mar-2014 |
buzbee <buzbee@google.com> |
Continuing register cleanup Ready for review. Continue the process of using RegStorage rather than ints to hold register value in the top layers of codegen. Given the huge number of changes in this CL, I've attempted to minimize the number of actual logic changes. With this CL, the use of ints for registers has largely been eliminated except in the lowest utility levels. "Wide" utility routines have been updated to take a single RegStorage rather than a pair of ints representing low and high registers. Upcoming CLs will be smaller and more targeted. My expectations: o Allocate float double registers as a single double rather than a pair of float single registers. o Refactor to push code which assumes long and double Dalvik values are held in a pair of register to the target dependent layer. o Clean-up of the xxx_mir.h files to reduce the amount of #defines for registers. May also do a register renumbering to bring all of our targets' register naming more consistent. Possibly introduce a target-independent float/non-float test at the RegStorage level. Change-Id: I646de7392bdec94595dd2c6f76e0f1c4331096ff
|
1da1e2fceb0030b4b76b43510b1710a9613e0c2e |
|
15-Nov-2013 |
buzbee <buzbee@google.com> |
More compile-time tuning Another round of compile-time tuning, this time yeilding in the vicinity of 3% total reduction in compile time (which means about double that for the Quick Compile portion). Primary improvements are skipping the basic block combine optimization pass when using Quick (because we already have big blocks), combining the null check elimination and type inference passes, and limiting expensive local value number analysis to only those blocks which might benefit from it. Following this CL, the actual compile phase consumes roughly 60% of the total dex2oat time on the host, and 55% on the target (Note, I'm subtracting out the Deduping time here, which the timing logger normally counts against the compiler). A sample breakdown of the compilation time follows (this taken on PlusOne.apk w/ a Nexus 4): 39.00% -> MIR2LIR: 1374.90 (Note: includes local optimization & scheduling) 10.25% -> MIROpt:SSATransform: 361.31 8.45% -> BuildMIRGraph: 297.80 7.55% -> Assemble: 266.16 6.87% -> MIROpt:NCE_TypeInference: 242.22 5.56% -> Dedupe: 196.15 3.45% -> MIROpt:BBOpt: 121.53 3.20% -> RegisterAllocation: 112.69 3.00% -> PcMappingTable: 105.65 2.90% -> GcMap: 102.22 2.68% -> Launchpads: 94.50 1.16% -> MIROpt:InitRegLoc: 40.94 1.16% -> Cleanup: 40.93 1.10% -> MIROpt:CodeLayout: 38.80 0.97% -> MIROpt:ConstantProp: 34.35 0.96% -> MIROpt:UseCount: 33.75 0.86% -> MIROpt:CheckFilters: 30.28 0.44% -> SpecialMIR2LIR: 15.53 0.44% -> MIROpt:BBCombine: 15.41 (cherry pick of 9e8e234af4430abe8d144414e272cd72d215b5f3) Change-Id: I86c665fa7e88b75eb75629a99fd292ff8c449969
|
a61f49539a59b610e557b5513695295639496750 |
|
23-Aug-2013 |
buzbee <buzbee@google.com> |
Add timing logger to Quick compiler Current Quick compiler breakdown for compiling the boot class path: MIR2LIR: 29.674% MIROpt:SSATransform: 17.656% MIROpt:BBOpt: 11.508% BuildMIRGraph: 7.815% Assemble: 6.898% MIROpt:ConstantProp: 5.151% Cleanup: 4.916% MIROpt:NullCheckElimination: 4.085% RegisterAllocation: 3.972% GcMap: 2.359% Launchpads: 2.147% PcMappingTable: 2.145% MIROpt:CodeLayout: 0.697% LiteralData: 0.654% SpecialMIR2LIR: 0.323% Change-Id: I9f77e825faf79e6f6b214bb42edcc4b36f55d291
|
0d82948094d9a198e01aa95f64012bdedd5b6fc9 |
|
12-Oct-2013 |
buzbee <buzbee@google.com> |
64-bit prep Preparation for 64-bit roll. o Eliminated storing pointers in 32-bit int slots in LIR. o General size reductions of common structures to reduce impact of doubled pointer sizes: - BasicBlock struct was 72 bytes, now is 48. - MIR struct was 72 bytes, now is 64. - RegLocation was 12 bytes, now is 8. o Generally replaced uses of BasicBlock* pointers with 16-bit Ids. o Replaced several doubly-linked lists with singly-linked to save one stored pointer per node. o We had quite a few uses of uintptr_t's that were a holdover from the JIT (which used pointers to mapped dex & actual code cache addresses rather than trace-relative offsets). Replaced those with uint32_t's. o Clean up handling of embedded data for switch tables and array data. o Miscellaneous cleanup. I anticipate one or two additional CLs to reduce the size of MIR and LIR structs. Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
|
409fe94ad529d9334587be80b9f6a3d166805508 |
|
11-Oct-2013 |
buzbee <buzbee@google.com> |
Quick assembler fix This CL re-instates the select pattern optimization disabled by CL 374310, and fixes the underlying problem: improper handling of the kPseudoBarrier LIR opcode. The bug was introduced in the recent assembler restructuring. In short, LIR pseudo opcodes (which have values < 0), should always have size 0 - and thus cause no bits to be emitted during assembly. In this case, bad logic caused us to set the size of a kPseudoBarrier opcode via lookup through the EncodingMap. Because all pseudo ops are < 0, this meant we did an array underflow load, picking up whatever garbage was located before the EncodingMap. This explains why this error showed up recently - we'd previuosly just gotten a lucky layout. This CL corrects the faulty logic, and adds DCHECKs to uses of the EncodingMap to ensure that we don't try to access w/ a pseudo op. Additionally, the existing is_pseudo_op() macro is replaced with IsPseudoLirOp(), named similar to the existing IsPseudoMirOp(). Change-Id: I46761a0275a923d85b545664cadf052e1ab120dc
|
b48819db07f9a0992a72173380c24249d7fc648a |
|
15-Sep-2013 |
buzbee <buzbee@google.com> |
Compile-time tuning: assembly phase Not as much compile-time gain from reworking the assembly phase as I'd hoped, but still worthwhile. Should see ~2% improvement thanks to the assembly rework. On the other hand, expect some huge gains for some application thanks to better detection of large machine-generated init methods. Thinkfree shows a 25% improvement. The major assembly change was to establish thread the LIR nodes that require fixup into a fixup chain. Only those are processed during the final assembly pass(es). This doesn't help for methods which only require a single pass to assemble, but does speed up the larger methods which required multiple assembly passes. Also replaced the block_map_ basic block lookup table (which contained space for a BasicBlock* for each dex instruction unit) with a block id map - cutting its space requirements by half in a 32-bit pointer environment. Changes: o Reduce size of LIR struct by 12.5% (one of the big memory users) o Repurpose the use/def portion of the LIR after optimization complete. o Encode instruction bits to LIR o Thread LIR nodes requiring pc fixup o Change follow-on assembly passes to only consider fixup LIRs o Switch on pc-rel fixup kind o Fast-path for small methods - single pass assembly o Avoid using cb[n]z for null checks (almost always exceed displacement) o Improve detection of large initialization methods. o Rework def/use flag setup. o Remove a sequential search from FindBlock using lookup table of 16-bit block ids rather than full block pointers. o Eliminate pcRelFixup and use fixup kind instead. o Add check for 16-bit overflow on dex offset. Change-Id: I4c6615f83fed46f84629ad6cfe4237205a9562b4
|
252254b130067cd7a5071865e793966871ae0246 |
|
09-Sep-2013 |
buzbee <buzbee@google.com> |
More Quick compile-time tuning: labels & branches This CL represents a roughly 3.5% performance improvement for the compile phase of dex2oat. Move of the gain comes from avoiding the generation of dex boundary LIR labels unless a debug listing is requested. The other significant change is moving from a basic block ending branch model of "always generate a fall-through branch, and then delete it if we can" to a "only generate a fall-through branch if we need it" model. The data motivating these changes follow. Note that two area of potentially attractive gain remain: restructing the assembler model and reworking the register handling utilities. These will be addressed in subsequent CLs. --- data follows The Quick compiler's assembler has shown up on profile reports a bit more than seems reasonable. We've tried a few quick fixes to apparently hot portions of the code, but without much gain. So, I've been looking at the assembly process at a somewhat higher level. There look to be several potentially good opportunities. First, an analysis of the makeup of the LIR graph showed a surprisingly high proportion of LIR pseudo ops. Using the boot classpath as a basis, we get: 32.8% of all LIR nodes are pseudo ops. 10.4% are LIR instructions which require pc-relative fixups. 11.8% are LIR instructions that have been nop'd by the various optimization passes. Looking only at the LIR pseudo ops, we get: kPseudoDalvikByteCodeBoundary 43.46% kPseudoNormalBlockLabel 21.14% kPseudoSafepointPC 20.20% kPseudoThrowTarget 6.94% kPseudoTarget 3.03% kPseudoSuspendTarget 1.95% kPseudoMethodExit 1.26% kPseudoMethodEntry 1.26% kPseudoExportedPC 0.37% kPseudoCaseLabel 0.30% kPseudoBarrier 0.07% kPseudoIntrinsicRetry 0.02% Total LIR count: 10167292 The standout here is the Dalvik opcode boundary marker. This is just a label inserted at the beginning of the codegen for each Dalvik bytecode. If we're also doing a verbose listing, this is also where we hang the pretty-print disassembly string. However, this label was also being used as a convenient way to find the target of switch case statements (and, I think at one point was used in the Mir->GBC conversion process). This CL moves the use of kPseudoDalvikByteCodeBoundary labels to only verbose listing runs, and replaces the codegen uses of the label with the kPseudoNormalBlockLabel attached to the basic block that contains the switch case target. Great savings here - 14.3% reduction in the number of LIR nodes needed. After this CL, our LIR pseudo proportions drop to 21.6% of all LIR. That's still a lot, but much better. Possible further improvements via combining normal labels with kPseudoSafepointPC labels where appropriate, and also perhaps reduce memory usage by using a short-hand form for labels rather than a full LIR node. Also, many of the basic block labels are no longer branch targets by the time we get to assembly - cheaper to delete, or just ingore? Here's the "after" LIR pseudo op breakdown: kPseudoNormalBlockLabel 37.39% kPseudoSafepointPC 35.72% kPseudoThrowTarget 12.28% kPseudoTarget 5.36% kPseudoSuspendTarget 3.45% kPseudoMethodEntry 2.24% kPseudoMethodExit 2.22% kPseudoExportedPC 0.65% kPseudoCaseLabel 0.53% kPseudoBarrier 0.12% kPseudoIntrinsicRetry 0.04% Total LIR count: 5748232 Not done in this CL, but it will be worth experimenting with actually deleting LIR nodes from the graph when they are optimized away, rather than just setting the NOP bit. Keeping them around is invaluable during debugging - but when not debugging it may pay off if the cost of node removal is less than the cost of traversing through dead nodes in subsequent passes. Next up (and partially in this CL - but mostly to be done in follow-on CLs) is the overall assembly process. Inherited from the trace JIT, the Quick compiler has a fairly simple-minded approach to instruction assembly. First, a pass is made over the LIR list to assign offsets to each instruction. Then, the assembly pass is made - which generates the actual machine instruction bit patterns and pushes the instruction data into the code_buffer. However, the code generator takes the "always optimistic" approach to instruction selection and emits the shortest instruction. If, during assembly, we find that a branch or load doesn't reach, that short-form instruction is replaces with a longer sequence. Of course, this invalidates the previously-computed offset calculations. Assembly thus is an iterative process: compute offsets and then assemble until we survive an assembly pass without invalidation. This seems like a likely candidate for improvement. First, I analyzed the number of retries required, and the reason for invalidation over the boot classpath load. The results: more than half of methods don't require a retry, and very few require more than 1 extra pass: 5 or more: 6 of 96334 4 or more: 22 of 96334 3 or more: 140 of 96334 2 or more: 1794 of 96334 - 2% 1 or more: 40911 of 96334 - 40% 0 retries: 55423 of 96334 - 58% The interesting group here is the one that requires 1 retry. Looking at the reason, we see three typical reasons: 1. A cbnz/cbz doesn't reach (only 7 bits of offset) 2. A 16-bit Thumb1 unconditional branch doesn't reach. 3. An unconditional branch which branches to the next instruction is encountered, and deleted. The first 2 cases are the cost of the optimistic strategy - nothing much to change there. However, the interesting case is #3 - dead branch elimination. A further analysis of the single retry group showed that 42% of the methods (16305) that required a single retry did so *only* because of dead branch elimination. The big question here is why so many dead branches survive to the assembly stage. We have a dead branch elimination pass which is supposed to catch these - perhaps it's not working correctly, should be moved later in the optimization process, or perhaps run multiple times. Other things to consider: o Combine the offset generation pass with the assembly pass. Skip pc-relative fixup assembly (other than assigning offset), but push LIR* for them into work list. Following the main pass, zip through the work list and assemble the pc-relative instructions (now that we know the offsets). This would significantly cut back on traversal costs. o Store the assembled bits into both the code buffer and the LIR. In the event we have to retry, only the pc-relative instructions would need to be assembled, and we'd finish with a pass over the LIR just to dumb the bits into the code buffer. Change-Id: I50029d216fa14f273f02b6f1c8b6a0dde5a7d6a6
|
38f85e4892f6504971bde994fec81fd61780ac30 |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/operators issues Change-Id: I730bd87b476bfa36e93b42e816ef358006b69ba5
|
2ce745c06271d5223d57dbf08117b20d5b60694a |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/braces issues Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
|
7940e44f4517de5e2634a7e07d58d0fb26160513 |
|
12-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Create separate Android.mk for main build targets The runtime, compiler, dex2oat, and oatdump now are in seperate trees to prevent dependency creep. They can now be individually built without rebuilding the rest of the art projects. dalvikvm and jdwpspy were already this way. Builds in the art directory should behave as before, building everything including tests. Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81
|