d59f3b1b7f5c1ab9f0731ff9dc60611e8d9a6ede |
|
29-Mar-2016 |
Vladimir Marko <vmarko@google.com> |
Use iterators "before" the use node in HUserRecord<>. Create a new template class IntrusiveForwardList<> that mimics std::forward_list<> except that all allocations are handled externally. This is essentially the same as boost::intrusive::slist<> but since we're not using Boost we have to reinvent the wheel. Use the new container to replace the HUseList and use the iterators to "before" use nodes in HUserRecord<> to avoid the extra pointer to the previous node which was used exclusively for removing nodes from the list. This reduces the size of the HUseListNode by 25%, 32B to 24B in 64-bit compiler, 16B to 12B in 32-bit compiler. This translates directly to overall memory savings for the 64-bit compiler but due to rounding up of the arena allocations to 8B, we do not get any improvement in the 32-bit compiler. Compiling the Nexus 5 boot image with the 64-bit dex2oat on host this CL reduces the memory used for compiling the most hungry method, BatteryStats.dumpLocked(), by ~3.3MiB: Before: MEM: used: 47829200, allocated: 48769120, lost: 939920 Number of arenas allocated: 345, Number of allocations: 815492, avg size: 58 ... UseListNode 13744640 ... After: MEM: used: 44393040, allocated: 45361248, lost: 968208 Number of arenas allocated: 319, Number of allocations: 815492, avg size: 54 ... UseListNode 10308480 ... Note that while we do not ship the 64-bit dex2oat to the device, the JIT compilation for 64-bit processes is using the 64-bit libart-compiler. Bug: 28173563 Bug: 27856014 (cherry picked from commit 46817b876ab00d6b78905b80ed12b4344c522b6c) Change-Id: Ifb2d7b357064b003244e92c0d601d81a05e56a7b
|
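The memory figures quoted in the message above can be cross-checked with a little arithmetic. The sketch below is not part of the commit; the constant names are made up for illustration, and it assumes the 8-byte arena rounding that the message describes.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `size` up to the next multiple of `alignment` (alignment > 0).
constexpr std::size_t RoundUp(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) / alignment * alignment;
}

// Figures copied from the commit message (64-bit compiler).
constexpr std::size_t kUseListBytesBefore = 13744640;
constexpr std::size_t kUseListBytesAfter = 10308480;
constexpr std::size_t kNodeSizeBefore64 = 32;
constexpr std::size_t kNodeSizeAfter64 = 24;

// Derived value: the number of use list nodes implied by the "before" total.
constexpr std::size_t kNodeCount = kUseListBytesBefore / kNodeSizeBefore64;

// The same node count at 24 bytes each gives exactly the "after" total.
static_assert(kNodeCount * kNodeSizeAfter64 == kUseListBytesAfter,
              "after-total should equal node count times new node size");

// On 32-bit, 16B and 12B nodes both occupy 16B once rounded up to 8B,
// which is why the message reports no improvement there.
static_assert(RoundUp(12, 8) == RoundUp(16, 8),
              "12B and 16B round up to the same arena allocation size");
```

The difference between the two totals is 3436160 bytes, which is the same as the drop in "used" (47829200 to 44393040) and is roughly the ~3.3MiB quoted.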
46817b876ab00d6b78905b80ed12b4344c522b6c |
|
29-Mar-2016 |
Vladimir Marko <vmarko@google.com> |
Use iterators "before" the use node in HUserRecord<>. Create a new template class IntrusiveForwardList<> that mimics std::forward_list<> except that all allocations are handled externally. This is essentially the same as boost::intrusive::slist<> but since we're not using Boost we have to reinvent the wheel. Use the new container to replace the HUseList and use the iterators to "before" use nodes in HUserRecord<> to avoid the extra pointer to the previous node which was used exclusively for removing nodes from the list. This reduces the size of the HUseListNode by 25%, 32B to 24B in 64-bit compiler, 16B to 12B in 32-bit compiler. This translates directly to overall memory savings for the 64-bit compiler but due to rounding up of the arena allocations to 8B, we do not get any improvement in the 32-bit compiler. Compiling the Nexus 5 boot image with the 64-bit dex2oat on host this CL reduces the memory used for compiling the most hungry method, BatteryStats.dumpLocked(), by ~3.3MiB: Before: MEM: used: 47829200, allocated: 48769120, lost: 939920 Number of arenas allocated: 345, Number of allocations: 815492, avg size: 58 ... UseListNode 13744640 ... After: MEM: used: 44393040, allocated: 45361248, lost: 968208 Number of arenas allocated: 319, Number of allocations: 815492, avg size: 54 ... UseListNode 10308480 ... Note that while we do not ship the 64-bit dex2oat to the device, the JIT compilation for 64-bit processes is using the 64-bit libart-compiler. Bug: 28173563 Change-Id: I985eabd4816f845372d8aaa825a1489cf9569208
|
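The "before" position idea described in the message above can be illustrated with a minimal sketch. This is not ART's IntrusiveForwardList<>; the names UseNode and IntrusiveList are hypothetical, and the list is reduced to the few operations needed to show how a node is unlinked without a back pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type: the link lives inside the element itself, so the
// list never allocates. The caller owns the storage of every node.
struct UseNode {
  explicit UseNode(int v) : value(v) {}
  int value;
  UseNode* next = nullptr;
};

// Minimal intrusive singly-linked list with a sentinel head, so that every
// element (including the first) has a valid "before" position.
class IntrusiveList {
 public:
  // Position "before" the first element.
  UseNode* before_begin() { return &head_; }

  // Links `node` in right after `before` and returns it.
  UseNode* insert_after(UseNode* before, UseNode* node) {
    node->next = before->next;
    before->next = node;
    return node;
  }

  // Unlinks the element following `before` in O(1), with no prev pointer.
  void erase_after(UseNode* before) {
    UseNode* victim = before->next;
    assert(victim != nullptr);
    before->next = victim->next;
    victim->next = nullptr;
  }

  // Counts the elements by walking the list (O(n)).
  std::size_t size() const {
    std::size_t n = 0;
    for (const UseNode* p = head_.next; p != nullptr; p = p->next) ++n;
    return n;
  }

 private:
  UseNode head_{0};  // Sentinel; its value is unused.
};
```

One consequence of this design is that a stored "before" position becomes stale whenever a node is inserted or erased directly in front of the element it refers to, so whoever holds such positions has to update them at that point. That bookkeeping is the cost of dropping the extra pointer per node.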
dee58d6bb6d567fcd0c4f39d8d690c3acaf0e432 |
|
07-Apr-2016 |
David Brazdil <dbrazdil@google.com> |
Revert "Revert "Refactor HGraphBuilder and SsaBuilder to remove HLocals"" This patch merges the instruction-building phases from HGraphBuilder and SsaBuilder into a single HInstructionBuilder class. As a result, it is not necessary to generate HLocal, HLoadLocal and HStoreLocal instructions any more, as the builder produces SSA form directly. Saves 5-15% of arena-allocated memory (see bug for more data): GMS 20.46MB => 19.26MB (-5.86%) Maps 24.12MB => 21.47MB (-10.98%) YouTube 28.60MB => 26.01MB (-9.05%) This CL fixed an issue with parsing quickened instructions. Bug: 27894376 Bug: 27998571 Bug: 27995065 Change-Id: I20dbe1bf2d0fe296377478db98cb86cba695e694
|
60328910cad396589474f8513391ba733d19390b |
|
04-Apr-2016 |
David Brazdil <dbrazdil@google.com> |
Revert "Refactor HGraphBuilder and SsaBuilder to remove HLocals" Bug: 27995065 This reverts commit e3ff7b293be2a6791fe9d135d660c0cffe4bd73f. Change-Id: I5363c7ce18f47fd422c15eed5423a345a57249d8
|
e3ff7b293be2a6791fe9d135d660c0cffe4bd73f |
|
02-Mar-2016 |
David Brazdil <dbrazdil@google.com> |
Refactor HGraphBuilder and SsaBuilder to remove HLocals This patch merges the instruction-building phases from HGraphBuilder and SsaBuilder into a single HInstructionBuilder class. As a result, it is not necessary to generate HLocal, HLoadLocal and HStoreLocal instructions any more, as the builder produces SSA form directly. Saves 5-15% of arena-allocated memory (see bug for more data): GMS 20.46MB => 19.26MB (-5.86%) Maps 24.12MB => 21.47MB (-10.98%) YouTube 28.60MB => 26.01MB (-9.05%) Bug: 27894376 Change-Id: Iefe28d40600c169c5d306fd2c77034ae19476d90
|
badd826664896d4a9628a5a89b78016894aa414b |
|
02-Feb-2016 |
David Brazdil <dbrazdil@google.com> |
ART: Run SsaBuilder from HGraphBuilder First step towards merging the two passes, which will later result in HGraphBuilder directly producing SSA form. This CL mostly just updates tests broken by not being able to inspect the pre-SSA form. Using HLocals outside the HGraphBuilder is now deprecated. Bug: 27150508 Change-Id: I00fb6050580f409dcc5aa5b5aa3a536d6e8d759e
|
b3e773eea39a156b3eacf915ba84e3af1a5c14fa |
|
26-Jan-2016 |
David Brazdil <dbrazdil@google.com> |
ART: Implement support for instruction inlining Optimizing HIR contains 'non-materialized' instructions which are emitted at their use sites rather than their defining sites. This was not properly handled by the liveness analysis which did not adjust the use positions of the inputs of such instructions. Despite the analysis being incorrect, the current use cases never produce incorrect code. This patch generalizes the concept of inlined instructions and updates liveness analysis to compute the use positions correctly. Change-Id: Id703c154b20ab861241ae5c715a150385d3ff621
|
4833f5a1990c76bc2be89504225fb13cca22bedf |
|
16-Dec-2015 |
David Brazdil <dbrazdil@google.com> |
ART: Refactor SsaBuilder for more precise typing info This reverts commit 68289a531484d26214e09f1eadd9833531a3bc3c. Now uses Primitive::Is64BitType instead of Primitive::ComponentSize because it was incorrectly optimized by GCC. Bug: 26208284 Bug: 24252151 Bug: 24252100 Bug: 22538329 Bug: 25786318 Change-Id: Ib39f3da2b92bc5be5d76f4240a77567d82c6bebe
|
68289a531484d26214e09f1eadd9833531a3bc3c |
|
16-Dec-2015 |
Alex Light <allight@google.com> |
Revert "ART: Refactor SsaBuilder for more precise typing info" This reverts commit d9510dfc32349eeb4f2145c801f7ba1d5bccfb12. Bug: 26208284 Bug: 24252151 Bug: 24252100 Bug: 22538329 Bug: 25786318 Change-Id: I5f491becdf076ff51d437d490405ec4e1586c010
|
d9510dfc32349eeb4f2145c801f7ba1d5bccfb12 |
|
05-Nov-2015 |
David Brazdil <dbrazdil@google.com> |
ART: Refactor SsaBuilder for more precise typing info This patch refactors the SsaBuilder to do the following: 1) All phis are constructed live and marked dead if not used or proved to be conflicting. 2) Primitive type propagation, now not a separate pass, identifies conflicting types and marks corresponding phis dead. 3) When compiling --debuggable, DeadPhiHandling used to revive phis which had only environmental uses but did not attempt to resolve conflicts. This pass was removed as obsolete and is now superseded by primitive type propagation (identifying conflicting phis) and SsaDeadPhiElimination (keeping phis live if debuggable + env use). 4) Resolving conflicts requires correct primitive type information on all instructions. This was not the case for ArrayGet instructions which can have ambiguous types in the bytecode. To this end, SsaBuilder now runs reference type propagation and types ArrayGets from the type of the input array. 5) With RTP being run inside the SsaBuilder, it is not necessary to run it as a separate optimization pass. Optimizations can now assume that all instructions of type kPrimNot have reference type info after SsaBuilder (with the exception of NullConstant). 6) Graph now contains a reference type to be assigned to NullConstant. All reference type instructions therefore have RTI, as now enforced by the SsaChecker. Bug: 24252151 Bug: 24252100 Bug: 22538329 Bug: 25786318 Change-Id: I7a3aee1ff66c82d64b4846611c547af17e91d260
|
ec7802a102d49ab5c17495118d4fe0bcc7287beb |
|
01-Oct-2015 |
Vladimir Marko <vmarko@google.com> |
Add DCHECKs to ArenaVector and ScopedArenaVector. Implement dchecked_vector<> template that DCHECK()s element access and insert()/emplace()/erase() positions. Change the ArenaVector<> and ScopedArenaVector<> aliases to use the new template instead of std::vector<>. Remove DCHECK()s that have now become unnecessary from the Optimizing compiler. Change-Id: Ib8506bd30d223f68f52bd4476c76d9991acacadc
|
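The idea of a vector that checks element access and erase positions in debug builds can be sketched as follows. This is not ART's dchecked_vector<>; CheckedVector, erase_at and SKETCH_DCHECK are hypothetical names, and the debug check is approximated here with assert(), which compiles away when NDEBUG is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a debug-only check macro.
#define SKETCH_DCHECK(cond) assert(cond)

// Thin wrapper over std::vector<> that validates indices in debug builds.
template <typename T>
class CheckedVector {
 public:
  void push_back(const T& value) { data_.push_back(value); }
  std::size_t size() const { return data_.size(); }

  // Element access checks the index before touching the storage.
  T& operator[](std::size_t index) {
    SKETCH_DCHECK(index < data_.size());
    return data_[index];
  }
  const T& operator[](std::size_t index) const {
    SKETCH_DCHECK(index < data_.size());
    return data_[index];
  }

  // Erasing checks that the position refers to an existing element.
  void erase_at(std::size_t index) {
    SKETCH_DCHECK(index < data_.size());
    data_.erase(data_.begin() + static_cast<std::ptrdiff_t>(index));
  }

 private:
  std::vector<T> data_;
};
```

Centralizing the checks in the container is what lets call sites drop their own bounds checks, as the message above describes for the Optimizing compiler.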
fa6b93c4b69e6d7ddfa2a4ed0aff01b0608c5a3a |
|
15-Sep-2015 |
Vladimir Marko <vmarko@google.com> |
Optimizing: Tag arena allocations in HGraph. Replace GrowableArray with ArenaVector in HGraph and related classes HEnvironment, HLoopInformation, HInvoke and HPhi, and tag allocations with new arena allocation types. Change-Id: I3d79897af405b9a1a5b98bfc372e70fe0b3bc40d
|
0a23d74dc2751440822960eab218be4cb8843647 |
|
07-May-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Add a parent environment to HEnvironment. This code has no functionality change. It adds a placeholder for chaining inlined frames. Change-Id: I5ec57335af76ee406052345b947aad98a6a4423a
|
0d9f17de8f21a10702de1510b73e89d07b3b9bbf |
|
15-Apr-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Move the linear order to the HGraph. Bug found by Zheng Xu: SsaLivenessAnalysis being a stack allocated object, we should not refer to it in later phases of the compiler. Specifically, the code generator was using the linear order, which was stored in the liveness analysis object. Change-Id: I574641f522b7b86fc43f3914166108efc72edb3b
|
fb8d279bc011b31d0765dc7ca59afea324fd0d0c |
|
01-Apr-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
[optimizing] Implement x86/x86_64 math intrinsics Implement floor/ceil/round/RoundFloat on x86 and x86_64. Implement RoundDouble on x86_64. Add support for roundss and roundsd on both architectures. Support them in the disassembler as well. Add the instruction set features for x86, as the 'round' instruction is only supported if SSE4.1 is supported. Fix the tests to handle the addition of passing the instruction set features to x86 and x86_64. Add assembler tests for roundsd and roundss to x86_64 assembler tests. Change-Id: I9742d5930befb0bbc23f3d6c83ce0183ed9fe04f Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
09b8463493aeb6ea2bce05f67d3457d5fcc8a7d9 |
|
13-Feb-2015 |
Mark Mendell <mark.p.mendell@intel.com> |
[optimizing compiler] x86 goodness Implement the x86 version of https://android-review.googlesource.com/#/c/129560/, which made some enhancements to x86_64 code. - Use leal to implement 3 operand adds - Use testl rather than cmpl to 0 for registers - Use leaq for x86_64 for adds with constant in int32_t range Note: - The range and register allocator tests seem quite fragile. I had to change ADD_INT_LIT8 to XOR_INT_LIT8 for the register allocator test to get the code to run. It seems like this is a bit hard-coded to expected code generation sequences. I also changed BuildTwoAdds to BuildTwoSubs for the same reason. - For the live range test, I just changed the expected output, as the Locations were different. Change-Id: I402f2e95ddc8be4eb0befb3dae1b29feadfa29ab Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
b666f4805c8ae707ea6fd7f6c7f375e0b000dba8 |
|
18-Feb-2015 |
Mathieu Chartier <mathieuc@google.com> |
Move arenas into runtime Moved arena pool into the runtime. Motivation: Allow GC to use arena allocators, recycle arena pool for linear alloc. Bug: 19264997 Change-Id: I8ddbb6d55ee923a980b28fb656c758c5d7697c2f
|
5e8b137d28c840b128e2488f954cccee3e86db14 |
|
23-Jan-2015 |
David Brazdil <dbrazdil@google.com> |
Create HGraph outside Builder, print timings This patch refactors the way HGraph objects are created, moving the instantiation out of the Builder class and creating the CodeGenerator earlier. The patch uses this to build a single interface for printing timings info and dumping the CFG. Change-Id: I2eb63eabf28e2d0f5cdc7affaa690c3a4b1bdd21
|
ea55b934cff1280318f5514039549799227cfa3d |
|
27-Jan-2015 |
David Brazdil <dbrazdil@google.com> |
ART: Further refactor use lists Change-Id: I9e3219575a508ca5141d851bfcaf848302480c32
|
ed59619b370ef23ffbb25d1d01f615e60a9262b6 |
|
23-Jan-2015 |
David Brazdil <dbrazdil@google.com> |
Optimizing: Speed up HEnvironment use removal Removal of use records from HEnvironment vregs involved iterating over potentially large linked lists which made compilation of huge methods very slow. This patch turns use lists into doubly-linked lists, stores pointers to the relevant nodes inside HEnvironment and subsequently turns the removals into constant-time operations. Change-Id: I0e1d4d782fd624e7b8075af75d4adf0a0634a1ee
|
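The constant-time removal described in the message above relies on each list node knowing both of its neighbours, and on the owner keeping a pointer to the node it wants to remove. A minimal sketch, with hypothetical names (Node, UseList) that are not taken from ART:

```cpp
#include <cassert>

// Hypothetical doubly-linked node; the owner of a use keeps a pointer to it.
struct Node {
  explicit Node(int v) : value(v) {}
  int value;
  Node* prev = nullptr;
  Node* next = nullptr;
};

class UseList {
 public:
  // Links `node` in at the front of the list.
  void push_front(Node* node) {
    node->prev = nullptr;
    node->next = head_;
    if (head_ != nullptr) head_->prev = node;
    head_ = node;
  }

  // Unlinks `node` in O(1): no traversal is needed because the node itself
  // records its predecessor and successor.
  void remove(Node* node) {
    if (node->prev != nullptr) {
      node->prev->next = node->next;
    } else {
      assert(head_ == node);
      head_ = node->next;
    }
    if (node->next != nullptr) node->next->prev = node->prev;
    node->prev = nullptr;
    node->next = nullptr;
  }

  Node* head() const { return head_; }

 private:
  Node* head_ = nullptr;
};
```

With a singly-linked list and no stored position, the same removal needs a walk from the head to find the predecessor, which is the linear cost the message refers to for huge methods.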
cd6dffedf1bd8e6dfb3fb0c933551f9a90f7de3f |
|
08-Jan-2015 |
Calin Juravle <calin@google.com> |
Add implicit null checks for the optimizing compiler - for backends: arm, arm64, x86, x86_64 - fixed parameter passing for CodeGenerator - 003-omnibus-opcodes test verifies that NullPointerExceptions work as expected Change-Id: I1b302acd353342504716c9169a80706cf3aba2c8
|
e53798a7e3267305f696bf658e418c92e63e0834 |
|
01-Dec-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Inlining support in optimizing. Currently only inlines simple things that don't require an environment, such as: - Returning a constant. - Returning a parameter. - Returning an arithmetic operation. Change-Id: Ie844950cb44f69e104774a3cf7a8dea66bc85661
|
f537012ceb6cba8a78b36a5065beb9588451a250 |
|
02-Dec-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Treat SSA transformation special, as we may have to bail out. We forgot to bail out when we found a non-natural loop (on which our optimizations don't work). Change-Id: I11976b5af4c98f4f29267a74c74d34b5ad81e20c
|
a3c00e54f9b711bf3fc55ce5e7d4f8765e2ea9fa |
|
25-Nov-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix tests now that dead phis are removed when building SSA. Change-Id: Ie795f5f1c7c44ec1a3ea2bac822b6255bfb8d45c
|
8e3964b766652a0478e8e0e303e8556c997675f1 |
|
17-Oct-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Remove the notion of dies at entry. - Instead, explicitly say that the output does not overlap. - Inputs that must be in a fixed register do die at entry, as we know they have a location that others can not take. - There is also no need to differentiate between an input move and a connecting sibling move - those can be put in the same parallel move instruction. Change-Id: I1b2b2827906601f822b59fb9d6a21d48e43bae27
|
476df557fed5f0b3f32f8d11a654674bb403a8f8 |
|
09-Oct-2014 |
Roland Levillain <rpl@google.com> |
Use Is*() helpers to shorten code in the optimizing compiler. Change-Id: I79f31833bc9a0aa2918381aa3fb0b05d45f75689
|
360231a056e796c36ffe62348507e904dc9efb9b |
|
08-Oct-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix code generation of materialized conditions. Move the logic for knowing if a condition needs to be materialized in an optimization pass (so that the information does not change as a side effect of another optimization). Also clean-up arm and x86_64 codegen: - arm: ldr and str are for power-users when a constant is in play. We should use LoadFromOffset and StoreToOffset. - x86_64: fix misuses of movq instead of movl. Change-Id: I01a03b91803624be2281a344a13ad5efbf4f3ef3
|
9ae0daa60c568f98ef0020e52366856ff314615f |
|
30-Sep-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Add support for inputs dying at entry of instructions. - Start using it in places where it makes sense. - Also improve suspend check on arm to use subs directly. Change-Id: I09ac0589f5ccb9b850ee757c76dcbcf35ee8cd01
|
8ddb00ca935733f5d3b07816e5bb33d6cabe6ec4 |
|
29-Sep-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Improve detection of lifetime holes. The check concluding that the next use was in a successor was too conservative: two blocks following each other in terms of liveness are not necessarily predecessor/successor. Change-Id: Ideec98046c812aa5fb63781141b5fde24c706d6d
|
9ebc72c99e6b703bda611d7c918c9cf3dfb43e55 |
|
25-Sep-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Make suspend checks not have side effects. Also adjust gtests. Change-Id: I5e1a3e53115812b45ec7f4b6f50ba468fa7ac6b1
|
fbc695f9b8e2084697e19c1355ab925f99f0d235 |
|
15-Sep-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Revert "Implement suspend checks in new compiler."" This reverts commit 7e3652c45c30c1f2f840e6088e24e2db716eaea7. Change-Id: Ib489440c34e41cba9e9e297054f9274f6e81a2d8
|
8a16d97fb8f031822b206e65f9109a071da40563 |
|
11-Sep-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix valgrind errors. For now just stack allocate the code generator. Will think about cleaning up the root problem later (CodeGenerator being an arena object). Change-Id: I161a6f61c5f27ea88851b446f3c1e12ee9c594d7
|
e50383288a75244255d3ecedcc79ffe9caf774cb |
|
04-Jul-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Support fields in optimizing compiler. - Required support for temporaries, to be only used by baseline compiler. - Also fixed a few invalid assumptions around locations and instructions that don't need materialization. These instructions should not have an Out. Change-Id: Idc4a30dd95dd18015137300d36bec55fc024cf62
|
31d76b42ef5165351499da3f8ee0ac147428c5ed |
|
09-Jun-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Plug code generator into liveness analysis. Also implement spill slot support. Change-Id: If5e28811e9fbbf3842a258772c633318a2f4fafc
|
ec7e4727e99aa1416398ac5a684f5024817a25c7 |
|
06-Jun-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix some bugs in graph construction/simplification methods. Also fix a braino during SSA construction. The code should not have been commented out. Added a test to cover what the code intends. Change-Id: Ia00ae79dcf75eb0d412f07649d73e7f94dbfb6f0
|
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 |
|
03-Jun-2014 |
Tim Murray <timmurray@google.com> |
DO NOT MERGE Merge ART from AOSP to lmp-preview-dev. Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
|
a7062e05e6048c7f817d784a5b94e3122e25b1ec |
|
22-May-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Add a linear scan register allocator to the optimizing compiler. This is a "by-the-book" implementation. It currently only deals with allocating registers, with no hint optimizations. The changes remaining to make it functional are: - Allocate spill slots. - Resolution and placements of Move instructions. - Connect it to the code generator. Change-Id: Ie0b2f6ba1b98da85425be721ce4afecd6b4012a4
|
ddb311fdeca82ca628fed694c4702f463b5c4927 |
|
16-May-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Build live ranges in preparation for register allocation. Change-Id: I7ae24afaa4e49276136bf34f4ba7d62db7f28c01
|