2dd7b672ea0afd7ea4448b43d24829e9886de3af |
|
07-Dec-2017 |
Aart Bik <ajcbik@google.com> |
Fixed spilling bug (visible on ARM64): missed SIMD type. Test: test-art-host test-art-target Change-Id: I6f321446f54943e02f250732ec9da729f633c3a9
|
e764d2e50c544c2cb98ee61a15d613161ac6bd17 |
|
05-Oct-2017 |
Vladimir Marko <vmarko@google.com> |
Use ScopedArenaAllocator for register allocation. Memory needed to compile the two most expensive methods for aosp_angler-userdebug boot image: BatteryStats.dumpCheckinLocked() : 25.1MiB -> 21.1MiB BatteryStats.dumpLocked(): 49.6MiB -> 42.0MiB This is because all the memory previously used by Scheduler is reused by the register allocator; the register allocator has a higher peak usage of the ArenaStack. And continue the "arena"->"allocator" renaming. Test: m test-art-host-gtest Test: testrunner.py --host Bug: 64312607 Change-Id: Idfd79a9901552b5147ec0bf591cb38120de86b01
|
ca6fff898afcb62491458ae8bcd428bfb3043da1 |
|
03-Oct-2017 |
Vladimir Marko <vmarko@google.com> |
ART: Use ScopedArenaAllocator for pass-local data. Passes using local ArenaAllocator were hiding their memory usage from the allocation counting, making it difficult to track down where memory was used. Using ScopedArenaAllocator reveals the memory usage. This changes the HGraph constructor which requires a lot of changes in tests. Refactor these tests to limit the amount of work needed the next time we change that constructor. Test: m test-art-host-gtest Test: testrunner.py --host Test: Build with kArenaAllocatorCountAllocations = true. Bug: 64312607 Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
|
0ebe0d83138bba1996e9c8007969b5381d972b32 |
|
21-Sep-2017 |
Vladimir Marko <vmarko@google.com> |
ART: Introduce compiler data type. Replace most uses of the runtime's Primitive in compiler with a new class DataType. This prepares for introducing new types, such as Uint8, that the runtime does not need to know about. Test: m test-art-host-gtest Test: testrunner.py --host Bug: 23964345 Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
|
5e3afa950f05bca470ef6b92460940f37831c27f |
|
20-Sep-2017 |
Aart Bik <ajcbik@google.com> |
Ensure extract is seen as having scalar result. Rationale: Extracting from a vector yields a scalar, yet our parallel mover and one DCHECK did not account for that fact (note that moving towards a vector type system will prevent such errors). Regression test for this is part of the SAD CL. Test: test-art-host test-art-target Bug: 64091002 Change-Id: Id154edd1a069c54e7d8da069c368dea0a8f973f4
|
c9c310487b8730fce5edfa72e79c4188629898a3 |
|
29-Jun-2017 |
Nicolas Geoffray <ngeoffray@google.com> |
Turn a few DCHECK into CHECKs. To help diagnose b/63070152. bug: 63070152 Test: test.py Change-Id: I1ac1cf9bfe1bc15ecfa94b5b8537cd3afda6fd14
|
82b0740f03b1a6acab4558214d3edc362e27e238 |
|
01-Mar-2017 |
Vladimir Marko <vmarko@google.com> |
Use IntrusiveForwardList<> for Env-/UsePosition. Test: m test-art-host-gtest Test: testrunner.py --host Change-Id: I2b720e2ed8f96303cf80e9daa6d5278bf0c3da2f
|
f8f5a16ed7bad1e18179e38453e59c96a944de10 |
|
07-Feb-2017 |
Aart Bik <ajcbik@google.com> |
ART vectorizer. Rationale: Make SIMD great again with a retargetable and easily extendable vectorizer. Provides a full x86/x86_64 and a proof-of-concept ARM implementation. Sample improvement (without any perf tuning yet) for Linpack on x86 is about 20% to 50%. Test: test-art-host, test-art-target (angler) Bug: 34083438, 30933338 Change-Id: Ifb77a0f25f690a87cd65bf3d5e9f6be7ea71d6c1
|
5576f3741c58cb8b5fb2f68f3b3a9415efe05f4f |
|
24-Mar-2017 |
Aart Bik <ajcbik@google.com> |
Implement a SIMD spilling slot. Rationale: The last ART vectorizer break-out CL \O/ This ensures spilling on x86 and x86_64 is correct. Also, it paves the way to wider SIMD on ARM and MIPS. Test: test-art-host Bug: 34083438 Change-Id: I5b27d18c2045f3ab70b64c335423b3ff2a507ac2
|
cc89525c13894247cb82a1973617da6cba286f0c |
|
21-Mar-2017 |
Aart Bik <ajcbik@google.com> |
Change 1/2 spill slots to a more general number of spill slots. Rationale: This prepares for requesting a different number of spill slots during SIMD vectorization. Bug: 34083438 Test: test-art-host, test-art-host-gtest-register_allocator_test Change-Id: I6d22966ba483deec72b5eea5061c403c12b2ada7
|
2c45bc9137c29f886e69923535aff31a74d90829 |
|
25-Oct-2016 |
Vladimir Marko <vmarko@google.com> |
Remove H[Reverse]PostOrderIterator and HInsertionOrderIterator. Use range-based loops instead, introducing helper functions ReverseRange() for iteration in reverse order in containers. When the contents of the underlying container change inside the loop, use an index-based loop that better exposes the container data modifications, whereas the old iterator interface hid them, which could lead to subtle bugs. Test: m test-art-host Change-Id: I2a4e6c508b854c37a697fc4b1e8423a8c92c5ea0
|
9620230700d4b451097c2163faa70627c9d8088a |
|
05-Oct-2016 |
Aart Bik <ajcbik@google.com> |
Refactoring of graph linearization and linear order. Rationale: Ownership of graph's linear order and iterators was a bit unclear now that other phases are using it. New approach allows phases to compute their own order, while ssa_liveness is sole owner for graph (since it is not mutated afterwards). Also shortens lifetime of loop's arena. Test: test-art-host Change-Id: Ib7137d1203a1e0a12db49868f4117d48a4277f30
|
20e9db6db787e007e7032878c9899b28ec43e93f |
|
14-Sep-2016 |
Aart Bik <ajcbik@google.com> |
Make LinearizeGraph() public (and move it to nodes files) Rationale: It is strange that HLinearOrderIterator is defined (and visible) in nodes.h, but clients have no way to build this order. This CL makes the building available at the usual place. Change-Id: Ib66f2edf6dfc8edd6b429bd4bea3ac7e37440b28 Tests: m test-art
|
30f766688006813ce90f42160c4b31112e90da60 |
|
02-Sep-2016 |
David Brazdil <dbrazdil@google.com> |
Cache result of an expensive DCHECK LiveInterval::AddBackEdgeUses tests whether linear order is well formed on debug builds. This is expensive and can significantly hinder compilation times when many back edge uses are added. This patch moves the IsLinearOrderWellFormed test to the end of linear order generation. Bug: 31163119 Change-Id: Ic4fe66bee2055f4b2cb065d9451ad5f21ba00676
|
d9ffd0dd7266f6a5e76f29d98dbe1a04f64cbb9b |
|
22-Jun-2016 |
Matthew Gharrity <gharrma@google.com> |
Implement a graph coloring register allocator Test: m test-art-host Change-Id: I8c0d77f339ab02b33588a54b96ecce5c8322cfce
|
e90049140fdfb89080e5cc9b000b0c9be8c18bcd |
|
16-Jun-2016 |
Vladimir Marko <vmarko@google.com> |
Create a typedef for HInstruction::GetInputs() return type. And some other cleanup after https://android-review.googlesource.com/230742 Test: No new tests. ART test suite passed (tested on host). Change-Id: I4743bf17544d0234c6ccb46dd0c1b9aae5c93e17
|
372f10e5b0b34e2bb6e2b79aeba6c441e14afd1f |
|
17-May-2016 |
Vladimir Marko <vmarko@google.com> |
Refactor handling of input records. Introduce HInstruction::GetInputRecords(), a new virtual function that returns an ArrayRef<> to all input records. Implement all other functions dealing with input records as wrappers around GetInputRecords(). Rewrite functions that previously used multiple virtual calls to deal with input records, especially in loops, to prefetch the ArrayRef<> only once for each instruction. Besides avoiding all the extra calls, this also allows the compiler (clang++) to perform additional optimizations. This speeds up the Nexus 5 boot image compilation by ~0.5s (4% of "Compile Dex File", 2% of dex2oat time) on AOSP ToT. Change-Id: Id8ebe0fb9405e38d918972a11bd724146e4ca578
|
d7c2fdc939bb7efb3e7204d62e54c6a3f7d77f9b |
|
10-May-2016 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix another case of live_in at irreducible loop entry. GVN was implicitly extending the liveness of an instruction across an irreducible loop. Fix this problem by clearing the value set at loop entries that contain an irreducible loop. bug:28252896 (cherry picked from commit 77ce6430af2709432b22344ed656edd8ec80581b) Change-Id: Ie0121e83b2dfe47bcd184b90a69c0194d13fce54
|
77ce6430af2709432b22344ed656edd8ec80581b |
|
10-May-2016 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix another case of live_in at irreducible loop entry. GVN was implicitly extending the liveness of an instruction across an irreducible loop. Fix this problem by clearing the value set at loop entries that contain an irreducible loop. bug:28252896 Change-Id: I68823cb88dceb4c2b4545286ba54fd0c958a48b0
|
d59f3b1b7f5c1ab9f0731ff9dc60611e8d9a6ede |
|
29-Mar-2016 |
Vladimir Marko <vmarko@google.com> |
Use iterators "before" the use node in HUserRecord<>. Create a new template class IntrusiveForwardList<> that mimics std::forward_list<> except that all allocations are handled externally. This is essentially the same as boost::intrusive::slist<> but since we're not using Boost we have to reinvent the wheel. Use the new container to replace the HUseList and use the iterators to "before" use nodes in HUserRecord<> to avoid the extra pointer to the previous node which was used exclusively for removing nodes from the list. This reduces the size of the HUseListNode by 25%, 32B to 24B in the 64-bit compiler, 16B to 12B in the 32-bit compiler. This translates directly to overall memory savings for the 64-bit compiler but due to rounding up of the arena allocations to 8B, we do not get any improvement in the 32-bit compiler. Compiling the Nexus 5 boot image with the 64-bit dex2oat on host, this CL reduces the memory used for compiling the most memory-hungry method, BatteryStats.dumpLocked(), by ~3.3MiB: Before: MEM: used: 47829200, allocated: 48769120, lost: 939920 Number of arenas allocated: 345, Number of allocations: 815492, avg size: 58 ... UseListNode 13744640 ... After: MEM: used: 44393040, allocated: 45361248, lost: 968208 Number of arenas allocated: 319, Number of allocations: 815492, avg size: 54 ... UseListNode 10308480 ... Note that while we do not ship the 64-bit dex2oat to the device, the JIT compilation for 64-bit processes is using the 64-bit libart-compiler. Bug: 28173563 Bug: 27856014 (cherry picked from commit 46817b876ab00d6b78905b80ed12b4344c522b6c) Change-Id: Ifb2d7b357064b003244e92c0d601d81a05e56a7b
|
46817b876ab00d6b78905b80ed12b4344c522b6c |
|
29-Mar-2016 |
Vladimir Marko <vmarko@google.com> |
Use iterators "before" the use node in HUserRecord<>. Create a new template class IntrusiveForwardList<> that mimics std::forward_list<> except that all allocations are handled externally. This is essentially the same as boost::intrusive::slist<> but since we're not using Boost we have to reinvent the wheel. Use the new container to replace the HUseList and use the iterators to "before" use nodes in HUserRecord<> to avoid the extra pointer to the previous node which was used exclusively for removing nodes from the list. This reduces the size of the HUseListNode by 25%, 32B to 24B in the 64-bit compiler, 16B to 12B in the 32-bit compiler. This translates directly to overall memory savings for the 64-bit compiler but due to rounding up of the arena allocations to 8B, we do not get any improvement in the 32-bit compiler. Compiling the Nexus 5 boot image with the 64-bit dex2oat on host, this CL reduces the memory used for compiling the most memory-hungry method, BatteryStats.dumpLocked(), by ~3.3MiB: Before: MEM: used: 47829200, allocated: 48769120, lost: 939920 Number of arenas allocated: 345, Number of allocations: 815492, avg size: 58 ... UseListNode 13744640 ... After: MEM: used: 44393040, allocated: 45361248, lost: 968208 Number of arenas allocated: 319, Number of allocations: 815492, avg size: 54 ... UseListNode 10308480 ... Note that while we do not ship the 64-bit dex2oat to the device, the JIT compilation for 64-bit processes is using the 64-bit libart-compiler. Bug: 28173563 Change-Id: I985eabd4816f845372d8aaa825a1489cf9569208
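The "before" iterator idea can be sketched with a minimal intrusive singly-linked list. This is an illustration under stated assumptions, not ART's IntrusiveForwardList<>: the names `Node`, `IntrusiveList`, `before_begin()`, `insert_after()` and `erase_after()` are hypothetical here, chosen to mirror std::forward_list<>. Nodes embed their own `next` link so the list never allocates, and removal goes through the node preceding the victim, which is what makes a per-node `prev` pointer unnecessary.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type: the link lives inside the element itself,
// so the list performs no allocations of its own.
struct Node {
  int value = 0;
  Node* next = nullptr;
};

// Hypothetical minimal intrusive singly-linked list.
class IntrusiveList {
 public:
  // Sentinel acting as the position "before" the first element,
  // analogous to std::forward_list<>::before_begin().
  Node* before_begin() { return &head_; }
  Node* begin() const { return head_.next; }

  // Link an externally owned node right after `pos`.
  void insert_after(Node* pos, Node* node) {
    node->next = pos->next;
    pos->next = node;
  }

  // Unlink the node following `pos` in O(1). Storing the predecessor
  // position replaces a "prev" pointer inside every node.
  void erase_after(Node* pos) {
    Node* victim = pos->next;
    assert(victim != nullptr);
    pos->next = victim->next;
    victim->next = nullptr;
  }

  std::size_t size() const {
    std::size_t n = 0;
    for (Node* it = head_.next; it != nullptr; it = it->next) ++n;
    return n;
  }

 private:
  Node head_;  // Sentinel; its value is unused.
};
```

A user that records the position before its own node can later unlink itself with `erase_after()` without walking the list, which is the same trade the commit makes to shrink each use node.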
|
3563c44464ca55b2106373b35110e5ecaae38abf |
|
18-Apr-2016 |
Vladimir Marko <vmarko@google.com> |
Fix inlining loops in OSR mode. When compiling a method in OSR mode and the method does not contain a loop (arguably, a very odd case) but we inline another method with a loop and then the final DCE re-runs the loop identification, the inlined loop would previously be marked as irreducible. However, the SSA liveness analysis expects irreducible loops to have extra loop Phis which were already eliminated from the loop before the inner graph was inlined to the outer graph, so we would fail a DCHECK(). We fix this by not marking inlined loops as irreducible when compiling in OSR mode. Bug: 28210356 (cherry picked from commit fd66c50d64c38e40bafde83b4872e27bbff7546d) Change-Id: I149273b766d1c713c571baad6033c5f70e6dd960
|
fd66c50d64c38e40bafde83b4872e27bbff7546d |
|
18-Apr-2016 |
Vladimir Marko <vmarko@google.com> |
Fix inlining loops in OSR mode. When compiling a method in OSR mode and the method does not contain a loop (arguably, a very odd case) but we inline another method with a loop and then the final DCE re-runs the loop identification, the inlined loop would previously be marked as irreducible. However, the SSA liveness analysis expects irreducible loops to have extra loop Phis which were already eliminated from the loop before the inner graph was inlined to the outer graph, so we would fail a DCHECK(). We fix this by not marking inlined loops as irreducible when compiling in OSR mode. Bug: 28210356 Change-Id: If10057ed883333c62a878ed2ae3fe01bb5280e33
|
badd826664896d4a9628a5a89b78016894aa414b |
|
02-Feb-2016 |
David Brazdil <dbrazdil@google.com> |
ART: Run SsaBuilder from HGraphBuilder First step towards merging the two passes, which will later result in HGraphBuilder directly producing SSA form. This CL mostly just updates tests broken by not being able to inspect the pre-SSA form. Using HLocals outside the HGraphBuilder is now deprecated. Bug: 27150508 Change-Id: I00fb6050580f409dcc5aa5b5aa3a536d6e8d759e
|
674f519fe00ae07e0db90c4374f785bb418ae332 |
|
02-Feb-2016 |
David Brazdil <dbrazdil@google.com> |
ART: Enable multi-level instruction inlining Change-Id: I4b4c927d7b1598dc197793c25185fb079dec7fe1
|
b3e773eea39a156b3eacf915ba84e3af1a5c14fa |
|
26-Jan-2016 |
David Brazdil <dbrazdil@google.com> |
ART: Implement support for instruction inlining Optimizing HIR contains 'non-materialized' instructions which are emitted at their use sites rather than their defining sites. This was not properly handled by the liveness analysis which did not adjust the use positions of the inputs of such instructions. Despite the analysis being incorrect, the current use cases never produce incorrect code. This patch generalizes the concept of inlined instructions and updates liveness analysis to compute use positions correctly. Change-Id: Id703c154b20ab861241ae5c715a150385d3ff621
|
15bd22849ee6a1ffb3fb3630f686c2870bdf1bbc |
|
05-Jan-2016 |
Nicolas Geoffray <ngeoffray@google.com> |
Implement irreducible loop support in optimizing. So we don't fall back to the interpreter in the presence of irreducible loops. Implications: - A loop pre-header does not necessarily dominate a loop header. - Non-constant redundant phis will be kept in loop headers, to satisfy our linear scan register allocation algorithm. - whole-graph optimizations, such as gvn, licm, lse, and dce need to know when they are dealing with irreducible loops. Change-Id: I2cea8934ce0b40162d215353497c7f77d6c9137e
|
ec7802a102d49ab5c17495118d4fe0bcc7287beb |
|
01-Oct-2015 |
Vladimir Marko <vmarko@google.com> |
Add DCHECKs to ArenaVector and ScopedArenaVector. Implement dchecked_vector<> template that DCHECK()s element access and insert()/emplace()/erase() positions. Change the ArenaVector<> and ScopedArenaVector<> aliases to use the new template instead of std::vector<>. Remove DCHECK()s that have now become unnecessary from the Optimizing compiler. Change-Id: Ib8506bd30d223f68f52bd4476c76d9991acacadc
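The checking approach can be sketched as a thin wrapper over std::vector<>. This is a hypothetical illustration, not ART's dchecked_vector<>: the name `checked_vector` is made up, the standard `assert()` stands in for ART's `DCHECK()` (both compile away in release builds, here via `NDEBUG`), and only the single-element insert()/erase() overloads are shown. Deriving publicly from std::vector<> is a sketch convenience; since std::vector<> has no virtual destructor, such objects should not be deleted through a base pointer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical debug-checked vector: same interface as std::vector<>, but
// element access and single-element insert()/erase() positions are checked
// with assert(), which compiles to nothing when NDEBUG is defined.
template <typename T>
class checked_vector : public std::vector<T> {
  using Base = std::vector<T>;

 public:
  using size_type = typename Base::size_type;
  using iterator = typename Base::iterator;
  using const_iterator = typename Base::const_iterator;
  using std::vector<T>::vector;  // Inherit all std::vector<> constructors.

  T& operator[](size_type n) {
    assert(n < this->size());
    return Base::operator[](n);
  }
  const T& operator[](size_type n) const {
    assert(n < this->size());
    return Base::operator[](n);
  }

  // Inserting at end() is valid, so the upper bound is inclusive.
  iterator insert(const_iterator pos, const T& value) {
    assert(this->cbegin() <= pos && pos <= this->cend());
    return Base::insert(pos, value);
  }

  // Erasing needs a dereferenceable position, so end() is rejected.
  iterator erase(const_iterator pos) {
    assert(this->cbegin() <= pos && pos < this->cend());
    return Base::erase(pos);
  }
};
```

Because the wrapper keeps the std::vector<> interface, switching an alias such as ArenaVector<> over to it needs no changes at call sites, which is what makes the redundant hand-written DCHECK()s removable.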
|
2aaa4b5532d30c4e65d8892b556400bb61f9dc8c |
|
17-Sep-2015 |
Vladimir Marko <vmarko@google.com> |
Optimizing: Tag more arena allocations. Replace GrowableArray with ArenaVector and tag arena allocations with new allocation types. As part of this, make the register allocator a bit more efficient, doing bulk insert/erase. Some loops are now O(n) instead of O(n^2). Change-Id: Ifac0871ffb34b121cc0447801a2d07eefd308c14
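To illustrate where an O(n^2) to O(n) improvement from bulk insert/erase can come from, here is a generic sketch, not the register allocator's actual code; the function names are hypothetical. Inserting k elements into a vector one at a time shifts the tail k times, while a single range `insert()` shifts it once. Likewise the erase-remove idiom removes all matching elements in one pass instead of calling `erase()` once per match.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Inserts `items` before index `pos` one element at a time. Every insert()
// shifts the tail of `v`, so k insertions cost O(k * n) element moves.
std::vector<int> InsertOneByOne(std::vector<int> v, std::size_t pos,
                                const std::vector<int>& items) {
  for (std::size_t i = 0; i < items.size(); ++i) {
    v.insert(v.begin() + static_cast<std::ptrdiff_t>(pos + i), items[i]);
  }
  return v;
}

// Same result with one range insert(): the tail is shifted only once.
std::vector<int> InsertBulk(std::vector<int> v, std::size_t pos,
                            const std::vector<int>& items) {
  v.insert(v.begin() + static_cast<std::ptrdiff_t>(pos), items.begin(), items.end());
  return v;
}

// Removes every element matching `pred` in a single O(n) pass
// (erase-remove idiom) instead of one O(n) erase() per match.
template <typename Pred>
void EraseIfBulk(std::vector<int>& v, Pred pred) {
  v.erase(std::remove_if(v.begin(), v.end(), pred), v.end());
}
```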
|
fa6b93c4b69e6d7ddfa2a4ed0aff01b0608c5a3a |
|
15-Sep-2015 |
Vladimir Marko <vmarko@google.com> |
Optimizing: Tag arena allocations in HGraph. Replace GrowableArray with ArenaVector in HGraph and related classes HEnvironment, HLoopInformation, HInvoke and HPhi, and tag allocations with new arena allocation types. Change-Id: I3d79897af405b9a1a5b98bfc372e70fe0b3bc40d
|
77a48ae01bbc5b05ca009cf09e2fcb53e4c8ff23 |
|
15-Sep-2015 |
David Brazdil <dbrazdil@google.com> |
Revert "Revert "ART: Register allocation and runtime support for try/catch"" The original CL triggered b/24084144 which has been fixed by Ib72e12a018437c404e82f7ad414554c66a4c6f8c. This reverts commit 659562aaf133c41b8d90ec9216c07646f0f14362. Change-Id: Id8980436172457d0fcb276349c4405f7c4110a55
|
659562aaf133c41b8d90ec9216c07646f0f14362 |
|
14-Sep-2015 |
David Brazdil <dbrazdil@google.com> |
Revert "ART: Register allocation and runtime support for try/catch" Breaks libcore test org.apache.harmony.security.tests.java.security.KeyStorePrivateKeyEntryTest#testGetCertificateChain. Need to investigate. This reverts commit b022fa1300e6d78639b3b910af0cf85c43df44bb. Change-Id: Ib24d3a80064d963d273e557a93469c95f37b1f6f
|
b022fa1300e6d78639b3b910af0cf85c43df44bb |
|
20-Aug-2015 |
David Brazdil <dbrazdil@google.com> |
ART: Register allocation and runtime support for try/catch This patch completes a series of CLs that add support for try/catch in the Optimizing compiler. With it, Optimizing can compile all methods containing try/catch, provided they don't contain catch loops. Future work will focus on improving performance of the generated code. SsaLivenessAnalysis was updated to propagate liveness information of instructions live at catch blocks, and to keep location information on instructions which may be caught by catch phis. RegisterAllocator was extended to spill values used after catch, and to allocate spill slots for catch phis. Catch phis generated for the same vreg share a spill slot as the raw value must be the same. Location builders and slow paths were updated to reflect the fact that throwing an exception may not lead to escaping the method. Instruction code generators are forbidden from using implicit null checks in try blocks as live registers need to be saved before handing over to the runtime. CodeGenerator emits a stack map for each catch block, storing locations of catch phis. CodeInfo and StackMapStream recognize this new type of stack map and store them separately from other stack maps to avoid dex_pc conflicts. After having found the target catch block to deliver an exception to, QuickExceptionHandler looks up the dex register maps at the throwing instruction and the catch block and copies the values over to their respective locations. The runtime-support approach was selected because it allows for the best performance in the normal control-flow path, since no propagation of catch phi values is necessary until the exception is thrown. In addition, it also greatly simplifies the register allocation phase. ConstantHoisting was removed from LICMTest because it instantiated (now abstract) HConstant and was bogus anyway (constants are always in the entry block). Change-Id: Ie31038ad8e3ee0c13a5bbbbaf5f0b3e532310e4e
|
6058455d486219994921b63a2d774dc9908415a2 |
|
03-Sep-2015 |
Vladimir Marko <vmarko@google.com> |
Optimizing: Tag basic block allocations with their source. Replace GrowableArray with ArenaVector in HBasicBlock and, to track the source of allocations, assign one new and two of Quick's arena allocation types to these vectors. Rename kArenaAllocSuccessor to kArenaAllocSuccessors. Bug: 23736311 Change-Id: Ib52e51698890675bde61f007fe6039338cf1a025
|
145acc5361deb769eed998f057bc23abaef6e116 |
|
03-Sep-2015 |
Vladimir Marko <vmarko@google.com> |
Revert "Optimizing: Tag basic block allocations with their source." Reverting so that we can have more discussion about the STL API. This reverts commit 91e11c0c840193c6822e66846020b6647de243d5. Change-Id: I187fe52f2c16b6e7c5c9d49c42921eb6c7063dba
|
91e11c0c840193c6822e66846020b6647de243d5 |
|
02-Sep-2015 |
Vladimir Marko <vmarko@google.com> |
Optimizing: Tag basic block allocations with their source. Replace GrowableArray with ArenaVector in HBasicBlock and, to track the source of allocations, assign one new and two of Quick's arena allocation types to these vectors. Rename kArenaAllocSuccessor to kArenaAllocSuccessors. Bug: 23736311 Change-Id: I984aef6e615ae2380a532f5c6726af21015f43f5
|
681652d8e8a33bc07c5c082a71aea13d0f15e0a0 |
|
23-Jul-2015 |
Mingyao Yang <mingyao@google.com> |
HDeoptimize should hold values live in env. Values that are not live in compiled code anymore may still be needed in the interpreter, due to code motion, etc. (cherry-picked from commit 718493c6c3c8e380663cb8a94e57ce160a6c473f) Bug: 22665511 Change-Id: I8b85833c5c462f8fe36f86d6026a51b07563995a
|
718493c6c3c8e380663cb8a94e57ce160a6c473f |
|
23-Jul-2015 |
Mingyao Yang <mingyao@google.com> |
HDeoptimize should hold values live in env. Values that are not live in compiled code anymore may still be needed in the interpreter, due to code motion, etc. Bug: 22665511 Change-Id: I8b85833c5c462f8fe36f86d6026a51b07563995a
|
94015b939060f5041d408d48717f22443e55b6ad |
|
04-Jun-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Revert "Use HCurrentMethod in HInvokeStaticOrDirect."" Fix was to special case baseline for x86, which does not have enough registers to allocate the current method. This reverts commit c345f141f11faad177aa9635a78088d00cf66086. Change-Id: I5997aa52f8d4df373ae5ff4d4150dac0c44c4c10
|
c345f141f11faad177aa9635a78088d00cf66086 |
|
04-Jun-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Use HCurrentMethod in HInvokeStaticOrDirect." Fails on baseline/x86. This reverts commit 38207af82afb6f99c687f64b15601ed20d82220a. Change-Id: Ib71018367eb7c6046965494a7e996c22af3de403
|
38207af82afb6f99c687f64b15601ed20d82220a |
|
01-Jun-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Use HCurrentMethod in HInvokeStaticOrDirect. Change-Id: I0d15244b6b44c8b10079398c55da5071a3e3af66
|
8272688499c2232355db34d94057983fd436173d |
|
01-Jun-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Tweak one hint and one split in the linear scan. - Return a hinted register if it is available. Otherwise another move will be necessary. - Use SplitBetween instead of raw split when a register is not fully available. This will find the best split position. Change-Id: Ie464e536204ab556eb09345fe6426621eb86e5ac
|
0a23d74dc2751440822960eab218be4cb8843647 |
|
07-May-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Add a parent environment to HEnvironment. This code has no functionality change. It adds a placeholder for chaining inlined frames. Change-Id: I5ec57335af76ee406052345b947aad98a6a4423a
|
db216f4d49ea1561a74261c29f1264952232728a |
|
05-May-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Relax the only one back-edge restriction. The rule gets in the way of better register allocation, as it creates an artificial join point between multiple paths. Change-Id: Ia4392890f95bcea56d143138f28ddce6c572ad58
|
fbda5f3e1378f07ae202f62da625ee43a063a052 |
|
29-Apr-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Find better split positions in the register allocator. In a standard if/else control flow graph, this avoids doing a move in one branch if the other branch decided to move an interval. This also needs a new register hint kind: the location of the interval at the predecessor block. Change-Id: I18b78264587b4d693540fbb5e014d12df2add3e2
|
579026039080252878106118645ed70706f4838e |
|
21-Apr-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Add synthesized uses at back edges. This reduces the cost of linearizing the graph (hence removing the notion of back edge). Since linear scan allocates/spills registers based on next use, adding a use at a back edge ensures we account for loop uses. Change-Id: Idaa882cb120edbdd08ca6bff142d326a8245bd14
|
4ed947a58de87d19d0609be773207c905ccb0f7f |
|
27-Apr-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Dissociate uses from environment uses. Most of the time they get in the way when iterating. They also complicate the logic of (future) back edge uses. Change-Id: I152595d9913073fe901b267ca623fa0fe7432484
|
241a486267bdb59b32fe4c8db370eb936068fb39 |
|
16-Apr-2015 |
David Brazdil <dbrazdil@google.com> |
ART: Replace expensive calls to Covers in reg alloc LiveInterval::Covers is implemented as a linear-time search over liveness ranges and can therefore be rather expensive and should be avoided unless necessary. This patch replaces calls to Covers when searching for a sibling with the cheaper IsDefinedAt call. Change-Id: I93fc73529c15a518335f4cbdc3a0def52d9501e5
|
0d9f17de8f21a10702de1510b73e89d07b3b9bbf |
|
15-Apr-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Move the linear order to the HGraph. Bug found by Zheng Xu: since SsaLivenessAnalysis is a stack-allocated object, we should not refer to it in later phases of the compiler. Specifically, the code generator was using the linear order, which was stored in the liveness analysis object. Change-Id: I574641f522b7b86fc43f3914166108efc72edb3b
|
d8126bef62df7f40f2e6abc74004f52e664daf45 |
|
27-Mar-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix locations at environment uses. We were too aggressive in not recording environment uses when the instruction was not of type object. We have to record the use in the use list of an interval, but it should not affect the live ranges of that interval. Change-Id: Id16fb7cc06f14083766d408a345837793583b6ea
|
f01d34445953e6b9c9b13de1dd32a5c0ee5abab5 |
|
27-Mar-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Implement a proper solution for temps. We used to play some trickery when updating locations of temps. This change creates a proper use of the temp, and uses it to update its location. Change-Id: I53e9447b87a55137a3a79841db21ad3864854825
|
46e2a3915aa68c77426b71e95b9f3658250646b7 |
|
16-Mar-2015 |
David Brazdil <dbrazdil@google.com> |
ART: Boolean simplifier The optimization recognizes the negation pattern generated by 'javac' and replaces it with a single condition. To this end, boolean values are now consistently assumed to be represented by an integer. This is the first optimization that deletes blocks from the HGraph and does so by replacing the corresponding entries with null. Hence, existing code can continue indexing the list of blocks with the block ID, but must check for null when iterating over the list. Change-Id: I7779da69cfa925c6521938ad0bcc11bc52335583
|
915b9d0c13bb5091875d868fbfa551d7b65d7477 |
|
11-Mar-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Tweak liveness when instructions are used in environments. Instructions remain live when debuggable, but only instructions with object types remain live when non-debuggable. Enable StackVisitor::GetThisObject for optimizing. Change-Id: Id87b2cbf33a02450059acc9993995782e5f28987
|
5b8e6a594b827f7dc88b2e3d895e08f5b3f22446 |
|
25-Feb-2015 |
David Brazdil <dbrazdil@google.com> |
ART: Cache last returned range in LiveInterval::Covers Optimizing spends ~10% of compilation time in the register allocator. One of the frequently called methods is LiveInterval::Covers which has linear complexity w.r.t. the number of gaps in liveness intervals. This patch leverages the fact that the register allocator calls Covers with non-decreasing position values and caches the last returned result to start the iteration closer to the result the next time the method is invoked. Stats from compiling the framework show that this optimization reduces the average number of iterations needed to find the result by 40%. Change-Id: I4dd26b900879d5e1d03818ebc1e117cc6a53053c
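The caching idea can be sketched as follows. This is not ART's LiveInterval code: `Interval`, `LiveRange`, `ResetSearchCache()` and `cached_index()` are hypothetical names, and the sorted, non-overlapping half-open `[start, end)` ranges stand in for liveness ranges, with the gaps between them acting as lifetime holes. The cache is only valid under the assumption stated in the message, that callers pass non-decreasing positions; a caller that needs to go backwards must reset it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open live range [start, end).
struct LiveRange {
  std::size_t start;
  std::size_t end;
};

// Hypothetical interval with sorted, non-overlapping ranges; gaps between
// consecutive ranges are lifetime holes.
class Interval {
 public:
  explicit Interval(std::vector<LiveRange> ranges) : ranges_(std::move(ranges)) {}

  // Returns true if `position` lies inside one of the ranges. Assumes
  // callers query non-decreasing positions, so the scan resumes from the
  // range cached by the previous call instead of from the first range.
  bool Covers(std::size_t position) {
    for (std::size_t i = cached_index_; i < ranges_.size(); ++i) {
      if (position < ranges_[i].start) {
        cached_index_ = i;
        return false;  // In a lifetime hole, or before the first range.
      }
      if (position < ranges_[i].end) {
        cached_index_ = i;
        return true;
      }
      // position >= end: this range (and all earlier ones) ends before it.
    }
    cached_index_ = ranges_.size();
    return false;  // Past the last range.
  }

  // Restart the search, e.g. before querying an earlier position.
  void ResetSearchCache() { cached_index_ = 0; }
  std::size_t cached_index() const { return cached_index_; }

 private:
  std::vector<LiveRange> ranges_;
  std::size_t cached_index_ = 0;
};
```

Each range is skipped at most once across a sequence of non-decreasing queries, so the total scanning work over such a sequence is bounded by the number of ranges plus the number of queries.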
|
da02afe615191a19eae9a039786c4c4fc20dbfff |
|
11-Feb-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Support hints for register pairs. Change-Id: Ia49dc5bf3e9a2bd481425bfe7fbeea9feb66c8e6
|
c0572a451944f78397619dec34a38c36c11e9d2a |
|
06-Feb-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Optimize leaf methods. Avoid suspend checks and stack changes when not needed. Change-Id: I0fdb31e8c631e99091b818874a558c9aa04b1628
|
ed59619b370ef23ffbb25d1d01f615e60a9262b6 |
|
23-Jan-2015 |
David Brazdil <dbrazdil@google.com> |
Optimizing: Speed up HEnvironment use removal Removal of use records from HEnvironment vregs involved iterating over potentially large linked lists which made compilation of huge methods very slow. This patch turns use lists into doubly-linked lists, stores pointers to the relevant nodes inside HEnvironment and subsequently turns the removals into constant-time operations. Change-Id: I0e1d4d782fd624e7b8075af75d4adf0a0634a1ee
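A minimal sketch of the constant-time removal, assuming std::list<> as the doubly-linked use list. `Instruction`, `EnvironmentSlot` and `Environment` are simplified, hypothetical stand-ins rather than ART's HInstruction/HEnvironment, and each use record here is just the slot index. The point being illustrated is that each environment slot keeps the list iterator returned when its use was recorded, so removing the use later is a direct unlink rather than a search through a potentially long list.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical instruction: keeps a doubly-linked list of its environment
// uses. Each entry is simply the index of the using slot (simplified).
struct Instruction {
  std::list<std::size_t> env_uses;
};

// Hypothetical environment slot: the value plus a pointer (iterator) to
// the node recording this use inside value->env_uses.
struct EnvironmentSlot {
  Instruction* value = nullptr;
  std::list<std::size_t>::iterator use_node;
};

class Environment {
 public:
  explicit Environment(std::size_t size) : slots_(size) {}

  // Record `instruction` in slot `i` and remember its use-list node.
  void Set(std::size_t i, Instruction* instruction) {
    slots_[i].value = instruction;
    slots_[i].use_node =
        instruction->env_uses.insert(instruction->env_uses.end(), i);
  }

  // O(1) removal: std::list<>::erase() on the stored iterator needs no
  // traversal of the use list.
  void Clear(std::size_t i) {
    EnvironmentSlot& slot = slots_[i];
    assert(slot.value != nullptr);
    slot.value->env_uses.erase(slot.use_node);
    slot.value = nullptr;
  }

 private:
  std::vector<EnvironmentSlot> slots_;
};
```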
|
840e5461a85f8908f51e7f6cd562a9129ff0e7ce |
|
07-Jan-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Implement double and float support for arm in register allocator. The basic approach is: - An instruction that needs two registers gets two intervals. - When allocating the low part, we also allocate the high part. - When splitting a low (or high) interval, we also split the high (or low) equivalent. - Allocation follows the (S/D register) requirement that low registers are always even and the high equivalent is low + 1. Change-Id: I06a5148e05a2ffc7e7555d08e871ed007b4c2797
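The pairing requirement in the last point can be sketched as a search over single-precision registers, assuming ARM VFP's 32 S registers, where S(2n) and S(2n+1) together alias D(n). `FindFreeDoublePair` and the free-register bitmap are hypothetical and cover only the choice of a register pair for a double, not the allocation of the two intervals or the paired splitting described above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// ARM VFP has single-precision registers S0-S31; S(2n)/S(2n+1) alias D(n).
constexpr std::size_t kNumSRegisters = 32;

// Hypothetical helper: find a free S-register pair for a double. The low
// half must be an even register and the high half must be low + 1, so the
// pair forms D(low / 2). Returns the low register, or nullopt if none fits.
std::optional<std::size_t> FindFreeDoublePair(
    const std::array<bool, kNumSRegisters>& is_free) {
  for (std::size_t low = 0; low + 1 < kNumSRegisters; low += 2) {
    if (is_free[low] && is_free[low + 1]) {
      return low;
    }
  }
  return std::nullopt;
}
```

Stepping `low` by two skips odd-aligned candidates such as S1/S2, which are adjacent but do not form a D register.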
|
a8eed3acbc39c71ec22dc2943e71eaa07c6507dd |
|
24-Nov-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Revert "Fix the computation of linear ordering."" PS2 fixes the obvious typos/wrong refactoring. This reverts commit e50fa5887b1342b845826197d81950e26753fc9c. Change-Id: I22f81d63a12cf01aafd61535abc2399d936d49c2
|
e50fa5887b1342b845826197d81950e26753fc9c |
|
24-Nov-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Fix the computation of linear ordering." Build is broken. This reverts commit 3054a90063d379ab8c9e5a42a7daf0d644b48b07. Change-Id: I259bc2bd6a58e30391b8176f3db5fdb5c07e4d6d
|
3054a90063d379ab8c9e5a42a7daf0d644b48b07 |
|
21-Nov-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix the computation of linear ordering. The register allocator makes assumptions on the order, and we ended up not computing the right one. The algorithm worked fine when the loop header is the block branching to the exit, but in the presence of breaks or do/while, it was incorrect. Change-Id: Iad0a89872cd3f7b7a8b2bdf560f0d03493f93ba5
|
277ccbd200ea43590dfc06a93ae184a765327ad0 |
|
04-Nov-2014 |
Andreas Gampe <agampe@google.com> |
ART: More warnings Enable -Wno-conversion-null, -Wredundant-decls and -Wshadow in general, and -Wunused-but-set-parameter for GCC builds. Change-Id: I81bbdd762213444673c65d85edae594a523836e5
|
296bd60423e0630d8152b99fb7afb20fbff5a18a |
|
07-Oct-2014 |
Mingyao Yang <mingyao@google.com> |
Some improvement to reg alloc. Change-Id: If579a37791278500a7e5bc763f144c241f261920
|
102cbed1e52b7c5f09458b44903fe97bb3e14d5f |
|
15-Oct-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Implement register allocator for floating point registers. Also: - Fix misuses of emitting the rex prefix in the x86_64 assembler. - Fix movaps code generation in the x86_64 assembler. Change-Id: Ib6dcf6e7c4a9c43368cfc46b02ba50f69ae69cbe
|
56b9ee6fe1d6880c5fca0e7feb28b25a1ded2e2f |
|
09-Oct-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Stop converting from Location to ManagedRegister. Now the source of truth is the Location object that knows which register (core, pair, fpu) it needs to refer to. Change-Id: I62401343d7479ecfb24b5ed161ec7829cda5a0b1
|
01ef345767ea609417fc511e42007705c9667546 |
|
01-Oct-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Add trivial register hints to the register allocator. - Add hints for phis, same as first input, and expected registers. - Make the if instruction accept non-condition instructions. Change-Id: I34fa68393f0d0c19c68128f017b7a05be556fbe5
|
8ddb00ca935733f5d3b07816e5bb33d6cabe6ec4 |
|
29-Sep-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Improve detection of lifetime holes. The check concluding that the next use was in a successor was too conservative: two blocks that follow each other in liveness order are not necessarily predecessor/successor. Change-Id: Ideec98046c812aa5fb63781141b5fde24c706d6d
|
8a16d97fb8f031822b206e65f9109a071da40563 |
|
11-Sep-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix valgrind errors. For now just stack allocate the code generator. Will think about cleaning up the root problem later (CodeGenerator being an arena object). Change-Id: I161a6f61c5f27ea88851b446f3c1e12ee9c594d7
|
e77493c7217efdd1a0ecef521a6845a13da0305b |
|
21-Aug-2014 |
Ian Rogers <irogers@google.com> |
Make common BitVector operations inline-able. Change-Id: Ie25de4fae56c6712539f04172c42e3eff57df7ca
|
e50383288a75244255d3ecedcc79ffe9caf774cb |
|
04-Jul-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Support fields in the optimizing compiler. - Required support for temporaries, to be used only by the baseline compiler. - Also fixed a few invalid assumptions around locations and instructions that don't need materialization. These instructions should not have an Out. Change-Id: Idc4a30dd95dd18015137300d36bec55fc024cf62
|
31d76b42ef5165351499da3f8ee0ac147428c5ed |
|
09-Jun-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Plug code generator into liveness analysis. Also implement spill slot support. Change-Id: If5e28811e9fbbf3842a258772c633318a2f4fafc
|
ec7e4727e99aa1416398ac5a684f5024817a25c7 |
|
06-Jun-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Fix some bugs in graph construction/simplification methods. Also fix a mistake in SSA construction: the code should not have been commented out. Added a test to cover what the code intends. Change-Id: Ia00ae79dcf75eb0d412f07649d73e7f94dbfb6f0
|
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 |
|
03-Jun-2014 |
Tim Murray <timmurray@google.com> |
DO NOT MERGE Merge ART from AOSP to lmp-preview-dev. Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
|
a7062e05e6048c7f817d784a5b94e3122e25b1ec |
|
22-May-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Add a linear scan register allocator to the optimizing compiler. This is a "by-the-book" implementation. It currently only allocates registers, with no hint optimizations. The changes remaining to make it functional are: - Allocate spill slots. - Resolution and placement of Move instructions. - Connect it to the code generator. Change-Id: Ie0b2f6ba1b98da85425be721ce4afecd6b4012a4
|
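For readers unfamiliar with the "by-the-book" algorithm, here is a minimal Python sketch of the classic Poletto and Sarkar linear scan. It assumes each value is a single interval (name, start, end) with no lifetime holes, splitting, hints, or register classes; it illustrates the textbook technique, not ART's allocator, and the function name `linear_scan` is hypothetical.

```python
def linear_scan(intervals, num_regs):
    """Textbook linear scan (hypothetical helper, not ART code).

    intervals: list of (name, start, end) with start <= end.
    Returns (assignment: name -> register index, set of spilled names).
    """
    free = list(range(num_regs - 1, -1, -1))  # pop() hands out register 0 first
    active = []  # (end, name) pairs, kept sorted by end position
    assignment, spilled = {}, set()
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Expire intervals that ended before this one starts; free their registers.
        while active and active[0][0] < start:
            _, old = active.pop(0)
            free.append(assignment[old])
        if free:
            assignment[name] = free.pop()
            active.append((end, name))
            active.sort()
        else:
            # No free register: spill whichever interval ends last.
            last_end, last_name = active[-1]
            if last_end > end:
                # Steal the register from the longer-lived interval.
                assignment[name] = assignment.pop(last_name)
                spilled.add(last_name)
                active[-1] = (end, name)
                active.sort()
            else:
                spilled.add(name)
    return assignment, spilled


# Example: four intervals competing for two registers.
assignment, spilled = linear_scan(
    [("a", 0, 10), ("b", 1, 3), ("c", 2, 8), ("d", 4, 6)], num_regs=2)
```

Spilling the interval that ends last frees a register for the longest remaining stretch; here `a` is spilled, and `b`, `c`, `d` stay in registers.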
a5b8fde2d2bc3167078694fad417fddfe442a6fd |
|
23-May-2014 |
Vladimir Marko <vmarko@google.com> |
Rewrite BitVector index iterator. The BitVector::Iterator was not iterating over the bits but rather over the indexes of the set bits. Therefore, we rename it to IndexIterator and provide BitVector::Indexes() to get a container-style interface with begin() and end() for range-based for loops. Also, simplify InsertPhiNodes: tmp_blocks isn't needed, since phi_nodes and input_blocks cannot lose any blocks in subsequent iterations, so we can do the Union() directly in those bit vectors, and we need to repeat the loop only if we have new input_blocks, rather than on a phi_nodes change. And move the temporary bit vectors to the scoped arena. Change-Id: I6cb87a2f60724eeef67c6aaa34b36ed5acde6d43
|
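The distinction between iterating over bits and iterating over the indexes of set bits can be sketched with a toy Python bit vector. The class and method names are hypothetical, not ART's BitVector API. `indexes()` yields only set-bit positions, which is what a range-based for loop over BitVector::Indexes() visits, and `union()` reports whether any bit changed, the kind of signal a worklist loop can use to decide whether to iterate again.

```python
class BitVector:
    """Toy bit vector backed by 32-bit words (hypothetical, not ART's class)."""
    WORD_BITS = 32

    def __init__(self, num_bits):
        self.words = [0] * ((num_bits + self.WORD_BITS - 1) // self.WORD_BITS)

    def set_bit(self, idx):
        self.words[idx // self.WORD_BITS] |= 1 << (idx % self.WORD_BITS)

    def union(self, other):
        """In-place union with an equally sized vector; True if any bit changed."""
        changed = False
        for i, w in enumerate(other.words):
            new = self.words[i] | w
            changed |= new != self.words[i]
            self.words[i] = new
        return changed

    def indexes(self):
        """Yield the indexes of set bits in increasing order, skipping clear bits."""
        for word_idx, word in enumerate(self.words):
            while word:
                low = word & -word  # isolate the lowest set bit
                yield word_idx * self.WORD_BITS + low.bit_length() - 1
                word ^= low  # clear it and continue with the rest of the word
```

Usage: after `bv.set_bit(5)`, `for i in bv.indexes(): ...` visits `5` without touching the clear bits around it.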
ddb311fdeca82ca628fed694c4702f463b5c4927 |
|
16-May-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Build live ranges in preparation for register allocation. Change-Id: I7ae24afaa4e49276136bf34f4ba7d62db7f28c01
|
0d3f578909d0d1ea072ca68d78301b6fb7a44451 |
|
14-May-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Linearize the graph before creating live ranges. Change-Id: I02eb5671e3304ab062286131745c1366448aff58
|
f635e63318447ca04731b265a86a573c9ed1737c |
|
14-May-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Add a compilation tracing mechanism to the new compiler. Code mostly imported from: https://android-review.googlesource.com/#/c/81653/. Change-Id: I150fe942be0fb270e03fabb19032180f7a065d13
|
622d9c31febd950255b36a48b47e1f630197c5fe |
|
12-May-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Add loop recognition and CFG simplifications in the new compiler. We do three simplifications: - Split critical edges, for code generation from SSA (new). - Ensure one back edge per loop, to simplify loop recognition (new). - Ensure only one pre-header for a loop, to simplify SSA creation (existing). Change-Id: I9bfccd4b236a00486a261078627b091c8a68be33
|
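Splitting a critical edge (an edge from a block with several successors to a block with several predecessors) inserts an empty block on that edge, giving code generation from SSA a place to put moves that must execute only along that edge. A minimal Python sketch on a CFG given as a successor map; the function name and the generated `splitN` block names are hypothetical.

```python
def split_critical_edges(succs):
    """Return a copy of the CFG with every critical edge split (hypothetical helper).

    succs: block name -> list of successor block names.
    """
    # Count predecessors of every block.
    preds = {}
    for b, ss in succs.items():
        for s in ss:
            preds.setdefault(s, []).append(b)
    result = {b: list(ss) for b, ss in succs.items()}  # copy, leave input untouched
    counter = 0
    for b, ss in succs.items():
        if len(ss) < 2:
            continue  # single-successor blocks cannot start a critical edge
        for i, s in enumerate(ss):
            if len(preds.get(s, [])) > 1:
                # Redirect b -> s through a fresh empty block: b -> split -> s.
                split = f"split{counter}"
                counter += 1
                result[b][i] = split
                result[split] = [s]
    return result


# Example: B1 branches to B2 and B3, and B2 falls through to B3,
# so the edge B1 -> B3 is critical.
cfg = {"B1": ["B2", "B3"], "B2": ["B3"], "B3": []}
new_cfg = split_critical_edges(cfg)
```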
804d09372cc3d80d537da1489da4a45e0e19aa5d |
|
02-May-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Build live-in, live-out and kill sets for each block. This information will be used when computing live ranges of instructions. Change-Id: I345ee833c1ccb4a8e725c7976453f6d58d350d74
|
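These sets are typically computed as a backward dataflow problem: per block, kill holds the values defined there and a use set holds values read before being defined in that block; live-out is the union of the successors' live-in, and live-in = uses ∪ (live-out − kill), iterated to a fixed point. A minimal Python sketch on a toy CFG, assuming a non-SSA form (phis are not modeled) and hypothetical names; it is not ART's liveness analysis.

```python
def compute_liveness(blocks, succs):
    """Backward liveness to a fixed point (hypothetical helper).

    blocks: name -> list of (defined_var or None, [used_vars]) in program order.
    succs:  name -> list of successor block names.
    Returns (live_in, live_out, kill), each mapping block name -> set.
    """
    gen, kill = {}, {}
    for b, instrs in blocks.items():
        g, k = set(), set()
        for defined, uses in instrs:
            # A use is upward-exposed if the value was not defined earlier in the block.
            g |= {u for u in uses if u not in k}
            if defined is not None:
                k.add(defined)
        gen[b], kill[b] = g, k
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        # Reverse order usually converges faster; any order reaches the same result.
        for b in reversed(list(blocks)):
            out = set().union(*(live_in[s] for s in succs.get(b, [])))
            new_in = gen[b] | (out - kill[b])
            if out != live_out[b] or new_in != live_in[b]:
                live_out[b], live_in[b] = out, new_in
                changed = True
    return live_in, live_out, kill


# Example: entry defines a and b, a loop reads them, the exit reads a.
blocks = {
    "entry": [("a", []), ("b", [])],
    "loop": [("c", ["a", "b"]), (None, ["c"])],
    "exit": [(None, ["a"])],
}
succs = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
live_in, live_out, kill = compute_liveness(blocks, succs)
```

Here `a` and `b` stay live around the loop because the back edge feeds the loop's own live-in into its live-out.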