a26cb57f46fd3f27a930d9d688fe8670c1f24754 |
|
23-Apr-2015 |
David Srbecky <dsrbecky@google.com> |
ART stack unwinding fixes for libunwind/gdb/lldb. dex2oat can already generate unwinding and symbol information which allows tools to create backtrace of mixed native and Java code. This is a cherry pick from aosp/master which fixes several issues. Most notably: * It enables generation of ELF-64 on 64-bit systems (in dex2oat, C compilers already produce ELF-64). Libunwind requires ELF-64 on 64-bit systems for backtraces to work. * It enables loading of ELF files with dlopen. This is required for libunwind to be able to generate backtrace of current process (i.e. the process requesting backtrace of itself). * It adds unit test to test the above (32 vs 64 bit, in-proces vs out-of-process, application code vs framework code). * Some other fixes or clean-ups which should not be of much significance but which are easier to include to make the important CLs cherry-pick cleanly. This is squash of the following commits from aosp/master: 7381010 ART: CFI Test e1bbed2 ART: Blacklist CFI test for non-compiled run-tests aab9f73 ART: Blacklist CFI test for JIT 4437219 ART: Blacklist CFI test for Heap Poisoning a3a49fe Switch to using ELF-64 for 64-bit architectures. 297ed22 Write 64-bit address in DWARF if we are on 64-bit architecture. 24981a1 Set correct size of PT_PHDR ELF segment. 1a146bf Link .dynamic to .dynstr 67a0653 Make some parts of ELF more (pointer) aligned. f50fa82 Enable 64-bit CFI tests. 49e1fab Use dlopen to load oat files. 5dedb80 Add more logging output for dlopen. aa03870 Find the dlopened file using address rather than file path. 82e73dc Release dummy MemMaps corresponding to dlopen. 5c40961 Test that we can unwind framework code. 020c543 Add more log output to the CFI test. 88da3b0 ART: Fix CFI test wrt/ PIC a70e5b9 CFI test: kill the other process in native code. ad5fa8c Support generation of CFI in .debug_frame format. 90688ae Fix build - large frame size of ElfWriterQuick<ElfTypes>::Write. 97dabb7 Fix build breakage in dwarf_test. 388d286 Generate just single ARM mapping symbol. f898087 Split .oat_patches to multiple sections. 491a7fe Fix build - large frame size of ElfWriterQuick<ElfTypes>::Write (again). 8363c77 Add --generate-debug-info flag and remove the other two flags. 461d72a Generate debug info for core.oat files. Bug: 21924613 Change-Id: I3f944a08dd2ed1df4d8a807da4fee423fdd35eb7
|
b7fd412dd21eb362931b3a0716c94fd189a66295 |
|
04-Jun-2015 |
Vladimir Marko <vmarko@google.com> |
Revert "Quick: Create GC map based on compiler data. DO NOT MERGE" This reverts commit 7cc8f9aa1349fd6cb0814a653ee2d1164a7fb9f7. Change-Id: Iadb4462bf8e834c6a847c01ee6eb332a325de22c
|
c8d000a12d853a72999c96e3b73587bad2be6954 |
|
04-Jun-2015 |
Vladimir Marko <vmarko@google.com> |
Revert "Quick: Fix "select" pattern to update data used for GC maps. DO NOT MERGE" This reverts commit fad2cbf97c71b9742ccd88cc1a5ba13fa918e677. Change-Id: I175dd9e49014b71a300d987678032bd624a99cf1
|
fad2cbf97c71b9742ccd88cc1a5ba13fa918e677 |
|
25-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Fix "select" pattern to update data used for GC maps. DO NOT MERGE Follow-up to https://android-review.googlesource.com/143222 (cherry picked from commit 6e07183e822a32856da9eb60006989496e06a9cc) Change-Id: I916743c845d9568063cd6a4b2ef71e9cbc43dee8
|
7cc8f9aa1349fd6cb0814a653ee2d1164a7fb9f7 |
|
20-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Create GC map based on compiler data. DO NOT MERGE The Quick compiler and verifier sometimes disagree on dalvik register types (fp/core/ref) for 0/null constants and merged registers involving 0/null constants. Since the verifier is more lenient it can mark a register as a reference for GC where Quick considers it a floating point register or a dead register (which would have a ref/fp conflict if not dead). If the compiler used an fp register to hold the zero value, the core register or stack location used by GC based on the verifier data can hold an invalid value. Previously, as a workaround we stored the fp zero value also in the stack location or core register where GC would look for it. This wasn't precise and may have missed some cases. To fix this properly, we now generate GC maps based on the compiler's notion of references if register promotion is enabled. Bug: https://code.google.com/p/android/issues/detail?id=147187 (cherry picked from commit 767c752fddc64e280dba507457e4f06002b5f678) Change-Id: Id75428fd0a2f6bdd2ccb20ce75cdeab01150e455
|
3d21bdf8894e780d349c481e5c9e29fe1556051c |
|
22-Apr-2015 |
Mathieu Chartier <mathieuc@google.com> |
Move mirror::ArtMethod to native Optimizing + quick tests are passing, devices boot. TODO: Test and fix bugs in mips64. Saves 16 bytes per most ArtMethod, 7.5MB reduction in system PSS. Some of the savings are from removal of virtual methods and direct methods object arrays. Bug: 19264997 (cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33) Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d Fix some ArtMethod related bugs Added root visiting for runtime methods, not currently required since the GcRoots in these methods are null. Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes --trace run-tests 005, 044. Fixed optimizing compiler bug where we used a normal stack location instead of double on ARM64, this fixes the debuggable tests. TODO: Fix JDWP tests. Bug: 19264997 Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3 ART: Fix casts for 64-bit pointers on 32-bit compiler. Bug: 19264997 Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457 Fix JDWP tests after ArtMethod change Fixes Throwable::GetStackDepth for exception event detection after internal stack trace representation change. Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of proxy method. Bug: 19264997 Change-Id: I363e293796848c3ec491c963813f62d868da44d2 Fix accidental IMT and root marking regression Was always using the conflict trampoline. Also included fix for regression in GC time caused by extra roots. Most of the regression was IMT. Fixed bug in DumpGcPerformanceInfo where we would get SIGABRT due to detached thread. EvaluateAndApplyChanges: From ~2500 -> ~1980 GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots Bug: 19264997 Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0 Fix bogus image test assert Previously we were comparing the size of the non moving space to size of the image file. Now we properly compare the size of the image space against the size of the image file. Bug: 19264997 Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a [MIPS64] Fix art_quick_invoke_stub argument offsets. ArtMethod reference's size got bigger, so we need to move other args and leave enough space for ArtMethod* and 'this' pointer. This fixes mips64 boot. Bug: 19264997 Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
|
c2d3221037734567442c5fb9b11a7967b5c87b79 |
|
07-May-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Abolish kMirOpCheckPart2. The tricks played with kMirOpCheckPart2 are making the native GC map generation unnecessarily complex. They have caused problems in the past and now there is bad interaction with the DCE. Rather than fixing it time and again, remove the pseudo-insn. (The whole purpose of those tricks seems to be to allow the register tracking to be used for the throwing insn before resetting the tracking for the next block. However, it's questionable whether that's better than processing the throwing insn with the subsequent instructions.) Bug: 20736048 (cherry picked from commit e299f167c9559401548eab71678d4b779e46c2fb) Change-Id: I8a60d26c5e6b6b608d68b8bb6b66d411f9a28f90
|
f80552b7e5f627a5dd07af017b7d65dec010ca48 |
|
07-May-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Abolish kMirOpCheckPart2. The tricks played with kMirOpCheckPart2 are making the native GC map generation unnecessarily complex. They have caused problems in the past and now there is bad interaction with the DCE. Rather than fixing it time and again, remove the pseudo-insn. (The whole purpose of those tricks seems to be to allow the register tracking to be used for the throwing insn before resetting the tracking for the next block. However, it's questionable whether that's better than processing the throwing insn with the subsequent instructions.) Bug: 20736048 (cherry picked from commit e299f167c9559401548eab71678d4b779e46c2fb) Change-Id: I8a60d26c5e6b6b608d68b8bb6b66d411f9a28f90
|
0b49f0234e5c01de23501b6a70f09f491fe7b3e0 |
|
07-May-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Abolish kMirOpCheckPart2. The tricks played with kMirOpCheckPart2 are making the native GC map generation unnecessarily complex. They have caused problems in the past and now there is bad interaction with the DCE. Rather than fixing it time and again, remove the pseudo-insn. (The whole purpose of those tricks seems to be to allow the register tracking to be used for the throwing insn before resetting the tracking for the next block. However, it's questionable whether that's better than processing the throwing insn with the subsequent instructions.) Bug: 20736048 (cherry picked from commit e299f167c9559401548eab71678d4b779e46c2fb) Change-Id: Ifae6c5bd961a2619b50fd3440261762cb9151460
|
e299f167c9559401548eab71678d4b779e46c2fb |
|
07-May-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Abolish kMirOpCheckPart2. The tricks played with kMirOpCheckPart2 are making the native GC map generation unnecessarily complex. They have caused problems in the past and now there is bad interaction with the DCE. Rather than fixing it time and again, remove the pseudo-insn. (The whole purpose of those tricks seems to be to allow the register tracking to be used for the throwing insn before resetting the tracking for the next block. However, it's questionable whether that's better than processing the throwing insn with the subsequent instructions.) Bug: 20736048 Change-Id: I4767e4609914d3b6990da4416e5093e4ca209780
|
2cebb24bfc3247d3e9be138a3350106737455918 |
|
22-Apr-2015 |
Mathieu Chartier <mathieuc@google.com> |
Replace NULL with nullptr Also fixed some lines that were too long, and a few other minor details. Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb
|
8dc7324da5bd0f2afd2ab558ab04882329a61fe8 |
|
12-Apr-2015 |
David Srbecky <dsrbecky@google.com> |
Add --include-cfi compiler option. Decouple generation of CFI from the rest of debug symbols. This makes it possible to generate oat with CFI but without the rest of debug symbols. This is in line with intention of the .eh_frame section. The section does not have the .debug_ prefix because it is considered somewhat different to the rest of debug symbols. Change-Id: I32816ecd4f30ac4e0dc69d69a4993e349c737f96
|
c6b4dd8980350aaf250f0185f73e9c42ec17cd57 |
|
07-Apr-2015 |
David Srbecky <dsrbecky@google.com> |
Implement CFI for Optimizing. CFI is necessary for stack unwinding in gdb, lldb, and libunwind. Change-Id: I1a3480e3a4a99f48bf7e6e63c4e83a80cfee40a2
|
1961b609bfefaedb71cee3651c4f931cc3e7393d |
|
08-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: PC-relative loads from dex cache arrays on x86. Rewrite all PC-relative addressing on x86 and implement PC-relative loads from dex cache arrays. Don't adjust the base to point to the start of the method, let it point to the anchor, i.e. the target of the "call +0" insn. Change-Id: Ic22544a8bc0c5e49eb00a75154dc8f3ead816989
|
1109fb3cacc8bb667979780c2b4b12ce5bb64549 |
|
07-Apr-2015 |
David Srbecky <dsrbecky@google.com> |
Implement CFI for Quick. CFI is necessary for stack unwinding in gdb, lldb, and libunwind. Change-Id: Ic3b84c9dc91c4bae80e27cda02190f3274e95ae8
|
8c57831b2b07185ee1986b9af68a351e1ca584c3 |
|
07-Apr-2015 |
David Srbecky <dsrbecky@google.com> |
Remove the old CFI infrastructure. Change-Id: I12a17a8a1c39ffccaa499c328ebac36e4d74dc4e
|
cc23481b66fd1f2b459d82da4852073e32f033aa |
|
07-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
Promote pointer to dex cache arrays on arm. Do the use-count analysis on temps (ArtMethod* and the new PC-relative temp) in Mir2Lir, rather than MIRGraph. MIRGraph isn't really supposed to know how the ArtMethod* is used by the backend. Change-Id: Iaf56a46ae203eca86281b02b54f39a80fe5cc2dd
|
b207e1473dda1730604a28db2b4fa52f2998aeae |
|
02-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
Pass linker patches around as const. Change-Id: I0eabd713d29475db9eb6e186f331dbfb00e0cf6b
|
6f7158927fee233255f8e96719c374694b10cad3 |
|
30-Mar-2015 |
David Srbecky <dsrbecky@google.com> |
Write .debug_line section using the new DWARF library. Also simplify dex to java mapping and handle mapping in prologues and epilogues. Change-Id: I410f06024580f2a8788f2c93fe9bca132805029a
|
20f85597828194c12be10d3a927999def066555e |
|
19-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Fixed layout for dex caches in boot image. Define a fixed layout for dex cache arrays (type, method, string and field arrays) for dex caches in the boot image. This gives those arrays fixed offsets from the boot image code and allows PC-relative addressing of their elements. Use the PC-relative load on arm64 for relevant instructions, i.e. invoke-static, invoke-direct, const-string, const-class, check-cast and instance-of. This reduces the arm64 boot.oat on Nexus 9 by 1.1MiB. This CL provides the infrastructure and shows on the arm64 the gains that we can achieve by having fixed dex cache arrays' layout. To fully use this for the boot images, we need to implement the PC-relative addressing for other architectures. To achieve similar gains for apps, we need to move the dex cache arrays to a .bss section of the oat file. These changes will be implemented in subsequent CLs. (Also remove some compiler_driver.h dependencies to reduce incremental build times.) Change-Id: Ib1859fa4452d01d983fd92ae22b611f45a85d69b
|
356a1811f2f79d98194475fdbfb5f6b7768455b5 |
|
27-Mar-2015 |
Pavel Vyssotski <pavel.n.vyssotski@intel.com> |
Quick: Finding upper half of kMirOpCheckPart2 should passthough empty blocks Mir2Lir::InitReferenceVRegs trying to find throwing instruction for kMirOpCheckPart2 should traverse possible empty blocks which compiler optimizations could generate between them. Change-Id: I2ab29dd36635fd4c4ef2dd81b51e571e206775e6 Signed-off-by: Pavel Vyssotski <pavel.n.vyssotski@intel.com>
|
6e07183e822a32856da9eb60006989496e06a9cc |
|
25-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Fix "select" pattern to update data used for GC maps. Follow-up to https://android-review.googlesource.com/143222 Change-Id: I1c12af9a19f76e64fd209f6cc2eaec5587b3083b
|
f6737f7ed741b15cfd60c2530dab69f897540735 |
|
23-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Clean up Mir2Lir codegen. Clean up WrapPointer()/UnwrapPointer() and OpPcRelLoad(). Change-Id: I1a91f01e1e779599c77f3f6efcac2a6ad34629cf
|
767c752fddc64e280dba507457e4f06002b5f678 |
|
20-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Create GC map based on compiler data. The Quick compiler and verifier sometimes disagree on dalvik register types (fp/core/ref) for 0/null constants and merged registers involving 0/null constants. Since the verifier is more lenient it can mark a register as a reference for GC where Quick considers it a floating point register or a dead register (which would have a ref/fp conflict if not dead). If the compiler used an fp register to hold the zero value, the core register or stack location used by GC based on the verifier data can hold an invalid value. Previously, as a workaround we stored the fp zero value also in the stack location or core register where GC would look for it. This wasn't precise and may have missed some cases. To fix this properly, we now generate GC maps based on the compiler's notion of references if register promotion is enabled. Bug: https://code.google.com/p/android/issues/detail?id=147187 Change-Id: Id3a2f863b16bdb8969df7004c868773084aec421
|
6ea651f0f4c7de4580beb2e887d86802c1ae0738 |
|
24-Feb-2015 |
Maja Gagic <maja.gagic@imgtec.com> |
Initial support for quick compiler on MIPS64r6. Change-Id: I6f43027b84e4a98ea320cddb972d9cf39bf7c4f8
|
d37f91902048b23ad5fe5b20aba0ebc92e0b4896 |
|
04-Mar-2015 |
Andreas Gampe <agampe@google.com> |
ART: Do not produce CFI when not asked for Insignificant time savings on the host, but also reduces native allocation size. Change-Id: Iea3d335e5375a0076306059d094e5b994e24b9e6
|
80b96d1a76790527f72a660ac03d9c215eed17ce |
|
19-Feb-2015 |
Vladimir Marko <vmarko@google.com> |
Replace a few std::vector with ArenaVector in Mir2Lir. Change-Id: I7867d60afc60f57cdbbfd312f02883854d65c805
|
a78ef44266c38cc4895554e973156a7c7896dd87 |
|
12-Feb-2015 |
Chao-ying Fu <chao-ying.fu@intel.com> |
ART: Fix InsertCaseLabel to return boundary_lir always This patch doesn't return new_label when cu_->verbose, because we will not assign offsets to new_label at this stage. Change-Id: Ie7f625848b0cf7cabfbba694b5c20b0784bc8501 Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
|
72f53af0307b9109a1cfc0671675ce5d45c66d3a |
|
12-Nov-2014 |
Chao-ying Fu <chao-ying.fu@intel.com> |
ART: Remove MIRGraph::dex_pc_to_block_map_ This patch removes MIRGraph::dex_pc_to_block_map_, adds a local variable dex_pc_to_block_map inside MIRGraph::InlineMethod(), and updates several functions to pass dex_pc_to_block_map. The goal is to limit the scope of dex_pc_to_block_map and the usage of FindBlock, so that various compiler optimizations cannot rely on dex pc to look up basic blocks to avoid duplicated dex pc issues. Also, this patch changes quick targets to use successor blocks for switch case target generation at Mir2Lir::InstallSwitchTables(). Change-Id: I9f571efebd2706b4e1606279bd61f3b406ecd1c4 Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
|
9c462086269324350516b3394d478f1d71a4b5d1 |
|
27-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Even more Quick cleanup Remove Backend. Change-Id: I247cc65ccda6a362ba1a8f5e73e7f12ecd980a87
|
0b9203e7996ee1856f620f95d95d8a273c43a3df |
|
23-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Some Quick cleanup Make several fields const in CompilationUnit. May benefit some Mir2Lir code that repeats tests, and in general immutability is good. Remove compiler_internals.h and refactor some other headers to reduce overly broad imports (and thus forced recompiles on changes). Change-Id: I898405907c68923581373b5981d8a85d2e5d185a
|
7e499925f8b4da46ae51040e9322690f3df992e6 |
|
06-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Remove LowestSetBit and IsPowerOfTwo Remove those functions from Mir2Lir and replace with functionality from utils.h. Change-Id: Ieb67092b22d5d460b5241c7c7931c15b9faf2815
|
e21dc3db191df04c100620965bee4617b3b24397 |
|
09-Dec-2014 |
Andreas Gampe <agampe@google.com> |
ART: Swap-space in the compiler Introduce a swap-space and corresponding allocator to transparently switch native allocations to memory backed by a file. Bug: 18596910 (cherry picked from commit 62746d8d9c4400e4764f162b22bfb1a32be287a9) Change-Id: I131448f3907115054a592af73db86d2b9257ea33
|
62746d8d9c4400e4764f162b22bfb1a32be287a9 |
|
09-Dec-2014 |
Andreas Gampe <agampe@google.com> |
ART: Swap-space in the compiler Introduce a swap-space and corresponding allocator to transparently switch native allocations to memory backed by a file. Bug: 18596910 Change-Id: I131448f3907115054a592af73db86d2b9257ea33
|
717a3e447c6f7a922cf9c3efe522747a187a045d |
|
13-Nov-2014 |
Serguei Katkov <serguei.i.katkov@intel.com> |
Re-factor Quick ABI support Now every architecture must provide a mapper between VRs parameters and physical registers. Additionally as a helper function architecture can provide a bulk copy helper for GenDalvikArgs utility. All other things becomes a common code stuff: GetArgMappingToPhysicalReg, GenDalvikArgsNoRange, GenDalvikArgsRange, FlushIns. Mapper now uses shorty representation of input parameters. This is required due to location are not enough to detect the type of parameter (fp or core). For the details see https://android-review.googlesource.com/#/c/113936/. Change-Id: Ie762b921e0acaa936518ee6b63c9a9d25f83e434 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
27dee8bcd7b4a53840b60818da8d2c819ef199bd |
|
02-Dec-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
X86_64 QBE: use RIP addressing Take advantage of RIP addressing in 64 bit mode to improve the code generation for accesses to the constant area as well as packed switches. Avoid computing the address of the start of the method, which is needed in 32 bit mode. To do this, we add a new 'pseudo-register' kRIPReg to minimize the changes needed to get the new addressing mode to be generated. Change-Id: Ia28c93f98b09939806d91ff0bd7392e58996d108 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
ab972ef472001fa113d54486d7592979e33480b3 |
|
04-Dec-2014 |
Mathieu Chartier <mathieuc@google.com> |
Remove method verification results right after compiling a method This saves memory since it allows the code arrays from methods compiled in future methods to use the ram we just freed from the verification results. GmsCore.apk: Before: dex2oat took 77.383s (threads: 2) arena alloc=6MB java alloc=30MB native alloc=77MB free=13KB After: dex2oat took 72.180s (threads: 2) arena alloc=6MB java alloc=30MB native alloc=60MB free=13KB Bug: 18596910 Change-Id: I5d6df380e4fe58751a2b304202083f4d30b33b7c (cherry picked from commit 25fda92083d5b93b38cc1f6b12ac6a44d992d6a4)
|
25fda92083d5b93b38cc1f6b12ac6a44d992d6a4 |
|
04-Dec-2014 |
Mathieu Chartier <mathieuc@google.com> |
Remove method verification results right after compiling a method This saves memory since it allows the code arrays from methods compiled in future methods to use the ram we just freed from the verification results. GmsCore.apk: Before: dex2oat took 77.383s (threads: 2) arena alloc=6MB java alloc=30MB native alloc=77MB free=13KB After: dex2oat took 72.180s (threads: 2) arena alloc=6MB java alloc=30MB native alloc=60MB free=13KB Bug: 18596910 Change-Id: I5d6df380e4fe58751a2b304202083f4d30b33b7c
|
7ab2fce83cd72c0963128b098a78606e77ea15d5 |
|
28-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Refactor handling of conditional branches with known result. Detect IF_cc and IF_ccZ instructions with known results in the basic block optimization phase (instead for the codegen phase) and replace them with GOTO/NOP. Kill blocks that are unreachable as a result. Change-Id: I169c2fa6f1e8af685f4f3a7fe622f5da862ce329
|
743b98cd3d7db1cfd6b3d7f7795e8abd9d07a42d |
|
24-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Skip null check in MarkGCCard() for known non-null values. Use GVN's knowledge of non-null values to set a new MIR flag for IPUT/SPUT/APUT to skip the value null check. Change-Id: I97a8d1447acb530c9bbbf7b362add366d1486ee1
|
807140048f82a2b87ee5bcf337f23b6a3d1d5269 |
|
21-Nov-2014 |
Mathieu Chartier <mathieuc@google.com> |
Add fast string sharpening String sharpening changes const strings to PC relative loads instead of always going through the dex cache. This saves code size and probably improves performance slightly. Before: 49602992 system@framework@boot.oat After: 49385904 system@framework@boot.oat Pre-cursor to removing dex_cache_strings_ field from ArtMethod. Bug: 17643507 Change-Id: I1787f48774631eee0accafeea257aa8d0e91e8d6
|
bf535be514570fc33fc0a6347a87dcd9097d9bfd |
|
19-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Add card mark to filled-new-array. Bug: 18032332 Change-Id: I35576b27f9115e4d0b02a11afc5e483b9e93a04a
|
785d2f2116bb57418d81bb55b55a087afee11053 |
|
04-Nov-2014 |
Andreas Gampe <agampe@google.com> |
ART: Replace COMPILE_ASSERT with static_assert (compiler) Replace all occurrences of COMPILE_ASSERT in the compiler tree. Change-Id: Icc40a38c8bdeaaf7305ab3352a838a2cd7e7d840
|
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f |
|
31-Oct-2014 |
Ian Rogers <irogers@google.com> |
Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags. Fix associated errors about unused paramenters and implict sign conversions. For sign conversion this was largely in the area of enums, so add ostream operators for the effected enums and fix tools/generate-operator-out.py. Tidy arena allocation code and arena allocated data types, rather than fixing new and delete operators. Remove dead code. Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
|
d8c3e3608a7b47e82186e4f8118541ef06d9eab2 |
|
08-Oct-2014 |
Alexei Zavjalov <alexei.zavjalov@intel.com> |
ART: X86: GenLongArith should handle overlapped VRs In a case, when src and dest VRs are overlapped when we called GenLongArith it may cause the incorrect use of regs. The solution is to map src to an physical reg and work with this reg instead of mem. Renamed BadOverlap() to PartiallyIntersects() for consistency. Change-Id: Ia3fc7f741f0a92556e1b2a1b084506662ef04c9d Signed-off-by: Katkov, Serguei I <serguei.i.katkov@intel.com> Signed-off-by: Alexei Zavjalov <alexei.zavjalov@intel.com>
|
27cc09337cdff14f592f4e22fd235809ebe0d6a7 |
|
08-Sep-2014 |
Matteo Franchin <matteo.franchin@arm.com> |
AArch64: oat patches should be 32-bit ints. This makes the arm64 backend consistent with the behaviour of the code in oat_writer.cc and in the patchoat tool. It also reduces the size of boot.oat by 1.6% (aosp_arm64-eng build). Change-Id: Ia0b96737159c08955cd7b776ee396ff578cd58f6
|
750359753444498d509a756fa9a042e9f3c432df |
|
12-Sep-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
ART: Deprecate CompilationUnit's code_item The code_item field is tracked in both the CompilationUnit and the MIRGraph. However, the existence of this field in CompilationUnit promotes bad practice because it creates assumption only a single code_item can be part of method. This patch deprecates this field and updates MIRGraph methods to make it easy to get same information as before. Part of this is the update to interface GetNumDalvikInsn which ensures to count all code_items in MIRGraph. Some dead code was also removed because it was not friendly to these updates. Change-Id: Ie979be73cc56350321506cfea58f06d688a7fe99 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
f4da675bbc4615c5f854c81964cac9dd1153baea |
|
01-Aug-2014 |
Vladimir Marko <vmarko@google.com> |
Implement method calls using relative BL on ARM. Store the linker patches with each CompiledMethod instead of keeping them in CompilerDriver. Reorganize oat file creation to apply the patches as we're writing the method code. Add framework for platform-specific relative call patches in the OatWriter. Implement relative call patches for ARM. Change-Id: Ie2effb3d92b61ac8f356140eba09dc37d62290f8
|
e39c54ea575ec710d5e84277fcdcc049f8acb3c9 |
|
22-Sep-2014 |
Vladimir Marko <vmarko@google.com> |
Deprecate GrowableArray, use ArenaVector instead. Purge GrowableArray from Quick and Portable. Remove GrowableArray<T>::Iterator. Change-Id: I92157d3a6ea5975f295662809585b2dc15caa1c6
|
589e046c483ca0dbee6c28fb617997f43ee28b94 |
|
05-Sep-2014 |
Serguei Katkov <serguei.i.katkov@intel.com> |
Slow path should break def tracking Slow path usually results in invocation of runtime. Runtime should be ensured that all VR on stack are up to date. To do this we reset def tracking system at the moment of adding slow path and as a result actual writes to stack for all VRs will not be optimized away. The decision is conservative to be safe however probably not all runtime calls can potentially require VRs to be on stack. In this case we will need insert reset def tracking in all places where dangerous slow path is used. Change-Id: I2cb7698a12c17354060fdbb944e1da1fb922c23b Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
de0b996661351450fa4d918706c5322e001c29c9 |
|
27-Aug-2014 |
Andreas Gampe <agampe@google.com> |
ART: Fix read-out-of-bounds in the compiler In case of a wide dalvik register, asking for the constant value can lead to a read out of bounds. Bug: 17302671 (cherry picked from commit ade731854d18839823e57fb2d3d67238c5467d15) Change-Id: Ie1849cd67cc418c97cbd7a8524f027f9b66e4c96
|
ade731854d18839823e57fb2d3d67238c5467d15 |
|
27-Aug-2014 |
Andreas Gampe <agampe@google.com> |
ART: Fix read-out-of-bounds in the compiler In case of a wide dalvik register, asking for the constant value can lead to a read out of bounds. Bug: 17302671 Change-Id: Ie1849cd67cc418c97cbd7a8524f027f9b66e4c96
|
8d0d03e24325463f0060abfd05dba5598044e9b1 |
|
07-Jun-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
ART: Change temporaries to positive names Changes compiler temporaries to have positive names. The numbering now puts them above the code VRs (locals + ins, in that order). The patch also introduces APIs to query the number of temporaries, locals and ins. The compiler temp infrastructure suffered from several issues which are also addressed by this patch: -There is no longer a queue of compiler temps. This would be polluted with Method* when post opts were called multiple times. -Sanity checks have been added to allow requesting of temps from BE and to prevent temps after frame is committed. -None of the structures holding temps can overflow because they are allocated to allow holding maximum temps. Thus temps can be requested by BE with no problem. -Since the queue of compiler temps is no longer maintained, it is no longer possible to refer to a temp that has invalid ssa (because it was requested before ssa was run). -The BE can now request temps after all ME allocations and it is guaranteed to actually receive them. -ME temps are now treated like normal VRs in all cases with no special handling. Only the BE temps are handled specially because there are no references to them from MIRs. -Deprecated and removed several fields in CompilationUnit that saved register information and updated callsites to call the new interface from MIRGraph. Change-Id: Ia8b1fec9384a1a83017800a59e5b0498dfb2698c Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
|
e3ea83811d47152c00abea24a9b420651a33b496 |
|
08-Aug-2014 |
Yevgeny Rouban <yevgeny.y.rouban@intel.com> |
ART source line debug info in OAT files OAT files have source line information enough for ART runtime needs like jump to/from interpreter and thread suspension. But this information is not enough for finer grained source level debugging and low-level profiling (VTune or perf). This patch adds to OAT files two additional sections: .debug_line - DWARF formatted Elf32 section with detailed source line information (mapping from native PC to Java source lines). In addition to the debugging symbols added using the dex2oat option --include-debug-symbols, the source line information is added to the section .debug_line. The source line info can be read by many Elf reading tools like objdump, readelf, dwarfdump, gdb, perf, VTune, ... gdb can use this debug line information in x86. In 64-bit mode the information can be used if the oat file is mapped in the lower address space (address has higher 32 bits zeroed). Relocation works. Testing: 1. art/test/run-test --host --gdb [--64] 001-HelloWorld 2. in gdb: break Main.java:19 3. in gdb: break Runtime.java:111 4. in gdb: run - stops at void java.lang.Runtime.<init>() 5. in gdb: backtrace - shows call stack down to main() 6. in gdb: continue - stops at void Main.main() (only in 32-bit mode) 7. in gdb: backtrace - shows call stack down to main() 8. objdump -W <oat-file> - addresses are from VMA range of .text section reported by objdump -h <file> 9. dwarfdump -ka <oat-file> - no errors expected Size of aosp-x86-eng boot.oat increased by 11% from 80.5Mb to 89.2Mb with two sections added .debug_line (7.2Mb) and .rel.debug (1.5Mb). Change-Id: Ib8828832686e49782a63d5529008ff4814ed9cda Signed-off-by: Yevgeny Rouban <yevgeny.y.rouban@intel.com>
|
4fc785398707ede68f29768748b7fe5fa39dde24 |
|
07-Aug-2014 |
Fred Shih <ffred@google.com> |
Fixed build breakage due to incorrect class TypeId. Fixed incorrect type id being inserted in code buffer and got rid of inefficient pointer wrapping in LoadClassType. Change-Id: I7ee1d957ebcd816445c26199723ac50787d926d7
|
e7f82e2515f47f3c3292281312d7031a34a58ffc |
|
06-Aug-2014 |
Fred Shih <ffred@google.com> |
Added support for patching classes from different dex files. Added support for class patching from different dex files and moved ScopedObjectAccess from the quick compiler to driver. Slight refactoring for clarity. Bug: 16656190 Change-Id: I107fcbce75db42ca61321ea1c5d5f236680a1b3d
|
547cdfd21ee21e4ab9ca8692d6ef47c62ee7ea52 |
|
05-Aug-2014 |
Tong Shen <endlessroad@google.com> |
Emit CFI for x86 & x86_64 JNI compiler. Now for host-side x86 & x86_64 ART, we are able to get complete stacktrace with even mixed C/C++ & Java stack frames. Testing: 1. art/test/run-test --host --gdb [--64] --no-relocate 005 2. In gdb, run 'b art::Class_classForName' which is implementation of a Java native method, then 'r' 3. In gdb, run 'bt'. You should see stack frames down to main() Change-Id: I2d17e9aa0f6d42d374b5362a15ea35a2fce96302
|
8081d2b8d7a743729557051d0294e040e61c747a |
|
31-Jul-2014 |
Vladimir Marko <vmarko@google.com> |
Create allocator adapter for using Arena in std containers. Create ArenaAllocatorAdapter, similar to the existing ScopedArenaAllocatorAdapter, for allocating memory for standard containers via the ArenaAllocator. Add the ability to specify allocation kind rather than just kArenaAllocSTL to both adapters. Move the scoped arena allocator to the scoped_arena_containers.h header file. Define template aliases for containers using the new adapter and change a few MIRGraph and Mir2Lir members to use them. Change-Id: I9bbc50248e0fed81729497b848cb29bf68444268
|
147eb41b53729ec8d5c188d1cac90964a51afb8a |
|
11-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86"""" This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1. Bug: 16256184 Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73 Conflicts: compiler/dex/quick/arm64/target_arm64.cc compiler/image_test.cc runtime/fault_handler.cc
|
69dfe51b684dd9d510dbcb63295fe180f998efde |
|
11-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86"""" This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1. Bug: 16256184 Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73
|
ccc60264229ac96d798528d2cb7dbbdd0deca993 |
|
05-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Rework TargetReg(symbolic_reg, wide) Make the standard implementation in Mir2Lir and the specialized one in the x86 backend return a pair when wide = "true". Introduce WideKind enumeration to improve code readability. Simplify generic code based on this implementation. Change-Id: I670d45aa2572eedfdc77ac763e6486c83f8e26b4
|
7fb36ded9cd5b1d254b63b3091f35c1e6471b90e |
|
10-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Add implicit null and stack checks for x86"" Fixes x86_64 cross compile issue. Removes command line options and property to set implicit checks - this is hard coded now. This reverts commit 3d14eb620716e92c21c4d2c2d11a95be53319791. Change-Id: I5404473b5aaf1a9c68b7181f5952cb174d93a90d
|
c380191f3048db2a3796d65db8e5d5a5e7b08c65 |
|
08-Jul-2014 |
Serguei Katkov <serguei.i.katkov@intel.com> |
x86_64: Enable fp-reg promotion Patch introduces 4 register XMM12-15 available for promotion of fp virtual registers. Change-Id: I3f89ad07fc8ae98b70f550eada09be7b693ffb67 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com> Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
|
0025a86411145eb7cd4971f9234fc21c7b4aced1 |
|
11-Jul-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Revert "Revert "Add implicit null and stack checks for x86""" Broke the build. This reverts commit 7fb36ded9cd5b1d254b63b3091f35c1e6471b90e. Change-Id: I9df0e7446ff0913a0e1276a558b2ccf6c8f4c949
|
34e826ccc80dc1cf7c4c045de6b7f8360d504ccf |
|
29-May-2014 |
Dave Allison <dallison@google.com> |
Add implicit null and stack checks for x86 This adds compiler and runtime changes for x86 implicit checks. 32 bit only. Both host and target are supported. By default, on the host, the implicit checks are null pointer and stack overflow. Suspend is implemented but not switched on. Change-Id: I88a609e98d6bf32f283eaa4e6ec8bbf8dc1df78a
|
3d14eb620716e92c21c4d2c2d11a95be53319791 |
|
10-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Add implicit null and stack checks for x86" It breaks cross compilation with x86_64. This reverts commit 34e826ccc80dc1cf7c4c045de6b7f8360d504ccf. Change-Id: I34ba07821fc0a022fda33a7ae21850957bbec5e7
|
a77ee5103532abb197f492c14a9e6fb437054e2a |
|
02-Jul-2014 |
Chao-ying Fu <chao-ying.fu@intel.com> |
x86_64: TargetReg update for x86 Also includes changes in common code. Elimination of use of TargetReg with one parameter and direct access to special target registers. Change-Id: Ied2c1f87d4d1e4345248afe74bca40487a46a371 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com> Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
|
b5860fb459f1ed71f39d8a87b45bee6727d79fe8 |
|
22-Jun-2014 |
buzbee <buzbee@google.com> |
Register promotion support for 64-bit targets Not sufficiently tested for 64-bit targets, but should be fairly close. A significant amount of refactoring could stil be done, (in later CLs). With this change we are not making any changes to the vmap scheme. As a result, it is a requirement that if a vreg is promoted to both a 32-bit view and the low half of a 64-bit view it must share the same physical register. We may change this restriction later on to allow for more flexibility for 32-bit Arm. For example, if v4, v5, v4/v5 and v5/v6 are all hot enough to promote, we'd end up with something like: v4 (as an int) -> r10 v4/v5 (as a long) -> r10 v5 (as an int) -> r11 v5/v6 (as a long) -> r11 Fix a couple of ARM64 bugs on the way... Change-Id: I6a152b9c164d9f1a053622266e165428045362f3
|
4b537a851b686402513a7c4a4e60f5457bb8d7c1 |
|
01-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Quick compiler: More size checks, add TargetReg variants Add variants for TargetReg for requesting specific register usage, e.g., wide and ref. More register size checks. With code adapted from https://android-review.googlesource.com/#/c/98605/. Change-Id: I852d3be509d4dcd242c7283da702a2a76357278d
|
de68676b24f61a55adc0b22fe828f036a5925c41 |
|
24-Jun-2014 |
Andreas Gampe <agampe@google.com> |
Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter" This reverts commit 2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d. Breaks the build. Change-Id: I9faad4e9a83b32f5f38b2ef95d6f9a33345efa33
|
3c12c512faf6837844d5465b23b9410889e5eb11 |
|
24-Jun-2014 |
Andreas Gampe <agampe@google.com> |
Revert "Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter"" This reverts commit de68676b24f61a55adc0b22fe828f036a5925c41. Fixes an API comment, and differentiates between inserting and appending. Change-Id: I0e9a21bb1d25766e3cbd802d8b48633ae251a6bf
|
2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d |
|
23-Jun-2014 |
Andreas Gampe <agampe@google.com> |
ART: Split out more cases of Load/StoreRef, volatile as parameter Splits out more cases of ref registers being loaded or stored. For code clarity, adds volatile as a flag parameter instead of a separate method. On ARM64, continue cleanup. Add flags to print/fatal on size mismatches. Change-Id: I30ed88433a6b4ff5399aefffe44c14a5e6f4ca4e
|
8dea81ca9c0201ceaa88086b927a5838a06a3e69 |
|
06-Jun-2014 |
Vladimir Marko <vmarko@google.com> |
Rewrite use/def masks to support 128 bits. Reduce LIR memory usage by holding masks by pointers in the LIR rather than directly and using pre-defined const masks for the common cases, allocating very few on the arena. Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
|
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 |
|
03-Jun-2014 |
Tim Murray <timmurray@google.com> |
DO NOT MERGE Merge ART from AOSP to lmp-preview-dev. Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
|
85089dd28a39dd20f42ac258398b2a08668f9ef1 |
|
26-May-2014 |
buzbee <buzbee@google.com> |
Quick compiler: generalize NarrowRegLoc() Some of the RegStorage utilites (DoubleToLowSingle(), DoubleToHighSingle(), etc.) worked only for targets which which treat double precision registers as a pair of aliased single precision registers. This CL elminates those utilities, and replaces them with a new RegisterInfo utility that will search an aliased register set and return the member matching the required storage configuration (if it exists). Change-Id: Iff5de10f467d20a56e1a89df9fbf30d1cf63c240
|
a51a0b0300268b605e3ad71b0e87ff394032c5e7 |
|
21-May-2014 |
Vladimir Marko <vmarko@google.com> |
Method inlining across dex files in boot image. Fix LoadCodeAddress() and LoadMethodAddress() to use the dex file in addition to the method index to uniquely identify the literal. With that fix in place, when we have both the direct code and the direct method, we can safely pass the actual target method id instead of the method id from the same dex file in the method lowering info. This was already done for calls from apps into boot image (and thus there was a bug with a tiny risk of the wrong literal being used) and now we also do that for calls within the boot image. The latter allows the inlining pass to inline many more methods than before in the boot image. Bug: 15021903 Change-Id: Ic765ce9809b43ef07e7db32b8e3fbc9acb09147f
|
b01bf15d18f9b08d77e7a3c6e2897af0e02bf8ca |
|
14-May-2014 |
buzbee <buzbee@google.com> |
64-bit temp register support. Add a 64-bit temp register allocation path. The recent physical register handling rework supports multiple views of the same physical register (or, such as for Arm's float/double regs, different parts of the same physical register). This CL adds a 64-bit core register view for 64-bit targets. In short, each core register will have a 64-bit name, and a 32-bit name. The different views will be kept in separate register pools, but aliasing will be tracked. The core temp register allocation routines will be largely identical - except for 32-bit targets, which will continue to use pairs of 32-bit core registers for holding long values. Change-Id: I8f118e845eac7903ad8b6dcec1952f185023c053
|
700a402244a1a423da4f3ba8032459f4b65fa18f |
|
20-May-2014 |
Ian Rogers <irogers@google.com> |
Now we have a proper C++ library, use std::unique_ptr. Also remove the Android.libcxx.mk and other bits of stlport compatibility mechanics. Change-Id: Icdf7188ba3c79cdf5617672c1cfd0a68ae596a61
|
d65c51a556e6649db4e18bd083c8fec37607a442 |
|
29-Apr-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
ART: Add support for constant vector literals Add in some vector instructions. Implement the ConstVector instruction, which takes 4 words of data and loads it into an XMM register. Initially, only the ConstVector MIR opcode is implemented. Others will be added after this one goes in. Change-Id: I5c79bc8b7de9030ef1c213fc8b227debc47f6337 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
e45fb9e7976c8462b94a58ad60b006b0eacec49f |
|
06-May-2014 |
Matteo Franchin <matteo.franchin@arm.com> |
AArch64: Change arm64 backend to produce A64 code. The arm backend clone is changed to produce A64 code. At the moment this backend can only compile simple methods (both leaf and non-leaf). Most of the work on the assembler (assembler_arm64.cc) has been done. Some work on the LIR generation layer (functions such as OpRegRegImm & friends) is still necessary. The register allocator still needs to be adapted to the A64 instruction set (it is mostly unchanged from the arm backend). Offsets for helpers in gen_invoke.cc still need to be changed to work on 64-bit. Change-Id: I388f99eeb832857981c7d9d5cb5b71af64a4b921
|
72d32629303f8f39362a4099481f48646aed042f |
|
07-May-2014 |
Ian Rogers <irogers@google.com> |
Give Compiler a back reference to the driver. The compiler driver is a single object delegating work to the compiler, rather than passing it through to every Compiler call make it a member of Compiler so that it maybe queried. This simplifies the Compiler API and makes the relationship to CompilerDriver more explicit. Remove reference arguments that contravene code style. Change-Id: Iba47f2e3cbda679a7ec7588f26188d77643aa2c6
|
660188264dee3c8f3510e2e24c11816c6b60f197 |
|
06-May-2014 |
Andreas Gampe <agampe@google.com> |
ART: Use utils.h::RoundUp instead of explicit bit-fiddling Change-Id: I249a2cfeb044d3699d02e13d42b8e72518571640
|
f29a4244bbc278843237f0ae242de077e093b580 |
|
05-May-2014 |
Dmitry Petrochenko <dmitry.petrochenko@intel.com> |
x86_64: Fix frame size calculation for 64-bit Calculate frame size in the same way as calculated in patch "64bit changes to the stack walker for the Quick ABI" Change-Id: I8c2458f5973536a84f3fd6ad56167b5cfafa9ab4 Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
|
091cc408e9dc87e60fb64c61e186bea568fc3d3a |
|
31-Mar-2014 |
buzbee <buzbee@google.com> |
Quick compiler: allocate doubles as doubles Significant refactoring of register handling to unify usage across all targets & 32/64 backends. Reworked RegStorage encoding to allow expanded use of x86 xmm registers; removed vector registers as a separate register type. Reworked RegisterInfo to describe aliased physical registers. Eliminated quite a bit of target-specific code and generalized common code. Use of RegStorage instead of int for registers now propagated down to the NewLIRx() level. In future CLs, the NewLIRx() routines will be replaced with versions that are explicit about what kind of operand they expect (RegStorage, displacement, etc.). The goal is to eventually use RegStorage all the way to the assembly phase. TBD: MIPS needs verification. TBD: Re-enable liveness tracking. Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
|
8194963098247be6bca9cc4a54dbfa65c73e8ccc |
|
02-May-2014 |
Vladimir Marko <vmarko@google.com> |
Replace CountOneBits and __builtin_popcount with POPCOUNT. Clean up utils.h, make some functions constexpr. Change-Id: I2399100280cbce81c3c4f5765f0680c1ddcb5883
|
ff093b31d75658c3404f9b51ee45760f346f06d9 |
|
01-May-2014 |
Ian Rogers <irogers@google.com> |
Fix a few 64-bit compilation of 32-bit code issues. Bug: 13423943 Change-Id: I939389413af0a68c0d95b23cd598b7c42afa4383
|
6ffcfa04ebb2660e238742a6000f5ccebdd5df15 |
|
25-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Rewrite suspend test check with LIRSlowPath. Change-Id: I2dc17d079655586bfc588349c7a04afc2c6879af
|
7a11ab09f93f54b1c07c0bf38dd65ed322e86bc6 |
|
29-Apr-2014 |
buzbee <buzbee@google.com> |
Quick compiler: debugging assists A few minor assists to ease A/B debugging in the Quick compiler: 1. To save time, the assemblers for some targets only update the object code offsets on instructions involved with pc-relative fixups. We add code to fix up all offsets when doing a verbose codegen listing. 2. Temp registers are normally allocated in a round-robin fashion. When disabling liveness tracking, we now reset the round-robin pool to 0 on each instruction boundary. This makes it easier to spot real codegen differences. 3. Self-register copies were previously emitted, but marked as nops. Minor change to avoid generating them in the first place and reduce clutter. Change-Id: I7954bba3b9f16ee690d663be510eac7034c93723
|
695d13a82d6dd801aaa57a22a9d4b3f6db0d0fdb |
|
19-Apr-2014 |
buzbee <buzbee@google.com> |
Update load/store utilities for 64-bit backends This CL replaces the typical use of LoadWord/StoreWord utilities (which, in practice, were 32-bit load/store) in favor of a new set that make the size explicit. We now have: LoadWordDisp/StoreWordDisp: 32 or 64 depending on target. Load or store the natural word size. Expect this to be used infrequently - generally when we know we're dealing with a native pointer or flushed register not holding a Dalvik value (Dalvik values will flush to home location sizes based on Dalvik, rather than the target). Load32Disp/Store32Disp: Load or store 32 bits, regardless of target. Load64Disp/Store64Disp: Load or store 64 bits, regardless of target. LoadRefDisp: Load a 32-bit compressed reference, and expand it to the natural word size in the target register. StoreRefDisp: Compress a reference held in a register of the natural word size and store it as a 32-bit compressed reference. Change-Id: I50fcbc8684476abd9527777ee7c152c61ba41c6f
|
3a74d15ccc9a902874473ac9632e568b19b91b1c |
|
22-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Delete throw launchpads. Bug: 13170824 Change-Id: I9d5834f5a66f5eb00f2ac80774e8c27dea99949e
|
d6ed642458c8820e1beca72f3d7b5f0be4a4b64b |
|
10-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Revert "Use trampolines for calls to helpers""" This reverts commit f9487c039efb4112616d438593a2ab02792e0304. Change-Id: Id48a4aae4ecce73db468587967968a3f7618b700
|
f9487c039efb4112616d438593a2ab02792e0304 |
|
09-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Use trampolines for calls to helpers"" This reverts commit 081f73e888b3c246cf7635db37b7f1105cf1a2ff. Change-Id: Ibd777f8ce73cf8ed6c4cb81d50bf6437ac28cb61 Conflicts: compiler/dex/quick/mir_to_lir.h
|
081f73e888b3c246cf7635db37b7f1105cf1a2ff |
|
07-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Use trampolines for calls to helpers" This reverts commit 754ddad084ccb610d0cf486f6131bdc69bae5bc6. Change-Id: Icd979adee1d8d781b40a5e75daf3719444cb72e8
|
754ddad084ccb610d0cf486f6131bdc69bae5bc6 |
|
19-Feb-2014 |
Dave Allison <dallison@google.com> |
Use trampolines for calls to helpers This is an ARM specific optimization to the compiler that uses trampoline islands to make calls to runtime helper functions. The intention is to reduce the size of the generated code (by 2 bytes per call) without affecting performance. By default this is on when generating an OAT file. It is off when compiling to memory. To switch this off in dex2oat, use the command line option: --no-helper-trampolines Enhances disassembler to print the trampoline entry on the BL instruction like this: 0xb6a850c0: f7ffff9e bl -196 (0xb6a85000) ; pTestSuspend Bug: 12607709 Change-Id: I9202bdb7cf21252ad807bd48701f1f6ce8e3d0fe
|
6a58cb16d803c9a7b3a75ccac8be19dd9d4e520d |
|
02-Apr-2014 |
Dmitry Petrochenko <dmitry.petrochenko@intel.com> |
art: Handle x86_64 architecture equal to x86 This patch forces FE/ME to treat x86_64 as x86 exactly. The x86_64 logic will be revised later when assembly will be ready. Change-Id: I4a92477a6eeaa9a11fd710d35c602d8d6f88cbb6 Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
|
f943914730db8ad2ff03d49a2cacd31885d08fd7 |
|
27-Mar-2014 |
Dave Allison <dallison@google.com> |
Implement implicit stack overflow checks This also fixes some failing run tests due to missing null pointer markers. The implementation of the implicit stack overflow checks introduces the ability to have a gap in the stack that is skipped during stack walk backs. This gap is protected against read/write and is used to trigger a SIGSEGV at function entry if the stack will overflow. Change-Id: I0c3e214c8b87dc250cf886472c6d327b5d58653e
|
2700f7e1edbcd2518f4978e4cd0e05a4149f91b6 |
|
07-Mar-2014 |
buzbee <buzbee@google.com> |
Continuing register cleanup Ready for review. Continue the process of using RegStorage rather than ints to hold register value in the top layers of codegen. Given the huge number of changes in this CL, I've attempted to minimize the number of actual logic changes. With this CL, the use of ints for registers has largely been eliminated except in the lowest utility levels. "Wide" utility routines have been updated to take a single RegStorage rather than a pair of ints representing low and high registers. Upcoming CLs will be smaller and more targeted. My expectations: o Allocate float double registers as a single double rather than a pair of float single registers. o Refactor to push code which assumes long and double Dalvik values are held in a pair of register to the target dependent layer. o Clean-up of the xxx_mir.h files to reduce the amount of #defines for registers. May also do a register renumbering to bring all of our targets' register naming more consistent. Possibly introduce a target-independent float/non-float test at the RegStorage level. Change-Id: I646de7392bdec94595dd2c6f76e0f1c4331096ff
|
92cf83e001357329cbf41fa15a6e053fab6f4933 |
|
18-Mar-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Run Java tests with the optimizing compiler. Also fix a vector.reserve -> vector.resize braino, and build a GC map that dex2oat expects. Change-Id: I6acf2f90a4c32f90b79bf7709bf2e43931b98757
|
3bc8615332b7848dec8c2297a40f7e4d176c0efb |
|
13-Mar-2014 |
Vladimir Marko <vmarko@google.com> |
Use LIRSlowPath for intrinsics, improve String.indexOf(). Rewrite intrinsic launchpads to use the LIRSlowPath. Improve String.indexOf for constant chars by avoiding the check for code points over 0xFFFF. Change-Id: I7fd5583214c5b4ab9c38ee36c5d6f003dd6345a8
|
49161cef10a308aedada18e9aa742498d6e6c8c7 |
|
12-Mar-2014 |
Jeff Hao <jeffhao@google.com> |
Allow patching between dex files in the boot classpath. Change-Id: I53f219a5382d0fcd580e96e50025fdad4fc399df
|
83cc7ae96d4176533dd0391a1591d321b0a87f4f |
|
12-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Create a scoped arena allocator and use that for LVN. This saves more than 0.5s of boot.oat compilation time on Nexus 5. TODO: Move other stuff to the scoped allocator. This CL alone increases the peak memory allocation. By reusing the memory for other parts of the compilation we should reduce this overhead. Change-Id: Ifbc00aab4f3afd0000da818dfe68b96713824a08
|
a1a7074eb8256d101f7b5d256cda26d7de6ce6ce |
|
03-Mar-2014 |
Vladimir Marko <vmarko@google.com> |
Rewrite kMirOpSelect for all IF_ccZ opcodes. Also improve special cases for ARM and add tests. Change-Id: I06f575b9c7b547dbc431dbfadf2b927151fe16b9
|
2da882315a61072664f7ce3c212307342e907207 |
|
27-Feb-2014 |
Andreas Gampe <agampe@google.com> |
Initial changes towards Generic JNI option Some initial changes that lead to an UNIMPLEMENTED. Works by not compiling for JNI right now and tracking native methods which have neither quick nor portable code. Uses new trampoline. Change-Id: I5448654044eb2717752fd7359f4ef8bd5c17be6e
|
be0e546730e532ef0987cd4bde2c6f5a1b14dd2a |
|
26-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Cache field lowering info in mir_graph. Change-Id: I9f9d76e3ae6c31e88bdf3f59820d31a625da020f
|
ae9fd93c39a341e2dffe15c61cc7d9e841fa92c4 |
|
11-Feb-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Tell GDB about Quick ART generated code This is actually a lot of work. To do this, we need: .debug_info .debug_abbrev .debug_frame .debug_str These are generated into the OAT file by OatWriter and ElfWriterQuick. Since the Quick ART runtime doesn't use dlopen to load the OAT files, GDB can't find this information. Use the alternate GDB JIT interface, which can be invoked at runtime. To use this interface, an ELF image needs to be built in memory. Read the information from the OAT file, fixup the addresses to point to the real locations, add a symbol table to hold the .text symbol, and then let GDB know about the information, which will be read from the runtime address space. This is quite primitive now, and could be cleaned up considerably. It probably needs symbol table entries for the methods, and descriptions of parameters and return types. Currently only supported for X86. This defaults to enabled for debug builds. Added dexoat --gen-gdb-info and --no-gen-gdb-info flags to override. Change-Id: I4d18b2370f6dfaa00c8cc1925f10717be3bd1a62 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
2e589aa58a1372909f95e731fd6b8895f6359c3a |
|
25-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Encode VmapTable entries offset by 2 to reduce size. We're using special values 0xffff and 0xfffe for an fp register marker and for method pointer, respectively. These values were being encoded as 3 bytes each and this changes their encoding to 1 byte. Bug: 9437697 Change-Id: Ic1720e898b131a5d3f6ca87d8e1ecdf76fb4160a
|
3bc01748ef1c3e43361bdf520947a9d656658bf8 |
|
06-Feb-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
GenSpecialCase support for x86 Moved GenSpecialCase from being ARM specific to common code to allow it to be used by x86 quick as well. Change-Id: I728733e8f4c4da99af6091ef77e5c76ae0fee850 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
55d0eac918321e0525f6e6491f36a80977e0d416 |
|
06-Feb-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Support Direct Method/Type access for X86 Thumb generates code to optimize calls to methods within core.oat. Implement this for X86 as well, but take advantage of mov with 32 bit immediate and call relative with 32 bit immediate. Fix some incorrect return locations for long inlines. Change-Id: I1907bdfc7574f3d0aa76c7fad13dc537acdf1ed3 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
bcec6fba95ee7974d3f7b81c3c02e7eb3ca3df00 |
|
17-Jan-2014 |
Dave Allison <dallison@google.com> |
Make slow paths easier to write This adds a class LIRSlowPath that allows for deferred compilation of slow paths. Using this object you can add code that will be invoked out of line using a forward branch. The intention is to move the slow paths out of the main flow and avoid branch-over constructs that will almost always trigger. The forward branch to the slow path code will be predicted false and this will be correct most of the time. The slow path code returns to the instruction after the original branch using an unconditional branch. This is used in the following opcodes: sput, sget, const-string, check-cast, const-class. Others will follow. Bug: 10864890 Change-Id: I17130c5dc20d369bc6bbf50b8cf04343263e888e
|
d69835d841cb7663faaa2f1996e73e8c0b3f6d76 |
|
03-Feb-2014 |
buzbee <buzbee@google.com> |
Art Compiler: fix compiler temps AOSP CL 78835 "Enable compiler temporaries" built on some earlier work to enable the compiler to add temps in the style of Dalvik's vRegs during MIR optimizations. However, it missed an existing fixed-size array whose size depended on the number of temps allocated. The allocation of this array must be delayed until after the number of compiler temps is known. The result was array overrun, and strange failures. Change-Id: I986a3b557e2323e00ba852584de03a02931b3c78
|
21caf91a53d34be300fee7aef4589990dce0fbde |
|
03-Feb-2014 |
buzbee <buzbee@google.com> |
Art Compiler: fix compiler temps Fix for b/12871909 "Art compiler segfaults in several top apps" AOSP CL 78835 "Enable compiler temporaries" built on some earlier work to enable the compiler to add temps in the style of Dalvik's vRegs during MIR optimizations. However, it missed an existing fixed-size array whose size depended on the number of temps allocated. The allocation of this array must be delayed until after the number of compiler temps is known. The result was array overrun, and strange failures. Change-Id: I986a3b557e2323e00ba852584de03a02931b3c78
|
da7a69b3fa7bb22d087567364b7eb5a75824efd8 |
|
09-Jan-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
Enable compiler temporaries Compiler temporaries are a facility for having virtual register sized space for dealing with intermediate values during MIR transformations. They receive explicit space in managed frames so they can have a home location in case they need to be spilled. The facility also supports "special" temporaries which have specific semantic purpose and their location in frame must be tracked. The compiler temporaries are treated in the same way as virtual registers so that the MIR level transformations do not need to have special logic. However, generated code needs to know stack layout so that it can distinguish between home locations. MIRGraph has received an interface for dealing with compiler temporaries. This interface allows allocation of wide and non-wide virtual register temporaries. The information about how temporaries are kept on stack has been moved to stack.h. This is was necessary because stack layout is dependent on where the temporaries are placed. Change-Id: Iba5cf095b32feb00d3f648db112a00209c8e5f55 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
2730db03beee4d6687ddfb5000c33c0370fbc6eb |
|
27-Jan-2014 |
Vladimir Marko <vmarko@google.com> |
Add VerfiedMethod to DexCompilationUnit. Avoid some mutex locking and map lookups. Change-Id: I8e0486af77e38dcd065569572a6b985eb57f4f63
|
c7f832061fea59fd6abd125f26c8ca1faec695a5 |
|
24-Jan-2014 |
Vladimir Marko <vmarko@google.com> |
Refactor verification results. Rename VerificationMethodsData to VerificationResults. Create new class VerifiedMethod to hold all the data for a given method. Change-Id: Ife1ac67cede20f3a2f9c7f5345f08a851cf1ed20
|
766e9295d2c34cd1846d81610c9045b5d5093ddd |
|
27-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Improve GenConstString, GenS{get,put} for x86 Rewrite GenConstString for x86 to skip calling ResolveString when the string is already resolved. Also try to avoid a register copy if the Method* is in a promoted register. Implement the TODO for GenS{get,put} to use compare to memory for x86 by adding a new codegen function to compare directly to memory. Implement a default implementation that uses a temporary register for RISC architectures. Change-Id: Ie163cca3d3d841aa10c50dc6592ec30af7a7cbc9 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
4708dcd68eebf1173aef1097dad8ab13466059aa |
|
22-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Improve x86 long multiply and shifts Generate inline code for long shifts by constants and do long multiplication inline. Convert multiplication by a constant to a shift when we can. Fix some x86 assembler problems and add the new instructions that were needed (64 bit shifts). Change-Id: I6237a31c36159096e399d40d01eb6bfa22ac2772 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
107c31e598b649a8bb8d959d6a0377937e63e624 |
|
24-Jan-2014 |
Ian Rogers <irogers@google.com> |
64bit friendly printf modifiers in LIR dumping. Also correct header file inclusion ordering. Change-Id: I8fb99e80cf1487e8b2278d4c1d110d14ed18c086
|
be1ca55db3362f5b100c4c65da5342fd299520bb |
|
15-Jan-2014 |
Hiroshi Yamauchi <yamauchi@google.com> |
Use direct class pointers at allocation sites in the compiled code. - Rather than looking up a class from its type ID (and checking if it's resolved/initialized, resolving/initializing if not), use direct class pointers, if possible (boot-code-to-boot-class pointers and app-code-to-boot-class pointers.) - This results in a 1-2% speedup in Ritz MemAllocTest on Nexus 4. - Embedding the object size (along with class pointers) caused a 1-2% slowdown in MemAllocTest and isn't implemented in this change. - TODO: do the same for array allocations. - TODO: when/if an application gets its own image, implement app-code-to-app-class pointers. - Fix a -XX:gc bug. cf. https://android-review.googlesource.com/79460/ - Add /tmp/android-data/dalvik-cache to the list of locations to remove oat files in clean-oat-host. cf. https://android-review.googlesource.com/79550 - Add back a dropped UNLIKELY in FindMethodFromCode(). cf. https://android-review.googlesource.com/74205 Bug: 9986565 Change-Id: I590b96bd21f7a7472f88e36752e675547559a5b1
|
5115473c81ec855a5646a5f755afb26aa7f2b1e9 |
|
02-Jan-2014 |
Vladimir Marko <vmarko@google.com> |
Fix oatdump "compilercallbacks" option for runtime. The "compilercallbacks" runtime option replaced "compiler" in I708ca13227c809e07917ff3879a89722017e83a9 . Fix a comment in codegen_util.cc . Change-Id: I2c5ebd56dd96f0ee8e62b602bfe45357565471ff
|
5816ed48bc339c983b40dc493e96b97821ce7966 |
|
27-Nov-2013 |
Vladimir Marko <vmarko@google.com> |
Detect special methods at the end of verification. This moves special method handling to method inliner and prepares for eventual inlining of these methods. Change-Id: I51c51b940fb7bc714e33135cd61be69467861352
|
2b5eaa2b49f7489bafdadc4b4463ae27e4261817 |
|
13-Dec-2013 |
Vladimir Marko <vmarko@google.com> |
Move compiler code out of method verifier. We want to detect small methods for inlining at the end of the method verification. Instead of adding more compiler code to the runtime, we create a callback from the runtime into the compiler, so that we can keep the code there. Additionally, we move the compiler-related code that was already in the method verifier to the compiler since it doesn't really belong to the runtime in the first place. Change-Id: I708ca13227c809e07917ff3879a89722017e83a9
|
8171fc34bf74ed0df02385787d916bc13eb7f160 |
|
26-Nov-2013 |
Vladimir Marko <vmarko@google.com> |
Don't prefix GC map by length. Bug: 11767815 Change-Id: I063917aefdf7674ee1a77736db059c9ee95ea075
|
06606b9c4a1c00154ed15f719ad8ea994e54ee8e |
|
02-Dec-2013 |
Vladimir Marko <vmarko@google.com> |
Performance improvement for mapping table creation. Avoid the raw mapping tables altogether. Change-Id: I6d1c786325d369e899a75f15701edbafdd14363f
|
1e6cb63d77090ddc6aa19c755d7066f66e9ff87e |
|
28-Nov-2013 |
Vladimir Marko <vmarko@google.com> |
Delta-encoding of mapping tables. Both PC offsets and dalvik offsets are delta-encoded. Since PC offsets are increasing, the deltas are then compressed as unsigned LEB128. Dalvik offsets are not monotonic, so their deltas are compressed as signed LEB128. This reduces the size of the mapping tables by about 30% on average, 25% from the PC offset and 5% from the dalvik offset delta encoding. Bug: 9437697 Change-Id: I600ab9c22dec178088d4947a811cca3bc8bd4cf4
|
5c96e6b4dc354a7439b211b93462fbe8edea5e57 |
|
14-Nov-2013 |
Vladimir Marko <vmarko@google.com> |
Rewrite intrinsics detection. Intrinsic methods should be treated as a special case of inline methods. They should be detected early and used to guide other optimizations. This CL rewrites the intrinsics detection so that it can be moved to any compilation phase. Change-Id: I4424a6a869bd98b9c478953c9e3bcaf1c6de2b33
|
0b1191cfece83f6f8d4101575a06555a2d13387a |
|
28-Oct-2013 |
Bill Buzbee <buzbee@google.com> |
Revert "Revert "Null check elimination improvement"" This reverts commit 31aa97cfec5ee76b2f2496464e1b6f9e11d21a29. ..and thereby brings back change 380165, which was reverted because it was buggy. Three problems with the original CL: 1. The author ran the pre-submit tests, but used -j24 and failed to search the output for fail messages. 2. The new null check analysis pass uses an interative approach to identify whether a null check is needed. It is possible that the null-check-required state may oscillate, and a logic error caused it to stick in the "no check needed" state. 3. Our old nemesis Dalvik untyped constants, in which 0 values can be used both as object reference and non-object references. This CL conservatively treats all CONST definitions as potential object definitions for the purposes of null check elimination. Change-Id: I3c1744e44318276e42989502a314585e56ac57a0
|
31aa97cfec5ee76b2f2496464e1b6f9e11d21a29 |
|
26-Oct-2013 |
Ian Rogers <irogers@google.com> |
Revert "Null check elimination improvement" This reverts commit 4db179d1821a9e78819d5adc8057a72f49e2aed8. Change-Id: I059c15c85860c6c9f235b5dabaaef2edebaf1de2
|
a61f49539a59b610e557b5513695295639496750 |
|
23-Aug-2013 |
buzbee <buzbee@google.com> |
Add timing logger to Quick compiler Current Quick compiler breakdown for compiling the boot class path: MIR2LIR: 29.674% MIROpt:SSATransform: 17.656% MIROpt:BBOpt: 11.508% BuildMIRGraph: 7.815% Assemble: 6.898% MIROpt:ConstantProp: 5.151% Cleanup: 4.916% MIROpt:NullCheckElimination: 4.085% RegisterAllocation: 3.972% GcMap: 2.359% Launchpads: 2.147% PcMappingTable: 2.145% MIROpt:CodeLayout: 0.697% LiteralData: 0.654% SpecialMIR2LIR: 0.323% Change-Id: I9f77e825faf79e6f6b214bb42edcc4b36f55d291
|
4db179d1821a9e78819d5adc8057a72f49e2aed8 |
|
23-Oct-2013 |
buzbee <buzbee@google.com> |
Null check elimination improvement See b/10862777 Improves the null check elimination pass by tracking visibility of object definitions, rather than successful uses of object dereferences. For boot class path, increases static null check elimination success rate from 98.4% to 98.6%. Reduces size of boot.oat by ~300K bytes. Fixes loop nesting depth computation, which is used by register promotion, and tweaked the heuristics. Fixes a bug in verbose listing output in which a basic block id is directly dereferenced, rather than first being converted to a pointer. Change-Id: Id01c20b533cdb12ea8fc4be576438407d0a34cec
|
0d82948094d9a198e01aa95f64012bdedd5b6fc9 |
|
12-Oct-2013 |
buzbee <buzbee@google.com> |
64-bit prep Preparation for 64-bit roll. o Eliminated storing pointers in 32-bit int slots in LIR. o General size reductions of common structures to reduce impact of doubled pointer sizes: - BasicBlock struct was 72 bytes, now is 48. - MIR struct was 72 bytes, now is 64. - RegLocation was 12 bytes, now is 8. o Generally replaced uses of BasicBlock* pointers with 16-bit Ids. o Replaced several doubly-linked lists with singly-linked to save one stored pointer per node. o We had quite a few uses of uintptr_t's that were a holdover from the JIT (which used pointers to mapped dex & actual code cache addresses rather than trace-relative offsets). Replaced those with uint32_t's. o Clean up handling of embedded data for switch tables and array data. o Miscellaneous cleanup. I anticipate one or two additional CLs to reduce the size of MIR and LIR structs. Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
|
b48819db07f9a0992a72173380c24249d7fc648a |
|
15-Sep-2013 |
buzbee <buzbee@google.com> |
Compile-time tuning: assembly phase Not as much compile-time gain from reworking the assembly phase as I'd hoped, but still worthwhile. Should see ~2% improvement thanks to the assembly rework. On the other hand, expect some huge gains for some application thanks to better detection of large machine-generated init methods. Thinkfree shows a 25% improvement. The major assembly change was to establish thread the LIR nodes that require fixup into a fixup chain. Only those are processed during the final assembly pass(es). This doesn't help for methods which only require a single pass to assemble, but does speed up the larger methods which required multiple assembly passes. Also replaced the block_map_ basic block lookup table (which contained space for a BasicBlock* for each dex instruction unit) with a block id map - cutting its space requirements by half in a 32-bit pointer environment. Changes: o Reduce size of LIR struct by 12.5% (one of the big memory users) o Repurpose the use/def portion of the LIR after optimization complete. o Encode instruction bits to LIR o Thread LIR nodes requiring pc fixup o Change follow-on assembly passes to only consider fixup LIRs o Switch on pc-rel fixup kind o Fast-path for small methods - single pass assembly o Avoid using cb[n]z for null checks (almost always exceed displacement) o Improve detection of large initialization methods. o Rework def/use flag setup. o Remove a sequential search from FindBlock using lookup table of 16-bit block ids rather than full block pointers. o Eliminate pcRelFixup and use fixup kind instead. o Add check for 16-bit overflow on dex offset. Change-Id: I4c6615f83fed46f84629ad6cfe4237205a9562b4
|
d91d6d6a80748f277fd938a412211e5af28913b1 |
|
26-Sep-2013 |
Ian Rogers <irogers@google.com> |
Introduce Signature type to avoid string comparisons. Method resolution currently creates strings to then compare with strings formed from methods in other dex files. The temporary strings are purely created for the sake of comparisons. This change creates a new Signature type that represents a method signature but not as a string. This type supports comparisons and so can be used when searching for methods in resolution. With this change malloc is no longer the hottest method during dex2oat (now its memset) and allocations during verification have been reduced. The verifier is commonly what is populating the dex cache for methods and fields not declared in the dex file itself. Change-Id: I5ef0542823fbcae868aaa4a2457e8da7df0e9dae
|
ee39a10e45a6a0880e8b829525c40d6055818560 |
|
19-Sep-2013 |
Ian Rogers <irogers@google.com> |
Use class def index from java.lang.Class. Bug: 10244719 This removes the computation of the dex file index, when necessary this is computed by searching the dex file. Its only necessary in dalvik.system.DexFile.defineClassNative and DexFile::FindInClassPath, the latter not showing up significantly in profiling with this change. (cherry-picked from 8b2c0b9abc3f520495f4387ea040132ba85cae69) Change-Id: I20c73a3b17d86286428ab0fd21bc13f51f36c85c
|
8b2c0b9abc3f520495f4387ea040132ba85cae69 |
|
19-Sep-2013 |
Ian Rogers <irogers@google.com> |
Use class def index from java.lang.Class. Bug: 10244719 Depends on: https://googleplex-android-review.git.corp.google.com/362363 This removes the computation of the dex file index, when necessary this is computed by searching the dex file. Its only necessary in dalvik.system.DexFile.defineClassNative and DexFile::FindInClassPath, the latter not showing up significantly in profiling with this change. Change-Id: I20c73a3b17d86286428ab0fd21bc13f51f36c85c
|
bd663de599b16229085759366c56e2ed5a1dc7ec |
|
11-Sep-2013 |
buzbee <buzbee@google.com> |
Compile-time tuning: register/bb utilities This CL yeilds about a 4% improvement in the compilation phase of dex2oat (single-threaded; multi-threaded compilation is more difficult to accurately measure). The register utilities could stand to be completely rewritten, but this gets most of the easy benefit. Next up: the assembly phase. Change-Id: Ife5a474e9b1a6d9e501e888dda6749d34eb77e96
|
252254b130067cd7a5071865e793966871ae0246 |
|
09-Sep-2013 |
buzbee <buzbee@google.com> |
More Quick compile-time tuning: labels & branches This CL represents a roughly 3.5% performance improvement for the compile phase of dex2oat. Move of the gain comes from avoiding the generation of dex boundary LIR labels unless a debug listing is requested. The other significant change is moving from a basic block ending branch model of "always generate a fall-through branch, and then delete it if we can" to a "only generate a fall-through branch if we need it" model. The data motivating these changes follow. Note that two area of potentially attractive gain remain: restructing the assembler model and reworking the register handling utilities. These will be addressed in subsequent CLs. --- data follows The Quick compiler's assembler has shown up on profile reports a bit more than seems reasonable. We've tried a few quick fixes to apparently hot portions of the code, but without much gain. So, I've been looking at the assembly process at a somewhat higher level. There look to be several potentially good opportunities. First, an analysis of the makeup of the LIR graph showed a surprisingly high proportion of LIR pseudo ops. Using the boot classpath as a basis, we get: 32.8% of all LIR nodes are pseudo ops. 10.4% are LIR instructions which require pc-relative fixups. 11.8% are LIR instructions that have been nop'd by the various optimization passes. Looking only at the LIR pseudo ops, we get: kPseudoDalvikByteCodeBoundary 43.46% kPseudoNormalBlockLabel 21.14% kPseudoSafepointPC 20.20% kPseudoThrowTarget 6.94% kPseudoTarget 3.03% kPseudoSuspendTarget 1.95% kPseudoMethodExit 1.26% kPseudoMethodEntry 1.26% kPseudoExportedPC 0.37% kPseudoCaseLabel 0.30% kPseudoBarrier 0.07% kPseudoIntrinsicRetry 0.02% Total LIR count: 10167292 The standout here is the Dalvik opcode boundary marker. This is just a label inserted at the beginning of the codegen for each Dalvik bytecode. If we're also doing a verbose listing, this is also where we hang the pretty-print disassembly string. However, this label was also being used as a convenient way to find the target of switch case statements (and, I think at one point was used in the Mir->GBC conversion process). This CL moves the use of kPseudoDalvikByteCodeBoundary labels to only verbose listing runs, and replaces the codegen uses of the label with the kPseudoNormalBlockLabel attached to the basic block that contains the switch case target. Great savings here - 14.3% reduction in the number of LIR nodes needed. After this CL, our LIR pseudo proportions drop to 21.6% of all LIR. That's still a lot, but much better. Possible further improvements via combining normal labels with kPseudoSafepointPC labels where appropriate, and also perhaps reduce memory usage by using a short-hand form for labels rather than a full LIR node. Also, many of the basic block labels are no longer branch targets by the time we get to assembly - cheaper to delete, or just ingore? Here's the "after" LIR pseudo op breakdown: kPseudoNormalBlockLabel 37.39% kPseudoSafepointPC 35.72% kPseudoThrowTarget 12.28% kPseudoTarget 5.36% kPseudoSuspendTarget 3.45% kPseudoMethodEntry 2.24% kPseudoMethodExit 2.22% kPseudoExportedPC 0.65% kPseudoCaseLabel 0.53% kPseudoBarrier 0.12% kPseudoIntrinsicRetry 0.04% Total LIR count: 5748232 Not done in this CL, but it will be worth experimenting with actually deleting LIR nodes from the graph when they are optimized away, rather than just setting the NOP bit. Keeping them around is invaluable during debugging - but when not debugging it may pay off if the cost of node removal is less than the cost of traversing through dead nodes in subsequent passes. Next up (and partially in this CL - but mostly to be done in follow-on CLs) is the overall assembly process. Inherited from the trace JIT, the Quick compiler has a fairly simple-minded approach to instruction assembly. First, a pass is made over the LIR list to assign offsets to each instruction. Then, the assembly pass is made - which generates the actual machine instruction bit patterns and pushes the instruction data into the code_buffer. However, the code generator takes the "always optimistic" approach to instruction selection and emits the shortest instruction. If, during assembly, we find that a branch or load doesn't reach, that short-form instruction is replaces with a longer sequence. Of course, this invalidates the previously-computed offset calculations. Assembly thus is an iterative process: compute offsets and then assemble until we survive an assembly pass without invalidation. This seems like a likely candidate for improvement. First, I analyzed the number of retries required, and the reason for invalidation over the boot classpath load. The results: more than half of methods don't require a retry, and very few require more than 1 extra pass: 5 or more: 6 of 96334 4 or more: 22 of 96334 3 or more: 140 of 96334 2 or more: 1794 of 96334 - 2% 1 or more: 40911 of 96334 - 40% 0 retries: 55423 of 96334 - 58% The interesting group here is the one that requires 1 retry. Looking at the reason, we see three typical reasons: 1. A cbnz/cbz doesn't reach (only 7 bits of offset) 2. A 16-bit Thumb1 unconditional branch doesn't reach. 3. An unconditional branch which branches to the next instruction is encountered, and deleted. The first 2 cases are the cost of the optimistic strategy - nothing much to change there. However, the interesting case is #3 - dead branch elimination. A further analysis of the single retry group showed that 42% of the methods (16305) that required a single retry did so *only* because of dead branch elimination. The big question here is why so many dead branches survive to the assembly stage. We have a dead branch elimination pass which is supposed to catch these - perhaps it's not working correctly, should be moved later in the optimization process, or perhaps run multiple times. Other things to consider: o Combine the offset generation pass with the assembly pass. Skip pc-relative fixup assembly (other than assigning offset), but push LIR* for them into work list. Following the main pass, zip through the work list and assemble the pc-relative instructions (now that we know the offsets). This would significantly cut back on traversal costs. o Store the assembled bits into both the code buffer and the LIR. In the event we have to retry, only the pc-relative instructions would need to be assembled, and we'd finish with a pass over the LIR just to dumb the bits into the code buffer. Change-Id: I50029d216fa14f273f02b6f1c8b6a0dde5a7d6a6
|
28c2300d9a85f4e7288fb5d94280332f923b4df3 |
|
07-Sep-2013 |
buzbee <buzbee@google.com> |
More compile-time tuning Small, but measurable, improvement. Change-Id: Ie3c7180f9f9cbfb1729588e7a4b2cf6c6d291c95
|
56c717860df2d71d66fb77aa77f29dd346e559d3 |
|
06-Sep-2013 |
buzbee <buzbee@google.com> |
Compile-time tuning Specialized the dataflow iterators and did a few other minor tweaks. Showing ~5% compile-time improvement in a single-threaded environment; less in multi-threaded (presumably because we're blocked by something else). Change-Id: I2e2ed58d881414b9fc97e04cd0623e188259afd2
|
9b297bfc588c7d38efd12a6f38cd2710fc513ee3 |
|
06-Sep-2013 |
Ian Rogers <irogers@google.com> |
Refactor CompilerDriver::Compute..FieldInfo Don't use non-const reference arguments. Move ins before outs. Change-Id: I7b251156388d8f07513b3da62ebfd29e5fd9ff76
|
193bad9b9cfd10642043fa2ebbfc68bd5f9ede4b |
|
30-Aug-2013 |
Mathieu Chartier <mathieuc@google.com> |
Multi threaded hashed deduplication during compilation. Moved deduplication to be in the compiler driver instead of oat writer. This enables deduplication to be performed on multiple threads. Also added a hash function to avoid excessive comparison of byte arrays. Improvements: Before (alloats host): real 1m6.967s user 4m22.940s sys 1m22.610s Thinkfree.apk (target mako): 0m23.74s real 0m50.95s user 0m9.50s system 0m24.62s real 0m50.61s user 0m10.07s system 0m24.22s real 0m51.44s user 0m10.09s system 0m23.70s real 0m51.05s user 0m9.97s system 0m23.50s real 0m50.74s user 0m10.63s system After (alloats host): real 1m5.705s user 4m44.030s sys 1m29.990s Thinkfree.apk (target mako): 0m23.32s real 0m51.38s user 0m10.00s system 0m23.49s real 0m51.20s user 0m9.80s system 0m23.18s real 0m50.80s user 0m9.77s system 0m23.52s real 0m51.22s user 0m10.02s system 0m23.50s real 0m51.55s user 0m9.46s system Bug: 10552630 Change-Id: Ia6d06a747b86b0bfc4473b3cd68f8ce1a1c7eb22
|
f6c4b3ba3825de1dbb3e747a68b809c6cc8eb4db |
|
25-Aug-2013 |
Mathieu Chartier <mathieuc@google.com> |
New arena memory allocator. Before we were creating arenas for each method. The issue with doing this is that we needed to memset each memory allocation. This can be improved if you start out with arenas that contain all zeroed memory and recycle them for each method. When you give memory back to the arena pool you do a single memset to zero out all of the memory that you used. Always inlined the fast path of the allocation code. Removed the "zero" parameter since the new arena allocator always returns zeroed memory. Host dex2oat time on target oat apks (2 samples each). Before: real 1m11.958s user 4m34.020s sys 1m28.570s After: real 1m9.690s user 4m17.670s sys 1m23.960s Target device dex2oat samples (Mako, Thinkfree.apk): Without new arena allocator: 0m26.47s real 0m54.60s user 0m25.85s system 0m25.91s real 0m54.39s user 0m26.69s system 0m26.61s real 0m53.77s user 0m27.35s system 0m26.33s real 0m54.90s user 0m25.30s system 0m26.34s real 0m53.94s user 0m27.23s system With new arena allocator: 0m25.02s real 0m54.46s user 0m19.94s system 0m25.17s real 0m55.06s user 0m20.72s system 0m24.85s real 0m55.14s user 0m19.30s system 0m24.59s real 0m54.02s user 0m20.07s system 0m25.06s real 0m55.00s user 0m20.42s system Correctness of Thinkfree.apk.oat verified by diffing both of the oat files. Change-Id: I5ff7b85ffe86c57d3434294ca7a621a695bf57a9
|
96faf5b363d922ae91cf25404dee0e87c740c7c5 |
|
10-Aug-2013 |
Ian Rogers <irogers@google.com> |
Uleb128 compression of vmap and mapping table. Bug 9437697. Change-Id: I30bcb97d12cd8b46d3b2cdcbdd358f08fbb9947a (cherry picked from commit 1809a72a66d245ae598582d658b93a24ac3bf01e)
|
1809a72a66d245ae598582d658b93a24ac3bf01e |
|
10-Aug-2013 |
Ian Rogers <irogers@google.com> |
Uleb128 compression of vmap and mapping table. Bug 9437697. Change-Id: I30bcb97d12cd8b46d3b2cdcbdd358f08fbb9947a
|
7934ac288acfb2552bb0b06ec1f61e5820d924a4 |
|
26-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/comments issues Change-Id: Iae286862c85fb8fd8901eae1204cd6d271d69496
|
479f83c196d5a95e36196eac548dc6019e70a5be |
|
19-Jul-2013 |
buzbee <buzbee@google.com> |
Dex compiler: re-enable method pattern matching The dex compiler's mechanism to detect simple methods and emit streamlined code was disabled during the last big restructuring (there was a question of how to make it useful for Portable as well as Quick). This CL does not address the Portable question, but turns the optimization back on for Quick. See b/9428200 Change-Id: I9f25b41219d7a243ec64efb18278e5a874766f4d
|
2d88862f0752a7a0e65145b088f49dabd49d4284 |
|
19-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fixing cpplint readability/casting issues Change-Id: I6821da0e23737995a9b884a04e9b63fac640cd05
|
02c8cc6d1312a2b55533f02f6369dc7c94672f90 |
|
19-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fixing cpplint whitespace/blank_line, whitespace/end_of_line, whitespace/labels, whitespace/semicolon issues Change-Id: Ide4f8ea608338b3fed528de7582cfeb2011997b6
|
df62950e7a32031b82360c407d46a37b94188fbb |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/parens issues Change-Id: Ifc678d59a8bed24ffddde5a0e543620b17b0aba9
|
0cd7ec2dcd8d7ba30bf3ca420b40dac52849876c |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/blank_line issues Change-Id: Ice937e95e23dd622c17054551d4ae4cebd0ef8a2
|
f69863b3039fc621ff4250e262d2a024d5e79ec8 |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/newline issues Change-Id: Ie2049d9f667339e41f36c4f5d09f0d10d8d2c762
|
2ce745c06271d5223d57dbf08117b20d5b60694a |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/braces issues Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
|
7940e44f4517de5e2634a7e07d58d0fb26160513 |
|
12-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Create separate Android.mk for main build targets The runtime, compiler, dex2oat, and oatdump now are in seperate trees to prevent dependency creep. They can now be individually built without rebuilding the rest of the art projects. dalvikvm and jdwpspy were already this way. Builds in the art directory should behave as before, building everything including tests. Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81
|