b7fd412dd21eb362931b3a0716c94fd189a66295 |
|
04-Jun-2015 |
Vladimir Marko <vmarko@google.com> |
Revert "Quick: Create GC map based on compiler data. DO NOT MERGE" This reverts commit 7cc8f9aa1349fd6cb0814a653ee2d1164a7fb9f7. Change-Id: Iadb4462bf8e834c6a847c01ee6eb332a325de22c
|
7cc8f9aa1349fd6cb0814a653ee2d1164a7fb9f7 |
|
20-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Create GC map based on compiler data. DO NOT MERGE The Quick compiler and verifier sometimes disagree on dalvik register types (fp/core/ref) for 0/null constants and merged registers involving 0/null constants. Since the verifier is more lenient it can mark a register as a reference for GC where Quick considers it a floating point register or a dead register (which would have a ref/fp conflict if not dead). If the compiler used an fp register to hold the zero value, the core register or stack location used by GC based on the verifier data can hold an invalid value. Previously, as a workaround we stored the fp zero value also in the stack location or core register where GC would look for it. This wasn't precise and may have missed some cases. To fix this properly, we now generate GC maps based on the compiler's notion of references if register promotion is enabled. Bug: https://code.google.com/p/android/issues/detail?id=147187 (cherry picked from commit 767c752fddc64e280dba507457e4f06002b5f678) Change-Id: Id75428fd0a2f6bdd2ccb20ce75cdeab01150e455
|
3d21bdf8894e780d349c481e5c9e29fe1556051c |
|
22-Apr-2015 |
Mathieu Chartier <mathieuc@google.com> |
Move mirror::ArtMethod to native Optimizing + quick tests are passing, devices boot. TODO: Test and fix bugs in mips64. Saves 16 bytes per most ArtMethod, 7.5MB reduction in system PSS. Some of the savings are from removal of virtual methods and direct methods object arrays. Bug: 19264997 (cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33) Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d Fix some ArtMethod related bugs Added root visiting for runtime methods, not currently required since the GcRoots in these methods are null. Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes --trace run-tests 005, 044. Fixed optimizing compiler bug where we used a normal stack location instead of double on ARM64, this fixes the debuggable tests. TODO: Fix JDWP tests. Bug: 19264997 Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3 ART: Fix casts for 64-bit pointers on 32-bit compiler. Bug: 19264997 Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457 Fix JDWP tests after ArtMethod change Fixes Throwable::GetStackDepth for exception event detection after internal stack trace representation change. Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of proxy method. Bug: 19264997 Change-Id: I363e293796848c3ec491c963813f62d868da44d2 Fix accidental IMT and root marking regression Was always using the conflict trampoline. Also included fix for regression in GC time caused by extra roots. Most of the regression was IMT. Fixed bug in DumpGcPerformanceInfo where we would get SIGABRT due to detached thread. EvaluateAndApplyChanges: From ~2500 -> ~1980 GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots Bug: 19264997 Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0 Fix bogus image test assert Previously we were comparing the size of the non moving space to size of the image file. Now we properly compare the size of the image space against the size of the image file. Bug: 19264997 Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a [MIPS64] Fix art_quick_invoke_stub argument offsets. ArtMethod reference's size got bigger, so we need to move other args and leave enough space for ArtMethod* and 'this' pointer. This fixes mips64 boot. Bug: 19264997 Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
|
41b175aba41c9365a1c53b8a1afbd17129c87c14 |
|
19-May-2015 |
Vladimir Marko <vmarko@google.com> |
ART: Clean up arm64 kNumberOfXRegisters usage. Avoid undefined behavior for arm64 stemming from 1u << 32 in loops with upper bound kNumberOfXRegisters. Create iterators for enumerating bits in an integer either from high to low or from low to high and use them for <arch>Context::FillCalleeSaves() on all architectures. Refactor runtime/utils.{h,cc} by moving all bit-fiddling functions to runtime/base/bit_utils.{h,cc} (together with the new bit iterators) and all time-related functions to runtime/base/time_utils.{h,cc}. Improve test coverage and fix some corner cases for the bit-fiddling functions. Bug: 13925192 (cherry picked from commit 80afd02024d20e60b197d3adfbb43cc303cf29e0) Change-Id: I905257a21de90b5860ebe1e39563758f721eab82
|
5ea536aa4a6414db01beaf6f8bd8cb9adc5cfc92 |
|
20-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
Remove ArtMethod* parameter from dex cache entry points. Load the ArtMethod* using an optimized stack walk instead. This reduces the size of the generated code. Three of the entry points are called only from a slow-path and the fourth (InitializeTypeAndVerifyAccess) is rare and already slow enough that the one or two extra loads (depending on whether we already have the ArtMethod* in a register) are insignificant. And as we're starting to use PC-relative addressing of the dex cache arrays (already done by Quick for the boot image), having the ArtMethod* in a register becomes less likely anyway. Change-Id: Ib19b9d204e355e13bf386662a8b158178bf8ad28
|
2cebb24bfc3247d3e9be138a3350106737455918 |
|
22-Apr-2015 |
Mathieu Chartier <mathieuc@google.com> |
Replace NULL with nullptr Also fixed some lines that were too long, and a few other minor details. Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb
|
fac10700fd99516e8a14f751fe35553021ce6982 |
|
22-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Remove broken Mir2Lir::LocToRegClass(). Its use in intrinsics has been bogus. In all other instances it's been used under the assumption that the inferred type matches the return type of associated calls. However, if the type inference identifies a type mismatch, the assumption doesn't hold and there isn't necessarily a valid value that the function could reasonably return. Bug: 19918641 Change-Id: I050934e6f9eb00427d0b888ee29ae9eeb509bb3f
|
3477307fdf93a1ef9a80d4e096125705c47e8024 |
|
07-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Use PC-relative dex cache array loads for SGET/SPUT. Change-Id: I890284b73f69120ada5cf9b9ef4a717af3273cd2
|
20f85597828194c12be10d3a927999def066555e |
|
19-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Fixed layout for dex caches in boot image. Define a fixed layout for dex cache arrays (type, method, string and field arrays) for dex caches in the boot image. This gives those arrays fixed offsets from the boot image code and allows PC-relative addressing of their elements. Use the PC-relative load on arm64 for relevant instructions, i.e. invoke-static, invoke-direct, const-string, const-class, check-cast and instance-of. This reduces the arm64 boot.oat on Nexus 9 by 1.1MiB. This CL provides the infrastructure and shows on the arm64 the gains that we can achieve by having fixed dex cache arrays' layout. To fully use this for the boot images, we need to implement the PC-relative addressing for other architectures. To achieve similar gains for apps, we need to move the dex cache arrays to a .bss section of the oat file. These changes will be implemented in subsequent CLs. (Also remove some compiler_driver.h dependencies to reduce incremental build times.) Change-Id: Ib1859fa4452d01d983fd92ae22b611f45a85d69b
|
da4d79bc9a4aeb9da7c6259ce4c9c1c3bf545eb8 |
|
24-Mar-2015 |
Roland Levillain <rpl@google.com> |
Unify ART's various implementations of bit_cast. ART had several implementations of art::bit_cast: 1. one in runtime/base/casts.h, declared as: template <class Dest, class Source> inline Dest bit_cast(const Source& source); 2. another one in runtime/utils.h, declared as: template<typename U, typename V> static inline V bit_cast(U in); 3. and a third local version, in runtime/memory_region.h, similar to the previous one: template<typename Source, typename Destination> static Destination MemoryRegion::local_bit_cast(Source in); This CL removes versions 2. and 3. and changes their callers to use 1. instead. That version was chosen over the others as: - it was the oldest one in the code base; and - its syntax was closer to the standard C++ cast operators, as it supports the following use: bit_cast<Destination>(source) since `Source' can be deduced from `source'. Change-Id: I7334fd5d55bf0b8a0c52cb33cfbae6894ff83633
|
767c752fddc64e280dba507457e4f06002b5f678 |
|
20-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Create GC map based on compiler data. The Quick compiler and verifier sometimes disagree on dalvik register types (fp/core/ref) for 0/null constants and merged registers involving 0/null constants. Since the verifier is more lenient it can mark a register as a reference for GC where Quick considers it a floating point register or a dead register (which would have a ref/fp conflict if not dead). If the compiler used an fp register to hold the zero value, the core register or stack location used by GC based on the verifier data can hold an invalid value. Previously, as a workaround we stored the fp zero value also in the stack location or core register where GC would look for it. This wasn't precise and may have missed some cases. To fix this properly, we now generate GC maps based on the compiler's notion of references if register promotion is enabled. Bug: https://code.google.com/p/android/issues/detail?id=147187 Change-Id: Id3a2f863b16bdb8969df7004c868773084aec421
|
0b40ecf156e309aa17c72a28cd1b0237dbfb8746 |
|
20-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Clean up slow paths. Change-Id: I278d42be77b02778c4a419ae9024b37929915b64
|
22fe45de11ed7afdf21400d2de3abd23f3a62800 |
|
18-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Eliminate check-cast guaranteed by instance-of. Eliminate check-cast if the result of an instance-of with the very same type on the same value is used to branch to the check-cast's block or a dominator of it. Note that there already exists a verifier-based elimination of check-cast but it excludes check-cast on interfaces. This new optimization works for interface types and, since it's GVN-based, it can better recognize when the same reference is used for instance-of and check-cast. Change-Id: Ib315199805099d1cb0534bb4a90dc51baa409685
|
6ea651f0f4c7de4580beb2e887d86802c1ae0738 |
|
24-Feb-2015 |
Maja Gagic <maja.gagic@imgtec.com> |
Initial support for quick compiler on MIPS64r6. Change-Id: I6f43027b84e4a98ea320cddb972d9cf39bf7c4f8
|
7e6a233f5d2268de0bf1600eed77c05753615f52 |
|
24-Feb-2015 |
Mathieu Chartier <mathieuc@google.com> |
Delete bad DCHECK The declaring class of a super field could be in a different dex file. Fixes sdk build. Change-Id: I8258d64eeff6539afb52448595f5a6ec4c71a6bc
|
e5f13e57ff8fa36342beb33830b3ec5942a61cca |
|
24-Feb-2015 |
Mathieu Chartier <mathieuc@google.com> |
Revert "Revert "Add JIT"" Added missing EntryPointToCodePointer. This reverts commit a5ca888d715cd0c6c421313211caa1928be3e399. Change-Id: Ia74df0ef3a7babbdcb0466fd24da28e304e3f5af
|
a5ca888d715cd0c6c421313211caa1928be3e399 |
|
24-Feb-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Add JIT" Sorry, run-test crashes on target: 0-05 12:15:51.633 I/DEBUG (27995): Abort message: 'art/runtime/mirror/art_method.cc:349] Check failed: PcIsWithinQuickCode(reinterpret_cast<uintptr_t>(code), pc) java.lang.Throwable java.lang.Throwable.fillInStackTrace() pc=71e3366b code=0x71e3362d size=ad000000' 10-05 12:15:51.633 I/DEBUG (27995): r0 00000000 r1 0000542b r2 00000006 r3 00000000 10-05 12:15:51.633 I/DEBUG (27995): r4 00000006 r5 b6f9addc r6 00000002 r7 0000010c 10-05 12:15:51.633 I/DEBUG (27995): r8 b63fe1e8 r9 be8e1418 sl b6427400 fp b63fcce0 10-05 12:15:51.633 I/DEBUG (27995): ip 0000542b sp be8e1358 lr b6e9a27b pc b6e9c280 cpsr 40070010 10-05 12:15:51.633 I/DEBUG (27995): Bug: 17950037 This reverts commit 2535abe7d1fcdd0e6aca782b1f1932a703ed50a4. Change-Id: I6f88849bc6f2befed0c0aaa0b7b2a08c967a83c3
|
7656ce0ddba6edaac6d1a2d41a120e3f8110aaa9 |
|
24-Feb-2015 |
Mathieu Chartier <mathieuc@google.com> |
Delete bad DCHECK The declaring class of a super field could be in a different dex file. Fixes sdk build. Change-Id: I6e1738603e71c3c673fa12d9aaa25c9eb84db406
|
2535abe7d1fcdd0e6aca782b1f1932a703ed50a4 |
|
17-Feb-2015 |
Mathieu Chartier <mathieuc@google.com> |
Add JIT Currently disabled by default unless -Xjit is passed in. The proposed JIT is a method JIT which works by utilizing interpreter instrumentation to request compilation of hot methods async during runtime. JIT options: -Xjit / -Xnojit -Xjitcodecachesize:N -Xjitthreshold:integervalue The JIT has a shared copy of a compiler driver which is accessed by worker threads to compile individual methods. Added JIT code cache and data cache, currently sized at 2 MB capacity by default. Most apps will only fill a small fraction of this cache however. Added support to the compiler for compiling interpreter quickened byte codes. Added test target ART_TEST_JIT=TRUE and --jit for run-test. TODO: Clean up code cache. Delete compiled methods after they are added to code cache. Add more optimizations related to runtime checks e.g. direct pointers for invokes. Add method recompilation. Move instrumentation to DexFile to improve performance and reduce memory usage. Bug: 17950037 Change-Id: Ifa5b2684a2d5059ec5a5210733900aafa3c51bca
|
6ce3eba0f2e6e505ed408cdc40d213c8a512238d |
|
16-Feb-2015 |
Vladimir Marko <vmarko@google.com> |
Add suspend checks to special methods. Generate suspend checks at the beginning of special methods. If we need to call to runtime, go to the slow path where we create a simplified but valid frame, spill all arguments, call art_quick_test_suspend, restore necessary arguments and return back to the fast path. This keeps the fast path overhead to a minimum. Bug: 19245639 Change-Id: I3de5aee783943941322a49c4cf2c4c94411dbaa2
|
0b9203e7996ee1856f620f95d95d8a273c43a3df |
|
23-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Some Quick cleanup Make several fields const in CompilationUnit. May benefit some Mir2Lir code that repeats tests, and in general immutability is good. Remove compiler_internals.h and refactor some other headers to reduce overly broad imports (and thus forced recompiles on changes). Change-Id: I898405907c68923581373b5981d8a85d2e5d185a
|
d500b53ff8742f76b63c9f7593082d9e8114b85f |
|
17-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Some Quick cleanup Move some definitions around. In case a method is already virtual, avoid instruction-set tests. Change-Id: I8d98f098e55ade1bc0cfa32bb2aad006caccd07d
|
e7227c628e6f22a823945cc76e554eb2a8b0d925 |
|
13-Jan-2015 |
Vladimir Marko <vmarko@google.com> |
Fix wide volatile IGET/IPUT on ARM without atomic ldrd/strd. If ldrd/strd isn't atomic, IPUT_WIDE uses ldrexd+strexd and we need to record the safepoint for the ldrexd rather than strexd. IGET_WIDE was simply missing the memory barrier. Bug: 18993519 (cherry picked from commit ee5e273e4d0dd91b480c8d5dbcccad15c1b7353c) Change-Id: I4e9270b994f413c1a047c1c4bb9cce5f29e42cb4
|
ee5e273e4d0dd91b480c8d5dbcccad15c1b7353c |
|
13-Jan-2015 |
Vladimir Marko <vmarko@google.com> |
Fix wide volatile IGET/IPUT on ARM without atomic ldrd/strd. If ldrd/strd isn't atomic, IPUT_WIDE uses ldrexd+strexd and we need to record the safepoint for the ldrexd rather than strexd. IGET_WIDE was simply missing the memory barrier. Bug: 18993519 Change-Id: I4e9270b994f413c1a047c1c4bb9cce5f29e42cb4
|
7e499925f8b4da46ae51040e9322690f3df992e6 |
|
06-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Remove LowestSetBit and IsPowerOfTwo Remove those functions from Mir2Lir and replace with functionality from utils.h. Change-Id: Ieb67092b22d5d460b5241c7c7931c15b9faf2815
|
1cc7dbabd03e0a6c09d68161417a21bd6f9df371 |
|
18-Dec-2014 |
Andreas Gampe <agampe@google.com> |
ART: Reorder entrypoint argument order Shuffle the ArtMethod* referrer backwards for easier removal. Clean up ARM & MIPS assembly code. Change some macros to make future changes easier. Change-Id: Ie2862b68bd6e519438e83eecd9e1611df51d7945
|
8b858e16563ebf8e522df026a6ab409f1bd9b3de |
|
27-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Quick: Redefine the notion of back-egdes. Redefine a back-edge to really mean an edge to a loop head instead of comparing instruction offsets. Generate suspend checks also on fall-through to a loop head; insert an extra GOTO for these edges. Add suspend checks to fused cmp instructions. Rewrite suspend check elimination to track whether there is an invoke on each path from the loop head to a given back edge, instead of using domination info to look for a basic block with invoke that must be on each path. Ignore invokes to intrinsics and move the optimization to a its own pass. The new loops in 109-suspend-check should prevent intrinsics and fused cmp-related regressions. Bug: 18522004 Change-Id: I96ac818f76ccf9419a6e70e9ec00555f9d487a9e
|
6af820639c74e769ffc1f54930f6ebc11364f894 |
|
26-Nov-2014 |
Yevgeny Rouban <yevgeny.y.rouban@intel.com> |
ART: x86 specific clearing higher bits when converting long to int The following problem description is taken from https://android-review.googlesource.com/107261 If destination and source of long-to-int is the same physical register on 64-bit then we do not emit any instructions but consider that destination is a 32-bit view of source register. As a result high part contains garbage. If the destination is used later as index to array access then this garbage is used in computation of address because address is 64-bit. For all other cases garbage is just ignored. A generic solution (113023) for all hw platforms was suggested but rejected later for the sake of HW specific solution: https://android-review.googlesource.com/113023 https://android-review.googlesource.com/114436 This patch is a rework of patch 113023 to stick with x86_64 specific changes: for 64-bit target this patch forces generating reg-to-reg copy if the src and dest are the same physical registers. This makes the higher bits be zeroed by 32-bit move instruction. Change-Id: Id29af839506ff9319ffba08b2e86e240fef4dafd Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com> Signed-off-by: Yevgeny Rouban <yevgeny.y.rouban@intel.com>
|
eace45873190a27302b3644c32ec82854b59d299 |
|
25-Nov-2014 |
Mathieu Chartier <mathieuc@google.com> |
Move dexCacheStrings from ArtMethod to Class Adds one load for const strings which are not direct. Saves >= 60KB of memory avg per app. Image size: -350KB. Bug: 17643507 Change-Id: I2d1a3253d9de09682be9bc6b420a29513d592cc8 (cherry picked from commit f521f423b66e952f746885dd9f6cf8ef2788955d)
|
f521f423b66e952f746885dd9f6cf8ef2788955d |
|
25-Nov-2014 |
Mathieu Chartier <mathieuc@google.com> |
Move dex cache strings from ArtMethod -> Class Adds one load for const strings which are not direct. Saves >= 60KB of memory avg per app. Image size: -350KB. Bug: 17643507 Change-Id: I2d1a3253d9de09682be9bc6b420a29513d592cc8
|
743b98cd3d7db1cfd6b3d7f7795e8abd9d07a42d |
|
24-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Skip null check in MarkGCCard() for known non-null values. Use GVN's knowledge of non-null values to set a new MIR flag for IPUT/SPUT/APUT to skip the value null check. Change-Id: I97a8d1447acb530c9bbbf7b362add366d1486ee1
|
da96aeda912ff317de2c41e5a49bd244427238ac |
|
27-Oct-2014 |
Chao-ying Fu <chao-ying.fu@intel.com> |
ART: Generate switch targets from successor blocks This patch relies on the successor blocks to generate switch targets in GenSmallPackedSwitch and GenSmallSparseSwitch for all quick targets. In x86, we create a new packed switch table by storing basic block ids instead of dex offsets, and we override MarkPackedCaseLabels and InsertCaseLabel to avoid calling FindBlock. Change-Id: Ibb5983db582f0965aba787b520bd106522453564 Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
|
807140048f82a2b87ee5bcf337f23b6a3d1d5269 |
|
21-Nov-2014 |
Mathieu Chartier <mathieuc@google.com> |
Add fast string sharpening String sharpening changes const strings to PC relative loads instead of always going through the dex cache. This saves code size and probably improves performance slightly. Before: 49602992 system@framework@boot.oat After: 49385904 system@framework@boot.oat Pre-cursor to removing dex_cache_strings_ field from ArtMethod. Bug: 17643507 Change-Id: I1787f48774631eee0accafeea257aa8d0e91e8d6
|
af6925b7fe5dc5a3c8d52ee3370e86e75400f873 |
|
31-Oct-2014 |
Vladimir Marko <vmarko@google.com> |
Rewrite GVN's field id and field type handling. Create a helper unit for dex insn classification and cache dex field type (as encoded in the insn) in the MirFieldInfo. Use this for cleanup and a few additional DCHECKs. Change the GVN's field id to match the field lowering info index (MIR::meta::{i,s}field_lowering_info), except where multiple indexes refer to the same field and we use the lowest of the applicable indexes. Use the MirMethodInfo from MIRGraph to retrieve field type for GVN using this index. This slightly reduces GVN compilation time and prepares for further compilation time improvements. Change-Id: I1b1247cdb8e8b6897254e2180f3230f10159bed5
|
bf535be514570fc33fc0a6347a87dcd9097d9bfd |
|
19-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Add card mark to filled-new-array. Bug: 18032332 Change-Id: I35576b27f9115e4d0b02a11afc5e483b9e93a04a
|
d582fa4ea62083a7598dded5b82dc2198b3daac7 |
|
06-Nov-2014 |
Ian Rogers <irogers@google.com> |
Instruction set features for ARM64, MIPS and X86. Also, refactor how feature strings are handled so they are additive or subtractive. Make MIPS have features for FPU 32-bit and MIPS v2. Use in the quick compiler rather than #ifdefs that wouldn't have worked in cross-compilation. Add SIMD features for x86/x86-64 proposed in: https://android-review.googlesource.com/#/c/112370/ Bug: 18056890 Change-Id: Ic88ff84a714926bd277beb74a430c5c7d5ed7666
|
8073ba11c62ea8aeb8322b18e730005733c06e70 |
|
13-Nov-2014 |
Pavel Vyssotski <pavel.n.vyssotski@intel.com> |
ART: Implicit null check should break def tracking Implicit null check can provoke exception that needs to be sure all VRs are saved on stack. The fix is to reset the def tracking system at the moment of adding an implicit null check. b/18368901 Signed-off-by: Pavel Vyssotski <pavel.n.vyssotski@intel.com> (cherry picked from commit 9c3617a8f7413bb1181e72bc1f7086d986a86e18) Change-Id: If7d705b8417cf29076da9ce19ec54d698f3cb0d5
|
9c3617a8f7413bb1181e72bc1f7086d986a86e18 |
|
13-Nov-2014 |
Pavel Vyssotski <pavel.n.vyssotski@intel.com> |
ART: Implicit null check should break def tracking Implicit null check can provoke exception that needs to be sure all VRs are saved on stack. The fix is to reset the def tracking system at the moment of adding an implicit null check. Change-Id: Ie8a32b727086438e04e745d4a3f87f096ff36cac Signed-off-by: Pavel Vyssotski <pavel.n.vyssotski@intel.com>
|
27503548bc8945da875240751dcd4b1495584669 |
|
06-Nov-2014 |
Serguei Katkov <serguei.i.katkov@intel.com> |
Use correct register class for refs LoadValue requires thar ref location should reguest kRefReg register class. The patch fixes GenFilledNewArray to specify the register class correctly. This is a fix for the crash of dex2oat on 412-new-array unit test. This is a second attempt with an additional fix for arm64. Change-Id: I9f0bb098cd1d1721ef03e8976c1460f8fa49aa2a Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
29b55354f5856c353c0717ce2b570fabbec550ee |
|
11-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Revert "Use correct register class for refs" This reverts commit 5c2555407d823356fb55ea3ffdf281aac00a583e. Change-Id: I0490e9b1a9470e429f31911c9a4f28f71df78cc1
|
5c2555407d823356fb55ea3ffdf281aac00a583e |
|
06-Nov-2014 |
Serguei Katkov <serguei.i.katkov@intel.com> |
Use correct register class for refs LoadValue requires thar ref location should reguest kRefReg register class. The patch fixes GenFilledNewArray to specify the register class correctly. This is a fix for the crash of dex2oat on 412-new-array unit test. Change-Id: I58d969ddac0d84d4024bf686b5b0c12337ca9a37 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
675e09b2753c2fcd521bd8f0230a0abf06e9b0e9 |
|
23-Oct-2014 |
Ningsheng Jian <ningsheng.jian@arm.com> |
ARM: Strength reduction for floating-point division For floating-point division by power of two constants, generate multiplication by the reciprocal instead. Change-Id: I39c79eeb26b60cc754ad42045362b79498c755be
|
277ccbd200ea43590dfc06a93ae184a765327ad0 |
|
04-Nov-2014 |
Andreas Gampe <agampe@google.com> |
ART: More warnings Enable -Wno-conversion-null, -Wredundant-decls and -Wshadow in general, and -Wunused-but-set-parameter for GCC builds. Change-Id: I81bbdd762213444673c65d85edae594a523836e5
|
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f |
|
31-Oct-2014 |
Ian Rogers <irogers@google.com> |
Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags. Fix associated errors about unused paramenters and implict sign conversions. For sign conversion this was largely in the area of enums, so add ostream operators for the effected enums and fix tools/generate-operator-out.py. Tidy arena allocation code and arena allocated data types, rather than fixing new and delete operators. Remove dead code. Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
|
7d4ecd5edba1e4d4147692d67d9177154958801c |
|
30-Oct-2014 |
Ian Rogers <irogers@google.com> |
Avoid signed integer overflow. Caught by -ftrapv there is a benign integer overflow if a switch has the value max int within it (as in run-test 095). Change-Id: I86bb8ce2f9097cb367c031ec51d58d01b31313e2
|
66c6d7bdfdd535e6ecf4461bba3804f1a7794fcd |
|
16-Oct-2014 |
Vladimir Marko <vmarko@google.com> |
Rewrite class initialization check elimination. Split the notion of type being in dex cache away from the class being initialized. Include static invokes in the class initialization elimination pass. Change-Id: Ie3760d8fd55b987f9507f32ef51456a57d79e3fb
|
677cd61ad05d993c4d3b22656675874f06d6aabc |
|
15-Oct-2014 |
Ian Rogers <irogers@google.com> |
Make ART compile with GCC -O0 again. Tidy up InstructionSetFeatures so that it has a type hierarchy dependent on architecture. Add to instruction_set_test to warn when InstructionSetFeatures don't agree with ones from system properties, AT_HWCAP and /proc/cpuinfo. Clean-up class linker entry point logic to not return entry points but to test whether the passed code is the particular entrypoint. This works around image trampolines that replicate entrypoints. Bug: 17993736 (cherry picked from commit 6f3dbbadf4ce66982eb3d400e0a74cb73eb034f3) Change-Id: I3e7595f437db4828072589d475a5453b7f31003e
|
6f3dbbadf4ce66982eb3d400e0a74cb73eb034f3 |
|
15-Oct-2014 |
Ian Rogers <irogers@google.com> |
Make ART compile with GCC -O0 again. Tidy up InstructionSetFeatures so that it has a type hierarchy dependent on architecture. Add to instruction_set_test to warn when InstructionSetFeatures don't agree with ones from system properties, AT_HWCAP and /proc/cpuinfo. Clean-up class linker entry point logic to not return entry points but to test whether the passed code is the particular entrypoint. This works around image trampolines that replicate entrypoints. Bug: 17993736 Change-Id: I5f4b49e88c3b02a79f9bee04f83395146ed7be23
|
5c5676b26a08454b3f0133783778991bbe5dd681 |
|
30-Sep-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
ART: Add div/rem zero check elimination flag Just as with other throwing bytecodes, it is possible to prove in some cases that a divide/remainder won't throw ArithmeticException. For example, in case two divides with same denominator are in order, then provably the second one cannot throw if the first one did not. This patch adds the elimination flag and updates the signature of several Mir2Lir methods to take the instruction optimization flags into account. Change-Id: I0b078cf7f29899f0f059db1f14b65a37444b84e8 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
fc787ecd91127b2c8458afd94e5148e2ae51a1f5 |
|
10-Oct-2014 |
Ian Rogers <irogers@google.com> |
Enable -Wimplicit-fallthrough. Falling through switch cases on a clang build must now annotate the fallthrough with the FALLTHROUGH_INTENDED macro. Bug: 17731372 Change-Id: I836451cd5f96b01d1ababdbf9eef677fe8fa8324
|
832336b3c9eb892045a8de1bb12c9361112ca3c5 |
|
09-Oct-2014 |
Ian Rogers <irogers@google.com> |
Don't copy fill array data to quick literal pool. Currently quick copies the fill array data from the dex file to the literal pool. It then has to go through hoops to pass this PC relative address down to out-of-line code. Instead, pass the offset of the table to the out-of-line code and use the CodeItem data associated with the ArtMethod. This reduces the size of oat code while greatly simplifying it. Unify the FillArrayData implementation in quick, portable and the interpreters. Change-Id: I9c6971cf46285fbf197856627368c0185fdc98ca
|
7c02e918e752ab36f0b6cab7528f10c0cf55a4ee |
|
03-Oct-2014 |
buzbee <buzbee@google.com> |
Quick compiler: Fix ambiguous LoadValue() Internal b/17790197 & hat tip to Stephen Kyle The following custom-edited dex program demonstrated incorrect code generation caused by type confusion. In the example, the constant held in v0 is used in both float and int contexts, and the register class gets confused at the if-eq. .method private static getInt()I .registers 4 const/16 v0, 100 const/4 v1, 1 const/4 v2, 7 :loop if-eq v2, v0, :done add-int v2, v2, v1 goto :loop :done add-float v3, v0, v1 return v2 .end method The bug was introduced in c/96499, "Quick compiler: reference cleanup" That CL created a convenience variant of LoadValue which selected the target register type based on the type of the RegLocation. It should not have done so. The type of a RegLocation is the compiler's best guess of the Dalvik type - and Dalvik allows constants to be used in multiple type contexts. All code generation utilities must specify desired register class based on the capabilities of the instructions to be emitted. In the failing case, OpCmpImmBranch (and GenCompareZeroAndBranch) will be using core registers, so the LoadValue must specify either kCoreReg or kRefReg. The CL deletes the dangerous LoadValue() variant. Change-Id: Ie4ec6e51b19676dbbb9628c72c8b3473a419e7ec
|
e39c54ea575ec710d5e84277fcdcc049f8acb3c9 |
|
22-Sep-2014 |
Vladimir Marko <vmarko@google.com> |
Deprecate GrowableArray, use ArenaVector instead. Purge GrowableArray from Quick and Portable. Remove GrowableArray<T>::Iterator. Change-Id: I92157d3a6ea5975f295662809585b2dc15caa1c6
|
8d0d03e24325463f0060abfd05dba5598044e9b1 |
|
07-Jun-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
ART: Change temporaries to positive names Changes compiler temporaries to have positive names. The numbering now puts them above the code VRs (locals + ins, in that order). The patch also introduces APIs to query the number of temporaries, locals and ins. The compiler temp infrastructure suffered from several issues which are also addressed by this patch: -There is no longer a queue of compiler temps. This would be polluted with Method* when post opts were called multiple times. -Sanity checks have been added to allow requesting of temps from BE and to prevent temps after frame is committed. -None of the structures holding temps can overflow because they are allocated to allow holding maximum temps. Thus temps can be requested by BE with no problem. -Since the queue of compiler temps is no longer maintained, it is no longer possible to refer to a temp that has invalid ssa (because it was requested before ssa was run). -The BE can now request temps after all ME allocations and it is guaranteed to actually receive them. -ME temps are now treated like normal VRs in all cases with no special handling. Only the BE temps are handled specially because there are no references to them from MIRs. -Deprecated and removed several fields in CompilationUnit that saved register information and updated callsites to call the new interface from MIRGraph. Change-Id: Ia8b1fec9384a1a83017800a59e5b0498dfb2698c Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
|
37f05ef45e0393de812d51261dc293240c17294d |
|
17-Jul-2014 |
Fred Shih <ffred@google.com> |
Reduced memory usage of primitive fields smaller than 4-bytes Reduced memory used by byte and boolean fields from 4 bytes down to a single byte and shorts and chars down to two bytes. Fields are now arranged as Reference followed by decreasing component sizes, with fields shuffled forward as needed. Bug: 8135266 Change-Id: I65eaf31ed27e5bd5ba0c7d4606454b720b074752
|
8c18c2aaedb171f9b03ec49c94b0e33449dc411b |
|
06-Aug-2014 |
Andreas Gampe <agampe@google.com> |
ART: Generate chained compare-and-branch for short switches Refactor Mir2Lir to generate chained compare-and-branch sequences for short switches on all architectures. Bug: 16241558 (cherry picked from commit 48971b3242e5126bcd800cc9c68df64596b43d13) Change-Id: I0bb3071b8676523e90e0258e9b0e3fd69c1237f4
|
e7f82e2515f47f3c3292281312d7031a34a58ffc |
|
06-Aug-2014 |
Fred Shih <ffred@google.com> |
Added support for patching classes from different dex files. Added support for class patching from different dex files and moved ScopedObjectAccess from the quick compiler to driver. Slight refactoring for clarity. Bug: 16656190 Change-Id: I107fcbce75db42ca61321ea1c5d5f236680a1b3d
|
48971b3242e5126bcd800cc9c68df64596b43d13 |
|
06-Aug-2014 |
Andreas Gampe <agampe@google.com> |
ART: Generate chained compare-and-branch for short switches Refactor Mir2Lir to generate chained compare-and-branch sequences for short switches on all architectures. Change-Id: Ie2a572ae69d462ba68a119e9fb93ae538cddd08f
|
c76c614d681d187d815760eb909e5faf488a3c35 |
|
05-Aug-2014 |
Andreas Gampe <agampe@google.com> |
ART: Refactor long ops in quick compiler Make GenArithOpLong virtual. Let the implementation in gen_common be very basic, without instruction-set checks, and meant as a fall-back. Backends should implement and dispatch to code for better implementations. This allows to remove the GenXXXLong virtual methods from Mir2Lir, and clean up the backends (especially removing some LOG(FATAL) implementations). Change-Id: I6366443c0c325c1999582d281608b4fa229343cf
|
c763e350da562b0c6bebf10599588d4901140e45 |
|
04-Jul-2014 |
Matteo Franchin <matteo.franchin@arm.com> |
AArch64: Implement InexpensiveConstant methods. Implement IsInexpensiveConstant and friends for A64. Also extending the methods to take the opcode with respect to which the constant is inexpensive. Additionally, logical operations (i.e. and, or, xor) can now handle the immediates 0 and ~0 (which are not logical immediates). Change-Id: I46ce1287703765c5ab54983d13c1b3a1f5838622
|
b99b8d6cffe08d8c9d30175c936e5c88d3101802 |
|
31-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Fix verifier mishandling erroneous array component types The verifier must not assume that component types are not erroneous. Bug: 16661259 (cherry picked from commit aa910d5ef43256102809e397de305c23f1c315e6) Change-Id: I6d607310593ac337616581bfdff5eb29a8dd1b9d
|
aa910d5ef43256102809e397de305c23f1c315e6 |
|
31-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Fix verifier mishandling erroneous array component types The verifier must not assume that component types are not erroneous. Bug: 16661259 Change-Id: I23b2f517259ca9c0b8a1aa38f6348fcd61e0b22e
|
0237ac84b1459cb1718dce23f3572ae2fe1bd77e |
|
26-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Special-case cb(n)z even for in-reg constant Call out to OpCmpImmBranch in GenCompareAndBranch if the constant is zero and we are testing == or !=, even when zero has been loaded to a register already. This avoids a register size mismatch on 64b architectures when basically doing a null check, and generally squashes a cmp + branch to a cbz or cbnz on Arm and Mips. X86 is not degraded. Bug: 16562601 (cherry picked from commit b07c1f9f4d6088ca2d4c1a10819e57b19acf7f22) Change-Id: I42701e827feb848470aa991297755d808fa0a077
|
984305917bf57b3f8d92965e4715a0370cc5bcfb |
|
28-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Rework quick entrypoint code in Mir2Lir, cleanup To reduce the complexity of calling trampolines in generic code, introduce an enumeration for entrypoints. Introduce a header that lists the entrypoint enum and exposes a templatized method that translates an enum value to the corresponding thread offset value. Call helpers are rewritten to have an enum parameter instead of the thread offset. Also rewrite LoadHelper and GenConversionCall this way. It is now LoadHelper's duty to select the right thread offset size. Introduce InvokeTrampoline virtual method to Mir2Lir. This allows to further simplify the call helpers, as well as make OpThreadMem specific to X86 only (removed from Mir2Lir). Make GenInlinedCharAt virtual, move a copy to X86 backend, and simplify both copies. Remove LoadBaseIndexedDisp and OpRegMem from Mir2Lir, as they are now specific to X86 only. Remove StoreBaseIndexedDisp from Mir2Lir, as it was only ever used in the X86 backend. Remove OpTlsCmp from Mir2Lir, as it was only ever used in the X86 backend. Remove OpLea from Mir2Lir, as it was only ever defined in the X86 backend. Remove GenImmedCheck from Mir2Lir as it was neither used nor implemented. Change-Id: If0a6182288c5d57653e3979bf547840a4c47626e
|
b07c1f9f4d6088ca2d4c1a10819e57b19acf7f22 |
|
26-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Special-case cb(n)z even for in-reg constant Call out to OpCmpImmBranch in GenCompareAndBranch if the constant is zero and we are testing == or !=, even when zero has been loaded to a register already. This avoids a register size mismatch on 64b architectures when basically doing a null check, and generally squashes a cmp + branch to a cbz or cbnz on Arm and Mips. X86 is not degraded. Bug: 16562601 Change-Id: I1997760f43dc186a84247ad30ae91053f71d102d
|
bebee4fd10e5db6cb07f59bc0f73297c900ea5f0 |
|
16-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Refactor GenSelect, refactor gen_common accordingly This adds a GenSelect method meant for selection of constants. The general-purpose GenInstanceof code is refactored to take advantage of this. This cleans up code and squashes a branch-over on ARM64 to a cset. Also add a slow-path for type initialization in GenInstanceof. Bug: 16241558 (cherry picked from commit 90969af6deb19b1dbe356d62fe68d8f5698d3d8f) Change-Id: Ie4494858bb8c26d386cf2e628172b81bba911ae5
|
9ee4519afd97121f893f82d41d23164fc6c9ed34 |
|
17-Jul-2014 |
Serguei Katkov <serguei.i.katkov@intel.com> |
x86: GenSelect utility update The is follow-up https://android-review.googlesource.com/#/c/101396/ to make x86 GenSelectConst32 implementation complete. Change-Id: I69f318e18093f9a5b00f8f00f0f1c2e4ff7a9ab2 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
147eb41b53729ec8d5c188d1cac90964a51afb8a |
|
11-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86"""" This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1. Bug: 16256184 Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73 Conflicts: compiler/dex/quick/arm64/target_arm64.cc compiler/image_test.cc runtime/fault_handler.cc
|
90969af6deb19b1dbe356d62fe68d8f5698d3d8f |
|
16-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Refactor GenSelect, refactor gen_common accordingly This adds a GenSelect method meant for selection of constants. The general-purpose GenInstanceof code is refactored to take advantage of this. This cleans up code and squashes a branch-over on ARM64 to a cset. Also add a slow-path for type initialization in GenInstanceof. Change-Id: Ie4494858bb8c26d386cf2e628172b81bba911ae5
|
69dfe51b684dd9d510dbcb63295fe180f998efde |
|
11-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86"""" This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1. Bug: 16256184 Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73
|
d9cb8ae2ed78f957a773af61759432d7a7bf78af |
|
09-Jul-2014 |
Douglas Leung <douglas@mips.com> |
Fix art test failures for Mips. This patch fixes the following art test failures for Mips: 003-omnibus-opcodes 030-bad-finalizer 041-narrowing 059-finalizer-throw Change-Id: I4e0e9ff75f949c92059dd6b8d579450dc15f4467 Signed-off-by: Douglas Leung <douglas@mips.com>
|
48f5c47907654350ce30a8dfdda0e977f5d3d39f |
|
27-Jun-2014 |
Hans Boehm <hboehm@google.com> |
Replace memory barriers to better reflect Java needs. Replaces barriers that enforce ordering of one access type (e.g. Load) with respect to another (e.g. store) with more general ones that better reflect both Java requirements and actual hardware barrier/fence instructions. The old code was inconsistent and unclear about which barriers implied which others. Sometimes multiple barriers were generated and then eliminated; sometimes it was assumed that certain barriers implied others. The new barriers closely parallel those in C++11, though, for now, we use something closer to the old naming. Bug: 14685856 Change-Id: Ie1c80afe3470057fc6f2b693a9831dfe83add831
|
ccc60264229ac96d798528d2cb7dbbdd0deca993 |
|
05-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Rework TargetReg(symbolic_reg, wide) Make the standard implementation in Mir2Lir and the specialized one in the x86 backend return a pair when wide = "true". Introduce WideKind enumeration to improve code readability. Simplify generic code based on this implementation. Change-Id: I670d45aa2572eedfdc77ac763e6486c83f8e26b4
|
7fb36ded9cd5b1d254b63b3091f35c1e6471b90e |
|
10-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Add implicit null and stack checks for x86"" Fixes x86_64 cross compile issue. Removes command line options and property to set implicit checks - this is hard coded now. This reverts commit 3d14eb620716e92c21c4d2c2d11a95be53319791. Change-Id: I5404473b5aaf1a9c68b7181f5952cb174d93a90d
|
0025a86411145eb7cd4971f9234fc21c7b4aced1 |
|
11-Jul-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Revert "Revert "Add implicit null and stack checks for x86""" Broke the build. This reverts commit 7fb36ded9cd5b1d254b63b3091f35c1e6471b90e. Change-Id: I9df0e7446ff0913a0e1276a558b2ccf6c8f4c949
|
4eca9f53ce6c4e59ebad93a049e86d4965e87c9d |
|
07-Jul-2014 |
Serguei Katkov <serguei.i.katkov@intel.com> |
Slow path for iget should expect return in core reg Slow path for iget invokes the C implementation. In all cases the C function returns the result in core reg. So implementation should expect the result in core reg independent on whether it is fp or not. Change-Id: I57fb0e684c38af22316398d8071f087bd4bd253c Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
34e826ccc80dc1cf7c4c045de6b7f8360d504ccf |
|
29-May-2014 |
Dave Allison <dallison@google.com> |
Add implicit null and stack checks for x86 This adds compiler and runtime changes for x86 implicit checks. 32 bit only. Both host and target are supported. By default, on the host, the implicit checks are null pointer and stack overflow. Suspend is implemented but not switched on. Change-Id: I88a609e98d6bf32f283eaa4e6ec8bbf8dc1df78a
|
8159af6320f7413a9d0fff1e29220c01f29c6e96 |
|
08-Jul-2014 |
Chao-ying Fu <chao-ying.fu@intel.com> |
ART: Check slow_paths_.Size() every time This patch fixes a bug, when a new slow path is created during slowpath->Compile(). Change-Id: I4896a82781102694c25f4483112c6de3c56e072c Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
|
3d14eb620716e92c21c4d2c2d11a95be53319791 |
|
10-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Add implicit null and stack checks for x86" It breaks cross compilation with x86_64. This reverts commit 34e826ccc80dc1cf7c4c045de6b7f8360d504ccf. Change-Id: I34ba07821fc0a022fda33a7ae21850957bbec5e7
|
a77ee5103532abb197f492c14a9e6fb437054e2a |
|
02-Jul-2014 |
Chao-ying Fu <chao-ying.fu@intel.com> |
x86_64: TargetReg update for x86 Also includes changes in common code. Elimination of use of TargetReg with one parameter and direct access to special target registers. Change-Id: Ied2c1f87d4d1e4345248afe74bca40487a46a371 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com> Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
|
b5860fb459f1ed71f39d8a87b45bee6727d79fe8 |
|
22-Jun-2014 |
buzbee <buzbee@google.com> |
Register promotion support for 64-bit targets Not sufficiently tested for 64-bit targets, but should be fairly close. A significant amount of refactoring could stil be done, (in later CLs). With this change we are not making any changes to the vmap scheme. As a result, it is a requirement that if a vreg is promoted to both a 32-bit view and the low half of a 64-bit view it must share the same physical register. We may change this restriction later on to allow for more flexibility for 32-bit Arm. For example, if v4, v5, v4/v5 and v5/v6 are all hot enough to promote, we'd end up with something like: v4 (as an int) -> r10 v4/v5 (as a long) -> r10 v5 (as an int) -> r11 v5/v6 (as a long) -> r11 Fix a couple of ARM64 bugs on the way... Change-Id: I6a152b9c164d9f1a053622266e165428045362f3
|
4b537a851b686402513a7c4a4e60f5457bb8d7c1 |
|
01-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Quick compiler: More size checks, add TargetReg variants Add variants for TargetReg for requesting specific register usage, e.g., wide and ref. More register size checks. With code adapted from https://android-review.googlesource.com/#/c/98605/. Change-Id: I852d3be509d4dcd242c7283da702a2a76357278d
|
2db3e269e3051dacb3c8a4af8f03fdad9b0fd740 |
|
26-Jun-2014 |
Douglas Leung <douglas@mips.com> |
Fix quick mode bugs for Mips. This patch enable quick mode for Mips and allows the emulator to boot. However the emulator is still not 100% functional. It still have problems launching some apps. Change-Id: Id46a39a649a2fd431a9f13b06ecf34cbd1d20930 Signed-off-by: Douglas Leung <douglas@mips.com>
|
de68676b24f61a55adc0b22fe828f036a5925c41 |
|
24-Jun-2014 |
Andreas Gampe <agampe@google.com> |
Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter" This reverts commit 2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d. Breaks the build. Change-Id: I9faad4e9a83b32f5f38b2ef95d6f9a33345efa33
|
3c12c512faf6837844d5465b23b9410889e5eb11 |
|
24-Jun-2014 |
Andreas Gampe <agampe@google.com> |
Revert "Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter"" This reverts commit de68676b24f61a55adc0b22fe828f036a5925c41. Fixes an API comment, and differentiates between inserting and appending. Change-Id: I0e9a21bb1d25766e3cbd802d8b48633ae251a6bf
|
2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d |
|
23-Jun-2014 |
Andreas Gampe <agampe@google.com> |
ART: Split out more cases of Load/StoreRef, volatile as parameter Splits out more cases of ref registers being loaded or stored. For code clarity, adds volatile as a flag parameter instead of a separate method. On ARM64, continue cleanup. Add flags to print/fatal on size mismatches. Change-Id: I30ed88433a6b4ff5399aefffe44c14a5e6f4ca4e
|
5655e84e8d71697d8ef3ea901a0b853af42c559e |
|
18-Jun-2014 |
Andreas Gampe <agampe@google.com> |
ART: Implicit checks in the compiler are independent from Runtime When cross-compiling, those flags are independent. This is an initial CL that helps bypass fatal failures when cross-compiling, as not all architectures support (and have turned on) implicit checks. The actual transport for the target architecture when it is different from the runtime needs to be implemented in a follow-up CL. Bug: 15703710 Change-Id: Idc881a9a4abfd38643b862a491a5af9b8841f693
|
7048ae42d2b02d62110e297221460cacf1f6db73 |
|
02-Jun-2014 |
Ian Rogers <irogers@google.com> |
Make class status volatile. Discourage loads and stores from reordering around the status being updated. Bug: 15347354 (cherry-picked from 03dbc04d1d5a3bd62801989b16e994a9ed0dafb5) Change-Id: I29a4144bd3a145017831540ce1130da5401e39c0
|
33ae5583bdd69847a7316ab38a8fa8ccd63093ef |
|
12-Jun-2014 |
buzbee <buzbee@google.com> |
Arm64 hard-float Basic enabling of hard-float for Arm64. In future CLs we'll consolidate the various targets - there is a lot of overlap. Compilation remains turned off in this CL, but I expect to enable a subset shortly. With compilation fully enabled (including the EXPERIMENTAL opcodes with the exception of REM and THROW), we get the following run-test results: 003-omnibus-opcode failures: Classes.checkCast Classes.arrayInstance UnresTest2 Haven't gone deep, but these appear to be related to throw/catch and/or stacktrace. For REM, the generated code looks reasonable to me - my guess is that we've got something wrong on the transition to the runtime. Haven't looked deeper yet, though. The bulk of the other failure also appear to be related to transitioning to the runtime system, or handling try/catch. run-test status: Status with optimizations disabled, REM_FLOAT/DOUBLE and THROW disabled: succeeded tests: 94 failed tests: 22 failed: 003-omnibus-opcodes failed: 004-annotations failed: 009-instanceof2 failed: 024-illegal-access failed: 025-access-controller failed: 031-class-attributes failed: 044-proxy failed: 045-reflect-array failed: 046-reflect failed: 058-enum-order failed: 062-character-encodings failed: 063-process-manager failed: 064-field-access failed: 068-classloader failed: 071-dexfile failed: 083-compiler-regressions failed: 084-class-init failed: 086-null-super failed: 087-gc-after-link failed: 100-reflect2 failed: 107-int-math2 failed: 201-built-in-exception-detail-messages Change-Id: Ib66209285cad8998d77a14781de300af02a96b15
|
7e399fd3a99ba9c9dbfafdf14f75dd318fa7d454 |
|
11-Jun-2014 |
Chao-ying Fu <chao-ying.fu@intel.com> |
x86_64: Disable all optimizations and fix bugs This disables all optimizations and ensures that art tests still pass. Change-Id: I43217378d6889bb04f4d064f8d53cb3ff4c20aa0 Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com> Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com> Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
8dea81ca9c0201ceaa88086b927a5838a06a3e69 |
|
06-Jun-2014 |
Vladimir Marko <vmarko@google.com> |
Rewrite use/def masks to support 128 bits. Reduce LIR memory usage by holding masks by pointers in the LIR rather than directly and using pre-defined const masks for the common cases, allocating very few on the arena. Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
|
a014776f4474579d4dfc72e3374ba45c6f6e5f35 |
|
07-Jun-2014 |
Chao-ying Fu <chao-ying.fu@intel.com> |
x86_64: Add long bytecode supports (2/2) This patch adds implementation of math and complex long bytcodes, and basic long arithmetic. Change-Id: I811397d7e0ee8ad0d12b23d32ba58314d479d714 Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com> Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com> Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com> Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
a0cd2d701f29e0bc6275f1b13c0edfd4ec391879 |
|
01-Jun-2014 |
buzbee <buzbee@google.com> |
Quick compiler: reference cleanup For 32-bit targets, object references are 32 bits wide both in Dalvik virtual registers and in core physical registers. Because of this, object references and non-floating point values were both handled as if they had the same register class (kCoreReg). However, for 64-bit systems, references are 32 bits in Dalvik vregs, but 64 bits in physical registers. Although the same underlying physical core registers will still be used for object reference and non-float values, different register class views will be used to represent them. For example, an object reference in arm64 might be held in x3 at some point, while the same underlying physical register, w3, would be used to hold a 32-bit int. This CL breaks apart the handling of object reference and non-float values to allow the proper register class (or register view) to be used. A new register class, kRefReg, is introduced which will map to a 32-bit core register on 32-bit targets, and 64-bit core registers on 64-bit targets. From this point on, object references should be allocated registers in the kRefReg class rather than kCoreReg. Change-Id: I6166827daa8a0ea3af326940d56a6a14874f5810
|
03dbc04d1d5a3bd62801989b16e994a9ed0dafb5 |
|
02-Jun-2014 |
Ian Rogers <irogers@google.com> |
Make class status volatile. Discourage loads and stores from reordering around the status being updated. Bug: 15347354 Change-Id: Ice805cb834617747c8209e98a142d3e5c7585719
|
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 |
|
03-Jun-2014 |
Tim Murray <timmurray@google.com> |
DO NOT MERGE Merge ART from AOSP to lmp-preview-dev. Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
|
0955f7e470fb733aef07096536e9fba7c99250aa |
|
23-May-2014 |
Matteo Franchin <matteo.franchin@arm.com> |
AArch64: fixing some assertions. Fixing some assertions while attempting to get libartd.so to work. Fixing also the shift logic in LoadBaseIndexed() and StoreBaseIndexed(). This commit only fixes a part of the assertion issues. Change-Id: I473194d4260dd59a8ee6d73114429728c977ee0e
|
ed65c5e982705defdb597d94d1aa3f2997239c9b |
|
22-May-2014 |
Serban Constantinescu <serban.constantinescu@arm.com> |
AArch64: Enable LONG_* and INT_* opcodes. This patch fixes some of the issues with LONG and INT opcodes. The patch has been tested and passes all the dalvik tests except for 018 and 107. Change-Id: Idd1923ed935ee8236ab0c7e5fa969eaefeea8708 Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
|
e87f9b5185379c8cf8392d65a63e7bf7e51b97e7 |
|
30-Apr-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Allow X86 QBE to be extended Enhancements and updates to allow X86Mir2LIR Backend to be subclassed for experimentation. Add virtual in a whole bunch of places, and make some other changes to get this to work. Change-Id: I0980a19bc5d5725f91660f98c95f1f51c17ee9b6 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
2f244e9faccfcca68af3c5484c397a01a1c3a342 |
|
08-May-2014 |
Andreas Gampe <agampe@google.com> |
ART: Add more ThreadOffset in Mir2Lir and backends This duplicates all methods with ThreadOffset parameters, so that both ThreadOffset<4> and ThreadOffset<8> can be handled. Dynamic checks against the compilation unit's instruction set determine which pointer size to use and therefore which methods to call. Methods with unsupported pointer sizes should fatally fail, as this indicates an issue during method selection. Change-Id: Ifdb445b3732d3dc5e6a220db57374a55e91e1bf6
|
674744e635ddbdfb311fbd25b5a27356560d30c3 |
|
24-Apr-2014 |
Vladimir Marko <vmarko@google.com> |
Use atomic load/store for volatile IGET/IPUT/SGET/SPUT. Bug: 14112919 Change-Id: I79316f438dd3adea9b2653ffc968af83671ad282
|
3bf7c60a86d49bf8c05c5d2ac5ca8e9f80bd9824 |
|
07-May-2014 |
Vladimir Marko <vmarko@google.com> |
Cleanup ARM load/store wide and remove unused param s_reg. Use a single LDRD/VLDR instruction for wide load/store on ARM, adjust the base pointer if needed. Remove unused parameter s_reg from LoadBaseDisp(), LoadBaseIndexedDisp() and StoreBaseIndexedDisp() on all architectures. Change-Id: I25a9a42d523a68addbc11abe44ddc55a4401df98
|
455759b5702b9435b91d1b4dada22c4cce7cae3c |
|
06-May-2014 |
Vladimir Marko <vmarko@google.com> |
Remove LoadBaseDispWide and StoreBaseDispWide. Just pass k64 or kDouble to non-wide versions. Change-Id: I000619c3b78d3a71db42edc747c8a0ba1ee229be
|
091cc408e9dc87e60fb64c61e186bea568fc3d3a |
|
31-Mar-2014 |
buzbee <buzbee@google.com> |
Quick compiler: allocate doubles as doubles Significant refactoring of register handling to unify usage across all targets & 32/64 backends. Reworked RegStorage encoding to allow expanded use of x86 xmm registers; removed vector registers as a separate register type. Reworked RegisterInfo to describe aliased physical registers. Eliminated quite a bit of target-specific code and generalized common code. Use of RegStorage instead of int for registers now propagated down to the NewLIRx() level. In future CLs, the NewLIRx() routines will be replaced with versions that are explicit about what kind of operand they expect (RegStorage, displacement, etc.). The goal is to eventually use RegStorage all the way to the assembly phase. TBD: MIPS needs verification. TBD: Re-enable liveness tracking. Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
|
8668c3cbdcf9471bd97e0da68a240051f2973074 |
|
25-Apr-2014 |
Mathieu Chartier <mathieuc@google.com> |
Add finalizer references from the entrypoints. We now have an invariant where we never allocate finalizable objects with the Initialized or Resolved entrypoints. This speeds up allocation by only doing the check in the slow path. Before: MemAllocTest: 3625, 3707, 3641 EvaluateAndApplyChanges: 3448, 3421, 3413 After: MemAllocTest: 3164, 3109, 3135 EvaluateAndApplyChanges: 3272, 3299, 3353 Bug: 14078487 Change-Id: I2b0534af3e7c75ea5e5257cf3647744f7abfb74e
|
3b004ba749beb0fa0f0601fb5cf1929cc46bc1f1 |
|
30-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Simplify GenConstString Now it's the same code for all platforms. Bug: 13506069 Change-Id: I42c08a9dc99a3079caad01602de84296c9357dd8
|
6ffcfa04ebb2660e238742a6000f5ccebdd5df15 |
|
25-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Rewrite suspend test check with LIRSlowPath. Change-Id: I2dc17d079655586bfc588349c7a04afc2c6879af
|
9c3b089519792245ab9f658865f44b8639b8d696 |
|
24-Apr-2014 |
Andreas Gampe <agampe@google.com> |
Revert "Revert "ART: Extracts an utility function of the duplicated code"" This reverts commit b5a14d2a6c18c1ea3c019c53b10af2e8f5dea234. Change-Id: Id09a4cc27ac22db940badf3a277848b38b173eae
|
b5a14d2a6c18c1ea3c019c53b10af2e8f5dea234 |
|
24-Apr-2014 |
Andreas Gampe <agampe@google.com> |
Revert "ART: Extracts an utility function of the duplicated code" This reverts commit 973cc95da6fb617bab133bd7a693c1cb7eafd393. Change-Id: I3883c74ba06116e89d89d9cf085f20cff5d15f77
|
973cc95da6fb617bab133bd7a693c1cb7eafd393 |
|
18-Apr-2014 |
Dmitry Petrochenko <dmitry.petrochenko@intel.com> |
ART: Extracts an utility function of the duplicated code This patch introduces an utility function 'DataOffsetOfType' in 'mirror::Array' class that calculates the data offset at given index in an array of given type. Change-Id: Idb19558653c70a129245f220f0fbb553f898865b Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
|
695d13a82d6dd801aaa57a22a9d4b3f6db0d0fdb |
|
19-Apr-2014 |
buzbee <buzbee@google.com> |
Update load/store utilities for 64-bit backends This CL replaces the typical use of LoadWord/StoreWord utilities (which, in practice, were 32-bit load/store) in favor of a new set that make the size explicit. We now have: LoadWordDisp/StoreWordDisp: 32 or 64 depending on target. Load or store the natural word size. Expect this to be used infrequently - generally when we know we're dealing with a native pointer or flushed register not holding a Dalvik value (Dalvik values will flush to home location sizes based on Dalvik, rather than the target). Load32Disp/Store32Disp: Load or store 32 bits, regardless of target. Load64Disp/Store64Disp: Load or store 64 bits, regardless of target. LoadRefDisp: Load a 32-bit compressed reference, and expand it to the natural word size in the target register. StoreRefDisp: Compress a reference held in a register of the natural word size and store it as a 32-bit compressed reference. Change-Id: I50fcbc8684476abd9527777ee7c152c61ba41c6f
|
3a74d15ccc9a902874473ac9632e568b19b91b1c |
|
22-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Delete throw launchpads. Bug: 13170824 Change-Id: I9d5834f5a66f5eb00f2ac80774e8c27dea99949e
|
80365d9bb947edef0eae0bfe62b9f7a239416e6b |
|
18-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Revert "Revert "Use LIRSlowPath for throwing ArrayOutOfBoundsException."" This adds back using LIRSlowPath for ArrayIndexOutOfBoundsException. And fix the host test crash. Change-Id: Idbb602f4bb2c5ce59233feb480a0ff1b216e4887
|
7fff544c38f0dec3a213236bb785c3ca13d21a0f |
|
18-Apr-2014 |
Brian Carlstrom <bdc@google.com> |
Revert "Use LIRSlowPath for throwing ArrayOutOfBoundsException." This reverts commit 9d46314a309aff327f9913789b5f61200c162609.
|
d15f4e2ef3b1b4c01a490a00b0f6dc744741ce01 |
|
18-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Fix a use of OpCondBranch that breaks the MIPS build. Change-Id: I09e19cb00c1e3f4bc0b0293f58674c9160094f7f
|
9d46314a309aff327f9913789b5f61200c162609 |
|
18-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Use LIRSlowPath for throwing ArrayOutOfBoundsException. Get rid of launchpads for throwing ArrayOutOfBoundsException and use LIRSlowPath instead. Bug: 13170824 Change-Id: I0e27f7a261a6a7fb5c0645e6113a957e098f699e
|
b4b06678125131367999135e634055509b77b9e8 |
|
16-Apr-2014 |
Yevgeny Rouban <yevgeny.y.rouban@intel.com> |
Fix volatile wide put/get to be atomic on x86 arch Current implementation puts memory barriers for volatile fields. Volatile semantics needs atomicity, which are not guaranteed by memory barriers. The patch forces all wide volatile fields to be loaded/stored using xmm registers. Change-Id: Ie78e186d13ffa237e6e93747b71d26651fa02866 Signed-off-by: Yevgeny Rouban <yevgeny.y.rouban@intel.com>
|
e643a179cf5585ba6bafdd4fa51730d9f50c06f6 |
|
08-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Use LIRSlowPath for throwing NPE. Get rid of launchpads for throwing NPE and use LIRSlowPath instead. Also clean up some code of using LIRSlowPath for checking div by zero. Bug: 13170824 Change-Id: I0c20a49c39feff3eb1f147755e557d9bc0ff15bb
|
d6ed642458c8820e1beca72f3d7b5f0be4a4b64b |
|
10-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Revert "Use trampolines for calls to helpers""" This reverts commit f9487c039efb4112616d438593a2ab02792e0304. Change-Id: Id48a4aae4ecce73db468587967968a3f7618b700
|
f9487c039efb4112616d438593a2ab02792e0304 |
|
09-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Use trampolines for calls to helpers"" This reverts commit 081f73e888b3c246cf7635db37b7f1105cf1a2ff. Change-Id: Ibd777f8ce73cf8ed6c4cb81d50bf6437ac28cb61 Conflicts: compiler/dex/quick/mir_to_lir.h
|
4289456fa265b833434c2a8eee9e7a16da31c524 |
|
07-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Use LIRSlowPath for throwing div by zero exception. Get rid of launchpads for throwing div by zero exception and use LIRSlowPath instead. Add a CallRuntimeHelper that takes no argument for the runtime function. Bug: 13170824 Change-Id: I7e0563e736c6f92bd63e3fbdfe3a777ad333e338
|
081f73e888b3c246cf7635db37b7f1105cf1a2ff |
|
07-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Use trampolines for calls to helpers" This reverts commit 754ddad084ccb610d0cf486f6131bdc69bae5bc6. Change-Id: Icd979adee1d8d781b40a5e75daf3719444cb72e8
|
754ddad084ccb610d0cf486f6131bdc69bae5bc6 |
|
19-Feb-2014 |
Dave Allison <dallison@google.com> |
Use trampolines for calls to helpers This is an ARM specific optimization to the compiler that uses trampoline islands to make calls to runtime helper functions. The intention is to reduce the size of the generated code (by 2 bytes per call) without affecting performance. By default this is on when generating an OAT file. It is off when compiling to memory. To switch this off in dex2oat, use the command line option: --no-helper-trampolines Enhances disassembler to print the trampoline entry on the BL instruction like this: 0xb6a850c0: f7ffff9e bl -196 (0xb6a85000) ; pTestSuspend Bug: 12607709 Change-Id: I9202bdb7cf21252ad807bd48701f1f6ce8e3d0fe
|
3da67a558f1fd3d8a157d8044d521753f3f99ac8 |
|
03-Apr-2014 |
Dave Allison <dallison@google.com> |
Add OpEndIT() for marking the end of OpIT blocks In ARM we need to prevent code motion to the inside of an IT block. This was done using a GenBarrier() to mark the end, but it wasn't obvious that this is what was happening. This CL adds an explicit OpEndIT() that takes the LIR of the OpIT for future checks. Bug: 13751744 Change-Id: If41d2adea1f43f11ebb3b72906bd308252ce3d01
|
f9719f9abbea060e086fe1304d72be50cbc8808e |
|
02-Apr-2014 |
Zheng Xu <zheng.xu@arm.com> |
ARM: enable optimisation for easy multiply, add modulus pattern. Fix the issue when src/dest registers overlap in easy multiply. Change-Id: Ie8cc098c29c74fd06c1b67359ef94f2c6b88a71e
|
6a58cb16d803c9a7b3a75ccac8be19dd9d4e520d |
|
02-Apr-2014 |
Dmitry Petrochenko <dmitry.petrochenko@intel.com> |
art: Handle x86_64 architecture equal to x86 This patch forces FE/ME to treat x86_64 as x86 exactly. The x86_64 logic will be revised later when assembly will be ready. Change-Id: I4a92477a6eeaa9a11fd710d35c602d8d6f88cbb6 Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
|
43a065ce1dda78e963868f9753a6e263721af927 |
|
02-Apr-2014 |
Dave Allison <dallison@google.com> |
Add GenBarrier() calls to terminate all IT blocks. This is needed to prevent things like load hoisting from putting instructions inside the IT block. Bug: 13749123 Change-Id: I98a010453b163ac20a90f626144f798fc06e65a9
|
dd7624d2b9e599d57762d12031b10b89defc9807 |
|
15-Mar-2014 |
Ian Rogers <irogers@google.com> |
Allow mixing of thread offsets between 32 and 64bit architectures. Begin a more full implementation x86-64 REX prefixes. Doesn't implement 64bit thread offset support for the JNI compiler. Change-Id: If9af2f08a1833c21ddb4b4077f9b03add1a05147
|
f943914730db8ad2ff03d49a2cacd31885d08fd7 |
|
27-Mar-2014 |
Dave Allison <dallison@google.com> |
Implement implicit stack overflow checks This also fixes some failing run tests due to missing null pointer markers. The implementation of the implicit stack overflow checks introduces the ability to have a gap in the stack that is skipped during stack walk backs. This gap is protected against read/write and is used to trigger a SIGSEGV at function entry if the stack will overflow. Change-Id: I0c3e214c8b87dc250cf886472c6d327b5d58653e
|
e2143c0a4af68c08e811885eb2f3ea5bfdb21ab6 |
|
28-Mar-2014 |
Ian Rogers <irogers@google.com> |
Revert "Revert "Optimize easy multiply and easy div remainder."" This reverts commit 3654a6f50a948ead89627f398aaf86a2c2db0088. Remove the part of the change that confused !is_div with being multiply rather than implying remainder. Change-Id: I202610069c69351259a320e8852543cbed4c3b3e
|
3441512d61ac192c1bf0b9b1eb696d5a8a8d677e |
|
28-Mar-2014 |
Brian Carlstrom <bdc@google.com> |
Revert "Optimize easy multiply and easy div remainder." This reverts commit 08df4b3da75366e5db37e696eaa7e855cba01deb. (cherry picked from commit 3654a6f50a948ead89627f398aaf86a2c2db0088) Change-Id: If8befd7c7135b9dfe3d3e9111768aba89aaa0863
|
3654a6f50a948ead89627f398aaf86a2c2db0088 |
|
28-Mar-2014 |
Brian Carlstrom <bdc@google.com> |
Revert "Optimize easy multiply and easy div remainder." This reverts commit 08df4b3da75366e5db37e696eaa7e855cba01deb.
|
08df4b3da75366e5db37e696eaa7e855cba01deb |
|
25-Mar-2014 |
Zheng Xu <zheng.xu@arm.com> |
Optimize easy multiply and easy div remainder. Update OpRegRegShift and OpRegRegRegShift to use RegStorage parameters. Add special cases for *0 and *1. Add more easy multiply special cases for Arm. Reuse easy multiply in SmallLiteralDivRem() to support remainder cases. Change-Id: Icd76a993d3ac8d4988e9653c19eab4efca14fad0
|
2700f7e1edbcd2518f4978e4cd0e05a4149f91b6 |
|
07-Mar-2014 |
buzbee <buzbee@google.com> |
Continuing register cleanup Ready for review. Continue the process of using RegStorage rather than ints to hold register value in the top layers of codegen. Given the huge number of changes in this CL, I've attempted to minimize the number of actual logic changes. With this CL, the use of ints for registers has largely been eliminated except in the lowest utility levels. "Wide" utility routines have been updated to take a single RegStorage rather than a pair of ints representing low and high registers. Upcoming CLs will be smaller and more targeted. My expectations: o Allocate float double registers as a single double rather than a pair of float single registers. o Refactor to push code which assumes long and double Dalvik values are held in a pair of register to the target dependent layer. o Clean-up of the xxx_mir.h files to reduce the amount of #defines for registers. May also do a register renumbering to bring all of our targets' register naming more consistent. Possibly introduce a target-independent float/non-float test at the RegStorage level. Change-Id: I646de7392bdec94595dd2c6f76e0f1c4331096ff
|
99ad7230ccaace93bf323dea9790f35fe991a4a2 |
|
26-Feb-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
Relaxed memory barriers for x86 X86 provides stronger memory guarantees and thus the memory barriers can be optimized. This patch ensures that all memory barriers for x86 are treated as scheduling barriers. And in cases where a barrier is needed (StoreLoad case), an mfence is used. Change-Id: I13d02bf3f152083ba9f358052aedb583b0d48640 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
0d507d1e0441e6bd6f3affca3a60774ea920f317 |
|
19-Mar-2014 |
Mathieu Chartier <mathieuc@google.com> |
Optimize stack overflow handling. We now subtract the frame size from the stack pointer for methods which have a frame smaller than a certain size. Also changed code to use slow paths instead of launchpads. Delete kStackOverflow launchpad since it is no longer needed. ARM optimizations: One less move per stack overflow check (without fault handler for stack overflows). Use ldr pc instead of ldr r12, b r12. Code size (boot.oat): Before: 58405348 After: 57803236 TODO: X86 doesn't have the case for large frames. This could case an incoming signal to go past the end of the stack (unlikely however). Change-Id: Ie3a5635cd6fb09de27960e1f8cee45bfae38fb33
|
530e9a502f705b952968e93332d6873d3e0b32d4 |
|
18-Mar-2014 |
Mathieu Chartier <mathieuc@google.com> |
Fix build. (cherry picked from commit 06cbdb966d40153d21035a2595e7d1d23a2a9b8c) Change-Id: I2f7b1077d5eb735a1f85ab118212ed4a48dc7e76
|
06cbdb966d40153d21035a2595e7d1d23a2a9b8c |
|
18-Mar-2014 |
Mathieu Chartier <mathieuc@google.com> |
Fix build. Change-Id: I5aa92bad1792c59b1c0ccddff92325ec4c907ac2
|
60d7a65f7fb60f502160a2e479e86014c7787553 |
|
14-Mar-2014 |
Brian Carlstrom <bdc@google.com> |
Fix stack overflow for mutual recursion. There was an error where we would have a pc that was in the method which generated the stack overflow. This didn't work however because the stack overflow check was before we stored the method in the stack. The result was that the stack overflow handler had a PC which wasnt necessarily in the method at the top of the stack. This is now fixed by always restoring the link register before branching to the throw entrypoint. Slight code size regression on ARM/Mips (unmeasured). Regression on ARM is 4 bytes of code per stack overflow check. Some of this regression is mitigated by having one less GC safepoint. Also adds test case for StackOverflowError issue (from bdc). Tests passing: ARM, X86, Mips Phone booting: ARM Bug: https://code.google.com/p/android/issues/detail?id=66411 Bug: 12967914 Change-Id: I96fe667799458b58d1f86671e051968f7be78d5d (cherry-picked from c0f96d03a1855fda7d94332331b94860404874dd)
|
c0f96d03a1855fda7d94332331b94860404874dd |
|
14-Mar-2014 |
Brian Carlstrom <bdc@google.com> |
Fix stack overflow for mutual recursion. There was an error where we would have a pc that was in the method which generated the stack overflow. This didn't work however because the stack overflow check was before we stored the method in the stack. The result was that the stack overflow handler had a PC which wasnt necessarily in the method at the top of the stack. This is now fixed by always restoring the link register before branching to the throw entrypoint. Slight code size regression on ARM/Mips (unmeasured). Regression on ARM is 4 bytes of code per stack overflow check. Some of this regression is mitigated by having one less GC safepoint. Also adds test case for StackOverflowError issue (from bdc). Tests passing: ARM, X86, Mips Phone booting: ARM Bug: https://code.google.com/p/android/issues/detail?id=66411 Bug: 12967914 Change-Id: I96fe667799458b58d1f86671e051968f7be78d5d
|
bfea9c29e809e04bde4a46591fea64c5a7b922fb |
|
17-Jan-2014 |
Vladimir Marko <vmarko@google.com> |
Class initialization check elimination. Also, move null check elimination temporaries to the ScopedArenaAllocator and reuse the same variables in the class initialization check elimination. Change-Id: Ic746f95427065506fa6016d4931e4cb8b34937af
|
9a84ad99aab002e8cfeec9381df971a00e28d05f |
|
13-Mar-2014 |
Zheng Xu <zheng.xu@arm.com> |
Remove duplicated register load in function call to quick entry. CallRuntimeHelperRegLocation() will load parameter to registers for native call, but GenConversionCall() duplicates the work before it call CallRuntimeHelperRegLocation(). Instructions generated before patch: 0xf731007e: f8d9e25c ldr.w lr, [r9, #604] ; pF2l 0xf7310082: ee180a10 vmov.f32 r0, s16 0xf7310086: ee180a10 vmov.f32 r0, s16 0xf731008a: 47f0 blx lr After: 0xf739707e: f8d9e25c ldr.w lr, [r9, #604] ; pF2l 0xf7397082: ee180a10 vmov.f32 r0, s16 0xf7397086: 47f0 blx lr Change-Id: I1868aefa4703a0f8133eaac707f5b80f01293cb8
|
b373e091eac39b1a79c11f2dcbd610af01e9e8a9 |
|
21-Feb-2014 |
Dave Allison <dallison@google.com> |
Implicit null/suspend checks (oat version bump) This adds the ability to use SEGV signals to throw NullPointerException exceptions from Java code rather than having the compiler generate explicit comparisons and branches. It does this by using sigaction to trap SIGSEGV and when triggered makes sure it's in compiled code and if so, sets the return address to the entry point to throw the exception. It also uses this signal mechanism to determine whether to check for thread suspension. Instead of the compiler generating calls to a function to check for threads being suspended, the compiler will now load indirect via an address in the TLS area. To trigger a suspend, the contents of this address are changed from something valid to 0. A SIGSEGV will occur and the handler will check for a valid instruction pattern before invoking the thread suspension check code. If a user program taps SIGSEGV it will prevent our signal handler working. This will cause a failure in the runtime. There are two signal handlers at present. You can control them individually using the flags -implicit-checks: on the runtime command line. This takes a string parameter, a comma separated set of strings. Each can be one of: none switch off null null pointer checks suspend suspend checks all all checks So to switch only suspend checks on, pass: -implicit-checks:suspend There is also -explicit-checks to provide the reverse once we change the default. For dalvikvm, pass --runtime-arg -implicit-checks:foo,bar The default is -implicit-checks:none There is also a property 'dalvik.vm.implicit_checks' whose value is the same string as the command option. The default is 'none'. For example to switch on null checks using the option: setprop dalvik.vm.implicit_checks null It only works for ARM right now. Bumps OAT version number due to change to Thread offsets. Bug: 13121132 Change-Id: If743849138162f3c7c44a523247e413785677370
|
3bc8615332b7848dec8c2297a40f7e4d176c0efb |
|
13-Mar-2014 |
Vladimir Marko <vmarko@google.com> |
Use LIRSlowPath for intrinsics, improve String.indexOf(). Rewrite intrinsic launchpads to use the LIRSlowPath. Improve String.indexOf for constant chars by avoiding the check for code points over 0xFFFF. Change-Id: I7fd5583214c5b4ab9c38ee36c5d6f003dd6345a8
|
00e1ec6581b5b7b46ca4c314c2854e9caa647dd2 |
|
28-Feb-2014 |
Bill Buzbee <buzbee@android.com> |
Revert "Revert "Rework Quick compiler's register handling"" This reverts commit 86ec520fc8b696ed6f164d7b756009ecd6e4aace. Ready. Fixed the original type, plus some mechanical changes for rebasing. Still needs additional testing, but the problem with the original CL appears to have been a typo in the definition of the x86 double return template RegLocation. Change-Id: I828c721f91d9b2546ef008c6ea81f40756305891
|
be0e546730e532ef0987cd4bde2c6f5a1b14dd2a |
|
26-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Cache field lowering info in mir_graph. Change-Id: I9f9d76e3ae6c31e88bdf3f59820d31a625da020f
|
86ec520fc8b696ed6f164d7b756009ecd6e4aace |
|
26-Feb-2014 |
Bill Buzbee <buzbee@android.com> |
Revert "Rework Quick compiler's register handling" This reverts commit 2c1ed456dcdb027d097825dd98dbe48c71599b6c. Change-Id: If88d69ba88e0af0b407ff2240566d7e4545d8a99
|
2c1ed456dcdb027d097825dd98dbe48c71599b6c |
|
20-Feb-2014 |
buzbee <buzbee@google.com> |
Rework Quick compiler's register handling For historical reasons, the Quick backend found it convenient to consider all 64-bit Dalvik values held in registers to be contained in a pair of 32-bit registers. Though this worked well for ARM (with double-precision registers also treated as a pair of 32-bit single-precision registers) it doesn't play well with other targets. And, it is somewhat problematic for 64-bit architectures. This is the first of several CLs that will rework the way the Quick backend deals with physical registers. The goal is to eliminate the "64-bit value backed with 32-bit register pair" requirement from the target-indendent portions of the backend and support 64-bit registers throughout. The key RegLocation struct, which describes the location of Dalvik virtual register & register pairs, previously contained fields for high and low physical registers. The low_reg and high_reg fields are being replaced with a new type: RegStorage. There will be a single instance of RegStorage for each RegLocation. Note that RegStorage does not increase the space used. It is 16 bits wide, the same as the sum of the 8-bit low_reg and high_reg fields. At a target-independent level, it will describe whether the physical register storage associated with the Dalvik value is a single 32 bit, single 64 bit, pair of 32 bit or vector. The actual register number encoding is left to the target-dependent code layer. Because physical register handling is pervasive throughout the backend, this restructuring necessarily involves large CLs with lots of changes. I'm going to roll these out in stages, and attempt to segregate the CLs with largely mechanical changes from those which restructure or rework the logic. This CL is of the mechanical change variety - it replaces low_reg and high_reg from RegLocation and introduces RegStorage. It also includes a lot of new code (such as many calls to GetReg()) that should go away in upcoming CLs. The tentative plan for the subsequent CLs is: o Rework standard register utilities such as AllocReg() and FreeReg() to use RegStorage instead of ints. o Rework the target-independent GenXXX, OpXXX, LoadValue, StoreValue, etc. routines to take RegStorage rather than int register encodings. o Take advantage of the vector representation and eliminate the current vector field in RegLocation. o Replace the "wide" variants of codegen utilities that take low_reg/high_reg pairs with versions that use RegStorage. o Add 64-bit register target independent codegen utilities where possible, and where not virtualize with 32-bit general register and 64-bit general register variants in the target dependent layer. o Expand/rework the LIR def/use flags to allow for more registers (currently, we lose out on 16 MIPS floating point regs as well as ARM's D16..D31 for lack of space in the masks). o [Possibly] move the float/non-float determination of a register from the target-dependent encoding to RegStorage. In other words, replace IsFpReg(register_encoding_bits). At the end of the day, all code in the target independent layer should be using RegStorage, as should much of the target dependent layer. Ideally, we won't be using the physical register number encoding extracted from RegStorage (i.e. GetReg()) until the NewLIRx() layer. Change-Id: Idc5c741478f720bdd1d7123b94e4288be5ce52cb
|
9c86a0279aaf953377aa9e2277592e68bf814989 |
|
21-Feb-2014 |
Ian Rogers <irogers@google.com> |
Revert "Annotate used fields." This reverts commit 7f6cf56942c8469958b273ea968db253051c5b05. Change-Id: Ic389a194c3404ecb5bb563a405bf4a0d6336ea0d
|
7f6cf56942c8469958b273ea968db253051c5b05 |
|
29-Jan-2014 |
Vladimir Marko <vmarko@google.com> |
Annotate used fields. Annotate all fields used by a method early during the compilation, check acces rights and record field offset, volatility, etc. Use these annotations when generating code for IGET/IPUT/SGET/SPUT instructions. Change-Id: I4bbf5cca4fecf53c9bf9c93ac1793e2f40c16b5f
|
6607d97166984ce578817269f9775c15b9044190 |
|
10-Feb-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Tweak Mir2Lir::GenInstanceofCallingHelper for X86 Make this virtual, and split out the X86 logic. Take advantage of SETcc instruction for X86. I don't think I can do much more due to need to preserve arguments for the calls. Change-Id: I10e3eaa61b61ceac384267e3078bb6f75c37cee4 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
55d0eac918321e0525f6e6491f36a80977e0d416 |
|
06-Feb-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Support Direct Method/Type access for X86 Thumb generates code to optimize calls to methods within core.oat. Implement this for X86 as well, but take advantage of mov with 32 bit immediate and call relative with 32 bit immediate. Fix some incorrect return locations for long inlines. Change-Id: I1907bdfc7574f3d0aa76c7fad13dc537acdf1ed3 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
bcec6fba95ee7974d3f7b81c3c02e7eb3ca3df00 |
|
17-Jan-2014 |
Dave Allison <dallison@google.com> |
Make slow paths easier to write This adds a class LIRSlowPath that allows for deferred compilation of slow paths. Using this object you can add code that will be invoked out of line using a forward branch. The intention is to move the slow paths out of the main flow and avoid branch-over constructs that will almost always trigger. The forward branch to the slow path code will be predicted false and this will be correct most of the time. The slow path code returns to the instruction after the original branch using an unconditional branch. This is used in the following opcodes: sput, sget, const-string, check-cast, const-class. Others will follow. Bug: 10864890 Change-Id: I17130c5dc20d369bc6bbf50b8cf04343263e888e
|
feb2b4e2d1c6538777bb80b60f3a247537b6221d |
|
28-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Redo x86 int arithmetic Make Mir2Lir::GenArithOpInt virtual, and implement an x86 version of it to allow use of memory operands and knowledge of the fact that x86 has (mostly) two operand instructions. Remove x86 specific code from the generic version. Add StoreFinalValue (matches StoreFinalValueWide) to handle the non-wide cases. Add some x86 helper routines to simplify generation. Change-Id: I6c13689c6da981f2570ab5af7a97f9816108b7ae Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
df8ee2ea9908db3dde463fed68391b0040517653 |
|
28-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
x86 updates GenInlinedUnsafePut/GenInstanceofFinal Allow x86 to inline GenInlinedUnsafePut by freeing up a temporary register early. Make an x86 specific version of GenInstanceofFinal that uses compare to memory and a setCC instruction. Change-Id: I67788d7ae83776b0b9069fe4b379452190774992 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
2730db03beee4d6687ddfb5000c33c0370fbc6eb |
|
27-Jan-2014 |
Vladimir Marko <vmarko@google.com> |
Add VerfiedMethod to DexCompilationUnit. Avoid some mutex locking and map lookups. Change-Id: I8e0486af77e38dcd065569572a6b985eb57f4f63
|
766e9295d2c34cd1846d81610c9045b5d5093ddd |
|
27-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Improve GenConstString, GenS{get,put} for x86 Rewrite GenConstString for x86 to skip calling ResolveString when the string is already resolved. Also try to avoid a register copy if the Method* is in a promoted register. Implement the TODO for GenS{get,put} to use compare to memory for x86 by adding a new codegen function to compare directly to memory. Implement a default implementation that uses a temporary register for RISC architectures. Change-Id: Ie163cca3d3d841aa10c50dc6592ec30af7a7cbc9 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
bb8f0ab736b61db8f543e433859272e83f96ee9b |
|
28-Jan-2014 |
Hiroshi Yamauchi <yamauchi@google.com> |
Embed array class pointers at array allocation sites. Following https://android-review.googlesource.com/#/c/79302, embed array class pointers at array allocation sites in the compiled code. Change-Id: I67a1292466dfbb7f48e746e5060e992dd93525c5
|
4708dcd68eebf1173aef1097dad8ab13466059aa |
|
22-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Improve x86 long multiply and shifts Generate inline code for long shifts by constants and do long multiplication inline. Convert multiplication by a constant to a shift when we can. Fix some x86 assembler problems and add the new instructions that were needed (64 bit shifts). Change-Id: I6237a31c36159096e399d40d01eb6bfa22ac2772 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
2bf31e67694da24a19fc1f328285cebb1a4b9964 |
|
23-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Improve x86 long divide Implement inline division for literal and variable divisors. Use the general case for dividing by a literal by using a double length multiply by the appropriate constant with fixups. This is the Hacker's Delight algorithm. Change-Id: I563c250f99d89fca5ff8bcbf13de74de13815cfe Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
be1ca55db3362f5b100c4c65da5342fd299520bb |
|
15-Jan-2014 |
Hiroshi Yamauchi <yamauchi@google.com> |
Use direct class pointers at allocation sites in the compiled code. - Rather than looking up a class from its type ID (and checking if it's resolved/initialized, resolving/initializing if not), use direct class pointers, if possible (boot-code-to-boot-class pointers and app-code-to-boot-class pointers.) - This results in a 1-2% speedup in Ritz MemAllocTest on Nexus 4. - Embedding the object size (along with class pointers) caused a 1-2% slowdown in MemAllocTest and isn't implemented in this change. - TODO: do the same for array allocations. - TODO: when/if an application gets its own image, implement app-code-to-app-class pointers. - Fix a -XX:gc bug. cf. https://android-review.googlesource.com/79460/ - Add /tmp/android-data/dalvik-cache to the list of locations to remove oat files in clean-oat-host. cf. https://android-review.googlesource.com/79550 - Add back a dropped UNLIKELY in FindMethodFromCode(). cf. https://android-review.googlesource.com/74205 Bug: 9986565 Change-Id: I590b96bd21f7a7472f88e36752e675547559a5b1
|
e02d48fb24747f90fd893e1c3572bb3c500afced |
|
15-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Optimize x86 long arithmetic Be smarter about taking advantage of a constant operand for x86 long add/sub/and/or/xor. Using instructions with immediates and generating results directly into memory reduces the number of temporary registers and avoids hardcoded register usage. Also rewrite the existing non-const x86 arithmetic to avoid fixed register use, and use the fact that x86 instructions are two operand. Pass the opcode to the XXXLong() routines to easily detect two operand DEX opcodes. Add a new StoreFinalValueWide() routine, which is similar to StoreValueWide, but doesn't do an EvalLoc to allocate registers. The src operand must already be in registers, and it just updates the dest location, and calls the right live/dirty routines to get the src into the dest properly. Change-Id: Iefc16e7bc2236a73dc780d3d5137ae8343171f62 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
d61ba4ba6fcde666adb5d5c81b1c32f0534fb2c8 |
|
13-Jan-2014 |
Bill Buzbee <buzbee@android.com> |
Revert "Revert "Better support for x86 XMM registers"" This reverts commit 8ff67e3338952c70ccf3b609559bf8cc0f379cfd. Fix applied to loc.fp usage. Change-Id: I1eb3005392544fcf30c595923ed25bcee2dc4859
|
8ff67e3338952c70ccf3b609559bf8cc0f379cfd |
|
11-Jan-2014 |
Bill Buzbee <buzbee@android.com> |
Revert "Better support for x86 XMM registers" The invalid usage of loc.fp must be corrected before this change can be submitted. This reverts commit 766a5e5940b469ab40e52770862c81cfec1d835b. Change-Id: I1173a9bf829da89cccd9c2898f5e11164987a22b
|
766a5e5940b469ab40e52770862c81cfec1d835b |
|
10-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Better support for x86 XMM registers Currently, ART Quick mode assumes that a double FP register is composed of two single consecutive FP registers. This is true for ARM and MIPS, but not x86. This means that only half of the 8 XMM registers are available for use by x86 doubles. This patch breaks the assumption that a wide FP RegisterLocation must be a paired set of FP registers. This is done by making some routines in common code virtual and overriding them in the X86Mir2Lir class. For these wide fp locations, the high register is set to the same value as the low register, in order to minimize changes to common code. In a couple of places, the common code checks for this case. The changes are also supposed to allow the possibility of using the XMM registers for vector operations,but that support is still WIP. Change-Id: Ic6ef24ea764991c6f4d9fb88d483a619f5a468cb Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
5ddb4104ac605d66693b55b79f26f8b8a5505e63 |
|
07-Jan-2014 |
Ian Rogers <irogers@google.com> |
Remove intialized static storage from dex cache. The initialized static storage array is used by compiled code to determine if for a sget/sput class initialization is necessary. The compiled code typically doesn't require this test as the class is pre-initialized or the class being accessed is the same as the current method. Change-Id: Icbc45e692b3d0ac61e559e69edb6c9b29439e571
|
31c2aac7137b69d5622eea09597500731fbee2ef |
|
09-Dec-2013 |
Vladimir Marko <vmarko@google.com> |
Rename ClobberCalleeSave to *Caller*, fix it for x86. Change-Id: I6a72703a11985e2753fa9b4520c375a164301433
|
7020278bce98a0735dc6abcbd33bdf1ed2634f1d |
|
23-Oct-2013 |
Dave Allison <dallison@google.com> |
Support hardware divide instruction Bug: 11299025 Uses sdiv for division and a combo of sdiv, mul and sub for modulus. Only does this on processors that are capable of the sdiv instruction, as determined by the build system. Also provides a command line arg --instruction-set-features= to allow cross compilation. Makefile adds the --instruction-set-features= arg to build-time dex2oat runs and defaults it to something obtained from the target architecture. Provides a GetInstructionSetFeatures() function on CompilerDriver that can be queried for various features. The only feature supported right now is hasDivideInstruction(). Also adds a few more instructions to the ARM disassembler b/11535253 is an addition to this CL to be done later. Change-Id: Ia8aaf801fd94bc71e476902749cf20f74eba9f68
|
0d82948094d9a198e01aa95f64012bdedd5b6fc9 |
|
12-Oct-2013 |
buzbee <buzbee@google.com> |
64-bit prep Preparation for 64-bit roll. o Eliminated storing pointers in 32-bit int slots in LIR. o General size reductions of common structures to reduce impact of doubled pointer sizes: - BasicBlock struct was 72 bytes, now is 48. - MIR struct was 72 bytes, now is 64. - RegLocation was 12 bytes, now is 8. o Generally replaced uses of BasicBlock* pointers with 16-bit Ids. o Replaced several doubly-linked lists with singly-linked to save one stored pointer per node. o We had quite a few uses of uintptr_t's that were a holdover from the JIT (which used pointers to mapped dex & actual code cache addresses rather than trace-relative offsets). Replaced those with uint32_t's. o Clean up handling of embedded data for switch tables and array data. o Miscellaneous cleanup. I anticipate one or two additional CLs to reduce the size of MIR and LIR structs. Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
|
a9a8254c920ce8e22210abfc16c9842ce0aea28f |
|
04-Oct-2013 |
Ian Rogers <irogers@google.com> |
Improve quick codegen for aput-object. 1) don't type check known null. 2) if we know types in verify don't check at runtime. 3) if we're runtime checking then move all the code out-of-line. Also, don't set up a callee-save frame for check-cast, do an instance-of test then throw an exception if that fails. Tidy quick entry point of Ldivmod to Lmod which it is on x86 and mips. Fix monitor-enter/exit NPE for MIPS. Fix benign bug in mirror::Class::CannotBeAssignedFromOtherTypes, a byte[] cannot be assigned to from other types. Change-Id: I9cb3859ec70cca71ed79331ec8df5bec969d6745
|
d9c4fc94fa618617f94e1de9af5f034549100753 |
|
02-Oct-2013 |
Ian Rogers <irogers@google.com> |
Inflate contended lock word by suspending owner. Bug 6961405. Don't inflate monitors for Notify and NotifyAll. Tidy lock word, handle recursive lock case alongside unlocked case and move assembly out of line (except for ARM quick). Also handle null in out-of-line assembly as the test is quick and the enter/exit code is already a safepoint. To gain ownership of a monitor on behalf of another thread, monitor contenders must not hold the monitor_lock_, so they wait on a condition variable. Reduce size of per mutex contention log. Be consistent in calling thin lock thread ids just thread ids. Fix potential thread death races caused by the use of FindThreadByThreadId, make it invariant that returned threads are either self or suspended now. Code size reduction on ARM boot.oat 0.2%. Old nexus 7 speedup 0.25%, new nexus 7 speedup 1.4%, nexus 10 speedup 2.24%, nexus 4 speedup 2.09% on DeltaBlue. Change-Id: Id52558b914f160d9c8578fdd7fc8199a9598576a
|
b48819db07f9a0992a72173380c24249d7fc648a |
|
15-Sep-2013 |
buzbee <buzbee@google.com> |
Compile-time tuning: assembly phase Not as much compile-time gain from reworking the assembly phase as I'd hoped, but still worthwhile. Should see ~2% improvement thanks to the assembly rework. On the other hand, expect some huge gains for some application thanks to better detection of large machine-generated init methods. Thinkfree shows a 25% improvement. The major assembly change was to establish thread the LIR nodes that require fixup into a fixup chain. Only those are processed during the final assembly pass(es). This doesn't help for methods which only require a single pass to assemble, but does speed up the larger methods which required multiple assembly passes. Also replaced the block_map_ basic block lookup table (which contained space for a BasicBlock* for each dex instruction unit) with a block id map - cutting its space requirements by half in a 32-bit pointer environment. Changes: o Reduce size of LIR struct by 12.5% (one of the big memory users) o Repurpose the use/def portion of the LIR after optimization complete. o Encode instruction bits to LIR o Thread LIR nodes requiring pc fixup o Change follow-on assembly passes to only consider fixup LIRs o Switch on pc-rel fixup kind o Fast-path for small methods - single pass assembly o Avoid using cb[n]z for null checks (almost always exceed displacement) o Improve detection of large initialization methods. o Rework def/use flag setup. o Remove a sequential search from FindBlock using lookup table of 16-bit block ids rather than full block pointers. o Eliminate pcRelFixup and use fixup kind instead. o Add check for 16-bit overflow on dex offset. Change-Id: I4c6615f83fed46f84629ad6cfe4237205a9562b4
|
252254b130067cd7a5071865e793966871ae0246 |
|
09-Sep-2013 |
buzbee <buzbee@google.com> |
More Quick compile-time tuning: labels & branches This CL represents a roughly 3.5% performance improvement for the compile phase of dex2oat. Move of the gain comes from avoiding the generation of dex boundary LIR labels unless a debug listing is requested. The other significant change is moving from a basic block ending branch model of "always generate a fall-through branch, and then delete it if we can" to a "only generate a fall-through branch if we need it" model. The data motivating these changes follow. Note that two area of potentially attractive gain remain: restructing the assembler model and reworking the register handling utilities. These will be addressed in subsequent CLs. --- data follows The Quick compiler's assembler has shown up on profile reports a bit more than seems reasonable. We've tried a few quick fixes to apparently hot portions of the code, but without much gain. So, I've been looking at the assembly process at a somewhat higher level. There look to be several potentially good opportunities. First, an analysis of the makeup of the LIR graph showed a surprisingly high proportion of LIR pseudo ops. Using the boot classpath as a basis, we get: 32.8% of all LIR nodes are pseudo ops. 10.4% are LIR instructions which require pc-relative fixups. 11.8% are LIR instructions that have been nop'd by the various optimization passes. Looking only at the LIR pseudo ops, we get: kPseudoDalvikByteCodeBoundary 43.46% kPseudoNormalBlockLabel 21.14% kPseudoSafepointPC 20.20% kPseudoThrowTarget 6.94% kPseudoTarget 3.03% kPseudoSuspendTarget 1.95% kPseudoMethodExit 1.26% kPseudoMethodEntry 1.26% kPseudoExportedPC 0.37% kPseudoCaseLabel 0.30% kPseudoBarrier 0.07% kPseudoIntrinsicRetry 0.02% Total LIR count: 10167292 The standout here is the Dalvik opcode boundary marker. This is just a label inserted at the beginning of the codegen for each Dalvik bytecode. If we're also doing a verbose listing, this is also where we hang the pretty-print disassembly string. However, this label was also being used as a convenient way to find the target of switch case statements (and, I think at one point was used in the Mir->GBC conversion process). This CL moves the use of kPseudoDalvikByteCodeBoundary labels to only verbose listing runs, and replaces the codegen uses of the label with the kPseudoNormalBlockLabel attached to the basic block that contains the switch case target. Great savings here - 14.3% reduction in the number of LIR nodes needed. After this CL, our LIR pseudo proportions drop to 21.6% of all LIR. That's still a lot, but much better. Possible further improvements via combining normal labels with kPseudoSafepointPC labels where appropriate, and also perhaps reduce memory usage by using a short-hand form for labels rather than a full LIR node. Also, many of the basic block labels are no longer branch targets by the time we get to assembly - cheaper to delete, or just ingore? Here's the "after" LIR pseudo op breakdown: kPseudoNormalBlockLabel 37.39% kPseudoSafepointPC 35.72% kPseudoThrowTarget 12.28% kPseudoTarget 5.36% kPseudoSuspendTarget 3.45% kPseudoMethodEntry 2.24% kPseudoMethodExit 2.22% kPseudoExportedPC 0.65% kPseudoCaseLabel 0.53% kPseudoBarrier 0.12% kPseudoIntrinsicRetry 0.04% Total LIR count: 5748232 Not done in this CL, but it will be worth experimenting with actually deleting LIR nodes from the graph when they are optimized away, rather than just setting the NOP bit. Keeping them around is invaluable during debugging - but when not debugging it may pay off if the cost of node removal is less than the cost of traversing through dead nodes in subsequent passes. Next up (and partially in this CL - but mostly to be done in follow-on CLs) is the overall assembly process. Inherited from the trace JIT, the Quick compiler has a fairly simple-minded approach to instruction assembly. First, a pass is made over the LIR list to assign offsets to each instruction. Then, the assembly pass is made - which generates the actual machine instruction bit patterns and pushes the instruction data into the code_buffer. However, the code generator takes the "always optimistic" approach to instruction selection and emits the shortest instruction. If, during assembly, we find that a branch or load doesn't reach, that short-form instruction is replaces with a longer sequence. Of course, this invalidates the previously-computed offset calculations. Assembly thus is an iterative process: compute offsets and then assemble until we survive an assembly pass without invalidation. This seems like a likely candidate for improvement. First, I analyzed the number of retries required, and the reason for invalidation over the boot classpath load. The results: more than half of methods don't require a retry, and very few require more than 1 extra pass: 5 or more: 6 of 96334 4 or more: 22 of 96334 3 or more: 140 of 96334 2 or more: 1794 of 96334 - 2% 1 or more: 40911 of 96334 - 40% 0 retries: 55423 of 96334 - 58% The interesting group here is the one that requires 1 retry. Looking at the reason, we see three typical reasons: 1. A cbnz/cbz doesn't reach (only 7 bits of offset) 2. A 16-bit Thumb1 unconditional branch doesn't reach. 3. An unconditional branch which branches to the next instruction is encountered, and deleted. The first 2 cases are the cost of the optimistic strategy - nothing much to change there. However, the interesting case is #3 - dead branch elimination. A further analysis of the single retry group showed that 42% of the methods (16305) that required a single retry did so *only* because of dead branch elimination. The big question here is why so many dead branches survive to the assembly stage. We have a dead branch elimination pass which is supposed to catch these - perhaps it's not working correctly, should be moved later in the optimization process, or perhaps run multiple times. Other things to consider: o Combine the offset generation pass with the assembly pass. Skip pc-relative fixup assembly (other than assigning offset), but push LIR* for them into work list. Following the main pass, zip through the work list and assemble the pc-relative instructions (now that we know the offsets). This would significantly cut back on traversal costs. o Store the assembled bits into both the code buffer and the LIR. In the event we have to retry, only the pc-relative instructions would need to be assembled, and we'd finish with a pass over the LIR just to dumb the bits into the code buffer. Change-Id: I50029d216fa14f273f02b6f1c8b6a0dde5a7d6a6
|
9b297bfc588c7d38efd12a6f38cd2710fc513ee3 |
|
06-Sep-2013 |
Ian Rogers <irogers@google.com> |
Refactor CompilerDriver::Compute..FieldInfo Don't use non-const reference arguments. Move ins before outs. Change-Id: I7b251156388d8f07513b3da62ebfd29e5fd9ff76
|
11b63d13f0a3be0f74390b66b58614a37f9aa6c1 |
|
27-Aug-2013 |
buzbee <buzbee@google.com> |
Quick compiler: division by literal fix The constant propagation optimization pass attempts to identify constants in Dalvik virtual registers and handle them more efficiently. The use of small constants in divison, though, was handled incorrectly in that the high level code correctly detected the use of a constant, but the actual code generation routine was only expecting the use of a special constant form opcode. see b/10503566 Change-Id: I88aa4d2eafebb2b1af1a1e88049f1845aefae261
|
ea46f950e7a51585db293cd7f047de190a482414 |
|
30-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Refactor java.lang.reflect implementation Cherry-picked from commit ed41d5c44299ec5d44b8514f6e17f802f48094d1. Move to ArtMethod/Field instead of AbstractMethod/Field and have java.lang.reflect APIs delegate to ArtMethod/ArtField. Bug: 10014286. Change-Id: Iafc1d8c5b62562c9af8fb9fd8c5e1d61270536e7
|
468532ea115657709bc32ee498e701a4c71762d4 |
|
05-Aug-2013 |
Ian Rogers <irogers@google.com> |
Entry point clean up. Create set of entry points needed for image methods to avoid fix-up at load time: - interpreter - bridge to interpreter, bridge to compiled code - jni - dlsym lookup - quick - resolution and bridge to interpreter - portable - resolution and bridge to interpreter Fix JNI work around to use JNI work around argument rewriting code that'd been accidentally disabled. Remove abstact method error stub, use interpreter bridge instead. Consolidate trampoline (previously stub) generation in generic helper. Simplify trampolines to jump directly into assembly code, keeps stack crawlable. Dex: replace use of int with ThreadOffset for values that are thread offsets. Tidy entry point routines between interpreter, jni, quick and portable. Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e (cherry picked from commit 848871b4d8481229c32e0d048a9856e5a9a17ef9)
|
848871b4d8481229c32e0d048a9856e5a9a17ef9 |
|
05-Aug-2013 |
Ian Rogers <irogers@google.com> |
Entry point clean up. Create set of entry points needed for image methods to avoid fix-up at load time: - interpreter - bridge to interpreter, bridge to compiled code - jni - dlsym lookup - quick - resolution and bridge to interpreter - portable - resolution and bridge to interpreter Fix JNI work around to use JNI work around argument rewriting code that'd been accidentally disabled. Remove abstact method error stub, use interpreter bridge instead. Consolidate trampoline (previously stub) generation in generic helper. Simplify trampolines to jump directly into assembly code, keeps stack crawlable. Dex: replace use of int with ThreadOffset for values that are thread offsets. Tidy entry point routines between interpreter, jni, quick and portable. Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e
|
834b394ee759ed31c5371d8093d7cd8cd90014a8 |
|
31-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Merge remote-tracking branch 'goog/dalvik-dev' into merge-art-to-dalvik-dev Change-Id: I323e9e8c29c3e39d50d9aba93121b26266c52a46
|
7655f29fabc0a12765de828914a18314382e5a35 |
|
29-Jul-2013 |
Ian Rogers <irogers@google.com> |
Portable refactorings. Separate quick from portable entrypoints. Move architectural dependencies into arch. Change-Id: I9adbc0a9782e2959fdc3308215f01e3107632b7c
|
166db04e259ca51838c311891598664deeed85ad |
|
26-Jul-2013 |
Ian Rogers <irogers@google.com> |
Move assembler out of runtime into compiler/utils. Other directory layout bits of clean up. There is still work to separate quick and portable in some files (e.g. argument visitor, proxy..). Change-Id: If8fecffda8ba5c4c47a035f0c622c538c6b58351
|
7934ac288acfb2552bb0b06ec1f61e5820d924a4 |
|
26-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/comments issues Change-Id: Iae286862c85fb8fd8901eae1204cd6d271d69496
|
6f485c62b9cfce3ab71020c646ab9f48d9d29d6d |
|
19-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/indent issues Change-Id: I7c1647f0c39e1e065ca5820f9b79998691ba40b1
|
df62950e7a32031b82360c407d46a37b94188fbb |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/parens issues Change-Id: Ifc678d59a8bed24ffddde5a0e543620b17b0aba9
|
2ce745c06271d5223d57dbf08117b20d5b60694a |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/braces issues Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
|
7940e44f4517de5e2634a7e07d58d0fb26160513 |
|
12-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Create separate Android.mk for main build targets The runtime, compiler, dex2oat, and oatdump now are in seperate trees to prevent dependency creep. They can now be individually built without rebuilding the rest of the art projects. dalvikvm and jdwpspy were already this way. Builds in the art directory should behave as before, building everything including tests. Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81
|