b7fd412dd21eb362931b3a0716c94fd189a66295 |
|
04-Jun-2015 |
Vladimir Marko <vmarko@google.com> |
Revert "Quick: Create GC map based on compiler data. DO NOT MERGE" This reverts commit 7cc8f9aa1349fd6cb0814a653ee2d1164a7fb9f7. Change-Id: Iadb4462bf8e834c6a847c01ee6eb332a325de22c
|
c8d000a12d853a72999c96e3b73587bad2be6954 |
|
04-Jun-2015 |
Vladimir Marko <vmarko@google.com> |
Revert "Quick: Fix "select" pattern to update data used for GC maps. DO NOT MERGE" This reverts commit fad2cbf97c71b9742ccd88cc1a5ba13fa918e677. Change-Id: I175dd9e49014b71a300d987678032bd624a99cf1
|
fad2cbf97c71b9742ccd88cc1a5ba13fa918e677 |
|
25-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Fix "select" pattern to update data used for GC maps. DO NOT MERGE Follow-up to https://android-review.googlesource.com/143222 (cherry picked from commit 6e07183e822a32856da9eb60006989496e06a9cc) Change-Id: I916743c845d9568063cd6a4b2ef71e9cbc43dee8
|
7cc8f9aa1349fd6cb0814a653ee2d1164a7fb9f7 |
|
20-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Create GC map based on compiler data. DO NOT MERGE The Quick compiler and verifier sometimes disagree on dalvik register types (fp/core/ref) for 0/null constants and merged registers involving 0/null constants. Since the verifier is more lenient it can mark a register as a reference for GC where Quick considers it a floating point register or a dead register (which would have a ref/fp conflict if not dead). If the compiler used an fp register to hold the zero value, the core register or stack location used by GC based on the verifier data can hold an invalid value. Previously, as a workaround we stored the fp zero value also in the stack location or core register where GC would look for it. This wasn't precise and may have missed some cases. To fix this properly, we now generate GC maps based on the compiler's notion of references if register promotion is enabled. Bug: https://code.google.com/p/android/issues/detail?id=147187 (cherry picked from commit 767c752fddc64e280dba507457e4f06002b5f678) Change-Id: Id75428fd0a2f6bdd2ccb20ce75cdeab01150e455
|
3d21bdf8894e780d349c481e5c9e29fe1556051c |
|
22-Apr-2015 |
Mathieu Chartier <mathieuc@google.com> |
Move mirror::ArtMethod to native Optimizing + quick tests are passing, devices boot. TODO: Test and fix bugs in mips64. Saves 16 bytes per most ArtMethod, 7.5MB reduction in system PSS. Some of the savings are from removal of virtual methods and direct methods object arrays. Bug: 19264997 (cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33) Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d
Fix some ArtMethod related bugs Added root visiting for runtime methods, not currently required since the GcRoots in these methods are null. Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes --trace run-tests 005, 044. Fixed optimizing compiler bug where we used a normal stack location instead of double on ARM64, this fixes the debuggable tests. TODO: Fix JDWP tests. Bug: 19264997 Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3
ART: Fix casts for 64-bit pointers on 32-bit compiler. Bug: 19264997 Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457
Fix JDWP tests after ArtMethod change Fixes Throwable::GetStackDepth for exception event detection after internal stack trace representation change. Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of proxy method. Bug: 19264997 Change-Id: I363e293796848c3ec491c963813f62d868da44d2
Fix accidental IMT and root marking regression Was always using the conflict trampoline. Also included fix for regression in GC time caused by extra roots. Most of the regression was IMT. Fixed bug in DumpGcPerformanceInfo where we would get SIGABRT due to detached thread. EvaluateAndApplyChanges: From ~2500 -> ~1980 GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots Bug: 19264997 Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0
Fix bogus image test assert Previously we were comparing the size of the non moving space to size of the image file. Now we properly compare the size of the image space against the size of the image file. Bug: 19264997 Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a
[MIPS64] Fix art_quick_invoke_stub argument offsets. ArtMethod reference's size got bigger, so we need to move other args and leave enough space for ArtMethod* and 'this' pointer. This fixes mips64 boot. Bug: 19264997 Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
|
41b175aba41c9365a1c53b8a1afbd17129c87c14 |
|
19-May-2015 |
Vladimir Marko <vmarko@google.com> |
ART: Clean up arm64 kNumberOfXRegisters usage. Avoid undefined behavior for arm64 stemming from 1u << 32 in loops with upper bound kNumberOfXRegisters. Create iterators for enumerating bits in an integer either from high to low or from low to high and use them for <arch>Context::FillCalleeSaves() on all architectures. Refactor runtime/utils.{h,cc} by moving all bit-fiddling functions to runtime/base/bit_utils.{h,cc} (together with the new bit iterators) and all time-related functions to runtime/base/time_utils.{h,cc}. Improve test coverage and fix some corner cases for the bit-fiddling functions. Bug: 13925192 (cherry picked from commit 80afd02024d20e60b197d3adfbb43cc303cf29e0) Change-Id: I905257a21de90b5860ebe1e39563758f721eab82
|
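The shift problem described in the commit above can be illustrated with a small sketch. In C++, shifting a 32-bit value by 32 or more is undefined behavior, so a loop that tests `1u << i` with an upper bound above 31 is unsafe. The helpers below are hypothetical (the names `SetBitIndexesLowToHigh` and `SetBitIndexesHighToLow` are not ART's bit iterator API); they only show one way to enumerate set bits in either direction without ever shifting by the full width.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not ART's iterator API: collects the indexes of the
// set bits in `mask`, lowest first. The mask is shifted right by one per
// step, so the shift count never reaches the width of the type.
std::vector<int> SetBitIndexesLowToHigh(uint32_t mask) {
  std::vector<int> indexes;
  for (int index = 0; mask != 0u; ++index, mask >>= 1) {
    if ((mask & 1u) != 0u) {
      indexes.push_back(index);
    }
  }
  return indexes;
}

// Hypothetical helper: same enumeration, highest bit first. The shift count
// stays in the range [0, 31], which is well defined for a 32-bit value.
std::vector<int> SetBitIndexesHighToLow(uint32_t mask) {
  std::vector<int> indexes;
  for (int index = 31; index >= 0; --index) {
    if (((mask >> index) & 1u) != 0u) {
      indexes.push_back(index);
    }
  }
  return indexes;
}
```

A caller that fills callee-save slots could walk the register mask with either helper, depending on the order in which the registers were spilled.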
848f70a3d73833fc1bf3032a9ff6812e429661d9 |
|
15-Jan-2014 |
Jeff Hao <jeffhao@google.com> |
Replace String CharArray with internal uint16_t array. Summary of high level changes:
- Adds compiler inliner support to identify string init methods
- Adds compiler support (quick & optimizing) with new invoke code path that calls method off the thread pointer
- Adds thread entrypoints for all string init methods
- Adds map to verifier to log when receiver of string init has been copied to other registers. Used by compiler and interpreter
Change-Id: I797b992a8feb566f9ad73060011ab6f51eb7ce01
|
0d22184ec9e5b1e958c031ac92c7f053de3a13a2 |
|
27-Apr-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Revert "[optimizing] Replace FP divide by power of 2"" This reverts commit 067cae2c86627d2edcf01b918ee601774bc76aeb. Change-Id: Iaaa8772500ea7d3dce6ae0829dc0dc3bbc9c14ca
|
5ea536aa4a6414db01beaf6f8bd8cb9adc5cfc92 |
|
20-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
Remove ArtMethod* parameter from dex cache entry points. Load the ArtMethod* using an optimized stack walk instead. This reduces the size of the generated code. Three of the entry points are called only from a slow-path and the fourth (InitializeTypeAndVerifyAccess) is rare and already slow enough that the one or two extra loads (depending on whether we already have the ArtMethod* in a register) are insignificant. And as we're starting to use PC-relative addressing of the dex cache arrays (already done by Quick for the boot image), having the ArtMethod* in a register becomes less likely anyway. Change-Id: Ib19b9d204e355e13bf386662a8b158178bf8ad28
|
2cebb24bfc3247d3e9be138a3350106737455918 |
|
22-Apr-2015 |
Mathieu Chartier <mathieuc@google.com> |
Replace NULL with nullptr Also fixed some lines that were too long, and a few other minor details. Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb
|
fac10700fd99516e8a14f751fe35553021ce6982 |
|
22-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Remove broken Mir2Lir::LocToRegClass(). Its use in intrinsics has been bogus. In all other instances it's been used under the assumption that the inferred type matches the return type of associated calls. However, if the type inference identifies a type mismatch, the assumption doesn't hold and there isn't necessarily a valid value that the function could reasonably return. Bug: 19918641 Change-Id: I050934e6f9eb00427d0b888ee29ae9eeb509bb3f
|
1961b609bfefaedb71cee3651c4f931cc3e7393d |
|
08-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: PC-relative loads from dex cache arrays on x86. Rewrite all PC-relative addressing on x86 and implement PC-relative loads from dex cache arrays. Don't adjust the base to point to the start of the method, let it point to the anchor, i.e. the target of the "call +0" insn. Change-Id: Ic22544a8bc0c5e49eb00a75154dc8f3ead816989
|
1109fb3cacc8bb667979780c2b4b12ce5bb64549 |
|
07-Apr-2015 |
David Srbecky <dsrbecky@google.com> |
Implement CFI for Quick. CFI is necessary for stack unwinding in gdb, lldb, and libunwind. Change-Id: Ic3b84c9dc91c4bae80e27cda02190f3274e95ae8
|
8c57831b2b07185ee1986b9af68a351e1ca584c3 |
|
07-Apr-2015 |
David Srbecky <dsrbecky@google.com> |
Remove the old CFI infrastructure. Change-Id: I12a17a8a1c39ffccaa499c328ebac36e4d74dc4e
|
cc23481b66fd1f2b459d82da4852073e32f033aa |
|
07-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
Promote pointer to dex cache arrays on arm. Do the use-count analysis on temps (ArtMethod* and the new PC-relative temp) in Mir2Lir, rather than MIRGraph. MIRGraph isn't really supposed to know how the ArtMethod* is used by the backend. Change-Id: Iaf56a46ae203eca86281b02b54f39a80fe5cc2dd
|
3477307fdf93a1ef9a80d4e096125705c47e8024 |
|
07-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Use PC-relative dex cache array loads for SGET/SPUT. Change-Id: I890284b73f69120ada5cf9b9ef4a717af3273cd2
|
20f85597828194c12be10d3a927999def066555e |
|
19-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Fixed layout for dex caches in boot image. Define a fixed layout for dex cache arrays (type, method, string and field arrays) for dex caches in the boot image. This gives those arrays fixed offsets from the boot image code and allows PC-relative addressing of their elements. Use the PC-relative load on arm64 for relevant instructions, i.e. invoke-static, invoke-direct, const-string, const-class, check-cast and instance-of. This reduces the arm64 boot.oat on Nexus 9 by 1.1MiB. This CL provides the infrastructure and shows on the arm64 the gains that we can achieve by having fixed dex cache arrays' layout. To fully use this for the boot images, we need to implement the PC-relative addressing for other architectures. To achieve similar gains for apps, we need to move the dex cache arrays to a .bss section of the oat file. These changes will be implemented in subsequent CLs. (Also remove some compiler_driver.h dependencies to reduce incremental build times.) Change-Id: Ib1859fa4452d01d983fd92ae22b611f45a85d69b
|
6e07183e822a32856da9eb60006989496e06a9cc |
|
25-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Fix "select" pattern to update data used for GC maps. Follow-up to https://android-review.googlesource.com/143222 Change-Id: I1c12af9a19f76e64fd209f6cc2eaec5587b3083b
|
f6737f7ed741b15cfd60c2530dab69f897540735 |
|
23-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Clean up Mir2Lir codegen. Clean up WrapPointer()/UnwrapPointer() and OpPcRelLoad(). Change-Id: I1a91f01e1e779599c77f3f6efcac2a6ad34629cf
|
767c752fddc64e280dba507457e4f06002b5f678 |
|
20-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Create GC map based on compiler data. The Quick compiler and verifier sometimes disagree on dalvik register types (fp/core/ref) for 0/null constants and merged registers involving 0/null constants. Since the verifier is more lenient it can mark a register as a reference for GC where Quick considers it a floating point register or a dead register (which would have a ref/fp conflict if not dead). If the compiler used an fp register to hold the zero value, the core register or stack location used by GC based on the verifier data can hold an invalid value. Previously, as a workaround we stored the fp zero value also in the stack location or core register where GC would look for it. This wasn't precise and may have missed some cases. To fix this properly, we now generate GC maps based on the compiler's notion of references if register promotion is enabled. Bug: https://code.google.com/p/android/issues/detail?id=147187 Change-Id: Id3a2f863b16bdb8969df7004c868773084aec421
|
0b40ecf156e309aa17c72a28cd1b0237dbfb8746 |
|
20-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Clean up slow paths. Change-Id: I278d42be77b02778c4a419ae9024b37929915b64
|
22fe45de11ed7afdf21400d2de3abd23f3a62800 |
|
18-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Eliminate check-cast guaranteed by instance-of. Eliminate check-cast if the result of an instance-of with the very same type on the same value is used to branch to the check-cast's block or a dominator of it. Note that there already exists a verifier-based elimination of check-cast but it excludes check-cast on interfaces. This new optimization works for interface types and, since it's GVN-based, it can better recognize when the same reference is used for instance-of and check-cast. Change-Id: Ib315199805099d1cb0534bb4a90dc51baa409685
|
80b96d1a76790527f72a660ac03d9c215eed17ce |
|
19-Feb-2015 |
Vladimir Marko <vmarko@google.com> |
Replace a few std::vector with ArenaVector in Mir2Lir. Change-Id: I7867d60afc60f57cdbbfd312f02883854d65c805
|
b666f4805c8ae707ea6fd7f6c7f375e0b000dba8 |
|
18-Feb-2015 |
Mathieu Chartier <mathieuc@google.com> |
Move arenas into runtime Moved arena pool into the runtime. Motivation: Allow GC to use arena allocators, recycle arena pool for linear alloc. Bug: 19264997 Change-Id: I8ddbb6d55ee923a980b28fb656c758c5d7697c2f
|
6ce3eba0f2e6e505ed408cdc40d213c8a512238d |
|
16-Feb-2015 |
Vladimir Marko <vmarko@google.com> |
Add suspend checks to special methods. Generate suspend checks at the beginning of special methods. If we need to call to runtime, go to the slow path where we create a simplified but valid frame, spill all arguments, call art_quick_test_suspend, restore necessary arguments and return back to the fast path. This keeps the fast path overhead to a minimum. Bug: 19245639 Change-Id: I3de5aee783943941322a49c4cf2c4c94411dbaa2
|
e4fcc5ba2284c201c022b52d27f7a1201d696324 |
|
13-Feb-2015 |
Vladimir Marko <vmarko@google.com> |
Clean up Scoped-/ArenaAllocator array allocations. Change-Id: Id718f8a4450adf1608306286fa4e6b9194022532
|
72f53af0307b9109a1cfc0671675ce5d45c66d3a |
|
12-Nov-2014 |
Chao-ying Fu <chao-ying.fu@intel.com> |
ART: Remove MIRGraph::dex_pc_to_block_map_ This patch removes MIRGraph::dex_pc_to_block_map_, adds a local variable dex_pc_to_block_map inside MIRGraph::InlineMethod(), and updates several functions to pass dex_pc_to_block_map. The goal is to limit the scope of dex_pc_to_block_map and the usage of FindBlock, so that various compiler optimizations cannot rely on dex pc to look up basic blocks to avoid duplicated dex pc issues. Also, this patch changes quick targets to use successor blocks for switch case target generation at Mir2Lir::InstallSwitchTables(). Change-Id: I9f571efebd2706b4e1606279bd61f3b406ecd1c4 Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
|
9c462086269324350516b3394d478f1d71a4b5d1 |
|
27-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Even more Quick cleanup Remove Backend. Change-Id: I247cc65ccda6a362ba1a8f5e73e7f12ecd980a87
|
0b9203e7996ee1856f620f95d95d8a273c43a3df |
|
23-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Some Quick cleanup Make several fields const in CompilationUnit. May benefit some Mir2Lir code that repeats tests, and in general immutability is good. Remove compiler_internals.h and refactor some other headers to reduce overly broad imports (and thus forced recompiles on changes). Change-Id: I898405907c68923581373b5981d8a85d2e5d185a
|
f681570077563bb529a30f9e7c572b837cecfb83 |
|
20-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Make some helpers non-virtual in Mir2Lir These don't need to be virtual. Change-Id: Idca3c0a4e8b5e045d354974bd993492d6c0e70ba
|
d500b53ff8742f76b63c9f7593082d9e8114b85f |
|
17-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Some Quick cleanup Move some definitions around. In case a method is already virtual, avoid instruction-set tests. Change-Id: I8d98f098e55ade1bc0cfa32bb2aad006caccd07d
|
7e499925f8b4da46ae51040e9322690f3df992e6 |
|
06-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Remove LowestSetBit and IsPowerOfTwo Remove those functions from Mir2Lir and replace with functionality from utils.h. Change-Id: Ieb67092b22d5d460b5241c7c7931c15b9faf2815
|
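For readers unfamiliar with the two helpers named in the commit above, here is a minimal sketch of what such functions typically compute. The names `IsPowerOfTwoSketch` and `LowestSetBitSketch` are made up for illustration, and the handling of zero is an assumption; the actual functions in utils.h may differ in signature and edge cases.

```cpp
#include <cstdint>

// Hypothetical stand-in for a power-of-two test. A power of two has exactly
// one set bit, so clearing the lowest set bit (x & (x - 1)) leaves zero.
// Assumption: zero is treated as "not a power of two" here.
constexpr bool IsPowerOfTwoSketch(uint32_t x) {
  return x != 0u && (x & (x - 1u)) == 0u;
}

// Hypothetical stand-in for a lowest-set-bit query. Returns the index of the
// lowest set bit, or -1 when no bit is set (an assumption of this sketch).
int LowestSetBitSketch(uint32_t x) {
  if (x == 0u) {
    return -1;
  }
  int index = 0;
  while ((x & 1u) == 0u) {
    x >>= 1;
    ++index;
  }
  return index;
}
```

A code generator might use the pair together, for example to turn a multiplication by 8 into a left shift by `LowestSetBitSketch(8)`, which is 3.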
1cc7dbabd03e0a6c09d68161417a21bd6f9df371 |
|
18-Dec-2014 |
Andreas Gampe <agampe@google.com> |
ART: Reorder entrypoint argument order Shuffle the ArtMethod* referrer backwards for easier removal. Clean up ARM & MIPS assembly code. Change some macros to make future changes easier. Change-Id: Ie2862b68bd6e519438e83eecd9e1611df51d7945
|
e21dc3db191df04c100620965bee4617b3b24397 |
|
09-Dec-2014 |
Andreas Gampe <agampe@google.com> |
ART: Swap-space in the compiler Introduce a swap-space and corresponding allocator to transparently switch native allocations to memory backed by a file. Bug: 18596910 (cherry picked from commit 62746d8d9c4400e4764f162b22bfb1a32be287a9) Change-Id: I131448f3907115054a592af73db86d2b9257ea33
|
8b858e16563ebf8e522df026a6ab409f1bd9b3de |
|
27-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Quick: Redefine the notion of back-edges. Redefine a back-edge to really mean an edge to a loop head instead of comparing instruction offsets. Generate suspend checks also on fall-through to a loop head; insert an extra GOTO for these edges. Add suspend checks to fused cmp instructions. Rewrite suspend check elimination to track whether there is an invoke on each path from the loop head to a given back edge, instead of using domination info to look for a basic block with invoke that must be on each path. Ignore invokes to intrinsics and move the optimization to its own pass. The new loops in 109-suspend-check should prevent intrinsics and fused cmp-related regressions. Bug: 18522004 Change-Id: I96ac818f76ccf9419a6e70e9ec00555f9d487a9e
|
717a3e447c6f7a922cf9c3efe522747a187a045d |
|
13-Nov-2014 |
Serguei Katkov <serguei.i.katkov@intel.com> |
Re-factor Quick ABI support Every architecture must now provide a mapper between VR parameters and physical registers. Additionally, an architecture can provide a bulk copy helper for the GenDalvikArgs utility. Everything else becomes common code: GetArgMappingToPhysicalReg, GenDalvikArgsNoRange, GenDalvikArgsRange, FlushIns. The mapper now uses the shorty representation of input parameters. This is required because locations are not enough to detect the type of a parameter (fp or core). For details see https://android-review.googlesource.com/#/c/113936/. Change-Id: Ie762b921e0acaa936518ee6b63c9a9d25f83e434 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
7ab2fce83cd72c0963128b098a78606e77ea15d5 |
|
28-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Refactor handling of conditional branches with known result. Detect IF_cc and IF_ccZ instructions with known results in the basic block optimization phase (instead of the codegen phase) and replace them with GOTO/NOP. Kill blocks that are unreachable as a result. Change-Id: I169c2fa6f1e8af685f4f3a7fe622f5da862ce329
|
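The folding step described in the commit above can be sketched as follows. This is an illustration only: `Cond`, `KnownValue`, `Folded` and `FoldBranch` are hypothetical names, not the MIR data structures, and the sketch does not cover the removal of blocks that become unreachable. A branch whose operands are both known constants becomes a GOTO when the condition holds and a NOP when it does not; for the IF_ccZ forms the second operand is the constant zero.

```cpp
#include <cstdint>

// Hypothetical condition codes for IF_cc / IF_ccZ style branches.
enum class Cond { kEq, kNe, kLt, kGe, kGt, kLe };

// Hypothetical result of folding: taken branch, fall-through, or unchanged.
enum class Folded { kGoto, kNop, kKeep };

// Hypothetical model of a register whose value may be a known constant.
struct KnownValue {
  bool is_constant;
  int32_t value;
};

// Evaluates the condition on two known 32-bit values.
bool Evaluate(Cond cond, int32_t lhs, int32_t rhs) {
  switch (cond) {
    case Cond::kEq: return lhs == rhs;
    case Cond::kNe: return lhs != rhs;
    case Cond::kLt: return lhs < rhs;
    case Cond::kGe: return lhs >= rhs;
    case Cond::kGt: return lhs > rhs;
    case Cond::kLe: return lhs <= rhs;
  }
  return false;  // Not reached for valid Cond values.
}

// Folds a conditional branch when both operands are known constants.
Folded FoldBranch(Cond cond, KnownValue lhs, KnownValue rhs) {
  if (!lhs.is_constant || !rhs.is_constant) {
    return Folded::kKeep;  // Result depends on run-time values.
  }
  return Evaluate(cond, lhs.value, rhs.value) ? Folded::kGoto : Folded::kNop;
}
```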
6af820639c74e769ffc1f54930f6ebc11364f894 |
|
26-Nov-2014 |
Yevgeny Rouban <yevgeny.y.rouban@intel.com> |
ART: x86 specific clearing higher bits when converting long to int The following problem description is taken from https://android-review.googlesource.com/107261 If destination and source of long-to-int is the same physical register on 64-bit then we do not emit any instructions but consider that destination is a 32-bit view of source register. As a result high part contains garbage. If the destination is used later as index to array access then this garbage is used in computation of address because address is 64-bit. For all other cases garbage is just ignored. A generic solution (113023) for all hw platforms was suggested but rejected later for the sake of HW specific solution: https://android-review.googlesource.com/113023 https://android-review.googlesource.com/114436 This patch is a rework of patch 113023 to stick with x86_64 specific changes: for 64-bit target this patch forces generating reg-to-reg copy if the src and dest are the same physical registers. This makes the higher bits be zeroed by 32-bit move instruction. Change-Id: Id29af839506ff9319ffba08b2e86e240fef4dafd Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com> Signed-off-by: Yevgeny Rouban <yevgeny.y.rouban@intel.com>
|
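The failure mode in the commit above can be modeled in a few lines. The sketch below is not compiler code; `ElementAddress` and `ClearHighBits` are hypothetical names, and the 64-bit register is modeled as a plain `uint64_t`. It shows why stale upper bits matter once the value feeds a 64-bit address computation, and that truncating to 32 bits (which is what a 32-bit register-to-register move does on x86-64, since it zero-extends into the full register) removes them.

```cpp
#include <cstdint>

// Hypothetical model of an array element address computed from a full
// 64-bit index register: base + (index << scale).
uint64_t ElementAddress(uint64_t base, uint64_t index_reg, unsigned scale) {
  return base + (index_reg << scale);
}

// Hypothetical model of a 32-bit move: keeps the low 32 bits and zeroes the
// upper 32 bits of the register.
uint64_t ClearHighBits(uint64_t reg) {
  return static_cast<uint32_t>(reg);
}
```

With a register holding 0x0000000100000002 (the long value, whose low half is the int 2), the unclear register yields an address far from the array, while the cleared one yields base + 8 for 4-byte elements.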
743b98cd3d7db1cfd6b3d7f7795e8abd9d07a42d |
|
24-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Skip null check in MarkGCCard() for known non-null values. Use GVN's knowledge of non-null values to set a new MIR flag for IPUT/SPUT/APUT to skip the value null check. Change-Id: I97a8d1447acb530c9bbbf7b362add366d1486ee1
|
da96aeda912ff317de2c41e5a49bd244427238ac |
|
27-Oct-2014 |
Chao-ying Fu <chao-ying.fu@intel.com> |
ART: Generate switch targets from successor blocks This patch relies on the successor blocks to generate switch targets in GenSmallPackedSwitch and GenSmallSparseSwitch for all quick targets. In x86, we create a new packed switch table by storing basic block ids instead of dex offsets, and we override MarkPackedCaseLabels and InsertCaseLabel to avoid calling FindBlock. Change-Id: Ibb5983db582f0965aba787b520bd106522453564 Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
|
807140048f82a2b87ee5bcf337f23b6a3d1d5269 |
|
21-Nov-2014 |
Mathieu Chartier <mathieuc@google.com> |
Add fast string sharpening String sharpening changes const strings to PC relative loads instead of always going through the dex cache. This saves code size and probably improves performance slightly.
Before: 49602992 system@framework@boot.oat
After: 49385904 system@framework@boot.oat
Pre-cursor to removing dex_cache_strings_ field from ArtMethod. Bug: 17643507 Change-Id: I1787f48774631eee0accafeea257aa8d0e91e8d6
|
bf535be514570fc33fc0a6347a87dcd9097d9bfd |
|
19-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Add card mark to filled-new-array. Bug: 18032332 Change-Id: I35576b27f9115e4d0b02a11afc5e483b9e93a04a
|
d582fa4ea62083a7598dded5b82dc2198b3daac7 |
|
06-Nov-2014 |
Ian Rogers <irogers@google.com> |
Instruction set features for ARM64, MIPS and X86. Also, refactor how feature strings are handled so they are additive or subtractive. Make MIPS have features for FPU 32-bit and MIPS v2. Use in the quick compiler rather than #ifdefs that wouldn't have worked in cross-compilation. Add SIMD features for x86/x86-64 proposed in: https://android-review.googlesource.com/#/c/112370/ Bug: 18056890 Change-Id: Ic88ff84a714926bd277beb74a430c5c7d5ed7666
|
675e09b2753c2fcd521bd8f0230a0abf06e9b0e9 |
|
23-Oct-2014 |
Ningsheng Jian <ningsheng.jian@arm.com> |
ARM: Strength reduction for floating-point division For floating-point division by power of two constants, generate multiplication by the reciprocal instead. Change-Id: I39c79eeb26b60cc754ad42045362b79498c755be
|
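The strength reduction in the commit above relies on the fact that dividing by a power of two and multiplying by its reciprocal give the same correctly rounded result, provided the reciprocal is exactly representable. A minimal sketch of such a check is shown below. `ReciprocalIfPowerOfTwo` is a hypothetical name, and the exact conditions used by the ARM backend are not stated in the commit message; this version is deliberately conservative and rejects reciprocals that are not normal floats.

```cpp
#include <cmath>

// Hypothetical check, not the backend's actual code: returns true and stores
// 1/divisor when `divisor` is a finite power of two (positive or negative)
// whose reciprocal is a normal float, so x / divisor can be replaced by
// x * (*reciprocal) without changing the result.
bool ReciprocalIfPowerOfTwo(float divisor, float* reciprocal) {
  if (!std::isfinite(divisor)) {
    return false;
  }
  int exponent = 0;
  // frexp splits divisor into mantissa * 2^exponent with |mantissa| in
  // [0.5, 1), or 0 for a zero input. Powers of two have |mantissa| == 0.5.
  float mantissa = std::frexp(divisor, &exponent);
  if (std::fabs(mantissa) != 0.5f) {
    return false;
  }
  float r = 1.0f / divisor;
  if (!std::isnormal(r)) {
    return false;  // Conservative: skip overflowing or subnormal reciprocals.
  }
  *reciprocal = r;
  return true;
}
```

For a divisor of 8.0f the helper yields 0.125f, so `x / 8.0f` can be emitted as `x * 0.125f`; for 3.0f it declines, because 1/3 is not exactly representable.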
080dd413e133ae357ab9572d924f7a884315d535 |
|
05-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Clean up arena objects in Mir2Lir. Change-Id: I93fca37be2ae100ddebf80b6ba7a561b187e8886
|
785d2f2116bb57418d81bb55b55a087afee11053 |
|
04-Nov-2014 |
Andreas Gampe <agampe@google.com> |
ART: Replace COMPILE_ASSERT with static_assert (compiler) Replace all occurrences of COMPILE_ASSERT in the compiler tree. Change-Id: Icc40a38c8bdeaaf7305ab3352a838a2cd7e7d840
|
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f |
|
31-Oct-2014 |
Ian Rogers <irogers@google.com> |
Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags. Fix associated errors about unused parameters and implicit sign conversions. For sign conversion this was largely in the area of enums, so add ostream operators for the affected enums and fix tools/generate-operator-out.py. Tidy arena allocation code and arena allocated data types, rather than fixing new and delete operators. Remove dead code. Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
|
5667fdbb6e441dee7534ade18b628ed396daf593 |
|
23-Oct-2014 |
Zheng Xu <zheng.xu@arm.com> |
ARM: Use hardfp calling convention between java to java call. This patch defaults to using the hardfp calling convention. Softfp can be enabled by setting kArm32QuickCodeUseSoftFloat to true. We get about -1 ~ +5% performance improvement with different benchmark tests. Hopefully, we should be able to get more performance by addressing the remaining TODOs, as some parts of the code keep the original assumption, which is not optimal.
DONE:
1. Interpreter to quick code
2. Quick code to interpreter
3. Transition assembly and callee-saves
4. Trampoline (generic jni, resolution, invoke with access check, etc.)
5. Pass fp arg reg following aapcs (gpr and stack do not follow aapcs)
6. Quick helper assembly routines to handle ABI differences
7. Quick code method entry
8. Quick code method invocation
9. JNI compiler
TODO:
10. Rework ArgMap, FlushIn, GenDalvikArgs and affected common code.
11. Rework CallRuntimeHelperXXX().
Change-Id: I9965d8a007f4829f2560b63bcbbde271bdcf6ec2
|
5c5676b26a08454b3f0133783778991bbe5dd681 |
|
30-Sep-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
ART: Add div/rem zero check elimination flag Just as with other throwing bytecodes, it is possible to prove in some cases that a divide/remainder won't throw ArithmeticException. For example, in case two divides with same denominator are in order, then provably the second one cannot throw if the first one did not. This patch adds the elimination flag and updates the signature of several Mir2Lir methods to take the instruction optimization flags into account. Change-Id: I0b078cf7f29899f0f059db1f14b65a37444b84e8 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
d8c3e3608a7b47e82186e4f8118541ef06d9eab2 |
|
08-Oct-2014 |
Alexei Zavjalov <alexei.zavjalov@intel.com> |
ART: X86: GenLongArith should handle overlapped VRs When the src and dest VRs overlap in a call to GenLongArith, registers may be used incorrectly. The solution is to map src to a physical reg and work with this reg instead of mem. Renamed BadOverlap() to PartiallyIntersects() for consistency. Change-Id: Ia3fc7f741f0a92556e1b2a1b084506662ef04c9d Signed-off-by: Katkov, Serguei I <serguei.i.katkov@intel.com> Signed-off-by: Alexei Zavjalov <alexei.zavjalov@intel.com>
|
832336b3c9eb892045a8de1bb12c9361112ca3c5 |
|
09-Oct-2014 |
Ian Rogers <irogers@google.com> |
Don't copy fill array data to quick literal pool. Currently quick copies the fill array data from the dex file to the literal pool. It then has to go through hoops to pass this PC relative address down to out-of-line code. Instead, pass the offset of the table to the out-of-line code and use the CodeItem data associated with the ArtMethod. This reduces the size of oat code while greatly simplifying it. Unify the FillArrayData implementation in quick, portable and the interpreters. Change-Id: I9c6971cf46285fbf197856627368c0185fdc98ca
|
7c02e918e752ab36f0b6cab7528f10c0cf55a4ee |
|
03-Oct-2014 |
buzbee <buzbee@google.com> |
Quick compiler: Fix ambiguous LoadValue() Internal b/17790197 & hat tip to Stephen Kyle The following custom-edited dex program demonstrated incorrect code generation caused by type confusion. In the example, the constant held in v0 is used in both float and int contexts, and the register class gets confused at the if-eq.
    .method private static getInt()I
        .registers 4
        const/16 v0, 100
        const/4 v1, 1
        const/4 v2, 7
        :loop
        if-eq v2, v0, :done
        add-int v2, v2, v1
        goto :loop
        :done
        add-float v3, v0, v1
        return v2
    .end method
The bug was introduced in c/96499, "Quick compiler: reference cleanup" That CL created a convenience variant of LoadValue which selected the target register type based on the type of the RegLocation. It should not have done so. The type of a RegLocation is the compiler's best guess of the Dalvik type - and Dalvik allows constants to be used in multiple type contexts. All code generation utilities must specify desired register class based on the capabilities of the instructions to be emitted. In the failing case, OpCmpImmBranch (and GenCompareZeroAndBranch) will be using core registers, so the LoadValue must specify either kCoreReg or kRefReg. The CL deletes the dangerous LoadValue() variant. Change-Id: Ie4ec6e51b19676dbbb9628c72c8b3473a419e7ec
|
f4da675bbc4615c5f854c81964cac9dd1153baea |
|
01-Aug-2014 |
Vladimir Marko <vmarko@google.com> |
Implement method calls using relative BL on ARM. Store the linker patches with each CompiledMethod instead of keeping them in CompilerDriver. Reorganize oat file creation to apply the patches as we're writing the method code. Add framework for platform-specific relative call patches in the OatWriter. Implement relative call patches for ARM. Change-Id: Ie2effb3d92b61ac8f356140eba09dc37d62290f8
|
e39c54ea575ec710d5e84277fcdcc049f8acb3c9 |
|
22-Sep-2014 |
Vladimir Marko <vmarko@google.com> |
Deprecate GrowableArray, use ArenaVector instead. Purge GrowableArray from Quick and Portable. Remove GrowableArray<T>::Iterator. Change-Id: I92157d3a6ea5975f295662809585b2dc15caa1c6
|
4e67841e99e4a206133e7010653ccd132682296a |
|
09-Sep-2014 |
Mathieu Chartier <mathieuc@google.com> |
Change Reference.get() intrinsic to Reference.getReferent(). The reference intrinsic was incorrectly inlining PhantomReference.get(). We now get around this by adding a layer of indirection. Reference.get() now calls getReferent() which is intrinsified and inlined. Requires: https://android-review.googlesource.com/#/c/107100/ Bug: 17429865 (cherry picked from commit cd48f2d86197d4fe87cc88077bc4af5ba66e5295) Change-Id: Ie91e70abf43cedf3c707c7bb8a5059e19d2a2577
|
cd48f2d86197d4fe87cc88077bc4af5ba66e5295 |
|
09-Sep-2014 |
Mathieu Chartier <mathieuc@google.com> |
Change Reference.get() intrinsic to Reference.getReferent(). The reference intrinsic was incorrectly inlining PhantomReference.get(). We now get around this by adding a layer of indirection. Reference.get() now calls getReferent() which is intrinsified and inlined. Requires: https://android-review.googlesource.com/#/c/107100/ Bug: 17429865 Change-Id: Ie91e70abf43cedf3c707c7bb8a5059e19d2a2577
|
8d0d03e24325463f0060abfd05dba5598044e9b1 |
|
07-Jun-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
ART: Change temporaries to positive names Changes compiler temporaries to have positive names. The numbering now puts them above the code VRs (locals + ins, in that order). The patch also introduces APIs to query the number of temporaries, locals and ins. The compiler temp infrastructure suffered from several issues which are also addressed by this patch:
- There is no longer a queue of compiler temps. This would be polluted with Method* when post opts were called multiple times.
- Sanity checks have been added to allow requesting of temps from BE and to prevent temps after frame is committed.
- None of the structures holding temps can overflow because they are allocated to allow holding maximum temps. Thus temps can be requested by BE with no problem.
- Since the queue of compiler temps is no longer maintained, it is no longer possible to refer to a temp that has invalid ssa (because it was requested before ssa was run).
- The BE can now request temps after all ME allocations and it is guaranteed to actually receive them.
- ME temps are now treated like normal VRs in all cases with no special handling. Only the BE temps are handled specially because there are no references to them from MIRs.
- Deprecated and removed several fields in CompilationUnit that saved register information and updated callsites to call the new interface from MIRGraph.
Change-Id: Ia8b1fec9384a1a83017800a59e5b0498dfb2698c Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
|
37f05ef45e0393de812d51261dc293240c17294d |
|
17-Jul-2014 |
Fred Shih <ffred@google.com> |
Reduced memory usage of primitive fields smaller than 4-bytes Reduced memory used by byte and boolean fields from 4 bytes down to a single byte and shorts and chars down to two bytes. Fields are now arranged as Reference followed by decreasing component sizes, with fields shuffled forward as needed. Bug: 8135266 Change-Id: I65eaf31ed27e5bd5ba0c7d4606454b720b074752
|
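A minimal sketch of the ordering this change describes: references first, then fields by decreasing size, with smaller fields moved forward to fill alignment gaps. The `Field` struct, `LayoutFields`, and the 4-byte reference size are assumptions made for illustration; this is not the class linker code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Field {
  std::string name;
  std::size_t size;    // 1, 2, 4 or 8 bytes; references assumed to be 4
  bool is_reference;
  std::size_t offset;  // filled in by LayoutFields
};

// Assigns offsets starting at `start` and returns the end offset.
std::size_t LayoutFields(std::vector<Field>& fields, std::size_t start) {
  // References first, then decreasing size.
  std::stable_sort(fields.begin(), fields.end(),
                   [](const Field& a, const Field& b) {
                     if (a.is_reference != b.is_reference) return a.is_reference;
                     return a.size > b.size;
                   });
  std::vector<bool> placed(fields.size(), false);
  std::size_t offset = start;
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (placed[i]) continue;
    // While field i is misaligned, shuffle a later, smaller field forward.
    while (offset % fields[i].size != 0) {
      std::size_t gap = fields[i].size - offset % fields[i].size;
      bool filled = false;
      for (std::size_t j = i + 1; j < fields.size() && !filled; ++j) {
        if (!placed[j] && fields[j].size <= gap && offset % fields[j].size == 0) {
          fields[j].offset = offset;
          offset += fields[j].size;
          placed[j] = true;
          filled = true;
        }
      }
      if (!filled) ++offset;  // nothing fits here: one byte of padding, retry
    }
    fields[i].offset = offset;
    offset += fields[i].size;
    placed[i] = true;
  }
  return offset;
}

// Looks up the assigned offset of a field by name.
std::size_t OffsetOf(const std::vector<Field>& fields, const std::string& name) {
  for (const Field& f : fields) {
    if (f.name == name) return f.offset;
  }
  return static_cast<std::size_t>(-1);
}
```

In this sketch a byte field costs one byte and an int is pulled forward to fill the 4-byte hole between a reference and a long.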
53c913bb71b218714823c8c87a1f92830c336f61 |
|
13-Aug-2014 |
Andreas Gampe <agampe@google.com> |
ART: Clean up compiler Clean up the compiler: less extern functions, dis-entangle compilers, hide some compiler specifics, lower global includes. Change-Id: Ibaf88d02505d86994d7845cf0075be5041cc8438
|
9a8a506b1cd639ad4126c19530cd206d8d3923c3 |
|
07-Aug-2014 |
Martyn Capewell <martyn.capewell@arm.com> |
AArch64: Improve MIR to LIR translation for abs Improve translation by using a shorter and more efficient sequence for integer abs, and replacing UBFM with AND for FP abs in integer registers. Change-Id: Ifc39cd7806ed637d5cfc3284c435b5d501047eb5 Signed-off-by: Alexandre Rames <alexandre.rames@arm.com>
|
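The FP abs part of this change amounts to clearing the sign bit of the value's bit pattern with an AND. A portable sketch of that idea follows; `AbsViaMask` is a hypothetical name, and the code shows the bit manipulation only, not the LIR the backend emits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Computes fabs for a float held as raw bits by masking off the sign bit.
float AbsViaMask(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));   // reinterpret without UB
  bits &= 0x7fffffffu;                        // clear bit 31 (the sign)
  float result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}
```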
e3ea83811d47152c00abea24a9b420651a33b496 |
|
08-Aug-2014 |
Yevgeny Rouban <yevgeny.y.rouban@intel.com> |
ART source line debug info in OAT files OAT files have source line information enough for ART runtime needs like jump to/from interpreter and thread suspension. But this information is not enough for finer grained source level debugging and low-level profiling (VTune or perf). This patch adds to OAT files two additional sections: .debug_line - DWARF formatted Elf32 section with detailed source line information (mapping from native PC to Java source lines). In addition to the debugging symbols added using the dex2oat option --include-debug-symbols, the source line information is added to the section .debug_line. The source line info can be read by many Elf reading tools like objdump, readelf, dwarfdump, gdb, perf, VTune, ... gdb can use this debug line information in x86. In 64-bit mode the information can be used if the oat file is mapped in the lower address space (address has higher 32 bits zeroed). Relocation works. Testing: 1. art/test/run-test --host --gdb [--64] 001-HelloWorld 2. in gdb: break Main.java:19 3. in gdb: break Runtime.java:111 4. in gdb: run - stops at void java.lang.Runtime.<init>() 5. in gdb: backtrace - shows call stack down to main() 6. in gdb: continue - stops at void Main.main() (only in 32-bit mode) 7. in gdb: backtrace - shows call stack down to main() 8. objdump -W <oat-file> - addresses are from VMA range of .text section reported by objdump -h <file> 9. dwarfdump -ka <oat-file> - no errors expected Size of aosp-x86-eng boot.oat increased by 11% from 80.5Mb to 89.2Mb with two sections added .debug_line (7.2Mb) and .rel.debug (1.5Mb). Change-Id: Ib8828832686e49782a63d5529008ff4814ed9cda Signed-off-by: Yevgeny Rouban <yevgeny.y.rouban@intel.com>
|
8c18c2aaedb171f9b03ec49c94b0e33449dc411b |
|
06-Aug-2014 |
Andreas Gampe <agampe@google.com> |
ART: Generate chained compare-and-branch for short switches Refactor Mir2Lir to generate chained compare-and-branch sequences for short switches on all architectures. Bug: 16241558 (cherry picked from commit 48971b3242e5126bcd800cc9c68df64596b43d13) Change-Id: I0bb3071b8676523e90e0258e9b0e3fd69c1237f4
|
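The lowering choice in this change can be sketched as below. The threshold `kMaxChainedCases`, the function name, and the pseudo-instruction strings are all hypothetical; the commit message does not say how many cases count as "short".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cutoff for what counts as a short switch.
constexpr std::size_t kMaxChainedCases = 3;

// Returns pseudo-instructions: one compare-and-branch per case for short
// switches, otherwise a single table-based switch.
std::vector<std::string> LowerSwitch(
    const std::vector<std::pair<int32_t, std::string>>& cases) {
  std::vector<std::string> ops;
  if (cases.size() <= kMaxChainedCases) {
    for (const auto& c : cases) {
      ops.push_back("cmp-branch-eq vA, #" + std::to_string(c.first) + ", " + c.second);
    }
  } else {
    ops.push_back("switch-table vA, entries=" + std::to_string(cases.size()));
  }
  return ops;
}
```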
e7f82e2515f47f3c3292281312d7031a34a58ffc |
|
06-Aug-2014 |
Fred Shih <ffred@google.com> |
Added support for patching classes from different dex files. Added support for class patching from different dex files and moved ScopedObjectAccess from the quick compiler to driver. Slight refactoring for clarity. Bug: 16656190 Change-Id: I107fcbce75db42ca61321ea1c5d5f236680a1b3d
|
48971b3242e5126bcd800cc9c68df64596b43d13 |
|
06-Aug-2014 |
Andreas Gampe <agampe@google.com> |
ART: Generate chained compare-and-branch for short switches Refactor Mir2Lir to generate chained compare-and-branch sequences for short switches on all architectures. Change-Id: Ie2a572ae69d462ba68a119e9fb93ae538cddd08f
|
547cdfd21ee21e4ab9ca8692d6ef47c62ee7ea52 |
|
05-Aug-2014 |
Tong Shen <endlessroad@google.com> |
Emit CFI for x86 & x86_64 JNI compiler. Now for host-side x86 & x86_64 ART, we are able to get complete stacktrace with even mixed C/C++ & Java stack frames. Testing: 1. art/test/run-test --host --gdb [--64] --no-relocate 005 2. In gdb, run 'b art::Class_classForName' which is implementation of a Java native method, then 'r' 3. In gdb, run 'bt'. You should see stack frames down to main() Change-Id: I2d17e9aa0f6d42d374b5362a15ea35a2fce96302
|
c76c614d681d187d815760eb909e5faf488a3c35 |
|
05-Aug-2014 |
Andreas Gampe <agampe@google.com> |
ART: Refactor long ops in quick compiler Make GenArithOpLong virtual. Let the implementation in gen_common be very basic, without instruction-set checks, and meant as a fall-back. Backends should implement and dispatch to code for better implementations. This allows removing the GenXXXLong virtual methods from Mir2Lir, and cleaning up the backends (especially removing some LOG(FATAL) implementations). Change-Id: I6366443c0c325c1999582d281608b4fa229343cf
|
8081d2b8d7a743729557051d0294e040e61c747a |
|
31-Jul-2014 |
Vladimir Marko <vmarko@google.com> |
Create allocator adapter for using Arena in std containers. Create ArenaAllocatorAdapter, similar to the existing ScopedArenaAllocatorAdapter, for allocating memory for standard containers via the ArenaAllocator. Add the ability to specify allocation kind rather than just kArenaAllocSTL to both adapters. Move the scoped arena allocator to the scoped_arena_containers.h header file. Define template aliases for containers using the new adapter and change a few MIRGraph and Mir2Lir members to use them. Change-Id: I9bbc50248e0fed81729497b848cb29bf68444268
|
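An allocator adapter of the kind this change describes can be sketched with a toy bump arena. `ToyArena`, `ToyArenaAdapter`, and `ToyArenaVector` are hypothetical stand-ins, not ART's ArenaAllocator or ArenaAllocatorAdapter, and the allocation-kind parameter mentioned in the commit is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Toy bump allocator standing in for an arena.
class ToyArena {
 public:
  explicit ToyArena(std::size_t capacity) : storage_(capacity), used_(0) {}
  void* Alloc(std::size_t bytes, std::size_t align) {
    std::size_t start = (used_ + align - 1) / align * align;
    if (start + bytes > storage_.size()) throw std::bad_alloc();
    used_ = start + bytes;
    return storage_.data() + start;
  }
  std::size_t BytesUsed() const { return used_; }

 private:
  std::vector<unsigned char> storage_;
  std::size_t used_;
};

// Minimal C++11 allocator that forwards to the arena.
template <typename T>
class ToyArenaAdapter {
 public:
  using value_type = T;
  explicit ToyArenaAdapter(ToyArena* arena) : arena_(arena) {}
  template <typename U>
  ToyArenaAdapter(const ToyArenaAdapter<U>& other) : arena_(other.arena()) {}
  T* allocate(std::size_t n) {
    return static_cast<T*>(arena_->Alloc(n * sizeof(T), alignof(T)));
  }
  void deallocate(T*, std::size_t) {}  // arena memory is released all at once
  ToyArena* arena() const { return arena_; }

 private:
  ToyArena* arena_;
};

template <typename T, typename U>
bool operator==(const ToyArenaAdapter<T>& a, const ToyArenaAdapter<U>& b) {
  return a.arena() == b.arena();
}
template <typename T, typename U>
bool operator!=(const ToyArenaAdapter<T>& a, const ToyArenaAdapter<U>& b) {
  return !(a == b);
}

// Template alias in the spirit of the container aliases the change defines.
template <typename T>
using ToyArenaVector = std::vector<T, ToyArenaAdapter<T>>;
```

A container is then built by passing the adapter to its constructor, for example `ToyArenaAdapter<int> alloc(&arena); ToyArenaVector<int> v(alloc);`.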
c763e350da562b0c6bebf10599588d4901140e45 |
|
04-Jul-2014 |
Matteo Franchin <matteo.franchin@arm.com> |
AArch64: Implement InexpensiveConstant methods. Implement IsInexpensiveConstant and friends for A64. Also extending the methods to take the opcode with respect to which the constant is inexpensive. Additionally, logical operations (i.e. and, or, xor) can now handle the immediates 0 and ~0 (which are not logical immediates). Change-Id: I46ce1287703765c5ab54983d13c1b3a1f5838622
|
6bbf0967d217ab2b7bdbb78bfd076b8fb07a44e8 |
|
14-Jul-2014 |
Alexei Zavjalov <alexei.zavjalov@intel.com> |
ART: Implement the easy long division/remainder by a constant Also optimizes long/int divisions by power-of-two values. Also do some clean-up. Change-Id: Ie414e64aac251c81361ae107d157c14439e6dab5 Signed-off-by: Alexei Zavjalov <alexei.zavjalov@intel.com>
|
2eba1fa7e9e5f91e18ae3778d529520bd2c78d55 |
|
31-Jul-2014 |
Serban Constantinescu <serban.constantinescu@arm.com> |
AArch64: Add inlining support for ceil(), floor(), rint(), round() This patch adds inlining support for the following Math, StrictMath methods in the ARM64 backend: * double ceil(double) * double floor(double) * double rint(double) * long round(double) * int round(float) Also some cleanup. Change-Id: I9f5a2f4065b1313649f4b0c4380b8176703c3fe1 Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
|
63999683329612292d534e6be09dbde9480f1250 |
|
15-Jul-2014 |
Serban Constantinescu <serban.constantinescu@arm.com> |
Revert "Revert "Enable Load Store Elimination for ARM and ARM64"" This patch refactors the implementation of the LoadStoreElimination optimisation pass. Please note that this pass was disabled and not functional for any of the backends. The current implementation tracks aliases and handles DalvikRegs as well as Heap memory regions. It has been tested and it is known to optimise out the following: * Load - Load * Store - Load * Store - Store * Load Literals Change-Id: I3aadb12a787164146a95bc314e85fa73ad91e12b
|
c32447bcc8c36ee8ff265ed678c7df86936a9ebe |
|
27-Jul-2014 |
Bill Buzbee <buzbee@android.com> |
Revert "Enable Load Store Elimination for ARM and ARM64" On extended testing, I'm seeing a CHECK failure at utility_arm.cc:1201. This reverts commit fcc36ba2a2b8fd10e6eebd21ecb6329606443ded. Change-Id: Icae3d49cd7c8fcab09f2f989cbcb1d7e5c6d137a
|
fcc36ba2a2b8fd10e6eebd21ecb6329606443ded |
|
15-Jul-2014 |
Serban Constantinescu <serban.constantinescu@arm.com> |
Enable Load Store Elimination for ARM and ARM64 This patch refactors the implementation of the LoadStoreElimination optimisation pass. Please note that this pass was disabled and not functional for any of the backends. The current implementation tracks aliases and handles DalvikRegs as well as Heap memory regions. It has been tested and it is known to optimise out the following: * Load - Load * Store - Load * Store - Store * Load Literals Change-Id: Iefae9b696f87f833ef35c451ed4d49c5a1b6fde0
|
984305917bf57b3f8d92965e4715a0370cc5bcfb |
|
28-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Rework quick entrypoint code in Mir2Lir, cleanup To reduce the complexity of calling trampolines in generic code, introduce an enumeration for entrypoints. Introduce a header that lists the entrypoint enum and exposes a templatized method that translates an enum value to the corresponding thread offset value. Call helpers are rewritten to have an enum parameter instead of the thread offset. Also rewrite LoadHelper and GenConversionCall this way. It is now LoadHelper's duty to select the right thread offset size. Introduce InvokeTrampoline virtual method to Mir2Lir. This allows to further simplify the call helpers, as well as make OpThreadMem specific to X86 only (removed from Mir2Lir). Make GenInlinedCharAt virtual, move a copy to X86 backend, and simplify both copies. Remove LoadBaseIndexedDisp and OpRegMem from Mir2Lir, as they are now specific to X86 only. Remove StoreBaseIndexedDisp from Mir2Lir, as it was only ever used in the X86 backend. Remove OpTlsCmp from Mir2Lir, as it was only ever used in the X86 backend. Remove OpLea from Mir2Lir, as it was only ever defined in the X86 backend. Remove GenImmedCheck from Mir2Lir as it was neither used nor implemented. Change-Id: If0a6182288c5d57653e3979bf547840a4c47626e
|
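The enum-to-offset translation described in this change might look roughly like the sketch below. The enumerator names, `kEntrypointTableStart`, and `ThreadOffsetOf` are hypothetical; real thread offsets depend on the Thread layout and would likely differ between pointer sizes in more than the stride.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical entrypoint enumeration.
enum Entrypoint {
  kEntryAllocObject,
  kEntryInitializeType,
  kEntryThrowNullPointer,
};

// Hypothetical offset of the entrypoint table inside the thread object.
constexpr std::size_t kEntrypointTableStart = 128;

// Translates an enum value to a thread offset for a given pointer size.
template <std::size_t pointer_size>
constexpr std::size_t ThreadOffsetOf(Entrypoint entrypoint) {
  return kEntrypointTableStart + static_cast<std::size_t>(entrypoint) * pointer_size;
}

// Mirrors the idea that the load helper, not its callers, picks the
// offset size for the target.
std::size_t EntrypointOffsetFor(Entrypoint entrypoint, bool target64) {
  return target64 ? ThreadOffsetOf<8>(entrypoint) : ThreadOffsetOf<4>(entrypoint);
}
```

Callers pass only the enum value, so generic code no longer needs to know the thread offset size.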
bebee4fd10e5db6cb07f59bc0f73297c900ea5f0 |
|
16-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Refactor GenSelect, refactor gen_common accordingly This adds a GenSelect method meant for selection of constants. The general-purpose GenInstanceof code is refactored to take advantage of this. This cleans up code and squashes a branch-over on ARM64 to a cset. Also add a slow-path for type initialization in GenInstanceof. Bug: 16241558 (cherry picked from commit 90969af6deb19b1dbe356d62fe68d8f5698d3d8f) Change-Id: Ie4494858bb8c26d386cf2e628172b81bba911ae5
|
0f45f22eb3c52f0ece4c56989180e79c6680d825 |
|
15-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Throw StackOverflowError in native code Initialize stack-overflow errors in native code to be able to reduce the preserved area size of the stack. Includes a refactoring away from constexpr in instruction_set.h to allow for easy changing of the values. Bug: 16256184 (cherry picked from commit 7ea6f79bbddd69d5db86a8656a31aaaf64ae2582) Change-Id: I117cc8485f43da5f0a470f0f5e5b3dc3b5a06246
|
9ee4519afd97121f893f82d41d23164fc6c9ed34 |
|
17-Jul-2014 |
Serguei Katkov <serguei.i.katkov@intel.com> |
x86: GenSelect utility update This is a follow-up to https://android-review.googlesource.com/#/c/101396/ to make the x86 GenSelectConst32 implementation complete. Change-Id: I69f318e18093f9a5b00f8f00f0f1c2e4ff7a9ab2 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
f9d6aede77c700118e225f8312cd888262b77862 |
|
17-Jul-2014 |
Vladimir Marko <vmarko@google.com> |
Use vabs/fabs on arm/arm64 for intrinsic abs(). Bug: 11579369 (cherry picked from 5030d3ee8c6fe10394912ede107cbc8df63b7b16) Change-Id: I7b0596a8e7e3c87a93b225519c5aeedfe4f22e6d
|
7ea6f79bbddd69d5db86a8656a31aaaf64ae2582 |
|
15-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Throw StackOverflowError in native code Initialize stack-overflow errors in native code to be able to reduce the preserved area size of the stack. Includes a refactoring away from constexpr in instruction_set.h to allow for easy changing of the values. Change-Id: I117cc8485f43da5f0a470f0f5e5b3dc3b5a06246
|
147eb41b53729ec8d5c188d1cac90964a51afb8a |
|
11-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86"""" This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1. Bug: 16256184 Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73 Conflicts: compiler/dex/quick/arm64/target_arm64.cc compiler/image_test.cc runtime/fault_handler.cc
|
f12feb8e0e857f2832545b3f28d31bad5a9d3903 |
|
17-Jul-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Stack overflow checks and NPE checks for optimizing. Change-Id: I59e97448bf29778769b79b51ee4ea43f43493d96
|
5030d3ee8c6fe10394912ede107cbc8df63b7b16 |
|
17-Jul-2014 |
Vladimir Marko <vmarko@google.com> |
Use vabs/fabs on arm/arm64 for intrinsic abs(). Bug: 11579369 Change-Id: If09da85e22786faa13a2d74f62cee68ea67bd087
|
d85614222fa062ec809af9d65f04ab6b7dc1c248 |
|
11-Jul-2014 |
Fred Shih <ffred@google.com> |
Revert "Revert "Revert "Revert "Add intrinsic for Reference.get()"""" Fixed TargetReg issue causing build failure for x86. This reverts commit 9e82bd3f0ce9e5f5777bea2f752ff3e251d32f9f. (cherry picked from commit 4ee7a665e7f9cd2c5ace2d6304e33f64067b209f) Change-Id: I555f4e06955711262e6b37ffbeabee9698ec695c
|
90969af6deb19b1dbe356d62fe68d8f5698d3d8f |
|
16-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Refactor GenSelect, refactor gen_common accordingly This adds a GenSelect method meant for selection of constants. The general-purpose GenInstanceof code is refactored to take advantage of this. This cleans up code and squashes a branch-over on ARM64 to a cset. Also add a slow-path for type initialization in GenInstanceof. Change-Id: Ie4494858bb8c26d386cf2e628172b81bba911ae5
|
69dfe51b684dd9d510dbcb63295fe180f998efde |
|
11-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86"""" This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1. Bug: 16256184 Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73
|
d9cb8ae2ed78f957a773af61759432d7a7bf78af |
|
09-Jul-2014 |
Douglas Leung <douglas@mips.com> |
Fix art test failures for Mips. This patch fixes the following art test failures for Mips: 003-omnibus-opcodes 030-bad-finalizer 041-narrowing 059-finalizer-throw Change-Id: I4e0e9ff75f949c92059dd6b8d579450dc15f4467 Signed-off-by: Douglas Leung <douglas@mips.com>
|
4ee7a665e7f9cd2c5ace2d6304e33f64067b209f |
|
11-Jul-2014 |
Fred Shih <ffred@google.com> |
Revert "Revert "Revert "Revert "Add intrinsic for Reference.get()"""" Fixed TargetReg issue causing build failure for x86. This reverts commit 9e82bd3f0ce9e5f5777bea2f752ff3e251d32f9f. Change-Id: I7e6a526954467aaf68deeed999880dfe9aa5f06e
|
ed7a0f2fb84b200ab6ef34e30dcbba4c0cf8d435 |
|
10-Jun-2014 |
Matteo Franchin <matteo.franchin@arm.com> |
AArch64: improve usage of TargetReg() and friends. TargetReg(arg1) does now always return a 32-bit register. We also avoid using this function directly and rather use the two-arguments overload or TargetPtrReg(). Change-Id: I746b3c29a2a2553b399b5c3e7ee3887c7e7c52c3
|
af263df7f643e699abf622c64447d31bacc14c34 |
|
12-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Change GenPCUseDefEncoding(), turn on Load Hoisting for ARM64 This defines the PC resource mask as empty, as the PC is not accessible on ARM64. Unify code paths with x86 in LoadStoreElimination and LoadHoisting. Change-Id: Iea8b9e666f306c7a6ff52b6c5bf7e05b35346b2c
|
a9b870b73a155ce70c867d5b3f9758fab0b45f07 |
|
11-Jul-2014 |
Christopher Ferris <cferris@google.com> |
Revert "Add intrinsic for Reference.get()" This reverts commit 460503b13bc894828a2d2d47d09e5534b3e91aa1. Change-Id: Ie63f43049307e02e3b90f4e034abc9ea54ca4e24
|
ccc60264229ac96d798528d2cb7dbbdd0deca993 |
|
05-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Rework TargetReg(symbolic_reg, wide) Make the standard implementation in Mir2Lir and the specialized one in the x86 backend return a pair when wide = "true". Introduce WideKind enumeration to improve code readability. Simplify generic code based on this implementation. Change-Id: I670d45aa2572eedfdc77ac763e6486c83f8e26b4
|
59a42afc2b23d2e241a7e301e2cd68a94fba51e5 |
|
04-Jul-2014 |
Serguei Katkov <serguei.i.katkov@intel.com> |
Update counting VR for promotion For 64-bit it makes sense to compute VR uses together for int and long because core reg is shared. Change-Id: Ie8676ece12c928d090da2465dfb4de4e91411920 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
e9f3e71c90094e87ff83bd5449a2fc4d65f717b2 |
|
04-Jul-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Updates to help classes derived from X86Mir2Lir Just a couple of extra changes to help me out. These changes won't affect anyone else. Change-Id: I0e0985a4f16822d5cbfabbf81c9902d34ebdb5da Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
7fb36ded9cd5b1d254b63b3091f35c1e6471b90e |
|
10-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Add implicit null and stack checks for x86"" Fixes x86_64 cross compile issue. Removes command line options and property to set implicit checks - this is hard coded now. This reverts commit 3d14eb620716e92c21c4d2c2d11a95be53319791. Change-Id: I5404473b5aaf1a9c68b7181f5952cb174d93a90d
|
d4415e8bd04c4a9367744ff0149597b4f37a0e0a |
|
11-Jul-2014 |
Christopher Ferris <cferris@google.com> |
Revert "Revert "Add intrinsic for Reference.get()"" This reverts commit a9b870b73a155ce70c867d5b3f9758fab0b45f07. Change-Id: Ic2a9b47f2b911bef4b764d10bc33cf000e4b4211
|
9e82bd3f0ce9e5f5777bea2f752ff3e251d32f9f |
|
11-Jul-2014 |
Sebastien Hertz <shertz@google.com> |
Revert "Revert "Revert "Add intrinsic for Reference.get()""" This reverts commit d4415e8bd04c4a9367744ff0149597b4f37a0e0a. Change-Id: I34553ccbdcfea35c7742d21be2a74dc7085ab2a0
|
0025a86411145eb7cd4971f9234fc21c7b4aced1 |
|
11-Jul-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Revert "Revert "Add implicit null and stack checks for x86""" Broke the build. This reverts commit 7fb36ded9cd5b1d254b63b3091f35c1e6471b90e. Change-Id: I9df0e7446ff0913a0e1276a558b2ccf6c8f4c949
|
460503b13bc894828a2d2d47d09e5534b3e91aa1 |
|
18-Jun-2014 |
Fred Shih <ffred@google.com> |
Add intrinsic for Reference.get() Added an intrinsic function for Reference.get(). Return immediately without going through JNI if the slow path is not currently in use. Otherwise, branch off to the existing JNI function. Approximately 47x speedup for cases where slow path is not enabled. Change-Id: I13ad65a356fe4e104d8d83980694dc2740d7d039
|
34e826ccc80dc1cf7c4c045de6b7f8360d504ccf |
|
29-May-2014 |
Dave Allison <dallison@google.com> |
Add implicit null and stack checks for x86 This adds compiler and runtime changes for x86 implicit checks. 32 bit only. Both host and target are supported. By default, on the host, the implicit checks are null pointer and stack overflow. Suspend is implemented but not switched on. Change-Id: I88a609e98d6bf32f283eaa4e6ec8bbf8dc1df78a
|
3d14eb620716e92c21c4d2c2d11a95be53319791 |
|
10-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Add implicit null and stack checks for x86" It breaks cross compilation with x86_64. This reverts commit 34e826ccc80dc1cf7c4c045de6b7f8360d504ccf. Change-Id: I34ba07821fc0a022fda33a7ae21850957bbec5e7
|
70c4f06f9965cdb9319a2c85f65acda20086d765 |
|
25-Jun-2014 |
DaniilSokolov <daniil.y.sokolov@intel.com> |
ART: Intrinsic implementation for java.lang.System.arraycopy. Implements intrinsic for java.lang.System.arraycopy(char[], int, char[], int, int) - this method is internal to android class libraries and used in such classes as StringBuffer and StringBuilder. It is not possible to call it from application code. The intrinsic for this method is implemented as inline method (assembly code is generated manually). The intrinsic is x86 32 bit only. Change-Id: Id1b1e0a20d5f6d5f5ebfe1fdc2447b6d8a515432 Signed-off-by: Daniil Sokolov <daniil.y.sokolov@intel.com>
|
a77ee5103532abb197f492c14a9e6fb437054e2a |
|
02-Jul-2014 |
Chao-ying Fu <chao-ying.fu@intel.com> |
x86_64: TargetReg update for x86 Also includes changes in common code. Elimination of use of TargetReg with one parameter and direct access to special target registers. Change-Id: Ied2c1f87d4d1e4345248afe74bca40487a46a371 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com> Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
|
b5860fb459f1ed71f39d8a87b45bee6727d79fe8 |
|
22-Jun-2014 |
buzbee <buzbee@google.com> |
Register promotion support for 64-bit targets Not sufficiently tested for 64-bit targets, but should be fairly close. A significant amount of refactoring could still be done (in later CLs). With this change we are not making any changes to the vmap scheme. As a result, it is a requirement that if a vreg is promoted to both a 32-bit view and the low half of a 64-bit view it must share the same physical register. We may change this restriction later on to allow for more flexibility for 32-bit Arm. For example, if v4, v5, v4/v5 and v5/v6 are all hot enough to promote, we'd end up with something like: v4 (as an int) -> r10 v4/v5 (as a long) -> r10 v5 (as an int) -> r11 v5/v6 (as a long) -> r11 Fix a couple of ARM64 bugs on the way... Change-Id: I6a152b9c164d9f1a053622266e165428045362f3
|
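The sharing requirement in this change (a vreg's 32-bit view and the wide view whose low half it forms must map to the same physical register) can be sketched as below. `PromotionMap` and the register numbering starting at 10 are hypothetical and chosen to reproduce the example in the commit message.

```cpp
#include <cassert>
#include <map>

// Hypothetical promotion table keyed by the (low) vreg number.
class PromotionMap {
 public:
  // Promotes the 32-bit view of `vreg`.
  int PromoteNarrow(int vreg) { return RegFor(vreg); }
  // Promotes the 64-bit view whose low half is `low_vreg`.
  int PromoteWide(int low_vreg) { return RegFor(low_vreg); }

 private:
  // Both views share one entry, so they always get the same register.
  int RegFor(int vreg) {
    auto it = reg_for_vreg_.find(vreg);
    if (it != reg_for_vreg_.end()) return it->second;
    int reg = next_reg_++;
    reg_for_vreg_[vreg] = reg;
    return reg;
  }

  std::map<int, int> reg_for_vreg_;
  int next_reg_ = 10;
};
```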
255e014542b2180620230e4d9d6000ae06846bbd |
|
04-Jul-2014 |
Matteo Franchin <matteo.franchin@arm.com> |
Aarch64: fix references handling in Load*Indexed. Fix the way we handle references in Load/StoreBaseIndexed and friends. We assume references are 64-bit RegStorage entities, with the difference that they are loaded as 32-bit values. Change-Id: I7fe987ef9e97e9a5042b85378b33d1e85710d8b5
|
23abec955e2e733999a1e2c30e4e384e46e5dde4 |
|
02-Jul-2014 |
Serban Constantinescu <serban.constantinescu@arm.com> |
AArch64: Add few more inline functions This patch adds inlining support for the following functions: * Math.max/min(long, long) * Math.max/min(float, float) * Math.max/min(double, double) * Integer.reverse(int) * Long.reverse(long) Change-Id: Ia2b1619fd052358b3a0d23e5fcbfdb823d2029b9 Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
|
dd64450b37776f68b9bfc47f8d9a88bc72c95727 |
|
01-Jul-2014 |
Elena Sayapina <elena.v.sayapina@intel.com> |
x86_64: Unify 64-bit check in x86 compiler Update x86-specific Gen64Bit() check with the CompilationUnit target64 field which is set using unified Is64BitInstructionSet(InstructionSet) check. Change-Id: Ic00ac863ed19e4543d7ea878d6c6c76d0bd85ce8 Signed-off-by: Elena Sayapina <elena.v.sayapina@intel.com>
|
4b537a851b686402513a7c4a4e60f5457bb8d7c1 |
|
01-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Quick compiler: More size checks, add TargetReg variants Add variants for TargetReg for requesting specific register usage, e.g., wide and ref. More register size checks. With code adapted from https://android-review.googlesource.com/#/c/98605/. Change-Id: I852d3be509d4dcd242c7283da702a2a76357278d
|
de68676b24f61a55adc0b22fe828f036a5925c41 |
|
24-Jun-2014 |
Andreas Gampe <agampe@google.com> |
Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter" This reverts commit 2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d. Breaks the build. Change-Id: I9faad4e9a83b32f5f38b2ef95d6f9a33345efa33
|
3c12c512faf6837844d5465b23b9410889e5eb11 |
|
24-Jun-2014 |
Andreas Gampe <agampe@google.com> |
Revert "Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter"" This reverts commit de68676b24f61a55adc0b22fe828f036a5925c41. Fixes an API comment, and differentiates between inserting and appending. Change-Id: I0e9a21bb1d25766e3cbd802d8b48633ae251a6bf
|
2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d |
|
23-Jun-2014 |
Andreas Gampe <agampe@google.com> |
ART: Split out more cases of Load/StoreRef, volatile as parameter Splits out more cases of ref registers being loaded or stored. For code clarity, adds volatile as a flag parameter instead of a separate method. On ARM64, continue cleanup. Add flags to print/fatal on size mismatches. Change-Id: I30ed88433a6b4ff5399aefffe44c14a5e6f4ca4e
|
c61b3c984c509d5f7c8eb71b853c81a34b5c28ef |
|
18-Jun-2014 |
Matteo Franchin <matteo.franchin@arm.com> |
AArch64: implement easy division and remainder. This implements easy division and remainder for integer only (32-bit). The optimisation applies to div/rem by powers of 2 and to div by small literals (between 3-15). Change-Id: I71be7c4de5d2e2e738b88984f13efb08f4388a19
|
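Division and remainder by a power of two can be done with shifts once negative dividends are biased so the result rounds toward zero. The sketch below uses one common bias-and-shift idiom; it is not necessarily the instruction sequence the backend emits, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Signed division by 2^k (1 <= k <= 30), rounding toward zero.
int32_t DivByPow2(int32_t x, int k) {
  assert(k >= 1 && k <= 30);
  // All ones for negative x, zero otherwise. This relies on an arithmetic
  // right shift of negative values (guaranteed since C++20, and the
  // behavior of mainstream compilers before that).
  uint32_t sign = static_cast<uint32_t>(x >> 31);
  // Bias is 2^k - 1 for negative x and 0 for non-negative x.
  int32_t bias = static_cast<int32_t>(sign >> (32 - k));
  return (x + bias) >> k;
}

// Remainder with the sign of the dividend, derived from the quotient.
int32_t RemByPow2(int32_t x, int k) {
  return x - DivByPow2(x, k) * (int32_t{1} << k);
}
```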
7cd26f355ba83be75b72ed628ed5ee84a3245c4f |
|
19-Jun-2014 |
Andreas Gampe <agampe@google.com> |
ART: Target-dependent stack overflow, less check elision Refactor the separate stack overflow reserved sizes from thread.h into instruction_set.h and make sure they're used in the compiler. Refactor the decision on when to elide stack overflow checks: especially with large interpreter stack frames, it is not a good idea to elide checks when the frame size is even close to the reserved size. Currently enforce checks when the frame size is >= 2KB, but make sure that frame sizes 1KB and below will elide the checks (number from experience). Bug: 15728765 Change-Id: I016bfd3d8218170cbccbd123ed5e2203db167c06
|
7071c8d5885175a746723a3b38a347855965be08 |
|
05-Mar-2014 |
Yixin Shou <yixin.shou@intel.com> |
Add x86 inlined abs method for float/double Add the optimized implementation of inlined abs method for float/double for X86 side. Change-Id: I2f367542f321d88a976129f9f7156fd3c2965c8a Signed-off-by: Yixin Shou <yixin.shou@intel.com>
|
4c115b85cc48f4dfc8fc2b0484ddfeb29f02d658 |
|
17-Jun-2014 |
Vladimir Marko <vmarko@google.com> |
Revert "Add x86 inlined abs method for float/double" This reverts commit e88b89ad1d1a583daf205c7a387ba13f549f95f1. Change-Id: I2ba21b7442ba3696482d45001e6bd32e8baf9d1f
|
e88b89ad1d1a583daf205c7a387ba13f549f95f1 |
|
05-Mar-2014 |
Yixin Shou <yixin.shou@intel.com> |
Add x86 inlined abs method for float/double Add the optimized implementation of inlined abs method for float/double for X86 side. Change-Id: I4e095644a90524354040174954c1e127c7bb4ee2 Signed-off-by: Yixin Shou <yixin.shou@intel.com>
|
5aa6e04061ced68cca8111af1e9c19781b8a9c5d |
|
14-Jun-2014 |
Ian Rogers <irogers@google.com> |
Tidy x86 assembler. Use helper functions to compute when the kind has a SIB, a ModRM and RegReg form. Change-Id: I86a5cb944eec62451c63281265e6974cd7a08e07
|
169489b4f4be8c5dd880ba6f152948324d22ff79 |
|
11-Jun-2014 |
Serban Constantinescu <serban.constantinescu@arm.com> |
AArch64: Add support for inlined methods This patch adds support for Arm64 inlined methods. Change-Id: Ic6aeed6d2d32f65cd1e63cf482f83cdcf958798a
|
8dea81ca9c0201ceaa88086b927a5838a06a3e69 |
|
06-Jun-2014 |
Vladimir Marko <vmarko@google.com> |
Rewrite use/def masks to support 128 bits. Reduce LIR memory usage by holding masks by pointers in the LIR rather than directly and using pre-defined const masks for the common cases, allocating very few on the arena. Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
|
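The idea of 128-bit masks held by pointer, with shared constants for the common cases, can be sketched as below. `ResourceMask128`, `LirSketch`, and the constant names are hypothetical and do not reproduce the actual class in the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 128-bit mask stored as two 64-bit halves.
class ResourceMask128 {
 public:
  constexpr ResourceMask128(uint64_t lo, uint64_t hi) : lo_(lo), hi_(hi) {}

  // Mask with a single bit set, for 0 <= bit < 128.
  static constexpr ResourceMask128 Bit(std::size_t bit) {
    return bit < 64 ? ResourceMask128(uint64_t{1} << bit, 0)
                    : ResourceMask128(0, uint64_t{1} << (bit - 64));
  }

  constexpr bool Intersects(const ResourceMask128& other) const {
    return ((lo_ & other.lo_) | (hi_ & other.hi_)) != 0;
  }

 private:
  uint64_t lo_;
  uint64_t hi_;
};

// Pre-defined constants shared by every instruction that needs them.
constexpr ResourceMask128 kMaskNone(0, 0);
constexpr ResourceMask128 kMaskAll(~uint64_t{0}, ~uint64_t{0});

// The instruction stores two pointers instead of two 16-byte masks.
struct LirSketch {
  const ResourceMask128* use_mask;
  const ResourceMask128* def_mask;
};
```

Only instructions with an unusual mask would need a freshly allocated `ResourceMask128`; the rest point at the shared constants.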
58994cdb00b323339bd83828eddc53976048006f |
|
16-May-2014 |
Dmitry Petrochenko <dmitry.petrochenko@intel.com> |
x86_64: Hard Float ABI support in QCG This patch shows our efforts on resolving the ART limitations: - passing "float"/"double" arguments via FPR - passing "long" arguments via single GPR, not pair - passing more than 3 arguments via GPR. Work done: - Extended SpecialTargetRegister enum with kARG4, kARG5, fARG4..fARG7. - Created initial LoadArgRegs/GenDalvikX/FlushIns version in X86Mir2Lir. - Unlimited number of long/double/float arguments support - Refactored (v2) Change-Id: I5deadd320b4341d5b2f50ba6fa4a98031abc3902 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com> Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com> Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com> Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
089142cf1d0c028b5a7c703baf0b97f4a4ada3f7 |
|
05-Jun-2014 |
Vladimir Marko <vmarko@google.com> |
Avoid register pool allocations on the heap. Create a helper template class ArrayRef and use it instead of std::vector<> for register pools in target_<arch>.cc to avoid these heap allocations during program startup. Change-Id: I4ab0205af9c1d28a239c0a105fcdc60ba800a70a
|
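A non-owning view over a static array avoids the heap allocation a `std::vector` would make at startup. The sketch below shows the general shape of such a helper; `ArrayRefSketch` and the register numbers are hypothetical and this is not the ArrayRef class added by the change.

```cpp
#include <cassert>
#include <cstddef>

// Non-owning view over a contiguous array; copies only a pointer and a size.
template <typename T>
class ArrayRefSketch {
 public:
  template <std::size_t N>
  constexpr ArrayRefSketch(const T (&array)[N]) : data_(array), size_(N) {}

  constexpr std::size_t size() const { return size_; }
  constexpr const T& operator[](std::size_t i) const { return data_[i]; }
  const T* begin() const { return data_; }
  const T* end() const { return data_ + size_; }

 private:
  const T* data_;
  std::size_t size_;
};

// Hypothetical register pool: static data, no allocation at program start.
static const int kCoreRegs[] = {0, 1, 2, 3, 12};
static const ArrayRefSketch<int> kCoreRegPool(kCoreRegs);
```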
a0cd2d701f29e0bc6275f1b13c0edfd4ec391879 |
|
01-Jun-2014 |
buzbee <buzbee@google.com> |
Quick compiler: reference cleanup For 32-bit targets, object references are 32 bits wide both in Dalvik virtual registers and in core physical registers. Because of this, object references and non-floating point values were both handled as if they had the same register class (kCoreReg). However, for 64-bit systems, references are 32 bits in Dalvik vregs, but 64 bits in physical registers. Although the same underlying physical core registers will still be used for object reference and non-float values, different register class views will be used to represent them. For example, an object reference in arm64 might be held in x3 at some point, while the same underlying physical register, w3, would be used to hold a 32-bit int. This CL breaks apart the handling of object reference and non-float values to allow the proper register class (or register view) to be used. A new register class, kRefReg, is introduced which will map to a 32-bit core register on 32-bit targets, and 64-bit core registers on 64-bit targets. From this point on, object references should be allocated registers in the kRefReg class rather than kCoreReg. Change-Id: I6166827daa8a0ea3af326940d56a6a14874f5810
|
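The register class split described in this change can be summarised in a small sketch. Only `kCoreReg` and `kRefReg` come from the commit message; the enum layout and `NarrowRegWidthBits` are hypothetical.

```cpp
#include <cassert>

// Register classes named in the commit message; the enum itself is a stand-in.
enum RegClass { kCoreReg, kRefReg };

// Width in bits of the physical register view used for a non-wide value.
int NarrowRegWidthBits(RegClass reg_class, bool target64) {
  switch (reg_class) {
    case kRefReg:
      // References use the full-width view on 64-bit targets (e.g. x3).
      return target64 ? 64 : 32;
    case kCoreReg:
      // A 32-bit int keeps the 32-bit view (e.g. w3) on every target.
      return 32;
  }
  return 32;
}
```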
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 |
|
03-Jun-2014 |
Tim Murray <timmurray@google.com> |
DO NOT MERGE Merge ART from AOSP to lmp-preview-dev. Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
|
0955f7e470fb733aef07096536e9fba7c99250aa |
|
23-May-2014 |
Matteo Franchin <matteo.franchin@arm.com> |
AArch64: fixing some assertions. Fixing some assertions while attempting to get libartd.so to work. Fixing also the shift logic in LoadBaseIndexed() and StoreBaseIndexed(). This commit only fixes a part of the assertion issues. Change-Id: I473194d4260dd59a8ee6d73114429728c977ee0e
|
85089dd28a39dd20f42ac258398b2a08668f9ef1 |
|
26-May-2014 |
buzbee <buzbee@google.com> |
Quick compiler: generalize NarrowRegLoc() Some of the RegStorage utilities (DoubleToLowSingle(), DoubleToHighSingle(), etc.) worked only for targets which treat double precision registers as a pair of aliased single precision registers. This CL eliminates those utilities, and replaces them with a new RegisterInfo utility that will search an aliased register set and return the member matching the required storage configuration (if it exists). Change-Id: Iff5de10f467d20a56e1a89df9fbf30d1cf63c240
|
642fe34958ba7fafa81341823241616edde0380c |
|
24-May-2014 |
buzbee <buzbee@google.com> |
Quick compiler: fix register clobbering. Ensure all aliased children of a register set are clobbered when any member is clobbered. Additionally, use a clobbering mask to avoid clobbering non-overlapping siblings. Change-Id: Ic0d88a30f3e5b7a359396f6541d602739fa3124a
|
ed65c5e982705defdb597d94d1aa3f2997239c9b |
|
22-May-2014 |
Serban Constantinescu <serban.constantinescu@arm.com> |
AArch64: Enable LONG_* and INT_* opcodes. This patch fixes some of the issues with LONG and INT opcodes. The patch has been tested and passes all the dalvik tests except for 018 and 107. Change-Id: Idd1923ed935ee8236ab0c7e5fa969eaefeea8708 Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
|
a51a0b0300268b605e3ad71b0e87ff394032c5e7 |
|
21-May-2014 |
Vladimir Marko <vmarko@google.com> |
Method inlining across dex files in boot image. Fix LoadCodeAddress() and LoadMethodAddress() to use the dex file in addition to the method index to uniquely identify the literal. With that fix in place, when we have both the direct code and the direct method, we can safely pass the actual target method id instead of the method id from the same dex file in the method lowering info. This was already done for calls from apps into boot image (and thus there was a bug with a tiny risk of the wrong literal being used) and now we also do that for calls within the boot image. The latter allows the inlining pass to inline many more methods than before in the boot image. Bug: 15021903 Change-Id: Ic765ce9809b43ef07e7db32b8e3fbc9acb09147f
|
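The fix in this change is to key literals on the dex file as well as the method index, since the same index can name different methods in different dex files. A sketch with a hypothetical `LiteralPool` and integer dex file ids:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical literal pool keyed by (dex file id, method index).
class LiteralPool {
 public:
  // Returns the literal id for this method, creating one if needed.
  int GetOrAdd(uint32_t dex_file_id, uint32_t method_index) {
    auto key = std::make_pair(dex_file_id, method_index);
    auto it = literals_.find(key);
    if (it != literals_.end()) return it->second;
    int id = static_cast<int>(literals_.size());
    literals_.emplace(key, id);
    return id;
  }

 private:
  std::map<std::pair<uint32_t, uint32_t>, int> literals_;
};
```

Keyed on the method index alone, the second lookup in a different dex file would wrongly reuse the first literal.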
b01bf15d18f9b08d77e7a3c6e2897af0e02bf8ca |
|
14-May-2014 |
buzbee <buzbee@google.com> |
64-bit temp register support. Add a 64-bit temp register allocation path. The recent physical register handling rework supports multiple views of the same physical register (or, such as for Arm's float/double regs, different parts of the same physical register). This CL adds a 64-bit core register view for 64-bit targets. In short, each core register will have a 64-bit name, and a 32-bit name. The different views will be kept in separate register pools, but aliasing will be tracked. The core temp register allocation routines will be largely identical - except for 32-bit targets, which will continue to use pairs of 32-bit core registers for holding long values. Change-Id: I8f118e845eac7903ad8b6dcec1952f185023c053
|
e87f9b5185379c8cf8392d65a63e7bf7e51b97e7 |
|
30-Apr-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Allow X86 QBE to be extended Enhancements and updates to allow X86Mir2LIR Backend to be subclassed for experimentation. Add virtual in a whole bunch of places, and make some other changes to get this to work. Change-Id: I0980a19bc5d5725f91660f98c95f1f51c17ee9b6 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
082833c8d577db0b2bebc100602f31e4e971613e |
|
18-May-2014 |
buzbee <buzbee@google.com> |
Quick compiler, out of registers fix It turns out that the register pool sanity checker was not working as expected, leaving some inconsistencies unreported. This could result in "out of registers" failures, as well as other more subtle problems. This CL fixes the sanity checker, adds a lot more checks and cleans up the previously undetected episodes of insanity. Cherry-pick of internal change 468162 Change-Id: Id2da97e99105a4c272c5fd256205a94b904ecea8
|
05d3aeb33683b16837741f9348d6fba9a8432068 |
|
18-May-2014 |
buzbee <buzbee@google.com> |
Quick compiler, out of registers fix Fixes b/15024623 It turns out that the register pool sanity checker was not working as expected, leaving some inconsistencies unreported. This CL fixes the sanity checker, adds a lot more checks and cleans up the previously undetected episodes of insanity. Change-Id: I4d67db864ca5926a1975db251e7e631b65a86275
|
d65c51a556e6649db4e18bd083c8fec37607a442 |
|
29-Apr-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
ART: Add support for constant vector literals Add in some vector instructions. Implement the ConstVector instruction, which takes 4 words of data and loads it into an XMM register. Initially, only the ConstVector MIR opcode is implemented. Others will be added after this one goes in. Change-Id: I5c79bc8b7de9030ef1c213fc8b227debc47f6337 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
b14329f90f725af0f67c45dfcb94933a426d63ce |
|
15-May-2014 |
Andreas Gampe <agampe@google.com> |
ART: Fix MonitorExit code on ARM We do not emit barriers on non-SMP systems. But on ARM, we have places that need to conditionally execute, which is done through an IT instruction. The guide of said instruction thus changes between SMP and non-SMP systems. To cleanly approach this, change the API so that GenMemBarrier returns whether it generated an instruction. ARM will have to query the result and update any dependent IT. Throw a build system error if TARGET_CPU_SMP is not set. Fix runtime/Android.mk to work with new multilib host. Bug: 14989275 Change-Id: I9e611b770e8a1cd4ca19367d7dae0573ec08dc61
|
c93ac8b73b5772e43b6dd1cc9e1deee79ca68849 |
|
13-May-2014 |
Vladimir Marko <vmarko@google.com> |
Fix special getter/setter to use RegClassForFieldLoadStore(). This ensures correct register class is used for volatile load/store in these getters and setters. Bug: 14112919 Change-Id: Ib7aa83d441fb007e97f9acc2a778bc20ffed837c
|
9ee801f5308aa3c62ae3bedae2658612762ffb91 |
|
12-May-2014 |
Dmitry Petrochenko <dmitry.petrochenko@intel.com> |
Add x86_64 code generation support Utilizes r0..r7 in register allocator, implements spill/unspill core regs as well as operations with stack pointer. Change-Id: I973d5a1acb9aa735f6832df3d440185d9e896c67 Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
|
2f244e9faccfcca68af3c5484c397a01a1c3a342 |
|
08-May-2014 |
Andreas Gampe <agampe@google.com> |
ART: Add more ThreadOffset in Mir2Lir and backends This duplicates all methods with ThreadOffset parameters, so that both ThreadOffset<4> and ThreadOffset<8> can be handled. Dynamic checks against the compilation unit's instruction set determine which pointer size to use and therefore which methods to call. Methods with unsupported pointer sizes should fatally fail, as this indicates an issue during method selection. Change-Id: Ifdb445b3732d3dc5e6a220db57374a55e91e1bf6
|
ba57451494946a128703e1cbd8bf5969ee8dc598 |
|
13-May-2014 |
buzbee <buzbee@google.com> |
Quick compiler: fix compile-time perf regression The recent changes to the temp register liveness tracking introduced a measurable compile-time performance regression. This CL cleans it up. Change-Id: Id698b93e957f0ecab7ddfab94727f85e49cf10cf
|
0dc242d6fc1254e6ca1c31e08e612bbf45644b17 |
|
12-May-2014 |
Vladimir Marko <vmarko@google.com> |
Avoid unnecessary copy/load in EvalLoc() and LoadValue(). EvalLoc()/EvalLocWide() are used to prepare a register where a value is subsequently stored, so they shouldn't copy the old value to the new register for register class mismatch. The only exception where we actually need a copy is LoadValue()/LoadValueWide(), so we inline the old code that makes the copy there. We also avoid loading inexpensive constants when the value is already in the register. Change-Id: I07519e9d4d9b3f7272233d196435f3035e4a3ca9
|
30adc7383a74eb3cb6db3bf42cea3a5595055ce1 |
|
10-May-2014 |
buzbee <buzbee@google.com> |
Quick compiler: Fix liveness tracking Rework temp register liveness tracking to play nicely with aliased physical registers, and re-enable liveness tracking optimization. Add a pair of x86 utility routines that act like UpdateLoc(), but only show in-register live temps if they are of the expected register class. Change-Id: I92779e0da2554689103e7488025be281f1a58989
|
674744e635ddbdfb311fbd25b5a27356560d30c3 |
|
24-Apr-2014 |
Vladimir Marko <vmarko@google.com> |
Use atomic load/store for volatile IGET/IPUT/SGET/SPUT. Bug: 14112919 Change-Id: I79316f438dd3adea9b2653ffc968af83671ad282
|
e45fb9e7976c8462b94a58ad60b006b0eacec49f |
|
06-May-2014 |
Matteo Franchin <matteo.franchin@arm.com> |
AArch64: Change arm64 backend to produce A64 code. The arm backend clone is changed to produce A64 code. At the moment this backend can only compile simple methods (both leaf and non-leaf). Most of the work on the assembler (assembler_arm64.cc) has been done. Some work on the LIR generation layer (functions such as OpRegRegImm & friends) is still necessary. The register allocator still needs to be adapted to the A64 instruction set (it is mostly unchanged from the arm backend). Offsets for helpers in gen_invoke.cc still need to be changed to work on 64-bit. Change-Id: I388f99eeb832857981c7d9d5cb5b71af64a4b921
|
3bf7c60a86d49bf8c05c5d2ac5ca8e9f80bd9824 |
|
07-May-2014 |
Vladimir Marko <vmarko@google.com> |
Cleanup ARM load/store wide and remove unused param s_reg. Use a single LDRD/VLDR instruction for wide load/store on ARM, adjust the base pointer if needed. Remove unused parameter s_reg from LoadBaseDisp(), LoadBaseIndexedDisp() and StoreBaseIndexedDisp() on all architectures. Change-Id: I25a9a42d523a68addbc11abe44ddc55a4401df98
|
455759b5702b9435b91d1b4dada22c4cce7cae3c |
|
06-May-2014 |
Vladimir Marko <vmarko@google.com> |
Remove LoadBaseDispWide and StoreBaseDispWide. Just pass k64 or kDouble to non-wide versions. Change-Id: I000619c3b78d3a71db42edc747c8a0ba1ee229be
|
091cc408e9dc87e60fb64c61e186bea568fc3d3a |
|
31-Mar-2014 |
buzbee <buzbee@google.com> |
Quick compiler: allocate doubles as doubles Significant refactoring of register handling to unify usage across all targets & 32/64 backends. Reworked RegStorage encoding to allow expanded use of x86 xmm registers; removed vector registers as a separate register type. Reworked RegisterInfo to describe aliased physical registers. Eliminated quite a bit of target-specific code and generalized common code. Use of RegStorage instead of int for registers now propagated down to the NewLIRx() level. In future CLs, the NewLIRx() routines will be replaced with versions that are explicit about what kind of operand they expect (RegStorage, displacement, etc.). The goal is to eventually use RegStorage all the way to the assembly phase. TBD: MIPS needs verification. TBD: Re-enable liveness tracking. Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
|
6ffcfa04ebb2660e238742a6000f5ccebdd5df15 |
|
25-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Rewrite suspend test check with LIRSlowPath. Change-Id: I2dc17d079655586bfc588349c7a04afc2c6879af
|
7a11ab09f93f54b1c07c0bf38dd65ed322e86bc6 |
|
29-Apr-2014 |
buzbee <buzbee@google.com> |
Quick compiler: debugging assists A few minor assists to ease A/B debugging in the Quick compiler: 1. To save time, the assemblers for some targets only update the object code offsets on instructions involved with pc-relative fixups. We add code to fix up all offsets when doing a verbose codegen listing. 2. Temp registers are normally allocated in a round-robin fashion. When disabling liveness tracking, we now reset the round-robin pool to 0 on each instruction boundary. This makes it easier to spot real codegen differences. 3. Self-register copies were previously emitted, but marked as nops. Minor change to avoid generating them in the first place and reduce clutter. Change-Id: I7954bba3b9f16ee690d663be510eac7034c93723
|
695d13a82d6dd801aaa57a22a9d4b3f6db0d0fdb |
|
19-Apr-2014 |
buzbee <buzbee@google.com> |
Update load/store utilities for 64-bit backends This CL replaces the typical use of LoadWord/StoreWord utilities (which, in practice, were 32-bit load/store) in favor of a new set that make the size explicit. We now have: LoadWordDisp/StoreWordDisp: 32 or 64 depending on target. Load or store the natural word size. Expect this to be used infrequently - generally when we know we're dealing with a native pointer or flushed register not holding a Dalvik value (Dalvik values will flush to home location sizes based on Dalvik, rather than the target). Load32Disp/Store32Disp: Load or store 32 bits, regardless of target. Load64Disp/Store64Disp: Load or store 64 bits, regardless of target. LoadRefDisp: Load a 32-bit compressed reference, and expand it to the natural word size in the target register. StoreRefDisp: Compress a reference held in a register of the natural word size and store it as a 32-bit compressed reference. Change-Id: I50fcbc8684476abd9527777ee7c152c61ba41c6f
|
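The size rules listed in the entry above can be summarized as a small lookup. This is only an illustrative Python sketch: the utility names come from the commit message, while the `access_bits` function and its `target_pointer_bits` parameter are hypothetical and exist only for this example.

```python
def access_bits(utility, target_pointer_bits):
    """Return how many bits the named utility moves to or from memory."""
    if target_pointer_bits not in (32, 64):
        raise ValueError("target must be 32-bit or 64-bit")
    sizes = {
        # Natural word size: depends on the target.
        "LoadWordDisp": target_pointer_bits,
        "StoreWordDisp": target_pointer_bits,
        # Explicit sizes: the same on every target.
        "Load32Disp": 32,
        "Store32Disp": 32,
        "Load64Disp": 64,
        "Store64Disp": 64,
        # References are 32-bit compressed in memory, whatever the register width.
        "LoadRefDisp": 32,
        "StoreRefDisp": 32,
    }
    return sizes[utility]
```

For example, `access_bits("LoadWordDisp", 64)` gives 64 while `access_bits("LoadRefDisp", 64)` stays at 32, reflecting that a reference is expanded or compressed between memory and the register.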
3a74d15ccc9a902874473ac9632e568b19b91b1c |
|
22-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Delete throw launchpads. Bug: 13170824 Change-Id: I9d5834f5a66f5eb00f2ac80774e8c27dea99949e
|
80365d9bb947edef0eae0bfe62b9f7a239416e6b |
|
18-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Revert "Revert "Use LIRSlowPath for throwing ArrayOutOfBoundsException."" This adds back using LIRSlowPath for ArrayIndexOutOfBoundsException. And fix the host test crash. Change-Id: Idbb602f4bb2c5ce59233feb480a0ff1b216e4887
|
7fff544c38f0dec3a213236bb785c3ca13d21a0f |
|
18-Apr-2014 |
Brian Carlstrom <bdc@google.com> |
Revert "Use LIRSlowPath for throwing ArrayOutOfBoundsException." This reverts commit 9d46314a309aff327f9913789b5f61200c162609.
|
9d46314a309aff327f9913789b5f61200c162609 |
|
18-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Use LIRSlowPath for throwing ArrayOutOfBoundsException. Get rid of launchpads for throwing ArrayOutOfBoundsException and use LIRSlowPath instead. Bug: 13170824 Change-Id: I0e27f7a261a6a7fb5c0645e6113a957e098f699e
|
e643a179cf5585ba6bafdd4fa51730d9f50c06f6 |
|
08-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Use LIRSlowPath for throwing NPE. Get rid of launchpads for throwing NPE and use LIRSlowPath instead. Also clean up some code of using LIRSlowPath for checking div by zero. Bug: 13170824 Change-Id: I0c20a49c39feff3eb1f147755e557d9bc0ff15bb
|
d6ed642458c8820e1beca72f3d7b5f0be4a4b64b |
|
10-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Revert "Use trampolines for calls to helpers""" This reverts commit f9487c039efb4112616d438593a2ab02792e0304. Change-Id: Id48a4aae4ecce73db468587967968a3f7618b700
|
f9487c039efb4112616d438593a2ab02792e0304 |
|
09-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Use trampolines for calls to helpers"" This reverts commit 081f73e888b3c246cf7635db37b7f1105cf1a2ff. Change-Id: Ibd777f8ce73cf8ed6c4cb81d50bf6437ac28cb61 Conflicts: compiler/dex/quick/mir_to_lir.h
|
4289456fa265b833434c2a8eee9e7a16da31c524 |
|
07-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Use LIRSlowPath for throwing div by zero exception. Get rid of launchpads for throwing div by zero exception and use LIRSlowPath instead. Add a CallRuntimeHelper that takes no argument for the runtime function. Bug: 13170824 Change-Id: I7e0563e736c6f92bd63e3fbdfe3a777ad333e338
|
081f73e888b3c246cf7635db37b7f1105cf1a2ff |
|
07-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Use trampolines for calls to helpers" This reverts commit 754ddad084ccb610d0cf486f6131bdc69bae5bc6. Change-Id: Icd979adee1d8d781b40a5e75daf3719444cb72e8
|
754ddad084ccb610d0cf486f6131bdc69bae5bc6 |
|
19-Feb-2014 |
Dave Allison <dallison@google.com> |
Use trampolines for calls to helpers This is an ARM specific optimization to the compiler that uses trampoline islands to make calls to runtime helper functions. The intention is to reduce the size of the generated code (by 2 bytes per call) without affecting performance. By default this is on when generating an OAT file. It is off when compiling to memory. To switch this off in dex2oat, use the command line option: --no-helper-trampolines Enhances disassembler to print the trampoline entry on the BL instruction like this: 0xb6a850c0: f7ffff9e bl -196 (0xb6a85000) ; pTestSuspend Bug: 12607709 Change-Id: I9202bdb7cf21252ad807bd48701f1f6ce8e3d0fe
|
3da67a558f1fd3d8a157d8044d521753f3f99ac8 |
|
03-Apr-2014 |
Dave Allison <dallison@google.com> |
Add OpEndIT() for marking the end of OpIT blocks In ARM we need to prevent code motion to the inside of an IT block. This was done using a GenBarrier() to mark the end, but it wasn't obvious that this is what was happening. This CL adds an explicit OpEndIT() that takes the LIR of the OpIT for future checks. Bug: 13751744 Change-Id: If41d2adea1f43f11ebb3b72906bd308252ce3d01
|
dd7624d2b9e599d57762d12031b10b89defc9807 |
|
15-Mar-2014 |
Ian Rogers <irogers@google.com> |
Allow mixing of thread offsets between 32 and 64bit architectures. Begin a more full implementation of x86-64 REX prefixes. Doesn't implement 64bit thread offset support for the JNI compiler. Change-Id: If9af2f08a1833c21ddb4b4077f9b03add1a05147
|
f943914730db8ad2ff03d49a2cacd31885d08fd7 |
|
27-Mar-2014 |
Dave Allison <dallison@google.com> |
Implement implicit stack overflow checks This also fixes some failing run tests due to missing null pointer markers. The implementation of the implicit stack overflow checks introduces the ability to have a gap in the stack that is skipped during stack walk backs. This gap is protected against read/write and is used to trigger a SIGSEGV at function entry if the stack will overflow. Change-Id: I0c3e214c8b87dc250cf886472c6d327b5d58653e
|
306f017dd883c0bf806d239d97e0bca3194afbd7 |
|
07-Jan-2014 |
Vladimir Marko <vmarko@google.com> |
Faster AssembleLIR for ARM. This also reduces sizeof(LIR) by 4 bytes (32-bit builds). Change-Id: I0cb81f9bf098dfc50050d5bc705c171af26464ce
|
e2143c0a4af68c08e811885eb2f3ea5bfdb21ab6 |
|
28-Mar-2014 |
Ian Rogers <irogers@google.com> |
Revert "Revert "Optimize easy multiply and easy div remainder."" This reverts commit 3654a6f50a948ead89627f398aaf86a2c2db0088. Remove the part of the change that confused !is_div with being multiply rather than implying remainder. Change-Id: I202610069c69351259a320e8852543cbed4c3b3e
|
9da5c1013215176f2a4dbe7a804be899e12d5f68 |
|
28-Mar-2014 |
buzbee <buzbee@google.com> |
Quick compiler, MIPS resource cleanup MIPS architecture includes internal registers HI and LO. Similar to condition codes in other architectures, these internal resources must be accounted for during instruction scheduling. Previously, the Quick backend for MIPS dealt with them by defining rHI and rLO pseudo registers - treating them as actual registers for def/use masks. This CL changes the handling of these resources to be in line with how condition codes are used elsewhere - leaving register definitions to be used for registers. Change-Id: Idcd77f3107b0c9b081ad05b1aab663fb9f41492d
|
3441512d61ac192c1bf0b9b1eb696d5a8a8d677e |
|
28-Mar-2014 |
Brian Carlstrom <bdc@google.com> |
Revert "Optimize easy multiply and easy div remainder." This reverts commit 08df4b3da75366e5db37e696eaa7e855cba01deb. (cherry picked from commit 3654a6f50a948ead89627f398aaf86a2c2db0088) Change-Id: If8befd7c7135b9dfe3d3e9111768aba89aaa0863
|
3654a6f50a948ead89627f398aaf86a2c2db0088 |
|
28-Mar-2014 |
Brian Carlstrom <bdc@google.com> |
Revert "Optimize easy multiply and easy div remainder." This reverts commit 08df4b3da75366e5db37e696eaa7e855cba01deb.
|
262b299abf658c16f61dad2240cfaf3deafe4423 |
|
27-Mar-2014 |
buzbee <buzbee@google.com> |
Fix x86 master build failure. Replace bogus DCHECKs with logic matching pre-cleanup code. Register pairs are considered temp, promoted, dirty or live if either register of the pair meets criteria. Change-Id: If2df891fdd1e3351d4cbe72aaf2a2ac5b34b2110
|
14a46d820b04b848063f7c32ecd2cf82dd90cb1d |
|
27-Mar-2014 |
buzbee <buzbee@google.com> |
Fix x86 master build failure. Replace bogus DCHECKs with logic matching pre-cleanup code. Register pairs are considered temp, promoted, dirty or live if either register of the pair meets criteria. Change-Id: If2df891fdd1e3351d4cbe72aaf2a2ac5b34b2110
|
08df4b3da75366e5db37e696eaa7e855cba01deb |
|
25-Mar-2014 |
Zheng Xu <zheng.xu@arm.com> |
Optimize easy multiply and easy div remainder. Update OpRegRegShift and OpRegRegRegShift to use RegStorage parameters. Add special cases for *0 and *1. Add more easy multiply special cases for Arm. Reuse easy multiply in SmallLiteralDivRem() to support remainder cases. Change-Id: Icd76a993d3ac8d4988e9653c19eab4efca14fad0
|
2700f7e1edbcd2518f4978e4cd0e05a4149f91b6 |
|
07-Mar-2014 |
buzbee <buzbee@google.com> |
Continuing register cleanup Ready for review. Continue the process of using RegStorage rather than ints to hold register value in the top layers of codegen. Given the huge number of changes in this CL, I've attempted to minimize the number of actual logic changes. With this CL, the use of ints for registers has largely been eliminated except in the lowest utility levels. "Wide" utility routines have been updated to take a single RegStorage rather than a pair of ints representing low and high registers. Upcoming CLs will be smaller and more targeted. My expectations: o Allocate float double registers as a single double rather than a pair of float single registers. o Refactor to push code which assumes long and double Dalvik values are held in a pair of register to the target dependent layer. o Clean-up of the xxx_mir.h files to reduce the amount of #defines for registers. May also do a register renumbering to bring all of our targets' register naming more consistent. Possibly introduce a target-independent float/non-float test at the RegStorage level. Change-Id: I646de7392bdec94595dd2c6f76e0f1c4331096ff
|
99ad7230ccaace93bf323dea9790f35fe991a4a2 |
|
26-Feb-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
Relaxed memory barriers for x86 X86 provides stronger memory guarantees and thus the memory barriers can be optimized. This patch ensures that all memory barriers for x86 are treated as scheduling barriers. And in cases where a barrier is needed (StoreLoad case), an mfence is used. Change-Id: I13d02bf3f152083ba9f358052aedb583b0d48640 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
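The barrier policy in the entry above amounts to a simple mapping. A rough Python sketch, assuming the conventional four barrier kinds; only StoreLoad is named in the commit message, and the function name and the other kind names are hypothetical stand-ins.

```python
# Assumed set of barrier kinds; only "StoreLoad" appears in the commit message.
BARRIER_KINDS = ("LoadLoad", "LoadStore", "StoreStore", "StoreLoad")


def x86_barrier_instruction(kind):
    """Return the instruction to emit for a barrier, or None if none is needed.

    Returning None models a pure scheduling barrier: it constrains code
    motion in the compiler but emits no machine instruction.
    """
    if kind not in BARRIER_KINDS:
        raise ValueError("unknown barrier kind: " + kind)
    return "mfence" if kind == "StoreLoad" else None
```

Under this model only the StoreLoad case costs an instruction at run time; the other kinds rely on the stronger ordering that x86 already provides.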
60d7a65f7fb60f502160a2e479e86014c7787553 |
|
14-Mar-2014 |
Brian Carlstrom <bdc@google.com> |
Fix stack overflow for mutual recursion. There was an error where we would have a pc that was in the method which generated the stack overflow. This didn't work however because the stack overflow check was before we stored the method in the stack. The result was that the stack overflow handler had a PC which wasn't necessarily in the method at the top of the stack. This is now fixed by always restoring the link register before branching to the throw entrypoint. Slight code size regression on ARM/Mips (unmeasured). Regression on ARM is 4 bytes of code per stack overflow check. Some of this regression is mitigated by having one less GC safepoint. Also adds test case for StackOverflowError issue (from bdc). Tests passing: ARM, X86, Mips Phone booting: ARM Bug: https://code.google.com/p/android/issues/detail?id=66411 Bug: 12967914 Change-Id: I96fe667799458b58d1f86671e051968f7be78d5d (cherry-picked from c0f96d03a1855fda7d94332331b94860404874dd)
|
c0f96d03a1855fda7d94332331b94860404874dd |
|
14-Mar-2014 |
Brian Carlstrom <bdc@google.com> |
Fix stack overflow for mutual recursion. There was an error where we would have a pc that was in the method which generated the stack overflow. This didn't work however because the stack overflow check was before we stored the method in the stack. The result was that the stack overflow handler had a PC which wasn't necessarily in the method at the top of the stack. This is now fixed by always restoring the link register before branching to the throw entrypoint. Slight code size regression on ARM/Mips (unmeasured). Regression on ARM is 4 bytes of code per stack overflow check. Some of this regression is mitigated by having one less GC safepoint. Also adds test case for StackOverflowError issue (from bdc). Tests passing: ARM, X86, Mips Phone booting: ARM Bug: https://code.google.com/p/android/issues/detail?id=66411 Bug: 12967914 Change-Id: I96fe667799458b58d1f86671e051968f7be78d5d
|
e90501da0222717d75c126ebf89569db3976927e |
|
12-Mar-2014 |
Serguei Katkov <serguei.i.katkov@intel.com> |
Add dependency for operations with x86 FPU stack Load Hoisting optimization can re-order operations with FPU stack due to no dependency set. Patch adds resource dependency between these operations. Change-Id: Iccce98c8f3c565903667c03803884d9de1281ea8 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
b373e091eac39b1a79c11f2dcbd610af01e9e8a9 |
|
21-Feb-2014 |
Dave Allison <dallison@google.com> |
Implicit null/suspend checks (oat version bump) This adds the ability to use SEGV signals to throw NullPointerException exceptions from Java code rather than having the compiler generate explicit comparisons and branches. It does this by using sigaction to trap SIGSEGV and when triggered makes sure it's in compiled code and if so, sets the return address to the entry point to throw the exception. It also uses this signal mechanism to determine whether to check for thread suspension. Instead of the compiler generating calls to a function to check for threads being suspended, the compiler will now load indirect via an address in the TLS area. To trigger a suspend, the contents of this address are changed from something valid to 0. A SIGSEGV will occur and the handler will check for a valid instruction pattern before invoking the thread suspension check code. If a user program traps SIGSEGV it will prevent our signal handler from working. This will cause a failure in the runtime. There are two signal handlers at present. You can control them individually using the flags -implicit-checks: on the runtime command line. This takes a string parameter, a comma separated set of strings. Each can be one of: none (switch off), null (null pointer checks), suspend (suspend checks), all (all checks). So to switch only suspend checks on, pass: -implicit-checks:suspend There is also -explicit-checks to provide the reverse once we change the default. For dalvikvm, pass --runtime-arg -implicit-checks:foo,bar The default is -implicit-checks:none There is also a property 'dalvik.vm.implicit_checks' whose value is the same string as the command option. The default is 'none'. For example to switch on null checks using the option: setprop dalvik.vm.implicit_checks null It only works for ARM right now. Bumps OAT version number due to change to Thread offsets. Bug: 13121132 Change-Id: If743849138162f3c7c44a523247e413785677370
|
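The option format described in the entry above can be illustrated with a small parser. This is a hedged Python sketch, not the runtime's actual parsing code: the function name is hypothetical, and the left-to-right handling of `none` and `all` when mixed with other values is an assumption, since the commit message does not specify it.

```python
KNOWN_CHECKS = frozenset({"null", "suspend"})
PREFIX = "-implicit-checks:"


def parse_implicit_checks(option):
    """Return the set of implicit checks enabled by an -implicit-checks: option."""
    if not option.startswith(PREFIX):
        raise ValueError("not an -implicit-checks option: " + option)
    enabled = set()
    for item in option[len(PREFIX):].split(","):
        if item == "none":
            enabled.clear()            # assumption: 'none' resets earlier items
        elif item == "all":
            enabled |= KNOWN_CHECKS    # 'all' turns on every known check
        elif item in KNOWN_CHECKS:
            enabled.add(item)
        else:
            raise ValueError("unknown implicit check: " + item)
    return enabled
```

With this sketch, `parse_implicit_checks("-implicit-checks:suspend")` enables only the suspend check, matching the example given in the commit message.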
3bc8615332b7848dec8c2297a40f7e4d176c0efb |
|
13-Mar-2014 |
Vladimir Marko <vmarko@google.com> |
Use LIRSlowPath for intrinsics, improve String.indexOf(). Rewrite intrinsic launchpads to use the LIRSlowPath. Improve String.indexOf for constant chars by avoiding the check for code points over 0xFFFF. Change-Id: I7fd5583214c5b4ab9c38ee36c5d6f003dd6345a8
|
49161cef10a308aedada18e9aa742498d6e6c8c7 |
|
12-Mar-2014 |
Jeff Hao <jeffhao@google.com> |
Allow patching between dex files in the boot classpath. Change-Id: I53f219a5382d0fcd580e96e50025fdad4fc399df
|
83cc7ae96d4176533dd0391a1591d321b0a87f4f |
|
12-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Create a scoped arena allocator and use that for LVN. This saves more than 0.5s of boot.oat compilation time on Nexus 5. TODO: Move other stuff to the scoped allocator. This CL alone increases the peak memory allocation. By reusing the memory for other parts of the compilation we should reduce this overhead. Change-Id: Ifbc00aab4f3afd0000da818dfe68b96713824a08
|
a1a7074eb8256d101f7b5d256cda26d7de6ce6ce |
|
03-Mar-2014 |
Vladimir Marko <vmarko@google.com> |
Rewrite kMirOpSelect for all IF_ccZ opcodes. Also improve special cases for ARM and add tests. Change-Id: I06f575b9c7b547dbc431dbfadf2b927151fe16b9
|
00e1ec6581b5b7b46ca4c314c2854e9caa647dd2 |
|
28-Feb-2014 |
Bill Buzbee <buzbee@android.com> |
Revert "Revert "Rework Quick compiler's register handling"" This reverts commit 86ec520fc8b696ed6f164d7b756009ecd6e4aace. Ready. Fixed the original type, plus some mechanical changes for rebasing. Still needs additional testing, but the problem with the original CL appears to have been a typo in the definition of the x86 double return template RegLocation. Change-Id: I828c721f91d9b2546ef008c6ea81f40756305891
|
be0e546730e532ef0987cd4bde2c6f5a1b14dd2a |
|
26-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Cache field lowering info in mir_graph. Change-Id: I9f9d76e3ae6c31e88bdf3f59820d31a625da020f
|
ae9fd93c39a341e2dffe15c61cc7d9e841fa92c4 |
|
11-Feb-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Tell GDB about Quick ART generated code This is actually a lot of work. To do this, we need: .debug_info .debug_abbrev .debug_frame .debug_str These are generated into the OAT file by OatWriter and ElfWriterQuick. Since the Quick ART runtime doesn't use dlopen to load the OAT files, GDB can't find this information. Use the alternate GDB JIT interface, which can be invoked at runtime. To use this interface, an ELF image needs to be built in memory. Read the information from the OAT file, fixup the addresses to point to the real locations, add a symbol table to hold the .text symbol, and then let GDB know about the information, which will be read from the runtime address space. This is quite primitive now, and could be cleaned up considerably. It probably needs symbol table entries for the methods, and descriptions of parameters and return types. Currently only supported for X86. This defaults to enabled for debug builds. Added dex2oat --gen-gdb-info and --no-gen-gdb-info flags to override. Change-Id: I4d18b2370f6dfaa00c8cc1925f10717be3bd1a62 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
a1ce1fef2d49d1d537776a5308ace7102a815fe5 |
|
25-Feb-2014 |
Brian Carlstrom <bdc@google.com> |
Split up CommonTest into CommonRuntimeTest and CommonCompilerTest Change-Id: I8dcf6b29a5aecd445f1a3ddb06386cf81dbc9c70
|
86ec520fc8b696ed6f164d7b756009ecd6e4aace |
|
26-Feb-2014 |
Bill Buzbee <buzbee@android.com> |
Revert "Rework Quick compiler's register handling" This reverts commit 2c1ed456dcdb027d097825dd98dbe48c71599b6c. Change-Id: If88d69ba88e0af0b407ff2240566d7e4545d8a99
|
2c1ed456dcdb027d097825dd98dbe48c71599b6c |
|
20-Feb-2014 |
buzbee <buzbee@google.com> |
Rework Quick compiler's register handling For historical reasons, the Quick backend found it convenient to consider all 64-bit Dalvik values held in registers to be contained in a pair of 32-bit registers. Though this worked well for ARM (with double-precision registers also treated as a pair of 32-bit single-precision registers) it doesn't play well with other targets. And, it is somewhat problematic for 64-bit architectures. This is the first of several CLs that will rework the way the Quick backend deals with physical registers. The goal is to eliminate the "64-bit value backed with 32-bit register pair" requirement from the target-independent portions of the backend and support 64-bit registers throughout. The key RegLocation struct, which describes the location of Dalvik virtual register & register pairs, previously contained fields for high and low physical registers. The low_reg and high_reg fields are being replaced with a new type: RegStorage. There will be a single instance of RegStorage for each RegLocation. Note that RegStorage does not increase the space used. It is 16 bits wide, the same as the sum of the 8-bit low_reg and high_reg fields. At a target-independent level, it will describe whether the physical register storage associated with the Dalvik value is a single 32 bit, single 64 bit, pair of 32 bit or vector. The actual register number encoding is left to the target-dependent code layer. Because physical register handling is pervasive throughout the backend, this restructuring necessarily involves large CLs with lots of changes. I'm going to roll these out in stages, and attempt to segregate the CLs with largely mechanical changes from those which restructure or rework the logic. This CL is of the mechanical change variety - it replaces low_reg and high_reg from RegLocation and introduces RegStorage. It also includes a lot of new code (such as many calls to GetReg()) that should go away in upcoming CLs.
The tentative plan for the subsequent CLs is: o Rework standard register utilities such as AllocReg() and FreeReg() to use RegStorage instead of ints. o Rework the target-independent GenXXX, OpXXX, LoadValue, StoreValue, etc. routines to take RegStorage rather than int register encodings. o Take advantage of the vector representation and eliminate the current vector field in RegLocation. o Replace the "wide" variants of codegen utilities that take low_reg/high_reg pairs with versions that use RegStorage. o Add 64-bit register target independent codegen utilities where possible, and where not virtualize with 32-bit general register and 64-bit general register variants in the target dependent layer. o Expand/rework the LIR def/use flags to allow for more registers (currently, we lose out on 16 MIPS floating point regs as well as ARM's D16..D31 for lack of space in the masks). o [Possibly] move the float/non-float determination of a register from the target-dependent encoding to RegStorage. In other words, replace IsFpReg(register_encoding_bits). At the end of the day, all code in the target independent layer should be using RegStorage, as should much of the target dependent layer. Ideally, we won't be using the physical register number encoding extracted from RegStorage (i.e. GetReg()) until the NewLIRx() layer. Change-Id: Idc5c741478f720bdd1d7123b94e4288be5ce52cb
|
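To make the space argument in the entry above concrete, here is a purely illustrative 16-bit packing in Python. The bit layout (2 bits of storage kind, two 7-bit register fields) and all names are hypothetical: the commit message only states that RegStorage is 16 bits wide and distinguishes the four storage kinds, and it leaves the register number encoding to the target-dependent layer.

```python
# Hypothetical storage kinds, one per configuration named in the commit message.
KIND_32, KIND_64, KIND_PAIR, KIND_VECTOR = range(4)


def pack(kind, low, high=0):
    """Pack a storage kind and up to two register numbers into 16 bits."""
    if not 0 <= kind <= 3:
        raise ValueError("kind must fit in 2 bits")
    if not (0 <= low < 128 and 0 <= high < 128):
        raise ValueError("register numbers must fit in 7 bits")
    return (kind << 14) | (high << 7) | low


def unpack(value):
    """Return (kind, low, high) from a value produced by pack()."""
    return (value >> 14) & 0x3, value & 0x7F, (value >> 7) & 0x7F
```

The point of the sketch is only that a kind tag plus register numbers can share the 16 bits previously taken by two 8-bit fields; the real encoding may differ entirely.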
9c86a0279aaf953377aa9e2277592e68bf814989 |
|
21-Feb-2014 |
Ian Rogers <irogers@google.com> |
Revert "Annotate used fields." This reverts commit 7f6cf56942c8469958b273ea968db253051c5b05. Change-Id: Ic389a194c3404ecb5bb563a405bf4a0d6336ea0d
|
4028a6c83a339036864999fdfd2855b012a9f1a7 |
|
20-Feb-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Inline x86 String.indexOf Take advantage of the presence of a constant search char or start index to tune the generated code. Change-Id: I0adcf184fb91b899a95aa4d8ef044a14deb51d88 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
7f6cf56942c8469958b273ea968db253051c5b05 |
|
29-Jan-2014 |
Vladimir Marko <vmarko@google.com> |
Annotate used fields. Annotate all fields used by a method early during the compilation, check access rights and record field offset, volatility, etc. Use these annotations when generating code for IGET/IPUT/SGET/SPUT instructions. Change-Id: I4bbf5cca4fecf53c9bf9c93ac1793e2f40c16b5f
|
818f2107e6d2d9e80faac8ae8c92faffa83cbd11 |
|
18-Feb-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Re-apply: Initial check-in of an optimizing compiler. The classes and the names are very much inspired by V8/Dart. It currently only supports the RETURN_VOID dex instruction, and there is a pretty printer to check if the building of the graph is correct. Change-Id: I28e125dfee86ae6ec9b3fec6aa1859523b92a893
|
1af0c0b88a956813eb0ad282664cedc391e2938f |
|
19-Feb-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Initial check-in of an optimizing compiler." g++ warnings turned into errors. This reverts commit 68a5fefa90f03fdf5a238ac85c9439c6b03eae96. Change-Id: I09bb95d9cc13764ca8a266c41af04801a34b9fd0
|
68a5fefa90f03fdf5a238ac85c9439c6b03eae96 |
|
18-Feb-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Initial check-in of an optimizing compiler. The classes and the names are very much inspired by V8/Dart. It currently only supports the RETURN_VOID dex instruction, and there is a pretty printer to check if the building of the graph is correct. Change-Id: Id5ef1b317ab997010d4e3888e456c26bef1ab9c0
|
3bc01748ef1c3e43361bdf520947a9d656658bf8 |
|
06-Feb-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
GenSpecialCase support for x86 Moved GenSpecialCase from being ARM specific to common code to allow it to be used by x86 quick as well. Change-Id: I728733e8f4c4da99af6091ef77e5c76ae0fee850 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
614c2b4e219631e8c190fd9fd5d4d9cd343434e1 |
|
29-Jan-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
Support to generate inline long to FP bytecodes for x86 long-to-float and long-to-double are now generated inline instead of calling a helper routine. The conversion is done by using x87. Change-Id: I196e526afec1be212898baceca8527549c3655b6 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
6607d97166984ce578817269f9775c15b9044190 |
|
10-Feb-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Tweak Mir2Lir::GenInstanceofCallingHelper for X86 Make this virtual, and split out the X86 logic. Take advantage of SETcc instruction for X86. I don't think I can do much more due to need to preserve arguments for the calls. Change-Id: I10e3eaa61b61ceac384267e3078bb6f75c37cee4 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
55d0eac918321e0525f6e6491f36a80977e0d416 |
|
06-Feb-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Support Direct Method/Type access for X86 Thumb generates code to optimize calls to methods within core.oat. Implement this for X86 as well, but take advantage of mov with 32 bit immediate and call relative with 32 bit immediate. Fix some incorrect return locations for long inlines. Change-Id: I1907bdfc7574f3d0aa76c7fad13dc537acdf1ed3 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
dbb17e378b538133750e56375bbdbb217db7b248 |
|
07-Feb-2014 |
Yixin Shou <yixin.shou@intel.com> |
Added inlined abs method with float and double type This patch added the implementation for inlining java.lang.Math.abs() method with float and double type. Change-Id: Ic99471b4ab4176e4a0153bef383bb49944fb636f Signed-off-by: Yixin Shou <yixin.shou@intel.com>
|
2c498d1f28e62e81fbdb477ff93ca7454e7493d7 |
|
30-Jan-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
Specializing x86 range argument copying The ARM implementation of range argument copying was specialized in some cases. For all other architectures, it would fall back to generating memcpy. This patch updates the x86 implementation so it does not call memcpy and instead generates loads and stores, favoring movement of 128-bit chunks. Change-Id: Ic891e5609a4b0e81a47c29cc5a9b301bd10a1933 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
bcec6fba95ee7974d3f7b81c3c02e7eb3ca3df00 |
|
17-Jan-2014 |
Dave Allison <dallison@google.com> |
Make slow paths easier to write This adds a class LIRSlowPath that allows for deferred compilation of slow paths. Using this object you can add code that will be invoked out of line using a forward branch. The intention is to move the slow paths out of the main flow and avoid branch-over constructs that will almost always trigger. The forward branch to the slow path code will be predicted false and this will be correct most of the time. The slow path code returns to the instruction after the original branch using an unconditional branch. This is used in the following opcodes: sput, sget, const-string, check-cast, const-class. Others will follow. Bug: 10864890 Change-Id: I17130c5dc20d369bc6bbf50b8cf04343263e888e
|
feb2b4e2d1c6538777bb80b60f3a247537b6221d |
|
28-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Redo x86 int arithmetic Make Mir2Lir::GenArithOpInt virtual, and implement an x86 version of it to allow use of memory operands and knowledge of the fact that x86 has (mostly) two operand instructions. Remove x86 specific code from the generic version. Add StoreFinalValue (matches StoreFinalValueWide) to handle the non-wide cases. Add some x86 helper routines to simplify generation. Change-Id: I6c13689c6da981f2570ab5af7a97f9816108b7ae Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
da7a69b3fa7bb22d087567364b7eb5a75824efd8 |
|
09-Jan-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
Enable compiler temporaries Compiler temporaries are a facility for having virtual register sized space for dealing with intermediate values during MIR transformations. They receive explicit space in managed frames so they can have a home location in case they need to be spilled. The facility also supports "special" temporaries which have specific semantic purpose and their location in frame must be tracked. The compiler temporaries are treated in the same way as virtual registers so that the MIR level transformations do not need to have special logic. However, generated code needs to know stack layout so that it can distinguish between home locations. MIRGraph has received an interface for dealing with compiler temporaries. This interface allows allocation of wide and non-wide virtual register temporaries. The information about how temporaries are kept on stack has been moved to stack.h. This was necessary because stack layout is dependent on where the temporaries are placed. Change-Id: Iba5cf095b32feb00d3f648db112a00209c8e5f55 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
df8ee2ea9908db3dde463fed68391b0040517653 |
|
28-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
x86 updates GenInlinedUnsafePut/GenInstanceofFinal Allow x86 to inline GenInlinedUnsafePut by freeing up a temporary register early. Make an x86 specific version of GenInstanceofFinal that uses compare to memory and a setCC instruction. Change-Id: I67788d7ae83776b0b9069fe4b379452190774992 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
766e9295d2c34cd1846d81610c9045b5d5093ddd |
|
27-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Improve GenConstString, GenS{get,put} for x86 Rewrite GenConstString for x86 to skip calling ResolveString when the string is already resolved. Also try to avoid a register copy if the Method* is in a promoted register. Implement the TODO for GenS{get,put} to use compare to memory for x86 by adding a new codegen function to compare directly to memory. Implement a default implementation that uses a temporary register for RISC architectures. Change-Id: Ie163cca3d3d841aa10c50dc6592ec30af7a7cbc9 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
bb8f0ab736b61db8f543e433859272e83f96ee9b |
|
28-Jan-2014 |
Hiroshi Yamauchi <yamauchi@google.com> |
Embed array class pointers at array allocation sites. Following https://android-review.googlesource.com/#/c/79302, embed array class pointers at array allocation sites in the compiled code. Change-Id: I67a1292466dfbb7f48e746e5060e992dd93525c5
|
e27b3bf2c1044bfbfbe874affd3758a73009c6c6 |
|
23-Jan-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
Support GenSelect for x86 kMirOpSelect is an extended MIR that has been generated in order to remove trivial diamond shapes where the conditional is an if-eqz or if-nez and on each of the paths there is a move or const bytecode with same destination register. This patch enables x86 to generate code for this extended MIR. A) Handling the constant specialization of kMirOpSelect: 1) When the true case is zero and result_reg is not same as src_reg: xor result_reg, result_reg cmp $0, src_reg mov t1, $false_case cmovnz result_reg, t1 2) When the false case is zero and result_reg is not same as src_reg: xor result_reg, result_reg cmp $0, src_reg mov t1, $true_case cmovz result_reg, t1 3) All other cases (we do compare first to set eflags): cmp $0, src_reg mov result_reg, $true_case mov t1, $false_case cmovnz result_reg, t1 B) Handling the move specialization of kMirOpSelect: 1) When true case is already in place: cmp $0, src_reg cmovnz result_reg, false_reg 2) When false case is already in place: cmp $0, src_reg cmovz result_reg, true_reg 3) When neither cases are in place: cmp $0, src_reg mov result_reg, true_reg cmovnz result_reg, false_reg Change-Id: Ic7c50823208fe82019916476a0a77c6a271679fe Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
4708dcd68eebf1173aef1097dad8ab13466059aa |
|
22-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Improve x86 long multiply and shifts Generate inline code for long shifts by constants and do long multiplication inline. Convert multiplication by a constant to a shift when we can. Fix some x86 assembler problems and add the new instructions that were needed (64 bit shifts). Change-Id: I6237a31c36159096e399d40d01eb6bfa22ac2772 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
2bf31e67694da24a19fc1f328285cebb1a4b9964 |
|
23-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Improve x86 long divide Implement inline division for literal and variable divisors. Use the general case for dividing by a literal by using a double length multiply by the appropriate constant with fixups. This is the Hacker's Delight algorithm. Change-Id: I563c250f99d89fca5ff8bcbf13de74de13815cfe Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
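The multiply-by-constant technique mentioned in this message can be illustrated with a small sketch. To keep it short it uses the well-known 32-bit signed case of dividing by 7 (magic constant 0x92492493, shift 2, as tabulated in Hacker's Delight) rather than the 64-bit case the CL targets; the function names are made up for the example and the code is not taken from the CL.

```cpp
#include <cassert>
#include <cstdint>

// Signed 32-bit division by 7 without a divide instruction.
// Assumes >> on negative values is an arithmetic shift, which holds on
// mainstream compilers (and is guaranteed from C++20).
int32_t DivideBy7(int32_t n) {
  const int32_t kMagic = -1840700269;  // 0x92492493 reinterpreted as signed
  const int kShift = 2;
  // High 32 bits of the 64-bit product (the "double length multiply").
  int32_t q = static_cast<int32_t>((static_cast<int64_t>(kMagic) * n) >> 32);
  q += n;        // fixup: the magic constant is negative but the divisor is positive
  q >>= kShift;  // arithmetic shift
  q += static_cast<int32_t>(static_cast<uint32_t>(n) >> 31);  // add 1 if n < 0
  return q;
}

// Compares the sketch against the built-in operator over a small range.
void CheckDivideBy7() {
  for (int32_t n = -1000; n <= 1000; ++n) {
    assert(DivideBy7(n) == n / 7);
  }
}
```

The final correction step is what makes the result round toward zero, matching the semantics of integer division in both C++ and Dalvik.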
|
be1ca55db3362f5b100c4c65da5342fd299520bb |
|
15-Jan-2014 |
Hiroshi Yamauchi <yamauchi@google.com> |
Use direct class pointers at allocation sites in the compiled code. - Rather than looking up a class from its type ID (and checking if it's resolved/initialized, resolving/initializing if not), use direct class pointers, if possible (boot-code-to-boot-class pointers and app-code-to-boot-class pointers.) - This results in a 1-2% speedup in Ritz MemAllocTest on Nexus 4. - Embedding the object size (along with class pointers) caused a 1-2% slowdown in MemAllocTest and isn't implemented in this change. - TODO: do the same for array allocations. - TODO: when/if an application gets its own image, implement app-code-to-app-class pointers. - Fix a -XX:gc bug. cf. https://android-review.googlesource.com/79460/ - Add /tmp/android-data/dalvik-cache to the list of locations to remove oat files in clean-oat-host. cf. https://android-review.googlesource.com/79550 - Add back a dropped UNLIKELY in FindMethodFromCode(). cf. https://android-review.googlesource.com/74205 Bug: 9986565 Change-Id: I590b96bd21f7a7472f88e36752e675547559a5b1
|
e02d48fb24747f90fd893e1c3572bb3c500afced |
|
15-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Optimize x86 long arithmetic Be smarter about taking advantage of a constant operand for x86 long add/sub/and/or/xor. Using instructions with immediates and generating results directly into memory reduces the number of temporary registers and avoids hardcoded register usage. Also rewrite the existing non-const x86 arithmetic to avoid fixed register use, and use the fact that x86 instructions are two operand. Pass the opcode to the XXXLong() routines to easily detect two operand DEX opcodes. Add a new StoreFinalValueWide() routine, which is similar to StoreValueWide, but doesn't do an EvalLoc to allocate registers. The src operand must already be in registers, and it just updates the dest location, and calls the right live/dirty routines to get the src into the dest properly. Change-Id: Iefc16e7bc2236a73dc780d3d5137ae8343171f62 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
d61ba4ba6fcde666adb5d5c81b1c32f0534fb2c8 |
|
13-Jan-2014 |
Bill Buzbee <buzbee@android.com> |
Revert "Revert "Better support for x86 XMM registers"" This reverts commit 8ff67e3338952c70ccf3b609559bf8cc0f379cfd. Fix applied to loc.fp usage. Change-Id: I1eb3005392544fcf30c595923ed25bcee2dc4859
|
8ff67e3338952c70ccf3b609559bf8cc0f379cfd |
|
11-Jan-2014 |
Bill Buzbee <buzbee@android.com> |
Revert "Better support for x86 XMM registers" The invalid usage of loc.fp must be corrected before this change can be submitted. This reverts commit 766a5e5940b469ab40e52770862c81cfec1d835b. Change-Id: I1173a9bf829da89cccd9c2898f5e11164987a22b
|
766a5e5940b469ab40e52770862c81cfec1d835b |
|
10-Jan-2014 |
Mark Mendell <mark.p.mendell@intel.com> |
Better support for x86 XMM registers Currently, ART Quick mode assumes that a double FP register is composed of two single consecutive FP registers. This is true for ARM and MIPS, but not x86. This means that only half of the 8 XMM registers are available for use by x86 doubles. This patch breaks the assumption that a wide FP RegisterLocation must be a paired set of FP registers. This is done by making some routines in common code virtual and overriding them in the X86Mir2Lir class. For these wide fp locations, the high register is set to the same value as the low register, in order to minimize changes to common code. In a couple of places, the common code checks for this case. The changes are also supposed to allow the possibility of using the XMM registers for vector operations, but that support is still WIP. Change-Id: Ic6ef24ea764991c6f4d9fb88d483a619f5a468cb Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
bd288c2c1206bc99fafebfb9120a83f13cf9723b |
|
21-Dec-2013 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
Add conditional move support to x86 and allow GenMinMax to use it X86 supports conditional moves, which are useful for reducing branchiness. This patch adds support to the x86 backend to generate conditional reg to reg operations. Both encoder and decoder support was added for cmov. The x86 version of GenMinMax, used for generating the inlined versions of Math.min/max, has been updated to make use of the conditional move support. Change-Id: I92c5428e40aa8ff88bd3071619957ac3130efae7 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
090dd4489eeffb5f10051a5d9c1ed71b0a6bc4b9 |
|
20-Dec-2013 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
Eliminate redundant x86 compare for GenDivZeroCheck For x86, the ALU operations on general purpose registers update the flags. Thus, when generating the zero check for divide/remainder operations, the compare is not needed. Change-Id: I07bfdf7d5491d3e3e9d98a932472d7f18d5b46d3 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
5816ed48bc339c983b40dc493e96b97821ce7966 |
|
27-Nov-2013 |
Vladimir Marko <vmarko@google.com> |
Detect special methods at the end of verification. This moves special method handling to method inliner and prepares for eventual inlining of these methods. Change-Id: I51c51b940fb7bc714e33135cd61be69467861352
|
e13717e796d338b08ea66f6a7e3470ca44de707f |
|
20-Nov-2013 |
Vladimir Marko <vmarko@google.com> |
Per-DexFile locking for inliner initialization. And clean up lock and compiler driver naming. Change-Id: I1562c7f55c4b0174a36007ba6199360da06169ff
|
31c2aac7137b69d5622eea09597500731fbee2ef |
|
09-Dec-2013 |
Vladimir Marko <vmarko@google.com> |
Rename ClobberCalleeSave to *Caller*, fix it for x86. Change-Id: I6a72703a11985e2753fa9b4520c375a164301433
|
06606b9c4a1c00154ed15f719ad8ea994e54ee8e |
|
02-Dec-2013 |
Vladimir Marko <vmarko@google.com> |
Performance improvement for mapping table creation. Avoid the raw mapping tables altogether. Change-Id: I6d1c786325d369e899a75f15701edbafdd14363f
|
70b797d998f2a28e39f7d6ffc8a07c9cbc47da14 |
|
03-Dec-2013 |
Vladimir Marko <vmarko@google.com> |
Unsafe.compareAndSwapLong() intrinsic for x86. Change-Id: Idbc5371a62dfdd84485a657d4548990519200205
|
1e6cb63d77090ddc6aa19c755d7066f66e9ff87e |
|
28-Nov-2013 |
Vladimir Marko <vmarko@google.com> |
Delta-encoding of mapping tables. Both PC offsets and dalvik offsets are delta-encoded. Since PC offsets are increasing, the deltas are then compressed as unsigned LEB128. Dalvik offsets are not monotonic, so their deltas are compressed as signed LEB128. This reduces the size of the mapping tables by about 30% on average, 25% from the PC offset and 5% from the dalvik offset delta encoding. Bug: 9437697 Change-Id: I600ab9c22dec178088d4947a811cca3bc8bd4cf4
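The encoding scheme described in this message can be sketched as below. This is a minimal illustration under stated assumptions, not the layout used by the CL: the function names and the plain (pc offset, dalvik offset) pair list are hypothetical, and any header or entry count the real tables carry is omitted. Only the idea of unsigned LEB128 for PC deltas and signed LEB128 for dalvik deltas comes from the message.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Appends value as unsigned LEB128: 7 payload bits per byte, high bit = "more".
void PushUleb128(std::vector<uint8_t>* out, uint32_t value) {
  while (value >= 0x80) {
    out->push_back(static_cast<uint8_t>((value & 0x7f) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<uint8_t>(value));
}

// Appends value as signed LEB128 (assumes arithmetic >> on negative values).
void PushSleb128(std::vector<uint8_t>* out, int32_t value) {
  bool more = true;
  while (more) {
    uint8_t byte = static_cast<uint8_t>(value & 0x7f);
    value >>= 7;
    const bool sign_bit = (byte & 0x40) != 0;
    more = !((value == 0 && !sign_bit) || (value == -1 && sign_bit));
    out->push_back(more ? static_cast<uint8_t>(byte | 0x80) : byte);
  }
}

// Encodes (pc_offset, dalvik_offset) entries as deltas from the previous entry.
std::vector<uint8_t> EncodeMappingTable(
    const std::vector<std::pair<uint32_t, uint32_t>>& entries) {
  std::vector<uint8_t> out;
  uint32_t prev_pc = 0;
  uint32_t prev_dex = 0;
  for (const auto& entry : entries) {
    assert(entry.first >= prev_pc);  // PC offsets are increasing
    PushUleb128(&out, entry.first - prev_pc);
    // Dalvik offsets are not monotonic, so this delta may be negative.
    PushSleb128(&out, static_cast<int32_t>(static_cast<int64_t>(entry.second) -
                                           static_cast<int64_t>(prev_dex)));
    prev_pc = entry.first;
    prev_dex = entry.second;
  }
  return out;
}
```

Because consecutive entries tend to be close together, most deltas fit in a single byte, which is where the size reduction comes from.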
|
3e5af82ae1a2cd69b7b045ac008ac3b394d17f41 |
|
21-Nov-2013 |
Vladimir Marko <vmarko@google.com> |
Intrinsic Unsafe.CompareAndSwapLong() for ARM. (cherry picked from cb53fcd79b1a5ce608208ec454b5c19f64aaba37) Change-Id: Iadd3cc8b4ed390670463b80f8efd579ce6ece226
|
1c282e2b9a9b432e132b2c332f861cad9feb4a73 |
|
21-Nov-2013 |
Vladimir Marko <vmarko@google.com> |
Refactor intrinsic CAS, prepare for 64-bit version. Bug: 11391018 Change-Id: Ic0f740e0cd0eb47f2c915f81be02f52f7721f8a3
|
5c96e6b4dc354a7439b211b93462fbe8edea5e57 |
|
14-Nov-2013 |
Vladimir Marko <vmarko@google.com> |
Rewrite intrinsics detection. Intrinsic methods should be treated as a special case of inline methods. They should be detected early and used to guide other optimizations. This CL rewrites the intrinsics detection so that it can be moved to any compilation phase. Change-Id: I4424a6a869bd98b9c478953c9e3bcaf1c6de2b33
|
e508a2090b19fe705fbc6b99d76474037a74bbfb |
|
04-Nov-2013 |
Vladimir Marko <vmarko@google.com> |
Fix unaligned Memory peek/poke intrinsics. Change-Id: Id454464d0b28aa37f5239f1c6589ceb0b3bbbdea
|
65636e5de2375839e29e3e19ee7a7db737901cf0 |
|
24-Oct-2013 |
Vladimir Marko <vmarko@google.com> |
Add intrinsics for Memory peek/poke. Add intrinsics for single memory access (non-array) peek/poke methods in libcore.io.Memory. Change-Id: I5d66a5b14ea89875d8afb8252eb293f7d637b83f
|
6bdf1fff5f841f3997d4b488f00647f7aa2cdaa3 |
|
29-Oct-2013 |
Vladimir Marko <vmarko@google.com> |
Add intrinsics for {Short,Int,Long}.reverseBytes(). Change-Id: I34a2ec642f59fc4ff18aed59769a9e8d7e361098
|
0d82948094d9a198e01aa95f64012bdedd5b6fc9 |
|
12-Oct-2013 |
buzbee <buzbee@google.com> |
64-bit prep Preparation for 64-bit roll. o Eliminated storing pointers in 32-bit int slots in LIR. o General size reductions of common structures to reduce impact of doubled pointer sizes: - BasicBlock struct was 72 bytes, now is 48. - MIR struct was 72 bytes, now is 64. - RegLocation was 12 bytes, now is 8. o Generally replaced uses of BasicBlock* pointers with 16-bit Ids. o Replaced several doubly-linked lists with singly-linked to save one stored pointer per node. o We had quite a few uses of uintptr_t's that were a holdover from the JIT (which used pointers to mapped dex & actual code cache addresses rather than trace-relative offsets). Replaced those with uint32_t's. o Clean up handling of embedded data for switch tables and array data. o Miscellaneous cleanup. I anticipate one or two additional CLs to reduce the size of MIR and LIR structs. Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
|
409fe94ad529d9334587be80b9f6a3d166805508 |
|
11-Oct-2013 |
buzbee <buzbee@google.com> |
Quick assembler fix This CL re-instates the select pattern optimization disabled by CL 374310, and fixes the underlying problem: improper handling of the kPseudoBarrier LIR opcode. The bug was introduced in the recent assembler restructuring. In short, LIR pseudo opcodes (which have values < 0), should always have size 0 - and thus cause no bits to be emitted during assembly. In this case, bad logic caused us to set the size of a kPseudoBarrier opcode via lookup through the EncodingMap. Because all pseudo ops are < 0, this meant we did an array underflow load, picking up whatever garbage was located before the EncodingMap. This explains why this error showed up recently - we'd previously just gotten a lucky layout. This CL corrects the faulty logic, and adds DCHECKs to uses of the EncodingMap to ensure that we don't try to access w/ a pseudo op. Additionally, the existing is_pseudo_op() macro is replaced with IsPseudoLirOp(), named similar to the existing IsPseudoMirOp(). Change-Id: I46761a0275a923d85b545664cadf052e1ab120dc
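The guard described in this message can be sketched as follows. The opcode values, the two-entry table, and GetInsnSize() are hypothetical stand-ins invented for the example; only the rule that pseudo ops are negative and have size 0, and the IsPseudoLirOp() name, come from the message. A plain assert stands in for DCHECK.

```cpp
#include <cassert>

// Hypothetical opcode numbering: pseudo ops are negative, real ops index the table.
enum LirOpcode : int {
  kPseudoBarrier = -2,
  kPseudoTarget = -1,
  kOpAdd = 0,
  kOpMov = 1,
  kOpCount = 2,
};

struct EncodingEntry {
  const char* name;
  int size;  // encoded size in bytes
};

const EncodingEntry kEncodingMap[kOpCount] = {{"add", 2}, {"mov", 2}};

inline bool IsPseudoLirOp(int opcode) { return opcode < 0; }

// Pseudo ops emit no bits, so they must never reach the table lookup;
// indexing with a negative opcode would read memory before the array.
int GetInsnSize(int opcode) {
  if (IsPseudoLirOp(opcode)) {
    return 0;
  }
  assert(opcode < kOpCount);
  return kEncodingMap[opcode].size;
}
```

Checking for the pseudo case first keeps the table access in bounds regardless of what happens to sit in memory before the array.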
|
a9a8254c920ce8e22210abfc16c9842ce0aea28f |
|
04-Oct-2013 |
Ian Rogers <irogers@google.com> |
Improve quick codegen for aput-object. 1) don't type check known null. 2) if we know types in verify don't check at runtime. 3) if we're runtime checking then move all the code out-of-line. Also, don't set up a callee-save frame for check-cast, do an instance-of test then throw an exception if that fails. Tidy quick entry point of Ldivmod to Lmod which it is on x86 and mips. Fix monitor-enter/exit NPE for MIPS. Fix benign bug in mirror::Class::CannotBeAssignedFromOtherTypes, a byte[] cannot be assigned to from other types. Change-Id: I9cb3859ec70cca71ed79331ec8df5bec969d6745
|
d9c4fc94fa618617f94e1de9af5f034549100753 |
|
02-Oct-2013 |
Ian Rogers <irogers@google.com> |
Inflate contended lock word by suspending owner. Bug 6961405. Don't inflate monitors for Notify and NotifyAll. Tidy lock word, handle recursive lock case alongside unlocked case and move assembly out of line (except for ARM quick). Also handle null in out-of-line assembly as the test is quick and the enter/exit code is already a safepoint. To gain ownership of a monitor on behalf of another thread, monitor contenders must not hold the monitor_lock_, so they wait on a condition variable. Reduce size of per mutex contention log. Be consistent in calling thin lock thread ids just thread ids. Fix potential thread death races caused by the use of FindThreadByThreadId, make it invariant that returned threads are either self or suspended now. Code size reduction on ARM boot.oat 0.2%. Old nexus 7 speedup 0.25%, new nexus 7 speedup 1.4%, nexus 10 speedup 2.24%, nexus 4 speedup 2.09% on DeltaBlue. Change-Id: Id52558b914f160d9c8578fdd7fc8199a9598576a
|
b48819db07f9a0992a72173380c24249d7fc648a |
|
15-Sep-2013 |
buzbee <buzbee@google.com> |
Compile-time tuning: assembly phase Not as much compile-time gain from reworking the assembly phase as I'd hoped, but still worthwhile. Should see ~2% improvement thanks to the assembly rework. On the other hand, expect some huge gains for some applications thanks to better detection of large machine-generated init methods. Thinkfree shows a 25% improvement. The major assembly change was to thread the LIR nodes that require fixup into a fixup chain. Only those are processed during the final assembly pass(es). This doesn't help for methods which only require a single pass to assemble, but does speed up the larger methods which required multiple assembly passes. Also replaced the block_map_ basic block lookup table (which contained space for a BasicBlock* for each dex instruction unit) with a block id map - cutting its space requirements by half in a 32-bit pointer environment. Changes: o Reduce size of LIR struct by 12.5% (one of the big memory users) o Repurpose the use/def portion of the LIR after optimization complete. o Encode instruction bits to LIR o Thread LIR nodes requiring pc fixup o Change follow-on assembly passes to only consider fixup LIRs o Switch on pc-rel fixup kind o Fast-path for small methods - single pass assembly o Avoid using cb[n]z for null checks (almost always exceed displacement) o Improve detection of large initialization methods. o Rework def/use flag setup. o Remove a sequential search from FindBlock using lookup table of 16-bit block ids rather than full block pointers. o Eliminate pcRelFixup and use fixup kind instead. o Add check for 16-bit overflow on dex offset. Change-Id: I4c6615f83fed46f84629ad6cfe4237205a9562b4
|
d91d6d6a80748f277fd938a412211e5af28913b1 |
|
26-Sep-2013 |
Ian Rogers <irogers@google.com> |
Introduce Signature type to avoid string comparisons. Method resolution currently creates strings to then compare with strings formed from methods in other dex files. The temporary strings are purely created for the sake of comparisons. This change creates a new Signature type that represents a method signature but not as a string. This type supports comparisons and so can be used when searching for methods in resolution. With this change malloc is no longer the hottest method during dex2oat (now it's memset) and allocations during verification have been reduced. The verifier is commonly what is populating the dex cache for methods and fields not declared in the dex file itself. Change-Id: I5ef0542823fbcae868aaa4a2457e8da7df0e9dae
|
c729a6b936d59562bd9fb830a595d9ff65dfd129 |
|
15-Sep-2013 |
buzbee <buzbee@google.com> |
Improve promotion of double-precision regs Minor rework of the double allocation mechanism to more explicitly manage the allocation of preserved floating point single pairs as doubles. Change-Id: Id9db4b0e86e5ef54a5db587f367e00efdf7e98d6
|
bd663de599b16229085759366c56e2ed5a1dc7ec |
|
11-Sep-2013 |
buzbee <buzbee@google.com> |
Compile-time tuning: register/bb utilities This CL yields about a 4% improvement in the compilation phase of dex2oat (single-threaded; multi-threaded compilation is more difficult to accurately measure). The register utilities could stand to be completely rewritten, but this gets most of the easy benefit. Next up: the assembly phase. Change-Id: Ife5a474e9b1a6d9e501e888dda6749d34eb77e96
|
252254b130067cd7a5071865e793966871ae0246 |
|
09-Sep-2013 |
buzbee <buzbee@google.com> |
More Quick compile-time tuning: labels & branches This CL represents a roughly 3.5% performance improvement for the compile phase of dex2oat. Most of the gain comes from avoiding the generation of dex boundary LIR labels unless a debug listing is requested. The other significant change is moving from a basic block ending branch model of "always generate a fall-through branch, and then delete it if we can" to a "only generate a fall-through branch if we need it" model. The data motivating these changes follow. Note that two areas of potentially attractive gain remain: restructuring the assembler model and reworking the register handling utilities. These will be addressed in subsequent CLs. --- data follows The Quick compiler's assembler has shown up on profile reports a bit more than seems reasonable. We've tried a few quick fixes to apparently hot portions of the code, but without much gain. So, I've been looking at the assembly process at a somewhat higher level. There look to be several potentially good opportunities. First, an analysis of the makeup of the LIR graph showed a surprisingly high proportion of LIR pseudo ops. Using the boot classpath as a basis, we get: 32.8% of all LIR nodes are pseudo ops. 10.4% are LIR instructions which require pc-relative fixups. 11.8% are LIR instructions that have been nop'd by the various optimization passes. Looking only at the LIR pseudo ops, we get: kPseudoDalvikByteCodeBoundary 43.46% kPseudoNormalBlockLabel 21.14% kPseudoSafepointPC 20.20% kPseudoThrowTarget 6.94% kPseudoTarget 3.03% kPseudoSuspendTarget 1.95% kPseudoMethodExit 1.26% kPseudoMethodEntry 1.26% kPseudoExportedPC 0.37% kPseudoCaseLabel 0.30% kPseudoBarrier 0.07% kPseudoIntrinsicRetry 0.02% Total LIR count: 10167292 The standout here is the Dalvik opcode boundary marker. This is just a label inserted at the beginning of the codegen for each Dalvik bytecode. If we're also doing a verbose listing, this is also where we hang the pretty-print disassembly string. 
However, this label was also being used as a convenient way to find the target of switch case statements (and, I think at one point was used in the Mir->GBC conversion process). This CL moves the use of kPseudoDalvikByteCodeBoundary labels to only verbose listing runs, and replaces the codegen uses of the label with the kPseudoNormalBlockLabel attached to the basic block that contains the switch case target. Great savings here - 14.3% reduction in the number of LIR nodes needed. After this CL, our LIR pseudo proportions drop to 21.6% of all LIR. That's still a lot, but much better. Possible further improvements via combining normal labels with kPseudoSafepointPC labels where appropriate, and also perhaps reduce memory usage by using a short-hand form for labels rather than a full LIR node. Also, many of the basic block labels are no longer branch targets by the time we get to assembly - cheaper to delete, or just ignore? Here's the "after" LIR pseudo op breakdown: kPseudoNormalBlockLabel 37.39% kPseudoSafepointPC 35.72% kPseudoThrowTarget 12.28% kPseudoTarget 5.36% kPseudoSuspendTarget 3.45% kPseudoMethodEntry 2.24% kPseudoMethodExit 2.22% kPseudoExportedPC 0.65% kPseudoCaseLabel 0.53% kPseudoBarrier 0.12% kPseudoIntrinsicRetry 0.04% Total LIR count: 5748232 Not done in this CL, but it will be worth experimenting with actually deleting LIR nodes from the graph when they are optimized away, rather than just setting the NOP bit. Keeping them around is invaluable during debugging - but when not debugging it may pay off if the cost of node removal is less than the cost of traversing through dead nodes in subsequent passes. Next up (and partially in this CL - but mostly to be done in follow-on CLs) is the overall assembly process. Inherited from the trace JIT, the Quick compiler has a fairly simple-minded approach to instruction assembly. First, a pass is made over the LIR list to assign offsets to each instruction. 
Then, the assembly pass is made - which generates the actual machine instruction bit patterns and pushes the instruction data into the code_buffer. However, the code generator takes the "always optimistic" approach to instruction selection and emits the shortest instruction. If, during assembly, we find that a branch or load doesn't reach, that short-form instruction is replaced with a longer sequence. Of course, this invalidates the previously-computed offset calculations. Assembly thus is an iterative process: compute offsets and then assemble until we survive an assembly pass without invalidation. This seems like a likely candidate for improvement. First, I analyzed the number of retries required, and the reason for invalidation over the boot classpath load. The results: more than half of methods don't require a retry, and very few require more than 1 extra pass: 5 or more: 6 of 96334 4 or more: 22 of 96334 3 or more: 140 of 96334 2 or more: 1794 of 96334 - 2% 1 or more: 40911 of 96334 - 40% 0 retries: 55423 of 96334 - 58% The interesting group here is the one that requires 1 retry. Looking at the reason, we see three typical reasons: 1. A cbnz/cbz doesn't reach (only 7 bits of offset) 2. A 16-bit Thumb1 unconditional branch doesn't reach. 3. An unconditional branch which branches to the next instruction is encountered, and deleted. The first 2 cases are the cost of the optimistic strategy - nothing much to change there. However, the interesting case is #3 - dead branch elimination. A further analysis of the single retry group showed that 42% of the methods (16305) that required a single retry did so *only* because of dead branch elimination. The big question here is why so many dead branches survive to the assembly stage. We have a dead branch elimination pass which is supposed to catch these - perhaps it's not working correctly, should be moved later in the optimization process, or perhaps run multiple times. 
Other things to consider: o Combine the offset generation pass with the assembly pass. Skip pc-relative fixup assembly (other than assigning offset), but push LIR* for them into work list. Following the main pass, zip through the work list and assemble the pc-relative instructions (now that we know the offsets). This would significantly cut back on traversal costs. o Store the assembled bits into both the code buffer and the LIR. In the event we have to retry, only the pc-relative instructions would need to be assembled, and we'd finish with a pass over the LIR just to dumb the bits into the code buffer. Change-Id: I50029d216fa14f273f02b6f1c8b6a0dde5a7d6a6
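The retry loop described in this message can be illustrated with a minimal sketch. This is not the Quick compiler's code: the `Insn` struct, the instruction sizes, the `kShortReach` limit and all function names are hypothetical stand-ins chosen only to show the "assign offsets, assemble, retry on invalidation" pattern, including the two kinds of invalidation mentioned above (a short branch that doesn't reach, and a branch to the next instruction that gets deleted).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical, simplified stand-in for a LIR node (not ART's LIR struct).
struct Insn {
  bool is_branch = false;
  int target = -1;       // Index of the branch target (branches only).
  bool is_long = false;  // Set once the branch is widened to its long form.
  bool is_nop = false;   // Set once the branch is eliminated as dead.
  uint32_t offset = 0;   // Code offset assigned by AssignOffsets().
};

// Made-up sizes and reach, for illustration only.
constexpr uint32_t kShortSize = 2;
constexpr uint32_t kLongSize = 4;
constexpr int32_t kShortReach = 8;

uint32_t SizeOf(const Insn& insn) {
  if (insn.is_nop) return 0;
  return insn.is_long ? kLongSize : kShortSize;
}

// Pass 1: walk the list and assign an offset to each instruction.
void AssignOffsets(std::vector<Insn>& code) {
  uint32_t offset = 0;
  for (Insn& insn : code) {
    insn.offset = offset;
    offset += SizeOf(insn);
  }
}

// Pass 2: "assemble". Returns false as soon as a change invalidates offsets.
bool AssemblePass(std::vector<Insn>& code) {
  for (size_t idx = 0; idx < code.size(); ++idx) {
    Insn& insn = code[idx];
    if (!insn.is_branch || insn.is_nop) continue;
    assert(insn.target >= 0 && static_cast<size_t>(insn.target) < code.size());
    // A branch to the next instruction is dead: delete it and retry.
    if (static_cast<size_t>(insn.target) == idx + 1) {
      insn.is_nop = true;
      return false;
    }
    // Optimistic short form doesn't reach: widen it and retry.
    if (!insn.is_long) {
      int32_t delta = static_cast<int32_t>(code[insn.target].offset) -
                      static_cast<int32_t>(insn.offset);
      if (std::abs(delta) > kShortReach) {
        insn.is_long = true;
        return false;
      }
    }
  }
  return true;
}

// Iterate until an assembly pass survives without invalidation.
// Returns the number of retries that were needed.
int AssembleMethod(std::vector<Insn>& code) {
  int retries = 0;
  AssignOffsets(code);
  while (!AssemblePass(code)) {
    ++retries;
    AssignOffsets(code);
  }
  return retries;
}
```

Each failed pass either widens or deletes one branch, so the loop terminates. In this toy model, a dead branch that survives to assembly costs a full extra pass, which is the effect the retry statistics above measure.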
|
56c717860df2d71d66fb77aa77f29dd346e559d3 |
|
06-Sep-2013 |
buzbee <buzbee@google.com> |
Compile-time tuning Specialized the dataflow iterators and did a few other minor tweaks. Showing ~5% compile-time improvement in a single-threaded environment; less in multi-threaded (presumably because we're blocked by something else). Change-Id: I2e2ed58d881414b9fc97e04cd0623e188259afd2
|
9b297bfc588c7d38efd12a6f38cd2710fc513ee3 |
|
06-Sep-2013 |
Ian Rogers <irogers@google.com> |
Refactor CompilerDriver::Compute..FieldInfo Don't use non-const reference arguments. Move ins before outs. Change-Id: I7b251156388d8f07513b3da62ebfd29e5fd9ff76
|
11b63d13f0a3be0f74390b66b58614a37f9aa6c1 |
|
27-Aug-2013 |
buzbee <buzbee@google.com> |
Quick compiler: division by literal fix The constant propagation optimization pass attempts to identify constants in Dalvik virtual registers and handle them more efficiently. The use of small constants in division, though, was handled incorrectly: the high-level code correctly detected the use of a constant, but the actual code generation routine was only expecting the use of a special constant-form opcode. See b/10503566 Change-Id: I88aa4d2eafebb2b1af1a1e88049f1845aefae261
|
96faf5b363d922ae91cf25404dee0e87c740c7c5 |
|
10-Aug-2013 |
Ian Rogers <irogers@google.com> |
Uleb128 compression of vmap and mapping table. Bug 9437697. Change-Id: I30bcb97d12cd8b46d3b2cdcbdd358f08fbb9947a (cherry picked from commit 1809a72a66d245ae598582d658b93a24ac3bf01e)
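For readers unfamiliar with the encoding named in this commit, here is a minimal sketch of unsigned LEB128 (ULEB128) for 32-bit values. It illustrates the general encoding only; the function names and the use of `std::vector` are assumptions for this example and are not taken from the ART sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `value` to `out` as ULEB128: 7 payload bits per byte, low bits
// first, with the top bit of each byte set when more bytes follow.
void EncodeUleb128(std::vector<uint8_t>* out, uint32_t value) {
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0) {
      byte |= 0x80;  // Continuation bit.
    }
    out->push_back(byte);
  } while (value != 0);
}

// Decodes one ULEB128 value starting at `*pos` and advances `*pos` past it.
uint32_t DecodeUleb128(const std::vector<uint8_t>& in, size_t* pos) {
  uint32_t result = 0;
  int shift = 0;
  uint8_t byte;
  do {
    assert(*pos < in.size());  // Input must not be truncated.
    assert(shift < 32);        // At most 5 bytes for a 32-bit value.
    byte = in[(*pos)++];
    result |= static_cast<uint32_t>(byte & 0x7f) << shift;
    shift += 7;
  } while ((byte & 0x80) != 0);
  return result;
}
```

Values below 128 take a single byte, which is why tables dominated by small numbers shrink noticeably under this encoding.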
|
468532ea115657709bc32ee498e701a4c71762d4 |
|
05-Aug-2013 |
Ian Rogers <irogers@google.com> |
Entry point clean up.

Create set of entry points needed for image methods to avoid fix-up at load time:
 - interpreter - bridge to interpreter, bridge to compiled code
 - jni - dlsym lookup
 - quick - resolution and bridge to interpreter
 - portable - resolution and bridge to interpreter

Fix JNI work around to use JNI work around argument rewriting code that'd been accidentally disabled.
Remove abstract method error stub, use interpreter bridge instead.
Consolidate trampoline (previously stub) generation in generic helper.
Simplify trampolines to jump directly into assembly code, keeps stack crawlable.
Dex: replace use of int with ThreadOffset for values that are thread offsets.
Tidy entry point routines between interpreter, jni, quick and portable.

Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e (cherry picked from commit 848871b4d8481229c32e0d048a9856e5a9a17ef9)
|
1809a72a66d245ae598582d658b93a24ac3bf01e |
|
10-Aug-2013 |
Ian Rogers <irogers@google.com> |
Uleb128 compression of vmap and mapping table. Bug 9437697. Change-Id: I30bcb97d12cd8b46d3b2cdcbdd358f08fbb9947a
|
848871b4d8481229c32e0d048a9856e5a9a17ef9 |
|
05-Aug-2013 |
Ian Rogers <irogers@google.com> |
Entry point clean up.

Create set of entry points needed for image methods to avoid fix-up at load time:
 - interpreter - bridge to interpreter, bridge to compiled code
 - jni - dlsym lookup
 - quick - resolution and bridge to interpreter
 - portable - resolution and bridge to interpreter

Fix JNI work around to use JNI work around argument rewriting code that'd been accidentally disabled.
Remove abstract method error stub, use interpreter bridge instead.
Consolidate trampoline (previously stub) generation in generic helper.
Simplify trampolines to jump directly into assembly code, keeps stack crawlable.
Dex: replace use of int with ThreadOffset for values that are thread offsets.
Tidy entry point routines between interpreter, jni, quick and portable.

Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e
|
7934ac288acfb2552bb0b06ec1f61e5820d924a4 |
|
26-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/comments issues Change-Id: Iae286862c85fb8fd8901eae1204cd6d271d69496
|
6f485c62b9cfce3ab71020c646ab9f48d9d29d6d |
|
19-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/indent issues Change-Id: I7c1647f0c39e1e065ca5820f9b79998691ba40b1
|
9b7085a4e7c40e7fa01932ea1647a4a33ac1c585 |
|
19-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint readability/braces issues Change-Id: I56b88956510077b0e13aad4caee8898313fab55b
|
df62950e7a32031b82360c407d46a37b94188fbb |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/parens issues Change-Id: Ifc678d59a8bed24ffddde5a0e543620b17b0aba9
|
0cd7ec2dcd8d7ba30bf3ca420b40dac52849876c |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/blank_line issues Change-Id: Ice937e95e23dd622c17054551d4ae4cebd0ef8a2
|
2ce745c06271d5223d57dbf08117b20d5b60694a |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/braces issues Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
|
fc0e3219edc9a5bf81b166e82fd5db2796eb6a0d |
|
17-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix multiple inclusion guards to match new pathnames Change-Id: Id7735be1d75bc315733b1773fba45c1deb8ace43
|
7940e44f4517de5e2634a7e07d58d0fb26160513 |
|
12-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Create separate Android.mk for main build targets The runtime, compiler, dex2oat, and oatdump are now in separate trees to prevent dependency creep. They can now be built individually without rebuilding the rest of the art projects. dalvikvm and jdwpspy were already this way. Builds in the art directory should behave as before, building everything including tests. Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81
|