
History log of /art/compiler/dex/quick/mir_to_lir.h
Revision	Date	Author	Comments
b7fd412dd21eb362931b3a0716c94fd189a66295	04-Jun-2015	Vladimir Marko <vmarko@google.com>	Revert "Quick: Create GC map based on compiler data. DO NOT MERGE" This reverts commit 7cc8f9aa1349fd6cb0814a653ee2d1164a7fb9f7. Change-Id: Iadb4462bf8e834c6a847c01ee6eb332a325de22c
c8d000a12d853a72999c96e3b73587bad2be6954	04-Jun-2015	Vladimir Marko <vmarko@google.com>	Revert "Quick: Fix "select" pattern to update data used for GC maps. DO NOT MERGE" This reverts commit fad2cbf97c71b9742ccd88cc1a5ba13fa918e677. Change-Id: I175dd9e49014b71a300d987678032bd624a99cf1
fad2cbf97c71b9742ccd88cc1a5ba13fa918e677	25-Mar-2015	Vladimir Marko <vmarko@google.com>	Quick: Fix "select" pattern to update data used for GC maps. DO NOT MERGE Follow-up to https://android-review.googlesource.com/143222 (cherry picked from commit 6e07183e822a32856da9eb60006989496e06a9cc) Change-Id: I916743c845d9568063cd6a4b2ef71e9cbc43dee8
7cc8f9aa1349fd6cb0814a653ee2d1164a7fb9f7	20-Mar-2015	Vladimir Marko <vmarko@google.com>	Quick: Create GC map based on compiler data. DO NOT MERGE The Quick compiler and verifier sometimes disagree on dalvik register types (fp/core/ref) for 0/null constants and merged registers involving 0/null constants. Since the verifier is more lenient it can mark a register as a reference for GC where Quick considers it a floating point register or a dead register (which would have a ref/fp conflict if not dead). If the compiler used an fp register to hold the zero value, the core register or stack location used by GC based on the verifier data can hold an invalid value. Previously, as a workaround we stored the fp zero value also in the stack location or core register where GC would look for it. This wasn't precise and may have missed some cases. To fix this properly, we now generate GC maps based on the compiler's notion of references if register promotion is enabled. Bug: https://code.google.com/p/android/issues/detail?id=147187 (cherry picked from commit 767c752fddc64e280dba507457e4f06002b5f678) Change-Id: Id75428fd0a2f6bdd2ccb20ce75cdeab01150e455
3d21bdf8894e780d349c481e5c9e29fe1556051c	22-Apr-2015	Mathieu Chartier <mathieuc@google.com>	Move mirror::ArtMethod to native Optimizing + quick tests are passing, devices boot. TODO: Test and fix bugs in mips64. Saves 16 bytes per most ArtMethod, 7.5MB reduction in system PSS. Some of the savings are from removal of virtual methods and direct methods object arrays. Bug: 19264997 (cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33) Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d Fix some ArtMethod related bugs Added root visiting for runtime methods, not currently required since the GcRoots in these methods are null. Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes --trace run-tests 005, 044. Fixed optimizing compiler bug where we used a normal stack location instead of double on ARM64, this fixes the debuggable tests. TODO: Fix JDWP tests. Bug: 19264997 Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3 ART: Fix casts for 64-bit pointers on 32-bit compiler. Bug: 19264997 Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457 Fix JDWP tests after ArtMethod change Fixes Throwable::GetStackDepth for exception event detection after internal stack trace representation change. Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of proxy method. Bug: 19264997 Change-Id: I363e293796848c3ec491c963813f62d868da44d2 Fix accidental IMT and root marking regression Was always using the conflict trampoline. Also included fix for regression in GC time caused by extra roots. Most of the regression was IMT. Fixed bug in DumpGcPerformanceInfo where we would get SIGABRT due to detached thread. EvaluateAndApplyChanges: From ~2500 -> ~1980 GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots Bug: 19264997 Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0 Fix bogus image test assert Previously we were comparing the size of the non moving space to size of the image file. Now we properly compare the size of the image space against the size of the image file. Bug: 19264997 Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a [MIPS64] Fix art_quick_invoke_stub argument offsets. ArtMethod reference's size got bigger, so we need to move other args and leave enough space for ArtMethod* and 'this' pointer. This fixes mips64 boot. Bug: 19264997 Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
41b175aba41c9365a1c53b8a1afbd17129c87c14	19-May-2015	Vladimir Marko <vmarko@google.com>	ART: Clean up arm64 kNumberOfXRegisters usage. Avoid undefined behavior for arm64 stemming from 1u << 32 in loops with upper bound kNumberOfXRegisters. Create iterators for enumerating bits in an integer either from high to low or from low to high and use them for <arch>Context::FillCalleeSaves() on all architectures. Refactor runtime/utils.{h,cc} by moving all bit-fiddling functions to runtime/base/bit_utils.{h,cc} (together with the new bit iterators) and all time-related functions to runtime/base/time_utils.{h,cc}. Improve test coverage and fix some corner cases for the bit-fiddling functions. Bug: 13925192 (cherry picked from commit 80afd02024d20e60b197d3adfbb43cc303cf29e0) Change-Id: I905257a21de90b5860ebe1e39563758f721eab82
848f70a3d73833fc1bf3032a9ff6812e429661d9	15-Jan-2014	Jeff Hao <jeffhao@google.com>	Replace String CharArray with internal uint16_t array. Summary of high level changes: - Adds compiler inliner support to identify string init methods - Adds compiler support (quick & optimizing) with new invoke code path that calls method off the thread pointer - Adds thread entrypoints for all string init methods - Adds map to verifier to log when receiver of string init has been copied to other registers. used by compiler and interpreter Change-Id: I797b992a8feb566f9ad73060011ab6f51eb7ce01
0d22184ec9e5b1e958c031ac92c7f053de3a13a2	27-Apr-2015	Nicolas Geoffray <ngeoffray@google.com>	Revert "Revert "[optimizing] Replace FP divide by power of 2"" This reverts commit 067cae2c86627d2edcf01b918ee601774bc76aeb. Change-Id: Iaaa8772500ea7d3dce6ae0829dc0dc3bbc9c14ca
5ea536aa4a6414db01beaf6f8bd8cb9adc5cfc92	20-Apr-2015	Vladimir Marko <vmarko@google.com>	Remove ArtMethod* parameter from dex cache entry points. Load the ArtMethod* using an optimized stack walk instead. This reduces the size of the generated code. Three of the entry points are called only from a slow-path and the fourth (InitializeTypeAndVerifyAccess) is rare and already slow enough that the one or two extra loads (depending on whether we already have the ArtMethod* in a register) are insignificant. And as we're starting to use PC-relative addressing of the dex cache arrays (already done by Quick for the boot image), having the ArtMethod* in a register becomes less likely anyway. Change-Id: Ib19b9d204e355e13bf386662a8b158178bf8ad28
2cebb24bfc3247d3e9be138a3350106737455918	22-Apr-2015	Mathieu Chartier <mathieuc@google.com>	Replace NULL with nullptr Also fixed some lines that were too long, and a few other minor details. Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb
fac10700fd99516e8a14f751fe35553021ce6982	22-Apr-2015	Vladimir Marko <vmarko@google.com>	Quick: Remove broken Mir2Lir::LocToRegClass(). Its use in intrinsics has been bogus. In all other instances it's been used under the assumption that the inferred type matches the return type of associated calls. However, if the type inference identifies a type mismatch, the assumption doesn't hold and there isn't necessarily a valid value that the function could reasonably return. Bug: 19918641 Change-Id: I050934e6f9eb00427d0b888ee29ae9eeb509bb3f
1961b609bfefaedb71cee3651c4f931cc3e7393d	08-Apr-2015	Vladimir Marko <vmarko@google.com>	Quick: PC-relative loads from dex cache arrays on x86. Rewrite all PC-relative addressing on x86 and implement PC-relative loads from dex cache arrays. Don't adjust the base to point to the start of the method, let it point to the anchor, i.e. the target of the "call +0" insn. Change-Id: Ic22544a8bc0c5e49eb00a75154dc8f3ead816989
1109fb3cacc8bb667979780c2b4b12ce5bb64549	07-Apr-2015	David Srbecky <dsrbecky@google.com>	Implement CFI for Quick. CFI is necessary for stack unwinding in gdb, lldb, and libunwind. Change-Id: Ic3b84c9dc91c4bae80e27cda02190f3274e95ae8
8c57831b2b07185ee1986b9af68a351e1ca584c3	07-Apr-2015	David Srbecky <dsrbecky@google.com>	Remove the old CFI infrastructure. Change-Id: I12a17a8a1c39ffccaa499c328ebac36e4d74dc4e
cc23481b66fd1f2b459d82da4852073e32f033aa	07-Apr-2015	Vladimir Marko <vmarko@google.com>	Promote pointer to dex cache arrays on arm. Do the use-count analysis on temps (ArtMethod* and the new PC-relative temp) in Mir2Lir, rather than MIRGraph. MIRGraph isn't really supposed to know how the ArtMethod* is used by the backend. Change-Id: Iaf56a46ae203eca86281b02b54f39a80fe5cc2dd
3477307fdf93a1ef9a80d4e096125705c47e8024	07-Apr-2015	Vladimir Marko <vmarko@google.com>	Quick: Use PC-relative dex cache array loads for SGET/SPUT. Change-Id: I890284b73f69120ada5cf9b9ef4a717af3273cd2
20f85597828194c12be10d3a927999def066555e	19-Mar-2015	Vladimir Marko <vmarko@google.com>	Fixed layout for dex caches in boot image. Define a fixed layout for dex cache arrays (type, method, string and field arrays) for dex caches in the boot image. This gives those arrays fixed offsets from the boot image code and allows PC-relative addressing of their elements. Use the PC-relative load on arm64 for relevant instructions, i.e. invoke-static, invoke-direct, const-string, const-class, check-cast and instance-of. This reduces the arm64 boot.oat on Nexus 9 by 1.1MiB. This CL provides the infrastructure and shows on the arm64 the gains that we can achieve by having fixed dex cache arrays' layout. To fully use this for the boot images, we need to implement the PC-relative addressing for other architectures. To achieve similar gains for apps, we need to move the dex cache arrays to a .bss section of the oat file. These changes will be implemented in subsequent CLs. (Also remove some compiler_driver.h dependencies to reduce incremental build times.) Change-Id: Ib1859fa4452d01d983fd92ae22b611f45a85d69b
6e07183e822a32856da9eb60006989496e06a9cc	25-Mar-2015	Vladimir Marko <vmarko@google.com>	Quick: Fix "select" pattern to update data used for GC maps. Follow-up to https://android-review.googlesource.com/143222 Change-Id: I1c12af9a19f76e64fd209f6cc2eaec5587b3083b
f6737f7ed741b15cfd60c2530dab69f897540735	23-Mar-2015	Vladimir Marko <vmarko@google.com>	Quick: Clean up Mir2Lir codegen. Clean up WrapPointer()/UnwrapPointer() and OpPcRelLoad(). Change-Id: I1a91f01e1e779599c77f3f6efcac2a6ad34629cf
767c752fddc64e280dba507457e4f06002b5f678	20-Mar-2015	Vladimir Marko <vmarko@google.com>	Quick: Create GC map based on compiler data. The Quick compiler and verifier sometimes disagree on dalvik register types (fp/core/ref) for 0/null constants and merged registers involving 0/null constants. Since the verifier is more lenient it can mark a register as a reference for GC where Quick considers it a floating point register or a dead register (which would have a ref/fp conflict if not dead). If the compiler used an fp register to hold the zero value, the core register or stack location used by GC based on the verifier data can hold an invalid value. Previously, as a workaround we stored the fp zero value also in the stack location or core register where GC would look for it. This wasn't precise and may have missed some cases. To fix this properly, we now generate GC maps based on the compiler's notion of references if register promotion is enabled. Bug: https://code.google.com/p/android/issues/detail?id=147187 Change-Id: Id3a2f863b16bdb8969df7004c868773084aec421
0b40ecf156e309aa17c72a28cd1b0237dbfb8746	20-Mar-2015	Vladimir Marko <vmarko@google.com>	Quick: Clean up slow paths. Change-Id: I278d42be77b02778c4a419ae9024b37929915b64
22fe45de11ed7afdf21400d2de3abd23f3a62800	18-Mar-2015	Vladimir Marko <vmarko@google.com>	Quick: Eliminate check-cast guaranteed by instance-of. Eliminate check-cast if the result of an instance-of with the very same type on the same value is used to branch to the check-cast's block or a dominator of it. Note that there already exists a verifier-based elimination of check-cast but it excludes check-cast on interfaces. This new optimization works for interface types and, since it's GVN-based, it can better recognize when the same reference is used for instance-of and check-cast. Change-Id: Ib315199805099d1cb0534bb4a90dc51baa409685
80b96d1a76790527f72a660ac03d9c215eed17ce	19-Feb-2015	Vladimir Marko <vmarko@google.com>	Replace a few std::vector with ArenaVector in Mir2Lir. Change-Id: I7867d60afc60f57cdbbfd312f02883854d65c805
b666f4805c8ae707ea6fd7f6c7f375e0b000dba8	18-Feb-2015	Mathieu Chartier <mathieuc@google.com>	Move arenas into runtime Moved arena pool into the runtime. Motivation: Allow GC to use arena allocators, recycle arena pool for linear alloc. Bug: 19264997 Change-Id: I8ddbb6d55ee923a980b28fb656c758c5d7697c2f
6ce3eba0f2e6e505ed408cdc40d213c8a512238d	16-Feb-2015	Vladimir Marko <vmarko@google.com>	Add suspend checks to special methods. Generate suspend checks at the beginning of special methods. If we need to call to runtime, go to the slow path where we create a simplified but valid frame, spill all arguments, call art_quick_test_suspend, restore necessary arguments and return back to the fast path. This keeps the fast path overhead to a minimum. Bug: 19245639 Change-Id: I3de5aee783943941322a49c4cf2c4c94411dbaa2
e4fcc5ba2284c201c022b52d27f7a1201d696324	13-Feb-2015	Vladimir Marko <vmarko@google.com>	Clean up Scoped-/ArenaAllocator array allocations. Change-Id: Id718f8a4450adf1608306286fa4e6b9194022532
72f53af0307b9109a1cfc0671675ce5d45c66d3a	12-Nov-2014	Chao-ying Fu <chao-ying.fu@intel.com>	ART: Remove MIRGraph::dex_pc_to_block_map_ This patch removes MIRGraph::dex_pc_to_block_map_, adds a local variable dex_pc_to_block_map inside MIRGraph::InlineMethod(), and updates several functions to pass dex_pc_to_block_map. The goal is to limit the scope of dex_pc_to_block_map and the usage of FindBlock, so that various compiler optimizations cannot rely on dex pc to look up basic blocks to avoid duplicated dex pc issues. Also, this patch changes quick targets to use successor blocks for switch case target generation at Mir2Lir::InstallSwitchTables(). Change-Id: I9f571efebd2706b4e1606279bd61f3b406ecd1c4 Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
9c462086269324350516b3394d478f1d71a4b5d1	27-Jan-2015	Andreas Gampe <agampe@google.com>	ART: Even more Quick cleanup Remove Backend. Change-Id: I247cc65ccda6a362ba1a8f5e73e7f12ecd980a87
0b9203e7996ee1856f620f95d95d8a273c43a3df	23-Jan-2015	Andreas Gampe <agampe@google.com>	ART: Some Quick cleanup Make several fields const in CompilationUnit. May benefit some Mir2Lir code that repeats tests, and in general immutability is good. Remove compiler_internals.h and refactor some other headers to reduce overly broad imports (and thus forced recompiles on changes). Change-Id: I898405907c68923581373b5981d8a85d2e5d185a
f681570077563bb529a30f9e7c572b837cecfb83	20-Jan-2015	Andreas Gampe <agampe@google.com>	ART: Make some helpers non-virtual in Mir2Lir These don't need to be virtual. Change-Id: Idca3c0a4e8b5e045d354974bd993492d6c0e70ba
d500b53ff8742f76b63c9f7593082d9e8114b85f	17-Jan-2015	Andreas Gampe <agampe@google.com>	ART: Some Quick cleanup Move some definitions around. In case a method is already virtual, avoid instruction-set tests. Change-Id: I8d98f098e55ade1bc0cfa32bb2aad006caccd07d
7e499925f8b4da46ae51040e9322690f3df992e6	06-Jan-2015	Andreas Gampe <agampe@google.com>	ART: Remove LowestSetBit and IsPowerOfTwo Remove those functions from Mir2Lir and replace with functionality from utils.h. Change-Id: Ieb67092b22d5d460b5241c7c7931c15b9faf2815
1cc7dbabd03e0a6c09d68161417a21bd6f9df371	18-Dec-2014	Andreas Gampe <agampe@google.com>	ART: Reorder entrypoint argument order Shuffle the ArtMethod* referrer backwards for easier removal. Clean up ARM & MIPS assembly code. Change some macros to make future changes easier. Change-Id: Ie2862b68bd6e519438e83eecd9e1611df51d7945
e21dc3db191df04c100620965bee4617b3b24397	09-Dec-2014	Andreas Gampe <agampe@google.com>	ART: Swap-space in the compiler Introduce a swap-space and corresponding allocator to transparently switch native allocations to memory backed by a file. Bug: 18596910 (cherry picked from commit 62746d8d9c4400e4764f162b22bfb1a32be287a9) Change-Id: I131448f3907115054a592af73db86d2b9257ea33
8b858e16563ebf8e522df026a6ab409f1bd9b3de	27-Nov-2014	Vladimir Marko <vmarko@google.com>	Quick: Redefine the notion of back-edges. Redefine a back-edge to really mean an edge to a loop head instead of comparing instruction offsets. Generate suspend checks also on fall-through to a loop head; insert an extra GOTO for these edges. Add suspend checks to fused cmp instructions. Rewrite suspend check elimination to track whether there is an invoke on each path from the loop head to a given back edge, instead of using domination info to look for a basic block with invoke that must be on each path. Ignore invokes to intrinsics and move the optimization to its own pass. The new loops in 109-suspend-check should prevent intrinsics and fused cmp-related regressions. Bug: 18522004 Change-Id: I96ac818f76ccf9419a6e70e9ec00555f9d487a9e
717a3e447c6f7a922cf9c3efe522747a187a045d	13-Nov-2014	Serguei Katkov <serguei.i.katkov@intel.com>	Re-factor Quick ABI support Now every architecture must provide a mapper between VR parameters and physical registers. Additionally, an architecture can provide a bulk copy helper for the GenDalvikArgs utility. Everything else becomes common code: GetArgMappingToPhysicalReg, GenDalvikArgsNoRange, GenDalvikArgsRange, FlushIns. The mapper now uses the shorty representation of input parameters. This is required because locations are not enough to detect the type of a parameter (fp or core). For the details see https://android-review.googlesource.com/#/c/113936/. Change-Id: Ie762b921e0acaa936518ee6b63c9a9d25f83e434 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
7ab2fce83cd72c0963128b098a78606e77ea15d5	28-Nov-2014	Vladimir Marko <vmarko@google.com>	Refactor handling of conditional branches with known result. Detect IF_cc and IF_ccZ instructions with known results in the basic block optimization phase (instead for the codegen phase) and replace them with GOTO/NOP. Kill blocks that are unreachable as a result. Change-Id: I169c2fa6f1e8af685f4f3a7fe622f5da862ce329
6af820639c74e769ffc1f54930f6ebc11364f894	26-Nov-2014	Yevgeny Rouban <yevgeny.y.rouban@intel.com>	ART: x86 specific clearing higher bits when converting long to int The following problem description is taken from https://android-review.googlesource.com/107261 If destination and source of long-to-int is the same physical register on 64-bit then we do not emit any instructions but consider that destination is a 32-bit view of source register. As a result high part contains garbage. If the destination is used later as index to array access then this garbage is used in computation of address because address is 64-bit. For all other cases garbage is just ignored. A generic solution (113023) for all hw platforms was suggested but rejected later for the sake of HW specific solution: https://android-review.googlesource.com/113023 https://android-review.googlesource.com/114436 This patch is a rework of patch 113023 to stick with x86_64 specific changes: for 64-bit target this patch forces generating reg-to-reg copy if the src and dest are the same physical registers. This makes the higher bits be zeroed by 32-bit move instruction. Change-Id: Id29af839506ff9319ffba08b2e86e240fef4dafd Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com> Signed-off-by: Yevgeny Rouban <yevgeny.y.rouban@intel.com>
743b98cd3d7db1cfd6b3d7f7795e8abd9d07a42d	24-Nov-2014	Vladimir Marko <vmarko@google.com>	Skip null check in MarkGCCard() for known non-null values. Use GVN's knowledge of non-null values to set a new MIR flag for IPUT/SPUT/APUT to skip the value null check. Change-Id: I97a8d1447acb530c9bbbf7b362add366d1486ee1
da96aeda912ff317de2c41e5a49bd244427238ac	27-Oct-2014	Chao-ying Fu <chao-ying.fu@intel.com>	ART: Generate switch targets from successor blocks This patch relies on the successor blocks to generate switch targets in GenSmallPackedSwitch and GenSmallSparseSwitch for all quick targets. In x86, we create a new packed switch table by storing basic block ids instead of dex offsets, and we override MarkPackedCaseLabels and InsertCaseLabel to avoid calling FindBlock. Change-Id: Ibb5983db582f0965aba787b520bd106522453564 Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
807140048f82a2b87ee5bcf337f23b6a3d1d5269	21-Nov-2014	Mathieu Chartier <mathieuc@google.com>	Add fast string sharpening String sharpening changes const strings to PC relative loads instead of always going through the dex cache. This saves code size and probably improves performance slightly. Before: 49602992 system@framework@boot.oat After: 49385904 system@framework@boot.oat Precursor to removing dex_cache_strings_ field from ArtMethod. Bug: 17643507 Change-Id: I1787f48774631eee0accafeea257aa8d0e91e8d6
bf535be514570fc33fc0a6347a87dcd9097d9bfd	19-Nov-2014	Vladimir Marko <vmarko@google.com>	Add card mark to filled-new-array. Bug: 18032332 Change-Id: I35576b27f9115e4d0b02a11afc5e483b9e93a04a
d582fa4ea62083a7598dded5b82dc2198b3daac7	06-Nov-2014	Ian Rogers <irogers@google.com>	Instruction set features for ARM64, MIPS and X86. Also, refactor how feature strings are handled so they are additive or subtractive. Make MIPS have features for FPU 32-bit and MIPS v2. Use in the quick compiler rather than #ifdefs that wouldn't have worked in cross-compilation. Add SIMD features for x86/x86-64 proposed in: https://android-review.googlesource.com/#/c/112370/ Bug: 18056890 Change-Id: Ic88ff84a714926bd277beb74a430c5c7d5ed7666
675e09b2753c2fcd521bd8f0230a0abf06e9b0e9	23-Oct-2014	Ningsheng Jian <ningsheng.jian@arm.com>	ARM: Strength reduction for floating-point division For floating-point division by power of two constants, generate multiplication by the reciprocal instead. Change-Id: I39c79eeb26b60cc754ad42045362b79498c755be
080dd413e133ae357ab9572d924f7a884315d535	05-Nov-2014	Vladimir Marko <vmarko@google.com>	Clean up arena objects in Mir2Lir. Change-Id: I93fca37be2ae100ddebf80b6ba7a561b187e8886
785d2f2116bb57418d81bb55b55a087afee11053	04-Nov-2014	Andreas Gampe <agampe@google.com>	ART: Replace COMPILE_ASSERT with static_assert (compiler) Replace all occurrences of COMPILE_ASSERT in the compiler tree. Change-Id: Icc40a38c8bdeaaf7305ab3352a838a2cd7e7d840
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f	31-Oct-2014	Ian Rogers <irogers@google.com>	Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags. Fix associated errors about unused parameters and implicit sign conversions. For sign conversion this was largely in the area of enums, so add ostream operators for the affected enums and fix tools/generate-operator-out.py. Tidy arena allocation code and arena allocated data types, rather than fixing new and delete operators. Remove dead code. Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
5667fdbb6e441dee7534ade18b628ed396daf593	23-Oct-2014	Zheng Xu <zheng.xu@arm.com>	ARM: Use hardfp calling convention between java to java call. This patch defaults to using the hardfp calling convention. Softfp can be enabled by setting kArm32QuickCodeUseSoftFloat to true. We get about -1 ~ +5% performance improvement with different benchmark tests. Hopefully, we should be able to get more performance by addressing the remaining TODOs, as some parts of the code keep the original assumption, which is not optimal. DONE: 1. Interpreter to quick code 2. Quick code to interpreter 3. Transition assembly and callee-saves 4. Trampoline (generic jni, resolution, invoke with access check, etc.) 5. Pass fp arg reg following aapcs (gpr and stack do not follow aapcs) 6. Quick helper assembly routines to handle ABI differences 7. Quick code method entry 8. Quick code method invocation 9. JNI compiler TODO: 10. Rework ArgMap, FlushIn, GenDalvikArgs and affected common code. 11. Rework CallRuntimeHelperXXX(). Change-Id: I9965d8a007f4829f2560b63bcbbde271bdcf6ec2
5c5676b26a08454b3f0133783778991bbe5dd681	30-Sep-2014	Razvan A Lupusoru <razvan.a.lupusoru@intel.com>	ART: Add div/rem zero check elimination flag Just as with other throwing bytecodes, it is possible to prove in some cases that a divide/remainder won't throw ArithmeticException. For example, in case two divides with same denominator are in order, then provably the second one cannot throw if the first one did not. This patch adds the elimination flag and updates the signature of several Mir2Lir methods to take the instruction optimization flags into account. Change-Id: I0b078cf7f29899f0f059db1f14b65a37444b84e8 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
d8c3e3608a7b47e82186e4f8118541ef06d9eab2	08-Oct-2014	Alexei Zavjalov <alexei.zavjalov@intel.com>	ART: X86: GenLongArith should handle overlapped VRs When the src and dest VRs overlap in a call to GenLongArith, registers may be used incorrectly. The solution is to map src to a physical reg and work with this reg instead of mem. Renamed BadOverlap() to PartiallyIntersects() for consistency. Change-Id: Ia3fc7f741f0a92556e1b2a1b084506662ef04c9d Signed-off-by: Katkov, Serguei I <serguei.i.katkov@intel.com> Signed-off-by: Alexei Zavjalov <alexei.zavjalov@intel.com>
832336b3c9eb892045a8de1bb12c9361112ca3c5	09-Oct-2014	Ian Rogers <irogers@google.com>	Don't copy fill array data to quick literal pool. Currently quick copies the fill array data from the dex file to the literal pool. It then has to go through hoops to pass this PC relative address down to out-of-line code. Instead, pass the offset of the table to the out-of-line code and use the CodeItem data associated with the ArtMethod. This reduces the size of oat code while greatly simplifying it. Unify the FillArrayData implementation in quick, portable and the interpreters. Change-Id: I9c6971cf46285fbf197856627368c0185fdc98ca
7c02e918e752ab36f0b6cab7528f10c0cf55a4ee	03-Oct-2014	buzbee <buzbee@google.com>	Quick compiler: Fix ambiguous LoadValue() Internal b/17790197 & hat tip to Stephen Kyle The following custom-edited dex program demonstrated incorrect code generation caused by type confusion. In the example, the constant held in v0 is used in both float and int contexts, and the register class gets confused at the if-eq. .method private static getInt()I .registers 4 const/16 v0, 100 const/4 v1, 1 const/4 v2, 7 :loop if-eq v2, v0, :done add-int v2, v2, v1 goto :loop :done add-float v3, v0, v1 return v2 .end method The bug was introduced in c/96499, "Quick compiler: reference cleanup" That CL created a convenience variant of LoadValue which selected the target register type based on the type of the RegLocation. It should not have done so. The type of a RegLocation is the compiler's best guess of the Dalvik type - and Dalvik allows constants to be used in multiple type contexts. All code generation utilities must specify desired register class based on the capabilities of the instructions to be emitted. In the failing case, OpCmpImmBranch (and GenCompareZeroAndBranch) will be using core registers, so the LoadValue must specify either kCoreReg or kRefReg. The CL deletes the dangerous LoadValue() variant. Change-Id: Ie4ec6e51b19676dbbb9628c72c8b3473a419e7ec
f4da675bbc4615c5f854c81964cac9dd1153baea	01-Aug-2014	Vladimir Marko <vmarko@google.com>	Implement method calls using relative BL on ARM. Store the linker patches with each CompiledMethod instead of keeping them in CompilerDriver. Reorganize oat file creation to apply the patches as we're writing the method code. Add framework for platform-specific relative call patches in the OatWriter. Implement relative call patches for ARM. Change-Id: Ie2effb3d92b61ac8f356140eba09dc37d62290f8
e39c54ea575ec710d5e84277fcdcc049f8acb3c9	22-Sep-2014	Vladimir Marko <vmarko@google.com>	Deprecate GrowableArray, use ArenaVector instead. Purge GrowableArray from Quick and Portable. Remove GrowableArray<T>::Iterator. Change-Id: I92157d3a6ea5975f295662809585b2dc15caa1c6
4e67841e99e4a206133e7010653ccd132682296a	09-Sep-2014	Mathieu Chartier <mathieuc@google.com>	Change Reference.get() intrinsic to Reference.getReferent(). The reference intrinsic was incorrectly inlining PhantomReference.get(). We now get around this by adding a layer of indirection. Reference.get() now calls getReferent() which is intrinsified and inlined. Requires: https://android-review.googlesource.com/#/c/107100/ Bug: 17429865 (cherry picked from commit cd48f2d86197d4fe87cc88077bc4af5ba66e5295) Change-Id: Ie91e70abf43cedf3c707c7bb8a5059e19d2a2577
cd48f2d86197d4fe87cc88077bc4af5ba66e5295	09-Sep-2014	Mathieu Chartier <mathieuc@google.com>	Change Reference.get() intrinsic to Reference.getReferent(). The reference intrinsic was incorrectly inlining PhantomReference.get(). We now get around this by adding a layer of indirection. Reference.get() now calls getReferent() which is intrinsified and inlined. Requires: https://android-review.googlesource.com/#/c/107100/ Bug: 17429865 Change-Id: Ie91e70abf43cedf3c707c7bb8a5059e19d2a2577
8d0d03e24325463f0060abfd05dba5598044e9b1	07-Jun-2014	Razvan A Lupusoru <razvan.a.lupusoru@intel.com>	ART: Change temporaries to positive names Changes compiler temporaries to have positive names. The numbering now puts them above the code VRs (locals + ins, in that order). The patch also introduces APIs to query the number of temporaries, locals and ins. The compiler temp infrastructure suffered from several issues which are also addressed by this patch: -There is no longer a queue of compiler temps. This would be polluted with Method* when post opts were called multiple times. -Sanity checks have been added to allow requesting of temps from BE and to prevent temps after frame is committed. -None of the structures holding temps can overflow because they are allocated to allow holding maximum temps. Thus temps can be requested by BE with no problem. -Since the queue of compiler temps is no longer maintained, it is no longer possible to refer to a temp that has invalid ssa (because it was requested before ssa was run). -The BE can now request temps after all ME allocations and it is guaranteed to actually receive them. -ME temps are now treated like normal VRs in all cases with no special handling. Only the BE temps are handled specially because there are no references to them from MIRs. -Deprecated and removed several fields in CompilationUnit that saved register information and updated callsites to call the new interface from MIRGraph. Change-Id: Ia8b1fec9384a1a83017800a59e5b0498dfb2698c Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
37f05ef45e0393de812d51261dc293240c17294d	17-Jul-2014	Fred Shih <ffred@google.com>	Reduced memory usage of primitive fields smaller than 4-bytes Reduced memory used by byte and boolean fields from 4 bytes down to a single byte and shorts and chars down to two bytes. Fields are now arranged as Reference followed by decreasing component sizes, with fields shuffled forward as needed. Bug: 8135266 Change-Id: I65eaf31ed27e5bd5ba0c7d4606454b720b074752
53c913bb71b218714823c8c87a1f92830c336f61	13-Aug-2014	Andreas Gampe <agampe@google.com>	ART: Clean up compiler Clean up the compiler: less extern functions, dis-entangle compilers, hide some compiler specifics, lower global includes. Change-Id: Ibaf88d02505d86994d7845cf0075be5041cc8438
9a8a506b1cd639ad4126c19530cd206d8d3923c3	07-Aug-2014	Martyn Capewell <martyn.capewell@arm.com>	AArch64: Improve MIR to LIR translation for abs Improve translation by using a shorter and more efficient sequence for integer abs, and replacing UBFM with AND for FP abs in integer registers. Change-Id: Ifc39cd7806ed637d5cfc3284c435b5d501047eb5 Signed-off-by: Alexandre Rames <alexandre.rames@arm.com>
e3ea83811d47152c00abea24a9b420651a33b496	08-Aug-2014	Yevgeny Rouban <yevgeny.y.rouban@intel.com>	ART source line debug info in OAT files OAT files have source line information enough for ART runtime needs like jump to/from interpreter and thread suspension. But this information is not enough for finer grained source level debugging and low-level profiling (VTune or perf). This patch adds to OAT files two additional sections: .debug_line - DWARF formatted Elf32 section with detailed source line information (mapping from native PC to Java source lines). In addition to the debugging symbols added using the dex2oat option --include-debug-symbols, the source line information is added to the section .debug_line. The source line info can be read by many Elf reading tools like objdump, readelf, dwarfdump, gdb, perf, VTune, ... gdb can use this debug line information in x86. In 64-bit mode the information can be used if the oat file is mapped in the lower address space (address has higher 32 bits zeroed). Relocation works. Testing: 1. art/test/run-test --host --gdb [--64] 001-HelloWorld 2. in gdb: break Main.java:19 3. in gdb: break Runtime.java:111 4. in gdb: run - stops at void java.lang.Runtime.<init>() 5. in gdb: backtrace - shows call stack down to main() 6. in gdb: continue - stops at void Main.main() (only in 32-bit mode) 7. in gdb: backtrace - shows call stack down to main() 8. objdump -W <oat-file> - addresses are from VMA range of .text section reported by objdump -h <file> 9. dwarfdump -ka <oat-file> - no errors expected Size of aosp-x86-eng boot.oat increased by 11% from 80.5Mb to 89.2Mb with two sections added .debug_line (7.2Mb) and .rel.debug (1.5Mb). Change-Id: Ib8828832686e49782a63d5529008ff4814ed9cda Signed-off-by: Yevgeny Rouban <yevgeny.y.rouban@intel.com>
8c18c2aaedb171f9b03ec49c94b0e33449dc411b	06-Aug-2014	Andreas Gampe <agampe@google.com>	ART: Generate chained compare-and-branch for short switches Refactor Mir2Lir to generate chained compare-and-branch sequences for short switches on all architectures. Bug: 16241558 (cherry picked from commit 48971b3242e5126bcd800cc9c68df64596b43d13) Change-Id: I0bb3071b8676523e90e0258e9b0e3fd69c1237f4
e7f82e2515f47f3c3292281312d7031a34a58ffc	06-Aug-2014	Fred Shih <ffred@google.com>	Added support for patching classes from different dex files. Added support for class patching from different dex files and moved ScopedObjectAccess from the quick compiler to driver. Slight refactoring for clarity. Bug: 16656190 Change-Id: I107fcbce75db42ca61321ea1c5d5f236680a1b3d
48971b3242e5126bcd800cc9c68df64596b43d13	06-Aug-2014	Andreas Gampe <agampe@google.com>	ART: Generate chained compare-and-branch for short switches Refactor Mir2Lir to generate chained compare-and-branch sequences for short switches on all architectures. Change-Id: Ie2a572ae69d462ba68a119e9fb93ae538cddd08f
547cdfd21ee21e4ab9ca8692d6ef47c62ee7ea52	05-Aug-2014	Tong Shen <endlessroad@google.com>	Emit CFI for x86 & x86_64 JNI compiler. Now for host-side x86 & x86_64 ART, we are able to get complete stacktrace with even mixed C/C++ & Java stack frames. Testing: 1. art/test/run-test --host --gdb [--64] --no-relocate 005 2. In gdb, run 'b art::Class_classForName' which is implementation of a Java native method, then 'r' 3. In gdb, run 'bt'. You should see stack frames down to main() Change-Id: I2d17e9aa0f6d42d374b5362a15ea35a2fce96302
c76c614d681d187d815760eb909e5faf488a3c35	05-Aug-2014	Andreas Gampe <agampe@google.com>	ART: Refactor long ops in quick compiler Make GenArithOpLong virtual. Let the implementation in gen_common be very basic, without instruction-set checks, and meant as a fall-back. Backends should implement and dispatch to code for better implementations. This allows to remove the GenXXXLong virtual methods from Mir2Lir, and clean up the backends (especially removing some LOG(FATAL) implementations). Change-Id: I6366443c0c325c1999582d281608b4fa229343cf
8081d2b8d7a743729557051d0294e040e61c747a	31-Jul-2014	Vladimir Marko <vmarko@google.com>	Create allocator adapter for using Arena in std containers. Create ArenaAllocatorAdapter, similar to the existing ScopedArenaAllocatorAdapter, for allocating memory for standard containers via the ArenaAllocator. Add the ability to specify allocation kind rather than just kArenaAllocSTL to both adapters. Move the scoped arena allocator to the scoped_arena_containers.h header file. Define template aliases for containers using the new adapter and change a few MIRGraph and Mir2Lir members to use them. Change-Id: I9bbc50248e0fed81729497b848cb29bf68444268
c763e350da562b0c6bebf10599588d4901140e45	04-Jul-2014	Matteo Franchin <matteo.franchin@arm.com>	AArch64: Implement InexpensiveConstant methods. Implement IsInexpensiveConstant and friends for A64. Also extending the methods to take the opcode with respect to which the constant is inexpensive. Additionally, logical operations (i.e. and, or, xor) can now handle the immediates 0 and ~0 (which are not logical immediates). Change-Id: I46ce1287703765c5ab54983d13c1b3a1f5838622
6bbf0967d217ab2b7bdbb78bfd076b8fb07a44e8	14-Jul-2014	Alexei Zavjalov <alexei.zavjalov@intel.com>	ART: Implement the easy long division/remainder by a constant Also optimizes long/int divisions by power-of-two values. Also do some clean-up. Change-Id: Ie414e64aac251c81361ae107d157c14439e6dab5 Signed-off-by: Alexei Zavjalov <alexei.zavjalov@intel.com>
2eba1fa7e9e5f91e18ae3778d529520bd2c78d55	31-Jul-2014	Serban Constantinescu <serban.constantinescu@arm.com>	AArch64: Add inlining support for ceil(), floor(), rint(), round() This patch adds inlining support for the following Math, StrictMath methods in the ARM64 backend: * double ceil(double) * double floor(double) * double rint(double) * long round(double) * int round(float) Also some cleanup. Change-Id: I9f5a2f4065b1313649f4b0c4380b8176703c3fe1 Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
63999683329612292d534e6be09dbde9480f1250	15-Jul-2014	Serban Constantinescu <serban.constantinescu@arm.com>	Revert "Revert "Enable Load Store Elimination for ARM and ARM64"" This patch refactors the implementation of the LoadStoreElimination optimisation pass. Please note that this pass was disabled and not functional for any of the backends. The current implementation tracks aliases and handles DalvikRegs as well as Heap memory regions. It has been tested and it is known to optimise out the following: * Load - Load * Store - Load * Store - Store * Load Literals Change-Id: I3aadb12a787164146a95bc314e85fa73ad91e12b
c32447bcc8c36ee8ff265ed678c7df86936a9ebe	27-Jul-2014	Bill Buzbee <buzbee@android.com>	Revert "Enable Load Store Elimination for ARM and ARM64" On extended testing, I'm seeing a CHECK failure at utility_arm.cc:1201. This reverts commit fcc36ba2a2b8fd10e6eebd21ecb6329606443ded. Change-Id: Icae3d49cd7c8fcab09f2f989cbcb1d7e5c6d137a
fcc36ba2a2b8fd10e6eebd21ecb6329606443ded	15-Jul-2014	Serban Constantinescu <serban.constantinescu@arm.com>	Enable Load Store Elimination for ARM and ARM64 This patch refactors the implementation of the LoadStoreElimination optimisation pass. Please note that this pass was disabled and not functional for any of the backends. The current implementation tracks aliases and handles DalvikRegs as well as Heap memory regions. It has been tested and it is known to optimise out the following: * Load - Load * Store - Load * Store - Store * Load Literals Change-Id: Iefae9b696f87f833ef35c451ed4d49c5a1b6fde0
984305917bf57b3f8d92965e4715a0370cc5bcfb	28-Jul-2014	Andreas Gampe <agampe@google.com>	ART: Rework quick entrypoint code in Mir2Lir, cleanup To reduce the complexity of calling trampolines in generic code, introduce an enumeration for entrypoints. Introduce a header that lists the entrypoint enum and exposes a templatized method that translates an enum value to the corresponding thread offset value. Call helpers are rewritten to have an enum parameter instead of the thread offset. Also rewrite LoadHelper and GenConversionCall this way. It is now LoadHelper's duty to select the right thread offset size. Introduce InvokeTrampoline virtual method to Mir2Lir. This allows to further simplify the call helpers, as well as make OpThreadMem specific to X86 only (removed from Mir2Lir). Make GenInlinedCharAt virtual, move a copy to X86 backend, and simplify both copies. Remove LoadBaseIndexedDisp and OpRegMem from Mir2Lir, as they are now specific to X86 only. Remove StoreBaseIndexedDisp from Mir2Lir, as it was only ever used in the X86 backend. Remove OpTlsCmp from Mir2Lir, as it was only ever used in the X86 backend. Remove OpLea from Mir2Lir, as it was only ever defined in the X86 backend. Remove GenImmedCheck from Mir2Lir as it was neither used nor implemented. Change-Id: If0a6182288c5d57653e3979bf547840a4c47626e
bebee4fd10e5db6cb07f59bc0f73297c900ea5f0	16-Jul-2014	Andreas Gampe <agampe@google.com>	ART: Refactor GenSelect, refactor gen_common accordingly This adds a GenSelect method meant for selection of constants. The general-purpose GenInstanceof code is refactored to take advantage of this. This cleans up code and squashes a branch-over on ARM64 to a cset. Also add a slow-path for type initialization in GenInstanceof. Bug: 16241558 (cherry picked from commit 90969af6deb19b1dbe356d62fe68d8f5698d3d8f) Change-Id: Ie4494858bb8c26d386cf2e628172b81bba911ae5
0f45f22eb3c52f0ece4c56989180e79c6680d825	15-Jul-2014	Andreas Gampe <agampe@google.com>	ART: Throw StackOverflowError in native code Initialize stack-overflow errors in native code to be able to reduce the preserved area size of the stack. Includes a refactoring away from constexpr in instruction_set.h to allow for easy changing of the values. Bug: 16256184 (cherry picked from commit 7ea6f79bbddd69d5db86a8656a31aaaf64ae2582) Change-Id: I117cc8485f43da5f0a470f0f5e5b3dc3b5a06246
9ee4519afd97121f893f82d41d23164fc6c9ed34	17-Jul-2014	Serguei Katkov <serguei.i.katkov@intel.com>	x86: GenSelect utility update This is a follow-up to https://android-review.googlesource.com/#/c/101396/ to make the x86 GenSelectConst32 implementation complete. Change-Id: I69f318e18093f9a5b00f8f00f0f1c2e4ff7a9ab2 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
f9d6aede77c700118e225f8312cd888262b77862	17-Jul-2014	Vladimir Marko <vmarko@google.com>	Use vabs/fabs on arm/arm64 for intrinsic abs(). Bug: 11579369 (cherry picked from 5030d3ee8c6fe10394912ede107cbc8df63b7b16) Change-Id: I7b0596a8e7e3c87a93b225519c5aeedfe4f22e6d
7ea6f79bbddd69d5db86a8656a31aaaf64ae2582	15-Jul-2014	Andreas Gampe <agampe@google.com>	ART: Throw StackOverflowError in native code Initialize stack-overflow errors in native code to be able to reduce the preserved area size of the stack. Includes a refactoring away from constexpr in instruction_set.h to allow for easy changing of the values. Change-Id: I117cc8485f43da5f0a470f0f5e5b3dc3b5a06246
147eb41b53729ec8d5c188d1cac90964a51afb8a	11-Jul-2014	Dave Allison <dallison@google.com>	Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86"""" This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1. Bug: 16256184 Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73 Conflicts: compiler/dex/quick/arm64/target_arm64.cc compiler/image_test.cc runtime/fault_handler.cc
f12feb8e0e857f2832545b3f28d31bad5a9d3903	17-Jul-2014	Nicolas Geoffray <ngeoffray@google.com>	Stack overflow checks and NPE checks for optimizing. Change-Id: I59e97448bf29778769b79b51ee4ea43f43493d96
5030d3ee8c6fe10394912ede107cbc8df63b7b16	17-Jul-2014	Vladimir Marko <vmarko@google.com>	Use vabs/fabs on arm/arm64 for intrinsic abs(). Bug: 11579369 Change-Id: If09da85e22786faa13a2d74f62cee68ea67bd087
d85614222fa062ec809af9d65f04ab6b7dc1c248	11-Jul-2014	Fred Shih <ffred@google.com>	Revert "Revert "Revert "Revert "Add intrinsic for Reference.get()"""" Fixed TargetReg issue causing build failure for x86. This reverts commit 9e82bd3f0ce9e5f5777bea2f752ff3e251d32f9f. (cherry picked from commit 4ee7a665e7f9cd2c5ace2d6304e33f64067b209f) Change-Id: I555f4e06955711262e6b37ffbeabee9698ec695c
90969af6deb19b1dbe356d62fe68d8f5698d3d8f	16-Jul-2014	Andreas Gampe <agampe@google.com>	ART: Refactor GenSelect, refactor gen_common accordingly This adds a GenSelect method meant for selection of constants. The general-purpose GenInstanceof code is refactored to take advantage of this. This cleans up code and squashes a branch-over on ARM64 to a cset. Also add a slow-path for type initialization in GenInstanceof. Change-Id: Ie4494858bb8c26d386cf2e628172b81bba911ae5
69dfe51b684dd9d510dbcb63295fe180f998efde	11-Jul-2014	Dave Allison <dallison@google.com>	Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86"""" This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1. Bug: 16256184 Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73
d9cb8ae2ed78f957a773af61759432d7a7bf78af	09-Jul-2014	Douglas Leung <douglas@mips.com>	Fix art test failures for Mips. This patch fixes the following art test failures for Mips: 003-omnibus-opcodes 030-bad-finalizer 041-narrowing 059-finalizer-throw Change-Id: I4e0e9ff75f949c92059dd6b8d579450dc15f4467 Signed-off-by: Douglas Leung <douglas@mips.com>
4ee7a665e7f9cd2c5ace2d6304e33f64067b209f	11-Jul-2014	Fred Shih <ffred@google.com>	Revert "Revert "Revert "Revert "Add intrinsic for Reference.get()"""" Fixed TargetReg issue causing build failure for x86. This reverts commit 9e82bd3f0ce9e5f5777bea2f752ff3e251d32f9f. Change-Id: I7e6a526954467aaf68deeed999880dfe9aa5f06e
ed7a0f2fb84b200ab6ef34e30dcbba4c0cf8d435	10-Jun-2014	Matteo Franchin <matteo.franchin@arm.com>	AArch64: improve usage of TargetReg() and friends. TargetReg(arg1) does now always return a 32-bit register. We also avoid using this function directly and rather use the two-arguments overload or TargetPtrReg(). Change-Id: I746b3c29a2a2553b399b5c3e7ee3887c7e7c52c3
af263df7f643e699abf622c64447d31bacc14c34	12-Jul-2014	Andreas Gampe <agampe@google.com>	ART: Change GenPCUseDefEncoding(), turn on Load Hoisting for ARM64 This defines the PC resource mask as empty, as the PC is not accessible on ARM64. Unify code paths with x86 in LoadStoreElimination and LoadHoisting. Change-Id: Iea8b9e666f306c7a6ff52b6c5bf7e05b35346b2c
a9b870b73a155ce70c867d5b3f9758fab0b45f07	11-Jul-2014	Christopher Ferris <cferris@google.com>	Revert "Add intrinsic for Reference.get()" This reverts commit 460503b13bc894828a2d2d47d09e5534b3e91aa1. Change-Id: Ie63f43049307e02e3b90f4e034abc9ea54ca4e24
ccc60264229ac96d798528d2cb7dbbdd0deca993	05-Jul-2014	Andreas Gampe <agampe@google.com>	ART: Rework TargetReg(symbolic_reg, wide) Make the standard implementation in Mir2Lir and the specialized one in the x86 backend return a pair when wide = "true". Introduce WideKind enumeration to improve code readability. Simplify generic code based on this implementation. Change-Id: I670d45aa2572eedfdc77ac763e6486c83f8e26b4
59a42afc2b23d2e241a7e301e2cd68a94fba51e5	04-Jul-2014	Serguei Katkov <serguei.i.katkov@intel.com>	Update counting VR for promotion For 64-bit it makes sense to compute VR uses together for int and long because core reg is shared. Change-Id: Ie8676ece12c928d090da2465dfb4de4e91411920 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
e9f3e71c90094e87ff83bd5449a2fc4d65f717b2	04-Jul-2014	Mark Mendell <mark.p.mendell@intel.com>	Updates to help classes derived from X86Mir2Lir Just a couple of extra changes to help me out. These changes won't affect anyone else. Change-Id: I0e0985a4f16822d5cbfabbf81c9902d34ebdb5da Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
7fb36ded9cd5b1d254b63b3091f35c1e6471b90e	10-Jul-2014	Dave Allison <dallison@google.com>	Revert "Revert "Add implicit null and stack checks for x86"" Fixes x86_64 cross compile issue. Removes command line options and property to set implicit checks - this is hard coded now. This reverts commit 3d14eb620716e92c21c4d2c2d11a95be53319791. Change-Id: I5404473b5aaf1a9c68b7181f5952cb174d93a90d
d4415e8bd04c4a9367744ff0149597b4f37a0e0a	11-Jul-2014	Christopher Ferris <cferris@google.com>	Revert "Revert "Add intrinsic for Reference.get()"" This reverts commit a9b870b73a155ce70c867d5b3f9758fab0b45f07. Change-Id: Ic2a9b47f2b911bef4b764d10bc33cf000e4b4211
9e82bd3f0ce9e5f5777bea2f752ff3e251d32f9f	11-Jul-2014	Sebastien Hertz <shertz@google.com>	Revert "Revert "Revert "Add intrinsic for Reference.get()""" This reverts commit d4415e8bd04c4a9367744ff0149597b4f37a0e0a. Change-Id: I34553ccbdcfea35c7742d21be2a74dc7085ab2a0
0025a86411145eb7cd4971f9234fc21c7b4aced1	11-Jul-2014	Nicolas Geoffray <ngeoffray@google.com>	Revert "Revert "Revert "Add implicit null and stack checks for x86""" Broke the build. This reverts commit 7fb36ded9cd5b1d254b63b3091f35c1e6471b90e. Change-Id: I9df0e7446ff0913a0e1276a558b2ccf6c8f4c949
460503b13bc894828a2d2d47d09e5534b3e91aa1	18-Jun-2014	Fred Shih <ffred@google.com>	Add intrinsic for Reference.get() Added an intrinsic function for Reference.get(). Return immediately without going through JNI if the slow path is not currently in use. Otherwise, branch off to the existing JNI function. Approximately 47x speedup for cases where slow path is not enabled. Change-Id: I13ad65a356fe4e104d8d83980694dc2740d7d039
34e826ccc80dc1cf7c4c045de6b7f8360d504ccf	29-May-2014	Dave Allison <dallison@google.com>	Add implicit null and stack checks for x86 This adds compiler and runtime changes for x86 implicit checks. 32 bit only. Both host and target are supported. By default, on the host, the implicit checks are null pointer and stack overflow. Suspend is implemented but not switched on. Change-Id: I88a609e98d6bf32f283eaa4e6ec8bbf8dc1df78a
3d14eb620716e92c21c4d2c2d11a95be53319791	10-Jul-2014	Dave Allison <dallison@google.com>	Revert "Add implicit null and stack checks for x86" It breaks cross compilation with x86_64. This reverts commit 34e826ccc80dc1cf7c4c045de6b7f8360d504ccf. Change-Id: I34ba07821fc0a022fda33a7ae21850957bbec5e7
70c4f06f9965cdb9319a2c85f65acda20086d765	25-Jun-2014	DaniilSokolov <daniil.y.sokolov@intel.com>	ART: Intrinsic implementation for java.lang.System.arraycopy. Implements intrinsic for java.lang.System.arraycopy(char[], int, char[], int, int) - this method is internal to android class libraries and used in such classes as StringBuffer and StringBuilder. It is not possible to call it from application code. The intrinsic for this method is implemented as inline method (assembly code is generated manually). The intrinsic is x86 32 bit only. Change-Id: Id1b1e0a20d5f6d5f5ebfe1fdc2447b6d8a515432 Signed-off-by: Daniil Sokolov <daniil.y.sokolov@intel.com>
a77ee5103532abb197f492c14a9e6fb437054e2a	02-Jul-2014	Chao-ying Fu <chao-ying.fu@intel.com>	x86_64: TargetReg update for x86 Also includes changes in common code. Elimination of use of TargetReg with one parameter and direct access to special target registers. Change-Id: Ied2c1f87d4d1e4345248afe74bca40487a46a371 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com> Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
b5860fb459f1ed71f39d8a87b45bee6727d79fe8	22-Jun-2014	buzbee <buzbee@google.com>	Register promotion support for 64-bit targets Not sufficiently tested for 64-bit targets, but should be fairly close. A significant amount of refactoring could still be done (in later CLs). With this change we are not making any changes to the vmap scheme. As a result, it is a requirement that if a vreg is promoted to both a 32-bit view and the low half of a 64-bit view it must share the same physical register. We may change this restriction later on to allow for more flexibility for 32-bit Arm. For example, if v4, v5, v4/v5 and v5/v6 are all hot enough to promote, we'd end up with something like: v4 (as an int) -> r10 v4/v5 (as a long) -> r10 v5 (as an int) -> r11 v5/v6 (as a long) -> r11 Fix a couple of ARM64 bugs on the way... Change-Id: I6a152b9c164d9f1a053622266e165428045362f3
255e014542b2180620230e4d9d6000ae06846bbd	04-Jul-2014	Matteo Franchin <matteo.franchin@arm.com>	Aarch64: fix references handling in Load*Indexed. Fix the way we handle references in Load/StoreBaseIndexed and friends. We assume references are 64-bit RegStorage entities, with the difference that they are loaded as 32-bit values. Change-Id: I7fe987ef9e97e9a5042b85378b33d1e85710d8b5
23abec955e2e733999a1e2c30e4e384e46e5dde4	02-Jul-2014	Serban Constantinescu <serban.constantinescu@arm.com>	AArch64: Add few more inline functions This patch adds inlining support for the following functions: * Math.max/min(long, long) * Math.max/min(float, float) * Math.max/min(double, double) * Integer.reverse(int) * Long.reverse(long) Change-Id: Ia2b1619fd052358b3a0d23e5fcbfdb823d2029b9 Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
dd64450b37776f68b9bfc47f8d9a88bc72c95727	01-Jul-2014	Elena Sayapina <elena.v.sayapina@intel.com>	x86_64: Unify 64-bit check in x86 compiler Update x86-specific Gen64Bit() check with the CompilationUnit target64 field which is set using unified Is64BitInstructionSet(InstructionSet) check. Change-Id: Ic00ac863ed19e4543d7ea878d6c6c76d0bd85ce8 Signed-off-by: Elena Sayapina <elena.v.sayapina@intel.com>
4b537a851b686402513a7c4a4e60f5457bb8d7c1	01-Jul-2014	Andreas Gampe <agampe@google.com>	ART: Quick compiler: More size checks, add TargetReg variants Add variants for TargetReg for requesting specific register usage, e.g., wide and ref. More register size checks. With code adapted from https://android-review.googlesource.com/#/c/98605/. Change-Id: I852d3be509d4dcd242c7283da702a2a76357278d
de68676b24f61a55adc0b22fe828f036a5925c41	24-Jun-2014	Andreas Gampe <agampe@google.com>	Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter" This reverts commit 2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d. Breaks the build. Change-Id: I9faad4e9a83b32f5f38b2ef95d6f9a33345efa33
3c12c512faf6837844d5465b23b9410889e5eb11	24-Jun-2014	Andreas Gampe <agampe@google.com>	Revert "Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter"" This reverts commit de68676b24f61a55adc0b22fe828f036a5925c41. Fixes an API comment, and differentiates between inserting and appending. Change-Id: I0e9a21bb1d25766e3cbd802d8b48633ae251a6bf
2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d	23-Jun-2014	Andreas Gampe <agampe@google.com>	ART: Split out more cases of Load/StoreRef, volatile as parameter Splits out more cases of ref registers being loaded or stored. For code clarity, adds volatile as a flag parameter instead of a separate method. On ARM64, continue cleanup. Add flags to print/fatal on size mismatches. Change-Id: I30ed88433a6b4ff5399aefffe44c14a5e6f4ca4e
c61b3c984c509d5f7c8eb71b853c81a34b5c28ef	18-Jun-2014	Matteo Franchin <matteo.franchin@arm.com>	AArch64: implement easy division and remainder. This implements easy division and remainder for integer only (32-bit). The optimisation applies to div/rem by powers of 2 and to div by small literals (between 3 and 15). Change-Id: I71be7c4de5d2e2e738b88984f13efb08f4388a19
7cd26f355ba83be75b72ed628ed5ee84a3245c4f	19-Jun-2014	Andreas Gampe <agampe@google.com>	ART: Target-dependent stack overflow, less check elision Refactor the separate stack overflow reserved sizes from thread.h into instruction_set.h and make sure they're used in the compiler. Refactor the decision on when to elide stack overflow checks: especially with large interpreter stack frames, it is not a good idea to elide checks when the frame size is even close to the reserved size. Currently enforce checks when the frame size is >= 2KB, but make sure that frame sizes 1KB and below will elide the checks (number from experience). Bug: 15728765 Change-Id: I016bfd3d8218170cbccbd123ed5e2203db167c06
7071c8d5885175a746723a3b38a347855965be08	05-Mar-2014	Yixin Shou <yixin.shou@intel.com>	Add x86 inlined abs method for float/double Add the optimized implementation of inlined abs method for float/double for X86 side. Change-Id: I2f367542f321d88a976129f9f7156fd3c2965c8a Signed-off-by: Yixin Shou <yixin.shou@intel.com>
4c115b85cc48f4dfc8fc2b0484ddfeb29f02d658	17-Jun-2014	Vladimir Marko <vmarko@google.com>	Revert "Add x86 inlined abs method for float/double" This reverts commit e88b89ad1d1a583daf205c7a387ba13f549f95f1. Change-Id: I2ba21b7442ba3696482d45001e6bd32e8baf9d1f
e88b89ad1d1a583daf205c7a387ba13f549f95f1	05-Mar-2014	Yixin Shou <yixin.shou@intel.com>	Add x86 inlined abs method for float/double Add the optimized implementation of inlined abs method for float/double for X86 side. Change-Id: I4e095644a90524354040174954c1e127c7bb4ee2 Signed-off-by: Yixin Shou <yixin.shou@intel.com>
5aa6e04061ced68cca8111af1e9c19781b8a9c5d	14-Jun-2014	Ian Rogers <irogers@google.com>	Tidy x86 assembler. Use helper functions to compute when the kind has a SIB, a ModRM and RegReg form. Change-Id: I86a5cb944eec62451c63281265e6974cd7a08e07
169489b4f4be8c5dd880ba6f152948324d22ff79	11-Jun-2014	Serban Constantinescu <serban.constantinescu@arm.com>	AArch64: Add support for inlined methods This patch adds support for Arm64 inlined methods. Change-Id: Ic6aeed6d2d32f65cd1e63cf482f83cdcf958798a
8dea81ca9c0201ceaa88086b927a5838a06a3e69	06-Jun-2014	Vladimir Marko <vmarko@google.com>	Rewrite use/def masks to support 128 bits. Reduce LIR memory usage by holding masks by pointers in the LIR rather than directly and using pre-defined const masks for the common cases, allocating very few on the arena. Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
58994cdb00b323339bd83828eddc53976048006f	16-May-2014	Dmitry Petrochenko <dmitry.petrochenko@intel.com>	x86_64: Hard Float ABI support in QCG This patch shows our efforts on resolving the ART limitations: - passing "float"/"double" arguments via FPR - passing "long" arguments via single GPR, not pair - passing more than 3 arguments via GPR. Work done: - Extended SpecialTargetRegister enum with kARG4, kARG5, fARG4..fARG7. - Created initial LoadArgRegs/GenDalvikX/FlushIns version in X86Mir2Lir. - Unlimited number of long/double/float arguments support - Refactored (v2) Change-Id: I5deadd320b4341d5b2f50ba6fa4a98031abc3902 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com> Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com> Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com> Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
089142cf1d0c028b5a7c703baf0b97f4a4ada3f7	05-Jun-2014	Vladimir Marko <vmarko@google.com>	Avoid register pool allocations on the heap. Create a helper template class ArrayRef and use it instead of std::vector<> for register pools in target_<arch>.cc to avoid these heap allocations during program startup. Change-Id: I4ab0205af9c1d28a239c0a105fcdc60ba800a70a
a0cd2d701f29e0bc6275f1b13c0edfd4ec391879	01-Jun-2014	buzbee <buzbee@google.com>	Quick compiler: reference cleanup For 32-bit targets, object references are 32 bits wide both in Dalvik virtual registers and in core physical registers. Because of this, object references and non-floating point values were both handled as if they had the same register class (kCoreReg). However, for 64-bit systems, references are 32 bits in Dalvik vregs, but 64 bits in physical registers. Although the same underlying physical core registers will still be used for object reference and non-float values, different register class views will be used to represent them. For example, an object reference in arm64 might be held in x3 at some point, while the same underlying physical register, w3, would be used to hold a 32-bit int. This CL breaks apart the handling of object reference and non-float values to allow the proper register class (or register view) to be used. A new register class, kRefReg, is introduced which will map to a 32-bit core register on 32-bit targets, and 64-bit core registers on 64-bit targets. From this point on, object references should be allocated registers in the kRefReg class rather than kCoreReg. Change-Id: I6166827daa8a0ea3af326940d56a6a14874f5810
ffddfdf6fec0b9d98a692e27242eecb15af5ead2	03-Jun-2014	Tim Murray <timmurray@google.com>	DO NOT MERGE Merge ART from AOSP to lmp-preview-dev. Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
0955f7e470fb733aef07096536e9fba7c99250aa	23-May-2014	Matteo Franchin <matteo.franchin@arm.com>	AArch64: fixing some assertions. Fixing some assertions while attempting to get libartd.so to work. Fixing also the shift logic in LoadBaseIndexed() and StoreBaseIndexed(). This commit only fixes a part of the assertion issues. Change-Id: I473194d4260dd59a8ee6d73114429728c977ee0e
85089dd28a39dd20f42ac258398b2a08668f9ef1	26-May-2014	buzbee <buzbee@google.com>	Quick compiler: generalize NarrowRegLoc() Some of the RegStorage utilities (DoubleToLowSingle(), DoubleToHighSingle(), etc.) worked only for targets which treat double precision registers as a pair of aliased single precision registers. This CL eliminates those utilities, and replaces them with a new RegisterInfo utility that will search an aliased register set and return the member matching the required storage configuration (if it exists). Change-Id: Iff5de10f467d20a56e1a89df9fbf30d1cf63c240
642fe34958ba7fafa81341823241616edde0380c	24-May-2014	buzbee <buzbee@google.com>	Quick compiler: fix register clobbering. Ensure all aliased children of a register set are clobbered when any member is clobbered. Additionally, use a clobbering mask to avoid clobbering non-overlapping siblings. Change-Id: Ic0d88a30f3e5b7a359396f6541d602739fa3124a
ed65c5e982705defdb597d94d1aa3f2997239c9b	22-May-2014	Serban Constantinescu <serban.constantinescu@arm.com>	AArch64: Enable LONG_* and INT_* opcodes. This patch fixes some of the issues with LONG and INT opcodes. The patch has been tested and passes all the dalvik tests except for 018 and 107. Change-Id: Idd1923ed935ee8236ab0c7e5fa969eaefeea8708 Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
a51a0b0300268b605e3ad71b0e87ff394032c5e7	21-May-2014	Vladimir Marko <vmarko@google.com>	Method inlining across dex files in boot image. Fix LoadCodeAddress() and LoadMethodAddress() to use the dex file in addition to the method index to uniquely identify the literal. With that fix in place, when we have both the direct code and the direct method, we can safely pass the actual target method id instead of the method id from the same dex file in the method lowering info. This was already done for calls from apps into boot image (and thus there was a bug with a tiny risk of the wrong literal being used) and now we also do that for calls within the boot image. The latter allows the inlining pass to inline many more methods than before in the boot image. Bug: 15021903 Change-Id: Ic765ce9809b43ef07e7db32b8e3fbc9acb09147f
b01bf15d18f9b08d77e7a3c6e2897af0e02bf8ca	14-May-2014	buzbee <buzbee@google.com>	64-bit temp register support. Add a 64-bit temp register allocation path. The recent physical register handling rework supports multiple views of the same physical register (or, such as for Arm's float/double regs, different parts of the same physical register). This CL adds a 64-bit core register view for 64-bit targets. In short, each core register will have a 64-bit name, and a 32-bit name. The different views will be kept in separate register pools, but aliasing will be tracked. The core temp register allocation routines will be largely identical - except for 32-bit targets, which will continue to use pairs of 32-bit core registers for holding long values. Change-Id: I8f118e845eac7903ad8b6dcec1952f185023c053
e87f9b5185379c8cf8392d65a63e7bf7e51b97e7	30-Apr-2014	Mark Mendell <mark.p.mendell@intel.com>	Allow X86 QBE to be extended Enhancements and updates to allow X86Mir2LIR Backend to be subclassed for experimentation. Add virtual in a whole bunch of places, and make some other changes to get this to work. Change-Id: I0980a19bc5d5725f91660f98c95f1f51c17ee9b6 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
082833c8d577db0b2bebc100602f31e4e971613e	18-May-2014	buzbee <buzbee@google.com>	Quick compiler, out of registers fix It turns out that the register pool sanity checker was not working as expected, leaving some inconsistencies unreported. This could result in "out of registers" failures, as well as other more subtle problems. This CL fixes the sanity checker, adds a lot more checks and cleans up the previously undetected episodes of insanity. Cherry-pick of internal change 468162 Change-Id: Id2da97e99105a4c272c5fd256205a94b904ecea8
05d3aeb33683b16837741f9348d6fba9a8432068	18-May-2014	buzbee <buzbee@google.com>	Quick compiler, out of registers fix Fixes b/15024623 It turns out that the register pool sanity checker was not working as expected, leaving some inconsistencies unreported. This CL fixes the sanity checker, adds a lot more checks and cleans up the previously undetected episodes of insanity. Change-Id: I4d67db864ca5926a1975db251e7e631b65a86275
d65c51a556e6649db4e18bd083c8fec37607a442	29-Apr-2014	Mark Mendell <mark.p.mendell@intel.com>	ART: Add support for constant vector literals Add in some vector instructions. Implement the ConstVector instruction, which takes 4 words of data and loads it into an XMM register. Initially, only the ConstVector MIR opcode is implemented. Others will be added after this one goes in. Change-Id: I5c79bc8b7de9030ef1c213fc8b227debc47f6337 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
b14329f90f725af0f67c45dfcb94933a426d63ce	15-May-2014	Andreas Gampe <agampe@google.com>	ART: Fix MonitorExit code on ARM We do not emit barriers on non-SMP systems. But on ARM, we have places that need to conditionally execute, which is done through an IT instruction. The guide of said instruction thus changes between SMP and non-SMP systems. To cleanly approach this, change the API so that GenMemBarrier returns whether it generated an instruction. ARM will have to query the result and update any dependent IT. Throw a build system error if TARGET_CPU_SMP is not set. Fix runtime/Android.mk to work with new multilib host. Bug: 14989275 Change-Id: I9e611b770e8a1cd4ca19367d7dae0573ec08dc61
c93ac8b73b5772e43b6dd1cc9e1deee79ca68849	13-May-2014	Vladimir Marko <vmarko@google.com>	Fix special getter/setter to use RegClassForFieldLoadStore(). This ensures correct register class is used for volatile load/store in these getters and setters. Bug: 14112919 Change-Id: Ib7aa83d441fb007e97f9acc2a778bc20ffed837c
9ee801f5308aa3c62ae3bedae2658612762ffb91	12-May-2014	Dmitry Petrochenko <dmitry.petrochenko@intel.com>	Add x86_64 code generation support Utilizes r0..r7 in register allocator, implements spill/unspill core regs as well as operations with stack pointer. Change-Id: I973d5a1acb9aa735f6832df3d440185d9e896c67 Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
2f244e9faccfcca68af3c5484c397a01a1c3a342	08-May-2014	Andreas Gampe <agampe@google.com>	ART: Add more ThreadOffset in Mir2Lir and backends This duplicates all methods with ThreadOffset parameters, so that both ThreadOffset<4> and ThreadOffset<8> can be handled. Dynamic checks against the compilation unit's instruction set determine which pointer size to use and therefore which methods to call. Methods with unsupported pointer sizes should fatally fail, as this indicates an issue during method selection. Change-Id: Ifdb445b3732d3dc5e6a220db57374a55e91e1bf6
ba57451494946a128703e1cbd8bf5969ee8dc598	13-May-2014	buzbee <buzbee@google.com>	Quick compiler: fix compile-time perf regression The recent changes to the temp register liveness tracking introduced a measurable compile-time performance regression. This CL cleans it up. Change-Id: Id698b93e957f0ecab7ddfab94727f85e49cf10cf
0dc242d6fc1254e6ca1c31e08e612bbf45644b17	12-May-2014	Vladimir Marko <vmarko@google.com>	Avoid unnecessary copy/load in EvalLoc() and LoadValue(). EvalLoc()/EvalLocWide() are used to prepare a register where a value is subsequently stored, so they shouldn't copy the old value to the new register for register class mismatch. The only exception where we actually need a copy is LoadValue()/LoadValueWide(), so we inline the old code that makes the copy there. We also avoid loading inexpensive constants when the value is already in the register. Change-Id: I07519e9d4d9b3f7272233d196435f3035e4a3ca9
30adc7383a74eb3cb6db3bf42cea3a5595055ce1	10-May-2014	buzbee <buzbee@google.com>	Quick compiler: Fix liveness tracking Rework temp register liveness tracking to play nicely with aliased physical registers, and re-enable liveness tracking optimization. Add a pair of x86 utility routines that act like UpdateLoc(), but only show in-register live temps if they are of the expected register class. Change-Id: I92779e0da2554689103e7488025be281f1a58989
674744e635ddbdfb311fbd25b5a27356560d30c3	24-Apr-2014	Vladimir Marko <vmarko@google.com>	Use atomic load/store for volatile IGET/IPUT/SGET/SPUT. Bug: 14112919 Change-Id: I79316f438dd3adea9b2653ffc968af83671ad282
e45fb9e7976c8462b94a58ad60b006b0eacec49f	06-May-2014	Matteo Franchin <matteo.franchin@arm.com>	AArch64: Change arm64 backend to produce A64 code. The arm backend clone is changed to produce A64 code. At the moment this backend can only compile simple methods (both leaf and non-leaf). Most of the work on the assembler (assembler_arm64.cc) has been done. Some work on the LIR generation layer (functions such as OpRegRegImm & friends) is still necessary. The register allocator still needs to be adapted to the A64 instruction set (it is mostly unchanged from the arm backend). Offsets for helpers in gen_invoke.cc still need to be changed to work on 64-bit. Change-Id: I388f99eeb832857981c7d9d5cb5b71af64a4b921
3bf7c60a86d49bf8c05c5d2ac5ca8e9f80bd9824	07-May-2014	Vladimir Marko <vmarko@google.com>	Cleanup ARM load/store wide and remove unused param s_reg. Use a single LDRD/VLDR instruction for wide load/store on ARM, adjust the base pointer if needed. Remove unused parameter s_reg from LoadBaseDisp(), LoadBaseIndexedDisp() and StoreBaseIndexedDisp() on all architectures. Change-Id: I25a9a42d523a68addbc11abe44ddc55a4401df98
455759b5702b9435b91d1b4dada22c4cce7cae3c	06-May-2014	Vladimir Marko <vmarko@google.com>	Remove LoadBaseDispWide and StoreBaseDispWide. Just pass k64 or kDouble to non-wide versions. Change-Id: I000619c3b78d3a71db42edc747c8a0ba1ee229be
091cc408e9dc87e60fb64c61e186bea568fc3d3a	31-Mar-2014	buzbee <buzbee@google.com>	Quick compiler: allocate doubles as doubles Significant refactoring of register handling to unify usage across all targets & 32/64 backends. Reworked RegStorage encoding to allow expanded use of x86 xmm registers; removed vector registers as a separate register type. Reworked RegisterInfo to describe aliased physical registers. Eliminated quite a bit of target-specific code and generalized common code. Use of RegStorage instead of int for registers now propagated down to the NewLIRx() level. In future CLs, the NewLIRx() routines will be replaced with versions that are explicit about what kind of operand they expect (RegStorage, displacement, etc.). The goal is to eventually use RegStorage all the way to the assembly phase. TBD: MIPS needs verification. TBD: Re-enable liveness tracking. Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
6ffcfa04ebb2660e238742a6000f5ccebdd5df15	25-Apr-2014	Mingyao Yang <mingyao@google.com>	Rewrite suspend test check with LIRSlowPath. Change-Id: I2dc17d079655586bfc588349c7a04afc2c6879af
7a11ab09f93f54b1c07c0bf38dd65ed322e86bc6	29-Apr-2014	buzbee <buzbee@google.com>	Quick compiler: debugging assists A few minor assists to ease A/B debugging in the Quick compiler: 1. To save time, the assemblers for some targets only update the object code offsets on instructions involved with pc-relative fixups. We add code to fix up all offsets when doing a verbose codegen listing. 2. Temp registers are normally allocated in a round-robin fashion. When disabling liveness tracking, we now reset the round-robin pool to 0 on each instruction boundary. This makes it easier to spot real codegen differences. 3. Self-register copies were previously emitted, but marked as nops. Minor change to avoid generating them in the first place and reduce clutter. Change-Id: I7954bba3b9f16ee690d663be510eac7034c93723
695d13a82d6dd801aaa57a22a9d4b3f6db0d0fdb	19-Apr-2014	buzbee <buzbee@google.com>	Update load/store utilities for 64-bit backends This CL replaces the typical use of LoadWord/StoreWord utilities (which, in practice, were 32-bit load/store) in favor of a new set that make the size explicit. We now have: LoadWordDisp/StoreWordDisp: 32 or 64 depending on target. Load or store the natural word size. Expect this to be used infrequently - generally when we know we're dealing with a native pointer or flushed register not holding a Dalvik value (Dalvik values will flush to home location sizes based on Dalvik, rather than the target). Load32Disp/Store32Disp: Load or store 32 bits, regardless of target. Load64Disp/Store64Disp: Load or store 64 bits, regardless of target. LoadRefDisp: Load a 32-bit compressed reference, and expand it to the natural word size in the target register. StoreRefDisp: Compress a reference held in a register of the natural word size and store it as a 32-bit compressed reference. Change-Id: I50fcbc8684476abd9527777ee7c152c61ba41c6f
3a74d15ccc9a902874473ac9632e568b19b91b1c	22-Apr-2014	Mingyao Yang <mingyao@google.com>	Delete throw launchpads. Bug: 13170824 Change-Id: I9d5834f5a66f5eb00f2ac80774e8c27dea99949e
80365d9bb947edef0eae0bfe62b9f7a239416e6b	18-Apr-2014	Mingyao Yang <mingyao@google.com>	Revert "Revert "Use LIRSlowPath for throwing ArrayOutOfBoundsException."" This adds back using LIRSlowPath for ArrayIndexOutOfBoundsException. And fix the host test crash. Change-Id: Idbb602f4bb2c5ce59233feb480a0ff1b216e4887
7fff544c38f0dec3a213236bb785c3ca13d21a0f	18-Apr-2014	Brian Carlstrom <bdc@google.com>	Revert "Use LIRSlowPath for throwing ArrayOutOfBoundsException." This reverts commit 9d46314a309aff327f9913789b5f61200c162609.
9d46314a309aff327f9913789b5f61200c162609	18-Apr-2014	Mingyao Yang <mingyao@google.com>	Use LIRSlowPath for throwing ArrayOutOfBoundsException. Get rid of launchpads for throwing ArrayOutOfBoundsException and use LIRSlowPath instead. Bug: 13170824 Change-Id: I0e27f7a261a6a7fb5c0645e6113a957e098f699e
e643a179cf5585ba6bafdd4fa51730d9f50c06f6	08-Apr-2014	Mingyao Yang <mingyao@google.com>	Use LIRSlowPath for throwing NPE. Get rid of launchpads for throwing NPE and use LIRSlowPath instead. Also clean up some code of using LIRSlowPath for checking div by zero. Bug: 13170824 Change-Id: I0c20a49c39feff3eb1f147755e557d9bc0ff15bb
d6ed642458c8820e1beca72f3d7b5f0be4a4b64b	10-Apr-2014	Dave Allison <dallison@google.com>	Revert "Revert "Revert "Use trampolines for calls to helpers""" This reverts commit f9487c039efb4112616d438593a2ab02792e0304. Change-Id: Id48a4aae4ecce73db468587967968a3f7618b700
f9487c039efb4112616d438593a2ab02792e0304	09-Apr-2014	Dave Allison <dallison@google.com>	Revert "Revert "Use trampolines for calls to helpers"" This reverts commit 081f73e888b3c246cf7635db37b7f1105cf1a2ff. Change-Id: Ibd777f8ce73cf8ed6c4cb81d50bf6437ac28cb61 Conflicts: compiler/dex/quick/mir_to_lir.h
4289456fa265b833434c2a8eee9e7a16da31c524	07-Apr-2014	Mingyao Yang <mingyao@google.com>	Use LIRSlowPath for throwing div by zero exception. Get rid of launchpads for throwing div by zero exception and use LIRSlowPath instead. Add a CallRuntimeHelper that takes no argument for the runtime function. Bug: 13170824 Change-Id: I7e0563e736c6f92bd63e3fbdfe3a777ad333e338
081f73e888b3c246cf7635db37b7f1105cf1a2ff	07-Apr-2014	Dave Allison <dallison@google.com>	Revert "Use trampolines for calls to helpers" This reverts commit 754ddad084ccb610d0cf486f6131bdc69bae5bc6. Change-Id: Icd979adee1d8d781b40a5e75daf3719444cb72e8
754ddad084ccb610d0cf486f6131bdc69bae5bc6	19-Feb-2014	Dave Allison <dallison@google.com>	Use trampolines for calls to helpers This is an ARM specific optimization to the compiler that uses trampoline islands to make calls to runtime helper functions. The intention is to reduce the size of the generated code (by 2 bytes per call) without affecting performance. By default this is on when generating an OAT file. It is off when compiling to memory. To switch this off in dex2oat, use the command line option: --no-helper-trampolines Enhances disassembler to print the trampoline entry on the BL instruction like this: 0xb6a850c0: f7ffff9e bl -196 (0xb6a85000) ; pTestSuspend Bug: 12607709 Change-Id: I9202bdb7cf21252ad807bd48701f1f6ce8e3d0fe
3da67a558f1fd3d8a157d8044d521753f3f99ac8	03-Apr-2014	Dave Allison <dallison@google.com>	Add OpEndIT() for marking the end of OpIT blocks In ARM we need to prevent code motion to the inside of an IT block. This was done using a GenBarrier() to mark the end, but it wasn't obvious that this is what was happening. This CL adds an explicit OpEndIT() that takes the LIR of the OpIT for future checks. Bug: 13751744 Change-Id: If41d2adea1f43f11ebb3b72906bd308252ce3d01
dd7624d2b9e599d57762d12031b10b89defc9807	15-Mar-2014	Ian Rogers <irogers@google.com>	Allow mixing of thread offsets between 32 and 64bit architectures. Begin a more full implementation x86-64 REX prefixes. Doesn't implement 64bit thread offset support for the JNI compiler. Change-Id: If9af2f08a1833c21ddb4b4077f9b03add1a05147
f943914730db8ad2ff03d49a2cacd31885d08fd7	27-Mar-2014	Dave Allison <dallison@google.com>	Implement implicit stack overflow checks This also fixes some failing run tests due to missing null pointer markers. The implementation of the implicit stack overflow checks introduces the ability to have a gap in the stack that is skipped during stack walk backs. This gap is protected against read/write and is used to trigger a SIGSEGV at function entry if the stack will overflow. Change-Id: I0c3e214c8b87dc250cf886472c6d327b5d58653e
306f017dd883c0bf806d239d97e0bca3194afbd7	07-Jan-2014	Vladimir Marko <vmarko@google.com>	Faster AssembleLIR for ARM. This also reduces sizeof(LIR) by 4 bytes (32-bit builds). Change-Id: I0cb81f9bf098dfc50050d5bc705c171af26464ce
e2143c0a4af68c08e811885eb2f3ea5bfdb21ab6	28-Mar-2014	Ian Rogers <irogers@google.com>	Revert "Revert "Optimize easy multiply and easy div remainder."" This reverts commit 3654a6f50a948ead89627f398aaf86a2c2db0088. Remove the part of the change that confused !is_div with being multiply rather than implying remainder. Change-Id: I202610069c69351259a320e8852543cbed4c3b3e
9da5c1013215176f2a4dbe7a804be899e12d5f68	28-Mar-2014	buzbee <buzbee@google.com>	Quick compiler, MIPS resource cleanup MIPS architecture includes internal registers HI and LO. Similar to condition codes in other architectures, these internal resources must be accounted for during instruction scheduling. Previously, the Quick backend for MIPS dealt with them by defining rHI and rLO pseudo registers - treating them as actual registers for def/use masks. This CL changes the handling of these resources to be in line with how condition codes are used elsewhere - leaving register definitions to be used for registers. Change-Id: Idcd77f3107b0c9b081ad05b1aab663fb9f41492d
3441512d61ac192c1bf0b9b1eb696d5a8a8d677e	28-Mar-2014	Brian Carlstrom <bdc@google.com>	Revert "Optimize easy multiply and easy div remainder." This reverts commit 08df4b3da75366e5db37e696eaa7e855cba01deb. (cherry picked from commit 3654a6f50a948ead89627f398aaf86a2c2db0088) Change-Id: If8befd7c7135b9dfe3d3e9111768aba89aaa0863
3654a6f50a948ead89627f398aaf86a2c2db0088	28-Mar-2014	Brian Carlstrom <bdc@google.com>	Revert "Optimize easy multiply and easy div remainder." This reverts commit 08df4b3da75366e5db37e696eaa7e855cba01deb.
262b299abf658c16f61dad2240cfaf3deafe4423	27-Mar-2014	buzbee <buzbee@google.com>	Fix x86 master build failure. Replace bogus DCHECKs with logic matching pre-cleanup code. Register pairs are considered temp, promoted, dirty or live if either register of the pair meets criteria. Change-Id: If2df891fdd1e3351d4cbe72aaf2a2ac5b34b2110
14a46d820b04b848063f7c32ecd2cf82dd90cb1d	27-Mar-2014	buzbee <buzbee@google.com>	Fix x86 master build failure. Replace bogus DCHECKs with logic matching pre-cleanup code. Register pairs are considered temp, promoted, dirty or live if either register of the pair meets criteria. Change-Id: If2df891fdd1e3351d4cbe72aaf2a2ac5b34b2110
08df4b3da75366e5db37e696eaa7e855cba01deb	25-Mar-2014	Zheng Xu <zheng.xu@arm.com>	Optimize easy multiply and easy div remainder. Update OpRegRegShift and OpRegRegRegShift to use RegStorage parameters. Add special cases for 0 and 1. Add more easy multiply special cases for Arm. Reuse easy multiply in SmallLiteralDivRem() to support remainder cases. Change-Id: Icd76a993d3ac8d4988e9653c19eab4efca14fad0
2700f7e1edbcd2518f4978e4cd0e05a4149f91b6	07-Mar-2014	buzbee <buzbee@google.com>	Continuing register cleanup Ready for review. Continue the process of using RegStorage rather than ints to hold register value in the top layers of codegen. Given the huge number of changes in this CL, I've attempted to minimize the number of actual logic changes. With this CL, the use of ints for registers has largely been eliminated except in the lowest utility levels. "Wide" utility routines have been updated to take a single RegStorage rather than a pair of ints representing low and high registers. Upcoming CLs will be smaller and more targeted. My expectations: o Allocate float double registers as a single double rather than a pair of float single registers. o Refactor to push code which assumes long and double Dalvik values are held in a pair of register to the target dependent layer. o Clean-up of the xxx_mir.h files to reduce the amount of #defines for registers. May also do a register renumbering to bring all of our targets' register naming more consistent. Possibly introduce a target-independent float/non-float test at the RegStorage level. Change-Id: I646de7392bdec94595dd2c6f76e0f1c4331096ff
99ad7230ccaace93bf323dea9790f35fe991a4a2	26-Feb-2014	Razvan A Lupusoru <razvan.a.lupusoru@intel.com>	Relaxed memory barriers for x86 X86 provides stronger memory guarantees and thus the memory barriers can be optimized. This patch ensures that all memory barriers for x86 are treated as scheduling barriers. And in cases where a barrier is needed (StoreLoad case), an mfence is used. Change-Id: I13d02bf3f152083ba9f358052aedb583b0d48640 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
60d7a65f7fb60f502160a2e479e86014c7787553	14-Mar-2014	Brian Carlstrom <bdc@google.com>	Fix stack overflow for mutual recursion. There was an error where we would have a pc that was in the method which generated the stack overflow. This didn't work however because the stack overflow check was before we stored the method in the stack. The result was that the stack overflow handler had a PC which wasn't necessarily in the method at the top of the stack. This is now fixed by always restoring the link register before branching to the throw entrypoint. Slight code size regression on ARM/Mips (unmeasured). Regression on ARM is 4 bytes of code per stack overflow check. Some of this regression is mitigated by having one less GC safepoint. Also adds test case for StackOverflowError issue (from bdc). Tests passing: ARM, X86, Mips Phone booting: ARM Bug: https://code.google.com/p/android/issues/detail?id=66411 Bug: 12967914 Change-Id: I96fe667799458b58d1f86671e051968f7be78d5d (cherry-picked from c0f96d03a1855fda7d94332331b94860404874dd)
c0f96d03a1855fda7d94332331b94860404874dd	14-Mar-2014	Brian Carlstrom <bdc@google.com>	Fix stack overflow for mutual recursion. There was an error where we would have a pc that was in the method which generated the stack overflow. This didn't work however because the stack overflow check was before we stored the method in the stack. The result was that the stack overflow handler had a PC which wasn't necessarily in the method at the top of the stack. This is now fixed by always restoring the link register before branching to the throw entrypoint. Slight code size regression on ARM/Mips (unmeasured). Regression on ARM is 4 bytes of code per stack overflow check. Some of this regression is mitigated by having one less GC safepoint. Also adds test case for StackOverflowError issue (from bdc). Tests passing: ARM, X86, Mips Phone booting: ARM Bug: https://code.google.com/p/android/issues/detail?id=66411 Bug: 12967914 Change-Id: I96fe667799458b58d1f86671e051968f7be78d5d
e90501da0222717d75c126ebf89569db3976927e	12-Mar-2014	Serguei Katkov <serguei.i.katkov@intel.com>	Add dependency for operations with x86 FPU stack Load Hoisting optimization can re-order operations with FPU stack due to no dependency set. Patch adds resource dependency between these operations. Change-Id: Iccce98c8f3c565903667c03803884d9de1281ea8 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
b373e091eac39b1a79c11f2dcbd610af01e9e8a9	21-Feb-2014	Dave Allison <dallison@google.com>	Implicit null/suspend checks (oat version bump) This adds the ability to use SEGV signals to throw NullPointerException exceptions from Java code rather than having the compiler generate explicit comparisons and branches. It does this by using sigaction to trap SIGSEGV and when triggered makes sure it's in compiled code and if so, sets the return address to the entry point to throw the exception. It also uses this signal mechanism to determine whether to check for thread suspension. Instead of the compiler generating calls to a function to check for threads being suspended, the compiler will now load indirect via an address in the TLS area. To trigger a suspend, the contents of this address are changed from something valid to 0. A SIGSEGV will occur and the handler will check for a valid instruction pattern before invoking the thread suspension check code. If a user program taps SIGSEGV it will prevent our signal handler working. This will cause a failure in the runtime. There are two signal handlers at present. You can control them individually using the flags -implicit-checks: on the runtime command line. This takes a string parameter, a comma separated set of strings. Each can be one of: none switch off null null pointer checks suspend suspend checks all all checks So to switch only suspend checks on, pass: -implicit-checks:suspend There is also -explicit-checks to provide the reverse once we change the default. For dalvikvm, pass --runtime-arg -implicit-checks:foo,bar The default is -implicit-checks:none There is also a property 'dalvik.vm.implicit_checks' whose value is the same string as the command option. The default is 'none'. For example to switch on null checks using the option: setprop dalvik.vm.implicit_checks null It only works for ARM right now. Bumps OAT version number due to change to Thread offsets. 
Bug: 13121132 Change-Id: If743849138162f3c7c44a523247e413785677370
3bc8615332b7848dec8c2297a40f7e4d176c0efb	13-Mar-2014	Vladimir Marko <vmarko@google.com>	Use LIRSlowPath for intrinsics, improve String.indexOf(). Rewrite intrinsic launchpads to use the LIRSlowPath. Improve String.indexOf for constant chars by avoiding the check for code points over 0xFFFF. Change-Id: I7fd5583214c5b4ab9c38ee36c5d6f003dd6345a8
49161cef10a308aedada18e9aa742498d6e6c8c7	12-Mar-2014	Jeff Hao <jeffhao@google.com>	Allow patching between dex files in the boot classpath. Change-Id: I53f219a5382d0fcd580e96e50025fdad4fc399df
83cc7ae96d4176533dd0391a1591d321b0a87f4f	12-Feb-2014	Vladimir Marko <vmarko@google.com>	Create a scoped arena allocator and use that for LVN. This saves more than 0.5s of boot.oat compilation time on Nexus 5. TODO: Move other stuff to the scoped allocator. This CL alone increases the peak memory allocation. By reusing the memory for other parts of the compilation we should reduce this overhead. Change-Id: Ifbc00aab4f3afd0000da818dfe68b96713824a08
a1a7074eb8256d101f7b5d256cda26d7de6ce6ce	03-Mar-2014	Vladimir Marko <vmarko@google.com>	Rewrite kMirOpSelect for all IF_ccZ opcodes. Also improve special cases for ARM and add tests. Change-Id: I06f575b9c7b547dbc431dbfadf2b927151fe16b9
00e1ec6581b5b7b46ca4c314c2854e9caa647dd2	28-Feb-2014	Bill Buzbee <buzbee@android.com>	Revert "Revert "Rework Quick compiler's register handling"" This reverts commit 86ec520fc8b696ed6f164d7b756009ecd6e4aace. Ready. Fixed the original type, plus some mechanical changes for rebasing. Still needs additional testing, but the problem with the original CL appears to have been a typo in the definition of the x86 double return template RegLocation. Change-Id: I828c721f91d9b2546ef008c6ea81f40756305891
be0e546730e532ef0987cd4bde2c6f5a1b14dd2a	26-Feb-2014	Vladimir Marko <vmarko@google.com>	Cache field lowering info in mir_graph. Change-Id: I9f9d76e3ae6c31e88bdf3f59820d31a625da020f
ae9fd93c39a341e2dffe15c61cc7d9e841fa92c4	11-Feb-2014	Mark Mendell <mark.p.mendell@intel.com>	Tell GDB about Quick ART generated code This is actually a lot of work. To do this, we need: .debug_info .debug_abbrev .debug_frame .debug_str These are generated into the OAT file by OatWriter and ElfWriterQuick. Since the Quick ART runtime doesn't use dlopen to load the OAT files, GDB can't find this information. Use the alternate GDB JIT interface, which can be invoked at runtime. To use this interface, an ELF image needs to be built in memory. Read the information from the OAT file, fixup the addresses to point to the real locations, add a symbol table to hold the .text symbol, and then let GDB know about the information, which will be read from the runtime address space. This is quite primitive now, and could be cleaned up considerably. It probably needs symbol table entries for the methods, and descriptions of parameters and return types. Currently only supported for X86. This defaults to enabled for debug builds. Added dexoat --gen-gdb-info and --no-gen-gdb-info flags to override. Change-Id: I4d18b2370f6dfaa00c8cc1925f10717be3bd1a62 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
a1ce1fef2d49d1d537776a5308ace7102a815fe5	25-Feb-2014	Brian Carlstrom <bdc@google.com>	Split up CommonTest into CommonRuntimeTest and CommonCompilerTest Change-Id: I8dcf6b29a5aecd445f1a3ddb06386cf81dbc9c70
86ec520fc8b696ed6f164d7b756009ecd6e4aace	26-Feb-2014	Bill Buzbee <buzbee@android.com>	Revert "Rework Quick compiler's register handling" This reverts commit 2c1ed456dcdb027d097825dd98dbe48c71599b6c. Change-Id: If88d69ba88e0af0b407ff2240566d7e4545d8a99
2c1ed456dcdb027d097825dd98dbe48c71599b6c	20-Feb-2014	buzbee <buzbee@google.com>	Rework Quick compiler's register handling For historical reasons, the Quick backend found it convenient to consider all 64-bit Dalvik values held in registers to be contained in a pair of 32-bit registers. Though this worked well for ARM (with double-precision registers also treated as a pair of 32-bit single-precision registers) it doesn't play well with other targets. And, it is somewhat problematic for 64-bit architectures. This is the first of several CLs that will rework the way the Quick backend deals with physical registers. The goal is to eliminate the "64-bit value backed with 32-bit register pair" requirement from the target-independent portions of the backend and support 64-bit registers throughout. The key RegLocation struct, which describes the location of Dalvik virtual register & register pairs, previously contained fields for high and low physical registers. The low_reg and high_reg fields are being replaced with a new type: RegStorage. There will be a single instance of RegStorage for each RegLocation. Note that RegStorage does not increase the space used. It is 16 bits wide, the same as the sum of the 8-bit low_reg and high_reg fields. At a target-independent level, it will describe whether the physical register storage associated with the Dalvik value is a single 32 bit, single 64 bit, pair of 32 bit or vector. The actual register number encoding is left to the target-dependent code layer. Because physical register handling is pervasive throughout the backend, this restructuring necessarily involves large CLs with lots of changes. I'm going to roll these out in stages, and attempt to segregate the CLs with largely mechanical changes from those which restructure or rework the logic. This CL is of the mechanical change variety - it replaces low_reg and high_reg from RegLocation and introduces RegStorage.
It also includes a lot of new code (such as many calls to GetReg()) that should go away in upcoming CLs. The tentative plan for the subsequent CLs is: o Rework standard register utilities such as AllocReg() and FreeReg() to use RegStorage instead of ints. o Rework the target-independent GenXXX, OpXXX, LoadValue, StoreValue, etc. routines to take RegStorage rather than int register encodings. o Take advantage of the vector representation and eliminate the current vector field in RegLocation. o Replace the "wide" variants of codegen utilities that take low_reg/high_reg pairs with versions that use RegStorage. o Add 64-bit register target independent codegen utilities where possible, and where not virtualize with 32-bit general register and 64-bit general register variants in the target dependent layer. o Expand/rework the LIR def/use flags to allow for more registers (currently, we lose out on 16 MIPS floating point regs as well as ARM's D16..D31 for lack of space in the masks). o [Possibly] move the float/non-float determination of a register from the target-dependent encoding to RegStorage. In other words, replace IsFpReg(register_encoding_bits). At the end of the day, all code in the target independent layer should be using RegStorage, as should much of the target dependent layer. Ideally, we won't be using the physical register number encoding extracted from RegStorage (i.e. GetReg()) until the NewLIRx() layer. Change-Id: Idc5c741478f720bdd1d7123b94e4288be5ce52cb
9c86a0279aaf953377aa9e2277592e68bf814989	21-Feb-2014	Ian Rogers <irogers@google.com>	Revert "Annotate used fields." This reverts commit 7f6cf56942c8469958b273ea968db253051c5b05. Change-Id: Ic389a194c3404ecb5bb563a405bf4a0d6336ea0d
4028a6c83a339036864999fdfd2855b012a9f1a7	20-Feb-2014	Mark Mendell <mark.p.mendell@intel.com>	Inline x86 String.indexOf Take advantage of the presence of a constant search char or start index to tune the generated code. Change-Id: I0adcf184fb91b899a95aa4d8ef044a14deb51d88 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
7f6cf56942c8469958b273ea968db253051c5b05	29-Jan-2014	Vladimir Marko <vmarko@google.com>	Annotate used fields. Annotate all fields used by a method early during the compilation, check access rights and record field offset, volatility, etc. Use these annotations when generating code for IGET/IPUT/SGET/SPUT instructions. Change-Id: I4bbf5cca4fecf53c9bf9c93ac1793e2f40c16b5f
818f2107e6d2d9e80faac8ae8c92faffa83cbd11	18-Feb-2014	Nicolas Geoffray <ngeoffray@google.com>	Re-apply: Initial check-in of an optimizing compiler. The classes and the names are very much inspired by V8/Dart. It currently only supports the RETURN_VOID dex instruction, and there is a pretty printer to check if the building of the graph is correct. Change-Id: I28e125dfee86ae6ec9b3fec6aa1859523b92a893
1af0c0b88a956813eb0ad282664cedc391e2938f	19-Feb-2014	Nicolas Geoffray <ngeoffray@google.com>	Revert "Initial check-in of an optimizing compiler." g++ warnings turned into errors. This reverts commit 68a5fefa90f03fdf5a238ac85c9439c6b03eae96. Change-Id: I09bb95d9cc13764ca8a266c41af04801a34b9fd0
68a5fefa90f03fdf5a238ac85c9439c6b03eae96	18-Feb-2014	Nicolas Geoffray <ngeoffray@google.com>	Initial check-in of an optimizing compiler. The classes and the names are very much inspired by V8/Dart. It currently only supports the RETURN_VOID dex instruction, and there is a pretty printer to check if the building of the graph is correct. Change-Id: Id5ef1b317ab997010d4e3888e456c26bef1ab9c0
3bc01748ef1c3e43361bdf520947a9d656658bf8	06-Feb-2014	Razvan A Lupusoru <razvan.a.lupusoru@intel.com>	GenSpecialCase support for x86 Moved GenSpecialCase from being ARM specific to common code to allow it to be used by x86 quick as well. Change-Id: I728733e8f4c4da99af6091ef77e5c76ae0fee850 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
614c2b4e219631e8c190fd9fd5d4d9cd343434e1	29-Jan-2014	Razvan A Lupusoru <razvan.a.lupusoru@intel.com>	Support to generate inline long to FP bytecodes for x86 long-to-float and long-to-double are now generated inline instead of calling a helper routine. The conversion is done by using x87. Change-Id: I196e526afec1be212898baceca8527549c3655b6 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
6607d97166984ce578817269f9775c15b9044190	10-Feb-2014	Mark Mendell <mark.p.mendell@intel.com>	Tweak Mir2Lir::GenInstanceofCallingHelper for X86 Make this virtual, and split out the X86 logic. Take advantage of SETcc instruction for X86. I don't think I can do much more due to need to preserve arguments for the calls. Change-Id: I10e3eaa61b61ceac384267e3078bb6f75c37cee4 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
55d0eac918321e0525f6e6491f36a80977e0d416	06-Feb-2014	Mark Mendell <mark.p.mendell@intel.com>	Support Direct Method/Type access for X86 Thumb generates code to optimize calls to methods within core.oat. Implement this for X86 as well, but take advantage of mov with 32 bit immediate and call relative with 32 bit immediate. Fix some incorrect return locations for long inlines. Change-Id: I1907bdfc7574f3d0aa76c7fad13dc537acdf1ed3 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
dbb17e378b538133750e56375bbdbb217db7b248	07-Feb-2014	Yixin Shou <yixin.shou@intel.com>	Added inlined abs method with float and double type This patch added the implementation for inlining java.lang.Math.abs() method with float and double type. Change-Id: Ic99471b4ab4176e4a0153bef383bb49944fb636f Signed-off-by: Yixin Shou <yixin.shou@intel.com>
2c498d1f28e62e81fbdb477ff93ca7454e7493d7	30-Jan-2014	Razvan A Lupusoru <razvan.a.lupusoru@intel.com>	Specializing x86 range argument copying The ARM implementation of range argument copying was specialized in some cases. For all other architectures, it would fall back to generating memcpy. This patch updates the x86 implementation so it does not call memcpy and instead generates loads and stores, favoring movement of 128-bit chunks. Change-Id: Ic891e5609a4b0e81a47c29cc5a9b301bd10a1933 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
bcec6fba95ee7974d3f7b81c3c02e7eb3ca3df00	17-Jan-2014	Dave Allison <dallison@google.com>	Make slow paths easier to write This adds a class LIRSlowPath that allows for deferred compilation of slow paths. Using this object you can add code that will be invoked out of line using a forward branch. The intention is to move the slow paths out of the main flow and avoid branch-over constructs that will almost always trigger. The forward branch to the slow path code will be predicted false and this will be correct most of the time. The slow path code returns to the instruction after the original branch using an unconditional branch. This is used in the following opcodes: sput, sget, const-string, check-cast, const-class. Others will follow. Bug: 10864890 Change-Id: I17130c5dc20d369bc6bbf50b8cf04343263e888e
feb2b4e2d1c6538777bb80b60f3a247537b6221d	28-Jan-2014	Mark Mendell <mark.p.mendell@intel.com>	Redo x86 int arithmetic Make Mir2Lir::GenArithOpInt virtual, and implement an x86 version of it to allow use of memory operands and knowledge of the fact that x86 has (mostly) two operand instructions. Remove x86 specific code from the generic version. Add StoreFinalValue (matches StoreFinalValueWide) to handle the non-wide cases. Add some x86 helper routines to simplify generation. Change-Id: I6c13689c6da981f2570ab5af7a97f9816108b7ae Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
da7a69b3fa7bb22d087567364b7eb5a75824efd8	09-Jan-2014	Razvan A Lupusoru <razvan.a.lupusoru@intel.com>	Enable compiler temporaries Compiler temporaries are a facility for having virtual register sized space for dealing with intermediate values during MIR transformations. They receive explicit space in managed frames so they can have a home location in case they need to be spilled. The facility also supports "special" temporaries which have specific semantic purpose and their location in frame must be tracked. The compiler temporaries are treated in the same way as virtual registers so that the MIR level transformations do not need to have special logic. However, generated code needs to know stack layout so that it can distinguish between home locations. MIRGraph has received an interface for dealing with compiler temporaries. This interface allows allocation of wide and non-wide virtual register temporaries. The information about how temporaries are kept on stack has been moved to stack.h. This was necessary because stack layout is dependent on where the temporaries are placed. Change-Id: Iba5cf095b32feb00d3f648db112a00209c8e5f55 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
df8ee2ea9908db3dde463fed68391b0040517653	28-Jan-2014	Mark Mendell <mark.p.mendell@intel.com>	x86 updates GenInlinedUnsafePut/GenInstanceofFinal Allow x86 to inline GenInlinedUnsafePut by freeing up a temporary register early. Make an x86 specific version of GenInstanceofFinal that uses compare to memory and a setCC instruction. Change-Id: I67788d7ae83776b0b9069fe4b379452190774992 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
766e9295d2c34cd1846d81610c9045b5d5093ddd	27-Jan-2014	Mark Mendell <mark.p.mendell@intel.com>	Improve GenConstString, GenS{get,put} for x86 Rewrite GenConstString for x86 to skip calling ResolveString when the string is already resolved. Also try to avoid a register copy if the Method* is in a promoted register. Implement the TODO for GenS{get,put} to use compare to memory for x86 by adding a new codegen function to compare directly to memory. Implement a default implementation that uses a temporary register for RISC architectures. Change-Id: Ie163cca3d3d841aa10c50dc6592ec30af7a7cbc9 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
bb8f0ab736b61db8f543e433859272e83f96ee9b	28-Jan-2014	Hiroshi Yamauchi <yamauchi@google.com>	Embed array class pointers at array allocation sites. Following https://android-review.googlesource.com/#/c/79302, embed array class pointers at array allocation sites in the compiled code. Change-Id: I67a1292466dfbb7f48e746e5060e992dd93525c5
e27b3bf2c1044bfbfbe874affd3758a73009c6c6	23-Jan-2014	Razvan A Lupusoru <razvan.a.lupusoru@intel.com>	Support GenSelect for x86 kMirOpSelect is an extended MIR that has been generated in order to remove trivial diamond shapes where the conditional is an if-eqz or if-nez and on each of the paths there is a move or const bytecode with same destination register. This patch enables x86 to generate code for this extended MIR. A) Handling the constant specialization of kMirOpSelect: 1) When the true case is zero and result_reg is not same as src_reg: xor result_reg, result_reg cmp $0, src_reg mov t1, $false_case cmovnz result_reg, t1 2) When the false case is zero and result_reg is not same as src_reg: xor result_reg, result_reg cmp $0, src_reg mov t1, $true_case cmovz result_reg, t1 3) All other cases (we do compare first to set eflags): cmp $0, src_reg mov result_reg, $true_case mov t1, $false_case cmovnz result_reg, t1 B) Handling the move specialization of kMirOpSelect: 1) When true case is already in place: cmp $0, src_reg cmovnz result_reg, false_reg 2) When false case is already in place: cmp $0, src_reg cmovz result_reg, true_reg 3) When neither cases are in place: cmp $0, src_reg mov result_reg, true_reg cmovnz result_reg, false_reg Change-Id: Ic7c50823208fe82019916476a0a77c6a271679fe Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
4708dcd68eebf1173aef1097dad8ab13466059aa	22-Jan-2014	Mark Mendell <mark.p.mendell@intel.com>	Improve x86 long multiply and shifts Generate inline code for long shifts by constants and do long multiplication inline. Convert multiplication by a constant to a shift when we can. Fix some x86 assembler problems and add the new instructions that were needed (64 bit shifts). Change-Id: I6237a31c36159096e399d40d01eb6bfa22ac2772 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
2bf31e67694da24a19fc1f328285cebb1a4b9964	23-Jan-2014	Mark Mendell <mark.p.mendell@intel.com>	Improve x86 long divide Implement inline division for literal and variable divisors. Use the general case for dividing by a literal by using a double length multiply by the appropriate constant with fixups. This is the Hacker's Delight algorithm. Change-Id: I563c250f99d89fca5ff8bcbf13de74de13815cfe Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
be1ca55db3362f5b100c4c65da5342fd299520bb	15-Jan-2014	Hiroshi Yamauchi <yamauchi@google.com>	Use direct class pointers at allocation sites in the compiled code. - Rather than looking up a class from its type ID (and checking if it's resolved/initialized, resolving/initializing if not), use direct class pointers, if possible (boot-code-to-boot-class pointers and app-code-to-boot-class pointers.) - This results in a 1-2% speedup in Ritz MemAllocTest on Nexus 4. - Embedding the object size (along with class pointers) caused a 1-2% slowdown in MemAllocTest and isn't implemented in this change. - TODO: do the same for array allocations. - TODO: when/if an application gets its own image, implement app-code-to-app-class pointers. - Fix a -XX:gc bug. cf. https://android-review.googlesource.com/79460/ - Add /tmp/android-data/dalvik-cache to the list of locations to remove oat files in clean-oat-host. cf. https://android-review.googlesource.com/79550 - Add back a dropped UNLIKELY in FindMethodFromCode(). cf. https://android-review.googlesource.com/74205 Bug: 9986565 Change-Id: I590b96bd21f7a7472f88e36752e675547559a5b1
e02d48fb24747f90fd893e1c3572bb3c500afced	15-Jan-2014	Mark Mendell <mark.p.mendell@intel.com>	Optimize x86 long arithmetic Be smarter about taking advantage of a constant operand for x86 long add/sub/and/or/xor. Using instructions with immediates and generating results directly into memory reduces the number of temporary registers and avoids hardcoded register usage. Also rewrite the existing non-const x86 arithmetic to avoid fixed register use, and use the fact that x86 instructions are two operand. Pass the opcode to the XXXLong() routines to easily detect two operand DEX opcodes. Add a new StoreFinalValueWide() routine, which is similar to StoreValueWide, but doesn't do an EvalLoc to allocate registers. The src operand must already be in registers, and it just updates the dest location, and calls the right live/dirty routines to get the src into the dest properly. Change-Id: Iefc16e7bc2236a73dc780d3d5137ae8343171f62 Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
d61ba4ba6fcde666adb5d5c81b1c32f0534fb2c8	13-Jan-2014	Bill Buzbee <buzbee@android.com>	Revert "Revert "Better support for x86 XMM registers"" This reverts commit 8ff67e3338952c70ccf3b609559bf8cc0f379cfd. Fix applied to loc.fp usage. Change-Id: I1eb3005392544fcf30c595923ed25bcee2dc4859
8ff67e3338952c70ccf3b609559bf8cc0f379cfd	11-Jan-2014	Bill Buzbee <buzbee@android.com>	Revert "Better support for x86 XMM registers" The invalid usage of loc.fp must be corrected before this change can be submitted. This reverts commit 766a5e5940b469ab40e52770862c81cfec1d835b. Change-Id: I1173a9bf829da89cccd9c2898f5e11164987a22b
766a5e5940b469ab40e52770862c81cfec1d835b	10-Jan-2014	Mark Mendell <mark.p.mendell@intel.com>	Better support for x86 XMM registers Currently, ART Quick mode assumes that a double FP register is composed of two single consecutive FP registers. This is true for ARM and MIPS, but not x86. This means that only half of the 8 XMM registers are available for use by x86 doubles. This patch breaks the assumption that a wide FP RegisterLocation must be a paired set of FP registers. This is done by making some routines in common code virtual and overriding them in the X86Mir2Lir class. For these wide fp locations, the high register is set to the same value as the low register, in order to minimize changes to common code. In a couple of places, the common code checks for this case. The changes are also supposed to allow the possibility of using the XMM registers for vector operations, but that support is still WIP. Change-Id: Ic6ef24ea764991c6f4d9fb88d483a619f5a468cb Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
bd288c2c1206bc99fafebfb9120a83f13cf9723b	21-Dec-2013	Razvan A Lupusoru <razvan.a.lupusoru@intel.com>	Add conditional move support to x86 and allow GenMinMax to use it X86 supports conditional moves which is useful for reducing branchiness. This patch adds support to the x86 backend to generate conditional reg to reg operations. Both encoder and decoder support was added for cmov. The x86 version of GenMinMax used for generating inlined version Math.min/max has been updated to make use of the conditional move support. Change-Id: I92c5428e40aa8ff88bd3071619957ac3130efae7 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
090dd4489eeffb5f10051a5d9c1ed71b0a6bc4b9	20-Dec-2013	Razvan A Lupusoru <razvan.a.lupusoru@intel.com>	Eliminate redundant x86 compare for GenDivZeroCheck For x86, the ALU operations on general purpose registers update the flags. Thus, when generating the zero check for divide/remainder operations, the compare is not needed. Change-Id: I07bfdf7d5491d3e3e9d98a932472d7f18d5b46d3 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
5816ed48bc339c983b40dc493e96b97821ce7966	27-Nov-2013	Vladimir Marko <vmarko@google.com>	Detect special methods at the end of verification. This moves special method handling to method inliner and prepares for eventual inlining of these methods. Change-Id: I51c51b940fb7bc714e33135cd61be69467861352
e13717e796d338b08ea66f6a7e3470ca44de707f	20-Nov-2013	Vladimir Marko <vmarko@google.com>	Per-DexFile locking for inliner initialization. And clean up lock and compiler driver naming. Change-Id: I1562c7f55c4b0174a36007ba6199360da06169ff
31c2aac7137b69d5622eea09597500731fbee2ef	09-Dec-2013	Vladimir Marko <vmarko@google.com>	Rename ClobberCalleeSave to Caller, fix it for x86. Change-Id: I6a72703a11985e2753fa9b4520c375a164301433
06606b9c4a1c00154ed15f719ad8ea994e54ee8e	02-Dec-2013	Vladimir Marko <vmarko@google.com>	Performance improvement for mapping table creation. Avoid the raw mapping tables altogether. Change-Id: I6d1c786325d369e899a75f15701edbafdd14363f
70b797d998f2a28e39f7d6ffc8a07c9cbc47da14	03-Dec-2013	Vladimir Marko <vmarko@google.com>	Unsafe.compareAndSwapLong() intrinsic for x86. Change-Id: Idbc5371a62dfdd84485a657d4548990519200205
1e6cb63d77090ddc6aa19c755d7066f66e9ff87e	28-Nov-2013	Vladimir Marko <vmarko@google.com>	Delta-encoding of mapping tables. Both PC offsets and dalvik offsets are delta-encoded. Since PC offsets are increasing, the deltas are then compressed as unsigned LEB128. Dalvik offsets are not monotonic, so their deltas are compressed as signed LEB128. This reduces the size of the mapping tables by about 30% on average, 25% from the PC offset and 5% from the dalvik offset delta encoding. Bug: 9437697 Change-Id: I600ab9c22dec178088d4947a811cca3bc8bd4cf4
3e5af82ae1a2cd69b7b045ac008ac3b394d17f41	21-Nov-2013	Vladimir Marko <vmarko@google.com>	Intrinsic Unsafe.CompareAndSwapLong() for ARM. (cherry picked from cb53fcd79b1a5ce608208ec454b5c19f64aaba37) Change-Id: Iadd3cc8b4ed390670463b80f8efd579ce6ece226
1c282e2b9a9b432e132b2c332f861cad9feb4a73	21-Nov-2013	Vladimir Marko <vmarko@google.com>	Refactor intrinsic CAS, prepare for 64-bit version. Bug: 11391018 Change-Id: Ic0f740e0cd0eb47f2c915f81be02f52f7721f8a3
5c96e6b4dc354a7439b211b93462fbe8edea5e57	14-Nov-2013	Vladimir Marko <vmarko@google.com>	Rewrite intrinsics detection. Intrinsic methods should be treated as a special case of inline methods. They should be detected early and used to guide other optimizations. This CL rewrites the intrinsics detection so that it can be moved to any compilation phase. Change-Id: I4424a6a869bd98b9c478953c9e3bcaf1c6de2b33
e508a2090b19fe705fbc6b99d76474037a74bbfb	04-Nov-2013	Vladimir Marko <vmarko@google.com>	Fix unaligned Memory peek/poke intrinsics. Change-Id: Id454464d0b28aa37f5239f1c6589ceb0b3bbbdea
65636e5de2375839e29e3e19ee7a7db737901cf0	24-Oct-2013	Vladimir Marko <vmarko@google.com>	Add intrinsics for Memory peek/poke. Add intrinsics for single memory access (non-array) peek/poke methods in libcore.io.Memory. Change-Id: I5d66a5b14ea89875d8afb8252eb293f7d637b83f
6bdf1fff5f841f3997d4b488f00647f7aa2cdaa3	29-Oct-2013	Vladimir Marko <vmarko@google.com>	Add intrinsics for {Short,Int,Long}.reverseBytes(). Change-Id: I34a2ec642f59fc4ff18aed59769a9e8d7e361098
0d82948094d9a198e01aa95f64012bdedd5b6fc9	12-Oct-2013	buzbee <buzbee@google.com>	64-bit prep Preparation for 64-bit roll. o Eliminated storing pointers in 32-bit int slots in LIR. o General size reductions of common structures to reduce impact of doubled pointer sizes: - BasicBlock struct was 72 bytes, now is 48. - MIR struct was 72 bytes, now is 64. - RegLocation was 12 bytes, now is 8. o Generally replaced uses of BasicBlock* pointers with 16-bit Ids. o Replaced several doubly-linked lists with singly-linked to save one stored pointer per node. o We had quite a few uses of uintptr_t's that were a holdover from the JIT (which used pointers to mapped dex & actual code cache addresses rather than trace-relative offsets). Replaced those with uint32_t's. o Clean up handling of embedded data for switch tables and array data. o Miscellaneous cleanup. I anticipate one or two additional CLs to reduce the size of MIR and LIR structs. Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
409fe94ad529d9334587be80b9f6a3d166805508	11-Oct-2013	buzbee <buzbee@google.com>	Quick assembler fix This CL re-instates the select pattern optimization disabled by CL 374310, and fixes the underlying problem: improper handling of the kPseudoBarrier LIR opcode. The bug was introduced in the recent assembler restructuring. In short, LIR pseudo opcodes (which have values < 0), should always have size 0 - and thus cause no bits to be emitted during assembly. In this case, bad logic caused us to set the size of a kPseudoBarrier opcode via lookup through the EncodingMap. Because all pseudo ops are < 0, this meant we did an array underflow load, picking up whatever garbage was located before the EncodingMap. This explains why this error showed up recently - we'd previously just gotten a lucky layout. This CL corrects the faulty logic, and adds DCHECKs to uses of the EncodingMap to ensure that we don't try to access w/ a pseudo op. Additionally, the existing is_pseudo_op() macro is replaced with IsPseudoLirOp(), named similar to the existing IsPseudoMirOp(). Change-Id: I46761a0275a923d85b545664cadf052e1ab120dc
a9a8254c920ce8e22210abfc16c9842ce0aea28f	04-Oct-2013	Ian Rogers <irogers@google.com>	Improve quick codegen for aput-object. 1) don't type check known null. 2) if we know types in verify don't check at runtime. 3) if we're runtime checking then move all the code out-of-line. Also, don't set up a callee-save frame for check-cast, do an instance-of test then throw an exception if that fails. Tidy quick entry point of Ldivmod to Lmod which it is on x86 and mips. Fix monitor-enter/exit NPE for MIPS. Fix benign bug in mirror::Class::CannotBeAssignedFromOtherTypes, a byte[] cannot be assigned to from other types. Change-Id: I9cb3859ec70cca71ed79331ec8df5bec969d6745
d9c4fc94fa618617f94e1de9af5f034549100753	02-Oct-2013	Ian Rogers <irogers@google.com>	Inflate contended lock word by suspending owner. Bug 6961405. Don't inflate monitors for Notify and NotifyAll. Tidy lock word, handle recursive lock case alongside unlocked case and move assembly out of line (except for ARM quick). Also handle null in out-of-line assembly as the test is quick and the enter/exit code is already a safepoint. To gain ownership of a monitor on behalf of another thread, monitor contenders must not hold the monitor_lock_, so they wait on a condition variable. Reduce size of per mutex contention log. Be consistent in calling thin lock thread ids just thread ids. Fix potential thread death races caused by the use of FindThreadByThreadId, make it invariant that returned threads are either self or suspended now. Code size reduction on ARM boot.oat 0.2%. Old nexus 7 speedup 0.25%, new nexus 7 speedup 1.4%, nexus 10 speedup 2.24%, nexus 4 speedup 2.09% on DeltaBlue. Change-Id: Id52558b914f160d9c8578fdd7fc8199a9598576a
b48819db07f9a0992a72173380c24249d7fc648a	15-Sep-2013	buzbee <buzbee@google.com>	Compile-time tuning: assembly phase Not as much compile-time gain from reworking the assembly phase as I'd hoped, but still worthwhile. Should see ~2% improvement thanks to the assembly rework. On the other hand, expect some huge gains for some applications thanks to better detection of large machine-generated init methods. Thinkfree shows a 25% improvement. The major assembly change was to thread the LIR nodes that require fixup into a fixup chain. Only those are processed during the final assembly pass(es). This doesn't help for methods which only require a single pass to assemble, but does speed up the larger methods which required multiple assembly passes. Also replaced the block_map_ basic block lookup table (which contained space for a BasicBlock* for each dex instruction unit) with a block id map - cutting its space requirements by half in a 32-bit pointer environment. Changes: o Reduce size of LIR struct by 12.5% (one of the big memory users) o Repurpose the use/def portion of the LIR after optimization complete. o Encode instruction bits to LIR o Thread LIR nodes requiring pc fixup o Change follow-on assembly passes to only consider fixup LIRs o Switch on pc-rel fixup kind o Fast-path for small methods - single pass assembly o Avoid using cb[n]z for null checks (almost always exceed displacement) o Improve detection of large initialization methods. o Rework def/use flag setup. o Remove a sequential search from FindBlock using lookup table of 16-bit block ids rather than full block pointers. o Eliminate pcRelFixup and use fixup kind instead. o Add check for 16-bit overflow on dex offset. Change-Id: I4c6615f83fed46f84629ad6cfe4237205a9562b4
d91d6d6a80748f277fd938a412211e5af28913b1	26-Sep-2013	Ian Rogers <irogers@google.com>	Introduce Signature type to avoid string comparisons. Method resolution currently creates strings to then compare with strings formed from methods in other dex files. The temporary strings are purely created for the sake of comparisons. This change creates a new Signature type that represents a method signature but not as a string. This type supports comparisons and so can be used when searching for methods in resolution. With this change malloc is no longer the hottest method during dex2oat (now it's memset) and allocations during verification have been reduced. The verifier is commonly what is populating the dex cache for methods and fields not declared in the dex file itself. Change-Id: I5ef0542823fbcae868aaa4a2457e8da7df0e9dae
c729a6b936d59562bd9fb830a595d9ff65dfd129	15-Sep-2013	buzbee <buzbee@google.com>	Improve promotion of double-precision regs Minor rework of the double allocation mechanism to more explicitly manage the allocation of preserved floating point single pairs as doubles. Change-Id: Id9db4b0e86e5ef54a5db587f367e00efdf7e98d6
bd663de599b16229085759366c56e2ed5a1dc7ec	11-Sep-2013	buzbee <buzbee@google.com>	Compile-time tuning: register/bb utilities This CL yields about a 4% improvement in the compilation phase of dex2oat (single-threaded; multi-threaded compilation is more difficult to accurately measure). The register utilities could stand to be completely rewritten, but this gets most of the easy benefit. Next up: the assembly phase. Change-Id: Ife5a474e9b1a6d9e501e888dda6749d34eb77e96
252254b130067cd7a5071865e793966871ae0246	09-Sep-2013	buzbee <buzbee@google.com>	More Quick compile-time tuning: labels & branches This CL represents a roughly 3.5% performance improvement for the compile phase of dex2oat. Most of the gain comes from avoiding the generation of dex boundary LIR labels unless a debug listing is requested. The other significant change is moving from a basic block ending branch model of "always generate a fall-through branch, and then delete it if we can" to a "only generate a fall-through branch if we need it" model. The data motivating these changes follow. Note that two areas of potentially attractive gain remain: restructuring the assembler model and reworking the register handling utilities. These will be addressed in subsequent CLs. --- data follows The Quick compiler's assembler has shown up on profile reports a bit more than seems reasonable. We've tried a few quick fixes to apparently hot portions of the code, but without much gain. So, I've been looking at the assembly process at a somewhat higher level. There look to be several potentially good opportunities. First, an analysis of the makeup of the LIR graph showed a surprisingly high proportion of LIR pseudo ops. Using the boot classpath as a basis, we get: 32.8% of all LIR nodes are pseudo ops. 10.4% are LIR instructions which require pc-relative fixups. 11.8% are LIR instructions that have been nop'd by the various optimization passes. Looking only at the LIR pseudo ops, we get: kPseudoDalvikByteCodeBoundary 43.46% kPseudoNormalBlockLabel 21.14% kPseudoSafepointPC 20.20% kPseudoThrowTarget 6.94% kPseudoTarget 3.03% kPseudoSuspendTarget 1.95% kPseudoMethodExit 1.26% kPseudoMethodEntry 1.26% kPseudoExportedPC 0.37% kPseudoCaseLabel 0.30% kPseudoBarrier 0.07% kPseudoIntrinsicRetry 0.02% Total LIR count: 10167292 The standout here is the Dalvik opcode boundary marker. This is just a label inserted at the beginning of the codegen for each Dalvik bytecode. 
If we're also doing a verbose listing, this is also where we hang the pretty-print disassembly string. However, this label was also being used as a convenient way to find the target of switch case statements (and, I think at one point was used in the Mir->GBC conversion process). This CL moves the use of kPseudoDalvikByteCodeBoundary labels to only verbose listing runs, and replaces the codegen uses of the label with the kPseudoNormalBlockLabel attached to the basic block that contains the switch case target. Great savings here - 14.3% reduction in the number of LIR nodes needed. After this CL, our LIR pseudo proportions drop to 21.6% of all LIR. That's still a lot, but much better. Possible further improvements via combining normal labels with kPseudoSafepointPC labels where appropriate, and also perhaps reduce memory usage by using a short-hand form for labels rather than a full LIR node. Also, many of the basic block labels are no longer branch targets by the time we get to assembly - cheaper to delete, or just ignore? Here's the "after" LIR pseudo op breakdown: kPseudoNormalBlockLabel 37.39% kPseudoSafepointPC 35.72% kPseudoThrowTarget 12.28% kPseudoTarget 5.36% kPseudoSuspendTarget 3.45% kPseudoMethodEntry 2.24% kPseudoMethodExit 2.22% kPseudoExportedPC 0.65% kPseudoCaseLabel 0.53% kPseudoBarrier 0.12% kPseudoIntrinsicRetry 0.04% Total LIR count: 5748232 Not done in this CL, but it will be worth experimenting with actually deleting LIR nodes from the graph when they are optimized away, rather than just setting the NOP bit. Keeping them around is invaluable during debugging - but when not debugging it may pay off if the cost of node removal is less than the cost of traversing through dead nodes in subsequent passes. Next up (and partially in this CL - but mostly to be done in follow-on CLs) is the overall assembly process. Inherited from the trace JIT, the Quick compiler has a fairly simple-minded approach to instruction assembly. 
First, a pass is made over the LIR list to assign offsets to each instruction. Then, the assembly pass is made - which generates the actual machine instruction bit patterns and pushes the instruction data into the code_buffer. However, the code generator takes the "always optimistic" approach to instruction selection and emits the shortest instruction. If, during assembly, we find that a branch or load doesn't reach, that short-form instruction is replaced with a longer sequence. Of course, this invalidates the previously-computed offset calculations. Assembly thus is an iterative process: compute offsets and then assemble until we survive an assembly pass without invalidation. This seems like a likely candidate for improvement. First, I analyzed the number of retries required, and the reason for invalidation over the boot classpath load. The results: more than half of methods don't require a retry, and very few require more than 1 extra pass: 5 or more: 6 of 96334 4 or more: 22 of 96334 3 or more: 140 of 96334 2 or more: 1794 of 96334 - 2% 1 or more: 40911 of 96334 - 40% 0 retries: 55423 of 96334 - 58% The interesting group here is the one that requires 1 retry. Looking at the reason, we see three typical reasons: 1. A cbnz/cbz doesn't reach (only 7 bits of offset) 2. A 16-bit Thumb1 unconditional branch doesn't reach. 3. An unconditional branch which branches to the next instruction is encountered, and deleted. The first 2 cases are the cost of the optimistic strategy - nothing much to change there. However, the interesting case is #3 - dead branch elimination. A further analysis of the single retry group showed that 42% of the methods (16305) that required a single retry did so only because of dead branch elimination. The big question here is why so many dead branches survive to the assembly stage. 
We have a dead branch elimination pass which is supposed to catch these - perhaps it's not working correctly, should be moved later in the optimization process, or perhaps run multiple times. Other things to consider: o Combine the offset generation pass with the assembly pass. Skip pc-relative fixup assembly (other than assigning offset), but push LIR* for them into work list. Following the main pass, zip through the work list and assemble the pc-relative instructions (now that we know the offsets). This would significantly cut back on traversal costs. o Store the assembled bits into both the code buffer and the LIR. In the event we have to retry, only the pc-relative instructions would need to be assembled, and we'd finish with a pass over the LIR just to dump the bits into the code buffer. Change-Id: I50029d216fa14f273f02b6f1c8b6a0dde5a7d6a6
56c717860df2d71d66fb77aa77f29dd346e559d3	06-Sep-2013	buzbee <buzbee@google.com>	Compile-time tuning Specialized the dataflow iterators and did a few other minor tweaks. Showing ~5% compile-time improvement in a single-threaded environment; less in multi-threaded (presumably because we're blocked by something else). Change-Id: I2e2ed58d881414b9fc97e04cd0623e188259afd2
9b297bfc588c7d38efd12a6f38cd2710fc513ee3	06-Sep-2013	Ian Rogers <irogers@google.com>	Refactor CompilerDriver::Compute..FieldInfo Don't use non-const reference arguments. Move ins before outs. Change-Id: I7b251156388d8f07513b3da62ebfd29e5fd9ff76
11b63d13f0a3be0f74390b66b58614a37f9aa6c1	27-Aug-2013	buzbee <buzbee@google.com>	Quick compiler: division by literal fix The constant propagation optimization pass attempts to identify constants in Dalvik virtual registers and handle them more efficiently. The use of small constants in division, though, was handled incorrectly in that the high level code correctly detected the use of a constant, but the actual code generation routine was only expecting the use of a special constant form opcode. see b/10503566 Change-Id: I88aa4d2eafebb2b1af1a1e88049f1845aefae261
96faf5b363d922ae91cf25404dee0e87c740c7c5	10-Aug-2013	Ian Rogers <irogers@google.com>	Uleb128 compression of vmap and mapping table. Bug 9437697. Change-Id: I30bcb97d12cd8b46d3b2cdcbdd358f08fbb9947a (cherry picked from commit 1809a72a66d245ae598582d658b93a24ac3bf01e)
468532ea115657709bc32ee498e701a4c71762d4	05-Aug-2013	Ian Rogers <irogers@google.com>	Entry point clean up. Create set of entry points needed for image methods to avoid fix-up at load time: - interpreter - bridge to interpreter, bridge to compiled code - jni - dlsym lookup - quick - resolution and bridge to interpreter - portable - resolution and bridge to interpreter Fix JNI work around to use JNI work around argument rewriting code that'd been accidentally disabled. Remove abstract method error stub, use interpreter bridge instead. Consolidate trampoline (previously stub) generation in generic helper. Simplify trampolines to jump directly into assembly code, keeps stack crawlable. Dex: replace use of int with ThreadOffset for values that are thread offsets. Tidy entry point routines between interpreter, jni, quick and portable. Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e (cherry picked from commit 848871b4d8481229c32e0d048a9856e5a9a17ef9)
1809a72a66d245ae598582d658b93a24ac3bf01e	10-Aug-2013	Ian Rogers <irogers@google.com>	Uleb128 compression of vmap and mapping table. Bug 9437697. Change-Id: I30bcb97d12cd8b46d3b2cdcbdd358f08fbb9947a
848871b4d8481229c32e0d048a9856e5a9a17ef9	05-Aug-2013	Ian Rogers <irogers@google.com>	Entry point clean up. Create set of entry points needed for image methods to avoid fix-up at load time: - interpreter - bridge to interpreter, bridge to compiled code - jni - dlsym lookup - quick - resolution and bridge to interpreter - portable - resolution and bridge to interpreter Fix JNI work around to use JNI work around argument rewriting code that'd been accidentally disabled. Remove abstract method error stub, use interpreter bridge instead. Consolidate trampoline (previously stub) generation in generic helper. Simplify trampolines to jump directly into assembly code, keeps stack crawlable. Dex: replace use of int with ThreadOffset for values that are thread offsets. Tidy entry point routines between interpreter, jni, quick and portable. Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e
7934ac288acfb2552bb0b06ec1f61e5820d924a4	26-Jul-2013	Brian Carlstrom <bdc@google.com>	Fix cpplint whitespace/comments issues Change-Id: Iae286862c85fb8fd8901eae1204cd6d271d69496
6f485c62b9cfce3ab71020c646ab9f48d9d29d6d	19-Jul-2013	Brian Carlstrom <bdc@google.com>	Fix cpplint whitespace/indent issues Change-Id: I7c1647f0c39e1e065ca5820f9b79998691ba40b1
9b7085a4e7c40e7fa01932ea1647a4a33ac1c585	19-Jul-2013	Brian Carlstrom <bdc@google.com>	Fix cpplint readability/braces issues Change-Id: I56b88956510077b0e13aad4caee8898313fab55b
df62950e7a32031b82360c407d46a37b94188fbb	18-Jul-2013	Brian Carlstrom <bdc@google.com>	Fix cpplint whitespace/parens issues Change-Id: Ifc678d59a8bed24ffddde5a0e543620b17b0aba9
0cd7ec2dcd8d7ba30bf3ca420b40dac52849876c	18-Jul-2013	Brian Carlstrom <bdc@google.com>	Fix cpplint whitespace/blank_line issues Change-Id: Ice937e95e23dd622c17054551d4ae4cebd0ef8a2
2ce745c06271d5223d57dbf08117b20d5b60694a	18-Jul-2013	Brian Carlstrom <bdc@google.com>	Fix cpplint whitespace/braces issues Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
fc0e3219edc9a5bf81b166e82fd5db2796eb6a0d	17-Jul-2013	Brian Carlstrom <bdc@google.com>	Fix multiple inclusion guards to match new pathnames Change-Id: Id7735be1d75bc315733b1773fba45c1deb8ace43
7940e44f4517de5e2634a7e07d58d0fb26160513	12-Jul-2013	Brian Carlstrom <bdc@google.com>	Create separate Android.mk for main build targets The runtime, compiler, dex2oat, and oatdump now are in separate trees to prevent dependency creep. They can now be individually built without rebuilding the rest of the art projects. dalvikvm and jdwpspy were already this way. Builds in the art directory should behave as before, building everything including tests. Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81