History log of /art/compiler/dex/quick/arm/call_arm.cc
Revision	Date	Author	Comments
3d21bdf8894e780d349c481e5c9e29fe1556051c	22-Apr-2015	Mathieu Chartier <mathieuc@google.com>	Move mirror::ArtMethod to native Optimizing + quick tests are passing, devices boot. TODO: Test and fix bugs in mips64. Saves 16 bytes per most ArtMethod, 7.5MB reduction in system PSS. Some of the savings are from removal of virtual methods and direct methods object arrays. Bug: 19264997 (cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33) Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d Fix some ArtMethod related bugs Added root visiting for runtime methods, not currently required since the GcRoots in these methods are null. Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes --trace run-tests 005, 044. Fixed optimizing compiler bug where we used a normal stack location instead of double on ARM64, this fixes the debuggable tests. TODO: Fix JDWP tests. Bug: 19264997 Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3 ART: Fix casts for 64-bit pointers on 32-bit compiler. Bug: 19264997 Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457 Fix JDWP tests after ArtMethod change Fixes Throwable::GetStackDepth for exception event detection after internal stack trace representation change. Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of proxy method. Bug: 19264997 Change-Id: I363e293796848c3ec491c963813f62d868da44d2 Fix accidental IMT and root marking regression Was always using the conflict trampoline. Also included fix for regression in GC time caused by extra roots. Most of the regression was IMT. Fixed bug in DumpGcPerformanceInfo where we would get SIGABRT due to detached thread. EvaluateAndApplyChanges: From ~2500 -> ~1980 GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots Bug: 19264997 Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0 Fix bogus image test assert Previously we were comparing the size of the non moving space to size of the image file. Now we properly compare the size of the image space against the size of the image file. Bug: 19264997 Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a [MIPS64] Fix art_quick_invoke_stub argument offsets. ArtMethod reference's size got bigger, so we need to move other args and leave enough space for ArtMethod* and 'this' pointer. This fixes mips64 boot. Bug: 19264997 Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
41b175aba41c9365a1c53b8a1afbd17129c87c14	19-May-2015	Vladimir Marko <vmarko@google.com>	ART: Clean up arm64 kNumberOfXRegisters usage. Avoid undefined behavior for arm64 stemming from 1u << 32 in loops with upper bound kNumberOfXRegisters. Create iterators for enumerating bits in an integer either from high to low or from low to high and use them for <arch>Context::FillCalleeSaves() on all architectures. Refactor runtime/utils.{h,cc} by moving all bit-fiddling functions to runtime/base/bit_utils.{h,cc} (together with the new bit iterators) and all time-related functions to runtime/base/time_utils.{h,cc}. Improve test coverage and fix some corner cases for the bit-fiddling functions. Bug: 13925192 (cherry picked from commit 80afd02024d20e60b197d3adfbb43cc303cf29e0) Change-Id: I905257a21de90b5860ebe1e39563758f721eab82
848f70a3d73833fc1bf3032a9ff6812e429661d9	15-Jan-2014	Jeff Hao <jeffhao@google.com>	Replace String CharArray with internal uint16_t array. Summary of high level changes: - Adds compiler inliner support to identify string init methods - Adds compiler support (quick & optimizing) with new invoke code path that calls method off the thread pointer - Adds thread entrypoints for all string init methods - Adds map to verifier to log when receiver of string init has been copied to other registers. used by compiler and interpreter Change-Id: I797b992a8feb566f9ad73060011ab6f51eb7ce01
2cebb24bfc3247d3e9be138a3350106737455918	22-Apr-2015	Mathieu Chartier <mathieuc@google.com>	Replace NULL with nullptr Also fixed some lines that were too long, and a few other minor details. Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb
1109fb3cacc8bb667979780c2b4b12ce5bb64549	07-Apr-2015	David Srbecky <dsrbecky@google.com>	Implement CFI for Quick. CFI is necessary for stack unwinding in gdb, lldb, and libunwind. Change-Id: Ic3b84c9dc91c4bae80e27cda02190f3274e95ae8
cc23481b66fd1f2b459d82da4852073e32f033aa	07-Apr-2015	Vladimir Marko <vmarko@google.com>	Promote pointer to dex cache arrays on arm. Do the use-count analysis on temps (ArtMethod* and the new PC-relative temp) in Mir2Lir, rather than MIRGraph. MIRGraph isn't really supposed to know how the ArtMethod* is used by the backend. Change-Id: Iaf56a46ae203eca86281b02b54f39a80fe5cc2dd
e5c76c515a481074aaa6b869aa16490a47ba98bc	06-Apr-2015	Vladimir Marko <vmarko@google.com>	PC-relative loads from dex cache arrays for arm. Change-Id: Ic25df4b51a901ff1d2ca356b5eec71d4acc5d9b7
6f7158927fee233255f8e96719c374694b10cad3	30-Mar-2015	David Srbecky <dsrbecky@google.com>	Write .debug_line section using the new DWARF library. Also simplify dex to java mapping and handle mapping in prologues and epilogues. Change-Id: I410f06024580f2a8788f2c93fe9bca132805029a
20f85597828194c12be10d3a927999def066555e	19-Mar-2015	Vladimir Marko <vmarko@google.com>	Fixed layout for dex caches in boot image. Define a fixed layout for dex cache arrays (type, method, string and field arrays) for dex caches in the boot image. This gives those arrays fixed offsets from the boot image code and allows PC-relative addressing of their elements. Use the PC-relative load on arm64 for relevant instructions, i.e. invoke-static, invoke-direct, const-string, const-class, check-cast and instance-of. This reduces the arm64 boot.oat on Nexus 9 by 1.1MiB. This CL provides the infrastructure and shows on the arm64 the gains that we can achieve by having fixed dex cache arrays' layout. To fully use this for the boot images, we need to implement the PC-relative addressing for other architectures. To achieve similar gains for apps, we need to move the dex cache arrays to a .bss section of the oat file. These changes will be implemented in subsequent CLs. (Also remove some compiler_driver.h dependencies to reduce incremental build times.) Change-Id: Ib1859fa4452d01d983fd92ae22b611f45a85d69b
f6737f7ed741b15cfd60c2530dab69f897540735	23-Mar-2015	Vladimir Marko <vmarko@google.com>	Quick: Clean up Mir2Lir codegen. Clean up WrapPointer()/UnwrapPointer() and OpPcRelLoad(). Change-Id: I1a91f01e1e779599c77f3f6efcac2a6ad34629cf
0b40ecf156e309aa17c72a28cd1b0237dbfb8746	20-Mar-2015	Vladimir Marko <vmarko@google.com>	Quick: Clean up slow paths. Change-Id: I278d42be77b02778c4a419ae9024b37929915b64
e15ea086439b41a805d164d2beb07b4ba96aaa97	10-Feb-2015	Hiroshi Yamauchi <yamauchi@google.com>	Reserve bits in the lock word for read barriers. This prepares for the CC collector to use the standard object header model by storing the read barrier state in the lock word. Bug: 19355854 Bug: 12687968 Change-Id: Ia7585662dd2cebf0479a3e74f734afe5059fb70f
6ce3eba0f2e6e505ed408cdc40d213c8a512238d	16-Feb-2015	Vladimir Marko <vmarko@google.com>	Add suspend checks to special methods. Generate suspend checks at the beginning of special methods. If we need to call to runtime, go to the slow path where we create a simplified but valid frame, spill all arguments, call art_quick_test_suspend, restore necessary arguments and return back to the fast path. This keeps the fast path overhead to a minimum. Bug: 19245639 Change-Id: I3de5aee783943941322a49c4cf2c4c94411dbaa2
72f53af0307b9109a1cfc0671675ce5d45c66d3a	12-Nov-2014	Chao-ying Fu <chao-ying.fu@intel.com>	ART: Remove MIRGraph::dex_pc_to_block_map_ This patch removes MIRGraph::dex_pc_to_block_map_, adds a local variable dex_pc_to_block_map inside MIRGraph::InlineMethod(), and updates several functions to pass dex_pc_to_block_map. The goal is to limit the scope of dex_pc_to_block_map and the usage of FindBlock, so that various compiler optimizations cannot rely on dex pc to look up basic blocks to avoid duplicated dex pc issues. Also, this patch changes quick targets to use successor blocks for switch case target generation at Mir2Lir::InstallSwitchTables(). Change-Id: I9f571efebd2706b4e1606279bd61f3b406ecd1c4 Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
0b9203e7996ee1856f620f95d95d8a273c43a3df	23-Jan-2015	Andreas Gampe <agampe@google.com>	ART: Some Quick cleanup Make several fields const in CompilationUnit. May benefit some Mir2Lir code that repeats tests, and in general immutability is good. Remove compiler_internals.h and refactor some other headers to reduce overly broad imports (and thus forced recompiles on changes). Change-Id: I898405907c68923581373b5981d8a85d2e5d185a
7e499925f8b4da46ae51040e9322690f3df992e6	06-Jan-2015	Andreas Gampe <agampe@google.com>	ART: Remove LowestSetBit and IsPowerOfTwo Remove those functions from Mir2Lir and replace with functionality from utils.h. Change-Id: Ieb67092b22d5d460b5241c7c7931c15b9faf2815
9d5c25acdd1e9635fde8f8bf52a126b4d371dabd	26-Nov-2014	Vladimir Marko <vmarko@google.com>	Quick: Use 16-bit Thumb2 PUSH/POP when possible. Generate correct PUSH/POP in Gen{Entry,Exit}Sequence() to avoid extra processing during insn fixup. Change-Id: I396168e2a42faee6980d40779c7de9657531867b
bf535be514570fc33fc0a6347a87dcd9097d9bfd	19-Nov-2014	Vladimir Marko <vmarko@google.com>	Add card mark to filled-new-array. Bug: 18032332 Change-Id: I35576b27f9115e4d0b02a11afc5e483b9e93a04a
2d7210188805292e463be4bcf7a133b654d7e0ea	10-Nov-2014	Mathieu Chartier <mathieuc@google.com>	Change 64 bit ArtMethod fields to be pointer sized Changed the 64 bit entrypoint and gc map fields in ArtMethod to be pointer sized. This saves a large amount of memory on 32 bit systems. Reduces ArtMethod size by 16 bytes on 32 bit. Total number of ArtMethod on low memory mako: 169957 Image size: 49203 methods -> 787248 image size reduction. Zygote space size: 1070 methods -> 17120 size reduction. App methods: ~120k -> 2 MB savings. Savings per app on low memory mako: 125K+ per app (less active apps -> more image methods per app). Savings depend on how often the shared methods are on dirty pages vs shared. TODO in another CL, delete gc map field from ArtMethod since we should be able to get it from the Oat method header. Bug: 17643507 Change-Id: Ie9508f05907a9f693882d4d32a564460bf273ee8 (cherry picked from commit e832e64a7e82d7f72aedbd7d798fb929d458ee8f)
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f	31-Oct-2014	Ian Rogers <irogers@google.com>	Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags. Fix associated errors about unused parameters and implicit sign conversions. For sign conversion this was largely in the area of enums, so add ostream operators for the affected enums and fix tools/generate-operator-out.py. Tidy arena allocation code and arena allocated data types, rather than fixing new and delete operators. Remove dead code. Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
832336b3c9eb892045a8de1bb12c9361112ca3c5	09-Oct-2014	Ian Rogers <irogers@google.com>	Don't copy fill array data to quick literal pool. Currently quick copies the fill array data from the dex file to the literal pool. It then has to go through hoops to pass this PC relative address down to out-of-line code. Instead, pass the offset of the table to the out-of-line code and use the CodeItem data associated with the ArtMethod. This reduces the size of oat code while greatly simplifying it. Unify the FillArrayData implementation in quick, portable and the interpreters. Change-Id: I9c6971cf46285fbf197856627368c0185fdc98ca
f4da675bbc4615c5f854c81964cac9dd1153baea	01-Aug-2014	Vladimir Marko <vmarko@google.com>	Implement method calls using relative BL on ARM. Store the linker patches with each CompiledMethod instead of keeping them in CompilerDriver. Reorganize oat file creation to apply the patches as we're writing the method code. Add framework for platform-specific relative call patches in the OatWriter. Implement relative call patches for ARM. Change-Id: Ie2effb3d92b61ac8f356140eba09dc37d62290f8
e39c54ea575ec710d5e84277fcdcc049f8acb3c9	22-Sep-2014	Vladimir Marko <vmarko@google.com>	Deprecate GrowableArray, use ArenaVector instead. Purge GrowableArray from Quick and Portable. Remove GrowableArray<T>::Iterator. Change-Id: I92157d3a6ea5975f295662809585b2dc15caa1c6
8d0d03e24325463f0060abfd05dba5598044e9b1	07-Jun-2014	Razvan A Lupusoru <razvan.a.lupusoru@intel.com>	ART: Change temporaries to positive names Changes compiler temporaries to have positive names. The numbering now puts them above the code VRs (locals + ins, in that order). The patch also introduces APIs to query the number of temporaries, locals and ins. The compiler temp infrastructure suffered from several issues which are also addressed by this patch: -There is no longer a queue of compiler temps. This would be polluted with Method* when post opts were called multiple times. -Sanity checks have been added to allow requesting of temps from BE and to prevent temps after frame is committed. -None of the structures holding temps can overflow because they are allocated to allow holding maximum temps. Thus temps can be requested by BE with no problem. -Since the queue of compiler temps is no longer maintained, it is no longer possible to refer to a temp that has invalid ssa (because it was requested before ssa was run). -The BE can now request temps after all ME allocations and it is guaranteed to actually receive them. -ME temps are now treated like normal VRs in all cases with no special handling. Only the BE temps are handled specially because there are no references to them from MIRs. -Deprecated and removed several fields in CompilationUnit that saved register information and updated callsites to call the new interface from MIRGraph. Change-Id: Ia8b1fec9384a1a83017800a59e5b0498dfb2698c Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
b038ba66a166fb264ca121632f447712e0973b5b	14-Aug-2014	Dave Allison <dallison@google.com>	Revert "Revert "Reduce stack usage for overflow checks"" Fixes stack protection issue. Fixes mac build issue. This reverts commit 83b1940e6482b9d8feba5c492507735686650ea5. Change-Id: I7ba17252882b23a740bcda2ea94aacf398255406
4cf00ba324f5f6884059796a6ba41937f32e1844	14-Aug-2014	Dave Allison <dallison@google.com>	Revert "Reduce stack usage for overflow checks" This reverts commit 63c051a540e6dfc806f656b88ac3a63e99395429. Change-Id: I282a048994fcd130fe73842b16c21680053c592f
03c9785a8a6d712775cf406c4371d0227c44148f	14-Aug-2014	Dave Allison <dallison@google.com>	Revert "Revert "Reduce stack usage for overflow checks"" Fixes stack protection issue. Fixes mac build issue. This reverts commit 83b1940e6482b9d8feba5c492507735686650ea5. Change-Id: I7ba17252882b23a740bcda2ea94aacf398255406
83b1940e6482b9d8feba5c492507735686650ea5	14-Aug-2014	Dave Allison <dallison@google.com>	Revert "Reduce stack usage for overflow checks" This reverts commit 63c051a540e6dfc806f656b88ac3a63e99395429. Change-Id: I282a048994fcd130fe73842b16c21680053c592f
63c051a540e6dfc806f656b88ac3a63e99395429	26-Jul-2014	Dave Allison <dallison@google.com>	Reduce stack usage for overflow checks This reduces the stack space reserved for overflow checks to 12K, split into an 8K gap and a 4K protected region. GC needs over 8K when running in a stack overflow situation. Also prevents signal runaway by detecting a signal inside code that resulted from a signal handler invocation. And adds a max signal count to the SignalTest to prevent it running forever. Also reduces the number of iterations for the InterfaceTest as this was taking (almost) forever with the --trace option on run-test. Bug: 15435566 Change-Id: Id4fd46f22d52d42a9eb431ca07948673e8fda694 Conflicts: compiler/optimizing/code_generator_x86_64.cc runtime/arch/x86/fault_handler_x86.cc runtime/arch/x86_64/quick_entrypoints_x86_64.S
648d7112609dd19c38131b3e71c37bcbbd19d11e	26-Jul-2014	Dave Allison <dallison@google.com>	Reduce stack usage for overflow checks This reduces the stack space reserved for overflow checks to 12K, split into an 8K gap and a 4K protected region. GC needs over 8K when running in a stack overflow situation. Also prevents signal runaway by detecting a signal inside code that resulted from a signal handler invocation. And adds a max signal count to the SignalTest to prevent it running forever. Also reduces the number of iterations for the InterfaceTest as this was taking (almost) forever with the --trace option on run-test. Bug: 15435566 Change-Id: Id4fd46f22d52d42a9eb431ca07948673e8fda694
8c18c2aaedb171f9b03ec49c94b0e33449dc411b	06-Aug-2014	Andreas Gampe <agampe@google.com>	ART: Generate chained compare-and-branch for short switches Refactor Mir2Lir to generate chained compare-and-branch sequences for short switches on all architectures. Bug: 16241558 (cherry picked from commit 48971b3242e5126bcd800cc9c68df64596b43d13) Change-Id: I0bb3071b8676523e90e0258e9b0e3fd69c1237f4
48971b3242e5126bcd800cc9c68df64596b43d13	06-Aug-2014	Andreas Gampe <agampe@google.com>	ART: Generate chained compare-and-branch for short switches Refactor Mir2Lir to generate chained compare-and-branch sequences for short switches on all architectures. Change-Id: Ie2a572ae69d462ba68a119e9fb93ae538cddd08f
0f45f22eb3c52f0ece4c56989180e79c6680d825	15-Jul-2014	Andreas Gampe <agampe@google.com>	ART: Throw StackOverflowError in native code Initialize stack-overflow errors in native code to be able to reduce the preserved area size of the stack. Includes a refactoring away from constexpr in instruction_set.h to allow for easy changing of the values. Bug: 16256184 (cherry picked from commit 7ea6f79bbddd69d5db86a8656a31aaaf64ae2582) Change-Id: I117cc8485f43da5f0a470f0f5e5b3dc3b5a06246
7ea6f79bbddd69d5db86a8656a31aaaf64ae2582	15-Jul-2014	Andreas Gampe <agampe@google.com>	ART: Throw StackOverflowError in native code Initialize stack-overflow errors in native code to be able to reduce the preserved area size of the stack. Includes a refactoring away from constexpr in instruction_set.h to allow for easy changing of the values. Change-Id: I117cc8485f43da5f0a470f0f5e5b3dc3b5a06246
147eb41b53729ec8d5c188d1cac90964a51afb8a	11-Jul-2014	Dave Allison <dallison@google.com>	Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86"""" This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1. Bug: 16256184 Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73 Conflicts: compiler/dex/quick/arm64/target_arm64.cc compiler/image_test.cc runtime/fault_handler.cc
69dfe51b684dd9d510dbcb63295fe180f998efde	11-Jul-2014	Dave Allison <dallison@google.com>	Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86"""" This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1. Bug: 16256184 Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73
48f5c47907654350ce30a8dfdda0e977f5d3d39f	27-Jun-2014	Hans Boehm <hboehm@google.com>	Replace memory barriers to better reflect Java needs. Replaces barriers that enforce ordering of one access type (e.g. Load) with respect to another (e.g. store) with more general ones that better reflect both Java requirements and actual hardware barrier/fence instructions. The old code was inconsistent and unclear about which barriers implied which others. Sometimes multiple barriers were generated and then eliminated; sometimes it was assumed that certain barriers implied others. The new barriers closely parallel those in C++11, though, for now, we use something closer to the old naming. Bug: 14685856 Change-Id: Ie1c80afe3470057fc6f2b693a9831dfe83add831
7fb36ded9cd5b1d254b63b3091f35c1e6471b90e	10-Jul-2014	Dave Allison <dallison@google.com>	Revert "Revert "Add implicit null and stack checks for x86"" Fixes x86_64 cross compile issue. Removes command line options and property to set implicit checks - this is hard coded now. This reverts commit 3d14eb620716e92c21c4d2c2d11a95be53319791. Change-Id: I5404473b5aaf1a9c68b7181f5952cb174d93a90d
0025a86411145eb7cd4971f9234fc21c7b4aced1	11-Jul-2014	Nicolas Geoffray <ngeoffray@google.com>	Revert "Revert "Revert "Add implicit null and stack checks for x86""" Broke the build. This reverts commit 7fb36ded9cd5b1d254b63b3091f35c1e6471b90e. Change-Id: I9df0e7446ff0913a0e1276a558b2ccf6c8f4c949
de68676b24f61a55adc0b22fe828f036a5925c41	24-Jun-2014	Andreas Gampe <agampe@google.com>	Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter" This reverts commit 2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d. Breaks the build. Change-Id: I9faad4e9a83b32f5f38b2ef95d6f9a33345efa33
3c12c512faf6837844d5465b23b9410889e5eb11	24-Jun-2014	Andreas Gampe <agampe@google.com>	Revert "Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter"" This reverts commit de68676b24f61a55adc0b22fe828f036a5925c41. Fixes an API comment, and differentiates between inserting and appending. Change-Id: I0e9a21bb1d25766e3cbd802d8b48633ae251a6bf
2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d	23-Jun-2014	Andreas Gampe <agampe@google.com>	ART: Split out more cases of Load/StoreRef, volatile as parameter Splits out more cases of ref registers being loaded or stored. For code clarity, adds volatile as a flag parameter instead of a separate method. On ARM64, continue cleanup. Add flags to print/fatal on size mismatches. Change-Id: I30ed88433a6b4ff5399aefffe44c14a5e6f4ca4e
5655e84e8d71697d8ef3ea901a0b853af42c559e	18-Jun-2014	Andreas Gampe <agampe@google.com>	ART: Implicit checks in the compiler are independent from Runtime When cross-compiling, those flags are independent. This is an initial CL that helps bypass fatal failures when cross-compiling, as not all architectures support (and have turned on) implicit checks. The actual transport for the target architecture when it is different from the runtime needs to be implemented in a follow-up CL. Bug: 15703710 Change-Id: Idc881a9a4abfd38643b862a491a5af9b8841f693
7cd26f355ba83be75b72ed628ed5ee84a3245c4f	19-Jun-2014	Andreas Gampe <agampe@google.com>	ART: Target-dependent stack overflow, less check elision Refactor the separate stack overflow reserved sizes from thread.h into instruction_set.h and make sure they're used in the compiler. Refactor the decision on when to elide stack overflow checks: especially with large interpreter stack frames, it is not a good idea to elide checks when the frame size is even close to the reserved size. Currently enforce checks when the frame size is >= 2KB, but make sure that frame sizes 1KB and below will elide the checks (number from experience). Bug: 15728765 Change-Id: I016bfd3d8218170cbccbd123ed5e2203db167c06
8dea81ca9c0201ceaa88086b927a5838a06a3e69	06-Jun-2014	Vladimir Marko <vmarko@google.com>	Rewrite use/def masks to support 128 bits. Reduce LIR memory usage by holding masks by pointers in the LIR rather than directly and using pre-defined const masks for the common cases, allocating very few on the arena. Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
576ca0cd692c0b6ae70e776de91015b8ff000a08	07-Jun-2014	Ian Rogers <irogers@google.com>	Reduce header files including header files. Main focus is getting heap.h out of runtime.h. Change-Id: I8d13dce8512816db2820a27b24f5866cc871a04b
a0cd2d701f29e0bc6275f1b13c0edfd4ec391879	01-Jun-2014	buzbee <buzbee@google.com>	Quick compiler: reference cleanup For 32-bit targets, object references are 32 bits wide both in Dalvik virtual registers and in core physical registers. Because of this, object references and non-floating point values were both handled as if they had the same register class (kCoreReg). However, for 64-bit systems, references are 32 bits in Dalvik vregs, but 64 bits in physical registers. Although the same underlying physical core registers will still be used for object reference and non-float values, different register class views will be used to represent them. For example, an object reference in arm64 might be held in x3 at some point, while the same underlying physical register, w3, would be used to hold a 32-bit int. This CL breaks apart the handling of object reference and non-float values to allow the proper register class (or register view) to be used. A new register class, kRefReg, is introduced which will map to a 32-bit core register on 32-bit targets, and 64-bit core registers on 64-bit targets. From this point on, object references should be allocated registers in the kRefReg class rather than kCoreReg. Change-Id: I6166827daa8a0ea3af326940d56a6a14874f5810
fe8cf8b1c1b4af0f8b4bb639576f7a5fc59f52ea	15-May-2014	Bill Buzbee <buzbee@google.com>	Quick Compiler: fix Arm cts failures Fixes move_wide_16#testN1, move_wide_16#testN2 Two bugs for the price of one (thanks CTS!) First, the new stack overflow checking code was broken for very large frames. For Arm on method entry, we only have 1 available temp register, r12, until argument registers are flushed. Previously, for explicit checks on large frames, r12 was immediately loaded with the stack_end value. However, later on when the frame is extended, if the frame size exceeds the range of a reg-reg-imm subtract, the codegen utilities will allocate a new temporary register to complete the operation. r12 was getting clobbered. Similarly, for medium-large frames r12 could get clobbered during frame creation. What we should always do when directly using fixed registers like this is to lock them to prevent them from being allocated as a temp. The other half of the first bug is easily solved by delaying the load of stack_end until after the new sp is computed. We'll increase the stall cost, but this is an uncommon case. The second bug was likely a typo in LoadValueDisp(). I'm a bit surprised we hadn't hit this one earlier - but perhaps it was recently introduced. The wrong base register was being used in the non-float, wide, excessive offset case (which I suppose is also somewhat uncommon). Cherry-pick of internal commit If5b30f729e31d86db604045dd7581fd4626e0b55 Change-Id: If5b30f729e31d86db604045dd7581fd4626e0b55
b14329f90f725af0f67c45dfcb94933a426d63ce	15-May-2014	Andreas Gampe <agampe@google.com>	ART: Fix MonitorExit code on ARM We do not emit barriers on non-SMP systems. But on ARM, we have places that need to conditionally execute, which is done through an IT instruction. The guide of said instruction thus changes between SMP and non-SMP systems. To cleanly approach this, change the API so that GenMemBarrier returns whether it generated an instruction. ARM will have to query the result and update any dependent IT. Throw a build system error if TARGET_CPU_SMP is not set. Fix runtime/Android.mk to work with new multilib host. Bug: 14989275 Change-Id: I9e611b770e8a1cd4ca19367d7dae0573ec08dc61
56e86eaf73eb3efa029f2dd53b2d21e3597d8e5f	15-May-2014	Bill Buzbee <buzbee@google.com>	Revert "Revert "Quick Compiler: fix Arm cts failures"" It turns out the medium-large frame explicit stack overflow check was also broken in a similar way: r12 was live going into the frame extension, but the frame extension code sometimes needs a free temp. This reverts commit 9cf44af1a223f905457688931317a4e4cb086a84. Change-Id: If5b30f729e31d86db604045dd7581fd4626e0b55
9cf44af1a223f905457688931317a4e4cb086a84	15-May-2014	Bill Buzbee <buzbee@google.com>	Revert "Quick Compiler: fix Arm cts failures" Error detected on further testing. This reverts commit 06a4809f271c44ec1491e0b07ae9974aa35bc8ad. Change-Id: Ia7b6b463f6422abac432f1a9484e4e080d003148
06a4809f271c44ec1491e0b07ae9974aa35bc8ad	14-May-2014	buzbee <buzbee@google.com>	Quick Compiler: fix Arm cts failures Fixes move_wide_16#testN1, move_wide_16#testN2 Two bugs for the price of one (thanks CTS!) First, the new stack overflow checking code was broken for very large frames. For Arm on method entry, we only have 1 available temp register, r12, until argument registers are flushed. Previously, for explicit checks on large frames, r12 was immediately loaded with the stack_end value. However, later on when the frame is extended, if the frame size exceeds the range of a reg-reg-imm subtract, the codegen utilities will allocate a new temporary register to complete the operation. r12 was getting clobbered. What we should always do when directly using fixed registers like this is to lock them to prevent them from being allocated as a temp. The other half of the first bug is easily solved by delaying the load of stack_end until after the new sp is computed. We'll increase the stall cost, but this is an uncommon case. The second bug was likely a typo in LoadValueDisp(). I'm a bit surprised we hadn't hit this one earlier - but perhaps it was recently introduced. The wrong base register was being used in the non-float, wide, excessive offset case (which I suppose is also somewhat uncommon). Change-Id: I2c5074c9570b022af680f472deac9fe72a2e827e
9b9dec8bbcb812315eb0b68b3465c6c567f09527	14-May-2014	Andreas Gampe <agampe@google.com>	ART: Fix ARM dmb placement in monitor-exit This moves the dmb in quick-compiled monitor-exit before the str performing the unlock. Change-Id: I231f98ff21eb7bac45b4a1b7ff57316deeb858cc
5cd33753b96d92c03e3cb10cb802e68fb6ef2f21	16-Apr-2014	Dave Allison <dallison@google.com>	Handle implicit stack overflow without affecting stack walks This changes the way in which implicit stack overflows are handled to satisfy concerns about changes to the stack walk code. Instead of creating a gap in the stack and checking for it in the stack walker, use the ManagedStack infrastructure to concoct an invisible gap that will never be seen by a stack walk. Also, this uses madvise to tell the kernel that the main stack's protected region will probably never be accessed, and instead of using memset to map the pages in, use memcpy to read from them. This will save 32K on the main stack. Also adds a 'signals' verbosity level as per a review request. Bug: 14066862 Change-Id: I5257305feeaea241d11e6aa6f021d2a81da20b81
091cc408e9dc87e60fb64c61e186bea568fc3d3a	31-Mar-2014	buzbee <buzbee@google.com>	Quick compiler: allocate doubles as doubles Significant refactoring of register handling to unify usage across all targets & 32/64 backends. Reworked RegStorage encoding to allow expanded use of x86 xmm registers; removed vector registers as a separate register type. Reworked RegisterInfo to describe aliased physical registers. Eliminated quite a bit of target-specific code and generalized common code. Use of RegStorage instead of int for registers now propagated down to the NewLIRx() level. In future CLs, the NewLIRx() routines will be replaced with versions that are explicit about what kind of operand they expect (RegStorage, displacement, etc.). The goal is to eventually use RegStorage all the way to the assembly phase. TBD: MIPS needs verification. TBD: Re-enable liveness tracking. Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
6ffcfa04ebb2660e238742a6000f5ccebdd5df15	25-Apr-2014	Mingyao Yang <mingyao@google.com>	Rewrite suspend test check with LIRSlowPath. Change-Id: I2dc17d079655586bfc588349c7a04afc2c6879af
695d13a82d6dd801aaa57a22a9d4b3f6db0d0fdb	19-Apr-2014	buzbee <buzbee@google.com>	Update load/store utilities for 64-bit backends This CL replaces the typical use of LoadWord/StoreWord utilities (which, in practice, were 32-bit load/store) in favor of a new set that make the size explicit. We now have: LoadWordDisp/StoreWordDisp: 32 or 64 depending on target. Load or store the natural word size. Expect this to be used infrequently - generally when we know we're dealing with a native pointer or flushed register not holding a Dalvik value (Dalvik values will flush to home location sizes based on Dalvik, rather than the target). Load32Disp/Store32Disp: Load or store 32 bits, regardless of target. Load64Disp/Store64Disp: Load or store 64 bits, regardless of target. LoadRefDisp: Load a 32-bit compressed reference, and expand it to the natural word size in the target register. StoreRefDisp: Compress a reference held in a register of the natural word size and store it as a 32-bit compressed reference. Change-Id: I50fcbc8684476abd9527777ee7c152c61ba41c6f
d6ed642458c8820e1beca72f3d7b5f0be4a4b64b	10-Apr-2014	Dave Allison <dallison@google.com>	Revert "Revert "Revert "Use trampolines for calls to helpers""" This reverts commit f9487c039efb4112616d438593a2ab02792e0304. Change-Id: Id48a4aae4ecce73db468587967968a3f7618b700
f9487c039efb4112616d438593a2ab02792e0304	09-Apr-2014	Dave Allison <dallison@google.com>	Revert "Revert "Use trampolines for calls to helpers"" This reverts commit 081f73e888b3c246cf7635db37b7f1105cf1a2ff. Change-Id: Ibd777f8ce73cf8ed6c4cb81d50bf6437ac28cb61 Conflicts: compiler/dex/quick/mir_to_lir.h
081f73e888b3c246cf7635db37b7f1105cf1a2ff	07-Apr-2014	Dave Allison <dallison@google.com>	Revert "Use trampolines for calls to helpers" This reverts commit 754ddad084ccb610d0cf486f6131bdc69bae5bc6. Change-Id: Icd979adee1d8d781b40a5e75daf3719444cb72e8
754ddad084ccb610d0cf486f6131bdc69bae5bc6	19-Feb-2014	Dave Allison <dallison@google.com>	Use trampolines for calls to helpers This is an ARM specific optimization to the compiler that uses trampoline islands to make calls to runtime helper functions. The intention is to reduce the size of the generated code (by 2 bytes per call) without affecting performance. By default this is on when generating an OAT file. It is off when compiling to memory. To switch this off in dex2oat, use the command line option: --no-helper-trampolines Enhances disassembler to print the trampoline entry on the BL instruction like this: 0xb6a850c0: f7ffff9e bl -196 (0xb6a85000) ; pTestSuspend Bug: 12607709 Change-Id: I9202bdb7cf21252ad807bd48701f1f6ce8e3d0fe
3da67a558f1fd3d8a157d8044d521753f3f99ac8	03-Apr-2014	Dave Allison <dallison@google.com>	Add OpEndIT() for marking the end of OpIT blocks In ARM we need to prevent code motion to the inside of an IT block. This was done using a GenBarrier() to mark the end, but it wasn't obvious that this is what was happening. This CL adds an explicit OpEndIT() that takes the LIR of the OpIT for future checks. Bug: 13751744 Change-Id: If41d2adea1f43f11ebb3b72906bd308252ce3d01
dd7624d2b9e599d57762d12031b10b89defc9807	15-Mar-2014	Ian Rogers <irogers@google.com>	Allow mixing of thread offsets between 32 and 64bit architectures. Begin a more full implementation of x86-64 REX prefixes. Doesn't implement 64bit thread offset support for the JNI compiler. Change-Id: If9af2f08a1833c21ddb4b4077f9b03add1a05147
f943914730db8ad2ff03d49a2cacd31885d08fd7	27-Mar-2014	Dave Allison <dallison@google.com>	Implement implicit stack overflow checks This also fixes some failing run tests due to missing null pointer markers. The implementation of the implicit stack overflow checks introduces the ability to have a gap in the stack that is skipped during stack walk backs. This gap is protected against read/write and is used to trigger a SIGSEGV at function entry if the stack will overflow. Change-Id: I0c3e214c8b87dc250cf886472c6d327b5d58653e
05a48b1f8e62564abb7c2fe674e3234d5861647f	01-Apr-2014	Mathieu Chartier <mathieuc@google.com>	Fix stack overflow slow path error. The frame size without spill was being passed into the slow path instead of the spill size. This was incorrect since only the spills will have been pushed at the point of the overflow check. Also addressed another comment. Change-Id: Ic6e455122473a8f796b291d71f945bcf72788662
2700f7e1edbcd2518f4978e4cd0e05a4149f91b6	07-Mar-2014	buzbee <buzbee@google.com>	Continuing register cleanup Ready for review. Continue the process of using RegStorage rather than ints to hold register value in the top layers of codegen. Given the huge number of changes in this CL, I've attempted to minimize the number of actual logic changes. With this CL, the use of ints for registers has largely been eliminated except in the lowest utility levels. "Wide" utility routines have been updated to take a single RegStorage rather than a pair of ints representing low and high registers. Upcoming CLs will be smaller and more targeted. My expectations: o Allocate float double registers as a single double rather than a pair of float single registers. o Refactor to push code which assumes long and double Dalvik values are held in a pair of register to the target dependent layer. o Clean-up of the xxx_mir.h files to reduce the amount of #defines for registers. May also do a register renumbering to bring all of our targets' register naming more consistent. Possibly introduce a target-independent float/non-float test at the RegStorage level. Change-Id: I646de7392bdec94595dd2c6f76e0f1c4331096ff
0d507d1e0441e6bd6f3affca3a60774ea920f317	19-Mar-2014	Mathieu Chartier <mathieuc@google.com>	Optimize stack overflow handling. We now subtract the frame size from the stack pointer for methods which have a frame smaller than a certain size. Also changed code to use slow paths instead of launchpads. Delete kStackOverflow launchpad since it is no longer needed. ARM optimizations: One less move per stack overflow check (without fault handler for stack overflows). Use ldr pc instead of ldr r12, b r12. Code size (boot.oat): Before: 58405348 After: 57803236 TODO: X86 doesn't have the case for large frames. This could cause an incoming signal to go past the end of the stack (unlikely however). Change-Id: Ie3a5635cd6fb09de27960e1f8cee45bfae38fb33
b373e091eac39b1a79c11f2dcbd610af01e9e8a9	21-Feb-2014	Dave Allison <dallison@google.com>	Implicit null/suspend checks (oat version bump) This adds the ability to use SEGV signals to throw NullPointerException exceptions from Java code rather than having the compiler generate explicit comparisons and branches. It does this by using sigaction to trap SIGSEGV and when triggered makes sure it's in compiled code and if so, sets the return address to the entry point to throw the exception. It also uses this signal mechanism to determine whether to check for thread suspension. Instead of the compiler generating calls to a function to check for threads being suspended, the compiler will now load indirect via an address in the TLS area. To trigger a suspend, the contents of this address are changed from something valid to 0. A SIGSEGV will occur and the handler will check for a valid instruction pattern before invoking the thread suspension check code. If a user program traps SIGSEGV it will prevent our signal handler working. This will cause a failure in the runtime. There are two signal handlers at present. You can control them individually using the flags -implicit-checks: on the runtime command line. This takes a string parameter, a comma separated set of strings. Each can be one of: none (switch off), null (null pointer checks), suspend (suspend checks), all (all checks). So to switch only suspend checks on, pass: -implicit-checks:suspend There is also -explicit-checks to provide the reverse once we change the default. For dalvikvm, pass --runtime-arg -implicit-checks:foo,bar The default is -implicit-checks:none There is also a property 'dalvik.vm.implicit_checks' whose value is the same string as the command option. The default is 'none'. For example to switch on null checks using the option: setprop dalvik.vm.implicit_checks null It only works for ARM right now. Bumps OAT version number due to change to Thread offsets. 
Bug: 13121132 Change-Id: If743849138162f3c7c44a523247e413785677370
83cc7ae96d4176533dd0391a1591d321b0a87f4f	12-Feb-2014	Vladimir Marko <vmarko@google.com>	Create a scoped arena allocator and use that for LVN. This saves more than 0.5s of boot.oat compilation time on Nexus 5. TODO: Move other stuff to the scoped allocator. This CL alone increases the peak memory allocation. By reusing the memory for other parts of the compilation we should reduce this overhead. Change-Id: Ifbc00aab4f3afd0000da818dfe68b96713824a08
00e1ec6581b5b7b46ca4c314c2854e9caa647dd2	28-Feb-2014	Bill Buzbee <buzbee@android.com>	Revert "Revert "Rework Quick compiler's register handling"" This reverts commit 86ec520fc8b696ed6f164d7b756009ecd6e4aace. Ready. Fixed the original typo, plus some mechanical changes for rebasing. Still needs additional testing, but the problem with the original CL appears to have been a typo in the definition of the x86 double return template RegLocation. Change-Id: I828c721f91d9b2546ef008c6ea81f40756305891
dbb8c49d540edd2a39076093163c7218f03aa502	28-Feb-2014	Vladimir Marko <vmarko@google.com>	Remove non-existent ARM insn kThumb2SubsRRI12. For kOpSub/kOpAdd, prefer modified immediate encodings because they set flags. Change-Id: I41dcd2d43ba1e62120c99eaf9106edc61c41e157
86ec520fc8b696ed6f164d7b756009ecd6e4aace	26-Feb-2014	Bill Buzbee <buzbee@android.com>	Revert "Rework Quick compiler's register handling" This reverts commit 2c1ed456dcdb027d097825dd98dbe48c71599b6c. Change-Id: If88d69ba88e0af0b407ff2240566d7e4545d8a99
2c1ed456dcdb027d097825dd98dbe48c71599b6c	20-Feb-2014	buzbee <buzbee@google.com>	Rework Quick compiler's register handling For historical reasons, the Quick backend found it convenient to consider all 64-bit Dalvik values held in registers to be contained in a pair of 32-bit registers. Though this worked well for ARM (with double-precision registers also treated as a pair of 32-bit single-precision registers) it doesn't play well with other targets. And, it is somewhat problematic for 64-bit architectures. This is the first of several CLs that will rework the way the Quick backend deals with physical registers. The goal is to eliminate the "64-bit value backed with 32-bit register pair" requirement from the target-independent portions of the backend and support 64-bit registers throughout. The key RegLocation struct, which describes the location of Dalvik virtual register & register pairs, previously contained fields for high and low physical registers. The low_reg and high_reg fields are being replaced with a new type: RegStorage. There will be a single instance of RegStorage for each RegLocation. Note that RegStorage does not increase the space used. It is 16 bits wide, the same as the sum of the 8-bit low_reg and high_reg fields. At a target-independent level, it will describe whether the physical register storage associated with the Dalvik value is a single 32 bit, single 64 bit, pair of 32 bit or vector. The actual register number encoding is left to the target-dependent code layer. Because physical register handling is pervasive throughout the backend, this restructuring necessarily involves large CLs with lots of changes. I'm going to roll these out in stages, and attempt to segregate the CLs with largely mechanical changes from those which restructure or rework the logic. This CL is of the mechanical change variety - it replaces low_reg and high_reg from RegLocation and introduces RegStorage. 
It also includes a lot of new code (such as many calls to GetReg()) that should go away in upcoming CLs. The tentative plan for the subsequent CLs is: o Rework standard register utilities such as AllocReg() and FreeReg() to use RegStorage instead of ints. o Rework the target-independent GenXXX, OpXXX, LoadValue, StoreValue, etc. routines to take RegStorage rather than int register encodings. o Take advantage of the vector representation and eliminate the current vector field in RegLocation. o Replace the "wide" variants of codegen utilities that take low_reg/high_reg pairs with versions that use RegStorage. o Add 64-bit register target independent codegen utilities where possible, and where not virtualize with 32-bit general register and 64-bit general register variants in the target dependent layer. o Expand/rework the LIR def/use flags to allow for more registers (currently, we lose out on 16 MIPS floating point regs as well as ARM's D16..D31 for lack of space in the masks). o [Possibly] move the float/non-float determination of a register from the target-dependent encoding to RegStorage. In other words, replace IsFpReg(register_encoding_bits). At the end of the day, all code in the target independent layer should be using RegStorage, as should much of the target dependent layer. Ideally, we won't be using the physical register number encoding extracted from RegStorage (i.e. GetReg()) until the NewLIRx() layer. Change-Id: Idc5c741478f720bdd1d7123b94e4288be5ce52cb
3bc01748ef1c3e43361bdf520947a9d656658bf8	06-Feb-2014	Razvan A Lupusoru <razvan.a.lupusoru@intel.com>	GenSpecialCase support for x86 Moved GenSpecialCase from being ARM specific to common code to allow it to be used by x86 quick as well. Change-Id: I728733e8f4c4da99af6091ef77e5c76ae0fee850 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
80b7f4f217958df6950291a5ae861249bf5b943d	13-Feb-2014	Vladimir Marko <vmarko@google.com>	am 47c42cae: am 76559681: Merge "Generate ARM special methods from InlineMethod data." * commit '47c42caef35bdc29229b2714d78e48fbd7dc57e6': Generate ARM special methods from InlineMethod data.
502c2a84888b7da075049dcaaeb0156602304f65	06-Feb-2014	Vladimir Marko <vmarko@google.com>	Generate ARM special methods from InlineMethod data. Change-Id: I204b01660a1e515879524018d1371e31f41da59b
c9bf407643329fee7eb2603fdace46eebf618cc6	10-Feb-2014	Vladimir Marko <vmarko@google.com>	Fix special getter/setter generation. Change-Id: I381618bdcc46c51b50e94042f332db99c3a71a38
2bc47809febcf36369dd40877b8226318642b428	10-Feb-2014	Vladimir Marko <vmarko@google.com>	Revert "Revert "Check FastInstance() early for special getters and setters."" This reverts commit 632e458dc267fadfb8120be3ab02701e09e64875. Change-Id: I5098c41ee84fbbb39397133a7ecfd367fecebe42
632e458dc267fadfb8120be3ab02701e09e64875	08-Feb-2014	Ian Rogers <irogers@google.com>	Revert "Check FastInstance() early for special getters and setters." This reverts commit 5dc5727261e87ba8a418e2d0e970c75f67e4ab79. Change-Id: I3299c8ca5c3ce3f2de994bab61ea16a734f1de33
f33ffde784fdbbc123dc7d2a5470f2b8ff6c3de4	08-Feb-2014	Ian Rogers <irogers@google.com>	Revert "Generate ARM special methods from InlineMethod data." This reverts commit 1ca62346583210f64092a44a74b5947d51826e7a. Change-Id: I9589fc6ff15fbea36edb99f6d336040a8e0b4e71
1ca62346583210f64092a44a74b5947d51826e7a	06-Feb-2014	Vladimir Marko <vmarko@google.com>	Generate ARM special methods from InlineMethod data. Change-Id: Icd3af7fae67f9bd33d218056509a23549d1eeba2
5dc5727261e87ba8a418e2d0e970c75f67e4ab79	05-Feb-2014	Vladimir Marko <vmarko@google.com>	Check FastInstance() early for special getters and setters. Perform the FastInstance() check for getters and setters when they are detected by the inliner. This will help avoid the FastInstance() check for inlining. We also record the field offset and whether the field is volatile and whether the method is static for use when inlining or generating the special accessors. Change-Id: I3f832fc9ae263883b8a984be89a3b7793398b55a
58af1f9385742f70aca4fcb5e13aba53b8be2ef4	19-Dec-2013	Vladimir Marko <vmarko@google.com>	Clean up usage of carry flag condition codes. On X86, kCondUlt and kCondUge are bound to CS and CC, respectively, while on ARM it's the other way around. The explicit binding in ConditionCode was wrong and misleading and could lead to subtle bugs. Therefore, we detach those constants and clean up usage. The CS and CC conditions are now effectively unused but we keep them around as they may eventually be useful. And some minor cleanup and comments. Change-Id: Ic5ed81d86b6c7f9392dd8fe9474b3ff718fee595
5816ed48bc339c983b40dc493e96b97821ce7966	27-Nov-2013	Vladimir Marko <vmarko@google.com>	Detect special methods at the end of verification. This moves special method handling to method inliner and prepares for eventual inlining of these methods. Change-Id: I51c51b940fb7bc714e33135cd61be69467861352
31c2aac7137b69d5622eea09597500731fbee2ef	09-Dec-2013	Vladimir Marko <vmarko@google.com>	Rename ClobberCalleeSave to Caller, fix it for x86. Change-Id: I6a72703a11985e2753fa9b4520c375a164301433
0d82948094d9a198e01aa95f64012bdedd5b6fc9	12-Oct-2013	buzbee <buzbee@google.com>	64-bit prep Preparation for 64-bit roll. o Eliminated storing pointers in 32-bit int slots in LIR. o General size reductions of common structures to reduce impact of doubled pointer sizes: - BasicBlock struct was 72 bytes, now is 48. - MIR struct was 72 bytes, now is 64. - RegLocation was 12 bytes, now is 8. o Generally replaced uses of BasicBlock* pointers with 16-bit Ids. o Replaced several doubly-linked lists with singly-linked to save one stored pointer per node. o We had quite a few uses of uintptr_t's that were a holdover from the JIT (which used pointers to mapped dex & actual code cache addresses rather than trace-relative offsets). Replaced those with uint32_t's. o Clean up handling of embedded data for switch tables and array data. o Miscellaneous cleanup. I anticipate one or two additional CLs to reduce the size of MIR and LIR structs. Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
d9c4fc94fa618617f94e1de9af5f034549100753	02-Oct-2013	Ian Rogers <irogers@google.com>	Inflate contended lock word by suspending owner. Bug 6961405. Don't inflate monitors for Notify and NotifyAll. Tidy lock word, handle recursive lock case alongside unlocked case and move assembly out of line (except for ARM quick). Also handle null in out-of-line assembly as the test is quick and the enter/exit code is already a safepoint. To gain ownership of a monitor on behalf of another thread, monitor contenders must not hold the monitor_lock_, so they wait on a condition variable. Reduce size of per mutex contention log. Be consistent in calling thin lock thread ids just thread ids. Fix potential thread death races caused by the use of FindThreadByThreadId, make it invariant that returned threads are either self or suspended now. Code size reduction on ARM boot.oat 0.2%. Old nexus 7 speedup 0.25%, new nexus 7 speedup 1.4%, nexus 10 speedup 2.24%, nexus 4 speedup 2.09% on DeltaBlue. Change-Id: Id52558b914f160d9c8578fdd7fc8199a9598576a
252254b130067cd7a5071865e793966871ae0246	09-Sep-2013	buzbee <buzbee@google.com>	More Quick compile-time tuning: labels & branches This CL represents a roughly 3.5% performance improvement for the compile phase of dex2oat. Most of the gain comes from avoiding the generation of dex boundary LIR labels unless a debug listing is requested. The other significant change is moving from a basic block ending branch model of "always generate a fall-through branch, and then delete it if we can" to a "only generate a fall-through branch if we need it" model. The data motivating these changes follow. Note that two areas of potentially attractive gain remain: restructuring the assembler model and reworking the register handling utilities. These will be addressed in subsequent CLs. --- data follows The Quick compiler's assembler has shown up on profile reports a bit more than seems reasonable. We've tried a few quick fixes to apparently hot portions of the code, but without much gain. So, I've been looking at the assembly process at a somewhat higher level. There look to be several potentially good opportunities. First, an analysis of the makeup of the LIR graph showed a surprisingly high proportion of LIR pseudo ops. Using the boot classpath as a basis, we get: 32.8% of all LIR nodes are pseudo ops. 10.4% are LIR instructions which require pc-relative fixups. 11.8% are LIR instructions that have been nop'd by the various optimization passes. Looking only at the LIR pseudo ops, we get: kPseudoDalvikByteCodeBoundary 43.46%, kPseudoNormalBlockLabel 21.14%, kPseudoSafepointPC 20.20%, kPseudoThrowTarget 6.94%, kPseudoTarget 3.03%, kPseudoSuspendTarget 1.95%, kPseudoMethodExit 1.26%, kPseudoMethodEntry 1.26%, kPseudoExportedPC 0.37%, kPseudoCaseLabel 0.30%, kPseudoBarrier 0.07%, kPseudoIntrinsicRetry 0.02%. Total LIR count: 10167292. The standout here is the Dalvik opcode boundary marker. This is just a label inserted at the beginning of the codegen for each Dalvik bytecode. 
If we're also doing a verbose listing, this is also where we hang the pretty-print disassembly string. However, this label was also being used as a convenient way to find the target of switch case statements (and, I think at one point was used in the Mir->GBC conversion process). This CL moves the use of kPseudoDalvikByteCodeBoundary labels to only verbose listing runs, and replaces the codegen uses of the label with the kPseudoNormalBlockLabel attached to the basic block that contains the switch case target. Great savings here - 14.3% reduction in the number of LIR nodes needed. After this CL, our LIR pseudo proportions drop to 21.6% of all LIR. That's still a lot, but much better. Possible further improvements via combining normal labels with kPseudoSafepointPC labels where appropriate, and also perhaps reduce memory usage by using a short-hand form for labels rather than a full LIR node. Also, many of the basic block labels are no longer branch targets by the time we get to assembly - cheaper to delete, or just ignore? Here's the "after" LIR pseudo op breakdown: kPseudoNormalBlockLabel 37.39%, kPseudoSafepointPC 35.72%, kPseudoThrowTarget 12.28%, kPseudoTarget 5.36%, kPseudoSuspendTarget 3.45%, kPseudoMethodEntry 2.24%, kPseudoMethodExit 2.22%, kPseudoExportedPC 0.65%, kPseudoCaseLabel 0.53%, kPseudoBarrier 0.12%, kPseudoIntrinsicRetry 0.04%. Total LIR count: 5748232. Not done in this CL, but it will be worth experimenting with actually deleting LIR nodes from the graph when they are optimized away, rather than just setting the NOP bit. Keeping them around is invaluable during debugging - but when not debugging it may pay off if the cost of node removal is less than the cost of traversing through dead nodes in subsequent passes. Next up (and partially in this CL - but mostly to be done in follow-on CLs) is the overall assembly process. Inherited from the trace JIT, the Quick compiler has a fairly simple-minded approach to instruction assembly. 
First, a pass is made over the LIR list to assign offsets to each instruction. Then, the assembly pass is made - which generates the actual machine instruction bit patterns and pushes the instruction data into the code_buffer. However, the code generator takes the "always optimistic" approach to instruction selection and emits the shortest instruction. If, during assembly, we find that a branch or load doesn't reach, that short-form instruction is replaced with a longer sequence. Of course, this invalidates the previously-computed offset calculations. Assembly thus is an iterative process: compute offsets and then assemble until we survive an assembly pass without invalidation. This seems like a likely candidate for improvement. First, I analyzed the number of retries required, and the reason for invalidation over the boot classpath load. The results: more than half of methods don't require a retry, and very few require more than 1 extra pass: 5 or more: 6 of 96334; 4 or more: 22 of 96334; 3 or more: 140 of 96334; 2 or more: 1794 of 96334 - 2%; 1 or more: 40911 of 96334 - 40%; 0 retries: 55423 of 96334 - 58%. The interesting group here is the one that requires 1 retry. Looking at the reason, we see three typical reasons: 1. A cbnz/cbz doesn't reach (only 7 bits of offset) 2. A 16-bit Thumb1 unconditional branch doesn't reach. 3. An unconditional branch which branches to the next instruction is encountered, and deleted. The first 2 cases are the cost of the optimistic strategy - nothing much to change there. However, the interesting case is #3 - dead branch elimination. A further analysis of the single retry group showed that 42% of the methods (16305) that required a single retry did so only because of dead branch elimination. The big question here is why so many dead branches survive to the assembly stage. 
We have a dead branch elimination pass which is supposed to catch these - perhaps it's not working correctly, should be moved later in the optimization process, or perhaps run multiple times. Other things to consider: o Combine the offset generation pass with the assembly pass. Skip pc-relative fixup assembly (other than assigning offset), but push LIR* for them into work list. Following the main pass, zip through the work list and assemble the pc-relative instructions (now that we know the offsets). This would significantly cut back on traversal costs. o Store the assembled bits into both the code buffer and the LIR. In the event we have to retry, only the pc-relative instructions would need to be assembled, and we'd finish with a pass over the LIR just to dump the bits into the code buffer. Change-Id: I50029d216fa14f273f02b6f1c8b6a0dde5a7d6a6
9b297bfc588c7d38efd12a6f38cd2710fc513ee3	06-Sep-2013	Ian Rogers <irogers@google.com>	Refactor CompilerDriver::Compute..FieldInfo Don't use non-const reference arguments. Move ins before outs. Change-Id: I7b251156388d8f07513b3da62ebfd29e5fd9ff76
f6c4b3ba3825de1dbb3e747a68b809c6cc8eb4db	25-Aug-2013	Mathieu Chartier <mathieuc@google.com>	New arena memory allocator. Before we were creating arenas for each method. The issue with doing this is that we needed to memset each memory allocation. This can be improved if you start out with arenas that contain all zeroed memory and recycle them for each method. When you give memory back to the arena pool you do a single memset to zero out all of the memory that you used. Always inlined the fast path of the allocation code. Removed the "zero" parameter since the new arena allocator always returns zeroed memory. Host dex2oat time on target oat apks (2 samples each). Before: real 1m11.958s user 4m34.020s sys 1m28.570s After: real 1m9.690s user 4m17.670s sys 1m23.960s Target device dex2oat samples (Mako, Thinkfree.apk): Without new arena allocator: 0m26.47s real 0m54.60s user 0m25.85s system 0m25.91s real 0m54.39s user 0m26.69s system 0m26.61s real 0m53.77s user 0m27.35s system 0m26.33s real 0m54.90s user 0m25.30s system 0m26.34s real 0m53.94s user 0m27.23s system With new arena allocator: 0m25.02s real 0m54.46s user 0m19.94s system 0m25.17s real 0m55.06s user 0m20.72s system 0m24.85s real 0m55.14s user 0m19.30s system 0m24.59s real 0m54.02s user 0m20.07s system 0m25.06s real 0m55.00s user 0m20.42s system Correctness of Thinkfree.apk.oat verified by diffing both of the oat files. Change-Id: I5ff7b85ffe86c57d3434294ca7a621a695bf57a9
468532ea115657709bc32ee498e701a4c71762d4	05-Aug-2013	Ian Rogers <irogers@google.com>	Entry point clean up. Create set of entry points needed for image methods to avoid fix-up at load time: - interpreter - bridge to interpreter, bridge to compiled code - jni - dlsym lookup - quick - resolution and bridge to interpreter - portable - resolution and bridge to interpreter Fix JNI work around to use JNI work around argument rewriting code that'd been accidentally disabled. Remove abstract method error stub, use interpreter bridge instead. Consolidate trampoline (previously stub) generation in generic helper. Simplify trampolines to jump directly into assembly code, keeps stack crawlable. Dex: replace use of int with ThreadOffset for values that are thread offsets. Tidy entry point routines between interpreter, jni, quick and portable. Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e (cherry picked from commit 848871b4d8481229c32e0d048a9856e5a9a17ef9)
848871b4d8481229c32e0d048a9856e5a9a17ef9	05-Aug-2013	Ian Rogers <irogers@google.com>	Entry point clean up. Create set of entry points needed for image methods to avoid fix-up at load time: - interpreter - bridge to interpreter, bridge to compiled code - jni - dlsym lookup - quick - resolution and bridge to interpreter - portable - resolution and bridge to interpreter Fix JNI work around to use JNI work around argument rewriting code that'd been accidentally disabled. Remove abstract method error stub, use interpreter bridge instead. Consolidate trampoline (previously stub) generation in generic helper. Simplify trampolines to jump directly into assembly code, keeps stack crawlable. Dex: replace use of int with ThreadOffset for values that are thread offsets. Tidy entry point routines between interpreter, jni, quick and portable. Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e
834b394ee759ed31c5371d8093d7cd8cd90014a8	31-Jul-2013	Brian Carlstrom <bdc@google.com>	Merge remote-tracking branch 'goog/dalvik-dev' into merge-art-to-dalvik-dev Change-Id: I323e9e8c29c3e39d50d9aba93121b26266c52a46
7655f29fabc0a12765de828914a18314382e5a35	29-Jul-2013	Ian Rogers <irogers@google.com>	Portable refactorings. Separate quick from portable entrypoints. Move architectural dependencies into arch. Change-Id: I9adbc0a9782e2959fdc3308215f01e3107632b7c
166db04e259ca51838c311891598664deeed85ad	26-Jul-2013	Ian Rogers <irogers@google.com>	Move assembler out of runtime into compiler/utils. Other directory layout bits of clean up. There is still work to separate quick and portable in some files (e.g. argument visitor, proxy..). Change-Id: If8fecffda8ba5c4c47a035f0c622c538c6b58351
7934ac288acfb2552bb0b06ec1f61e5820d924a4	26-Jul-2013	Brian Carlstrom <bdc@google.com>	Fix cpplint whitespace/comments issues Change-Id: Iae286862c85fb8fd8901eae1204cd6d271d69496
6f485c62b9cfce3ab71020c646ab9f48d9d29d6d	19-Jul-2013	Brian Carlstrom <bdc@google.com>	Fix cpplint whitespace/indent issues Change-Id: I7c1647f0c39e1e065ca5820f9b79998691ba40b1
2ce745c06271d5223d57dbf08117b20d5b60694a	18-Jul-2013	Brian Carlstrom <bdc@google.com>	Fix cpplint whitespace/braces issues Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
7940e44f4517de5e2634a7e07d58d0fb26160513	12-Jul-2013	Brian Carlstrom <bdc@google.com>	Create separate Android.mk for main build targets The runtime, compiler, dex2oat, and oatdump now are in separate trees to prevent dependency creep. They can now be individually built without rebuilding the rest of the art projects. dalvikvm and jdwpspy were already this way. Builds in the art directory should behave as before, building everything including tests. Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81