3d21bdf8894e780d349c481e5c9e29fe1556051c |
|
22-Apr-2015 |
Mathieu Chartier <mathieuc@google.com> |
Move mirror::ArtMethod to native Optimizing + quick tests are passing, devices boot. TODO: Test and fix bugs in mips64. Saves 16 bytes per most ArtMethod, 7.5MB reduction in system PSS. Some of the savings are from removal of virtual methods and direct methods object arrays. Bug: 19264997 (cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33) Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d Fix some ArtMethod related bugs Added root visiting for runtime methods, not currently required since the GcRoots in these methods are null. Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes --trace run-tests 005, 044. Fixed optimizing compiler bug where we used a normal stack location instead of double on ARM64, this fixes the debuggable tests. TODO: Fix JDWP tests. Bug: 19264997 Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3 ART: Fix casts for 64-bit pointers on 32-bit compiler. Bug: 19264997 Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457 Fix JDWP tests after ArtMethod change Fixes Throwable::GetStackDepth for exception event detection after internal stack trace representation change. Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of proxy method. Bug: 19264997 Change-Id: I363e293796848c3ec491c963813f62d868da44d2 Fix accidental IMT and root marking regression Was always using the conflict trampoline. Also included fix for regression in GC time caused by extra roots. Most of the regression was IMT. Fixed bug in DumpGcPerformanceInfo where we would get SIGABRT due to detached thread. EvaluateAndApplyChanges: From ~2500 -> ~1980 GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots Bug: 19264997 Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0 Fix bogus image test assert Previously we were comparing the size of the non moving space to size of the image file. Now we properly compare the size of the image space against the size of the image file. Bug: 19264997 Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a [MIPS64] Fix art_quick_invoke_stub argument offsets. ArtMethod reference's size got bigger, so we need to move other args and leave enough space for ArtMethod* and 'this' pointer. This fixes mips64 boot. Bug: 19264997 Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
|
41b175aba41c9365a1c53b8a1afbd17129c87c14 |
|
19-May-2015 |
Vladimir Marko <vmarko@google.com> |
ART: Clean up arm64 kNumberOfXRegisters usage. Avoid undefined behavior for arm64 stemming from 1u << 32 in loops with upper bound kNumberOfXRegisters. Create iterators for enumerating bits in an integer either from high to low or from low to high and use them for <arch>Context::FillCalleeSaves() on all architectures. Refactor runtime/utils.{h,cc} by moving all bit-fiddling functions to runtime/base/bit_utils.{h,cc} (together with the new bit iterators) and all time-related functions to runtime/base/time_utils.{h,cc}. Improve test coverage and fix some corner cases for the bit-fiddling functions. Bug: 13925192 (cherry picked from commit 80afd02024d20e60b197d3adfbb43cc303cf29e0) Change-Id: I905257a21de90b5860ebe1e39563758f721eab82
|
848f70a3d73833fc1bf3032a9ff6812e429661d9 |
|
15-Jan-2014 |
Jeff Hao <jeffhao@google.com> |
Replace String CharArray with internal uint16_t array. Summary of high level changes: - Adds compiler inliner support to identify string init methods - Adds compiler support (quick & optimizing) with new invoke code path that calls method off the thread pointer - Adds thread entrypoints for all string init methods - Adds map to verifier to log when receiver of string init has been copied to other registers. used by compiler and interpreter Change-Id: I797b992a8feb566f9ad73060011ab6f51eb7ce01
|
2cebb24bfc3247d3e9be138a3350106737455918 |
|
22-Apr-2015 |
Mathieu Chartier <mathieuc@google.com> |
Replace NULL with nullptr Also fixed some lines that were too long, and a few other minor details. Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb
|
1109fb3cacc8bb667979780c2b4b12ce5bb64549 |
|
07-Apr-2015 |
David Srbecky <dsrbecky@google.com> |
Implement CFI for Quick. CFI is necessary for stack unwinding in gdb, lldb, and libunwind. Change-Id: Ic3b84c9dc91c4bae80e27cda02190f3274e95ae8
|
cc23481b66fd1f2b459d82da4852073e32f033aa |
|
07-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
Promote pointer to dex cache arrays on arm. Do the use-count analysis on temps (ArtMethod* and the new PC-relative temp) in Mir2Lir, rather than MIRGraph. MIRGraph isn't really supposed to know how the ArtMethod* is used by the backend. Change-Id: Iaf56a46ae203eca86281b02b54f39a80fe5cc2dd
|
e5c76c515a481074aaa6b869aa16490a47ba98bc |
|
06-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
PC-relative loads from dex cache arrays for arm. Change-Id: Ic25df4b51a901ff1d2ca356b5eec71d4acc5d9b7
|
6f7158927fee233255f8e96719c374694b10cad3 |
|
30-Mar-2015 |
David Srbecky <dsrbecky@google.com> |
Write .debug_line section using the new DWARF library. Also simplify dex to java mapping and handle mapping in prologues and epilogues. Change-Id: I410f06024580f2a8788f2c93fe9bca132805029a
|
20f85597828194c12be10d3a927999def066555e |
|
19-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Fixed layout for dex caches in boot image. Define a fixed layout for dex cache arrays (type, method, string and field arrays) for dex caches in the boot image. This gives those arrays fixed offsets from the boot image code and allows PC-relative addressing of their elements. Use the PC-relative load on arm64 for relevant instructions, i.e. invoke-static, invoke-direct, const-string, const-class, check-cast and instance-of. This reduces the arm64 boot.oat on Nexus 9 by 1.1MiB. This CL provides the infrastructure and shows on the arm64 the gains that we can achieve by having fixed dex cache arrays' layout. To fully use this for the boot images, we need to implement the PC-relative addressing for other architectures. To achieve similar gains for apps, we need to move the dex cache arrays to a .bss section of the oat file. These changes will be implemented in subsequent CLs. (Also remove some compiler_driver.h dependencies to reduce incremental build times.) Change-Id: Ib1859fa4452d01d983fd92ae22b611f45a85d69b
|
f6737f7ed741b15cfd60c2530dab69f897540735 |
|
23-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Clean up Mir2Lir codegen. Clean up WrapPointer()/UnwrapPointer() and OpPcRelLoad(). Change-Id: I1a91f01e1e779599c77f3f6efcac2a6ad34629cf
|
0b40ecf156e309aa17c72a28cd1b0237dbfb8746 |
|
20-Mar-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: Clean up slow paths. Change-Id: I278d42be77b02778c4a419ae9024b37929915b64
|
e15ea086439b41a805d164d2beb07b4ba96aaa97 |
|
10-Feb-2015 |
Hiroshi Yamauchi <yamauchi@google.com> |
Reserve bits in the lock word for read barriers. This prepares for the CC collector to use the standard object header model by storing the read barrier state in the lock word. Bug: 19355854 Bug: 12687968 Change-Id: Ia7585662dd2cebf0479a3e74f734afe5059fb70f
|
6ce3eba0f2e6e505ed408cdc40d213c8a512238d |
|
16-Feb-2015 |
Vladimir Marko <vmarko@google.com> |
Add suspend checks to special methods. Generate suspend checks at the beginning of special methods. If we need to call to runtime, go to the slow path where we create a simplified but valid frame, spill all arguments, call art_quick_test_suspend, restore necessary arguments and return back to the fast path. This keeps the fast path overhead to a minimum. Bug: 19245639 Change-Id: I3de5aee783943941322a49c4cf2c4c94411dbaa2
|
72f53af0307b9109a1cfc0671675ce5d45c66d3a |
|
12-Nov-2014 |
Chao-ying Fu <chao-ying.fu@intel.com> |
ART: Remove MIRGraph::dex_pc_to_block_map_ This patch removes MIRGraph::dex_pc_to_block_map_, adds a local variable dex_pc_to_block_map inside MIRGraph::InlineMethod(), and updates several functions to pass dex_pc_to_block_map. The goal is to limit the scope of dex_pc_to_block_map and the usage of FindBlock, so that various compiler optimizations cannot rely on dex pc to look up basic blocks to avoid duplicated dex pc issues. Also, this patch changes quick targets to use successor blocks for switch case target generation at Mir2Lir::InstallSwitchTables(). Change-Id: I9f571efebd2706b4e1606279bd61f3b406ecd1c4 Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
|
0b9203e7996ee1856f620f95d95d8a273c43a3df |
|
23-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Some Quick cleanup Make several fields const in CompilationUnit. May benefit some Mir2Lir code that repeats tests, and in general immutability is good. Remove compiler_internals.h and refactor some other headers to reduce overly broad imports (and thus forced recompiles on changes). Change-Id: I898405907c68923581373b5981d8a85d2e5d185a
|
7e499925f8b4da46ae51040e9322690f3df992e6 |
|
06-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Remove LowestSetBit and IsPowerOfTwo Remove those functions from Mir2Lir and replace with functionality from utils.h. Change-Id: Ieb67092b22d5d460b5241c7c7931c15b9faf2815
|
9d5c25acdd1e9635fde8f8bf52a126b4d371dabd |
|
26-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Quick: Use 16-bit Thumb2 PUSH/POP when possible. Generate correct PUSH/POP in Gen{Entry,Exit}Sequence() to avoid extra processing during insn fixup. Change-Id: I396168e2a42faee6980d40779c7de9657531867b
|
bf535be514570fc33fc0a6347a87dcd9097d9bfd |
|
19-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Add card mark to filled-new-array. Bug: 18032332 Change-Id: I35576b27f9115e4d0b02a11afc5e483b9e93a04a
|
2d7210188805292e463be4bcf7a133b654d7e0ea |
|
10-Nov-2014 |
Mathieu Chartier <mathieuc@google.com> |
Change 64 bit ArtMethod fields to be pointer sized Changed the 64 bit entrypoint and gc map fields in ArtMethod to be pointer sized. This saves a large amount of memory on 32 bit systems. Reduces ArtMethod size by 16 bytes on 32 bit. Total number of ArtMethod on low memory mako: 169957 Image size: 49203 methods -> 787248 image size reduction. Zygote space size: 1070 methods -> 17120 size reduction. App methods: ~120k -> 2 MB savings. Savings per app on low memory mako: 125K+ per app (less active apps -> more image methods per app). Savings depend on how often the shared methods are on dirty pages vs shared. TODO in another CL, delete gc map field from ArtMethod since we should be able to get it from the Oat method header. Bug: 17643507 Change-Id: Ie9508f05907a9f693882d4d32a564460bf273ee8 (cherry picked from commit e832e64a7e82d7f72aedbd7d798fb929d458ee8f)
|
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f |
|
31-Oct-2014 |
Ian Rogers <irogers@google.com> |
Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags. Fix associated errors about unused paramenters and implict sign conversions. For sign conversion this was largely in the area of enums, so add ostream operators for the effected enums and fix tools/generate-operator-out.py. Tidy arena allocation code and arena allocated data types, rather than fixing new and delete operators. Remove dead code. Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
|
832336b3c9eb892045a8de1bb12c9361112ca3c5 |
|
09-Oct-2014 |
Ian Rogers <irogers@google.com> |
Don't copy fill array data to quick literal pool. Currently quick copies the fill array data from the dex file to the literal pool. It then has to go through hoops to pass this PC relative address down to out-of-line code. Instead, pass the offset of the table to the out-of-line code and use the CodeItem data associated with the ArtMethod. This reduces the size of oat code while greatly simplifying it. Unify the FillArrayData implementation in quick, portable and the interpreters. Change-Id: I9c6971cf46285fbf197856627368c0185fdc98ca
|
f4da675bbc4615c5f854c81964cac9dd1153baea |
|
01-Aug-2014 |
Vladimir Marko <vmarko@google.com> |
Implement method calls using relative BL on ARM. Store the linker patches with each CompiledMethod instead of keeping them in CompilerDriver. Reorganize oat file creation to apply the patches as we're writing the method code. Add framework for platform-specific relative call patches in the OatWriter. Implement relative call patches for ARM. Change-Id: Ie2effb3d92b61ac8f356140eba09dc37d62290f8
|
e39c54ea575ec710d5e84277fcdcc049f8acb3c9 |
|
22-Sep-2014 |
Vladimir Marko <vmarko@google.com> |
Deprecate GrowableArray, use ArenaVector instead. Purge GrowableArray from Quick and Portable. Remove GrowableArray<T>::Iterator. Change-Id: I92157d3a6ea5975f295662809585b2dc15caa1c6
|
8d0d03e24325463f0060abfd05dba5598044e9b1 |
|
07-Jun-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
ART: Change temporaries to positive names Changes compiler temporaries to have positive names. The numbering now puts them above the code VRs (locals + ins, in that order). The patch also introduces APIs to query the number of temporaries, locals and ins. The compiler temp infrastructure suffered from several issues which are also addressed by this patch: -There is no longer a queue of compiler temps. This would be polluted with Method* when post opts were called multiple times. -Sanity checks have been added to allow requesting of temps from BE and to prevent temps after frame is committed. -None of the structures holding temps can overflow because they are allocated to allow holding maximum temps. Thus temps can be requested by BE with no problem. -Since the queue of compiler temps is no longer maintained, it is no longer possible to refer to a temp that has invalid ssa (because it was requested before ssa was run). -The BE can now request temps after all ME allocations and it is guaranteed to actually receive them. -ME temps are now treated like normal VRs in all cases with no special handling. Only the BE temps are handled specially because there are no references to them from MIRs. -Deprecated and removed several fields in CompilationUnit that saved register information and updated callsites to call the new interface from MIRGraph. Change-Id: Ia8b1fec9384a1a83017800a59e5b0498dfb2698c Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
|
b038ba66a166fb264ca121632f447712e0973b5b |
|
14-Aug-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Reduce stack usage for overflow checks"" Fixes stack protection issue. Fixes mac build issue. This reverts commit 83b1940e6482b9d8feba5c492507735686650ea5. Change-Id: I7ba17252882b23a740bcda2ea94aacf398255406
|
4cf00ba324f5f6884059796a6ba41937f32e1844 |
|
14-Aug-2014 |
Dave Allison <dallison@google.com> |
Revert "Reduce stack usage for overflow checks" This reverts commit 63c051a540e6dfc806f656b88ac3a63e99395429. Change-Id: I282a048994fcd130fe73842b16c21680053c592f
|
03c9785a8a6d712775cf406c4371d0227c44148f |
|
14-Aug-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Reduce stack usage for overflow checks"" Fixes stack protection issue. Fixes mac build issue. This reverts commit 83b1940e6482b9d8feba5c492507735686650ea5. Change-Id: I7ba17252882b23a740bcda2ea94aacf398255406
|
83b1940e6482b9d8feba5c492507735686650ea5 |
|
14-Aug-2014 |
Dave Allison <dallison@google.com> |
Revert "Reduce stack usage for overflow checks" This reverts commit 63c051a540e6dfc806f656b88ac3a63e99395429. Change-Id: I282a048994fcd130fe73842b16c21680053c592f
|
63c051a540e6dfc806f656b88ac3a63e99395429 |
|
26-Jul-2014 |
Dave Allison <dallison@google.com> |
Reduce stack usage for overflow checks This reduces the stack space reserved for overflow checks to 12K, split into an 8K gap and a 4K protected region. GC needs over 8K when running in a stack overflow situation. Also prevents signal runaway by detecting a signal inside code that resulted from a signal handler invokation. And adds a max signal count to the SignalTest to prevent it running forever. Also reduces the number of iterations for the InterfaceTest as this was taking (almost) forever with the --trace option on run-test. Bug: 15435566 Change-Id: Id4fd46f22d52d42a9eb431ca07948673e8fda694 Conflicts: compiler/optimizing/code_generator_x86_64.cc runtime/arch/x86/fault_handler_x86.cc runtime/arch/x86_64/quick_entrypoints_x86_64.S
|
648d7112609dd19c38131b3e71c37bcbbd19d11e |
|
26-Jul-2014 |
Dave Allison <dallison@google.com> |
Reduce stack usage for overflow checks This reduces the stack space reserved for overflow checks to 12K, split into an 8K gap and a 4K protected region. GC needs over 8K when running in a stack overflow situation. Also prevents signal runaway by detecting a signal inside code that resulted from a signal handler invokation. And adds a max signal count to the SignalTest to prevent it running forever. Also reduces the number of iterations for the InterfaceTest as this was taking (almost) forever with the --trace option on run-test. Bug: 15435566 Change-Id: Id4fd46f22d52d42a9eb431ca07948673e8fda694
|
8c18c2aaedb171f9b03ec49c94b0e33449dc411b |
|
06-Aug-2014 |
Andreas Gampe <agampe@google.com> |
ART: Generate chained compare-and-branch for short switches Refactor Mir2Lir to generate chained compare-and-branch sequences for short switches on all architectures. Bug: 16241558 (cherry picked from commit 48971b3242e5126bcd800cc9c68df64596b43d13) Change-Id: I0bb3071b8676523e90e0258e9b0e3fd69c1237f4
|
48971b3242e5126bcd800cc9c68df64596b43d13 |
|
06-Aug-2014 |
Andreas Gampe <agampe@google.com> |
ART: Generate chained compare-and-branch for short switches Refactor Mir2Lir to generate chained compare-and-branch sequences for short switches on all architectures. Change-Id: Ie2a572ae69d462ba68a119e9fb93ae538cddd08f
|
0f45f22eb3c52f0ece4c56989180e79c6680d825 |
|
15-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Throw StackOverflowError in native code Initialize stack-overflow errors in native code to be able to reduce the preserved area size of the stack. Includes a refactoring away from constexpr in instruction_set.h to allow for easy changing of the values. Bug: 16256184 (cherry picked from commit 7ea6f79bbddd69d5db86a8656a31aaaf64ae2582) Change-Id: I117cc8485f43da5f0a470f0f5e5b3dc3b5a06246
|
7ea6f79bbddd69d5db86a8656a31aaaf64ae2582 |
|
15-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Throw StackOverflowError in native code Initialize stack-overflow errors in native code to be able to reduce the preserved area size of the stack. Includes a refactoring away from constexpr in instruction_set.h to allow for easy changing of the values. Change-Id: I117cc8485f43da5f0a470f0f5e5b3dc3b5a06246
|
147eb41b53729ec8d5c188d1cac90964a51afb8a |
|
11-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86"""" This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1. Bug: 16256184 Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73 Conflicts: compiler/dex/quick/arm64/target_arm64.cc compiler/image_test.cc runtime/fault_handler.cc
|
69dfe51b684dd9d510dbcb63295fe180f998efde |
|
11-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86"""" This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1. Bug: 16256184 Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73
|
48f5c47907654350ce30a8dfdda0e977f5d3d39f |
|
27-Jun-2014 |
Hans Boehm <hboehm@google.com> |
Replace memory barriers to better reflect Java needs. Replaces barriers that enforce ordering of one access type (e.g. Load) with respect to another (e.g. store) with more general ones that better reflect both Java requirements and actual hardware barrier/fence instructions. The old code was inconsistent and unclear about which barriers implied which others. Sometimes multiple barriers were generated and then eliminated; sometimes it was assumed that certain barriers implied others. The new barriers closely parallel those in C++11, though, for now, we use something closer to the old naming. Bug: 14685856 Change-Id: Ie1c80afe3470057fc6f2b693a9831dfe83add831
|
7fb36ded9cd5b1d254b63b3091f35c1e6471b90e |
|
10-Jul-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Add implicit null and stack checks for x86"" Fixes x86_64 cross compile issue. Removes command line options and property to set implicit checks - this is hard coded now. This reverts commit 3d14eb620716e92c21c4d2c2d11a95be53319791. Change-Id: I5404473b5aaf1a9c68b7181f5952cb174d93a90d
|
0025a86411145eb7cd4971f9234fc21c7b4aced1 |
|
11-Jul-2014 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Revert "Revert "Add implicit null and stack checks for x86""" Broke the build. This reverts commit 7fb36ded9cd5b1d254b63b3091f35c1e6471b90e. Change-Id: I9df0e7446ff0913a0e1276a558b2ccf6c8f4c949
|
de68676b24f61a55adc0b22fe828f036a5925c41 |
|
24-Jun-2014 |
Andreas Gampe <agampe@google.com> |
Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter" This reverts commit 2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d. Breaks the build. Change-Id: I9faad4e9a83b32f5f38b2ef95d6f9a33345efa33
|
3c12c512faf6837844d5465b23b9410889e5eb11 |
|
24-Jun-2014 |
Andreas Gampe <agampe@google.com> |
Revert "Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter"" This reverts commit de68676b24f61a55adc0b22fe828f036a5925c41. Fixes an API comment, and differentiates between inserting and appending. Change-Id: I0e9a21bb1d25766e3cbd802d8b48633ae251a6bf
|
2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d |
|
23-Jun-2014 |
Andreas Gampe <agampe@google.com> |
ART: Split out more cases of Load/StoreRef, volatile as parameter Splits out more cases of ref registers being loaded or stored. For code clarity, adds volatile as a flag parameter instead of a separate method. On ARM64, continue cleanup. Add flags to print/fatal on size mismatches. Change-Id: I30ed88433a6b4ff5399aefffe44c14a5e6f4ca4e
|
5655e84e8d71697d8ef3ea901a0b853af42c559e |
|
18-Jun-2014 |
Andreas Gampe <agampe@google.com> |
ART: Implicit checks in the compiler are independent from Runtime When cross-compiling, those flags are independent. This is an initial CL that helps bypass fatal failures when cross-compiling, as not all architectures support (and have turned on) implicit checks. The actual transport for the target architecture when it is different from the runtime needs to be implemented in a follow-up CL. Bug: 15703710 Change-Id: Idc881a9a4abfd38643b862a491a5af9b8841f693
|
7cd26f355ba83be75b72ed628ed5ee84a3245c4f |
|
19-Jun-2014 |
Andreas Gampe <agampe@google.com> |
ART: Target-dependent stack overflow, less check elision Refactor the separate stack overflow reserved sizes from thread.h into instruction_set.h and make sure they're used in the compiler. Refactor the decision on when to elide stack overflow checks: especially with large interpreter stack frames, it is not a good idea to elide checks when the frame size is even close to the reserved size. Currently enforce checks when the frame size is >= 2KB, but make sure that frame sizes 1KB and below will elide the checks (number from experience). Bug: 15728765 Change-Id: I016bfd3d8218170cbccbd123ed5e2203db167c06
|
8dea81ca9c0201ceaa88086b927a5838a06a3e69 |
|
06-Jun-2014 |
Vladimir Marko <vmarko@google.com> |
Rewrite use/def masks to support 128 bits. Reduce LIR memory usage by holding masks by pointers in the LIR rather than directly and using pre-defined const masks for the common cases, allocating very few on the arena. Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
|
576ca0cd692c0b6ae70e776de91015b8ff000a08 |
|
07-Jun-2014 |
Ian Rogers <irogers@google.com> |
Reduce header files including header files. Main focus is getting heap.h out of runtime.h. Change-Id: I8d13dce8512816db2820a27b24f5866cc871a04b
|
a0cd2d701f29e0bc6275f1b13c0edfd4ec391879 |
|
01-Jun-2014 |
buzbee <buzbee@google.com> |
Quick compiler: reference cleanup For 32-bit targets, object references are 32 bits wide both in Dalvik virtual registers and in core physical registers. Because of this, object references and non-floating point values were both handled as if they had the same register class (kCoreReg). However, for 64-bit systems, references are 32 bits in Dalvik vregs, but 64 bits in physical registers. Although the same underlying physical core registers will still be used for object reference and non-float values, different register class views will be used to represent them. For example, an object reference in arm64 might be held in x3 at some point, while the same underlying physical register, w3, would be used to hold a 32-bit int. This CL breaks apart the handling of object reference and non-float values to allow the proper register class (or register view) to be used. A new register class, kRefReg, is introduced which will map to a 32-bit core register on 32-bit targets, and 64-bit core registers on 64-bit targets. From this point on, object references should be allocated registers in the kRefReg class rather than kCoreReg. Change-Id: I6166827daa8a0ea3af326940d56a6a14874f5810
|
fe8cf8b1c1b4af0f8b4bb639576f7a5fc59f52ea |
|
15-May-2014 |
Bill Buzbee <buzbee@google.com> |
Quick Compiler: fix Arm cts failures Fixes move_wide_16#testN1, move_wide_16#testN2 Two bugs for the price of one (thanks CTS!) First, the new stack overflow checking code was broken for very large frames. For Arm on method entry, we only have 1 available temp register, r12, until argument registers are flushed. Previously, for explicit checks on large frames, r12 was immediately loaded with the stack_end value. However, later on when the frame is extended, if the frame size exceeds the range of a reg-reg-imm subtract, the codegen utilities will allocate a new temporary register to complete the operation. r12 was getting clobbered. Similarly, for medium-large frames r12 could get clobbered during frame creation. What we should always do when directly using fixed registers like this is to lock them to prevent them from being allocated as a temp. The other half of the first bug is easily solved by delaying the load of stack_end until after the new sp is computed. We'll increase the stall cost, but this is an uncommon case. The second bug was likely a typo in LoadValueDisp(). I'm a bit surprised we hadn't hit this one earlier - but perhaps it was recently introduced. The wrong base register was being used in the non-float, wide, excessive offset case (which I suppose is also somewhat uncommon). Cherry-pick of internal commit If5b30f729e31d86db604045dd7581fd4626e0b55 Change-Id: If5b30f729e31d86db604045dd7581fd4626e0b55
|
b14329f90f725af0f67c45dfcb94933a426d63ce |
|
15-May-2014 |
Andreas Gampe <agampe@google.com> |
ART: Fix MonitorExit code on ARM We do not emit barriers on non-SMP systems. But on ARM, we have places that need to conditionally execute, which is done through an IT instruction. The guide of said instruction thus changes between SMP and non-SMP systems. To cleanly approach this, change the API so that GenMemBarrier returns whether it generated an instruction. ARM will have to query the result and update any dependent IT. Throw a build system error if TARGET_CPU_SMP is not set. Fix runtime/Android.mk to work with new multilib host. Bug: 14989275 Change-Id: I9e611b770e8a1cd4ca19367d7dae0573ec08dc61
|
56e86eaf73eb3efa029f2dd53b2d21e3597d8e5f |
|
15-May-2014 |
Bill Buzbee <buzbee@google.com> |
Revert "Revert "Quick Compiler: fix Arm cts failures"" It turns out the medium-large frame explicit stack overflow check was also broken in a similar way: r12 was live going into the frame extension, but the frame extension code sometimes needs a free temp. This reverts commit 9cf44af1a223f905457688931317a4e4cb086a84. Change-Id: If5b30f729e31d86db604045dd7581fd4626e0b55
|
9cf44af1a223f905457688931317a4e4cb086a84 |
|
15-May-2014 |
Bill Buzbee <buzbee@google.com> |
Revert "Quick Compiler: fix Arm cts failures" Error detected on further testing. This reverts commit 06a4809f271c44ec1491e0b07ae9974aa35bc8ad. Change-Id: Ia7b6b463f6422abac432f1a9484e4e080d003148
|
06a4809f271c44ec1491e0b07ae9974aa35bc8ad |
|
14-May-2014 |
buzbee <buzbee@google.com> |
Quick Compiler: fix Arm cts failures Fixes move_wide_16#testN1, move_wide_16#testN2 Two bugs for the price of one (thanks CTS!) First, the new stack overflow checking code was broken for very large frames. For Arm on method entry, we only have 1 available temp register, r12, until argument registers are flushed. Previously, for explicit checks on large frames, r12 was immediately loaded with the stack_end value. However, later on when the frame is extended, if the frame size exceeds the range of a reg-reg-imm subtract, the codegen utilities will allocate a new temporary register to complete the operation. r12 was getting clobbered. What we should always do when directly using fixed registers like this is to lock them to prevent them from being allocated as a temp. The other half of the first bug is easily solved by delaying the load of stack_end until after the new sp is computed. We'll increase the stall cost, but this is an uncommon case. The second bug was likely a typo in LoadValueDisp(). I'm a bit surprised we hadn't hit this one earlier - but perhaps it was recently introduced. The wrong base register was being used in the non-float, wide, excessive offset case (which I suppose is also somewhat uncommon). Change-Id: I2c5074c9570b022af680f472deac9fe72a2e827e
|
9b9dec8bbcb812315eb0b68b3465c6c567f09527 |
|
14-May-2014 |
Andreas Gampe <agampe@google.com> |
ART: Fix ARM dmb placement in monitor-exit This moves the dmb in quick-compiled monitor-exit before the str perfoming the unlock. Change-Id: I231f98ff21eb7bac45b4a1b7ff57316deeb858cc
|
5cd33753b96d92c03e3cb10cb802e68fb6ef2f21 |
|
16-Apr-2014 |
Dave Allison <dallison@google.com> |
Handle implicit stack overflow without affecting stack walks This changes the way in which implicit stack overflows are handled to satisfy concerns about changes to the stack walk code. Instead of creating a gap in the stack and checking for it in the stack walker, use the ManagedStack infrastructure to concoct an invisible gap that will never be seen by a stack walk. Also, this uses madvise to tell the kernel that the main stack's protected region will probably never be accessed, and instead of using memset to map the pages in, use memcpy to read from them. This will save 32K on the main stack. Also adds a 'signals' verbosity level as per a review request. Bug: 14066862 Change-Id: I5257305feeaea241d11e6aa6f021d2a81da20b81
|
091cc408e9dc87e60fb64c61e186bea568fc3d3a |
|
31-Mar-2014 |
buzbee <buzbee@google.com> |
Quick compiler: allocate doubles as doubles Significant refactoring of register handling to unify usage across all targets & 32/64 backends. Reworked RegStorage encoding to allow expanded use of x86 xmm registers; removed vector registers as a separate register type. Reworked RegisterInfo to describe aliased physical registers. Eliminated quite a bit of target-specific code and generalized common code. Use of RegStorage instead of int for registers now propagated down to the NewLIRx() level. In future CLs, the NewLIRx() routines will be replaced with versions that are explicit about what kind of operand they expect (RegStorage, displacement, etc.). The goal is to eventually use RegStorage all the way to the assembly phase. TBD: MIPS needs verification. TBD: Re-enable liveness tracking. Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
|
6ffcfa04ebb2660e238742a6000f5ccebdd5df15 |
|
25-Apr-2014 |
Mingyao Yang <mingyao@google.com> |
Rewrite suspend test check with LIRSlowPath. Change-Id: I2dc17d079655586bfc588349c7a04afc2c6879af
|
695d13a82d6dd801aaa57a22a9d4b3f6db0d0fdb |
|
19-Apr-2014 |
buzbee <buzbee@google.com> |
Update load/store utilities for 64-bit backends This CL replaces the typical use of LoadWord/StoreWord utilities (which, in practice, were 32-bit load/store) in favor of a new set that make the size explicit. We now have: LoadWordDisp/StoreWordDisp: 32 or 64 depending on target. Load or store the natural word size. Expect this to be used infrequently - generally when we know we're dealing with a native pointer or flushed register not holding a Dalvik value (Dalvik values will flush to home location sizes based on Dalvik, rather than the target). Load32Disp/Store32Disp: Load or store 32 bits, regardless of target. Load64Disp/Store64Disp: Load or store 64 bits, regardless of target. LoadRefDisp: Load a 32-bit compressed reference, and expand it to the natural word size in the target register. StoreRefDisp: Compress a reference held in a register of the natural word size and store it as a 32-bit compressed reference. Change-Id: I50fcbc8684476abd9527777ee7c152c61ba41c6f
|
d6ed642458c8820e1beca72f3d7b5f0be4a4b64b |
|
10-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Revert "Use trampolines for calls to helpers""" This reverts commit f9487c039efb4112616d438593a2ab02792e0304. Change-Id: Id48a4aae4ecce73db468587967968a3f7618b700
|
f9487c039efb4112616d438593a2ab02792e0304 |
|
09-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Use trampolines for calls to helpers"" This reverts commit 081f73e888b3c246cf7635db37b7f1105cf1a2ff. Change-Id: Ibd777f8ce73cf8ed6c4cb81d50bf6437ac28cb61 Conflicts: compiler/dex/quick/mir_to_lir.h
|
081f73e888b3c246cf7635db37b7f1105cf1a2ff |
|
07-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Use trampolines for calls to helpers" This reverts commit 754ddad084ccb610d0cf486f6131bdc69bae5bc6. Change-Id: Icd979adee1d8d781b40a5e75daf3719444cb72e8
|
754ddad084ccb610d0cf486f6131bdc69bae5bc6 |
|
19-Feb-2014 |
Dave Allison <dallison@google.com> |
Use trampolines for calls to helpers This is an ARM specific optimization to the compiler that uses trampoline islands to make calls to runtime helper functions. The intention is to reduce the size of the generated code (by 2 bytes per call) without affecting performance. By default this is on when generating an OAT file. It is off when compiling to memory. To switch this off in dex2oat, use the command line option: --no-helper-trampolines Enhances disassembler to print the trampoline entry on the BL instruction like this: 0xb6a850c0: f7ffff9e bl -196 (0xb6a85000) ; pTestSuspend Bug: 12607709 Change-Id: I9202bdb7cf21252ad807bd48701f1f6ce8e3d0fe
|
3da67a558f1fd3d8a157d8044d521753f3f99ac8 |
|
03-Apr-2014 |
Dave Allison <dallison@google.com> |
Add OpEndIT() for marking the end of OpIT blocks In ARM we need to prevent code motion to the inside of an IT block. This was done using a GenBarrier() to mark the end, but it wasn't obvious that this is what was happening. This CL adds an explicit OpEndIT() that takes the LIR of the OpIT for future checks. Bug: 13751744 Change-Id: If41d2adea1f43f11ebb3b72906bd308252ce3d01
|
dd7624d2b9e599d57762d12031b10b89defc9807 |
|
15-Mar-2014 |
Ian Rogers <irogers@google.com> |
Allow mixing of thread offsets between 32 and 64bit architectures. Begin a more full implementation x86-64 REX prefixes. Doesn't implement 64bit thread offset support for the JNI compiler. Change-Id: If9af2f08a1833c21ddb4b4077f9b03add1a05147
|
f943914730db8ad2ff03d49a2cacd31885d08fd7 |
|
27-Mar-2014 |
Dave Allison <dallison@google.com> |
Implement implicit stack overflow checks This also fixes some failing run tests due to missing null pointer markers. The implementation of the implicit stack overflow checks introduces the ability to have a gap in the stack that is skipped during stack walk backs. This gap is protected against read/write and is used to trigger a SIGSEGV at function entry if the stack will overflow. Change-Id: I0c3e214c8b87dc250cf886472c6d327b5d58653e
|
05a48b1f8e62564abb7c2fe674e3234d5861647f |
|
01-Apr-2014 |
Mathieu Chartier <mathieuc@google.com> |
Fix stack overflow slow path error. The frame size without spill was being passed into the slow path instead of the spill size. This was incorrect since only the spills will have been pushed at the point of the overflow check. Also addressed an other comment. Change-Id: Ic6e455122473a8f796b291d71f945bcf72788662
|
2700f7e1edbcd2518f4978e4cd0e05a4149f91b6 |
|
07-Mar-2014 |
buzbee <buzbee@google.com> |
Continuing register cleanup Ready for review. Continue the process of using RegStorage rather than ints to hold register value in the top layers of codegen. Given the huge number of changes in this CL, I've attempted to minimize the number of actual logic changes. With this CL, the use of ints for registers has largely been eliminated except in the lowest utility levels. "Wide" utility routines have been updated to take a single RegStorage rather than a pair of ints representing low and high registers. Upcoming CLs will be smaller and more targeted. My expectations: o Allocate float double registers as a single double rather than a pair of float single registers. o Refactor to push code which assumes long and double Dalvik values are held in a pair of register to the target dependent layer. o Clean-up of the xxx_mir.h files to reduce the amount of #defines for registers. May also do a register renumbering to bring all of our targets' register naming more consistent. Possibly introduce a target-independent float/non-float test at the RegStorage level. Change-Id: I646de7392bdec94595dd2c6f76e0f1c4331096ff
|
0d507d1e0441e6bd6f3affca3a60774ea920f317 |
|
19-Mar-2014 |
Mathieu Chartier <mathieuc@google.com> |
Optimize stack overflow handling. We now subtract the frame size from the stack pointer for methods which have a frame smaller than a certain size. Also changed code to use slow paths instead of launchpads. Delete kStackOverflow launchpad since it is no longer needed. ARM optimizations: One less move per stack overflow check (without fault handler for stack overflows). Use ldr pc instead of ldr r12, b r12. Code size (boot.oat): Before: 58405348 After: 57803236 TODO: X86 doesn't have the case for large frames. This could case an incoming signal to go past the end of the stack (unlikely however). Change-Id: Ie3a5635cd6fb09de27960e1f8cee45bfae38fb33
|
b373e091eac39b1a79c11f2dcbd610af01e9e8a9 |
|
21-Feb-2014 |
Dave Allison <dallison@google.com> |
Implicit null/suspend checks (oat version bump) This adds the ability to use SEGV signals to throw NullPointerException exceptions from Java code rather than having the compiler generate explicit comparisons and branches. It does this by using sigaction to trap SIGSEGV and when triggered makes sure it's in compiled code and if so, sets the return address to the entry point to throw the exception. It also uses this signal mechanism to determine whether to check for thread suspension. Instead of the compiler generating calls to a function to check for threads being suspended, the compiler will now load indirect via an address in the TLS area. To trigger a suspend, the contents of this address are changed from something valid to 0. A SIGSEGV will occur and the handler will check for a valid instruction pattern before invoking the thread suspension check code. If a user program taps SIGSEGV it will prevent our signal handler working. This will cause a failure in the runtime. There are two signal handlers at present. You can control them individually using the flags -implicit-checks: on the runtime command line. This takes a string parameter, a comma separated set of strings. Each can be one of: none switch off null null pointer checks suspend suspend checks all all checks So to switch only suspend checks on, pass: -implicit-checks:suspend There is also -explicit-checks to provide the reverse once we change the default. For dalvikvm, pass --runtime-arg -implicit-checks:foo,bar The default is -implicit-checks:none There is also a property 'dalvik.vm.implicit_checks' whose value is the same string as the command option. The default is 'none'. For example to switch on null checks using the option: setprop dalvik.vm.implicit_checks null It only works for ARM right now. Bumps OAT version number due to change to Thread offsets. Bug: 13121132 Change-Id: If743849138162f3c7c44a523247e413785677370
|
83cc7ae96d4176533dd0391a1591d321b0a87f4f |
|
12-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Create a scoped arena allocator and use that for LVN. This saves more than 0.5s of boot.oat compilation time on Nexus 5. TODO: Move other stuff to the scoped allocator. This CL alone increases the peak memory allocation. By reusing the memory for other parts of the compilation we should reduce this overhead. Change-Id: Ifbc00aab4f3afd0000da818dfe68b96713824a08
|
00e1ec6581b5b7b46ca4c314c2854e9caa647dd2 |
|
28-Feb-2014 |
Bill Buzbee <buzbee@android.com> |
Revert "Revert "Rework Quick compiler's register handling"" This reverts commit 86ec520fc8b696ed6f164d7b756009ecd6e4aace. Ready. Fixed the original type, plus some mechanical changes for rebasing. Still needs additional testing, but the problem with the original CL appears to have been a typo in the definition of the x86 double return template RegLocation. Change-Id: I828c721f91d9b2546ef008c6ea81f40756305891
|
dbb8c49d540edd2a39076093163c7218f03aa502 |
|
28-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Remove non-existent ARM insn kThumb2SubsRRI12. For kOpSub/kOpAdd, prefer modified immediate encodings because they set flags. Change-Id: I41dcd2d43ba1e62120c99eaf9106edc61c41e157
|
86ec520fc8b696ed6f164d7b756009ecd6e4aace |
|
26-Feb-2014 |
Bill Buzbee <buzbee@android.com> |
Revert "Rework Quick compiler's register handling" This reverts commit 2c1ed456dcdb027d097825dd98dbe48c71599b6c. Change-Id: If88d69ba88e0af0b407ff2240566d7e4545d8a99
|
2c1ed456dcdb027d097825dd98dbe48c71599b6c |
|
20-Feb-2014 |
buzbee <buzbee@google.com> |
Rework Quick compiler's register handling For historical reasons, the Quick backend found it convenient to consider all 64-bit Dalvik values held in registers to be contained in a pair of 32-bit registers. Though this worked well for ARM (with double-precision registers also treated as a pair of 32-bit single-precision registers) it doesn't play well with other targets. And, it is somewhat problematic for 64-bit architectures. This is the first of several CLs that will rework the way the Quick backend deals with physical registers. The goal is to eliminate the "64-bit value backed with 32-bit register pair" requirement from the target-indendent portions of the backend and support 64-bit registers throughout. The key RegLocation struct, which describes the location of Dalvik virtual register & register pairs, previously contained fields for high and low physical registers. The low_reg and high_reg fields are being replaced with a new type: RegStorage. There will be a single instance of RegStorage for each RegLocation. Note that RegStorage does not increase the space used. It is 16 bits wide, the same as the sum of the 8-bit low_reg and high_reg fields. At a target-independent level, it will describe whether the physical register storage associated with the Dalvik value is a single 32 bit, single 64 bit, pair of 32 bit or vector. The actual register number encoding is left to the target-dependent code layer. Because physical register handling is pervasive throughout the backend, this restructuring necessarily involves large CLs with lots of changes. I'm going to roll these out in stages, and attempt to segregate the CLs with largely mechanical changes from those which restructure or rework the logic. This CL is of the mechanical change variety - it replaces low_reg and high_reg from RegLocation and introduces RegStorage. It also includes a lot of new code (such as many calls to GetReg()) that should go away in upcoming CLs. The tentative plan for the subsequent CLs is: o Rework standard register utilities such as AllocReg() and FreeReg() to use RegStorage instead of ints. o Rework the target-independent GenXXX, OpXXX, LoadValue, StoreValue, etc. routines to take RegStorage rather than int register encodings. o Take advantage of the vector representation and eliminate the current vector field in RegLocation. o Replace the "wide" variants of codegen utilities that take low_reg/high_reg pairs with versions that use RegStorage. o Add 64-bit register target independent codegen utilities where possible, and where not virtualize with 32-bit general register and 64-bit general register variants in the target dependent layer. o Expand/rework the LIR def/use flags to allow for more registers (currently, we lose out on 16 MIPS floating point regs as well as ARM's D16..D31 for lack of space in the masks). o [Possibly] move the float/non-float determination of a register from the target-dependent encoding to RegStorage. In other words, replace IsFpReg(register_encoding_bits). At the end of the day, all code in the target independent layer should be using RegStorage, as should much of the target dependent layer. Ideally, we won't be using the physical register number encoding extracted from RegStorage (i.e. GetReg()) until the NewLIRx() layer. Change-Id: Idc5c741478f720bdd1d7123b94e4288be5ce52cb
|
3bc01748ef1c3e43361bdf520947a9d656658bf8 |
|
06-Feb-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
GenSpecialCase support for x86 Moved GenSpecialCase from being ARM specific to common code to allow it to be used by x86 quick as well. Change-Id: I728733e8f4c4da99af6091ef77e5c76ae0fee850 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
80b7f4f217958df6950291a5ae861249bf5b943d |
|
13-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
am 47c42cae: am 76559681: Merge "Generate ARM special methods from InlineMethod data." * commit '47c42caef35bdc29229b2714d78e48fbd7dc57e6': Generate ARM special methods from InlineMethod data.
|
502c2a84888b7da075049dcaaeb0156602304f65 |
|
06-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Generate ARM special methods from InlineMethod data. Change-Id: I204b01660a1e515879524018d1371e31f41da59b
|
c9bf407643329fee7eb2603fdace46eebf618cc6 |
|
10-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Fix special getter/setter generation. Change-Id: I381618bdcc46c51b50e94042f332db99c3a71a38
|
2bc47809febcf36369dd40877b8226318642b428 |
|
10-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Revert "Revert "Check FastInstance() early for special getters and setters."" This reverts commit 632e458dc267fadfb8120be3ab02701e09e64875. Change-Id: I5098c41ee84fbbb39397133a7ecfd367fecebe42
|
632e458dc267fadfb8120be3ab02701e09e64875 |
|
08-Feb-2014 |
Ian Rogers <irogers@google.com> |
Revert "Check FastInstance() early for special getters and setters." This reverts commit 5dc5727261e87ba8a418e2d0e970c75f67e4ab79. Change-Id: I3299c8ca5c3ce3f2de994bab61ea16a734f1de33
|
f33ffde784fdbbc123dc7d2a5470f2b8ff6c3de4 |
|
08-Feb-2014 |
Ian Rogers <irogers@google.com> |
Revert "Generate ARM special methods from InlineMethod data." This reverts commit 1ca62346583210f64092a44a74b5947d51826e7a. Change-Id: I9589fc6ff15fbea36edb99f6d336040a8e0b4e71
|
1ca62346583210f64092a44a74b5947d51826e7a |
|
06-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Generate ARM special methods from InlineMethod data. Change-Id: Icd3af7fae67f9bd33d218056509a23549d1eeba2
|
5dc5727261e87ba8a418e2d0e970c75f67e4ab79 |
|
05-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Check FastInstance() early for special getters and setters. Perform the FastInstance() check for getters and setters when they are detected by the inliner. This will help avoid the FastInstance() check for inlining. We also record the field offset and whether the field is volatile and whether the method is static for use when inlining or generating the special accessors. Change-Id: I3f832fc9ae263883b8a984be89a3b7793398b55a
|
58af1f9385742f70aca4fcb5e13aba53b8be2ef4 |
|
19-Dec-2013 |
Vladimir Marko <vmarko@google.com> |
Clean up usage of carry flag condition codes. On X86, kCondUlt and kCondUge are bound to CS and CC, respectively, while on ARM it's the other way around. The explicit binding in ConditionCode was wrong and misleading and could lead to subtle bugs. Therefore, we detach those constants and clean up usage. The CS and CC conditions are now effectively unused but we keep them around as they may eventually be useful. And some minor cleanup and comments. Change-Id: Ic5ed81d86b6c7f9392dd8fe9474b3ff718fee595
|
5816ed48bc339c983b40dc493e96b97821ce7966 |
|
27-Nov-2013 |
Vladimir Marko <vmarko@google.com> |
Detect special methods at the end of verification. This moves special method handling to method inliner and prepares for eventual inlining of these methods. Change-Id: I51c51b940fb7bc714e33135cd61be69467861352
|
31c2aac7137b69d5622eea09597500731fbee2ef |
|
09-Dec-2013 |
Vladimir Marko <vmarko@google.com> |
Rename ClobberCalleeSave to *Caller*, fix it for x86. Change-Id: I6a72703a11985e2753fa9b4520c375a164301433
|
0d82948094d9a198e01aa95f64012bdedd5b6fc9 |
|
12-Oct-2013 |
buzbee <buzbee@google.com> |
64-bit prep Preparation for 64-bit roll. o Eliminated storing pointers in 32-bit int slots in LIR. o General size reductions of common structures to reduce impact of doubled pointer sizes: - BasicBlock struct was 72 bytes, now is 48. - MIR struct was 72 bytes, now is 64. - RegLocation was 12 bytes, now is 8. o Generally replaced uses of BasicBlock* pointers with 16-bit Ids. o Replaced several doubly-linked lists with singly-linked to save one stored pointer per node. o We had quite a few uses of uintptr_t's that were a holdover from the JIT (which used pointers to mapped dex & actual code cache addresses rather than trace-relative offsets). Replaced those with uint32_t's. o Clean up handling of embedded data for switch tables and array data. o Miscellaneous cleanup. I anticipate one or two additional CLs to reduce the size of MIR and LIR structs. Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
|
d9c4fc94fa618617f94e1de9af5f034549100753 |
|
02-Oct-2013 |
Ian Rogers <irogers@google.com> |
Inflate contended lock word by suspending owner. Bug 6961405. Don't inflate monitors for Notify and NotifyAll. Tidy lock word, handle recursive lock case alongside unlocked case and move assembly out of line (except for ARM quick). Also handle null in out-of-line assembly as the test is quick and the enter/exit code is already a safepoint. To gain ownership of a monitor on behalf of another thread, monitor contenders must not hold the monitor_lock_, so they wait on a condition variable. Reduce size of per mutex contention log. Be consistent in calling thin lock thread ids just thread ids. Fix potential thread death races caused by the use of FindThreadByThreadId, make it invariant that returned threads are either self or suspended now. Code size reduction on ARM boot.oat 0.2%. Old nexus 7 speedup 0.25%, new nexus 7 speedup 1.4%, nexus 10 speedup 2.24%, nexus 4 speedup 2.09% on DeltaBlue. Change-Id: Id52558b914f160d9c8578fdd7fc8199a9598576a
|
252254b130067cd7a5071865e793966871ae0246 |
|
09-Sep-2013 |
buzbee <buzbee@google.com> |
More Quick compile-time tuning: labels & branches This CL represents a roughly 3.5% performance improvement for the compile phase of dex2oat. Move of the gain comes from avoiding the generation of dex boundary LIR labels unless a debug listing is requested. The other significant change is moving from a basic block ending branch model of "always generate a fall-through branch, and then delete it if we can" to a "only generate a fall-through branch if we need it" model. The data motivating these changes follow. Note that two area of potentially attractive gain remain: restructing the assembler model and reworking the register handling utilities. These will be addressed in subsequent CLs. --- data follows The Quick compiler's assembler has shown up on profile reports a bit more than seems reasonable. We've tried a few quick fixes to apparently hot portions of the code, but without much gain. So, I've been looking at the assembly process at a somewhat higher level. There look to be several potentially good opportunities. First, an analysis of the makeup of the LIR graph showed a surprisingly high proportion of LIR pseudo ops. Using the boot classpath as a basis, we get: 32.8% of all LIR nodes are pseudo ops. 10.4% are LIR instructions which require pc-relative fixups. 11.8% are LIR instructions that have been nop'd by the various optimization passes. Looking only at the LIR pseudo ops, we get: kPseudoDalvikByteCodeBoundary 43.46% kPseudoNormalBlockLabel 21.14% kPseudoSafepointPC 20.20% kPseudoThrowTarget 6.94% kPseudoTarget 3.03% kPseudoSuspendTarget 1.95% kPseudoMethodExit 1.26% kPseudoMethodEntry 1.26% kPseudoExportedPC 0.37% kPseudoCaseLabel 0.30% kPseudoBarrier 0.07% kPseudoIntrinsicRetry 0.02% Total LIR count: 10167292 The standout here is the Dalvik opcode boundary marker. This is just a label inserted at the beginning of the codegen for each Dalvik bytecode. If we're also doing a verbose listing, this is also where we hang the pretty-print disassembly string. However, this label was also being used as a convenient way to find the target of switch case statements (and, I think at one point was used in the Mir->GBC conversion process). This CL moves the use of kPseudoDalvikByteCodeBoundary labels to only verbose listing runs, and replaces the codegen uses of the label with the kPseudoNormalBlockLabel attached to the basic block that contains the switch case target. Great savings here - 14.3% reduction in the number of LIR nodes needed. After this CL, our LIR pseudo proportions drop to 21.6% of all LIR. That's still a lot, but much better. Possible further improvements via combining normal labels with kPseudoSafepointPC labels where appropriate, and also perhaps reduce memory usage by using a short-hand form for labels rather than a full LIR node. Also, many of the basic block labels are no longer branch targets by the time we get to assembly - cheaper to delete, or just ingore? Here's the "after" LIR pseudo op breakdown: kPseudoNormalBlockLabel 37.39% kPseudoSafepointPC 35.72% kPseudoThrowTarget 12.28% kPseudoTarget 5.36% kPseudoSuspendTarget 3.45% kPseudoMethodEntry 2.24% kPseudoMethodExit 2.22% kPseudoExportedPC 0.65% kPseudoCaseLabel 0.53% kPseudoBarrier 0.12% kPseudoIntrinsicRetry 0.04% Total LIR count: 5748232 Not done in this CL, but it will be worth experimenting with actually deleting LIR nodes from the graph when they are optimized away, rather than just setting the NOP bit. Keeping them around is invaluable during debugging - but when not debugging it may pay off if the cost of node removal is less than the cost of traversing through dead nodes in subsequent passes. Next up (and partially in this CL - but mostly to be done in follow-on CLs) is the overall assembly process. Inherited from the trace JIT, the Quick compiler has a fairly simple-minded approach to instruction assembly. First, a pass is made over the LIR list to assign offsets to each instruction. Then, the assembly pass is made - which generates the actual machine instruction bit patterns and pushes the instruction data into the code_buffer. However, the code generator takes the "always optimistic" approach to instruction selection and emits the shortest instruction. If, during assembly, we find that a branch or load doesn't reach, that short-form instruction is replaces with a longer sequence. Of course, this invalidates the previously-computed offset calculations. Assembly thus is an iterative process: compute offsets and then assemble until we survive an assembly pass without invalidation. This seems like a likely candidate for improvement. First, I analyzed the number of retries required, and the reason for invalidation over the boot classpath load. The results: more than half of methods don't require a retry, and very few require more than 1 extra pass: 5 or more: 6 of 96334 4 or more: 22 of 96334 3 or more: 140 of 96334 2 or more: 1794 of 96334 - 2% 1 or more: 40911 of 96334 - 40% 0 retries: 55423 of 96334 - 58% The interesting group here is the one that requires 1 retry. Looking at the reason, we see three typical reasons: 1. A cbnz/cbz doesn't reach (only 7 bits of offset) 2. A 16-bit Thumb1 unconditional branch doesn't reach. 3. An unconditional branch which branches to the next instruction is encountered, and deleted. The first 2 cases are the cost of the optimistic strategy - nothing much to change there. However, the interesting case is #3 - dead branch elimination. A further analysis of the single retry group showed that 42% of the methods (16305) that required a single retry did so *only* because of dead branch elimination. The big question here is why so many dead branches survive to the assembly stage. We have a dead branch elimination pass which is supposed to catch these - perhaps it's not working correctly, should be moved later in the optimization process, or perhaps run multiple times. Other things to consider: o Combine the offset generation pass with the assembly pass. Skip pc-relative fixup assembly (other than assigning offset), but push LIR* for them into work list. Following the main pass, zip through the work list and assemble the pc-relative instructions (now that we know the offsets). This would significantly cut back on traversal costs. o Store the assembled bits into both the code buffer and the LIR. In the event we have to retry, only the pc-relative instructions would need to be assembled, and we'd finish with a pass over the LIR just to dumb the bits into the code buffer. Change-Id: I50029d216fa14f273f02b6f1c8b6a0dde5a7d6a6
|
9b297bfc588c7d38efd12a6f38cd2710fc513ee3 |
|
06-Sep-2013 |
Ian Rogers <irogers@google.com> |
Refactor CompilerDriver::Compute..FieldInfo Don't use non-const reference arguments. Move ins before outs. Change-Id: I7b251156388d8f07513b3da62ebfd29e5fd9ff76
|
f6c4b3ba3825de1dbb3e747a68b809c6cc8eb4db |
|
25-Aug-2013 |
Mathieu Chartier <mathieuc@google.com> |
New arena memory allocator. Before we were creating arenas for each method. The issue with doing this is that we needed to memset each memory allocation. This can be improved if you start out with arenas that contain all zeroed memory and recycle them for each method. When you give memory back to the arena pool you do a single memset to zero out all of the memory that you used. Always inlined the fast path of the allocation code. Removed the "zero" parameter since the new arena allocator always returns zeroed memory. Host dex2oat time on target oat apks (2 samples each). Before: real 1m11.958s user 4m34.020s sys 1m28.570s After: real 1m9.690s user 4m17.670s sys 1m23.960s Target device dex2oat samples (Mako, Thinkfree.apk): Without new arena allocator: 0m26.47s real 0m54.60s user 0m25.85s system 0m25.91s real 0m54.39s user 0m26.69s system 0m26.61s real 0m53.77s user 0m27.35s system 0m26.33s real 0m54.90s user 0m25.30s system 0m26.34s real 0m53.94s user 0m27.23s system With new arena allocator: 0m25.02s real 0m54.46s user 0m19.94s system 0m25.17s real 0m55.06s user 0m20.72s system 0m24.85s real 0m55.14s user 0m19.30s system 0m24.59s real 0m54.02s user 0m20.07s system 0m25.06s real 0m55.00s user 0m20.42s system Correctness of Thinkfree.apk.oat verified by diffing both of the oat files. Change-Id: I5ff7b85ffe86c57d3434294ca7a621a695bf57a9
|
468532ea115657709bc32ee498e701a4c71762d4 |
|
05-Aug-2013 |
Ian Rogers <irogers@google.com> |
Entry point clean up. Create set of entry points needed for image methods to avoid fix-up at load time: - interpreter - bridge to interpreter, bridge to compiled code - jni - dlsym lookup - quick - resolution and bridge to interpreter - portable - resolution and bridge to interpreter Fix JNI work around to use JNI work around argument rewriting code that'd been accidentally disabled. Remove abstact method error stub, use interpreter bridge instead. Consolidate trampoline (previously stub) generation in generic helper. Simplify trampolines to jump directly into assembly code, keeps stack crawlable. Dex: replace use of int with ThreadOffset for values that are thread offsets. Tidy entry point routines between interpreter, jni, quick and portable. Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e (cherry picked from commit 848871b4d8481229c32e0d048a9856e5a9a17ef9)
|
848871b4d8481229c32e0d048a9856e5a9a17ef9 |
|
05-Aug-2013 |
Ian Rogers <irogers@google.com> |
Entry point clean up. Create set of entry points needed for image methods to avoid fix-up at load time: - interpreter - bridge to interpreter, bridge to compiled code - jni - dlsym lookup - quick - resolution and bridge to interpreter - portable - resolution and bridge to interpreter Fix JNI work around to use JNI work around argument rewriting code that'd been accidentally disabled. Remove abstact method error stub, use interpreter bridge instead. Consolidate trampoline (previously stub) generation in generic helper. Simplify trampolines to jump directly into assembly code, keeps stack crawlable. Dex: replace use of int with ThreadOffset for values that are thread offsets. Tidy entry point routines between interpreter, jni, quick and portable. Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e
|
834b394ee759ed31c5371d8093d7cd8cd90014a8 |
|
31-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Merge remote-tracking branch 'goog/dalvik-dev' into merge-art-to-dalvik-dev Change-Id: I323e9e8c29c3e39d50d9aba93121b26266c52a46
|
7655f29fabc0a12765de828914a18314382e5a35 |
|
29-Jul-2013 |
Ian Rogers <irogers@google.com> |
Portable refactorings. Separate quick from portable entrypoints. Move architectural dependencies into arch. Change-Id: I9adbc0a9782e2959fdc3308215f01e3107632b7c
|
166db04e259ca51838c311891598664deeed85ad |
|
26-Jul-2013 |
Ian Rogers <irogers@google.com> |
Move assembler out of runtime into compiler/utils. Other directory layout bits of clean up. There is still work to separate quick and portable in some files (e.g. argument visitor, proxy..). Change-Id: If8fecffda8ba5c4c47a035f0c622c538c6b58351
|
7934ac288acfb2552bb0b06ec1f61e5820d924a4 |
|
26-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/comments issues Change-Id: Iae286862c85fb8fd8901eae1204cd6d271d69496
|
6f485c62b9cfce3ab71020c646ab9f48d9d29d6d |
|
19-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/indent issues Change-Id: I7c1647f0c39e1e065ca5820f9b79998691ba40b1
|
2ce745c06271d5223d57dbf08117b20d5b60694a |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/braces issues Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
|
7940e44f4517de5e2634a7e07d58d0fb26160513 |
|
12-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Create separate Android.mk for main build targets The runtime, compiler, dex2oat, and oatdump now are in seperate trees to prevent dependency creep. They can now be individually built without rebuilding the rest of the art projects. dalvikvm and jdwpspy were already this way. Builds in the art directory should behave as before, building everything including tests. Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81
|