Cross Reference: /art/runtime/interpreter/mterp/arm/invoke.S

History log of /art/runtime/interpreter/mterp/arm/invoke.S
Revision		Date	Author	Comments
fd522f9039befff986701ff05054ffdd1be1dd33		11-Feb-2016	Bill Buzbee <buzbee@google.com>	Revert "Revert "Revert "Revert "ART: Enable Jit Profiling in Mterp for arm/arm64"""" This reverts commit 5d03317a834efdf3b5240c401f1bc2ceac7a2f25. We need to catch all possible cases in which new instrumentation appears or the debugger is attached, and then switch to the reference interpreter if necessary. We may, in a future CL, use the alt-mterp mechanism to accompish this (as did Dalvik). Only enables Arm64 for now. Once it survives extended testing, will enable arm and update x86. Updated OSR handling to match other interpreters. Change-Id: I076f1d752d6f59899876bab26b18e2221cd92f69
1452bee8f06b9f76a333ddf4760e4beaa82f8099		06-Mar-2015	buzbee <buzbee@google.com>	Fast Art interpreter Add a Dalvik-style fast interpreter to Art. Three primary deficiencies in the existing Art interpreter will be addressed: 1. Structural inefficiencies (primarily the bloated fetch/decode/execute overhead of the C++ interpreter implementation). 2. Stack memory wastage. Each managed-language invoke adds a full copy of the interpreter's compiler-generated locals on the shared stack. We're at the mercy of the compiler now in how much memory is wasted here. An assembly based interpreter can manage memory usage more effectively. 3. Shadow frame model, which not only spends twice the memory to store the Dalvik virtual registers, but causes vreg stores to happen twice. This CL mostly deals with #1 (but does provide some stack memory savings). Subsequent CLs will address the other issues. Current status: Passes all run-tests. Phone boots interpret-only. 2.5x faster than Clang-compiled Art goto interpreter on fetch/decode/execute microbenchmark, 5x faster than gcc-compiled goto interpreter. 1.6x faster than Clang goto on Caffeinemark overall 2.0x faster than Clang switch on Caffeinemark overall 68% of Dalvik interpreter performance on Caffeinemark (still much slower, primarily because of poor invoke performance and lack of execute-inline) Still nearly an order of magnitude slower than Dalvik on invokes (but slightly better than Art Clang goto interpreter. Importantly, saves ~200 bytes of stack memory per invoke (but still wastes ~400 relative to Dalvik). What's needed: Remove the (large quantity of) bring-up hackery in place. Integrate into the build mechanism. I'm still using the old Dalvik manual build step to generate assembly code from the stub files. Remove the suspend check hack. For bring-up purposes, I'm using an explicit suspend check (like the other Art interpreters). However, we should be doing a Dalvik style suspend check via the table base switch mechanism. This should be done during the alternative interpreter activation. General cleanup. Add CFI info. Update the new target bring-up README documentation. Add other targets. In later CLs: Consolidate mterp handlers for expensive operations (such as new-instance) with the code used by the switch interpreter. No need to duplicate the code for heavyweight operations (but will need some refactoring to align). Tuning - some fast paths needs to be moved down to the assembly handlers, rather than being dealt with in the out-of-line code. JIT profiling. Currently, the fast interpreter is used only in the fast case - no instrumentation, no transactions and no access checks. We will want to implement fast + JIT-profiling as the alternate fast interpreter. All other cases can still fall back to the reference interpreter. Improve invoke performance. We're nearly an order of magnitude slower than Dalvik here. Some of that is unavoidable, but I suspect we can do better. Add support for our other targets. Change-Id: I43e25dc3d786fb87245705ac74a87274ad34fedc