Cross Reference: /art/runtime/interpreter/mterp/arm/op_mul

History log of /art/runtime/interpreter/mterp/arm/op_mul_long.S
Revision	Date	Author	Comments
1d011d9306fd4ff57d72411775d415a86f5ed398	04-Apr-2016	Bill Buzbee <buzbee@google.com>	Revert "Revert "Revert "Revert "ART: Improve JitProfile perf in arm/arm64 mterp"""" Bug: 28081559 This reverts commit 961ea9fe42edcc2c57469bf451d1ca421da5cd59. Change-Id: I98a5bb8112646706ae7bd73bf6393cb956466be3
961ea9fe42edcc2c57469bf451d1ca421da5cd59	01-Apr-2016	Hiroshi Yamauchi <yamauchi@google.com>	Revert "Revert "Revert "ART: Improve JitProfile perf in arm/arm64 mterp""" This reverts commit 4a8ac9cee4312ac910fabf31c64d28d4c8362836. 570-checker-osr intermittently failing. Bug: 27939339
d6190dcdfb1f247ff0bf8268dacf413c93b58cf5	31-Mar-2016	Calin Juravle <calin@google.com>	Revert "Revert "Revert "ART: Improve JitProfile perf in arm/arm64 mterp""" This reverts commit 4a8ac9cee4312ac910fabf31c64d28d4c8362836. Bug: 27939339
4a8ac9cee4312ac910fabf31c64d28d4c8362836	25-Mar-2016	Bill Buzbee <buzbee@google.com>	Revert "Revert "ART: Improve JitProfile perf in arm/arm64 mterp"" Ready for review. This reverts commit 6aef867f4d1a98a12bcdd65e9bf2ff894f0f2d7e. Change-Id: I5d53ed2bedc7e429ce7d3cdf80b6696a9628740e
6aef867f4d1a98a12bcdd65e9bf2ff894f0f2d7e	25-Mar-2016	Calin Juravle <calin@google.com>	Revert "ART: Improve JitProfile perf in arm/arm64 mterp" This reverts commit c1d6b341eed646e5adafc6c4fd4e3748f0292368.
c1d6b341eed646e5adafc6c4fd4e3748f0292368	02-Mar-2016	buzbee <buzbee@google.com>	ART: Improve JitProfile perf in arm/arm64 mterp ART currently requires two profiling-related things from the interpreters: hotness updates and OSR switch checks. The hotness updates previously used the existing instrumentation framework - which is flexible, but quite heavyweight. For most things, the instrumentation framework overhead is acceptable, but because we do a hotness update on every backwards branch the overhead is unacceptable. Prior to this CL, branch profiling dominates interpreter cost. Here, we bypass the instrumentation framework for hotness updates and deliver a significant performance improvement. Running interpreter-only (dalvikvm -Xint) on a Nexus 6, we see the logic subtest of Caffeinemark improving from 2600 to 9200, and the overall score going from 1979 to over 3000. Compared to the C++ switch interpreter, we see a 6x improvement on the branchy logic subtest and a 2.6x improvement overall. Compared with the previous mterp which did not have support for jit profiling, we see a few (1% to 5%) performance loss on the standard command-line benchmarks. I consider this acceptable (we could create an alternate non-profiling mterp which would have no penalty, but I don't consider this overhead big enough to justify that). Change-Id: I50b5b8c5ed8ebda3c8b4e65d27ba7393c3feae04
ace690f5e440930d7bbad97fdbfdc3eb65e230be	11-Mar-2016	buzbee <buzbee@google.com>	ART: mterp arm/arm64 cleanup Assembly code cleanup in response to comments from already-submitted CL: https://android-review.googlesource.com/#/c/188977/ Change-Id: I0ea85c5759a08cb50ef3e97dc5cf79b3ba041640
1452bee8f06b9f76a333ddf4760e4beaa82f8099	06-Mar-2015	buzbee <buzbee@google.com>	Fast Art interpreter Add a Dalvik-style fast interpreter to Art. Three primary deficiencies in the existing Art interpreter will be addressed: 1. Structural inefficiencies (primarily the bloated fetch/decode/execute overhead of the C++ interpreter implementation). 2. Stack memory wastage. Each managed-language invoke adds a full copy of the interpreter's compiler-generated locals on the shared stack. We're at the mercy of the compiler now in how much memory is wasted here. An assembly based interpreter can manage memory usage more effectively. 3. Shadow frame model, which not only spends twice the memory to store the Dalvik virtual registers, but causes vreg stores to happen twice. This CL mostly deals with #1 (but does provide some stack memory savings). Subsequent CLs will address the other issues. Current status: Passes all run-tests. Phone boots interpret-only. 2.5x faster than Clang-compiled Art goto interpreter on fetch/decode/execute microbenchmark, 5x faster than gcc-compiled goto interpreter. 1.6x faster than Clang goto on Caffeinemark overall 2.0x faster than Clang switch on Caffeinemark overall 68% of Dalvik interpreter performance on Caffeinemark (still much slower, primarily because of poor invoke performance and lack of execute-inline) Still nearly an order of magnitude slower than Dalvik on invokes (but slightly better than Art Clang goto interpreter. Importantly, saves ~200 bytes of stack memory per invoke (but still wastes ~400 relative to Dalvik). What's needed: Remove the (large quantity of) bring-up hackery in place. Integrate into the build mechanism. I'm still using the old Dalvik manual build step to generate assembly code from the stub files. Remove the suspend check hack. For bring-up purposes, I'm using an explicit suspend check (like the other Art interpreters). However, we should be doing a Dalvik style suspend check via the table base switch mechanism. This should be done during the alternative interpreter activation. General cleanup. Add CFI info. Update the new target bring-up README documentation. Add other targets. In later CLs: Consolidate mterp handlers for expensive operations (such as new-instance) with the code used by the switch interpreter. No need to duplicate the code for heavyweight operations (but will need some refactoring to align). Tuning - some fast paths needs to be moved down to the assembly handlers, rather than being dealt with in the out-of-line code. JIT profiling. Currently, the fast interpreter is used only in the fast case - no instrumentation, no transactions and no access checks. We will want to implement fast + JIT-profiling as the alternate fast interpreter. All other cases can still fall back to the reference interpreter. Improve invoke performance. We're nearly an order of magnitude slower than Dalvik here. Some of that is unavoidable, but I suspect we can do better. Add support for our other targets. Change-Id: I43e25dc3d786fb87245705ac74a87274ad34fedc