History log of /dalvik/vm/compiler/template/armv5te/TEMPLATE_RETURN.S
Revision	Date	Author	Comments
547a2b20f442ac0310e3e78cbf614bb2ed6f1e58	04-Apr-2013	Bill Buzbee <buzbee@google.com>	Revert "Tiny optimization for compiler templates for arm." This reverts commit 87bc7988cdb4e61421a3e701e84f7070f603635d. No obvious problems with this change, but reverting to aid in tracking down bug: 8543495 NCs in Play Store : >>> com.android.vending Change-Id: I8bd6dbe6a7b3a4650a5e857a5a529cde6569b987
4afb260cf1f312382541e30cab5766bff890e6fe	04-Apr-2013	Bill Buzbee <buzbee@google.com>	Revert "Tiny optimization for compiler templates for arm." This reverts commit 87bc7988cdb4e61421a3e701e84f7070f603635d. No obvious problems with this change, but reverting to aid in tracking down bug: 8543495 NCs in Play Store : >>> com.android.vending Change-Id: I8bd6dbe6a7b3a4650a5e857a5a529cde6569b987
87bc7988cdb4e61421a3e701e84f7070f603635d	02-Apr-2013	You Kim <you.kim72@gmail.com>	Tiny optimization for compiler templates for arm. (1) Remove possible bubble in TEMPLATE_STRING_INDEXOF.S; (2) remove 1 instruction and reorder the opcodes in TEMPLATE_MUL_LONG.S; (3) reorder the ldr r2 instruction in TEMPLATE_RETURN.S. (Cherry-pick of a2dc68acd954827cdc67929a859354e5ed9b5713.) Change-Id: I78b9797aff3c2255c5d34a8391b1a94a1b09b613
8b095215a4d5bde723819087f3455bdcc250a78f	20-Jun-2012	David Butcher <david.butcher@arm.com>	Switched code to blx <reg>. The sequence "ldr ip,<addr>; blx ip" is preferred over "mov lr,pc; ldr pc,<addr>" from armv5te on, and will typically perform better on later ARM processors. Change-Id: I8f2e5e794c644faafd767037ad56579f2934de47
30bc0d46ae730d78c42c39cfa56a59ba3025380b	22-Apr-2011	buzbee <buzbee@google.com>	Consolidate curFrame fields in thread storage. We ended up with two locations in the Thread structure for the saved Dalvik frame pointer. This change consolidates them. Change-Id: I78f288e4e57e232f29663be930101e775bfe370f
d5f6ef487f2fc6edb8c1e6394d2a82712169f491	18-Apr-2011	buzbee <buzbee@google.com>	[JIT] Clear inJitCodeCache flag on return. This CL plugs a hole in which control could return to the interpreter from JIT'd code without resetting the inJitCodeCache flag. Change-Id: Id0241bf3490f5bef9b274483af694c81f33334cf
9a3147c7412f4794434b4c2604aa2ba784867774	03-Mar-2011	buzbee <buzbee@google.com>	Interpreter restructuring. This is a restructuring of the Dalvik ARM and x86 interpreters: (1) Combine the old portstd and portdbg interpreters into a single portable interpreter. (2) Add debug/profiling support to the fast (mterp) interpreters. (3) Delete the old mechanism of switching between interpreters. Now, once you choose an interpreter at startup, you stick with it. (4) Allow JIT to co-exist with profiling & debugging (necessary for first-class support of debugging with the JIT active). (5) Add single-step capability to the fast assembly interpreters without slowing them down (and, in fact, measurably improve their performance). (6) Remove the old "polling for safe point" mechanism. Breakouts are now achieved via modifying the base of the interpreter handler table. (7) Simplify the interpreter control mechanism. (8) Allow thread-granularity control for profiling & debugging. The primary motivation behind this change was to improve the responsiveness of debugging and profiling and to make it easier to add new debugging and profiling capabilities in the future. Instead of always bailing out to the slow debug portable interpreter, we can now stay in the fast interpreter. A nice side effect of the change is that the fast interpreters got a healthy speed boost because we were able to replace the polling safepoint check that involved a dozen or so instructions with a single table-base reload. When combined with the two earlier CLs related to this restructuring, we show a 5.6% performance improvement using libdvm_interp.so on the Checkers benchmark relative to Honeycomb. Change-Id: I8d37e866b3618def4e582fc73f1cf69ffe428f3c
9f601a917c8878204482c37aec7005054b6776fa	12-Feb-2011	buzbee <buzbee@google.com>	Interpreter restructuring: eliminate InterpState. The key data structure for the interpreter is InterpState. This change eliminates it, merging its data with the Thread structure. Here's why: In principio creavit Fadden Thread et InterpState. And it was good. Thread holds thread-private state, while InterpState captures data associated with a Dalvik interpreter activation. Because JNI calls can result in nested interpreter invocations, we can have more than one InterpState for each actual thread. InterpState was relatively small, and it all worked well. It was used enough that in the ARM version a register (rGLUE) was dedicated to it. Then, along came the JIT guys, who saw InterpState as a convenient place to dump all sorts of useful data that they wanted quick access to through that dedicated register. InterpState grew and grew. In terms of space, this wasn't a big problem - but it did mean that the initialization cost of each interpreter activation grew as well. For applications that do a lot of callbacks from native code into Dalvik, this is measurable. It's also mostly useless cost because much of the JIT-related InterpState initialization was setting up useful constants - things that don't need to be saved and restored all the time. The biggest problem, though, deals with thread control. When something interesting is happening that needs all threads to be stopped (such as GC and debugger attach), we have access to all of the Thread structures, but we don't have access to all of the InterpState structures (which may be buried/nested on the native stack). As a result, polling for thread suspension is done via a one-indirection pointer chase. InterpState itself can't hold the stop bits because we can't always find it, so instead it holds a pointer to the global or thread-specific stop control. Yuck. With this change, we eliminate InterpState and merge all needed data into Thread. Further, we replace the dedicated rGLUE register with a pointer to the Thread structure (rSELF). The small subset of state data that needs to be saved and restored across nested interpreter activations is collected into a record that is saved to the interpreter frame, and restored on exit. Further, these small records are linked together to allow tracebacks to show nested activations. Old InterpState variables that simply contain useful constants are initialized once at thread creation time. This CL is large enough by itself that the new ability to streamline suspend checks is not done here - that will happen in a future CL. Here we just focus on consolidation. Change-Id: Ide6b2fb85716fea454ac113f5611263a96687356
18fba346582c08d81aa96d9508c0e935bad5f36f	20-Jan-2011	buzbee <buzbee@google.com>	Support traceview-style profiling in all builds. This change builds on an earlier bccheng change that allowed JIT'd code to avoid reverting to the debug portable interpreter when doing traceview-style method profiling. That CL introduced a new traceview build (libdvm_traceview) because the performance delta was too great to enable the capability for all builds. In this CL, we remove the libdvm_traceview build and provide full-speed method tracing in all builds. This is done by introducing "_PROF" versions of invoke and return templates used by the JIT. Normally, these templates are not used, and performance is unaffected. However, when method profiling is enabled, all existing translations are purged and new translations are created using the _PROF templates. These templates introduce a smallish performance penalty above and beyond the actual tracing cost, but again are only used when tracing has been enabled. Strictly speaking, there is a slight burden that is placed on invokes and returns in the non-tracing case - on the order of an additional 3 or 4 cycles per invoke/return. Those operations are already heavyweight enough that I was unable to measure the added cost in benchmarks. Change-Id: Ic09baf4249f1e716e136a65458f4e06cea35fc18
8c9ac9ab0ab6fd75b73cb0d99005da3aa90c167c	22-Oct-2010	Ben Cheng <bccheng@android.com>	Avoid conditional loads if WORKAROUND_CORTEX_A9_745320 is defined. No noticeable performance impact by this change. Bug: 3117632 Change-Id: I31c6adc6cb9999498bb456f1e87f6f04f33e4144
d88756df5b4dbc6fd450afd0019a5f64ebe4432d	22-Oct-2010	Elliott Hughes <enh@google.com>	Remove junk from platform.S now armv4t is gone. Change-Id: I30079aacc753c89cfc3a3f64bd900a0cc858d65f
5cc61d70ec727aa22f58463bf7940cc717cf3eb1	31-Aug-2010	Ben Cheng <bccheng@android.com>	Collect method traces with the fast interpreter and the JIT'ed code. Insert inline code instead of switching to the debug interpreter in the hope that the time stamps collected in traceview are closer to real-world behavior with minimal profiling overhead. Because the inline polling still introduces additional overhead (20% ~ 100%), it is only enabled in the special VM build called "libdvm_traceview.so". It won't work on the emulator because it is not implemented to collect the detailed instruction traces. Here are some performance numbers using the FibonacciSlow microbenchmark (i.e. recursive workloads; shorter is faster), listed as time: configuration. 8,162,602: profiling off/libdvm.so/JIT off; 2,801,829: profiling off/libdvm.so/JIT on; 9,952,236: profiling off/libdvm_traceview.so/JIT off; 4,465,701: profiling off/libdvm_traceview.so/JIT on; 164,786,585: profiling on/libdvm.so/JIT off; 164,664,634: profiling on/libdvm.so/JIT on; 11,231,707: profiling on/libdvm_traceview.so/JIT off; 8,427,846: profiling on/libdvm_traceview.so/JIT on. Comparing the 8,427,846 vs 164,664,634 numbers against the true baseline performance number of 2,801,829, the new libdvm_traceview.so improves the time skew from 58x to 3x. Change-Id: I48611a3a4ff9c4950059249e5503c26abd6b138e
7a2697d327936e20ef5484f7819e2e4bf91c891f	07-Jun-2010	Ben Cheng <bccheng@android.com>	Implement method inlining for getters/setters. Changes include: 1) Force the trace that ends with an invoke instruction to include the next instruction if it is a move-result (because both need to be turned into no-ops if the callee is inlined). 2) Interpreter entry point/trace builder changes so that return targets won't automatically be considered as trace starting points (to avoid duplicate traces that include the move-result instructions). 3) Codegen changes to handle getters/setters invoked from both monomorphic and polymorphic callsites. 4) Extend/fix self-verification to form identical trace regions and handle traces with inlined callees. 5) Apply touchups to the method-based parsing - still not in use. Change-Id: I116b934df01bf9ada6d5a25187510e352bccd13c
fbdcfb9ea9e2a78f295834424c3f24986ea45dac	29-May-2010	Brian Carlstrom <bdc@google.com>	Merge remote branch 'goog/dalvik-dev' into dalvik-dev-to-master Change-Id: I0c0edb3ebf0d5e040d6bbbf60269fab0deb70ef9
978738d2cbf9d08fa78c65762eaac3351ab76b9a	13-May-2010	Ben Cheng <bccheng@android.com>	Add counters to track JIT inline cache hit rate and code cache patch counts. Also did some WITH_JIT_TUNING cleanup. Change-Id: I8bb2d681a06b0f2af1f976a007326825a88cea38
86717f79d9b018f4d69cc991075fa36611f234e5	06-Mar-2010	Ben Cheng <bccheng@android.com>	Collect more JIT stats in the assert build. New stuff includes breakdown of callsite types (i.e. monomorphic vs polymorphic vs monomorphic resolved to native), total time spent in JIT'ing, and average JIT time per compilation. Example output: D/dalvikvm( 840): 4042 compilations using 1976 + 329108 bytes D/dalvikvm( 840): Compiler arena uses 10 blocks (8100 bytes each) D/dalvikvm( 840): Compiler work queue length is 0/36 D/dalvikvm( 840): size if 8192, entries used is 4137 D/dalvikvm( 840): JIT: 4137 traces, 8192 slots, 1099 chains, 40 thresh, Non-blocking D/dalvikvm( 840): JIT: Lookups: 1128780 hits, 168564 misses; 179520 normal, 6 punt D/dalvikvm( 840): JIT: noChainExit: 528464 IC miss, 194708 interp callsite, 0 switch overflow D/dalvikvm( 840): JIT: Invoke: 507 mono, 988 poly, 72 native, 1038 return D/dalvikvm( 840): JIT: Total compilation time: 2342 ms D/dalvikvm( 840): JIT: Avg unit compilation time: 579 us D/dalvikvm( 840): JIT: 3357 Translation chains, 97 interp stubs D/dalvikvm( 840): dalvik.vm.jit.op = 0-2,4-5,7-8,a-c,e-16,19-1a,1c-23,26,28-29,2b-2f,31-3d,44-4b,4d-51,60,62-63,68-69,70-72,76-78,7b,81-82,84,87,89,8d-93,95-98,a1,a3,a6,a8-a9,b0-b3,b5-b6,bb-bf,c6-c8,d0,d2-d6,d8,da-e2,ee-f0,f2-fb, D/dalvikvm( 840): Code size stats: 50666/105126 (compiled/total Dalvik), 329108 (native)
7a0bcd0de6c4da6499a088a18d1750e51204c2a6	23-Jan-2010	Ben Cheng <bccheng@android.com>	Tighten the safe points for code cache resets to happen. Add a new flag in the Thread struct to track the whereabouts of the top frame in each Java thread. It is not safe to blow away the code cache if any thread is in JIT'ed land.
72e93344b4d1ffc71e9c832ec23de0657e5b04a5	13-Nov-2009	Jean-Baptiste Queru <jbq@google.com>	eclair snapshot
6c10a977ec892c26c8e306356491833bbb073d40	29-Oct-2009	Ben Cheng <bccheng@google.com>	Implement chaining up to the first 64 cases in a switch statement.
97319a8a234e9fe1cf90ca39aa6eca37d729afd5	13-Aug-2009	Jeff Hao <jeffhao@google.com>	New changes to enable self verification mode.
ba4fc8bfc1bccae048403bd1cea3b869dca61dd7	01-Jun-2009	Ben Cheng <bccheng@android.com>	Initial port of the Dalvik JIT engine to the internal repository. Fixed files with trailing spaces. Addressed review comments from Dan. Addressed review comments from fadden. Addressed review comments from Dan x 2. Addressed review comments from Dan x 3.