d585beda3690b8b5b978e3c59af224336614ba72 |
|
21-May-2013 |
Yanchuan Nian <ycnian@gmail.com> |
Rename unreasonable function name dmvCompilerTemplateEnd In dalvik, most function names start with "dvm" except dmvCompilerTemplateEnd. Convert it to dvmCompilerTemplateEnd in order to follow the rule. Change-Id: I09c41f8c9d55058013fbdb62ac5922ccd067ce39
|
5dfcc78af479937ba8dafceefd9b1931a88dfaaf |
|
11-Aug-2012 |
Ard Biesheuvel <ard.biesheuvel@gmail.com> |
hardening: eliminate all text relocations from lidbvm This patch consists of: - changes to mterp/ that turn all literals from absolute to PC relative, so the relocations can be resolved at (build) link time - changes to compiler/template/ that result in the compiler templates to live in the non-executable .data.rel.ro section (this code is never executed directly, only from the jit heap, so there is no reason to put it in the .text section) Change-Id: I2dc97bd4720b393a74b7277a188f0c7b681fc932 Signed-off-by: Ard Biesheuvel <ard.biesheuvel@gmail.com>
|
8b095215a4d5bde723819087f3455bdcc250a78f |
|
20-Jun-2012 |
David Butcher <david.butcher@arm.com> |
Switched code to blx <reg> ldr ip,<addr> blx ip is preferred over mov lr,pc ldr pc,<addr> from armv5te, and will typically perform better on later ARM processors. Change-Id: I8f2e5e794c644faafd767037ad56579f2934de47
|
389e258a5b9b2afb7bfaee3344c615d3310fae4e |
|
23-Apr-2011 |
buzbee <buzbee@google.com> |
InterpBreak cleanup (part 1) Moved the suspend count variables from the interpBreak structure. These are already protected by a mutex, and we need the space in interpBreak for additional subMode flags. This CL just does the move and expands the width of subMode to 16-bits. Change-Id: I4a6070b1ba4fb08a0f6e0aba6f150b30f9159eed
|
30bc0d46ae730d78c42c39cfa56a59ba3025380b |
|
22-Apr-2011 |
buzbee <buzbee@google.com> |
Consolidate curFrame fields in thread storage We ended up with two locations in the Thread structure for saved Dalvik frame pointer. This change consolidates them. Change-Id: I78f288e4e57e232f29663be930101e775bfe370f
|
9a3147c7412f4794434b4c2604aa2ba784867774 |
|
03-Mar-2011 |
buzbee <buzbee@google.com> |
Interpreter restructuring This is a restructuring of the Dalvik ARM and x86 interpreters: o Combine the old portstd and portdbg interpreters into a single portable interpreter. o Add debug/profiling support to the fast (mterp) interpreters. o Delete old mechansim of switching between interpreters. Now, once you choose an interpreter at startup, you stick with it. o Allow JIT to co-exist with profiling & debugging (necessary for first-class support of debugging with the JIT active). o Adds single-step capability to the fast assembly interpreters without slowing them down (and, in fact, measurably improves their performance). o Remove old "polling for safe point" mechanism. Breakouts now achieved via modifying base of interpreter handler table. o Simplify interpeter control mechanism. o Allow thread-granularity control for profiling & debugging The primary motivation behind this change was to improve the responsiveness of debugging and profiling and to make it easier to add new debugging and profiling capabilities in the future. Instead of always bailing out to the slow debug portable interpreter, we can now stay in the fast interpreter. A nice side effect of the change is that the fast interpreters got a healthy speed boost because we were able to replace the polling safepoint check that involved a dozen or so instructions with a single table-base reload. When combined with the two earlier CLs related to this restructuring, we show a 5.6% performance improvement using libdvm_interp.so on the Checkers benchmark relative to Honeycomb. Change-Id: I8d37e866b3618def4e582fc73f1cf69ffe428f3c
|
9f601a917c8878204482c37aec7005054b6776fa |
|
12-Feb-2011 |
buzbee <buzbee@google.com> |
Interpreter restructuring: eliminate InterpState The key datastructure for the interpreter is InterpState. This change eliminates it, merging its data with the Thread structure. Here's why: In principio creavit Fadden Thread et InterpState. And it was good. Thread holds thread-private state, while InterpState captures data associated with a Dalvik interpreter activation. Because JNI calls can result in nested interpreter invocations, we can have more than one InterpState for each actual thread. InterpState was relatively small, and it all worked well. It was used enough that in the Arm version a register (rGLUE) was dedicated to it. Then, along came the JIT guys, who saw InterpState as a convenient place to dump all sorts of useful data that they wanted quick access to through that dedicated register. InterpState grew and grew. In terms of space, this wasn't a big problem - but it did mean that the initialization cost of each interpreter activation grew as well. For applications that do a lot of callbacks from native code into Dalvik, this is measurable. It's also mostly useless cost because much of the JIT-related InterpState initialization was setting up useful constants - things that don't need to be saved and restored all the time. The biggest problem, though, deals with thread control. When something interesting is happening that needs all threads to be stopped (such as GC and debugger attach), we have access to all of the Thread structures, but we don't have access to all of the InterpState structures (which may be buried/nested on the native stack). As a result, polling for thread suspension is done via a one-indirection pointer chase. InterpState itself can't hold the stop bits because we can't always find it, so instead it holds a pointer to the global or thread-specific stop control. Yuck. With this change, we eliminate InterpState and merge all needed data into Thread. Further, we replace the decidated rGLUE register with a pointer to the Thread structure (rSELF). The small subset of state data that needs to be saved and restored across nested interpreter activations is collected into a record that is saved to the interpreter frame, and restored on exit. Further, these small records are linked together to allow tracebacks to show nested activations. Old InterpState variables that simply contain useful constants are initialized once at thread creation time. This CL is large enough by itself that the new ability to streamline suspend checks is not done here - that will happen in a future CL. Here we just focus on consolidation. Change-Id: Ide6b2fb85716fea454ac113f5611263a96687356
|
18fba346582c08d81aa96d9508c0e935bad5f36f |
|
20-Jan-2011 |
buzbee <buzbee@google.com> |
Support traceview-style profiling in all builds This change builds on an earlier bccheng change that allowed JIT'd code to avoid reverting to the debug portable interpeter when doing traceview-style method profiling. That CL introduced a new traceview build (libdvm_traceview) because the performance delta was too great to enable the capability for all builds. In this CL, we remove the libdvm_traceview build and provide full-speed method tracing in all builds. This is done by introducing "_PROF" versions of invoke and return templates used by the JIT. Normally, these templates are not used, and performace in unaffected. However, when method profiling is enabled, all existing translation are purged and new translations are created using the _PROF templates. These templates introduce a smallish performance penalty above and beyond the actual tracing cost, but again are only used when tracing has been enabled. Strictly speaking, there is a slight burden that is placed on invokes and returns in the non-tracing case - on the order of an additional 3 or 4 cycles per invoke/return. Those operations are already heavyweight enough that I was unable to measure the added cost in benchmarks. Change-Id: Ic09baf4249f1e716e136a65458f4e06cea35fc18
|
2e152baec01433de9c63633ebc6f4adf1cea3a87 |
|
16-Dec-2010 |
buzbee <buzbee@google.com> |
[JIT] Trace profiling support In preparation for method compilation, this CL causes all traces to include two entry points: profiling and non-profiling. For now, the profiling entry will only be used if dalvik is run with -Xjitprofile, and largely works like it did before. The difference is that profiling support no longer requires the "assert" build - it's always there now. This will enable us to do a form of sampling profiling of traces in order to identify hot methods or hot trace groups, while keeping the overhead low by only switching profiling on periodically. To turn the periodic profiling on and off, we simply unchain all existing translations and set the appropriate global profile state. The underlying translation lookup and chaining utilties will examine the profile state to determine which entry point to use (i.e. - profiling or non-profiling) while the traces naturally rechain during further execution. Change-Id: I9ee33e69e33869b9fab3a57e88f9bc524175172b
|
d88756df5b4dbc6fd450afd0019a5f64ebe4432d |
|
22-Oct-2010 |
Elliott Hughes <enh@google.com> |
Remove junk from platform.S now armv4t is gone. Change-Id: I30079aacc753c89cfc3a3f64bd900a0cc858d65f
|
5cc61d70ec727aa22f58463bf7940cc717cf3eb1 |
|
31-Aug-2010 |
Ben Cheng <bccheng@android.com> |
Collect method traces with the fast interpreter and the JIT'ed code. Insert inline code instead of switching to the debug interpreter in the hope that the time stamps collected in traceview are more close to the real world behavior with minimal profiling overhead. Because the inline polling still introduces additional overhead (20% ~ 100%), it is only enabled in the special VM build called "libdvm_traceview.so". It won't work on the emulator because it is not implemented to collect the detailed instruction traces. Here are some performance numbers using the FibonacciSlow microbenchmark (ie recursive workloads / the shorter the faster): time: configuration 8,162,602: profiling off/libdvm.so/JIT off 2,801,829: profiling off/libdvm.so/JIT on 9,952,236: profiling off/libdvm_traceview.so/JIT off 4,465,701: profiling off/libdvm_traceview.so/JIT on 164,786,585: profiling on/libdvm.so/JIT off 164,664,634: profiling on/libdvm.so/JIT on 11,231,707: profiling on/libdvm_traceview.so/JIT off 8,427,846: profiling on/libdvm_traceview.so/JIT on Comparing the 8,427,846 vs 164,664,634 numbers againt the true baseline performance number of 2,801,829, the new libdvm_traceview.so improves the time skew from 58x to 3x. Change-Id: I48611a3a4ff9c4950059249e5503c26abd6b138e
|
7a2697d327936e20ef5484f7819e2e4bf91c891f |
|
07-Jun-2010 |
Ben Cheng <bccheng@android.com> |
Implement method inlining for getters/setters Changes include: 1) Force the trace that ends with an invoke instruction to include the next instruction if it is a move-result (because both need to be turned into no-ops if callee is inlined). 2) Interpreter entry point/trace builder changes so that return target won't automatically be considered as trace starting points (to avoid duplicate traces that include the move result instructions). 3) Codegen changes to handle getters/setters invoked from both monomorphic and polymorphic callsites. 4) Extend/fix self-verification to form identical trace regions and handle traces with inlined callees. 5) Apply touchups to the method based parsing - still not in use. Change-Id: I116b934df01bf9ada6d5a25187510e352bccd13c
|
fbdcfb9ea9e2a78f295834424c3f24986ea45dac |
|
29-May-2010 |
Brian Carlstrom <bdc@google.com> |
Merge remote branch 'goog/dalvik-dev' into dalvik-dev-to-master Change-Id: I0c0edb3ebf0d5e040d6bbbf60269fab0deb70ef9
|
978738d2cbf9d08fa78c65762eaac3351ab76b9a |
|
13-May-2010 |
Ben Cheng <bccheng@android.com> |
Add counters to track JIT inline cache hit rate and code cache patch counts. Also did some WITH_JIT_TUNING cleanup. Change-Id: I8bb2d681a06b0f2af1f976a007326825a88cea38
|
11d8f14eef83d1b7bfa8f116de56a92d5ba9e71e |
|
24-Mar-2010 |
Ben Cheng <bccheng@android.com> |
Fix for the JIT blocking mode plus some code cleanup. Bug: 2517606 Change-Id: I2b5aa92ceaf23d484329330ae20de5966704280b
|
86717f79d9b018f4d69cc991075fa36611f234e5 |
|
06-Mar-2010 |
Ben Cheng <bccheng@android.com> |
Collect more JIT stats in the assert build. New stuff includes breakdown of callsite types (ie monomorphic vs polymorphic vs monoporphic resolved to native), total time spent in JIT'ing, and average JIT time per compilation. Example output: D/dalvikvm( 840): 4042 compilations using 1976 + 329108 bytes D/dalvikvm( 840): Compiler arena uses 10 blocks (8100 bytes each) D/dalvikvm( 840): Compiler work queue length is 0/36 D/dalvikvm( 840): size if 8192, entries used is 4137 D/dalvikvm( 840): JIT: 4137 traces, 8192 slots, 1099 chains, 40 thresh, Non-blocking D/dalvikvm( 840): JIT: Lookups: 1128780 hits, 168564 misses; 179520 normal, 6 punt D/dalvikvm( 840): JIT: noChainExit: 528464 IC miss, 194708 interp callsite, 0 switch overflow D/dalvikvm( 840): JIT: Invoke: 507 mono, 988 poly, 72 native, 1038 return D/dalvikvm( 840): JIT: Total compilation time: 2342 ms D/dalvikvm( 840): JIT: Avg unit compilation time: 579 us D/dalvikvm( 840): JIT: 3357 Translation chains, 97 interp stubs D/dalvikvm( 840): dalvik.vm.jit.op = 0-2,4-5,7-8,a-c,e-16,19-1a,1c-23,26,28-29,2b-2f,31-3d,44-4b,4d-51,60,62-63,68-69,70-72,76-78,7b,81-82,84,87,89,8d-93,95-98,a1,a3,a6,a8-a9,b0-b3,b5-b6,bb-bf,c6-c8,d0,d2-d6,d8,da-e2,ee-f0,f2-fb, D/dalvikvm( 840): Code size stats: 50666/105126 (compiled/total Dalvik), 329108 (native)
|
40094c16d9727cc1e047a7d4bddffe04dd566211 |
|
25-Feb-2010 |
Ben Cheng <bccheng@android.com> |
Tweak the interpreter entries and 2nd level trace filter to capture more traces. Real changes: 1) Add a new entry point from JIT to the interpreter to request hot traces w/o doing chaining. 2) Increase the granularity of the secondary profile filter to match 64-byte chunks using 64 entries. The remaining are just cosmetic changes.
|
88dc28740f628a8d0d8fe05af0e11443f8793aa1 |
|
03-Feb-2010 |
jeffhao <jeffhao@google.com> |
Made Self Verification mode's memory interface less intrusive.
|
9e45c0b968d63ea38353c99252d233879c2efdaf |
|
03-Feb-2010 |
jeffhao <jeffhao@google.com> |
Made Self Verification mode's memory interface less intrusive.
|
f5ceaebfe5633a16b11a7073d2bf36b5bb0c9945 |
|
02-Feb-2010 |
Bill Buzbee <buzbee@google.com> |
Jit: Rework monitor enter/exit to simplify thread suspension The Jit must stop all threads in order to flush the translation cache (and other tables). Threads which are blocked in a monitor wait cause some headache here because they effectively hold a references to the translation cache (though the return address on the native stack). The new model introduced in this CL is that for the fast path of monitor enter, control is allowed to resume in the translation cache. However, if we need to do a heavyweight lock (which may cause us to block) control does not return to the translation cache but instead bails out to the interpreter. This allows us to safely clear the code cache even if some threads are in THREAD_MONITOR state.
|
c1d9ed490a7bd6caab51df41f3c9e590fcecb727 |
|
02-Feb-2010 |
Bill Buzbee <buzbee@google.com> |
Jit: Rework monitor enter/exit to simplify thread suspension The Jit must stop all threads in order to flush the translation cache (and other tables). Threads which are blocked in a monitor wait cause some headache here because they effectively hold a references to the translation cache (though the return address on the native stack). The new model introduced in this CL is that for the fast path of monitor enter, control is allowed to resume in the translation cache. However, if we need to do a heavyweight lock (which may cause us to block) control does not return to the translation cache but instead bails out to the interpreter. This allows us to safely clear the code cache even if some threads are in THREAD_MONITOR state.
|
964a7b06a9134947b5985c7f712d18d57ed665d2 |
|
28-Jan-2010 |
Bill Buzbee <buzbee@google.com> |
Jit: Rework delayed start plus misc. cleanup Defer initialization of jit to support upcoming feature to wait until first screen is painted to start in order to avoid wasting effort on jit'ng initialization code. Timed delay in place for the moment. To change the on/off state, call dvmSuspendAllThreads(), update the value of gDvmJit.pJitTable and then dvmResumeAllThreads(). Each time a thread goes through the heavyweight check suspend path, returns from a monitor lock/unlock or returns from a JNI call, it will refresh its on/off state. Also: Recognize and handle failure to increase size of JitTable. Avoid repeated lock/unlock of JitTable modification mutex during resize Make all work order enqueue actions non-blocking, which includes adding a non-blocking mutex lock: dvmTryLockMutex(). Fix bug Jeff noticed where we were using a half-word form of a Thumb2 instruction rather than the byte form. Minor comment changes.
|
7a0bcd0de6c4da6499a088a18d1750e51204c2a6 |
|
23-Jan-2010 |
Ben Cheng <bccheng@android.com> |
Tighten the safe points for code cache resets to happen. Add a new flag in the Thread struct to track the whereabout of the top frame in each Java thread. It is not safe to blow away the code cache if any thread is in the JIT'ed land.
|
60c24f436d603c564d5351a6f81821f12635733c |
|
04-Jan-2010 |
Ben Cheng <bccheng@google.com> |
Tear down the code cache when it is full and restart from scratch. Because the code cache may be wiped out after safe points now the patching of inline cache for predicted chains is done through the compiler thread's work queue.
|
72e93344b4d1ffc71e9c832ec23de0657e5b04a5 |
|
13-Nov-2009 |
Jean-Baptiste Queru <jbq@google.com> |
eclair snapshot
|
4f48917c0741e4d9b15ca7c45956aea05fea103f |
|
28-Sep-2009 |
Ben Cheng <bccheng@google.com> |
Fixed OOM exception handling in JIT'ed code and added a new unit test.
|
d5ab726b65d7271be261864c7e224fb90bfe06e0 |
|
25-Aug-2009 |
Andy McFadden <fadden@android.com> |
Another round of scary indirect ref changes. This change adds a not-really-working implementation to Jni.c, with various changes #ifdefed throughout the code. The ifdef is currently disabled, so the old behavior should continue. Eventually the old version will be stripped out and the ifdefs removed. This renames the stack's "localRefTop" field, which nudged a bunch of code. The name wasn't really right before (it's the *bottom* of the local references), and it's even less right now. This and one other mterp-visible constant were changed, which caused some ripples through mterp and the JIT, but the ifdeffing was limited to one in asm-constants.h (and the constant is the same both ways, so toggling the ifdef won't require rebuilding asm sources). Some comments and arg names in ReferenceTable were updated for the correct orientation of bottom vs. top. Some adjustments were made to the JNI code, e.g. dvmCallMethod now needs to understand if it needs to convert reference arguments from local/global refs to pointers (it's called from various places throughout the VM).
|
cc6600c2702c0b29457836acde1cfc4f25c63b67 |
|
22-Jun-2009 |
Ben Cheng <bccheng@android.com> |
The address of dvmMterpCommonExceptionThrown should be loaded in a position-independent way since these handlers are copied into the code cache. BTW fixed a couple recently introduced compiler warnings in Codegen.c.
|
ba4fc8bfc1bccae048403bd1cea3b869dca61dd7 |
|
01-Jun-2009 |
Ben Cheng <bccheng@android.com> |
Initial port of the Dalvik JIT enging to the internal repository. Fixed files with trailing spaces. Addressed review comments from Dan. Addressed review comments from fadden. Addressed review comments from Dan x 2. Addressed review comments from Dan x 3.
|