375fb116bcb817b37509ab579dbd55cdbb765cbf |
|
15-Jun-2011 |
Carl Shapiro <cshapiro@google.com> |
Normalize the include guard style. An leading underscore followed by a capital letter is a reserved name space in C and C++. This change also moves any #include directives within the include guard in some of the compiler/codegen/arm header files. Change-Id: I9715e2c5301699d31886e61d0fe6e29483555a2a
|
d862faa2ceae186da5518607505eb942d634ced9 |
|
28-Apr-2011 |
Carl Shapiro <cshapiro@google.com> |
Get rid of uneeded extern, enum, typedef and struct qualifiers. Change-Id: I236c5a1553a51f82c9bc3eaaab042046c854d3b4
|
1813ab265f691e93401c7307c0b34247842ab35e |
|
16-Apr-2011 |
Carl Shapiro <cshapiro@google.com> |
Move the verifier and parts of the interpreter into C++. Change-Id: I8ce5fb558871d9709b251512dd01206be5ca8497
|
ae188c676c681e47a93ade7fdf0144099b470e03 |
|
08-Apr-2011 |
Carl Shapiro <cshapiro@google.com> |
Compile the garbage collector and heap profiler as C++. Change-Id: I25d8fa821987a3dd6d7109d07fd42dbf2fe0e589
|
99e3e6e72e3471eb85fc2e405866392b01c080fe |
|
29-Mar-2011 |
buzbee <buzbee@google.com> |
Fix interpreter debug attach Fix a few miscellaneous bugs from the interpreter restructuring that were causing a segfault on debugger attach. Added a sanity checking routine for debugging. Fixed a problem in which the JIT's threshold and on/off switch wouldn't get initialized properly on thread creation. Renamed dvmCompilerStateRefresh() to dvmCompilerUpdateGlobalState() to better reflect its function. Change-Id: I5b8af1ce2175e3c6f53cda19dd8e052a5f355587
|
9a3147c7412f4794434b4c2604aa2ba784867774 |
|
03-Mar-2011 |
buzbee <buzbee@google.com> |
Interpreter restructuring This is a restructuring of the Dalvik ARM and x86 interpreters: o Combine the old portstd and portdbg interpreters into a single portable interpreter. o Add debug/profiling support to the fast (mterp) interpreters. o Delete old mechansim of switching between interpreters. Now, once you choose an interpreter at startup, you stick with it. o Allow JIT to co-exist with profiling & debugging (necessary for first-class support of debugging with the JIT active). o Adds single-step capability to the fast assembly interpreters without slowing them down (and, in fact, measurably improves their performance). o Remove old "polling for safe point" mechanism. Breakouts now achieved via modifying base of interpreter handler table. o Simplify interpeter control mechanism. o Allow thread-granularity control for profiling & debugging The primary motivation behind this change was to improve the responsiveness of debugging and profiling and to make it easier to add new debugging and profiling capabilities in the future. Instead of always bailing out to the slow debug portable interpreter, we can now stay in the fast interpreter. A nice side effect of the change is that the fast interpreters got a healthy speed boost because we were able to replace the polling safepoint check that involved a dozen or so instructions with a single table-base reload. When combined with the two earlier CLs related to this restructuring, we show a 5.6% performance improvement using libdvm_interp.so on the Checkers benchmark relative to Honeycomb. Change-Id: I8d37e866b3618def4e582fc73f1cf69ffe428f3c
|
d82cebc13cf8ff110f2ad1a299782c9d1e8aada6 |
|
14-Mar-2011 |
buzbee <buzbee@google.com> |
Add trace description dump routine for debugging New dvmJitDumpTraceDescription prints out a trace descriptor. Change-Id: I613fe17cf75e4ee26ad976b15c2cf89ce4552192
|
385828e36ea70effe9aa18a954d008b1f7dc1d63 |
|
05-Mar-2011 |
Ben Cheng <bccheng@android.com> |
Handle relocatable class objects in JIT'ed code. 1) Split the original literal pool into class object literals and constants. Elements in the class object pool have to match the specicial values perfectly (ie no +delta space optimizations) since they might be relocated. 2) Implement dvmJitScanAllClassPointers(void (*callback)(void *)) which is the entry routine to report all memory locations in the code cache that contain class objects (ie class object pool and predicted chaining cells for virtual calls). 3) Major codegen changes on how/when the class object pool are populated and how predicted chains are patched. Before this change the compiler thread is always in the VM_WAIT state, which won't prevent GC from running. Since the class object pointers captured by a worker thread are no longer guaranteed to be stable at JIT time, change various internal data structures to capture the class descriptor/loader tuple instead. The conversion from descriptor/loader tuple to actual class object pointers are only performed when the thread state is RUNNING or at GC safe point. 4) Separate the class object installation phase out of the main dvmCompilerAssembleLIR routine so that the impact to blocking GC requests is minimal. Add new stats to report the potential block time. For example: Potential GC blocked by compiler: max 46 us / avg 25 us 5) Various cleanup in the trace structure walkup code. Modified the verbose print routine to show the class descriptor in the class literal pool. For example: D/dalvikvm( 1450): -------- end of chaining cells (0x007c) D/dalvikvm( 1450): 0x44020628 (00b4): .class (Lcom/android/unit_tests/PerformanceTests$EmptyClass;) D/dalvikvm( 1450): 0x4402062c (00b8): .word (0xaca8d1a5) D/dalvikvm( 1450): 0x44020630 (00bc): .word (0x401abc02) D/dalvikvm( 1450): End Bug: 3482956 Change-Id: I2e736b00d63adc255c33067544606b8b96b72ffc
|
9f601a917c8878204482c37aec7005054b6776fa |
|
12-Feb-2011 |
buzbee <buzbee@google.com> |
Interpreter restructuring: eliminate InterpState The key datastructure for the interpreter is InterpState. This change eliminates it, merging its data with the Thread structure. Here's why: In principio creavit Fadden Thread et InterpState. And it was good. Thread holds thread-private state, while InterpState captures data associated with a Dalvik interpreter activation. Because JNI calls can result in nested interpreter invocations, we can have more than one InterpState for each actual thread. InterpState was relatively small, and it all worked well. It was used enough that in the Arm version a register (rGLUE) was dedicated to it. Then, along came the JIT guys, who saw InterpState as a convenient place to dump all sorts of useful data that they wanted quick access to through that dedicated register. InterpState grew and grew. In terms of space, this wasn't a big problem - but it did mean that the initialization cost of each interpreter activation grew as well. For applications that do a lot of callbacks from native code into Dalvik, this is measurable. It's also mostly useless cost because much of the JIT-related InterpState initialization was setting up useful constants - things that don't need to be saved and restored all the time. The biggest problem, though, deals with thread control. When something interesting is happening that needs all threads to be stopped (such as GC and debugger attach), we have access to all of the Thread structures, but we don't have access to all of the InterpState structures (which may be buried/nested on the native stack). As a result, polling for thread suspension is done via a one-indirection pointer chase. InterpState itself can't hold the stop bits because we can't always find it, so instead it holds a pointer to the global or thread-specific stop control. Yuck. With this change, we eliminate InterpState and merge all needed data into Thread. Further, we replace the decidated rGLUE register with a pointer to the Thread structure (rSELF). The small subset of state data that needs to be saved and restored across nested interpreter activations is collected into a record that is saved to the interpreter frame, and restored on exit. Further, these small records are linked together to allow tracebacks to show nested activations. Old InterpState variables that simply contain useful constants are initialized once at thread creation time. This CL is large enough by itself that the new ability to streamline suspend checks is not done here - that will happen in a future CL. Here we just focus on consolidation. Change-Id: Ide6b2fb85716fea454ac113f5611263a96687356
|
1b3da59fff0c63770e10684e243a36f3d0218637 |
|
03-Feb-2011 |
Bill Buzbee <buzbee@google.com> |
[JIT] Fix for 3385583: Performance variance Closes a window in which the "interpret-only" templace could get chained to an existing trace while the intended translation was under construction. Note that this CL also introduces some small, but fundamental changes in trace formation: 1. Previouosly, when an exception or other trace terminating event occurred during trace formation, the entire trace was abandoned. With this change, we instead end the trace at the last successful instruction. 2. We previously allowed multiple attempts (perhaps by multiple threads) to form a trace compilation request for a dalvik PC. This was done in an attempt to allow recovery from compiler failures. Now we enforce a new rule: only the thread that wins the race to allocate an entry in the JitTable will form the trace request. 3. In a (probably misguided) attempt avoid unnecessary contention, we previously allowed work order enqueue requests to be dropped if a requester did not aquire TableLock on first attempt (assuming that if the trace were hot, it would be requested again). Now we block on enqueue. Change-Id: I40ea4f1b012250219ca37d5c40c5f22cae2092f1
|
cfdeca37fcaa27c37bad5077223e4d1e87f1182e |
|
14-Jan-2011 |
Ben Cheng <bccheng@android.com> |
Add runtime support for method based compilation. Enhanced code cache management to accommodate both trace and method compilations. Also implemented a hacky dispatch routine for virtual leaf methods. Microbenchmark showed 3x speedup in leaf method invocation. Change-Id: I79d95b7300ba993667b3aa221c1df9c7b0583521
|
2e152baec01433de9c63633ebc6f4adf1cea3a87 |
|
16-Dec-2010 |
buzbee <buzbee@google.com> |
[JIT] Trace profiling support In preparation for method compilation, this CL causes all traces to include two entry points: profiling and non-profiling. For now, the profiling entry will only be used if dalvik is run with -Xjitprofile, and largely works like it did before. The difference is that profiling support no longer requires the "assert" build - it's always there now. This will enable us to do a form of sampling profiling of traces in order to identify hot methods or hot trace groups, while keeping the overhead low by only switching profiling on periodically. To turn the periodic profiling on and off, we simply unchain all existing translations and set the appropriate global profile state. The underlying translation lookup and chaining utilties will examine the profile state to determine which entry point to use (i.e. - profiling or non-profiling) while the traces naturally rechain during further execution. Change-Id: I9ee33e69e33869b9fab3a57e88f9bc524175172b
|
7a2697d327936e20ef5484f7819e2e4bf91c891f |
|
07-Jun-2010 |
Ben Cheng <bccheng@android.com> |
Implement method inlining for getters/setters Changes include: 1) Force the trace that ends with an invoke instruction to include the next instruction if it is a move-result (because both need to be turned into no-ops if callee is inlined). 2) Interpreter entry point/trace builder changes so that return target won't automatically be considered as trace starting points (to avoid duplicate traces that include the move result instructions). 3) Codegen changes to handle getters/setters invoked from both monomorphic and polymorphic callsites. 4) Extend/fix self-verification to form identical trace regions and handle traces with inlined callees. 5) Apply touchups to the method based parsing - still not in use. Change-Id: I116b934df01bf9ada6d5a25187510e352bccd13c
|
d5adae17d71e86a1a5f3ae7825054e3249fb7879 |
|
27-Mar-2010 |
Ben Cheng <bccheng@android.com> |
Improve JIT self verifier test coverage to follow single-step instructions. Bug: 2549326 Change-Id: I01412d4aac1379b61c90fe6e59c534b33be93f66
|
94e79ebaa340e8ba3bb4d13f5e908fef6d9d5eed |
|
05-Feb-2010 |
Ben Cheng <bccheng@android.com> |
Enable JIT parameters to be initialized in an architecture dependent way. The search for optimial value is still ongoing. The current settings are: v5 v7 JIT profile table 512 2048 JIT code cache 512K 1M JIT threshold 200 40
|
5e7bfdfc9598b00af4ced9d43fc797af6036b708 |
|
09-Feb-2010 |
Bill Buzbee <buzbee@google.com> |
Jit: Startup/Shutdown cleanup A legacy of early parallel Jit development was separate Startup & Shutdown code for the interpreter half of the jit (dvmJitStartup/dvmJitShutdown) and the compiler half (dvmCompilerStartup/dvmCompilerShutdown). This cl eliminates the dvmJit pair. Additionally, guard coded added to the framework callback to return immediately if the Jit isn't active.
|
96cfe6c39b91dabc78182e2f7676b27b4012886a |
|
09-Feb-2010 |
Bill Buzbee <buzbee@google.com> |
Jit: Startup/Shutdown cleanup A legacy of early parallel Jit development was separate Startup & Shutdown code for the interpreter half of the jit (dvmJitStartup/dvmJitShutdown) and the compiler half (dvmCompilerStartup/dvmCompilerShutdown). This cl eliminates the dvmJit pair. Additionally, guard coded added to the framework callback to return immediately if the Jit isn't active.
|
7b133ef7c84e68c3c4042176d830ea5b52e84139 |
|
05-Feb-2010 |
Ben Cheng <bccheng@android.com> |
Enable JIT parameters to be initialized in an architecture dependent way. The search for optimial value is still ongoing. The current settings are: v5 v7 JIT profile table 512 2048 JIT code cache 512K 1M JIT threshold 200 40
|
964a7b06a9134947b5985c7f712d18d57ed665d2 |
|
28-Jan-2010 |
Bill Buzbee <buzbee@google.com> |
Jit: Rework delayed start plus misc. cleanup Defer initialization of jit to support upcoming feature to wait until first screen is painted to start in order to avoid wasting effort on jit'ng initialization code. Timed delay in place for the moment. To change the on/off state, call dvmSuspendAllThreads(), update the value of gDvmJit.pJitTable and then dvmResumeAllThreads(). Each time a thread goes through the heavyweight check suspend path, returns from a monitor lock/unlock or returns from a JNI call, it will refresh its on/off state. Also: Recognize and handle failure to increase size of JitTable. Avoid repeated lock/unlock of JitTable modification mutex during resize Make all work order enqueue actions non-blocking, which includes adding a non-blocking mutex lock: dvmTryLockMutex(). Fix bug Jeff noticed where we were using a half-word form of a Thumb2 instruction rather than the byte form. Minor comment changes.
|
9797a237b48e880c33e2a2f497f48fb6f67c7a16 |
|
12-Jan-2010 |
Bill Buzbee <buzbee@google.com> |
Performance tweak for Jit lookup & adjust table sizes for better performance Also, move setting of Jit table parameters to architecture-specific init funciton.
|
60c24f436d603c564d5351a6f81821f12635733c |
|
04-Jan-2010 |
Ben Cheng <bccheng@google.com> |
Tear down the code cache when it is full and restart from scratch. Because the code cache may be wiped out after safe points now the patching of inline cache for predicted chains is done through the compiler thread's work queue.
|
72e93344b4d1ffc71e9c832ec23de0657e5b04a5 |
|
13-Nov-2009 |
Jean-Baptiste Queru <jbq@google.com> |
eclair snapshot
|
d726991ba52466cde88e37aba4de2395b62477fa |
|
10-Nov-2009 |
Bill Buzbee <buzbee@google.com> |
Jit stress mode: translate everything we can and self verify. This represents a general clean-up of some existing command-line options: -Xjitthreshold:num and -Xjitblocking. The jit threshold controls how quickly we treat a Dalvik address as a potential trace head. Normally this is set around 200 (and the range is 0..255, where 0 is in effect 256 and 1 means begin trace selection on first visit). -Xjitblocking forces the system to pause execution whenever a translation request is made and resume when that translation is complete. Normally the system make a request but continues execution (to avoid jitter). Additionally, if WITH_SELF_VERIFICATION is defined, we force blocking to be true, and set the threshold to 1. And finally, we treat threshold==1 as a special case and disable the 2nd-level trace-building filter - which causes the system to immediately start trace selection.
|
9a8c75adb2abf551d06dbf757bff558c1feded08 |
|
08-Nov-2009 |
Bill Buzbee <buzbee@google.com> |
Introduce "just interpret" chainable pseudo-translation. This is the first step towards enabling translation & self-cosim stress modes. When trace selection begins, the trace head address is pinned and remains in a limbo state until the translation is complete. Previously, if the trace selected aborted for any reason, the trace head would remain forever in limbo. This was not a correctness problem, but caused some small performance anomolies and made life more difficult for self-cosimulation mode. This CL introduces a pseudo-translation that simply routes control to the interpreter. When we detect that a trace selection attempt has failed, the trace head is associated with this fully-chainable pseudo-translation. This also has the benefit for self-cosimulation that we are guaranteed forward progress.
|
ccd6c0102d1f898aaea1c94761167fdd083b5275 |
|
15-Oct-2009 |
Ben Cheng <bccheng@google.com> |
Make the traige process for self-verification found divergence easier. 1. Automatically replay the code compilation with verbose mode turned on for the offending compilation. 3. Mark the registers with divergence explicitly. 2. Print accurate operand names using the dataflow attributes. Constant values are still printed for reference only. 3. Fixed a few correctness/style issues in self-verification code.
|
bcdc1ded149faa8dec51778db4047cae0862027f |
|
22-Aug-2009 |
Ben Cheng <bccheng@google.com> |
Updated expected outputs in dalvik benchmarks. Improved debugging output and added spin loop on detection of divergence in self verification tool.
|
97319a8a234e9fe1cf90ca39aa6eca37d729afd5 |
|
13-Aug-2009 |
Jeff Hao <jeffhao@google.com> |
New changes to enable self verification mode.
|
716f120d7f33ca18a5dcbef811399df0cbefe5d0 |
|
23-Jul-2009 |
Bill Buzbee <buzbee@google.com> |
First phase of restructuring to support THUMB2 & ARM traces Store some useful info about traces in JitTable entry; some general cleanup
|
50a6bf2f01efba0acbff9bb03e7ee09688553e08 |
|
08-Jul-2009 |
Bill Buzbee <buzbee@google.com> |
Inline-execute for Java.Lang.Math routines, jit codegen restructure, various bug fixes.
|
48f1824fd36241067e7bed2302cc00b2d880be7f |
|
20-Jun-2009 |
Bill Buzbee <buzbee@google.com> |
New threshold mechanism for trace selections. Intended to reduce number of junk traces.
|
6e963e1cfbaeac377fed3ba8d5715c1dccfc1a57 |
|
18-Jun-2009 |
Bill Buzbee <buzbee@google.com> |
Trace profiling support for the jit
|
2717622484eb0f7ad537275f7260b2f93324eda2 |
|
09-Jun-2009 |
Bill Buzbee <buzbee@google.com> |
Makes the primary Jit table growable. Also includes a change suggested earlier by Dan to use a pre-defined mask in the hash function. Reduce the default JitTable size from 2048 entries to 512 entries. Update per Ben's comments.
|
1efc9c5e4c5c4c2fccde18e5771c68d064c33bd3 |
|
09-Jun-2009 |
Ben Cheng <bccheng@android.com> |
Fix two codegen problems: out-of-bound PC-relative addresses and missing branch to the chaining cell at the end of non-branch-ending basic blocks.
|
ba4fc8bfc1bccae048403bd1cea3b869dca61dd7 |
|
01-Jun-2009 |
Ben Cheng <bccheng@android.com> |
Initial port of the Dalvik JIT enging to the internal repository. Fixed files with trailing spaces. Addressed review comments from Dan. Addressed review comments from fadden. Addressed review comments from Dan x 2. Addressed review comments from Dan x 3.
|