9f640af99bebc6e96f6e1e9903557e2c8f567483 |
|
20-Jul-2011 |
Jeff Brown <jeffbrown@android.com> |
Add a dual clock profiler tracing mode.
Dalvik previously supported using either the thread-cpu clock or the real time clock as the timebase for profiler traces. This change adds a dual clock mode where both thread-cpu time and real time timestamps are collected. Using dual clock mode significantly improves TraceView's ability to accurately reconstruct the global timeline of events, particularly on SMP systems.
For now, thread-cpu mode remains the default. Dual clock mode can be enabled by running the following command and restarting the system server:
    adb shell setprop dalvik.vm.extra-opts -Xprofile:dualclock
Change-Id: I14db2ae93325ac01efcc8ed02e8747cc0e834e29
|
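To illustrate what a dual-clock trace record carries, here is a minimal Python sketch. It is not Dalvik code: `TraceEvent`, `record_event` and `busy_then_sleep` are hypothetical names invented for this example, and Python's `time.thread_time_ns()` and `time.monotonic_ns()` stand in for the thread-cpu clock and the real time clock.

```python
import time
from dataclasses import dataclass


@dataclass
class TraceEvent:
    """Hypothetical trace record carrying both timebases."""
    method: str
    action: str          # "enter" or "exit"
    thread_cpu_ns: int   # CPU time consumed by the calling thread
    wall_ns: int         # monotonic wall-clock time


def record_event(method, action):
    # Sample both clocks back to back so the two stamps describe the same moment.
    return TraceEvent(method, action, time.thread_time_ns(), time.monotonic_ns())


def busy_then_sleep():
    """Return (enter, exit) events around some CPU work followed by a sleep."""
    start = record_event("work", "enter")
    sum(range(200_000))   # burns CPU: advances both clocks
    time.sleep(0.05)      # blocks: advances only the wall clock
    end = record_event("work", "exit")
    return start, end
```

The wall-clock stamps from different threads share one timeline, which is presumably what lets a viewer order events globally; the per-thread CPU stamps still show how much of that interval the thread actually spent running.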
949c3ec207a7720fb47f7b3ca1f84dfcfd70aaa9 |
|
25-Jun-2011 |
Jeff Brown <jeffbrown@google.com> |
Add a dual clock profiler tracing mode.
Dalvik previously supported using either the thread-cpu clock or the real time clock as the timebase for profiler traces. This change adds a dual clock mode where both thread-cpu time and real time timestamps are collected. Using dual clock mode significantly improves TraceView's ability to accurately reconstruct the global timeline of events, particularly on SMP systems.
For now, thread-cpu mode remains the default. Dual clock mode can be enabled by running the following command and restarting the system server:
    adb shell setprop dalvik.vm.extra-opts -Xprofile:dualclock
Change-Id: I8c0d91a99aa6829dadea328e54dc1225d9827391
|
375fb116bcb817b37509ab579dbd55cdbb765cbf |
|
15-Jun-2011 |
Carl Shapiro <cshapiro@google.com> |
Normalize the include guard style.
An identifier that begins with an underscore followed by a capital letter is reserved in C and C++. This change also moves any #include directives inside the include guard in some of the compiler/codegen/arm header files.
Change-Id: I9715e2c5301699d31886e61d0fe6e29483555a2a
|
d862faa2ceae186da5518607505eb942d634ced9 |
|
28-Apr-2011 |
Carl Shapiro <cshapiro@google.com> |
Get rid of unneeded extern, enum, typedef and struct qualifiers. Change-Id: I236c5a1553a51f82c9bc3eaaab042046c854d3b4
|
ae188c676c681e47a93ade7fdf0144099b470e03 |
|
08-Apr-2011 |
Carl Shapiro <cshapiro@google.com> |
Compile the garbage collector and heap profiler as C++. Change-Id: I25d8fa821987a3dd6d7109d07fd42dbf2fe0e589
|
9a3147c7412f4794434b4c2604aa2ba784867774 |
|
03-Mar-2011 |
buzbee <buzbee@google.com> |
Interpreter restructuring
This is a restructuring of the Dalvik ARM and x86 interpreters:
  o Combine the old portstd and portdbg interpreters into a single portable interpreter.
  o Add debug/profiling support to the fast (mterp) interpreters.
  o Delete the old mechanism of switching between interpreters. Now, once you choose an interpreter at startup, you stick with it.
  o Allow JIT to co-exist with profiling & debugging (necessary for first-class support of debugging with the JIT active).
  o Add single-step capability to the fast assembly interpreters without slowing them down (and, in fact, measurably improving their performance).
  o Remove the old "polling for safe point" mechanism. Breakouts are now achieved by modifying the base of the interpreter handler table.
  o Simplify the interpreter control mechanism.
  o Allow thread-granularity control for profiling & debugging.
The primary motivation behind this change was to improve the responsiveness of debugging and profiling and to make it easier to add new debugging and profiling capabilities in the future. Instead of always bailing out to the slow debug portable interpreter, we can now stay in the fast interpreter.
A nice side effect of the change is that the fast interpreters got a healthy speed boost because we were able to replace the polling safepoint check that involved a dozen or so instructions with a single table-base reload. When combined with the two earlier CLs related to this restructuring, we show a 5.6% performance improvement using libdvm_interp.so on the Checkers benchmark relative to Honeycomb.
Change-Id: I8d37e866b3618def4e582fc73f1cf69ffe428f3c
|
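The "breakout by modifying the base of the handler table" idea can be sketched with a toy interpreter. This is an illustrative Python model, not the mterp implementation: the opcodes, `MAIN_TABLE`, `make_break_table`, `run` and `demo` are all hypothetical names, and a dictionary stands in for the assembly handler table.

```python
def op_push1(vm):
    vm["stack"].append(1)


def op_add(vm):
    b, a = vm["stack"].pop(), vm["stack"].pop()
    vm["stack"].append(a + b)


# Fast-path handlers: no debug or profiling checks at all.
MAIN_TABLE = {"push1": op_push1, "add": op_add}


def make_break_table(main_table, hook):
    """Build an alternate table whose handlers call hook, then the real handler."""
    def wrap(name, handler):
        def alt(vm):
            hook(vm, name)
            handler(vm)
        return alt
    return {name: wrap(name, h) for name, h in main_table.items()}


def run(program, vm):
    for opcode in program:
        # The only per-instruction cost is re-reading the current table base.
        vm["table"][opcode](vm)


def demo():
    seen = []
    vm = {"stack": [], "table": MAIN_TABLE}
    break_table = make_break_table(MAIN_TABLE, lambda _vm, name: seen.append(name))
    run(["push1", "push1"], vm)   # fast path, hook never runs
    vm["table"] = break_table     # e.g. a debugger or profiler asks for control
    run(["add", "push1"], vm)     # every instruction now breaks out to the hook
    vm["table"] = MAIN_TABLE      # request withdrawn: back to the fast path
    run(["add"], vm)
    return vm["stack"], seen
```

In this model nothing polls for a pending request; whoever wants control swaps the table reference, and the interpreter picks up the change on its next dispatch.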
2ff04ab635eeba79c2dad82850c34188abcdfe62 |
|
09-Mar-2011 |
Andy McFadden <fadden@android.com> |
Fix method profiling
Moved a couple of things out of MethodTraceState so they don't get zeroed after being initialized. Also, rearranged the native method invocation path slightly so the common case runs uninterrupted.
Change-Id: I0dad007a7f344d93f30444156e67f20bed6606a4
|
9f601a917c8878204482c37aec7005054b6776fa |
|
12-Feb-2011 |
buzbee <buzbee@google.com> |
Interpreter restructuring: eliminate InterpState
The key data structure for the interpreter is InterpState. This change eliminates it, merging its data with the Thread structure. Here's why:
In principio creavit Fadden Thread et InterpState. And it was good. Thread holds thread-private state, while InterpState captures data associated with a Dalvik interpreter activation. Because JNI calls can result in nested interpreter invocations, we can have more than one InterpState for each actual thread. InterpState was relatively small, and it all worked well. It was used enough that in the ARM version a register (rGLUE) was dedicated to it.
Then, along came the JIT guys, who saw InterpState as a convenient place to dump all sorts of useful data that they wanted quick access to through that dedicated register. InterpState grew and grew. In terms of space, this wasn't a big problem - but it did mean that the initialization cost of each interpreter activation grew as well. For applications that do a lot of callbacks from native code into Dalvik, this is measurable. It's also mostly useless cost because much of the JIT-related InterpState initialization was setting up useful constants - things that don't need to be saved and restored all the time.
The biggest problem, though, deals with thread control. When something interesting is happening that needs all threads to be stopped (such as GC and debugger attach), we have access to all of the Thread structures, but we don't have access to all of the InterpState structures (which may be buried/nested on the native stack). As a result, polling for thread suspension is done via a one-indirection pointer chase. InterpState itself can't hold the stop bits because we can't always find it, so instead it holds a pointer to the global or thread-specific stop control. Yuck.
With this change, we eliminate InterpState and merge all needed data into Thread. Further, we replace the dedicated rGLUE register with a pointer to the Thread structure (rSELF). The small subset of state data that needs to be saved and restored across nested interpreter activations is collected into a record that is saved to the interpreter frame, and restored on exit. Further, these small records are linked together to allow tracebacks to show nested activations. Old InterpState variables that simply contain useful constants are initialized once at thread creation time.
This CL is large enough by itself that the new ability to streamline suspend checks is not done here - that will happen in a future CL. Here we just focus on consolidation.
Change-Id: Ide6b2fb85716fea454ac113f5611263a96687356
|
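The save-and-restore scheme described above (a small record per nested activation, linked to the previous one) can be modeled in a few lines. This is a hedged Python sketch rather than the Dalvik data layout: `SaveRecord`, `Thread`, `enter_interpreter`, `exit_interpreter` and `traceback` are hypothetical names, and only a method name and a pc are saved.

```python
class SaveRecord:
    """State that must survive a nested interpreter activation."""
    def __init__(self, pc, method, prev):
        self.pc = pc
        self.method = method
        self.prev = prev   # link to the record of the enclosing activation


class Thread:
    def __init__(self):
        self.pc = None
        self.method = None
        self.save_chain = None
        # Constants are set up once at thread creation, not per activation.
        self.constants = {"example_constant": 42}


def enter_interpreter(thread, method):
    # Save only the small mutable subset, then start the new activation.
    thread.save_chain = SaveRecord(thread.pc, thread.method, thread.save_chain)
    thread.method, thread.pc = method, 0


def exit_interpreter(thread):
    rec = thread.save_chain
    thread.pc, thread.method = rec.pc, rec.method
    thread.save_chain = rec.prev


def traceback(thread):
    """List active methods, innermost first, by walking the linked records."""
    names = []
    if thread.method is not None:
        names.append(thread.method)
    rec = thread.save_chain
    while rec is not None:
        if rec.method is not None:
            names.append(rec.method)
        rec = rec.prev
    return names
```

Because everything hangs off the thread object, code that can see all threads can also see every nested activation, which is the property the commit message says was missing with InterpState.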
18fba346582c08d81aa96d9508c0e935bad5f36f |
|
20-Jan-2011 |
buzbee <buzbee@google.com> |
Support traceview-style profiling in all builds
This change builds on an earlier bccheng change that allowed JIT'd code to avoid reverting to the debug portable interpreter when doing traceview-style method profiling. That CL introduced a new traceview build (libdvm_traceview) because the performance delta was too great to enable the capability for all builds. In this CL, we remove the libdvm_traceview build and provide full-speed method tracing in all builds. This is done by introducing "_PROF" versions of invoke and return templates used by the JIT. Normally, these templates are not used, and performance is unaffected. However, when method profiling is enabled, all existing translations are purged and new translations are created using the _PROF templates. These templates introduce a smallish performance penalty above and beyond the actual tracing cost, but again are only used when tracing has been enabled.
Strictly speaking, there is a slight burden that is placed on invokes and returns in the non-tracing case - on the order of an additional 3 or 4 cycles per invoke/return. Those operations are already heavyweight enough that I was unable to measure the added cost in benchmarks.
Change-Id: Ic09baf4249f1e716e136a65458f4e06cea35fc18
|
cb3081f675109049e63380170b60871e8275f9a8 |
|
14-Jan-2011 |
buzbee <buzbee@google.com> |
Consolidate mterp's debug/profile/suspend control
This is a step towards full debug & profiling support in JIT'd code. Previously, the interpreter made multiple distinct checks for pending suspend requests, debugger and profiler checks at each safe point. This CL moves the individual controls into a single control word, significantly speeding up the safe-point check code path in the common fast case. In short, any time some VM component wants control to break at a safe point it will set a bit in gDvm.interpBreak, which will be examined at the safe point check in footer.S.
In the old code, the safe point check consisted of 11 instructions (including 6 loads). The new sequence is 6 instructions (4 loads - two of which are needed and two are speculative to fill otherwise stalling slots). This code path is hot enough in the interpreter that we actually see some measurable speedups in benchmarks. The old sieve benchmark improves from 252 to 256 (~1.5%).
As part of the change, global debuggerActive and activeProfilers variables have been eliminated as redundant. Note also that there is a subtle change in thread suspension. Thread suspend request counts are kept on a per-thread basis, and previously each thread would only examine its own suspend count. With this change, a bit has been allocated in interpBreak to signify that at least one suspend request is active across all threads. This bit is treated as "some thread is supposed to suspend, check to see if it's me".
Change-Id: I527dc918f58d1486ef3324136080ef541a775ba8
|
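A minimal Python sketch of the single-control-word idea follows. It only models the logic: the bit assignments, `VmState`, `request_suspend`, `resume` and `safe_point_check` are hypothetical, and `interp_break` merely stands in for gDvm.interpBreak.

```python
# Hypothetical bit assignments; the real layout of gDvm.interpBreak is not given here.
SUSPEND_PENDING = 1 << 0   # "some thread is supposed to suspend"
DEBUGGER_ACTIVE = 1 << 1
PROFILER_ACTIVE = 1 << 2


class VmState:
    def __init__(self):
        self.interp_break = 0      # the single control word
        self.suspend_counts = {}   # suspend request counts stay per-thread


def request_suspend(vm, thread_id):
    vm.suspend_counts[thread_id] = vm.suspend_counts.get(thread_id, 0) + 1
    vm.interp_break |= SUSPEND_PENDING


def resume(vm, thread_id):
    vm.suspend_counts[thread_id] -= 1
    if not any(vm.suspend_counts.values()):
        vm.interp_break &= ~SUSPEND_PENDING   # no request left anywhere


def safe_point_check(vm, thread_id):
    """Return the actions this thread must take at a safe point."""
    if vm.interp_break == 0:
        return []                  # common fast case: one test, nothing to do
    actions = []
    # Global bit set: check whether the request is actually for this thread.
    if vm.interp_break & SUSPEND_PENDING and vm.suspend_counts.get(thread_id, 0) > 0:
        actions.append("suspend")
    if vm.interp_break & DEBUGGER_ACTIVE:
        actions.append("debugger")
    if vm.interp_break & PROFILER_ACTIVE:
        actions.append("profiler")
    return actions
```

The trade-off visible in this sketch is that a thread with no suspend request of its own still takes the slow path while another thread's request is pending, in exchange for a single test in the common case.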
dafced82f3cd926daecd14fb99cf4a7bbc11994f |
|
22-Dec-2010 |
Carl Shapiro <cshapiro@google.com> |
Restore a few external allocation constants for compatibility.
Aspects of the external allocation facility were exposed through the VMDebug getAllocCount method. In a previous change I removed all of the references to external allocation from getAllocCount. This had the unfortunate side effect of breaking some CTS tests and causing the VM to abort if the old constants were provided to getAllocCount on an asserts-enabled dalvik build.
The straightforward workaround seems to be to restore the special treatment of these values in getAllocCount for as long as we support the public interfaces of the external allocation facility. An easier way out may have been to make the failure case of getAllocCount return 0 instead of returning -1 and aborting on an asserts build. Without some analysis of API usage in market I would prefer to not change the -1 return value to 0 as it seems the thread counts currently return -1.
This change also eliminates the conditional export of the enum values related to external allocation. Those values are published API so it makes no sense to maintain a way to guard their inclusion.
Change-Id: I49c173e0ec305536760c7aec15eebdc29213fc56
|
e7bdd8b8c6f3aae552b333d0bd9664ef5e63f0a0 |
|
18-Dec-2010 |
Carl Shapiro <cshapiro@google.com> |
Remove the external allocation facility. Change-Id: Iff508a9173382f29c67ca9e6eb6f65855dce0be4
|
5cc61d70ec727aa22f58463bf7940cc717cf3eb1 |
|
31-Aug-2010 |
Ben Cheng <bccheng@android.com> |
Collect method traces with the fast interpreter and the JIT'ed code.
Insert inline code instead of switching to the debug interpreter in the hope that the time stamps collected in traceview are closer to the real world behavior with minimal profiling overhead. Because the inline polling still introduces additional overhead (20% ~ 100%), it is only enabled in the special VM build called "libdvm_traceview.so". It won't work on the emulator because it is not implemented to collect the detailed instruction traces.
Here are some performance numbers using the FibonacciSlow microbenchmark (i.e. recursive workloads / the shorter the faster):
    time          configuration
    8,162,602     profiling off / libdvm.so / JIT off
    2,801,829     profiling off / libdvm.so / JIT on
    9,952,236     profiling off / libdvm_traceview.so / JIT off
    4,465,701     profiling off / libdvm_traceview.so / JIT on
    164,786,585   profiling on / libdvm.so / JIT off
    164,664,634   profiling on / libdvm.so / JIT on
    11,231,707    profiling on / libdvm_traceview.so / JIT off
    8,427,846     profiling on / libdvm_traceview.so / JIT on
Comparing the 8,427,846 vs 164,664,634 numbers against the true baseline performance number of 2,801,829, the new libdvm_traceview.so improves the time skew from 58x to 3x.
Change-Id: I48611a3a4ff9c4950059249e5503c26abd6b138e
|
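The skew figures quoted above follow directly from the table. As a quick check, this Python snippet recomputes them; the dictionary keys and the `skew` helper are just a convenient encoding chosen for this example.

```python
# (profiling, library, JIT) -> time, copied from the commit message.
TIMES = {
    ("off", "libdvm.so", "off"): 8_162_602,
    ("off", "libdvm.so", "on"): 2_801_829,
    ("off", "libdvm_traceview.so", "off"): 9_952_236,
    ("off", "libdvm_traceview.so", "on"): 4_465_701,
    ("on", "libdvm.so", "off"): 164_786_585,
    ("on", "libdvm.so", "on"): 164_664_634,
    ("on", "libdvm_traceview.so", "off"): 11_231_707,
    ("on", "libdvm_traceview.so", "on"): 8_427_846,
}


def skew(config, baseline_config):
    """How many times slower config is than baseline_config."""
    return TIMES[config] / TIMES[baseline_config]


BASELINE = ("off", "libdvm.so", "on")   # the "true baseline" in the message

old_skew = skew(("on", "libdvm.so", "on"), BASELINE)
new_skew = skew(("on", "libdvm_traceview.so", "on"), BASELINE)

# Cost of the inline polling with profiling switched off, per JIT setting.
idle_overhead_jit_off = skew(("off", "libdvm_traceview.so", "off"),
                             ("off", "libdvm.so", "off")) - 1
idle_overhead_jit_on = skew(("off", "libdvm_traceview.so", "on"),
                            ("off", "libdvm.so", "on")) - 1

print(f"skew while profiling: {old_skew:.1f}x -> {new_skew:.1f}x")
print(f"idle overhead: {idle_overhead_jit_off:.0%} (JIT off), "
      f"{idle_overhead_jit_on:.0%} (JIT on)")
```

Both idle-overhead ratios fall inside the 20% ~ 100% range the message cites for the inline polling.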
0d615c3ce5bf97ae65b9347ee77968f38620d5e8 |
|
18-Aug-2010 |
Andy McFadden <fadden@android.com> |
Always support debugging and profiling. This eliminates the use of the WITH_DEBUGGER and WITH_PROFILER conditional compilation flags. We've never shipped a device without these features, and it's unlikely we ever will. They're not worth the code clutter they cause. As usual, since I can't test the x86-atom code I left that alone and added an item to the TODO list. Bug 2923442. Change-Id: I335ebd5193bc86f7641513b1b41c0378839be1fe
|
fc3d31683a0120ba005f45f98dcbe1001064dafb |
|
05-Aug-2010 |
Andy McFadden <fadden@android.com> |
More SMP fixes.
Convert some ANDROID_MEMBAR_FULL uses into equivalent atomic ops. A couple of "bool"s had to be converted to "int" since we don't have atomic ops for bools. Replaced a local implementation of atomic inc with a call to the atomic inc function.
Change-Id: I948b8080d743552bde014d3a6e716ed2c30ebef8
|
e15a8eb2653da80c1c3816ddce8186746b57b4a3 |
|
23-Feb-2010 |
Andy McFadden <fadden@android.com> |
Add class init stats to alloc counters (API change). Add calls to retrieve class initialization stats via the allocation count mechanism. Also: deprecate a method that is never used, and a redundantly declared default filename that begins with "/sdcard". For bug 2461549.
|
0171812e59e2520a4345b9bbadd4f7afa0a1de16 |
|
23-Jan-2010 |
Andy McFadden <fadden@android.com> |
Add streaming method profiling support.
The goal is to allow DDMS to start/stop method profiling in apps that don't have permission to write to /sdcard. Instead of writing the profiling data to disk and then pulling it off, we just blast the whole thing straight from memory.
This includes:
  - New method tracing start call (startMethodTracingDdms).
  - Rearrangement of existing VMDebug method tracing calls for sanity.
  - Addition of "vector" chunk send function, with corresponding update to the JDWP transport function.
  - Reshuffled the method trace start interlock, which seemed racy.
  - Post new method-trace-profiling-streaming feature to DDMS.
Also:
  - Added an internal exception-throw function that allows a printf format string, so we can put useful detail into exception messages.
For bug 2160407.
|
0f0ae023a3a53f7c9e254283b50a0099781acb79 |
|
24-Jun-2009 |
Dianne Hackborn <hackbod@google.com> |
Add FileDescriptor variation of startMethodTracing(). This is for bug #1829561 ("am profile" with bad filename kills process), which will allow the am command to take care of opening the file and handing the resulting fd over to the process to be profiled.
|
8c880b9e903504fa9c61d9964ba2379f0e060af5 |
|
25-Mar-2009 |
Andy McFadden <> |
Automated import from //branches/donutburger/...@140700,140700
|
99409883d9c4c0ffb49b070ce307bb33a9dfe9f1 |
|
19-Mar-2009 |
The Android Open Source Project <initial-contribution@android.com> |
auto import //branches/master/...@140412
|
f6c387128427e121477c1b32ad35cdcaa5101ba3 |
|
04-Mar-2009 |
The Android Open Source Project <initial-contribution@android.com> |
auto import from //depot/cupcake/@135843
|
f72d5de56a522ac3be03873bdde26f23a5eeeb3c |
|
04-Mar-2009 |
The Android Open Source Project <initial-contribution@android.com> |
auto import from //depot/cupcake/@135843
|
2ad60cfc28e14ee8f0bb038720836a4696c478ad |
|
21-Oct-2008 |
The Android Open Source Project <initial-contribution@android.com> |
Initial Contribution
|