History log of /dalvik/vm/Profile.h
Revision Date Author Comments
9f640af99bebc6e96f6e1e9903557e2c8f567483 20-Jul-2011 Jeff Brown <jeffbrown@android.com> Add a dual clock profiler tracing mode.

Dalvik previously supported using either the thread-cpu clock
or the real time clock as the timebase for profiler traces.
This change adds a dual clock mode where both thread-cpu time
and real time timestamps are collected.

Using dual clock mode significantly improves TraceView ability
to accurately reconstruct the global timeline of events,
particularly on SMP systems.

For now, thread-cpu mode remains the default.

Dual clock mode can be enabled by running the following command
and restarting the system server.
adb shell setprop dalvik.vm.extra-opts -Xprofile:dualclock

Change-Id: I14db2ae93325ac01efcc8ed02e8747cc0e834e29
949c3ec207a7720fb47f7b3ca1f84dfcfd70aaa9 25-Jun-2011 Jeff Brown <jeffbrown@google.com> Add a dual clock profiler tracing mode.

Dalvik previously supported using either the thread-cpu clock
or the real time clock as the timebase for profiler traces.
This change adds a dual clock mode where both thread-cpu time
and real time timestamps are collected.

Using dual clock mode significantly improves TraceView ability
to accurately reconstruct the global timeline of events,
particularly on SMP systems.

For now, thread-cpu mode remains the default.

Dual clock mode can be enabled by running the following command
and restarting the system server.
adb shell setprop dalvik.vm.extra-opts -Xprofile:dualclock

Change-Id: I8c0d91a99aa6829dadea328e54dc1225d9827391
375fb116bcb817b37509ab579dbd55cdbb765cbf 15-Jun-2011 Carl Shapiro <cshapiro@google.com> Normalize the include guard style.

An leading underscore followed by a capital letter is a reserved
name space in C and C++.

This change also moves any #include directives within the include
guard in some of the compiler/codegen/arm header files.

Change-Id: I9715e2c5301699d31886e61d0fe6e29483555a2a
d862faa2ceae186da5518607505eb942d634ced9 28-Apr-2011 Carl Shapiro <cshapiro@google.com> Get rid of uneeded extern, enum, typedef and struct qualifiers.

Change-Id: I236c5a1553a51f82c9bc3eaaab042046c854d3b4
ae188c676c681e47a93ade7fdf0144099b470e03 08-Apr-2011 Carl Shapiro <cshapiro@google.com> Compile the garbage collector and heap profiler as C++.

Change-Id: I25d8fa821987a3dd6d7109d07fd42dbf2fe0e589
9a3147c7412f4794434b4c2604aa2ba784867774 03-Mar-2011 buzbee <buzbee@google.com> Interpreter restructuring

This is a restructuring of the Dalvik ARM and x86 interpreters:

o Combine the old portstd and portdbg interpreters into a single
portable interpreter.
o Add debug/profiling support to the fast (mterp) interpreters.
o Delete old mechansim of switching between interpreters. Now, once
you choose an interpreter at startup, you stick with it.
o Allow JIT to co-exist with profiling & debugging (necessary for
first-class support of debugging with the JIT active).
o Adds single-step capability to the fast assembly interpreters without
slowing them down (and, in fact, measurably improves their performance).
o Remove old "polling for safe point" mechanism. Breakouts now achieved
via modifying base of interpreter handler table.
o Simplify interpeter control mechanism.
o Allow thread-granularity control for profiling & debugging

The primary motivation behind this change was to improve the responsiveness
of debugging and profiling and to make it easier to add new debugging and
profiling capabilities in the future. Instead of always bailing out to the
slow debug portable interpreter, we can now stay in the fast interpreter.

A nice side effect of the change is that the fast interpreters
got a healthy speed boost because we were able to replace the
polling safepoint check that involved a dozen or so instructions
with a single table-base reload. When combined with the two earlier CLs
related to this restructuring, we show a 5.6% performance improvement
using libdvm_interp.so on the Checkers benchmark relative to Honeycomb.

Change-Id: I8d37e866b3618def4e582fc73f1cf69ffe428f3c
2ff04ab635eeba79c2dad82850c34188abcdfe62 09-Mar-2011 Andy McFadden <fadden@android.com> Fix method profiling

Moved a couple of things out of MethodTraceState so they don't get
zeroed after being initialized.

Also, rearranged the native method invocation path slightly so the
common case runs uninterrupted.

Change-Id: I0dad007a7f344d93f30444156e67f20bed6606a4
9f601a917c8878204482c37aec7005054b6776fa 12-Feb-2011 buzbee <buzbee@google.com> Interpreter restructuring: eliminate InterpState

The key datastructure for the interpreter is InterpState.
This change eliminates it, merging its data with the Thread structure.

Here's why:

In principio creavit Fadden Thread et InterpState. And it was good.

Thread holds thread-private state, while InterpState captures data
associated with a Dalvik interpreter activation. Because JNI calls
can result in nested interpreter invocations, we can have more than one
InterpState for each actual thread. InterpState was relatively small,
and it all worked well. It was used enough that in the Arm version
a register (rGLUE) was dedicated to it.

Then, along came the JIT guys, who saw InterpState as a convenient place
to dump all sorts of useful data that they wanted quick access to through
that dedicated register. InterpState grew and grew. In terms of
space, this wasn't a big problem - but it did mean that the initialization
cost of each interpreter activation grew as well. For applications
that do a lot of callbacks from native code into Dalvik, this is
measurable. It's also mostly useless cost because much of the JIT-related
InterpState initialization was setting up useful constants - things that
don't need to be saved and restored all the time.

The biggest problem, though, deals with thread control. When something
interesting is happening that needs all threads to be stopped (such as
GC and debugger attach), we have access to all of the Thread structures,
but we don't have access to all of the InterpState structures (which
may be buried/nested on the native stack). As a result, polling for
thread suspension is done via a one-indirection pointer chase. InterpState
itself can't hold the stop bits because we can't always find it, so
instead it holds a pointer to the global or thread-specific stop control.

Yuck.

With this change, we eliminate InterpState and merge all needed data
into Thread. Further, we replace the decidated rGLUE register with a
pointer to the Thread structure (rSELF). The small subset of state
data that needs to be saved and restored across nested interpreter
activations is collected into a record that is saved to the interpreter
frame, and restored on exit. Further, these small records are linked
together to allow tracebacks to show nested activations. Old InterpState
variables that simply contain useful constants are initialized once at
thread creation time.

This CL is large enough by itself that the new ability to streamline
suspend checks is not done here - that will happen in a future CL. Here
we just focus on consolidation.

Change-Id: Ide6b2fb85716fea454ac113f5611263a96687356
18fba346582c08d81aa96d9508c0e935bad5f36f 20-Jan-2011 buzbee <buzbee@google.com> Support traceview-style profiling in all builds

This change builds on an earlier bccheng change that allowed JIT'd code
to avoid reverting to the debug portable interpeter when doing traceview-style
method profiling. That CL introduced a new traceview build (libdvm_traceview)
because the performance delta was too great to enable the capability for
all builds.

In this CL, we remove the libdvm_traceview build and provide full-speed
method tracing in all builds. This is done by introducing "_PROF"
versions of invoke and return templates used by the JIT. Normally, these
templates are not used, and performace in unaffected. However, when method
profiling is enabled, all existing translation are purged and new translations
are created using the _PROF templates. These templates introduce a
smallish performance penalty above and beyond the actual tracing cost, but
again are only used when tracing has been enabled.

Strictly speaking, there is a slight burden that is placed on invokes and
returns in the non-tracing case - on the order of an additional 3 or 4
cycles per invoke/return. Those operations are already heavyweight enough
that I was unable to measure the added cost in benchmarks.

Change-Id: Ic09baf4249f1e716e136a65458f4e06cea35fc18
cb3081f675109049e63380170b60871e8275f9a8 14-Jan-2011 buzbee <buzbee@google.com> Consolidate mterp's debug/profile/suspend control

This is a step towards full debug & profiling support in JIT'd code.
Previously, the interpreter made multiple distinct checks for pending
suspend requests, debugger and profiler checks at each safe point.
This CL moves the individual controls into a single control word,
significantly speeding up the safe-point check code path in the common
fast case.

In short, any time some VM component wants control to break at a safe
point it will set a bit in gDvm.interpBreak, which will be examined
at the safe point check in footer.S. In the old code, the safe point
check consisted of 11 instructions (including 6 loads). The new sequence
is 6 instructions (4 loads - two of which are needed and two are
speculative to fill otherwise stalling slots).

This code path is hot enough in the interpreter that we actually see
some measureable speedups in benchmarks. The old sieve benchmark
improves from 252 to 256 (~1.5%).

As part of the change, global debuggerActive and activeProfilers variables
have been eliminated as redundant. Note also that there is a subtle
change in thread suspension. Thread suspend request counts are kept on
a per-thread basis, and previously each thread would only examine its own
suspend count. With this change, a bit has been allocated in interpBreak
to signify that at least one suspend request is active across all
threads. This bit is treated as "some thread is supposed to
suspend, check to see if it's me".

Change-Id: I527dc918f58d1486ef3324136080ef541a775ba8
dafced82f3cd926daecd14fb99cf4a7bbc11994f 22-Dec-2010 Carl Shapiro <cshapiro@google.com> Restore a few external allocation constants for compatibility.

Aspects of the external allocation facility were exposed through the
VMDebug getAllocCount method. In a previous change I removed all of
the references to external allocation from getAllocCount. This had
the unfortunate side effect of breaking some CTS tests and causing the
VM to abort if the old constants were provided to getAllocCount on an
asserts enabled dalvik build.

The straight forward workaround seems to be to restore the special
treatment of these values in getAllocCount for as long as we support
the public interfaces of the external allocation facility. An easier
way out may have been to make the failure case of getAllocCount return
0 instead of -1 and aborting on an asserts build. Without some
analysis of API usage in market I would prefer to not change the -1
return value to 0 as it seems the thread counts currently return -1.

This change also eliminates the conditional export of the enum values
related to external allocation. Those values are published API so it
makes no sense to maintain a way to guard their inclusion.

Change-Id: I49c173e0ec305536760c7aec15eebdc29213fc56
e7bdd8b8c6f3aae552b333d0bd9664ef5e63f0a0 18-Dec-2010 Carl Shapiro <cshapiro@google.com> Remove the external allocation facility.

Change-Id: Iff508a9173382f29c67ca9e6eb6f65855dce0be4
5cc61d70ec727aa22f58463bf7940cc717cf3eb1 31-Aug-2010 Ben Cheng <bccheng@android.com> Collect method traces with the fast interpreter and the JIT'ed code.

Insert inline code instead of switching to the debug interpreter in the hope
that the time stamps collected in traceview are more close to the real
world behavior with minimal profiling overhead.

Because the inline polling still introduces additional overhead (20% ~ 100%),
it is only enabled in the special VM build called "libdvm_traceview.so".
It won't work on the emulator because it is not implemented to collect the
detailed instruction traces.

Here are some performance numbers using the FibonacciSlow microbenchmark
(ie recursive workloads / the shorter the faster):

time: configuration
8,162,602: profiling off/libdvm.so/JIT off
2,801,829: profiling off/libdvm.so/JIT on
9,952,236: profiling off/libdvm_traceview.so/JIT off
4,465,701: profiling off/libdvm_traceview.so/JIT on
164,786,585: profiling on/libdvm.so/JIT off
164,664,634: profiling on/libdvm.so/JIT on
11,231,707: profiling on/libdvm_traceview.so/JIT off
8,427,846: profiling on/libdvm_traceview.so/JIT on

Comparing the 8,427,846 vs 164,664,634 numbers againt the true baseline
performance number of 2,801,829, the new libdvm_traceview.so improves the time
skew from 58x to 3x.

Change-Id: I48611a3a4ff9c4950059249e5503c26abd6b138e
0d615c3ce5bf97ae65b9347ee77968f38620d5e8 18-Aug-2010 Andy McFadden <fadden@android.com> Always support debugging and profiling.

This eliminates the use of the WITH_DEBUGGER and WITH_PROFILER
conditional compilation flags. We've never shipped a device without
these features, and it's unlikely we ever will. They're not worth
the code clutter they cause.

As usual, since I can't test the x86-atom code I left that alone and
added an item to the TODO list.

Bug 2923442.

Change-Id: I335ebd5193bc86f7641513b1b41c0378839be1fe
fc3d31683a0120ba005f45f98dcbe1001064dafb 05-Aug-2010 Andy McFadden <fadden@android.com> More SMP fixes.

Convert some ANDROID_MEMBAR_FULL uses into equivalent atomic ops. A
couple of "bool" had to convert to "int" since we don't have atomic
ops for bools.

Replaced a local implementation of atomic inc with a call to the
atomic inc function.

Change-Id: I948b8080d743552bde014d3a6e716ed2c30ebef8
e15a8eb2653da80c1c3816ddce8186746b57b4a3 23-Feb-2010 Andy McFadden <fadden@android.com> Add class init stats to alloc counters (API change).

Add calls to retrieve class initialization stats via the allocation
count mechanism.

Also: deprecate a method that is never used, and a redundantly declared
default filename that begins with "/sdcard".

For bug 2461549.
0171812e59e2520a4345b9bbadd4f7afa0a1de16 23-Jan-2010 Andy McFadden <fadden@android.com> Add streaming method profiling support.

The goal is to allow DDMS to start/stop method profiling in apps that
don't have permission to write to /sdcard. Instead of writing the
profiling data to disk and then pulling it off, we just blast the whole
thing straight from memory.

This includes:

- New method tracing start call (startMethodTracingDdms).
- Rearrangement of existing VMDebug method tracing calls for sanity.
- Addition of "vector" chunk send function, with corresponding
update to the JDWP transport function.
- Reshuffled the method trace start interlock, which seemed racy.
- Post new method-trace-profiling-streaming feature to DDMS.

Also:

- Added an internal exception-throw function that allows a printf
format string, so we can put useful detail into exception messages.

For bug 2160407.
0f0ae023a3a53f7c9e254283b50a0099781acb79 24-Jun-2009 Dianne Hackborn <hackbod@google.com> Add FileDescriptor variation of startMethodTracing().

This is for bug #1829561 ("am profile" with bad filename kills process), which
will allow the am command to take care of opening the file and handing the
resulting fd over to the process to be profiled.
8c880b9e903504fa9c61d9964ba2379f0e060af5 25-Mar-2009 Andy McFadden <> Automated import from //branches/donutburger/...@140700,140700
99409883d9c4c0ffb49b070ce307bb33a9dfe9f1 19-Mar-2009 The Android Open Source Project <initial-contribution@android.com> auto import //branches/master/...@140412
f6c387128427e121477c1b32ad35cdcaa5101ba3 04-Mar-2009 The Android Open Source Project <initial-contribution@android.com> auto import from //depot/cupcake/@135843
f72d5de56a522ac3be03873bdde26f23a5eeeb3c 04-Mar-2009 The Android Open Source Project <initial-contribution@android.com> auto import from //depot/cupcake/@135843
2ad60cfc28e14ee8f0bb038720836a4696c478ad 21-Oct-2008 The Android Open Source Project <initial-contribution@android.com> Initial Contribution