History log of /dalvik/vm/compiler/template/armv5te/TEMPLATE_INVOKE_METHOD_NATIVE.S
Revision Date Author Comments
8b095215a4d5bde723819087f3455bdcc250a78f 20-Jun-2012 David Butcher <david.butcher@arm.com> Switched code to blx <reg>

The sequence "ldr ip,<addr>; blx ip" is preferred over "mov lr,pc; ldr pc,<addr>"
from armv5te onwards, and will typically perform better on later ARM processors.

Change-Id: I8f2e5e794c644faafd767037ad56579f2934de47
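
For reference, a minimal sketch of the two call sequences described above; the
base register and field offset are placeholders, not the actual template code:

        @ Old form (works back to ARMv4T): build lr by hand, then load pc.
        mov     lr, pc                  @ lr = address of the insn after the ldr
        ldr     pc, [r2, #offset]       @ jump to the native method
        @ New form (ARMv5T and later): load the target into a scratch
        @ register and call through blx, which sets lr itself and lets the
        @ core's return-address predictor track the call.
        ldr     ip, [r2, #offset]
        blx     ip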
30bc0d46ae730d78c42c39cfa56a59ba3025380b 22-Apr-2011 buzbee <buzbee@google.com> Consolidate curFrame fields in thread storage

We ended up with two locations in the Thread structure for the saved
Dalvik frame pointer. This change consolidates them.

Change-Id: I78f288e4e57e232f29663be930101e775bfe370f
9a3147c7412f4794434b4c2604aa2ba784867774 03-Mar-2011 buzbee <buzbee@google.com> Interpreter restructuring

This is a restructuring of the Dalvik ARM and x86 interpreters:

o Combine the old portstd and portdbg interpreters into a single
portable interpreter.
o Add debug/profiling support to the fast (mterp) interpreters.
o Delete old mechanism of switching between interpreters. Now, once
you choose an interpreter at startup, you stick with it.
o Allow JIT to co-exist with profiling & debugging (necessary for
first-class support of debugging with the JIT active).
o Add single-step capability to the fast assembly interpreters without
slowing them down (in fact, it measurably improves their performance).
o Remove old "polling for safe point" mechanism. Breakouts are now achieved
by modifying the base of the interpreter handler table.
o Simplify the interpreter control mechanism.
o Allow thread-granularity control for profiling & debugging.

The primary motivation behind this change was to improve the responsiveness
of debugging and profiling and to make it easier to add new debugging and
profiling capabilities in the future. Instead of always bailing out to the
slow debug portable interpreter, we can now stay in the fast interpreter.

A nice side effect of the change is that the fast interpreters
got a healthy speed boost because we were able to replace the
polling safepoint check that involved a dozen or so instructions
with a single table-base reload. When combined with the two earlier CLs
related to this restructuring, we show a 5.6% performance improvement
using libdvm_interp.so on the Checkers benchmark relative to Honeycomb.

Change-Id: I8d37e866b3618def4e582fc73f1cf69ffe428f3c
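
A rough sketch of the handler-table breakout mechanism described above; the
register aliases, field name and table layout are assumptions, not the actual
mterp dispatch code:

        rSELF   .req r6                 @ assumed: pointer to the Thread struct
        rPC     .req r4                 @ assumed: Dalvik program counter
        rIBASE  .req r8                 @ assumed: current handler-table base

        @ Per-instruction dispatch: one table-base reload, no flag polling.
        ldr     rIBASE, [rSELF, #offThread_curHandlerTable]
        ldrb    r0, [rPC]                       @ fetch the next opcode
        ldr     pc, [rIBASE, r0, lsl #2]        @ jump through the handler table
        @ To suspend, single-step or profile a thread, the VM stores the
        @ address of an alternate table into the Thread struct; the next
        @ dispatch then routes through handlers that run the checks first.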
9f601a917c8878204482c37aec7005054b6776fa 12-Feb-2011 buzbee <buzbee@google.com> Interpreter restructuring: eliminate InterpState

The key data structure for the interpreter is InterpState.
This change eliminates it, merging its data with the Thread structure.

Here's why:

In the beginning, Fadden created Thread and InterpState. And it was good.

Thread holds thread-private state, while InterpState captures data
associated with a Dalvik interpreter activation. Because JNI calls
can result in nested interpreter invocations, we can have more than one
InterpState for each actual thread. InterpState was relatively small,
and it all worked well. It was used enough that in the ARM version
a register (rGLUE) was dedicated to it.

Then, along came the JIT guys, who saw InterpState as a convenient place
to dump all sorts of useful data that they wanted quick access to through
that dedicated register. InterpState grew and grew. In terms of
space, this wasn't a big problem - but it did mean that the initialization
cost of each interpreter activation grew as well. For applications
that do a lot of callbacks from native code into Dalvik, this is
measurable. It's also a mostly useless cost because much of the JIT-related
InterpState initialization was setting up useful constants - things that
don't need to be saved and restored all the time.

The biggest problem, though, deals with thread control. When something
interesting is happening that needs all threads to be stopped (such as
GC and debugger attach), we have access to all of the Thread structures,
but we don't have access to all of the InterpState structures (which
may be buried/nested on the native stack). As a result, polling for
thread suspension is done via a one-indirection pointer chase. InterpState
itself can't hold the stop bits because we can't always find it, so
instead it holds a pointer to the global or thread-specific stop control.

Yuck.

With this change, we eliminate InterpState and merge all needed data
into Thread. Further, we replace the dedicated rGLUE register with a
pointer to the Thread structure (rSELF). The small subset of state
data that needs to be saved and restored across nested interpreter
activations is collected into a record that is saved to the interpreter
frame, and restored on exit. Further, these small records are linked
together to allow tracebacks to show nested activations. Old InterpState
variables that simply contain useful constants are initialized once at
thread creation time.

This CL is large enough by itself that the new ability to streamline
suspend checks is not done here - that will happen in a future CL. Here
we just focus on consolidation.

Change-Id: Ide6b2fb85716fea454ac113f5611263a96687356
18fba346582c08d81aa96d9508c0e935bad5f36f 20-Jan-2011 buzbee <buzbee@google.com> Support traceview-style profiling in all builds

This change builds on an earlier bccheng change that allowed JIT'd code
to avoid reverting to the debug portable interpreter when doing traceview-style
method profiling. That CL introduced a new traceview build (libdvm_traceview)
because the performance delta was too great to enable the capability for
all builds.

In this CL, we remove the libdvm_traceview build and provide full-speed
method tracing in all builds. This is done by introducing "_PROF"
versions of invoke and return templates used by the JIT. Normally, these
templates are not used, and performance is unaffected. However, when method
profiling is enabled, all existing translations are purged and new translations
are created using the _PROF templates. These templates introduce a
smallish performance penalty above and beyond the actual tracing cost, but
again they are only used when tracing has been enabled.

Strictly speaking, there is a slight burden that is placed on invokes and
returns in the non-tracing case - on the order of an additional 3 or 4
cycles per invoke/return. Those operations are already heavyweight enough
that I was unable to measure the added cost in benchmarks.

Change-Id: Ic09baf4249f1e716e136a65458f4e06cea35fc18
d88756df5b4dbc6fd450afd0019a5f64ebe4432d 22-Oct-2010 Elliott Hughes <enh@google.com> Remove junk from platform.S now armv4t is gone.

Change-Id: I30079aacc753c89cfc3a3f64bd900a0cc858d65f
c8293e7dfe856ca95e27aef1ac2e64d750d60662 12-Oct-2010 Ben Cheng <bccheng@android.com> Fine-tune the instructions on the method invocation path.

1) Initialize the register and out sizes for callee methods through
constant moves.
2) Eliminate an unnecessary load of Dalvik PC for chained and
native callees.

Improved method invocation performance by ~3%.

Change-Id: Iead1276eed0ba527e82eb876f08d169ab9b496b2
5cc61d70ec727aa22f58463bf7940cc717cf3eb1 31-Aug-2010 Ben Cheng <bccheng@android.com> Collect method traces with the fast interpreter and the JIT'ed code.

Insert inline code instead of switching to the debug interpreter in the hope
that the time stamps collected in traceview are closer to the real-world
behavior, with minimal profiling overhead.

Because the inline polling still introduces additional overhead (20% ~ 100%),
it is only enabled in the special VM build called "libdvm_traceview.so".
It won't work on the emulator because it doesn't collect the detailed
instruction traces.

Here are some performance numbers using the FibonacciSlow microbenchmark
(i.e. recursive workloads; shorter is faster):

time: configuration
8,162,602: profiling off/libdvm.so/JIT off
2,801,829: profiling off/libdvm.so/JIT on
9,952,236: profiling off/libdvm_traceview.so/JIT off
4,465,701: profiling off/libdvm_traceview.so/JIT on
164,786,585: profiling on/libdvm.so/JIT off
164,664,634: profiling on/libdvm.so/JIT on
11,231,707: profiling on/libdvm_traceview.so/JIT off
8,427,846: profiling on/libdvm_traceview.so/JIT on

Comparing the 8,427,846 vs 164,664,634 numbers against the true baseline
performance number of 2,801,829, the new libdvm_traceview.so improves the time
skew from 58x to 3x.

Change-Id: I48611a3a4ff9c4950059249e5503c26abd6b138e
7365493ad8d360c1dcf9cd8b6eee62747af01cae 09-Jun-2010 Carl Shapiro <cshapiro@google.com> Remove repeated newlines at the end of files.

Change-Id: I1e3d103a7b932ef21acedb6438c0f26b315df28f
fbdcfb9ea9e2a78f295834424c3f24986ea45dac 29-May-2010 Brian Carlstrom <bdc@google.com> Merge remote branch 'goog/dalvik-dev' into dalvik-dev-to-master

Change-Id: I0c0edb3ebf0d5e040d6bbbf60269fab0deb70ef9
978738d2cbf9d08fa78c65762eaac3351ab76b9a 13-May-2010 Ben Cheng <bccheng@android.com> Add counters to track JIT inline cache hit rate and code cache patch counts.

Also did some WITH_JIT_TUNING cleanup.

Change-Id: I8bb2d681a06b0f2af1f976a007326825a88cea38
a62475ecfcc80c58add8f153c9605762dafb8227 30-Apr-2010 Ben Cheng <bccheng@android.com> Use unsigned comparison for stack pointers.

Bug: 2613607
Change-Id: I6a8abd69fbf9cb9f8ec9d9febf1ea42fd631fe9c
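
A minimal sketch of why the comparison must be unsigned; the register roles
and labels are illustrative, not the actual template code:

        @ r0 = prospective frame pointer, r1 = stack limit. Both are plain
        @ addresses, i.e. unsigned values, and the stack grows downward.
        cmp     r0, r1
        bhs     1f                      @ unsigned "higher or same": frame fits
        @ stack-overflow handling would go here
1:
        @ normal invoke path continues here
        @ Note: a signed test (bge) would branch the wrong way whenever the
        @ two addresses lie on opposite sides of 0x80000000.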
86717f79d9b018f4d69cc991075fa36611f234e5 06-Mar-2010 Ben Cheng <bccheng@android.com> Collect more JIT stats in the assert build.

New stuff includes a breakdown of callsite types (i.e. monomorphic vs polymorphic
vs monomorphic resolved to native), total time spent JIT'ing, and average
JIT time per compilation.

Example output:
D/dalvikvm( 840): 4042 compilations using 1976 + 329108 bytes
D/dalvikvm( 840): Compiler arena uses 10 blocks (8100 bytes each)
D/dalvikvm( 840): Compiler work queue length is 0/36
D/dalvikvm( 840): size if 8192, entries used is 4137
D/dalvikvm( 840): JIT: 4137 traces, 8192 slots, 1099 chains, 40 thresh, Non-blocking
D/dalvikvm( 840): JIT: Lookups: 1128780 hits, 168564 misses; 179520 normal, 6 punt
D/dalvikvm( 840): JIT: noChainExit: 528464 IC miss, 194708 interp callsite, 0 switch overflow
D/dalvikvm( 840): JIT: Invoke: 507 mono, 988 poly, 72 native, 1038 return
D/dalvikvm( 840): JIT: Total compilation time: 2342 ms
D/dalvikvm( 840): JIT: Avg unit compilation time: 579 us
D/dalvikvm( 840): JIT: 3357 Translation chains, 97 interp stubs
D/dalvikvm( 840): dalvik.vm.jit.op = 0-2,4-5,7-8,a-c,e-16,19-1a,1c-23,26,28-29,2b-2f,31-3d,44-4b,4d-51,60,62-63,68-69,70-72,76-78,7b,81-82,84,87,89,8d-93,95-98,a1,a3,a6,a8-a9,b0-b3,b5-b6,bb-bf,c6-c8,d0,d2-d6,d8,da-e2,ee-f0,f2-fb,
D/dalvikvm( 840): Code size stats: 50666/105126 (compiled/total Dalvik), 329108 (native)
40094c16d9727cc1e047a7d4bddffe04dd566211 25-Feb-2010 Ben Cheng <bccheng@android.com> Tweak the interpreter entries and 2nd level trace filter to capture more traces.

Real changes:
1) Add a new entry point from JIT to the interpreter to request hot traces w/o
doing chaining.
2) Increase the granularity of the secondary profile filter to match 64-byte
chunks using 64 entries.

The remaining changes are just cosmetic.
7a0bcd0de6c4da6499a088a18d1750e51204c2a6 23-Jan-2010 Ben Cheng <bccheng@android.com> Tighten the safe points for code cache resets to happen.

Add a new flag in the Thread struct to track the whereabouts of the top frame
in each Java thread. It is not safe to blow away the code cache if any thread
is in the JIT'ed land.
60c24f436d603c564d5351a6f81821f12635733c 04-Jan-2010 Ben Cheng <bccheng@google.com> Tear down the code cache when it is full and restart from scratch.

Because the code cache may now be wiped out after safe points, the patching of
inline caches for predicted chains is done through the compiler thread's work
queue.
909b418219f63c0d0b2bde8a0835dbf27d5061b8 03-Dec-2009 Bill Buzbee <buzbee@google.com> Jit: Fix for 2187020, bad exception recovery from native invoke static
72e93344b4d1ffc71e9c832ec23de0657e5b04a5 13-Nov-2009 Jean-Baptiste Queru <jbq@google.com> eclair snapshot
d5ab726b65d7271be261864c7e224fb90bfe06e0 25-Aug-2009 Andy McFadden <fadden@android.com> Another round of scary indirect ref changes.

This change adds a not-really-working implementation to Jni.c, with
various changes #ifdefed throughout the code. The ifdef is currently
disabled, so the old behavior should continue. Eventually the old
version will be stripped out and the ifdefs removed.

This renames the stack's "localRefTop" field, which nudged a bunch of
code. The name wasn't really right before (it's the *bottom* of the
local references), and it's even less right now. This and one other
mterp-visible constant were changed, which caused some ripples through
mterp and the JIT, but the ifdeffing was limited to one in
asm-constants.h (and the constant is the same both ways, so toggling the
ifdef won't require rebuilding asm sources).

Some comments and arg names in ReferenceTable were updated for the
correct orientation of bottom vs. top.

Some adjustments were made to the JNI code, e.g. dvmCallMethod now needs
to understand if it needs to convert reference arguments from
local/global refs to pointers (it's called from various places
throughout the VM).
97319a8a234e9fe1cf90ca39aa6eca37d729afd5 13-Aug-2009 Jeff Hao <jeffhao@google.com> New changes to enable self verification mode.
38329f5678fd7a4879528b02a0ab60322d38a897 07-Jul-2009 Ben Cheng <bccheng@android.com> Improved method invocation performance: 1.5x for virtual and 2.8x for interface.

- Implemented predicted chaining for invoke virtual and interface.
- Eliminated a little bit of fat for invoke native.
- Added 078-polymorphic-virtual for stress tests.