History log of /dalvik/vm/compiler/template/out/CompilerTemplateAsm-armv7-a-neon.S
Revision Date Author Comments
547a2b20f442ac0310e3e78cbf614bb2ed6f1e58 04-Apr-2013 Bill Buzbee <buzbee@google.com> Revert "Tiny optimization for complier templates for arm."

This reverts commit 87bc7988cdb4e61421a3e701e84f7070f603635d

No obvious problems with this change - but reverting to aid in tracking down bug:

8543495 NCs in Play Store : >>> com.android.vending

Change-Id: I8bd6dbe6a7b3a4650a5e857a5a529cde6569b987
4afb260cf1f312382541e30cab5766bff890e6fe 04-Apr-2013 Bill Buzbee <buzbee@google.com> Revert "Tiny optimization for complier templates for arm."

This reverts commit 87bc7988cdb4e61421a3e701e84f7070f603635d

No obvious problems with this change - but reverting to aid in tracking down bug:

8543495 NCs in Play Store : >>> com.android.vending

Change-Id: I8bd6dbe6a7b3a4650a5e857a5a529cde6569b987
87bc7988cdb4e61421a3e701e84f7070f603635d 02-Apr-2013 You Kim <you.kim72@gmail.com> Tiny optimization for complier templates for arm.

1. Remove possible bubble in TEMPLATE_STRING_INDEXOF.S
2. Remove 1 instruction and reorder the opcodes
TEMPLATE_MUL_LONG.S
3. Reorder ldr r2 instruction in TEMPLATE_RETURN.S

(cherry-pick of a2dc68acd954827cdc67929a859354e5ed9b5713.)

Change-Id: I78b9797aff3c2255c5d34a8391b1a94a1b09b613
5dfcc78af479937ba8dafceefd9b1931a88dfaaf 11-Aug-2012 Ard Biesheuvel <ard.biesheuvel@gmail.com> hardening: eliminate all text relocations from lidbvm

This patch consists of:
- changes to mterp/ that turn all literals from absolute
to PC relative, so the relocations can be resolved at
(build) link time
- changes to compiler/template/ that result in the
compiler templates to live in the non-executable
.data.rel.ro section (this code is never executed
directly, only from the jit heap, so there is no
reason to put it in the .text section)

Change-Id: I2dc97bd4720b393a74b7277a188f0c7b681fc932
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@gmail.com>
8b095215a4d5bde723819087f3455bdcc250a78f 20-Jun-2012 David Butcher <david.butcher@arm.com> Switched code to blx <reg>

ldr ip,<addr> blx ip is preferred over mov lr,pc ldr pc,<addr> from armv5te,
and will typically perform better on later ARM processors.

Change-Id: I8f2e5e794c644faafd767037ad56579f2934de47
389e258a5b9b2afb7bfaee3344c615d3310fae4e 23-Apr-2011 buzbee <buzbee@google.com> InterpBreak cleanup (part 1)

Moved the suspend count variables from the interpBreak structure. These
are already protected by a mutex, and we need the space in interpBreak
for additional subMode flags. This CL just does the move and expands
the width of subMode to 16-bits.

Change-Id: I4a6070b1ba4fb08a0f6e0aba6f150b30f9159eed
30bc0d46ae730d78c42c39cfa56a59ba3025380b 22-Apr-2011 buzbee <buzbee@google.com> Consolidate curFrame fields in thread storage

We ended up with two locations in the Thread structure for saved
Dalvik frame pointer. This change consolidates them.

Change-Id: I78f288e4e57e232f29663be930101e775bfe370f
d5f6ef487f2fc6edb8c1e6394d2a82712169f491 18-Apr-2011 buzbee <buzbee@google.com> [JIT] Clear inJitCodeCache flag on return

This CL plugs a hole in which control could return to the interpreter
from JIT'd code without resetting the inJitCodeCache flag.

Change-Id: Id0241bf3490f5bef9b274483af694c81f33334cf
99e3e6e72e3471eb85fc2e405866392b01c080fe 29-Mar-2011 buzbee <buzbee@google.com> Fix interpreter debug attach

Fix a few miscellaneous bugs from the interpreter restructuring that were
causing a segfault on debugger attach.

Added a sanity checking routine for debugging.

Fixed a problem in which the JIT's threshold and on/off switch
wouldn't get initialized properly on thread creation.

Renamed dvmCompilerStateRefresh() to dvmCompilerUpdateGlobalState() to
better reflect its function.

Change-Id: I5b8af1ce2175e3c6f53cda19dd8e052a5f355587
9a3147c7412f4794434b4c2604aa2ba784867774 03-Mar-2011 buzbee <buzbee@google.com> Interpreter restructuring

This is a restructuring of the Dalvik ARM and x86 interpreters:

o Combine the old portstd and portdbg interpreters into a single
portable interpreter.
o Add debug/profiling support to the fast (mterp) interpreters.
o Delete old mechansim of switching between interpreters. Now, once
you choose an interpreter at startup, you stick with it.
o Allow JIT to co-exist with profiling & debugging (necessary for
first-class support of debugging with the JIT active).
o Adds single-step capability to the fast assembly interpreters without
slowing them down (and, in fact, measurably improves their performance).
o Remove old "polling for safe point" mechanism. Breakouts now achieved
via modifying base of interpreter handler table.
o Simplify interpeter control mechanism.
o Allow thread-granularity control for profiling & debugging

The primary motivation behind this change was to improve the responsiveness
of debugging and profiling and to make it easier to add new debugging and
profiling capabilities in the future. Instead of always bailing out to the
slow debug portable interpreter, we can now stay in the fast interpreter.

A nice side effect of the change is that the fast interpreters
got a healthy speed boost because we were able to replace the
polling safepoint check that involved a dozen or so instructions
with a single table-base reload. When combined with the two earlier CLs
related to this restructuring, we show a 5.6% performance improvement
using libdvm_interp.so on the Checkers benchmark relative to Honeycomb.

Change-Id: I8d37e866b3618def4e582fc73f1cf69ffe428f3c
9f601a917c8878204482c37aec7005054b6776fa 12-Feb-2011 buzbee <buzbee@google.com> Interpreter restructuring: eliminate InterpState

The key datastructure for the interpreter is InterpState.
This change eliminates it, merging its data with the Thread structure.

Here's why:

In principio creavit Fadden Thread et InterpState. And it was good.

Thread holds thread-private state, while InterpState captures data
associated with a Dalvik interpreter activation. Because JNI calls
can result in nested interpreter invocations, we can have more than one
InterpState for each actual thread. InterpState was relatively small,
and it all worked well. It was used enough that in the Arm version
a register (rGLUE) was dedicated to it.

Then, along came the JIT guys, who saw InterpState as a convenient place
to dump all sorts of useful data that they wanted quick access to through
that dedicated register. InterpState grew and grew. In terms of
space, this wasn't a big problem - but it did mean that the initialization
cost of each interpreter activation grew as well. For applications
that do a lot of callbacks from native code into Dalvik, this is
measurable. It's also mostly useless cost because much of the JIT-related
InterpState initialization was setting up useful constants - things that
don't need to be saved and restored all the time.

The biggest problem, though, deals with thread control. When something
interesting is happening that needs all threads to be stopped (such as
GC and debugger attach), we have access to all of the Thread structures,
but we don't have access to all of the InterpState structures (which
may be buried/nested on the native stack). As a result, polling for
thread suspension is done via a one-indirection pointer chase. InterpState
itself can't hold the stop bits because we can't always find it, so
instead it holds a pointer to the global or thread-specific stop control.

Yuck.

With this change, we eliminate InterpState and merge all needed data
into Thread. Further, we replace the decidated rGLUE register with a
pointer to the Thread structure (rSELF). The small subset of state
data that needs to be saved and restored across nested interpreter
activations is collected into a record that is saved to the interpreter
frame, and restored on exit. Further, these small records are linked
together to allow tracebacks to show nested activations. Old InterpState
variables that simply contain useful constants are initialized once at
thread creation time.

This CL is large enough by itself that the new ability to streamline
suspend checks is not done here - that will happen in a future CL. Here
we just focus on consolidation.

Change-Id: Ide6b2fb85716fea454ac113f5611263a96687356
d72564ca7aa66c6d95b6ca34299258b65ecfd1cb 09-Feb-2011 Ben Cheng <bccheng@android.com> Misc goodies in the JIT in preparation for more aggressive code motion.

- Set up resource masks correctly for Thumb push/pop when LR/PC are involved.
- Preserve LR around simulated heap references under self-verification mode.
- Compact a few simple flags in ArmLIR into bit fields.
- Minor performance tuning in TEMPLATE_MEM_OP_DECODE

Change-Id: Id73edac837c5bb37dfd21f372d6fa21c238cf42a
18fba346582c08d81aa96d9508c0e935bad5f36f 20-Jan-2011 buzbee <buzbee@google.com> Support traceview-style profiling in all builds

This change builds on an earlier bccheng change that allowed JIT'd code
to avoid reverting to the debug portable interpeter when doing traceview-style
method profiling. That CL introduced a new traceview build (libdvm_traceview)
because the performance delta was too great to enable the capability for
all builds.

In this CL, we remove the libdvm_traceview build and provide full-speed
method tracing in all builds. This is done by introducing "_PROF"
versions of invoke and return templates used by the JIT. Normally, these
templates are not used, and performace in unaffected. However, when method
profiling is enabled, all existing translation are purged and new translations
are created using the _PROF templates. These templates introduce a
smallish performance penalty above and beyond the actual tracing cost, but
again are only used when tracing has been enabled.

Strictly speaking, there is a slight burden that is placed on invokes and
returns in the non-tracing case - on the order of an additional 3 or 4
cycles per invoke/return. Those operations are already heavyweight enough
that I was unable to measure the added cost in benchmarks.

Change-Id: Ic09baf4249f1e716e136a65458f4e06cea35fc18
2e152baec01433de9c63633ebc6f4adf1cea3a87 16-Dec-2010 buzbee <buzbee@google.com> [JIT] Trace profiling support

In preparation for method compilation, this CL causes all traces to
include two entry points: profiling and non-profiling. For now, the
profiling entry will only be used if dalvik is run with -Xjitprofile,
and largely works like it did before. The difference is that profiling
support no longer requires the "assert" build - it's always there now.

This will enable us to do a form of sampling profiling of
traces in order to identify hot methods or hot trace groups,
while keeping the overhead low by only switching profiling on periodically.

To turn the periodic profiling on and off, we simply unchain all existing
translations and set the appropriate global profile state. The underlying
translation lookup and chaining utilties will examine the profile state to
determine which entry point to use (i.e. - profiling or non-profiling) while
the traces naturally rechain during further execution.

Change-Id: I9ee33e69e33869b9fab3a57e88f9bc524175172b
13fbc2e4bfa04cce8e181ac37d7f2b13a54aa037 14-Dec-2010 buzbee <buzbee@google.com> Stamp out some x86/host mode warnings

Nuked a void* cast warnings and moved cacheflush into a target-specific
utility wrapper.

Change-Id: I36c841288b9ec7e03c0cb29b2e89db344f36fad1
8c9ac9ab0ab6fd75b73cb0d99005da3aa90c167c 22-Oct-2010 Ben Cheng <bccheng@android.com> Avoid conditional loads if WORKAROUND_CORTEX_A9_745320 is defined.

No noticeable performance impact by this change.

Bug: 3117632
Change-Id: I31c6adc6cb9999498bb456f1e87f6f04f33e4144
c8293e7dfe856ca95e27aef1ac2e64d750d60662 12-Oct-2010 Ben Cheng <bccheng@android.com> Fine-tune the instructions on the method invocation path.

1) Initialize the register and out sizes for callee methods through
constant moves.
2) Eliminate an unnecessary load of Dalvik PC for chained and
native callees.

Improved method invocation performance by ~3%.

Change-Id: Iead1276eed0ba527e82eb876f08d169ab9b496b2
5cc61d70ec727aa22f58463bf7940cc717cf3eb1 31-Aug-2010 Ben Cheng <bccheng@android.com> Collect method traces with the fast interpreter and the JIT'ed code.

Insert inline code instead of switching to the debug interpreter in the hope
that the time stamps collected in traceview are more close to the real
world behavior with minimal profiling overhead.

Because the inline polling still introduces additional overhead (20% ~ 100%),
it is only enabled in the special VM build called "libdvm_traceview.so".
It won't work on the emulator because it is not implemented to collect the
detailed instruction traces.

Here are some performance numbers using the FibonacciSlow microbenchmark
(ie recursive workloads / the shorter the faster):

time: configuration
8,162,602: profiling off/libdvm.so/JIT off
2,801,829: profiling off/libdvm.so/JIT on
9,952,236: profiling off/libdvm_traceview.so/JIT off
4,465,701: profiling off/libdvm_traceview.so/JIT on
164,786,585: profiling on/libdvm.so/JIT off
164,664,634: profiling on/libdvm.so/JIT on
11,231,707: profiling on/libdvm_traceview.so/JIT off
8,427,846: profiling on/libdvm_traceview.so/JIT on

Comparing the 8,427,846 vs 164,664,634 numbers againt the true baseline
performance number of 2,801,829, the new libdvm_traceview.so improves the time
skew from 58x to 3x.

Change-Id: I48611a3a4ff9c4950059249e5503c26abd6b138e
7a2697d327936e20ef5484f7819e2e4bf91c891f 07-Jun-2010 Ben Cheng <bccheng@android.com> Implement method inlining for getters/setters

Changes include:
1) Force the trace that ends with an invoke instruction to include
the next instruction if it is a move-result (because both need
to be turned into no-ops if callee is inlined).
2) Interpreter entry point/trace builder changes so that return
target won't automatically be considered as trace starting points
(to avoid duplicate traces that include the move result
instructions).
3) Codegen changes to handle getters/setters invoked from both
monomorphic and polymorphic callsites.
4) Extend/fix self-verification to form identical trace regions and
handle traces with inlined callees.
5) Apply touchups to the method based parsing - still not in use.

Change-Id: I116b934df01bf9ada6d5a25187510e352bccd13c
7365493ad8d360c1dcf9cd8b6eee62747af01cae 09-Jun-2010 Carl Shapiro <cshapiro@google.com> Remove repeated newlines at the end of files.

Change-Id: I1e3d103a7b932ef21acedb6438c0f26b315df28f
fbdcfb9ea9e2a78f295834424c3f24986ea45dac 29-May-2010 Brian Carlstrom <bdc@google.com> Merge remote branch 'goog/dalvik-dev' into dalvik-dev-to-master

Change-Id: I0c0edb3ebf0d5e040d6bbbf60269fab0deb70ef9
b88ec3cbb419b5eac23508dc6b73de2620d7521a 17-May-2010 Ben Cheng <bccheng@android.com> Remove the write permission for the JIT code cache when not needed

To support the feature, redesigned the predicted chaining mechanism so that the
profile count is shared globally in InterpState.

Bug: 2690371
Change-Id: Ifed427e8b1fa4f6c670f19e0761e45e2d4afdbb6
bd0472480c6e876198fe19c4ffa22350c0ce57da 13-May-2010 Bill Buzbee <buzbee@google.com> JIT: Fix for [Issue 2675245] FRF40 monkey crash in jit-cache

The JIT's chaining mechanism suffered from a narrow window that
could result in i-cache inconsistency. One of the forms of chaining
cell consisted of a two 16-bit thumb instruction sequence. If a thread were
interrupted between the execution of those two instructions *and*
another thread picked that moment to convert that cell's
chained/unchained state, then bad things happen.

This CL alters the chain/unchain model somewhat to avoid this case.
Chainable chaining cells grow by 4 bytes each, and instead of rewriting
a 32-bit cell to chain/unchain, we switch between chained and unchained
state by [re]writing the first 16-bits of the cell as either a 16-bit
Thumb unconditional branch (unchained mode) or the first half of a
32-bit Thumb branch. The 2nd 16-bits of the cell will never change once
the cell moves from its inital state - thus avoiding the possibility of it
becoming inconsistent.

This adds a trivial execution penalty on the slow path, but will add
about a kByte of memory usage to a typical process.

Change-Id: Id8b99802e11386cfbab23da6abae10e2d9fc4065
978738d2cbf9d08fa78c65762eaac3351ab76b9a 13-May-2010 Ben Cheng <bccheng@android.com> Add counters to track JIT inline cache hit rate and code cache patch counts.

Also did some WITH_JIT_TUNING cleanup.

Change-Id: I8bb2d681a06b0f2af1f976a007326825a88cea38
a62475ecfcc80c58add8f153c9605762dafb8227 30-Apr-2010 Ben Cheng <bccheng@android.com> Use unsigned comparison for stack pointers.

Bug: 2613607
Change-Id: I6a8abd69fbf9cb9f8ec9d9febf1ea42fd631fe9c
11d8f14eef83d1b7bfa8f116de56a92d5ba9e71e 24-Mar-2010 Ben Cheng <bccheng@android.com> Fix for the JIT blocking mode plus some code cleanup.

Bug: 2517606
Change-Id: I2b5aa92ceaf23d484329330ae20de5966704280b
fd7e221cce6d3c63fd26599d58e0a35db7f5d1fa 09-Mar-2010 Colin Cross <ccross@android.com> Add armv7-a-neon build target

Change-Id: I981d55b53f6b3c185fe93384924bdbe18057132c