History log of /art/compiler/optimizing/intrinsics.cc
Revision Date Author Comments
0e54c0160c84894696c05af6cad9eae3690f9496 04-Mar-2016 Aart Bik <ajcbik@google.com> Unsafe: Recognize intrinsics for 1.8 java.util.concurrent
With unit test.

Rationale:
Recognizing the 1.8 methods as intrinsics is the first step
towards providing efficient implementation on all architectures.
Where not implemented (everywhere for now), the methods fall back
to the JNI native or reference implementation.

NOTE: needs iam's CL first!

bug=26264765

Change-Id: Ife65e81689821a16cbcdd2bb2d35641c6de6aeb6
2a6aad9d388bd29bff04aeec3eb9429d436d1873 25-Feb-2016 Aart Bik <ajcbik@google.com> Implement fp to bits methods as intrinsics.

Rationale:
Better optimization, better performance.

Results on libcore benchmark:

Most of the gain comes from moving the invariant call out of the loop
after we detect that everything is a side-effect-free intrinsic.
But the generated code in the general case is much cleaner too.

Before:
timeFloatToIntBits() in 181 ms.
timeFloatToRawIntBits() in 35 ms.
timeDoubleToLongBits() in 208 ms.
timeDoubleToRawLongBits() in 35 ms.

After:
timeFloatToIntBits() in 36 ms.
timeFloatToRawIntBits() in 35 ms.
timeDoubleToLongBits() in 35 ms.
timeDoubleToRawLongBits() in 34 ms.

bug=11548336

Change-Id: I6e001bd3708e800bd75a82b8950fb3a0fc01766e
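
For readers unfamiliar with the four methods benchmarked above, the following is a minimal C++ sketch of their documented Java semantics, not the code generated by ART. The function names are hypothetical stand-ins. The raw variants copy the IEEE 754 bit pattern unchanged, while the non-raw variants additionally collapse every NaN to one canonical pattern.

```cpp
#include <cstdint>
#include <cstring>

// Raw conversion: copy the IEEE 754 bit pattern unchanged
// (semantics of Float.floatToRawIntBits).
inline int32_t FloatToRawIntBits(float value) {
  static_assert(sizeof(int32_t) == sizeof(float), "float must be 32 bits");
  int32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Canonicalizing conversion: every NaN collapses to 0x7fc00000
// (semantics of Float.floatToIntBits).
inline int32_t FloatToIntBits(float value) {
  if (value != value) {  // Only NaN compares unequal to itself.
    return 0x7fc00000;
  }
  return FloatToRawIntBits(value);
}

// Raw conversion for doubles (semantics of Double.doubleToRawLongBits).
inline int64_t DoubleToRawLongBits(double value) {
  static_assert(sizeof(int64_t) == sizeof(double), "double must be 64 bits");
  int64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Canonicalizing conversion for doubles: every NaN collapses to
// 0x7ff8000000000000 (semantics of Double.doubleToLongBits).
inline int64_t DoubleToLongBits(double value) {
  if (value != value) {
    return 0x7ff8000000000000LL;
  }
  return DoubleToRawLongBits(value);
}
```

Both variants are pure functions of their argument, which is the property that lets a compiler hoist such a call out of a loop when the argument is loop-invariant.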
38e9e8046ea2196284bdb4638771c31108a30a4a 18-Feb-2016 Jean-Philippe Halimi <jean-philippe.halimi@intel.com> Add statistics support for some optimizations

This patch adds support for the --dump-stats facility with some
optimizations and fixes all build issues introduced by the earlier patch
I68751b119a030952a11057cb651a3c63e87e73ea (which was reverted).

Change-Id: I5af1f2a8cced0a1a55c2bb4d8c88e6f0a24ec879
Signed-off-by: Jean-Philippe Halimi <jean-philippe.halimi@intel.com>
f8b3b8bc37fb04d8ae113ae6bfcf4de2f5a700d4 04-Feb-2016 Vladimir Marko <vmarko@google.com> Try to substitute constructor chains for IPUTs.

Match a constructor chain where each constructor either
forwards some or all of its arguments to the next (i.e.
superclass constructor or a constructor in the same class)
and may pass extra zeros (of any type, including null),
followed by any number of IPUTs on "this", storing either
arguments or zeros, until we reach the constructor of
java.lang.Object.

When collecting IPUTs from the constructor chain, remove
any IPUTs that store the same field as an IPUT that comes
later. This is safe in this case even if those IPUTs store
volatile fields because the uninitialized object reference
wasn't allowed to escape yet. Also remove any IPUTs that
store zero values as the allocated object is already zero
initialized.

(cherry picked from commit 354efa6cdf558b2331e8fec539893fa51763806e)

Change-Id: I691e3b82e550e7a3272ce6a81647c7fcd02c01b1
354efa6cdf558b2331e8fec539893fa51763806e 04-Feb-2016 Vladimir Marko <vmarko@google.com> Try to substitute constructor chains for IPUTs.

Match a constructor chain where each constructor either
forwards some or all of its arguments to the next (i.e.
superclass constructor or a constructor in the same class)
and may pass extra zeros (of any type, including null),
followed by any number of IPUTs on "this", storing either
arguments or zeros, until we reach the constructor of
java.lang.Object.

When collecting IPUTs from the constructor chain, remove
any IPUTs that store the same field as an IPUT that comes
later. This is safe in this case even if those IPUTs store
volatile fields because the uninitialized object reference
wasn't allowed to escape yet. Also remove any IPUTs that
store zero values as the allocated object is already zero
initialized.

Change-Id: If93022310bf04fe38ee741665ac4a65d4c2bb25f
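
The IPUT pruning described in this commit message can be sketched as follows. This is a simplified model under stated assumptions, not the ART implementation: CollectedIPut and PruneIPuts are hypothetical names, and each IPUT is reduced to a field index plus either a zero constant or an argument index. The sketch applies the "later store wins" rule before dropping zero stores, so that a field whose final store is zero ends up with no IPUT at all.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical model of one IPUT collected from the constructor chain.
struct CollectedIPut {
  uint32_t field_index;  // Field written on "this".
  bool stores_zero;      // True if a zero (or null) constant is stored.
  uint32_t arg_index;    // Argument stored; meaningful only if !stores_zero.
};

// Returns the IPUTs that still have to be emitted, in original order.
inline std::vector<CollectedIPut> PruneIPuts(
    const std::vector<CollectedIPut>& iputs) {
  std::vector<CollectedIPut> kept_reversed;
  std::unordered_set<uint32_t> seen_fields;
  // Walk backwards so that the last store to each field is visited first.
  for (auto it = iputs.rbegin(); it != iputs.rend(); ++it) {
    if (!seen_fields.insert(it->field_index).second) {
      continue;  // A later IPUT already stores to this field.
    }
    if (it->stores_zero) {
      continue;  // The allocated object is already zero-initialized.
    }
    kept_reversed.push_back(*it);
  }
  // Restore the original program order.
  return std::vector<CollectedIPut>(kept_reversed.rbegin(),
                                    kept_reversed.rend());
}
```

Dropping zero stores first would be wrong in this model: an earlier non-zero store to the same field would then survive even though the final value of the field is zero.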
59c9454b92c2096a30a2bbdffb64edf33dbdd916 25-Jan-2016 Aart Bik <ajcbik@google.com> Recognize common utilities as intrinsics.

Rationale:
Recognizing these method calls as intrinsics already has
major advantages (compiler knows about no-side-effects/no-throw
properties). Next step is, of course, to implement these
with native instructions on each architecture.

Change-Id: I06fd12973238caec00d67b31b195d7f8807a538e
3f67e692860d281858485d48a4f1f81b907f1444 15-Jan-2016 Aart Bik <ajcbik@google.com> Implemented BitCount as an intrinsic. With unit test.

Rationale:
Recognizing this important operation as an intrinsic has
various advantages:
(1) having the no-side-effects/no-throw properties allows for
much more GVN/LICM/BCE.
(2) Some architectures, like x86_64, provide direct
support for this operation.

Performance improvements on X86_64:
CheckersEvalBench (32-bit bitboard): 27,210KNS -> 36,798KNS = + 35%
ReversiEvalBench (64-bit bitboard): 52,562KNS -> 89,086KNS = + 69%

Change-Id: I65d549b0469b7909b12c6611cdc34a8640a5751f
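
As an illustration of the operation being intrinsified, here is a minimal portable C++ sketch of a 32-bit population count in two forms. The function names are hypothetical, and neither form is claimed to be what ART emits; on architectures with direct hardware support the whole computation can become a single instruction.

```cpp
#include <cstdint>

// Reference loop: clear the lowest set bit until none remain.
inline int BitCountLoop(uint32_t x) {
  int count = 0;
  while (x != 0) {
    x &= x - 1;  // Clears the lowest set bit.
    ++count;
  }
  return count;
}

// Branch-free version: sum bits in parallel within 2-, 4- and 8-bit groups,
// then add the four byte sums with one multiplication.
inline int BitCountSwar(uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  x = (x + (x >> 4)) & 0x0f0f0f0fu;
  return static_cast<int>((x * 0x01010101u) >> 24);
}
```

Both functions depend only on their argument and cannot throw, which is the property point (1) above relies on.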
5d75afe333f57546786686d9bee16b52f1bbe971 14-Dec-2015 Aart Bik <ajcbik@google.com> Improved side-effects/can-throw information on intrinsics.

Rationale: improved side effect and exception analysis gives
many more opportunities for GVN/LICM/BCE.

Change-Id: I8aa9b757d77c7bd9d58271204a657c2c525195b5
a4f1220c1518074db18ca1044e9201492975750b 06-Aug-2015 Mark Mendell <mark.p.mendell@intel.com> Optimizing: Add direct calls to math intrinsics

Support the double forms of:
cos, sin, acos, asin, atan, atan2, cbrt, cosh, exp, expm1,
hypot, log, log10, nextAfter, sinh, tan, tanh

Add these entries to the vector addressed off the thread pointer. Call
the libc routines directly, which means that we have to implement the
native ABI, not the ART one. For x86_64, that includes saving XMM12-15
as the native ABI considers them caller-save, while the ART ABI
considers them callee-save. We save them by marking them as used by the
call to the math function. For x86, this is not an issue, as all the XMM
registers are caller-save.

Other architectures will call Java as before until they are ready to
implement the new intrinsics.

Bump the OAT version since we are incompatible with old boot.oat files.

Change-Id: Ic6332c3555c09393a17d1ad4daf62932488722fb
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
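
The idea of keeping direct pointers to the libc math routines in a per-thread table can be sketched in portable C++ as below. MathEntrypoints and InitMathEntrypoints are hypothetical names, only four of the listed functions are shown, and the register-saving aspect (XMM12-15 on x86_64) is not modeled because it lives in the code generator rather than in source code.

```cpp
#include <cmath>

// Hypothetical stand-in for the slice of the per-thread entrypoint vector
// that holds the math routines.
struct MathEntrypoints {
  double (*cos)(double);
  double (*sin)(double);
  double (*atan2)(double, double);
  double (*hypot)(double, double);
};

// Fill the table with the double overloads of the C library routines.
inline MathEntrypoints InitMathEntrypoints() {
  MathEntrypoints entrypoints;
  entrypoints.cos = static_cast<double (*)(double)>(std::cos);
  entrypoints.sin = static_cast<double (*)(double)>(std::sin);
  entrypoints.atan2 = static_cast<double (*)(double, double)>(std::atan2);
  entrypoints.hypot = static_cast<double (*)(double, double)>(std::hypot);
  return entrypoints;
}
```

A call site would then load the pointer from the table and call it with the native calling convention, for example `InitMathEntrypoints().hypot(3.0, 4.0)`.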
e523423a053af5cb55837f07ceae9ff2fd581712 02-Dec-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Don't use the compiler driver for method resolution.""

This reverts commit c88ef3a10c474045a3476a02ae75d07ddd3230b7.

Change-Id: I0ed88a48b313a8d28bc39fae40631123aadb13ef
c88ef3a10c474045a3476a02ae75d07ddd3230b7 01-Dec-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Don't use the compiler driver for method resolution."

Fails 425 in debuggable mode.

This reverts commit 4db0bf9c4db6a09716c3388b7d2f88d534470339.

Change-Id: I346df8f75674564fc4fb241c60f23e250fc7f0a7
4db0bf9c4db6a09716c3388b7d2f88d534470339 23-Nov-2015 Nicolas Geoffray <ngeoffray@google.com> Don't use the compiler driver for method resolution.

The compiler driver makes assumptions that don't hold for
the optimizing compiler, and will for example always go to
slow path for an invoke-super when there's no verified method.

Also fix GenerateInvokeVirtual in the presence of intrinsics.

Next change will address some of the TODOs in sharpening.cc.

Change-Id: I2b0e543ee9b9bebcadb2d26de29e850c59ad58b9
e34648dec914453f7e8b6c517dd272823319cd6d 23-Nov-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Add stats support for existing optimizations"

Breaks the build. Please ensure your changes build.

This reverts commit 06241b1b07fb031b7d2cf55f4b78d3444d07cc2d.

Change-Id: I68b18f99a9882719bf6654d3313531a7965b8483
06241b1b07fb031b7d2cf55f4b78d3444d07cc2d 03-Sep-2015 Jean-Philippe Halimi <jean-philippe.halimi@intel.com> Add stats support for existing optimizations

This patch adds support for the --dump-stats facility with existing
optimizations.

Change-Id: I68751b119a030952a11057cb651a3c63e87e73ea
Signed-off-by: Jean-Philippe Halimi <jean-philippe.halimi@intel.com>
16ba2b4726cafc2d83cae4a65132aac15f372689 02-Nov-2015 Chris Larsen <chris.larsen@imgtec.com> MIPS32: java.lang.String.equals

Add intrinsic support for String.equals on MIPS32.

Change-Id: I2d184aa4d5dae7cdd4a89c2c902535692c9e7393
ee3cf0731d0ef0787bc2947c8e3ca432b513956b 06-Oct-2015 Nicolas Geoffray <ngeoffray@google.com> Intrinsify System.arraycopy.

Currently on x64, will do the other architectures in
different changes.

Change-Id: I15fbbadb450dd21787809759a8b14b21b1e42624
3039e381b79ac1ef01c420511f6629f639d40ab4 26-Aug-2015 Chris Larsen <chris.larsen@imgtec.com> MIPS64: Implement miscellaneous bit manipulation intrinsics

// java.lang.Double
- doubleToRawLongBits(double)
- longBitsToDouble(long)

// java.lang.Float
- floatToRawIntBits(float)
- intBitsToFloat(int)

// java.lang.Integer
- numberOfLeadingZeros(int)
- reverseBytes(int)
- reverse(int)

// java.lang.Long
- numberOfLeadingZeros(long)
- reverseBytes(long)
- reverse(long)

// java.lang.Short
- reverseBytes(short)

Change-Id: Ic8f8c4e7b584132e2282b4fd267453870fefbaaa
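
For reference, the 32-bit integer operations listed in this commit can be expressed in portable C++ as below. This is a sketch of the semantics of the Java methods with hypothetical function names, not the MIPS64 code sequences the commit adds.

```cpp
#include <cstdint>

// Semantics of Integer.reverseBytes: swap the four bytes end to end.
inline uint32_t ReverseBytes32(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) | (x << 24);
}

// Semantics of Integer.reverse: reverse the bits inside each byte,
// then reverse the byte order.
inline uint32_t ReverseBits32(uint32_t x) {
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
  x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
  return ReverseBytes32(x);
}

// Semantics of Integer.numberOfLeadingZeros: 32 for zero, otherwise the
// number of zero bits above the highest set bit.
inline int NumberOfLeadingZeros32(uint32_t x) {
  if (x == 0) {
    return 32;
  }
  int n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}
```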
9ee23f4273efed8d6378f6ad8e63c65e30a17139 23-Jul-2015 Scott Wakeling <scott.wakeling@linaro.org> ARM/ARM64: Intrinsics - numberOfTrailingZeros, rotateLeft, rotateRight

Change-Id: I2a07c279756ee804fb7c129416bdc4a3962e93ed
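
A portable C++ sketch of the three operations named in this commit follows, modeled on the semantics of the corresponding java.lang.Integer methods. The function names are hypothetical and this is not the ARM/ARM64 code the commit generates. The rotate distance is masked to its low five bits, matching the Java definition and avoiding an undefined shift by 32.

```cpp
#include <cstdint>

// Semantics of Integer.numberOfTrailingZeros: 32 for zero, otherwise the
// number of zero bits below the lowest set bit.
inline int NumberOfTrailingZeros32(uint32_t x) {
  if (x == 0) {
    return 32;
  }
  int n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Semantics of Integer.rotateLeft: only the low five bits of the distance
// are used, so a negative distance rotates right.
inline uint32_t RotateLeft32(uint32_t x, int distance) {
  const unsigned s = static_cast<unsigned>(distance) & 31u;
  return (x << s) | (x >> ((32u - s) & 31u));
}

// Semantics of Integer.rotateRight, with the same distance masking.
inline uint32_t RotateRight32(uint32_t x, int distance) {
  const unsigned s = static_cast<unsigned>(distance) & 31u;
  return (x >> s) | (x << ((32u - s) & 31u));
}
```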
bfb5ba90cd6425ce49c2125a87e3b12222cc2601 01-Sep-2015 Andreas Gampe <agampe@google.com> Revert "Revert "Do a second check for testing intrinsic types.""

This reverts commit a14b9fef395b94fa9a32147862c198fe7c22e3d7.

When an intrinsic with invoke-type virtual is recognized, replace
the instruction with a new HInvokeStaticOrDirect.

Minimal update for dex-cache rework. Fix includes.

Change-Id: I1c8e735a2fa7cda4419f76ca0717125ef236d332
a14b9fef395b94fa9a32147862c198fe7c22e3d7 25-Aug-2015 Andreas Gampe <agampe@google.com> Revert "Do a second check for testing intrinsic types."

This reverts commit 4daa0b4c21eee46362b5114fb2c3800c0c7e7a36.

If the intrinsic has a slow-path, like charAt, the slow-path logic will complain as it only understands direct slow-paths, not virtual calls.

We should either override that decision in the slow-path, or replace the HInvokeVirtual when we're overriding the intrinsic choice.

Bug: 23475673
Change-Id: If55fbc8c82d52e0e7a7aec2674ae2bd2b74b5c77
4daa0b4c21eee46362b5114fb2c3800c0c7e7a36 20-Aug-2015 Nicolas Geoffray <ngeoffray@google.com> Do a second check for testing intrinsic types.

This allows intrinsifying calls made in a different dex file.

Can't easily write a test because it depends on having inlined
a method from boot classpath that calls an intrinsic. Once
String.equals is implemented with the hybrid approach we can write one.

Change-Id: I591d9496e236429943d6bfa7f8b20f576b1cfb9a
05f2056b4f11e0b2bac92b2655abe7030771f5dc 19-Aug-2015 Agi Csaki <agicsaki@google.com> Add support to indicate whether intrinsics require a dex cache

A structural change to indicate whether a given intrinsic requires access
to a dex cache. I updated the needs_environment_ field to indicate
whether an HInvoke needs an environment or a dex cache, and if an HInvoke
represents an intrinsified method, we utilize this field to determine if
the HInvoke needs a dex cache.

Bug: 21481923
Change-Id: I9dd25a385e1a1397603da6c4c43f6c1aea511b32
7da072feb160079734331e994ea52760cb2a3243 13-Aug-2015 agicsaki <agicsaki@google.com> Structure for String.Equals intrinsic

Added structure for implementing String.Equals intrinsics. There is no
functional change at this point: the intrinsic is marked as unimplemented
for all instruction sets and compilers.

Bug: 21481923
Change-Id: Ic2a1e22a113ff6091581126f12e926478c011340
6cff09a873e0179f2a8d28727d4cd2447bd1bf16 13-Aug-2015 agicsaki <agicsaki@google.com> Intrinsics recognizer returns kNone for MIPS, MIPS64 instruction sets

Since no intrinsics are implemented in MIPS or MIPS64, the intrinsics
recognizer now does not mark methods as being intrinsified if the
current instruction set is either MIPS or MIPS64.

Change-Id: I9819ccd11d280e548623ad18add057eefefbf6d5
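
The gating described here amounts to a filter on the recognizer result. The following is a minimal sketch with hypothetical enum and function names (only kNone is taken from the commit title), not the actual recognizer code.

```cpp
// Hypothetical instruction set and intrinsic enums for illustration only.
enum class InstructionSet { kArm, kArm64, kX86, kX86_64, kMips, kMips64 };
enum class Intrinsics { kNone, kSomeIntrinsic };

// Report kNone for instruction sets that implement no intrinsics, so that
// the invoke is compiled as an ordinary call.
inline Intrinsics FilterForInstructionSet(Intrinsics recognized,
                                          InstructionSet isa) {
  if (isa == InstructionSet::kMips || isa == InstructionSet::kMips64) {
    return Intrinsics::kNone;
  }
  return recognized;
}
```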
57b81ecbe74138992dd447251e94ed06cd5eb802 12-Aug-2015 agicsaki <agicsaki@google.com> Add support to indicate whether intrinsics require an environment

A structural change to indicate whether a given intrinsic requires
access to an environment. I added a field to HInvoke objects to indicate
if they need an environment whose default value is true and is only updated
if an intrinsic is marked as not requiring an environment. At this point
there is no functional change, as all intrinsics are marked as requiring
an environment. This change adds the structure for future inliner work
which will allow us to inline more intrinsified calls.

Change-Id: I2930e3cef7b785384bf95b95a542d34af442f3b9
611d3395e9efc0ab8dbfa4a197fa022fbd8c7204 10-Jul-2015 Scott Wakeling <scott.wakeling@linaro.org> ARM/ARM64: Implement numberOfLeadingZeros intrinsic.

Change-Id: I4042fb7a0b75140475dcfca23e8f79d310f5333b
aabdf8ad2e8d3de953dff5c7591e7b3df4d4f60b 03-Aug-2015 Roland Levillain <rpl@google.com> Revert "Optimizing String.Equals as an intrinsic (x86)"

Reverted as it breaks the compilation of boot.{oat,art} on x86 (although this CL may not be the culprit, as the issue seems to come from Optimizing's register allocator).

This reverts commit 8ab7bd6c8b10ad58758c33a1dc9326212bd200e9.

Change-Id: If7c8b6258d1e690f4d2a06bcc82c92563ac6cdef
8ab7bd6c8b10ad58758c33a1dc9326212bd200e9 27-Jul-2015 agicsaki <agicsaki@google.com> Optimizing String.Equals as an intrinsic (x86)

The third implementation of String.Equals. I added an intrinsic
in x86 which is similar to the original java implementation of
String.equals: an instanceof check, null check, length check, and
reference equality check followed by a loop comparing strings
character by character.

Interesting Benchmarking Values:

Optimizing Compiler on Nexus Player
Intrinsic 15-30 Character Strings: 177 ns
Original 15-30 Character Strings: 275 ns
Intrinsic Null Argument: 59 ns
Original Null Argument: 137 ns
Intrinsic 100-1000 Character Strings: 1812 ns
Original 100-1000 Character Strings: 6334 ns

Bug: 21481923
Change-Id: Ia386e19b9dbfe0dac688b20ec93d8f90f67af47e
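
The sequence of checks described in this commit message can be sketched in C++ as below. ObjectSketch and StringEqualsSketch are hypothetical names, the instanceof check is modeled with a plain flag, and the order of the checks is an assumption chosen so that the null check precedes any dereference; the x86 intrinsic itself is assembly emitted by the code generator.

```cpp
#include <cstddef>
#include <string>

// Hypothetical object model: a type flag plus UTF-16 character data.
struct ObjectSketch {
  bool is_string;
  std::u16string chars;
};

inline bool StringEqualsSketch(const ObjectSketch& self,
                               const ObjectSketch* other) {
  if (other == nullptr) {
    return false;  // Null check.
  }
  if (other == &self) {
    return true;  // Reference equality check.
  }
  if (!other->is_string) {
    return false;  // instanceof check.
  }
  if (other->chars.size() != self.chars.size()) {
    return false;  // Length check.
  }
  // Compare the strings character by character.
  for (std::size_t i = 0; i < self.chars.size(); ++i) {
    if (self.chars[i] != other->chars[i]) {
      return false;
    }
  }
  return true;
}
```

The early exits are cheap relative to the character loop, so the null-argument and mismatched-length cases return without touching any character data.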
109c89a8e3b5023d123f8c1313f5843a0ba2e15e 31-Jul-2015 David Brazdil <dbrazdil@google.com> ART: Change stream output kNone intrinsic

Name of intrinsics is dumped with C1visualizer and checked with
Checker whose attributes should not contain whitespace. This patch
changes the output printed for non-intrinsified invokes.

Change-Id: I3e565e8c9e26eb61026e7a13823eab20409dd63a
41b175aba41c9365a1c53b8a1afbd17129c87c14 19-May-2015 Vladimir Marko <vmarko@google.com> ART: Clean up arm64 kNumberOfXRegisters usage.

Avoid undefined behavior for arm64 stemming from 1u << 32 in
loops with upper bound kNumberOfXRegisters.

Create iterators for enumerating bits in an integer either
from high to low or from low to high and use them for
<arch>Context::FillCalleeSaves() on all architectures.

Refactor runtime/utils.{h,cc} by moving all bit-fiddling
functions to runtime/base/bit_utils.{h,cc} (together with
the new bit iterators) and all time-related functions to
runtime/base/time_utils.{h,cc}. Improve test coverage and
fix some corner cases for the bit-fiddling functions.

Bug: 13925192

(cherry picked from commit 80afd02024d20e60b197d3adfbb43cc303cf29e0)

Change-Id: I905257a21de90b5860ebe1e39563758f721eab82
80afd02024d20e60b197d3adfbb43cc303cf29e0 19-May-2015 Vladimir Marko <vmarko@google.com> ART: Clean up arm64 kNumberOfXRegisters usage.

Avoid undefined behavior for arm64 stemming from 1u << 32 in
loops with upper bound kNumberOfXRegisters.

Create iterators for enumerating bits in an integer either
from high to low or from low to high and use them for
<arch>Context::FillCalleeSaves() on all architectures.

Refactor runtime/utils.{h,cc} by moving all bit-fiddling
functions to runtime/base/bit_utils.{h,cc} (together with
the new bit iterators) and all time-related functions to
runtime/base/time_utils.{h,cc}. Improve test coverage and
fix some corner cases for the bit-fiddling functions.

Bug: 13925192
Change-Id: I704884dab15b41ecf7a1c47d397ab1c3fc7ee0f7
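
The undefined behavior mentioned above arises because shifting a 32-bit value by 32 or more is undefined in C++. A minimal sketch of a safe mask helper and of enumerating set bits in both directions follows; the function names are hypothetical and the real iterators in runtime/base/bit_utils.h may look quite different.

```cpp
#include <cstdint>
#include <vector>

// Mask with the low `count` bits set, valid for any count up to 32.
// A plain (1u << count) - 1u would be undefined behavior for count == 32.
inline uint32_t LowBitsMask(unsigned count) {
  return count >= 32u ? 0xffffffffu : (1u << count) - 1u;
}

// Indexes of the set bits in `mask`, lowest bit first.
inline std::vector<int> SetBitsLowToHigh(uint32_t mask) {
  std::vector<int> indexes;
  for (int i = 0; mask != 0; ++i, mask >>= 1) {
    if ((mask & 1u) != 0) {
      indexes.push_back(i);
    }
  }
  return indexes;
}

// Indexes of the set bits in `mask`, highest bit first.
inline std::vector<int> SetBitsHighToLow(uint32_t mask) {
  std::vector<int> indexes;
  for (int i = 31; i >= 0; --i) {
    if (((mask >> i) & 1u) != 0) {
      indexes.push_back(i);
    }
  }
  return indexes;
}
```

Every shift amount in the sketch stays in the range 0 to 31, so no loop bound can trigger the problematic shift.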
d5111bf05fc0a9974280a80eeb43db6d5227a81e 22-May-2015 Nicolas Geoffray <ngeoffray@google.com> Do not use dex_compilation_unit after inlining.

It's incompatible with inlining, as inlined invokes/load class/new
can be from another dex file.

Change-Id: I8897b6a012942bc8e136f2bea70252d3fb3a7fa5
ec525fc30848189051b888da53ba051bc0878b78 28-Apr-2015 Roland Levillain <rpl@google.com> Factor MoveArguments methods in Optimizing's intrinsics handlers.

Also add a precondition similar to the one present in code
generators, regarding static invoke related explicit clinit
check elimination in non-baseline compilations.

Change-Id: I26f4dcb5d02824d7556f90b4b0c85b08b737fa53
848f70a3d73833fc1bf3032a9ff6812e429661d9 15-Jan-2014 Jeff Hao <jeffhao@google.com> Replace String CharArray with internal uint16_t array.

Summary of high level changes:
- Adds compiler inliner support to identify string init methods
- Adds compiler support (quick & optimizing) with new invoke code path
that calls method off the thread pointer
- Adds thread entrypoints for all string init methods
- Adds map to verifier to log when receiver of string init has been
copied to other registers; used by the compiler and interpreter

Change-Id: I797b992a8feb566f9ad73060011ab6f51eb7ce01
65b798ea10dd716c1bb3dda029f9bf255435af72 06-Apr-2015 Andreas Gampe <agampe@google.com> ART: Enable more Clang warnings

Change-Id: Ie6aba02f4223b1de02530e1515c63505f37e184c
3e90a96f403cbc353731e6687fe12a088f996cee 27-Mar-2015 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> [optimizing] Do not inline intrinsics

The intrinsics generally have specialized code and the code for them
may be faster than what can be achieved with inlining. Thus the inliner
should skip intrinsics.

At the same time, easy methods are not worth intrinsifying, i.e. String
length and isEmpty. Those can be handled by the inliner with no problem
and can actually lead to better code since the call is not kept around
through all of the optimizations.

Change-Id: Iab38e6c33f79efa54d845d4871cf26fa9b235ab0
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
878d58cbaf6b17a9e3dcab790754527f3ebc69e5 16-Jan-2015 Andreas Gampe <agampe@google.com> ART: Arm64 optimizing compiler intrinsics

Implement most intrinsics for the optimizing compiler for Arm64.

Change-Id: Idb459be09f0524cb9aeab7a5c7fccb1c6b65a707
71fb52fee246b7d511f520febbd73dc7a9bbca79 30-Dec-2014 Andreas Gampe <agampe@google.com> ART: Optimizing compiler intrinsics

Add intrinsics infrastructure to the optimizing compiler.

Add almost all intrinsics supported by Quick to the x86-64 backend.
Further intrinsics require more assembler support.

Change-Id: I48de9b44c82886bb298d16e74e12a9506b8e8807