History log of /art/compiler/optimizing/intrinsics_list.h
Revision Date Author Comments
0e54c0160c84894696c05af6cad9eae3690f9496 04-Mar-2016 Aart Bik <ajcbik@google.com> Unsafe: Recognize intrinsics for 1.8 java.util.concurrent
With unit test.

Rationale:
Recognizing the 1.8 methods as intrinsics is the first step
towards providing an efficient implementation on all architectures.
Where not implemented (everywhere for now), the methods fall back
to the JNI native or reference implementation.

NOTE: needs iam's CL first!

bug=26264765

Change-Id: Ife65e81689821a16cbcdd2bb2d35641c6de6aeb6
2f9fcc999fab4ba6cd86c30e664325b47b9618e5 02-Mar-2016 Aart Bik <ajcbik@google.com> Simplified intrinsic macro mechanism.

Rationale:
Reduces boiler-plate code in all intrinsics code generators.
Also, the newly introduced "unreachable" macro provides a
static verifier that we do not have unreachable and thus
redundant code in the generators. In fact, this change
exposes that the MIPS32 and MIPS64 rotation intrinsics
(IntegerRotateRight, LongRotateRight, IntegerRotateLeft,
LongRotateLeft) are unreachable, since they are handled
as HIR constructs for all architectures. Thus the code
can be removed.

Change-Id: I0309799a0db580232137ded72bb8a7bbd45440a8
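
A minimal sketch of the general X-macro pattern that a list header such as this one supports, assuming hypothetical names (`DEMO_INTRINSICS_LIST`, `DemoIntrinsic`, `DemoName`) and illustrative entries that are not taken from ART. One list macro is expanded twice, once into an enum and once into a name table, so each consumer needs far less boiler-plate:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical list macro: each V(...) entry names one intrinsic.
// The entries are illustrative only, not the contents of intrinsics_list.h.
#define DEMO_INTRINSICS_LIST(V) \
  V(IntegerRotateRight)         \
  V(LongRotateRight)            \
  V(IntegerBitCount)

// Expansion 1: an enum with one enumerator per list entry.
enum class DemoIntrinsic {
#define DEMO_ENUM(Name) k##Name,
  DEMO_INTRINSICS_LIST(DEMO_ENUM)
#undef DEMO_ENUM
  kCount
};

// Expansion 2: a parallel table of printable names.
static const char* const kDemoNames[] = {
#define DEMO_NAME(Name) #Name,
  DEMO_INTRINSICS_LIST(DEMO_NAME)
#undef DEMO_NAME
};

// Compile-time check that both expansions stay in sync.
static_assert(sizeof(kDemoNames) / sizeof(kDemoNames[0]) ==
                  static_cast<std::size_t>(DemoIntrinsic::kCount),
              "name table and enum must have the same number of entries");

// Look up the printable name of an intrinsic.
inline const char* DemoName(DemoIntrinsic i) {
  return kDemoNames[static_cast<std::size_t>(i)];
}
```

Adding a new `V(...)` line to the list updates the enum and the name table together, which is the property that makes a single shared list attractive.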
2a6aad9d388bd29bff04aeec3eb9429d436d1873 25-Feb-2016 Aart Bik <ajcbik@google.com> Implement fp to bits methods as intrinsics.

Rationale:
Better optimization, better performance.

Results on libcore benchmark:

Most of the gain comes from moving the invariant call out of the loop
once we detect that everything is a side-effect-free intrinsic.
But the generated code in the general case is much cleaner too.

Before:
timeFloatToIntBits() in 181 ms.
timeFloatToRawIntBits() in 35 ms.
timeDoubleToLongBits() in 208 ms.
timeDoubleToRawLongBits() in 35 ms.

After:
timeFloatToIntBits() in 36 ms.
timeFloatToRawIntBits() in 35 ms.
timeDoubleToLongBits() in 35 ms.
timeDoubleToRawLongBits() in 34 ms.

bug=11548336

Change-Id: I6e001bd3708e800bd75a82b8950fb3a0fc01766e
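
For readers unfamiliar with the two flavors of these methods, here is a minimal C++ sketch of the documented Java semantics, not of ART's generated code. The function names are chosen for illustration. The raw variant copies the bit pattern unchanged, while the non-raw variant additionally collapses every NaN to the canonical pattern 0x7fc00000:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Raw variant: reinterpret the float's bytes, preserving any NaN payload.
inline std::int32_t FloatToRawIntBits(float f) {
  static_assert(sizeof(float) == sizeof(std::int32_t), "expects 32-bit float");
  std::int32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}

// Canonicalizing variant: every NaN maps to the single pattern 0x7fc00000;
// all other values produce the same bits as the raw variant.
inline std::int32_t FloatToIntBits(float f) {
  return std::isnan(f) ? 0x7fc00000 : FloatToRawIntBits(f);
}
```

Both functions are pure, which is the property that lets a compiler hoist a loop-invariant call out of a loop.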
59c9454b92c2096a30a2bbdffb64edf33dbdd916 25-Jan-2016 Aart Bik <ajcbik@google.com> Recognize common utilities as intrinsics.

Rationale:
Recognizing these method calls as intrinsics already has
major advantages (compiler knows about no-side-effects/no-throw
properties). Next step is, of course, to implement these
with native instructions on each architecture.

Change-Id: I06fd12973238caec00d67b31b195d7f8807a538e
3f67e692860d281858485d48a4f1f81b907f1444 15-Jan-2016 Aart Bik <ajcbik@google.com> Implemented BitCount as an intrinsic. With unit test.

Rationale:
Recognizing this important operation as an intrinsic has
various advantages:
(1) having the no-side-effects/no-throw properties allows for
much more GVN/LICM/BCE.
(2) Some architectures, like x86_64, provide direct
support for this operation.

Performance improvements on X86_64:
CheckersEvalBench (32-bit bitboard): 27,210KNS -> 36,798KNS = + 35%
ReversiEvalBench (64-bit bitboard): 52,562KNS -> 89,086KNS = + 69%

Change-Id: I65d549b0469b7909b12c6611cdc34a8640a5751f
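
As an illustration of the operation itself, the following is a portable population-count sketch in C++ using the well-known parallel (SWAR) bit-summing technique. It is not ART's code, and the function names are hypothetical. On architectures with direct support, a single instruction such as x86_64 POPCNT can replace this whole sequence:

```cpp
#include <cstdint>

// Portable population count for a 32-bit value (SWAR technique).
inline int BitCount32(std::uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);                  // sums of bit pairs
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // sums of nibbles
  x = (x + (x >> 4)) & 0x0f0f0f0fu;                  // sums of bytes
  return static_cast<int>((x * 0x01010101u) >> 24);  // add the four bytes
}

// 64-bit variant built from two 32-bit halves.
inline int BitCount64(std::uint64_t x) {
  return BitCount32(static_cast<std::uint32_t>(x)) +
         BitCount32(static_cast<std::uint32_t>(x >> 32));
}
```

The function has no side effects and cannot throw, which is the kind of property the rationale above relies on for GVN/LICM/BCE.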
5d75afe333f57546786686d9bee16b52f1bbe971 14-Dec-2015 Aart Bik <ajcbik@google.com> Improved side-effects/can-throw information on intrinsics.

Rationale: improved side effect and exception analysis gives
many more opportunities for GVN/LICM/BCE.

Change-Id: I8aa9b757d77c7bd9d58271204a657c2c525195b5
a4f1220c1518074db18ca1044e9201492975750b 06-Aug-2015 Mark Mendell <mark.p.mendell@intel.com> Optimizing: Add direct calls to math intrinsics

Support the double forms of:
cos, sin, acos, asin, atan, atan2, cbrt, cosh, exp, expm1,
hypot, log, log10, nextAfter, sinh, tan, tanh

Add these entries to the vector addressed off the thread pointer. Call
the libc routines directly, which means that we have to implement the
native ABI, not the ART one. For x86_64, that includes saving XMM12-15
as the native ABI considers them caller-save, while the ART ABI
considers them callee-save. We save them by marking them as used by the
call to the math function. For x86, this is not an issue, as all the XMM
registers are caller-save.

Other architectures will call Java as before until they are ready to
implement the new intrinsics.

Bump the OAT version since we are incompatible with old boot.oat files.

Change-Id: Ic6332c3555c09393a17d1ad4daf62932488722fb
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
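
The idea of reaching libc math routines through a table of entries can be sketched in C++ as follows. This is only an illustration under stated assumptions: `DemoMathEntrypoints`, its field names, and the helper functions are hypothetical and do not reflect ART's actual thread layout, and the register-saving aspect of the native ABI is not modeled here.

```cpp
#include <cmath>

// Hypothetical table of math entrypoints; the field names are illustrative
// and do not reflect ART's actual thread layout.
struct DemoMathEntrypoints {
  double (*cos_fn)(double);
  double (*sin_fn)(double);
  double (*hypot_fn)(double, double);
};

// Populate the table with the C library's double-precision routines.
// The pointer type on the left selects the double overload of each function.
inline DemoMathEntrypoints MakeDemoMathEntrypoints() {
  DemoMathEntrypoints table;
  table.cos_fn = std::cos;
  table.sin_fn = std::sin;
  table.hypot_fn = std::hypot;
  return table;
}

// A caller loads the pointer from the table and calls through it.
inline double CallCos(const DemoMathEntrypoints& table, double x) {
  return table.cos_fn(x);
}
```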
ee3cf0731d0ef0787bc2947c8e3ca432b513956b 06-Oct-2015 Nicolas Geoffray <ngeoffray@google.com> Intrinsify System.arraycopy.

Currently on x64 only; the other architectures will follow in
separate changes.

Change-Id: I15fbbadb450dd21787809759a8b14b21b1e42624
9ee23f4273efed8d6378f6ad8e63c65e30a17139 23-Jul-2015 Scott Wakeling <scott.wakeling@linaro.org> ARM/ARM64: Intrinsics - numberOfTrailingZeros, rotateLeft, rotateRight

Change-Id: I2a07c279756ee804fb7c129416bdc4a3962e93ed
05f2056b4f11e0b2bac92b2655abe7030771f5dc 19-Aug-2015 Agi Csaki <agicsaki@google.com> Add support to indicate whether intrinsics require a dex cache

A structural change to indicate whether a given intrinsic requires access
to a dex cache. I updated the needs_environment_ field to indicate
whether an HInvoke needs an environment or a dex cache. If an HInvoke
represents an intrinsified method, we use this field to determine whether
the HInvoke needs a dex cache.

Bug: 21481923
Change-Id: I9dd25a385e1a1397603da6c4c43f6c1aea511b32
7da072feb160079734331e994ea52760cb2a3243 13-Aug-2015 agicsaki <agicsaki@google.com> Structure for String.Equals intrinsic

Added structure for implementing String.Equals intrinsics. There is no
functional change at this point: the intrinsic is marked as unimplemented
for all instruction sets and compilers.

Bug: 21481923
Change-Id: Ic2a1e22a113ff6091581126f12e926478c011340
57b81ecbe74138992dd447251e94ed06cd5eb802 12-Aug-2015 agicsaki <agicsaki@google.com> Add support to indicate whether intrinsics require an environment

A structural change to indicate whether a given intrinsic requires
access to an environment. I added a field to HInvoke objects that indicates
whether they need an environment. Its default value is true, and it is only
updated if an intrinsic is marked as not requiring an environment. At this
point there is no functional change, as all intrinsics are marked as requiring
an environment. This change adds the structure for future inliner work
which will allow us to inline more intrinsified calls.

Change-Id: I2930e3cef7b785384bf95b95a542d34af442f3b9
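
The "default true, cleared only when marked otherwise" behavior described in this entry can be sketched in C++ as below. `DemoInvoke` and `DemoEnvNeed` are hypothetical stand-ins for illustration, not ART's HInvoke or its intrinsic flags:

```cpp
// Hypothetical per-intrinsic property; not an ART type.
enum class DemoEnvNeed { kNeedsEnvironment, kNoEnvironment };

// Hypothetical stand-in for an invoke node; not ART's HInvoke.
class DemoInvoke {
 public:
  bool NeedsEnvironment() const { return needs_environment_; }

  // The flag changes only when the intrinsic is explicitly marked as not
  // requiring an environment; otherwise the conservative default is kept.
  void SetIntrinsic(DemoEnvNeed need) {
    if (need == DemoEnvNeed::kNoEnvironment) {
      needs_environment_ = false;
    }
  }

 private:
  bool needs_environment_ = true;  // conservative default
};
```

Keeping the default conservative means that marking every intrinsic as requiring an environment leaves behavior unchanged, consistent with the "no functional change" note above.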
611d3395e9efc0ab8dbfa4a197fa022fbd8c7204 10-Jul-2015 Scott Wakeling <scott.wakeling@linaro.org> ARM/ARM64: Implement numberOfLeadingZeros intrinsic.

Change-Id: I4042fb7a0b75140475dcfca23e8f79d310f5333b
aabdf8ad2e8d3de953dff5c7591e7b3df4d4f60b 03-Aug-2015 Roland Levillain <rpl@google.com> Revert "Optimizing String.Equals as an intrinsic (x86)"

Reverted as it breaks the compilation of boot.{oat,art} on x86 (although this CL may not be the culprit, as the issue seems to come from Optimizing's register allocator).

This reverts commit 8ab7bd6c8b10ad58758c33a1dc9326212bd200e9.

Change-Id: If7c8b6258d1e690f4d2a06bcc82c92563ac6cdef
8ab7bd6c8b10ad58758c33a1dc9326212bd200e9 27-Jul-2015 agicsaki <agicsaki@google.com> Optimizing String.Equals as an intrinsic (x86)

The third implementation of String.Equals. I added an intrinsic
in x86 which is similar to the original Java implementation of
String.equals: an instanceof check, null check, length check, and
reference equality check followed by a loop comparing strings
character by character.

Interesting Benchmarking Values:

Optimizing Compiler on Nexus Player
Intrinsic 15-30 Character Strings: 177 ns
Original 15-30 Character Strings: 275 ns
Intrinsic Null Argument: 59 ns
Original Null Argument: 137 ns
Intrinsic 100-1000 Character Strings: 1812 ns
Original 100-1000 Character Strings: 6334 ns

Bug: 21481923
Change-Id: Ia386e19b9dbfe0dac688b20ec93d8f90f67af47e
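
The sequence of checks named in this entry can be sketched in C++ as follows. The object model (`DemoObject`, `DemoString`, `MakeDemoString`) is hypothetical and exists only for this illustration. The order of the checks here is chosen so that the pointer is never dereferenced before the null check, and is not taken from the x86 code:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical minimal object model, used only for this illustration.
struct DemoObject {
  bool is_string;  // stand-in for a runtime type check
};

struct DemoString {
  DemoObject header;
  std::size_t length;
  const std::uint16_t* chars;  // UTF-16 code units
};

// Helper that builds a DemoString over an existing buffer.
inline DemoString MakeDemoString(const std::uint16_t* chars, std::size_t length) {
  DemoString s;
  s.header.is_string = true;
  s.length = length;
  s.chars = chars;
  return s;
}

inline bool DemoStringEquals(const DemoString& self, const DemoObject* other) {
  if (other == nullptr) return false;        // null check
  if (other == &self.header) return true;    // reference equality check
  if (!other->is_string) return false;       // stand-in for instanceof
  // Valid because header is the first member of a standard-layout struct.
  const DemoString* s = reinterpret_cast<const DemoString*>(other);
  if (s->length != self.length) return false;  // length check
  for (std::size_t i = 0; i < self.length; ++i) {
    if (s->chars[i] != self.chars[i]) return false;  // character by character
  }
  return true;
}
```

The early exits explain why the null-argument case in the measurements above is so much cheaper than a full comparison.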
848f70a3d73833fc1bf3032a9ff6812e429661d9 15-Jan-2014 Jeff Hao <jeffhao@google.com> Replace String CharArray with internal uint16_t array.

Summary of high level changes:
- Adds compiler inliner support to identify string init methods
- Adds compiler support (quick & optimizing) with new invoke code path
that calls method off the thread pointer
- Adds thread entrypoints for all string init methods
- Adds map to verifier to log when receiver of string init has been
copied to other registers; used by the compiler and interpreter

Change-Id: I797b992a8feb566f9ad73060011ab6f51eb7ce01
3e90a96f403cbc353731e6687fe12a088f996cee 27-Mar-2015 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> [optimizing] Do not inline intrinsics

The intrinsics generally have specialized code and the code for them
may be faster than what can be achieved with inlining. Thus the inliner
should skip intrinsics.

At the same time, easy methods are not worth intrinsifying, e.g. String
length and isEmpty. Those can be handled by the inliner with no problem
and can actually lead to better code since the call is not kept around
through all of the optimizations.

Change-Id: Iab38e6c33f79efa54d845d4871cf26fa9b235ab0
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
878d58cbaf6b17a9e3dcab790754527f3ebc69e5 16-Jan-2015 Andreas Gampe <agampe@google.com> ART: Arm64 optimizing compiler intrinsics

Implement most intrinsics for the optimizing compiler for Arm64.

Change-Id: Idb459be09f0524cb9aeab7a5c7fccb1c6b65a707
71fb52fee246b7d511f520febbd73dc7a9bbca79 30-Dec-2014 Andreas Gampe <agampe@google.com> ART: Optimizing compiler intrinsics

Add intrinsics infrastructure to the optimizing compiler.

Add almost all intrinsics supported by Quick to the x86-64 backend.
Further intrinsics require more assembler support.

Change-Id: I48de9b44c82886bb298d16e74e12a9506b8e8807