History log of /art/compiler/optimizing/intrinsics_x86_64.cc
Revision Date Author Comments
c15a2f4f45661a7f5f542e406282c146ea1a968d 21-Apr-2017 Andreas Gampe <agampe@google.com> ART: Add object-readbarrier-inl.h

Move some read-barrier code into a new header. This prunes the
include tree for the concurrent-copying collector. Clean up other
related includes.

Test: mmma art
Change-Id: I40ce4e74f2e5d4c692529ffb4df933230b6fd73e
c6ea7d00ad069a2736f603daa3d8eaa9a1f8ea11 02-Feb-2017 Andreas Gampe <agampe@google.com> ART: Clean up art_method.h

Clean up the header. Fix up other headers including the -inl file,
in an effort to prune the include graph. Fix broken transitive
includes by making includes explicit. Introduce new -inl files
for method handles and reference visiting.

Test: source build/envsetup.sh && lunch aosp_angler-userdebug && mmma art
Test: source build/envsetup.sh && lunch aosp_mips64-userdebug && mmma art
Change-Id: I8f60f1160c2a702fdf3598149dae38f6fa6bc851
9cc0ea8140e0106e132efc3c1c5c458fa196ae41 16-Mar-2017 Roland Levillain <rpl@google.com> Refactor SystemArrayCopy intrinsics.

Test: m test-art-host
Test: m test-art-target
Change-Id: I2f9ccdbb831030e670996b97e0c422f505b3abf6
0bd97173fab66572c95ce18fa785e00271adc014 16-Mar-2017 Colin Cross <ccross@android.com> Fix sign extension issues in x86_64 code generation

movl expects an Immediate int64_t that is in the range -2GB to
2GB. Cast uint32_t addresses to int32_t before passing as an
Immediate to movl.

In VisitIntegerValueOf, the base address may not fit in the disp32
field. Fall back to storing the base address in a temporary
register if it is larger than 2GB.

Bug: 36281983
Test: m -j test-art-host with LibartImgHostBaseAddress == 0xa0000000
Change-Id: I5f8cc4f5a6220afc577707e3831113b0ead1d2b2
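
A minimal sketch of the pitfall fixed above (self-contained; the address value is illustrative):

    #include <cstdint>
    uint32_t addr = 0xa0000000u;                      // image base above 2GB
    int64_t widened = addr;                           // 0x00000000a0000000: outside movl's
                                                      // sign-extended 32-bit immediate range
    int64_t cast_first = static_cast<int32_t>(addr);  // 0xffffffffa0000000: valid imm32,
                                                      // sign-extends back to the same bits
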
331605a7ba842573b3876e14c933175382b923c8 01-Mar-2017 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Intrinsify Integer.valueOf.""

Fix heap poisoning.
LOG INFO instead of ERROR to avoid run-test failures with --no-image.

bug:30933338
Test: ART_HEAP_POISONING=true test-art-host test-art-target

This reverts commit db7b44ac3ea80a722aaed12e913ebc1661a57998.

Change-Id: I0b7d4f1eb11c62c9a3df8e0de0b1a5d8af760181
db7b44ac3ea80a722aaed12e913ebc1661a57998 28-Feb-2017 Nicolas Geoffray <ngeoffray@google.com> Revert "Intrinsify Integer.valueOf."

Heap poisoning support was missing;
jit-gcstress was not optimizing it.

bug:30933338

This reverts commit cd0b27287843cfd904dd163056322579ab4bbf27.

Change-Id: I5ece1818afbca5214babb6803f62614a649aedeb
cd0b27287843cfd904dd163056322579ab4bbf27 23-Feb-2017 Nicolas Geoffray <ngeoffray@google.com> Intrinsify Integer.valueOf.

Improves performance of ArrayListStress and Ritz by ~10% and ~3%.

Test: test-art-host test-art-target
bug: 30933338

Change-Id: I639046e3a18dae50069d3a7ecb538a900bb590a1
71bf7b43380eb445973f32a7f789d9670f8cc97d 16-Nov-2016 Aart Bik <ajcbik@google.com> Optimizations around escape analysis. With tests.

Details:
(1) added new intrinsics
(2) implemented optimizations
    more !can be null information
    more null check removals
    replace return-this uses with incoming parameter
    remove dead StringBuffer/Builder calls (with escape analysis)
(3) Fixed exposed bug in CanBeMoved()

Performance gain:
This improves CaffeineString by about 360%
(removes null check from first loop, eliminates second loop completely)

Test: test-art-host

Change-Id: Iaf16a1b9cab6a7386f43d71c6b51dd59600e81c1
ff7d89c0364f6ebd0f0798eb18ef8bd62917de6a 07-Nov-2016 Aart Bik <ajcbik@google.com> Allow read side effects for removing dead instructions.

Rationale:
Instructions that only have the harmless read side effect may
be removed when dead as well; we were too strict previously.
As proof of concept, this CL also provides more accurate information
on a few string-related intrinsics. This removes the dead indexOf
from CaffeineString (17% performance improvement; the big bottleneck
of StringBuffer's toString() still remains in the loop).

Test: test-art-host
Change-Id: Id835a8e287e13e1f09be6b46278a039b8865802e
fdaf0f45510374d3a122fdc85d68793e2431175e 13-Oct-2016 Vladimir Marko <vmarko@google.com> Change string compression encoding.

Encode the string compression flag as the least significant
bit of the "count" field, with 0 meaning compressed and 1
meaning uncompressed.

The main vdex file is a tiny bit larger (+28B for prebuilt
boot images, +32B for on-device built images) and the oat
file sizes change. Measured on Nexus 9, AOSP ToT, these
changes are insignificant when string compression is
disabled (-200B for the 32-bit boot*.oat for prebuilt boot
image, -4KiB when built on the device attributable to
rounding, -16B for 64-bit boot*.oat for prebuilt boot image,
no change when built on device) but with string compression
enabled we get significant differences:
prebuilt multi-part boot image:
- 32-bit boot*.oat: -28KiB
- 64-bit boot*.oat: -24KiB
on-device built single boot image:
- 32-bit boot.oat: -32KiB
- 64-bit boot.oat: -28KiB
The boot image oat file overhead for string compression:
prebuilt multi-part boot image:
- 32-bit boot*.oat: before: ~80KiB after: ~52KiB
- 64-bit boot*.oat: before: ~116KiB after: ~92KiB
on-device built single boot image:
- 32-bit boot.oat: before: 92KiB after: 60KiB
- 64-bit boot.oat: before: 116KiB after: 92KiB

The differences in the SplitStringBenchmark seem to be lost
in the noise.

Test: Run ART test suite on host and Nexus 9 with Optimizing.
Test: Run ART test suite on host and Nexus 9 with interpreter.
Test: All of the above with string compression enabled.
Bug: 31040547

Change-Id: I7570c2b700f1a31004a2d3c18b1cc30046d35a74
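
The encoding above implies a very small decode; a sketch (helper names are ours, not ART's):

    #include <cstdint>
    // The flag lives in the least significant bit of the count field.
    bool IsCompressed(int32_t count) { return (count & 1) == 0; }  // 0 = compressed
    int32_t GetLength(int32_t count) { return count >> 1; }        // logical length
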
12b58b23de974232e991c650405f929f8b0dcc9f 01-Nov-2016 Hiroshi Yamauchi <yamauchi@google.com> Clean up the runtime read barrier and fix fake address dependency.

- Rename GetReadBarrierPointer to GetReadBarrierState.
- Change its return type to uint32_t.
- Fix the runtime fake address dependency for arm/arm64 using inline
asm.
- Drop ReadBarrier::black_ptr_ and some brooks code.

Bug: 12687968
Test: test-art with CC, Ritz EAAC, libartd boot on N9.
Change-Id: I595970db825db5be2e98ee1fcbd7696d5501af55
a1aa3b1f40e496d6f8b3b305a4f956ddf2e425fc 26-Oct-2016 Roland Levillain <rpl@google.com> Add support for Baker read barriers in UnsafeCASObject intrinsics.

Prior to doing the compare-and-swap operation, ensure the
expected reference stored in the holding object's field is
in the to-space by loading it, emitting a read barrier and
updating that field with a strong compare-and-set operation
with relaxed memory synchronization ordering (if needed).

Test: ART host and target tests and Nexus 5X boot test with Baker read barriers.
Bug: 29516905
Bug: 12687968
Change-Id: I480f6a9b59547f11d0a04777406b9bfeb905bfd2
4877b7986c9ba5c69be8f80692c260b4952f69be 09-Sep-2016 jessicahandojo <jessicahandojo@google.com> String compression on intrinsics x86 and x86_64

Changes on intrinsics and Code Generation (x86 and x86_64)
for string compression feature. Currently the feature is off.

The sizes of boot.oat and boot.art for x86 before and after the
changes (feature OFF) are unchanged. With the feature ON,
boot.oat increased by 0.83% and boot.art decreased by 19.32%.

Meanwhile for x86_64, the sizes of boot.oat and boot.art before and
after the changes (feature OFF) are also unchanged. With the feature ON,
boot.oat increased by 0.87% and boot.art decreased by 6.59%.

Turn feature on: runtime/mirror/string.h (kUseStringCompression = true)
runtime/asm_support.h (STRING_COMPRESSION_FEATURE 1)

Test: m -j31 test-art-host
All tests passed with mirror::kUseStringCompression both ON and OFF.

The jni_internal_test was changed to assert that an empty string's
length equals -(1 << 31), as it is compressed.

Bug: 31040547
Change-Id: Ia447c9b147cabb6a69e6ded86be1fe0c46d9638d
804b03ffb9b9dc6cc3153e004c2cd38667508b13 14-Sep-2016 Vladimir Marko <vmarko@google.com> Change remaining slow path throw entrypoints to save everything.

Change DivZeroCheck, BoundsCheck and explicit NullCheck
slow path entrypoints to conform to kSaveEverything.

On Nexus 9, AOSP ToT, the boot.oat size reduction is
prebuilt multi-part boot image:
- 32-bit boot.oat: -12KiB (-0.04%)
- 64-bit boot.oat: -24KiB (-0.06%)
on-device built single boot image:
- 32-bit boot.oat: -8KiB (-0.03%)
- 64-bit boot.oat: -16KiB (-0.04%)

Test: Run ART test suite including gcstress on host and Nexus 9.
Test: Manually disable implicit null checks and test as above.
Change-Id: If82a8082ea9ae571c5d03b5e545e67fcefafb163
70e97462116a47ef2e582ea29a037847debcc029 09-Aug-2016 Vladimir Marko <vmarko@google.com> Avoid excessive spill slots for slow paths.

Reducing the frame size makes stack maps smaller as we need
fewer bits for stack masks and some dex register locations
may use short location kind rather than long. On Nexus 9,
AOSP ToT, the boot.oat size reduction is
prebuilt multi-part boot image:
- 32-bit boot.oat: -416KiB (-0.6%)
- 64-bit boot.oat: -635KiB (-0.9%)
prebuilt multi-part boot image with read barrier:
- 32-bit boot.oat: -483KiB (-0.7%)
- 64-bit boot.oat: -703KiB (-0.9%)
on-device built single boot image:
- 32-bit boot.oat: -380KiB (-0.6%)
- 64-bit boot.oat: -632KiB (-0.9%)
on-device built single boot image with read barrier:
- 32-bit boot.oat: -448KiB (-0.6%)
- 64-bit boot.oat: -692KiB (-0.9%)

The other benefit is that at runtime, threads may need fewer
pages for their stacks, reducing overall memory usage.

We defer the calculation of the maximum spill size from
the main register allocator (linear scan or graph coloring)
to the RegisterAllocationResolver and do it based on the
live registers at slow path safepoints. The old notion of
an artificial slow path safepoint interval is removed as
it is no longer needed.

Test: Run ART test suite on host and Nexus 9.
Bug: 30212852
Change-Id: I40b3d114e278e2c5807982904fa49bf6642c6275
ba45db072c48783e19a2a73ab4e45ae143c1c7c9 12-Jul-2016 Serban Constantinescu <serban.constantinescu@linaro.org> Extend the InvokeRuntime() changes to x86 and x86_64.

Also fix the LocationSummary for intrinsics that call on main
and slowpath.

Test: test-art-host

Change-Id: I437ffd433ee87b1754dbd8c075ec54f00d7d4ccb
953437bd51059801d92079295f728d0260efca31 24-Aug-2016 Vladimir Marko <vmarko@google.com> Revert "Revert "x86/x86-64: Avoid temporary for read barrier field load.""

Fixed the fault handler recognizing the TEST instruction and
fault address within the lock word. Added tests to 439-npe.

Bug: 29966877
Bug: 12687968
Test: Tested with ART_USE_READ_BARRIER=true on host.
Test: Tested with ART_USE_READ_BARRIER=true ART_HEAP_POISONING=true on host.

This reverts commit ccf15bca330f9a23337b1a4b5850f7fcc6c1bf15.

Change-Id: I8990def5f719c9205bf6e5fdba32027fa82bec50
ccf15bca330f9a23337b1a4b5850f7fcc6c1bf15 23-Aug-2016 Vladimir Marko <vmarko@google.com> Revert "x86/x86-64: Avoid temporary for read barrier field load."

Fault handler does not recognize the instruction
F6 /0 ib TEST r/m8, imm8
so we get crashes instead of NPEs.

Bug: 29966877
Bug: 12687968

This reverts commit ccf06d8f19a37432de4a3b768747090adfbd18ec.

Change-Id: Ib7db3b59f44c0d3ed5e24a20b6c6ee596a89d709
ccf06d8f19a37432de4a3b768747090adfbd18ec 12-Aug-2016 Vladimir Marko <vmarko@google.com> x86/x86-64: Avoid temporary for read barrier field load.

Add TEST instructions for memory and immediate. Use the byte
version to avoid a temporary in read barrier field load.

Test: Tested with ART_USE_READ_BARRIER=true on host.
Test: Tested with ART_USE_READ_BARRIER=true ART_HEAP_POISONING=true on host.
Bug: 29966877
Bug: 12687968
Change-Id: Ia415d3c2e1ae1ff6dff11d72bbb7d96d5deed6ee
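
An assembly-level sketch of the saving (register names, mask, and offsets are placeholders, not ART's):

    # before: the lock word is loaded into a temporary, then tested
    movl  LOCK_WORD_OFFSET(%obj), %temp
    testl $READ_BARRIER_STATE_MASK, %temp
    jnz   .Lmark_slow_path
    # after: TEST r/m8, imm8 (F6 /0 ib) tests the flag byte in memory, no temporary
    testb $READ_BARRIER_STATE_MASK_BYTE, LOCK_WORD_OFFSET+STATE_BYTE(%obj)
    jnz   .Lmark_slow_path
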
0b671c0408e98824e1f92b1ee951b210c090fe7a 19-Aug-2016 Roland Levillain <rpl@google.com> Add support for Baker read barriers in SystemArrayCopy intrinsics.

Benchmarks (ARM64) score variations on Nexus 5X with CPU
cores clamped at 960000 Hz (aosp_bullhead-userdebug build):
- Ritzperf - average (lower is better): -3.03% (slightly better)
- CaffeineMark - average (higher is better): +1.26% (slightly better)
- DeltaBlue (lower is better): -10.50% (better)
- Richards - average (lower is better): -3.36% (slightly better)
- SciMark2 - average (higher is better): +0.26% (virtually unchanged)

Details about Ritzperf benchmarks with meaningful variations
(lower is better):
- FormulaEvaluationActions.EvaluateAndApplyChanges: -13.26% (better)
- FormulaEvaluationActions.EvaluateCascadingSums: -10.94% (better)
- FormulaEvaluationActions.EvaluateComplexFormulas: -15.50% (better)
- FormulaEvaluationActions.EvaluateFibonacci: -10.41% (better)
- FormulaEvaluationActions.EvaluateLargeSums: +6.02% (worse)

Boot image code size variation on Nexus 5X
(aosp_bullhead-userdebug build):
- total ARM64 framework Oat files size change:
107047632 bytes -> 107154128 bytes (+0.10%)
- total ARM framework Oat files size change:
90932028 bytes -> 91009852 bytes (+0.09%)

Test: ART host and target (ARM, ARM64) tests + Nexus 5X boot.
Bug: 29516905
Bug: 29506760
Bug: 12687968
Change-Id: I85431368d09965687a0301ae2eb3c991f276ce5d
0cf8d9c08bb77b0f527121b83e6a9dbb36d602f3 10-Aug-2016 Aart Bik <ajcbik@google.com> Fully enable new round implementation on x86/x86_64

Rationale:
Running JIT on Fugu does not always provide a constant area.
In such cases, we need to construct FP constants through stack.
This only applies to x86.

Test: 580-checker-round

BUG=26327751

Change-Id: I7e2c80dafbafbe647cfe9ecb039920bb534c666a
7ad310dabb9c9b60f35f72c7b0736da602e7541a 04-Aug-2016 Aart Bik <ajcbik@google.com> Temporary disable new round implementation on x86/x86_64

Rationale:
FUGU is not happy

Test: 580-checker-round

BUG=26327751

Change-Id: If0ddea47a88e14b86d37080b8a18a6f8defcc8e6
349f388b8c75c674f3337e6474affb2bff91d507 03-Aug-2016 Aart Bik <ajcbik@google.com> Implement single-/double-precision round intrinsic in x86_64

Rationale:
X86_64 does not provide a direct instruction for the
required rounding and NaN and large positive numbers
must be dealt with too. This CL generates code that
correctly implements SP and DP round.

Test: 580-checker-round

BUG=26327751

Change-Id: Ia7518e2c30afafba4e037e2d0c21e0ce926f0425
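
A behavioral sketch of the required semantics, based on the cases named in these round commits (what the generated code must compute, not the emitted instructions):

    #include <cmath>
    #include <cstdint>
    #include <limits>
    int64_t RoundDouble(double in) {
      if (std::isnan(in)) return 0;  // NaN rounds to 0
      double in_plus_point_five = in + 0.5;
      if (in_plus_point_five >= static_cast<double>(std::numeric_limits<int64_t>::max()))
        return std::numeric_limits<int64_t>::max();  // large positives clamp to Long.MAX_VALUE
      if (in_plus_point_five <= static_cast<double>(std::numeric_limits<int64_t>::min()))
        return std::numeric_limits<int64_t>::min();  // large negatives clamp to Long.MIN_VALUE
      return static_cast<int64_t>(std::floor(in_plus_point_five));
    }
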
542451cc546779f5c67840e105c51205a1b0a8fd 26-Jul-2016 Andreas Gampe <agampe@google.com> ART: Convert pointer size to enum

Move away from size_t to dedicated enum (class).

Bug: 30373134
Bug: 30419309
Test: m test-art-host
Change-Id: Id453c330f1065012e7d4f9fc24ac477cc9bb9269
806f0122e923581f559043e82cf958bab5defc87 09-Mar-2016 Serban Constantinescu <serban.constantinescu@linaro.org> Add support for CallKind::kCallOnMainAndSlowPath

Some of the intrinsics call on both the main and slowpath. This patch
adds support for such a CallKind and marks the intrinsics accordingly.

This will be exercised by a later patch that refactors all the runtime
calls to use InvokeRuntime().

Please note that without this patch, the calls to ValidateInvokeRuntime()
exercised by the following patches would fail.

Change-Id: I450571b8b47280a004b714996189ba6db13fb57d
54ff482710910929900f8348a19c5b875e519237 07-Jul-2016 Serban Constantinescu <serban.constantinescu@linaro.org> Rename kCall to kCallOnMainOnly

This patch renames kCall to kCallOnMainOnly in preparation for
the next patch in this series which will be adding kCallOnMainAndSlowPath.

Note: With this patch there will be places where we use kCallOnMainOnly
even though we call on the slow path too. The next patch in this series
will fix that.

Test: ART host tests.
Change-Id: Iabfdb0901990d163be5d780f3bdd2fab6fa17b32
b198b013ae7bd2da85e007414fc028cd51a13883 07-Jul-2016 Nicolas Geoffray <ngeoffray@google.com> Fix System.arraycopy when doing same array copying.

At compile time, if constant source < constant destination, and we don't
know if the arrays are the same, then we must emit code that checks
if the two arrays are the same. If so, we jump to the slow path.

test:610-arraycopy

bug:30030084

(cherry picked from commit 9f65db89353c46f6b189656f7f55a99054e5cfce)

Change-Id: Ida67993d472b0ba4056d9c21c68f6e5239421f7d
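
The reasoning above compresses to one extra runtime check; a pseudocode sketch:

    // With constant positions and srcPos < dstPos, a forward copy can only
    // corrupt its own source if both references denote the same array:
    if (src == dst) goto slow_path;  // overlapping forward copy is unsafe here
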
fea1abd660cc89b31d121c8700fae8d804178391 06-Jul-2016 Nicolas Geoffray <ngeoffray@google.com> Implement System.arraycopy intrinsic on x86.

Also remove wrong comments when doing the raw copying.

test:run-test, 537-checker-arraycopy, 610-arraycopy

Change-Id: I2495bc03cde8ccad03c93f7722dd29bf85138525
9f65db89353c46f6b189656f7f55a99054e5cfce 07-Jul-2016 Nicolas Geoffray <ngeoffray@google.com> Fix System.arraycopy when doing same array copying.

At compile time, if constant source < constant destination, and we don't
know if the arrays are the same, then we must emit code that checks
if the two arrays are the same. If so, we jump to the slow path.

test:610-arraycopy

Change-Id: Ida67993d472b0ba4056d9c21c68f6e5239421f7d
7f7f6dae98b0b1959348ffc84d8e347a26561226 21-Jun-2016 Pavel Vyssotski <pavel.n.vyssotski@intel.com> ART: OneBit intrinsics should use 1ULL for 64-bit shift

Change-Id: I91cbe769081045e6a45a95154a8a8acf1ec352ef
Signed-off-by: Pavel Vyssotski <pavel.n.vyssotski@intel.com>
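
The bug class being fixed, in miniature (a sketch; `shift` stands for the bit position used by the 64-bit OneBit variants):

    #include <cstdint>
    // '1' is a 32-bit int, so the shift is undefined for shift >= 32
    // and cannot produce the upper 32 one-bit positions:
    uint64_t BuggyBit(int shift) { return 1 << shift; }
    // A 64-bit literal keeps the whole range 0..63 well-defined:
    uint64_t FixedBit(int shift) { return 1ULL << shift; }
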
3d31242300c3525e5c85665d9e34acf87002db63 23-Jun-2016 Roland Levillain <rpl@google.com> Re-enable most intrinsics with read barriers.

Also extend sun.misc.Unsafe test coverage to exercise
sun.misc.Unsafe.{get,put}{Int,Long,Object}Volatile.

Bug: 26205973
Bug: 29516905
Change-Id: I4d8da7cee5c8a310c8825c1631f71e5cb2b80b30
Test: Covered by ART's run-tests.
0fcd2b84210db2bcf8b2d7a2b98a1a2bca367cac 05-Apr-2016 Sang, Chunlei <chunlei.sang@intel.com> Fix x86 & x86-64 UnsafeGetObject intrinsics with read barriers.

The implementation was incorrectly interpreting the 'offset'
input as an index in a (4-byte) object reference array,
whereas it is a (1-byte) offset to an object reference field
within the 'base' (object) input.

Bug: 29516905
Change-Id: Idfbead8289222b55069816a81284401eff791e85
Test: Covered by test/004-UnsafeTest.
87f3fcbd0db352157fc59148e94647ef21b73bce 28-Apr-2016 Vladimir Marko <vmarko@google.com> Replace String.charAt() with HIR.

Replace String.charAt() with HArrayLength, HBoundsCheck and
HArrayGet. This allows GVN on the HArrayLength and BCE on
the HBoundsCheck as well as using the infrastructure for
HArrayGet, i.e. better handling of constant indexes than
the old intrinsic and using the HArm64IntermediateAddress.

Bug: 28330359
Change-Id: I32bf1da7eeafe82537a60416abf6ac412baa80dc
53b52005ddae649e6f1cf475de7c259bc6885e6d 24-May-2016 Vladimir Marko <vmarko@google.com> Apply String.equals() optimizations on arm, arm64 and x86-64.

This is a follow-up to
https://android-review.googlesource.com/174192

Change-Id: Ie71197df22548d6eb0ca773de6f19fcbb975f065
da05108b17b020b28555a8d5db5caa42e4435981 17-May-2016 Vladimir Marko <vmarko@google.com> Clean up String.indexOf() intrinsics.

Additional cleanup after
https://android-review.googlesource.com/223260

Bug: 28330359
Change-Id: I88def196babec70123896ef581ec8d61bb1b9a9a
288c7a8664e516d7486ab85267050e676e84cc39 16-May-2016 Serguei Katkov <serguei.i.katkov@intel.com> Revert "Revert "ART: Reference.getReferent intrinsic for x86 and x86_64""

This reverts commit 0997d24e67d78f2146ebae2888eda0d7d254789a.

ART_HEAP_POISONING=true mode is fixed.

Change-Id: I83f6d5c101ea6a86802753f81b3e4348a263fb21
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
0997d24e67d78f2146ebae2888eda0d7d254789a 13-May-2016 Nicolas Geoffray <ngeoffray@google.com> Revert "ART: Reference.getReferent intrinsic for x86 and x86_64"

Fails heap poisoning configuration.

This reverts commit afdc97ebcb4e58afb7cf54d846d30314e6499d83.

Change-Id: I50e53756a2b85059b89cfb8950f8c9e2b032743c
afdc97ebcb4e58afb7cf54d846d30314e6499d83 05-May-2016 Serguei Katkov <serguei.i.katkov@intel.com> ART: Reference.getReferent intrinsic for x86 and x86_64

Change-Id: I7a7ac9244847dd80d9fa4e4b5ebc5bf451c628ff
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
fb6c90a1d7c8333c74c09f64f7eb2f402c3ea002 06-May-2016 Vladimir Marko <vmarko@google.com> Improve String.indexOf() intrinsics.

If the code point input is a char, we don't need the slow
path. Also improve the slow-path check (if we do need it)
on arm and arm64 to avoid loading 0xffff into a register.

Bug: 28330359
Change-Id: Ie6514c16126717bb0b11e3c7ab2b60eaa70fed4c
ebea3d2cce6aa34216502bb6b83d155d4c92e4ff 12-Apr-2016 Roland Levillain <rpl@google.com> Small changes in ARM and x86-64 SystemArrayCopy intrinsics.

Have these intrinsics share a more uniform style with the
ARM64 SystemArrayCopy intrinsic.

Also make some changes/improvements in:
- art::IntrinsicOptimizations
- art::arm64::GenSystemArrayCopyAddresses

Change-Id: Ieeb224795229580f8e5f7219c586d04786d8c705
fa3912edfac60a9f0a9b95a5862c7361b403fcc2 01-Apr-2016 Roland Levillain <rpl@google.com> Fix BitCount intrinsics assertions.

Bug: 27852035
Change-Id: Iba43039aadd9ba288b476d53cc2306a58356465f
f969a209c30e3af636342d2fb7851d82a2529bf7 09-Mar-2016 Roland Levillain <rpl@google.com> Fix and enable java.lang.StringFactory intrinsics.

The following intrinsics were not considered by the
intrinsics recognizer:
- StringNewStringFromBytes
- StringNewStringFromChars
- StringNewStringFromString
This CL enables them and add tests for them.

This CL also:
- Fixes the locations of the ARM64 & MIPS64
StringNewStringFromString intrinsics.
- Fixes the definitions of the FOUR_ARG_DOWNCALL macros on
ARM and x86, which are used to implement the
art_quick_alloc_string_from_bytes* runtime entry points.
- Fixes PC info (stack maps) recording in the
StringNewStringFromBytes, StringNewStringFromChars and
StringNewStringFromString ARM, ARM64 & MIPS64 intrinsics.

Bug: 27425743
Change-Id: I38c00d3f0b2e6b64f7d3fe9146743493bef9e45c
1193259cb37c9763a111825aa04718a409d07145 08-Mar-2016 Aart Bik <ajcbik@google.com> Implement the 1.8 unsafe memory fences directly in HIR.

Rationale:
More efficient since it exposes full semantics to
all operations on the graph and allows for proper
code generation for all architectures.

bug=26264765

Change-Id: Ic435886cf0645927a101a8502f0623fa573989ff
0e54c0160c84894696c05af6cad9eae3690f9496 04-Mar-2016 Aart Bik <ajcbik@google.com> Unsafe: Recognize intrinsics for 1.8 java.util.concurrent
With unit test.

Rationale:
Recognizing the 1.8 methods as intrinsics is the first step
towards providing efficient implementation on all architectures.
Where not implemented (everywhere for now), the methods fall back
to the JNI native or reference implementation.

NOTE: needs iam's CL first!

bug=26264765

Change-Id: Ife65e81689821a16cbcdd2bb2d35641c6de6aeb6
2f9fcc999fab4ba6cd86c30e664325b47b9618e5 02-Mar-2016 Aart Bik <ajcbik@google.com> Simplified intrinsic macro mechanism.

Rationale:
Reduces boiler-plate code in all intrinsics code generators.
Also, the newly introduced "unreachable" macro provides a
static verifier that we do not have unreachable and thus
redundant code in the generators. In fact, this change
exposes that the MIPS32 and MIPS64 rotation intrinsics
(IntegerRotateRight, LongRotateRight, IntegerRotateLeft,
LongRotateLeft) are unreachable, since they are handled
as HIR constructs for all architectures. Thus the code
can be removed.

Change-Id: I0309799a0db580232137ded72bb8a7bbd45440a8
cc3839c15555a2751e13980638fc40e4d3da633e 29-Feb-2016 Roland Levillain <rpl@google.com> Improve documentation about StringFactory.newStringFromChars.

Make it clear that the native method requires its third
argument to be non-null, and therefore that the intrinsics
do not need a null check for it.

Bug: 27378573
Change-Id: Id2f78ceb0f7674f1066bc3f216b738358ca25542
2a6aad9d388bd29bff04aeec3eb9429d436d1873 25-Feb-2016 Aart Bik <ajcbik@google.com> Implement fp to bits methods as intrinsics.

Rationale:
Better optimization, better performance.

Results on libcore benchmark:

Most gain is from moving the invariant call out of the loop
after we detect everything is a side-effect free intrinsic.
But generated code in general case is much cleaner too.

Before:
timeFloatToIntBits() in 181 ms.
timeFloatToRawIntBits() in 35 ms.
timeDoubleToLongBits() in 208 ms.
timeDoubleToRawLongBits() in 35 ms.

After:
timeFloatToIntBits() in 36 ms.
timeFloatToRawIntBits() in 35 ms.
timeDoubleToLongBits() in 35 ms.
timeDoubleToRawLongBits() in 34 ms.

bug=11548336

Change-Id: I6e001bd3708e800bd75a82b8950fb3a0fc01766e
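
The raw-bits variants are essentially free once intrinsified; a sketch of what they reduce to (the non-raw variants additionally collapse all NaNs to one canonical pattern):

    #include <cstdint>
    #include <cstring>
    int32_t FloatToRawIntBits(float f) {
      int32_t bits;
      std::memcpy(&bits, &f, sizeof(bits));  // a single register move once compiled
      return bits;
    }
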
75a38b24801bd4d27c95acef969930f626dd11da 17-Feb-2016 Aart Bik <ajcbik@google.com> Implement isNaN intrinsic through HIR equivalent.

Rationale:
Efficient implementation on all platforms.
Subject to better compiler optimizations.

Change-Id: Ie8876bf5943cbe1138491a25d32ee9fee554043c
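
The HIR equivalent leans on NaN being the only value not equal to itself; a sketch:

    bool IsNaN(double x) { return x != x; }  // conceptually a not-equal comparison of x with itself
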
a19616e3363276e7f2c471eb2839fb16f1d43f27 02-Feb-2016 Aart Bik <ajcbik@google.com> Implemented compare/signum intrinsics as HCompare
(with all code generation for all)

Rationale:
At HIR level, many more optimizations are possible, while ultimately
generated code can take advantage of full semantics.

Change-Id: I6e2ee0311784e5e336847346f7f3c4faef4fd17e
c5d4754198aadb2ada2d3f5daacd10d79bc13f38 28-Jan-2016 Aart Bik <ajcbik@google.com> Implementation of integer intrinsics on x86_64

Rationale:
Efficient implementations of common integer operations.
Already tested in:
564-checker-bitcount
565-checker-rotate
566-checker-signum
567-checker-compare
568-checker-onebit (extended to deal with run-time zero)

Change-Id: Ib48c76eee751e7925056d7f26797e9a9b5ae60dd
59c9454b92c2096a30a2bbdffb64edf33dbdd916 25-Jan-2016 Aart Bik <ajcbik@google.com> Recognize common utilities as intrinsics.

Rationale:
Recognizing these method calls as intrinsics already has
major advantages (compiler knows about no-side-effects/no-throw
properties). Next step is, of course, to implement these
with native instructions on each architecture.

Change-Id: I06fd12973238caec00d67b31b195d7f8807a538e
3f67e692860d281858485d48a4f1f81b907f1444 15-Jan-2016 Aart Bik <ajcbik@google.com> Implemented BitCount as an intrinsic. With unit test.

Rationale:
Recognizing this important operation as an intrinsic has
various advantages:
(1) having the no-side-effects/no-throw allows for
much more GVN/LICM/BCE.
(2) Some architectures, like x86_64, provide direct
support for this operation.

Performance improvements on X86_64:
CheckersEvalBench (32-bit bitboard): 27,210KNS -> 36,798KNS = + 35%
ReversiEvalBench (64-bit bitboard): 52,562KNS -> 89,086KNS = + 69%

Change-Id: I65d549b0469b7909b12c6611cdc34a8640a5751f
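
On x86-64 the operation maps onto one instruction; a sketch (using the compiler builtin as a stand-in for the emitted instruction):

    #include <cstdint>
    int BitCount(uint32_t x) {
      return __builtin_popcount(x);  // compiles to a single 'popcnt' when supported
    }
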
e6d0d8de85f79c8702ee722a04cd89ee7e89aeb7 28-Dec-2015 Andreas Gampe <agampe@google.com> ART: Disable Math.round intrinsics

The move to OpenJDK means that Android has caught up with the
definition change of Math.round. Disable intrinsics.

Bug: 26327751
Change-Id: I00dc6cfca12bd7c95e56a4ab76ffee707d3822dc
391b866ce55b8e78b1f9a6b98321d837256e8d66 18-Dec-2015 Roland Levillain <rpl@google.com> Disable the UnsafeCASObject intrinsic with read barriers.

The current implementations of the UnsafeCASObject
intrinsics are missing a read barrier. Temporarily disable
them when read barriers are enabled.

Also re-enable the jsr166.LinkedTransferQueueTest tests that
were failing on the concurrent collector configuration, as
the UnsafeCASObject JNI implementation now correctly
implements the read barrier which was missing.

Bug: 25883050
Bug: 26205973
Change-Id: Iaf5d515532949662d0ac6702c9452a00aa0a23e6
17077d888a6752a2e5f8161eee1b2c3285783d12 16-Dec-2015 Mark P Mendell <mark.p.mendell@intel.com> Revert "Revert "X86: Use locked add rather than mfence""

This reverts commit 0da3b9117706760e8722029f407da6d0297cc943.

Fix a compilation failure that slipped in somehow.

Change-Id: Ide8681cdc921febb296ea47aa282cc195f154049
0da3b9117706760e8722029f407da6d0297cc943 16-Dec-2015 Aart Bik <ajcbik@google.com> Revert "X86: Use locked add rather than mfence"

This reverts commit 7b3e4f99b25c31048a33a08688557b133ad345ab.

Reason: build error on sdk (linux) in git_mirror-aosp-master-with-vendor, please fix first

art/compiler/optimizing/code_generator_x86_64.cc:4032:7: error: use of
undeclared identifier 'codegen_'
codegen_->MemoryFence();

Change-Id: I91f8542cfd944b7425d1981c35872dcdcb901e18
7b3e4f99b25c31048a33a08688557b133ad345ab 19-Nov-2015 Mark Mendell <mark.p.mendell@intel.com> X86: Use locked add rather than mfence

Java semantics for memory ordering can be satisfied using
lock addl $0,0(SP)
rather than mfence. The locked add synchronizes the memory caches, but
doesn't affect device memory.

Timing on a micro benchmark with a mfence or lock add $0,0(sp) in a loop
with 600000000 iterations:
time ./mfence
real 0m5.411s
user 0m5.408s
sys 0m0.000s

time ./locked_add
real 0m3.552s
user 0m3.550s
sys 0m0.000s

Implement this as an instruction-set-feature lock_add. This is off by
default (uses mfence), and enabled for atom & silvermont variants.
Generation of mfence can be forced by a parameter to MemoryFence.

Change-Id: I5cb4fded61f4cbbd7b7db42a1b6902e43e458911
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
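
A sketch of the two fences as GCC/Clang inline assembly (illustrative, not ART's MemoryFence implementation):

    inline void FenceMfence()  { asm volatile("mfence" ::: "memory"); }
    inline void FenceLockAdd() {
      // Orders normal (cacheable) memory like mfence for Java's needs,
      // but is cheaper; it does not order device memory.
      asm volatile("lock addl $0, (%%rsp)" ::: "memory", "cc");
    }
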
1e7f8db01a929ac816ca122868edc067c3c6cd17 15-Dec-2015 Roland Levillain <rpl@google.com> x86-64 Baker's read barrier fast path implementation.

Introduce an x86-64 fast path implementation in Optimizing
for Baker's read barriers (for both heap reference loads and
GC root loads). The marking phase of the read barrier is
performed by a slow path, invoking the runtime entry point
artReadBarrierMark.

Other read barrier algorithms continue to use the original
slow path based implementation, which has been renamed as
GenerateReadBarrierSlow/GenerateReadBarrierForRootSlow.

Bug: 12687968
Change-Id: I9329293ddca7f9bcb512132bde6675aa202b98b2
40a04bf64e5837fa48aceaffe970c9984c94084a 11-Dec-2015 Scott Wakeling <scott.wakeling@linaro.org> Replace rotate patterns and invokes with HRor IR.

Replace constant and register version bitfield rotate patterns, and
rotateRight/Left intrinsic invokes, with new HRor IR.

Where k is constant and r is a register, with the UShr and Shl on
either side of a |, +, or ^, the following patterns are replaced:

x >>> #k OP x << #(reg_size - k)
x >>> #k OP x << #-k

x >>> r OP x << (#reg_size - r)
x >>> (#reg_size - r) OP x << r

x >>> r OP x << -r
x >>> -r OP x << r

Implemented for ARM/ARM64 & X86/X86_64.

Tests changed to not be inlined to prevent optimization from folding
them out. Additional tests added for constant rotate amounts.

Change-Id: I5847d104c0a0348e5792be6c5072ce5090ca2c34
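
In source form, the constant-amount pattern is the classic rotate idiom; a sketch (assuming 0 < k < 32 so both shifts are well-defined):

    uint32_t RotateRight(uint32_t x, unsigned k) {
      return (x >> k) | (x << (32 - k));  // matched and replaced by a single HRor
    }
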
a4f1220c1518074db18ca1044e9201492975750b 06-Aug-2015 Mark Mendell <mark.p.mendell@intel.com> Optimizing: Add direct calls to math intrinsics

Support the double forms of:
cos, sin, acos, asin, atan, atan2, cbrt, cosh, exp, expm1,
hypot, log, log10, nextAfter, sinh, tan, tanh

Add these entries to the vector addressed off the thread pointer. Call
the libc routines directly, which means that we have to implement the
native ABI, not the ART one. For x86_64, that includes saving XMM12-15
as the native ABI considers them caller-save, while the ART ABI
considers them callee-save. We save them by marking them as used by the
call to the math function. For x86, this is not an issue, as all the XMM
registers are caller-save.

Other architectures will call Java as before until they are ready to
implement the new intrinsics.

Bump the OAT version since we are incompatible with old boot.oat files.

Change-Id: Ic6332c3555c09393a17d1ad4daf62932488722fb
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
bf84a3d2aa29c0975b4ac0f6f983d56724b2cc57 04-Dec-2015 Roland Levillain <rpl@google.com> Annotate Boolean literals more uniformly in Optimizing's intrinsics.

Change-Id: Ida40309b4bc170a18b4e5db552b77f021a7b89df
0d5a281c671444bfa75d63caf1427a8c0e6e1177 13-Nov-2015 Roland Levillain <rpl@google.com> x86/x86-64 read barrier support for concurrent GC in Optimizing.

This first implementation uses slow paths to instrument heap
reference loads and GC root loads for the concurrent copying
collector, respectively calling the artReadBarrierSlow and
artReadBarrierForRootSlow (new) runtime entry points.

Notes:
- This implementation does not instrument HInvokeVirtual
nor HInvokeInterface instructions (for class reference
loads), as the corresponding read barriers are not strictly
required with the current concurrent copying collector.
- Intrinsics which may eventually call (on slow path) are
disabled when read barriers are enabled, as the current
slow path infrastructure does not support this case.
- When read barriers are enabled, the code generated for a
HArraySet instruction always goes into the array set slow
path for object arrays (delegating the operation to the
runtime), as we are lacking a mechanism to keep a
temporary register live across a runtime call (needed for
the instrumentation of type checking code, which requires
two successive read barriers).

Bug: 12687968
Change-Id: I14cd6107233c326389120336f93955b28ffbb329
ea5af68d6dda832bdfb5978a0c5d6f86a3f67e80 22-Oct-2015 Mark Mendell <mark.p.mendell@intel.com> X86-64: Split long/double constant array/field set

A long constant needs to be in a register to store to memory.
By allowing stores of constants that are outside of the range of
int32_t, we reduce register usage.

Also support sets of float/double constants by using integer stores.

Rename RegisterOrInt32LongConstant to RegisterOrInt32Constant as it
now handles any type of constant.

Change-Id: I025d9ef889a5a433e45aa03b376bae40f14197d2
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
b488b7864b7bf9cade82d45c8bdda2372f48a10c 22-Oct-2015 Roland Levillain <rpl@google.com> Fix heap poisoning in UnsafeCASObject x86/x86-64 intrinsic.

Properly handle the case when the same object is passed to
sun.misc.Unsafe.compareAndSwapObject for the `obj` and
`newValue` arguments (named `base` and `value` in the
intrinsic implementation) and re-enable this intrinsic.

Also convert some reinterpret_casts to down_casts.

Bug: 12687968
Change-Id: I82167cfa77840ae2cdb45b9f19f5f530858fe7e8
cfea7d54dc8902d93c3fd535294d6c364f823887 20-Oct-2015 Roland Levillain <rpl@google.com> Disable the x86 & x86-64 UnsafeCASObject intrinsic with heap poisoning.

The current heap poisoning instrumentation of this intrinsic
does not always work properly when heap poisoning is
enabled, hence this quick fix to let the build & test
infrastructure turn green again.

Bug: 12687968
Change-Id: I03702a057fb6f07134e926e2c1c2780f47e3a50a
5bd05a5c9492189ec28edaf6396d6a39ddf03367 13-Oct-2015 Nicolas Geoffray <ngeoffray@google.com> Implement System.arraycopy intrinsic for arm.

Change-Id: I58ae1af5103e281fe59fbe022b718d6d8f293a5e
ee3cf0731d0ef0787bc2947c8e3ca432b513956b 06-Oct-2015 Nicolas Geoffray <ngeoffray@google.com> Intrinsify System.arraycopy.

Currently on x64, will do the other architectures in
different changes.

Change-Id: I15fbbadb450dd21787809759a8b14b21b1e42624
cde4d272fdb1ac4d4eb8a0b58090b375a1fb50b5 18-Sep-2015 Mark Mendell <mark.p.mendell@intel.com> Fix x86_64 round intrinsic duplicate load

When I changed the code to use Load64BitValue, I forgot to delete the
original load instruction(s). Remove them now.

Change-Id: I76aeccf88576507f2fbcf463ae1e503827a20fe2
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
85b62f23fc6dfffe2ddd3ddfa74611666c9ff41d 09-Sep-2015 Andreas Gampe <agampe@google.com> ART: Refactor intrinsics slow-paths

Refactor slow paths so that there is a default implementation for
common cases (only arm64 with vixl is special). Write a generic
intrinsic slow-path that can be reused for the specific architectures.
Move helper functions into CodeGenerator so that they are accessible.

Change-Id: Ibd788dce432601c6a9f7e6f13eab31f28dcb8550
8f8926a5c7ea332ab387c2b3ebc6fd378a5761bc 17-Aug-2015 Mark Mendell <mark.p.mendell@intel.com> Implement StringGetCharsNoCheck intrinsic for X86

Generate inline code for String.GetChars internal no checking form for
X86 and X86_64. Use REP MOVSW to copy the characters, rather than
memcpy as Quick does.

Change-Id: Ia67aff248461b394f97c48053f216880381945ff
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
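
The copy boils down to one string instruction; an assembly sketch (AT&T syntax, registers per the movs convention):

    # With RSI = source chars, RDI = destination chars, RCX = char count:
    rep movsw    # copies RCX 16-bit units from (RSI) to (RDI)
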
2d554795420be0be88bb4600ea81d1ec293217c4 16-Sep-2015 Mark Mendell <mark.p.mendell@intel.com> X86/X86_64: Intrinsics - numberOfTrailingZeros, rotateLeft, rotateRight

Implement {Long,Integer}NumberOfTrailingZeros and
{Long,Integer}Rotate{Left,Right}.

X86 32 bit mode doesn't implement the LongRotate{Left,Right} intrinsics
at this time.

Change-Id: Ie25c1dca15ee2d17fbdf0c15c758bde431034d35
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
9ee23f4273efed8d6378f6ad8e63c65e30a17139 23-Jul-2015 Scott Wakeling <scott.wakeling@linaro.org> ARM/ARM64: Intrinsics - numberOfTrailingZeros, rotateLeft, rotateRight

Change-Id: I2a07c279756ee804fb7c129416bdc4a3962e93ed
bfb5ba90cd6425ce49c2125a87e3b12222cc2601 01-Sep-2015 Andreas Gampe <agampe@google.com> Revert "Revert "Do a second check for testing intrinsic types.""

This reverts commit a14b9fef395b94fa9a32147862c198fe7c22e3d7.

When an intrinsic with invoke-type virtual is recognized, replace
the instruction with a new HInvokeStaticOrDirect.

Minimal update for dex-cache rework. Fix includes.

Change-Id: I1c8e735a2fa7cda4419f76ca0717125ef236d332
4ab02352db4051d590b793f34d166a0b5c633c4a 12-Aug-2015 Serban Constantinescu <serban.constantinescu@linaro.org> Use CodeGenerator::RecordPcInfo instead of SlowPathCode::RecordPcInfo.

Part of a clean-up and refactoring series. SlowPathCode::RecordPcInfo
is currently just a wrapper around CodGenerator::RecordPcInfo.

Change-Id: Iffabef4ef37c365051130bf98a6aa6dc0a0fb254
Signed-off-by: Serban Constantinescu <serban.constantinescu@linaro.org>
0c9497da9485ba688c592e5f452b7b1305a519c0 21-Aug-2015 Mark Mendell <mark.p.mendell@intel.com> X86: Use short forward jumps if possible

The optimizing compiler uses 32 bit relative jumps for all forward
jumps, just in case the offset is too large to fit in one byte. Some of
the generated code knows that the jumps will in fact fit.

Use the 'NearLabel' class for the code generator and intrinsics.

Use the jecxz/jrcxz instruction for string intrinsics.

Unfortunately, conditional jumps to basic blocks don't know enough to
use this, as we don't know how much code will be generated.

This saves a whopping 0.24% for core.oat and boot.oat sizes, but
every little bit helps, and it reduces icache footprint slightly.

Change-Id: I633fe3b2e0e810b4ce12fdad8c02135644b63506
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
d5897678eb555a797d4e84e07814d79f0e0bb465 13-Aug-2015 Mark Mendell <mark.p.mendell@intel.com> Implement CountLeadingZeros for x86

Generate Long and Integer numberOfLeadingZeros for x86 and x86_64. Uses
'bsr' instruction to find the first one bit, and then corrects the
result.

Added some more tests with constant values to test constant folding.
Also add a runtime test with 0 as the input.

Change-Id: I920b21bb00069bccf5f921f8f87a77e334114926
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
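
The correction after 'bsr' is a small arithmetic step; a sketch (the inline asm is illustrative, not ART's generator):

    #include <cstdint>
    int NumberOfLeadingZeros(uint32_t x) {
      if (x == 0) return 32;  // bsr leaves its destination undefined for 0
      uint32_t highest;       // bsr yields the index of the highest set bit
      asm("bsr %1, %0" : "=r"(highest) : "r"(x));
      return 31 - static_cast<int>(highest);
    }
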
f8cfb20cfa00f8987227204211e99486bc38572f 14-Aug-2015 Agi Csaki <agicsaki@google.com> Optimizing String.Equals as an intrinsic (x86_64)

The fourth implementation of String.Equals. I added an intrinsic
in x86_64 which is similar to the original java implementation
of String.equals: an instanceof check, null check,length check,
and reference equality check followed by a loop comparing strings
four characters at a time.

Interesting Benchmarking Values:

Optimizing Compiler on 64-bit Emulator
Intrinsic 1-5 Character Strings: 48 ns
Original 1-5 Character Strings: 56 ns
Intrinsic 1000+ Character Strings: 4009 ns
Original 1000+ Character Strings: 4704 ns
Intrinsic Non-String Argument: 35 ns
Original Non-String Argument: 42 ns

Bug: 21481923
Change-Id: I17d0d2e24a670a898ab1729669d3990403b9a853
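
A sketch of the fast-path shape described above, over raw char arrays (names and the scalar tail loop are ours; the intrinsic operates on String objects directly):

    #include <cstdint>
    #include <cstring>
    bool StringEquals(const uint16_t* a, int32_t a_len,
                      const uint16_t* b, int32_t b_len) {
      if (a == b) return true;           // reference equality
      if (b == nullptr) return false;    // null check (instanceof check elided)
      if (a_len != b_len) return false;  // length check
      int32_t i = 0;
      for (; i + 4 <= a_len; i += 4) {   // four chars (64 bits) per iteration
        uint64_t wa, wb;
        std::memcpy(&wa, a + i, sizeof(wa));
        std::memcpy(&wb, b + i, sizeof(wb));
        if (wa != wb) return false;
      }
      for (; i < a_len; ++i) {           // scalar tail for lengths not a multiple of 4
        if (a[i] != b[i]) return false;
      }
      return true;
    }
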
6bc53a9d884265e0a0b14c4383bef0aa47824e64 01-Jul-2015 Mark Mendell <mark.p.mendell@intel.com> Support X86 intrinsic System.arraycopy char

This is an implementation of arraycopy for X86 and X86_64 versions
of intrinsic System.arraycopy(char[], int, char[], int, int).

The implementations use rep movsw to copy the chars.

Change-Id: Icf9d0efb9986bc3e0794238a74f94fe02f9b42be
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
7da072feb160079734331e994ea52760cb2a3243 13-Aug-2015 agicsaki <agicsaki@google.com> Structure for String.Equals intrinsic

Added structure for implementing String.Equals intrinsics. There is no
functional change at this point- the intrinsic is marked as unimplemented
for all instruction sets and compilers.

Bug: 21481923
Change-Id: Ic2a1e22a113ff6091581126f12e926478c011340
cfa410b0ea561318f74a76c5323f0f6cd8eaaa50 25-May-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] More x86_64 code improvements

Use the constant area some more, use 32-bit immediates in movq
instructions when possible, and other small tweaks.

Remove the commented out code for Math.Abs(float/double) as it would
fail for baseline compiler due to the output being the same as the
input.

Change-Id: Ifa39f1865b94cec2e1c0a99af3066a645e9d3618
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
ce4b1329ca903d6b98734a27a46b54bb9cfd6d5b 31-Jul-2015 Pavel Vyssotski <pavel.n.vyssotski@intel.com> ART: x86_64 RoundDouble/Float intrinsics should initialize out value.

x86_64 RoundDouble intrinsic should initialize output register for the case of
"inPlusPointFive >= maxLong" as expected. The same for the RoundFloat intrinsic.
Fixed also the out register type in CreateSSE41FPToIntLocations provoking
a DCHECK failure.

Signed-off-by: Pavel Vyssotski <pavel.n.vyssotski@intel.com>

(cherry picked from commit 9ca257196b46fd7629bce0b338580e571e4113a8)

Bug: 22973442
Change-Id: If974e79d33311587d0b541a01ca8a4c9c11b9468
9ca257196b46fd7629bce0b338580e571e4113a8 31-Jul-2015 Pavel Vyssotski <pavel.n.vyssotski@intel.com> ART: x86_64 RoundDouble/Float intrinsics should initialize out value.

x86_64 RoundDouble intrinsic should initialize output register for the case of
"inPlusPointFive >= maxLong" as expected. The same for the RoundFloat intrinsic.
Fixed also the out register type in CreateSSE41FPToIntLocations provoking
a DCHECK failure.

Change-Id: I0a910682e2917214861683c678ffba8e0f4bfed8
Signed-off-by: Pavel Vyssotski <pavel.n.vyssotski@intel.com>
611d3395e9efc0ab8dbfa4a197fa022fbd8c7204 10-Jul-2015 Scott Wakeling <scott.wakeling@linaro.org> ARM/ARM64: Implement numberOfLeadingZeros intrinsic.

Change-Id: I4042fb7a0b75140475dcfca23e8f79d310f5333b
aabdf8ad2e8d3de953dff5c7591e7b3df4d4f60b 03-Aug-2015 Roland Levillain <rpl@google.com> Revert "Optimizing String.Equals as an intrinsic (x86)"

Reverted as it breaks the compilation of boot.{oat,art} on x86 (although this CL may not be the culprit, as the issue seems to come from Optimizing's register allocator).

This reverts commit 8ab7bd6c8b10ad58758c33a1dc9326212bd200e9.

Change-Id: If7c8b6258d1e690f4d2a06bcc82c92563ac6cdef
8ab7bd6c8b10ad58758c33a1dc9326212bd200e9 27-Jul-2015 agicsaki <agicsaki@google.com> Optimizing String.Equals as an intrinsic (x86)

The third implementation of String.Equals. I added an intrinsic
in x86 which is similar to the original java implementation of
String.equals: an instanceof check, null check, length check, and
reference equality check followed by a loop comparing strings
character by character.

Interesting Benchmarking Values:

Optimizing Compiler on Nexus Player
Intrinsic 15-30 Character Strings: 177 ns
Original 15-30 Character Strings: 275 ns
Intrinsic Null Argument: 59 ns
Original Null Argument: 137 ns
Intrinsic 100-1000 Character Strings: 1812 ns
Original 100-1000 Character Strings: 6334 ns

Bug: 21481923
Change-Id: Ia386e19b9dbfe0dac688b20ec93d8f90f67af47e
4d02711ea578dbb789abb30cbaf12f9926e13d81 01-Jul-2015 Roland Levillain <rpl@google.com> Implement heap poisoning in ART's Optimizing compiler.

- Instrument ARM, ARM64, x86 and x86-64 code generators.
- Note: To turn heap poisoning on in Optimizing, set the
environment variable `ART_HEAP_POISONING' to "true"
before compiling ART.

Bug: 12687968
Change-Id: Ib3120b38cf805a8a50207a314b9ccc90c8d93740
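
Conceptually, poisoning stores references in a scrambled form so that stray unpoisoned dereferences fault; a sketch assuming the negation scheme of ART's PoisonHeapReference (an assumption on our part, not stated in this log):

    #include <cstdint>
    uint32_t Poison(uint32_t ref)        { return -ref; }       // applied on heap stores
    uint32_t Unpoison(uint32_t poisoned) { return -poisoned; }  // applied on heap loads
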
9931f319cf86c56c2855d800339a3410697633a6 19-Jun-2015 Alexandre Rames <alexandre.rames@linaro.org> Opt compiler: Add a description to slow paths.

Change-Id: I22160d90de3fe0ab3e6a2acc440bda8daa00e0f0
94015b939060f5041d408d48717f22443e55b6ad 04-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Use HCurrentMethod in HInvokeStaticOrDirect.""

Fix was to special case baseline for x86, which does not have enough
registers to allocate the current method.

This reverts commit c345f141f11faad177aa9635a78088d00cf66086.

Change-Id: I5997aa52f8d4df373ae5ff4d4150dac0c44c4c10
c345f141f11faad177aa9635a78088d00cf66086 04-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Use HCurrentMethod in HInvokeStaticOrDirect."

Fails on baseline/x86.

This reverts commit 38207af82afb6f99c687f64b15601ed20d82220a.

Change-Id: Ib71018367eb7c6046965494a7e996c22af3de403
38207af82afb6f99c687f64b15601ed20d82220a 01-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Use HCurrentMethod in HInvokeStaticOrDirect.

Change-Id: I0d15244b6b44c8b10079398c55da5071a3e3af66
3d21bdf8894e780d349c481e5c9e29fe1556051c 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Move mirror::ArtMethod to native

Optimizing + quick tests are passing, devices boot.

TODO: Test and fix bugs in mips64.

Saves 16 bytes for most ArtMethods, a 7.5MB reduction in system PSS.
Some of the savings are from removal of virtual methods and direct
methods object arrays.

Bug: 19264997

(cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33)

Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d

Fix some ArtMethod related bugs

Added root visiting for runtime methods, not currently required
since the GcRoots in these methods are null.

Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes
--trace run-tests 005, 044.

Fixed optimizing compiler bug where we used a normal stack location
instead of double on ARM64, this fixes the debuggable tests.

TODO: Fix JDWP tests.

Bug: 19264997

Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3

ART: Fix casts for 64-bit pointers on 32-bit compiler.

Bug: 19264997
Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457

Fix JDWP tests after ArtMethod change

Fixes Throwable::GetStackDepth for exception event detection after
internal stack trace representation change.

Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of
proxy method.

Bug: 19264997
Change-Id: I363e293796848c3ec491c963813f62d868da44d2

Fix accidental IMT and root marking regression

Was always using the conflict trampoline. Also included fix for
regression in GC time caused by extra roots. Most of the regression
was IMT.

Fixed bug in DumpGcPerformanceInfo where we would get SIGABRT due to
detached thread.

EvaluateAndApplyChanges:
From ~2500 -> ~1980
GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots

Bug: 19264997
Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0

Fix bogus image test assert

Previously we were comparing the size of the non-moving space to
the size of the image file.

Now we properly compare the size of the image space against the size
of the image file.

Bug: 19264997
Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a

[MIPS64] Fix art_quick_invoke_stub argument offsets.

ArtMethod reference's size got bigger, so we need to move other args
and leave enough space for ArtMethod* and 'this' pointer.

This fixes mips64 boot.

Bug: 19264997
Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
e401d146407d61eeb99f8d6176b2ac13c4df1e33 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Move mirror::ArtMethod to native

Optimizing + quick tests are passing, devices boot.

TODO: Test and fix bugs in mips64.

Saves 16 bytes for most ArtMethods, a 7.5MB reduction in system PSS.
Some of the savings are from removal of virtual methods and direct
methods object arrays.

Bug: 19264997
Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d
07276db28d654594e0e86e9e467cad393f752e6e 18-May-2015 Nicolas Geoffray <ngeoffray@google.com> Don't do a null test in MarkGCCard if the value cannot be null.

Change-Id: I45687f6d3505178e2fc3689eac9cb6ab1b2c1e29
21030dd59b1e350f6f43de39e3c4ce0886ff539c 07-May-2015 Andreas Gampe <agampe@google.com> ART: x86 indexOf intrinsics for the optimizing compiler

Add intrinsics implementations for indexOf in the optimizing
compiler. These are mostly ported from Quick. Add instruction
support to assemblers where necessary.

Change-Id: Ife90ed0245532a5c436a26fe84715dc357f353c8
92e83bf8c0b2df8c977ffbc527989631d94b1819 07-May-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Tune some x86_64 moves

Generate Moves of constant FP values by loading from the constant table.

Use 'movl' to load a 64-bit register for positive 32-bit values, saving
a byte in the generated code by taking advantage of the implicit
zero extension.

Change a couple of xorq(reg, reg) to xorl to (potentially) save a byte
of code per xor.

Change-Id: I5b2a807f0d3b29294fd4e7b8ef6d654491fa0b01
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
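
The movl trick relies on x86-64's implicit zero extension of 32-bit register writes; an encoding sketch:

    48 C7 C0 78 56 34 12    movq $0x12345678, %rax   # 7 bytes
    B8 78 56 34 12          movl $0x12345678, %eax   # 5 bytes; the EAX write zero-extends into RAX
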
ec525fc30848189051b888da53ba051bc0878b78 28-Apr-2015 Roland Levillain <rpl@google.com> Factor MoveArguments methods in Optimizing's intrinsics handlers.

Also add a precondition similar to the one present in code
generators, regarding static invoke related explicit clinit
check elimination in non-baseline compilations.

Change-Id: I26f4dcb5d02824d7556f90b4b0c85b08b737fa53
2d27c8e338af7262dbd4aaa66127bb8fa1758b86 28-Apr-2015 Roland Levillain <rpl@google.com> Refactor InvokeDexCallingConventionVisitor in Optimizing.

Change-Id: I7ede0f59d5109644887bf5d39201d4e1bf043f34
3e3d73349a2de81d14e2279f60ffbd9ab3f3ac28 28-Apr-2015 Roland Levillain <rpl@google.com> Have HInvoke instructions know their number of actual arguments.

Add an art::HInvoke::GetNumberOfArguments routine so that
art::HInvoke and its subclasses can return the number of
actual arguments of the called method. Use it in code
generators and intrinsics handlers.

Consequently, no longer remove a clinit check as last input
of a static invoke if it is still present during baseline
code generation, but ensure that static invokes have no such
check as last input in optimized compilations.

Change-Id: Iaf9e07d1057a3b15b83d9638538c02b70211e476
848f70a3d73833fc1bf3032a9ff6812e429661d9 15-Jan-2014 Jeff Hao <jeffhao@google.com> Replace String CharArray with internal uint16_t array.

Summary of high level changes:
- Adds compiler inliner support to identify string init methods
- Adds compiler support (quick & optimizing) with new invoke code path
that calls method off the thread pointer
- Adds thread entrypoints for all string init methods
- Adds map to verifier to log when receiver of string init has been
copied to other registers. used by compiler and interpreter

Change-Id: I797b992a8feb566f9ad73060011ab6f51eb7ce01
4c0eb42259d790fddcd9978b66328dbb3ab65615 24-Apr-2015 Roland Levillain <rpl@google.com> Ensure inlined static calls perform clinit checks in Optimizing.

Calls to static methods have implicit class initialization
(clinit) checks of the method's declaring class in
Optimizing. However, when such a static call is inlined,
the implicit clinit check vanishes, possibly leading to an
incorrect behavior.

To ensure that inlining static methods does not change the
behavior of a program, add explicit class initialization
checks (art::HClinitCheck) as well as load class
instructions (art::HLoadClass) as last input of static
calls (art::HInvokeStaticOrDirect) in Optimizing' control
flow graphs, when the declaring class is reachable and not
known to be already initialized. Then when considering the
inlining of a static method call, proceed only if the method
has no implicit clinit check requirement.

The added explicit clinit checks are already removed by the
art::PrepareForRegisterAllocation visitor. This CL also
extends this visitor to turn explicit clinit checks from
static invokes into implicit ones after the inlining step,
by removing the added art::HLoadClass nodes mentioned
hereinbefore.

Change-Id: I9ba452b8bd09ae1fdd9a3797ef556e3e7e19c651
641547a5f18ca2ea54469cceadcfef64f132e5e0 21-Apr-2015 Calin Juravle <calin@google.com> [optimizing] Fix a bug in moving the null check to the user.

When deciding to move a null check to the user, we did not
verify whether the next instruction checks the same object.

Change-Id: I2f4533a4bb18aa4b0b6d5e419f37dcccd60354d2
40741f394b2737e503f2c08be0ae9dd490fb106b 21-Apr-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Use more X86_64 addressing modes

Allow constant and memory addresses to more X86_64 instructions.

Add memory formats to X86_64 instructions to match.

Fix a bug in cmpq(CpuRegister, const Address&).

Allow mov <addr>,immediate (instruction 0xC7) to be a valid faulting
instruction.

Change-Id: I5b8a409444426633920cd08e09f687a7afc88a39
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
9021825d1e73998b99c81e89c73796f6f2845471 15-Apr-2015 Nicolas Geoffray <ngeoffray@google.com> Type MoveOperands.

The ParallelMoveResolver implementation needs to know if a move
is for 64bits or not, to handle swaps correctly.

Bug found, and test case courtesy of Serguei I. Katkov.

Change-Id: I9a0917a1cfed398c07e57ad6251aea8c9b0b8506
39dcf55a56da746e04f477f89e7b00ba1de03880 10-Apr-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Address x86_64 RIP patch comments

Nicolas had some comments after the patch
https://android-review.googlesource.com/#/c/144100 had merged. Fix the
problems that he found.

Change-Id: I40e8a4273997860db7511dc8f1986281b72bead2
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
f55c3e0825cdfc4c5a27730031177d1a0198ec5a 27-Mar-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Add RIP support for x86_64

Support a constant area addressed using RIP on x86_64. Use it for FP
operations to avoid loading constants into a CPU register and moving
to a XMM register.

Change-Id: I58421759ef2a8475538876c20e696ec787015a72
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
58d25fd052e999a24734b0cf856a1563e3d1b2d0 03-Apr-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Implement more x86/x86_64 intrinsics

Implement CAS and bit reverse and byte reverse intrinsics that were
missing from x86 and x86_64 implementations.

Add assembler tests and compareAndSwapLong test.

Change-Id: Iabb2ff46036645df0a91f640288ef06090a64ee3
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
fb8d279bc011b31d0765dc7ca59afea324fd0d0c 01-Apr-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Implement x86/x86_64 math intrinsics

Implement floor/ceil/round/RoundFloat on x86 and x86_64.
Implement RoundDouble on x86_64.

Add support for roundss and roundsd on both architectures. Support them
in the disassembler as well.

Add the instruction set features for x86, as the 'round' instruction is
only supported if SSE4.1 is supported.

Fix the tests to handle the addition of passing the instruction set
features to x86 and x86_64.

Add assembler tests for roundsd and roundss to x86_64 assembler tests.

Change-Id: I9742d5930befb0bbc23f3d6c83ce0183ed9fe04f
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
3e90a96f403cbc353731e6687fe12a088f996cee 27-Mar-2015 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> [optimizing] Do not inline intrinsics

The intrinsics generally have specialized code and the code for them
may be faster than what can be achieved with inlining. Thus inliner
should skip intrinsics.

At the same time, easy methods are not worth intrinsifying, e.g.
String.length and isEmpty. Those can be handled by the inliner with
no problem and can actually lead to better code, since the call is
not kept around through all of the optimizations.

Change-Id: Iab38e6c33f79efa54d845d4871cf26fa9b235ab0
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
512e04d1ea7fb33e3992715fe55be8a834d4a79c 27-Mar-2015 Nicolas Geoffray <ngeoffray@google.com> Fix typos spotted by Andreas.

Change-Id: I564b4bc5995d91f4c6c4e4f2427ed7c279cb8740
d75948ac93a4a317feaf136cae78823071234ba5 27-Mar-2015 Nicolas Geoffray <ngeoffray@google.com> Intrinsify String.compareTo.

Change-Id: Ia540df98755ac493fe61bd63f0bd94f6d97fbb57
a8ac9130b872c080299afacf5dcaab513d13ea87 13-Mar-2015 Nicolas Geoffray <ngeoffray@google.com> Refactor code in preparation of correct stack maps in slow path.

Move the logic of saving/restoring live registers in slow path
in the SlowPathCode method. Also add a RecordPcInfo helper to
SlowPathCode, that will act as the placeholder of saving correct
stack maps.

Change-Id: I25c2bc7a642ef854bbc8a3eb570e5c8c8d2d030c
878d58cbaf6b17a9e3dcab790754527f3ebc69e5 16-Jan-2015 Andreas Gampe <agampe@google.com> ART: Arm64 optimizing compiler intrinsics

Implement most intrinsics for the optimizing compiler for Arm64.

Change-Id: Idb459be09f0524cb9aeab7a5c7fccb1c6b65a707
42d1f5f006c8bdbcbf855c53036cd50f9c69753e 16-Jan-2015 Nicolas Geoffray <ngeoffray@google.com> Do not use register pair in a parallel move.

The ParallelMoveResolver does not work with pairs. Instead,
decompose the pair into two individual moves.

Change-Id: Ie9d3f0b078cef8dc20640c98b20bb20cc4971a7f
71fb52fee246b7d511f520febbd73dc7a9bbca79 30-Dec-2014 Andreas Gampe <agampe@google.com> ART: Optimizing compiler intrinsics

Add intrinsics infrastructure to the optimizing compiler.

Add almost all intrinsics supported by Quick to the x86-64 backend.
Further intrinsics require more assembler support.

Change-Id: I48de9b44c82886bb298d16e74e12a9506b8e8807