History log of /art/compiler/optimizing/intrinsics_x86.cc
Revision Date Author Comments
fa3912edfac60a9f0a9b95a5862c7361b403fcc2 01-Apr-2016 Roland Levillain <rpl@google.com> Fix BitCount intrinsics assertions.

Bug: 27852035
Change-Id: Iba43039aadd9ba288b476d53cc2306a58356465f
f969a209c30e3af636342d2fb7851d82a2529bf7 09-Mar-2016 Roland Levillain <rpl@google.com> Fix and enable java.lang.StringFactory intrinsics.

The following intrinsics were not considered by the
intrinsics recognizer:
- StringNewStringFromBytes
- StringNewStringFromChars
- StringNewStringFromString
This CL enables them and adds tests for them.

This CL also:
- Fixes the locations of the ARM64 & MIPS64
StringNewStringFromString intrinsics.
- Fixes the definitions of the FOUR_ARG_DOWNCALL macros on
ARM and x86, which are used to implement the
art_quick_alloc_string_from_bytes* runtime entry points.
- Fixes PC info (stack maps) recording in the
StringNewStringFromBytes, StringNewStringFromChars and
StringNewStringFromString ARM, ARM64 & MIPS64 intrinsics.

Bug: 27425743
Change-Id: I38c00d3f0b2e6b64f7d3fe9146743493bef9e45c
1193259cb37c9763a111825aa04718a409d07145 08-Mar-2016 Aart Bik <ajcbik@google.com> Implement the 1.8 unsafe memory fences directly in HIR.

Rationale:
More efficient since it exposes full semantics to
all operations on the graph and allows for proper
code generation for all architectures.
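
A minimal Java sketch (illustrative, not part of the original commit
message) of the 1.8 Unsafe fence calls whose semantics this change
expresses directly in HIR; the reflective Unsafe lookup is the standard
idiom, shown only for self-containedness:

  import java.lang.reflect.Field;
  import sun.misc.Unsafe;

  class FenceSketch {
      static final Unsafe U;
      static {
          try {
              Field f = Unsafe.class.getDeclaredField("theUnsafe");
              f.setAccessible(true);
              U = (Unsafe) f.get(null);
          } catch (ReflectiveOperationException e) {
              throw new ExceptionInInitializerError(e);
          }
      }

      static int data;
      static boolean ready;

      static void publish(int v) {
          data = v;
          U.storeFence(); // now lowered to an HIR memory-barrier node
          ready = true;
      }
  }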

Bug: 26264765

Change-Id: Ic435886cf0645927a101a8502f0623fa573989ff
0e54c0160c84894696c05af6cad9eae3690f9496 04-Mar-2016 Aart Bik <ajcbik@google.com> Unsafe: Recognize intrinsics for 1.8 java.util.concurrent
With unit test.

Rationale:
Recognizing the 1.8 methods as intrinsics is the first step
towards providing efficient implementation on all architectures.
Where not implemented (everywhere for now), the methods fall back
to the JNI native or reference implementation.
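
An illustrative java.util.concurrent caller (not from this CL) that
bottoms out in one of the newly recognized 1.8 Unsafe methods:

  import java.util.concurrent.atomic.AtomicInteger;

  class Counter {
      private final AtomicInteger hits = new AtomicInteger();

      int record() {
          // Delegates to Unsafe.getAndAddInt, one of the recognized methods.
          return hits.getAndIncrement();
      }
  }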

NOTE: needs iam's CL first!

Bug: 26264765

Change-Id: Ife65e81689821a16cbcdd2bb2d35641c6de6aeb6
2f9fcc999fab4ba6cd86c30e664325b47b9618e5 02-Mar-2016 Aart Bik <ajcbik@google.com> Simplified intrinsic macro mechanism.

Rationale:
Reduces boiler-plate code in all intrinsics code generators.
Also, the newly introduced "unreachable" macro statically verifies
that we do not have unreachable, and thus redundant, code in the
generators. In fact, this change
exposes that the MIPS32 and MIPS64 rotation intrinsics
(IntegerRotateRight, LongRotateRight, IntegerRotateLeft,
LongRotateLeft) are unreachable, since they are handled
as HIR constructs for all architectures. Thus the code
can be removed.

Change-Id: I0309799a0db580232137ded72bb8a7bbd45440a8
cc3839c15555a2751e13980638fc40e4d3da633e 29-Feb-2016 Roland Levillain <rpl@google.com> Improve documentation about StringFactory.newStringFromChars.

Make it clear that the native method requires its third
argument to be non-null, and therefore that the intrinsics
do not need a null check for it.

Bug: 27378573
Change-Id: Id2f78ceb0f7674f1066bc3f216b738358ca25542
2a6aad9d388bd29bff04aeec3eb9429d436d1873 25-Feb-2016 Aart Bik <ajcbik@google.com> Implement fp to bits methods as intrinsics.

Rationale:
Better optimization, better performance.

Results on libcore benchmark:

Most of the gain comes from moving the invariant call out of the loop
once we detect everything is a side-effect-free intrinsic.
But the generated code in the general case is much cleaner too.
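
An illustrative loop of the kind described (not from this CL): once
Float.floatToIntBits is known to be a side-effect-free intrinsic, the
invariant call below can be hoisted out of the loop:

  class BitsSketch {
      static int countMatches(float key, int[] bits) {
          int hits = 0;
          for (int i = 0; i < bits.length; i++) {
              // Loop-invariant once recognized as a pure intrinsic.
              if (bits[i] == Float.floatToIntBits(key)) {
                  hits++;
              }
          }
          return hits;
      }
  }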

Before:
timeFloatToIntBits() in 181 ms.
timeFloatToRawIntBits() in 35 ms.
timeDoubleToLongBits() in 208 ms.
timeDoubleToRawLongBits() in 35 ms.

After:
timeFloatToIntBits() in 36 ms.
timeFloatToRawIntBits() in 35 ms.
timeDoubleToLongBits() in 35 ms.
timeDoubleToRawLongBits() in 34 ms.

Bug: 11548336

Change-Id: I6e001bd3708e800bd75a82b8950fb3a0fc01766e
75a38b24801bd4d27c95acef969930f626dd11da 17-Feb-2016 Aart Bik <ajcbik@google.com> Implement isNaN intrinsic through HIR equivalent.

Rationale:
Efficient implementation on all platforms.
Subject to better compiler optimizations.

Change-Id: Ie8876bf5943cbe1138491a25d32ee9fee554043c
9779307ce8f2dd40c429abb0f0cafc1415f70648 16-Feb-2016 Nicolas Geoffray <ngeoffray@google.com> HInvokeStaticOrDirect may not have a special input.

For irreducible loops, we disable the generation of
HX86ComputeBaseMethodAddress, so intrinsics code should not assume
it's there.

bug:27149923

Change-Id: I78ba0ca7aefa4033227c77ba438b6eaca53dadd9
a19616e3363276e7f2c471eb2839fb16f1d43f27 02-Feb-2016 Aart Bik <ajcbik@google.com> Implemented compare/signum intrinsics as HCompare
(with all code generation, for all architectures)

Rationale:
At HIR level, many more optimizations are possible, while ultimately
generated code can take advantage of full semantics.

Change-Id: I6e2ee0311784e5e336847346f7f3c4faef4fd17e
2f10a5fb8c236a6786928f0323bd312c3ee9a4cc 25-Jan-2016 Mark P Mendell <mark.p.mendell@intel.com> Revert "Revert "X86: Use the constant area for more operations.""

This reverts commit cf8d1bb97e193e02b430d707d3b669565fababb4.

Handle the case of an intrinsic where CurrentMethod is still an input.
This will be the case when there are unresolved classes in the
hierarchy.

Add a test case to confirm that we don't crash when handling Math.abs,
which wants to add a pointer to the constant area for the bitmask to be
used to remove the sign bit.

Enhance 565-checker-condition-liveness to check for the case of deeply
nested EmitAtUseSite chains.

Change-Id: I022e8b96a32f5bf464331d0c318c56b9d0ac3c9a
59c9454b92c2096a30a2bbdffb64edf33dbdd916 25-Jan-2016 Aart Bik <ajcbik@google.com> Recognize common utilities as intrinsics.

Rationale:
Recognizing these method calls as intrinsics already has
major advantages (compiler knows about no-side-effects/no-throw
properties). Next step is, of course, to implement these
with native instructions on each architecture.

Change-Id: I06fd12973238caec00d67b31b195d7f8807a538e
cf8d1bb97e193e02b430d707d3b669565fababb4 25-Jan-2016 Nicolas Geoffray <ngeoffray@google.com> Revert "X86: Use the constant area for more operations."

Hits a DCHECK:

dex2oatd F 19461 20411 art/compiler/optimizing/pc_relative_fixups_x86.cc:196] Check failed: !invoke_static_or_direct->HasCurrentMethodInput()


This reverts commit dc00454f0b9a134f01f79b419200f4044c2af5c6.

Change-Id: Idfcacf12eb9e1dd7e68d95e880fda0f76f90e9ed
dc00454f0b9a134f01f79b419200f4044c2af5c6 30-Oct-2015 Mark Mendell <mark.p.mendell@intel.com> X86: Use the constant area for more operations.

Allow FP HNeg to use the constant area to hold the constant to flip the
sign bit.

Enhance some math intrinsics to allow the use of the constant
area: Abs{Float,Double}, {Min,Max}{FloatFloat,DoubleDouble}.

Allow compares of floats/doubles to constants using the constant area.

These eliminate almost all uses of loading constants from the stack.

Change-Id: Ic4b831565825cbe9f0801b1b53c1013be7c87ae4
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
2a946077daf5bfcaf613da49bed58bb0aba435bf 21-Jan-2016 Aart Bik <ajcbik@google.com> Allow x86 memory operands for 64-bit popcnt on x86.

At Mark's request.
This allows:

popcnt edx, [esp + 16]
popcnt ebx, [esp + 20]
add ebx, edx
.....

instead of the more elaborate

mov edx, [ecx + 16]
mov ebx, [ecx + 20]
popcnt eax, edx
popcnt ebp, ebx
add ebp, eax

Change-Id: Iea30ad7b9a1aba8591e3dbb72e24ef12e81cc2ce
c39dac148cce137ffd78a8e43499fba10c5c79e0 21-Jan-2016 Aart Bik <ajcbik@google.com> Support for x86 popcnt.

Change-Id: I0fc4e745764f1749a6437a199a594f3d8ea53eef
3f67e692860d281858485d48a4f1f81b907f1444 15-Jan-2016 Aart Bik <ajcbik@google.com> Implemented BitCount as an intrinsic. With unit test.

Rationale:
Recognizing this important operation as an intrinsic has
various advantages:
(1) having the no-side-effects/no-throw allows for
much more GVN/LICM/BCE.
(2) Some architectures, like x86_64, provide direct
support for this operation.
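
An illustrative bitboard-style use (not from this CL) of the methods
this change intrinsifies; where the ISA supports it, each call can
become a single population-count instruction:

  class EvalSketch {
      static int material(long whitePieces, long blackPieces) {
          return Long.bitCount(whitePieces) - Long.bitCount(blackPieces);
      }
  }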

Performance improvements on X86_64:
CheckersEvalBench (32-bit bitboard): 27,210KNS -> 36,798KNS = + 35%
ReversiEvalBench (64-bit bitboard): 52,562KNS -> 89,086KNS = + 69%

Change-Id: I65d549b0469b7909b12c6611cdc34a8640a5751f
e6d0d8de85f79c8702ee722a04cd89ee7e89aeb7 28-Dec-2015 Andreas Gampe <agampe@google.com> ART: Disable Math.round intrinsics

The move to OpenJDK means that Android has caught up with the
definition change of Math.round. Disable intrinsics.

Bug: 26327751
Change-Id: I00dc6cfca12bd7c95e56a4ab76ffee707d3822dc
391b866ce55b8e78b1f9a6b98321d837256e8d66 18-Dec-2015 Roland Levillain <rpl@google.com> Disable the UnsafeCASObject intrinsic with read barriers.

The current implementations of the UnsafeCASObject
intrinsics are missing a read barrier. Temporarily disable
them when read barriers are enabled.

Also re-enable the jsr166.LinkedTransferQueueTest tests that
were failing on the concurrent collector configuration, as
the UnsafeCASObject JNI implementation now correctly
implements the read barrier which was missing.

Bug: 25883050
Bug: 26205973
Change-Id: Iaf5d515532949662d0ac6702c9452a00aa0a23e6
17077d888a6752a2e5f8161eee1b2c3285783d12 16-Dec-2015 Mark P Mendell <mark.p.mendell@intel.com> Revert "Revert "X86: Use locked add rather than mfence""

This reverts commit 0da3b9117706760e8722029f407da6d0297cc943.

Fix a compilation failure that slipped in somehow.

Change-Id: Ide8681cdc921febb296ea47aa282cc195f154049
0da3b9117706760e8722029f407da6d0297cc943 16-Dec-2015 Aart Bik <ajcbik@google.com> Revert "X86: Use locked add rather than mfence"

This reverts commit 7b3e4f99b25c31048a33a08688557b133ad345ab.

Reason: build error on sdk (linux) in git_mirror-aosp-master-with-vendor; please fix first:

art/compiler/optimizing/code_generator_x86_64.cc:4032:7: error: use of
undeclared identifier 'codegen_'
codegen_->MemoryFence();

Change-Id: I91f8542cfd944b7425d1981c35872dcdcb901e18
7b3e4f99b25c31048a33a08688557b133ad345ab 19-Nov-2015 Mark Mendell <mark.p.mendell@intel.com> X86: Use locked add rather than mfence

Java semantics for memory ordering can be satisfied using
lock addl $0,0(SP)
rather than mfence. The locked add synchronizes the memory caches, but
doesn't affect device memory.
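
An illustrative Java source (not from this CL) of where such a fence is
emitted: a volatile store needs a trailing StoreLoad barrier on x86,
which this change implements as the locked add on atom/silvermont:

  class Flag {
      private volatile boolean done;

      void finish() {
          done = true; // StoreLoad barrier emitted after this store
      }
  }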

Timing on a microbenchmark with an mfence or lock add $0,0(sp) in a loop
with 600000000 iterations:
time ./mfence
real 0m5.411s
user 0m5.408s
sys 0m0.000s

time ./locked_add
real 0m3.552s
user 0m3.550s
sys 0m0.000s

Implement this as an instruction-set-feature lock_add. This is off by
default (uses mfence), and enabled for atom & silvermont variants.
Generation of mfence can be forced by a parameter to MemoryFence.

Change-Id: I5cb4fded61f4cbbd7b7db42a1b6902e43e458911
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
7c1559a06041c9c299d5ab514d54b2102f204a84 15-Dec-2015 Roland Levillain <rpl@google.com> x86 Baker's read barrier fast path implementation.

Introduce an x86 fast path implementation in Optimizing for
Baker's read barriers (for both heap reference loads and GC
root loads). The marking phase of the read barrier is
performed by a slow path, invoking a new runtime entry point
(artReadBarrierMark).

Other read barrier algorithms continue to use the original
slow path based implementation, which has been renamed as
GenerateReadBarrierSlow/GenerateReadBarrierForRootSlow.

Bug: 12687968
Change-Id: Ie610c4befc19ff22378a8cba38b422dcacb54320
40a04bf64e5837fa48aceaffe970c9984c94084a 11-Dec-2015 Scott Wakeling <scott.wakeling@linaro.org> Replace rotate patterns and invokes with HRor IR.

Replace constant and register version bitfield rotate patterns, and
rotateRight/Left intrinsic invokes, with new HRor IR.

Where k is constant and r is a register, with the UShr and Shl on
either side of a |, +, or ^, the following patterns are replaced:

x >>> #k OP x << #(reg_size - k)
x >>> #k OP x << #-k

x >>> r OP x << (#reg_size - r)
x >>> (#reg_size - r) OP x << r

x >>> r OP x << -r
x >>> -r OP x << r

Implemented for ARM/ARM64 & X86/X86_64.
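
Two illustrative Java forms (not from this CL) that now become a single
HRor and compile down to a rotate instruction:

  class RotateSketch {
      static int rotl(int x, int k) {
          return (x << k) | (x >>> (32 - k)); // matched shift/or pattern
      }

      static int rotr(int x, int k) {
          return Integer.rotateRight(x, k);   // intrinsic invoke, also replaced
      }
  }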

Tests changed to not be inlined to prevent optimization from folding
them out. Additional tests added for constant rotate amounts.

Change-Id: I5847d104c0a0348e5792be6c5072ce5090ca2c34
a4f1220c1518074db18ca1044e9201492975750b 06-Aug-2015 Mark Mendell <mark.p.mendell@intel.com> Optimizing: Add direct calls to math intrinsics

Support the double forms of:
cos, sin, acos, asin, atan, atan2, cbrt, cosh, exp, expm1,
hypot, log, log10, nextAfter, sinh, tan, tanh

Add these entries to the vector addressed off the thread pointer. Call
the libc routines directly, which means that we have to implement the
native ABI, not the ART one. For x86_64, that includes saving XMM12-15
as the native ABI considers them caller-save, while the ART ABI
considers them callee-save. We save them by marking them as used by the
call to the math function. For x86, this is not an issue, as all the XMM
registers are caller-save.
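
An illustrative caller (not from this CL); with this change, the calls
below go directly to the libm routines through the thread-pointer
vector instead of calling back into Java:

  class MathSketch {
      static double polar(double x, double y) {
          return Math.atan2(y, x) + Math.hypot(x, y);
      }
  }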

Other architectures will call Java as before until they are ready to
implement the new intrinsics.

Bump the OAT version since we are incompatible with old boot.oat files.

Change-Id: Ic6332c3555c09393a17d1ad4daf62932488722fb
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
bf84a3d2aa29c0975b4ac0f6f983d56724b2cc57 04-Dec-2015 Roland Levillain <rpl@google.com> Annotate Boolean literals more uniformly in Optimizing's intrinsics.

Change-Id: Ida40309b4bc170a18b4e5db552b77f021a7b89df
b9f811968606ca73c11a476db46ab04f25efd403 03-Dec-2015 Roland Levillain <rpl@google.com> Fix art::x86::IntrinsicLocationsBuilderX86::VisitUnsafeGetLong.

Change-Id: I7514a882ae73db53178f9ec00191619b871b77a6
0d5a281c671444bfa75d63caf1427a8c0e6e1177 13-Nov-2015 Roland Levillain <rpl@google.com> x86/x86-64 read barrier support for concurrent GC in Optimizing.

This first implementation uses slow paths to instrument heap
reference loads and GC root loads for the concurrent copying
collector, respectively calling the artReadBarrierSlow and
artReadBarrierForRootSlow (new) runtime entry points.

Notes:
- This implementation does not instrument HInvokeVirtual
nor HInvokeInterface instructions (for class reference
loads), as the corresponding read barriers are not strictly
required with the current concurrent copying collector.
- Intrinsics which may eventually call (on slow path) are
disabled when read barriers are enabled, as the current
slow path infrastructure does not support this case.
- When read barriers are enabled, the code generated for a
HArraySet instruction always goes into the array set slow
path for object arrays (delegating the operation to the
runtime), as we are lacking a mechanism to keep a
temporary register live across a runtime call (needed for
the instrumentation of type checking code, which requires
two successive read barriers).

Bug: 12687968
Change-Id: I14cd6107233c326389120336f93955b28ffbb329
b488b7864b7bf9cade82d45c8bdda2372f48a10c 22-Oct-2015 Roland Levillain <rpl@google.com> Fix heap poisoning in UnsafeCASObject x86/x86-64 intrinsic.

Properly handle the case when the same object is passed to
sun.misc.Unsafe.compareAndSwapObject for the `obj` and
`newValue` arguments (named `base` and `value` in the
intrinsic implementation) and re-enable this intrinsic.

Also convert some reinterpret_casts to down_casts.

Bug: 12687968
Change-Id: I82167cfa77840ae2cdb45b9f19f5f530858fe7e8
cfea7d54dc8902d93c3fd535294d6c364f823887 20-Oct-2015 Roland Levillain <rpl@google.com> Disable the x86 & x86-64 UnsafeCASObject intrinsic with heap poisoning.

The current heap poisoning instrumentation of this intrinsic
does not always work properly when heap poisoning is
enabled, hence this quick fix to let the build & test
infrastructure turn green again.

Bug: 12687968
Change-Id: I03702a057fb6f07134e926e2c1c2780f47e3a50a
ee3cf0731d0ef0787bc2947c8e3ca432b513956b 06-Oct-2015 Nicolas Geoffray <ngeoffray@google.com> Intrinsify System.arraycopy.

Currently only on x64; the other architectures will be done in
separate changes.

Change-Id: I15fbbadb450dd21787809759a8b14b21b1e42624
a83a54d7f2322060f08480f8aabac5eb07268912 02-Oct-2015 Nicolas Geoffray <ngeoffray@google.com> Add support for intrinsic optimizations.

Change-Id: Ib5a4224022f9360e60c09a19ac8642270a7f3b64
85b62f23fc6dfffe2ddd3ddfa74611666c9ff41d 09-Sep-2015 Andreas Gampe <agampe@google.com> ART: Refactor intrinsics slow-paths

Refactor slow paths so that there is a default implementation for
common cases (only arm64 with vixl is special). Write a generic
intrinsic slow-path that can be reused for the specific architectures.
Move helper functions into CodeGenerator so that they are accessible.

Change-Id: Ibd788dce432601c6a9f7e6f13eab31f28dcb8550
8f8926a5c7ea332ab387c2b3ebc6fd378a5761bc 17-Aug-2015 Mark Mendell <mark.p.mendell@intel.com> Implement StringGetCharsNoCheck intrinsic for X86

Generate inline code for the internal no-check form of String.getChars
for X86 and X86_64. Use REP MOVSW to copy the characters, rather than
memcpy as Quick does.
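
An illustrative public-API entry point (not from this CL) that reaches
the internal no-check getChars path this intrinsic covers:

  class CharsSketch {
      static char[] copyChars(String s) {
          char[] out = new char[s.length()];
          s.getChars(0, s.length(), out, 0); // bulk copy done inline
          return out;
      }
  }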

Change-Id: Ia67aff248461b394f97c48053f216880381945ff
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
2d554795420be0be88bb4600ea81d1ec293217c4 16-Sep-2015 Mark Mendell <mark.p.mendell@intel.com> X86/X86_64: Intrinsics - numberOfTrailingZeros, rotateLeft, rotateRight

Implement {Long,Integer}NumberOfTrailingZeros and
{Long,Integer}Rotate{Left,Right}.

X86 32 bit mode doesn't implement the LongRotate{Left,Right} intrinsics
at this time.
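
An illustrative use (not from this CL) of one of the newly intrinsified
methods:

  class TzcntSketch {
      static int lowestSetBitIndex(int mask) {
          return Integer.numberOfTrailingZeros(mask); // 32 when mask == 0
      }
  }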

Change-Id: Ie25c1dca15ee2d17fbdf0c15c758bde431034d35
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
9ee23f4273efed8d6378f6ad8e63c65e30a17139 23-Jul-2015 Scott Wakeling <scott.wakeling@linaro.org> ARM/ARM64: Intrinsics - numberOfTrailingZeros, rotateLeft, rotateRight

Change-Id: I2a07c279756ee804fb7c129416bdc4a3962e93ed
bfb5ba90cd6425ce49c2125a87e3b12222cc2601 01-Sep-2015 Andreas Gampe <agampe@google.com> Revert "Revert "Do a second check for testing intrinsic types.""

This reverts commit a14b9fef395b94fa9a32147862c198fe7c22e3d7.

When an intrinsic with invoke-type virtual is recognized, replace
the instruction with a new HInvokeStaticOrDirect.

Minimal update for dex-cache rework. Fix includes.

Change-Id: I1c8e735a2fa7cda4419f76ca0717125ef236d332
4ab02352db4051d590b793f34d166a0b5c633c4a 12-Aug-2015 Serban Constantinescu <serban.constantinescu@linaro.org> Use CodeGenerator::RecordPcInfo instead of SlowPathCode::RecordPcInfo.

Part of a clean-up and refactoring series. SlowPathCode::RecordPcInfo
is currently just a wrapper around CodeGenerator::RecordPcInfo.

Change-Id: Iffabef4ef37c365051130bf98a6aa6dc0a0fb254
Signed-off-by: Serban Constantinescu <serban.constantinescu@linaro.org>
0c9497da9485ba688c592e5f452b7b1305a519c0 21-Aug-2015 Mark Mendell <mark.p.mendell@intel.com> X86: Use short forward jumps if possible

The optimizing compiler uses 32 bit relative jumps for all forward
jumps, just in case the offset is too large to fit in one byte. Some of
the generated code knows that the jumps will in fact fit.

Use the 'NearLabel' class for the code generator and intrinsics.

Use the jecxz/jrcxz instructions for string intrinsics.

Unfortunately, conditional jumps to basic blocks don't know enough to
use this, as we don't know how much code will be generated.

This saves a whopping 0.24% for core.oat and boot.oat sizes, but
every little bit helps, and it reduces icache footprint slightly.

Change-Id: I633fe3b2e0e810b4ce12fdad8c02135644b63506
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
d5897678eb555a797d4e84e07814d79f0e0bb465 13-Aug-2015 Mark Mendell <mark.p.mendell@intel.com> Implement CountLeadingZeros for x86

Generate Long and Integer numberOfLeadingZeros for x86 and x86_64. Uses
the 'bsr' instruction to find the highest set bit, then corrects the
result.
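
An illustrative use (not from this CL), including the zero input that
the bsr-based code must special-case (bsr leaves its result undefined
for a zero source):

  class ClzSketch {
      static void demo() {
          System.out.println(Integer.numberOfLeadingZeros(1));  // 31
          System.out.println(Long.numberOfLeadingZeros(0L));    // 64, special-cased
      }
  }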

Added some more tests with constant values to test constant folding.
Also added a runtime test with 0 as the input.

Change-Id: I920b21bb00069bccf5f921f8f87a77e334114926
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
6bc53a9d884265e0a0b14c4383bef0aa47824e64 01-Jul-2015 Mark Mendell <mark.p.mendell@intel.com> Support X86 intrinsic System.arraycopy char

This is an implementation of the X86 and X86_64 versions of the
intrinsic System.arraycopy(char[], int, char[], int, int).

The implementations use rep movsw to copy the chars.

Change-Id: Icf9d0efb9986bc3e0794238a74f94fe02f9b42be
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
d7138c813ad72a824fff19f8b10f3fb61f4f43cf 14-Aug-2015 Agi Csaki <agicsaki@google.com> Revert "Revert "Optimizing String.Equals as an intrinsic (x86)""

This reverts commit aabdf8ad2e8d3de953d.
The third implementation of String.Equals. I added an intrinsic
in x86 which is similar to the original java implementation of
String.equals: an instanceof check, null check, length check, and
reference equality check followed by a loop comparing strings
character by character.

Interesting Benchmarking Values:

Optimizing Compiler on Nexus Player
Intrinsic 15-30 Character Strings: 177 ns
Original 15-30 Character Strings: 275 ns
Intrinsic Null Argument: 59 ns
Original Null Argument: 137 ns
Intrinsic 100-1000 Character Strings: 1812 ns
Original 100-1000 Character Strings: 6334 ns

Bug: 21481923
Change-Id: I93fa603c4bd22639143d29d0bfc7e773846f21d3
7da072feb160079734331e994ea52760cb2a3243 13-Aug-2015 agicsaki <agicsaki@google.com> Structure for String.Equals intrinsic

Added structure for implementing String.Equals intrinsics. There is no
functional change at this point; the intrinsic is marked as unimplemented
for all instruction sets and compilers.

Bug: 21481923
Change-Id: Ic2a1e22a113ff6091581126f12e926478c011340
611d3395e9efc0ab8dbfa4a197fa022fbd8c7204 10-Jul-2015 Scott Wakeling <scott.wakeling@linaro.org> ARM/ARM64: Implement numberOfLeadingZeros intrinsic.

Change-Id: I4042fb7a0b75140475dcfca23e8f79d310f5333b
aabdf8ad2e8d3de953dff5c7591e7b3df4d4f60b 03-Aug-2015 Roland Levillain <rpl@google.com> Revert "Optimizing String.Equals as an intrinsic (x86)"

Reverted as it breaks the compilation of boot.{oat,art} on x86 (although this CL may not be the culprit, as the issue seems to come from Optimizing's register allocator).

This reverts commit 8ab7bd6c8b10ad58758c33a1dc9326212bd200e9.

Change-Id: If7c8b6258d1e690f4d2a06bcc82c92563ac6cdef
8ab7bd6c8b10ad58758c33a1dc9326212bd200e9 27-Jul-2015 agicsaki <agicsaki@google.com> Optimizing String.Equals as an intrinsic (x86)

The third implementation of String.Equals. I added an intrinsic
in x86 which is similar to the original java implementation of
String.equals: an instanceof check, null check, length check, and
reference equality check followed by a loop comparing strings
character by character.

Interesting Benchmarking Values:

Optimizing Compiler on Nexus Player
Intrinsic 15-30 Character Strings: 177 ns
Original 15-30 Character Strings: 275 ns
Intrinsic Null Argument: 59 ns
Original Null Argument: 137 ns
Intrinsic 100-1000 Character Strings: 1812 ns
Original 100-1000 Character Strings: 6334 ns

Bug: 21481923
Change-Id: Ia386e19b9dbfe0dac688b20ec93d8f90f67af47e
4d02711ea578dbb789abb30cbaf12f9926e13d81 01-Jul-2015 Roland Levillain <rpl@google.com> Implement heap poisoning in ART's Optimizing compiler.

- Instrument ARM, ARM64, x86 and x86-64 code generators.
- Note: To turn heap poisoning on in Optimizing, set the
environment variable `ART_HEAP_POISONING' to "true"
before compiling ART.

Bug: 12687968
Change-Id: Ib3120b38cf805a8a50207a314b9ccc90c8d93740
9931f319cf86c56c2855d800339a3410697633a6 19-Jun-2015 Alexandre Rames <alexandre.rames@linaro.org> Opt compiler: Add a description to slow paths.

Change-Id: I22160d90de3fe0ab3e6a2acc440bda8daa00e0f0
94015b939060f5041d408d48717f22443e55b6ad 04-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Use HCurrentMethod in HInvokeStaticOrDirect.""

The fix was to special-case baseline for x86, which does not have enough
registers to allocate the current method.

This reverts commit c345f141f11faad177aa9635a78088d00cf66086.

Change-Id: I5997aa52f8d4df373ae5ff4d4150dac0c44c4c10
c345f141f11faad177aa9635a78088d00cf66086 04-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Use HCurrentMethod in HInvokeStaticOrDirect."

Fails on baseline/x86.

This reverts commit 38207af82afb6f99c687f64b15601ed20d82220a.

Change-Id: Ib71018367eb7c6046965494a7e996c22af3de403
38207af82afb6f99c687f64b15601ed20d82220a 01-Jun-2015 Nicolas Geoffray <ngeoffray@google.com> Use HCurrentMethod in HInvokeStaticOrDirect.

Change-Id: I0d15244b6b44c8b10079398c55da5071a3e3af66
3d21bdf8894e780d349c481e5c9e29fe1556051c 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Move mirror::ArtMethod to native

Optimizing + quick tests are passing, devices boot.

TODO: Test and fix bugs in mips64.

Saves 16 bytes on most ArtMethods, a 7.5MB reduction in system PSS.
Some of the savings are from removal of virtual methods and direct
methods object arrays.

Bug: 19264997

(cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33)

Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d

Fix some ArtMethod related bugs

Added root visiting for runtime methods, not currently required
since the GcRoots in these methods are null.

Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes
--trace run-tests 005, 044.

Fixed optimizing compiler bug where we used a normal stack location
instead of double on ARM64, this fixes the debuggable tests.

TODO: Fix JDWP tests.

Bug: 19264997

Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3

ART: Fix casts for 64-bit pointers on 32-bit compiler.

Bug: 19264997
Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457

Fix JDWP tests after ArtMethod change

Fixes Throwable::GetStackDepth for exception event detection after
internal stack trace representation change.

Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of
proxy method.

Bug: 19264997
Change-Id: I363e293796848c3ec491c963813f62d868da44d2

Fix accidental IMT and root marking regression

Was always using the conflict trampoline. Also included fix for
regression in GC time caused by extra roots. Most of the regression
was IMT.

Fixed bug in DumpGcPerformanceInfo where we would get SIGABRT due to
detached thread.

EvaluateAndApplyChanges:
From ~2500 -> ~1980
GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots

Bug: 19264997
Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0

Fix bogus image test assert

Previously we were comparing the size of the non moving space to
size of the image file.

Now we properly compare the size of the image space against the size
of the image file.

Bug: 19264997
Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a

[MIPS64] Fix art_quick_invoke_stub argument offsets.

ArtMethod reference's size got bigger, so we need to move other args
and leave enough space for ArtMethod* and 'this' pointer.

This fixes mips64 boot.

Bug: 19264997
Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
e401d146407d61eeb99f8d6176b2ac13c4df1e33 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Move mirror::ArtMethod to native

Optimizing + quick tests are passing, devices boot.

TODO: Test and fix bugs in mips64.

Saves 16 bytes on most ArtMethods, a 7.5MB reduction in system PSS.
Some of the savings are from removal of virtual methods and direct
methods object arrays.

Bug: 19264997
Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d
07276db28d654594e0e86e9e467cad393f752e6e 18-May-2015 Nicolas Geoffray <ngeoffray@google.com> Don't do a null test in MarkGCCard if the value cannot be null.

Change-Id: I45687f6d3505178e2fc3689eac9cb6ab1b2c1e29
21030dd59b1e350f6f43de39e3c4ce0886ff539c 07-May-2015 Andreas Gampe <agampe@google.com> ART: x86 indexOf intrinsics for the optimizing compiler

Add intrinsics implementations for indexOf in the optimizing
compiler. These are mostly ported from Quick. Add instruction
support to assemblers where necessary.

Change-Id: Ife90ed0245532a5c436a26fe84715dc357f353c8
ec525fc30848189051b888da53ba051bc0878b78 28-Apr-2015 Roland Levillain <rpl@google.com> Factor MoveArguments methods in Optimizing's intrinsics handlers.

Also add a precondition similar to the one present in code
generators, regarding static invoke related explicit clinit
check elimination in non-baseline compilations.

Change-Id: I26f4dcb5d02824d7556f90b4b0c85b08b737fa53
2d27c8e338af7262dbd4aaa66127bb8fa1758b86 28-Apr-2015 Roland Levillain <rpl@google.com> Refactor InvokeDexCallingConventionVisitor in Optimizing.

Change-Id: I7ede0f59d5109644887bf5d39201d4e1bf043f34
3e3d73349a2de81d14e2279f60ffbd9ab3f3ac28 28-Apr-2015 Roland Levillain <rpl@google.com> Have HInvoke instructions know their number of actual arguments.

Add an art::HInvoke::GetNumberOfArguments routine so that
art::HInvoke and its subclasses can return the number of
actual arguments of the called method. Use it in code
generators and intrinsics handlers.

Consequently, no longer remove a clinit check as last input
of a static invoke if it is still present during baseline
code generation, but ensure that static invokes have no such
check as last input in optimized compilations.

Change-Id: Iaf9e07d1057a3b15b83d9638538c02b70211e476
848f70a3d73833fc1bf3032a9ff6812e429661d9 15-Jan-2014 Jeff Hao <jeffhao@google.com> Replace String CharArray with internal uint16_t array.

Summary of high level changes:
- Adds compiler inliner support to identify string init methods
- Adds compiler support (quick & optimizing) with new invoke code path
that calls method off the thread pointer
- Adds thread entrypoints for all string init methods
- Adds map to verifier to log when receiver of string init has been
copied to other registers. used by compiler and interpreter

Change-Id: I797b992a8feb566f9ad73060011ab6f51eb7ce01
4c0eb42259d790fddcd9978b66328dbb3ab65615 24-Apr-2015 Roland Levillain <rpl@google.com> Ensure inlined static calls perform clinit checks in Optimizing.

Calls to static methods have implicit class initialization
(clinit) checks of the method's declaring class in
Optimizing. However, when such a static call is inlined,
the implicit clinit check vanishes, possibly leading to
incorrect behavior.

To ensure that inlining static methods does not change the
behavior of a program, add explicit class initialization
checks (art::HClinitCheck) as well as load class
instructions (art::HLoadClass) as last input of static
calls (art::HInvokeStaticOrDirect) in Optimizing's control
flow graphs, when the declaring class is reachable and not
known to be already initialized. Then when considering the
inlining of a static method call, proceed only if the method
has no implicit clinit check requirement.
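
An illustrative Java case (not part of the original commit message) of
the behavior being preserved; Helper and Caller are hypothetical names.
Inlining Helper.compute() must not skip Helper's static initializer:

  class Helper {
      static { System.out.println("Helper initialized"); }
      static int compute() { return 42; }
  }

  class Caller {
      int call() {
          // An explicit HClinitCheck is kept as an input of this invoke,
          // so inlining cannot drop Helper's class initialization.
          return Helper.compute();
      }
  }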

The added explicit clinit checks are already removed by the
art::PrepareForRegisterAllocation visitor. This CL also
extends this visitor to turn explicit clinit checks from
static invokes into implicit ones after the inlining step,
by removing the added art::HLoadClass nodes mentioned
hereinbefore.

Change-Id: I9ba452b8bd09ae1fdd9a3797ef556e3e7e19c651
641547a5f18ca2ea54469cceadcfef64f132e5e0 21-Apr-2015 Calin Juravle <calin@google.com> [optimizing] Fix a bug in moving the null check to the user.

When deciding to move a null check to its user, we did not verify
whether the next instruction checks the same object.

Change-Id: I2f4533a4bb18aa4b0b6d5e419f37dcccd60354d2
d9b92403254225dd5ff84559886b93680ba0ed64 21-Apr-2015 Nicolas Geoffray <ngeoffray@google.com> Fix another mistyped location.

Change-Id: I52d5a8d34ddc882595da2b53bca0f7eb78d4b3a1
9021825d1e73998b99c81e89c73796f6f2845471 15-Apr-2015 Nicolas Geoffray <ngeoffray@google.com> Type MoveOperands.

The ParallelMoveResolver implementation needs to know if a move
is for 64 bits or not, to handle swaps correctly.

Bug found, and test case courtesy of Serguei I. Katkov.

Change-Id: I9a0917a1cfed398c07e57ad6251aea8c9b0b8506
58d25fd052e999a24734b0cf856a1563e3d1b2d0 03-Apr-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Implement more x86/x86_64 intrinsics

Implement CAS and bit reverse and byte reverse intrinsics that were
missing from x86 and x86_64 implementations.

Add assembler tests and compareAndSwapLong test.
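
Illustrative calls (not from this CL) covered by the newly implemented
intrinsics:

  import java.util.concurrent.atomic.AtomicLong;

  class CasSketch {
      static int flip(int x) {
          return Integer.reverse(x) ^ Integer.reverseBytes(x); // bit/byte reverse
      }

      static boolean claim(AtomicLong owner, long me) {
          return owner.compareAndSet(0L, me); // bottoms out in the Unsafe CAS
      }
  }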

Change-Id: Iabb2ff46036645df0a91f640288ef06090a64ee3
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
e90db127277823e92398e112c9c23f2005554f95 04-Apr-2015 Mingyao Yang <mingyao@google.com> Add missing RecordPcInfo's for intrinsics_x86.

Change-Id: I5ad856e57f63e2bd350f62e7023911c276427edd
fb8d279bc011b31d0765dc7ca59afea324fd0d0c 01-Apr-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Implement x86/x86_64 math intrinsics

Implement floor/ceil/round/RoundFloat on x86 and x86_64.
Implement RoundDouble on x86_64.

Add support for roundss and roundsd on both architectures. Support them
in the disassembler as well.

Add the instruction set features for x86, as the 'round' instruction is
only supported if SSE4.1 is supported.
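
Illustrative calls (not from this CL) that can now be lowered to
roundsd/roundss when SSE4.1 is available:

  class RoundSketch {
      static double shape(double d, float f) {
          return Math.floor(d) + Math.ceil(d) + Math.round(f);
      }
  }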

Fix the tests to handle the addition of passing the instruction set
features to x86 and x86_64.

Add assembler tests for roundsd and roundss to x86_64 assembler tests.

Change-Id: I9742d5930befb0bbc23f3d6c83ce0183ed9fe04f
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
3e90a96f403cbc353731e6687fe12a088f996cee 27-Mar-2015 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> [optimizing] Do not inline intrinsics

The intrinsics generally have specialized code and the code for them
may be faster than what can be achieved with inlining. Thus the inliner
should skip intrinsics.

At the same time, easy methods are not worth intrinsifying, e.g. String
length and isEmpty. Those can be handled by the inliner with no problem
and can actually lead to better code, since the call is not kept around
through all of the optimizations.

Change-Id: Iab38e6c33f79efa54d845d4871cf26fa9b235ab0
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
512e04d1ea7fb33e3992715fe55be8a834d4a79c 27-Mar-2015 Nicolas Geoffray <ngeoffray@google.com> Fix typos spotted by Andreas.

Change-Id: I564b4bc5995d91f4c6c4e4f2427ed7c279cb8740
d75948ac93a4a317feaf136cae78823071234ba5 27-Mar-2015 Nicolas Geoffray <ngeoffray@google.com> Intrinsify String.compareTo.

Change-Id: Ia540df98755ac493fe61bd63f0bd94f6d97fbb57
09ed1a3125849ec6ac07cb886e3c502e1dcfada2 25-Mar-2015 Mark Mendell <mark.p.mendell@intel.com> [optimizing] Implement X86 intrinsic support

Implement the supported intrinsics for X86.

Enhance the graph visualizer to print <U> for unallocated locations, to
allow calling the graph dumper from within register allocation for
debugging purposes.

Change-Id: I3b0319eb70a9a4ea228f67065b4c52d13a1ae775
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>