History log of /art/compiler/dex/quick/x86/assemble_x86.cc
Revision Date Author Comments
41b175aba41c9365a1c53b8a1afbd17129c87c14 19-May-2015 Vladimir Marko <vmarko@google.com> ART: Clean up arm64 kNumberOfXRegisters usage.

Avoid undefined behavior for arm64 stemming from 1u << 32 in
loops with upper bound kNumberOfXRegisters.

Create iterators for enumerating bits in an integer either
from high to low or from low to high and use them for
<arch>Context::FillCalleeSaves() on all architectures.

Refactor runtime/utils.{h,cc} by moving all bit-fiddling
functions to runtime/base/bit_utils.{h,cc} (together with
the new bit iterators) and all time-related functions to
runtime/base/time_utils.{h,cc}. Improve test coverage and
fix some corner cases for the bit-fiddling functions.

Bug: 13925192

(cherry picked from commit 80afd02024d20e60b197d3adfbb43cc303cf29e0)

Change-Id: I905257a21de90b5860ebe1e39563758f721eab82
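
A minimal sketch of the low-to-high bit enumeration described above (the class name and interface are assumptions for illustration, not the actual runtime/base/bit_utils.h API; it relies on the GCC/Clang __builtin_ctz builtin):

    #include <cstdint>
    #include <cstdio>

    // Hypothetical low-to-high bit enumerator; not the real ART class.
    class LowToHighBits {
     public:
      explicit LowToHighBits(uint32_t bits) : bits_(bits) {}
      bool HasNext() const { return bits_ != 0u; }
      int Next() {
        int bit = __builtin_ctz(bits_);  // index of the lowest set bit
        bits_ &= bits_ - 1u;             // clear that bit
        return bit;
      }
     private:
      uint32_t bits_;
    };

    int main() {
      // Walk the set bits of an example core spill mask from low to high.
      for (LowToHighBits it(0x4ff0u); it.HasNext();) {
        printf("r%d ", it.Next());
      }
      printf("\n");
      return 0;
    }
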
dd17bc3806e800d3b82d5cb27e85ccc1c4e2ee1d 27-Apr-2015 nikolay serdjuk <nikolay.y.serdjuk@intel.com> Fix for incorrect encode and parse of PEXTRW instruction

The PEXTRW instruction, encoded by the sequence 66 0F 3A 15,
was incorrectly encoded in the compiler table and incorrectly
parsed by the disassembler.

Signed-off-by: nikolay serdjuk <nikolay.y.serdjuk@intel.com>

(cherry picked from commit e0705f51fdc71e9670a29f8c3a47168f50724b35)

Change-Id: I7f051e23789aa3745d6eb854c97f80c475748b74
e0705f51fdc71e9670a29f8c3a47168f50724b35 27-Apr-2015 nikolay serdjuk <nikolay.y.serdjuk@intel.com> Fix for incorrect encode and parse of PEXTRW instruction

The PEXTRW instruction, encoded by the sequence 66 0F 3A 15,
was incorrectly encoded in the compiler table and incorrectly
parsed by the disassembler.

Change-Id: Ib4d4db923cb15a76e74f13f6b5514cb0d1cbe164
Signed-off-by: nikolay serdjuk <nikolay.y.serdjuk@intel.com>
c4013ea00d9e63533f3badeed0131bb2eb859c90 22-Apr-2015 Chao-ying Fu <chao-ying.fu@intel.com> ART: Fix addpd opcode, add Quick x86 assembler test

This patch fixes the addpd opcode that may be used by vectorizations,
and adds an assembler test for the Quick x86 assembler, currently
lightly testing addpd, subpd and mulpd.

Change-Id: I29455a86212829c75fd75737679280f167da7b5b
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
2cebb24bfc3247d3e9be138a3350106737455918 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Replace NULL with nullptr

Also fixed some lines that were too long, and a few other minor
details.

Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb
1961b609bfefaedb71cee3651c4f931cc3e7393d 08-Apr-2015 Vladimir Marko <vmarko@google.com> Quick: PC-relative loads from dex cache arrays on x86.

Rewrite all PC-relative addressing on x86 and implement
PC-relative loads from dex cache arrays. Don't adjust the
base to point to the start of the method, let it point to
the anchor, i.e. the target of the "call +0" insn.

Change-Id: Ic22544a8bc0c5e49eb00a75154dc8f3ead816989
f6737f7ed741b15cfd60c2530dab69f897540735 23-Mar-2015 Vladimir Marko <vmarko@google.com> Quick: Clean up Mir2Lir codegen.

Clean up WrapPointer()/UnwrapPointer() and OpPcRelLoad().

Change-Id: I1a91f01e1e779599c77f3f6efcac2a6ad34629cf
0b9203e7996ee1856f620f95d95d8a273c43a3df 23-Jan-2015 Andreas Gampe <agampe@google.com> ART: Some Quick cleanup

Make several fields const in CompilationUnit. May benefit some Mir2Lir
code that repeats tests, and in general immutability is good.

Remove compiler_internals.h and refactor some other headers to reduce
overly broad imports (and thus forced recompiles on changes).

Change-Id: I898405907c68923581373b5981d8a85d2e5d185a
27dee8bcd7b4a53840b60818da8d2c819ef199bd 02-Dec-2014 Mark Mendell <mark.p.mendell@intel.com> X86_64 QBE: use RIP addressing

Take advantage of RIP addressing in 64 bit mode to improve the code
generation for accesses to the constant area as well as packed switches.
Avoid computing the address of the start of the method, which is needed
in 32 bit mode.

To do this, we add a new 'pseudo-register' kRIPReg to minimize the
changes needed to get the new addressing mode to be generated.

Change-Id: Ia28c93f98b09939806d91ff0bd7392e58996d108
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
f18b92f1ae57e6eba6b18bee4c34ddbbd8bda74d 14-Nov-2014 Dmitry Petrochenko <dmitry.petrochenko@intel.com> LSRA: Fix X86 shuffle flags

The shuffle opcodes for X86 have incorrect flags. Fix them.

Clean up a couple of the printable string names too to remove an extra
"kX86".

Change-Id: I52a0ebdb1334cf0904bc2399eaf28b7cda041112
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
b28c1c06236751aa5c9e64dcb68b3c940341e496 08-Nov-2014 Ian Rogers <irogers@google.com> Tidy RegStorage for X86.

Don't use global variables initialized in constructors to hold onto constant
values; instead, use the TargetReg32 helper. Improve this helper with the use
of lookup tables. Elsewhere prefer to use constexpr values as they will have
less runtime cost.
Add an ostream operator to RegStorage for CHECK_EQ and use.

Change-Id: Ib8d092d46c10dac5909ecdff3cc1e18b7e9b1633
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f 31-Oct-2014 Ian Rogers <irogers@google.com> Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags.

Fix associated errors about unused parameters and implicit sign conversions.
For sign conversion this was largely in the area of enums, so add ostream
operators for the affected enums and fix tools/generate-operator-out.py.
Tidy arena allocation code and arena allocated data types, rather than fixing
new and delete operators.
Remove dead code.

Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
5f70c79c81f171ff0aa126d58bfbe98772ab7e33 29-Oct-2014 Mark Mendell <mark.p.mendell@intel.com> X86 QBE: Mark kX86StartOfMethod as defining reg 0

kX86StartOfMethod should be marked as writing to register 0, as that is
the destination register for the instruction.

Change-Id: I99cf24afa4c11d9ccdd4295f3481351e9eb4ee1f
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
07140838a3ee44a6056cacdc78f2930e019107da 01-Oct-2014 Ian Rogers <irogers@google.com> Enable -Wunreachable-code

Caught bugs in DeoptimizeStackVisitor and assemble_x86 SIB encoding.
Add UNREACHABLE macro to document code expected to be unreachable.
Bug: 17731047

Change-Id: I2e363fe5b38a1246354d98be18c902a6031c0b9e
ae9f3e6ef9f97f47416f829448e5281e9a57d8b8 23-Sep-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> ART: Fix movnti assembler

Movnti was receiving rex prefix before its opcode. Additionally,
the 64-bit version was missing the rex.w prefix.

Change-Id: Ie5c3bbe109765a0b990cafeeea1ee30329daabd0
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
a870bc5495b20a830ebd8342b49ef148bbff72dd 09-Sep-2014 Haitao Feng <haitao.feng@intel.com> ART: Address three issues with x86 assembler before enabling load store elimination.

1) Remove the IS_LOAD attribute from LEA instructions.
2) Change the attribute of fp stack instructions from IS_UNARY_OP to IS_BINARY_OP as
operands[1] will be used to compute GetInstructionOffset.
3) Add IS_MOVE attribute for general register move instructions.

Change-Id: I7054df47956f2acecf579ff7acfde385fd8ac194
Signed-off-by: Haitao Feng <haitao.feng@intel.com>
b3a84e2f308b3ed7d17b8e96fc7adfcac36ebe77 28-Jul-2014 Lupusoru, Razvan A <razvan.a.lupusoru@intel.com> ART: Vectorization opcode implementation fixes

This patch fixes the implementation of the x86 vectorization opcodes.

Change-Id: I0028d54a9fa6edce791b7e3a053002d076798748
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
Signed-off-by: Philbert Lin <philbert.lin@intel.com>
b5bce7cc9f1130ab4932ba8e6917c362bf871f24 25-Jul-2014 Jean Christophe Beyler <jean.christophe.beyler@intel.com> ART: Add non-temporal store support

Added non-temporal store support as a hint from the ME.
Added the implementation of the memory barrier
extended instruction that supports non-temporal stores
by explicitly serializing all previous store-to-memory instructions.

Change-Id: I8205a92083f9725253d8ce893671a133a0b6849d
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
f40f890ae3acd7b3275355ec90e2814bba8d4fd6 14-Aug-2014 Yixin Shou <yixin.shou@intel.com> Implement inlined shift long for 32bit

Added support for x86 inlined shift long for 32bit

Change-Id: I6caef60dd7d80227c3057fd6f64b0ecb11025afa
Signed-off-by: Yixin Shou <yixin.shou@intel.com>
e70f179aca4f13b15be8a47a4d9e5b6c2422c69a 09-Aug-2014 Haitao Feng <haitao.feng@intel.com> ART: Fix two small DumpLIRInsn issues for x86_64 port.

Change-Id: I81ef32380bfc73d6c2bfc37a7f4903d912a5d9c8
Signed-off-by: Haitao Feng <haitao.feng@intel.com>
cf8184164650d7686b9f685850463f5976bc3251 24-Jul-2014 Chao-ying Fu <chao-ying.fu@intel.com> x86_64: Fix Test32RM

This patch fixes Test32RM use flags and the format.

Change-Id: I486cb7f27e65caeefccbd3bbcc38257ddca033c8
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
2bc477043b6ab2d7b4719ba8debf0a6a5b10c225 31-Jul-2014 Mark Mendell <mark.p.mendell@intel.com> Set REG0_USED on X86 Set8R instruction

Since this instruction only affects the low byte of the register, it is
preceded by an XOR to zero the upper 3 bytes. Set8R isn't marked as
using operand 0 as an input. In practice, this works for now, as the
Xor sets the CC and Set8R uses the CC (although not that of the Xor, but
of a Cmp generally).

This just marks REG0 as using the previous contents of the register, as
it is modifying only 1 byte of 4.

Change-Id: I7a69cbdb06979da5d5d2ae17fabd7c22c5a17701
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
fd0c237e7d80ad567399e760203f3cda404bf4c5 31-Jul-2014 Mark Mendell <mark.p.mendell@intel.com> X86: Assembler: Correct r8_form for some cases

Set r8_form to false for instruction formats that don't reference
registers.

Change-Id: Ib01edef4ef7f22de25a31dc4207889bff97d163d
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
35690630b82dc1dbf4a7ada37225893a550ea1e0 16-Jul-2014 Serguei Katkov <serguei.i.katkov@intel.com> x86: Fix assembler for Pextr

Pextr family instructions use r/m part of rmod byte as destination.
This should be handled appropriately.
Disassembler works correctly.

Change-Id: I89d00cb11ae792e9d28c178ba79a0bc1fa27e3c5
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
147eb41b53729ec8d5c188d1cac90964a51afb8a 11-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86""""

This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1.

Bug: 16256184
Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73

Conflicts:
compiler/dex/quick/arm64/target_arm64.cc
compiler/image_test.cc
runtime/fault_handler.cc
1222c96fafe98061cfc57d3bd115f46edb64e624 15-Jul-2014 Alexei Zavjalov <alexei.zavjalov@intel.com> ART: inline Math.Max/Min (float and double)

This implements the inlined version of Math.Max/Min intrinsics.

Change-Id: I2db8fa7603db3cdf01016ec26811a96f91b1e6ed
Signed-off-by: Alexei Zavjalov <alexei.zavjalov@intel.com>
Signed-off-by: Shou, Yixin <yixin.shou@intel.com>
69dfe51b684dd9d510dbcb63295fe180f998efde 11-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86""""

This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1.

Bug: 16256184
Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73
7fb36ded9cd5b1d254b63b3091f35c1e6471b90e 10-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Add implicit null and stack checks for x86""

Fixes x86_64 cross compile issue. Removes command line options
and property to set implicit checks - this is hard coded now.

This reverts commit 3d14eb620716e92c21c4d2c2d11a95be53319791.

Change-Id: I5404473b5aaf1a9c68b7181f5952cb174d93a90d
0025a86411145eb7cd4971f9234fc21c7b4aced1 11-Jul-2014 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Revert "Add implicit null and stack checks for x86"""

Broke the build.

This reverts commit 7fb36ded9cd5b1d254b63b3091f35c1e6471b90e.

Change-Id: I9df0e7446ff0913a0e1276a558b2ccf6c8f4c949
34e826ccc80dc1cf7c4c045de6b7f8360d504ccf 29-May-2014 Dave Allison <dallison@google.com> Add implicit null and stack checks for x86

This adds compiler and runtime changes for x86
implicit checks. 32 bit only.

Both host and target are supported.
By default, on the host, the implicit checks are null pointer and
stack overflow. Suspend is implemented but not switched on.

Change-Id: I88a609e98d6bf32f283eaa4e6ec8bbf8dc1df78a
021b60f31a4443081e63591e184b5d707bba28c1 09-Jul-2014 Chao-ying Fu <chao-ying.fu@intel.com> x86_64: GenInlinedCas must use wide rl_src_offset under 64-bit targets

This patch fixes to use wide rl_src_offset for int and long types
under 64-bit targets, and fixes movzx8 and movsx8 to use r8_form
on the second register only.

Change-Id: Ib8c0756609100f9bc5c228f1eb391421416f3af6
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
3d14eb620716e92c21c4d2c2d11a95be53319791 10-Jul-2014 Dave Allison <dallison@google.com> Revert "Add implicit null and stack checks for x86"

It breaks cross compilation with x86_64.

This reverts commit 34e826ccc80dc1cf7c4c045de6b7f8360d504ccf.

Change-Id: I34ba07821fc0a022fda33a7ae21850957bbec5e7
60bfe7b3e8f00f0a8ef3f5d8716adfdf86b71f43 09-Jul-2014 Udayan Banerji <udayan.banerji@intel.com> X86 Backend support for vectorized float and byte 16x16 operations

Add support for reserving vector registers for the duration of vector loop.
Add support for 16x16 multiplication, shifts, and add reduce.

Changed the vectorization implementation to be able to use the dataflow
elements for SSA recreation and fixed a few implementation details.

Change-Id: I2f358f05f574fc4ab299d9497517b9906f234b98
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Signed-off-by: Olivier Come <olivier.come@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
94f3eb0c757d0a6a145e24ef95ef7d35c091bb01 24-Jun-2014 Serguei Katkov <serguei.i.katkov@intel.com> x86_64: Clean-up after cmp-long fix

The patch addresses the comments from the review done by Ian Rogers.
Clean-up of the assembler.

Change-Id: I9dbb350dfc6645f8a63d624b2b785233529459a9
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
01a50d660bd1fa692b132a24ec0f18f402e69de2 06-Jul-2014 Mark Mendell <mark.p.mendell@intel.com> Fix missing dependency in new X86 instruction

AX is written by this opcode. Note the dependency.

Change-Id: I25209e1fb4ceb0387269436c8b00730b1caa03bc
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
c5e4ce116e4d44bfdf162f0c949e77772d7e0654 10-Jun-2014 nikolay serdjuk <nikolay.y.serdjuk@intel.com> x86_64: Fix intrinsics

The following intrinsics have been ported:

- Abs(double/long/int/float)
- String.indexOf/charAt/compareTo/is_empty/length
- Float.floatToRawIntBits, Float.intBitsToFloat
- Double.doubleToRawLongBits, Double.longBitsToDouble
- Thread.currentThread
- Unsafe.getInt/Long/Object, Unsafe.putInt/Long/Object
- Math.sqrt, Math.max, Math.min
- Long.reverseBytes

Math.min and max for longs have been implemented for x86_64.

Commented out until good tests available:
- Memory.peekShort/Int/Long, Memory.pokeShort/Int/Long

Turned off on x86-64 as reported having problems
- Cas

Change-Id: I934bc9c90fdf953be0d3836a17b6ee4e7c98f244
5192cbb12856b12620dc346758605baaa1469ced 01-Jul-2014 Yixin Shou <yixin.shou@intel.com> Load 64 bit constant into GPR by single instruction for 64bit mode

This patch loads a 64-bit constant into a register with a single movabsq
instruction in 64-bit mode instead of the previous mov, shift, add
instruction sequence.

Change-Id: I9d013c4f6c0b5c2e43bd125f91436263c7e6028c
Signed-off-by: Yixin Shou <yixin.shou@intel.com>
dd64450b37776f68b9bfc47f8d9a88bc72c95727 01-Jul-2014 Elena Sayapina <elena.v.sayapina@intel.com> x86_64: Unify 64-bit check in x86 compiler

Update x86-specific Gen64Bit() check with the CompilationUnit target64 field
which is set using unified Is64BitInstructionSet(InstructionSet) check.

Change-Id: Ic00ac863ed19e4543d7ea878d6c6c76d0bd85ce8
Signed-off-by: Elena Sayapina <elena.v.sayapina@intel.com>
fb0fecffb31398adb6f74f58482f2c4aac95b9bf 20-Jun-2014 Olivier Come <olivier.come@intel.com> ART: Add HADDPS/HADDPD/SHUFPS/SHUFPD instruction generation

The patch adds the HADDPS, HADDPD, SHUFPS, and SHUFPD instruction generation
for X86.

Change-Id: Ida105d3e57be231a5331564c1a9bc298cf176ce6
Signed-off-by: Olivier Come <olivier.come@intel.com>
e63d9d4e42e659a365a9a06b910852ebc297f457 24-Jun-2014 Serguei Katkov <serguei.i.katkov@intel.com> x86_64: int-to-long should ensure that int in kCoreReg

It is possible that the int is in an xmm register, so the int-to-long
implementation should ensure that the src is in a core register before
using the move with sign extension, which does not support the xmm case.

Change-Id: Ibab9df7564f0f1c1f3e1f5ff67c38f1a5e3cdb69
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
1c55703526827b5fc63f5d4b8477f36574649342 23-Jun-2014 Serguei Katkov <serguei.i.katkov@intel.com> x86_64: Correct fix for cmp-long

We cannot rely on the sign of the sub instruction because
LONG_MAX - LONG_MIN = -1, so the sign would indicate that
LONG_MAX < LONG_MIN, which is incorrect.

The fix also contains small improvement for load wide constant.

Change-Id: I74df70d7c198cebff5cad8c1d5614c1d29b79a1b
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
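
A worked illustration of the overflow argument above, written as plain C++ rather than the ART codegen: subtracting LONG_MIN from LONG_MAX wraps to -1, so a sign-based compare would order the two values incorrectly.

    #include <cstdint>
    #include <cstdio>

    int main() {
      int64_t max = INT64_MAX;
      int64_t min = INT64_MIN;
      // Subtract with unsigned wrap-around, as the hardware sub does, then
      // reinterpret the result as signed.
      int64_t diff = static_cast<int64_t>(
          static_cast<uint64_t>(max) - static_cast<uint64_t>(min));
      printf("diff = %lld\n", static_cast<long long>(diff));  // prints -1
      // A compare based on the sign of this subtraction would claim max < min.
      return 0;
    }
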
bd3682eada753de52975ae2b4a712bd87dc139a6 11-Jun-2014 Alexei Zavjalov <alexei.zavjalov@intel.com> ART: Implement rem_double/rem_float for x86/x86-64

This adds inlined versions of the rem_double/rem_float bytecodes
for x86/x86-64 platforms. This patch also removes unnecessary
fmod and fmodf stubs from runtime.

Change-Id: I2311aa2adf08d6614527e0da070e3b6ce2343a20
Signed-off-by: Alexei Zavjalov <alexei.zavjalov@intel.com>
5aa6e04061ced68cca8111af1e9c19781b8a9c5d 14-Jun-2014 Ian Rogers <irogers@google.com> Tidy x86 assembler.

Use helper functions to compute when the kind has a SIB, a ModRM and RegReg
form.

Change-Id: I86a5cb944eec62451c63281265e6974cd7a08e07
7e399fd3a99ba9c9dbfafdf14f75dd318fa7d454 11-Jun-2014 Chao-ying Fu <chao-ying.fu@intel.com> x86_64: Disable all optimizations and fix bugs

This disables all optimizations and ensures that art tests still pass.

Change-Id: I43217378d6889bb04f4d064f8d53cb3ff4c20aa0
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
8dea81ca9c0201ceaa88086b927a5838a06a3e69 06-Jun-2014 Vladimir Marko <vmarko@google.com> Rewrite use/def masks to support 128 bits.

Reduce LIR memory usage by holding masks by pointers in the
LIR rather than directly and using pre-defined const masks
for the common cases, allocating very few on the arena.

Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
0f9b9c508814a62c6e21c6a06cfe4de39b5036c0 09-Jun-2014 Ian Rogers <irogers@google.com> Tidy up x86 assembler and fix byte register encoding.

Also fix reg storage int size issues.
Also fix bad use of byte registers in GenInlinedCas.

Change-Id: Id47424f36f9000e051110553e0b51816910e2fe8
ade54a2fba4ec977dc4ff019004a2ba68e9ea075 09-Jun-2014 Mark Mendell <mark.p.mendell@intel.com> X86_64: Fix core.oat compilation issues

Fix neg-long and X86Mir2Lir::GenInstanceofFinal

Change-Id: I7fbcc1a89857cc461f74b55573ac6cb7c8e64561
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
e0ccdc0dd166136cd43e5f54201179a4496d33e8 07-Jun-2014 Chao-ying Fu <chao-ying.fu@intel.com> x86_64: Add long bytecode supports (1/2)

This patch includes switch enabling and GenFillArray,
assembler changes, updates of regalloc behavior for 64-bit,
usage in basic utility operations, loading constants,
and update for memory operations.

Change-Id: I6d8aa35a75c5fd01d69c38a770c3398d0188cc8a
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 03-Jun-2014 Tim Murray <timmurray@google.com> DO NOT MERGE

Merge ART from AOSP to lmp-preview-dev.

Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
a20468c004264592f309a548fc71ba62a69b8742 30-Apr-2014 Dmitry Petrochenko <dmitry.petrochenko@intel.com> x86_64: Support r8-r15, xmm8-xmm15 in assembler

Added REX support. TARGET_REX_SUPPORT should be used during the build.

Change-Id: I82b457ff5085c8192ad873923bd939fbb91022ce
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
96992e8f2eddba05dc38a15cc7d4e705e8db4022 19-May-2014 Dmitry Petrochenko <dmitry.petrochenko@intel.com> x86_64: Add 64-bit version of instructions in asm

Add the missing 64-bit versions of instructions.

Change-Id: I8151484d909dff487cb7e521494a0be249a42214
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
fe94578b63380f464c3abd5c156b7b31d068db6c 22-May-2014 Mark Mendell <mark.p.mendell@intel.com> Implement all vector instructions for X86

Add X86 code generation for the vector operations. Added support for
X86 disassembler for the new instructions.

Change-Id: I72b48f5efa3a516a16bb1dd4bdb5c9270a8db53a
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
d65c51a556e6649db4e18bd083c8fec37607a442 29-Apr-2014 Mark Mendell <mark.p.mendell@intel.com> ART: Add support for constant vector literals

Add in some vector instructions. Implement the ConstVector
instruction, which takes 4 words of data and loads it into
an XMM register.

Initially, only the ConstVector MIR opcode is implemented. Others will
be added after this one goes in.

Change-Id: I5c79bc8b7de9030ef1c213fc8b227debc47f6337
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
9ee801f5308aa3c62ae3bedae2658612762ffb91 12-May-2014 Dmitry Petrochenko <dmitry.petrochenko@intel.com> Add x86_64 code generation support

Utilizes r0..r7 in the register allocator, and implements spill/unspill of
core regs as well as operations with the stack pointer.

Change-Id: I973d5a1acb9aa735f6832df3d440185d9e896c67
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
9ed427724a18dc24f9eb2ddf39e4729bea203c2e 07-May-2014 Mark Mendell <mark.p.mendell@intel.com> X86: EmitArrayImm shouldn't truncate to 16 bits

The code in X86Mir2Lir::EmitArrayImm() always truncates the immediate
value to 16 bits. This can't be right. The code in EmitImm() will check
the expected immediate size from the entry.

Change-Id: I75b3b96e41777838b0f243d65f3f2ded2e1dbdd2
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
2637f2e9bf4fc5591994b7c0158afead88321a7c 30-Apr-2014 Mark Mendell <mark.p.mendell@intel.com> ART: Update and correct assemble_x86.cc

Correct the definition of some X86 instructions in the file.
Add some new instructions and the code to emit them properly.

Added EmitMemCond()

Change-Id: Icf4b70236cf0ca857c85dcb3edb218f26be458eb
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
660188264dee3c8f3510e2e24c11816c6b60f197 06-May-2014 Andreas Gampe <agampe@google.com> ART: Use utils.h::RoundUp instead of explicit bit-fiddling

Change-Id: I249a2cfeb044d3699d02e13d42b8e72518571640
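
For reference, the usual power-of-two round-up idiom that such a helper centralizes, sketched with an assumed signature (the exact utils.h declaration may differ):

    #include <cstddef>

    // Assumed signature for illustration; n must be a power of two.
    constexpr size_t RoundUp(size_t x, size_t n) {
      return (x + n - 1) & ~(n - 1);
    }

    static_assert(RoundUp(13, 8) == 16, "13 rounds up to the next 8-byte boundary");
    static_assert(RoundUp(16, 8) == 16, "already aligned values are unchanged");
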
091cc408e9dc87e60fb64c61e186bea568fc3d3a 31-Mar-2014 buzbee <buzbee@google.com> Quick compiler: allocate doubles as doubles

Significant refactoring of register handling to unify usage across
all targets & 32/64 backends.

Reworked RegStorage encoding to allow expanded use of
x86 xmm registers; removed vector registers as a separate
register type. Reworked RegisterInfo to describe aliased
physical registers. Eliminated quite a bit of target-specific code
and generalized common code.

Use of RegStorage instead of int for registers now propagated down
to the NewLIRx() level. In future CLs, the NewLIRx() routines will
be replaced with versions that are explicit about what kind of
operand they expect (RegStorage, displacement, etc.). The goal
is to eventually use RegStorage all the way to the assembly phase.

TBD: MIPS needs verification.
TBD: Re-enable liveness tracking.

Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
99ad7230ccaace93bf323dea9790f35fe991a4a2 26-Feb-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Relaxed memory barriers for x86

X86 provides stronger memory guarantees and thus the memory barriers can be
optimized. This patch ensures that all memory barriers for x86 are treated
as scheduling barriers. And in cases where a barrier is needed (StoreLoad case),
an mfence is used.

Change-Id: I13d02bf3f152083ba9f358052aedb583b0d48640
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
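
A minimal sketch of the policy described above, using assumed names rather than the actual Mir2Lir interface: only the StoreLoad case needs a real instruction on x86; every other barrier kind acts purely as a scheduling barrier.

    // Assumed names for illustration only; not the actual Mir2Lir interface.
    enum MemBarrierKind { kLoadLoad, kLoadStore, kStoreStore, kStoreLoad };

    // Returns true if an actual instruction was generated.
    bool GenMemBarrier(MemBarrierKind kind) {
      if (kind == kStoreLoad) {
        // x86 may reorder a later load ahead of an earlier store, so this is
        // the one case that needs an mfence.
        // NewLIR0(kX86Mfence);  // placeholder for the real LIR emission
        return true;
      }
      // Other orderings are already guaranteed by the x86 memory model; the
      // barrier only needs to constrain the compiler's instruction scheduler.
      return false;
    }

    int main() {
      return GenMemBarrier(kStoreLoad) ? 0 : 1;
    }
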
60d7a65f7fb60f502160a2e479e86014c7787553 14-Mar-2014 Brian Carlstrom <bdc@google.com> Fix stack overflow for mutual recursion.

There was an error where we would have a pc that was in the method
which generated the stack overflow. This didn't work, however,
because the stack overflow check was before we stored the method on
the stack. The result was that the stack overflow handler had a PC
which wasn't necessarily in the method at the top of the stack. This
is now fixed by always restoring the link register before branching
to the throw entrypoint.

Slight code size regression on ARM/Mips (unmeasured). Regression on ARM
is 4 bytes of code per stack overflow check. Some of this regression is
mitigated by having one less GC safepoint.

Also adds test case for StackOverflowError issue (from bdc).

Tests passing: ARM, X86, Mips
Phone booting: ARM

Bug: https://code.google.com/p/android/issues/detail?id=66411
Bug: 12967914
Change-Id: I96fe667799458b58d1f86671e051968f7be78d5d

(cherry-picked from c0f96d03a1855fda7d94332331b94860404874dd)
c0f96d03a1855fda7d94332331b94860404874dd 14-Mar-2014 Brian Carlstrom <bdc@google.com> Fix stack overflow for mutual recursion.

There was an error where we would have a pc that was in the method
which generated the stack overflow. This didn't work, however,
because the stack overflow check was before we stored the method on
the stack. The result was that the stack overflow handler had a PC
which wasn't necessarily in the method at the top of the stack. This
is now fixed by always restoring the link register before branching
to the throw entrypoint.

Slight code size regression on ARM/Mips (unmeasured). Regression on ARM
is 4 bytes of code per stack overflow check. Some of this regression is
mitigated by having one less GC safepoint.

Also adds test case for StackOverflowError issue (from bdc).

Tests passing: ARM, X86, Mips
Phone booting: ARM

Bug: https://code.google.com/p/android/issues/detail?id=66411
Bug: 12967914
Change-Id: I96fe667799458b58d1f86671e051968f7be78d5d
e90501da0222717d75c126ebf89569db3976927e 12-Mar-2014 Serguei Katkov <serguei.i.katkov@intel.com> Add dependency for operations with x86 FPU stack

The load hoisting optimization can re-order operations on the x86
FPU stack because no dependency is set between them.

The patch adds a resource dependency between these operations.

Change-Id: Iccce98c8f3c565903667c03803884d9de1281ea8
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
4028a6c83a339036864999fdfd2855b012a9f1a7 20-Feb-2014 Mark Mendell <mark.p.mendell@intel.com> Inline x86 String.indexOf

Take advantage of the presence of a constant search char or start index
to tune the generated code.

Change-Id: I0adcf184fb91b899a95aa4d8ef044a14deb51d88
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
614c2b4e219631e8c190fd9fd5d4d9cd343434e1 29-Jan-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Support to generate inline long to FP bytecodes for x86

long-to-float and long-to-double are now generated inline instead of calling
a helper routine. The conversion is done by using x87.

Change-Id: I196e526afec1be212898baceca8527549c3655b6
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
55d0eac918321e0525f6e6491f36a80977e0d416 06-Feb-2014 Mark Mendell <mark.p.mendell@intel.com> Support Direct Method/Type access for X86

Thumb generates code to optimize calls to methods within core.oat.
Implement this for X86 as well, but take advantage of mov with 32 bit
immediate and call relative with 32 bit immediate.

Fix some incorrect return locations for long inlines.

Change-Id: I1907bdfc7574f3d0aa76c7fad13dc537acdf1ed3
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
2c498d1f28e62e81fbdb477ff93ca7454e7493d7 30-Jan-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Specializing x86 range argument copying

The ARM implementation of range argument copying was specialized in some cases.
For all other architectures, it would fall back to generating memcpy. This patch
updates the x86 implementation so it does not call memcpy and instead generates
loads and stores, favoring movement of 128-bit chunks.

Change-Id: Ic891e5609a4b0e81a47c29cc5a9b301bd10a1933
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
67c39c4aefca23cb136157b889c09ee200b3dec6 01-Feb-2014 Mark Mendell <mark.p.mendell@intel.com> Support Literal pools for x86

They are being used to store double constants, which are very
expensive to generate into XMM registers. Uses the 'Compiler
Temporary' support just added. The MIR instructions are scanned for
a reference to a double constant, a packed switch or a FillArray.
These all need the address of the start of the method, since 32
bit x86 doesn't have a PC-relative addressing mode.

If needed, a compiler temporary is allocated, and the address of
the base of the method is calculated, and stored. Later uses can
just refer to the saved value.

Trickiness comes when generating the load from the literal area,
as the offset is unknown before final assembler. Assume a 32 bit
displacement is needed, and fix this if it wasn't necessary.

Use LoadValue to load the 'base of method' pointer. Fix an incorrect
test in GetRegLocation.

Change-Id: I53ffaa725dabc370e9820c4e0e78664ede3563e6
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
feb2b4e2d1c6538777bb80b60f3a247537b6221d 28-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Redo x86 int arithmetic

Make Mir2Lir::GenArithOpInt virtual, and implement an x86 version of it
to allow use of memory operands and knowledge of the fact that x86 has
(mostly) two operand instructions. Remove x86 specific code from the
generic version.

Add StoreFinalValue (matches StoreFinalValueWide) to handle the non-wide
cases. Add some x86 helper routines to simplify generation.

Change-Id: I6c13689c6da981f2570ab5af7a97f9816108b7ae
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
d3266bcc340d653e178e3ab9d74512c8db122eee 24-Jan-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Reduce x86 sequence for GP pair to XMM

Added support for punpckldq which is useful for interleaving
32-bit values from two xmm registers.

This new instruction is now used for transfers from GP pairs
to XMM in order to reduce path length.

Change-Id: I70d9b69449dfcfb9a94a628deb74a7cffe96bac7
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
4708dcd68eebf1173aef1097dad8ab13466059aa 22-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Improve x86 long multiply and shifts

Generate inline code for long shifts by constants and do long
multiplication inline. Convert multiplication by a constant to a
shift when we can. Fix some x86 assembler problems and add the new
instructions that were needed (64 bit shifts).

Change-Id: I6237a31c36159096e399d40d01eb6bfa22ac2772
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
2bf31e67694da24a19fc1f328285cebb1a4b9964 23-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Improve x86 long divide

Implement inline division for literal and variable divisors. Use the
general case for dividing by a literal by using a double length multiply
by the appropriate constant with fixups. This is the Hacker's Delight
algorithm.

Change-Id: I563c250f99d89fca5ff8bcbf13de74de13815cfe
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
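
A worked example of the "double-length multiply with fixups" technique mentioned above, using the textbook Hacker's Delight constants for divisor 7 (these are not values taken from the ART tables):

    #include <cstdint>
    #include <cstdio>

    // Signed 32-bit division by 7 via multiply-high plus fixups.
    int32_t DivideBy7(int32_t n) {
      const int32_t kMagic = static_cast<int32_t>(0x92492493u);  // magic for d = 7
      int64_t prod = static_cast<int64_t>(n) * kMagic;
      int32_t hi = static_cast<int32_t>(prod >> 32);  // high half of the product
      hi += n;              // fixup: the magic constant is negative, so add n back
      int32_t q = hi >> 2;  // arithmetic shift by the precomputed amount for d = 7
      q += static_cast<int32_t>(static_cast<uint32_t>(n) >> 31);  // +1 if n < 0
      return q;
    }

    int main() {
      printf("%d %d %d\n", DivideBy7(100), DivideBy7(-100), DivideBy7(7));  // 14 -14 1
      return 0;
    }
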
bd288c2c1206bc99fafebfb9120a83f13cf9723b 21-Dec-2013 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Add conditional move support to x86 and allow GenMinMax to use it

X86 supports conditional moves, which are useful for reducing branchiness.
This patch adds support to the x86 backend to generate conditional reg
to reg operations. Both encoder and decoder support was added for cmov.

The x86 version of GenMinMax used for generating inlined version Math.min/max
has been updated to make use of the conditional move support.

Change-Id: I92c5428e40aa8ff88bd3071619957ac3130efae7
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
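
A C-level analogue of what the cmov support enables (an illustration, not the Mir2Lir code itself): a branchless min that compilers typically lower to a cmp followed by a cmov rather than a conditional jump.

    #include <cstdio>

    // Branchless min; typically compiled on x86 as:
    //   mov eax, a
    //   cmp eax, b
    //   cmovg eax, b   ; keep the smaller value, no jump
    int MinNoBranch(int a, int b) {
      int result = a;
      if (a > b) {
        result = b;
      }
      return result;
    }

    int main() {
      printf("%d %d\n", MinNoBranch(3, 9), MinNoBranch(9, 3));  // 3 3
      return 0;
    }
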
343adb52d3f031b6b5e005ff51f9cb04df219b21 18-Dec-2013 Mark Mendell <mark.p.mendell@intel.com> Enhance GenArrayGet, GenArrayPut for x86

As pointed out by Ian Rogers, the x86 versions didn't optimize
handling of constant index expressions. Added that support,
simplified checking of constant indices, and removed the use of
a temporary register for the 'wide' cases by using x86 scaled
addressing mode.

Change-Id: I82174e4e3674752d00d7c4730496f59d69f5f173
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
12f96283471dea664d26c185b2185445cdc49a46 16-Dec-2013 Vladimir Marko <vmarko@google.com> Fix minor style issues

Follow-up to I082aa20041c933ae5fc78f12ddf491d1c775c683.

Change-Id: Ia334b192bdba231b0b9a2b2f2d7d18fcff2ca836
bff1ef0746048978b877c0664f758d2d6006f27d 13-Dec-2013 Mark Mendell <mark.p.mendell@intel.com> Implement GenInlinedSqrt for x86

Implemented this using the hardware instruction, which handles
NaN properly.

Tested manually using host mode.

Change-Id: I082aa20041c933ae5fc78f12ddf491d1c775c683
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
a6fd8ba27bc84dfb942a8fa4ea987bcb39f0f3f1 13-Dec-2013 Vladimir Marko <vmarko@google.com> Fix 64-bit CAS for x86.

Bug: 12117970
Change-Id: I9fbba2291124a2594161782c89dc62201cf01c08
70b797d998f2a28e39f7d6ffc8a07c9cbc47da14 03-Dec-2013 Vladimir Marko <vmarko@google.com> Unsafe.compareAndSwapLong() intrinsic for x86.

Change-Id: Idbc5371a62dfdd84485a657d4548990519200205
057c74a3a2d50d1247d4e6472763ca6f59060762 03-Dec-2013 Vladimir Marko <vmarko@google.com> Add support for emitting x86 kArray instructions.

And factor out a lot of common code.

Change-Id: Ib1f135e341404f8a6f92fcef0047ec04577d32cd
c29bb614c60e0eb9a2bacf90f6dfce796344021e 27-Nov-2013 Vladimir Marko <vmarko@google.com> Unsafe.compareAndSwapInt()/Object() intrinsics for x86.

Bug: 11391018
Change-Id: I0a97375103917b0e9e20f199304c17a7f849c361
1da1e2fceb0030b4b76b43510b1710a9613e0c2e 15-Nov-2013 buzbee <buzbee@google.com> More compile-time tuning

Another round of compile-time tuning, this time yielding in the
vicinity of 3% total reduction in compile time (which means about
double that for the Quick Compile portion).

Primary improvements are skipping the basic block combine optimization
pass when using Quick (because we already have big blocks), combining
the null check elimination and type inference passes, and limiting
expensive local value number analysis to only those blocks which
might benefit from it.

Following this CL, the actual compile phase consumes roughly 60%
of the total dex2oat time on the host, and 55% on the target (Note,
I'm subtracting out the Deduping time here, which the timing logger
normally counts against the compiler).

A sample breakdown of the compilation time follows (this taken on
PlusOne.apk w/ a Nexus 4):

39.00% -> MIR2LIR: 1374.90 (Note: includes local optimization & scheduling)
10.25% -> MIROpt:SSATransform: 361.31
8.45% -> BuildMIRGraph: 297.80
7.55% -> Assemble: 266.16
6.87% -> MIROpt:NCE_TypeInference: 242.22
5.56% -> Dedupe: 196.15
3.45% -> MIROpt:BBOpt: 121.53
3.20% -> RegisterAllocation: 112.69
3.00% -> PcMappingTable: 105.65
2.90% -> GcMap: 102.22
2.68% -> Launchpads: 94.50
1.16% -> MIROpt:InitRegLoc: 40.94
1.16% -> Cleanup: 40.93
1.10% -> MIROpt:CodeLayout: 38.80
0.97% -> MIROpt:ConstantProp: 34.35
0.96% -> MIROpt:UseCount: 33.75
0.86% -> MIROpt:CheckFilters: 30.28
0.44% -> SpecialMIR2LIR: 15.53
0.44% -> MIROpt:BBCombine: 15.41

(cherry pick of 9e8e234af4430abe8d144414e272cd72d215b5f3)

Change-Id: I86c665fa7e88b75eb75629a99fd292ff8c449969
a8b4caf7526b6b66a8ae0826bd52c39c66e3c714 24-Oct-2013 Vladimir Marko <vmarko@google.com> Add byte swap instructions for ARM and x86.

Change-Id: I03fdd61ffc811ae521141f532b3e04dda566c77d
17088bbded68e35da8050a40206dfd3cbba9e6d2 28-Oct-2013 Vladimir Marko <vmarko@google.com> Fix invalid DCHECK for movzx/movsx.

kX86Movzx8RM and kX86Movsx8RM don't have to use eax/ecx/edx/ebx.
The incorrect check could fail for LoadBaseDisp() with
kUnsignedByte or kSignedByte.

Change-Id: I777f14cf372c7b211ad8c595d4a8a47533bdd0fc
a61f49539a59b610e557b5513695295639496750 23-Aug-2013 buzbee <buzbee@google.com> Add timing logger to Quick compiler

Current Quick compiler breakdown for compiling the boot class path:

MIR2LIR: 29.674%
MIROpt:SSATransform: 17.656%
MIROpt:BBOpt: 11.508%
BuildMIRGraph: 7.815%
Assemble: 6.898%
MIROpt:ConstantProp: 5.151%
Cleanup: 4.916%
MIROpt:NullCheckElimination: 4.085%
RegisterAllocation: 3.972%
GcMap: 2.359%
Launchpads: 2.147%
PcMappingTable: 2.145%
MIROpt:CodeLayout: 0.697%
LiteralData: 0.654%
SpecialMIR2LIR: 0.323%

Change-Id: I9f77e825faf79e6f6b214bb42edcc4b36f55d291
e6ed00ba91da535fbe1d0b5a5705e99da149d82e 24-Oct-2013 Vladimir Marko <vmarko@google.com> Fix x86 code generation for 0x0F 0x3A 0x?? instructions.

Change-Id: I9b2b2190787d1e5674818159aa96e513d6325b54
0d82948094d9a198e01aa95f64012bdedd5b6fc9 12-Oct-2013 buzbee <buzbee@google.com> 64-bit prep

Preparation for 64-bit roll.
o Eliminated storing pointers in 32-bit int slots in LIR.
o General size reductions of common structures to reduce impact
of doubled pointer sizes:
- BasicBlock struct was 72 bytes, now is 48.
- MIR struct was 72 bytes, now is 64.
- RegLocation was 12 bytes, now is 8.
o Generally replaced uses of BasicBlock* pointers with 16-bit Ids.
o Replaced several doubly-linked lists with singly-linked to save
one stored pointer per node.
o We had quite a few uses of uintptr_t's that were a holdover from
the JIT (which used pointers to mapped dex & actual code cache
addresses rather than trace-relative offsets). Replaced those with
uint32_t's.
o Clean up handling of embedded data for switch tables and array data.
o Miscellaneous cleanup.

I anticipate one or two additional CLs to reduce the size of MIR and LIR
structs.

Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
409fe94ad529d9334587be80b9f6a3d166805508 11-Oct-2013 buzbee <buzbee@google.com> Quick assembler fix

This CL re-instates the select pattern optimization disabled by
CL 374310, and fixes the underlying problem: improper handling of
the kPseudoBarrier LIR opcode. The bug was introduced in the
recent assembler restructuring. In short, LIR pseudo opcodes (which
have values < 0), should always have size 0 - and thus cause no
bits to be emitted during assembly. In this case, bad logic caused
us to set the size of a kPseudoBarrier opcode via lookup through the
EncodingMap.

Because all pseudo ops are < 0, this meant we did an array underflow
load, picking up whatever garbage was located before the EncodingMap.
This explains why this error showed up recently - we'd previously just
gotten a lucky layout.

This CL corrects the faulty logic, and adds DCHECKs to uses of
the EncodingMap to ensure that we don't try to access w/ a
pseudo op. Additionally, the existing is_pseudo_op() macro is
replaced with IsPseudoLirOp(), named similar to the existing
IsPseudoMirOp().

Change-Id: I46761a0275a923d85b545664cadf052e1ab120dc
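
A minimal sketch of the guard this entry describes, with assumed names and a stand-in two-entry table instead of the real assemble_x86.cc EncodingMap: pseudo opcodes are negative, contribute zero bytes, and must never be used to index the table.

    #include <cassert>

    struct X86EncodingEntry { int size; /* opcode bytes, flags, ... */ };
    static const X86EncodingEntry EncodingMap[] = { {1}, {2} };  // real opcodes only

    constexpr bool IsPseudoLirOp(int opcode) {
      return opcode < 0;  // every LIR pseudo opcode has a negative value
    }

    int GetInsnSize(int opcode) {
      if (IsPseudoLirOp(opcode)) {
        return 0;  // pseudo ops emit no bits; indexing EncodingMap would underflow
      }
      assert(!IsPseudoLirOp(opcode));  // DCHECK-style guard before the table lookup
      return EncodingMap[opcode].size;
    }

    int main() {
      return GetInsnSize(-1) + GetInsnSize(0) - 1;  // 0 + 1 - 1 == 0
    }
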
b48819db07f9a0992a72173380c24249d7fc648a 15-Sep-2013 buzbee <buzbee@google.com> Compile-time tuning: assembly phase

Not as much compile-time gain from reworking the assembly phase as I'd
hoped, but still worthwhile. Should see ~2% improvement thanks to
the assembly rework. On the other hand, expect some huge gains for some
application thanks to better detection of large machine-generated init
methods. Thinkfree shows a 25% improvement.

The major assembly change was to establish thread the LIR nodes that
require fixup into a fixup chain. Only those are processed during the
final assembly pass(es). This doesn't help for methods which only
require a single pass to assemble, but does speed up the larger methods
which required multiple assembly passes.

Also replaced the block_map_ basic block lookup table (which contained
space for a BasicBlock* for each dex instruction unit) with a block id
map - cutting its space requirements by half in a 32-bit pointer
environment.

Changes:
o Reduce size of LIR struct by 12.5% (one of the big memory users)
o Repurpose the use/def portion of the LIR after optimization complete.
o Encode instruction bits to LIR
o Thread LIR nodes requiring pc fixup
o Change follow-on assembly passes to only consider fixup LIRs
o Switch on pc-rel fixup kind
o Fast-path for small methods - single pass assembly
o Avoid using cb[n]z for null checks (almost always exceed displacement)
o Improve detection of large initialization methods.
o Rework def/use flag setup.
o Remove a sequential search from FindBlock using lookup table of 16-bit
block ids rather than full block pointers.
o Eliminate pcRelFixup and use fixup kind instead.
o Add check for 16-bit overflow on dex offset.

Change-Id: I4c6615f83fed46f84629ad6cfe4237205a9562b4
252254b130067cd7a5071865e793966871ae0246 09-Sep-2013 buzbee <buzbee@google.com> More Quick compile-time tuning: labels & branches

This CL represents a roughly 3.5% performance improvement for the
compile phase of dex2oat. Most of the gain comes from avoiding
the generation of dex boundary LIR labels unless a debug listing
is requested. The other significant change is moving from a basic block
ending branch model of "always generate a fall-through branch, and then
delete it if we can" to an "only generate a fall-through branch if we need
it" model.

The data motivating these changes follow. Note that two areas of
potentially attractive gain remain: restructuring the assembler model and
reworking the register handling utilities. These will be addressed
in subsequent CLs.

--- data follows

The Quick compiler's assembler has shown up on profile reports a bit
more than seems reasonable. We've tried a few quick fixes to apparently
hot portions of the code, but without much gain. So, I've been looking at
the assembly process at a somewhat higher level. There look to be several
potentially good opportunities.

First, an analysis of the makeup of the LIR graph showed a surprisingly
high proportion of LIR pseudo ops. Using the boot classpath as a basis,
we get:

32.8% of all LIR nodes are pseudo ops.
10.4% are LIR instructions which require pc-relative fixups.
11.8% are LIR instructions that have been nop'd by the various
optimization passes.

Looking only at the LIR pseudo ops, we get:
kPseudoDalvikByteCodeBoundary 43.46%
kPseudoNormalBlockLabel 21.14%
kPseudoSafepointPC 20.20%
kPseudoThrowTarget 6.94%
kPseudoTarget 3.03%
kPseudoSuspendTarget 1.95%
kPseudoMethodExit 1.26%
kPseudoMethodEntry 1.26%
kPseudoExportedPC 0.37%
kPseudoCaseLabel 0.30%
kPseudoBarrier 0.07%
kPseudoIntrinsicRetry 0.02%
Total LIR count: 10167292

The standout here is the Dalvik opcode boundary marker. This is just a
label inserted at the beginning of the codegen for each Dalvik bytecode.
If we're also doing a verbose listing, this is also where we hang the
pretty-print disassembly string. However, this label was also
being used as a convenient way to find the target of switch case
statements (and, I think at one point was used in the Mir->GBC conversion
process).

This CL moves the use of kPseudoDalvikByteCodeBoundary labels to only
verbose listing runs, and replaces the codegen uses of the label with
the kPseudoNormalBlockLabel attached to the basic block that contains the
switch case target. Great savings here - 14.3% reduction in the number of
LIR nodes needed. After this CL, our LIR pseudo proportions drop to 21.6%
of all LIR. That's still a lot, but much better. Possible further
improvements via combining normal labels with kPseudoSafepointPC labels
where appropriate, and also perhaps reduce memory usage by using a
short-hand form for labels rather than a full LIR node. Also, many
of the basic block labels are no longer branch targets by the time
we get to assembly - cheaper to delete, or just ignore?

Here's the "after" LIR pseudo op breakdown:

kPseudoNormalBlockLabel 37.39%
kPseudoSafepointPC 35.72%
kPseudoThrowTarget 12.28%
kPseudoTarget 5.36%
kPseudoSuspendTarget 3.45%
kPseudoMethodEntry 2.24%
kPseudoMethodExit 2.22%
kPseudoExportedPC 0.65%
kPseudoCaseLabel 0.53%
kPseudoBarrier 0.12%
kPseudoIntrinsicRetry 0.04%
Total LIR count: 5748232

Not done in this CL, but it will be worth experimenting with actually
deleting LIR nodes from the graph when they are optimized away, rather
than just setting the NOP bit. Keeping them around is invaluable
during debugging - but when not debugging it may pay off if the cost of
node removal is less than the cost of traversing through dead nodes
in subsequent passes.

Next up (and partially in this CL - but mostly to be done in follow-on
CLs) is the overall assembly process. Inherited from the trace JIT,
the Quick compiler has a fairly simple-minded approach to instruction
assembly. First, a pass is made over the LIR list to assign offsets
to each instruction. Then, the assembly pass is made - which generates
the actual machine instruction bit patterns and pushes the instruction
data into the code_buffer. However, the code generator takes the "always
optimistic" approach to instruction selection and emits the shortest
instruction. If, during assembly, we find that a branch or load doesn't
reach, that short-form instruction is replaces with a longer sequence.

Of course, this invalidates the previously-computed offset calculations.
Assembly thus is an iterative process: compute offsets and then assemble
until we survive an assembly pass without invalidation. This seems
like a likely candidate for improvement. First, I analyzed the
number of retries required, and the reason for invalidation over the
boot classpath load.

The results: more than half of methods don't require a retry, and
very few require more than 1 extra pass:

5 or more: 6 of 96334
4 or more: 22 of 96334
3 or more: 140 of 96334
2 or more: 1794 of 96334 - 2%
1 or more: 40911 of 96334 - 40%
0 retries: 55423 of 96334 - 58%

The interesting group here is the one that requires 1 retry. Looking
at the reason, we see three typical reasons:

1. A cbnz/cbz doesn't reach (only 7 bits of offset)
2. A 16-bit Thumb1 unconditional branch doesn't reach.
3. An unconditional branch which branches to the next instruction
is encountered, and deleted.

The first 2 cases are the cost of the optimistic strategy - nothing
much to change there. However, the interesting case is #3 - dead
branch elimination. A further analysis of the single retry group showed
that 42% of the methods (16305) that required a single retry did so
*only* because of dead branch elimination. The big question here is
why so many dead branches survive to the assembly stage. We have
a dead branch elimination pass which is supposed to catch these - perhaps
it's not working correctly, should be moved later in the optimization
process, or perhaps run multiple times.

Other things to consider:

o Combine the offset generation pass with the assembly pass. Skip
pc-relative fixup assembly (other than assigning offset), but push
LIR* for them into work list. Following the main pass, zip through
the work list and assemble the pc-relative instructions (now that we
know the offsets). This would significantly cut back on traversal
costs.

o Store the assembled bits into both the code buffer and the LIR.
In the event we have to retry, only the pc-relative instructions
would need to be assembled, and we'd finish with a pass over the
LIR just to dump the bits into the code buffer.

Change-Id: I50029d216fa14f273f02b6f1c8b6a0dde5a7d6a6
7934ac288acfb2552bb0b06ec1f61e5820d924a4 26-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/comments issues

Change-Id: Iae286862c85fb8fd8901eae1204cd6d271d69496
b1eba213afaf7fa6445de863ddc9680ab99762ea 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/comma issues

Change-Id: I456fc8d80371d6dfc07e6d109b7f478c25602b65
7940e44f4517de5e2634a7e07d58d0fb26160513 12-Jul-2013 Brian Carlstrom <bdc@google.com> Create separate Android.mk for main build targets

The runtime, compiler, dex2oat, and oatdump are now in separate trees
to prevent dependency creep. They can now be individually built
without rebuilding the rest of the art projects. dalvikvm and jdwpspy
were already this way. Builds in the art directory should behave as
before, building everything including tests.

Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81