History log of /art/compiler/dex/quick/x86/assemble_x86.cc
Revision Date Author Comments
41b175aba41c9365a1c53b8a1afbd17129c87c14 19-May-2015 Vladimir Marko <vmarko@google.com> ART: Clean up arm64 kNumberOfXRegisters usage.

Avoid undefined behavior for arm64 stemming from 1u << 32 in
loops with upper bound kNumberOfXRegisters.

Create iterators for enumerating bits in an integer either
from high to low or from low to high and use them for
<arch>Context::FillCalleeSaves() on all architectures.

Refactor runtime/utils.{h,cc} by moving all bit-fiddling
functions to runtime/base/bit_utils.{h,cc} (together with
the new bit iterators) and all time-related functions to
runtime/base/time_utils.{h,cc}. Improve test coverage and
fix some corner cases for the bit-fiddling functions.

Bug: 13925192

(cherry picked from commit 80afd02024d20e60b197d3adfbb43cc303cf29e0)

Change-Id: I905257a21de90b5860ebe1e39563758f721eab82
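
A minimal sketch of the low-to-high bit enumeration described above (the class name and interface are assumptions for illustration, not the actual runtime/base/bit_utils.h API; it relies on the GCC/Clang __builtin_ctz builtin):

    #include <cstdint>
    #include <cstdio>

    // Hypothetical low-to-high bit enumerator; not the real ART class.
    class LowToHighBits {
     public:
      explicit LowToHighBits(uint32_t bits) : bits_(bits) {}
      bool HasNext() const { return bits_ != 0u; }
      int Next() {
        int bit = __builtin_ctz(bits_);  // index of the lowest set bit
        bits_ &= bits_ - 1u;             // clear that bit
        return bit;
      }
     private:
      uint32_t bits_;
    };

    int main() {
      // Walk the set bits of an example core spill mask from low to high.
      for (LowToHighBits it(0x4ff0u); it.HasNext();) {
        printf("r%d ", it.Next());
      }
      printf("\n");
      return 0;
    }
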
dd17bc3806e800d3b82d5cb27e85ccc1c4e2ee1d 27-Apr-2015 nikolay serdjuk <nikolay.y.serdjuk@intel.com> Fix for incorrect encode and parse of PEXTRW instruction

The PEXTRW instruction, encoded by the sequence 66 0F 3A 15,
was incorrectly encoded in the compiler table and incorrectly
parsed by the disassembler.

Signed-off-by: nikolay serdjuk <nikolay.y.serdjuk@intel.com>

(cherry picked from commit e0705f51fdc71e9670a29f8c3a47168f50724b35)

Change-Id: I7f051e23789aa3745d6eb854c97f80c475748b74
e0705f51fdc71e9670a29f8c3a47168f50724b35 27-Apr-2015 nikolay serdjuk <nikolay.y.serdjuk@intel.com> Fix for incorrect encode and parse of PEXTRW instruction

The PEXTRW instruction, encoded by the sequence 66 0F 3A 15,
was incorrectly encoded in the compiler table and incorrectly
parsed by the disassembler.

Change-Id: Ib4d4db923cb15a76e74f13f6b5514cb0d1cbe164
Signed-off-by: nikolay serdjuk <nikolay.y.serdjuk@intel.com>
c4013ea00d9e63533f3badeed0131bb2eb859c90 22-Apr-2015 Chao-ying Fu <chao-ying.fu@intel.com> ART: Fix addpd opcode, add Quick x86 assembler test

This patch fixes the addpd opcode that may be used by vectorizations,
and adds an assembler test for the Quick x86 assembler, currently
lightly testing addpd, subpd and mulpd.

Change-Id: I29455a86212829c75fd75737679280f167da7b5b
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
2cebb24bfc3247d3e9be138a3350106737455918 22-Apr-2015 Mathieu Chartier <mathieuc@google.com> Replace NULL with nullptr

Also fixed some lines that were too long, and a few other minor
details.

Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb
1961b609bfefaedb71cee3651c4f931cc3e7393d 08-Apr-2015 Vladimir Marko <vmarko@google.com> Quick: PC-relative loads from dex cache arrays on x86.

Rewrite all PC-relative addressing on x86 and implement
PC-relative loads from dex cache arrays. Don't adjust the
base to point to the start of the method, let it point to
the anchor, i.e. the target of the "call +0" insn.

Change-Id: Ic22544a8bc0c5e49eb00a75154dc8f3ead816989
f6737f7ed741b15cfd60c2530dab69f897540735 23-Mar-2015 Vladimir Marko <vmarko@google.com> Quick: Clean up Mir2Lir codegen.

Clean up WrapPointer()/UnwrapPointer() and OpPcRelLoad().

Change-Id: I1a91f01e1e779599c77f3f6efcac2a6ad34629cf
0b9203e7996ee1856f620f95d95d8a273c43a3df 23-Jan-2015 Andreas Gampe <agampe@google.com> ART: Some Quick cleanup

Make several fields const in CompilationUnit. May benefit some Mir2Lir
code that repeats tests, and in general immutability is good.

Remove compiler_internals.h and refactor some other headers to reduce
overly broad imports (and thus forced recompiles on changes).

Change-Id: I898405907c68923581373b5981d8a85d2e5d185a
27dee8bcd7b4a53840b60818da8d2c819ef199bd 02-Dec-2014 Mark Mendell <mark.p.mendell@intel.com> X86_64 QBE: use RIP addressing

Take advantage of RIP addressing in 64 bit mode to improve the code
generation for accesses to the constant area as well as packed switches.
Avoid computing the address of the start of the method, which is needed
in 32 bit mode.

To do this, we add a new 'pseudo-register' kRIPReg to minimize the
changes needed to get the new addressing mode to be generated.

Change-Id: Ia28c93f98b09939806d91ff0bd7392e58996d108
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
f18b92f1ae57e6eba6b18bee4c34ddbbd8bda74d 14-Nov-2014 Dmitry Petrochenko <dmitry.petrochenko@intel.com> LSRA: Fix X86 shuffle flags

The shuffle opcodes for X86 have incorrect flags. Fix them.

Clean up a couple of the printable string names too to remove an extra
"kX86".

Change-Id: I52a0ebdb1334cf0904bc2399eaf28b7cda041112
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
b28c1c06236751aa5c9e64dcb68b3c940341e496 08-Nov-2014 Ian Rogers <irogers@google.com> Tidy RegStorage for X86.

Don't use global variables initialized in constructors to hold onto constant
values; instead, use the TargetReg32 helper. Improve this helper with the use
of lookup tables. Elsewhere prefer to use constexpr values as they will have
less runtime cost.
Add an ostream operator to RegStorage for CHECK_EQ and use.

Change-Id: Ib8d092d46c10dac5909ecdff3cc1e18b7e9b1633
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f 31-Oct-2014 Ian Rogers <irogers@google.com> Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags.

Fix associated errors about unused parameters and implicit sign conversions.
For sign conversion this was largely in the area of enums, so add ostream
operators for the affected enums and fix tools/generate-operator-out.py.
Tidy arena allocation code and arena allocated data types, rather than fixing
new and delete operators.
Remove dead code.

Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
5f70c79c81f171ff0aa126d58bfbe98772ab7e33 29-Oct-2014 Mark Mendell <mark.p.mendell@intel.com> X86 QBE: Mark kX86StartOfMethod as defining reg 0

kX86StartOfMethod should be marked as writing to register 0, as that is
the destination register for the instruction.

Change-Id: I99cf24afa4c11d9ccdd4295f3481351e9eb4ee1f
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
07140838a3ee44a6056cacdc78f2930e019107da 01-Oct-2014 Ian Rogers <irogers@google.com> Enable -Wunreachable-code

Caught bugs in DeoptimizeStackVisitor and assemble_x86 SIB encoding.
Add UNREACHABLE macro to document code expected to be unreachable.
Bug: 17731047

Change-Id: I2e363fe5b38a1246354d98be18c902a6031c0b9e
ae9f3e6ef9f97f47416f829448e5281e9a57d8b8 23-Sep-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> ART: Fix movnti assembler

Movnti was receiving rex prefix before its opcode. Additionally,
the 64-bit version was missing the rex.w prefix.

Change-Id: Ie5c3bbe109765a0b990cafeeea1ee30329daabd0
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
a870bc5495b20a830ebd8342b49ef148bbff72dd 09-Sep-2014 Haitao Feng <haitao.feng@intel.com> ART: Address three issues with x86 assembler before enabling load store elimination.

1) Remove the IS_LOAD attribute from LEA instructions.
2) Change the attribute of fp stack instructions from IS_UNARY_OP to IS_BINARY_OP as
operands[1] will be used to compute GetInstructionOffset.
3) Add IS_MOVE attribute for general register move instructions.

Change-Id: I7054df47956f2acecf579ff7acfde385fd8ac194
Signed-off-by: Haitao Feng <haitao.feng@intel.com>
b3a84e2f308b3ed7d17b8e96fc7adfcac36ebe77 28-Jul-2014 Lupusoru, Razvan A <razvan.a.lupusoru@intel.com> ART: Vectorization opcode implementation fixes

This patch fixes the implementation of the x86 vectorization opcodes.

Change-Id: I0028d54a9fa6edce791b7e3a053002d076798748
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
Signed-off-by: Philbert Lin <philbert.lin@intel.com>
b5bce7cc9f1130ab4932ba8e6917c362bf871f24 25-Jul-2014 Jean Christophe Beyler <jean.christophe.beyler@intel.com> ART: Add non-temporal store support

Added non-temporal store support as a hint from the ME.
Added the implementation of the memory barrier
extended instruction that supports non-temporal stores
by explicitly serializing all previous store-to-memory instructions.

Change-Id: I8205a92083f9725253d8ce893671a133a0b6849d
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
f40f890ae3acd7b3275355ec90e2814bba8d4fd6 14-Aug-2014 Yixin Shou <yixin.shou@intel.com> Implement inlined shift long for 32bit

Added support for x86 inlined shift long for 32bit

Change-Id: I6caef60dd7d80227c3057fd6f64b0ecb11025afa
Signed-off-by: Yixin Shou <yixin.shou@intel.com>
e70f179aca4f13b15be8a47a4d9e5b6c2422c69a 09-Aug-2014 Haitao Feng <haitao.feng@intel.com> ART: Fix two small DumpLIRInsn issues for x86_64 port.

Change-Id: I81ef32380bfc73d6c2bfc37a7f4903d912a5d9c8
Signed-off-by: Haitao Feng <haitao.feng@intel.com>
cf8184164650d7686b9f685850463f5976bc3251 24-Jul-2014 Chao-ying Fu <chao-ying.fu@intel.com> x86_64: Fix Test32RM

This patch fixes Test32RM use flags and the format.

Change-Id: I486cb7f27e65caeefccbd3bbcc38257ddca033c8
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
2bc477043b6ab2d7b4719ba8debf0a6a5b10c225 31-Jul-2014 Mark Mendell <mark.p.mendell@intel.com> Set REG0_USED on X86 Set8R instruction

Since this instruction only affects the low byte of the register, it is
preceded by an XOR to zero the upper 3 bytes. Set8R isn't marked as
using operand 0 as an input. In practice, this works for now, as the
Xor sets the CC and Set8R uses the CC (although not that of the Xor, but
of a Cmp generally).

This just marks REG0 as using the previous contents of the register, as
it is modifying only 1 byte of 4.

Change-Id: I7a69cbdb06979da5d5d2ae17fabd7c22c5a17701
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
fd0c237e7d80ad567399e760203f3cda404bf4c5 31-Jul-2014 Mark Mendell <mark.p.mendell@intel.com> X86: Assembler: Correct r8_form for some cases

Set r8_form to false for instruction formats that don't reference
registers.

Change-Id: Ib01edef4ef7f22de25a31dc4207889bff97d163d
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
35690630b82dc1dbf4a7ada37225893a550ea1e0 16-Jul-2014 Serguei Katkov <serguei.i.katkov@intel.com> x86: Fix assembler for Pextr

Pextr family instructions use r/m part of rmod byte as destination.
This should be handled appropriately.
Disassembler works correctly.

Change-Id: I89d00cb11ae792e9d28c178ba79a0bc1fa27e3c5
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
147eb41b53729ec8d5c188d1cac90964a51afb8a 11-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86""""

This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1.

Bug: 16256184
Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73

Conflicts:
compiler/dex/quick/arm64/target_arm64.cc
compiler/image_test.cc
runtime/fault_handler.cc
1222c96fafe98061cfc57d3bd115f46edb64e624 15-Jul-2014 Alexei Zavjalov <alexei.zavjalov@intel.com> ART: inline Math.Max/Min (float and double)

This implements the inlined version of Math.Max/Min intrinsics.

Change-Id: I2db8fa7603db3cdf01016ec26811a96f91b1e6ed
Signed-off-by: Alexei Zavjalov <alexei.zavjalov@intel.com>
Signed-off-by: Shou, Yixin <yixin.shou@intel.com>
69dfe51b684dd9d510dbcb63295fe180f998efde 11-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Revert "Revert "Add implicit null and stack checks for x86""""

This reverts commit 0025a86411145eb7cd4971f9234fc21c7b4aced1.

Bug: 16256184
Change-Id: Ie0760a0c293aa3b62e2885398a8c512b7a946a73
7fb36ded9cd5b1d254b63b3091f35c1e6471b90e 10-Jul-2014 Dave Allison <dallison@google.com> Revert "Revert "Add implicit null and stack checks for x86""

Fixes x86_64 cross compile issue. Removes command line options
and property to set implicit checks - this is hard coded now.

This reverts commit 3d14eb620716e92c21c4d2c2d11a95be53319791.

Change-Id: I5404473b5aaf1a9c68b7181f5952cb174d93a90d
0025a86411145eb7cd4971f9234fc21c7b4aced1 11-Jul-2014 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Revert "Add implicit null and stack checks for x86"""

Broke the build.

This reverts commit 7fb36ded9cd5b1d254b63b3091f35c1e6471b90e.

Change-Id: I9df0e7446ff0913a0e1276a558b2ccf6c8f4c949
34e826ccc80dc1cf7c4c045de6b7f8360d504ccf 29-May-2014 Dave Allison <dallison@google.com> Add implicit null and stack checks for x86

This adds compiler and runtime changes for x86
implicit checks. 32 bit only.

Both host and target are supported.
By default, on the host, the implicit checks are null pointer and
stack overflow. Suspend is implemented but not switched on.

Change-Id: I88a609e98d6bf32f283eaa4e6ec8bbf8dc1df78a
021b60f31a4443081e63591e184b5d707bba28c1 09-Jul-2014 Chao-ying Fu <chao-ying.fu@intel.com> x86_64: GenInlinedCas must use wide rl_src_offset under 64-bit targets

This patch fixes to use wide rl_src_offset for int and long types
under 64-bit targets, and fixes movzx8 and movsx8 to use r8_form
on the second register only.

Change-Id: Ib8c0756609100f9bc5c228f1eb391421416f3af6
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
3d14eb620716e92c21c4d2c2d11a95be53319791 10-Jul-2014 Dave Allison <dallison@google.com> Revert "Add implicit null and stack checks for x86"

It breaks cross compilation with x86_64.

This reverts commit 34e826ccc80dc1cf7c4c045de6b7f8360d504ccf.

Change-Id: I34ba07821fc0a022fda33a7ae21850957bbec5e7
60bfe7b3e8f00f0a8ef3f5d8716adfdf86b71f43 09-Jul-2014 Udayan Banerji <udayan.banerji@intel.com> X86 Backend support for vectorized float and byte 16x16 operations

Add support for reserving vector registers for the duration of vector loop.
Add support for 16x16 multiplication, shifts, and add reduce.

Changed the vectorization implementation to be able to use the dataflow
elements for SSA recreation and fixed a few implementation details.

Change-Id: I2f358f05f574fc4ab299d9497517b9906f234b98
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Signed-off-by: Olivier Come <olivier.come@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
94f3eb0c757d0a6a145e24ef95ef7d35c091bb01 24-Jun-2014 Serguei Katkov <serguei.i.katkov@intel.com> x86_64: Clean-up after cmp-long fix

The patch addresses the comments from the review done by Ian Rogers.
Clean-up of the assembler.

Change-Id: I9dbb350dfc6645f8a63d624b2b785233529459a9
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
01a50d660bd1fa692b132a24ec0f18f402e69de2 06-Jul-2014 Mark Mendell <mark.p.mendell@intel.com> Fix missing dependency in new X86 instruction

AX is written by this opcode. Note the dependency.

Change-Id: I25209e1fb4ceb0387269436c8b00730b1caa03bc
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
c5e4ce116e4d44bfdf162f0c949e77772d7e0654 10-Jun-2014 nikolay serdjuk <nikolay.y.serdjuk@intel.com> x86_64: Fix intrinsics

The following intrinsics have been ported:

- Abs(double/long/int/float)
- String.indexOf/charAt/compareTo/is_empty/length
- Float.floatToRawIntBits, Float.intBitsToFloat
- Double.doubleToRawLongBits, Double.longBitsToDouble
- Thread.currentThread
- Unsafe.getInt/Long/Object, Unsafe.putInt/Long/Object
- Math.sqrt, Math.max, Math.min
- Long.reverseBytes

Math.min and max for longs have been implemented for x86_64.

Commented out until good tests available:
- Memory.peekShort/Int/Long, Memory.pokeShort/Int/Long

Turned off on x86-64 as reported having problems
- Cas

Change-Id: I934bc9c90fdf953be0d3836a17b6ee4e7c98f244
5192cbb12856b12620dc346758605baaa1469ced 01-Jul-2014 Yixin Shou <yixin.shou@intel.com> Load 64 bit constant into GPR by single instruction for 64bit mode

This patch loads a 64-bit constant into a register with a single movabsq
instruction in 64-bit mode instead of the previous mov, shift, add
instruction sequence.

Change-Id: I9d013c4f6c0b5c2e43bd125f91436263c7e6028c
Signed-off-by: Yixin Shou <yixin.shou@intel.com>
dd64450b37776f68b9bfc47f8d9a88bc72c95727 01-Jul-2014 Elena Sayapina <elena.v.sayapina@intel.com> x86_64: Unify 64-bit check in x86 compiler

Update x86-specific Gen64Bit() check with the CompilationUnit target64 field
which is set using unified Is64BitInstructionSet(InstructionSet) check.

Change-Id: Ic00ac863ed19e4543d7ea878d6c6c76d0bd85ce8
Signed-off-by: Elena Sayapina <elena.v.sayapina@intel.com>
fb0fecffb31398adb6f74f58482f2c4aac95b9bf 20-Jun-2014 Olivier Come <olivier.come@intel.com> ART: Add HADDPS/HADDPD/SHUFPS/SHUFPD instruction generation

The patch adds the HADDPS, HADDPD, SHUFPS, and SHUFPD instruction generation
for X86.

Change-Id: Ida105d3e57be231a5331564c1a9bc298cf176ce6
Signed-off-by: Olivier Come <olivier.come@intel.com>
e63d9d4e42e659a365a9a06b910852ebc297f457 24-Jun-2014 Serguei Katkov <serguei.i.katkov@intel.com> x86_64: int-to-long should ensure that int in kCoreReg

It is possible that the int is in an xmm register, so the int-to-long
implementation should ensure that the src is in a core register before
using the move with sign extension, which does not support the xmm case.

Change-Id: Ibab9df7564f0f1c1f3e1f5ff67c38f1a5e3cdb69
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
1c55703526827b5fc63f5d4b8477f36574649342 23-Jun-2014 Serguei Katkov <serguei.i.katkov@intel.com> x86_64: Correct fix for cmp-long

We cannot rely on the sign of the sub instruction because
LONG_MAX - LONG_MIN = -1, so the sign would indicate that
LONG_MAX < LONG_MIN, which is incorrect.

The fix also contains small improvement for load wide constant.

Change-Id: I74df70d7c198cebff5cad8c1d5614c1d29b79a1b
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
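
A worked illustration of the overflow argument above, written as plain C++ rather than the ART codegen: subtracting LONG_MIN from LONG_MAX wraps to -1, so a sign-based compare would order the two values incorrectly.

    #include <cstdint>
    #include <cstdio>

    int main() {
      int64_t max = INT64_MAX;
      int64_t min = INT64_MIN;
      // Subtract with unsigned wrap-around, as the hardware sub does, then
      // reinterpret the result as signed.
      int64_t diff = static_cast<int64_t>(
          static_cast<uint64_t>(max) - static_cast<uint64_t>(min));
      printf("diff = %lld\n", static_cast<long long>(diff));  // prints -1
      // A compare based on the sign of this subtraction would claim max < min.
      return 0;
    }
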
bd3682eada753de52975ae2b4a712bd87dc139a6 11-Jun-2014 Alexei Zavjalov <alexei.zavjalov@intel.com> ART: Implement rem_double/rem_float for x86/x86-64

This adds inlined versions of the rem_double/rem_float bytecodes
for x86/x86-64 platforms. This patch also removes unnecessary
fmod and fmodf stubs from runtime.

Change-Id: I2311aa2adf08d6614527e0da070e3b6ce2343a20
Signed-off-by: Alexei Zavjalov <alexei.zavjalov@intel.com>
5aa6e04061ced68cca8111af1e9c19781b8a9c5d 14-Jun-2014 Ian Rogers <irogers@google.com> Tidy x86 assembler.

Use helper functions to compute when the kind has a SIB, a ModRM and RegReg
form.

Change-Id: I86a5cb944eec62451c63281265e6974cd7a08e07
7e399fd3a99ba9c9dbfafdf14f75dd318fa7d454 11-Jun-2014 Chao-ying Fu <chao-ying.fu@intel.com> x86_64: Disable all optimizations and fix bugs

This disables all optimizations and ensures that art tests still pass.

Change-Id: I43217378d6889bb04f4d064f8d53cb3ff4c20aa0
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
8dea81ca9c0201ceaa88086b927a5838a06a3e69 06-Jun-2014 Vladimir Marko <vmarko@google.com> Rewrite use/def masks to support 128 bits.

Reduce LIR memory usage by holding masks by pointers in the
LIR rather than directly and using pre-defined const masks
for the common cases, allocating very few on the arena.

Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
0f9b9c508814a62c6e21c6a06cfe4de39b5036c0 09-Jun-2014 Ian Rogers <irogers@google.com> Tidy up x86 assembler and fix byte register encoding.

Also fix reg storage int size issues.
Also fix bad use of byte registers in GenInlinedCas.

Change-Id: Id47424f36f9000e051110553e0b51816910e2fe8
ade54a2fba4ec977dc4ff019004a2ba68e9ea075 09-Jun-2014 Mark Mendell <mark.p.mendell@intel.com> X86_64: Fix core.oat compilation issues

Fix neg-long and X86Mir2Lir::GenInstanceofFinal

Change-Id: I7fbcc1a89857cc461f74b55573ac6cb7c8e64561
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
e0ccdc0dd166136cd43e5f54201179a4496d33e8 07-Jun-2014 Chao-ying Fu <chao-ying.fu@intel.com> x86_64: Add long bytecode supports (1/2)

This patch includes switch enabling and GenFillArray,
assembler changes, updates of regalloc behavior for 64-bit,
usage in basic utility operations, loading constants,
and update for memory operations.

Change-Id: I6d8aa35a75c5fd01d69c38a770c3398d0188cc8a
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
ffddfdf6fec0b9d98a692e27242eecb15af5ead2 03-Jun-2014 Tim Murray <timmurray@google.com> DO NOT MERGE

Merge ART from AOSP to lmp-preview-dev.

Change-Id: I0f578733a4b8756fd780d4a052ad69b746f687a9
a20468c004264592f309a548fc71ba62a69b8742 30-Apr-2014 Dmitry Petrochenko <dmitry.petrochenko@intel.com> x86_64: Support r8-r15, xmm8-xmm15 in assembler

Added REX support. TARGET_REX_SUPPORT should be used during the build.

Change-Id: I82b457ff5085c8192ad873923bd939fbb91022ce
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
96992e8f2eddba05dc38a15cc7d4e705e8db4022 19-May-2014 Dmitry Petrochenko <dmitry.petrochenko@intel.com> x86_64: Add 64-bit version of instructions in asm

Add the missing 64-bit versions of instructions.

Change-Id: I8151484d909dff487cb7e521494a0be249a42214
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
fe94578b63380f464c3abd5c156b7b31d068db6c 22-May-2014 Mark Mendell <mark.p.mendell@intel.com> Implement all vector instructions for X86

Add X86 code generation for the vector operations. Added support for
X86 disassembler for the new instructions.

Change-Id: I72b48f5efa3a516a16bb1dd4bdb5c9270a8db53a
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
d65c51a556e6649db4e18bd083c8fec37607a442 29-Apr-2014 Mark Mendell <mark.p.mendell@intel.com> ART: Add support for constant vector literals

Add in some vector instructions. Implement the ConstVector
instruction, which takes 4 words of data and loads it into
an XMM register.

Initially, only the ConstVector MIR opcode is implemented. Others will
be added after this one goes in.

Change-Id: I5c79bc8b7de9030ef1c213fc8b227debc47f6337
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
9ee801f5308aa3c62ae3bedae2658612762ffb91 12-May-2014 Dmitry Petrochenko <dmitry.petrochenko@intel.com> Add x86_64 code generation support

Utilizes r0..r7 in the register allocator, and implements spill/unspill of
core regs as well as operations with the stack pointer.

Change-Id: I973d5a1acb9aa735f6832df3d440185d9e896c67
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
9ed427724a18dc24f9eb2ddf39e4729bea203c2e 07-May-2014 Mark Mendell <mark.p.mendell@intel.com> X86: EmitArrayImm shouldn't truncate to 16 bits

The code in X86Mir2Lir::EmitArrayImm() always truncates the immediate
value to 16 bits. This can't be right. The code in EmitImm() will check
the expected immediate size from the entry.

Change-Id: I75b3b96e41777838b0f243d65f3f2ded2e1dbdd2
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
2637f2e9bf4fc5591994b7c0158afead88321a7c 30-Apr-2014 Mark Mendell <mark.p.mendell@intel.com> ART: Update and correct assemble_x86.cc

Correct the definition of some X86 instructions in the file.
Add some new instructions and the code to emit them properly.

Added EmitMemCond()

Change-Id: Icf4b70236cf0ca857c85dcb3edb218f26be458eb
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
660188264dee3c8f3510e2e24c11816c6b60f197 06-May-2014 Andreas Gampe <agampe@google.com> ART: Use utils.h::RoundUp instead of explicit bit-fiddling

Change-Id: I249a2cfeb044d3699d02e13d42b8e72518571640
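
For reference, the usual power-of-two round-up idiom that such a helper centralizes, sketched with an assumed signature (the exact utils.h declaration may differ):

    #include <cstddef>

    // Assumed signature for illustration; n must be a power of two.
    constexpr size_t RoundUp(size_t x, size_t n) {
      return (x + n - 1) & ~(n - 1);
    }

    static_assert(RoundUp(13, 8) == 16, "13 rounds up to the next 8-byte boundary");
    static_assert(RoundUp(16, 8) == 16, "already aligned values are unchanged");
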
091cc408e9dc87e60fb64c61e186bea568fc3d3a 31-Mar-2014 buzbee <buzbee@google.com> Quick compiler: allocate doubles as doubles

Significant refactoring of register handling to unify usage across
all targets & 32/64 backends.

Reworked RegStorage encoding to allow expanded use of
x86 xmm registers; removed vector registers as a separate
register type. Reworked RegisterInfo to describe aliased
physical registers. Eliminated quite a bit of target-specific code
and generalized common code.

Use of RegStorage instead of int for registers now propagated down
to the NewLIRx() level. In future CLs, the NewLIRx() routines will
be replaced with versions that are explicit about what kind of
operand they expect (RegStorage, displacement, etc.). The goal
is to eventually use RegStorage all the way to the assembly phase.

TBD: MIPS needs verification.
TBD: Re-enable liveness tracking.

Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
99ad7230ccaace93bf323dea9790f35fe991a4a2 26-Feb-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Relaxed memory barriers for x86

X86 provides stronger memory guarantees and thus the memory barriers can be
optimized. This patch ensures that all memory barriers for x86 are treated
as scheduling barriers. And in cases where a barrier is needed (StoreLoad case),
an mfence is used.

Change-Id: I13d02bf3f152083ba9f358052aedb583b0d48640
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
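
A minimal sketch of the policy described above, using assumed names rather than the actual Mir2Lir interface: only the StoreLoad case needs a real instruction on x86; every other barrier kind acts purely as a scheduling barrier.

    // Assumed names for illustration only; not the actual Mir2Lir interface.
    enum MemBarrierKind { kLoadLoad, kLoadStore, kStoreStore, kStoreLoad };

    // Returns true if an actual instruction was generated.
    bool GenMemBarrier(MemBarrierKind kind) {
      if (kind == kStoreLoad) {
        // x86 may reorder a later load ahead of an earlier store, so this is
        // the one case that needs an mfence.
        // NewLIR0(kX86Mfence);  // placeholder for the real LIR emission
        return true;
      }
      // Other orderings are already guaranteed by the x86 memory model; the
      // barrier only needs to constrain the compiler's instruction scheduler.
      return false;
    }

    int main() {
      return GenMemBarrier(kStoreLoad) ? 0 : 1;
    }
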
60d7a65f7fb60f502160a2e479e86014c7787553 14-Mar-2014 Brian Carlstrom <bdc@google.com> Fix stack overflow for mutual recursion.

There was an error where we would have a pc that was in the method
which generated the stack overflow. This didn't work, however,
because the stack overflow check was before we stored the method on
the stack. The result was that the stack overflow handler had a PC
which wasn't necessarily in the method at the top of the stack. This
is now fixed by always restoring the link register before branching
to the throw entrypoint.

Slight code size regression on ARM/Mips (unmeasured). Regression on ARM
is 4 bytes of code per stack overflow check. Some of this regression is
mitigated by having one less GC safepoint.

Also adds test case for StackOverflowError issue (from bdc).

Tests passing: ARM, X86, Mips
Phone booting: ARM

Bug: https://code.google.com/p/android/issues/detail?id=66411
Bug: 12967914
Change-Id: I96fe667799458b58d1f86671e051968f7be78d5d

(cherry-picked from c0f96d03a1855fda7d94332331b94860404874dd)
c0f96d03a1855fda7d94332331b94860404874dd 14-Mar-2014 Brian Carlstrom <bdc@google.com> Fix stack overflow for mutual recursion.

There was an error where we would have a pc that was in the method
which generated the stack overflow. This didn't work, however,
because the stack overflow check was before we stored the method on
the stack. The result was that the stack overflow handler had a PC
which wasn't necessarily in the method at the top of the stack. This
is now fixed by always restoring the link register before branching
to the throw entrypoint.

Slight code size regression on ARM/Mips (unmeasured). Regression on ARM
is 4 bytes of code per stack overflow check. Some of this regression is
mitigated by having one less GC safepoint.

Also adds test case for StackOverflowError issue (from bdc).

Tests passing: ARM, X86, Mips
Phone booting: ARM

Bug: https://code.google.com/p/android/issues/detail?id=66411
Bug: 12967914
Change-Id: I96fe667799458b58d1f86671e051968f7be78d5d
e90501da0222717d75c126ebf89569db3976927e 12-Mar-2014 Serguei Katkov <serguei.i.katkov@intel.com> Add dependency for operations with x86 FPU stack

The load hoisting optimization can re-order operations on the x86
FPU stack because no dependency is set between them.

The patch adds a resource dependency between these operations.

Change-Id: Iccce98c8f3c565903667c03803884d9de1281ea8
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
4028a6c83a339036864999fdfd2855b012a9f1a7 20-Feb-2014 Mark Mendell <mark.p.mendell@intel.com> Inline x86 String.indexOf

Take advantage of the presence of a constant search char or start index
to tune the generated code.

Change-Id: I0adcf184fb91b899a95aa4d8ef044a14deb51d88
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
614c2b4e219631e8c190fd9fd5d4d9cd343434e1 29-Jan-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Support to generate inline long to FP bytecodes for x86

long-to-float and long-to-double are now generated inline instead of calling
a helper routine. The conversion is done by using x87.

Change-Id: I196e526afec1be212898baceca8527549c3655b6
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
55d0eac918321e0525f6e6491f36a80977e0d416 06-Feb-2014 Mark Mendell <mark.p.mendell@intel.com> Support Direct Method/Type access for X86

Thumb generates code to optimize calls to methods within core.oat.
Implement this for X86 as well, but take advantage of mov with 32 bit
immediate and call relative with 32 bit immediate.

Fix some incorrect return locations for long inlines.

Change-Id: I1907bdfc7574f3d0aa76c7fad13dc537acdf1ed3
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
2c498d1f28e62e81fbdb477ff93ca7454e7493d7 30-Jan-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Specializing x86 range argument copying

The ARM implementation of range argument copying was specialized in some cases.
For all other architectures, it would fall back to generating memcpy. This patch
updates the x86 implementation so it does not call memcpy and instead generates
loads and stores, favoring movement of 128-bit chunks.

Change-Id: Ic891e5609a4b0e81a47c29cc5a9b301bd10a1933
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
67c39c4aefca23cb136157b889c09ee200b3dec6 01-Feb-2014 Mark Mendell <mark.p.mendell@intel.com> Support Literal pools for x86

They are being used to store double constants, which are very
expensive to generate into XMM registers. Uses the 'Compiler
Temporary' support just added. The MIR instructions are scanned for
a reference to a double constant, a packed switch or a FillArray.
These all need the address of the start of the method, since 32
bit x86 doesn't have a PC-relative addressing mode.

If needed, a compiler temporary is allocated, and the address of
the base of the method is calculated, and stored. Later uses can
just refer to the saved value.

Trickiness comes when generating the load from the literal area,
as the offset is unknown before final assembler. Assume a 32 bit
displacement is needed, and fix this if it wasn't necessary.

Use LoadValue to load the 'base of method' pointer. Fix an incorrect
test in GetRegLocation.

Change-Id: I53ffaa725dabc370e9820c4e0e78664ede3563e6
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
feb2b4e2d1c6538777bb80b60f3a247537b6221d 28-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Redo x86 int arithmetic

Make Mir2Lir::GenArithOpInt virtual, and implement an x86 version of it
to allow use of memory operands and knowledge of the fact that x86 has
(mostly) two operand instructions. Remove x86 specific code from the
generic version.

Add StoreFinalValue (matches StoreFinalValueWide) to handle the non-wide
cases. Add some x86 helper routines to simplify generation.

Change-Id: I6c13689c6da981f2570ab5af7a97f9816108b7ae
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
d3266bcc340d653e178e3ab9d74512c8db122eee 24-Jan-2014 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Reduce x86 sequence for GP pair to XMM

Added support for punpckldq which is useful for interleaving
32-bit values from two xmm registers.

This new instruction is now used for transfers from GP pairs
to XMM in order to reduce path length.

Change-Id: I70d9b69449dfcfb9a94a628deb74a7cffe96bac7
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
4708dcd68eebf1173aef1097dad8ab13466059aa 22-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Improve x86 long multiply and shifts

Generate inline code for long shifts by constants and do long
multiplication inline. Convert multiplication by a constant to a
shift when we can. Fix some x86 assembler problems and add the new
instructions that were needed (64 bit shifts).

Change-Id: I6237a31c36159096e399d40d01eb6bfa22ac2772
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
2bf31e67694da24a19fc1f328285cebb1a4b9964 23-Jan-2014 Mark Mendell <mark.p.mendell@intel.com> Improve x86 long divide

Implement inline division for literal and variable divisors. Use the
general case for dividing by a literal by using a double length multiply
by the appropriate constant with fixups. This is the Hacker's Delight
algorithm.

Change-Id: I563c250f99d89fca5ff8bcbf13de74de13815cfe
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
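
A worked example of the "double-length multiply with fixups" technique mentioned above, using the textbook Hacker's Delight constants for divisor 7 (these are not values taken from the ART tables):

    #include <cstdint>
    #include <cstdio>

    // Signed 32-bit division by 7 via multiply-high plus fixups.
    int32_t DivideBy7(int32_t n) {
      const int32_t kMagic = static_cast<int32_t>(0x92492493u);  // magic for d = 7
      int64_t prod = static_cast<int64_t>(n) * kMagic;
      int32_t hi = static_cast<int32_t>(prod >> 32);  // high half of the product
      hi += n;              // fixup: the magic constant is negative, so add n back
      int32_t q = hi >> 2;  // arithmetic shift by the precomputed amount for d = 7
      q += static_cast<int32_t>(static_cast<uint32_t>(n) >> 31);  // +1 if n < 0
      return q;
    }

    int main() {
      printf("%d %d %d\n", DivideBy7(100), DivideBy7(-100), DivideBy7(7));  // 14 -14 1
      return 0;
    }
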
bd288c2c1206bc99fafebfb9120a83f13cf9723b 21-Dec-2013 Razvan A Lupusoru <razvan.a.lupusoru@intel.com> Add conditional move support to x86 and allow GenMinMax to use it

X86 supports conditional moves, which are useful for reducing branchiness.
This patch adds support to the x86 backend to generate conditional reg
to reg operations. Both encoder and decoder support was added for cmov.

The x86 version of GenMinMax used for generating inlined version Math.min/max
has been updated to make use of the conditional move support.

Change-Id: I92c5428e40aa8ff88bd3071619957ac3130efae7
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
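
A C-level analogue of what the cmov support enables (an illustration, not the Mir2Lir code itself): a branchless min that compilers typically lower to a cmp followed by a cmov rather than a conditional jump.

    #include <cstdio>

    // Branchless min; typically compiled on x86 as:
    //   mov eax, a
    //   cmp eax, b
    //   cmovg eax, b   ; keep the smaller value, no jump
    int MinNoBranch(int a, int b) {
      int result = a;
      if (a > b) {
        result = b;
      }
      return result;
    }

    int main() {
      printf("%d %d\n", MinNoBranch(3, 9), MinNoBranch(9, 3));  // 3 3
      return 0;
    }
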
343adb52d3f031b6b5e005ff51f9cb04df219b21 18-Dec-2013 Mark Mendell <mark.p.mendell@intel.com> Enhance GenArrayGet, GenArrayPut for x86

As pointed out by Ian Rogers, the x86 versions didn't optimize
handling of constant index expressions. Added that support,
simplified checking of constant indices, and removed the use of
a temporary register for the 'wide' cases by using x86 scaled
addressing mode.

Change-Id: I82174e4e3674752d00d7c4730496f59d69f5f173
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
12f96283471dea664d26c185b2185445cdc49a46 16-Dec-2013 Vladimir Marko <vmarko@google.com> Fix minor style issues

Follow-up to I082aa20041c933ae5fc78f12ddf491d1c775c683.

Change-Id: Ia334b192bdba231b0b9a2b2f2d7d18fcff2ca836
bff1ef0746048978b877c0664f758d2d6006f27d 13-Dec-2013 Mark Mendell <mark.p.mendell@intel.com> Implement GenInlinedSqrt for x86

Implemented this using the hardware instruction, which handles
NaN properly.

Tested manually using host mode.

Change-Id: I082aa20041c933ae5fc78f12ddf491d1c775c683
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
a6fd8ba27bc84dfb942a8fa4ea987bcb39f0f3f1 13-Dec-2013 Vladimir Marko <vmarko@google.com> Fix 64-bit CAS for x86.

Bug: 12117970
Change-Id: I9fbba2291124a2594161782c89dc62201cf01c08
70b797d998f2a28e39f7d6ffc8a07c9cbc47da14 03-Dec-2013 Vladimir Marko <vmarko@google.com> Unsafe.compareAndSwapLong() intrinsic for x86.

Change-Id: Idbc5371a62dfdd84485a657d4548990519200205
057c74a3a2d50d1247d4e6472763ca6f59060762 03-Dec-2013 Vladimir Marko <vmarko@google.com> Add support for emitting x86 kArray instructions.

And factor out a lot of common code.

Change-Id: Ib1f135e341404f8a6f92fcef0047ec04577d32cd
c29bb614c60e0eb9a2bacf90f6dfce796344021e 27-Nov-2013 Vladimir Marko <vmarko@google.com> Unsafe.compareAndSwapInt()/Object() intrinsics for x86.

Bug: 11391018
Change-Id: I0a97375103917b0e9e20f199304c17a7f849c361
1da1e2fceb0030b4b76b43510b1710a9613e0c2e 15-Nov-2013 buzbee <buzbee@google.com> More compile-time tuning

Another round of compile-time tuning, this time yielding in the
vicinity of 3% total reduction in compile time (which means about
double that for the Quick Compile portion).

Primary improvements are skipping the basic block combine optimization
pass when using Quick (because we already have big blocks), combining
the null check elimination and type inference passes, and limiting
expensive local value number analysis to only those blocks which
might benefit from it.

Following this CL, the actual compile phase consumes roughly 60%
of the total dex2oat time on the host, and 55% on the target (Note,
I'm subtracting out the Deduping time here, which the timing logger
normally counts against the compiler).

A sample breakdown of the compilation time follows (this taken on
PlusOne.apk w/ a Nexus 4):

39.00% -> MIR2LIR: 1374.90 (Note: includes local optimization & scheduling)
10.25% -> MIROpt:SSATransform: 361.31
8.45% -> BuildMIRGraph: 297.80
7.55% -> Assemble: 266.16
6.87% -> MIROpt:NCE_TypeInference: 242.22
5.56% -> Dedupe: 196.15
3.45% -> MIROpt:BBOpt: 121.53
3.20% -> RegisterAllocation: 112.69
3.00% -> PcMappingTable: 105.65
2.90% -> GcMap: 102.22
2.68% -> Launchpads: 94.50
1.16% -> MIROpt:InitRegLoc: 40.94
1.16% -> Cleanup: 40.93
1.10% -> MIROpt:CodeLayout: 38.80
0.97% -> MIROpt:ConstantProp: 34.35
0.96% -> MIROpt:UseCount: 33.75
0.86% -> MIROpt:CheckFilters: 30.28
0.44% -> SpecialMIR2LIR: 15.53
0.44% -> MIROpt:BBCombine: 15.41

(cherry pick of 9e8e234af4430abe8d144414e272cd72d215b5f3)

Change-Id: I86c665fa7e88b75eb75629a99fd292ff8c449969
a8b4caf7526b6b66a8ae0826bd52c39c66e3c714 24-Oct-2013 Vladimir Marko <vmarko@google.com> Add byte swap instructions for ARM and x86.

Change-Id: I03fdd61ffc811ae521141f532b3e04dda566c77d
17088bbded68e35da8050a40206dfd3cbba9e6d2 28-Oct-2013 Vladimir Marko <vmarko@google.com> Fix invalid DCHECK for movzx/movsx.

kX86Movzx8RM and kX86Movsx8RM don't have to use eax/ecx/edx/ebx.
The incorrect check could fail for LoadBaseDisp() with
kUnsignedByte or kSignedByte.

Change-Id: I777f14cf372c7b211ad8c595d4a8a47533bdd0fc
a61f49539a59b610e557b5513695295639496750 23-Aug-2013 buzbee <buzbee@google.com> Add timing logger to Quick compiler

Current Quick compiler breakdown for compiling the boot class path:

MIR2LIR: 29.674%
MIROpt:SSATransform: 17.656%
MIROpt:BBOpt: 11.508%
BuildMIRGraph: 7.815%
Assemble: 6.898%
MIROpt:ConstantProp: 5.151%
Cleanup: 4.916%
MIROpt:NullCheckElimination: 4.085%
RegisterAllocation: 3.972%
GcMap: 2.359%
Launchpads: 2.147%
PcMappingTable: 2.145%
MIROpt:CodeLayout: 0.697%
LiteralData: 0.654%
SpecialMIR2LIR: 0.323%

Change-Id: I9f77e825faf79e6f6b214bb42edcc4b36f55d291
e6ed00ba91da535fbe1d0b5a5705e99da149d82e 24-Oct-2013 Vladimir Marko <vmarko@google.com> Fix x86 code generation for 0x0F 0x3A 0x?? instructions.

Change-Id: I9b2b2190787d1e5674818159aa96e513d6325b54
0d82948094d9a198e01aa95f64012bdedd5b6fc9 12-Oct-2013 buzbee <buzbee@google.com> 64-bit prep

Preparation for 64-bit roll.
o Eliminated storing pointers in 32-bit int slots in LIR.
o General size reductions of common structures to reduce impact
of doubled pointer sizes:
- BasicBlock struct was 72 bytes, now is 48.
- MIR struct was 72 bytes, now is 64.
- RegLocation was 12 bytes, now is 8.
o Generally replaced uses of BasicBlock* pointers with 16-bit Ids.
o Replaced several doubly-linked lists with singly-linked to save
one stored pointer per node.
o We had quite a few uses of uintptr_t's that were a holdover from
the JIT (which used pointers to mapped dex & actual code cache
addresses rather than trace-relative offsets). Replaced those with
uint32_t's.
o Clean up handling of embedded data for switch tables and array data.
o Miscellaneous cleanup.

I anticipate one or two additional CLs to reduce the size of MIR and LIR
structs.

Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
409fe94ad529d9334587be80b9f6a3d166805508 11-Oct-2013 buzbee <buzbee@google.com> Quick assembler fix

This CL re-instates the select pattern optimization disabled by
CL 374310, and fixes the underlying problem: improper handling of
the kPseudoBarrier LIR opcode. The bug was introduced in the
recent assembler restructuring. In short, LIR pseudo opcodes (which
have values < 0), should always have size 0 - and thus cause no
bits to be emitted during assembly. In this case, bad logic caused
us to set the size of a kPseudoBarrier opcode via lookup through the
EncodingMap.

Because all pseudo ops are < 0, this meant we did an array underflow
load, picking up whatever garbage was located before the EncodingMap.
This explains why this error showed up recently - we'd previously just
gotten a lucky layout.

This CL corrects the faulty logic, and adds DCHECKs to uses of
the EncodingMap to ensure that we don't try to access w/ a
pseudo op. Additionally, the existing is_pseudo_op() macro is
replaced with IsPseudoLirOp(), named similar to the existing
IsPseudoMirOp().

Change-Id: I46761a0275a923d85b545664cadf052e1ab120dc
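
A minimal sketch of the guard this entry describes, with assumed names and a stand-in two-entry table instead of the real assemble_x86.cc EncodingMap: pseudo opcodes are negative, contribute zero bytes, and must never be used to index the table.

    #include <cassert>

    struct X86EncodingEntry { int size; /* opcode bytes, flags, ... */ };
    static const X86EncodingEntry EncodingMap[] = { {1}, {2} };  // real opcodes only

    constexpr bool IsPseudoLirOp(int opcode) {
      return opcode < 0;  // every LIR pseudo opcode has a negative value
    }

    int GetInsnSize(int opcode) {
      if (IsPseudoLirOp(opcode)) {
        return 0;  // pseudo ops emit no bits; indexing EncodingMap would underflow
      }
      assert(!IsPseudoLirOp(opcode));  // DCHECK-style guard before the table lookup
      return EncodingMap[opcode].size;
    }

    int main() {
      return GetInsnSize(-1) + GetInsnSize(0) - 1;  // 0 + 1 - 1 == 0
    }
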
b48819db07f9a0992a72173380c24249d7fc648a 15-Sep-2013 buzbee <buzbee@google.com> Compile-time tuning: assembly phase

Not as much compile-time gain from reworking the assembly phase as I'd
hoped, but still worthwhile. Should see ~2% improvement thanks to
the assembly rework. On the other hand, expect some huge gains for some
application thanks to better detection of large machine-generated init
methods. Thinkfree shows a 25% improvement.

The major assembly change was to establish thread the LIR nodes that
require fixup into a fixup chain. Only those are processed during the
final assembly pass(es). This doesn't help for methods which only
require a single pass to assemble, but does speed up the larger methods
which required multiple assembly passes.

Also replaced the block_map_ basic block lookup table (which contained
space for a BasicBlock* for each dex instruction unit) with a block id
map - cutting its space requirements by half in a 32-bit pointer
environment.

Changes:
o Reduce size of LIR struct by 12.5% (one of the big memory users)
o Repurpose the use/def portion of the LIR after optimization complete.
o Encode instruction bits to LIR
o Thread LIR nodes requiring pc fixup
o Change follow-on assembly passes to only consider fixup LIRs
o Switch on pc-rel fixup kind
o Fast-path for small methods - single pass assembly
o Avoid using cb[n]z for null checks (almost always exceed displacement)
o Improve detection of large initialization methods.
o Rework def/use flag setup.
o Remove a sequential search from FindBlock using lookup table of 16-bit
block ids rather than full block pointers.
o Eliminate pcRelFixup and use fixup kind instead.
o Add check for 16-bit overflow on dex offset.

Change-Id: I4c6615f83fed46f84629ad6cfe4237205a9562b4
252254b130067cd7a5071865e793966871ae0246 09-Sep-2013 buzbee <buzbee@google.com> More Quick compile-time tuning: labels & branches

This CL represents a roughly 3.5% performance improvement for the
compile phase of dex2oat. Most of the gain comes from avoiding
the generation of dex boundary LIR labels unless a debug listing
is requested. The other significant change is moving from a basic block
ending branch model of "always generate a fall-through branch, and then
delete it if we can" to an "only generate a fall-through branch if we need
it" model.

The data motivating these changes follow. Note that two areas of
potentially attractive gain remain: restructuring the assembler model and
reworking the register handling utilities. These will be addressed
in subsequent CLs.

--- data follows

The Quick compiler's assembler has shown up on profile reports a bit
more than seems reasonable. We've tried a few quick fixes to apparently
hot portions of the code, but without much gain. So, I've been looking at
the assembly process at a somewhat higher level. There look to be several
potentially good opportunities.

First, an analysis of the makeup of the LIR graph showed a surprisingly
high proportion of LIR pseudo ops. Using the boot classpath as a basis,
we get:

32.8% of all LIR nodes are pseudo ops.
10.4% are LIR instructions which require pc-relative fixups.
11.8% are LIR instructions that have been nop'd by the various
optimization passes.

Looking only at the LIR pseudo ops, we get:
kPseudoDalvikByteCodeBoundary 43.46%
kPseudoNormalBlockLabel 21.14%
kPseudoSafepointPC 20.20%
kPseudoThrowTarget 6.94%
kPseudoTarget 3.03%
kPseudoSuspendTarget 1.95%
kPseudoMethodExit 1.26%
kPseudoMethodEntry 1.26%
kPseudoExportedPC 0.37%
kPseudoCaseLabel 0.30%
kPseudoBarrier 0.07%
kPseudoIntrinsicRetry 0.02%
Total LIR count: 10167292

The standout here is the Dalvik opcode boundary marker. This is just a
label inserted at the beginning of the codegen for each Dalvik bytecode.
If we're also doing a verbose listing, this is also where we hang the
pretty-print disassembly string. However, this label was also
being used as a convenient way to find the target of switch case
statements (and, I think at one point was used in the Mir->GBC conversion
process).

This CL moves the use of kPseudoDalvikByteCodeBoundary labels to only
verbose listing runs, and replaces the codegen uses of the label with
the kPseudoNormalBlockLabel attached to the basic block that contains the
switch case target. Great savings here - 14.3% reduction in the number of
LIR nodes needed. After this CL, our LIR pseudo proportions drop to 21.6%
of all LIR. That's still a lot, but much better. Possible further
improvements via combining normal labels with kPseudoSafepointPC labels
where appropriate, and also perhaps reduce memory usage by using a
short-hand form for labels rather than a full LIR node. Also, many
of the basic block labels are no longer branch targets by the time
we get to assembly - cheaper to delete, or just ignore?

Here's the "after" LIR pseudo op breakdown:

kPseudoNormalBlockLabel 37.39%
kPseudoSafepointPC 35.72%
kPseudoThrowTarget 12.28%
kPseudoTarget 5.36%
kPseudoSuspendTarget 3.45%
kPseudoMethodEntry 2.24%
kPseudoMethodExit 2.22%
kPseudoExportedPC 0.65%
kPseudoCaseLabel 0.53%
kPseudoBarrier 0.12%
kPseudoIntrinsicRetry 0.04%
Total LIR count: 5748232

Not done in this CL, but it will be worth experimenting with actually
deleting LIR nodes from the graph when they are optimized away, rather
than just setting the NOP bit. Keeping them around is invaluable
during debugging - but when not debugging it may pay off if the cost of
node removal is less than the cost of traversing through dead nodes
in subsequent passes.

Next up (and partially in this CL - but mostly to be done in follow-on
CLs) is the overall assembly process. Inherited from the trace JIT,
the Quick compiler has a fairly simple-minded approach to instruction
assembly. First, a pass is made over the LIR list to assign offsets
to each instruction. Then, the assembly pass is made - which generates
the actual machine instruction bit patterns and pushes the instruction
data into the code_buffer. However, the code generator takes the "always
optimistic" approach to instruction selection and emits the shortest
instruction. If, during assembly, we find that a branch or load doesn't
reach, that short-form instruction is replaces with a longer sequence.

Of course, this invalidates the previously-computed offset calculations.
Assembly thus is an iterative process: compute offsets and then assemble
until we survive an assembly pass without invalidation. This seems
like a likely candidate for improvement. First, I analyzed the
number of retries required, and the reason for invalidation over the
boot classpath load.

The results: more than half of methods don't require a retry, and
very few require more than 1 extra pass:

5 or more: 6 of 96334
4 or more: 22 of 96334
3 or more: 140 of 96334
2 or more: 1794 of 96334 - 2%
1 or more: 40911 of 96334 - 40%
0 retries: 55423 of 96334 - 58%

The interesting group here is the one that requires 1 retry. Looking
at the reason, we see three typical reasons:

1. A cbnz/cbz doesn't reach (only 7 bits of offset)
2. A 16-bit Thumb1 unconditional branch doesn't reach.
3. An unconditional branch which branches to the next instruction
is encountered, and deleted.

The first 2 cases are the cost of the optimistic strategy - nothing
much to change there. However, the interesting case is #3 - dead
branch elimination. A further analysis of the single retry group showed
that 42% of the methods (16305) that required a single retry did so
*only* because of dead branch elimination. The big question here is
why so many dead branches survive to the assembly stage. We have
a dead branch elimination pass which is supposed to catch these - perhaps
it's not working correctly, should be moved later in the optimization
process, or perhaps run multiple times.

Other things to consider:

o Combine the offset generation pass with the assembly pass. Skip
pc-relative fixup assembly (other than assigning offset), but push
LIR* for them into work list. Following the main pass, zip through
the work list and assemble the pc-relative instructions (now that we
know the offsets). This would significantly cut back on traversal
costs.

o Store the assembled bits into both the code buffer and the LIR.
In the event we have to retry, only the pc-relative instructions
would need to be assembled, and we'd finish with a pass over the
LIR just to dump the bits into the code buffer.

Change-Id: I50029d216fa14f273f02b6f1c8b6a0dde5a7d6a6
7934ac288acfb2552bb0b06ec1f61e5820d924a4 26-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/comments issues

Change-Id: Iae286862c85fb8fd8901eae1204cd6d271d69496
b1eba213afaf7fa6445de863ddc9680ab99762ea 18-Jul-2013 Brian Carlstrom <bdc@google.com> Fix cpplint whitespace/comma issues

Change-Id: I456fc8d80371d6dfc07e6d109b7f478c25602b65
7940e44f4517de5e2634a7e07d58d0fb26160513 12-Jul-2013 Brian Carlstrom <bdc@google.com> Create separate Android.mk for main build targets

The runtime, compiler, dex2oat, and oatdump are now in separate trees
to prevent dependency creep. They can now be individually built
without rebuilding the rest of the art projects. dalvikvm and jdwpspy
were already this way. Builds in the art directory should behave as
before, building everything including tests.

Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81