2cebb24bfc3247d3e9be138a3350106737455918 |
|
22-Apr-2015 |
Mathieu Chartier <mathieuc@google.com> |
Replace NULL with nullptr Also fixed some lines that were too long, and a few other minor details. Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb
|
1961b609bfefaedb71cee3651c4f931cc3e7393d |
|
08-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
Quick: PC-relative loads from dex cache arrays on x86. Rewrite all PC-relative addressing on x86 and implement PC-relative loads from dex cache arrays. Don't adjust the base to point to the start of the method, let it point to the anchor, i.e. the target of the "call +0" insn. Change-Id: Ic22544a8bc0c5e49eb00a75154dc8f3ead816989
|
cc23481b66fd1f2b459d82da4852073e32f033aa |
|
07-Apr-2015 |
Vladimir Marko <vmarko@google.com> |
Promote pointer to dex cache arrays on arm. Do the use-count analysis on temps (ArtMethod* and the new PC-relative temp) in Mir2Lir, rather than MIRGraph. MIRGraph isn't really supposed to know how the ArtMethod* is used by the backend. Change-Id: Iaf56a46ae203eca86281b02b54f39a80fe5cc2dd
|
0b9203e7996ee1856f620f95d95d8a273c43a3df |
|
23-Jan-2015 |
Andreas Gampe <agampe@google.com> |
ART: Some Quick cleanup Make several fields const in CompilationUnit. May benefit some Mir2Lir code that repeats tests, and in general immutability is good. Remove compiler_internals.h and refactor some other headers to reduce overly broad imports (and thus forced recompiles on changes). Change-Id: I898405907c68923581373b5981d8a85d2e5d185a
|
e7227c628e6f22a823945cc76e554eb2a8b0d925 |
|
13-Jan-2015 |
Vladimir Marko <vmarko@google.com> |
Fix wide volatile IGET/IPUT on ARM without atomic ldrd/strd. If ldrd/strd isn't atomic, IPUT_WIDE uses ldrexd+strexd and we need to record the safepoint for the ldrexd rather than strexd. IGET_WIDE was simply missing the memory barrier. Bug: 18993519 (cherry picked from commit ee5e273e4d0dd91b480c8d5dbcccad15c1b7353c) Change-Id: I4e9270b994f413c1a047c1c4bb9cce5f29e42cb4
|
ee5e273e4d0dd91b480c8d5dbcccad15c1b7353c |
|
13-Jan-2015 |
Vladimir Marko <vmarko@google.com> |
Fix wide volatile IGET/IPUT on ARM without atomic ldrd/strd. If ldrd/strd isn't atomic, IPUT_WIDE uses ldrexd+strexd and we need to record the safepoint for the ldrexd rather than strexd. IGET_WIDE was simply missing the memory barrier. Bug: 18993519 Change-Id: I4e9270b994f413c1a047c1c4bb9cce5f29e42cb4
|
aed3ad734c47fdccf179ff65971284a0d38583cd |
|
03-Dec-2014 |
Vladimir Marko <vmarko@google.com> |
Quick: Use fewer insns for ARM LDR/STR with large offsets. LDR with large offset is frequently used for reading from DexCache arrays, for example for static and direct invokes. STR with large offset is rarely used but it's updated for consistency. Change-Id: I75871416cecbfd7fe7de590922cea0376a2f4019
|
a29f698b1754ee0ea2f46b6f5900e0da840dff79 |
|
25-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Implement InexpensiveConstantInt(., opcode) for ARM. Fix kThumb2{Add,Sub}RRI12 to be used for their full range. Add ORN for completeness. Change-Id: I49a51541fa9ea085d4674b9131d8dd94da5337f3
|
174636dad59068fc6e879b147ae02ac932f38c6f |
|
26-Nov-2014 |
Vladimir Marko <vmarko@google.com> |
Quick: Use 16-bit conditional branch in Thumb2. We were using the 32-bit version because the compilation time impact of having to change the instruction length and reassemble instructions when the target is out of range was too high. However, the assembly phase has been rewritten since making that decision and the compile time impact is now insignificant, so we prefer to save space. Change-Id: Ib90f90d3f4e0c4e310267af272e3b16611026bbe
|
d582fa4ea62083a7598dded5b82dc2198b3daac7 |
|
06-Nov-2014 |
Ian Rogers <irogers@google.com> |
Instruction set features for ARM64, MIPS and X86. Also, refactor how feature strings are handled so they are additive or subtractive. Make MIPS have features for FPU 32-bit and MIPS v2. Use in the quick compiler rather than #ifdefs that wouldn't have worked in cross-compilation. Add SIMD features for x86/x86-64 proposed in: https://android-review.googlesource.com/#/c/112370/ Bug: 18056890 Change-Id: Ic88ff84a714926bd277beb74a430c5c7d5ed7666
|
b28c1c06236751aa5c9e64dcb68b3c940341e496 |
|
08-Nov-2014 |
Ian Rogers <irogers@google.com> |
Tidy RegStorage for X86. Don't use global variables initialized in constructors to hold onto constant values, instead use the TargetReg32 helper. Improve this helper with the use of lookup tables. Elsewhere prefer to use constexpr values as they will have less runtime cost. Add an ostream operator to RegStorage for CHECK_EQ and use. Change-Id: Ib8d092d46c10dac5909ecdff3cc1e18b7e9b1633
|
277ccbd200ea43590dfc06a93ae184a765327ad0 |
|
04-Nov-2014 |
Andreas Gampe <agampe@google.com> |
ART: More warnings Enable -Wno-conversion-null, -Wredundant-decls and -Wshadow in general, and -Wunused-but-set-parameter for GCC builds. Change-Id: I81bbdd762213444673c65d85edae594a523836e5
|
6a3c1fcb4ba42ad4d5d142c17a3712a6ddd3866f |
|
31-Oct-2014 |
Ian Rogers <irogers@google.com> |
Remove -Wno-unused-parameter and -Wno-sign-promo from base cflags. Fix associated errors about unused paramenters and implict sign conversions. For sign conversion this was largely in the area of enums, so add ostream operators for the effected enums and fix tools/generate-operator-out.py. Tidy arena allocation code and arena allocated data types, rather than fixing new and delete operators. Remove dead code. Change-Id: I5b433e722d2f75baacfacae4d32aef4a828bfe1b
|
5667fdbb6e441dee7534ade18b628ed396daf593 |
|
23-Oct-2014 |
Zheng Xu <zheng.xu@arm.com> |
ARM: Use hardfp calling convention between java to java call. This patch default to use hardfp calling convention. Softfp can be enabled by setting kArm32QuickCodeUseSoftFloat to true. We get about -1 ~ +5% performance improvement with different benchmark tests. Hopefully, we should be able to get more performance by address the left TODOs, as some part of the code takes the original assumption which is not optimal. DONE: 1. Interpreter to quick code 2. Quick code to interpreter 3. Transition assembly and callee-saves 4. Trampoline(generic jni, resolution, invoke with access check and etc.) 5. Pass fp arg reg following aapcs(gpr and stack do not follow aapcs) 6. Quick helper assembly routines to handle ABI differences 7. Quick code method entry 8. Quick code method invocation 9. JNI compiler TODO: 10. Rework ArgMap, FlushIn, GenDalvikArgs and affected common code. 11. Rework CallRuntimeHelperXXX(). Change-Id: I9965d8a007f4829f2560b63bcbbde271bdcf6ec2
|
677cd61ad05d993c4d3b22656675874f06d6aabc |
|
15-Oct-2014 |
Ian Rogers <irogers@google.com> |
Make ART compile with GCC -O0 again. Tidy up InstructionSetFeatures so that it has a type hierarchy dependent on architecture. Add to instruction_set_test to warn when InstructionSetFeatures don't agree with ones from system properties, AT_HWCAP and /proc/cpuinfo. Clean-up class linker entry point logic to not return entry points but to test whether the passed code is the particular entrypoint. This works around image trampolines that replicate entrypoints. Bug: 17993736 (cherry picked from commit 6f3dbbadf4ce66982eb3d400e0a74cb73eb034f3) Change-Id: I3e7595f437db4828072589d475a5453b7f31003e
|
6f3dbbadf4ce66982eb3d400e0a74cb73eb034f3 |
|
15-Oct-2014 |
Ian Rogers <irogers@google.com> |
Make ART compile with GCC -O0 again. Tidy up InstructionSetFeatures so that it has a type hierarchy dependent on architecture. Add to instruction_set_test to warn when InstructionSetFeatures don't agree with ones from system properties, AT_HWCAP and /proc/cpuinfo. Clean-up class linker entry point logic to not return entry points but to test whether the passed code is the particular entrypoint. This works around image trampolines that replicate entrypoints. Bug: 17993736 Change-Id: I5f4b49e88c3b02a79f9bee04f83395146ed7be23
|
fc787ecd91127b2c8458afd94e5148e2ae51a1f5 |
|
10-Oct-2014 |
Ian Rogers <irogers@google.com> |
Enable -Wimplicit-fallthrough. Falling through switch cases on a clang build must now annotate the fallthrough with the FALLTHROUGH_INTENDED macro. Bug: 17731372 Change-Id: I836451cd5f96b01d1ababdbf9eef677fe8fa8324
|
63999683329612292d534e6be09dbde9480f1250 |
|
15-Jul-2014 |
Serban Constantinescu <serban.constantinescu@arm.com> |
Revert "Revert "Enable Load Store Elimination for ARM and ARM64"" This patch refactors the implementation of the LoadStoreElimination optimisation pass. Please note that this pass was disabled and not functional for any of the backends. The current implementation tracks aliases and handles DalvikRegs as well as Heap memory regions. It has been tested and it is known to optimise out the following: * Load - Load * Store - Load * Store - Store * Load Literals Change-Id: I3aadb12a787164146a95bc314e85fa73ad91e12b
|
c32447bcc8c36ee8ff265ed678c7df86936a9ebe |
|
27-Jul-2014 |
Bill Buzbee <buzbee@android.com> |
Revert "Enable Load Store Elimination for ARM and ARM64" On extended testing, I'm seeing a CHECK failure at utility_arm.cc:1201. This reverts commit fcc36ba2a2b8fd10e6eebd21ecb6329606443ded. Change-Id: Icae3d49cd7c8fcab09f2f989cbcb1d7e5c6d137a
|
fcc36ba2a2b8fd10e6eebd21ecb6329606443ded |
|
15-Jul-2014 |
Serban Constantinescu <serban.constantinescu@arm.com> |
Enable Load Store Elimination for ARM and ARM64 This patch refactors the implementation of the LoadStoreElimination optimisation pass. Please note that this pass was disabled and not functional for any of the backends. The current implementation tracks aliases and handles DalvikRegs as well as Heap memory regions. It has been tested and it is known to optimise out the following: * Load - Load * Store - Load * Store - Store * Load Literals Change-Id: Iefae9b696f87f833ef35c451ed4d49c5a1b6fde0
|
984305917bf57b3f8d92965e4715a0370cc5bcfb |
|
28-Jul-2014 |
Andreas Gampe <agampe@google.com> |
ART: Rework quick entrypoint code in Mir2Lir, cleanup To reduce the complexity of calling trampolines in generic code, introduce an enumeration for entrypoints. Introduce a header that lists the entrypoint enum and exposes a templatized method that translates an enum value to the corresponding thread offset value. Call helpers are rewritten to have an enum parameter instead of the thread offset. Also rewrite LoadHelper and GenConversionCall this way. It is now LoadHelper's duty to select the right thread offset size. Introduce InvokeTrampoline virtual method to Mir2Lir. This allows to further simplify the call helpers, as well as make OpThreadMem specific to X86 only (removed from Mir2Lir). Make GenInlinedCharAt virtual, move a copy to X86 backend, and simplify both copies. Remove LoadBaseIndexedDisp and OpRegMem from Mir2Lir, as they are now specific to X86 only. Remove StoreBaseIndexedDisp from Mir2Lir, as it was only ever used in the X86 backend. Remove OpTlsCmp from Mir2Lir, as it was only ever used in the X86 backend. Remove OpLea from Mir2Lir, as it was only ever defined in the X86 backend. Remove GenImmedCheck from Mir2Lir as it was neither used nor implemented. Change-Id: If0a6182288c5d57653e3979bf547840a4c47626e
|
9ee4519afd97121f893f82d41d23164fc6c9ed34 |
|
17-Jul-2014 |
Serguei Katkov <serguei.i.katkov@intel.com> |
x86: GenSelect utility update The is follow-up https://android-review.googlesource.com/#/c/101396/ to make x86 GenSelectConst32 implementation complete. Change-Id: I69f318e18093f9a5b00f8f00f0f1c2e4ff7a9ab2 Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
48f5c47907654350ce30a8dfdda0e977f5d3d39f |
|
27-Jun-2014 |
Hans Boehm <hboehm@google.com> |
Replace memory barriers to better reflect Java needs. Replaces barriers that enforce ordering of one access type (e.g. Load) with respect to another (e.g. store) with more general ones that better reflect both Java requirements and actual hardware barrier/fence instructions. The old code was inconsistent and unclear about which barriers implied which others. Sometimes multiple barriers were generated and then eliminated; sometimes it was assumed that certain barriers implied others. The new barriers closely parallel those in C++11, though, for now, we use something closer to the old naming. Bug: 14685856 Change-Id: Ie1c80afe3470057fc6f2b693a9831dfe83add831
|
b5860fb459f1ed71f39d8a87b45bee6727d79fe8 |
|
22-Jun-2014 |
buzbee <buzbee@google.com> |
Register promotion support for 64-bit targets Not sufficiently tested for 64-bit targets, but should be fairly close. A significant amount of refactoring could stil be done, (in later CLs). With this change we are not making any changes to the vmap scheme. As a result, it is a requirement that if a vreg is promoted to both a 32-bit view and the low half of a 64-bit view it must share the same physical register. We may change this restriction later on to allow for more flexibility for 32-bit Arm. For example, if v4, v5, v4/v5 and v5/v6 are all hot enough to promote, we'd end up with something like: v4 (as an int) -> r10 v4/v5 (as a long) -> r10 v5 (as an int) -> r11 v5/v6 (as a long) -> r11 Fix a couple of ARM64 bugs on the way... Change-Id: I6a152b9c164d9f1a053622266e165428045362f3
|
de68676b24f61a55adc0b22fe828f036a5925c41 |
|
24-Jun-2014 |
Andreas Gampe <agampe@google.com> |
Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter" This reverts commit 2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d. Breaks the build. Change-Id: I9faad4e9a83b32f5f38b2ef95d6f9a33345efa33
|
3c12c512faf6837844d5465b23b9410889e5eb11 |
|
24-Jun-2014 |
Andreas Gampe <agampe@google.com> |
Revert "Revert "ART: Split out more cases of Load/StoreRef, volatile as parameter"" This reverts commit de68676b24f61a55adc0b22fe828f036a5925c41. Fixes an API comment, and differentiates between inserting and appending. Change-Id: I0e9a21bb1d25766e3cbd802d8b48633ae251a6bf
|
2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d |
|
23-Jun-2014 |
Andreas Gampe <agampe@google.com> |
ART: Split out more cases of Load/StoreRef, volatile as parameter Splits out more cases of ref registers being loaded or stored. For code clarity, adds volatile as a flag parameter instead of a separate method. On ARM64, continue cleanup. Add flags to print/fatal on size mismatches. Change-Id: I30ed88433a6b4ff5399aefffe44c14a5e6f4ca4e
|
8dea81ca9c0201ceaa88086b927a5838a06a3e69 |
|
06-Jun-2014 |
Vladimir Marko <vmarko@google.com> |
Rewrite use/def masks to support 128 bits. Reduce LIR memory usage by holding masks by pointers in the LIR rather than directly and using pre-defined const masks for the common cases, allocating very few on the arena. Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
|
37573977769e9068874506050c62acd4e324d246 |
|
16-Jun-2014 |
Vladimir Marko <vmarko@google.com> |
Clean up ARM load/store with offset imm8 << 2. Change-Id: I95ed6860131b99eef7ed727f54745976949cbcb3
|
db9d523ff305721d4ca3f1470d1b2ce64c736e0a |
|
10-Jun-2014 |
Vladimir Marko <vmarko@google.com> |
Clean up ArmMirToLir::LoadDispBody()/StoreDispBody(). Refactor the 64-bit load and store code to use a shared helper function that can be used for any opcode with the displacement limited to 8-bit value shifted by 2 (i.e. max 1020). Use that function also for 32-bit float load and store as it is actually better than the old code for offsets exceeding the 1020 byte limit. Change-Id: I7dec38bae8cd9891420d2e92b1bac6138af5d64e
|
082833c8d577db0b2bebc100602f31e4e971613e |
|
18-May-2014 |
buzbee <buzbee@google.com> |
Quick compiler, out of registers fix It turns out that the register pool sanity checker was not working as expected, leaving some inconsistencies unreported. This could result in "out of registers" failures, as well as other more subtle problems. This CL fixes the sanity checker, adds a lot more check and cleans up the previously undetected episodes of insanity. Cherry-pick of internal change 468162 Change-Id: Id2da97e99105a4c272c5fd256205a94b904ecea8
|
05d3aeb33683b16837741f9348d6fba9a8432068 |
|
18-May-2014 |
buzbee <buzbee@google.com> |
Quick compiler, out of registers fix Fixes b/15024623 It turns out that the register pool sanity checker was not working as expected, leaving some inconsistencies unreported. This CL fixes the sanity checker, adds a lot more check and cleans up the previously undetected episodes of insanity. Change-Id: I4d67db864ca5926a1975db251e7e631b65a86275
|
fe8cf8b1c1b4af0f8b4bb639576f7a5fc59f52ea |
|
15-May-2014 |
Bill Buzbee <buzbee@google.com> |
Quick Compiler: fix Arm cts failures Fixes move_wide_16#testN1, move_wide_16#testN2 Two bugs for the price of one (thanks CTS!) First, the new stack overflow checking code was broken for very large frames. For Arm on method entry, we only have 1 available temp register, r12, until argument registers are flushed. Previously, for explicit checks on large frames, r12 was immediately loaded with the stack_end value. However, later on when the frame is extended, if the frame size exceeds the range of a reg-reg-imm subtract, the codegen utilities will allocate a new temporary register to complete the operation. r12 was getting clobbered. Similarly, for medium-large frames r12 could get clobbered during frame creation. What we should always do when directly using fixed registers like this is to lock them to prevent them from being allocated as a temp. The other half of the first bug is easily solved by delaying the load of stack_end until after the new sp is computed. We'll increase the stall cost, but this is an uncommon case. The second bug was likely a typo in LoadValueDisp(). I'm a bit surprised we hadn't hit this one earlier - but perhaps it was recently introduced. The wrong base register was being used in the non-float, wide, excessive offset case (which I suppose is also somewhat uncommon). Cherry-pick of internal commit If5b30f729e31d86db604045dd7581fd4626e0b55 Change-Id: If5b30f729e31d86db604045dd7581fd4626e0b55
|
56e86eaf73eb3efa029f2dd53b2d21e3597d8e5f |
|
15-May-2014 |
Bill Buzbee <buzbee@google.com> |
Revert "Revert "Quick Compiler: fix Arm cts failures"" It turns out the medium-large frame explicit stack overflow check was also broken in a similar way: r12 was live going into the frame extension, but the frame extension code sometimes needs a free temp. This reverts commit 9cf44af1a223f905457688931317a4e4cb086a84. Change-Id: If5b30f729e31d86db604045dd7581fd4626e0b55
|
9cf44af1a223f905457688931317a4e4cb086a84 |
|
15-May-2014 |
Bill Buzbee <buzbee@google.com> |
Revert "Quick Compiler: fix Arm cts failures" Error detected on further testing. This reverts commit 06a4809f271c44ec1491e0b07ae9974aa35bc8ad. Change-Id: Ia7b6b463f6422abac432f1a9484e4e080d003148
|
06a4809f271c44ec1491e0b07ae9974aa35bc8ad |
|
14-May-2014 |
buzbee <buzbee@google.com> |
Quick Compiler: fix Arm cts failures Fixes move_wide_16#testN1, move_wide_16#testN2 Two bugs for the price of one (thanks CTS!) First, the new stack overflow checking code was broken for very large frames. For Arm on method entry, we only have 1 available temp register, r12, until argument registers are flushed. Previously, for explicit checks on large frames, r12 was immediately loaded with the stack_end value. However, later on when the frame is extended, if the frame size exceeds the range of a reg-reg-imm subtract, the codegen utilities will allocate a new temporary register to complete the operation. r12 was getting clobbered. What we should always do when directly using fixed registers like this is to lock them to prevent them from being allocated as a temp. The other half of the first bug is easily solved by delaying the load of stack_end until after the new sp is computed. We'll increase the stall cost, but this is an uncommon case. The second bug was likely a typo in LoadValueDisp(). I'm a bit surprised we hadn't hit this one earlier - but perhaps it was recently introduced. The wrong base register was being used in the non-float, wide, excessive offset case (which I suppose is also somewhat uncommon). Change-Id: I2c5074c9570b022af680f472deac9fe72a2e827e
|
2f244e9faccfcca68af3c5484c397a01a1c3a342 |
|
08-May-2014 |
Andreas Gampe <agampe@google.com> |
ART: Add more ThreadOffset in Mir2Lir and backends This duplicates all methods with ThreadOffset parameters, so that both ThreadOffset<4> and ThreadOffset<8> can be handled. Dynamic checks against the compilation unit's instruction set determine which pointer size to use and therefore which methods to call. Methods with unsupported pointer sizes should fatally fail, as this indicates an issue during method selection. Change-Id: Ifdb445b3732d3dc5e6a220db57374a55e91e1bf6
|
674744e635ddbdfb311fbd25b5a27356560d30c3 |
|
24-Apr-2014 |
Vladimir Marko <vmarko@google.com> |
Use atomic load/store for volatile IGET/IPUT/SGET/SPUT. Bug: 14112919 Change-Id: I79316f438dd3adea9b2653ffc968af83671ad282
|
3bf7c60a86d49bf8c05c5d2ac5ca8e9f80bd9824 |
|
07-May-2014 |
Vladimir Marko <vmarko@google.com> |
Cleanup ARM load/store wide and remove unused param s_reg. Use a single LDRD/VLDR instruction for wide load/store on ARM, adjust the base pointer if needed. Remove unused parameter s_reg from LoadBaseDisp(), LoadBaseIndexedDisp() and StoreBaseIndexedDisp() on all architectures. Change-Id: I25a9a42d523a68addbc11abe44ddc55a4401df98
|
455759b5702b9435b91d1b4dada22c4cce7cae3c |
|
06-May-2014 |
Vladimir Marko <vmarko@google.com> |
Remove LoadBaseDispWide and StoreBaseDispWide. Just pass k64 or kDouble to non-wide versions. Change-Id: I000619c3b78d3a71db42edc747c8a0ba1ee229be
|
091cc408e9dc87e60fb64c61e186bea568fc3d3a |
|
31-Mar-2014 |
buzbee <buzbee@google.com> |
Quick compiler: allocate doubles as doubles Significant refactoring of register handling to unify usage across all targets & 32/64 backends. Reworked RegStorage encoding to allow expanded use of x86 xmm registers; removed vector registers as a separate register type. Reworked RegisterInfo to describe aliased physical registers. Eliminated quite a bit of target-specific code and generalized common code. Use of RegStorage instead of int for registers now propagated down to the NewLIRx() level. In future CLs, the NewLIRx() routines will be replaced with versions that are explicit about what kind of operand they expect (RegStorage, displacement, etc.). The goal is to eventually use RegStorage all the way to the assembly phase. TBD: MIPS needs verification. TBD: Re-enable liveness tracking. Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
|
fd698e67953e40e804d7c9d1a3e8460e9d67382a |
|
28-Apr-2014 |
buzbee <buzbee@google.com> |
Quick compiler: fix DCHECKS The recent change to introduce k32, k64 and kReference operand sizes missed updating a few DCHECKS. Change-Id: I66eb617b07766e781b38962dc862fc5b023c2fbd
|
695d13a82d6dd801aaa57a22a9d4b3f6db0d0fdb |
|
19-Apr-2014 |
buzbee <buzbee@google.com> |
Update load/store utilities for 64-bit backends This CL replaces the typical use of LoadWord/StoreWord utilities (which, in practice, were 32-bit load/store) in favor of a new set that make the size explicit. We now have: LoadWordDisp/StoreWordDisp: 32 or 64 depending on target. Load or store the natural word size. Expect this to be used infrequently - generally when we know we're dealing with a native pointer or flushed register not holding a Dalvik value (Dalvik values will flush to home location sizes based on Dalvik, rather than the target). Load32Disp/Store32Disp: Load or store 32 bits, regardless of target. Load64Disp/Store64Disp: Load or store 64 bits, regardless of target. LoadRefDisp: Load a 32-bit compressed reference, and expand it to the natural word size in the target register. StoreRefDisp: Compress a reference held in a register of the natural word size and store it as a 32-bit compressed reference. Change-Id: I50fcbc8684476abd9527777ee7c152c61ba41c6f
|
d6ed642458c8820e1beca72f3d7b5f0be4a4b64b |
|
10-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Revert "Use trampolines for calls to helpers""" This reverts commit f9487c039efb4112616d438593a2ab02792e0304. Change-Id: Id48a4aae4ecce73db468587967968a3f7618b700
|
f9487c039efb4112616d438593a2ab02792e0304 |
|
09-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Revert "Use trampolines for calls to helpers"" This reverts commit 081f73e888b3c246cf7635db37b7f1105cf1a2ff. Change-Id: Ibd777f8ce73cf8ed6c4cb81d50bf6437ac28cb61 Conflicts: compiler/dex/quick/mir_to_lir.h
|
081f73e888b3c246cf7635db37b7f1105cf1a2ff |
|
07-Apr-2014 |
Dave Allison <dallison@google.com> |
Revert "Use trampolines for calls to helpers" This reverts commit 754ddad084ccb610d0cf486f6131bdc69bae5bc6. Change-Id: Icd979adee1d8d781b40a5e75daf3719444cb72e8
|
754ddad084ccb610d0cf486f6131bdc69bae5bc6 |
|
19-Feb-2014 |
Dave Allison <dallison@google.com> |
Use trampolines for calls to helpers This is an ARM specific optimization to the compiler that uses trampoline islands to make calls to runtime helper functions. The intention is to reduce the size of the generated code (by 2 bytes per call) without affecting performance. By default this is on when generating an OAT file. It is off when compiling to memory. To switch this off in dex2oat, use the command line option: --no-helper-trampolines Enhances disassembler to print the trampoline entry on the BL instruction like this: 0xb6a850c0: f7ffff9e bl -196 (0xb6a85000) ; pTestSuspend Bug: 12607709 Change-Id: I9202bdb7cf21252ad807bd48701f1f6ce8e3d0fe
|
dd7624d2b9e599d57762d12031b10b89defc9807 |
|
15-Mar-2014 |
Ian Rogers <irogers@google.com> |
Allow mixing of thread offsets between 32 and 64bit architectures. Begin a more full implementation x86-64 REX prefixes. Doesn't implement 64bit thread offset support for the JNI compiler. Change-Id: If9af2f08a1833c21ddb4b4077f9b03add1a05147
|
f943914730db8ad2ff03d49a2cacd31885d08fd7 |
|
27-Mar-2014 |
Dave Allison <dallison@google.com> |
Implement implicit stack overflow checks This also fixes some failing run tests due to missing null pointer markers. The implementation of the implicit stack overflow checks introduces the ability to have a gap in the stack that is skipped during stack walk backs. This gap is protected against read/write and is used to trigger a SIGSEGV at function entry if the stack will overflow. Change-Id: I0c3e214c8b87dc250cf886472c6d327b5d58653e
|
e2143c0a4af68c08e811885eb2f3ea5bfdb21ab6 |
|
28-Mar-2014 |
Ian Rogers <irogers@google.com> |
Revert "Revert "Optimize easy multiply and easy div remainder."" This reverts commit 3654a6f50a948ead89627f398aaf86a2c2db0088. Remove the part of the change that confused !is_div with being multiply rather than implying remainder. Change-Id: I202610069c69351259a320e8852543cbed4c3b3e
|
3441512d61ac192c1bf0b9b1eb696d5a8a8d677e |
|
28-Mar-2014 |
Brian Carlstrom <bdc@google.com> |
Revert "Optimize easy multiply and easy div remainder." This reverts commit 08df4b3da75366e5db37e696eaa7e855cba01deb. (cherry picked from commit 3654a6f50a948ead89627f398aaf86a2c2db0088) Change-Id: If8befd7c7135b9dfe3d3e9111768aba89aaa0863
|
3654a6f50a948ead89627f398aaf86a2c2db0088 |
|
28-Mar-2014 |
Brian Carlstrom <bdc@google.com> |
Revert "Optimize easy multiply and easy div remainder." This reverts commit 08df4b3da75366e5db37e696eaa7e855cba01deb.
|
08df4b3da75366e5db37e696eaa7e855cba01deb |
|
25-Mar-2014 |
Zheng Xu <zheng.xu@arm.com> |
Optimize easy multiply and easy div remainder. Update OpRegRegShift and OpRegRegRegShift to use RegStorage parameters. Add special cases for *0 and *1. Add more easy multiply special cases for Arm. Reuse easy multiply in SmallLiteralDivRem() to support remainder cases. Change-Id: Icd76a993d3ac8d4988e9653c19eab4efca14fad0
|
2700f7e1edbcd2518f4978e4cd0e05a4149f91b6 |
|
07-Mar-2014 |
buzbee <buzbee@google.com> |
Continuing register cleanup Ready for review. Continue the process of using RegStorage rather than ints to hold register value in the top layers of codegen. Given the huge number of changes in this CL, I've attempted to minimize the number of actual logic changes. With this CL, the use of ints for registers has largely been eliminated except in the lowest utility levels. "Wide" utility routines have been updated to take a single RegStorage rather than a pair of ints representing low and high registers. Upcoming CLs will be smaller and more targeted. My expectations: o Allocate float double registers as a single double rather than a pair of float single registers. o Refactor to push code which assumes long and double Dalvik values are held in a pair of register to the target dependent layer. o Clean-up of the xxx_mir.h files to reduce the amount of #defines for registers. May also do a register renumbering to bring all of our targets' register naming more consistent. Possibly introduce a target-independent float/non-float test at the RegStorage level. Change-Id: I646de7392bdec94595dd2c6f76e0f1c4331096ff
|
40bbb39b85c063cd6a9f4ab00ff70372370e08cf |
|
19-Mar-2014 |
buzbee <buzbee@google.com> |
Fix Quick compiler "out of registers" There are a few places in the Arm backend that expect to be able to survive on a single temp register - in particular entry code generation and argument passing. However, in the case of a very large frame and floating point ld/st, the existing code could end up using 2 temps. In short, if there is a displacement overflow we try to use indexed load/store instructions (slightly more efficient). However, there are none for floating point - so we ended up burning yet another register to construct a direct pointer. This CL detects this case and doesn't try to use the indexed load/store mechanism for floats. Fix for https://code.google.com/p/android/issues/detail?id=67349 Change-Id: I1ea596ea660e4add89fd4fddb8cbf99a54fbd343
|
60d7a65f7fb60f502160a2e479e86014c7787553 |
|
14-Mar-2014 |
Brian Carlstrom <bdc@google.com> |
Fix stack overflow for mutual recursion. There was an error where we would have a pc that was in the method which generated the stack overflow. This didn't work however because the stack overflow check was before we stored the method in the stack. The result was that the stack overflow handler had a PC which wasnt necessarily in the method at the top of the stack. This is now fixed by always restoring the link register before branching to the throw entrypoint. Slight code size regression on ARM/Mips (unmeasured). Regression on ARM is 4 bytes of code per stack overflow check. Some of this regression is mitigated by having one less GC safepoint. Also adds test case for StackOverflowError issue (from bdc). Tests passing: ARM, X86, Mips Phone booting: ARM Bug: https://code.google.com/p/android/issues/detail?id=66411 Bug: 12967914 Change-Id: I96fe667799458b58d1f86671e051968f7be78d5d (cherry-picked from c0f96d03a1855fda7d94332331b94860404874dd)
|
c0f96d03a1855fda7d94332331b94860404874dd |
|
14-Mar-2014 |
Brian Carlstrom <bdc@google.com> |
Fix stack overflow for mutual recursion. There was an error where we would have a pc that was in the method which generated the stack overflow. This didn't work however because the stack overflow check was before we stored the method in the stack. The result was that the stack overflow handler had a PC which wasnt necessarily in the method at the top of the stack. This is now fixed by always restoring the link register before branching to the throw entrypoint. Slight code size regression on ARM/Mips (unmeasured). Regression on ARM is 4 bytes of code per stack overflow check. Some of this regression is mitigated by having one less GC safepoint. Also adds test case for StackOverflowError issue (from bdc). Tests passing: ARM, X86, Mips Phone booting: ARM Bug: https://code.google.com/p/android/issues/detail?id=66411 Bug: 12967914 Change-Id: I96fe667799458b58d1f86671e051968f7be78d5d
|
0f6784737882199197796b67b99e5f1ded383bee |
|
11-Mar-2014 |
Ian Rogers <irogers@google.com> |
Unify 64bit int constant definitions. LL and ULL prefixes are word size dependent, use the INT64_C and UINT64_C macros instead. Change-Id: I5b70027651898814fc0b3e9e22a18a1047e76cb9
|
dbb8c49d540edd2a39076093163c7218f03aa502 |
|
28-Feb-2014 |
Vladimir Marko <vmarko@google.com> |
Remove non-existent ARM insn kThumb2SubsRRI12. For kOpSub/kOpAdd, prefer modified immediate encodings because they set flags. Change-Id: I41dcd2d43ba1e62120c99eaf9106edc61c41e157
|
2c498d1f28e62e81fbdb477ff93ca7454e7493d7 |
|
30-Jan-2014 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
Specializing x86 range argument copying The ARM implementation of range argument copying was specialized in some cases. For all other architectures, it would fall back to generating memcpy. This patch updates the x86 implementation so it does not call memcpy and instead generates loads and stores, favoring movement of 128-bit chunks. Change-Id: Ic891e5609a4b0e81a47c29cc5a9b301bd10a1933 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
bd288c2c1206bc99fafebfb9120a83f13cf9723b |
|
21-Dec-2013 |
Razvan A Lupusoru <razvan.a.lupusoru@intel.com> |
Add conditional move support to x86 and allow GenMinMax to use it X86 supports conditional moves which is useful for reducing branchiness. This patch adds support to the x86 backend to generate conditional reg to reg operations. Both encoder and decoder support was added for cmov. The x86 version of GenMinMax used for generating inlined version Math.min/max has been updated to make use of the conditional move support. Change-Id: I92c5428e40aa8ff88bd3071619957ac3130efae7 Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
58af1f9385742f70aca4fcb5e13aba53b8be2ef4 |
|
19-Dec-2013 |
Vladimir Marko <vmarko@google.com> |
Clean up usage of carry flag condition codes. On X86, kCondUlt and kCondUge are bound to CS and CC, respectively, while on ARM it's the other way around. The explicit binding in ConditionCode was wrong and misleading and could lead to subtle bugs. Therefore, we detach those constants and clean up usage. The CS and CC conditions are now effectively unused but we keep them around as they may eventually be useful. And some minor cleanup and comments. Change-Id: Ic5ed81d86b6c7f9392dd8fe9474b3ff718fee595
|
2247984899247b1402408d39731ff64048f0e274 |
|
19-Nov-2013 |
Vladimir Marko <vmarko@google.com> |
Clean up kOpCmp on ARM. kThumb2CmnRI8M is now used. Change-Id: I300299258ed99d86c300dee45c904c360dd44638
|
332b7aa6220124dc638b9f7e59611c376473f128 |
|
18-Nov-2013 |
Vladimir Marko <vmarko@google.com> |
Improve Thumb2 instructions' use of constant operands. Rename instructions using modified immediate to use suffix I8M. Many were using I8 which may lead to confusion with Thumb I8 instructions and some were using other suffixes. Add and use CmnRI8M, increase constant range of AddRRI12 and SubRRI12 and use BicRRI8M for applicable kOpAnd constants. In particular, this should marginaly improve Math.abs(float) and Math.abs(double) by converting x & 0x7fffffff to BIC. Bug: 11579369 Change-Id: I0f17a9eb80752d2625730a60555152cdffed50ba
|
7020278bce98a0735dc6abcbd33bdf1ed2634f1d |
|
23-Oct-2013 |
Dave Allison <dallison@google.com> |
Support hardware divide instruction Bug: 11299025 Uses sdiv for division and a combo of sdiv, mul and sub for modulus. Only does this on processors that are capable of the sdiv instruction, as determined by the build system. Also provides a command line arg --instruction-set-features= to allow cross compilation. Makefile adds the --instruction-set-features= arg to build-time dex2oat runs and defaults it to something obtained from the target architecture. Provides a GetInstructionSetFeatures() function on CompilerDriver that can be queried for various features. The only feature supported right now is hasDivideInstruction(). Also adds a few more instructions to the ARM disassembler b/11535253 is an addition to this CL to be done later. Change-Id: Ia8aaf801fd94bc71e476902749cf20f74eba9f68
|
a8b4caf7526b6b66a8ae0826bd52c39c66e3c714 |
|
24-Oct-2013 |
Vladimir Marko <vmarko@google.com> |
Add byte swap instructions for ARM and x86. Change-Id: I03fdd61ffc811ae521141f532b3e04dda566c77d
|
0d82948094d9a198e01aa95f64012bdedd5b6fc9 |
|
12-Oct-2013 |
buzbee <buzbee@google.com> |
64-bit prep Preparation for 64-bit roll. o Eliminated storing pointers in 32-bit int slots in LIR. o General size reductions of common structures to reduce impact of doubled pointer sizes: - BasicBlock struct was 72 bytes, now is 48. - MIR struct was 72 bytes, now is 64. - RegLocation was 12 bytes, now is 8. o Generally replaced uses of BasicBlock* pointers with 16-bit Ids. o Replaced several doubly-linked lists with singly-linked to save one stored pointer per node. o We had quite a few uses of uintptr_t's that were a holdover from the JIT (which used pointers to mapped dex & actual code cache addresses rather than trace-relative offsets). Replaced those with uint32_t's. o Clean up handling of embedded data for switch tables and array data. o Miscellaneous cleanup. I anticipate one or two additional CLs to reduce the size of MIR and LIR structs. Change-Id: I58e426d3f8e5efe64c1146b2823453da99451230
|
409fe94ad529d9334587be80b9f6a3d166805508 |
|
11-Oct-2013 |
buzbee <buzbee@google.com> |
Quick assembler fix This CL re-instates the select pattern optimization disabled by CL 374310, and fixes the underlying problem: improper handling of the kPseudoBarrier LIR opcode. The bug was introduced in the recent assembler restructuring. In short, LIR pseudo opcodes (which have values < 0), should always have size 0 - and thus cause no bits to be emitted during assembly. In this case, bad logic caused us to set the size of a kPseudoBarrier opcode via lookup through the EncodingMap. Because all pseudo ops are < 0, this meant we did an array underflow load, picking up whatever garbage was located before the EncodingMap. This explains why this error showed up recently - we'd previuosly just gotten a lucky layout. This CL corrects the faulty logic, and adds DCHECKs to uses of the EncodingMap to ensure that we don't try to access w/ a pseudo op. Additionally, the existing is_pseudo_op() macro is replaced with IsPseudoLirOp(), named similar to the existing IsPseudoMirOp(). Change-Id: I46761a0275a923d85b545664cadf052e1ab120dc
|
b48819db07f9a0992a72173380c24249d7fc648a |
|
15-Sep-2013 |
buzbee <buzbee@google.com> |
Compile-time tuning: assembly phase Not as much compile-time gain from reworking the assembly phase as I'd hoped, but still worthwhile. Should see ~2% improvement thanks to the assembly rework. On the other hand, expect some huge gains for some application thanks to better detection of large machine-generated init methods. Thinkfree shows a 25% improvement. The major assembly change was to establish thread the LIR nodes that require fixup into a fixup chain. Only those are processed during the final assembly pass(es). This doesn't help for methods which only require a single pass to assemble, but does speed up the larger methods which required multiple assembly passes. Also replaced the block_map_ basic block lookup table (which contained space for a BasicBlock* for each dex instruction unit) with a block id map - cutting its space requirements by half in a 32-bit pointer environment. Changes: o Reduce size of LIR struct by 12.5% (one of the big memory users) o Repurpose the use/def portion of the LIR after optimization complete. o Encode instruction bits to LIR o Thread LIR nodes requiring pc fixup o Change follow-on assembly passes to only consider fixup LIRs o Switch on pc-rel fixup kind o Fast-path for small methods - single pass assembly o Avoid using cb[n]z for null checks (almost always exceed displacement) o Improve detection of large initialization methods. o Rework def/use flag setup. o Remove a sequential search from FindBlock using lookup table of 16-bit block ids rather than full block pointers. o Eliminate pcRelFixup and use fixup kind instead. o Add check for 16-bit overflow on dex offset. Change-Id: I4c6615f83fed46f84629ad6cfe4237205a9562b4
|
468532ea115657709bc32ee498e701a4c71762d4 |
|
05-Aug-2013 |
Ian Rogers <irogers@google.com> |
Entry point clean up. Create set of entry points needed for image methods to avoid fix-up at load time: - interpreter - bridge to interpreter, bridge to compiled code - jni - dlsym lookup - quick - resolution and bridge to interpreter - portable - resolution and bridge to interpreter Fix JNI work around to use JNI work around argument rewriting code that'd been accidentally disabled. Remove abstact method error stub, use interpreter bridge instead. Consolidate trampoline (previously stub) generation in generic helper. Simplify trampolines to jump directly into assembly code, keeps stack crawlable. Dex: replace use of int with ThreadOffset for values that are thread offsets. Tidy entry point routines between interpreter, jni, quick and portable. Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e (cherry picked from commit 848871b4d8481229c32e0d048a9856e5a9a17ef9)
|
848871b4d8481229c32e0d048a9856e5a9a17ef9 |
|
05-Aug-2013 |
Ian Rogers <irogers@google.com> |
Entry point clean up. Create set of entry points needed for image methods to avoid fix-up at load time: - interpreter - bridge to interpreter, bridge to compiled code - jni - dlsym lookup - quick - resolution and bridge to interpreter - portable - resolution and bridge to interpreter Fix JNI work around to use JNI work around argument rewriting code that'd been accidentally disabled. Remove abstact method error stub, use interpreter bridge instead. Consolidate trampoline (previously stub) generation in generic helper. Simplify trampolines to jump directly into assembly code, keeps stack crawlable. Dex: replace use of int with ThreadOffset for values that are thread offsets. Tidy entry point routines between interpreter, jni, quick and portable. Change-Id: I52a7c2bbb1b7e0ff8a3c3100b774212309d0828e
|
7934ac288acfb2552bb0b06ec1f61e5820d924a4 |
|
26-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/comments issues Change-Id: Iae286862c85fb8fd8901eae1204cd6d271d69496
|
6f485c62b9cfce3ab71020c646ab9f48d9d29d6d |
|
19-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/indent issues Change-Id: I7c1647f0c39e1e065ca5820f9b79998691ba40b1
|
9b7085a4e7c40e7fa01932ea1647a4a33ac1c585 |
|
19-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint readability/braces issues Change-Id: I56b88956510077b0e13aad4caee8898313fab55b
|
38f85e4892f6504971bde994fec81fd61780ac30 |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/operators issues Change-Id: I730bd87b476bfa36e93b42e816ef358006b69ba5
|
df62950e7a32031b82360c407d46a37b94188fbb |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/parens issues Change-Id: Ifc678d59a8bed24ffddde5a0e543620b17b0aba9
|
2ce745c06271d5223d57dbf08117b20d5b60694a |
|
18-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Fix cpplint whitespace/braces issues Change-Id: Ide80939faf8e8690d8842dde8133902ac725ed1a
|
7940e44f4517de5e2634a7e07d58d0fb26160513 |
|
12-Jul-2013 |
Brian Carlstrom <bdc@google.com> |
Create separate Android.mk for main build targets The runtime, compiler, dex2oat, and oatdump now are in seperate trees to prevent dependency creep. They can now be individually built without rebuilding the rest of the art projects. dalvikvm and jdwpspy were already this way. Builds in the art directory should behave as before, building everything including tests. Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81
|