History log of /art/runtime/arch/quick_alloc_entrypoints.S
Revision Date Author Comments
854df416f12c48b52239fe163ab8a7fcac4cddd3 27-Jun-2017 Goran Jakovljevic <Goran.Jakovljevic@imgtec.com> MIPS: TLAB allocation entrypoints

Add fast paths for TLAB allocation entrypoints for MIPS32 and MIPS64.
Also improve rosalloc entrypoints.

Note: All tests are executed on CI20 (MIPS32R2) and in QEMU (MIPS32R6
and MIPS64R6), with and without ART_TEST_DEBUG_GC=true.

Test: ./testrunner.py --optimizing --target
Test: mma test-art-target-gtest
Test: mma test-art-host-gtest

Change-Id: I92195d2d318b26a19afc5ac46a1844b13b2d5191
97c46466aea25ab63a99b3d1afc558f0d9f55abb 11-May-2017 Roland Levillain <rpl@google.com> Introduce a Marking Register in ARM64 code generation.

When generating code for ARM64, maintain the status of
Thread::Current()->GetIsGcMarking() in register X20,
dubbed MR (Marking Register), and check the value of that
register (instead of loading and checking a read barrier
marking entrypoint) in read barriers.

Test: m test-art-target
Test: m test-art-target with tree built with ART_USE_READ_BARRIER=false
Test: ARM64 device boot test
Bug: 37707231
Change-Id: Ibe9bc5c99a2176b0a0476e9e9ad7fcc9f745017b
d09584456559f669f5999fb1ff32aa89ebf6ef4e 30-Jan-2017 Nicolas Geoffray <ngeoffray@google.com> Align allocation entrypoints implementation between arm/arm64/x86/x64.

x64:
- Add art_quick_alloc_initialized_rosalloc

x86:
- Add art_quick_alloc_initialized_rosalloc
- Add art_quick_alloc_initialized{_region}_tlab
- Add art_quick_alloc_array_resolved{8,16,32,64}{_region}_tlab

arm32:
- Add art_quick_alloc_initialized_rosalloc
- Add art_quick_alloc_initialized{_region}_tlab
- Add art_quick_alloc_array_resolved{8,16,32,64}{_region}_tlab

arm64:
- Add art_quick_alloc_initialized_rosalloc
- Add art_quick_alloc_initialized{_region}_tlab
- Add art_quick_alloc_array_resolved{8,16,32,64}_tlab

Test: test-art-target test-art-host
bug: 30933338

Change-Id: I0dd8667a2921dd0b3403bea5d05304ba5d40627f
b048cb74b742b03eb6dd5f1d6dd49e559f730b36 23-Jan-2017 Nicolas Geoffray <ngeoffray@google.com> Add per array size allocation entrypoints.

- Update architectures that have fast paths for
array allocation to use it.
- Will add more fast paths in follow-up CLs.
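
As a rough illustration of why a per-size entrypoint helps, the sketch below bakes the component size into each function as a shift, so the fast path does not have to read it from the class. The header size, alignment, and all function names here are assumptions made for the example, not values taken from ART.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative constants only; not ART's real layout.
constexpr std::size_t kObjectAlignment = 8;
constexpr std::size_t kArrayHeaderBytes = 16;

constexpr std::size_t RoundUp(std::size_t n, std::size_t align) {
  return (n + align - 1) & ~(align - 1);
}

// The component size is a compile-time shift: 0, 1, 2, 3 for 8-, 16-, 32-,
// and 64-bit components respectively.
template <unsigned kComponentSizeShift>
constexpr std::size_t ArrayAllocSize(std::size_t length) {
  return RoundUp(kArrayHeaderBytes + (length << kComponentSizeShift),
                 kObjectAlignment);
}

// One hypothetical entrypoint per component size.
std::size_t AllocArrayResolved8(std::size_t length) { return ArrayAllocSize<0>(length); }
std::size_t AllocArrayResolved16(std::size_t length) { return ArrayAllocSize<1>(length); }
std::size_t AllocArrayResolved32(std::size_t length) { return ArrayAllocSize<2>(length); }
std::size_t AllocArrayResolved64(std::size_t length) { return ArrayAllocSize<3>(length); }
```

Under these assumed constants, a 3-element array needs 24, 24, 32, and 40 bytes for the four component sizes.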

Test: test-art-target test-art-host.
Change-Id: I138cccd16464a85de22a8ed31c915f876e78fb04
8d91ac31ccb92557e434d89ffade3372466e1af5 18-Jan-2017 Nicolas Geoffray <ngeoffray@google.com> Remove unused array entrypoints.

Test: test-art-host test-art-target
Change-Id: I910d1c912c7c9056ecea0e1e7da7afb2a7220dfa
e761bccf9f0d884cc4d4ec104568cef968296492 19-Jan-2017 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Load the array class in the compiler for allocations.""

This reverts commit fee255039e30c1c3dfc70c426c3d176221c3cdf9.

Change-Id: I02b45f9a659d872feeb35df40b42c1be9878413a
fee255039e30c1c3dfc70c426c3d176221c3cdf9 19-Jan-2017 Hiroshi Yamauchi <yamauchi@google.com> Revert "Load the array class in the compiler for allocations."

libcore test fails.

This reverts commit cc99df230feb46ba717252f002d0cc2da6828421.

Change-Id: I5bac595acd2b240886062e8c1f11f9095ff6a9ed
cc99df230feb46ba717252f002d0cc2da6828421 18-Jan-2017 Nicolas Geoffray <ngeoffray@google.com> Load the array class in the compiler for allocations.

Removing one other dependency for needing to pass
the current method, and having dex_cache_resolved_types_
in ArtMethod.

oat file increase:
- x64: 0.25%
- arm32: 0.30%
- x86: 0.28%

test: test-art-host, test-art-target
Change-Id: Ibca4fa00d3e31954db2ccb1f65a584b8c67cb230
39cee66a8ddf0254626c9591662cf87e4a1cedc4 13-Jan-2017 Nicolas Geoffray <ngeoffray@google.com> Entrypoints cleanup.

Remove unused ones to facilitate the transition to compressed
dex caches.

test: test-art-host, test-art-target
Change-Id: I1d1cb0daffa86dd9dda2eaa3c1ea3650a5c8d9d0
0d3998b5ff619364acf47bec0b541e7a49bd6fe7 12-Jan-2017 Nicolas Geoffray <ngeoffray@google.com> Revert "Revert "Make object allocation entrypoints only take a class.""

This reverts commit f7aaacd97881c6924b8212c7f8fe4a4c8721ef53.

Change-Id: I6756cd1e6110bb45231f62f5e388f16c044cb145
f7aaacd97881c6924b8212c7f8fe4a4c8721ef53 12-Jan-2017 Hiroshi Yamauchi <yamauchi@google.com> Revert "Make object allocation entrypoints only take a class."

960-default-smali64 is failing.

This reverts commit 2b615ba29c4dfcf54aaf44955f2eac60f5080b2e.

Change-Id: Iebb8ee5a917fa84c5f01660ce432798524d078ef
2b615ba29c4dfcf54aaf44955f2eac60f5080b2e 06-Jan-2017 Nicolas Geoffray <ngeoffray@google.com> Make object allocation entrypoints only take a class.

Change motivated by:
- Dex cache compression: having the allocation fast path do a
dex cache lookup will be too expensive. So instead, rely on the
compiler having direct access to the class (either through BSS for
AOT, or JIT tables for JIT).
- Inlining: the entrypoints relied on the caller of the allocation to
have the same dex cache as the outer method (stored at the bottom of
the stack). This meant we could not inline methods from a different
dex file that do allocations. By avoiding the dex cache lookup in
the entrypoint, we can now remove this restriction.
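
A minimal sketch of the two entrypoint shapes described above, using simplified stand-in types. Class, Object, DexCache, Method, and both function names are hypothetical and only model the idea: the old shape depends on the referrer's dex cache, while the new shape receives the class directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Simplified stand-ins; not ART's real types.
struct Class { std::size_t object_size; };
struct Object { const Class* klass; };
struct DexCache { std::vector<const Class*> resolved_types; };
struct Method { const DexCache* dex_cache; };

// New shape: the compiler hands over the class, so the entrypoint needs no
// knowledge of the calling method or its dex cache.
std::unique_ptr<Object> AllocObjectFromClass(const Class* klass) {
  assert(klass != nullptr);
  return std::unique_ptr<Object>(new Object{klass});
}

// Old shape: the type index is only meaningful relative to the referrer's
// dex cache, which is why inlining across dex files was a problem.
std::unique_ptr<Object> AllocObjectFromTypeIndex(std::uint32_t type_idx,
                                                 const Method& referrer) {
  const Class* klass = referrer.dex_cache->resolved_types.at(type_idx);
  return AllocObjectFromClass(klass);
}
```

In this model, a method inlined from another dex file would pass a type index that the outer method's dex cache cannot interpret, whereas passing the class avoids the lookup entirely.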

Code expansion on average for Docs/Gms/FB/Framework (go/lem numbers):
- Around 0.8% on arm64
- Around 1% for x64, arm
- Around 1.5% on x86

Test: test-art-host, test-art-target, ART_USE_READ_BARRIER=true/false
Test: test-art-host, test-art-target, ART_DEFAULT_GC_TYPE=SS ART_USE_TLAB=true

Change-Id: I41f3748bb4d251996aaf6a90fae4c50176f9295f
5ace201d84adb7753680bf4c7877b3b71558da82 30-Nov-2016 Mathieu Chartier <mathieuc@google.com> Revert "Revert CC related changes."

Disable entrypoint switching in ResetQuickAllocEntryPointsForThread
instead of callers. Fixes bug where instrumentation would switch
to non CC entrypoints for non X86_64 architectures causing aborts.

Bug: 31018974

Test: test-art-host
Test: test/run-test 099

This reverts commit 96172e0172c5fca6e9a5ad4b857a24d8c7b064e5.

Change-Id: If206694ae35ff4446c6a8a97bfbcbf2dac35e3f9
96172e0172c5fca6e9a5ad4b857a24d8c7b064e5 30-Nov-2016 Nicolas Geoffray <ngeoffray@google.com> Revert CC related changes.

Revert: "X86_64: Add allocation entrypoint switching for CC is_marking"
Revert: "Fix mips build in InitEntryPoints"
Revert: "Fix mac build in ResetQuickAllocEntryPoints"

Test: test-art-target-run-test
Change-Id: If38d44edf8c5def5c4d8c9419e4af0cd8d3be724
f5de23265360e15fcfceb7d07bdadca0e5bb5f0a 16-Nov-2016 Mathieu Chartier <mathieuc@google.com> X86_64: Add allocation entrypoint switching for CC is_marking

Only X86_64 done so far. Use normal TLAB allocators if GC is not
marking.

Allocation speed goes up by ~8% based on perf sampling.

Without change:
1.19%: art_quick_alloc_object_region_tlab

With change:
0.63%: art_quick_alloc_object_tlab
0.47%: art_quick_alloc_object_region_tlab

Bug: 31018974
Bug: 12687968

Test: test-art-host-run-test

Change-Id: I4c4d9eb229d4ad2f41b856ba5c2958a5eb3b7ffa
8261d02f9523b95013108f271b82bb157ef6f71d 08-Aug-2016 Mathieu Chartier <mathieuc@google.com> Revert "Revert "ARM64 asm for region space array allocation""

Also added missing large object check. No regression from the check:
N6P CC EAAC time is 1313 for 10 samples vs 1314 before the reverts.

Bug: 30162165
Bug: 12687968

Test: test-art-target with CC + heap poisoning

This reverts commit 6ae7f3a4541e70f04243a6fe469aa3bd51e16d79.

Change-Id: Ie28f652f619898d7d37eeebf3f31a88af8fac949
6ae7f3a4541e70f04243a6fe469aa3bd51e16d79 08-Aug-2016 Roland Levillain <rpl@google.com> Revert "ARM64 asm for region space array allocation"

This change breaks many tests on the ARM64 concurrent
collector configuration.

Bug: 30162165
Bug: 12687968

This reverts commit f686c3feabe3519bedd1f3001e5dd598f46946ef.

Change-Id: I5d7ef5fa2ffb6a8d9a4d3adbcc14854efa257313
f686c3feabe3519bedd1f3001e5dd598f46946ef 04-Aug-2016 Mathieu Chartier <mathieuc@google.com> ARM64 asm for region space array allocation

Wrote region space tlab array and array resolved allocators in
assembly code. The speedup is a combined increase from checking the
mark bit and having an assembly fast path.

Added resolved, initialized entrypoints for object region TLAB
allocator.

N6P (960000 kHz) EAAC benchmark (average of 50 samples):
CC 1442.309524 -> 1314 (10% improvement)
CMS: 1382.32

Read barrier slow paths reaching C++ code go from 5M to 2.5M.

Bug: 30162165
Bug: 12687968

Test: With CC: N6P boot, run EAAC, test-art-target

Change-Id: I51515b11ef3f795f57eb72fe0f5759618fef5084
10d4c08c0ea9df0a85a11e1c77974df24078c0ec 24-Feb-2016 Hiroshi Yamauchi <yamauchi@google.com> Assembly region TLAB allocation fast path for arm.

This is for the CC collector.

Share the common fast path code with the tlab fast path code.

Speedup (on N5):
BinaryTrees: 2291 -> 902 ms (-60%)
MemAllocTest: 2137 -> 1845 ms (-14%)

Bug: 9986565
Bug: 12687968

Change-Id: Ica63094ec2f85eaa4fd04d202a20090399275d85
dc412b6f49a65774b7af654f65cbff619cb7d85a 15-Oct-2015 Hiroshi Yamauchi <yamauchi@google.com> Revert "Revert "Implement rosalloc fast path in assembly for 32 bit arm.""

With a heap poisoning fix.

This reverts commit cf91c7d973f3b2f491abc61d47c141782c96d46e.

Bug: 9986565
Change-Id: Ia72edbde65ef6119e1931a77cc4c595a0b80ce31
cf91c7d973f3b2f491abc61d47c141782c96d46e 15-Oct-2015 Nicolas Geoffray <ngeoffray@google.com> Revert "Implement rosalloc fast path in assembly for 32 bit arm."

Tentative. Looks like heap poisoning breaks with this change.

bug: 9986565

This reverts commit e6316940db61faead36f9642cce137d41fc8f606.

Change-Id: I5c63758221464fe319315f40ae79c656048faed0
e6316940db61faead36f9642cce137d41fc8f606 08-Oct-2015 Hiroshi Yamauchi <yamauchi@google.com> Implement rosalloc fast path in assembly for 32 bit arm.

Measurements (N5, ms)
BinaryTrees: 1702 -> 987 (-42%)
MemAllocTest: 2480 -> 2270 (-8%)

Bug: 9986565

Change-Id: I460af3626ad724078463d27cf74a94b7ff7468c5
4adeab196d160f70b4865fb8be048ddd2ac7ab82 03-Oct-2015 Hiroshi Yamauchi <yamauchi@google.com> Refactor the alloc entry point generation code.

Move the x86/x86-64 specific alloc entrypoint generation code to a macro
GENERATE_ALLOC_ENTRYPOINTS_FOR_EACH_ALLOCATOR in a common file to remove
duplication.
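
The general technique of generating one entrypoint per allocator from a single list can be sketched with an X-macro. This is written in C++ rather than assembler macros, and FOR_EACH_ALLOCATOR, the allocator list, and the generated function names are all hypothetical stand-ins, not the contents of the real macro.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical allocator list; adding a line here adds an entrypoint.
#define FOR_EACH_ALLOCATOR(V) \
  V(dlmalloc)                 \
  V(rosalloc)                 \
  V(tlab)

// Stamp out one function per allocator. Each returns its suffix so the
// expansion can be observed.
#define DEFINE_ALLOC_OBJECT(suffix) \
  std::string alloc_object_##suffix() { return #suffix; }
FOR_EACH_ALLOCATOR(DEFINE_ALLOC_OBJECT)
#undef DEFINE_ALLOC_OBJECT

// The same list can drive other tables, here the list of generated names.
std::vector<std::string> EntrypointNames() {
  std::vector<std::string> names;
#define ADD_NAME(suffix) names.push_back("alloc_object_" #suffix);
  FOR_EACH_ALLOCATOR(ADD_NAME)
#undef ADD_NAME
  return names;
}
```

A hand-written fast path for one allocator could then replace just that allocator's generated definition, which is the kind of selective override the refactoring aims to make easy.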

This will make it easier to selectively add more hand-written assembly
allocation fast path code.

Rename RETURN_IF_RESULT_IS_NON_ZERO to
RETURN_IF_RESULT_IS_NON_ZERO_OR_DELIVER in the x86/x86_64 files to match
the other architectures.

Bug: 9986565
Change-Id: I56f33b790f94db68891db8a2f42e9231d1770eef
69bdcb29fdbd8266374e3793cb4e28dcc5daf0f9 28-Apr-2015 Jeff Hao <jeffhao@google.com> Fix java_lang_Class newInstance for strings; also quick entrypoints.

Change-Id: I35fd23c5a9051e1ffda0ecc2cbafb5d318c7b5e6
848f70a3d73833fc1bf3032a9ff6812e429661d9 15-Jan-2014 Jeff Hao <jeffhao@google.com> Replace String CharArray with internal uint16_t array.

Summary of high level changes:
- Adds compiler inliner support to identify string init methods
- Adds compiler support (quick & optimizing) with new invoke code path
that calls method off the thread pointer
- Adds thread entrypoints for all string init methods
- Adds map to verifier to log when receiver of string init has been
copied to other registers. Used by the compiler and interpreter.

Change-Id: I797b992a8feb566f9ad73060011ab6f51eb7ce01
2cd334ae2d4287216523882f0d298cf3901b7ab1 09-Jan-2015 Hiroshi Yamauchi <yamauchi@google.com> More of the concurrent copying collector.

Bug: 12687968
Change-Id: I62f70274d47df6d6cab714df95c518b750ce3105
1cc7dbabd03e0a6c09d68161417a21bd6f9df371 18-Dec-2014 Andreas Gampe <agampe@google.com> ART: Reorder entrypoint argument order

Shuffle the ArtMethod* referrer backwards for easier removal.

Clean up ARM & MIPS assembly code.

Change some macros to make future changes easier.

Change-Id: Ie2862b68bd6e519438e83eecd9e1611df51d7945
bb8f0ab736b61db8f543e433859272e83f96ee9b 28-Jan-2014 Hiroshi Yamauchi <yamauchi@google.com> Embed array class pointers at array allocation sites.

Following https://android-review.googlesource.com/#/c/79302, embed
array class pointers at array allocation sites in the compiled code.

Change-Id: I67a1292466dfbb7f48e746e5060e992dd93525c5
be1ca55db3362f5b100c4c65da5342fd299520bb 15-Jan-2014 Hiroshi Yamauchi <yamauchi@google.com> Use direct class pointers at allocation sites in the compiled code.

- Rather than looking up a class from its type ID (and checking if
it's resolved/initialized, resolving/initializing if not), use
direct class pointers, if possible (boot-code-to-boot-class pointers
and app-code-to-boot-class pointers.)
- This results in a 1-2% speedup in Ritz MemAllocTest on Nexus 4.
- Embedding the object size (along with class pointers) caused a 1-2%
slowdown in MemAllocTest and isn't implemented in this change.
- TODO: do the same for array allocations.
- TODO: when/if an application gets its own image, implement
app-code-to-app-class pointers.
- Fix a -XX:gc bug.
cf. https://android-review.googlesource.com/79460/
- Add /tmp/android-data/dalvik-cache to the list of locations to
remove oat files in clean-oat-host.
cf. https://android-review.googlesource.com/79550
- Add back a dropped UNLIKELY in FindMethodFromCode().
cf. https://android-review.googlesource.com/74205

Bug: 9986565
Change-Id: I590b96bd21f7a7472f88e36752e675547559a5b1
e6da9af8dfe0a3e3fbc2be700554f6478380e7b9 16-Dec-2013 Mathieu Chartier <mathieuc@google.com> Background compaction support.

When the process state changes to a state which does not perceive
jank, we copy from the main free-list backed allocation space to
the bump pointer space and enable the semispace allocator.

When we transition back to foreground, we copy back to a free-list
backed space.

Create a separate non-moving space which only holds non-movable
objects. This enables us to quickly wipe the current alloc space
(DlMalloc / RosAlloc) when we transition to background.

Added multiple alloc space support to the sticky mark sweep GC.

Added a -XX:BackgroundGC option which lets you specify
which GC to use for background apps. Passing in
-XX:BackgroundGC=SS makes the heap compact the heap for apps which
do not perceive jank.
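
The selection logic can be pictured roughly as follows. This is a sketch under assumptions: the enum values, HeapConfig, and both functions are hypothetical names, and the real heap does far more work during a transition than this comparison suggests.

```cpp
#include <cassert>

// Hypothetical names modelling the two process states and two collectors.
enum class ProcessState { kJankPerceptible, kJankImperceptible };
enum class CollectorType { kCMS, kSS };

struct HeapConfig {
  CollectorType foreground_collector;
  CollectorType background_collector;  // e.g. chosen via -XX:BackgroundGC
};

// Picks the collector for a given process state.
CollectorType CollectorFor(const HeapConfig& config, ProcessState state) {
  return state == ProcessState::kJankPerceptible ? config.foreground_collector
                                                 : config.background_collector;
}

// True when a state change switches collectors, which is when objects would
// have to be copied between the free-list space and the bump pointer space.
bool NeedsSpaceTransition(const HeapConfig& config, ProcessState from,
                          ProcessState to) {
  return CollectorFor(config, from) != CollectorFor(config, to);
}
```

With the background collector set to SS, moving to a state that does not perceive jank triggers a transition; with both collectors equal, nothing is copied.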

Results:
Simple background foreground test:
0. Reboot phone, unlock.
1. Open browser, click on home.
2. Open calculator, click on home.
3. Open calendar, click on home.
4. Open camera, click on home.
5. Open clock, click on home.
6. adb shell dumpsys meminfo

PSS Normal ART:
Sample 1:
88468 kB: Dalvik
3188 kB: Dalvik Other
Sample 2:
81125 kB: Dalvik
3080 kB: Dalvik Other

PSS Dalvik:
Total PSS by category:
Sample 1:
81033 kB: Dalvik
27787 kB: Dalvik Other
Sample 2:
81901 kB: Dalvik
28869 kB: Dalvik Other

PSS ART + Background Compaction:
Sample 1:
71014 kB: Dalvik
1412 kB: Dalvik Other
Sample 2:
73859 kB: Dalvik
1400 kB: Dalvik Other

Dalvik other reduction can be explained by less deep allocation
stacks / less live bitmaps / less dirty cards.

TODO improvements: Recycle mem-maps which are unused in the current
state. Avoid hardcoding the 64 MB capacity of the non-moving space (to avoid
bringing back linear alloc nightmares). Figure out ways to deal with low
virtual address memory problems.

Bug: 8981901

Change-Id: Ib235d03f45548ffc08a06b8ae57bf5bada49d6f3
692fafd9778141fa6ef0048c9569abd7ee0253bf 30-Nov-2013 Mathieu Chartier <mathieuc@google.com> Thread local bump pointer allocator.

Added a thread local allocator to the heap, each thread has three
pointers which specify the thread local buffer: start, cur, and
end. When the remaining space in the thread local buffer isn't large
enough for the allocation, the allocator allocates a new thread
local buffer using the bump pointer allocator.
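
The fast path and refill described above can be sketched as follows. This is a simplified model, not the ART implementation: the class and function names, the 1024-byte buffer size, and the 8-byte alignment are assumptions for the example, and thread safety of the shared space is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative constants; not ART's real values.
constexpr std::size_t kAlignment = 8;
constexpr std::size_t kTlabSize = 1024;

inline std::size_t RoundUp(std::size_t n, std::size_t align) {
  return (n + align - 1) & ~(align - 1);
}

// Shared space that hands out memory by bumping a single offset.
class BumpPointerSpace {
 public:
  explicit BumpPointerSpace(std::size_t capacity) : storage_(capacity), pos_(0) {}

  // Returns nullptr when the space cannot satisfy the request.
  std::uint8_t* Alloc(std::size_t bytes) {
    bytes = RoundUp(bytes, kAlignment);
    if (bytes > storage_.size() - pos_) return nullptr;
    std::uint8_t* result = storage_.data() + pos_;
    pos_ += bytes;
    return result;
  }

 private:
  std::vector<std::uint8_t> storage_;
  std::size_t pos_;
};

// Per-thread state: the three pointers named in the message.
struct ThreadLocalBuffer {
  std::uint8_t* start = nullptr;
  std::uint8_t* cur = nullptr;
  std::uint8_t* end = nullptr;
};

std::uint8_t* AllocFromTlab(ThreadLocalBuffer& tlab, BumpPointerSpace& space,
                            std::size_t bytes) {
  bytes = RoundUp(bytes, kAlignment);
  if (bytes > static_cast<std::size_t>(tlab.end - tlab.cur)) {
    // Slow path: the remaining space is too small, so take a fresh buffer
    // from the shared bump pointer space.
    std::size_t new_size = bytes > kTlabSize ? bytes : kTlabSize;
    std::uint8_t* block = space.Alloc(new_size);
    if (block == nullptr) return nullptr;
    tlab.start = block;
    tlab.cur = block;
    tlab.end = block + new_size;
  }
  // Fast path: bump the thread-local cursor; no shared state is touched.
  std::uint8_t* result = tlab.cur;
  tlab.cur += bytes;
  return result;
}
```

Note that the unused tail of the old buffer is simply abandoned on refill, which is the wasted memory the message mentions.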

The bump pointer space had to be modified to accommodate thread
local buffers. These buffers are called "blocks", where a block
is a buffer which contains a set of adjacent objects. Blocks
aren't necessarily full and may have wasted memory towards the
end. Blocks have an 8 byte header which specifies their size and is
required for traversing bump pointer spaces.
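
To illustrate why a size header makes traversal possible, the sketch below hops from header to header across a byte buffer. The header layout (a 32-bit size plus padding), the helper names, and the use of a std::vector as the space are assumptions for the example; only the 8-byte header size comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed layout: only the total of 8 bytes is taken from the message.
struct BlockHeader {
  std::uint32_t size;    // payload bytes that follow the header
  std::uint32_t unused;  // padding up to 8 bytes
};
static_assert(sizeof(BlockHeader) == 8, "header must be 8 bytes");

// Appends a block and returns the offset of its payload.
std::size_t AppendBlock(std::vector<std::uint8_t>& space,
                        std::uint32_t payload_size) {
  BlockHeader header{payload_size, 0};
  std::size_t header_offset = space.size();
  space.resize(header_offset + sizeof(header) + payload_size);
  std::memcpy(space.data() + header_offset, &header, sizeof(header));
  return header_offset + sizeof(header);
}

// Visits every block by reading each header and skipping its payload, even
// when the payload is only partly filled with objects.
std::vector<std::uint32_t> WalkBlocks(const std::vector<std::uint8_t>& space) {
  std::vector<std::uint32_t> sizes;
  std::size_t offset = 0;
  while (offset + sizeof(BlockHeader) <= space.size()) {
    BlockHeader header;
    std::memcpy(&header, space.data() + offset, sizeof(header));
    sizes.push_back(header.size);
    offset += sizeof(header) + header.size;
  }
  return sizes;
}
```

Without the recorded size, a walker reaching the unused tail of a block would have no way to find where the next block begins.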

Memory usage is in between full bump pointer and ROSAlloc since
madvised memory limits wasted ram to an average of 1/2 page per
block.

Added a runtime option -XX:UseTLAB which specifies whether or
not to use the thread local allocator. It's a NOP if the garbage
collector is not the semispace collector.

TODO: Smarter block accounting to prevent us reading objects until
we either hit the end of the block or GetClass() == null which
signifies that the block isn't 100% full. This would provide a
slight speedup to BumpPointerSpace::Walk.

Timings: -XX:HeapMinFree=4m -XX:HeapMaxFree=8m -Xmx48m
ritzperf memalloc:
Dalvik -Xgc:concurrent: 11678
Dalvik -Xgc:noconcurrent: 6697
-Xgc:MS: 5978
-Xgc:SS: 4271
-Xgc:CMS: 4150
-Xgc:SS -XX:UseTLAB: 3255

Bug: 9986565
Bug: 12042213

Change-Id: Ib7e1d4b199a8199f3b1de94b0a7b6e1730689cad
7410f29b4dae223befac036ea567d7f33351dad1 24-Nov-2013 Mathieu Chartier <mathieuc@google.com> Fix dumpsys meminfo <pid>.

Added a case for BumpPointerSpaces. Confirmed working non-debug.
Should also work in debug builds.

Bug: 11830794
Change-Id: I12053ff16eec403dcd4a780e13095e3212a77132
cbb2d20bea2861f244da2e2318d8c088300a3710 15-Nov-2013 Mathieu Chartier <mathieuc@google.com> Refactor allocation entrypoints.

Adds support for switching entrypoints during runtime. Enables
addition of new allocators without requiring significant copy
paste. Slight speedup on ritzperf probably due to more inlining.
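
Switching entrypoints at runtime amounts to rewriting a table of function pointers. The sketch below shows the idea with a single slot; the struct, enum, and function names are hypothetical, and the real table holds many more entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

enum class AllocatorType { kRosAlloc, kTlab };

// Each stand-in entrypoint returns a tag so the effect of a switch is visible.
const char* AllocObjectRosAlloc(std::size_t /*bytes*/) { return "rosalloc"; }
const char* AllocObjectTlab(std::size_t /*bytes*/) { return "tlab"; }

// Table consulted by compiled code; one slot shown for brevity.
struct QuickAllocEntryPoints {
  const char* (*alloc_object)(std::size_t bytes);
};

// Repoints the table at the implementation for the chosen allocator.
void SetAllocEntryPoints(QuickAllocEntryPoints& entry_points,
                         AllocatorType type) {
  switch (type) {
    case AllocatorType::kRosAlloc:
      entry_points.alloc_object = AllocObjectRosAlloc;
      break;
    case AllocatorType::kTlab:
      entry_points.alloc_object = AllocObjectTlab;
      break;
  }
}
```

Callers keep invoking the same slot, so adding an allocator means adding one implementation and one case rather than duplicating every call site.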

TODO: Ensure that the entire allocation path is inlined so
that the switch statement in the allocation code is optimized
out.

Rosalloc measurements:
4583
4453
4439
4434
4751

After change:
4184
4287
4131
4335
4097

Change-Id: I1352a3cbcdf6dae93921582726324d91312df5c9