854df416f12c48b52239fe163ab8a7fcac4cddd3 |
|
27-Jun-2017 |
Goran Jakovljevic <Goran.Jakovljevic@imgtec.com> |
MIPS: TLAB allocation entrypoints

Add fast paths for TLAB allocation entrypoints for MIPS32 and MIPS64. Also improve rosalloc entrypoints.

Note: All tests are executed on CI20 (MIPS32R2) and in QEMU (MIPS32R6 and MIPS64R6), with and without ART_TEST_DEBUG_GC=true.

Test: ./testrunner.py --optimizing --target
Test: mma test-art-target-gtest
Test: mma test-art-host-gtest
Change-Id: I92195d2d318b26a19afc5ac46a1844b13b2d5191
|
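The TLAB fast path the MIPS commit above adds can be sketched as a bump-pointer allocation that falls back to a runtime call when the thread-local buffer is exhausted. This is a minimal sketch; `TlabState`, its field names, and `TlabAllocFast` are illustrative, not ART's actual `Thread` layout or entrypoint names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-thread TLAB state (illustrative names).
struct TlabState {
  uintptr_t pos;  // current bump pointer
  uintptr_t end;  // end of the thread-local buffer
};

// Fast path: bump-allocate from the TLAB. Returns 0 to signal the
// slow path (a runtime call) when the buffer has insufficient space.
uintptr_t TlabAllocFast(TlabState* tlab, size_t byte_count) {
  size_t aligned = (byte_count + 7u) & ~size_t{7};  // 8-byte alignment
  if (tlab->end - tlab->pos < aligned) {
    return 0;  // not enough space: take the slow path
  }
  uintptr_t result = tlab->pos;
  tlab->pos += aligned;
  return result;
}
```

The real entrypoints do this in a few assembly instructions per architecture, which is why each ISA (here MIPS32/MIPS64) needs its own hand-written fast path.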
97c46466aea25ab63a99b3d1afc558f0d9f55abb |
|
11-May-2017 |
Roland Levillain <rpl@google.com> |
Introduce a Marking Register in ARM64 code generation.

When generating code for ARM64, maintain the status of Thread::Current()->GetIsGcMarking() in register X20, dubbed MR (Marking Register), and check the value of that register (instead of loading and checking a read barrier marking entrypoint) in read barriers.

Test: m test-art-target
Test: m test-art-target with tree built with ART_USE_READ_BARRIER=false
Test: ARM64 device boot test
Bug: 37707231
Change-Id: Ibe9bc5c99a2176b0a0476e9e9ad7fcc9f745017b
|
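The effect of the marking-register change above can be sketched as follows. The point is that the read-barrier check becomes a test of a value already held in a reserved register, rather than a load of the marking entrypoint from the thread followed by a test. All names here are illustrative stand-ins, not ART's real code.

```cpp
#include <cassert>

// Illustrative stand-in for the thread (not ART's real layout).
struct Thread {
  bool is_gc_marking = false;  // mirrors GetIsGcMarking()
};

// Stand-in for the reserved register (X20 / MR on ARM64). The
// compiler keeps it in sync with Thread::GetIsGcMarking().
static bool marking_register = false;
static int slow_path_calls = 0;

void UpdateMarkingRegister(const Thread* self) {
  marking_register = self->is_gc_marking;
}

// Read-barrier check: a single register test; only when MR is set
// does control reach the (simulated) mark entrypoint.
int ReadBarrierMark(int ref) {
  if (marking_register) {
    ++slow_path_calls;  // would tail-call the mark entrypoint here
  }
  return ref;
}
```

When the GC is not marking (the common case), the barrier costs one register test instead of a memory load plus a test.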
d09584456559f669f5999fb1ff32aa89ebf6ef4e |
|
30-Jan-2017 |
Nicolas Geoffray <ngeoffray@google.com> |
Align allocation entrypoints implementation between arm/arm64/x86/x64.

x64:
- Add art_quick_alloc_initialized_rosalloc
x86:
- Add art_quick_alloc_initialized_rosalloc
- Add art_quick_alloc_initialized{_region}_tlab
- Add art_quick_alloc_array_resolved{8,16,32,64}{_region}_tlab
arm32:
- Add art_quick_alloc_initialized_rosalloc
- Add art_quick_alloc_initialized{_region}_tlab
- Add art_quick_alloc_array_resolved{8,16,32,64}{_region}_tlab
arm64:
- Add art_quick_alloc_initialized_rosalloc
- Add art_quick_alloc_initialized{_region}_tlab
- Add art_quick_alloc_array_resolved{8,16,32,64}_tlab

Test: test-art-target test-art-host
bug: 30933338
Change-Id: I0dd8667a2921dd0b3403bea5d05304ba5d40627f
|
b048cb74b742b03eb6dd5f1d6dd49e559f730b36 |
|
23-Jan-2017 |
Nicolas Geoffray <ngeoffray@google.com> |
Add per array size allocation entrypoints.

- Update architectures that have fast paths for array allocation to use it.
- Will add more fast paths in follow-up CLs.

Test: test-art-target test-art-host.
Change-Id: I138cccd16464a85de22a8ed31c915f876e78fb04
|
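The motivation for per-size entrypoints, as in the `art_quick_alloc_array_resolved{8,16,32,64}` names above, is that a fast path specialized for one element width can compute the allocation size with a fixed shift instead of loading the component size from the class. A sketch of the size computation, with a hypothetical header size and helper names (not ART's real constants):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical array header size, for illustration only.
constexpr size_t kHeaderSize = 12;

// size = header + (count << shift), rounded up to 8-byte alignment.
template <size_t kShift>
size_t ArrayAllocSize(size_t component_count) {
  return (kHeaderSize + (component_count << kShift) + 7u) & ~size_t{7};
}

// One entrypoint per element width: the shift is a compile-time
// constant, so no per-call load of the component size is needed.
size_t AllocArray8(size_t n)  { return ArrayAllocSize<0>(n); }  // byte[]
size_t AllocArray16(size_t n) { return ArrayAllocSize<1>(n); }  // char[]/short[]
size_t AllocArray32(size_t n) { return ArrayAllocSize<2>(n); }  // int[]/Object[]
size_t AllocArray64(size_t n) { return ArrayAllocSize<3>(n); }  // long[]/double[]
```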
8d91ac31ccb92557e434d89ffade3372466e1af5 |
|
18-Jan-2017 |
Nicolas Geoffray <ngeoffray@google.com> |
Remove unused array entrypoints. Test: test-art-host test-art-target Change-Id: I910d1c912c7c9056ecea0e1e7da7afb2a7220dfa
|
e761bccf9f0d884cc4d4ec104568cef968296492 |
|
19-Jan-2017 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Revert "Load the array class in the compiler for allocations."" This reverts commit fee255039e30c1c3dfc70c426c3d176221c3cdf9. Change-Id: I02b45f9a659d872feeb35df40b42c1be9878413a
|
fee255039e30c1c3dfc70c426c3d176221c3cdf9 |
|
19-Jan-2017 |
Hiroshi Yamauchi <yamauchi@google.com> |
Revert "Load the array class in the compiler for allocations." libcore test fails. This reverts commit cc99df230feb46ba717252f002d0cc2da6828421. Change-Id: I5bac595acd2b240886062e8c1f11f9095ff6a9ed
|
cc99df230feb46ba717252f002d0cc2da6828421 |
|
18-Jan-2017 |
Nicolas Geoffray <ngeoffray@google.com> |
Load the array class in the compiler for allocations.

Removing one other dependency for needing to pass the current method, and having dex_cache_resolved_types_ in ArtMethod.

oat file increase:
- x64: 0.25%
- arm32: 0.30%
- x86: 0.28%

test: test-art-host, test-art-target
Change-Id: Ibca4fa00d3e31954db2ccb1f65a584b8c67cb230
|
39cee66a8ddf0254626c9591662cf87e4a1cedc4 |
|
13-Jan-2017 |
Nicolas Geoffray <ngeoffray@google.com> |
Entrypoints cleanup. Remove unused ones to facilitate the transition to compressed dex caches. test: test-art-host, test-art-target Change-Id: I1d1cb0daffa86dd9dda2eaa3c1ea3650a5c8d9d0
|
0d3998b5ff619364acf47bec0b541e7a49bd6fe7 |
|
12-Jan-2017 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Revert "Make object allocation entrypoints only take a class."" This reverts commit f7aaacd97881c6924b8212c7f8fe4a4c8721ef53. Change-Id: I6756cd1e6110bb45231f62f5e388f16c044cb145
|
f7aaacd97881c6924b8212c7f8fe4a4c8721ef53 |
|
12-Jan-2017 |
Hiroshi Yamauchi <yamauchi@google.com> |
Revert "Make object allocation entrypoints only take a class." 960-default-smali64 is failing. This reverts commit 2b615ba29c4dfcf54aaf44955f2eac60f5080b2e. Change-Id: Iebb8ee5a917fa84c5f01660ce432798524d078ef
|
2b615ba29c4dfcf54aaf44955f2eac60f5080b2e |
|
06-Jan-2017 |
Nicolas Geoffray <ngeoffray@google.com> |
Make object allocation entrypoints only take a class.

Change motivated by:
- Dex cache compression: having the allocation fast path do a dex cache lookup will be too expensive. So instead, rely on the compiler having direct access to the class (either through BSS for AOT, or JIT tables for JIT).
- Inlining: the entrypoints relied on the caller of the allocation to have the same dex cache as the outer method (stored at the bottom of the stack). This meant we could not inline methods from a different dex file that do allocations. By avoiding the dex cache lookup in the entrypoint, we can now remove this restriction.

Code expansion on average for Docs/Gms/FB/Framework (go/lem numbers):
- Around 0.8% on arm64
- Around 1% for x64, arm
- Around 1.5% on x86

Test: test-art-host, test-art-target, ART_USE_READ_BARRIER=true/false
Test: test-art-host, test-art-target, ART_DEFAULT_GC_TYPE=SS ART_USE_TLAB=true
Change-Id: I41f3748bb4d251996aaf6a90fae4c50176f9295f
|
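The signature change described above can be sketched as follows: instead of an entrypoint taking a type index plus the calling method and doing a dex-cache lookup, the compiler materializes the class pointer itself (from a BSS slot for AOT, or a JIT table for JIT) and the entrypoint takes only the class. Types and names here are illustrative, not ART's real ones.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-ins for runtime types.
struct Class { size_t object_size; };
struct Object { Class* klass; };

// After the change: the entrypoint takes the resolved Class* directly,
// so no dex-cache lookup (and no ArtMethod* referrer) is needed.
Object* AllocObject(Class* klass) {
  Object* obj = new Object();  // stands in for the TLAB/rosalloc fast path
  obj->klass = klass;
  return obj;
}
```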
5ace201d84adb7753680bf4c7877b3b71558da82 |
|
30-Nov-2016 |
Mathieu Chartier <mathieuc@google.com> |
Revert "Revert CC related changes."

Disable entrypoint switching in ResetQuickAllocEntryPointsForThread instead of in callers. Fixes a bug where instrumentation would switch to non-CC entrypoints for non-X86_64 architectures, causing aborts.

Bug: 31018974
Test: test-art-host
Test: test/run-test 099

This reverts commit 96172e0172c5fca6e9a5ad4b857a24d8c7b064e5.
Change-Id: If206694ae35ff4446c6a8a97bfbcbf2dac35e3f9
|
96172e0172c5fca6e9a5ad4b857a24d8c7b064e5 |
|
30-Nov-2016 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert CC related changes.

Revert: "X86_64: Add allocation entrypoint switching for CC is_marking"
Revert: "Fix mips build in InitEntryPoints"
Revert: "Fix mac build in ResetQuickAllocEntryPoints"

Test: test-art-target-run-test
Change-Id: If38d44edf8c5def5c4d8c9419e4af0cd8d3be724
|
f5de23265360e15fcfceb7d07bdadca0e5bb5f0a |
|
16-Nov-2016 |
Mathieu Chartier <mathieuc@google.com> |
X86_64: Add allocation entrypoint switching for CC is_marking

Only X86_64 done so far. Use normal TLAB allocators if GC is not marking. Allocation speed goes up by ~8% based on perf sampling.

Without change:
1.19%: art_quick_alloc_object_region_tlab

With change:
0.63%: art_quick_alloc_object_tlab
0.47%: art_quick_alloc_object_region_tlab

Bug: 31018974
Bug: 12687968
Test: test-art-host-run-test
Change-Id: I4c4d9eb229d4ad2f41b856ba5c2958a5eb3b7ffa
|
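The switching described above can be sketched as swapping function pointers in the per-thread entrypoint table whenever the concurrent copying GC's marking phase starts or stops: plain TLAB entrypoints while not marking, region-TLAB entrypoints (which carry the read-barrier work) while marking. The struct and function names below are hypothetical, not ART's real QuickEntryPoints layout.

```cpp
#include <cassert>

struct Object;
using AllocFn = Object* (*)();

// Hypothetical entrypoint bodies (names illustrative).
Object* AllocObjectTlab() { return nullptr; }        // fast path, no marking work
Object* AllocObjectRegionTlab() { return nullptr; }  // handles is_marking == true

// Stand-in for the per-thread entrypoint table.
struct QuickEntryPoints { AllocFn alloc_object; };

// Install the entrypoint set matching the GC's marking state.
void SwitchAllocEntrypoints(QuickEntryPoints* qp, bool is_marking) {
  qp->alloc_object = is_marking ? AllocObjectRegionTlab : AllocObjectTlab;
}
```

The design choice is to pay a one-time table rewrite at each marking-phase transition so the hot allocation path carries no is_marking check at all.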
8261d02f9523b95013108f271b82bb157ef6f71d |
|
08-Aug-2016 |
Mathieu Chartier <mathieuc@google.com> |
Revert "Revert "ARM64 asm for region space array allocation""

Also added a missing large object check. No regression from the check: N6P CC EAAC time is at 1313 for 10 samples vs 1314 before the reverts.

Bug: 30162165
Bug: 12687968
Test: test-art-target with CC + heap poisoning

This reverts commit 6ae7f3a4541e70f04243a6fe469aa3bd51e16d79.
Change-Id: Ie28f652f619898d7d37eeebf3f31a88af8fac949
|
6ae7f3a4541e70f04243a6fe469aa3bd51e16d79 |
|
08-Aug-2016 |
Roland Levillain <rpl@google.com> |
Revert "ARM64 asm for region space array allocation" This change breaks many tests on the ARM64 concurrent collector configuration. Bug: 30162165 Bug: 12687968 This reverts commit f686c3feabe3519bedd1f3001e5dd598f46946ef. Change-Id: I5d7ef5fa2ffb6a8d9a4d3adbcc14854efa257313
|
f686c3feabe3519bedd1f3001e5dd598f46946ef |
|
04-Aug-2016 |
Mathieu Chartier <mathieuc@google.com> |
ARM64 asm for region space array allocation

Wrote region space tlab array and array resolved allocators in assembly code. The speedup is a combined increase from checking the mark bit and having an assembly fast path. Added resolved and initialized entrypoints for the object region TLAB allocator.

N6P (960000 mhz) EAAC benchmark (average of 50 samples):
CC: 1442.309524 -> 1314 (10% improvement)
CMS: 1382.32

Read barrier slow paths reaching C++ code go from 5M to 2.5M.

Bug: 30162165
Bug: 12687968
Test: With CC: N6P boot, run EAAC, test-art-target
Change-Id: I51515b11ef3f795f57eb72fe0f5759618fef5084
|
10d4c08c0ea9df0a85a11e1c77974df24078c0ec |
|
24-Feb-2016 |
Hiroshi Yamauchi <yamauchi@google.com> |
Assembly region TLAB allocation fast path for arm.

This is for the CC collector. Share the common fast path code with the tlab fast path code.

Speedup (on N5):
BinaryTrees: 2291 -> 902 ms (-60%)
MemAllocTest: 2137 -> 1845 ms (-14%)

Bug: 9986565
Bug: 12687968
Change-Id: Ica63094ec2f85eaa4fd04d202a20090399275d85
|
dc412b6f49a65774b7af654f65cbff619cb7d85a |
|
15-Oct-2015 |
Hiroshi Yamauchi <yamauchi@google.com> |
Revert "Revert "Implement rosalloc fast path in assembly for 32 bit arm."" With a heap poisoning fix. This reverts commit cf91c7d973f3b2f491abc61d47c141782c96d46e. Bug: 9986565 Change-Id: Ia72edbde65ef6119e1931a77cc4c595a0b80ce31
|
cf91c7d973f3b2f491abc61d47c141782c96d46e |
|
15-Oct-2015 |
Nicolas Geoffray <ngeoffray@google.com> |
Revert "Implement rosalloc fast path in assembly for 32 bit arm." Tentative. Looks like heap poisoning breaks with this change. bug: 9986565 This reverts commit e6316940db61faead36f9642cce137d41fc8f606. Change-Id: I5c63758221464fe319315f40ae79c656048faed0
|
e6316940db61faead36f9642cce137d41fc8f606 |
|
08-Oct-2015 |
Hiroshi Yamauchi <yamauchi@google.com> |
Implement rosalloc fast path in assembly for 32 bit arm.

Measurements (N5, ms):
BinaryTrees: 1702 -> 987 (-42%)
MemAllocTest: 2480 -> 2270 (-8%)

Bug: 9986565
Change-Id: I460af3626ad724078463d27cf74a94b7ff7468c5
|
4adeab196d160f70b4865fb8be048ddd2ac7ab82 |
|
03-Oct-2015 |
Hiroshi Yamauchi <yamauchi@google.com> |
Refactor the alloc entry point generation code.

Move the x86/x86-64 specific alloc entrypoint generation code to a macro GENERATE_ALLOC_ENTRYPOINTS_FOR_EACH_ALLOCATOR in a common file to remove duplication. This will make it easier to selectively add more hand-written assembly allocation fast path code.

Rename RETURN_IF_RESULT_IS_NON_ZERO to RETURN_IF_RESULT_IS_NON_ZERO_OR_DELIVER in the x86/x86_64 files to match the other architectures.

Bug: 9986565
Change-Id: I56f33b790f94db68891db8a2f42e9231d1770eef
|
69bdcb29fdbd8266374e3793cb4e28dcc5daf0f9 |
|
28-Apr-2015 |
Jeff Hao <jeffhao@google.com> |
Fix java_lang_Class newInstance for strings; also quick entrypoints. Change-Id: I35fd23c5a9051e1ffda0ecc2cbafb5d318c7b5e6
|
848f70a3d73833fc1bf3032a9ff6812e429661d9 |
|
15-Jan-2014 |
Jeff Hao <jeffhao@google.com> |
Replace String CharArray with internal uint16_t array.

Summary of high level changes:
- Adds compiler inliner support to identify string init methods
- Adds compiler support (quick & optimizing) with new invoke code path that calls method off the thread pointer
- Adds thread entrypoints for all string init methods
- Adds map to verifier to log when receiver of string init has been copied to other registers. Used by compiler and interpreter.

Change-Id: I797b992a8feb566f9ad73060011ab6f51eb7ce01
|
2cd334ae2d4287216523882f0d298cf3901b7ab1 |
|
09-Jan-2015 |
Hiroshi Yamauchi <yamauchi@google.com> |
More of the concurrent copying collector. Bug: 12687968 Change-Id: I62f70274d47df6d6cab714df95c518b750ce3105
|
1cc7dbabd03e0a6c09d68161417a21bd6f9df371 |
|
18-Dec-2014 |
Andreas Gampe <agampe@google.com> |
ART: Reorder entrypoint argument order Shuffle the ArtMethod* referrer backwards for easier removal. Clean up ARM & MIPS assembly code. Change some macros to make future changes easier. Change-Id: Ie2862b68bd6e519438e83eecd9e1611df51d7945
|
bb8f0ab736b61db8f543e433859272e83f96ee9b |
|
28-Jan-2014 |
Hiroshi Yamauchi <yamauchi@google.com> |
Embed array class pointers at array allocation sites. Following https://android-review.googlesource.com/#/c/79302, embed array class pointers at array allocation sites in the compiled code. Change-Id: I67a1292466dfbb7f48e746e5060e992dd93525c5
|
be1ca55db3362f5b100c4c65da5342fd299520bb |
|
15-Jan-2014 |
Hiroshi Yamauchi <yamauchi@google.com> |
Use direct class pointers at allocation sites in the compiled code.

- Rather than looking up a class from its type ID (and checking if it's resolved/initialized, resolving/initializing if not), use direct class pointers, if possible (boot-code-to-boot-class pointers and app-code-to-boot-class pointers.)
- This results in a 1-2% speedup in Ritz MemAllocTest on Nexus 4.
- Embedding the object size (along with class pointers) caused a 1-2% slowdown in MemAllocTest and isn't implemented in this change.
- TODO: do the same for array allocations.
- TODO: when/if an application gets its own image, implement app-code-to-app-class pointers.
- Fix a -XX:gc bug. cf. https://android-review.googlesource.com/79460/
- Add /tmp/android-data/dalvik-cache to the list of locations to remove oat files in clean-oat-host. cf. https://android-review.googlesource.com/79550
- Add back a dropped UNLIKELY in FindMethodFromCode(). cf. https://android-review.googlesource.com/74205

Bug: 9986565
Change-Id: I590b96bd21f7a7472f88e36752e675547559a5b1
|
e6da9af8dfe0a3e3fbc2be700554f6478380e7b9 |
|
16-Dec-2013 |
Mathieu Chartier <mathieuc@google.com> |
Background compaction support.

When the process state changes to a state which does not perceive jank, we copy from the main free-list backed allocation space to the bump pointer space and enable the semispace allocator. When we transition back to foreground, we copy back to a free-list backed space.

Create a separate non-moving space which only holds non-movable objects. This enables us to quickly wipe the current alloc space (DlMalloc / RosAlloc) when we transition to background.

Added multiple alloc space support to the sticky mark sweep GC.

Added a -XX:BackgroundGC option which lets you specify which GC to use for background apps. Passing in -XX:BackgroundGC=SS compacts the heap for apps which do not perceive jank.

Results of a simple background/foreground test:
0. Reboot phone, unlock.
1. Open browser, click on home.
2. Open calculator, click on home.
3. Open calendar, click on home.
4. Open camera, click on home.
5. Open clock, click on home.
6. adb shell dumpsys meminfo

PSS, normal ART:
Sample 1: 88468 kB Dalvik, 3188 kB Dalvik Other
Sample 2: 81125 kB Dalvik, 3080 kB Dalvik Other

PSS, Dalvik (total PSS by category):
Sample 1: 81033 kB Dalvik, 27787 kB Dalvik Other
Sample 2: 81901 kB Dalvik, 28869 kB Dalvik Other

PSS, ART + background compaction:
Sample 1: 71014 kB Dalvik, 1412 kB Dalvik Other
Sample 2: 73859 kB Dalvik, 1400 kB Dalvik Other

The Dalvik Other reduction can be explained by less deep allocation stacks / fewer live bitmaps / fewer dirty cards.

TODO improvements: Recycle mem-maps which are unused in the current state. Don't hardcode the 64 MB capacity of the non-moving space (avoid returning linear alloc nightmares). Figure out ways to deal with low virtual address memory problems.

Bug: 8981901
Change-Id: Ib235d03f45548ffc08a06b8ae57bf5bada49d6f3
|
692fafd9778141fa6ef0048c9569abd7ee0253bf |
|
30-Nov-2013 |
Mathieu Chartier <mathieuc@google.com> |
Thread local bump pointer allocator.

Added a thread local allocator to the heap; each thread has three pointers which specify the thread local buffer: start, cur, and end. When the remaining space in the thread local buffer isn't large enough for the allocation, the allocator allocates a new thread local buffer using the bump pointer allocator.

The bump pointer space had to be modified to accommodate thread local buffers. These buffers are called "blocks", where a block is a buffer which contains a set of adjacent objects. Blocks aren't necessarily full and may have wasted memory towards the end. Blocks have an 8 byte header which specifies their size and is required for traversing bump pointer spaces. Memory usage is in between full bump pointer and ROSAlloc since madvised memory limits wasted ram to an average of 1/2 page per block.

Added a runtime option -XX:UseTLAB which specifies whether or not to use the thread local allocator. It is a NOP if the garbage collector is not the semispace collector.

TODO: Smarter block accounting to prevent us reading objects until we either hit the end of the block or GetClass() == null, which signifies that the block isn't 100% full. This would provide a slight speedup to BumpPointerSpace::Walk.

Timings (-XX:HeapMinFree=4m -XX:HeapMaxFree=8m -Xmx48m, ritzperf memalloc):
Dalvik -Xgc:concurrent: 11678
Dalvik -Xgc:noconcurrent: 6697
-Xgc:MS: 5978
-Xgc:SS: 4271
-Xgc:CMS: 4150
-Xgc:SS -XX:UseTLAB: 3255

Bug: 9986565
Bug: 12042213
Change-Id: Ib7e1d4b199a8199f3b1de94b0a7b6e1730689cad
|
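The block layout described above — each thread-local buffer carved out of the bump pointer space starts with an 8-byte header holding the block's size, so the space can be traversed block by block — can be sketched as follows. The struct and function names are illustrative, not ART's real BumpPointerSpace code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 8-byte header at the start of each block; size includes the header.
struct BlockHeader { uint64_t size; };

// Walk a region of adjacent blocks and total their payload bytes,
// mirroring how a bump pointer space traversal hops header to header.
size_t WalkBlocks(const uint8_t* begin, const uint8_t* end) {
  size_t payload = 0;
  while (begin < end) {
    auto* header = reinterpret_cast<const BlockHeader*>(begin);
    payload += header->size - sizeof(BlockHeader);
    begin += header->size;  // advance to the next block's header
  }
  return payload;
}
```

The header is what makes partially filled blocks walkable: without it, a traversal could not tell where one thread's buffer ends and the next begins.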
7410f29b4dae223befac036ea567d7f33351dad1 |
|
24-Nov-2013 |
Mathieu Chartier <mathieuc@google.com> |
Fix dumpsys meminfo <pid>. Added a case for BumpPointerSpaces. Confirmed working non-debug. Should also work in debug builds. Bug: 11830794 Change-Id: I12053ff16eec403dcd4a780e13095e3212a77132
|
cbb2d20bea2861f244da2e2318d8c088300a3710 |
|
15-Nov-2013 |
Mathieu Chartier <mathieuc@google.com> |
Refactor allocation entrypoints.

Adds support for switching entrypoints at runtime. Enables addition of new allocators without requiring significant copy-paste. Slight speedup on ritzperf, probably due to more inlining.

TODO: Ensure that the entire allocation path is inlined so that the switch statement in the allocation code is optimized out.

Rosalloc measurements: 4583 4453 4439 4434 4751
After change: 4184 4287 4131 4335 4097

Change-Id: I1352a3cbcdf6dae93921582726324d91312df5c9
|