# Simpleperf

Simpleperf is a native profiling tool for Android. It can profile both
Android applications and native processes running on Android, and supports
profiling both Java and C++ code. It can be used on Android L and above.

Simpleperf is part of the Android Open Source Project. The source code is [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/).
The latest document is [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/README.md).
Bugs and feature requests can be submitted at http://github.com/android-ndk/ndk/issues.

## Table of Contents

- [Simpleperf introduction](#simpleperf-introduction)
  - [Why simpleperf](#why-simpleperf)
  - [Tools in simpleperf](#tools-in-simpleperf)
  - [Simpleperf's profiling principle](#simpleperfs-profiling-principle)
  - [Main simpleperf commands](#main-simpleperf-commands)
    - [Simpleperf list](#simpleperf-list)
    - [Simpleperf stat](#simpleperf-stat)
    - [Simpleperf record](#simpleperf-record)
    - [Simpleperf report](#simpleperf-report)
- [Android application profiling](#android-application-profiling)
  - [Prepare an Android application](#prepare-an-android-application)
  - [Record and report profiling data (using command-lines)](#record-and-report-profiling-data-using-commandlines)
  - [Record and report profiling data (using python scripts)](#record-and-report-profiling-data-using-python-scripts)
  - [Record and report call graph](#record-and-report-call-graph)
  - [Visualize profiling data](#visualize-profiling-data)
  - [Annotate source code](#annotate-source-code)
- [Answers to common issues](#answers-to-common-issues)
  - [The correct way to pull perf.data on host](#the-correct-way-to-pull-perfdata-on-host)

## Simpleperf introduction

### Why simpleperf

Simpleperf works similarly to linux-tools-perf, but it has some specific
features for Android profiling:

1. It is aware of the Android environment:

    a. It can profile embedded shared libraries in apk files.

    b. It reads symbols and debug information from the .gnu_debugdata section.

    c. It gives suggestions when errors occur.

    d. When recording with the -g option, it unwinds the stack before writing
    to file, to save storage space.

    e. It supports adding additional information (like symbols) to perf.data, to
    support recording on device and reporting on host.

2. It provides python scripts for common profiling tasks.

3. It is easy to release:

    a. Simpleperf executables on device are built as static binaries. They can be
    pushed to any Android device and run.

    b. Simpleperf executables on host are built as static binaries, and support
    different hosts: Mac, Linux and Windows.

### Tools in simpleperf

Simpleperf is periodically released with the Android NDK, located at `simpleperf/`.
The latest release can be found [here](https://android.googlesource.com/platform/prebuilts/simpleperf/).
The simpleperf tools contain executables, shared libraries and python scripts.

**Simpleperf executables running on Android devices**
Simpleperf executables running on Android devices are located at `bin/android/`.
Each architecture has one executable, like `bin/android/arm64/simpleperf`. It
can record and report profiling data. It provides a command-line interface
broadly the same as that of linux-tools-perf, and also supports some additional
features for Android-specific profiling.

**Simpleperf executables running on hosts**
Simpleperf executables running on hosts are located at `bin/darwin`, `bin/linux`
and `bin/windows`. Each host and architecture has one executable, like
`bin/linux/x86_64/simpleperf`. It provides a command-line interface for
reporting profiling data on hosts.

**Simpleperf report shared libraries used on hosts**
Simpleperf report shared libraries used on hosts are located at `bin/darwin`,
`bin/linux` and `bin/windows`. Each host and architecture has one library, like
`bin/linux/x86_64/libsimpleperf_report.so`. It is a library for parsing
profiling data.

**Python scripts**
Python scripts help with different profiling tasks.

`annotate.py` is used to annotate source files based on profiling data.

`app_profiler.py` is used to profile Android applications.

`binary_cache_builder.py` is used to pull libraries from Android devices.

`pprof_proto_generator.py` is used to convert profiling data to the format used by pprof.

`report.py` provides a GUI interface to report profiling results.

`report_sample.py` is used to generate flamegraphs.

`simpleperf_report_lib.py` provides a python interface for parsing profiling data.

### Simpleperf's profiling principle

Modern CPUs have a hardware component called the performance monitoring unit
(PMU). The PMU has several hardware counters, counting events like how many cpu
cycles have happened, how many instructions have executed, or how many cache
misses have happened.

The Linux kernel wraps these hardware counters into hardware perf events. In
addition, the Linux kernel also provides hardware-independent software events
and tracepoint events. The Linux kernel exposes all this to userspace via the
perf_event_open system call, which simpleperf uses.

Simpleperf has three main functions: stat, record and report.

The stat command gives a summary of how many events have happened in the
profiled processes in a time period. Here's how it works:
1. Given user options, simpleperf enables profiling by making a system call to
the linux kernel.
2. The linux kernel enables counters while scheduling on the profiled processes.
3. After profiling, simpleperf reads counters from the linux kernel, and reports a
counter summary.

The record command records samples of the profiled processes in a time period.
Here's how it works:
1. Given user options, simpleperf enables profiling by making a system call to
the linux kernel.
2. Simpleperf creates mapped buffers between simpleperf and the linux kernel.
3. The linux kernel enables counters while scheduling on the profiled processes.
4. Each time a given number of events happen, the linux kernel dumps a sample to a
mapped buffer.
5. Simpleperf reads samples from the mapped buffers and generates perf.data.

The report command reads a "perf.data" file and any shared libraries used by
the profiled processes, and outputs a report showing where the time was spent.

### Main simpleperf commands

Simpleperf supports several subcommands, including list, stat, record and report.
Each subcommand supports different options. This section only covers the most
important subcommands and options. To see all subcommands and options,
use --help.

    # List all subcommands.
    $ simpleperf --help

    # Print help message for record subcommand.
    $ simpleperf record --help


#### Simpleperf list

simpleperf list is used to list all events available on the device. Different
devices may support different events because of differences in hardware and
kernel.

    $ simpleperf list
    List of hw-cache events:
      branch-loads
      ...
    List of hardware events:
      cpu-cycles
      instructions
      ...
    List of software events:
      cpu-clock
      task-clock
      ...

#### Simpleperf stat

simpleperf stat is used to get raw event counter information about the profiled program
or the whole system. By passing options, we can select which events to use, which
processes/threads to monitor, how long to monitor and the print interval.
Below is an example.

    # Stat using default events (cpu-cycles,instructions,...), and monitor
    # process 7394 for 10 seconds.
    $ simpleperf stat -p 7394 --duration 10
    Performance counter statistics:

     1,320,496,145  cpu-cycles         # 0.131736 GHz                     (100%)
       510,426,028  instructions       # 2.587047 cycles per instruction  (100%)
         4,692,338  branch-misses      # 468.118 K/sec                    (100%)
    886.008130(ms)  task-clock         # 0.088390 cpus used               (100%)
               753  context-switches   # 75.121 /sec                      (100%)
               870  page-faults        # 86.793 /sec                      (100%)

    Total test time: 10.023829 seconds.
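The derived columns in this output are simple ratios of the raw counters. The sketch below reproduces them, with counter values copied from the run above; the formulas are inferred from the printed output, not taken from simpleperf's source.

```python
# Reproduce the derived columns of the stat output above from the raw
# counters. Values are copied from that example run; the formulas are
# inferred from the printed output.
cpu_cycles = 1_320_496_145
instructions = 510_426_028
task_clock_ms = 886.008130
total_time_s = 10.023829

ghz = cpu_cycles / (total_time_s * 1e9)            # rate over wall-clock time
cpi = cpu_cycles / instructions                    # cycles per instruction
cpus_used = (task_clock_ms / 1000) / total_time_s  # fraction of one cpu busy

print(f"{ghz:.6f} GHz")         # 0.131736 GHz
print(f"{cpi:.6f} CPI")         # 2.587047 cycles per instruction
print(f"{cpus_used:.6f} cpus")  # 0.088390 cpus used
```

Note that the GHz column is computed against wall-clock test time, not against the time the process actually ran; that is why it is far below the device's nominal clock rate.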

**Select events**
We can select which events to use via the -e option. Below are examples:

    # Stat event cpu-cycles.
    $ simpleperf stat -e cpu-cycles -p 11904 --duration 10

    # Stat events cache-references and cache-misses.
    $ simpleperf stat -e cache-references,cache-misses -p 11904 --duration 10

When running the stat command, if the number of hardware events is larger than
the number of hardware counters available in the PMU, the kernel shares hardware
counters between events, so each event is only monitored for part of the total
time. In the example below, there is a percentage at the end of each row,
showing the percentage of the total time that each event was actually monitored.

    # Stat using events cache-references, cache-references:u,....
    $ simpleperf stat -p 7394 -e cache-references,cache-references:u,cache-references:k,cache-misses,cache-misses:u,cache-misses:k,instructions --duration 1
    Performance counter statistics:

      4,331,018  cache-references     # 4.861 M/sec    (87%)
      3,064,089  cache-references:u   # 3.439 M/sec    (87%)
      1,364,959  cache-references:k   # 1.532 M/sec    (87%)
         91,721  cache-misses         # 102.918 K/sec  (87%)
         45,735  cache-misses:u       # 51.327 K/sec   (87%)
         38,447  cache-misses:k       # 43.131 K/sec   (87%)
      9,688,515  instructions         # 10.561 M/sec   (89%)

    Total test time: 1.026802 seconds.
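The scaling behind those percentages follows the usual Linux perf convention: the kernel reports both how long an event was enabled and how long it actually ran on a hardware counter, and the full-period count is estimated by linear extrapolation. A minimal sketch, with illustrative numbers rather than values from a real run:

```python
# When counters are multiplexed, each event only occupies a hardware
# counter part of the time. The full-period count is estimated by linear
# extrapolation, following the usual Linux perf convention.
def scale_count(raw_count, time_enabled_ns, time_running_ns):
    """Extrapolate a partially-counted event to the whole enabled period."""
    if time_running_ns == 0:
        return 0
    return raw_count * time_enabled_ns // time_running_ns

# An event that ran on a counter for 0.87 s out of a 1 s window
# (illustrative numbers):
print(scale_count(3_767_986, 1_000_000_000, 870_000_000))  # 4331018
```

This is why two multiplexed counts are only estimates: each is extrapolated from a different slice of the run, which motivates the --group option described next.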

In the example above, each event is monitored about 87% of the total time. But
there is no guarantee that any pair of events is always monitored at the same
time. If we want some events to be monitored at the same time, we can use the
--group option. Below is an example.

    # Stat using event groups (cache-references,cache-misses), ....
    $ simpleperf stat -p 7394 --group cache-references,cache-misses --group cache-references:u,cache-misses:u --group cache-references:k,cache-misses:k -e instructions --duration 1
    Performance counter statistics:

      3,638,900  cache-references     # 4.786 M/sec          (74%)
         65,171  cache-misses         # 1.790953% miss rate  (74%)
      2,390,433  cache-references:u   # 3.153 M/sec          (74%)
         32,280  cache-misses:u       # 1.350383% miss rate  (74%)
        879,035  cache-references:k   # 1.251 M/sec          (68%)
         30,303  cache-misses:k       # 3.447303% miss rate  (68%)
      8,921,161  instructions         # 10.070 M/sec         (86%)

    Total test time: 1.029843 seconds.
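The miss rate printed for each group is just the ratio of the two counters in that group, which is meaningful because grouped events are counted over the same time slices. Recomputing from the numbers above:

```python
# The "miss rate" printed for each --group pair above is simply
# cache-misses / cache-references within that group (values copied
# from the example run).
groups = {
    "all":    (3_638_900, 65_171),
    "user":   (2_390_433, 32_280),
    "kernel": (879_035, 30_303),
}
for name, (references, misses) in groups.items():
    print(f"{name}: {100 * misses / references:.6f}% miss rate")
```

The printed ratios match the miss-rate column in the output above.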

**Select target to monitor**
We can select which processes or threads to monitor via the -p or -t options.
Monitoring a process is the same as monitoring all threads in the process.
Simpleperf can also fork a child process to run a new command and then monitor
the child process. Below are examples.

    # Stat processes 11904 and 11905.
    $ simpleperf stat -p 11904,11905 --duration 10

    # Stat threads 11904 and 11905.
    $ simpleperf stat -t 11904,11905 --duration 10

    # Start a child process running `ls`, and stat it.
    $ simpleperf stat ls

**Decide how long to monitor**
When monitoring existing threads, we can use the --duration option to decide how long
to monitor. When monitoring a child process running a new command, simpleperf
monitors until the child process ends. In both cases, we can use Ctrl-C to stop monitoring
at any time. Below are examples.

    # Stat process 11904 for 10 seconds.
    $ simpleperf stat -p 11904 --duration 10

    # Stat until the child process running `ls` finishes.
    $ simpleperf stat ls

    # Stop monitoring using Ctrl-C.
    $ simpleperf stat -p 11904 --duration 10
    ^C

**Decide the print interval**
When monitoring perf counters, we can also use the --interval option to decide the print
interval. Below are examples.

    # Print stat for process 11904 every 300ms.
    $ simpleperf stat -p 11904 --duration 10 --interval 300

    # Print system-wide stat at an interval of 300ms for 10 seconds.
    # System-wide profiling needs root privilege (rooted devices only).
    $ su 0 simpleperf stat -a --duration 10 --interval 300

**Display counters in systrace**
Simpleperf can also work with systrace to dump counters in the collected trace.
Below is an example of a system-wide stat:

    # Capture instructions (kernel only) and cache misses with an interval of
    # 300 milliseconds for 15 seconds.
    $ su 0 simpleperf stat -e instructions:k,cache-misses -a --interval 300 --duration 15
    # On the host, launch systrace to collect a trace for 10 seconds.
    (HOST)$ external/chromium-trace/systrace.py --time=10 -o new.html sched gfx view
    # Open the collected new.html in a browser and the perf counters will be shown.

#### Simpleperf record

simpleperf record is used to dump records of the profiled program. By passing
options, we can select which events to use, which processes/threads to monitor,
at what frequency to dump records, how long to monitor, and where to store records.

    # Record on process 7394 for 10 seconds, using default event (cpu-cycles),
    # using default sample frequency (4000 samples per second), writing records
    # to perf.data.
    $ simpleperf record -p 7394 --duration 10
    simpleperf I 07-11 21:44:11 17522 17522 cmd_record.cpp:316] Samples recorded: 21430. Samples lost: 0.

**Select events**
In most cases, the cpu-cycles event is used to evaluate consumed cpu time.
As a hardware event, it is both accurate and efficient. We can also use other
events via the -e option. Below is an example.

    # Record using event instructions.
    $ simpleperf record -e instructions -p 11904 --duration 10

**Select target to monitor**
The way to select targets in the record command is similar to that in the stat command.
Below are examples.

    # Record processes 11904 and 11905.
    $ simpleperf record -p 11904,11905 --duration 10

    # Record threads 11904 and 11905.
    $ simpleperf record -t 11904,11905 --duration 10

    # Record a child process running `ls`.
    $ simpleperf record ls

**Set the frequency to record**
We can set the frequency at which to dump records via the -f or -c options. For example,
-f 4000 means dumping approximately 4000 records every second while the monitored
thread runs. If a monitored thread runs 0.2s in one second (it can be preempted
or blocked at other times), simpleperf dumps about 4000 * 0.2 / 1.0 = 800
records every second. Another way is to use the -c option. For example, -c 10000
means dumping one record whenever 10000 events happen. Below are examples.

    # Record with sample frequency 1000: sample 1000 times per second of running time.
    $ simpleperf record -f 1000 -p 11904,11905 --duration 10

    # Record with sample period 100000: one sample every 100000 events.
    $ simpleperf record -c 100000 -t 11904,11905 --duration 10
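The arithmetic above can be written out as a back-of-the-envelope model. It is only approximate: in frequency mode, the kernel adjusts the event period dynamically to approach the requested rate.

```python
# A rough model of the two sampling modes described above.
def expected_samples_freq(freq_per_sec, running_seconds):
    """-f freq: about freq samples per second of *running* time."""
    return freq_per_sec * running_seconds

def expected_samples_period(total_events, period):
    """-c period: one sample every `period` events."""
    return total_events // period

# A thread running 0.2 s out of each second, sampled at -f 4000:
print(expected_samples_freq(4000, 0.2))             # 800.0 samples per second
# 1,000,000 events recorded with -c 100000:
print(expected_samples_period(1_000_000, 100_000))  # 10 samples
```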

**Decide how long to monitor**
The way to decide how long to monitor in the record command is similar to that in
the stat command. Below are examples.

    # Record process 11904 for 10 seconds.
    $ simpleperf record -p 11904 --duration 10

    # Record until the child process running `ls` finishes.
    $ simpleperf record ls

    # Stop monitoring using Ctrl-C.
    $ simpleperf record -p 11904 --duration 10
    ^C

**Set the path to store records**
By default, simpleperf stores records in perf.data in the current directory. We can
use the -o option to set the path to store records. Below is an example.

    # Write records to data/perf2.data.
    $ simpleperf record -p 11904 -o data/perf2.data --duration 10

#### Simpleperf report

simpleperf report is used to report based on the perf.data generated by the simpleperf
record command. The report command groups records into different sample entries,
sorts sample entries based on how many events each sample entry contains, and
prints out each sample entry. By passing options, we can select where to find
perf.data and the executable binaries used by the monitored program, filter out
uninteresting records, and decide how to group records.

Below is an example. Records are grouped into 4 sample entries, each entry being
a row. There are several columns, each showing a piece of information
belonging to a sample entry. The first column is Overhead, which shows the
percentage of total events belonging to the current sample entry. As the
perf event is cpu-cycles, the overhead can be seen as the percentage of cpu
time used in each function.

    # Report perf.data, using only records sampled in libsudo-game-jni.so,
    # grouping records by thread name (comm), process id (pid), thread id (tid)
    # and function name (symbol), and showing the sample count for each row.
    $ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so --sort comm,pid,tid,symbol -n
    Cmdline: /data/data/com.example.sudogame/simpleperf record -p 7394 --duration 10
    Arch: arm64
    Event: cpu-cycles (type 0, config 0)
    Samples: 28235
    Event count: 546356211

    Overhead  Sample  Command   Pid   Tid   Symbol
    59.25%    16680   sudogame  7394  7394  checkValid(Board const&, int, int)
    20.42%    5620    sudogame  7394  7394  canFindSolution_r(Board&, int, int)
    13.82%    4088    sudogame  7394  7394  randomBlock_r(Board&, int, int, int, int, int)
    6.24%     1756    sudogame  7394  7394  @plt

**Set the path to read records**
By default, simpleperf reads perf.data in the current directory. We can use the -i
option to select another file to read records from.

    $ simpleperf report -i data/perf2.data

**Set the path to find executable binaries**
To report function symbols, simpleperf needs to read the executable binaries
used by the monitored processes to get symbol tables and debug information. By
default, the paths are those of the executable binaries used by the monitored
processes while recording. However, these binaries may not exist when reporting, or
may not contain symbol tables and debug information. So we can use --symfs to
redirect the paths. Below is an example.

    $ simpleperf report
    # In this case, when simpleperf wants to read executable binary /A/b,
    # it reads the file at /A/b.

    $ simpleperf report --symfs /debug_dir
    # In this case, when simpleperf wants to read executable binary /A/b,
    # it prefers the file at /debug_dir/A/b to the file at /A/b.
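The lookup order can be sketched as a small helper. This is an illustration of the behavior described above, not simpleperf's actual implementation:

```python
import os

# A sketch of the --symfs lookup order: prefer the copy under the symfs
# directory, and fall back to the recorded path. Illustrative only.
def resolve_binary(recorded_path, symfs=None):
    if symfs:
        candidate = os.path.join(symfs, recorded_path.lstrip("/"))
        if os.path.exists(candidate):
            return candidate
    return recorded_path

print(resolve_binary("/A/b"))                      # /A/b
print(resolve_binary("/A/b", symfs="/debug_dir"))  # /debug_dir/A/b if it exists, else /A/b
```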

**Filter records**
When reporting, often not all records are of interest. Simpleperf
supports five filters to select records of interest. Below are examples.

    # Report records in threads named sudogame.
    $ simpleperf report --comms sudogame

    # Report records in process 7394 or 7395.
    $ simpleperf report --pids 7394,7395

    # Report records in thread 7394 or 7395.
    $ simpleperf report --tids 7394,7395

    # Report records in libsudo-game-jni.so.
    $ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so

    # Report records in function checkValid or canFindSolution_r.
    $ simpleperf report --symbols "checkValid(Board const&, int, int);canFindSolution_r(Board&, int, int)"

**Decide how to group records into sample entries**
Simpleperf uses the --sort option to decide how to group sample entries. Below are
examples.

    # Group records by process id: records having the same process
    # id are in the same sample entry.
    $ simpleperf report --sort pid

    # Group records by thread id and thread comm: records having
    # the same thread id and thread name are in the same sample entry.
    $ simpleperf report --sort tid,comm

    # Group records by binary and function: records in the same
    # binary and function are in the same sample entry.
    $ simpleperf report --sort dso,symbol

    # Default option: --sort comm,pid,tid,dso,symbol. Group records in the same
    # thread, belonging to the same function in the same binary.
    $ simpleperf report
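The grouping done by --sort can be sketched in a few lines: each record carries several keys, and records that share the values of the selected keys fall into one sample entry. The records below are made up for illustration:

```python
from collections import Counter

# A sketch of --sort grouping: one sample entry per distinct tuple of the
# selected keys. The records are made up for illustration.
records = [
    {"comm": "sudogame", "pid": 7394, "tid": 7394,
     "dso": "libsudo-game-jni.so", "symbol": "checkValid"},
    {"comm": "sudogame", "pid": 7394, "tid": 7394,
     "dso": "libsudo-game-jni.so", "symbol": "checkValid"},
    {"comm": "sudogame", "pid": 7394, "tid": 7394,
     "dso": "libsudo-game-jni.so", "symbol": "canFindSolution_r"},
]

def group(records, sort_keys):
    entries = Counter()
    for record in records:
        entries[tuple(record[key] for key in sort_keys)] += 1
    return entries

print(len(group(records, ["pid"])))            # 1: all records share pid 7394
print(len(group(records, ["dso", "symbol"])))  # 2: split by function
```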


## Android application profiling

This section shows how to profile an Android application.
[Here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/README.md) are examples. We use the
[SimpleperfExamplePureJava](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava) project to show the profiling results.

Simpleperf only supports profiling native instructions in binaries in ELF
format. If the Java code is executed by the interpreter, or with a JIT cache, it
can't be profiled by simpleperf. Since Android supports ahead-of-time compilation,
it can compile Java bytecode into native instructions with debug information.
On devices with Android version <= M, we need root privilege to compile Java
bytecode with debug information. However, on devices with Android version >= N,
we don't need root privilege to do so.

Profiling an Android application involves three steps:
1. Prepare the application.
2. Record profiling data.
3. Report profiling data.

To profile, we can use either command lines or python scripts. Both are shown below.

### Prepare an Android application

Before profiling, we need to install the application to be profiled on an Android device.
To get valid profiling results, please check the following points:

**1. The application should be debuggable.**
This means [android:debuggable](https://developer.android.com/guide/topics/manifest/application-element.html#debug)
should be true, so we need to use the debug [build type](https://developer.android.com/studio/build/build-variants.html#build-types)
instead of the release build type. This is understandable, because we can't profile other people's apps.
However, on a rooted Android device, the application doesn't need to be debuggable.

**2. Run on an Android device >= L.**
Profiling on emulators is not yet supported. And to profile Java code, we need
the jvm running in oat mode, which is only available on Android >= L.

**3. On Android O, add `wrap.sh` in the apk.**
To profile Java code, we need the jvm running in oat mode. But on Android O,
debuggable applications are forced to run in jit mode. To work around this,
we need to add a `wrap.sh` in the apk. If you are profiling on an Android O device,
check [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava/app/profiling.gradle)
for how to add `wrap.sh` in the apk.

**4. Make sure C++ code is compiled with optimizing flags.**
If the application contains C++ code, it may be compiled with the -O0 flag in the debug build type.
This makes the C++ code slow. Check [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava/app/profiling.gradle)
for how to avoid that.

**5. Use native libraries with debug info in the apk when possible.**
If the application contains C++ code or pre-compiled native libraries, try to use
unstripped libraries in the apk. This helps simpleperf generate better profiling
results. Check [here](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava/app/profiling.gradle)
for how to use unstripped libraries.

Here we use [SimpleperfExamplePureJava](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/demo/SimpleperfExamplePureJava) as an example.
It builds an app-profiling.apk for profiling.

    $ git clone https://android.googlesource.com/platform/system/extras
    $ cd extras/simpleperf/demo
    # Open the SimpleperfExamplePureJava project with Android Studio,
    # and build it successfully, otherwise the `./gradlew` command below will fail.
    $ cd SimpleperfExamplePureJava

    # On windows, use "gradlew" instead.
    $ ./gradlew clean assemble
    $ adb install -r app/build/outputs/apk/app-profiling.apk


### Record and report profiling data (using command-lines)

We recommend using python scripts for profiling because they are more convenient.
But using the command line gives a better understanding of the profiling process
step by step, so we first show how to use command lines.

**1. Enable profiling**

    $ adb shell setprop security.perf_harden 0

**2. Fully compile the app**

We need to compile Java bytecode into native instructions to profile Java code
in the application. This needs different commands on different Android versions.

On Android >= N:

    $ adb shell setprop debug.generate-debug-info true
    $ adb shell cmd package compile -f -m speed com.example.simpleperf.simpleperfexamplepurejava
    # Restart the app to take effect.
    $ adb shell am force-stop com.example.simpleperf.simpleperfexamplepurejava

On Android M devices, we need root privilege to force Android to fully compile
Java code into native instructions in ELF binaries with debug information. We
also need root privilege to read the compiled native binaries (because installd
writes them to a directory whose uid/gid is system:install). So profiling Java
code can only be done on rooted devices.

    $ adb root
    $ adb shell setprop dalvik.vm.dex2oat-flags -g

    # Reinstall the app.
    $ adb install -r app/build/outputs/apk/app-profiling.apk

On Android L devices, we also need root privilege to compile the app with debug info
and access the native binaries.

    $ adb root
    $ adb shell setprop dalvik.vm.dex2oat-flags --include-debug-symbols

    # Reinstall the app.
    $ adb install -r app/build/outputs/apk/app-profiling.apk


**3. Find the app process**

    # Start the app if needed.
    $ adb shell am start -n com.example.simpleperf.simpleperfexamplepurejava/.MainActivity

    # Run `ps` in the app's context. On Android >= O devices, run `ps -e` instead.
    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ps | grep simpleperf
    u0_a151  6885  3346  1590504  53980  SyS_epoll_  6fc2024b6c  S  com.example.simpleperf.simpleperfexamplepurejava

So the pid of the app process is `6885`. We will use this number in the command lines below;
please replace it with what you get by running the `ps` command.

**4. Download simpleperf to the app's data directory**

    # Find out which architecture the app is using.
    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava cat /proc/6885/maps | grep boot.oat
    708e6000-70e33000 r--p 00000000 103:09 1214  /system/framework/arm64/boot.oat

    # The app uses /arm64/boot.oat, so push simpleperf in bin/android/arm64/ to the device.
    $ cd ../../scripts/
    $ adb push bin/android/arm64/simpleperf /data/local/tmp
    $ adb shell chmod a+x /data/local/tmp/simpleperf
    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava cp /data/local/tmp/simpleperf .


**5. Record perf.data**

    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ./simpleperf record -p 6885 --duration 10
    simpleperf I 04-27 20:41:11  6940  6940 cmd_record.cpp:357] Samples recorded: 40008. Samples lost: 0.

    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ls -lh perf.data

The profiling data is recorded in perf.data.

Normally we need to use the app while profiling, otherwise we may record no samples.
But in this case, the MainActivity starts a busy thread, so we don't need to use
the app while profiling.

There are many options for recording profiling data; check the [record command](#simpleperf-record) for details.

**6. Report perf.data**

    # Pull perf.data to the host.
    $ adb shell "run-as com.example.simpleperf.simpleperfexamplepurejava cat perf.data | tee /data/local/tmp/perf.data >/dev/null"
    $ adb pull /data/local/tmp/perf.data

    # Report samples using the corresponding simpleperf executable on the host.
    # On windows, use "bin\windows\x86_64\simpleperf" instead.
    $ bin/linux/x86_64/simpleperf report
    ...
    Overhead  Command   Pid   Tid   Shared Object                                                                     Symbol
    83.54%    Thread-2  6885  6900  /data/app/com.example.simpleperf.simpleperfexamplepurejava-2/oat/arm64/base.odex  void com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.run()
    16.11%    Thread-2  6885  6900  /data/app/com.example.simpleperf.simpleperfexamplepurejava-2/oat/arm64/base.odex  int com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.callFunction(int)

See [here](#the-correct-way-to-pull-perfdata-on-host) for why we use tee rather than just >.
There are many ways to show reports; check the [report command](#simpleperf-report) for details.

### Record and report profiling data (using python scripts)

Besides command lines, we can use `app_profiler.py` to profile Android applications.
It downloads simpleperf to the device, records perf.data, and collects profiling
results and native binaries on the host. It is configured by `app_profiler.config`.

**1. Fill in `app_profiler.config`**

    Change the `app_package_name` line to: app_package_name = "com.example.simpleperf.simpleperfexamplepurejava"
    Change the `apk_file_path` line to: apk_file_path = "../SimpleperfExamplePureJava/app/build/outputs/apk/app-profiling.apk"
    Change the `android_studio_project_dir` line to: android_studio_project_dir = "../SimpleperfExamplePureJava/"
    Change the `record_options` line to: record_options = "--duration 10"

`apk_file_path` is needed to fully compile the application on Android L/M. It is
not necessary on Android >= N.

`android_studio_project_dir` is used to search for native libraries in the
application. It is not necessary for profiling.

`record_options` can be set to any option accepted by the simpleperf record command.

**2. Run `app_profiler.py`**

    $ python app_profiler.py


If it runs successfully, it will collect profiling data in perf.data in the current
directory, and related native binaries in binary_cache/.

**3. Report perf.data**

We can use `report.py` to report perf.data.

    $ python report.py

We can add any option accepted by the `simpleperf report` command to `report.py`.


### Record and report call graph

A call graph is a tree showing function call relations. Below is an example.

    main() {
        FunctionOne();
        FunctionTwo();
    }
    FunctionOne() {
        FunctionTwo();
        FunctionThree();
    }
    callgraph:
        main-> FunctionOne
           |    |
           |    |-> FunctionTwo
           |    |-> FunctionThree
           |
           |-> FunctionTwo

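Such a call graph can be reconstructed from sampled call stacks: each stack is a path from the root, and shared prefixes merge into one branch. A minimal sketch, with made-up stacks matching the example above:

```python
# Build a call tree from sampled call stacks: walk each stack from the
# root, creating a child node per function; shared prefixes merge.
stacks = [
    ["main", "FunctionOne", "FunctionTwo"],
    ["main", "FunctionOne", "FunctionThree"],
    ["main", "FunctionTwo"],
]

def build_tree(stacks):
    root = {}
    for stack in stacks:
        node = root
        for function in stack:
            node = node.setdefault(function, {})
    return root

print(build_tree(stacks))
# {'main': {'FunctionOne': {'FunctionTwo': {}, 'FunctionThree': {}}, 'FunctionTwo': {}}}
```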

#### Record dwarf based call graph

When using command lines, add the `-g` option like below:

    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ./simpleperf record -g -p 6685 --duration 10

When using python scripts, change `app_profiler.config` as below:

    Change the `record_options` line to: record_options = "--duration 10 -g"

Recording a dwarf based call graph needs debug information in native binaries.
So if the application uses native libraries, it is better to include unstripped
native libraries in the apk.

709
#### Record stack frame based call graph

When using command lines, add the `--call-graph fp` option like below:

    $ adb shell run-as com.example.simpleperf.simpleperfexamplepurejava ./simpleperf record --call-graph fp -p 6685 --duration 10

When using python scripts, change `app_profiler.config` as below:

    Change `record_options` line to record_options = "--duration 10 --call-graph fp"

Recording stack frame based call graphs needs support of the stack frame
register. Notice that on the arm architecture, the stack frame register
is not well supported, even if the code is compiled with the -O0 -g
-fno-omit-frame-pointer options. It is because the kernel can't unwind
user stacks containing both arm/thumb code. **So please consider using
dwarf based call graphs on the arm architecture, or profiling in an arm64
environment.**


#### Report call graph

To report a call graph using command lines, add the `-g` option.

    $ bin/linux/x86_64/simpleperf report -g
    ...
    Children  Self  Command   Pid    Tid    Shared Object                     Symbol
    99.97%    0.00% Thread-2  10859  10876  /system/framework/arm64/boot.oat  java.lang.Thread.run
           |
           -- java.lang.Thread.run
              |
               -- void com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.run()
                  |--83.66%-- [hit in function]
                  |
                  |--16.22%-- int com.example.simpleperf.simpleperfexamplepurejava.MainActivity$1.callFunction(int)
                  |           |--99.97%-- [hit in function]

To report a call graph using python scripts, add the `-g` option.

    $ python report.py -g
    # Double-click an item started with '+' to show its callgraph.

### Visualize profiling data

`simpleperf_report_lib.py` provides an interface for reading samples from perf.data.
By using it, you can write python scripts to read perf.data or convert perf.data
to other formats. Below are two examples.


#### Show flamegraph

    $ python report_sample.py >out.perf
    $ stackcollapse-perf.pl out.perf >out.folded
    $ ./flamegraph.pl out.folded >a.svg

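The "folded" format produced by `stackcollapse-perf.pl` is simply one line per unique callstack, with a sample count. A minimal python sketch of the idea (illustrative only, not a replacement for the perl script):

```python
# Illustrative sketch of stack folding: each unique callstack
# (root;...;leaf) is emitted once with the number of samples that hit it.
from collections import Counter

def fold_stacks(stacks):
    """stacks: list of callstacks, each a root-to-leaf list of names."""
    counts = Counter(";".join(stack) for stack in stacks)
    return ["%s %d" % (stack, n) for stack, n in sorted(counts.items())]

samples = [
    ["main", "FunctionOne", "FunctionTwo"],
    ["main", "FunctionOne", "FunctionTwo"],
    ["main", "FunctionTwo"],
]
for line in fold_stacks(samples):
    print(line)
# main;FunctionOne;FunctionTwo 2
# main;FunctionTwo 1
```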

#### Visualize using pprof

pprof is a tool for visualization and analysis of profiling data. It can
be obtained from https://github.com/google/pprof. pprof_proto_generator.py can
generate profiling data in a format accepted by pprof.

    $ python pprof_proto_generator.py
    $ pprof -pdf pprof.profile


### Annotate source code

`annotate.py` reads perf.data, binaries in `binary_cache/` (collected by `app_profiler.py`)
and source code, and generates annotated source code in `annotated_files/`.

**1. Run annotate.py**

    $ python annotate.py -s ../SimpleperfExamplePureJava

`addr2line` is needed to annotate source code. It can be found in the Android NDK
release, in paths like toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/bin/aarch64-linux-android-addr2line.
Please use the `--addr2line` option to set the path of `addr2line` if annotate.py
can't find it.

**2. Read annotated code**

The annotated source code is located at `annotated_files/`.
`annotated_files/summary` shows how each source file is annotated.

One annotated source file is `annotated_files/java/com/example/simpleperf/simpleperfexamplepurejava/MainActivity.java`.
Its content is similar to below:

    // [file] shows how much time is spent in the current file.
    /* [file] acc_p: 99.966552%, p: 99.837438% */package com.example.simpleperf.simpleperfexamplepurejava;
    ...
    // [func] shows how much time is spent in the current function.
    /* [func] acc_p: 16.213395%, p: 16.209250% */    private int callFunction(int a) {
    ...
    // This shows how much time is spent in the current line.
    // The acc_p field means how much time is spent in the current line and in functions called by the current line.
    // The p field means how much time is spent just in the current line.
    /* acc_p: 99.966552%, p: 83.628188% */            i = callFunction(i);

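The relation between `p` and `acc_p` can be sketched as follows (an illustrative sketch only, not `annotate.py` itself, with hypothetical callchains): `p` counts samples whose innermost frame is on the line, while `acc_p` also counts samples where the line appears anywhere in the callchain.

```python
# Illustrative sketch: compute p and acc_p for a source line from
# callchains, where each callchain is a root-to-leaf list of source lines.
def line_percentages(samples, line):
    total = len(samples)
    p = sum(1 for chain in samples if chain[-1] == line)   # leaf hits only
    acc_p = sum(1 for chain in samples if line in chain)   # anywhere in chain
    return 100.0 * p / total, 100.0 * acc_p / total

# Hypothetical samples, each frame identified by its source line.
samples = [
    ["MainActivity.java:20", "MainActivity.java:26"],  # line 20 calls line 26
    ["MainActivity.java:20", "MainActivity.java:26"],
    ["MainActivity.java:20"],                          # leaf sample on line 20
    ["MainActivity.java:31"],
]
p, acc_p = line_percentages(samples, "MainActivity.java:20")
print("p: %.1f%%, acc_p: %.1f%%" % (p, acc_p))  # p: 25.0%, acc_p: 75.0%
```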


## Answers to common issues

### The correct way to pull perf.data on host

As perf.data is generated in the app's context, it can't be pulled directly to the host.
One way is `adb shell run-as xxx cat perf.data >perf.data`. However, it
doesn't work well on Windows, because the content can be modified when it goes
through the pipe. So we first copy it from the app's context to the shell's context,
then pull it to the host. The commands are as below:

    $ adb shell "run-as xxx cat perf.data | tee /data/local/tmp/perf.data >/dev/null"
    $ adb pull /data/local/tmp/perf.data

## Inferno

![logo](./inferno/inferno_small.png)

### Description

Inferno is a flamegraph generator for native (C/C++) Android apps. It was
originally written to profile and improve surfaceflinger (the Android
compositor) performance, but it can be used for any native Android
application. You can see a sample report generated with Inferno
[here](./inferno/report.html). Reports are self-contained HTML files, so they
can be exchanged easily.

Notice there is no concept of time in a flame graph, since all callstacks are
merged together. As a result, the width of a flamegraph represents 100% of
the number of samples, and the height is related to the number of functions on
the stack when sampling occurred.


![flamegraph sample](./inferno/main_thread_flamegraph.png)

In the flamegraph featured above, you can see the main thread of SurfaceFlinger.
It is immediately apparent that most of the CPU time is spent processing messages
in `android::SurfaceFlinger::onMessageReceived`. The most expensive task is asking
the screen to be refreshed, as `android::DisplayDevice::prepare` shows in orange.
This graphic division helps to see which part of the program is costly and
where a developer's effort to improve performance should go.

### Example of bottleneck

A flamegraph gives you an instant view of the CPU cycle cost centers, but
it can also be used to find specific offenders. To find them, look for
plateaus. It is easier to see with an example:

![flamegraph sample](./inferno/bottleneck.png)

In the previous flamegraph, two
plateaus (due to `android::BufferQueueCore::validateConsistencyLocked`)
are immediately apparent.

### How it works

Inferno relies on simpleperf to record the callstack of a native application
thousands of times per second. Simpleperf takes care of unwinding the stack,
either using frame pointers (recommended) or dwarf. At the end of the recording,
`simpleperf` also symbolizes all IPs automatically. The records are aggregated
and dumped to a file, `perf.data`. This file is pulled from the Android device
and processed on the host by Inferno. The callstacks are merged together to
visualize in which parts of an app the CPU cycles are spent.

### How to use it

Open a terminal and from the `simpleperf` directory type:
```
./inferno.sh (on Linux/Mac)
./inferno.bat (on Windows)
```

Inferno will collect data, process it, and automatically open your web browser
to display the HTML report.

### Parameters

You can select how long to sample for, the color of the nodes, and many other
things. Use `-h` to get a list of all supported parameters.

```
./inferno.sh -h
```

### Troubleshooting

#### Messy flame graph

A healthy flame graph features a single call site at its base
(see `inferno/report.html`).
If you don't see a unique call site like `_start` or `_start_thread` at the base
from which all flames originate, something went wrong: stack unwinding may
have failed to reach the root callsite. Such incomplete
callstacks are impossible to merge properly. By default, Inferno asks
`simpleperf` to unwind the stack via the kernel and frame pointers. Try to
perform unwinding with dwarf via `-du`; you can further tune this setting.


#### No flames

If you see no flames at all, or a mess of 1-level flames without a common base,
this may be because you compiled without frame pointers. Make sure there is no
`-fomit-frame-pointer` in your build config. Alternatively, ask simpleperf to
collect data with dwarf unwinding via `-du`.
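For example, with ndk-build you can keep frame pointers by adding the compiler flag to your build config (a hypothetical Android.mk fragment; the flag itself, `-fno-omit-frame-pointer`, is a standard gcc/clang option):

    # Keep frame pointers so simpleperf can unwind with --call-graph fp.
    LOCAL_CFLAGS += -fno-omit-frame-pointer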


#### High percentage of lost samples

If simpleperf reports a lot of lost samples, it is probably because you are
unwinding with `dwarf`. Dwarf unwinding involves copying the stack before it is
processed. Try to use frame pointer unwinding, which can be done by the kernel
and is much faster.

The cost of frame pointers is negligible on the arm64 architecture but considerable
on the 32-bit arm architecture (due to register pressure). Use a 64-bit build for
better profiling.

#### run-as: package not debuggable

If you cannot run as root, make sure the app is debuggable; otherwise simpleperf
will not be able to profile it.