README.md
1# Simpleperf Introduction
2## What is simpleperf
3Simpleperf is a native profiling tool for Android. Its command-line interface
4supports broadly the same options as the linux-tools perf, but also supports
5various Android-specific improvements.
6
7Simpleperf is part of the Android Open Source Project. The source code is at
8https://android.googlesource.com/platform/system/extras/+/master/simpleperf/.
9The latest document is at
10https://android.googlesource.com/platform/system/extras/+show/master/simpleperf/README.md.
11Bugs and feature requests can be submitted at
12http://github.com/android-ndk/ndk/issues.
13
14## How simpleperf works
15Modern CPUs have a hardware component called the performance monitoring unit
16(PMU). The PMU has several hardware counters, counting events like how many cpu
17cycles have happened, how many instructions have executed, or how many cache
18misses have happened.
19
20The Linux kernel wraps these hardware counters into hardware perf events. In
21addition, the Linux kernel also provides hardware independent software events
22and tracepoint events. The Linux kernel exposes all this to userspace via the
23perf_event_open system call, which simpleperf uses.
24
25Simpleperf has three main functions: stat, record and report.
26
27The stat command gives a summary of how many events have happened in the
28profiled processes in a time period. Here’s how it works:
291. Given user options, simpleperf enables profiling by making a system call to
30linux kernel.
312. Linux kernel enables counters while scheduling on the profiled processes.
323. After profiling, simpleperf reads counters from linux kernel, and reports a
33counter summary.
34
35The record command records samples of the profiled process in a time period.
36Here’s how it works:
371. Given user options, simpleperf enables profiling by making a system call to
38linux kernel.
392. Simpleperf creates mapped buffers between simpleperf and linux kernel.
403. Linux kernel enable counters while scheduling on the profiled processes.
414. Each time a given number of events happen, linux kernel dumps a sample to a
42mapped buffer.
435. Simpleperf reads samples from the mapped buffers and generates perf.data.
44
45The report command reads a "perf.data" file and any shared libraries used by
46the profiled processes, and outputs a report showing where the time was spent.
47
48## Main simpleperf commands
49Simpleperf supports several subcommands, including list, stat, record, report.
50Each subcommand supports different options. This section only covers the most
51important subcommands and options. To see all subcommands and options,
52use --help.
53
54 # List all subcommands.
55 $simpleperf --help
56
57 # Print help message for record subcommand.
58 $simpleperf record --help
59
60### simpleperf list
61simpleperf list is used to list all events available on the device. Different
62devices may support different events because of differences in hardware and
63kernel.
64
65 $simpleperf list
66 List of hw-cache events:
67 branch-loads
68 ...
69 List of hardware events:
70 cpu-cycles
71 instructions
72 ...
73 List of software events:
74 cpu-clock
75 task-clock
76 ...
77
78### simpleperf stat
79simpleperf stat is used to get a raw event counter information of the profiled program
80or system-wide. By passing options, we can select which events to use, which
81processes/threads to monitor, how long to monitor and the print interval.
82Below is an example.
83
84 # Stat using default events (cpu-cycles,instructions,...), and monitor
85 # process 7394 for 10 seconds.
86 $simpleperf stat -p 7394 --duration 10
87 Performance counter statistics:
88
89 1,320,496,145 cpu-cycles # 0.131736 GHz (100%)
90 510,426,028 instructions # 2.587047 cycles per instruction (100%)
91 4,692,338 branch-misses # 468.118 K/sec (100%)
92 886.008130(ms) task-clock # 0.088390 cpus used (100%)
93 753 context-switches # 75.121 /sec (100%)
94 870 page-faults # 86.793 /sec (100%)
95
96 Total test time: 10.023829 seconds.
97
98#### Select events
99We can select which events to use via -e option. Below are examples:
100
101 # Stat event cpu-cycles.
102 $simpleperf stat -e cpu-cycles -p 11904 --duration 10
103
104 # Stat event cache-references and cache-misses.
105 $simpleperf stat -e cache-references,cache-misses -p 11904 --duration 10
106
107When running the stat command, if the number of hardware events is larger than
108the number of hardware counters available in the PMU, the kernel shares hardware
109counters between events, so each event is only monitored for part of the total
110time. In the example below, there is a percentage at the end of each row,
111showing the percentage of the total time that each event was actually monitored.
112
113 # Stat using event cache-references, cache-references:u,....
114 $simpleperf stat -p 7394 -e cache-references,cache-references:u,cache-references:k,cache-misses,cache-misses:u,cache-misses:k,instructions --duration 1
115 Performance counter statistics:
116
117 4,331,018 cache-references # 4.861 M/sec (87%)
118 3,064,089 cache-references:u # 3.439 M/sec (87%)
119 1,364,959 cache-references:k # 1.532 M/sec (87%)
120 91,721 cache-misses # 102.918 K/sec (87%)
121 45,735 cache-misses:u # 51.327 K/sec (87%)
122 38,447 cache-misses:k # 43.131 K/sec (87%)
123 9,688,515 instructions # 10.561 M/sec (89%)
124
125 Total test time: 1.026802 seconds.
126
127In the example above, each event is monitored about 87% of the total time. But
128there is no guarantee that any pair of events are always monitored at the same
129time. If we want to have some events monitored at the same time, we can use
130--group option. Below is an example.
131
132 # Stat using event cache-references, cache-references:u,....
133 $simpleperf stat -p 7394 --group cache-references,cache-misses --group cache-references:u,cache-misses:u --group cache-references:k,cache-misses:k -e instructions --duration 1
134 Performance counter statistics:
135
136 3,638,900 cache-references # 4.786 M/sec (74%)
137 65,171 cache-misses # 1.790953% miss rate (74%)
138 2,390,433 cache-references:u # 3.153 M/sec (74%)
139 32,280 cache-misses:u # 1.350383% miss rate (74%)
140 879,035 cache-references:k # 1.251 M/sec (68%)
141 30,303 cache-misses:k # 3.447303% miss rate (68%)
142 8,921,161 instructions # 10.070 M/sec (86%)
143
144 Total test time: 1.029843 seconds.
145
146#### Select target to monitor
147We can select which processes or threads to monitor via -p option or -t option.
148Monitoring a process is the same as monitoring all threads in the process.
149Simpleperf can also fork a child process to run the new command and then monitor
150the child process. Below are examples.
151
152 # Stat process 11904 and 11905.
153 $simpleperf stat -p 11904,11905 --duration 10
154
155 # Stat thread 11904 and 11905.
156 $simpleperf stat -t 11904,11905 --duration 10
157
158 # Start a child process running `ls`, and stat it.
159 $simpleperf stat ls
160
161#### Decide how long to monitor
162When monitoring existing threads, we can use --duration option to decide how long
163to monitor. When monitoring a child process running a new command, simpleperf
164monitors until the child process ends. In this case, we can use Ctrl-C to stop monitoring
165at any time. Below are examples.
166
167 # Stat process 11904 for 10 seconds.
168 $simpleperf stat -p 11904 --duration 10
169
170 # Stat until the child process running `ls` finishes.
171 $simpleperf stat ls
172
173 # Stop monitoring using Ctrl-C.
174 $simpleperf stat -p 11904 --duration 10
175 ^C
176
177#### Decide the print interval
178When monitoring perf counters, we can also use --interval option to decide the print
179interval. Below are examples.
180
181 # Print stat for process 11904 every 300ms.
182 $simpleperf stat -p 11904 --duration 10 --interval 300
183
184 # Print system wide stat at interval of 300ms for 10 seconds (rooted device only).
185 # system wide profiling needs root privilege
186 $su 0 simpleperf stat -a --duration 10 --interval 300
187
188#### Display counters in systrace
189simpleperf can also work with systrace to dump counters in the collected trace.
190Below is an example to do a system wide stat
191
192 # capture instructions (kernel only) and cache misses with interval of 300 milliseconds for 15 seconds
193 $su 0 simpleperf stat -e instructions:k,cache-misses -a --interval 300 --duration 15
194 # on host launch systrace to collect trace for 10 seconds
195 (HOST)$external/chromium-trace/systrace.py --time=10 -o new.html sched gfx view
196 # open the collected new.html in browser and perf counters will be shown up
197
198### simpleperf record
199simpleperf record is used to dump records of the profiled program. By passing
200options, we can select which events to use, which processes/threads to monitor,
201what frequency to dump records, how long to monitor, and where to store records.
202
203 # Record on process 7394 for 10 seconds, using default event (cpu-cycles),
204 # using default sample frequency (4000 samples per second), writing records
205 # to perf.data.
206 $simpleperf record -p 7394 --duration 10
207 simpleperf I 07-11 21:44:11 17522 17522 cmd_record.cpp:316] Samples recorded: 21430. Samples lost: 0.
208
209#### Select events
210In most cases, the cpu-cycles event is used to evaluate consumed cpu time.
211As a hardware event, it is both accurate and efficient. We can also use other
212events via -e option. Below is an example.
213
214 # Record using event instructions.
215 $simpleperf record -e instructions -p 11904 --duration 10
216
217#### Select target to monitor
218The way to select target in record command is similar to that in stat command.
219Below are examples.
220
221 # Record process 11904 and 11905.
222 $simpleperf record -p 11904,11905 --duration 10
223
224 # Record thread 11904 and 11905.
225 $simpleperf record -t 11904,11905 --duration 10
226
227 # Record a child process running `ls`.
228 $simpleperf record ls
229
230#### Set the frequency to record
231We can set the frequency to dump records via the -f or -c options. For example,
232-f 4000 means dumping approximately 4000 records every second when the monitored
233thread runs. If a monitored thread runs 0.2s in one second (it can be preempted
234or blocked in other times), simpleperf dumps about 4000 * 0.2 / 1.0 = 800
235records every second. Another way is using -c option. For example, -c 10000
236means dumping one record whenever 10000 events happen. Below are examples.
237
238 # Record with sample frequency 1000: sample 1000 times every second running.
239 $simpleperf record -f 1000 -p 11904,11905 --duration 10
240
241 # Record with sample period 100000: sample 1 time every 100000 events.
242 $simpleperf record -c 100000 -t 11904,11905 --duration 10
243
244#### Decide how long to monitor
245The way to decide how long to monitor in record command is similar to that in
246stat command. Below are examples.
247
248 # Record process 11904 for 10 seconds.
249 $simpleperf record -p 11904 --duration 10
250
251 # Record until the child process running `ls` finishes.
252 $simpleperf record ls
253
254 # Stop monitoring using Ctrl-C.
255 $simpleperf record -p 11904 --duration 10
256 ^C
257
258#### Set the path to store records
259By default, simpleperf stores records in perf.data in current directory. We can
260use -o option to set the path to store records. Below is an example.
261
262 # Write records to data/perf2.data.
263 $simpleperf record -p 11904 -o data/perf2.data --duration 10
264
265### simpleperf report
266simpleperf report is used to report based on perf.data generated by simpleperf
267record command. Report command groups records into different sample entries,
268sorts sample entries based on how many events each sample entry contains, and
269prints out each sample entry. By passing options, we can select where to find
270perf.data and executable binaries used by the monitored program, filter out
271uninteresting records, and decide how to group records.
272
273Below is an example. Records are grouped into 4 sample entries, each entry is
274a row. There are several columns, each column shows piece of information
275belonging to a sample entry. The first column is Overhead, which shows the
276percentage of events inside current sample entry in total events. As the
277perf event is cpu-cycles, the overhead can be seen as the percentage of cpu
278time used in each function.
279
280 # Reports perf.data, using only records sampled in libsudo-game-jni.so,
281 # grouping records using thread name(comm), process id(pid), thread id(tid),
282 # function name(symbol), and showing sample count for each row.
283 $simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so --sort comm,pid,tid,symbol -n
284 Cmdline: /data/data/com.example.sudogame/simpleperf record -p 7394 --duration 10
285 Arch: arm64
286 Event: cpu-cycles (type 0, config 0)
287 Samples: 28235
288 Event count: 546356211
289
290 Overhead Sample Command Pid Tid Symbol
291 59.25% 16680 sudogame 7394 7394 checkValid(Board const&, int, int)
292 20.42% 5620 sudogame 7394 7394 canFindSolution_r(Board&, int, int)
293 13.82% 4088 sudogame 7394 7394 randomBlock_r(Board&, int, int, int, int, int)
294 6.24% 1756 sudogame 7394 7394 @plt
295
296#### Set the path to read records
297By default, simpleperf reads perf.data in current directory. We can use -i
298option to select another file to read records.
299
300 $simpleperf report -i data/perf2.data
301
302#### Set the path to find executable binaries
303If reporting function symbols, simpleperf needs to read executable binaries
304used by the monitored processes to get symbol table and debug information. By
305default, the paths are the executable binaries used by monitored processes while
306recording. However, these binaries may not exist when reporting or not contain
307symbol table and debug information. So we can use --symfs to redirect the paths.
308Below is an example.
309
310 $simpleperf report
311 # In this case, when simpleperf wants to read executable binary /A/b,
312 # it reads file in /A/b.
313
314 $simpleperf report --symfs /debug_dir
315 # In this case, when simpleperf wants to read executable binary /A/b,
316 # it prefers file in /debug_dir/A/b to file in /A/b.
317
318#### Filter records
319When reporting, it happens that not all records are of interest. Simpleperf
320supports five filters to select records of interest. Below are examples.
321
322 # Report records in threads having name sudogame.
323 $simpleperf report --comms sudogame
324
325 # Report records in process 7394 or 7395
326 $simpleperf report --pids 7394,7395
327
328 # Report records in thread 7394 or 7395.
329 $simpleperf report --tids 7394,7395
330
331 # Report records in libsudo-game-jni.so.
332 $simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so
333
334 # Report records in function checkValid or canFindSolution_r.
335 $simpleperf report --symbols "checkValid(Board const&, int, int);canFindSolution_r(Board&, int, int)"
336
337#### Decide how to group records into sample entries
338Simpleperf uses --sort option to decide how to group sample entries. Below are
339examples.
340
341 # Group records based on their process id: records having the same process
342 # id are in the same sample entry.
343 $simpleperf report --sort pid
344
345 # Group records based on their thread id and thread comm: records having
346 # the same thread id and thread name are in the same sample entry.
347 $simpleperf report --sort tid,comm
348
349 # Group records based on their binary and function: records in the same
350 # binary and function are in the same sample entry.
351 $simpleperf report --sort dso,symbol
352
353 # Default option: --sort comm,pid,tid,dso,symbol. Group records in the same
354 # thread, and belong to the same function in the same binary.
355 $simpleperf report
356
357## Features of simpleperf
358Simpleperf works similar to linux-tools-perf, but it has following improvements:
3591. Aware of Android environment. Simpleperf handles some Android specific
360situations when profiling. For example, it can profile embedded shared libraries
361in apk, read symbol table and debug information from .gnu_debugdata section. If
362possible, it gives suggestions when facing errors, like how to disable
363perf_harden to enable profiling.
3642. Support unwinding while recording. If we want to use -g option to record and
365report call-graph of a program, we need to dump user stack and register set in
366each record, and then unwind the stack to find the call chain. Simpleperf
367supports unwinding while recording, so it doesn’t need to store user stack in
368perf.data. So we can profile for a longer time with limited space on device.'
3693. Support scripts to make profiling on Android more convenient.
3704. Build in static binaries. Simpleperf is a static binary, so it doesn’t need
371supporting shared libraries to run. It means there is no limitation of Android
372version that simpleperf can run on, although some devices don’t support
373profiling.
374
375# Simpleperf tools in ndk
376Simpleperf tools in ndk contain three parts: simpleperf executable running on
377Android device, simpleperf executable running on host, and python scripts.
378
379## Simpleperf on device
380Simpleperf running on device is located at bin/android directory. It contains
381static binaries running on Android on different architectures. They can be used
382to profile processes running device, and generate perf.data.
383
384## Simpleperf on host
385Simpleperfs running on host are located at bin/darwin, bin/linux and
386bin/windows.They can be used to parse perf.data on host.
387
388## Scripts
389Scripts are used to make it convenient to profile and parse profiling results.
390app_profiler.py is used to profile an android application. It prepares
391profiling environments, downloads simpleperf on device, generates and pulls
392perf.data on host. It is configured by app_profiler.config.
393binary_cache_builder.py is used to pull native binaries from device to host.
394It is used by app_profiler.py.
395annotate.py is used to annotate source files using perf.data. It is configured
396by annotate.config.
397report.py reports perf.data in a GUI window.
398simpleperf_report_lib.py is used to enumerate samples in perf.data. Internally
399it uses libsimpleperf_report.so to parse perf.data. It can be used to translate
400samples in perf.data to other forms. One example using simpleperf_report_lib.py
401is report_sample.py.
402
403# Examples of using simpleperf tools
404This section shows how to use simpleperf tools to profile an Android
405application.
406
407## Prepare a debuggable Application
408The package name of the application is com.example.sudogame. It has both java
409code and c++ code. We need to run a copy of the app with
410android:debuggable=”true” in its AndroidManifest.xml <application> element,
411because we can’t use run-as for non-debuggable apps. The application should
412has been installed on device, and we can connect device via adb.
413
414## Profile using command line
415To record profiling data, we need to download simpleperf and native libraries
416with debug information on device, run simpleperf to generate profiling data
417file: perf.data, and run simpleperf to report perf.data. Below are the steps.
418
419### 1. Enable profiling
420
421 $adb shell setprop security.perf_harden 0
422
423### 2. Find the process running the app
424Run `ps` in the app’s context. On >=O devices, run `ps -e` instead.
425
426 $adb shell
427 angler:/ $ run-as com.example.sudogame
428 angler:/data/data/com.example.sudogame $ ps
429 u0_a93 10324 570 1030480 58104 SyS_epoll_ 00f41b7528 S com.example.sudogame
430 u0_a93 10447 10441 7716 1588 sigsuspend 753c515d34 S sh
431 u0_a93 10453 10447 9112 1644 0 7ba07ff664 R ps
432
433So process 10324 runs the app.
434
435### 3. Download simpleperf to the app’s data directory
436First we need to find out which architecture the app is using. There are many
437ways, here we just check the map of the process.
438
439 angler:/data/data/com.example.sudogame $cat /proc/10324/maps | grep boot.art
440 70f34000-7144e000 r--p 00000000 fd:00 1082 /system/framework/arm/boot.oat
441
442The file path shows it is arm. So we download simpleperf in arm directory on
443device.
444
445 $adb push bin/android/arm/simpleperf /data/local/tmp
446 $adb shell
447 angler:/ $ run-as com.example.sudogame
448 angler:/data/data/com.example.sudogame $ cp /data/local/tmp/simpleperf .
449
450### 4. Record perf.data
451
452 angler:/data/data/com.example.sudogame $./simpleperf record -p 10324 --duration 30
453 simpleperf I 01-01 09:26:39 10598 10598 cmd_record.cpp:341] Samples recorded: 49471. Samples lost: 0.
454 angler:/data/data/com.example.sudogame $ls -lh perf.data
455 -rw-rw-rw- 1 u0_a93 u0_a93 2.6M 2017-01-01 09:26 perf.data
456
457Don’t forget to run the app while recording. Otherwise, we may get no samples
458because the process is always sleeping.
459
460### 5. Report perf.data
461There are different ways to report perf.data. Below shows some examples.
462
463Report samples in different threads.
464
465 angler:/data/data/com.example.sudogame $./simpleperf report --sort pid,tid,comm
466 Cmdline: /data/data/com.example.sudogame/simpleperf record -p 10324 --duration 30
467 Arch: arm64
468 Event: cpu-cycles (type 0, config 0)
469 Samples: 49471
470 Event count: 16700769019
471
472 Overhead Pid Tid Command
473 66.31% 10324 10324 xample.sudogame
474 30.97% 10324 10340 RenderThread
475 ...
476
477Report samples in different binaries in the main thread.
478
479 angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --sort dso -n
480 ...
481 Overhead Sample Shared Object
482 37.71% 9970 /system/lib/libc.so
483 35.45% 9786 [kernel.kallsyms]
484 8.71% 3305 /system/lib/libart.so
485 6.44% 2405 /system/framework/arm/boot-framework.oat
486 5.64% 1480 /system/lib/libcutils.so
487 1.55% 426 /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so
488 ...
489
490Report samples in different functions in libsudo-game-jni.so in the main thread.
491
492 angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --dsos /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so --sort symbol -n
493 ...
494 Overhead Sample Symbol
495 8.94% 35 libsudo-game-jni.so[+1d54]
496 5.71% 25 libsudo-game-jni.so[+1dae]
497 5.70% 23 @plt
498 5.09% 22 libsudo-game-jni.so[+1d88]
499 4.54% 19 libsudo-game-jni.so[+1d82]
500 3.61% 14 libsudo-game-jni.so[+1f3c]
501 ...
502
503In the above result, most symbols are binary name[+virual_addr]. It is because
504libsudo-game-jni.so used on device has stripped .symbol section. We can
505download libsudo-game-jni.so having debug information on device. In android
506studio project, it locates at
507app/build/intermediates/binaries/debug/arm/obj/armeabi-v7a/libsudo-game-jni.so.
508We have to download libsudo-game-jni.so to the same relative path as recorded
509in perf.data (otherwise, simpleperf can’t find it). In this case, it is
510/data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so.
511
512Report symbols using libraries with debug information.
513
514 $adb push app/build/intermediates/binaries/debug/arm/obj/armeabi-v7a/libsudo-game-jni.so /data/local/tmp
515 $adb shell
516 angler:/ $ run-as com.example.sudogame
517 angler:/data/data/com.example.sudogame $ mkdir -p data/app/com.example.sudogame-1/lib/arm
518 angler:/data/data/com.example.sudogame $cp /data/local/tmp/libsudo-game-jni.so data/app/com.example.sudogame-1/lib/arm
519 angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --dsos /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so --sort symbol -n --symfs .
520 ...
521 Overhead Sample Symbol
522 75.18% 317 checkValid(Board const&, int, int)
523 14.43% 60 canFindSolution_r(Board&, int, int)
524 5.70% 23 @plt
525 3.51% 20 randomBlock_r(Board&, int, int, int, int, int)
526 ...
527
528Report samples in one function
529
530 angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --dsos /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so --symbols “checkValid(Board const&, int, int)” --sort vaddr_in_file -n --symfs .
531 ...
532 Overhead Sample VaddrInFile
533 11.89% 35 0x1d54
534 7.59% 25 0x1dae
535 6.77% 22 0x1d88
536 6.03% 19 0x1d82
537 ...
538
539### 6. Record and report call graph
540A call graph is a tree showing function call relations. Below is an example.
541
542 main() {
543 FunctionOne();
544 FunctionTwo();
545 }
546 FunctionOne() {
547 FunctionTwo();
548 FunctionThree();
549 }
550 callgraph:
551 main-> FunctionOne
552 | |
553 | |-> FunctionTwo
554 | |-> FunctionThree
555 |
556 |-> FunctionTwo
557
558#### Record dwarf based call graph
559To generate call graph, simpleperf needs to generate call chain for each record.
560Simpleperf requests kernel to dump user stack and user register set for each
561record, then it backtraces the user stack to find the function call chain. To
562parse the call chain, it needs support of dwarf call frame information, which
563usually resides in .eh_frame or .debug_frame section of the binary. So we need
564to use --symfs to point out where is libsudo-game-jni.so with debug information.
565
566 angler:/data/data/com.example.sudogame $./simpleperf record -p 10324 -g --symfs . --duration 30
567 simpleperf I 01-01 09:59:42 11021 11021 cmd_record.cpp:341] Samples recorded: 60700. Samples lost: 1240.
568
569Note that kernel can’t dump user stack >= 64K, so the dwarf based call graph
570doesn’t contain call chains consuming >= 64K stack. What’s more, because we
571need to dump stack in each record, it is likely to lost records. Usually, it
572doesn’t matter to lost some records.
573
574#### Record stack frame based call graph
575Another way to generate call graph is to rely on the kernel parsing the call
576chain for each record. To make it possible, kernel has to be able to identify
577the stack frame of each function call. This is not always possible, because
578compilers can optimize away stack frames, or use a stack frame style not
579recognized by the kernel. So how well it works depends (It works well on arm64,
580but not well on arm).
581
582 angler:/data/data/com.example.sudogame $./simpleperf record -p 10324 --call-graph fp --symfs . --duration 30
583 simpleperf I 01-01 10:03:58 11267 11267 cmd_record.cpp:341] Samples recorded: 56736. Samples lost: 0.
584
585#### Report call graph
586Report accumulated period. In the table below, the first column is “Children”,
587it is the cpu cycle percentage of a function and functions called by that
588function. The second column is “Self”, it is the cpu cycle percentage of just a
589function. For example, checkValid() itself takes 1.28% cpus, but it takes
59029.43% by running itself and calling other functions.
591
592 angler:/data/data/com.example.sudogame $./simpleperf report --children --symfs .
593 ...
594 Children Self Command Pid Tid Shared Object Symbol
595 31.94% 0.00% xample.sudogame 10324 10324 [kernel.kallsyms] [kernel.kallsyms][+ffffffc000204268]
596 31.10% 0.92% xample.sudogame 10324 10324 /system/lib/libc.so writev
597 29.43% 1.28% xample.sudogame 10324 10324 /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so checkValid(Board const&, int, int)
598 28.43% 0.34% xample.sudogame 10324 10324 /system/lib/liblog.so __android_log_print
599 28.24% 0.00% xample.sudogame 10324 10324 /system/lib/libcutils.so libcutils.so[+107b7]
600 28.10% 0.27% xample.sudogame 10324 10324 /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so canFindSolution_r(Board&, int, int)
601 ...
602
603Report call graph.
604
605 angler:/data/data/com.example.sudogame $./simpleperf report -g --symfs . >report
606 angler:/data/data/com.example.sudogame $exit
607 angler:/ $cp /data/data/com.example.sudogame/report /data/local/tmp
608 angler:/ $exit
609 $adb pull /data/local/tmp/report .
610 $cat report
611 ...
612 29.43% 1.28% xample.sudogame 10324 10324 /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so checkValid(Board const&, int, int)
613 |
614 -- checkValid(Board const&, int, int)
615 |
616 |--95.50%-- __android_log_print
617 | |--0.68%-- [hit in function]
618 | |
619 | |--51.84%-- __android_log_buf_write
620 | | |--2.07%-- [hit in function]
621 | | |
622 | | |--30.74%-- libcutils.so[+c69d]
623 ...
624
625Report call graph in callee mode. We can also show how a function is called by
626other functions.
627
628 angler:/data/data/com.example.sudogame $./simpleperf report -g callee --symfs . >report
629 $adb shell run-as com.example.sudogame cat report >report
630 $cat report
631 …
632 28.43% 0.34% xample.sudogame 10324 10324 /system/lib/liblog.so __android_log_print
633 |
634 -- __android_log_print
635 |
636 |--97.82%-- checkValid(Board const&, int, int)
637 | |--0.13%-- [hit in function]
638 | |
639 | |--94.89%-- canFindSolution_r(Board&, int, int)
640 | | |--0.01%-- [hit in function]
641 | | |
642 ...
643
644## Profile java code
645Simpleperf only supports profiling native instructions in binaries in ELF
646format. If the java code is executed by interpreter, or with jit cache, it
647can’t be profiled by simpleperf. As Android supports Ahead-of-time compilation,
648it can compile java bytecode into native instructions with debug information.
649On devices with Android version <= M, we need root privilege to compile java
650bytecode with debug information. However, on devices with Android version >= N,
651we don't need root privilege to do so.
652
653### On Android N
654#### 1. Fully compile java code into native instructions.
655
656 $adb shell setprop debug.generate-debug-info true
657 $adb shell cmd package compile -f -m speed com.example.sudogame
658 // restart the app to take effect
659
660#### 2. Record perf.data
661
662 angler:/data/data/com.example.sudogame $./simpleperf record -p 11826 -g --symfs . --duration 30
663 simpleperf I 01-01 10:31:40 11859 11859 cmd_record.cpp:341] Samples recorded: 50576. Samples lost: 2139.
664
665#### 3. Report perf.data
666
667 angler:/data/data/com.example.sudogame $./simpleperf report -g --symfs . >report
668 angler:/data/data/com.example.sudogame $exit
669 angler:/ $cp /data/data/com.example.sudogame/report /data/local/tmp
670 angler:/ $exit
671 $adb pull /data/local/tmp/report .
672 $cat report
673 ...
674 21.14% 0.00% xample.sudogame 11826 11826 /data/app/com.example.sudogame-1/oat/arm/base.odex boolean com.example.sudogame.MainActivity.onOptionsItemSelected(android.view.MenuItem)
675 |
676 -- boolean com.example.sudogame.MainActivity.onOptionsItemSelected(android.view.MenuItem)
677 |
678 --99.99%-- void com.example.sudogame.GameView.startNewGame()
679 |--0.01%-- [hit in function]
680 |
681 |--99.87%-- void com.example.sudogame.GameModel.reInit()
682 | |--0.01%-- [hit in function]
683 | |
684 | |--89.65%-- boolean com.example.sudogame.GameModel.canFindSolution(int[][])
685 | | |
686 | | |--99.95%-- Java_com_example_sudogame_GameModel_canFindSolution
687 | | | |
688 | | | |--99.49%-- canFindSolution(Board&)
689 | | | | |--0.01%-- [hit in function]
690 | | | | |
691 | | | | |--99.97%-- canFindSolution_r(Board&, int, int)
692 | | | | | canFindSolution_r(Board&, int, int)
693 ...
694
695### On Android M
696On M devices, We need root privilege to force Android fully compiling java code
697into native instructions in ELF binaries with debug information. We also need
698root privilege to read compiled native binaries (because installd writes them
699to a directory whose uid/gid is system:install). So profiling java code can
700only be done on rooted devices.
701
702 $adb root
703 $adb shell setprop dalvik.vm.dex2oat-flags -g
704
705 # Reinstall the app.
706 $adb install -r app-debug.apk
707
708 # Change to the app’s data directory.
709 $ adb root && adb shell
710 device# cd `run-as com.example.sudogame pwd`
711
712 # Record as root as simpleperf needs to read the generated native binary.
713 device#./simpleperf record -p 25636 -g --symfs . -f 1000 --duration 30
714 simpleperf I 01-02 07:18:20 27182 27182 cmd_record.cpp:323] Samples recorded: 23552. Samples lost: 39.
715
716### On Android L
717On L devices, we also need root privilege to compile the app with debug info
718and access the native binaries.
719
720 $adb root
721 $adb shell setprop dalvik.vm.dex2oat-flags --include-debug-symbols
722
723 # Reinstall the app.
724 $adb install -r app-debug.apk
725
726## Profile using scripts
727Although using command line is flexible, it can be too complex. So we have
728python scripts to help running commands.
729
730### Record using app_profiler.py
731app_profiler.py is used to profile an Android application. It sets up profiling
732environment, downloads simpleperf and native libraries with debug information,
733runs simpleperf to generate perf.data, and pulls perf.data and binaries from
734device to host.
735It is configured by app_profiler.config. Below is an example.
736
737app_profiler.config:
738
739 app_package_name = “com.example.sudogame”
740 android_studio_project_dir = “/AndroidStudioProjects/SudoGame” # absolute path of the project
741 ...
742 record_options = "-e cpu-cycles:u -f 4000 -g --dump-symbols --duration 30"
743 ...
744
745run app_profiler.py:
746
747 $python app_profiler.py
748 ...
749 INFO:root:profiling is finished.
750
751It pulls generated perf.data on host, and collects binaries from device in
752binary_cache.
753
754### Report using report.py
755
756 $python report.py -g
757
758It generates a GUI interface to report data.
759
760### Process samples using simpleperf_report_lib.py
761simpleperf_report_lib.py provides an interface reading samples from perf.data.
762An example is report_sample.py.
763
764### Show flamegraph
765
766 $python report_sample.py >out.perf
767 $stackcollapse-perf.pl out.perf >out.folded
768 $./flamegraph.pl out.folded >a.svg
769
770### Visualize using pprof
771pprof is a tool for visualization and analysis of profiling data. It can
772be got from https://github.com/google/pprof. pprof_proto_generator.py can
773generate profiling data in a format acceptable by pprof.
774
775 $python pprof_proto_generator.py
776 $pprof -pdf pprof.profile
777
778### Annotate source code
779annotate.py reads perf.data and binaries in binary_cache. Then it knows which
780source file:line each sample hits. So it can annotate source code. annotate.py
781is configured by annotate.config. Below is an example.
782
783annotate.config:
784
785 ...
786 source_dirs = [“/AndroidStudio/SudoGame”] # It is a directory containing source code.
787 ...
788
789run annotate.py:
790
791 $python annotate.py
792
793It generates annotated_files directory.
794annotated_files/summary file contains summary information for each source file.
795An example is as below.
796
797 /AndroidStudioProjects/SudoGame/app/src/main/jni/sudo-game-jni.cpp: accumulated_period: 25.587937%, period: 1.250961%
798 function (checkValid(Board const&, int, int)): line 99, accumulated_period: 23.564356%, period: 0.908457%
799 function (canFindSolution_r(Board&, int, int)): line 135, accumulated_period: 22.260125%, period: 0.142359%
800 function (canFindSolution(Board&)): line 166, accumulated_period: 22.233101%, period: 0.000000%
801 function (Java_com_example_sudogame_GameModel_canFindSolution): line 470, accumulated_period: 21.983184%, period: 0.000000%
802 function (Java_com_example_sudogame_GameModel_initRandomBoard): line 430, accumulated_period: 2.226896%, period: 0.000000%
803
804 line 27: accumulated_period: 0.011729%, period: 0.000000%
805 line 32: accumulated_period: 0.004362%, period: 0.000000%
806 line 33: accumulated_period: 0.004427%, period: 0.000000%
807 line 36: accumulated_period: 0.003303%, period: 0.000000%
808 line 39: accumulated_period: 0.010367%, period: 0.004123%
809 line 41: accumulated_period: 0.162219%, period: 0.000000%
810
811annotated_files/ also contains annotated source files which are found by
812annotate.py. For example, part of checkValid() function in libsudo-game-jni.cpp
813is annotated as below.
814
815 /* [func] acc_p: 23.564356%, p: 0.908457% */static bool checkValid(const Board& board, int curR, int curC) {
816 /* acc_p: 0.037933%, p: 0.037933% */ int digit = board.digits[curR][curC];
817 /* acc_p: 0.162355%, p: 0.162355% */ for (int r = 0; r < BOARD_ROWS; ++r) {
818 /* acc_p: 0.020880%, p: 0.020880% */ if (r == curR) {
819 /* acc_p: 0.034691%, p: 0.034691% */ continue;
820 }
821 /* acc_p: 0.176490%, p: 0.176490% */ if (board.digits[r][curC] == digit) {
822 /* acc_p: 14.957673%, p: 0.059022% */ LOGI("conflict (%d, %d) (%d, %d)", curR, curC, r, curC);
823 /* acc_p: 0.016296%, p: 0.016296% */ return false;
824 }
825 }
826