• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..--

nonlinux_support/22-Nov-2023-9220

runtest/22-Nov-2023-1,101889

scripts/22-Nov-2023-3,5222,721

testdata/22-Nov-2023-2217

.gitignoreD22-Nov-20235 11

Android.mkD22-Nov-202313.5 KiB417323

IOEventLoop.cppD22-Nov-20234.6 KiB185147

IOEventLoop.hD22-Nov-20232.8 KiB8735

IOEventLoop_test.cppD22-Nov-20234.7 KiB208176

InplaceSamplerClient.cppD22-Nov-20236.1 KiB192160

InplaceSamplerClient.hD22-Nov-20231.8 KiB5633

NOTICED22-Nov-202310.4 KiB191158

README.mdD22-Nov-202336.6 KiB826662

SampleComparator.hD22-Nov-20233.7 KiB11475

SampleDisplayer.hD22-Nov-20238.7 KiB264208

UnixSocket.cppD22-Nov-20237 KiB234190

UnixSocket.hD22-Nov-20237.9 KiB240150

UnixSocket_test.cppD22-Nov-20237.1 KiB208173

build_id.hD22-Nov-20232.5 KiB9357

callchain.hD22-Nov-20234.9 KiB159128

cmd_dumprecord.cppD22-Nov-20237.6 KiB230194

cmd_dumprecord_test.cppD22-Nov-20231,017 3312

cmd_help.cppD22-Nov-20232.7 KiB8858

cmd_kmem.cppD22-Nov-202325.7 KiB717634

cmd_kmem_test.cppD22-Nov-20234.6 KiB148114

cmd_list.cppD22-Nov-20233 KiB9969

cmd_list_test.cppD22-Nov-20231.1 KiB4219

cmd_record.cppD22-Nov-202337.5 KiB1,113989

cmd_record_test.cppD22-Nov-202313.5 KiB398337

cmd_report.cppD22-Nov-202329 KiB825727

cmd_report_sample.cppD22-Nov-202319 KiB568495

cmd_report_sample_test.cppD22-Nov-20231.9 KiB5733

cmd_report_test.cppD22-Nov-202316.6 KiB507438

cmd_stat.cppD22-Nov-202321.3 KiB633547

cmd_stat_test.cppD22-Nov-20235 KiB169128

command.cppD22-Nov-20232.9 KiB9866

command.hD22-Nov-20232 KiB7040

command_test.cppD22-Nov-20231.5 KiB4624

cpu_hotplug_test.cppD22-Nov-202315.4 KiB499417

dso.cppD22-Nov-202314.5 KiB491416

dso.hD22-Nov-20235 KiB183121

dwarf_unwind.cppD22-Nov-20236.7 KiB146119

dwarf_unwind.hD22-Nov-20231 KiB3411

environment.cppD22-Nov-202314.1 KiB478408

environment.hD22-Nov-20232.4 KiB9458

environment_test.cppD22-Nov-2023988 278

event_attr.cppD22-Nov-20239.4 KiB243185

event_attr.hD22-Nov-20231.5 KiB4520

event_fd.cppD22-Nov-20239.2 KiB281223

event_fd.hD22-Nov-20234.2 KiB13269

event_selection_set.cppD22-Nov-202326.7 KiB866764

event_selection_set.hD22-Nov-20236.6 KiB191121

event_type.cppD22-Nov-20235.4 KiB166137

event_type.hD22-Nov-20232.2 KiB7843

event_type_table.hD22-Nov-20237.7 KiB6861

generate_event_type_table.pyD22-Nov-20234.6 KiB12793

get_test_data.hD22-Nov-20235.3 KiB10935

gtest_main.cppD22-Nov-20237 KiB237188

inplace_sampler_lib.cppD22-Nov-202310.5 KiB363309

inplace_sampler_lib.hD22-Nov-20232.1 KiB8512

main.cppD22-Nov-20232.3 KiB7550

perf_clock.cppD22-Nov-20234.9 KiB170131

perf_clock.hD22-Nov-20231.1 KiB326

perf_event.hD22-Nov-2023843 278

perf_regs.cppD22-Nov-20236.7 KiB235194

perf_regs.hD22-Nov-20232.7 KiB10172

read_apk.cppD22-Nov-20236.6 KiB193149

read_apk.hD22-Nov-20233.5 KiB10355

read_apk_test.cppD22-Nov-20232.6 KiB6640

read_elf.cppD22-Nov-202317 KiB494430

read_elf.hD22-Nov-20233.1 KiB8348

read_elf_test.cppD22-Nov-20235.8 KiB156115

record.cppD22-Nov-202333.4 KiB1,025927

record.hD22-Nov-202316.5 KiB573390

record_equal_test.hD22-Nov-20233.1 KiB8160

record_file.hD22-Nov-20236.4 KiB188117

record_file_format.hD22-Nov-20232.4 KiB10549

record_file_reader.cppD22-Nov-202314 KiB466418

record_file_test.cppD22-Nov-20235.6 KiB160104

record_file_writer.cppD22-Nov-202311.8 KiB414346

record_test.cppD22-Nov-20235 KiB145106

report_lib_interface.cppD22-Nov-202311.3 KiB393320

report_sample.protoD22-Nov-20231.6 KiB5948

sample_tree.hD22-Nov-202311.2 KiB325247

sample_tree_test.cppD22-Nov-20238.4 KiB240197

test_util.hD22-Nov-20231.7 KiB4219

thread_tree.cppD22-Nov-202310.9 KiB329271

thread_tree.hD22-Nov-20235.2 KiB153102

tracing.cppD22-Nov-202313.5 KiB427371

tracing.hD22-Nov-20232.2 KiB9059

utils.cppD22-Nov-20238.6 KiB340289

utils.hD22-Nov-20234.3 KiB173118

utils_test.cppD22-Nov-20232.4 KiB7652

workload.cppD22-Nov-20235.2 KiB172137

workload.hD22-Nov-20232.2 KiB7748

workload_test.cppD22-Nov-20232.4 KiB8356

README.md

1# Simpleperf Introduction
2## What is simpleperf
3Simpleperf is a native profiling tool for Android. Its command-line interface
4supports broadly the same options as the linux-tools perf, but also supports
5various Android-specific improvements.
6
7Simpleperf is part of the Android Open Source Project. The source code is at
8https://android.googlesource.com/platform/system/extras/+/master/simpleperf/.
9The latest document is at
10https://android.googlesource.com/platform/system/extras/+show/master/simpleperf/README.md.
11Bugs and feature requests can be submitted at
12http://github.com/android-ndk/ndk/issues.
13
14## How simpleperf works
15Modern CPUs have a hardware component called the performance monitoring unit
16(PMU). The PMU has several hardware counters, counting events like how many cpu
17cycles have happened, how many instructions have executed, or how many cache
18misses have happened.
19
20The Linux kernel wraps these hardware counters into hardware perf events. In
21addition, the Linux kernel also provides hardware independent software events
22and tracepoint events. The Linux kernel exposes all this to userspace via the
23perf_event_open system call, which simpleperf uses.
24
25Simpleperf has three main functions: stat, record and report.
26
27The stat command gives a summary of how many events have happened in the
28profiled processes in a time period. Here’s how it works:
291. Given user options, simpleperf enables profiling by making a system call to
30linux kernel.
312. Linux kernel enables counters while scheduling on the profiled processes.
323. After profiling, simpleperf reads counters from linux kernel, and reports a
33counter summary.
34
35The record command records samples of the profiled process in a time period.
36Here’s how it works:
371. Given user options, simpleperf enables profiling by making a system call to
38linux kernel.
392. Simpleperf creates mapped buffers between simpleperf and linux kernel.
403. Linux kernel enable counters while scheduling on the profiled processes.
414. Each time a given number of events happen, linux kernel dumps a sample to a
42mapped buffer.
435. Simpleperf reads samples from the mapped buffers and generates perf.data.
44
45The report command reads a "perf.data" file and any shared libraries used by
46the profiled processes, and outputs a report showing where the time was spent.
47
48## Main simpleperf commands
49Simpleperf supports several subcommands, including list, stat, record, report.
50Each subcommand supports different options. This section only covers the most
51important subcommands and options. To see all subcommands and options,
52use --help.
53
54    # List all subcommands.
55    $simpleperf --help
56
57    # Print help message for record subcommand.
58    $simpleperf record --help
59
60### simpleperf list
61simpleperf list is used to list all events available on the device. Different
62devices may support different events because of differences in hardware and
63kernel.
64
65    $simpleperf list
66    List of hw-cache events:
67      branch-loads
68      ...
69    List of hardware events:
70      cpu-cycles
71      instructions
72      ...
73    List of software events:
74      cpu-clock
75      task-clock
76      ...
77
78### simpleperf stat
79simpleperf stat is used to get a raw event counter information of the profiled program
80or system-wide. By passing options, we can select which events to use, which
81processes/threads to monitor, how long to monitor and the print interval.
82Below is an example.
83
84    # Stat using default events (cpu-cycles,instructions,...), and monitor
85    # process 7394 for 10 seconds.
86    $simpleperf stat -p 7394 --duration 10
87    Performance counter statistics:
88
89     1,320,496,145  cpu-cycles         # 0.131736 GHz                     (100%)
90       510,426,028  instructions       # 2.587047 cycles per instruction  (100%)
91         4,692,338  branch-misses      # 468.118 K/sec                    (100%)
92    886.008130(ms)  task-clock         # 0.088390 cpus used               (100%)
93               753  context-switches   # 75.121 /sec                      (100%)
94               870  page-faults        # 86.793 /sec                      (100%)
95
96    Total test time: 10.023829 seconds.
97
98#### Select events
99We can select which events to use via -e option. Below are examples:
100
101    # Stat event cpu-cycles.
102    $simpleperf stat -e cpu-cycles -p 11904 --duration 10
103
104    # Stat event cache-references and cache-misses.
105    $simpleperf stat -e cache-references,cache-misses -p 11904 --duration 10
106
107When running the stat command, if the number of hardware events is larger than
108the number of hardware counters available in the PMU, the kernel shares hardware
109counters between events, so each event is only monitored for part of the total
110time. In the example below, there is a percentage at the end of each row,
111showing the percentage of the total time that each event was actually monitored.
112
113    # Stat using event cache-references, cache-references:u,....
114    $simpleperf stat -p 7394 -e     cache-references,cache-references:u,cache-references:k,cache-misses,cache-misses:u,cache-misses:k,instructions --duration 1
115    Performance counter statistics:
116
117    4,331,018  cache-references     # 4.861 M/sec    (87%)
118    3,064,089  cache-references:u   # 3.439 M/sec    (87%)
119    1,364,959  cache-references:k   # 1.532 M/sec    (87%)
120       91,721  cache-misses         # 102.918 K/sec  (87%)
121       45,735  cache-misses:u       # 51.327 K/sec   (87%)
122       38,447  cache-misses:k       # 43.131 K/sec   (87%)
123    9,688,515  instructions         # 10.561 M/sec   (89%)
124
125    Total test time: 1.026802 seconds.
126
127In the example above, each event is monitored about 87% of the total time. But
128there is no guarantee that any pair of events are always monitored at the same
129time. If we want to have some events monitored at the same time, we can use
130--group option. Below is an example.
131
132    # Stat using event cache-references, cache-references:u,....
133    $simpleperf stat -p 7394 --group cache-references,cache-misses --group cache-references:u,cache-misses:u --group cache-references:k,cache-misses:k -e instructions --duration 1
134    Performance counter statistics:
135
136    3,638,900  cache-references     # 4.786 M/sec          (74%)
137       65,171  cache-misses         # 1.790953% miss rate  (74%)
138    2,390,433  cache-references:u   # 3.153 M/sec          (74%)
139       32,280  cache-misses:u       # 1.350383% miss rate  (74%)
140      879,035  cache-references:k   # 1.251 M/sec          (68%)
141       30,303  cache-misses:k       # 3.447303% miss rate  (68%)
142    8,921,161  instructions         # 10.070 M/sec         (86%)
143
144    Total test time: 1.029843 seconds.
145
146#### Select target to monitor
147We can select which processes or threads to monitor via -p option or -t option.
148Monitoring a process is the same as monitoring all threads in the process.
149Simpleperf can also fork a child process to run the new command and then monitor
150the child process. Below are examples.
151
152    # Stat process 11904 and 11905.
153    $simpleperf stat -p 11904,11905 --duration 10
154
155    # Stat thread 11904 and 11905.
156    $simpleperf stat -t 11904,11905 --duration 10
157
158    # Start a child process running `ls`, and stat it.
159    $simpleperf stat ls
160
161#### Decide how long to monitor
162When monitoring existing threads, we can use --duration option to decide how long
163to monitor. When monitoring a child process running a new command, simpleperf
164monitors until the child process ends. In this case, we can use Ctrl-C to stop monitoring
165at any time. Below are examples.
166
167    # Stat process 11904 for 10 seconds.
168    $simpleperf stat -p 11904 --duration 10
169
170    # Stat until the child process running `ls` finishes.
171    $simpleperf stat ls
172
173    # Stop monitoring using Ctrl-C.
174    $simpleperf stat -p 11904 --duration 10
175    ^C
176
177#### Decide the print interval
178When monitoring perf counters, we can also use --interval option to decide the print
179interval. Below are examples.
180
181    # Print stat for process 11904 every 300ms.
182    $simpleperf stat -p 11904 --duration 10 --interval 300
183
184    # Print system wide stat at interval of 300ms for 10 seconds (rooted device only).
185    # system wide profiling needs root privilege
186    $su 0 simpleperf stat -a --duration 10 --interval 300
187
188#### Display counters in systrace
189simpleperf can also work with systrace to dump counters in the collected trace.
190Below is an example to do a system wide stat
191
192    # capture instructions (kernel only) and cache misses with interval of 300 milliseconds for 15 seconds
193    $su 0 simpleperf stat -e instructions:k,cache-misses -a --interval 300 --duration 15
194    # on host launch systrace to collect trace for 10 seconds
195    (HOST)$external/chromium-trace/systrace.py --time=10 -o new.html sched gfx view
196    # open the collected new.html in browser and perf counters will be shown up
197
198### simpleperf record
199simpleperf record is used to dump records of the profiled program. By passing
200options, we can select which events to use, which processes/threads to monitor,
201what frequency to dump records, how long to monitor, and where to store records.
202
203    # Record on process 7394 for 10 seconds, using default event (cpu-cycles),
204    # using default sample frequency (4000 samples per second), writing records
205    # to perf.data.
206    $simpleperf record -p 7394 --duration 10
207    simpleperf I 07-11 21:44:11 17522 17522 cmd_record.cpp:316] Samples recorded: 21430. Samples lost: 0.
208
209#### Select events
210In most cases, the cpu-cycles event is used to evaluate consumed cpu time.
211As a hardware event, it is both accurate and efficient. We can also use other
212events via -e option. Below is an example.
213
214    # Record using event instructions.
215    $simpleperf record -e instructions -p 11904 --duration 10
216
217#### Select target to monitor
218The way to select target in record command is similar to that in stat command.
219Below are examples.
220
221    # Record process 11904 and 11905.
222    $simpleperf record -p 11904,11905 --duration 10
223
224    # Record thread 11904 and 11905.
225    $simpleperf record -t 11904,11905 --duration 10
226
227    # Record a child process running `ls`.
228    $simpleperf record ls
229
230#### Set the frequency to record
231We can set the frequency to dump records via the -f or -c options. For example,
232-f 4000 means dumping approximately 4000 records every second when the monitored
233thread runs. If a monitored thread runs 0.2s in one second (it can be preempted
234or blocked in other times), simpleperf dumps about 4000 * 0.2 / 1.0 = 800
235records every second. Another way is using -c option. For example, -c 10000
236means dumping one record whenever 10000 events happen. Below are examples.
237
238    # Record with sample frequency 1000: sample 1000 times every second running.
239    $simpleperf record -f 1000 -p 11904,11905 --duration 10
240
241    # Record with sample period 100000: sample 1 time every 100000 events.
242    $simpleperf record -c 100000 -t 11904,11905 --duration 10
243
244#### Decide how long to monitor
245The way to decide how long to monitor in record command is similar to that in
246stat command. Below are examples.
247
248    # Record process 11904 for 10 seconds.
249    $simpleperf record -p 11904 --duration 10
250
251    # Record until the child process running `ls` finishes.
252    $simpleperf record ls
253
254    # Stop monitoring using Ctrl-C.
255    $simpleperf record -p 11904 --duration 10
256    ^C
257
258#### Set the path to store records
259By default, simpleperf stores records in perf.data in current directory. We can
260use -o option to set the path to store records. Below is an example.
261
262    # Write records to data/perf2.data.
263    $simpleperf record -p 11904 -o data/perf2.data --duration 10
264
265### simpleperf report
266simpleperf report is used to report based on perf.data generated by simpleperf
267record command. Report command groups records into different sample entries,
268sorts sample entries based on how many events each sample entry contains, and
269prints out each sample entry. By passing options, we can select where to find
270perf.data and executable binaries used by the monitored program, filter out
271uninteresting records, and decide how to group records.
272
273Below is an example. Records are grouped into 4 sample entries, each entry is
274a row. There are several columns, each column shows piece of information
275belonging to a sample entry. The first column is Overhead, which shows the
276percentage of events inside current sample entry in total events. As the
277perf event is cpu-cycles, the overhead can be seen as the percentage of cpu
278time used in each function.
279
280    # Reports perf.data, using only records sampled in libsudo-game-jni.so,
281    # grouping records using thread name(comm), process id(pid), thread id(tid),
282    # function name(symbol), and showing sample count for each row.
283    $simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so --sort comm,pid,tid,symbol -n
284    Cmdline: /data/data/com.example.sudogame/simpleperf record -p 7394 --duration 10
285    Arch: arm64
286    Event: cpu-cycles (type 0, config 0)
287    Samples: 28235
288    Event count: 546356211
289
290    Overhead  Sample  Command    Pid   Tid   Symbol
291    59.25%    16680   sudogame  7394  7394  checkValid(Board const&, int, int)
292    20.42%    5620    sudogame  7394  7394  canFindSolution_r(Board&, int, int)
293    13.82%    4088    sudogame  7394  7394  randomBlock_r(Board&, int, int, int, int, int)
294    6.24%     1756    sudogame  7394  7394  @plt
295
296#### Set the path to read records
297By default, simpleperf reads perf.data in current directory. We can use -i
298option to select another file to read records.
299
300    $simpleperf report -i data/perf2.data
301
302#### Set the path to find executable binaries
303If reporting function symbols, simpleperf needs to read executable binaries
304used by the monitored processes to get symbol table and debug information. By
305default, the paths are the executable binaries used by monitored processes while
306recording. However, these binaries may not exist when reporting or not contain
307symbol table and debug information. So we can use --symfs to redirect the paths.
308Below is an example.
309
310    $simpleperf report
311    # In this case, when simpleperf wants to read executable binary /A/b,
312    # it reads file in /A/b.
313
314    $simpleperf report --symfs /debug_dir
315    # In this case, when simpleperf wants to read executable binary /A/b,
316    # it prefers file in /debug_dir/A/b to file in /A/b.
317
318#### Filter records
319When reporting, it happens that not all records are of interest. Simpleperf
320supports five filters to select records of interest. Below are examples.
321
322    # Report records in threads having name sudogame.
323    $simpleperf report --comms sudogame
324
325    # Report records in process 7394 or 7395
326    $simpleperf report --pids 7394,7395
327
328    # Report records in thread 7394 or 7395.
329    $simpleperf report --tids 7394,7395
330
331    # Report records in libsudo-game-jni.so.
332    $simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so
333
334    # Report records in function checkValid or canFindSolution_r.
335    $simpleperf report --symbols "checkValid(Board const&, int, int);canFindSolution_r(Board&, int, int)"
336
337#### Decide how to group records into sample entries
338Simpleperf uses --sort option to decide how to group sample entries. Below are
339examples.
340
341    # Group records based on their process id: records having the same process
342    # id are in the same sample entry.
343    $simpleperf report --sort pid
344
345    # Group records based on their thread id and thread comm: records having
346    # the same thread id and thread name are in the same sample entry.
347    $simpleperf report --sort tid,comm
348
349    # Group records based on their binary and function: records in the same
350    # binary and function are in the same sample entry.
351    $simpleperf report --sort dso,symbol
352
353    # Default option: --sort comm,pid,tid,dso,symbol. Group records in the same
354    # thread, and belong to the same function in the same binary.
355    $simpleperf report
356
357## Features of simpleperf
358Simpleperf works similar to linux-tools-perf, but it has following improvements:
3591. Aware of Android environment. Simpleperf handles some Android specific
360situations when profiling. For example, it can profile embedded shared libraries
361in apk, read symbol table and debug information from .gnu_debugdata section. If
362possible, it gives suggestions when facing errors, like how to disable
363perf_harden to enable profiling.
3642. Support unwinding while recording. If we want to use -g option to record and
365report call-graph of a program, we need to dump user stack and register set in
366each record, and then unwind the stack to find the call chain. Simpleperf
367supports unwinding while recording, so it doesn’t need to store user stack in
368perf.data. So we can profile for a longer time with limited space on device.'
3693. Support scripts to make profiling on Android more convenient.
3704. Build in static binaries. Simpleperf is a static binary, so it doesn’t need
371supporting shared libraries to run. It means there is no limitation of Android
372version that simpleperf can run on, although some devices don’t support
373profiling.
374
375# Simpleperf tools in ndk
376Simpleperf tools in ndk contain three parts: simpleperf executable running on
377Android device, simpleperf executable running on host, and python scripts.
378
379## Simpleperf on device
380Simpleperf running on device is located at bin/android directory. It contains
381static binaries running on Android on different architectures. They can be used
382to profile processes running device, and generate perf.data.
383
384## Simpleperf on host
385Simpleperfs running on host are located at bin/darwin, bin/linux and
386bin/windows.They can be used to parse perf.data on host.
387
388## Scripts
389Scripts are used to make it convenient to profile and parse profiling results.
390app_profiler.py is used to profile an android application. It prepares
391profiling environments, downloads simpleperf on device, generates and pulls
392perf.data on host. It is configured by app_profiler.config.
393binary_cache_builder.py is used to pull native binaries from device to host.
394It is used by app_profiler.py.
395annotate.py is used to annotate source files using perf.data. It is configured
396by annotate.config.
397report.py reports perf.data in a GUI window.
398simpleperf_report_lib.py is used to enumerate samples in perf.data. Internally
399it uses libsimpleperf_report.so to parse perf.data. It can be used to translate
400samples in perf.data to other forms. One example using simpleperf_report_lib.py
401is report_sample.py.
402
403# Examples of using simpleperf tools
404This section shows how to use simpleperf tools to profile an Android
405application.
406
407## Prepare a debuggable Application
408The package name of the application is com.example.sudogame. It has both java
409code and c++ code. We need to run a copy of the app with
410android:debuggable=”true” in its AndroidManifest.xml <application> element,
411because we can’t use run-as for non-debuggable apps. The application should
412has been installed on device, and we can connect device via adb.
413
414## Profile using command line
415To record profiling data, we need to download simpleperf and native libraries
416with debug information on device, run simpleperf to generate profiling data
417file: perf.data, and run simpleperf to report perf.data. Below are the steps.
418
419### 1. Enable profiling
420
421    $adb shell setprop security.perf_harden 0
422
423### 2. Find the process running the app
424Run `ps` in the app’s context. On >=O devices, run `ps -e` instead.
425
426    $adb shell
427    angler:/ $ run-as com.example.sudogame
428    angler:/data/data/com.example.sudogame $ ps
429    u0_a93    10324 570   1030480 58104 SyS_epoll_ 00f41b7528 S com.example.sudogame
430    u0_a93    10447 10441 7716   1588  sigsuspend 753c515d34 S sh
431    u0_a93    10453 10447 9112   1644           0 7ba07ff664 R ps
432
433So process 10324 runs the app.
434
435### 3. Download simpleperf to the app’s data directory
436First we need to find out which architecture the app is using. There are many
437ways, here we just check the map of the process.
438
439    angler:/data/data/com.example.sudogame $cat /proc/10324/maps | grep boot.art
440    70f34000-7144e000 r--p 00000000 fd:00 1082  /system/framework/arm/boot.oat
441
442The file path shows it is arm. So we download simpleperf in arm directory on
443device.
444
445    $adb push bin/android/arm/simpleperf /data/local/tmp
446    $adb shell
447    angler:/ $ run-as com.example.sudogame
448    angler:/data/data/com.example.sudogame $ cp /data/local/tmp/simpleperf .
449
450### 4. Record perf.data
451
452    angler:/data/data/com.example.sudogame $./simpleperf record -p 10324 --duration 30
453    simpleperf I 01-01 09:26:39 10598 10598 cmd_record.cpp:341] Samples recorded: 49471. Samples lost: 0.
454    angler:/data/data/com.example.sudogame $ls -lh perf.data
455    -rw-rw-rw- 1 u0_a93 u0_a93 2.6M 2017-01-01 09:26 perf.data
456
457Don’t forget to run the app while recording. Otherwise, we may get no samples
458because the process is always sleeping.
459
460### 5. Report perf.data
461There are different ways to report perf.data. Below shows some examples.
462
463Report samples in different threads.
464
465    angler:/data/data/com.example.sudogame $./simpleperf report --sort pid,tid,comm
466    Cmdline: /data/data/com.example.sudogame/simpleperf record -p 10324 --duration 30
467    Arch: arm64
468    Event: cpu-cycles (type 0, config 0)
469    Samples: 49471
470    Event count: 16700769019
471
472    Overhead  Pid    Tid    Command
473    66.31%    10324  10324  xample.sudogame
474    30.97%    10324  10340  RenderThread
475    ...
476
477Report samples in different binaries in the main thread.
478
479    angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --sort dso -n
480    ...
481    Overhead  Sample  Shared Object
482    37.71%    9970    /system/lib/libc.so
483    35.45%    9786    [kernel.kallsyms]
484    8.71%     3305    /system/lib/libart.so
485    6.44%     2405    /system/framework/arm/boot-framework.oat
486    5.64%     1480    /system/lib/libcutils.so
487    1.55%     426     /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so
488    ...
489
490Report samples in different functions in libsudo-game-jni.so in the main thread.
491
492    angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --dsos  /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so --sort symbol -n
493    ...
494    Overhead  Sample  Symbol
495    8.94%     35      libsudo-game-jni.so[+1d54]
496    5.71%     25      libsudo-game-jni.so[+1dae]
497    5.70%     23      @plt
498    5.09%     22      libsudo-game-jni.so[+1d88]
499    4.54%     19      libsudo-game-jni.so[+1d82]
500    3.61%     14      libsudo-game-jni.so[+1f3c]
501    ...
502
503In the above result, most symbols are binary name[+virual_addr]. It is because
504libsudo-game-jni.so used on device has stripped .symbol section. We can
505download libsudo-game-jni.so having debug information on device. In android
506studio project, it locates at
507app/build/intermediates/binaries/debug/arm/obj/armeabi-v7a/libsudo-game-jni.so.
508We have to download libsudo-game-jni.so to the same relative path as recorded
509in perf.data (otherwise, simpleperf can’t find it). In this case, it is
510/data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so.
511
512Report symbols using libraries with debug information.
513
514    $adb push app/build/intermediates/binaries/debug/arm/obj/armeabi-v7a/libsudo-game-jni.so /data/local/tmp
515    $adb shell
516    angler:/ $ run-as com.example.sudogame
517    angler:/data/data/com.example.sudogame $ mkdir -p data/app/com.example.sudogame-1/lib/arm
518    angler:/data/data/com.example.sudogame $cp /data/local/tmp/libsudo-game-jni.so data/app/com.example.sudogame-1/lib/arm
519    angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --dsos  /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so --sort symbol -n --symfs .
520    ...
521    Overhead  Sample  Symbol
522    75.18%    317     checkValid(Board const&, int, int)
523    14.43%    60      canFindSolution_r(Board&, int, int)
524    5.70%     23      @plt
525    3.51%     20      randomBlock_r(Board&, int, int, int, int, int)
526    ...
527
528Report samples in one function
529
530    angler:/data/data/com.example.sudogame $./simpleperf report --tids 10324 --dsos  /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so --symbols “checkValid(Board const&, int, int)” --sort vaddr_in_file -n --symfs .
531    ...
532    Overhead  Sample  VaddrInFile
533    11.89%    35      0x1d54
534    7.59%     25      0x1dae
535    6.77%     22      0x1d88
536    6.03%     19      0x1d82
537    ...
538
539### 6. Record and report call graph
540A call graph is a tree showing function call relations. Below is an example.
541
542    main() {
543        FunctionOne();
544        FunctionTwo();
545    }
546    FunctionOne() {
547        FunctionTwo();
548        FunctionThree();
549    }
550    callgraph:
551        main-> FunctionOne
552           |    |
553           |    |-> FunctionTwo
554           |    |-> FunctionThree
555           |
556           |-> FunctionTwo
557
558#### Record dwarf based call graph
559To generate call graph, simpleperf needs to generate call chain for each record.
560Simpleperf requests kernel to dump user stack and user register set for each
561record, then it backtraces the user stack to find the function call chain. To
562parse the call chain, it needs support of dwarf call frame information, which
563usually resides in .eh_frame or .debug_frame section of the binary.  So we need
564to use --symfs to point out where is libsudo-game-jni.so with debug information.
565
566    angler:/data/data/com.example.sudogame $./simpleperf record -p 10324 -g --symfs . --duration 30
567    simpleperf I 01-01 09:59:42 11021 11021 cmd_record.cpp:341] Samples recorded: 60700. Samples lost: 1240.
568
569Note that kernel can’t dump user stack >= 64K, so the dwarf based call graph
570doesn’t contain call chains consuming >= 64K stack. What’s more, because we
571need to dump stack in each record, it is likely to lost records. Usually, it
572doesn’t matter to lost some records.
573
574#### Record stack frame based call graph
575Another way to generate call graph is to rely on the kernel parsing the call
576chain for each record. To make it possible, kernel has to be able to identify
577the stack frame of each function call. This is not always possible, because
578compilers can optimize away stack frames, or use a stack frame style not
579recognized by the kernel. So how well it works depends (It works well on arm64,
580but not well on arm).
581
582    angler:/data/data/com.example.sudogame $./simpleperf record -p 10324 --call-graph fp --symfs . --duration 30
583    simpleperf I 01-01 10:03:58 11267 11267 cmd_record.cpp:341] Samples recorded: 56736. Samples lost: 0.
584
585#### Report call graph
586Report accumulated period. In the table below, the first column is “Children”,
587it is the cpu cycle percentage of a function and functions called by that
588function. The second column is “Self”, it is the cpu cycle percentage of just a
589function. For example, checkValid() itself takes 1.28% cpus, but it takes
59029.43% by running itself and calling other functions.
591
592    angler:/data/data/com.example.sudogame $./simpleperf report --children --symfs .
593    ...
594    Children  Self   Command          Pid    Tid    Shared Object                                                 Symbol
595    31.94%    0.00%  xample.sudogame  10324  10324  [kernel.kallsyms]                                             [kernel.kallsyms][+ffffffc000204268]
596    31.10%    0.92%  xample.sudogame  10324  10324  /system/lib/libc.so                                           writev
597    29.43%    1.28%  xample.sudogame  10324  10324  /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so  checkValid(Board const&, int, int)
598    28.43%    0.34%  xample.sudogame  10324  10324  /system/lib/liblog.so                                         __android_log_print
599    28.24%    0.00%  xample.sudogame  10324  10324  /system/lib/libcutils.so                                      libcutils.so[+107b7]
600    28.10%    0.27%  xample.sudogame  10324  10324  /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so  canFindSolution_r(Board&, int, int)
601    ...
602
603Report call graph.
604
605    angler:/data/data/com.example.sudogame $./simpleperf report -g --symfs . >report
606    angler:/data/data/com.example.sudogame $exit
607    angler:/ $cp /data/data/com.example.sudogame/report /data/local/tmp
608    angler:/ $exit
609    $adb pull /data/local/tmp/report .
610    $cat report
611    ...
612    29.43%    1.28%  xample.sudogame  10324  10324  /data/app/com.example.sudogame-1/lib/arm/libsudo-game-jni.so  checkValid(Board const&, int, int)
613           |
614           -- checkValid(Board const&, int, int)
615              |
616              |--95.50%-- __android_log_print
617              |    |--0.68%-- [hit in function]
618              |    |
619              |    |--51.84%-- __android_log_buf_write
620              |    |    |--2.07%-- [hit in function]
621              |    |    |
622              |    |    |--30.74%-- libcutils.so[+c69d]
623    ...
624
625Report call graph in callee mode. We can also show how a function is called by
626other functions.
627
628    angler:/data/data/com.example.sudogame $./simpleperf report -g callee --symfs . >report
629    $adb shell run-as com.example.sudogame cat report >report
630    $cat report
631632    28.43%    0.34%  xample.sudogame  10324  10324  /system/lib/liblog.so                                         __android_log_print
633           |
634           -- __android_log_print
635              |
636              |--97.82%-- checkValid(Board const&, int, int)
637              |    |--0.13%-- [hit in function]
638              |    |
639              |    |--94.89%-- canFindSolution_r(Board&, int, int)
640              |    |    |--0.01%-- [hit in function]
641              |    |    |
642    ...
643
644## Profile java code
645Simpleperf only supports profiling native instructions in binaries in ELF
646format. If the java code is executed by interpreter, or with jit cache, it
647can’t be profiled by simpleperf. As Android supports Ahead-of-time compilation,
648it can compile java bytecode into native instructions with debug information.
649On devices with Android version <= M, we need root privilege to compile java
650bytecode with debug information. However, on devices with Android version >= N,
651we don't need root privilege to do so.
652
653### On Android N
654#### 1. Fully compile java code into native instructions.
655
656    $adb shell setprop debug.generate-debug-info true
657    $adb shell cmd package compile -f -m speed com.example.sudogame
658    // restart the app to take effect
659
660#### 2. Record perf.data
661
662    angler:/data/data/com.example.sudogame $./simpleperf record -p 11826 -g --symfs . --duration 30
663    simpleperf I 01-01 10:31:40 11859 11859 cmd_record.cpp:341] Samples recorded: 50576. Samples lost: 2139.
664
665#### 3. Report perf.data
666
667    angler:/data/data/com.example.sudogame $./simpleperf report -g --symfs . >report
668    angler:/data/data/com.example.sudogame $exit
669    angler:/ $cp /data/data/com.example.sudogame/report /data/local/tmp
670    angler:/ $exit
671    $adb pull /data/local/tmp/report .
672    $cat report
673    ...
674    21.14%    0.00%  xample.sudogame  11826  11826  /data/app/com.example.sudogame-1/oat/arm/base.odex            boolean com.example.sudogame.MainActivity.onOptionsItemSelected(android.view.MenuItem)
675           |
676           -- boolean com.example.sudogame.MainActivity.onOptionsItemSelected(android.view.MenuItem)
677              |
678               --99.99%-- void com.example.sudogame.GameView.startNewGame()
679                   |--0.01%-- [hit in function]
680                   |
681                   |--99.87%-- void com.example.sudogame.GameModel.reInit()
682                   |    |--0.01%-- [hit in function]
683                   |    |
684                   |    |--89.65%-- boolean com.example.sudogame.GameModel.canFindSolution(int[][])
685                   |    |    |
686                   |    |    |--99.95%-- Java_com_example_sudogame_GameModel_canFindSolution
687                   |    |    |    |
688                   |    |    |    |--99.49%-- canFindSolution(Board&)
689                   |    |    |    |    |--0.01%-- [hit in function]
690                   |    |    |    |    |
691                   |    |    |    |    |--99.97%-- canFindSolution_r(Board&, int, int)
692                   |    |    |    |    |           canFindSolution_r(Board&, int, int)
693    ...
694
695### On Android M
696On M devices, We need root privilege to force Android fully compiling java code
697into native instructions in ELF binaries with debug information. We also need
698root privilege to read compiled native binaries (because installd writes them
699to a directory whose uid/gid is system:install). So profiling java code can
700only be done on rooted devices.
701
702    $adb root
703    $adb shell setprop dalvik.vm.dex2oat-flags -g
704
705    # Reinstall the app.
706    $adb install -r app-debug.apk
707
708    # Change to the app’s data directory.
709    $ adb root && adb shell
710    device# cd `run-as com.example.sudogame pwd`
711
712    # Record as root as simpleperf needs to read the generated native binary.
713    device#./simpleperf record -p 25636 -g --symfs . -f 1000 --duration 30
714    simpleperf I 01-02 07:18:20 27182 27182 cmd_record.cpp:323] Samples recorded: 23552. Samples lost: 39.
715
716### On Android L
717On L devices, we also need root privilege to compile the app with debug info
718and access the native binaries.
719
720    $adb root
721    $adb shell setprop dalvik.vm.dex2oat-flags --include-debug-symbols
722
723    # Reinstall the app.
724    $adb install -r app-debug.apk
725
726## Profile using scripts
727Although using command line is flexible, it can be too complex. So we have
728python scripts to help running commands.
729
730### Record using app_profiler.py
731app_profiler.py is used to profile an Android application. It sets up profiling
732environment, downloads simpleperf and native libraries with debug information,
733runs simpleperf to generate perf.data, and pulls perf.data and binaries from
734device to host.
735It is configured by app_profiler.config. Below is an example.
736
737app_profiler.config:
738
739    app_package_name = “com.example.sudogame740    android_studio_project_dir = “/AndroidStudioProjects/SudoGame”  # absolute path of the project
741    ...
742    record_options = "-e cpu-cycles:u -f 4000 -g --dump-symbols --duration 30"
743    ...
744
745run app_profiler.py:
746
747    $python app_profiler.py
748    ...
749    INFO:root:profiling is finished.
750
751It pulls generated perf.data on host, and collects binaries from device in
752binary_cache.
753
754### Report using report.py
755
756    $python report.py -g
757
758It generates a GUI interface to report data.
759
760### Process samples using simpleperf_report_lib.py
761simpleperf_report_lib.py provides an interface reading samples from perf.data.
762An example is report_sample.py.
763
764### Show flamegraph
765
766    $python report_sample.py >out.perf
767    $stackcollapse-perf.pl out.perf >out.folded
768    $./flamegraph.pl out.folded >a.svg
769
770### Visualize using pprof
771pprof is a tool for visualization and analysis of profiling data. It can
772be got from https://github.com/google/pprof. pprof_proto_generator.py can
773generate profiling data in a format acceptable by pprof.
774
775    $python pprof_proto_generator.py
776    $pprof -pdf pprof.profile
777
778### Annotate source code
779annotate.py reads perf.data and binaries in binary_cache. Then it knows which
780source file:line each sample hits. So it can annotate source code. annotate.py
781is configured by annotate.config. Below is an example.
782
783annotate.config:
784
785    ...
786    source_dirs = [“/AndroidStudio/SudoGame”]  # It is a directory containing source code.
787    ...
788
789run annotate.py:
790
791    $python annotate.py
792
793It generates annotated_files directory.
794annotated_files/summary file contains summary information for each source file.
795An example is as below.
796
797    /AndroidStudioProjects/SudoGame/app/src/main/jni/sudo-game-jni.cpp: accumulated_period: 25.587937%, period: 1.250961%
798      function (checkValid(Board const&, int, int)): line 99, accumulated_period: 23.564356%, period: 0.908457%
799      function (canFindSolution_r(Board&, int, int)): line 135, accumulated_period: 22.260125%, period: 0.142359%
800      function (canFindSolution(Board&)): line 166, accumulated_period: 22.233101%, period: 0.000000%
801      function (Java_com_example_sudogame_GameModel_canFindSolution): line 470, accumulated_period: 21.983184%, period: 0.000000%
802      function (Java_com_example_sudogame_GameModel_initRandomBoard): line 430, accumulated_period: 2.226896%, period: 0.000000%
803
804      line 27: accumulated_period: 0.011729%, period: 0.000000%
805      line 32: accumulated_period: 0.004362%, period: 0.000000%
806      line 33: accumulated_period: 0.004427%, period: 0.000000%
807      line 36: accumulated_period: 0.003303%, period: 0.000000%
808      line 39: accumulated_period: 0.010367%, period: 0.004123%
809      line 41: accumulated_period: 0.162219%, period: 0.000000%
810
811annotated_files/ also contains annotated source files which are found by
812annotate.py. For example, part of checkValid() function in libsudo-game-jni.cpp
813is annotated as below.
814
815    /* [func] acc_p: 23.564356%, p: 0.908457% */static bool checkValid(const Board& board, int curR, int curC) {
816    /* acc_p: 0.037933%, p: 0.037933%         */    int digit = board.digits[curR][curC];
817    /* acc_p: 0.162355%, p: 0.162355%         */    for (int r = 0; r < BOARD_ROWS; ++r) {
818    /* acc_p: 0.020880%, p: 0.020880%         */        if (r == curR) {
819    /* acc_p: 0.034691%, p: 0.034691%         */            continue;
820                                                        }
821    /* acc_p: 0.176490%, p: 0.176490%         */        if (board.digits[r][curC] == digit) {
822    /* acc_p: 14.957673%, p: 0.059022%        */            LOGI("conflict (%d, %d) (%d, %d)", curR, curC, r, curC);
823    /* acc_p: 0.016296%, p: 0.016296%         */            return false;
824                                                        }
825                                                    }
826