1# Native heap profiler
2
3NOTE: **heapprofd requires Android 10 or higher**
4
Heapprofd is a tool that tracks native heap allocations and deallocations of an
Android process within a given time period. The resulting profile can be used to
attribute memory usage to particular call-stacks, supporting a mix of both
native and Java code. The tool can be used by Android platform and app
developers to investigate memory issues.
10
11On debug Android builds, you can profile all apps and most system services.
12On "user" builds, you can only use it on apps with the debuggable or
13profileable manifest flag.
14
15## Quickstart
16
17See the [Memory Guide](/docs/case-studies/memory.md#heapprofd) for getting
18started with heapprofd.
19
20## UI
21
22Dumps from heapprofd are shown as flamegraphs in the UI after clicking on the
23diamond. Each diamond corresponds to a snapshot of the allocations and
24callstacks collected at that point in time.
25
26![heapprofd snapshots in the UI tracks](/docs/images/profile-diamond.png)
27
28![heapprofd flamegraph](/docs/images/native-flamegraph.png)
29
30## SQL
31
32Information about callstacks is written to the following tables:
33
34* [`stack_profile_mapping`](/docs/analysis/sql-tables.autogen#stack_profile_mapping)
35* [`stack_profile_frame`](/docs/analysis/sql-tables.autogen#stack_profile_frame)
36* [`stack_profile_callsite`](/docs/analysis/sql-tables.autogen#stack_profile_callsite)
37
38The allocations themselves are written to
39[`heap_profile_allocation`](/docs/analysis/sql-tables.autogen#heap_profile_allocation).
40
41Offline symbolization data is stored in
42[`stack_profile_symbol`](/docs/analysis/sql-tables.autogen#stack_profile_symbol).
43
44See [Example Queries](#heapprofd-example-queries) for example SQL queries.
45
46## Recording
47
48Heapprofd can be configured and started in three ways.
49
50#### Manual configuration
51
52This requires manually setting the
53[HeapprofdConfig](/docs/reference/trace-config-proto.autogen#HeapprofdConfig)
54section of the trace config. The only benefit of doing so is that in this way
55heap profiling can be enabled alongside any other tracing data sources.
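
As a rough sketch of what such a section can look like, the helper below renders a text-format trace config with a single heapprofd data source. The data source name `android.heapprofd` and the `process_cmdline` field are assumptions based on the HeapprofdConfig reference linked above and should be checked against it; the helper name, buffer size and duration are purely illustrative.

```python
def render_heapprofd_config(process_names, interval_bytes=4096,
                            duration_ms=10000, buffer_kb=65536):
    """Return a text-format trace config enabling heapprofd (illustrative)."""
    # One process_cmdline entry per target process name.
    targets = "\n".join(
        f'      process_cmdline: "{name}"' for name in process_names
    )
    return f"""buffers {{
  size_kb: {buffer_kb}
}}
data_sources {{
  config {{
    name: "android.heapprofd"
    heapprofd_config {{
      sampling_interval_bytes: {interval_bytes}
{targets}
    }}
  }}
}}
duration_ms: {duration_ms}
"""

config_text = render_heapprofd_config(["com.example.myapp"])
print(config_text)
```

Further `data_sources` blocks for other tracing data sources can be appended to the same config, which is the main reason to configure heapprofd manually.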
56
57#### Using the tools/heap_profile script (recommended)
58
You can use the `tools/heap_profile` script. If you are having trouble,
make sure you are using the
61[latest version](
62https://raw.githubusercontent.com/google/perfetto/master/tools/heap_profile).
63
You can target processes either by name (`-n com.example.myapp`) or by PID
(`-p 1234`). In the first case, the heap profile will be initiated both on
already-running processes that match the package name and on new processes
launched after the profiling session is started.
68For the full arguments list see the
69[heap_profile cmdline reference page](/docs/reference/heap_profile-cli).
70
71#### Using the Recording page of Perfetto UI
72
73You can also use the [Perfetto UI](https://ui.perfetto.dev/#!/record?p=memory)
74to record heapprofd profiles. Tick "Heap profiling" in the trace configuration,
75enter the processes you want to target, click "Add Device" to pair your phone,
76and record profiles straight from your browser. This is also possible on
77Windows.
78
79## Viewing the data
80
The resulting profile proto contains four views on the data:

* **space**: how many bytes were allocated but not freed at this callstack at
  the moment the dump was created.
* **alloc\_space**: how many bytes were allocated (including ones freed at the
  moment of the dump) at this callstack.
87* **objects**: how many allocations without matching frees were done at this
88  callstack.
89* **alloc\_objects**: how many allocations (including ones with matching frees)
90  were done at this callstack.
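
The relationship between the four views can be sketched with a small model. It assumes each callstack is described by `(count, size)` rows, positive for allocations and negative for frees, which is how the Example SQL Queries section describes the allocation rows. The function name and the sample numbers are illustrative.

```python
def profile_views(rows):
    """Compute the four views from (count, size) rows for one callstack.

    Allocations are rows with positive count and size; frees are rows with
    negative count and size.
    """
    rows = list(rows)
    return {
        "space": sum(size for _, size in rows),
        "alloc_space": sum(size for _, size in rows if size > 0),
        "objects": sum(count for count, _ in rows),
        "alloc_objects": sum(count for count, _ in rows if count > 0),
    }

# Hypothetical callstack: 5 allocations totalling 20480 bytes, 2 of them
# (8192 bytes) already freed when the dump was taken.
example = profile_views([(5, 20480), (-2, -8192)])
print(example)
```

In this example `space` and `objects` describe what is still live at dump time, while the `alloc_` variants ignore the frees.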
91
92_(Googlers: You can also open the gzipped protos using http://pprof/)_
93
94TIP: you might want to put `libart.so` as a "Hide regex" when profiling apps.
95
You can use the [Perfetto UI](https://ui.perfetto.dev) to visualize heap dumps.
Upload the `raw-trace` file in your output directory. You will see all heap
dumps as diamonds on the timeline; click any of them to get a flamegraph.
99
100Alternatively [Speedscope](https://speedscope.app) can be used to visualize
101the gzipped protos, but will only show the space view.
102
103TIP: Click Left Heavy on the top left for a good visualization.
104
105## Sampling interval
106
Heapprofd samples heap allocations by hooking calls to malloc/free and C++'s
operator new/delete. Given a sampling interval of n bytes, one allocation is
sampled, on average, every n bytes allocated. Sampling reduces the
performance impact on the target process. The default sampling interval
is 4096 bytes.
112
113The easiest way to reason about this is to imagine the memory allocations as a
114stream of one byte allocations. From this stream, every byte has a 1/n
115probability of being selected as a sample, and the corresponding callstack
116gets attributed the complete n bytes. For more accuracy, allocations larger than
117the sampling interval bypass the sampling logic and are recorded with their true
118size.
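
The byte-stream model above can be tried out with a toy simulation. This is a sketch of the statistical idea only, not heapprofd's implementation: it uses exponentially distributed gaps as a continuous approximation of selecting each byte with probability 1/n, and all names are illustrative.

```python
import random

def sample_allocations(sizes, interval, seed=0):
    """Return the bytes attributed to each allocation under the toy model."""
    rng = random.Random(seed)
    # Distance (in bytes) to the next sampled byte. An exponential gap with
    # mean `interval` approximates selecting each byte with probability 1/n.
    until_next = rng.expovariate(1.0 / interval)
    attributed = []
    for size in sizes:
        if size > interval:
            # Large allocations bypass sampling and keep their true size.
            attributed.append(size)
            continue
        hits = 0
        until_next -= size
        while until_next <= 0:
            hits += 1
            until_next += rng.expovariate(1.0 / interval)
        # Each sampled byte attributes a full interval to the callstack.
        attributed.append(hits * interval)
    return attributed

sizes = [64] * 100_000  # 6,400,000 bytes of small allocations
estimate = sum(sample_allocations(sizes, interval=4096))
print(f"true={sum(sizes)} estimated={estimate}")
```

Most individual small allocations are attributed zero bytes, yet the total over many allocations stays close to the true number of bytes allocated.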
119
120## Startup profiling
121
When specifying a target process name (as opposed to the PID), new processes
matching that name are profiled from their startup. The resulting profile will
contain all allocations done between the start of the process and the end
of the profiling session.
126
127On Android, Java apps are usually not exec()-ed from scratch, but fork()-ed from
128the [zygote], which then specializes into the desired app. If the app's name
129matches a name specified in the profiling session, profiling will be enabled as
130part of the zygote specialization. The resulting profile contains all
131allocations done between that point in zygote specialization and the end of the
132profiling session. Some allocations done early in the specialization process are
133not accounted for.
134
135At the trace proto level, the resulting [ProfilePacket] will have the
136`from_startup` field set to true in the corresponding `ProcessHeapSamples`
137message. This is not surfaced in the converted pprof compatible proto.
138
139[ProfilePacket]: /docs/reference/trace-packet-proto.autogen#ProfilePacket
140[zygote]: https://developer.android.com/topic/performance/memory-overview#SharingRAM
141
142## Runtime profiling
143
144When a profiling session is started, all matching processes (by name or PID)
145are enumerated and are signalled to request profiling. Profiling isn't actually
146enabled until a few hundred milliseconds after the next allocation that is
147done by the application. If the application is idle when profiling is
148requested, and then does a burst of allocations, these may be missed.
149
150The resulting profile will contain all allocations done between when profiling
151is enabled, and the end of the profiling session.
152
153The resulting [ProfilePacket] will have `from_startup` set to false in the
154corresponding `ProcessHeapSamples` message. This does not get surfaced in the
155converted pprof compatible proto.
156
157## Concurrent profiling sessions
158
159If multiple sessions name the same target process (either by name or PID),
160only the first relevant session will profile the process. The other sessions
161will report that the process had already been profiled when converting to
162the pprof compatible proto.
163
164If you see this message but do not expect any other sessions, run
165
166```shell
167adb shell killall perfetto
168```
169
170to stop any concurrent sessions that may be running.
171
The resulting [ProfilePacket] will have `rejected_concurrent` set to true in
the otherwise empty corresponding `ProcessHeapSamples` message. This does not
get surfaced in the converted pprof compatible proto.
175
176## {#heapprofd-targets} Target processes
177
Depending on the build of Android that heapprofd is run on, some processes
are not eligible to be profiled.
180
181On _user_ (i.e. production, non-rootable) builds, only Java applications with
182either the profileable or the debuggable manifest flag set can be profiled.
183Profiling requests for non-profileable/debuggable processes will result in an
184empty profile.
185
On userdebug builds, all processes except for a small set of critical
services can be profiled (to find the set of disallowed targets, look for
`never_profile_heap` in [heapprofd.te](
https://cs.android.com/android/platform/superproject/+/master:system/sepolicy/private/heapprofd.te?q=never_profile_heap)).
This restriction can be lifted by disabling SELinux by running
`adb shell su root setenforce 0` or by passing `--disable-selinux` to the
`heap_profile` script.
193
194<center>
195
196|                         | userdebug setenforce 0 | userdebug | user |
197|-------------------------|:----------------------:|:---------:|:----:|
198| critical native service |            Y           |     N     |  N   |
199| native service          |            Y           |     Y     |  N   |
200| app                     |            Y           |     Y     |  N   |
201| profileable app         |            Y           |     Y     |  Y   |
202| debuggable app          |            Y           |     Y     |  Y   |
203
204</center>
205
206To mark an app as profileable, put `<profileable android:shell="true"/>` into
207the `<application>` section of the app manifest.
208
209```xml
210<manifest ...>
211    <application>
212        <profileable android:shell="true"/>
213        ...
214    </application>
215</manifest>
216```
217
218## DEDUPED frames
219
220If the name of a Java method includes `[DEDUPED]`, this means that multiple
221methods share the same code. ART only stores the name of a single one in its
222metadata, which is displayed here. This is not necessarily the one that was
223called.
224
225## Triggering heap snapshots on demand
226
Heap snapshots are recorded into the trace either at regular time intervals, if
using the `continuous_dump_config` field, or at the end of the session.
229
230You can also trigger a snapshot of all currently profiled processes by running
231`adb shell killall -USR1 heapprofd`. This can be useful in lab tests for
232recording the current memory usage of the target in a specific state.
233
234This dump will show up in addition to the dump at the end of the profile that is
235always produced. You can create multiple of these dumps, and they will be
236enumerated in the output directory.
237
238## Symbolization
239
NOTE: Symbolization is currently only available on Linux and macOS.
241
242### Set up llvm-symbolizer
243
244You only need to do this once.
245
246To use symbolization, your system must have llvm-symbolizer installed and
247accessible from `$PATH` as `llvm-symbolizer`. On Debian, you can install it
248using `sudo apt install llvm-9`.
249This will create `/usr/bin/llvm-symbolizer-9`. Symlink that to somewhere in
250your `$PATH` as `llvm-symbolizer`.
251
252For instance, `ln -s /usr/bin/llvm-symbolizer-9 ~/bin/llvm-symbolizer`, and
253add `~/bin` to your path (or run the commands below with `PATH=~/bin:$PATH`
254prefixed).
255
256### Symbolize your profile
257
258If the profiled binary or libraries do not have symbol names, you can
259symbolize profiles offline. Even if they do, you might want to symbolize in
260order to get inlined function and line number information. All tools
261(traceconv, trace_processor_shell, the heap_profile script) support specifying
262the `PERFETTO_BINARY_PATH` as an environment variable.
263
264```
265PERFETTO_BINARY_PATH=somedir tools/heap_profile --name ${NAME}
266```
267
You can persist symbols for a trace by running
`PERFETTO_BINARY_PATH=somedir tools/traceconv symbolize raw-trace > symbols`.
You can then concatenate the symbols to the trace
(`cat raw-trace symbols > symbolized-trace`) and the symbols will be part of
`symbolized-trace`. The `tools/heap_profile` script will also generate this
file in your output directory if `PERFETTO_BINARY_PATH` is used.
274
275The symbol file is the first with matching Build ID in the following order:
276
2771. absolute path of library file relative to binary path.
2782. absolute path of library file relative to binary path, but with base.apk!
279    removed from filename.
2803. basename of library file relative to binary path.
2814. basename of library file relative to binary path, but with base.apk!
282    removed from filename.
2835. in the subdirectory .build-id: the first two hex digits of the build-id
284    as subdirectory, then the rest of the hex digits, with ".debug" appended.
285    See
286    https://fedoraproject.org/wiki/RolandMcGrath/BuildID#Find_files_by_build_ID
287
288For example, "/system/lib/base.apk!foo.so" with build id abcd1234,
289is looked for at:
290
2911. $PERFETTO_BINARY_PATH/system/lib/base.apk!foo.so
2922. $PERFETTO_BINARY_PATH/system/lib/foo.so
2933. $PERFETTO_BINARY_PATH/base.apk!foo.so
2944. $PERFETTO_BINARY_PATH/foo.so
2955. $PERFETTO_BINARY_PATH/.build-id/ab/cd1234.debug
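
The lookup order above can be written out as a short sketch. It only reproduces the documented candidate locations; it does not open the files or compare Build IDs, and it is not the symbolizer's actual code. The function name and the `/tmp/symbols` directory are illustrative.

```python
import posixpath

def candidate_symbol_paths(binary_path, library, build_id):
    """List lookup locations in the documented order (no Build ID check)."""
    def without_apk(path):
        return path.replace("base.apk!", "")

    relative = library.lstrip("/")
    basename = posixpath.basename(library)
    return [
        posixpath.join(binary_path, relative),
        posixpath.join(binary_path, without_apk(relative)),
        posixpath.join(binary_path, basename),
        posixpath.join(binary_path, without_apk(basename)),
        posixpath.join(binary_path, ".build-id", build_id[:2],
                       build_id[2:] + ".debug"),
    ]

for path in candidate_symbol_paths("/tmp/symbols",
                                   "/system/lib/base.apk!foo.so", "abcd1234"):
    print(path)
```

Printing the candidates for a library named in a "Could not find" message is a quick way to see where a symbol file should be placed.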
296
297Alternatively, you can set the `PERFETTO_SYMBOLIZER_MODE` environment variable
298to `index`, and the symbolizer will recursively search the given directory for
299an ELF file with the given build id. This way, you will not have to worry
300about correct filenames.
301
302## Deobfuscation
303
If your profile contains obfuscated Java methods (like `fsd.a`), you can
provide a deobfuscation map to turn them back into human-readable names.
306To do so, use the `PERFETTO_PROGUARD_MAP` environment variable, using the
307format `packagename=filename[:packagename=filename...]`, e.g.
308`PERFETTO_PROGUARD_MAP=com.example.pkg1=foo.txt:com.example.pkg2=bar.txt`.
309All tools
310(traceconv, trace_processor_shell, the heap_profile script) support specifying
311the `PERFETTO_PROGUARD_MAP` as an environment variable.
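
When several packages are involved, the value can be assembled programmatically. The helper below is a minimal sketch of the `packagename=filename[:packagename=filename...]` format described above; the function name is illustrative and the package and file names are the ones from the example.

```python
def proguard_map_value(maps):
    """Join {package: mapping_file} pairs into PERFETTO_PROGUARD_MAP format."""
    return ":".join(f"{package}={filename}" for package, filename in maps.items())

value = proguard_map_value({
    "com.example.pkg1": "foo.txt",
    "com.example.pkg2": "bar.txt",
})
print(f"PERFETTO_PROGUARD_MAP={value}")
```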
312
313You can get a deobfuscation map for your trace using
314`tools/traceconv deobfuscate`. Then concatenate the resulting file to your
315trace to get a deobfuscated version of it.
316
317```
PERFETTO_PROGUARD_MAP=com.example.pkg1=foo.txt tools/traceconv deobfuscate ${TRACE} > deobfuscation_map
319cat ${TRACE} deobfuscation_map > deobfuscated_trace
320```
321
322## Troubleshooting
323
324### Buffer overrun
325
326If the rate of allocations is too high for heapprofd to keep up, the profiling
327session will end early due to a buffer overrun. If the buffer overrun is
328caused by a transient spike in allocations, increasing the shared memory buffer
329size (passing `--shmem-size` to `tools/heap_profile`) can resolve the issue.
330Otherwise the sampling interval can be increased (at the expense of lower
331accuracy in the resulting profile) by passing `--interval=16000` or higher.
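
To get a feel for how much a larger interval helps, recall that on average one allocation is sampled per interval's worth of bytes allocated. The back-of-the-envelope sketch below applies that rule; it ignores allocations larger than the interval, which are always recorded, and the burst size is a made-up figure.

```python
def expected_samples(total_small_alloc_bytes, interval):
    """Average number of samples for allocations smaller than the interval."""
    return total_small_alloc_bytes / interval

burst = 512 * 1024 * 1024  # hypothetical 512 MiB burst of small allocations
for interval in (4096, 16000, 64000):
    print(interval, round(expected_samples(burst, interval)))
```

Going from the default 4096 to 16000 bytes cuts the expected number of samples by roughly a factor of four, at the cost of coarser attribution.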
332
333### Profile is empty
334
335Check whether your target process is eligible to be profiled by consulting
336[Target processes](#heapprofd-targets) above.
337
338Also check the [Known Issues](#known-issues).
339
340### Implausible callstacks
341
If you see a callstack that seems impossible from looking at the code, make
sure no [DEDUPED frames](#deduped-frames) are involved.
344
345Also, if your code is linked using _Identical Code Folding_
346(ICF), i.e. passing `-Wl,--icf=...` to the linker, most trivial functions, often
347constructors and destructors, can be aliased to binary-equivalent operators
348of completely unrelated classes.
349
350### Symbolization: Could not find library
351
352When symbolizing a profile, you might come across messages like this:
353
354```bash
355Could not find /data/app/invalid.app-wFgo3GRaod02wSvPZQ==/lib/arm64/somelib.so
356(Build ID: 44b7138abd5957b8d0a56ce86216d478).
357```
358
359Check whether your library (in this example somelib.so) exists in
360`PERFETTO_BINARY_PATH`. Then compare the Build ID to the one in your
361symbol file, which you can get by running
362`readelf -n /path/in/binary/path/somelib.so`. If it does not match, the
363symbolized file has a different version than the one on device, and cannot
364be used for symbolization.
365If it does, try moving somelib.so to the root of `PERFETTO_BINARY_PATH` and
366try again.
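
The Build ID comparison can be scripted. The sketch below extracts the ID from the text printed by `readelf -n` and compares it with the one from the error message; the sample output is abbreviated and illustrative, and the function name is an assumption of this example.

```python
import re

def extract_build_id(readelf_notes):
    """Return the hex Build ID found in `readelf -n` output, or None."""
    match = re.search(r"Build ID:\s*([0-9a-fA-F]+)", readelf_notes)
    return match.group(1).lower() if match else None

# Abbreviated, illustrative `readelf -n` output for a symbol file.
sample_output = """
Displaying notes found in: .note.gnu.build-id
    Build ID: 44b7138abd5957b8d0a56ce86216d478
"""
expected = "44b7138abd5957b8d0a56ce86216d478"  # from the error message
print(extract_build_id(sample_output) == expected)
```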
367
### Only one frame shown

369If you only see a single frame for functions in a specific library, make sure
370that the library has unwind information. We need one of
371
372* `.gnu_debugdata`
373* `.eh_frame` (+ preferably `.eh_frame_hdr`)
374* `.debug_frame`.
375
376Frame-pointer unwinding is *not supported*.
377
378To check if an ELF file has any of those, run
379
380```console
381$ readelf -S file.so | grep "gnu_debugdata\|eh_frame\|debug_frame"
382  [12] .eh_frame_hdr     PROGBITS         000000000000c2b0  0000c2b0
383  [13] .eh_frame         PROGBITS         0000000000011000  00011000
384  [24] .gnu_debugdata    PROGBITS         0000000000000000  000f7292
385```
386
387If this does not show one or more of the sections, change your build system
388to not strip them.
389
390## (non-Android) Linux support
391
392NOTE: Do not use this for production purposes.
393
394You can use a standalone library to profile memory allocations on Linux.
395First [build Perfetto](/docs/contributing/build-instructions.md). You only need
396to do this once.
397
398```
399tools/build_all_configs.py
400ninja -C out/linux_clang_release
401```
402
403Then, run traced
404
405```
406out/linux_clang_release/traced
407```
408
409Start the profile (e.g. targeting trace_processor_shell)
410
411```
412tools/heap_profile -n trace_processor_shell --print-config  | \
413out/linux_clang_release/perfetto \
414  -c - --txt \
415  -o ~/heapprofd-trace
416```
417
418Finally, run your target (e.g. trace_processor_shell) with LD_PRELOAD
419
420```
421LD_PRELOAD=out/linux_clang_release/libheapprofd_glibc_preload.so out/linux_clang_release/trace_processor_shell <trace>
422```
423
424Then, Ctrl-C the Perfetto invocation and upload ~/heapprofd-trace to the
425[Perfetto UI](https://ui.perfetto.dev).
426
427## Known Issues
428
429### {#known-issues-android11} Android 11
430
431* 32-bit programs cannot be targeted on 64-bit devices.
432* Setting `sampling_interval_bytes` to 0 crashes the target process.
433  This is an invalid config that should be rejected instead.
434* For startup profiles, some frame names might be missing. This will be
435  resolved in Android 12.
436* `Failed to send control socket byte.` is displayed in logcat at the end of
437  every profile. This is benign.
438* The object count may be incorrect in `dump_at_max` profiles.
439
440### {#known-issues-android10} Android 10
441* Function names in libraries with load bias might be incorrect. Use
442  [offline symbolization](#symbolization) to resolve this issue.
443* For startup profiles, some frame names might be missing. This will be
444  resolved in Android 12.
445* 32-bit programs cannot be targeted on 64-bit devices.
* x86 / x86_64 platforms are not supported. This includes the Android
  _Cuttlefish_ emulator.
449* On ARM32, the bottom-most frame is always `ERROR 2`. This is harmless and
450  the callstacks are still complete.
* If heapprofd is run standalone (by running `heapprofd` in a root shell, rather
  than through init), `/dev/socket/heapprofd` gets assigned an incorrect SELinux
  domain. You will not be able to profile any processes unless you disable
  SELinux enforcement.
  Run `restorecon /dev/socket/heapprofd` in a root shell to resolve.
* Using `vfork(2)` or `clone(2)` with `CLONE_VM` and allocating / freeing
  memory in the child process will prematurely end the profile.
  `java.lang.Runtime.exec` does this, so calling it will prematurely end
  the profile. Note that this is in violation of the POSIX standard.
460* Setting `sampling_interval_bytes` to 0 crashes the target process.
461  This is an invalid config that should be rejected instead.
462* `Failed to send control socket byte.` is displayed in logcat at the end of
463  every profile. This is benign.
464* The object count may be incorrect in `dump_at_max` profiles.
465
466## Heapprofd vs malloc_info() vs RSS
467
468When using heapprofd and interpreting results, it is important to know the
469precise meaning of the different memory metrics that can be obtained from the
470operating system.
471
472**heapprofd** gives you the number of bytes the target program
473requested from the default C/C++ allocator. If you are profiling a Java app from
474startup, allocations that happen early in the application's initialization will
475not be visible to heapprofd. Native services that do not fork from the Zygote
476are not affected by this.
477
**malloc\_info** is a libc function that gives you information about the
allocator. This can be triggered on userdebug builds by using
`am dumpheap -m <PID> /data/local/tmp/heap.txt`. The reported value will in
general be larger than the memory seen by heapprofd because, depending on the
allocator, not all memory is immediately freed. In particular, jemalloc retains
some freed memory in thread caches.
484
485**Heap RSS** is the amount of memory requested from the operating system by the
486allocator. This is larger than the previous two numbers because memory can only
487be obtained in page size chunks, and fragmentation causes some of that memory to
488be wasted. This can be obtained by running `adb shell dumpsys meminfo <PID>` and
489looking at the "Private Dirty" column.
RSS can also end up being smaller than the other two if the device kernel uses
memory compression (ZRAM, enabled by default on recent versions of Android) and
the memory of the process gets swapped out onto ZRAM.
493
494|                     | heapprofd         | malloc\_info | RSS |
495|---------------------|:-----------------:|:------------:|:---:|
496| from native startup |          x        |      x       |  x  |
497| after zygote init   |          x        |      x       |  x  |
498| before zygote init  |                   |      x       |  x  |
499| thread caches       |                   |      x       |  x  |
500| fragmentation       |                   |              |  x  |
501
If you observe high RSS or malloc\_info metrics but heapprofd does not match,
you might be hitting some pathological fragmentation problem in the allocator.
504
505## Convert to pprof
506
507You can use [traceconv](/docs/quickstart/traceconv.md) to convert the heap dumps
508in a trace into the [pprof](https://github.com/google/pprof) format. These can
509then be viewed using the pprof CLI or a UI (e.g. Speedscope, or Google-internal
510pprof/).
511
512```bash
513tools/traceconv profile /tmp/profile
514```
515
516This will create a directory in `/tmp/` containing the heap dumps. Run:
517
518```bash
519gzip /tmp/heap_profile-XXXXXX/*.pb
520```
521
522to get gzipped protos, which tools handling pprof profile protos expect.
523
524## {#heapprofd-example-queries} Example SQL Queries
525
We can get the callstacks that allocated using an SQL query in the
Trace Processor. For each frame, we get one row for the number of allocated
bytes, where `count` and `size` are positive, and, if any of them were already
freed, another row with negative `count` and `size`. The sum of those gets us
the `space` view.
531
532```sql
533select a.callsite_id, a.ts, a.upid, f.name, f.rel_pc, m.build_id, m.name as mapping_name,
534        sum(a.size) as space_size, sum(a.count) as space_count
535      from heap_profile_allocation a join
536           stack_profile_callsite c ON (a.callsite_id = c.id) join
537           stack_profile_frame f ON (c.frame_id = f.id) join
538           stack_profile_mapping m ON (f.mapping = m.id)
539      group by 1, 2, 3, 4, 5, 6, 7 order by space_size desc;
540```
541
542| callsite_id | ts | upid | name | rel_pc | build_id | mapping_name | space_size | space_count |
543|-------------|----|------|-------|-----------|------|--------|----------|------|
544|6660|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |106496|4|
545|192 |5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
546|1421|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
547|1537|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
548|8843|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26424 |1|
549|8618|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |24576 |4|
550|3750|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |12288 |1|
551|2820|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |8192  |2|
552|3788|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |8192  |2|
553
554We can see all the functions are "malloc" and "realloc", which is not terribly
555informative. Usually we are interested in the _cumulative_ bytes allocated in
556a function (otherwise, we will always only see malloc / realloc). Chasing the
557parent_id of a callsite (not shown in this table) recursively is very hard in
558SQL.
559
560There is an **experimental** table that surfaces this information. The **API is
561subject to change**.
562
563```sql
564select name, map_name, cumulative_size
565       from experimental_flamegraph(8300973884377,1,'native')
566       order by abs(cumulative_size) desc;
567```
568
569| name | map_name | cumulative_size |
570|------|----------|----------------|
571|__start_thread|/apex/com.android.runtime/lib64/bionic/libc.so|392608|
572|_ZL15__pthread_startPv|/apex/com.android.runtime/lib64/bionic/libc.so|392608|
573|_ZN13thread_data_t10trampolineEPKS|/system/lib64/libutils.so|199496|
574|_ZN7android14AndroidRuntime15javaThreadShellEPv|/system/lib64/libandroid_runtime.so|199496|
575|_ZN7android6Thread11_threadLoopEPv|/system/lib64/libutils.so|199496|
576|_ZN3art6Thread14CreateCallbackEPv|/apex/com.android.art/lib64/libart.so|193112|
577|_ZN3art35InvokeVirtualOrInterface...|/apex/com.android.art/lib64/libart.so|193112|
578|_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc|/apex/com.android.art/lib64/libart.so|193112|
579|art_quick_invoke_stub|/apex/com.android.art/lib64/libart.so|193112|
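
Outside of SQL, the same kind of cumulative attribution is straightforward once the callsite rows have been exported. The sketch below walks each callsite's `parent_id` chain and adds its own bytes to every ancestor. It is an illustration of the idea, not how `experimental_flamegraph` is implemented, and the callsite IDs and sizes are hypothetical.

```python
from collections import defaultdict

def cumulative_sizes(parent_of, self_size):
    """Sum each callsite's own bytes into itself and all of its ancestors."""
    totals = defaultdict(int)
    for callsite, size in self_size.items():
        node = callsite
        while node is not None:
            totals[node] += size
            node = parent_of.get(node)
    return dict(totals)

# Hypothetical callsite tree: 1 is the root, 2 and 3 are its children,
# 4 is a child of 2. Values are bytes allocated directly at that callsite.
parent_of = {1: None, 2: 1, 3: 1, 4: 2}
self_size = {3: 4096, 4: 8192}
print(cumulative_sizes(parent_of, self_size))
```

Here the root accumulates the bytes of every leaf beneath it, which mirrors why the thread entry points dominate the `cumulative_size` column above.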
580