• Home
  • History
  • Annotate
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# bcc Tutorial
2
3This tutorial covers how to use [bcc](https://github.com/iovisor/bcc) tools to quickly solve performance, troubleshooting, and networking issues. If you want to develop new bcc tools, see [tutorial_bcc_python_developer.md](tutorial_bcc_python_developer.md) for that tutorial.
4
5It is assumed for this tutorial that bcc is already installed, and you can run tools like execsnoop successfully. See [INSTALL.md](../INSTALL.md). This uses enhancements added to the Linux 4.x series.
6
7## Observability
8
9Some quick wins.
10
11### 0. Before bcc
12
13Before using bcc, you should start with the Linux basics. One reference is the [Linux Performance Analysis in 60s](http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html) post, which covers these commands:
14
151. uptime
161. dmesg | tail
171. vmstat 1
181. mpstat -P ALL 1
191. pidstat 1
201. iostat -xz 1
211. free -m
221. sar -n DEV 1
231. sar -n TCP,ETCP 1
241. top
25
26### 1. General Performance
27
28Here is a generic checklist for performance investigations with bcc, first as a list, then in detail:
29
301. execsnoop
311. opensnoop
321. ext4slower (or btrfs\*, xfs\*, zfs\*)
331. biolatency
341. biosnoop
351. cachestat
361. tcpconnect
371. tcpaccept
381. tcpretrans
391. runqlat
401. profile
41
42These tools may be installed on your system under /usr/share/bcc/tools, or you can run them from the bcc github repo under /tools where they have a .py extension. Browse the 50+ tools available for more analysis options.
43
44#### 1.1 execsnoop
45
46```
47# ./execsnoop
48PCOMM            PID    RET ARGS
49supervise        9660     0 ./run
50supervise        9661     0 ./run
51mkdir            9662     0 /bin/mkdir -p ./main
52run              9663     0 ./run
53[...]
54```
55
56execsnoop prints one line of output for each new process. Check for short-lived processes. These can consume CPU resources, but not show up in most monitoring tools that periodically take snapshots of which processes are running.
57
58It works by tracing exec(), not the fork(), so it will catch many types of new processes but not all (eg, it won't see an application launching working processes, that doesn't exec() anything else).
59
60More [examples](../tools/execsnoop_example.txt).
61
62#### 1.2. opensnoop
63
64```
65# ./opensnoop
66PID    COMM      FD ERR PATH
671565   redis-server        5   0 /proc/1565/stat
681565   redis-server        5   0 /proc/1565/stat
691565   redis-server        5   0 /proc/1565/stat
701603   snmpd               9   0 /proc/net/dev
711603   snmpd              11   0 /proc/net/if_inet6
721603   snmpd              -1   2 /sys/class/net/eth0/device/vendor
731603   snmpd              11   0 /proc/sys/net/ipv4/neigh/eth0/retrans_time_ms
741603   snmpd              11   0 /proc/sys/net/ipv6/neigh/eth0/retrans_time_ms
751603   snmpd              11   0 /proc/sys/net/ipv6/conf/eth0/forwarding
76[...]
77```
78
79opensnoop prints one line of output for each open() syscall, including details.
80
81Files that are opened can tell you a lot about how applications work: identifying their data files, config files, and log files. Sometimes applications can misbehave, and perform poorly, when they are constantly attempting to read files that do not exist. opensnoop gives you a quick look.
82
83More [examples](../tools/opensnoop_example.txt).
84
85#### 1.3. ext4slower (or btrfs\*, xfs\*, zfs\*)
86
87```
88# ./ext4slower
89Tracing ext4 operations slower than 10 ms
90TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
9106:35:01 cron           16464  R 1249    0          16.05 common-auth
9206:35:01 cron           16463  R 1249    0          16.04 common-auth
9306:35:01 cron           16465  R 1249    0          16.03 common-auth
9406:35:01 cron           16465  R 4096    0          10.62 login.defs
9506:35:01 cron           16464  R 4096    0          10.61 login.defs
96```
97
98ext4slower traces the ext4 file system and times common operations, and then only prints those that exceed a threshold.
99
100This is great for identifying or exonerating one type of performance issue: slow individually slow disk i/O via the file system. Disks process I/O asynchronously, and it can be difficult to associate latency at that layer with the latency applications experience. Tracing higher up in the kernel stack, at the VFS -> file system interface, will more closely match what an application suffers. Use this tool to identify if file system latency exceeds a given threshold.
101
102Similar tools exist in bcc for other file systems: btrfsslower, xfsslower, and zfsslower. There is also fileslower, which works at the VFS layer and traces everything (although at some higher overhead).
103
104More [examples](../tools/ext4slower_example.txt).
105
106#### 1.4. biolatency
107
108```
109# ./biolatency
110Tracing block device I/O... Hit Ctrl-C to end.
111^C
112     usecs           : count     distribution
113       0 -> 1        : 0        |                                      |
114       2 -> 3        : 0        |                                      |
115       4 -> 7        : 0        |                                      |
116       8 -> 15       : 0        |                                      |
117      16 -> 31       : 0        |                                      |
118      32 -> 63       : 0        |                                      |
119      64 -> 127      : 1        |                                      |
120     128 -> 255      : 12       |********                              |
121     256 -> 511      : 15       |**********                            |
122     512 -> 1023     : 43       |*******************************       |
123    1024 -> 2047     : 52       |**************************************|
124    2048 -> 4095     : 47       |**********************************    |
125    4096 -> 8191     : 52       |**************************************|
126    8192 -> 16383    : 36       |**************************            |
127   16384 -> 32767    : 15       |**********                            |
128   32768 -> 65535    : 2        |*                                     |
129   65536 -> 131071   : 2        |*                                     |
130```
131
132biolatency traces disk I/O latency (time from device issue to completion), and when the tool ends (Ctrl-C, or a given interval), it prints a histogram summary of the latency.
133
134This is great for understanding disk I/O latency beyond the average times given by tools like iostat. I/O latency outliers will be visible at the end of the distribution, as well as multi-mode distributions.
135
136More [examples](../tools/biolatency_example.txt).
137
138#### 1.5. biosnoop
139
140```
141# ./biosnoop
142TIME(s)        COMM           PID    DISK    T  SECTOR    BYTES   LAT(ms)
1430.000004001    supervise      1950   xvda1   W  13092560  4096       0.74
1440.000178002    supervise      1950   xvda1   W  13092432  4096       0.61
1450.001469001    supervise      1956   xvda1   W  13092440  4096       1.24
1460.001588002    supervise      1956   xvda1   W  13115128  4096       1.09
1471.022346001    supervise      1950   xvda1   W  13115272  4096       0.98
1481.022568002    supervise      1950   xvda1   W  13188496  4096       0.93
149[...]
150```
151
152biosnoop prints a line of output for each disk I/O, with details including latency (time from device issue to completion).
153
154This allows you to examine disk I/O in more detail, and look for time-ordered patterns (eg, reads queueing behind writes). Note that the output will be verbose if your system performance a high rate of disk I/O.
155
156More [examples](../tools/biosnoop_example.txt).
157
158#### 1.6. cachestat
159
160```
161# ./cachestat
162    HITS   MISSES  DIRTIES  READ_HIT% WRITE_HIT%   BUFFERS_MB  CACHED_MB
163    1074       44       13      94.9%       2.9%            1        223
164    2195      170        8      92.5%       6.8%            1        143
165     182       53       56      53.6%       1.3%            1        143
166   62480    40960    20480      40.6%      19.8%            1        223
167       7        2        5      22.2%      22.2%            1        223
168     348        0        0     100.0%       0.0%            1        223
169[...]
170```
171
172cachestat prints a one line summary every second (or every custom interval) showing statistics from the file system cache.
173
174Use this to identify a low cache hit ratio, and a high rate of misses: which gives one lead for performance tuning.
175
176More [examples](../tools/cachestat_example.txt).
177
178#### 1.7. tcpconnect
179
180```
181# ./tcpconnect
182PID    COMM         IP SADDR            DADDR            DPORT
1831479   telnet       4  127.0.0.1        127.0.0.1        23
1841469   curl         4  10.201.219.236   54.245.105.25    80
1851469   curl         4  10.201.219.236   54.67.101.145    80
1861991   telnet       6  ::1              ::1              23
1872015   ssh          6  fe80::2000:bff:fe82:3ac fe80::2000:bff:fe82:3ac 22
188[...]
189```
190
191tcpconnect prints one line of output for every active TCP connection (eg, via connect()), with details including source and destination addresses.
192
193Look for unexpected connections that may point to inefficiencies in application configuration, or an intruder.
194
195More [examples](../tools/tcpconnect_example.txt).
196
197#### 1.8. tcpaccept
198
199```
200# ./tcpaccept
201PID    COMM         IP RADDR            LADDR            LPORT
202907    sshd         4  192.168.56.1     192.168.56.102   22
203907    sshd         4  127.0.0.1        127.0.0.1        22
2045389   perl         6  1234:ab12:2040:5020:2299:0:5:0 1234:ab12:2040:5020:2299:0:5:0 7001
205[...]
206```
207
208tcpaccept prints one line of output for every passive TCP connection (eg, via accept()), with details including source and destination addresses.
209
210Look for unexpected connections that may point to inefficiencies in application configuration, or an intruder.
211
212More [examples](../tools/tcpaccept_example.txt).
213
214#### 1.9. tcpretrans
215
216```
217# ./tcpretrans
218TIME     PID    IP LADDR:LPORT          T> RADDR:RPORT          STATE
21901:55:05 0      4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED
22001:55:05 0      4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED
22101:55:17 0      4  10.153.223.157:22    R> 69.53.245.40:22957   ESTABLISHED
222[...]
223```
224
225tcprerans prints one line of output for every TCP retransmit packet, with details including source and destination addresses, and kernel state of the TCP connection.
226
227TCP retransmissions cause latency and throughput issues. For ESTABLISHED retransmits, look for patterns with networks. For SYN_SENT, this may point to target kernel CPU saturation and kernel packet drops.
228
229More [examples](../tools/tcpretrans_example.txt).
230
231#### 1.10. runqlat
232
233```
234# ./runqlat
235Tracing run queue latency... Hit Ctrl-C to end.
236^C
237     usecs               : count     distribution
238         0 -> 1          : 233      |***********                             |
239         2 -> 3          : 742      |************************************    |
240         4 -> 7          : 203      |**********                              |
241         8 -> 15         : 173      |********                                |
242        16 -> 31         : 24       |*                                       |
243        32 -> 63         : 0        |                                        |
244        64 -> 127        : 30       |*                                       |
245       128 -> 255        : 6        |                                        |
246       256 -> 511        : 3        |                                        |
247       512 -> 1023       : 5        |                                        |
248      1024 -> 2047       : 27       |*                                       |
249      2048 -> 4095       : 30       |*                                       |
250      4096 -> 8191       : 20       |                                        |
251      8192 -> 16383      : 29       |*                                       |
252     16384 -> 32767      : 809      |****************************************|
253     32768 -> 65535      : 64       |***                                     |
254```
255
256runqlat times how long threads were waiting on the CPU run queues, and prints this as a histogram.
257
258This can help quantify time lost waiting for a turn on CPU, during periods of CPU saturation.
259
260More [examples](../tools/runqlat_example.txt).
261
262#### 1.11. profile
263
264```
265# ./profile
266Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
267^C
268    00007f31d76c3251 [unknown]
269    47a2c1e752bf47f7 [unknown]
270    -                sign-file (8877)
271        1
272
273    ffffffff813d0af8 __clear_user
274    ffffffff813d5277 iov_iter_zero
275    ffffffff814ec5f2 read_iter_zero
276    ffffffff8120be9d __vfs_read
277    ffffffff8120c385 vfs_read
278    ffffffff8120d786 sys_read
279    ffffffff817cc076 entry_SYSCALL_64_fastpath
280    00007fc5652ad9b0 read
281    -                dd (25036)
282        4
283
284    0000000000400542 func_a
285    0000000000400598 main
286    00007f12a133e830 __libc_start_main
287    083e258d4c544155 [unknown]
288    -                func_ab (13549)
289        5
290
291[...]
292
293    ffffffff8105eb66 native_safe_halt
294    ffffffff8103659e default_idle
295    ffffffff81036d1f arch_cpu_idle
296    ffffffff810bba5a default_idle_call
297    ffffffff810bbd07 cpu_startup_entry
298    ffffffff8104df55 start_secondary
299    -                swapper/1 (0)
300        75
301```
302
303profile is a CPU profiler, which takes samples of stack traces at timed intervals, and prints a summary of unique stack traces and a count of their occurrence.
304
305Use this tool to understand the code paths that are consuming CPU resources.
306
307More [examples](../tools/profile_example.txt).
308
309### 2. Observatility with Generic Tools
310
311In addition to the above tools for performance tuning, below is a checklist for bcc generic tools, first as a list, and in detail:
312
3131. trace
3141. argdist
3151. funccount
316
317These generic tools may be useful to provide visibility to solve your specific problems.
318
319#### 2.1. trace
320
321##### Example 1
322
323Suppose you want to track file ownership change. There are three syscalls, `chown`, `fchown` and `lchown` which users can use to change file ownership. The corresponding syscall entry is `SyS_[f|l]chown`.  The following command can be used to print out syscall parameters and the calling process user id. You can use `id` command to find the uid of a particular user.
324
325```
326$ trace.py \
327  'p::SyS_chown "file = %s, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid' \
328  'p::SyS_fchown "fd = %d, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid' \
329  'p::SyS_lchown "file = %s, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid'
330PID    TID    COMM         FUNC             -
3311269255 1269255 python3.6    SyS_lchown       file = /tmp/dotsync-usisgezu/tmp, to_uid = 128203, to_gid = 100, from_uid = 128203
3321269441 1269441 zstd         SyS_chown        file = /tmp/dotsync-vic7ygj0/dotsync-package.zst, to_uid = 128203, to_gid = 100, from_uid = 128203
3331269255 1269255 python3.6    SyS_lchown       file = /tmp/dotsync-a40zd7ev/tmp, to_uid = 128203, to_gid = 100, from_uid = 128203
3341269442 1269442 zstd         SyS_chown        file = /tmp/dotsync-gzp413o_/dotsync-package.zst, to_uid = 128203, to_gid = 100, from_uid = 128203
3351269255 1269255 python3.6    SyS_lchown       file = /tmp/dotsync-whx4fivm/tmp/.bash_profile, to_uid = 128203, to_gid = 100, from_uid = 128203
336```
337
338##### Example 2
339
340Suppose you want to count nonvoluntary context switches (`nvcsw`) in your bpf based performance monitoring tools and you do not know what is the proper method. `/proc/<pid>/status` already tells you the number (`nonvoluntary_ctxt_switches`) for a pid and you can use `trace.py` to do a quick experiment to verify your method. With kernel source code, the `nvcsw` is counted at file `linux/kernel/sched/core.c` function `__schedule` and under condition
341```
342!(!preempt && prev->state) // i.e., preempt || !prev->state
343```
344
345The `__schedule` function is marked as `notrace`, and the best place to evaluate the above condition seems in `sched/sched_switch` tracepoint called inside function `__schedule` and defined in `linux/include/trace/events/sched.h`. `trace.py` already has `args` being the pointer to the tracepoint `TP_STRUCT__entry`.  The above condition in function `__schedule` can be represented as
346```
347args->prev_state == TASK_STATE_MAX || args->prev_state == 0
348```
349
350The below command can be used to count the involuntary context switches (per process or per pid) and compare to `/proc/<pid>/status` or `/proc/<pid>/task/<task_id>/status` for correctness, as in typical cases, involuntary context switches are not very common.
351```
352$ trace.py -p 1134138 't:sched:sched_switch (args->prev_state == TASK_STATE_MAX || args->prev_state == 0)'
353PID    TID    COMM         FUNC
3541134138 1134140 contention_test sched_switch
3551134138 1134142 contention_test sched_switch
356...
357$ trace.py -L 1134140 't:sched:sched_switch (args->prev_state == TASK_STATE_MAX || args->prev_state == 0)'
358PID    TID    COMM         FUNC
3591134138 1134140 contention_test sched_switch
3601134138 1134140 contention_test sched_switch
361...
362```
363
364##### Example 3
365
366This example is related to issue [1231](https://github.com/iovisor/bcc/issues/1231) and [1516](https://github.com/iovisor/bcc/issues/1516) where uprobe does not work at all in certain cases. First, you can do a `strace` as below
367
368```
369$ strace trace.py 'r:bash:readline "%s", retval'
370...
371perf_event_open(0x7ffd968212f0, -1, 0, -1, 0x8 /* PERF_FLAG_??? */) = -1 EIO (Input/output error)
372...
373```
374
375The `perf_event_open` syscall returns `-EIO`. Digging into kernel uprobe related codes in `/kernel/trace` and `/kernel/events` directories to search `EIO`, the function `uprobe_register` is the most suspicious. Let us find whether this function is called or not and what is the return value if it is called. In one terminal using the following command to print out the return value of uprobe_register,
376```
377$ trace.py 'r::uprobe_register "ret = %d", retval'
378```
379In another terminal run the same bash uretprobe tracing example, and you should get
380```
381$ trace.py 'r::uprobe_register "ret = %d", retval'
382PID    TID    COMM         FUNC             -
3831041401 1041401 python2.7    uprobe_register  ret = -5
384```
385
386The `-5` error code is EIO. This confirms that the following code in function `uprobe_register` is the most suspicious culprit.
387```
388 if (!inode->i_mapping->a_ops->readpage && !shmem_mapping(inode->i_mapping))
389        return -EIO;
390```
391The `shmem_mapping` function is defined as
392```
393bool shmem_mapping(struct address_space *mapping)
394{
395        return mapping->a_ops == &shmem_aops;
396}
397```
398
399To confirm the theory, find what is `inode->i_mapping->a_ops` with the following command
400```
401$ trace.py -I 'linux/fs.h' 'p::uprobe_register(struct inode *inode) "a_ops = %llx", inode->i_mapping->a_ops'
402PID    TID    COMM         FUNC             -
403814288 814288 python2.7    uprobe_register  a_ops = ffffffff81a2adc0
404^C$ grep ffffffff81a2adc0 /proc/kallsyms
405ffffffff81a2adc0 R empty_aops
406```
407
408The kernel symbol `empty_aops` does not have `readpage` defined and hence the above suspicious condition is true. Further examining the kernel source code shows that `overlayfs` does not provide its own `a_ops` while some other file systems (e.g., ext4) define their own `a_ops` (e.g., `ext4_da_aops`), and `ext4_da_aops` defines `readpage`. Hence, uprobe works fine on ext4 while not on overlayfs.
409
410More [examples](../tools/trace_example.txt).
411
412#### 2.2. argdist
413
414More [examples](../tools/argdist_example.txt).
415
416#### 2.3. funccount
417
418More [examples](../tools/funccount_example.txt).
419
420## Networking
421
422To do.
423