1# bcc Python Developer Tutorial
2
3This tutorial is about developing [bcc](https://github.com/iovisor/bcc) tools and programs using the Python interface. There are two parts: observability then networking. Snippets are taken from various programs in bcc: see their files for licences.
4
5Also see the bcc developer's [reference_guide.md](reference_guide.md), and a tutorial for end-users of tools: [tutorial.md](tutorial.md). There is also a lua interface for bcc.
6
7## Observability
8
9This observability tutorial contains 17 lessons, and 46 enumerated things to learn.
10
11### Lesson 1. Hello World
12
13Start by running [examples/hello_world.py](../examples/hello_world.py), while running some commands (eg, "ls") in another session. It should print "Hello, World!" for new processes. If not, start by fixing bcc: see [INSTALL.md](../INSTALL.md).
14
15```
16# ./examples/hello_world.py
17            bash-13364 [002] d... 24573433.052937: : Hello, World!
18            bash-13364 [003] d... 24573436.642808: : Hello, World!
19[...]
20```
21
22Here's the code for hello_world.py:
23
24```Python
25from bcc import BPF
26BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()
27```
28
29There are six things to learn from this:
30
311. ```text='...'```: This defines a BPF program inline. The program is written in C.
32
331. ```kprobe__sys_clone()```: This is a short-cut for kernel dynamic tracing via kprobes. If the C function begins with ``kprobe__``, the rest is treated as a kernel function name to instrument, in this case, ```sys_clone()```.
34
351. ```void *ctx```: ctx has arguments, but since we aren't using them here, we'll just cast it to ```void *```.
36
371. ```bpf_trace_printk()```: A simple kernel facility for printf() to the common trace_pipe (/sys/kernel/debug/tracing/trace_pipe). This is ok for some quick examples, but has limitations: 3 args max, 1 %s only, and trace_pipe is globally shared, so concurrent programs will have clashing output. A better interface is via BPF_PERF_OUTPUT(), covered later.
38
391. ```return 0;```: Necessary formality (if you want to know why, see [#139](https://github.com/iovisor/bcc/issues/139)).
40
411. ```.trace_print()```: A bcc routine that reads trace_pipe and prints the output.
42
43### Lesson 2. sys_sync()
44
45Write a program that traces the sys_sync() kernel function. Print "sys_sync() called" when it runs. Test by running ```sync``` in another session while tracing. The hello_world.py program has everything you need for this.
46
47Improve it by printing "Tracing sys_sync()... Ctrl-C to end." when the program first starts. Hint: it's just Python.
48
49### Lesson 3. hello_fields.py
50
51This program is in [examples/tracing/hello_fields.py](../examples/tracing/trace_fields.py). Sample output (run commands in another session):
52
53```
54# ./examples/tracing/hello_fields.py
55TIME(s)            COMM             PID    MESSAGE
5624585001.174885999 sshd             1432   Hello, World!
5724585001.195710000 sshd             15780  Hello, World!
5824585001.991976000 systemd-udevd    484    Hello, World!
5924585002.276147000 bash             15787  Hello, World!
60```
61
62Code:
63
64```Python
65from bcc import BPF
66
67# define BPF program
68prog = """
69int hello(void *ctx) {
70	bpf_trace_printk("Hello, World!\\n");
71	return 0;
72}
73"""
74
75# load BPF program
76b = BPF(text=prog)
77b.attach_kprobe(event="sys_clone", fn_name="hello")
78
79# header
80print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))
81
82# format output
83while 1:
84    try:
85        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
86    except ValueError:
87        continue
88    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
89```
90
91This is similar to hello_world.py, and traces new processes via sys_clone() again, but has a few more things to learn:
92
931. ```prog =```: This time we declare the C program as a variable, and later refer to it. This is useful if you want to add some string substitutions based on command line arguments.
94
951. ```hello()```: Now we're just declaring a C function, instead of the ```kprobe__``` shortcut. We'll refer to this later. All C functions declared in the BPF program are expected to be executed on a probe, hence they all need to take a ```pt_reg* ctx``` as first argument. If you need to define some helper function that will not be executed on a probe, they need to be defined as ```static inline``` in order to be inlined by the compiler. Sometimes you would also need to add ```_always_inline``` function attribute to it.
96
971. ```b.attach_kprobe(event="sys_clone", fn_name="hello")```: Creates a kprobe for the sys_clone() kernel function, which will execute our defined hello() function. You can call attach_kprobe() more than once, and attach your C function to multiple kernel functions.
98
991. ```b.trace_fields()```: Returns a fixed set of fields from trace_pipe. Similar to trace_print(), this is handy for hacking, but for real tooling we should switch to BPF_PERF_OUTPUT().
100
101### Lesson 4. sync_timing.py
102
103Remember the days of sysadmins typing ```sync``` three times on a slow console before ```reboot```, to give the first asynchronous sync time to complete? Then someone thought ```sync;sync;sync``` was clever, to run them all on one line, which became industry practice despite defeating the original purpose! And then sync became synchronous, so more reasons it was silly. Anyway.
104
105The following example times how quickly the ```do_sync``` function is called, and prints output if it has been called more recently than one second ago. A ```sync;sync;sync``` will print output for the 2nd and 3rd sync's:
106
107```
108# ./examples/tracing/sync_timing.py
109Tracing for quick sync's... Ctrl-C to end
110At time 0.00 s: multiple syncs detected, last 95 ms ago
111At time 0.10 s: multiple syncs detected, last 96 ms ago
112```
113
114This program is [examples/tracing/sync_timing.py](../examples/tracing/sync_timing.py):
115
116```Python
117from bcc import BPF
118
119# load BPF program
120b = BPF(text="""
121#include <uapi/linux/ptrace.h>
122#include <linux/blkdev.h>
123
124BPF_HASH(last);
125
126int do_trace(struct pt_regs *ctx) {
127	u64 ts, *tsp, delta, key = 0;
128
129	// attempt to read stored timestamp
130	tsp = last.lookup(&key);
131	if (tsp != 0) {
132		delta = bpf_ktime_get_ns() - *tsp;
133		if (delta < 1000000000) {
134			// output if time is less than 1 second
135			bpf_trace_printk("%d\\n", delta / 1000000);
136		}
137		last.delete(&key);
138	}
139
140	// update stored timestamp
141	ts = bpf_ktime_get_ns();
142	last.update(&key, &ts);
143	return 0;
144}
145""")
146
147b.attach_kprobe(event="sys_sync", fn_name="do_trace")
148print("Tracing for quick sync's... Ctrl-C to end")
149
150# format output
151start = 0
152while 1:
153    (task, pid, cpu, flags, ts, ms) = b.trace_fields()
154    if start == 0:
155        start = ts
156    ts = ts - start
157    print("At time %.2f s: multiple syncs detected, last %s ms ago" % (ts, ms))
158```
159
160Things to learn:
161
1621. ```bpf_ktime_get_ns()```: Returns the time as nanoseconds.
1631. ```BPF_HASH(last)```: Creates a BPF map object that is a hash (associative array), called "last". We didn't specify any further arguments, so it defaults to key and value types of u64.
1641. ```key = 0```: We'll only store one key/value pair in this hash, where the key is hardwired to zero.
1651. ```last.lookup(&key)```: Lookup the key in the hash, and return a pointer to its value if it exists, else NULL. We pass the key in as an address to a pointer.
1661. ```last.delete(&key)```: Delete the key from the hash. This is currently required because of [a kernel bug in `.update()`](https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=a6ed3ea65d9868fdf9eff84e6fe4f666b8d14b02).
1671. ```last.update(&key, &ts)```: Associate the value in the 2nd argument to the key, overwriting any previous value. This records the timestamp.
168
169### Lesson 5. sync_count.py
170
171Modify the sync_timing.py program (prior lesson) to store the count of all sys_sync() calls (both fast and slow), and print it with the output. This count can be recorded in the BPF program by adding a new key index to the existing hash.
172
173### Lesson 6. disksnoop.py
174
175Browse the [examples/tracing/disksnoop.py](../examples/tracing/disksnoop.py) program to see what is new. Here is some sample output:
176
177```
178# ./disksnoop.py
179TIME(s)            T  BYTES    LAT(ms)
18016458043.436012    W  4096        3.13
18116458043.437326    W  4096        4.44
18216458044.126545    R  4096       42.82
18316458044.129872    R  4096        3.24
184[...]
185```
186
187And a code snippet:
188
189```Python
190[...]
191REQ_WRITE = 1		# from include/linux/blk_types.h
192
193# load BPF program
194b = BPF(text="""
195#include <uapi/linux/ptrace.h>
196#include <linux/blkdev.h>
197
198BPF_HASH(start, struct request *);
199
200void trace_start(struct pt_regs *ctx, struct request *req) {
201	// stash start timestamp by request ptr
202	u64 ts = bpf_ktime_get_ns();
203
204	start.update(&req, &ts);
205}
206
207void trace_completion(struct pt_regs *ctx, struct request *req) {
208	u64 *tsp, delta;
209
210	tsp = start.lookup(&req);
211	if (tsp != 0) {
212		delta = bpf_ktime_get_ns() - *tsp;
213		bpf_trace_printk("%d %x %d\\n", req->__data_len,
214			req->cmd_flags, delta / 1000);
215		start.delete(&req);
216	}
217}
218""")
219
220b.attach_kprobe(event="blk_start_request", fn_name="trace_start")
221b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_start")
222b.attach_kprobe(event="blk_account_io_completion", fn_name="trace_completion")
223[...]
224```
225
226Things to learn:
227
2281. ```REQ_WRITE```: We're defining a kernel constant in the Python program because we'll use it there later. If we were using REQ_WRITE in the BPF program, it should just work (without needing to be defined) with the appropriate #includes.
2291. ```trace_start(struct pt_regs *ctx, struct request *req)```: This function will later be attached to kprobes. The arguments to kprobe functions are ```struct pt_regs *ctx```, for registers and BPF context, and then the actual arguments to the function. We'll attach this to blk_start_request(), where the first argument is ```struct request *```.
2301. ```start.update(&req, &ts)```: We're using the pointer to the request struct as a key in our hash. What? This is commonplace in tracing. Pointers to structs turn out to be great keys, as they are unique: two structs can't have the same pointer address. (Just be careful about when it gets free'd and reused.) So what we're really doing is tagging the request struct, which describes the disk I/O, with our own timestamp, so that we can time it. There's two common keys used for storing timestamps: pointers to structs, and, thread IDs (for timing function entry to return).
2311. ```req->__data_len```: We're dereferencing members of ```struct request```. See its definition in the kernel source for what members are there. bcc actually rewrites these expressions to be a series of ```bpf_probe_read()``` calls. Sometimes bcc can't handle a complex dereference, and you need to call ```bpf_probe_read()``` directly.
232
233This is a pretty interesting program, and if you can understand all the code, you'll understand many important basics. We're still using the bpf_trace_printk() hack, so let's fix that next.
234
235### Lesson 7. hello_perf_output.py
236
237Let's finally stop using bpf_trace_printk() and use the proper BPF_PERF_OUTPUT() interface. This will also mean we stop getting the free trace_field() members like PID and timestamp, and will need to fetch them directly. Sample output while commands are run in another session:
238
239```
240# ./hello_perf_output.py
241TIME(s)            COMM             PID    MESSAGE
2420.000000000        bash             22986  Hello, perf_output!
2430.021080275        systemd-udevd    484    Hello, perf_output!
2440.021359520        systemd-udevd    484    Hello, perf_output!
2450.021590610        systemd-udevd    484    Hello, perf_output!
246[...]
247```
248
249Code is [examples/tracing/hello_perf_output.py](../examples/tracing/hello_perf_output.py):
250
251```Python
252from bcc import BPF
253import ctypes as ct
254
255# define BPF program
256prog = """
257#include <linux/sched.h>
258
259// define output data structure in C
260struct data_t {
261	u32 pid;
262	u64 ts;
263	char comm[TASK_COMM_LEN];
264};
265BPF_PERF_OUTPUT(events);
266
267int hello(struct pt_regs *ctx) {
268	struct data_t data = {};
269
270	data.pid = bpf_get_current_pid_tgid();
271	data.ts = bpf_ktime_get_ns();
272	bpf_get_current_comm(&data.comm, sizeof(data.comm));
273
274	events.perf_submit(ctx, &data, sizeof(data));
275
276	return 0;
277}
278"""
279
280# load BPF program
281b = BPF(text=prog)
282b.attach_kprobe(event="sys_clone", fn_name="hello")
283
284# define output data structure in Python
285TASK_COMM_LEN = 16    # linux/sched.h
286class Data(ct.Structure):
287    _fields_ = [("pid", ct.c_ulonglong),
288                ("ts", ct.c_ulonglong),
289                ("comm", ct.c_char * TASK_COMM_LEN)]
290
291# header
292print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))
293
294# process event
295start = 0
296def print_event(cpu, data, size):
297    global start
298    event = ct.cast(data, ct.POINTER(Data)).contents
299    if start == 0:
300            start = event.ts
301    time_s = (float(event.ts - start)) / 1000000000
302    print("%-18.9f %-16s %-6d %s" % (time_s, event.comm, event.pid,
303        "Hello, perf_output!"))
304
305# loop with callback to print_event
306b["events"].open_perf_buffer(print_event)
307while 1:
308    b.perf_buffer_poll()
309```
310
311Things to learn:
312
3131. ```struct data_t```: This defines the C struct we'll use to pass data from kernel to user space.
3141. ```BPF_PERF_OUTPUT(events)```: This names our output channel "events".
3151. ```struct data_t data = {};```: Create an empty data_t struct that we'll then populate.
3161. ```bpf_get_current_pid_tgid()```: Returns the process ID in the lower 32 bits (kernel's view of the PID, which in user space is usually presented as the thread ID), and the thread group ID in the upper 32 bits (what user space often thinks of as the PID). By directly setting this to a u32, we discard the upper 32 bits. Should you be presenting the PID or the TGID? For a multi-threaded app, the TGID will be the same, so you need the PID to differentiate them, if that's what you want. It's also a question of expectations for the end user.
3171. ```bpf_get_current_comm()```: Populates the first argument address with the current process name.
3181. ```events.perf_submit()```: Submit the event for user space to read via a perf ring buffer.
3191. ```class Data(ct.Structure)```: Now define the Python version of the C data structure.
3201. ```def print_event()```: Define a Python function that will handle reading events from the ```events``` stream.
3211. ```b["events"].open_perf_buffer(print_event)```: Associate the Python ```print_event``` function with the ```events``` stream.
3221. ```while 1: b.perf_buffer_poll()```: Block waiting for events.
323
324This may be improved in future bcc versions. Eg, the Python data struct could be auto-generated from the C code.
325
326### Lesson 8. sync_perf_output.py
327
328Rewrite sync_timing.py, from a prior lesson, to use ```BPF_PERF_OUTPUT```.
329
330### Lesson 9. bitehist.py
331
332The following tool records a histogram of disk I/O sizes. Sample output:
333
334```
335# ./bitehist.py
336Tracing... Hit Ctrl-C to end.
337^C
338     kbytes          : count     distribution
339       0 -> 1        : 3        |                                      |
340       2 -> 3        : 0        |                                      |
341       4 -> 7        : 211      |**********                            |
342       8 -> 15       : 0        |                                      |
343      16 -> 31       : 0        |                                      |
344      32 -> 63       : 0        |                                      |
345      64 -> 127      : 1        |                                      |
346     128 -> 255      : 800      |**************************************|
347```
348
349Code is [examples/tracing/bitehist.py](../examples/tracing/bitehist.py):
350
351```Python
352from bcc import BPF
353from time import sleep
354
355# load BPF program
356b = BPF(text="""
357#include <uapi/linux/ptrace.h>
358#include <linux/blkdev.h>
359
360BPF_HISTOGRAM(dist);
361
362int kprobe__blk_account_io_completion(struct pt_regs *ctx, struct request *req)
363{
364	dist.increment(bpf_log2l(req->__data_len / 1024));
365	return 0;
366}
367""")
368
369# header
370print("Tracing... Hit Ctrl-C to end.")
371
372# trace until Ctrl-C
373try:
374    sleep(99999999)
375except KeyboardInterrupt:
376    print
377
378# output
379b["dist"].print_log2_hist("kbytes")
380```
381
382A recap from earlier lessons:
383
384- ```kprobe__```: This prefix means the rest will be treated as a kernel function name that will be instrumented using kprobe.
385- ```struct pt_regs *ctx, struct request *req```: Arguments to kprobe. The ```ctx``` is registers and BPF context, the ```req``` is the first argument to the instrumented function: ```blk_account_io_completion()```.
386- ```req->__data_len```: Dereferencing that member.
387
388New things to learn:
389
3901. ```BPF_HISTOGRAM(dist)```: Defines a BPF map object that is a histogram, and names it "dist".
3911. ```dist.increment()```: Increments the histogram bucket index provided as first argument by one by default. Optionally, custom increments can be passed as the second argument.
3921. ```bpf_log2l()```: Returns the log-2 of the provided value. This becomes the index of our histogram, so that we're constructing a power-of-2 histogram.
3931. ```b["dist"].print_log2_hist("kbytes")```: Prints the "dist" histogram as power-of-2, with a column header of "kbytes". The only data transferred from kernel to user space is the bucket counts, making this efficient.
394
395### Lesson 10. disklatency.py
396
397Write a program that times disk I/O, and prints a histogram of their latency. Disk I/O instrumentation and timing can be found in the disksnoop.py program from a prior lesson, and histogram code can be found in bitehist.py from a prior lesson.
398
399### Lesson 11. vfsreadlat.py
400
401This example is split into separate Python and C files. Example output:
402
403```
404# ./vfsreadlat.py 1
405Tracing... Hit Ctrl-C to end.
406     usecs               : count     distribution
407         0 -> 1          : 0        |                                        |
408         2 -> 3          : 2        |***********                             |
409         4 -> 7          : 7        |****************************************|
410         8 -> 15         : 4        |**********************                  |
411
412     usecs               : count     distribution
413         0 -> 1          : 29       |****************************************|
414         2 -> 3          : 28       |**************************************  |
415         4 -> 7          : 4        |*****                                   |
416         8 -> 15         : 8        |***********                             |
417        16 -> 31         : 0        |                                        |
418        32 -> 63         : 0        |                                        |
419        64 -> 127        : 0        |                                        |
420       128 -> 255        : 0        |                                        |
421       256 -> 511        : 2        |**                                      |
422       512 -> 1023       : 0        |                                        |
423      1024 -> 2047       : 0        |                                        |
424      2048 -> 4095       : 0        |                                        |
425      4096 -> 8191       : 4        |*****                                   |
426      8192 -> 16383      : 6        |********                                |
427     16384 -> 32767      : 9        |************                            |
428     32768 -> 65535      : 6        |********                                |
429     65536 -> 131071     : 2        |**                                      |
430
431     usecs               : count     distribution
432         0 -> 1          : 11       |****************************************|
433         2 -> 3          : 2        |*******                                 |
434         4 -> 7          : 10       |************************************    |
435         8 -> 15         : 8        |*****************************           |
436        16 -> 31         : 1        |***                                     |
437        32 -> 63         : 2        |*******                                 |
438[...]
439```
440
441Browse the code in [examples/tracing/vfsreadlat.py](../examples/tracing/vfsreadlat.py) and [examples/tracing/vfsreadlat.c](../examples/tracing/vfsreadlat.c). Things to learn:
442
4431. ```b = BPF(src_file = "vfsreadlat.c")```: Read the BPF C program from a separate source file.
4441. ```b.attach_kretprobe(event="vfs_read", fn_name="do_return")```: Attaches the BPF C function ```do_return()``` to the return of the kernel function ```vfs_read()```. This is a kretprobe: instrumenting the return from a function, rather than its entry.
4451. ```b["dist"].clear()```: Clears the histogram.
446
447### Lesson 12. urandomread.py
448
449Tracing while a ```dd if=/dev/urandom of=/dev/null bs=8k count=5``` is run:
450
451```
452# ./urandomread.py
453TIME(s)            COMM             PID    GOTBITS
45424652832.956994001 smtp             24690  384
45524652837.726500999 dd               24692  65536
45624652837.727111001 dd               24692  65536
45724652837.727703001 dd               24692  65536
45824652837.728294998 dd               24692  65536
45924652837.728888001 dd               24692  65536
460```
461
462Hah! I caught smtp by accident. Code is [examples/tracing/urandomread.py](../examples/tracing/urandomread.py):
463
464```Python
465from bcc import BPF
466
467# load BPF program
468b = BPF(text="""
469TRACEPOINT_PROBE(random, urandom_read) {
470	// args is from /sys/kernel/debug/tracing/events/random/urandom_read/format
471	bpf_trace_printk("%d\\n", args->got_bits);
472	return 0;
473};
474""")
475
476# header
477print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "GOTBITS"))
478
479# format output
480while 1:
481    try:
482        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
483    except ValueError:
484        continue
485    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
486```
487
488Things to learn:
489
4901. ```TRACEPOINT_PROBE(random, urandom_read)```: Instrument the kernel tracepoint ```random:urandom_read```. These have a stable API, and thus are recommend to use instead of kprobes, wherever possible. You can run ```perf list``` for a list of tracepoints. Linux >= 4.7 is required to attach BPF programs to tracepoints.
4911. ```args->got_bits```: ```args``` is auto-populated to be a structure of the tracepoint arguments. The comment above says where you can see that structure. Eg:
492
493```
494# cat /sys/kernel/debug/tracing/events/random/urandom_read/format
495name: urandom_read
496ID: 972
497format:
498	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
499	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
500	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
501	field:int common_pid;	offset:4;	size:4;	signed:1;
502
503	field:int got_bits;	offset:8;	size:4;	signed:1;
504	field:int pool_left;	offset:12;	size:4;	signed:1;
505	field:int input_left;	offset:16;	size:4;	signed:1;
506
507print fmt: "got_bits %d nonblocking_pool_entropy_left %d input_entropy_left %d", REC->got_bits, REC->pool_left, REC->input_left
508```
509
510In this case, we were printing the ```got_bits``` member.
511
512### Lesson 13. disksnoop.py fixed
513
514Convert disksnoop.py from a previous lesson to use the ```block:block_rq_issue``` and ```block:block_rq_complete``` tracepoints.
515
516### Lesson 14. strlen_count.py
517
518This program instruments a user-level function, the ```strlen()``` library function, and frequency counts its string argument. Example output:
519
520```
521# ./strlen_count.py
522Tracing strlen()... Hit Ctrl-C to end.
523^C     COUNT STRING
524         1 " "
525         1 "/bin/ls"
526         1 "."
527         1 "cpudist.py.1"
528         1 ".bashrc"
529         1 "ls --color=auto"
530         1 "key_t"
531[...]
532        10 "a7:~# "
533        10 "/root"
534        12 "LC_ALL"
535        12 "en_US.UTF-8"
536        13 "en_US.UTF-8"
537        20 "~"
538        70 "#%^,~:-=?+/}"
539       340 "\x01\x1b]0;root@bgregg-test: ~\x07\x02root@bgregg-test:~# "
540```
541
542These are various strings that are being processed by this library function while tracing, along with their frequency counts. ```strlen()``` was called on "LC_ALL" 12 times, for example.
543
544Code is [examples/tracing/strlen_count.py](../examples/tracing/strlen_count.py):
545
546```Python
547from bcc import BPF
548from time import sleep
549
550# load BPF program
551b = BPF(text="""
552#include <uapi/linux/ptrace.h>
553
554struct key_t {
555	char c[80];
556};
557BPF_HASH(counts, struct key_t);
558
559int count(struct pt_regs *ctx) {
560  if (!PT_REGS_PARM1(ctx))
561    return 0;
562
563  struct key_t key = {};
564  u64 zero = 0, *val;
565
566  bpf_probe_read(&key.c, sizeof(key.c), (void *)PT_REGS_PARM1(ctx));
567
568  // another possibility is using `counts.increment(key);`. It allows a second
569  //   optional parameter to specify the increment step
570  val = counts.lookup_or_init(&key, &zero);
571  (*val)++;
572  return 0;
573};
574""")
575b.attach_uprobe(name="c", sym="strlen", fn_name="count")
576
577# header
578print("Tracing strlen()... Hit Ctrl-C to end.")
579
580# sleep until Ctrl-C
581try:
582    sleep(99999999)
583except KeyboardInterrupt:
584    pass
585
586# print output
587print("%10s %s" % ("COUNT", "STRING"))
588counts = b.get_table("counts")
589for k, v in sorted(counts.items(), key=lambda counts: counts[1].value):
590    print("%10d \"%s\"" % (v.value, k.c.encode('string-escape')))
591```
592
593Things to learn:
594
5951. ```PT_REGS_PARM1(ctx)```: This fetches the first argument to ```strlen()```, which is the string.
5961. ```b.attach_uprobe(name="c", sym="strlen", fn_name="count")```: Attach to library "c" (if this is the main program, use its pathname), instrument the user-level function ```strlen()```, and on execution call our C function ```count()```.
597
598### Lesson 15. nodejs_http_server.py
599
600This program instruments a user-defined static tracing (USDT) probe, which is the user-level version of a kernel tracepoint. Sample output:
601
602```
603# ./nodejs_http_server.py 24728
604TIME(s)            COMM             PID    ARGS
60524653324.561322998 node             24728  path:/index.html
60624653335.343401998 node             24728  path:/images/welcome.png
60724653340.510164998 node             24728  path:/images/favicon.png
608```
609
610Relevant code from [examples/tracing/nodejs_http_server.py](../examples/tracing/nodejs_http_server.py):
611
612```Python
613if len(sys.argv) < 2:
614	print("USAGE: nodejs_http_server PID")
615	exit()
616pid = sys.argv[1]
617
618# load BPF program
619bpf_text = """
620#include <uapi/linux/ptrace.h>
621int do_trace(struct pt_regs *ctx) {
622	uint64_t addr;
623	char path[128];
624	bpf_usdt_readarg(6, ctx, &addr);
625	bpf_probe_read(&path, sizeof(path), (void *)addr);
626	bpf_trace_printk("path:%s\\n", path);
627	return 0;
628};
629"""
630
631# enable USDT probe from given PID
632u = USDT(pid=int(pid))
633u.enable_probe(probe="http__server__request", fn_name="do_trace")
634
635# initialize BPF
636b = BPF(text=bpf_text, usdt_contexts=[u])
637```
638
639Things to learn:
640
6411. ```bpf_usdt_readarg(6, ctx, &addr)```: Read the address of argument 6 from the USDT probe into ```addr```.
6421. ```bpf_probe_read(&path, sizeof(path), (void *)addr)```: Now the string ```addr``` points to into our ```path``` variable.
6431. ```u = USDT(pid=int(pid))```: Initialize USDT tracing for the given PID.
6441. ```u.enable_probe(probe="http__server__request", fn_name="do_trace")```: Attach our ```do_trace()``` BPF C function to the Node.js ```http__server__request``` USDT probe.
6451. ```b = BPF(text=bpf_text, usdt_contexts=[u])```: Need to pass in our USDT object, ```u```, to BPF object creation.
646
647### Lesson 16. task_switch.c
648
649This is an older tutorial included as a bonus lesson. Use this for recap and to reinforce what you've already learned.
650
651This is a slightly more complex tracing example than Hello World. This program
652will be invoked for every task change in the kernel, and record in a BPF map
653the new and old pids.
654
655The C program below introduces two new concepts.
656The first is the macro `BPF_TABLE`. This defines a table (type="hash"), with key
657type `key_t` and leaf type `u64` (a single counter). The table name is `stats`,
658containing 1024 entries maximum. One can `lookup`, `lookup_or_init`, `update`,
659and `delete` entries from the table.
660The second concept is the prev argument. This argument is treated specially by
661the BCC frontend, such that accesses to this variable are read from the saved
662context that is passed by the kprobe infrastructure. The prototype of the args
663starting from position 1 should match the prototype of the kernel function being
664kprobed. If done so, the program will have seamless access to the function
665parameters.
666
667```c
668#include <uapi/linux/ptrace.h>
669#include <linux/sched.h>
670
671struct key_t {
672  u32 prev_pid;
673  u32 curr_pid;
674};
675// map_type, key_type, leaf_type, table_name, num_entry
676BPF_HASH(stats, struct key_t, u64, 1024);
677// attach to finish_task_switch in kernel/sched/core.c, which has the following
678// prototype:
679//   struct rq *finish_task_switch(struct task_struct *prev)
680int count_sched(struct pt_regs *ctx, struct task_struct *prev) {
681  struct key_t key = {};
682  u64 zero = 0, *val;
683
684  key.curr_pid = bpf_get_current_pid_tgid();
685  key.prev_pid = prev->pid;
686
687  // another possibility is using `counts.increment(key);`. It allows a second
688  //   optional parameter to specify the increment step
689  val = stats.lookup_or_init(&key, &zero);
690  (*val)++;
691  return 0;
692}
693```
694
695The userspace component loads the file shown above, and attaches it to the
696`finish_task_switch` kernel function.
697The [] operator of the BPF object gives access to each BPF_TABLE in the
698program, allowing pass-through access to the values residing in the kernel. Use
699the object as you would any other python dict object: read, update, and deletes
700are all allowed.
701```python
702from bcc import BPF
703from time import sleep
704
705b = BPF(src_file="task_switch.c")
706b.attach_kprobe(event="finish_task_switch", fn_name="count_sched")
707
708# generate many schedule events
709for i in range(0, 100): sleep(0.01)
710
711for k, v in b["stats"].items():
712    print("task_switch[%5d->%5d]=%u" % (k.prev_pid, k.curr_pid, v.value))
713```
714
715These programs have now been merged, and are both in [examples/tracing/task_switch.py](examples/tracing/task_switch.py).
716
717### Lesson 17. Further Study
718
719For further study, see Sasha Goldshtein's [linux-tracing-workshop](https://github.com/goldshtn/linux-tracing-workshop), which contains additional labs. There are also many tools in bcc /tools to study.
720
721Please read [CONTRIBUTING-SCRIPTS.md](../CONTRIBUTING-SCRIPTS.md) if you wish to contrubite tools to bcc. At the bottom of the main [README.md](../README.md), you'll also find methods for contacting us. Good luck, and happy tracing!
722
723## Networking
724
725To do.
726