1# bcc Python Developer Tutorial 2 3This tutorial is about developing [bcc](https://github.com/iovisor/bcc) tools and programs using the Python interface. There are two parts: observability then networking. Snippets are taken from various programs in bcc: see their files for licences. 4 5Also see the bcc developer's [reference_guide.md](reference_guide.md), and a tutorial for end-users of tools: [tutorial.md](tutorial.md). There is also a lua interface for bcc. 6 7## Observability 8 9This observability tutorial contains 17 lessons, and 46 enumerated things to learn. 10 11### Lesson 1. Hello World 12 13Start by running [examples/hello_world.py](../examples/hello_world.py), while running some commands (eg, "ls") in another session. It should print "Hello, World!" for new processes. If not, start by fixing bcc: see [INSTALL.md](../INSTALL.md). 14 15``` 16# ./examples/hello_world.py 17 bash-13364 [002] d... 24573433.052937: : Hello, World! 18 bash-13364 [003] d... 24573436.642808: : Hello, World! 19[...] 20``` 21 22Here's the code for hello_world.py: 23 24```Python 25from bcc import BPF 26BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print() 27``` 28 29There are six things to learn from this: 30 311. ```text='...'```: This defines a BPF program inline. The program is written in C. 32 331. ```kprobe__sys_clone()```: This is a short-cut for kernel dynamic tracing via kprobes. If the C function begins with ``kprobe__``, the rest is treated as a kernel function name to instrument, in this case, ```sys_clone()```. 34 351. ```void *ctx```: ctx has arguments, but since we aren't using them here, we'll just cast it to ```void *```. 36 371. ```bpf_trace_printk()```: A simple kernel facility for printf() to the common trace_pipe (/sys/kernel/debug/tracing/trace_pipe). This is ok for some quick examples, but has limitations: 3 args max, 1 %s only, and trace_pipe is globally shared, so concurrent programs will have clashing output. A better interface is via BPF_PERF_OUTPUT(), covered later. 38 391. ```return 0;```: Necessary formality (if you want to know why, see [#139](https://github.com/iovisor/bcc/issues/139)). 40 411. ```.trace_print()```: A bcc routine that reads trace_pipe and prints the output. 42 43### Lesson 2. sys_sync() 44 45Write a program that traces the sys_sync() kernel function. Print "sys_sync() called" when it runs. Test by running ```sync``` in another session while tracing. The hello_world.py program has everything you need for this. 46 47Improve it by printing "Tracing sys_sync()... Ctrl-C to end." when the program first starts. Hint: it's just Python. 48 49### Lesson 3. hello_fields.py 50 51This program is in [examples/tracing/hello_fields.py](../examples/tracing/trace_fields.py). Sample output (run commands in another session): 52 53``` 54# ./examples/tracing/hello_fields.py 55TIME(s) COMM PID MESSAGE 5624585001.174885999 sshd 1432 Hello, World! 5724585001.195710000 sshd 15780 Hello, World! 5824585001.991976000 systemd-udevd 484 Hello, World! 5924585002.276147000 bash 15787 Hello, World! 60``` 61 62Code: 63 64```Python 65from bcc import BPF 66 67# define BPF program 68prog = """ 69int hello(void *ctx) { 70 bpf_trace_printk("Hello, World!\\n"); 71 return 0; 72} 73""" 74 75# load BPF program 76b = BPF(text=prog) 77b.attach_kprobe(event="sys_clone", fn_name="hello") 78 79# header 80print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE")) 81 82# format output 83while 1: 84 try: 85 (task, pid, cpu, flags, ts, msg) = b.trace_fields() 86 except ValueError: 87 continue 88 print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg)) 89``` 90 91This is similar to hello_world.py, and traces new processes via sys_clone() again, but has a few more things to learn: 92 931. ```prog =```: This time we declare the C program as a variable, and later refer to it. This is useful if you want to add some string substitutions based on command line arguments. 94 951. ```hello()```: Now we're just declaring a C function, instead of the ```kprobe__``` shortcut. We'll refer to this later. All C functions declared in the BPF program are expected to be executed on a probe, hence they all need to take a ```pt_reg* ctx``` as first argument. If you need to define some helper function that will not be executed on a probe, they need to be defined as ```static inline``` in order to be inlined by the compiler. Sometimes you would also need to add ```_always_inline``` function attribute to it. 96 971. ```b.attach_kprobe(event="sys_clone", fn_name="hello")```: Creates a kprobe for the sys_clone() kernel function, which will execute our defined hello() function. You can call attach_kprobe() more than once, and attach your C function to multiple kernel functions. 98 991. ```b.trace_fields()```: Returns a fixed set of fields from trace_pipe. Similar to trace_print(), this is handy for hacking, but for real tooling we should switch to BPF_PERF_OUTPUT(). 100 101### Lesson 4. sync_timing.py 102 103Remember the days of sysadmins typing ```sync``` three times on a slow console before ```reboot```, to give the first asynchronous sync time to complete? Then someone thought ```sync;sync;sync``` was clever, to run them all on one line, which became industry practice despite defeating the original purpose! And then sync became synchronous, so more reasons it was silly. Anyway. 104 105The following example times how quickly the ```do_sync``` function is called, and prints output if it has been called more recently than one second ago. A ```sync;sync;sync``` will print output for the 2nd and 3rd sync's: 106 107``` 108# ./examples/tracing/sync_timing.py 109Tracing for quick sync's... Ctrl-C to end 110At time 0.00 s: multiple syncs detected, last 95 ms ago 111At time 0.10 s: multiple syncs detected, last 96 ms ago 112``` 113 114This program is [examples/tracing/sync_timing.py](../examples/tracing/sync_timing.py): 115 116```Python 117from bcc import BPF 118 119# load BPF program 120b = BPF(text=""" 121#include <uapi/linux/ptrace.h> 122#include <linux/blkdev.h> 123 124BPF_HASH(last); 125 126int do_trace(struct pt_regs *ctx) { 127 u64 ts, *tsp, delta, key = 0; 128 129 // attempt to read stored timestamp 130 tsp = last.lookup(&key); 131 if (tsp != 0) { 132 delta = bpf_ktime_get_ns() - *tsp; 133 if (delta < 1000000000) { 134 // output if time is less than 1 second 135 bpf_trace_printk("%d\\n", delta / 1000000); 136 } 137 last.delete(&key); 138 } 139 140 // update stored timestamp 141 ts = bpf_ktime_get_ns(); 142 last.update(&key, &ts); 143 return 0; 144} 145""") 146 147b.attach_kprobe(event="sys_sync", fn_name="do_trace") 148print("Tracing for quick sync's... Ctrl-C to end") 149 150# format output 151start = 0 152while 1: 153 (task, pid, cpu, flags, ts, ms) = b.trace_fields() 154 if start == 0: 155 start = ts 156 ts = ts - start 157 print("At time %.2f s: multiple syncs detected, last %s ms ago" % (ts, ms)) 158``` 159 160Things to learn: 161 1621. ```bpf_ktime_get_ns()```: Returns the time as nanoseconds. 1631. ```BPF_HASH(last)```: Creates a BPF map object that is a hash (associative array), called "last". We didn't specify any further arguments, so it defaults to key and value types of u64. 1641. ```key = 0```: We'll only store one key/value pair in this hash, where the key is hardwired to zero. 1651. ```last.lookup(&key)```: Lookup the key in the hash, and return a pointer to its value if it exists, else NULL. We pass the key in as an address to a pointer. 1661. ```last.delete(&key)```: Delete the key from the hash. This is currently required because of [a kernel bug in `.update()`](https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=a6ed3ea65d9868fdf9eff84e6fe4f666b8d14b02). 1671. ```last.update(&key, &ts)```: Associate the value in the 2nd argument to the key, overwriting any previous value. This records the timestamp. 168 169### Lesson 5. sync_count.py 170 171Modify the sync_timing.py program (prior lesson) to store the count of all sys_sync() calls (both fast and slow), and print it with the output. This count can be recorded in the BPF program by adding a new key index to the existing hash. 172 173### Lesson 6. disksnoop.py 174 175Browse the [examples/tracing/disksnoop.py](../examples/tracing/disksnoop.py) program to see what is new. Here is some sample output: 176 177``` 178# ./disksnoop.py 179TIME(s) T BYTES LAT(ms) 18016458043.436012 W 4096 3.13 18116458043.437326 W 4096 4.44 18216458044.126545 R 4096 42.82 18316458044.129872 R 4096 3.24 184[...] 185``` 186 187And a code snippet: 188 189```Python 190[...] 191REQ_WRITE = 1 # from include/linux/blk_types.h 192 193# load BPF program 194b = BPF(text=""" 195#include <uapi/linux/ptrace.h> 196#include <linux/blkdev.h> 197 198BPF_HASH(start, struct request *); 199 200void trace_start(struct pt_regs *ctx, struct request *req) { 201 // stash start timestamp by request ptr 202 u64 ts = bpf_ktime_get_ns(); 203 204 start.update(&req, &ts); 205} 206 207void trace_completion(struct pt_regs *ctx, struct request *req) { 208 u64 *tsp, delta; 209 210 tsp = start.lookup(&req); 211 if (tsp != 0) { 212 delta = bpf_ktime_get_ns() - *tsp; 213 bpf_trace_printk("%d %x %d\\n", req->__data_len, 214 req->cmd_flags, delta / 1000); 215 start.delete(&req); 216 } 217} 218""") 219 220b.attach_kprobe(event="blk_start_request", fn_name="trace_start") 221b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_start") 222b.attach_kprobe(event="blk_account_io_completion", fn_name="trace_completion") 223[...] 224``` 225 226Things to learn: 227 2281. ```REQ_WRITE```: We're defining a kernel constant in the Python program because we'll use it there later. If we were using REQ_WRITE in the BPF program, it should just work (without needing to be defined) with the appropriate #includes. 2291. ```trace_start(struct pt_regs *ctx, struct request *req)```: This function will later be attached to kprobes. The arguments to kprobe functions are ```struct pt_regs *ctx```, for registers and BPF context, and then the actual arguments to the function. We'll attach this to blk_start_request(), where the first argument is ```struct request *```. 2301. ```start.update(&req, &ts)```: We're using the pointer to the request struct as a key in our hash. What? This is commonplace in tracing. Pointers to structs turn out to be great keys, as they are unique: two structs can't have the same pointer address. (Just be careful about when it gets free'd and reused.) So what we're really doing is tagging the request struct, which describes the disk I/O, with our own timestamp, so that we can time it. There's two common keys used for storing timestamps: pointers to structs, and, thread IDs (for timing function entry to return). 2311. ```req->__data_len```: We're dereferencing members of ```struct request```. See its definition in the kernel source for what members are there. bcc actually rewrites these expressions to be a series of ```bpf_probe_read()``` calls. Sometimes bcc can't handle a complex dereference, and you need to call ```bpf_probe_read()``` directly. 232 233This is a pretty interesting program, and if you can understand all the code, you'll understand many important basics. We're still using the bpf_trace_printk() hack, so let's fix that next. 234 235### Lesson 7. hello_perf_output.py 236 237Let's finally stop using bpf_trace_printk() and use the proper BPF_PERF_OUTPUT() interface. This will also mean we stop getting the free trace_field() members like PID and timestamp, and will need to fetch them directly. Sample output while commands are run in another session: 238 239``` 240# ./hello_perf_output.py 241TIME(s) COMM PID MESSAGE 2420.000000000 bash 22986 Hello, perf_output! 2430.021080275 systemd-udevd 484 Hello, perf_output! 2440.021359520 systemd-udevd 484 Hello, perf_output! 2450.021590610 systemd-udevd 484 Hello, perf_output! 246[...] 247``` 248 249Code is [examples/tracing/hello_perf_output.py](../examples/tracing/hello_perf_output.py): 250 251```Python 252from bcc import BPF 253import ctypes as ct 254 255# define BPF program 256prog = """ 257#include <linux/sched.h> 258 259// define output data structure in C 260struct data_t { 261 u32 pid; 262 u64 ts; 263 char comm[TASK_COMM_LEN]; 264}; 265BPF_PERF_OUTPUT(events); 266 267int hello(struct pt_regs *ctx) { 268 struct data_t data = {}; 269 270 data.pid = bpf_get_current_pid_tgid(); 271 data.ts = bpf_ktime_get_ns(); 272 bpf_get_current_comm(&data.comm, sizeof(data.comm)); 273 274 events.perf_submit(ctx, &data, sizeof(data)); 275 276 return 0; 277} 278""" 279 280# load BPF program 281b = BPF(text=prog) 282b.attach_kprobe(event="sys_clone", fn_name="hello") 283 284# define output data structure in Python 285TASK_COMM_LEN = 16 # linux/sched.h 286class Data(ct.Structure): 287 _fields_ = [("pid", ct.c_ulonglong), 288 ("ts", ct.c_ulonglong), 289 ("comm", ct.c_char * TASK_COMM_LEN)] 290 291# header 292print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE")) 293 294# process event 295start = 0 296def print_event(cpu, data, size): 297 global start 298 event = ct.cast(data, ct.POINTER(Data)).contents 299 if start == 0: 300 start = event.ts 301 time_s = (float(event.ts - start)) / 1000000000 302 print("%-18.9f %-16s %-6d %s" % (time_s, event.comm, event.pid, 303 "Hello, perf_output!")) 304 305# loop with callback to print_event 306b["events"].open_perf_buffer(print_event) 307while 1: 308 b.perf_buffer_poll() 309``` 310 311Things to learn: 312 3131. ```struct data_t```: This defines the C struct we'll use to pass data from kernel to user space. 3141. ```BPF_PERF_OUTPUT(events)```: This names our output channel "events". 3151. ```struct data_t data = {};```: Create an empty data_t struct that we'll then populate. 3161. ```bpf_get_current_pid_tgid()```: Returns the process ID in the lower 32 bits (kernel's view of the PID, which in user space is usually presented as the thread ID), and the thread group ID in the upper 32 bits (what user space often thinks of as the PID). By directly setting this to a u32, we discard the upper 32 bits. Should you be presenting the PID or the TGID? For a multi-threaded app, the TGID will be the same, so you need the PID to differentiate them, if that's what you want. It's also a question of expectations for the end user. 3171. ```bpf_get_current_comm()```: Populates the first argument address with the current process name. 3181. ```events.perf_submit()```: Submit the event for user space to read via a perf ring buffer. 3191. ```class Data(ct.Structure)```: Now define the Python version of the C data structure. 3201. ```def print_event()```: Define a Python function that will handle reading events from the ```events``` stream. 3211. ```b["events"].open_perf_buffer(print_event)```: Associate the Python ```print_event``` function with the ```events``` stream. 3221. ```while 1: b.perf_buffer_poll()```: Block waiting for events. 323 324This may be improved in future bcc versions. Eg, the Python data struct could be auto-generated from the C code. 325 326### Lesson 8. sync_perf_output.py 327 328Rewrite sync_timing.py, from a prior lesson, to use ```BPF_PERF_OUTPUT```. 329 330### Lesson 9. bitehist.py 331 332The following tool records a histogram of disk I/O sizes. Sample output: 333 334``` 335# ./bitehist.py 336Tracing... Hit Ctrl-C to end. 337^C 338 kbytes : count distribution 339 0 -> 1 : 3 | | 340 2 -> 3 : 0 | | 341 4 -> 7 : 211 |********** | 342 8 -> 15 : 0 | | 343 16 -> 31 : 0 | | 344 32 -> 63 : 0 | | 345 64 -> 127 : 1 | | 346 128 -> 255 : 800 |**************************************| 347``` 348 349Code is [examples/tracing/bitehist.py](../examples/tracing/bitehist.py): 350 351```Python 352from bcc import BPF 353from time import sleep 354 355# load BPF program 356b = BPF(text=""" 357#include <uapi/linux/ptrace.h> 358#include <linux/blkdev.h> 359 360BPF_HISTOGRAM(dist); 361 362int kprobe__blk_account_io_completion(struct pt_regs *ctx, struct request *req) 363{ 364 dist.increment(bpf_log2l(req->__data_len / 1024)); 365 return 0; 366} 367""") 368 369# header 370print("Tracing... Hit Ctrl-C to end.") 371 372# trace until Ctrl-C 373try: 374 sleep(99999999) 375except KeyboardInterrupt: 376 print 377 378# output 379b["dist"].print_log2_hist("kbytes") 380``` 381 382A recap from earlier lessons: 383 384- ```kprobe__```: This prefix means the rest will be treated as a kernel function name that will be instrumented using kprobe. 385- ```struct pt_regs *ctx, struct request *req```: Arguments to kprobe. The ```ctx``` is registers and BPF context, the ```req``` is the first argument to the instrumented function: ```blk_account_io_completion()```. 386- ```req->__data_len```: Dereferencing that member. 387 388New things to learn: 389 3901. ```BPF_HISTOGRAM(dist)```: Defines a BPF map object that is a histogram, and names it "dist". 3911. ```dist.increment()```: Increments the histogram bucket index provided as first argument by one by default. Optionally, custom increments can be passed as the second argument. 3921. ```bpf_log2l()```: Returns the log-2 of the provided value. This becomes the index of our histogram, so that we're constructing a power-of-2 histogram. 3931. ```b["dist"].print_log2_hist("kbytes")```: Prints the "dist" histogram as power-of-2, with a column header of "kbytes". The only data transferred from kernel to user space is the bucket counts, making this efficient. 394 395### Lesson 10. disklatency.py 396 397Write a program that times disk I/O, and prints a histogram of their latency. Disk I/O instrumentation and timing can be found in the disksnoop.py program from a prior lesson, and histogram code can be found in bitehist.py from a prior lesson. 398 399### Lesson 11. vfsreadlat.py 400 401This example is split into separate Python and C files. Example output: 402 403``` 404# ./vfsreadlat.py 1 405Tracing... Hit Ctrl-C to end. 406 usecs : count distribution 407 0 -> 1 : 0 | | 408 2 -> 3 : 2 |*********** | 409 4 -> 7 : 7 |****************************************| 410 8 -> 15 : 4 |********************** | 411 412 usecs : count distribution 413 0 -> 1 : 29 |****************************************| 414 2 -> 3 : 28 |************************************** | 415 4 -> 7 : 4 |***** | 416 8 -> 15 : 8 |*********** | 417 16 -> 31 : 0 | | 418 32 -> 63 : 0 | | 419 64 -> 127 : 0 | | 420 128 -> 255 : 0 | | 421 256 -> 511 : 2 |** | 422 512 -> 1023 : 0 | | 423 1024 -> 2047 : 0 | | 424 2048 -> 4095 : 0 | | 425 4096 -> 8191 : 4 |***** | 426 8192 -> 16383 : 6 |******** | 427 16384 -> 32767 : 9 |************ | 428 32768 -> 65535 : 6 |******** | 429 65536 -> 131071 : 2 |** | 430 431 usecs : count distribution 432 0 -> 1 : 11 |****************************************| 433 2 -> 3 : 2 |******* | 434 4 -> 7 : 10 |************************************ | 435 8 -> 15 : 8 |***************************** | 436 16 -> 31 : 1 |*** | 437 32 -> 63 : 2 |******* | 438[...] 439``` 440 441Browse the code in [examples/tracing/vfsreadlat.py](../examples/tracing/vfsreadlat.py) and [examples/tracing/vfsreadlat.c](../examples/tracing/vfsreadlat.c). Things to learn: 442 4431. ```b = BPF(src_file = "vfsreadlat.c")```: Read the BPF C program from a separate source file. 4441. ```b.attach_kretprobe(event="vfs_read", fn_name="do_return")```: Attaches the BPF C function ```do_return()``` to the return of the kernel function ```vfs_read()```. This is a kretprobe: instrumenting the return from a function, rather than its entry. 4451. ```b["dist"].clear()```: Clears the histogram. 446 447### Lesson 12. urandomread.py 448 449Tracing while a ```dd if=/dev/urandom of=/dev/null bs=8k count=5``` is run: 450 451``` 452# ./urandomread.py 453TIME(s) COMM PID GOTBITS 45424652832.956994001 smtp 24690 384 45524652837.726500999 dd 24692 65536 45624652837.727111001 dd 24692 65536 45724652837.727703001 dd 24692 65536 45824652837.728294998 dd 24692 65536 45924652837.728888001 dd 24692 65536 460``` 461 462Hah! I caught smtp by accident. Code is [examples/tracing/urandomread.py](../examples/tracing/urandomread.py): 463 464```Python 465from bcc import BPF 466 467# load BPF program 468b = BPF(text=""" 469TRACEPOINT_PROBE(random, urandom_read) { 470 // args is from /sys/kernel/debug/tracing/events/random/urandom_read/format 471 bpf_trace_printk("%d\\n", args->got_bits); 472 return 0; 473}; 474""") 475 476# header 477print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "GOTBITS")) 478 479# format output 480while 1: 481 try: 482 (task, pid, cpu, flags, ts, msg) = b.trace_fields() 483 except ValueError: 484 continue 485 print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg)) 486``` 487 488Things to learn: 489 4901. ```TRACEPOINT_PROBE(random, urandom_read)```: Instrument the kernel tracepoint ```random:urandom_read```. These have a stable API, and thus are recommend to use instead of kprobes, wherever possible. You can run ```perf list``` for a list of tracepoints. Linux >= 4.7 is required to attach BPF programs to tracepoints. 4911. ```args->got_bits```: ```args``` is auto-populated to be a structure of the tracepoint arguments. The comment above says where you can see that structure. Eg: 492 493``` 494# cat /sys/kernel/debug/tracing/events/random/urandom_read/format 495name: urandom_read 496ID: 972 497format: 498 field:unsigned short common_type; offset:0; size:2; signed:0; 499 field:unsigned char common_flags; offset:2; size:1; signed:0; 500 field:unsigned char common_preempt_count; offset:3; size:1; signed:0; 501 field:int common_pid; offset:4; size:4; signed:1; 502 503 field:int got_bits; offset:8; size:4; signed:1; 504 field:int pool_left; offset:12; size:4; signed:1; 505 field:int input_left; offset:16; size:4; signed:1; 506 507print fmt: "got_bits %d nonblocking_pool_entropy_left %d input_entropy_left %d", REC->got_bits, REC->pool_left, REC->input_left 508``` 509 510In this case, we were printing the ```got_bits``` member. 511 512### Lesson 13. disksnoop.py fixed 513 514Convert disksnoop.py from a previous lesson to use the ```block:block_rq_issue``` and ```block:block_rq_complete``` tracepoints. 515 516### Lesson 14. strlen_count.py 517 518This program instruments a user-level function, the ```strlen()``` library function, and frequency counts its string argument. Example output: 519 520``` 521# ./strlen_count.py 522Tracing strlen()... Hit Ctrl-C to end. 523^C COUNT STRING 524 1 " " 525 1 "/bin/ls" 526 1 "." 527 1 "cpudist.py.1" 528 1 ".bashrc" 529 1 "ls --color=auto" 530 1 "key_t" 531[...] 532 10 "a7:~# " 533 10 "/root" 534 12 "LC_ALL" 535 12 "en_US.UTF-8" 536 13 "en_US.UTF-8" 537 20 "~" 538 70 "#%^,~:-=?+/}" 539 340 "\x01\x1b]0;root@bgregg-test: ~\x07\x02root@bgregg-test:~# " 540``` 541 542These are various strings that are being processed by this library function while tracing, along with their frequency counts. ```strlen()``` was called on "LC_ALL" 12 times, for example. 543 544Code is [examples/tracing/strlen_count.py](../examples/tracing/strlen_count.py): 545 546```Python 547from bcc import BPF 548from time import sleep 549 550# load BPF program 551b = BPF(text=""" 552#include <uapi/linux/ptrace.h> 553 554struct key_t { 555 char c[80]; 556}; 557BPF_HASH(counts, struct key_t); 558 559int count(struct pt_regs *ctx) { 560 if (!PT_REGS_PARM1(ctx)) 561 return 0; 562 563 struct key_t key = {}; 564 u64 zero = 0, *val; 565 566 bpf_probe_read(&key.c, sizeof(key.c), (void *)PT_REGS_PARM1(ctx)); 567 568 // another possibility is using `counts.increment(key);`. It allows a second 569 // optional parameter to specify the increment step 570 val = counts.lookup_or_init(&key, &zero); 571 (*val)++; 572 return 0; 573}; 574""") 575b.attach_uprobe(name="c", sym="strlen", fn_name="count") 576 577# header 578print("Tracing strlen()... Hit Ctrl-C to end.") 579 580# sleep until Ctrl-C 581try: 582 sleep(99999999) 583except KeyboardInterrupt: 584 pass 585 586# print output 587print("%10s %s" % ("COUNT", "STRING")) 588counts = b.get_table("counts") 589for k, v in sorted(counts.items(), key=lambda counts: counts[1].value): 590 print("%10d \"%s\"" % (v.value, k.c.encode('string-escape'))) 591``` 592 593Things to learn: 594 5951. ```PT_REGS_PARM1(ctx)```: This fetches the first argument to ```strlen()```, which is the string. 5961. ```b.attach_uprobe(name="c", sym="strlen", fn_name="count")```: Attach to library "c" (if this is the main program, use its pathname), instrument the user-level function ```strlen()```, and on execution call our C function ```count()```. 597 598### Lesson 15. nodejs_http_server.py 599 600This program instruments a user-defined static tracing (USDT) probe, which is the user-level version of a kernel tracepoint. Sample output: 601 602``` 603# ./nodejs_http_server.py 24728 604TIME(s) COMM PID ARGS 60524653324.561322998 node 24728 path:/index.html 60624653335.343401998 node 24728 path:/images/welcome.png 60724653340.510164998 node 24728 path:/images/favicon.png 608``` 609 610Relevant code from [examples/tracing/nodejs_http_server.py](../examples/tracing/nodejs_http_server.py): 611 612```Python 613if len(sys.argv) < 2: 614 print("USAGE: nodejs_http_server PID") 615 exit() 616pid = sys.argv[1] 617 618# load BPF program 619bpf_text = """ 620#include <uapi/linux/ptrace.h> 621int do_trace(struct pt_regs *ctx) { 622 uint64_t addr; 623 char path[128]; 624 bpf_usdt_readarg(6, ctx, &addr); 625 bpf_probe_read(&path, sizeof(path), (void *)addr); 626 bpf_trace_printk("path:%s\\n", path); 627 return 0; 628}; 629""" 630 631# enable USDT probe from given PID 632u = USDT(pid=int(pid)) 633u.enable_probe(probe="http__server__request", fn_name="do_trace") 634 635# initialize BPF 636b = BPF(text=bpf_text, usdt_contexts=[u]) 637``` 638 639Things to learn: 640 6411. ```bpf_usdt_readarg(6, ctx, &addr)```: Read the address of argument 6 from the USDT probe into ```addr```. 6421. ```bpf_probe_read(&path, sizeof(path), (void *)addr)```: Now the string ```addr``` points to into our ```path``` variable. 6431. ```u = USDT(pid=int(pid))```: Initialize USDT tracing for the given PID. 6441. ```u.enable_probe(probe="http__server__request", fn_name="do_trace")```: Attach our ```do_trace()``` BPF C function to the Node.js ```http__server__request``` USDT probe. 6451. ```b = BPF(text=bpf_text, usdt_contexts=[u])```: Need to pass in our USDT object, ```u```, to BPF object creation. 646 647### Lesson 16. task_switch.c 648 649This is an older tutorial included as a bonus lesson. Use this for recap and to reinforce what you've already learned. 650 651This is a slightly more complex tracing example than Hello World. This program 652will be invoked for every task change in the kernel, and record in a BPF map 653the new and old pids. 654 655The C program below introduces two new concepts. 656The first is the macro `BPF_TABLE`. This defines a table (type="hash"), with key 657type `key_t` and leaf type `u64` (a single counter). The table name is `stats`, 658containing 1024 entries maximum. One can `lookup`, `lookup_or_init`, `update`, 659and `delete` entries from the table. 660The second concept is the prev argument. This argument is treated specially by 661the BCC frontend, such that accesses to this variable are read from the saved 662context that is passed by the kprobe infrastructure. The prototype of the args 663starting from position 1 should match the prototype of the kernel function being 664kprobed. If done so, the program will have seamless access to the function 665parameters. 666 667```c 668#include <uapi/linux/ptrace.h> 669#include <linux/sched.h> 670 671struct key_t { 672 u32 prev_pid; 673 u32 curr_pid; 674}; 675// map_type, key_type, leaf_type, table_name, num_entry 676BPF_HASH(stats, struct key_t, u64, 1024); 677// attach to finish_task_switch in kernel/sched/core.c, which has the following 678// prototype: 679// struct rq *finish_task_switch(struct task_struct *prev) 680int count_sched(struct pt_regs *ctx, struct task_struct *prev) { 681 struct key_t key = {}; 682 u64 zero = 0, *val; 683 684 key.curr_pid = bpf_get_current_pid_tgid(); 685 key.prev_pid = prev->pid; 686 687 // another possibility is using `counts.increment(key);`. It allows a second 688 // optional parameter to specify the increment step 689 val = stats.lookup_or_init(&key, &zero); 690 (*val)++; 691 return 0; 692} 693``` 694 695The userspace component loads the file shown above, and attaches it to the 696`finish_task_switch` kernel function. 697The [] operator of the BPF object gives access to each BPF_TABLE in the 698program, allowing pass-through access to the values residing in the kernel. Use 699the object as you would any other python dict object: read, update, and deletes 700are all allowed. 701```python 702from bcc import BPF 703from time import sleep 704 705b = BPF(src_file="task_switch.c") 706b.attach_kprobe(event="finish_task_switch", fn_name="count_sched") 707 708# generate many schedule events 709for i in range(0, 100): sleep(0.01) 710 711for k, v in b["stats"].items(): 712 print("task_switch[%5d->%5d]=%u" % (k.prev_pid, k.curr_pid, v.value)) 713``` 714 715These programs have now been merged, and are both in [examples/tracing/task_switch.py](examples/tracing/task_switch.py). 716 717### Lesson 17. Further Study 718 719For further study, see Sasha Goldshtein's [linux-tracing-workshop](https://github.com/goldshtn/linux-tracing-workshop), which contains additional labs. There are also many tools in bcc /tools to study. 720 721Please read [CONTRIBUTING-SCRIPTS.md](../CONTRIBUTING-SCRIPTS.md) if you wish to contrubite tools to bcc. At the bottom of the main [README.md](../README.md), you'll also find methods for contacting us. Good luck, and happy tracing! 722 723## Networking 724 725To do. 726