1Demonstrations of zfsslower, the Linux eBPF/bcc version.
2
3
4zfsslower shows ZFS reads, writes, opens, and fsyncs, slower than a threshold.
5It has been written to work on ZFS on Linux (http://zfsonlinux.org). For
6example:
7
8# ./zfsslower
9Tracing ZFS operations slower than 10 ms
10TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
1106:31:28 dd             25570  W 131072  38784     303.92 data1
1206:31:34 dd             25686  W 131072  38784     388.28 data1
1306:31:35 dd             25686  W 131072  78720     519.66 data1
1406:31:35 dd             25686  W 131072  116992    405.94 data1
1506:31:35 dd             25686  W 131072  153600    433.52 data1
1606:31:36 dd             25686  W 131072  188672    314.37 data1
1706:31:36 dd             25686  W 131072  222336    372.33 data1
1806:31:36 dd             25686  W 131072  254592    309.59 data1
1906:31:37 dd             25686  W 131072  285440    304.52 data1
2006:31:37 dd             25686  W 131072  315008    236.45 data1
2106:31:37 dd             25686  W 131072  343424    193.54 data1
2206:31:38 dd             25686  W 131072  370560    286.07 data1
2306:31:38 dd             25686  W 131072  396672    251.92 data1
24[...]
25
26This shows writes to a "data1" file, each taking well over the 10 ms threshold.
27the slowest, on the 3rd line of output, reached 519.66 ms for a 128 Kbyte
28write by the "dd" command.
29
30This "latency" is measured from when the operation was issued from the VFS
31interface to the file system (via the ZFS POSIX layer), to when it completed.
32This spans everything: block device I/O (disk I/O), file system CPU cycles,
33file system locks, run queue latency, etc. This is a better measure of the
34latency suffered by applications reading from the file system than measuring
35this down at the block device interface.
36
37Note that this only traces the common file system operations previously
38listed: other file system operations (eg, inode operations including
39getattr()) are not traced.
40
41
42A threshold of 0 will trace all operations. Warning: the output will be
43verbose, as it will include all file system cache hits.
44
45# ./zfsslower 0
46Tracing ZFS operations
47TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
4806:36:07 dd             32242  O 0       0           0.01 data1
4906:36:07 dd             32242  W 131072  0           0.25 data1
5006:36:07 dd             32242  W 131072  128         0.03 data1
5106:36:07 dd             32242  W 131072  256         0.04 data1
5206:36:07 dd             32242  W 131072  384         0.04 data1
5306:36:07 dd             32242  W 131072  512         0.04 data1
5406:36:07 dd             32242  W 131072  640         0.03 data1
5506:36:07 dd             32242  W 131072  768         0.03 data1
5606:36:07 dd             32242  W 131072  896         0.04 data1
5706:36:07 dd             32242  W 131072  1024        0.28 data1
5806:36:07 dd             32242  W 131072  1152        0.04 data1
5906:36:07 dd             32242  W 131072  1280        0.03 data1
60[...]
6106:36:07 dd             32242  W 131072  13824       0.04 data1
6206:36:07 dd             32242  W 131072  13952       0.04 data1
6306:36:07 dd             32242  W 131072  14080       0.04 data1
6406:36:07 dd             32242  W 131072  14208     398.92 data1
6506:36:07 dd             32242  W 131072  14336       0.04 data1
6606:36:07 dd             32242  W 131072  14464       0.04 data1
6706:36:07 dd             32242  W 131072  15104       0.03 data1
68[...]
69
70The output now includes the open operation for this file ("O"), and then the
71writes. Most of the writes are very fast, with only an occasional outlier that
72is in the hundreds of milliseconds.
73
74Fortunately this is not a real world environment: I setup a zpool on top of a
75XFS file system for testing purposes. More debugging using other tools will
76explain these outliers: possibly XFS flushing.
77
78
79Here's a random read workload, and showing operations slower than 1 ms:
80
81# ./zfsslower 1
82Tracing ZFS operations slower than 1 ms
83TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
8406:47:30 randread.pl    15431  R 8192    97840       1.03 data1
8506:47:30 randread.pl    15431  R 8192    416744      1.12 data1
8606:47:31 randread.pl    15431  R 8192    228856      1.96 data1
8706:47:31 randread.pl    15431  R 8192    452248      1.02 data1
8806:47:31 randread.pl    15431  R 8192    315288      5.90 data1
8906:47:31 randread.pl    15431  R 8192    752696      1.20 data1
9006:47:31 randread.pl    15431  R 8192    481832      1.39 data1
9106:47:31 randread.pl    15431  R 8192    673752      1.39 data1
9206:47:31 randread.pl    15431  R 8192    691736      1.01 data1
9306:47:31 randread.pl    15431  R 8192    694776      1.78 data1
9406:47:31 randread.pl    15431  R 8192    403328      3.75 data1
9506:47:31 randread.pl    15431  R 8192    567688      1.08 data1
9606:47:31 randread.pl    15431  R 8192    694280      1.31 data1
9706:47:31 randread.pl    15431  R 8192    669280      1.06 data1
9806:47:31 randread.pl    15431  R 8192    426608      1.56 data1
9906:47:31 randread.pl    15431  R 8192    42512       1.01 data1
10006:47:31 randread.pl    15431  R 8192    22944       1.33 data1
10106:47:31 randread.pl    15431  R 8192    427432      1.48 data1
10206:47:31 randread.pl    15431  R 8192    261320      1.28 data1
10306:47:31 randread.pl    15431  R 8192    132248      1.23 data1
10406:47:31 randread.pl    15431  R 8192    96936       1.04 data1
10506:47:31 randread.pl    15431  R 8192    482800      2.63 data1
106[...]
107
108
109A -j option will print just the fields (parsable output, csv):
110
111# ./zfsslower -j 1
112ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE
113252305490911,randread.pl,17922,R,8192,163446784,1156,data1
114252305493852,randread.pl,17922,R,8192,321437696,1129,data1
115252305498839,randread.pl,17922,R,8192,475152384,1154,data1
116252305505515,randread.pl,17922,R,8192,49094656,1082,data1
117252305506774,randread.pl,17922,R,8192,470401024,1245,data1
118252305509265,randread.pl,17922,R,8192,553246720,2412,data1
119252305512365,randread.pl,17922,R,8192,20963328,1093,data1
120252305513755,randread.pl,17922,R,8192,304111616,1350,data1
121252305583330,randread.pl,17922,R,8192,166174720,1154,data1
122252305593913,randread.pl,17922,R,8192,175079424,1241,data1
123252305602833,randread.pl,17922,R,8192,305340416,3307,data1
124252305608663,randread.pl,17922,R,8192,655958016,2704,data1
125252305611212,randread.pl,17922,R,8192,40951808,1033,data1
126252305614609,randread.pl,17922,R,8192,318922752,2687,data1
127252305623800,randread.pl,17922,R,8192,246734848,2983,data1
128252305711125,randread.pl,17922,R,8192,581795840,1091,data1
129252305728694,randread.pl,17922,R,8192,710483968,1034,data1
130252305762046,randread.pl,17922,R,8192,329367552,1405,data1
131252305798215,randread.pl,17922,R,8192,44482560,1030,data1
132252305806748,randread.pl,17922,R,8192,660602880,1069,data1
133252305826360,randread.pl,17922,R,8192,616144896,2327,data1
134[...]
135
136
137USAGE message:
138
139# ./zfsslower -h
140usage: zfsslower [-h] [-j] [-p PID] [min_ms]
141
142Trace common ZFS file operations slower than a threshold
143
144positional arguments:
145  min_ms             minimum I/O duration to trace, in ms (default 10)
146
147optional arguments:
148  -h, --help         show this help message and exit
149  -j, --csv          just print fields: comma-separated values
150  -p PID, --pid PID  trace this PID only
151
152examples:
153    ./zfsslower             # trace operations slower than 10 ms (default)
154    ./zfsslower 1           # trace operations slower than 1 ms
155    ./zfsslower -j 1        # ... 1 ms, parsable output (csv)
156    ./zfsslower 0           # trace all operations (warning: verbose)
157    ./zfsslower -p 185      # trace PID 185 only
158