1Demonstrations of nfsslower, the Linux eBPF/bcc version.
2
3nfsslower show NFS reads, writes, opens and getattrs, slower than a
4threshold. For example:
5
6./nfsslower.py
7Tracing NFS operations that are slower than 10 ms
8TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
911:25:16 dd             21295  W 1048576 15360      14.84 1.test
1011:25:16 dd             21295  W 1048576 16384      12.73 1.test
1111:25:16 dd             21295  W 1048576 17408      24.27 1.test
1211:25:16 dd             21295  W 1048576 18432      22.93 1.test
1311:25:16 dd             21295  W 1048576 19456      14.65 1.test
1411:25:16 dd             21295  W 1048576 20480      12.58 1.test
1511:25:16 dd             21297  W 1048576 6144       10.50 1.test.w
1611:25:16 dd             21297  W 1048576 7168       16.65 1.test.w
1711:25:16 dd             21297  W 1048576 8192       13.01 1.test.w
1811:25:16 dd             21297  W 1048576 9216       14.06 1.test.w
19
20This shows NFS writes from dd each 1MB in size to 2 different files. The
21writes all had latency higher than 10ms.
22
23This "latency" is measured from when the operation was issued from the VFS
24interface to the file system, to when it completed. This spans everything:
25RPC latency, network latency, file system CPU cycles, file system locks, run
26queue latency, etc. This is a better measure of the latency suffered by
27applications reading from a NFS share and can better expose problems
28experienced by NFS clients.
29
30Note that this only traces the common NFS operations (read,write,open and
31getattr). I chose to include getattr as a significant percentage of NFS
32traffic end up being getattr calls and are a good indicator of problems
33with an NFS server.
34
35The threshold can be provided as an argument. E.g. I/O slower than 1 ms:
36
37./nfsslower.py 1
38Tracing NFS operations that are slower than 1 ms
39TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
4011:40:16 cp             21583  R 131072  0           4.35 1.test
4111:40:16 cp             21583  R 131072  256         1.87 1.test
4211:40:16 cp             21583  R 131072  384         2.99 1.test
4311:40:16 cp             21583  R 131072  512         4.19 1.test
4411:40:16 cp             21583  R 131072  640         4.25 1.test
4511:40:16 cp             21583  R 131072  768         4.65 1.test
4611:40:16 cp             21583  R 131072  1280        1.08 1.test
4711:40:16 cp             21583  R 131072  1408        3.29 1.test
4811:40:16 cp             21583  R 131072  1792        3.12 1.test
4911:40:16 cp             21583  R 131072  3712        3.55 1.test
5011:40:16 cp             21583  R 131072  3840        1.12 1.test
5111:40:16 cp             21583  R 131072  4096        3.23 1.test
5211:40:16 cp             21583  R 131072  4224        2.73 1.test
5311:40:16 cp             21583  R 131072  4352        2.73 1.test
5411:40:16 cp             21583  R 131072  4480        6.09 1.test
5511:40:16 cp             21583  R 131072  5120        4.40 1.test
56[...]
57
58This shows all NFS_READS that were more than 1ms. Depending on your
59latency to your fileserver, you might need to tweak this value to
60remove
61
62A threshold of 0 will trace all operations. Warning: the output will be
63verbose, as it will include all file system cache hits.
64
65./nfsslower.py 0
66Tracing NFS operations
6711:56:50 dd             21852  W 1048576 0           0.42 1.test
6811:56:50 dd             21852  W 1048576 1024        0.46 1.test
6911:56:50 dd             21852  W 1048576 2048        0.36 1.test
7011:56:50 cp             21854  G 0       0           0.35 1.test
7111:56:50 cp             21854  O 0       0           0.33 1.test
7211:56:50 cp             21854  G 0       0           0.00 1.test
7311:56:50 cp             21854  R 131072  0           0.07 1.test
7411:56:50 cp             21854  R 131072  128         0.02 1.test
7511:56:50 cp             21854  R 131072  256         0.02 1.test
7611:56:50 cp             21854  R 131072  384         0.02 1.test
7711:56:50 cp             21854  R 131072  512         0.02 1.test
7811:56:50 cp             21854  R 131072  640         0.02 1.test
7911:56:50 cp             21854  R 131072  768         0.02 1.test
8011:56:50 cp             21854  R 131072  896         0.02 1.test
8111:56:50 cp             21854  R 131072  1024        0.02 1.test
8211:56:50 cp             21854  R 131072  1152        0.02 1.test
8311:56:50 cp             21854  R 131072  1280        0.02 1.test
8411:56:50 cp             21854  R 131072  1408        0.02 1.test
8511:56:50 cp             21854  R 131072  1536        0.02 1.test
8611:56:50 cp             21854  R 131072  1664        0.02 1.test
8711:56:50 cp             21854  R 131072  1792        0.02 1.test
8811:56:50 cp             21854  R 131072  1920        0.02 1.test
8911:56:50 cp             21854  R 131072  2048        0.02 1.test
9011:56:50 cp             21854  R 131072  2176        0.04 1.test
9111:56:50 cp             21854  R 131072  2304        0.02 1.test
9211:56:50 cp             21854  R 131072  2432        0.03 1.test
9311:56:50 cp             21854  R 131072  2560        0.03 1.test
9411:56:50 cp             21854  R 131072  2688        0.02 1.test
9511:56:50 cp             21854  R 131072  2816        0.03 1.test
9611:56:50 cp             21854  R 131072  2944        0.02 1.test
9711:56:50 cp             21854  R 0       3072        0.00 1.test
9811:56:50 ls             21855  G 0       0           0.00 1.test
9911:56:50 ls             21856  G 0       0           0.36 music
10011:56:50 ls             21856  G 0       0           0.00 music
10111:56:50 ls             21856  G 0       0           0.00 test
10211:56:50 ls             21856  G 0       0           0.00 ff
10311:56:50 ls             21856  G 0       0           0.00 34.log
10411:56:50 ls             21856  G 0       0           0.00 vmlinuz-linux
10511:56:50 ls             21856  G 0       0           0.00 2.test
10611:56:50 ls             21856  G 0       0           0.00 rt.log
10711:56:50 ls             21856  G 0       0           0.00 1.lod
10811:56:50 ls             21856  G 0       0           0.00 COPYRIGHT.txt
10911:56:50 ls             21856  G 0       0           0.00 gg
11011:56:50 ls             21856  G 0       0           0.00 qw.log
11111:56:50 ls             21856  G 0       0           0.00 README.md
11211:56:50 ls             21856  G 0       0           0.00 1.log
113
114The output now includes open operations ("O"), and reads ("R") wand getattrs ("G").
115A cp operation
116
117
118A -j option will print just the fields (parsable output, csv):
119
120./nfsslower.py -j 0
121ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE
12287054476520,dd,22754,W,1048576,0,425,1.test
12387054482916,dd,22754,W,1048576,1048576,320,1.test
12487054488179,dd,22754,W,1048576,2097152,389,1.test
12587054511340,cp,22756,G,0,0,371,1.test
12687054511685,cp,22756,O,0,0,306,1.test
12787054511700,cp,22756,G,0,0,2,1.test
12887054512325,cp,22756,R,131072,0,56,1.test
12987054512432,cp,22756,R,131072,131072,22,1.test
13087054512520,cp,22756,R,131072,262144,32,1.test
13187054512600,cp,22756,R,131072,393216,21,1.test
13287054512678,cp,22756,R,131072,524288,21,1.test
13387054512803,cp,22756,R,131072,655360,56,1.test
134
135This may be useful for visualizing with another tool, for example, for
136producing a scatter plot of ENDTIME vs LATENCY, to look for time-based
137patterns.
138
139USAGE message:
140
141usage: nfsslower.py [-h] [-j] [-p PID] [min_ms]
142
143Trace READ, WRITE, OPEN and GETATTR NFS calls slower than a threshold,supports NFSv{3,4}
144
145positional arguments:
146  min_ms             Minimum IO duration to trace in ms (default=10ms)
147
148optional arguments:
149  -h, --help         show this help message and exit
150  -j, --csv          just print fields: comma-separated values
151  -p PID, --pid PID  Trace this pid only
152
153 ./nfsslower         # trace operations slower than 10ms
154 ./nfsslower 1       # trace operations slower than 1ms
155 ./nfsslower -j 1    # ... 1 ms, parsable output (csv)
156 ./nfsslower 0       # trace all nfs operations
157 ./nfsslower -p 121  # trace pid 121 only
158
159