1Demonstrations of nfsdist, the Linux eBPF/bcc version.
2
3nfsdist traces NFS reads, writes, opens, and getattr, and summarizes their
4latency as a power-of-2 histogram. For example:
5
6
7./nfsdist.py
8
9Tracing NFS operation latency... Hit Ctrl-C to end.
10
11operation = read
12     usecs               : count     distribution
13         0 -> 1          : 4        |                                        |
14         2 -> 3          : 0        |                                        |
15         4 -> 7          : 0        |                                        |
16         8 -> 15         : 7107     |**************                          |
17        16 -> 31         : 19864    |****************************************|
18        32 -> 63         : 1494     |***                                     |
19        64 -> 127        : 491      |                                        |
20       128 -> 255        : 1810     |***                                     |
21       256 -> 511        : 6356     |************                            |
22       512 -> 1023       : 4860     |*********                               |
23      1024 -> 2047       : 3070     |******                                  |
24      2048 -> 4095       : 1853     |***                                     |
25      4096 -> 8191       : 921      |*                                       |
26      8192 -> 16383      : 122      |                                        |
27     16384 -> 32767      : 15       |                                        |
28     32768 -> 65535      : 5        |                                        |
29     65536 -> 131071     : 2        |                                        |
30    131072 -> 262143     : 1        |                                        |
31
32operation = write
33     usecs               : count     distribution
34         0 -> 1          : 0        |                                        |
35         2 -> 3          : 0        |                                        |
36         4 -> 7          : 0        |                                        |
37         8 -> 15         : 1        |                                        |
38        16 -> 31         : 0        |                                        |
39        32 -> 63         : 9        |                                        |
40        64 -> 127        : 19491    |****************************************|
41       128 -> 255        : 3064     |******                                  |
42       256 -> 511        : 940      |*                                       |
43       512 -> 1023       : 365      |                                        |
44      1024 -> 2047       : 312      |                                        |
45      2048 -> 4095       : 119      |                                        |
46      4096 -> 8191       : 31       |                                        |
47      8192 -> 16383      : 84       |                                        |
48     16384 -> 32767      : 31       |                                        |
49     32768 -> 65535      : 5        |                                        |
50     65536 -> 131071     : 3        |                                        |
51    131072 -> 262143     : 0        |                                        |
52    262144 -> 524287     : 1        |                                        |
53
54operation = getattr
55     usecs               : count     distribution
56         0 -> 1          : 27       |****************************************|
57         2 -> 3          : 2        |**                                      |
58         4 -> 7          : 3        |****                                    |
59         8 -> 15         : 0        |                                        |
60        16 -> 31         : 0        |                                        |
61        32 -> 63         : 0        |                                        |
62        64 -> 127        : 0        |                                        |
63       128 -> 255        : 0        |                                        |
64       256 -> 511        : 2        |**                                      |
65       512 -> 1023       : 2        |**                                      |
66
67operation = open
68     usecs               : count     distribution
69         0 -> 1          : 0        |                                        |
70         2 -> 3          : 0        |                                        |
71         4 -> 7          : 0        |                                        |
72         8 -> 15         : 0        |                                        |
73        16 -> 31         : 0        |                                        |
74        32 -> 63         : 0        |                                        |
75        64 -> 127        : 0        |                                        |
76       128 -> 255        : 0        |                                        |
77       256 -> 511        : 2        |****************************************|
78
79
80In this example you can see that the read traffic is rather bi-modal, with about
8126K reads falling within 8 - 30 usecs and about 18K reads spread between 128 -
828191 usecs. Write traffic is largely clustered in the 64 - 127 usecs bracket.
83The faster read traffic is probably coming from a filesystem cache and the slower
84traffic from disk. The reason why the writes are so consistently fast is because
85this example test was run on a couple of VM's and I believe the hypervisor was
86caching all the write traffic to memory.
87
88This "latency" is measured from when the operation was issued from the VFS
89interface to the file system, to when it completed. This spans everything:
90RPC latency, network latency, file system CPU cycles, file system locks, run
91queue latency, etc. This is a better measure of the latency suffered by
92applications reading from a NFS share and can better expose problems
93experienced by NFS clients.
94
95Note that this only traces the common NFS operations (read, write, open and
96getattr). I chose to include getattr as a significant percentage of NFS
97traffic end up being getattr calls and are a good indicator of problems
98with an NFS server.
99
100An optional interval and a count can be provided, as well as -m to show the
101distributions in milliseconds. For example:
102
103./nfsdist -m 1 5
104Tracing NFS operation latency... Hit Ctrl-C to end.
105
10611:02:39:
107
108operation = write
109     msecs               : count     distribution
110         0 -> 1          : 1        |                                        |
111         2 -> 3          : 24       |********                                |
112         4 -> 7          : 114      |****************************************|
113         8 -> 15         : 9        |***                                     |
114        16 -> 31         : 1        |                                        |
115        32 -> 63         : 1        |                                        |
116
11711:02:40:
118
119operation = write
120     msecs               : count     distribution
121         0 -> 1          : 0        |                                        |
122         2 -> 3          : 11       |***                                     |
123         4 -> 7          : 111      |****************************************|
124         8 -> 15         : 13       |****                                    |
125        16 -> 31         : 1        |                                        |
126
12711:02:41:
128
129operation = write
130     msecs               : count     distribution
131         0 -> 1          : 0        |                                        |
132         2 -> 3          : 21       |******                                  |
133         4 -> 7          : 137      |****************************************|
134         8 -> 15         : 3        |                                        |
135
136This shows a write workload, with writes hovering primarily in the 4-7ms range.
137
138USAGE message:
139
140
141./nfsdist -h
142usage: nfsdist.py [-h] [-T] [-m] [-p PID] [interval] [count]
143
144Summarize NFS operation latency
145
146positional arguments:
147  interval            output interval, in seconds
148  count               number of outputs
149
150optional arguments:
151  -h, --help          show this help message and exit
152  -T, --notimestamp   don't include timestamp on interval output
153  -m, --milliseconds  output in milliseconds
154  -p PID, --pid PID   trace this PID only
155
156examples:
157    ./nfsdist            # show operation latency as a histogram
158    ./nfsdist -p 181     # trace PID 181 only
159    ./nfsdist 1 10       # print 1 second summaries, 10 times
160    ./nfsdist -m 5       # 5s summaries, milliseconds
161