1Demonstrations of runqlen, the Linux eBPF/bcc version.
2
3
4This program summarizes scheduler queue length as a histogram, and can also
5show run queue occupancy. It works by sampling the run queue length on all
6CPUs at 99 Hertz.
7
8As an example, here is an idle system:
9
10# ./runqlen.py
11Sampling run queue length... Hit Ctrl-C to end.
12^C
13     runqlen       : count     distribution
14        0          : 1776     |****************************************|
15
16This shows a zero run queue length each time it was sampled.
17
18And now a heavily loaded system:
19
20# ./runqlen.py
21Sampling run queue length... Hit Ctrl-C to end.
22^C
23     runqlen       : count     distribution
24        0          : 1068     |****************************************|
25        1          : 642      |************************                |
26        2          : 369      |*************                           |
27        3          : 183      |******                                  |
28        4          : 104      |***                                     |
29        5          : 42       |*                                       |
30        6          : 13       |                                        |
31        7          : 2        |                                        |
32        8          : 1        |                                        |
33
34Now there is often threads queued, with one sample reaching a queue length
35of 8. This will cause run queue latency, which can be measured by the bcc
36runqlat tool.
37
38
39Here's an example of an issue that runqlen can indentify. Starting with the
40system-wide summary:
41
42# ./runqlen.py
43Sampling run queue length... Hit Ctrl-C to end.
44^C
45     runqlen       : count     distribution
46        0          : 1209     |****************************************|
47        1          : 372      |************                            |
48        2          : 73       |**                                      |
49        3          : 3        |                                        |
50        4          : 1        |                                        |
51        5          : 0        |                                        |
52        6          : 0        |                                        |
53        7          : 237      |*******                                 |
54
55This shows there is often a run queue length of 7. Now using the -C option to
56see per-CPU histograms:
57
58# ./runqlen.py -C
59Sampling run queue length... Hit Ctrl-C to end.
60^C
61
62cpu = 0
63     runqlen       : count     distribution
64        0          : 257      |****************************************|
65        1          : 64       |*********                               |
66        2          : 5        |                                        |
67        3          : 0        |                                        |
68        4          : 0        |                                        |
69        5          : 0        |                                        |
70        6          : 1        |                                        |
71
72cpu = 1
73     runqlen       : count     distribution
74        0          : 226      |****************************************|
75        1          : 90       |***************                         |
76        2          : 11       |*                                       |
77
78cpu = 2
79     runqlen       : count     distribution
80        0          : 264      |****************************************|
81        1          : 52       |*******                                 |
82        2          : 8        |*                                       |
83        3          : 1        |                                        |
84        4          : 0        |                                        |
85        5          : 0        |                                        |
86        6          : 1        |                                        |
87        7          : 0        |                                        |
88        8          : 1        |                                        |
89
90cpu = 3
91     runqlen       : count     distribution
92        0          : 0        |                                        |
93        1          : 0        |                                        |
94        2          : 0        |                                        |
95        3          : 0        |                                        |
96        4          : 0        |                                        |
97        5          : 0        |                                        |
98        6          : 0        |                                        |
99        7          : 327      |****************************************|
100
101cpu = 4
102     runqlen       : count     distribution
103        0          : 255      |****************************************|
104        1          : 63       |*********                               |
105        2          : 9        |*                                       |
106
107cpu = 5
108     runqlen       : count     distribution
109        0          : 244      |****************************************|
110        1          : 78       |************                            |
111        2          : 3        |                                        |
112        3          : 2        |                                        |
113
114cpu = 6
115     runqlen       : count     distribution
116        0          : 253      |****************************************|
117        1          : 66       |**********                              |
118        2          : 6        |                                        |
119        3          : 1        |                                        |
120        4          : 1        |                                        |
121
122cpu = 7
123     runqlen       : count     distribution
124        0          : 243      |****************************************|
125        1          : 74       |************                            |
126        2          : 6        |                                        |
127        3          : 1        |                                        |
128        4          : 0        |                                        |
129        5          : 1        |                                        |
130        6          : 2        |                                        |
131
132The run queue length of 7 is isolated to CPU 3. It was caused by CPU binding
133(taskset). This can sometimes happen by applications that try to auto-bind
134to CPUs, leaving other CPUs idle while work is queued.
135
136
137runqlat accepts an interval and a count. For example, with -T for timestamps:
138
139# ./runqlen.py -T 1 5
140Sampling run queue length... Hit Ctrl-C to end.
141
14219:51:34
143     runqlen       : count     distribution
144        0          : 635      |****************************************|
145        1          : 142      |********                                |
146        2          : 13       |                                        |
147        3          : 0        |                                        |
148        4          : 1        |                                        |
149
15019:51:35
151     runqlen       : count     distribution
152        0          : 640      |****************************************|
153        1          : 136      |********                                |
154        2          : 13       |                                        |
155        3          : 1        |                                        |
156        4          : 0        |                                        |
157        5          : 0        |                                        |
158        6          : 0        |                                        |
159        7          : 0        |                                        |
160        8          : 0        |                                        |
161        9          : 0        |                                        |
162        10         : 1        |                                        |
163
16419:51:36
165     runqlen       : count     distribution
166        0          : 603      |****************************************|
167        1          : 170      |***********                             |
168        2          : 16       |*                                       |
169        3          : 1        |                                        |
170        4          : 0        |                                        |
171        5          : 0        |                                        |
172        6          : 0        |                                        |
173        7          : 0        |                                        |
174        8          : 0        |                                        |
175        9          : 1        |                                        |
176
17719:51:37
178     runqlen       : count     distribution
179        0          : 617      |****************************************|
180        1          : 154      |*********                               |
181        2          : 20       |*                                       |
182        3          : 0        |                                        |
183        4          : 0        |                                        |
184        5          : 0        |                                        |
185        6          : 0        |                                        |
186        7          : 0        |                                        |
187        8          : 0        |                                        |
188        9          : 0        |                                        |
189        10         : 0        |                                        |
190        11         : 1        |                                        |
191
19219:51:38
193     runqlen       : count     distribution
194        0          : 603      |****************************************|
195        1          : 161      |**********                              |
196        2          : 24       |*                                       |
197        3          : 4        |                                        |
198
199The spikes in run queue length of 11 are likely threads waking up at the same
200time (a thundering herd), and then are scheduled and complete their execution
201quickly.
202
203
204The -O option prints run queue occupancy: the percentage of time that there
205was work queued waiting its turn. Eg:
206
207# ./runqlen.py -OT 1
208Sampling run queue length... Hit Ctrl-C to end.
209
21019:54:53
211runqocc: 41.09%
212
21319:54:54
214runqocc: 41.85%
215
21619:54:55
217runqocc: 41.47%
218
21919:54:56
220runqocc: 42.35%
221
22219:54:57
223runqocc: 40.83%
224[...]
225
226This can also be examined per-CPU:
227
228# ./runqlen.py -COT 1
229Sampling run queue length... Hit Ctrl-C to end.
230
23119:55:03
232runqocc, CPU 0    32.32%
233runqocc, CPU 1    26.26%
234runqocc, CPU 2    38.38%
235runqocc, CPU 3   100.00%
236runqocc, CPU 4    26.26%
237runqocc, CPU 5    32.32%
238runqocc, CPU 6    39.39%
239runqocc, CPU 7    46.46%
240
24119:55:04
242runqocc, CPU 0    35.00%
243runqocc, CPU 1    32.32%
244runqocc, CPU 2    37.00%
245runqocc, CPU 3   100.00%
246runqocc, CPU 4    43.43%
247runqocc, CPU 5    31.31%
248runqocc, CPU 6    28.00%
249runqocc, CPU 7    31.31%
250
25119:55:05
252runqocc, CPU 0    43.43%
253runqocc, CPU 1    32.32%
254runqocc, CPU 2    45.45%
255runqocc, CPU 3   100.00%
256runqocc, CPU 4    29.29%
257runqocc, CPU 5    36.36%
258runqocc, CPU 6    36.36%
259runqocc, CPU 7    30.30%
260
26119:55:06
262runqocc, CPU 0    40.00%
263runqocc, CPU 1    38.00%
264runqocc, CPU 2    31.31%
265runqocc, CPU 3   100.00%
266runqocc, CPU 4    31.31%
267runqocc, CPU 5    28.28%
268runqocc, CPU 6    31.00%
269runqocc, CPU 7    29.29%
270[...]
271
272
273USAGE message:
274
275# ./runqlen -h
276usage: runqlen [-h] [-T] [-O] [-C] [interval] [count]
277
278Summarize scheduler run queue length as a histogram
279
280positional arguments:
281  interval         output interval, in seconds
282  count            number of outputs
283
284optional arguments:
285  -h, --help       show this help message and exit
286  -T, --timestamp  include timestamp on output
287  -O, --runqocc    report run queue occupancy
288  -C, --cpus       print output for each CPU separately
289
290examples:
291    ./runqlen            # summarize run queue length as a histogram
292    ./runqlen 1 10       # print 1 second summaries, 10 times
293    ./runqlen -T 1       # 1s summaries and timestamps
294    ./runqlen -O         # report run queue occupancy
295    ./runqlen -C         # show each CPU separately
296