# CPU Scheduling events

On Android and Linux, Perfetto can gather scheduler traces via the Linux kernel
[ftrace](https://www.kernel.org/doc/Documentation/trace/ftrace.txt)
infrastructure.

This makes it possible to capture fine-grained scheduling information, such as:

* Which threads were scheduled on which CPU cores at any point in time, with
  nanosecond accuracy.
* The reason why a running thread got descheduled (e.g. pre-emption, blocked on
  a mutex, blocking syscall or any other wait queue).
* The point in time when a thread became eligible to be executed, even if it was
  not put immediately on any CPU run queue, together with the source thread that
  made it executable.

## UI

When zoomed out, the UI shows a quantized view of CPU usage, which collapses the
scheduling information:

![](/docs/images/cpu-bar-graphs.png "Quantized view of CPU run queues")

However, by zooming in, the individual scheduling events become visible:

![](/docs/images/cpu-zoomed.png "Detailed view of CPU run queues")

Clicking on a CPU slice shows the relevant information in the details panel:

![](/docs/images/cpu-sched-details.png "CPU scheduling details")

Scrolling down and expanding individual processes reveals one track per thread,
also built from the scheduling events, which makes it possible to follow how
the state of each thread evolves over time:

![](/docs/images/thread-states.png "States of individual threads")

These tracks are built from ftrace scheduling events, which can be enabled with
a config such as:

```protobuf
data_sources {
  config {
    name: "linux.ftrace"
    ftrace_config {
      ftrace_events: "sched/sched_switch"
      ftrace_events: "sched/sched_waking"
    }
  }
}
```

## SQL

At the SQL level, the scheduling data is exposed in the
[`sched_slice`](/docs/analysis/sql-tables.autogen#sched_slice) table.

```sql
select ts, dur, cpu, end_state, priority, process.name, thread.name
from sched_slice left join thread using(utid) left join process using(upid)
```

ts | dur | cpu | end_state | priority | process.name | thread.name
---|-----|-----|-----------|----------|--------------|------------
261187012170995 | 247188 | 2 | S | 130 | /system/bin/logd | logd.klogd
261187012418183 | 12812 | 2 | D | 120 | /system/bin/traced_probes | traced_probes0
261187012421099 | 220000 | 4 | D | 120 | kthreadd | kworker/u16:2
261187012430995 | 72396 | 2 | D | 120 | /system/bin/traced_probes | traced_probes1
261187012454537 | 13958 | 0 | D | 120 | /system/bin/traced_probes | traced_probes0
261187012460318 | 46354 | 3 | S | 120 | /system/bin/traced_probes | traced_probes2
261187012468495 | 10625 | 0 | R | 120 | [NULL] | swapper/0
261187012479120 | 6459 | 0 | D | 120 | /system/bin/traced_probes | traced_probes0
261187012485579 | 7760 | 0 | R | 120 | [NULL] | swapper/0
261187012493339 | 34896 | 0 | D | 120 | /system/bin/traced_probes | traced_probes0
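A common follow-up to the query above is to sum `dur` per thread to see where CPU time went. The sketch below is illustrative only: it uses Python's built-in `sqlite3` module with an in-memory table that mimics a few `sched_slice` columns already joined with thread names, filled with the sample rows shown above. The table name `sched_sample` and the helper `cpu_time_per_thread` are hypothetical and not part of Perfetto; on a real trace the same `GROUP BY` would be applied to `sched_slice` joined with `thread`.

```python
import sqlite3

# Sample rows copied from the table above: (ts, dur, cpu, end_state, thread_name).
ROWS = [
    (261187012170995, 247188, 2, "S", "logd.klogd"),
    (261187012418183, 12812, 2, "D", "traced_probes0"),
    (261187012421099, 220000, 4, "D", "kworker/u16:2"),
    (261187012430995, 72396, 2, "D", "traced_probes1"),
    (261187012454537, 13958, 0, "D", "traced_probes0"),
    (261187012460318, 46354, 3, "S", "traced_probes2"),
    (261187012468495, 10625, 0, "R", "swapper/0"),
    (261187012479120, 6459, 0, "D", "traced_probes0"),
    (261187012485579, 7760, 0, "R", "swapper/0"),
    (261187012493339, 34896, 0, "D", "traced_probes0"),
]


def cpu_time_per_thread(rows):
    """Return (thread_name, total_dur, slice_count), busiest thread first."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for sched_slice joined with thread names.
    conn.execute(
        "CREATE TABLE sched_sample ("
        "ts INTEGER, dur INTEGER, cpu INTEGER, end_state TEXT, thread_name TEXT)"
    )
    conn.executemany("INSERT INTO sched_sample VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT thread_name, SUM(dur) AS cpu_time, COUNT(*) AS slices
        FROM sched_sample
        GROUP BY thread_name
        ORDER BY cpu_time DESC, thread_name
        """
    ).fetchall()
    conn.close()
    return result


for name, cpu_time, slices in cpu_time_per_thread(ROWS):
    print(f"{name:16} {cpu_time:>8} {slices}")
```

With the sample rows, `logd.klogd` comes first with a total `dur` of 247188 in one slice, while `traced_probes0` accumulates 68125 across four slices.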

## TraceConfig

```protobuf
data_sources: {
    config {
        name: "linux.ftrace"
        ftrace_config {
            ftrace_events: "sched/sched_switch"
            ftrace_events: "sched/sched_process_exit"
            ftrace_events: "sched/sched_process_free"
            ftrace_events: "task/task_newtask"
            ftrace_events: "task/task_rename"
        }
    }
}

# This is to get full process name and thread<>process relationships.
data_sources: {
    config {
        name: "linux.process_stats"
    }
}
```

## Scheduling wakeups and latency analysis

By further enabling the following in the TraceConfig, the ftrace data source
will also record scheduling wakeup events:

```protobuf
  ftrace_events: "sched/sched_wakeup_new"
  ftrace_events: "sched/sched_waking"
```

While `sched_switch` events are emitted only when a thread is in the
`R(unnable)` state AND is running on a CPU run queue, `sched_waking` events are
emitted when any event causes a sleeping thread to become runnable.

Consider the following example:

```
Thread A
condition_variable.wait()
                                     Thread B
                                     condition_variable.notify()
```

When Thread A suspends on the `wait()`, it enters the `S(sleeping)` state and
is removed from the CPU run queue. When Thread B notifies the variable, the
kernel transitions Thread A into the `R(unnable)` state. At that point Thread A
is eligible to be put back on a run queue. However, this might not happen
for some time because, for instance:

* All CPUs might be busy running some other thread, and Thread A needs to wait
  to get a run queue slot assigned (or the other threads have higher priority).
* Some CPUs other than the current one might be idle, but the scheduler load
  balancer might take some time to move the thread to another CPU.

Unless using real-time thread priorities, most Linux kernel scheduler
configurations are not strictly work-conserving. For instance, the scheduler
might prefer to wait some time in the hope that the thread running on the
current CPU goes idle, avoiding a cross-CPU migration which might be more
costly both in terms of overhead and power.

NOTE: `sched_waking` and `sched_wakeup` provide nearly the same information. The
      difference lies in wakeup events across CPUs, which involve
      inter-processor interrupts. The former is emitted on the source (waker)
      CPU, the latter on the destination (wakee) CPU. `sched_waking` is usually
      sufficient for latency analysis, unless you are looking into breaking down
      latency due to inter-processor signaling.

When enabling `sched_waking` events, the following will appear in the UI when
selecting a CPU slice:

![](/docs/images/latency.png "Scheduling wake-up events in the UI")
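The wakeup latency discussed in this section is the gap between the moment a thread is woken and the start of its next scheduling slice. The following is a minimal sketch of that computation, assuming the waking events and slice start times have already been extracted into plain lists. The tuple layouts, the `wakeup_latencies` helper and the sample timestamps are hypothetical and do not correspond to a Perfetto API.

```python
from bisect import bisect_left
from collections import defaultdict


def wakeup_latencies(wakings, slices):
    """Pair each waking event with the woken thread's next slice start.

    wakings: iterable of (ts, utid), one per sched_waking event.
    slices:  iterable of (ts, utid), the start of each scheduling slice.
    Returns (utid, waking_ts, latency) tuples ordered by waking timestamp.
    Wakings with no later slice for the same thread are skipped.
    """
    # Index the slice start times per thread, sorted for binary search.
    starts = defaultdict(list)
    for ts, utid in slices:
        starts[utid].append(ts)
    for ts_list in starts.values():
        ts_list.sort()

    latencies = []
    for ts, utid in sorted(wakings):
        thread_starts = starts.get(utid, [])
        # First slice of this thread that starts at or after the waking.
        i = bisect_left(thread_starts, ts)
        if i < len(thread_starts):
            latencies.append((utid, ts, thread_starts[i] - ts))
    return latencies


# Hypothetical events: thread 7 is woken twice, thread 8 once.
wakings = [(1000, 7), (1500, 8), (5000, 7)]
slices = [(900, 7), (1200, 7), (1500, 8), (5050, 7)]
print(wakeup_latencies(wakings, slices))
# [(7, 1000, 200), (8, 1500, 0), (7, 5000, 50)]
```

Note that this simple pairing attributes several wakings of the same thread to the same slice if they all occur before the thread next runs; a more careful analysis would keep only the first waking per slice.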