# Trace Processor

![Trace Processor Shell](https://storage.googleapis.com/perfetto/markdown_img/trace-processor-shell.png)

The Trace Processor is a C++ library
([/src/trace_processor](/src/trace_processor)) that is able to ingest traces of
various formats and expose them in a massaged, higher-level form that can be
queried through SQL. The trace processor is used:
* By the [Perfetto UI](https://ui.perfetto.dev/), in the form of a
  WebAssembly module.
* Standalone, using the `trace_processor_shell` target
  (`ninja -C out/xxx trace_processor_shell`).
* In internal google3 pipelines for batch processing.

Supported input formats:
 * Perfetto .proto traces
 * [Partial support] Chrome .json trace events
 * [NOT IMPLEMENTED YET] ftrace format as per `/sys/kernel/debug/tracing/trace`.

![Trace Processor](https://storage.googleapis.com/perfetto/markdown_img/trace-processor-small.png)

Rationale
---------
Traces are raw because they are optimized for fast & low-overhead writing.
Even though they are protos, their contents are not ideal for third parties to
consume as-is: some data massaging is required.
Examples:
* Ftrace sched_switch events only provide thread names and thread IDs.
  In order to attribute execution times to a package/process, that data needs
  to be joined with the process_tree events, which map TIDs to their parent PID
  and process name (see the example query below).
* Even after this join, sched_switch events produce two slices (one at the
  beginning, one at the end) per sched event. What most consumers want to see
  instead is one "interval" per thread execution time-slice.
* Similarly, ftrace ext4 events provide only inode numbers, and those need to be
  joined with inode->path events.
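
To make the first point concrete, here is a minimal sketch of the kind of join
the trace processor makes possible once a trace has been ingested. It uses the
`sched`, `thread` and `process` tables documented in the schema below; column
names follow that schema.

``` sql
-- Attribute per-thread scheduling time to the owning process.
-- Sketch only: relies on the sched/thread/process schema described below.
select process.name as process_name, sum(sched.dur) / 1e9 as cpu_sec
from sched
join thread using(utid)
join process using(upid)
group by process.name
order by cpu_sec desc;
```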


Schema
------

### sched table
The sched table holds data about the scheduling slices in the trace.

`ts`
Timestamp of the scheduling event, in nanoseconds. This comes from
CLOCK_BOOTTIME, when available.

`dur`
Duration of the scheduling event, in nanoseconds.

`utid`
ID of the thread. This is NOT the UNIX pid/tid (see below).
This can be joined with the thread table.

`cpu`
The CPU number where the scheduling event happened.
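
For example, since `utid` joins against the thread table, the longest
individual scheduling slices can be listed together with their thread names.
A sketch based on the columns above:

``` sql
-- Longest individual scheduling slices, with the thread they ran on.
select thread.name as thread_name, sched.cpu, sched.ts, sched.dur
from sched
join thread using(utid)
order by sched.dur desc
limit 20;
```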


### counters table
The counters table contains the data about counter events (both kernel
and userspace) in the trace. This includes sources like memory, battery,
cpufreq events etc.

`id`
A unique identifier for the counter row.

`ts`
The timestamp of the counter event.

`name`
The name of the counter event.

`value`
The value of the counter event.

`ref`
The identifier of the `ref`erence metadata associated with the counter event.
See `ref_type` for what data this can contain.

`ref_type`
The type of reference metadata associated with the counter event. Will be one
of the following values: `utid` (the ref is an identifier for the thread table),
`upid` (same for the process table), `cpu` (the cpu the event occurred on),
`irq` and `softirq`.

`arg_set_id`
The identifier into the args table (see below).
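
As an illustration of `ref`/`ref_type`, per-thread counters can be joined back
to the thread table. A sketch, assuming that when `ref_type` is `utid` the
`ref` column holds a value joinable against `thread.utid`:

``` sql
-- Per-thread counter samples, joined back to the thread name.
-- Assumes ref holds a utid whenever ref_type = 'utid'.
select thread.name as thread_name, counters.name as counter_name,
       counters.ts, counters.value
from counters
join thread on counters.ref = thread.utid
where counters.ref_type = 'utid'
limit 100;
```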


### instants table
The instants table contains the data about instant events (both kernel
and userspace) in the trace. This includes sources like the lmk, sched_wakeup
events etc.

`id`
A unique identifier for the instant row.

`ts`
The timestamp of the instant event.

`name`
The name of the instant event.

`value`
The value of the instant event.

`ref`
The identifier of the `ref`erence metadata associated with the instant event.
See `ref_type` for what data this can contain.

`ref_type`
The type of reference metadata associated with the instant event. Will be one
of the following values: `utid` (the ref is an identifier for the thread table),
`upid` (same for the process table), `cpu` (the cpu the event occurred on),
`irq` and `softirq`.

`arg_set_id`
The identifier into the args table (see below).
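
A quick way to see which instant events a trace contains is to count them by
name; a minimal sketch, assuming the table is exposed as `instants` as named
above:

``` sql
-- Count instant events by name to get an overview of the trace contents.
select name, count(*) as event_count
from instants
group by name
order by event_count desc;
```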


### slices table
The slices table holds data about the userspace slices (from Chrome or Android)
seen in the trace. These slices can be nested within each other, forming
'stacks' of slices.

`ts`
The timestamp of the userspace slice, in nanoseconds.

`dur`
Duration of the userspace slice, in nanoseconds.

`utid`
ID of the thread. This is NOT the UNIX pid/tid (see below).
This can be joined with the thread table.

`cat`
The category of the slice. Only non-null and meaningful for Chrome traces.

`name`
The name of the slice.

`depth`
The nesting depth of the slice within the stack. Starts at 0 for root slices
and counts upward for child slices.

`stack_id`
A unique identifier for the whole stack of slices up to the current point. This
identifier is useful when deriving metrics on unique stacks of slices.

`parent_stack_id`
The `stack_id` for the parent stack of slices. This is 0 for all root slices
and a reference to a `stack_id` otherwise.
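
For example, `depth` and `dur` can be combined to find which root slices
account for the most time. A sketch using the columns above:

``` sql
-- Total wall time spent in each root (depth = 0) slice name.
select name, count(*) as occurrences, sum(dur) / 1e6 as total_ms
from slices
where depth = 0
group by name
order by total_ms desc
limit 25;
```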


### process table
The process table holds data about the processes seen in the trace.

`upid`
Unique process ID. This is NOT the UNIX pid. This is a sequence number generated
by the trace processor to uniquely identify a process in the trace. This is to
deal with the fact that UNIX pids can be recycled and two distinct processes
whose lifetimes don't overlap can be assigned the same pid.

`name`
Process name, as per /proc/pid/cmdline.

`pid`
The UNIX pid (also known as Thread Group ID in Linux). This also matches the
tid of the process' main thread.


Example:

| upid              |               name |                pid |
|-------------------|--------------------|--------------------|
|                 1 | /system/bin/logd   |                601 |
|                 2 | rcu_preempt        |                  7 |
|                 3 | rcuop/4            |                 44 |
|                 4 | rcuop/6            |                 60 |

### thread table
The thread table holds data about the threads seen in the trace.

`utid`
Unique thread ID. This is NOT the Linux pid or tid. Like the above, this is a
sequence number generated by the trace processor to uniquely identify a thread
in the trace.

`upid`
ID of the parent process in the `process` table.
This can be used to JOIN a thread with its process.

`name`
Thread name, as per /proc/pid/task/tid/comm.

`tid`
The Linux thread ID (confusingly named "pid" in the Linux world).
For the process' main thread, `tid` == `tgid` == `pid`.

Example:

| utid   | upid  | name             | tid  |
|--------|-------|------------------|------|
|      1 |     1 | logd.klogd       |  632 |
|      2 |     2 | rcu_preempt      |    7 |
|      3 |     4 | rcuop/6          |   60 |
|      4 |     6 | rcuop/3          |   36 |
|      5 |     8 | sugov:0          |  588 |
|      6 |     9 | kworker/u16:6    | 9283 |
|      7 |    12 | sensors@1.0-ser  | 1021 |
|      8 |    12 | HwBinder:797_1   | 1626 |
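
Since `upid` references the process table, thread and process data can be
combined directly; for example, a sketch counting the threads seen for each
process:

``` sql
-- Number of threads observed for each process in the trace.
select process.name as process_name, process.pid, count(*) as thread_count
from thread
join process using(upid)
group by process.upid
order by thread_count desc
limit 25;
```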


### stats table
The stats table holds statistics from the trace collection tool as well as
counters from the trace processor, collected while parsing and ingesting the
trace.

`name`
The name of the stat.

`idx`
The index of the stat in the array. This value is only non-null for
stats which are indexed (e.g. ftrace overrun events are indexed per CPU).

`severity`
The severity of the value indicated by the stat. Can be one of 'info' or
'error'.

`source`
The source of the stat, i.e. whether it comes from trace collection time or
from parsing/ingestion time. One of 'trace' (trace collection time) or
'analysis' (parsing/ingestion time).

`value`
The value of the statistic.
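
A common use of this table is to check whether anything went wrong during
collection or ingestion; a minimal sketch:

``` sql
-- Surface any error-severity stats with a non-zero value.
select name, idx, source, value
from stats
where severity = 'error' and value != 0;
```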


### args table
The args table is a generic store of key-value pairs, deduplicated across the
entire trace. A 'set' of arguments is given a unique identifier and can be
referenced from other tables.

`arg_set_id`
The identifier for the set of arguments this arg belongs to.

`flat_key`
The key of the arg, excluding any indexing for args which are arrays.

`key`
The long-form key of the arg (including any indexes for array args).

`int_value`, `real_value`, `string_value`
The value of the arg. One of these columns will be non-null depending on the
type of the arg, with the other two being null.
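
Other tables reference this table through their `arg_set_id` column. As an
example, the args attached to counter rows can be expanded as follows (a
sketch, using the `counters.arg_set_id` column documented above):

``` sql
-- Expand the key/value args attached to each counter row.
select counters.name as counter_name, args.key,
       coalesce(args.int_value, args.real_value, args.string_value) as value
from counters
join args using(arg_set_id)
limit 100;
```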


Sample queries for the `sched` (sched_switch events) table
----------------------------------------------------------

### Trace duration
``` sql
select ((select max(ts) from sched) - (select min(ts) from sched)) / 1e9 as duration_sec
```

### Total CPU usage
``` sql
select cpu, sum(dur)/1e9 as cpu_time_sec from sched group by cpu order by cpu
```

### List all processes
``` sql
select process.name, pid from process limit 100
```

### List all processes and threads
``` sql
select process.name as proc_name, pid, thread.name as thread_name, tid from thread left join process using(upid) limit 100
```

### CPU time for top 100 threads
``` sql
select thread.name as thread_name, tid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid order by cpu_sec desc limit 100) inner join thread using(utid)
```

With matching process names
``` sql
select thread.name as thread_name, process.name as proc_name, tid, pid, cpu_sec from (select utid, sum(dur)/1e9 as cpu_sec from sched group by utid order by cpu_sec desc limit 100) left outer join thread using(utid) left outer join process using(upid)
```

### CPU time for top 100 processes
``` sql
select process.name, tot_proc/1e9 as cpu_sec from (select upid, sum(tot_thd) as
tot_proc from (select utid, sum(dur) as tot_thd from sched group by utid) join
thread using(utid) group by upid) join process using(upid) order by cpu_sec desc
limit 100;
```

### CPU time for top 100 processes broken down by cpu
``` sql
select proc_name, cpu, sum(cpu_sec) as cpu_sec from (select process.name as
proc_name, upid, cpu, cpu_sec from (select cpu, utid, sum(dur)/1e9 as cpu_sec
from sched group by utid, cpu) left join thread using(utid) left join process
using(upid)) group by upid, cpu order by cpu_sec desc limit 100
```
