1# Trace-based metrics
2
3_The metrics subsystem is a part of the
4[trace processor](/docs/analysis/trace-processor.md) which uses traces to
5compute reproducible metrics. It can be used in a wide range of situations;
6examples include benchmarks, lab tests and on large corpuses of traces._
7
8![Block diagram of metrics](/docs/images/metrics-summary.png)
9
10## Quickstart
11
12The [quickstart](/docs/quickstart/trace-analysis.md) provides a quick overview
13on how to compute trace-based metrics traces using trace processor.
14
15## Introduction
16
17### Motivation
18
19Performance metrics are useful to monitor for the health of a system and ensure
20that a system does not regress over time as new features are added.
21
22However, metrics retrieved directly from the system have a downside: if there is
23a regression, it is difficult to root-cause the issue. Often, the problem may
24not be reproducible or may rely on a particular setup.
25
26Trace-based metrics are one possible solution to this problem. Instead of
27collecting metrics directly on the system, a trace is collected and metrics are
28computed from the trace. If a regression in the metric is spotted, the developer
29can look directly at the trace to understand why the regression has occurred
30instead of having to reproduce the issue.
31
32### Metric subsystem
33
34The metric subsystem is a part of the
35[trace processor](/docs/analysis/trace-processor.md) which executes SQL queries
36against traces and produces a metric which summarizes some performance attribute
37(e.g. CPU, memory, startup latency etc.).
38
39For example, generating the Android CPU metrics on a trace is as simple as:
40
41```python
42> ./trace_processor --run-metrics android_cpu <trace>
43android_cpu {
44  process_info {
45    name: "/system/bin/init"
46    threads {
47      name: "init"
48      core {
49        id: 1
50        metrics {
51          mcycles: 1
52          runtime_ns: 570365
53          min_freq_khz: 1900800
54          max_freq_khz: 1900800
55          avg_freq_khz: 1902017
56        }
57      }
58      ...
59    }
60    ...
61  }
62  ...
63}
64```
65
66### Case for upstreaming
67
68NOTE: Googlers: for internal usage of metrics in Google3 (i.e. metrics which are
69confidential), please see [this internal page](https://goto.google.com/viecd).
70
71Authors are strongly encouraged to add all metrics derived on Perfetto traces to
72the Perfetto repo unless there is a clear usecase (e.g. confidentiality) why
73these metrics should not be publicly available.
74
75In return for upstreaming metrics, authors will have first class support for
76running metrics locally and the confidence that their metrics will remain stable
77as trace processor is developed.
78
79As well as scaling upwards while developing from running on a single trace
80locally to running on a large set of traces, the reverse is also very useful.
81When an anomaly is observed in the metrics of a lab benchmark, a representative
82trace can be downloaded and the same metric can be run locally in trace
83processor.
84
85Since the same code is running locally and remotely, developers can be confident
86in reproducing the issue and use the trace processor and/or the Perfetto UI to
87identify the problem.
88
89## Walkthrough: prototyping a metric
90
91TIP: To see how to add a new metric to trace processor, see the checklist
92[here](/docs/contributing/common-tasks.md#new-metric)
93
94This walkthrough will outline how to prototype a metric locally without needing
95to compile trace processor. This metric will compute the CPU time for every
96process in the trace and list the names of the top 5 processes (by CPU time) and
97the number of threads created by the process.
98
99NOTE: See this [GitHub gist][gist] to see how the code should look at the end of
100      the walkthrough. The prerequisites and Step 4 below give instructions on
101      how to get trace processor and run the metrics code.
102
103[gist]: https://gist.github.com/LalitMaganti/c221cf0cae17e298dfa82b118edf9080
104
105### Prerequisites
106
107As a setup step, create a folder to act as a scratch workspace; this folder will be referred to using the env variable `$WORKSPACE` in Step 4.
108
109The other requirement is trace processor. This can downloaded from [here](https://get.perfetto.dev/trace_processor) or can be built from source
110using the instructions [here](trace-processor.md). Whichever method is chosen, $TRACE_PROCESSOR env variable will be used to refer to the location of the binary in Step 4.
111
112### Step 1
113
114As all metrics in the metrics platform are defined using protos, the metric needs to be structured as a proto. For this metric, there needs to be some notion of a process name along with its CPU time and number of threads.
115
116Starting off, in a file named `top_five_processes.proto` in our workspace, create a basic proto message called ProcessInfo with those three fields:
117
118```protobuf
119message ProcessInfo {
120  optional string process_name = 1;
121  optional uint64 cpu_time_ms = 2;
122  optional uint32 num_threads = 3;
123}
124```
125
126Next , create a wrapping message which will hold the repeated field containing the top 5 processes.
127
128```protobuf
129message TopProcesses {
130  repeated ProcessInfo process_info = 1;
131}
132```
133
134Finally, define an extension to the root proto for all metrics (the [TraceMetrics](https://android.googlesource.com/platform/external/perfetto/+/HEAD/protos/perfetto/metrics/metrics.proto#39) proto).
135
136```protobuf
137extend TraceMetrics {
138  optional TopProcesses top_five_processes = 450;
139}
140```
141
142Adding this extension field allows trace processor to link the newly defined
143metric to the `TraceMetrics` proto.
144
145_Notes:_
146
147- The field ids 450-500 are reserved for local development so any of them can be used as the field id for the extension field.
148- The choice of field name here is important as the SQL file and the final table generated in SQL will be based on this name.
149
150Putting everything together, along with some boilerplate preamble gives:
151
152```protobuf
153syntax = "proto2";
154
155package perfetto.protos;
156
157import "protos/perfetto/metrics/metrics.proto";
158
159message ProcessInfo {
160  optional string process_name = 1;
161  optional int64 cpu_time_ms = 2;
162  optional uint32 num_threads = 3;
163}
164
165message TopProcesses {
166  repeated ProcessInfo process_info = 1;
167}
168
169extend TraceMetrics {
170  optional TopProcesses top_five_processes = 450;
171}
172```
173
174### Step 2
175
176Next, write the SQL to generate the table of the top 5 processes ordered by the
177sum of the CPU time they ran for and the number of threads which were associated
178with the process.
179
180The following SQL should be added to a file called `top_five_processes.sql` in
181the workspace:
182
183```sql
184CREATE VIEW top_five_processes_by_cpu AS
185SELECT
186  process.name as process_name,
187  CAST(SUM(sched.dur) / 1e6 as INT64) as cpu_time_ms,
188  COUNT(DISTINCT utid) as num_threads
189FROM sched
190INNER JOIN thread USING(utid)
191INNER JOIN process USING(upid)
192GROUP BY process.name
193ORDER BY cpu_time_ms DESC
194LIMIT 5;
195```
196
197Let's break this query down:
198
1991. The first table used is the `sched` table. This contains all the scheduling
200   data available in the trace. Each scheduling "slice" is associated with a
201   thread which is uniquely identified in Perfetto traces using its `utid`. The
202   two pieces of information needed from the sched table are the `dur` -
203   short for duration, this is the amount of time the slice lasted - and the
204   `utid` which will be used to join with the thread table.
2052. The next table is the thread table. This gives us a lot of information which
206   is not particularly interesting (including its thread name) but it does give
207   us the `upid`. Similar to `utid`, `upid` is the unique identifier for a
208   process in a Perfetto trace. In this case, `upid` will refer to the process
209   which hosts the thread given by `utid`.
2103. The final table is the process table. This gives the name of the process
211   associated with the original sched slice.
2124. With the process, thread and duration for each sched slice, all the slices
213   for a single processes are collected and their durations summed to get the
214   CPU time (dividing by 1e6 as sched's duration is in nanoseconds) and
215   the number of distinct threads.
2165. Finally, we order by the cpu time and limit to the top 5 results.
217
218### Step 3
219
220Now that the result of the metric has been expressed as an SQL table, it needs
221to be converted to a proto. The metrics platform has built-in support for emitting
222protos using SQL functions; something which is used extensively in this step.
223
224Let's look at how it works for our table above.
225
226```sql
227CREATE VIEW top_five_processes_output AS
228SELECT TopProcesses(
229  'process_info', (
230    SELECT RepeatedField(
231      ProcessInfo(
232        'process_name', process_name,
233        'cpu_time_ms', cpu_time_ms,
234        'num_threads', num_threads
235      )
236    )
237    FROM top_five_processes_by_cpu
238  )
239);
240```
241
242Breaking this down again:
243
2441. Starting from the inner-most SELECT statement, there is what looks like
245   a function call to the ProcessInfo function; in fact this is no coincidence.
246   For each proto that the metrics platform knows about, an SQL function is
247   generated with the same name as the proto. This function takes key value
248   pairs with the key as the name of the proto field to fill and the value being
249   the data to store in the field. The output is the proto created by writing
250   the fields described in the function. (\*)
251
252   In this case, this function is called once for each row in the
253   `top_five_processes_by_cpu` table. The output will be the fully filled
254   ProcessInfo proto.
255
256   The call to the `RepeatedField` function is the most interesting part and
257   also the most important. In technical terms, `RepeatedField` is an aggregate
258   function. Practically, this means that it takes a full table of values and
259   generates a single array which contains all the values passed to it.
260
261   Therefore, the output of this whole SELECT statement is an array of 5
262   ProcessInfo protos.
263
2642. Next is creation of the `TopProcesses` proto. By now, the syntax should
265   already feel somewhat familiar; the proto builder function is called to fill
266   in the `process_info` field with the array of protos from the inner function.
267
268   The output of this SELECT is a single `TopProcesses` proto containing the
269   ProcessInfos as a repeated field.
270
2713. Finally, the view is created. This view is specially named to allow the
272   metrics platform to query it to obtain the root proto for each metric
273   (in this case `TopProcesses`). See the note below as to the pattern behind
274   this view's name.
275
276(\*) _This is not strictly true. To type-check the protos, some metadata
277is returned about the type of the proto but this is unimportant for metric
278authors._
279
280NOTE: It is important that the views be named {name of TraceMetrics extension
281      field}\_output. This is the pattern used and expected by the metrics
282      platform for all metrics.
283
284The final file should look like so:
285
286```sql
287CREATE VIEW top_five_processes_by_cpu AS
288SELECT
289  process.name as process_name,
290  CAST(SUM(sched.dur) / 1e6 as INT64) as cpu_time_ms,
291  COUNT(DISTINCT utid) as num_threads
292FROM sched
293INNER JOIN thread USING(utid)
294INNER JOIN process USING(upid)
295GROUP BY process.name
296ORDER BY cpu_time_ms DESC
297LIMIT 5;
298
299CREATE top_five_processes_output AS
300SELECT TopProcesses(
301  'process_info', (
302    SELECT RepeatedField(
303      ProcessInfo(
304        'process_name', process_name,
305        'cpu_time_ms', cpu_time_ms,
306        'num_threads', num_threads
307      )
308    )
309    FROM top_five_processes_by_cpu
310  )
311);
312```
313
314NOTE: The name of the SQL file should be the same as the name of TraceMetrics
315      extension field. This is to allow the metrics platform to associated the
316      proto extension field with the SQL which needs to be run to generate it.
317
318### Step 4
319
320For this step, invoke trace processor shell to run the metrics (see the
321[Quickstart](/docs/quickstart/trace-analysis.md) for downloading instructions):
322
323```shell
324$TRACE_PROCESSOR --run-metrics $WORKSPACE/top_five_processes.sql $TRACE 2> /dev/null
325```
326
327(For an example trace to test this on, see the Notes section below.)
328
329By passing the SQL file for the metric to be computed, trace processor uses the name of this file to find the proto and to figure out the name of the output table for the proto and the name of the extension field for `TraceMetrics`; this is the reason it was important to choose the names of these other objects carefully.
330
331_Notes:_
332
333- If something doesn't work as intended, check that the workspace looks the same as the contents of this [GitHub gist](https://gist.github.com/LalitMaganti/c221cf0cae17e298dfa82b118edf9080).
334- A good example trace for this metric is the Android example trace used by the Perfetto UI found [here](https://storage.googleapis.com/perfetto-misc/example_android_trace_30s_1).
335- stderr is redirected to remove any noise from parsing the trace that trace processor generates.
336
337If everything went successfully, the following output should be visible (specifically this is the output for the Android example trace linked above):
338
339```
340[perfetto.protos.top_five_processes] {
341  process_info {
342    process_name: "com.google.android.GoogleCamera"
343    cpu_time_ms: 15154
344    num_threads: 125
345  }
346  process_info {
347    process_name: "sugov:4"
348    cpu_time_ms: 6846
349    num_threads: 1
350  }
351  process_info {
352    process_name: "system_server"
353    cpu_time_ms: 6809
354    num_threads: 66
355  }
356  process_info {
357    process_name: "cds_ol_rx_threa"
358    cpu_time_ms: 6684
359    num_threads: 1
360  }
361  process_info {
362    process_name: "com.android.chrome"
363    cpu_time_ms: 5125
364    num_threads: 49
365  }
366}
367```
368
369### Next steps
370
371* The [common tasks](/docs/contributing/common-tasks.md) page gives a list of
372  steps on how new metrics can be added to the trace processor.
373