<a id="top"></a>
# Authoring benchmarks

> [Introduced](https://github.com/catchorg/Catch2/issues/1616) in Catch 2.9.0.

_Note that benchmarking support is disabled by default and to enable it,
you need to define `CATCH_CONFIG_ENABLE_BENCHMARKING`. For more details,
see the [compile-time configuration documentation](configuration.md#top)._
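
For example, assuming the v2 single-header layout, enabling the support could
look like this (defining the macro on the compiler command line for the whole
target works just as well):

```c++
// Must be visible before Catch's header is pulled in; alternatively, pass
// -DCATCH_CONFIG_ENABLE_BENCHMARKING to the compiler for every translation unit.
#define CATCH_CONFIG_ENABLE_BENCHMARKING
#include <catch2/catch.hpp>
```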

Writing benchmarks is not easy. Catch simplifies certain aspects of it, but
you will still need to take care of many details yourself. Understanding a few
things about the way Catch runs your code will be very helpful when writing
your benchmarks.

First off, let's go over some terminology that will be used throughout this
guide.

- *User code*: user code is the code that the user provides to be measured.
- *Run*: one run is one execution of the user code.
- *Sample*: one sample is one data point obtained by measuring the time it takes
  to perform a certain number of runs. One sample can consist of more than one
  run if the clock available does not have enough resolution to accurately
  measure a single run. All samples for a given benchmark execution are obtained
  with the same number of runs.

## Execution procedure

Now I can explain how a benchmark is executed in Catch. There are three main
steps, though the first does not need to be repeated for every benchmark.

1. *Environmental probe*: before any benchmarks can be executed, the clock's
resolution is estimated. A few other environmental artifacts are also estimated
at this point, like the cost of calling the clock function, but they almost
never have any impact on the results.

2. *Estimation*: the user code is executed a few times to obtain an estimate of
the number of runs that should be in each sample. This also has the potential
effect of bringing relevant code and data into the caches before the actual
measurement starts.

3. *Measurement*: all the samples are collected sequentially by performing the
number of runs estimated in the previous step for each sample.

This already gives us one important rule for writing benchmarks for Catch: the
benchmarks must be repeatable. The user code will be executed several times, and
the number of times it will be executed during the estimation step cannot be
known beforehand since it depends on the time it takes to execute the code.
User code that cannot be executed repeatedly will lead to bogus results or
crashes.
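
As a quick illustration (`make_data` and `consume` are hypothetical helpers
invented for this sketch), user code that destroys its own input is not
repeatable:

```c++
std::vector<int> data = make_data();

// BAD: the vector is moved from on the first run, so the estimation step
// and all later runs operate on an empty vector.
BENCHMARK("consume") {
    return consume(std::move(data));
};

// Repeatable: every run works on its own copy, though the copy itself is
// now part of what gets measured (see the advanced benchmarks below for
// how to move set-up work out of the measurement).
BENCHMARK("consume a copy") {
    auto local = data;
    return consume(std::move(local));
};
```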

## Benchmark specification

Benchmarks can be specified anywhere inside a Catch test case.
There is a simple and a slightly more advanced version of the `BENCHMARK` macro.

Let's have a look at how a naive Fibonacci implementation could be benchmarked:
```c++
std::uint64_t Fibonacci(std::uint64_t number) {
    return number < 2 ? 1 : Fibonacci(number - 1) + Fibonacci(number - 2);
}
```
Now the most straightforward way to benchmark this function is to add a `BENCHMARK` macro to our test case:
```c++
TEST_CASE("Fibonacci") {
    CHECK(Fibonacci(0) == 1);
    // some more asserts..
    CHECK(Fibonacci(5) == 8);
    // some more asserts..

    // now let's benchmark:
    BENCHMARK("Fibonacci 20") {
        return Fibonacci(20);
    };

    BENCHMARK("Fibonacci 25") {
        return Fibonacci(25);
    };

    BENCHMARK("Fibonacci 30") {
        return Fibonacci(30);
    };

    BENCHMARK("Fibonacci 35") {
        return Fibonacci(35);
    };
}
```
There are a few things to note:
- As `BENCHMARK` expands to a lambda expression, it is necessary to add a semicolon
  after the closing brace (as opposed to the first experimental version).
- The `return` is a handy way to avoid the compiler optimizing away the benchmark code.

Running this already runs the benchmarks and outputs something similar to:
```
-------------------------------------------------------------------------------
Fibonacci
-------------------------------------------------------------------------------
C:\path\to\Catch2\Benchmark.tests.cpp(10)
...............................................................................
benchmark name                                  samples       iterations    estimated
                                                mean          low mean      high mean
                                                std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Fibonacci 20                                            100       416439   83.2878 ms
                                                       2 ns         2 ns         2 ns
                                                       0 ns         0 ns         0 ns

Fibonacci 25                                            100       400776   80.1552 ms
                                                       3 ns         3 ns         3 ns
                                                       0 ns         0 ns         0 ns

Fibonacci 30                                            100       396873   79.3746 ms
                                                      17 ns        17 ns        17 ns
                                                       0 ns         0 ns         0 ns

Fibonacci 35                                            100       145169   87.1014 ms
                                                     468 ns       464 ns       473 ns
                                                      21 ns        15 ns        34 ns
```

### Advanced benchmarking
The simplest use case shown above takes no arguments and just runs the user code that needs to be measured.
However, if you use the `BENCHMARK_ADVANCED` macro and add a `Catch::Benchmark::Chronometer` argument after
the macro, some advanced features become available. The contents of the simple benchmarks are invoked once per run,
while the blocks of the advanced benchmarks are invoked exactly twice:
once during the estimation phase, and another time during the execution phase.

```c++
BENCHMARK("simple"){ return long_computation(); };

BENCHMARK_ADVANCED("advanced")(Catch::Benchmark::Chronometer meter) {
    set_up();
    meter.measure([] { return long_computation(); });
};
```

These advanced benchmarks no longer consist entirely of user code to be measured.
In these cases, the code to be measured is provided via the
`Catch::Benchmark::Chronometer::measure` member function. This allows you to set up any
kind of state that might be required for the benchmark but is not to be included
in the measurements, like making a vector of random integers to feed to a
sorting algorithm.

A single call to `Catch::Benchmark::Chronometer::measure` performs the actual measurements
by invoking the callable object passed in as many times as necessary. Anything
that needs to be done outside the measurement can be done outside the call to
`measure`.
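
For instance, a sketch of a benchmark whose input is built outside the
measurement (the data size and seed here are arbitrary choices for the
example):

```c++
// needs <vector>, <random>, <algorithm> and <numeric>
BENCHMARK_ADVANCED("sum of random ints")(Catch::Benchmark::Chronometer meter) {
    // set-up: not included in the measurement
    std::vector<int> v(1000000);
    std::mt19937 rng(42);
    std::generate(v.begin(), v.end(), [&] { return static_cast<int>(rng()); });
    // only the summation is timed
    meter.measure([&v] { return std::accumulate(v.begin(), v.end(), 0LL); });
};
```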

The callable object passed in to `measure` can optionally accept an `int`
parameter.

```c++
meter.measure([](int i) { return long_computation(i); });
```

If it accepts an `int` parameter, the sequence number of each run will be passed
in, starting with 0. This is useful if you want to measure some mutating code,
for example. The number of runs can be known beforehand by calling
`Catch::Benchmark::Chronometer::runs`; with this one can set up a different instance to be
mutated by each run.

```c++
// one pre-constructed string per run, set up outside the measurement
std::vector<std::string> v(meter.runs());
std::fill(v.begin(), v.end(), test_string());
// each run mutates its own element, so no run observes another's changes
meter.measure([&v](int i) { in_place_escape(v[i]); });
```

Note that it is not possible to simply use the same instance for different runs
and reset it between each run, since that would pollute the measurements with
the resetting code.
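
In other words, a sketch of what *not* to do (reusing the `test_string` and
`in_place_escape` helpers from above):

```c++
// BAD: the reset runs inside the measured callable, so its cost is
// added to every timed run.
std::string s = test_string();
meter.measure([&s] {
    s = test_string();   // resetting pollutes the measurement
    in_place_escape(s);
});
```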

It is also possible to just provide an argument name to the simple `BENCHMARK` macro to get
the same semantics as providing a callable to `meter.measure` with an `int` argument:

```c++
BENCHMARK("indexed", i){ return long_computation(i); };
```

### Constructors and destructors

All of these tools give you a lot of mileage, but there are two things that still
need special handling: constructors and destructors. The problem is that if you
use automatic objects, they get destroyed at the end of the scope, so you end up
measuring the time for construction and destruction together. And if you use
dynamic allocation instead, you end up including the time to allocate memory in
the measurements.

To solve this conundrum, Catch provides class templates that let you manually
construct and destroy objects without dynamic allocation and in a way that lets
you measure construction and destruction separately.

```c++
BENCHMARK_ADVANCED("construct")(Catch::Benchmark::Chronometer meter) {
    // raw storage for one string per run; nothing is constructed yet
    std::vector<Catch::Benchmark::storage_for<std::string>> storage(meter.runs());
    // measure only the constructor call
    meter.measure([&](int i) { storage[i].construct("thing"); });
};

BENCHMARK_ADVANCED("destroy")(Catch::Benchmark::Chronometer meter) {
    std::vector<Catch::Benchmark::destructable_object<std::string>> storage(meter.runs());
    // construct the objects outside the measurement...
    for (auto&& o : storage)
        o.construct("thing");
    // ...and measure only the destructor call
    meter.measure([&](int i) { storage[i].destruct(); });
};
```

`Catch::Benchmark::storage_for<T>` objects are just pieces of raw storage suitable for `T`
objects. You can use the `Catch::Benchmark::storage_for::construct` member function to call a constructor and
create an object in that storage. So if you want to measure the time it takes
for a certain constructor to run, you can just measure the time it takes to run
this function.

When the lifetime of a `Catch::Benchmark::storage_for<T>` object ends, if an actual object was
constructed there it will be automatically destroyed, so nothing leaks.

If you want to measure a destructor, though, you need to use
`Catch::Benchmark::destructable_object<T>`. These objects are similar to
`Catch::Benchmark::storage_for<T>` in that construction of the `T` object is manual, but
they do not destroy anything automatically. Instead, you are required to call
the `Catch::Benchmark::destructable_object::destruct` member function, which is what you
can use to measure the destruction time.

### The optimizer

Sometimes the optimizer will optimize away the very code that you want to
measure. There are several ways to use results that will prevent the optimizer
from removing them. You can use the `volatile` keyword, or you can output the
value to standard output or to a file, both of which force the program to
actually generate the value somehow.

Catch adds a third option. The values returned by any function provided as user
code are guaranteed to be evaluated and not optimized out. This means that if
your user code consists of computing a certain value, you don't need to bother
with using `volatile` or forcing output. Just `return` it from the function.
This helps keep the code natural.

Here's an example:

```c++
// may measure nothing at all by skipping the long calculation since its
// result is not used
BENCHMARK("no return"){ long_calculation(); };

// the result of long_calculation() is guaranteed to be computed somehow
BENCHMARK("with return"){ return long_calculation(); };
```

However, there's no other form of control over the optimizer whatsoever. It is
up to you to write a benchmark that actually measures what you want and doesn't
just measure the time to do a whole bunch of nothing.
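
One pitfall worth keeping in mind (how aggressively this happens depends on
your compiler): when the input is a compile-time constant, the compiler may
compute the whole result at compile time, even though the returned value itself
is not optimized away. A sketch of one way to sidestep this, using the indexed
form shown earlier:

```c++
// the argument is only known at run time, so the computation cannot be
// constant-folded, while the return value keeps the result observable
BENCHMARK("varying input", i) { return long_computation(i % 16); };
```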

To sum up, there are two simple rules: whatever you would do in handwritten code
to control optimization still works in Catch; and Catch makes return values
from user code into observable effects that can't be optimized away.

<i>Adapted from nonius' documentation.</i>