benchmark
=========
[![Build Status](https://travis-ci.org/google/benchmark.svg?branch=master)](https://travis-ci.org/google/benchmark)
[![Build status](https://ci.appveyor.com/api/projects/status/u0qsyp7t1tk7cpxs/branch/master?svg=true)](https://ci.appveyor.com/project/google/benchmark/branch/master)
[![Coverage Status](https://coveralls.io/repos/google/benchmark/badge.svg)](https://coveralls.io/r/google/benchmark)

A library to support the benchmarking of functions, similar to unit tests.

Discussion group: https://groups.google.com/d/forum/benchmark-discuss

IRC channel: https://freenode.net #googlebenchmark

Example usage
-------------
Define a function that executes the code to be measured a
specified number of times:

```c++
static void BM_StringCreation(benchmark::State& state) {
  while (state.KeepRunning())
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  while (state.KeepRunning())
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();
```

Sometimes a family of microbenchmarks can be implemented with
just one routine that takes an extra argument to specify which
one of the family of benchmarks to run. For example, the following
code defines a family of microbenchmarks for measuring the speed
of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range_x()];
  char* dst = new char[state.range_x()];
  memset(src, 'x', state.range_x());
  while (state.KeepRunning())
    memcpy(dst, src, state.range_x());
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range_x()));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
```

The preceding code is quite repetitive and can be replaced with the
following shorthand, which picks a few appropriate arguments in the
specified range and generates a microbenchmark for each of them:

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

You might have a microbenchmark that depends on two inputs. For
example, the following code defines a family of microbenchmarks for
measuring the speed of set insertion:

```c++
static void BM_SetInsert(benchmark::State& state) {
  while (state.KeepRunning()) {
    state.PauseTiming();
    std::set<int> data = ConstructRandomSet(state.range_x());
    state.ResumeTiming();
    for (int j = 0; j < state.range_y(); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->ArgPair(1<<10, 1)
    ->ArgPair(1<<10, 8)
    ->ArgPair(1<<10, 64)
    ->ArgPair(1<<10, 512)
    ->ArgPair(8<<10, 1)
    ->ArgPair(8<<10, 8)
    ->ArgPair(8<<10, 64)
    ->ArgPair(8<<10, 512);
```

The preceding code is again quite repetitive and can be replaced with
the following shorthand, which picks a few appropriate arguments in the
product of the two specified ranges and generates a microbenchmark for
each such pair:

```c++
BENCHMARK(BM_SetInsert)->RangePair(1<<10, 8<<10, 1, 512);
```

For more complex patterns of inputs, passing a custom function to
`Apply` allows programmatic specification of an arbitrary set of
arguments on which to run the microbenchmark. The following example
enumerates a dense range on one parameter and a sparse range on the
second:

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->ArgPair(i, j);
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

Templated microbenchmarks work the same way. The following example
produces and then consumes `state.range_x()` messages per iteration,
measuring throughput in the absence of multiprogramming:

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  while (state.KeepRunning()) {
    for (int i = state.range_x(); i--; )
      q.push(v);
    for (int e = state.range_x(); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range_x());
}
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```

Three macros are provided for adding benchmark templates.

```c++
#if __cplusplus >= 201103L // C++11 and greater.
#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```

In a multithreaded test (a benchmark invoked by multiple threads
simultaneously), it is guaranteed that none of the threads will start
until all have called `KeepRunning`, and that all will have finished
before `KeepRunning` returns false. As such, any global setup or
teardown you want to do can be wrapped in a check against the thread
index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index == 0) {
    // Setup code here.
  }
  while (state.KeepRunning()) {
    // Run the test as normal.
  }
  if (state.thread_index == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```
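To measure scaling across several thread counts, the library also
provides `ThreadRange`; in the versions we have seen, it registers the
benchmark once per thread count, advancing multiplicatively between the
two bounds. A usage sketch:

```c++
// Runs BM_MultiThreaded with 1, 2, 4, and 8 threads
// (thread counts typically advance by powers of two).
BENCHMARK(BM_MultiThreaded)->ThreadRange(1, 8);
```

Many versions also provide `ThreadPerCpu()`, which runs the benchmark
with one thread per detected CPU.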
If the benchmarked code itself uses threads and you want to compare it
to single-threaded code, you may want to use real-time ("wallclock")
measurements for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.

To prevent a value or expression from being optimized away by the
compiler, the `benchmark::DoNotOptimize(...)` function can be used:

```c++
static void BM_test(benchmark::State& state) {
  while (state.KeepRunning()) {
    int x = 0;
    for (int i = 0; i < 64; ++i) {
      benchmark::DoNotOptimize(x += i);
    }
  }
}
```

Benchmark Fixtures
------------------
Fixture tests are created by first defining a type that derives from
`::benchmark::Fixture` and then creating/registering the tests using the
following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  while (st.KeepRunning()) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  while (st.KeepRunning()) {
    ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```
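A fixture can hold state shared by its benchmarks and prepare it outside
the timed loop. Below is a minimal sketch, assuming the `SetUp`/`TearDown`
hooks exposed by `benchmark::Fixture` (the exact hook signatures vary
between library versions; the fixture and benchmark names here are
hypothetical):

```c++
#include <vector>

class VectorFixture : public benchmark::Fixture {
 public:
  // Hypothetical setup: build the input before the benchmark runs,
  // so construction cost stays out of the measurement.
  void SetUp(const ::benchmark::State&) {
    data.assign(1 << 10, 42);
  }
  void TearDown(const ::benchmark::State&) {
    data.clear();
  }
  std::vector<int> data;
};

// The benchmark body is a member of the fixture, so it can use `data`.
BENCHMARK_F(VectorFixture, SumTest)(benchmark::State& st) {
  while (st.KeepRunning()) {
    int sum = 0;
    for (size_t i = 0; i < data.size(); ++i)
      sum += data[i];
    benchmark::DoNotOptimize(sum);  // keep the loop from being elided
  }
}
```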
Output Formats
--------------
The library supports multiple output formats. Use the
`--benchmark_format=<tabular|json|csv>` flag to set the format type.
`tabular` is the default format.

The tabular format is intended to be human readable. By default the
format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:
```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s  33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s  237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s  290.225k items/s
```

The JSON format outputs human-readable JSON split into two top-level
attributes. The `context` attribute contains information about the run
in general, including information about the CPU and the date. The
`benchmarks` attribute contains a list of every benchmark run. Example
JSON output looks like:
```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output
on stderr and the CSV itself on stdout. Example CSV output looks like:
```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

Debug vs Release
----------------
By default, benchmark builds as a debug library. You will see a warning
in the output when this is the case. To build it as a release library
instead, use:

```
cmake -DCMAKE_BUILD_TYPE=Release
```

To enable link-time optimisation, use:

```
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true
```

Linking against the library
---------------------------
When using gcc, it is necessary to link against pthread to avoid runtime
exceptions. This is due to how gcc implements `std::thread`. See
[issue #67](https://github.com/google/benchmark/issues/67) for more
details.
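For example, a typical build-and-link line might look like the following
(assuming the headers and library are installed where the toolchain
searches by default; `mybenchmark.cc` is a hypothetical source file):

```
g++ -std=c++11 mybenchmark.cc -lbenchmark -lpthread -o mybenchmark
```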