=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`, and UndefinedBehaviorSanitizer.
In addition to ``-fsanitize=``, pass one of the following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  **extra** slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS``,
``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file will also be created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic, one of ``0xC0BFFFFFFFFFFF64`` and ``0xC0BFFFFFFFFFFF32``. The last byte
of the magic defines the size of the following offsets. The rest of the data is
the offsets in the corresponding binary/DSO that were executed during the run.

A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all call sites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered
PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c

Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks, let's name them A, B, C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C

Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (a text file with 1 for blocks that have been executed and 0
for blocks that have not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (well,
unless dlopen/dlclose come into play), so the bitset coverage can be
easily used for bitset-based corpus distillation.

Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures caller and callee. At shutdown the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees).

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded, the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge) the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is *not* thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

Tracing basic blocks
====================

An *experimental* feature to support basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic
block, or edge (depending on the value of
``-fsanitize-coverage=[func,bb,edge]``).
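
Returning briefly to the counter ranges listed in the Coverage counters section,
the mapping from a raw counter value to its range bit can be sketched as
follows. This is only an illustration and is not taken from the sanitizer
runtime: the helper name ``CounterToRangeBit`` is hypothetical, and the sketch
assumes that the first range (``1``) corresponds to the least significant bit
and the last range (``128+``) to the most significant one.

.. code-block:: c++

    #include <cstdint>

    // Hypothetical helper (not part of the sanitizer runtime).
    // Returns a byte with exactly one bit set, identifying the range that
    // contains 'counter', or 0 if the counter was never incremented.
    // Assumed bit order: bit 0 = "1", bit 1 = "2", bit 2 = "3", bit 3 = "4-7",
    // bit 4 = "8-15", bit 5 = "16-31", bit 6 = "32-127", bit 7 = "128+".
    inline uint8_t CounterToRangeBit(uint32_t counter) {
      if (counter == 0) return 0;
      unsigned bit;
      if (counter >= 128)     bit = 7;
      else if (counter >= 32) bit = 6;
      else if (counter >= 16) bit = 5;
      else if (counter >= 8)  bit = 4;
      else if (counter >= 4)  bit = 3;
      else                    bit = counter - 1;  // counter is 1, 2 or 3
      return static_cast<uint8_t>(1u << bit);
    }

Under these assumptions, an edge executed exactly once maps to ``0x01`` and an
edge executed 200 times maps to ``0x80``.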

Tracing data flow
=================

An *experimental* feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra
instrumentation around comparison instructions and switch statements.
The fuzzer will need to define the following functions, which will be called by
the instrumented code.

.. code-block:: c++

    // Called before a comparison instruction.
    // SizeAndType is a packed value containing
    //   - [63:32] the Size of the operands of comparison in bits
    //   - [31:0]  the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
    // Arg1 and Arg2 are arguments of the comparison.
    void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

    // Called before a switch statement.
    // Val is the switch operand.
    // Cases[0] is the number of case constants.
    // Cases[1] is the size of Val in bits.
    // Cases[2:] are the case constants.
    void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only
for single-threaded targets.

Output directory
================

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.
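
As a minimal sketch of the last option, a program could request a dump at a
point of its own choosing, for example before entering a phase that is likely
to be killed. The helper name ``DumpCoverageCheckpoint`` is hypothetical. The
weak declaration is an assumption made so that the snippet also links when the
sanitizer runtime is absent (it relies on GCC/Clang attributes on an ELF
platform such as Linux); in an instrumented build you would normally include
``<sanitizer/coverage_interface.h>`` instead.

.. code-block:: c++

    // Weak declaration (assumption: GCC or Clang on an ELF platform), so the
    // symbol resolves to null when no sanitizer runtime is linked in.
    extern "C" __attribute__((weak)) void __sanitizer_cov_dump();

    // Hypothetical helper: asks the runtime to save the coverage collected so
    // far. Returns true if the runtime was present and the dump was requested,
    // false if the program was built without a sanitizer runtime.
    bool DumpCoverageCheckpoint() {
      if (!__sanitizer_cov_dump)
        return false;
      __sanitizer_cov_dump();
      return true;
    }

The call only has an effect when the program is also run with ``coverage=1``
in the appropriate ``*SAN_OPTIONS`` variable.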

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes 2x more data than the default,
because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data can be useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. This will tell the fuzzer if the coverage has
increased after testing every new input.

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is **fast**.
With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

==============  =========  =========  =========  =========  =========  =========
benchmark       cov0       cov1       diff 0-1   cov2       diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
400.perlbench   1296.00    1307.00    1.01       1465.00    1.13       1.12
401.bzip2       858.00     854.00     1.00       1010.00    1.18       1.18
403.gcc         613.00     617.00     1.01       683.00     1.11       1.11
429.mcf         605.00     582.00     0.96       610.00     1.01       1.05
445.gobmk       896.00     880.00     0.98       1050.00    1.17       1.19
456.hmmer       892.00     892.00     1.00       918.00     1.03       1.03
458.sjeng       995.00     1009.00    1.01       1217.00    1.22       1.21
462.libquantum  497.00     492.00     0.99       534.00     1.07       1.09
464.h264ref     1461.00    1467.00    1.00       1543.00    1.06       1.05
471.omnetpp     575.00     590.00     1.03       660.00     1.15       1.12
473.astar       658.00     652.00     0.99       715.00     1.09       1.10
483.xalancbmk   471.00     491.00     1.04       582.00     1.24       1.19
433.milc        616.00     627.00     1.02       627.00     1.02       1.00
444.namd        602.00     601.00     1.00       654.00     1.09       1.09
447.dealII      630.00     634.00     1.01       653.00     1.04       1.03
450.soplex      365.00     368.00     1.01       395.00     1.08       1.07
453.povray      427.00     434.00     1.02       495.00     1.16       1.14
470.lbm         357.00     375.00     1.05       370.00     1.04       0.99
482.sphinx3     927.00     928.00     1.00       1000.00    1.08       1.08
==============  =========  =========  =========  =========  =========  =========

Why another coverage?
=====================

Why did we implement yet another code coverage tool?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.