=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`,
UndefinedBehaviorSanitizer, or without any sanitizer. Pass one of the
following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  **extra** slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``,
``LSAN_OPTIONS``, ``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as
appropriate. For the standalone coverage mode, use ``UBSAN_OPTIONS``.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file is also created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last byte
of the magic defines the size of the following offsets. The rest of the data is
the offsets in the corresponding binary/DSO that were executed during the run.

A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

Sancov Tool
===========

A new experimental ``sancov`` tool is being developed to process coverage
files. The tool is part of the LLVM project and is currently supported only on
Linux. It can handle symbolization tasks autonomously without any extra support
from the environment. You need to pass it the .sancov files (named
``<module_name>.<pid>.sancov``) and the paths to all corresponding binary ELF
files. Sancov matches these files using module names and binary file names.

.. code-block:: console

    USAGE: sancov [options] <action> (<binary file>|<.sancov file>)...

    Action (required)
      -print                    - Print coverage addresses
      -covered-functions        - Print all covered functions.
      -not-covered-functions    - Print all not covered functions.
      -html-report              - Print HTML coverage report.

    Options
      -blacklist=<string>         - Blacklist file (sanitizer blacklist format).
      -demangle                   - Print demangled function name.
      -strip_path_prefix=<string> - Strip this prefix from file paths in reports


Automatic HTML Report Generation
================================

If ``*SAN_OPTIONS`` contains the ``html_cov_report=1`` option, an HTML
coverage report is generated automatically alongside the coverage files.
The ``sancov`` binary should be present in ``PATH``; otherwise the
``sancov_path=<path_to_sancov>`` option can be used to specify the tool
location.


How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered
PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c

Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks, let's name them A, B, C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C

Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage is also
dumped as a bitset (a text file with 1 for blocks that have been executed and 0
for blocks that have not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (well,
unless dlopen/dlclose come into play), so the bitset coverage can be
easily used for bitset-based corpus distillation.
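
The distillation step can be sketched as follows. This is a minimal
illustration, not part of the sanitizer run-time: ``MergeBitset`` is a
hypothetical helper, and it assumes each ``*.bitset-sancov`` file has already
been read into a string of ``0`` and ``1`` characters with any trailing
newline removed.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <string>

    // Hypothetical helper: folds the bitset of one run into the accumulated
    // bitset `seen` and returns the number of newly covered positions.
    // Both arguments are strings of '0'/'1' characters, one per block.
    std::size_t MergeBitset(std::string &seen, const std::string &run) {
      // Lengths normally match for one executable; grow `seen` if they do
      // not (for example on the first call, or after dlopen).
      if (seen.size() < run.size())
        seen.resize(run.size(), '0');
      std::size_t new_bits = 0;
      for (std::size_t i = 0; i < run.size(); ++i) {
        assert(run[i] == '0' || run[i] == '1');
        if (run[i] == '1' && seen[i] == '0') {
          seen[i] = '1';
          ++new_bits;
        }
      }
      return new_bits;
    }

Starting from an empty ``seen``, merging the two bitsets from the example
above (``01101``, then ``11011``) returns 3 and then 2, so both inputs would
be kept. An input whose merge returns 0 adds no coverage and can be dropped
from the corpus.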

Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures caller and callee. At shutdown the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded, the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`__'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge) the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is *not* thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

Tracing basic blocks
====================

Experimental support for basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic
block, or edge (depending on the value of
``-fsanitize-coverage=[func,bb,edge]``).
Example:

.. code-block:: console

    % clang -g -fsanitize=address -fsanitize-coverage=edge,trace-bb foo.cc
    % ASAN_OPTIONS=coverage=1 ./a.out

This will produce two files after the process exits:
``trace-points.PID.sancov`` and ``trace-events.PID.sancov``.
The first file contains a textual description of all the instrumented points in
the program in a form that you can feed into llvm-symbolizer
(e.g. ``a.out 0x4dca89``), one per line.
The second file contains the actual execution trace as a sequence of 4-byte
integers. These integers are indices into the array of instrumented points
(the first file).
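
Decoding such a trace could look like the sketch below. ``DecodeTrace`` is a
hypothetical helper, not part of any LLVM tool. It assumes the events file has
been read into a byte buffer and the points file into one string per line, that
the integers are stored in host byte order, and that they are zero-based
unsigned indices. None of these details is specified here, so check them
against real output before relying on this.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <string>
    #include <vector>

    // Hypothetical helper: maps every 4-byte index in `events` to the
    // matching line of the points file, in execution order.
    std::vector<std::string> DecodeTrace(
        const std::vector<std::string> &points,
        const std::vector<unsigned char> &events) {
      // A well-formed events file holds whole 4-byte integers.
      assert(events.size() % sizeof(std::uint32_t) == 0);
      std::vector<std::string> trace;
      for (std::size_t off = 0; off + sizeof(std::uint32_t) <= events.size();
           off += sizeof(std::uint32_t)) {
        std::uint32_t index;
        // Assumption: indices are stored in host byte order.
        std::memcpy(&index, events.data() + off, sizeof(index));
        // Indices outside the points file are reported, not dereferenced.
        trace.push_back(index < points.size()
                            ? points[index]
                            : std::string("<unknown point>"));
      }
      return trace;
    }

Each returned string keeps the ``module offset`` form of the points file, so
the decoded trace can be passed to llvm-symbolizer line by line.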

Basic block tracing is currently supported only for single-threaded applications.


Tracing PCs
===========

*Experimental* feature similar to tracing basic blocks, but with a different API.
With ``-fsanitize-coverage=trace-pc`` the compiler will insert
``__sanitizer_cov_trace_pc()`` on every edge.
With an additional ``...=trace-pc,indirect-calls`` flag
``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every
indirect call.
These callbacks are not implemented in the Sanitizer run-time and should be
defined by the user, so these flags can be used without any other sanitizer.
This mechanism is used for fuzzing the Linux kernel
(https://github.com/google/syzkaller) and can be used with
`AFL <http://lcamtuf.coredump.cx/afl>`__.

Tracing data flow
=================

An *experimental* feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra
instrumentation around comparison instructions and switch statements.
The fuzzer needs to define the following functions, which are called by the
instrumented code.

.. code-block:: c++

    // Called before a comparison instruction.
    // SizeAndType is a packed value containing
    //   - [63:32] the Size of the operands of comparison in bits
    //   - [31:0]  the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
    // Arg1 and Arg2 are arguments of the comparison.
    void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

    // Called before a switch statement.
    // Val is the switch operand.
    // Cases[0] is the number of case constants.
    // Cases[1] is the size of Val in bits.
    // Cases[2:] are the case constants.
    void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only
for single-threaded targets.

Output directory
================

By default, .sancov files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes 2x more data than the default,
because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data can be useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).
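
The core loop of such a fuzzer can be sketched as below. ``KeepInteresting``,
``Input``, and the two callbacks are hypothetical names used only for
illustration. In a real fuzzer ``coverage`` would wrap
``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>`` and ``target`` would call the code under
test; both are passed in here so that the sketch compiles without the sanitizer
run-time.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <vector>

    using Input = std::vector<std::uint8_t>;

    // Hypothetical helper: runs every candidate once and returns those
    // after which the coverage metric grew.
    std::vector<Input> KeepInteresting(
        const std::vector<Input> &candidates,
        const std::function<void(const Input &)> &target,
        const std::function<std::size_t()> &coverage) {
      assert(target && coverage);
      std::vector<Input> corpus;
      std::size_t best = coverage();
      for (const Input &input : candidates) {
        target(input);
        std::size_t now = coverage();
        if (now > best) {  // this input reached something new
          best = now;
          corpus.push_back(input);
        }
      }
      return corpus;
    }

Comparing the metric before and after each input is what lets the fuzzer
decide whether to keep that input for further mutation.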

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. This tells the fuzzer whether the coverage has
increased after testing each new input.

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is **fast**. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

==============  =========  =========  =========  =========  =========  =========
     benchmark       cov0       cov1   diff 0-1       cov2   diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
 400.perlbench    1296.00    1307.00       1.01    1465.00       1.13       1.12
     401.bzip2     858.00     854.00       1.00    1010.00       1.18       1.18
       403.gcc     613.00     617.00       1.01     683.00       1.11       1.11
       429.mcf     605.00     582.00       0.96     610.00       1.01       1.05
     445.gobmk     896.00     880.00       0.98    1050.00       1.17       1.19
     456.hmmer     892.00     892.00       1.00     918.00       1.03       1.03
     458.sjeng     995.00    1009.00       1.01    1217.00       1.22       1.21
462.libquantum     497.00     492.00       0.99     534.00       1.07       1.09
   464.h264ref    1461.00    1467.00       1.00    1543.00       1.06       1.05
   471.omnetpp     575.00     590.00       1.03     660.00       1.15       1.12
     473.astar     658.00     652.00       0.99     715.00       1.09       1.10
 483.xalancbmk     471.00     491.00       1.04     582.00       1.24       1.19
      433.milc     616.00     627.00       1.02     627.00       1.02       1.00
      444.namd     602.00     601.00       1.00     654.00       1.09       1.09
    447.dealII     630.00     634.00       1.01     653.00       1.04       1.03
    450.soplex     365.00     368.00       1.01     395.00       1.08       1.07
    453.povray     427.00     434.00       1.02     495.00       1.16       1.14
       470.lbm     357.00     375.00       1.05     370.00       1.04       0.99
   482.sphinx3     927.00     928.00       1.00    1000.00       1.08       1.08
==============  =========  =========  =========  =========  =========  =========

Why another coverage?
=====================

Why did we implement yet another code coverage tool?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.