=================
DataFlowSanitizer
=================

.. toctree::
   :hidden:

   DataFlowSanitizerDesign

.. contents::
   :local:

Introduction
============

DataFlowSanitizer is a generalised dynamic data flow analysis.

Unlike other Sanitizer tools, this tool is not designed to detect a
specific class of bugs on its own. Instead, it provides a generic
dynamic data flow analysis framework to be used by clients to help
detect application-specific issues within their own code.

How to build libc++ with DFSan
==============================

DFSan requires either that all of your code be instrumented or that
uninstrumented functions be listed as ``uninstrumented`` in the `ABI list`_.

If you'd like to have instrumented libc++ functions, then you need to build it
with DFSan instrumentation from source. Here is an example of how to build
libc++ and the libc++ ABI with DataFlowSanitizer instrumentation.

.. code-block:: console

  cd libcxx-build

  # An example using ninja
  cmake -GNinja path/to/llvm-project/llvm \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DLLVM_USE_SANITIZER="DataFlow" \
    -DLLVM_ENABLE_LIBCXX=ON \
    -DLLVM_ENABLE_PROJECTS="libcxx;libcxxabi"

  ninja cxx cxxabi

Note: Ensure you are building with a sufficiently new version of Clang.

Usage
=====

With no program changes, applying DataFlowSanitizer to a program
will not alter its behavior. To use DataFlowSanitizer, the program
calls API functions to apply labels to data, which causes that data to be
tracked, and to check the label of a specific data item. DataFlowSanitizer
manages the propagation of labels through the program according to its data
flow.

The APIs are defined in the header file ``sanitizer/dfsan_interface.h``.
For further information about each function, please refer to the header
file.
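
The propagation rule described above can be pictured with a small conceptual
model in plain C. This is an illustration under stated assumptions, not how
DataFlowSanitizer is implemented: DFSan keeps labels in shadow memory and
inserts the propagation code at compile time, so an instrumented program never
handles a struct like ``labelled_int``. The names ``labelled_int``,
``model_set_label``, ``model_add`` and ``model_has_label`` are hypothetical, and
labels are modeled as 16-bit bit sets (one bit per base label, as in
fast16labels mode) so that a union is simply a bitwise OR.

.. code-block:: c

  #include <assert.h>
  #include <stdint.h>

  /* Conceptual model only (hypothetical names): a label is a bit set with one
     bit per base label, so the union of two labels is their bitwise OR. */
  typedef uint16_t model_label;

  /* A value paired with every base label it was computed from. In real DFSan
     this label lives in shadow memory, not next to the value. */
  typedef struct {
    int value;
    model_label label;
  } labelled_int;

  /* Plays the role of dfsan_set_label: attach a label to a value. */
  static labelled_int model_set_label(int value, model_label label) {
    labelled_int v = {value, label};
    return v;
  }

  /* The sum depends on both operands, so its label is the union of theirs. */
  static labelled_int model_add(labelled_int a, labelled_int b) {
    labelled_int r = {a.value + b.value, (model_label)(a.label | b.label)};
    return r;
  }

  /* Plays the role of dfsan_has_label: does v depend on base label `base`? */
  static int model_has_label(labelled_int v, model_label base) {
    assert(base != 0); /* querying the empty label is not meaningful */
    return (v.label & base) == base;
  }

With this model, adding a value labelled ``1`` to a value labelled ``2`` yields
a result labelled ``3``, which carries both base labels; this mirrors what the
Example section checks with the real API.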

.. _ABI list:

ABI List
--------

DataFlowSanitizer uses a list of functions known as an ABI list to decide
whether a call to a specific function should use the operating system's native
ABI or whether it should use a variant of this ABI that also propagates labels
through function parameters and return values. The ABI list file also controls
how labels are propagated in the former case. DataFlowSanitizer comes with a
default ABI list which is intended to eventually cover the glibc library on
Linux, but it may become necessary for users to extend the ABI list in cases
where a particular library or function cannot be instrumented (e.g. because
it is implemented in assembly or another language which DataFlowSanitizer does
not support) or a function is called from a library or function which cannot
be instrumented.

DataFlowSanitizer's ABI list file is a :doc:`SanitizerSpecialCaseList`.
The pass treats every function in the ``uninstrumented`` category in the
ABI list file as conforming to the native ABI. Unless the ABI list contains
additional categories for those functions, a call to one of those functions
will produce a warning message, as the labelling behavior of the function
is unknown. The other supported categories are ``discard``, ``functional``
and ``custom``.

* ``discard`` -- To the extent that this function writes to (user-accessible)
  memory, it also updates labels in shadow memory (this condition is trivially
  satisfied for functions which do not write to user-accessible memory). Its
  return value is unlabelled.
* ``functional`` -- Like ``discard``, except that the label of its return value
  is the union of the labels of its arguments.
* ``custom`` -- Instead of calling the function, a custom wrapper ``__dfsw_F``
  is called, where ``F`` is the name of the function. This function may wrap
  the original function or provide its own implementation. This category is
  generally used for uninstrumentable functions which write to user-accessible
  memory or which have more complex label propagation behavior. The signature
  of ``__dfsw_F`` is based on that of ``F``, with one additional argument of
  type ``dfsan_label`` per original argument appended to the argument list,
  holding that argument's label. If ``F`` has a non-void return type, a final
  argument of type ``dfsan_label *`` is appended, through which the custom
  function can store the label of the return value. For example:

.. code-block:: c++

  void f(int x);
  void __dfsw_f(int x, dfsan_label x_label);

  void *memcpy(void *dest, const void *src, size_t n);
  void *__dfsw_memcpy(void *dest, const void *src, size_t n,
                      dfsan_label dest_label, dfsan_label src_label,
                      dfsan_label n_label, dfsan_label *ret_label);

If a function defined in the translation unit being compiled belongs to the
``uninstrumented`` category, it will be compiled so as to conform to the
native ABI. Its arguments will be assumed to be unlabelled, but it will
propagate labels in shadow memory.

For example:

.. code-block:: none

  # main is called by the C runtime using the native ABI.
  fun:main=uninstrumented
  fun:main=discard

  # malloc only writes to its internal data structures, not user-accessible memory.
  fun:malloc=uninstrumented
  fun:malloc=discard

  # tolower is a pure function.
  fun:tolower=uninstrumented
  fun:tolower=functional

  # memcpy needs to copy the shadow from the source to the destination region.
  # This is done in a custom function.
  fun:memcpy=uninstrumented
  fun:memcpy=custom

Example
=======

The following program demonstrates label propagation by asserting which input
labels each computed sum does and does not carry.

.. code-block:: c++

  #include <sanitizer/dfsan_interface.h>
  #include <assert.h>

  int main(void) {
    int i = 1;
    dfsan_label i_label = dfsan_create_label("i", 0);
    dfsan_set_label(i_label, &i, sizeof(i));

    int j = 2;
    dfsan_label j_label = dfsan_create_label("j", 0);
    dfsan_set_label(j_label, &j, sizeof(j));

    int k = 3;
    dfsan_label k_label = dfsan_create_label("k", 0);
    dfsan_set_label(k_label, &k, sizeof(k));

    dfsan_label ij_label = dfsan_get_label(i + j);
    assert(dfsan_has_label(ij_label, i_label));
    assert(dfsan_has_label(ij_label, j_label));
    assert(!dfsan_has_label(ij_label, k_label));

    dfsan_label ijk_label = dfsan_get_label(i + j + k);
    assert(dfsan_has_label(ijk_label, i_label));
    assert(dfsan_has_label(ijk_label, j_label));
    assert(dfsan_has_label(ijk_label, k_label));

    return 0;
  }

fast16labels mode
=================

If you need 16 or fewer labels, you can use fast16labels instrumentation for
lower CPU and code-size overhead. To use fast16labels instrumentation, you'll
need to specify ``-fsanitize=dataflow -mllvm -dfsan-fast-16-labels`` in your
compile and link commands and use a modified API for creating and managing
labels.

In fast16labels mode, base labels are simply 16-bit unsigned integers that are
powers of 2 (i.e. 1, 2, 4, 8, ..., 32768), and union labels are created by ORing
base labels. In this mode DFSan does not manage any label metadata, so the
functions ``dfsan_create_label``, ``dfsan_union``, ``dfsan_get_label_info``,
``dfsan_has_label``, ``dfsan_has_label_with_desc``, ``dfsan_get_label_count``,
and ``dfsan_dump_labels`` are unsupported. Instead of using them, the user
should maintain any necessary metadata about base labels themselves.
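
One way to keep that metadata is a small table indexed by bit position. The
sketch below assumes hypothetical helpers ``make_base_label`` and
``describe_label``, which are not part of ``sanitizer/dfsan_interface.h``. It
uses ``uint16_t`` for labels, matching the 16-bit labels described above, so
that it compiles on its own without the DFSan runtime.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* Hypothetical user-maintained metadata for fast16labels mode: base label
     number n is the bit (1 << n), described by base_label_names[n]. */
  static const char *base_label_names[16];

  /* Stand-in for dfsan_create_label, which is unsupported in this mode:
     records a description and returns the base label value 1 << bit. */
  static uint16_t make_base_label(int bit, const char *desc) {
    assert(bit >= 0 && bit < 16);
    base_label_names[bit] = desc;
    return (uint16_t)(1u << bit);
  }

  /* Writes the descriptions of all base labels contained in `label` into
     `out`, separated by '|' (e.g. "i|j"). Unregistered bits appear as "?".
     Output is truncated to fit; `out_size` must be at least 1. */
  static void describe_label(uint16_t label, char *out, size_t out_size) {
    out[0] = '\0';
    for (int bit = 0; bit < 16; ++bit) {
      if (!(label & (1u << bit)))
        continue;
      const char *name = base_label_names[bit] ? base_label_names[bit] : "?";
      if (out[0] != '\0')
        strncat(out, "|", out_size - strlen(out) - 1);
      strncat(out, name, out_size - strlen(out) - 1);
    }
  }

In an instrumented program, once ``i`` and ``j`` have been registered with
``make_base_label``, the label returned by ``dfsan_get_label(i + j)`` could be
passed to ``describe_label`` to produce the string ``i|j``.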

The following program performs the same checks as the program in the Example
section above, using fast16labels mode:

.. code-block:: c++

  #include <sanitizer/dfsan_interface.h>
  #include <assert.h>

  int main(void) {
    int i = 100;
    int j = 200;
    int k = 300;
    dfsan_label i_label = 1;
    dfsan_label j_label = 2;
    dfsan_label k_label = 4;
    dfsan_set_label(i_label, &i, sizeof(i));
    dfsan_set_label(j_label, &j, sizeof(j));
    dfsan_set_label(k_label, &k, sizeof(k));

    dfsan_label ij_label = dfsan_get_label(i + j);

    assert(ij_label & i_label);     // ij_label has i_label
    assert(ij_label & j_label);     // ij_label has j_label
    assert(!(ij_label & k_label));  // ij_label doesn't have k_label
    assert(ij_label == 3);          // Verifies all of the above

    dfsan_label ijk_label = dfsan_get_label(i + j + k);

    assert(ijk_label & i_label);    // ijk_label has i_label
    assert(ijk_label & j_label);    // ijk_label has j_label
    assert(ijk_label & k_label);    // ijk_label has k_label
    assert(ijk_label == 7);         // Verifies all of the above

    return 0;
  }

Current status
==============

DataFlowSanitizer is a work in progress, currently under development for
x86\_64 Linux.

Design
======

Please refer to the :doc:`design document<DataFlowSanitizerDesign>`.