=================
DataFlowSanitizer
=================

.. toctree::
   :hidden:

   DataFlowSanitizerDesign

.. contents::
   :local:

Introduction
============

DataFlowSanitizer is a generalised dynamic data flow analysis.

Unlike other Sanitizer tools, this tool is not designed to detect a
specific class of bugs on its own.  Instead, it provides a generic
dynamic data flow analysis framework to be used by clients to help
detect application-specific issues within their own code.

How to build libc++ with DFSan
==============================

DFSan requires either all of your code to be instrumented or for uninstrumented
functions to be listed as ``uninstrumented`` in the `ABI list`_.

If you'd like to have instrumented libc++ functions, then you need to build it
with DFSan instrumentation from source. Here is an example of how to build
libc++ and the libc++ ABI with data flow sanitizer instrumentation.

.. code-block:: console

  cd libcxx-build

  # An example using ninja
  cmake -GNinja path/to/llvm-project/llvm \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DLLVM_USE_SANITIZER="DataFlow" \
    -DLLVM_ENABLE_LIBCXX=ON \
    -DLLVM_ENABLE_PROJECTS="libcxx;libcxxabi"

  ninja cxx cxxabi

Note: Ensure you are building with a sufficiently new version of Clang.

Usage
=====

With no program changes, applying DataFlowSanitizer to a program
will not alter its behavior.  To use DataFlowSanitizer, the program
uses API functions to apply tags to data to cause it to be tracked, and to
check the tag of a specific data item.  DataFlowSanitizer manages
the propagation of tags through the program according to its data flow.

The APIs are defined in the header file ``sanitizer/dfsan_interface.h``.
For further information about each function, please refer to the header
file.

.. _ABI list:

ABI List
--------

DataFlowSanitizer uses a list of functions known as an ABI list to decide
whether a call to a specific function should use the operating system's native
ABI or whether it should use a variant of this ABI that also propagates labels
through function parameters and return values.  The ABI list file also controls
how labels are propagated in the former case.  DataFlowSanitizer comes with a
default ABI list which is intended to eventually cover the glibc library on
Linux.  Users may need to extend the ABI list when a particular library or
function cannot be instrumented (e.g. because it is implemented in assembly or
another language which DataFlowSanitizer does not support), or when a function
is called from a library or function which cannot be instrumented.

DataFlowSanitizer's ABI list file is a :doc:`SanitizerSpecialCaseList`.
The pass treats every function in the ``uninstrumented`` category in the
ABI list file as conforming to the native ABI.  Unless the ABI list contains
additional categories for those functions, a call to one of those functions
will produce a warning message, as the labelling behavior of the function
is unknown.  The other supported categories are ``discard``, ``functional``
and ``custom``.

* ``discard`` -- To the extent that this function writes to (user-accessible)
  memory, it also updates labels in shadow memory (this condition is trivially
  satisfied for functions which do not write to user-accessible memory).  Its
  return value is unlabelled.
* ``functional`` -- Like ``discard``, except that the label of its return value
  is the union of the labels of its arguments.
* ``custom`` -- Instead of calling the function, a custom wrapper ``__dfsw_F``
  is called, where ``F`` is the name of the function.  This function may wrap
  the original function or provide its own implementation.  This category is
  generally used for uninstrumentable functions which write to user-accessible
  memory or which have more complex label propagation behavior.  The signature
  of ``__dfsw_F`` is based on that of ``F``, with a label of type
  ``dfsan_label`` appended to the argument list for each argument.  If ``F``
  has a non-void return type, a final argument of type ``dfsan_label *`` is
  appended, through which the custom function can store the label for the
  return value.  For example:

.. code-block:: c++

  void f(int x);
  void __dfsw_f(int x, dfsan_label x_label);

  void *memcpy(void *dest, const void *src, size_t n);
  void *__dfsw_memcpy(void *dest, const void *src, size_t n,
                      dfsan_label dest_label, dfsan_label src_label,
                      dfsan_label n_label, dfsan_label *ret_label);
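
The body of such a wrapper might look like the following minimal sketch.  The
function ``scale`` is hypothetical, and the ``dfsan_label`` typedef is a
stand-in that lets the snippet compile without the DFSan runtime; in an
instrumented build the type comes from ``sanitizer/dfsan_interface.h``.  The
sketch also assumes fast16labels mode, where ORing two labels forms their
union.

.. code-block:: c++

  #include <cstdint>

  // Stand-in for the type declared in sanitizer/dfsan_interface.h
  // (assumption: 16-bit labels, as in fast16labels mode).
  typedef uint16_t dfsan_label;

  // Hypothetical uninstrumented function F, listed as "custom" in the ABI list.
  extern "C" int scale(int x, int factor) { return x * factor; }

  // Custom wrapper: one label per argument is appended to the argument list,
  // followed by a pointer through which the return value's label is stored.
  extern "C" int __dfsw_scale(int x, int factor, dfsan_label x_label,
                              dfsan_label factor_label,
                              dfsan_label *ret_label) {
    // Assumed policy: the result depends on both arguments, so its label is
    // their union.  ORing is only a valid union in fast16labels mode; in the
    // default mode a wrapper would call dfsan_union instead.
    *ret_label = static_cast<dfsan_label>(x_label | factor_label);
    return scale(x, factor);
  }

Called as ``__dfsw_scale(3, 4, 1, 2, &ret_label)``, this sketch returns 12 and
stores 3 (the OR of labels 1 and 2) in ``ret_label``.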

If a function defined in the translation unit being compiled belongs to the
``uninstrumented`` category, it will be compiled so as to conform to the
native ABI.  Its arguments will be assumed to be unlabelled, but it will
propagate labels in shadow memory.

For example:

.. code-block:: none

  # main is called by the C runtime using the native ABI.
  fun:main=uninstrumented
  fun:main=discard

  # malloc only writes to its internal data structures, not user-accessible memory.
  fun:malloc=uninstrumented
  fun:malloc=discard

  # tolower is a pure function.
  fun:tolower=uninstrumented
  fun:tolower=functional

  # memcpy needs to copy the shadow from the source to the destination region.
  # This is done in a custom function.
  fun:memcpy=uninstrumented
  fun:memcpy=custom

Example
=======

The following program demonstrates label propagation by checking that
the correct labels are propagated.

.. code-block:: c++

  #include <sanitizer/dfsan_interface.h>
  #include <assert.h>

  int main(void) {
    int i = 1;
    dfsan_label i_label = dfsan_create_label("i", 0);
    dfsan_set_label(i_label, &i, sizeof(i));

    int j = 2;
    dfsan_label j_label = dfsan_create_label("j", 0);
    dfsan_set_label(j_label, &j, sizeof(j));

    int k = 3;
    dfsan_label k_label = dfsan_create_label("k", 0);
    dfsan_set_label(k_label, &k, sizeof(k));

    dfsan_label ij_label = dfsan_get_label(i + j);
    assert(dfsan_has_label(ij_label, i_label));
    assert(dfsan_has_label(ij_label, j_label));
    assert(!dfsan_has_label(ij_label, k_label));

    dfsan_label ijk_label = dfsan_get_label(i + j + k);
    assert(dfsan_has_label(ijk_label, i_label));
    assert(dfsan_has_label(ijk_label, j_label));
    assert(dfsan_has_label(ijk_label, k_label));

    return 0;
  }

fast16labels mode
=================

If you need 16 or fewer labels, you can use fast16labels instrumentation for
less CPU and code size overhead.  To use fast16labels instrumentation, you'll
need to specify ``-fsanitize=dataflow -mllvm -dfsan-fast-16-labels`` in your
compile and link commands and use a modified API for creating and managing
labels.

In fast16labels mode, base labels are simply 16-bit unsigned integers that are
powers of 2 (i.e. 1, 2, 4, 8, ..., 32768), and union labels are created by ORing
base labels.  In this mode DFSan does not manage any label metadata, so the
functions ``dfsan_create_label``, ``dfsan_union``, ``dfsan_get_label_info``,
``dfsan_has_label``, ``dfsan_has_label_with_desc``, ``dfsan_get_label_count``,
and ``dfsan_dump_labels`` are unsupported.  Instead of using them, users should
maintain any necessary metadata about base labels themselves.

For example:

.. code-block:: c++

  #include <sanitizer/dfsan_interface.h>
  #include <assert.h>

  int main(void) {
    int i = 100;
    int j = 200;
    int k = 300;
    dfsan_label i_label = 1;
    dfsan_label j_label = 2;
    dfsan_label k_label = 4;
    dfsan_set_label(i_label, &i, sizeof(i));
    dfsan_set_label(j_label, &j, sizeof(j));
    dfsan_set_label(k_label, &k, sizeof(k));

    dfsan_label ij_label = dfsan_get_label(i + j);

    assert(ij_label & i_label);  // ij_label has i_label
    assert(ij_label & j_label);  // ij_label has j_label
    assert(!(ij_label & k_label));  // ij_label doesn't have k_label
    assert(ij_label == 3);  // Verifies all of the above

    dfsan_label ijk_label = dfsan_get_label(i + j + k);

    assert(ijk_label & i_label);  // ijk_label has i_label
    assert(ijk_label & j_label);  // ijk_label has j_label
    assert(ijk_label & k_label);  // ijk_label has k_label
    assert(ijk_label == 7);  // Verifies all of the above

    return 0;
  }
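
Because label metadata is left to the user in this mode, one possible approach
is a small user-side table that hands out base labels and remembers a
description for each.  The sketch below is only an illustration:
``LabelRegistry`` and its methods are hypothetical names, not part of the DFSan
API, and the ``dfsan_label`` typedef is a stand-in so that the snippet compiles
without the DFSan runtime.

.. code-block:: c++

  #include <array>
  #include <cstddef>
  #include <cstdint>
  #include <string>

  // Stand-in for the type declared in sanitizer/dfsan_interface.h
  // (assumption: 16-bit labels, as described for fast16labels mode).
  typedef uint16_t dfsan_label;

  // Hypothetical user-side registry: slot n describes base label (1 << n).
  class LabelRegistry {
  public:
    // Returns the next unused base label, or 0 once all 16 are taken.
    dfsan_label create(const std::string &desc) {
      if (count_ == descs_.size())
        return 0;
      descs_[count_] = desc;
      return static_cast<dfsan_label>(1u << count_++);
    }

    // Lists the descriptions of the base labels contained in a union label,
    // separated by commas.
    std::string describe(dfsan_label label) const {
      std::string out;
      for (std::size_t bit = 0; bit < count_; ++bit) {
        if (label & (1u << bit)) {
          if (!out.empty())
            out += ",";
          out += descs_[bit];
        }
      }
      return out;
    }

  private:
    std::array<std::string, 16> descs_;
    std::size_t count_ = 0;
  };

If ``i``, ``j`` and ``k`` are registered in that order, ``create`` returns 1, 2
and 4, matching the hard-coded labels in the example above, and
``describe(3)`` yields ``i,j``.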

Current status
==============

DataFlowSanitizer is a work in progress, currently under development for
x86\_64 Linux.

Design
======

Please refer to the :doc:`design document<DataFlowSanitizerDesign>`.
