## fdsan

[TOC]

fdsan is a file descriptor sanitizer added to Android in API level 29.
In API level 29, fdsan warns when it finds a bug.
In API level 30, fdsan aborts when it finds a bug.

### Background
*What problem is fdsan trying to solve? Why should I care?*

fdsan (file descriptor sanitizer) detects mishandling of file descriptor ownership, which tends to manifest as *use-after-close* and *double-close*. These errors are direct analogues of the memory allocation *use-after-free* and *double-free* bugs, but tend to be much more difficult to diagnose and fix. With `malloc` and `free`, implementations have free rein to detect errors and abort on double free. File descriptors, on the other hand, are mandated by the POSIX standard to be allocated with the lowest available number being returned for new allocations. As a result, many file descriptor bugs can *never* be noticed on the thread on which the error occurred, and will manifest as "impossible" behavior on another thread.

For example, given two threads running the following code:
```cpp
void thread_one() {
    int fd = open("/dev/null", O_RDONLY);
    close(fd);
    close(fd);
}

void thread_two() {
    while (true) {
        int fd = open("log", O_WRONLY | O_APPEND);
        if (write(fd, "foo", 3) != 3) {
            err(1, "write failed!");
        }
    }
}
```
the following interleaving is possible:
```cpp
thread one                                thread two
open("/dev/null", O_RDONLY) = 123
close(123) = 0
                                          open("log", O_WRONLY | O_APPEND) = 123
close(123) = 0
                                          write(123, "foo", 3) = -1 (EBADF)
                                          err(1, "write failed!")
```

Assertion failures are probably the most innocuous result that can arise from these bugs: silent data corruption [[1](#footnotes), [2](#footnotes)] or security vulnerabilities are also possible (e.g. suppose thread two was saving user data to disk when a third thread came in and opened a socket to the Internet).
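
The number reuse that makes the interleaving above possible can be observed even in a single thread. The following is a minimal sketch, assuming a POSIX system with `/dev/null` and no other thread opening or closing descriptors at the same time; the function name `closed_fd_number_is_reused` is made up for illustration.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Returns true if the descriptor number freed by close() is handed out again
// by the very next open(). Only meaningful while no other thread is opening
// or closing descriptors concurrently.
bool closed_fd_number_is_reused() {
    int first = open("/dev/null", O_RDONLY);
    if (first == -1) {
        return false;
    }
    close(first);

    // The lowest available number is now `first` again, so a stale copy of
    // `first` held elsewhere would silently refer to this new file.
    int second = open("/dev/null", O_RDONLY);
    if (second == -1) {
        return false;
    }
    bool reused = (second == first);
    close(second);
    return reused;
}
```

Because the stale number becomes valid again immediately, a second `close(first)` at this point would succeed and close someone else's file rather than fail.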

### Design
*What does fdsan do?*

fdsan attempts to detect and/or prevent file descriptor mismanagement by enforcing file descriptor ownership. Just as most memory allocations can have their ownership handled by types such as `std::unique_ptr`, almost all file descriptors can be associated with a unique owner which is responsible for their closure. fdsan provides functions to associate a file descriptor with an owner; if someone tries to close a file descriptor that they don't own, depending on configuration, either a warning is emitted, or the process aborts.

This is implemented by providing functions to set a 64-bit closure tag on a file descriptor. The tag consists of an 8-bit type byte that identifies the type of the owner (`enum android_fdsan_owner_type` in [`<android/fdsan.h>`](https://android.googlesource.com/platform/bionic/+/main/libc/include/android/fdsan.h)), and a 56-bit value. The value should ideally be something that uniquely identifies the object (object address for native objects and `System.identityHashCode` for Java objects), but in cases where it's hard to derive an identifier for the "owner" that should close a file descriptor, even using the same value for all file descriptors in the module can be useful, since it'll catch other code that closes your file descriptors.

If a file descriptor that's been marked with a tag is closed with an incorrect tag, or without a tag, we know something has gone wrong, and can generate diagnostics or abort.
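
To make the tag layout concrete, here is a small portable sketch of how an 8-bit type and a 56-bit value could be packed into one 64-bit tag. The helper names are hypothetical, and placing the type in the most significant byte is an assumption made for illustration, not a statement about bionic's encoding. On a device, use the tag-creation function declared in `<android/fdsan.h>` rather than hand-rolling this.

```cpp
#include <cstdint>

// Hypothetical helpers illustrating the 8-bit type + 56-bit value split.
// Assumption: the type occupies the top byte; the real encoding may differ.
constexpr int kValueBits = 56;
constexpr uint64_t kValueMask = (uint64_t{1} << kValueBits) - 1;

// Packs an owner type and an identifying value into a single tag.
// Bits of `value` above the low 56 are discarded.
constexpr uint64_t make_owner_tag(uint8_t owner_type, uint64_t value) {
    return (uint64_t{owner_type} << kValueBits) | (value & kValueMask);
}

// Extracts the 8-bit owner type from a tag.
constexpr uint8_t owner_type_of(uint64_t tag) {
    return static_cast<uint8_t>(tag >> kValueBits);
}

// Extracts the 56-bit identifying value from a tag.
constexpr uint64_t owner_value_of(uint64_t tag) {
    return tag & kValueMask;
}
```

In this sketch, `make_owner_tag(3, 0x1234)` yields `0x0300000000001234`. Since only 56 bits of the value survive, an object address used as the value loses its top byte, which is one reason the value is best thought of as an identifier rather than a recoverable pointer.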

### Enabling fdsan (as a user)
*How do I use fdsan?*

fdsan has four severity levels:
 - disabled (`ANDROID_FDSAN_ERROR_LEVEL_DISABLED`)
 - warn-once (`ANDROID_FDSAN_ERROR_LEVEL_WARN_ONCE`)
   - Upon detecting an error, emit a warning to logcat, generate a tombstone, and then continue execution with fdsan disabled.
 - warn-always (`ANDROID_FDSAN_ERROR_LEVEL_WARN_ALWAYS`)
   - Same as warn-once, except without disabling after the first warning.
 - fatal (`ANDROID_FDSAN_ERROR_LEVEL_FATAL`)
   - Abort upon detecting an error.

In Android Q, fdsan has a global default of warn-once. fdsan can be made more or less strict at runtime via the `android_fdsan_set_error_level` function in [`<android/fdsan.h>`](https://android.googlesource.com/platform/bionic/+/main/libc/include/android/fdsan.h).

The likelihood of fdsan catching a file descriptor error is proportional to the percentage of file descriptors in your process that are tagged with an owner.
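
As a rough sketch of raising the severity at startup, a process might opt into the fatal level as shown below. The wrapper name `make_fdsan_fatal` is hypothetical. The null check assumes `android_fdsan_set_error_level` is declared as a weak symbol like the other fdsan functions; depending on your NDK version and minimum API level, that check may need adjusting. The `__ANDROID__` guard only exists so the snippet also compiles on a non-Android host, where it does nothing.

```cpp
#if defined(__ANDROID__)
#include <android/fdsan.h>
#endif

// Hypothetical helper: switches fdsan to the fatal level when the function
// is available at runtime. Returns true if the level was changed, false if
// fdsan is unavailable (older device or non-Android host).
bool make_fdsan_fatal() {
#if defined(__ANDROID__)
    // Assumption: the symbol is weak, so it is null on pre-Q devices.
    if (android_fdsan_set_error_level) {
        android_fdsan_set_error_level(ANDROID_FDSAN_ERROR_LEVEL_FATAL);
        return true;
    }
#endif
    return false;
}
```

Calling something like this early in `main`, before other threads start opening descriptors, keeps the behavior consistent for the whole process lifetime.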

### Using fdsan to fix a bug
*No, really, how do I use fdsan?*

Let's look at a simple contrived example that uses sleeps to force a particular interleaving of thread execution.

```cpp
#include <err.h>
#include <unistd.h>

#include <chrono>
#include <thread>
#include <vector>

#include <android-base/unique_fd.h>

using namespace std::chrono_literals;
using std::this_thread::sleep_for;

void victim() {
  sleep_for(300ms);
  int fd = dup(STDOUT_FILENO);
  sleep_for(200ms);
  ssize_t rc = write(fd, "good\n", 5);
  if (rc == -1) {
    err(1, "good failed to write?!");
  }
  close(fd);
}

void bystander() {
  sleep_for(100ms);
  int fd = dup(STDOUT_FILENO);
  sleep_for(300ms);
  close(fd);
}

void offender() {
  int fd = dup(STDOUT_FILENO);
  close(fd);
  sleep_for(200ms);
  close(fd);
}

int main() {
  std::vector<std::thread> threads;
  for (auto function : { victim, bystander, offender }) {
    threads.emplace_back(function);
  }
  for (auto& thread : threads) {
    thread.join();
  }
}
```

When running the program, the threads' executions will be interleaved as follows:

```cpp
// victim                         bystander                       offender
                                                                  int fd = dup(1); // 3
                                                                  close(3);
                                  int fd = dup(1); // 3
                                                                  close(3);
int fd = dup(1); // 3
                                  close(3);
write(3, "good\n") = -1; // EBADF
```

which results in the following output:

    fdsan_test: good failed to write?!: Bad file descriptor

This implies that either we're accidentally closing our file descriptor too early, or someone else is helpfully closing it for us. Let's use `android::base::unique_fd` in `victim` to guard the file descriptor with fdsan:

```diff
--- a/fdsan_test.cpp
+++ b/fdsan_test.cpp
@@ -12,13 +12,12 @@ using std::this_thread::sleep_for;

 void victim() {
   sleep_for(300ms);
-  int fd = dup(STDOUT_FILENO);
+  android::base::unique_fd fd(dup(STDOUT_FILENO));
   sleep_for(200ms);
   ssize_t rc = write(fd, "good\n", 5);
   if (rc == -1) {
     err(1, "good failed to write?!");
   }
-  close(fd);
 }

 void bystander() {
```

Now that we've guarded the file descriptor with fdsan, we should be able to find where the double close is:

```
pid: 25587, tid: 25589, name: fdsan_test  >>> fdsan_test <<<
signal 35 (<debuggerd signal>), code -1 (SI_QUEUE), fault addr --------
Abort message: 'attempted to close file descriptor 3, expected to be unowned, actually owned by unique_fd 0x7bf15dc448'
    x0  0000000000000000  x1  00000000000063f5  x2  0000000000000023  x3  0000007bf14de338
    x4  0000007bf14de3b8  x5  3463643531666237  x6  3463643531666237  x7  3834346364353166
    x8  00000000000000f0  x9  0000000000000000  x10 0000000000000059  x11 0000000000000035
    x12 0000007bf1bebcfa  x13 0000007bf14ddf0a  x14 0000007bf14ddf0a  x15 0000000000000000
    x16 0000007bf1c33048  x17 0000007bf1ba9990  x18 0000000000000000  x19 00000000000063f3
    x20 00000000000063f5  x21 0000007bf14de588  x22 0000007bf1f1b864  x23 0000000000000001
    x24 0000007bf14de130  x25 0000007bf13e1000  x26 0000007bf1f1f580  x27 0000005ab43ab8f0
    x28 0000000000000000  x29 0000007bf14de400
    sp  0000007bf14ddff0  lr  0000007bf1b5fd6c  pc  0000007bf1b5fd90

backtrace:
    #00 pc 0000000000008d90  /system/lib64/libc.so (fdsan_error(char const*, ...)+384)
    #01 pc 0000000000008ba8  /system/lib64/libc.so (android_fdsan_close_with_tag+632)
    #02 pc 00000000000092a0  /system/lib64/libc.so (close+16)
    #03 pc 00000000000003e4  /system/bin/fdsan_test (bystander()+84)
    #04 pc 0000000000000918  /system/bin/fdsan_test
    #05 pc 000000000006689c  /system/lib64/libc.so (__pthread_start(void*)+36)
    #06 pc 000000000000712c  /system/lib64/libc.so (__start_thread+68)
```

...in the obviously correct bystander? What's going on here?

The reason for this is (hopefully!) not a bug in fdsan, and will commonly be seen when tracking down double-closes in processes that have sparse fdsan coverage. What actually happened is that the culprit closed `bystander`'s file descriptor between its open and close, which resulted in `bystander` being blamed for closing `victim`'s fd. If we store `bystander`'s fd in a `unique_fd` as well, we should get something more useful:
```diff
--- a/tmp/fdsan_test.cpp
+++ b/tmp/fdsan_test.cpp
@@ -23,9 +23,8 @@ void victim() {

 void bystander() {
   sleep_for(100ms);
-  int fd = dup(STDOUT_FILENO);
+  android::base::unique_fd fd(dup(STDOUT_FILENO));
   sleep_for(300ms);
-  close(fd);
 }
```
giving us:
```
pid: 25779, tid: 25782, name: fdsan_test  >>> fdsan_test <<<
signal 35 (<debuggerd signal>), code -1 (SI_QUEUE), fault addr --------
Abort message: 'attempted to close file descriptor 3, expected to be unowned, actually owned by unique_fd 0x6fef9ff448'
    x0  0000000000000000  x1  00000000000064b6  x2  0000000000000023  x3  0000006fef901338
    x4  0000006fef9013b8  x5  3466663966656636  x6  3466663966656636  x7  3834346666396665
    x8  00000000000000f0  x9  0000000000000000  x10 0000000000000059  x11 0000000000000039
    x12 0000006ff0055cfa  x13 0000006fef900f0a  x14 0000006fef900f0a  x15 0000000000000000
    x16 0000006ff009d048  x17 0000006ff0013990  x18 0000000000000000  x19 00000000000064b3
    x20 00000000000064b6  x21 0000006fef901588  x22 0000006ff04ff864  x23 0000000000000001
    x24 0000006fef901130  x25 0000006fef804000  x26 0000006ff0503580  x27 0000006368aa18f8
    x28 0000000000000000  x29 0000006fef901400
    sp  0000006fef900ff0  lr  0000006feffc9d6c  pc  0000006feffc9d90

backtrace:
    #00 pc 0000000000008d90  /system/lib64/libc.so (fdsan_error(char const*, ...)+384)
    #01 pc 0000000000008ba8  /system/lib64/libc.so (android_fdsan_close_with_tag+632)
    #02 pc 00000000000092a0  /system/lib64/libc.so (close+16)
    #03 pc 000000000000045c  /system/bin/fdsan_test (offender()+68)
    #04 pc 0000000000000920  /system/bin/fdsan_test
    #05 pc 000000000006689c  /system/lib64/libc.so (__pthread_start(void*)+36)
    #06 pc 000000000000712c  /system/lib64/libc.so (__start_thread+68)
```

Hooray!

In a real application, things are probably not going to be as detectable or reproducible as our toy example, which is a good reason to try to maximize the usage of fdsan-enabled types like `unique_fd` and `ParcelFileDescriptor`, to improve the odds that double closes in other code get detected.

### Enabling fdsan (as a C++ library implementer)

fdsan operates via two main primitives. `android_fdsan_exchange_owner_tag` modifies a file descriptor's close tag, and `android_fdsan_close_with_tag` closes a file descriptor with its tag. In the `<android/fdsan.h>` header, these are marked with `__attribute__((weak))`, so instead of passing down the platform version from JNI, availability of the functions can be queried directly. An example implementation of unique_fd follows:

```cpp
/*
 * Copyright (C) 2018 The Android Open Source Project
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *  * Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
 * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
 * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
 * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
 * OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
 * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#pragma once

#include <android/fdsan.h>
#include <unistd.h>

#include <utility>

struct unique_fd {
    unique_fd() = default;

    explicit unique_fd(int fd) {
        reset(fd);
    }

    unique_fd(const unique_fd& copy) = delete;
    unique_fd(unique_fd&& move) {
        *this = std::move(move);
    }

    ~unique_fd() {
        reset();
    }

    unique_fd& operator=(const unique_fd& copy) = delete;
    unique_fd& operator=(unique_fd&& move) {
        if (this == &move) {
            return *this;
        }

        reset();

        if (move.fd_ != -1) {
            fd_ = move.fd_;
            move.fd_ = -1;

            // Acquire ownership from the moved-from object.
            exchange_tag(fd_, move.tag(), tag());
        }

        return *this;
    }

    int get() { return fd_; }

    int release() {
        if (fd_ == -1) {
            return -1;
        }

        int fd = fd_;
        fd_ = -1;

        // Release ownership.
        exchange_tag(fd, tag(), 0);
        return fd;
    }

    void reset(int new_fd = -1) {
        if (fd_ != -1) {
            close(fd_, tag());
            fd_ = -1;
        }

        if (new_fd != -1) {
            fd_ = new_fd;

            // Acquire ownership of the presumably unowned fd.
            exchange_tag(fd_, 0, tag());
        }
    }

  private:
    int fd_ = -1;

    // The obvious choice of tag to use is the address of the object.
    uint64_t tag() {
        return reinterpret_cast<uint64_t>(this);
    }

    // These functions are marked with __attribute__((weak)), so that their
    // availability can be determined at runtime. These wrappers will use them
    // if available, and fall back to no-ops or regular close on pre-Q devices.
    static void exchange_tag(int fd, uint64_t old_tag, uint64_t new_tag) {
        if (android_fdsan_exchange_owner_tag) {
            android_fdsan_exchange_owner_tag(fd, old_tag, new_tag);
        }
    }

    static int close(int fd, uint64_t tag) {
        if (android_fdsan_close_with_tag) {
            return android_fdsan_close_with_tag(fd, tag);
        } else {
            return ::close(fd);
        }
    }
};
```

### Frequently seen bugs
 * Native APIs not making it clear when they take ownership of a file descriptor. <br/>
   * Solution: accept `unique_fd` instead of `int` in functions that take ownership.
   * [Example one](https://android-review.googlesource.com/c/platform/system/core/+/721985), [two](https://android-review.googlesource.com/c/platform/frameworks/native/+/709451)
 * Receiving a `ParcelFileDescriptor` via Intent, and then passing it into JNI code that ends up calling close on it. <br/>
   * Solution: ¯\\\_(ツ)\_/¯. Use fdsan?
   * [Example one](https://android-review.googlesource.com/c/platform/system/bt/+/710104), [two](https://android-review.googlesource.com/c/platform/frameworks/base/+/732305)
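
The first solution above, encoding ownership transfer in the function signature, might look roughly like the sketch below. `OwnedFd` is a hypothetical, deliberately minimal move-only wrapper without fdsan tagging, used here only so the example compiles off-device; in real code a `unique_fd` would take its place. The function names are likewise invented for illustration.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <utility>

// Hypothetical minimal move-only wrapper standing in for a real unique_fd.
class OwnedFd {
  public:
    explicit OwnedFd(int fd = -1) : fd_(fd) {}
    OwnedFd(OwnedFd&& other) noexcept : fd_(other.fd_) { other.fd_ = -1; }
    OwnedFd(const OwnedFd&) = delete;
    OwnedFd& operator=(const OwnedFd&) = delete;
    OwnedFd& operator=(OwnedFd&&) = delete;
    ~OwnedFd() {
        if (fd_ != -1) ::close(fd_);
    }
    int get() const { return fd_; }

  private:
    int fd_;
};

// Borrows the descriptor: a raw int signals that the caller keeps ownership
// and remains responsible for closing it.
ssize_t write_greeting(int fd) {
    return ::write(fd, "hi\n", 3);
}

// Takes ownership: the by-value wrapper parameter forces callers to write
// std::move, and the descriptor is closed when `fd` goes out of scope.
void consume(OwnedFd fd) {
    (void)write_greeting(fd.get());
}

// Demonstrates that the caller's wrapper is empty after the transfer.
bool ownership_is_transferred() {
    OwnedFd fd(::open("/dev/null", O_WRONLY));
    consume(std::move(fd));
    return fd.get() == -1;
}
```

With this shape, a caller cannot hand over a descriptor by accident, and a reader can tell from the signature alone which side is expected to close it.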

### Footnotes
1. [How To Corrupt An SQLite Database File](https://www.sqlite.org/howtocorrupt.html#_continuing_to_use_a_file_descriptor_after_it_has_been_closed)

2. [<b><i>50%</i></b> of Facebook's iOS crashes caused by a file descriptor double close leading to SQLite database corruption](https://code.fb.com/ios/debugging-file-corruption-on-ios/)