# EINTR

## The problem

If your code is blocked in a system call when a signal needs to be delivered,
the kernel needs to interrupt that system call. For something like a read(2)
call where some data has already been read, the call can just return with
what data it has. (This is one reason why read(2) sometimes returns less data
than you asked for, even though more data is available. It also explains why
such behavior is relatively rare, and a cause of bugs.)

But what if read(2) hasn't read any data yet? Or what if you've made some other
system call, for which there is no equivalent "partial" success, such as
poll(2)? In poll(2)'s case, there's either something to report (in which
case the system call would already have returned), or there isn't.

The kernel's solution to this problem is to return failure (-1) and set
errno to `EINTR`: "interrupted system call".

### Can I just opt out?

Technically, yes. In practice on Android, no. Technically if a signal's
disposition is set to ignore, the kernel doesn't even have to deliver the
signal, so your code can just stay blocked in the system call it was already
making. In practice, though, you can't guarantee that all signals are either
ignored or will kill your process... Unless you're a small single-threaded
C program that doesn't use any libraries, you can't realistically make this
guarantee. If any code has installed a signal handler, you need to cope with
`EINTR`. And if you're an Android app, the zygote has already installed a whole
host of signal handlers before your code even starts to run. (And, no, you
can't ignore them instead, because some of them are critical to how ART works.
For example: Java `NullPointerException`s are optimized by trapping `SIGSEGV`
signals so that the code generated by the JIT doesn't have to insert explicit
null pointer checks.)

### Why don't I see this in Java code?

You won't see this in Java because the decision was taken to hide this issue
from Java programmers. Basically, all the libraries like `java.io.*` and
`java.net.*` hide this from you. (The same should be true of `android.*` too,
so it's worth filing bugs if you find any exceptions that aren't documented!)

### Why doesn't libc do that too?

For most people, things would be easier if libc hid this implementation
detail. But there are legitimate use cases, and automatically retrying
would hide those. For example, you might want to use signals and `EINTR`
to interrupt another thread (in fact, that's how interruption of threads
doing I/O works in Java behind the scenes!). As usual, C/C++ choose the more
powerful but more error-prone option.

## The fix

### Easy cases

In most cases, the fix is simple: wrap the system call with the
`TEMP_FAILURE_RETRY` macro. This is basically a while loop that retries the
system call as long as the result is -1 and errno is `EINTR`.

So, for example:
```
  n = read(fd, buf, buf_size); // BAD!
  n = TEMP_FAILURE_RETRY(read(fd, buf, buf_size)); // GOOD!
```

### close(2)

TL;DR: *never* wrap close(2) calls with `TEMP_FAILURE_RETRY`.

The case of close(2) is complicated. POSIX explicitly says that close(2)
shouldn't close the file descriptor if it returns `EINTR`, but that's *not*
true on Linux (and thus on Android). See
[Returning EINTR from close()](https://lwn.net/Articles/576478/)
for more discussion.

Given that most Android code (and especially "all apps") is multithreaded,
retrying close(2) is especially dangerous because the file descriptor might
already have been reused by another thread, so the "retry" succeeds, but
actually closes a *different* file descriptor belonging to a *different*
thread.
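
As a rough illustration of the "call close(2) exactly once" rule, the sketch
below wraps close(2) in a helper that never retries. The name `close_once` is
hypothetical (it is not a libc function), and treating `EINTR` as success is an
assumption that only makes sense on Linux-like systems, where the descriptor
has already been released by the time the call returns.

```c
#include <errno.h>
#include <unistd.h>

// Hypothetical helper, not part of any libc: closes fd exactly once.
// Returns 0 on success (or on EINTR, see below) and -1 on any other error.
static int close_once(int fd) {
  int saved_errno = errno;
  if (close(fd) == -1) {
    if (errno == EINTR) {
      // Assumption: Linux semantics. The descriptor is already gone, so a
      // retry could close a descriptor that another thread has just opened.
      errno = saved_errno;
      return 0;
    }
    return -1;  // Genuine failure such as EBADF; errno is left for the caller.
  }
  return 0;
}
```

Callers would use `close_once(fd)` wherever they might otherwise have been
tempted to write `TEMP_FAILURE_RETRY(close(fd))`.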

### Timeouts

System calls with timeouts are the other interesting case where "just wrap
everything with `TEMP_FAILURE_RETRY()`" doesn't work. Because some amount of
time will have elapsed, you'll want to recalculate the timeout. Otherwise you
can end up with your 1 minute timeout being indefinite if you're receiving
signals at least once per minute, say. In this case you'll want to do
something like adding an explicit loop around your system call, calculating
the timeout _inside_ the loop, and using `continue` each time the system call
fails with `EINTR`.