1Mechanisms for Coordination Between Garbage Collector and Mutator 2----------------------------------------------------------------- 3 4Most garbage collection work can proceed concurrently with the client or 5mutator Java threads. But in certain places, for example while tracing from 6thread stacks, the garbage collector needs to ensure that Java data processed 7by the collector is consistent and complete. At these points, the mutators 8should not hold references to the heap that are invisible to the garbage 9collector. And they should not be modifying the data that is visible to the 10collector. 11 12Logically, the collector and mutator share a reader-writer lock on the Java 13heap and associated data structures. Mutators hold the lock in reader or shared mode 14while running Java code or touching heap-related data structures. The collector 15holds the lock in writer or exclusive mode while it needs the heap data 16structures to be stable. However, this reader-writer lock has a very customized 17implementation that also provides additional facilities, such as the ability 18to exclude only a single thread, so that we can specifically examine its heap 19references. 20 21In order to ensure consistency of the Java data, the compiler inserts "suspend 22points", sometimes also called "safe points" into the code. These allow a thread 23to respond to external requests. 24 25Whenever a thread is runnable, i.e. whenever a thread logically holds the 26mutator lock in shared mode, it is expected to regularly execute such a suspend 27point, and check for pending requests. They are currently implemented by 28setting a flag in the thread structure[^1], which is then explicitly tested by the 29compiler-generated code. 30 31A thread responds to suspend requests only when it is "runnable", i.e. logically 32running Java code. When it runs native code, or is blocked in a kernel call, it 33logically releases the mutator lock. When the garbage collector needs mutator 34cooperation, and the thread is not runnable, it is assured that the mutator is 35not touching Java data, and hence the collector can safely perform the required 36action itself, on the mutator thread's behalf. 37 38Normally, when a thread makes a JNI call, it is not considered runnable while 39executing native code. This makes the transitions to and from running native JNI 40code somewhat expensive (see below). But these transitions are necessary to 41ensure that such code, which does not execute "suspend points", and can thus not 42cooperate with the GC, doesn't delay GC completion. `@FastNative` and 43`@CriticalNative` calls avoid these transitions, instead allowing the thread to 44remain "runnable", at the expense of potentially delaying GC operations for the 45duration of the call. 46 47Although we say that a thread is "suspended" when it is not running Java code, 48it may in fact still be running native code and touching data structures that 49are not considered "Java data". This distinction can be a fine line. For 50example, a Java thread blocked on a Java monitor will normally be "suspended" 51and blocked on a mutex contained in the monitor data structure. But it may wake 52up for reasons beyond ARTs control, which will normally result in touching the 53mutex. The monitor code must be quite careful to ensure that this does not cause 54problems, especially if the ART runtime was shut down in the interim and the 55monitor data structure has been reclaimed. 56 57Calls to change thread state 58---------------------------- 59 60When a thread changes between running Java and native code, it has to 61correspondingly change its state between "runnable" and one of several 62other states, all of which are considered to be "suspended" for our purposes. 63When a Java thread starts to execute native code, and may thus not respond 64promptly to suspend requests, it will normally create an object of type 65`ScopedThreadSuspension`. `ScopedThreadSuspension`'s constructor changes state to 66the "suspended" state given as an argument, logically releasing the mutator lock 67and promising to no longer touch Java data structures. It also handles any 68pending suspension requests that slid in just before it changed state. 69 70Conversely, `ScopedThreadSuspension`'s destructor waits until the GC has finished 71any actions it is currently performing on the thread's behalf and effectively 72released the mutator exclusive lock, and then returns to runnable state, 73re-acquiring the mutator lock. 74 75Occasionally a thread running native code needs to temporarily again access Java 76data structures, performing the above transitions in the opposite order. 77`ScopedObjectAccess` is a similar RAII object whose constructor and destructor 78perform those transitions in the reverse order from `ScopedThreadSuspension`. 79 80Mutator lock implementation 81--------------------------- 82 83The mutator lock is not implemented as a conventional mutex. But it plays by the 84rules of our normal static thread-safety analysis. Thus a function that is 85expected to be called in runnable state, with the ability to access Java data, 86should be annotated with `REQUIRES_SHARED(Locks::mutator_lock_)`. 87 88There is an explicit `mutator_lock_` object, of type `MutatorMutex`. `MutatorMutex` is 89seemingly a minor refinement of `ReaderWriterMutex`, but it is used entirely 90differently. It is acquired explicitly by clients that need to hold it 91exclusively, and in a small number of cases, it is acquired in shared mode, e.g. 92via `SharedTryLock()`, or by the GC itself. However, more commonly 93`MutatorMutex::TransitionFromSuspendedToRunnable()`, is used to logically acquire 94the mutator mutex, e.g. as part of `ScopedObjectAccess` construction. 95 96`TransitionFromSuspendedToRunnable()` does not physically acquire the 97`ReaderWriterMutex` in shared mode. Thus any thread acquiring the lock in exclusive mode 98must, in addition, explicitly arrange for mutator threads to be suspended via the 99thread suspension mechanism, and then make them runnable again on release. 100 101Logically the mutator lock is held in shared/reader mode if ***either*** the 102underlying reader-writer lock is held in shared mode, ***or*** if a mutator is in 103runnable state. 104 105Suspension and checkpoint API 106----------------------------- 107 108Suspend point checks enable three kinds of communication with mutator threads: 109 110**Checkpoints** 111: Checkpoint requests are used to get a thread to perform an action 112on our behalf. `RequestCheckpoint()` asks a specific thread to execute the closure 113supplied as an argument at its leisure. `RequestSynchronousCheckpoint()` in 114addition waits for the thread to complete running the closure, and handles 115suspended threads by running the closure on their behalf. In addition to these 116functions provided by `Thread`, `ThreadList` provides the `RunCheckpoint()` function 117that runs a checkpoint function on behalf of each thread, either by using 118`RequestCheckpoint()` to run it inside a running thread, or by ensuring that a 119suspended thread stays suspended, and then running the function on its behalf. 120`RunCheckpoint()` does not wait for completion of the function calls triggered by 121the resulting `RequestCheckpoint()` invocations. 122 123**Empty Checkpoints** 124: ThreadList provides `RunEmptyCheckpoint()`, which waits until 125all threads have either passed a suspend point, or have been suspended. This 126ensures that no thread is still executing Java code inside the same 127suspend-point-delimited code interval it was executing before the call. For 128example, a read-barrier started before a `RunEmptyCheckpoint()` call will have 129finished before the call returns. 130 131**Thread suspension** 132: ThreadList provides a number of `SuspendThread...()` calls and 133a `SuspendAll()` call to suspend one or all threads until they are resumed by 134`Resume()` or `ResumeAll()`. The `Suspend...` calls guarantee that the target 135thread(s) are suspended (again, only in the sense of not running Java code) 136when the call returns. 137 138[^1]: Some comments in the code refer to a not-yet-really-implemented scheme in 139which the compiler-generated code would load through the address at 140`tlsPtr_.suspend_trigger`. A thread suspension is requested by setting this to 141null, triggering a `SIGSEGV`, causing that thread to check for GC cooperation 142requests. The real mechanism instead sets an appropriate `ThreadFlag` entry to 143request suspension or a checkpoint. Note that the actual checkpoint function 144value is set, along with the flag, while holding `suspend_count_lock_`. If the 145target thread notices that a checkpoint is requested, it then acquires 146the `suspend_count_lock_` to read the checkpoint function. 147