1page.title=Avoiding Priority Inversion 2@jd:body 3 4<!-- 5 Copyright 2013 The Android Open Source Project 6 7 Licensed under the Apache License, Version 2.0 (the "License"); 8 you may not use this file except in compliance with the License. 9 You may obtain a copy of the License at 10 11 http://www.apache.org/licenses/LICENSE-2.0 12 13 Unless required by applicable law or agreed to in writing, software 14 distributed under the License is distributed on an "AS IS" BASIS, 15 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 16 See the License for the specific language governing permissions and 17 limitations under the License. 18--> 19<div id="qv-wrapper"> 20 <div id="qv"> 21 <h2>In this document</h2> 22 <ol id="auto-toc"> 23 </ol> 24 </div> 25</div> 26 27<p> 28This article explains how the Android's audio system attempts to avoid 29priority inversion, 30and highlights techniques that you can use too. 31</p> 32 33<p> 34These techniques may be useful to developers of high-performance 35audio apps, OEMs, and SoC providers who are implementing an audio 36HAL. Please note implementing these techniques is not 37guaranteed to prevent glitches or other failures, particularly if 38used outside of the audio context. 39Your results may vary, and you should conduct your own 40evaluation and testing. 41</p> 42 43<h2 id="background">Background</h2> 44 45<p> 46The Android AudioFlinger audio server and AudioTrack/AudioRecord 47client implementation are being re-architected to reduce latency. 48This work started in Android 4.1, and continued with further improvements 49in 4.2, 4.3, 4.4, and 5.0. 50</p> 51 52<p> 53To achieve this lower latency, many changes were needed throughout the system. One 54important change is to assign CPU resources to time-critical 55threads with a more predictable scheduling policy. Reliable scheduling 56allows the audio buffer sizes and counts to be reduced while still 57avoiding underruns and overruns. 58</p> 59 60<h2 id="priorityInversion">Priority inversion</h2> 61 62<p> 63<a href="http://en.wikipedia.org/wiki/Priority_inversion">Priority inversion</a> 64is a classic failure mode of real-time systems, 65where a higher-priority task is blocked for an unbounded time waiting 66for a lower-priority task to release a resource such as (shared 67state protected by) a 68<a href="http://en.wikipedia.org/wiki/Mutual_exclusion">mutex</a>. 69</p> 70 71<p> 72In an audio system, priority inversion typically manifests as a 73<a href="http://en.wikipedia.org/wiki/Glitch">glitch</a> 74(click, pop, dropout), 75<a href="http://en.wikipedia.org/wiki/Max_Headroom_(character)">repeated audio</a> 76when circular buffers 77are used, or delay in responding to a command. 78</p> 79 80<p> 81In the Android audio implementation, priority inversion is most 82likely to occur in these places. And so you should focus your attention here: 83</p> 84 85<ul> 86 87<li> 88between normal mixer thread and fast mixer thread in AudioFlinger 89</li> 90 91<li> 92between application callback thread for a fast AudioTrack and 93fast mixer thread (they both have elevated priority, but slightly 94different priorities) 95</li> 96 97<li> 98between application callback thread for a fast AudioRecord and 99fast capture thread (similar to previous) 100</li> 101 102<li> 103within the audio Hardware Abstraction Layer (HAL) implementation, e.g. for telephony or echo cancellation 104</li> 105 106<li> 107within the audio driver in kernel 108</li> 109 110<li> 111between AudioTrack or AudioRecord callback thread and other app threads (this is out of our control) 112</li> 113 114</ul> 115 116<h2 id="commonSolutions">Common solutions</h2> 117 118<p> 119The typical solutions include: 120</p> 121 122<ul> 123 124<li> 125disabling interrupts 126</li> 127 128<li> 129priority inheritance mutexes 130</li> 131 132</ul> 133 134<p> 135Disabling interrupts is not feasible in Linux user space, and does 136not work for Symmetric Multi-Processors (SMP). 137</p> 138 139 140<p> 141Priority inheritance 142<a href="http://en.wikipedia.org/wiki/Futex">futexes</a> 143(fast user-space mutexes) are available 144in Linux kernel, but are not currently exposed by the Android C 145runtime library 146<a href="http://en.wikipedia.org/wiki/Bionic_(software)">Bionic</a>. 147They are not used in the audio system because they are relatively heavyweight, 148and because they rely on a trusted client. 149</p> 150 151<h2 id="androidTechniques">Techniques used by Android</h2> 152 153<p> 154Experiments started with "try lock" and lock with timeout. These are 155non-blocking and bounded blocking variants of the mutex lock 156operation. Try lock and lock with timeout worked fairly well but were 157susceptible to a couple of obscure failure modes: the 158server was not guaranteed to be able to access the shared state if 159the client happened to be busy, and the cumulative timeout could 160be too long if there was a long sequence of unrelated locks that 161all timed out. 162</p> 163 164 165<p> 166We also use 167<a href="http://en.wikipedia.org/wiki/Linearizability">atomic operations</a> 168such as: 169</p> 170 171<ul> 172<li>increment</li> 173<li>bitwise "or"</li> 174<li>bitwise "and"</li> 175</ul> 176 177<p> 178All of these return the previous value and include the necessary 179SMP barriers. The disadvantage is they can require unbounded retries. 180In practice, we've found that the retries are not a problem. 181</p> 182 183<p class="note"><strong>Note:</strong> Atomic operations and their interactions with memory barriers 184are notoriously badly misunderstood and used incorrectly. We include these methods 185here for completeness but recommend you also read the article 186<a href="https://developer.android.com/training/articles/smp.html"> 187SMP Primer for Android</a> 188for further information. 189</p> 190 191<p> 192We still have and use most of the above tools, and have recently 193added these techniques: 194</p> 195 196<ul> 197 198<li> 199Use non-blocking single-reader single-writer 200<a href="http://en.wikipedia.org/wiki/Circular_buffer">FIFO queues</a> 201for data. 202</li> 203 204<li> 205Try to 206<i>copy</i> 207state rather than 208<i>share</i> 209state between high- and 210low-priority modules. 211</li> 212 213<li> 214When state does need to be shared, limit the state to the 215maximum-size 216<a href="http://en.wikipedia.org/wiki/Word_(computer_architecture)">word</a> 217that can be accessed atomically in one-bus operation 218without retries. 219</li> 220 221<li> 222For complex multi-word state, use a state queue. A state queue 223is basically just a non-blocking single-reader single-writer FIFO 224queue used for state rather than data, except the writer collapses 225adjacent pushes into a single push. 226</li> 227 228<li> 229Pay attention to 230<a href="http://en.wikipedia.org/wiki/Memory_barrier">memory barriers</a> 231for SMP correctness. 232</li> 233 234<li> 235<a href="http://en.wikipedia.org/wiki/Trust,_but_verify">Trust, but verify</a>. 236When sharing 237<i>state</i> 238between processes, don't 239assume that the state is well-formed. For example, check that indices 240are within bounds. This verification isn't needed between threads 241in the same process, between mutual trusting processes (which 242typically have the same UID). It's also unnecessary for shared 243<i>data</i> 244such as PCM audio where a corruption is inconsequential. 245</li> 246 247</ul> 248 249<h2 id="nonBlockingAlgorithms">Non-blocking algorithms</h2> 250 251<p> 252<a href="http://en.wikipedia.org/wiki/Non-blocking_algorithm">Non-blocking algorithms</a> 253have been a subject of much recent study. 254But with the exception of single-reader single-writer FIFO queues, 255we've found them to be complex and error-prone. 256</p> 257 258<p> 259Starting in Android 4.2, you can find our non-blocking, 260single-reader/writer classes in these locations: 261</p> 262 263<ul> 264 265<li> 266frameworks/av/include/media/nbaio/ 267</li> 268 269<li> 270frameworks/av/media/libnbaio/ 271</li> 272 273<li> 274frameworks/av/services/audioflinger/StateQueue* 275</li> 276 277</ul> 278 279<p> 280These were designed specifically for AudioFlinger and are not 281general-purpose. Non-blocking algorithms are notorious for being 282difficult to debug. You can look at this code as a model. But be 283aware there may be bugs, and the classes are not guaranteed to be 284suitable for other purposes. 285</p> 286 287<p> 288For developers, some of the sample OpenSL ES application code should be updated to 289use non-blocking algorithms or reference a non-Android open source library. 290</p> 291 292<p> 293We have published an example non-blocking FIFO implementation that is specifically designed for 294application code. See these files located in the platform source directory 295<code>frameworks/av/audio_utils</code>: 296</p> 297<ul> 298 <li><a href="https://android.googlesource.com/platform/system/media/+/master/audio_utils/include/audio_utils/fifo.h">include/audio_utils/fifo.h</a> 299 <li><a href="https://android.googlesource.com/platform/system/media/+/master/audio_utils/fifo.c">fifo.c</a> 300 <li><a href="https://android.googlesource.com/platform/system/media/+/master/audio_utils/include/audio_utils/roundup.h">include/audio_utils/roundup.h</a> 301 <li><a href="https://android.googlesource.com/platform/system/media/+/master/audio_utils/roundup.c">roundup.c</a> 302</ul> 303 304<h2 id="tools">Tools</h2> 305 306<p> 307To the best of our knowledge, there are no automatic tools for 308finding priority inversion, especially before it happens. Some 309research static code analysis tools are capable of finding priority 310inversions if able to access the entire codebase. Of course, if 311arbitrary user code is involved (as it is here for the application) 312or is a large codebase (as for the Linux kernel and device drivers), 313static analysis may be impractical. The most important thing is to 314read the code very carefully and get a good grasp on the entire 315system and the interactions. Tools such as 316<a href="http://developer.android.com/tools/help/systrace.html">systrace</a> 317and 318<code>ps -t -p</code> 319are useful for seeing priority inversion after it occurs, but do 320not tell you in advance. 321</p> 322 323<h2 id="aFinalWord">A final word</h2> 324 325<p> 326After all of this discussion, don't be afraid of mutexes. Mutexes 327are your friend for ordinary use, when used and implemented correctly 328in ordinary non-time-critical use cases. But between high- and 329low-priority tasks and in time-sensitive systems mutexes are more 330likely to cause trouble. 331</p> 332 333