<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  <title>&lt;atomic&gt; design</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
</head>

<body>
<div id="menu">
  <div>
    <a href="https://llvm.org/">LLVM Home</a>
  </div>

  <div class="submenu">
    <label>libc++ Info</label>
    <a href="/index.html">About</a>
  </div>

  <div class="submenu">
    <label>Quick Links</label>
    <a href="https://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev</a>
    <a href="https://lists.llvm.org/mailman/listinfo/cfe-commits">cfe-commits</a>
    <a href="https://bugs.llvm.org/">Bug Reports</a>
    <a href="https://github.com/llvm/llvm-project/tree/master/libcxx/">Browse Sources</a>
  </div>
</div>

<div id="content">
  <!--*********************************************************************-->
  <h1>&lt;atomic&gt; design</h1>
  <!--*********************************************************************-->

<p>
The <tt>&lt;atomic&gt;</tt> header is one of the headers most closely coupled
to the compiler. Ideally, invoking any function from <tt>&lt;atomic&gt;</tt>
should result in highly optimized assembly being inserted directly into your
application: assembly that is not otherwise representable by higher-level C or
C++ expressions. The design of the libc++ <tt>&lt;atomic&gt;</tt> header
started with this goal in mind. A secondary, but still very important, goal is
that the compiler should have to do minimal work to facilitate the
implementation of <tt>&lt;atomic&gt;</tt>. Without this second goal,
practically speaking, the libc++ <tt>&lt;atomic&gt;</tt> header would be
doomed to be a barely supported, second-class citizen on almost every
platform.
</p>

<p>Goals:</p>

<blockquote><ul>
<li>Optimal code generation for atomic operations</li>
<li>Minimal effort for the compiler to achieve goal 1 on any given platform</li>
<li>Conformance to the C++0x draft standard</li>
</ul></blockquote>

<p>
The purpose of this document is to inform compiler writers what they need to do
to enable a high-performance libc++ <tt>&lt;atomic&gt;</tt> with minimal
effort.
</p>

<h2>The minimal work that must be done for a conforming <tt>&lt;atomic&gt;</tt></h2>

<p>
The only "atomic" operations that must actually be lock-free in
<tt>&lt;atomic&gt;</tt> are represented by the following compiler intrinsics:
</p>

<blockquote><pre>
__atomic_flag__
__atomic_exchange_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    __atomic_flag__ result = *obj;
    *obj = desr;
    return result;
}

void
__atomic_store_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    *obj = desr;
}
</pre></blockquote>

<p>
Where:
</p>

<blockquote><ul>
<li>
If <tt>__has_feature(__atomic_flag)</tt> evaluates to 1 in the preprocessor,
the compiler must define <tt>__atomic_flag__</tt> (e.g. as a typedef to
<tt>int</tt>).
</li>
<li>
If <tt>__has_feature(__atomic_flag)</tt> evaluates to 0 in the preprocessor,
the library defines <tt>__atomic_flag__</tt> as a typedef to <tt>bool</tt>.
</li>
<li>
<p>
To communicate that the above intrinsics are available, the compiler must
arrange for <tt>__has_feature</tt> to return 1 when fed the intrinsic name
appended with an '_' and the mangled type name of <tt>__atomic_flag__</tt>.
</p>
<p>
For example, if <tt>__atomic_flag__</tt> is <tt>unsigned int</tt>:
</p>
<blockquote><pre>
__has_feature(__atomic_flag) == 1
__has_feature(__atomic_exchange_seq_cst_j) == 1
__has_feature(__atomic_store_seq_cst_j) == 1

typedef unsigned int __atomic_flag__;

unsigned int __atomic_exchange_seq_cst(unsigned int volatile*, unsigned int)
{
   // ...
}

void __atomic_store_seq_cst(unsigned int volatile*, unsigned int)
{
   // ...
}
</pre></blockquote>
</li>
</ul></blockquote>
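<p>
To make the division of labor concrete, here is a minimal sketch (not the
actual libc++ source; the data member is illustrative) of how the library
could build <tt>atomic_flag</tt> on top of just these two intrinsics:
</p>

<blockquote><pre>
// Illustrative sketch only, assuming the two required intrinsics above.
struct atomic_flag
{
    __atomic_flag__ __flag_;

    bool test_and_set() volatile
    {
        // exchange returns the previous value; nonzero means "was already set"
        return __atomic_exchange_seq_cst(&amp;__flag_, __atomic_flag__(true)) != 0;
    }

    void clear() volatile
    {
        __atomic_store_seq_cst(&amp;__flag_, __atomic_flag__(false));
    }
};
</pre></blockquote>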
<p>
That's it! Compiler writers do the above and you've got a fully conforming
(though sub-par in performance) <tt>&lt;atomic&gt;</tt> header!
</p>

<h2>Recommended work for a higher performance <tt>&lt;atomic&gt;</tt></h2>

<p>
It would be good if the above intrinsics worked with all integral types plus
<tt>void*</tt>. Because this may not be possible to do in a lock-free manner
for all integral types on all platforms, a compiler must communicate each type
that an intrinsic works with. For example, if
<tt>__atomic_exchange_seq_cst</tt> works for all types except
<tt>long long</tt> and <tt>unsigned long long</tt>, then:
</p>

<blockquote><pre>
__has_feature(__atomic_exchange_seq_cst_b) == 1   // bool
__has_feature(__atomic_exchange_seq_cst_c) == 1   // char
__has_feature(__atomic_exchange_seq_cst_a) == 1   // signed char
__has_feature(__atomic_exchange_seq_cst_h) == 1   // unsigned char
__has_feature(__atomic_exchange_seq_cst_Ds) == 1  // char16_t
__has_feature(__atomic_exchange_seq_cst_Di) == 1  // char32_t
__has_feature(__atomic_exchange_seq_cst_w) == 1   // wchar_t
__has_feature(__atomic_exchange_seq_cst_s) == 1   // short
__has_feature(__atomic_exchange_seq_cst_t) == 1   // unsigned short
__has_feature(__atomic_exchange_seq_cst_i) == 1   // int
__has_feature(__atomic_exchange_seq_cst_j) == 1   // unsigned int
__has_feature(__atomic_exchange_seq_cst_l) == 1   // long
__has_feature(__atomic_exchange_seq_cst_m) == 1   // unsigned long
__has_feature(__atomic_exchange_seq_cst_Pv) == 1  // void*
</pre></blockquote>

<p>
Note that only the <tt>__has_feature</tt> flag is decorated with the argument
type. The name of the compiler intrinsic is not decorated; instead it works
like a C++ overloaded function.
</p>

<p>
Additionally, there are other intrinsics besides
<tt>__atomic_exchange_seq_cst</tt> and <tt>__atomic_store_seq_cst</tt>. They
are optional, but if the compiler can generate faster code than the library
provides, then clients will benefit from the compiler writer's expertise and
knowledge of the targeted platform.
</p>
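<p>
For illustration, a sketch (hypothetical, but following the conventions above)
of how the library can select between a compiler-supplied intrinsic and its
own lock-based fallback at compile time, shown for the <tt>int</tt> overload:
</p>

<blockquote><pre>
#if __has_feature(__atomic_exchange_seq_cst_i)
// The compiler supplies a lock-free version; the library simply calls it.
#else
// Otherwise the library falls back to its own lock-based definition.
int
__atomic_exchange_seq_cst(int volatile* obj, int desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    int r = *obj;
    *obj = desr;
    return r;
}
#endif
</pre></blockquote>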
<p>
Below is the complete list of <i>sequentially consistent</i> intrinsics and
their library implementations. Template syntax is used to indicate the desired
overloading for the integral and <tt>void*</tt> types. The template does not
represent a requirement that the intrinsic operate on <em>any</em> type!
</p>

<blockquote><pre>
T is one of:  bool, char, signed char, unsigned char, short, unsigned short,
              int, unsigned int, long, unsigned long,
              long long, unsigned long long, char16_t, char32_t, wchar_t, void*

template &lt;class T&gt;
T
__atomic_load_seq_cst(T const volatile* obj)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    return *obj;
}

template &lt;class T&gt;
void
__atomic_store_seq_cst(T volatile* obj, T desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    *obj = desr;
}

template &lt;class T&gt;
T
__atomic_exchange_seq_cst(T volatile* obj, T desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj = desr;
    return r;
}

template &lt;class T&gt;
bool
__atomic_compare_exchange_strong_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    if (std::memcmp(const_cast&lt;T*&gt;(obj), exp, sizeof(T)) == 0)
    {
        std::memcpy(const_cast&lt;T*&gt;(obj), &amp;desr, sizeof(T));
        return true;
    }
    std::memcpy(exp, const_cast&lt;T*&gt;(obj), sizeof(T));
    return false;
}

template &lt;class T&gt;
bool
__atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    if (std::memcmp(const_cast&lt;T*&gt;(obj), exp, sizeof(T)) == 0)
    {
        std::memcpy(const_cast&lt;T*&gt;(obj), &amp;desr, sizeof(T));
        return true;
    }
    std::memcpy(exp, const_cast&lt;T*&gt;(obj), sizeof(T));
    return false;
}

T is one of:  char, signed char, unsigned char, short, unsigned short,
              int, unsigned int, long, unsigned long,
              long long, unsigned long long, char16_t, char32_t, wchar_t

template &lt;class T&gt;
T
__atomic_fetch_add_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj += operand;
    return r;
}

template &lt;class T&gt;
T
__atomic_fetch_sub_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj -= operand;
    return r;
}

template &lt;class T&gt;
T
__atomic_fetch_and_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj &amp;= operand;
    return r;
}

template &lt;class T&gt;
T
__atomic_fetch_or_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj |= operand;
    return r;
}

template &lt;class T&gt;
T
__atomic_fetch_xor_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj ^= operand;
    return r;
}

void*
__atomic_fetch_add_seq_cst(void* volatile* obj, ptrdiff_t operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    void* r = *obj;
    (char*&amp;)(*obj) += operand;
    return r;
}

void*
__atomic_fetch_sub_seq_cst(void* volatile* obj, ptrdiff_t operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    void* r = *obj;
    (char*&amp;)(*obj) -= operand;
    return r;
}

void __atomic_thread_fence_seq_cst()
{
    unique_lock&lt;mutex&gt; _(some_mutex);
}

void __atomic_signal_fence_seq_cst()
{
    unique_lock&lt;mutex&gt; _(some_mutex);
}
</pre></blockquote>

<p>
One should consult the (currently draft)
<a href="https://wg21.link/n3126">C++ standard</a>
for the details of the definitions of these operations. For example,
<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> is allowed to fail
spuriously, while <tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> is
not.
</p>
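<p>
The practical consequence of spurious failure is that the weak form is meant
to be called in a retry loop. A sketch (the <tt>__increment</tt> helper below
is hypothetical, not part of the intrinsic set):
</p>

<blockquote><pre>
// Illustrative only: the canonical retry loop that tolerates spurious
// failure of the weak compare-exchange, for an integral T.
template &lt;class T&gt;
void
__increment(T volatile* obj)
{
    T old = __atomic_load_seq_cst(obj);
    // On failure, 'old' is refreshed with the current value of *obj,
    // so the loop simply retries with an updated expected value.
    while (!__atomic_compare_exchange_weak_seq_cst_seq_cst(obj, &amp;old, T(old + 1)))
        ;
}
</pre></blockquote>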
<p>
If on your platform the lock-free definition of
<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> would be the same as
that of <tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt>, you may
omit the <tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> intrinsic
without a performance cost. The library will prefer your implementation of
<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> over its own
definition for implementing
<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt>. That is, the library
will arrange for <tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> to
call <tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> if you supply
an intrinsic for the strong version but not the weak one.
</p>

<h2>Taking advantage of weaker memory synchronization</h2>

<p>
So far, all of the intrinsics presented require a <em>sequentially
consistent</em> memory ordering. That is, no loads or stores can move across
the operation (just as if the library had locked that internal mutex). But
<tt>&lt;atomic&gt;</tt> supports weaker memory ordering operations. In all,
there are six memory orderings (listed here from strongest to weakest):
</p>

<blockquote><pre>
memory_order_seq_cst
memory_order_acq_rel
memory_order_release
memory_order_acquire
memory_order_consume
memory_order_relaxed
</pre></blockquote>

<p>
(See the
<a href="https://wg21.link/n3126">C++ standard</a>
for the detailed definitions of each of these orderings.)
</p>

<p>
On some platforms, the compiler vendor can offer some or even all of the above
intrinsics at one or more weaker levels of memory synchronization. On x86, for
example, this might allow the compiler to avoid issuing an <tt>mfence</tt>
instruction.
</p>

<p>
If the compiler does not offer a given operation at a given memory ordering
level, the library will automatically attempt to call the next strongest
memory ordering operation. This continues up to <tt>seq_cst</tt>, and if that
doesn't exist either, the library takes over and does the job with a
<tt>mutex</tt>. This is a compile-time search &amp; selection operation. At
run time, the application will see only the few inlined assembly instructions
for the selected intrinsic.
</p>

<p>
Each intrinsic name is suffixed with the seven-character name of the memory
ordering it addresses. For example, a <tt>load</tt> with <tt>relaxed</tt>
ordering is defined by:
</p>

<blockquote><pre>
T __atomic_load_relaxed(T const volatile* obj);
</pre></blockquote>

<p>
And announced with:
</p>

<blockquote><pre>
__has_feature(__atomic_load_relaxed_b) == 1  // bool
__has_feature(__atomic_load_relaxed_c) == 1  // char
__has_feature(__atomic_load_relaxed_a) == 1  // signed char
...
</pre></blockquote>

<p>
The <tt>__atomic_compare_exchange_strong(weak)</tt> intrinsics are
parameterized on two memory orderings. The first ordering applies when the
operation returns <tt>true</tt>, and the second ordering applies when the
operation returns <tt>false</tt>.
</p>
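<p>
For example, following the naming scheme above, a strong compare-exchange that
uses <tt>acquire</tt> ordering on success and <tt>relaxed</tt> ordering on
failure would be declared and announced as:
</p>

<blockquote><pre>
template &lt;class T&gt;
bool
__atomic_compare_exchange_strong_acquire_relaxed(T volatile* obj, T* exp, T desr);

__has_feature(__atomic_compare_exchange_strong_acquire_relaxed_i) == 1  // int
</pre></blockquote>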
<p>
Not every memory ordering is appropriate for every operation.
<tt>exchange</tt> and the <tt>fetch_<i>op</i></tt> operations support all six.
But <tt>load</tt> only supports <tt>relaxed</tt>, <tt>consume</tt>,
<tt>acquire</tt> and <tt>seq_cst</tt>, and <tt>store</tt> only supports
<tt>relaxed</tt>, <tt>release</tt>, and <tt>seq_cst</tt>. The
<tt>compare_exchange</tt> operations support the following 16 combinations out
of the possible 36:
</p>

<blockquote><pre>
relaxed_relaxed
consume_relaxed
consume_consume
acquire_relaxed
acquire_consume
acquire_acquire
release_relaxed
release_consume
release_acquire
acq_rel_relaxed
acq_rel_consume
acq_rel_acquire
seq_cst_relaxed
seq_cst_consume
seq_cst_acquire
seq_cst_seq_cst
</pre></blockquote>

<p>
Again, the compiler supplies intrinsics only for the strongest orderings where
it can make a difference. The library takes care of calling the weakest
supplied intrinsic that is at least as strong as the one the customer asked
for.
</p>

</div>
</body>
</html>