1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 2 "http://www.w3.org/TR/html4/strict.dtd"> 3<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> 4<html> 5<head> 6 <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> 7 <title><atomic> design</title> 8 <link type="text/css" rel="stylesheet" href="menu.css"> 9 <link type="text/css" rel="stylesheet" href="content.css"> 10</head> 11 12<body> 13<div id="menu"> 14 <div> 15 <a href="http://llvm.org/">LLVM Home</a> 16 </div> 17 18 <div class="submenu"> 19 <label>libc++ Info</label> 20 <a href="/index.html">About</a> 21 </div> 22 23 <div class="submenu"> 24 <label>Quick Links</label> 25 <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev">cfe-dev</a> 26 <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits">cfe-commits</a> 27 <a href="http://llvm.org/bugs/">Bug Reports</a> 28 <a href="http://llvm.org/svn/llvm-project/libcxx/trunk/">Browse SVN</a> 29 <a href="http://llvm.org/viewvc/llvm-project/libcxx/trunk/">Browse ViewVC</a> 30 </div> 31</div> 32 33<div id="content"> 34 <!--*********************************************************************--> 35 <h1><atomic> design</h1> 36 <!--*********************************************************************--> 37 38<p> 39The <tt><atomic></tt> header is one of the most closely coupled headers to 40the compiler. Ideally when you invoke any function from 41<tt><atomic></tt>, it should result in highly optimized assembly being 42inserted directly into your application ... assembly that is not otherwise 43representable by higher level C or C++ expressions. The design of the libc++ 44<tt><atomic></tt> header started with this goal in mind. A secondary, but 45still very important goal is that the compiler should have to do minimal work to 46facilitate the implementation of <tt><atomic></tt>. Without this second 47goal, then practically speaking, the libc++ <tt><atomic></tt> header would 48be doomed to be a barely supported, second class citizen on almost every 49platform. 50</p> 51 52<p>Goals:</p> 53 54<blockquote><ul> 55<li>Optimal code generation for atomic operations</li> 56<li>Minimal effort for the compiler to achieve goal 1 on any given platform</li> 57<li>Conformance to the C++0X draft standard</li> 58</ul></blockquote> 59 60<p> 61The purpose of this document is to inform compiler writers what they need to do 62to enable a high performance libc++ <tt><atomic></tt> with minimal effort. 63</p> 64 65<h2>The minimal work that must be done for a conforming <tt><atomic></tt></h2> 66 67<p> 68The only "atomic" operations that must actually be lock free in 69<tt><atomic></tt> are represented by the following compiler intrinsics: 70</p> 71 72<blockquote><pre> 73__atomic_flag__ 74__atomic_exchange_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr) 75{ 76 unique_lock<mutex> _(some_mutex); 77 __atomic_flag__ result = *obj; 78 *obj = desr; 79 return result; 80} 81 82void 83__atomic_store_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr) 84{ 85 unique_lock<mutex> _(some_mutex); 86 *obj = desr; 87} 88</pre></blockquote> 89 90<p> 91Where: 92</p> 93 94<blockquote><ul> 95<li> 96If <tt>__has_feature(__atomic_flag)</tt> evaluates to 1 in the preprocessor then 97the compiler must define <tt>__atomic_flag__</tt> (e.g. as a typedef to 98<tt>int</tt>). 99</li> 100<li> 101If <tt>__has_feature(__atomic_flag)</tt> evaluates to 0 in the preprocessor then 102the library defines <tt>__atomic_flag__</tt> as a typedef to <tt>bool</tt>. 103</li> 104<li> 105<p> 106To communicate that the above intrinsics are available, the compiler must 107arrange for <tt>__has_feature</tt> to return 1 when fed the intrinsic name 108appended with an '_' and the mangled type name of <tt>__atomic_flag__</tt>. 109</p> 110<p> 111For example if <tt>__atomic_flag__</tt> is <tt>unsigned int</tt>: 112</p> 113<blockquote><pre> 114__has_feature(__atomic_flag) == 1 115__has_feature(__atomic_exchange_seq_cst_j) == 1 116__has_feature(__atomic_store_seq_cst_j) == 1 117 118typedef unsigned int __atomic_flag__; 119 120unsigned int __atomic_exchange_seq_cst(unsigned int volatile*, unsigned int) 121{ 122 // ... 123} 124 125void __atomic_store_seq_cst(unsigned int volatile*, unsigned int) 126{ 127 // ... 128} 129</pre></blockquote> 130</li> 131</ul></blockquote> 132 133<p> 134That's it! Compiler writers do the above and you've got a fully conforming 135(though sub-par performance) <tt><atomic></tt> header! 136</p> 137 138<h2>Recommended work for a higher performance <tt><atomic></tt></h2> 139 140<p> 141It would be good if the above intrinsics worked with all integral types plus 142<tt>void*</tt>. Because this may not be possible to do in a lock-free manner for 143all integral types on all platforms, a compiler must communicate each type that 144an intrinsic works with. For example if <tt>__atomic_exchange_seq_cst</tt> works 145for all types except for <tt>long long</tt> and <tt>unsigned long long</tt> 146then: 147</p> 148 149<blockquote><pre> 150__has_feature(__atomic_exchange_seq_cst_b) == 1 // bool 151__has_feature(__atomic_exchange_seq_cst_c) == 1 // char 152__has_feature(__atomic_exchange_seq_cst_a) == 1 // signed char 153__has_feature(__atomic_exchange_seq_cst_h) == 1 // unsigned char 154__has_feature(__atomic_exchange_seq_cst_Ds) == 1 // char16_t 155__has_feature(__atomic_exchange_seq_cst_Di) == 1 // char32_t 156__has_feature(__atomic_exchange_seq_cst_w) == 1 // wchar_t 157__has_feature(__atomic_exchange_seq_cst_s) == 1 // short 158__has_feature(__atomic_exchange_seq_cst_t) == 1 // unsigned short 159__has_feature(__atomic_exchange_seq_cst_i) == 1 // int 160__has_feature(__atomic_exchange_seq_cst_j) == 1 // unsigned int 161__has_feature(__atomic_exchange_seq_cst_l) == 1 // long 162__has_feature(__atomic_exchange_seq_cst_m) == 1 // unsigned long 163__has_feature(__atomic_exchange_seq_cst_Pv) == 1 // void* 164</pre></blockquote> 165 166<p> 167Note that only the <tt>__has_feature</tt> flag is decorated with the argument 168type. The name of the compiler intrinsic is not decorated, but instead works 169like a C++ overloaded function. 170</p> 171 172<p> 173Additionally there are other intrinsics besides 174<tt>__atomic_exchange_seq_cst</tt> and <tt>__atomic_store_seq_cst</tt>. They 175are optional. But if the compiler can generate faster code than provided by the 176library, then clients will benefit from the compiler writer's expertise and 177knowledge of the targeted platform. 178</p> 179 180<p> 181Below is the complete list of <i>sequentially consistent</i> intrinsics, and 182their library implementations. Template syntax is used to indicate the desired 183overloading for integral and void* types. The template does not represent a 184requirement that the intrinsic operate on <em>any</em> type! 185</p> 186 187<blockquote><pre> 188T is one of: bool, char, signed char, unsigned char, short, unsigned short, 189 int, unsigned int, long, unsigned long, 190 long long, unsigned long long, char16_t, char32_t, wchar_t, void* 191 192template <class T> 193T 194__atomic_load_seq_cst(T const volatile* obj) 195{ 196 unique_lock<mutex> _(some_mutex); 197 return *obj; 198} 199 200template <class T> 201void 202__atomic_store_seq_cst(T volatile* obj, T desr) 203{ 204 unique_lock<mutex> _(some_mutex); 205 *obj = desr; 206} 207 208template <class T> 209T 210__atomic_exchange_seq_cst(T volatile* obj, T desr) 211{ 212 unique_lock<mutex> _(some_mutex); 213 T r = *obj; 214 *obj = desr; 215 return r; 216} 217 218template <class T> 219bool 220__atomic_compare_exchange_strong_seq_cst_seq_cst(T volatile* obj, T* exp, T desr) 221{ 222 unique_lock<mutex> _(some_mutex); 223 if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) 224 { 225 std::memcpy(const_cast<T*>(obj), &desr, sizeof(T)); 226 return true; 227 } 228 std::memcpy(exp, const_cast<T*>(obj), sizeof(T)); 229 return false; 230} 231 232template <class T> 233bool 234__atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr) 235{ 236 unique_lock<mutex> _(some_mutex); 237 if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) 238 { 239 std::memcpy(const_cast<T*>(obj), &desr, sizeof(T)); 240 return true; 241 } 242 std::memcpy(exp, const_cast<T*>(obj), sizeof(T)); 243 return false; 244} 245 246T is one of: char, signed char, unsigned char, short, unsigned short, 247 int, unsigned int, long, unsigned long, 248 long long, unsigned long long, char16_t, char32_t, wchar_t 249 250template <class T> 251T 252__atomic_fetch_add_seq_cst(T volatile* obj, T operand) 253{ 254 unique_lock<mutex> _(some_mutex); 255 T r = *obj; 256 *obj += operand; 257 return r; 258} 259 260template <class T> 261T 262__atomic_fetch_sub_seq_cst(T volatile* obj, T operand) 263{ 264 unique_lock<mutex> _(some_mutex); 265 T r = *obj; 266 *obj -= operand; 267 return r; 268} 269 270template <class T> 271T 272__atomic_fetch_and_seq_cst(T volatile* obj, T operand) 273{ 274 unique_lock<mutex> _(some_mutex); 275 T r = *obj; 276 *obj &= operand; 277 return r; 278} 279 280template <class T> 281T 282__atomic_fetch_or_seq_cst(T volatile* obj, T operand) 283{ 284 unique_lock<mutex> _(some_mutex); 285 T r = *obj; 286 *obj |= operand; 287 return r; 288} 289 290template <class T> 291T 292__atomic_fetch_xor_seq_cst(T volatile* obj, T operand) 293{ 294 unique_lock<mutex> _(some_mutex); 295 T r = *obj; 296 *obj ^= operand; 297 return r; 298} 299 300void* 301__atomic_fetch_add_seq_cst(void* volatile* obj, ptrdiff_t operand) 302{ 303 unique_lock<mutex> _(some_mutex); 304 void* r = *obj; 305 (char*&)(*obj) += operand; 306 return r; 307} 308 309void* 310__atomic_fetch_sub_seq_cst(void* volatile* obj, ptrdiff_t operand) 311{ 312 unique_lock<mutex> _(some_mutex); 313 void* r = *obj; 314 (char*&)(*obj) -= operand; 315 return r; 316} 317 318void __atomic_thread_fence_seq_cst() 319{ 320 unique_lock<mutex> _(some_mutex); 321} 322 323void __atomic_signal_fence_seq_cst() 324{ 325 unique_lock<mutex> _(some_mutex); 326} 327</pre></blockquote> 328 329<p> 330One should consult the (currently draft) 331<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf">C++ standard</a> 332for the details of the definitions for these operations. For example 333<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> is allowed to fail 334spuriously while <tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> is 335not. 336</p> 337 338<p> 339If on your platform the lock-free definition of 340<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> would be the same as 341<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt>, you may omit the 342<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> intrinsic without a 343performance cost. The library will prefer your implementation of 344<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> over its own 345definition for implementing 346<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt>. That is, the library 347will arrange for <tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> to call 348<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> if you supply an 349intrinsic for the strong version but not the weak. 350</p> 351 352<h2>Taking advantage of weaker memory synchronization</h2> 353 354<p> 355So far all of the intrinsics presented require a <em>sequentially 356consistent</em> memory ordering. That is, no loads or stores can move across 357the operation (just as if the library had locked that internal mutex). But 358<tt><atomic></tt> supports weaker memory ordering operations. In all, 359there are six memory orderings (listed here from strongest to weakest): 360</p> 361 362<blockquote><pre> 363memory_order_seq_cst 364memory_order_acq_rel 365memory_order_release 366memory_order_acquire 367memory_order_consume 368memory_order_relaxed 369</pre></blockquote> 370 371<p> 372(See the 373<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf">C++ standard</a> 374for the detailed definitions of each of these orderings). 375</p> 376 377<p> 378On some platforms, the compiler vendor can offer some or even all of the above 379intrinsics at one or more weaker levels of memory synchronization. This might 380lead for example to not issuing an <tt>mfence</tt> instruction on the x86. 381</p> 382 383<p> 384If the compiler does not offer any given operation, at any given memory ordering 385level, the library will automatically attempt to call the next highest memory 386ordering operation. This continues up to <tt>seq_cst</tt>, and if that doesn't 387exist, then the library takes over and does the job with a <tt>mutex</tt>. This 388is a compile-time search & selection operation. At run time, the 389application will only see the few inlined assembly instructions for the selected 390intrinsic. 391</p> 392 393<p> 394Each intrinsic is appended with the 7-letter name of the memory ordering it 395addresses. For example a <tt>load</tt> with <tt>relaxed</tt> ordering is 396defined by: 397</p> 398 399<blockquote><pre> 400T __atomic_load_relaxed(const volatile T* obj); 401</pre></blockquote> 402 403<p> 404And announced with: 405</p> 406 407<blockquote><pre> 408__has_feature(__atomic_load_relaxed_b) == 1 // bool 409__has_feature(__atomic_load_relaxed_c) == 1 // char 410__has_feature(__atomic_load_relaxed_a) == 1 // signed char 411... 412</pre></blockquote> 413 414<p> 415The <tt>__atomic_compare_exchange_strong(weak)</tt> intrinsics are parameterized 416on two memory orderings. The first ordering applies when the operation returns 417<tt>true</tt> and the second ordering applies when the operation returns 418<tt>false</tt>. 419</p> 420 421<p> 422Not every memory ordering is appropriate for every operation. <tt>exchange</tt> 423and the <tt>fetch_<i>op</i></tt> operations support all 6. But <tt>load</tt> 424only supports <tt>relaxed</tt>, <tt>consume</tt>, <tt>acquire</tt> and <tt>seq_cst</tt>. 425<tt>store</tt> 426only supports <tt>relaxed</tt>, <tt>release</tt>, and <tt>seq_cst</tt>. The 427<tt>compare_exchange</tt> operations support the following 16 combinations out 428of the possible 36: 429</p> 430 431<blockquote><pre> 432relaxed_relaxed 433consume_relaxed 434consume_consume 435acquire_relaxed 436acquire_consume 437acquire_acquire 438release_relaxed 439release_consume 440release_acquire 441acq_rel_relaxed 442acq_rel_consume 443acq_rel_acquire 444seq_cst_relaxed 445seq_cst_consume 446seq_cst_acquire 447seq_cst_seq_cst 448</pre></blockquote> 449 450<p> 451Again, the compiler supplies intrinsics only for the strongest orderings where 452it can make a difference. The library takes care of calling the weakest 453supplied intrinsic that is as strong or stronger than the customer asked for. 454</p> 455 456</div> 457</body> 458</html> 459