Lines Matching refs:we
170 location A into reg1. (Note that we’re writing in one order and reading in
204 <p>To get into a situation where we see B=5 before we see the store to A, either
247 different order by thread 2. If that happened, we could actually appear to
384 <p>With this in mind, we’re ready to talk about ARM.</p>
405 reads B repeatedly, looping until we read 1 from B. The idea here is that
467 the other side, assume we read A earlier, or it lives on the same cache line as
468 something else we recently read. Core 2 spins until it sees the update to B,
492 guarantees about the ordering of loads in thread 1, but we don’t have any of
497 program order, why do we need the load/load barrier in thread 2? Because we
507 matter, so long as the appropriate guarantees are kept. If we use a barrier in
564 <p>A thornier question is: do we need a barrier in thread 2? If the CPU doesn’t
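<p>As a minimal sketch of the pattern these lines describe (the class and method names are ours, and Java 9+ is assumed for the <code>VarHandle</code> fences), thread 1 stores the data and then sets a flag, while thread 2 spins on the flag and only then reads the data. In Java the <code>volatile</code> flag already supplies the required ordering; the explicit fence calls are redundant and are shown only to mark where the store/store and load/load barriers discussed above would sit.</p>
<pre>
import java.lang.invoke.VarHandle;

public class FlagPassing {
    static int A = 0;            // payload (plain field)
    static volatile int B = 0;   // flag

    static void writer() {
        A = 41;                        // store the data first
        VarHandle.storeStoreFence();   // store/store barrier (redundant here: B is volatile)
        B = 1;                         // then set the flag
    }

    static void reader() {
        while (B != 1) { }             // spin until we observe the flag
        VarHandle.loadLoadFence();     // load/load barrier (redundant here: B is volatile)
        System.out.println(A);         // guaranteed to print 41, never a stale 0
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t2 = new Thread(FlagPassing::reader);
        t2.start();
        writer();
        t2.join();
    }
}
</pre>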
637 the loop, but on ARM we can also just do this:</p>
656 <p>What we’ve done here is change the assignment of reg1 from a constant (8) to
657 a value that depends on what we loaded from B. In this case, we do a bitwise
685 <p>While we’re hip-deep, it’s worth noting that ARM does not provide <em>causal
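<p>A small sketch of the causal-consistency point (all names are ours, not the article's): thread 1 writes A, thread 2 waits until it sees A and then writes B, and thread 3 waits until it sees B and then reads A. The accesses below use opaque mode, which guarantees eventual visibility but no cross-variable ordering, so nothing forbids thread 3 from seeing B's update while still reading a stale A on hardware without causal consistency.</p>
<pre>
import java.util.concurrent.atomic.AtomicInteger;

public class NotCausal {
    static final AtomicInteger A = new AtomicInteger(0);
    static final AtomicInteger B = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> A.setOpaque(1));
        Thread t2 = new Thread(() -> {
            while (A.getOpaque() != 1) { }   // wait until we see thread 1's write
            B.setOpaque(1);                  // then publish B
        });
        Thread t3 = new Thread(() -> {
            while (B.getOpaque() != 1) { }   // wait until we see thread 2's write
            // Almost always prints 1 in practice, but nothing here rules out 0:
            // seeing B does not imply seeing the write to A that "caused" it.
            System.out.println("A = " + A.getOpaque());
        });
        t1.start(); t2.start(); t3.start();
        t1.join();  t2.join();  t3.join();
    }
}
</pre>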
756 1. If we used an atomic increment operation, we would be guaranteed that the
838 lock</em>. The idea is that a memory address (which we’ll call “lock”)
841 back to zero when done. If another thread has already set the lock to 1, we sit
844 <p>To make this work we use an atomic RMW primitive called
847 in memory matches what we expect, it is replaced with the new value, and the old
848 value is returned. If the current value is not what we expect, we don’t change
852 simpler for examples, so we use it and just refer to it as “CAS”.</p>
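<p>A quick sketch of those CAS semantics using the JDK's <code>AtomicInteger</code> rather than the article's pseudo-instructions: <code>compareAndExchange()</code> returns whatever value was actually in memory, whether or not the swap took place.</p>
<pre>
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger cell = new AtomicInteger(0);

        // Expect 0, install 1: the swap succeeds and the old value (0) comes back.
        int old1 = cell.compareAndExchange(0, 1);
        System.out.println("old=" + old1 + " now=" + cell.get());   // old=0 now=1

        // Expect 0 again, but the cell now holds 1: nothing changes, and the
        // returned 1 tells us our expectation was stale.
        int old2 = cell.compareAndExchange(0, 2);
        System.out.println("old=" + old2 + " now=" + cell.get());   // old=1 now=1
    }
}
</pre>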
866 will set it to 1 to indicate that we now have it. If another thread has it, the
874 <p>On SMP, a spin lock is a useful way to guard a small critical section. If we
876 release the lock, we can just burn a few cycles while we wait our turn.
877 However, if the other thread happens to be executing on the same core, we’re
883 uniprocessor you never want to spin at all. For the sake of brevity we’re
896 reorder code around the barrier. That way, we know that the
902 <p>Of course, we also want to make sure that none of the memory accesses
917 <strong>before</strong> we release the lock, so that loads and stores in the
921 assignment on ARM and x86. Unlike the atomic RMW operations, we don’t guarantee
923 though, because we only need to keep the other threads <strong>out</strong>. The
926 longer, but we will still execute code correctly.</p>
935 <p>When acquiring the spinlock, we issue the atomic CAS and then the barrier.
936 When releasing the spinlock, we issue the barrier and then the atomic store.
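<p>Here is a minimal Java sketch of that spin lock (class and method names are ours). The "CAS, then barrier" on acquire and "barrier, then store" on release are folded into the JDK primitives: <code>compareAndSet()</code> and <code>set()</code> on <code>AtomicInteger</code> already carry the required ordering, so no explicit fence calls appear.</p>
<pre>
import java.util.concurrent.atomic.AtomicInteger;

public class SpinLock {
    private final AtomicInteger lock = new AtomicInteger(0);   // 0 = free, 1 = held

    public void acquire() {
        // Atomically flip 0 -> 1; if another thread holds the lock, spin until it's free.
        while (!lock.compareAndSet(0, 1)) {
            Thread.onSpinWait();   // hint that we're busy-waiting
        }
        // The CAS doubles as the acquire barrier: nothing from the critical
        // section can appear to happen before the lock is held.
    }

    public void release() {
        // The volatile-store semantics of set() double as the release barrier:
        // everything done in the critical section is visible before the lock reads 0.
        lock.set(0);
    }
}
</pre>
<p>The usual usage pattern is <code>acquire(); try { ... } finally { release(); }</code> so the lock is dropped even if the critical section throws.</p>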
953 for doing this lies in a couple of optimizations we can now perform.</p>
957 critical section above it. In other words, we need a load/store and store/store
958 barrier. In an earlier section we learned that these aren’t necessary on x86
968 <p>Suppose we have a mix of locally-visible and globally-visible memory
984 <p>Here we see two completely independent sets of operations. The first set
985 operates on a thread-local data structure, so we’re not concerned about clashes
1010 time than one of the loads, we essentially get it for free, since it happens
1016 outside the current thread, nothing else can see them until we’re finished here,
1025 <p>Returning to an earlier point, we can state that on x86 all loads are
1056 <p>Here we present some examples of incorrect code, along with simple ways to
1057 fix them. Before we do that, we need to discuss the use of a basic language
1081 pthread mutex) rather than an atomic operation, but we will employ the latter to
1084 <p>For the sake of brevity we’re ignoring the effects of compiler optimizations
1088 solve both compiler-reordering and memory-access-ordering issues, but we’re only
1111 <p>The idea here is that we allocate a structure, initialize its fields, and at
1112 the very end we “publish” it by storing it in a global variable. At that point,
1115 erroneous assumption that the compiler outputs code exactly as we have it in the
1127 order, but what about reads? In this case we should be okay on ARM, because the
1137 <p>Now we know the ordering will be correct. This may seem like an awkward way
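<p>The article walks through this in C; the same pattern transposed to Java looks like the sketch below (type and field names are illustrative). Publishing through a plain field is broken for the reason just described; declaring the field <code>volatile</code> gives the store the "release" role the barrier plays in the C version, and gives the reader's load the matching ordering.</p>
<pre>
public class Publish {
    static class MyThing {
        int x, y, z;
        MyThing() { x = 1; y = 2; z = 3; }
    }

    // BROKEN if declared as a plain field: the store that publishes the reference
    // may become visible before the stores that initialized x, y and z.
    // static MyThing sharedThing;

    // FIXED: volatile store = publish with release ordering; the volatile load on
    // the reader side keeps the field reads from being satisfied "early".
    static volatile MyThing sharedThing;

    static void initThread() {
        sharedThing = new MyThing();   // initialize the fields, then publish
    }

    static void readerThread() {
        MyThing t = sharedThing;       // may still be null if not yet published
        if (t != null) {
            System.out.println(t.x + " " + t.y + " " + t.z);   // all fields visible
        }
    }
}
</pre>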
1193 operation, we know that it will work correctly. If the reference count goes to
1194 zero, we recycle the storage.</p>
1197 <code>sharedThing</code> and then releases its copy. However, because we didn’t
1207 <code>sharedThing</code> are observed before we recycle the object.</p>
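<p>A compact sketch of that reference-count release in Java (names are ours): <code>decrementAndGet()</code> is an atomic RMW, so concurrent decrements can't be lost, and its volatile semantics play the role of the barrier called for above, ensuring every access a releasing thread made to the object is visible before another thread recycles it.</p>
<pre>
import java.util.concurrent.atomic.AtomicInteger;

public class Counted {
    private final AtomicInteger refCount = new AtomicInteger(1);

    public void acquire() {
        refCount.incrementAndGet();
    }

    public void release() {
        if (refCount.decrementAndGet() == 0) {
            recycle();   // only the thread that drops the last reference gets here
        }
    }

    private void recycle() {
        // Return the object to a pool, release native resources, and so on.
    }
}
</pre>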
1221 <p>We haven’t discussed some relevant Java language features, so we’ll take a
1285 threads, and we want to be sure that every thread sees the current count when
1296 updates could be lost. To make the increment atomic, we need to declare
1301 results from <code>get()</code>, because we’re reading the value with an ordinary load. We
1305 <p>Unfortunately, we’ve introduced the possibility of lock contention, which
1307 synchronized, we could declare <code>mValue</code> with “volatile”. (Note
1308 <code>incr()</code> must still use <code>synchronized</code>.) Now we know that
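<p>Putting those two pieces together, a corrected counter might look like the sketch below (the class name is ours; <code>mValue</code>, <code>incr()</code> and <code>get()</code> follow the names above): the synchronized <code>incr()</code> keeps the read-modify-write atomic, and the volatile <code>mValue</code> lets <code>get()</code> return the latest value without taking the lock.</p>
<pre>
public class SharedCounter {
    private volatile int mValue = 0;

    public synchronized void incr() {
        mValue++;          // read-modify-write: must stay under the lock
    }

    public int get() {
        return mValue;     // volatile load: never returns a stale value
    }
}
</pre>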
1370 <p>The idea is that we want to have a single instance of a <code>Helper</code>
1372 it once, so we create and return it through a dedicated <code>getHelper()</code>
1373 function. To avoid a race in which two threads create the instance, we need to
1374 synchronize the object creation. However, we don’t want to pay the overhead for
1375 the “synchronized” block on every call, so we only do that part if
1416 <li>Do the simple thing and delete the outer check. This ensures that we never
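<p>That simple fix looks like this sketch (the enclosing class name <code>Foo</code> is illustrative): <code>getHelper()</code> itself is synchronized, so the instance is created exactly once and every caller sees a fully constructed <code>Helper</code>, at the cost of taking the lock on every call.</p>
<pre>
public class Foo {
    private Helper helper = null;

    public synchronized Helper getHelper() {
        if (helper == null) {
            helper = new Helper();
        }
        return helper;
    }

    static class Helper {
        // ... whatever state the singleton carries ...
    }
}
</pre>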
1615 predictable way, but we can assert a relationship between the array contents.
1655 thread 1 was on iteration 1536. If we look one step in the future, at thread
1656 1’s iteration 1537, we expect to see that thread 1 saw that thread 2 was at
1660 <p>Now suppose we fail to observe a volatile write to <code>b</code>:</p>
1691 <p>Now, <code>BB[1537]</code> holds 165, a smaller value than we expected, so we
1692 know we have a problem. Put succinctly, for i=166, BB[AA[i]+1] < i. (This also
1693 catches failures by thread 2 to observe writes to <code>a</code>, for example if we
1694 miss an update and assign <code>AA[166] = 1535</code>, we will get
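<p>A sketch of the consistency check implied here, with the array names from the text and loop bounds of our own choosing: <code>AA[i]</code> records which of thread 1's iterations thread 2 had observed on its own iteration i, and <code>BB[j]</code> records which of thread 2's iterations thread 1 had observed on its iteration j, so if every volatile write were observed promptly we would always have BB[AA[i]+1] >= i. (The exact off-by-one depends on when each thread records its observation.)</p>
<pre>
public class HistoryCheck {
    static void check(int[] AA, int[] BB) {
        for (int i = 1; i < AA.length; i++) {
            int j = AA[i] + 1;                 // one step into thread 1's "future"
            if (j < BB.length && BB[j] < i) {
                System.out.println("missed update: i=" + i +
                    "  AA[i]=" + AA[i] + "  BB[AA[i]+1]=" + BB[j]);
            }
        }
    }
}
</pre>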
1719 <p>Consider once again volatile accesses in Java. Earlier we made reference to
1756 must be observable in program order by all threads. Thus, we will never see
1774 <p>As we saw in an earlier section, we need to insert a store/load barrier
1793 to a releasing store, but we’ve omitted load/store from the pre-store barrier,
1796 <p>What we’re really trying to guarantee, though, is that (using thread 1 as an
1805 barrier, so for ARM we must use explicit barriers.</p>