Synchronization S 

David Padua (Ed.) Encyclopedia of Parallel Computing With Figures and Tables 123 Editor-in-Chief David Padua University of Illinois at Urbana-Champaign Urbana, IL USA ISBN ---- e-ISBN ---- DOI ./---- Print and electronic bundle ISBN: ---- Springer New York Dordrecht Heidelberg London Library of Congress Control Number: © Springer Science+Business Media, LLC All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, Spring Street, New York, NY , USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com) Synchronization S . Duato J () A new theory of deadlock-free adaptive routing may produce incorrect results. As a trivial example, con- in wormhole networks. IEEE Trans Parallel Distrib Syst (): sider a global counter incremented by multiple threads. – Each thread loads the counter into a register, increments . Duato J () A necessary and sufficient condition for deadlock- the register, and writes the updated value back to mem- free adaptive routing in wormhole networks. IEEE Trans Parallel Distrib Syst ():– ory. If two threads load the same value before either . Duato J () A necessary and sufficient condition for deadlock- stores it back, updates may be lost: free routing in cut-through and store-and-forward networks. IEEE Trans Parallel Distrib Syst ():– c==0 . Peh L-S, Dally WJ () Flit reservation flow control. In: Thread 1: Thread 2: Proceedings of the th international symposium on high- r1 := c performance computer architecture, Toulouse, France, January , pp – r1:= c . Stunkel CB et al () Architecture and implementation of vul- ++r1 can. In: Proceedings of the th international parallel processing ++r1 symposium, Cancun, Mexico, pp – c:=r1 . Stunkel CB et al () The SP high-performance switch. In: Pro- c:=r1 ceedings of the scalable high performance computing conference, Knoxville, pp – c==1 . Seitz C () The cosmic cube. Commun ACM ():– Synchronization serves to preclude invalid thread interleavings. It is commonly divided into the subtasks of atomicity and condition synchronization. Atomicity Symmetric Multiprocessors ensures that a given sequence of instructions, typically performed by a single thread, appears to all other Shared-Memory Multiprocessors threads as if it had executed indivisibly – not inter- leaved with anything else. In the example above, one would typically specify that the load-increment-store instruction sequence should execute atomically. Synchronization Condition synchronization forces a thread to wait, before performing an operation on shared data, until MichaelL. Scott some desired precondition is true. In the example above, University of Rochester, Rochester, NY, USA one might want to wait until all threads had performed their increments before reading the final count. Synonyms While it is tempting to suspect that condition syn- Fences; Multiprocessor synchronization; Mutual exclu- chronization subsumes atomicity (make the precondi- sion; Process synchronization tion be that no other thread is currently executing a conflicting operation), atomicity is in fact considerably S Deﬁnition harder, because it requires consensus among all com- Synchronization is the use of language or library mech- peting threads: they must all agree as to which will anisms to constrain the ordering (interleaving) of proceed and which will wait. Put another way, condi- instructions performed by separate threads, to preclude tion synchronization delays a thread until some locally orderings that lead to incorrect or undesired results. observable condition is seen to be true; atomicity is a property of the system as a whole. Discussion Like many aspects of parallel computing, syn- In a parallel program, the instructions of any given chronization looks different in shared-memory and thread appear to occur in sequential order (at least from message-passing systems. In the latter, synchronization that thread’s point of view), but if the threads run inde- is generally subsumed in the message-passing meth- pendently, their sequences of instructions may inter- ods; in a shared-memory system, it typically employs a leave arbitrarily, and many of the possible interleavings separate set of methods. S Synchronization Shared-memory implementations of synchroniza- In either case, it returns the previous value, from tion can be categorized as busy-wait (spinning), or which one can deduce whether the replacement scheduler-based. The former actively consume processor occurred. cycles until the running thread is able to proceed. The Load-linked (l) and store-conditional (l, v).Thefirst latter deschedule the current thread, allowing the pro- of these returns the value at location l and “remem- cessor to be used by other threads, with the expectation bers” l.Thesecondstoresv to l if l has not been that future activity by one of those threads will make modified by any other processor since a previous the original thread runnable again. Because it avoids the load-linked by the current processor. cost of two context switches, busy-wait synchronization These instructions differ in their expressive power. is typically faster than scheduler-based synchronization Herlihy has shown [] that compare-and-swap (CAS) when the expected wait time is short and when the and load-linked / store-conditional (LL/SC) are univer- processor is not needed for other purposes. Scheduler- sal primitives, meaning, informally, that they can be based synchronization is typically faster when expected used to construct a non-blocking implementation of any wait times are long; it is necessary when the num- other RMW operation. The following code provides a ber of threads exceeds the number of processors (else simple implementation of fetch-and-ϕ using CAS. quantum-long delays or even deadlock can occur). In the typical implementation, busy-wait synchronization val old := *l; is built on top of whatever hardware instructions exe- loop cute atomically. Scheduler-based synchronization, in val new := phi(old); turn, is built on top of busy-wait synchronization, which val found := CAS(l, old, new); is used to protect the scheduler’s own data structures if (old == found) break; (see entries on Scheduling Algorithms and on Processes, old := found; Tasks, and Threads). If the test on line of this code fails, it must be because some other thread successfully modified *l.Thesystem Hardware Primitives as a whole has made forward progress, but the current In the earliest multiprocessors, load and store were thread must try again. the only memory-access instructions guaranteed to As discussed in the entry on Non-blocking Algo- be atomic, and busy-wait synchronization was imple- rithms, this simple implementation is lock-free but not mented using these. Modern machines provide a vari- wait-free. There are stronger (but slower and more ety of atomic read-modify-write (RMW) instructions, complex) non-blocking implementations in which each which serve to update amemorylocationatom- thread is guaranteed to make forward progress in a ically. These significantly simplify the implementa- bounded number of its own instructions. tion of synchronization. Common RMW instructions include: NB: In any distributed system, and in most modern Test-and-set (l) sets the Boolean variable at location l shared memory systems, instructions executed by a to true,andreturnsthepreviousvalue. given thread are not, in general, guaranteed to be seen Swap (l, v) stores the value v to location l and returns in sequential order by other threads, and instructions the previous value. of any two threads are not, in general, guaranteed to be Atomic-ϕ (l, v) replaces the value o at location l with seen in the same order by all of their peers. Modern ϕ(o, v) for some simple arithmetic function ϕ (add, processors typically provide so-called fence or barrier sub, and, etc.). instructions (not to be confused with the barriers dis- Fetch-and-ϕ (l, v) is like atomic-ϕ,butalsoreturnsthe cussed under Condition Synchronization below) that previous value. force previous instructions of the current thread to be Compare-and-swap (l, o, n) inspects the value v at seen by other threads before subsequent instructions of location l, and if it is equal to o, replaces it with n. the current thread. Implementations of synchronization Synchronization S methods typically include sufficient fences that if syn- before any releases) ensures that the global set of critical chronization method s in thread t occurs before syn- section executions remains serializable. chronization method s in thread t,thenallinstruc- tions that precede s in t will appear in t to have Relaxations of Mutual Exclusion occurred before any of its own instructions that fol- So-called reader–writer locks increase concurrency by low s. For more information, see the entry on Memory observing that it is safe for more than one thread to read Models. The remainder of the discussion here assumes a location concurrently,

Load more