Synchronization: Locks and Barriers

CS 740
November 14, 2003
Based on TCM/C&S

Topics
• Locks
• Barriers
• Hardware primitives

Types of Synchronization

Mutual Exclusion
• Locks

Event Synchronization
• Global or group-based (barriers)
• Point-to-point


Busy Waiting vs. Blocking

Busy-waiting is preferable when:
• scheduling overhead is larger than the expected wait time
• processor resources are not needed for other tasks
• schedule-based blocking is inappropriate (e.g., in OS kernel)


A Simple Lock

    lock:    ld  register, location
             cmp register, #0
             bnz lock
             st  location, #1
             ret

    unlock:  st  location, #0
             ret


Need Atomic Primitive!

• Test&Set
• Swap
• Fetch&Op
  – Fetch&Incr, Fetch&Decr
• Compare&Swap


Test&Set based lock

    lock:    t&s register, location
             bnz lock
             ret

    unlock:  st  location, #0
             ret


T&S Lock Performance

Code: lock; delay(c); unlock;
Same total number of lock calls as p increases; measure time per transfer.

[Figure: time per lock transfer (µs) vs. number of processors (3 to 15), for test&set (c = 0), test&set with exponential backoff (c = 3.64), test&set with exponential backoff (c = 0), and the ideal curve.]


Test and Test and Set

    A: while (lock != free)
           ;                            /* spin reading until lock looks free */
       if (test&set(lock) == free) {
           critical section;
       }
       else goto A;

(+) spinning happens in cache
(-) can still generate a lot of traffic when many processors go to do test&set


Test and Set with Backoff

Upon failure, delay for a while before retrying:
• either constant delay or exponential backoff

Tradeoffs:
(+) much less network traffic
(-) exponential backoff can cause starvation for high-contention locks
  – new requestors back off for shorter times

But exponential backoff is found to work best in practice.


Test and Set with Update

Test and Set sends updates to processors that cache the lock.

Tradeoffs:
(+) good for bus-based machines
(-) still lots of traffic on distributed networks

Main problem with test&set-based schemes: a lock release causes all waiters to try to get the lock, each using a test&set to try to get it.


Ticket Lock (fetch&incr based)

Two counters:
• next_ticket (number of requestors)
• now_serving (number of releases that have happened)

Algorithm:
• First do a fetch&incr on next_ticket (not test&set)
• When a release happens, poll the value of now_serving
  – if it equals my_ticket, then I win


Ticket Lock Tradeoffs

(+) guaranteed FIFO order; no starvation possible
(+) latency can be low if fetch&incr is cacheable
(+) traffic can be quite low
(-) but traffic is not guaranteed to be O(1) per lock acquire
  – use delay while polling; but how much?

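Where the slides give the test-and-test-and-set idea and exponential backoff as pseudocode, here is one way it might look as runnable C11: a minimal sketch assuming <stdatomic.h>, with atomic_exchange standing in for the t&s instruction. The names tts_lock_t, tts_acquire, and tts_release are invented for the example, not taken from the slides.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Hypothetical test-and-test-and-set lock with exponential backoff. */
    typedef struct {
        atomic_bool held;                        /* false = free, true = held */
    } tts_lock_t;

    static void tts_acquire(tts_lock_t *l)
    {
        unsigned backoff = 1;                    /* delay, in spin iterations */
        for (;;) {
            /* "test": spin reading the cached copy until the lock looks free */
            while (atomic_load(&l->held))
                ;
            /* "test&set": exchange returns the previous value */
            if (!atomic_exchange(&l->held, true))
                return;                          /* lock was free; we got it */
            /* Lost the race: back off exponentially before retrying */
            for (volatile unsigned i = 0; i < backoff; i++)
                ;
            if (backoff < (1u << 16))
                backoff *= 2;
        }
    }

    static void tts_release(tts_lock_t *l)
    {
        atomic_store(&l->held, false);           /* st location, #0 */
    }

And a matching sketch of the ticket lock just described, using atomic_fetch_add as the fetch&incr primitive; again, all names are illustrative. The proportional delay in the spin loop is one possible answer to "use delay; but how much?": wait roughly in proportion to how many tickets are ahead of yours.

    #include <stdatomic.h>

    /* Hypothetical ticket lock built on fetch&incr; initialize both fields to 0. */
    typedef struct {
        atomic_uint next_ticket;                 /* number of requestors so far */
        atomic_uint now_serving;                 /* number of releases so far   */
    } ticket_lock_t;

    static void tl_acquire(ticket_lock_t *l)
    {
        /* fetch&incr on next_ticket (not test&set) */
        unsigned my_ticket = atomic_fetch_add(&l->next_ticket, 1);

        /* Poll now_serving; back off in proportion to queue distance. */
        for (;;) {
            unsigned serving = atomic_load(&l->now_serving);
            if (serving == my_ticket)
                return;                          /* my turn: I win */
            for (volatile unsigned i = (my_ticket - serving) * 64; i > 0; i--)
                ;                                /* crude proportional delay */
        }
    }

    static void tl_release(ticket_lock_t *l)
    {
        /* Exactly one waiter (the next ticket holder) proceeds. */
        atomic_fetch_add(&l->now_serving, 1);
    }
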
Lock Performance on SGI Challenge

Loop: lock; delay(c); unlock; delay(d);

[Figure: three panels of time (µs) vs. number of processors (1 to 15) for array-based, LL-SC, LL-SC with exponential backoff, ticket, and ticket with proportional backoff locks: (a) null critical section (c = 0, d = 0), (b) critical section (c = 3.64 µs, d = 0), (c) delay (c = 3.64 µs, d = 1.29 µs).]

• Simple LL-SC lock does best at small p due to unfairness
  – Not so with delay between unlock and next lock
  – Need to be careful with backoff
• Ticket lock with proportional backoff scales well, as does the array lock
• Methodologically challenging, and need to look at real workloads


Array-Based Queueing Locks

Every process spins on a unique location, rather than on a single now_serving counter.
fetch&incr gives a process the address on which to spin.

Tradeoffs:
(+) guarantees FIFO order (like ticket lock)
(+) O(1) traffic with coherent caches (unlike ticket lock)
(-) requires space per lock proportional to P


List-Based Queueing Locks (MCS)

All the other good things + O(1) traffic even without coherent caches (spin locally).
Uses compare&swap to build linked lists in software.
Locally-allocated flag per list node to spin on.
Can work with fetch&store, but loses the FIFO guarantee.

Tradeoffs:
(+) less storage than array-based locks
(+) O(1) traffic even without coherent caches
(-) compare&swap not easy to implement


Implementing Fetch&Op

Load Linked/Store Conditional:

    lock:    ll   reg1, location    /* LL location to reg1 */
             bnz  reg1, lock        /* check if location locked */
             sc   location, reg2    /* SC reg2 into location */
             beqz reg2, lock        /* if failed, start again */
             ret

    unlock:  st   location, #0      /* write 0 to location */
             ret

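The MCS list-based queueing lock above builds its queue with compare&swap and has each waiter spin on a locally allocated flag. A minimal C11 sketch of that structure follows, assuming <stdatomic.h>; the qnode and mcs_lock names, the field layout, and the use of atomic_exchange (fetch&store) for enqueueing plus atomic_compare_exchange_strong (compare&swap) on release are choices of this sketch, not details given on the slides.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* One queue node per acquiring thread, allocated locally (e.g., on its stack). */
    typedef struct qnode {
        _Atomic(struct qnode *) next;
        atomic_bool             locked;      /* flag this waiter spins on */
    } qnode;

    typedef struct {
        _Atomic(qnode *) tail;               /* last waiter in the queue, or NULL */
    } mcs_lock;

    static void mcs_acquire(mcs_lock *l, qnode *me)
    {
        atomic_store(&me->next, NULL);
        atomic_store(&me->locked, true);

        /* fetch&store: append myself to the tail of the queue */
        qnode *prev = atomic_exchange(&l->tail, me);
        if (prev != NULL) {
            atomic_store(&prev->next, me);   /* link in behind my predecessor */
            while (atomic_load(&me->locked))
                ;                            /* spin on my own flag only */
        }
    }

    static void mcs_release(mcs_lock *l, qnode *me)
    {
        qnode *successor = atomic_load(&me->next);
        if (successor == NULL) {
            /* No visible successor: try to swing tail from me back to NULL. */
            qnode *expected = me;
            if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
                return;                      /* queue is empty, done */
            /* A successor is enqueueing; wait for it to link itself in. */
            while ((successor = atomic_load(&me->next)) == NULL)
                ;
        }
        atomic_store(&successor->locked, false);   /* hand lock to successor */
    }

Because the release clears the successor's own locally allocated flag, each waiter spins only on memory local to it, which is why traffic stays O(1) per acquire even without coherent caches.
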
Barriers

We will discuss five barriers:
• centralized
• software combining tree
• dissemination barrier
• tournament barrier
• MCS tree-based barrier


A Working Centralized Barrier

Consecutively entering the same barrier doesn't work:
• Must prevent a process from entering until all have left the previous instance
• Could use another counter, but that increases latency and contention

Sense reversal: wait for the flag to take a different value in consecutive barriers.

    BARRIER (bar_name, p) {
        local_sense = !(local_sense);      /* toggle private sense var */
        LOCK(bar_name.lock);
        mycount = bar_name.counter++;      /* mycount is private */
        if (bar_name.counter == p) {
            UNLOCK(bar_name.lock);
            bar_name.counter = 0;          /* reset for next */
            bar_name.flag = local_sense;   /* release waiters */
        } else {
            UNLOCK(bar_name.lock);
            while (bar_name.flag != local_sense) {};
        }
    }


Centralized Barrier

Basic idea:
• notify a single shared counter when you arrive
• poll that shared location until all have arrived

Simple implementations require polling/spinning twice:
• first to ensure that all procs have left the previous barrier
• second to ensure that all procs have arrived at the current barrier

Solution to get one spin: sense reversal.


Software Combining Tree Barrier

[Figure: a flat shared counter (contention) vs. a tree-structured combining scheme (little contention).]

Writes go into one tree for barrier arrival.
Reads from another tree allow procs to continue.
Sense reversal distinguishes consecutive barriers.


Tournament Barrier

Binary combining tree.
Representative processor at a node is statically chosen:
• no fetch&op needed

In round k, proc i = 2^k sets a flag for proc j = i - 2^k:
• i then drops out of the tournament and j proceeds in the next round
• i waits for the global flag signalling completion of the barrier to be set
  – could use a combining wakeup tree


Barrier Performance

[Figure: time (µs) vs. number of processors (1 to 8) for the centralized, combining tree, tournament, and dissemination barriers.]

• Centralized does quite well
• Helpful hardware support: piggybacking of read misses on the bus
  – also for spinning on highly contended locks


MCS Software Barrier

Modifies the tournament barrier to allow static allocation in the wakeup tree, and to use sense reversal.

Every processor is a node in two P-node trees:
• has pointers to its parent, building a fanin-4 arrival tree
• has pointers to its children, building a fanout-2 wakeup tree


Barrier Recommendations

Criteria:
• length of critical path
• number of network transactions
• space requirements
• atomic operation requirements


Space Requirements

• Centralized: constant
• MCS, combining tree: O(P)
• Dissemination, tournament: O(P log P)


Network Transactions

• Centralized, combining tree: O(P) if broadcast and coherent caches; unbounded otherwise
• Dissemination: O(P log P)
• Tournament, MCS: O(P)


Critical Path Length

If independent parallel network paths are available:
• all are O(log P) except centralized, which is O(P)

Otherwise (e.g., shared bus):
• linear factors dominate


Primitives Needed

Centralized and combining tree:
• atomic increment
• atomic decrement

Others:
• atomic read
• atomic write


Barrier Recommendations

Without broadcast on distributed memory:
• Dissemination
  – MCS is good; only its critical path length is about 1.5X longer
  – MCS has somewhat better network load and space requirements

Cache coherence with broadcast (e.g., a bus):
• MCS with flag wakeup
  – centralized is best for modest numbers of processors

Big advantage of centralized barrier:
• adapts to a changing number of processors across barrier calls

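To tie the recommendations back to code: the centralized sense-reversing barrier needs only an atomic counter and a flag, which is why it adapts so easily to a changing number of processors. Below is a minimal C11 sketch, assuming <stdatomic.h> and a thread-local private sense variable; it follows the BARRIER pseudocode given earlier, except that this sketch replaces the explicit lock around the counter with an atomic fetch&add. The names central_barrier and barrier_wait are illustrative.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Centralized sense-reversing barrier (sketch).
       Initialize counter = 0, flag = false, and p = number of processors. */
    typedef struct {
        atomic_uint counter;      /* arrivals at the current barrier episode */
        atomic_bool flag;         /* global sense: flips once per barrier    */
        unsigned    p;            /* number of participating processors      */
    } central_barrier;

    /* Each thread keeps its own private sense variable, initially false. */
    static _Thread_local bool local_sense = false;

    static void barrier_wait(central_barrier *b)
    {
        local_sense = !local_sense;                 /* toggle private sense */

        /* Atomic increment stands in for LOCK / counter++ / UNLOCK. */
        unsigned arrived = atomic_fetch_add(&b->counter, 1) + 1;

        if (arrived == b->p) {
            atomic_store(&b->counter, 0);           /* reset for next episode */
            atomic_store(&b->flag, local_sense);    /* release all waiters    */
        } else {
            while (atomic_load(&b->flag) != local_sense)
                ;                                   /* spin until sense flips */
        }
    }

As in the pseudocode, the counter is reset before the flag flips, so a processor that re-enters the barrier immediately after release always sees a zeroed counter; the single spin on the sense flag is what removes the need to wait for everyone to leave the previous barrier instance.
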
