Cooperative Concurrency for a Multicore World

Cooperative Concurrency for a Multicore World Cormac Flanagan Stephen Freund Jaeheon Yi, Caitlin Sadowski Williams College UCSC Cooperative Concurrency for a Multicore World Cormac Flanagan Stephen Freund Jaeheon Yi, Caitlin Sadowski Williams College UCSC Multiple Threads Single Thread x++ is a non-atomic read-modify-write x++ x = 0; x = 0; thread interference? thread interference? while (x < len) { while (x < len) { thread interference? thread interference? tmp = a[x]; tmp = a[x]; thread interference? thread interference? b[x] = tmp; b[x] = tmp; thread interference? thread interference? x++; x++; thread interference? thread interference? } } Controlling Thread Interference #1: Manually x = 0; manually identify where x = 0; thread interference? thread interference? while (x < len) { thread interference while (x < len) { thread interference? does not occur thread interference? tmp = a[x]; tmp = a[x]; thread interference? thread interference? b[x] = tmp; b[x] = tmp; thread interference? thread interference? x++; x++; thread interference? thread interference? } } Programmer Productivity Heuristic: assume no interference, use sequential reasoning Controlling Thread Interference #2: Race Freedom • Race condition: two concurrent Thread 1 Thread 2 unsynchronized accesses, at Thread 1 Thread 2 least one write t2 = bal acquire(m) t1 = bal t1 = bal • Strongly correlated with defects bal = t2 - 10 release(m) bal = t1 + 10 • Race-free programs exhibit acquire(m) sequentially consistent bal = 0 behavior, even when run on a release(m) relaxed memory model acquire(m) bal = t1 + 10 • Race freedom by itself is not sufficient to prevent release(m) concurrency bugs Controlling Thread Interference #3: Atomicity • A method is atomic if it behaves as if it executes serially, without interleaved operations of other thread atomic copy(...) { void busyloop(...) { x = 0; acquire(m); thread interference? thread interference? while (x < len) { while (!test()) { thread interference? thread interference? tmp = a[x]; release(m); thread interference? thread interference? b[x] = tmp; acquire(m); thread interference? bimodal semantics thread interference? x++; increment or x++; thread interference? read-modify-write thread interference? } } } } sequential reasoning ok 10% of methods non-atomic 90% of methods atomic local atomic blocks awkward full complexity of threading Review of Cooperative Multitasking ... • Cooperative scheduler performs context switches only ... at yield statements ... ... yield • Clean semantics ... • Sequential reasoning valid by default ... yield ... • ... except where yields highlight thread interference yield ... ... • Limitation: Uses only a single processor yield ... ... yield Cooperative Concurrency Code with sync & yields ... acquire(m) x++ Cooperative scheduler release(m) Preemptive scheduler seq. reasoning ok yield // interference full performance except where yields ... no overhead highlight interference acquire(m) x=0 acquire(m) release(m) x=0 yield release(m) yield ... ... yield yield barrier(b) yield barrier(b) acquire(m) yield x++ release(m) acquire(m) yield x++ release(m) ≅ yield Yields mark all thread interference Cooperative Coop/preemptive Preemptive correctness ^ equivalence ) correctness Benefits of Yield over Atomic • Atomic methods are exactly those with no yields atomic copy(...) { void busyloop(...) { x = 0; acquire(m); thread interference? thread interference? while (x < len) { while (!test()) { thread interference? thread interference? tmp = a[x]; release(m); thread interference? thread yield; interference? b[x] = tmp; acquire(m); thread interference? x++ always thread interference? x++; an increment x++; thread interference? operation thread interference? } } } } atomic is an interface-level spec yield is a code-level spec (method contains no yields) Multiple Threads Single Thread x++ is a non-atomic read-modify-write x++ x = 0; x = 0; thread interference? thread interference? while (x < len) { while (x < len) { thread interference? thread interference? tmp = a[x]; tmp = a[x]; thread interference? thread interference? b[x] = tmp; b[x] = tmp; thread interference? thread interference? x++; x++; thread interference? thread interference? } } Cooperative Concurrency Single Thread x++ is an increment x++ { int t=x; yield; x=t+1; } x = 0; thread x = interference? 0; x = 0; thread interference? thread interference? while (x < len) { while (x < len) { thread yield;interference? thread interference? tmp = a[x]; tmp = a[x]; threadt yield;interference? thread interference? b[x] = tmp; b[x] = tmp; thread interference? thread interference? x++; x++; thread interference? thread interference? } } Cooperability in the design space non-interference specification atomic yield traditional synchronization atomicity cooperability + analysis (this talk) policy automatic new runtime transactional mutual memory systems exclusion Transactional Memory, Larus & Rajwar, 2007 Automatic mutual exclusion, Isard & Birrell, HOTOS ’07 Cooperative Concurrency Code with sync & yields ... acquire(m) x++ Cooperative scheduler release(m) Preemptive scheduler seq. reasoning ok yield full performance except where yields ... no overhead highlight interference acquire(m) x=0 acquire(m) release(m) x=0 yield release(m) yield ... ... yield yield barrier(b) yield barrier(b) acquire(m) yield x++ release(m) acquire(m) yield x++ release(m) ≅ yield Cooperative Coop/preemptive Preemptive correctness ^ equivalence ) correctness Cooperative Concurrency Code with sync & yields ... 1. Examples acquire(m) of x++ Cooperative scheduler coding release(m) with Yields Preemptive scheduler seq. reasoning ok yield full performance except where yields ... no overhead highlight interference 2. User study: acquire(m) x=0 acquire(m) release(m) x=0 Do Yieldsyield help? release(m) yield ... ... yield yield barrier(b) yield barrier(b) acquire(m) yield x++ release(m) acquire(m) yield x++ release(m) ≅ yield 3. Dynamic analysis for C-P equivalence (detecting missing yields) Cooperative4. StaticCoop/preemptive type system Preemptive correctness for^ verifyingequivalence C-P equivalence) correctness Example: java.util.StringBuffer.append(...) synchronized StringBuffer append(StringBuffer sb){ ... int len = sb.length(); yield; ... // allocate space for len chars sb.getChars(0, len, value, index); return this; } synchronized void getChars(int, int, char[], int) {...} synchronized void expandCapacity(int) {...} synchronized int length() {...} void update_x() { x is volatile concurrent calls to update_x x = slow_f(x); } Not C-P equivalent: Copper / Silver No yield between accesses to x Coop/preemptive Cooperative Preemptive version 1 equivalence correctness correctness ^ ) void update_x() { acquire(m); x = slow_f(x); release(m); } Not efficient! high lock contention = low performance Copper / Silver Coop/preemptive Cooperative Preemptive version 2 equivalence correctness correctness ^ ) void update_x() { int fx = slow_f(x); acquire(m); x = fx; release(m); } Not C-P equivalent: Copper / Silver No yield between accesses to x Coop/preemptive Cooperative Preemptive version 3 equivalence correctness correctness ^ ) void update_x() { int fx = slow_f(x); yield; acquire(m); x = fx; release(m); } Not correct: Copper / Silver Stale value at yield Coop/preemptive Cooperative Preemptive version 4 equivalence correctness correctness ^ ) void update_x() { restructure: int y = x; test and retry pattern for (;;) { yield; int fy = slow_f(y); if (x == y) { x = fy; return; } else { y = x; } Not C-P equivalent: } Copper / Silver No yield between access to x } Coop/preemptive Cooperative Preemptive version 5 equivalence correctness correctness ^ ) void update_x() { int y = x; for (;;) { yield; int fy = slow_f(y); acquire(m); if (x == y) { x = fy; return; } else { y = x; } release(m); } } Copper / Silver Coop/preemptive Cooperative Preemptive version 6 equivalence correctness correctness ^ ) Cooperative Concurrency Code with sync & yields ... 1. Examples acquire(m) of x++ Cooperative scheduler coding release(m) with Yields Preemptive scheduler seq. reasoning ok yield full performance except where yields ... no overhead highlight interference 2. User study: acquire(m) x=0 acquire(m) release(m) x=0 Do Yieldsyield help? release(m) yield ... ... yield yield barrier(b) yield barrier(b) acquire(m) yield x++ release(m) acquire(m) yield x++ release(m) ≅ yield 3. Dynamic analysis for C-P equivalence (detecting missing yields) Cooperative4. StaticCoop/preemptive type system Preemptive correctness for^ verifyingequivalence C-P equivalence) correctness A Preliminary User Study of Cooperability •Hypothesis: Yields help code comprehension + defect detection? •Study structure •Web-based survey, background check on threads •Between-group design - code with or without yields •Three code samples, based on real-world bugs •Task: Identify all bugs User Evaluation for Cooperability StringBuffer Concurrency bug Some other bug Didn’t find bug Total Yields 10 1 1 12 No Yields 1 5 9 15 All Samples Concurrency bug Some other bug Didn’t find bug Total Yields 30 3 3 36 No Yields 17 6 21 44 - Difference is statistically significant (p < 0.001) User Evaluation of Correctness Conditions: A Case Study of Cooperability. Sadowski & Yi, PLATEAU 2010 Cooperative Concurrency Code with sync & yields ... 1. Examples acquire(m) of x++ Cooperative scheduler coding release(m) with Yields Preemptive scheduler seq. reasoning ok yield full performance except where yields ... no overhead highlight interference 2. User study: acquire(m) x=0 acquire(m) release(m)

Cooperative Concurrency for a Multicore World

Threads Chapter 4

Lecture 10: Introduction to Openmp (Part 2)

The Problem with Threads

Modifiable Array Data Structures for Mesh Topology

CUDA Binary Utilities

Lithe: Enabling Efficient Composition of Parallel Libraries

Thread Scheduling in Multi-Core Operating Systems Redha Gouicem

Performance and Programmability Trade-Offs in the Opencl 2.0 SVM and Memory Model

Applications Tuning for Streaming SIMD Extensions

Worst-Case Execution Time Estimation for Hardware-Assisted Multithreaded Processors

Interfacing Chapel with Traditional HPC Programming Languages ∗

IMTEC-91-58 High-Performance Computing: Industry Uses Of