Engineering Tool 4 (D-MAVT, AS19)

Day 3

Malte Schwerhoff

http://lec.inf.ethz.ch/mavt/etIV/2019/

Yesterday’s ECTS Project

Function void select_random_students(...):

Master solution:

// N is number of students
idx1 = random_uint(0, N);
idx2 = random_uint(1, N);
idx2 = (idx1 + idx2) % N;

Student solution:

// N is number of students
idx1 = random_uint(0, N);
do {
  idx2 = random_uint(0, N);
} while (idx2 == idx1);

Yesterday’s ECTS Project

Fine-grained vs. coarse-grained locking

Fine-grained:

class Student { ... std::mutex mutex; ... };

class Lecturer {
  ...
  void transfer(Student* s1, Student* s2, ...) {
    // let s1 < s2
    s1->mutex.lock(); s2->mutex.lock();
    // critical section: transfer credits
    s1->mutex.unlock(); s2->mutex.unlock();
  }
  ...
};

class Admin {
  ...
  void work() {
    for (auto student : students)
      student->mutex.lock();
    ...
  }
};

Coarse-grained:

std::mutex global_mx = ...;

class Lecturer {
  ...
  void transfer(Student* s1, Student* s2, ...) {
    global_mx.lock();
    // critical section: transfer credits
    global_mx.unlock();
  }
  ...
};

class Admin {
  ...
  void work() {
    global_mx.lock();
    ...
  }
};

Yesterday’s ECTS Project

Mutual exclusion always (and intentionally) reduces the degree of parallelism in a program.

Guidelines to keep in mind:
• Be careful not to unnecessarily reduce the degree of parallelism
• Make the critical section as small as possible (see the sketch below)

But remember: always benchmark your program to check effective performance gain.
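To illustrate the second guideline, a minimal sketch of shrinking a critical section; the function expensive_computation and the shared results vector are made up for illustration:

#include <mutex>
#include <vector>

std::mutex results_mx;
std::vector<double> results;

double expensive_computation(double input) {
  return input * input; // placeholder for a long-running computation
}

void record(double input) {
  double r = expensive_computation(input); // runs without holding the lock
  results_mx.lock();
  results.push_back(r); // only the shared update is inside the critical section
  results_mx.unlock();
}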

Today’s Agenda

The purpose of today’s lecture is to briefly go over a few other concepts and techniques from the area of parallel programming:
• Locks – the better mutexes
• Beyond deadlocks: livelocks, starvation, fairness
• Inter-thread communication: condition variables, barriers
• Beyond threads: tasks, futures, promises
• Declarative parallelism: the OpenMP paradigm
• Lock-free concurrency
• Compilers and hardware, and the C++ memory model

Locks

Beyond Deadlocks: Dangers of Mutexes

What could go wrong with this program?

void foo(...) {
  some_mutex.lock();
  if (...) {
    ...
    return;
  }
  some_mutex.unlock();
}

What could go wrong with this program?

void foo(...) {
  some_mutex.lock();
  some_object.some_function();
  some_mutex.unlock();
}

• some_function might not terminate …
• … or raise an exception (a “controlled” error)
↪ some_mutex won’t be unlocked and the whole system could deadlock
• Non-termination cannot be prevented systematically (in practice)
↪ Apply the following guidelines (if possible):
• Don’t call unknown functions when holding a mutex
• In particular, do not call virtual member functions!

Beyond Deadlocks: Mutexes and RAII

void foo(...) {
  some_mutex.lock();
  if (...) return;
  some_mutex.unlock();
}

void foo(...) {
  some_mutex.lock();
  fun(); // might throw an exception
  some_mutex.unlock();
}

These situations can be handled systematically, by using the RAII idiom (recall ET2). Core idea:
• Recall that stack-allocated objects (those not instantiated with new) are deallocated when they go out of scope, e.g. at the end of a function

int& foo(...) {
  int x = ...;
  return x; // x goes out of scope; returns reference to dead object
}


• Destructor is automatically called when objects are deallocated
↪ Wrap the mutex in a stack-allocated guard object:
• Guard’s constructor locks the mutex

• Guard’s destructor unlocks it

Beyond Deadlocks: Guarding Mutexes


void foo(...) {
  std::lock_guard guard(some_mutex);
  if (...) return;
}

void foo(...) {
  std::lock_guard guard(some_mutex);
  fun(); // might throw an exception
}

Guard automatically locks mutex – and more importantly, also unlocks it

Beyond Deadlocks: Guarding Mutexes

• Guideline: use locks, i.e. guarded mutexes, whenever possible
  • Not done in this course for simplicity
  • Nevertheless very important: if you remember one thing from today, then this

• Different locks exist (see cppreference.com for details):
  • std::lock_guard: basic lock for a single mutex
  • std::scoped_lock: multiple mutexes, prevents deadlocks (if used exclusively)
  • std::unique_lock: single mutex, more control (e.g. when the mutex is locked)
  • std::shared_lock: for reader-writer situations
    • Many threads only read the shared data in their critical section
    • Few threads write the shared data
    • Reading in parallel is fine, but writers need exclusive access
    • Example: shared phone book; much more often read than updated
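A minimal sketch of the reader-writer case with std::shared_mutex and std::shared_lock (C++17); the phone-book names are made up for illustration:

#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

std::shared_mutex phonebook_mx;
std::map<std::string, int> phonebook;

int lookup(const std::string& name) {
  std::shared_lock lock(phonebook_mx); // many readers may hold this in parallel
  return phonebook.at(name);
}

void update(const std::string& name, int number) {
  std::unique_lock lock(phonebook_mx); // a writer gets exclusive access
  phonebook[name] = number;
}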

Beyond Deadlocks

Beyond Deadlocks: Livelocks

Solving the Dining Philosophers problem?
• Try to grab left fork
• If successful, try to grab right fork
• If both successful → eat
• Otherwise, put down left fork (if grabbed), wait for a bit and try again
⤷ Any problems with this algorithm?

// simplified C++ code
while (true) {
  left_fork.try_lock(1ms);
  if (left_fork.is_locked()) {
    right_fork.try_lock(1ms);
    if (right_fork.is_locked())
      break; // exit loop to eat
    left_fork.unlock();
  }
  this_thread::sleep_for(1ms);
}
// let’s eat ...

⤷ Possible danger of a livelock:
• All philosophers grab left fork, don’t get right fork, drop forks, wait, repeat ...
• Overall system doesn’t halt (as in a deadlock), but no real progress is made either
• Earlier-shown resource ordering approach also prevents livelocks
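A minimal sketch of the resource-ordering fix, assuming each fork has a unique index: every philosopher locks the lower-indexed fork first, so no cyclic wait (and no drop-and-retry livelock) can occur.

#include <mutex>

// fork indices are assumed to be unique across the table
void dine(std::mutex& left, int left_id,
          std::mutex& right, int right_id) {
  std::mutex& first  = (left_id < right_id) ? left : right;
  std::mutex& second = (left_id < right_id) ? right : left;
  first.lock();  // everyone agrees on this global order
  second.lock();
  // eat, then release in reverse order
  second.unlock();
  first.unlock();
}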

Beyond Deadlocks: Starvation and Fairness


⤷ Starvation is a related problem:
• It can happen that some philosophers never get their forks (shared resources), thus they starve
• Earlier-shown resource ordering approach does not prevent starvation
• A fair scheduler (or fairness-enforcing locking approaches) can help
• Mathematically defining what fairness means is an interesting exercise

Inter-Thread Communication

• In order to synchronise their work, threads often need to exchange information, i.e. communicate

• Already seen:
  • Atomic data types
  • Mutexes

• But there are, of course, many more communication ideas & techniques
  • Most (all?) can be implemented using atomics and mutexes
  • Different techniques have (dis)advantages in different situations
  • As usual: understand the problem, then choose the best-fitting tools

Inter-Thread Communication: Condition Variables

• C++: <condition_variable> library

• Think of a condition variable as a broadcast

• Sender-receiver/producer-consumer is a typical use case:
  • One thread performs some work
  • Then informs another thread (or many others) that the work is done


• Atomic boolean vs. condition variable

Atomic boolean:

T1: // T1 does some work
    atomic_bool = true;

T2: while (!atomic_bool.load());
    // T2 can use T1’s results

+ simple, no internal overhead
– busy waiting: loop consumes CPU time

Condition variable:

T1: // T1 does some work
    cond_var.notify_all();

T2: cond_var.wait(...);
    // T2 can use T1’s results

+ waiting threads sleep, no busy waiting
+ can also notify (wake up) only one thread
– more internal overhead, in particular thread sleep/wake
– further complications (lost wake-up, spurious wake-up)
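A minimal runnable sketch of the condition-variable variant; the names are illustrative, and the predicate passed to wait guards against spurious and lost wake-ups:

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mx;
std::condition_variable cond_var;
bool done = false;
int result = 0;

void producer() {
  int r = 42; // stands for some actual work
  {
    std::lock_guard<std::mutex> guard(mx);
    result = r;
    done = true;
  }
  cond_var.notify_all();
}

void consumer() {
  std::unique_lock<std::mutex> lock(mx);
  cond_var.wait(lock, []{ return done; }); // predicate is re-checked on each wake-up
  std::cout << result;
}

int main() {
  std::thread t1(producer), t2(consumer);
  t1.join(); t2.join();
}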

Inter-Thread Communication: Barriers

• Situation:
  • Threads solve a problem in phases
  • Phase i+1 can only be started once all (or some) threads completed phase i

• Can already be handled using combinations of what we’ve seen so far (atomics, mutexes, …)
• … but barriers (C++20, Java, …) exist specifically for this use case

Pseudo-code shows an example:
• N worker threads, one supervisor thread
• Workers must all finish 1st phase before starting 2nd, etc.
• Supervisor simply prints the finished phase
• A thread that arrive_and_waits decrements the barrier counter, then sleeps
• Once the barrier counter is down to zero, all waiting threads are woken up and the barrier counter is reset to N+1

barrier end_of_phase(N + 1);

void worker(...) {
  // do work from phase 1
  end_of_phase.arrive_and_wait();
  // do work from phase 2
  ...
}

void supervisor(...) {
  int p = 1;
  while (...) {
    end_of_phase.arrive_and_wait();
    cout << "End of phase " << p++;
  }
}
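A hedged, compilable C++20 version of the pseudo-code; the number of workers and phases are made up for illustration:

#include <barrier>
#include <iostream>
#include <thread>
#include <vector>

constexpr int N = 4; // number of worker threads (assumed)

std::barrier end_of_phase(N + 1); // resets automatically after each phase

void worker(int id) {
  // do work from phase 1
  end_of_phase.arrive_and_wait();
  // do work from phase 2
  end_of_phase.arrive_and_wait();
}

void supervisor() {
  for (int p = 1; p <= 2; ++p) {
    end_of_phase.arrive_and_wait();
    std::cout << "End of phase " << p << "\n";
  }
}

int main() {
  std::vector<std::thread> threads;
  for (int i = 0; i < N; ++i) threads.emplace_back(worker, i);
  std::thread sup(supervisor);
  for (auto& t : threads) t.join();
  sup.join();
}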

Beyond Threads

Beyond Threads: Tasks

• Threads
  • Are ultimately a concept of operating systems (and CPUs)
  • Have (therefore) been “lifted” to programming languages
  • Offer flexibility, but are not necessarily an ideal solution for often-occurring tasks

• Short digression:
  • Task: Print each element in a list
  • Some solutions more directly correspond to the task description than others
  • See the Scala code below for an example

val xs = List(1,2,3,4)

// Solution 1: index-based iteration
for (i <- 0 until xs.length) print(xs(i))

// Solution 2: element-based iteration
for (x <- xs) print(x)

// Solution 3: lambda-function-based iteration
xs foreach print // Closely matches task desc.

Beyond Threads: Tasks

• Typical task in a parallel context (from the main thread’s perspective):
  1. Go off and compute foo(x) for me, while I do something else
  2. I need the result at latest at this point

• Solution with explicit threads works, but:
  • Purpose of the lambda function not obvious (from the task description)
  • Shared variable y → risk
  • Result of foo(x) is needed on first use of y; worker.join() serves as “artificial marker”
  • Imagine complicated control flow (if-elses) where y is first used at different program points
    ⤷ where to best put worker.join()?

int x = ...;
int y;
std::thread worker([&]{ y = foo(x); });
// ... main thread does its thing ...
worker.join(); // here because ...
std::cout << y; // ... this line uses y

→ Observation: complications mainly due to communicating the result

Beyond Threads: Tasks

• More elegant solution: use tasks (std::async) from the <future> library

task-based solution:

int x = ...;
auto y = std::async(foo, x);
// ...
std::cout << y.get();

thread-based solution:

int x = ...;
int y;
std::thread worker([&]{ y = foo(x); });
// ...
worker.join(); // here because ...
std::cout << y; // ... this line uses y

Beyond Threads: Tasks and Futures

More elegant solution: use tasks (std::async) from the <future> library

int x = ...;
auto y = std::async(foo, x);
// ...
std::cout << y.get();

• std::async decides whether or not foo(x) should run in a separate thread (depending on the system’s hardware, system status, complexity of foo, …)
• y is a std::future:
  • A handle to eventually available data
  • y.get() either immediately returns the task’s result, or waits until the result is available

Tasks are a more convenient abstraction over the underlying threads

Beyond Threads: Tasks, Futures, Promises

int x = ...;
auto y = std::async(foo, x);
// ...
std::cout << y.get();

• Futures (std::future):
  • Are the receiving side of a communication channel or pipe between parent (main function/thread) and child (async’ed task)
  • Parent can wait for a future to deliver a value (y.get())
  • Futures are internally protected against data races
• Promises (std::promise):
  • Are the sending side of the channel
  • Not explicitly used in the above example (since not necessary)
  • Also internally protected against races

• Promises and futures provide many functions, offer great flexibility
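Where std::async does not fit, a promise/future pair can be used directly; a minimal sketch, where the value 42 stands for some computed result:

#include <future>
#include <iostream>
#include <thread>

int main() {
  std::promise<int> p;
  std::future<int> f = p.get_future(); // receiving side

  std::thread child([&p]{
    p.set_value(42); // sending side: deliver the result
  });

  std::cout << f.get(); // waits until the value is available
  child.join();
}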



• Take-away message: prefer tasks over threads (if possible)!

• Tasks, futures and promises (or similar concepts) have been added to lots of languages in recent years: C++, Java, Python, Javascript, Scala, …

• No silver bullet: fundamental concurrency problems (race conditions, deadlocks, …) can still occur

Declarative Parallelism

Declarative Parallelism: Declarative vs. Imperative

Assume we have a sequence S of numbers; then S has a maximum.

Mathematics:
Let x ∈ S be maximal, then it holds that ∀y ∈ S · y ≤ x.

Programming:
int x = -∞;
for (int e : S)
  if (x < e) x = e;

Mathematical formulas/texts are declarative: x is declared to be the largest number in S, there’s no need to explicitly find it. Programs are imperative: the computer is instructed to find (i.e. compute) the largest number.

Declarative Parallelism: Declarative vs. Imperative

The declarative style of mathematics is very powerful

Mathematics Programming Let int x = -∞; be maximal, then it holds that for (int e : S) 𝑥𝑥 ∈ 𝑆𝑆 . if (e < x) x = e;

∀𝑦𝑦 ∈ 𝑆𝑆 ⋅ 𝑦𝑦 ≤Mathematics𝑥𝑥 Programming

1 for (k = 0; k < infinity; ++k) Let == ∞ (= ) 2 x += ... ∞ 1 𝜋𝜋 2 𝑥𝑥𝑥𝑥 ∑�𝑘𝑘=0 𝑘𝑘2 6 ??? 𝑘𝑘=0 𝑘𝑘

Declarative Parallelism: Declarative vs. Imperative

• Recall the imperative parallel-sum implementation:
  1. Compute chunk size (#data/#threads)
  2. Fork one sum(array, chunk_begin, chunk_end) worker thread per chunk
  3. Join threads and add up partial sums

• Using a declarative approach, several implementation steps can be left implicit:

#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < vec.size(); i++)
  sum += vec[i];

• The OpenMP #pragma directive (a compiler extension) declares that
  1. the loop can be parallelised (parallel)
  2. the partial sums are reduced to a single value by summation (reduction(+:sum))

• Declarative programming
  • Aims to abstract over machine-level details
  • Helps focusing on the essential steps
  • May not result in the best possible performance (because of its abstractions)
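A complete version of the snippet, as a sketch; it assumes a compiler with OpenMP support (e.g. g++ -fopenmp), and the vector contents are made up:

#include <iostream>
#include <vector>

int main() {
  std::vector<int> vec(1000, 1); // 1000 elements, all 1
  int sum = 0;

  // the loop is parallelised; each thread keeps a private partial
  // sum, and the partial sums are added up at the end
  #pragma omp parallel for reduction(+:sum)
  for (int i = 0; i < (int)vec.size(); i++)
    sum += vec[i];

  std::cout << sum; // prints 1000
}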

Lock-free Concurrency

• Threads waiting for mutexes sleep, but thread sleep/wake-up itself costs time
  • Lots of contention (many competing threads) → sleep/wake-up can be inefficient
  • Also: mutex-holding and sleeping threads slow down the whole system

// std::mutex resource_mx
resource_mx.lock();

• Lock-free concurrency
  • uses atomic operations and “try-repeat-loops”, i.e. busy waiting
  • is used by expert programmers in high-performance code, such as the Linux kernel

// std::atomic_bool resource_atm
while (!resource_atm.load());
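A small sketch of such a try-repeat loop with a standard atomic, here std::atomic_flag used as a simple spinlock (illustrative only):

#include <atomic>

std::atomic_flag spin = ATOMIC_FLAG_INIT;

void lock()   { while (spin.test_and_set()); } // busy-wait until the flag was clear
void unlock() { spin.clear(); }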

Lock-free Concurrency

Example: Linked-list-based stack
Task: Push element onto stack

void push(stack& s, llnode* n) {
  n->nxt = s.top;
  s.top = n;
}

[Figure: the new node (5) sets its nxt pointer to the current top node (3) of the list 3 → 7, then becomes the new top.]

Problem: Race conditions

Suppose two threads push concurrently (e.g. nodes 5 and 0) and both execute:

void push(stack& s, llnode* n) {
  n->nxt = s.top;
  s.top = n;
}

Both threads may read the same old top (node 3) into n->nxt before either updates s.top:

void push(stack& s, llnode* n) {
  n->nxt = /* node 3 */;
  s.top = n;
}

Whichever write to s.top happens last wins, and the other node is lost.

Other erroneous interleavings possible; many more when adding further operations, e.g. removing elements

Lock-free Concurrency


Straightforward solution: prevent data races by guarding the critical section with a mutex

void push(stack& s, llnode* n) {
  s.mutex->lock();
  n->nxt = s.top;
  s.top = n;
  s.mutex->unlock();
}

Lock-free Concurrency


Potentially better-performing solution: use an atomic compare-and-swap (CAS) operation
• Hardware atomically executes CAS
• CAS(p, v1, v2) sets p to v2 if p has value v1
• CAS returns true if the swap was made, otherwise false
• Other atomic operations exist

void push(stack& s, llnode* n) {
  bool b;
  do {
    llnode* curr_top = s.top;
    n->nxt = curr_top;
    b = CAS(s.top, curr_top, n);
  } while (!b);
}

Lock-free code is much harder to reason about, i.e. to argue that it is correct.
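In standard C++ the CAS pseudo-code corresponds to compare_exchange on a std::atomic. A hedged sketch, with the node and stack layout assumed from the slides:

#include <atomic>

struct llnode { int value; llnode* nxt; };
struct stack  { std::atomic<llnode*> top{nullptr}; };

void push(stack& s, llnode* n) {
  llnode* curr_top = s.top.load();
  do {
    n->nxt = curr_top;
    // atomically: if s.top still equals curr_top, set it to n and succeed;
    // otherwise curr_top is updated to the current s.top and we retry
  } while (!s.top.compare_exchange_weak(curr_top, n));
}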

Compilers, Hardware, C++ Memory Model

Compiler Optimisations

x = 1;
y = 2;
std::cout << "Hi";
z = x + 1;
y = x;

• Program can be changed without affecting observable behaviour:
  • x = 1 and y = 2 can be swapped
  • std::cout can be moved anywhere
  • y = 2 could even be removed
  • ...
• Reordering statements is one of many optimisations compilers apply
• Compiler optimisations are very important for performance

↪ Question: When is it OK to reorder reads and writes in concurrent programs?

CPUs and Memory Hierarchies

[Figure: memory hierarchy — CPU, L1 cache (32KB), L2 cache (32MB), main memory (32GB). Memory sizes and speeds are approximated but realistic numbers.]

• CPU reads/writes values from/to main memory, to compute with them …
• … with a hierarchy of memory caches in between
• Faster memory is more expensive, hence smaller: L1 is 5x faster than L2, which is 30x faster than main memory, which is 350x faster than disk

CPUs and Memory Hierarchies

[Figure: multi-core hierarchy — four cores, each with private L1 and L2 caches, sharing an L3 cache and main memory.]

• Multi-core CPUs have caches per core → more complicated hierarchies

CPUs and Memory Hierarchies

• Caches and main memory all need to be synchronized (eventually):
  • synchronising (too) often is inefficient
  • but out-of-sync memory may cause inconsistencies
• In particular: memory writes by one core should be made visible to other cores
• The fun doesn’t stop here: CPUs themselves also reorder operations to improve performance

↪ Question: Which guarantees do developers of parallel programs actually get?

C++ Memory Model

• The C++ memory model acts as the formal contract between programmers and compiler developers
• It defines the semantics (effects) a program’s memory reads and writes have
• It defines three levels of semantics; lower levels weaken programmers’ guarantees but allow more optimisations:
  1. sequential consistency
  2. acquire-release semantics
  3. relaxed semantics
• Sequential consistency is the default mode, and all that we have used in this course

(Visualisation of the three levels from Rainer Grimm’s book “Concurrency with Modern C++”)
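A small sketch of explicitly choosing a weaker level via std::memory_order — a plain event counter, illustrative only:

#include <atomic>

std::atomic<int> counter{0};

void count_event() {
  // counter.fetch_add(1); would use the default: sequential consistency.
  // Relaxed ordering suffices for a counter whose value is only read
  // after all threads have been joined:
  counter.fetch_add(1, std::memory_order_relaxed);
}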

Your Exercise


• Finish the other two exercises, get your OKs from us and submit on Code Expert
• Evaluate this Engineering Tool (especially since it is brand-new)
• Come back for Engineering Tool 5 (not by me, though)

Thank you for your participation!
