Lock-Free Concurrency • Compilers and Hardware, and the C++ Memory Model

Engineering Tool 4 (D-MAVT, AS19), Day 3
Malte Schwerhoff
http://lec.inf.ethz.ch/mavt/etIV/2019/

Yesterday’s ECTS Project

Function void select_random_students(...):

Master solution:

    // N is number of students
    idx1 = random_uint(0, N);
    idx2 = random_uint(1, N);
    idx2 = (idx1 + idx2) % N;

Student solution:

    // N is number of students
    idx1 = random_uint(0, N);
    do {
      idx2 = random_uint(0, N);
    } while (idx2 == idx1);

Fine-grained vs. coarse-grained locking

Fine-grained:

    class Student { … std::mutex mutex; … }

    class Lecturer {
      …
      void transfer(Student* s1, Student* s2, …) {
        // let s1 < s2
        s1->mutex.lock(); s2->mutex.lock();
        // critical section: transfer credits
        s1->mutex.unlock(); s2->mutex.unlock();
      }
      …
    }

    class Admin {
      …
      void work() {
        for (auto student : students)
          student->mutex.lock();
        …
      }
    }

Coarse-grained:

    std::mutex global_mx = …;

    class Lecturer {
      …
      void transfer(Student* s1, Student* s2, …) {
        global_mx.lock();
        // critical section: transfer credits
        global_mx.unlock();
      }
      …
    }

    class Admin {
      …
      void work() {
        global_mx.lock();
        …
      }
    }

Mutual exclusion always (and intentionally) reduces the degree of parallelism in a program. Guidelines to keep in mind:
• Be careful not to unnecessarily reduce the degree of parallelism
• Make the critical section as small as possible
But remember: always benchmark your program to check the effective performance gain.

Today’s Agenda

The purpose of today’s lecture is to briefly go over a few other concepts and techniques from the area of parallel programming:
• Locks: the better mutexes
• Beyond deadlocks: livelocks, starvation, fairness
• Inter-thread communication: condition variables, barriers
• Beyond threads: tasks, futures, promises
• Declarative parallelism: the OpenMP paradigm
• Lock-free concurrency
• Compilers and hardware, and the C++ memory model

Locks

Beyond Deadlocks: Dangers of Mutexes

What could go wrong with this program?

    void foo(...) {
      some_mutex.lock();
      if (...) {
        ...
        return;
      }
      some_mutex.unlock();
    }

Here the early return leaves some_mutex locked.

What could go wrong with this program?

    void foo(...) {
      some_mutex.lock();
      some_object.some_function();
      some_mutex.unlock();
    }

• some_function might not terminate …
• … or raise an exception (a “controlled” error)
↪ some_mutex won’t be unlocked and the whole system could deadlock
• Non-termination cannot be prevented systematically (in practice)
↪ Apply the following guidelines (if possible):
• Don’t call unknown functions when holding a mutex
• In particular, do not call virtual member functions!

Beyond Deadlocks: Mutexes and RAII

    void foo(...) {
      some_mutex.lock();
      if (...) return;
      some_mutex.unlock();
    }

    void foo(...) {
      some_mutex.lock();
      fun(); // might throw exception
      some_mutex.unlock();
    }

These situations can be handled systematically by using the RAII idiom (recall ET2). Core idea:
• Recall that stack-allocated objects (those not instantiated with new) are destroyed when they go out of scope, e.g. at the end of a function:

    int& foo(...) {
      int x = ...;
      return x; // x goes out of scope; returns reference to dead object
    }

• The destructor is automatically called when objects are destroyed
↪ Wrap the mutex in a stack-allocated guard object
• Guard’s constructor locks the mutex
• Guard’s destructor unlocks it

Beyond Deadlocks: Guarding Mutexes

The two functions from the previous slide, rewritten with a guard:

    void foo(...) {
      std::lock_guard<std::mutex> guard(some_mutex);
      if (...) return;
    }

    void foo(...) {
      std::lock_guard<std::mutex> guard(some_mutex);
      fun(); // might throw exception
    }

The guard automatically locks the mutex and, more importantly, also unlocks it.

• Guideline: use locks, i.e. guarded mutexes, whenever possible
  • Not done in this course for simplicity
  • Nevertheless very important: if you remember one thing from today, then this
• Different locks exist (see cppreference.com for details):
  • std::lock_guard: basic lock for a single mutex
  • std::scoped_lock: multiple mutexes, prevents deadlocks (if used exclusively)
  • std::unique_lock: single mutex, more control (e.g. when the mutex is locked)
  • std::shared_lock: for reader-writer situations
    • Many threads only read the shared data in their critical section
    • Few threads write the shared data
    • Reading in parallel is fine, but writers need exclusive access
    • Example: shared phone book; much more often read than updated

Beyond Deadlocks

Beyond Deadlocks: Livelocks

Solving the Dining Philosophers problem?
• Try to grab left fork
• If successful, try to grab right fork
• If both successful, eat
• Otherwise, put down left fork (if grabbed), wait for a bit and try again

    // simplified C++ code: try_lock with a timeout and is_locked()
    // are not members of std::mutex
    while (true) {
      left_fork.try_lock(1ms);
      if (left_fork.is_locked()) {
        right_fork.try_lock(1ms);
        if (right_fork.is_locked()) break; // exit loop to eat
        left_fork.unlock();
      }
      this_thread::sleep_for(1ms);
    }
    // let's eat ...

Any problems with this algorithm?
• Possible danger of a livelock:
  • All philosophers grab left fork, don’t get right fork, drop forks, wait, repeat ...
  • Overall system doesn’t halt (as in a deadlock), but no real progress is made either
• Earlier-shown resource ordering approach also prevents livelocks

Beyond Deadlocks: Starvation and Fairness

Starvation is a related problem of the same algorithm:
• It can happen that some philosophers never get their forks (shared resources), thus they starve
• Earlier-shown resource ordering approach does not prevent starvation
• A fair scheduler (or fairness-enforcing locking approaches) can help
• Mathematically defining what fairness means is an interesting exercise

Inter-Thread Communication

• In order to synchronise their work, threads often need to exchange information, i.e. communicate
• Already seen:
  • Atomic data types
  • Mutexes
• But there are, of course, many more communication ideas & techniques
  • Most (all?) can be implemented using atomics and mutexes
  • Different techniques have (dis)advantages in different situations
  • As usual: understand the problem, then choose the best-fitting tools

Inter-Thread Communication: Condition Variables

• C++: library <condition_variable>
• Think of a condition variable as a broadcast
• Sender-receiver/producer-consumer are typical use cases
  • One thread performs some work
  • Then informs another thread (or many others) that the work is done

Atomic boolean vs. condition variable

Atomic boolean:

    T1: // T1 does some work
        atomic_bool = true;
    T2: while (!atomic_bool.load());
        // T2 can use T1's results

+ simple, no internal overhead
- busy waiting: loop consumes CPU time

Condition variable:

    T1: // T1 does some work
        cond_var.notify_all();
    T2: cond_var.wait(); // simplified: the real wait() takes a std::unique_lock
        // T2 can use T1's results

+ waiting threads sleep, no busy waiting
+ can also notify (wake up) only one thread
- more internal overhead, in particular thread sleep/wake
- further complications (lost wake-up, spurious wake-up)

Inter-Thread Communication: Barriers

• Situation:
  • Threads solve a problem in phases
  • Phase i+1 can only be started once all (or some) threads completed phase i
• Can already be handled using combinations of what we’ve seen so far (atomics, mutexes, …) …
• … but barriers (C++20, Java, …) exist specifically for this use case

[Figure omitted: time axis; gears icon by icons8.com]

Pseudo-code shows an example:
• N worker threads, one supervisor thread
• Workers must all finish the 1st phase before starting the 2nd, etc.
• Supervisor simply prints the finished phase

    barrier end_of_phase(N + 1);

    void worker(...) {
      // do work from phase 1
      end_of_phase.arrive_and_wait();
      // do work from phase 2
      ...
    }

    void supervisor(...) {
      int p = 1;
      while (...) {
        end_of_phase.arrive_and_wait();
        cout << "End of phase " << p++;
      }
    }
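
The barrier pseudo-code can be approximated even without C++20 support. What follows is a minimal sketch, not the lecture's implementation: PhaseBarrier is a hypothetical stand-in for a barrier, built from a mutex and a condition variable, in line with the remark that barriers can be assembled from the primitives seen so far. The names run_phases, finished and seen are illustrative, and the "work" of each phase is only a counter increment.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a barrier, built from a mutex and a condition variable.
class PhaseBarrier {
public:
    explicit PhaseBarrier(std::size_t participants)
        : participants_(participants), waiting_(0), generation_(0) {
        assert(participants > 0);
    }

    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t my_generation = generation_;
        if (++waiting_ == participants_) {
            // Last arrival: reset the counter, open the next phase, wake everyone.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            // The predicate protects against spurious wake-ups.
            cv_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t participants_;
    std::size_t waiting_;
    std::size_t generation_;
};

// Runs `workers` worker threads plus one supervisor through `phases` phases.
// Returns, per phase, how many workers had finished that phase when the
// supervisor passed the barrier.
std::vector<int> run_phases(int workers, int phases) {
    PhaseBarrier end_of_phase(static_cast<std::size_t>(workers) + 1);
    std::mutex count_mutex;
    std::vector<int> finished(phases, 0);  // finished[p]: workers done with phase p
    std::vector<int> seen(phases, 0);      // what the supervisor observed

    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([&] {
            for (int p = 0; p < phases; ++p) {
                {
                    std::lock_guard<std::mutex> guard(count_mutex);
                    ++finished[p];  // placeholder for the work of phase p
                }
                end_of_phase.arrive_and_wait();
            }
        });
    }
    std::thread supervisor([&] {
        for (int p = 0; p < phases; ++p) {
            end_of_phase.arrive_and_wait();
            std::lock_guard<std::mutex> guard(count_mutex);
            seen[p] = finished[p];
        }
    });

    for (auto& t : threads) t.join();
    supervisor.join();
    return seen;
}
```

Calling run_phases(4, 3) should report 4 finished workers for each of the 3 phases, because nobody passes the barrier before all N + 1 participants have arrived. Where C++20 is available, std::barrier from <barrier> provides arrive_and_wait() directly and PhaseBarrier can be dropped.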

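The guideline on guarded mutexes can be checked with a small experiment. This is a minimal sketch under stated assumptions: some_mutex, foo and fun follow the names on the slides, but the bool parameters, the helper mutex_released_after_foo and demo are hypothetical additions that make the early return and the exception triggerable.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex some_mutex;  // name taken from the slides

// Hypothetical stand-in for fun() from the slides: throws when asked to.
void fun(bool fail) {
    if (fail) throw std::runtime_error("fun failed");
}

// The guard locks in its constructor and unlocks in its destructor, so every
// exit path (early return, exception, normal end) releases the mutex.
void foo(bool early_return, bool fail) {
    std::lock_guard<std::mutex> guard(some_mutex);
    if (early_return) return;
    fun(fail);  // might throw exception
}

// Illustrative helper: calls foo, then reports whether some_mutex is free again.
bool mutex_released_after_foo(bool early_return, bool fail) {
    try {
        foo(early_return, fail);
    } catch (const std::runtime_error&) {
        // The exception has left foo; the guard's destructor has already run.
    }
    // Note: the standard permits try_lock to fail spuriously, so this is a
    // sanity check rather than a proof.
    if (!some_mutex.try_lock()) return false;
    some_mutex.unlock();
    return true;
}

// Usage: all three exit paths are expected to leave the mutex unlocked.
void demo() {
    assert(mutex_released_after_foo(false, false));  // normal end of foo
    assert(mutex_released_after_foo(true, false));   // early return
    assert(mutex_released_after_foo(false, true));   // exception
}
```

Replacing the guard with a manual some_mutex.lock() and some_mutex.unlock() pair would leave the mutex locked in the early-return and exception cases, which is the failure mode the slides warn about.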