Software Transactional Memory

Jaakko Järvi

University of Bergen

<2018-10-02 Tue>

Outline

1 Examples


Examples Early Rollback is Important

Assume invariant x == 0 && y == 0, and two concurrently executing threads:

atomic { if (x != y) crash(); }

atomic { ++x; ++y; }

Both transactions maintain invariant, yet without early fail can “crash”

Examples Deposit with synchronization (Java)

public void deposit(double amount) {
  System.out.println("Depositing " + amount);
  double nb = 0;
  balanceChangeLock.lock();
  try {
    nb = balance + amount;
    balance = nb;
  } finally {
    balanceChangeLock.unlock();
  }
  System.out.println("New balance is " + nb);
}

Examples Deposit with STM (hypothetical Java)

public void deposit(double amount) {
  System.out.println("Depositing " + amount);
  double nb = 0;

  atomic {
    nb = balance + amount;
    balance = nb;
  }

  System.out.println("New balance is " + nb);
}


Examples Composing critical sections (lock synchronization)

class Bank {
  Accounts accounts;
  ...
  void transfer(String name1, String name2, int amount) {
    synchronized(accounts) {
      try {
        accounts.put(name1, accounts.get(name1) - amount);
        accounts.put(name2, accounts.get(name2) + amount);
      }
      catch (Exception1) {..}
      catch (Exception2) {..}
    }
  }
  ...
}

Lock all accounts
Manually decide what needs to be undone after each kind of exception
Side-effects might be visible in other threads before being undone

Examples Composing critical sections (STM)

class Bank {
  Accounts accounts;
  ...
  void transfer(String name1, String name2, int amount) {
    try {
      atomic {
        accounts.put(name1, accounts.get(name1) - amount);
        accounts.put(name2, accounts.get(name2) + amount);
      }
    }
    catch (Exception1) {..}
    catch (Exception2) {..}
  }
  ...
}

Examples Motivating example: HashMap (thread-unsafe)

public Object get(Object key) {
  int idx = hash(key);            // Compute hash
  HashEntry e = buckets[idx];     // to find bucket
  while (e != null) {             // Find element in bucket
    if (key.equals(e.key))
      return e.value;
    e = e.next;
  }
  return null;
}


Examples HashMap (thread-safe via lock synchronization)

public Object get(Object key) {
  synchronized(mutex) {   // mutex guards all accesses to map m
    return m.get(key);
  }
}

Simple solution: add a synchronization layer
Poor scalability, entire map locked at once.


Examples HashMap (thread-safe via STM)

public Object get(Object key) {
  atomic {   // System guarantees atomicity
    return m.get(key);
  }
}

Equally simple
Good scalability, only the impacted parts of HashMap locked (briefly)

Examples HashMap (scalable thread-safe with fine-grained locking)

public Object get(Object key) {
  int hash = hash(key);                  // Try first without locking...
  Entry[] tab = table;
  int index = hash & (tab.length - 1);
  Entry first = tab[index];
  Entry e;
  for (e = first; e != null; e = e.next) {
    if (e.hash == hash && eq(key, e.key)) {
      Object value = e.value;
      if (value != null) return value;
      else break;
    }
  }

  // Recheck under synch if key not there or interference
  Segment seg = segments[hash & SEGMENT_MASK];
  synchronized(seg) {
    tab = table;
    index = hash & (tab.length - 1);
    Entry newFirst = tab[index];
    if (e != null || first != newFirst) {
      for (e = newFirst; e != null; e = e.next) {
        if (e.hash == hash && eq(key, e.key))
          return e.value;
      }
    }
    return null;
  }
}

Concurrent Programming in Standard C++

Jaakko Järvi

University of Bergen

<2018-10-02 Tue>

Outline

1 C++ Standardization and concurrency

2 C++ memory model

3 Standard library offerings for concurrency

4 Threads

5 Synchronizing threads with mutexes

6 Condition variables

7 Thread local variables, call once functions

8 Atomics

9 Tasks

C++ Standardization and concurrency C++11, C++14, C++17, C++2a, . . . and concurrency

C++03 had no support for concurrency
  All concurrent programs relied on OS-specific services that could change from version to version
C++11 made two significant additions:
  a memory model for C++
  the beginnings of the standard API for concurrency-related functionality: threads, locks, futures, etc.
C++14 adds some tweaks to the API
C++17 adds parallel STL algorithms
C++2a will add more features
  composable futures? transactional memory? latches, barriers? atomic smart pointers?

C++ memory model Outline

1 C++ Standardization and concurrency

2 C++ memory model

3 Standard library offerings for concurrency

4 Threads

5 Synchronizing threads with mutexes

6 Condition variables

7 Thread local variables, call once functions

8 Atomics

9 Tasks


C++ memory model Memory model

Memory model defines the semantics of shared variables

For the programmer: a set of guarantees about the order in which memory reads and writes are observed by a thread; conversely, a set of obligations that a programmer has to adhere to when writing concurrent code
For the compiler and hardware: a set of rules that defines valid code transformations

C++ memory model Multi-processor system

[Figure: threads 1 . . . n issuing reads (r) and writes (w) to a shared memory]

Multiple threads can update and access the same shared variables
  global variables, “static” class members, all data accessible to those variables or passed to the thread by other means
Each thread also has its own local variables

C++ memory model Consistency in multi-processor systems

Lamport: How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, 1979: definitions for what consistency means in a multi-processor system; what kind of leeway can be given to compilers and hardware

Sequential processor The result of the execution is the same as if the operations had been executed in the order specified by the program

Sequential consistency The result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.

C++ memory model Sequential consistency

Any execution that does something like in the lower picture must be indistinguishable from some execution like the upper picture

C++ memory model Consistency in multi-processor systems

That each processor is sequential does not guarantee sequential consistency

Additional requirement 1 Each processor issues memory requests in the order specified by its program

Additional requirement 2 Memory requests from all processors issued to an individual memory module are serviced from a single FIFO queue. Issuing a memory request consists of entering the request on this queue.

C++ memory model Programming assuming sequential consistency

Assume shared variables x and y both have value 0

// Thread 1        // Thread 2
x= 1;              y= 1;
r1= y;             r2= x;

Some interleavings

x= 1;  // 1            y= 1;  // 2            x= 1;  // 1
r1= y; // 1            r2= x; // 2            y= 1;  // 2
y= 1;  // 2            x= 1;  // 1            r1= y; // 1
r2= x; // 2            r1= y; // 1            r2= x; // 2
// r1 == 0, r2 == 1    // r1 == 1, r2 == 0    // r1 == 1, r2 == 1

Execute as if at each step one thread is selected and its next statement is executed; repeat until all threads are done
Under sequential consistency, it is not possible that both r1 and r2 are 0 at the end of the execution


C++ memory model Another example

pos= new Point(pos.x+1, pos.y+1);

allocate memory for the new object
initialize x
initialize y
assign to x
assign to y
assign to pos


C++ memory model Relaxed memory

Sequential consistency would be a nice programming model, but . . .

None of today’s architectures are sequentially consistent
  This is because of performance optimizations in hardware: reordering of instructions, speculative execution, buffering writes
  This is also because of common compiler optimizations: common subexpression elimination, eliminating redundant reads, loop optimizations, . . .
These hardware or compiler optimizations, not observable in single-threaded code, can become observable in multi-threaded code

C++ memory model An example not assuming sequential consistency

Assume shared variables x and y, both 0

// Thread 1        // Thread 2
x= 1;              y= 1;
r1= y;             r2= x;

Processors typically do not wait for the first assignment to complete before executing the second: value 1 is stored in a buffer, waiting to be written to memory, and is not visible in the other thread immediately ⇒ the result r1 == 0 and r2 == 0 is possible
Compilers can rearrange code: both assignments in each thread are independent, so the compiler is free to move r1 = y and r2 = x up ⇒ the result r1 == 0 and r2 == 0 is possible
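A sketch of this store-buffering test written with std::atomic and relaxed ordering, so the reorderings described above remain permitted; whether the r1 == r2 == 0 outcome is actually observed depends on the hardware and compiler (the loop bound and names are illustrative):

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1, r2;

int main() {
  for (int i = 0; i < 100000; ++i) {
    x = 0; y = 0;
    // relaxed operations allow the hardware/compiler reorderings discussed above
    std::thread t1([] { x.store(1, std::memory_order_relaxed);
                        r1 = y.load(std::memory_order_relaxed); });
    std::thread t2([] { y.store(1, std::memory_order_relaxed);
                        r2 = x.load(std::memory_order_relaxed); });
    t1.join(); t2.join();
    if (r1 == 0 && r2 == 0) { std::printf("r1 == r2 == 0 in run %d\n", i); return 0; }
  }
}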

C++ memory model Another example of compiler effects

Assume sequential consistency, and ready == 0. Seemingly this program would then print nothing, or print 3.

// Thread 1                        // Thread 2
data= 1; x= data+2; ready= 1;      if (ready == 1) cout << data+2;

Common subexpression elimination will likely rewrite cout << data + 2 to cout << x, and thus the program could print 2

// Thread 1                        // Thread 2
data= 1; x= data+2; ready= 1;      if (ready == 1) cout << x;

And of course, the SC assumption is unrealistic, so the assignments in thread 1 could be reordered

C++ memory model Another example

pos= new Point(pos.x+1, pos.y+1);

allocate memory for the new object
initialize x
initialize y
assign to x
assign to y
assign to pos

C++ memory model Another example

pos= new Point(pos.x+1, pos.y+1);

[Figure: the same six steps, with the initializations of x and y shown reordered relative to the assignment to pos]


C++ memory model Data races

The above problems arise because
1 there are two threads that might access the same data simultaneously in a conflicting way (there is a data race), and
2 either the compiler or processor rewrites/rearranges code (or rearranges when memory reads and writes are visible)

Data race A program allows a data race if there is a sequentially consistent execution, in which two conflicting operations can be executed simultaneously


C++ memory model Definitions

Conflicting memory operations Two memory operations conflict if they access the same memory location, and at least one of them is a write

Simultaneous operations Two operations are simultaneous if they are from different threads and they are adjacent in the interleaving.

If two memory operations from different threads occur adjacently in the interleaving, they could have occurred in the opposite order too; or simultaneously if true concurrency exists.



C++ memory model C++ memory model guarantee

If a program has no data races, it is sequentially consistent

The memory model guarantees that updates to different memory locations do not interfere with each other, and do not need to be synchronized
Two different bitfields in the same contiguous sequence of bitfields are considered the same memory location
In C++98, the following could leave x and y with either 0 or 1: c and b could be allocated into the same word. Disallowed in C++11

// thread 1:        // thread 2:
char c;             char b;
c= 1;               b= 1;
int x= c;           int y= b;


C++ memory model C++ memory model guarantee, the flip side

If a program has a data race, its behavior is undefined

The C++ memory model is called “Sequentially consistent for data race-free programs” (SC-DRF)
The basic approach is similar to Java’s memory model, except that Java’s semantics are more complex (even when races are present, memory safety must be preserved)

C++ memory model Relaxed memory model is not a huge (extra) burden to the programmer

In addition to sequential consistency, the above examples rely on atomicity of some operations: x = 1 etc. was assumed to happen atomically
Depending on the type of x, this may not be true on all platforms: a thread could observe states where only “half” of x has been written
Certainly one cannot expect atomicity from updates to arbitrary variables of user-defined types
⇒ Sequential consistency alone is not enough; the programmer should anyway use synchronization to ensure that there are no data races.

C++ memory model Simple example

Consider this code: c++, where c is a shared variable
Most compilers would turn it into tmp = c; ++tmp; c = tmp; or into a single instruction, which the processor might not execute atomically
With threads 1 and 2 both executing c++, we might get:

// Thread 1                // Thread 2
tmp1= c;  // reads n       tmp2= c;  // reads n
++tmp1;                    ++tmp2;
c= tmp1;  // writes n + 1  c= tmp2;  // writes n + 1

Hence, even with sequential consistency, typically must synchronize

l.lock(); c++; l.unlock();

Programs with data races almost never produce consistently correct results across various hardware and compiler platforms.

C++ memory model Intermediate summary

The C++ memory model was designed so that programmers should not have to think about the C++ memory model
This works as long as all accesses to shared variables are ordered with locks, atomics, etc.
happens-before and synchronizes-with relations (to be discussed)
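A small sketch of what “ordered with locks” buys: once the shared variables below are only touched while holding the mutex, the reader is guaranteed to see the writer’s values without reasoning about the memory model directly (names are illustrative, not from the slides):

#include <iostream>
#include <mutex>
#include <thread>

int data = 0;
bool ready = false;
std::mutex m;

void producer() {
  std::lock_guard<std::mutex> lock(m);
  data = 42;                       // written before the unlock
  ready = true;
}

void consumer() {
  while (true) {
    std::lock_guard<std::mutex> lock(m);
    if (ready) {                   // this lock synchronizes-with the producer's unlock,
      std::cout << data << '\n';   // so data == 42 is guaranteed to be visible here
      return;
    }
  }
}

int main() {
  std::thread t1(producer), t2(consumer);
  t1.join(); t2.join();
}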


C++ memory model Reasoning is otherwise too difficult (and unreliable)

Assume x == 0, y == 0

// Thread 1:              // Thread 2:
r1= x;                    r2= y;
if (r1 == 42) y= r1;      if (r2 == 42) x= 42;
                          else x= 42;

r1 == r2 == 42 possible.

// Thread 1:              // Thread 2:
r1= x;                    r2= y;
if (r1 == 42) y= r1;      x= 42;

Standard library offerings for concurrency C++11 concurrency toolbox

Threads
Mutexes, locks
Condition variables
Atomic variables
Futures and promises
async() function
Abandoning processes
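Futures, promises and async() appear in this toolbox but are not revisited in the sections below; a minimal sketch of that part of the toolbox (illustrative only, not from the slides):

#include <future>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
  std::vector<int> v(1000, 1);
  // run the summation asynchronously; the future carries the result (or an exception)
  auto fut = std::async(std::launch::async, [&v] {
    return std::accumulate(v.begin(), v.end(), 0);
  });
  std::cout << "sum = " << fut.get() << '\n';   // get() waits for the task to finish
}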

Threads Outline

1 C++ Standardization and concurrency

2 C++ memory model

3 Standard library offerings for concurrency

4 Threads

5 Synchronizing threads with mutexes

6 Condition variables

7 Thread local variables, call once functions

8 Atomics

9 Tasks

Threads Threads

Instances of the thread class represent threads of execution
The code to be executed is given as a function object and its arguments
The only communication between threads is via shared variables
  the return value of the function is ignored
  (unhandled) exceptions from the function lead to program termination!
Threads cannot be terminated from outside
  more precisely, not portably without terminating the whole program
A thread is typically bound to an OS thread
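Because a thread cannot be terminated from the outside, cooperative cancellation through a shared flag is the usual workaround; a minimal sketch (the flag stop_requested is an illustrative name, not a standard facility):

#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic<bool> stop_requested{false};

void worker() {
  while (!stop_requested.load()) {
    // do a bounded chunk of work, then re-check the flag
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  std::cout << "worker exiting cleanly\n";
}

int main() {
  std::thread t(worker);
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  stop_requested = true;   // ask the thread to finish; we cannot force it
  t.join();
}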

Threads Constructing a thread object

Constructor takes a function object (such as a lambda) and arguments

void print(string s1, string s2) { cout << s1 << s2; }

std::thread t1([]() { cout << "Hello"; }); std::thread t2(print,",", "World!");

A default constructed thread is not bound to a thread of execution
Threads are not copyable, but they are movable

std::thread t1([](){});
std::thread t2(std::move(t1)); // now t1 not joinable

Threads Joinability of a thread

A thread may be joinable or not joinable
A joinable thread is potentially executing (running, waiting, currently not scheduled, finished)
  constructed with a function object
  acquired a thread of execution from another thread object via a move or a swap
A not joinable thread is not executing
  not yet given a function to execute
  already joined
  detached
  moved from
Only not joinable threads can be safely destroyed
  std::terminate() called otherwise
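A small illustration of these rules (a sketch, not from the slides): joinability follows the thread of execution through moves, and only objects that are no longer joinable may be destroyed.

#include <iostream>
#include <thread>

int main() {
  std::thread t([] { /* work */ });
  std::cout << std::boolalpha << t.joinable() << '\n';       // true: bound to a thread of execution

  std::thread u = std::move(t);                              // u takes over; t is no longer joinable
  std::cout << t.joinable() << ' ' << u.joinable() << '\n';  // false true

  u.join();                                                  // after join, u is not joinable either
  std::cout << u.joinable() << '\n';                         // false
}   // safe: neither t nor u is joinable when destroyed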


Threads Example, launching and joining threads

#include <iostream>
#include <thread>

void print(string s1, string s2) { cout << s1 << s2; }

int main() {

std::thread t1([]() { cout << "Hello"; }); std::thread t2(print,",", "World!");

t1.join(); t2.join(); return 0; }

Output: , HelloWorld!

Threads Same example in alang

#include "alang.hpp"

void print(string s1, string s2) { cout << s1 << s2; }

int main() {
  processes ps;
  ps += []() { cout << "Hello"; };
  ps += []() { print(", ", "World!"); };
  return 0;
}

Output: , HelloWorld!
Threads A helper function

void pause_thread_s(int n) { std::this_thread::sleep_for(std::chrono::seconds(n)); }
void pause_thread_ms(int n) { std::this_thread::sleep_for(std::chrono::milliseconds(n)); }

this_thread namespace has get_id — get thread’s id yield — hint to the scheduler to reschedule sleep_for, sleep_until
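A small sketch using these this_thread facilities (output interleaving may vary between runs; not from the slides):

#include <chrono>
#include <iostream>
#include <thread>

int main() {
  using namespace std::chrono;
  std::thread t([] {
    std::cout << "worker id " << std::this_thread::get_id() << '\n';
    std::this_thread::yield();                                         // hint: reschedule me
    std::this_thread::sleep_until(steady_clock::now() + milliseconds(10));
  });
  std::cout << "main id " << std::this_thread::get_id() << '\n';
  t.join();
}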


Threads Example, detaching threads

detach() lets a thread “loose”; the thread object becomes not joinable

void print(string s1, string s2) { cout << s1 << s2; }

int main() { std::thread([]() { pause_thread_s(2); std::cout << "Hello"; }).detach();

std::thread(print,",", "World!").detach();

pause_thread_s(1); }

Output: , World!

Threads A larger thread example

int acc;
void acc_square(int x) { acc += x * x; }

int main() {
  map<int, int> m;

  for (int j = 0; j < 1000; ++j) {
    acc = 0;
    vector<thread> ts;

    for (int i = 1; i <= 10; i++) ts.push_back(thread(acc_square, i));
    for (auto& t : ts) t.join();

    if (m.count(acc) == 0) m[acc] = 1; else m[acc]++;
  }

  for (auto kv : m) cout << "acc=" << kv.first << ": " << kv.second << " times\n";
}

Output:
acc=336: 1 times
acc=369: 1 times
acc=375: 1 times
acc=376: 1 times
acc=381: 3 times
acc=385: 993 times
Threads Spawning and joining threads and memory model

All memory operations in the thread that spawns a thread are visible in the spawned thread All memory operations in a thread are visible after joining in the thread that joins it
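A minimal sketch of these two guarantees (the assertions are illustrative, not from the slides):

#include <cassert>
#include <thread>
#include <vector>

int main() {
  std::vector<int> v{1, 2, 3};      // written before the thread is spawned

  std::thread t([&v] {
    assert(v.size() == 3);          // visible without extra synchronization:
    v.push_back(4);                 // spawning happens-before the thread body
  });

  t.join();                         // joining makes the thread's writes visible here
  assert(v.size() == 4);
}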

Threads thread constructor’s parameters

The thread constructor defined as

template <class Fn, class... Args> explicit thread(Fn&& fn, Args&&... args);

The arguments are, however, copied when passed to fn
Any exceptions thrown while copying are thrown within the “parent” thread that is spawning the new thread
To get move or reference semantics, the move, ref, or cref wrappers can be used
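For the move case, a sketch passing a move-only object into the thread function (assumes C++14 for make_unique; not from the slides):

#include <iostream>
#include <memory>
#include <thread>
#include <utility>

void consume(std::unique_ptr<int> p) { std::cout << *p << '\n'; }

int main() {
  auto p = std::make_unique<int>(42);
  // unique_ptr is move-only, so it must be handed over with std::move;
  // the thread stores its own copy-constructed/moved arguments and invokes consume with them
  std::thread t(consume, std::move(p));
  t.join();
}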


Threads Sharing via thread constructor parameters

now sharing is via a reference parameter passed to the thread constructor

void psum(int& acc, const vector<int>& data, int i, int count) {
  acc = accumulate(data.begin() + i, data.begin() + i + count, 0);
}

int main() {
  vector<int> data(100000, 1);
  vector<int> accs(10);

  vector<thread> ts;
  for (int i = 0; i < 10; i++) {
    ts.push_back(thread(psum, std::ref(accs[i]), std::cref(data), i * 10000, 10000));
  }
  for (auto& t : ts) t.join();
  cout << accumulate(accs.begin(), accs.end(), 0);
}

Output: 100000

Threads About threads and exceptions

We know that unhandled exceptions on a thread terminate a program What if there is no sensible way to handle an exception, but terminating is not acceptable either? How to transfer an exception to another thread?

Exception object’s lifetime is an issue Assume thread A constructs an exception object and somehow passes thread B a reference to the exception object By the time thread B catches and handles the exception, thread A may already be finished Lifetime can of course be managed if the type of the exception is known, but a generic mechanism is very useful


Threads About threads and exceptions: Example

vector<exception_ptr> exceptions;
mutex mut;

void worker() {
  try { vector<int>().at(0); }   // something that throws
  catch (...) {
    lock_guard<mutex> lock(mut);
    exceptions.push_back(std::current_exception());
  }
}

int main() {
  thread t(worker);
  t.join();
  for (auto& eptr : exceptions) {
    try {
      if (eptr != nullptr) std::rethrow_exception(eptr);
    } catch (const std::exception& e) {
      std::cout << "exception: " << e.what() << std::endl;
    }
  }
}

Propagating exceptions from threads must be done manually
current_exception obtains a shared pointer exception_ptr to the caught exception; the pointer can be passed around by copy and the exception stays alive; rethrow_exception(eptr) rethrows the exception pointed to by eptr.

Output: exception: vector
Synchronizing threads with mutexes Outline

1 C++ Standardization and concurrency

2 C++ memory model

3 Standard library offerings for concurrency

4 Threads

5 Synchronizing threads with mutexes

6 Condition variables

7 Thread local variables, call once functions

8 Atomics

9 Tasks

Synchronizing threads with mutexes Protecting shared data

C++ offers a handful of ways to protect access to shared data Mutexes Locks (wrappers over mutexes) Atomics


Synchronizing threads with mutexes Mutex

MUTual EXclusion object
A thread acquires ownership of a mutex by locking and releases it by unlocking
Ownership is exclusive; no two threads can simultaneously own the same mutex
member functions:
  lock() — block until mutex available, then acquire ownership (lock) and continue
  try_lock() — lock if available, return false if not
  unlock() — release ownership; UB if not locked by the current thread
A thread must not own the mutex prior to calling lock or try_lock unless the mutex is a recursive_mutex
Destroying a locked mutex leads to undefined behavior

Locking and data races:
  all lock and unlock operations on the same mutex are totally ordered
  prior unlock on the same mutex synchronizes-with a lock
  locking introduces memory fences


Synchronizing threads with mutexes Mutex example

int acc = 0;
mutex acc_mutex;

void acc_square(int x) {
  int tmp = x * x;
  acc_mutex.lock();
  acc += tmp;
  acc_mutex.unlock();
}

int main() {
  map<int, int> m;

  for (int j = 0; j < 1000; ++j) {
    acc = 0;
    vector<thread> ts;

    for (int i = 1; i <= 10; i++) ts.push_back(thread(acc_square, i));
    for (auto& t : ts) t.join();

    if (m.count(acc) == 0) m[acc] = 1; else m[acc]++;
  }

  for (auto kv : m) cout << "acc=" << kv.first << ": " << kv.second << " times\n";
}

Output: acc = 385 occurred 1000 times
Synchronizing threads with mutexes Kinds of mutexes

recursive_mutex
  can be locked repeatedly by the same thread; released when unlocked equally many times
timed_mutex
  only wait for a locked mutex for a certain period of time, then give up
shared_timed_mutex, shared_recursive_mutex
  two kinds of access, shared and exclusive
  useful when simultaneous read access can be given to many, but writing must be exclusive


Synchronizing threads with mutexes try_lock, recursive_mutex example

class counter {
  recursive_mutex mut;
  int c = 0;
public:
  void tick() { mut.lock(); ++c; mut.unlock(); }
  void tickManyIfCan(int n) {
    if (mut.try_lock()) {
      while (n-- > 0) tick();
      mut.unlock();
    }
  }
  int value() { return c; }
};

void task(counter& ctr) { for (int i = 0; i < 100; i++) { pause_thread_ms(1); ctr.tickManyIfCan(10); } }

int main() {
  counter ctr;
  thread t1(task, ref(ctr)), t2(task, ref(ctr)), t3(task, ref(ctr));
  t1.join(); t2.join(); t3.join();
  cout << "\nvalue = " << ctr.value();
}

Results of a few runs:
value = 2350
value = 2090
value = 1600
value = 1990
value = 2240


Synchronizing threads with mutexes timed_mutex example

timed_mutex mut;

void attempt(int& successes) {
  if (mut.try_lock_for(chrono::milliseconds(50))) {
    // now we have the lock
    ++successes;
    pause_thread_ms(2);
    mut.unlock();
  }
}

void run() {
  thread ts[100];
  int successes = 0;
  for (int i = 0; i < 100; ++i) ts[i] = thread(attempt, ref(successes));
  for (auto& t : ts) t.join();
  cout << "#successes = " << successes << endl;
}

int main() { run(); run(); run(); }

Output:
#successes = 21
#successes = 21
#successes = 20

Synchronizing threads with mutexes shared_mutex example

std::shared_mutex s_mut; timer tm;

void reading(const string& data, int secs) {
  s_mut.lock_shared();
  pause_thread_s(secs);
  alang::logl("Reader ", data, "", tm.elapsed());
  s_mut.unlock_shared();
}
void writing(string& data, string d, int secs) {
  s_mut.lock();
  pause_thread_s(secs);
  data = d;
  s_mut.unlock();
  alang::logl("Writer ", d, "", tm.elapsed());
}

int main() {
  string data = "A";
  vector<thread> ts;
  tm.reset();
  ts.emplace_back(reading, cref(data), 3);
  ts.emplace_back(reading, cref(data), 4);

  ts.emplace_back(writing, ref(data), "B", 1);
  ts.emplace_back(writing, ref(data), "C", 2);

  ts.emplace_back(reading, cref(data), 0);
  for (auto& t : ts) t.join();
}

Alternative result:

Reader A 0.075988
Reader A 3005.2
Reader A 4005.2
Writer B 5009.25
Writer C 7014.45

Result:
Reader A 0.094351
Reader A 3005.2
Reader A 4005.2
Writer C 6008.22
Writer B 7008.53

Synchronizing threads with mutexes Locking and RAII

With lock() – unlock() pairs, some care is needed so that unlock happens even in the case of exceptions

int c; mutex cm; ...
cm.lock();
foo(c); // exception?
cm.unlock();

The standard library offers convenient “mutex wrappers” that unlock in the destructor

int c; mutex cm; ...
{
  lock_guard<mutex> lock(cm); // mutex locked here
  foo(c);
} // unlocked here when lock destroyed

Synchronizing threads with mutexes std::lock and lock_guard

It is possible to “adopt” an already locked mutex

container a, b; // assume both have a mutex member variable
{
  std::lock(a.mutex, b.mutex);
  lock_guard<mutex> l1(a.mutex, std::adopt_lock);
  lock_guard<mutex> l2(b.mutex, std::adopt_lock);

a.put(b.get()); // maybe exceptions

} // locks released here

std::lock() takes any number of mutexes, and locks them without deadlocking

Synchronizing threads with mutexes unique_lock, a more versatile mutex wrapper

unique_lock can be given more policies
  adopt_lock_t, defer_lock_t, try_to_lock_t
  or a duration or time point for how long or until what time to try to acquire the lock
movable, transfers the mutex
unlocking and (re)locking possible

std::unique_lock<std::mutex> g(mutex); ... g.unlock(); ... g.lock();

condition variables wait for a unique_lock

Synchronizing threads with mutexes unique_lock example

container a, b; // assume both have a mutex member variable
{
  unique_lock<mutex> l1(a.mutex, defer_lock);
  unique_lock<mutex> l2(b.mutex, defer_lock);
  lock(l1, l2);

a.put(b.get()); // maybe exceptions

} // locks released here

Synchronizing threads with mutexes C++17: scoped_lock wrapper

Easiest mutex wrapper
Can wrap many mutexes, constructor locks all without deadlock

std::mutex m1, m2, m3;
...
{
  std::scoped_lock lock(m1, m2, m3);
  // critical section
} // mutexes released here (in reverse order)

Mutexes can even be of different types Note: no need to specify mutex types, because of C++17’s class template argument deduction same true for lock_guard and unique_lock

Synchronizing threads with mutexes Locking/unlocking and the memory model

lock and unlock operations on the same mutex are (totally) ordered Unlocking a mutex synchronizes-with the next locking of the same mutex. All memory operations prior to mutex released (unlock) are visible after the same mutex acquired (lock).

Condition variables Outline

1 C++ Standardization and concurrency

2 C++ memory model

3 Standard library offerings for concurrency

4 Threads

5 Synchronizing threads with mutexes

6 Condition variables

7 Thread local variables, call once functions

8 Atomics

9 Tasks

Condition variables Condition variables

Condition variables allow blocking a thread until notified
Waiting methods (of the condition_variable class) put the current thread to sleep, to start waiting on a condition variable
  void wait(unique_lock<mutex>&);
  void wait_for(unique_lock<mutex>&, chrono::duration<...>);
  void wait_until(unique_lock<mutex>&, chrono::time_point<...>);
A thread starts waiting holding a lock, and wakes up holding a lock
  the unique_lock object must be associated with a mutex, locked by the current thread, when calling any of the wait functions
Notification methods wake threads up to check a condition
  void notify_one(); wake up one (unspecified) thread
  void notify_all(); wake up all threads
The scheduler may choose to wake up threads even when no notify functions have been called (spurious wakeup)
Always check the condition being waited for after wakeup!
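The predicate overload of wait wraps the “re-check the condition after wakeup” loop; a minimal sketch (not from the slides):

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool ready = false;

void waiter() {
  std::unique_lock<std::mutex> lock(m);
  cv.wait(lock, [] { return ready; });   // loops internally, so spurious wakeups are harmless
  std::cout << "woken with ready == true\n";
}

int main() {
  std::thread t(waiter);
  {
    std::lock_guard<std::mutex> lock(m);
    ready = true;
  }
  cv.notify_one();
  t.join();
}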


Condition variables Condition variable example

#include <condition_variable>

mutex mut; condition_variable cv; int resource=0;

void worker(int amount) {
  unique_lock<mutex> lock(mut);
  while (amount > resource) cv.wait(lock);
  resource -= amount;
  cout << "Handled " << amount << ", remains " << resource << endl;
}

int main() {
  thread ts[10];
  for (int i = 0; i < 10; ++i) ts[i] = thread(worker, i);

  {
    unique_lock<mutex> lock(mut);
    pause_thread_s(1);
    resource = 100;
    cv.notify_all();
  } // unlock here

  for (auto& t : ts) t.join();
}

Output:
Handled 0, remains 0
Handled 9, remains 91
Handled 4, remains 87
Handled 3, remains 84
Handled 5, remains 79
Handled 1, remains 78
Handled 6, remains 72
Handled 2, remains 70
Handled 7, remains 63
Handled 8, remains 55

Condition variables Condition variable example 2

#include <condition_variable>

mutex mut; condition_variable cv; int resource=0;

void worker(int amount) {
  unique_lock<mutex> lock(mut);
  while (amount > resource) cv.wait(lock);
  resource -= amount;
  cout << "Handled " << amount << ", remains " << resource << endl;
}

int main() {
  thread ts[10];
  for (int i = 0; i < 10; ++i) ts[i] = thread(worker, i);

  {
    unique_lock<mutex> lock(mut);
    pause_thread_s(1);
    resource = 100;
    cv.notify_one();
  } // unlock here

  pause_thread_s(2);
  for (auto& t : ts) t.detach();
}

Output:
Handled 0, remains 0
Handled 8, remains 92
Handled 9, remains 83
Handled 1, remains 82
Thread local variables, call once functions Outline

1 C++ Standardization and concurrency

2 C++ memory model

3 Standard library offerings for concurrency

4 Threads

5 Synchronizing threads with mutexes

6 Condition variables

7 Thread local variables, call once functions

8 Atomics

9 Tasks

Thread local variables, call once functions Thread local variables

C++11 introduced a new storage class specifier: thread_local Each thread has a distinct instance of a thread local variable A thread local variable is allocated (and possibly initialized) when a thread begins and deallocated when it ends Any namespace scope, block scope, or static member variables can be declared thread_local


Thread local variables, call once functions Thread local variable example

mutex io_m;

string freshvar(string s) {
  thread_local unsigned int counter = 0;
  return s + to_string(counter++);
}

void work(string s) {
  for (int i = 0; i < 4; i++) {
    // do something that needs fresh variable names
    lock_guard<mutex> lock(io_m);
    cout << freshvar(s) << endl;
  }
}

int main() {
  thread t1(work, "A");
  thread t2(work, "B");
  thread t3(work, "C");
  t1.join(); t2.join(); t3.join();
}

Output:
A0
B0
C0
A1
B1
C1
A2
B2
C2
A3
B3
C3
Thread local variables, call once functions Call once

Sometimes necessary to execute a piece of code only in one thread E.g., to initialize a resource For this, call_once() provides a solution

template< class Callable, class... Args> void call_once(std::once_flag& flag, Callable&& f, Args&&... args);

The Callable f need not be the same in all threads
Each call_once invocation with the same once_flag defines a group
f is invoked only once per group
  more precisely: until the flag is set, which takes place when f returns normally, not by throwing
threads block on call_once so that only one thread at a time executes, until the flag is set (returning invocation)
each invocation synchronizes-with the next
the returning invocation synchronizes-with all subsequent invocations


Thread local variables, call once functions call_once example

string* ptr; once_flag flag;

void work(string s) {
  call_once(flag, [&s] { ptr = new string(s); cout << s << " initialized" << endl; });
  cout << *ptr;
}

int main() { std::thread t1(work, "A"); std::thread t2(work, "B"); std::thread t3(work, "C");

t1.join(); t2.join(); t3.join(); }

Output:
C initialized
CCC
Atomics Outline

1 C++ Standardization and concurrency

2 C++ memory model

3 Standard library offerings for concurrency

4 Threads

5 Synchronizing threads with mutexes

6 Condition variables

7 Thread local variables, call once functions

8 Atomics

9 Tasks

Atomics std::atomic

atomic<T> x instructs that x should be treated atomically
  every write and read is indivisible
  writes and reads happen in program order
Accesses synchronize-with each other
  a store synchronizes-with operations that load the stored value
In addition, there are special operations guaranteed to be atomic
  e.g., “compare-and-swap”
  building blocks for lock-free algorithms
T cannot be an arbitrary type
  atomic<T> is specialized for integral and pointer types (also shared/weak pointers)
  atomic<T> can be instantiated with any trivially copyable type, essentially PODs
Depending on T, atomic variables may be
  implemented using hardware’s lock-free atomic operations
  implemented using mutexes

Atomics Members of atomic objects

atomic assignment
  T operator=(T)
  void store(T, memory_order = std::memory_order_seq_cst)
atomic read
  operator T() const
  T load(memory_order = std::memory_order_seq_cst) const
atomic swap
  T exchange(T, memory_order = std::memory_order_seq_cst)
compare-and-swap
  bool compare_exchange_weak (T& expected, T desired, ...)
  bool compare_exchange_strong(T& expected, T desired, ...)
  (the ... stands for memory order parameters)
atomic bit manipulation and arithmetic for those T for which they make sense
  fetch_and, fetch_or, fetch_xor, fetch_add, fetch_sub
    in addition to anding, oring, etc. the value, returns the old value
  the operators ++, --, &=, |=, ^=, +=, and -=
Most of the same operations exist as non-members as well
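As an illustration of these building blocks, a minimal and deliberately naive spinlock sketch built on exchange; this is not a standard class, and the default (seq_cst) ordering is used for simplicity:

#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

class spinlock {                       // illustrative, not part of the standard library
  std::atomic<bool> locked{false};
public:
  void lock()   { while (locked.exchange(true)) { /* spin until we observe false */ } }
  void unlock() { locked.store(false); }
};

spinlock sl;
long counter = 0;

int main() {
  std::vector<std::thread> ts;
  for (int i = 0; i < 4; ++i)
    ts.emplace_back([] { for (int n = 0; n < 100000; ++n) { sl.lock(); ++counter; sl.unlock(); } });
  for (auto& t : ts) t.join();
  std::cout << counter << '\n';        // 400000
}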


Atomics Example

#include <atomic>
#include <map>
#include <thread>
#include <vector>
#include <iostream>
using namespace std;

atomic<int> acc(0);

void acc_square(int x) { acc += x * x; } // += is atomic

int main() {
  map<int, int> m;

  for (int j = 0; j < 1000; ++j) {
    acc = 0;
    vector<thread> ts;
    for (int i = 1; i <= 10; i++) ts.emplace_back(acc_square, i);
    for (auto& t : ts) t.join();

    if (m.count(acc) == 0) m[acc] = 1; else m[acc]++;
    // atomic acc converted to regular int whenever needed
  }

  for (auto kv : m)
    cout << "acc=" << kv.first << ": " << kv.second << " times\n";
}

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 66 / 92 Atomics Example

#include atomic acc(0);

void acc_square(int x) { acc +=x* x; } // += is atomic

int main() { map m; acc=385: 1000 times for(intj=0; j<1000; ++j) { acc=0; vector ts; for(inti=1; i <= 10; i++) ts.emplace_back(acc_square, i); for(auto&t : ts) t.join();

if (m.count(acc) ==0) m[acc]=1; else m[acc]++; // atomic acc converted to regular int whenever needed }

for(auto kv : m) cout << "acc=" << kv.first <<":" << kv.second << " times\n"; } Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 66 / 92 Atomics Compare-and-swap (CAS)

This usually hardware-supported primitive is a key operation for many lock-free algorithms In C++ spelled as compare_exchange_strong and compare_exchange_weak: bool atomic<T>::compare_exchange_weak(T& expected, T desired);

if the atomic’s value == expected (bitwise), then replace it with desired, and return true; otherwise load atomic’s value to expected and return false The canonical use is in a loop, trying for as long as the value is as expected

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 67 / 92 Atomics CAS example

#include <iostream>  // std::cout
#include <atomic>    // std::atomic
#include <thread>    // std::thread
#include <vector>    // std::vector

// a simple linked list:
struct Node { int value; Node* next; };
std::atomic<Node*> list_head(nullptr);

void prepend(int val) {
  Node* oldHead = list_head;
  Node* newNode = new Node{val, oldHead};

  while (!list_head.compare_exchange_weak(oldHead, newNode))
    newNode->next = oldHead;
}

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 68 / 92 Atomics Memory orders

Atomic loads and stores can be given a memory order flag:

enum memory_order { memory_order_relaxed, memory_order_consume, memory_order_acquire, memory_order_release, memory_order_acq_rel, memory_order_seq_cst };

Default is memory_order_seq_cst, which specifies a global total order of memory operations a memory location cannot have multiple simultaneous values all memory operations before an atomic store to m on thread A will be visible on thread B to all operations after a later atomic load of m (the synchronizes-with relation) It often suffices to have weaker reordering guarantees, or even no guarantees E.g., counters, such as shared pointers’ reference counters, can be implemented with memory_order_relaxed Not for casual concurrent programming
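To make the synchronizes-with guarantee concrete, here is a small sketch (not from the slides) of the classic release/acquire publication pattern; the names data, ready, producer, and consumer are made up for the example.

#include <atomic>
#include <thread>
#include <cassert>

int data = 0;
std::atomic<bool> ready{false};

void producer() {
  data = 42;                                     // ordinary write
  ready.store(true, std::memory_order_release);  // publishes data
}

void consumer() {
  while (!ready.load(std::memory_order_acquire)) // synchronizes-with the release store
    ;                                            // spin until published
  assert(data == 42);                            // guaranteed to see the write
}

int main() {
  std::thread t1(producer), t2(consumer);
  t1.join(); t2.join();
}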

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 69 / 92 counter 40000000, bad 0

Atomics Relaxed memory order example

std::atomic<int> counter = 0, bad = 0;

void worker() {
  for (int n = 0; n < 10000000; ++n) {
    counter.fetch_add(1, std::memory_order_relaxed);
  }
}
void observer() {
  int c = 0;
  while (true) {
    int cn = counter.load(std::memory_order_relaxed);
    if (c > cn) bad.fetch_add(1, std::memory_order_relaxed);
    c = cn;
  }
}

int main() {
  std::vector<thread> v1;
  double tm;
  tm = time_ms([&]() {
    thread(observer).detach();
    for (int n = 0; n < 4; ++n) { v1.emplace_back(worker); }
    for (auto& t : v1) t.join();
  });
  std::cout << "counter " << counter << ", bad " << bad << '\n';
}

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 70 / 92 Atomics Relaxed memory order example

std::atomic counter=0, bad=0;

void worker() { for(intn=0; n< 10000000; ++n) { counter.fetch_add(1, std::memory_order_relaxed); } } void observer() { intc=0; while(true){ int cn= counter.load(std::memory_order_relaxed); counter 40000000, bad 0 if (c> cn) bad.fetch_add(1, std::memory_order_relaxed); c= cn; } }

int main() { std::vector v1; double tm; tm= time_ms([&]() { thread(observer).detach(); for(intn=0; n<4; ++n) { v1.emplace_back(worker); } for(auto&t : v1) t.join(); }); std::cout << "counter " << counter << ", bad " << bad << ’\n’; } Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 70 / 92 Tasks Task based parallelism

Programming directly with threads is complex and low-level computations are married to OS threads OS thread creation is typically rather expensive Task-based parallelism abstracts over OS threads Algorithms are decomposed into independent tasks, regardless of how much parallelism is available Allocating tasks to individual processors/cores is a separate problem Task-based parallelism separates what can be run in parallel from what is run in parallel A task-dependency graph specifies the data dependencies between tasks

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 71 / 92 Ideally, futures/promises and async would construct task dependency graphs In current C++, still somewhat married to threads (not specified whether thread pools or threads) The standards committee is actively working towards supporting task-based parallelism composable futures along the lines of .Future

Tasks Task-based parallelism implementation ideas

Thread pools Work stealing tasks initially queued on one thread’s queue, from which idle processors steal tasks that are on blocked threads should be moved back to work stealing-queues tasks should freely migrate between threads ⇒

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 72 / 92 Tasks Task-based parallelism implementation ideas

Thread pools Work stealing tasks initially queued on one thread’s queue, from which idle processors steal tasks that are on blocked threads should be moved back to work stealing-queues tasks should freely migrate between threads ⇒ Ideally, futures/promises and async would construct task dependency graphs In current C++, still somewhat married to threads (not specified whether thread pools or threads) The standards committee is actively working towards supporting task-based parallelism composable futures along the lines of Boost.Future

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 72 / 92 Tasks Futures

Future is a “delayed value”, a promise that eventually there will be a value future<long> f = async([]{ return hailstone(9780657631); });

Binding an asynchronous computation, a promise of a value, to a future does not block When the promised value is needed, future can be queried This may mean blocking, if the value is not yet available

assert(f.get() ==1);

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 73 / 92 Tasks Promises

A promise object can provide a future object and can later give a value (or an exception) to that future promise future is a kind of one-off communication

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 74 / 92 1

Tasks Example (no concurrency)

long hailstone(long n) {
  if (n == 1) return 1;
  if (n % 2 == 0) return hailstone(n / 2);
  return hailstone(3 * n + 1);
}

int main() {
  std::promise<long> p;
  std::future<long> f = p.get_future();

  p.set_value(hailstone(9780657631));

  std::cout << f.get();
}

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 75 / 92 Tasks Example (no concurrency)

long hailstone(long n) { if (n ==1) return1; if (n%2 ==0) return hailstone(n/2); return hailstone(3*n+1); }

int main() { std::promise p; std::futuref= p.get_future();

p.set_value(hailstone(9780657631)); 1 std::cout << f.get(); }

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 75 / 92 1

Tasks Example (concurrency)

long hailstone(long n) { if (n ==1) return1; if (n%2 ==0) return hailstone(n/2); return hailstone(3*n+1); }

int main() { std::promise p; std::futuref= p.get_future();

std::thread([&](){ p.set_value(hailstone(9780657631)); }).detach();

std::cout << f.get(); }

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 76 / 92 Tasks Example (concurrency)

long hailstone(long n) { if (n ==1) return1; if (n%2 ==0) return hailstone(n/2); return hailstone(3*n+1); }

int main() { std::promise p; std::futuref= p.get_future();

std::thread([&](){ p.set_value(hailstone(9780657631)); }).detach(); 1 std::cout << f.get(); }

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 76 / 92 Tasks Promise future communication →

Promise has a shared state (with a future): a ready flag, and the evaluated result or exception when ready set_value, set_exception make the promise ready: they set both the ready flag and the value/exception atomically, then unblock threads waiting on the promise Promise destruction releases the promise: it gives up the shared state, which is deleted if no futures are waiting; a broken_promise error is stored (and thrown at get) if futures are waiting Synchronization: set_value and set_exception synchronize-with functions waiting on the shared state, e.g., futures’ get and wait functions
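A small sketch (not from the slides) of the broken_promise case: destroying a promise that still has an associated, unfulfilled future stores a future_error, which then surfaces at get.

#include <future>
#include <iostream>

int main() {
  std::future<int> f;
  {
    std::promise<int> p;
    f = p.get_future();
  }                                  // p destroyed without set_value/set_exception
  try {
    f.get();                         // throws std::future_error
  } catch (const std::future_error& e) {
    // e.code() == std::future_errc::broken_promise
    std::cout << e.what() << '\n';
  }
}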

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 77 / 92 acc=385: 1000 times

Tasks Reminder: sum of squares with mutexes

int acc=0; mutex acc_mutex;

void acc_square(int x) { int tmp=x* x; acc_mutex.lock(); acc += tmp; acc_mutex.unlock(); }

int main() { map m;

for(intj=0; j<1000; ++j) { acc=0; vector ts;

for(inti=1; i <= 10; i++) ts.emplace_back(acc_square, i); for(auto&t : ts) t.join();

if (m.count(acc) ==0) m[acc]=1; else m[acc]++; }

for(auto kv : m) cout << "acc=" << kv.first <<":" << kv.second << " times\n"; }

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 78 / 92 Tasks Reminder: sum of squares with mutexes

int acc=0; mutex acc_mutex; acc=385: 1000 times void acc_square(int x) { int tmp=x* x; acc_mutex.lock(); acc += tmp; acc_mutex.unlock(); }

int main() { map m;

for(intj=0; j<1000; ++j) { acc=0; vector ts;

for(inti=1; i <= 10; i++) ts.emplace_back(acc_square, i); for(auto&t : ts) t.join();

if (m.count(acc) ==0) m[acc]=1; else m[acc]++; }

for(auto kv : m) cout << "acc=" << kv.first <<":" << kv.second << " times\n"; }

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 78 / 92 acc=385: 1000 times

Tasks Sum of squares with futures

int square(int x) { return x * x; }

int main() {
  map<int, int> m;

  for (int j = 0; j < 1000; ++j) {
    int acc = 0;
    vector<future<int>> fs;
    for (int i = 1; i <= 10; i++) fs.push_back(async(square, i));

    for (auto& f : fs) acc += f.get();

    if (m.count(acc) == 0) m[acc] = 1; else m[acc]++;
  }

  for (auto kv : m)
    cout << "acc=" << kv.first << ": " << kv.second << " times\n";
}

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 79 / 92 Tasks Sum of squares with futures

int square(int x) { returnx* x; } acc=385: 1000 times int main() { map m;

for(intj=0; j<1000; ++j) { int acc=0; vector> fs; for(inti=1; i <= 10; i++) fs.push_back(async(square, i));

for(auto&f : fs) acc += f.get();

if (m.count(acc) ==0) m[acc]=1; else m[acc]++; }

for(auto kv : m) cout << "acc=" << kv.first <<":" << kv.second << " times\n"; }

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 79 / 92 Tasks The future class

template <class T> class future; T is the type of the result (can be void) T must be MoveConstructible future cannot be copied, only moved! void wait() const; blocks until the future’s result is available there’s also wait_for() and wait_until() T get(); blocks until the result is available, then returns it get can only be called once! bool valid() const; does the future have a shared state, i.e., has the value not been consumed yet (with get or share)? std::shared_future<T> share(); transfers the future’s shared state to a shared_future (the original future becomes invalid)
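A minimal sketch (not from the slides) exercising valid, wait, and the get-once rule:

#include <future>
#include <cassert>

int main() {
  std::future<int> f = std::async([]{ return 7; });
  assert(f.valid());             // has a shared state, result not yet consumed
  f.wait();                      // block until ready, without consuming the value
  int v = f.get();               // consume the result; can be called only once
  assert(v == 7 && !f.valid());  // the future no longer refers to a shared state
}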

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 80 / 92 Tasks shared_future

the same as future, except that can be copied, so many threads can wait for the same shared state T must be CopyConstructible must construct one shared_future object for each context where the shared state needed

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 81 / 92 Expecting 1 Expecting 2 Expecting 3 Expecting 4 Expecting 0 Expecting 5 Expecting 6 Found 7 Expecting 9 Expecting 8

Tasks Shared future example

void detect(int x, shared_future<int> y) {
  static std::mutex iom;
  if (x == y.get()) { unique_lock<std::mutex> l(iom); cout << "Found " << x << '\n'; }
  else              { unique_lock<std::mutex> l(iom); cout << "Expecting " << x << '\n'; }
}

int main() {
  promise<int> p;
  shared_future<int> f = p.get_future().share();

  for (int j = 0; j < 10; ++j) thread(detect, j, f).detach();

  pause_thread_s(1);
  p.set_value(7);
  pause_thread_s(1);
}

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 82 / 92 Tasks Shared future example

void detect(int x, shared_future y) { static std::mutex iom; if (x == y.get()) { unique_lock l(iom); cout << "Found " < p; Expecting 4 shared_futuref= p.get_future().share(); Expecting 0 Expecting 5 for(intj=0; j<10; ++j) thread(detect, j, f).detach(); Expecting 6 Found 7 pause_thread_s(1); Expecting 9 p.set_value(7); Expecting 8 pause_thread_s(1); }

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 82 / 92 Tasks Futures, promises and exceptions

A promise can pass a value, or an exception (concretely an exception_ptr), via the shared state to a future try { p.set_value(foo()); // foo might throw } catch (...) { p.set_exception(current_exception()); }

Exception manifests at future::get() try { auto val= my_future.get(); } catch (std::exception& e) { cout << "Exception: " << e.what(); }

If get never called, exception does not manifest

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 83 / 92 Tasks Futures, promises, packaged tasks, and async

Promise and future are the basic building blocks A packaged task wraps any function object with a promise-future channel a future object is attached to the wrapped function object calling the wrapped function object resolves the promise with set_value an exception within the function object sets the exception with set_exception async abstracts over a packaged task

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 84 / 92 Too difficult

Tasks Packaged task example

#include <iostream>
#include <future>
#include <thread>
#include <stdexcept>
using namespace std;

bool is_prime(int n) {
  if (n == 1) return false;
  if (n == 2) return true;
  throw std::logic_error("Too difficult");
}

int main() {
  packaged_task<bool(int)> task(&is_prime);
  future<bool> f = task.get_future();

  thread(move(task), 3).detach();

  try { cout << f.get(); }
  catch (exception& e) { cout << e.what(); }
}

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 85 / 92 Tasks Packaged task example

#include #include using namespace std;

bool is_prime(int n) { if (n==1) return false; if (n==2) return true; throw std::logic_error("Too difficult"); }

int main() { packaged_task task(&is_prime); futuref= task.get_future(); Too difficult

thread(move(task),3).detach();

try { cout << f.get(); } catch (exception& e) { cout << e.what(); } }

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 85 / 92 Too difficult

Tasks Same with async

#include #include using namespace std;

bool is_prime(int n) { if (n==1) return false; if (n==2) return true; throw std::logic_error("Too difficult"); }

int main() {

try { cout << async(&is_prime,3).get(); } catch (exception& e) { cout << e.what(); } }

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 86 / 92 Tasks Same with async

#include #include using namespace std;

bool is_prime(int n) { if (n==1) return false; if (n==2) return true; throw std::logic_error("Too difficult"); }

int main() {

Too difficult

try { cout << async(&is_prime,3).get(); } catch (exception& e) { cout << e.what(); } }

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 86 / 92 Tasks async details

Two overloads

template <class Function, class... Args>
std::future<typename std::result_of<std::decay_t<Function>(std::decay_t<Args>...)>::type>
async(Function&& f, Args&&... args);

template <class Function, class... Args>
std::future<typename std::result_of<std::decay_t<Function>(std::decay_t<Args>...)>::type>
async(std::launch policy, Function&& f, Args&&... args);

Launch policy launch::async execute on a new thread (with thread locals initialized) launch::deferred execute lazily: in the same thread, but only at get() call The first form behaves as if launch policy is async|deferred which lets the implementation choose If async’s result bound to a temporary, blocks at that temporary’s destructor

async(launch::async,&foo); // blocks for foo to finish here async(launch::async,&bar); // blocks for bar to finish here
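A small sketch (not from the slides) contrasting the two launch policies; with launch::deferred nothing runs until get, and then it runs lazily in the calling thread. The function compute is a made-up placeholder.

#include <future>
#include <iostream>

int compute() { std::cout << "running\n"; return 1; }

int main() {
  auto f = std::async(std::launch::deferred, compute);
  std::cout << "nothing printed yet\n";   // compute has not run
  std::cout << f.get() << '\n';           // runs compute lazily, in this thread
}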

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 87 / 92 Tasks Future of futures

Current C++ futures are rather cumbersome cannot block on one of several futures to become ready, must block on one particular future no composable tasks: not easy to set up data flows where one task waits for the completion and result of another Key feature missing: future::then Instead of blocking with get until a future is ready, register a continuation, to be run when the future is ready No blocking Not clear what goes into C++2a
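As a rough illustration only (not the proposed std::future::then, and not from the slides): a then can be emulated in current C++ by spawning a task that blocks on get; a real .then would register a continuation without tying up a thread. The helper name then is hypothetical.

#include <future>
#include <utility>
#include <iostream>
#include <string>

// Emulation: start a task that waits on f and then runs the continuation on its value.
template <typename T, typename F>
auto then(std::future<T> f, F&& cont) {
  return std::async(std::launch::async,
                    [f = std::move(f), c = std::forward<F>(cont)]() mutable {
                      return c(f.get());
                    });
}

int main() {
  auto f1 = std::async([]{ return 42; });
  auto f2 = then(std::move(f1), [](int i) { return std::to_string(i); });
  std::cout << f2.get() << '\n';   // prints 42
}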

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 88 / 92 Tasks Example: then

#include using namespace std; future f1= async([]{ return hailstone(12342342); }); future f2= f1.then( [](future f) { return to_string(f.get()); } );

cout << f2.get(); }

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 89 / 92 Tasks Example: then (?)

#include using namespace std; future f1= async([]{ return hailstone(12342342); }); future f2= f1.then( [](int i) { return to_string(i); } };

cout << f2.get(); }

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 90 / 92 Tasks Example: then (?)

#include using namespace std; future f1= async([]{ return hailstone(12342342); }); future f2= f1.then(to_string);

cout << f2.get(); }

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 91 / 92 Tasks Future combinators when_any, when_all

Wait until (at least) one of a set of futures is ready

auto f_or_g= when_any(async(f), async(g)); f_or_g.then([](future f) { ... });

Wait until all futures in a given set are ready

future<tuple<future<int>, future<int>>> f_and_g = when_all(async(f), async(g)); // assuming f and g return int

future<int> futf, futg;
tie(futf, futg) = f_and_g.get();   // blocks
cout << futf.get() + futg.get();   // does not block

Jaakko Järvi (University of Bergen) Concurrent Programming in Standard C++ <2018-10-02 Tue> 92 / 92 Task System Implementations

Jaakko Järvi

University of Bergen

<2018-10-02 Tue>

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 1 / 61 Outline

1 About performance

2 Task System

3 Communication between tasks

4 Deadlock and task systems

5 Task stealing

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 2 / 61 do not block!

About performance What kind of speedup can be obtained with threads?

GFLOPS in a typical computer

Unit                GFLOPS   Prog. technology
GPU                 2250     CUDA, OpenGL, OpenCL, DirectX
Vectorization unit  500      Autovectorization, intrinsics, OpenCL
Multi-threading     50       C++, Intel's TBB, Apple's GCD
Serial              7        C++

From Sean Parent

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 3 / 61 About performance What kind of speedup can be obtained with threads?

GFLOPS in a typical computer

Unit                GFLOPS   Prog. technology
GPU                 2250     CUDA, OpenGL, OpenCL, DirectX
Vectorization unit  500      Autovectorization, intrinsics, OpenCL
Multi-threading     50       C++, Intel's TBB, Apple's GCD
Serial              7        C++

From Sean Parent

do not block!

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 3 / 61 About performance Amdahl’s Law

S(n) = 1 / ((1 - p) + p/n)

S is the theoretical speedup n is the factor of increase in resources (number of cores) p is the portion of the program that benefits from the resources Unit is time

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 4 / 61 Observation: diminishing returns Note: any waiting because of thread synchronization adds to the serial part minimize syncrhonization! ⇒

About performance Amdahl’s Law, Example

Example: program fetches data (10%) and computes (90%); fetching is serial, computation can be parallelized

n          S(n) formula                      S(n)
1          1 / ((1 - 0.9) + 0.9/1)           1.00
2          1 / ((1 - 0.9) + 0.9/2)           1.82
4          1 / ((1 - 0.9) + 0.9/4)           3.08
8          1 / ((1 - 0.9) + 0.9/8)           4.71
16         1 / ((1 - 0.9) + 0.9/16)          6.40
32         1 / ((1 - 0.9) + 0.9/32)          7.80
64         1 / ((1 - 0.9) + 0.9/64)          8.77
1000000    1 / ((1 - 0.9) + 0.9/1000000)     10.00
10000000   1 / ((1 - 0.9) + 0.9/10000000)    10.00
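The table values can be reproduced with a few lines of code (a sketch, not from the slides):

#include <cstdio>
#include <initializer_list>

// Amdahl's law: S(n) = 1 / ((1 - p) + p/n), here with p = 0.9
double speedup(double p, double n) { return 1.0 / ((1.0 - p) + p / n); }

int main() {
  for (double n : {1., 2., 4., 8., 16., 32., 64., 1e6})
    std::printf("n = %8.0f  S(n) = %.2f\n", n, speedup(0.9, n));
}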

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 5 / 61 About performance Amdahl’s Law, Example

Example: program fetches data (10%) and computes (90%); fetching is serial, computation can be parallelized

n          S(n) formula                      S(n)
1          1 / ((1 - 0.9) + 0.9/1)           1.00
2          1 / ((1 - 0.9) + 0.9/2)           1.82
4          1 / ((1 - 0.9) + 0.9/4)           3.08
8          1 / ((1 - 0.9) + 0.9/8)           4.71
16         1 / ((1 - 0.9) + 0.9/16)          6.40
32         1 / ((1 - 0.9) + 0.9/32)          7.80
64         1 / ((1 - 0.9) + 0.9/64)          8.77
1000000    1 / ((1 - 0.9) + 0.9/1000000)     10.00
10000000   1 / ((1 - 0.9) + 0.9/10000000)    10.00

Observation: diminishing returns Note: any waiting because of thread synchronization adds to the serial part ⇒ minimize synchronization!

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 5 / 61 About performance Amdahl’s law

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 6 / 61 Task System Outline

1 About performance

2 Task System

3 Communication between tasks

4 Deadlock and task systems

5 Task stealing

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 7 / 61 We closely follow Sean Parent’s portable task system and his talks at various C++ forums Code almost verbatim

Task System Task System?

We just learned about the use of async Next: how to implement one

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 8 / 61 Task System Task System?

We just learned about the use of async Next: how to implement one

We closely follow Sean Parent’s portable task system and his talks at various C++ forums Code almost verbatim

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 8 / 61 Task System Some includes and using declarations

Bunch of definitions that the examples assume

#include #include #include #include #include #include #include #include #include #include

using std::forward; using std::move; using std::function; using std::thread; using std::string; using std::future;

using lock_t= std::unique_lock;

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 9 / 61 Task System We need something to compute, concurrently

bool is_prime(long num) { long limit= sqrt(num); if (num<2) return false; for(longi=2; i<=limit ; i++){ if (num%i ==0) return false; } return true; }

is_prime(2) == true is_prime(4) == false

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 10 / 61 Task System A notification queue (implemented as a monitor)

class notification_queue {
  std::deque<function<void()>> _q;
  std::mutex _mutex;
  std::condition_variable _ready;
public:
  void pop(function<void()>& f) { <> }

  template <typename F>
  void push(F&& f) { <> }
};

a queue for void() functions pop blocks if queue is empty Note: <>

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 11 / 61 notice that f is an “out” parameter front accesses the first element, pop_front discards it functions are moved, not copied tasks will not get duplicated

Task System Pop

void pop(function<void()>& f) {
  {
    lock_t lock(_mutex);
    while (_q.empty()) _ready.wait(lock);
    f = move(_q.front());
    _q.pop_front();
  }
}

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 12 / 61 Task System Pop

void pop(function& f) { { lock_t lock(_mutex); while (_q.empty()) _ready.wait(lock); f= move(_q.front()); _q.pop_front(); } }

notice that f is an “out” parameter front accesses the first element, pop_front discards it functions are moved, not copied tasks will not get duplicated

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 12 / 61 forward<F>(f) is C++ trickery that says: pass f through as exactly the same kind of reference it was received const as const rvalues as rvalues (temporaries) lvalues as lvalues

Task System Push

template <typename F>
void push(F&& f) {
  {
    lock_t lock(_mutex);
    _q.emplace_back(forward<F>(f));
  }
  _ready.notify_one();
}

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 13 / 61 Task System Push

template void push(F&& f) { { lock_t lock(_mutex); _q.emplace_back(forward(f)); } _ready.notify_one(); }

forward<F>(f) is C++ trickery that says: pass f through as exactly the same kind of reference it was received const as const rvalues as rvalues (temporaries) lvalues as lvalues
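A tiny sketch (not from the slides) of what perfect forwarding preserves; push below is a hypothetical stand-in for the queue's push:

#include <utility>
#include <string>
#include <vector>

std::vector<std::string> v;

template <typename F>
void push(F&& f) {
  v.emplace_back(std::forward<F>(f));  // lvalues are copied, rvalues are moved
}

int main() {
  std::string s = "hello";
  push(s);                  // s passed as lvalue: copied, s still usable
  push(std::move(s));       // passed as rvalue: moved into the vector
}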

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 13 / 61 Task System Notification Queue 1

class notification_queue {
  std::deque<function<void()>> _q;
  std::mutex _mutex;
  std::condition_variable _ready;
public:
  void pop(function<void()>& f) {
    {
      lock_t lock(_mutex);
      while (_q.empty()) _ready.wait(lock);
      f = move(_q.front());
      _q.pop_front();
    }
  }

  template <typename F>
  void push(F&& f) {
    {
      lock_t lock(_mutex);
      _q.emplace_back(forward<F>(f));
    }
    _ready.notify_one();
  }
};

reminder: we defined lock_t as std::unique_lock<std::mutex>

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 14 / 61 Task System Some helpers

The task-system-utilities.hpp header provides the headers and using declarations above, plus other necessary headers log, logl sleep functions sleep_s(int), sleep_ms(int) sleep_random_ms(int limit) timer class with void reset(); and double elapsed(); methods function time_ms(f) to time execution of f() number_of_threads() function retrieves an “optimal” number of threads for the current computer
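The header itself is course-provided and not shown here; as an assumption-laden sketch, time_ms could plausibly be implemented along these lines with std::chrono:

#include <chrono>

// One plausible implementation of time_ms: run f() and return the elapsed milliseconds.
template <typename F>
double time_ms(F&& f) {
  auto start = std::chrono::steady_clock::now();
  f();
  std::chrono::duration<double, std::milli> d =
      std::chrono::steady_clock::now() - start;
  return d.count();
}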

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 15 / 61 Task System number_of_threads()

The standard function thread::hardware_concurrency may(!) provide the number of cores in the current system thread::hardware_concurrency() is only a hint We wrap it, in case it does not return a useful value unsigned int number_of_threads() { return std::min( 32u, std::max(1u, thread::hardware_concurrency())); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 16 / 61 Hello, World!

Task System Test notification queue 1: nq1-example1.cpp

#include "task-system-utilities.hpp"

<>

int main() { notification_queue q;

q.push([]{ std::cout << "Hello, "; });

thread t([&]{ function<void()> f; q.pop(f); f(); q.pop(f); f(); });

q.push([]{ std::cout << "World!"; }); t.join(); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 17 / 61 Task System Test notification queue 1: nq1-example1.cpp

#include "task-system-utilities.hpp"

<>

int main() { notification_queue q;

q.push([]{ std::cout << "Hello, "; });

thread t([&]{ function f; q.pop(f); f(); q.pop(f); f(); }); Hello, World!

q.push([]{ std::cout << "World!"; }); t.join(); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 17 / 61 Task System Task system 1

class task_system {
  const unsigned int _nthreads;
  std::vector<thread> _threads;
  notification_queue _q;

  void run() {
    while (true) {
      function<void()> f;
      _q.pop(f);
      f();
    }
  }

public:
  <>
  <>
  <>
};

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 18 / 61 Task System Constructor, destructor, and async

Constructor

task_system(int nthreads=0) : _nthreads(nthreads>0? nthreads : number_of_threads()) { for(unsigned intn=0; n< _nthreads; ++n) { _threads.emplace_back([&]{ run(); }); } }

Destructor

~task_system() { for(thread&t: _threads) t.join(); }

async

template <typename F> void async(F&& f) { _q.push(forward<F>(f)); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 19 / 61 oneanother possible possible result: result:

Hello,World!Hello, World! World!

Task System Test task system 1: ts1-example1.cpp

#include "task-system-utilities.hpp"

<>
<>

int main() {
  task_system ts;
  ts.async([]{ std::cout << "Hello, " << std::endl; });
  ts.async([]{ std::cout << "World!" << std::endl; });
}

One possible result: Hello, World!   Another possible result: World!Hello,

After output, hangs. . . Need a way to shut down the task system
1 Destructor tells the notification queue to be done
2 All waiting pop calls are woken up.
3 Queue's pop no longer blocks on empty, but instead returns false
4 Task system's run methods stop if pop returns false

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 20 / 61 Task System Notification queue 2

class notification_queue {
  std::deque<function<void()>> _q;
  std::mutex _mutex;
  std::condition_variable _ready;
  bool _done = false;

public:
  void done() {
    { lock_t lock(_mutex); _done = true; }
    _ready.notify_all();
  }

  bool pop(function<void()>& f) {
    lock_t lock(_mutex);
    while (_q.empty() && !_done) _ready.wait(lock);
    if (_q.empty()) return false;
    f = move(_q.front());
    _q.pop_front();
    return true;
  }

  template <typename F>
  void push(F&& f) {
    {
      lock_t lock(_mutex);
      _q.emplace_back(forward<F>(f));
    }
    _ready.notify_one();
  }
};

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 21 / 61 Task System Task system 2

class task_system {
  const unsigned int _nthreads;
  std::vector<thread> _threads;
  notification_queue _q;

  void run() {
    while (true) {
      function<void()> f;
      if (!_q.pop(f)) break; // !
      f();
    }
  }

public:
  task_system(int nthreads = 0)
    : _nthreads(nthreads > 0 ? nthreads : number_of_threads()) {
    for (unsigned int n = 0; n < _nthreads; ++n) {
      _threads.emplace_back([&]{ run(); });
    }
  }
  ~task_system() {
    _q.done();
    for (thread& t : _threads) t.join();
  }

  template <typename F>
  void async(F&& f) { _q.push(forward<F>(f)); }
};

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 22 / 61 (one possible result)

Hello, World!

no longer hangs

Task System Test task system 2: ts2-example1.cpp

#include "task-system-utilities.hpp"

<> <>

int main() { task_system ts; ts.async([]{ std::cout << "Hello, "; }); ts.async([]{ std::cout << "World!"; }); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 23 / 61 Task System Test task system 2: ts2-example1.cpp

#include "task-system-utilities.hpp"

<> (one possible result) <> Hello, World! int main() { task_system ts; ts.async([]{ std::cout << "Hello, "; }); ts.async([]{ std::cout << "World!"; }); }

no longer hangs

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 23 / 61 Task System Choosing the number of threads

Number of threads allocated for the task system affects the speedup What is the best number? Too few threads cores sit idle, while tasks sit on the queue ⇒ Too many threads OS costs of opening, closing, managing threads go up ⇒

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 24 / 61 time 185.78 using 1 threads time 95.0063 using 2 threads time 49.2618 using 4 threads time 73.3565 using 8 threads time 84.6876 using 16 threads time 85.8173 using 32 threads time 85.1905 using 64 threads time 87.6551 using 128 threads time 86.9235 using 256 threads time 84.6692 using 512 threads time 152.756 using 1024 threads time 304.862 using 2048 threads time 67.7195 using 8 threads

Task System Example

#include "task-system-utilities.hpp" <> <> <> std::atomic found=0; int count_primes(long n) { int count=0; while (n-->1){ if (is_prime(n)) ++count; } return count; }

const int ntasks= 4096; void test(int nthreads) { double time; timer tmr; { task_system ts(nthreads); for(inti=0; i

int main() { for(intn=1; n <= 2048; n *=2) test(n); test(thread::hardware_concurrency()); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 25 / 61 Task System Example

time 185.78 using 1 threads #include "task-system-utilities.hpp" time 95.0063 using 2 threads <> time 49.2618 using 4 threads <> time 73.3565 using 8 threads <> time 84.6876 using 16 threads std::atomic found=0; time 85.8173 using 32 threads int count_primes(long n) { time 85.1905 using 64 threads int count=0; while (n-->1){ if (is_prime(n)) ++count;time } 87.6551 using 128 threads return count; time 86.9235 using 256 threads } time 84.6692 using 512 threads time 152.756 using 1024 threads const int ntasks= 4096; time 304.862 using 2048 threads void test(int nthreads) { time 67.7195 using 8 threads double time; timer tmr; { task_system ts(nthreads); for(inti=0; i

int main() { for(intn=1; n <= 2048; n *=2) test(n); test(thread::hardware_concurrency()); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 25 / 61 Communication between tasks Outline

1 About performance

2 Task System

3 Communication between tasks

4 Deadlock and task systems

5 Task stealing

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 26 / 61 Communication between tasks Result of a task

Thus far we have had no communication between tasks Tasks did not return values Tasks that do return a value afford a simple way of communication async schedules a packaged task, and returns a future the future becomes ready when the task completes Reasoning with tasks that return futures is relatively simple if a task depends on some future, it cannot progress before the value is available there is no earlier (inconsistent) value of a shared variable that might be used by accident Dataflow is explicit Operations become non-blocking ⇒ (possibly) easier to exploit parallelism

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 27 / 61 Communication between tasks Aside: Correspondence between monitors and task parallelism

Parent’s observation: a monitor, whose operations are synchronized and therefore can block, can be turned into non-blocking operations with a task queue

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 28 / 61 Both set and get may block, making client wait Lock held an unbounded amount of time, hash computation takes time proportional to the length of key

Communication between tasks Example

class database {
  std::mutex _mutex;
  std::unordered_map<string, string> _map;
public:
  void set(string key, string value) {
    lock_t lock(_mutex);
    _map.emplace(move(key), move(value));
  }

  auto get(const string& key) -> string {
    lock_t lock(_mutex);
    try { return _map.at(key); }
    catch (...) { return string("not found"); }
  }
};

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 29 / 61 Communication between tasks Example

class database{ std::mutex _mutex; std::unordered_map _map; public: void set(string key, string value) { lock_t lock(_mutex); _map.emplace(move(key), move(value)); }

auto get(const string& key) -> string { lock_t lock(_mutex); try { return _map.at(key); } catch (...) { return string("not found"); } } };

Both set and get may block, making client wait Lock held an unbounded amount of time, hash computation takes time proportional to the length of key

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 29 / 61 Communication between tasks Testing blocking database

#include "task-system-utilities.hpp"

<>

int main() { database db; thread t1([&]{db.set("Bergen", "Norway");}), t2([&]{db.set("Turku", "Finland");}), t3([&]{db.set("College Station", "USA");}); logl(db.get("Turku")); logl(db.get("Turku")); t1.join(); t2.join(); t3.join(); } not found Finland

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 30 / 61 Communication between tasks Nonblocking

<>
<>
class database {
  std::unordered_map<string, string> _map; // declaration order matters
  task_system _ts{1};                      // One thread => sequential tasks
public:
  void set(string key, string value) {
    _ts.async([&, key = move(key), value = move(value)] {
      _map.insert_or_assign(move(key), move(value));
    });
  }
  auto get(string key) -> std::future<string> {
    return _ts.async([&, key = move(key)] {
      try { return _map.at(key); }
      catch (...) { return string("not found"); }
    });
  }
  <>
};

Caution! tasks store references to _map, and must not outlive _map. Here OK because _ts will be destructed before _map per C++ destruction order Note that the system is single threaded! Serial/sequential queue (no need for lock on the database).

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 31 / 61 Interactivity for “free” But of course extra overhead queue management constructing , temporary objects

Communication between tasks Set many

void set_many(std::vector<std::pair<string, string>> v) {
  _ts.async([&, v = move(v)]() {
    _map.insert(std::make_move_iterator(v.begin()),
                std::make_move_iterator(v.end()));
  });
}

Now set and get return (practically) immediately Still can be congestion, but work under lock is constant Even potentially expensive operations (set_many) return almost immediately here, possibly must copy the vector v, but time only depends on length of the vector, not whether there are other tasks accessing the database Client can choose to wait an issued task to finish, or do other things instead

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 32 / 61 Communication between tasks Set many

void set_many(std::vector> v) { _ts.async([&, v= move(v)]() { _map.insert(std::make_move_iterator(v.begin()), std::make_move_iterator(v.end())); }); }

Now set and get return (practically) immediately Still can be congestion, but work under lock is constant Even potentially expensive operations (set_many) return almost immediately here, possibly must copy the vector v, but time only depends on length of the vector, not whether there are other tasks accessing the database Client can choose to wait an issued task to finish, or do other things instead Interactivity for “free” But of course extra overhead queue management constructing thunks, temporary objects

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 32 / 61 not found Norway Norway Norway Norway Norway Independent Norway Norway Norway

Communication between tasks Testing non-blocking database

#include "task-system-utilities.hpp" <>

int main() { database db;

thread t1([&db]{ std::vector> res; for(inti=0; i<10; ++i) { sleep_ms(1); res.emplace_back(db.get("Bergen")); } for(auto&r : res) logl(r.get()); });

thread t2([&db]{ for(inti=0; i<100; ++i) { sleep_ms(4); db.set("Bergen", "Independent"); } });

thread t3([&db]{ for(inti=0; i<100; ++i) { sleep_ms(1); db.set("Bergen", "Norway"); } });

t1.join(); t2.join(); t3.join(); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 33 / 61 Communication between tasks Testing non-blocking database

not found #include "task-system-utilities.hpp" Norway <> Norway Norway int main() { Norway database db; Norway Independent thread t1([&db]{ Norway std::vector> res; Norway for(inti=0; i<10; ++i) Norway { sleep_ms(1); res.emplace_back(db.get("Bergen")); } for(auto&r : res) logl(r.get()); });

thread t2([&db]{ for(inti=0; i<100; ++i) { sleep_ms(4); db.set("Bergen", "Independent"); } });

thread t3([&db]{ for(inti=0; i<100; ++i) { sleep_ms(1); db.set("Bergen", "Norway"); } });

t1.join(); t2.join(); t3.join(); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 33 / 61 T1=0.025144, T2=0.02587, TG=98.0445

Communication between tasks Testing set_many

#include "task-system-utilities.hpp" <>

int main() { timer tmg, tml; double time1, time2, timeg; { std::vector> p, q;

for(inti=0; i<100 ’000; ++i) p.emplace_back(std::to_string(i)+"P", std::to_string(i+1)); for(inti=0; i<100 ’000; ++i) q.emplace_back(std::to_string(i)+"Q", std::to_string(i+1)); { database db;

tmg.reset(); tml.reset(); db.set_many(move(p)); time1= tml.elapsed();

db.set_many(move(q)); time2= tml.elapsed(); } timeg= tmg.elapsed(); logl("T1=", time1, ", T2=", time2, ", TG=", timeg); } }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 34 / 61 Communication between tasks Testing set_many

#include "task-system-utilities.hpp" <>

int main() { timer tmg, tml; double time1, time2, timeg; { std::vector> p, q;

for(inti=0; i<100 ’000; ++i) p.emplace_back(std::to_string(i)+"P", std::to_string(i+1)); for(inti=0; i<100 ’000; ++i) q.emplace_back(std::to_string(i)+"Q", std::to_string(i+1)); { database db;

tmg.reset(); tml.reset(); db.set_many(move(p)); T1=0.025144, T2=0.02587, TG=98.0445 time1= tml.elapsed();

db.set_many(move(q)); time2= tml.elapsed(); } timeg= tmg.elapsed(); logl("T1=", time1, ", T2=", time2, ", TG=", timeg); } }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 34 / 61 Communication between tasks Back to communicating between tasks: async that returns a future

The scheme is as follows:

1 async receives a function f 2 constructs a packaged task for f 3 obtains the future of the packaged task 4 pushes the packaged task to the task queue 5 returns the future

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 35 / 61 Communication between tasks Task system 3

class task_system{ <> <> public: <> <>

template <typename F>
auto async(F&& f) {
  using result_type = std::invoke_result_t<F>;
  using packaged_type = std::packaged_task<result_type()>;

  auto p = new packaged_type(forward<F>(f));
  // must use dynamic memory, because packaged_task not copyable
  std::future<result_type> result = p->get_future();

  _q.push([p]{ (*p)(); delete p; }); // delete package after task executed
  return result;
}
};

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 36 / 61 42 Hello, World!

Communication between tasks Test task system 3: ts3-example1.cpp

#include "task-system-utilities.hpp" <> <>

int f() { return 42;}

int main() { task_system ts; autoa= ts.async(f); autob= ts.async([]{ return "Hello, "; }); autoc= ts.async([]{ return "World!"; });

logl(a.get(),"", b.get(), c.get()); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 37 / 61 Communication between tasks Test task system 3: ts3-example1.cpp

#include "task-system-utilities.hpp" <> <>

int f() { return 42;}

int main() { 42 Hello, World! task_system ts; autoa= ts.async(f); autob= ts.async([]{ return "Hello, "; }); autoc= ts.async([]{ return "World!"; });

logl(a.get(),"", b.get(), c.get()); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 37 / 61 Communication between tasks Tasks with arguments

The tasks must be nullary functions This leads to frequent wrapping with lambdas, and syntactic noise int plus(int i, int j) { returni+ j; };

ts.async([]{ return plus(1,2); });

Next: allow functions with arguments the same way as thread, emplace, etc. ts.async(plus,1,2);

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 38 / 61 Communication between tasks Task system 4: async

template <typename F, typename... Args>
auto async(F&& f, Args&&... args) {
  using result_type = decltype(f(forward<Args>(args)...));
  using packaged_type = std::packaged_task<result_type()>;

  auto p = new packaged_type(
    [f = forward<F>(f), tpl = std::make_tuple(std::forward<Args>(args)...)]() mutable {
      return std::apply(f, tpl);
    });

  std::future<result_type> result = p->get_future();
  _q.push([p]{ (*p)(); delete p; });
  return result;
}

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 39 / 61 42 Hello, World!

Communication between tasks Test task system 4: ts4-example1.cpp

#include "task-system-utilities.hpp" <> <>

int f(int n, int m, int k) { returnn+m+ k; }

int main() { task_system ts; autoa= ts.async(f, 10, 20, 12); autob= ts.async([](string s1, string s2) { return s1+ s2; }, "Hello, ", "World!");

logl(a.get(),"", b.get()); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 40 / 61 Communication between tasks Test task system 4: ts4-example1.cpp

#include "task-system-utilities.hpp" <> <>

int f(int n, int m, int k) { returnn+m+ k; }

int main() { 42 Hello, World! task_system ts; autoa= ts.async(f, 10, 20, 12); autob= ts.async([](string s1, string s2) { return s1+ s2; }, "Hello, ", "World!");

logl(a.get(),"", b.get()); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 40 / 61 Deadlock and task systems Outline

1 About performance

2 Task System

3 Communication between tasks

4 Deadlock and task systems

5 Task stealing

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 41 / 61 Deadlock and task systems Tasks, threads, deadlocks

Task system implementations can be prone to deadlock Task systems/thread pools typically do not know when a task is blocked A blocked task occupies a thread, and does not let go With fixed number of threads, running out of threads means deadlock if all threads are blocked to wait for computations that are either blocked or still in the queue

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 42 / 61 2 not found 2003 3001 4001 5003 not found 7001 not found 9001

Deadlock and task systems Example (no deadlock)

#include "task-system-4.hpp" <> task_system ts;

int prime_finder(int a, int b) { std::vector> vf; for(inti=a; i

int main() { for(inti=0; i<10; ++i) { try { logl(prime_finder(i*1000, i*1000+4)); } catch (string e) { logl(e); } } }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 43 / 61 Deadlock and task systems Example (no deadlock)

#include "task-system-4.hpp" <> task_system ts; 2 not found int prime_finder(int a, int b) { 2003 std::vector> vf; 3001 for(inti=a; i

int main() { for(inti=0; i<10; ++i) { try { logl(prime_finder(i*1000, i*1000+4)); } catch (string e) { logl(e); } } }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 43 / 61 Deadlock and task systems Example (yes deadlock)

#include "task-system-4.hpp" <>

task_system ts;

int prime_finder(int a, int b) { std::vector> vf; for(inti=a; i

int main() { std::vector> pf; for(inti=0; i<10; ++i) { pf.emplace_back(ts.async(prime_finder, i*1000, i*1000+4)); } for(auto&f : pf) { try { logl(f.get()); } catch (string e) { logl(e); } } }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 44 / 61 Deadlock and task systems Deadlock with a serial queue

#include "task-system-4.hpp" <>

task_system ts2{2}; task_system ts1{1};

int main() { // OK ts2.async([]{ ts2.async([]{ return0; }).get(); });

// Deadlock ts1.async([]{ ts1.async([]{ return0; }).get(); }); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 45 / 61 Deadlock and task systems Conservative programming advice

A task should not wait/block for another task in the same queue This can be relaxed, but it is difficult to formulate an easy, modular rule

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 46 / 61 Task stealing Outline

1 About performance

2 Task System

3 Communication between tasks

4 Deadlock and task systems

5 Task stealing

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 47 / 61 Task stealing Using several notification queues

Pushing and popping to a single queue causes congestion there will be blocking even if threads are available New idea: one queue for each thread only need to block if there are no tasks an atomic _index variable indicates the current queue/thread; _index is advanced after every enqueue operation

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 48 / 61 Task stealing Task system 5

class task_system {
  const unsigned int _nthreads;
  std::vector<thread> _threads;
  std::vector<notification_queue> _qs;
  std::atomic<unsigned int> _index{0};

  <>
public:
  <>
  <>
  <>
};

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 49 / 61 Task stealing Task system 5: constructor and destructor

task_system(int nthreads=0) : _nthreads(nthreads>0? nthreads : number_of_threads()), _qs(_nthreads) // note: brittle { for(unsigned intn=0; n< _nthreads; ++n) { _threads.emplace_back([this, n]{ run(n); }); } }

Each thread’s run is invoked with a unique index Member initializers are evaluated in the order the members are declared in the class body, hence the comment about brittle code

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 50 / 61 Task stealing Task system 5: run

void run(unsigned int i) {
  while (true) {
    function<void()> f;
    if (!_qs[i].pop(f)) break;
    f();
  }
}

Each thread’s run gets a different index i that identifies the thread’s queue thread i pops from _qs[i]

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 51 / 61 Task stealing Task system 5: async

template <typename F, typename... Args>
auto async(F&& f, Args&&... args) {
  using result_type = decltype(f(forward<Args>(args)...));
  using packaged_type = std::packaged_task<result_type()>;

  auto p = new packaged_type(
    [f = forward<F>(f), tpl = std::make_tuple(std::forward<Args>(args)...)]() mutable {
      return std::apply(f, tpl);
    });
  std::future<result_type> result = p->get_future();

  auto i = _index++;
  _qs[i % _nthreads].push([p]{ (*p)(); delete p; });
  return result;
}

Note: _index overflowing is OK, since we use modulo-arithmetic with unsigned integers
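A small sketch (not from the slides) of why the overflow is harmless: unsigned increment wraps around with well-defined behaviour, so the queue index keeps cycling (for power-of-two thread counts the round-robin is even seamless across the wrap; otherwise there is a single harmless skip).

#include <cassert>
#include <limits>

int main() {
  unsigned int i = std::numeric_limits<unsigned int>::max();
  unsigned int nthreads = 4;
  assert(i % nthreads == 3);
  ++i;                         // well-defined wraparound to 0
  assert(i % nthreads == 0);   // indices keep cycling through the queues
}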

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 52 / 61 BAC

client API unchanged

Task stealing Test task system 5: ts5-example1.cpp

#include "task-system-utilities.hpp" <> <>

int main() { task_system ts; autop= ts.async([](long t) { sleep_ms(t); return "C"; }, 100); ts.async([]{ std::cout << "A"; }); ts.async([]{ std::cout << "B"; }); logl(p.get()); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 53 / 61 Task stealing Test task system 5: ts5-example1.cpp

#include "task-system-utilities.hpp" <> <> BAC

int main() { task_system ts; autop= ts.async([](long t) { sleep_ms(t); return "C"; }, 100); ts.async([]{ std::cout << "A"; }); ts.async([]{ std::cout << "B"; }); logl(p.get()); }

client API unchanged

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 53 / 61 Task stealing Notification queue 3

class notification_queue {
  std::deque<function<void()>> _q;
  std::mutex _mutex;
  std::condition_variable _ready;
  bool _done = false;
public:
  <>
  <>

  void done() {
    { lock_t lock(_mutex); _done = true; }
    _ready.notify_all();
  }

  bool pop(function<void()>& f) {
    lock_t lock(_mutex);
    while (_q.empty() && !_done) _ready.wait(lock);
    if (_q.empty()) return false;
    f = move(_q.front());
    _q.pop_front();
    return true;
  }

  template <typename F>
  void push(F&& f) {
    {
      lock_t lock(_mutex);
      _q.emplace_back(forward<F>(f));
    }
    _ready.notify_one();
  }
};

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 54 / 61 Task stealing Notification queue 3: try_push

template <typename F>
bool try_push(F&& f) {
  {
    lock_t lock(_mutex, std::try_to_lock);
    if (!lock) return false;
    _q.emplace_back(forward<F>(f));
  }
  _ready.notify_one();
  return true;
}

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 55 / 61 Task stealing Notification queue 3: try_pop

bool try_pop(function<void()>& f) {
  lock_t lock(_mutex, std::try_to_lock);
  if (!lock || _q.empty()) return false;
  f = move(_q.front());
  _q.pop_front();
  return true;
}

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 56 / 61 Task stealing Task system 6

class task_system{ <>

<> public: <> <>

<> };

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 57 / 61 Task stealing Task system 6: run

void run(unsigned int i) {
  while (true) {
    function<void()> f;
    for (unsigned int n = 0; n < _nthreads; ++n) {
      if (_qs[(i + n) % _nthreads].try_pop(f)) break;
    }
    if (!f && !_qs[i].pop(f)) break;
    f();
  }
}

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 58 / 61 Task stealing Task system 6: async

template <typename F, typename... Args>
auto async(F&& f, Args&&... args) {
  using result_type = decltype(f(forward<Args>(args)...));
  using packaged_type = std::packaged_task<result_type()>;

  auto p = new packaged_type(
    [f = forward<F>(f), tpl = std::make_tuple(std::forward<Args>(args)...)]() mutable {
      return std::apply(f, tpl);
    });
  std::future<result_type> result = p->get_future();

  const int K = 56;

  auto i = _index++;
  auto pl = [p]{ (*p)(); delete p; };
  for (unsigned int n = 0; n < _nthreads * K; ++n) {
    if (_qs[(i + n) % _nthreads].try_push(pl)) return result;
  }
  _qs[i % _nthreads].push(pl);
  return result;
}

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 59 / 61 ABC

client API unchanged

Task stealing Test task system 6: ts6-example1.cpp

#include "task-system-utilities.hpp" <> <>

int main() { task_system ts; autop= ts.async([](long t) { sleep_ms(t); return "C"; }, 100); ts.async([]{ std::cout << "A"; }); ts.async([]{ std::cout << "B"; }); logl(p.get()); }

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 60 / 61 Task stealing Test task system 6: ts6-example1.cpp

#include "task-system-utilities.hpp" <> <> ABC

int main() { task_system ts; autop= ts.async([](long t) { sleep_ms(t); return "C"; }, 100); ts.async([]{ std::cout << "A"; }); ts.async([]{ std::cout << "B"; }); logl(p.get()); }

client API unchanged

Jaakko Järvi (University of Bergen) Task System Implementations <2018-10-02 Tue> 60 / 61 time 4 268.635 using 1 threads time 5 246.731 using 1 threads time 6 74.2265 using 1 threads -- time 4 453.62 using 2 threads time 5 103.271 using 2 threads time 6 59.1793 using 2 threads -- time 4 563.441 using 4 threads time 5 102.584 using 4 threads time 6 100.315 using 4 threads -- time 4 562.981 using 8 threads time 5 119.062 using 8 threads time 6 116.617 using 8 threads -- time 4 561.763 using 16 threads time 5 112.588 using 16 threads time 6 118.333 using 16 threads -- time 4 568.246 using 32 threads time 5 112.912 using 32 threads time 6 118.695 using 32 threads -- time 4 588.09 using 64 threads time 5 113.778 using 64 threads time 6 124.002 using 64 threads -- time 4 623.733 using 128 threads time 5 121.989 using 128 threads time 6 134.022 using 128 threads --

Task stealing Timings

#include "task-system-utilities.hpp" <> namespace ts4 { <> } namespace ts5 { <> } namespace ts6 { <> } <> std::atomic found=0; int count_primes(long n) { int count=0; while (n-->1){ if (is_prime(n)) ++count; } return count; }

const int ntasks= 40960; const int maxprime= 10; void test(int nthreads) { double time4; timer tmr4; { ts4::task_system t4(nthreads); for(inti=0; i 61 / 61 ts6::task_system t6(nthreads); for(inti=0; i

int main() { for(intn=1; n <= 128; n *=2) test(n); } Task stealing Timings

#include "task-system-utilities.hpp" <> namespace ts4 { <> } namespace ts5 { <> } namespace ts6 { <> } <> std::atomic found=0; int count_primes(long n) { time 4 268.635 using 1 threads int count=0; while (n-->1){ if (is_prime(n)) ++count; } time 5 246.731 using 1 threads return count; time 6 74.2265 using 1 threads } -- time 4 453.62 using 2 threads const int ntasks= 40960; time 5 103.271 using 2 threads const int maxprime= 10; time 6 59.1793 using 2 threads void test(int nthreads) { -- double time4; timer tmr4; time 4 563.441 using 4 threads { time 5 102.584 using 4 threads ts4::task_system t4(nthreads); time 6 100.315 using 4 threads for(inti=0; i 61 / 61 ts6::task_system t6(nthreads); time 4 588.09 using 64 threads for(inti=0; i

Coroutines

Jaakko Järvi

University of Bergen

<2018-11-08 Thu>

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 1 / 66 Outline

1 Coroutines

2 Emulating coroutines

3 Passing data with suspend and resume

4 Async/await (asynchronous coroutines)

5 Promises

6 Asynchronous Iterators and Iterables

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 2 / 66 Coroutines Coroutines in Nutshell

Generalization of subroutines that allows suspending the execution of the subroutine, and resuming a suspended execution local variables and the program counter persist across suspensions Coroutines are in progress simultaneously, but not executed simultaneously A control abstraction for co-operative, or non-preemptive, multitasking Many applications: cooperative tasks, event loops, pipelining, iterators, generators

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 3 / 66 2 3 5 7 11 13 17 19

Coroutines Example (JavaScript)

function* primes() { let primes= []; letc=2; while(true){ let composite= false; for(let p of primes) { if (c%p ==0) { composite= true; break;} } if(!composite) { primes.push(c); yield c; } ++c; } }

letp= primes(); while(true){ letr= p.next(); if (r.value> 20) break; console.log(r.value); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 4 / 66 Coroutines Example (JavaScript)

function* primes() { let primes= []; letc=2; while(true){ let composite= false; for(let p of primes) { if (c%p ==0) { composite= true; break;} } if(!composite) { primes.push(c); yield c; } ++c; } 2 } 3 5 letp= primes(); 7 while(true){ 11 letr= p.next(); 13 if (r.value> 20) break; 17 console.log(r.value); 19 }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 4 / 66 Coroutines History

First introduced by Melvin Conway in -58 (in Assembly, for COBOL compiler) In some early languages Simula, Modula-2, BCPL Then seemingly forgotten for many years Re-emerged recently, today found in many of the relevant languages Coming to C++ Prominently in Go JavaScript Kotlin Scheme (continuations), Haskell, F#, . . .

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 5 / 66 Coroutines Subroutine

Subroutine has a call and a return call pushes a new activation record/frame to stack suspends caller jumps to the beginning of the function return passes return value to caller pops the frame resumes caller’s execution Frames on a stack for most languages Stack based frame allocation even hardware-supported register for top of stack frame allocation/deallocation == modify top-of-stack register

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 6 / 66 Coroutines

Coroutine has a call, suspend, resume, destroy, and return suspend suspend execution, remember the current point, save current frame, transfer execution back to caller resume restore saved frame, continue from point of suspension destroy deallocate saved frame (without resuming execution) Corollary: activation frame lifetimes are not nested Heap allocation Smart compiler may be able to optimize (and use alloca)

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 7 / 66 Coroutines Variations

Symmetric vs. asymmetric Coroutines first class or not Stackful vs. stackless Parallel execution or not

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 8 / 66 Coroutines Symmetric Coroutines

Symmetric coroutines have a single control transfer operator, specifies target yield_to proposed to C++

void producer_body(producer_type::self& self, std::string base, consumer_type& consumer) { std::sort(base.begin(), base.end()); do{ self.yield_to(consumer, base); } while (std::next_permutation(base.begin(), base.end())); }

void consumer_body(consumer_type::self& self, const std::string& value, producer_type& producer) { std::cout << value <<"\n"; while(true){ std::cout << self.yield_to(producer)<<"\n"; } }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 9 / 66 Coroutines Asymmetric Coroutines

Similar to subroutines, in that control transferred back to the caller

typedef coro::generator generator_type;

int range_generator(generator_type::self& self, int min, int max) { while(min< max-1) self.yield(min++); return min; }

Control structure simpler Asymmetric/Symmetric coroutines equally expressive Either can emulate the other Most modern languages implement asymmetric coroutines

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 10 / 66 Coroutines First class coroutines

First class means “behaves like any value” Can be stored in a variable Can be passed as a parameter Can be returned from a function Can be suspended and yielded! Old languages (CLU, Sather) had restrictions Most modern implementations provide first class coroutines JavaScript, Python, C++
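For instance (a small sketch, not from the slides; the helpers naturals, take, and evens are made up for this illustration), a JavaScript generator object can be stored, passed around, and returned while it remains suspended:

function* naturals() { let n = 0; while (true) yield n++; }

// A generator (coroutine) object can be stored in a variable...
let gen = naturals();

// ...passed as a parameter...
function take(iterator, k) {
  let result = [];
  for (let i = 0; i < k; ++i) result.push(iterator.next().value);
  return result;
}

// ...and returned from a function, still suspended.
function evens() {
  function* g() { let n = 0; while (true) { yield n; n += 2; } }
  return g();
}

console.log(take(gen, 3));     // [ 0, 1, 2 ]
console.log(take(gen, 2));     // [ 3, 4 ]  (the same suspended coroutine continues)
console.log(take(evens(), 3)); // [ 0, 2, 4 ]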

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 11 / 66 Coroutines Parallel Execution or Not

A coroutine can run asynchronously with its caller The caller can await its suspension/return Unproblematic in languages with no shared memory between caller and coroutine Go, JavaScript Web worker operations and other asynchronous APIs’ operations

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 12 / 66 Coroutines Stackful vs. stackless

Stackful coroutines can suspend in nested functions When resuming, continue in the nested function Stackless coroutines can only suspend at top level This is a bit of a limitation Yielding from wrapper/helper functions cumbersome, must create a new coroutine layer Python’s, C++(?) coroutines stackless JavaScript has kind of both

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 13 / 66 0 4

Coroutines Example (wrong!)

function* yieldAll(arr) { for(let v of arr) yield v; }

function* f() { yield0; yieldAll([1,2,3]); yield4; }

letp= f(); for(let v of f()) console.log(v);

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 14 / 66 Coroutines Example (wrong!)

function* yieldAll(arr) { for(let v of arr) yield v; }

function* f() { yield0; 0 yieldAll([1,2,3]); 4 yield4; }

letp= f(); for(let v of f()) console.log(v);

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 14 / 66 0 Object [Generator] {} 4

Coroutines Example (wrong also!)

function* yieldAll(arr) { for(let v of arr) yield v; }

function* f() { yield0; yield yieldAll([1,2,3]); yield4; }

letp= f(); for(let v of f()) console.log(v);

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 15 / 66 Coroutines Example (wrong also!)

function* yieldAll(arr) { for(let v of arr) yield v; }

function* f() { yield0; 0 yield yieldAll([1,2,3]); Object [Generator] {} yield4; 4 }

letp= f(); for(let v of f()) console.log(v);

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 15 / 66 0 1 2 3 4

Coroutines Example (OK)

function* yieldAll(arr) { for(let v of arr) yield v; }

function* f() { yield0; yield* yieldAll([1,2,3]); yield4; }

letp= f(); for(let v of f()) console.log(v);

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 16 / 66 Coroutines Example (OK)

function* yieldAll(arr) { for(let v of arr) yield v; }

function* f() { yield0; 0 yield* yieldAll([1,2,3]); 1 yield4; 2 } 3 4 letp= f(); for(let v of f()) console.log(v);

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 16 / 66 0 1 2 3 4

This works because function* defines a generator, which is an iterable

Coroutines Example (OK also)

// function* yieldAll(arr) { // for (let v of arr) yield v; // }

function* f() { yield0; yield*[1,2,3]; yield4; }

letp= f(); for(let v of f()) console.log(v);

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 17 / 66 Coroutines Example (OK also)

// function* yieldAll(arr) { // for (let v of arr) yield v; // }

function* f() { yield0; 0 yield*[1,2,3]; 1 yield4; 2 } 3 4 letp= f(); for(let v of f()) console.log(v);

This works because function* defines a generator, which is an iterable

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 17 / 66 Emulating coroutines Outline

1 Coroutines

2 Emulating coroutines

3 Passing data with suspend and resume

4 Async/await (asynchronous coroutines)

5 Promises

6 Asynchronous Iterators and Iterables

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 18 / 66 Emulating coroutines Generators—Iterators—Coroutines

JavaScript demonstrates that a coroutine implementation can be merely syntactic sugar over objects and normal procedure calls Python, C++ coroutines also mostly library implementations C++ plans compiler/runtime support for performance

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 19 / 66 Emulating coroutines JavaScript Iterator and Iterable Protocols

An object is an iterator if it implements the next() method as follows: no arguments returns an object p such that p.done is a boolean if p.done is false, then p.value is the value returned by the iterator An object is an iterable if it has the computed property [Symbol.iterator], which is a nullary function that returns an iterator
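A minimal hand-written instance of both protocols (a sketch, not from the slides; the range object is invented for the illustration):

// A hand-written iterable whose iterator counts from `from` to `to`.
let range = {
  from: 1, to: 3,
  [Symbol.iterator]() {                 // iterable protocol
    let current = this.from, last = this.to;
    return {                            // iterator protocol
      next() {
        return current <= last
          ? { done: false, value: current++ }
          : { done: true, value: undefined };
      }
    };
  }
};

for (let v of range) console.log(v);    // 1 2 3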

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 20 / 66 Emulating coroutines Example

class PrimeIterator {

constructor() { this._c=2 this._primes= []; }

next() { while(true){ let composite= false; for(let p of this._primes) { if(this._c%p ==0) { composite= true; break;} } if(!composite) { this._primes.push(this._c); return { done: false, value: this._c }; } else ++this._c; } } }

class PrimeIterable { [Symbol.iterator]() { return new PrimeIterator(); } }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 21 / 66 2 3 5 7 ----- 2 3 5 7

Emulating coroutines Testing iterators

letp= new PrimeIterator(); while(true){ letr= p.next(); if (r.value> 10) break; console.log(r.value); } console.log("-----"); for(let v of new PrimeIterable()) { if (v> 10) break; console.log(v); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 22 / 66 Emulating coroutines Testing iterators

letp= new PrimeIterator(); while(true){ letr= p.next(); if (r.value> 10) break; 2 console.log(r.value); 3 } 5 console.log("-----"); 7 for(let v of new PrimeIterable()) { ----- if (v> 10) break; 2 console.log(v); 3 } 5 7

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 22 / 66 2 3 5 7 ----- 2 3 5 7

Emulating coroutines Generator Functions Return Iterators and Iterables

function* primes() { let primes= []; letc=2; while(true){ let composite= false; for(let p of primes) { if (c%p ==0) { composite= true; break;} } if(!composite) { primes.push(c); yield c; } ++c; } }

letp= primes(); while(true){ letr= p.next(); if (r.value> 10) break; console.log(r.value); } console.log("-----"); for(let v of primes()) { if (v> 10) break; console.log(v); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 23 / 66 Emulating coroutines Generator Functions Return Iterators and Iterables

function* primes() { let primes= []; letc=2; while(true){ let composite= false; for(let p of primes) { if (c%p ==0) { composite= true; break;} } if(!composite) { primes.push(c); yield c; } ++c; } } 2 letp= primes(); 3 while(true){ 5 letr= p.next(); 7 if (r.value> 10) break; ----- console.log(r.value); 2 } 3 console.log("-----"); 5 for(let v of primes()) { 7 if (v> 10) break; console.log(v); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 23 / 66 In the iterator object local variables are iterator object’s member variables

Emulating coroutines Coroutine Frame

Where is the coroutine frame stored in JavaScript coroutines?

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 24 / 66 Emulating coroutines Coroutine Frame

Where is the coroutine frame stored in JavaScript coroutines?

In the iterator object local variables are iterator object’s member variables

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 24 / 66 2 3 5 7 ---

Generators are closable iterators by default One can clean up resources at close

Emulating coroutines Closing coroutines

Above, we stopped calling the prime generator, even though it would have continued to yield primes Breaking out from for..of loop, however, closed the generator

letp= primes(); for(let v of p) { if (v> 10) break; console.log(v); } console.log("---"); for(let v of p) { if (v> 20) break; console.log(v); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 25 / 66 Emulating coroutines Closing coroutines

Above, we stopped calling the prime generator, even though it would have continued to yield primes Breaking out from for..of loop, however, closed the generator

letp= primes(); for(let v of p) { if (v> 10) break; 2 console.log(v); 3 } 5 console.log("---"); 7 for(let v of p) { --- if (v> 20) break; console.log(v); }

Generators are closable iterators by default One can clean up resources at close

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 25 / 66 Emulating coroutines Cleaning up at close

Let’s make primes clean up at closing

function* primes() { let primes= []; letc=2; try{ while(true){ let composite= false; for(let p of primes) { if (c%p ==0) { composite= true; break;} } if(!composite) { primes.push(c); yield c; } ++c; } } finally{ console.log("cleaning up"); } }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 26 / 66 2 3 5 7 cleaning up ---

Emulating coroutines Cleaning up at close

<>

letp= primes(); for(let v of p) { if (v> 10) break; console.log(v); } console.log("---"); for(let v of p) { if (v> 20) break; console.log(v); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 27 / 66 Emulating coroutines Cleaning up at close

<> 2 3 letp= primes(); 5 for(let v of p) { 7 if (v> 10) break; cleaning up console.log(v); --- } console.log("---"); for(let v of p) { if (v> 20) break; console.log(v); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 27 / 66 Emulating coroutines Closable iterators

class PrimeIterator { constructor() { this._c=2 this._primes= []; this._closed= false; } next() { if(this._closed) return { done: true, value: undefined};

while(true){ let composite= false; for(let p of this._primes) { if(this._c%p ==0) { composite= true; break;} } if(!composite) { this._primes.push(this._c); return { done: false, value: this._c }; } else ++this._c; } }

return() { console.log("cleaning up"); this._closed= true; return this.next(); } }

class PrimeIterable { [Symbol.iterator]() { return new PrimeIterator(); } }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 28 / 66 2 3 5 7 cleaning up --- 2 3 5 7 11 13 17 19 cleaning up

Emulating coroutines Testing closable iterators

<>

letp= new PrimeIterable(); for(let v of p) { if (v> 10) break; console.log(v); } console.log("---"); for(let v of p) { if (v> 20) break; console.log(v); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 29 / 66 Emulating coroutines Testing closable iterators

<> 2 3 letp= new PrimeIterable(); 5 for(let v of p) { 7 if (v> 10) break; cleaning up console.log(v); --- } 2 console.log("---"); 3 for(let v of p) { 5 if (v> 20) break; 7 console.log(v); 11 } 13 17 19 cleaning up

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 29 / 66 2 3 5 7 ----- 11 13 cleaning up { done: true, value: undefined } { done: true, value: undefined }

Emulating coroutines Closing closable iterators directly

letp= new PrimeIterator(); let r; while(true){ r= p.next(); if (r.value> 10) break; console.log(r.value); } console.log("-----"); console.log(r.value); r= p.next(); console.log(r.value); r= p.return(); console.log(r); r= p.next(); console.log(r);

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 30 / 66 Emulating coroutines Closing closable iterators directly

letp= new PrimeIterator(); 2 let r; 3 while(true){ 5 r= p.next(); 7 if (r.value> 10) break; ----- console.log(r.value); 11 } 13 console.log("-----"); cleaning up console.log(r.value); { done: true, value: undefined } r= p.next(); console.log(r.value); { done: true, value: undefined } r= p.return(); console.log(r); r= p.next(); console.log(r);

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 30 / 66 Passing data with suspend and resume Outline

1 Coroutines

2 Emulating coroutines

3 Passing data with suspend and resume

4 Async/await (asynchronous coroutines)

5 Promises

6 Asynchronous Iterators and Iterables

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 31 / 66 But yield e also recieves data when coroutine resumed

Passing data with suspend and resume Suspending with a value

yield communicates data from a coroutine to its caller when the coroutine is suspended function* counter() { letc=1; while(true) yieldc++; }

letp= counter(); let sum= p.next().value+ p.next().value; console.log(sum);

3

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 32 / 66 Passing data with suspend and resume Suspending with a value

yield communicates data from a coroutine to its caller when the coroutine is suspended function* counter() { letc=1; while(true) yieldc++; }

letp= counter(); let sum= p.next().value+ p.next().value; console.log(sum);

3

But yield e also receives data when the coroutine is resumed

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 32 / 66 Passing data with suspend and resume Resuming with a value

yield e is an expression; its value is the value sent to coroutine when it is resumed

function* skipcounter() { letc=0; while(true) c= yield ++c; }

letp= skipcounter(); let s1= p.next().value; let s2= p.next(s1+100).value; let s3= p.next(s2).value; console.log(s1, s2, s3);

1 102 103

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 33 / 66 error: malfunction. resetting... 1 102 1 2

Passing data with suspend and resume Resuming with an exception

yield e can resume with an exception, sent from the caller

function* skipcounter() { letc=0; while(true){ try{ c= yield ++c; } catch (e) { console.log("error: "+e+ ". resetting..."); c=0; } } } letp= skipcounter(); let s1= p.next().value; let s2= p.next(s1+100).value; let s3= p.throw("malfunction").value; let s4= p.next(s3).value; console.log(s1, s2, s3, s4);

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 34 / 66 Passing data with suspend and resume Resuming with an exception

yield e can resume with an exception, sent from the caller

function* skipcounter() { letc=0; while(true){ try{ c= yield ++c; } catch (e) { console.log("error: "+e+ ". resetting..."); c=0; } error: malfunction. resetting... } 1 102 1 2 } letp= skipcounter(); let s1= p.next().value; let s2= p.next(s1+100).value; let s3= p.throw("malfunction").value; let s4= p.next(s3).value; console.log(s1, s2, s3, s4);

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 34 / 66 Async/await (asynchronous coroutines) Outline

1 Coroutines

2 Emulating coroutines

3 Passing data with suspend and resume

4 Async/await (asynchronous coroutines)

5 Promises

6 Asynchronous Iterators and Iterables

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 35 / 66 Async/await (asynchronous coroutines) Async/await

Many languages offer ways of running coroutines concurrently with their caller This is perhaps not in line with the original notion of a coroutine, but the term has expanded to cover such abstractions Note! If coroutines are run in parallel and also communicate with shared memory, we have all the usual problems (synchronization needed) Common primitives are async functions/tasks await statements Found in JavaScript, Python, C#, C++(?), . . .

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 36 / 66 Even though sendServerRequest is asynchronous (it returns a promise), the structure of the code is similar to if it was a regular function call

when called, an async function runs until await (or return), then returns a promise In case of await, registers the rest of the function body as a continuation for the awaited promise

Async/await (asynchronous coroutines) Async/await in JavaScript

A function defined as async can contain await statements await e evaluates e to a promise, and waits until that promise is resolved the resolved promise’s value is the value of the expression await e a rejected promise turns into an exception

async function client(request) { let result= await sendServerRequest(request); return "Result="+ result; }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 37 / 66 when called, an async function runs until await (or return), then returns a promise In case of await, registers the rest of the function body as a continuation for the awaited promise

Async/await (asynchronous coroutines) Async/await in JavaScript

A function defined as async can contain await statements await e evaluates e to a promise, and waits until that promise is resolved the resolved promise’s value is the value of the expression await e a rejected promise turns into an exception

async function client(request) { let result= await sendServerRequest(request); return "Result="+ result; }

Even though sendServerRequest is asynchronous (it returns a promise), the structure of the code is similar to if it was a regular function call

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 37 / 66 Async/await (asynchronous coroutines) Async/await in JavaScript

A function defined as async can contain await statements await e evaluates e to a promise, and waits until that promise is resolved the resolved promise’s value is the value of the expression await e a rejected promise turns into an exception

async function client(request) { let result= await sendServerRequest(request); return "Result="+ result; }

Even though sendServerRequest is asynchronous (it returns a promise), the structure of the code is similar to if it was a regular function call

when called, an async function runs until await (or return), then returns a promise In case of await, registers the rest of the function body as a continuation for the awaited promise
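A small experiment (not from the slides) makes this visible: the call returns a pending promise immediately, and the code after await runs later as a continuation:

async function client() {
  console.log("before await");
  let v = await Promise.resolve(42);    // suspension point
  console.log("after await (continuation)");
  return v;
}

let p = client();                       // runs until the first await, then returns
console.log("client() returned", p);    // a still-pending Promise
p.then(v => console.log("resolved with", v));   // logged after the continuation runs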

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 37 / 66 The above is essentially synchronous call to asynchronous operations

Async/await (asynchronous coroutines) Async/await, many suspend/resume points

async function client(req1, req2, req3) { let r1= await sendServerRequest(req1); let r2= await sendServerRequest(req2); let r3= await sendServerRequest(req3); return "Results="+ combine(r1, r2, r3); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 38 / 66 Async/await (asynchronous coroutines) Async/await, many suspend/resume points

async function client(req1, req2, req3) { let r1= await sendServerRequest(req1); let r2= await sendServerRequest(req2); let r3= await sendServerRequest(req3); return "Results="+ combine(r1, r2, r3); }

The above is essentially synchronous call to asynchronous operations

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 38 / 66 Async/await (asynchronous coroutines) Async/await, concurrent execution

Several asynchronous co-routines can be launched first, awaited later

async function client(req1, req2, req3) { let p1= sendServerRequest(req1); let p2= sendServerRequest(req2); let p3= sendServerRequest(req3); // do other stuff let r1= await p1; let r2= await p2; let r3= await p3; return "Results="+ combine(r1, r2, r3); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 39 / 66 Async/await (asynchronous coroutines) Async/await, concurrent execution

One can wait for many promises simultaneously

async function client(req1, req2, req3) { let p1= sendServerRequest(req1); let p2= sendServerRequest(req2); let p3= sendServerRequest(req3); // do other stuff let [r1, r2, r3]= await Promise.all([p1, p2, p3]); return "Results="+ combine(r1, r2, r3); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 40 / 66 Async/await (asynchronous coroutines) Async/await, concurrent execution

One can wait for the first out of many promises

async function client(req1, req2, req3) { let p1= sendServerRequest(req1); let p2= sendServerRequest(req2); let p3= sendServerRequest(req3); // do other stuff letr= await Promise.race([p1, p2, p3]); return "First result="+ r; }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 41 / 66 Async/await (asynchronous coroutines) Intermediate Summary

function* and yield are used to define synchronous coroutines Under the covers, they are just generator objects, where the object instance variables are the local variables of the coroutine frame async and await are used to define asynchronous coroutines Under the covers, async functions are functions that return promises, and await expressions register continuations for promises What then are promises in JavaScript?

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 42 / 66 Promises Callbacks

JavaScript is single-threaded One event loop schedules tasks (JavaScript functions) for sequential execution from a task queue Each task runs to completion before giving control back to the event loop Each task has been put into the task queue as a result of some event Programmer can register event handlers (JavaScript functions) for various events, e.g.:

button.addEventListener("click", () => { alert("button clicked"); });

Environment (Browser, or nodejs) has APIs for launching asynchronous operations, and registering handlers/callbacks for when they finish networking file IO animation timeouts GUI (users perform asynchronous operations)

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 43 / 66 Promises Example asynchronous APIs

let input= document.getElementById(’input1’); let handler= null;

input.addEventListener("keyup", (event) =>{ clearTimeout(handler);

let up=() => { input.value= input.value.toUpperCase(); };

handler= setTimeout(up, 1000); });

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 44 / 66 Promises Callback hell

Callbacks are a seemingly simple way to program Every asynchronous function takes a callback, a continuation function, to be called when the operation finishes This, however, composes poorly: the result is, in general, an unstructured mess

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 45 / 66 Promises Example: Callbacks

Let’s package the stable reading functionality into a function

function readStable(input, duration, cont) {

let handler= setTimeout(() => { cleanup(); cont(input.value); }, duration); input.addEventListener("keyup", keyup);

function keyup(event) { clearTimeout(handler); handler= setTimeout(() => { cleanup(); cont(input.value); }, duration); }; function cleanup() { input.removeEventListener("keyup", keyup); } }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 46 / 66 Promises Example: Callbacks

Another asynchronous function: lookup user’s account

function findUser(key, cont) { // emulate asynchronous request setTimeout(() =>{ switch (key) { case "EDGAR": cont("Dijkstra"); return; case "TONY": cont("Hoare"); return; default: cont(""); return; } }); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 47 / 66 Promises Example: Callbacks

Composing with callbacks

function showUserData() { let input= document.getElementById(’input1’); let output= document.getElementById(’output1’); readStable(input, 2000, (key) =>{ let uKey= key.toUpperCase(); findUser(uKey, (v) => { output.value= v; }); }); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 48 / 66 Promises Promises

A promise represents the eventual result of an asynchronous operation. It is a placeholder for either an eventual value, which materializes if the operation succeeds, or an error, which materializes if the operation fails Promises make executing, composing, and managing asynchronous operations much simpler than programming with callbacks and events directly Corresponds roughly to C++ futures (except that promises are, for now, more composable)

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 49 / 66 Promises Promise states

A promise can be in one of three states: Pending promise has no value yet Fulfilled the asynchronous operation has completed, and the promise has a value (that will not change) Rejected the asynchronous operation failed, and the promise will never be fulfilled. An error value indicates the reason for failure. Valid state transitions: Pending → Fulfilled, Pending → Rejected
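A small sketch (not in the slides) of these states in action, using the Promise constructor of the next slide: both promises start out pending and each settles exactly once.

// Starts pending, transitions to fulfilled after one second.
let fulfilling = new Promise(resolve => setTimeout(() => resolve("done"), 1000));

// Starts pending, transitions to rejected after one second.
let rejecting = new Promise((resolve, reject) =>
  setTimeout(() => reject(new Error("failed")), 1000));

fulfilling.then(v => console.log("fulfilled with", v));
rejecting.catch(e => console.log("rejected with", e.message));

// A settled promise never changes its value again:
fulfilling.then(() => console.log("a second observer sees the same result"));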

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 50 / 66 Promises Constructing a Promise

Promise constructor’s argument is a function (resolve, reject) => { ... }

The constructor invokes the function, binding resolve to a function that resolves the promise and reject to a function that rejects the promise Example: a promise that immediately resolves to 1 new Promise((resolve, reject) => { resolve(1); });

Example: a promise that immediately rejects, with "error" new Promise((resolve, reject) => { reject("error"); });

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 51 / 66 Promises Example: Promises

function readStable(input, duration) { return new Promise((resolve, reject) =>{

let handler= setTimeout(() => resolve(input.value), duration);

input.onkeyup= (event) =>{ clearTimeout(handler); handler= setTimeout(() => resolve(input.value), duration); };

}); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 52 / 66 Promises Example: Promises

function findUser(key) { return new Promise((resolve, reject) =>{ setTimeout(() =>{ switch (key) { case "EDGAR": resolve("Dijkstra"); return; case "TONY": resolve("Hoare"); return; default: reject(key); return; } }); }); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 53 / 66 Promises Example: Promises

function showUserData() { let input= document.getElementById(’input1’); let output= document.getElementById(’output1’); readStable(input, 2000) .then(key => key.toUpperCase()) .then(uKey => findUser(uKey)) .then(v => { output.value= v; }) .catch(e => { output.value= "No user "+ e; }); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 54 / 66 Promises Example: Promises (variation 2)

function showUserData() { let input= document.getElementById(’input1’); let output= document.getElementById(’output1’); readStable(input, 2000) .then(key => key.toUpperCase()) .then(findUser) .then(v => { output.value= v; }) .catch(e => { output.value= "No user "+ e; }); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 55 / 66 Promises Example: Promises (variation 3)

function showUserData() { let input= document.getElementById(’input1’); let output= document.getElementById(’output1’); readStable(input, 2000)

.then(key => findUser(key.toUpperCase())) .then(v => { output.value= v; }) .catch(e => { output.value= "No user "+ e; }); }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 56 / 66 Promises Example: async/await

async function showUserData() { let input= document.getElementById(’input1’); let output= document.getElementById(’output1’);

try{ let key= await readStable(input, 2000); let uKey= key.toUpperCase() letv= await findUser(uKey); output.value= v; } catch (e) { output.value= "No user "+ e; } }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 57 / 66 Promises Translate async/await to promises and coroutines

function asyncInc(i) { return Promise.resolve(i+1); }

async function countToThree() { leti=0; i= await asyncInc(i); i= await asyncInc(i); i= await asyncInc(i); return i; } 3 countToThree().then(v => console.log(v));

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 58 / 66 Promises Translation

function _countToThree() { return spawn(function*(){ leti=0; i= yield asyncInc(i); i= yield asyncInc(i); i= yield asyncInc(i); return i; }); }

function spawn(generator) { return new Promise(function(resolve, reject) { let iter= generator(); function step(nextf) { let r; try{r= nextf(); } catch(e) { reject(e); return;} if (r.done) { resolve(r.value); return;} Promise.resolve(r.value).then(v => (step(() => iter.next(v))), e => (step(() => iter.throw(e)))); }; step(() => iter.next()); }); } 3 _countToThree().then((v) => console.log(v));

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 59 / 66 Asynchronous Iterators and Iterables Outline

1 Coroutines

2 Emulating coroutines

3 Passing data with suspend and resume

4 Async/await (asynchronous coroutines)

5 Promises

6 Asynchronous Iterators and Iterables

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 60 / 66 Asynchronous Iterators and Iterables Asynchronous Iterators and Iterables

To further hide the distinction between synchronous and asynchronous functions from the programmer, JavaScript has protocols for asynchronous iterables and iterators They are not merely synchronous iterables/iterators of promises: the number of elements the iterable has may depend on the asynchronous computations that produce the elements The asynchronous iterator protocol is the same as the synchronous iterator protocol, except that the next method returns a promise, and can be async The asynchronous iterable protocol requires that an object has a method [Symbol.asyncIterator]() that returns an object that conforms to the asynchronous iterator protocol

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 61 / 66 Asynchronous Iterators and Iterables Example

function sleep(time, v) { return new Promise((resolve) => setTimeout(() => resolve(v), time)); }

class TickAsyncIterator {

constructor(time) { this._time= time; this._ticks=0; this._done= false; }

next() { if(this._done) return { done: true}; return sleep(this._time, ++this._ticks).then(v => ({ done: false, value: v })); };

return() { this._done= true; return this.next(); } }

class TickAsyncIterable { constructor(time) { this._time= time; } [Symbol.asyncIterator]() { return new TickAsyncIterator(this._time); } }

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 62 / 66 —

1 1008 2 1011 3 1012

the iterator does not (necessarily) await the promise it yields

Asynchronous Iterators and Iterables Testing asynchronous iterators

letp= new TickAsyncIterator(1000); letd= Date.now(); p.next().then(v => console.log(v.value, Date.now()-d)); p.next().then(v => console.log(v.value, Date.now()-d)); p.next().then(v => console.log(v.value, Date.now()-d));

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 63 / 66 Asynchronous Iterators and Iterables Testing asynchronous iterators

letp= new TickAsyncIterator(1000); letd= Date.now(); p.next().then(v => console.log(v.value, Date.now()-d)); p.next().then(v => console.log(v.value, Date.now()-d)); p.next().then(v => console.log(v.value, Date.now()-d));

1 1008 2 1011 3 1012

the iterator does not (necessarily) await the promise it yields

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 63 / 66 Asynchronous Iterators and Iterables Example

class TickAsyncIterator {

constructor(time) { this._time= time; this._ticks=0; this._done= false; }

async next() { if(this._done) return { done: true}; letv= await sleep(this._time, ++this._ticks); return { done: false, value: v }; };

return() { this._done= true; return this.next(); } }

class TickAsyncIterable { constructor(time) { this._time= time; } [Symbol.asyncIterator]() { return new TickAsyncIterator(this._time); } }

Here, await guarantees that the yielded promise is resolved

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 64 / 66 Asynchronous Iterators and Iterables Testing asynchronous iterators that await before yielding

async function f() { letp= new TickAsyncIterator(1000); letd= Date.now(); await p.next().then(v => console.log(v.value, Date.now()-d)); await p.next().then(v => console.log(v.value, Date.now()-d)); await p.next().then(v => console.log(v.value, Date.now()-d)); }; f();

1 1003 2 2013 3 3018

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 65 / 66 Asynchronous Iterators and Iterables for..await..of

Asynchronous iterables (will be) integrated with the rest of the language With generators (yielding promises), with loops

function mkDeferred() { let rr, jj; letp= new Promise((r, j) => { rr= r; jj= j; }); p.resolve= rr; p.reject= jj; return p; }

let mpos= mkDeferred(); document.addEventListener("mousemove", evt => { mpos.resolve(evt); mpos= mkDeferred(); });

async function* mouseMoves() { while(true){ let evt= await mpos; yield "("+ evt.clientX+ ","+ evt.clientY+ ")"; } }

(async function() { for await (const m of mouseMoves()) console.log(m); })();

Jaakko Järvi (University of Bergen) Coroutines <2018-11-08 Thu> 66 / 66 Taste of Distributed Systems

Jaakko Järvi

University of Bergen

<2018-11-20 Tue>

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 1 / 31 Outline

1 Introduction

2 Computational Model

3 Consensus in Synchronous Distributed Systems

4 Time in distributed systems

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 2 / 31 Introduction Outline

1 Introduction

2 Computational Model

3 Consensus in Synchronous Distributed Systems

4 Time in distributed systems

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 3 / 31 Important and complex topic ignored: Next: a small taste

Introduction Class thus far

Programs run on one computer Centrally coordinated Processes are assumed not to fail Message delivery is assumed not to fail

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 4 / 31 Introduction Class thus far

Programs run on one computer Centrally coordinated Processes are assumed not to fail Message delivery is assumed not to fail

Important and complex topic ignored: distributed computing Next: a small taste

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 4 / 31 Programming distributed systems is difficult Tricky algorithms how to get a group of possibly faulty independent agents to converge to a desired result Tricky proofs of those algorithms

Introduction Distributed Systems

Composed of independently executing activities Typically in different computers No central authority, no central clock Loosely coupled Connections not fixed Message delivery may fail, varying latencies Agents can come and go Uncertainty Agents can fail, behave erratically, even maliciously

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 5 / 31 Introduction Distributed Systems

Composed of independently executing activities Typically in different computers No central authority, no central clock Loosely coupled Connections not fixed Message delivery may fail, varying latencies Agents can come and go Uncertainty Agents can fail, behave erratically, even maliciously

Programming distributed systems is difficult Tricky algorithms how to get a group of possibly faulty independent agents to converge to a desired result Tricky proofs of those algorithms

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 5 / 31 Introduction Distributed Systems ubiquitous

Replicated Data Bases How is Facebook’s data stored? Software fault-tolerance Communication networks Mobile ad-hoc networks Grid computing Blockchain Robot swarms

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 6 / 31 Many are some forms of consensus problems

Introduction Typical Problems

Distributed graph algorithms Leader election Mutual exclusion Consistency Think of the “Coordinated Attack Problem” Notions of time and causality

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 7 / 31 Introduction Typical Problems

Distributed graph algorithms Leader election Mutual exclusion Consistency Think of the “Coordinated Attack Problem” Notions of time and causality

Many are some forms of consensus problems

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 7 / 31 Introduction About Consensus in Asynchronous Systems

FLP Theorem: “Distributed Consensus is Impossible with One Faulty Process” It is possible that all processes stay undecided forever Randomized algorithms can make the probability of non-decision near 0, for long enough runs Programming without consensus is tricky “Conflict-free replicated data types” Concurrently updatable objects that eventually converge to a consensus state if all updates are performed by all replicas Updates are monotonic operations, merges are associative and commutative
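As a concrete illustration (a sketch, not from the slides), consider a grow-only counter: each replica increments only its own slot, and merging takes element-wise maxima, an associative, commutative, and idempotent operation, so replicas that have exchanged their states agree on the value.

// A grow-only counter (G-Counter) sketch for n replicas.
class GCounter {
  constructor(n, id) { this.id = id; this.v = new Array(n).fill(0); }
  increment() { this.v[this.id] += 1; }                    // monotonic update
  value()     { return this.v.reduce((a, b) => a + b, 0); }
  merge(other) {                                           // associative, commutative, idempotent
    this.v = this.v.map((x, i) => Math.max(x, other.v[i]));
  }
}

let a = new GCounter(2, 0), b = new GCounter(2, 1);
a.increment(); a.increment(); b.increment();   // concurrent updates
a.merge(b); b.merge(a);                        // exchange state, in any order
console.log(a.value(), b.value());             // 3 3 -- the replicas agree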

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 8 / 31 Goal often self-stabilization distributed system recovers from failures that cause corruption of state

Introduction Failures

crash faulty processor stops Byzantine faulty processor can do anything, even be adversarial

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 9 / 31 Introduction Failures

crash faulty processor stops Byzantine faulty processor can do anything, even be adversarial

Goal often self-stabilization distributed system recovers from failures that cause corruption of state

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 9 / 31 Computational Model Outline

1 Introduction

2 Computational Model

3 Consensus in Synchronous Distributed Systems

4 Time in distributed systems

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 10 / 31 Computational Model Message Passing

p0, p1, ..., pn−1 processors bidirectional channels between processors, labelled in each processor processors may not know the other end of a channel Processor’s state is its local variables and the contents of its incoming message buffer Configuration/snapshot of the system Each processor’s state Messages in flight (outgoing buffers)

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 11 / 31 Execution configuration event configuration event configuration ... → → → →

Computational Model Abstraction of Events in Distributed System

Delivery Event Move message from out-buffer to in-buffer, making it available to receiver Computation Event State transition that Handles all incoming messages Possibly modifies local variables Ends with empty in-buffer, and new messages in out-buffers

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 12 / 31 Computational Model Abstraction of Events in Distributed System

Delivery Event Move message from out-buffer to in-buffer, making it available to receiver Computation Event State transition that Handles all incoming messages Possibly modifies local variables Ends with empty in-buffer, and new messages in out-buffers

Execution: configuration → event → configuration → event → configuration → ...

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 12 / 31 Computational Model Simple example: Flooding

Processor’s local state: color (black/white)

Initially, p0 is black and p0’s out-buffers contain the flood message B Transition for all pi: if message B is in the in-buffer and color is white, change color to black and send B
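A toy, single-program simulation of flooding under this model (a sketch; the example graph, the folding of delivery and computation events into sweeps, and all names are assumptions made for the illustration):

// Adjacency lists of an example graph; processor 0 starts black.
let channels = [[1, 2], [0, 3], [0, 3], [1, 2]];
let color = ["black", "white", "white", "white"];
let inbuf = channels.map(() => []);                 // incoming message buffers

// Initially p0's out-buffers contain the flood message B.
channels[0].forEach(j => inbuf[j].push("B"));

let progressed = true;
while (progressed) {
  progressed = false;
  for (let i = 0; i < channels.length; ++i) {       // computation event at p_i
    const msgs = inbuf[i]; inbuf[i] = [];           // the event empties the in-buffer
    if (msgs.includes("B") && color[i] === "white") {
      color[i] = "black";
      channels[i].forEach(j => inbuf[j].push("B")); // send B on every channel
      progressed = true;
    }
  }
}
console.log(color);   // [ 'black', 'black', 'black', 'black' ]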

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 13 / 31 Computational Model Nondeterminism

Several possible executions (sequences of events) possible, depending on delays (order) in message delivery

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 14 / 31 Computational Model Complexity measures

Message complexity maximum number of messages sent in any (admissible) execution size of messages Time complexity maximum “time” until termination Set an arbitrary upper limit (1) on any message delivery delay this bounds the worst-case message delivery delay (by 1), but allows arbitrary interleavings

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 15 / 31 Computational Model Flooding Algorithms Complexity

Terminating states: color is black Message complexity 2m, where m is the number of channels One message over each channel (edge) in each direction Time complexity diameter + 1 Black color from node a reaches node b through the shortest path between a and b; the graph’s diameter is the greatest distance between any two nodes

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 16 / 31 Consensus in Synchronous Distributed Systems Outline

1 Introduction

2 Computational Model

3 Consensus in Synchronous Distributed Systems

4 Time in distributed systems

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 17 / 31 Consensus in Synchronous Distributed Systems Synchronous Distributed Systems

Execution is an infinite sequence of rounds On each round: perform all delivery events: move all sent messages into in-buffers of their receivers

perform a computation event on each processor pi

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 18 / 31 Consensus in Synchronous Distributed Systems Consensus Results in a Synchronous System

                       crash failures    Byzantine failures
number of rounds       f + 1             f + 1
number of processors   f + 1             3f + 1
message size           polynomial        polynomial

at most f faulty processors the above are tight bounds

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 19 / 31 Consensus in Synchronous Distributed Systems Consensus algorithm for Crash failures

Example: each processor holds a value vi, agree on the minimum f is the maximum number of faulty processors tolerated Each processor’s transition function:

vsent = false;
for (round = 1 to f+1) {
  if (!vsent) { send v to all; vsent = true; }
  receive all vi   // from the non-faulty ones
  if (some vi ...
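The transition function above is cut off; a complete sketch of one common form of this min-consensus algorithm, simulated round by round with crashes modeled crudely as silence, might look as follows (all names and the simulation structure are assumptions for illustration, not the slides' code):

// Simulated synchronous rounds: n processors, at most f crash failures.
function minConsensus(values, f, crashed = new Set()) {
  const n = values.length;
  let v = values.slice();                      // each processor's current estimate
  let vsent = new Array(n).fill(false);

  for (let round = 1; round <= f + 1; ++round) {
    // Delivery: values broadcast this round (here a crashed processor is simply silent;
    // a real crash can also happen mid-broadcast, which is why f + 1 rounds are needed).
    let sent = [];
    for (let i = 0; i < n; ++i) {
      if (!crashed.has(i) && !vsent[i]) { sent.push(v[i]); vsent[i] = true; }
    }
    // Computation: adopt any smaller value seen; a changed value is re-sent next round.
    for (let i = 0; i < n; ++i) {
      const m = Math.min(v[i], ...sent);
      if (m < v[i]) { v[i] = m; vsent[i] = false; }
    }
  }
  return v;   // after f + 1 rounds, all non-faulty processors hold the same minimum
}

console.log(minConsensus([5, 3, 9, 7], 1, new Set([1])));  // [ 5, 3, 5, 5 ]: non-faulty agree on 5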

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 20 / 31 Time in distributed systems Outline

1 Introduction

2 Computational Model

3 Consensus in Synchronous Distributed Systems

4 Time in distributed systems

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 21 / 31 Time in distributed systems Time in distributed systems

Our perception of time is that all events are totally ordered by when they occur But time is relative noticeably so in a distributed system, where different processors’ “clocks” are not in sync Leslie Lamport’s seminal paper (78): Time, Clocks, and the Ordering of Events in a Distributed System

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 22 / 31 irreflexive transitive antisymmetric (implied by the above two)

Time in distributed systems Happens Before Relation

a → b denotes that an event a happened before b 1 if a and b occur on the same processor and a precedes b, then a → b 2 if a is the sending of message m and b the receiving of m, then a → b 3 if a → b and b → c, then a → c 4 Happens before is a strict partial order

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 23 / 31 Time in distributed systems Happens Before Relation

a → b denotes that an event a happened before b 1 if a and b occur on the same processor and a precedes b, then a → b 2 if a is the sending of message m and b the receiving of m, then a → b 3 if a → b and b → c, then a → c 4 Happens before is a strict partial order irreflexive transitive antisymmetric (implied by the above two)

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 23 / 31 Time in distributed systems Concurrent events

If a ↛ b and b ↛ a, then a and b are concurrent: a ∥ b

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 24 / 31 Time in distributed systems Logical clocks

For each process Pi, the logical clock is a function Ci that assigns a number Ci(a) to any event a in the process The entire system of logical clocks is a function C such that C(a) = Ci(a) if a is an event in Pi C satisfies the clock condition: for all events a and b, if a → b then C(a) < C(b) If C should define a total ordering, use process ids to order concurrent events

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 25 / 31 Note, happens before is a partial order but clock values (integers) are a total order a b C(a) < C(b) C(→a) <⇒C(b) a b 6⇒ →

Time in distributed systems Logical Timestamps Algorithm

Each process keeps a counter ci , initialized to 0

Timestamp every message sent by Pi with ci At every computational event a, increment ci to be greater than its current value and all the timestamps received in this event

C(a) is then the value of ci

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 26 / 31 Time in distributed systems Logical Timestamps Algorithm

Each process keeps a counter ci , initialized to 0

Timestamp every message sent by Pi with ci At every computational event a, increment ci to be greater than its current value and all the timestamps received in this event

C(a) is then the value of ci

Note, happens before is a partial order but clock values (integers) are a total order a → b ⇒ C(a) < C(b) C(a) < C(b) ⇏ a → b
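A small sketch of the timestamping rule in code (a common formulation; whether a send also increments the counter is a minor variation, and the process and message shapes are illustrative assumptions):

// Lamport logical clocks: one counter per process.
class LamportProcess {
  constructor(id) { this.id = id; this.c = 0; }
  localEvent() { this.c += 1; return this.c; }          // C(a) for a local event a
  send()       { this.c += 1; return { ts: this.c }; }  // timestamp the outgoing message
  receive(msg) {                                        // jump past the sender's stamp
    this.c = Math.max(this.c, msg.ts) + 1;
    return this.c;
  }
}

let p1 = new LamportProcess(1), p2 = new LamportProcess(2);
let a = p1.localEvent();    // C(a) = 1
let m = p1.send();          // message stamped 2
let b = p2.receive(m);      // C(b) = 3 > 2, so the clock condition holds
console.log(a, m.ts, b);    // 1 2 3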

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 26 / 31 Time in distributed systems Vector Clocks

For a vector clock V we want a → b if and only if V(a) < V(b)
Each process pi keeps an n-vector v^i (one element for each process)
v^i_j is pi’s estimate of how many steps pj has taken
Vector operations:
v = w iff v_i = w_i for all i
v ≤ w iff v_i ≤ w_i for all i
v < w iff v ≤ w and v ≠ w
v ∥ w iff v ≰ w and w ≰ v
Examples: (2, 1, 3) = (2, 1, 3), (2, 1, 3) ≤ (3, 1, 3), (2, 1, 3) < (3, 1, 4), (2, 1, 3) ∥ (2, 0, 5)

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 27 / 31 Time in distributed systems Vector Timestamps Algorithm

Initialize all v^i to the 0 vector.
With every message, pi sends its vector v^i.
At every computational event, pi increments v^i_i by one.
On receiving a message with vector t, pi updates the components of its clock vector other than v^i_i so that for all j, v^i_j = max(t_j, v^i_j).
For an event a at pi, V(a) = v^i at the end of a
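The same, sketched for vector timestamps (again illustrative; the comparison helper lessThan is an assumption, not the slides' notation):

// Vector clocks for n processes.
class VectorClock {
  constructor(n, id) { this.id = id; this.v = new Array(n).fill(0); }
  step() { this.v[this.id] += 1; return this.v.slice(); }          // local event or send
  receive(t) {                                                     // merge the sender's vector t
    this.v = this.v.map((x, j) => j === this.id ? x + 1 : Math.max(x, t[j]));
    return this.v.slice();
  }
}

// V(a) < V(b) iff a happens before b; incomparable vectors mean concurrent events.
const lessThan = (a, b) =>
  a.every((x, i) => x <= b[i]) && a.some((x, i) => x < b[i]);

let p0 = new VectorClock(2, 0), p1 = new VectorClock(2, 1);
let va = p0.step();                 // event a at p0: [1, 0]
let vb = p1.step();                 // concurrent event b at p1: [0, 1]
let vc = p1.receive(p0.step());     // p0 sends [2, 0]; event c at p1: [2, 2]
console.log(lessThan(va, vc));                        // true: a happens before c
console.log(lessThan(va, vb) || lessThan(vb, va));    // false: a and b are concurrent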

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 28 / 31 Time in distributed systems Correctness

1 a → b implies V(a) < V(b)
1 if a and b are on processor i, v^i_i increases on every step
2 if a on pi sends m which b on pj receives, pj updates v^j so that V(a) ≤ V(b); the estimate v^i_j is never an overestimate of v^j_j; since b increases v^j_j, V(a) < V(b)
3 if ∃ c s.t. a → c and c → b: by induction from 1. and 2., and transitivity of <
2 V(a) < V(b) implies a → b; i.e. a ↛ b implies V(a) ≮ V(b)
Assume a occurs at pi, b at pj, and a ↛ b. Let k = V(a)_i.
Since a ↛ b there is no chain of messages from pi to pj originating on pi’s k-th step or later and ending at pj before b
Therefore V(b)_i < k. Therefore V(a) ≮ V(b).

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 29 / 31 Time in distributed systems About vector timestamps

They are big: n components each, and values grow without bound There is no more efficient way: n is a tight bound

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 30 / 31 Each processor stores a snapshot periodically, including vector clock value To restore the system, must find a (recent) consistent snapshot, i.e., a consistent cut that is K. ≤ a cut of execution is K = (k0,..., kn1 ), where ki is number steps in processor pi in a consistent cut if step s in pi happens before step kj of pj , then s k . ≤ i Every received message in the cut was sent within the cut

Time in distributed systems Applications

Recover distributed system from crash

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 31 / 31 Time in distributed systems Applications

Recover distributed system from crash

Each processor stores a snapshot periodically, including its vector clock value To restore the system, must find a (recent) consistent snapshot, i.e., a consistent cut that is ≤ K a cut of an execution is K = (k0, ..., kn−1), where ki is the number of steps in processor pi in a consistent cut, if step s in pi happens before step kj of pj, then s ≤ ki Every received message in the cut was sent within the cut
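A tiny sketch of checking cut consistency from the stored vector clocks, under the assumed convention that processor pi's snapshot records its own step count in component i:

// snapshots[i] is p_i's vector clock at its cut point k_i (so snapshots[i][i] == k_i).
function isConsistentCut(snapshots) {
  const n = snapshots.length;
  for (let i = 0; i < n; ++i)
    for (let j = 0; j < n; ++j)
      // p_i must not have observed more steps of p_j than p_j's own cut includes.
      if (snapshots[i][j] > snapshots[j][j]) return false;
  return true;
}

console.log(isConsistentCut([[2, 1], [1, 3]]));  // true: every observed step is inside the cut
console.log(isConsistentCut([[2, 4], [1, 3]]));  // false: p0 has seen p1's 4th step, the cut has only 3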

Jaakko Järvi (University of Bergen) Taste of Distributed Systems <2018-11-20 Tue> 31 / 31