COS 318: Operating Systems

Synchronization: Mutual Exclusion

Jaswinder Pal Singh and a Fabulous Course Staff
Computer Science Department
Princeton University
(http://www.cs.princeton.edu/courses/cos318/)

“Too Many Cookies” Problem

- Roommates Lance and James want a bag of cookies in the room at all times, but don’t want to buy too many cookies
- They buy cookies independently, using the following sequence:
  - Look in cabinet: out of cookies
  - Leave for Wawa to buy cookies
  - Arrive at Wawa
  - Buy a bag of cookies
  - Arrive home and put cookies in cabinet

“Too Many Cookies” Problem

    Time   James                            Lance
    15:00  Look in cabinet: out of cookies
    15:05  Leave for Wawa
    15:10  Arrive at Wawa                   Look in cabinet: out of cookies
    15:15  Buy a bag of cookies             Leave for Wawa
    15:20  Arrive home; put cookies away    Arrive at Wawa
    15:25                                   Buy a bag of cookies
                                            Arrive home; put cookies away

- Oh No! Too many cookies.

Using A Note?

James and Lance’s Cookie Optimization Algorithm:

    if (noCookies) {
        // check if roommate left a note
        if (noNote) {
            leave note;    // let them know you went to Wawa
            buy cookies;
            remove note;
        }
    }

- Any issue with this approach?

Using A Note?

    James                      Lance
    if (noCookies) {           if (noCookies) {
        if (noNote) {              if (noNote) {
            leave note;                leave note;
            buy cookies;               buy cookies;
            remove note;               remove note;
        }                          }
    }                          }

- Any issue with this approach?

Why Solution #1 Does Not Work

    Time  James                  Lance
    3:00  if (noCookies) {
              if (noNote) {
    3:05
    3:10                         if (noCookies) {
    3:15                             if (noNote) {
    3:20          leave note;            leave note;
    3:25          buy cookies;           buy cookies;
    3:30          remove note;           remove note;
              }                      }
          }                      }

Threads can get context-switched at any time

Too many cookies!

Possible Solution #2: Leave Note First

    James                      Lance
    leave noteA                leave noteB
    if (noNoteB) {             if (noNoteA) {
        if (noCookies) {           if (noCookies) {
            buy cookies                buy cookies
        }                          }
    }                          }
    remove noteA               remove noteB

    Didn’t buy cookies         Didn’t buy cookies

- Does this method work?

Possible Solution #3: One Spin-waits

- Problem was that threads checked once and moved on
  - So have one of them spin-wait on the note

    James                      Lance
    leave noteA                leave noteB
    while (noteB)              if (noNoteA) {
        do nothing;                if (noCookies) {
    if (noCookies)                     buy cookies
        buy cookies;               }
    remove noteA               }
                               remove noteB

- Would this fix the problem?
- Yes, but it is complicated, needs different code for different threads, wastes cycles busy-waiting, and is not fair

Threads Example: Shared Counter

- Google gets millions of hits a day. Uses multiple threads (on multiple processors) to speed things up.
- Simple shared state error: each thread increments a shared counter to track the number of hits today:

    …
    hits = hits + 1;
    …

- What happens when two threads execute this code concurrently?

Problem with Shared Counters

- One possible result: lost update!

    hits = 0

    time   T1               T2
      |    read hits (0)
      |                     read hits (0)
      |    hits = 0 + 1
      v                     hits = 0 + 1

    hits = 1

Problem with Shared Counters

- Another possible result: everything works!

    hits = 0

    time   T1               T2
      |    read hits (0)
      |    hits = 0 + 1
      |                     read hits (1)
      v                     hits = 1 + 1

    hits = 2

- This is called a “race condition”

Race Conditions

- Race condition: accesses to shared state that can lead to a timing-dependent error
  - Whether it happens depends on how threads are scheduled
- Difficult to avoid because:
  - Must make sure all possible schedules are safe.
  - Number of possible schedule permutations is huge.
  - One or more of them may be “bad”
  - They are intermittent
  - Timing dependent => small changes can hide or reveal the bug
    - Adding a print statement
    - Running on a different machine
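The two schedules above can be replayed deterministically. The sketch below is an illustration only, not code from the course: `run_schedule` and `sim_thread` are hypothetical names, and it assumes `hits = hits + 1` breaks into exactly two steps, a read into a private register and a write of that register plus one.

```c
#include <assert.h>
#include <string.h>

/* Each simulated thread runs "hits = hits + 1" as two separate steps. */
typedef struct {
    int reg;   /* private copy of hits taken by the read step */
    int step;  /* 0 = about to read, 1 = about to write, 2 = done */
} sim_thread;

/* Replay a schedule such as "1212": each character names the thread
 * (1 or 2) that executes its next step. Returns the final value of hits. */
int run_schedule(const char *schedule)
{
    int hits = 0;
    sim_thread t[2] = { {0, 0}, {0, 0} };
    size_t len = strlen(schedule);

    for (size_t i = 0; i < len; i++) {
        assert(schedule[i] == '1' || schedule[i] == '2');
        sim_thread *cur = &t[schedule[i] - '1'];
        if (cur->step == 0) {
            cur->reg = hits;        /* read hits */
            cur->step = 1;
        } else if (cur->step == 1) {
            hits = cur->reg + 1;    /* hits = (value read earlier) + 1 */
            cur->step = 2;
        }
    }
    return hits;
}
```

Calling `run_schedule("1212")` replays the lost-update interleaving, and `run_schedule("1122")` replays the schedule where everything works. Only the order of steps differs between the two calls, which is what makes the error timing dependent.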

It’s Actually Even Worse

- Compilers reorder instruction issue within a thread
  - To optimize register usage and hence run code faster
- Hardware reorders instruction execution/completion
  - E.g. write buffers, etc.
- All done to optimize execution speed
- But they don’t know about multiple threads and issues across them

Preventing Race Conditions: Atomicity

- Atomic unit = instruction sequence guaranteed to execute indivisibly (also called a “critical section”).
  - If two threads execute the same atomic unit at the same time, one thread will execute the whole sequence before the other begins

    hits = 0

    time   T1                  T2
      |    hits = hits + 1
      v                        hits = hits + 1

    hits = 2

- How to make multiple instructions seem like an atomic one?

Providing Atomicity

- Have hardware provide better primitives than atomic load and store.
- Build higher-level programming abstractions on this new hardware support.
- Example: locks
  - Acquire: wait until lock is free, then grab it
  - Release: unlock/release the lock, waking up a waiter if any
  - These must be atomic operations: if two threads are waiting for the lock, and both see it is free, only one grabs it

Preventing Race Conditions: Atomicity

- Counter problem

    Acquire(lock);
    hits = hits + 1;     /* critical section */
    Release(lock);
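As a concrete version of the counter fix, here is a minimal sketch that assumes POSIX threads are available and lets `pthread_mutex_lock` and `pthread_mutex_unlock` stand in for `Acquire` and `Release`. The slides do not name a particular lock API, so that mapping is an assumption, and `count_hits`, `record_hits`, and `hits_lock` are illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static long hits = 0;
static pthread_mutex_t hits_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: record `iterations` hits, one critical section per hit. */
static void *record_hits(void *arg)
{
    int iterations = *(int *)arg;
    for (int i = 0; i < iterations; i++) {
        pthread_mutex_lock(&hits_lock);    /* Acquire(lock) */
        hits = hits + 1;                   /* critical section */
        pthread_mutex_unlock(&hits_lock);  /* Release(lock) */
    }
    return NULL;
}

/* Start nthreads threads, wait for them, and return the final count
 * (or -1 if memory or a thread could not be created). */
long count_hits(int nthreads, int iterations)
{
    assert(nthreads > 0 && iterations >= 0);
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    if (tids == NULL)
        return -1;

    hits = 0;
    int started = 0;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, record_hits, &iterations) == 0)
        started++;

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return started == nthreads ? hits : -1;
}
```

With the lock in place, `count_hits(4, 50000)` should return 200000 on every run. Removing the lock and unlock calls reintroduces the lost-update race. Older toolchains may need the `-pthread` compiler flag.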

Preventing Race Conditions: Atomicity

- Cookies problem

    Acquire(lock);
    if (noCookies)       /* critical section */
        buy cookies;
    Release(lock);

Rules for Using Locks

- Lock is initially free
- Always acquire before accessing shared data structure
- Always release after finishing with shared data
  - Only the lock holder can release
- Don’t access shared data without lock

Desirable Properties:
1. At most one holder, or thread in critical section, at a time (safety)
2. If no one is holding the lock, an acquire gets the lock (progress)
3. If all lock holders finish and there are no higher priority waiters, waiter eventually gets the lock (progress)

Some Definitions

- Synchronization:
  - Ensuring proper cooperation among threads
  - Mutual exclusion, event synchronization
- Mutual exclusion:
  - Ensuring that only one thread does a particular thing at a time. One thread doing it excludes another from doing it at the same time.
- Critical section:
  - Piece of code that only one thread can “be in” at a given time. Only one thread at a time will be allowed to get into the section of code.
- Lock: prevents someone from doing something
  - Lock before entering critical section, before accessing shared data
  - Unlock when leaving, after done accessing shared data
  - Wait if locked
- Event synchronization:
  - Making sure an event in one thread does not happen before/after an event in another thread

Implementing (Locks)

What makes a good solution?

- Only one process/thread inside a critical section at a time
- No assumptions need to be made about CPU speeds
- A process/thread inside a critical section should not be blocked by any process outside the critical section
- No one waits forever
- Should work for multiprocessors
- Should allow same code for all processes/threads

Simple Lock Variables

    Acquire(lock) {                  Release(lock) {
        while (lock.value == 1)          lock.value = 0;
            ;                        }
        lock.value = 1;
    }

    Thread 1:                        Thread 2:
    Acquire(lock) {
        while (lock.value == 1)
            ;
        {context switch}
                                     Acquire(lock) {
                                         while (lock.value == 1)
                                             ;
                                         {context switch}
        lock.value = 1;
    }
        {context switch}
                                         lock.value = 1;
                                     }

Prevent Context Switches in Critical Section

- On a uniprocessor, operations are atomic as long as a context switch doesn’t occur
- Context switches are caused either by actions the thread takes (e.g. traps etc.) or by external interrupts. The former can be controlled
- Disable interrupts during certain portions of code?
  - Delay the handling of external events

Why Enable or Disable Interrupts

- Interrupts are important
  - Process I/O requests (e.g. keyboard)
  - Implement preemptive CPU scheduling
- Disabling interrupts can be helpful
  - Introduce uninterruptible code regions
  - Think sequentially most of the time
  - Delay handling of external events

    DisableInt()
        .
        .   Uninterruptible region
        .
    EnableInt()

Disabling Interrupts for Critical Section?

    Acquire()
        critical section?
    Release()

    Acquire(): disable interrupts
    Release(): enable interrupts

Issues:
- Critical sections can be arbitrarily long

“Disable Interrupts” to Implement Mutex

    Acquire(lock) {                  Release(lock) {
        disable interrupts;              disable interrupts;
        while (lock.value != 0)          lock.value = 0;
            ;                            enable interrupts;
        lock.value = 1;              }
        enable interrupts;
    }

- Don’t let Acquire be interrupted before it sets lock.value to 1
  - This was the problem when interrupts weren’t disabled
- Don’t disable interrupts for entire critical section
- Issues:
  - May disable interrupts forever

Fix “Disable Forever” problem?

    Acquire(lock) {                  Release(lock) {
        disable interrupts;              disable interrupts;
        while (lock.value != 0) {        lock.value = 0;
            enable interrupts;           enable interrupts;
            disable interrupts;      }
        }
        lock.value = 1;
        enable interrupts;
    }

- Enable interrupts during spin loop
- Disable interrupts only when accessing lock.value
  - Cannot be interrupted after loop and before setting value
- Issues:
  - Consume a lot of CPU cycles doing enable and disable

Another Implementation

    Acquire(lock) {                  Release(lock) {
        disable interrupts;              disable interrupts;
        if (lock.value != 0) {           if (anyone in queue) {
            Enqueue me for lock;             Dequeue a thread;
            Yield();                         make it ready;
        }                                }
        lock.value = 1;                  lock.value = 0;
        enable interrupts;               enable interrupts;
    }                                }

- Avoid busy-waiting
- Issues
  - Approaches based on disabling interrupts don’t work for multiprocessors
  - Cannot allow user code to disable interrupts

Atomic Operations

- A thread executing an atomic instruction can’t be preempted or interrupted while it’s doing it
- Atomic operations on same memory value are serialized
  - Even on multiprocessors!
  - Result is consistent with some sequential ordering of operations
  - Without atomic ops, simultaneous writes by different threads may produce a garbage value, or a read that happens simultaneously with a write may read a garbage value
- Don’t usually require special privileges, can be user level

Peterson’s Algorithm

- See textbook

    int turn;
    int interested[N];

    void enter_region(int process)
    {
        int other;
        other = 1 - process;
        interested[process] = TRUE;   /* express interest */
        turn = other;                 /* give turn to other process */
        while (turn == other && interested[other] == TRUE)
            ;   /* wait till other loses interest or gives me turn */
    }

- L. Lamport, “A Fast Mutual Exclusion Algorithm,” ACM Trans. on Computer Systems, 5(1):1-11, Feb 1987.
  - 5 writes and 2 reads

Atomic Read-Modify-Write Instructions

- LOCK prefix in x86
  - Make a specific set of instructions atomic
  - Can be used to implement Test&Set
- Exchange (xchg, x86 architecture)
  - Swap register and memory
  - Atomic (even without LOCK)
- Fetch&Add or Fetch&Op
  - Atomic instructions for large shared memory multiprocessors
- Load linked and store conditional (LL-SC)
  - Two separate instructions (LL, SC) that are used together
  - Read value in one instruction (load linked)
  - Do some operations
  - When time to store, check if value has been modified. If not, ok; otherwise, jump back to start
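The read, compute, conditionally store, retry pattern described for LL-SC can be approximated in portable C11. The sketch below swaps in compare-and-swap (`atomic_compare_exchange_weak` from `<stdatomic.h>`), which is a different primitive from LL-SC but leads to the same retry loop. The function name `fetch_and_add` is illustrative, and C11 already provides `atomic_fetch_add` for real use.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically add delta to *addr and return the value it held before. */
int fetch_and_add(atomic_int *addr, int delta)
{
    assert(addr);
    int seen = atomic_load(addr);   /* read the current value */

    /* Store seen + delta only if *addr still equals seen. On failure the
     * call refreshes `seen` with the current value and the loop retries. */
    while (!atomic_compare_exchange_weak(addr, &seen, seen + delta))
        ;
    return seen;
}
```

If another thread changes the value between the load and the store attempt, the store is rejected and the loop starts over from the fresh value, so no update is lost.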

A Simple Solution with Test&Set

- Define TAS(lock)
  - If successfully set (wasn’t already set when tested but this operation set it), return 1;
  - Otherwise, return 0;
- Any issues with the following solution?

    Acquire(lock) {
        while (!TAS(lock.value))
            ;
    }

    Release(lock) {
        lock.value = 0;
    }

Mutex with Less Waiting?

    Acquire(lock) {                      Release(lock) {
        while (!TAS(lock.guard))             while (!TAS(lock.guard))
            ;                                    ;
        if (lock.value) {                    if (anyone in queue) {
            enqueue the thread;                  dequeue a thread;
            block and lock.guard = 0;            make it ready;
        } else {                             } else
            lock.value = 1;                      lock.value = 0;
            lock.guard = 0;                  lock.guard = 0;
        }                                }
    }

- Separate access to lock variable from value of it
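The simple Test&Set lock can be written in C11 roughly as follows. This is a sketch under stated assumptions: `TAS` is built on `atomic_exchange` and follows the slides’ convention of returning 1 when this call set the lock, while the `spinlock` type and `spin_init` are names invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
    atomic_int value;   /* 0 = free, 1 = held */
} spinlock;

/* Test&Set with the slides' convention: returns 1 if this call set the
 * value (it was 0 before), otherwise returns 0. */
int TAS(atomic_int *value)
{
    assert(value);
    return atomic_exchange(value, 1) == 0;
}

void spin_init(spinlock *lock)
{
    atomic_init(&lock->value, 0);   /* lock is initially free */
}

void Acquire(spinlock *lock)
{
    while (!TAS(&lock->value))
        ;   /* busy-wait until TAS succeeds */
}

void Release(spinlock *lock)
{
    atomic_store(&lock->value, 0);
}
```

A waiting thread in `Acquire` spins with repeated atomic exchanges for as long as the lock is held. The guard-based version above narrows that spinning to the short time another thread spends inside `Acquire` or `Release`.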

Example: Protect a Shared Variable

    Acquire(lock);   /* system call */
    count++;
    Release(lock);   /* system call */

- Acquire(mutex) system call
  - Pushing parameter, sys call # onto stack
  - Generating trap/interrupt to enter kernel
  - Jump to appropriate function in kernel
  - Verify process passed in valid pointer to mutex
  - Minimal spinning
  - Block and unblock process if needed
  - Get the lock
- Execute “count++;”
- Release(mutex) system call

Available Primitives and Operations

- Test-and-set
  - Works at either user or kernel level
- System calls for block/unblock
  - Block takes some token and goes to sleep
  - Unblock wakes up a waiter on token

Block and Unblock System Calls

Block( lock )
- Spin on lock.guard
- Save the context to TCB
- Enqueue TCB to lock.q
- Clear lock.guard
- Call scheduler

Unblock( lock )
- Spin on lock.guard
- Dequeue a TCB from lock.q
- Put TCB in ready queue
- Clear lock.guard

Always Block

    Acquire(lock) {                  Release(lock) {
        while (!TAS(lock.value))         lock.value = 0;
            Block( lock );               Unblock( lock );
    }                                }

- Good
  - Acquire won’t make a system call if TAS succeeds
- Bad
  - TAS instruction locks the memory bus
  - Block/Unblock still has substantial overhead

Always Spin

    Acquire(lock) {                  Release(lock) {
        while (!TAS(lock.value))         lock.value = 0;
            while (lock.value)       }
                ;
    }

- Two spinning loops in Acquire()?

[Figure: Multicore and SMP diagrams showing CPUs, L1 and L2 caches, and memory, annotated with TAS]

Optimal Algorithms

- What is the optimal solution to spin vs. block?
  - Know the future
  - Exactly when to spin and when to block
- But, we don’t know the future
  - There is no online optimal algorithm
- Offline optimal algorithm
  - Afterwards, derive exactly when to block or spin (“what if”)
  - Useful to compare against online algorithms

Competitive Algorithms

- An algorithm is c-competitive if for every input sequence s

      $C_A(s) \le c \times C_{opt}(s) + k$

  - c is a constant
  - $C_A(s)$ is the cost incurred by algorithm A in processing s
  - $C_{opt}(s)$ is the cost incurred by the optimal algorithm in processing s
- What we want is to have c as small as possible
  - Deterministic
  - Randomized

Constant Competitive Algorithms

    Acquire(lock, N) {
        int i;

        while (!TAS(lock.value)) {
            i = N;
            while (lock.value && i)   /* spin while the lock is still held */
                i--;

            if (!i)
                Block(lock);
        }
    }

- Spin up to N times if the lock is held by another thread
- If the lock is still held after spinning N times, block
- If spinning N times is equal to the context-switch time, what is the competitive factor of the algorithm?
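One way to reason about the closing question is with a simplified cost model, sketched below. The model is an assumption made for illustration and does not come from the slides: a waiter pays one unit per unit of time spent spinning, blocking costs a fixed `block_cost` (the context-switch time), and once blocked the waiter pays nothing more. Both function names are hypothetical.

```c
#include <assert.h>

/* Cost of spinning up to spin_limit time units and then blocking, when
 * the lock stays held for `wait` time units after Acquire() begins. */
long cost_spin_then_block(long wait, long spin_limit, long block_cost)
{
    assert(wait >= 0 && spin_limit >= 0 && block_cost >= 0);
    if (wait <= spin_limit)
        return wait;                  /* lock was released while spinning */
    return spin_limit + block_cost;   /* spun the full limit, then blocked */
}

/* Cost of the offline optimal choice, which knows `wait` in advance and
 * picks whichever of spinning throughout or blocking at once is cheaper. */
long cost_offline_optimal(long wait, long block_cost)
{
    assert(wait >= 0 && block_cost >= 0);
    return wait < block_cost ? wait : block_cost;
}
```

With `spin_limit` equal to `block_cost`, the online cost in this model is at most twice the offline optimal cost: short waits cost the same for both, and long waits cost `2 * block_cost` against `block_cost`.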

Approximate Optimal Online Algorithms

- Main idea
  - Use past to predict future
- Approach
  - Random walk
    - Decrement N by a unit if the last Acquire() blocked
    - Increment N by a unit if the last Acquire() didn’t block
  - Recompute N each time for each Acquire() based on some lock-waiting distribution for each lock
- Theoretical results

      $E[C_A(s(P))] \le \frac{e}{e-1} \times E[C_{opt}(s(P))]$

  The competitive factor is about 1.58.

The Big Picture

    Concurrent applications/software
    Shared Objects:                     Bounded Buffer, Barrier
    High-Level Atomic API (portable):   Mutex, Semaphores, Monitors/Condition Variables, Send/Recv
    Low-Level Atomic Ops (specific):    Load/store, Interrupt disable/enable, Test&Set, Other atomic instructions
    Interrupts (I/O, timer), Multiple processors
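The random-walk rule above fits in a few lines. In this sketch the function name `next_spin_limit` and the clamping bounds `min_n` and `max_n` are assumptions added so that N stays in a sensible range. The slides state only the decrement and increment steps.

```c
#include <assert.h>

/* Return the spin limit N to use for the next Acquire(), given whether
 * the last Acquire() on this lock ended up blocking. */
int next_spin_limit(int n, int last_acquire_blocked, int min_n, int max_n)
{
    assert(min_n <= max_n);
    n += last_acquire_blocked ? -1 : 1;   /* one random-walk step */
    if (n < min_n)
        n = min_n;                        /* clamp: assumption, not in the slides */
    if (n > max_n)
        n = max_n;
    return n;
}
```

A caller would keep one N per lock and update it after each Acquire(), for example `n = next_spin_limit(n, blocked, 0, 1000);`.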

Summary

- Disabling interrupts for mutex
  - There are many issues
  - Even when made to work, it works only on uniprocessors
- Atomic instruction support for mutex
  - Atomic loads and stores are not good enough
  - Test&Set and other instructions are the way to go
- Competitive spinning
  - Spin at the user level most of the time
  - Make no system calls in the absence of contention
  - Have more threads than processors