4/18/2017

Synchronization Mechanisms & Problems

7H. Semaphores
7I. Producer/Consumer Problems
7J. Object Level Locking
7K. Bottlenecks, Contention and Granularity

Semaphores – signaling devices

• used for signaling when direct communication was not an option
  – e.g. between villages, ships, trains

Synchronization Mechanisms and Problems 1

Semaphores - History

• concept introduced in 1968 by Edsger Dijkstra
  – cooperating sequential processes
• THE classic synchronization mechanism
  – behavior is well specified and universally accepted
  – a foundation for most synchronization studies
  – a standard reference for all other mechanisms
• more powerful than simple locks
  – they incorporate a FIFO waiting queue
  – they have a counter rather than a binary flag

Semaphores - Operations

• has two parts:
  – an integer counter (initial value unspecified)
  – a FIFO waiting queue
• P (proberen/test) ... "wait"
  – decrement counter; if count >= 0, return
  – if counter < 0, add process to waiting queue
• V (verhogen/raise) ... "post" or "signal"
  – increment counter
  – if counter <= 0 & queue non-empty, wake 1st proc


Semaphores - for exclusion

• initialize semaphore count to one
  – count reflects # threads allowed to hold lock
• use P/wait operation to take the lock
  – the first will succeed
  – subsequent attempts will block
• use V/post operation to release the lock
  – restore semaphore count to non-negative
  – if any threads are waiting, unblock the first in line

using semaphores for exclusion

    struct account {
        struct semaphore s;     /* initialize count to 1, queue empty, lock 0 */
        int balance;
        ...
    };

    int write_check( struct account *a, int amount ) {
        int ret;
        p( &a->s );             /* get exclusive access to the account */
        if ( a->balance >= amount ) {   /* check for adequate funds */
            a->balance -= amount;
            ret = amount;
        } else
            ret = -1;
        v( &a->s );             /* release access to the account */
        return( ret );
    }


Semaphores - completion events

• initialize semaphore count to zero
  – count reflects # of completed events
• use P/wait operation to await completion
  – if already posted, it will return immediately
  – else all callers will block until V/post is called
• use V/post operation to signal completion
  – increment the count
  – if any threads are waiting, unblock the first in line
• one signal per wait: no broadcasts

using semaphores for notifications

    struct semaphore pipe_semaphore = { 0, 0, 0 };  /* count = 0; pipe empty */
    char buffer[BUFSIZE];
    int read_ptr = 0, write_ptr = 0;

    char pipe_read_char() {
        char c;
        p( &pipe_semaphore );           /* wait for input available */
        c = buffer[read_ptr++];         /* get next input character */
        if (read_ptr >= BUFSIZE)        /* circular buffer wrap */
            read_ptr -= BUFSIZE;
        return(c);
    }

    void pipe_write_string( char *buf, int count ) {
        while( count-- > 0 ) {
            buffer[write_ptr++] = *buf++;   /* store next character */
            if (write_ptr >= BUFSIZE)       /* circular buffer wrap */
                write_ptr -= BUFSIZE;
            v( &pipe_semaphore );           /* signal char available */
        }
    }
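The completion-event pattern works the same way with POSIX semaphores: initialize the count to zero so the waiter blocks until the worker posts. A small sketch (the worker's "result" value of 42 is purely illustrative):

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t done;       /* count starts at 0: no completions yet */
static int result;

static void *worker(void *arg) {
    (void)arg;
    result = 42;          /* do the "work"; value is illustrative */
    sem_post(&done);      /* V: signal completion */
    return NULL;
}

/* block until the worker has finished, then return its result */
int wait_for_worker(void) {
    pthread_t t;
    sem_init(&done, 0, 0);
    pthread_create(&t, NULL, worker, NULL);
    sem_wait(&done);      /* P: returns only after worker posts */
    pthread_join(t, NULL);
    return result;
}
```

If the worker posts before the waiter calls sem_wait, the wait returns immediately; the event is not lost, which is the key difference from a bare condition-variable signal.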


Counting Semaphores

• initialize semaphore count to ...
  – count reflects # of available resources
• use P/wait operation to consume a resource
  – if available, it will return immediately
  – else all callers will block until V/post is called
• use V/post operation to produce a resource
  – increment the count
  – if any threads are waiting, unblock the first in line
• one signal per wait: no broadcasts

Implementing Semaphores

    void sem_wait(sem_t *s) {
        pthread_mutex_lock(&s->lock);
        while (s->value <= 0)
            pthread_cond_wait(&s->cond, &s->lock);
        s->value--;
        pthread_mutex_unlock(&s->lock);
    }

    void sem_post(sem_t *s) {
        pthread_mutex_lock(&s->lock);
        s->value++;
        pthread_cond_signal(&s->cond);
        pthread_mutex_unlock(&s->lock);
    }
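The sem_t in the slide above is a course-local struct, not the POSIX sem_t. A self-contained version with an assumed layout (value counter guarded by a mutex and condition variable; renamed mysem_t to avoid clashing with <semaphore.h>):

```c
#include <pthread.h>

/* assumed layout for the slide's sem_t */
typedef struct {
    int value;
    pthread_mutex_t lock;
    pthread_cond_t cond;
} mysem_t;

void mysem_init(mysem_t *s, int value) {
    s->value = value;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
}

void mysem_wait(mysem_t *s) {
    pthread_mutex_lock(&s->lock);
    while (s->value <= 0)                /* re-check after every wakeup */
        pthread_cond_wait(&s->cond, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

void mysem_post(mysem_t *s) {
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->cond);       /* wake one waiter, if any */
    pthread_mutex_unlock(&s->lock);
}
```

The while (not if) around pthread_cond_wait matters: a woken thread must re-check the count, since another thread may have consumed the resource between the signal and the wakeup.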


Implementing Semaphores in OS

• requires a spin-lock to work on SMPs
  – sleep/wakeup may be called on two processors
  – the critical section is short and cannot block
  – we must spin, because we cannot sleep ... the lock we need is the one that protects the sleep operation
• also requires interrupt disabling in sleep
  – wakeup is often called from interrupt handlers
  – interrupt possible during sleep/wakeup critical section
  – if spin-lock already is held, wakeup will block for ever
• very few operations require both of these

(locking to solve sleep/wakeup race)

    void sem_wait(sem_t *s) {
        for (;;) {
            save = intr_enable( ALL_DISABLE );
            while ( TestAndSet( &s->lock ) );
            if (s->value > 0) {
                s->value--;
                s->lock = 0;
                intr_enable( save );
                return;
            }
            add_to_queue( &s->queue, myproc );
            myproc->runstate |= PROC_BLOCKED;
            s->lock = 0;
            intr_enable( save );
            yield();
        }
    }

    void sem_post(struct sem_t *s) {
        struct proc_desc *p = 0;
        save = intr_enable( ALL_DISABLE );
        while ( TestAndSet( &s->lock ) );
        s->value++;
        if (p = get_from_queue( &s->queue )) {
            p->runstate &= ~PROC_BLOCKED;
        }
        s->lock = 0;
        intr_enable( save );
        if (p)
            reschedule( p );
    }


Limitations of Semaphores

• semaphores are a very spartan mechanism
  – they are simple, and have few features
  – more designed for proofs than synchronization
• they lack many practical synchronization features
  – it is easy to deadlock with semaphores
  – one cannot check the lock without blocking
  – they do not support reader/writer shared access
  – no way to recover from a wedged V'er
  – no way to deal with priority inheritance
• none the less, most OSs support them

Using Condition Variables

    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    ...
    pthread_mutex_lock(&lock);
    while (ready == 0)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
    ...
    pthread_mutex_lock(&lock);
    ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);


Bounded Buffer Problem w/CVs

    void producer( FIFO *fifo, char *msg, int len ) {
        for( int i = 0; i < len; i++ ) {
            pthread_mutex_lock(&mutex);
            while (fifo->count == MAX)
                pthread_cond_wait(&empty, &mutex);
            put(fifo, msg[i]);
            pthread_cond_signal(&fill);
            pthread_mutex_unlock(&mutex);
        }
    }

    void consumer( FIFO *fifo, char *msg, int len ) {
        for( int i = 0; i < len; i++ ) {
            pthread_mutex_lock(&mutex);
            while (fifo->count == 0)
                pthread_cond_wait(&fill, &mutex);
            msg[i] = get(fifo);
            pthread_cond_signal(&empty);
            pthread_mutex_unlock(&mutex);
        }
    }

Producer/Consumer w/Semaphores

    void producer( FIFO *fifo, char *msg, int len ) {
        for( int i = 0; i < len; i++ ) {
            sem_wait(&empty);
            sem_wait(&mutex);
            put(fifo, msg[i]);
            sem_post(&mutex);
            sem_post(&full);
        }
    }

    void consumer( FIFO *fifo, char *msg, int len ) {
        for( int i = 0; i < len; i++ ) {
            sem_wait(&full);
            sem_wait(&mutex);
            msg[i] = get(fifo);
            sem_post(&mutex);
            sem_post(&empty);
        }
    }
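The semaphore version above can be completed into a runnable sketch. The FIFO type, put/get helpers, and the driver run_pipeline are assumptions filled in for illustration; the three semaphores (empty, full, mutex) follow the slide exactly:

```c
#include <pthread.h>
#include <semaphore.h>
#include <string.h>

#define MAX 4                       /* small buffer, to force blocking */

/* hypothetical FIFO: circular buffer guarded by the three classic semaphores */
typedef struct { char buf[MAX]; int head, tail; } FIFO;

static FIFO fifo;
static sem_t empty, full, mutex;
static const char *src;             /* producer input */
static char dst[128];               /* consumer output */
static int n;

static void put(FIFO *f, char c) { f->buf[f->head] = c; f->head = (f->head + 1) % MAX; }
static char get(FIFO *f) { char c = f->buf[f->tail]; f->tail = (f->tail + 1) % MAX; return c; }

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < n; i++) {
        sem_wait(&empty);           /* wait for a free slot */
        sem_wait(&mutex);           /* then exclusive access */
        put(&fifo, src[i]);
        sem_post(&mutex);
        sem_post(&full);            /* one more item available */
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < n; i++) {
        sem_wait(&full);            /* wait for an item */
        sem_wait(&mutex);
        dst[i] = get(&fifo);
        sem_post(&mutex);
        sem_post(&empty);           /* one more free slot */
    }
    return NULL;
}

/* push msg through the bounded buffer and return what came out the far side */
const char *run_pipeline(const char *msg) {
    pthread_t p, c;
    src = msg;
    n = (int)strlen(msg);
    memset(dst, 0, sizeof dst);
    fifo.head = fifo.tail = 0;
    sem_init(&empty, 0, MAX);
    sem_init(&full, 0, 0);
    sem_init(&mutex, 0, 1);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return dst;
}
```

Note the ordering: the producer takes empty before mutex. Taking mutex first and then blocking on empty would deadlock, since the consumer could never get the mutex to free a slot.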

Object Level Locking

• mutexes protect code critical sections
  – brief durations (e.g. nanoseconds, milliseconds)
  – other threads operating on the same data
  – all operating in a single address space
• persistent objects are more difficult
  – critical sections are likely to last much longer
  – many different programs can operate on them
  – may not even be running on a single computer
• solution: lock objects (rather than code)

Whole File Locking

    int flock( fd, operation )

• supported operations:
  – LOCK_SH … shared lock (multiple allowed)
  – LOCK_EX … exclusive lock (one at a time)
  – LOCK_UN … release a lock
• lock is associated with an open file descriptor
  – lock is released when that file descriptor is closed
• locking is purely advisory
  – does not prevent reads, writes, unlinks
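A small demonstration of flock's semantics, assuming a Linux-like system: an exclusive lock held through one open file description makes a non-blocking exclusive request through a second one fail with EWOULDBLOCK, until the first is released. The path is illustrative:

```c
#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

/* returns 0 if flock behaved as advertised, -1 otherwise */
int demo_flock(const char *path) {
    int fd1 = open(path, O_CREAT | O_RDWR, 0600);
    int fd2 = open(path, O_RDWR);      /* a second, independent open file description */
    if (fd1 < 0 || fd2 < 0) return -1;

    flock(fd1, LOCK_EX);               /* exclusive lock via fd1 */
    int busy = flock(fd2, LOCK_EX | LOCK_NB);          /* should fail immediately */
    int was_blocked = (busy == -1 && errno == EWOULDBLOCK);

    flock(fd1, LOCK_UN);               /* release: now fd2 can lock */
    int ok = flock(fd2, LOCK_EX | LOCK_NB);

    flock(fd2, LOCK_UN);
    close(fd1);
    close(fd2);
    unlink(path);
    return (was_blocked && ok == 0) ? 0 : -1;
}
```

Because the lock is advisory, nothing stops a third process from simply reading or writing the file without ever calling flock.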


Advisory vs Enforced Locking

• enforced locking
  – done within the implementation of object methods
  – guaranteed to happen, whether or not user wants it
  – may sometimes be too conservative
• advisory locking
  – a convention that "good guys" are expected to follow
  – users expected to lock object before calling methods
  – gives users flexibility in what to lock, when
  – gives users more freedom to do it wrong (or not at all)
  – mutexes are advisory locks

Ranged File Locking

    int lockf( fd, cmd, offset, len )

• supported cmds:
  – F_LOCK … get/wait for an exclusive lock
  – F_ULOCK … release a lock
  – F_TEST/F_TLOCK … test, or non-blocking request
  – offset/len specifies portion of file to be locked
• lock is associated with a file descriptor
  – lock is released when file descriptor is closed
• locking may or may not be enforced
  – depending on the underlying file system


Cost of not getting a Lock

• protect critical sections to ensure correctness
• many critical sections are very brief
  – in and out in a matter of nano-seconds
• blocking is much more (e.g. 1000x) expensive
  – micro-seconds to yield, context switch
  – milliseconds if swapped-out or a queue forms
• performance depends on conflict probability

    Cexpected = (Cget * (1 - Pconflict)) + (Cblock * Pconflict)

Probability of Conflict

[graph: probability of conflict — not recoverable from the text extraction]
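The expected-cost formula above is trivial to encode; a sketch with made-up cycle counts (cheap uncontended acquisition, expensive block):

```c
/* expected cost of a lock acquisition, per the slide's formula;
   c_get = cost when uncontended, c_block = cost when we must block */
double expected_cost(double c_get, double c_block, double p_conflict) {
    return c_get * (1.0 - p_conflict) + c_block * p_conflict;
}
```

With a 1000x cost ratio, even a small conflict probability dominates: at c_get = 50, c_block = 50000, a 50% conflict rate yields 25025 — the blocking term is essentially the whole cost.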


Convoy Formation

• in general
  (nobody else in critical section at the same time)

    Pconflict = 1 - (1 - (Tcritical / Ttotal))^(threads-1)

• unless (or until) a FIFO queue forms

    Pconflict = 1 - (1 - ((Twait + Tcritical) / Ttotal))^threads

  – if Twait >> Tcritical, Pconflict rises significantly
• if Twait exceeds the mean inter-arrival time
  – the line becomes permanent, parallelism ceases
  – (cheap) Tcritical is replaced by (expensive) Twait

Performance: resource convoys

[graph: throughput vs offered load — the "ideal" curve vs the "convoy" curve, which collapses once a queue forms]
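The first formula above can be computed directly; a sketch (the repeated multiplication avoids pulling in libm's pow; the parameter values in the usage note are made up):

```c
/* probability that, on entry, some other thread is already in the critical
   section: 1 - (1 - t_crit/t_total)^(threads-1), per the slide */
double p_conflict(int threads, double t_crit, double t_total) {
    double p_clear = 1.0;                    /* P(no other thread inside) */
    for (int i = 0; i < threads - 1; i++)
        p_clear *= (1.0 - t_crit / t_total);
    return 1.0 - p_clear;
}
```

With one thread there is no one to conflict with; with two threads each spending half their time inside, conflicts occur half the time; and the probability climbs quickly as threads are added.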


Contention Reduction

• eliminate the critical section entirely
  – eliminate shared resource, use atomic instructions
• eliminate preemption during critical section
  – by disabling interrupts … not always an option
• reduce time spent in critical section
  – reduce amount of code in critical section
• reduce frequency of critical section entry
  – reduce use of the serialized resource
  – reduce exclusive use of the serialized resource
  – spread requests out over more resources

Reducing Time in Critical Section

• eliminate potentially blocking operations
  – allocate required memory before taking lock
  – do I/O before taking or after releasing lock
  – avoid resource allocation within critical section
• minimize code inside the critical section
  – only code that is subject to destructive races
  – move all other code out of the critical section
  – especially calls to other routines
• cost: this may complicate the code
  – unnaturally separating parts of a single operation


Reduce Time or Preemption

    /* malloc (which may block) inside the critical section */
    int List_Insert(list_t *l, int key) {
        pthread_mutex_lock(&l->lock);
        node_t *new = (node_t *) malloc(sizeof(node_t));
        if (new == NULL) {
            perror("malloc");
            pthread_mutex_unlock(&l->lock);
            return(-1);
        }
        new->key = key;
        new->next = l->head;
        l->head = new;
        pthread_mutex_unlock(&l->lock);
        return 0;
    }

    /* malloc before taking the lock */
    int List_Insert(list_t *l, int key) {
        node_t *new = (node_t *) malloc(sizeof(node_t));
        if (new == NULL) {
            perror("malloc");
            return(-1);
        }
        new->key = key;
        pthread_mutex_lock(&l->lock);
        new->next = l->head;
        l->head = new;
        pthread_mutex_unlock(&l->lock);
        return 0;
    }

Reduced Use of Critical Section

• can we use critical section less often
  – less use of high-contention resource/operations
  – batch operations
• consider "sloppy counters"
  – move most updates to a private resource
  – costs:
    • global counter is not always up-to-date
    • failure could lose many updates
  – alternative:
    • sum single-writer private counters when needed
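The single-writer alternative above can be sketched directly: each thread increments only its own slot, so the hot path needs no lock at all, and a reader sums the slots on demand. Thread counts and increment counts are arbitrary for illustration:

```c
#include <pthread.h>

#define NTHREADS 4
#define NINCR 100000

/* one single-writer private counter per thread: no lock on the hot path */
static long private_count[NTHREADS];

static void *worker(void *arg) {
    int me = *(int *)arg;
    for (int i = 0; i < NINCR; i++)
        private_count[me]++;          /* only this thread writes this slot */
    return NULL;
}

/* sum the private counters on demand; may lag in-flight updates */
long sloppy_total(void) {
    long sum = 0;
    for (int i = 0; i < NTHREADS; i++)
        sum += private_count[i];
    return sum;
}

long run_counters(void) {
    pthread_t t[NTHREADS];
    int id[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        id[i] = i;
        pthread_create(&t[i], NULL, worker, &id[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return sloppy_total();
}
```

After all threads join, the sum is exact; read mid-run it would only be approximate, which is the "sloppy" trade-off the slide describes. (In production code one would also pad the slots to avoid false sharing of a cache line.)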

Non-Exclusivity: read/write locks

• reads and writes are not equally common
  – file read/write: reads/writes > 50
  – directory search/create: reads/writes > 1000
• only writers require exclusive access
• read/write locks
  – allow many readers to share a resource
  – only enforce exclusivity when a writer is active
  – policy: when are writers allowed in?
    • potential starvation if writers must wait for readers

Spreading requests: lock granularity

• coarse grained - one lock for many objects
  – simpler, and more idiot-proof
  – greater resource contention (threads/resource)
• fine grained - one lock per object (or sub-pool)
  – spreading activity over many locks reduces contention
  – dividing resources into pools shortens searches
  – a few operations may lock multiple objects/pools
• TANSTAAFL
  – time/space overhead, more locks, more gets/releases
  – error-prone: harder to decide what to lock when


Partitioned Hash Table

    int Hash_Insert(hash_t *h, int key) {
        int bucket = key % h->num_buckets;
        list_t *l = &h->lists[bucket];
        return List_Insert(l, key);
    }

• each list_t is still protected by a lock
  – but contention has been greatly reduced
• partitioning function must be race-free
  – no critical-section to protect
  – per partition load depends on request randomness

Mid-Term Exam

• When
  – Thursday, the full 110 minute period
• Value
  – 15% of course grade
• Form and content
  – 10 multi-part, brief-answer questions
    • covering all lectures and reading to date
    • based on key learning objectives
  – one hard extra credit question
    • similar to those on part II of the final
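The Hash_Insert example above can be filled out into a compilable sketch; the type layouts, Hash_Init, and Hash_Contains are assumptions added so the fragment stands alone, with one mutex per bucket as the slide describes:

```c
#include <pthread.h>
#include <stdlib.h>

/* minimal partitioned hash table: one lock per bucket list (layouts assumed) */
typedef struct node { int key; struct node *next; } node_t;
typedef struct { pthread_mutex_t lock; node_t *head; } list_t;
typedef struct { int num_buckets; list_t *lists; } hash_t;

void Hash_Init(hash_t *h, int buckets) {
    h->num_buckets = buckets;
    h->lists = calloc(buckets, sizeof(list_t));
    for (int i = 0; i < buckets; i++)
        pthread_mutex_init(&h->lists[i].lock, NULL);
}

int List_Insert(list_t *l, int key) {
    node_t *new = malloc(sizeof(node_t));   /* allocate before taking the lock */
    if (new == NULL)
        return -1;
    new->key = key;
    pthread_mutex_lock(&l->lock);           /* only this bucket is serialized */
    new->next = l->head;
    l->head = new;
    pthread_mutex_unlock(&l->lock);
    return 0;
}

int Hash_Insert(hash_t *h, int key) {
    /* partitioning touches no shared mutable state, so it needs no lock */
    return List_Insert(&h->lists[key % h->num_buckets], key);
}

int Hash_Contains(hash_t *h, int key) {
    list_t *l = &h->lists[key % h->num_buckets];
    pthread_mutex_lock(&l->lock);
    for (node_t *n = l->head; n; n = n->next)
        if (n->key == key) {
            pthread_mutex_unlock(&l->lock);
            return 1;
        }
    pthread_mutex_unlock(&l->lock);
    return 0;
}
```

Two inserts only contend when their keys hash to the same bucket, so with B buckets and uniformly random keys, contention drops by roughly a factor of B.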


Supplementary Slides

Example: P and V

    void p(struct semaphore *s) {
        for (;;) {
            save = intr_enable( ALL_DISABLE );
            while ( TestAndSet( &s->sem_lock ) );
            if (s->sem_count > 0) {
                s->sem_count--;
                s->sem_lock = 0;
                intr_enable( save );
                return;
            }
            add_to_queue( &s->sem_queue, myproc );
            myproc->runstate |= PROC_BLOCKED;
            s->sem_lock = 0;
            intr_enable( save );
            yield();
        }
    }

    void v(struct semaphore *s) {
        struct proc_desc *p = 0;
        save = intr_enable( ALL_DISABLE );
        while ( TestAndSet( &s->sem_lock ) );
        s->sem_count++;
        if (p = get_from_queue( &s->sem_queue )) {
            p->runstate &= ~PROC_BLOCKED;
        }
        s->sem_lock = 0;
        intr_enable( save );
        if (p)
            reschedule(p);
    }

[timeline diagram: processes A and B issuing P and V operations, including a WAKE of a blocked process; rows track the semaphore's lock bit, count, queue contents, and interrupt-disable state at each step — not recoverable from the text extraction]


Example: Producer/Consumer

    char pipe_read_char() {
        p( &pipe_semaphore );
        c = buffer[read_ptr++];
        if (read_ptr >= BUFSIZE)
            read_ptr -= BUFSIZE;
        return(c);
    }

    void pipe_write_string( char *buf, int count ) {
        while( count-- > 0 ) {
            buffer[write_ptr++] = *buf++;
            if (write_ptr >= BUFSIZE)
                write_ptr -= BUFSIZE;
            v( &pipe_semaphore );
        }
    }

[timeline diagram: process A writes "abc" while process B reads, blocking on the empty pipe and being woken by A's first v(); rows track the buffer contents, read_ptr, write_ptr, and semaphore count at each step]

Active/Passive - the preemption thing

• standard semaphore semantics are not complete
  – who runs after a V unblocks a P?
  – the running V'er or the blocked P'er?
• there are arguments for each behavior
  – gratuitous context switches increase overhead
  – producers and consumers should take turns
  – if we delay the P'er, someone else may get the semaphore
• a preemptive priority-based scheduler can decide this
  – reassess whenever someone wakes up
  – the P'er's priority controls who will run after the wake-up


Where to put the locking

• there is a choice about where to do locking
  – A, B require serialization, and are called by C, D
  – should we lock in objects (A, B) or in callers (C, D)?
• OO modularity says: as low as possible (in A, B)
  – correct locking is part of correct implementation
• but as high as necessary (in C, D)
  – locking needs may depend on how object is used
  – one logical transaction may span many method calls
  – in such cases, only the caller knows start/end/scope

Performance of Locking

• locking is typically performed as an OS system call
  – particularly for enforced locking
  – system call overheads apply to each lock operation
  – if they are called frequently, the overheads are high
• even if not in the OS, extra instructions run to lock and unlock


Eliminating Critical Sections

• eliminate shared resource
  – give everyone their own copy
  – find a way to do your work without it
• use atomic instructions
  – only possible for simple operations
• great when you can do it
  – but often you can't

Locking Costs

• locking is called for when you must protect critical sections to ensure correctness
• many critical sections are very brief
  – in and out in a matter of nano-seconds
• overhead of the locking operation may be much higher than the time spent in the critical section


Performance: lock contention

• the riddle of parallelism:
  – parallelism: if one task is blocked, the CPU runs another
  – concurrent use of shared resources is difficult
  – critical sections serialize tasks, eliminating parallelism
• what if everyone needs to use one resource?
  – one process gets the resource
  – other processes get in line behind it (a convoy)
  – parallelism is eliminated; B runs after A finishes
  – that resource becomes a bottleneck
