Events, Co-routines, and Threads
OS (and application) Execution Models

System Building
• General purpose systems need to deal with
  – Many activities
    • potentially overlapping
    • may be interdependent
  – Activities that depend on external phenomena
    • may require waiting for completion (e.g. disk read)
    • reacting to external triggers (e.g. interrupts)
• Need a systematic approach to system structuring


© Kevin Elphinstone

Events Model
• External entities generate (post) events.
  – keyboard presses, mouse clicks, system calls
• Event loop waits for events and calls an appropriate event handler.
  – common paradigm for GUIs
• Event handler is a function that runs until completion and returns to the event loop.
  – Handlers generally short lived

• The event model only requires a single stack (see the sketch below)
  – All event handlers must return to the event loop
    • No blocking
    • No yielding
    • No preemption of handlers
  – No concurrency issues within a handler

[Figure: CPU (PC, SP, REGS) and memory holding the event loop, event handlers 1–3, data, and a single stack]
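A minimal sketch of this structure in C; the event types, handlers, and next_event() source are invented for illustration (a real GUI or kernel would block waiting for input rather than replay a fixed script):

    #include <stdio.h>

    /* Kinds of events an external entity might post (illustrative). */
    typedef enum { EV_KEY, EV_MOUSE, EV_QUIT } event_type_t;

    typedef struct {
        event_type_t type;
        int          data;
    } event_t;

    /* Handlers run to completion and return to the loop:
     * no blocking, no yielding, so a single stack suffices. */
    static void handle_key(int key)   { printf("key %d\n", key); }
    static void handle_mouse(int pos) { printf("click at %d\n", pos); }

    /* Stand-in for "wait for the next posted event"; replays a fixed
     * script so the example is self-contained. */
    static int next_event(event_t *ev)
    {
        static const event_t script[] = {
            { EV_KEY, 'a' }, { EV_MOUSE, 42 }, { EV_QUIT, 0 }
        };
        static unsigned i = 0;
        if (i >= sizeof(script) / sizeof(script[0]))
            return 0;
        *ev = script[i++];
        return 1;
    }

    int main(void)
    {
        event_t ev;
        /* The event loop: wait for an event, dispatch to a handler. */
        while (next_event(&ev)) {
            switch (ev.type) {
            case EV_KEY:   handle_key(ev.data);   break;
            case EV_MOUSE: handle_mouse(ev.data); break;
            case EV_QUIT:  return 0;
            }
        }
        return 0;
    }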


Event-based kernel on CPU with protection
[Figure: CPU (PC, SP, REGS); kernel-only memory holding the event loop, event handlers 1–3, data, and a stack; user memory holding user code, user data, and a stack]
• Huh?
• How to support multiple processes?

Event-based kernel on CPU with protection
[Figure: as above, but the kernel-only memory now holds a trap dispatcher, a scheduler, event handlers 1–3, data, a stack, and PCBs A, B, C; "Scheduling?"]
• User-level state in PCB
• Kernel starts on fresh stack on each trap
• No interrupts, no blocking in kernel mode


Co-routines
• A subroutine with extra entry and exit points
• Via yield()
  – supports long running subroutines
  – subtle variations (yieldto, asymmetric and symmetric)
• Also called Fibers

Co-routines
• yield() saves state of routine A and starts routine B
  – or resumes B’s state from its previous yield() point.
• No preemption
• No concurrency issues/races as globals are exclusive between yields()
• Implementation strategy?

[Figure: CPU (PC, SP, REGS); memory holding Routine A, Routine B, data, and a stack]


Co-routines
• Usually implemented with a stack per routine (see the C sketch below)
• Preserves current state of execution of the routine

[Figure: memory holding Routine A, Routine B, data, and stacks A and B; CPU (PC, SP, REGS)]

Co-routines
• Routine A state currently loaded
• Routine B state stored on stack
• Routine switch from A → B
  – saving state of A
    • regs, sp, pc
  – restoring the state of B
    • regs, sp, pc
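Before the hypothetical MIPS yield() on the next slides, here is a sketch of the same "stack per routine" idea in portable C using the POSIX ucontext API (getcontext/makecontext/swapcontext); swapcontext() plays the role of yield(), and the routine names are invented for the example:

    #include <stdio.h>
    #include <ucontext.h>

    #define STACK_SIZE (64 * 1024)

    static ucontext_t ctx_main, ctx_a, ctx_b;

    static void routine_a(void)
    {
        for (int i = 0; i < 3; i++) {
            printf("A: step %d, yielding to B\n", i);
            swapcontext(&ctx_a, &ctx_b);      /* yield: save A, resume B */
        }
        swapcontext(&ctx_a, &ctx_main);       /* done: back to main */
    }

    static void routine_b(void)
    {
        for (int i = 0; i < 3; i++) {
            printf("B: step %d, yielding to A\n", i);
            swapcontext(&ctx_b, &ctx_a);      /* yield: save B, resume A */
        }
    }

    int main(void)
    {
        static char stack_a[STACK_SIZE], stack_b[STACK_SIZE];

        /* Each coroutine gets its own stack. */
        getcontext(&ctx_a);
        ctx_a.uc_stack.ss_sp = stack_a;
        ctx_a.uc_stack.ss_size = sizeof(stack_a);
        ctx_a.uc_link = &ctx_main;
        makecontext(&ctx_a, routine_a, 0);

        getcontext(&ctx_b);
        ctx_b.uc_stack.ss_sp = stack_b;
        ctx_b.uc_stack.ss_size = sizeof(stack_b);
        ctx_b.uc_link = &ctx_main;
        makecontext(&ctx_b, routine_b, 0);

        swapcontext(&ctx_main, &ctx_a);       /* start routine A */
        printf("main: coroutines finished\n");
        return 0;
    }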


A hypothetical yield()

yield:
    /*
     * a0 contains a pointer to the previous routine’s struct.
     * a1 contains a pointer to the new routine’s struct.
     *
     * The registers get saved on the stack, namely:
     *
     *      s0-s8
     *      gp, ra
     *
     */

    /* Allocate stack space for saving 11 registers. 11*4 = 44 */
    addi sp, sp, -44

Save the registers that the ‘C’ procedure calling convention expects preserved

    /* Save the registers */
    sw   ra, 40(sp)
    sw   gp, 36(sp)
    sw   s8, 32(sp)
    sw   s7, 28(sp)
    sw   s6, 24(sp)
    sw   s5, 20(sp)
    sw   s4, 16(sp)
    sw   s3, 12(sp)
    sw   s2, 8(sp)
    sw   s1, 4(sp)
    sw   s0, 0(sp)

    /* Store the old stack pointer in the old pcb */
    sw   sp, 0(a0)


Yield

[Figure: Routine A and Routine B pass control back and forth by calling yield(a,b) and yield(b,a) in turn]

    /* Get the new stack pointer from the new pcb */
    lw   sp, 0(a1)
    nop              /* delay slot for load */

    /* Now, restore the registers */
    lw   s0, 0(sp)
    lw   s1, 4(sp)
    lw   s2, 8(sp)
    lw   s3, 12(sp)
    lw   s4, 16(sp)
    lw   s5, 20(sp)
    lw   s6, 24(sp)
    lw   s7, 28(sp)
    lw   s8, 32(sp)
    lw   gp, 36(sp)
    lw   ra, 40(sp)
    nop              /* delay slot for load */

    /* and return. */
    j    ra
    addi sp, sp, 44  /* in delay slot */
    .end mips_switch

Coroutines
• What about subroutines combined with coroutines
  – i.e. what is the issue with calling subroutines?
• Subroutine calling might involve an implicit yield()
  – potentially creates a race on globals (see the sketch below)
    • either understand where all yields lie, or
    • cooperative multithreading

Cooperative Multithreading
• Also called green threads
• Conservatively assumes a multithreading model
  – i.e. uses synchronisation to avoid races,
  – and makes no assumption about subroutine behaviour
    • it can potentially yield()
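A concrete version of that race, using the same POSIX ucontext machinery as the earlier coroutine sketch; balance, log_and_yield(), and the withdraw routines are invented for the example. Routine A reads a global, a library call hides a yield, routine B updates the global, and A then writes back a stale value:

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t ctx_main, ctx_a, ctx_b;
    static int balance = 100;                 /* shared global */

    /* A library routine that, unknown to its caller, yields. */
    static void log_and_yield(ucontext_t *me, ucontext_t *other)
    {
        printf("logging... (implicit yield hidden in a subroutine)\n");
        swapcontext(me, other);
    }

    static void withdraw_a(void)
    {
        int observed = balance;               /* read the global */
        log_and_yield(&ctx_a, &ctx_b);        /* hidden yield between read and write */
        balance = observed - 50;              /* writes back a stale value */
        swapcontext(&ctx_a, &ctx_main);
    }

    static void withdraw_b(void)
    {
        balance = balance - 50;               /* runs while A is suspended */
        swapcontext(&ctx_b, &ctx_a);
    }

    int main(void)
    {
        static char sa[64 * 1024], sb[64 * 1024];

        getcontext(&ctx_a);
        ctx_a.uc_stack.ss_sp = sa;  ctx_a.uc_stack.ss_size = sizeof(sa);
        ctx_a.uc_link = &ctx_main;
        makecontext(&ctx_a, withdraw_a, 0);

        getcontext(&ctx_b);
        ctx_b.uc_stack.ss_sp = sb;  ctx_b.uc_stack.ss_size = sizeof(sb);
        ctx_b.uc_link = &ctx_main;
        makecontext(&ctx_b, withdraw_b, 0);

        swapcontext(&ctx_main, &ctx_a);
        /* Two withdrawals of 50 from 100 should leave 0; the lost update leaves 50. */
        printf("balance = %d\n", balance);
        return 0;
    }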


A Thread
• To support more than a single thread we need to store thread state and attributes
• Stored in thread control block
  – also indirectly in stack

[Figure: CPU (PC, SP, REGS); memory holding code, data, a stack, and TCB A]

Thread Control Block
• Thread attributes (see the C sketch below)
  – processor related
    • memory
    • program counter
    • stack pointer
    • registers (and status)
  – OS/package related
    • state (running/blocked)
    • identity
    • scheduler (queues, priority)
    • etc.
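A sketch of what such a thread control block might look like in C; the field names are illustrative, not taken from any particular kernel:

    /* Thread states tracked by the OS/thread package. */
    typedef enum { THREAD_RUNNING, THREAD_READY, THREAD_BLOCKED } thread_state_t;

    struct tcb {
        /* processor related */
        void          *stack;       /* base of this thread's stack */
        void          *sp;          /* saved stack pointer */
        void          *pc;          /* saved program counter */
        unsigned long  regs[32];    /* saved registers (and status) */

        /* OS/package related */
        thread_state_t state;       /* running / ready / blocked */
        int            tid;         /* identity */
        int            priority;    /* used by the scheduler */
        struct tcb    *next;        /* scheduler queue link */
    };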


Thread A and Thread B
• Thread A state currently loaded
• Thread B state stored in TCB B
• Thread switch from A → B
  – saving state of thread A
    • regs, sp, pc
  – restoring the state of thread B
    • regs, sp, pc
• Note: registers and PC can be stored on the stack, and only SP stored in TCB

[Figure: CPU (PC, SP, REGS); memory holding code, data, stacks, TCB A and TCB B]

Approx OS
Note: global variable curthread

mi_switch()
{
    struct thread *cur, *next;

    next = scheduler();

    /* update curthread */
    cur = curthread;
    curthread = next;

    /*
     * Call the machine-dependent code that actually does the
     * context switch.
     */
    md_switch(&cur->t_pcb, &next->t_pcb);

    /* back running in same thread */
}


OS/161 mips_switch

mips_switch:
    /*
     * a0 contains a pointer to the old thread's struct pcb.
     * a1 contains a pointer to the new thread's struct pcb.
     *
     * The only thing we touch in the pcb is the first word, which
     * we save the stack pointer in. The other registers get saved
     * on the stack, namely:
     *
     *      s0-s8
     *      gp, ra
     *
     * The order must match arch/mips/include/switchframe.h.
     */

    /* Allocate stack space for saving 11 registers. 11*4 = 44 */
    addi sp, sp, -44

OS/161 mips_switch
Save the registers that the ‘C’ procedure calling convention expects preserved

    /* Save the registers */
    sw   ra, 40(sp)
    sw   gp, 36(sp)
    sw   s8, 32(sp)
    sw   s7, 28(sp)
    sw   s6, 24(sp)
    sw   s5, 20(sp)
    sw   s4, 16(sp)
    sw   s3, 12(sp)
    sw   s2, 8(sp)
    sw   s1, 4(sp)
    sw   s0, 0(sp)

    /* Store the old stack pointer in the old pcb */
    sw   sp, 0(a0)


Thread Switch

[Figure: Thread a and Thread b pass control back and forth by calling mips_switch(a,b) and mips_switch(b,a) in turn]

OS/161 mips_switch

    /* Get the new stack pointer from the new pcb */
    lw   sp, 0(a1)
    nop              /* delay slot for load */

    /* Now, restore the registers */
    lw   s0, 0(sp)
    lw   s1, 4(sp)
    lw   s2, 8(sp)
    lw   s3, 12(sp)
    lw   s4, 16(sp)
    lw   s5, 20(sp)
    lw   s6, 24(sp)
    lw   s7, 28(sp)
    lw   s8, 32(sp)
    lw   gp, 36(sp)
    lw   ra, 40(sp)
    nop              /* delay slot for load */

    /* and return. */
    j    ra
    addi sp, sp, 44  /* in delay slot */
    .end mips_switch

Preemptive Multithreading
• Switch can be triggered by asynchronous external event
  – timer interrupt (see the sketch below)
• Asynch event saves current state
  – on current stack, if in kernel (nesting)
  – on kernel stack or in TCB if coming from user-level
• call thread_switch()

Threads on simple CPU

[Figure: memory holding scheduling & switching code, code, data, stacks A–C, and TCBs A–C]
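The real path is in the kernel (interrupt entry saves state, then thread_switch() runs), but a user-level analogue of the trigger can be sketched with POSIX signals: the asynchronous "timer interrupt" only sets a flag, and the switch would happen when that flag is next checked at a safe point. need_resched and the messages are invented for the example:

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Set asynchronously by the "timer interrupt" (SIGALRM). */
    static volatile sig_atomic_t need_resched = 0;

    static void alarm_handler(int sig)
    {
        (void)sig;
        need_resched = 1;             /* only set a flag in the handler */
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = alarm_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGALRM, &sa, NULL);

        alarm(1);                     /* ask for a "timer interrupt" in ~1s */

        int ticks = 0;
        while (ticks < 3) {
            /* ... normal work would happen here ... */
            if (need_resched) {       /* checked at a safe point */
                need_resched = 0;
                ticks++;
                printf("tick %d: a kernel would call thread_switch() here\n", ticks);
                alarm(1);             /* re-arm the timer */
            }
        }
        return 0;
    }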


Threads on CPU with protection

[Figure: CPU (PC, SP, REGS); kernel-only memory holding scheduling & switching code, code, data, stacks, and TCBs A, B, C; user memory]
• What is missing?

Threads on CPU with protection

[Figure: as above, plus user code, user data, and a stack in user memory]
• What happens on kernel entry and exit?


Switching Address Spaces on Thread Switch = Processes

[Figure: two snapshots of the same machine: CPU (PC, SP, REGS); kernel-only memory holding scheduling & switching code, code, data, kernel stacks, and TCBs A, B, C; user memory holding user code, user data, and a stack. The user address space is switched together with the thread.]


What is this?

[Figure: CPU (PC, SP, REGS); kernel-only memory holding scheduling & switching code, code, data, stacks, and TCBs A, B, C; user memory holding user code, data, and several stacks]

What is this?

[Figure: as above, but the user memory additionally holds its own scheduling & switching code, stacks, and TCBs 1, 2, 3]


User-level Threads

[Figure: User mode: Process A, Process B, Process C, each with its own scheduler; Kernel mode: a single scheduler]

User-level Threads
✔ Fast thread management (creation, deletion, switching, synchronisation)
✘ Blocking blocks all threads in a process
  – Syscalls
  – Page faults
✘ No thread-level parallelism on multiprocessor

Kernel-Level Threads

[Figure: User mode: Process A, Process B, Process C; Kernel mode: a single scheduler reached via system calls]

Kernel-level Threads
✘ Slow thread management (creation, deletion, switching, synchronisation)
  – System calls
✔ Blocking blocks only the appropriate thread in a process (see the example below)
✔ Thread-level parallelism on multiprocessor
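A small demonstration of the "blocking blocks only the appropriate thread" point, assuming POSIX threads (kernel-level threads on common systems); the two thread functions are invented for the example. Build with -lpthread:

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Blocks in the kernel (sleep ends up in a nanosleep syscall). */
    static void *blocker(void *arg)
    {
        (void)arg;
        printf("blocker: sleeping (blocked in the kernel)\n");
        sleep(2);
        printf("blocker: woke up\n");
        return NULL;
    }

    /* Keeps making progress while the other thread is blocked. */
    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 4; i++) {
            printf("worker: still running (%d)\n", i);
            usleep(500 * 1000);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, blocker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }

With purely user-level threads, the same blocking call would stall the whole process; here the worker keeps printing while the blocker sleeps.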

Continuations (in Functional Languages)
• Definition of a continuation
  – representation of an instance of a computation at a point in time

call/cc in Scheme
• call/cc = call-with-current-continuation
• A function
  – takes a function (f) to call as an argument
  – calls that function with a reference to the current continuation (cont) as an argument
  – when cont is later called, the continuation is restored.
    • The argument to cont is returned to the caller of call/cc

Simple Example

(define (f return)
  (return 2)
  3)

(display (f (lambda (x) x)))                 ; displays 3

(display (call-with-current-continuation f)) ; displays 2

Derived from http://en.wikipedia.org/wiki/Call-with-current-continuation

Another Simple Example

(define the-continuation #f)

(define (test)
  (let ((i 0))
    ; call/cc calls its first function argument, passing
    ; a continuation variable representing this point in
    ; the program as the argument to that function.
    ;
    ; In this case, the function argument assigns that
    ; continuation to the variable the-continuation.
    ;
    (call/cc (lambda (k) (set! the-continuation k)))
    ;
    ; The next time the-continuation is called, we start here.
    (set! i (+ i 1))
    i))


Another Simple Example

> (test)
1
> (the-continuation)
2
> (the-continuation)
3
> ; stores the current continuation (which will print 4 next) away
> (define another-continuation the-continuation)
> (test) ; resets the-continuation
1
> (the-continuation)
2
> (another-continuation) ; uses the previously stored continuation
4

Derived from http://en.wikipedia.org/wiki/Continuation

Yet Another Simple Example

;;; Return the first element in LST for which WANTED? returns a true
;;; value.
(define (search wanted? lst)
  (call/cc
    (lambda (return)
      (for-each (lambda (element)
                  (if (wanted? element)
                      (return element)))
                lst)
      #f)))

Derived from http://community.schemewiki.org/?call-with-current-continuation


Example

;;; This starts a new thread running (proc).
(define (fork proc)
  (call/cc
    (lambda (k)
      (enqueue k)
      (proc))))

;;; This yields the processor to another thread, if there is one.
(define (yield)
  (call/cc
    (lambda (k)
      (enqueue k)
      ((dequeue)))))

Continuations
• A method to snapshot current state and return to the computation in the future
• In the general case, as many times as we like
• Variations and language environments (e.g. in C — see the sketch below) result in less general continuations
  – e.g. one shot continuations
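For the "e.g. in C" point, a minimal sketch of a much less general continuation using the standard setjmp/longjmp facility: the saved point can only be resumed while its stack frame is still live, i.e. an escape-style jump rather than a full call/cc. resume_point and do_work are invented names:

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf resume_point;

    static void do_work(void)
    {
        /* Jump back to the saved point; setjmp() there returns 1. */
        longjmp(resume_point, 1);
    }

    int main(void)
    {
        if (setjmp(resume_point) == 0) {
            /* First return: the limited "continuation" was just captured. */
            printf("captured, calling do_work()\n");
            do_work();
        } else {
            /* Second return: we got here via longjmp(). */
            printf("resumed via longjmp()\n");
        }
        return 0;
    }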


What should be a kernel’s execution model?
• Note that the same question can be asked of applications

The two alternatives
• No one correct answer
• From the view of the designer there are two alternatives:
  – Single Kernel Stack: only one stack is used all the time to support all user threads.
  – Per-Thread Kernel Stack: every user thread has a kernel stack.


Per-Thread Kernel Stack: Processes Model
• A thread’s kernel state is implicitly encoded in the kernel activation stack
  – If the thread must block in kernel, we can simply switch from the current stack to another thread’s stack until the thread is resumed
  – Resuming is simply switching back to the original stack
  – Preemption is easy
  – no conceptual difference between kernel mode and user mode

example(arg1, arg2) {
    P1(arg1, arg2);
    if (need_to_block) {
        thread_block();
        P2(arg2);
    } else {
        P3();
    }
    /* return control to user */
    return SUCCESS;
}

Single Kernel Stack: “Event” or “Interrupt” Model
• How do we use a single kernel stack to support many threads?
  – Issue: How are system calls that block handled?
⇒ either continuations
  – Using Continuations to Implement Thread Management and Communication in Operating Systems. [Draves et al., 1991]
⇒ or stateless kernel (interrupt model)
  – Interface and Execution Models in the Fluke Kernel. [Ford et al., 1999]

Continuations
• State required to resume a blocked thread is explicitly saved in a TCB
  – A function pointer
  – Variables
• Stack can be discarded and reused to support new thread
• Resuming involves discarding current stack, restoring the continuation, and continuing

example(arg1, arg2) {
    P1(arg1, arg2);
    if (need_to_block) {
        save_arg_in_TCB;
        thread_block(example_continue);
        /* NOT REACHED */
    } else {
        P3();
    }
    thread_syscall_return(SUCCESS);
}

example_continue() {
    recover_arg2_from_TCB;
    P2(recovered arg2);
    thread_syscall_return(SUCCESS);
}

Stateless Kernel
• System calls can not block within the kernel
  – If syscall must block (resource unavailable)
    • Modify user-state such that syscall is restarted when resources become available
    • Stack content is discarded (functions all return)
• Preemption within kernel difficult to achieve.
  ⇒ Must (partially) roll syscall back to a restart point
• Avoid page faults within kernel code
  ⇒ Syscall arguments in registers
• Page fault during rollback to restart (due to a page fault) is fatal.

IPC examples – Per-thread stack

Send and Receive system call implemented by a non-blocking send part and a blocking receive part.

msg_send_rcv(msg, option, send_size, rcv_size, ...) {

    rc = msg_send(msg, option, send_size, ...);
    if (rc != SUCCESS)
        return rc;

    /* block inside msg_rcv if no message available */
    rc = msg_rcv(msg, option, rcv_size, ...);
    return rc;
}

IPC examples – Continuations

msg_send_rcv(msg, option, send_size, rcv_size, ...) {

    rc = msg_send(msg, option, send_size, ...);
    if (rc != SUCCESS)
        return rc;

    cur_thread->continuation.msg = msg;
    cur_thread->continuation.option = option;
    cur_thread->continuation.rcv_size = rcv_size;
    ...

    /* msg_rcv_continue: the function to continue with if blocked */
    rc = msg_rcv(msg, option, rcv_size, ..., msg_rcv_continue);
    return rc;
}

msg_rcv_continue() {
    msg = cur_thread->continuation.msg;
    option = cur_thread->continuation.option;
    rcv_size = cur_thread->continuation.rcv_size;
    ...

    rc = msg_rcv(msg, option, rcv_size, ..., msg_rcv_continue);
    return rc;
}

IPC Examples – stateless kernel

msg_send_rcv(cur_thread) {

    rc = msg_send(cur_thread);
    if (rc != SUCCESS)
        return rc;

    rc = msg_rcv(cur_thread);
    if (rc == WOULD_BLOCK) {
        /* set user-level PC to restart msg_rcv only */
        set_pc(cur_thread, msg_rcv_entry);
        return RESCHEDULE;
    }
    return rc;
}

RESCHEDULE changes curthread on exiting the kernel

Single Kernel Stack
per Processor, event model
• either continuations
  – complex to program
  – must be conservative in state saved (any state that might be needed)
  – (Draves), L4Ka::Strawberry, NICTA Pistachio, OKL4
• or stateless kernel
  – no kernel threads, kernel not interruptible, difficult to program
  – request all potentially required resources prior to execution
  – blocking syscalls must always be restartable
  – Processor-provided stack management can get in the way
  – system calls need to be kept simple, “atomic”.
  – e.g. the Fluke kernel from Utah
• low cache footprint
  – always the same stack is used!
  – reduced memory footprint

Per-Thread Kernel Stack
• simple, flexible
  – kernel can always use threads, no special techniques required for keeping state while interrupted / blocked
  – no conceptual difference between kernel mode and user mode
  – e.g. traditional L4, , Windows, OS/161
• but larger cache footprint
• and larger memory consumption
