K42

CS380L: Mike Dahlin

October 13, 2003

1 Preliminaries

1.1 Review

1.2 Outline

• Overview

  – Clustered objects
  – Existence locks
  – IPC

• Discussion and open issues

1.3 Preview

2 K42 overview

Takes lots of well-received research ideas from last 10+ years and builds an OS with them...

• Exokernel/scheduler activations – User-level libraries approach

-like virtual memory

, mach, L4

• Emulate old OS with library + (Hydra, mach, exokernel, ...)

• OS-kit device drivers (internal API)

virtualization (Disco?)

• Local RPC (LRPC) for interprocess communication

– LRPC + Clustered objects similar to RPC + {default stub, caching, ...}

• ...

• Key ideas in design

  – Structure (from intro)
    ∗ Modular, object oriented
    ∗ Avoid centralized code paths, global data structures, global locks
    ∗ Move system functionality from kernel to server processes and application libraries
  – Claimed benefits
    ∗ Easy prototyping of new OS technologies
    ∗ Scale up (to large multiprocessor and large applications), scale down (to small multiprocessors), support small scale applications
    ∗ Backwards compatible with but easy to add specialized components
    ∗ Customizable: Allow applications to determine how OS manages their resources
    ∗ Flexible: support wide range of applications and problem domains, support wide range of hardware, scale down to embedded HW and up to future enterprise server

• These ideas (largely) taken from past systems (and adapted)

  – Kernel: memory management, process management, IPC, base scheduling
  – User-level libraries for abstractions (Linux API/ABI) (a la exokernel)
    ∗ Customizable
    ∗ Overhead is reduced: avoid crossing address space boundaries (e.g., setting timers that will be cleared before they go off)
    ∗ Space and time are consumed by calling application, not by the kernel or servers
  – Implement key functionality as user-level servers (a la microkernel)
    ∗ Why is “Linux file server” a separate process rather than a library?
  – Memory management (cf. mach)
    ∗ Compare K42 memory management to mach
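The timer example in the list above can be made concrete with a small sketch. It assumes a hypothetical user-level timer library that keeps all timers in the application's address space and tells the kernel only about the earliest live deadline. The class name, method names, and the `kernel_calls` counter are illustrative assumptions, not K42's interface.

```python
import heapq


class UserLevelTimers:
    """Hypothetical user-level timer library (not K42's actual interface)."""

    def __init__(self):
        self.heap = []               # (deadline, timer id), kept in user space
        self.cancelled = set()       # ids cleared but still sitting in the heap
        self.next_id = 0
        self.kernel_deadline = None  # the one deadline the kernel knows about
        self.kernel_calls = 0        # stands in for address-space crossings

    def _sync_kernel(self):
        # Drop cancelled timers from the front of the heap, then cross into
        # the kernel only if the earliest live deadline actually changed.
        while self.heap and self.heap[0][1] in self.cancelled:
            self.cancelled.discard(heapq.heappop(self.heap)[1])
        earliest = self.heap[0][0] if self.heap else None
        if earliest != self.kernel_deadline:
            self.kernel_deadline = earliest
            self.kernel_calls += 1

    def set(self, deadline):
        timer_id = self.next_id
        self.next_id += 1
        heapq.heappush(self.heap, (deadline, timer_id))
        self._sync_kernel()
        return timer_id

    def clear(self, timer_id):
        self.cancelled.add(timer_id)
        self._sync_kernel()


timers = UserLevelTimers()
timers.set(1000)                     # e.g., a periodic tick: one kernel call
for i in range(100):
    t = timers.set(5000 + i)         # a timeout guarding some operation...
    timers.clear(t)                  # ...cleared before it goes off
print(timers.kernel_calls)
```

In this model the hundred set/clear pairs never change the earliest deadline, so they cost no kernel crossings at all; the time and space they consume are charged to the calling application.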

3 Admin

Next week:

• Monday 10/20: Guest lecture – Emmett Witchel: ACES 2.402 3:30-4:30

• Wednesday 10/22: Midterm – in class

– Bring one 8.5x11 inch page of notes (font ≥ 8pt)

4 K42 Scalability

4.1 Clustered objects

• Multiprocessor is like a distributed system

  – Use RPC for communication
    ∗ LRPC – local remote procedure call (Anderson)
  – RPC a good match for interprocess communication (fewer problems than remote RPC – performance, reliability, ...)
  – Optimize RPC for scalability (when needed)
    ∗ Default stub (usually)
    ∗ Cache/replicate (when needed) (see NFS, Globe)
  – Mechanism:
    ∗ Object reference – pointer to pointer to object representative
    ∗ Object representative – interface (global, shared, local) to global object (cache, replicate, or not)
      · Global pointer to same virtual address but (possibly) different physical address on different processors
      · Fill local table on demand with trap handler
      · e.g., to create a new clustered object, bump a pointer in per-process table and use next entry as Object reference; set it to NULL on each processor. Then, when a processor first references the object, go to global trap handler, see that it is a Clustered object reference, look at global table to see which clustered object and call its handler. It can then decide whether to instantiate a pointer to single global object representative, to object representative shared by 4 processors, or to create new object representative
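The lazy fill of the local tables described above can be sketched as follows. This is a toy model under stated assumptions: Python lists stand in for the global and per-processor translation tables, a `None` check stands in for the trap on a NULL entry, and the names `System`, `ClusteredCounter`, `Rep`, and `handle_miss` are hypothetical rather than K42's.

```python
class Rep:
    """One representative of a clustered object (hypothetical counter)."""

    def __init__(self):
        self.count = 0


class ClusteredCounter:
    """Global side of the object: decides how reps map onto processors."""

    def __init__(self, cpus_per_rep):
        self.cpus_per_rep = cpus_per_rep   # 1 = rep per CPU, N = shared by N CPUs
        self.reps = {}                     # cluster id -> Rep

    def handle_miss(self, cpu):
        # Called on a processor's first reference: reuse the rep for this
        # processor's cluster, or create one if none exists yet.
        cluster = cpu // self.cpus_per_rep
        if cluster not in self.reps:
            self.reps[cluster] = Rep()
        return self.reps[cluster]

    def total(self):
        return sum(rep.count for rep in self.reps.values())


class System:
    def __init__(self, ncpus):
        self.global_table = []                          # ref -> clustered object
        self.local_tables = [[] for _ in range(ncpus)]  # per CPU: ref -> rep or None
        self.misses = 0

    def create(self, obj):
        # "Bump a pointer": the next free index becomes the object reference.
        ref = len(self.global_table)
        self.global_table.append(obj)
        for table in self.local_tables:
            table.append(None)                          # NULL on each processor
        return ref

    def deref(self, cpu, ref):
        rep = self.local_tables[cpu][ref]
        if rep is None:                                 # stands in for the trap
            self.misses += 1
            rep = self.global_table[ref].handle_miss(cpu)
            self.local_tables[cpu][ref] = rep           # later calls skip the trap
        return rep


system = System(ncpus=8)
counter = ClusteredCounter(cpus_per_rep=4)              # one rep per 4 processors
ref = system.create(counter)
for cpu in range(8):
    for _ in range(3):
        system.deref(cpu, ref).count += 1
print(counter.total(), len(counter.reps), system.misses)
```

Every processor uses the same reference `ref`, yet each resolves it through its own table, so the object alone chooses between one global representative, one per cluster, or one per processor. Setting `cpus_per_rep` to 8 or to 1 gives the other two policies without changing any caller.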

4.2 Existence locks

• Hairy problem in OO programming + concurrency – how to cleanly shut an object down

  – Need to wait until no one is using the object
  – Can’t put lock inside object (b/c deallocation is a race between deallocators and threads contending for that lock; e.g., what to do about waiting threads on that lock...)
  – Putting “existence lock” for object outside of object is complex
    ∗ Somewhat non-modular
    ∗ Need existence lock for existence lock, and so on up to a global lock
    ∗ “Tends to encourage holding of locks for a long time”

• Solution: semi-automatic garbage collection – 3 phases

  1. Remove all persistent references (e.g., references in global data structures)
    ∗ “just as you normally would when removing object”
    ∗ Now, what about threads that have references on their stacks?
  2. Uniprocessor: Wait until zero threads active in kernel
    ∗ Then, no stack can have reference to object → garbage collect it
    ∗ Not provably live, but intuition is: “system server calls tend to be short...we do not believe this to be a problem in practice”
    ∗ Future work: keep several generations of “active counts”
  3. Multiprocessor: Hand token to each processor and repeat step 2 for all processors
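The three phases can be sketched as a small single-threaded simulation. It assumes each processor keeps a count of threads currently active in the kernel, and that objects retired while a token pass is in progress wait for the next pass. The names `Processor`, `Reclaimer`, `retire`, and `poll` are hypothetical, and the real system's bookkeeping is certainly more involved.

```python
class Processor:
    def __init__(self):
        self.active = 0       # threads currently inside the kernel/server

    def enter(self):
        self.active += 1

    def leave(self):
        self.active -= 1


class Reclaimer:
    def __init__(self, ncpus):
        self.cpus = [Processor() for _ in range(ncpus)]
        self.pending = []     # retired, waiting for the next token pass
        self.in_pass = []     # batch covered by the token pass in progress
        self.token_at = 0     # index of the processor holding the token
        self.freed = []

    def retire(self, obj):
        # Phase 1 is the caller's job: obj must already be unreachable from
        # global data structures, so only stack references can remain.
        self.pending.append(obj)

    def poll(self):
        """Advance the token as far as possible; free the batch at the end."""
        if not self.in_pass:
            if not self.pending:
                return
            self.in_pass, self.pending = self.pending, []
            self.token_at = 0
        # Phases 2 and 3: the token leaves a processor only once that
        # processor has been seen with zero active threads.
        while (self.token_at < len(self.cpus)
               and self.cpus[self.token_at].active == 0):
            self.token_at += 1
        if self.token_at == len(self.cpus):
            self.freed.extend(self.in_pass)
            self.in_pass = []


r = Reclaimer(ncpus=2)
r.cpus[1].enter()                 # a thread that may hold a stack reference
r.retire("old_region")
r.poll()
freed_while_busy = list(r.freed)  # token is stuck at processor 1
r.cpus[1].leave()
r.poll()
print(freed_while_busy, r.freed)
```

The sketch also shows the liveness caveat from the notes: if some processor always has at least one active thread, the token never moves on and nothing is freed.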

4.3 IPC

• Interprocess communication: Must be fast (servers in different processes)

• Optimize common case: same processor – process trap, kernel reti to callee

  – Callee receives caller’s registers, protected authenticator, (optional) page
  – Cheaper than context switch: no save registers, no scheduling
  – Maintains locality
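A rough model of the same-processor fast path is sketched below, under heavy simplification. A Python call stands in for the trap followed by the kernel's return into the callee. The `Kernel` class, the `"fs"` server name, the process ids, and the `file_server` handler are all hypothetical. The point is only what the callee receives and what the kernel does not do.

```python
class Kernel:
    def __init__(self):
        self.entry_points = {}    # server id -> callee entry point

    def register(self, server_id, handler):
        self.entry_points[server_id] = handler

    def ipc_call(self, caller_pid, server_id, regs, page=None):
        # "Trap": the kernel, not the caller, supplies caller_pid, so the
        # callee can trust it (this models the protected authenticator).
        handler = self.entry_points[server_id]
        # "reti to callee": run the callee at once on the same processor,
        # passing the caller's registers through untouched. No register
        # save area and no scheduler decision appear on this path.
        return handler(caller_pid, regs, page)


def file_server(authenticator, regs, page):
    # Hypothetical server: checks the kernel-supplied identity, then reads
    # its arguments straight out of the caller's registers.
    if authenticator not in {1001}:
        return ("EPERM",)
    op, fd = regs
    return ("ok", op, fd, len(page) if page is not None else 0)


kernel = Kernel()
kernel.register("fs", file_server)
granted = kernel.ipc_call(1001, "fs", ("read", 3), page=bytearray(4096))
denied = kernel.ipc_call(2002, "fs", ("read", 3))
print(granted, denied)
```

Because the callee runs immediately on the caller's processor, the data the caller just touched is still in that processor's cache, which is the locality benefit listed above.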

5 Discussion and open issues

What were contributions of

• Multics
• THE
• Unix
• Disco
• Exokernel
• Mesa/Threads/Monitors
• Scheduling/Scheduler activations
• RPC
• Active messages
• Virtual memory
• K42

What are the solved problems in OS?

What are the open problems (or at least not addressed so far) in OS?
