Machines and Virtualization
Systems and Networks
Jeff Chase
Spring 2006

Paging provides protection:
• Each process (user or OS) has a different virtual memory space.
• The OS maintains the tables for all processes.
• A reference outside the process's allocated space causes an exception that lets the OS decide what to do.
• Memory sharing between processes is done via different virtual spaces but common physical frames.

[Kedem, CPS 104, Fall05]

Architectural Foundations of OS Kernels

• One or more privileged execution modes (e.g., kernel mode)
    protected device control registers
    privileged instructions to control basic machine functions
• System call trap instruction and protected fault handling
    User processes safely enter the kernel to access shared OS services.
• Virtual memory mapping
    OS controls virtual-physical translations for each address space.
• Device interrupts to notify the kernel of I/O completion etc.
    Includes timer hardware and clock interrupts to periodically return control to the kernel as user code executes.
• Atomic instructions for coordination on multiprocessors

Memory and the CPU

[Figure: a CPU (registers R0..Rn and PC) next to main memory spanning addresses 0 to 2^n, which holds OS code, OS data, Program A and its data, Program B and its data, and a code library.]

Kernel Mode

CPU mode (a field in some status register) indicates whether the CPU is running in a user program or in the protected kernel.

Some instructions or register accesses are only legal when the CPU is executing in kernel mode.

[Figure: a CPU (mode, registers R0..Rn, PC) next to the physical address space of main memory, addresses 0 to 2^n, holding OS code, OS data, Program A and its data, Program B and its data, and a code library.]

Introduction to Virtual Addressing

User processes address memory through virtual addresses.

The kernel controls the virtual-physical translations in effect for each space.

The kernel and the machine collude to translate virtual addresses to physical addresses.

The machine does not allow a user process to access memory unless the kernel “says it’s OK”.

The specific mechanisms for implementing virtual address translation are machine-dependent.

[Figure: virtual memory (big?) containing text, data, BSS, user stack, and args/env, mapped by virtual-to-physical translations onto physical memory (small?).]

Processes and the Kernel

[Figure: processes, each with its own data, in private virtual address spaces above shared kernel code and data in a shared address space. System call traps go down into the kernel; upcalls (e.g., signals) come back up.]

The kernel sets up process execution contexts to “virtualize” the machine.

Threads or processes enter the kernel for services.

CPU and devices force entry to the kernel to handle exceptional events.

The Kernel

• Today, all “real” operating systems have protected kernels.
    The kernel resides in a well-known file: the “machine” automatically loads it into memory (boots) on power-on/reset.
    Our “kernel” is called the executive in some systems (e.g., XP).
• The kernel is (mostly) a library of service procedures shared by all user programs, but the kernel is protected:
    User code cannot access internal kernel data structures directly, and it can invoke the kernel only at well-defined entry points (system calls).
• Kernel code is like user code, but the kernel is privileged:
    The kernel has direct access to all hardware functions, and defines the machine entry points for interrupts and exceptions.

Protecting Entry to the Kernel

Protected events and kernel mode are the architectural foundations of kernel-based OS (Unix, XP, etc).
• The machine defines a small set of exceptional event types.
• The machine defines what conditions raise each event.
• The kernel installs handlers for each event at boot time.
    e.g., a table in kernel memory read by the machine

The machine transitions to kernel mode only on an exceptional event.
The kernel defines the event handlers.
Therefore the kernel chooses what code will execute in kernel mode, and when.

[Figure: user mode above kernel mode; an interrupt or exception enters the kernel, and trap/return crosses the boundary in both directions.]

Example: System Call Traps

User code invokes kernel services by initiating system call traps.
• Programs in C, C++, etc. invoke system calls by linking to a standard library of procedures written in assembly language.
    the library defines a stub or wrapper routine for each syscall
    stub executes a special trap instruction (e.g., chmk or callsys or int)
    syscall arguments/results passed in registers or user stack

read() in Unix libc.a library (executes in user mode), Alpha CPU architecture:

    #define SYSCALL_READ 27         # code for a read system call

    move arg0…argn, a0…an          # syscall args in registers A0..AN
    move SYSCALL_READ, v0           # syscall dispatch code in V0
    callsys                         # kernel trap
    move r1, _errno                 # errno = return status
    return
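The stub-and-trap sequence can be modeled as a toy simulation. This is only a sketch, not kernel code: the names `Cpu`, `SYSCALL_TABLE`, `trap`, `sys_read`, and `read_stub` are hypothetical, and the only value taken from the slide is the read syscall code 27.

```python
# Toy model of system call dispatch. All names here are illustrative
# assumptions; only SYSCALL_READ = 27 is taken from the slide.

SYSCALL_READ = 27

class Cpu:
    """Minimal CPU state: a mode flag plus the registers the stub uses."""
    def __init__(self):
        self.mode = "user"
        self.v0 = 0          # syscall dispatch code, then return status
        self.args = []       # stands in for argument registers a0..an

def sys_read(fd, nbytes):
    """Hypothetical kernel handler: pretend to read nbytes from fd."""
    return nbytes if fd >= 0 else -1

# The kernel installs this table at boot; user code cannot modify it.
SYSCALL_TABLE = {SYSCALL_READ: sys_read}

def trap(cpu):
    """Model of the trap instruction: enter kernel mode, dispatch, return."""
    cpu.mode = "kernel"                      # mode changes only via the trap
    handler = SYSCALL_TABLE.get(cpu.v0)
    if handler is None:
        cpu.v0 = -1                          # unknown syscall number
    else:
        cpu.v0 = handler(*cpu.args)
    cpu.mode = "user"                        # return from trap

def read_stub(cpu, fd, nbytes):
    """User-mode library stub, mirroring the assembly sequence."""
    cpu.args = [fd, nbytes]                  # move arg0..argn, a0..an
    cpu.v0 = SYSCALL_READ                    # move SYSCALL_READ, v0
    trap(cpu)                                # callsys
    return cpu.v0                            # return status
```

The point of the table is that user code chooses only a number; the kernel alone decides which handler that number runs.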

Faults

Faults are similar to system calls in some respects:
• Faults occur as a result of a process executing an instruction.
    Fault handlers execute on the process kernel stack; the fault handler may block (sleep) in the kernel.
• The completed fault handler may return to the faulted context.

But faults are different from syscall traps in other respects:
• Syscalls are deliberate, but faults are “accidents”.
    divide-by-zero, dereference invalid pointer, memory fault
• Not every execution of the faulting instruction results in a fault.
    may depend on memory state or register contents

The Role of Events

A CPU event is an “unnatural” change in control flow.
    Like a procedure call, an event changes the PC.
    Also changes mode or context (current stack), or both.
    Events do not change the current space!

The kernel defines a handler routine for each event type.
    Event handlers always execute in kernel mode.
    The specific types of events are defined by the machine.

Once the system is booted, every entry to the kernel occurs as a result of an event.
    In some sense, the whole kernel is a big event handler.

CPU Events: Interrupts and Exceptions

An interrupt is caused by an external event.
    device requests attention, timer expires, etc.
An exception is caused by an executing instruction.
    CPU requires software intervention to handle a fault or trap.

    control flow    unplanned    deliberate
    sync            fault        syscall trap
    async           interrupt    AST

[Figure labels: exception.cc; event handler (e.g., ISR: Interrupt Service Routine)]

AST: Asynchronous System Trap
    Also called a software interrupt or an Asynchronous or Deferred Procedure Call (APC or DPC)

Note: different “cultures” may use some of these terms (e.g., trap, fault, exception, event, interrupt) slightly differently.

Mode, Space, and Context

At any time, the state of each processor is defined by:
1. mode: given by the mode bit
    Is the CPU executing in the protected kernel or a user program?
2. space: defined by V->P translations currently in effect
    What address space is the CPU running in? Once the system is booted, it always runs in some virtual address space.
3. context: given by register state and execution stream
    Is the CPU executing a thread/process, or an interrupt handler?
    Where is the stack?

These are important because the mode/space/context determines the meaning and validity of key operations.

The Virtual Address Space

[Figure: an n-bit virtual address space from 0 to 2^n-1. From low to high addresses: text, data, BSS (grown by sbrk()), user stack (grown by jsr), args/env, then kernel text and kernel data.]

A typical process VAS includes:
• user regions in the lower half
    V->P mappings specific to each process
    accessible to user or kernel code
• kernel regions in upper half
    shared by all processes, but accessible only to kernel code

A VAS for a private address space system (e.g., Unix, NT/XP) executing on a typical 32-bit system (e.g., x86).

Process and Kernel Address Spaces

    n-bit virtual address space    32-bit virtual address space
    0                              0x0
    2^(n-1) - 1                    0x7FFFFFFF
    2^(n-1)                        0x80000000
    2^n - 1                        0xFFFFFFFF

• NT (XP?) on x86 subdivides the kernel region into an unpaged half and a (mostly) paged upper half at 0xC0000000 for page tables and I/O cache.
• Win95/98 uses the lower half of system space as a system-wide shared region.

The OS Directs the MMU

The OS controls the operation of the MMU to select:
(1) the subset of possible virtual addresses that are valid for each process (the process virtual address space);
(2) the physical translations for those virtual addresses;
(3) the modes of permissible access to those virtual addresses;
    read/write/execute
(4) the specific set of translations in effect at any instant.
    need rapid context switch from one address space to another

MMU completes a reference only if the OS “says it’s OK”.
MMU raises an exception if the reference is “not OK”.

Virtual Address Translation

Example: typical 32-bit architecture with 8KB pages.

Virtual address translation maps a virtual page number (VPN) to a physical page frame number (PFN): the rest is easy.

Deliver exception to OS if translation is not valid and accessible in requested mode.

[Figure: a virtual address (bits 29..0, prefixed by 00) split into a VPN and a 13-bit offset; address translation replaces the VPN with a PFN, and the physical address is PFN + offset.]
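A minimal sketch of the VPN-to-PFN mapping with 8KB pages follows. The dictionary-based page table, the mode letters "r", "w", "x", and the names `translate` and `TranslationFault` are assumptions made for illustration; real MMUs do this in hardware with machine-specific table formats.

```python
# Sketch of single-level translation with 8 KB pages (13-bit offset).
# The table layout and names are assumptions for illustration.

PAGE_SHIFT = 13                      # 8 KB pages: 2**13 bytes
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1

class TranslationFault(Exception):
    """Stands in for the exception the MMU delivers to the OS."""

def translate(page_table, vaddr, mode):
    """Map a virtual address to a physical address, or raise a fault.

    page_table maps VPN -> (PFN, set of permitted modes), where modes
    are drawn from "r", "w", "x".
    """
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & OFFSET_MASK
    entry = page_table.get(vpn)
    if entry is None:
        raise TranslationFault(f"no valid translation for VPN {vpn}")
    pfn, allowed = entry
    if mode not in allowed:
        raise TranslationFault(f"mode {mode!r} not permitted on VPN {vpn}")
    return (pfn << PAGE_SHIFT) | offset

# VPN 0 is read/execute (like text); VPN 1 is read/write (like data).
page_table = {0: (5, {"r", "x"}), 1: (9, {"r", "w"})}
paddr = translate(page_table, 0x2004, "w")   # VPN 1, offset 4
```

Only the VPN is looked up; the offset passes through unchanged, which is why "the rest is easy".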

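Completing a VM Reference

[Figure: flowchart. Start here: the MMU probes the TLB. If the access is valid, it accesses physical memory. Otherwise it probes the page table and loads the TLB, or raises an exception to the OS. The OS asks whether this is a page fault. If not, it signals the process. If so, it allocates a frame and either fetches the page from disk (if the page is on disk) or zero-fills it, then loads the TLB.]

Virtual Memory as a Cache

[Figure: an executable file (header, text, data, idata, wdata, symbol table, etc.) supplies program sections that become process segments in virtual memory (big): text, data, BSS, user stack, args/env, kernel. Virtual-to-physical translations map these onto physical page frames in physical memory (small). Pages move between physical memory and backing storage by page fetch and pageout/eviction.]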
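The reference path in the flowchart (probe the TLB, probe the page table, fault to the OS, fetch from disk or zero-fill, load the TLB) can be sketched as a toy simulation. Everything named here (`Vm`, `SegFault`, the `log` list, the 8-byte stand-in for a page) is a hypothetical illustration; eviction and access-mode checks are left out.

```python
# Toy model of completing a VM reference. Names and data layout are
# assumptions; eviction and access-mode checks are omitted.

class SegFault(Exception):
    """Models the signal sent to a process for an illegal reference."""

class Vm:
    def __init__(self, valid_vpns, on_disk):
        self.valid = set(valid_vpns)   # VPNs in the address space
        self.on_disk = dict(on_disk)   # VPN -> page contents on disk
        self.page_table = {}           # VPN -> PFN for resident pages
        self.tlb = {}                  # cached subset of page_table
        self.frames = []               # physical memory, indexed by PFN
        self.log = []                  # trace of the path taken

    def reference(self, vpn):
        """Return the PFN for vpn, faulting the page in if needed."""
        if vpn in self.tlb:                       # probe TLB
            self.log.append("tlb hit")
            return self.tlb[vpn]
        if vpn not in self.page_table:            # probe page table
            self.handle_fault(vpn)                # raise exception to OS
        self.tlb[vpn] = self.page_table[vpn]      # load TLB
        self.log.append("tlb load")
        return self.tlb[vpn]

    def handle_fault(self, vpn):
        """OS fault handler: legal page fault, or signal the process."""
        if vpn not in self.valid:
            self.log.append("signal")
            raise SegFault(vpn)
        if vpn in self.on_disk:
            self.log.append("fetch")
            contents = self.on_disk[vpn]          # fetch from disk
        else:
            self.log.append("zero-fill")
            contents = bytes(8)                   # stand-in for a zeroed page
        self.frames.append(contents)              # allocate a frame
        self.page_table[vpn] = len(self.frames) - 1
```

A first reference to a page takes the slow path through the fault handler; later references to the same page hit in the TLB and never enter the OS.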

Wrapping Up

There is lots more to say about address translation, but we don’t want to spend too much time on it now.
• On NT/x86, each address space has a page directory
    • One page: 4K bytes, 1024 4-byte entries (PTEs)
    • Each PDIR entry points to a “page table”
    • Each “page table” is one page with 1024 PTEs
    • Each PTE maps one 4K page of the address space
    • Each page table maps 4MB of memory: 1024*4K
    • One PDIR for a 4GB address space, max 4MB of tables
    • Load PDIR base address into a register to activate the VAS

What did we just do?

We used special machine features to “virtualize” a core resource: memory.
• Each process/space only gets some of the memory.
• The OS decides how much you get.
• The OS decides what parts of the program and its data are in memory, and what parts you will have to wait for.
• You can’t tell exactly what you have.
• The OS isolates each process from its competitors.

Virtualization involves a clean abstract interface with a level of indirection that enables the system to interpose on important actions, securely and transparently, in order to cover up ugly details of the environment.
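The page directory arithmetic can be checked with a short sketch. The sizes (4K pages, 1024 entries per table) come from the slide; the dictionary-based tables and the names `split` and `walk` are assumptions for illustration, not the x86 hardware format.

```python
# Sketch of a two-level lookup with the sizes given above: 4 KB pages and
# 1024 entries per table. The dict-based tables are an assumption.

PAGE_SIZE = 4096
ENTRIES = 1024

def split(vaddr):
    """Split a 32-bit virtual address into (PDIR index, PT index, offset)."""
    offset = vaddr % PAGE_SIZE
    vpn = vaddr // PAGE_SIZE
    return vpn // ENTRIES, vpn % ENTRIES, offset

def walk(pdir, vaddr):
    """Return the physical address, or None if a level is unmapped."""
    pdi, pti, offset = split(vaddr)
    page_table = pdir.get(pdi)
    if page_table is None:
        return None
    pfn = page_table.get(pti)
    if pfn is None:
        return None
    return pfn * PAGE_SIZE + offset

# Sizes quoted on the slide, recomputed.
bytes_per_page_table = ENTRIES * PAGE_SIZE         # 4 MB mapped per page table
address_space = ENTRIES * bytes_per_page_table     # 4 GB per page directory
max_table_bytes = ENTRIES * PAGE_SIZE              # 1024 page tables of 4 KB

pdir = {1: {2: 7}}                 # one mapped page: PDIR slot 1, PT slot 2
vaddr = (1 << 22) | (2 << 12) | 0x34
```

Because unmapped page directory slots need no page table at all, a sparse address space uses far less than the 4MB maximum.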

Sharing the CPU

We have seen how an OS can share and “virtualize” one hardware resource: memory.

How does an OS share the CPU among multiple running programs (processes)?
• Safely
• Fairly (?)
• Efficiently

Sharing Disks

How should the OS mediate/virtualize/share the disk(s) among multiple users or programs?
• Safely
• Fairly
• Securely
• Efficiently
• Effectively
• Robustly

Classical View: The Questions

The basic issues/questions in this course are how to:
• allocate memory and storage to multiple programs?
• share the CPU among concurrently executing programs?
• suspend and resume programs?
• share data safely among concurrent activities?
• protect one executing program’s storage from another?
• protect the code that implements the protection, and mediates access to resources?
• prevent rogue programs from taking over the machine?
• allow programs to interact safely?

A Simple Page Table

Each process/VAS has its own page table. Virtual addresses are translated relative to the current page table.

In this example, each VPN j maps to PFN j, but in practice any physical frame may be used for any virtual page.

The page tables are themselves stored in memory; a protected register holds a pointer to the current page table.

[Figure: a user virtual address (page #i, offset) indexes the process page table (entries PFN 0, PFN 1, ..., PFN i); PFN i + offset selects a location among the physical memory page frames.]
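The idea of one page table per process plus a protected pointer to the current table can be sketched as follows. The 4096-byte page size and the names `Machine`, `switch_to`, and `translate` are assumptions for illustration; the slide does not specify them.

```python
# Sketch: one page table per process and a "current page table" register.
# Page size and all names are assumptions for illustration.

PAGE_SIZE = 4096

class Machine:
    def __init__(self):
        self.current = None            # protected register: current page table

    def switch_to(self, page_table):
        """Kernel-only operation: select which address space is active."""
        self.current = page_table

    def translate(self, vaddr):
        """Translate relative to whichever page table is current."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        pfn = self.current[vpn]        # KeyError models an invalid reference
        return pfn * PAGE_SIZE + offset

# Two processes use the same virtual page 0 but different physical frames.
table_a = {0: 3, 1: 8}
table_b = {0: 5}

m = Machine()
m.switch_to(table_a)
addr_a = m.translate(0x10)
m.switch_to(table_b)
addr_b = m.translate(0x10)
```

The same virtual address resolves to different physical addresses depending on which table is current, which is how each process's storage is kept separate from the others.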

Page Tables (2)

[Figure: two-level page tables, from Tanenbaum. A 32-bit address with 2 page table fields indexes a top-level page table, whose entries point to second-level page tables.]