COS 318: Operating Systems
Protection and Virtual Memory

Jaswinder Pal Singh
Computer Science Department
Princeton University

(http://www.cs.princeton.edu/courses/cos318/)

Outline

u Protection mechanisms and OS structures
u Virtual memory: protection and address translation

Protection Issues

u CPU
l Kernel has the ability to take the CPU away from users, to prevent a user from using the CPU forever
l Users should not have such an ability
u Memory
l Prevent a user from accessing others’ data
l Prevent users from modifying kernel code and data structures
u I/O
l Prevent users from performing “illegal” I/Os
u Question
l What’s the difference between protection and security?

Architecture Support: Privileged Mode

u An interrupt or exception (INT) switches the CPU from user mode to kernel mode; a special instruction (IRET) returns to user mode

User mode
• Regular instructions
• Access user memory

Kernel (privileged) mode
• Regular instructions
• Privileged instructions
• Access user memory
• Access kernel memory

Privileged Instruction Examples

u Memory address mapping
u Flush or invalidate data cache
u Invalidate TLB entries
u Load and read system registers
u Change processor modes from kernel to user
u Change the voltage and frequency of processor
u Halt a processor
u Reset a processor
u Perform I/O operations

Monolithic

u All kernel routines are together, linked in a single large executable
l Each can call any other
l Services and utilities
u A system call interface (user programs enter the kernel via syscall)
u Examples
l Linux, BSD Unix, Windows, …
u Pros
l Shared kernel space
l Good performance
u Cons (many things)
l Instability: a crash in any procedure brings the system down
l Inflexible / hard to maintain, extend

Layered Structure

u Hiding information at each layer
u Layered dependency (Level N depends only on the levels below it, down to the hardware)
u Examples
l THE (6 layers)
• Mostly for functionality splitting
l MS-DOS (4 layers)
u Pros
l Layered abstraction
u Cons
l Inefficiency
l Inflexible

x86 Protection Rings

u Privileged instructions can be executed only when the current privilege level (CPL) is 0
u Level 0: kernel
u Level 1: operating system services
u Level 2: (typically unused)
u Level 3: applications

Microkernel

u Services implemented as regular processes
u Micro-kernel obtains services for users by messaging with the service processes (user program → syscall → µ-kernel entry → Services 1…k)
u Examples
l Mach, Taos, L4, OS-X
u Pros?
l Flexibility
l Fault isolation
u Cons?
l Inefficient (boundary crossings)
l Inconvenient to share data between kernel and services
l Just shifts the problem?

Virtual Machine

u Virtual machine monitor
l Virtualize hardware
l Run several OSes (Apps on OS 1 … OS k, all on the Virtual Machine Monitor, on the raw hardware)
l Examples
• IBM VM/370
• Java VM
• VMWare, Xen
u What would you use a virtual machine for?

Two Popular Ways to Implement VMM

u VMM runs on hardware
l Stack: Win Apps / Win Vista and Linux Apps / Linux, on the VMM, on the hardware
u VMM as an application
l Stack: Win Apps / Win Vista, on the VMM, on Linux, on the hardware
u (A special lecture later in the semester)

To Support Multiprogramming, We Need “Protection”

u Kernel vs. user mode
u Virtual address spaces and address translation

Physical memory                  Abstraction: virtual memory
No protection                    Each program isolated from all others and from the OS
Limited size                     Illusion of “infinite” memory
Sharing visible to programs      Transparent --- can’t tell if memory is shared
                                 Virtual addresses are translated to physical addresses

The Big Picture

u DRAM is fast, but relatively expensive
u Disk is inexpensive, but slow
l 100X less expensive
l 100,000X longer latency
l 1000X less bandwidth
u Our goals
l Run programs as efficiently as possible
l Make the system as safe as possible
(CPU — Memory — Disk)

Issues

u Many processes
l The more processes a system can handle, the better
u Address space size
l Many small processes whose total size may exceed memory
l Even one process may exceed the physical memory size
u Protection
l A user process should not crash the system
l A user process should not do bad things to other processes

Consider A Simple System

u Only physical memory
l Applications use physical memory directly
u Run three processes
l Email, browser, gcc
(Physical memory layout: email near x7000–x9000, gcc near x5000–x7000, browser near x2500–x5000, free below x2500)
u What if
l gcc has an address error?
l browser writes at x7050?
l email needs to expand?
l browser needs more memory than is on the machine?

Handling Protection

u Errors in one process should not affect others
u For each process, check each load and store instruction to allow only legal memory references
(CPU → address → Check → Physical memory; an illegal reference raises an error)

Handling Finiteness: Relocation

u A process should be able to run regardless of where its data are physically placed, or of the physical memory size
u Give each process a static “fake” address space that is large, contiguous, and entirely its own
u As a process runs, relocate or map each load and store to addresses in actual physical memory
(CPU → address → Check & relocate → Physical memory)

Virtual Memory

u Flexible
l Processes (and data) can move in memory as they execute, and be part in memory and part on disk
u Simple
l Applications generate loads and stores to addresses in the contiguous, large, “fake” address space
u Efficient
l 20/80 rule: 20% of memory gets 80% of the references
l Keep the 20% in physical memory
u Design issues
l How is protection enforced?
l How are processes and data relocated?
l How is memory partitioned?

Address Mapping and Granularity

u Must have some “mapping” mechanism
l Map virtual addresses to physical addresses in RAM or disk
u Mapping must have some granularity
l Finer granularity provides more flexibility
l Finer granularity requires more mapping information

Generic Address Translation

u The Memory Management Unit (MMU) translates the virtual address into a physical address for each load and store
u A combination of hardware and (privileged) software controls the translation
u CPU view
l Virtual addresses
l Each process has its own memory space [0, high]: its address space
u Memory or I/O device view
l Physical addresses
(CPU → virtual address → MMU → physical address → physical memory or I/O device)

Goals of Translation

u Implicit translation for each memory reference
u A hit should be very fast
u Trigger an exception on a miss
u Protected from user’s errors
(Relative access times: Registers 1x, L1 2-3x, L2-L3 10-20x, Memory 100-300x, Disk 20M-30Mx, with paging between memory and disk)

Address Translation Methods

u Base and bounds
u Segmentation
u Paging
u Multilevel translation
u Inverted page tables

Base and Bound (or Limit)

u Built in Cray-1
u CPU has base and bound registers
u Base holds the start address of the running process; bound is the length of its addressable space
u Protection
l A process can only access physical memory in [base, base+bound]
l virtual address > bound → error; otherwise physical address = virtual address + base
u On a context switch
l Save/restore the base and bound registers
u Pros
l Simple
l Each program is loaded into a contiguous region of physical memory
l Hardware cost: 2 registers, an adder, a comparator
u Cons
l Can’t fit all processes; have to swap
l Fragmentation in memory
l Must relocate processes when they grow
l Compare and add on every instruction

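The comparator-and-adder datapath above is easy to model; the following is a minimal sketch (only the base value 6250 comes from the slide; the bound value is an assumption):

```python
def translate_base_bound(vaddr, base, bound):
    """Relocate a virtual address using base/bound registers.

    Mirrors the hardware on the slide: a comparator checks the bound,
    an adder applies the base. Raises on an out-of-range access."""
    if vaddr >= bound:              # comparator: must stay within the bound
        raise MemoryError("protection error: address out of bounds")
    return base + vaddr             # adder: relocate by the base register

# With the slide's base of 6250 and an assumed bound of 0x2000:
paddr = translate_base_bound(0x100, base=6250, bound=0x2000)  # 6250 + 0x100
```

Note that this is exactly the “compare and add on every instruction” cost listed under Cons: every reference pays for one comparison and one addition.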
Segmentation

u Each process has a table of (seg, size)
u Treats each (seg, size) as a fine-grained (base, bound) pair
u Protection
l Each entry has access bits: (nil, read, write, exec)
l offset > segment size → error
u On a context switch
l Save/restore the table in kernel memory
u Pros
l Efficient: the programmer knows the program and so its segments
l Provides logical protection
l Easy to share data
u Cons
l Complex management
l Fragmentation

Segmentation Example

(assume a 2-bit segment ID and a 12-bit segment offset)

Segment   v-segment #   p-segment start   segment size
code      (00)          0x4000            0x700
data      (01)          0x0000            0x500
-         (10)          0x0000            0x0
stack     (11)          0x2000            0x1000

Virtual addresses 0x0000-0x06ff (code) map to physical 0x4000-0x46ff,
0x1000-0x14ff (data) map to 0x0000-0x04ff, and
0x3000-0x3fff (stack) map to 0x2000-0x2fff.

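The example mappings can be checked mechanically. A small sketch using the slide’s segment table (access bits omitted for brevity):

```python
# Segment table from the slide: 2-bit segment ID, 12-bit segment offset.
# Each entry is (physical start, size).
SEG_TABLE = {
    0b00: (0x4000, 0x700),   # code
    0b01: (0x0000, 0x500),   # data
    0b10: (0x0000, 0x000),   # unused
    0b11: (0x2000, 0x1000),  # stack
}

def translate_seg(vaddr):
    seg = (vaddr >> 12) & 0x3          # top 2 bits select the segment
    offset = vaddr & 0xFFF             # low 12 bits are the offset
    base, size = SEG_TABLE[seg]
    if offset >= size:                 # per-segment bound check
        raise MemoryError("segmentation error")
    return base + offset

# Reproduces the slide's mappings:
assert translate_seg(0x06FF) == 0x46FF   # code:  0x0000-0x06ff -> 0x4000-0x46ff
assert translate_seg(0x1000) == 0x0000   # data:  0x1000-0x14ff -> 0x0000-0x04ff
assert translate_seg(0x3FFF) == 0x2FFF   # stack: 0x3000-0x3fff -> 0x2000-0x2fff
```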
Segmentation Example (cont’d)

Virtual memory for strlen(x):        Physical memory for strlen(x):
Main:   240  store 1108, r2          Main:   4240  store 1108, r2
        244  store pc+8, r31                 4244  store pc+8, r31
        248  jump 360                        4248  jump 360
        24c  …                               424c  …
strlen: 360  loadbyte (r2), r3       strlen: 4360  loadbyte (r2), r3
        …                                    …
        420  jump (r31)                      4420  jump (r31)
x:      108  a b c \0                x:      1108  a b c \0

Paging

u Use a fixed-size unit called a page instead of a variable-size segment
u Use a page table to translate (VPage # | offset → PPage # | offset)
u Various bits in each entry
u Context switch
l Similar to segmentation
u What should be the page size?
u Pros
l Simple allocation
l Easy to share
u Cons
l Big table
l How to deal with holes?

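The page-table lookup can be sketched in a few lines (the 4KB page size and the table contents here are assumptions for illustration, not from the slide):

```python
PAGE_SIZE = 4096                 # assumed 4KB pages
page_table = {0: 7, 1: 3}        # VPage # -> PPage # (toy contents)

def translate_page(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)       # split into VPage # | offset
    if vpn not in page_table:                    # no mapping -> page fault
        raise MemoryError("page fault")
    return page_table[vpn] * PAGE_SIZE + offset  # PPage # | offset
```

Because every page is the same size, allocation is simple (any free frame will do) and sharing is easy (two tables can name the same frame), which is exactly the Pros above.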
Paging Example

(page size: 4 bytes; the three virtual pages holding “a b c d”, “e f g h”, and “i j k l” are mapped through the page table to scattered 4-byte frames in physical memory)

How Many PTEs Do We Need?

u Assume a 4KB page
l The page offset needs the “low order” 12 bits
u Worst case for a 32-bit address machine
l # of processes × 2^20 PTEs
l 2^20 PTEs per page table (~4 MBytes), but there might be 10K processes; the tables won’t fit in memory together
u What about a 64-bit address machine?
l # of processes × 2^52 PTEs
l A page table cannot even fit on a disk (2^52 PTEs = 16 PBytes)!

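The arithmetic above checks out directly, assuming 4-byte PTEs (the slide’s ~4 MBytes figure implies this size):

```python
OFFSET_BITS = 12                       # 4KB page -> 12 offset bits
PTE_SIZE = 4                           # assumed 4-byte page table entries

ptes_32bit = 2 ** (32 - OFFSET_BITS)   # 2^20 PTEs per process
table_32bit = ptes_32bit * PTE_SIZE    # ~4 MBytes per process

ptes_64bit = 2 ** (64 - OFFSET_BITS)   # 2^52 PTEs per process
table_64bit = ptes_64bit * PTE_SIZE    # 16 PBytes: cannot fit on a disk

assert table_32bit == 4 * 2**20        # 4 MBytes
assert table_64bit == 16 * 2**50       # 16 PBytes
```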
Segmentation with Paging

Virtual address: Vseg # | VPage # | offset

u The segment table maps each Vseg # to (page table, size); a VPage # beyond the segment size → error
u Every segment has its own page table
u The page table maps VPage # to PPage #; physical address = PPage # | offset

Segmentation with Paging – Intel 386

u The Intel 386 uses segmentation with paging for memory management, with a two-level paging scheme.

Intel 386 Address Translation

Virtual address: dir | table | offset

u The dir field indexes the page directory to find a page table; the table field indexes that page table to find the PTE; the PTE’s frame number is combined with the offset to form the physical address

Multiple-Level Page Tables

u What does this buy us?

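A sketch of the two-level walk (the 10/10/12 bit split follows the 386; the directory contents here are made up). It also answers “what does this buy us?”: unused regions of the address space need no second-level table at all, so the sparse directory stays small:

```python
OFFSET_BITS, TABLE_BITS = 12, 10     # 386-style split: dir | table | offset

# Sparse directory: only regions in use get a second-level page table.
directory = {0: {0: 5, 1: 9}}        # dir index -> {table index -> frame #}

def translate_2level(vaddr):
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    table_i = (vaddr >> OFFSET_BITS) & ((1 << TABLE_BITS) - 1)
    dir_i = vaddr >> (OFFSET_BITS + TABLE_BITS)
    second = directory.get(dir_i)            # first memory access: directory
    if second is None or table_i not in second:
        raise MemoryError("page fault")      # missing table or missing PTE
    return (second[table_i] << OFFSET_BITS) | offset  # second access: PTE
```

Note the cost: each translation now takes two extra memory accesses (directory, then page table), which is what the TLB later in the lecture exists to hide.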
Inverted Page Tables

u Main idea
l One PTE for each physical page frame
l Hash (Vpage, pid) to get the PPage #
u Pros
l Small page table for a large address space
u Cons
l Lookup is difficult
l Overhead of managing hash chains, etc.

Virtual-To-Physical Lookups

u Programs only know virtual addresses
l Each program or process sees addresses from 0 to some high address
u Each virtual address must be translated
l May involve walking through the hierarchical page table
l Since the page table is stored in memory, a program memory access may require several actual memory accesses
u Solution
l Cache the “active” part of the page table in a very fast memory

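A toy model of the inverted table, one entry per physical frame. A real design hashes (Vpage, pid) into chains; a linear scan over the frame-indexed table is enough to show the idea, and why lookup is the hard part:

```python
NFRAMES = 8
# One entry per physical frame, holding its owner (pid, vpage) or None.
inverted = [None] * NFRAMES

def install(pid, vpage, frame):
    """Record that a physical frame now holds (pid, vpage)."""
    inverted[frame] = (pid, vpage)

def lookup(pid, vpage):
    """Find the frame holding (pid, vpage); scanning stands in for hashing."""
    for ppage, owner in enumerate(inverted):
        if owner == (pid, vpage):
            return ppage
    raise MemoryError("page fault")
```

The table size is NFRAMES regardless of how large the virtual address spaces are, which is the Pro above; the search (or hash-chain management) on every miss is the Con.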
Translation Look-aside Buffer (TLB)

u The TLB holds recently used (VPage #, PPage #) translations
u On a hit, the PPage # is combined with the offset to form the physical address
u On a miss, the real page table is consulted

Bits in a TLB Entry

u Common (necessary) bits
l Virtual page number
l Physical page number: the translated address
l Valid bit
l Access bits: kernel and user (none, read, write)
u Optional (useful) bits
l Process tag
l Reference bit
l Modify bit
l Cacheable bit

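A sketch of a TLB probe using the valid bit and a process tag (the entry contents and the 4KB page size are assumptions for illustration):

```python
PAGE_SIZE = 4096

# Toy TLB entries carrying the "necessary" bits plus a process tag.
tlb = [
    {"valid": True, "pid": 1, "vpage": 2, "ppage": 7, "access": "rw"},
]

def tlb_lookup(pid, vaddr):
    """Return the physical address on a hit, or None to signal a miss
    (on a miss, the real page table would be walked)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    for e in tlb:
        if e["valid"] and e["pid"] == pid and e["vpage"] == vpn:
            return e["ppage"] * PAGE_SIZE + offset   # hit
    return None                                      # miss
```

With a process tag, the same vpage from a different pid misses instead of wrongly hitting, which is why the tag avoids flushing the whole TLB on a context switch.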
Hardware-Controlled TLB

u On a TLB miss
l If the page containing the PTE is valid (in memory), hardware loads the PTE into the TLB
• Write back and replace an entry if there is no free entry
l Generate a fault if the page containing the PTE is invalid, or if there is a protection fault
l VM software performs fault handling
l Restart the faulting instruction
u On a TLB hit, hardware checks the valid bit
l If valid, pointer to the page frame in memory
l If invalid, the hardware generates a page fault
• Perform page fault handling
• Restart the faulting instruction

Software-Controlled TLB

u On a TLB miss, software is invoked
l Write back if there is no free entry
l Check if the page containing the PTE is in memory
l If not, perform page fault handling
l Load the PTE into the TLB
l Restart the faulting instruction
u On a TLB hit, the hardware checks the valid bit
l If valid, pointer to the page frame in memory
l If invalid, the hardware generates a page fault
• Perform page fault handling
• Restart the faulting instruction

Cache vs. TLB

u A cache maps an address to data; a TLB maps a vpage # to a ppage #; both fall through to memory on a miss
u Similarities
l Cache a portion of memory
l Write back on a miss
u Differences
l Associativity
l Consistency

TLB Related Issues

u Which TLB entry should be replaced?
l Random
l Pseudo LRU
u What happens on a context switch?
l Process tag: invalidate the appropriate TLB entries
l No process tag: invalidate the entire TLB contents
u What happens when changing a page table entry?
l Change the entry in memory
l Invalidate the TLB entry

Consistency Issues

u “Snoopy” cache protocols (hardware)
l Maintain consistency with DRAM, even when DMA happens
u Consistency between DRAM and TLBs (software)
l You need to flush the related TLB entries whenever changing a page table entry in memory
u TLB “shoot-down”
l On multiprocessors, when you modify a page table entry, you need to flush all related TLB entries on all processors. Why?

Summary: Virtual Memory

u Virtual memory
l Virtualization makes software development easier and enables better memory resource utilization
l Separate address spaces provide protection and isolate faults
u Address translation
l Translate every memory operation using a table (page table, segment table)
l Speed: cache frequently used translations
u Result
l Every process has a private address space
l Programs run independently of the actual physical memory addresses used, and of the actual memory size
l Protection: processes only access memory they are allowed to
