COS 318: Operating Systems
Protection and Virtual Memory

Jaswinder Pal Singh
Computer Science Department, Princeton University
(http://www.cs.princeton.edu/courses/cos318/)

Outline
- Protection mechanisms and OS structures
- Virtual memory: protection and address translation
Protection Issues
- CPU
  - The kernel must be able to take the CPU away from users, to prevent a user from holding the CPU forever
  - Users should not have such an ability
- Memory
  - Prevent a user from accessing others' data
  - Prevent users from modifying kernel code and data structures
- I/O
  - Prevent users from performing "illegal" I/Os
- Question: what's the difference between protection and security?

Architecture Support: Privileged Mode
- User mode
  - Regular instructions
  - Access user memory
- Kernel (privileged) mode
  - Regular instructions and privileged instructions
  - Access user memory and kernel memory
- An interrupt or exception (INT) switches from user mode to kernel mode
- A special instruction (IRET) returns from kernel mode to user mode
Privileged Instruction Examples
- Memory address mapping
- Flush or invalidate the data cache
- Invalidate TLB entries
- Load and read system registers
- Change processor mode from kernel to user
- Change the voltage and frequency of the processor
- Halt a processor
- Reset a processor
- Perform I/O operations

Monolithic
- All kernel routines are linked together in a single large executable
  - Each can call any other
  - Services and utilities live in the kernel
- User programs enter the kernel through a system call interface (syscall)
- Examples: Linux, BSD Unix, Windows, ...
- Pros
  - Shared kernel space
  - Good performance
- Cons
  - Instability: a crash in any procedure brings the system down
  - Inflexible / hard to maintain and extend
Layered Structure
- Hiding information at each layer
- Layered dependency: Level N builds on Level N-1, down to the hardware
- Examples
  - THE (6 layers), mostly for functionality splitting
  - MS-DOS (4 layers)
- Pros
  - Layered abstraction
- Cons
  - Inefficiency
  - Inflexible

x86 Protection Rings
- Privileged instructions can be executed only when the current privilege level (CPL) is 0
- Level 0: operating system kernel
- Levels 1-2: operating system services
- Level 3: applications
Microkernel
- Services implemented as regular user-level processes
- The microkernel obtains services for users by messaging with the service processes
- Examples: Mach, Taos, L4, OS X
- Pros
  - Flexibility
  - Fault isolation
- Cons
  - Inefficient (boundary crossings)
  - Inconvenient to share data between the kernel and services
  - Just shifts the problem?

Virtual Machine
- Virtual machine monitor (VMM)
  - Virtualizes the hardware
  - Runs several OSes, each in its own virtual machine (VM1 ... VMk) on the raw hardware
- Examples
  - IBM VM/370
  - Java VM
  - VMware, Xen
- What would you use a virtual machine for?
Two Popular Ways to Implement a VMM
- VMM runs directly on the hardware, hosting guest OSes (e.g., Win Vista and Linux, each with its own apps, above one VMM)
- VMM runs as an application on a host OS (e.g., a VMM on Linux hosting Win Vista)
- (A special lecture later in the semester)

Memory Protection
To support multiprogramming, we need "protection":
- Kernel vs. user mode
- Virtual address spaces and address translation

Physical memory:
- No protection
- Limited size
- Sharing visible to programs

Abstraction: virtual memory:
- Each program isolated from all others and from the OS
- Illusion of "infinite" memory
- Transparent: a program can't tell if memory is shared
- Virtual addresses are translated to physical addresses
The Big Picture
- DRAM is fast, but relatively expensive
- Disk is inexpensive, but slow
  - 100X less expensive
  - 100,000X longer latency
  - 1000X less bandwidth
- Our goals
  - Run programs as efficiently as possible
  - Make the system as safe as possible
(Memory hierarchy: CPU, memory, disk)

Issues
- Many processes
  - The more processes a system can handle, the better
- Address space size
  - Many small processes whose total size may exceed memory
  - Even one process may exceed the physical memory size
- Protection
  - A user process should not crash the system
  - A user process should not do bad things to other processes
Consider A Simple System
- Only physical memory
  - Applications use physical memory directly
- Run three processes
  - email, browser, gcc
- Physical memory layout: top at x9000; email at x7000; gcc at x5000; browser at x2500; free space down to x0000
- What if
  - gcc has an address error?
  - browser writes at x7050?
  - email needs to expand?
  - browser needs more memory than is on the machine?

Handling Protection
- Errors in one process should not affect others
- For each process, check each load and store instruction to allow only legal memory references
(The CPU issues an address; hardware checks it, raising an error or passing it on to physical memory)
Handling Finiteness: Relocation
- A process should be able to run regardless of where its data are physically placed, or of the physical memory size
- Give each process a "fake" address space that is large, static, contiguous, and entirely its own
- As the process runs, relocate or map each load and store to an address in actual physical memory
(The CPU issues an address; hardware checks and relocates it into physical memory)

Virtual Memory
- Flexible
  - Processes (and data) can move in memory as they execute, and be part in memory and part on disk
- Simple
  - Applications generate loads and stores to addresses in the contiguous, large, "fake" address space
- Efficient
  - 20/80 rule: 20% of memory gets 80% of the references
  - Keep that 20% in physical memory
- Design issues
  - How is protection enforced?
  - How are processes and data relocated?
  - How is memory partitioned?
Address Mapping and Granularity
- Must have some "mapping" mechanism
  - Map virtual addresses to physical addresses in RAM or on disk
- The mapping must have some granularity
  - Finer granularity provides more flexibility
  - Finer granularity requires more mapping information

Generic Address Translation
- The memory management unit (MMU) translates a virtual address into a physical address for each load and store
- A combination of hardware and (privileged) software controls the translation
- CPU view
  - Virtual addresses
  - Each process has its own address space [0, high]
- Memory or I/O device view
  - Physical addresses
Goals of Translation
- Implicit translation for each memory reference
- A hit should be very fast
- Trigger an exception on a miss
- Protected from user errors
- Relative access times: registers 1x; L1 cache 2-3x; L2-L3 cache 10-20x; memory 100-300x; disk (paging) 20M-30Mx

Address Translation Methods
- Base and bounds registers
- Segmentation
- Paging
- Multilevel translation
- Inverted page tables
Base and Bound (or Limit)
- Built in to the Cray-1
- The CPU has base and bound registers
  - Base holds the start address of the running process; bound is the length of its addressable space
- Protection
  - A process can only access physical memory in [base, base+bound]
  - On each reference: if the virtual address exceeds bound, raise an error; otherwise physical address = virtual address + base (e.g., base 6250 maps virtual [0, bound) to physical [6250, 6250+bound))
- On a context switch
  - Save/restore the base and bound registers
- Each program is loaded into a contiguous region of physical memory
- Hardware cost: 2 registers, an adder, and a comparator
- Pros
  - Simple
- Cons
  - Can't fit all processes; have to swap
  - Fragmentation in memory
  - Must relocate processes when they grow
  - Compare and add on every instruction
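The check-and-add above can be sketched in a few lines (a minimal sketch; the register values and the fault type are illustrative, not real MMU code):

```python
class ProtectionFault(Exception):
    """Raised when a reference falls outside [base, base+bound)."""
    pass

def translate(vaddr, base, bound):
    """Base-and-bounds translation: one compare, one add per reference."""
    if vaddr >= bound:              # outside this process's addressable space
        raise ProtectionFault(f"address {vaddr:#x} exceeds bound {bound:#x}")
    return vaddr + base             # relocate into physical memory

# Process loaded at physical 0x6250 with a 0x1000-byte address space:
assert translate(0x0100, base=0x6250, bound=0x1000) == 0x6350
```

Note that relocation comes for free here: the same program runs unmodified wherever the OS sets `base`.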
Segmentation
- Each process has a table of (seg, size) entries
- Treats each (seg, size) as a fine-grained (base, bound)
- Protection
  - Each entry also has protection bits (nil, read, write, exec)
- On a context switch
  - Save/restore the table (kept in kernel memory)
- Pros
  - Efficient: the programmer knows the program and hence its segments
  - Provides logical protection
  - Easy to share data
- Cons
  - Complex management
  - Fragmentation

Segmentation example
(assume a 2-bit segment ID and a 12-bit segment offset)
- A virtual address splits into (segment #, offset); if offset >= size, raise an error, else physical address = p-segment start + offset

  segment    v-segment #   p-segment start   segment size
  code       00            0x4000            0x700
  data       01            0x0000            0x500
  (unused)   10            0x0000            0x000
  stack      11            0x2000            0x1000

- Resulting mapping: virtual 0x0000-0x06ff -> physical 0x4000-0x46ff (code); virtual 0x1000-0x14ff -> physical 0x0000-0x04ff (data); virtual 0x3000-0x3fff -> physical 0x2000-0x2fff (stack)
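The 2-bit-segment / 12-bit-offset example can be sketched as (segment table values taken from the example; the fault type is illustrative):

```python
# seg id -> (p-segment start, segment size), from the example above
SEG_TABLE = {
    0b00: (0x4000, 0x700),    # code
    0b01: (0x0000, 0x500),    # data
    0b10: (0x0000, 0x000),    # unused: size 0, so any access faults
    0b11: (0x2000, 0x1000),   # stack
}

def translate(vaddr):
    """Split a 14-bit virtual address into (2-bit seg, 12-bit offset)."""
    seg, offset = vaddr >> 12, vaddr & 0xFFF
    start, size = SEG_TABLE[seg]
    if offset >= size:                      # fine-grained bound check
        raise Exception("segmentation fault")
    return start + offset

assert translate(0x0123) == 0x4123   # code segment
assert translate(0x3000) == 0x2000   # stack segment
```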
Segmentation example (cont'd)
Virtual memory for strlen(x):
  Main:   240  store 1108, r2
          244  store pc+8, r31
          248  jump 360
          24c  ...
  strlen: 360  loadbyte (r2), r3
          ...
          420  jump (r31)
  x:      108  a b c \0

Physical memory for strlen(x):
  Main:   4240  store 1108, r2
          4244  store pc+8, r31
          4248  jump 360
          424c  ...
  strlen: 4360  loadbyte (r2), r3
          ...
          4420  jump (r31)
  x:      1108  a b c \0

Paging
- Use a fixed-size unit called a page instead of a segment
- Use a page table to translate: virtual address = (VPage #, offset); the page table maps VPage # to PPage #, and physical address = (PPage #, offset)
- Various bits in each entry
- Context switch
  - Similar to segmentation
- What should the page size be?
- Pros
  - Simple allocation
  - Easy to share
- Cons
  - Big table
  - How to deal with holes?
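The paged translation just described can be sketched as (the page size and the table contents are illustrative):

```python
PAGE_SIZE = 4096                   # assumed page size for this sketch
page_table = {0: 7, 1: 3, 2: 9}    # VPN -> PPN, an illustrative table

def translate(vaddr):
    """Split the address into (VPN, offset), map VPN -> PPN, reattach offset."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    try:
        ppn = page_table[vpn]
    except KeyError:               # no mapping: trap to the OS
        raise Exception("page fault") from None
    return ppn * PAGE_SIZE + offset

assert translate(0x1ABC) == 3 * 4096 + 0xABC   # VPN 1 -> PPN 3, offset kept
```

Unlike segmentation, every unit is the same size, so allocation is simple; the cost is a potentially big table, which the next slides quantify.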
Paging example
- Page size: 4 bytes
- (Diagram: a virtual address space of pages (a b c d), (e f g h), (i j k l) scattered across 4-byte physical frames via the page table)

How Many PTEs Do We Need?
- Assume a 4KB page
  - Needs the "low order" 12 bits for the offset
- Worst case for a 32-bit address machine
  - # of processes x 2^20
  - 2^20 PTEs per page table (~4 MBytes), but there might be 10K processes; they won't all fit in memory together
- What about a 64-bit address machine?
  - # of processes x 2^52
  - A single flat page table cannot even fit on a disk (2^52 PTEs = 16 PBytes)!
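The worst-case sizes above follow from simple arithmetic (assuming 4-byte PTEs, which is consistent with the ~4 MB and 16 PB figures on the slide):

```python
PAGE_BITS = 12                      # 4 KB page -> 12 offset bits
PTE_BYTES = 4                       # assumed PTE size

ptes_32 = 2 ** (32 - PAGE_BITS)     # entries in a flat 32-bit page table
assert ptes_32 == 2 ** 20
assert ptes_32 * PTE_BYTES == 4 * 2 ** 20    # ~4 MB per process

ptes_64 = 2 ** (64 - PAGE_BITS)     # entries in a flat 64-bit page table
assert ptes_64 == 2 ** 52
assert ptes_64 * PTE_BYTES == 16 * 2 ** 50   # 16 PB: can't even fit on disk
```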
Segmentation with paging
- Virtual address = (Vseg #, VPage #, offset)
- Each segment table entry (seg, size) points to a page table: every segment has its own page table
- If VPage # >= the segment's size, raise an error; otherwise the page table maps VPage # to PPage #, and physical address = (PPage #, offset)

Segmentation with paging – Intel 386
- The Intel 386 uses segmentation with paging for memory management, with a two-level paging scheme
Intel 80386 address translation
- Virtual address = (dir, table, offset)
- The dir field indexes the page directory, which points to a page table; the table field indexes that page table to find the PTE; the offset selects the byte within the page

Multiple-Level Page Tables
- What does this buy us? Only the parts of the tree that are actually in use need second-level tables, so a sparse address space gets a small set of tables instead of one huge flat table
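A two-level walk can be sketched as follows (a sketch using the 386-style 10/10/12 bit split; dicts stand in for in-memory directory and table pages):

```python
def walk(directory, vaddr):
    """Two-level translation: directory index, table index, page offset."""
    dir_i   = (vaddr >> 22) & 0x3FF     # top 10 bits: page directory index
    table_i = (vaddr >> 12) & 0x3FF     # next 10 bits: page table index
    offset  =  vaddr        & 0xFFF     # low 12 bits: offset within page
    table = directory.get(dir_i)        # level 1 lookup
    if table is None:
        raise Exception("page fault")
    ppn = table.get(table_i)            # level 2 lookup
    if ppn is None:
        raise Exception("page fault")
    return (ppn << 12) | offset

# Only one directory slot is populated; the other 1023 second-level
# tables simply don't exist -- that sparseness is the whole point.
directory = {1: {2: 0x9}}
assert walk(directory, (1 << 22) | (2 << 12) | 0x34) == (0x9 << 12) | 0x34
```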
Inverted Page Tables
- Main idea
  - One PTE for each physical page frame
  - Hash (Vpage, pid) to Ppage #
- Pros
  - Small page table for a large address space
- Cons
  - Lookup is difficult
  - Overhead of managing hash chains, etc.

Virtual-To-Physical Lookups
- Programs only know virtual addresses
  - Each program's address space runs from 0 to some high address
- Each virtual address must be translated
  - May involve walking the hierarchical page table
  - Since the page table is stored in memory, a single program memory access may require several actual memory accesses
- Solution
  - Cache the "active" part of the page table in a very fast memory
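The inverted-page-table idea above, hashing (pid, Vpage) to a frame, can be sketched as (a Python dict stands in for the fixed-size hash table; a real design must also handle hash collisions with chains):

```python
PAGE_SIZE = 4096
# One entry per physical frame, keyed by (pid, vpn) -- the table size is
# proportional to physical memory, not to the virtual address space.
frame_of = {(1, 0x10): 3, (2, 0x10): 5}

def lookup(pid, vpn, offset):
    """Hash (pid, vpn) to find which physical frame holds the page."""
    try:
        frame = frame_of[(pid, vpn)]
    except KeyError:                    # not resident: trap to the OS
        raise Exception("page fault") from None
    return frame * PAGE_SIZE + offset

assert lookup(1, 0x10, 0x20) == 3 * PAGE_SIZE + 0x20
assert lookup(2, 0x10, 0x20) == 5 * PAGE_SIZE + 0x20  # same vpn, other pid
```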
Translation Look-aside Buffer (TLB)
- Virtual address = (VPage #, offset)
- The TLB holds recently used (VPage #, PPage #, ...) entries
  - Hit: the TLB supplies PPage #, and physical address = (PPage #, offset)
  - Miss: fall back to the real page table

Bits in a TLB Entry
- Common (necessary) bits
  - Virtual page number
  - Physical page number: the translated address
  - Valid bit
  - Access bits: kernel and user (none, read, write)
- Optional (useful) bits
  - Process tag
  - Reference bit
  - Modify bit
  - Cacheable bit
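A TLB lookup using two of the bits listed above (the valid bit and the process tag) can be sketched as (illustrative entry layout; hardware does this match across all entries in parallel):

```python
from collections import namedtuple

TLBEntry = namedtuple("TLBEntry", "valid tag vpn ppn")

def tlb_lookup(tlb, pid, vpn):
    """Return the PPN on a hit, or None on a miss (then walk the page table)."""
    for e in tlb:
        # An entry matches only if it is valid, belongs to this process
        # (process tag), and maps the requested virtual page.
        if e.valid and e.tag == pid and e.vpn == vpn:
            return e.ppn
    return None

tlb = [TLBEntry(True, 1, 0x42, 0x7)]
assert tlb_lookup(tlb, 1, 0x42) == 0x7      # hit
assert tlb_lookup(tlb, 2, 0x42) is None     # other process tag: miss
```

With a process tag, a context switch need not flush the whole TLB; without one, every entry must be invalidated, as the later slides note.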
Hardware-Controlled TLB
- On a TLB miss
  - If the page containing the PTE is valid (in memory), hardware loads the PTE into the TLB
    - Write back and replace an entry if there is no free entry
  - Generate a fault if the page containing the PTE is invalid, or if there is a protection fault
    - VM software performs the fault handling
    - Restart the faulting instruction
- On a TLB hit, hardware checks the valid bit
  - If valid, pointer to the page frame in memory
  - If invalid, the hardware generates a page fault
    - Perform page fault handling
    - Restart the faulting instruction

Software-Controlled TLB
- On a TLB miss, software is invoked
  - Write back and replace an entry if there is no free entry
  - Check if the page containing the PTE is in memory
  - If not, perform page fault handling
  - Load the PTE into the TLB
  - Restart the faulting instruction
- On a TLB hit, hardware checks the valid bit
  - If valid, pointer to the page frame in memory
  - If invalid, the hardware generates a page fault
    - Perform page fault handling
    - Restart the faulting instruction
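The software-managed miss path can be sketched as (hypothetical helper names; a real handler runs in the kernel's TLB-miss vector and then re-executes the faulting instruction):

```python
def page_fault(vpn, page_table):
    """Stand-in for real paging I/O: pretend to bring the page in."""
    page_table[vpn] = 0x99            # illustrative frame number

def on_tlb_miss(tlb, page_table, vpn, capacity=64):
    """Software TLB miss handler, mirroring the steps on the slide."""
    if len(tlb) >= capacity:
        tlb.pop(0)                    # write back / replace an entry
    if vpn not in page_table:         # PTE not in memory?
        page_fault(vpn, page_table)   # perform page fault handling
    tlb.append((vpn, page_table[vpn]))  # load the PTE into the TLB
    # ...then restart the faulting instruction

tlb, pt = [], {0x1: 0x5}
on_tlb_miss(tlb, pt, 0x2)
assert pt[0x2] == 0x99 and (0x2, 0x99) in tlb
```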
Cache vs. TLB
- A cache maps (address -> data); a TLB maps (VPage # -> PPage #); on a miss, both fall back to memory
- Similarities
  - Cache a portion of memory
  - Write back on a miss
- Differences
  - Associativity
  - Consistency

TLB Related Issues
- Which TLB entry should be replaced?
  - Random
  - Pseudo-LRU
- What happens on a context switch?
  - Process tag: invalidate the appropriate TLB entries
  - No process tag: invalidate the entire TLB contents
- What happens when changing a page table entry?
  - Change the entry in memory
  - Invalidate the TLB entry
Consistency Issues
- "Snoopy" cache protocols (hardware)
  - Maintain consistency with DRAM, even when DMA happens
- Consistency between DRAM and TLBs (software)
  - You need to flush the related TLB entries whenever you change a page table entry in memory
- TLB "shoot-down"
  - On a multiprocessor, when you modify a page table entry, you need to flush all related TLB entries on all processors. Why?

Summary: Virtual Memory
- Virtual memory
  - Virtualization makes software development easier and improves memory resource utilization
  - Separate address spaces provide protection and isolate faults
- Address translation
  - Translate every memory operation using a table (page table, segment table)
  - Speed: cache frequently used translations
- Result
  - Every process has a private address space
  - Programs run independently of the actual physical memory addresses used, and of the actual memory size
  - Protection: processes only access memory they are allowed to