16. Virtual Memory
6.004x Computation Structures Part 3 – Computer Organization
Copyright © 2016 MIT EECS
6.004 Computation Structures L16: Virtual Memory, Slide #1

Even more Memory Hierarchy
Reminder: A Typical Memory Hierarchy
• Everything is a cache for something else
Level                 | Access time | Capacity | Managed by        | Location
Registers             | 1 cycle     | 1 KB     | Software/Compiler | On the datapath
Level 1 Cache         | 2-4 cycles  | 32 KB    | Hardware          | On chip
Level 2 Cache         | 10 cycles   | 256 KB   | Hardware          | On chip
Level 3 Cache         | 40 cycles   | 10 MB    | Hardware          | On chip
Main Memory           | 200 cycles  | 10 GB    | Software/OS       | Other chips
Flash Drive           | 10-100 us   | 100 GB   | Software/OS       | Other chips
Mechanical Hard Disk  | 10 ms       | 1 TB     | Software/OS       | Mechanical devices

Reminder: Hardware Caches
• Direct-mapped: each address maps to a single location in the cache, given by its index bits
• N-way set-associative: each address can map to any of the N locations (ways) of a single set, given by its index bits
• Fully associative: any address can map to any location in the cache

Other implementation choices:
– Block size
– Replacement policy (LRU, Random, …)
– Write policy (write-through, write-behind, write-back)
Reminder: A Typical Memory Hierarchy
• Everything is a cache for something else
• Before: caches (registers through the L3 cache), managed largely in hardware
• TODAY: virtual memory (main memory, flash drive, hard disk), managed by software/OS

Extending the Memory Hierarchy

CPU → Cache (fast SRAM, ~1-10 ns, ~10 KB - 10 MB) → Main memory (DRAM, ~100 ns, ~10 GB) → Secondary storage (hard disk, ~10 ms, ~1 TB)
• Problem: the differences between DRAM and disk are much more extreme than those between SRAM and DRAM
– Access latencies: DRAM is ~10-100x slower than SRAM; disk is ~100,000x slower than DRAM
– Importance of sequential accesses: in DRAM, fetching successive words is ~5x faster than fetching the first word; on disk, it is ~100,000x faster
• Result: design decisions are driven by the enormous cost of misses
Impact of Enormous Miss Penalty
• If DRAM were organized like an SRAM cache, how should we design it?
– Associativity: high, to minimize the miss ratio
– Block size: large, to amortize the cost of a miss over multiple words (locality)
– Write policy: write-back, to minimize the number of writes
• Is there anything good about misses being so expensive?
– We can handle them in software! What's 1000 extra instructions (~1 us) vs. 10 ms?
– Approach: handle hits in hardware, misses in software
• Simpler implementation, more flexibility
Basics of Virtual Memory

Virtual Memory
• Two kinds of addresses:
– The CPU uses virtual addresses (VAs)
– Main memory uses physical addresses (PAs)
• Hardware, the memory management unit (MMU), translates virtual addresses to physical addresses via an operating system (OS)-managed table, the page map
• Not all virtual addresses may have a translation; unmapped pages live on disk

Virtual Memory Implementation: Paging
• Divide physical memory into fixed-size blocks, called pages
– Typical page size (2^p): 4 KB - 16 KB
– Virtual address: virtual page number + offset bits (the low p bits)
– Physical address: physical page number + offset bits
– Why use the lower bits as the offset? So the bytes of a page are contiguous in both virtual and physical memory
• The MMU maps virtual pages to physical pages
– It uses the page map to perform the translation
– It causes a page fault (a miss) if the virtual page is not resident in physical memory

Using main memory as a page cache is called paging, or demand paging.
Demand Paging
Basic idea:
– Start with all virtual pages in secondary storage and the MMU "empty", i.e., there are no pages resident in physical memory
– Begin running the program… each VA is "mapped" to a PA
  • Reference to a RAM-resident page: RAM is accessed by hardware
  • Reference to a non-resident page: page fault, which traps to a software handler, which:
    – Fetches the missing page from DISK into RAM
    – Adjusts the MMU to map the newly-loaded virtual page directly in RAM
    – If RAM is full, may have to replace ("swap out") some little-used page to free up RAM for the new page
– The working set is incrementally loaded via page faults, and gradually evolves as pages are replaced…
Simple Page Map Design
One entry per virtual page:
– Resident bit R = 1 for pages stored in RAM, or 0 for non-resident pages (on disk or unallocated). A page fault occurs when R = 0.
– The physical page number (PPN) of each resident page
– Dirty bit D = 1 if we've changed this page since loading it from disk (and therefore need to write it back to disk when it's replaced)
Example: Virtual → Physical Translation

Setup:
– 256 bytes/page (2^8), so 8 offset bits
– 16 virtual pages (2^4), so a 4-bit VPN and a 16-entry page map
– 8 physical pages (2^3), so a 3-bit PPN
– 12-bit VA (4 VPN bits + 8 offset bits); 11-bit PA (3 PPN bits + 8 offset bits)
– LRU page: VPN = 0xE

Page map (VPN: D R PPN):
0: 0 1 2    4: 1 1 0    8: - 0 -    C: 1 1 7
1: - 0 -    5: 1 1 1    9: - 0 -    D: 1 1 6
2: 0 1 4    6: - 0 -    A: - 0 -    E: 1 1 5
3: - 0 -    7: - 0 -    B: - 0 -    F: 0 1 3

LD(R31, 0x2C8, R0): VA = 0x2C8 → VPN = 0x2, offset = 0xC8. The page map maps VPN 0x2 → PPN 0x4, so PA = 0x4C8.
Page Faults
If a page does not have a valid translation, the MMU causes a page fault. The OS page fault handler is invoked and handles the miss:
– Choose a page to replace; write it back if dirty. Mark the page as no longer resident.
– Are there any restrictions on which page we can select?**
– Read the page from secondary storage into the available physical page
– Update the page map to show the new page is resident
– Return control to the program, which re-executes the memory access

** https://en.wikipedia.org/wiki/Page_replacement_algorithm#Page_replacement_algorithms

Example: Page Fault

Setup (same as the previous example):
– 256 bytes/page (2^8); 16 virtual pages (2^4); 8 physical pages (2^3)
– 12-bit VA (4 VPN bits + 8 offset bits); 11-bit PA (3 PPN bits + 8 offset bits)
– LRU page: VPN = 0xE

ST(BP, -4, SP), SP = 0x604: VA = 0x600 → VPN = 0x6
⇒ Not resident; it's on disk
⇒ Choose a page to replace (LRU = 0xE)
⇒ D[0xE] = 1, so write 0x500-0x5FC to disk
⇒ Mark VPN 0xE as no longer resident
⇒ Read in page VPN 0x6 from disk into 0x500-0x5FC
⇒ Set up the page map entry: VPN 0x6 → PPN 0x5
⇒ PA = 0x500
⇒ This is a write, so set D[0x6] = 1
Virtual Memory: the CS View
Problem: translate a VIRTUAL ADDRESS to a PHYSICAL ADDRESS.

int VtoP(int VPageNo, int PO) {
  if (R[VPageNo] == 0)
    PageFault(VPageNo);
  /* shifting by p multiplies by 2^p, the page size */
  return (PPN[VPageNo] << p) | PO;
}

/* Handle a missing page... */
void PageFault(int VPageNo) {
  int i = SelectLRUPage();
  if (D[i] == 1)
    WritePage(DiskAdr[i], PPN[i]);
  R[i] = 0;
  PPN[VPageNo] = PPN[i];
  ReadPage(DiskAdr[VPageNo], PPN[i]);
  R[VPageNo] = 1;
  D[VPageNo] = 0;
}
The HW/SW Balance

IDEA:
• devote HARDWARE to the high-traffic, performance-critical path
• use (slow, cheap) SOFTWARE to handle the exceptional cases
/* hardware */
int VtoP(int VPageNo, int PO) {
  if (R[VPageNo] == 0) PageFault(VPageNo);
  return (PPN[VPageNo] << p) | PO;
}

/* software: handle a missing page... */
void PageFault(int VPageNo) {
  int i = SelectLRUPage();
  if (D[i] == 1) WritePage(DiskAdr[i], PPN[i]);
  R[i] = 0;
  PPN[VPageNo] = PPN[i];
  ReadPage(DiskAdr[VPageNo], PPN[i]);
  R[VPageNo] = 1;
  D[VPageNo] = 0;
}

HARDWARE performs address translation and detects page faults:
• the running program is interrupted ("suspended");
• PageFault(…) is forced;
• on return from PageFault, the running program continues
Building the MMU

Page Map Arithmetic

[Diagram: a virtual address (v-bit Vpage#, p-bit PO) indexes the page map (D, R, PPN per entry) to produce a physical address (m-bit Ppage#, p-bit PO).]
(v + p) bits in a virtual address          Typical page size: 4 KB - 16 KB
(m + p) bits in a physical address         Typical (v + p): 32-64 bits (4 GB - 16 EB)
2^v = number of VIRTUAL pages              Typical (m + p): 30-40+ bits (1 GB - 1 TB)
2^m = number of PHYSICAL pages
2^p = bytes per physical page
2^(v+p) = bytes in virtual memory
2^(m+p) = bytes in physical memory
(m + 2) · 2^v = bits in the page map (PPN + R + D per entry)

Long virtual addresses allow ISAs to support larger memories → ISA longevity
Example: Page Map Arithmetic

SUPPOSE...
– 32-bit virtual address (v + p = 32)
– 30-bit physical address (m + p = 30)
– 4 KB page size (p = 12), so v = 20 and m = 18

THEN:
– # physical pages = 2^18 = 256K
– # virtual pages = 2^20 = 1M
– # page map entries = 2^20 = 1M
– # bits in the page map = (18 + 2) · 2^20 ≈ 20M
Use fast SRAM for page map??? OUCH!
RAM-Resident Page Maps

• Small page maps can use dedicated SRAM… but that gets expensive for big ones!
• Solution: move the page map to main memory
– A page map pointer register holds the base physical address of the page map; the virtual page number indexes off this base to locate the entry holding the physical page number
• PROBLEM: each memory reference now takes 2 accesses to physical memory (one for the page map entry, one for the data)!
Translation Look-aside Buffer (TLB)

• Problem: a 2x performance hit… each memory reference now takes 2 accesses!
• Solution: cache the page map entries
• IDEA: LOCALITY in memory reference patterns → SUPER locality in references to the page map
• TLB: a small cache of page map entries, with associative lookup by VPN
• VARIATIONS:
– multi-level page map
– paging the page map!

https://en.wikipedia.org/wiki/Translation_lookaside_buffer

MMU Address Translation
Translating a 32-bit virtual address (20-bit VPN + 12-bit offset):
1. Look in the TLB, a VPN → PPN cache, usually implemented as a small fully-associative cache
2. On a TLB miss, fetch the page map entry (D, R, PPN) from RAM, indexed off the page table pointer (PTBL)
3. If R = 0, take a page fault (handled by SW); otherwise append the offset to the PPN and access the data
Putting it All Together: MMU with TLB

Suppose:
• virtual memory of 2^32 bytes
• physical memory of 2^24 bytes
• page size of 2^10 (1 K) bytes
• 4-entry fully-associative TLB
• [p = 10, v = 22, m = 14]

TLB (VPN | R D PPN):
0 | 0 0 3
6 | 1 1 2
1 | 1 1 9
3 | 0 0 5

Page map (VPN: R D PPN):
0: 0 0 7
1: 1 1 9
2: 1 0 0
3: 0 0 5
4: 1 0 5
5: 0 0 3
6: 1 1 2
7: 1 0 4
8: 1 0 1
…

1. How many pages can reside in physical memory at one time? 2^14
2. How many entries are there in the page table? 2^22
3. How many bits per entry in the page table? (Assume each entry has a PPN, resident bit, and dirty bit.) 16
4. How many pages does the page table occupy? 2^23 bytes = 2^13 pages
5. What fraction of virtual memory can be resident? 1/2^8
6. What is the physical address for virtual address 0x1804? What components are involved in the translation? [VPN = 6] → 0x804
7. Same for 0x1080? [VPN = 4] → 0x1480
8. Same for 0x0FC? [VPN = 0] → page fault
Contexts
A context is a mapping of VIRTUAL to PHYSICAL locations, as dictated by contents of the page map:
Several programs may be simultaneously loaded into main memory, each in its separate context: each program's own page map (map1, map2, …) maps its virtual memory onto its own pages of physical memory.

To switch between programs, do we reload the page map?

Contexts: A Sneak Preview
First glimpse of a VIRTUAL MACHINE:

1. TIMESHARING among several programs
• A separate context for each program
• The OS loads the appropriate context into the page map when switching among programs

2. A separate context for the OS "Kernel" (e.g., interrupt handlers)...
• "Kernel" vs. "User" contexts
• Switch to the Kernel context on an interrupt;
• Switch back on return from the interrupt.
Memory Management & Protection
• Applications are written as if they have access to the entire virtual address space (code, stack, heap), without considering where other applications reside
– This enables fixed conventions (e.g., the program starts at 0x1000, the stack is contiguous and grows up, …) without worrying about conflicts
• The OS kernel controls all contexts, preventing programs from reading and writing each other's memory
MMU Improvements

Multi-level Page Maps
Instead of one page map with 2^20 entries, "virtualize the page map": one permanently-resident page holds a "page directory" whose 1024 entries point to 1024-entry partial page tables, which themselves live in virtual memory! The 32-bit virtual address is split into a 10-bit page directory index, a 10-bit partial page table index, and a 12-bit offset; each level's entry (D, R, PPN) can trigger a page fault (handled by SW), and the Translation Look-aside Buffer still caches the final VPN → PPN translations.

Rapid Context-Switching
Add a register to hold the index of the current context. To switch contexts, update the Context # and PageTblPtr registers. There is no need to flush the TLB, since each entry's tag includes the context # in addition to the virtual page number: a TLB hit must match both, and a TLB miss walks the page map selected by PageTblPtr.
Using Caches with Virtual Memory
Virtually-Addressed Cache (CPU → cache → MMU → main memory):
• Tags come from virtual addresses
• FAST: no MMU time on a HIT
• Problem: must flush the cache after a context switch

Physically-Addressed Cache (CPU → MMU → cache → main memory):
• Tags come from physical addresses
• Avoids stale cache data after a context switch
• SLOW: MMU time on every HIT
Best of Both Worlds: Virtually-Indexed, Physically-Tagged Cache

• The cache index comes entirely from address bits in the page offset, so we don't need to wait for the MMU to start the cache lookup!
• OBSERVATION: if the cache index bits are a subset of the page offset bits, tag access in a physically-tagged cache can overlap the page map access. The tag read from the cache is compared with the physical page number from the MMU to determine hit/miss.
• Problem: this limits the number of cache index bits → increase cache capacity by increasing associativity

Summary: Virtual Memory
• Goal 1: Exploit locality on a large scale
– Programmers want a large, flat address space, but use only a small portion of it!
– Solution: cache the working set into RAM from disk
– Basic implementation: MMU with a single-level page map
  • Access loaded pages via a fast hardware path
  • Load virtual memory on demand: page faults
– Several optimizations:
  • Moving the page map to RAM, for cost reasons
  • A Translation Look-aside Buffer (TLB) to regain performance
– Cache/VM interactions: can cache physical or virtual locations
• Goals 2 & 3: Ease memory management; protect multiple contexts from each other
– We'll see these in detail in the next lecture!