
16. Virtual Memory

6.004x Computation Structures Part 3 – Computer Organization

Copyright © 2016 MIT EECS


Reminder: A Typical Memory Hierarchy
• Everything is a cache for something else

Level            Access time   Capacity   Managed By          Location
Registers        1 cycle       1 KB       Software/Compiler   On the datapath
Level 1 Cache    2-4 cycles    32 KB      Hardware            On chip
Level 2 Cache    10 cycles     256 KB     Hardware            On chip
Level 3 Cache    40 cycles     10 MB      Hardware            On chip
Main Memory      200 cycles    10 GB      Software/OS         Other chips
Flash Drive      10-100 us     100 GB     Software/OS         Other chips
Hard Disk        10 ms         1 TB       Software/OS         Mechanical devices

Reminder: Hardware Caches

• Direct-mapped: each address maps to a single location in the cache, given by its index bits
• N-way set-associative: each address can map to any of the N locations (ways) of a single set, given by its index bits
• Fully associative: any address can map to any location in the cache

Other implementation choices:
– Block size
– Replacement policy (LRU, Random, …)
– Write policy (write-through, write-behind, write-back)

Reminder: A Typical Memory Hierarchy
• Everything is a cache for something else
• Same table as above. Before: Hardware Caches (the L1/L2/L3 caches, managed by hardware). TODAY: Virtual Memory (main memory, flash, and disk, managed by software and the OS).

Extending the Memory Hierarchy

[Figure: CPU → Cache (fast SRAM, ~1-10 ns, ~10 KB – 10 MB) → Main memory (DRAM, ~100 ns, ~10 GB) → Secondary storage (hard disk, ~10 ms, ~1 TB)]

• Problem: the DRAM-vs-disk gap is far more extreme than the SRAM-vs-DRAM gap
  – Access latencies:
    • DRAM ~10-100x slower than SRAM
    • Disk ~100,000x slower than DRAM
  – Importance of sequential accesses:
    • DRAM: fetching successive words ~5x faster than the first word
    • Disk: fetching successive words ~100,000x faster than the first word
• Result: design decisions driven by the enormous cost of misses

Impact of Enormous Miss Penalty

• If DRAM were organized like an SRAM cache, how should we design it?
  – Associativity: high, to minimize the miss ratio
  – Block size: large, to amortize the cost of a miss over multiple words (locality)
  – Write policy: write-back, to minimize the number of writes

• Is there anything good about misses being so expensive?
  – We can handle them in software! What's 1000 extra instructions (~1 us) vs. 10 ms?
  – Approach: handle hits in hardware, misses in software
    • Simpler implementation, more flexibility

Basics of Virtual Memory

Virtual Memory

• Two kinds of addresses:
  – The CPU uses virtual addresses
  – Main memory uses physical addresses
• Hardware — the memory management unit (MMU) — translates virtual addresses to physical addresses via an OS-managed table, the page map

[Figure: the CPU issues virtual addresses (VAs); the MMU uses the page map to translate them into physical addresses (PAs) in DRAM. Not all virtual addresses may have a translation — some pages may live only on disk.]

Virtual Memory Implementation: Paging

• Divide physical memory into fixed-size blocks, called pages
  – Typical page size (2^p): 4 KB – 16 KB
  – Virtual address: virtual page number (32−p bits for a 32-bit VA) + offset (p bits)
  – Physical address: physical page number + offset (p bits)
  – Why use the lower bits as the offset?

• The MMU maps virtual pages to physical pages
  – Uses the page map to perform the translation
  – Causes a page fault (a miss) if the virtual page is not resident in physical memory

• Using main memory as a cache for secondary storage = paging, or demand paging

Demand Paging

Basic idea:
– Start with all virtual pages in secondary storage and the MMU "empty", i.e., no pages are resident in physical memory
– Begin running the program… each VA must be "mapped" to a PA
  • Reference to a RAM-resident page: RAM accessed by hardware
  • Reference to a non-resident page: page fault, which traps to a software handler, which
    – Fetches the missing page from DISK into RAM
    – Adjusts the MMU to map the newly-loaded virtual page directly in RAM
    – If RAM is full, may have to replace ("swap out") some little-used page to free up RAM for the new page
– The resident set is incrementally loaded via page faults and gradually evolves as pages are replaced…

Simple Page Map Design

[Figure: the page map, with one D/R/PPN entry per virtual page, maps resident virtual pages to physical pages; non-resident pages are marked X.]

One entry per virtual page:
– Resident bit R = 1 for pages stored in RAM, 0 for non-resident pages (on disk or unallocated). Page fault when R = 0
– Contains the physical page number (PPN) of each resident page
– Dirty bit D = 1 if we've changed this page since loading it from disk (and therefore need to write it back to disk when it's replaced)

Example: Virtual → Physical Translation

Setup: 256 bytes/page (2^8), 16 virtual pages (2^4), 8 physical pages (2^3), 12-bit VA (4 VPN bits, 8 offset bits), 11-bit PA (3 PPN bits, 8 offset bits). LRU page: VPN = 0xE.

Page Map (VPN: D R PPN)             Physical memory (PPN: resident VPN)
  0: 0 1 2       8: -- 0 --           0 (0x000–0x0FF): VPN 0x4
  1: -- 0 --     9: -- 0 --           1 (0x100–0x1FF): VPN 0x5
  2: 0 1 4       A: -- 0 --           2 (0x200–0x2FF): VPN 0x0
  3: -- 0 --     B: -- 0 --           3 (0x300–0x3FF): VPN 0xF
  4: 1 1 0       C: 1 1 7             4 (0x400–0x4FF): VPN 0x2
  5: 1 1 1       D: 1 1 6             5 (0x500–0x5FF): VPN 0xE
  6: -- 0 --     E: 1 1 5             6 (0x600–0x6FF): VPN 0xD
  7: -- 0 --     F: 0 1 3             7 (0x700–0x7FF): VPN 0xC

LD(R31, 0x2C8, R0): VA = 0x2C8 → VPN = 0x2, offset = 0xC8; the page map gives PPN = 0x4, so PA = 0x4C8.

Page Faults

Page Faults

If a page does not have a valid translation, the MMU causes a page fault. The OS page fault handler is invoked and handles the miss:
– Choose a page to replace; write it back if dirty. Mark the page as no longer resident
  • Are there any restrictions on which page we can select?**
– Read the page from secondary storage into the available physical page
– Update the page map to show that the new page is resident
– Return control to the program, which re-executes the memory access

[Figure: before the page fault, the faulting VPN has R = 0 and the victim page has R = 1; after the fault, the victim has been written back to disk (R = 0) and the new page is resident (R = 1).]

** https://en.wikipedia.org/wiki/Page_replacement_algorithm#Page_replacement_algorithms

Example: Page Fault

Setup: same page map and physical memory as the previous example — 256 bytes/page (2^8), 16 virtual pages (2^4), 8 physical pages (2^3), 12-bit VA (4 VPN bits, 8 offset bits), 11-bit PA (3 PPN bits, 8 offset bits). LRU page: VPN = 0xE.

ST(BP, -4, SP) with SP = 0x604: VA = 0x600 → VPN = 0x6
⇒ Not resident, it's on disk
⇒ Choose a page to replace (LRU = VPN 0xE, currently in PPN 0x5)
⇒ D[0xE] = 1, so write 0x500–0x5FF back to disk
⇒ Mark VPN 0xE as no longer resident
⇒ Read in page VPN 0x6 from disk into 0x500–0x5FF
⇒ Set up the page map: VPN 0x6 → PPN 0x5, resident
⇒ PA = 0x500
⇒ This is a write, so set D[0x6] = 1

Virtual Memory: the CS View

Problem: translate a VIRTUAL ADDRESS (virtual page # + offset) into a PHYSICAL ADDRESS (physical page # + offset).

    int VtoP(int VPageNo, int PO) {
      if (R[VPageNo] == 0)
        PageFault(VPageNo);
      return (PPN[VPageNo] << p) | PO;   /* << p multiplies by 2^p, the page size */
    }

    /* Handle a missing page... */
    void PageFault(int VPageNo) {
      int i;

      i = SelectLRUPage();
      if (D[i] == 1)
        WritePage(DiskAdr[i], PPN[i]);
      R[i] = 0;

      PPN[VPageNo] = PPN[i];
      ReadPage(DiskAdr[VPageNo], PPN[i]);
      R[VPageNo] = 1;
      D[VPageNo] = 0;
    }

The HW/SW Balance

IDEA:
• Devote HARDWARE to the high-traffic, performance-critical path
• Use (slow, cheap) SOFTWARE to handle exceptional cases

The same code, partitioned between the two:

hardware:

    int VtoP(int VPageNo, int PO) {
      if (R[VPageNo] == 0) PageFault(VPageNo);
      return (PPN[VPageNo] << p) | PO;
    }

software:

    /* Handle a missing page... */
    void PageFault(int VPageNo) {
      int i = SelectLRUPage();
      if (D[i] == 1) WritePage(DiskAdr[i], PPN[i]);
      R[i] = 0;
      PPN[VPageNo] = PPN[i];
      ReadPage(DiskAdr[VPageNo], PPN[i]);
      R[VPageNo] = 1;
      D[VPageNo] = 0;
    }

HARDWARE performs address translation and detects page faults:
• the running program is interrupted ("suspended");
• PageFault(…) is forced;
• on return from PageFault, the running program continues.

Building the MMU

Page Map Arithmetic

[Figure: a virtual address = virtual page number (v bits) + page offset (p bits); the page map's D/R/PPN entry translates it to a physical address = physical page number (m bits) + the same page offset.]

(v + p) bits in a virtual address
(m + p) bits in a physical address
2^v VIRTUAL pages, 2^m PHYSICAL pages
2^p bytes per physical page
2^(v+p) bytes of virtual memory, 2^(m+p) bytes of physical memory
(m + 2)·2^v bits in the page map (each entry: PPN + D + R)

Typical page size: 4 KB – 16 KB
Typical (v + p): 32–64 bits (4 GB – 16 EB of virtual memory)
Typical (m + p): 30–40+ bits (1 GB – 1 TB of physical memory)
Long virtual addresses allow ISAs to support larger memories → ISA longevity

Example: Page Map Arithmetic

SUPPOSE…
• 32-bit virtual address (v + p = 32): bits 31:12 = virtual page # (20 bits), bits 11:0 = offset
• 30-bit physical address (m + p = 30): bits 29:12 = physical page # (18 bits), bits 11:0 = offset
• 4 KB page size (p = 12)

THEN:
• # physical pages = 2^18 = 256K
• # virtual pages = 2^20 = 1M
• # page map entries = 2^20 = 1M
• # bits in the page map = 20·2^20 ≈ 20 Mbits (each entry: 18-bit PPN + D + R)

Use fast SRAM for page map??? OUCH!

RAM-Resident Page Maps

• Small page maps can use dedicated SRAM… gets expensive for big ones! • Solution: Move page map to main memory:

[Figure: the virtual page number is added to the page map pointer to index the RAM-resident page map, which supplies the physical page number; the page map itself occupies physical memory pages.]

PROBLEM: each memory reference now takes 2 accesses to physical memory!

Translation Look-aside Buffer (TLB)

• Problem: 2x performance hit… each memory reference now takes 2 accesses! • Solution: Cache the page map entries

[Figure: on a TLB hit, the virtual page number is translated directly to a physical page number; on a TLB miss, the page map in physical memory (located via the Page Tbl Ptr) supplies the translation.]

IDEA: LOCALITY in memory reference patterns → SUPER locality in references to the page map
TLB: small cache of page table entries; associative lookup by VPN
VARIATIONS:
• multi-level page map
• paging the page map!
https://en.wikipedia.org/wiki/Translation_lookaside_buffer

MMU Address Translation

[Figure: a 32-bit virtual address is split into a 20-bit virtual page number and a 12-bit offset. (1) The VPN is looked up in the TLB, a VPN→PPN cache, usually implemented as a small fully-associative cache. (2) On a TLB miss, the MMU reads the D/R/PPN page map entry from main memory, starting at the page table base register PTBL. (3) If R = 0, a page fault is raised and handled by software. The resulting PPN is concatenated with the 12-bit offset to form the physical address used to fetch the data.]

Putting it All Together: MMU with TLB

Suppose:
• virtual memory of 2^32 bytes
• physical memory of 2^24 bytes
• page size of 2^10 (1 KB) bytes
• 4-entry fully associative TLB
• [p = 10, v = 22, m = 14]

TLB (tag = VPN | R D PPN)        Page Map (VPN: R D PPN)
  0 | 0 0 3                        0: 0 0 7      5: 0 0 3
  6 | 1 1 2                        1: 1 1 9      6: 1 1 2
  1 | 1 1 9                        2: 1 0 0      7: 1 0 4
  3 | 0 0 5                        3: 0 0 5      8: 1 0 1
                                   4: 1 0 5      …

1. How many pages can reside in physical memory at one time? 2^14
2. How many entries are there in the page table? 2^22
3. How many bits per entry in the page table? (Assume each entry has a PPN, resident bit, and dirty bit.) 16
4. How many pages does the page table occupy? 2^23 bytes = 2^13 pages
5. What fraction of virtual memory can be resident? 1/2^8
6. What is the physical address for virtual address 0x1804? What components are involved in the translation? [VPN = 6, TLB hit] 0x804
7. Same for 0x1080: [VPN = 4, TLB miss, page map in RAM] 0x1480
8. Same for 0x0FC: [VPN = 0, R = 0] page fault

Contexts

Contexts

A context is a mapping of VIRTUAL to PHYSICAL locations, as dictated by the contents of the page map:

[Figure: a page map (with D and R bits) maps resident virtual pages to physical pages; non-resident pages are marked X.]

Several programs may be simultaneously loaded into main memory, each in its separate context:

[Figure: Virtual Memory 1 and Virtual Memory 2 are each mapped into the shared Physical Memory through its own page map (map1, map2).]

"Context switch": reload the page map?

Contexts: A Sneak Preview

[Figure: two virtual memories sharing one physical memory.] First glimpse of a …

1. TIMESHARING among several programs
   • Separate context for each program
   • OS loads the appropriate context into the page map when switching among programs

2. Separate context for the OS "Kernel" (e.g., handlers)…
   • "Kernel" vs "User" contexts
   • Switch to the Kernel context on an interrupt
   • Switch back on interrupt return

Memory Management & Protection

• Applications are written as if they have access to the entire address space, without considering where other applications reside
  – Enables fixed conventions (e.g., the program starts at 0x1000, the stack is contiguous and grows up, …) without worrying about conflicts
• The OS kernel controls all contexts and prevents programs from reading and writing into each other's memory

[Figure: a program's virtual address space with Code, Stack, and Heap regions.]

MMU Improvements

Multi-level Page Maps

[Figure: a 32-bit virtual address is split 10 + 10 + 12. (1) The top 10 bits index the Page Directory (one page, located by the PDIR register); (2) the next 10 bits index a 1024-entry Partial Page Table (one page) whose D/R/PPN entry locates the data page; (3) a non-resident entry causes a page fault, handled by software. A Translation Look-aside Buffer caches translations as before.]

Instead of one page map with 2^20 entries, "virtualize the page table": one permanently-resident page holds a "page directory" with 1024 entries, each pointing to a 1024-entry partial page table that itself lives in virtual memory!

Rapid Context-Switching

Add a register to hold the index of the current context. To switch contexts: update the Context # and PageTblPtr registers. There is no need to flush the TLB, since each entry's tag includes the context # in addition to the virtual page number.

[Figure: the context # and the virtual page number together form the TLB tag; on a TLB miss, the page map selected by PageTblPtr (in physical memory) supplies the physical page number.]

Using Caches with Virtual Memory

Virtually-Addressed Cache (CPU → Cache → MMU → Main memory): tags come from virtual addresses
• FAST: no MMU time on a HIT
• Problem: must flush the cache after a context switch

Physically-Addressed Cache (CPU → MMU → Cache → Main memory): tags come from physical addresses
• Avoids stale cache data after a context switch
• SLOW: MMU time on every HIT

Best of Both Worlds: Virtually-Indexed, Physically-Tagged Cache

[Figure: the CPU presents the address to the MMU and to the cache in parallel; misses go to main memory.]

• The cache index comes entirely from address bits in the page offset — no need to wait for the MMU before starting the cache lookup!
• OBSERVATION: if the cache index bits are a subset of the page offset bits, tag access in a physical cache can overlap page map access. The tag read from the cache is compared with the physical page number from the MMU to determine hit/miss.

• Problem: this limits the number of cache index bits (e.g., with 4 KB pages and 64-byte blocks there can be at most 64 sets, i.e., 4 KB per way), so cache capacity is increased by increasing associativity.

Summary: Virtual Memory

• Goal 1: Exploit locality on a large scale
  – Programmers want a large, flat address space, but use only a small portion of it!
  – Solution: cache the working set in RAM, backed by disk
  – Basic implementation: MMU with a single-level page map
    • Access loaded pages via a fast hardware path
    • Load virtual memory on demand: page faults
  – Several optimizations:
    • Moving the page map to RAM, for cost reasons
    • Translation Look-aside Buffer (TLB) to regain performance
  – Cache/VM interactions: can cache physical or virtual locations
• Goals 2 & 3: Ease memory management, protect multiple contexts from each other
  – We'll see these in detail in the next lecture!
