
16. Virtual Memory

6.004x Computation Structures Part 3 – Computer Organization

Copyright © 2016 MIT EECS


Reminder: A Typical Memory Hierarchy
• Everything is a cache for something else

Level            Access time   Capacity   Managed By          Location
Registers        1 cycle       1 KB       Software/Compiler   On the datapath
Level 1 Cache    2-4 cycles    32 KB      Hardware            On chip
Level 2 Cache    10 cycles     256 KB     Hardware            On chip
Level 3 Cache    40 cycles     10 MB      Hardware            On chip
Main Memory      200 cycles    10 GB      Software/OS         Other chips
Flash Drive      10-100 us     100 GB     Software/OS         Other chips
Hard Disk        10 ms         1 TB       Software/OS         Mechanical devices

Reminder: Hardware Caches

• Direct-mapped: each address maps to a single location in the cache, given by its index bits
• N-way set-associative: each address can map to any of the N locations (ways) of a single set, given by its index bits
• Fully associative: any address can map to any location in the cache

Other implementation choices:
– Block size
– Replacement policy (LRU, Random, …)
– Write policy (write-through, write-behind, write-back)

Reminder: A Typical Memory Hierarchy
• Everything is a cache for something else
• Same table as above. Before: Hardware Caches (the L1/L2/L3 caches, managed by hardware). TODAY: Virtual Memory (main memory, flash, and disk, managed by software and the OS).

Extending the Memory Hierarchy

[Figure: CPU → Cache (fast SRAM, ~1-10 ns, ~10 KB – 10 MB) → Main memory (DRAM, ~100 ns, ~10 GB) → Secondary storage (hard disk, ~10 ms, ~1 TB)]

• Problem: the DRAM-vs-disk gap is far more extreme than the SRAM-vs-DRAM gap
  – Access latencies:
    • DRAM ~10-100x slower than SRAM
    • Disk ~100,000x slower than DRAM
  – Importance of sequential accesses:
    • DRAM: fetching successive words ~5x faster than the first word
    • Disk: fetching successive words ~100,000x faster than the first word
• Result: design decisions driven by the enormous cost of misses

Impact of Enormous Miss Penalty

• If DRAM were organized like an SRAM cache, how should we design it?
  – Associativity: high, to minimize the miss ratio
  – Block size: large, to amortize the cost of a miss over multiple words (locality)
  – Write policy: write-back, to minimize the number of writes

• Is there anything good about misses being so expensive?
  – We can handle them in software! What's 1000 extra instructions (~1 us) vs. 10 ms?
  – Approach: handle hits in hardware, misses in software
    • Simpler implementation, more flexibility

Basics of Virtual Memory

Virtual Memory

• Two kinds of addresses:
  – The CPU uses virtual addresses
  – Main memory uses physical addresses
• Hardware — the memory management unit (MMU) — translates virtual addresses to physical addresses via an OS-managed table, the page map

[Figure: the CPU issues virtual addresses (VAs); the MMU uses the page map to translate them into physical addresses (PAs) in DRAM. Not all virtual addresses may have a translation — some pages may live only on disk.]

Virtual Memory Implementation: Paging

• Divide physical memory into fixed-size blocks, called pages
  – Typical page size (2^p): 4 KB – 16 KB
  – Virtual address: virtual page number (32−p bits for a 32-bit VA) + offset (p bits)
  – Physical address: physical page number + offset (p bits)
  – Why use the lower bits as the offset?

• The MMU maps virtual pages to physical pages
  – Uses the page map to perform the translation
  – Causes a page fault (a miss) if the virtual page is not resident in physical memory

• Using main memory as a cache for secondary storage = paging, or demand paging

Demand Paging

Basic idea:
– Start with all virtual pages in secondary storage and the MMU "empty", i.e., no pages are resident in physical memory
– Begin running the program… each VA must be "mapped" to a PA
  • Reference to a RAM-resident page: RAM accessed by hardware
  • Reference to a non-resident page: page fault, which traps to a software handler, which
    – Fetches the missing page from DISK into RAM
    – Adjusts the MMU to map the newly-loaded virtual page directly in RAM
    – If RAM is full, may have to replace ("swap out") some little-used page to free up RAM for the new page
– The resident set is incrementally loaded via page faults and gradually evolves as pages are replaced…

Simple Page Map Design

[Figure: the page map, with one D/R/PPN entry per virtual page, maps resident virtual pages to physical pages; non-resident pages are marked X.]

One entry per virtual page:
– Resident bit R = 1 for pages stored in RAM, 0 for non-resident pages (on disk or unallocated). Page fault when R = 0
– Contains the physical page number (PPN) of each resident page
– Dirty bit D = 1 if we've changed this page since loading it from disk (and therefore need to write it back to disk when it's replaced)

Example: Virtual → Physical Translation

Setup: 256 bytes/page (2^8), 16 virtual pages (2^4), 8 physical pages (2^3), 12-bit VA (4 VPN bits, 8 offset bits), 11-bit PA (3 PPN bits, 8 offset bits). LRU page: VPN = 0xE.

Page Map (VPN: D R PPN)             Physical memory (PPN: resident VPN)
  0: 0 1 2       8: -- 0 --           0 (0x000–0x0FF): VPN 0x4
  1: -- 0 --     9: -- 0 --           1 (0x100–0x1FF): VPN 0x5
  2: 0 1 4       A: -- 0 --           2 (0x200–0x2FF): VPN 0x0
  3: -- 0 --     B: -- 0 --           3 (0x300–0x3FF): VPN 0xF
  4: 1 1 0       C: 1 1 7             4 (0x400–0x4FF): VPN 0x2
  5: 1 1 1       D: 1 1 6             5 (0x500–0x5FF): VPN 0xE
  6: -- 0 --     E: 1 1 5             6 (0x600–0x6FF): VPN 0xD
  7: -- 0 --     F: 0 1 3             7 (0x700–0x7FF): VPN 0xC

LD(R31, 0x2C8, R0): VA = 0x2C8 → VPN = 0x2, offset = 0xC8; the page map gives PPN = 0x4, so PA = 0x4C8.

Page Faults

Page Faults

If a page does not have a valid translation, the MMU causes a page fault. The OS page fault handler is invoked and handles the miss:
– Choose a page to replace; write it back if dirty. Mark the page as no longer resident
  • Are there any restrictions on which page we can select?**
– Read the page from secondary storage into the available physical page
– Update the page map to show that the new page is resident
– Return control to the program, which re-executes the memory access

[Figure: before the page fault, the faulting VPN has R = 0 and the victim page has R = 1; after the fault, the victim has been written back to disk (R = 0) and the new page is resident (R = 1).]

** https://en.wikipedia.org/wiki/Page_replacement_algorithm#Page_replacement_algorithms

Example: Page Fault

Setup: same page map and physical memory as the previous example — 256 bytes/page (2^8), 16 virtual pages (2^4), 8 physical pages (2^3), 12-bit VA (4 VPN bits, 8 offset bits), 11-bit PA (3 PPN bits, 8 offset bits). LRU page: VPN = 0xE.

ST(BP, -4, SP) with SP = 0x604: VA = 0x600 → VPN = 0x6
⇒ Not resident, it's on disk
⇒ Choose a page to replace (LRU = VPN 0xE, currently in PPN 0x5)
⇒ D[0xE] = 1, so write 0x500–0x5FF back to disk
⇒ Mark VPN 0xE as no longer resident
⇒ Read in page VPN 0x6 from disk into 0x500–0x5FF
⇒ Set up the page map: VPN 0x6 → PPN 0x5, resident
⇒ PA = 0x500
⇒ This is a write, so set D[0x6] = 1

Virtual Memory: the CS View

Problem: translate a VIRTUAL ADDRESS (virtual page # + offset) into a PHYSICAL ADDRESS (physical page # + offset).

    int VtoP(int VPageNo, int PO) {
      if (R[VPageNo] == 0)
        PageFault(VPageNo);
      return (PPN[VPageNo] << p) | PO;   /* << p multiplies by 2^p, the page size */
    }

    /* Handle a missing page... */
    void PageFault(int VPageNo) {
      int i;

      i = SelectLRUPage();
      if (D[i] == 1)
        WritePage(DiskAdr[i], PPN[i]);
      R[i] = 0;

      PPN[VPageNo] = PPN[i];
      ReadPage(DiskAdr[VPageNo], PPN[i]);
      R[VPageNo] = 1;
      D[VPageNo] = 0;
    }

The HW/SW Balance

IDEA:
• Devote HARDWARE to the high-traffic, performance-critical path
• Use (slow, cheap) SOFTWARE to handle exceptional cases

The same code, partitioned between the two:

hardware:

    int VtoP(int VPageNo, int PO) {
      if (R[VPageNo] == 0) PageFault(VPageNo);
      return (PPN[VPageNo] << p) | PO;
    }

software:

    /* Handle a missing page... */
    void PageFault(int VPageNo) {
      int i = SelectLRUPage();
      if (D[i] == 1) WritePage(DiskAdr[i], PPN[i]);
      R[i] = 0;
      PPN[VPageNo] = PPN[i];
      ReadPage(DiskAdr[VPageNo], PPN[i]);
      R[VPageNo] = 1;
      D[VPageNo] = 0;
    }

HARDWARE performs address translation and detects page faults:
• the running program is interrupted ("suspended");
• PageFault(…) is forced;
• on return from PageFault, the running program continues.

Building the MMU

Page Map Arithmetic

[Figure: a virtual address = virtual page number (v bits) + page offset (p bits); the page map's D/R/PPN entry translates it to a physical address = physical page number (m bits) + the same page offset.]

(v + p) bits in a virtual address
(m + p) bits in a physical address
2^v VIRTUAL pages, 2^m PHYSICAL pages
2^p bytes per physical page
2^(v+p) bytes of virtual memory, 2^(m+p) bytes of physical memory
(m + 2)·2^v bits in the page map (each entry: PPN + D + R)

Typical page size: 4 KB – 16 KB
Typical (v + p): 32–64 bits (4 GB – 16 EB of virtual memory)
Typical (m + p): 30–40+ bits (1 GB – 1 TB of physical memory)
Long virtual addresses allow ISAs to support larger memories → ISA longevity

Example: Page Map Arithmetic

SUPPOSE…
• 32-bit virtual address (v + p = 32): bits 31:12 = virtual page # (20 bits), bits 11:0 = offset
• 30-bit physical address (m + p = 30): bits 29:12 = physical page # (18 bits), bits 11:0 = offset
• 4 KB page size (p = 12)

THEN:
• # physical pages = 2^18 = 256K
• # virtual pages = 2^20 = 1M
• # page map entries = 2^20 = 1M
• # bits in the page map = 20·2^20 ≈ 20 Mbits (each entry: 18-bit PPN + D + R)

Use fast SRAM for page map??? OUCH!

RAM-Resident Page Maps

• Small page maps can use dedicated SRAM… gets expensive for big ones! • Solution: Move page map to main memory:

[Figure: the virtual page number is added to the page map pointer to index the RAM-resident page map, which supplies the physical page number; the page map itself occupies physical memory pages.]

PROBLEM: each memory reference now takes 2 accesses to physical memory!

Translation Look-aside Buffer (TLB)

• Problem: 2x performance hit… each memory reference now takes 2 accesses! • Solution: Cache the page map entries

[Figure: on a TLB hit, the virtual page number is translated directly to a physical page number; on a TLB miss, the page map in physical memory (located via the Page Tbl Ptr) supplies the translation.]

IDEA: LOCALITY in memory reference patterns → SUPER locality in references to the page map
TLB: small cache of page table entries; associative lookup by VPN
VARIATIONS:
• multi-level page map
• paging the page map!
https://en.wikipedia.org/wiki/Translation_lookaside_buffer

MMU Address Translation

[Figure: a 32-bit virtual address is split into a 20-bit virtual page number and a 12-bit offset. (1) The VPN is looked up in the TLB, a VPN→PPN cache, usually implemented as a small fully-associative cache. (2) On a TLB miss, the MMU reads the D/R/PPN page map entry from main memory, starting at the page table base register PTBL. (3) If R = 0, a page fault is raised and handled by software. The resulting PPN is concatenated with the 12-bit offset to form the physical address used to fetch the data.]

Putting it All Together: MMU with TLB

Suppose:
• virtual memory of 2^32 bytes
• physical memory of 2^24 bytes
• page size of 2^10 (1 KB) bytes
• 4-entry fully associative TLB
• [p = 10, v = 22, m = 14]

TLB (tag = VPN | R D PPN)        Page Map (VPN: R D PPN)
  0 | 0 0 3                        0: 0 0 7      5: 0 0 3
  6 | 1 1 2                        1: 1 1 9      6: 1 1 2
  1 | 1 1 9                        2: 1 0 0      7: 1 0 4
  3 | 0 0 5                        3: 0 0 5      8: 1 0 1
                                   4: 1 0 5      …

1. How many pages can reside in physical memory at one time? 2^14
2. How many entries are there in the page table? 2^22
3. How many bits per entry in the page table? (Assume each entry has a PPN, resident bit, and dirty bit.) 16
4. How many pages does the page table occupy? 2^23 bytes = 2^13 pages
5. What fraction of virtual memory can be resident? 1/2^8
6. What is the physical address for virtual address 0x1804? What components are involved in the translation? [VPN = 6, TLB hit] 0x804
7. Same for 0x1080: [VPN = 4, TLB miss, page map in RAM] 0x1480
8. Same for 0x0FC: [VPN = 0, R = 0] page fault

Contexts

Contexts

A context is a mapping of VIRTUAL to PHYSICAL locations, as dictated by the contents of the page map:

[Figure: a page map (with D and R bits) maps resident virtual pages to physical pages; non-resident pages are marked X.]

Several programs may be simultaneously loaded into main memory, each in its separate context:

[Figure: Virtual Memory 1 and Virtual Memory 2 are each mapped into the shared Physical Memory through its own page map (map1, map2).]

"Context switch": reload the page map?

Contexts: A Sneak Preview

[Figure: two virtual memories sharing one physical memory.] First glimpse of a …

1. TIMESHARING among several programs
   • Separate context for each program
   • OS loads the appropriate context into the page map when switching among programs

2. Separate context for the OS "Kernel" (e.g., handlers)…
   • "Kernel" vs "User" contexts
   • Switch to the Kernel context on an interrupt
   • Switch back on interrupt return

Memory Management & Protection

• Applications are written as if they have access to the entire address space, without considering where other applications reside
  – Enables fixed conventions (e.g., the program starts at 0x1000, the stack is contiguous and grows up, …) without worrying about conflicts
• The OS kernel controls all contexts and prevents programs from reading and writing into each other's memory

[Figure: a program's virtual address space with Code, Stack, and Heap regions.]

MMU Improvements

Multi-level Page Maps

[Figure: a 32-bit virtual address is split 10 + 10 + 12. (1) The top 10 bits index the Page Directory (one page, located by the PDIR register); (2) the next 10 bits index a 1024-entry Partial Page Table (one page) whose D/R/PPN entry locates the data page; (3) a non-resident entry causes a page fault, handled by software. A Translation Look-aside Buffer caches translations as before.]

Instead of one page map with 2^20 entries, "virtualize the page table": one permanently-resident page holds a "page directory" with 1024 entries, each pointing to a 1024-entry partial page table that itself lives in virtual memory!

Rapid Context-Switching

Add a register to hold the index of the current context. To switch contexts: update the Context # and PageTblPtr registers. There is no need to flush the TLB, since each entry's tag includes the context # in addition to the virtual page number.

[Figure: the context # and the virtual page number together form the TLB tag; on a TLB miss, the page map selected by PageTblPtr (in physical memory) supplies the physical page number.]

Using Caches with Virtual Memory

Virtually-Addressed Cache (CPU → Cache → MMU → Main memory): tags come from virtual addresses
• FAST: no MMU time on a HIT
• Problem: must flush the cache after a context switch

Physically-Addressed Cache (CPU → MMU → Cache → Main memory): tags come from physical addresses
• Avoids stale cache data after a context switch
• SLOW: MMU time on every HIT

Best of Both Worlds: Virtually-Indexed, Physically-Tagged Cache

[Figure: the CPU presents the address to the MMU and to the cache in parallel; misses go to main memory.]

• The cache index comes entirely from address bits in the page offset — no need to wait for the MMU before starting the cache lookup!
• OBSERVATION: if the cache index bits are a subset of the page offset bits, tag access in a physical cache can overlap page map access. The tag read from the cache is compared with the physical page number from the MMU to determine hit/miss.

• Problem: this limits the number of cache index bits (e.g., with 4 KB pages and 64-byte blocks there can be at most 64 sets, i.e., 4 KB per way), so cache capacity is increased by increasing associativity.

Summary: Virtual Memory

• Goal 1: Exploit locality on a large scale
  – Programmers want a large, flat address space, but use only a small portion of it!
  – Solution: cache the working set in RAM, backed by disk
  – Basic implementation: MMU with a single-level page map
    • Access loaded pages via a fast hardware path
    • Load virtual memory on demand: page faults
  – Several optimizations:
    • Moving the page map to RAM, for cost reasons
    • Translation Look-aside Buffer (TLB) to regain performance
  – Cache/VM interactions: can cache physical or virtual locations
• Goals 2 & 3: Ease memory management, protect multiple contexts from each other
  – We'll see these in detail in the next lecture!
