Operating Systems CMPSCI 377 Virtual Memory & Paging Emery Berger and Mark Corner University of Massachusetts Amherst

Virtual vs. Physical Memory . Apps don’t access physical memory – Well, not directly . Apps use virtual memory – Addresses start at 0 – One level of indirection – Address you see is not “real” address
. Any ideas why?
Memory Pages . Programs use memory as individual bytes . OS manages groups of bytes: pages – typically 4kB, 8kB – Why? (think Tetris with all squares) – Applies this to virtual and physical memory • Physical pages usually called frames
Mapping Virtual to Physical
. Note this is simplified and the data here includes the heap, not the typical data segment…
Why Virtual Memory? . Why? – Simpler • Everyone gets illusion of whole address space – Isolation • Every process protected from every other – Optimization • Reduces space requirements
Typical Virtual Memory Layout . Some things grow – Must leave room! . Mmap and heap spaces – Mmap increases mmap space – Brk increases heap . Other layouts possible

Quick Quiz! . Are arrays contiguous in memory? – Physical memory? – Virtual memory?
. Where does the code for a program live?

Memory Management Unit . Programs issue loads and stores . What kind of addresses are these? . MMU translates virtual to physical addresses – Maintains page table (big hash table) – Almost always in HW… Why?
[Diagram: Program issues virtual address → MMU → physical address → Memory]
Page Tables . Table of translations – virtual pages -> physical pages . One page table per process . One page table entry per virtual page . How? – Programs issue virtual address – Find virtual page (how?) – Lookup physical page, add offset

Page Table Entries . Do all virtual pages -> physical page? – Valid and Invalid bits . PTEs have lots of other information – For instance some pages can only be read

Address Translation . Powers of 2: – Virtual address space: size 2^m – Page size 2^n . Page#: High m-n bits of virtual address . Lower n bits select offset in page
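The bit-split described above can be sketched in Python; the concrete values below (a 32-bit address space with 4kB pages, i.e., m = 32 and n = 12, and the sample address) are illustrative assumptions:

```python
M, N = 32, 12                          # 2^m address space, 2^n page size (4kB)

def split(vaddr):
    """Split a virtual address into (page number, offset)."""
    page = vaddr >> N                  # high m-n bits
    offset = vaddr & ((1 << N) - 1)    # low n bits
    return page, offset

page, offset = split(0x40001234)
print(hex(page), hex(offset))          # 0x40001 0x234
```

The MMU then replaces the page number with the frame number from the page table and re-attaches the unchanged offset.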
Quick Activity . How much mem does a page table need? – 4kB pages, 32 bit address space – page table entry (PTE) uses 4 bytes . 2^32/2^12*4=2^22 bytes=4MB – Is this a problem? – Isn’t this per process? – What about a 64 bit address space? . Any ideas how to fix this?

Multi-Level Page Tables . Use a multi-level page table
[Diagram: a Level 0 table whose entries point to Level 1 tables, which in turn map virtual pages to frames]

Quick Activity . How much mem does a page table need? – 4kB pages, 32 bit address space – Two-level page table – 20 bits = 10 bits each level – page table entry (PTE) uses 4 bytes – Only first page of program is valid . 2^10*4+2^10*4=2^13 bytes=8kB
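The arithmetic on both activity slides can be checked directly (same assumptions: 32-bit addresses, 4kB pages, 4-byte PTEs, and only the program's first page valid):

```python
PTE = 4                          # bytes per page table entry
PAGE = 2 ** 12                   # 4kB pages
VAS = 2 ** 32                    # 32-bit virtual address space

flat = (VAS // PAGE) * PTE       # flat table: one PTE per virtual page
print(flat)                      # 4194304 bytes = 4MB

# Two-level table, 10 index bits per level, only the first program
# page valid: one level-0 table plus a single level-1 table.
two_level = (2 ** 10) * PTE + (2 ** 10) * PTE
print(two_level)                 # 8192 bytes = 8kB
```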
. Isn’t this slow?

Translation Lookaside Buffer (TLB) . TLB: fast, fully associative memory – Caches page table entries – Stores page numbers (key) and frame (value) in which they are stored . Assumption: locality of reference – Locality in memory accesses = locality in address translation . TLB sizes: 8 to 2048 entries – Powers of 2 simplifies translation of virtual to physical addresses
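A TLB is essentially a small cache keyed by page number. A toy sketch (the fixed `translate` mapping and the address trace are made up) shows how locality turns most translations into hits:

```python
def translate(vpn):
    # stand-in for a full page-table walk (hypothetical fixed mapping)
    return vpn + 100

tlb, hits, misses = {}, 0, 0
for vaddr in [0x1000, 0x1004, 0x1008, 0x2000, 0x100C, 0x2004]:
    vpn = vaddr >> 12                  # page number is the TLB key
    if vpn in tlb:
        hits += 1                      # TLB hit: no page-table walk
    else:
        misses += 1
        tlb[vpn] = translate(vpn)      # fill the TLB after the walk
    paddr = (tlb[vpn] << 12) | (vaddr & 0xFFF)

print(hits, misses)                    # 4 2 – locality pays off
```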
Virtual Memory is an Illusion! . How much memory does a process have? – Do all processes have this? . Key idea: use RAM as cache for disk – OS transparently moves pages . Requires locality: – Working set must fit in RAM • memory referenced recently – If not: thrashing (nothing but disk traffic)

Paging
Paging + Locality . Most programs obey 90/10 “rule” – 90% of time spent accessing 10% of memory . Exploit this rule: – Only keep “live” parts of process in memory
Key Policy Decisions . Two key questions (for any cache): – When do we read page from disk? – When do we write page to disk?
Reading Pages . Read on-demand: – OS loads page on its first reference – May force an eviction of page in RAM – Pause while loading page = page fault . Can also perform pre-paging: – OS guesses which page will next be needed, and begins loading it . Most systems just do demand paging . What about writes?
Demand Paging . On every reference, check if page is in memory (resident bit in page table) – Who is doing this? . If not: trap to OS – How does this work in HW? . OS: – Selects victim page to be replaced – Writes victim page if necessary, marks non-resident – Begins loading new page from disk – OS can switch to another process • more on this later
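The trap-and-replace loop can be sketched as a toy simulator. Everything here is illustrative: two frames, FIFO victim selection for brevity (real systems approximate LRU), and the dirty-page write-back and disk read are elided:

```python
from collections import deque

resident = {}            # virtual page -> frame (resident bit ~ key present)
free_frames = [0, 1]     # two physical frames
order = deque()          # FIFO victim selection, for simplicity
faults = 0

def access(vpage):
    """Return the frame for vpage, faulting it in if non-resident."""
    global faults
    if vpage in resident:             # resident: HW translates, no trap
        return resident[vpage]
    faults += 1                       # trap to OS: page fault
    if free_frames:
        frame = free_frames.pop()
    else:
        victim = order.popleft()      # select victim (write back if dirty, omitted)
        frame = resident.pop(victim)  # mark victim non-resident
    resident[vpage] = frame           # ... after loading the page from disk
    order.append(vpage)
    return frame

for p in [0, 1, 0, 2, 1]:
    access(p)
print(faults)                         # 3 – pages 0, 1, 2 each fault once
```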
Swap Space . Swap space = where victim pages go – Partition or special file reserved on disk
. Size of reserved swap space limits what?
Tricks with Page Tables . Do all pages of memory end up in swap? . Parts of address space mapped into files – see man pages for mmap

Overview . A Day in the Life of a Page – Allocation – Use – Eviction – Reuse . Terms: Resident and Non-resident
A Day in the Life of a Page
– Allocate some memory: char * x = new char[16]; – the object occupies 0x40001040 → 0x4000104F in the virtual memory layout (within page 0x40001000)
– Update page tables – the virtual page now maps to a frame in the physical memory layout
– Write contents – dirty page: strcpy(x, “hello”);
– Other processes fill up memory…
– Forcing our page to be evicted (paged out) to swap space (disk)
– Now page nonresident & protected
– Touch page – swap it in: y[0] = x[0];
Tricks with Page Tables: Sharing . Paging allows sharing of memory across processes – Reduces memory requirements . Shared stuff includes code, data – Code typically R/O
Tricks with Page Tables: COW . Copy on write (COW) – Just copy page tables – Make all pages read-only . What if process changes mem?
. All processes are created this way!

Allocating Pages . Ultimately from sbrk or mmap . Sbrk increases # of valid pages – Increases the heap . Mmap maps address space to file – Increases the mmap space . Oddity: – Allocators can use either mmap or brk to get pages – You will use mmap
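Python's `mmap` module exposes the same system call. Passing `-1` as the file descriptor requests an anonymous, demand-zero mapping – the moral equivalent of mapping `/dev/zero`, typically implemented with COW zero pages:

```python
import mmap

region = mmap.mmap(-1, 4096)    # one anonymous 4kB page, zero-filled on demand
assert region[0] == 0           # reads see zeros before any write
region[0:5] = b"hello"          # first write gives this process a private copy
assert region[0:5] == b"hello"
region.close()
```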
. What does mmap /dev/zero mean? – Think about COW

Overview
. Replacement policies – Comparison
Cost of Paging . Usually in algorithms, we pick the algorithm with the best asymptotic worst-case – Paging: worst-case analysis useless! – Easy to construct adversary: every page requires page fault
Reference string: A, B, C, D, E, F, G, H, I, J, A, ... – size of available memory: 5 frames
[Animation: the 5 frames fill with A B C D E; F then evicts A, and by the time A is re-referenced the frames hold F G H I J – every reference in the sequence causes a page fault]
Optimal Replacement (MIN/OPT) . Evict page accessed furthest in future – Optimal page replacement algorithm • Invented by Belady (“MIN”), a.k.a. “OPT” . Provably optimal policy – Just one small problem... Requires predicting the future – Useful point of comparison • How far from optimal?
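Although OPT cannot be implemented online, it can be simulated offline, where the whole reference string is known in advance. A sketch (the reference string is an illustrative example):

```python
def opt_faults(refs, nframes):
    """Page faults under OPT: evict the resident page used furthest in the future."""
    frames, faults = set(), 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            def next_use(p):
                try:
                    return refs.index(p, i + 1)
                except ValueError:
                    return float('inf')      # never used again: ideal victim
            frames.remove(max(frames, key=next_use))
        frames.add(page)
    return faults

print(opt_faults(['A', 'B', 'C', 'D', 'A', 'B'], 3))   # 4 – D evicts C, never reused
```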
Quick Activity: OPT
[Figure: a sequence of page accesses and the resulting contents of the page frames under OPT] . Page faults: 5
Least-Recently Used (LRU) . Evict page not used in longest time (least-recently used) – Approximates OPT • If recent past ≈ predictor of future – Variant used in all real operating systems
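LRU is easy to simulate with an ordered dictionary, moving a page to the end on every hit (same illustrative reference string as above, where OPT takes 4 faults):

```python
from collections import OrderedDict

def lru_faults(refs, nframes):
    frames, faults = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)        # mark most-recently used
        else:
            faults += 1
            if len(frames) == nframes:
                frames.popitem(last=False)  # evict least-recently used
            frames[page] = True
    return faults

print(lru_faults(['A', 'B', 'C', 'D', 'A', 'B'], 3))   # 6 – two more than OPT here
```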
Quick Activity: LRU example . Page faults: ?
LRU example . Page faults: 5
LRU, example II . Page faults: ?
LRU, example II . Page faults: 12! – Loop: well-known worst-case for LRU
Most-Recently Used (MRU) . Evict most-recently used page . Shines for LRU’s worst-case: loop that exceeds RAM size
Reference string: A, B, C, D, A, B, C, D, ... – size of available memory: 3 frames
[Animation: the 3 frames fill with A B C; D evicts the most-recently used page C, leaving A B D; A and B then hit, so only one of the loop’s four pages has to be swapped per iteration]
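MRU and LRU differ only in which end of the recency order they evict from, so one toy simulator covers both; on the loop above (4 pages, 3 frames) the difference is dramatic:

```python
from collections import OrderedDict

def faults(refs, nframes, evict_mru):
    frames, count = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)            # most-recently used sits at the end
        else:
            count += 1
            if len(frames) == nframes:
                frames.popitem(last=evict_mru)  # MRU evicts newest, LRU oldest
            frames[page] = True
    return count

loop = ['A', 'B', 'C', 'D'] * 3
print(faults(loop, 3, evict_mru=False))   # LRU: 12 – every single reference faults
print(faults(loop, 3, evict_mru=True))    # MRU: 6
```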
FIFO . First-in, first-out: evict oldest page . As competitive as LRU, but performs miserably in practice! – Ignores locality – Suffers from Belady’s anomaly: • More memory can mean more paging! – LRU & similar algs. do not • Stack algorithms – more memory means ≥ hits
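Belady’s anomaly is easy to reproduce: on the classic reference string below, FIFO incurs more faults with 4 frames than with 3.

```python
from collections import deque

def fifo_faults(refs, nframes):
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            frames.remove(queue.popleft())   # evict the oldest page
        frames.add(page)
        queue.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))   # 9
print(fifo_faults(refs, 4))   # 10 – more memory, more faults!
```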
Virtual Memory in Reality . Implementing exact LRU . Approximating LRU – Hardware Support – Clock – Segmented queue . Multiprogramming – Global LRU – Working Set
Implementing Exact LRU . On each reference, time stamp page . When we need to evict: select oldest page = least-recently used
Reference string: A, B, C, B, C, C, D
[Animation: A, B, C are stamped 1, 2, 3; re-references update B’s stamp to 4 and C’s to 5, then 6; when D arrives (stamp 7), the oldest stamp is A’s, so A is the LRU page and is evicted]
How should we implement this?
Implementing Exact LRU . Could keep pages in order – optimizes eviction – Priority queue: update = O(log n), eviction = O(log n)
. Optimize for common case! – Common case: hits, not misses – Hash table: update = O(1), eviction = O(n)
Cost of Maintaining Exact LRU . Hash tables: too expensive – On every reference: • Compute hash of page address • Update time stamp – Unfortunately: 10x – 100x more expensive!
Cost of Maintaining Exact LRU . Alternative: doubly-linked list – Move items to front when referenced – LRU items at end of list – Still too expensive • 4-6 pointer updates per reference
. Can we do better?
Hardware Support . Maintain reference bits for every page – On each access, set reference bit to 1 – Page replacement algorithm periodically resets reference bits – Evict page with reference bit = 0 . Cost per miss = O(n)
[Animation: reference string A, B, C, B, C, C, D – with A, B, C resident (bits 1 1 1), the bits are reset to 0 0 0; references to B and C set theirs back to 1; when D arrives, A is the only page with reference bit 0 and is evicted]
The Clock Algorithm . Variant of FIFO & LRU . Keep frames in a circle . On page fault, OS: – Checks reference bit of next frame – If reference bit = 0, replace page, set bit to 1 – If reference bit = 1, set bit to 0, advance pointer to next frame
Reference string: A, B, C, D, B, C, E, F, C, G
[Animation: frames A, B, C, D all start with reference bit 1; the hand sweeps, clearing bits, until it finds a 0 – E replaces A, F replaces B, and after C’s bit is set again by a hit, G replaces D]
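The clock sweep can be simulated directly. On the reference string from the slides (A, B, C, D, B, C, E, F, C, G over 4 frames), this sketch reproduces the animation’s outcome – E replaces A, F replaces B, G replaces D:

```python
def clock_sim(refs, nframes):
    frames = [None] * nframes                # page held by each frame
    refbit = [0] * nframes
    hand, faults = 0, 0
    for page in refs:
        if page in frames:
            refbit[frames.index(page)] = 1   # HW sets the bit on access
            continue
        faults += 1
        while refbit[hand] == 1:             # second-chance sweep
            refbit[hand] = 0
            hand = (hand + 1) % nframes
        frames[hand] = page                  # replace page, set its bit
        refbit[hand] = 1
        hand = (hand + 1) % nframes
    return faults, frames

faults, frames = clock_sim(list("ABCDBCEFCG"), 4)
print(faults, frames)   # 7 ['E', 'F', 'C', 'G']
```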
Segmented Queue . Real systems: segment queue into two – approximate LRU (clock) for frequently-referenced pages • e.g., first 1/3 of page frames – fast – exact LRU for infrequently-referenced pages • last 2/3 of page frames; doubly-linked list – precise . How do we move between the two?
Multiprogramming & VM . Multiple programs compete for memory – Processes move memory from and to disk – Pages needed by one process may get squeezed out by another process – thrashing: effective cost of memory access = cost of disk access = really really bad . Must balance memory across processes – avoid thrashing
Global LRU . Put all pages from all procs in one pool – Manage with LRU (Segmented Queue) – Used by Linux, BSD, etc. . Advantages: – Easy . Disadvantages: – Many
Global LRU Disadvantages . No isolation between processes – One process touching many pages can force another process’ pages to be evicted . Priority ignored, or inverted – All processes treated equally . Greedy (or wasteful) processes rewarded – Programs with poor locality squeeze out those with good locality – Result: more page faults
Global LRU Disadvantages . “Sleepyhead” problem – Intermittent, important process – Every time it wakes up – no pages! – back to sleep... – Think ntpd . Susceptible to denial of service – Non-paying “guest”, lowest priority, marches over lots of pages – gets all available memory
. Alternatives? – Pinning?
The End . If time: Enhancing clock and CRAMM
Enhancing Clock . Recall: don’t write back unmodified pages – Idea: favor eviction of unmodified pages – Extend hardware to keep another bit: modified bit . Total order of tuples: (ref bit, mod bit) – (0,0), (0,1), (1,0), (1,1) – Evict page from lowest nonempty class
Replacement, Enhanced Clock . OS scans at most three times – Page (0,0) – replace that page – Page (0,1) – write out page, clear mod bit – Page (1,0), (1,1) – clear reference bit . Passes: – all pages (0,0) or (0,1) – all pages (0,1) – write out pages – all pages (0,0) – replace any page
. Fast, but still coarse approximation of LRU
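Because the classes are totally ordered, victim selection within a scan is just a minimum over (ref, mod) tuples; the page states below are hypothetical:

```python
# Hypothetical page states: (reference bit, modified bit)
pages = {'A': (1, 1), 'B': (0, 1), 'C': (1, 0), 'D': (0, 1)}

# Lowest nonempty class wins: (0,0) < (0,1) < (1,0) < (1,1)
victim = min(pages, key=lambda p: pages[p])
print(victim)   # 'B' – unreferenced (though dirty) beats any referenced page
```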
Working Set . Denning: Only run processes whose working set fits in RAM – Other processes: deactivate (suspend) . Classical definition: working set = pages touched in last τ references . Provides isolation – Process’s reference behavior only affects itself
Working Set Problems . Algorithm relies on key parameter, τ – How do we set τ? – Is there one correct τ? • Different processes have different timescales over which they touch pages . Not acceptable (or necessarily possible) to suspend processes altogether
. Not really used – Very rough variant used in Windows
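The classical definition is directly computable, which also makes the τ problem concrete: on the toy reference string below, the working set shrinks as τ shrinks.

```python
def working_set(refs, tau, i):
    """Pages touched in the last tau references, ending at index i (inclusive)."""
    return set(refs[max(0, i - tau + 1): i + 1])

refs = list("AABBBCCAAD")                    # hypothetical reference string
print(working_set(refs, 4, len(refs) - 1))   # {'C', 'A', 'D'}
print(working_set(refs, 2, len(refs) - 1))   # {'A', 'D'} – smaller tau, smaller set
```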
Solution: CRAMM . New VM management alg: Cooperative Robust Automatic Memory Management [OSDI 2006, Yang et al.] . Redefine working set size = pages required to spend < n% time paging – CRAMM default = 5%
Calculating WSS w.r.t. 5%
[Figure: a memory reference sequence is run through an LRU queue; each hit increments a histogram bucket for the LRU position at which the page was found. The fault curve is the tail sum of the histogram – faults(i) = Σ_{j>i} hist(j) – giving the faults the process would incur with i pages of memory; the WSS is the smallest i whose fault cost stays under 5% of execution time]
Computing hit histogram . Not possible in standard VM: – Global LRU queues – No per process/file information or control • Difficult to estimate app’s WSS / available memory
. CRAMM VM: – Per process/file page management: • Page list: Active, Inactive, Evicted • Add & maintain histogram
Managing pages per process
[Diagram: per-process page lists – Active (CLOCK), Inactive (LRU; pages protected by turning off permissions → minor fault), Evicted (LRU; pages evicted to disk → major fault); minor and major faults drive refill & adjustment of the list boundaries and update the per-position fault histogram]
Controlling overhead
[Diagram: the same page lists, with a buffer between Inactive and Evicted; faults control the Active/Inactive boundary so that overhead stays at ~1% of execution time]
Competitive Analysis . I removed this slide because I don’t get it. This is the worst case analysis…
. Instead of worst-case, compare replacement policy to optimal (OPT) – How much worse is algorithm than optimal?
. Result: LRU & FIFO both “k-competitive” – k = size of queue – Can incur k times more misses than OPT
FIFO & Belady’s Anomaly
LRU: No Belady’s Anomaly
. Why no anomaly for LRU?