Operating Systems CMPSCI 377: Paging
Emery Berger and Mark Corner, University of Massachusetts Amherst

Virtual vs. Physical Memory . Apps don't access physical memory – Well, not directly . Apps use virtual memory – Addresses start at 0 – One level of indirection – The address you see is not the "real" address

. Any ideas why?

Memory Pages . Programs use memory as individual bytes . The OS manages groups of bytes: pages – typically 4 kB or 8 kB – Why? (think Tetris with all squares) – Applies to both virtual and physical memory • Physical pages are usually called frames

A Mapping Virtual to Physical

. Note this is simplified and the data here includes the heap, not the typical data segment…

Why Virtual Memory? . Why? – Simpler • Everyone gets the illusion of the whole address space – Isolation • Every process is protected from every other – Optimization • Reduces space requirements

Typical Virtual Memory Layout . Some things grow – Must leave room! . Mmap and heap spaces – mmap increases the mmap region – brk increases the heap . Other layouts possible

Quick Quiz! . Are arrays contiguous in memory? – In physical memory? – In virtual memory?

. Where does the code for a program live? . Programs issue loads and stores . What kind of addresses are these? . The MMU translates virtual addresses to physical addresses – Maintains a table (big hash table) – Almost always in HW… Why?

(figure: the program issues a virtual address; the MMU, consulting the page table, translates it into the physical address used to access memory)

Page Tables . A table of translations – virtual pages -> physical pages . One per process . One page table entry per virtual page . How? – Programs issue virtual addresses – Find the virtual page (how?) – Look up the physical page, add the offset

Page Table Entries . Do all virtual pages -> physical pages? – Valid and invalid bits . PTEs hold lots of other information – For instance, some pages can only be read

Address Translation . Powers of 2: – Virtual address space: size 2^m – Page size: 2^n . Page #: the high m-n bits of the virtual address . The lower n bits select the offset within the page
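A minimal sketch of that split (my own example, assuming 4 kB pages, i.e. n = 12, and a 32-bit address space):

#include <cstdint>
#include <cstdio>

int main() {
    const uint32_t n = 12;                         // page size = 2^12 = 4 kB
    const uint32_t vaddr = 0x40001234;             // an example virtual address
    uint32_t vpn    = vaddr >> n;                  // high m-n bits: virtual page number
    uint32_t offset = vaddr & ((1u << n) - 1);     // low n bits: offset within the page
    // The page table maps vpn -> frame; the physical address is (frame << n) | offset.
    std::printf("vpn = 0x%x, offset = 0x%x\n", vpn, offset);
    return 0;
}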

Quick Activity . How much memory does a page table need? – 4 kB pages, 32-bit address space – each page table entry (PTE) uses 4 bytes . 2^32 / 2^12 * 4 = 2^22 bytes = 4 MB – Is this a problem? – Isn't this per process? – What about a 64-bit address space? . Any ideas how to fix this?

Multi-Level Page Tables . Use a multi-level page table

(figure: the high bits of the virtual address index a Level 0 table; each valid Level 0 entry points to a Level 1 table, whose entries point to physical frames)

Quick Activity . How much memory does a page table need? – 4 kB pages, 32-bit address space – Two-level page table – 20 bits of virtual page number = 10 bits per level – each page table entry (PTE) uses 4 bytes – Only the first page of the program is valid . 2^10 * 4 + 2^10 * 4 = 2^13 bytes = 8 kB
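A small sketch of the two-level split and the size arithmetic from both activities (my example; it assumes the 10/10/12 split above and 4-byte PTEs):

#include <cstdint>
#include <cstdio>

int main() {
    const uint32_t vaddr = 0x40001234;
    uint32_t l0     = vaddr >> 22;             // top 10 bits: Level 0 index
    uint32_t l1     = (vaddr >> 12) & 0x3FF;   // next 10 bits: Level 1 index
    uint32_t offset = vaddr & 0xFFF;           // low 12 bits: offset in the page
    std::printf("L0 index %u, L1 index %u, offset 0x%x\n", l0, l1, offset);

    // Single-level table: 2^20 entries * 4 bytes = 4 MB.
    // Two-level table with one valid region: one L0 table + one L1 table = 8 kB.
    std::printf("single-level: %u bytes, two-level: %u bytes\n",
                (1u << 20) * 4, 2 * (1u << 10) * 4);
    return 0;
}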

. Isn't this slow?

Translation Lookaside Buffer (TLB) . TLB: fast, fully associative memory – Caches page table entries – Stores page numbers (keys) and the frames (values) in which they are stored . Assumption: locality of reference – Locality in memory accesses = locality in address translation . TLB sizes: 8 to 2048 entries – Powers of 2 simplify translation of virtual to physical addresses
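A software sketch of a fully associative lookup (illustrative only – the entry count and struct are my own; real TLBs compare all entries in parallel in hardware):

#include <cstdint>
#include <optional>

struct TlbEntry {
    bool     valid = false;
    uint32_t vpn   = 0;    // virtual page number (the key)
    uint32_t frame = 0;    // physical frame number (the value)
};

// Fully associative: any entry may hold any page, so every entry is checked.
std::optional<uint32_t> tlb_lookup(const TlbEntry (&tlb)[64], uint32_t vpn) {
    for (const TlbEntry &e : tlb)
        if (e.valid && e.vpn == vpn)
            return e.frame;       // TLB hit: no page table walk needed
    return std::nullopt;          // TLB miss: walk the page table instead
}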

Virtual Memory is an Illusion! . How much memory does a process have? – Do all processes have this? . Key idea: use RAM as a cache for disk – The OS transparently moves pages . Requires locality: – recently referenced memory must fit in RAM – If not: thrashing (nothing but disk traffic)

Paging

Paging + Locality . Most programs obey the 90/10 "rule" – 90% of the time is spent accessing 10% of memory . Exploit this rule: – Only keep the "live" parts of a process in memory

Key Policy Decisions . Two key questions (for any cache): – When do we read a page from disk? – When do we write a page to disk?

Reading Pages . Read on demand: – The OS loads a page on its first reference – May force the eviction of a page in RAM – The pause while loading the page = page fault . Can also perform pre-paging: – The OS guesses which page will be needed next, and begins loading it . Most systems just do demand paging . What about writes?

Demand Paging . On every reference, check whether the page is in memory (resident bit in the page table) – Who is doing this? . If not: trap to the OS – How does this work in HW? . The OS: – Selects a victim page to be replaced – Writes the victim page out if necessary, marks it non-resident – Begins loading the new page from disk – The OS can switch to another process • more on this later
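A sketch of that fault path in toy C++ (the types and helper names are placeholders I made up, not a real kernel API):

#include <cstdint>
#include <unordered_map>

using VirtPage = uint32_t;
using Frame    = uint32_t;

struct PTE { Frame frame = 0; bool resident = false; bool dirty = false; };

struct Pager {
    std::unordered_map<VirtPage, PTE>   page_table;
    std::unordered_map<Frame, VirtPage> frame_owner;

    Frame select_victim() { return 0; }          // stub: LRU, clock, ... decides
    void  write_to_disk(Frame) {}                // stub: write victim to swap space
    void  read_from_disk(VirtPage, Frame) {}     // stub: fetch page from swap space

    // Called from the trap handler when the resident bit is clear.
    void handle_fault(VirtPage vp) {
        Frame f = select_victim();
        VirtPage victim = frame_owner[f];
        if (page_table[victim].dirty)
            write_to_disk(f);                    // write back only if modified
        page_table[victim].resident = false;     // victim is now non-resident
        read_from_disk(vp, f);                   // OS may run another process meanwhile
        page_table[vp] = {f, true, false};       // map the page, mark it resident
        frame_owner[f] = vp;                     // the faulting access is then restarted
    }
};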

Swap Space . Swap space = where victim pages go – A partition or special file reserved on disk

. Size of reserved swap space limits what?

Tricks with Page Tables . Do all pages of memory end up in swap? . Parts of the address space can be mapped to files – see the man page for mmap (a short sketch follows below)

Overview . A Day in the Life of a Page – Allocation – Use – Eviction – Reuse . Terms: Resident and Non-resident
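As referenced above, a minimal user-level sketch of mapping a file into the address space (POSIX mmap; the file name is made up and error handling is trimmed):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = open("data.bin", O_RDONLY);         // hypothetical input file
    if (fd < 0) { std::perror("open"); return 1; }
    struct stat st;
    fstat(fd, &st);
    // The file's pages now appear in the virtual address space; they are
    // brought in on demand when first touched, like any other page.
    char *p = static_cast<char *>(
        mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));
    if (p == MAP_FAILED) { std::perror("mmap"); return 1; }
    std::printf("first byte: %d\n", p[0]);
    munmap(p, st.st_size);
    close(fd);
    return 0;
}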

A Day in the Life of a Page

1. Allocate some memory: char * x = new char[16]; (figure: virtual memory layout; addresses 0x40001000 and 0x40001040 → 0x4000104F)
2. Update the page tables – the virtual page now has a physical frame (figure: virtual and physical memory layouts)
3. Write the contents – dirty page: strcpy(x, "hello");
4. Other processes fill up memory…
5. …forcing our page to be evicted (paged out) to swap space on disk
6. Now the page is nonresident & protected
7. Touch the page – swap it in: y[0] = x[0];
Tricks with Page Tables: Sharing . Paging allows sharing of memory across processes – Reduces memory requirements . Shared stuff includes code and data – Code is typically R/O

Tricks with Page Tables: COW . Copy-on-write (COW) – Just copy the page tables – Make all pages read-only . What if a process changes memory?

. All processes are created this way!

Allocating Pages . Ultimately from sbrk or mmap . sbrk increases the # of valid pages – Increases the heap . mmap maps address space to a file – Increases the mmap space . Oddity: – Allocators can use either mmap or brk to get pages – You will use mmap (see the sketch below)
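A minimal sketch (my example, not project code) of using mmap to grab zero-filled pages for an allocator; MAP_ANONYMOUS asks for memory that is not backed by a file:

#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

int main() {
    const size_t len = 16 * 4096;                // ask the OS for 16 pages
    void *p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { std::perror("mmap"); return 1; }
    // The pages are zero-filled and demand-allocated: physical frames are
    // only assigned when the pages are first touched.
    static_cast<char *>(p)[0] = 42;
    munmap(p, len);
    return 0;
}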

. What does mmap'ing /dev/zero mean? – Think about COW

Overview

. Replacement policies – Comparison

Cost of Paging . Usually in algorithms, we pick the algorithm with the best asymptotic worst case – Paging: worst-case analysis is useless! – It is easy to construct an adversary: every page requires a page fault

A, B, C, D, E, F, G, H, I, J, A...

(figure: a row of page frames representing the available memory)

The animation fills the frames with A, B, C, D, E; once memory is full, every later reference (F, G, H, I, J, then A again) must evict a resident page, so every reference is a page fault.

Optimal Replacement (MIN/OPT) . Evict the page accessed furthest in the future – The optimal page replacement algorithm • Invented by Belady ("MIN"), a.k.a. "OPT" . Provably optimal policy – Just one small problem... it requires predicting the future – Useful point of comparison • How far from optimal are we?

Quick Activity: OPT

(figure: a sequence of page accesses and the resulting contents of the page frames)

. Page faults: 5

Least-Recently Used (LRU) . Evict the page not used for the longest time (least-recently used) – Approximates OPT • If the recent past ≈ a predictor of the future – A variant is used in all real operating systems

Quick Activity: LRU example

. Page faults: ?

. Page faults: 5

LRU, example II

. Page faults: ?

. Page faults: 12! – A loop is the well-known worst case for LRU

Most-Recently Used (MRU) . Evict the most-recently used page . Shines for LRU's worst case: a loop that exceeds RAM size

A, B, C, D, A, B, C, D, ...

(figure: a row of page frames – three here – representing the available memory)

The animation fills the frames with A, B, C; when D faults, only the most-recently used frame is replaced, so A and B stay resident and keep hitting while the third frame alternates between C and D.

FIFO . First-in, first-out: evict the oldest page . As competitive as LRU, but performs miserably in practice! – Ignores locality – Suffers from Belady's anomaly: • More memory can mean more paging! (see the demonstration below) – LRU & similar algorithms do not • Stack algorithms – more memory means ≥ hits
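To make Belady's anomaly concrete, here is a small FIFO simulation (my example): with the classic reference string 1 2 3 4 1 2 5 1 2 3 4 5, three frames give 9 faults but four frames give 10.

#include <algorithm>
#include <cstdio>
#include <deque>
#include <vector>

// Count page faults under FIFO replacement with the given number of frames.
int fifo_faults(const std::vector<int> &refs, size_t frames) {
    std::deque<int> resident;                // front = oldest page
    int faults = 0;
    for (int page : refs) {
        if (std::find(resident.begin(), resident.end(), page) != resident.end())
            continue;                        // hit
        ++faults;
        if (resident.size() == frames)
            resident.pop_front();            // evict the oldest page
        resident.push_back(page);
    }
    return faults;
}

int main() {
    std::vector<int> refs = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    std::printf("3 frames: %d faults\n", fifo_faults(refs, 3));   // prints 9
    std::printf("4 frames: %d faults\n", fifo_faults(refs, 4));   // prints 10
    return 0;
}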

Virtual Memory in Reality . Implementing exact LRU . Approximating LRU – Hardware support – Clock – Segmented queue . Multiprogramming – Global LRU – Working Set

Implementing Exact LRU . On each reference, time-stamp the page . When we need to evict: select the oldest page = least-recently used

A, B, C, B, C, C, D

The animation time-stamps each reference: A, B, C get stamps 1, 2, 3; the later touches of B and C update them to 4 and then 5 and 6; when D arrives, A (stamp 1) is the LRU page, so it is replaced and D takes stamp 7.

How should we implement this?

Implementing Exact LRU . Could keep pages in order – optimizes eviction – Priority queue: update = O(log n), eviction = O(log n)

. Optimize for the common case! – Common case: hits, not misses – Hash table: update = O(1), eviction = O(n)

Cost of Maintaining Exact LRU . Hash tables: too expensive – On every reference: • Compute the hash of the page address • Update the time stamp – Unfortunately: 10x – 100x more expensive!

Cost of Maintaining Exact LRU . Alternative: a doubly-linked list – Move items to the front when referenced – LRU items end up at the end of the list – Still too expensive • 4-6 pointer updates per reference

. Can we do better?
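For reference, a compact sketch of the list-plus-hash-table idea above (std::list plus std::unordered_map; this is the software version – note that every touch still pays a hash lookup and several pointer updates):

#include <list>
#include <unordered_map>

// Exact LRU over page numbers: O(1) per reference and per eviction.
class LruList {
    std::list<int> order_;                                    // front = most recent
    std::unordered_map<int, std::list<int>::iterator> pos_;
public:
    void touch(int page) {
        auto it = pos_.find(page);
        if (it != pos_.end())
            order_.erase(it->second);                         // unlink from old spot
        order_.push_front(page);                              // move to MRU position
        pos_[page] = order_.begin();
    }
    int evict() {                                             // remove & return the LRU page
        int victim = order_.back();
        order_.pop_back();
        pos_.erase(victim);
        return victim;
    }
    bool empty() const { return order_.empty(); }
};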

Virtual Memory in Reality . Implementing exact LRU . Approximating LRU – Hardware support – Clock – Segmented queue . Multiprogramming – Global LRU – Working Set

Hardware Support . Maintain reference bits for every page – On each access, set the reference bit to 1 – The page replacement algorithm periodically resets the reference bits

– Evict a page whose reference bit = 0

A, B, C, B, C, C, D

The animation: A, B, C start with reference bit 1; the bits are then reset to 0, and the later touches of B and C set their bits back to 1; when D arrives, A still has reference bit 0, so it is evicted and D takes its frame.

. Cost per miss = O(n)

Virtual Memory in Reality . Implementing exact LRU . Approximating LRU – Hardware support – Clock – Segmented queue . Multiprogramming – Global LRU – Working Set

The Clock Algorithm . A variant of FIFO & LRU . Keep the frames in a circle . On a page fault, the OS: – Checks the reference bit of the next frame – If the reference bit = 0, replaces that page and sets the bit to 1 – If the reference bit = 1, sets the bit to 0 and advances the pointer to the next frame

A, B, C, D, B, C, E, F, C, G

The animation: frames A, B, C, D start with reference bit 1; on the fault for E the hand sweeps around clearing the bits until it finds a 0, and E replaces A; F then replaces B, whose bit is already 0; the reference to C sets C's bit again, so on the fault for G the hand clears C's bit, advances, and replaces D.
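A software sketch of the clock sweep described above (the frame array and fields are illustrative; real kernels keep this state in the frame table, and the caller installs the new page in the returned frame and sets its bit):

#include <cstddef>
#include <vector>

struct ClockFrame {
    int  page   = -1;      // which page occupies this frame (-1 = free)
    bool refbit = false;   // set by the hardware on each access
};

// Sweep the circular frame list, clearing reference bits, until a frame
// with refbit == 0 is found; that frame is the victim.
std::size_t clock_evict(std::vector<ClockFrame> &frames, std::size_t &hand) {
    for (;;) {
        ClockFrame &f = frames[hand];
        std::size_t here = hand;
        hand = (hand + 1) % frames.size();   // advance the pointer either way
        if (!f.refbit)
            return here;                     // bit 0: replace this page
        f.refbit = false;                    // bit 1: give it a second chance
    }
}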

Segmented Queue . Real systems segment the queue in two: – approximate LRU for frequently-referenced pages • e.g., the first 1/3 of page frames – fast – exact LRU for infrequently-referenced pages • the last 2/3 of page frames; a doubly-linked list – precise . How do we move between the two?

(figure: the first segment is managed by clock, the second by exact LRU)

VM in the Real World . Implementing exact LRU . Approximating LRU – Hardware support – Clock – Segmented queue . Multiprogramming – Global LRU – Working Set

Multiprogramming & VM . Multiple programs compete for memory – Processes move memory to and from disk – Pages needed by one process may get squeezed out by another process – thrashing: the effective cost of a memory access = the cost of a disk access = really, really bad . Must balance memory across processes – avoid thrashing

Global LRU . Put all pages from all processes in one pool – Manage with LRU (segmented queue) – Used by Linux, BSD, etc. . Advantages: – Easy . Disadvantages: – Many

Global LRU Disadvantages . No isolation between processes – One process touching many pages can force another process's pages to be evicted . Priority is ignored, or inverted – All processes are treated equally . Greedy (or wasteful) processes are rewarded – Programs with poor locality squeeze out those with good locality – Result: more page faults

Global LRU Disadvantages . The "sleepyhead" problem – An intermittent but important process – Every time it wakes up – no pages! – back to sleep... – Think ntpd . Susceptible to denial of service – A non-paying "guest" at lowest priority that marches over lots of pages gets all available memory

. Alternatives? – Pinning?

The End . If time: enhancing clock and CRAMM

Enhancing Clock . Recall: don't write back unmodified pages – Idea: favor eviction of unmodified pages – Extend the hardware to keep another bit: the modified bit . Total order of tuples: (ref bit, mod bit) – (0,0) < (0,1) < (1,0) < (1,1) – Evict a page from the lowest nonempty class

Replacement, Enhanced Clock . The OS scans at most three times – Page (0,0) – replace that page – Page (0,1) – write out the page, clear the mod bit – Page (1,0) or (1,1) – clear the reference bit . Passes: – all pages (0,0) or (0,1) – all pages (0,1): write out pages – all pages (0,0): replace any page

. Fast, but still a coarse approximation of LRU
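A tiny sketch of the (ref bit, mod bit) ordering used above – just the victim-class comparison, not the full three-pass scan:

#include <cstddef>
#include <vector>

struct Page { bool ref; bool mod; };

// Class 0 = (0,0), 1 = (0,1), 2 = (1,0), 3 = (1,1); lower classes are better victims.
int victim_class(const Page &p) { return (p.ref ? 2 : 0) + (p.mod ? 1 : 0); }

// Index of a page in the lowest nonempty class.
std::size_t pick_victim(const std::vector<Page> &pages) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < pages.size(); ++i)
        if (victim_class(pages[i]) < victim_class(pages[best]))
            best = i;
    return best;
}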

Working Set . Denning: only run processes whose working set fits in RAM – Other processes: deactivate (suspend) . Classical definition: working set = the pages touched in the last τ references . Provides isolation – A process's reference behavior only affects itself

Working Set Problems . The algorithm relies on a key parameter, τ – How do we set τ? – Is there one correct τ? • Different processes have different timescales over which they touch pages . Not acceptable (or necessarily possible) to suspend processes altogether

. Not really used – A very rough variant is used in Windows

Solution: CRAMM . New VM management algorithm: Cooperative Robust Automatic Memory Management [OSDI 2006, Yang et al.] . Redefine working set size = the pages required to spend < n% of the time paging – CRAMM default = 5%

Calculating WSS w.r.t. 5%

(figure: a memory reference sequence; an LRU queue holding the pages in least-recently-used order; a hit histogram recording how many hits occur at each LRU position; and the resulting fault curve as a function of the number of pages)

The fault curve is derived from the hit histogram: fault(i) = Σ_{j > i} hist[j] – with i pages of memory, every hit that lands deeper than position i in the LRU order becomes a fault.
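A small sketch of that relationship (my own example data, not CRAMM's): the fault curve is just a suffix sum over the hit histogram.

#include <cstddef>
#include <cstdio>
#include <vector>

// hist[d] = hits whose page was at LRU depth d+1 when referenced. With m pages
// resident, depths 1..m hit and anything deeper faults, so fault(m) is a suffix
// sum of hist (compulsory misses are ignored here for simplicity).
std::vector<long> fault_curve(const std::vector<long> &hist) {
    std::vector<long> fault(hist.size() + 1, 0);
    for (std::size_t d = hist.size(); d-- > 0; )
        fault[d] = fault[d + 1] + hist[d];
    return fault;
}

int main() {
    std::vector<long> hist = {5, 3, 1, 0, 2};   // made-up hit histogram
    std::vector<long> f = fault_curve(hist);
    for (std::size_t m = 0; m < f.size(); ++m)
        std::printf("%zu pages -> %ld faults\n", m, f[m]);
    return 0;
}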

Computing the hit histogram . Not possible in a standard VM: – Global LRU queues – No per-process/file information or control • Difficult to estimate an app's WSS / the available memory

. The CRAMM VM: – Per-process/file page management: • Page lists: Active, Inactive, Evicted • Add & maintain the histogram

Managing pages per process

(figure: each process's pages live on three lists – Active, managed by CLOCK; Inactive, managed by LRU, its pages protected by turning off permissions so that a touch causes a minor fault; and Evicted, pages written out to disk, where a touch causes a major fault. A header, page descriptor, and AVL node track each page; minor and major faults feed the histogram, and a refill & adjustment step moves pages between the lists.)

Controlling overhead

(figure: the same Active / Inactive / Evicted lists, with a buffer between the active (CLOCK) and inactive (LRU) lists; faults control the boundary so that the overhead stays at about 1% of execution time.)

Competitive Analysis . (I removed this slide because I don't get it. This is the worst-case analysis…)

. Instead of the worst case, compare the replacement policy to the optimal one (OPT) – How much worse is the algorithm than optimal?

. Result: LRU & FIFO are both "k-competitive" – k = the size of the queue – They can incur k times more misses than OPT

FIFO & Belady's Anomaly

LRU: No Belady's Anomaly

. Why no anomaly for LRU?
