Operating Systems Principles: Virtual Memory and Paging


Mark Kampe ([email protected])

6A. Introduction to Swapping and Paging
6B. Paging MMUs and Demand Paging
6C. Replacement Algorithms
6D. Thrashing and Working Sets
6E. Other optimizations

Memory Management
1. allocate/assign physical memory to processes
   – explicit requests: malloc (sbrk)
   – implicit: program loading, stack extension
2. manage the virtual address space
   – instantiate virtual address space on context switch
   – extend or reduce it on demand
3. manage migration to/from secondary storage
   – optimize use of main storage
   – minimize overhead (waste, migrations)

Memory Management Goals
1. transparency
   – process sees only its own virtual address space
   – process is unaware memory is being shared
2. efficiency
   – high effective memory utilization
   – low run-time cost for allocation/relocation
3. protection and isolation
   – private data will not be corrupted
   – private data cannot be seen by other processes

Primary and Secondary Storage
• primary = main (executable) memory
  – primary storage is expensive and very limited
  – only processes in primary storage can be run
• secondary = non-executable storage (e.g. disk/SSD)
  – blocked processes can be moved to secondary storage
  – swap out code, data, stack and non-resident context
  – make room in primary for other "ready" processes
• returning to primary memory
  – process is copied back when it becomes unblocked

Why we swap
• make best use of a limited amount of memory
  – a process can only execute if it is in memory
  – we can't keep all processes in memory all the time
  – if it isn't READY, it doesn't need to be in memory
  – swap it out and make room for other processes
• improve CPU utilization
  – when there are no READY processes, the CPU is idle
  – CPU idle time means reduced system throughput
  – more READY processes means better utilization

Scheduling states with swapping
[state diagram: create → ready → running → exit; a running process that makes a blocking request moves to blocked, and back to ready on allocate; both ready and blocked processes can be swapped out to secondary storage and swapped back in when memory permits]

Pure Swapping
• each segment is contiguous
  – in memory, and on secondary storage
  – all in memory, or all on the swap device
• swapping takes a great deal of time
  – transferring entire data (and text) segments
• swapping wastes a great deal of memory
  – processes seldom need the entire segment
• variable-length memory/disk allocation
  – complex, expensive, external fragmentation

Paged address translation
[diagram: a process virtual address space (CODE, DATA, STACK) in which each segment is implemented as a set of fixed-size virtual pages, mapped onto scattered page frames of physical memory]

Paging and Fragmentation
• internal fragmentation
  – averages only ½ page (half of the last one)
• external fragmentation
  – completely non-existent (we never carve up pages)

Paging Memory Management Unit
[diagram: the virtual page # of a virtual address is used as an index into the page table; the selected entry's valid bit is checked to ensure the virtual page # is legal (if not, a page fault results), and the entry supplies the physical page number; the offset within the page remains the same]

Paging Relocation Examples
[diagram: sample address translations through a page table with valid entries mapping to frames 0C20, 0105, 00A1, 041F, 0D10 and 0AC3, plus invalid entries whose references raise page faults]
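The translation the MMU performs is simple enough to sketch in C. The toy simulator below is only a sketch: the page size, table size, and the VPN-to-frame mapping are illustrative assumptions, not values taken from the slide's table. It splits a virtual address into page number and offset, checks the valid bit, and substitutes the physical frame number while leaving the offset untouched:

```c
/* Minimal sketch of paged address translation: hypothetical 32-bit
 * machine with 4 KB pages and a 16-entry page table. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                       /* 4 KB pages             */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NPAGES     16                       /* toy address space      */

struct pte {
    uint32_t valid : 1;                     /* is this mapping legal? */
    uint32_t pfn   : 20;                    /* physical frame number  */
};

static struct pte page_table[NPAGES];

/* Translate one virtual address; returns 0 and sets *pa on success,
 * -1 on a page fault (invalid or out-of-range PTE). */
int translate(uint32_t va, uint32_t *pa)
{
    uint32_t vpn    = va >> PAGE_SHIFT;       /* index into page table */
    uint32_t offset = va & (PAGE_SIZE - 1);   /* unchanged by paging   */

    if (vpn >= NPAGES || !page_table[vpn].valid)
        return -1;                            /* trap: page fault      */

    *pa = (page_table[vpn].pfn << PAGE_SHIFT) | offset;
    return 0;
}

int main(void)
{
    page_table[4].valid = 1;                  /* map virtual page 4 ...     */
    page_table[4].pfn   = 0x0C20;             /* ... to frame 0x0C20        */

    uint32_t pa;
    if (translate(0x00004005, &pa) == 0)      /* VPN 4, offset 0x005        */
        printf("physical = 0x%08x\n", pa);    /* prints 0x00c20005          */
    else
        puts("page fault");
    return 0;
}
```

A real MMU does this in hardware on every reference; the -1 return models the fault/trap that demand paging, described next, relies on.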
Demand Paging
• paging MMU supports not-present pages
  – generates a fault/trap when they are referenced
  – OS can bring in the page, retry the faulted reference
• entire process needn't be in memory to run
  – start each process with a subset of its pages
  – load additional pages as the program demands them
• they don't need all the pages all the time
  – code execution exhibits reference locality
  – data references exhibit reference locality

Demand Paging – advantages
• improved system performance
  – fewer in-memory pages per process
  – more processes in primary memory
    • more parallelism, better throughput
    • better response time for processes already in memory
  – less time required to page processes in and out
• fewer limitations on process size
  – process can be larger than physical memory
  – process can have a huge (sparse) virtual space

Page Fault Handling
• initialize page table entries to not-present
• CPU faults when an invalid page is referenced
  1. trap forwarded to page fault handler
  2. determine which page, and where it resides
  3. find and allocate a free page frame
  4. block process, schedule I/O to read the page in
  5. update page table to point at the newly read-in page
  6. back up user-mode PC to retry the failed instruction
  7. unblock process, return to user mode

Demand Paging and Performance
• page faults hurt performance
  – increased overhead
    • additional context switches and I/O operations
  – reduced throughput
    • processes are delayed waiting for needed pages
• key is having the "right" pages in memory
  – right pages -> few faults, little overhead/delay
  – wrong pages -> many faults, much overhead/delay
• we have little control over what we bring in
  – we read the pages the process demands
• the key to performance is which pages we evict

Belady's Optimal Algorithm
• Q: which page should we replace?
  A: the one we won't need for the longest time
• Why is this the right page?
  – it delays the next page fault as long as possible
  – minimum number of page faults per unit time
• How can we predict future references?
  – Belady's algorithm cannot be implemented in a real system
  – but we can implement it for test data streams
  – and compare other algorithms against it

Approximating Optimal Replacement
• note which pages have recently been used
  – use this data to predict future behavior
• possible replacement algorithms
  – random, FIFO: straw-men ... forget them
• Least Recently Used
  – asserts the near future will be like the recent past
    • programs do exhibit temporal and spatial locality
    • if we haven't used it recently, we probably won't soon
  – we don't have to be right 100% of the time
    • the more right we are, the more page faults we save
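Because Belady's algorithm only needs the future of a recorded reference string, it is easy to implement offline as a yardstick. Below is a minimal sketch under stated assumptions: the helper names are hypothetical, four frames are assumed, and the input is the a b c d a b d e f a b c d a e d reference string used in the replacement examples that follow.

```c
/* Sketch: Belady's optimal (MIN) replacement run over a recorded
 * reference string. Usable only offline, as the slides note, but it
 * gives the fault-count lower bound to compare real algorithms against. */
#include <stdio.h>

#define NFRAMES 4

/* Position of the next use of page p after pos (past the end if never). */
static int next_use(const char *refs, int n, int pos, char p)
{
    for (int i = pos + 1; i < n; i++)
        if (refs[i] == p)
            return i;
    return n + 1;                       /* never used again */
}

static int belady_faults(const char *refs, int n)
{
    char frame[NFRAMES] = {0};          /* 0 marks a free frame */
    int faults = 0;

    for (int i = 0; i < n; i++) {
        int hit = 0, free_slot = -1;
        for (int f = 0; f < NFRAMES; f++) {
            if (frame[f] == refs[i]) hit = 1;
            else if (frame[f] == 0)  free_slot = f;
        }
        if (hit) continue;

        faults++;
        if (free_slot >= 0) {           /* initial load, not a replacement */
            frame[free_slot] = refs[i];
            continue;
        }
        /* evict the page whose next use is farthest in the future */
        int victim = 0, farthest = -1;
        for (int f = 0; f < NFRAMES; f++) {
            int d = next_use(refs, n, i, frame[f]);
            if (d > farthest) { farthest = d; victim = f; }
        }
        frame[victim] = refs[i];
    }
    return faults;
}

int main(void)
{
    const char refs[] = "abcdabdefabcdaed";  /* reference string from the slides */
    printf("faults = %d\n", belady_faults(refs, (int)sizeof refs - 1));
    return 0;
}
```

On this string the optimal policy incurs 8 faults (4 loads plus 4 replacements), versus the 4 loads plus 7 replacements true LRU incurs in the example below; that gap is exactly what the practical approximations try to close.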
Why Programs Exhibit Locality
• code locality
  – code in the same routine is in the same/adjacent page
  – loops iterate over the same code
  – a few routines are called repeatedly
  – intra-module calls are common
• stack locality
  – activity focuses on this and adjacent call frames
• data reference locality
  – this is common, but not assured

True LRU is hard to implement
• maintain this information in the MMU?
  – the MMU would have to note the time on every page reference
  – at best, maybe we can get a per-page read/written bit
• maintain this information in software?
  – mark all pages invalid, even if they are in memory
  – take a fault the first time each page is referenced
  – then mark that page valid for the rest of the time slice
• finding the oldest page is prohibitively expensive
  – 16GB memory / 4K pages = 4M page table entries to scan

Practical LRU surrogates
• must be cheap
  – can't cause additional page faults
  – avoid scanning the whole page table (it is big)
• clock algorithms ... a surrogate for LRU
  – organize all pages in a circular list
  – position around the list is a surrogate for age
  – progressive scan whenever we need another page:
    • for each page, ask the MMU if the page has been referenced
    • if so, reset the reference bit in the MMU; skip the page
    • if not, consider this page to be the least recently used

True Global LRU Replacement
[worked example: the reference string a b c d a b d e f a b c d a e d run against four page frames under true LRU; 4 initial loads plus 7 replacements]

LRU Clock Algorithm
[worked example: the same reference string run under the LRU clock; it also takes 4 loads plus 7 replacements, matching true LRU on this string even though the clock only approximates age]

Working Sets – per-process LRU
• global LRU is probably a blunder
  – bad interaction with round-robin scheduling
  – better to give each process its own page pool
  – and do LRU replacement within that pool
• a fixed number of pages per process is also bad
  – different processes exhibit different locality
    • which pages are needed changes over time
    • the number of pages needed changes over time
  – much like different natural scheduling intervals
• we clearly want dynamic working sets

Optimal Working Sets
• what is the optimal working set for a process?
  – the number of pages needed during the next time slice
• what if we try to run the process in fewer pages?
  – needed pages replace one another continuously
  – this is called "thrashing"
[graph: number of page faults vs. working set size; the fault rate explodes below the "natural" working set size ("thrashing" in insufficient space: it can't be done!), reaches a sweet spot at it, and shows little marginal benefit for additional space beyond it; more is just "more"]
• how can we know what the working set size is?
  – by observing the process's behavior
• which pages should be in the working set?
  – no need to guess, the process will fault for them

Implementing Working Sets
• managed working set size
  – assign ...

Working Set Clock Algorithm
[worked example over page frames 0 through 14; the slide is truncated here]
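Returning to the clock scan described under "Practical LRU surrogates", here is a minimal sketch. Since user code cannot read real MMU reference bits, the bit is simulated in software, and the exact eviction trace may differ from the slide's worked example depending on when bits are set and cleared; all names and the four-frame setup are illustrative assumptions.

```c
/* Minimal sketch of the clock (second-chance) scan.  In a real kernel
 * the reference bit lives in the MMU's PTE; here it is just a field
 * that the sweeping hand clears. */
#include <stdio.h>

#define NFRAMES 4

struct frame {
    char page;           /* page currently in this frame (0 = free) */
    int  referenced;     /* software surrogate for the MMU reference bit */
};

static struct frame frames[NFRAMES];
static int hand;         /* clock position, persists across scans */

/* Sweep from the current hand position: referenced pages get a second
 * chance (bit cleared, page skipped); evict the first unreferenced one. */
static int clock_pick_victim(void)
{
    for (;;) {
        struct frame *f = &frames[hand];
        if (!f->referenced) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        f->referenced = 0;               /* second chance */
        hand = (hand + 1) % NFRAMES;
    }
}

static void reference(char page)
{
    for (int i = 0; i < NFRAMES; i++)
        if (frames[i].page == page) {    /* hit: MMU would set the bit */
            frames[i].referenced = 1;
            return;
        }
    int v = clock_pick_victim();         /* fault: evict and load */
    frames[v].page = page;
    frames[v].referenced = 1;
    printf("fault on %c -> frame %d\n", page, v);
}

int main(void)
{
    const char refs[] = "abcdabdefabcdaed";  /* reference string from the slides */
    for (const char *p = refs; *p; p++)
        reference(*p);
    return 0;
}
```

The key property is visible in the scan loop: the hand never stalls, because a first full revolution clears every reference bit and a second revolution is then guaranteed to find a victim, all without scanning the page table on ordinary hits.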