Memory Management Outline

F Processes (done)
F Memory Management
  – Basic (done)
  – Paging (done)
  – Virtual Memory (this set of slides)

Motivation

F Logical address space can be larger than physical memory
  – "Virtual Memory"
  – pages kept on a special disk
F Benefits:
  – Less I/O needed
  – Less memory needed
  – Faster response
  – More users
F Abstraction for the programmer
F Performance ok?
  – error-handling code is rarely used
  – maximum-size arrays are rarely touched
F No pages in memory initially
  – pure demand paging
[Figure: logical pages A1–A3 and B1–B2 paged out from Main Memory to disk]

Paging Implementation

F Validation bit in the page table: v = page in memory, i = not in memory
[Figure: Logical Memory with Page 0–Page 3; page table entries 0→frame 1 (v),
1→(i), 2→frame 3 (v), 3→(i); Physical Memory holds Page 0 in frame 1 and
Page 2 in frame 3]

Page Fault

F Page not in memory
  – interrupt to OS => page fault
F OS looks in the table:
  – invalid reference? => abort
  – not in memory? => bring it in
F Get an empty frame (from the free list)
F Swap the page into the frame
F Reset tables (valid bit = 1)
F Restart the instruction

Performance of Demand Paging

F Page Fault Rate: 0 ≤ p ≤ 1.0
  – p = 0: no page faults; p = 1: every reference is a fault
F Effective Access Time (EAT)
  = (1-p) x (memory access) + p x (page fault overhead)
F Page Fault Overhead
  = swap page out + swap page in + restart

Performance Example

F memory access time = 100 nanoseconds
F page fault overhead = 25 msec
F page fault rate = 1/1000
F EAT = (1-p) x 100 + p x 25,000,000
      = 100 + 24,999,900 x p
      = 100 + 24,999,900 x 1/1000 ≈ 25 microseconds!
F Want less than 10% degradation:
  110 > 100 + 24,999,900 x p
  10 > 24,999,900 x p
  p < .0000004, or 1 fault in every 2,500,000 accesses!

Page Replacement

F Page fault => what if there are no free frames?
  – terminate the user (ugh!)
  – swap out a process (reduces degree of multiprogramming)
  – replace another page with the needed page
F Page replacement:
  – if there is a free frame, use it
  – otherwise use an algorithm to select a victim frame
  – write the victim page to disk, updating tables
  – read in the new page
  – restart the process
F "Dirty" bit – avoid the page-out when the victim is unmodified
[Figure: logical memory, page table, physical memory; steps (0)–(4) of
replacement: select victim, mark it invalid, swap it out, swap the desired
page in, mark its entry valid]

Page Replacement Algorithms

F Every system has its own
F Want the lowest page-fault rate
F Evaluate by running the algorithm on a particular string of memory
  references (a reference string) and computing the number of page faults
F Example: 1,2,3,4,1,2,5,1,2,3,4,5

First-In-First-Out (FIFO)

Reference string: 1,2,3,4,1,2,5,1,2,3,4,5
F 3 Frames / Process: 9 page faults
F 4 Frames / Process: 10 page faults!
F Belady's Anomaly: adding frames can increase faults
[Figure: frame contents over time for the 3-frame and 4-frame cases]

Optimal

F Replace the page that will not be used for the longest period of time
F Reference string 1,2,3,4,1,2,5,1,2,3,4,5 with 4 Frames / Process:
  6 page faults
F How do we know the future? Use as a benchmark

Least Recently Used (LRU)

F Replace the page that has not been used for the longest period of time
F Same reference string, 4 Frames / Process: 8 page faults
F No Belady's Anomaly
  – "Stack" algorithm: the pages held with N frames are a subset of
    those held with N+1 frames

LRU Implementation

F Counter implementation
  – every page has a counter; every time the page is referenced, copy
    the clock into the counter
  – when a page needs to be replaced, compare the counters to determine
    which to replace
F Stack implementation
  – keep a stack of page numbers
  – page referenced: move it to the top
  – no search needed for replacement

LRU Approximations

F LRU is good, but hardware support is expensive
F Some hardware support via a reference bit
  – with each page, initially = 0
  – when the page is referenced, set = 1
  – replace a page whose bit is 0 (no ordering among them)
  – enhance by keeping 8 bits and shifting
  – approximates LRU

Second-Chance

F FIFO replacement, but …
  – get the first page in FIFO order
  – look at its reference bit
    u bit == 0: replace it
    u bit == 1: set bit = 0, get the next page in FIFO order
F If a page is referenced often enough, it is never replaced
F Implement with a circular queue
F If all bits are 1, degenerates to FIFO
[Figure: (a) queue with reference bits before, (b) after – the pointer
clears 1-bits as it passes and stops at the next victim]

Enhanced Second-Chance

F 2 bits: reference bit and modify bit
F (0,0) neither recently used nor modified
  – best page to replace
F (0,1) not recently used but modified
  – needs write-out
F (1,0) recently used but clean
  – probably used again soon
F (1,1) recently used and modified
  – used soon, needs write-out
F Circular queue in each class (Macintosh)

Counting Algorithms

F Keep a counter of the number of references to each page
  – LFU: replace the page with the smallest count
    u if a page does all its references early on, it won't be replaced
    u decay counts by shifting
  – MFU: replace the page with the largest count
    u a page with the smallest count was probably just brought in and
      will probably be used
F Not too common (expensive) and not too good

Page Buffering

F Pool of frames
  – start the new page immediately, before writing the old one out
    u write it out when the system is idle
  – list of modified pages
    u write them out when the system is idle
  – pool of free frames, remembering their contents
    u page fault => check the pool first

Allocation of Frames

F How many fixed frames per process?
F Two allocation schemes:
  – fixed allocation
  – priority allocation

Fixed Allocation

F Equal allocation
  – ex: 93 frames, 5 processes = 18 frames per process (3 left in pool)
F Proportional allocation
  – number of frames proportional to process size
  – ex: 64 frames, s1 = 10, s2 = 127
    u f1 = 10 / 137 x 64 ≈ 5
    u f2 = 127 / 137 x 64 ≈ 59
F Treats processes as equals

Priority Allocation

F Use a proportional scheme based on priority
F If a process generates a page fault
  – select for replacement a frame from a process with lower priority
F "Global" versus "Local" replacement
  – local: consistent (not influenced by other processes)
  – global: more efficient (used more often)

Thrashing

F If a process does not have "enough" pages, its page-fault rate is
  very high
  – low CPU utilization
  – OS thinks it needs to increase multiprogramming
  – adds another process to the system
F Thrashing: a process is busy swapping pages in and out
[Figure: CPU utilization rises with degree of multiprogramming, then
collapses once thrashing sets in]

Cause of Thrashing

F Why does paging work?
  – Locality model
    u a process migrates from one locality to another
    u localities may overlap
F Why does thrashing occur?
  – sum of localities > total memory size
F How do we fix thrashing?
  – Working Set Model
  – Page Fault Frequency

Working-Set Model

F Working-set window W = a fixed number of page references
  – the set of distinct pages referenced in time T
F D = sum of the sizes of the W's over all processes

Working Set Example

F T = 5
F Reference string: 1 2 3 2 3 1 2 4 3 4 7 4 3 3 4 1 1 2 2 2 1
  – W = {1,2,3}, later W = {3,4,7}, finally W = {1,2}
F If T is too small, W will not encompass the locality
F If T is too large, W will encompass several localities
F If T => infinity, W will encompass the entire program
F If D > m (total frames) => thrashing, so suspend a process
F Modify the LRU approximation to include the working set

Page Fault Frequency

F Establish an "acceptable" page-fault rate, with upper and lower bounds
  – if the rate is too low, the process loses a frame
  – if the rate is too high, the process gains a frame
[Figure: page-fault rate vs. number of frames; increase the number of
frames above the upper bound, decrease it below the lower bound]

Prepaging

F Pure demand paging has many page faults initially
  – use the working set to prepage
F Does the cost of prepaging unused frames outweigh the cost of
  page-faulting?

Page Size

F Old: page size fixed; New: choose the page size
F How do we pick the right page size? Tradeoffs:
  – Fragmentation
  – Table size
  – Minimize I/O
    u transfer time is small (.1 ms); latency + seek time are large (10 ms)
  – Locality
    u small pages give finer resolution, but more faults
    u ex: 200K process (1/2 used): 1 fault with a 200K page,
      100K faults with 1-byte pages
F Historical trend toward larger page sizes
  – CPU and memory have gotten faster proportionally more than disks

Program Structure

F Consider:
    int A[1024][1024];
    for (j = 0; j < 1024; j++)
        for (i = 0; i < 1024; i++)
            A[i][j] = 0;
F Suppose:
  – the process has 1 frame
  – 1 row per page
  – => 1024 x 1024 page faults!

Program Structure

    int A[1024][1024];
    for (i = 0; i < 1024; i++)
        for (j = 0; j < 1024; j++)
            A[i][j] = 0;
F 1024 page faults
F stack vs. hash table
F Compiler
  – separate code from data
  – keep routines that call each other together
F LISP (pointers) vs. Pascal (no pointers)

Priority Processes

F Consider:
  – a low-priority process faults
    u bring the page in
  – the low-priority process sits in the ready queue for a while,
    waiting while a high-priority process runs
  – the high-priority process faults
    u the low-priority page is clean and hasn't been used in a while
      => perfect victim!
F Lock-bit (as for I/O) until the page is used once

Real-Time Processes

F Real-time
  – bounds on delay
  – hard real-time: missed deadlines => systems crash, lives lost
    u air-traffic control, factory automation
  – soft real-time: missed deadlines => application quality suffers
    u audio, video
F Paging adds unexpected delays
  – don't do it
  – lock bits for real-time processes

Virtual Memory and WinNT

F Page Replacement Algorithm
  – FIFO
  – bring in the missing page, plus adjacent pages
F Working set
  – default is 30
  – take a victim frame periodically
  – if no fault, reduce the set size by 1
F Reserve pool
  – hard page faults
  – soft page faults

Virtual Memory and WinNT

F Shared pages
  – level of indirection for easier updates
  – same virtual entry
F Page File
  – stores only modified logical pages
  – code and memory-mapped files are on disk already

Virtual Memory and Linux

F Regions of virtual memory backed by
  – paging disk (normal)
  – file (text segment, memory-mapped file)
F New virtual memory
  – exec() creates a new page table
  – fork() copies the page table
    u references to common pages
    u if written, then copied
F Page Replacement Algorithm
  – second chance (with more bits)

Application Performance Studies and Demand Paging in Windows NT

Mikhail Mikhailov, Saqib Syed, Ganga Kannan, Divya Prakash,
Mark Claypool, Sujit Kumar, David Finkel
WPI and BMC Software, Inc.

Capacity Planning Then and Now

F Capacity planning in the good old days
  – used to be just mainframes
  – simple CPU-load based queuing theory
  – Unix
F Capacity planning today
  – distributed systems
  – networks of workstations
  – Windows NT
  – MS Exchange, Lotus Notes

Experiment Design

F System
  – Pentium 133 MHz
  – NT Server 4.0
  – 64 MB RAM
  – IDE NTFS
F Experiments
  – page faults
  – caching
F Analysis
  – perfmon
  – clearmem

Page Fault Method

F "Work hard"
F Run lots of applications, opening and closing them
F All local access, not over the network

Soft or Hard Page Faults?

[Figure: perfmon trace distinguishing soft from hard page faults]

Page Metrics with Caching On

F Start the process
  – wait for "Enter"
F Start perfmon
F Hit "Enter"
F Read 1 4-KB page
F Exit
F Repeat

On Caching and Prefetching

[Figure: timeline of events – Start, Hit Return, Read 4 KB, Exit – for
the caching and prefetching measurement]
