Design Decisions Page Size Page Fault Rate Resident Set

Operating Systems (Käyttöjärjestelmät), Lecture 9
Autumn 2007, Tiina Niklander

WEEK 5: Virtual memory algorithms and examples
Stallings, Chapter 8

Design decisions
• Virtual memory or not?
• Paging or segmentation?
• What algorithms?
CONSIDERING ONLY PAGING:
• What page size?

Page size
• Reduce internal fragmentation -> small
• Reduce page table size -> large
• Proportional (1x, 2x, …) to disk block size
• Optimal value is different for different programs
• Increase TLB hit ratio -> large
• Different page size for different applications?
  - How does the MMU know the page size?
• Tbl 8.2 [Stal05]

Page fault rate
• More small pages fit into the same memory space
• References from a large page are more likely to go to a page not yet in memory
• References from small pages often go to other pages -> the frequently used pages end up in memory
• Page size approaching the process size
• Too small a working set -> higher page fault rate
• Large working set -> nearly all pages in memory

Algorithms for memory management
• Fetch policy
  - Demand paging vs prepaging
• Placement policy
• Replacement policy
  - Resident set management, selecting the page to be replaced
  - Methods: OPT, LRU, FIFO, random …
• Cleaning policy
  - Demand cleaning vs precleaning, buffering
• Load control
  - Working set size

Resident set management
• Number of page frames allocated for a process (resident set size)
  - Fixed: how large, identical for all?
  - Variable: based on what? Working set?
• Replacement policy:
  - Global: over all pages (all processes)
  - Local: just the pages of this process
• Size of the resident set:
  - Too small: higher multiprogramming level, more page faults, thrashing
  - Too large: lower CPU utilisation, wasted memory

Resident set management (e.g. PFF)
• Tbl 8.4 [Stal05]

Replacement policy
• OPT, the optimal algorithm
  - Nothing is better than this, but it is unrealistic to implement
  - Clairvoyant: knows the future references and chooses based on them
• LRU, least recently used
  - High amount of bookkeeping
  - Modification: NRU (not recently used), easier to implement
  - Only maintains information about recent usage (for example one use bit)
• FIFO, first-in-first-out
  - Easy to implement, but not very useful
  - Modification: second chance, do not replace a page that was just recently used
• Clock
• Metrics? Number of page faults or page fault rate

Clock
• Replace the page whose use bit is zero; while scanning, reset the use bits
• The MMU sets the use bit to 1 every time a memory reference to that page is made

Working set (Fig 8.19 [Stal05])
• Set of pages used during the last k references (window size)
• Pages not in the working set are candidates to be released or replaced
• Suitable size for the working set? Estimate based on page fault rate
• $1 \le |W(t,\Delta)| \le \min(\Delta, n)$ (Δ = window size, n = number of pages of the process)

Dynamically adjusting resident set
• PFF: Page Fault Frequency
• VSWS: Variable-interval Sampled Working Set

Swap area, VM backup store (Fig 4-33 [Tane01])
• Fixed allocation: for the whole process, in advance
• Dynamic allocation: only for the pages that are swapped out

UNIX / Solaris (+4BSD)
MEMORY MANAGEMENT

UNIX/Solaris: data structures (Tbl 8.5 [Stal05])
• Page table
  - one for each process
  - one entry for each logical page
• Location in the main memory?
[Diagram: pages of processes P and Q; P's page holds X = 7, Q's page holds X = 5; label "P&Q: write"]

UNIX/4BSD: core map (Fig 10-16 [Tane01])
• Figure annotations: 1 KB; 16 B; location on disk? whose? segment?; global replacement algorithm

Two-handed Clock
• Fronthand: set Reference = 0
• Backhand: if (Reference == 0) page out
• Pages not used during the sweep are assumed not to be in the working sets of the processes
• Suits large memories better
• Speed of the hands (frames/sec)? Scanrate
• Gap between the hands?
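The replacement policies listed above (OPT, LRU, FIFO, Clock) are compared by their number of page faults. The following is a minimal sketch of such a comparison, not the lecture's own code: the function names, the reference string and the choice of three frames are illustrative assumptions, and the Clock variant is the simple one-handed version with a single use bit per frame.

```python
from collections import OrderedDict, deque

def fifo_faults(refs, frames):
    """FIFO: evict the page that has been resident longest."""
    resident, queue, faults = set(), deque(), 0
    for page in refs:
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            resident.discard(queue.popleft())
        resident.add(page)
        queue.append(page)
    return faults

def lru_faults(refs, frames):
    """LRU: evict the page whose last reference is oldest."""
    resident, faults = OrderedDict(), 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)      # mark as most recently used
            continue
        faults += 1
        if len(resident) == frames:
            resident.popitem(last=False)    # drop least recently used
        resident[page] = True
    return faults

def clock_faults(refs, frames):
    """Clock: scan for a frame with use bit 0, clearing use bits on the way."""
    pages, use = [None] * frames, [0] * frames
    hand, faults = 0, 0
    for page in refs:
        if page in pages:
            use[pages.index(page)] = 1      # what the MMU does on a reference
            continue
        faults += 1
        while use[hand] == 1:
            use[hand] = 0
            hand = (hand + 1) % frames
        pages[hand], use[hand] = page, 1
        hand = (hand + 1) % frames
    return faults

def opt_faults(refs, frames):
    """OPT: evict the page whose next reference lies furthest in the future."""
    resident, faults = set(), 0
    for i, page in enumerate(refs):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            future = refs[i + 1:]
            def next_use(p):
                return future.index(p) if p in future else len(future)
            resident.discard(max(resident, key=next_use))
        resident.add(page)
    return faults

# Illustrative reference string (an assumption, not taken from the slides).
refs = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]
for name, policy in [("OPT", opt_faults), ("LRU", lru_faults),
                     ("Clock", clock_faults), ("FIFO", fifo_faults)]:
    print(name, policy(refs, 3))
```

The counts include the faults that fill the initially empty frames. OPT needs the whole future reference string, which is why it serves only as a yardstick for the other policies.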
Handspread: the gap between the two hands (Fig 8.23 [Stal05])

UNIX/Solaris: Kernel memory allocation
• Based on paging
• Locking pages to memory
• The kernel allocates and deallocates a large number of small areas and buffers all the time
  - For example: proc, vnode, file descriptor
• The kernel needs to allocate and deallocate fragments of pages (not only full pages)
• Dynamic partitioning for allocations within one page
• => Lazy Buddy System
  - Delay combining until there are 'too many' fragments of a certain size

Linux
MEMORY MANAGEMENT
Ch 8.4 [Stal 05]

Linux: memory management
• On 32-bit architectures
  - 3 GB reserved for user processes
  - 1 GB for the kernel (including the page tables), at the beginning of the physical memory
  - Kernel mode is allowed to make references to the full 4 GB
• The kernel's memory allocation is based on slabs
  - Only for the kernel: efficient allocation of small memory areas, avoids allocating full pages

Linux: paging
• Architecture independent
  - Alpha: 64b addresses, full support for 3 levels, page size 8KB, offset 13 bits
  - x86: 32b addresses, only 2 levels (the middle level is missing, "value 0"), page size 4KB, offset 12 bits
• 3-level page table (Fig 8.25 [Stal05])
  - page directory (PD), 1 page
  - page middle directory (PMD), multiple pages
  - page table (PT), multiple pages
• Page directory always in memory
  - This is the page table address given to the MMU at process switch
  - Other pages can be on disk

Linux: address translation
• Split the address into 4 elements
  - index to page directory (PD_offset)
  - index to page middle directory (PMD_offset)
  - index to page table (PT_offset)
  - offset
• Each table entry holds the base address of the next level:
  PMD_base = PD[PD_offset] (entry located at PD_base + PD_offset)
  PT_base = PMD[PMD_offset] (entry located at PMD_base + PMD_offset)
  Page_base = PT[PT_offset] (entry located at PT_base + PT_offset)
  physical address = Page_base + offset
• x86 trick
  - The page middle directory (PMD) is reduced to one item, which the compiler optimizes away
  - The MMU assumes that the page directory refers directly to the page table (PT)
  - Suitable for all architectures that support only two-level paging

Linux: placement (allocation)
• Reserves a consecutive area (region) for consecutive pages
  - More efficient fetching and storing with disks
  - Optimizes the disk utilisation at the cost of memory allocation
• Buddy System: 1, 2, 4, 8, 16 or 32 page frames
  - Allocates pages in large groups, increases internal fragmentation
  - Implementation: table with lists of unallocated page regions of different sizes

Linux: replacement
• Global Clock using the M-bit + a sort of LRU
• Counter of dynamically allocated page frames
• 8-bit age counter (cf. the use bit)
  - The MMU increases the counter at each page reference
  - A background process (kswapd) reduces the counter each second
  - If too few page frames are free, unused frames are deallocated
• Uses a page buffer
  - To store deallocated pages for a short duration

Linux
http://home.earthlink.net/~jknapka/linux-mm/vmoutline.html
[Diagram of the Linux page lists: demand paging and page cache read-ahead from disk into the page cache; active_list (age > 0, in use) with age up (+3) and age down (/2); refill_inactive_scan; inactive_dirty_list (age = 0, not used, clean or dirty); page_launder; inactive_clean_list (age = 0, not used, clean); also labelled: swap_out, swap cache, kswapd, kreclaimd, bdflush, alloc]
"It does not seem possible to understand the VM system in a modular way. The interfaces between, say, the zone allocator and the swap policy are many and varied… What a nightmare" (Joe Knapka)

Linux: kernel memory allocation
• Kernel memory is locked to the memory
• Dynamic allocation based on pages
  - Dynamically loadable modules go fully to the kernel area
  - buddy system on the pages
• Additional need for small, short-term memory allocations -> use slab, kmalloc()
  - One page 'cut' to smaller areas (slabs) (Fig 8.26 [Stal05])
  - buddy system within each page
  - x86: 32, 64, 128, 252, 508, 2040, 4080 bytes (page size 4KB)

Slab allocation
• Fig 9.27, Silberschatz, Galvin & Gagne: Operating System Concepts with Java

Windows 2000
MEMORY MANAGEMENT
Ch 8.5 [Stal 05]
Ch 11.5 [Tane 01]

W2K
• 32-bit addresses => 4 GB of virtual addresses
• page size 4K - 64K
  - Pentium: 4KB
  - Alpha: 8K
• See Fig. 11.23 [Tane 01]
[Figure annotations: "0x0000????:", "-1", "0xFFFF????:"]

W2K: Paging
• No segmentation
• Variable resident set size per process
  - Start with a fixed set of frames
  - If there is a large number of free frames, a process can get more frames
  - A greedy one is not allowed to allocate the last 512 frames
  - If the free memory shrinks, the resident sets of processes can be decreased
• Demand fetch (no prefetching)
• Disk I/O using larger blocks
  - 1-8 pages at a time
• Parts of the OS and the buffer pool are locked to memory
• Caching of released pages

W2K: paging (Fig 11.26 [Tane01])
[Figure annotations: "not used", "Pentium"; Fig 11.24 [Tane01]]

W2K
[Diagram, Fig 11.27 [Tane01]: page lists Modified, Standby, Free and Zeroed; labelled: mapped page writer, modified page writer, zero page thread (idle proc), swapper thread (4 s), WS dealloc, page fault, reference, address space, stack, copy on write?, normal use, disk, not in swap area]

W2K: replacement
• Balance set manager daemon
  - Checks once per second whether there are enough free frames
• Working set manager
  - Runs when needed
  - Dynamic working set size
  - Releases page frames of processes
  - Starts with large idle processes
• Swapper daemon
  - Every 4 sec: searches for processes whose threads (all of them) have been passive for a long duration

W2K: page frames
• (next slide) (Fig 11.27 [Tane01])

W2K
• Fig 11.28 [Tane01]
"The code is so complex that the designers loathe to touch parts of it for fear of breaking something somewhere" Andrew Tanenbaum [Tane01, p. 823]
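The Linux address translation slides describe splitting a virtual address into three table indices and an offset, and the x86 trick of shrinking the page middle directory to a single item. A minimal sketch of that walk follows. It assumes nested Python dicts stand in for the tables, and the index widths, the frame address 0x7000 and the function names are hypothetical choices made only for illustration; real page table entries also carry flag bits that are ignored here.

```python
def split_address(vaddr, bits=(10, 10, 10), page_shift=12):
    """Split vaddr into (PD index, PMD index, PT index, offset).

    bits gives the width of the PD, PMD and PT index fields (assumed values).
    A width of 0 models a level that is folded away: its index is always 0.
    """
    offset = vaddr & ((1 << page_shift) - 1)
    rest = vaddr >> page_shift
    indices = []
    for width in reversed(bits):            # PT index sits in the lowest bits
        indices.append(rest & ((1 << width) - 1))
        rest >>= width
    pt_i, pmd_i, pd_i = indices
    return pd_i, pmd_i, pt_i, offset

def translate(page_directory, vaddr, bits=(10, 10, 10), page_shift=12):
    """Walk PD -> PMD -> PT; each entry holds the base of the next level."""
    pd_i, pmd_i, pt_i, offset = split_address(vaddr, bits, page_shift)
    pmd = page_directory[pd_i]              # PMD_base = PD[PD_offset]
    pt = pmd[pmd_i]                         # PT_base = PMD[PMD_offset]
    page_base = pt[pt_i]                    # Page_base = PT[PT_offset]
    return page_base + offset

# Three-level example: PD index 1, PMD index 2, PT index 3, offset 0x45.
vaddr3 = (((1 << 20) | (2 << 10) | 3) << 12) | 0x45
tables3 = {1: {2: {3: 0x7000}}}
print(hex(translate(tables3, vaddr3)))

# x86-style two-level example: the PMD has a single item (index width 0).
tables2 = {1: {0: {3: 0x7000}}}
print(hex(translate(tables2, 0x00403045, bits=(10, 0, 10))))
```

Giving the middle level a width of 0 keeps the three-level code path unchanged while the PMD lookup always selects its only item, which mirrors the idea of the x86 trick.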
