Introduction to Paging - Wes J. Lloyd
TCSS 422: OPERATING SYSTEMS
Paging, Translation Lookaside Buffer, and Smaller Tables
Wes J. Lloyd
Institute of Technology, University of Washington - Tacoma
February 28, 2018

OBJECTIVES
Quiz 4 - Active Reading Chapter 19
Homework 2 Questions
Homework 3 Questions
Ch. 18 - Introduction to Paging
Ch. 19 - Translation Lookaside Buffer (TLB)
Ch. 20 - Smaller Tables

FEEDBACK FROM 2/23
There was no feedback!!!

CHAPTER 18: INTRODUCTION TO PAGING

PAGING DESIGN QUESTIONS
Where are page tables stored?
What are the typical contents of the page table?
How big are page tables?
Does paging make the system too slow?

WHERE ARE PAGE TABLES STORED?
Example:
. Consider a 32-bit process address space (up to 4 GB)
. With 4 KB pages
. 20 bits for the VPN (2^20 pages)
. 12 bits for the page offset (2^12 unique bytes in a page)
Page tables for each process are stored in RAM
. Supports potential storage of 2^20 translations = 1,048,576 pages per process
. Each page has a page table entry of 4 bytes

PAGE TABLE EXAMPLE
With 2^20 slots in our page table for a single process
Each slot dereferences a VPN and provides the physical frame number (PFN)
(figure: page table with slots VPN 0 through VPN 1,048,575)
Each slot requires 4 bytes (32 bits)
. 20 bits for the PFN on a 4 GB system with 4 KB pages
. 12 bits for the offset, which is preserved
. (note we have no status bits, so this is unrealistically small)
How much memory is needed to store the page table for 1 process?
. 4,194,304 bytes (4 MB) to index one process

NOW FOR AN ENTIRE OS
If 4 MB is required to store the page table for one process,
consider how much memory is required for an entire OS
. With, for example, 100 processes...
Page table memory requirement is now 4 MB x 100 = 400 MB
If the computer has 4 GB of memory (the maximum for 32 bits),
the page tables consume about 10% of memory
. 400 MB / 4,000 MB
Is this efficient?

COUNTING MEMORY ACCESSES
Example: use this array initialization code (a sketch follows below)
Assembly equivalent: (figure)
Code locations:
. Page table
. Array
. Code
50 accesses for 5 loop iterations
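As a concrete reference for the access counting above, here is a minimal C sketch of an OSTEP-style array-initialization loop. The array size (1000) is an assumption for illustration; the comments explain where roughly 10 memory accesses per iteration, and hence 50 accesses for the first 5 iterations, come from once every instruction fetch and data store also requires a page-table lookup.

#include <stdio.h>

int main(void) {
    int array[1000];   /* assumed size; the array occupies its own page(s) */

    /* Each iteration compiles to roughly 4 instructions (store, increment,
     * compare, branch). Fetching those 4 instructions plus the explicit
     * store to array[i] gives 5 memory accesses, and each of those accesses
     * first needs a page-table lookup in RAM: ~10 memory accesses per
     * iteration, or ~50 for the first 5 iterations. */
    for (int i = 0; i < 1000; i++)
        array[i] = 0;

    printf("%d\n", array[0]);   /* prevent the loop from being optimized away */
    return 0;
}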
VISUALIZING MEMORY ACCESSES:
FOR THE FIRST 5 LOOP ITERATIONS
(figure: page table, array, and code accesses plotted over the first 5 loop iterations)

OBJECTIVES
Chapter 19
. TLB Algorithm
. TLB Tradeoffs
. TLB Context Switch

CHAPTER 19: TRANSLATION LOOKASIDE BUFFER (TLB)

TRANSLATION LOOKASIDE BUFFER
Legacy name...
A better name: "address translation cache"
The TLB is an on-CPU cache of address translations
. virtual -> physical memory

TRANSLATION LOOKASIDE BUFFER - 2
Goal: reduce accesses to the page tables
. Example: 50 RAM accesses for the first 5 for-loop iterations
Move lookups from RAM to the TLB by caching page table entries

TRANSLATION LOOKASIDE BUFFER (TLB)
Part of the CPU's Memory Management Unit (MMU)
The TLB is an address translation cache
. Different than the L1, L2, L3 CPU memory caches

TLB BASIC ALGORITHM
For an array-based page table with a hardware-managed TLB:
. Extract the virtual page number (VPN) from the virtual address
. Check if the TLB holds the translation for the VPN
. On a hit, extract the page frame number (PFN)
. Generate the physical address and access memory

TLB BASIC ALGORITHM - 2
On a TLB miss:
. Access the page table (in RAM) to find the address translation
. Update the TLB with the translation
. Retry the instruction (re-query the TLB)

TLB - ADDRESS TRANSLATION CACHE
Key detail:
. For a TLB miss, we first access the page table in RAM to populate the TLB... we then re-query the TLB
. All address translations go through the TLB
A minimal sketch of this lookup sequence follows below.
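The hit/miss sequence above can be modeled in software. The sketch below is a toy simulation, not the hardware MMU: the constants (PAGE_SHIFT, TLB_ENTRIES, NUM_PAGES), the FIFO replacement policy, and the translate() helper are all assumptions chosen only to illustrate the lookup, update, and re-query steps.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT  12        /* assume 4 KB pages */
#define OFFSET_MASK 0xFFFu
#define NUM_PAGES   16        /* tiny address space for the demo */
#define TLB_ENTRIES 4         /* tiny TLB */

typedef struct { bool valid; uint32_t vpn, pfn; } tlb_entry_t;

static uint32_t    page_table[NUM_PAGES];  /* VPN -> PFN, lives "in RAM"    */
static tlb_entry_t tlb[TLB_ENTRIES];       /* the address translation cache */
static int         next_victim;            /* trivial FIFO replacement      */

static uint32_t translate(uint32_t vaddr) {
    uint32_t vpn = vaddr >> PAGE_SHIFT;    /* extract the VPN               */
    uint32_t off = vaddr & OFFSET_MASK;

    for (int i = 0; i < TLB_ENTRIES; i++)  /* check the TLB for the VPN     */
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return (tlb[i].pfn << PAGE_SHIFT) | off;  /* hit: build phys addr */

    /* Miss: walk the page table in RAM, update the TLB, retry (re-query). */
    tlb[next_victim] = (tlb_entry_t){ .valid = true, .vpn = vpn,
                                      .pfn = page_table[vpn] };
    next_victim = (next_victim + 1) % TLB_ENTRIES;
    return translate(vaddr);
}

int main(void) {
    for (uint32_t v = 0; v < NUM_PAGES; v++)
        page_table[v] = v + 100;           /* arbitrary frame numbers       */

    /* First access to page 1 misses; the second access to the same page hits. */
    printf("0x1234 -> 0x%x\n", (unsigned)translate(0x1234u));
    printf("0x1238 -> 0x%x\n", (unsigned)translate(0x1238u));
    return 0;
}

In real hardware these steps happen inside the MMU; on some architectures (e.g. MIPS) the miss handling is instead done in software by an OS trap handler.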
TLB EXAMPLE
Example: a program address space of 256 bytes
. Addressable using 8 total bits (2^8)
. 4 bits for the VPN (16 total pages)
Page size: 16 bytes
. Offset is addressable using 4 bits
Store an array of (10) 4-byte integers

TLB EXAMPLE - 2
Consider the code above:
. Initially the TLB does not know where a[] is
Consider the accesses: a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7], a[8], a[9]
. How many pages are accessed?
. What happens when accessing a page not in the TLB?

TLB EXAMPLE - 3
For the accesses: a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7], a[8], a[9]
. How many are hits?
. How many are misses?
. What is the hit rate (%)?
. 70% (3 misses, one for each virtual page spanned by the array; 7 hits)

TLB EXAMPLE - 4
What factors affect the hit/miss rate?
. Page size
. Data (spatial) locality
. Temporal locality

TLB TRADEOFFS
Page size
. Larger page sizes increase the probability of a TLB hit
. Example: 16 bytes (very small), 4096 bytes (common)
. Larger page sizes increase the number of bits required for the offset

TLB TRADEOFFS - 2
Spatial locality
. Accessing addresses close to each other improves the hit rate
. Consider random vs. sequential array access
What happens when the data size exceeds what the TLB can cover?
. E.g. a 1st-level TLB caches 64 4 KB page addresses
. A single program can cache data lookups for 256 KB

TLB TRADEOFFS - 3
Temporal locality
. Higher cache hit ratios are expected for repeated memory accesses close in time
. Can dramatically improve performance for a "second iteration"

EXAMPLE: LARGE ARRAY ACCESS
Example: consider an array of a custom struct where each struct is 64 bytes
Consider sequential access for an array of 8,192 elements stored contiguously in memory:
. 64 structs per 4 KB page
. 128 total pages
. The TLB caches a maximum of 64 4 KB page lookups
How many hits vs. misses for sequential array iteration?
. 1 miss for every 64 array accesses, then 63 hits
. Complete traversal: 128 total misses, 8,064 hits (98.4% hit ratio)

TLB EXAMPLE IMPLEMENTATIONS
Intel Nehalem microarchitecture (2008) - multi-level TLBs
. First-level TLB: separate caches for data (DTLB) and code (ITLB)
. Second-level TLB: shared TLB (STLB) for data and code
. Multiple page sizes (4 KB, 2 MB)
. Page Size Extension (PSE) CPU flag for larger page sizes
Intel Haswell microarchitecture (22 nm, 2013)
. Two-level TLB
. Three page sizes (4 KB, 2 MB, 1 GB)
Without large page sizes, consider the number of TLB entries needed to address 1.9 MB of memory... (a worked calculation follows below)
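As a rough check on that last bullet, the sketch below computes how many TLB entries it takes to cover about 1.9 MB at two of the page sizes listed above; the 1.9 MB figure is taken from the slide and the entry counts are approximate.

#include <math.h>
#include <stdio.h>

int main(void) {
    const double mem = 1.9 * 1024 * 1024;                    /* ~1.9 MB */
    printf("4 KB pages: %.0f TLB entries\n", ceil(mem / (4 * 1024)));
    printf("2 MB pages: %.0f TLB entry\n",   ceil(mem / (2 * 1024 * 1024)));
    return 0;   /* ~487 entries with 4 KB pages vs. 1 entry with a 2 MB page */
}

With 4 KB pages roughly 487 entries are needed, far more than a 64-entry first-level TLB can hold, while a single 2 MB large page covers the same region with one entry.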