Chapter 19: Translation Lookaside Buffer (Tlb)

TCSS 422 A – Spring 2018 5/21/2018 Institute of Technology TCSS 422: OPERATING SYSTEMS OBJECTIVES Assignment 3 – Page Table Walker Three Easy Pieces: Review Quiz #5 –Memory Virtualization Translation Lookaside Buffer, Paging – Smaller Tables Memory Virtualization Wrap-up: Translation Lookaside Buffer – Ch. 19 Paging – Smaller Tables – Ch. 20 Wes J. Lloyd Beyond Physical Memory – Ch. 21/22 Institute of Technology University of Washington - Tacoma TCSS422: Operating Systems [Spring 2018] TCSS422: Operating Systems [Spring 2018] May 21, 2018 May 21, 2018 L14.2 Institute of Technology, University of Washington - Tacoma Institute of Technology, University of Washington - Tacoma FEEDBACK – 5/16 How to determine whether a cache lookup is a hit or a miss? For example: for a TLB lookup? CHAPTER 19: How do you get a miss or a hit in a page table? TRANSLATION LOOKASIDE BUFFER (TLB) TCSS422: Operating Systems [Spring 2018] TCSS422: Operating Systems [Spring 2018] May 21, 2018 L14.3 May 21, 2018 Institute of Technology, University of Washington - Tacoma Institute of Technology, University of Washington - Tacoma L14.4 OBJECTIVES TLB EXAMPLE - 3 Chapter 19 .TLB Algorithm For the accesses: a[0], a[1], a[2], a[3], a[4], .TLB Tradeoffs a[5], a[6], a[7], a[8], a[9] How many are hits? .TLB Context Switch How many are misses? What is the hit rate? (%) . 70% (3 misses one for each VP, 7 hits) TCSS422: Operating Systems [Spring 2018] TCSS422: Operating Systems [Spring 2018] May 21, 2018 L14.5 May 21, 2018 L14.6 Institute of Technology, University of Washington - Tacoma Institute of Technology, University of Washington - Tacoma Slides by Wes J. Lloyd L14.1 TCSS 422 A – Spring 2018 5/21/2018 Institute of Technology EXAMPLE: LARGE ARRAY ACCESS TLB EXAMPLE IMPLEMENTATIONS Example: Consider an array of a custom struct where each Intel Nehalem microarchitecture 2008 – multi level TLBs struct is 64-bytes. Consider sequential access for an . First level TLB: array of 8,192 elements stored contiguously in memory: separate cache for data (DTLB) and code (ITLB) . Second level TLB: shared TLB (STLB) for data and code 64 structs per 4KB page . Multiple page sizes (4KB, 2MB) 128 total pages . Page Size Extension (PSE) CPU flag TLB caches stores a maximum of 64 - 4KB page lookups for larger page sizes Intel Haswell microarchitecture 22nm 2013 How many hits vs. misses for sequential array iteration? . Two level TLB . 1 miss for every 64 array accesses, 63 hits . Three page sizes (4KB, 2MB, 1GB) . Complete traversal: 128 total misses, 8,064 hits (98.4% hit Without large page sizes consider ratio) the # of TLB entries to address 1.9 MB of memory… TCSS422: Operating Systems [Spring 2018] TCSS422: Operating Systems [Spring 2018] May 21, 2018 L14.7 May 21, 2018 L14.8 Institute of Technology, University of Washington - Tacoma Institute of Technology, University of Washington - Tacoma HW CACHE TRADEOFF HANDLING TLB MISS Speed vs. size Historical view Speed Size CISC – Complex instruction set computer In order to be fast, caches must be small .Intel x86 CPUs Too large of a cache will mimic physical memory Limitations for on chip memory .Traditionally have provided on CPU HW instructions and handling of TLB misses .HW has a page table register to store location of page table Dwight on “tradeoffs” TCSS422: Operating Systems [Spring 2018] TCSS422: Operating Systems [Spring 2018] May 21, 2018 L14.9 May 21, 2018 L14.10 Institute of Technology, University of Washington - Tacoma Institute of Technology, University of Washington - Tacoma HANDLING TLB MISS - 2 TLB CONTENTS RISC – Reduced instruction set computer TLB typically may have 32, 64, or 128 entries . ARM CPUs HW searches the entire TLB in parallel to find the translation . Traditionally the OS handles TLB misses Other bits . HW raises an exception . Valid bit: valid translation? . Trap handler is executed to handle the miss . Protection bit: read/execute, read/write . Address-space identifier: identify entries by process Advantages . Dirty bit . HW Simplicity: simply needs to raise an exception . Flexibility: OS provided page table implementations can use different data structures, etc. TCSS422: Operating Systems [Spring 2018] TCSS422: Operating Systems [Spring 2018] May 21, 2018 L14.11 May 21, 2018 L14.12 Institute of Technology, University of Washington - Tacoma Institute of Technology, University of Washington - Tacoma Slides by Wes J. Lloyd L14.2 TCSS 422 A – Spring 2018 5/21/2018 Institute of Technology TLB: ON CONTEXT SWITCH TLB: CONTEXT SWITCH - 2 TLB stores address translations for current running process Address space identifier (ASID): enables TLB data to persist A context/switch to a new process invalidates the TLB during context switches – also can support virtual machines Must “switch” out the TLB TLB flush . Flush TLB on context switches, set all entries to 0 . Requires time to flush . TLB must be reloaded for each C/S . If process not in CPU for long, the TLB may not get reloaded Alternative: be lazy… . Don’t flush TLB on C/S . Share TLB across processes during C/S . Use address space identifier (ASID) to tag TLB entries by process TCSS422: Operating Systems [Spring 2018] TCSS422: Operating Systems [Spring 2018] May 21, 2018 L14.13 May 21, 2018 L14.14 Institute of Technology, University of Washington - Tacoma Institute of Technology, University of Washington - Tacoma SHARED MEMORY SPACE CACHE REPLACEMENT POLICIES When TLB cache is full, how add a new address translation to the TLB? Observe how the TLB is loaded / unloaded… When processes share a code page Goal minimize miss rate, increase hit rate .Shared libraries ok Least Recently Used (LRU) .Code pages typically are RX, Sharing of pages is useful as it reduces the . Evict the oldest entry not RWX number of physical pages in use. Random policy . Pick a candidate at random to free-up space in the TLB TCSS422: Operating Systems [Spring 2018] TCSS422: Operating Systems [Spring 2018] May 21, 2018 L14.15 May 21, 2018 L14.16 Institute of Technology, University of Washington - Tacoma Institute of Technology, University of Washington - Tacoma LEAST RECENTLY USED CHAPTER 20: RED – miss PAGING: WHITE – hit SMALLER TABLES For 3-page TLB, observe replacement 11 TLB miss, 5 TLB hit TCSS422: Operating Systems [Spring 2018] TCSS422: Operating Systems [Spring 2018] May 21, 2018 L14.17 May 21, 2018 Institute of Technology, University of Washington - Tacoma Institute of Technology, University of Washington - Tacoma L14.18 Slides by Wes J. Lloyd L14.3 TCSS 422 A – Spring 2018 5/21/2018 Institute of Technology OBJECTIVES LINEAR PAGE TABLES Chapter 20 Consider array-based page tables: . Each process has its own page table .Smaller tables . 32-bit process address space (up to 4GB) . With 4 KB pages .Hybrid tables . 20 bits for VPN . 12 bits for the page offset .Multi-level page tables TCSS422: Operating Systems [Spring 2018] TCSS422: Operating Systems [Spring 2018] May 21, 2018 L14.19 May 21, 2018 L14.20 Institute of Technology, University of Washington - Tacoma Institute of Technology, University of Washington - Tacoma LINEAR PAGE TABLES - 2 LINEAR PAGE TABLES - 2 Page tables stored in RAM Page tables stored in RAM Support potential storage of 220 translations Support potential storage of 220 translations = 1,048,576 pages per process @ 4 bytes/page = 1,048,576 pages per process @ 4 bytes/page Page table size 4MB / process Page table size 4MB / process Page tables are too big and consume too much memory. Need Solutions … Consider 100+ OS processes Consider 100+ OS processes . Requires 400+ MB of RAM to store process information . Requires 400+ MB of RAM to store process information TCSS422: Operating Systems [Spring 2018] TCSS422: Operating Systems [Spring 2018] May 21, 2018 L14.21 May 21, 2018 L14.22 Institute of Technology, University of Washington - Tacoma Institute of Technology, University of Washington - Tacoma PAGING: USE LARGER PAGES PAGE TABLES: WASTED SPACE Larger pages = 16KB = 214 Process: 16KB Address Space w/ 1KB pages 32-bit address space: 232 Page Table 218 = 262,144 pages Memory requirement cut to ¼ However pages are huge Internal fragmentation results 16KB page(s) allocated for small programs with only a few variables TCSS422: Operating Systems [Spring 2018] TCSS422: Operating Systems [Spring 2018] May 21, 2018 L14.23 May 21, 2018 L14.24 Institute of Technology, University of Washington - Tacoma Institute of Technology, University of Washington - Tacoma Slides by Wes J. Lloyd L14.4 TCSS 422 A – Spring 2018 5/21/2018 Institute of Technology PAGE TABLES: WASTED SPACE MULTI-LEVEL PAGE TABLES Process: 16KB Address Space w/ 1KB pages Consider a page table: Page Table 32-bit addressing, 4KB pages 220 page table entries Even if memory is sparsely populated the per process page table requires: Most of the page table is unused and full of wasted space. (73%) Often most of the 4MB per process page table is empty Page table must be placed in 4MB contiguous block of RAM MUST SAVE MEMORY! TCSS422: Operating Systems [Spring 2018] TCSS422: Operating Systems [Spring 2018] May 21, 2018 L14.25 May 21, 2018 L14.26 Institute of Technology, University of Washington - Tacoma Institute of Technology, University of Washington - Tacoma MULTI-LEVEL PAGE TABLES - 2 MULTI-LEVEL PAGE TABLES - 2 Add level of indirection, the “page directory” Add level of indirection, the “page directory” Two level page table: 220 pages addressed with two level-indexing (page directory index, page table index) TCSS422: Operating Systems [Spring 2018] TCSS422: Operating Systems [Spring 2018] May 21, 2018 L14.27 May 21, 2018 L14.28 Institute of Technology, University of Washington - Tacoma Institute of Technology, University of Washington - Tacoma MULTI-LEVEL PAGE TABLES - 3 EXAMPLE Advantages 16KB address space, 64byte pages . Only allocates page table space in proportion to the How large would a one-level page table need to be? address space actually used 214 (address space) / 26 (page size) = 28 = 256 (pages) .

Chapter 19: Translation Lookaside Buffer (Tlb)

Computer Organization and Architecture Designing for Performance Ninth Edition

Intermediate X86 Part 2

Virtual Memory - Paging

X86 Memory Protection and Translation

Virtual Memory in X86

Operating Systems

Lecture 15 15.1 Paging

Chapter 3 Protected-Mode Memory Management

X86 Memory Protection and Translation

Darwin: Mac OS X's Core OS

Mac OS X Server

IA-32 Intel Architecture Software Developer's