
CS 61C: Great Ideas in Computer Architecture
Instructors: Krste Asanovic, Randy H. Katz
http://inst.eecs.Berkeley.edu/~cs61c/fa12

Review
• Implementing precise interrupts in in-order pipelines:
– Save exceptions in pipeline until commit point
– Check for traps and interrupts before commit
– No architectural state overwritten before commit
• Support multiprogramming with translation and protection
– Base and bound: simple scheme, suffers from memory fragmentation
– Paged systems remove external fragmentation but add indirection through page table


You Are Here!
• Parallel Requests: assigned to computer, e.g., Search "Katz" (Warehouse Scale Computer)
• Parallel Threads: assigned to core, e.g., Lookup, Ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words
• Hardware descriptions: all gates @ one time
• Programming Languages
(Figure: levels of the machine, from Warehouse Scale Computer and Smart Phone down through Core, Memory (Cache), Input/Output, Instruction and Functional Units, Main Memory, and Logic Gates, with today's lecture highlighted — harnessing parallelism to achieve high performance)

Private Address Space per User
• Each user has a page table (a minimal struct sketch follows)
• Page table contains an entry for each user page
(Figure: Users 1, 2, and 3 each map their own VA1 through a private page table into physical memory; the OS keeps its own pages, and unmapped frames remain free)
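To make the picture concrete, here is a minimal C sketch (not from the lecture) of the per-user state the slide describes: one page table per user, so the same virtual address in two users resolves to different physical pages. The type names and sizes (32-bit VAs, 4 KB pages, a flat one-level table) are illustrative assumptions.

    #include <stdint.h>

    #define PAGE_BITS 12                      /* assumed 4 KB pages            */
    #define NPAGES (1u << (32 - PAGE_BITS))   /* pages per 32-bit address space */

    typedef struct {
        uint32_t ppn   : 20;                  /* physical page number          */
        uint32_t valid : 1;                   /* entry maps a resident page?   */
    } pte_t;

    /* One of these per user: the OS switches tables on a context switch,
       so each user sees a private address space. */
    typedef struct {
        pte_t page_table[NPAGES];             /* one entry per user page       */
    } user_address_space_t;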


A Problem in the Early Sixties
• There were many applications whose data could not fit in the main memory, e.g., payroll
– Paged memory system reduced fragmentation but still required the whole program to be resident in the main memory
(Figure: Mercury, 1956 — 40k bits of main memory, 640k bits of drum as central store)

Manual Overlays
• Assume an instruction can address all the storage on the drum
• Method 1: programmer keeps track of addresses in the main memory and initiates an I/O transfer when required
– Difficult, error-prone!
• Method 2: automatic initiation of I/O transfers by software address translation
– Brooker's interpretive coding, 1960
– Inefficient!

Not just an ancient black art, e.g., the IBM Cell used in the Playstation-3 has an explicitly managed local store!



Demand Paging in Atlas (1962)
"A page from secondary storage is brought into the primary storage whenever it is (implicitly) demanded by the processor."
• Primary memory acts as a cache for secondary memory
• Primary: 32 pages of 512 words each (48-bit words)
• Secondary (drum): 32 × 6 pages
• User sees 32 × 6 × 512 words of storage

Hardware Organization of Atlas
• Effective address goes through an initial address decode
• 16 ROM pages (0.4–1 µsec, system code, not swapped); 2 subsidiary pages (1.4 µsec, system data, not swapped)
• Main memory: 32 pages, 1.4 µsec; one Page Address Register (PAR) per page frame (PARs 0–31)
• Drum (4): 192 pages; 8 tape decks: 88 µsec/word
• Compare the effective page address against all 32 PARs:
– match ⇒ normal access
– no match ⇒ page fault; save the state of the partially executed instruction


Atlas Demand Paging Scheme
• On a page fault:
– Input transfer into a free page is initiated
– The Page Address Register (PAR) is updated
– If no free page is left, a page is selected to be replaced (based on usage)
– The replaced page is written on the drum
• To minimize the drum latency effect, the first empty page on the drum was selected
– The page table is updated to point to the new location of the page on the drum

Recap: Typical Memory Hierarchy
• Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology
• On-chip components: control, RegFile, instruction and data caches (SRAM); then a second-level cache, main memory (DRAM), and secondary memory (disk)

Level:           RegFile   L1 caches   L2 cache   Main memory   Secondary (disk)
Speed (cycles):  ½'s       1's         10's       100's         10,000's
Size (bytes):    100's     10K's       M's        G's           T's
Cost:            highest   →           →          →             lowest


Modern Virtual Memory Systems
Illusion of a large, private, uniform store
• Protection & privacy: several users, each with their private address space and one or more shared address spaces (page table ≡ name space)
• Demand paging: provides the ability to run programs larger than the primary memory (pages move between a swapping store and primary memory)
• Hides differences in machine configurations
• The price is address translation on each memory reference (VA → mapping/TLB → PA)

Administrivia
• Regrade request deadline Monday Nov 26
– For everything up to Project 4


CS61C in the News
"World's oldest digital computer successfully reboots"
Iain Thomson, The Register, 11/20/2012
"After three years of restoration by The National Museum of Computing (TNMOC) and staff at Bletchley Park, the world's oldest functioning digital computer has been successfully rebooted at a ceremony attended by two of its original developers. The 2.5 ton Harwell Dekatron, later renamed the Wolverhampton Instrument for Teaching Computation from Harwell (WITCH), was first constructed in 1949 and from 1951 ran at the UK's Harwell Atomic Energy Research Establishment, where it was used for mathematical calculations for Britain's nuclear program.
The system uses 828 flashing Dekatron valves, each capable of holding a single digit, for volatile memory, plus 480 GPO 3000 type relays to shift calculations and six paper tape readers. It was very slow, taking a couple of seconds for each addition or subtraction, five seconds for multiplication and up to 15 for division."

Hierarchical Page Table
• 32-bit virtual address: bits 31–22 = p1 (10-bit L1 index), bits 21–12 = p2 (10-bit L2 index), bits 11–0 = offset
• Root of the current page table is held in a processor register
• p1 indexes the Level 1 page table, whose entry points to a Level 2 page table; p2 indexes that table to find the data page (see the index-extraction sketch below)
• A PTE may mark a page in primary memory, a page in secondary memory, or a nonexistent page
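As a minimal sketch of the indexing above (assuming the slide's 10/10/12 bit split; the helper names are mine, not the lecture's):

    #include <stdint.h>
    #include <assert.h>

    /* Extract the two page-table indices and the page offset from a
       32-bit virtual address. */
    static uint32_t p1(uint32_t va)     { return (va >> 22) & 0x3FF; } /* bits 31-22 */
    static uint32_t p2(uint32_t va)     { return (va >> 12) & 0x3FF; } /* bits 21-12 */
    static uint32_t offset(uint32_t va) { return va & 0xFFF; }         /* bits 11-0  */

    int main(void)
    {
        uint32_t va = 0x00403ABC;   /* p1 = 1, p2 = 3, offset = 0xABC */
        assert(p1(va) == 1 && p2(va) == 3 && offset(va) == 0xABC);
        return 0;
    }

The walk then reads the Level 1 table at index p1 to find a Level 2 table, and that table at index p2 to find the data page; a hardware version of a (deeper) walk appears in the SPARC v8 slide later in the lecture.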


Two-Level Page Tables in Physical Memory
(Figure: each user's Level 1 and Level 2 page tables live in physical memory; User 1's VA1 and User 2's VA1 translate through their own tables to different physical pages)

Address Translation & Protection
• Virtual address = Virtual Page No. (VPN) + offset; physical address = Physical Page No. (PPN) + offset
• Kernel/user mode and read/write intent feed a protection check alongside address translation; a failed check raises an exception
• Every instruction and data access needs address translation and protection checks
• A good VM design needs to be fast (~ one cycle) and space efficient

Translation Lookaside Buffers (TLB)
(really an Address Translation Cache!)
• Address translation is very expensive! In a two-level page table, each reference becomes several memory accesses
• Solution: cache translations in a TLB (see the lookup sketch after this slide)
– TLB hit ⇒ single-cycle translation
– TLB miss ⇒ page-table walk to refill
• Each entry holds V, R, W, D bits, a tag (VPN = virtual page number), and data (PPN = physical page number); on a hit, PPN + offset forms the physical address

TLB Designs
• Typically 32–128 entries, usually fully associative
– Each entry maps a large page, hence less spatial locality across pages ⇒ more likely that two entries conflict
– Sometimes larger TLBs (256–512 entries) are 4–8 way set-associative
– Larger systems sometimes have multi-level (L1 and L2) TLBs
• Random or FIFO replacement policy
• No process information in TLB?
• TLB Reach: size of the largest virtual address space that can be simultaneously mapped by the TLB
Example: 64 TLB entries, 4KB pages, one page per entry
TLB Reach = ______?
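A minimal C sketch of the fully associative lookup described above, using the slide's entry format (V, R, W, D, tag = VPN, data = PPN); the sizes and names are assumptions for illustration:

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 64
    #define PAGE_BITS   12                      /* 4 KB pages */

    typedef struct {
        bool     valid, read, write, dirty;     /* V, R, W, D */
        uint32_t vpn;                           /* tag        */
        uint32_t ppn;                           /* data       */
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    /* Returns true on a hit and fills *pa; hardware compares all entries
       in parallel, so a hit costs a single cycle. A miss triggers a
       page-table walk (hardware or software) to refill the TLB. */
    bool tlb_lookup(uint32_t va, uint32_t *pa)
    {
        uint32_t vpn = va >> PAGE_BITS;
        uint32_t off = va & ((1u << PAGE_BITS) - 1);
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn) {
                *pa = (tlb[i].ppn << PAGE_BITS) | off;
                return true;
            }
        }
        return false;                           /* TLB miss */
    }

With these assumed parameters, the reach example works out as entries × page size: 64 × 4 KB = 256 KB of virtual address space mapped at once.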



Handling a TLB Miss
• Software (MIPS, Alpha)
– TLB miss causes an exception and the operating system walks the page tables and reloads the TLB. A privileged "untranslated" addressing mode is used for the walk.
• Hardware (SPARC v8, x86, PowerPC, RISC-V)
– A memory management unit (MMU) walks the page tables and reloads the TLB
– If a missing (data or PT) page is encountered during the TLB reload, the MMU gives up and signals a page-fault exception for the original instruction

Flashcard Quiz: Which statement is false?


Hierarchical Page Table Walk: SPARC v8
• Virtual address: Index 1 (bits 31–24), Index 2 (bits 23–18), Index 3 (bits 17–12), Offset (bits 11–0)
• The Context Table Register points to the Context Table; the Context Register selects the root pointer for the current process
• The root pointer indexes the L1 Table; its PTP (page table pointer) indexes the L2 Table; that PTP indexes the L3 Table, whose PTE yields the physical address (PPN, bits 31–12, + offset)
• The MMU does this table walk in hardware on a TLB miss (a simplified software rendering follows)

Page-Based Virtual-Memory Machine (Hardware Page-Table Walk)
• Pipeline: PC → Inst. TLB → Inst. Cache → Decode → E → + → M → Data TLB → Data Cache → W
• Both TLBs can miss, and both accesses can raise page faults or protection violations
• On a miss, a hardware page-table walker uses the Page-Table Base Register to walk the tables in main memory (DRAM) via the memory controller
• Assumes page tables held in untranslated physical memory
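A hedged software rendering of the hardware walk above. The descriptor encodings are simplified (real SPARC v8 SRMMU descriptors carry type and permission bits omitted here), and phys_read() is an assumed stand-in for the walker's untranslated physical-memory accesses:

    #include <stdint.h>

    extern uint32_t phys_read(uint32_t pa);   /* assumed helper: untranslated read */
    extern uint32_t context_table_reg;        /* Context Table Register            */
    extern uint32_t context_reg;              /* Context Register                  */

    #define IDX1(va) (((va) >> 24) & 0xFFu)   /* bits 31-24: 8-bit index  */
    #define IDX2(va) (((va) >> 18) & 0x3Fu)   /* bits 23-18: 6-bit index  */
    #define IDX3(va) (((va) >> 12) & 0x3Fu)   /* bits 17-12: 6-bit index  */
    #define OFF(va)  ((va) & 0xFFFu)          /* bits 11-0: page offset   */

    uint32_t walk(uint32_t va)
    {
        uint32_t l1  = phys_read(context_table_reg + 4 * context_reg); /* root ptr */
        uint32_t l2  = phys_read(l1 + 4 * IDX1(va));                   /* PTP      */
        uint32_t l3  = phys_read(l2 + 4 * IDX2(va));                   /* PTP      */
        uint32_t pte = phys_read(l3 + 4 * IDX3(va));                   /* PTE      */
        /* A real walker checks each descriptor's valid/type bits and raises
           a page fault or protection violation on a bad level. */
        return ((pte >> 12) << 12) | OFF(va);                          /* PPN+off  */
    }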

Address Translation: putting it all together
• Virtual address → TLB lookup (hardware)
– hit ⇒ protection check: if permitted, the physical address goes to the cache; if denied, protection fault (SEGFAULT)
– miss ⇒ page table walk (hardware or software):
• page ∈ memory ⇒ update TLB and retry the access
• page ∉ memory ⇒ page fault; OS loads the page
(a sketch of this flow appears after these slides)

Handling VM-related traps
• Handling a TLB miss needs a hardware or software mechanism to refill the TLB
• Handling a page fault (e.g., page is on disk) needs a restartable trap so the software handler can resume after retrieving the page
– Precise exceptions are easy to restart
– Can be imprecise but restartable, but this complicates OS software
• Handling a protection violation may abort the process
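The whole flowchart condenses into a few lines. This is a sketch only, reusing the hypothetical tlb_lookup() and walk() helpers from the earlier sketches, with page_in_memory() and access_permitted() as assumed stand-ins for the PTE checks:

    #include <stdint.h>
    #include <stdbool.h>

    extern bool     tlb_lookup(uint32_t va, uint32_t *pa);
    extern uint32_t walk(uint32_t va);
    extern bool     page_in_memory(uint32_t va);
    extern bool     access_permitted(uint32_t va, bool is_write);

    typedef enum { OK, PAGE_FAULT, PROTECTION_FAULT } vm_result_t;

    vm_result_t translate(uint32_t va, bool is_write, uint32_t *pa)
    {
        if (!tlb_lookup(va, pa)) {                /* TLB miss                  */
            if (!page_in_memory(va))              /* page on disk?             */
                return PAGE_FAULT;                /* restartable trap: OS loads
                                                     the page, then retries    */
            *pa = walk(va);                       /* walk refills the TLB      */
        }
        if (!access_permitted(va, is_write))      /* protection check          */
            return PROTECTION_FAULT;              /* may abort (SEGFAULT)      */
        return OK;                                /* physical address to cache */
    }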


Address Translation in CPU Pipeline
• PC → Inst. TLB → Inst. Cache → Decode → E → + → M → Data TLB → Data Cache → W; TLB misses, page faults, and protection violations can arise at both the instruction and data accesses
• Need to cope with the additional latency of the TLB:
– slow down the clock?
– pipeline the TLB and cache access?
– virtual address caches (see CS152)
– parallel TLB/cache access

Concurrent Access to TLB & Cache (Virtual Index/Physical Tag)
• Split the VA into VPN and k-bit page offset; a direct-mapped cache of 2^L blocks of 2^b bytes uses L index bits and b block-offset bits
• Index L is available without consulting the TLB ⇒ cache and TLB accesses can begin simultaneously!
• The TLB supplies the PPN, which is compared against the cache's physical tag after both accesses complete
• Cases: L + b = k, L + b < k, L + b > k (a worked instance follows)
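A worked instance of those cases, under assumed parameters: with 4 KB pages the page offset has k = 12 bits. A direct-mapped cache with 32-byte blocks (b = 5) and 128 sets (L = 7) gives L + b = 12 = k, so the whole cache index lies inside the untranslated page offset and the 4 KB cache can be indexed in parallel with the TLB. Doubling it to 8 KB (L = 8) gives L + b > k: one index bit now comes from the VPN, so the same physical block can land in two different sets (aliasing), which is why larger virtually indexed caches instead raise associativity, as the next slide shows.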


Virtual-Index Physical-Tag Caches: Associative Organization
• VA: VPN + index of L = k − b bits + b-bit block offset; a 2^a-way set-associative cache, each way direct-mapped with 2^L blocks
• After the PPN is known, the 2^a physical tags are compared in parallel
• How does this scheme scale to larger caches? (see the worked example after these slides)

VM features track historical uses:
• Bare machine, only physical addresses
– One program owned entire machine
• Batch-style multiprogramming
– Several programs sharing CPU while waiting for I/O
– Base & bound: translation and protection between programs (not virtual memory)
– Problem with external fragmentation (holes in memory); needed occasional memory defragmentation as new jobs arrived
• Time sharing
– More interactive programs, waiting for user. Also, more jobs/second.
– Motivated move to fixed-size page translation and protection, no external fragmentation (but now internal fragmentation, wasted bytes in page)
– Motivated adoption of virtual memory to allow more jobs to share limited physical memory resources while holding working set in memory
• Virtual Machine Monitors
– Run multiple operating systems on one machine
– Idea from 1970s IBM mainframes, now common on laptops
• e.g., run Windows on top of Mac OS X
– Hardware support for two levels of translation/protection
• Guest OS virtual → Guest OS physical → Host machine physical
– Also basis of Cloud Computing
• Virtual machine instances for Project 1
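A worked answer to the scaling question on the associative-organization slide above, under assumed parameters: with 4 KB pages (k = 12) and 64-byte blocks (b = 6), each direct-mapped way can hold at most 2^(k−b) = 64 sets, i.e., 4 KB. A 32 KB cache therefore needs 2^a = 8 ways, and the hit path must compare 8 physical tags after the TLB delivers the PPN; growing the cache further keeps multiplying the ways, which is the scheme's scaling limit.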


Acknowledgements

• These slides contain material developed and copyright by:
– Arvind (MIT)
– Krste Asanovic (MIT/UCB)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– John Kubiatowicz (UCB)
– David Patterson (UCB)

• MIT material derived from course 6.823
• UCB material derived from course CS252

