CSE 490/590 Computer Architecture

Address Translation and Protection

Steve Ko
Computer Sciences and Engineering
University at Buffalo

Last time…
• Prefetching
  – Speculate future I & D accesses and fetch them into caches
• Hardware techniques
  – Stream buffer
  – Prefetch-on-miss
  – One Block Lookahead
  – Strided
• Software techniques
  – Prefetch instruction
  – Loop interchange
  – Loop fusion
  – Cache tiling
CSE 490/590, Spring 2011
Memory Management
• From early absolute addressing schemes, to modern virtual memory systems with support for virtual machine monitors
• Can separate into orthogonal functions:
  – Translation (mapping of virtual address to physical address)
  – Protection (permission to access word in memory)
  – Virtual memory (transparent extension of memory space using slower disk storage)
• But most modern systems provide support for all the above functions with a single page-based system

Absolute Addresses
EDSAC, early 50’s
• Only one program ran at a time, with unrestricted access to entire machine (RAM + I/O devices)
• Addresses in a program depended upon where the program was to be loaded in memory
• Problems?
Dynamic Address Translation
Motivation: In the early machines, I/O operations were slow and each word transferred involved the CPU. Higher throughput if CPU and I/O of 2 or more programs were overlapped. How? ⇒ multiprogramming

• Location-independent programs: programming and storage management ease ⇒ need for a base register
• Protection: independent programs should not affect each other inadvertently ⇒ need for a bound register
[Figure: prog1 and prog2 loaded at different locations in physical memory]

Simple Base and Bound Translation
[Figure: the effective address of a Load X is compared against the Bound Register (which holds the segment length) to detect a bounds violation, and added to the Base Register to form the physical address of the word in the current segment of physical memory]
Base and bound registers are visible/accessible only when the processor is running in supervisor mode.
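The base-and-bound check above can be sketched in a few lines. This is a minimal model, not any real ISA's behavior; the register names and the exception are illustrative.

```python
class BoundsViolation(Exception):
    """Raised when an effective address falls outside the segment."""

def translate(effective_addr, base_reg, bound_reg):
    """Translate an effective (virtual) address to a physical address.

    bound_reg holds the segment length, so any effective address at or
    beyond it is a protection violation; otherwise the physical address
    is simply base + effective address.
    """
    if effective_addr >= bound_reg:
        raise BoundsViolation(f"address {effective_addr:#x} >= bound {bound_reg:#x}")
    return base_reg + effective_addr

# A segment of length 0x2000 loaded at physical address 0x4000:
print(hex(translate(0x100, base_reg=0x4000, bound_reg=0x2000)))  # 0x4100
```

Note that relocation falls out for free: loading the same program at a different address only requires changing the base register, which is why base registers give location-independent programs.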
Memory Fragmentation
[Figure: OS Space plus user partitions (user 1: 16K, user 2: 24K, user 3: 32K); users 4 & 5 arrive (16K and 24K), then users 2 & 5 leave, leaving free holes of 24K, 8K, and 24K]
As users come and go, the storage is “fragmented”. Therefore, at some stage programs have to be moved around to compact the storage.

Paged Memory Systems
• Processor-generated address can be split into: page number | offset
• A page table contains the physical address of the base of each page
[Figure: pages 0–3 of the Address Space of User-1 map through the Page Table of User-1 to non-contiguous page frames in physical memory]
Page tables make it possible to store the pages of a program non-contiguously.
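The page-number/offset split can be sketched as follows. This is a toy model assuming 4 KB pages (12 offset bits) and a page table that simply maps virtual page numbers to physical page base addresses; real page tables hold more (valid and protection bits, etc.).

```python
PAGE_BITS = 12
PAGE_SIZE = 1 << PAGE_BITS      # 4096 bytes per page
OFFSET_MASK = PAGE_SIZE - 1

def translate(vaddr, page_table):
    """Split a virtual address into (page number, offset), then look up
    the physical base of that page in the page table."""
    vpn = vaddr >> PAGE_BITS        # virtual page number
    offset = vaddr & OFFSET_MASK    # unchanged by translation
    frame_base = page_table[vpn]    # physical address of the page's base
    return frame_base + offset

# The program's pages sit non-contiguously in physical memory:
page_table = {0: 0x8000, 1: 0x3000, 2: 0xC000}
print(hex(translate(0x1ABC, page_table)))  # VPN 1, offset 0xABC -> 0x3ABC
```

Because only the page number is translated and the offset passes through, consecutive virtual pages can land in arbitrary physical frames, which is exactly what eliminates the compaction problem on the previous slide.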
Private Address Space per User
• Each user has a page table
• Page table contains an entry for each user page
[Figure: Users 1, 2, and 3 each map VA1 through their own page table into physical memory, which also holds OS pages and free pages]

Where Should Page Tables Reside?
• Space required by the page tables (PT) is proportional to the address space, number of users, ...
  ⇒ Space requirement is large
  ⇒ Too expensive to keep in registers
• Idea: Keep PTs in the main memory
  – needs one reference to retrieve the page base address and another to access the data word
  ⇒ doubles the number of memory references!
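The doubling of memory references can be made concrete with a sketch that counts them. Memory is modeled as a flat dictionary and `pt_base` (the physical address where the page table starts) is an illustrative name, not from the slides.

```python
PAGE_BITS = 12
OFFSET_MASK = (1 << PAGE_BITS) - 1

def load(vaddr, mem, pt_base):
    """Load a word through an in-memory page table.

    Returns (value, memory_references): one reference fetches the
    page-table entry, a second fetches the data word itself.
    """
    refs = 0
    vpn = vaddr >> PAGE_BITS
    frame_base = mem[pt_base + vpn]             # reference 1: PT entry
    refs += 1
    value = mem[frame_base + (vaddr & OFFSET_MASK)]  # reference 2: data
    refs += 1
    return value, refs

mem = {0x101: 0x3000,   # page table at 0x100: VPN 1 -> frame 0x3000
       0x3ABC: 42}      # the data word
print(load(0x1ABC, mem, pt_base=0x100))  # (42, 2)
```

This two-references-per-access cost is what later motivates caching translations (a TLB).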
Page Tables in Physical Memory
[Figure: the page tables of User 1 and User 2 themselves reside in physical memory; VA1 in each user’s virtual address space is translated through that user’s PT]

CSE 490/590 Administrivia
• Midterm on Friday, 3/4
• Project 1 deadline: Friday, 3/11
• Project 2 list will be up soon
• Guest lectures possibly this month
• Quiz will be distributed Monday
A Problem in the Early Sixties
• There were many applications whose data could not fit in the main memory, e.g., payroll
  – Paged memory systems reduced fragmentation but still required the whole program to be resident in the main memory
• Programmers moved the data back and forth from the secondary store by overlaying it repeatedly on the primary store (tricky programming!)
[Figure: Ferranti Mercury, 1956 — a 40k-bit primary Central Store backed by a 640k-bit drum]

Manual Overlays
• Assume an instruction can address all the storage on the drum
• Method 1: programmer keeps track of addresses in the main memory and initiates an I/O transfer when required
  – Difficult, error-prone!
• Method 2: automatic initiation of I/O transfers by software address translation
  – Brooker’s interpretive coding, 1960
  – Inefficient!
Not just an ancient black art, e.g., IBM Cell microprocessor used in Playstation-3 has explicitly managed local store!
Demand Paging in Atlas (1962)
“A page from secondary storage is brought into the primary storage whenever it is (implicitly) demanded by the processor.”
Tom Kilburn
[Figure: primary memory (32 pages, 512 words/page) acting as a cache for secondary memory on the drum]

Hardware Organization of Atlas
[Figure: Effective Address → Initial Address Decode; 16 ROM pages (0.4–1 µsec) hold system code, not swapped; 2 subsidiary pages (1.4 µsec) hold system data, not swapped; main store: 32 pages at 1.4 µsec, with one Page Address Register (PAR) per frame (PARs 0–31); Drum (4): 192 pages; 8 tape decks at 88 µsec/word; 48-bit words, 512-word pages]
Atlas Demand Paging Scheme
• On a page fault:
  – Input transfer into a free page is initiated
  – The Page Address Register (PAR) is updated
  – If no free page is left, a page is selected to be replaced (based on usage)
  – The replaced page is written on the drum
    » to minimize drum latency effect, the first empty page on the drum was selected
  – The page table is updated to point to the new location of the page on the drum

Caching vs. Demand Paging
[Figure: caching places a cache between the CPU and primary memory; demand paging places primary memory between the CPU and secondary memory]

  Caching                          Demand paging
  cache entry                      page frame
  cache block (~32 bytes)          page (~4K bytes)
  cache miss rate (1% to 20%)      page miss rate (<0.001%)
  cache hit (~1 cycle)             page hit (~100 cycles)
  cache miss (~100 cycles)         page miss (~5M cycles)
  a miss is handled in hardware    a miss is handled mostly in software
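The fault-handling steps above can be sketched as a toy simulator. This is not the actual Atlas algorithm: Atlas used a usage-based "learning" replacement policy, and here plain LRU stands in for it; the frame count and class names are illustrative.

```python
from collections import OrderedDict

FRAMES = 4  # number of primary-memory page frames in this toy model

class PrimaryMemory:
    def __init__(self):
        self.resident = OrderedDict()  # page -> frame, kept in LRU order
        self.faults = 0

    def access(self, page):
        """Return the frame holding `page`, faulting it in if needed."""
        if page in self.resident:
            self.resident.move_to_end(page)  # page hit: mark most recent
            return self.resident[page]
        self.faults += 1                     # page fault
        if len(self.resident) >= FRAMES:
            # No free frame: pick a victim (LRU here, usage-based in Atlas)
            victim, frame = self.resident.popitem(last=False)
            # ...the victim would be written back to the drum here,
            # and the page table updated to point at its drum location.
        else:
            frame = len(self.resident)       # a free frame is available
        self.resident[page] = frame          # transfer in, update the PAR
        return frame

mem = PrimaryMemory()
for p in [0, 1, 2, 3, 0, 4]:   # 4 cold faults, one hit, one eviction
    mem.access(p)
print(mem.faults)              # 5
```

The table above explains why this loop lives in software: at a miss rate below 0.001% and ~5M cycles per miss, the handler's instruction count is negligible, whereas cache misses are frequent and short enough that they must be handled in hardware.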
Acknowledgements
• These slides contain material developed and copyrighted by:
  – Krste Asanovic (MIT/UCB)
  – David Patterson (UCB)
• And also by:
  – Arvind (MIT)
  – Joel Emer (Intel/MIT)
  – James Hoe (CMU)
  – John Kubiatowicz (UCB)
• MIT material derived from course 6.823 • UCB material derived from course CS252