
CSE 490/590 Computer Architecture
Address Translation and Protection
Steve Ko
Computer Sciences and Engineering
University at Buffalo

Last time…
• Prefetching – speculate on future instruction and data accesses and fetch them into caches
• Hardware techniques
  – Stream buffer
  – Prefetch-on-miss
  – One-block lookahead
  – Strided
• Software techniques
  – Prefetch instruction
  – Loop interchange
  – Loop fusion
  – Cache tiling

Memory Management
• From early absolute addressing schemes to modern virtual memory systems with support for virtual machine monitors
• Can separate into orthogonal functions:
  – Translation (mapping of virtual address to physical address)
  – Protection (permission to access a word in memory)
  – Virtual memory (transparent extension of memory space using slower disk storage)
• But most modern systems provide support for all of the above functions with a single page-based system

Absolute Addresses
EDSAC, early 50s
• Only one program ran at a time, with unrestricted access to the entire machine (RAM + I/O devices)
• Addresses in a program depended upon where the program was to be loaded in memory
• Problems?

Dynamic Address Translation: Motivation
• In the early machines, I/O operations were slow and each word transferred involved the CPU
• Higher throughput if the CPU and I/O of two or more programs were overlapped. How? ⇒ multiprogramming
• Location-independent programs: ease of programming and storage management ⇒ need for a base register
• Protection: independent programs should not affect each other inadvertently ⇒ need for a bound register
[Figure: prog1 and prog2 each occupy a contiguous region of physical memory.]

Simple Base and Bound Translation
• The effective address is added to the base register to form the physical address
• The effective address is also compared (≤) against the bound register, which holds the segment length; exceeding it raises a bounds violation
[Figure: Load X issues an effective address within the current segment; effective address + base register → physical address in main memory, checked against the bound register.]
• Base and bound registers are visible/accessible only when the processor is running in supervisor mode

Memory Fragmentation
[Figure: OS space plus user partitions (user 1: 16K, user 2: 24K, user 3: 32K, 24K free); users 4 & 5 arrive, then users 2 & 5 leave, leaving scattered free holes.]
As users come and go, the storage is "fragmented". Therefore, at some stage programs have to be moved around to compact the storage.

Paged Memory Systems
• The processor-generated address can be split into a page number and an offset
• A page table contains the physical address of the base of each page
[Figure: pages 0–3 of User-1's address space are mapped through User-1's page table to non-contiguous frames in physical memory.]
• Page tables make it possible to store the pages of a program non-contiguously

Private Address Space per User
• Each user has a page table
• The page table contains an entry for each user page
[Figure: users 1, 2, and 3 each have their own page table; the same virtual address VA1 in each maps to a different page of physical memory, alongside OS pages and free frames.]

Where Should Page Tables Reside?
• Space required by the page tables (PT) is proportional to the address space, number of users, …
  ⇒ space requirement is large
  ⇒ too expensive to keep in registers
• Idea: keep PTs in the main memory
  – needs one memory reference to retrieve the page base address and another to access the data word
  ⇒ doubles the number of memory references!
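To make the two translation mechanisms concrete, here is a minimal sketch in C. It is not from the slides: the 4 KB page size, the tiny table size, and all names are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_BITS 12                           /* assumed 4 KB pages */
#define PAGE_SIZE (1u << PAGE_BITS)
#define NUM_PAGES 16                           /* illustrative tiny address space */

/* Base-and-bound translation: add the base, trap on a bounds violation. */
uint32_t base_bound_translate(uint32_t base, uint32_t bound, uint32_t ea)
{
    if (ea >= bound) {                         /* outside the segment */
        fprintf(stderr, "bounds violation\n");
        exit(EXIT_FAILURE);
    }
    return base + ea;                          /* physical address */
}

/* Per-user page table: physical base address of each virtual page. */
typedef struct {
    uint32_t base[NUM_PAGES];
} page_table_t;

/* Paged translation: split the processor-generated address into a page
 * number and an offset, then look up the page base in the page table.
 * If the table lives in main memory, this lookup is itself a memory
 * reference -- hence the doubling noted on the previous slide.
 * Assumes vaddr lies within the NUM_PAGES-page address space. */
uint32_t paged_translate(const page_table_t *pt, uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> PAGE_BITS;      /* page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1); /* offset within the page */
    return pt->base[vpn] + offset;
}
```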
Page Tables in Physical Memory
[Figure: the virtual address spaces of User 1 and User 2 each map the same virtual address VA1 to different frames; each user's page table itself resides in physical memory.]

CSE 490/590 Administrivia
• Midterm on Friday, 3/4
• Project 1 deadline: Friday, 3/11
• Project 2 list will be up soon
• Guest lectures possibly this month
• Quiz will be distributed Monday

A Problem in the Early Sixties
• There were many applications whose data could not fit in the main memory, e.g., payroll
  – Paged memory systems reduced fragmentation but still required the whole program to be resident in the main memory
• Programmers moved the data back and forth from the secondary store by overlaying it repeatedly on the primary store

Manual Overlays
(Ferranti Mercury, 1956: a 40k-bit central store backed by a 640k-bit drum)
• Assume an instruction can address all the storage on the drum
• Method 1: the programmer keeps track of addresses in the main memory and initiates an I/O transfer when required
  – Difficult, error-prone!
• Method 2: automatic initiation of I/O transfers by software address translation
  – Brooker's interpretive coding, 1960
  – Inefficient! Tricky programming!
• Not just an ancient black art: e.g., the IBM Cell microprocessor used in the PlayStation 3 has an explicitly managed local store!

Demand Paging in Atlas (1962)
"A page from secondary storage is brought into the primary storage whenever it is (implicitly) demanded by the processor." – Tom Kilburn
• Primary memory acts as a cache for secondary memory
• Primary: 32 pages of 512 words each; secondary (drum): 32 × 6 pages
• The user sees 32 × 6 × 512 words of storage
• One Page Address Register (PAR) per page frame, holding <effective page number, status>

Hardware Organization of Atlas
• 48-bit words, 512-word pages; the effective address goes through an initial address decode
• 16 ROM pages (system code, not swapped), 0.4–1 µsec; 2 subsidiary pages (system data, not swapped), 1.4 µsec
• Main store: 32 pages, 1.4 µsec; drum (4): 192 pages; 8 tape decks, 88 µsec/word
• The effective page address is compared against all 32 PARs:
  match ⇒ normal access
  no match ⇒ page fault; save the state of the partially executed instruction
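The PAR match can be mimicked in software, as in the C sketch below. The structure and names here are assumptions for illustration; the real Atlas compared all 32 PARs associatively, in parallel, in hardware.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_FRAMES 32          /* Atlas: one PAR per primary page frame */

/* One Page Address Register per frame: <effective page number, status>. */
typedef struct {
    uint32_t effective_pn;
    bool     valid;            /* frame currently holds a page */
} par_t;

par_t pars[NUM_FRAMES];

/* Compare the effective page address against all 32 PARs.  A match
 * means normal access to that frame; no match means a page fault, and
 * the state of the partially executed instruction must be saved before
 * software brings the page in. */
int par_lookup(uint32_t effective_pn)
{
    for (int f = 0; f < NUM_FRAMES; f++)
        if (pars[f].valid && pars[f].effective_pn == effective_pn)
            return f;          /* match: frame number */
    return -1;                 /* no match: page fault */
}
```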
Atlas Demand Paging Scheme
• On a page fault:
  – An input transfer into a free page is initiated
  – The Page Address Register (PAR) is updated
  – If no free page is left, a page is selected to be replaced (based on usage)
  – The replaced page is written out to the drum
    » to minimize the effect of drum latency, the first empty page on the drum was selected
  – The page table is updated to point to the new location of the page on the drum

Caching vs. Demand Paging
[Figure: caching places a cache between the CPU and primary memory; demand paging places primary memory between the CPU and secondary memory.]
                 Caching                       Demand paging
  unit           cache entry                   page frame
  block size     cache block (~32 bytes)       page (~4K bytes)
  miss rate      cache miss rate (1% to 20%)   page miss rate (<0.001%)
  hit cost       cache hit (~1 cycle)          page hit (~100 cycles)
  miss cost      cache miss (~100 cycles)      page miss (~5M cycles)
  miss handling  in hardware                   mostly in software
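As the table notes, a page miss is handled mostly in software. Below is a minimal C sketch of the fault-handling steps from the Atlas scheme above; the drum-I/O stubs, the last-use heuristic, and all names are hypothetical stand-ins, not the actual Atlas supervisor code.

```c
#include <stdint.h>

#define NUM_FRAMES 32                      /* Atlas-sized primary store */
#define NUM_PAGES  192                     /* pages on the drum */

typedef struct {
    uint32_t vpn;                          /* which virtual page is resident */
    int      in_use;
    uint32_t last_use;                     /* stand-in for Atlas usage data */
} frame_t;

static frame_t  frames[NUM_FRAMES];
static uint32_t drum_loc[NUM_PAGES];       /* page table: drum location per page */
static uint32_t now;                       /* crude usage clock */

/* Hypothetical drum-I/O hooks; no-op stubs standing in for real transfers. */
static void drum_read(uint32_t loc, int frame)  { (void)loc; (void)frame; }
static void drum_write(int frame, uint32_t loc) { (void)frame; (void)loc; }
static uint32_t first_empty_drum_page(void)     { return 0; }

/* The steps from the "Atlas Demand Paging Scheme" slide, in order. */
void handle_page_fault(uint32_t vpn)
{
    /* Look for a free page frame. */
    int victim = -1;
    for (int f = 0; f < NUM_FRAMES; f++)
        if (!frames[f].in_use) { victim = f; break; }

    /* If none is free, select a page to replace based on usage, write it
     * to the first empty page on the drum (minimizing drum latency), and
     * update its page-table entry to point to the new drum location. */
    if (victim < 0) {
        victim = 0;
        for (int f = 1; f < NUM_FRAMES; f++)
            if (frames[f].last_use < frames[victim].last_use)
                victim = f;
        uint32_t loc = first_empty_drum_page();
        drum_write(victim, loc);
        drum_loc[frames[victim].vpn] = loc;
    }

    /* Initiate the input transfer and update the frame's bookkeeping
     * (the Atlas hardware would update the PAR for this frame). */
    drum_read(drum_loc[vpn], victim);
    frames[victim].vpn      = vpn;
    frames[victim].in_use   = 1;
    frames[victim].last_use = ++now;
}
```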
Acknowledgements
• These slides draw heavily on material developed and copyrighted by:
  – Krste Asanovic (MIT/UCB)
  – David Patterson (UCB)
• And also by:
  – Arvind (MIT)
  – Joel Emer (Intel/MIT)
  – James Hoe (CMU)
  – John Kubiatowicz (UCB)
• MIT material derived from course 6.823
• UCB material derived from course CS252