
CPS 104, Lecture 21: Input/Output

Review: Extending the Memory Hierarchy

(figure: the memory hierarchy, top to bottom)
• Processor (P): very fast, 1 ns clock, multiple instructions per cycle
• Cache ($): SRAM; fast, small, expensive; HW manages movement
• Memory: DRAM; slow, big, cheap (called physical or main memory); SW manages movement
• Disk: magnetic; really slow, really big, really cheap

© Alvin R. Lebeck CPS 104 3

Admin

• Reading: Chapter 8, Input/Output (primarily 8.3 and 8.5); Appendix A.8, SPIM I/O

Review: Virtual Memory

• Provides the illusion of very large memory
  – sum of the memory of many jobs greater than physical memory
  – address space of each job larger than physical memory
• Good utilization of available (fast and expensive) physical memory
• Simplifies memory management: code and data movement, protection, ... (main reason today)
• Exploits the memory hierarchy to keep average access time low
• Involves at least two storage levels: main and secondary

• Virtual Address: address used by the programmer
• Virtual Address Space: the collection of such addresses
• Memory Address: address in physical memory, also known as "physical address" or "real address"

Review: Paged Virtual Memory: Main Idea

• Divide memory (virtual and physical) into fixed-size blocks
  – Pages in virtual space, Frames in physical space
• Make the page size a power of 2 (page size = 2^k)
• All pages in the virtual address space are contiguous
• Pages can be mapped into any physical frame
• Some pages are in main memory (DRAM), some pages on secondary memory (disk)

Review: Paged Virtual Memory

• Virtual address (2^32 or 2^64 bytes) to physical address (2^28 bytes) mapping
  – a virtual page maps to a physical page frame
  – virtual address = virtual page number | offset
• Fixed-size units for access control & translation
(figure: example mapping of virtual pages 0x0000, 0x1000, 0x2000, ..., 0x11000 to scattered physical frames such as 0x1000, 0x6000, 0x9000)


Review: Paged Virtual Memory: Main Idea (Cont)

• All programs are written using the virtual address space
• The hardware does on-the-fly translation between virtual and physical address spaces
• Use a page table to translate between virtual and physical addresses
• The Translation Lookaside Buffer (TLB) expedites address translation
• Must select a "good" page size to minimize fragmentation

Review: Virtual to Physical Address Translation

• Page size: 4K; need to translate every access (instruction and data)

  Virtual Address:   bits 31..12 = Virtual Page Number | bits 11..0 = Page offset
                                     |
                                 Page Table
                                     |
  Physical Address:  bits 27..12 = Physical Frame Number | bits 11..0 = Page offset
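The split-and-lookup above can be sketched in C. This is a minimal illustration, not SPIM's mechanism: the flat `page_table` array and its size are hypothetical stand-ins for a real page table.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                 /* 4K pages: offset is the low 12 bits */
#define PAGE_OFFSET_MASK 0xFFFu

/* Hypothetical flat page table: indexed by virtual page number,
 * holds the physical frame number (kept tiny for illustration). */
static uint32_t page_table[1 << 8];

uint32_t translate(uint32_t va) {
    uint32_t vpn    = va >> PAGE_SHIFT;       /* bits 31..12 */
    uint32_t offset = va & PAGE_OFFSET_MASK;  /* bits 11..0  */
    uint32_t pfn    = page_table[vpn];        /* the page-table lookup */
    return (pfn << PAGE_SHIFT) | offset;      /* frame number | offset */
}
```

For example, mapping virtual page 2 to physical frame 9 makes virtual address 0x2ABC translate to physical address 0x9ABC: the offset 0xABC is carried through unchanged.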

Review: Fast Translation: Translation Buffer

• A cache of translated addresses
• 64 entries, fully associative
(figure: the virtual address splits into page number and page offset; each entry holds v/r/w bits, a tag, and a physical frame number; the page number is compared against all entry tags in parallel and a 64x1 mux selects the matching frame, which is concatenated with the page offset)

Cache Indexing

• Tag on each block
  – no need to check index or block offset
• Increasing associativity shrinks the index, expands the tag

  Block Address = [ Tag | Index ] , followed by the Block offset

• Fully associative: no index; Direct-mapped: large index
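A software model of the fully associative lookup described above: every valid entry's tag is compared against the virtual page number. The struct layout and function name are illustrative; in hardware all 64 comparisons happen in parallel rather than in a loop.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64
#define PAGE_SHIFT  12

/* One TLB entry: valid/read/write bits, virtual-page tag, physical frame. */
struct tlb_entry {
    bool v, r, w;
    uint32_t tag;   /* virtual page number */
    uint32_t pfn;   /* physical frame number */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Fully associative: no index bits, so check every entry.
 * Returns true on a hit and writes the physical address to *pa. */
bool tlb_lookup(uint32_t va, uint32_t *pa) {
    uint32_t vpn = va >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].v && tlb[i].tag == vpn) {
            *pa = (tlb[i].pfn << PAGE_SHIFT) | (va & 0xFFFu);
            return true;            /* TLB hit */
        }
    }
    return false;                   /* miss: must walk the page table */
}
```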


Cache Memory 102

• Where can block 7 be placed in a 4-block cache?
  – Fully associative: anywhere
  – Direct mapped: frame 7 mod 4 = 3
  – 2-way set associative: set 7 mod 2 = 1
• Set-associative mapping: set = block number mod number of sets
  – direct mapped = 1-way set associative
• Cache frame: a location in the cache
• Bit selection: with power-of-two sizes, the mod is just the low-order bits

Address Translation and Caches

• Where is the TLB with respect to the cache? What are the consequences?
• Most of today's systems have more than one cache
  – the Digital 21164 had 3 levels
  – 2 levels on chip (8KB data, 8KB instruction, 96KB unified)
  – one level off chip (2-4MB)
• Does the OS need to worry about this?
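The placement rule above is a one-line computation. A small sketch (the function name is mine) that reproduces the slide's block-7 example for each organization:

```c
/* For a cache with `blocks` frames and associativity `assoc`,
 * number of sets = blocks / assoc, and a memory block maps to
 * set = block_number mod number_of_sets. With powers of two this
 * is just bit selection of the low-order block-number bits. */
unsigned cache_set(unsigned block_number, unsigned blocks, unsigned assoc) {
    unsigned num_sets = blocks / assoc;
    return block_number % num_sets;
}
```

Block 7 in a 4-block cache lands in frame 3 when direct mapped (assoc = 1), set 1 when 2-way set associative, and set 0 (i.e., anywhere) when fully associative (assoc = 4).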

TLBs and Caches

(figure: three organizations)
• Conventional organization: the CPU issues a VA, the TLB translates it to a PA, and the cache is physically indexed and tagged; translate on every access
• Virtually addressed cache: the cache is indexed and tagged with the VA; translate only on a miss, on the way to memory
• Overlap cache access with VA translation: access the TLB and the cache in parallel; requires the cache index to remain invariant across translation

Aliases and Virtual Caches

• Aliases (sometimes called synonyms): two different virtual addresses map to the same physical address
(figure: an address space from 0 to 2^64-1 with user code/data, stack, and kernel regions; a kernel VA and a user VA map to the same page of physical memory)
• But, but... the virtual address is used to index the cache
• Could have the same data in two different locations in the cache: the alias (synonym) problem


Virtual Caches

• Send the virtual address to the cache: called a virtually addressed cache or just virtual cache, vs. a physical cache or real cache
• Avoids address translation before accessing the cache
  – faster hit time to the cache
• Context switches?
  – just like the TLB: flush, or use a PID
  – cost is the time to flush plus "compulsory" misses from an empty cache
  – or add an identifier tag that names the process as well as the address within the process: can't get a hit if the wrong process
• I/O must interact with the cache

Index with Physical Portion of Address

• If the index is the physical part of the address, we can start the tag access in parallel with translation and then compare against the physical tag

  Address:  [ Address Tag | Index | Block Offset ]   with Index and Block Offset inside the Page Offset

• Limits the cache to the page size: what if we want bigger caches and still use the same trick?
  – higher associativity
  – page coloring = careful selection of the va->pa mapping
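The size limit implied above follows directly: if index plus block offset must fit inside the page offset, each way of the cache can hold at most one page, so capacity is page size times associativity. A tiny sketch (function name is mine):

```c
/* A virtually indexed, physically tagged cache can be probed in
 * parallel with the TLB only while the index bits lie inside the
 * page offset. Each way then spans at most one page, capping
 * capacity at page_size * associativity. */
unsigned max_parallel_cache_size(unsigned page_size, unsigned assoc) {
    return page_size * assoc;
}
```

With 4K pages, a direct-mapped cache is capped at 4KB, while raising associativity to 8-way allows 32KB, which is one reason the slide lists higher associativity as a fix.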

Page Coloring for Aliases

• HW guarantee: every cache frame holds a unique physical address
• OS guarantee: the lower n bits of the virtual & physical page numbers must have the same value; if direct-mapped, aliases then map to the same cache frame
  – one form of page coloring

  Address:  [ Address Tag | Index | Block Offset ]   with the Index extending past the Page Offset into the page number

Page Coloring

• Make the physical index match the virtual index
• Behaves like a virtually indexed cache
  – no conflicts for sequential pages
• Possibly many conflicts between processes
  – address spaces all have the same structure (stack, code, heap)
  – XOR the PID with the address (MIPS used a variant of this)
• Simple implementation
• Pick an arbitrary page if necessary
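The lower-n-bits invariant above is easy to state in code. A sketch, with hypothetical function names, of how an OS might check that a proposed va->pa mapping preserves the page color:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* A page's "color" is the low n bits of its page number: the index
 * bits that spill past the page offset into the page number. */
uint32_t page_color(uint32_t addr, unsigned n_color_bits) {
    return (addr >> PAGE_SHIFT) & ((1u << n_color_bits) - 1);
}

/* Page coloring: only map a virtual page to a physical frame of the
 * same color, so virtual and physical indexing pick the same cache
 * frames and aliases collide (and are detected) in the cache. */
bool mapping_ok(uint32_t va, uint32_t pa, unsigned n_color_bits) {
    return page_color(va, n_color_bits) == page_color(pa, n_color_bits);
}
```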


Virtual Memory and Physically Indexed Caches

• Notion of a bin
  – a region of the cache that may contain cache blocks from a page
(figure: the cache divided into page-frame-sized bins)
• Random vs. careful mapping
• Selection of the physical page frame dictates the cache index
• Overall goal is to minimize cache misses

Input / Output

Overview

• I/O devices
  – device controllers
• Device drivers
• Memory-mapped I/O
• Programmed I/O
• Direct Memory Access (DMA)
• Rotational media (disks)
• I/O technologies
• RAID (if time)

Why I/O?

• Interactive applications (keyboard, mouse, screen)
• Long-term storage (files, data repository)
• Swap space for VM
• Many different devices
  – character vs. block
  – networks are everywhere!
• 10^6 difference between CPU (10^-9 s) and I/O (10^-3 s)
  – response time vs. throughput
  – not always another process to execute
• OS hides (some) differences in devices
  – same (similar) interface to many devices
• Permits many apps to share one device


I/O Systems

(figure: the Processor and cache take interrupts; the Memory Bus connects the processor, Main Memory, and an I/O Bridge; the I/O Bus connects disk controllers and their disks, a graphics controller, and a network interface)

Time(workload) = Time(CPU) + Time(I/O) - Time(Overlap)

Device Drivers

• Top half
  – API (open, close, read, write, ioctl)
  – I/O control (IOCTL, device-specific arguments)
• Bottom half
  – interrupt handler
  – communicates with the device
  – resumes the waiting process
• Must have access to the user address space and the device control registers => runs in kernel mode

Review: Handling an Interrupt/Exception

• Invoke a specific kernel routine based on the type of interrupt
  – the interrupt/exception handler
• Must determine what caused the interrupt
• Clear the interrupt
• Return from interrupt (RETT; on MIPS, RFE)
(figure: a user program (ld, add, st, mul, beq, ld, sub, bne) is interrupted; control transfers to the interrupt handler and its service routines, then RETT resumes the user program)

Device Controllers

• The controller deals with mundane control (e.g., positioning the disk head, error detection/correction)
• The processor communicates with the controller over the bus through its registers: Command, Status (Busy, Done, Error), Data 0 ... Data n-1
• The controller communicates with the device


Processor <-> Device Interface Issues

• Interconnections
  – busses
• Processor interface
  – I/O instructions
  – memory-mapped I/O
• I/O control structures
  – device controllers
  – polling/interrupts
• Data movement
  – programmed I/O / DMA
• Capacity, access time, bandwidth

I/O Instructions

• Separate instructions (in, out)
• Independent I/O bus: the CPU reaches memory over the memory bus, and reaches the device controllers (and their devices) over a separate I/O bus

Memory Mapped I/O

• Issue a command through a store instruction
• Check status with a load instruction
• Device registers occupy part of the physical address space, alongside ROM and RAM
  – Caches? Device registers must not be cached
(figure: CPU with $ and L2 $ on the memory bus with memory and a bus adapter; the I/O bus reaches a controller and its device)

Data Movement

• Programmed I/O
  – the processor has to touch all the data
  – too much processor overhead for high-bandwidth devices (disk, network)
• DMA
  – the processor sets up the transfer(s)
  – the DMA controller transfers the data
  – complicates the memory system
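The store-a-command / load-the-status protocol above can be sketched in C. The register layout and bit assignments here are hypothetical (loosely following the Command/Status/Data picture on the Device Controllers slide); on real hardware the struct would sit at a fixed physical address behind a `volatile` pointer, but a plain struct keeps the sketch runnable.

```c
#include <stdint.h>

/* Hypothetical memory-mapped controller registers. */
struct dev_regs {
    uint32_t command;
    uint32_t status;    /* assumed bits: 0 = busy, 1 = done, 2 = error */
    uint32_t data;
};

#define STATUS_BUSY 0x1u

/* With memory-mapped I/O, issuing a command is just a store... */
void issue_command(struct dev_regs *d, uint32_t cmd) {
    d->command = cmd;                       /* sw, in MIPS terms */
}

/* ...and checking status is just a load plus a mask. */
int device_busy(const struct dev_regs *d) {
    return (d->status & STATUS_BUSY) != 0;  /* lw + andi */
}
```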


Communicating with the Processor

• Polling
  – can waste time waiting for a slow I/O device
  – busy wait, or can interleave polling with useful work
• Interrupts
  – interrupt overhead
  – an interrupt could happen at any time: asynchronous
  – no busy wait

Programmed I/O & Polling

• Advantage: the CPU is totally in control
• Disadvantage: the overhead of polling
  – the program must keep checking the device, and thus can't do useful work
(flowchart: Is the data ready? -- no: keep checking; yes: load the data, store the data; done? -- no: loop; yes: finished)
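The flowchart above, as a minimal C sketch. The register names and the choice of bit 0 as "ready" are assumptions for illustration; the busy-wait loop is the point.

```c
#include <stdint.h>

/* Hypothetical input-device registers: bit 0 of ctrl = data ready. */
struct in_regs {
    uint32_t ctrl;
    uint32_t data;
};

/* Polled read of one byte, following the flowchart: spin on "is the
 * data ready?", then load it. The CPU does no useful work while it
 * spins -- that is the cost of polling. */
uint8_t polled_getc(struct in_regs *rx) {
    while (!(rx->ctrl & 1u))    /* busy wait */
        ;
    return (uint8_t)(rx->data & 0xFFu);
}
```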

Programmed I/O & Interrupt Driven Data Transfer

(figure: (1) the I/O device interrupts the user program; (2) the PC is saved; (3) control jumps to the interrupt service address, and the service routine transfers the data (read, store, ...); (4) rti resumes the user program)
• User program progress is halted only during the actual transfer
• Interrupt overhead can dominate the transfer time
• Still programmed I/O: the processor must touch all the data... too slow for some devices

SPIM (Future Homework): Interrupt Handler

• A MIPS/SPIM program
• Use memory-mapped I/O
• Use interrupts
• The program should:
  – accept keyboard input (via interrupts)
  – echo the input to the terminal (via polling)
  – exit if the user typed 'q'


Direct Memory Access (DMA)

• The CPU delegates responsibility for the data transfer to a special controller, the DMAC
• The CPU sends a starting address, direction, and length count to the DMAC, then issues "start"
• The DMAC provides handshake signals for the device controller, and memory addresses and handshake signals for memory
(figure: CPU, $, L2 $ on the memory bus with memory and a bus adapter; the DMAC sits on the I/O bus with its CNTRL register and length count n)

Terminal Control

• Memory-mapped: use lw, sw
• Run spim with the -mapped_io command-line option
• Receiver (input): ready = 1 when the received byte is valid
  – Receiver control (0xffff0000): unused | interrupt enable | ready
  – Receiver data (0xffff0004): unused | received byte (8 bits)
• Transmitter (output): ready = 1 when ready to print the next char
  – Transmitter control (0xffff0008): unused | interrupt enable | ready
  – Transmitter data (0xffff000c): unused | transmitted byte (8 bits)
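The homework's echo-until-'q' behavior over the terminal registers above, sketched in C rather than MIPS assembly (the homework uses interrupts for input; this sketch polls both ends to stay simple). The struct mirrors the register layout at 0xffff0000; on SPIM the equivalent code would use lw/sw on those addresses through a `volatile` pointer. The function name and return value are mine.

```c
#include <stdint.h>

/* Mirror of SPIM's memory-mapped terminal (bit 0 of each control
 * register is "ready"); the real registers start at 0xffff0000. */
struct spim_term {
    uint32_t rx_ctrl;   /* 0xffff0000: ready = byte available       */
    uint32_t rx_data;   /* 0xffff0004: received byte (low 8 bits)   */
    uint32_t tx_ctrl;   /* 0xffff0008: ready = can print next char  */
    uint32_t tx_data;   /* 0xffff000c: byte to transmit             */
};

/* Echo characters until the user types 'q'. Returns how many
 * characters were echoed, not counting the 'q'. */
int echo_until_q(struct spim_term *t) {
    int n = 0;
    for (;;) {
        while (!(t->rx_ctrl & 1u)) ;        /* wait for input        */
        uint8_t c = (uint8_t)(t->rx_data & 0xFFu);
        if (c == 'q')
            return n;                       /* exit condition        */
        while (!(t->tx_ctrl & 1u)) ;        /* wait for transmitter  */
        t->tx_data = c;                     /* echo the character    */
        n++;
    }
}
```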

Interrupt Driven I/O

• Set interrupt enable = 1 in Receiver control (0xffff0000)
  – generates a level 0 interrupt when Ready becomes 1
  – if that interrupt level is also enabled in the Status register
• Run spim with -notrap
  – allows you to install your own interrupt handler
  – so use both -mapped_io and -notrap

Cause Register

• Exception code 0000 = external interrupt
  – e.g., a terminal interrupt
• Layout: bits 15..10 pending interrupts; bits 5..2 exception code; the rest unused
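Decoding the Cause register fields above is just shifting and masking. A sketch in C using the bit positions the slide gives (the macro and function names are mine, and the assumption that the level 0 line appears as the lowest pending bit is mine as well):

```c
#include <stdint.h>

/* Cause register fields per the slide: exception code in bits 5..2,
 * pending interrupt lines in bits 15..10. */
#define EXC_CODE(cause)  (((cause) >> 2) & 0xFu)
#define PENDING(cause)   (((cause) >> 10) & 0x3Fu)

#define EXC_EXTERNAL_INT 0   /* code 0000 = external interrupt */

/* True when the cause indicates an external interrupt with the
 * level 0 line (assumed to be the lowest pending bit) raised --
 * i.e., the terminal. */
int is_terminal_interrupt(uint32_t cause) {
    return EXC_CODE(cause) == EXC_EXTERNAL_INT && (PENDING(cause) & 1u);
}
```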


Status Register

• Bit 0 = interrupt enable
• Bit 8 = allow level 0 interrupts
  – terminal input generates a level 0 interrupt
• Coprocessor 0, register 12
  – access with mfc0, mtc0
• On an interrupt, bits 0-5 are shifted left by 2
  – disables interrupts and enters kernel mode
• When done servicing the interrupt, use rfe to restore them
• Layout: bits 15..8 interrupt mask; bits 5..0 are three (Kernel/User, Interrupt enable) pairs: Old (5..4), Previous (3..2), Current (1..0)

Summary

• Virtual memory & caches
• I/O
  – devices
  – processor interface issues
  – memory-mapped I/O
  – interrupts and polling
• SPIM terminal device

Types of Storage Devices

• Magnetic disks
• Magnetic tapes
• CD-ROM
• Jukebox (automated tape library, robots)

Disk Access

• Access time = queue + seek + rotational + transfer + overhead
• Seek time
  – move the arm over the track
  – the average is confusing (startup, slowdown, locality of accesses)
• Rotational latency
  – wait for the sector to rotate under the head
  – average = half a revolution = 0.5 / (3600 RPM / 60) ~= 8.3 ms
• Transfer time
  – a function of the request size and the bandwidth (bytes/sec)
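The two formulas above as a worked computation (function names are mine). Average rotational latency is half a revolution, so at 3600 RPM a revolution takes 1/60 s and the average wait is about 8.3 ms:

```c
/* Average rotational latency: half a revolution, in milliseconds.
 * 0.5 / (RPM / 60) seconds; at 3600 RPM ~= 8.3 ms. */
double avg_rotational_latency_ms(double rpm) {
    double revs_per_sec = rpm / 60.0;
    return 0.5 / revs_per_sec * 1000.0;
}

/* Total access time per the slide: the components simply add. */
double access_time_ms(double queue, double seek, double rotational,
                      double transfer, double overhead) {
    return queue + seek + rotational + transfer + overhead;
}
```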


Magnetic Disks

• Long-term nonvolatile storage
• Another slower, less expensive level of the memory hierarchy
(figure: platters with tracks divided into sectors; a cylinder is the same track across all platters; heads on an arm)

I/O and Virtual Caches

(figure: processor with a virtual cache; physical addresses on the memory bus; the I/O bridge and I/O bus carry DMA traffic between main memory and the disk, graphics, and network controllers)
• I/O is accomplished with physical addresses; with a virtual cache:
  – flush pages from the cache before I/O, or
  – need pa->va reverse translation, or
  – coherent DMA
