
CPS 104, Lecture 21: Input/Output

Review: Extending the Memory Hierarchy

(figure: the memory hierarchy, top to bottom)
• Processor (P): very fast, 1 ns clock, multiple instructions per cycle
• Cache ($): SRAM; fast, small, expensive; HW manages movement
• Memory: DRAM; slow, big, cheap (called physical or main memory); SW manages movement
• Disk: magnetic; really slow, really big, really cheap

© Alvin R. Lebeck CPS 104 3

Admin

• Reading: Chapter 8, Input/Output (primarily 8.3 and 8.5); Appendix A.8, SPIM I/O

Review: Virtual Memory

• Provides the illusion of very large memory
  – sum of the memory of many jobs greater than physical memory
  – address space of each job larger than physical memory
• Good utilization of available (fast and expensive) physical memory
• Simplifies memory management: code and data movement, protection, ... (main reason today)
• Exploits the memory hierarchy to keep average access time low
• Involves at least two storage levels: main and secondary

• Virtual Address: address used by the programmer
• Virtual Address Space: the collection of such addresses
• Memory Address: address in physical memory, also known as "physical address" or "real address"

Review: Paged Virtual Memory: Main Idea

• Divide memory (virtual and physical) into fixed-size blocks
  – Pages in virtual space, Frames in physical space
• Make the page size a power of 2 (page size = 2^k)
• All pages in the virtual address space are contiguous
• Pages can be mapped into any physical frame
• Some pages are in main memory (DRAM), some pages on secondary memory (disk)

Review: Paged Virtual Memory

• Virtual address (2^32 or 2^64 bytes) to physical address (2^28 bytes) mapping
  – a virtual page maps to a physical page frame
  – virtual address = virtual page number | offset
• Fixed-size units for access control & translation
(figure: example mapping of virtual pages 0x0000, 0x1000, 0x2000, ..., 0x11000 to scattered physical frames such as 0x1000, 0x6000, 0x9000)


Review: Paged Virtual Memory: Main Idea (Cont)

• All programs are written using the virtual address space
• The hardware does on-the-fly translation between virtual and physical address spaces
• Use a page table to translate between virtual and physical addresses
• The Translation Lookaside Buffer (TLB) expedites address translation
• Must select a "good" page size to minimize fragmentation

Review: Virtual to Physical Address Translation

• Page size: 4K; need to translate every access (instruction and data)

  Virtual Address:   bits 31..12 = Virtual Page Number | bits 11..0 = Page offset
                                     |
                                 Page Table
                                     |
  Physical Address:  bits 27..12 = Physical Frame Number | bits 11..0 = Page offset
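The split-and-lookup above can be sketched in C. This is a minimal illustration, not SPIM's mechanism: the flat `page_table` array and its size are hypothetical stand-ins for a real page table.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                 /* 4K pages: offset is the low 12 bits */
#define PAGE_OFFSET_MASK 0xFFFu

/* Hypothetical flat page table: indexed by virtual page number,
 * holds the physical frame number (kept tiny for illustration). */
static uint32_t page_table[1 << 8];

uint32_t translate(uint32_t va) {
    uint32_t vpn    = va >> PAGE_SHIFT;       /* bits 31..12 */
    uint32_t offset = va & PAGE_OFFSET_MASK;  /* bits 11..0  */
    uint32_t pfn    = page_table[vpn];        /* the page-table lookup */
    return (pfn << PAGE_SHIFT) | offset;      /* frame number | offset */
}
```

For example, mapping virtual page 2 to physical frame 9 makes virtual address 0x2ABC translate to physical address 0x9ABC: the offset 0xABC is carried through unchanged.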

Review: Fast Translation: Translation Buffer

• A cache of translated addresses
• 64 entries, fully associative
(figure: the virtual address splits into page number and page offset; each entry holds v/r/w bits, a tag, and a physical frame number; the page number is compared against all entry tags in parallel and a 64x1 mux selects the matching frame, which is concatenated with the page offset)

Cache Indexing

• Tag on each block
  – no need to check index or block offset
• Increasing associativity shrinks the index, expands the tag

  Block Address = [ Tag | Index ] , followed by the Block offset

• Fully associative: no index; Direct-mapped: large index
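A software model of the fully associative lookup described above: every valid entry's tag is compared against the virtual page number. The struct layout and function name are illustrative; in hardware all 64 comparisons happen in parallel rather than in a loop.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64
#define PAGE_SHIFT  12

/* One TLB entry: valid/read/write bits, virtual-page tag, physical frame. */
struct tlb_entry {
    bool v, r, w;
    uint32_t tag;   /* virtual page number */
    uint32_t pfn;   /* physical frame number */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Fully associative: no index bits, so check every entry.
 * Returns true on a hit and writes the physical address to *pa. */
bool tlb_lookup(uint32_t va, uint32_t *pa) {
    uint32_t vpn = va >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].v && tlb[i].tag == vpn) {
            *pa = (tlb[i].pfn << PAGE_SHIFT) | (va & 0xFFFu);
            return true;            /* TLB hit */
        }
    }
    return false;                   /* miss: must walk the page table */
}
```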


Cache Memory 102

• Where can block 7 be placed in a 4-block cache?
  – Fully associative: anywhere
  – Direct mapped: frame 7 mod 4 = 3
  – 2-way set associative: set 7 mod 2 = 1
• Set-associative mapping: set = block number mod number of sets
  – direct mapped = 1-way set associative
• Cache frame: a location in the cache
• Bit selection: with power-of-two sizes, the mod is just the low-order bits

Address Translation and Caches

• Where is the TLB with respect to the cache? What are the consequences?
• Most of today's systems have more than one cache
  – the Digital 21164 had 3 levels
  – 2 levels on chip (8KB data, 8KB instruction, 96KB unified)
  – one level off chip (2-4MB)
• Does the OS need to worry about this?
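The placement rule above is a one-line computation. A small sketch (the function name is mine) that reproduces the slide's block-7 example for each organization:

```c
/* For a cache with `blocks` frames and associativity `assoc`,
 * number of sets = blocks / assoc, and a memory block maps to
 * set = block_number mod number_of_sets. With powers of two this
 * is just bit selection of the low-order block-number bits. */
unsigned cache_set(unsigned block_number, unsigned blocks, unsigned assoc) {
    unsigned num_sets = blocks / assoc;
    return block_number % num_sets;
}
```

Block 7 in a 4-block cache lands in frame 3 when direct mapped (assoc = 1), set 1 when 2-way set associative, and set 0 (i.e., anywhere) when fully associative (assoc = 4).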

TLBs and Caches

(figure: three organizations)
• Conventional organization: the CPU issues a VA, the TLB translates it to a PA, and the cache is physically indexed and tagged; translate on every access
• Virtually addressed cache: the cache is indexed and tagged with the VA; translate only on a miss, on the way to memory
• Overlap cache access with VA translation: access the TLB and the cache in parallel; requires the cache index to remain invariant across translation

Aliases and Virtual Caches

• Aliases (sometimes called synonyms): two different virtual addresses map to the same physical address
(figure: an address space from 0 to 2^64-1 with user code/data, stack, and kernel regions; a kernel VA and a user VA map to the same page of physical memory)
• But, but... the virtual address is used to index the cache
• Could have the same data in two different locations in the cache: the alias (synonym) problem


Virtual Caches

• Send the virtual address to the cache: called a virtually addressed cache or just virtual cache, vs. a physical cache or real cache
• Avoids address translation before accessing the cache
  – faster hit time to the cache
• Context switches?
  – just like the TLB: flush, or use a PID
  – cost is the time to flush plus "compulsory" misses from an empty cache
  – or add an identifier tag that names the process as well as the address within the process: can't get a hit if the wrong process
• I/O must interact with the cache

Index with Physical Portion of Address

• If the index is the physical part of the address, we can start the tag access in parallel with translation and then compare against the physical tag

  Address:  [ Address Tag | Index | Block Offset ]   with Index and Block Offset inside the Page Offset

• Limits the cache to the page size: what if we want bigger caches and still use the same trick?
  – higher associativity
  – page coloring = careful selection of the va->pa mapping
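The size limit implied above follows directly: if index plus block offset must fit inside the page offset, each way of the cache can hold at most one page, so capacity is page size times associativity. A tiny sketch (function name is mine):

```c
/* A virtually indexed, physically tagged cache can be probed in
 * parallel with the TLB only while the index bits lie inside the
 * page offset. Each way then spans at most one page, capping
 * capacity at page_size * associativity. */
unsigned max_parallel_cache_size(unsigned page_size, unsigned assoc) {
    return page_size * assoc;
}
```

With 4K pages, a direct-mapped cache is capped at 4KB, while raising associativity to 8-way allows 32KB, which is one reason the slide lists higher associativity as a fix.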

Page Coloring for Aliases

• HW guarantee: every cache frame holds a unique physical address
• OS guarantee: the lower n bits of the virtual & physical page numbers must have the same value; if direct-mapped, aliases then map to the same cache frame
  – one form of page coloring

  Address:  [ Address Tag | Index | Block Offset ]   with the Index extending past the Page Offset into the page number

Page Coloring

• Make the physical index match the virtual index
• Behaves like a virtually indexed cache
  – no conflicts for sequential pages
• Possibly many conflicts between processes
  – address spaces all have the same structure (stack, code, heap)
  – XOR the PID with the address (MIPS used a variant of this)
• Simple implementation
• Pick an arbitrary page if necessary
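The lower-n-bits invariant above is easy to state in code. A sketch, with hypothetical function names, of how an OS might check that a proposed va->pa mapping preserves the page color:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* A page's "color" is the low n bits of its page number: the index
 * bits that spill past the page offset into the page number. */
uint32_t page_color(uint32_t addr, unsigned n_color_bits) {
    return (addr >> PAGE_SHIFT) & ((1u << n_color_bits) - 1);
}

/* Page coloring: only map a virtual page to a physical frame of the
 * same color, so virtual and physical indexing pick the same cache
 * frames and aliases collide (and are detected) in the cache. */
bool mapping_ok(uint32_t va, uint32_t pa, unsigned n_color_bits) {
    return page_color(va, n_color_bits) == page_color(pa, n_color_bits);
}
```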


Virtual Memory and Physically Indexed Caches

• Notion of a bin
  – a region of the cache that may contain cache blocks from a page
(figure: the cache divided into page-frame-sized bins)
• Random vs. careful mapping
• Selection of the physical page frame dictates the cache index
• Overall goal is to minimize cache misses

Input / Output

Overview

• I/O devices
  – device controllers
• Device drivers
• Memory-mapped I/O
• Programmed I/O
• Direct Memory Access (DMA)
• Rotational media (disks)
• I/O technologies
• RAID (if time)

Why I/O?

• Interactive applications (keyboard, mouse, screen)
• Long-term storage (files, data repository)
• Swap space for VM
• Many different devices
  – character vs. block
  – networks are everywhere!
• 10^6 difference between CPU (10^-9 s) and I/O (10^-3 s)
  – response time vs. throughput
  – not always another process to execute
• OS hides (some) differences in devices
  – same (similar) interface to many devices
• Permits many apps to share one device


I/O Systems

(figure: the Processor and cache take interrupts; the Memory Bus connects the processor, Main Memory, and an I/O Bridge; the I/O Bus connects disk controllers and their disks, a graphics controller, and a network interface)

Time(workload) = Time(CPU) + Time(I/O) - Time(Overlap)

Device Drivers

• Top half
  – API (open, close, read, write, ioctl)
  – I/O control (IOCTL, device-specific arguments)
• Bottom half
  – interrupt handler
  – communicates with the device
  – resumes the waiting process
• Must have access to the user address space and the device control registers => runs in kernel mode

Review: Handling an Interrupt/Exception

• Invoke a specific kernel routine based on the type of interrupt
  – the interrupt/exception handler
• Must determine what caused the interrupt
• Clear the interrupt
• Return from interrupt (RETT; on MIPS, RFE)
(figure: a user program (ld, add, st, mul, beq, ld, sub, bne) is interrupted; control transfers to the interrupt handler and its service routines, then RETT resumes the user program)

Device Controllers

• The controller deals with mundane control (e.g., positioning the disk head, error detection/correction)
• The processor communicates with the controller over the bus through its registers: Command, Status (Busy, Done, Error), Data 0 ... Data n-1
• The controller communicates with the device


Processor <-> Device Interface Issues

• Interconnections
  – busses
• Processor interface
  – I/O instructions
  – memory-mapped I/O
• I/O control structures
  – device controllers
  – polling/interrupts
• Data movement
  – programmed I/O / DMA
• Capacity, access time, bandwidth

I/O Instructions

• Separate instructions (in, out)
• Independent I/O bus: the CPU reaches memory over the memory bus, and reaches the device controllers (and their devices) over a separate I/O bus

Memory Mapped I/O

• Issue a command through a store instruction
• Check status with a load instruction
• Device registers occupy part of the physical address space, alongside ROM and RAM
  – Caches? Device registers must not be cached
(figure: CPU with $ and L2 $ on the memory bus with memory and a bus adapter; the I/O bus reaches a controller and its device)

Data Movement

• Programmed I/O
  – the processor has to touch all the data
  – too much processor overhead for high-bandwidth devices (disk, network)
• DMA
  – the processor sets up the transfer(s)
  – the DMA controller transfers the data
  – complicates the memory system
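The store-a-command / load-the-status protocol above can be sketched in C. The register layout and bit assignments here are hypothetical (loosely following the Command/Status/Data picture on the Device Controllers slide); on real hardware the struct would sit at a fixed physical address behind a `volatile` pointer, but a plain struct keeps the sketch runnable.

```c
#include <stdint.h>

/* Hypothetical memory-mapped controller registers. */
struct dev_regs {
    uint32_t command;
    uint32_t status;    /* assumed bits: 0 = busy, 1 = done, 2 = error */
    uint32_t data;
};

#define STATUS_BUSY 0x1u

/* With memory-mapped I/O, issuing a command is just a store... */
void issue_command(struct dev_regs *d, uint32_t cmd) {
    d->command = cmd;                       /* sw, in MIPS terms */
}

/* ...and checking status is just a load plus a mask. */
int device_busy(const struct dev_regs *d) {
    return (d->status & STATUS_BUSY) != 0;  /* lw + andi */
}
```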


Communicating with the Processor

• Polling
  – can waste time waiting for a slow I/O device
  – busy wait, or can interleave polling with useful work
• Interrupts
  – interrupt overhead
  – an interrupt could happen at any time: asynchronous
  – no busy wait

Programmed I/O & Polling

• Advantage: the CPU is totally in control
• Disadvantage: the overhead of polling
  – the program must keep checking the device, and thus can't do useful work
(flowchart: Is the data ready? -- no: keep checking; yes: load the data, store the data; done? -- no: loop; yes: finished)
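The flowchart above, as a minimal C sketch. The register names and the choice of bit 0 as "ready" are assumptions for illustration; the busy-wait loop is the point.

```c
#include <stdint.h>

/* Hypothetical input-device registers: bit 0 of ctrl = data ready. */
struct in_regs {
    uint32_t ctrl;
    uint32_t data;
};

/* Polled read of one byte, following the flowchart: spin on "is the
 * data ready?", then load it. The CPU does no useful work while it
 * spins -- that is the cost of polling. */
uint8_t polled_getc(struct in_regs *rx) {
    while (!(rx->ctrl & 1u))    /* busy wait */
        ;
    return (uint8_t)(rx->data & 0xFFu);
}
```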

Programmed I/O & Interrupt Driven Data Transfer

(figure: (1) the I/O device interrupts the user program; (2) the PC is saved; (3) control jumps to the interrupt service address, and the service routine transfers the data (read, store, ...); (4) rti resumes the user program)
• User program progress is halted only during the actual transfer
• Interrupt overhead can dominate the transfer time
• Still programmed I/O: the processor must touch all the data... too slow for some devices

SPIM (Future Homework): Interrupt Handler

• A MIPS/SPIM program
• Use memory-mapped I/O
• Use interrupts
• The program should:
  – accept keyboard input (via interrupts)
  – echo the input to the terminal (via polling)
  – exit if the user typed 'q'


Direct Memory Access (DMA)

• The CPU delegates responsibility for the data transfer to a special controller, the DMAC
• The CPU sends a starting address, direction, and length count to the DMAC, then issues "start"
• The DMAC provides handshake signals for the device controller, and memory addresses and handshake signals for memory
(figure: CPU, $, L2 $ on the memory bus with memory and a bus adapter; the DMAC sits on the I/O bus with its CNTRL register and length count n)

Terminal Control

• Memory-mapped: use lw, sw
• Run spim with the -mapped_io command-line option
• Receiver (input): ready = 1 when the received byte is valid
  – Receiver control (0xffff0000): unused | interrupt enable | ready
  – Receiver data (0xffff0004): unused | received byte (8 bits)
• Transmitter (output): ready = 1 when ready to print the next char
  – Transmitter control (0xffff0008): unused | interrupt enable | ready
  – Transmitter data (0xffff000c): unused | transmitted byte (8 bits)
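The homework's echo-until-'q' behavior over the terminal registers above, sketched in C rather than MIPS assembly (the homework uses interrupts for input; this sketch polls both ends to stay simple). The struct mirrors the register layout at 0xffff0000; on SPIM the equivalent code would use lw/sw on those addresses through a `volatile` pointer. The function name and return value are mine.

```c
#include <stdint.h>

/* Mirror of SPIM's memory-mapped terminal (bit 0 of each control
 * register is "ready"); the real registers start at 0xffff0000. */
struct spim_term {
    uint32_t rx_ctrl;   /* 0xffff0000: ready = byte available       */
    uint32_t rx_data;   /* 0xffff0004: received byte (low 8 bits)   */
    uint32_t tx_ctrl;   /* 0xffff0008: ready = can print next char  */
    uint32_t tx_data;   /* 0xffff000c: byte to transmit             */
};

/* Echo characters until the user types 'q'. Returns how many
 * characters were echoed, not counting the 'q'. */
int echo_until_q(struct spim_term *t) {
    int n = 0;
    for (;;) {
        while (!(t->rx_ctrl & 1u)) ;        /* wait for input        */
        uint8_t c = (uint8_t)(t->rx_data & 0xFFu);
        if (c == 'q')
            return n;                       /* exit condition        */
        while (!(t->tx_ctrl & 1u)) ;        /* wait for transmitter  */
        t->tx_data = c;                     /* echo the character    */
        n++;
    }
}
```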

Interrupt Driven I/O

• Set interrupt enable = 1 in Receiver control (0xffff0000)
  – generates a level 0 interrupt when Ready becomes 1
  – if that interrupt level is also enabled in the Status register
• Run spim with -notrap
  – allows you to install your own interrupt handler
  – so use both -mapped_io and -notrap

Cause Register

• Exception code 0000 = external interrupt
  – e.g., a terminal interrupt
• Layout: bits 15..10 pending interrupts; bits 5..2 exception code; the rest unused
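Decoding the Cause register fields above is just shifting and masking. A sketch in C using the bit positions the slide gives (the macro and function names are mine, and the assumption that the level 0 line appears as the lowest pending bit is mine as well):

```c
#include <stdint.h>

/* Cause register fields per the slide: exception code in bits 5..2,
 * pending interrupt lines in bits 15..10. */
#define EXC_CODE(cause)  (((cause) >> 2) & 0xFu)
#define PENDING(cause)   (((cause) >> 10) & 0x3Fu)

#define EXC_EXTERNAL_INT 0   /* code 0000 = external interrupt */

/* True when the cause indicates an external interrupt with the
 * level 0 line (assumed to be the lowest pending bit) raised --
 * i.e., the terminal. */
int is_terminal_interrupt(uint32_t cause) {
    return EXC_CODE(cause) == EXC_EXTERNAL_INT && (PENDING(cause) & 1u);
}
```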


Status Register

• Bit 0 = interrupt enable
• Bit 8 = allow level 0 interrupts
  – terminal input generates a level 0 interrupt
• Coprocessor 0, register 12
  – access with mfc0, mtc0
• On an interrupt, bits 0-5 are shifted left by 2
  – disables interrupts and enters kernel mode
• When done servicing the interrupt, use rfe to restore them
• Layout: bits 15..8 interrupt mask; bits 5..0 are three (Kernel/User, Interrupt enable) pairs: Old (5..4), Previous (3..2), Current (1..0)

Summary

• Virtual memory & caches
• I/O
  – devices
  – processor interface issues
  – memory-mapped I/O
  – interrupts and polling
• SPIM terminal device

Types of Storage Devices

• Magnetic disks
• Magnetic tapes
• CD-ROM
• Jukebox (automated tape library, robots)

Disk Access

• Access time = queue + seek + rotational + transfer + overhead
• Seek time
  – move the arm over the track
  – the average is confusing (startup, slowdown, locality of accesses)
• Rotational latency
  – wait for the sector to rotate under the head
  – average = half a revolution = 0.5 / (3600 RPM / 60) ~= 8.3 ms
• Transfer time
  – a function of the request size and the bandwidth (bytes/sec)
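The two formulas above as a worked computation (function names are mine). Average rotational latency is half a revolution, so at 3600 RPM a revolution takes 1/60 s and the average wait is about 8.3 ms:

```c
/* Average rotational latency: half a revolution, in milliseconds.
 * 0.5 / (RPM / 60) seconds; at 3600 RPM ~= 8.3 ms. */
double avg_rotational_latency_ms(double rpm) {
    double revs_per_sec = rpm / 60.0;
    return 0.5 / revs_per_sec * 1000.0;
}

/* Total access time per the slide: the components simply add. */
double access_time_ms(double queue, double seek, double rotational,
                      double transfer, double overhead) {
    return queue + seek + rotational + transfer + overhead;
}
```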


Magnetic Disks

• Long-term nonvolatile storage
• Another slower, less expensive level of the memory hierarchy
(figure: platters with tracks divided into sectors; a cylinder is the same track across all platters; heads on an arm)

I/O and Virtual Caches

(figure: processor with a virtual cache; physical addresses on the memory bus; the I/O bridge and I/O bus carry DMA traffic between main memory and the disk, graphics, and network controllers)
• I/O is accomplished with physical addresses; with a virtual cache:
  – flush pages from the cache before I/O, or
  – need pa->va reverse translation, or
  – coherent DMA
