COMP 212 Computer Organization & Architecture
COMP 212 Fall 2008
Lecture 4
Memory and Disk Systems

Re-Cap of Lecture #3
• Cache system is a compromise between
– More memory system capacity
– Faster access speed
– Cost
• Memory System is Hierarchical
– Speed:
» Registers > Cache > RAM > Hard Disk > Optical Storages
– Cost: other way around
Comp 212 Computer Org & Arch 1 Z. Li, 2008 Comp 212 Computer Org & Arch 2 Z. Li, 2008
Re-Cap of Lecture #3
• Addressing
– If we partition the mem address into blocks, then higher bits correspond to block address, lower bits correspond to word locations within the block
– Example,
» 8 bit address space gives us 2^8 = 256 word addresses
» If we group 4 words into a block, then we have 2^6 = 64 blocks.
» Word address: 01101001 (69h) -> block address 011010 (1Ah), word 01 within the block
» Conversion between hex and binary: group binary in 4 bit blocks, each 4 bit block corresponds to one hex digit

Re-Cap of Lecture #3
• Direct Cache Mapping
– each mem block has a fixed cache line location, and each cache line is mapped to fixed locations in memory, e.g.
» Tag (s-r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits
– We have 2^14 cache lines, but 2^22 mem blocks.
– Cache hit/miss ? Check Tag, if mem request tag does not match that in cache line, a miss.
– Pros/Cons: Simple to implement, but not flexible.
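The addressing example and the direct-mapping field layout can be checked with a few shifts and masks. This is a minimal sketch, assuming the 24-bit layout quoted on the slide (8-bit tag, 14-bit line, 2-bit word); the function names are made up for illustration.

```python
def split_block_word(addr, word_bits=2):
    """Split a word address into (block address, word offset within block)."""
    return addr >> word_bits, addr & ((1 << word_bits) - 1)


def split_direct(addr, line_bits=14, word_bits=2):
    """Split an address into (tag, line, word) for a direct-mapped cache."""
    word = addr & ((1 << word_bits) - 1)
    line = (addr >> word_bits) & ((1 << line_bits) - 1)
    tag = addr >> (word_bits + line_bits)
    return tag, line, word


# 8-bit example from the slide: 69h = 01101001b
block, word = split_block_word(0x69)
print(f"block={block:06b} ({block:X}h), word={word:02b}")

# 24-bit direct-mapped example
print(split_direct(0x010004))
```

Two addresses with the same line field but different tags compete for the same cache line, which is why direct mapping is simple but not flexible.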
Re-Cap of Lecture #3
• Associative Cache Mapping
– each mem block can reside in any cache line, e.g.
» Tag: 22 bits | Word: 2 bits
– We have 2^14 cache lines, but 2^22 mem blocks.
– Cache hit/miss ? Check Tag with all cache line tags, if requested mem block tag does not exist, a miss.
– Pros/Cons: flexible, can support complex cache replacement algorithms, but expensive to implement (comparing all cache lines’ tags)

Re-Cap of Lecture #3
• Set Associative Cache Mapping:
– A compromise between direct and associative mapping
– Cache line addressable by cache set
– Each cache set contains k cache lines, called k-way set associative cache.
– Mem address mapped to tag, cache set address, and word addr.
» Tag: 9 bits | Cache Set: 13 bits | Word: 2 bits
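The set-associative hit/miss check can be sketched as follows. The field widths (9-bit tag, 13-bit set, 2-bit word) come from the slide; the dictionary-of-lists model of the cache and the function names are assumptions made for illustration only.

```python
def split_set_assoc(addr, set_bits=13, word_bits=2):
    """Split an address into (tag, set index, word) for a set-associative cache."""
    word = addr & ((1 << word_bits) - 1)
    set_index = (addr >> word_bits) & ((1 << set_bits) - 1)
    tag = addr >> (word_bits + set_bits)
    return tag, set_index, word


def is_hit(cache_sets, addr):
    """cache_sets maps a set index to the list of tags held in that set's k lines."""
    tag, set_index, _ = split_set_assoc(addr)
    # Only the k tags of one set are compared, not every line in the cache.
    return tag in cache_sets.get(set_index, [])


cache_sets = {1: [2, 7]}           # a 2-way set holding tags 2 and 7
print(is_hit(cache_sets, 0x010004))
```

Compared with fully associative mapping, only k tag comparisons are needed per access, which is the compromise the slide refers to.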
Cache Performance (spill over from lec #3)

Cache Replacement algorithms
• When there’s a cache miss, a new memory block is loaded into the cache, and we need to replace cache content
– If direct mapping, don’t have a choice, the new block has a fixed location in cache
– If set associative mapping, need to choose which line in a set to replace
– In associative mapping, more choices, larger space to choose from.
• Typically hardware implemented, no CPU involvement.
Replacement algorithms
• Algorithms used
– Least Recently used (LRU)
» e.g. in 2 way set associative cache, which of the 2 blocks is LRU?
• First in first out (FIFO)
– replace block that has been in cache longest
• Least frequently used
– replace block which has had fewest hits
• Random
– Generate a random number to determine which one to replace

Write Policy
• Memory data consistency issue – no free lunch theorem.
– When replacing a cache line, if cache data changed, before it is replaced, need to write back to corresponding memory location
– When IO modified memory word via DMA, cache word becomes invalid, need to reload into cache
– Multi-core CPU with its own cache: cache word invalid if changed by one of the CPUs
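As an illustration of the LRU policy listed under Replacement algorithms, here is a minimal software sketch of one k-way cache set. Real caches do this in hardware; the `LRUSet` class and its use of `OrderedDict` are assumptions chosen only to make the policy easy to trace.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with `ways` lines, replaced in least-recently-used order."""

    def __init__(self, ways=2):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag):
        """Return True on a hit, False on a miss (loading the block on a miss)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used tag
        self.lines[tag] = None
        return False


s = LRUSet(ways=2)
print([s.access(t) for t in [1, 2, 1, 3, 2]])
```

For FIFO, the only change would be to skip the `move_to_end` call on a hit, so that age in the cache rather than recency decides the victim.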
Write through
• All writes go to main memory as well as cache
• Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date
• Problem:
– Many writes to memory
– Lots of traffic on bus

Write back
• Purpose is to minimize write operations on BUS
• When a cache line is updated, a bit is set to indicate that
• At the time of cache line replacement, only write to memory those lines updated.
– Average cache update is 15%, but for vector computing, 33%, matrix transposition, 50%.
– Write involves a line instead of a word, so only if a cache word gets written multiple times before replacement, can make it profitable
Example
• Memory write is 32 bit, takes 30ns
• Cache line is 16 bytes, 128 bits
• Average word writes per replacement is 12 times
• How much BUS time will write back save compared with write thru ?
• Solutions
– Write thru: 12 x 30 = 360ns / replacement cycle
– Write back: (128/32) x 30 = 120ns / replacement cycle

Cache Performance
• Cost per bit for a two level cache system
– C1: cost for cache per bit
– C2: cost of mem per bit
– S1: cache size
– S2: mem size
• What is the average cost per bit ?
$$\frac{C_1 S_1 + C_2 S_2}{S_1 + S_2}$$
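The write-thru versus write-back example can be reproduced with a short calculation. This sketch assumes a 16-byte line is 16 x 8 = 128 bits and that each 32-bit bus write costs 30 ns; the function names are hypothetical.

```python
def write_through_ns(word_writes, word_write_ns=30):
    """Every word write goes to memory, so bus time grows with the write count."""
    return word_writes * word_write_ns


def write_back_ns(line_bytes=16, bus_bits=32, word_write_ns=30):
    """The whole line is written once at replacement, one bus-width chunk at a time."""
    line_bits = line_bytes * 8
    transfers = -(-line_bits // bus_bits)   # ceiling division
    return transfers * word_write_ns


print(write_through_ns(12), "ns per replacement cycle (write thru)")
print(write_back_ns(), "ns per replacement cycle (write back)")
```

Under these assumptions write back wins whenever a line receives more than 4 word writes before it is replaced, since 4 bus transfers are needed to write the line out.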
Cache Performance - Cost

Cache Performance – Access
• Consider the following 2 level system:
– Cache hit ratio is h, i.e., the probability that an accessed memory word is in the cache
– Time to access a word in L1 and L2 cache: T1, T2.
• What is the average word access time ?
$$T_s = h\,T_1 + (1-h)(T_1 + T_2) \;\Rightarrow\; \frac{T_1}{T_s} = \frac{1}{1 + (1-h)\dfrac{T_2}{T_1}}$$
– We want T1/Ts to be close to 1.0
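The access-time formula can be evaluated for a few hit ratios to see how quickly T1/Ts approaches 1.0. This is a sketch with made-up timings (T1 = 1, T2 = 10 in arbitrary units), not measurements from any real system.

```python
def average_access_time(h, t1, t2):
    """Ts = h*T1 + (1-h)*(T1+T2): a miss pays for both levels."""
    return h * t1 + (1 - h) * (t1 + t2)


def access_efficiency(h, t1, t2):
    """T1/Ts, which should be close to 1.0 for a well-behaved cache."""
    return t1 / average_access_time(h, t1, t2)


for h in (0.5, 0.9, 0.95, 0.99):
    ts = average_access_time(h, 1.0, 10.0)
    print(f"h={h:.2f}  Ts={ts:.2f}  T1/Ts={access_efficiency(h, 1.0, 10.0):.3f}")
```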
Cache access as function of hit ratio

Hit ratio vs data access locality
• Different programs have different access locality characteristics
• How does the cache size affect the hit ratio ?
– If there is no locality, the hit ratio is proportional to the S1/S2 ratio
Re-Cap of Lecture #3
• Cache System Performance:
– What are the cache replacement algorithms ?
» LRU, FIFO, LFU, Random (mostly informational)
– What is the difference between write back and write thru ?
– When will write back be better than write thru ?
– What is the cost per bit of a k-level cache system ?
– What is the average access time for a k-level cache system ?

Memory and Disk System
Semiconductor Memory Types
• RAM, ROM, EPROM, EEPROM, Flash memory

RAM
• RAM
– Probably the most important type for computers
– Misnamed, as all semiconductor memory is random access
– Supports multiple reads/writes
– Volatile: needs power to keep its data, provides temporary storage
– Can be Static or Dynamic, will discuss in more detail later
Memory Cell Operation (conceptually)
• Mem cell needs to be selected by address line
• When write, the state of mem cell is changed
• When read, just sensing.

Dynamic RAM Structure
• Simple, bits stored as charge in capacitors,
– Uses only 1 transistor and 1 capacitor
• Charges leak, need refreshing even when powered
• Recharge cycles make it slow
Transistor Operation
• When there’s no voltage on address line, the transistor is disconnected
• Use addr line to switch on/off

DRAM Operation
• Requires some Physics background to understand
– Will explain intuitively, don’t panic ☺
• Address line active when bit read or written
– Addr line controls the current flow on the line
– If no voltage on addr line, bit line and capacitor not connected
• Write
– Voltage to bit line
» High for 1 low for 0
– Then signal address line
» Transfers charge to capacitor
DRAM Operation
• Read
– Address line selected
» transistor turns on
– Charge from capacitor fed via bit line to sense amplifier
» Compares with reference value to determine 0 or 1
– Capacitor charge must be restored

Static RAM Structure
• Bits stored as voltages on bit line B and B complement
– S-R latch, will cover later in digital logic part.
• No charges to leak, no refreshing needed when powered
• More complex construction
– 6 transistors to implement
Static RAM
• More Complex Implementations
– Requires more transistors
• More expensive
• Does not need refresh circuits, so
– Operates faster
– Can be used as cache

Static RAM Operation
• Transistor arrangement gives stable logic state
• Address line transistors T5 T6 are switches
• State 1
– C1 high, C2 low
– T1 T4 off, T2 T3 on
• State 0
– C2 high, C1 low
– T2 T3 off, T1 T4 on
• Write – apply value to B & its complement to B complement
• Read – value is on line B
SRAM & DRAM Summary
• Both volatile
– Power needed to preserve data
• Dynamic cell
– Simpler to build, less expensive,
– smaller and denser : more bits per silicon area
– Needs refresh circuits
– Used as Main Mem.
• Static
– Faster
– More expensive
– Used as Cache

Read Only Memory (ROM)
• Permanent storage
– Nonvolatile, does not require power
• Typically used to store
– Microprogramming (see later)
– Library subroutines
– Systems programs (BIOS)
– Function tables

Types of ROM
• ROM
– Custom built,
– Hardwired, very expensive for small runs
• PROM - Programmable (once)
– Needs special equipment to program

Types of ROM
• Read “mostly”
– Erasable Programmable (EPROM) - optical
» Erased by UV, very slow, takes e.g. 20 min
– Electrically Erasable (EEPROM)
» Takes much longer to write than read, but faster than EPROM
» More expensive than EPROM
» Erase whole memory electrically
– Flash memory: in between EPROM & EEPROM in cost
» Erase electrically,
» Can only erase blocks of mem
» Less expensive than EEPROM.
Organization of Memory

Organisation of DRAM
• We can have memory chips of 2^W words and k bits for each word, for a total of k*2^W bits.
– E.g. 4M words of 4 bits
• A 16Mbit chip can be organised as a 2048 x 2048 x 4bit array, k=4, W=22.
– Address by column and row selection, 11 bits each
– Reduces number of address pins
» Multiplex row address and column address
» 11 pins to column/row address (2^11=2048),
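The row/column multiplexing for the 4M x 4 bit chip can be sketched as below. Which half of the 22-bit word address is sent as the row and which as the column is an assumption made here for illustration (row = upper 11 bits); the function name is hypothetical.

```python
ROW_BITS = 11
COL_BITS = 11
BITS_PER_WORD = 4


def split_row_col(word_addr):
    """Split a 22-bit word address into the two 11-bit values sent over A0~A10."""
    row = word_addr >> COL_BITS               # assumed: upper bits select the row
    col = word_addr & ((1 << COL_BITS) - 1)   # assumed: lower bits select the column
    return row, col


total_bits = (1 << (ROW_BITS + COL_BITS)) * BITS_PER_WORD
print(total_bits // (1 << 20), "Mbit")        # capacity of the chip
print(split_row_col(0x3FFFFF))                # last word: last row, last column
```

Sending the address in two halves is what lets 11 address pins reach all 2^22 words, at the cost of two address phases per access.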
Typical 16 Mb DRAM (4M x 4bits) Logic
[Figure: chip logic diagram, with address lines, data lines, and refresh circuitry labelled]

16Mbit (4M x 4bit) Packaging
• A0~A10: address line row/col selection
• RAS: row selection
• CAS: col selection
• WE: write enable
• OE: output enable
• Vcc: power supply
• Vss: ground
• CE: chip enable
Advanced DRAM Organization
• Synch RAM
– DRAM is async, mem access needs to wait
– SDRAM syncs with system clock
• RAM Bus
– Not using RAS, CAS, R/W enable and CE typical in DRAM
– Request via an asynchronous block request
– Communicated over a Bus

Synchronous DRAM (SDRAM)
• Access is synchronized with an external clock
• Address is presented to RAM
• RAM finds data (CPU waits in conventional DRAM)
• Since SDRAM moves data in time with system clock, CPU knows when data will be ready
• CPU does not have to wait, it can do something else
• Burst mode allows SDRAM to set up stream of data and fire it out in block
• DDR-SDRAM sends data twice per clock cycle
IBM 64Mb Sync DRAM Block Diagram
[Figure: block diagram, with 8-bit data lines and address lines labelled]

RAMBUS Diagram
• Asynchronous block protocol
– 480ns access time
– 16 data lines, address up to 320 DRAM chips.
RAMBUS
• Adopted by Intel for Pentium & Itanium
• Main competitor to Sync DRAM
• Vertical package – all pins on one side
• Asynchronous block protocol
– 480ns access time
– Then 1.6 Gbps

DDR SDRAM
• SDRAM can only send data once per clock
• Double-data-rate (DDR) SDRAM can send data twice per clock cycle
Mitsubishi Cache DRAM
• Integrates small Static RAM cache (16 kb) onto generic DRAM chip
• Used as true cache
– 64-bit lines
– Effective for ordinary random access
• To support serial access of block of data
– E.g. refresh bit-mapped screen
» Cache DRAM can pre-fetch data from DRAM into SRAM buffer
» Subsequent accesses solely to SRAM

Summary of RAMs
• Dynamic RAM
– Analog technology, use capacitor voltage to indicate bit
– Need refresh, slow
– Easier to implement, denser solution
• Static RAM
– Use transistor state to store bit, need more transistors per bit
– No need to refresh, fast
– More expensive
Summary of RAMs
• DRAM organization
– DRAM:
» Widely used
» 2D bit array, addressed by row and column address lines
» Have refresh circuits
– Advanced DRAM organization
» Sync DRAM: sync with system clock operation, no wait
» RAM BUS: block transfer protocol, bus implementation.
» DDR Sync DRAM, aka DDR SDRAM, double the data access rate of SDRAM
» Cache DRAM: local static RAM cache.

External Memory (Disk) System
Types of External Memory
• Magnetic Disk (Hard Disk)
– RAID
– Removable
• Optical Disk
– CD-ROM
– CD-Recordable (CD-R)
– CD-R/W
– DVD

Physics of Magnetic Disk
• Disk substrate coated with magnetizable material (iron oxide…rust)
• Substrate, or body of the disk can be glass, steel, aluminium.
• Operates by magnetizing the elements on disk surface
[Figure: Inductive Write, MR Read. Labels: write current, direction change, N-S pattern, magnetic patterns; moving over a magnetic field generates current]

Read and Write Mechanisms
• Recording & retrieval via conductive coil called a head
• May be single read/write head or separate ones
• During read/write, head is stationary, disk rotates
Write Mechanisms
• Write
– Current through coil produces magnetic field
– Pulses sent to head
– Magnetic pattern recorded on surface below

Read Mechanisms
• Read (traditional)
– Magnetic field moving relative to coil produces current,
– The same physics as write
– So use the same head for read and write
• Read (contemporary)
– Separate read head, close to write head
– Partially shielded magneto-resistive (MR) sensor
– Electrical resistance depends on direction of magnetic field, so the polarization patterns can be read as different voltage values.
Disk Data Organization and Layout
• Concentric rings or tracks
– Gaps between tracks
– Reduce gap to increase capacity
– Same number of bits per track (variable packing density)
– Constant angular velocity
• Tracks divided into sectors
• Minimum block size is one sector
• May have more than one sector per block

Disk Layout Methods Diagram

Disk Velocity
• Constant angular velocity (CAV)
– Gives pie shaped sectors and concentric tracks
– Individual tracks and sectors addressable
– Move head to given track and wait for given sector
– Waste of space on outer tracks
» Lower data density
• Multi-zone recording:
– Each zone has fixed bits per track
– More complex circuitry

Disk Characteristics
Fixed/Movable Head Disk
• Fixed head
– One read write head per track
– Heads mounted on fixed rigid arm
• Movable head
– One read write head per side
– Mounted on a movable arm

Removable or Not
• Removable disk
– Can be removed from drive and replaced with another disk
– Provides unlimited storage capacity
– Easy data transfer between systems
• Nonremovable disk
– Permanently mounted in the drive
Multiple Platters
• One head per side
• Heads are joined and aligned
• Aligned tracks on each platter form cylinders
• Data is striped by cylinder
– reduces head movement
– Increases speed (transfer rate)

Disk Performance - Speed
• Track-Track Seek time
– Moving head to correct track
• (Rotational) latency
– Waiting for data to rotate under head, related to rpm: 1/(2*rpm), i.e. half a revolution on average (in minutes when rpm is revolutions per minute; multiply by 60 for seconds)
• Access time = Seek + Latency
• Transfer time:
– how fast data can be read from / written to disk
– T = b / (rpm*N): b, bytes to be transferred, rpm, rotation speed, N: bytes per track.
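The seek, latency, and transfer terms above can be combined in a small calculator. This is a sketch with made-up drive parameters (6000 rpm, 100,000 bytes per track, 4 ms seek), chosen so the arithmetic is easy to follow; note the factor of 60 that converts the per-minute rotation speed into seconds.

```python
def rotational_latency_s(rpm):
    """Average wait of half a revolution: 60 / (2 * rpm) seconds."""
    return 60.0 / (2.0 * rpm)


def transfer_time_s(num_bytes, rpm, bytes_per_track):
    """b / (rpm * N) minutes, converted to seconds."""
    return 60.0 * num_bytes / (rpm * bytes_per_track)


def total_time_s(seek_s, num_bytes, rpm, bytes_per_track):
    """Seek + rotational latency + transfer time."""
    return (seek_s
            + rotational_latency_s(rpm)
            + transfer_time_s(num_bytes, rpm, bytes_per_track))


# Hypothetical drive: 6000 rpm means 10 ms per revolution.
t = total_time_s(seek_s=0.004, num_bytes=10_000, rpm=6000, bytes_per_track=100_000)
print(f"{t * 1000:.1f} ms")
```

In this example seek and latency together account for 9 of the 10 ms, which is typical of small transfers: the mechanical positioning dominates, not the data rate.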
Disk I/O Performance Factors
• Total time:
– T = Tseek + 1/ (2*rpm) + b / (rpm*N)
– Disk spin speed: rpm
– Disk data density: N bytes per track
– Tseek: how fast to locate a track.

RAID
• RAID = Redundant Array of Independent Disks
• Set of physical disks viewed as single logical drive by O/S
• Data distributed across physical drives
• Can use redundant capacity to store parity information for error correction
• RAID0~6 for different levels of redundancy
RAID
• RAID 0:
– No redundancy, 1-1 match between logic and physical disks
• RAID 1:
– Mirrored disk, 1-2 match between logic and physical disks
• RAID 2:
– Error correction coded, 4 logical disks mapped to 7 physical disks via Hamming coding
• RAID 3~6:
– More complex system

RAID 0, 1, 2
[Figure: RAID levels 0, 1, and 2]
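The 4-to-7 mapping mentioned for RAID 2 matches the classic Hamming(7,4) code: 4 data bits plus 3 parity bits, able to correct any single-bit error. The sketch below encodes one bit per "disk" and repairs one corrupted bit; it illustrates the coding idea only and is not a model of how a real RAID controller lays out data.

```python
def hamming74_encode(d):
    """d = [d1, d2, d3, d4] -> 7-bit codeword in positions 1..7 (p1 p2 d1 p3 d2 d3 d4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3   # 0 means no error, else 1-based position
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1                         # simulate one failed "disk" bit
print(hamming74_decode(stored) == word)
```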
Optical Disk and Magnetic Tapes

CD Operation
• Reflects light differently at pits and lands
Optical Storage CD-ROM
• Originally for audio
• 650Mbytes giving over 70 minutes audio
• Polycarbonate coated with highly reflective coat, usually aluminium
• Data stored as pits
• Read by reflecting laser
• Constant packing density
• Constant linear velocity, variable rotation velocity

How about Random Access on CD-ROM ?
• Difficult
• Move head to rough position
• Set correct speed
• Read address
• Adjust to required location
• (Yawn!)
Other Optical Storage
• CD-Recordable (CD-R)
– WORM
– Now affordable
– Compatible with CD-ROM drives
• CD-RW
– Erasable
– Getting cheaper
– Mostly CD-ROM drive compatible

DVD - what’s in a name?
• Digital Video Disk
– Used to indicate a player for movies
» Only plays video disks
• Digital Versatile Disk
– Used to indicate a computer drive
» Will read computer disks and play video disks
• Dogs Veritable Dinner
• Officially - nothing!!!
DVD - technology
• Multi-layer
• Very high capacity (4.7G per layer)
• Full length movie on single disk
– Using MPEG compression
• Finally standardized (honest!)
• Movies carry regional coding
• Next generation: Blu-ray.

Magnetic Tape
• Serial access
• Slow
• Very cheap
• Backup and archive
Summary of Lecture #4
• Mostly informational,
– do not need to memorize all those technical details
– But be able to appreciate the underlying physics and technology that makes computers possible
• Internal Memory
– Types: RAM, ROM, EPROM, EEPROM, FlashMem
– Difference between Static and Dynamic RAM
» Capacitor vs transistor state based
» Refresh or not refresh
» Cost

Summary of Lecture #4
• External Memory
– Types: Magnetic Disk, Optical Disk, Magnetic Tape
– What affects the Disk performance ?
» Seek time
» Rotation latency
» Density
Summary of Lecture #4
• Review questions:
– 5.4, 6.7
• Homework #1 question #4:
– Consider a disk system with average track seek time Tseek = 8 ms, rotation speed of 7200 rpm, 512 bytes per sector, and 800 sectors per track. What is the average access time to read 4KB of data ?