
COMP 212 Organization & Architecture
COMP 212 Fall 2008
Lecture 4
Memory and Disk Systems

Re-Cap of Lecture #3
• Memory system is a compromise between
– More memory system capacity
– Faster access speed
– Cost
• Memory System is Hierarchical
– Speed:
» Registers > Cache > RAM > Hard Disk > Optical Storages
– Cost: other way around

Comp 212 Computer Org & Arch 1 Z. Li, 2008 Comp 212 Computer Org & Arch 2 Z. Li, 2008

Re-Cap of Lecture #3
• Addressing
– If we partition the mem address into blocks, then the higher bits correspond to the block address and the lower bits correspond to word locations within the block
– Example:
» An 8 bit address space gives us 2^8 = 256 word addresses
» If we group 4 words into a block, then we have 2^6 = 64 blocks.
» Word address: 01101001 (69h) -> block address 011010 (1Ah), word 01
» Conversion between hex and binary: group binary into 4 bit blocks, each 4 bit block corresponds to one hex digit

Re-Cap of Lecture #3
• Direct Cache Mapping
– each mem block has a fixed cache line location, and each cache line is mapped to fixed locations in memory, e.g.
» Tag (s-r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits
– We have 2^14 cache lines, but 2^22 mem blocks.
– Cache hit/miss ? Check Tag, if the mem request tag does not match that in the cache line, a miss.
– Pros/Cons: Simple to implement, but not flexible.
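The address partitioning above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the helper name `split_address` is hypothetical, and the 24-bit address 16339Ch is an arbitrary sample value. It splits an address into fields given their bit widths, first for the 8-bit example (6-bit block, 2-bit word) and then for the direct-mapped layout (8-bit tag, 14-bit line, 2-bit word).

```python
def split_address(addr, field_widths):
    """Split addr into fields, most significant field first.

    field_widths lists the bit width of each field from high bits to low bits.
    """
    fields = []
    shift = sum(field_widths)
    for width in field_widths:
        shift -= width
        fields.append((addr >> shift) & ((1 << width) - 1))
    return tuple(fields)


# 8-bit word address, 4 words per block: 6-bit block address + 2-bit word offset.
block, word = split_address(0b01101001, [6, 2])
print(f"block={block:06b} ({block:X}h) word={word:02b}")

# 24-bit address under direct mapping: 8-bit tag, 14-bit line, 2-bit word.
# The address value is a sample chosen for illustration only.
tag, line, word = split_address(0x16339C, [8, 14, 2])
print(f"tag={tag:02X}h line={line:04X}h word={word}")
```

The same helper covers the other mapping schemes by changing the width list, for example `[22, 2]` for a 22-bit tag and 2-bit word.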

Re-Cap of Lecture #3
• Associative Cache Mapping
– each mem block can reside in any cache line, e.g.
» Tag: 22 bit | Word: 2 bit
– We have 2^14 cache lines, but 2^22 mem blocks.
– Cache hit/miss ? Check Tag with all cache line tags, if the requested mem block tag does not exist, a miss.
– Pros/Cons: flexible, can support complex cache replacement algorithms, but expensive to implement (comparing all cache lines' tags)

Re-Cap of Lecture #3
• Set Associative Cache Mapping:
– A compromise between direct and associative mapping
– Cache line addressable by cache set
– Each cache set contains k cache lines, called k-way set associative cache.
– Mem address mapped to tag, cache set address, and word addr.
» Tag: 9 bit | Cache Set: 13 bit | Word: 2 bit


Cache Performance (spill over from lec #3)

Cache Replacement algorithms
• When there's a cache miss, a new memory block is loaded into the cache, so we need to replace cache content
– If direct mapping, we don't have a choice, the new block has a fixed location in cache
– If set associative mapping, need to choose which line in a set to replace
– In associative mapping, more choices, larger space to choose from.
• Typically hardware implemented, no CPU involvement.

Replacement algorithms
• Algorithms used
– Least Recently used (LRU)
» e.g. in 2 way set associative cache, which of the 2 blocks is LRU?
• First in first out (FIFO)
– replace block that has been in cache longest
• Least frequently used
– replace block which has had fewest hits
• Random
– Generate a random number to determine which one to replace

Write Policy
• Memory consistency issue – no free lunch theorem.
– When replacing a cache line, if the cache data changed, before it is replaced, need to write back to the corresponding memory location
– When IO modifies a memory word via DMA, the cache word becomes invalid, need to reload into cache
– Multi-core CPU with its own cache: cache word invalid if changed by one of the CPUs
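As a rough illustration of the LRU policy listed under Replacement algorithms, the sketch below simulates a single k-way cache set in Python. It is a software model only: real caches do this in hardware, and the class name `LRUSet` and the tag sequence are assumptions made for the example.

```python
from collections import OrderedDict


class LRUSet:
    """One k-way cache set with least-recently-used replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # keys are tags, least recently used first

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict the least recently used tag
        self.lines[tag] = None
        return False


# 2-way set, hypothetical sequence of block tags mapping to the same set.
cache_set = LRUSet(ways=2)
hits = [cache_set.access(tag) for tag in [1, 2, 1, 3, 2, 1]]
print(hits)
```

Swapping `move_to_end` out (so that a hit does not reorder the lines) turns the same model into FIFO replacement.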


Write through
• All writes go to main memory as well as cache
• Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date
• Problem:
– Many writes to memory
– Lots of traffic on bus

Write back
• Purpose is to minimize write operations on BUS
• When a cache line is updated, a bit is set to indicate that
• At the time of cache line replacement, only write to memory those lines updated.
– Average cache update is 15%, but for vector computing, 33%, matrix transposition, 50%.
– Write involves a line instead of a word, so it is only profitable if a cache word gets written multiple times before replacement

Example
• Memory write is 32 bit, takes 30ns
• Cache line is 16 bytes, 128 bits
• Average word writes per replacement is 12 times
• How much BUS time will write back save compared with write thru ?
• Solutions
– Write thru: 12 x 30 = 360ns / replacement cycle
– Write back: (128/32) x 30 = 120ns / replacement cycle

Cache Performance
• Cost per bit for a two level cache system
– C1: cost for cache per bit
– C2: cost of mem per bit
– S1: cache size
– S2: mem size
• What is the average cost per bit ?
$$\frac{C_1 S_1 + C_2 S_2}{S_1 + S_2}$$
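The bus-time comparison and the average cost per bit can be checked with a short Python sketch. The function names are hypothetical; the write example assumes a 16-byte line, which is 128 bits, written over a 32-bit bus at 30ns per write, and the cost figures in the last call are made-up units chosen only to exercise the formula.

```python
def write_through_ns(word_writes, write_ns):
    """Bus time when every word write goes straight to memory."""
    return word_writes * write_ns


def write_back_ns(line_bits, bus_bits, write_ns):
    """Bus time to write one whole modified line back at replacement."""
    transfers = -(-line_bits // bus_bits)  # ceiling division
    return transfers * write_ns


def average_cost_per_bit(c1, s1, c2, s2):
    """(C1*S1 + C2*S2) / (S1 + S2) for a two level system."""
    return (c1 * s1 + c2 * s2) / (s1 + s2)


through = write_through_ns(word_writes=12, write_ns=30)
back = write_back_ns(line_bits=16 * 8, bus_bits=32, write_ns=30)
print(f"write thru: {through} ns, write back: {back} ns per replacement cycle")

# Illustrative cost units only: cache 10x the per-bit cost of memory, 1/9 the size.
print(average_cost_per_bit(c1=10, s1=1, c2=1, s2=9))
```

With these assumptions write back wins as soon as a line receives more word writes than it takes bus transfers to flush it.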


Cache Performance - Cost

Cache Performance – Access
• Consider the following 2 level system:
– Cache hit ratio is h, i.e., the prob that a memory word access is in cache
– Time to access a word in L1 and L2 cache: T1, T2.
• What is the average word access time ?
$$T_s = h\,T_1 + (1-h)(T_1 + T_2) \;\Rightarrow\; \frac{T_1}{T_s} = \frac{1}{1 + (1-h)\dfrac{T_2}{T_1}}$$
– We want T1/Ts to be close to 1.0
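A small Python sketch shows how the average access time and the ratio T1/Ts behave as the hit ratio h changes. The function names are hypothetical and the timings T1 = 10ns and T2 = 100ns are assumed values for illustration, not figures from the lecture.

```python
def average_access_time(h, t1, t2):
    """Ts = h*T1 + (1 - h)*(T1 + T2)."""
    return h * t1 + (1 - h) * (t1 + t2)


def access_efficiency(h, t1, t2):
    """T1/Ts = 1 / (1 + (1 - h) * T2/T1)."""
    return 1 / (1 + (1 - h) * t2 / t1)


T1, T2 = 10.0, 100.0  # assumed access times in ns
for h in (0.5, 0.9, 0.95, 0.99):
    ts = average_access_time(h, T1, T2)
    eff = access_efficiency(h, T1, T2)
    print(f"h={h:.2f}  Ts={ts:6.2f} ns  T1/Ts={eff:.3f}")
```

The two functions are algebraically the same statement, so `T1 / average_access_time(h, T1, T2)` should agree with `access_efficiency(h, T1, T2)` for any h.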

Cache access as function of hit ratio

Hit ratio vs data access locality
• Different programs have different access locality characteristics
• How does the cache size affect the hit ratio ?
– If there is no locality, the hit ratio is simply proportional to the S1/S2 ratio


Re-Cap of Lecture #3

• Cache System Performance:

– What are the cache replacement algorithms ?
» LRU, FIFO, LFU, Random (mostly informational)
– What is the difference between write back and write thru ?
– When will write back be better than write thru ?
– What is the cost per bit of a k-level cache system ?
– What is the average access time for a k-level cache system ?

Memory and Disk System

Types
• RAM, ROM, EPROM, EEPROM, FlashMem

RAM
• RAM
– Probably the most important type for computers
– Misnamed, as all semiconductor memory is random access
– Supports multiple read/write
– Volatile – needs refresh, provides temporary storage
– Can be Static or Dynamic, will discuss in more detail later


Memory Cell Operation (conceptually)
• Mem cell needs to be selected by address line
• When write, the state of mem cell is changed
• When read, just sensing.

Dynamic RAM Structure
• Simple, bits stored as charge in capacitors,
– Uses only 1 transistor and 1 capacitor
• Charges leak, need refreshing even when powered
• Recharge cycles make it slow

Transistor Operation
• When there's no voltage on the address line, the transistor is disconnected
• Use addr line to switch on/off

DRAM Operation
• Requires some Physics background to understand
– Will explain intuitively, don't panic ☺
• Address line active when bit read or written
– Addr line controls the current flow on the line
– If no voltage on addr line, bit line and capacitor not connected
• Write
– Voltage to bit line
» High for 1 low for 0
– Then signal address line
» Transfers charge to capacitor


DRAM Operation
• Read
– Address line selected
» transistor turns on
– Charge from capacitor fed via bit line to sense amplifier
» Compares with reference value to determine 0 or 1
– Capacitor charge must be restored

Static RAM Structure
• Bits stored as voltages on bit line B and B complement
– S-R latch, will cover later in digital logic part.
• No charges to leak, no refreshing needed when powered
• More complex construction
– 6 transistors to implement

Static RAM
• More Complex Implementations
– Requires more transistors
• More expensive
• Does not need refresh circuits, so
– Operates faster
– Can be used as cache

Static RAM Operation
• Transistor arrangement gives stable logic state
• Address line transistors T5 T6 are switches
• State 1
– C1 high, C2 low
– T1 T4 off, T2 T3 on
• State 0
– C2 high, C1 low
– T2 T3 off, T1 T4 on
• Write – apply value to B & complement to B complement
• Read – value is on line B


SRAM & DRAM Summary
• Both volatile
– Power needed to preserve data
• Dynamic cell
– Simpler to build, less expensive,
– smaller and denser: more bits per silicon area
– Needs refresh circuits
– Used as Main Mem.
• Static
– Faster
– More expensive
– Used as Cache

Read Only Memory (ROM)
• Permanent storage
– Nonvolatile, does not require power
• Typically used to store
– Microprogramming (see later)
– Library subroutines
– Systems programs (BIOS)
– Function tables

Types of ROM

• ROM
– Customer built,
– Hardwired, very expensive for small runs
• PROM - Programmable (once)
– Needs special equipment to program
• Read "mostly"
– Erasable Programmable (EPROM) - optical
» Erased by UV, very slow, takes 20 min
– Electrically Erasable (EEPROM)
» Takes much longer to write than read, but faster than EPROM
» More expensive than EPROM
» Erase whole memory electrically
– Flash memory: in between EPROM & EEPROM in cost
» Erase electrically,
» Can only erase blocks of mem
» Less expensive than EEPROM.


Organization of Memory

Organisation of DRAM
• We can have memory chips of 2^W words and k bits for each word, for a total of k*2^W bits.
– E.g. 4M words of 4 bits
• A 16Mbit chip can be organised as a 2048 x 2048 x 4bit array, k=4, W=22.
– Address by column and row selection, 11 bits each
– Reduces number of address pins
» Multiplex row address and column address
» 11 pins for column/row address (2^11 = 2048)
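The row/column multiplexing for the 4M x 4bit chip can be sketched as follows. This is an illustrative Python model only; the function name `multiplex_address` is hypothetical, and treating the high 11 bits as the row and the low 11 bits as the column is an assumption (the slides do not say which half is which).

```python
ROW_BITS = 11  # 2^11 = 2048 rows
COL_BITS = 11  # 2^11 = 2048 columns
BITS_PER_WORD = 4


def multiplex_address(word_addr):
    """Split a 22-bit word address into (row, col), sent over the same 11 pins.

    Assumption: the high half is presented first as the row (with RAS),
    the low half second as the column (with CAS).
    """
    if not 0 <= word_addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range for a 4M-word chip")
    row = word_addr >> COL_BITS
    col = word_addr & ((1 << COL_BITS) - 1)
    return row, col


total_bits = (1 << ROW_BITS) * (1 << COL_BITS) * BITS_PER_WORD
print(f"capacity: {total_bits // 2**20} Mbit")
print(multiplex_address(0x2ABCDE))
```

Multiplexing halves the address pin count (11 instead of 22) at the price of presenting the address in two steps.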

Typical 16 Mb DRAM (4M x 4bits) Logic / 16Mbit (4M x 4bit) Packaging
[Figures: logic block diagram and chip packaging; labels: Address lines, Data lines, Refresh]
• A0~A10: address line row/col selection
• RAS: row selection
• CAS: col selection
• WE: write enable
• OE: output enable
• Vcc: power supply
• Vss: ground
• CE: chip enable

Advanced DRAM Organization
• Synch RAM
– DRAM is async, mem access needs to wait
– SDRAM syncs with system clock
• RAM Bus
– Not using RAS, CAS, R/W enable and CE typical in DRAM
– Request via an asynchronous block request
– Communicated over a Bus

Synchronous DRAM (SDRAM)
• Access is synchronized with an external clock
• Address is presented to RAM
• RAM finds data (CPU waits in conventional DRAM)
• Since SDRAM moves data in time with system clock, CPU knows when data will be ready
• CPU does not have to wait, it can do something else
• Burst mode allows SDRAM to set up stream of data and fire it out in block
• DDR-SDRAM sends data twice per clock cycle

IBM 64Mb Sync DRAM Block Diagram
[Figure: block diagram; labels: 8-bit Data lines, Address lines]

RAMBUS Diagram
• Asynchronous block protocol
– 480ns access time
– 16 data lines, address up to 320 DRAM chips.


RAMBUS
• Adopted by Intel for Pentium & Itanium
• Main competitor to Sync DRAM
• Vertical package – all pins on one side
• Asynchronous block protocol
– 480ns access time
– Then 1.6 Gbps

DDR SDRAM
• SDRAM can only send data once per clock
• Double-data-rate (DDR) SDRAM can send data twice per clock cycle

Mitsubishi Cache DRAM
• Integrates small Static RAM cache (16 kb) onto generic DRAM chip
• Used as true cache
– 64-bit lines
– Effective for ordinary random access
• To support serial access of block of data
– E.g. refresh bit-mapped screen
» Cache DRAM can pre-fetch data from DRAM into SRAM buffer
» Subsequent accesses solely to SRAM

Summary of RAMs
• Dynamic RAM
– Analog technology, use capacitor voltage to indicate bit
– Need refresh, slow
– Easier to implement, denser solution
• Static RAM
– Use transistor state to store bit, need more transistors per bit
– No need to refresh, fast
– More expensive


Summary of RAMs

• DRAM organization

– DRAM:

» Widely used

» 2D bit array, addressed by row and column address lines
» Have refresh circuits
– Advanced DRAM organization
» Sync DRAM: sync with system clock operation, no wait
» RAM BUS: block transfer protocol, bus implementation.
» DDR Sync DRAM, aka DDR SDRAM: double the data transfer rate of SDRAM
» Cache DRAM: local static RAM cache.

External Memory (Disk) System

Types of External Memory
• Magnetic Disk (Hard Disk)
– RAID
– Removable
• Optical Disk
– CD-ROM
– CD-Recordable (CD-R)
– CD-R/W
– DVD

Physics of Magnetic Disk
• Disk substrate coated with magnetizable material (iron oxide…rust)
• Substrate, or body of the disk can be glass, steel, aluminium.
• Operates by magnetizing the elements on disk surface


Inductive Write MR Read
[Figure: a change of write current direction sets the N-S magnetic patterns; moving over the magnetic field generates current]

Read and Write Mechanisms
• Recording & retrieval via conductive coil called a head
• May be single read/write head or separate ones
• During read/write, head is stationary, disk rotates

Write Mechanisms
• Write
– Current through coil produces magnetic field
– Pulses sent to head
– Magnetic pattern recorded on surface below

Read Mechanisms
• Read (traditional)
– Magnetic field moving relative to coil produces current,
– The same physics as write
– So use the same head for read and write
• Read (contemporary)

– Separate read head, close to write head

– Partially shielded magneto-resistive (MR) sensor

– Electrical resistance depends on direction of magnetic field, so the polarization patterns can be read as different voltage values.


Disk Data Organization and Layout
• Concentric rings or tracks
– Gaps between tracks
– Reduce gap to increase capacity
– Same number of bits per track (variable packing density)
– Constant angular velocity
• Tracks divided into sectors
• Minimum block size is one sector
• May have more than one sector per block

Disk Layout Methods Diagram

Disk Velocity
• Constant angular velocity (CAV)
– Gives pie shaped sectors and concentric tracks
– Individual tracks and sectors addressable
– Move head to given track and wait for given sector
– Waste of space on outer tracks
» Lower data density
• Multi-zone recording:
– Each zone has fixed bits per track
– More complex circuitry

Disk Characteristics


Fixed/Movable Head Disk
• Fixed head
– One read write head per track
– Heads mounted on fixed rigid arm
• Movable head
– One read write head per side
– Mounted on a movable arm

Removable or Not
• Removable disk
– Can be removed from drive and replaced with another disk
– Provides unlimited storage capacity
– Easy data transfer between systems
• Nonremovable disk
– Permanently mounted in the drive

Multiple Platters
• One head per side
• Heads are joined and aligned
• Aligned tracks on each platter form cylinders
• Data is striped by cylinder
– reduces head movement
– Increases speed (transfer rate)

Disk Performance - Speed
• Track-Track Seek time
– Moving head to correct track
• (Rotational) latency
– Waiting for data to rotate under head, on average half a revolution: 1/(2*rpm), in minutes when rpm is revolutions per minute
• Access time = Seek + Latency
• Transfer time:
– how fast data can be read from / written to disk
– T = b / (rpm*N): b, bytes to be transferred; rpm, rotation speed; N, bytes per track.
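The seek, latency and transfer terms can be combined in a short Python sketch. The function name `disk_read_time_ms` is hypothetical, and the drive parameters in the example (4 ms seek, 15000 rpm, 500 sectors of 512 bytes per track) are assumed values for illustration, not data from the lecture. The main point is the unit conversion: rpm is per minute, so it is converted to revolutions per millisecond before use.

```python
def disk_read_time_ms(seek_ms, rpm, bytes_per_track, nbytes):
    """Return (total, latency, transfer) in milliseconds.

    total    = Tseek + 1/(2*r) + b/(r*N), with r in revolutions per millisecond
    latency  = average rotational delay (half a revolution)
    transfer = time for nbytes to pass under the head
    """
    rev_per_ms = rpm / 60_000  # revolutions per millisecond
    latency_ms = 1 / (2 * rev_per_ms)
    transfer_ms = nbytes / (rev_per_ms * bytes_per_track)
    return seek_ms + latency_ms + transfer_ms, latency_ms, transfer_ms


# Assumed example drive: 4 ms seek, 15000 rpm, 500 sectors x 512 bytes per track.
total, latency, transfer = disk_read_time_ms(
    seek_ms=4.0, rpm=15_000, bytes_per_track=500 * 512, nbytes=4096
)
print(f"latency={latency:.3f} ms transfer={transfer:.3f} ms total={total:.3f} ms")
```

For small reads like this one, seek time and rotational latency dominate; the transfer term only matters for large sequential reads.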


Disk I/O Performance Factors
• Total time:
– T = Tseek + 1/(2*rpm) + b/(rpm*N)
– Disk spin speed: rpm
– Disk data density: N bytes per track
– Tseek: how fast to locate a track.

RAID
• RAID = Redundant Array of Independent Disks
• Set of physical disks viewed as single logical drive by O/S
• Data distributed across physical drives
• Can use redundant capacity to store parity information for error correction
• RAID0~6 for different levels of redundancy

RAID 0, 1, 2

• RAID 0:

– No redundancy, 1-1 match between logic and physical disks

• RAID 1:

– Mirrored disk, 1-2 match between logic and physical disks

• RAID 2:

– Error correction coded: 4 logical disks mapped to 7 physical disks via Hamming coding

• RAID 3~6:

– More complex system
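The 4-to-7 mapping mentioned for RAID 2 corresponds to a Hamming(7,4) code: 4 data bits plus 3 parity bits, able to correct any single-bit error. The Python sketch below shows the code itself on one 4-bit slice; it is an illustration under that assumption, with hypothetical function names, and does not model how an actual RAID 2 controller distributes bits across drives.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3  # 0 means no error, else 1-based position
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1  # simulate one failed bit (one bad disk in the slice)
print(hamming74_decode(stored) == word)
```

Each of the 7 bit positions would sit on a different physical disk, so losing or corrupting one disk's bit in a slice is recoverable.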


Optical Disk and Magnetic Tapes

CD Operation
• Reflects light differently at pits and lands

CD-ROM
• Originally for audio
• 650Mbytes giving over 70 minutes audio
• Polycarbonate coated with highly reflective coat, usually aluminium
• Data stored as pits
• Read by reflecting light
• Constant packing density, variable rotation velocity

How about Random Access on CD-ROM ?
• Difficult
• Move head to rough position
• Set correct speed
• Read address
• Adjust to required location
• (Yawn!)


Other Optical Storage
• CD-Recordable (CD-R)
– WORM
– Now affordable
– Compatible with CD-ROM drives
• CD-RW
– Erasable
– Getting cheaper
– Mostly CD-ROM drive compatible

DVD - what's in a name?
• Digital Video Disk
– Used to indicate a player for movies
» Only plays video disks
• Digital Versatile Disk
– Used to indicate a computer drive
» Will read computer disks and play video disks
• Dogs Veritable Dinner
• Officially - nothing!!!

DVD - technology
• Multi-layer
• Very high capacity (4.7G per layer)
• Full length movie on single disk
– Using MPEG compression
• Finally standardized (honest!)
• Movies carry regional coding
• Next generation: Blu-ray.

Magnetic Tape
• Serial access
• Slow
• Very cheap
• Backup and archive


Summary of Lecture #4
• Mostly informational,
– do not need to memorize all those technical details
– But be able to appreciate the underlying physics and technology that makes computers possible
• Internal Memory
– Types: RAM, ROM, EPROM, EEPROM, FlashMem
– Difference between Static and Dynamic RAM
» Capacitor vs transistor state based
» Refresh or not refresh
» Cost
• External Memory
– Types: Magnetic Disk, Optical Disk, Magnetic Tape
– What affects the Disk performance ?
» Seek time
» Rotation latency
» Density

Summary of Lecture #4

• Review questions:

– 5.4, 6.7

• Homework #1 question #4:

– Consider a disk system with an average track seek time Tseek = 8ms, a rotation speed of 7200 rpm, 512 bytes per sector, and 800 sectors per track. What is the average access time to read 4KB of data ?
