
COMP 212 Computer Organization & Architecture
COMP 212 Fall 2008, Lecture 4
Memory and Disk Systems
Comp 212 Computer Org & Arch, Z. Li, 2008

Re-Cap of Lecture #3
• Cache system is a compromise between
  – Memory system capacity
  – Access speed
  – Cost
• Memory system is hierarchical
  – Speed: Registers > Cache > RAM > Hard Disk > Optical Storage
  – Cost: the other way around

Re-Cap of Lecture #3: Addressing
• If we partition the memory address into blocks, the higher bits give the block address and the lower bits give the word location within the block
• Example:
  – An 8-bit address space gives 2^8 = 256 word addresses
  – If we group 4 words into a block, we have 2^6 = 64 blocks
  – Word address 01101001 (69h) -> block address 011010 (1Ah)
  – Conversion between hex and binary: group the binary digits into blocks of 4 bits; each 4-bit block corresponds to one hex digit

Re-Cap of Lecture #3: Direct Cache Mapping
• Each memory block has a fixed cache line location, and each cache line is mapped to fixed locations in memory, e.g.
  – Tag (s-r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits
  – We have 2^14 cache lines but 2^22 memory blocks
• Cache hit/miss? Check the tag: if the requested memory block's tag does not match the one stored in the cache line, it is a miss
• Pros/Cons: simple to implement, but not flexible

Re-Cap of Lecture #3: Associative Cache Mapping
• Each memory block can reside in any cache line, e.g.
  – Tag: 22 bits | Word: 2 bits
  – We have 2^14 cache lines but 2^22 memory blocks
• Cache hit/miss? Compare the requested block's tag with all cache line tags; if no tag matches, it is a miss

Re-Cap of Lecture #3: Set Associative Cache Mapping
• A compromise between direct and associative mapping
• Cache lines are addressable by cache set
• Each cache set contains k cache lines; this is called a k-way set associative cache
• A memory address is mapped to a tag, a cache set address, and a word address
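The tag/line/word split in the direct-mapping example can be sketched in a few lines of code. This is a minimal illustration only, assuming the example's 24-bit address (8-bit tag, 14-bit line, 2-bit word); the function name `split_address` is ours, not from the lecture:

```python
# Sketch of the direct-mapped address decomposition above, assuming a
# 24-bit address split into an 8-bit tag, 14-bit line, 2-bit word.
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2

def split_address(addr):
    """Split a 24-bit memory address into (tag, line, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word

# The recap's 8-bit mini-example carries over: for word address
# 0x69 = 0b01101001 with 4 words per block, the low 2 bits are the
# word offset and the remaining high bits form the block address.
assert split_address(0x69) == (0, 0b011010, 1)
```

The same shift-and-mask pattern gives the tag/set/word fields of the set associative mapping; only the field widths change.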
  – Format: Tag: 9 bits | Cache Set: 13 bits | Word: 2 bits (2-way: 2^13 sets of 2 lines each give the same 2^14 cache lines)
• For comparison, the pros/cons of fully associative mapping: flexible, and it can support complex cache replacement algorithms, but it is expensive to implement (every cache line's tag must be compared)

Cache Performance (spill over from Lecture #3)

Cache Replacement Algorithms
• On a cache miss, a new memory block is loaded into the cache, so we need to replace some cache content
  – With direct mapping there is no choice: the new block has a fixed location in the cache
  – With set associative mapping, we need to choose which line in the set to replace
  – With associative mapping there are more choices: a larger space to choose from
• Typically implemented in hardware, with no CPU involvement

Replacement Algorithms
• Algorithms used (no free lunch theorem)
• Least Recently Used (LRU)
  – e.g. in a 2-way set associative cache, which of the 2 blocks is least recently used?
• First In First Out (FIFO)
  – Replace the block that has been in the cache longest
• Least Frequently Used (LFU)
  – Replace the block that has had the fewest hits
• Random
  – Generate a random number to determine which block to replace

Write Policy
• Memory data consistency issues:
  – When a cache line is replaced, if its data has changed, it must be written back to the corresponding memory location before the replacement
  – When I/O modifies a memory word via DMA, the cached word becomes invalid and must be reloaded into the cache
  – In a multi-core CPU where each core has its own cache, a cached word becomes invalid if it is changed by another core
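LRU replacement within one set can be sketched with an ordered map. A minimal illustration, assuming a 2-way set associative cache as in the example above; the class name `LRUSet` and its interface are ours:

```python
from collections import OrderedDict

# Sketch of LRU replacement inside one k-way cache set: on a hit the
# line moves to the most-recently-used end; on a miss with a full set,
# the least-recently-used line (the oldest entry) is evicted.
class LRUSet:
    def __init__(self, k=2):
        self.k = k
        self.lines = OrderedDict()  # tag -> line data, oldest first

    def access(self, tag):
        """Return True on a hit; on a miss, load the tag, evicting LRU."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark most recently used
            return True
        if len(self.lines) >= self.k:
            self.lines.popitem(last=False)   # evict least recently used
        self.lines[tag] = None
        return False
```

For example, in a 2-way set the sequence A, B, A, C misses on A and B, hits on A, and then the miss on C evicts B, since A was touched more recently.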
Write Through
• All writes go to main memory as well as the cache
• Multiple CPUs can monitor main memory traffic to keep their local (per-CPU) caches up to date
• Problems:
  – Many writes to memory
  – Lots of traffic on the bus

Write Back
• The purpose is to minimize write operations on the bus
• When a cache line is updated, a bit is set to indicate that
• At cache line replacement time, only the updated lines are written to memory
• On average about 15% of memory references are writes, but up to 33% for vector computation and 50% for matrix transposition
• A write involves a whole line instead of a word, so write back only pays off if a cache word is written multiple times before replacement

Example
• A memory write is 32 bits and takes 30 ns
• A cache line is 32 bytes (256 bits)
• On average, words in a line are written 12 times per replacement cycle
• How much bus time does write back save over write through?
• Solution:
  – Write through: 12 x 30 = 360 ns per replacement cycle
  – Write back: (256/32) x 30 = 240 ns per replacement cycle

Cache Performance - Cost
• Cost per bit for a two-level cache system
  – C1: cost per bit of the cache
  – C2: cost per bit of memory
  – S1: cache size
  – S2: memory size
• Average cost per bit:
  Cs = (C1*S1 + C2*S2) / (S1 + S2)

Cache Performance - Access
• Consider the following two-level system:
  – The cache hit ratio is h, i.e. the probability that a requested memory word is in the cache
  – The times to access a word in the L1 and L2 caches are T1 and T2
• What is the average word access time?
  Ts = h*T1 + (1-h)*(T1+T2)
  =>  T1/Ts = 1 / (1 + (1-h)*(T2/T1))
• We want T1/Ts to be close to 1.0

Cache Access as a Function of Hit Ratio

Hit Ratio vs. Data Access Locality
• Different programs have different access locality characteristics
• How does the cache size affect the hit ratio?
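The worked numbers above can be checked with a short calculation. A sketch using the slides' formulas; the variable and function names are ours, and the 32-byte (256-bit) cache line is the assumption under which the 240 ns answer comes out:

```python
# Checking the slides' numbers: write-back vs write-through bus time
# per replacement cycle, then the two performance formulas.
WORD_TIME_NS = 30            # one 32-bit memory write takes 30 ns
WRITES_PER_CYCLE = 12        # average word writes per replacement cycle
WORDS_PER_LINE = 256 // 32   # a 32-byte (256-bit) line holds 8 words

write_through_ns = WRITES_PER_CYCLE * WORD_TIME_NS   # 12 x 30 = 360 ns
write_back_ns = WORDS_PER_LINE * WORD_TIME_NS        #  8 x 30 = 240 ns

def avg_cost_per_bit(c1, s1, c2, s2):
    """Cs = (C1*S1 + C2*S2) / (S1 + S2)."""
    return (c1 * s1 + c2 * s2) / (s1 + s2)

def avg_access_time(h, t1, t2):
    """Ts = h*T1 + (1-h)*(T1+T2), i.e. T1 plus a miss penalty."""
    return h * t1 + (1 - h) * (t1 + t2)
```

With h = 0.95, T1 = 10 and T2 = 100 (arbitrary illustrative numbers), Ts = 0.95*10 + 0.05*110 = 15, so T1/Ts = 2/3: even a 5% miss rate moves the average access time well away from T1.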
  – If there is no locality, the hit ratio is simply proportional to the S1/S2 ratio

Re-Cap of Lecture #3
• Cache system performance:
  – What are the cache replacement algorithms?
    » LRU, FIFO, LFU, Random (mostly informational)
  – What is the difference between write back and write through?
  – When will write back be better than write through?
  – What is the cost per bit of a k-level cache system?
  – What is the average access time for a k-level cache system?

Memory and Disk Systems

Semiconductor Memory Types
• RAM, ROM, EPROM, EEPROM, Flash Memory

RAM
• Probably the most important memory type for computers
• Misnamed, as all semiconductor memory is random access
• Supports multiple reads/writes
• Volatile: needs refresh, provides temporary storage
• Can be static or dynamic; discussed in more detail later

Memory Cell Operation (conceptually)
• A memory cell must be selected by its address line
• A write changes the state of the memory cell
• A read just senses the state

Dynamic RAM Structure
• Simple: bits are stored as charge in capacitors
  – Uses only 1 transistor and 1 capacitor per cell
• Charge leaks, so the cells need refreshing even when powered
• The recharge cycles make it slow
Transistor Operation
• Requires some physics background to understand
  – We will explain it intuitively; don't panic ☺
• When there is no voltage on the address line, the transistor is disconnected
• The address line is used to switch the transistor on and off

DRAM Operation
• The address line is active when a bit is read or written
  – The address line controls the current flow on the bit line
  – If there is no voltage on the address line, the bit line and the capacitor are not connected
• Write:
  – Apply a voltage to the bit line: high for 1, low for 0
  – Then signal the address line, which transfers the charge to the capacitor
• Read:
  – The address line is selected, and the transistor turns on
  – Charge from the capacitor is fed via the bit line to a sense amplifier
    » The amplifier compares it with a reference value to determine 0 or 1
  – The capacitor charge must then be restored

Static RAM Structure
• Bits are stored as voltages on bit line B and its complement
  – An S-R latch; we will cover this later in the digital logic part
• No charge to leak, so no refreshing is needed while powered
• More complex construction: 6 transistors per cell

Static RAM
• The transistor arrangement gives a stable logic state
• The address line transistors T5 and T6 are switches
• State 1: C1 high, C2 low; T1 and T4 off, T2 and T3 on
• State 0: C2 high, C1 low; T2 and T3 off, T1 and T4 on
• Write: apply the value to line B and its complement to line B'
• Read: the value is on line B

Static RAM Operation
• More complex implementation
  – Requires more transistors
  – More expensive
• Does not need refresh circuits, so it
  – Operates faster
  – Can be used as cache
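The destructive read and refresh behavior described for DRAM can be mimicked with a toy model. This is a sketch of the idea only, not of real circuit behavior; the class name, method names, and the leak rate are all our own inventions:

```python
# Toy model of a one-transistor DRAM cell: a read drains the capacitor
# through the sense amplifier and must restore the charge afterwards,
# and without periodic refresh the charge leaks away even when powered.
class DRAMCell:
    def __init__(self):
        self.charge = 0.0                 # capacitor charge, 0.0 .. 1.0

    def write(self, bit):
        self.charge = 1.0 if bit else 0.0

    def read(self, reference=0.5):
        bit = 1 if self.charge > reference else 0  # sense amplifier
        self.charge = 0.0                 # the read drained the capacitor,
        self.write(bit)                   # so the value is written back
        return bit

    def leak(self, fraction=0.1):
        self.charge *= 1.0 - fraction     # charge leaks over time

    def refresh(self):
        self.read()                       # a refresh is read-and-restore
```

In this model a refresh that arrives while the charge is still above the reference restores a full 1; refresh too late and the sense amplifier reads 0, which is exactly why DRAM needs its refresh cycles.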
SRAM & DRAM Summary
• Both are volatile
  – Power is needed to preserve the data
• Dynamic cell
  – Simpler to build, less expensive
  – Smaller and denser: more bits per unit of silicon area
  – Needs refresh circuits
  – Used as main memory
• Static cell
  – Faster
  – More expensive
  – Used as cache

Read Only Memory (ROM)
• Permanent storage
  – Nonvolatile: does not require power
• Typically used to store
  – Microprograms (see later)
  – Library subroutines
  – System programs (BIOS)
  – Function tables