COMP 212 Computer Organization & Architecture
Fall 2008 – Lecture 4: Memory and Disk Systems
Z. Li, 2008

Re-Cap of Lecture #3
• The cache system is a compromise between
  – more memory system capacity
  – faster access speed
  – cost
• The memory system is hierarchical
  – Speed: Registers > Cache > RAM > Hard Disk > Optical Storage
  – Cost: the other way around

Re-Cap of Lecture #3 – Addressing
• If we partition the memory address into blocks, the higher bits give the block address and the lower bits give the word location within the block
• Example
  – An 8-bit address space gives 2^8 = 256 word addresses
  – If we group 4 words into a block, we have 2^6 = 64 blocks
  – Word address 01101001 (69h) -> block address 011010 (1Ah)
  – Conversion between hex and binary: group the binary digits into 4-bit blocks; each 4-bit block corresponds to one hex digit

Re-Cap of Lecture #3 – Direct Cache Mapping
• Each memory block has a fixed cache line location, and each cache line is mapped to a fixed set of memory blocks
• The memory address is split into fields (a C sketch follows after the Write Policy slide below)
  – Tag (s-r): 8 bits | Line or slot (r): 14 bits | Word (w): 2 bits
  – We have 2^14 cache lines but 2^22 memory blocks
• Cache hit/miss? Check the tag: if the requested block's tag does not match the tag stored in the cache line, it is a miss
• Pros/cons: simple to implement, but not flexible

Re-Cap of Lecture #3 – Associative Cache Mapping
• Each memory block can reside in any cache line
  – Tag: 22 bits | Word: 2 bits
  – We have 2^14 cache lines but 2^22 memory blocks
• Cache hit/miss? Compare the requested block's tag against all cache line tags; if no tag matches, it is a miss
• Pros/cons: flexible, can support complex cache replacement algorithms, but expensive to implement (all cache line tags must be compared)

Re-Cap of Lecture #3 – Set Associative Cache Mapping
• A compromise between direct and associative mapping
• Cache lines are addressed by cache set
• Each cache set contains k cache lines, called a k-way set associative cache
• The memory address is mapped to a tag, a cache set address, and a word address
  – Tag: 9 bits | Cache set: 13 bits | Word: 2 bits

Cache Performance (spill-over from Lecture #3)

Cache Replacement Algorithms
• On a cache miss, a new memory block is loaded into the cache, so existing cache content must be replaced
  – With direct mapping there is no choice: the new block has a fixed location in the cache
  – With set associative mapping, we must choose which line in the set to replace
  – With associative mapping there are more choices, a larger space to choose from
• Typically implemented in hardware, with no CPU involvement

Replacement Algorithms
• Algorithms used (no free lunch theorem)
  – Least Recently Used (LRU)
    » e.g. in a 2-way set associative cache, which of the 2 blocks is the LRU one? (see the LRU sketch after the Write Policy slide below)
  – First In First Out (FIFO): replace the block that has been in the cache longest
  – Least Frequently Used (LFU): replace the block that has had the fewest hits
  – Random: generate a random number to determine which block to replace

Write Policy
• Memory data consistency issues
  – When a cache line is replaced and its data has been changed, the line must be written back to the corresponding memory location before it is replaced
  – When I/O modifies a memory word via DMA, the cached word becomes invalid and must be reloaded into the cache
  – With multi-core CPUs, each with its own cache, a cached word becomes invalid if it is changed by one of the CPUs
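Referenced from the mapping recap above: a minimal C sketch of how a 24-bit word address can be split into the tag / line (or set) / word fields for the three mapping schemes on these slides. The field widths (8/14/2, 22/2, 9/13/2) come from the slides; the function names and the sample address are illustrative assumptions.

```c
#include <stdio.h>
#include <stdint.h>

/* 24-bit word address, 4 words per block (2-bit word offset), as on the slides. */

/* Direct mapping: | tag 8 | line 14 | word 2 | */
static void direct_fields(uint32_t addr, uint32_t *tag, uint32_t *line, uint32_t *word)
{
    *word = addr & 0x3;              /* low 2 bits   */
    *line = (addr >> 2) & 0x3FFF;    /* next 14 bits */
    *tag  = (addr >> 16) & 0xFF;     /* top 8 bits   */
}

/* Fully associative: | tag 22 | word 2 | */
static void assoc_fields(uint32_t addr, uint32_t *tag, uint32_t *word)
{
    *word = addr & 0x3;
    *tag  = (addr >> 2) & 0x3FFFFF;
}

/* 2-way set associative (2^13 sets): | tag 9 | set 13 | word 2 | */
static void setassoc_fields(uint32_t addr, uint32_t *tag, uint32_t *set, uint32_t *word)
{
    *word = addr & 0x3;
    *set  = (addr >> 2) & 0x1FFF;
    *tag  = (addr >> 15) & 0x1FF;
}

int main(void)
{
    uint32_t addr = 0x69;            /* the 69h example address from the recap, used only as a sample value */
    uint32_t tag, line, set, word;

    direct_fields(addr, &tag, &line, &word);
    printf("direct:    tag=%u line=%u word=%u\n", tag, line, word);

    assoc_fields(addr, &tag, &word);
    printf("assoc:     tag=%u word=%u\n", tag, word);

    setassoc_fields(addr, &tag, &set, &word);
    printf("set-assoc: tag=%u set=%u word=%u\n", tag, set, word);
    return 0;
}
```

Note that with 2^14 cache lines a 2-way set associative cache has 2^13 sets, which is why the set field on the slide is 13 bits wide.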
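Referenced from the replacement-algorithm slide above: a toy sketch, assuming a 2-way set associative cache, of how LRU can be tracked with a single bit per set. The structure and names are my own illustration, not taken from the slides.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One 2-way set: two tags plus one LRU index telling which way was used least recently. */
struct set2 {
    uint32_t tag[2];
    bool     valid[2];
    int      lru;                         /* index of the least recently used way */
};

/* Returns true on a hit; on a miss, the LRU way is replaced with the new tag. */
static bool access_set(struct set2 *s, uint32_t tag)
{
    for (int way = 0; way < 2; way++) {
        if (s->valid[way] && s->tag[way] == tag) {
            s->lru = 1 - way;             /* the other way is now the LRU one */
            return true;                  /* hit */
        }
    }
    int victim = s->lru;                  /* miss: evict the LRU way */
    s->tag[victim]   = tag;
    s->valid[victim] = true;
    s->lru = 1 - victim;
    return false;
}

int main(void)
{
    struct set2 s = { {0, 0}, {false, false}, 0 };
    uint32_t trace[] = { 5, 7, 5, 9, 7 }; /* made-up tag sequence */
    for (int i = 0; i < 5; i++)
        printf("tag %u -> %s\n", trace[i], access_set(&s, trace[i]) ? "hit" : "miss");
    return 0;
}
```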
Write Through
• All writes go to main memory as well as to the cache
• Multiple CPUs can monitor main memory traffic to keep their local (per-CPU) caches up to date
• Problem:
  – many writes to memory
  – lots of traffic on the bus

Write Back
• The purpose is to minimize write operations on the bus
• When a cache line is updated, a bit is set to indicate that
• At the time of cache line replacement, only the updated lines are written back to memory
  – On average about 15% of cache lines are updated, but for vector computing it can be 33%, and for matrix transposition 50%
  – A write-back involves a whole line instead of a word, so it only pays off if a cached word is written multiple times before replacement

Example
• A memory write is 32 bits wide and takes 30 ns
• A cache line is 8 words (256 bits)
• On average, words in the line are written 12 times per replacement cycle
• How much bus time does write back save over write through? (worked out in the C sketch further below)
• Solution
  – Write through: 12 x 30 = 360 ns per replacement cycle
  – Write back: (256/32) x 30 = 240 ns per replacement cycle

Cache Performance – Cost
• Cost per bit for a two-level cache system
  – C1: cost per bit of the cache
  – C2: cost per bit of main memory
  – S1: cache size
  – S2: memory size
• Average cost per bit:
  (C1*S1 + C2*S2) / (S1 + S2)

Cache Performance – Access
• Consider the following two-level system:
  – the cache hit ratio is h, i.e. the probability that a memory word access is found in the cache
  – the times to access a word in L1 (cache) and L2 (memory) are T1 and T2
• Average word access time (both formulas are sketched in C further below):
  Ts = h*T1 + (1-h)*(T1+T2)
  =>  T1/Ts = 1 / (1 + (1-h)*T2/T1)
• We want T1/Ts to be close to 1.0

Cache Access as a Function of Hit Ratio
• (figure: T1/Ts plotted against the hit ratio h)

Hit Ratio vs. Data Access Locality
• Different programs have different access-locality characteristics
• How does cache size affect the hit ratio?
  – If there is no locality, the hit ratio is simply proportional to the S1/S2 ratio

Re-Cap of Lecture #3 – Cache System Performance
• What are the cache replacement algorithms?
  » LRU, FIFO, LFU, Random (mostly informational)
• What is the difference between write back and write through?
• When will write back be better than write through?
• What is the cost per bit of a k-level cache system?
• What is the average access time for a k-level cache system?

Memory and Disk System

Semiconductor Memory Types
• RAM, ROM, EPROM, EEPROM, Flash memory

RAM
• Probably the most important type for computers
• Misnamed, as all semiconductor memory is random access
• Supports multiple reads/writes
• Volatile – needs refresh, provides temporary storage
• Can be static or dynamic; discussed in more detail later

Memory Cell Operation (conceptually)
• A memory cell is selected by its address line
• On a write, the state of the memory cell is changed
• On a read, the state is only sensed

Dynamic RAM Structure
• Simple: bits are stored as charge in capacitors
  – uses only 1 transistor and 1 capacitor per cell
• Charge leaks, so the cell needs refreshing even when powered
• The recharge cycles make it slow
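Referenced from the write-policy Example slide above: a small C calculation of the bus time per replacement cycle for write through vs. write back, using the slide's numbers (30 ns per 32-bit bus write, a 256-bit line, 12 word writes per replacement). The variable names are my own.

```c
#include <stdio.h>

int main(void)
{
    const double write_ns        = 30.0;   /* one 32-bit bus write        */
    const int    word_bits       = 32;
    const int    line_bits       = 256;    /* 8 words per cache line      */
    const int    writes_per_repl = 12;     /* word writes per replacement */

    /* Write through: every word write goes straight to memory. */
    double write_through_ns = writes_per_repl * write_ns;

    /* Write back: the whole line is written once, at replacement time. */
    double write_back_ns = (double)line_bits / word_bits * write_ns;

    printf("write through: %.0f ns per replacement cycle\n", write_through_ns);
    printf("write back:    %.0f ns per replacement cycle\n", write_back_ns);
    printf("saved on bus:  %.0f ns\n", write_through_ns - write_back_ns);
    return 0;
}
```

This reproduces the slide's result: 360 ns vs. 240 ns, so write back saves 120 ns of bus time per replacement cycle in this example.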
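Also referenced from the cache-performance slides above: a C sketch of the two formulas for a two-level system, average cost per bit and average access time. The sample values plugged in at the end are illustrative assumptions, not numbers from the slides.

```c
#include <stdio.h>

/* Average cost per bit of a two-level (cache + memory) system:
 *   (C1*S1 + C2*S2) / (S1 + S2)                                  */
static double avg_cost_per_bit(double c1, double s1, double c2, double s2)
{
    return (c1 * s1 + c2 * s2) / (s1 + s2);
}

/* Average access time with hit ratio h:
 *   Ts = h*T1 + (1-h)*(T1 + T2) = T1 + (1-h)*T2                  */
static double avg_access_time(double h, double t1, double t2)
{
    return h * t1 + (1.0 - h) * (t1 + t2);
}

int main(void)
{
    /* Illustrative values only. */
    double c1 = 0.01,   s1 = 128e3 * 8;     /* cache: cost/bit, size in bits  */
    double c2 = 0.0001, s2 = 512e6 * 8;     /* memory: cost/bit, size in bits */
    double h = 0.95, t1 = 1.0, t2 = 10.0;   /* hit ratio, access times in ns  */

    double ts = avg_access_time(h, t1, t2);
    printf("average cost per bit: %g\n", avg_cost_per_bit(c1, s1, c2, s2));
    printf("Ts = %.2f ns, T1/Ts = %.3f (we want this close to 1.0)\n", ts, t1 / ts);
    return 0;
}
```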
Transistor Operation
• When there is no voltage on the address line, the transistor is disconnected
• The address line is used to switch the transistor on and off

DRAM Operation
• Requires some physics background to understand
  – will be explained intuitively, don't panic ☺
• The address line is active when the bit is read or written
  – the address line controls current flow on the bit line
  – if there is no voltage on the address line, the bit line and the capacitor are not connected
• Write
  – apply a voltage to the bit line: high for 1, low for 0
  – then signal the address line: the charge is transferred to the capacitor
• Read
  – the address line is selected and the transistor turns on
  – charge from the capacitor is fed via the bit line to a sense amplifier, which compares it with a reference value to decide 0 or 1
  – the capacitor charge must then be restored
• (a toy software model of this read, restore and refresh behavior appears at the end of these notes)

Static RAM Structure
• Bits are stored as voltages on the bit line B and its complement
  – an S-R latch; covered later in the digital logic part
• No charge to leak, so no refreshing is needed while powered
• More complex construction – 6 transistors per cell

Static RAM
• The transistor arrangement gives a stable logic state
• The address-line transistors T5 and T6 act as switches
• State 1: C1 high, C2 low; T1, T4 off; T2, T3 on
• State 0: C2 high, C1 low; T2, T3 off; T1, T4 on
• Write: apply the value to B and its complement to the complement line
• Read: the value is on line B

Static RAM Operation
• More complex implementation
  – requires more transistors
  – more expensive
• Does not need refresh circuits, so it
  – operates faster
  – can be used as cache

SRAM & DRAM Summary
• Both are volatile
  – power is needed to preserve the data
• Dynamic cell
  – simpler to build, less expensive
  – smaller and denser: more bits per silicon area
  – needs refresh circuits
  – used as main memory
• Static cell
  – faster
  – more expensive
  – used as cache

Read Only Memory (ROM)
• Permanent storage
  – nonvolatile, does not require power
• Typically used to store
  – microprogramming (see later)
  – library subroutines
  – system programs (BIOS)
  – function tables
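Referenced from the DRAM Operation slides above: a toy software model, purely illustrative and not how the hardware is built, of a single DRAM cell whose charge leaks over time, whose read is destructive and followed by a restore, and which therefore needs periodic refresh. All names and the leak/threshold numbers are assumptions made up for this sketch.

```c
#include <stdio.h>
#include <stdbool.h>

/* Toy DRAM cell: the stored bit is modelled as a charge level in [0, 1]. */
struct dram_cell {
    double charge;
};

static const double LEAK_PER_TICK = 0.1;  /* made-up leak rate             */
static const double REF_THRESHOLD = 0.5;  /* sense-amplifier reference     */

static void cell_write(struct dram_cell *c, bool bit)
{
    c->charge = bit ? 1.0 : 0.0;          /* drive the bit line, then the address line */
}

static void cell_leak(struct dram_cell *c)
{
    if (c->charge > 0.0)
        c->charge -= LEAK_PER_TICK;       /* the capacitor slowly loses its charge */
}

/* Read: the sense amplifier compares the charge with a reference value,
 * and the destructive read is followed by a restore of the full level.  */
static bool cell_read(struct dram_cell *c)
{
    bool bit = c->charge > REF_THRESHOLD;
    cell_write(c, bit);                   /* restore = rewrite what was sensed */
    return bit;
}

static void cell_refresh(struct dram_cell *c)
{
    cell_read(c);                         /* a refresh is just a read plus restore */
}

int main(void)
{
    struct dram_cell c;
    cell_write(&c, true);

    for (int tick = 1; tick <= 8; tick++) {
        cell_leak(&c);
        if (tick == 4)
            cell_refresh(&c);             /* without this, the stored 1 would decay and read back as 0 */
    }
    printf("bit read back as %d\n", cell_read(&c));  /* prints 1 thanks to the refresh */
    return 0;
}
```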
