
Computer Architecture Memory

Dr. Falah Hassan Ali, Part-time Professor, uOttawa, 2012

Cache memory, also called CPU memory, is random access memory (RAM) that a computer processor can access more quickly than it can access regular RAM. Cache memory is typically integrated directly into the CPU chip or placed on a separate chip that has a separate interconnect with the CPU.

A CPU cache is a cache used by the central processing unit (CPU) of a computer to reduce the average time to access data from the main memory. The cache is a smaller, faster memory which stores copies of the data from frequently used main memory locations. Most CPUs have different independent caches, including instruction and data caches, where the data cache is usually organized as a hierarchy of cache levels (L1, L2, etc.).

Cache memory is small in size but faster than RAM. Cache is faster because it is built from SRAM, whose flip-flops switch faster than the charge-storing capacitors of DRAM. Caches come in three levels: L1 (level 1), L2 (level 2), and L3 (level 3).

L1 cache is built into the processor. The processor accesses RAM once and transfers the block into its cache, so it does not need to access RAM again and again while the block is held in the cache. L2 cache can be built into the processor or sit outside it; it is faster than RAM but slower than L1 cache. L3 cache sits outside the processor core; it is faster than RAM but slower than L1 and L2 cache.

The amount of cache memory in a computer is far smaller than the RAM or hard drive capacity, both because of its high cost and because more cache memory means more heat on the processor. When an instruction is to be executed, the processor first searches the cache and, only if the instruction is not found there, searches RAM; this type of cache is called a look-up cache. Another type is the look-aside cache, which searches the cache and RAM simultaneously. A look-aside cache saves more time in searching.

Today's processors typically carry 6, 8, or 12 MB of cache. In mobile devices the cache memory concept is the same, but an app's cache data is something different: apps keep cached data so content can be accessed faster in the future.

Cache Line

Cache is partitioned into lines (also called blocks). Each line holds 4-64 bytes. During data transfer, a whole line is read or written.

Each line has a tag that indicates the address in main memory (M) from which the line has been copied.

Cache                    Main Memory
Index  Tag  Data         Index  Data
0      2    ABC          0      DEF
1      0    DEF          1      PQR
                         2      ABC
                         3      XYZ

Cache hit is detected through an associative search of all the tags. Associative search provides a fast response to the query:

“Does this key match with any of the tags”?

Data is read only if a match is found.
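As an illustration of the associative search described above, here is a minimal Python sketch; the function name and the list-of-tuples cache model are illustrative, not from the notes. Real hardware compares all tags in parallel, while this loop compares them one at a time.

def associative_lookup(cache_lines, key_tag):
    for tag, data in cache_lines:
        if tag == key_tag:
            return data        # cache hit: data is read only when a tag matches the key
    return None                # cache miss: fall back to main memory

cache = [(2, "ABC"), (0, "DEF")]        # mirrors the small cache table above
print(associative_lookup(cache, 0))     # hit  -> DEF
print(associative_lookup(cache, 3))     # miss -> None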

Types of Cache

1. Fully Associative
2. Direct Mapped
3. Set Associative

Fully Associative Cache

[Figure: fully associative cache — the memory address (M-addr) is used as a key and compared against every tag in the cache C; each line holds a tag and data copied from main memory M.]

1- No restriction on mapping from M to C.
2- Associative search of tags is expensive.
3- Feasible for very small size caches only.

Direct-Mapped Cache

A given memory block can be mapped into one and only one cache line. Here is an example of the mapping:

Cache line    Main memory blocks
0             0, 8, 16, 24, …, 8n
1             1, 9, 17, 25, …, 8n+1
2             2, 10, 18, 26, …, 8n+2
3             3, 11, 19, 27, …, 8n+3
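A minimal Python sketch of this mapping rule, assuming a cache with 8 lines (consistent with the stride of 8 in the table above):

NUM_LINES = 8    # assumed cache size, matching the stride of 8 in the table

def cache_line_for(block_number):
    return block_number % NUM_LINES   # direct mapping: one and only one line per block

for block in (0, 8, 16, 24, 1, 9, 17, 25):
    print("block", block, "-> line", cache_line_for(block))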

Advantage: no need for an expensive associative search!

Disadvantage: the miss rate may go up due to a possible increase in mapping conflicts.

Set-Associative Cache

[Figure: two-way set-associative cache — the cache C is divided into sets (set 0, set 1, …, set 3); each main memory block in M maps to one set but may occupy either line within it.]

N-way set-associative cache

Each M-block can now be mapped into any one of a set of N C-blocks. The sets are predefined. Let there be K blocks in the cache. Then:

N = 1: direct-mapped cache
N = K: fully associative cache

Most commercial caches have N = 2, 4, or 8. A set-associative cache is cheaper than a fully associative cache and has a lower miss ratio than a direct-mapped cache.

But direct-mapped cache is the fastest.
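A small Python sketch of the set-selection rule, under the usual assumption that set = block mod (K/N); the variable names and example values are illustrative:

def set_index(block_number, K, N):
    num_sets = K // N              # K blocks in the cache, N lines per set
    return block_number % num_sets

K = 8                              # example cache with 8 lines
for N in (1, 2, 8):                # N = 1 behaves direct-mapped, N = K fully associative
    print("N =", N, ": block 13 -> set", set_index(13, K, N), "of", K // N, "sets")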

Specification of a cache memory

Block size      4-64 bytes
Hit time        1-2 cycles
Miss penalty    8-32 cycles
  Access        6-10 cycles
  Transfer      2-22 cycles
Miss rate       1%-20%
Cache size      L1: 8 KB-64 KB    L2: 128 KB-2 MB
Cache speed     L1: 0.5 ns (8 GB/sec)    L2: 0.75 ns (6 GB/sec) on-chip cache

What happens to the cache during a write operation?

Write Policies

If data is written to the cache, at some point it must also be written to main memory; the timing of this write is known as the write policy. In a write-through cache, every write to the cache causes a write to main memory. Alternatively, in a write-back or copy-back cache, writes are not immediately mirrored to the main memory, and the cache instead tracks which locations have been written over, marking them as dirty. The data in these locations is written back to the main memory only when that data is evicted from the cache. For this reason, a read miss in a write-back cache may sometimes require two memory accesses to service: one to first write the dirty location to main memory, and then another to read the new location from memory. Also, a write to a main memory location that is not yet mapped in a write-back cache may evict an already dirty location, thereby freeing that cache space for the new memory location.
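The following Python sketch contrasts the two write policies by counting main memory writes; the class and method names are illustrative, and the dictionary-based cache is a simplification rather than a real cache model.

class WriteThroughCache:
    def __init__(self):
        self.lines = {}
        self.mem_writes = 0
    def write(self, addr, value):
        self.lines[addr] = value           # update the cache...
        self.mem_writes += 1               # ...and main memory, on every write

class WriteBackCache:
    def __init__(self):
        self.lines = {}                    # addr -> (value, dirty)
        self.mem_writes = 0
    def write(self, addr, value):
        self.lines[addr] = (value, True)   # mark the line dirty, defer the memory write
    def evict(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:
            self.mem_writes += 1           # dirty data reaches memory only on eviction

wt, wb = WriteThroughCache(), WriteBackCache()
for i in range(10):                        # ten writes to the same location
    wt.write(0x40, i)
    wb.write(0x40, i)
wb.evict(0x40)
print(wt.mem_writes, wb.mem_writes)        # 10 memory writes vs 1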

Problem with Direct-Mapped

- Direct-mapped cache: two blocks in memory that map to the same index in the cache cannot be present in the cache at the same time.

- This can lead to a 0% hit rate if more than one block, accessed in an interleaved manner, maps to the same index.
  - Assume addresses A and B have the same index bits but different tag bits.
  - The access pattern A, B, A, B, A, B, A, B, … conflicts in the cache index.
  - All accesses are conflict misses.
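The sketch below reproduces this conflict pattern in Python for an assumed direct-mapped cache of 64 lines of 16 bytes; addresses A and B are chosen so that they share an index but differ in tag.

NUM_LINES = 64
LINE_SIZE = 16

def split(addr):
    index = (addr // LINE_SIZE) % NUM_LINES
    tag = addr // (LINE_SIZE * NUM_LINES)
    return index, tag

cache = {}                            # index -> tag currently stored
A, B = 0x0000, 0x0400                 # same index, different tags (0x400 = 64 lines * 16 bytes)
hits = misses = 0
for addr in [A, B] * 4:               # A, B, A, B, ...
    index, tag = split(addr)
    if cache.get(index) == tag:
        hits += 1
    else:
        misses += 1
        cache[index] = tag            # the other block is evicted
print("hits =", hits, "misses =", misses)   # hits = 0, misses = 8 -> 0% hit rate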

Set Associativity

Associative memory within the set:
-- More complex, slower access, larger tag store
+ Accommodates conflicts better (fewer conflict misses)

Modern processors have multiple interacting caches on chip. The operation of a particular cache can be completely specified by:
- the cache size
- the cache block size
- the number of blocks in a set
- the cache set replacement policy
- the cache write policy (write-through or write-back)

While all of the cache blocks in a particular cache are the same size and have the same associativity, typically "lower-level" caches (such as the L1 cache) have a smaller size, smaller blocks, and fewer blocks in a set, while "higher-level" caches (such as the L3 cache) have a larger size, larger blocks, and more blocks in a set.

Question: In a certain system the main memory access time is 100 ns. The cache is 10 times faster than the main memory and uses the write-through protocol. If the hit ratio for read requests is 0.92 and 85% of the memory requests generated by the CPU are reads, the remainder being writes, find the average access time considering both read and write requests.

Memory access time = 100 ns, so cache access time = 10 ns (10 times faster). To find the average time we use the formula

Tavg = h*c + (1 - h)*M

where h = hit rate, (1 - h) = miss rate, c = time to access information from the cache, and M = miss penalty (time to access main memory).

Write-through operation: the cache location and the main memory location are updated simultaneously. It is given that 85% of the requests generated by the CPU are reads and 15% are writes.

Tavg = 0.85 × (average time for a read request) + 0.15 × (average time for a write request)
     = 0.85 × (0.92 × 10 + 0.08 × 100) + 0.15 × (average time for a write request)

Here 0.92 is the hit ratio for read requests, but the hit ratio for write requests is not given. If we assume the write hit ratio equals the read hit ratio, then

Tavg = 0.85 × (0.92 × 10 + 0.08 × 100) + 0.15 × (0.92 × (10 + 100) + 0.08 × 100) = 31 ns

If we instead assume a 0% hit ratio for write requests, then

Tavg = 0.85 × (0.92 × 10 + 0.08 × 100) + 0.15 × (0 × 110 + 1 × 100) = 29.62 ns

Average access time considering only reads = 0.92 × 10 + 0.08 × 100 = 17.2 ns.
Average access time considering only writes = 100 ns (because with write-through you must go to memory whether it is a hit or a miss; even assuming a 0.5 hit ratio, 0.5 × 100 + 0.5 × 100 = 100 ns).

So the total access time for both reads and writes is 0.85 × 17.2 + 0.15 × 100 = 14.62 + 15 = 29.62 ns.

Note: you cannot assume the write hit ratio is the same as the read hit ratio. For a write request with write-through, whatever the case, you have to write to memory, so the write access time equals the memory access time.
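The same arithmetic can be checked with a few lines of Python; the variable names are illustrative, and the write path is modeled under the stated assumption that a write-through write always costs a full memory access.

t_mem, t_cache = 100, 10                           # main memory and cache access times in ns
h_read, read_frac = 0.92, 0.85                     # read hit ratio, fraction of reads

t_read = h_read * t_cache + (1 - h_read) * t_mem   # 0.92*10 + 0.08*100 = 17.2 ns
t_write = t_mem                                    # write-through: every write reaches memory
t_avg = read_frac * t_read + (1 - read_frac) * t_write
print(t_read, t_avg)                               # 17.2 ns and 29.62 ns (up to float rounding)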

Question 1. What advantage does a Harvard cache have over a unified cache?

A Harvard (split) cache permits the processor to access both an instruction word and a data word in a single cache memory cycle. This can significantly increase the performance of the processor. The only real drawback of a Harvard cache is that the processor needs two memory busses (two complete sets of address and data lines), one to each of the caches. However, when the Harvard cache is implemented within the processor chip, the separate memory busses never have to leave the chip, being implemented completely in the metal interconnections on the chip.

Question 2. Why would you want the system I/O to go directly to main memory rather than through the processor cache?

If the system I/O went through the processor cache, it would cause contention with the processor trying to use the cache. The processor-to-cache bus is almost 100% utilized in most implementations, and any time the system I/O tried to use it the processor would have to wait. On the other hand, the main memory bus in a well-designed uniprocessor system should have a much lower utilization. With a cache hit rate of 97% and a 10-to-1 ratio of main memory access time to cache access time, the main memory bus is only about 30% utilized by the processor.
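One way to read the 30% figure, under the rough assumption (mine, not from the notes) that every miss occupies the memory bus for ten cache-time slots while each access occupies the processor-cache bus for one:

miss_rate = 0.03                        # 97% hit rate
mem_to_cache_ratio = 10                 # main memory access takes 10x a cache access
print(miss_rate * mem_to_cache_ratio)   # 0.3 -> roughly 30% memory-bus utilization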

Question 3. If an 8-way set-associative cache is made up of 32-bit words, 4 words per line, and 4096 sets, how big is the cache in bytes?

We convert words per line to bytes per line: 4 bytes/word × 4 words/line = 16 bytes/line. The cache size is L × K × N = 16 × 8 × 4096 = 512K bytes, where L is the line size in bytes, K the associativity, and N the number of sets.
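The same size arithmetic in Python, assuming 32-bit (4-byte) words as in the answer above:

bytes_per_word, words_per_line = 4, 4          # 32-bit words, 4 words per line
ways, sets = 8, 4096
line_bytes = bytes_per_word * words_per_line   # 16 bytes per line
cache_bytes = line_bytes * ways * sets         # L x K x N
print(cache_bytes, cache_bytes // 1024)        # 524288 bytes = 512 Kbytes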

Question 4. If a memory system consists of a single external cache with an access time of 20 ns and a hit rate of 0.92, and a main memory with an access time of 60 ns, what is the effective memory access time of this system?

t(eff) = 20 + (0.08)(60) = 24.8 ns

Question 5. We now add virtual memory to the system described in Question 4. The TLB is implemented internal to the processor chip and takes 2 ns to do a translation on a TLB hit. The TLB hit ratio is 98%, the segment table hit ratio is 100%, and the page table hit ratio is 50%. What is the effective memory access time of the system with virtual memory?

teff = tTLB + (1 – hTLB)(tSEG + tPAGE) + tCACHE + (1 - hCACHE)tMAIN

teff = 2 + 0.02(20 + 20 + 0.5(60)) + 20 + (0.08)(60) = 28.2 ns

This represents a drop in performance of (28.2 – 24.8)/24.8 ≈ 14%.
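The arithmetic of Questions 4 and 5 can be checked with the sketch below; it assumes, as in the answer above, that the 50% figure is the page-table hit ratio in the cache.

t_cache, t_main, h_cache = 20, 60, 0.92
t_eff_q4 = t_cache + (1 - h_cache) * t_main            # Question 4: 24.8 ns

t_tlb, h_tlb = 2, 0.98
t_seg = t_cache                                        # segment table: 100% cache hit
t_page = t_cache + 0.5 * t_main                        # page table: 50% cache hit (assumed)
t_eff_q5 = t_tlb + (1 - h_tlb) * (t_seg + t_page) + t_cache + (1 - h_cache) * t_main
print(t_eff_q4, t_eff_q5, (t_eff_q5 - t_eff_q4) / t_eff_q4)   # 24.8, 28.2, about 0.14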

Question 6. LRU is almost universally used as the cache replacement policy. Why?

It is easy to implement and has been proven to be the best predictor of future use behind the OPT algorithm. However, OPT is not achievable, since it requires perfect knowledge of future use.

Question 7. What is the advantage of using a write-back cache instead of a write-through cache?

A write-back cache significantly reduces the utilization of (and thus contention on) the main memory bus, because programs normally write several times to the same cache line (and even the same word). With a write-through cache, we would have to use main memory cycles to write the data to main memory repeatedly. With a write-back cache, we only need to write the line when we displace it because of some other cache miss, which happens, on average, at the miss rate of the cache. For example, if the miss rate of the cache is 0.03, we would have to write back the line only 3 out of 100 times, significantly reducing contention on the main memory bus. This all assumes that we have a deep enough FIFO that the processor cache doesn't have to wait for the writes to complete.

Question 8. What is a disadvantage of using a write-allocate policy with a write-through cache?

The write-allocate policy means that I allocate a line in the cache on a write miss. This is opposed to the write no-allocate policy where I just write around the cache directly to main memory on a write miss. The disadvantage of allocating a line in the cache for a write miss is that I need to fill the rest of the cache line from main memory. This means that I need to read from main memory to fill the line as well as update the one word that was written by the processor in both the cache and main memory. When I am using a write-back cache, I do want to go ahead and allocate the cache slot in anticipation of having several more write hits where I don't go to main memory at all.

Question 9. What is "bus-snooping" used for?

Bus snooping is where the processor cache monitors the memory bus for other devices addressing memory locations that are being held in the cache. When it detects some other device (probably I/O) writing into a main memory location that is being held in the cache, it invalidates the corresponding cache line so that main memory and the cache are kept in sync.

Question 10. You find that it would be very inexpensive to implement a small, direct-mapped cache of 32K bytes with an access time of 30 ns. However, the hit rate would be only about 50%. If the main memory access time is 60 ns, does it make sense to implement the cache?

teff with no cache is 60 ns. teff with the cache is 30 + (0.5)(60) = 60 ns. It didn't get any better with the cache, so we shouldn't implement it. Actually, we would need to use memory interleaving to achieve the 60 ns without the cache, because the cycle time of DRAM is about twice as long as the access time.

The following questions are about the chapter "Cache Memory":

1- A set-associative cache consists of 64 lines, or slots, divided into four-line sets. Main memory contains 4K blocks of 128 words each. Show the format of main memory addresses.

Answer: The cache is divided into 16 sets of 4 lines each. Therefore, 4 bits are needed to identify the set number. Main memory consists of 4K = 2^12 blocks. Therefore, the set plus tag lengths must be 12 bits, and therefore the tag length is 8 bits. Each block contains 128 words. Therefore, 7 bits are needed to specify the word.

Main memory address:

TAG  SET  WORD
 8    4    7
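A short Python sketch of the field-width arithmetic for this problem (log2 of each count gives the number of bits; the variable names are illustrative):

import math

sets = 64 // 4                                  # 64 lines in four-line sets -> 16 sets
blocks = 4 * 1024                               # 4K blocks of main memory
words_per_block = 128

set_bits = int(math.log2(sets))                 # 4
word_bits = int(math.log2(words_per_block))     # 7
tag_bits = int(math.log2(blocks)) - set_bits    # 12 - 4 = 8
print(tag_bits, set_bits, word_bits)            # 8 4 7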

2- A two-way set-associative cache has lines of 16 bytes and a total size of 8 Kbytes. The 64-Mbyte main memory is byte addressable. Show the format of main memory addresses.

Answer: There are a total of 8 Kbytes/16 bytes = 512 lines in the cache. Thus the cache consists of 256 sets of 2 lines each. Therefore 8 bits are needed to identify the set number. For the 64-Mbyte main memory, a 26-bit address is needed. Main memory consists of 64 Mbytes/16 bytes = 2^22 blocks. Therefore, the set plus tag lengths must be 22 bits, so the tag length is 14 bits and the word field length is 4 bits.

Main memory address:

TAG  SET  WORD
 14   8    4

4- Consider a machine with a byte-addressable main memory of 2^16 bytes and a block size of 8 bytes. Assume that a direct-mapped cache consisting of 32 lines is used with this machine.
a. How is a 16-bit memory address divided into tag, line number, and byte number?
b. How many total bytes of memory can be stored in the cache?
c. Why is the tag also stored in the cache?

Answer: a. 8 leftmost bits = tag; 5 middle bits = line number; 3 rightmost bits = byte number.

TAG  LINE  WORD
 8    5     3

b. 256 bytes (32 lines × 8 bytes).

c. Because two items with two different memory addresses can be stored in the same place in the cache. The tag is used to distinguish between them.

5- Consider a memory system that uses a 32-bit address to address at the byte level, plus a cache that uses a 64-byte line size.
a. Assume a direct-mapped cache with a tag field in the address of 20 bits. Show the address format and determine the following parameters: number of addressable units, number of blocks in main memory, number of lines in cache, size of tag.
b. Assume an associative cache. Show the address format and determine the following parameters: number of addressable units, number of blocks in main memory, number of lines in cache, size of tag.
c. Assume a four-way set-associative cache with a tag field in the address of 9 bits. Show the address format and determine the following parameters: number of addressable units, number of blocks in main memory, number of lines in set, number of sets in cache, number of lines in cache, size of tag.

Answer:
a. Address format: Tag = 20 bits; Line = 6 bits; Word = 6 bits. Number of addressable units = 2^(s+w) = 2^32 bytes; number of blocks in main memory = 2^s = 2^26; number of lines in cache = 2^r = 2^6 = 64; size of tag = 20 bits.

b. Address format: Tag = 26 bits; Word = 6 bits. Number of addressable units = 2^(s+w) = 2^32 bytes; number of blocks in main memory = 2^s = 2^26; number of lines in cache = undetermined; size of tag = 26 bits.

c. Address format: Tag = 9 bits; Set = 17 bits; Word = 6 bits. Number of addressable units = 2^(s+w) = 2^32 bytes; number of blocks in main memory = 2^s = 2^26; number of lines in a set = k = 4; number of sets in cache = 2^d = 2^17; number of lines in cache = k × 2^d = 2^19; size of tag = 9 bits.

6- Consider a computer with the following characteristics: total of 1 Mbyte of main memory; word size of 1 byte; block size of 16 bytes; and cache size of 64 Kbytes.
a. For the main memory addresses of F0010, 01234, and CABBE, give the corresponding tag, cache line address, and word offsets for a direct-mapped cache.
b. Give any two main memory addresses with different tags that map to the same cache slot for a direct-mapped cache.
c. For the main memory addresses of F0010 and CABBE, give the corresponding tag and offset values for a fully associative cache.
d. For the main memory addresses of F0010 and CABBE, give the corresponding tag, cache set, and offset values for a two-way set-associative cache.

Answer:
a. Because the block size is 16 bytes and the word size is 1 byte, there are 16 words per block. We need 4 bits to indicate which word we want out of a block. Each cache line/slot matches a memory block, so each cache line contains 16 bytes. If the cache is 64 Kbytes, then 64 Kbytes/16 = 4096 cache lines. To address these 4096 cache lines, we need 12 bits (2^12 = 4096). Consequently, given a 20-bit (1 Mbyte) main memory address:
Bits 0-3 indicate the word offset (4 bits).
Bits 4-15 indicate the cache line/slot (12 bits).
Bits 16-19 indicate the tag (remaining 4 bits).

F0010 = 1111 0000 0000 0001 0000
Word offset = 0000 = 0
Line = 0000 0000 0001 = 001
Tag = 1111 = F

01234 = 0000 0001 0010 0011 0100
Word offset = 0100 = 4
Line = 0001 0010 0011 = 123
Tag = 0000 = 0

CABBE = 1100 1010 1011 1011 1110
Word offset = 1110 = E
Line = 1010 1011 1011 = ABB
Tag = 1100 = C

b. We need to pick any address where the line is the same but the tag (and optionally the word offset) is different. Here are two examples where the line is 1111 1111 1111:

Address 1: Word offset = 1111, Line = 1111 1111 1111, Tag = 0000 → Address = 0FFFF
Address 2: Word offset = 0001, Line = 1111 1111 1111, Tag = 0011 → Address = 3FFF1

c. With a fully associative cache, the address is split up into a TAG and a WORD OFFSET field. We no longer need to identify which line a memory block might map to, because a block can be in any line and we search every cache line in parallel. The word offset must be 4 bits to address each individual word in the 16-word block. This leaves 16 bits for the tag.

F0010: Word offset = 0h, Tag = F001h
CABBE: Word offset = Eh, Tag = CABBh

d. As computed in part a, we have 4096 cache lines. If we implement a two-way set-associative cache, we put two cache lines into one set. Our cache now holds 4096/2 = 2048 sets, where each set has two lines. To address these 2048 sets we need 11 bits (2^11 = 2048). Once we address a set, we simultaneously search both cache lines to see if one has a tag that matches the target. Our 20-bit address is now broken up as follows:
Bits 0-3 indicate the word offset.
Bits 4-14 indicate the cache set.
Bits 15-19 indicate the tag.

F0010 = 1111 0000 0000 0001 0000
Word offset = 0000 = 0
Cache set = 000 0000 0001 = 001
Tag = 11110 = 1 1110 = 1E

CABBE = 1100 1010 1011 1011 1110
Word offset = 1110 = E
Cache set = 010 1011 1011 = 2BB
Tag = 11001 = 1 1001 = 19
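A small Python sketch of the direct-mapped split used in part a (4 offset bits, 12 line bits, 4 tag bits); the function name is illustrative:

def split_direct_mapped(addr, offset_bits=4, line_bits=12):
    offset = addr & ((1 << offset_bits) - 1)
    line = (addr >> offset_bits) & ((1 << line_bits) - 1)
    tag = addr >> (offset_bits + line_bits)
    return tag, line, offset

for a in (0xF0010, 0x01234, 0xCABBE):
    tag, line, offset = split_direct_mapped(a)
    print("%05X: tag=%X line=%03X offset=%X" % (a, tag, line, offset))
    # F0010: tag=F line=001 offset=0
    # 01234: tag=0 line=123 offset=4
    # CABBE: tag=C line=ABB offset=E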