<<

CS/COE1541: Introduction to Architecture Dept. of University of Pittsburgh

http://www.cs.pitt.edu/~melhem/courses/1541p/index.html

Chapter 5: Exploiting the Lecture 2

Lecturer: Rami Melhem

1

The Basics of Caches

• Until specified otherwise, it will be assumed that a is one word of

• Three issues: – How do we know if a data item is in the ? – If it is, how do we find it? address – If it is not, what do we do?

Main CPU Cache memory • It boils down to – where do we put an item in the cache? – how do we identify the items in the cache? data

• Two solutions – put item anywhere in cache (associative cache) – associate specific locations to specific items (direct mapped cache)

Note: we will assume that memory requests are for words (4 ) although an instruction can address a 2 Fully associative cache

00000 00001 To cache a data word, d, whose is L: 00010 00011 •Put d in any location in the cache 00100 00101 • Tag the data with the memory address L. 00110 00111 01000 Cache index 01001 01010 01011 000 01001000 01100 Example: 001 001 01101 010 010 01110 • An 8-word cache 011 011 01111 (indexed by 000, … , 111) 100 11000100 10000 101 101 10001 • A 32-words memory 110 01110110 10010 111 111 10011 (addresses 00000, … , 11111) Tags Data 10100 10101 Cache 10110 10111 11000 Advantage: can fully utilize the cache capacity 11001 11010 Disadvantage: need to search all tags to find data 11011 11100 Memory word addresses 11101 11110 (without the 2- byte offset) 11111 Memory 3

Direct Mapped Cache (direct hashing)

00000 Assume that the size of the cache is N words. 00001 00010 To cache a data word, d, whose memory address is L: 00011 00100 •Put d in cache at index = L mod N least significant log N bits of L 00101 00110 • Tag the data with L / N most significant N-log N bits of L 00111 01000 01001 Example: 01010 • memory address L = 6 (00110) 01011 000 01100 • cached in index 6 mod 8 = 6 (110) 001 01 01101 010 01110 • Tagged with 6 / 8 = 0 (00) 011 01111 100 10000 101 10001 110 00 10010 • memory address L = 9 (01001) 111 10011 10100 • cached in index 9 mod 8 = 1 (001) Tags Data 10101 • Tagged with 9 / 8 = 1 (01) 10110 Cache 10111 11000 11001 Drawback: collision 11010 • memory address L = 25 (11001) 11011 Should “evict” Location 9 11100 • cached in index 25 mod 8 = 1 (001) 11101 before caching Location 25. 11110 • Tagged with 25 / 8 = 3 (11) 11111 4 Direct Mapped Cache (example of a 4KB cache)

4KB cache, 4Bytes/word Requested memory address (showing positions)  31 30 . . . 13 12 11 . . . 2 1 0 Cache size =210 words Byte offset Hit Tag 20 10 data Index A valid bit is used to Index valid Tag data indicate that a location 0 contains valid data 1 2

Assume that a memory address is 32 bit long 1021 1022 (byte address, as in MIPS) 1023  20 32 • 2 bits for byte offset • 10 bits for index • 20 bits for tag

• Use the 10-bit index to access the cache • Compare the stored tag with the 20 bit tag from the address • if tag match and valid bit = 1, then cache hit (Hit = 1) return data • Else cache miss (Hit = 0) 5

Hits vs. Misses in direct mapped cache

One of the following scenarios occur in the cache when the CPU requests a memory address to read or write memory location L:

• In case of a cache hit (tag match and valid bit = 1) – read from or write the new data into the cache at index LmodN

• In case of a miss: (tag mismatch or valid bit = 0) – Stall the CPU – If cache index “L mod N” contains valid data (valid bit =1), evict the current data  write the current data back to memory – Fetch the data from memory and deliver to cache – Read from or write the new data into the cache – Resume the execution of the CPU Write back L:L: 100300 Reason: if data was changed (due to write operations) after writewritereadread L=200 L+NL=300L L+N: xxx it was brought to cache, then L mod N: 300xxx100200 the change has to be reflected in memory Cache of size N Memory 6 Two policies when writing data into the cache

• Write back policy – uses a “dirty bit” – When you bring data into the cache, its dirty bit to 0 – On a write operation, write only into the cache and set the dirty bit to 1 – When the data is evicted from the cache, write the data back to memory only if the dirty bit = 1. write L=200 dirty bit Write back L: 100 L: 100200

10 L mod N: 200100 L mod N: 100200

• Write through policy – does not use a “dirty bit” – Every time you write into the cache, write also into memory – Hence, do not need to “write back” when you evict data from the cache – On a write miss, may even write only to memory and not bring data to the cache (called a “no write allocate policy”, as opposed to “write allocate”) 7

Storage overhead (example of a 4KB cache)

Memory address (showing bit positions) 31 30 . . . 13 12 11 . . . 2 1 0

Byte offset Hit Tag 20 10 data Index Index valid Tag data 0 1 2

1021 1022 1023 20 32

The total number of bits needed for a 4KB cache assuming 32-bit address: • 1024 * 32 bits for data • 1024 * 20 bits for tags 210 * 53 = 53 Kbits In , there is usually a “dirty bit” per cache entry. • 1024 * 1 valid bits 8 Exploiting spatial locality by using “multiple word” blocks

Block Block Example: two words per block Block offset (offset of the address offset word within the block) Block index 0000 0 tag data 0000 1 tag offset(0) offset(1) 00 0 block 0, word 0 0001 0 00 1 block 0, word 1 0001 1 00 01 0 block 1, word 0 0010 0 01 01 1 block 1, word 1 0010 1 10 10 0 block 2, word 0 0011 0 11 10 1 block 2, word 1 0011 1 Block 11 0 block 3, word 0 0100 0 index cache 11 1 block 3, word 1 0100 1 Word 0101 0 index 0101 1 cache 0110 0 0110 1 Alternate representations 0111 0 0111 1 1000 0 1000 1 Tag = (memory word address) / (cache size in words) 1001 0 1001 1 Word Index = (memory word address) mod (cache size in words) 1010 0 1010 1 1011 0 1011 1 Block Index = Word Index / block size in words 1100 0 1100 1 Block offset = Word Index mod block size in words 1101 0 1101 1 1110 0 1110 1 1111 0 1111 1 • One tag per block ==> improves efficiency Word memory • Write misses have to preserve block integrity address - Copy block from memory to cache - Write the word into the block in the cache. 9

Example: block size = 2 words

• When a CPU issues a memory address, it is decomposed into a tag and a word index • The word index is decomposed into block index and a block offset Byte Word address offset tag block index block offset 00 Memory word address block Word index tag data Word index Block address Word index

tag data data Block index

Equivalent formulas: Block address = memory word address / block size in words Tag = block address / cache size in blocks memory Block index = block address mod cache size in blocks 10 Example: 64KB cache with blocks of 4 words (16 bytes)

Word = 2 2 bytes 32-bits address (showing bit positions) 2 Block index (12 bits) Block offset (2 bits) Block = 2 words Tag 31 … 16 15 … 4 3 21 0 Cache = 212 blocks Byte offset (2 bits) 16 12 2 data Hit Tag Index Block offset 16?? bits 128 bits V Tag Data block (4 words)

4K entries

16 32 32 32 32

Mux 32

The total number of bits needed for a 64KB cache assuming 32-bit address is: • 212 * 128 bits for data • 212 * 16 bits for tags 212 * 145 = 580 Kbits • 212 * 1 valid bits 11