Copy-Back Cache Organisation for an Asynchronous Microprocessor
COPY-BACK CACHE ORGANISATION FOR AN ASYNCHRONOUS MICROPROCESSOR

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Science & Engineering

2002

DARANEE HORMDEE
Department of Computer Science

Contents

Contents
List of Figures
List of Tables
Abstract
Declaration
Copyright
The Author
Acknowledgements
Dedication

Chapter 1: Introduction
  1.1 Thesis organisation
  1.2 Research contributions

Chapter 2: Background Material
  2.1 Asynchronous design
    2.1.1 Claimed advantages
    2.1.2 Drawbacks
  2.2 Cache and memory hierarchy
    2.2.1 Locality of reference
    2.2.2 Hit or miss
    2.2.3 Cache line fetch
    2.2.4 Cache (physical) organisation
    2.2.5 Degree of associativity
    2.2.6 Cache replacement strategies
    2.2.7 Memory burst access
    2.2.8 Write policies
    2.2.9 Write buffering
  2.3 Summary

Chapter 3: Tuning Memory Hierarchy Performance
  3.1 Measuring performance
  3.2 Reducing cache hit time
  3.3 Reducing cache miss rate
    3.3.1 Larger cache size
    3.3.2 Longer cache line
    3.3.3 Higher degree of associativity
    3.3.4 Better replacement strategies
    3.3.5 Victim cache
  3.4 Reducing cache miss penalty
    3.4.1 Giving read misses priority over writes
    3.4.2 Line fetch mechanism
    3.4.3 Using multiple levels of cache
  3.5 Hiding latency
    3.5.1 Prefetching
    3.5.2 Pipelining
  3.6 Reducing memory traffic
    3.6.1 Write merging
    3.6.2 Copy-back write policy
  3.7 Other Notable Techniques
    3.7.1 Sub-blocking
    3.7.2 Cache lock-down
  3.8 Commercial Cache Implementations
    3.8.1 The AMD-K6-III cache system
    3.8.2 The Intel Pentium 4 cache system
    3.8.3 The Intel StrongARM SA-1110 cache system
    3.8.4 The ARM940T cache system
    3.8.5 The Sun UltraSPARC III cache system
    3.8.6 The IBM PowerPC 405 cache system
  3.9 Discussion
  3.10 Summary

Chapter 4: Asynchronous Memories
  4.1 Asynchronous processor survey
  4.2 Asynchronous cache systems
    4.2.1 The ECSTAC cache system
    4.2.2 The TITAC-2 cache system
    4.2.3 The Caltech MiniMIPS cache system
    4.2.4 The Kin memory system
  4.3 AMULET memory systems
    4.3.1 The AMULET2e cache system
    4.3.2 The AMULET3i dual-port RAM system
  4.4 Observations
  4.5 Summary

Chapter 5: An Asynchronous Copy-back Cache
  5.1 Environment
  5.2 Basic architecture
  5.3 Pseudo two-level cache structure
    5.3.1 ‘Cache hit’
    5.3.2 ‘Cache miss’
  5.4 Line fetch engine
  5.5 Line allocation mechanism
  5.6 Cache operations
    5.6.1 Line-buffer read hit
    5.6.2 Line-buffer write hit
    5.6.3 Cache RAM read hit
    5.6.4 Cache RAM write hit
    5.6.5 LFL read hit
    5.6.6 LFL write hit
    5.6.7 Read miss
    5.6.8 Write miss
  5.7 Exploiting sequentiality
  5.8 Timing in a non-blocking line fetch mechanism
    5.8.1 Hits and misses in a non-blocking scheme
    5.8.2 Handling writes
  5.9 Resolving ordering problems
    5.9.1 Inter-block data