Verification of Hierarchical Cache Coherence Protocols for Future Processors

VERIFICATION OF HIERARCHICAL CACHE COHERENCE PROTOCOLS FOR FUTURE PROCESSORS

by Xiaofang Chen

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science

School of Computing
The University of Utah
May 2008

Copyright © Xiaofang Chen 2008. All Rights Reserved.

THE UNIVERSITY OF UTAH GRADUATE SCHOOL

SUPERVISORY COMMITTEE APPROVAL

of a dissertation submitted by Xiaofang Chen

This dissertation has been read by each member of the following supervisory committee and by majority vote has been found to be satisfactory.

Chair: Ganesh L. Gopalakrishnan
Steven M. German
Ching-Tsun Chou
John B. Carter
Rajeev Balasubramonian

THE UNIVERSITY OF UTAH GRADUATE SCHOOL

FINAL READING APPROVAL

To the Graduate Council of the University of Utah:

I have read the dissertation of Xiaofang Chen in its final form and have found that (1) its format, citations, and bibliographic style are consistent and acceptable; (2) its illustrative materials including figures, tables, and charts are in place; and (3) the final manuscript is satisfactory to the Supervisory Committee and is ready for submission to The Graduate School.

Date
Ganesh L. Gopalakrishnan, Chair: Supervisory Committee

Approved for the Major Department
Martin Berzins, Chair/Director

Approved for the Graduate Council
David S. Chapman, Dean of The Graduate School

ABSTRACT

The advancement of technology promises to make chip multiprocessors, or multicores, ubiquitous. With multicores, there naturally exists a memory hierarchy across which caches have to be kept coherent. Currently, large (hierarchical) cache coherence protocols are verified either at the high (specification) level or at the low (RTL implementation) level. Experience shows that model checking at the high level can detect and eliminate almost all the high level concurrency bugs.
Most RTL implementations are too complex to be verified monolithically, and simulation, semi-formal, or formal techniques are often used. With hierarchical cache coherence protocols, there exist two unsolved problems: (i) handling the complexity of several coherence protocols running concurrently, and (ii) verifying that the RTL implementations correctly implement the specifications.

To solve the first problem, we propose to use assume-guarantee reasoning to decompose a hierarchical coherence protocol into a set of abstract protocols. The approach is conservative, in the sense that by verifying these abstract protocols, the original hierarchical protocol is guaranteed to be correct with respect to its properties.

For the second problem, we propose a formal theory to check the refinement relationship, i.e., the conditions under which we can claim that an RTL implementation correctly implements a high level specification. We also propose a compositional approach using abstraction and assume-guarantee reasoning to reduce the verification complexity of the refinement check. Finally, we propose to extend existing work in industry to mechanize the refinement check.

We propose research that will lead to a characterization of the proposed mechanisms through several verification tools and protocol benchmarks. Our preliminary experiments show promising results. For high level modeling and verification, we show that for three hierarchical protocols with different features, which we developed for multiple chip-multiprocessors, the explicit state space can be reduced by more than 95%. For the refinement check, we show that for a driving coherence protocol example with realistic hardware features, the refinement check can find subtle bugs which are easy to miss by checking coherence properties alone.
Furthermore, we show that for the protocol example, our compositional approach can finish the refinement check within 30 minutes, while a state-of-the-art industrial verification tool cannot finish in over a day. Finally, we extend a hardware language and a tool to mechanize the refinement check process. We believe that the proposed research will help solve some of the problems for future processors.

CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES

CHAPTERS

1. INTRODUCTION
   1.1 Verification of Cache Coherence Protocols
   1.2 Problem Statement
   1.3 Thesis Statement
   1.4 Contributions
   1.5 Outline

2. BACKGROUND AND RELATED WORK
   2.1 Cache Coherence States
       2.1.1 The MSI Protocol
       2.1.2 The MESI Protocol
   2.2 Common Cache Coherence Protocols
       2.2.1 Snooping Protocols
       2.2.2 Directory-based Coherence Protocols
       2.2.3 Token Coherence Protocols
   2.3 Inclusive, Exclusive and Non-inclusive Caching Hierarchies
   2.4 Common Coherence Properties to Be Verified
   2.5 Related Work
       2.5.1 Verifying Hierarchical Cache Coherence Protocols in the High Level
       2.5.2 Checking Refinement between Protocol Specifications and Implementations

3. VERIFICATION OF HIERARCHICAL CACHE COHERENCE PROTOCOLS IN THE HIGH LEVEL SPECIFICATIONS
   3.1 Benchmarking Hierarchical Coherence Protocols
       3.1.1 The First Benchmark: An Inclusive Multicore Coherence Protocol
       3.1.2 The Second Benchmark: A Non-inclusive Multicore Protocol
       3.1.3 The Third Benchmark: A Multicore Protocol With Snooping
   3.2 A Compositional Framework to Verify Hierarchical Coherence Protocols
       3.2.1 Abstraction
       3.2.2 Assume Guarantee Reasoning
       3.2.3 Experimental Results
       3.2.4 Soundness Proof
   3.3 Verification of Hierarchical Coherence Protocols One Level At A Time
       3.3.1 Building Abstract Protocols One Level At A Time
       3.3.2 When Can “Per-Level” Abstraction Be Applied?
       3.3.3 Using History Variables in Assume Guarantee Reasoning
       3.3.4 Experimental Results
       3.3.5 Soundness Proof
   3.4 Summary

4. CHECKING REFINEMENT BETWEEN HIGH LEVEL PROTOCOL SPECIFICATIONS AND RTL IMPLEMENTATIONS
   4.1 Some Preliminaries to The Formal Definition of Refinement
   4.2 Checking Refinement
   4.3 Monolithic Model Checking
       4.3.1 Checking Correct Implementation
   4.4 Compositional Model Checking
       4.4.1 Abstraction
       4.4.2 Assume Guarantee Reasoning
   4.5 An Implementation of the Compositional Model Checking
   4.6 Muv: A Transaction Based Refinement Checker
       4.6.1 Hardware Murphi
       4.6.2 An Overview of Muv
       4.6.3 Assertion Generation for Refinement Check
       4.6.4 Experimental Benchmarks
       4.6.5 Experimental Results
   4.7 Summary

5. CONCLUSION

REFERENCES

LIST OF FIGURES

2.1 A CMP with two levels of caches.
3.1 A 2-level hierarchical cache coherence protocol.
3.2 A scenario that can lead to the livelock problem.
3.3 A sample scenario of the hierarchical protocol.
3.4 Imprecise information of the local directory.
3.5 A multicore cache coherence protocol with snooping.
3.6 The workflow of our approach.
3.7 An abstract protocol.
3.8 The data structures of an original and an abstract cluster.
3.9 The Murphi rule for the writeback example.
3.10 Refine the writeback example.
3.11 A sample scenario of the genuine bug in the original hierarchical protocol.
3.12 Verification complexity of the hierarchical inclusive protocol.
3.13 The abstract protocols obtained via the new decomposition approach.
3.14 An abstracted cluster in the intra- and inter-cluster protocols.
3.15 Example rules of tightly and loosely coupled modeling.
3.16 The writeback example before/after abstraction, and during refinement.
3.17 Verification complexity on the three benchmark protocols.
4.1 The executions of MMC, ML, and MH.
4.2 The proposed workflow of the refinement check.
4.3 An example of transaction.
4.4 An overview of the implementation protocol.

LIST OF TABLES

2.1 MSI state transitions
2.2 MESI state transitions

CHAPTER 1

INTRODUCTION

Multicore architectures are considered inevitable, given that sequential processing hardware has hit various limits. Two classes of related protocols will be used in multicore processors: cache coherence protocols that manage the coherence of individual cache lines, and shared memory consistency protocols that manage the view across multiple addresses. These protocols will also have non-trivial relationships with other protocols, e.g., hardware transaction memory protocols [22,87] and power management protocols [32]. For example, when a hardware transaction wins and commits, the cache lines affected by the losing transaction must typically all be invalidated. We will for now consider only coherence protocols, assuming the existence of suitable methods to decouple and verify related protocols such as hardware transaction memory protocols.

With multicores, there naturally exists a memory hierarchy: the L1 cache of each core, the L2 and/or the L3 caches, the memory shared within a chip, the memory shared among the chips, etc. While there will be many differences from one
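
The text above describes cache coherence protocols as managing the coherence of individual cache lines, and the abstract notes that model checking at the specification level catches most concurrency bugs. As a rough illustration of what such a check involves, the following Python sketch exhaustively explores a toy, single-line MSI protocol and tests a single-writer/multiple-reader invariant in every reachable state. It is only a sketch under strong assumptions: every transition is atomic (no in-flight messages, directories, or hierarchy), and the names `successors`, `coherent`, and `explore` are hypothetical. It is not one of the dissertation's benchmark protocols or Murphi models.

```python
from collections import deque

# Each cache holds one line in state "M" (Modified), "S" (Shared) or "I" (Invalid).
# A global state is a tuple with one entry per cache.

def successors(state):
    """Yield every state reachable in one atomic step of the toy MSI protocol."""
    n = len(state)
    for i in range(n):
        # Read request from an Invalid cache: a Modified copy elsewhere is
        # downgraded to Shared, and the requester obtains a Shared copy.
        if state[i] == "I":
            nxt = ["S" if s == "M" else s for s in state]
            nxt[i] = "S"
            yield tuple(nxt)
        # Write request: every other copy is invalidated, requester becomes Modified.
        if state[i] != "M":
            nxt = ["I"] * n
            nxt[i] = "M"
            yield tuple(nxt)
        # Eviction: the cache gives up its copy.
        if state[i] != "I":
            nxt = list(state)
            nxt[i] = "I"
            yield tuple(nxt)

def coherent(state):
    """Single-writer/multiple-reader invariant: a Modified copy excludes all others."""
    writers = state.count("M")
    readers = state.count("S")
    return writers == 0 or (writers == 1 and readers == 0)

def explore(n):
    """Breadth-first search over all reachable states, checking the invariant in each."""
    init = ("I",) * n
    seen = {init}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not coherent(state):
            raise AssertionError(f"coherence violated in {state}")
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, "caches:", len(explore(n)), "reachable states")
```

Even in this toy model the number of reachable states grows exponentially with the number of caches, which hints at why realistic hierarchical protocols, with their non-atomic transactions and multiple levels, call for decomposition techniques such as those outlined in the abstract.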