VERIFICATION OF HIERARCHICAL PROTOCOLS FOR FUTURE PROCESSORS

by

Xiaofang Chen

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in

Computer Science

School of Computing

The University of Utah

May 2008

Copyright © Xiaofang Chen 2008

All Rights Reserved

THE UNIVERSITY OF UTAH GRADUATE SCHOOL

SUPERVISORY COMMITTEE APPROVAL

of a dissertation submitted by

Xiaofang Chen

This dissertation has been read by each member of the following supervisory committee and by majority vote has been found to be satisfactory.

Chair: Ganesh L. Gopalakrishnan

Steven M. German

Ching-Tsun Chou

John B. Carter

Rajeev Balasubramonian

THE UNIVERSITY OF UTAH GRADUATE SCHOOL

FINAL READING APPROVAL

To the Graduate Council of the University of Utah:

I have read the dissertation of Xiaofang Chen in its final form and have found that (1) its format, citations, and bibliographic style are consistent and acceptable; (2) its illustrative materials including figures, tables, and charts are in place; and (3) the final manuscript is satisfactory to the Supervisory Committee and is ready for submission to The Graduate School.

Date Ganesh L. Gopalakrishnan Chair: Supervisory Committee

Approved for the Major Department

Martin Berzins Chair/Director

Approved for the Graduate Council

David S. Chapman, Dean of The Graduate School

ABSTRACT

The advancement of technology promises to make chip multiprocessors, or multicores, ubiquitous. With multicores, there naturally exists a memory hierarchy across which caches have to be kept coherent. Currently, large (hierarchical) cache coherence protocols are verified at either the high (specification) level or the low (RTL implementation) level. Experience shows that model checking at the high level can detect and eliminate almost all high-level concurrency bugs. Most RTL implementations are too complex to be verified monolithically, so simulation, semi-formal, or formal techniques are often used. With hierarchical cache coherence protocols, two problems remain unsolved: (i) handling the complexity of several coherence protocols running concurrently, and (ii) verifying that the RTL implementations correctly implement the specifications. To solve the first problem, we propose to use assume-guarantee reasoning to decompose a hierarchical coherence protocol into a set of abstract protocols. The approach is conservative, in the sense that by verifying these abstract protocols, the original hierarchical protocol is guaranteed to be correct with respect to its properties. For the second problem, we propose a formal theory to check the refinement relationship, i.e., under which conditions we can claim that an RTL implementation correctly implements a high level specification. We also propose a compositional approach using abstraction and assume-guarantee reasoning to reduce the verification complexity of refinement checking. Finally, we propose to extend existing work in industry to mechanize the refinement check. We propose research that will lead to a characterization of the proposed mechanisms through several verification tools and protocol benchmarks. Our preliminary experiments show promising results.
For high level modeling and verification, we show that for three hierarchical protocols with different features, which we developed for multiple chip-multiprocessors, more than 95% of the explicit state space can be reduced. For refinement checking, we show that for a driving coherence protocol example with realistic hardware features, refinement checking can find subtle bugs that are easy to miss when checking coherence properties alone. Furthermore, we show that for the protocol example, our compositional approach can finish the refinement check within 30 minutes, while a state-of-the-art industrial verification tool cannot finish in over a day. Finally, we extend a hardware language and a tool to mechanize the refinement check process. We believe that the proposed research will help solve some of the problems for future processors.

CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES

CHAPTERS

1. INTRODUCTION
   1.1 Verification of Cache Coherence Protocols
   1.2 Problem Statement
   1.3 Thesis Statement
   1.4 Contributions
   1.5 Outline

2. BACKGROUND AND RELATED WORK
   2.1 Cache Coherence States
   2.1.1 The MSI Protocol
   2.1.2 The MESI Protocol
   2.2 Common Cache Coherence Protocols
   2.2.1 Snooping Protocols
   2.2.2 Directory-based Coherence Protocols
   2.2.3 Token Coherence Protocols
   2.3 Inclusive, Exclusive and Non-inclusive Caching Hierarchies
   2.4 Common Coherence Properties to Be Verified
   2.5 Related Work
   2.5.1 Verifying Hierarchical Cache Coherence Protocols in the High Level
   2.5.2 Checking Refinement between Protocol Specifications and Implementations

3. VERIFICATION OF HIERARCHICAL CACHE COHERENCE PROTOCOLS IN THE HIGH LEVEL SPECIFICATIONS
   3.1 Benchmarking Hierarchical Coherence Protocols
   3.1.1 The First Benchmark: An Inclusive Multicore Coherence Protocol
   3.1.2 The Second Benchmark: A Non-inclusive Multicore Protocol
   3.1.3 The Third Benchmark: A Multicore Protocol With Snooping
   3.2 A Compositional Framework to Verify Hierarchical Coherence Protocols
   3.2.1 Abstraction
   3.2.2 Assume Guarantee Reasoning
   3.2.3 Experimental Results
   3.2.4 Soundness Proof
   3.3 Verification of Hierarchical Coherence Protocols One Level At A Time
   3.3.1 Building Abstract Protocols One Level At A Time
   3.3.2 When Can “Per-Level” Abstraction Be Applied?
   3.3.3 Using History Variables in Assume Guarantee Reasoning
   3.3.4 Experimental Results
   3.3.5 Soundness Proof
   3.4 Summary

4. CHECKING REFINEMENT BETWEEN HIGH LEVEL PROTOCOL SPECIFICATIONS AND RTL IMPLEMENTATIONS
   4.1 Some Preliminaries to The Formal Definition of Refinement
   4.2 Checking Refinement
   4.3 Monolithic Model Checking
   4.3.1 Checking Correct Implementation
   4.4 Compositional Model Checking
   4.4.1 Abstraction
   4.4.2 Assume Guarantee Reasoning
   4.5 An Implementation of the Compositional Model Checking
   4.6 Muv: A Transaction Based Refinement Checker
   4.6.1 Hardware Murphi
   4.6.2 An Overview of Muv
   4.6.3 Assertion Generation for Refinement Check
   4.6.4 Experimental Benchmarks
   4.6.5 Experimental Results
   4.7 Summary

5. CONCLUSION

REFERENCES

LIST OF FIGURES

2.1 A CMP with two levels of caches.
3.1 A 2-level hierarchical cache coherence protocol.
3.2 A scenario that can lead to the livelock problem.
3.3 A sample scenario of the hierarchical protocol.
3.4 Imprecise information of the local directory.
3.5 A multicore cache coherence protocol with snooping.
3.6 The workflow of our approach.
3.7 An abstract protocol.
3.8 The data structures of an original and an abstract cluster.
3.9 The Murphi rule for the writeback example.
3.10 Refine the writeback example.
3.11 A sample scenario of the genuine bug in the original hierarchical protocol.
3.12 Verification complexity of the hierarchical inclusive protocol.
3.13 The abstract protocols obtained via the new decomposition approach.
3.14 An abstracted cluster in the intra- and inter-cluster protocols.
3.15 Example rules of tightly and loosely coupled modeling.
3.16 The writeback example before/after abstraction, and during refinement.
3.17 Verification complexity on the three benchmark protocols.

4.1 The executions of MMC, ML, and MH.
4.2 The proposed workflow of the refinement check.
4.3 An example of a transaction.
4.4 An overview of the implementation protocol.

LIST OF TABLES

2.1 MSI state transitions
2.2 MESI state transitions

CHAPTER 1

INTRODUCTION

Multicore architectures are considered inevitable, given that sequential processing hardware has hit various limits. Two classes of related protocols will be used in multicore processors: cache coherence protocols that manage the coherence of individual cache lines, and shared memory consistency protocols that manage the view across multiple addresses. These protocols will also have non-trivial relationships with other protocols, e.g. hardware transactional memory protocols [22,87] and power management protocols [32]. For example, when a hardware transaction wins and commits, the cache lines affected by the losing transaction must all, typically, be invalidated. We will for now consider only coherence protocols, assuming the existence of suitable methods to decouple and verify related protocols such as hardware transactional memory protocols. With multicores, there naturally exists a memory hierarchy: the L1 cache of each core, the L2 and/or the L3 caches, the memory shared within a chip, the memory shared among the chips, etc. While there will be many differences from one multicore CPU to another, the high level architecture of these chips will be similar; i.e., they will all have multiple cores connected by some on-die interconnect, with large caches or even addressable memories inside each set of CPUs. Although there is debate over whether coherence will be enforced globally in systems ten years from now, when the number of cores moves into the hundreds and the size of memory hits 512GB, there is no doubt that coherence protocols will still be employed locally, and they will be hierarchical. For example, in [12], it is predicted that more flexible or even reconfigurable data coherency schemes will be employed for future processors. Future processors will employ extremely complex cache coherence protocols, to ensure high performance and to accommodate manufacturing realities such as unordered

coherence message networks. For example, multicore chip wiring is an important topic that relates to cache coherence protocols. There will be multiple wire types on multicore chips, characterized by speed and availability (number available, ease of routability, and so on). Many recent studies [31,32,89] show the advantages of judiciously employing fast but limited-quantity wires for critical protocol messages, and slow but abundant wires for other messages. The allocation of wires in this manner will, unfortunately, make the protocols very complex, because the resulting protocols will in most cases be unable to exploit the arrival order of messages. Because of the high performance demands placed on shared memory subsystems, aggressively optimized cache coherence protocols will be employed for future multiprocessors. These protocols will be hierarchically organized, with different protocols used for different levels of the hierarchy. Such protocols will individually be quite complex, which makes it virtually impossible to debug them through testing alone. On the other hand, next to floating-point bugs, memory system bugs, including cache coherence protocol bugs, are among the most insidious: they are very difficult to work around through microcode patches, due to the fast hardwired logic used. They can also lead to software vulnerabilities, security holes, or even major chip recalls. Detecting these bugs through post-silicon platform testing is very unlikely to succeed, due to the very limited range of coherence traffic that can be generated within the chip through tests alone. Thus formal verification of hierarchical coherence protocols must be achieved pre-silicon. Although there have been many previous hierarchical protocols, e.g. non-CMP [23,46,52,69,70] or CMP [14,111], we are not aware of any published work that has reported formal verification of a hierarchical coherence protocol with reasonable complexity. Most previous work considers only one layer of the hierarchy at a time, manually abstracting away other layers.

1.1 Verification of Cache Coherence Protocols

Currently, a common approach to the verification of cache coherence protocols is to verify either the high level protocol descriptions (specifications) or the RTL implementations. For verification of high level specifications, modern industrial practice usually models small instances of the protocols, e.g. three CPUs handling two addresses and one bit of data, in terms of interleaving atomic steps, in guard/action languages such as Murphi [38] or TLA+ [67], and explores the reachable states through explicit state enumeration. Symbolic methods, e.g. BDDs [25] or SAT [105,116], are yet to succeed here. For example, in [63], it was reported that a SAT solver failed to handle a modest benchmark protocol with about 40,000 state variables in a few pages of description, while the same set of techniques could handle a microprocessor model with on the order of one million state variables. One possible reason is that most of the state variables are relevant in protocol property verification, while only a small fraction of the state variables are relevant in CPU property verification. This shows that BDDs and SAT, while in many cases enabling scalable formal verification, can be ineffective for coherence protocols – protocols where localized reasoning is difficult without specialized, hand-crafted localization abstractions. Accumulated experience, from Yang et al.'s early work on UltraSparc-1 [115] to more modern efforts [16,33,44], shows that designs captured as high level descriptions can be model checked to eliminate virtually all high level concurrency bugs. Monolithic formal verification methods – methods that treat the protocol as a whole – have been used fairly routinely for verifying cache coherence protocols since the early 1990s [8,48,115]. However, these monolithic techniques will not be able to handle the huge state space of hierarchical protocols. Compositional verification techniques are essential. Scalable formal verification of hierarchical cache coherence protocols, as well as shared memory consistency protocols, is crucial to the advancement of multicore processors as a viable computing platform. This dissertation is about ensuring the correctness of these protocols through scalable formal verification methods.
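The explicit state enumeration performed by tools like Murphi can be pictured as a breadth-first search over guard/action rules. The following Python stand-in is a hypothetical miniature, not Murphi itself: the model (two clients racing for one lock), the rule names, and the state encoding are all invented for illustration.

```python
from collections import deque

# A state is a tuple (client0, client1, lock); each rule is a
# (name, guard, action) triple. Breadth-first search enumerates every
# reachable state and checks an invariant at each one.
RULES = [
    ("p0 acquires", lambda s: s[0] == "idle" and s[2] == "free",
                    lambda s: ("held", s[1], "taken")),
    ("p0 releases", lambda s: s[0] == "held",
                    lambda s: ("idle", s[1], "free")),
    ("p1 acquires", lambda s: s[1] == "idle" and s[2] == "free",
                    lambda s: (s[0], "held", "taken")),
    ("p1 releases", lambda s: s[1] == "held",
                    lambda s: (s[0], "idle", "free")),
]

def mutual_exclusion(s):
    # The safety property: both clients never hold the lock at once.
    return not (s[0] == "held" and s[1] == "held")

def explore(init, rules, invariant):
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        assert invariant(s), f"invariant violated in {s}"
        for _name, guard, action in rules:
            if guard(s):
                t = action(s)
                if t not in seen:
                    seen.add(t)
                    frontier.append(t)
    return seen

states = explore(("idle", "idle", "free"), RULES, mutual_exclusion)
assert len(states) == 3   # idle/idle/free, held/idle/taken, idle/held/taken
```

The small-instance modeling described above keeps `seen` finite; for hierarchical protocols the same search blows up, which is what motivates the compositional techniques of this dissertation.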

Moreover, verification of coherence protocols only at the high level is insufficient. The semantic gap between high level protocol descriptions and their hardware implementations is non-trivial. One atomic step of a high level description is typically realized through a hardware transaction (or simply “transaction”), which is a multi-cycle activity consisting of one or more concurrent steps in each clock cycle. For example, an atomic step “invalidate a cache line” in the high level description can be implemented by three steps: (i) calculate the victim addresses, (ii) form the invalidation messages, and (iii) send them out. This maximizes overlapped computation and pipelining, and takes advantage of internal buffers, split transaction buses, etc. Moreover, a sequence of high level atomic steps may actually be realized through multiple transactions whose time intervals overlap. For example, the third cycle of one transaction, consisting of two concurrent steps, may fall in the same clock cycle as the first cycle of another transaction, also consisting of two concurrent steps. Unfortunately, there have been very few attempts at bridging the verification gap between high level protocol descriptions and their clocked implementations. Techniques to verify the gap between protocol specifications and implementations are essential.
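The atomic-step-versus-transaction gap can be sketched as follows. The three-stage split, cache names, and addresses below are illustrative assumptions, not the dissertation's actual design; the point is only that a multi-cycle implementation transaction should end in the same state as one atomic specification step.

```python
# One atomic specification step realized as a three-stage transaction.

def spec_invalidate(caches, addr):
    # High-level atomic step: every copy of `addr` disappears at once.
    return {c: {a: st for a, st in lines.items() if a != addr}
            for c, lines in caches.items()}

def impl_invalidate(caches, addr):
    # Cycle 1: calculate the victims (here, the caches holding addr).
    victims = [c for c, lines in caches.items() if addr in lines]
    # Cycle 2: form one invalidation message per victim.
    msgs = [("INV", addr, c) for c in victims]
    # Cycle 3: send the messages out and apply them.
    caches = {c: dict(lines) for c, lines in caches.items()}
    for _, a, c in msgs:
        del caches[c][a]
    return caches

caches = {"c0": {0x10: "S"}, "c1": {0x10: "S", 0x20: "M"}}
# The multi-cycle transaction ends in the same state as the atomic step.
assert impl_invalidate(caches, 0x10) == spec_invalidate(caches, 0x10)
```

In real hardware the three cycles of this transaction may overlap with cycles of other transactions, which is precisely what makes relating the two levels non-trivial.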

1.2 Problem Statement

In summary, for future hierarchical cache coherence protocols, two problems must be solved. First, high level specification protocols may be impossible to verify formally, because of the Cartesian product of the state spaces of several coherence protocols running concurrently. This is true, as we will show in Section 3.1, for a protocol used for multiple chip multiprocessors (M-CMPs). This protocol models one address accessed by three NUMA (Non-Uniform Memory Access) chip multiprocessors (CMPs). We show that the high level specification protocol cannot be verified monolithically, using existing explicit state enumeration techniques, even with 18GB of memory and 40-bit hash compaction. Second, for RTL implementations of industrial protocols, even if we can formally verify the correctness of the high level specifications, bugs can be introduced in obtaining the RTL implementations. In Section 4.6, we will show that for an RTL implementation of a simple coherence protocol with some realistic features, although the coherence properties can be verified, the implementation can still be buggy. For the first problem, as there is currently no publicly available hierarchical coherence protocol of reasonable complexity, we first propose to develop several hierarchical cache coherence protocols for M-CMPs, with different features, as benchmarks. We then propose a divide-and-conquer approach to decompose a hierarchical coherence protocol into a set of abstract protocols with smaller verification complexity. The proposed approach is conservative, in the sense that by verifying the abstract protocols, the original hierarchical protocol is guaranteed to be correct with respect to its coherence properties. Also, for verification of hierarchical protocols which use snooping protocols or a non-inclusive cache organization (see Section 3.1), we propose to use history variables [39,78] in an assume-guarantee manner, together with the divide-and-conquer approach. That is, we introduce a set of auxiliary variables into the protocols, where the value of each auxiliary variable is a function of the state variables already in the protocols. Furthermore, we propose that hierarchical protocols be modeled and designed in a loosely-coupled manner, so that the verification complexity can be reduced to that of a non-hierarchical coherence protocol. For the second problem, we propose a formal theory of refinement checking, which defines a set of sufficient conditions under which an RTL implementation can be claimed to correctly implement a specification. In more detail, we propose to use a transaction to group the implementation steps that correspond to a single step in the high level specification. A transaction can execute over multiple clock cycles.
Given a specification and an implementation model, we first combine them in such a way that whenever an implementation transaction executes, the corresponding specification step also executes. The basic idea of the refinement check is that for each joint variable, i.e. a variable which appears in both the specification and the implementation models, if all the transactions that can write to the variable have either not started or already finished executing, then the value of the joint variable must be equal in both models. We also propose a compositional approach to reduce the verification complexity of the refinement check. The basic idea is to use abstraction and assume-guarantee reasoning.
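The joint-variable condition can be made concrete with a small sketch. The variable names, values, and transaction bookkeeping below are invented for illustration: a joint variable is compared across the two models only at quiescent points, i.e. when every transaction that can write it has either not started or already finished executing.

```python
# Hypothetical sketch of the refinement-check condition.

class Transaction:
    def __init__(self, writes):
        self.writes = set(writes)   # joint variables this txn may write
        self.state = "idle"         # idle -> running -> done

def quiescent(var, txns):
    # No in-flight transaction may still write `var`.
    return all(t.state in ("idle", "done") or var not in t.writes
               for t in txns)

def check_joint(spec, impl, txns):
    for var in spec.keys() & impl.keys():   # joint variables only
        if quiescent(var, txns):
            assert spec[var] == impl[var], f"joint variable {var} diverged"

spec = {"dir_state": "S", "owner": 1}
impl = {"dir_state": "S", "owner": 7, "pipe_reg": 0}  # owner mid-update
txns = [Transaction(writes={"owner"})]
txns[0].state = "running"
check_joint(spec, impl, txns)   # passes: 'owner' is in flight, skipped
txns[0].state = "done"
impl["owner"] = 1               # the transaction commits its result
check_joint(spec, impl, txns)   # now 'owner' is compared, and agrees
```

Implementation-only state such as `pipe_reg` is never compared; only the joint variables, at their quiescent points, constrain the implementation.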

For abstraction, we remove certain details of the model to obtain a simplified model. This type of simplification is conservative, in that if the simpler model can be verified, the original complex model is also verified. Assume-guarantee reasoning is a form of induction, in that it introduces assumptions and at the same time proves them. Furthermore, we propose to develop a practical tool to check the refinement relationship, based on an initial effort from IBM. We will demonstrate the advantages of the proposed mechanisms through several verification tools and coherence protocol benchmarks, with the experimental results shown in Section 3.3.4 and Section 4.6.5. For high level modeling, we show that for three 2-level hierarchical protocols developed for M-CMPs, our proposed approach can reduce more than 95% of the explicit state space. For refinement checking, we show that for a driving cache coherence protocol example with some realistic hardware features, although the high level specification and the RTL implementation individually satisfy the coherence property checks, our refinement check still finds three subtle bugs in the implementation. We believe that these bugs are easy to miss when writing assertions by hand, while the refinement check is an automatic method for constructing such assertions. Furthermore, we show that for the protocol example, when the cache line datapath is 10 bits wide, our compositional approach can finish the refinement check within 30 minutes, while a state-of-the-art industrial verification tool could not finish in over a day.

1.3 Thesis Statement

Compositional formal verification techniques for multicore shared memory system protocols, both for the high level specifications and for their hardware implementations, can significantly scale the verification complexity. In order to prove this thesis statement, we perform two case studies to demonstrate the effectiveness of our approaches. For the high level specification of a hierarchical coherence protocol, we propose to decompose the hierarchical protocol into a set of abstract protocols with smaller verification complexity, by overapproximating it with different components. Also, we propose to constrain the overapproximation by using counterexample guided refinement.

Based on assume-guarantee reasoning, our approaches are guaranteed to be conservative. Furthermore, by using a better interface characterization of hierarchical protocols, our approaches enable the possibility of verifying hierarchical protocols one level at a time, which can greatly scale the verification. Finally, with history variables, our approaches can be applied to non-inclusive and snooping hierarchical coherence protocols, which cannot be decomposed in a straightforward way. For hardware implementations of cache coherence protocols, we propose a set of sufficient conditions to show that the implementations meet the high level specifications. We relate one step in the high level specification to a multi-step transaction in the implementation. Given the mapping of joint variables, our approaches can check joint-variable equivalence. Also, we propose a modular refinement verification approach, by developing abstraction and assume-guarantee reasoning principles that allow the implementation steps realizing a single specification step to be situated in sufficiently general environments.

1.4 Contributions

In this dissertation, we motivate, present, and evaluate scalable verification techniques. The primary contributions of this dissertation include:

Develop several multicore hierarchical cache coherence protocols with reasonable complexity and realistic features, as hierarchical coherence protocol benchmarks. Based on these protocols, develop a compositional infrastructure to decompose hierarchical protocols into a set of abstract protocols with smaller verification complexity. The approach is conservative for safety properties, and it is practical. The techniques include abstraction, counterexample guided refinement, and assume-guarantee reasoning.

Propose to decompose hierarchical coherence protocols one level at a time, using the same compositional infrastructure. To employ this approach, propose that hierarchical coherence protocols be designed and implemented in a loosely-coupled manner. Also, propose to use history variables in an assume-guarantee manner, for hierarchical non-inclusive and snooping protocols.

Propose a transaction based modeling and verification approach to checking if a hardware implementation correctly implements a high level specification. Develop a refinement theory and a monolithic approach for the verification. Also, develop a compositional approach using abstraction and assume guarantee reasoning to reduce the refinement check complexity.

Present comprehensive language and tool support, both for the abstraction process in hierarchical protocol verification and for checking the refinement relationship between hardware implementations and specifications. Evaluate all these techniques to show their potential in reducing the verification complexity and detecting implementation bugs.

A key feature of our approach is that it requires only existing model checking tools for supporting the compositional approach developed at the high level, and requires only existing HDL model checking frameworks for supporting the refinement check and the compositional approach at the implementation level.

1.5 Outline

This dissertation is organized into five chapters. The next chapter discusses the fundamentals of cache coherence protocols, and the problems that exist for future hierarchical cache coherence protocols. Chapter 3 first presents three hierarchical coherence protocols, and then presents two compositional approaches which can greatly reduce the verification complexity. Chapter 4 first presents the refinement theory, and then presents an implementation of the refinement check. It also presents a compositional approach to reducing the verification complexity. In both chapters, the soundness proofs of our approaches are provided. Finally, Chapter 5 summarizes this dissertation and points out a few possible directions for future work.

CHAPTER 2

BACKGROUND AND RELATED WORK

2.1 Cache Coherence States

Coherence protocols use protocol states to keep track of the read and write permissions of blocks present in processor caches. This section describes the well-established MSI and MESI protocol states [15,92], which provide a set of common states for reasoning about cache coherence protocols. These protocols will be used in our hierarchical coherence protocol benchmarks, to be described in Section 3.1. Beyond these, the MOESI protocol [35,110] is well studied and also employed in commercial products, e.g. the AMD Opteron [62].

2.1.1 The MSI Protocol

The MSI protocol is the basic cache coherence protocol used in multiprocessor systems. Each cache block can be in one of the following three states:

Modified: Also called dirty; only this cache has a valid copy of the block, and the copy in main memory is stale. A processor may read or write a modified copy, and the cache block must be written back before eviction.

Shared: The block is unmodified and valid in the cache. A processor may read the block, but may not write it. The cache can evict the block without writing back to the backing storage.

Invalid: The block is invalid, or the block is not present in the cache. The block must be fetched from memory or another cache before it can be read or written.

These three states directly enforce the coherence invariant by (i) allowing only a single processor to be in the modified state at a given time, (ii) allowing multiple processors to be in the shared state concurrently, and (iii) disallowing any other processor from being in the shared state while a processor is in the modified state. Table 2.1 summarizes the basic operations of a processor under the MSI coherence protocol.

Table 2.1. MSI state transitions

STATE | READ         | WRITE         | EVICTION        | REMOTE READ   | REMOTE WRITE
M     | HIT          | HIT           | WRITE BACK → I  | SEND DATA → S | SEND DATA → I
S     | HIT          | WRITE REQ → M | SILENT DROP → I | ACK           | ACK → I
I     | READ REQ → S | WRITE REQ → M | NONE            | NONE          | NONE
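Table 2.1 can be rendered as an executable transition function. The sketch below covers only the stable states shown in the table; a real controller also needs transient states and message handling while requests are in flight.

```python
# The stable-state MSI transitions of Table 2.1, keyed by
# (state, event); each entry gives (next state, action).
MSI = {
    ("M", "read"):         ("M", "hit"),
    ("M", "write"):        ("M", "hit"),
    ("M", "evict"):        ("I", "write back"),
    ("M", "remote_read"):  ("S", "send data"),
    ("M", "remote_write"): ("I", "send data"),
    ("S", "read"):         ("S", "hit"),
    ("S", "write"):        ("M", "write req"),
    ("S", "evict"):        ("I", "silent drop"),
    ("S", "remote_read"):  ("S", "ack"),
    ("S", "remote_write"): ("I", "ack"),
    ("I", "read"):         ("S", "read req"),
    ("I", "write"):        ("M", "write req"),
}

def step(state, event):
    return MSI[(state, event)]

# A read miss, then an upgrade, then a remote write invalidating us:
s, _ = step("I", "read")
assert s == "S"
s, _ = step(s, "write")
assert s == "M"
s, _ = step(s, "remote_write")
assert s == "I"
```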

2.1.2 The MESI Protocol

The MESI protocol is a cache coherence protocol in which every cache line can be in one of four states: Modified, Exclusive, Shared, or Invalid. The M, S, and I states are the same as in the MSI protocol. The E state means that the cache block is present only in the current cache but is clean, i.e. the data matches that in main memory. The reason for introducing this extra state is that when a processor requests a new block, it usually has to evict a block currently in the cache, and the effort required depends on the coherence state of the evicted block. For example, most protocols, including MSI, require a writeback to memory when evicting a block in the modified state, and allow silent eviction of blocks in the shared state. The advantage of the MESI protocol over the MSI protocol lies in the fact that a cache holding a block in the exclusive state can silently drop the cache line, without issuing the expensive writeback operation. Moreover, when the data resides only in main memory and a read miss occurs in one thread or agent, a cache block in the exclusive state can be supplied. One or two instructions later, when the thread or agent needs to write a new value to the same variable, there is no write miss. Thus read-before-write operations, e.g. x := x + 1;, can often be handled efficiently by the MESI protocol. Table 2.2 summarizes the basic operations of a processor under the MESI coherence protocol.

Table 2.2. MESI state transitions

STATE | READ              | WRITE         | EVICTION        | REMOTE READ   | REMOTE WRITE
M     | HIT               | HIT           | WRITE BACK → I  | SEND DATA → S | SEND DATA → I
E     | HIT               | HIT → M       | SILENT DROP → I | ACK → S       | ACK → I
S     | HIT               | WRITE REQ → M | SILENT DROP → I | ACK           | ACK → I
I     | READ REQ → S or E | WRITE REQ → M | NONE            | NONE          | NONE
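The E-state advantage just described can be sketched in a few lines. The helper functions below are illustrative, not a full protocol model: a read miss with no other sharers fills the line Exclusive, so the write half of x := x + 1 upgrades E → M without a bus transaction.

```python
# Illustrative MESI helpers showing the silent E -> M upgrade.

def read_miss(other_caches_have_copy):
    # Fill state is chosen when the read miss is serviced.
    return "S" if other_caches_have_copy else "E"

def write(state):
    # Returns (new state, whether a bus transaction is needed).
    if state in ("M", "E"):
        return "M", False     # E -> M is a silent upgrade
    return "M", True          # S or I: must request write permission

st = read_miss(other_caches_have_copy=False)
assert st == "E"
st, bus_needed = write(st)
assert st == "M" and not bus_needed   # the write cost no bus traffic
```

Under plain MSI the same read miss would fill in S, and the write would pay for a bus transaction; that difference is the whole point of the E state.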

2.2 Common Cache Coherence Protocols

Coherence protocols encode the permissions and other attributes of cached blocks in the coherence states. Given these states, there are several classes of protocols which use different techniques to track the caching status [55].

Snooping – Every cache that has a copy of the data from a block of physical memory also has a copy of the sharing status of the block, and no centralized state is kept. The caches are usually on a shared-memory bus, and all cache controllers monitor or snoop on the bus to determine whether or not they have a copy of the block that is requested on the bus.

Directory-based – The sharing status of a block of physical memory is kept in just one location, called the directory. All coherence requests first go to the directory, which decides whether to reply to or forward each request, based on the sharing status of the block.

Token-coherence – Token coherence [73] employs token counting, i.e. each block has a fixed number of tokens, to enforce coherence. A processor is allowed to read a block only when it holds at least one of the block's tokens, and it is allowed to write only when it holds all of them.
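The token-counting rule can be stated as executable assertions. The token count and holdings below are assumed values for illustration, not taken from the Token Coherence design itself.

```python
# Token-counting permission rules and the conservation invariant.
TOTAL_TOKENS = 4

def may_read(tokens_held):
    return tokens_held >= 1              # reading needs at least one token

def may_write(tokens_held):
    return tokens_held == TOTAL_TOKENS   # writing needs all of them

def conserved(holdings):
    # Safety relies on tokens never being created or destroyed.
    return sum(holdings.values()) == TOTAL_TOKENS

holdings = {"c0": 1, "c1": 3, "mem": 0}
assert conserved(holdings)
assert may_read(holdings["c1"]) and not may_write(holdings["c1"])
assert not may_read(holdings["mem"])
```

Because a writer must gather all tokens, no reader can hold one at the same time, which gives the single-writer/multiple-reader invariant without a totally ordered interconnect.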

2.2.1 Snooping Protocols

Snooping protocols [27,29,35] are commonly used in shared-memory multiprocessors, especially at the last level of caches. The key characteristic that distinguishes snooping protocols from other coherence protocols is their reliance on a bus or a virtual bus for the interconnect. A bus-based interconnect provides two properties for coherence protocols: (i) all requests that appear on the bus are visible to all components connected to the bus, and (ii) all requests are visible to all components in the same order, i.e. the order in which they gain bus access. In essence, a bus provides low-cost atomic broadcast of requests. The primary advantage of snoop-based multiprocessors is the low average miss latency, especially for cache-to-cache misses, because a request is sent directly to all the other processors and/or memory modules on the bus, and the responder can send the response immediately. Beyond this, bus-based snooping is relatively simple, and shared buses are usually a cost-effective and complexity-effective way to implement coherence. As for disadvantages, the biggest problem of bus-based protocols is scalability. More specifically, the bus is a mutually exclusive resource: only one processor can transmit at any given time, and all processors must participate in an arbitration phase before accessing the bus. Also, the bus clock cycle must be long enough for signals to propagate to the entire bus. Moreover, snooping protocols are by nature broadcast-based, so the bandwidth requirements increase as the number of processors increases. Even after removing the bottleneck of a shared-wire bus or virtual bus, this broadcast requirement still limits system scalability. To increase the effective system bandwidth of a bus, many enhancements to snooping protocol designs have been proposed.
For example, split-transaction protocols [29, 35] allow the requesting processor to release the bus while waiting for responses; this increases bus efficiency by pipelining multiple requests. Other systems implement virtual buses [30, 106] using distributed arbitration, point-to-point links, and dedicated switch chips. Destination-set prediction [21, 74, 107] has also been proposed to reduce the bandwidth requirements of snooping. With these optimizations, many current snooping protocols can support high-bandwidth systems with dozens of processors.
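To make property (ii) above concrete, the following is a minimal sketch of why a bus yields atomic, totally-ordered broadcast: since only one request holds the bus at a time, every snooping cache observes requests in the same bus-grant order. The class and method names are illustrative, not taken from any real system.

```python
# Sketch: a shared bus gives every snooping cache the same totally-ordered
# view of coherence requests (property (ii) in the text).

class SnoopBus:
    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast(self, request):
        # Only one request holds the bus at a time, so every attached
        # cache snoops requests in identical (bus-grant) order.
        for cache in self.caches:
            cache.snoop(request)

class Cache:
    def __init__(self, name):
        self.name = name
        self.observed = []   # order in which requests were snooped

    def snoop(self, request):
        self.observed.append(request)

bus = SnoopBus()
c0, c1 = Cache("c0"), Cache("c1")
bus.attach(c0)
bus.attach(c1)
bus.broadcast(("GetS", "blockA"))
bus.broadcast(("GetM", "blockA"))
# Both caches saw the same sequence: the total order is preserved.
assert c0.observed == c1.observed
```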

2.2.2 Directory-based Coherence Protocols

Directory protocols allow cache coherence to scale beyond the number of processors that can be sustained by a bus. The idea is to maintain the caching status of each block explicitly in a directory, which requests can consult. Directory protocols avoid broadcast and a totally-ordered interconnect in exchange for adding indirection latency to some misses. As a result, they can support up to thousands of processors [56].

In more detail, whenever a processor issues a coherence request, it sends the request only to the home memory module of the block. The home memory module contains the directory, which records the caching status of the block. When the home memory module receives the request, it uses the directory information to decide whether to respond directly with data, or to forward the request to other processors. One simple way to store the caching status in the directory is to use a bit vector, e.g. one bit per processor, for the sharers and the owner of the cache block. Alternatively, some directory schemes use entries at the memory and the caches to form a linked list of the processors sharing the block [51]. Many experimental [9, 50, 65, 69] and commercial [24, 46, 49, 62, 64, 68, 70, 111] multiprocessors employ directory protocols at one or multiple levels.

One of the primary disadvantages of directory protocols is the relatively high latency of cache-to-cache misses: the extra interconnect traversal and the directory access are on the critical path of such misses. In some systems, the directory lookup latency is similar to that of main-memory DRAM, so placing this lookup on the critical path of cache-to-cache misses can increase the latency considerably. Also, additional storage is needed for the caching status kept in the directory. Recently, a number of protocols have been proposed that combine snooping and directory protocols [74, 76].
Their goal is to achieve the low request latency of snooping protocols while maintaining the low bandwidth requirements of directory protocols. Also, many studies [32, 89] have shown the advantages of judiciously employing fast but limited-quantity wires for critical protocol messages, and slow but abundant wires for other messages.
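As a concrete illustration of the bit-vector directory scheme described above, here is a minimal sketch. All names (`DirEntry`, `handle_read`) are hypothetical, and the state handling is greatly simplified compared with a real directory controller.

```python
# Sketch: a bit-vector directory entry -- one sharer bit per processor
# plus an owner field. The home node either replies with data from
# memory or forwards the request to the owner (the indirection on the
# critical path of cache-to-cache misses discussed in the text).

NUM_PROCS = 4

class DirEntry:
    def __init__(self):
        self.sharers = [False] * NUM_PROCS  # one bit per processor
        self.owner = None                   # processor with modified copy

def handle_read(directory, block, requester):
    entry = directory.setdefault(block, DirEntry())
    if entry.owner is not None:
        # Dirty copy elsewhere: forward to the owner (cache-to-cache miss).
        action = ("forward", entry.owner)
    else:
        action = ("reply_with_data", "memory")
    entry.sharers[requester] = True
    return action

directory = {}
assert handle_read(directory, "A", 0) == ("reply_with_data", "memory")
directory["A"].owner = 2
assert handle_read(directory, "A", 1) == ("forward", 2)
```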

2.2.3 Token Coherence Protocols

The basic idea of token coherence [73, 75] is that simple token-counting rules can ensure that the memory system behaves in a coherent manner. Token counting specifies that each block of the shared memory has a fixed number of tokens, and that the system is not allowed to create or destroy tokens. A processor is allowed to read a block only when it holds at least one of the block's tokens, and a processor is allowed to write a block only when it holds all of its tokens. These simple rules prevent a processor from reading a block while another processor is writing it, ensuring coherent behavior at all times.

Compared with the two other approaches to coherence, i.e. snooping and directory protocols, token coherence can adapt itself to capture many of their attractive attributes. In [73], a broadcast version, a directory-like version, and a multicast version of token coherence protocols are proposed, each with different features. With these features, token coherence can achieve low-latency, direct processor-to-processor communication similar to snooping protocols; it can also be bandwidth-efficient and, like directory protocols, does not require a bus or other totally-ordered interconnect.

Token coherence prevents starvation by using persistent requests. A processor invokes a persistent request when it detects potential starvation. Persistent requests are able to obtain data and tokens even when conflicting requests occur, because once such requests are activated, they persist in forwarding data and tokens until the requests are satisfied. While conceptually appealing, this scheme has some potential difficulties in implementation. For example, persistent requests need an arbiter to avoid livelocks, yet the proposed distributed arbitration [73] relies on point-to-point ordering in the interconnect.
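The token-counting rules can be captured in a few lines. The sketch below is illustrative only (the class and constant names are ours, not from [73]); it shows the conservation invariant and the read/write permission checks.

```python
# Sketch of the token-counting rules: T tokens per block, never created
# or destroyed; >= 1 token permits reading, all T tokens permit writing.

TOKENS_PER_BLOCK = 4

class TokenCache:
    def __init__(self):
        self.tokens = 0

    def can_read(self):
        return self.tokens >= 1

    def can_write(self):
        return self.tokens == TOKENS_PER_BLOCK

def transfer(src, dst, n):
    # Conservation: tokens only move between caches; the total is invariant.
    assert 0 < n <= src.tokens
    src.tokens -= n
    dst.tokens += n

a, b = TokenCache(), TokenCache()
a.tokens = TOKENS_PER_BLOCK            # a holds all tokens: it may write
assert a.can_write() and not b.can_read()
transfer(a, b, 1)                      # give b one token: both may read,
assert a.can_read() and b.can_read()   # but neither may write
assert not a.can_write() and not b.can_write()
assert a.tokens + b.tokens == TOKENS_PER_BLOCK
```

Note how the rules alone forbid a reader and a writer from coexisting: a writer needs every token, so any concurrent reader's token makes writing impossible.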
In [77], token coherence was extended to M-CMP shared-memory systems, such that the coherence protocols are flat for correctness but hierarchical for performance. However, the drawbacks inherent in token coherence protocols still exist. For example, every line needs token storage in main memory. With CMPs, the protocol must be extended with additional storage and states to allow a local cache in the CMP to supply data to another local cache. Some of these issues are addressed in [77]. Finally, it is not clear whether token coherence protocols will be widely used in future processors.

2.3 Inclusive, Exclusive and Non-inclusive Caching Hierarchies

In shared-memory systems, multi-level caches are often employed, with small, fast caches backed by larger, slower caches. Figure 2.1 shows a simple example of two levels of caches in a CMP. In this example, the CMP has two processors, each with an L1 cache, and the L2 cache is shared by the two processors.

[Figure: two processors, each with an L1 cache, sharing an "L2 Cache + Local Dir".]

Figure 2.1. A CMP with two levels of caches.

Referring to Figure 2.1, we now introduce some terminology. The term inclusive means that the content of an L1 cache is a subset of that of the L2 cache. Exclusive means that any block present in an L1 cache cannot be present in the L2 cache. Non-inclusive lies between inclusive and exclusive: overlap without containment is allowed. For example, some processors of the Intel Pentium family [7] use non-inclusive caches, while the AMD Athlon and Opteron processors [5, 62] use exclusive caches.

There are tradeoffs between inclusive and exclusive caches. With inclusive caches, the local directory at the L2 cache knows which L1 cache(s) have valid copies; upon a miss in an L1 cache that hits in the L2, the cache controller only needs to copy the data to the missing L1 cache. With exclusive caches, the effective cache size of the system can be the sum of the L1 and L2 capacities. This is different from inclusive caches, where the duplication of cache contents may effectively reduce the size of the combined caches.
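Treating each cache as a set of block addresses, the three hierarchies can be distinguished by two simple predicates. This is an illustrative sketch of our own, not taken from any cited system:

```python
# Sketch: inclusive / exclusive / non-inclusive, with caches modeled as
# sets of block addresses.

def is_inclusive(l1s, l2):
    # Every L1 content must be a subset of the L2 content.
    return all(l1 <= l2 for l1 in l1s)

def is_exclusive(l1s, l2):
    # No block may be in both an L1 and the L2.
    return all(not (l1 & l2) for l1 in l1s)

l1s = [{"A", "B"}, {"C"}]
assert is_inclusive(l1s, {"A", "B", "C", "D"})   # L1 contents subset of L2
assert is_exclusive(l1s, {"D", "E"})             # no overlap at all
# Non-inclusive: overlap without containment -- neither predicate holds.
assert not is_inclusive(l1s, {"A", "D"})
assert not is_exclusive(l1s, {"A", "D"})
```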

Consider the following scenario: suppose a processor wants to read a line once and then discard it. In an inclusive cache, the line is brought into L2 and stays there even after it has been evicted from L1, unless L1 is forced to inform L2 when it evicts a clean line. This may reduce L2's effective capacity. If the processor wants a new line and the corresponding set in L2 is full, then presumably that unneeded clean line will be evicted from L2 at that time; in this case, polluting L2 with unneeded data may be a bad idea. Also, to evict a line from L2 in an inclusive cache, one has to perform a backward invalidation of L1: the same block must be evicted from all the L1 caches of the cluster. This is unnecessary in a non-inclusive cache.

2.4 Common Coherence Properties to Be Verified

Pong and Dubois surveyed the basic verification techniques and the types of cache coherence protocol errors in [97]. Basically, there are two types of coherence properties to be verified: safety properties (bad things will never occur) and liveness properties (good things will eventually occur). These properties include:

Data consistency. In a cache coherent system, data consistency among multiple data copies is typically enforced by allowing one and only one store in progress at any time for each block [101]. Concurrent accesses to the same block can be executed on different data copies but must appear to have executed atomically in some sequential order. The cache protocol must always return the latest value on each load.

Incomplete protocol specification. When possible state transitions have been omitted in the protocol specification, it may happen that some protocol component receives a message which is not specified in its current state. Such unexpected message reception is an error condition. Since the message is not specified, the subsequent behavior of the protocol is not defined and is unpredictable.

Absence of deadlock and livelock. A deadlock occurs when the protocol enters a state with no possible exits. Deadlock must be avoided because the system is blocked forever. A livelock, on the other hand, is a situation where protocol components keep exchanging messages but the system makes no useful progress.

In our work, we mainly focus on the verification of safety coherence properties, i.e. the data consistency properties. Interested readers can refer to [13, 17, 83] for more work on the verification of liveness coherence properties.
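The data-consistency property above implies a simple state invariant often called "single writer, multiple readers". The following sketch (our own illustration, using MESI state names) shows the check one might assert over the per-cache states of a single block:

```python
# Sketch: at most one cache may hold a block in a writable state
# (Exclusive or Modified) at any time, and a writable copy may never
# coexist with shared copies.

def coherent(states):
    # states: list of per-cache MESI states ("M", "E", "S", "I") for one block
    writable = [s for s in states if s in ("E", "M")]
    shared = [s for s in states if s == "S"]
    return len(writable) <= 1 and not (writable and shared)

assert coherent(["M", "I", "I"])
assert coherent(["S", "S", "I"])
assert not coherent(["M", "S", "I"])   # a writer coexisting with a reader
assert not coherent(["E", "E", "I"])   # two writable copies
```

A model checker such as Murphi would assert a property of this shape as an invariant over every reachable state, rather than over a few hand-picked state vectors as here.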

2.5 Related Work

2.5.1 Verifying Hierarchical Cache Coherence Protocols in the High Level

Many previous hierarchical protocols, e.g. non-CMP [23, 46, 52, 69, 70] or CMP [14, 111] designs, use directories at one or multiple levels of the hierarchy. For example, in Piranha [14] two levels of caches are used in a CMP, and the L2 cache is kept non-inclusive of the L1 caches on the chip; both the intra- and the inter-chip protocols are directory protocols. In Power4 [111], there are three levels of caches, and the directory and controller of the L3 cache are on the chip while the actual L3 cache is on a separate chip. However, we are not aware of any published work that has reported formal verification of a hierarchical coherence protocol of reasonable complexity. Most of the previous work considers only one layer of the hierarchy at a time, manually abstracting away the other layers.

For coherence protocols with only one layer (i.e. non-hierarchical protocols), various verification techniques have been proposed, including explicit state enumeration [109], symbolic model checking [84], partial order reduction [19], symmetry reduction [20, 59], compositional reasoning [80], predicate abstraction [17, 36, 66], parameterized verification [34, 40, 41, 83, 96], etc. However, none of them has been shown capable of verifying hierarchical coherence protocols of reasonable complexity.

McMillan et al. [81, 84] modeled a 2-level MSI coherence protocol based on the Gigamax distributed multiprocessor [84], with MSI used at both levels. SMV [81] was used to verify this protocol, with two clusters each having six processors, for both safety and liveness properties. In terms of complexity, many realistic details were abstracted away from the protocol so that SMV was able to verify it.

Shen et al. [103] proposed an adaptive protocol called Cachet for distributed shared-memory systems. Cachet integrates three micro-protocols, each of which is optimized for a particular memory access pattern. The correctness of the protocol is enforced in two ways. First, Cachet implements the Commit-Reconcile & Fences (CRF) memory model [104], which exposes a semantic notion of caches and decomposes load and store instructions into finer-grain operations. CRF exposes the mechanisms needed to specify protocols precisely, and it separates how a cache provides correct values from how it maintains coherence. Second, voluntary rules were introduced to separate correctness and liveness from performance in protocol design [102]. With respect to verification, their approach concludes that any coherence protocol following the CRF memory model, including the adaptive protocol, is correct by construction. Compared with their work, our approach is more general: it can be applied to protocols following other memory models as well.

As far as we know, our work is the first that is able to handle hierarchical protocols with complexity similar to that found in the real world. As will be described in Section 3.1, our protocols include so many features that even the protocol used within one chip (which we also call a cluster) is much more complex than popular academic benchmarks such as FLASH [65] and German [42]. Based on these protocols, we propose to create a set of abstract protocols, where each abstract protocol simulates the given hierarchical protocol. After these abstract protocols are created, verification consists of model checking them in an assume-guarantee manner, refining each abstract protocol guided by counterexamples.

Our approach of using abstraction, assume-guarantee reasoning and counterexample-guided refinement derives its basic ideas from Chou et al.'s work [34] on parameterized verification of non-hierarchical cache coherence protocols, and from McMillan's work [82] on compositional model checking. Our use of abstraction is also similar to that of Lahiri and Bryant [66]; the difference is that they use predicate abstraction techniques to automatically construct quantified invariants, while we use model checking to reduce verification complexity. Finally, we use history variables [39, 78] in addition to the compositional approach, to deal with non-inclusive and snooping-based hierarchical protocols. While the use of history variables in program verification goes back several decades, our usage is in the context of assume-guarantee reasoning, which is based on inductive reasoning.

2.5.2 Checking Refinement between Protocol Specifications and Implementations

For refinement check between high level specifications and RTL implementations of cache coherence protocols, Mellergaard et al. [86] presented an approach to model and verify (using theorem proving) digital systems using Synchronized Transitions [108]. There was no attempt to model implementations separately, nor to verify correspondence. However, Synchronized Transitions and similar notations, including Unity [28], term rewriting systems [11], Murphi, TLA+, and Bluespec [10], reveal the wide adoption of the guard/action notation among designers.

Flushing by Burch and Dill

Burch and Dill [26] proposed an approach to model instruction pipelines and to verify the correspondence between pipelines and higher-level instruction sequences, using symbolic execution and "flushing" as a heuristic to build an abstraction map. The details of our approach for showing simulation are different in that: (i) we are not concerned with processing program instructions and the instruction-level semantics; (ii) we model details at the hardware level in terms of signal and variable updates, and variable write conflicts; (iii) we partition correctness into global properties, such as coherence at the specification level, and simulation of specifications by implementations at the next level.

Aggregation of Distributed Actions

Our notion of transactions is related to Park et al.'s approach to aggregation of distributed actions [93–95]. The difference is that their work relates two abstract protocols, both of which use interleaving executions, while our work relates an abstract protocol with interleaving execution to an RTL implementation with concurrent execution. Also, they use theorem proving while we use model checking techniques: their goal is to develop inductive theorems, while ours is to develop compositional approaches that reduce the verification complexity.

Bluespec

Bluespec [1, 11, 57] is probably the work most closely related to our overall approach to refinement check. Both approaches view the specification in terms of interleaved executions. The Bluespec compiler automatically synthesizes the specification into an RTL model, automatically scheduling the implementation actions according to pre-defined synthesis recipes. Automatic synthesis methods may not meet performance goals or design constraints that are important to handle in practice. They may also not give designers enough flexibility in handling pre-existing interfaces, such as embedded split-transaction buses or packet routing networks, around which implementations are designed. By allowing designers to express implementation details in terms of transactions, and by supporting the verification of refinement, our approach is better suited to existing industrial design flows.

Architecture Description Languages

An Architecture Description Language (ADL) is a computer language used to support architecture-based development; it provides formal modeling notations, together with analysis and development tools that operate on architectural specifications. Examples of ADLs include C2 [85], Rapide [71], SADL [88], and so on. Qin et al. [98, 99] proposed to design embedded processors at the architecture/micro-architecture level; an ADL was proposed to model computer microarchitectures and to generate functional simulators automatically. While their work is another indication of the need for customized hardware description languages, our work is aimed at entirely different issues.

Refinement Check in OneSpin

Our work is also similar to that of OneSpin [112, 114] on "completeness" metrics. Their definition of completeness is that verification is complete when every possible input scenario has been applied to the design under verification, and every possible output signal has been shown to have its intended value at every point in time. Basically, they propose that enough properties should be checked on the implementation against the specification. The difference is that their approach requires those properties to be written manually, while our approach can automatically generate the refinement assertions; on the other hand, our approach is not guaranteed to be complete in the sense they advocate. Moreover, we have a compositional approach to reduce the complexity of refinement check. They support liveness properties related to timing, while currently our approach only works on safety refinement properties. Finally, it is not clear from their papers what the underlying foundation behind their completeness metric is.

Transaction-Level Modeling and Verification Library in SystemC

Our work is also related to transaction-level modeling (TLM) [100] and the verification library in SystemC (VLC) [6]. The basic idea of transactions in both works is similar. The difference is that a transaction in our work contains multiple transitions, while in SystemC a transaction is a monolithic implementation, attached to the implementation using shared variables and signals. Also, TLM collects a set of commonly used hardware components and provides a set of high-level interfaces. The advantage of using transactions in SystemC is that, with these high-level interfaces, designers can build a hardware design more easily from these components, and can start simulation earlier and with fewer details. However, it is not clear from the available documents what underlying formal theory is used in SystemC. Also, the notion of transactions used in their work is not as well specified or as formal in characterizing implementation behaviors. It appears that only simulation is used in SystemC, and it looks straightforward to adapt our approach to support this kind of simulation-based testing.

Compositional Verification

Our compositional verification approach relies on input variable generalization and assume-guarantee reasoning. Many previous approaches to assume-guarantee reasoning have been developed in both software and hardware contexts, for example [60, 79]. Although we verify hardware, our model is related to models of concurrent software with interfering operations. In our approach to abstraction, variables of the model can be abstracted to different degrees at different points in time: at some steps a variable can be abstracted by input variables, while at other steps the same variable may not be abstracted.

This selective abstraction approach has some resemblance to how the degree of input weakening can be controlled in STE-based verification (e.g., as in [54, Page 25]); however, the details are different.

A goal of our work is to develop methods for checking refinement that are computationally efficient and that also reduce the manual burden of specifying and verifying the mapping between the implementation and the specification. To reduce the manual effort, we develop a theory of refinement checking that incorporates the notion of multi-cycle transactions and their mappings onto interleaving specifications.

CHAPTER 3

VERIFICATION OF HIERARCHICAL CACHE COHERENCE PROTOCOLS IN THE HIGH LEVEL SPECIFICATIONS

3.1 Benchmarking Hierarchical Coherence Protocols

As described in Section 1.2, there are currently no publicly available hierarchical cache coherence protocols of reasonable complexity. In response to this problem, we first develop three 2-level coherence protocols for M-CMP systems, each with different features: the first employs an inclusive cache hierarchy, the second employs a non-inclusive hierarchy, and the third models a snooping protocol at the intra-cluster level, which is the case for most processors nowadays. An MSI [15] or MESI [92] protocol is used at each level. Realistic features, including silent drop of non-modified cache lines and unordered network channels, are also modeled.

3.1.1 The First Benchmark: An Inclusive Multicore Coherence Protocol

Our first benchmark hierarchical coherence protocol is derived by combining features from the FLASH [65] and DASH [69] protocols. Such a protocol is realistic, as DASH was originally developed for managing coherence across many clusters. It also makes our hierarchical protocol easy to understand for researchers who might want to attack this verification challenge problem. One address is modeled in our protocol, as is typical in model-checking-based verification of coherence. Figure 3.1 depicts the protocol.

[Figure: a home cluster and two remote clusters, each with two L1 caches, an "L2 Cache + Local Dir" and a RAC, connected to a global directory and main memory.]

Figure 3.1. A 2-level hierarchical cache coherence protocol.

As shown in Figure 3.1, the protocol is composed of three non-uniform memory access (NUMA) clusters: one home cluster and two identical remote clusters. Each cluster has two symmetric L1 caches, an L2 cache, and a local directory. "RAC" stands for remote access controller; it is the controller used to communicate with other clusters and the global directory. In reality, main memory is attached to every cluster; that there is only one memory here is a consequence of the one-address abstraction of our protocol. Finally, the global directory manages data copies across the three clusters.

In the 2-level hierarchy, the level-1 protocol is used within a cluster, i.e. by the two L1 caches and the L2 cache. It tracks which line is cached in what state at which agent(s). The FLASH protocol is adapted to model this level and to keep data copies within a cluster consistent. The level-2 protocol is used among clusters, tracking caching status at the cluster level. The DASH protocol is adapted to keep clusters consistent. The protocol also has the following features:

For each cache line, if it is cached in an agent, then it must also be cached in the local directory of the agent's cluster, i.e. the inclusive property;

Both levels use MESI, supporting explicit writeback and silent drop. That is, a non-modified line can be discarded at any time, without informing the local or the global directory.

Both levels use unordered network channels; thus the model can capture various hardware implementations of networks.

Solving The Livelock Problem

In the process of developing the hierarchical protocol, we discovered that it was not simply a matter of adapting FLASH and DASH to support MESI and silent drop, as such combinations can easily lead to a livelock problem. One such scenario is shown in Figure 3.2.

[Figure: message flow among L1-1 and the L2/local directory of cluster-1, the global directory, and cluster-2. Steps: 1. Silent drop; 2. Get; 3.–5. Fwd_Get; 6. NACK.]

Figure 3.2. A scenario that can lead to the livelock problem.

In Figure 3.2, L1-1 of cluster-1 (agent-1) initially has an exclusive copy. In Step 1, agent-1 silently drops the cache line. In Step 2, L1-1 of cluster-2 (agent-2) requests a shared copy of the same line, and this request is eventually forwarded to agent-1 in Step 5. In Step 6, agent-1 NACKs the forwarded request, because it no longer has the cache line. At this point, the local directory of cluster-1 has no information about the silent drop that happened in agent-1. It is also not safe for the local directory to use the copy in the L2 cache to reply to the forwarded request, because a writeback request from agent-1 may be in flight. So the local directory of cluster-1 can only forward the NACK reply to agent-2. This process can iterate: the global directory keeps forwarding requests to cluster-1, and agent-1 keeps NACKing. The result is a livelock.

One possible solution could be that whenever agent-1 replies to a forwarded request, it also sends the reply to the local directory, so that the latter has more information about the cache-line status. However, this approach still cannot solve the problem: when the local directory receives such a NACK reply, it cannot tell whether agent-1 has silently dropped the cache line, whether a writeback request from agent-1 is on the way, or whether a reply to a former request of agent-1 has not yet been received.

We prevent this livelock by making writeback a blocking operation in the protocol, and by adding another type of NACK message that indicates silent drops. By a blocking writeback, we mean that after an agent or cluster sends a writeback request, it cannot issue new requests or process forwarded requests until an acknowledgment from the directory is received. In addition, we introduce a new type of NACK reply, "NACK_SD", to notify the directory that a silent drop may have happened. NACK_SD can only be used by an agent or cluster when its cache line is invalid and it is not waiting for a reply to a former request. Upon receiving a NACK_SD, the directory resets the exclusive ownership to itself if it is the current owner of the cache line. Figure 3.3 shows a simple example of how our approach works.

[Figure: message flow among L1-1 and the L2/local directory of cluster-1, the global directory, and cluster-2. Steps: 1. WB; 2. Get; 3.–5. Fwd_Get; 6. WB_Ack; 7. NACK_SD; 8.1 Put; 8.2 DXFER.]

Figure 3.3. A sample scenario of the hierarchical protocol.

In this scenario, L1-1 of cluster-1 (agent-1) initially has a modified copy. In Step 1, it writes the dirty copy back to the local directory. In Step 5, agent-1 receives the forwarded request from cluster-2; because it is blocked on the writeback, it has to wait for the acknowledgment. Only after the acknowledgment is received in Step 6 can agent-1 reply to the forwarded request. At this time, agent-1 is invalid and is not waiting for any reply, so it responds with NACK_SD. On receiving the reply, the local directory of cluster-1 simply sets the current owner to itself, which is already the case as a result of the writeback. Finally, the local directory sends a grant reply to cluster-2 with the dirty data, and notifies the global directory that an exclusive ownership transfer ("DXFER") has happened.

This hierarchical protocol [2], coded in Murphi, has about 2500 LOC. It contains all the control logic and data coherence safety properties that the FLASH and DASH models originally have. To be precise, the DASH model is from the Murphi distribution [3], and the FLASH model is from Chou et al.'s work [34], which was translated from McMillan's SMV code [83], which in turn was translated from Park's PVS code [95]. Basically, these properties assert that (i) no two L1 caches in the same cluster can both be exclusive or modified, (ii) no two L2 caches can both be exclusive or modified, and (iii) every coherent read gets the latest copy in the system.
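As an illustration of the blocking-writeback rule and the NACK_SD reply, the following sketch mimics agent-1's behavior in Figure 3.3 (Steps 5–7). The class and message names are illustrative; the actual protocol is written in Murphi and is far more detailed:

```python
# Sketch: after sending WB the agent is blocked until WB_Ack arrives;
# a forwarded request that finds the line invalid, with no pending
# reply, is answered with NACK_SD.

class Agent:
    def __init__(self):
        self.state = "M"        # cache-line state (starts modified)
        self.blocked = False    # blocked on a writeback, awaiting WB_Ack
        self.pending = False    # awaiting a reply to an own request

    def writeback(self):
        assert self.state == "M" and not self.blocked
        self.state = "I"
        self.blocked = True     # no new or forwarded requests until WB_Ack
        return "WB"

    def on_wb_ack(self):
        self.blocked = False

    def on_fwd_get(self):
        if self.blocked:
            return "DEFER"      # must wait for the WB acknowledgment first
        if self.state == "I" and not self.pending:
            return "NACK_SD"    # a silent drop (or writeback) has happened
        return "DATA"

a = Agent()
a.writeback()                        # Step 1: WB, agent now blocked
assert a.on_fwd_get() == "DEFER"     # Step 5: forwarded request must wait
a.on_wb_ack()                        # Step 6: WB_Ack unblocks the agent
assert a.on_fwd_get() == "NACK_SD"   # Step 7: invalid, no pending reply
```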

3.1.2 The Second Benchmark: A Non-inclusive Multicore Protocol

Our non-inclusive protocol [2] has the same configuration as the inclusive protocol presented in Section 3.1.1. The intra- and inter-cluster protocols both use an invalidation-based directory MSI protocol. We choose MSI instead of MESI because the non-inclusive protocol is much more complex than the inclusive one, as will be shown in the experimental results in Section 3.3.4.

We still use Figure 3.1 to represent the non-inclusive coherence protocol. By the definition of non-inclusion, a cache line present in an L1 cache need not be present in the L2 cache of the same cluster. This characteristic is modeled in Murphi by allowing a cache line in the L2 cache to be silently dropped nondeterministically, if the L2 is not the current owner of the line. Also, any shared cache line is allowed to be silently dropped. Moreover, to make the verification problem harder, we assume that when an L2 cache line is evicted, the local directory information of the cluster is also evicted. Although this assumption may not be very practical for real coherence protocols due to performance issues, it forces us to come up with new verification techniques beyond those used for the inclusive hierarchical protocol.

The non-inclusive coherence protocol differs from the inclusive one in several aspects. First, in addition to the intra-cluster network channels used by the inclusive protocol, there is also a set of broadcast channels. These channels are used when there is a cache miss in the L2 cache and the local directory has been evicted, i.e. it has no information about whether some L1 has a valid copy or not. In this case, the request for the cache miss is broadcast to all the L1 caches in the cluster. After the broadcast, if a reply containing a valid copy is received, then we are done; otherwise the request is forwarded to the global directory. With these dedicated broadcast channels, the L1 caches can send coherence requests to the local directory at any time, without contending with the broadcast requests.
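The miss-handling policy just described can be sketched as follows; the function and message names are illustrative, not taken from the Murphi model:

```python
# Sketch: on an L2 miss, use the local directory if present; if the
# directory has been evicted, broadcast to the L1s, and fall back to
# the global directory when no valid copy is returned.

def handle_l2_miss(local_dir, l1_replies):
    if local_dir is not None:
        return ("use_dir", local_dir)
    # Directory evicted: the request is broadcast to the cluster's L1s.
    for reply in l1_replies:
        if reply == "DATA":
            return ("done", "DATA")
    # No L1 could safely supply data: forward to the global directory.
    return ("forward", "global_dir")

assert handle_l2_miss({"owner": 1}, []) == ("use_dir", {"owner": 1})
assert handle_l2_miss(None, ["NACK", "DATA"]) == ("done", "DATA")
assert handle_l2_miss(None, ["NACK", "NACK"]) == ("forward", "global_dir")
```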

[Figure: message flow within a cluster among L1-2, L1-1, the L2/local directory, and the global directory. Steps: 1. Evicted; 2. Get; 3. Broadcast; 4. Nack; 5. Fwd_Get; 6. Put.]

Figure 3.4. Imprecise information of the local directory.

Second, the characteristics of the non-inclusive protocol can leave the local directory with an imprecise record of a cache line. Figure 3.4 shows a simple scenario of how this imprecision can arise. In this example, the cache of L1-2 and the L2 in the cluster initially have a shared copy, and this information is stored in the local directory. In Step 1, the L2 cache line is evicted, together with the local directory information. In Step 2, the cache of L1-1 in the same cluster requests a shared copy. Because the local directory has no information about this line, the request is broadcast within the cluster in Step 3 (Figure 3.4 does not show the broadcast to L1-1, as it is the requesting agent). The request is NACKed in Step 4, because it is not safe for a shared copy to supply its data: the broadcast request could be interleaved with an invalidate request coming from outside the cluster. In Step 5, the request is forwarded to the global directory, and it is granted in Step 6. At this point, the local directory has lost the information that the L1-2 cache also has a shared copy. The problem with this kind of imprecision is that subsequent invalidations could miss the shared copy in the L1-2 cache, leading to coherence violations in certain systems.

We avoid this problem in the non-inclusive protocol with a conservative solution. Returning to Figure 3.4, when the global directory receives the forwarded request in Step 5, it can see that the cluster already has a shared copy. Our solution is that, in addition to the reply in Step 6, a tag is attached indicating possible imprecision. When the reply and the tag are received, the local directory records that all the L1 caches in the cluster have a valid shared copy, thus avoiding the imprecision.

Our non-inclusive coherence protocol [2], coded in Murphi, has about 2500 LOC. Overall, it is similar to that of Piranha [14]; the difference is that Piranha makes many optimizations to improve performance. For example, a duplicate copy of the L1 tags and state is kept at the L2 controllers, to avoid snooping the L1 caches. Also, L1 misses that also miss in the L2 are filled directly from memory without allocating a line in the L2, to lower the miss latency and better utilize the L2 capacity.
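The conservative imprecision fix described above can be sketched in a few lines. All names here are illustrative; the real protocol is written in Murphi:

```python
# Sketch: when the global directory grants a request from a cluster it
# already records as a sharer, it tags the reply; the local directory
# then conservatively marks every L1 in the cluster as a possible
# sharer, so later invalidations cannot miss a hidden copy.

NUM_L1 = 2

def global_grant(global_sharers, cluster):
    imprecise = cluster in global_sharers   # cluster already had a copy
    global_sharers.add(cluster)
    return ("Put", imprecise)

def local_on_reply(local_sharers, requester, imprecise):
    if imprecise:
        # Assume every L1 in the cluster may hold a shared copy
        # (this covers the forgotten sharer, e.g. L1-2 in Figure 3.4).
        return set(range(NUM_L1))
    return local_sharers | {requester}

reply, tag = global_grant({"cluster1"}, "cluster1")
assert tag is True
sharers = local_on_reply(set(), requester=0, imprecise=tag)
assert sharers == {0, 1}   # the hidden shared copy in L1-2 is covered
```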

3.1.3 The Third Benchmark: A Multicore Protocol With Snooping Our snoop protocol [2] is much simpler than the other two benchmark protocols. It models two symmetric clusters where a snoop protocol is used inside a cluster, and an MSI protocol is used outside the clusters. Figure 3.5 shows the protocol intuitively. In more detail, the snoop protocol does not employ split-transaction buses. Instead, when a cache puts a request on the bus, it will not release the bus until the response is received. Split-transaction protocols can usually increase bus efficiency by pipelining multiple requests; thus they are more complex and have more corner cases. Here,

(Figure: two symmetric clusters, each containing two L1 caches, an L2 cache, and a RAC, connected through a global directory to main memory.)

Figure 3.5. A multicore cache coherence protocol with snooping.

because our main purpose is to check whether the approaches proposed for the other two benchmarks can be applied to snooping protocols, a simple multicore protocol suffices. Moreover, the high on-chip bandwidth and low on-chip latency also justify this decision. The L1 and L2 caches on a chip are modeled as non-inclusive. That is, a shared cache line in an L1 cache may also be in the L2 cache, but an exclusive cache line in an L1 cache cannot be in the L2 cache, and vice versa. Explicit writebacks on exclusive cache lines are modeled in both the intra- and the inter-cluster protocols. Unlike the first benchmark protocol in Section 3.1.1, a cache agent will not be blocked after a writeback request, and no writeback acknowledgment will be issued. Also, silent drops of shared cache lines are not modeled. This protocol, coded in Murphi, has about 1000 LOC altogether.
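As a rough illustration of the difference, a non-split-transaction bus behaves like a lock held from request until response. The sketch below is hypothetical Python (the real benchmark is written in Murphi); BlockingBus and its methods are our own names.

```python
class BlockingBus:
    """Non-split-transaction bus: the requester holds the bus from request
    until response, so at most one transaction is ever in flight."""
    def __init__(self):
        self.owner = None

    def request(self, cache_id, msg):
        if self.owner is not None:
            return False            # bus busy; requester must retry
        self.owner = cache_id       # bus held until the response arrives
        self.in_flight = msg
        return True

    def respond(self):
        self.owner = None           # response received; bus released

bus = BlockingBus()
assert bus.request("L1-1", "GetS")
assert not bus.request("L1-2", "GetX")   # no pipelining of a second request
bus.respond()
assert bus.request("L1-2", "GetX")
```

A split-transaction bus would instead allow the second request between the first request and its response, which is exactly the source of the extra corner cases mentioned above.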

3.2 A Compositional Framework to Verify Hierarchical Coherence Protocols Our first two hierarchical protocols proved so complex that, after all the shallow bugs were fixed, model checking failed due to state explosion after exploring more than 0.4 billion states for each protocol. The experiments were done on an SMP (symmetric multiprocessing) server, using 18GB of memory and 40-bit hash compaction of the Murphi model checker. This is not surprising, considering the multiplicative effect of having four instances of coherence protocols running concurrently, i.e. one inter-cluster and three intra-cluster protocols. We believe all hierarchical coherence protocols will state explode in this manner. This section describes our compositional framework and the soundness of the approach.

The compositional framework consists of two parts: abstraction and assume guarantee reasoning. Given a hierarchical protocol, we first use abstraction to decompose it into a set of abstract protocols with smaller verification complexity, such that each abstract protocol overapproximates the original protocol on a subset of components. We then apply counterexample guided refinement on the abstract protocols to constrain the overapproximation, using assume guarantee reasoning. Our compositional approach is conservative, i.e. if the abstract protocols can be verified correct, the original hierarchical protocol must also be correct with respect to its verification obligations.


Figure 3.6. The workflow of our approach.

The workflow of our approach is shown in Figure 3.6. Given a hierarchical protocol, we first build a set of abstract protocols where each abstract protocol overapproximates, by construction, the original protocol. We then model check each of the abstract protocols individually. If a genuine bug is found in an abstract protocol Abs #i, we fix it in the original protocol and regenerate all the abstract protocols. If a spurious bug is reported in Abs #i, we constrain it and, at the same time, add a new verification obligation to one of the abstract protocols Abs #j. In more detail, each abstract protocol employs assumptions that are validated only by verifying some number of abstract protocols Abs #j's, which then justify these assumptions. In turn, Abs #j will also employ assumptions that are justified by verifying some number of Abs #i's. We will take the inclusive multicore protocol as the driving example in this section. In Section 3.3, we will describe how to extend this framework to verify the hierarchical non-inclusive protocol and the protocol with snooping.

3.2.1 Abstraction For the hierarchical protocol shown in Figure 3.1, we decompose it into three abstract protocols. Figure 3.7 intuitively shows one of the abstract protocols. Compared with Figure 3.1, we can see that it retains all the details of the home cluster while abstracting away some components of the two remote clusters. The other two abstract protocols are very similar, except that in each abstract protocol a different cluster is retained and the rest of the clusters are abstracted. Due to the symmetry between the two remote clusters, the other two abstract protocols are in fact the same. So in the following we will only consider the two distinct abstract protocols.

(Figure: the home cluster is kept in full detail with its L1 caches, L2 cache + local directory, and RAC, while the two remote clusters are reduced to an L2 cache and a RAC each, all connected to the global directory and main memory.)

Figure 3.7. An abstract protocol.

The derivation of the abstract protocols from the hierarchical protocol involves the abstraction on the global state variables, the transition relation, and the verification obligations.

Abstraction on State Variables For the abstraction on state variables, Figure 3.8 shows the data structure of a cluster in the original protocol (“ClusterState”), and that of an abstracted cluster in an abstract protocol (“AbsClusterState”), in the Murphi description language. ClusterState contains five blocks of information: (i) B1, all the L1 caches in a cluster; (ii) B2, the set of network channels used inside a cluster; (iii) B3, the local directory; (iv) B4, the L2 cache; and (v) B5, the controller of a cluster, which is used to communicate with other clusters and the global directory. Of these five blocks, the intra-cluster protocol only involves the first four, and the inter-cluster protocol only involves the last two. That is, the intra-cluster protocol does not involve the RAC details. On the other hand, the inter-cluster protocol only has the high level information of which cluster has a valid copy; it does not track the status of specific L1 caches. AbsClusterState, as shown in Figure 3.8, can be obtained simply by projecting the blocks B1, B2 and B3 away from ClusterState. Now let's look at how the global state variables in the abstract protocol shown in Figure 3.7 can be represented, compared with those in the original protocol. First of all, every cluster in the original protocol is declared as ClusterState, while in the abstract protocol only the home cluster is declared as ClusterState, and the other two are declared as AbsClusterState. Second, the global directory and all the network channels used among the clusters are retained in the abstract protocol. That is, they are represented in exactly the same way as in the original protocol. From the above, it is clear that our abstraction on state variables is simply a projection function.
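Seen as code, the projection is a one-line restriction of the state. The following Python fragment is a hedged illustration: the dict-based state and the block names are our simplification of the Murphi records, not the actual tool.

```python
# Blocks B4 and B5 survive the projection; B1-B3 are dropped (illustrative names)
RETAINED_BLOCKS = ("L2", "ClusterWb", "RAC")

def abstract_cluster(cluster_state):
    """AbsClusterState is just ClusterState restricted to blocks B4 and B5."""
    return {k: v for k, v in cluster_state.items() if k in RETAINED_BLOCKS}

home = {"L1s": [...], "UniMsg": [...], "LocalDir": {...},
        "L2": "Shrd", "ClusterWb": False, "RAC": {"State": "Inval"}}
print(sorted(abstract_cluster(home)))   # ['ClusterWb', 'L2', 'RAC']
```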

Abstraction on Transition Relations For each transition (rule) in the original hierarchical protocol, a set of corresponding rules will be constructed for each abstract protocol. Given an original hierarchical protocol and the state variables to be projected away (eliminated), the basic idea of abstraction on transition relations is as follows. For every rule in the form of guard → action, (1) for any assignment of the form v := E in action, if v is a variable that has been eliminated, then the whole assignment will be eliminated; otherwise, if E contains even one variable that has been eliminated, we replace E with a non-deterministic selection

ClusterState: record
  -- 1. L1 caches
  L1s: array [L1Cnt] of L1CacheLine;

  -- 2. local network channels
  UniMsg: array [L1L2] of Uni_Msg;
  InvMsg: array [L1L2] of Inv_Msg;
  WbMsg: Wb_Msg;
  ShWbMsg: ShWb_Msg;
  NackMsg: Nack_Msg;

  -- 3. local directory
  LocalDir: record
    pending: boolean;
    ShrSet: array [L1Cnt] of boolean;
    InvCnt: CacheCnt;
    HeadPtr: L1L2;
    ReqId: L1Cnt;
    ReqCluster: ClusterCnt;
  end;

  -- 4. L2 cache
  L2: L2CacheLine;
  ClusterWb: boolean;

  -- 5. remote access controller
  RAC: record
    State: RACState;
    InvCnt: ClusterCnt;
  end;
end;

AbsClusterState: record
  -- 4. L2 cache
  L2: L2CacheLine;
  ClusterWb: boolean;

  -- 5. remote access controller
  RAC: record
    State: RACState;
    InvCnt: ClusterCnt;
  end;
end;

Figure 3.8. The data structures of an original and an abstract cluster.

over the type of E; (2) if a sub-expression in guard contains any variable that has been eliminated, we replace the sub-expression with true. In the following, we will only present the procedure to construct the transition relations from the original protocol in Figure 3.1 for the abstract protocol in Figure 3.7. The procedure for other abstract protocols is very similar, so we skip it here. Our procedure is described in the context of Murphi, but the ideas are broadly applicable. In the following, we will use D to denote the set of state variables which are eliminated in the abstract protocol. We consider the rules one at a time in the original hierarchical protocol.

THE PROCEDURE FOR ABSTRACTING THE TRANSITION RELATION

Pre-processing:

1. For every ruleset, convert it to a set of simple rules by enumerating all possible constant values for every parameter of the ruleset.

2. For every simple rule, if there exists a conditional statement in its action, i.e. ifstmt or switchstmt, recursively convert the rule to a set of simple rules, such that the resulting rules no longer contain conditional statements. In more detail, the guard of each rule is the conjunction of the original rule guard and the condition to switch to that branch; the action is obtained by replacing the conditional statement in the original rule action with that branch.

3. For every expression with the logical negation operator (¬) in every simple rule, convert the expression into negation normal form, such that negation is only applied to propositional variables.

4. For every expression, if there exists a boolean sub-expression involving forall or exists with parameters over multiple clusters including the home cluster, replace the sub-expression with two conjunctive expressions: one with the parameter over the home cluster, and the other over the rest of the clusters.

5. For every remote cluster in the protocol, replace its data structure from ClusterState to AbsClusterState, i.e. replacing the concrete cluster with an abstract cluster.

Procedure: For the guard of every simple rule, if there exists a boolean expression which involves the variables in D, replace the (minimal) sub-expression with true. For every statement in the action of each simple rule:

Assignment: If the designator involves any variable in D, remove the assignment; other- wise, for the expression to be assigned to the designator, if there exists any sub-expression involving variables in D, replace the sub-expression with a nondeterministic value in its type.

Forstmt: If the quantifier in the forstmt ranges over multiple clusters including the home cluster, divide the statement into two forstmts: one with the quantifier over the home cluster, and the other over the rest of the clusters. If the quantifier involves any variable in D, remove the forstmt. Finally, for every statement inside the forstmt, recursively apply the procedure.

Whilestmt: For the condition expression which controls the whilestmt, if there exists a sub-expression involving variables in D, replace the sub-expression with a nondetermin- istic value of its type. For every statement inside the whilestmt, recursively apply the procedure.

Aliasstmt: If there exists any variable declaration involving variables in D, add that variable to the set D and recursively apply the procedure to every statement in the aliasstmt.

Proccall: If there exists any parameter in the proccall involving variables in D, replace the parameter with a nondeterministic value. For each statement in the procedure, recursively apply the procedure.

Clearstmt, Putstmt, Errorstmt: If the designator of the statement is in D, remove the statement. Otherwise, if there exists an expression in the statement which involves the variables in D, replace that expression with a nondeterministic value.

Assertstmt: For the expression inside the assertstmt, if there exists any sub-expression involving variables in D, replace the sub-expression with true.

Returnstmt: For the expression in the returnstmt, if there exists any sub-expression in- volving variables in D, replace the sub-expression with a nondeterministic value in its type.
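The core of the procedure (dropping guard conjuncts and assignments over D, and replacing reads of eliminated variables with nondeterministic values) can be sketched in Python. The rule representation below, a list of conjoined guard terms plus a list of assignment triples, and all names are our own simplification, not the actual transformation tool.

```python
import random

# The set D of projected-away variables (illustrative)
ELIMINATED = {"WbMsg", "LocalDir", "UniMsg"}

def abstract_guard(guard):
    # Any sub-expression over a variable in D is replaced with true,
    # i.e. the conjunct is simply dropped
    return [t for t in guard if not (set(t["vars"]) & ELIMINATED)]

def abstract_action(action):
    out = []
    for target, sources, update in action:
        if target in ELIMINATED:
            continue                    # assignment to an eliminated var: drop
        if set(sources) & ELIMINATED:
            # the RHS reads an eliminated var: nondeterministic value instead
            out.append((target, [], lambda s: random.choice(["I", "S", "E", "M"])))
        else:
            out.append((target, sources, update))
    return out

def abstract_rule(rule):
    guard, action = rule
    return abstract_guard(guard), abstract_action(action)

# The writeback rule of Figure 3.9, flattened into this representation:
wb_rule = (
    [{"vars": ["WbMsg"]}],                       # WbMsg.Cmd = WB
    [("L2",       ["WbMsg"], None),              # L2.Data := WbMsg.Data
     ("LocalDir", [],        None),              # HeadPtr := L2
     ("L2",       [],        None),              # L2.State := Mod
     ("UniMsg",   [],        None),              # send Wb_Ack
     ("WbMsg",    [],        None)])             # clear WbMsg
g, a = abstract_rule(wb_rule)
print(g, [t for t, _, _ in a])   # [] ['L2', 'L2']: guard true, two updates left
```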

An Example of Abstracting A Transition Now let's look at one example to see how a rule is abstracted using our procedure. Consider the scenario when a writeback request from an L1 cache is received by the local directory of the remote cluster 1 (r1). As shown on the left-hand side of Figure 3.9, when this request is received, the following statements will be executed: (i) the L2 cache copy is updated with the latest copy, (ii) the local directory sets the L2 cache to be the current owner (HeadPtr) of the cache line, (iii) the L2 cache line is set to modified, (iv) the writeback acknowledgment is sent back to the L1 cache, and (v) the writeback request is reset. For the abstract protocol shown in Figure 3.7 where the home cluster is retained, the local directory and all the network channels used inside the remote cluster will be eliminated. That is, the state variables WbMsg, LocalDir, UniMsg of Clusters[r1] are all in the set of variables to be eliminated. Applying our procedure, the guard of the rule

(Figure: the writeback request WB travels from an L1 cache to the L2 cache + local directory in remote cluster 1; in the abstract protocol the cluster is reduced to an L2 cache and a RAC.)

Original rule:

Clusters[r1].WbMsg.Cmd = WB
==>
  Clusters[r1].L2.Data := Clusters[r1].WbMsg.Data;
  Clusters[r1].L2.HeadPtr := L2;
  Clusters[r1].L2.State := Mod;
  Clusters[r1].UniMsg[src].Cmd := Wb_Ack;
  clear Clusters[r1].WbMsg;

Abstracted rule:

True
==>
  Clusters[r1].L2.Data := nondet;
  Clusters[r1].L2.State := Mod;

Figure 3.9. The Murphi rule for the writeback example.

will be converted to true, and the statements (ii), (iv) and (v) will simply be eliminated. The resulting rule is shown on the right-hand side of Figure 3.9.

Abstraction on Verification Obligations For every verification obligation (VO) in the original hierarchical protocol, we ab- stract it aggressively in abstract protocols as follows:

1. Pre-process: If the VO uses forall as the top-level logic operator with parameters over multiple clusters including the concrete cluster, substitute the VO with two verification obligations: one with the parameter of forall over the retained concrete cluster, and the other over the rest of the abstracted clusters.

2. Procedure: If the VO contains any state variable that is eliminated in the abstract protocol, replace the VO with true.

The reason why we abstract verification obligations in the above way is as follows. For every VO p in the original hierarchical protocol, the abstraction will generate a VO (or a set of VOs) pi in each abstract protocol. We want the property that ∧pi ⇒ p holds.

This property, together with the fact that each abstract protocol overapproximates the original protocol on a subset of components, can ensure that our compositional approach is conservative. The way our abstraction is performed on verification obligations simplifies the proof. Now consider every coherence property p in our inclusive multicore protocol [2] described in Section 3.1.1. After pre-processing, p is either abstracted to itself or to true, and the property ∧pi ⇒ p superficially holds. Thus, if the abstract protocols can be verified correct, the original protocol must be correct. In Section 3.3, we will describe the problems that arise when directly applying the abstraction to the non-inclusive hierarchical protocol and the protocol with snooping, and also present our solutions to them.

3.2.2 Assume Guarantee Reasoning From Section 3.2.1, it is clear that each abstract protocol, by construction, overapproximates the original hierarchical protocol on those state variables that are retained in the abstract protocol. The overapproximation is achieved by the abstraction procedure on transition relations, such that the guard of each rule in the original protocol becomes more permissive, while the action allows more updates to happen. As described in the workflow of Figure 3.6, after the abstract protocols are obtained, we model check them individually, with respect to the abstracted verification obligations. When a property violation is encountered in an abstract protocol, there are two possibilities: (i) it is a genuine bug in the original protocol, or (ii) the bug is spurious due to the overapproximation in the abstraction process. For genuine bugs, we fix them in the original protocol, regenerate all the abstract protocols, and redo the model checking. If a spurious bug is reported in an abstract protocol, we constrain the abstract protocol and, at the same time, add a new verification obligation (VO) to one of the abstract protocols. The question of which abstract protocol to add the VO to depends on which one contains the details for verifying the VO. Essentially, for every such new VO, it suffices to prove the VO in any of the abstract protocols. Since each abstract protocol retains the details on a different subset of the original protocol, the abstract protocols in fact depend on each other to justify the

soundness of our approach. More specifically, assume guarantee reasoning is used in the soundness proof. The above refinement process is iterated until all the abstract protocols are verified correct. In more detail, when a spurious bug is found in an abstract protocol, we first find out from the counterexample which overly approximated rule causes the bug. Let this rule be g → a. We then find the corresponding rule in the original hierarchical protocol from which the abstracted rule was obtained. Let the original rule be G → A. Now we strengthen the abstracted rule into g ∧ p → a, where p is a formula which only involves the variables in both the intra- and the inter-cluster protocols. At the same time, we add a new verification obligation to one of the abstract protocols, in the form of g′ ⇒ p, where G ⇒ g′ is superficially valid. Intuitively, this means that p is more permissive than g′, and also than G. Thus the strengthened guard g ∧ p is still more permissive than G. As for the question of how to choose such a g′, it is very simple when G is in the form of a conjunctive formula, i.e. G = g1 ∧ . . . ∧ gn, n ≥ 1. In this case, any gi, i ∈ [1..n], can serve as g′. Otherwise, we can simply choose g′ = G.
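One refinement step, then, strengthens a guard in one abstract protocol and deposits the justifying obligation in another. The Python below is a hedged sketch with dictionary-based protocol records; all names, and the string encoding of guards and obligations, are illustrative.

```python
def refine(abs_protocols, buggy, rule_name, assumption, justifier, g_prime):
    """Strengthen an over-approximated rule in `buggy` (g → a becomes
    g ∧ p → a) and add the justifying obligation g′ ⇒ p to `justifier`."""
    guard, action = abs_protocols[buggy]["rules"][rule_name]
    abs_protocols[buggy]["rules"][rule_name] = (guard + [assumption], action)
    # The assumption is discharged in a *different* abstract protocol that
    # retains the details needed to prove it (assume-guarantee reasoning)
    abs_protocols[justifier]["obligations"].append((g_prime, assumption))

# The writeback example of Figures 3.9 and 3.10:
protos = {
    "Mh":  {"rules": {"recv_wb": (["true"], ["L2.Data := nondet",
                                             "L2.State := Mod"])},
            "obligations": []},
    "Mr1": {"rules": {}, "obligations": []},
}
refine(protos, "Mh", "recv_wb",
       assumption="L2.State = Excl | L2.State = Mod",
       justifier="Mr1", g_prime="WbMsg.Cmd = WB")

print(protos["Mh"]["rules"]["recv_wb"][0])
# ['true', 'L2.State = Excl | L2.State = Mod']
```

The strengthened guard of recv_wb in Mh now carries the assumption, and Mr1 has gained the invariant WbMsg.Cmd = WB ⇒ (L2.State = Excl | L2.State = Mod) that justifies it.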

A Detailed Example of Refinement Now let’s look at one example to see how the refinement is performed. For this example, let M denote the original inclusive hierarchical protocol shown in Figure 3.1.

Let Mh and Mr1 denote the abstract protocols in which the home cluster and the remote cluster-1 are retained, respectively. Consider again the writeback example shown in Figure 3.9. In this example, a writeback request from an L1 cache is received by

the local directory in the remote cluster-1. In Mh, this rule will be abstracted to the one shown on the right-hand side of Figure 3.9.

It is clear that in Mh, the abstracted rule of the writeback example can easily introduce violations of the verification obligations. This is because the abstracted rule simply means that, at any time, the L2 cache copy of the remote cluster-1 can be updated with a nondeterministic copy. For property violations introduced by this abstracted rule, it is easy to diagnose that the bug is spurious, because the abstracted guard true is overly approximated compared with that in M.

In Mh, the strengthened abstract rule:

True & (Clusters[r1].L2.State = Excl | Clusters[r1].L2.State = Mod)
==>
  Clusters[r1].L2.Data := nondet;
  Clusters[r1].L2.State := Mod;

In Mr1, the concrete rule together with the new invariant:

Clusters[r1].WbMsg.Cmd = WB
==>
  Clusters[r1].L2.Data := Clusters[r1].WbMsg.Data;
  Clusters[r1].L2.HeadPtr := L2;
  Clusters[r1].L2.State := Mod;
  Clusters[r1].UniMsg[src].Cmd := Wb_Ack;
  clear Clusters[r1].WbMsg;

Invariant
  forall c: ClusterSet do
    Clusters[c].WbMsg.Cmd = WB ->
      (Clusters[c].L2.State = Excl | Clusters[c].L2.State = Mod)
  end;

Figure 3.10. Refine the writeback example.

To refine Mh, we do two things. On one hand, we constrain the guard of the abstracted rule such that the L2 cache copy can be updated only when it is exclusive or modified. On the other hand, we add a new verification obligation (invariant) to Mr1, which asserts that whenever a writeback request is received from an L1 cache in the remote cluster-1, the L2 cache copy must be exclusive or modified, which is a characteristic of the inclusive cache. Figure 3.10 shows the refinement process.

Refinement Results In the refinement process, a genuine bug was found in the original hierarchical protocol. The scenario corresponding to the bug is shown in Figure 3.11. In this scenario, the remote cluster-1 (r1) initially holds a modified copy d1 in the L1 cache-1 (agent1). In Step 1, agent1 writes back the dirty line. In Step 4, r1 receives a forwarded request from the remote cluster-2 (r2) for an exclusive copy. This request is forwarded to agent1 in Step 5. In Step 7, agent1 NACKs the forwarded request as it no longer has the cache line. Finally, in Step 8.1, r1 uses the cache copy in the L2 cache to reply to r2, and at the same time sends a message to the global directory notifying it that r2 is now the owner cluster.

(Figure: a message sequence chart over L1-1 and L2 + local directory of cluster r1, the global directory, and L2 + local directory and L1-1 of cluster r2. L1-1 of r1 starts in (M, d1); the messages are 1. WB + d1, 2. Getx, 3./4./5. Fwd_Getx, 6. WB_Ack, 7. NACK, 8.1 Putx + d1, 8.2 DXFER.)

Figure 3.11. A sample scenario of the genuine bug in the original hierarchical protocol.

In Step 8.1 of the above scenario, r1 should indicate whether the data supplied to r2 is dirty or not, because this data is not sent to the global directory in Step 8.2. Without such an indication, r2 could silently drop the cache copy later. This in effect throws away the only dirty copy in the system, violating the coherence protocol. The bug is not hard to fix, so we omit the details here. After the genuine bug was fixed in the original protocol, 18 spurious bugs were found in each abstract protocol. These spurious bugs can all be fixed in a way similar to the writeback example, and the abstract protocols are refined accordingly. After that, all the verification obligations are verified correct in the abstract protocols.

3.2.3 Experimental Results For the inclusive hierarchical protocol, Figure 3.12 shows the experimental results, using the 64-bit version of the Murphi model checker [4]. All the experiments were performed on an Intel IA-64 machine, and 40-bit hash compaction of Murphi was used. As the table shows, model checking the original protocol using the traditional monolithic approach failed after more than 0.4 billion states, due to state explosion. Our compositional approach was able to verify the protocol.

                                 # of states      Model check   Model check
                                                  time (sec)    passed
  Classical       Full model     > 438,120,000    > 125,410     Non conclusive
  approach
  Compositional   Abs. model 1   284,088,425      44,978        Yes
  approach        Abs. model 2   636,613,051      66,249        Yes

Figure 3.12. Verification complexity of the hierarchical inclusive protocol.

The model checking took about 12 and 18 hours, respectively, on the two abstract protocols. In addition, it took about two days of human effort to manually abstract the protocol, fix the bug, and refine the abstract protocols. Of this, 1.5 days were spent manually abstracting the model, which was in fact tedious and error prone. We will present how this process can be automated in Section 3.3.1.

3.2.4 Soundness Proof We now prove that our compositional approach is sound. That is, if all the abstract protocols can be verified correct, the original hierarchical protocol must be correct with respect to its properties as well. We first introduce several notations for the proof.

Some Preliminaries: States, Transitions, Models, Executions Let V be a set of variables and D be a set of values, both non-empty. A state s is a function from variables to values, s : V → D. Let S be the set of states. For a state s and a set of variables X, s | X denotes the restriction of s to the variables in X. A transition t consists of a guard g and an action a, where g is a predicate on states, and a is a function a : S → S. We write g → a to denote the transition with guard g and action a. A model has the form (V, I, T, A), where V is a set of variables, I is a set of initial states over V, T is a set of transitions, and A is a set of verification obligations (state formulas over V) which can be empty.

For this proof, we only consider one notion of execution for models. An interleaving execution of a model is based on steps in which a single enabled transition fires. For a transition t = (g → a), we write s →t s′ to denote that g(s) holds and s′ = a(s). An interleaving execution of (V, I, T, A) is a sequence of states s0, s1, . . ., such that s0 ∈ I and for all i ≥ 0, there is a transition ti ∈ T such that si →ti si+1. Finally, if a state s satisfies a formula p, we denote it by p(s). If a model M = (V, I, T, A) satisfies all the verification obligations of A, then we denote it by |= M.

A Theory Justifying Circular Reasoning

Lemma 1. Let M = (V, I, T, A) be a model. If there exists a set of models Mi = (Vi, Ii, Ti, Ai), i ∈ [1..n], n > 0, such that (i) Vi ⊆ V, (ii) Ii ⊆ I | Vi, (iii) ∀t = g → a ∈ T, ∃ti = gi → ai ∈ Ti such that g ⇒ gi, and for every state s, s′ s.t. s →t s′, s | Vi →ti s′ | Vi holds, and (iv) ∀p ∈ A, ∃pi ∈ Ai where pi is either true or p, and ∧i∈[1..n] pi ⇒ p is valid. Then ∧i∈[1..n] |= Mi implies |= M.

Proof: We will prove this lemma by contradiction. Suppose |= M does not hold while ∧i∈[1..n] |= Mi holds. Then there exists a counterexample to a verification obligation p ∈ A of M: s0 →t0 s1 →t1 . . . →tm−1 sm where m ≥ 0, and tj = gj → aj for all j ∈ [0..m−1], such that p(sm) does not hold. For every i ∈ [1..n], there exists a valid execution of Mi as follows:

s0 | Vi, s1 | Vi, . . . , sm | Vi

We will use induction to prove that the above execution is valid for M1; for the other Mi's it can be proved similarly.

Base Case: j = 0. From (ii) in the lemma, it follows that s0 | V1 ∈ I1.

Induction Hypothesis: Suppose for j ≤ k, 0 ≤ k < m, s0 | V1, . . . , sk | V1 is a valid execution of M1. Let sj,1 = sj | V1 for j ∈ [0..k].

Induction Step: We consider j = k + 1. Because sk →tk sk+1, from (iii) in the lemma, it follows that ∃tj,1 = gj,1 → aj,1 such that gj ⇒ gj,1 and sj+1 | V1 = aj(sj) | V1 = aj,1(sj | V1) = aj,1(sj,1). Let sj+1,1 = sj+1 | V1. Then sj,1 →tj,1 sj+1,1 is valid in M1. That is, the induction holds for j = k + 1.

Finally, we consider p and its corresponding VO in each Mi. From (iv) in the lemma, it follows that for every i ∈ [1..n], ∃pi ∈ Ai, pi is either p or true, and ∧i∈[1..n] pi ⇒ p. Thus, there must exist an r ∈ [1..n] such that pr = p. Let sm,r = sm | Vr. Since |= Mr holds, pr(sm,r) holds, i.e. p(sm,r) holds. So p must hold at sm. Contradiction. □
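Condition (iii) of the lemma is a simulation check that can be exercised on a toy model. The Python below is purely illustrative: a two-variable concrete model and its projection onto {x}; none of it comes from the dissertation's protocols.

```python
from itertools import product

# Concrete model M over variables {x, y}: x counts mod 3, guarded by y.
def t_guard(s):  return s["y"]                                    # g
def t_action(s): return {"x": (s["x"] + 1) % 3, "y": not s["y"]}  # a

# Abstract model M1 over {x}: y is eliminated, so the guard becomes true and
# the flip of y disappears; x's update reads no eliminated variable, so it stays.
def t1_guard(s): return True                                      # g1, g => g1
def t1_action_candidates(s):
    return [{"x": (s["x"] + 1) % 3}]  # possible a1 results (nondeterminism slot)

def project(s): return {"x": s["x"]}  # s | V1

# Check condition (iii): whenever s -t-> s', the projections are related by t1.
ok = True
for x, y in product(range(3), [False, True]):
    s = {"x": x, "y": y}
    if t_guard(s):
        s2 = t_action(s)
        ok = ok and t1_guard(project(s)) \
                and project(s2) in t1_action_candidates(project(s))
print(ok)  # True
```

Exhaustively enumerating the six concrete states confirms that every concrete step projects to a step of the abstract model, which is exactly what the induction step of the proof relies on.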

Theorem 1. Let M = (V, I, T, A) be a model. If there exists a set of models Mi = (Vi, Ii, Ti, Ai), i ∈ [1..n], n > 0, such that (i) Vi ⊆ V, (ii) Ii ⊆ I | Vi, (iii) ∀t = g → a ∈ T, ∃ti = gi ∧ ei → ai ∈ Ti such that g ⇒ gi, and if ei ≠ true then ∃k ∈ [1..n], g′ ⇒ ei ∈ Ak where g ⇒ g′ is superficially valid; also, a(s) | Vi = ai(s | Vi) holds for every state s, and (iv) ∀p ∈ A, ∃pi ∈ Ai where pi is either true or p, and ∧i∈[1..n] pi ⇒ p is valid. Then ∧i∈[1..n] |= Mi implies |= M.

Proof: We prove this by contradiction. Suppose |= M does not hold while ∧i∈[1..n] |= Mi holds. Then there exists a counterexample to a verification obligation p ∈ A of M: s0 →t0 s1 →t1 . . . →tm−1 sm where m ≥ 0, and tj = gj → aj for all j ∈ [0..m−1], such that p(sm) does not hold. For every i ∈ [1..n], there exists a valid execution of Mi as follows:

s0 | Vi, s1 | Vi, . . . , sm | Vi

We will use induction to prove the above execution for every Mi, i ∈ [1..n].

Base Case: j = 0. From (ii) in the theorem, it follows that s0 | Vi ∈ Ii.

Induction Hypothesis: Suppose for j ≤ k, 0 ≤ k < m, s0 | Vi, . . . , sk | Vi is a valid execution of Mi. Let sj,i = sj | Vi for j ∈ [0..k].

Induction Step: We consider j = k + 1. From (iii) in the theorem, for sk →tk sk+1 in M, for every i ∈ [1..n], there exists tk,i = gk,i ∧ ek,i → ak,i ∈ Ti, such that gk ⇒ gk,i, and if ek,i ≠ true then ∃ g′k ⇒ ek,i ∈ Ar for some r ∈ [1..n], where gk ⇒ g′k is superficially valid. Also ak,i(sk | Vi) = sk+1 | Vi. Because |= Mr holds, g′k ⇒ ek,i must hold at state sk | Vr, and also at sk. This, together with the fact that gk ⇒ gk,i, implies that gk ⇒ (gk,i ∧ ek,i). So tk,i is enabled at sk,i. Moreover, we have ak,i(sk,i) = sk+1 | Vi. Let sk+1,i = sk+1 | Vi. Then sk,i →tk,i sk+1,i is a valid execution step in each Mi. The induction holds for j = k + 1.

Finally, we consider p and its corresponding VO in each Mi. From (iv) in the theorem, it follows that for every i ∈ [1..n], ∃pi ∈ Ai, pi is either p or true, and ∧i∈[1..n] pi ⇒ p. Thus, there must exist an r ∈ [1..n] such that pr = p. Let sm,r = sm | Vr. Since |= Mr holds, pr(sm,r) holds, i.e. p(sm,r) holds. So p must hold at sm. Contradiction. □

Applying the Theorems Lemma 1 provides a formal guarantee that our abstraction is conservative. That is, given a hierarchical protocol, if all the abstract protocols obtained using our abstraction can be proved correct, the original hierarchical protocol must be correct with respect to its verification obligations. Furthermore, Theorem 1 ensures that our refinement is also conservative. Thus, the compositional framework as a whole is sound, i.e. the correctness of the refined abstract protocols implies the correctness of the original protocol.

3.3 Verification of Hierarchical Coherence Protocols One Level At A Time In the previous section, we presented a compositional framework to verify hierarchical cache coherence protocols. Given a hierarchical protocol, we decompose it into a set of abstract protocols with smaller verification complexity. By verifying only the abstract protocols, we can conclude the correctness of the original hierarchical protocol. Although this approach has been successfully applied to the inclusive hierarchical protocol, which cannot be verified through traditional monolithic model checking, several problems remain. The drawbacks of the approach include:

Even a single abstract protocol models more than one level of the coherence protocols, thus still creating a very large product space;

Details such as non-inclusive caching hierarchies, and hierarchical protocols which use snooping, may not be handled;

All the abstract protocols are created manually, which is tedious and error prone.

We will present these limitations in more detail in the following. First of all, the abstract protocol shown in Figure 3.7 contains two protocols: an intra-cluster and an inter-cluster protocol. The product state space of two protocols running concurrently can still make the state space of an abstract protocol very large. This is in fact confirmed by the experimental results in Figure 3.12: there, one abstract protocol still has more than 0.6 billion states, which is near the limit of what a monolithic model checker can currently explore concretely. It is clear that for a hierarchical protocol more complex than the inclusive multicore benchmark protocol, even our compositional approach cannot handle the complexity. Second, we have only applied the compositional approach to the inclusive multicore protocol. Other complications could arise in the non-inclusive multicore protocol, and in the one with snooping. This is because for the inclusive protocol, the L2 cache block can be directly used as a "summary" of the intra-cluster protocol. However, for non-inclusive and snooping protocols, there is no component that can be used as a summary of the cluster in a straightforward way. Finally, all the abstract protocols that we have presented are constructed manually. Though the process is quite simple, it is time consuming and error prone. Can we mechanize the process of creating the set of initial abstract protocols? That is, given an original hierarchical protocol, and the set of state variables to be projected away for each abstract protocol, we want to obtain the abstract protocols automatically. In this section, we will present extensions to the compositional framework which can overcome all these limitations. We propose a new abstraction approach such that, by using the same compositional framework, we can achieve the verification of hierarchical protocols one level at a time.
For non-inclusive and snooping protocols, we propose to use history variables in an assume guarantee manner, so that these protocols can be verified in the same way as the inclusive protocol. Finally, we describe the procedure for the new abstraction process, and present an implementation which has mechanized the process.

3.3.1 Building Abstract Protocols One Level At A Time

We present a new decomposition approach, such that every resulting abstract protocol involves either an intra-cluster or the inter-cluster protocol, but never both. This approach is very similar to that in Section 3.2.1, in that each abstract protocol is obtained by overapproximating the original protocol, projecting away different components. For the inclusive multicore protocol in Figure 3.1, our new approach decomposes it into four abstract protocols: one intra-cluster protocol for every cluster, and the inter-cluster protocol. Because the two remote clusters are of identical design, there are altogether three distinct abstract protocols, as shown in Figure 3.13.

[Figure 3.13 shows three abstract protocols: Abs. protocol #1 (home cluster) and Abs. protocol #2 (remote cluster 1) each consist of that cluster's L1 caches and its L2 cache with the local directory; Abs. protocol #3 (inter-cluster) consists of the L2 caches and RACs of the home cluster and both remote clusters, together with the global directory and main memory.]

Figure 3.13. The abstract protocols obtained via the new decomposition approach.

Again, the derivation of the abstract protocols from the hierarchical protocol involves abstraction on the global state variables, the transition relation, and the verification obligations. We will describe the abstraction process for the state variables and the transition relation; the process for the verification obligations is the same as in Section 3.2.1, so we skip it here.

AbsIntraClusterState: record
  -- 1. L1 caches
  L1s: array [L1Cnt] of L1CacheLine;
  -- 2. local network channels
  UniMsg: array [L1L2] of Uni_Msg;
  InvMsg: array [L1L2] of Inv_Msg;
  WbMsg: Wb_Msg;
  ShWbMsg: ShWb_Msg;
  NackMsg: Nack_Msg;
  -- 3. local directory
  LocalDir: record
    pending: boolean;
    ShrSet: array [L1Cnt] of boolean;
    InvCnt: CacheCnt;
    HeadPtr: L1L2;
    ReqId: L1Cnt;
    ReqCluster: ClusterCnt;
  end;
  -- 4. L2 cache
  L2: L2CacheLine;
  ClusterWb: boolean;
end;

AbsInterClusterState: record
  -- 4. L2 cache
  L2: L2CacheLine;
  ClusterWb: boolean;
  -- 5. remote access controller
  RAC: record
    State: RACState;
    InvCnt: ClusterCnt;
  end;
end;

Figure 3.14. An abstracted cluster in the intra- and inter-cluster protocols.

Abstraction on State Variables

A cluster in the original hierarchical protocol is abstracted differently in the abstract intra-cluster and inter-cluster protocols. Figure 3.14 shows the abstracted data structures of a cluster, in contrast to the concrete one in Figure 3.8. For every abstract intra-cluster protocol, its data structure can be represented as an AbsIntraClusterState: the RAC inside the cluster is dropped from the original protocol, and all the other clusters, the global directory, and the network channels used among the clusters are dropped as well. For the abstract inter-cluster protocol, every cluster can be represented as an AbsInterClusterState, and the networks among the clusters and the global directory are retained. In summary, our abstraction on state variables is still a projection function.
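As a toy illustration of the projection function described above, the following Python sketch models a global state as a dict and restricts it to a retained subset of variables. The variable names here are hypothetical simplifications, not the benchmark protocol's actual state:

```python
# A minimal sketch (not the dissertation's tool) of projection on state
# variables: a global state is a dict, and an abstract protocol retains
# only a chosen subset of the variables.

def project(state, retained):
    """Restrict a state to the retained variables (a projection function)."""
    return {v: val for v, val in state.items() if v in retained}

# A toy hierarchical state: one home cluster plus the inter-cluster components.
full_state = {
    "Home.L1s":      ("Shrd", "Invld"),
    "Home.LocalDir": "not_pending",
    "Home.L2":       "Excl",
    "Home.RAC":      "None",
    "GlobalDir":     "owned_by_home",
    "MainMem":       0,
}

# Abstract intra-cluster protocol: drop the RAC and all inter-cluster state.
abs_intra = project(full_state, {"Home.L1s", "Home.LocalDir", "Home.L2"})

# Abstract inter-cluster protocol: keep the L2 "summary", the RAC, and the
# global components; drop the intra-cluster details.
abs_inter = project(full_state, {"Home.L2", "Home.RAC", "GlobalDir", "MainMem"})
```

Note that the L2 entry is retained on both sides, matching its role as the interface between the two levels.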

Abstraction on Transition Relations

The process of abstraction on transition relations is very similar to that in Section 3.2.1. The difference is that an abstract intra-cluster protocol can now contain just one cluster. So the forall and exists expressions, and the forstmt statements need to be processed differently. In the following, we present the procedure to construct the transition relations for abstract protocols. We still use D to denote the set of state variables that are eliminated in the abstract protocol.

THE PROCEDURE FOR ABSTRACTING THE TRANSITION RELATION

Pre-processing:

1.-3. The same as in Section 3.2.1.

4. For forall and exists expressions, do nothing.

5. In the abstract inter-cluster protocol, replace the data structure of each cluster from ClusterState to AbsInterClusterState. In every abstract intra-cluster protocol, replace the structure of the cluster that the protocol models to AbsIntraClusterState.

Procedure:

Forstmt: If the quantifier in the forstmt involves any variable in D, remove the forstmt. Otherwise, recursively apply the procedure to every statement inside the forstmt.

The guards and the remaining statements are processed in the same way as in Section 3.2.1.
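The forstmt rule above can be sketched in a few lines of Python. Statements are modeled as nested tuples rather than real Murphi AST nodes, and D is the set of eliminated variables; this mirrors the procedure only, not the actual Murphi-based tool:

```python
# A sketch of the forstmt rule in the abstraction procedure. Statements are
# modeled as nested tuples -- ("for", var, body) for a forstmt and
# ("assign", lhs, rhs) for an assignment.

def abstract_stmt(stmt, D):
    """Return the abstracted statement, or None if it is removed."""
    kind = stmt[0]
    if kind == "for":
        _, var, body = stmt
        if var in D:                        # quantifier involves a var in D:
            return None                     # remove the whole forstmt
        new_body = [a for a in (abstract_stmt(s, D) for s in body) if a]
        return ("for", var, new_body)       # otherwise recurse into the body
    if kind == "assign":
        lhs = stmt[1]
        return None if lhs in D else stmt   # drop writes to eliminated vars
    return stmt

# Hypothetical rule action: keep the L2 update, drop the loop over clusters.
rule_action = [
    ("assign", "L2.State", "Mod"),
    ("for", "cl", [("assign", "Clusters[cl].RAC.State", "Wait")]),
]
D = {"cl", "Clusters[cl].RAC.State"}
abstracted = [a for a in (abstract_stmt(s, D) for s in rule_action) if a]
```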

Mechanizing the Abstraction

Given a hierarchical protocol and the state variables to be projected away, we have implemented a tool that can automatically generate the abstract protocols. The tool was implemented by Michael Delisi, based on the distribution of Murphi. The source code and the description of the tool can be found at [2]. As a general overview, the tool takes two input files: the first is the original protocol; the second contains the type and variable declarations that are retained in the abstract protocol, as well as an optional abstracted startstate. The current implementation assumes that the original protocol is already in negation normal form, i.e. the negation

operator can only be applied to propositional variables. Internally, every type and variable declaration in the original protocol is looked up in the second file to determine whether the declaration is eliminated. The implementation is more complex than this, because the internals of Murphi create temporary type declarations and compare two types based on their names. Once all the declarations that are eliminated in the abstract protocol are known, the abstraction procedure is applied recursively to every statement in the original protocol.

Assume Guarantee Reasoning

After the abstract protocols are obtained using the "per-level" abstraction, they can be refined in the same way as in Section 3.2.2. That is, genuine bugs are fixed in the original hierarchical protocol, while spurious bugs are eliminated by refining the abstract protocols using assume guarantee reasoning. Finally, if all the abstract protocols can be verified correct, the original protocol must be correct as well.

3.3.2 When Can "Per-Level" Abstraction Be Applied?

From the previous section, it is clear that the new abstraction approach is very similar to the previous one. The question, then, is when we can use the new abstraction approach, and what restrictions it carries. The answer is that if the different levels of a hierarchical protocol are modeled as loosely coupled, then the per-level abstraction can be applied. By loosely coupled, we mean that every transition of the hierarchical protocol can only involve the state variables of one level, including the interfaces used to interact with other levels, but never more. Here, we use interfaces to denote the common state variables used by two neighboring levels. For example, in the multicore protocol shown in Figure 3.1, the L2 cache block is the interface between the intra- and the inter-cluster protocols. We now show one example of a rule which violates loosely coupled modeling, and see how it prevents the per-level abstraction from being applied. The conditions under which this rule can be executed are: (i) the local directory of a cluster receives a coherence request from an L1 cache, (ii) the L2 cache does not have the cache line, and (iii) the local directory is not busy, i.e. not pending on other requests for the same

address. Once this rule is enabled, it makes the following updates: the local directory is set to busy, the RAC of the cluster is also set to busy, the request is forwarded to the global directory, and the original request is reset. The left-hand side of Figure 3.15 shows the rule, where GUniMsg are the networks among the clusters and GDir is the global directory. Clearly, this rule involves not only the intra-cluster details, but also the RAC and the networks to the global directory, i.e. the inter-cluster details.

Applying the per-level abstraction, in the abstract inter-cluster protocol this rule is abstracted so that the resulting guard simply becomes (ii), i.e. the L2 cache does not have the line; the action becomes that the RAC is set to busy and the request is forwarded to the global directory. The abstracted rule is in fact over-approximated, because it allows multiple requests for the same address to be issued continuously before a reply is received. To fix the spurious bug, we need to strengthen the abstracted guard, using the fact that the RAC cannot be busy if no request has been sent to the global directory. To ensure that the strengthening is sound, it has to be proved that when the local directory is not busy, the RAC of the cluster must not be busy. This verification obligation requires that at least one abstract protocol maintain the details of both the local directory and the RAC. However, no abstract protocol from the per-level abstraction satisfies this requirement. Therefore, the verification obligation cannot be verified and the approach is not guaranteed to be conservative.

If the above rule is modeled in a different manner, the problem disappears. For example, the right-hand side of Figure 3.15 shows two rules which achieve the same effect as the rule on the left-hand side. A new state variable Req is added to the cluster data structure, and it is also part of the interface.
In the first rule, Req records the request, and in the second rule the request is forwarded to the global directory. After the per-level abstraction, in the abstract inter-cluster protocol, the guard of the first rule simply becomes true and its action sets Req, while the second rule is abstracted to itself. In the refinement process for the first abstract rule, a new verification obligation will be added: whenever a local directory is not busy, the Req of the cluster contains no requests.

Tightly coupled (left-hand side of Figure 3.15):

  Clusters[c].UniMsg[src].Cmd = Get &
  Clusters[c].L2.State = Invld &
  Clusters[c].LocalDir.pending = false
  ==>
  Clusters[c].LocalDir.pending := true;
  Clusters[c].RAC.State := Wait;
  clear Cluster[c].UniMsg[src];
  GUniMsg[c].Src := c;
  GUniMsg[c].Dest := GDir;
  GUniMsg[c].Cmd := Get;

Loosely coupled (right-hand side of Figure 3.15):

  Clusters[c].UniMsg[src].Cmd = Get &
  Clusters[c].L2.State = Invld &
  Clusters[c].LocalDir.pending = false
  ==>
  Clusters[c].Req.Cmd := Get;
  Clusters[c].LocalDir.pending := true;
  clear Cluster[c].UniMsg[src];

  Clusters[c].Req.Cmd = Get &
  Clusters[c].RAC.State = None
  ==>
  Clusters[c].RAC.State := Wait;
  GUniMsg[c].Src := c;
  GUniMsg[c].Dest := GDir;
  GUniMsg[c].Cmd := Get;

Figure 3.15. Example rules of tightly and loosely coupled modeling.

This verification obligation can be proved by any abstract intra-cluster protocol; thus the approach is conservative. In summary, if the different levels of a hierarchical protocol are modeled as loosely coupled, the per-level abstraction can be applied, so that greater verification reduction can be achieved. As a result, we propose that hierarchical protocols be modeled as loosely coupled, when performance allows the tradeoff.
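The intuition that the two loosely coupled rules achieve the same effect as the single tightly coupled rule can be sketched with a simplified Python simulation. The state is a plain dict, not real Murphi, and the second rule clears Req here purely for simplicity of comparison (in the protocol, Req is cleared when the transaction completes):

```python
# Sketch (simplified dicts, hypothetical field values) of the rules in
# Figure 3.15: the tightly coupled single rule versus its loosely coupled
# two-rule split through the interface variable Req.

def tightly_coupled(s):
    if s["UniMsg"] == "Get" and s["L2"] == "Invld" and not s["pending"]:
        s.update(pending=True, RAC="Wait", UniMsg=None, GUniMsg="Get")

def intra_rule(s):
    # Records the request in the interface variable Req; intra-cluster only.
    if s["UniMsg"] == "Get" and s["L2"] == "Invld" and not s["pending"]:
        s.update(Req="Get", pending=True, UniMsg=None)

def inter_rule(s):
    # Forwards Req to the global directory; inter-cluster only.
    # (Req is cleared here for simplicity of the comparison.)
    if s["Req"] == "Get" and s["RAC"] == "None":
        s.update(RAC="Wait", GUniMsg="Get", Req=None)

init = dict(UniMsg="Get", L2="Invld", pending=False,
            RAC="None", Req=None, GUniMsg=None)
a, b = dict(init), dict(init)
tightly_coupled(a)
intra_rule(b)
inter_rule(b)
```

Both runs end in the same state, but each of the two split rules touches only one level's variables plus the interface.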

3.3.3 Using History Variables in Assume Guarantee Reasoning

With the compositional approaches that we propose, in order to verify that a hierarchical protocol is correct with respect to its verification obligations, we need to verify that all the abstract protocols are correct with respect to their VO's individually. Moreover, for each VO p in the hierarchical protocol, the corresponding abstracted VO pi in each abstract protocol needs to satisfy the following property: pi = p or pi = true, and ∧i pi ⇒ p superficially holds.

For the inclusive cache hierarchy, the above can be achieved in a straightforward manner. For example, consider a commonly used coherence property: no two caches can write to the same address concurrently. For the inclusive multicore protocol, this property can be represented with two verification obligations:

No two clusters can have their L2 caches both be exclusive or modified;

No two L1 caches can both be exclusive or modified in each cluster.

Here, each VO only involves the details of one level of the hierarchy; thus they can be verified in the abstract protocols, using either of the abstractions that we propose. How- ever, for non-inclusive caches and multicore snooping protocols, when two L1 caches from different clusters are both exclusive or modified, their corresponding L2 caches may not have the cache line. With the per-level abstraction, since every abstract protocol only maintains the details of at most one cluster, it is not straightforward how to represent the above coherence property.

Before abstraction (the writeback rule in the non-inclusive protocol):

  Clusters[c].WbMsg.Cmd = WB
  ==>
  Clusters[c].L2.Data := Clusters[c].WbMsg.Data;
  Clusters[c].L2.State := Mod;
  Clusters[c].L2.HeadPtr := L2;
  Clusters[c].UniMsg[src].Cmd := Wb_Ack;
  clear Cluster[c].WbMsg;

After abstraction:

  True
  ==>
  Clusters[c].L2.Data := nondet;
  Clusters[c].L2.State := Mod;

During refinement (guard strengthened with IE):

  True & Clusters[c].IE = true
  ==>
  Clusters[c].L2.Data := nondet;
  Clusters[c].L2.State := Mod;

Figure 3.16. The writeback example before/after abstraction, and during refinement.

Moreover, for non-inclusive caches and multicore snooping protocols, it may not be straightforward to find the appropriate constraints for guard strengthening when spurious bugs are detected during refinement. For example, consider again the writeback example shown in Figure 3.16, in the non-inclusive protocol. When this rule is abstracted in the abstract protocol where the intra-cluster details are eliminated, it becomes a rule stating that the L2 cache line of the cluster can be updated at any time. In the inclusive caches, this over-approximated rule can be strengthened such that the update can happen only when the L2 cache line is exclusive or modified. However, this guard strengthening cannot be applied to non-inclusive caches, because when an L1 cache line is exclusive or modified, the corresponding L2 cache line can instead be invalid.

These problems arise because, in hierarchical protocols where non-inclusive caches or snooping are employed at some level, there is no component that can serve as a summary of that level. This makes it difficult to apply our compositional approach in a straightforward manner.

To solve these problems, we propose to use history variables [72] in an assume guarantee manner. The basic idea underlying classical history variables is the notion of statifying the past, i.e. introducing auxiliary variables whose values in the current state reflect all the facts we need to know about the past. History variables were frequently used in reasoning about communication systems [47, 53, 58], and they are also used in recent work on pipelined verification [61]. In our case, we introduce an auxiliary variable for each cluster in the multicore protocols with non-inclusive caches and with snooping. The value of each auxiliary variable is a function of the variables that are already in the protocol. Together with the L2 cache line, the auxiliary variable is able to serve as a summary of the caching status in the cluster.
Strictly speaking, these auxiliary variables are not history variables but current variables, because each is a function of a subset of the state variables already in the protocol, evaluated in the current state. More specifically, for every cluster in the protocol, we add an auxiliary variable IE (implicit exclusive). The value of IE is defined in the abstract protocol where the cluster is retained as concrete, while it is used in the abstract protocols where the cluster is abstracted. Initially, IE is set to false. In our second benchmark protocol, IE is defined to be true if one of the following conditions holds:

1. If an L1 cache contains an exclusive or modified copy, or

2. If the networks inside the cluster contain a coherence reply with an exclusive or modified copy, or a grant reply from the broadcast channels, or

3. If there is a writeback or shared writeback request, or

4. If there is an exclusive ownership transfer request.

In our third protocol, with snooping, IE is defined to be true if any L1 cache on the chip has an exclusive cache line. A verification obligation is added to the abstract protocol where IE is defined, to assert that the value of IE indeed follows the above definition. When IE is true, the cluster must have an exclusive or modified copy somewhere other than in the L2 cache. We also introduce a few other auxiliary variables to the non-inclusive protocol and the snooping protocol in a similar way. After these variables are introduced and added to the interface variables, these protocols can be processed in the same way as the inclusive protocol. For example, the coherence property discussed above can now be represented as: (i) no two clusters can have IE ∨ (L2.State = Excl) ∨ (L2.State = Mod) both true, and (ii) no two L1 caches can both be exclusive or modified in any cluster. Each of these properties can be proved using either of our abstractions.
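The role of IE as a per-cluster summary can be sketched as follows. The cluster representation and field names are hypothetical simplifications of the benchmark's state; the four conditions correspond to the enumerated list above:

```python
# A sketch of the IE (implicit exclusive) auxiliary variable for the
# non-inclusive benchmark, following the four conditions listed above.

EXCL = {"Excl", "Mod"}

def compute_IE(cluster):
    return (any(l1 in EXCL for l1 in cluster["L1s"])                      # 1
            or any(m in ("ExclReply", "Grant") for m in cluster["msgs"])  # 2
            or cluster["wb_or_shwb_pending"]                              # 3
            or cluster["ownership_transfer"])                             # 4

def cluster_exclusive(cluster):
    # The per-cluster summary used in the rewritten coherence property:
    # IE \/ (L2.State = Excl) \/ (L2.State = Mod)
    return compute_IE(cluster) or cluster["L2"] in EXCL

# Non-inclusive scenario: an L1 holds Mod while the L2 line is invalid --
# IE still flags the cluster as (implicitly) exclusive.
c1 = dict(L1s=["Mod", "Invld"], msgs=[], wb_or_shwb_pending=False,
          ownership_transfer=False, L2="Invld")
c2 = dict(L1s=["Shrd", "Invld"], msgs=[], wb_or_shwb_pending=False,
          ownership_transfer=False, L2="Shrd")
```

Part (i) of the coherence property then says that cluster_exclusive may hold for at most one cluster at a time.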

3.3.4 Experimental Results

We have applied the compositional approach with the per-level abstraction to the three benchmark protocols. Figure 3.17 shows the experimental results on the three coherence protocols, using traditional monolithic model checking and our approach. The monolithic runs in Tables 1 and 2 were performed on an Intel IA-64 machine with the 64-bit version of Murphi, and the rest were performed on a PC with a 3.0GHz Intel Pentium CPU and the standard Murphi. In all the experiments, 40-bit hash compaction was used. For the multicore inclusive protocol, model checking of the original protocol using the monolithic approach failed due to state explosion after exploring more than 0.4 billion states. Using our approach with the per-level abstraction, the inclusive protocol can be verified in about six minutes, with less than 2GB of memory. The multicore non-inclusive protocol can be verified in about 20 minutes, with less than 2GB of memory. From the

Table 1. The first inclusive protocol

Approach       Model            # of states      Time (sec)   Mem (GB)   Passed
Monolithic     Original model   > 438,120,000    > 125,410    18         Inconclusive
Our approach   Abs. inter       1,500,621        270          1.8        Yes
               Abs. intra 1     574,198          50           1.8        Yes
               Abs. intra 2     198,162          21           1.8        Yes

Table 2. The second non-inclusive protocol

Approach       Model            # of states      Time (sec)   Mem (GB)   Passed
Monolithic     Original model   > 473,260,000    > 161,398    18         Inconclusive
Our approach   Abs. inter       4,070,484        770          1.8        Yes
               Abs. intra 1     2,424,719        250          1.8        Yes
               Abs. intra 2     2,424,719        248          1.8        Yes

Table 3. The third snooping protocol

Approach       Model            # of states      Time (sec)   Mem (GB)   Passed
Monolithic     Original model   552,375          86           1.8        Yes
Our approach   Abs. intra       474              6            1.8        Yes
               Abs. inter       15,371           7            1.8        Yes

Figure 3.17. Verification complexity on the three benchmark protocols.

above tables, we can see that our approach reduces the state space of the original protocols by more than 95%.
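A quick arithmetic check of the reduction claim, using the state counts from Tables 1 and 2. Since the monolithic runs were inconclusive, the original counts are only lower bounds, so the true reduction is even larger:

```python
# Reduction relative to the (lower-bound) monolithic state counts.

incl_abstract = 1_500_621 + 574_198 + 198_162       # Table 1: our approach
incl_original = 438_120_000                         # Table 1: lower bound
incl_reduction = 1 - incl_abstract / incl_original

nonincl_abstract = 4_070_484 + 2_424_719 + 2_424_719  # Table 2: our approach
nonincl_original = 473_260_000                        # Table 2: lower bound
nonincl_reduction = 1 - nonincl_abstract / nonincl_original
```

Both reductions come out well above 95% (roughly 99.5% and 98.1% against the lower bounds).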

3.3.5 Soundness Proof

The soundness of our approach to verifying hierarchical protocols one level at a time can be derived from that of the compositional approach described in Section 3.2.4. First of all, Proposition 1 of Section 3.2.4 still holds here, because conditions (i), (ii), (iii) and (iv) do not depend on how the abstraction is performed on the

original protocol, as long as the abstraction is conservative. Second, Theorem 1 still holds for our new approach despite the use of history variables. This is because the value of each history variable is a function of those that are already in the protocol, and this relationship is added as a verification obligation to an abstract protocol. As a result, introducing history variables does not introduce any new behavior to the protocol.

Moreover, whenever a history variable is used by a constraint ei for guard strengthening of an abstracted rule gi → ai, a new verification obligation g ⇒ ei will be added to an abstract protocol, where g is the guard of the original rule and g ⇒ gi superficially holds. Therefore, the soundness of our approach can be derived from Theorem 1.

3.4 Summary

In this chapter, we investigate techniques to reduce the verification complexity of hierarchical cache coherence protocols at the level of their high level descriptions. As there is currently no hierarchical protocol benchmark of reasonable complexity, we first develop three 2-level multicore protocols with different features: inclusive and non-inclusive cache hierarchies, and snooping. These protocols are modeled with realistic features so that their verification complexity is similar to that of protocols used in practice. Based on these benchmarks, we propose two novel compositional approaches. Given a hierarchical protocol, both approaches first decompose it into a set of abstract protocols with smaller verification complexity. The abstract protocols are then refined in an assume guarantee manner. Both approaches are conservative in that if all the abstract protocols can be verified correct, the original protocol must be correct as well. One difference between the approaches is that in the second, we decompose hierarchical protocols one level at a time. In order to apply this approach, we also propose that hierarchical protocols be developed and modeled in a loosely coupled manner. Furthermore, the second approach extends the first by using history variables in an assume guarantee manner. This extension makes it possible for the second approach to verify all three benchmarks, not just the inclusive one as with the first approach.

For the benchmarks, we show that our second approach can reduce the verification complexity of the multicore protocols to that of a non-hierarchical protocol. For future work, we plan to reduce the manual effort required by our approaches. More specifically, for every counterexample obtained in an abstract protocol, we plan to investigate how to automatically recognize whether the error is due to a genuine bug in the original hierarchical protocol, or due to overapproximation in our abstraction. We also plan to explore various ways of automating the guard strengthening in refinement, when an error is found to be spurious.

CHAPTER 4

CHECKING REFINEMENT BETWEEN HIGH LEVEL PROTOCOL SPECIFICATIONS AND RTL IMPLEMENTATIONS

Modeling cache coherence protocols through atomic guarded action commands with interleaving semantics is popular, e.g. in Murphi and TLA+, owing to the conceptual clarity of modeling and verifying the high level behavior of the protocols. In mapping such specifications into hardware, designers often decompose each specification transition into sequences of implementation transitions taking one clock cycle each. Some implementation transitions realizing a specification transition overlap, to maximize overlapped computation and pipelining, and to take advantage of internal buffers, split transaction buses, etc. The implementation transitions realizing different specification transitions can also overlap.

In this chapter, we propose a formal theory of refinement, showing how a collection of such implementation transitions can be shown to realize a specification. We also propose a modular refinement verification approach, developing abstraction and assume guarantee principles that allow the implementation transitions realizing a single specification transition to be situated in sufficiently general environments. Finally, we present an implementation of the refinement check, called Muv. Given a high level protocol specification in Murphi, an RTL implementation in Hardware Murphi (see Section 4.6.1), and the correspondence between the global state variables of the two models, Muv can automatically generate the assertions for the refinement check. Applying our refinement check to a driving coherence protocol benchmark with realistic features, we show that it can generate enough assertions to catch bugs in the RTL implementation that are easy to miss.

4.1 Some Preliminaries to The Formal Definition of Refinement

States, Transitions, Models

Let V be a set of variables and D be a set of values, both non-empty. A state s is a function from variables to values, s : V → D. For a state s, variable v, and value d ∈ D, s[v ← d] denotes the state that is the same as s except that s[v ← d](v) = d. Let S be the set of states. For a state s and a set of variables X, s | X denotes the restriction of s to the variables in X.

A transition t consists of a guard g and an action a, where g is a predicate on states and a is a function a : S → S. We write g → a to denote the transition with guard g and action a. We can also form the action of a transition by composing two or more functions: if a, b : S → S, then g → a; b denotes the transition g → λs. b(a(s)).

We associate with each transition t = (g → a) two sets of variables called the read and write sets of t, written R(t) and W(t) respectively. The write set of t is the set of variables whose value can be changed by the transition in some state:

  W(g → a) = {v | ∃s ∈ S : g(s) ∧ a(s)(v) ≠ s(v)}

The read set of a transition is the set of variables that can affect either the enabling of the guard, or the value written to one of the variables in the write set:

  R(g → a) = {v | ∃s ∈ S, ∃d ∈ D : g(s) ≠ g(s[v ← d]) ∨ (a(s) | W(t)) ≠ (a(s[v ← d]) | W(t))}

We say t reads v if v ∈ R(t), and t writes v if v ∈ W(t). A model has the form (V, I, T, A), where V is a set of variables, I is a set of initial states over V, T is a set of transitions, and A is a (possibly empty) set of assertions (state formulas over V).
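For a finite state space, the read and write sets can be computed by brute-force enumeration directly from the definitions. The following Python sketch does this over a tiny boolean state space; it is purely illustrative, since the definitions quantify over all states:

```python
# Compute W(t) and R(t) for a guarded transition by enumeration.
from itertools import product

VARS = ("x", "y", "z")

def states():
    for vals in product((False, True), repeat=len(VARS)):
        yield dict(zip(VARS, vals))

# Transition t:  guard x  -->  y := x   (z is untouched)
guard = lambda s: s["x"]
action = lambda s: {**s, "y": s["x"]}

def write_set(g, a):
    # variables changed by the action in some state where the guard holds
    return {v for s in states() if g(s) for v in VARS if a(s)[v] != s[v]}

def read_set(g, a):
    # variables that can affect the guard, or the values written to W(t)
    W = write_set(g, a)
    R = set()
    for s in states():
        for v in VARS:
            for d in (False, True):
                s2 = {**s, v: d}
                if g(s) != g(s2) or any(a(s)[w] != a(s2)[w] for w in W):
                    R.add(v)
    return R
```

For this transition the enumeration yields W(t) = {y} and R(t) = {x}: z neither affects the guard nor the value written to y.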

Executions

We consider two notions of execution for models. An interleaving execution of a model is based on steps in which a single enabled transition fires. For a transition t = (g → a), we write s →t s′ to denote that g(s) holds and s′ = a(s). An interleaving execution of (V, I, T, A) is a sequence of states s0, s1, . . ., such that s0 ∈ I and for all i ≥ 0, there is a transition ti ∈ T such that si →ti si+1.

We now introduce the concurrent executions of a model. First we define the concurrent firing of a set of transitions. If t1 = (g1 → a1) and t2 = (g2 → a2) are two transitions with W(t1) ∩ W(t2) = ∅, we write s →{t1,t2} s′ to denote that g1(s) and g2(s) hold, and s′ = (a1(s) | W(t1)) ∪ (a2(s) | W(t2)) ∪ (s | V \ W(t1) \ W(t2)). In the preceding, we form the function s′ by taking the union of restricted functions. We define the concurrent firing of a set E consisting of more than two transitions similarly, and write it as s →E s′.

The idea of a concurrent execution of a model is that all of the enabled transitions fire at each step. However, we have to deal with the problem of write conflicts between different transitions: if two transitions ti, tj are both enabled at s and they both write a variable v, then the value of v at the next state is not well defined. For a state s such that two transitions t1, t2 are enabled and W(t1) ∩ W(t2) ≠ ∅, we say there is a write-write conflict at s. When a write-write conflict occurs at s, for ease of exposition, we define the execution to simply stay at s (we could also define this as an error). Thus, a concurrent execution of a model (V, I, T, A) is a sequence of states s0, s1, . . ., such that s0 ∈ I and for all i ≥ 0, si →Ei si+1. Here, Ei is the (maximal) set of all transitions enabled at state si if there is no write-write conflict at si; otherwise, si+1 = si.

Now we define labelled executions. Given an execution Se = s0, s1, . . . of a model, a labelled execution of Se is a sequence s0 →E0 s1 →E1 . . .. If Se is an interleaving execution, then for each i ≥ 0, Ei is a transition with si →Ei si+1. Otherwise, if Se is a concurrent execution, Ei is a set of transitions (defined, as above, to be a maximal set) such that si →Ei si+1, or si+1 = si. In executions we may write t for the singleton set of transitions {t} if the context makes the intended usage clear.

Annotations

A transition may be annotated by associating formulas for pre- and post-conditions with the transition. We write g → {P}a{Q} to denote an annotated transition g → a with precondition P and postcondition Q. The formulas P, Q are formulas over the variables of the model. We can omit {P} or {Q} when it is equivalent to {true}. For an execution S = s0 →E0 s1 →E1 . . ., we say that S satisfies the annotation g → {P}a{Q} if, whenever si →Ei si+1 and Ei contains (if S is a concurrent execution) or is (if S is an interleaving execution) the transition g → a, then P(si) and Q(si+1) both hold. The intuition is that if a transition fires in an execution, then both the precondition and the postcondition must hold. An annotated model is one which has one or more annotated transitions.

Let M = (V, I, T, A) be a model and p be a formula. For an execution (either

interleaving or concurrent) Se = s0, s1, . . . of M, we say Se satisfies p if p(si) holds for every i ≥ 0. If M is a model in which only interleaving executions are considered, we say M satisfies p if every interleaving execution of the model satisfies p. We denote this by M |= p. Similarly, we define M |= p for a model M where only concurrent executions are considered. For an annotated transition t of M, we also write M |= t to denote that every execution of M satisfies t. Finally, if M satisfies its set of annotated transitions and all the assertions of A at every reachable state, then |= M holds.
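For finite models, deciding M |= p over interleaving executions reduces to checking p at every reachable state, which the following sketch does by breadth-first search. The toy "protocol" and all names are hypothetical:

```python
# Decide M |= p for interleaving executions by exploring reachable states.
from collections import deque

def models(initial_states, transitions, p):
    """Return True iff p holds at every reachable state of the model."""
    seen, frontier = set(), deque(initial_states)
    while frontier:
        s = frontier.popleft()
        key = tuple(sorted(s.items()))
        if key in seen:
            continue
        seen.add(key)
        if not p(s):
            return False                    # some reachable state violates p
        for g, a in transitions:
            if g(s):
                frontier.append(a(s))       # one enabled transition per step
    return True

# Toy protocol: two flags that are never set simultaneously.
ts = [
    (lambda s: not s["a"] and not s["b"], lambda s: {**s, "a": True}),
    (lambda s: not s["a"] and not s["b"], lambda s: {**s, "b": True}),
    (lambda s: s["a"],                    lambda s: {**s, "a": False}),
    (lambda s: s["b"],                    lambda s: {**s, "b": False}),
]
init = [{"a": False, "b": False}]
mutual_exclusion = models(init, ts, lambda s: not (s["a"] and s["b"]))
```

This is of course the brute-force view; the chapter's point is precisely that realistic RTL models are far too large for such monolithic exploration.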

Transactions

We are interested in showing the correspondence between implementation and specification models, where the implementation may take many steps to accomplish the work of one step in the specification. To help define the correspondence between an implementation and a specification, we introduce the notion of a transaction.

The purpose of a transaction is to collect the implementation transitions that correspond to a single transition of the specification. Each transaction has a unique state variable called the alive variable. In an initial state of the model, the alive variables of all transactions are set to false. No transition in a transaction is enabled until a transition called the trigger of the transaction fires. If the transaction takes multiple steps to finish, the alive variable is set to true. Once the guard of the trigger transition is enabled, the other transitions in the transaction can fire, and they can continue to fire until a closing transition resets the alive variable to false. Such a closing transition is assumed to always exist, and it may be the same as the trigger.

We now describe transactions more precisely. A transaction is a set of transitions, formed as follows. Let t = (g → a) be a transition, and T be a possibly empty set of transitions. Then transaction[t; T] is the following set of transitions:

  {(g ∧ ¬alive) → (a; alive := MultiCycle)}
  ∪ ⋃(gi → ai)∈T {(gi ∧ (g ∨ alive)) → ai}

In the above formula, alive is the alive variable unique to this transaction, and t is the trigger transition. The first half of the formula says that the trigger transition fires when its guard g is true and the alive variable is false. Firing the trigger transition updates the state by the action a. The function MultiCycle is true in a state if the transaction will take more than one cycle; every transaction has a corresponding MultiCycle function. The alive variable is thus set to true exactly when the transaction takes multiple steps.

A transition in T fires when its guard gi is true and either of the following conditions holds:

1. The alive variable is false and the trigger guard is true.

2. The alive variable is true.

Case 1 occurs when the transaction becomes alive, i.e. transitions in T are permitted to fire on the same step as the trigger transition. Case 2 occurs when the alive variable is true, i.e. transitions in T can fire after the trigger transition, up to and including the step of the closing transition. For any transition t = (gt → at) in a transaction, we use enable(t) to represent the condition under which t is enabled. According to the above definitions, enable(t) = gt ∧ ¬alive if t is the trigger transition; otherwise, enable(t) = gt ∧ (g ∨ alive), where g is the guard of the trigger transition.

As said earlier, transactions provide a way of organizing the transitions that realize the operations performed by the hardware into mutually exclusive sets of transitions. Note, however, that the execution semantics are defined in terms of the concurrent

executions generated by ∪iUi where Ui is a transaction. This means that transitions from different transactions can fire concurrently. In the sequel, we study read-write relations between concurrent transactions in a way that is analogous to concurrency and variable access in threaded programs. We define the 64

read and write sets of a transaction Ui to be R(Ui) = ∪t∈Ui R(t), W (Ui) = ∪t∈Ui W (t), respectively.
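The transaction construction above can be made concrete with a small sketch. The following Python illustration (our own; the dissertation's models are not written in Python, and the names make_transaction, alive_T, and multi_cycle are invented here) builds the transition set transaction[t; T] from a trigger and a set of other transitions, with states represented as dictionaries:

```python
# Sketch of transaction[t; T]: a trigger (g, a) plus lifted transitions
# (gi, ai) that may fire with the trigger or while the alive variable holds.

def make_transaction(trigger, others, alive_var, multi_cycle):
    g, a = trigger

    def trig_guard(s):
        # trigger fires when g holds and the transaction is not alive
        return g(s) and not s[alive_var]

    def trig_action(s):
        s2 = a(s)
        s2[alive_var] = multi_cycle(s2)   # stay alive only if multi-cycle
        return s2

    def lift(gi, ai):
        # other transitions fire with the trigger or while alive
        return (lambda s: gi(s) and (g(s) or s[alive_var]), ai)

    return [(trig_guard, trig_action)] + [lift(gi, ai) for gi, ai in others]

# Example: a 2-step transaction that copies x into y via a buffer.  The
# second transition is the closing transition: it resets alive_T to false.
trans = make_transaction(
    trigger=(lambda s: s["req"],
             lambda s: {**s, "buf": s["x"], "req": False}),
    others=[(lambda s: True,
             lambda s: {**s, "y": s["buf"], "alive_T": False})],
    alive_var="alive_T",
    multi_cycle=lambda s: True)

state = {"req": True, "x": 7, "buf": 0, "y": 0, "alive_T": False}
for guard, action in trans:        # fire each enabled transition in order
    if guard(state):
        state = action(state)
```

Running the loop twice through the transition list fires the trigger (buffering x and setting alive_T) and then the closing transition (writing y and clearing alive_T), mirroring the two-step behavior described in the text.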

Implementation Model

We define an implementation model, or low-level model, to be a model ML = (VL, IL, TL, AL), where TL is the union of a set of transactions, TL = ⋃i Ui, and the transactions Ui are pairwise disjoint sets of transitions. This is the structure that we will assume for implementations. We say that a variable v is active in a state s of an implementation model if there is a transaction U with v ∈ W(U) and the alive variable for U is true in s. For a state s, let active(s) be the set of all variables active in s, and inactive(s) be the set of model variables not in active(s).

4.2 Checking Refinement

We consider three kinds of variables in the implementation model and discuss how each kind corresponds to the specification. An interface variable must match the specification model "at all times". A transactional variable must match the specification model except when the variable is active. The specification for a transactional variable is comparable to a resource invariant in a shared-variable program with critical sections [91]: in both cases, the variable has a specified value whenever it is not being updated. Finally, the implementation can have additional variables that are not present in the specification and are not constrained.

Let MH = (VH, IH, TH, AH) be a model and ML = (VL, IL, TL, AL) be an implementation model over variables VH, VL respectively, where VH ⊆ VL. Let VI ⊆ VH be the set of interface variables and VT ⊆ VH be the set of transactional variables. We say that ML implements MH with interface variables VI and transactional variables VT if for every concurrent execution l0, l1, . . . of ML, there exists an interleaving execution h0, h1, . . . of MH and an increasing sequence of natural numbers n0 < n1 < . . . such that for all i ≥ 0,

(li | VI = hni | VI) ∧ (li | (VT ∩ inactive(li)) = hni | (VT ∩ inactive(li))).

In this definition, step i of the implementation is matched by the specification at step ni, and the variables in VI, plus those in VT that are inactive, must have the same value in both models. If ML implements MH, then whenever ML reaches a cleanly halted state with no transactions in progress, all of the variables in VI and VT must be in a state reachable in the specification MH.

This definition can be augmented with additional invariant assertions to express stronger properties. For example, one invariant could be that ni is the number of transactions in ML that have completed at li. Another could be that the write sets of all the transitions fired between hni and hni+1 are disjoint.
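For finite traces, the definition above can be checked directly once a candidate matching sequence n0 < n1 < . . . is supplied. The following Python sketch (our own illustration; the name check_refinement and the trace encoding are not from the dissertation) tests the two conjuncts of the definition over dictionary-valued states:

```python
# Check "ML implements MH" for one pair of finite traces and a candidate
# matching: interface variables must match everywhere, transactional
# variables only where they are inactive.

def check_refinement(low_trace, high_trace, match, V_I, V_T, inactive):
    if any(a >= b for a, b in zip(match, match[1:])):
        return False                  # n0 < n1 < ... must be increasing
    for i, l in enumerate(low_trace):
        h = high_trace[match[i]]
        if any(l[v] != h[v] for v in V_I):
            return False              # interface variables match at all times
        if any(l[v] != h[v] for v in V_T & inactive(l)):
            return False              # transactional vars match when inactive
    return True

# x is an interface variable, y is transactional; y is active (i.e. being
# updated mid-transaction) at low step 1, so only x is compared there.
low  = [{"x": 0, "y": 0,  "alive": False},
        {"x": 1, "y": 99, "alive": True},    # y holds an intermediate value
        {"x": 1, "y": 5,  "alive": False}]
high = [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 1, "y": 5}]
ok = check_refinement(low, high, [0, 1, 2], V_I={"x"}, V_T={"y"},
                      inactive=lambda s: set() if s["alive"] else {"x", "y"})
```

Here the intermediate value y = 99 is tolerated precisely because y is active at that step, which is the point of the transactional-variable relaxation.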

Proposition 1. Suppose MH is a model and ML is an implementation model such that ML implements MH with VI and VT. If MH |= P for an invariance property P, then ML |= inactiveP ⇒ P, where inactiveP is a formula asserting that all the transactional variables in vars(P) are inactive. The formula inactiveP can be expressed using the alive variables of transactions.

The above proposition holds because when a transactional variable in vars(P) is inactive, it must match the corresponding variable in the specification model. Also, from the definition, every interface variable must always match the specification model. So MH |= P implies ML |= inactiveP ⇒ P.

4.3 Monolithic Model Checking

The refinement of the specification model MH by the implementation model ML can be checked in a straightforward way by constructing a monolithic checking model, called MMC. MMC defines all the possible executions of the implementation, and for each execution, asserts that the states reached by the implementation can also be reached through specification transitions. In effect, MMC executes a cross-product construction. In Section 4.4, we describe a compositional checking method that allows much of the checking to be performed on smaller models.

Let MH = (VH, IH, TH, AH) be a specification model and ML = (VL, IL, TL, AL) be an implementation model, where VH ⊆ VL. Let VH' be a fresh set of variables, with one new variable corresponding to each variable in VH. For a set of variables V, V' is the result of replacing all variables from VH in V with the corresponding primed variables. That is, for V ⊆ VL, V' = {v' | v ∈ V ∩ VH} ∪ (V \ VH). For V ⊆ VL \ VH, V' = V. In addition, priming of an expression is defined as priming of the free variables of the expression.

We define the monolithic model as MMC = (VL ∪ VH', IMC, TMC, AMC).

The variables VL will be updated according to the implementation model ML, while the variables VH' will be updated according to the specification model MH.

We will check that for each initial state of the implementation, there exists an initial state of the specification that matches on the variables in VH, that is, (IL | VH) ⊆ IH. We then define the initial states of MMC to be IMC = {sL ∪ (sL | VH)' : sL ∈ IL}. Thus, for each initial state sL of ML, we start MMC in the state consisting of sL and a primed copy of the state sL over the variables in VH.

Our method is based on showing that for each transaction Ui, there is a corresponding transition ui in the specification such that Ui implements ui. We call ui the specification transition of Ui, written spec(Ui). We assume the specification always contains a stuttering transition of the form true → id, where for all states s, id(s) = s. For a transaction Ui that only writes to variables in VL \ VH, we assign the stuttering transition as its specification transition.

We check refinement by finding, for each transaction Ui, a transition ci in Ui that acts as a commit transition, in the sense that whenever ci fires, the specification transition ui is enabled on the primed variables. This allows the model MMC to fire the specification transition ui on the primed variables whenever ci fires. We will describe our heuristics for finding the commit transitions in Section 4.6.5.

The transitions of MMC are defined as follows. For each transaction Ui, let the commit transition ci be cgi → cai, and let spec(Ui) be sgi → sai. To build the cross-product model, we replace ci with a transition that executes ci on the unprimed variables while executing spec(Ui) on the primed variables. The transition

cgi → {sgi'} sai'; cai

has the same guard as the commit transition, and uses a precondition annotation to check that the guard of the specification transition is enabled on the primed variables. When this transition fires, it executes the action sai on the primed variables and executes cai on the unprimed variables. For each transaction Ui of MMC, we call the precondition sgi' in the above formula the specification enableness condition of Ui. Using the above transition, we can define the transitions of MMC to be

⋃i ((Ui \ {ci}) ∪ {cgi → {sgi'} sai'; cai}).

To check the implementation relation, we define assertions in MMC. We check the interface variables by defining the invariant assertions of MMC to be AMC = {v = v' | v ∈ VI}. These assertions check that the implementation always matches the specification on interface variables.

For transactional variables, we check the implementation relation as follows. Consider a transaction Ui and a transactional variable v. If v is written by either Ui or the specification transition spec(Ui), then we assert that v = v' holds at the end of the transaction. We implement these checks by adding a postcondition annotation of the form

{¬alivei ⇒ varsi = varsi'}

to each transition of transaction Ui. In this formula, alivei is the alive variable of Ui, and varsi is the vector of all variables written by either a transition of Ui or by spec(Ui). Attaching this postcondition to each transition t of Ui has the following meaning: if the transaction Ui completes on a given step of the execution of MMC, then alivei will be false in the following state, and the postcondition asserts that varsi must equal varsi' in this state. These assertions check the necessary conditions for the transactional variables.
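The fused commit transition can be sketched as follows: the low-level commit action runs on the unprimed copy of the state while the specification transition runs on the primed copy, with the specification enableness condition checked as a precondition. This is a Python illustration under our own naming (fuse_commit, with "low"/"high" standing for the unprimed/primed copies), not the dissertation's implementation:

```python
# Build the replacement for a commit transition ci in MMC: same guard as
# ci, assert the spec guard on the primed copy, then run both actions.

def fuse_commit(commit, spec):
    cg, ca = commit          # commit transition of the transaction
    sg, sa = spec            # spec(Ui): guard and action on the primed copy

    def guard(state):
        return cg(state["low"])

    def action(state):
        # specification enableness condition (precondition annotation)
        assert sg(state["high"]), "spec enableness condition violated"
        return {"low": ca(state["low"]), "high": sa(state["high"])}

    return guard, action

# Example: the commit writes y := buf and closes the transaction, while
# the specification transition does y := x in a single step.
g, a = fuse_commit(
    commit=(lambda l: l["alive"],
            lambda l: {**l, "y": l["buf"], "alive": False}),
    spec=(lambda h: True,
          lambda h: {**h, "y": h["x"]}))

s = {"low":  {"x": 3, "buf": 3, "y": 0, "alive": True},
     "high": {"x": 3, "y": 0}}
if g(s):
    s = a(s)
```

After the fused transition fires, the unprimed and primed copies agree on y, which is exactly what the postcondition annotation on the transaction checks.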

4.3.1 Checking Correct Implementation

Let Smc = mc0, mc1, . . . be an execution of MMC. If all the specification enableness conditions are satisfied in this execution, one or more specification transitions can fire at each mci. It follows that mc0 | VH', mc1 | VH', . . . is a concurrent execution of the specification. For i ≥ 0, let hi = mci | VH'. Also, let Sh be the labelled concurrent execution h0 --E0--> h1 --E1--> . . . , where for i ≥ 0, Ei = {ui,1, . . . , ui,ni} is the set of specification transitions that fire simultaneously at state hi in the execution Sh.

We would like to know when the concurrent execution of the specification transitions in Sh can be converted into an interleaving execution. First, there cannot be a write-write conflict in Ei, for any i. Second, for each i, the concurrent firing of the transitions in Ei must be equivalent to some sequential firing order. This is because we define specification models to follow the interleaving semantics of executions.

For any two transitions t1, t2, we say t1 must precede t2, written t1 ≺ t2, if R(t1) ∩ W(t2) ≠ ∅. For each set Ei, define ≺i to be the transitive closure of the relation ≺ on the transitions in Ei. If the relation ≺i is irreflexive, i.e., no transition in Ei must precede itself, then there is a sequential firing of the transitions in Ei that updates the state from hi to hi+1. If the relation ≺i is irreflexive for all i, we say that serializability holds for Sh. Finally, if serializability holds for every such Sh in MMC, we say that serializability holds for MMC.

The following theorem states the necessary conditions for checking the implementation relation. Assuming that all Ei are write-write conflict free and Sh is serializable, there exists a permutation πi over each set Ei such that πi(Ei) describes an interleaving execution from hi to hi+1. More specifically, let πi(Ei) = wi,1, . . . , wi,ni. Then it is clear that

h0 = h0,1 --w0,1--> h0,2 --w0,2--> . . . --w0,n0--> h0,n0+1 = h1
h1 = h1,1 --w1,1--> h1,2 --w1,2--> . . . --w1,n1--> h1,n1+1 = h2
. . .
hi = hi,1 --wi,1--> hi,2 --wi,2--> . . . --wi,ni--> hi,ni+1 = hi+1
. . .

forms an interleaving execution of the specification that goes through the states h0, h1, . . . , hi, . . . .
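The precedence relation and the irreflexivity test can be coded directly from the read and write sets. The following Python sketch (our own illustration, using a naive quadratic transitive-closure loop) decides whether a simultaneously fired set of transitions is serializable:

```python
# Serializability test for one simultaneously fired set Ei: t1 must
# precede t2 when R(t1) ∩ W(t2) ≠ ∅; the set is serializable iff the
# transitive closure of this relation is irreflexive (no cycles).

from itertools import product

def serializable(names, R, W):
    prec = {(a, b) for a, b in product(names, names)
            if a != b and R[a] & W[b]}          # a must precede b
    closure = set(prec)
    changed = True
    while changed:                              # transitive closure
        changed = False
        for (a, b), (c, d) in product(list(closure), list(closure)):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return all((a, a) not in closure for a in names)

# t1 reads x and writes y; t2 reads y and writes x.  Each must precede
# the other, so the closure is reflexive and the pair is not serializable.
bad = serializable(["t1", "t2"],
                   R={"t1": {"x"}, "t2": {"y"}},
                   W={"t1": {"y"}, "t2": {"x"}})
# Changing t2 to write z breaks the cycle: t2 simply fires before t1.
good = serializable(["t1", "t2"],
                    R={"t1": {"x"}, "t2": {"y"}},
                    W={"t1": {"y"}, "t2": {"z"}})
```

When the test succeeds, any topological order of the precedence relation gives a valid permutation πi, i.e., a sequential firing order from hi to hi+1.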

Theorem 1. Suppose MMC is the monolithic model for implementation ML and specification MH. If (i) (IL | VH) ⊆ IH, (ii) no write-write conflicts exist at any state of MMC, (iii) serializability holds for MMC, (iv) W(spec(Ui)) ⊆ W(Ui) holds for each transaction Ui in ML, and (v) |= MMC, then ML implements MH with interface variables VI and transactional variables VT.

Proof: We will use induction. For every concurrent execution SL = l0, . . . , lm (m ≥ 0) of ML, it follows from the definition of MMC that there exists a concurrent execution mc0, . . . , mcm of MMC such that mci | VL = li, 0 ≤ i ≤ m. For i ≥ 0, let hi' denote mci | VH'; then h0', . . . , hm' is a concurrent execution of the specification in MMC. Let LH denote the labelled form of this concurrent execution:

h0' --E0--> . . . --Em-1--> hm'

For every 0 ≤ i ≤ m-1, let {ui,1, . . . , ui,qi} be all the transitions in Ei. Because of Parts (ii) and (iii) of Theorem 1, there exists an interleaving execution

SH = h0' --t0--> . . . --tk-1--> hk, where hk = hm' and k = Σi∈[0..m-1] qi.

In addition, t0, . . . , tq0-1 is a permutation over {u0,1, . . . , u0,q0}. For 0 < i < m, let xi = Σk∈[0..i-1] qk and yi = Σk∈[0..i] qk − 1. Then txi is the first transition belonging to Ei in SH, tyi is the last transition belonging to Ei, and txi, . . . , tyi is a permutation over Ei. Also, the ni's in the definition of "ML implements MH" are n0 = 0 and ni = xi for i > 0. Figure 4.1 shows these executions.

[Figure 4.1 here: four aligned executions showing their correspondence: the interleaving execution of MH, the concurrent execution of MH, the concurrent execution of MMC, and the concurrent execution of ML.]

Figure 4.1. The executions of MMC, ML, and MH

We now apply induction on m to prove that ML implements MH. Because for every v ∈ VI, (v = v') ∈ AMC and |= MMC holds, v = v' holds in every state of MMC. So we only consider VT in the following.

Base case, m = 0: Consider l0 ∈ IL. According to the definition of IMC, there exists an initial state mc0 of MMC such that l0 = mc0 | VL. In l0, all alive variables are false, so all the variables in VL are inactive. Therefore we have l0 | (VH ∩ inactive(l0)) = l0 | VH. Let lh0 denote l0 | VH, and recall that h0' = mc0 | VH'. The definition of IMC ensures that lh0 = h0'. Hence the conclusion holds for m = 0.

Induction Hypothesis: Suppose the conclusion holds for 0 ≤ m ≤ p, p ≥ 0.

Induction Step, m = p+1: Let ivarsi = VH ∩ inactive(li), 0 ≤ i ≤ p+1. Every variable v ∈ ivarsp+1 falls into one of the following three cases.

1. v ∈ ivarsp, and v is not in the write set of any transition fired between lp and lp+1. In this case, lp(v) = lp+1(v). By the induction hypothesis, lp(v) = hp'(v'). Also, since W(spec(Ui)) ⊆ W(Ui) holds for every transaction Ui, the value of v' stays unmodified from hp' to hp+1', i.e., hp'(v') = hp+1'(v'). So lp+1(v) = hp+1'(v') holds.

2. v ∈ ivarsp, but v is in the write set of some transaction Ui which starts at lp and finishes at lp+1. In other words, v is inactive at both lp and lp+1, but its value may have changed because of Ui. In this case, the annotation (varsi = varsi') is checked and proved in |= MMC. Because v ∈ varsi, it follows that lp+1(v) = hp+1'(v') holds.

3. v ∉ ivarsp. In this case, there must exist a transaction Ui such that v ∈ W(Ui) ∩ varsi, lp(alivei) = true and lp+1(alivei) = false. Again, the annotation (varsi = varsi'), which holds because |= MMC, ensures that lp+1(v) = hp+1'(v').

In the above three cases, the absence of write-write conflicts and serializability ensure that there is a permutation of Ep that establishes the simulation. □

4.4 Compositional Model Checking

In this section, we propose techniques that allow one to reason efficiently about a model by checking properties of a set of smaller, more abstract models. There are two basic ideas: abstraction and assume guarantee reasoning. Abstraction is a conservative approach to decomposing the original monolithic model: we construct a set of abstract models from the monolithic model, and by verifying the abstract models, the original model is guaranteed to be correct with respect to its properties. Assume guarantee reasoning, used together with abstraction, can further reduce the verification complexity of the abstract models.

Consider the problem of making an abstract model that conserves the correctness of the annotations on a transition t in a model M. We would like to remove as many transitions from the model as possible, while maintaining the property that if the annotations hold in the abstract model, they hold in the original model. To form an abstraction, we analyze the sources that supply values for variables that are read by a transition. To make a model more abstract, we can convert some sources into free input variables and remove certain transitions from the model. We will develop a condition that ensures that input sources for a transition in an abstract model are at least as general as the sources in the original model.

One distinctive feature of our approach is that we allow abstractions in which a variable is abstracted to an input variable in some transitions, but the original variable is retained in other transitions. This allows us to control the level of detail contained in an abstract model, and we use this control to reason compositionally about transactions.

To form abstract models, we introduce input variables, and models with input variables. A model with input variables has the form M = (V, Ṽ, I, T, A), where V is the set of state variables and Ṽ is the set of input variables.
For each variable v in V, there are two corresponding input variables, ṽ and ṽn. The variable ṽ is used to provide an unconstrained input on the current state. The variable ṽn is used in annotated transitions, to provide an unconstrained input for postcondition annotations, which are evaluated in the "next" state of the model.

Semantically, a model with input variables is similar to a model without inputs. The states of the model map V ∪ Ṽ to values. Transitions can read but not write the input variables. The action function of a transition defines the next state of the ordinary state variables in V only. The next-state relation for input variables is unconstrained: an input variable can take on any value in the next state of a computation. Without loss of generality, we can assume that all models have input variables, since a model that does not mention input variables can be represented as a model with an empty set of input variables.

4.4.1 Abstraction

Let t = (g → a) be a transition of a model M. We define a read variant of t as the result of replacing the reads of zero or more variables in R(t) with their corresponding input variables. More formally, let Vt = {v1, . . . , vn} ⊆ V, n ≥ 0. We define a substitution function σ that replaces each read of v ∈ Vt in t with the input variable ṽ:

σ(t, Vt) = (σ(g, Vt) → σ(a, Vt)), where

σ(g, Vt) = λs. g(s[v1 ← s(ṽ1)] . . . [vn ← s(ṽn)])
σ(a, Vt) = λs. a(s[v1 ← s(ṽ1)] . . . [vn ← s(ṽn)])

We write t̃ for a read variant of t. A transition t is a read variant of itself, in the case that no variables are renamed. If T is a set of transitions, then we define variants(T) to be the set of all read variants of transitions in T.

We now discuss our approach to abstraction in more detail. Let M = (V, Ṽ, I, T, A) be a model. We are interested in models built from a subset of variants(T) that contains no more than one read variant of any transition in T; let us call such a set consistent. Intuitively, a model defined by a consistent set of read variants of T can execute the behavior of M on a subset of the variables, without introducing two transitions that generate write-write conflicts at any reachable state.

Now we define a syntactic condition that says when a set of read variants can exhibit all behaviors of the original model over a subset of the state variables. Let T̃ = {t̃ | t ∈ T} be a set of read variants of transitions in T. Consider t ∈ T, its read variant t̃, and a variable v that is read by t̃, v ∈ R(t̃). We say that the read of v in t̃ is sufficiently general with respect to M if one of the following holds:

1. v is an input variable, or

2. T̃ contains a read variant of every transition in T that writes to v, or

3. There is a transition s ∈ T that writes v, s̃ ∈ T̃, and in every concurrent execution of M, whenever t fires, s fires on the previous step.

We say that T̃ is sufficiently general with respect to M if every read of every variable in T̃ is sufficiently general with respect to M. In the above, we give three options under which sufficient generality holds; there could be other options as well.

Our abstract models overapproximate an original model by reading the values of a combination of state and input variables. In order to compare executions of abstract models to the original models, we define the effective input state of a model with input variables at a given state. When a transition reads the value of an input variable ṽ, the effective input state assigns the same value to the original variable v. Let M = (V, Ṽ, I, T, A) be a model and s be a state of M. The effective input state of a transition t at a state s, e(t, s), is defined for v ∈ R(t):

1. If t reads the state variable v at s, then e(t, s)(v) = s(v).

2. If t reads the input variable ṽ at s, then e(t, s)(v) = s(ṽ).

The effective read states of a transition t in a model M are defined by E(t, M) = {e(t, s) | s is a reachable state of M and t is enabled at s}.
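A minimal Python illustration of the effective input state: reads that go through an input variable are recorded against the original variable name, so abstract and concrete reads become comparable. The helper name effective_input_state and the "v~" naming convention are our own:

```python
# e(t, s): for each variable v read by t, record the value actually read,
# whether it came from the state variable v or from the input variable v~.

def effective_input_state(reads, state):
    """`reads` maps each read variable v to the name actually read
    (either v itself, or its input variable written here as 'v~')."""
    return {v: state[src] for v, src in reads.items()}

# Transition t reads x directly but reads y through the input variable y~:
s = {"x": 1, "y": 2, "y~": 9}
e = effective_input_state({"x": "x", "y": "y~"}, s)
```

In this example e assigns y the unconstrained input value 9 rather than the state value 2, which is exactly how an abstract model can overapproximate the reads of the original model.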

The following proposition describes how a model M' defined by a sufficiently general subset of read variants of transitions of a model M can overapproximate the behavior of M.

Proposition 2. Let M = (V, Ṽ, I, T, A) and let T' be a subset of T. Let M' = (V, Ṽ, I', T̃, A') be a model where I ⊆ I' and T̃ contains a read variant of each transition in T'. If T̃ is sufficiently general with respect to M, then for all transitions t ∈ T', E(t, M) ⊆ E(t̃, M'). This proposition partly means that if t is an annotated transition with precondition P, then the read variant of P holding in M' implies that P must hold in M.

Proof: Let SM = s0 --E0--> s1 --E1--> . . . be a concurrent execution of M with s0 ∈ I. Based on SM, we will construct a concurrent execution for M'. Let n0, n1, . . . be the sequence of numbers where n0 is the smallest number j such that there exists t in Ej with a read variant of t in T̃, and for every i > 0, ni is the smallest number j such that j > ni-1 and there exists t in Ej with t̃ in T̃. For every Eni of SM, i ≥ 0, we define Ei' to be the set of all transitions t̃ of M' such that t is in Eni.

Let s0' = s0, and for i > 0, define si' to be the state such that si-1' --E'i-1--> si'. We will prove that SM' = s0' --E0'--> s1' --E1'--> . . . is a concurrent execution of M'. Moreover, for every transition t in Eni, i ≥ 0, such that t̃ is in Ei', e(t, sni)(v) = e(t̃, si')(v) holds for every variable v that is read by t. We will apply induction on the ni of SM to prove the proposition.

Base Case, i = 0: We consider every transition t in En0 where a read variant of t is in M', and every variable v that is read by t. Case analysis:

1. If the value of v that t reads at sn0 is from s0, then whether t̃ reads v or the input variable ṽ, e(t, sn0)(v) = e(t̃, s0')(v) can hold. For the case when t̃ reads ṽ, the formula holds when ṽ takes on the value s0(v).

2. If the value of v that t reads at sn0 is from an update of a transition x in Ej, 0 ≤ j < n0, then according to the definition of n0, no read variant of x is in T̃. From the definition of "sufficiently general", t̃ must read the input variable ṽ. Let ṽ take on the value sn0(v). Then sn0(v) = sj+1(v), and e(t, sn0)(v) = e(t̃, s0')(v) holds.

3. If t reads the input variable ṽ at sn0, let t̃ take on the same value as t takes at sn0. Then e(t, sn0)(v) = e(t̃, s0')(v) must hold.

Induction Hypothesis: Suppose for 0 ≤ i ≤ p, p ≥ 0, s0' --E0'--> . . . sp' --Ep'--> sp+1' is a concurrent execution of M'. Also, for every transition t in Eni with t̃ in Ei', e(t, sni)(v) = e(t̃, si')(v) holds for every variable v that is read by t.

Induction Step, i = p + 1: We consider every transition t in Enp+1 where a read variant of t is in T̃, and every variable v that is read by t. Case analysis:

1. If the value of v that t reads at snp+1 is from s0, then v is not in the write set of any transition in E0, . . . , Enp+1-1. Thus, sp+1'(v) = s0(v). Whether t̃ reads v or the input variable ṽ at sp+1', e(t, snp+1)(v) = e(t̃, sp+1')(v) can hold. When t̃ reads ṽ, the formula holds when ṽ takes on the value s0(v).

2. If the value of v that t reads at snp+1 is from an update of a transition x in Enj, 0 ≤ j ≤ p, and x̃ is in Ej', then snj+1(v) = snp+1(v). By induction, it follows that for every variable u that is read by x, e(x, snj)(u) = e(x̃, sj')(u) holds. So snj+1(v) = sj+1'(v) must hold. Whether t̃ reads v or ṽ, e(t, snp+1)(v) = e(t̃, sp+1')(v) can hold. For the case when t̃ reads ṽ, the formula holds when ṽ takes on the value snj+1(v).

3. If the value of v that t reads at snp+1 is from an update of a transition y in Ek, 0 ≤ k < np+1, and no read variant of y is in T̃, then according to the definition of "sufficiently general", t̃ must read ṽ. Also, we have sk+1(v) = snp+1(v). Let ṽ take on the value sk+1(v). Then e(t, snp+1)(v) = e(t̃, sp+1')(v) holds.

4. If t reads ṽ at snp+1, let t̃ take on the same value of ṽ as t takes. Then e(t, snp+1)(v) = e(t̃, sp+1')(v) must hold.

In all four cases, t̃ can be enabled at sp+1', and e(t, snp+1)(v) = e(t̃, sp+1')(v) holds for every variable v that t reads. So the induction holds up to i = p + 1.

The above proof indicates that for every concurrent execution SM = s0 --E0--> s1 --E1--> . . . of M, there exists a concurrent execution SM' = s0' --E0'--> s1' --E1'--> . . . of M', such that for every transition t in Ej, j ≥ 0, with a read variant of t in M' (here j is an ni in our proof), e(t, sj)(v) = e(t̃, si')(v) holds for every variable v that is read by t. Therefore E(t, M) ⊆ E(t̃, M') holds. □

The notions of read variant and sufficiently general abstraction extend readily to models with annotations. A read variant of an annotated transition g → {P} a {Q} is formed by consistently replacing read variables v with the corresponding same-state input variables ṽ in g, P, and a, and replacing variables v in Q with the corresponding next-state variables ṽn.

Consider an annotated model M and a set T̃ of read variants of transitions in M. A variable instance v in the precondition of an annotated transition is sufficiently general with respect to M under the same conditions that apply for a read instance in a guard or action. A variable instance v in the postcondition of a transition t̃ is sufficiently general if one of the following holds:

1. v is a next-state input variable.

2. T̃ contains a read variant of every transition of M that writes to v.

3. t̃ writes to v.

4. There is a transition s of M that writes to v, s̃ ∈ T̃, and in every concurrent execution of M, if t fires then s fires on the same step.

As before, we say that T̃ is sufficiently general with respect to M if every read of every variable in T̃ is sufficiently general with respect to M.

Proposition 3. Let M = (V, Ṽ, I, T, A) and M' = (V, Ṽ, I', T̃, A') be models where I ⊆ I', and T̃ is a subset of variants(T) that is sufficiently general with respect to M. Let t be an annotated transition in T and t̃ be a read variant of t in T̃. Then M' |= t̃ implies M |= t. This proposition partly means that if t is an annotated transition with precondition P and postcondition Q, then t̃ holding in M' implies that P and Q must hold in M.

Proof: Let t = g → {P} a {Q} be an annotated transition in T where a read variant of t is in T̃. For the precondition P, in order for M |= t to hold, for any concurrent execution s0 --E0--> s1 --E1--> . . . of M, whenever si --Ei--> si+1 (i ≥ 0) and Ei contains t, P(si) must hold. From Proposition 2, we have E(t, M) ⊆ E(t̃, M'). So M' |= t̃ implies M |= (g → {P} a). In the following, we will only consider the postcondition Q.

For the postcondition Q of t, we will prove the proposition by contradiction. Suppose M' |= t̃ holds but M |= t does not. Then there exists a concurrent execution of M,

s0 --E0--> s1 --E1--> . . . , such that there exists a number m, m ≥ 0, where Em contains t but Q fails. We consider the smallest m which satisfies this condition. We now construct a concurrent execution of M' in the same way as in the proof of Proposition 2.

Let n0, n1, . . . be the sequence of numbers where n0 is the smallest number j such that there exists a transition x in Ej with a read variant of x in M'. Similarly, for every i > 0, let ni be the smallest number j such that j > ni-1 and there exists x in Ej with x̃ in M'. For every Eni, i ≥ 0, we define Ei' to be the set of all transitions x̃ of M' such that x is in Eni.

Let s0' = s0, and for i > 0, define si' to be the state such that si-1' --E'i-1--> si'. From the proof of Proposition 2, there exists a concurrent execution of M': s0' --E0'--> s1' --E1'--> . . . , and for every transition x in Eni with x̃ in M', i ≥ 0,

e(x, sni)(v) = e(x̃, si')(v)

holds for every variable v that is read by x. Now we consider t with the postcondition Q. Because t̃ is in M', the number m where Em contains t but Q fails must be some ni, i ≥ 0. Let m be np, p ≥ 0. For every variable v in the postcondition Q, we do a case analysis based on the definition of "sufficiently general" for reads of variables in postconditions. Our goal is to prove that for every such v, the value at which it is evaluated in Q of t in M is the same as in Q or its read variant of t̃ in M'.

1. v is a next-state input variable. Let t̃ take on the value snp+1(v) for ṽn.

2. T̃ contains a read variant of every transition of M that writes to v. From the induction in the proof of Proposition 2, it follows that sp+1'(v) = snp+1(v) holds.

3. t̃ writes to v. Because e(t, snp)(u) = e(t̃, sp')(u) holds for every variable u that is read by t, sp+1'(v) = snp+1(v) holds.

4. There is a transition y of M that writes to v, ỹ ∈ T̃, and in every concurrent execution of M, if t fires then y fires on the same step. Similarly, because e(y, snp)(u) = e(ỹ, sp')(u) holds for every variable u that is read by y, sp+1'(v) = snp+1(v) holds.

The above analysis indicates that for this execution of M', if Q or its read variant for t̃ holds at sp', then Q of t must also hold at sm. From M' |= t̃, it follows that Q of t holds at sm. Contradiction. Therefore, M |= t must hold. □

4.4.2 Assume Guarantee Reasoning

In the previous section, we proposed a set of sufficient conditions under which the annotations of a transition can be checked in an abstract model. Assume guarantee reasoning involving the input variables of an abstract model can reduce the size of the models that need to be checked below the size needed when abstraction is used alone.

If P(Ṽ) is a satisfiable formula over the input variables in Ṽ and Q(V) is an invariance property over the state variables V, then define M; P(Ṽ) |= Q(V) to be true if Q(V) holds on all reachable states of the model M, provided the input variables are constrained at each step to satisfy P(Ṽ).

In its simplest form, assume guarantee reasoning is a form of induction involving the input variables. Suppose M = (V, ∅, I, T, A) is a model, and M' = (V, Ṽ, I', T̃, A') is a model with input variables, where T̃ contains a read variant of every transition in T, and I ⊆ I'. Then the following scheme is valid. Note that any update of the variables V from the input variables Ṽ requires at least one time step for a transition to fire.

M'; P(Ṽ) |= P(V)
-----------------
M |= P(V)

Proof: Let SM = s0 --E0--> s1 --E1--> . . . be any concurrent execution of M. For every i ≥ 0, let Ei' be the set of transitions in M' which contains the read variant of every transition in Ei. Let s0' = s0, and for i > 0, define si' to be the state such that si-1' --E'i-1--> si'. We will prove by induction that si' = si can hold for every i ≥ 0.

Base Case, i = 0: The fact that M'; P(Ṽ) |= P(V) holds implies I' |= P(V). Because I ⊆ I', it follows that I |= P(V). So P(V) holds at s0.

Induction Hypothesis: Suppose for s0 --E0--> s1 . . . --Ep-1--> sp of SM, p ≥ 0, there exists a concurrent execution s0' --E0'--> s1' . . . --E'p-1--> sp' of M', such that si' = si, 0 ≤ i ≤ p, and Ei' includes the read variant of every transition of Ei. Also, P(V) holds at si, 0 ≤ i ≤ p, of SM; thus P(V) holds for every si', 0 ≤ i ≤ p. For every Ei', 0 ≤ i < p, if some transitions in Ei' read input variables, then the values they take on satisfy P(Ṽ).

Induction Step, i = p + 1: For every transition t in Ep, we consider every variable v that is read by t. If the read variant t̃ of M′ reads v, then by induction s′p(v) = sp(v) holds. Otherwise, t̃ reads input variables in Ṽ. Because P(Ṽ) is satisfiable, and by induction P(V) holds at sp, for every v where t̃ reads ṽ at s′p, let ṽ take on the value sp(v). By doing so, s′p+1 = sp+1 holds, and P(Ṽ) is satisfied because P(V) holds at sp. From M′; P(Ṽ) |= P(V), it follows that P(V) holds at s′p+1, and hence at sp+1. The induction holds up to p + 1.

Therefore, M |= P(V) holds. □

The actual form of assume guarantee reasoning that we use can be stated as follows. Suppose M = (V, ∅, I, T, ∅) is a model, and M1 = (V, Ṽ, I1, T1, ∅) and M2 = (V, Ṽ, I2, T2, ∅) are models with input variables, where T1 and T2 are subsets of variants(T) that are both sufficiently general with respect to M. We require that for each transition t in M, either M1 or M2 contains a read variant of t. Also suppose the initial states of M1 and M2 both contain the initial states of M. These conditions assure that M1 and M2 are abstractions of M that preserve the correctness of annotations on transitions in M.

We can use compositional assume guarantee reasoning as follows. Let P(X) be a formula over a set of state variables X ⊆ V. Suppose that every transition in M that writes to a variable in X is annotated with the postcondition P(X). Then we can use the following scheme, which checks that P(X) or a read variant is true in the initial state and whenever one of the variables in X is written. The scheme assumes P(X̃) is always true over the input variables, where X̃ is the result of replacing each variable in X with the corresponding input variable.

I1 |= P(X)   M1; P(X̃) |= T1   I2 |= P(X)   M2; P(X̃) |= T2
----------------------------------------------------------
M |= T

Proof: This proof is a composition of those of the previous three propositions. Let SM = s0 →E0 s1 →E1 . . . be any concurrent execution of M, s0 ∈ I. For M1, let n0, n1, . . . be a set of numbers which satisfy the following properties: (i) n0 is the smallest number j such that there exists a transition t ∈ Ej whose read variant is in T1; (ii) for each i > 0, ni is the smallest number j such that j > ni−1 and there exists t ∈ Ej with t̃ ∈ T1. For every Eni of SM, i ≥ 0, we define E1,i to be the set of all transitions t̃ of T1 where t ∈ Eni. Let s1,0 = s0, and for i > 0, we define s1,i to be the state of M1 such that s1,i−1 →E1,i−1 s1,i. Similarly, for M2 we define the numbers m0, m1, . . ., and for every mi, i ≥ 0, we define E2,i and the states s2,i.

Now we prove the proposition by induction on i ≥ 0, for every si and Ei.

Base Case, i = 0: For every t = g {P} a {Q} ∈ E0, from the proposition "for each transition in M, either M1 or M2 contains a read variant", it follows that t̃ is either in M1 or M2. Suppose t̃ ∈ T1. Now we consider g, P, a and Q and their corresponding read variants.

For every variable v that is read by g, P or a, if the input variable ṽ is read by any of the read variants g̃, P̃ or ã, let ṽ take on the value s0(v). Because I1 |= P(X) and I ⊆ I1, we have I |= P(X). Thus, s0 satisfies P(X). So if there is any input variable ṽ read by g̃, P̃ or ã, the values that the input variables take on can satisfy the constraints P(X̃).

Because s1,0 = s0, it is clear that e(t, s0)(v) = e(t̃, s1,0)(v) holds for every variable that is read by g, P or a. As a result, for each variable u that is in the write set of t, it follows that s1(u) = s1,1(u) must hold.

Now we consider the postcondition Q of t in E0, based on the fact that M1 is sufficiently general with respect to M. For every variable v in Q, we do a case analysis based on the definition of "sufficiently general" for v in Q̃ of t̃ in E1,0.

1. Q̃ reads ṽⁿ, i.e., the next-state input variable: Let ṽⁿ take on the value s1(v).

2. T1 contains a read variant of every transition of M that writes to v, or t̃ writes to v, or there is a transition s ∈ T that writes to v, s̃ ∈ T1, and s fires in E0 also: In all these cases, we have s1(v) = s1,1(v) from the above analysis that for each variable u that is in the write set of t, s1(u) = s1,1(u) holds.

So it is clear that the postcondition Q̃ of t̃ in E1,0 must evaluate to the same value as Q of t in E0. The above holds similarly if t̃ ∈ T2.

From the fact that M1; P(X̃) |= T1 and M2; P(X̃) |= T2, it follows that Q̃ must hold, and so does Q in E0. Finally, if ∃x ∈ X in the write set of t, from the proposition we know that t is annotated with the postcondition P(X). From the analysis for any postcondition Q, it follows that P(X) must hold at s1 of SM.

Induction Hypothesis: Suppose for s0 →E0 s1 . . . →Ep−1 sp of SM, p ≥ 0, there exist concurrent executions

s1,0 →E1,0 . . . →E1,p1−1 s1,p1

s2,0 →E2,0 . . . →E2,p2−1 s2,p2

of M1 and M2 respectively which satisfy the following properties. First, p1 is the largest number ni in its definition such that ni ≤ p. Second, P(X) holds for every si, 0 ≤ i ≤ p. For every 0 ≤ i < p1, the values that the input variables (if any) take on, for every transition in E1,i, satisfy the constraint P(X̃). Third, for every t ∈ Ei, 0 ≤ i < p, if t̃ is in T1 of M1, then e(t, si)(v) = e(t̃, s1,j)(v) holds for every variable v that is read by t. Here nj = i, j ≥ 0. The above holds similarly if t̃ is in T2 of M2.

Induction Step, i = p + 1: For every transition t = g {P} a {Q} in Ep of SM, we consider its read variant in M1 or M2. Suppose t̃ ∈ T1. Now we consider g, P, a and Q and their read variants.

For every variable v that is read by g, P or a, we do a case analysis based on the definition of "sufficiently general" for reads of variables.

1. v is an input variable of g̃, P̃ or ã: Let ṽ take on the value sp(v). Because P(X) holds at sp, the values that the input variables take on can satisfy the constraints P(X̃).

2. T1 contains a read variant of every transition in M that writes to v: There are two cases: (i) sp(v) = s0(v), i.e., no transition has ever updated v yet, or (ii) let y be the transition that last writes to v, i.e., y ∈ Ej, j < p, and ỹ ∈ E1,j1, where nj1 = j.

For the first case, it must be that s1,p1(v) = s0(v). For the second case, by induction, it follows that e(y, sj)(u) = e(ỹ, s1,j1)(u) holds for every variable u that is read by y. So sp(v) = s1,p1(v) must hold.

3. ∃s ∈ T that writes to v, s̃ ∈ T1, and s fires on the step before t, i.e., s ∈ Ep−1: Similar to the previous case, we can prove that sp(v) = s1,p1(v) must hold.

From the above analysis, it follows that for every variable v in the write set of t, sp+1(v) = s1,p1+1(v) must hold.

Now we consider the postcondition Q of t in Ep. First of all, M1 is sufficiently general with respect to M. For every variable v in Q, we do a case analysis based on the definition of "sufficiently general" for v in Q̃ of t̃ in E1,p1.

1. Q̃ reads ṽⁿ, i.e., the next-state input variable: Let ṽⁿ take on the value sp+1(v).

2. T1 contains a read variant of every transition of M that writes to v, or t̃ writes to v, or there is a transition s ∈ T that writes to v, s̃ ∈ T1, and s fires in Ep also: In these cases, let y be the transition that last writes to v at sj, 0 ≤ j ≤ p, with ỹ ∈ E1,j1 and nj1 = j, which is also the last transition that writes to v. From the above analysis, we have that e(y, sj)(u) = e(ỹ, s1,j1)(u) holds for every variable u that is read by y. So sp(v) = s1,p1(v) must hold.

So it is clear that the postcondition Q̃ of t̃ in E1,p1 must evaluate to the same value as Q of t in Ep. The above holds similarly if t̃ ∈ T2.

From the fact that M1; P(X̃) |= T1 and M2; P(X̃) |= T2, it follows that Q̃ must hold, and so does Q in Ep. Finally, if ∃x ∈ X in the write set of t, from the proposition we know that t is annotated with the postcondition P(X). From the analysis for any postcondition Q, it follows that P(X) must hold at sp+1 of SM. The induction holds for i = p + 1.

Therefore, M |= T holds. □

As noted in the next section, in some cases the actual annotations to be checked can be simplified. For instance, it may be the case that M consists of two transactions that can be shown to never overlap, and M1 and M2, respectively, contain read variants of these two transactions. In this case, only the final value produced by each transaction can be read by the other transaction, and it is not necessary to annotate the non-final transitions. Another case in which the invariant can be simplified is when no transition in one of the submodels, M1 or M2, writes to a certain variable.

4.5 An Implementation of the Compositional Model Checking

In the previous sections, we proposed a set of conservative approaches to decomposing the model checking of the monolithic model MMC into a set of abstract models, using abstraction and assume guarantee reasoning. Now we describe an implementation of the compositional model checking for MMC.

For each transaction Ui in MMC, we build an abstract model Mi = (VMC, ṼMC, Ii, Ti, Ai) containing the correctness conditions for Ui. In this model, Ii is highly unconstrained compared with IMC: it covers all the states of MMC where the trigger transition of Ui is enabled in MMC. Essentially, Ii consists of all the states where the variables in VH ∩ W(Ui) ∩ R(Ui) match with their correspondents in V′H. For transitions, Ti includes a set of read variants of all the transitions in Ui, including the annotations.

Now we discuss when the read of a variable v is replaced with the read of the input variable ṽ in an abstract model Mi. This is based on how the transaction Ui can interact with other transactions in MMC.

First, we consider variables v ∈ VL − VH. If Ui reads v, another transaction of MMC writes v, and these transactions can overlap in MMC, i.e., one starts before the other is completed, then we replace each read of v in Mi with the read of the input variable ṽ.

Second, we consider variables v ∈ VH which Ui reads. If another transaction writes v, and they can overlap in MMC, then we replace the reads of v with the reads of ṽ in Mi. Moreover, if v′ is in the read set of spec(Ui), then we replace the reads of v′ with the reads of ṽ′ in Mi; also, we assume that ṽ and ṽ′ always satisfy the constraint ṽ = ṽ′ in Mi. The reason to make this assumption is that reads of individual input variables in VH and V′H are too unconstrained to verify the specification enabledness conditions, etc. We discuss how these assumptions are guaranteed below.

Third, we consider variables v ∈ VH that Ui writes. If another transaction reads v′ or writes v′, and they can overlap in MMC, then we add v = v′ as a postcondition to every transition that writes to v or v′ in Mi. That is, we require that in every reachable state of Mi, v = v′ holds. This is the guarantee part of the assume guarantee reasoning. In fact, the postcondition varsi = vars′i, which is checked at the end of the transaction Ui, is another part of the guarantee in Mi.

The above process describes one implementation of how we can decompose the model checking of MMC into the checks of each abstract protocol Mi. It is clear that this implementation is just a special case of the abstraction and assume guarantee reasoning presented in Section 4.4. So the abstract models built this way have the following property: ∧i (Mi; Pi(ṼMC, Ṽ′MC) |= Ti) implies MMC |= TMC. Here, Pi(ṼMC, Ṽ′MC) is the set of assumptions described in the second case above.

For any i, if Mi; Pi(ṼMC, Ṽ′MC) |= Ti fails due to abstraction using input variables, we can undo the abstraction. In more detail, we can change every read of an input variable ṽ in Mi back to reads of the ordinary variable v. Then we transitively include in Mi all the transactions of MMC that write v. This is one heuristic for implementing the compositional approach.

Finally, if Mi; Pi(ṼMC, Ṽ′MC) |= Ti holds for each abstract model, we still need to check the write-write conflicts, the serializability condition and so on, in order for MMC |= TMC to hold. Currently, we perform these checks monolithically on MMC. Section 4.6.3 will describe the detailed implementation of these checks in Muv. Moreover, we need to find out whether two transactions can overlap in MMC. We implement this by first statically computing the read and write sets of each transaction in MMC, as will be described in Section 4.6.3. If the read set of one transaction overlaps with the write set of another transaction, an assertion will be added to MMC, which asserts that these two transactions will not overlap. Finally, these assertions will be checked in MMC at runtime. If necessary, more fine-grained checks can be performed among transitions, rather than transactions.
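The static overlap analysis just described can be sketched as follows. This is an illustrative Python fragment of ours; the transaction names and read/write sets are hypothetical, not taken from the benchmark models.

```python
# Sketch: decide which pairs of transactions need a no-overlap assertion
# in the monolithic model, based on statically computed read/write sets.
def overlap_assertions(read_sets, write_sets):
    """Return pairs (ti, tj) whose non-overlap must be asserted."""
    pairs = []
    names = sorted(read_sets)
    for i, ti in enumerate(names):
        for tj in names[i + 1:]:
            # An assertion is needed when one transaction reads what
            # the other writes: they must be shown never to overlap.
            if read_sets[ti] & write_sets[tj] or read_sets[tj] & write_sets[ti]:
                pairs.append((ti, tj))
    return pairs

read_sets = {"U1": {"v", "w"}, "U2": {"x"}, "U3": {"y"}}
write_sets = {"U1": {"x"}, "U2": {"v"}, "U3": {"z"}}
print(overlap_assertions(read_sets, write_sets))  # [('U1', 'U2')]
```

Here U1 reads v, which U2 writes, so a no-overlap assertion between U1 and U2 would be added and checked at runtime.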

4.6 Muv: A Transaction Based Refinement Checker

To check that an RTL implementation of a hardware protocol, including coherence protocols, correctly implements a high level specification, we propose the approach shown in Figure 4.2. We employ Murphi to model the specifications and verify them, and employ Hardware Murphi to model implementations. Hardware Murphi extends the Murphi language of Dill et al. [38] into a hardware description language. S. German and G. Janssen defined Hardware Murphi and implemented a translator, called Muv, from Hardware Murphi into synthesizable VHDL [43, 45]. We extend Muv so that it is not just a translator from Hardware Murphi to VHDL; it can also generate assertions for the monolithic refinement check between the RTL implementations and the high level specifications. Unlike traditional hardware description languages, Hardware Murphi is closer to a concurrent software language, e.g., a threaded language. The refinement check works as follows. Given a Murphi specification model and an implementation model in Hardware Murphi, we first combine them into one cross product model in Hardware Murphi. We then use Muv to translate the combined model

[Figure: a Murphi spec model and a Hardware Murphi impl model are combined into a product model in Hardware Murphi, which Muv translates into a product model in VHDL; the refinement check is then performed either monolithically or compositionally.]

Figure 4.2. The proposed workflow of the refinement check.

into a VHDL model. With the generated VHDL model, we can employ any existing VHDL verification tool to check the refinement properties.

4.6.1 Hardware Murphi

The Hardware Murphi language is a hardware-oriented extension of Murphi. It was proposed by German and Janssen [45]. Unlike Murphi, which has interleaving semantics, Hardware Murphi is a concurrent shared variable language. Its features include the following:

Each rule, once enabled, is assumed to finish in one clock cycle.

On each clock cycle, multiple rules can read a variable. By variable, we mean a global state variable; it can be interpreted as a latch in hardware designs.

On each clock cycle, at most one transition can write a variable.

It supports signals with modes in, out, and internal, and it supports signal assignments. Signals here are similar to wires in hardware designs.

It supports transactions. The purpose of a transaction is to group multiple implementation rules so as to relate them to one specification rule. It strictly follows the formal definition of transactions in Section 4.1.

It supports chooserule and firstrule to group a set of rules. The semantics of chooserule is that an arbitrary rule is chosen and fired provided its guard is enabled. If the chosen rule is not enabled, no rule will be fired. For firstrule, the first enabled rule in the group is chosen to fire.
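The chooserule/firstrule semantics can be mimicked in a few lines. This is a behavioral sketch in Python; the rule representation (a guard/action pair over a state dictionary) is our own, not Hardware Murphi's.

```python
import random

# Behavioral sketch of firstrule and chooserule. A rule is a
# (guard, action) pair of functions over a mutable state dict.
def firstrule(rules, state):
    """Fire the first enabled rule in declaration order, if any."""
    for guard, action in rules:
        if guard(state):
            action(state)
            return True
    return False

def chooserule(rules, state, rng=random):
    """Pick an arbitrary rule; it fires only if its guard is enabled."""
    guard, action = rng.choice(rules)
    if guard(state):
        action(state)
        return True
    return False  # the chosen rule is disabled: no rule fires
```

For example, with two rules of which only the second is enabled, firstrule always fires the second, while chooserule may fire nothing if it happens to pick the disabled one.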

Figure 4.3 shows a simple example of what a transaction looks like in Hardware Murphi, and how a specification transition is implemented by a transaction. In this example, the specification transition transfers a message from a source node to the destination node. The guard condition under which the transition can be fired is that (i) the valid bit of the output channel of the source node is true, i.e., there is a message available, and (ii) the valid bit of the input channel of the destination node is false, i.e., the input channel is available. Once the transition is enabled, the action part simply copies the message to the input channel of the destination node, sets the valid bit to true, and resets the output channel through a procedure.

Specification transition:

Rule "transfer msg from src to dest"
  outchan.valid = true &
  inchan.valid = false
==>
  inchan.msg := outchan.msg;
  inchan.valid := true;
  reset(outchan);
Endrule;

Implementation transaction:

Transaction "transfer msg from src to dest"
  Rule "r1. buffer receive input msg"
    buf.update_signal = true
  ==>
    buf.msg := buf.new_msg_signal;
    buf.valid := true;
  Endrule;
  Rule "r2. source reset output msg"
    source.reset_signal = true
  ==>
    reset(source);
    reset_transaction;
  Endrule;
Endtransaction;

Figure 4.3. An example of transaction.

The specification transition is implemented by two steps in the transaction. The first is the trigger transition: when the update signal of the buffer in the destination node is true, it copies the message to the buffer and sets the valid bit to true. The second is the closing transition, indicated by the statement "reset transaction"; it simply resets the output message in the source node. These two transitions could fire concurrently, or in different clock cycles, depending on the hardware design. We extend the language of Hardware Murphi with the following features.

A Hardware Murphi model can include another model by using "--include". With this feature, a specification model can easily be included into an implementation model.

A simple rule in Murphi and Hardware Murphi can now be associated with a unique identifier, by writing rule:id guard action. We also introduce two new statements: << id.guard() >> and << id.action() >>, where id is the identifier of a simple rule. The first statement checks that the guard of the rule with identifier id holds, and the second executes the rule. With these extensions, a specification transition can easily be combined with an implementation transaction, so that whenever the transaction fires, the specification transition also fires.

A Hardware Murphi model can now have a section which describes the correspondence relationship on the joint variables between the specification and implementation models.

The notion of a transactionset is added. A transactionset represents a set of transactions which only differ in certain parameters. It is similar to the notion of a ruleset in Murphi.

These extensions help ease the refinement check for users.

4.6.2 An Overview of Muv

Muv is a translator from Hardware Murphi to synthesizable VHDL, initially implemented by German and Janssen at IBM [45]. We extend it so that it can automatically generate assertions for refinement checking. Internally, Muv collects all the global state variables declared in a Hardware Murphi model to construct a state record data structure. The translator is implemented using a two-process VHDL template: a combinational process collects all the updates happening in every transition of the model sequentially, and a sequential process simply copies these updates to the global state record.

The updates to state variables in Hardware Murphi are processed in Muv as follows. Updates within a rule are immediate, while they become visible to other rules only in the next cycle. In more detail, each rule has a local copy of the global state record. Updates within a rule only happen to the local copy. In each clock cycle, every local copy is first initialized to the value of the global state record. The local copy is then updated if the corresponding rule fires. Finally, the global copy collects all the updates happening in each local copy. Because multiple rules can fire concurrently in each clock cycle, we need to check for write-write conflicts, i.e., whether multiple rules of the model can write to the same state variable concurrently.

We will describe how Muv can automatically generate the assertions for the refinement check in the next section. These assertions will be contained in the resulting VHDL model, and they can be checked by any VHDL verifier.
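The per-rule local-copy discipline can be illustrated with a small simulation. This Python sketch is ours (Muv itself generates VHDL), and the guard/action rule representation is hypothetical.

```python
import copy

# Sketch of one clock cycle under Muv's update model: every rule reads
# a local copy initialized from the global state record, so all reads
# observe last cycle's values; the "sequential process" then commits
# all updates to the global record at once.
def clock_cycle(global_state, rules):
    updates = {}
    for guard, action in rules:
        local = copy.deepcopy(global_state)
        if guard(local):
            action(local)
            for var, val in local.items():
                if val != global_state[var]:
                    # Two rules writing the same variable in one cycle
                    # would be a write-write conflict.
                    assert var not in updates, f"write-write conflict on {var}"
                    updates[var] = val
    global_state.update(updates)
    return global_state
```

For instance, a rule that copies a into b and a rule that increments a, firing in the same cycle, both observe the old value of a; the increment becomes visible only in the next cycle.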

4.6.3 Assertion Generation for Refinement Check

Checking Write-Write Conflicts

For every global state variable in a Hardware Murphi model, we associate with it a subset of update bits. Update bits are an array of boolean variables which track which global variables get updated in each clock cycle. Each global variable corresponds to a contiguous range of the update bits, and the ranges of two different variables never overlap. The range for a global variable is obtained based on its data type. Currently, the data types that Hardware Murphi supports include boolean, enum, subrange, record, and array.

Complex data types can be declared by nesting "record" or "array". For each data type, we assign a range of update bits in the following way. For the boolean type, we assign two bits. For an enum, the number we assign is equal to the number of identifiers declared in the enumeration type, and similarly for the subrange type. For a record, the number of bits is the sum of all the bits that we assign for each field in the record. Finally, for an array, the number of bits is equal to the product of the number of bits of the index type and that of the element type of the array. For each global variable, Muv assigns the update bits according to the declaration order of the variable in the model. For example, the range of the first global variable starts at 0 and ends at the number of bits assigned to that variable.

With update bits, now we describe how write-write conflicts can be checked in Muv. For each global variable that appears on the left-hand side of an assignment, we add two more statements: (i) immediately before the assignment, we assert that all the update bits associated with the variable are false, and (ii) immediately after the assignment, all these update bits are set to true. For example, suppose v is a global state variable and v := d is an assignment statement in the model. Also suppose the range of the update bits of v is 5..10. Then the assignment will be transformed to:

for i in 5..10 do assert (update_bits[i] = false); end;
v := d;
for i in 5..10 do update_bits[i] := true; end;

In the above, the write-write conflict check assumes that for every transition in a Hardware Murphi model, a variable can only be updated once. We think this is a reasonable assumption for most RTL implementations, as each transition is supposed to finish in one clock cycle; otherwise, the assertion failure is just a false alarm. The range of the update bits for a global state variable can be computed either statically or dynamically. In more detail, if the variable access is a variably indexed array access, then the range depends on the index itself, so it can only be computed at runtime, in each clock cycle. In all other cases, the range of the update bits can be computed statically.

For the write-write conflict check, two optimizations can be implemented. First, once the range of the update bits for a variable is computed, it can be cached so that there is no need to re-compute it for subsequent accesses to the variable; the same can be done for the number of update bits to be assigned to each data type. Second, the number of update bits can be reduced, and as a result, the number of assertions to be checked for write-write conflicts can be reduced. This is based on the observation that if a primitive data type is not used as the index type of an array, then we can assign just one bit for it. For example, suppose there is a global variable v in a Hardware Murphi model declared as follows:

T1: 1 .. 100;
T2: array [T1] of T1;
v: T2;

Here, T1 and T2 are type declarations. Without the optimization, the number of update bits that v will be assigned is 10000 = 100 × 100. As a result, for every update of v in the model, 10000 assertions need to be checked. Applying our optimization, however, the number of bits for v can be reduced to 100: we can assign just one bit for each element of T2. This removes many of the unnecessary assertions in the write-write conflict check.
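The update-bit counting rules above, including the one-bit optimization, can be written down directly. The following Python sketch uses an ad hoc type encoding of our own, not Muv's internal representation.

```python
# Sketch of the update-bit assignment described above. Types are encoded
# as tuples: ("boolean",), ("enum", n), ("subrange", n),
# ("record", [fields]), ("array", index_type, elem_type).
def num_values(t):
    """Number of values of a scalar type (used for array indexing)."""
    kind = t[0]
    if kind == "boolean":
        return 2
    if kind in ("enum", "subrange"):
        return t[1]  # number of identifiers / size of the range
    raise ValueError("not a scalar type: " + kind)

def num_update_bits(t, optimized):
    kind = t[0]
    if kind in ("boolean", "enum", "subrange"):
        # Optimization: one bit suffices for a primitive tracked as a
        # value; the full value count matters only for array indexing.
        return 1 if optimized else num_values(t)
    if kind == "record":
        return sum(num_update_bits(f, optimized) for f in t[1])
    if kind == "array":
        index_t, elem_t = t[1], t[2]
        return num_values(index_t) * num_update_bits(elem_t, optimized)
    raise ValueError("unknown type kind: " + kind)

T1 = ("subrange", 100)   # T1: 1 .. 100
T2 = ("array", T1, T1)   # T2: array [T1] of T1
print(num_update_bits(T2, optimized=False))  # 10000
print(num_update_bits(T2, optimized=True))   # 100
```

This reproduces the counts from the example: 10000 update bits without the optimization, 100 with it.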

Computing Read and Write Sets

As described in Sections 4.3 and 4.4, for each rule and each transaction, we need to compute the read and write sets of variables for the refinement check. In Muv, the read and write sets are computed statically, with the overall process as follows:

1. Constant propagation for the whole model.¹

2. Pre-processing of all the transactionsets and top-level rulesets.

3. Static computation of read and write sets.

In more detail, constant propagation is done on the abstract syntax tree (AST) of the monolithic model, with a bottom-up traversal [90]. After constant propagation, the pre-processor of Muv enumerates all the possible constant values for each iterator of transactionsets and top-level rulesets, to generate a list of transactions and simple rules. Now we describe how the read and write sets of each simple rule are obtained. For every statement in the rule, we statically compute its read and write sets, by tracking

¹This was done mainly by G. Janssen from IBM.

every expression in the statement. Basically, the designator on the left-hand side of an assignment will be added to the write set, while other expressions will be added to the read set. For function and procedure calls, the actual parameters will be added to the write set (which is clearly conservative). At the same time, the corresponding function or procedure declarations will be recursively analyzed for read and write sets. Finally, the read set of the rule is the union of those of every statement in the rule, and similarly for the write set.

More specifically, for joint variables of array types, we differentiate between variably and constantly indexed array accesses. For a variably indexed array access, the variable of the array type as a whole is added to the read or write set, while for a constantly indexed array access, the variable with the specific index is added. Therefore, constant propagation and pre-processing help make the read and write sets more precise.

Finally, for read sets, we need to process reads of signals differently from those of state variables. In Hardware Murphi, signal assignments can be seen as functions over other signals and state variables. So for each read of a signal, we need to add to the read set the set of state variables on which the value of the signal depends. This is implemented by first building a mapping for each signal assignment, with the first entity being the destination signal, and the second being the list of variables and signals which appear on the right-hand side of the assignment. Then, the second entity of each mapping is propagated to other mappings, to replace appearances of the first entity. This is done transitively until a fixed point is reached. For example, suppose that in a Hardware Murphi model there exist three global variables v1, v2, v3, and three signal assignments for signals s1, s2, s3 as follows:

s1 <= s2;
s2 <= v1 & v2;
s3 <= s1 | v3;

In the above, we use <= to represent signal assignments. The value of signal s2 depends on that of variables v1 and v2, so the read of s2 is in fact a read of v1 and v2. After transitive propagation, the read set of s1 becomes {v1, v2}. Similarly, the read set of s3 is {v1, v2, v3}.
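The transitive propagation over signal mappings can be sketched as a fixpoint computation. This Python fragment is illustrative and assumes acyclic signal assignments, as in the example above.

```python
# Compute, for each signal, the set of state variables its value
# depends on, by repeatedly substituting signals with their
# right-hand-side dependencies until a fixed point is reached.
def signal_read_sets(assigns):
    deps = {s: set(rhs) for s, rhs in assigns.items()}
    changed = True
    while changed:
        changed = False
        for s in deps:
            expanded = set()
            for name in deps[s]:
                if name in deps:
                    expanded |= deps[name]  # substitute a signal
                else:
                    expanded.add(name)      # a state variable stays
            if expanded != deps[s]:
                deps[s] = expanded
                changed = True
    return deps

# The example from the text: s1 <= s2; s2 <= v1 & v2; s3 <= s1 | v3;
deps = signal_read_sets({"s1": ["s2"], "s2": ["v1", "v2"], "s3": ["s1", "v3"]})
print(sorted(deps["s1"]))  # ['v1', 'v2']
print(sorted(deps["s3"]))  # ['v1', 'v2', 'v3']
```

The output matches the read sets derived in the text for s1 and s3.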

Checking Serializability

As defined in Section 4.3.1, serializability means that if multiple specification transitions can execute concurrently in the same clock cycle in the monolithic checking model, they can be converted to an interleaving execution and still reach the same state. In Muv, we implement the serializability check by analyzing the read and write sets of specification global variables for each rule. More specifically, for each rule in the monolithic model, we statically compute its read and write sets of specification variables. Let the read and write sets of a rule r be RS(r) and WS(r) respectively. For each pair of rules ri and rj, we statically check whether their read sets and write sets overlap. That is, we check

(1) RS(ri) ∩ WS(rj) ≠ ∅
(2) WS(ri) ∩ RS(rj) ≠ ∅

We use a directed graph G to store this read-write dependency relationship for the monolithic model. In more detail, if (1) holds we add an edge ri → rj to G. Similarly, we add rj → ri to G if (2) holds. In either case, we add an assertion to the resulting VHDL model, asserting that rules ri and rj cannot fire concurrently in any clock cycle. Such assertions can be represented by adding an auxiliary boolean variable rule_active for each rule r in the monolithic model. At the beginning of each clock cycle, all the rule_active's are set to false; when a rule is enabled and fires, its corresponding rule_active is set to true. Now the assertion for two rules ri, rj with a read-write dependency can simply be represented as ¬(rule_active(ri) ∧ rule_active(rj)).

Once the resulting VHDL model is model checked with these assertions, we can conclude serializability based on the directed graph G. In more detail, if the assertion ¬(rule_active(ri) ∧ rule_active(rj)) holds, we can remove the edges ri → rj and rj → ri from G. After all such edges are removed from G, we check if there exists a cycle in the graph. A classical depth-first search for back edges [113] will work here.
Finally, if no cycle exists, we can claim that serializability holds for the monolithic model; otherwise, the check fails. It is clear that our approach to checking serializability is conservative. This is because if multiple rules can fire concurrently in the same clock cycle, and there exists a cyclic

read-write dependency among them, our approach must conclude that serializability fails. The conservativeness means that it is possible that serializability holds while our approach concludes failure. For example, suppose there exist three rules ri, rj, rk in a monolithic model, and their static read-write dependencies are as follows:

ri → rj, rj → rk, rk → ri

If in each clock cycle exactly two rules can fire concurrently, then our approach will conclude serializability failure, while it actually holds. The conservativeness can be improved by checking for cycles of read-write dependencies at runtime. That is, in each clock cycle, for all the rules that are enabled in that cycle, we check whether there exists a read-write dependency cycle. By doing this, the serializability check can be made more precise.
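Putting the pieces together, the offline part of the serializability check (building G, removing edges whose no-concurrent-firing assertions were proved, and searching for back edges) might look like this in Python. The rule names and inputs are illustrative.

```python
# Sketch of the graph-based serializability check: keep only the
# dependency edges whose no-concurrent-firing assertion was NOT proved,
# then detect a cycle with a depth-first search for back edges.
def serializable(rules, dep_edges, proved_disjoint):
    edges = {(a, b) for (a, b) in dep_edges
             if (a, b) not in proved_disjoint
             and (b, a) not in proved_disjoint}
    succ = {r: [b for (a, b) in edges if a == r] for r in rules}
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {r: WHITE for r in rules}

    def dfs(u):
        color[u] = GRAY
        for v in succ[u]:
            if color[v] == GRAY:                  # back edge: cycle
                return False
            if color[v] == WHITE and not dfs(v):
                return False
        color[u] = BLACK
        return True

    return all(dfs(r) for r in rules if color[r] == WHITE)

deps = {("ri", "rj"), ("rj", "rk"), ("rk", "ri")}
print(serializable(["ri", "rj", "rk"], deps, set()))           # False
print(serializable(["ri", "rj", "rk"], deps, {("rk", "ri")}))  # True
```

With no assertions proved, the ri → rj → rk → ri cycle remains and the check conservatively fails; once the rk/ri pair is shown never to fire concurrently, the cycle is broken and serializability is concluded.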

Checking Variable Equivalence

As described in Section 4.3, in the monolithic model checking for refinement, we need to check that the postcondition ¬alivei ⇒ varsi = vars′i holds for every transaction

Ui in the model. Here, alivei is the alive variable of Ui, and varsi is defined to be the vector of all variables written by either the transaction Ui or by the corresponding specification transition. Because for all the variables v ∈ VL − VH we have v = v′, we only need to consider those variables in VH or V′H, i.e., the joint variables in both the specification and the implementation models.

In Muv, for every transaction Ui in the monolithic model, we first compute the joint variables in the write set of Ui and the corresponding specification transition. Let these variables be J. Then, for all the variables in J, we construct the formula ∧v∈J (v = v′). This is exactly varsi = vars′i in the definition of the postcondition. Finally, whenever the closing transition of Ui fires, the formula is checked immediately after the clock cycle. Currently in Muv, the correspondence relationship between VH and V′H is supposed to be provided by the user and specified in the Hardware Murphi model.
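The equivalence formula is just a conjunction over the joint variables. As a small illustration in Python (the variable names are hypothetical; primed names stand for the specification copies):

```python
# Check /\_{v in J} (v = v') over a state that holds both the
# implementation variables and their specification counterparts.
def vars_equal(state, joint):
    return all(state[v] == state[v + "'"] for v in joint)

state = {"dir": "S", "dir'": "S", "data": 7, "data'": 7}
print(vars_equal(state, ["dir", "data"]))   # True
state["data'"] = 8
print(vars_equal(state, ["dir", "data"]))   # False
```

In Muv this conjunction is emitted as a VHDL assertion and evaluated whenever the closing transition of the transaction fires.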

4.6.4 Experimental Benchmarks

We have applied the refinement check to a simple 2-stage pipelined stack example, and to a coherence protocol engine with certain realistic features. For the stack example, we model three operations: push, pop and reset. Push and pop involve reading and writing the stack size and the stack data, and reset simply resets the stack to an empty one. The transaction of a push or a pop operation contains two transitions, and it takes two consecutive clock cycles to finish in the implementation. The transaction of a reset operation simply sets the stack size to zero. In the specification, every operation is modeled by one rule, and two pushes can overlap, as can two pops. The stack example as a whole is very simple and it passes the refinement check. In the following, we mainly focus on the coherence protocol example.

We use the German protocol [42] as the specification of our benchmark. This protocol is a simple invalidation-based MSI [37] protocol. The protocol models two addresses and two nodes, with each node being the home node for one address. The reason to model two addresses, instead of one as in coherence property verification, is that the refinement check is used to check that the RTL implementation meets the protocol specification, and it is more realistic to model multiple addresses for RTL implementations. In the model, each node contains a memory, a cache and a directory. To avoid communication deadlocks and to allow messages to be received out of order, multiple communication channels are introduced for each pair of nodes. Altogether, the specification model has about 500 LOC in Murphi.

The implementation model [43] is much more complex. Each node contains three controlling units: the local, home and remote units. It also contains a directory, a cache and a memory. Different units in one node can function simultaneously, so that one unit may have to wait for several clock cycles before it can access a shared resource.
Figure 4.4 shows an overview of the implementation protocol. In the implementation model, besides the two nodes there also exist one router and six buffer units. The router is used to transfer messages from source nodes to destination buffers, and at each cycle it can select at most one message to transfer from the source node to the destination-side buffer. The buffers are used to store messages


Figure 4.4. An Overview of the implementation protocol.

on the destination side, before the destination nodes pick up the messages and process them. Altogether, the monolithic model has about 1500 LOC in VHDL.

4.6.5 Experimental Results

For the cache protocol example, we re-engineered the VHDL implementation model into a Hardware Murphi model, which resulted in about 2500 LOC. The heuristic that we used for selecting commit transitions is as follows: for each transaction, we select as the commit transition the first transition declared in the transaction whose write set overlaps with the joint variables. Commit transitions selected in this way are similar to the notion of a “commit step” in the work on aggregation of distributed transactions [95]. Before the refinement check, the basic coherence properties had been verified in both the specification and the implementation models. However, with the refinement check, we still found three bugs in the implementation model. These bugs include:

1. In two consecutive clock cycles, the router unit successfully transfers the first message, but it can lose the second message.

2. For one coherence request, the home unit of the home node can reply to the request twice in two consecutive cycles.

3. For one coherence reply, the cache unit can be updated twice in two consecutive cycles.
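To give the flavor of bug 1, the following hedged Python sketch (the router's internal behavior and all names are assumptions, not the actual VHDL) shows a router that forwards the first of two back-to-back messages but silently drops the second, which is exactly the kind of discrepancy a generated refinement assertion flags:

```python
class BuggyRouter:
    # Transfers at most one message per cycle; the (assumed) bug is that
    # a message arriving while the router is busy is silently dropped
    # instead of being held for the next cycle.
    def __init__(self):
        self.busy = False
    def step(self, msg, out_buf):
        if msg is None:
            self.busy = False
        elif not self.busy:
            out_buf.append(msg)    # first message transferred
            self.busy = True
        # else: the second message is lost -- the bug

buf, router, sent = [], BuggyRouter(), []
for msg in ["req_A", "req_B"]:     # two consecutive clock cycles
    sent.append(msg)
    router.step(msg, buf)
lost = [m for m in sent if m not in buf]
assert lost == ["req_B"]           # the mismatch the refinement check exposes
```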

We believe that these bugs are easy to miss; that is, they are hard to remember to check for by manually writing assertions. Our refinement check, in contrast, constructs such checks automatically, ensuring that enough properties are checked. Moreover, we applied our compositional approach to the cache example after the bugs were fixed. Using an IBM verification tool called SixthSense [18], we showed that with the datapath of a cache line being one bit wide, both the monolithic and the compositional approaches can finish the refinement check within 30 minutes. However, with the datapath being 10 bits wide, the monolithic approach cannot finish the refinement check in over one day, while the compositional approach finishes it in about 30 minutes.

Transaction Details

After the bugs were fixed, there are altogether 13 transactions, including transactionsets, in the benchmark. In the protocol, a transaction can contain one, two, or three rules. Coincidentally, all the rules in a transaction happen to fire concurrently in the same clock cycle. An example of a transaction containing three rules is one in which a client node receives a grant reply from the home node. The three rules are: (i) the cache unit of the client updates the cache line, (ii) the local unit sets the request as completed, and (iii) the buffer unit resets the reply message. This transaction is modeled as one rule in the specification: the client node receives the reply and the source node resets the message. In our example, one rule can appear in different transactions. In that case, each instance of the rule has a different alive variable and a trigger rule associated with it.

It is worth mentioning that the rules within a transaction do not necessarily need to fire concurrently, or even in consecutive clock cycles. There can be gaps in between, and all our proof rules still work. For example, because in hardware all the processing units can be active at the same time, a unit may have to wait several clock cycles to access a shared resource.
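The alive-variable bookkeeping can be sketched as follows. This is an illustrative Python model with assumed names, not the Hardware Murphi encoding: a trigger rule raises an alive flag for each rule instance of the transaction, and the rules may then fire in later cycles, possibly with gaps in between.

```python
def trigger(alive, txn_rules):
    # Trigger rule: mark every rule instance of the transaction alive.
    for r in txn_rules:
        alive[r] = True

def fire(alive, rule):
    # A rule instance may fire only while its alive flag is set;
    # firing consumes the flag.
    if alive.get(rule):
        alive[rule] = False
        return True
    return False

alive = {}
trigger(alive, ["update_cache", "set_done", "reset_reply"])
assert fire(alive, "update_cache")     # cycle 1
# (a gap cycle in which nothing fires is fine)
assert fire(alive, "set_done")         # cycle 3
assert fire(alive, "reset_reply")      # cycle 4
assert not any(alive.values())         # the transaction has completed
```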

In our benchmark, different transactions can fire concurrently. For example, one transaction transfers a message from the source node to the destination-side buffer. It contains two rules: (i) the destination buffer delivers the message via an input channel, and (ii) the local unit of the source node resets the corresponding output channel. Another transaction is one in which the home node prepares an invalidate request for an address. These two transactions can fire concurrently in the same clock cycle, and our method is still able to check the refinement in this case.

4.7 Summary

In this chapter, we investigate techniques to check the consistency between an RTL implementation of a hardware protocol and a high level specification. Due to the differences in modeling and execution, we propose to use transactions to relate one step in the specification to multiple-step executions in the implementation. We then propose a formal theory to check the refinement relationship, i.e., under which conditions we can claim consistency. The overall approach is as follows: the two models are combined into one, such that whenever a transaction of the implementation starts executing, the corresponding specification transition also fires. In this combined model, for every joint variable, we require that its values in the two models be equal whenever there is no transaction actively updating it.

We also propose a compositional approach to reduce the verification complexity of the refinement check. The basic idea is to allow the implementation transaction realizing a specification transition to be situated in sufficiently general environments, using abstraction and assume guarantee reasoning.

We propose to employ Hardware Murphi, a hardware language, to model RTL implementations of hardware protocols. Based on this language, we have mechanized the process of generating properties for the refinement check, for the monolithic approach. That is, given a Hardware Murphi model of the RTL implementation which includes the specification model, assertions can be automatically generated for the refinement check. This is implemented based on an initial effort from IBM which translates Hardware Murphi models into synthesizable VHDL models.
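The combined-model check can be sketched as follows. This is a hedged illustration only (the function, the writers map, and the transaction names are assumptions, not our tool's interface): joint variables are compared only while no transaction that writes them is actively executing.

```python
def check_joint(impl_state, spec_state, active_txns, joint_vars, writers):
    # writers maps each joint variable to the set of transactions that
    # update it; a variable is compared only while none of its writers
    # is actively executing.
    mismatches = []
    for v in joint_vars:
        if not (writers.get(v, set()) & active_txns):
            if impl_state[v] != spec_state[v]:
                mismatches.append(v)
    return mismatches

impl = {"dir": "S", "cache": "M"}
spec = {"dir": "S", "cache": "S"}
writers = {"dir": {"t_grant"}, "cache": {"t_grant"}}
# While transaction t_grant is mid-flight, the cache mismatch is tolerated:
assert check_joint(impl, spec, {"t_grant"}, {"dir", "cache"}, writers) == []
# Once no transaction is active, the mismatch is reported:
assert check_joint(impl, spec, set(), {"dir", "cache"}, writers) == ["cache"]
```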

Illustrated on a cache coherence protocol engine with realistic features, we show that our approach is able to generate enough assertions to catch bugs that are easy to miss. We also show that our compositional approach can effectively reduce the verification complexity of the refinement check. For future work, we plan to apply our approaches to other hardware designs such as complex pipelined systems. We also plan to explore how to mechanize the compositional refinement check approach.

CHAPTER 5

CONCLUSION

In this dissertation, we propose scalable verification techniques for hierarchical cache coherence protocols, and a refinement verification approach to check whether a hardware implementation meets a high level specification. For the refinement check, we also propose scalable verification techniques to reduce the verification complexity. Compared with traditional approaches, our approach can effectively reduce the verification complexity, and it generates enough properties for the verification of hardware implementations.

In our work, we first propose to employ metacircular assume guarantee reasoning to reduce the verification complexity of finite instances of protocols for safety, using an explicit state model checker. Given a hierarchical protocol, we first construct a set of abstract protocols. Each abstract protocol simulates the original protocol but has a smaller state space. The abstract protocols depend on each other both to define the abstractions and to justify them. We show that, in the case of our hierarchical coherence protocols, the designers can easily construct each abstract protocol in a counterexample guided manner. The approach is practical, and it has been successfully applied to several complex hierarchical multicore cache coherence protocols which could not be verified through monolithic state enumeration.

Furthermore, we propose to decompose hierarchical protocols one level at a time through assume guarantee reasoning. To achieve this, we propose that hierarchical protocols be designed and implemented in a loosely coupled manner. Also, we propose to use history variables in an assume guarantee manner, to verify non-inclusive and snooping hierarchical protocols. We have partly mechanized the abstraction process of this approach. For three multicore hierarchical protocol benchmarks, we show that our approach can bring about a 95% reduction in the total state space during any single explicit state enumeration.

Moreover, we propose a refinement theory which defines under which conditions we can claim that an RTL implementation correctly implements a specification. We also propose a compositional approach to reduce the verification complexity of the refinement check, using abstraction and assume guarantee reasoning. In addition, we have mechanized the process of generating assertions for the refinement check, based on the Hardware Murphi language and Muv, a translator from Hardware Murphi to VHDL models. The experimental results show that our approach is very effective at finding bugs that are easy to miss in a hardware protocol implementation with realistic features. Finally, we show that our compositional approach can effectively reduce the refinement check runtime from over a day to about 30 minutes on the protocol.

Overall, we have conducted some of the first research on the verification of hierarchical cache coherence protocols at the high specification level, and on refinement checks for hardware protocol implementations. We have demonstrated the benefits of our approach on several hierarchical coherence protocols, and on a hardware implementation of a coherence protocol with realistic features.

REFERENCES

[1] http://www.bluespec.com/.
[2] http://www.cs.utah.edu/formal_verification/hldvt07.
[3] http://verify.stanford.edu/dill/murphi.html.
[4] http://www.cs.utah.edu/formal_verification/software/murphi/murphi.64bits/.
[5] AMD Athlon™ 64 Processor Product Data Sheet. http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24659.PDF.
[6] IEEE Std. 1666™-2005, SystemC™ Language Reference Manual. http://www.systemc.org.
[7] Intel 64 and IA-32 Architectures Optimization Reference Manual. http://www.intel.com/design/processor/manuals/248966.pdf.
[8] D. Abts, D. Lilja, and S. Scott. Toward Complexity-Effective Verification: A Case Study of the Cray SV2 Cache Coherence Protocol. In Int’l Symp. Computer Architecture Workshop on Complexity-Effective Design, 2000.
[9] A. Agarwal, R. Bianchini, D. Chaiken, K. L. Johnson, D. Kranz, J. Kubiatowicz, B.-H. Lim, K. Mackenzie, and D. Yeung. The MIT Alewife Machine: Architecture and Performance. In Int’l Symposium on Computer Architecture, pages 2–13, 1995.
[10] Arvind, R. Nikhil, D. Rosenband, and N. Dave. High-Level Synthesis: An Essential Ingredient for Designing Complex ASICs. In International Conference on Computer Aided Design, 2004.
[11] Arvind and X. Shen. Using Term Rewriting Systems to Design and Verify Processors. In IEEE Micro, 1999.
[12] K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick. The Landscape of Parallel Computing Research: A View from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, Dec 2006.
[13] M. Azizi, O. A. Mohamed, and X. Song. Cache Coherence Protocol Verification of a Multiprocessor System with Shared Memory. In Int’l Conference on Microelectronics, pages 99–102, 1998.

[14] L. A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing. In Int’l Symposium on Computer Architecture, 2000.
[15] F. Baskett, T. A. Jermoluk, and D. Solomon. The 4D-MP Graphics Superworkstation: Computing + Graphics = 40 MIPS + 40 MFLOPS and 100,000 Lighted Polygons per Second. In Digest of Papers, Thirty-Third IEEE Computer Society Int’l Conference, pages 468–471, 1988.
[16] B. Batson and L. Lamport. High-Level Specifications: Lessons from Industry. In Formal Methods for Components and Objects, 2002.
[17] K. Baukus, Y. Lakhnech, and K. Stahl. Parameterized Verification of a Cache Coherence Protocol: Safety and Liveness. In Verification, Model Checking, and Abstract Interpretation (VMCAI), pages 305–308, 2002.
[18] J. Baumgartner, H. Mony, V. Paruthi, R. Kanzelman, and G. Janssen. Scalable Sequential Equivalence Checking across Arbitrary Design Transformations. In Intl. Conference on Computer Design, 2006.
[19] R. Bhattacharya, S. German, and G. Gopalakrishnan. Symbolic Partial Order Reduction for Rule-Based Transition Systems. In Correct Hardware Design and Verification Methods, pages 332–335, 2005.
[20] R. Bhattacharya, S. M. German, and G. Gopalakrishnan. Exploiting Symmetry and Transactions for Partial Order Reduction of Rule Based Specifications. In SPIN Workshop on Model Checking of Software, pages 252–270, 2006.
[21] E. E. Bilir, R. M. Dickson, Y. Hu, M. Plakal, D. J. Sorin, M. D. Hill, and D. A. Wood. Multicast Snooping: A New Coherence Method Using a Multicast Address Network. In Int’l Symposium on Computer Architecture, pages 294–304, 1999.
[22] J. Bobba, K. Moore, H. Volos, L. Yen, M. Hill, M. Swift, and D. Wood. Performance Pathologies in Hardware Transactional Memory. In Int’l Symposium on Computer Architecture, 2007.
[23] J. M. Borkenhagen, R. D. Hoover, and K. M. Valk. EXA Cache/Scalability Controllers.
In IBM Enterprise X-Architecture Technology: Reaching the Summit, 2002.
[24] F. Briggs, M. Cekleov, K. Creta, M. Khare, S. Kulick, L. P. Looi, C. Natarajan, S. Radhakrishnan, and L. Rankin. Intel 870: A Building Block for Cost-Effective, Scalable Servers. In IEEE Micro, pages 36–47, 2002.
[25] R. E. Bryant. Graph-based Algorithms for Boolean Function Manipulation. In IEEE Transactions on Computers, pages 677–691, 1986.
[26] J. Burch and D. Dill. Automatic Verification of Pipelined Microprocessor Control. In Intl. Conference on Computer-Aided Verification, 1994.

[27] M. Cekleov, D. Yen, P. Sindhu, J.-M. Frailong, J. Gastinel, M. Splain, J. Price, G. Beck, B. Liencres, F. Cerauskis, C. Coffin, D. Bassett, D. Broniarczyk, S. Fosth, T. Nguyen, R. Ng, J. Hoel, D. Curry, L. Yuan, R. Lee, A. Kwok, A. Singhal, C. Cheng, G. Dykema, S. York, B. Gunning, B. Jackson, A. Kasuya, D. Angelico, M. Levitt, M. Mothashemi, D. Lemenski, L. Bland, and T. Pham. SPARCcenter 2000: Multiprocessing for the 90’s. In Compcon Spring, Digest of Papers, pages 345–353, 1993.
[28] K. M. Chandy and J. Misra. Parallel Program Design: A Foundation. Addison-Wesley, 1988.
[29] A. Charlesworth. Starfire: Extending the SMP Envelope. In IEEE Micro, 1998.
[30] A. Charlesworth. The Sun Fireplane SMP Interconnect in the Sun Fire 3800-6800. In High Performance Interconnects, 2001.
[31] L. Cheng. Context-aware Coherence Protocols for Future Processors. PhD thesis, School of Computing, University of Utah, 2007.
[32] L. Cheng, N. Muralimanohar, K. Ramani, R. Balasubramonian, and J. Carter. Interconnect-Aware Coherence Protocols for Chip Multiprocessors. In Int’l Symp. on Computer Architecture, 2006.
[33] C.-T. Chou, S. M. German, and G. Gopalakrishnan. Tutorial on Specification and Verification of Shared Memory Protocols and Consistency Models. In Formal Methods in Computer-Aided Design, 2004.
[34] C.-T. Chou, P. K. Mannava, and S. Park. A Simple Method for Parameterized Verification of Cache Coherence Protocols. In Formal Methods in Computer Aided Design, 2004.
[35] A. Charlesworth. The Sun Fireplane Interconnect. In IEEE Micro.
[36] S. Das, D. L. Dill, and S. Park. Experience with Predicate Abstraction. In Computer Aided Verification, pages 160–171, 1999.
[37] D. E. Culler, J. P. Singh, and A. Gupta. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann Publishers, 1998.
[38] D. L. Dill, A. J. Drexler, A. J. Hu, and C. H. Yang. Protocol Verification as a Hardware Design Aid. In IEEE Intl.
Conference on Computer Design: VLSI in Computers and Processors, 1992.
[39] E. M. Clarke. Proving the Correctness of Coroutines without History Variables. In ACM Southeast Regional Conference, 1978.
[40] E. A. Emerson and V. Kahlon. Exact and Efficient Verification of Parameterized Cache Coherence Protocols. In Correct Hardware Design and Verification Methods (CHARME), 2003.

[41] E. A. Emerson and V. Kahlon. Rapid Parameterized Model Checking of Snoopy Cache Protocols. In Tools and Algorithms for Construction and Analysis of Systems (TACAS), pages 144–159, 2003.
[42] S. German. Tutorial on Verification of Distributed Cache Memory Protocols. In Formal Methods in Computer Aided Design, 2004.
[43] S. German and G. Janssen. A Tutorial Example of a Cache Memory Protocol and RTL Implementation. Technical Report RC23958, IBM Research, 2006.
[44] S. M. German. Formal Design of Cache Memory Protocols in IBM. In Formal Methods in System Design, 2003.
[45] S. M. German and G. Janssen. Personal Communication.
[46] K. Gharachorloo, M. Sharma, S. Steely, and S. V. Doren. Architecture and Design of AlphaServer GS320. In Architectural Support for Programming Languages and Operating Systems, 2000.
[47] D. I. Good and R. M. Cohen. Principles of Proving Concurrent Programs in Gypsy. In Principles of Programming Languages, pages 42–52, 1979.
[48] G. Gopalakrishnan and W. Hunt. Formal Methods in Industrial Practice: A Sampling. In Formal Methods in System Design, 2003.
[49] G. Gostin, J.-F. Collard, and K. Collins. The Architecture of the HP Superdome Shared-memory Multiprocessor. In Int’l Conference on Supercomputing, pages 239–245, 2005.
[50] A. Grbic. Assessment of Cache Coherence Protocols in Shared-memory Multiprocessors. PhD thesis, University of Toronto, 2003.
[51] D. Gustavson. The Scalable Coherent Interface and Related Standards Projects. In IEEE Micro, 1992.
[52] E. Hagersten and M. Koster. WildFire: A Scalable Path for SMPs. In High-Performance Computer Architecture (HPCA), 1999.
[53] B. T. Hailpern and S. S. Owicki. Modular Verification of Computer Communication Protocols. In IEEE Transactions on Communications, 1983.
[54] S. Hazelhurst and C. H. Seger. Symbolic Trajectory Evaluation. In Formal Hardware Verification – Methods and Systems in Comparison, 1996.
[55] J. L. Hennessy and D. A. Patterson.
Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 2003.
[56] M. D. Hill. What is Scalability? In Computer Architecture News, 1990.
[57] J. Hoe and Arvind. Hardware Synthesis from Term Rewriting Systems. In International Conference on Very Large Scale Integration: Systems on a Chip, 1999.

[58] J. H. Howard. Proving Monitors. In Communications of the ACM, 1976.
[59] C. N. Ip. State Reduction Methods for Automatic Formal Verification. PhD thesis, Stanford University, 1996.
[60] C. Jones. Tentative Steps Towards a Development Method for Interfering Programs. ACM Transactions on Programming Languages and Systems, pages 596–616, 1983.
[61] R. Kane, P. Manolios, and S. K. Srinivasan. Monolithic Verification of Deep Pipelines with Collapsed Flushing. In Design, Automation, and Test in Europe, pages 1234–1239, 2006.
[62] C. N. Keltcher, K. J. McGrath, A. Ahmed, and P. Conway. The AMD Opteron Processor for Multiprocessor Servers. In IEEE Micro, 2003.
[63] K. L. McMillan and N. Amla. Automatic Abstraction Without Counterexamples. In Technical Report of Cadence, 2003.
[64] P. Kongetira, K. Aingaran, and K. Olukotun. Niagara: A 32-Way Multithreaded Sparc Processor. In IEEE Micro, pages 21–29, 2005.
[65] J. Kuskin, D. Ofelt, M. Heinrich, J. Heinlein, R. Simoni, K. Gharachorloo, J. Chapin, D. Nakahira, J. Baxter, M. Horowitz, A. Gupta, M. Rosenblum, and J. Hennessy. The Stanford FLASH Multiprocessor. In Proceedings of the 21st Intl. Symposium on Computer Architecture, pages 302–313, 1994.
[66] S. K. Lahiri and R. E. Bryant. Constructing Quantified Invariants via Predicate Abstraction. In Verification, Model Checking, and Abstract Interpretation (VMCAI), 2004.
[67] L. Lamport. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley, 2002.
[68] J. Laudon and D. Lenoski. The SGI Origin: A ccNUMA Highly Scalable Server. In Int’l Symposium on Computer Architecture, pages 241–251, 1997.
[69] D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The Directory-based Cache Coherence Protocol for the DASH Multiprocessor. In Int’l Symposium on Computer Architecture (ISCA), pages 148–159, 1990.
[70] T. D. Lovett and R. M. Clapp. STiNG: A CC-NUMA Computer System for the Commercial Marketplace.
In Int’l Symposium on Computer Architecture, 1996.
[71] D. C. Luckham and J. Vera. An Event-Based Architecture Definition Language. In IEEE Transactions on Software Engineering, 1995.
[72] Z. Manna and A. Pnueli. Temporal Verification of Reactive Systems: Safety. Springer, 1995.

[73] M. M. K. Martin. Token Coherence. PhD thesis, University of Wisconsin-Madison, 2003.
[74] M. M. K. Martin, P. J. Harper, D. J. Sorin, M. D. Hill, and D. A. Wood. Using Destination-Set Prediction to Improve the Latency/Bandwidth Tradeoff in Shared-Memory Multiprocessors. In Int’l Symposium on Computer Architecture, pages 206–217, 2003.
[75] M. M. K. Martin, M. D. Hill, and D. A. Wood. Token Coherence: Decoupling Performance and Correctness. In Int’l Symposium on Computer Architecture, 2003.
[76] M. M. K. Martin, D. J. Sorin, M. D. Hill, and D. A. Wood. Bandwidth Adaptive Snooping. In Int’l Symposium on High-Performance Computer Architecture, pages 251–262, 2002.
[77] M. R. Marty, J. D. Bingham, M. D. Hill, A. J. Hu, M. M. Martin, and D. A. Wood. Improving Multiple-CMP Systems Using Token Coherence. In Intl. Symposium on High Performance Computer Architecture, 2005.
[78] M. Clint. Program Proving: Coroutines. In Acta Informatica, 1973.
[79] K. McMillan. A Compositional Rule for Hardware Design Refinement. In Computer Aided Verification, 1997.
[80] K. McMillan. Verification of Infinite State Systems by Compositional Model Checking. In Correct Hardware Design and Verification Methods, pages 219–234, 1999.
[81] K. L. McMillan. Symbolic Model Checking. PhD thesis, Carnegie Mellon University, 1992.
[82] K. L. McMillan. Circular Compositional Reasoning about Liveness. In International Conference on Correct Hardware Design and Verification Methods, 1999.
[83] K. L. McMillan. Parameterized Verification of the FLASH Cache Coherence Protocol by Compositional Model Checking. In Correct Hardware Design and Verification Methods, pages 179–195, 2001.
[84] K. L. McMillan and J. Schwalbe. Formal Verification of the Gigamax Cache-Consistency Protocol. In Int’l Symposium on Shared Memory Multiprocessing, 1991.
[85] N. Medvidovic, P. Oreizy, J. E. Robbins, and R. N. Taylor. Using Object-Oriented Typing to Support Architectural Design in the C2 Style.
In Foundations of Software Engineering (FSE), pages 24–32, 1996.
[86] N. Mellergaard and J. Staunstrup. Tutorial on Design Verification with Synchronized Transitions. In International Conference on Theorem Provers in Circuit Design, 1994.

[87] K. Moore, J. Bobba, M. Moravan, M. Hill, and D. Wood. LogTM: Log-based Transactional Memory. In High-Performance Computer Architecture, 2006.
[88] M. Moriconi, X. Qian, and R. A. Riemenschneider. Correct Architecture Refinement. In IEEE Transactions on Software Engineering, 1995.
[89] N. Muralimanohar and R. Balasubramonian. Interconnect Design Considerations for Large NUCA Caches. In Int’l Symp. on Computer Architecture, 2007.
[90] F. Nielson, H. R. Nielson, and C. Hankin. Principles of Program Analysis. Springer, 2004.
[91] S. Owicki and D. Gries. Verifying Properties of Parallel Programs: An Axiomatic Approach. Communications of the ACM, 19(5):279–284, 1976.
[92] M. Papamarcos and J. Patel. A Low Overhead Coherence Solution for Multiprocessors with Private Cache Memories. In Int’l Symposium on Computer Architecture, 1984.
[93] S. Park. Computer Assisted Analysis of Multiprocessor Memory Systems. PhD thesis, Stanford University, 1996.
[94] S. Park, S. Das, and D. L. Dill. Automatic Checking of Aggregation Abstractions Through State Enumeration. In IFIP TC6/WG6.1 Int’l Conference on Formal Description Techniques for Distributed Systems and Communication Protocols, and Protocol Specification, Testing, and Verification, 1997.
[95] S. Park and D. L. Dill. Verification of the FLASH Cache Coherence Protocol by Aggregation of Distributed Transactions. In ACM Symposium on Parallel Algorithms and Architectures, pages 288–296, 1996.
[96] A. Pnueli, S. Ruah, and L. Zuck. Automatic Deductive Verification with Invisible Invariants. In Tools and Algorithms for Construction and Analysis of Systems (TACAS), pages 82–97, 2001.
[97] F. Pong and M. Dubois. Verification Techniques for Cache Coherence Protocols. In ACM Computing Surveys, pages 82–126, 1997.
[98] W. Qin. Modeling and Description of Embedded Processors for the Development of Software Tools. PhD thesis, Princeton University, 2004.
[99] W. Qin and S. Malik.
A Study of Architecture Description Languages from a Model-based Perspective. In International Workshop on Microprocessor Test and Verification, 2005.
[100] A. Rose, S. Swan, J. Pierce, and J. Fernandez. Transaction Level Modeling in SystemC. TLM White Paper, 2005.
[101] C. Scheurich and M. Dubois. Correct Memory Operation of Cache-Based Multiprocessors. In Int’l Symposium on Computer Architecture, pages 234–243, 1987.

[102] X. Shen and Arvind. Specification of Memory Models and Design of Provably Correct Cache Coherence Protocols. Technical report, MIT, 1997.
[103] X. Shen, Arvind, and L. Rudolph. CACHET: An Adaptive Cache Coherence Protocol for Distributed Shared-Memory Systems. In Int’l Conference on Supercomputing, 1999.
[104] X. Shen, Arvind, and L. Rudolph. Commit-Reconcile & Fences (CRF): A New Memory Model for Architects and Compiler Writers. In Int’l Symposium on Computer Architecture, 1999.
[105] J. Silva and K. Sakallah. GRASP – A New Search Algorithm for Satisfiability. In Int’l Conference on Computer Aided Design, pages 220–227, 1996.
[106] A. Singhal, D. Broniarczyk, F. Cerauskis, J. Price, L. Yuan, C. Cheng, D. Doblar, S. Fosth, N. Agarwal, K. Harvey, E. Hagersten, and B. Liencres. Gigaplane: A High Performance Bus for Large SMPs. In Hot Interconnects, pages 41–52, 1996.
[107] D. J. Sorin, M. Plakal, M. D. Hill, A. E. Condon, M. M. K. Martin, and D. A. Wood. Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol. In IEEE Transactions on Parallel and Distributed Systems, 2002.
[108] J. Staunstrup. A Formal Approach to Hardware Design. Kluwer Academic Publishers, 1994.
[109] U. Stern and D. L. Dill. Automatic Verification of the SCI Cache Coherence Protocol. In Correct Hardware Design and Verification Methods, 1995.
[110] P. Sweazey and A. J. Smith. A Class of Compatible Cache Consistency Protocols and Their Support by the IEEE Futurebus. In Int’l Symposium on Computer Architecture, pages 414–423, 1986.
[111] J. M. Tendler, S. Dodson, S. Fields, H. Le, and B. Sinharoy. POWER4 System Microarchitecture. In IBM Journal of Research and Development, 2002.
[112] EE Times, Feb 2007. http://www.eetimes.com/showArticle.jhtml?articleID=197005268.
[113] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press, 2001.
[114] K. Winkelmann, H.-J. Trylus, D. Stoffel, and G. Fey. Cost-efficient Block Verification for a UMTS Up-link Chip-rate Coprocessor. In Design, Automation and Test in Europe Conference and Exhibition (DATE), 2004.
[115] L. Yang, D. Gao, J. Mostoufi, R. Joshi, and P. Loewenstein. System Design Methodology of UltraSPARC-I. In Design Automation Conference, 1995.
[116] L. Zhang and S. Malik. The Quest for Efficient Boolean Satisfiability. In Computer Aided Verification, pages 17–36, 2002.
Cost-efficient Block Veri- fication for a UMTS Up-link Chip-rate Coprocessor. In Design, Automation and Test in Europe Conference and Exhibition (DATE), 2004. [115] L. Yang, D. Gao, J. Mostoufi, R. Joshi, and P. Loewenstein. System Design Methodology of UltraSPARC-I. In Design Automation Conference, 1995. [116] L. Zhang and S. Malik. The Quest for Efficient Boolean Satisfiability. In Computer Aided Verification, pages 17–36, 2002.