Directed Test Generation for Validation of Cache Coherence Protocols Yangdi Lyu, Xiaoke Qin, Mingsong Chen, Member, IEEE and Prabhat Mishra, Senior Member, IEEE

Total Page:16

File Type:pdf, Size:1020Kb

Directed Test Generation for Validation of Cache Coherence Protocols Yangdi Lyu, Xiaoke Qin, Mingsong Chen, Member, IEEE and Prabhat Mishra, Senior Member, IEEE IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. XX, NO. NN, MMM YYYY 1 Directed Test Generation for Validation of Cache Coherence Protocols Yangdi Lyu, Xiaoke Qin, Mingsong Chen, Member, IEEE and Prabhat Mishra, Senior Member, IEEE Abstract—Computing systems utilize multi-core processors Since all possible behaviors of the cache blocks in a system with complex cache coherence protocols to meet the increasing with n cores can be defined by a global finite state machine need for performance and energy improvement. It is a major (FSM), the entire state space is the product of n cache block challenge to verify the correctness of a cache coherence protocol since the number of reachable states grows exponentially with level FSMs. Although the FSM of each cache controller is the number of cores. In this paper, we propose an efficient test easy to understand, the structure of the product FSM for generation technique, which can be used to achieve full state modern cache coherence protocols usually have quite obscure and transition coverage in simulation based verification for a structures that are hard to analyze. Clearly, it is inefficient to wide variety of cache coherence protocols. Based on effective use breadth-first search (BFS) on this product FSM to achieve analysis of the state space structure, our method can generate more efficient test sequences (50% shorter) on-the-fly compared full state or transition coverage, because a large number of with tests generated by breadth-first search. While our on-the- transitions may be unnecessarily repeated, if they are on the fly method can reduce the numbers of required tests by half, it shortest path to many other states. can still be impractical to verify all possible transitions in the Simulation using random and constrained-random tests is presence of large number of cores. We propose scalable on-the- fly test generation techniques using quotient state space. The widely used in industry because of its good scalability. How- proposed approach guarantees selection of important transitions ever, the random nature of test sequences also introduces by utilizing equivalence classes, and omits only similar transi- unacceptable time requirement to cover all possible state tions. Our experimental results demonstrate that our proposed transitions in modern cache coherence protocols with many approaches can efficiently trade-off between transition coverage cores. Directed tests, on the other hand, are promising to and validation effort. achieve high coverage with a drastically small number of tests Index Terms—Cache coherence, quotient space, test genera- [2]. Therefore, they can be applied in addition to random tests tion, verification. to further improve the chances of capturing potential bugs. However, directed test generation is not practical in this case I. INTRODUCTION since the time and memory requirements can be prohibitive. Therefore, it is desirable to have an on-the-fly test generator YSTEM designers incorporate multi-core processors to with a space- and time-efficient test generation algorithm. S meet the increasing performance requirements. To address In this paper, we propose an on-the-fly test generation the memory bottleneck, caching has been the most effec- technique for cache coherence protocols by analyzing the state tive approach to reduce the memory access time for several space structure of their corresponding global FSMs. Instead decades. When the same data is cached by different processors, of using structure-independent BFS to obtain directed tests, cache coherence protocols are employed to guarantee that a we show that complex state space can be decomposed into read always returns most recently written data. Due to the several components with simple structures. Since the activation power wall encountered by single core architectures, more of states and transitions can be viewed as a path searching and more cores are integrated into the same chip to boost problem in the state space, these decomposed components with the performance. As a result, the modern cache coherence known structures can be exploited for efficient test generation. protocols, like MOESI in AMD [1], are becoming quite Our contributions in this paper are: complex. Unfortunately, since the reachable protocol state space grows exponentially with the number of processing units 1) We develop a graphical state space description of several (cores) and states, the verification teams are facing significant commonly used cache coherence protocols, which can challenges to achieve the required coverage within tight time- be viewed as a composition of simple structures [3], and to-market window. present an on-the-fly directed test generation algorithm based on Euler tour [4]. This work was partially supported by the NSF grant CNS-1441667, Natural 2) We propose an efficient quotient space based test gen- Science Foundation of China 61672230 and Cisco. eration approach to address the scalability concerns in Y. Lyu and P. Mishra are with the Department of Computer and Information Science and Engineering at the University of Florida, Gainesville FL 32611- existing test generation techniques. The proposed ap- 6120, USA. proach utilizes the symmetric structure of protocol state E-mail: lvyangdi@ufl.edu space, which enables designers to cover important state X. Qin is an architect with Nvidia. E-mail: [email protected]. transitions within limited verification budget. M. Chen is with the Shanghai Key Laboratory of Trustworthy Computing at East China Normal University, Shanghai 200062, China. The rest of the paper is organized as follows. Section II E-mail: [email protected]. introduces related works. Section III provides background on 2 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. XX, NO. NN, MMM YYYY cache coherence protocols. Section IV presents our on-the-fly state space to representatives, verification techniques can be test generation algorithms. Section V proposes scalable on-the- used to deal with large number of states. Clarke et al. [20] fly test generation using quotient space. Experimental results and Emerson [21] exploited symmetry reduction techniques are presented in Section VI. Finally, Section VII concludes the in model checking. Kamkin [22] address state exploration paper. problem by projecting state space to a number of subspace. However, to the best of our knowledge, quotient space was never utilized to improve test generation and validation of II. RELATED WORK cache coherence protocols. Researchers have designed many cache coherence proto- cols for different platforms and architectures, such as MSI, III. BACKGROUND AND MOTIVATION MESI, MOSI, MOESI, MESIF [5], MEUSI [6] and many In modern computer systems, each processing unit usually other variations. As these protocols are becoming more and maintains its local copy of the main memory, or cache for more complex, validating the correctness of these protocols fast access. One major problem of caching is that when the also becomes challenging. Existing cache coherence protocol same data, memory block, is cached in two or more different validation techniques can be broadly classified into two cat- places, any modification should be propagated to all the cached egories: formal verification and simulation based validation. copies. Cache coherence protocols are used to define the Formal methods using model checking can prove mathemat- correct behavior of each cache controller. ically whether the description of certain protocol violates the required property. For example, Mur' [7] was used to Eviction Other ST verify various cache coherence protocols based on explicit IS model checking. Symbolic model checking tools are also Self LD Self LD developed for coherence verification. For example, the veri- Other LD Other LD fication problem with parameterized cache coherence protocol Self ST is investigated by Emerson et al. [8] and Li et al. [9]. Fractal Eviction Self ST coherence [10] [11] and PVCoherence [12] enable the scalable Other ST verification of a family of properly designed coherence proto- cols. Deadlock detection techniques [13] [14] are designed to LD=Load M Self LD ST=Store automatically detect deadlock in cache coherence protocols. Although formal methods can guarantee the correctness of a Fig. 1. State transitions for a cache block in MSI protocol. design, they usually require that the design should be described in certain input languages. As a result, it is usually difficult One of the simplest cache coherence protocol is the MSI to apply model checking on implementations directly. More- snoopy protocol [23]. The behavior of the cache controller in a over, manual translation (implementation to formal language) processing unit is modeled as an FSM (Figure 1). The state of associated abstractions may introduce errors. a cache block (line) can be either “Invalid”(I), “Modified”(M), Simulation based approaches, on the other hand, are able or “Shared”(S). At the beginning, all cache blocks are in to handle designs at different abstraction levels and therefore the invalid state. When a load request (Self LD) arrives, the widely used in practice. For example, Wood et al. [15] used cache controller requests the data from the main memory and random tests to verify the memory subsystem of SPUR ma- switches to shared state. When the core issues a store request chine. Genesys Pro test generator [16] from IBM extended this (Self ST), the cache controller first broadcasts an invalidate direction with complex and
Recommended publications
  • Page 1 Cache Coherence in More Detail
    What is (Hardware) Shared Memory? • Take multiple microprocessors ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) • Implement a memory system with a single global physical address space (usually) Shared Memory MPs – Coherence & Snooping – Communication assist HW does the “magic” of cache coherence Copyright 2006 Daniel J. Sorin • Goal 1: Minimize memory latency Duke University – Use co-location & caches Slides are derived from work by • Goal 2: Maximize memory bandwidth Sarita Adve (Illinois), Babak Falsafi (CMU), – Use parallelism & caches Mark Hill (Wisconsin), Alvy Lebeck (Duke), Steve Reinhardt (Michigan), and J. P. Singh (Princeton). Thanks! (C) 2006 Daniel J. Sorin from Adve, Falsafi, Hill, Lebeck, Reinhardt, Singh ECE 259 / CPS 221 3 Outline Some Memory System Options • Motivation for Cache-Coherent Shared Memory P1 Pn Switch P P1 n (Interleaved) First-level $ • Snooping Cache Coherence (Chapter 5) $ $ – Basic systems Bus (Interleaved) – Design tradeoffs Main memory Mem I/O devices • Implementing Snooping Systems (Chapter 6) (a) Shared cache (b) Bus-based shared memory P1 Pn P • Advanced Snooping Systems P1 n $ $ $ $ Mem Mem Interconnection network Interconnection network Mem Mem (c) Dancehall (d) Distributed-memory (C) 2006 Daniel J. Sorin from Adve, (C) 2006 Daniel J. Sorin from Adve, Falsafi, Hill, Lebeck, Reinhardt, Singh ECE 259 / CPS 221 2 Falsafi, Hill, Lebeck, Reinhardt, Singh ECE 259 / CPS 221 4 Page 1 Cache Coherence In More Detail • According to Webster’s dictionary … • Efficient
    [Show full text]
  • A Simulation Framework for Evaluating Location Consistency Based Cache
    LC-SIM: A SIMULATION FRAMEWORK FOR EVALUATING LOCATION CONSISTENCY BASED CACHE PROTOCOLS by Pouya Fotouhi A thesis submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Master of Science in Computer Engineering Spring 2017 c 2017 Pouya Fotouhi All Rights Reserved LC-SIM: A SIMULATION FRAMEWORK FOR EVALUATING LOCATION CONSISTENCY BASED CACHE PROTOCOLS by Pouya Fotouhi Approved: Guang R. Gao,Ph.D. Professor in charge of thesis on behalf of the Advisory Committee Approved: Kenneth E. Barner, Ph.D. Chair of the Department of Electrical and Computer Engineering Approved: Babatunde A. Ogunnaike, Ph.D. Dean of the College of Engineering Approved: Ann L. Ardis, Ph.D. Senior Vice Provost for Graduate and Professional Education ACKNOWLEDGMENTS I would like to thank Professor Gao for giving me the opportunity of joining CAPSL and multi-dimensional learning experience. With special thanks to Dr. St´ephaneZuckerman for guiding me step by step over the research, and my colleague Jose Monsalve Diaz for deep discussions and his technical help. Very special thanks to my wife Elnaz , and also my parents for their support and love. iii TABLE OF CONTENTS LIST OF FIGURES ::::::::::::::::::::::::::::::: vi ABSTRACT ::::::::::::::::::::::::::::::::::: ix Chapter 1 INTRODUCTION :::::::::::::::::::::::::::::: 1 2 BACKGROUND ::::::::::::::::::::::::::::::: 4 2.1 An Introduction to Memory Consistency Models :::::::::::: 5 2.1.1 Uniform Memory Consistency Models :::::::::::::: 6 2.1.1.1 Sequential Consistency
    [Show full text]
  • Models for Addressing Cache Coherence Problems 9 INTERNATIONAL JOURNAL of NATURAL and APPLIED SCIENCES (IJNAS), VOL
    Models for addressing cache coherence problems 9 INTERNATIONAL JOURNAL OF NATURAL AND APPLIED SCIENCES (IJNAS), VOL. 12, Special Edition (2019); P. 9 – 14, 1 TABLE, 4 FIG. Models for addressing cache coherence problems in multilevel caches: A review paper * I. I. Arikpo and A. T. Okoro ABSTRACT The computational tasks of the 21st century require systems with high processing speed. This can be achieved by having, among other system specifications, a high capacity microprocessor and large cache size in a single processor system. The cache prevents round-trips to RAM for every memory access. Nevertheless, dependence on a single cache for both data and instructions limits the performance of the system regardless of the capacity of the processor and cache size. Multilevel cache is a technological advancement that has made it possible for a multi-core processor system to have more than one cache. The introduction of multilevel cache technology no doubt speeds up system operations and enhances performance, but is also not without limitations. Cache coherence issues are common problems inherent in this design. Different protocols have been designed to ameliorate the issues surrounding cache coherence. The aim of this paper is to discuss the efficiency of these protocols in addressing cache coherence problems in multilevel cache technologies. INTRODUCTION instructions and data over a short period of time (Stallings, 2013). With the ever-increasing rate of processor speed, an equivalent Computer hardware consists of many components including the increase in cache memory is necessary to ensure that data is timely memory unit, which is primarily used for information storage.
    [Show full text]
  • A Primer on Memory Consistency and CACHE COHERENCE CONSISTENCY on MEMORY a PRIMER and Cache Coherence Consistency and Daniel J
    Series ISSN: 1935-3235 SORINWOOD •HILL • SYNTHESIS LECTURES ON M Morgan& Claypool Publishers COMPUTER ARCHITECTURE &C Series Editor: Mark D. Hill, University of Wisconsin A Primer on Memory A Primer on Memory Consistency A PRIMER ON MEMORY CONSISTENCY AND CACHE COHERENCE and Cache Coherence Consistency and Daniel J. Sorin, Duke University Mark D. Hill and David A. Wood, University of Wisconsin, Madison Cache Coherence Many modern computer systems and most multicore chips (chip multiprocessors) support shared memory in hardware. In a shared memory system, each of the processor cores may read and write to a single shared address space. For a shared memory machine, the memory consistency model defines the architecturally visible behavior of its memory system. Consistency definitions provide rules about loads and stores (or memory reads and writes) and how they act upon memory. As part of supporting a memory consistency model, many machines also provide cache coherence proto-cols that ensure that multiple cached copies of data are kept up-to-date. The goal of this primer is to provide readers with a basic understanding of consistency and coherence. This understanding includes both the issues that Daniel J. Sorin must be solved as well as a variety of solutions. We present both high-level concepts as well as specific, concrete examples from real-world systems. Mark D. Hill David A. Wood About SYNTHESIs This volume is a printed version of a work that appears in the Synthesis MORGAN Digital Library of Engineering and Computer Science. Synthesis Lectures provide concise, original presentations of important research and development topics, published quickly, in digital and print formats.
    [Show full text]
  • Cache Coherence Protocols
    Computer Architecture(EECC551) Cache Coherence Protocols Presentation By: Sundararaman Nakshatra Cache Coherence Protocols Overview ¾Multiple processor system System which has two or more processors working simultaneously Advantages ¾Multiple Processor Hardware Types based on memory (Distributed, Shared and Distributed Shared Memory) ¾Need for CACHE Functions and Advantages ¾Problem when using cache for Multiprocessor System ¾Cache Coherence Problem (assuming write back cache) ¾Cache Coherence Solution ¾Bus Snooping Cache Coherence Protocol ¾Write Invalidate Bus Snooping Protocol For write through For write back Problems with write invalidate ¾Write Update or Write Invalidate? A Comparison ¾Some other Cache Coherence Protocols ¾Enhancements in Cache Coherence Protocols ¾References Multiple Processor System A computer system which has two or more processors working simultaneously and sharing the same hard disk, memory and other memory devices. Advantages: • Reduced Cost: Multiple processors share the same resources (like power supply and mother board). • Increased Reliability: The failure of one processor does not affect the other processors though it will slow down the machine provided there is no master and slave processor. • Increased Throughput: An increase in the number of processes completes the work in less time. Multiple Processor Hardware Bus-based multiprocessors Why do we need cache? Cache Memory : “A computer memory with very short access time used for storage of frequently used instructions or data” – webster.com Cache memory
    [Show full text]