Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

by

Doowon Lee

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in the University of Michigan 2018

Doctoral Committee:

Professor Valeria M. Bertacco, Chair
Assistant Professor Reetuparna Das
Professor Scott Mahlke
Associate Professor Zhengya Zhang

Doowon Lee
[email protected]
ORCID iD: 0000-0003-0046-7746

© Doowon Lee 2018

For my wife, Ghaeun

ACKNOWLEDGMENTS

I would like to express my deepest gratitude to my advisor, Professor Valeria Bertacco. She has influenced me in many ways. She is the most enthusiastic person I have ever met; she always inspires me to achieve more than I could imagine. She has offered extraordinary research mentorship throughout my Ph.D. journey, and she has been consistently available whenever I needed her thoughtful advice and keen insight. She has also been very patient with my research endeavors, allowing me to explore different research topics. Her guidance on writing and presentation has prepared me to become an outstanding professional after graduation.

I also thank my dissertation committee members, Professors Scott Mahlke, Reetuparna Das, and Zhengya Zhang. Their comments on my research projects have been invaluable in improving this dissertation. In particular, in-depth discussions with them stimulated me to think outside the box, which in turn strengthened the impact of the research proposed in this dissertation.

Our research group members have helped me enjoy the long journey to the degree at the University of Michigan. First of all, I am grateful to Professor Todd Austin for his advice on my research projects. I have also been inspired by outstanding senior students in the research group: Andrea Pellegrini, Debapriya Chatterjee, Ritesh Parikh, Rawan Abdel Khalek, and Biruk Mammo. I was fortunate to have opportunities to collaborate with some of them, especially when I had just joined the research group, so that I could quickly grow into an independent researcher. I also thank current students in the group: Abraham Addisie, Zelalem Aweke, Misiker Aga, Vidushi Goyal, Timothy Linscott, Pete Ehrett, Andrew McCrabb, Hiwot Kassa, Mark Gallagher, Lauren Biernacki, Leul Belayneh, Nishil Talati, Colton Holoday, and Tarunesh Verma. I also thank former members of the research group: Salessawi Yitbarek, Brendan West, and Opeoluwa Matthews. I appreciate the help that I received from the staff of the Computer Science and Engineering department. Among them, I would like to give special thanks to Alice Melloni.

The two internships that I had during my studies are unforgettable. First of all, I am grateful for the great hospitality that I received from the members of IBM Research in Haifa, Israel. In particular, I am grateful to Ronny Morad, who offered, arranged, and managed my internship. I also thank Vitali Sokhin, Tom Kolan, Arkadiy Morgenshtein, and Avi Ziv for their mentorship. During my internship at Intel, I gained broad industry experience, not directly related to the research conducted for this dissertation, from my mentors, Truc Nguyen and Oleksandr Bazhaniuk.

My Ph.D. studies in Ann Arbor would have been much harder if I had not been provided with financial support. I have been generously supported by the Applications Driving Architecture (ADA) Center and the Center for Future Architectures Research (C-FAR), sponsored by the Semiconductor Research Corporation (SRC) and DARPA. I have also been supported by several fellowships, including the IBM PhD Fellowship, the Rackham Predoctoral Fellowship, and the Rackham International Student Fellowship.

My wife, Ghaeun Lim, has been a constant source of support as I finished my degree at the University of Michigan. With her support, I have been able to better focus on my study and research. Our daughter, Jiyo Lee, makes us smile every day. In addition, I thank my parents and parents-in-law for their endless support.

TABLE OF CONTENTS

DEDICATION ...... ii

ACKNOWLEDGMENTS ...... iii

LIST OF FIGURES ...... ix

LIST OF TABLES ...... xi

LIST OF ALGORITHMS ...... xii

ABSTRACT ...... xiii

CHAPTER

1. Introduction ...... 1

1.1 Systems on Chip ...... 2
1.2 Functional Verification Challenge ...... 7
1.3 Reliability Challenge ...... 9
1.4 Contributions and Outline of This Dissertation ...... 13

2. Related Work ...... 16

2.1 Addressing Errors in Cores ...... 16
2.2 Addressing Errors in Memory Subsystems ...... 20
2.3 Addressing Errors in On-Chip Networks ...... 24
2.4 Addressing Errors in Accelerators ...... 27

3. Probabilistic Bug-Masking Analysis in Instruction Tests ...... 30

3.1 Background and Motivation ...... 31
3.1.1 Dynamic Bug-Masking Analysis ...... 33
3.2 BugMAPI Overview ...... 35
3.3 Baseline Static Information-Flow Bug-Masking Analysis ...... 36
3.4 Probabilistic Information-Flow Bug-Masking Analysis ...... 37

3.4.1 Bug-Masking through an Instruction ...... 38
3.4.2 Bug-Masking over an Instruction Sequence ...... 39
3.5 Experimental Evaluation ...... 40
3.5.1 Bug Models ...... 41
3.5.2 Performance Analysis ...... 41
3.5.3 Bug-Masking Characterization ...... 42
3.5.4 Accuracy ...... 43
3.5.5 Discussions ...... 45
3.6 Application: Masking Reduction ...... 46
3.7 Related Work ...... 47
3.8 Summary ...... 48

4. Efficient Post-Silicon Memory Consistency Validation ...... 49

4.1 Background and Motivation ...... 50
4.2 MTraceCheck Overview ...... 53
4.3 Code Instrumentation ...... 54
4.3.1 Memory-Access Interleaving Signature ...... 54
4.3.2 Per-thread Signature Size and Decoding ...... 57
4.4 Collective Graph Checking ...... 59
4.4.1 Similarity among Constraint Graphs ...... 59
4.4.2 Re-sorting Topological Order ...... 61
4.5 Experimental Evaluation ...... 63
4.5.1 Experimental Setup ...... 63
4.5.2 Non-determinism in Memory Ordering ...... 64
4.5.3 Validation Performance ...... 67
4.5.4 Intrusiveness ...... 69
4.6 Bug-Injection Case-Studies ...... 71
4.6.1 Bug Descriptions ...... 71
4.6.2 Bug-Detection Results ...... 72
4.7 Insights and Discussions ...... 73
4.8 Related Work ...... 75
4.9 Summary ...... 75

5. Patching Design Bugs in Memory Subsystems ...... 77

5.1 Motivation ...... 78
5.2 µMemPatch Overview ...... 79
5.3 Classifying Memory Subsystem Bugs ...... 81
5.3.1 Root Causes of Memory Subsystem Bugs ...... 81
5.3.2 Symptoms of Memory Subsystem Bugs ...... 83
5.3.3 Memory Subsystem Bugs in Open-Source Designs ...... 84
5.4 Detecting Bug-Prone Microarchitectural State ...... 84
5.4.1 Collecting Microarchitectural Events ...... 85

5.4.2 Reducing the Area Overhead of the Bug-Anticipation Module ...... 86
5.5 Microarchitectural Bug-Elusion ...... 87
5.5.1 ImmSq & DelSq: Instruction Squash and Re-execution ...... 87
5.5.2 DelEv: Limiting Data Races by Delaying Cache Eviction ...... 89
5.5.3 DynFc: Limiting Speculation by Dynamic Fence Insertion ...... 89
5.5.4 Bug-Elusion Implementation ...... 90
5.6 Experimental Evaluation ...... 90
5.6.1 Experimental Setup ...... 90
5.6.2 Micro-Benchmark Results ...... 91
5.6.3 SPLASH-2 Results ...... 92
5.6.4 Implementation Overheads ...... 94
5.7 Discussions ...... 95
5.8 Related Work ...... 97
5.9 Summary ...... 98

6. Quick Routing Reconfiguration for On-Chip Networks ...... 100

6.1 Background and Motivation ...... 101
6.1.1 Avoiding Deadlock by Removing Cyclic Resource Dependencies ...... 101
6.1.2 Alternative Route Generation Cost Analysis ...... 102
6.2 BLINC Overview ...... 103
6.3 Route Computation ...... 105
6.3.1 Our Enhanced Segmentation Infrastructure ...... 105
6.3.2 Routing Metadata ...... 105
6.3.3 Reconfiguration Process ...... 106
6.4 Experimental Evaluation ...... 110
6.4.1 Characterization ...... 111
6.4.2 Discussion ...... 112
6.5 Application: Online Fault Detection ...... 113
6.6 Related Work ...... 114
6.7 Summary ...... 116

7. Application-Aware Routing for Faulty On-Chip Networks ...... 117

7.1 Motivation ...... 118
7.2 FATE Overview ...... 121
7.3 Turn-Enabling Rules ...... 122
7.3.1 Turn-Enabling Rules in Faulty Topologies ...... 125
7.4 Route Computation ...... 126
7.4.1 Traffic Load Estimation ...... 126
7.4.2 Iterative Turn-Restriction Decision ...... 129
7.4.3 Avoiding Illegal Routing Function (Backtracking) ...... 130
7.5 Experimental Evaluation ...... 130

7.5.1 Performance on Faulty Networks ...... 131
7.5.2 Performance on Fault-Free Networks ...... 134
7.5.3 Overheads ...... 135
7.6 Related Work ...... 137
7.7 Summary ...... 138

8. Test Generation for Verifying Accelerator Interactions ...... 140

8.1 Background and Motivation ...... 141
8.2 AGARSoC Overview ...... 142
8.3 Capturing Software Behavior ...... 143
8.3.1 Analyzing Memory Traces ...... 144
8.3.2 Identifying Interaction Scenarios ...... 145
8.3.3 Abstract Representation ...... 146
8.4 Coverage Model Generation and Analysis ...... 147
8.5 Test Generation ...... 149
8.5.1 Generated Test Program Structure ...... 150
8.5.2 Schedule Generation Algorithm ...... 150
8.6 Experimental Evaluation ...... 151
8.7 Discussions ...... 153
8.8 Related Work ...... 154
8.9 Summary ...... 155

9. Conclusion and Future Directions ...... 156

9.1 Dissertation Summary ...... 157
9.2 Future Directions ...... 159

BIBLIOGRAPHY ...... 162

LIST OF FIGURES

Figure

1.1 Commercial SoCs and a schematic of an SoC example design ...... 3
1.2 Specification update examples ...... 6
1.3 Bathtub curve illustrating qualitatively the exposure to faults by a silicon chip over its lifespan ...... 10
1.4 Three MOSFET wear-out mechanisms ...... 11
3.1 The bug-masking problem in post-silicon random instruction tests ...... 32
3.2 Dynamic bug-masking analysis flow ...... 33
3.3 Overview of BugMAPI probabilistic bug-masking analysis ...... 35
3.4 Baseline information-flow analysis ...... 36
3.5 Examples of input-output bug-masking probabilities ...... 38
3.6 Example of bug-propagation analysis over an instruction sequence ...... 39
3.7 Bug-masking rate for individual program instructions ...... 42
3.8 Classification of masked bugs ...... 43
3.9 BugMAPI’s accuracy on various bug models ...... 45
3.10 Mask-preventing code instrumentation ...... 46
4.1 A two-threaded test program execution and its constraint graph ...... 51
4.2 MTraceCheck overview ...... 53
4.3 Memory-access interleaving signature ...... 55
4.4 Code instrumentation ...... 56
4.5 Topologically-sorted constraint graph ...... 58
4.6 Measuring similarity among constraint graphs using k-medoids clustering ...... 60
4.7 Re-sorting a topological-sorted graph for a series of test runs ...... 62
4.8 Number of unique memory-access interleavings ...... 66
4.9 MCM violation checking – topological sorting speedup ...... 68
4.10 Test execution – MTraceCheck execution overhead ...... 69
4.11 Intrusiveness of verification ...... 70
4.12 Code size comparison ...... 71
4.13 Detected load→load ordering violation ...... 73
4.14 Breakdown of our collective graph checking ...... 74
5.1 ARM’s read-after-read hazard example ...... 79
5.2 µMemPatch overview ...... 80
5.3 Unaligned writes in gem5 ...... 83
5.4 Microarchitectural events of interest ...... 85

5.5 FSM approximation ...... 86
5.6 Microarchitectural bug-elusion methods ...... 88
5.7 Micro-benchmark suite evaluation ...... 91
5.8 Execution time with patching for SPLASH-2 ...... 94
6.1 Network segmentation example ...... 102
6.2 BLINC route computation ...... 104
6.3 Routing metadata ...... 106
6.4 Children set update on reconfiguration ...... 107
6.5 Reconfiguration example ...... 108
6.6 Effect of multiple disabled links ...... 112
6.7 Online testing flow ...... 113
6.8 Average packet latency under online testing ...... 114
6.9 Accepted flit rate during testing ...... 115
7.1 Application’s traffic pattern ...... 118
7.2 Traffic-aware routing restrictions ...... 119
7.3 Various deadlock-free turn-restriction patterns ...... 120
7.4 FATE overview ...... 121
7.5 FATE’s turn-enabling rules ...... 123
7.6 Extending turn-enabling rules to faulty topologies ...... 125
7.7 Link-load and turn-load estimation example ...... 127
7.8 FATE’s routing algorithm example ...... 128
7.9 Saturation throughput for synthetic traffic patterns over various fault rates and traffic patterns ...... 132
7.10 Packet latency for SPLASH-2 traces over various fault rates and benchmarks ...... 133
7.11 Saturation throughput for synthetic traffic patterns over a varying number of VCs and traffic patterns ...... 135
8.1 AGARSoC overview ...... 143
8.2 Accelerator-rich SoC architecture ...... 144
8.3 User-defined functions ...... 145
8.4 Abstracting an execution ...... 148
8.5 Sample coverage analysis output ...... 149
8.6 Compaction rates from abstraction representation ...... 152
8.7 Normalized simulation runtime for each test group ...... 152

LIST OF TABLES

Table

1.1 Proposed solutions in this dissertation ...... 13
3.1 Number of bugs masked/unmasked by dynamic analysis for our enhanced tests ...... 47
4.1 Specifications of the systems under validation ...... 63
4.2 Test generation parameters ...... 64
4.3 Bug detection results ...... 72
5.1 Classifications of memory subsystem bugs ...... 82
5.2 Elusion methods comparison ...... 89
5.3 Bugs modeled in our experimental evaluation ...... 90
5.4 Bug patching coverage on SPLASH-2 workloads ...... 93
5.5 Area overhead of the bug-anticipation module ...... 95
6.1 Network parameters ...... 110
6.2 Average number of affected routers and reconfiguration latency ...... 111
6.3 Comparison with existing routing reconfiguration techniques ...... 115
7.1 gem5 configuration for SPLASH-2 traces ...... 131
7.2 Computation overhead for FATE and APSRA ...... 136
8.1 Test software suites ...... 151

LIST OF ALGORITHMS

Algorithm

4.1 Signature decoding procedure ...... 57
6.1 BLINC reconfiguration procedure ...... 109
7.1 FATE routing algorithm ...... 129

ABSTRACT

Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things devices, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation; thus, they are often found in functional features that are rarely activated. Complete functional verification, which could eliminate design bugs, is extremely time-consuming and thus impractical for modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, making them infeasible for most cost-sensitive SoC designs.

To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify whether a design bug would be masked by the program’s execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small distributed programmable logic, instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so as to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there is a wide variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests.

Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally overlooked rare patterns of multiple faults. In this dissertation, we discuss these ideas and their trade-offs, and present future research directions.

CHAPTER 1

Introduction

Modern computer chips have evolved to integrate multiple chip components on a monolithic silicon die, opening the era of systems on chip (SoCs). Nowadays, state-of-the-art SoC designs include tens of microprocessor cores and accelerators, tens of megabytes of cache memory, distributed on-chip networks, etc. In particular, such designs include a variety of heterogeneous processing units, each of which is designed to efficiently execute demanding applications. Today, this assembly-of-specialized-units approach to SoC design is driving innovation in this domain. Other critical chip components, especially memory components and on-chip networks, have evolved accordingly, to better support heterogeneous processing units.

Ideally, computer chips should provide correct outcomes at all times. Chipmakers, however, are struggling to ensure correctness as the complexity of computer chips grows in heterogeneous computing systems. Chipmakers often discover that their chips contain bugs after they are released to customers, causing unforeseen and expensive product recalls.

Moreover, transistors in leading-edge, nano-scale technology nodes are extremely small. Thus, they are prone to process variations that result in a wide spectrum of transistor electrical characteristics. As a result, computer chips fabricated with leading-edge technology nodes are more likely to experience wear-out transistor failures during the lifespan of the chip, compared to those fabricated with trailing-edge technology nodes. Note that even a small transistor failure may render the entire chip dysfunctional, so it is important to prevent such failures, or at least to recover from them. Yet, today, chipmakers are still unsuccessful in preventing and/or recovering from silicon failures, because of the high overhead entailed in monitoring the billions of transistor devices that comprise modern SoCs.

Chipmakers invest increasingly more time and resources in verification to keep up with design complexity [86, 5]. Thus, design innovations may be hampered by the burden of verification. In addition, due to increasing transistor density, it is increasingly difficult to achieve silicon chip reliability. These two challenges, the functional verification challenge and the reliability challenge, must be addressed to unlock and accelerate SoC innovation in the coming years.

This dissertation’s research focuses on these evasive errors and on how to address them efficiently. We propose to decompose the software computation and hardware structures involved in addressing them, so as to solve the smaller, decomposed problems more efficiently. We then assemble the partial results to address the whole problem. Our decomposition methods are designed so that the partial results can be easily aggregated to quickly obtain an approximate solution for the whole problem. While achieving efficiency, this aggregation process may not generate the best solution, due to the lack of a comprehensive reassembly embracing the partial results. In this dissertation, we specialize this decompose-and-conquer approach to the major types of components in SoCs.

Dissertation goals. The research presented in this dissertation strives to prevent such erroneous behaviors from happening at runtime, by improving the efficiency of finding errors and by reducing the overhead of recovering from them. To this end, we apply the principle of decomposition to problems that involve heavy software computation or non-trivial hardware structures. We investigate the reasons for such overheads in prior solutions, which leads us to novel, decomposition-based solutions. Over the course of our research for this dissertation, we looked at each major SoC component: microprocessor cores, memory subsystems, on-chip networks, and accelerators.

This chapter is organized as follows. We first introduce systems on chip and their components in Section 1.1. We then motivate our research by discussing two challenges that this dissertation is tackling: the functional verification challenge in Section 1.2 and the reliability challenge in Section 1.3. Finally, we outline the solutions that will be discussed in this dissertation in Section 1.4.

1.1 Systems on Chip

Semiconductor manufacturing technology has kept evolving and improving at a fast pace for over half a century; nowadays, cutting-edge computer chips integrate billions of transistors in a single silicon die. Thanks to this large silicon space, computer chips now consist of many system components that had previously been deployed in separate chips, opening the era of systems on chip (SoCs). There are many benefits of the SoC design paradigm, compared to conventional system designs that are composed of multiple chips integrated on a printed circuit board (PCB).

[Figure 1.1: Commercial SoCs and a schematic of an SoC example design. (a) Qualcomm Snapdragon 850 components [13]; (b) a die photo of Apple A12 Bionic [1]; (c) an example SoC design schematic, comprising microprocessors (CPUs with their cache modules), memory modules (cache modules, scratchpad memory, memory controllers), accelerators (GPU, DSP, image processor, neural processor, security module, sensor, modem, and others), and an on-chip network connecting them.]

For instance, SoC designs can eliminate inter-chip communication, which often requires power-hungry high-speed I/O interfaces. Another prominent benefit of SoC designs is a reduction in manufacturing cost: fabricating a single chip costs much less than fabricating multiple chips and integrating them on a PCB. We also observe growing potential for the SoC design paradigm in nano-scale technology nodes. SoC designs can integrate even more components, further improving power efficiency while providing more computation capability, which in turn allows us to enable new computing applications (e.g., artificial intelligence, machine learning, virtual reality, augmented reality) in a small form factor.

Indeed, modern mobile SoCs integrate tens of components into a single chip, as illustrated in Figure 1.1a. The figure shows the major functional blocks in the Qualcomm Snapdragon 850 SoC [13], which includes multiple microprocessor cores (Kyro CPU), a graphics processing unit (Adreno 630 GPU), a digital signal processor (Hexagon 685 DSP), an image signal processor (Spectra 280 ISP), system memory, and other components. Figure 1.1b shows a die photo of the Apple A12 Bionic SoC. Some chip components in this SoC have been identified by tech enthusiasts [1], as annotated on the die photo. As the figure shows, this SoC includes not only microprocessor cores, GPU cores, and system caches, but also a neural engine dedicated to processing machine-learning applications. Moreover, it is likely that this SoC includes tens of other unidentified components. These many chip components are connected via on-chip networks that provide efficient, parallel communication among the components. To connect tens or hundreds of SoC components, the communication architecture has evolved to offer a rich set of on-chip network options, such as multi-layer buses, hierarchical crossbars, and mesh networks. Among them, mesh networks are commonly used in high-end SoCs with many processing units, e.g., Intel Skylake-SP processors [189].

Figure 1.1c shows a schematic of an SoC design example, reconstructed based on the cutting-edge commercial SoCs mentioned earlier. Here, we observe that chip components can be grouped into multiple categories. (1) CPUs (orange boxes) are microprocessor cores with L1 caches co-located with the cores. (2) Memory subsystems (yellow boxes) include L2 and lower-level caches, scratchpad memory modules, and memory controllers. (3) Accelerators (green boxes) are processing units designed to process specific types of applications. In this dissertation, the term “accelerator” refers to the variety of heterogeneous processing units in SoC designs other than microprocessor cores and memory modules. (4) Finally, on-chip networks (blue boxes and lines) are the communication mediums that transfer data among chip components. We focus on scalable network-on-chip architectures in this dissertation, as scalability is a key requirement for communication mediums in large SoC designs in nano-scale technology nodes.

Challenges in systems on chip. While the SoC design paradigm has been widely adopted in the past decade, we observe two major challenges that may limit the sustainability of this paradigm in the future.

(1) Functional verification challenge: SoC designs are becoming increasingly complex with respect to the functionalities that they offer. In contrast, the time to verify and validate SoC designs has remained almost constant over the years. Unfortunately, we observe from industry reports that the efficiency of functional verification has not kept up with the ever-growing design complexity. For example, in recent Intel processors, many bugs have been found after the processors were shipped to customers, and the number of such bugs has been increasing over several generations of Core processors, according to [135]. Thus, it is of paramount importance that we reduce the gap between design complexity and verification efficiency.

(2) Reliability challenge: Thanks to advancements in semiconductor manufacturing technology, the number of transistors integrated in SoC designs is ever increasing. At the same time, the feature size of each transistor is just a few nanometers wide, which corresponds to the length of tens of silicon atoms. Mature semiconductor manufacturing technology should ensure that each transistor operates reliably within the specified operating-condition range. However, reliability requirements are becoming more challenging to meet due to minuscule feature sizes and enormous numbers of transistors. To make things worse, predicting at design time the permanent faults that the system may experience at runtime (e.g., wear-out faults) is inherently very difficult, due to the lack of accurate information regarding the runtime operating conditions of each individual transistor. Also, fault occurrences are often unpredictable and happen in a random fashion [114]. In addition, process variation is becoming more prominent as feature sizes keep decreasing. Overall, the reliability challenge is an impending threat to the growth of the SoC design paradigm.

Learning from failures – real-world error examples. To show the significance of these challenges, Figure 1.2 presents two examples of erroneous behaviors found in Intel processors after they were released to customers. From the descriptions in the specification update documents [104, 108], we identify the erroneous behavior in Figure 1.2a as caused by a design bug, while that in Figure 1.2b is related to a wear-out fault.

Firstly, the problem described in Figure 1.2a is triggered only under certain microarchitectural conditions; these conditions may be very rare, and thus they were not detected during verification and validation.

SKL055: Use of Prefetch Instructions May Lead to a Violation of Memory Ordering. Problem: Under certain micro-architectural conditions, execution of a PREFETCHh instruction or a PREFETCHW instruction may cause a load from the prefetched cache line to appear to execute before an earlier load from another cache line.

(a) A problem caused by a design bug in the 6th Generation Intel Processor Family [104]

AVR54: System May Experience Inability to Boot or May Cease Operation. Problem: The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.

(b) A problem caused by a permanent fault in the Intel Atom Processor C2000 Product Family [108]

Figure 1.2: Specification update examples

The instructions mentioned in the description, PREFETCHh and PREFETCHW, are commonly used in commercial software to improve system performance by hiding memory-access latency. Thus, they are probably well verified for typical use-cases. What makes these instructions buggy are the rare microarchitectural conditions that have not been covered by typical use-cases. This error is very evasive, because such rare conditions are unlikely to be exercised within the budgeted amount of time for design and verification. The problem described in Figure 1.2b manifests only after the chip has been used for a couple of years, according to a news article [15]. As explained in the design revision of the faulty chip [11], this error seems to manifest itself when the low pin count (LPC) pins are used for general-purpose input and output (GPIO) signals, a very unlikely use-case.
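To make the memory-ordering violation described in erratum SKL055 concrete, the sketch below (an illustration added here, not Intel's reproduction recipe; the two-thread message-passing test and variable names are hypothetical) enumerates every sequentially consistent interleaving of a classic litmus test. The outcome (r_flag = 1, r_data = 0) never appears in the enumeration, so observing it on silicon indicates that the younger load effectively executed before the older one. Chapter 4 builds on this kind of forbidden-outcome reasoning at a much larger scale.

```python
# Thread 0: producer            Thread 1: consumer
#   data = 1                      r_flag = flag   (older load)
#   flag = 1                      r_data = data   (younger load)
T0 = [("st", "data", 1), ("st", "flag", 1)]
T1 = [("ld", "flag", "r_flag"), ("ld", "data", "r_data")]

def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    if not a:
        yield list(b); return
    if not b:
        yield list(a); return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

outcomes = set()
for schedule in interleavings(T0, T1):
    mem, regs = {"data": 0, "flag": 0}, {}
    for op, loc, arg in schedule:
        if op == "st":
            mem[loc] = arg          # store writes memory
        else:
            regs[arg] = mem[loc]    # load reads memory into a register
    outcomes.add((regs["r_flag"], regs["r_data"]))

print(sorted(outcomes))        # [(0, 0), (0, 1), (1, 1)] under sequential consistency
assert (1, 0) not in outcomes  # flag==1 with data==0 would imply load-load reordering
```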

Common characteristics of design bugs and permanent faults. While design bugs and permanent faults originate from different causes, they share common characteristics, which we describe here. Firstly, they cause recurring failures: the system may experience a failure whenever a design bug is triggered or whenever a faulty component is exercised. Thus, the system must patch both bugs and faults to avoid failures.

In addition, they occur unexpectedly, in a random fashion. Most evasive design bugs are found in rare microarchitectural corner-cases. While verification engineers strive to identify and verify such corner-cases, this endeavor is usually far from complete, due to the limited time allowed for verification. Similarly to design bugs, permanent faults are usually caused by unpredictable physical phenomena. While several fault mechanisms can be predicted analytically or experimentally, as we will discuss in Section 1.3, it is still almost impossible to accurately analyze the massive number of transistors in SoCs. As a result, industry practice protects only critical silicon devices, while the majority of silicon devices remain unprotected.

In summary, we must address these challenges to unlock the potential of the SoC design paradigm in the future. In the following two sections, we discuss each of the challenges in more detail.

1.2 Functional Verification Challenge

According to [203], the objective of functional verification is to remove all bugs from a design prior to chip fabrication. Achieving this goal is becoming increasingly difficult; this section discusses why this is the case.

Growing design complexity. To meet ever-growing computational demands, systems on chip have evolved to offer higher computational capabilities. Specifically, we observe two innovation vectors in SoC designs: parallelization (i.e., multiple homogeneous processing units) and specialization (i.e., heterogeneous processing units). In addition to these two vectors, we also observe improvements in individual SoC components: each individual component (e.g., a microprocessor core) has become increasingly complex, offering better performance and more features over time.

(1) Complexity of parallelization: SoCs employ increasingly more processing units, such as microprocessor cores, graphics processing units, and other accelerators. These processing units can communicate with each other, and share a common memory space for efficient multitasking and multithreading. To this end, as Chen and Kung [66] predicted a decade ago, today’s SoCs deploy on-chip networks and various types of shared memory. To be specific, SoCs often include distributed, shared, multi-level cache structures that maintain cache coherency and memory consistency. Also, their on-chip networks can process many communication requests from all processing components, handling hundreds or more transactions concurrently. Consequently, parallelization leads to highly sophisticated memory subsystems and on-chip networks spanning the entire chip.

(2) Complexity of specialization: The diversity of SoC components is on the rise. Traditionally, microprocessor cores have been the major computation components for a wide range of applications. In contrast, state-of-the-art SoCs include many accelerators, each of which is specialized for a certain domain of applications. For instance, the neural engine in the Apple A12 Bionic SoC in Figure 1.1b is designed to accelerate the computation of machine-learning applications.

This specialization makes SoC verification even more strenuous. Most accelerators, if not all, share on-chip communication mediums, i.e., memory subsystems and on-chip networks. Thus, these components must be verified extensively over many interaction scenarios. Moreover, various types of accelerators may interact with each other, and some accelerator interactions may not have been verified during the verification of the individual accelerators. Thus, a solid SoC integration process should verify all accelerator interactions. However, achieving such complete verification is an almost impossible feat due to increasing heterogeneity.

(3) Complexity of new features and optimizations: Each component has been improved for performance, reliability, and efficiency. In other words, each component, even without involving other SoC components, is increasingly sophisticated so as to support additional features and optimizations over its previous generation. For instance, the verification of microprocessor cores still remains one of the most challenging verification tasks, because of continuous innovations in their architectures and microarchitectures. In particular, new instructions have been added over several generations of Intel and IBM POWER processors. Moreover, these processors implement various advanced features, for instance, for performance and security. For example, Intel’s high-end processors offer cache allocation technology (CAT) for isolation and prioritization of key applications, and software guard extensions (SGX) for protecting selected code and data from unauthorized disclosure or modification.

Lack of scalable verification. Functional verification methodologies must keep pace with growing design complexity. However, verification is evolving at a slower pace than design complexity. Consequently, chipmakers spend increasingly more time on verification than on design [86]. Even with more verification effort, multicore processors are subject to increasingly more escaped bugs [135].

Functional verification involves various activities, including formal verification and simulation. Most activities, except for manual labor such as writing or reviewing code, involve software computation or equivalent hardware resources (e.g., simulation accelerators [74], FPGA-based emulation platforms, post-silicon validation platforms). Thus, the vast computation and resources involved in achieving coverage closure are a major showstopper, which can be mitigated by two distinct solution approaches: scalable verification (i.e., raising the design abstraction for verification) and efficient utilization of the hardware resources dedicated to supporting verification.

(1) Challenges with scalable verification: A successful abstraction model for the design under verification can greatly reduce the computational complexity involved in both formal verification and simulation-based verification, thus enabling scalable verification.

However, even with successful abstractions, it is very time-consuming to simulate or analyze the model of a large SoC design exclusively with software simulators. In addition, inaccurate abstract models may generate false results (e.g., undetected bugs, falsely detected bugs).

(2) Limited observability in post-silicon verification or hardware-assisted simulation/emulation: While hardware-assisted or post-silicon verification can greatly boost the verification process, these verification platforms come with a great shortcoming: limited observability. For instance, post-silicon verification leverages hardware tracing mechanisms to record runtime activities. However, these tracing mechanisms usually have limited capacity and are able to record only a small portion of the SoC’s internal state. Moreover, even if all internal signals could be observed, it would still be very challenging to efficiently analyze the enormous volume of test results generated on the fly. Thus, to maximize the benefits of these fast verification platforms, we must address their limited observability, as well as the result-checking computation.

Our decomposition strategy for functional verification challenges. We observe that a major obstacle in functional verification is the lack of scalable verification methodologies. To overcome this obstacle, we strive to decompose the complex software computations involved in verification into smaller granularities. We focus on architecture-level and microarchitecture-level verification problems, because they enable higher verification efficiency.

In addition to software computation for verification, this dissertation also considers runtime patching hardware to protect SoC designs from design bugs. Prior patching methods involve global hardware structures that span the entire chip. Such global hardware structures are prohibitive in large-scale SoC designs because of their high implementation overheads in terms of silicon footprint and power consumption. Instead, we propose to decompose global hardware structures into smaller, distributed hardware structures. In particular, we focus on runtime patching for memory subsystems and on-chip networks, because many chip components interact with these subsystems, and thus even small errors in them may compromise the entire chip.

1.3 Reliability Challenge

As transistor fabrication technology evolves to shrink transistor feature sizes, it becomes increasingly difficult to ensure transistor reliability; many transistors are prone to failure during their expected lifetime.

[Figure 1.3: Bathtub curve illustrating qualitatively the exposure to faults by a silicon chip over its lifespan (adapted from [3]). The plot shows failure rate versus time: the overall failure rate first decreases, then remains constant (dominated by random failures), and finally increases.]

According to a former Intel fellow, Shekhar Borkar, in his MICRO 2004 keynote presentation, the company is building “reliable systems with unreliable components.” He pointed out three different causes that make a system unreliable: soft errors (transient faults), gradual errors (variations), and time-dependent degradation. Soft errors are induced by high-energy particles such as alpha particles and cosmic rays, but they are occasional and do not persist. Gradual errors and time-dependent degradation originate from various sources, rendering a chip dysfunctional so that it consistently generates erroneous results. In this dissertation, we address only permanent faults, due to their severity.

Figure 1.3 illustrates the probability of permanent failures, showing three distinct periods of time. Each period is dominated by a different cause of failures: the first by early failures (also known as infant mortality), the second by random failures, and the third by wear-out failures. Thanks to the burn-in process, infant mortality has been greatly reduced in modern chips, while the other two remain significant. Specifically, failures due to wear-out must be reduced to guarantee a sufficient lifespan for semiconductor chips, as Borkar pointed out.

Wear-out mechanisms. Several wear-out mechanisms have been identified in the literature [188, 201]. (1) Electromigration (EM) is atomic motion caused by a current flowing through a line. At high current densities, significant atomic motion occurs, increasing the line resistance and ultimately breaking the line. (2) Hot carrier injection (HCI) (Figure 1.4a) is another wear-out mechanism, in which high-energy electrons cross the Si/SiO2 interface while traversing the channel of a field effect transistor (FET). The electrons are trapped in the gate or drain, thus not contributing to the current flowing through the channel. This situation, in turn, degrades the electrical performance of the transistor. (3) Negative bias temperature instability (NBTI) (Figure 1.4b) is caused by an interfacial layer with nitrided silicon dioxide at the Si/SiO2 interface creating positively charged bulk defects. As a result, an NBTI effect modifies the key parameters of a MOSFET (metal-oxide-semiconductor FET), such as its threshold voltage, linear and saturation drain currents, and transconductance. (4) Time-dependent dielectric breakdown (TDDB) (Figure 1.4c) is a failure mechanism that leads to traps within the dielectric. This breakdown occurs when a high voltage is applied to the gate, creating a relatively strong electric field, yet one not strong enough to cause an immediate breakdown. The electric field creates traps within the dielectric, eventually forming a conductive path through the gate oxide to the substrate. Note that there are other wear-out mechanisms that are not mentioned here (e.g., positive bias temperature instability (PBTI)), which also cause permanent faults in similar manners.

[Figure 1.4: Three MOSFET wear-out mechanisms (adapted from [201]): (a) hot-carrier injection (HCI), (b) negative bias temperature instability (NBTI), and (c) time-dependent dielectric breakdown (TDDB).]

Researchers have proposed analytical models to predict wear-out failures (e.g., [200]). However, these models often rely on unpredictable factors (e.g., temperature, transistor activity). Even if these factors can be approximated, such transistor-level prediction may involve prohibitively heavy computation when considering the billions of transistors integrated in SoCs. Thus, instead of relying on analytical prediction models (i.e., design-time wear-out predictions), we advocate runtime solutions to detect the occurrence of permanent faults and recover from them.
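As a concrete example of such an analytical model (given here as background; it is not necessarily the model used in [200]), Black's equation estimates the mean time to failure of a metal line subject to electromigration:

\[ \mathrm{MTTF} = A \, J^{-n} \exp\!\left(\frac{E_a}{k\,T}\right) \]

where A is a constant capturing the line's geometry and material, J is the current density, n is an empirically fitted exponent (commonly close to 2), E_a is the activation energy, k is Boltzmann's constant, and T is the absolute temperature. The power-law dependence on current density and the exponential dependence on temperature are exactly the runtime-dependent factors that make design-time prediction so difficult.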

Minimizing the impact of faults at runtime. SoCs may experience permanent faults at random locations, impacting both transistors and wires. To overcome these random failures, SoCs can deploy in-field fault detection and diagnosis mechanisms. For instance, built-in self-testing (BIST) can thoroughly examine chip components during the system’s booting process. Alternatively, the system can be continuously monitored and examined for faults during its normal operation (i.e., online testing). Unfortunately, both BIST and online testing may incur a high overhead, impacting the system’s availability during testing.

Moreover, once faults are discovered, the SoC must reconfigure itself to bypass them. Fault-bypassed systems, unfortunately, still suffer from some degree of performance degradation, compared to their fault-free counterparts. Thus, it is important to seek a reconfiguration solution that minimizes performance degradation. However, this goal may require significant computation, especially when reconfiguration entails system-wide changes, such as when overcoming faults in an on-chip network.

Our decomposition strategy for reliability challenges. We propose to leverage structural decomposition to address reliability challenges at the microarchitectural level. SoC components consist of multiple microarchitectural components that can be clearly separated for fault analysis. For instance, on-chip networks can be split into individual routers and links. Upon a fault occurrence, fault diagnosis can then analyze only the portion of the network affected by the fault, instead of the whole network. Although a global fault diagnosis could produce more accurate results, it would incur significantly higher runtime overheads, compared to a local analysis.

In this dissertation, we have applied this decomposition strategy to on-chip networks. However, we believe that the other SoC components (i.e., microprocessor cores, memory subsystems, and accelerators) can benefit from the same decomposition strategy. Indeed, several prior works have proposed similar decomposition-based methodologies for microprocessor cores and memory subsystems, as we will discuss in Chapter 2 and the related work sections in Chapters 3 through 5.

It is important to note that our focus is to reduce the overheads entailed by the computations required for detection and recovery. To this end, our decomposition strategy remains at the microarchitectural level, rather than going down to a lower level (e.g., gate or circuit level), since a lower-level approach would be more time-consuming and require more spare components, due to its finer granularity.

Differences between design bugs and permanent faults. As we discussed in Section 1.1, design bugs and permanent faults share a major common characteristic: their impact is permanent, so they cause recurring system failures. However, design bugs usually originate from human errors in the design description (e.g., Verilog or VHDL code), while permanent faults are caused by physical phenomena. Due to this difference, design bugs and permanent faults should be handled differently.

Design bugs are usually found in functional corner-cases, which are very hard to foresee and predict at design time. In contrast, some permanent faults are indeed predictable. For instance, high-activity transistors are more likely to be impacted by electromigration than low-activity transistors. If high-activity transistors could be accurately identified, they could be protected accordingly, although doing so may require significant computation. Having said that, this dissertation targets unpredictable permanent faults that manifest unexpectedly, in a random fashion.

In addition, permanent faults can be recovered from easily with spare resources, although spares may entail noticeable area overheads in the design; smarter fault recovery solutions help reduce these overheads. Design bugs, however, cannot be patched with identical spare resources, because the spares would contain the exact same bugs. Instead, design bugs can be bypassed by exploiting functional redundancy in the design.

SoC part Solution Complex challenge Decomposition method µprocessor BugMAPI Post-silicon verification Instruction-level bug- core (Chapter 3) (bug-masking analysis) masking probabilities MTraceCheck Post-silicon verification Memory (Chapter 4) (verifying memory orderings) Incremental result checking subsystem µMemPatch Runtime bug patching (Chapter 5) (catching rare µarch state) Distributed FSMs BLINC Runtime fault recovery Localized short-term On-chip (Chapter 6) (re-routing latency) route computation network FATE Runtime fault recovery Localized application-aware (Chapter 7) (application-aware routing) routing rules AGARSoC SoC integration verification Disintegrating accelerator Accelerator (Chapter 8) (verifying acc. interactions) interactions into basic elements

For instance, high-activity transistors are more likely to be impacted by electromigration than low-activity transistors. If high-activity transistors could be accurately identified, they could be protected accordingly, although doing so may require significant computation. Having said that, this dissertation targets unpredictable permanent faults that manifest un- expectedly in a random fashion. In addition, permanent faults can be recovered easily with spare resources, which may entail noticeable area overheads in the design. Smarter fault recovery solutions help reduce these overheads. Design bugs, however, cannot be patched with identical spare resources, because the spares would have the exact same bugs. Instead, design bugs can be bypassed by exploiting functional redundancy in the design.

1.4 Contributions and Outline of This Dissertation

In this dissertation, we propose decomposition-based solutions to overcome the functional verification and reliability challenges. As discussed in Sections 1.2 and 1.3, the major obstacles are: (1) complex software computation and (2) heavy hardware resources required to detect and overcome errors. Our solutions target each major SoC part: microprocessor cores, memory subsystems, on-chip networks, and accelerators, as shown in Table 1.1. Indeed, each SoC part may suffer from distinct subtle errors that may lead to critical failures at runtime. Many researchers have investigated solutions to these errors, as we overview in Chapter 2 and, in more detail, in the related work and background sections throughout the dissertation. We identify inefficiencies in these prior solutions with respect to the software computation they entail and the hardware structures they require, and we strive to address these inefficiencies, as shown in the third column of Table 1.1.

The fourth column of Table 1.1 summarizes the decomposition methods that we propose in this dissertation. In the rest of this section, we briefly introduce the contributions of each solution.

BugMAPI [121] (Chapter 3). This solution presents a bug-masking analysis for post-silicon microprocessor verification. To reduce the computation required for the analysis, we propose to decompose the bug-masking probability computation for a program into per-instruction computations. Instruction-level probabilities are then assembled by our information-flow analysis framework, which quickly estimates the bug-masking probability of the whole program. We also demonstrate that the results of this analysis can be leveraged to greatly reduce the likelihood of bug masking by deploying a simple code-instrumentation process.
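As a minimal illustration of this decomposition (the per-opcode masking probabilities below are invented placeholders, and the actual BugMAPI analysis propagates probabilities through the program's full information-flow graph rather than a single chain), the sketch estimates how likely a bug injected into a value is to survive the instructions that the value flows through:

```python
# Minimal sketch of the decomposition idea behind a BugMAPI-style analysis.
# The per-opcode masking probabilities are illustrative placeholders only.
MASK_PROB = {
    "add":   0.0,   # addition propagates any corrupted input value
    "and":   0.4,   # bitwise AND frequently masks corrupted bits
    "srl":   0.3,   # shifts can drop corrupted high/low bits
    "store": 0.0,   # a store makes the value architecturally observable
}

def survival_probability(dataflow_chain):
    """Probability that a bug injected before the first instruction of the
    chain is still visible after the last one, assuming independence."""
    p = 1.0
    for opcode in dataflow_chain:
        p *= 1.0 - MASK_PROB[opcode]
    return p

# A corrupted value that flows through AND -> SRL -> ADD -> STORE:
chain = ["and", "srl", "add", "store"]
print(f"masking probability = {1.0 - survival_probability(chain):.2f}")  # 0.58
```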

MTraceCheck [120] (Chapter 4). Memory subsystems exhibit a variety of memory-ordering behaviors that must be checked for conformance to the memory consistency model. This result checking requires heavy computation, limiting the efficient utilization of post-silicon validation platforms. To reduce this computation, we propose an incremental verification scheme that checks only the incremental differences in memory-ordering patterns across multiple test runs. We also present a novel signature that encapsulates the memory-ordering pattern observed in a test run, which in turn maximizes the benefit of the proposed incremental verification scheme.
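The sketch below illustrates the flavor of such a signature with a simple mixed-radix encoding; it is a simplified stand-in rather than MTraceCheck's exact instrumentation, and the load and store counts are hypothetical. Runs that produce identical signatures need no re-checking, and runs with similar signatures differ in only a few observed values, which is what the incremental scheme exploits.

```python
# Simplified sketch of a per-thread interleaving signature: each load that can
# observe one of several competing stores contributes one mixed-radix digit.

def encode_signature(observations, num_candidates):
    """observations[i]  : index of the store value witnessed by the i-th load
       num_candidates[i]: how many distinct stores that load could have observed"""
    signature, weight = 0, 1
    for obs, base in zip(observations, num_candidates):
        signature += obs * weight
        weight *= base
    return signature

def decode_signature(signature, num_candidates):
    """Recover, off-chip, which store each load observed during the test run."""
    observations = []
    for base in num_candidates:
        observations.append(signature % base)
        signature //= base
    return observations

num_candidates = [3, 2, 4]   # three instrumented loads on this thread (hypothetical)
observed       = [2, 0, 1]   # witnessed store indices during one test run
sig = encode_signature(observed, num_candidates)
assert decode_signature(sig, num_candidates) == observed
print(sig)   # a single integer summarizing the observed memory-access interleaving
```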

µMemPatch [122] (Chapter 5). Memory subsystems comprise many sub-components that interact with each other. Thus, pinpointing buggy behaviors in them often requires global hardware monitors spanning the entire chip. To eliminate such global hardware monitors, we propose a distributed patching solution for memory bugs, where each core in the system is augmented with a small programmable patch module. Each distributed patch module observes speculative memory operations and data races occurring at the nearby core and L1 cache. The patch module detects microarchitectural-event sequences leading to bug-prone state, and then recovers from the bug-prone state.
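Conceptually, such a bug-anticipation module behaves like a small per-core pattern-matching FSM. The sketch below is only a software model of that idea: the event names and the triggering sequence are hypothetical, and a real module would be a programmable hardware block configured after the bug has been characterized.

```python
# Conceptual model of a per-core bug-anticipation FSM (event names hypothetical).
BUG_PRONE_SEQUENCE = ("spec_load_issued", "line_evicted", "remote_store_observed")

class BugAnticipationFSM:
    def __init__(self, sequence):
        self.sequence = sequence
        self.state = 0                      # number of matched events so far

    def observe(self, event):
        """Advance on a matching event; return True when the full sequence has
        been seen and a bug-elusion action (e.g., a squash) should be triggered."""
        if event == self.sequence[self.state]:
            self.state += 1
        elif event == self.sequence[0]:
            self.state = 1
        else:
            self.state = 0
        if self.state == len(self.sequence):
            self.state = 0
            return True                     # bug-prone state reached
        return False

fsm = BugAnticipationFSM(BUG_PRONE_SEQUENCE)
trace = ["spec_load_issued", "line_evicted", "tlb_miss",
         "spec_load_issued", "line_evicted", "remote_store_observed"]
print([fsm.observe(e) for e in trace])      # only the last event fires the action
```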

BLINC [123] (Chapter 6). Routing reconfiguration often involves a long latency to find alternative routes when the original routes become unavailable (e.g., due to fault occurrences). This long latency may significantly reduce network performance during reconfiguration. To minimize reconfiguration latency, we propose a routing reconfiguration framework that can quickly compute short-term, alternative routes in a distributed fashion. The proposed framework enables transparent, online network testing that continuously examines network components, while incurring minimal perturbation of normal network operations.

FATE [124] (Chapter 7). Application-aware routing can maximize the performance of fault-ridden networks. However, it involves heavy route-evaluation computation to estimate the traffic flow over many route options. We reduce this computation by deploying localized routing rules, called turn-enabling rules. These rules guide the route-evaluation process by aggressively pruning undesirable route options, thus eliminating unnecessary traffic-evaluation computation.
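To give a sense of the traffic-load estimation that drives such route evaluation, the sketch below computes expected per-link loads for a given traffic matrix under plain XY dimension-order routing on a fault-free mesh. This is only a generic illustration of the estimation step; FATE itself evaluates candidate routing functions produced by its turn-enabling rules on faulty topologies, which this simplified example does not model. The traffic values are hypothetical.

```python
# Generic illustration of estimating per-link load for a candidate routing
# function; here the routing function is plain XY routing on a 2D mesh.
from collections import defaultdict

def xy_route(src, dst):
    """Yield the directed links of the XY (X first, then Y) path from src to dst."""
    (x, y), (dx, dy) = src, dst
    while x != dx:
        nxt = (x + (1 if dx > x else -1), y)
        yield ((x, y), nxt); x, y = nxt
    while y != dy:
        nxt = (x, y + (1 if dy > y else -1))
        yield ((x, y), nxt); x, y = nxt

def estimate_link_loads(traffic):
    """traffic: {(src, dst): injection rate}. Returns the expected load per link."""
    loads = defaultdict(float)
    for (src, dst), rate in traffic.items():
        for link in xy_route(src, dst):
            loads[link] += rate
    return loads

traffic = {((0, 0), (2, 1)): 0.2, ((2, 0), (0, 0)): 0.1}   # hypothetical flows
for link, load in sorted(estimate_link_loads(traffic).items()):
    print(link, round(load, 2))
```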

AGARSoC [137] (Chapter 8). SoCs exhibit various accelerator interactions, as they integrate many heterogeneous accelerators. To expedite the verification of accelerator interactions, we propose to disintegrate the accelerator interactions observed in test executions into basic interaction elements. Afterward, we create new, interesting test-cases by stitching basic interaction elements together. This verification framework utilizes high-level design models to quickly collect probable accelerator interactions.
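The sketch below illustrates the re-assembly step in miniature: given a handful of basic interaction elements and their data dependencies (both hypothetical here, rather than extracted from real memory traces), it enumerates the legal orderings, each of which corresponds to a candidate test scenario.

```python
# Minimal sketch of re-assembling basic interaction elements into new test
# schedules. Element names and dependencies are hypothetical.
from itertools import permutations

# Each basic interaction element: name -> set of elements it depends on
# (e.g., the DMA copy must complete before the FFT accelerator reads the buffer).
elements = {
    "cpu_config_dma": set(),
    "dma_copy_in":    {"cpu_config_dma"},
    "fft_run":        {"dma_copy_in"},
    "crypto_run":     {"dma_copy_in"},
    "cpu_check_out":  {"fft_run", "crypto_run"},
}

def legal_schedules(elements):
    """Enumerate every ordering of the elements that respects dependencies;
    each ordering is a candidate test exercising a distinct interaction scenario."""
    for order in permutations(elements):
        done = set()
        valid = True
        for e in order:
            if not elements[e] <= done:     # a dependency has not run yet
                valid = False
                break
            done.add(e)
        if valid:
            yield order

schedules = list(legal_schedules(elements))
print(len(schedules))                # 2 legal interleavings for this dependency graph
for s in schedules:
    print(" -> ".join(s))
```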

In Chapter 9, we conclude this dissertation by discussing trade-offs of the proposed decomposition-based solutions. In addition, we discuss some future research directions that can improve the solutions proposed in this dissertation. To start, the next chapter will provide a short introduction to the extensive body of research covering the topics of validation and reliability for key SoC components.

CHAPTER 2

Related Work

Researchers in academia and industry alike have proposed various solutions to protect system-on-chip (SoC) designs against design bugs and permanent faults. Here, we review some microarchitectural and architectural solutions that address errors in major SoC components. Various solutions have been proposed with the goals of providing error detection and error recovery; most prior solutions specialize in either error detection or error recovery, while some offer both. The following sections outline detection and recovery solutions for microprocessor cores, memory subsystems, on-chip networks, and accelerators. Additional discussion of solutions in these domains is provided in the background and related work sections of subsequent chapters; the overview provided here summarizes research contributions in the broad domains of verification and reliability, while, in later chapters, we focus the presentation on work closely related to our solutions.

2.1 Addressing Errors in Microprocessor Cores

Microprocessor cores have been extensively studied with respect to their reliability against faults. Recently, fine-grained microarchitectural solutions have been proposed to address faults while incurring low overheads, compared to conventional coarse-grained (i.e., core-level) solutions. In addition to these fault-tolerant solutions, design bugs have been addressed by various solutions that augment microprocessor cores with small monitor and checker modules, so as to readily detect and diagnose the bugs. Unlike these prior works, one of the solutions proposed in this dissertation, BugMAPI, targets bug-masking occurrences that may happen during the verification of microprocessor cores. It quantifies the likelihood of bug-masking incidents and improves the efficiency of post-silicon verification.

16 Error detection. Many mission-critical systems, such as airplanes, rely on Dual-Modular Redundancy (DMR). DMR leverages a hardware duplicate; it compares the outcome of the duplicate hardware to that of the original one. Mismatches between the two outcomes imply that errors might have occurred either in the original hardware or the duplicate hardware. Similarly, software-based redundancy techniques can be deployed at no hardware cost. In other words, software can run the same program twice to verify program execution results. For instance, SWIFT [168] duplicates a program’s basic blocks, catching any discrepancy at the basic-block level. While DMR and SWIFT are very effective in detecting soft errors (transient faults), none of these solutions are effective against design bugs and permanent faults. Specifically, DMR cannot detect design bugs because both copies of the hardware are identically buggy. Software-based duplication (e.g., SWIFT) cannot detect any de- sign bug or permanent fault, as two software executions use the same erroneous hardware. Moreover, the overheads of deploying these solutions are prohibitively expensive: 2× sili- con footprint (for hardware duplication) or runtime (for repeated software execution) of an unprotected system. Fortunately, some recent proposals have offered less expensive solu- tions: for instance, Shoestring [83] takes a probabilistic approach to identify the vulnerable portions of the code (i.e., user-visible faults) so that they can be protected via instruction duplication. However, the fundamental limitations of this family of approaches apply to Shoestring as well, as it can detect neither design bugs nor permanent faults. DIVA [50] and Argus [141] are two original approaches in the hardware redundancy space. Specifically, DIVA [50] leverages a checker processor that is well verified and also more robust to physical failures than the core processor (i.e., the processor being checked by the checker processor). For instance, the checker processor is crafted for simplicity, while the core processor supports full-fledged features for high performance, such as out- of-order execution and instruction-level parallelism. The checker processor is run imme- diately after the core processor executes the program code, then the execution results are compared between the two. Overall, DIVAprovides very good coverage for error tolerance: both error detection and recovery against both design bugs and permanent faults. However, its area and runtime overheads can be expensive for simple cores (e.g., a core processor with an in-order execution engine that does not have full-fledged performance features). Argus [141] specifically targets this limitation of DIVA, providing a low-cost error- detection solution for simple cores. It deploys a set of small checkers across the in-order execution pipeline stages of the processor, so as to check the results of program execution at runtime. Specifically, to check both control and data flows, it leverages signatures that are computed statically at compile time, instrumented in program code, and checked at run- time. In addition, arithmetic and logical operations are checked approximately with dedi-

Nostradamus [152] extends Argus to out-of-order cores, while keeping implementation overheads low. In a different direction, IFRA [159] offers a set of in-circuit recording structures at the processor's microarchitectural components. It deploys error-detection checkers similar to Argus, but it also provides a snapshot of the microarchitectural activities when an error is detected. The activities are then analyzed to find the exact root cause. Similar to IFRA, BackSpace [76] provides a debugging capability that leverages trace buffers augmenting the processor. From the signals stored in the trace buffers, BackSpace performs formal analysis to reconstruct the history of activities that led to the buggy state, thus striving to overcome the limited observability in post-silicon verification. Besides the hardware-based detection mechanisms discussed so far, there is a body of work using software-based testing. To detect and diagnose permanent faults, Constantinides et al. [70] proposed to run periodic, directed tests with specialized Access-Control Extension (ACE) instructions. The ACE instructions provide good accessibility and observability into the underlying microarchitectural state, achieving about 99% fault coverage with less than 6% performance and area overheads. Note that, unfortunately, no design bugs can be detected with this technique. To detect design bugs, Reversi [198] transforms a given test program into another program whose results can be easily checked. To this end, it exploits reversible instructions in the instruction set. For instance, an addition instruction r2 = r0 + r1 can be checked with the pair r2 = r0 + r1 and r3 = r2 − r1; here, r3 must be equal to r0 regardless of the values of r0 and r1. Foutris et al. [87] propose another self-checking method that exploits functional redundancy within the instruction set. For instance, the result of a multiply-and-accumulate instruction can be checked by comparing it with the result of a sequence of multiply and addition instructions; these two results must be identical in the absence of errors. This functional-redundancy-based method is more flexible than Reversi, as it can utilize equivalent operations across different classes of instructions. For example, to detect design bugs in the floating-point unit, floating-point instructions can be compared with their integer counterparts, or vice versa. QED [129] is another software-based testing approach that is specialized for detecting electrical errors in post-silicon validation. Its approach is similar to SWIFT in using duplicated instructions. In QED, the results of the duplicated instructions are compared immediately with those of their original counterparts. Thus, QED achieves a significant speedup in detection time, compared to conventional application-level checking, where results are checked only after the application finishes.
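To make the self-checking idea concrete, the following minimal Python sketch mimics a Reversi-style reversible-instruction check; the function name and the injected faulty adder are illustrative assumptions, not code from the cited tools.

```python
import random

def reversi_style_check(r0, r1, add=lambda a, b: a + b):
    """Pair an addition with its inverse: the recovered value must equal the
    original operand for any inputs, so no golden reference result is needed."""
    r2 = add(r0, r1)   # forward operation, executed on the (possibly buggy) unit
    r3 = r2 - r1       # inverse operation (assumed bug-free in this illustration)
    return r3 == r0    # True -> no error observed for this instruction pair

# A correct adder passes for random operands; a consistently faulty one is flagged.
r0, r1 = random.getrandbits(32), random.getrandbits(32)
assert reversi_style_check(r0, r1)
assert not reversi_style_check(r0, r1, add=lambda a, b: a + b + 1)  # injected bug
```

The key property, as in Reversi, is that correctness can be checked against the test's own inputs rather than against a separately simulated golden result.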

Cheng et al. [67] conducted a comprehensive study of soft-error-tolerant mechanisms for microprocessor cores. They discuss the results of quantitative comparisons across various fault-tolerant mechanisms, spanning different layers of resilience: from the circuit level, to the logic level, to the architectural level, to the software level, to the algorithmic level. They conclude that a carefully optimized combination of circuit-level hardening, logic-level parity checking, and microarchitectural recovery provides a cost-effective solution against soft errors. Unfortunately, this work provides little insight on how to address permanent faults and design bugs. However, a similar comprehensive, cross-layer evaluation would be beneficial to minimize the overheads of addressing permanent faults and design bugs, compared to individual single-level solutions.

Error recovery. Once design bugs or permanent faults are detected in a system, the system must be repaired to bypass the faulty components so that failures can be avoided at runtime. To this end, Triple-Modular Redundancy (TMR) deploys two additional copies of the original hardware. The three copies of the hardware run the same software. When there is any discrepancy among the three results, the majority result is considered to be correct. This majority-voting scheme can efficiently tolerate soft errors, because each soft error usually affects a specific transistor or wire in one of the three hardware copies. When it comes to permanent faults, however, its recovery capability is very limited, because an entire faulty processing unit has to be disabled even upon a single fault occurrence. To improve tolerance to permanent faults, [94, 161, 160] have proposed fine-grained repair solutions at the microarchitecture level. In multicore systems, there are multiple, identical hardware components for each microprocessor pipeline stage (e.g., fetch, decode, execute, and writeback). Upon a fault occurrence, the pipeline can be adjusted to exclude the faulty component and to use a healthy component available in another pipeline. When a second fault occurs, unlike TMR, these solutions do not necessarily disable another pipeline, except when the second fault impacts the same pipeline stage as the first. Note that these fine-grained solutions share the same decompose-and-conquer idea as the solutions proposed in this dissertation: they decompose the processor pipeline into individual stages, performing the recovery at the pipeline-stage granularity. In particular, StageNet [94] uses full-crossbar switches that are placed between consecutive pipeline stages (e.g., fetch-decode). The switches enable flexible connections between the consecutive pipeline stages. Viper [161] and Cobra [160] strive to improve fault tolerance by providing a flexible pipeline scheme. In these solutions, a group of instructions is treated as a bundle, which is the basic unit of instruction processing. Individual bundles contend with other bundles to obtain the pipeline resources required to make progress until they arrive at the last pipeline stage (i.e., the commit stage).

While these fine-grained solutions exhibit very good error tolerance, they often come with high implementation overheads. Unlike them, another body of solutions tries to keep implementation overheads low while providing sufficient error tolerance to support the expected lifetime of the chip. Among these solutions, [163] exploits the inherent redundancy in many-core systems with a homogeneous ISA, and advocates for architecture-level fault tolerance. The proposed method requires only a very small amount of extra hardware for thread migration between faulty cores and healthy cores. For instance, when the proposed method discovers that a core cannot properly execute a special instruction, the corresponding thread is migrated to another core that can handle this instruction. DIVA [50] is a similar architecture-level solution, leveraging two processors (i.e., a core processor and a checker processor) sharing the same ISA. Its error correction is instant; once a discrepancy is found, the result from the checker processor overwrites the result of the core processor. Notice that all the solutions mentioned earlier require some sort of dedicated hardware. Such dedicated hardware can be relatively expensive for simple cores like the OpenRISC 1200 core. Software-only solutions are more promising for these simple cores. For example, Detouring [142] provides a software translation mechanism to bypass faulty hardware components, by translating the instructions that require the faulty unit into an equivalent sequence, similarly to [87].

2.2 Addressing Errors in Memory Subsystems

Memory subsystems are crucial components that enable high-performance SoCs, because they are shared by most processing units, handling data requests from them. Memory subsystems consist of multiple main memory channels, multiple levels of caches, and other types of on-chip memory modules, such as scratchpad memory modules. Many proposals strive to address either faults or design bugs in memory subsystems. With respect to faults, many solutions strive to protect memory cells, because they are usually smaller than logic gates and thus more vulnerable to faults. In addition, another body of solutions strives to address design bugs in complex features of memory subsystems, such as cache coherence and memory consistency. Achieving this goal is becoming more difficult, because of the wide range of memory operations deployed by heterogeneous processing units. Note that our solutions, discussed in Chapters 4 and 5, concern these subtle design bugs in memory subsystems.

Error detection. The reliability of memory subsystems has been one of the major concerns in realizing large-scale computing facilities, such as supercomputers and data centers. Indeed, unreliable memory subsystems can render the entire system worthless in the worst case. One such catastrophic failure happened in Virginia Tech's advanced research computing facility built in 2003. The memory modules used in the entire facility had to be replaced within three months of deployment, due to frequent random failures. It turned out that these frequent failures originated from unprotected, non-ECC (Error-Correcting Code) memory, which is vulnerable to cosmic rays impacting memory cells [130]. Among fault-tolerant solutions, ECC memory provides very good protection against soft errors for memory cells in both DRAM main memory and SRAM cache memory. Moreover, to address permanent faults, spare rows and columns of memory cells are widely deployed in practice. These spare components, however, often come with high area overheads; the overheads would be even more significant at future technology nodes, where higher failure rates are expected. Researchers strive to reduce these overheads by leveraging cross-layer approaches. For instance, ArchShield [149] employs online testing to identify faulty cells in DRAM and exposes the fault locations to the operating system. To this end, a portion of DRAM memory is reserved for storing both a fault map and replicated data. As a result, ArchShield eliminates the need for spare resources. While faults can be detected and overcome by the aforementioned solutions, they cannot address design bugs, which are unfortunately increasingly common in memory subsystems, as more heterogeneous processing units (microprocessor cores and accelerators) are integrated into the system. In particular, there are two critical functionalities that are prone to design bugs:

• Cache coherence: Memory subsystems should maintain data coherence among memory components (e.g., caches, scratchpad memory modules, and main memory modules).

• Memory consistency: Memory subsystems should guarantee the ordering among memory operations specified by the memory consistency model.

Validating cache coherence and memory consistency is a difficult, time-consuming task, because these features involve complex interactions among multiple processing units and memory modules. Some of these interactions might generate invalid outcomes of memory operations with respect to coherence and consistency. Prior research on cache-coherence and memory-consistency validation can be classified into two groups: formal methods and testing-based methods.

(1) Formal methods: Mur-φ [79] is a model checker that is widely used to formally verify cache coherence protocols. It enumerates all possible states of a given cache-coherence protocol specification, and checks whether the desired properties (e.g., a single writer and multiple readers) hold at each state. Recently, there has been growing investigation of solutions that leverage formal methods for memory consistency verification. For instance, PipeCheck [131] proposes a microarchitectural happens-before graph that specifies ordering relationships among pipeline stages and among instructions. It then uses the Coq theorem prover [6] to detect memory consistency violations. This theorem prover is also utilized by a couple of other works: CCICheck [139] to verify the interface between cache coherence and memory consistency, and Alglave [23] to verify several relaxed MCMs. Note that these Coq-based proposals use relatively small litmus tests, because the theorem prover must enumerate every possible memory-access interleaving for the given tests. Both Mur-φ and PipeCheck rely on abstract models of the designs under verification. These abstract models are described in domain-specific languages that are amenable to formal verification. Note that formal verification is conducted on the model descriptions (e.g., happens-before graphs), but not on the design descriptions used for implementation (e.g., hardware descriptions written in Verilog). Consequently, these formal methods may lead to incorrect verification outcomes: false positives or false negatives.

(2) Testing methods: TSOtool [96] is a memory-consistency validation tool that utilizes pseudo-random instruction tests. This tool has been widely deployed to validate several SPARC processors [202]. It creates relatively long multithreaded test-cases to exhibit various memory orderings at runtime. On the contrary, [134, 24, 25, 34] leverage litmus tests, each of which is a very small multithreaded test-case targeting a specific corner-case in the system's memory-ordering behaviors. Each litmus test usually includes 2 to 4 threads, and each thread includes several instructions (fewer than 10). Litmus tests can be either carefully hand-crafted by verification engineers [34] or automatically generated by using formal methods [26, 133]. Note that both methods require significant knowledge about the system's memory consistency model. Instead of relying on such knowledge, [81] advocates using a genetic algorithm (GA) to generate randomized test programs, then evolving them iteratively. In each iteration, test programs are morphed to target corner-cases. To achieve this goal, the genetic algorithm's crossover process is designed to diversify the memory-access interleavings observed during test runs.

Another group of proposals [143, 63, 136] strives to dynamically validate memory-ordering patterns at runtime or during post-silicon validation, using hardware monitors and checkers. Meixner and Sorin [143] propose to augment separate checkers for intra-thread ordering, inter-thread ordering, and cache coherence. Chen et al. [63] propose to utilize a centralized graph checker that validates memory consistency and cache coherence as a whole.

Mammo et al. [136] propose to modify cache structures to track the history of memory accesses. The logged memory accesses are then analyzed by graph-checking software. Note that the self-checking mechanisms [198, 87, 129] presented in Section 2.1 are ineffective for validating memory consistency, because of the wide diversity of behaviors that may be observed, even in executions that fulfill a given memory consistency model. In Chapter 4, we strive to efficiently validate such breadth of behaviors.
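As a concrete illustration of what such a litmus test checks, the Python sketch below enumerates the sequentially consistent interleavings of the classic message-passing litmus test and collects the admissible outcomes; the encoding and helper names are hypothetical and are not taken from any of the cited tools.

```python
from itertools import permutations

# Classic message-passing litmus test (generic illustration):
#   Thread 0: x = 1; flag = 1        Thread 1: r1 = flag; r2 = x
# Under sequential consistency the outcome r1 = 1, r2 = 0 must never appear.
T0 = [("store", "x", 1), ("store", "flag", 1)]
T1 = [("load", "flag", "r1"), ("load", "x", "r2")]

def run(interleaving):
    mem, regs = {"x": 0, "flag": 0}, {}
    for op, loc, val in interleaving:
        if op == "store":
            mem[loc] = val
        else:
            regs[val] = mem[loc]
    return regs["r1"], regs["r2"]

def sc_interleavings(t0, t1):
    # All interleavings that preserve each thread's program order.
    tagged = [(0, op) for op in t0] + [(1, op) for op in t1]
    for perm in set(permutations(tagged)):
        if [op for tid, op in perm if tid == 0] == t0 and \
           [op for tid, op in perm if tid == 1] == t1:
            yield [op for _, op in perm]

outcomes = {run(i) for i in sc_interleavings(T0, T1)}
print(sorted(outcomes))        # (1, 0) is absent under sequential consistency
assert (1, 0) not in outcomes
```

On a weakly ordered or buggy implementation, the forbidden outcome (r1, r2) = (1, 0) could additionally be observed on silicon, which is exactly the kind of violation the validation approaches above try to expose.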

Error recovery. As explained earlier, soft errors can be effectively detected and recovered from by using ECC, and permanent faults can be patched with spare cells or cell remapping [149]. Besides these hardware-based solutions, there are software-based approaches that overcome faults in cache/memory structures. To this end, functional redundancy in the instruction set, as detailed in [87], can be exploited; for instance, if whole-word accesses are faulty, then each of these accesses can be translated into multiple partial-word accesses [142]. Moreover, the operating system can bypass faulty chunks of memory structures by not allocating them [149, 142]. Checkpoint-and-rollback mechanisms can be deployed to recover from faults at the architectural level. To this end, SafetyNet [186] records a global checkpoint across the entire multicore system, by coordinating local checkpoints of the system's individual components (register files, caches, and memory controllers). These local checkpoints are recorded in small dedicated hardware modules. If a fault is detected, SafetyNet rolls back to a globally consistent checkpoint. During recovery, it reconfigures the system if the fault turns out to be permanent. ReVive [165] leverages distributed parity protection, as well as a logging mechanism similar to SafetyNet's. Notice that the checkpoint-and-rollback mechanisms of these two proposals are not designed to address design bugs. To address design bugs, Caspar [199] advocates deploying small programmable logic blocks residing at each cache module in the system. The programmable logic blocks are configured to contain two pieces of information regarding a bug-prone cache-coherence transition: a cache coherence state and a transition condition. Once a bug-prone transition is detected, Caspar triggers a recovery process that allows only one in-flight memory access at a time. It also utilizes a checkpoint-and-rollback mechanism for recovery, as in SafetyNet. Note that Caspar is a specialized solution addressing incorrect coherence-protocol implementations, while µMemPatch, a solution presented in this dissertation, addresses a wider range of memory subsystem bugs. However, Caspar shares a decompose-and-conquer trait with µMemPatch through its use of distributed programmable logic blocks.

2.3 Addressing Errors in On-Chip Networks

On-chip communication infrastructures are becoming increasingly complex in systems on chip, as the number of components integrated into a monolithic chip grows steadily. Since the communication layer must serve many concurrent requests, its architecture has evolved from simple busses to full-fledged on-chip networks, which comprise tens to hundreds of discrete routers and links spanning the entire chip. Among the various options for network topologies, meshes have been widely investigated, because they provide very good scalability to large SoC designs. Thus, this dissertation proposes solutions targeting mesh-based topologies as well.

Error detection. To catch errors in on-chip networks, researchers have proposed various methods to validate the operation of on-chip networks at runtime. As an example, ForEVeR [156] strives to formally verify three safety properties that must hold for on-chip networks: no data corruption, no flit drop, and no new flit generation. It specifies 83 property statements, which are verified using Synopsys Magellan, a commercial formal property verification tool. ForEVeR also deploys checker logic to verify network-level properties at runtime. Note that the verification of network-level properties needs to examine the entire network; usually, formal verification tools do not scale well to such a large design size. In contrast, NoCAlert [164] does not rely on design-time guarantees of correctness, and uses hardware assertions to detect fault occurrences. The assertions check four types of invariants: bounded delivery, no data corruption (no packet mixing), no flit drop, and no new flit generation. These invariants are similar to the safety properties checked by ForEVeR, and are checked across 32 assertions embedded in microarchitectural blocks: routing computation units, virtual channel and switch allocation arbiters, crossbars, buffers, and input and output ports. SafeNoC [18] also specializes in runtime validation. In particular, it checks for errors when packets arrive at their destinations (i.e., end-to-end checking). To this end, it leverages a small checker network, alongside the regular network, that delivers small signatures of the data packets. At the destination, each signature is then compared with its corresponding data packet for integrity checking. SafeNoC also deploys a timeout scheme to detect any dropped packet. Finally, Park et al. [158] propose to verify transferred data at every hop and every cycle. Thus, any wrong data can be instantly detected and corrected by leveraging a small extra buffer embedded in each port. When an error is detected, the flit stored in the buffer is retransmitted. This same buffer is also used to recover from deadlocks.

Note, however, that this solution is tailored to address soft errors; it is unclear how it can be repurposed to address design bugs or permanent faults. Another group of solutions, including [93, 125, 84, 90], focuses on error diagnosis in on-chip networks. Several of these solutions [93, 125, 84] employ dedicated testing hardware blocks that can generate test patterns. Each generated test pattern can detect a specific fault, e.g., stuck-at-0/1 or crosstalk, in either a router or a link. This testing scheme is time-consuming, because it must examine individual routers and links; to reduce testing time, [93] uses multicast messages, whose deterministic routes are scheduled offline. [90] shares the same idea as [93], but it can also verify the correctness of adaptive routing, which does not follow deterministic routes.

Error recovery. Researchers have shown that spare resources can effectively overcome permanent faults in on-chip networks. In [61], spare routers are placed at the top row of a mesh network. Unlike a conventional mesh network, where each processing unit has a fixed connection to one router, the work proposed in [61] offers flexible connections between routers and processing units: each processing unit can utilize one of its adjacent routers. When a fault is detected, the network initiates a reconfiguration process to change the connections between routers and processing units, as well as the wire connections between routers. Lehtonen et al. [125] advocate deploying spare links to replace faulty links. In their proposal, the exact locations of faulty links can be identified during detection and diagnosis, and spare links can replace any regular links. While causing little performance degradation upon fault occurrences, these sparing solutions often entail significant area and power overheads, and once the spares are used up, no more faults can be tolerated. As an alternative to spare-based solutions, routing reconfiguration is a cost-effective mechanism to tolerate faults. Many routing-reconfiguration solutions are fundamentally similar to each other; they strive to discover fault-free routes for all source-destination pairs in the network. In doing so, it is often preferred to consider only deadlock-free routes.1 As shown in [72], the network can be deadlock-free if there is no cyclic resource dependency among packets in transit. The turn model [91] offers a very efficient way to ensure deadlock freedom in mesh networks. In this model, every router prohibits a designated turn2.

1Once a network is deadlocked due to conflicting packets, it must be recovered to a deadlock-free state before advancing execution, which often requires additional hardware resources.
2The original turn model [91] defines a turn as a directional change in mesh networks (e.g., from the x direction to the y direction or vice versa). This concept was later generalized in [157] to indicate a progression from an input link to any output link at a router, regardless of a directional change.

Wu [205] improves the turn model to enhance the network's ability to tolerate a few faults. While these turn models involve only very simple hardware resources for route computation, they do not provide significant tolerance to faults, as also discussed in [169]. Similarly, researchers have investigated local routing reconfiguration based on bypass rules to reroute around faults (i.e., faulty routers or links) using local connectivity information [208, 80]. In these solutions, each router monitors the availability of its neighboring components. When the router discovers that a neighboring component has become unavailable, it avoids transmitting in that direction. While solving the issue of preventing packet loss, these solutions may fall into livelock situations (i.e., a packet keeps moving with no progress toward its destination), due to the short-sightedness of their detouring approach. [208] provides a bypass rule for a faulty router, which utilizes deadlock-free detour paths surrounding the router. [80] employs an adaptive selection between the X and Y directions to bypass a faulty link. Due to the localized nature of these solutions, they can only sustain a few faults before the network becomes disconnected. To maximize fault tolerance, turn models have been expanded to allow arbitrary prohibition of turns [171, 22, 157]. uLBDR [171] presents a hardware-efficient methodology to place turn restrictions, performing route computation with only a handful of flip-flops embedded in each router. Ariadne [22] achieves more flexible routing than uLBDR, at the cost of extra hardware overhead to deploy routing tables, instead of a few flip-flops. The routing tables are re-computed upon each fault occurrence, finding routes that obey the turn restrictions placed in the network. Unfortunately, this reconfiguration process takes thousands of clock cycles in a typical 8×8 mesh network: indeed, its time complexity is O(n²), where n is the number of nodes. Unlike the bidirectional turn restrictions imposed by uLBDR and Ariadne, uDIREC [157] leverages uni-directional turn restrictions to improve fault tolerance and minimize performance impacts, at the expense of additional route-computation effort. There are also routing-reconfiguration proposals that do not rely on turn restrictions for deadlock freedom. Immunet [166] employs a small route table for fault-ridden situations; these small route tables provide a ring path among surviving network components, and are reconfigured after each fault occurrence. Note that a similar ring topology is also used in the checker network of SafeNoC [18]. Another class of route reconfiguration relies on a search process for fault-free routes [68, 197]. In [68], upon a fault occurrence, every node in the network sends route-search packets to its neighboring nodes. When the neighbors receive a route-search packet, they send an acknowledgment packet to the source node, and then forward the received route-search packet to their neighbors. Later, the source node analyzes all the acknowledgment packets it has received, then builds an array of fault-free routes.

Similarly, [197] proposes a backtracking-based route-search process; this proposal constructs a list of forwarding directions for each source-destination pair. Finally, [162] leverages stochastic communication to select the forwarding direction for each packet. While it can provide high fault tolerance at low silicon cost, it also suffers from a high performance impact, due to the use of non-minimal paths and the lack of delivery guarantees.
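For readers unfamiliar with turn-model routing, the short Python sketch below shows dimension-order (XY) routing, a minimal instance of the turn-restriction idea discussed above; it is an illustrative sketch only, not the reconfiguration algorithm of uLBDR, Ariadne, or uDIREC, and it does not handle faults.

```python
def xy_route(src, dst):
    """Minimal XY (dimension-order) routing on a 2D mesh: move in X first,
    then in Y.  Because every packet finishes its X hops before any Y hop,
    Y-to-X turns are never taken, which breaks all cyclic channel
    dependencies and hence guarantees deadlock freedom."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:                      # X dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                      # then Y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

# Example on an 8x8 mesh: route from router (1, 6) to router (5, 2).
print(xy_route((1, 6), (5, 2)))
```

The fault-tolerant schemes above generalize this idea by choosing, and recomputing after each fault, which turns to prohibit so that the surviving routers remain connected without reintroducing cyclic dependencies.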

2.4 Addressing Errors in Accelerators

Research proposals have focused on major accelerators, such as graphics processing units (GPUs) and neural processors. These major accelerators have been widely investigated with respect to their reliability against faults. Solutions for generic accelerators have been proposed as well, which utilize high-level accelerator models to verify low-level accelerator designs. Note that these prior solutions focus on the verification of individual accelerators, while our proposal, AGARSoC (see Chapter 8), focuses on verifying interactions among multiple accelerators.

Error detection. Among various types of accelerators, GPUs have been heavily investigated with respect to their reliability. Jeon and Annavaram [111] propose a lightweight error-detection solution, called Warped-DMR, that leverages opportunistic spatially- and temporally-redundant execution of code. They observe that many GPGPU applications utilize only a few threads, leaving the remaining threads under-utilized. Warped-DMR exploits these under-utilized GPU threads to verify computation results by utilizing (1) inactive threads within a warp (spatial, intra-warp DMR) and (2) replayed computation (temporal, inter-warp DMR). Sabena et al. [175] take a similar approach using duplicated computations, but they propose to partition the GPU's streaming multiprocessors into two groups: a regular computation group and a checker group, where the checker group performs the same computation as the regular group. They experimentally compare the error-detection rates of the two redundancy methods, spatial and temporal, and find that the latter results in a higher detection rate. Tselonis et al. [194] focus on the impact of permanent faults in GPU register files. They find that many permanent faults go unnoticed; they either do not alter correctness, or incur minimal performance degradation. Li et al. [128] propose a dynamic reliability management framework for GPU applications. The proposed framework injects additional software reliability-enhancement code, which is designed to catch soft errors at runtime, into GPU applications using a just-in-time compilation environment.

The impact of soft errors on deep neural networks has been explored in [126]. This work investigates how soft errors propagate to the outcome of neural networks. Specifically, it evaluates the impact of a few parameters with respect to error propagation, such as the network topology and the bit position. The authors find that a symptom-based detector can catch anomalies using input-value bounds, where the symptoms are learned using representative test inputs. This work also demonstrates that soft errors could be catastrophic in mission-critical neural networks, such as those in self-driving cars. Campbell et al. [58] propose a quick error-detection methodology for general accelerators designed with high-level synthesis tools. They propose to augment the synthesized hardware accelerator with a signature-generation module that probes the accelerator's registers at runtime. The proposed method collects signatures from (1) simulations of the high-level model and (2) executions of the hardware accelerator, then compares the signatures between the two. The proposed method can detect both logic and electrical errors.

Error recovery. For GPUs, Tselonis and Gizopoulos [195] present a reliability evaluation framework, which is evaluated on NVIDIA GPUs. Based on their findings, they propose to protect a handful of critical components in the GPUs to attain the best protection at low cost: register files, shared memory, SIMT stacks, and instruction buffers. A few prior works focus on neural network accelerators. Temam [191] presents a fault-tolerant neural network accelerator that employs spatially-expanded neurons. He finds that spatially-expanded neurons are beneficial for both energy efficiency and fault tolerance, compared to a conventional neural network composed of time-multiplexed neurons. He also investigates the impact of permanent faults at the transistor level, that is, how transistor failures translate into incorrect computation outcomes. A more recent work by Li et al. [126] focuses on protecting deep neural network accelerators against soft errors. They show that selective latch hardening can greatly reduce failure rates while incurring moderate area overheads. Note that similar hardening techniques have been widely deployed in microprocessor cores. Zhang et al. [207] present a fault-tolerant architecture for systolic-array-based architectures like Google's Tensor Processing Unit (TPU) [113]. They empirically show that permanent faults can significantly reduce computation accuracy. To mitigate this degradation, they propose to bypass faulty array elements and re-train the network accordingly. Note that the aforementioned solutions mostly rely on conventional techniques that are widely used in other domains (i.e., microprocessor cores and memory subsystems): hardening registers and latches, adding ECC mechanisms to memory components, and bypassing faulty components. Moving forward, we briefly discuss research opportunities related to more efficient error-recovery solutions for accelerators as a future research direction in Chapter 9.

The next chapter begins the discussion of our solutions addressing errors in systems on chip. Specifically, Chapter 3 addresses the challenge of verifying microprocessor cores.

CHAPTER 3

Probabilistic Bug-Masking Analysis in Instruction Tests

In this chapter, we discuss evasive design bugs in microprocessor cores and how these bugs can be addressed efficiently in post-silicon verification. Microprocessor cores have been evolving to meet ever-growing computation requirements. In particular, a noticeable evolution is happening in the instruction sets that microprocessors offer; high-performance microprocessors implement hundreds of advanced instructions to efficiently process complex applications. Moreover, microprocessor cores implement full-fledged features for high performance and power efficiency, such as speculative execution and power-saving modes. Indeed, each execution of a test may result in microarchitectural activities that have not been observed during other executions of the same test, depending on the microarchitectural state when the test is run. Post-silicon instruction tests hence strive to verify functional corner-cases that originate from a combination of complex instruction interactions under various microarchitectural states. Post-silicon microprocessor verification uses constrained-random instruction tests that are tailored to stimulate rarely-occurring, unexpected functional corner-cases. The goal is to run as many constrained-random tests as possible within the time allowed for verification, so as to improve the chance of discovering bugs. To this end, post-silicon verification leverages lightweight result-checking mechanisms that verify only the architectural state after each test is completed. Note that such lightweight, end-of-test checking is especially beneficial in post-silicon verification, where observing intermediate results comes at a great cost (e.g., hardware trace mechanisms, software instrumentation). However, this end-of-test checking becomes problematic for long tests that consist of hundreds or thousands of instructions. When bugs manifest themselves in the middle of a test execution, subsequent instructions may purge the impact (i.e., the erroneous results) caused by the bug. Such bug-masking incidents are very undesirable, but likely to happen in instruction-based tests. If test results could be checked immediately after a bug manifests, the bug would be exposed.

This chapter focuses on how to analyze the possibility of bug-masking incidents in instruction tests. A conventional bug-masking analysis evaluates this possibility for a given test program by relying on many bug-injection simulations, each of which simulates how a bug-impacted system behaves for the given test program. However, this conventional dynamic analysis is extremely costly, as it requires many dynamic simulations to evaluate the possible bug scenarios. We continue our discussion of this dynamic analysis in Section 3.1.1. To minimize the computation requirements for bug-masking analysis, we propose a solution that decomposes the analysis problem from considering an entire test program into smaller components: the individual instructions. We then address these smaller problems by computing bug-masking probabilities at the instruction level, a much more straightforward challenge, which can even be solved analytically (Section 3.4.1). Once we have solved the instruction-level problems, our proposed solution, called BugMAPI, can tackle the original complex problem by combining the bug-masking probabilities through an analysis of the information-flow paths (i.e., use-def chains in data-flow analysis) of the test program. The BugMAPI solution, which stands for Bug-Masking Analysis with Probabilistic Information-flow, is based on static information-flow analysis with the following improvements:

1. We compute and utilize the bug-masking probability for each instruction type.

2. We develop a heuristic to derive the bug-masking exposure of an instruction sequence from the individual instruction probabilities.

3.1 Background and Motivation

Post-silicon verification and validation. High-end microprocessors often include complex features that are hard to verify in the early stages of verification. Therefore, post-silicon validation, that is, the validation effort carried out on the first silicon prototypes, aims at catching all the remaining bugs that were not detected in the pre-silicon verification stage [148]. One of the most difficult aspects of the deployment of post-silicon validation, however, is the extremely limited observability into the design's internals, as we discussed in Section 1.2. This aspect makes bug detection and diagnosis challenges of their own. Even more importantly, bugs may manifest during a test's execution, but become masked and go unnoticed by the time the test completes. The outcome in these situations is that the post-silicon test simply wastes precious prototype execution cycles. To overcome these challenges, two groups of solutions were presented in Section 2.1. The first group [159, 76] leverages dedicated hardware resources to trace microarchitectural activities at runtime. The second group [198, 87, 129] advocates software-based, self-checking solutions. Unlike these solutions, BugMAPI requires neither hardware resources nor self-checkable software. Rather, it is a purely software-based analysis that strives to accurately compute the likelihood of bug-masking incidents in instruction tests, which arise from limited observability.

Test programs can be tailored with sophisticated generation algorithms (i.e., directed tests). Or, they can be quickly generated in a constrained-random manner (i.e., random tests), where instructions are randomly chosen from instruction candidates guided by simple test-generation algorithms (e.g., those using test templates). This latter type of test generation is more important in post-silicon verification, as most directed tests have already been verified in pre-silicon verification. It is also particularly valuable in trying to exercise unidentified corner-case scenarios, because a vast number of variants can be generated with little designer effort [146, 192, 19]. One additional benefit of random tests is that they can often be generated directly on the microprocessor under verification (i.e., on-platform generation), to make efficient use of the scarce post-silicon validation platforms available.

[Figure 3.1: a test generated on-platform, running on the microprocessor under post-silicon validation]
  1: r3 ← r0 + r1       (a bug triggers and manifests in r3)
  2: r4 ← !r2
  3: r5 ← r0 & r3
  4: r3 ← mem[r2+r4]    (instructions 4 and 5 mask the bug)
  5: r5 ← r4 == r4
  ...                   (10s or 100s of instructions in the test)
  n: end-of-test checking of the GPRs

Figure 3.1: The bug-masking problem in post-silicon random instruction tests. In this example code, the results of instructions 1 and 3 are overwritten by instructions 4 and 5, so that any bug occurring or propagating through the former two instructions would not be detected at the end of the test.

Bug masking occurs when, due to the limited observability of post-silicon validation platforms, a bug becomes undetectable after some amount of computation has occurred past its manifestation. Indeed, in post-silicon validation, it is often possible to monitor only architectural registers and memory, usually only at the end of the test execution, while in pre-silicon validation, more detailed information is available (e.g., cycle-by-cycle microarchitectural state). An example of this problem is illustrated in Figure 3.1, where the first instruction triggers a silicon bug, leading to an incorrect value being written in register r3. However, the erroneous value in r3 is subsequently overwritten, and thus undetectable by the time the test completes.

[Figure 3.2: for each instruction in the test-case (e.g., 1: r3 ← r0 + r1, 2: r4 ← !r2, 3: r5 ← r0 & r3, ...), hypothetical bug scenarios are generated (the instruction's result becomes the value 0, the result becomes a random value, or other bug models); each scenario is injected into a buggy microprocessor simulation/emulation, whose end-of-test GPR values are compared with those of a bug-free run: a mismatch means the bug propagated, otherwise the bug was masked.]

Figure 3.2: Dynamic bug-masking analysis flow. Each of the hypothetical bugs is injected at runtime during the simulation of the test-case. At the end of the test-case, the results of the test-case (e.g., general-purpose register values) are checked against the bug-free simulation results.

During the post-silicon validation process, identifying situations that may lead to bug masking provides great benefits in improving the quality of the validation effort. For instance, as we demonstrate later in Section 3.6, it allows us to guide the application of small modifications to test programs so that we can minimize bug-masking incidents and thus attain higher coverage from a regression suite. Other applications of a bug-masking evaluation include assessing the quality of a test: tests to be included in key regressions can be selected based on having a low bug-masking likelihood. In addition, in test generation, test templates can be selected and designed to limit the generation of sequences that are susceptible to masking.

3.1.1 Dynamic Bug-Masking Analysis

In order to accurately identify bug-masking incidents that may happen during the execution of a test-case, we can perform an exhaustive dynamic analysis following the two steps below.

Step 1: Identifying bug scenarios. We identify all possible bugs that are likely to be found during the executions of test-cases. Note that we strive to catch unexpected bugs while running test-cases; if bugs were known a priori, we would fix them without relying on test-cases. In other words, we should take into account all such bug possibilities in this step. To this end, we use hypothetical bug scenarios, as illustrated in Figure 3.2. As shown in the figure, a bug may alter the result of instruction 1, overwriting the result with the value 0 (bug scenario 1-1). Or, the same instruction may spuriously get a random value (bug scenario 1-2), etc. To obtain real-world insights into how typical post-silicon bugs manifest themselves, we consulted research engineers working on a post-silicon verification team at IBM Research – Haifa. In their experience, buggy result values exhibit certain patterns (i.e., they are not truly random). See [62] for more details. Based on their suggestions, in developing this project, we made the following assumptions on bugs:

• A bug manifests as an incorrect architectural state.

• A bug affects only one instruction in the test-case.

Our bug model is simplistic, as we do not take into account microarchitectural structures. Microarchitectural bug modeling would capture more realistic bugs at the expense of many more bug scenarios to be examined. We further discuss the bug model we evaluated in Section 3.5.1.
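As a concrete illustration of Step 1, the minimal Python sketch below enumerates hypothetical bug scenarios for a toy test-case under the two assumptions above; the test-case, the two bug types shown in Figure 3.2, and all names are illustrative only.

```python
from itertools import product

# A toy test-case: (destination register, instruction text) pairs.
test_case = [
    ("r3", "r3 <- r0 + r1"),
    ("r4", "r4 <- !r2"),
    ("r5", "r5 <- r0 & r3"),
]
# Two of the hypothetical bug types from Figure 3.2 (other bug models exist).
BUG_TYPES = ["zero-value", "random-value"]

# One scenario per (instruction, bug type) pair: each scenario corrupts the
# architectural result of exactly one instruction, per our assumptions above.
scenarios = [
    {"instr_index": i, "dest": dest, "bug": bug}
    for (i, (dest, _text)), bug in product(enumerate(test_case), BUG_TYPES)
]
print(len(scenarios))   # = |instructions| x |bug types| scenarios to simulate
```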

Step 2: Bug-injection simulation or emulation. For each bug scenario, the microprocessor under verification must be simulated or emulated to see whether any buggy result can be observed at the end of the test. To this end, the microprocessor is executed twice: a buggy execution (i.e., a test run with a bug injected) and a bug-free execution (i.e., a test run with no bug injected). The results of the two executions are then compared to find any mismatch, as shown in Figure 3.2. Note that each simulation takes a significant amount of time. A typical dynamic analysis framework requires a separate design simulation or emulation for each bug scenario. Thus, the number of executions for a given test-case is multiplied by the number of bug scenarios to be simulated. Also, per our bug model, the number of bug scenarios is proportional to the number of instructions and the number of bug types. These multiplicative factors (i.e., hundreds or thousands) are usually much larger than a typical repetition count of a test-case (i.e., tens).1 In other words, the effort of evaluating possibly-masked bugs outweighs the actual bug-hunting effort. In summary, this dynamic analysis can be prohibitive, because of the high computational cost of the bug-injection campaign.
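In code form, the reference dynamic analysis reduces to the loop sketched below; the simulate interface is a hypothetical stand-in for an ISA-level simulator or emulator and is not part of any cited tool.

```python
def dynamic_bug_masking_analysis(test_case, scenarios, simulate):
    """Reference dynamic analysis (illustrative): 'simulate' runs the test-case,
    optionally injecting one bug scenario, and returns the end-of-test
    architectural state (e.g., the general-purpose register values)."""
    golden = simulate(test_case, inject=None)      # bug-free execution
    masked_scenarios = []
    for scenario in scenarios:                     # one extra execution per scenario
        buggy = simulate(test_case, inject=scenario)
        if buggy == golden:                        # no end-of-test mismatch:
            masked_scenarios.append(scenario)      # the injected bug was masked
    return masked_scenarios

# Total executions = 1 + len(scenarios); with hundreds of instructions and
# several bug types per instruction, this dwarfs the cost of running the test.
```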

1In this chapter, we only consider test-cases that generate deterministic outcomes. In other words, every iteration of a test-case should generate the same results under bug-free circumstances. In these test-cases, each test-case is repeatedly executed tens of times under different initial microarchitectural states. In Chapter 4, we consider non-deterministic outcomes in multithreaded test-cases due to memory-access interleavings; these test-cases use much higher iteration counts than deterministic test-cases.

[Figure 3.3: BugMAPI on an example test-case: step 1 assigns each instruction a masking probability based on its opcode (e.g., add 0%, and 50%, sll 90%, cmp 70%); step 2 performs one-time data-flow tracking of how buggy information from each instruction propagates through register dependencies or is overwritten; step 3 reports a masking probability per instruction and aggregates them into a test-case average (64% in the example).]

Figure 3.3: Overview of BugMAPI probabilistic bug-masking analysis. BugMAPI applies static instruction-flow analysis to instruction test sequences by leveraging accurate, pre-computed masking probabilities for each instruction, and by deriving overall test-sequence masking rates using information-flow tracking.

3.2 BugMAPI Overview

BugMAPI applies static code analysis to instruction test sequences. The results of the analysis report which instructions in the sequence are likely to mask potential bugs occurring during their execution. To this end, we first introduce a basic approach in Section 3.3; we then improve on it in Section 3.4 by taking the type of each instruction into account in the analysis. In computing the bug-masking probability of a test-case, we leverage a novel decomposition technique to break the probability computation into fine-grained, instruction-level computations, and then use data-flow analysis to compute the bug-masking probability of the test-case. Figure 3.3 outlines BugMAPI. Our solution considers a test sequence and applies three steps to it: the first step assigns a bug-masking probability to each instruction in the test sequence, based on its functionality (Section 3.4.1). The second step tracks data dependencies between instructions and computes the probability that data affected by a bug propagates to the end of the test, thus remaining observable by end-of-test checkers. It also leverages an approximation technique to reduce the computational complexity of this step (Section 3.4.2). The last step derives the bug-masking probability of the overall test from those computed for the individual instructions.

[Figure 3.4: register taint status after each instruction]
      instruction          r0   r1   r2   r3     r4    r5
  1:  r3 ← r0 + r1          ∅    ∅    ∅   {1}     ∅     ∅
  2:  r4 ← r0 - r2          ∅    ∅    ∅   {1}    {2}    ∅
  3:  r5 ← r0 & r3          ∅    ∅    ∅   {1}    {2}  {1,3}
  4:  r3 ← r2 << r4         ∅    ∅    ∅   {2,4}  {2}  {1,3}
  5:  r5 ← r4 == r4         ∅    ∅    ∅   {2,4}  {2}  {2,5}
  checking at end-of-test: non-masked instructions = {2,4,5}

Figure 3.4: Baseline information-flow analysis. Our baseline analysis propagates information flow leveraging instructions' inputs and outputs. In the best case, checkers at the end of the test could reveal any bug that occurred at instructions 2, 4, or 5. Bugs manifesting at instructions 1 or 3 are masked.

Our endeavor focuses on improving the accuracy of a typical static analysis, by leveraging an accurate masking-probability computation for each instruction. Note that, to be efficient, BugMAPI does not take into account concrete values (e.g., operand values in arithmetic instructions) in determining whether a bug is masked by an instruction, while a dynamic analysis does account for concrete values. Overall, our static approach reduces computation requirements by three orders of magnitude, compared to a simulation-based dynamic analysis. This benefit comes at the cost of lower accuracy, due to the lack of dynamic information.

3.3 Baseline Static Information-Flow Bug-Masking Analysis

We first developed a simple information-flow tracking technique that determines the observability of buggy values in registers and memory, which we refer to as our baseline analysis. This analysis is static, in contrast to popular taint-tracking solutions in the security domain [77, 154]. Our choice is driven by the need for post-silicon validation environments to be efficient and keep a low computation profile, since the execution of the test itself is extremely fast. Figure 3.4 illustrates an example of the baseline analysis, applied to a 5-instruction sequence. For each instruction, we extract its inputs and outputs as per the ISA specification. Then, for each resource in the test program (e.g., registers), we proceed instruction by instruction and compute which instructions can propagate their taint status to it. In other words, instruction X propagates its taint status to register Y if a bug manifesting in instruction X could affect the value of register Y [77]. For instance, after the first instruction, r3 has a taint list containing instruction 1, while all other registers have empty lists.

Note that the third instruction uses the r3 value computed by the first instruction and writes to register r5, so that the taint status of r5 becomes {1, 3}. At the end of the analysis (last line), the taint status lists indicate which instructions' results impact the state of each register at the end of the test program. Thus, by computing the union of all the taint lists, we find which instructions could expose bugs that are not masked by the program. Note that inaccuracies are possible because this baseline analysis does not take into account the distinct operation of each instruction. For instance, with reference to Figure 3.4, at the end of the program, r3 should reveal bugs occurring during the execution of the second instruction. However, if r2 = 0, any erroneous value stored in r4 would be masked by the fourth instruction, clearing the taint in r3. Consequently, our baseline analysis provides conservative bug-masking results, and false negatives (i.e., bugs predicted as detectable, but masked in reality) may indeed occur. In the next section, we try to overcome precisely this limitation.
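The baseline analysis can be summarized in a few lines of Python; the sketch below reproduces the taint sets of Figure 3.4 and is an illustrative reimplementation, with a hypothetical encoding of instructions as (destination, source registers) pairs.

```python
def baseline_taint_analysis(instructions):
    """Baseline static information-flow analysis: each register carries the set
    of instruction indices whose (possibly buggy) results could still affect it;
    overwriting a register replaces its taint set entirely."""
    taint = {}                                    # register -> set of instruction ids
    for i, (dest, sources) in enumerate(instructions, start=1):
        inherited = set().union(*(taint.get(s, set()) for s in sources))
        taint[dest] = inherited | {i}             # output depends on its inputs and itself
    return taint, set().union(*taint.values())    # per-register taint + end-of-test union

# The 5-instruction example of Figure 3.4: (destination, [source registers]).
program = [
    ("r3", ["r0", "r1"]),   # 1: r3 <- r0 + r1
    ("r4", ["r0", "r2"]),   # 2: r4 <- r0 - r2
    ("r5", ["r0", "r3"]),   # 3: r5 <- r0 & r3
    ("r3", ["r2", "r4"]),   # 4: r3 <- r2 << r4
    ("r5", ["r4"]),         # 5: r5 <- r4 == r4
]
taint, non_masked = baseline_taint_analysis(program)
print(taint["r3"], taint["r5"], non_masked)   # {2, 4} {2, 5} {2, 4, 5}
```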

3.4 Probabilistic Information-Flow Bug-Masking Analysis

BugMAPI refines the baseline analysis from the previous section by taking the instruction type into account when computing bug-masking probabilities. As an example, Figure 3.5 shows this refinement for three instructions from the Power ISA [98] (add, and, and sld). For the sake of this example, let us assume that bugs manifest as single-bit flips in a source register (Sections 3.5.1 and 3.5.5 discuss the actual bug models used in our evaluation). The goal here is to compute the probability that a single-bit flip in input register RA, RB, or RS propagates to the output register value. In the case of add, the bit flip propagates in all cases except for overflow, so the masking probability is practically 0%. In the case of an and instruction (middle of Figure 3.5), there is a 50% chance that the bit flip is masked by a 0-value in the corresponding bit position of the other operand. For sld (shift left doubleword), we computed that bugs in the shift amount RB are masked 91% of the execution times (because only the least significant bits are used by the instruction), while bugs in the source register RS are masked 49% of the times. Note that the baseline analysis is equivalent to assigning a 0% bug-masking probability to all the input-output flow paths; hence, it is clear that the baseline analysis would lead to a much higher false-negative rate. BugMAPI computes the bug-masking probabilities for each instruction of a processor's ISA using the process discussed below in Section 3.4.1. This computation is only carried out once per ISA. It then considers the test-case under analysis, applies the individual probabilities to each instruction, and computes how they propagate to the end of the test program, similarly to our baseline analysis (details are in Section 3.4.2). Finally, it can compute the masking probability of the entire program, for instance as an average masking rate over the individual contributions of the instructions in the program. Note that the last step can be tailored to the specific validation goal. For example, when computing the average, the contribution of specific instructions can be weighted to tilt the criticality of masking toward certain units of the processor (e.g., floating point), or toward certain high-interest program constructs.
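For intuition, these probabilities can also be recovered analytically under the same single-bit-flip assumption (a back-of-the-envelope derivation we add here for illustration). For and, a flipped bit of RA reaches RT only if the corresponding bit of RB is 1, which happens with probability 1/2 for a uniformly random operand, giving the 50% masking probability. For sld, only the low-order 6 of RB's 64 bits select the shift amount, so a flip in any of the other 58 bits is ignored: 58/64 ≈ 90.6%, roughly the 91% above. For RS, a flipped bit at position b is masked only if it is shifted out (b + shift > 63); averaging over uniformly random bit positions and shift amounts, this happens for 2016 of the 4096 (bit, shift) pairs, i.e., about 49%.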

[Figure 3.5: per-i/o-pair masking probabilities for add RT,RA,RB (≈0% from RA or RB), and RT,RA,RB (50% from either operand), and sld RA,RS,RB (49% from RS; 91% from RB, assuming a uniformly random RB value, since RB contributes only 6 of its 64 bits).]

Figure 3.5: Examples of input-output bug-masking probabilities.

3.4.1 Bug-Masking through an Instruction

When computing the masking probability of an instruction, we consider each input and output (i/o) pair and compute the probability independently for each pair. Note that it is possible that the probabilities through multiple i/o pairs are correlated. For instance, the addo instruction in the Power ISA sets the overflow flag when appropriate; thus, a single buggy input operand may affect both the target register and the overflow flag. However, to keep our computation manageable, we disregard these second-order effects. We developed two approaches to compute the masking probabilities of instructions. The first is an analytical approach, where we investigate the ISA specification (e.g., [98]) to understand the specific functionality of each instruction, and compute the masking probability for each i/o pair mathematically. We used this model mostly for branch and load/store instructions. The second is an experimental approach, whereby we execute the instruction in an instruction-set simulator (e.g., gem5 [56]) with 1,000 sets of random input values. For each i/o pair, we run each input set twice, once with the random inputs as generated and once with a buggy version of the input set, and then we compare the outputs to determine whether the bug was masked. Based on the overall findings, we can derive the masking probability with high confidence. To attain efficiency in this process, we develop assembly programs that encapsulate the input generation, the repeated executions, and the probability computation all together. Developing those programs is the only part of this approach that requires engineering effort. Both approaches have their own advantages. The experimental approach can be more accurate, and it can be easily adapted to a new bug model. The analytical one is beneficial when developing the assembly program is too time-consuming, and for instructions whose masking probabilities tend to be unaffected by the specifics of the bug model.
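To illustrate the experimental approach, the Python sketch below estimates one i/o-pair masking probability by Monte Carlo sampling under a single-bit-flip bug; it is a stand-in for the assembly programs described above, and the operations and parameters are illustrative.

```python
import random

def estimate_masking_probability(op, n_inputs=2, buggy_input=0,
                                 trials=1000, bits=64, seed=0):
    """Monte Carlo estimate of one i/o-pair masking probability: run 'op' on
    random operands, rerun it with a single-bit flip injected into one chosen
    input, and count how often the output is unchanged (bug masked)."""
    rng = random.Random(seed)
    masked = 0
    for _ in range(trials):
        operands = [rng.getrandbits(bits) for _ in range(n_inputs)]
        clean = op(*operands)
        corrupted = list(operands)
        corrupted[buggy_input] ^= 1 << rng.randrange(bits)   # single-bit flip
        masked += (op(*corrupted) == clean)
    return masked / trials

mask = (1 << 64) - 1
print(estimate_masking_probability(lambda a, b: a & b))               # ~0.50
print(estimate_masking_probability(                                   # ~0.91
    lambda rs, rb: (rs << (rb & 0x3F)) & mask, buggy_input=1))
```

The estimates for the and-like pair and for the shift-amount input of the sld-like operation land near the analytical 50% and 91% figures discussed above.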

[Figure 3.6: example instruction sequence 1: r2 ← r0 << r1, 2: r4 ← r2 << r3, 3: r5 ← r2 << r4, with bug-propagation probabilities for (r0, r2, r4, r5) after each instruction: before instruction 1: 50%, 0%, 0%, 0%; after instruction 1: 50%, 25.5%, 0%, 0%; after instruction 2: 50%, 25.5%, 13%, 0%; after instruction 3: 50%, 25.5%, 13%, 14%; aggregated probability: 72.1%.]

Figure 3.6: Example of bug-propagation analysis over an instruction sequence. Initially, only r0 carries a buggy value, with a 50% probability. Through the execution of instructions 1–3, the buggy value may also propagate to r2, r4, and r5. We derive the propagation probabilities in the table on the right by leveraging our three simplifying assumptions.

3.4.2 Bug-Masking over an Instruction Sequence

The goal of this step is to compute, for each instruction in a sequence, the likelihood that a bug manifesting at that instruction propagates to the end of the sequence, revealing an incorrect test outcome. Computing this likelihood accurately is computationally intensive: a simulation-based dynamic analysis would have to consider a vast number of executions to properly randomize all the input sources for all the instructions in the sequence. A static analysis could be very involved because a buggy instruction can have a large fanout, impacting many other registers and resources over the test execution, all of which would have to be tracked to determine whether they eventually become masked. Because of the high cost of an accurate analysis, we chose to design a high-quality approximation for it. The calculation makes three simplifying assumptions, which we discuss below and illustrate with an example in Figure 3.6. Note that, for the sake of simplicity, we perform the analysis using bug-propagation probabilities, which are the complement of bug-masking ones.

Assumption 1: Instructions are independent of each other. As a result, we can calculate the bug-detection probability through multiple instructions by simply calculating the product of each instruction's probability of propagating an erroneous value.

calculate the bug-detection probability through multiple instructions by simply taking the product of each instruction's probability of propagating an erroneous value. As shown in Figure 3.6 (dark orange region), a bug in r0 propagates to r4 through instructions 1 and 2. Consequently, the probability that a bug manifests in r4 through r0 is the product of the probability of having a bug in r0 before the program starts, times the two instructions' bug-propagation probabilities.

Assumption 2 — Inputs of an instruction are independent of each other. Because of this assumption, we derive the bug-propagation likelihood to an output as the complement of the probability that both i/o pairs are masked. With reference to the example in Figure 3.6 (yellow area), the probability that a bug propagates from either r2 or r4 to r5 is:

P((r2→r5) ∪ (r4→r5)) = 1 − P((r2→r5)ᶜ ∩ (r4→r5)ᶜ)
= 1 − (1 − P(r2→r5)) × (1 − P(r4→r5))
= 1 − [1 − P(r2) × P(RA→RT)] × [1 − P(r4) × P(RB→RT)]
= 1 − (1 − 0.255 × 0.51) × (1 − 0.13 × 0.09) = 14%.

Assumption 3 — Resources are independent of each other. This assumption is useful in calculating the final bug-propagation probability from an instruction through an entire sequence. It allows us to compute the overall bug-propagation probability from the probabilities that the bug has propagated to any of the monitored resources (e.g., registers). With reference to the example, in the blue region of the figure, we indicate that the final resources available are r0, r2, r4 and r5. r0 is included because it is the only register that carried the bug before instruction 1. By the end of the code snippet, bugs in r0 may be reflected in erroneous values in r0 or any of the other registers in the blue area. Using the independence assumption, and a calculation similar to that in the previous paragraph, the probability that the bug manifests at the end of the sequence is:

P(r0 ∪ r2 ∪ r4 ∪ r5) = 1 − P(r0ᶜ ∩ r2ᶜ ∩ r4ᶜ ∩ r5ᶜ)
= 1 − (1 − P(r0)) × (1 − P(r2)) × (1 − P(r4)) × (1 − P(r5))
= 1 − (1 − 0.5) × (1 − 0.255) × (1 − 0.13) × (1 − 0.14) = 72.1%.
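The following Python sketch shows how the three assumptions combine into the aggregated probability, reproducing the numbers of the Figure 3.6 example; the per-pair propagation probabilities are taken from the figure's numbers, while the function structure is only an illustrative rendering of the calculation.

    # taint[r] = probability that register r currently holds a buggy value
    taint = {"r0": 0.50, "r2": 0.0, "r4": 0.0, "r5": 0.0}

    def apply_instruction(taint, dst, src_prop):
        # src_prop maps each source register to the per-i/o-pair propagation
        # probability (1 - masking probability) from that source to dst.
        # Assumptions 1 and 2: instructions and inputs are treated as independent.
        not_propagated = 1.0
        for src, p_pair in src_prop.items():
            not_propagated *= 1.0 - taint[src] * p_pair
        taint[dst] = 1.0 - not_propagated  # dst is overwritten by this instruction

    # Instruction sequence of Figure 3.6
    apply_instruction(taint, "r2", {"r0": 0.51})              # 1: r2 <- r0 << r1
    apply_instruction(taint, "r4", {"r2": 0.51})              # 2: r4 <- r2 << r3
    apply_instruction(taint, "r5", {"r2": 0.51, "r4": 0.09})  # 3: r5 <- r2 << r4

    # Assumption 3: resources are independent; the bug is detected if it survives
    # in any monitored register at the end of the sequence.
    p_all_clean = 1.0
    for p in taint.values():
        p_all_clean *= 1.0 - p
    print(f"aggregated bug-propagation probability: {1.0 - p_all_clean:.1%}")  # ~72.1%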

3.5 Experimental Evaluation

In this section, we first discuss our bug models (Section 3.5.1) and computational cost (Section 3.5.2). We then present BugMAPI's characterization (Section 3.5.3) and accuracy (Section 3.5.4) by comparing against a dynamic analysis, and conclude with a brief discussion (Section 3.5.5).

BugMAPI is implemented in Python, and it is evaluated with test-cases generated for two ISAs: IBM Power and DEC Alpha. For IBM Power, we generated bug-masking probabilities for approximately 4,000 i/o pairs, corresponding to all instructions of the POWER8 processor. Note that our implementation for Power does not accurately take into account

memory address disambiguation, and it assumes that store operations have no overlapping addresses. For DEC Alpha, however, we did implement a simple disambiguation mechanism by dedicating a pool of registers exclusively to load/store base addresses. We only included a subset of 118 instructions and 218 i/o pairs in our Alpha ISA evaluation.

We used Threadmill [19] to generate the IBM Power tests, and a simple in-house test generator for the Alpha ISA tests. Note that BugMAPI is not yet capable of handling multi-threaded tests, analyzing only one thread at a time. We discuss issues related to multi-threaded tests in Section 3.5.5.

3.5.1 Bug Models

The goal of our bug model is to capture many corner-case functional bugs, representative of those often detected in post-silicon validation. To this end, we inject bugs that slightly alter the architectural state (i.e., registers and memory locations). Note that this model mimics microarchitectural bugs that ultimately modify the architectural state. For instance, a malfunctioning cache will eventually be revealed when its incorrect data is read by a load operation. However, purely microarchitectural bugs that lead to execution delays, or bugs that are usually detected through system hangs, cannot be detected by BugMAPI, and thus our models are not concerned with capturing them.

In our reference dynamic analysis, we inject bugs by modifying a register value, or a memory location, right after it has been updated. We considered five types of modifications that could be forced by the bug, each corresponding to a distinct bug model, as we discuss in Section 3.5.5. After a preliminary analysis, we settled on the random-value option because we believe that it most closely resembles what occurs in practice. With this model, a register value or a memory location affected by a bug is replaced by a random value. Note that, when injecting bugs into memory locations, we are careful to generate a value matching the original bit-width. For instructions that trigger multiple updates, we only inject a bug in one output value at a time.
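As a rough illustration of the random-value bug model, the sketch below replaces an architectural value with a random value of the same bit width; the function name and interface are hypothetical and only meant to illustrate the injection step.

    import random

    def inject_random_value_bug(value, bit_width):
        # Random-value bug model: the freshly updated register or memory location
        # is overwritten with a random value of the same bit width. (With negligible
        # probability the random value equals the original, in which case the bug
        # is simply not activated.)
        return random.getrandbits(bit_width)

    # e.g., corrupt a 32-bit memory word right after it has been updated
    buggy_word = inject_random_value_bug(0xDEADBEEF, 32)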

3.5.2 Performance Analysis

We evaluated BugMAPI’s execution time on a dual Intel Xeon system. To analyze 100 instances of the longest Alpha test (the last bar of Figure 3.8), BugMAPI took ap- proximately 11.3 seconds. The dynamic analysis for the same tests took 61 minutes of ISS simulations using 7 cores (one simulation per each bug injection, running one core per simulation). From these values, we estimate that BugMAPI provides a three-orders- of-magnitude speedup over a dynamic analysis. In addition, the probability computation

[Figure 3.7 plot: bug-masking rate (y-axis, 0–80%) versus distance from the end of the test (x-axis, bins of 10 instructions), IBM Power ISA; fitting-curve equations (x in units of 10 instructions): y = 0.177 ln(x) − 0.005, y = 0.117 ln(x) + 0.026, y = 0.139 ln(x) + 0.011; series: dynamic analysis, baseline static analysis, BugMAPI.]

Figure 3.7: Bug-masking rate for individual program instructions, sorted by increasing distance from the end of the test program. Each data point is obtained by averaging over all instructions in the distance window and over 30 distinct randomly-generated tests.

described in Section 3.4.1 took approximately 2 minutes for the subset of instructions of the Alpha ISA and less than one hour for all the instructions of the POWER8 ISA. Note that this step is only required once per ISA, and is easily absorbed over the large number of tests evaluated.

3.5.3 Bug-Masking Characterization

In Figure 3.7, we plot the bug-masking rate of an instruction as a function of its distance from the end of the test program for the IBM Power ISA. This evaluation was carried out to gather a sense of how much program length affects masking probabilities, and to position the accuracy of BugMAPI with respect to a dynamic analysis and a baseline static analysis. Each data point is obtained as the average of the bug-masking rate over all instructions in the window indicated: for example, the first data point on the left averages the last 10 instructions of the test, while the rightmost one averages the instructions that are between 241 and 250 instructions before the end of the test. Moreover, each data point is obtained by averaging over 30 distinct test programs.

The diagram compares three solutions: a dynamic analysis (where bugs are detected by comparing against an equivalent execution with no bug injection), the static baseline analysis, and BugMAPI. As expected, the bug-masking rate is higher for instructions further away from the end of the test, where the checking occurs. However, it is encouraging to notice the quantitative values of this trend: even after 200 instructions, the average bug has a masking probability of only 40–50%, based on BugMAPI's and the dynamic simulation's estimates.

[Figure 3.8 chart: for each test (Power ISA: arith, arith_mem, vector_mem, fp_mem, tests A–F, and all; Alpha ISA: 100-instruction tests without and with memory disambiguation, and 500-instruction tests with disambiguation), masked bugs are broken down into the categories baseline static, BugMAPI – 100%, BugMAPI – <100%, BugMAPI – 0%, and store instruction.]

Figure 3.8: Classification of masked bugs. Masked bugs are classified based on the ability of our baseline analysis and BugMAPI to identify the masking.

Note that BugMAPI's analysis is only 21% less accurate (by comparing the fitting-curve equations) than the dynamic one, while achieving a three-orders-of-magnitude speedup.

3.5.4 Accuracy

To evaluate BugMAPI’s accuracy, we deployed a perfectly accurate dynamic analysis for a number of test programs and then classified each bug that this analysis found to be masked, based on BugMAPI’s analysis of that same test. The tests we considered are 11 industry-strength tests for Power ISA and 3 fully random tests for Alpha ISA. Five of the Power ISA tests were generated using templates that select instructions randomly from a given set: arith uses only arithmetic fixed-point instructions, while arith mem selects from loads and stores as well as arithmetic ones. vector mem and fp mem choose instructions from vector and floating-point instructions, respectively. The remaining six tests target the memory subsystem including memory consistency and address translation. One test program has been generated for each template, varying in length from 26 instructions (test D) to 917 instructions (all). For the Alpha ISA, we report average results over 100 tests with the specified characteristics. For example, in the tests “without memory disambiguation,” memory addresses are always treated as non-overlapping. Our classification provides the following categories:

1. baseline static: Bug found masked by the baseline static analysis.

2. BugMAPI – 100%: Bug found masked by BugMAPI with 100% probability, but not included in the previous group.

3. BugMAPI – <100%: Bug found masked by BugMAPI with less than 100% probability.

4. BugMAPI – 0%: Bug found NOT masked by BugMAPI.

5. store instruction: Bug injected in store instructions or other instructions affecting a store instruction. This corresponds to the accuracy penalty due to lack of memory address disambiguation.

Figure 3.8 reports the findings of our analysis. For instance, we injected 269 bugs in test arith, one for each of its instructions. Among those, 171 were found masked by the dynamic analysis. Our classification found that 107 of the 171 were already identified as masked by our baseline static analysis, an additional 40 were flagged by BugMAPI as possibly masked (<100%), and 24 went undetected by it (0%).

Overall, the baseline static analysis was able to detect, on average, only 62% of the masked bugs in the Power ISA tests, while BugMAPI improves the detection by an additional 15% (100% and <100% categories), for a total of 77% of masked bugs identified. Note that BugMAPI's accuracy varies greatly with the contents of the test: it is very high for compute-intensive tests (the first five in the plot), but lower for memory-intensive tests (the letter-named tests). Moreover, as we pointed out in Section 3.5.2, our Power ISA analysis has no memory disambiguation capabilities, which contributes to some low-accuracy results, e.g., test E. Another type of limitation is highlighted by test A, where 11% of the instructions led to execution failures in the dynamic analysis (e.g., due to stores with misaligned addresses). However, BugMAPI cannot recognize these situations because of its static nature.

The portion of Figure 3.8 related to the Alpha ISA focuses on investigating the impact of memory disambiguation. The first two test programs in this group differ only in that aspect. Note that disambiguation provides a slight improvement for the baseline static analysis, an improvement that BugMAPI inherits. Finally, the last two test programs evaluate the accuracy impact due to test length: the accuracy of the bug-masking assessment remains stable even when the test length increases five-fold.

Sources of inaccuracy. BugMAPI strives to keep the computation lightweight at the cost of some accuracy. We have identified a few sources of inaccuracy. First, our solution tracks information flow at a coarse granularity: a whole register or memory location, except for a few special-purpose registers whose fields are treated independently (e.g., FPSCR in the Power ISA). While a finer-granularity analysis (e.g., bit-level) would lead to a more accurate estimate, it would be much more computation-heavy. Our simplifying assumptions

[Figure 3.9 chart: for each bug model (random value, bit flip, binary complement, missing update, spurious update), bugs are classified as unmasked, baseline static, BugMAPI – 100%, BugMAPI – <100%, BugMAPI – 0%, or false positive; generation method: Alpha ISA, 100 random instructions, memory disambiguation enabled.]

Figure 3.9: BugMAPI’s accuracy on various bug models.

(Section 3.4.2) also contribute to inaccurate estimates, because they ignore correlations between instructions, inputs, and resources. We observed a minimal fraction of unmasked bugs that are reported as masked by our baseline analysis: 1 out of 998 unmasked bugs for the Power ISA tests, and 35 out of 4,105 unmasked bugs for the Alpha ISA tests.

Finally, BugMAPI fails to recognize correlations between instructions that are due to test-template structures. For instance, when computing a memory address, multiple instructions are involved that are not independent of each other. To address this limitation, BugMAPI could analyze the instruction block as a single instruction with several i/o pairs, and then use those probabilities in the sequence where the block is embedded.

3.5.5 Discussion

Accuracy sensitivity on bug models. We performed additional experiments to measure the sensitivity of BugMAPI to other bug models, using five different types of bug manifestations: random-value overwrite, single-bit flip, binary complementation, missing update, and spurious update. Bugs of the first three types manifest as an incorrect value in the correct target register or memory location. In the fourth type, the target keeps its previous value, ignoring the new one. Lastly, a spurious update writes to a random target other than the correct one.

Figure 3.9 shows the accuracy for each bug model. As shown at the bottom of each bar, 38–43% of the bugs are unmasked and detected at the end of the tests. Among masked bugs, our baseline analysis identifies 53–81% of them, and BugMAPI adds 6–13% on top of the baseline analysis. While false positives remain minimal for the first four bug models (less than 1%), we find a 15% false-positive rate with spurious update. This is because our analysis does not track a buggy value manifesting at a random register or memory location other than the correct target.

[Figure 3.10 illustration — original test: 1: r3 ← r0 + r1; 2: r4 ← r0 − r2; 3: r5 ← r0 & r3; 4: r3 ← r2 << r4; 5: r5 ← r4 == r4 (masking instruction). Mask-reduced test: instructions 1–4 unchanged, then 4*: r4 ← r5 xor r4 (mask-preventing instruction), followed by 5: r5 ← r4 == r4.]

Figure 3.10: Mask-preventing code instrumentation. BugMAPI can identify masking instructions so that mask-preventing instructions can be inserted.

Applying BugMAPI to multi-threaded programs. Analyzing multi-threaded programs can be challenging because of the difficulty of characterizing interactions among threads [170]. In these programs, for instance, inter-thread data dependencies may arise through accesses to shared memory regions. To extend our analysis to these programs, we can mark inter-thread load and store instructions and apply an inter-thread analysis to them. A major challenge comes from the non-deterministic execution of multi-threaded programs: inter-thread data dependencies may change from time to time. One solution to this challenge is to profile the frequency of each data-dependency path, and then compute a weighted average across all paths. This approximation may not accurately predict bug-masking occurrences for a specific execution of a program, but it can at least estimate the overall bug-masking likelihood of the program.

3.6 Application: Masking Reduction

The results of BugMAPI’s static analysis can be used in various ways to expedite post- silicon validation. In this section, we showcase a BugMAPI’s application that reduces bug-masking occurrences in random instruction tests, as illustrated in Figure 3.10. In this application, we first run BugMAPI to collect all instructions that possibly expunge the results of any prior instruction by overwriting its target register. We then instrument mask- preventing instructions that use the value to be overwritten as a source operand so that any buggy value in the operand can be delivered to a propagated (non-overwritten) register. We implement a simplistic mask-preventing code instrumentation that inserts xor in- structions before mask-causing instructions identified by our analysis. In the random test shown in the left side of Figure 3.10, instruction 5 is identified as a mask-causing instruc- tion because it expunges the results from instructions 1 and 3. We then insert an xor in- struction using the operands of instruction 5: the target register (r5) and one of the source registers (r4), as shown in the right side of Figure 3.10. Note that this register selection

Table 3.1: Number of bugs masked/unmasked by dynamic analysis for our enhanced tests

Test-case             Unmasked       Masked          Total
Original tests        4,138 (43%)    5,486 (57%)     9,624
Mask-reduced tests    8,893 (72%)    3,384 (28%)    12,277
(all columns report numbers of activated bugs)

does not require additional registers to be reserved for the inserted instructions. Also note that the xor instruction itself does not mask any buggy value in its source operands, because its bug-masking probability is 0%. We ignore mask-causing store instructions in order not to perturb memory-access patterns, which limits our mask-prevention capability to some degree. While not implemented here, a pair of load and xor instructions could further reduce such masking occurrences.

Table 3.1 shows the number of bugs that are classified as either unmasked or masked throughout our dynamic analysis experiments. We used the Alpha ISA with 100 instructions per test, and generated 100 different instruction sequences (10,000 instructions in total), using the random-value bug model and injecting a single bug into each instruction in the tests, including the instructions that we added to reduce the masking rate. As shown in the table, 57% of the activated bugs in our original random tests end up being masked. This bug-masking rate is significantly lowered by our mask-reduction technique, to 28%. Note that our code instrumentation increases the length of each test, due to the inserted xor instructions.

Our instrumentation does not eliminate all masking occurrences because of two limitations: the lack of patches for mask-causing store instructions, and false negatives in our analysis, as discussed in Section 3.5.4.
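The sketch below illustrates the flavor of this instrumentation on a simplified instruction representation; the tuple format and the liveness heuristic used to flag mask-causing instructions are assumptions made for illustration (BugMAPI identifies them through its probability analysis), but the example reproduces the transformation of Figure 3.10.

    def reduce_masking(instructions):
        # Insert a mask-preventing xor before each mask-causing instruction.
        # Here an instruction is treated as mask-causing when it overwrites a
        # register whose not-yet-consumed value it does not read; the inserted
        # xor folds the value about to be overwritten into one of the sources,
        # so any buggy value keeps propagating. Stores are left untouched.
        live_defs = set()   # registers holding a result not yet consumed or overwritten
        out = []
        for dst, op, srcs in instructions:
            if op != "store" and dst in live_defs and dst not in srcs and srcs:
                out.append((srcs[0], "xor", [dst, srcs[0]]))  # mask-preventing instruction
            out.append((dst, op, srcs))
            live_defs.difference_update(srcs)
            if op != "store":
                live_defs.add(dst)
        return out

    # Example from Figure 3.10: instruction 5 overwrites r5, which still holds the
    # unconsumed result of instruction 3, so "r4 <- r5 xor r4" is inserted before it.
    test = [("r3", "add", ["r0", "r1"]), ("r4", "sub", ["r0", "r2"]),
            ("r5", "and", ["r0", "r3"]), ("r3", "shl", ["r2", "r4"]),
            ("r5", "cmpeq", ["r4", "r4"])]
    for ins in reduce_masking(test):
        print(ins)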

3.7 Related Work

Test-case generation for microprocessor verification. Randomly generated test-cases are often used to trigger hard-to-find, unexpected bugs in microprocessor validation [19, 192]. For instance, [19] creates random test-cases by leveraging designer-made test-templates, each targeting a specific test scenario. The tests often select high-quality instructions (i.e., those likely to discover bugs) among those specified in the templates by solving constraint satisfaction problems. Other approaches leverage formal models, often specified through architectural description languages [146].

Researchers have also proposed solutions that create tests with self-checking properties to overcome low observability in post-silicon validation without additional instrumentation

(e.g., scan chain, design-for-debug network) [198, 87, 129], as we detailed in Section 2.1. Note, however, that tests of this type may not be representative of realistic program sequences.

Information-flow analysis has been investigated extensively in the computer security area [77, 174, 154]. Information flow occurs when information is transferred among distinct program objects (e.g., variables) [77]. To secure the program execution, access-control policies enforce confidentiality throughout the program [174]. In this context, taint analysis (e.g., [154]) takes a dynamic approach that uses run-time information to detect confidentiality violations. Our bug-masking analysis adopts the principle of information flow from the security area, but with a completely different goal: we aim at identifying perturbations due to bug occurrences that may be erased by the program's subsequent computation. Moreover, in our case, all instruction resources are potential sources of bugs, while in taint analysis only data from untrusted devices is tracked. Information-flow analysis has also been used in soft-error analysis, for instance in [83], discussed in Section 2.1, but without leveraging an instruction's unique characteristics to accurately estimate the propagation probability. Finally, [82] shares with us the deployment of information-flow analysis concepts for hardware verification, annotating RTL designs to compute observability coverage metrics.

3.8 Summary

In this chapter, we proposed BugMAPI, a technique that helps catch subtle design bugs in post-silicon microprocessor validation. BugMAPI is a novel, lightweight analysis technique to quickly compute the bug-masking probabilities of test programs. It performs a static information-flow analysis on a given test program, estimating the likelihood that bugs go undetected by the checkers at the end of the test program.

BugMAPI decomposes the probability computation from the program level to the instruction level. It first computes the bug-masking probability of each instruction's input-output pairs, either analytically or experimentally. It then leverages probabilistic approximations to compute the test program's overall bug-masking probability. Experimentally, we have found that BugMAPI provides 77% of the accuracy of a dynamic simulation, at a computation time three orders of magnitude shorter.

In the next two chapters, we transition to discussing how to address the correctness of the memory subsystem in systems on chip. This is a particularly challenging problem because of the high number of components that affect the memory interactions of a modern system on chip.

CHAPTER 4

Efficient Post-Silicon Memory Consistency Validation

In this chapter, we propose an efficient validation framework to address design bugs in memory consistency. Memory consistency is a key feature required for efficient multitasking and multithreading in multicore systems on chip (SoCs). Unfortunately, the difficulty of memory consistency validation is on the rise because of the growing complexity of memory subsystems. Memory subsystems usually deploy multiple levels of cache modules, distributed across the entire chip. These cache modules maintain data coherence with each other. Some cache modules are dedicated to specific microprocessor cores, while others are shared among multiple ones. In addition to caches, memory subsystems also include multiple main-memory channels and scratchpad memory components. All these memory components constitute a large, shared memory space, creating various interactions among the components. Consequently, memory operations experience varying memory-access latency, and hence the order in which memory operations complete is also affected by system-level microarchitectural state.

Memory consistency validation strives to check whether the system's memory-ordering behaviors obey its memory consistency model. To fully verify the memory-access ordering patterns, the system must be tested over all possible interleavings, an almost impossible feat. Instead, we strive to test as many orderings as possible within the time allowed for validation. To this end, we choose to utilize post-silicon validation platforms, because they can execute programs at the system's native speed, thus validating many ordering patterns quickly.

In post-silicon memory-consistency validation, we notice that result-checking is the key bottleneck in the entire validation flow. During result-checking, we must consider happens-before relationships among individual memory operations in multithreaded instruction tests. This justification process strives to detect contradictory happens-before relationships, if they exist. To find contradictory relationships, we construct a constraint

graph where vertices represent memory operations and edges indicate partial orders between two memory operations. We then find a cyclic dependency in the graph by applying topological sorting, which requires significantly more effort than the other validation steps (e.g., test execution).

Thus, the key contribution of this effort is reducing the computational complexity of this result-checking computation (i.e., topological sorting). Traditionally, topological sorting is applied to the entire constraint graph corresponding to each test run. Note that post-silicon validation relies on repetitive test runs to fully verify the different microarchitectural behaviors that the system may exhibit. We noticed that these repetitive test runs exhibit similar memory-ordering patterns. This similarity, in turn, provides us with an opportunity to slash the computational complexity by leveraging decomposition.

The validation framework we propose in this chapter is called MTraceCheck. MTraceCheck strives to boost the performance of checking the correctness of the memory-access interleavings observed during the test runs. It does so by decomposing the problem of validating many large constraint graphs into smaller problems. Specifically, for each new graph, we extract the portion that has never been observed before, and then we simply verify that the extracted portion does not contain any cyclic dependency. To further streamline the problem, we approximately sort the graphs by smallest incremental difference using a signature-based approach.

Here, we perform no reassembly computation from the decomposed graphs, even though this may miss intricate memory-ordering violations that could only be caught by sophisticated inferences applied across decomposed graphs. Having said that, we show experimentally that our solution experiences practically no accuracy degradation in bug-injection experiments conducted on a full-system multicore simulator.

4.1 Background and Motivation

Memory consistency model. In this chapter, we focus on checking that the memory-access interleavings observed during test executions comply with the memory consistency model (MCM). An MCM provides an interface between hardware designers and software developers by specifying what memory re-orderings are acceptable and could be observed during a software execution [185, 20, 119, 180, 21]. Software developers must be aware of the memory re-ordering behaviors allowed by the MCM when they code multithreaded programs. Memory operations can be re-ordered, and thus executed in an order different from the order specified in the program, as long as such re-ordering is allowed by the MCM. Hardware designers, on the contrary, must ensure that the system

[Figure 4.1 illustration — multi-threaded test: thread 0: r0 ← mem[0], then mem[1] ← 1; thread 1: r1 ← mem[1], then mem[0] ← 1. Initial memory values: mem[0] = mem[1] = 0; register values after test execution: r0 = r1 = 1. In the constraint graph, some dependency edges are enforced by TSO (determined statically) and others are observed orders (decided dynamically).]

Figure 4.1: A two-threaded test program execution and its constraint graph. Among various test results, r0=r1=1 is considered invalid in the TSO model. Correspondingly, its constraint graph identifies a cycle.

exhibits no memory re-ordering behaviors disallowed by the MCM. In other words, memory subsystem components must not re-order memory operations if such re-ordering behaviors can be exposed to software (i.e., at the architecture level). Here, we strive to catch design bugs made by hardware designers: when they fail to abide by the MCM, the hardware implementation may exhibit memory re-ordering behaviors that software developers consider invalid.

There is a wide spectrum of MCMs proposed in the literature. At one end of the spectrum, MCMs impose almost no memory ordering constraints, allowing a variety of re-orderings among memory operations. Such relaxed MCMs (e.g., relaxed memory order (RMO) [202, 36, 140]) are beneficial for improving overall system performance, as they allow more memory operations to be processed in parallel, without being subject to ordering constraints. At the other end of the spectrum, MCMs can impose all memory ordering constraints (i.e., load→load, load→store, store→load, store→store). Such strong MCMs (e.g., total store order (TSO) [102, 180], which relaxes the store→load constraint from the aforementioned four constraints) are very intuitive for software developers; they can easily write multithreaded programs for a system that implements a strong MCM. However, the system may not be able to fully utilize its memory subsystem, because many memory operations in multithreaded programs must be serialized due to ordering constraints. For instance, with reference to the simple program in Figure 4.1, there are 4 different outcomes allowed by the RMO: r0 with either 0 or 1, and r1 with either 0 or 1. Some of these outcomes are forbidden in TSO, which disallows the outcome r0=r1=1. As a program's size increases, so do the number of memory accesses and the number of possible re-orderings.

Constraint graph of memory operations. To validate the correct implementation of the MCM, whether in post-silicon or in any simulation-based environment, the observed ordering of memory operations must be recorded on the fly. These access logs are then checked

against the re-ordering rules allowed by the MCM of the multi-core architecture. As we also discussed in Section 2.2, several works have proposed to use a constraint graph to validate the observed access sequence (e.g., [57, 96, 63, 65, 136, 23, 131, 139, 132]). The vertices of this graph correspond to memory operations (either store or load), while edges mark a dependency between two operations. Figure 4.1 provides the constraint graph for a simple two-threaded program execution under the TSO model. The outcome shown in the figure reveals a consistency violation, because there is a cyclic dependency in the corresponding constraint graph. Indeed, in TSO, store operations cannot be speculatively performed before a preceding load operation, thus r0 and r1 cannot both be equal to 1 at the end of the execution. The violation is thus indicative of a microarchitectural optimization that allows multiple outstanding memory operations, against the model's requirements.

Constraint graph construction has also been thoroughly studied in recent years. In this chapter, we adopt the same notation as in recent previous work [23, 131] and model three types of observed edges: reads-from (rf), from-read (fr), and write serialization (ws). Besides observed edges, we also model intra-thread consistency edges as defined by the MCM. For instance, in TSO, the only reordering allowed is that loads can be completed before a preceding store operation.

Note that checking for cyclic dependencies in a constraint graph usually requires heavy computation. There are two conventional approaches: topological sorting and depth-first search (DFS). For both methods, the computational complexity is known to be Θ(|V| + |E|), where V and E are the sets of vertices and edges, respectively [71]. In post-silicon validation, where many test results are generated at the system's native speed, checking these graphs can be the key bottleneck of the entire memory validation process, as we will show in Section 4.5.3.
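One standard way to implement such a check is Kahn's algorithm for topological sorting, sketched below in Python; the vertex labels encode the Figure 4.1 scenario and are purely illustrative.

    from collections import defaultdict, deque

    def topological_sort(vertices, edges):
        # Kahn's algorithm: return a topological order of the constraint graph,
        # or None if a cyclic dependency (a consistency violation) exists.
        succs = defaultdict(list)
        indegree = {v: 0 for v in vertices}
        for u, v in edges:
            succs[u].append(v)
            indegree[v] += 1
        ready = deque(v for v, d in indegree.items() if d == 0)
        order = []
        while ready:
            u = ready.popleft()
            order.append(u)
            for v in succs[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
        return order if len(order) == len(indegree) else None

    # Figure 4.1 under TSO with the outcome r0 = r1 = 1: each load is ordered
    # before the later store in its thread, and each load read the value written
    # by the other thread's store (reads-from edges), closing a cycle.
    ops = ["ld0", "st0", "ld1", "st1"]
    edges = [("ld0", "st0"), ("ld1", "st1"),   # intra-thread order kept by TSO
             ("st1", "ld0"), ("st0", "ld1")]   # observed reads-from edges
    print(topological_sort(ops, edges))        # None -> violation detected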

To capture memory-access interleavings at runtime, test programs must be instrumented, as in prior works [96, 136]. Specifically, every store operation is assigned a unique ID, which is the value actually written into memory, so that the operation can be easily identified by subsequent loads. In addition, memory-access interleavings are captured by inspecting the values read by load operations. Thus, we can establish the uniqueness of a test execution based on its reads-from relationships; two executions have experienced distinct memory-access interleavings when they exhibit at least one different reads-from relationship.

Testing framework. In evaluating the proposed solution, we adopt a constrained-random testing framework as follows: we first generate a number of test programs that are designed

[Figure 4.2 illustration — the four-step MTraceCheck flow: (1) test generation (constrained-random multi-threaded test), (2) code instrumentation (signature initialization, signature updates after loads, and signature stores), (3) test execution (each test is run multiple times; each test run creates a signature), and (4) incremental result-checking (constraint graphs are reconstructed from the signatures and only their differences are checked).]

Figure 4.2: MTraceCheck overview. Multi-threaded constrained-random tests are generated and then augmented with our observability-enhancing code, which generates a memory-access interleaving signature at runtime. After multiple test runs, signatures are collectively checked by exploiting their similarities, accelerating the validation process.

to likely exhibit numerous distinct memory-access interleavings. Each test program is run repeatedly until rare interleavings are observed. There are a number of prior studies focusing on test generation (see Section 4.8); in contrast, we focus mainly on the subsequent verification steps. Specifically, we strive to (1) improve the observability of the memory-access interleavings occurring during test execution while minimally perturbing the original access sequences. We also want to (2) reduce the computation requirements for violation checking, so as to boost the coverage and efficiency that can be attained in validating MCMs in post-silicon [115].

4.2 MTraceCheck Overview

Figure 4.2 summarizes the MTraceCheck validation flow in four steps.

Step 1: Test generation. MTraceCheck generates multi-threaded tests that are designed to stimulate rare memory-access interleavings. Specifically, the test generator has several parameters to guide the generation process: the number of threads, the number of memory operations, the number of memory locations, etc. The test generator can create three types

of memory operations: load, store, and fence. The test generator takes the occurrence frequency of each type as a parameter. The address of each load or store operation is randomly chosen within the shared memory space.
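A minimal sketch of such a generator is shown below; the frequency values and the data representation are illustrative assumptions, not the parameters used in our experiments.

    import random

    def generate_thread(num_ops, num_addrs, freq=None):
        # One thread of a constrained-random test: operation types follow the
        # requested occurrence frequencies; addresses are drawn uniformly from
        # the shared memory space. The default frequencies are illustrative.
        freq = freq or {"load": 0.45, "store": 0.45, "fence": 0.10}
        kinds, weights = list(freq), list(freq.values())
        ops = []
        for _ in range(num_ops):
            kind = random.choices(kinds, weights=weights)[0]
            addr = None if kind == "fence" else random.randrange(num_addrs)
            ops.append((kind, addr))
        return ops

    # e.g., a 2-thread test with 50 operations per thread over 32 shared locations
    test = [generate_thread(50, 32) for _ in range(2)]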

Step 2: Code instrumentation. The generated tests are augmented with our observability-enhancing code, which computes a compact signature corresponding to the memory-access interleaving observed in a given test execution. To this end, we first perform a static code analysis on the generated multi-threaded test to identify memory operations that share the same address. We then insert signature-update code after each load operation to log the value it reads. We describe this step in detail in Section 4.3.

Step 3: Test execution. The instrumented tests are run multiple times on a post-silicon validation platform. Throughout the test execution, each thread computes its own signature, capturing the values read by the load operations in that thread. These per-thread signatures are collected later to form an execution signature once all threads have completed.

Step 4: Incremental result checking. After the test execution is done, we collect all the execution signatures from the repeated runs. We reconstruct the constraint graphs that are represented by the signatures. We then check the graphs in a collective manner that detects and exploits the similarity among different runs (and thus among distinct memory-access interleavings). This collective graph checking is enabled by our novel decomposition technique, which extracts the differences among graphs. Only the differences are incrementally checked by the proposed graph-checking scheme. This step is detailed in Section 4.4.

4.3 Code Instrumentation

This section discusses our code instrumentation approach, which records the memory-access interleavings observed during a test's execution in post-silicon validation. Our solution is inspired by the control-flow profiling methodology proposed in [53]. However, instead of profiling control-flow paths, we repurpose the profiling framework to log memory-access interleavings in a compact signature format that will then be categorized and checked during the subsequent checking step, discussed in Section 4.4.

4.3.1 Memory-Access Interleaving Signature

To track memory-access interleavings, we need to log the values that are being loaded throughout the test execution. This task can be carried out by simply flushing the values

[Figure 4.3 content — example test and signature computation (operation 0 denotes the initial shared-memory value):
thread 0 (4 ops): 1: store to 0x100; 2: load from 0x100; 3: load from 0x104; 4: store to 0x100
thread 1 (3 ops): 5: store to 0x104; 6: store to 0x100; 7: load from 0x100
thread 2 (3 ops): 8: store to 0x104; 9: store to 0x100; 10: store to 0x104
step 1 — static code analysis, valid values: load 2: {1, 6, 9}; load 3: {0, 5, 8, 10}; load 7: {1, 4, 6, 9}
step 2 — weighting each valid value: load 2: {1→0, 6→1, 9→2}; load 3: {0→0, 5→3, 8→6, 10→9}; load 7: {1→0, 4→1, 6→2, 9→3}
step 3 — accumulating weights at runtime: signature = 2+6 = 8 (thread 0), 1 (thread 1), 0 (thread 2)]

Figure 4.3: Memory-access interleaving signature. Each signature value corresponds to a unique memory-access interleaving observed during the test run. For each load, we profile all possible values, and each is assigned an integer weight. At runtime, the weights of the observed values are accumulated, forming a per-thread signature. The per-thread signatures are then gathered to form an execution signature.

loaded into the register file to a dedicated memory area, as illustrated in [96]. Lacking a dedicated storage resource and transfer channel, this task is bound to interfere with the execution flow of a test, possibly altering the latency and interleavings experienced by the test's accesses. Moreover, it burdens overall execution time with additional memory store operations.

Thus, to reduce the impact of data logging, we introduce the new concept of a memory-access interleaving signature: MTraceCheck bypasses the frequent storing of loaded values by computing a compact signature of the loaded values. To this end, we must augment the original test with signature-computation code. Note that our signature computation resembles the path-encoding computation in [53].

Figure 4.3 illustrates our code instrumentation process. In the first step, we perform a static code analysis to collect all possible values that could be the result of each load operation, as illustrated in the first and second boxes. This static analysis can attain perfect memory disambiguation if the test generator is configured to do so; this is straightforward to attain with a constrained-random test generator. In the case of general software test programs, it is still possible to do this analysis either (1) by excluding from the signature computation all the memory addresses that cannot be statically disambiguated, or (2) by dynamically profiling the actual accesses using a binary instrumentation tool.

We then assign a weight to each loaded value, as illustrated in the third box of the

[Figure 4.4 content — instrumented code for the test of Figure 4.3 (blue statements in the original figure indicate instrumented code):
thread 0: init: sig = 0; 1: store to 0x100; 2: load from 0x100 { if (value == 1) sig += 0; else if (value == 6) sig += 1; else if (value == 9) sig += 2; else assert error }; 3: load from 0x104 { if (value == 0) sig += 0; else if (value == 5) sig += 3; else if (value == 8) sig += 6; else if (value == 10) sig += 9; else assert error }; 4: store to 0x100; finish: store sig to memory
thread 1: init: sig = 0; 5: store to 0x104; 6: store to 0x100; 7: load from 0x100 { if (value == 1) sig += 0; else if (value == 4) sig += 1; else if (value == 6) sig += 2; else if (value == 9) sig += 3; else assert error }; finish: store sig to memory
(thread 2 is not shown here; it always stores sig = 0 to memory, as it does not have a load operation)]

Figure 4.4: Code instrumentation. A signature is computed as the test runs, using chains of branch and arithmetic instructions. The computed signature is then stored in a non-shared memory region at the end of the test.

figure. The weights are integer values consciously assigned so as to obtain unique final signature values. Specifically, we use consecutive integers for the first load operation. Then, if the first load operation could retrieve n distinct values, we use multiples of n for the weights of the second load operation, and so on. For instance, load operation 2 can only read the value stored by operation 1, 6, or 9. The weights for these values are assigned to 0, 1, and 2, respectively. Moving to the next load operation, 3, we use multiples of 3 for the weight of each possible loaded value, because we had three options in the prior load operation. If there were another load operation after operation 4, we would use multiples of 12 for its weights, because there are already 3×4 combinations for the previous loads. By allocating non-aliasing weights (except for 0) to each loaded value and operation, we can guarantee a unique correspondence between signatures and sets of loaded values, that is, a 1:1 mapping between signatures and interleavings. In the figure, suppose that the two underlined values (written by operations 9 and 8) are observed in thread 0; then the signature value for this thread would be 8. Note that weights are assigned and accumulated independently for each thread. Finally, we form the execution signature for the test execution by concatenating the per-thread signatures obtained.

Figure 4.4 illustrates an example of instrumented code corresponding to the original code shown in Figure 4.3. The sig variable is initialized at the beginning of the test, and increased after each load operation. Specifically, each load operation is followed by a chain of branch statements depending on the loaded value. While not essential for observability

enhancement, we append an additional assertion at the tail of the chain so that obvious errors (e.g., a program-order violation) can be caught instantly, without running constraint-graph checking. This instrumentation does not perturb the sequence of memory accesses in the test execution, unless an error is caught by the assertion. With branch predictors in place, MTraceCheck only slightly increases test execution time, as shown in Section 4.5.3. Note that this approach may entail a minimal number of false negatives due to these added branch operations, which could alter the original branch-prediction pattern of the test.
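The weight-assignment scheme behind this instrumentation can be sketched as follows; the function is an illustrative Python rendering (MTraceCheck emits the equivalent branch-and-add chains in assembly), and it reproduces the thread-0 signature of Figure 4.3.

    def assign_weights(possible_writers_per_load):
        # For each load (in program order within one thread), the i-th possible
        # writer gets weight i * multiplier, where the multiplier is the product
        # of the cardinalities of all earlier loads; every combination of observed
        # writers therefore maps to a distinct per-thread signature value.
        weights, multiplier = [], 1
        for writers in possible_writers_per_load:
            weights.append({w: i * multiplier for i, w in enumerate(writers)})
            multiplier *= len(writers)
        return weights, multiplier  # multiplier is now the signature cardinality

    # Thread 0 of Figure 4.3: load 2 may observe stores {1, 6, 9}; load 3 may
    # observe {0, 5, 8, 10} (0 denotes the initial memory value).
    weights, cardinality = assign_weights([[1, 6, 9], [0, 5, 8, 10]])
    observed = [9, 8]                       # values read by loads 2 and 3 at runtime
    signature = sum(w[v] for w, v in zip(weights, observed))
    print(signature, cardinality)           # 8 12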

Algorithm 4.1 Signature decoding procedure
1: Input: signature, multipliers, store_maps
2: Output: reads_from
3: for each load L from last to first in test program do
4:   multiplier ← multipliers[L]
5:   index ← signature / multiplier
6:   signature ← signature % multiplier
7:   reads_from[L] ← store_maps[L][index]
8: end for
9: return reads_from

4.3.2 Per-thread Signature Size and Decoding

In a relaxed MCM with no memory barrier, the signature size is proportional to the number of memory-access interleavings that are allowed by the MCM. Namely, the more diversified the possible interleavings, the larger the signature. This relationship does not hold for stronger MCMs, such as TSO, as we discuss later in Section 4.7.

We estimate the size of the signatures generated under the assumption that all accesses can be disambiguated and used for the signature computation (as is the case for constrained-random tests). Let T be the number of threads, S and L the number of stores and loads per thread, respectively, and A the number of shared memory locations. Assuming that addresses are chosen uniformly at random, each load operation could read the same expected number of distinct values. The value read by a load is either the last stored value from the same thread (the 1 in the expression below), or any of the stored values to that address (S/A per thread, on average) from any of the other threads (T − 1). Hence, we estimate that each per-thread signature must be capable of representing

signature cardinality = {1 + (T − 1) × S/A}^L

distinct values. Note that the power of L is required to extend our estimation to all the load

[Figure 4.5 content — a two-threaded test (thread 0: st A, ld B, st A; thread 1: st B, ld A, st B), its constraint graph with edges enforced by the MCM, reads-from relationships, and from-read relationships, and the resulting topologically-sorted order.]

Figure 4.5: Topologically-sorted constraint graph. The constraint graph on the left can be topologically sorted as shown on the right, indicating no consistency violation.

operations in the thread. The execution signature requires T times that space, since each thread generates a per-thread signature. As an example, with S=L=50, A=32 and T=2, a per-thread signature must be capable of representing {1 + (50/32) × (2 − 1)}^50 ≈ 2.7×10^20 ≈ 2^68 sets of loaded values. Thus, 68 bits of storage are required for each thread. In the systems that we used in our evaluation (Section 4.5.1), registers are either 64-bit or 32-bit wide; thus, a 68-bit signature must be split into multiple words. To support multi-word signatures, we statically detect overflow when instrumenting the tests with observability-enhancing code. When an overflow is detected, we add another register to store the signature for the thread (and another variable, sig2, in Figure 4.4); we then start over the signature computation in the new register, resetting the weight multipliers. In Section 4.5.4, we provide detailed experimental results on the average signature size for various constrained-random tests. We further discuss how to reduce the signature size in Section 4.7.
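The estimate above can be evaluated directly; the small sketch below reproduces the 68-bit figure for S=L=50, A=32, and T=2.

    import math

    def per_thread_signature_bits(T, S, L, A):
        # Each of the L loads can observe 1 + (T - 1) * S / A distinct values.
        cardinality = (1 + (T - 1) * S / A) ** L
        return cardinality, math.ceil(math.log2(cardinality))

    card, bits = per_thread_signature_bits(T=2, S=50, L=50, A=32)
    print(f"{card:.1e} values -> {bits} bits per thread")   # about 2.7e+20 -> 68 bits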

Reconstructing memory-access interleavings and constraint graphs. Algorithm 4.1 summarizes our decoding process for a per-thread signature. The goal of the reconstruction process is to translate a signature into a set of observed reads-from relationships, which allows us to build a constraint graph for the corresponding test execution. The algorithm we developed first splits the execution signature into a set of per-thread signatures. Then it applies the reconstruction to each signature obtained, starting from the last load operation in the test and walking backwards toward the first one. Note that we maintain a special table for each test under analysis, called the multipliers. The table keeps track of the weight multiplier used for each load instruction in the test, and it is generated during test instrumentation. Likewise, we maintain an index-to-store mapping table for each load operation, called the store_maps. Thus, during reconstruction, our algorithm looks up the weight multiplier to use for the load operation in the multipliers, and the corresponding store operation in the store_maps (third box in Figure 4.3). Then the signature is decreased accordingly to remove the weight component of the load just

reconstructed.

Note that all other information necessary to build the constraint graph is gathered statically during the instrumentation process: the multipliers, the store_maps, the intra-thread dependencies specified by the MCM (e.g., load→load), and the write-serialization order. We then use this information to generate constraint graphs, as discussed in Section 4.1, one for each distinct test execution.
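A direct Python rendering of Algorithm 4.1 is shown below, decoding the thread-0 signature of Figure 4.3; the list-based layout of multipliers and store_maps is an illustrative assumption.

    def decode_signature(signature, multipliers, store_maps):
        # Recover, for each load of one thread, which store it observed
        # (its reads-from relationship), walking from the last load to the first.
        #   multipliers[i] - weight multiplier used for load i during instrumentation
        #   store_maps[i]  - maps a weight index back to the observed store
        reads_from = {}
        for load in reversed(range(len(multipliers))):
            index, signature = divmod(signature, multipliers[load])
            reads_from[load] = store_maps[load][index]
        return reads_from

    # Thread 0 of Figure 4.3: multipliers 1 and 3, signature 8 -> load 3 observed
    # store 8 and load 2 observed store 9.
    print(decode_signature(8, [1, 3], [[1, 6, 9], [0, 5, 8, 10]]))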

4.4 Collective Graph Checking

As briefly introduced in Section 4.1, topological sorting is a classic solution to check for a memory consistency violation. A topological sort of a directed graph is a linear ordering of all its vertices such that, if the graph contains an edge (u,v), then u appears before v in the ordering [71]. If a constraint graph cannot undergo topological sorting, that is an indication that a cyclic dependency is present, and hence that a violation of the MCM occurred. Note, however, that if some dependency edges are missing, false negatives may result from the analysis. For instance, topological sorting may not consider inferred dependency edges, such as those introduced in [48]. Figure 4.5 shows an example of a constraint graph and its topologically-sorted counterpart. In this example, there is no violation and thus a topological sort exists.

In prior works, where topological sorting was applied to constraint graphs obtained from post-silicon tests, the graph obtained from each test execution was analyzed individually. However, as we illustrate in Section 4.5.2, we observed that many test runs exhibit similar memory-access interleaving patterns. Thus, we propose to leverage the similarity among test runs to reduce the computation required to validate each execution. Below, we present a collective constraint-graph checking solution: we first investigate similarities among constraint graphs (Section 4.4.1), and then present our novel re-sorting technique that can quickly validate a collection of constraint graphs (Section 4.4.2). In the following discussion, we assume that all duplicate executions have been removed beforehand. In our implementation, duplicate executions are filtered out when execution signatures are sorted (Section 4.4.1).

4.4.1 Similarity among Constraint Graphs

Constraint graphs generated from the same test program all have the same vertices but possibly different edges. Thus, we identify the difference between two constraint graphs by tagging these differing edges. After we isolate the portion of a graph that differs from a previously analyzed graph, we can restrict the check to that discrepant portion for the sake of topological sorting.

[Figure 4.6 plots: number of differing reads-from relationships (y-axis) versus k (x-axis, 1–100) for test 1 (2 threads, 50 operations, 32 shared locations; 172 unique interleavings) and test 2 (4 threads, 50 operations, 32 locations; 1,000 unique interleavings).]

Figure 4.6: Measuring similarity among constraint graphs using k-medoids clustering. The number of differing reads-from relationships decreases as k increases. However, the cluster tightness is reduced with a more diverse pool of possible interleavings (right graph).

Thus, the graph-checking computation can be greatly reduced by incrementally verifying, that is, sorting, only this portion.

Limit study – k-medoids clustering. While this incremental verification is promising, finding the graph most similar to the one under analysis is non-trivial and often computation-heavy. To gain insights on this aspect, we performed a preliminary study to identify a handful of graphs representative of the entire set of constraint graphs generated from many executions of a test. Specifically, we conducted a k-medoids clustering analysis [12] on the constraint graphs from each of two test programs, measuring the total number of reads-from relationships that differ from the closest medoid graph. The selected k medoid graphs are considered representative of the entire set. Here, reads-from relationships are obtained from an in-house architectural simulator, which selects memory operations to execute in a uniformly random fashion, one at a time, and without violating sequential consistency (SC) [119]. For simplicity, we assumed single-copy store atomicity [48] for this limit study.

Figure 4.6 shows the total number of differing reads-from relationships for varying values of k. In test 1, where we obtained 172 unique executions out of 1,000, the differing reads-from relationships decrease rapidly as k increases. However, in test 2, where every execution is unique, many discrepant reads-from relationships persist up to high k values, indicating that the representative medoids are vastly different from the individual graphs. Moreover, the computational complexity of finding the optimal solution of k-medoids clustering is very high [12]. Thus, using this approach is computationally prohibitive, and it negates the performance benefits of a fast topological sorting.

Our solution – sorting signatures and diffing corresponding graphs. Instead of finding the graph most similar to the one under analysis, MTraceCheck opts for a lightweight computation that finds a graph sufficiently similar to it. For this purpose, we re-use our memory-access interleaving signatures, introduced in Section 4.3. Once we collect all signatures from multiple executions of a test, we sort the execution signatures in ascending order. Since adjacent signatures in this order correspond to graphs with small differences from each other, we can use the graph analyzed for one signature as the basis against which to analyze the next one.

As noted in Section 4.3.1, execution signatures are generated by concatenating the per-thread signature words from all test threads. Specifically, we place the signature word from the first thread in the most significant position, and the one from the last thread in the least significant position. If per-thread signatures require multiple words, we place the first signature word in the most significant position, and the last in the least significant position. To evaluate the impact of this data layout, we also performed a sensitivity study, placing signature words from related code sections in different threads near each other. This alternative layout, however, led to worse similarity between the constraint graphs of adjacent signatures.
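The overall control flow of this collective checking can be sketched as follows; graph_from_signature and violates are placeholders standing in for the reconstruction of Section 4.3.2 and the partial re-sorting described next in Section 4.4.2, and the sketch is only meant to show how sorting, duplicate filtering, and incremental diffing fit together.

    def collective_check(signatures, graph_from_signature, violates):
        # Sort the execution signatures so that adjacent ones correspond to
        # similar constraint graphs, then examine only the edges that changed
        # with respect to the previously checked graph.
        violations = []
        prev_edges = None
        for sig in sorted(set(signatures)):            # duplicates filtered out here
            edges = graph_from_signature(sig)          # set of (u, v) ordering edges
            new_edges = edges if prev_edges is None else edges - prev_edges
            if new_edges and violates(edges, new_edges):
                violations.append(sig)
            prev_edges = edges
        return violations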

4.4.2 Re-sorting Topological Order

With the signatures sorted in ascending order, MTraceCheck examines each of the corresponding constraint graphs for consistency violations. Our checking starts with the first constraint graph in the sorted order. This first-time checking is performed as a complete, conventional graph checking, topologically sorting all vertices. Starting from the second graph, our checking algorithm strives to perform a partial re-sorting of only the portion that differs from the prior graph. The re-sorting region is determined by two boundaries: a leading boundary and a trailing boundary. Only the vertices between the two boundaries need to be re-sorted. The leading (trailing) boundary is the first (last) vertex in the original sorting that is adjacent to a new backward edge in the graph under consideration. Note that neither forward nor removed edges need be considered here, since they only release prior sorting constraints. If there is no new backward edge, re-sorting is unnecessary and is thus skipped. Our re-sorting method is as precise as conventional topological sorting; we omit the formal proof here due to limited space.

Figure 4.7 illustrates our re-sorting procedure on four constraint graphs obtained from a two-threaded program. The first run's topological sort is computed using a conventional complete graph checking. For the second graph, MTraceCheck examines each of the newly

[Figure 4.7 content — four runs of a two-threaded test (thread 0: st A, ld B, st A, ld B, st A; thread 1: st B, ld A, st B, ld A, st B), showing the topologically-sorted order of each run, the backward edges introduced by the next run, and, for the fourth (buggy) run, the cycle that arises under TSO because no topological sort exists.]

Figure 4.7: Re-sorting a topologically-sorted graph for a series of test runs. To check for a consistency violation, the topological sort from the previous run is partially re-sorted for the next run. The sorting boundaries (leading and trailing) are calculated from backward edges. When no topological sort exists in the segment between the boundaries, it indicates a violation.

added edges in the graph, 1→2 and 2→3. Only the former is backward, as shown by the backward arrow below the topological sort of the first run. Thus, 2 is the leading boundary and 1 is the trailing one. The order of the two nodes is then simply swapped to achieve a new topological sorting, as shown in the second run's diagram. Similarly, in the third run, four vertices must be re-sorted. The fourth run exposes a bug: indeed, there is no topological sort for the four affected vertices, due to the backward edge 5→4. The absence of a topological sort is also illustrated by the cycle highlighted in the bottom-right part of the figure.

Table 4.1: Specifications of the systems under validation

System 1. x86-64 – Intel Core 2 Quad Q6600
  MCM: x86-TSO [102, 180]
  Operating frequency: 2.4 GHz
  Number of cores: 4 (no hyper-threading support)
  Cache architecture: 32+32 KB (L1), 8 MB (L2)
  Cache configuration: write back (both L1 and L2)

System 2. ARMv7 – Samsung Exynos 5422 (big.LITTLE)
  MCM: weakly-ordered memory model [36, 34]
  Operating frequency: 800 MHz (scaled down)
  Number of cores: 4 (Cortex-A7) + 4 (Cortex-A15)
  Cache architecture: A7: 32+32 KB (L1), 512 KB (L2); A15: 32+32 KB (L1), 2 MB (L2)
  Cache configuration: write back (L1), write through (L2)

4.5 Experimental Evaluation

4.5.1 Experimental Setup

Systems under validation. MTraceCheck was evaluated on two different systems, an x86-based system and an ARM-based system, as summarized in Table 4.1. For each system, we built a bare-metal operating environment (i.e., no operating system) specialized for our validation tests. In our x86 bare-metal environment, the boot-strap processor awakens the secondary cores using interprocessor interrupt (IPI) messages, followed by cache and MMU initializations. Test threads are first allocated in the secondary cores, and then in the boot-strap core, but only when no secondary core is available. In our ARM bare-metal environment, the primary core in the Cortex-A7 cluster runs the Das U-Boot boot loader [7], which, in turn, calls our test programs. At the beginning of each test, the primary core powers up the secondary cores. The secondary cores are then switched to supervisor mode and their caches and MMUs are initialized. Note that the primary core remains in hypervisor mode to keep running the boot loader.1 Test threads are allocated first in the big cores of the Cortex-A15 cluster, then in the little cores of the Cortex-A7 cluster.

Table 4.2: Test generation parameters

Number of test threads: 2, 4, 7
Number of memory operations per thread: 50, 100, 200
Number of distinct shared memory addresses: 32, 64, 128

Test generation. Table 4.2 shows the key parameters used when generating constrained-random test programs. We chose 21 representative test configurations by combining these parameters, as shown on the x-axis of Figure 4.8. The naming convention for the 21 configurations is as follows: [ISA]-[test threads]-[memory operations per thread]-[distinct shared addresses]. For example, ARM-2-50-32 indicates a test for the ARM ISA with 2 threads, each executing 50 memory operations over 32 distinct shared memory addresses. For each test configuration, we generated 10 different tests and ran each 5 times. The tests perform load and store instructions, transferring 4 bytes each, with equal probability (i.e., load 50% and store 50%). Each test run executes a loop that encloses the generated memory operations so as to observe various memory-access interleaving patterns. Unless otherwise noted, the iteration count for this loop is set to 65,536. In addition, the beginning of the loop includes a synchronization routine waiting until the previous iteration is completed, followed by a shared-memory initialization and a memory barrier instruction (mfence for x86 and dmb for ARM). The synchronization routine is implemented with a conventional sense-reversal centralized barrier. To remove dependencies across test runs, we applied a hard reset before starting each test run.
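For illustration, a minimal sketch of the per-iteration structure and of a conventional sense-reversal centralized barrier is shown below. The function and variable names, the use of GCC-style atomic builtins, and the four-thread configuration are assumptions for the sketch; they are not the generated tests' actual code.

/* Sketch of the per-iteration synchronization described above. */
#include <stdbool.h>

#define NUM_THREADS 4                             /* assumed test configuration */

static volatile int  count = NUM_THREADS;
static volatile bool sense = false;

static void barrier_wait(bool *local_sense)
{
    *local_sense = !*local_sense;                 /* flip this thread's sense   */
    if (__sync_fetch_and_sub(&count, 1) == 1) {   /* last thread to arrive      */
        count = NUM_THREADS;
        sense = *local_sense;                     /* release all waiters        */
    } else {
        while (sense != *local_sense)             /* spin on the shared sense   */
            ;
    }
}

/* Per-thread test loop: barrier, shared-memory initialization, memory barrier,
 * then the generated memory operations (plus, in MTraceCheck, the signature
 * updates). Details of the initialization are elided in this sketch. */
void test_thread(int tid, int iterations)
{
    bool local_sense = false;
    (void)tid;
    for (int i = 0; i < iterations; i++) {
        barrier_wait(&local_sense);
        /* shared-memory initialization (elided) */
        __sync_synchronize();                     /* mfence on x86, dmb on ARM  */
        /* ... generated loads and stores ... */
    }
}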

4.5.2 Non-determinism in Memory Ordering

The dark-blue bars in Figure 4.8 present the number of unique memory-access interleaving patterns across various test configurations, measured by counting unique signatures. Each multi-threaded test program runs 65,536 times in a loop, except for ARM-2-200-32*, where each test program runs 1 million iterations. We then average our findings over 5 repetitions of the same experiment. To summarize the figure, we observe almost no duplicates in several configurations on the left side of the linear-scale graph, while we observe very few distinct interleavings in several test configurations on the right logarithmic-scale graph.

1We could not launch a test thread in the primary core, because the mismatch in operating modes caused unexpected hangs.

For instance, ARM-7-200-64 presents 65,536 unique memory-access interleaving patterns (100%), while ARM-2-50-32 only reveals 10.7 unique patterns on average (0.02%).

Among the three parameters of Table 4.2, the number of threads affects non-determinism the most. We measured 6.6 distinct patterns in ARM-2-50-64, 22,124 patterns in ARM-4-50-64, and 65,374 patterns in ARM-7-50-64. Note that the total number of memory operations is different in these three configurations. To keep the total number of operations constant, we compared configurations with the same number of operations (ARM-2-100-64 vs. ARM-4-50-64), also observing a significant increase (123 vs. 22,124). The number of memory operations per thread also affects non-determinism, but less significantly than the number of threads does. We observed 10.7 patterns in ARM-2-50-32, 508 patterns in ARM-2-100-32, and 35,679 patterns in ARM-2-200-32. We also found that increasing the number of shared memory locations leads to fewer accesses to the same location, thus reducing the number of unique interleaving patterns. ARM-2-200-64 exhibits only 9,638 patterns, which is fewer than the 35,679 patterns in ARM-2-200-32.

In most seven-threaded configurations, we found that almost all iterations exhibit distinct memory-access patterns. This is partly because of the relatively low iteration count, which is set to 65,536. To evaluate the impact of the iteration count, we performed a limited sensitivity study for ARM-2-200-32, comparing the results from two iteration counts: 35,679 unique interleavings out of 65,536 (54%) vs. 311,512 unique interleavings out of 1,048,576 (30%). We expect a similar trend in the seven-threaded configurations, that is, that the fraction of unique interleavings decreases as the iteration count increases.

In the x86-based system, we observe that interleaving diversity is more limited compared to the ARM-based system, because the MCM of the x86-based system is stricter. We believe that the MCM is the major contributor to this difference, among other factors such as load-store queue (LSQ) size, cache organization, interconnect, etc. Note that, for comparison fairness, we used the same set of generated tests (i.e., same memory-access patterns) for both systems.

Impact of false sharing of cache lines. In cache-coherent systems like the ones we evaluated here (Table 4.1), the MCM is intertwined with the underlying cache-coherence protocol. Data is shared among cores at a cache-line granularity, so placing multiple shared words in the same cache line creates additional contention among threads, further diversifying memory-access interleaving patterns. For the dark-blue bars in Figure 4.8, only one shared word (4 bytes) is placed in each cache line (64 bytes), thus no false sharing exists.


Figure 4.8: Number of unique memory-access interleavings. Unique memory-access interleavings diversify as more threads share the same memory space. Short two-threaded tests only exhibit a handful of distinct interleavings, while long seven-threaded tests generate new unique interleavings in almost all iterations. False sharing increases contention, thus further diversifying interleavings. The OS contributes to creating additional unique interleavings in two-threaded tests, while the opposite trend holds in four-threaded and seven-threaded tests.

The orange and green bars in Figure 4.8 present the numbers of unique memory-access interleaving patterns for two different data layouts: 4 and 16 shared words per cache line, respectively. As expected, false sharing contributes to diversifying memory-access interleaving patterns. x86-4-50-64 shows the most dramatic increase: from 3,964 (no false sharing) to 46,266 (4 shared words) to 60,868 (16 shared words) unique interleavings. Also, the increase is more marked for the x86-based system than for the ARM-based system.
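The shared-data layouts compared above can be sketched as follows; the array names, the stride-based scheme, and the macro values are illustrative assumptions used only to show how the number of shared words per 64-byte line might be controlled.

/* Sketch: controlling how many 4-byte shared words fall into one 64-byte line.
 * STRIDE = 16 gives one word per line (no false sharing); STRIDE = 4 packs
 * 4 words per line; STRIDE = 1 packs 16 words per line. */
#include <stdint.h>

#define CACHE_LINE_BYTES 64
#define NUM_SHARED_WORDS 32          /* assumed test configuration             */
#define STRIDE           16          /* 4-byte slots between shared locations  */

static uint32_t shared_area[NUM_SHARED_WORDS * STRIDE]
        __attribute__((aligned(CACHE_LINE_BYTES)));

/* The i-th shared test location. */
static inline volatile uint32_t *shared_word(int i)
{
    return (volatile uint32_t *)&shared_area[i * STRIDE];
}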

Impact of the Operating System. Our bare-metal operating environment allows only the test program to run when testing is in progress; there is no interference with other applications. However, when our test programs are running under the control of an operating system, some test threads can be pre-empted by the OS scheduler. In addition, the layout of shared memory may be shuffled by the OS.

To quantify the perturbation of an OS, we re-targeted the same set of tests to a Linux environment: Ubuntu MATE 16.04 for the ARM-based system and Ubuntu 10.04 LTS for the x86-based system. Test threads are launched via the m5threads library [56], although, after launch, synchronizations among test threads are carried out by our own synchronization primitives, as in the bare-metal environment.

The light-blue bars in Figure 4.8 report our findings for the Linux environment with no false sharing. There are two noticeable trends. Firstly, in two-threaded tests, the number of unique interleavings increases compared to the bare-metal counterparts. In both four-threaded and seven-threaded tests, on the contrary, the opposite tendency holds. We believe that fine-grained (i.e., instruction-level) interferences dominate in two-threaded tests, while coarse-grained (i.e., thread-level) interferences dominate in more deeply multi-threaded setups, such as four-threaded and seven-threaded tests.
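A minimal sketch of the Linux-side thread launch follows, assuming the pthread-style interface that m5threads exposes; the function names and the iteration count are illustrative, and the per-thread routine is the one sketched earlier in this section.

/* Sketch: launching the test threads under Linux via a pthread-style API. */
#include <pthread.h>

#define NUM_THREADS 4

extern void test_thread(int tid, int iterations);   /* per-thread test routine */

static void *thread_main(void *arg)
{
    test_thread((int)(long)arg, 65536);
    return 0;
}

int run_test(void)
{
    pthread_t t[NUM_THREADS];
    for (long i = 1; i < NUM_THREADS; i++)
        pthread_create(&t[i], 0, thread_main, (void *)i);
    thread_main((void *)0);                          /* main thread runs tid 0  */
    for (long i = 1; i < NUM_THREADS; i++)
        pthread_join(t[i], 0);
    return 0;
}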

4.5.3 Validation Performance

In this section, we measure two major component costs of validation time. We first evaluate the time spent checking for MCM violations on a powerful host machine equipped with an Intel Core i7 860 (2.8 GHz) and 8 GB of main memory, running Ubuntu 16.04 LTS. We then report the time spent executing tests in our ARM bare-metal system. We carried out the violation checking on a host machine, instead of the system under validation, because of the heavy computation involved.

Figure 4.9 compares the topological-sorting time of our collective graph checking solution, normalized against a conventional approach that checks each constraint graph individually.


Figure 4.9: MCM violation checking – topological sorting speedup. MTraceCheck reduces overall topological sorting time by 81% on average, compared to a conventional individual-graph checking solution.

We observe that our collective checking greatly reduces overall computation, consistently across all test configurations. Compared to the conventional solution, MTraceCheck takes only 9.4% (ARM-2-200-64) to 44.9% (x86-4-200-64) of the time spent by the conventional topological sorting, achieving an 81% reduction on average. We also observe a noticeable difference between the ARM and x86 platforms: the benefit of our technique is smaller in x86. We provide insights on this difference in Section 4.7.

In this evaluation, for reproducibility, we adopted a well-known topological-sort program, tsort, included with GNU core utilities [10]. We then modified the original tsort to support multiple constraint graphs generated from the same test. Specifically, for both the conventional and MTraceCheck techniques, vertex data structures are recycled for all constraint graphs, while edge data structures are not. The measured time excludes the time spent in reading input files, thus assuming that all graphs are loaded in memory beforehand. We ran tsort 5 times for each evaluation to alleviate random interferences due to the Linux environment. Due to extremely slow communication between our bare-metal systems and the host machine, we used the signatures obtained in the Linux environment (the light-blue bars in Figure 4.8). For fairness, we considered only unique constraint graphs for both the conventional and our techniques.
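For reference, GNU tsort reads whitespace-separated pairs, where a pair "u v" means u must precede v, and it reports a loop when no topological order exists (here, an MCM violation). The sketch below shows how a constraint graph might be emitted in that format; the edge data structure and vertex naming are illustrative assumptions, not the modified tsort itself.

/* Sketch: dumping one constraint graph in tsort's pair-per-line input format. */
#include <stdio.h>

typedef struct { int src, dst; } edge_t;

void dump_for_tsort(FILE *out, const edge_t *edges, int n_edges)
{
    for (int i = 0; i < n_edges; i++)
        fprintf(out, "v%d v%d\n", edges[i].src, edges[i].dst);
}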


Figure 4.10: Test execution – MTraceCheck execution overhead. Our signature computation / sorting requires 22% / 38% of the original execution time on average, respectively.

Figure 4.10 summarizes the test execution time, measured using performance monitors [36]. We measured three components of MTraceCheck's execution: (1) the execution time of the original test, (2) the execution time of our observability-enhancing code (signature computation), and (3) the time spent in sorting signatures (signature sorting).2

The original test takes 0.09–1.1 seconds to run 65,536 iterations. Our signature computation minimally increases the execution time: from as low as 1.5% (ARM-2-50-64) to up to 97.8% in an exceptional worst case (ARM-2-200-32). The lowest increase can be explained by noting that the test generated only 6.6 unique interleaving patterns, thus the branch predictor can almost perfectly predict branch directions for our instrumented code. On the contrary, ARM-2-200-32 exhibits a wide variety of distinct interleaving patterns, thus the performance penalty of branch mis-predictions becomes noticeable. Our signature sorting algorithm also contributes to increasing the execution time, ranging from 3.9% (ARM-2-50-64) to 93.5% (ARM-2-200-32). While not shown in the graph, we observed a 140% signature-sorting overhead in the 1 million-iteration version of ARM-2-200-32.

4.5.4 Intrusiveness

To quantify the intrusiveness of MTraceCheck, we measured the amount of memory accesses unrelated to the original test execution, as shown in Figure 4.11. Compared to previous work [96], where all loaded values were stored back to memory, our signature-based technique requires only 7% additional memory accesses on average, ranging from 3.9% (ARM-2-100-64) to 11.5% (ARM-7-200-64). The variation can be explained by noting that test configurations providing higher data contention, i.e., tests with more threads, more memory operations, and fewer shared locations, lead to a bigger per-thread signature footprint and, in turn, to more data transferred due to signature collection.

2We implemented the signature-sorting program by using a balanced binary tree, written in C. We ran this program on the primary core in the Cortex-A7 cluster, after test executions completed.


Figure 4.11: Intrusiveness of verification. Memory accesses unrelated to test execution are minimal compared to a register-flushing approach, only 7% on average.

Inside each bar of the figure, we report the average size of an execution signature. In low-contention configurations, the signature size is bounded by the register bit width. For instance, we need 16 bytes in x86-2-50-32. Note that in this configuration the per-thread signature barely exceeds 32 bits, which leads to an execution signature size of 8.4 bytes in ARM-2-50-32. However, the instrumented code uses the entire 64 bits of a register, even when fewer are needed. In high-contention configurations, the gap between 32-bit and 64-bit registers becomes narrow, as the per-thread signature requires multiple words. An extreme case is illustrated by ARM-7-200-64, where the signature size is 324 bytes on average (46 bytes per thread).

A drawback of our solution is its increased code size. Figure 4.12 quantifies this aspect across various test configurations. We measured only the size of test routines, excluding initialization and signature-sorting sections. The ratio of the size of the instrumented test to the original test ranges from 1.95 (ARM-2-50-64) to 8.16 (ARM-7-200-64). While this increase is conspicuous, the code size is still small enough to fit in the system's L1 instruction caches for all test configurations. For example, in ARM-7-200-64, the instrumented code is 189 KB; when divided by the number of threads, each core's code becomes 27 KB, fitting well in the 32 KB L1 instruction cache. Thus, the increased code size has only a minor impact on instruction-cache locality. A similar observation is drawn from Figure 4.10, where signature computation marginally increases the execution time.


Figure 4.12: Code size comparison. Instrumented code is 3.7 times larger than its original counterpart, on average. However, all instrumented tests are still small enough to fit in the L1 caches.

4.6 Bug-Injection Case-Studies

We performed bug-injection experiments using the gem5 simulator [56]. Three real bugs, which have been recently reported in previous work [81, 131, 118], were chosen for injection. We confirmed that these bugs had been fixed in a recent version of gem5 (tag stable_2015_09_03). To recreate each of the bugs, we searched for the relevant revision in the gem5 repository [9] and reverted the change. We configured gem5 for eight out-of-order x86 cores with a 4×2 mesh network and a MESI cache coherence protocol with directories located on the four mesh corners. For bugs 1 and 3, we purposefully calibrated the size and the associativity of the L1 data cache (1 KB with 2-way associativity) so as to intensify the effect of cache evictions under our small working set, while the rest of the configuration is modeled after a recent version of the Intel Core i7 processor. We compiled our tests with the m5threads library to run simulations under gem5's syscall emulation mode.

4.6.1 Bug Descriptions

Bug 1 – load→load violation 1 (protocol issue). Bug 1 (fixed in June 2015) is modeled after "MESI,LQ+SM,Inv" described in [81]. This bug is a variant of the Peekaboo problem [185], where a load operation appears to be performed earlier than its preceding load operation. It is triggered when a cache receives an invalidation for a cache line that is currently transitioning from shared state to modified state. When the cache receives the invalidation message, subsequent load operations to this cache line must be re-tried after handling the received invalidation. However, this bug prevents the cache from squashing the subsequent load operations, causing a load→load violation.

Table 4.3: Bug detection results

Bug 1 – Test configuration: x86-4-50-8 (4 words/line); detection: 1 test, 29 signatures
Bug 2 – Test configuration: x86-7-200-32 (16 words/line); detection: 11 tests, 12 signatures
Bug 3 – Test configuration: x86-7-200-64 (4 words/line); detection: all tests (crash)

Bug 2 – load→load violation 2 (LSQ issue). Bug 2 (fixed in March 2014) is reported by two independent works: [131] and [81]. This bug is similar to bug 1 in its manifestation, but caused by an LSQ that does not invalidate subsequent loads upon the reception of an invalidation message.

Bug 3 – race in cache coherence protocol. Bug 3 (fixed in January 2011) is modeled after the “MESI bug 1” detailed in [118], which is also evaluated in [81]. This bug is triggered by a race between an L1 writeback message (PUTX) and a write-request message (GETX) from another L1.

4.6.2 Bug-Detection Results

For each bug, we deliberately chose the test configuration in Table 4.3 after analyzing the bug descriptions,3 and generated 101 random tests. The iteration count was greatly reduced from the 65,536 used in Section 4.5 to 1,024, because gem5 simulations are much slower than native executions.

We report our bug-detection results in the third column of Table 4.3. Overall, MTraceCheck successfully found all injected bugs. Note that bug 1 was detected by only 1 test (out of 101): in this test, we found that 29 signatures (out of 1,024 unique signatures) exhibited invalid memory-access interleavings. Bug 2 was easier to expose than bug 1; 10 tests revealed 1 invalid signature each, with one test revealing 2. The impact of bug 3 is much more dramatic, crashing all gem5 simulations with internal error messages (e.g., protocol deadlock and invalid transition).

3We were able to find bugs 2 and 3 without much effort in selecting test configurations. For bug 1, we also tried the same seven-threaded random test, as well as a few handcrafted litmus tests, to no avail.


Figure 4.13: Detected load→load ordering violation. Loads 2 and 3 show an invalid re-ordering due to bug 1.

Figure 4.13 provides a snippet of the test exposing bug 1. The first 20 memory operations of thread 0 are omitted for simplicity. The starred instruction (★) is the first store to memory address 0x1. Both threads 1 and 2 perform a store operation to 0x1, while thread 3 executes three consecutive load operations from 0x1. Among these three loads, the first and the third read the initial value of the memory location, while the second reads the value from the starred store. From the second and third loaded values, we identify a cyclic dependency illustrated with red arrows in the figure: the starred store happens before load 2 (reads-from), which should happen before load 3 (load→load ordering), which in turn happens before the starred store (from-read).

4.7 Insights and Discussions

We attribute the speedup of our collective graph checking (Figure 4.9) to two factors. Firstly, we notice that many constraint graphs can be validated immediately without modifying the previous topological sort. This situation is highlighted in the left-most bar (ARM) of Figure 4.14, where all constraint graphs besides the first do not require any re-sorting. tsort unwittingly places store operations prior to load operations, since stores do not depend on any load operations in the absence of memory barriers, as is the case for our generated tests. Note that tsort is MCM-agnostic, so it still needs to check every new backward edge. Secondly, for x86 tests, we also observe that a majority of constraint graphs are incrementally checked. The blue bars in Figure 4.14 show the percentage of such graphs, ranging from 82% (x86-2-50-32) to almost 100% (x86-4-200-64). For these graphs, we measure the average percentage of vertices affected by re-sorting in the line plotted in the figure: this percentage ranges from 21% (x86-4-50-64) to 78% (x86-4-200-64). It is this high incidence of re-sorting that limited the violation-checking speedup shown in Figure 4.9.


Figure 4.14: Breakdown of our collective graph checking. For ARM tests, most of the graphs do not require any re-sorting. For x86 tests, up to 16% of the graphs can bypass re-sorting. Among the others, up to 78% of the vertices are affected.

Pruning invalid memory-access interleavings. Our code instrumentation (Section 4.3) makes a conservative assumption to support a wide range of MCMs in a single framework: each memory operation can be independently reordered, regardless of its preceding operations. This conservative assumption comes at the cost of increased signature and code sizes (Figures 4.11 and 4.12). To address these two drawbacks, we can leverage microarchitectural information when instrumenting observability-enhancing code (static pruning). For instance, the number of outstanding load and store operations is often bounded by the number of LSQ entries. Using this additional information, we can reduce the number of options for loaded values (as in [181]), thus decreasing the sizes of instrumented code and signatures. In our real-system evaluations, however, we could not gather sufficient microarchitectural details to employ this optimization.

Moreover, in a strong MCM (e.g., SC, TSO), we can also apply a runtime technique to reduce signature size (dynamic pruning). At runtime, each thread would track the set of recent store operations performed by other threads, computing a frontier of memory operations. Any value loaded from a store operation behind this frontier would be considered invalid. However, with this dynamic pruning, signature decoding would become more complicated, as the length of the signature would vary. Moreover, the instrumented code size would increase further due to store-operation tracking and frontier computation.

Store atomicity. Our proposed solution (Sections 4.3 and 4.4) makes no assumption on store atomicity (single-copy atomic, multiple-copy atomic, or non-multiple-copy atomic). However, our evaluation (Sections 4.5 through 4.6) does not include a single-copy atomic system.

We expect that constraint graphs for such systems would need additional dependency edges [48, 132]. Also, we expect that our collective checking would still perform better than conventional checking, but MTraceCheck's advantage would decrease due to larger re-sorting windows.

4.8 Related Work

Memory consistency validation. Section 2.2 provides a short introduction to various memory consistency validation solutions. As explained there, these solutions are generally classified into two groups: formal methods and testing-based methods. MTraceCheck improves on the prior work in the latter group by reducing the computation requirements of constraint-graph checking and by minimizing memory-access perturbation. Also, unlike many prior hardware-based solutions, MTraceCheck is a purely software-based approach that does not require any hardware support.

Test generation for memory consistency validation. As introduced in Section 2.2, various litmus tests have been developed and are widely used in memory consistency validation. When validating full systems, however, these litmus tests are often insufficient, as illustrated in [81]. Thus, constrained-random test generators (e.g., [96, 173, 167]) strive to validate corner cases left uncovered by the litmus tests. In our evaluation, we only used a constrained-random generator; however, advanced test generation can be paired with our code instrumentation and checking techniques. For instance, program regularization [64] proposes to augment test programs with additional inter-thread synchronizations so as to simplify the frontier computation (as discussed in Section 4.7).

4.9 Summary

In this chapter, we presented MTraceCheck, a memory consistency validation framework. The proposed framework strives to efficiently validate the wide range of memory-access interleaving patterns observed in multicore systems by leveraging a novel, incremental approach: we analyze the similarity of the patterns across repetitive runs of the same test program, then decompose them to extract the differences among the runs. Only these incremental differences are then verified by the proposed graph checking algorithm. We also present a novel concept: memory-access interleaving signatures, which encapsulate observed memory-access interleavings into a compact integer value. This signature is used

to estimate the similarity across test runs, so that we can order our analyses to entail minimal incremental effort. Our signature computation is also designed to minimally perturb the test execution, providing a viable solution to the limited-observability issue in post-silicon validation.

In our evaluation, we applied MTraceCheck to two real systems: a quad-core x86 desktop and an octa-core ARM single-board computer. We tested these systems with various configurations, consistently achieving a 5× speedup in the result-checking computation, due to our collective topological sorting. In addition, our signature-based solution reduces the number of memory accesses required to store the execution traces; we measured a 93% reduction in the number of memory operations for this task alone.

So far, we have addressed the problem of memory subsystem correctness at design time. The next chapter continues the discussion of memory subsystem correctness, this time through a patching solution that can recover from runtime errors.

CHAPTER 5

Patching Design Bugs in Memory Subsystems

This chapter presents a patching solution to bypass design bugs in memory subsystems. Memory subsystem components, including multiple levels of caches and multiple main-memory channels, are distributed across the entire chip and interact with each other in various ways. Thus, memory subsystems exhibit a wide range of functional behaviors, which sometimes cannot be foreseen, and should be verified and validated at design time. Unfortunately, some of these behaviors are buggy and escape into the final product.

To gain insights on the impact and breadth of this problem, we analyzed several product errata documents to understand errata bugs that cause incorrect results of memory operations. We found that a majority of these bugs manifest themselves only in very rare microarchitectural conditions. For instance, memory ordering and cache coherence bugs manifest only when some rare interactions occur among cores and caches, thus they often go unnoticed during verification and validation processes. However, after deployment (i.e., at runtime), these bugs may lead to critical system hangs or subtle incorrect program results. Thus, the system must be able to detect them and recover from them using a runtime approach.

To accurately detect bugs at runtime, ideally, we would need a system-level patch module that can probe numerous signals in the entire memory subsystem. Implementing such a global patch module is an almost impossible pursuit in large-scale systems on chip (SoCs), due to the enormous number of signals that must be tapped. In this dissertation, we propose to decompose a global patch module into smaller distributed modules, where each of the modules is attached to a corresponding node (i.e., a core and the co-located L1 data cache) in the system. Each module monitors microarchitectural events occurring in its corresponding node.

Note that our distributed patch module has no capability of observing microarchitectural events at other components (the other nodes, lower-level caches, on-chip networks, etc.). It, however, strives to grasp activities happening in these components by observing

cache eviction messages arriving at the node. From these messages, it can catch bug-prone data races that may alter the results of memory operations at the core. In other words, these messages provide a summary of outside activities, which allows us to attain monitoring capabilities similar to those of a global patch module. Each of the distributed patch modules is implemented through simple finite state machines (FSMs), which are designed to catch sequences of microarchitectural events that may lead to bugs. When one such sequence is identified at runtime, our proposed solution, called µMemPatch, takes action to bypass the bug manifestation. To this end, it utilizes four fine-grained, microarchitectural bug-elusion methods.

5.1 Motivation

Industry practices to patch design bugs are (1) microcode patching and (2) software editing or re-compilation. These solutions, however, may not be the best approaches to address memory subsystem bugs: they are agnostic to the microarchitectural states that are prone to bugs. Microcode patching is applied in an early stage of the microprocessor pipeline, while memory subsystem bugs are mostly related to the back-end pipeline stages. Similarly, software editing or re-compilation is conducted on software and does not account for any runtime microarchitectural state.

A few programmable hardware patching solutions have been proposed recently [150, 177, 199]. These solutions strive to accurately detect bugs at runtime by monitoring control signals across the entire chip [150, 177] or cache coherence events [199]. (Refer to Section 2.2 for details.) They strive to minimize their performance impact by observing specific signals and flagging only the few events that may lead to a bug manifestation. However, these solutions are ineffective when addressing complex bugs involving a past sequence of microarchitectural events – such as those related to memory ordering and cache coherence. Indeed, they can only identify patterns of control signals or combinations of cache coherence states and transitions.

Inefficiency of prior patching methods. To illustrate the contributions of µMemPatch, we present a typical memory subsystem bug in Figure 5.1 and explain why prior patching methods are inefficient in handling it. The bug is a "read-after-read hazard" in the ARM Cortex-A9 processor [35, 39], which manifests as an incorrect ordering between two load operations to the same address. When deploying microcode patching to fix this bug, one should insert a fence operation right after (A), so that (B) cannot be speculatively executed ahead of (A). However, to do so, one must first know which loads are prone to this bug, by comparing load addresses at the fetch or decode stages – a challenging feat, since many load addresses are not even available in early pipeline stages. Alternatively, one could conservatively insert a fence operation whenever an unresolved load address is encountered at these stages, which may overly serialize memory operations. Note that the performance cost of a fence operation is high, about 15% over an unpatched program with no fence operations inserted [193]. In practice, the manufacturer's suggestions are to either add a fence as mentioned earlier, or use an exclusive load instruction (LDREX) instead of a normal one (LDR). All these solutions restrict unrelated memory operations to a great degree. Prior hardware-based patching solutions [150, 177] rely on existing control signals to identify buggy microarchitectural states. However, it is possible that not all events involved in a microarchitectural event sequence leading to a bug have dedicated control signals. For instance, to detect the read-after-read bug, the load address must be compared against the addresses of subsequent load operations. In the absence of such comparison logic, those prior solutions could not detect this bug.


Figure 5.1: ARM's read-after-read hazard example. Top – When the load (B) is executed ahead of the preceding load (A) to the same address K, (A) might read a more recent value than (B). This reordering could occur for many reasons, e.g., (A)'s address is resolved later than (B)'s. Bottom – The hazard can be captured by the FSM monitoring microarchitectural events.

5.2 µMemPatch Overview

Figure 5.2 shows how µMemPatch operates in three steps, together with a system diagram of µMemPatch's hardware additions.

Step 1: Root-cause analysis. To understand the characteristics of memory subsystem bugs, we first conduct a comprehensive study to collect design flaws described in product errata documents published by major microprocessor vendors. In addition, we also investigate open-source multicore design implementations in order to further understand the exact root causes of the bugs.


Figure 5.2: µMemPatch overview. µMemPatch strives to overcome design flaws in multicore memory subsystems at runtime. It first conducts an offline analysis to root cause a bug, and generates a finite state machine (FSM) that can detect event sequences leading to it. The FSM is then deployed in a programmable logic block: when it identifies one such sequence, it initiates an action to circumvent the bug.

Step 2: Bug detection and elusion. The intuitions obtained from step 1 lead us to develop our novel patching solution that captures a sequence of bug-triggering microarchitectural events. We model each bug-triggering sequence as a finite state machine (FSM), where each transition corresponds to a microarchitectural event and the output corresponds to the choice of a bug-elusion action. We investigate a few bug-elusion methods and discuss the trade-offs therein.

Step 3: Embedding patch modules in programmable logic. The FSMs are embedded in a programmable logic block attached to each core. We measure the overhead of the programmable logic needed to catch various bugs.

Note that, in designing the bug-anticipation FSM, we focus mainly on microarchitectural events in the back-end stages of out-of-order processors, and those at the interface between a core and its L1 data cache. This design decision is derived from observations in our errata investigation, which led us to conclude that a majority of bugs are caused by rare combinations of speculative execution and data races among cores. The speculative execution is captured by microarchitectural events in the back-end stages, and many data races can be monitored at the core-cache interface. This observation allows us to use a distributed patching approach: the same FSM is deployed next to each core, and it monitors only signals local to its core. For simplicity, we evaluate µMemPatch only in homogeneous multicore systems in this dissertation, although it can be tailored for heterogeneous multicore systems.

5.3 Classifying Memory Subsystem Bugs

We conducted a comprehensive literature survey to understand memory subsystem bugs that slip through the verification and validation processes. We considered product errata documents available online, and analyzed each erratum related to the product’s memory subsystem. Specifically, we examined errata for multicore products from three companies: ARM [37, 41, 35, 39, 40, 38, 42, 43, 44, 45, 46, 47], Intel [104, 106, 107, 105, 103, 101, 100], and AMD [27, 29, 28, 30, 31, 33, 32]. Among these companies, ARM provides much more detailed descriptions, while Intel and AMD offer brief summaries. Table 5.1 classifies bugs based on two criteria: root cause (by row) and symptom (using colors). When multiple root causes are involved (e.g., prefetch and address translation), we consider only the primary one.

5.3.1 Root Causes of Memory Subsystem Bugs

We identify 10 distinct bug root causes, listed in the first column of Table 5.1. Root causes in rows 1 through 5 are mostly core-side, while those in rows 7 through 9 are cache-side. Row 6 reports a root cause that relates to both core and cache. In summary, we observe two major bug causes: (1) deeply speculative executions in modern out-of-order cores and (2) data races across cores/caches.

Core-side bugs. Speculative execution is one of the major sources of memory subsystem bugs. Instructions executed out-of-order must be squashed and re-executed if the speculation could lead to incorrect results. A representative bug in this group is the read-after-read hazard in Figure 5.1. While this bug has been fixed in recent ARM Cortex processors, those processors are still subject to similar buggy behaviors. For instance, bug #777771 in ARM Cortex-A15 reports a read-after-read hazard when a speculative load is an unaligned memory access crossing a cacheline boundary (64 bytes). Intel processors are also prone to a similar bug (Intel Core2 Quad AV76). Furthermore, #826978 in Cortex-A57 and #826969 in Cortex-A15 report yet another read-after-read hazard when crossing a data-bus boundary (16 bytes). These escaped bugs are caused by an improper handling of multiple micro-operations. Address translation may also be incorrect in the rare circumstance that a memory access crosses the 4KB page boundary (ARM Cortex-A15 #775620). Recent ARM Cortex processors support TLB maintenance instructions, which may cause an indefinite stall in the presence of a continuous stream of hardware prefetch requests from another core (ARM Cortex-A72 #838569). Hardware prefetchers can induce an occasional deadlock (ARM Cortex-A5 #745469) or corrupt data (ARM Cortex-A9 #751473). Also, prefetch instructions are prone to memory ordering violations (Intel Core SKL055).

Table 5.1: Classifications of memory subsystem bugs. Rows list root causes; symptoms (deadlock, corrupted data, starvation, wrong order, other) are color-coded in the original table. Rows 1–5 are core-side, row 6 relates to both core and cache, and rows 7–9 are cache-side.

1. Speculative executions: AMD-147, AMD-530, ARM-761319, ARM-826974, ARM-836019, Intel-AV76, AMD-92, AMD-105, AMD-551, ARM-854671
2. Unaligned memory accesses: ARM-777771, ARM-811819, ARM-768277, ARM-817722
3. Multiple micro-operations: ARM-826969, ARM-855830, ARM-826978, ARM-828023
4. Address translations: ARM-775620, ARM-813469, ARM-798797, ARM-838569
5. Prefetches: Intel-SKL055, ARM-745469, ARM-571620, ARM-751473
6. Micro-architectural hazards: ARM-823274, ARM-761320, ARM-775619, ARM-801819, ARM-836969, ARM-851024, ARM-829070, ARM-843419, ARM-823273, ARM-834921, ARM-851023, ARM-851022, ARM-853676, ARM-770356
7. Cache coherence interactions: ARM-814169, ARM-775621, ARM-798870, ARM-763126, ARM-845920, ARM-821523, ARM-819472, ARM-824069, ARM-855873, ARM-855872, ARM-836870, ARM-806969, ARM-853971, ARM-848970, ARM-854172, ARM-854222, Intel-BV24, Intel-AK85, Intel-AV3, AMD-81, AMD-97, ARM-761273, ARM-791671, ARM-791620
8. Unconventional caching types: ARM-793369, ARM-754327, ARM-794074, ARM-845369, ARM-799271, ARM-774769, ARM-773023, ARM-852421, ARM-832071, ARM-822769, ARM-827971, ARM-872696, ARM-832075, ARM-817170, ARM-826975, ARM-858072, Intel-BJ21, AMD-144, AMD-461, AMD-579, AMD-691, ARM-571771, ARM-827671, ARM-826319
9. Cache maintenance instructions: ARM-827496, ARM-813420, ARM-853871, ARM-814670, ARM-854173, ARM-855423
10. Others: AMD-372, ARM-845719

Figure 5.3: Unaligned writes in gem5 may not produce the intended result. Core 1: (A) MEM[63–64] ← 0x1111; (B) Ra ← MEM[63–64]; (C) Rb ← MEM[64]. Core 2: (D) MEM[63–64] ← 0x2222. Incorrect result: Ra = 0x1122 or 0x2211.


Cache-side bugs. Cache coherence bugs are common, and often related to a chip's underlying interconnect (e.g., ARM's cache coherent interconnect) among heterogeneous components. Moreover, unconventional caching is another major source of bugs. For instance, when data written by a core with ARM's write streaming mode is treated as conventional cacheable data by another core, memory ordering rules may be violated (ARM Cortex-A17 #832071). A similar buggy behavior can occur with Intel's non-temporal data accesses (Intel Core2 Quad AK85), and cache maintenance instructions (e.g., DCCMVAC in ARM) may malfunction when they are used in the presence of data races (ARM Cortex-A15 #827671).

Miscellaneous. Microarchitectural hazards are also common on both core and cache. For instance, a conditional load or store operation may go wrong under rare timing conditions in the presence of a concurrent floating-point operation (ARM Cortex-A7 #823274). On the cache side, an exclusive memory operation may fail to guarantee exclusiveness when the same cacheline is accessed also by a stream of stores (ARM Cortex-A73 #853676). Finally, we also noticed a couple of bugs that do not belong to any other group: an uncommon combination between a DRAM controller setting and a northbridge’s operating frequency (AMD Family 10h processors #372) or a processor’s mode change (ARM Cortex-A53 #845719).

5.3.2 Symptoms of Memory Subsystem Bugs

Various symptoms have been reported in product errata. Among them, a significant portion leads to unexpected system hangs (deadlocks). For instance, an improper handling of a snoop message might result in a deadlock (ARM Cortex-A53 #821523). Many bugs associated with data races among caches also lead to deadlocks, as do some core-side bugs. Other bugs lead to starvation, so that some memory operations never make forward progress, often the result of frequent interrupts from other activities sharing the same hardware resources (ARM Cortex-A75 #770356).

Another major symptom is data corruption. Some corruptions are caused by corner cases of the cache coherence protocol (ARM Cortex-A57 #853971), others by unconventional caching (AMD Athlon 64 #81). We also noted several memory ordering violations: besides the read-after-read hazard in Figure 5.1, we learned of more subtle bugs, for instance, atomic operations that fail to guarantee atomicity (AMD Family 14h processors #530). Some recent processors have a feature that nullifies fence operations: unfortunately, such a feature may malfunction when a cache-maintenance operation is in progress (ARM Cortex-A57 #814670).

5.3.3 Memory Subsystem Bugs in Open-Source Designs

In addition to the errata bugs reported in Table 5.1, we investigated three open-source multicore designs: (1) an octa-core RISC-V RTL design with Rocket in-order cores [14], (2) a quad-core RISC-V RTL design with Boom out-of-order cores [4], and (3) an octa-core ARM out-of-order model in gem5 [56]. We conducted a bug-hunting campaign searching for memory ordering violations using open-source memory consistency validation tools [153, 120] and in-house litmus tests designed to trigger bugs like those in Table 5.1. Through this extensive process, we learned that: i) the open-source designs are protected against some errata-like bugs. For instance, gem5 includes specific provisions to detect and avoid read-after-read hazards, while Boom has a compile-time knob to select whether or not to squash instructions affected by the hazard. ii) In addition, many sophisticated performance features, often connected to product errata bugs, do not exist in open-source designs (e.g., write streaming mode, cache maintenance operations). Yet, we noted an interesting buggy behavior in gem5, illustrated in Figure 5.3 with a two-threaded test program. Core 1 performs an unaligned store (A) (addresses 63 and 64 belong to distinct 64-byte cache lines), followed by two loads (B) and (C). Core 2 performs another unaligned store (D) to the same memory location as (A). We observed a few occurrences of two incorrect results, shown in Figure 5.3. This behavior was noted while evaluating whether bug 3 from Table 5.3 was present in gem5. Note that those incorrect results do not actually violate cache coherence invariants [185], as cache coherence is only concerned with individual byte-long memory locations.
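The Figure 5.3 scenario can be sketched as follows; the buffer name, the use of memcpy to express the unaligned 2-byte accesses, and the volatile result variables are illustrative assumptions, not the actual litmus test.

/* Sketch of Figure 5.3: two unaligned 2-byte stores straddling a 64-byte
 * cache-line boundary (bytes 63 and 64), racing with a 2-byte load.
 * A correct implementation should never let Ra mix bytes from both stores
 * (e.g., 0x1122 or 0x2211). */
#include <stdint.h>
#include <string.h>

static uint8_t buf[128] __attribute__((aligned(64)));
volatile uint16_t Ra, Rb;

void core1(void)                        /* thread pinned to core 1 */
{
    uint16_t a = 0x1111, ra;
    memcpy(&buf[63], &a, 2);            /* (A) MEM[63-64] <- 0x1111 */
    memcpy(&ra, &buf[63], 2);           /* (B) Ra <- MEM[63-64]     */
    Ra = ra;
    Rb = buf[64];                       /* (C) Rb <- MEM[64]        */
}

void core2(void)                        /* thread pinned to core 2 */
{
    uint16_t d = 0x2222;
    memcpy(&buf[63], &d, 2);            /* (D) MEM[63-64] <- 0x2222 */
}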

5.4 Detecting Bug-Prone Microarchitectural State

We propose a bug-anticipation module that captures a sequence of microarchitectural events likely to trigger bugs. The module strives to identify a sequence of events leading to a bug manifestation, before the (micro-)architectural state is altered. In developing it, we observed that many memory subsystem bugs occur in rare conditions, frequently related to back-end execution stages in microprocessors. Thus, the bug-anticipation module continuously monitors selected back-end microarchitectural events. It also tracks sequences of events, because many bugs involve multiple events over time. It implements finite state machines (FSMs) to achieve its goals, because they are intuitive and easy to implement in programmable logic. For example, the bottom part of Figure 5.1 illustrates the FSM capturing the read-after-read hazard. The rest of this section discusses the design of this module.


Figure 5.4: Microarchitectural events of interest. Due to practical limitations in monitoring signals, we probe only some of the events in back-end pipeline stages, in addition to cache invalidation messages. (Events shown in the figure: memory-operation issue – sequence number, operation type, memory address, speculation flag; instruction squashing – squashed sequence number; load writeback – sequence number; cacheline invalidation – cacheline address; operation commit – sequence number.)

5.4.1 Collecting Microarchitectural Events

Ideally, at runtime, we would like to have full observability into every internal signal, an almost impossible feat in practice. Thus, mindful of the limited access to signals, the module focuses only on a handful of microarchitectural events that are most likely to support the detection of a majority of memory subsystem bugs; those tend to be located in the back-end stages of the pipeline, from issue to commit.

Figure 5.4 provides a schematic of the microarchitectural events that we collect at runtime. Without loss of generality, we illustrate our solution on typical pipeline stages of an out-of-order processor [183, 56]. Our event collection occurs mostly in the issue, execute and writeback stages, as shown in the figure, since those allow the out-of-order execution of micro-operations (µops). Specifically, i) when a memory µop is issued, we collect its dynamic sequence number (i.e., unique identifier). We also gather some additional information: the operation type (load, store, prefetch, etc.), the memory address, and a speculation flag, indicating whether the µop is issued in advance of preceding operations. In addition, ii) while a memory operation is in flight (i.e., in the execute and writeback stages), we collect instruction-squashing events that nullify any of its µops, and we mark µops affected by a cacheline invalidation (a.k.a. snoop) message from the L1 cache controller. iii) We also collect load-writeback events, so that we can implement one of our bug-elusion solutions (see Section 5.5.2), and iv) we monitor µop commit events.

The internal structure of the bug-anticipation module is highly dependent on the bugs to be identified. For those that we evaluated in our experiments, the anticipation module is somewhat similar to a load-store queue, because it observes events related to memory operations and to the core-cache interface. µMemPatch signal probes are limited and fixed at design time, thus potentially limiting its coverage to some degree. There are indeed bugs triggered by events at the fetch stage, or in the instruction cache, or in lower-level caches, none of which are monitored by our solution. For instance, a gem5 read-modify-write (RMW) deadlock bug [8] involves an instruction fetch to a cacheline locked by an RMW data access, which cannot be captured by our solution.
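To make the monitored events and the Figure 5.1 FSM concrete, the following is a software sketch; in µMemPatch this logic lives in programmable hardware, and all type names, field encodings, and the extra squash/commit transitions here are illustrative assumptions.

/* Sketch: event record collected at the core-cache interface, plus one step of
 * the read-after-read FSM of Figure 5.1 (S0 -> S1 on a speculative load (B),
 * S1 -> S2 when the older load (A) to the same address issues, S2 -> bug-prone
 * state when the cacheline read by (B) is invalidated). */
#include <stdint.h>

typedef enum { EV_ISSUE, EV_SQUASH, EV_WRITEBACK, EV_INVALIDATE, EV_COMMIT } ev_kind_t;

typedef struct {
    ev_kind_t kind;
    uint64_t  seq;          /* dynamic sequence number                          */
    uint8_t   is_load;      /* operation type, for EV_ISSUE                     */
    uint8_t   speculative;  /* issued ahead of older operations?                */
    uint64_t  addr;         /* memory address (issue) or cacheline (invalidate) */
} uarch_event_t;

typedef enum { S0, S1, S2, S3_BUG } rar_state_t;

rar_state_t rar_step(rar_state_t s, const uarch_event_t *e,
                     uint64_t *b_seq, uint64_t *b_line)
{
    switch (s) {
    case S0:
        if (e->kind == EV_ISSUE && e->is_load && e->speculative) {
            *b_seq = e->seq; *b_line = e->addr >> 6;
            return S1;                       /* speculative load (B) issued     */
        }
        return S0;
    case S1:
        if (e->kind == EV_ISSUE && e->is_load && !e->speculative &&
            (e->addr >> 6) == *b_line && e->seq < *b_seq)
            return S2;                       /* older load (A) issued           */
        if (e->kind == EV_SQUASH && e->seq <= *b_seq)
            return S0;                       /* (B) squashed for other reasons  */
        return S1;
    case S2:
        if (e->kind == EV_INVALIDATE && (e->addr >> 6) == *b_line)
            return S3_BUG;                   /* trigger the chosen elusion here */
        if (e->kind == EV_COMMIT && e->seq == *b_seq)
            return S0;                       /* (B) committed before any race   */
        return S2;
    case S3_BUG:
    default:
        return S0;
    }
}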

Figure 5.5: FSM approximation. Long sequences and complex conditions of microarchitectural events in the FSM of part (a) can be simplified as shown in part (b), leading to a negligible risk of false detection. (a) FSM to catch ARM Cortex-A57 #832075; (b) approximated FSM to detect the same bug as (a).

5.4.2 Reducing the Area Overhead of the Bug-Anticipation Module

The area footprint of the bug-anticipation module varies with the microarchitectural events to be identified, the length of the sequence, etc. As we will show in Section 5.6.4, the module’s implementation may require a significant area, due to the inherent inefficiency of programmable logic, as compared to standard cells. To limit this overhead, we designed two approximation techniques:

FSM sequence simplification. While we noted that most bugs can be identified with sequences of just a few FSM states (∼5), others entail relatively long sequences and very detailed conditions. Figure 5.5a shows an FSM designed to catch a deadlock condition reported in ARM Cortex-A57 #832075. In our implementation, we simplify the FSM as

shown in Figure 5.5b; that is, we trace only a pair of load and store operations (S0 → S2′ → S3′) and device loads (S3′ → S5′), dropping the address and register index comparisons. While this simplification may lead to false detection, in practice this risk is fairly negligible in real-world applications.

Memory address truncation and hashing. A large portion of the bug-anticipation module's area is occupied by address-related logic. Memory addresses are relatively wide (e.g., 64 bits in the RISC-V ISA), while most other signals have very few bits. To limit area overhead, we apply address truncation, considering only the lower 20 bits of each address. We can also apply a simple address hashing, generating 8-bit hash values by XORing eight 8-bit chunks of the 64-bit addresses.
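A minimal software rendering of this hash (the function name is ours) is:

/* The 8-bit address hash described above: XOR of the eight 8-bit chunks of a
 * 64-bit address. Truncation to the lower 20 bits is analogous (addr & 0xFFFFF). */
#include <stdint.h>

static inline uint8_t addr_hash8(uint64_t addr)
{
    uint8_t h = 0;
    for (int i = 0; i < 8; i++)
        h ^= (uint8_t)(addr >> (8 * i));
    return h;
}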

5.5 Microarchitectural Bug-Elusion

In µMemPatch, we consider and compare a number of methods to bypass memory subsystem bugs. The goal is to recover from a bug at a much finer granularity than conventional checkpointing and rollback methods [186], leading to minimal performance impact. Our bug-elusion methods entail a small design modification in the back-end stages of the microprocessor pipeline. Below, we first present two basic methods based on instruction squashing and re-execution, then propose two more advanced solutions that restrict either the speculation on memory operations or data races, and finally discuss the modifications required to the back-end stages. Figure 5.6 will guide us through the presentation by illustrating how each elusion method can be applied to overcome the read-after-read hazard discussed in Figure 5.1.

5.5.1 ImmSq & DelSq: Instruction Squash and Re-execution

We evaluate two variants of squash and re-execution: immediate and delayed squash. In the former solution, we squash the instruction as soon as the bug-anticipation module identifies it as buggy. Figure 5.6a shows the action taken when the FSM in Figure 5.1 enters the bug-prone state S3. The second variant postpones the squash until the instruction arrives at the commit stage, as illustrated in Figure 5.6b. The benefits of a delayed squash are multiple: i) it allows more reliable timing, as the detection at the bug-anticipation module may not be readily available if it operates at a lower frequency; ii) it ensures forward progress in adversarial bug-triggering situations. Indeed, some microarchitectural implementations do not allow a squash and a commit in the same clock cycle. In those cases, an immediate squash could block commit events indefinitely.


Figure 5.6: Microarchitectural bug-elusion methods. For the read-after-read hazard of Figure 5.1, with anticipation conditions illustrated on the left, we can apply four different bug-elusion methods, shown in (a) through (d). The red color highlights patching actions.

Table 5.2: Elusion methods comparison

Elusion method                    Coverage      Key advantage
ImmSq – Immediate squash          All           Immediate recovery
DelSq – Delayed squash            All           Delayed detection
DelEv – Delayed cache eviction    Data race     Minimal impact
DynFc – Dynamic fence insertion   Speculation   Strengthened ordering

5.5.2 DelEv: Limiting Data Races by Delaying Cache Eviction

To limit data races, cache coherence mechanisms can be designed so that cache evictions can be delayed. Inspired by [172], we notice that a cacheline invalidation message can be delayed until all speculative loads to that cacheline are completed. In Figure 5.6c, the invalidation message is not processed until both loads (A) and (B) reach the commit stage. Because of this delayed invalidation, (A) retrieves the same data as (B), and no ordering violation will occur. The delayed cache eviction method can only be applied to bugs involving cache coherence interactions among cores. Note that its implementation should be carefully verified to ensure the correctness of the modified coherence interactions. While developing this method, we encountered deadlocks from delayed evictions, which could be avoided via ImmSq. Specifically, to prevent deadlocks, we first check if speculative loads have reached the writeback stage, in which case the eviction is no longer delayed. We also check the µop's sequence number tied to each eviction to avoid an excessive delay.
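The deadlock-avoidance checks just described can be sketched as a software predicate; the structure and field names, as well as the bounded sequence-number distance, are illustrative assumptions rather than the actual gem5/RTL changes.

```cpp
#include <cstdint>
#include <vector>

struct SpecLoad {
    uint64_t line_addr;     // cacheline address targeted by the load
    uint64_t seq_num;       // µop sequence number
    bool     in_writeback;  // load has already reached the writeback stage
};

// Decide whether an incoming invalidation for `evicted_line` may be delayed.
bool may_delay_eviction(const std::vector<SpecLoad>& spec_loads,
                        uint64_t evicted_line, uint64_t commit_seq,
                        uint64_t max_seq_distance) {
    for (const SpecLoad& ld : spec_loads) {
        if (ld.line_addr != evicted_line) continue;
        if (ld.in_writeback) return false;              // deadlock guard: stop delaying
        if (ld.seq_num - commit_seq > max_seq_distance)
            return false;                               // deadlock guard: bounded delay
        return true;   // safe to hold the invalidation until this load commits
    }
    return false;      // no matching speculative load: process the eviction normally
}
```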

5.5.3 DynFc: Limiting Speculation by Dynamic Fence Insertion

Dynamic fence insertion enforces stronger ordering between memory operations. This method is very effective for bugs related to speculation, such as the read-after-read hazard. Unlike conventional fence insertion, we propose an advanced fence insertion that is readily paired with our bug-anticipation module. Specifically, we use a small lookup table to memoize the program counters (PC) of buggy instructions, previously identified during the same program execution. As shown in Figure 5.6d, this technique uses one of the squash methods to recover from the first occurrence of a bug. At that time, it also records the PC of the operation in the memoization table (PC of (A) with reference to the figure). Afterward, when the core dispatches the same instruction, it consults the memoization table for the instruction's PC: if found, it dynamically inserts a fence instruction right after the flagged instruction.
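A minimal software sketch of this PC-memoization lookup is shown below, assuming an unbounded table for clarity; the hardware design uses a small CAM (or a Bloom filter, see Section 5.5.4), and the structure and method names here are illustrative.

```cpp
#include <cstdint>
#include <unordered_set>

struct DynFcTable {
    std::unordered_set<uint64_t> buggy_pcs;  // PCs previously flagged as buggy

    // Called after the first squash-based recovery: record the offending PC.
    void memoize(uint64_t pc) { buggy_pcs.insert(pc); }

    // Consulted at dispatch: returns true if a fence should be inserted right
    // after the instruction at this PC.
    bool needs_fence(uint64_t pc) const { return buggy_pcs.count(pc) != 0; }
};
```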

Table 5.3: Bugs modeled in our experimental evaluation

No   Bug description                        Erratum ID
1    Read-after-read (RAR) hazard           ARM Cortex-A9 #761319
2    RAR on unaligned access type 1         ARM Cortex-A15 #777771
3    RAR on unaligned access type 2         (same as above)
4    RAR on multiple micro-ops type 1       ARM Cortex-A57 #826978
5    RAR on multiple micro-ops type 2       ARM Cortex-A57 #828023
6    Speculation on prefetched data         Intel 6th-gen Core SKL055
7    Snoop during exclusive access          ARM Cortex-A53 #819472
8    Snoop during cache clean               ARM Cortex-A15 #827671
9    Store crossing page boundary           ARM Cortex-A15 #813469
10   Device loads after exclusive access    ARM Cortex-A57 #832075
11   Disallowed address on cache flush      Intel 2nd-gen Core BJ21

5.5.4 Bug-Elusion Implementation

The two squashing methods require minimal modifications to the hardware: squashing and re-execution are already implemented in modern out-of-order cores [56, 4], so we simply add a trigger condition. Delayed cache eviction is implemented at the interface between the core and the L1 data cache controller. Specifically, it needs to track the sequence numbers of µops that are associated with delayed evictions. Dynamic fence insertion needs a content-addressable memory (CAM) for PC memoization. It also requires the insertion of a fence instruction – easily done by piggybacking the effect of a fence instruction on the memory operation (e.g., asserting the fence flag at the dispatch stage). Note that these modifications can be applied at design time, and selectively activated based on the outcome of the bug-anticipation module. Table 5.2 summarizes our bug-elusion methods based on the extent of the coverage and the key advantage they provide. Note that implementing the CAM structure may be challenging due to its power overhead and slow speed. To address these challenges, a Bloom filter can be used instead of the CAM structure.
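A minimal sketch of the Bloom-filter alternative mentioned above, assuming a 256-bit filter and two simple hashes; the size and hash functions are illustrative choices, not the actual design.

```cpp
#include <bitset>
#include <cstdint>

class PcBloomFilter {
    static constexpr size_t kBits = 256;   // assumed filter size
    std::bitset<kBits> bits_;

    // Two cheap hashes of the PC; real hardware would likely use simple
    // XOR/shift networks instead.
    static size_t h1(uint64_t pc) { return (pc ^ (pc >> 7)) % kBits; }
    static size_t h2(uint64_t pc) { return ((pc >> 2) * 0x9E3779B97F4A7C15ULL) % kBits; }

public:
    void insert(uint64_t pc) { bits_.set(h1(pc)); bits_.set(h2(pc)); }

    // May return false positives (extra fences), never false negatives.
    bool maybe_contains(uint64_t pc) const {
        return bits_.test(h1(pc)) && bits_.test(h2(pc));
    }
};
```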

5.6 Experimental Evaluation

5.6.1 Experimental Setup

We evaluated µMemPatch in the gem5 full-system simulator [56]. We created an octa-core system with a 4×2 mesh network and a directory-based MOESI cache coherence protocol. Each core is configured with private 32KB L1 instruction and data caches (set associativity = 8) and a shared 256KB L2 cache (set associativity = 4). We ran our simulations using the syscall emulation mode with the m5threads library. Table 5.3 provides the list of bugs we evaluated. Each bug is modeled after the erratum listed in the third column. Bug 3 is also illustrated in Figure 5.3. Bug 2 is similar to bug 3, but with a different address layout: loads (B) and (C) are swapped and store (D) accesses only the lower byte. While we tried to faithfully model the bugs by following the descriptions provided, our bugs are unlikely to be identical to the actual bugs in commercial processors, due to the lack of detailed information. In particular, we believe that our bug 6 is more conservative than the actual bug. Similarly, bug 9 is also simplified to check only the addresses of store operations while ignoring TLB-related activities. Finally, we could not simulate bugs 10 and 11 on gem5 because the simulator does not allow speculation between exclusive and uncacheable accesses (bug 10) and it does not implement the related instruction (bug 11).

Figure 5.7: Micro-benchmark suite evaluation. All benchmarks complete correctly, even in the presence of the bugs reported on the horizontal axis. The bars compare the performance of the test when using each of the elusion methods, normalized to the immediate squash (ImmSq). For comparison, we also provide the performance of a solution (StcFc) where a fence is statically inserted in the proper location of the code to avoid the bug.

5.6.2 Micro-Benchmark Results

We first evaluated the coverage and performance impact of our solution on a set of micro-benchmarks, of our own creation. The tests are written for the ARMv7 ISA, and each test is tailored to trigger one bug from Table 5.3. Our micro-benchmarks are structured similarly to litmus tests [26], with the difference that they are fine-tuned to trigger bugs much more frequently. Each test is single-threaded (bugs 6 and 9) or 2-threaded (bugs 1–5, 7 and 8). They comprise an initialization routine and a test loop, with the test loop repeated 1,024 times. Each test loop iteration is synchronized across participating threads using a spinlock variable. We ran all these micro-benchmarks in our setup; the bug-anticipation unit was always successful in predicting a bug manifestation, and used one of the four

elusion methods to bypass the bug. First, we note that, in all cases, the test completed correctly, providing the correct test results. Figure 5.7 compares the performance of the bug-elusion methods, normalized to ImmSq. Note that DelEv and DynFc are applicable only to some of the bugs, as discussed earlier. Comparing the two squash variants, ImmSq performs better than DelSq for most bugs, except bugs 2 and 8. DelEv underperforms ImmSq in most cases, except for bug 7. DynFc works especially well for bug 2 – we believe the runtime savings here originate from reduced cache conflicts. In the plot, we also report the runtime of a solution using static fence instrumentation (StcFc): the micro-benchmark code is instrumented before execution by inserting one fence operation into the test loop, so as to avoid the bug. This solution should be very efficient, although it is impractical for most applications except very small programs, where one could afford the engineering time required to analyze the code extremely carefully. From the plot, we note that StcFc is no better than DynFc, despite the much larger software engineering investment, likely because the bug is triggered frequently in our micro-benchmark suite.
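As a rough illustration of the test structure described above (an initialization phase followed by a spinlock-synchronized test loop repeated 1,024 times), the sketch below shows an equivalent 2-threaded skeleton. The real micro-benchmarks are hand-tuned ARMv7 litmus-style tests; the shared accesses here are placeholders rather than actual bug-triggering patterns.

```cpp
#include <atomic>
#include <thread>

constexpr int kIterations = 1024;
std::atomic<int> barrier{0};       // spinlock-style synchronization variable
std::atomic<int> shared_line{0};   // location touched by the placeholder test pattern

void spin_until(int value) { while (barrier.load() < value) { /* spin */ } }

void writer_thread() {
    for (int i = 0; i < kIterations; ++i) {
        spin_until(2 * i);                                    // wait for the reader's previous turn
        shared_line.store(i, std::memory_order_relaxed);      // placeholder store pattern
        barrier.fetch_add(1);
    }
}

void reader_thread() {
    for (int i = 0; i < kIterations; ++i) {
        spin_until(2 * i + 1);                                // wait for the writer's turn
        (void)shared_line.load(std::memory_order_relaxed);    // placeholder load pattern
        barrier.fetch_add(1);
    }
}

int main() {
    std::thread t0(writer_thread), t1(reader_thread);
    t0.join();
    t1.join();
    return 0;
}
```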

5.6.3 SPLASH-2 Results

We also evaluated SPLASH-2 benchmark workloads [204] for the x86 and ARMv7 ISAs. Note that we encountered a few unexpected errors when running some of the workloads with ARMv7 – in those cases, we considered only bugs triggered before the simulation terminated. We excluded benchmarks fmm and water-spatial from consideration, because they both terminated prematurely within a few minutes of simulation with the x86 ISA. Table 5.4 reports µMemPatch's ability to detect and bypass memory subsystem bugs, while Figure 5.8 presents the performance impact evaluation. In presenting these results, we considered three alternative solutions: i) fence – static all, where a fence instruction is added after each load for the entire code. This solution conservatively limits memory re-orderings to avoid speculation-related bugs. ii) fence – static pthread, where fences are instrumented similarly to "static all", but only within the m5threads (pthread) library code. This solution is more reasonable since all SPLASH-2 workloads heavily rely on the pthread library for synchronization. Finally, we also implemented iii) fence – ideal, which uses an oracle approach that cannot be deployed in practice, but represents a lower bound on the number of fence instructions that should be executed to overcome a bug. This solution profiles the dynamic execution of a program binary and inserts a fence only in locations that would cause the bug to manifest. We only use it to provide a lower bound on the performance impact of the static-fence solutions presented.

Table 5.4: Bug patching coverage on SPLASH-2 workloads

Workload         Bug1  Bug2  Bug3  Bug4  Bug5  Bug6  Bug7  Bug8  Bug9
barnes           X⊕    –     –     –     –     X⊕    X     –     –
cholesky         X⊕    –     –     –     –     X⊕    –     –     –
fft              X⊕    –     –     –     –     X⊕    X     –     –
lu (con)         X⊕    –     –     –     –     X⊕    –     –     –
lu (non)         X⊕    –     –     –     –     X⊕    X     –     –
ocean (con)      X⊕    –     –     –     –     X⊕    X     –     –
ocean (non)      X⊕    –     –     –     –     X⊕    X     –     –
radiosity        X⊕    X⊕    X⊕    –     X⊕    –     X     –     X
radix            X⊕    –     –     –     –     X⊕    X     –     –
raytrace         X⊕    –     –     –     –     X⊕    X     –     –
volrend          X⊕    –     –     –     –     X⊕    X     –     –
water-nsquared   X⊕    –     –     –     –     X⊕    X     –     –
X: µMemPatch   ⊕: fence – static all   : fence – static pthread

In Table 5.4, we report all the bug/benchmark combinations tested on gem5. For each of those, we indicate which of the solutions was successful in bypassing each occurrence of the bug and completing the benchmark execution correctly. Bug 1 is observed in most workloads in both the x86 and ARM ISAs (except for ARM cholesky, which terminates due to an error). Note that bug 1 has been fixed in gem5, as described in Section 5.3.3; we reverted this fix for the sake of our evaluation. Bug 6 is observed mostly in the ARM ISA for most benchmarks, as well as in x86 cholesky. These two bugs are triggered much more frequently than the other bugs. Bug 7 is also detected in various workloads in the ARM ISA, but its triggering frequency is very low – up to several tens of times over the entire execution (millions to billions of cycles). The other bugs are observed only in radiosity (bugs 2, 3, 5 and 9) or not observed at all (bugs 4 and 8). In the table, a check mark (X) indicates that µMemPatch overcomes the bug, i.e., the bug-anticipation module identified the sequence of events corresponding to the bug and correctly patched the execution. Note that fence – static all provides good coverage, although it is likely to have a high performance impact, while fence – static pthread is more efficient but presents very low coverage. fence – ideal is not shown since it would be impossible to predict the specific dynamic execution; if it could be done, it would cover all speculation-related bugs by construction.

Figure 5.8: Execution time with patching for SPLASH-2, normalized to an unpatched system. µMemPatch bug-elusion methods entail minimal performance impact, <1% with the best elusion selection for each bug. By comparison, fence insertion solutions either entail high performance degradation (all), or they rarely succeed in bypassing the bug (pthread).

Figure 5.8 presents the performance impact of µMemPatch's four elusion methods, and compares it against the fence insertion baselines we outlined. Here, we present results for bug 1 on x86 and for bug 6 on ARM, since we have significant manifestation data for those bugs. Note how µMemPatch presents minimal performance impact, and almost always performs significantly better than the fence insertion techniques, including the oracle fence solution. For bug 1, all bug-elusion methods can bypass the bug with less than 1% performance overhead over an unpatched system in most cases, with a noticeable exception for barnes, where DynFc shows a relatively large overhead (9%). For bug 6, DelEv appears to provide the minimal performance overhead, while DelSq underperforms even the conservative fence "static all" insertion.
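For illustration, the "fence – static all" baseline essentially amounts to following every load with a full memory fence, as in the hedged wrapper below; the actual evaluation instruments the code at the ISA level (x86 MFENCE / ARMv7 DMB) rather than using a C++ wrapper.

```cpp
#include <atomic>

// Illustrative wrapper: perform a load, then insert a full memory fence.
template <typename T>
T load_then_fence(const T* addr) {
    T value = *addr;                                       // the original load
    std::atomic_thread_fence(std::memory_order_seq_cst);   // the inserted fence
    return value;
}
```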

5.6.4 Implementation Overheads

We measured the silicon area overhead of several bug-anticipation modules for the bugs listed in Table 5.3. For each bug, we designed and validated an anticipation module in SystemVerilog, almost equivalent to its gem5 counterpart. The major difference is that the SystemVerilog design uses a truncated 20-bit memory address and an 8-bit sequence number. The module's inputs are the microarchitectural events of Figure 5.4, and its output is a bug-detection flag. To obtain the area footprint of each module for the FPGA target, we synthesized it using Synopsys Synplify Premier targeting a Xilinx Spartan 6 XC6SLX150 at 45nm, and obtained the usage of the various cell types (e.g., flip-flops, LUTs). We then estimated the area of each cell by synthesizing a corresponding Verilog functional model with the NanGate 45nm Open Cell Library, and computed the aggregated area of all the cells. For the standard-cell target, we simply synthesized the SystemVerilog module with Synopsys Design Compiler and the NanGate library. We compared the silicon areas obtained in this process with that of an ARM Cortex-A9 processor core (without L2 cache), which is 2.5 mm² [2].

Table 5.5 summarizes the area overheads for the escaped bugs we studied. On average, we find that our bug-anticipation module entails approximately a 5.8% area overhead for the FPGA target. Note that, in practice, a designer is likely to develop a specialized programmable logic fabric that implements FSMs efficiently. Such a solution would likely be much smaller than our FPGA target; we provide the standard-cell measurements as a lower bound for such a specialized solution. The anticipation modules on the standard-cell target occupy, on average, 0.05% of the area of the ARM core. For bugs 1 through 5 we used a similar design, so their area overheads are also similar (5.5%–7.0%). For bug 6, we traced a forwarding from a prefetch to a load operation, which requires a large event observation window, leading to a higher area overhead (16.6%). Note that bugs 9 and 11 only require combinational observations (no sequences), so their detection modules are very simple. For bug 1, we also strove to optimize the design by hashing addresses (see Section 5.4.2), and obtained a smaller area of 117,767 µm² for the FPGA target and 1,119 µm² for the standard-cell target. Finally, we also developed a bug-anticipation module that covers all 11 bugs together: this module occupies 606,611 µm² in the FPGA fabric and 4,774 µm² with the standard cells.

Table 5.5: Area overhead of the bug-anticipation module

Bug   Standard-cell area (µm²)   FPGA area (µm²)
1     1,562 (0.06%)              151k (6.0%)
2     1,605 (0.06%)              164k (6.6%)
3     1,406 (0.06%)              137k (5.5%)
4     1,655 (0.07%)              176k (7.0%)
5     1,435 (0.06%)              148k (5.9%)
6     3,345 (0.13%)              414k (16.6%)
7     1,173 (0.05%)              114k (4.6%)
8     618 (0.02%)                70k (2.8%)
9     19 (<0.01%)                787 (0.03%)
10    1,414 (0.06%)              209k (8.4%)
11    11 (<0.01%)                1,109 (0.04%)
Percentages are normalized to a 2.5 mm² core area.

5.7 Discussions

Programmable logic size. When µMemPatch is deployed in a real-world processor, the capacity of programmable logic should be carefully determined by considering the silicon area and power consumption allowed for the programmable logic. To reduce the

programmable logic overhead, we can build an integrated bug-anticipation module that combines the conditions for multiple bugs, which in turn offers opportunities to cover more bugs. To this end, one can leverage a string-matching engine (e.g., [190]) to efficiently encode a set of rules as state transition tables. In addition, as shown in Table 5.5, the FPGA fabric has significantly higher area overheads than the standard cells. Thus, one can design a hybrid bug-anticipation module that utilizes pre-built coarse-grained hardware components (e.g., finite state machines, address comparators) along with programmable logic.

Bug coverage. We conducted a coverage analysis on the bugs listed in Table 5.1. We analyzed the errata documents to infer what bugs can be discovered and recovered from by µMemPatch. We found that 59 out of 96 bugs (61%) would be covered by our solution. A majority of uncaught bugs are related to L2 and L3 caches, on-chip network, hardware prefetcher, and non-memory operations. If our bug-anticipation module is enhanced to monitor these modules, many of those bugs are likely to be caught. However, there are a few bugs that originated from complex combinations of advanced features (e.g., ARM Cortex-A15 #799271), which are unlikely to be caught even after the enhancement.

False bug detections (false positives) may lead to unnecessary performance degradation. There are a few sources of false bug detections: (1) conservative FSM modeling, (2) FSM simplification, and (3) memory address truncation and hashing. Note that (1) might be required if a bug-prone state cannot be accurately isolated due to unmonitored microarchitectural events (i.e., events not shown in Figure 5.4), while (2) and (3) are the approximation techniques of Section 5.4.2. We carried out a sensitivity study to measure the impact of (3) for bug 1, with the x86 gem5 configuration described in Section 5.6.1 and the twelve SPLASH-2 workloads in Table 5.4. We counted the bug occurrences for 64-bit whole addresses and those for 20-bit truncated addresses, and observed that the count is 1% higher with the 20-bit addresses on average (i.e., a 1% false positive rate). Most workloads show much less than 1% false positives, except for water-nsquared, which showed approximately 10% false positives. We could not quantify false positives originating from the FSM simplification. We applied the simplification to bug 10, which we could not simulate with gem5. In general, our simplification method applies a sound approximation to the FSM; thus, the simplified FSM captures a more conservative condition than the original one. In addition, false positives due to context switching can be prevented by clearing the memoization table whenever the context is switched.

Bugs in µMemPatch's FSMs may lead to false positives and negatives. A small number of false positives is considered benign, as they do not affect the correctness of program execution and only minimally impact performance. Thorough root-cause analyses could completely eliminate false negatives (i.e., targeted bugs uncaught by the FSMs). In addition, FSMs should be designed to conservatively cover all likely conditions leading to the targeted bugs. To reduce false positives and negatives, we can leverage formal property verification (FPV) for the FSMs. These FSMs are relatively simple and thus amenable to FPV tools.

Benefits over prior combinational-only solutions. µMemPatch strives to accurately detect bug-prone states by looking at sequences of microarchitectural events, which comes with larger area and power overheads compared to a combinational-only detection mechanism that monitors only control signals at the current cycle [177]. This combinational-only mechanism can be as effective as µMemPatch if control signals offer sufficient information on microarchitectural states. For instance, bug 11 is triggered when the CLFLUSH instruction is performed on the local xAPIC's address space. If the core has a dedicated signal to indicate whether or not the cacheline address accessed by the CLFLUSH instruction belongs to the xAPIC's address space, the combinational-only mechanism could catch this bug. However, it could not catch more involved bugs, such as bug 10, illustrated in Figure 5.5.
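Such a single-cycle check could be sketched as below: flag a CLFLUSH whose cacheline falls within the local xAPIC register page. The 0xFEE00000 base and 4KB size are the architectural defaults for the xAPIC MMIO page; the signal and function names are illustrative.

```cpp
#include <cstdint>

constexpr uint64_t kXapicBase = 0xFEE00000ULL;
constexpr uint64_t kXapicSize = 0x1000ULL;  // 4KB register page

// Purely combinational check: current-cycle control signal plus address range.
inline bool clflush_targets_xapic(bool is_clflush, uint64_t line_addr) {
    return is_clflush && line_addr >= kXapicBase &&
           line_addr < kXapicBase + kXapicSize;
}
```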

Using µMemPatch to mitigate side-channel security attacks. Recently, there have been growing security concerns over speculative execution in high-performance microprocessors. µMemPatch's ability to observe microarchitectural state can be re-targeted to watch for suspicious activities associated with side-channel attacks, although this entails observing microarchitectural covert channels, which we have not discussed in this dissertation.

5.8 Related Work

[151] provides a comprehensive summary of various runtime verification solutions proposed in the literature. Among them, hardware-based patching solutions [150, 177] share the same goals as µMemPatch. As we discussed in Section 2.2, these works observe microarchitectural signals at runtime, creating a corresponding signature. If the signature matches any of the bug signatures, they initiate a recovery process (instruction editing, replay, checkpointing, hypervisor support). Note that the focus is on single-core bugs. [199] expands [150, 177] to multicore designs by catching incorrect cache coherence behaviors, as we also covered in Section 2.2. We share the same principles (detection and elusion) and

have some commonalities with these solutions. However, µMemPatch makes clear innovative claims over those: first, µMemPatch is tailored to capture sequences of microarchitectural events (vs. single-cycle pattern detection with combinational logic only), leading to a much higher precision in bug identification. Second, our elusion mechanisms are specialized for memory subsystem bugs, so they entail extremely low performance overheads, compared to a generic checkpointing and rollback mechanism, which may cost 100,000 cycles [186].

Microcode patching has proven to be very effective at covering a wide range of bugs. However, its performance impact may be high. Moreover, microcode translation is not readily available for some processor products, such as ARM Cortex processors, which use hard-coded micro-operations. In a similar domain, dynamic binary translation can alter instruction sequences to bypass bugs. Unlike these hardware-based solutions, one can rely on software recompilation to avoid bug-prone instruction patterns. However, the analysis work for multi-threaded software can be challenging, and the software source code is often unavailable to chipmakers.

Cache coherence and memory consistency are essential features in multicore systems [185]. Verifying and validating these features is a very demanding and time-pressured task (see Section 2.2 for prior verification efforts focused on these features). Thus, many errata bugs are (partially) related to these features (Table 5.1). Many prior proposals (e.g., [193, 153, 120]) have focused on the verification and validation of these features. Here, we consider the information obtained by these solutions as input to µMemPatch's bug-anticipation module.

5.9 Summary

This chapter discussed µMemPatch, a microarchitectural patching solution for memory subsystem bugs in multicore systems on chip. Unlike prior global patching solutions, µMemPatch is a distributed patching solution, where a small patch module is attached to each core and its L1 data cache. Each patch module observes microarchitectural events happening at the core and L1 data cache that are connected to it. It also observes cache eviction messages arriving at the cache, which offer essential information about the microarchitectural state of the other chip components. Observing these messages allows µMemPatch to catch many system-wide bugs at the same level as other global patching solutions. Each distributed patch module consists of two components: a bug-anticipation module and a bug-elusion module. The bug-anticipation module is implemented in a programmable logic fabric located at each core, employing finite state machines to catch sequences of

microarchitectural events that may lead to bugs at runtime. Its area overhead is approximately 6% compared to an ARM Cortex-A9 core, on average, over the eleven bugs that we evaluated in our experiments. However, our distributed approach comes at the cost of reduced bug coverage. Our current implementation may not be able to detect about 39% of the memory bugs that we collected from published errata documents. In addition, we propose a few low-overhead bug-elusion methods, which can be easily paired with the bug-anticipation module. With these bug-elusion methods, µMemPatch incurs less than 1% performance degradation on SPLASH-2 workloads.

We now move on to address errors that may occur in the on-chip networks of complex SoC designs. Specifically, the next chapter discusses routing in these networks, and how to efficiently recover when a component failure hits this critical, single-point-of-failure subsystem of the SoC.

CHAPTER 6

Quick Routing Reconfiguration for On-Chip Networks

On-chip networks, or networks-on-chip (NoCs), are a promising communication infrastructure in system-on-chip (SoC) designs, because they provide high-bandwidth, concurrent and power-efficient communication among chip components. NoCs are often the only communication medium among chip components [97, 16], and hence a malfunctioning NoC can disable the entire chip in which it is deployed. For instance, if a packet is stuck at a router in the network, it prevents all other packets from moving through that router. Such a small failure may cause a domino effect in the network; the entire network may be stuck, resulting in a complete communication failure across all chip components. Thus, it is of paramount importance to protect NoCs from permanent faults.

Routing reconfiguration has been investigated as a cost-effective solution to tolerate permanent faults in NoCs. In particular, global routing-reconfiguration solutions (e.g., [22, 78]) can tolerate many faults impacting random components of the NoC. Such high fault tolerance is a key requirement in nano-scale semiconductor technology nodes, where permanent faults could occur more frequently. These global solutions deploy topology-agnostic routing algorithms (e.g., [178, 145]) to guarantee correct delivery, maximal connectivity, and deadlock-freedom. Unfortunately, they tend to freeze the entire on-chip network during reconfiguration. For instance, in [22], reconfiguration takes thousands of clock cycles for an 8×8 mesh even with dedicated reconfiguration hardware. When reconfiguration is conducted in software, it takes substantially longer [85, 60]. During reconfiguration, regular network activity is suspended; in-flight packets must be stalled at their current locations. Otherwise, new (post-fault) routes may conflict with old (pre-fault) routes, possibly triggering a deadlock situation. However, many digital systems rely on uninterrupted operability from their communication infrastructure; therefore, these global solutions are not viable for them.

To address this limitation, in this chapter we focus on providing routing reconfiguration capabilities without suspending global network activity. Specifically, we propose a

localized routing reconfiguration solution, so that we can confine the impact to the small region affected by the fault. To this end, we decompose the entire network into multiple segments. We then capture the topology of the segmented network with routing metadata. Upon a fault occurrence, we perform a partial reconfiguration: we update the routing metadata embedded in the network and use it, instead of routing tables, to compute emergency routes via router-to-router communication occurring only in the network segment affected by the fault and a few adjacent segments. Note that this partial reconfiguration corresponds to the reassembly process in our decomposition-based approach; it is done via router-to-router communication within a small region of the network.

Based on this decomposition method, we present our novel routing framework, called BLINC: Brisk and Limited-Impact NoC routing re-Configuration. BLINC enables fast and low-impact reconfiguration, while providing valid routes under most fault occurrences, as in prior global routing reconfiguration solutions. However, it may occasionally fail to cover rare fault patterns, where multiple faults occur within a segment, a limitation inherent to our decomposition method.

In addition, we demonstrate an important application of BLINC: it can be utilized to enable transparent reliability for NoCs, based on aggressive online testing and failure prevention. In our solution, individual components are taken offline and tested to evaluate if they are expected to fail soon, in which case they are disabled. BLINC allows us to quickly move components offline for testing, and then back online. Thus, it provides a first-response routing solution when a failure is deemed imminent. This capability allows our framework to operate uninterrupted and without data loss through testing and reconfiguration.

6.1 Background and Motivation

6.1.1 Avoiding Deadlock by Removing Cyclic Resource Dependencies

As briefly mentioned in Section 2.3, deadlock situations can happen when packets wait for each other in a cyclic manner. Such situations can be avoided by breaking cyclic resource dependencies [72], disabling at least one of the turns that contribute to the cycle [91]. For instance, Figure 6.1a forbids the turn 1-5-4 so that packets' routes do not include the sequence 1→5→4 or 4→5→1. This turn restriction breaks the cycle along nodes 0, 1, 5 and 4. A class of routing algorithms [85, 178, 22, 176, 169], some of which are detailed in Section 2.3, does not rely on topology-specific characteristics (e.g., X or Y direction) so that they can compute routes in any random topology. For instance, the up*/down* routing


Figure 6.1: Network segmentation example. (a) The segmentation process begins by identifying a segment that starts and ends at the root node (node 0). Additional segments stem from nodes already part of other segments (e.g., nodes 2 and 6). (b) The corresponding segment-to-segment tree structure reflects this construction from root to leaves. (c) Nodes within each segment are organized in intra-segment trees. Note how each segment includes two connections to a parent segment.

algorithm [178, 176, 22] builds a spanning tree of links. Links towards the root node are marked as up links, while links towards the leaves are marked as down links [157]. The routing algorithm then disallows down→up turns; once a packet takes a down link, it is not allowed to turn onto an up link. While this restriction rule can be applied to any up*/down* tree in general, the routing algorithm usually spans the tree in a fixed manner (e.g., breadth-first [178, 22] or depth-first [176]). Consequently, these topology-agnostic approaches often provide limited options in the placement of turn restrictions.

6.1.2 Alternative Route Generation Cost Analysis

Most regular on-chip network topologies, including meshes and tori, offer many alternative routes for each source-destination pair. For example, in a mesh, it is possible to choose between XY or YX routing, among other minimal options. Like regular topologies, irregular topologies with broken links between nodes, as in Figure 6.1a, also have alternative routes in the network, though their number is limited compared to regular topologies. In the figure, for instance, node 1 can reach node 4 through node 5 or through node 0. Here, we assume that a fault can disable either a link or a router; disabling a router is modeled as disabling all the links connected to it.

Turn model. To avoid cyclic channel dependencies [73], and thus guarantee deadlock-freedom, some of these possible routes should be prohibited. For instance, the turn model

[91] disallows routes passing through a particular set of turns. A turn is defined as a connection between two links through a router. For instance, in Figure 6.1a, the turn 1-5-4 is prohibited so that no packet can traverse the path 1→5→4 or 4→5→1. Upon a fault affecting a link (or a portion of a router compromising link operability), the disabled turns must be recomputed to allow packets to go through alternative healthy routes. This effort entails a global routing reconfiguration [78], and it does not guarantee deadlock-freedom.

Up*/down* routing. Tree-based routing algorithms, such as up*/down* routing [178], can be applied to route packets in any topology. The up*/down* algorithm assigns a total order to each node in the tree, from root to leaves, and it guarantees deadlock-freedom, as long as packets are not routed between two lower-order nodes (i.e., lower-depth nodes) via a higher-depth node (no down→up turn). However, upon a topology change, the up*/down* algorithm suffers from long reconfiguration latencies, both for on-chip regular topologies and off-chip network topologies. For instance, [22] reported a reconfiguration latency proportional to the square of the number of nodes in mesh topologies. In addition, [60] showed more than 20k cycles for 64-node off-chip networks. These long latencies originate from global reconfiguration [22] or from complex reconfiguration procedures [60].

Segment-based routing. Note that it is possible to design alternative routes based on local information if we use segment-based routing [145]. In segment-based routing, the entire network is partitioned into segments. The segmentation process starts by selecting a root node, and then identifying a segment as a sequence of nodes and links that starts and ends at the root node. Each subsequent segment is identified by building a sequence of nodes that starts and ends at nodes already included in the segmented portion of the network. Figure 6.1a shows an example of segmentation. Recent work [144] has shown that it is preferable to segment a network so that each segment has exactly two links connecting it to the already segmented portion of the network. We follow this advice in our experimental evaluation. We propose here to augment this algorithm and maintain metadata at each node so that, upon a fault, we quickly modify the routing configuration locally.

6.2 BLINC Overview

The main objective of our technique is to promptly find alternative routes for packets affected by a network topology change, due to a fault or other event which triggers reconfiguration. Specifically, our goal is to be fast and minimalistic in the number of routing modifications so that the network traffic is minimally perturbed.


Figure 6.2: BLINC route computation. BLINC deploys two route computation components: routing tables (white) and reconfiguration modules (gray). On a fault occurrence, invalid routes are immediately replaced by emergency routes, while a software procedure in the background recomputes new optimal routes. When they are ready, they replace the emergency routes.

In distributed routing, forwarding directions are selected locally at each router. Thus, rerouting entails recomputing the routing tables for all routers in the network [166, 78, 22]. We observe that the recomputation effort can be reduced by utilizing pre-computed routing metadata so as to quickly pinpoint the affected routes. The immediate rerouting response from BLINC is fast and deadlock-free, but not necessarily minimal. However, traffic remains uninterrupted while a new minimal routing configuration is generated offline. Concurrently, the reconfiguration process also initiates a complete routing reconfiguration in software to generate new optimal routing paths in light of the new topology. Once this latter process is completed, the reconfiguration is transferred to the nodes and it replaces the emergency routing approach.

Figure 6.2 shows the components required for our BLINC algorithm. The blocks with white background color are part of the baseline router; routing tables are generated offline [117] and ready when the network becomes operative. Upon a topology change, the gray-background reconfiguration module quickly calculates valid alternative (emergency) routes so that packets affected by the change can be safely sent through them. Note that the majority of packets still utilize the original optimal routes. Thus, the deployment of the emergency routes enables fast rerouting at the expense of route optimality. In the following subsections, the reconfiguration module is further described.

6.3 Route Computation

6.3.1 Our Enhanced Segmentation Infrastructure

Our offline routing solution augments segment-based routing with an additional high-level tree structure, showing the connectivity between segments. The higher-level tree (Figure 6.1b) is built by traversing the network segment-by-segment, following adjacency, starting from the root segment. Any two adjacent segments have a parent-child relationship if the segment under consideration (child segment) has links connecting it to one or more segments already in the tree (parent segments). For example, once segments A and B have been considered, segment C is found to be a child of both. Nodes in the tree are ordered from root to leaves based on the order in which they are included in the tree (to improve readability, Figure 6.1b shows the segment order by using letters instead of numerals). Moreover, we also introduce an ordering of the nodes within each segment by building "intra-segment trees" (see Figure 6.1c). Once all nodes are ordered, deadlock-free routing can be enforced by forbidding turns around the highest-order node in each segment. Note how the forbidden turns indicated in Figure 6.1a follow the approach described; for instance, in segment E, the highest-order node is 11, according to Figure 6.1c. Thus, the turn 10-11-7 is disabled.

To provide an intuitive understanding of our approach, we leverage the fact that each segment is connected to segments closer to the root through two links. Thus, upon a link failure within a segment, it is possible to use the two links to reach the two disconnected portions of the segment. The tree structure remains unchanged, and it is sufficient to find a different route through one or more segments. As an example, in Figure 6.1a, assume that packets going from node 4 to node 13 traverse nodes 8 and 9. If the link 8-9 fails, it is possible to find an alternate route via 5, 6, 10 and 9. Note that the alternate route may be non-minimal. However, as shown in Section 6.4, routes generated using our algorithm are only slightly longer than minimal.

6.3.2 Routing Metadata

To quickly find emergency routes upon a failure, the BLINC reconfiguration module leverages routing metadata that is computed while segmenting the network. Routing metadata is embedded at each router, and it includes three types of information:

• Port type: A router port can connect the router to a lower-order node (parent port), to a higher-order node in the same segment (intra-segment port), or to a higher-order node in a different segment (child port). Figure 6.3a indicates the type of all ports for our example network.

• Children set: the set of reachable nodes along downward routes for each port. Figure 6.3b shows the children sets for node 9.

• Preference list (optional): an ordered list indicating the preferred output directions. If available, this list is used to improve the quality of the emergency routes generated. The list can be user-provided or generated automatically; in our evaluation we prioritize based on distance to the destination. Figure 6.3c shows an example of a preference list.

Figure 6.3: Routing metadata. (a) Port types are assigned based on our hierarchical segmentation process. (b) Each port in a node has an associated children set through child and intra-segment ports. (c) Preference direction lists associated with each node are optional.

Note that if an output port includes the destination node in its children set, the node has at least one valid route to the destination, since downward traversal is always allowed. To store the routing metadata at each router we need: 2 bits to encode the port type, a bit array to indicate the children set for each port, and 6 bits per destination to encode the preference list (at most 3 directions, 2 bits to encode each direction). Thus, for an 8×8 mesh, we need at least 264 bits per router (384 additional bits if the preference list is provided).
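The bit budget above can be made concrete with the following hedged sketch of a per-router metadata record for an 8×8 mesh with four network ports; the names and containers are illustrative, not the actual RTL encoding.

```cpp
#include <array>
#include <bitset>
#include <cstdint>

constexpr int kNodes = 64;   // 8x8 mesh
constexpr int kPorts = 4;    // N, E, S, W network ports

enum class PortType : uint8_t { Parent, IntraSegment, Child };  // 2 bits each

struct RouterMetadata {
    std::array<PortType, kPorts> port_type;              // 4 x 2  =   8 bits
    std::array<std::bitset<kNodes>, kPorts> children;    // 4 x 64 = 256 bits
    // Optional preference list: up to 3 directions per destination,
    // 2 bits per direction (64 x 6 = 384 additional bits).
    std::array<std::array<uint8_t, 3>, kNodes> pref;
};
// 8 + 256 = 264 bits of mandatory state, matching the estimate in the text.
```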

6.3.3 Reconfiguration Process

Upon a link failure, BLINC leverages the metadata described above to quickly generate alternative routes for the affected packets. Figure 6.4 illustrates the process with a high-level schematic. Each segment can be represented as a chain of nodes, and thus the segment affected by the failure will find itself disconnected. The localized reconfiguration process will re-establish connectivity for all nodes by exploiting the additional routing paths that had earlier been disabled to avoid deadlock. Indeed, because of the construction described in Section 6.3.1, each segment contains exactly one disabled turn which, at this point, will be re-enabled. Then all the children sets within the segment must be updated so that each node is reachable from the segment boundary. This goal entails adding children to some ports' children sets and removing children from other ports. In the example of Figure 6.4, Y was originally reachable via X only, but after the failure, it becomes reachable via Z instead. Finally, the additions and subtractions to the children sets are propagated outside the segment, until a common ancestor is reached. The reason to keep the children sets updated is that, during a topology change, packets whose routing is affected by the change will use the children sets to determine their new paths. This decision guarantees that even packets undergoing a detour will reach their destination in the minimum number of hops possible in the new topology. Moreover, the elements added or removed from the children sets are maintained in a separate bit array so that it is possible for a packet to determine when it is undertaking an "emergency route". Note that this emergency route is also deadlock-free because the rest of the network maintains the same turn restrictions as before. Note also that a new optimal routing configuration is being generated in the background in software; once computed, it overwrites all emergency routes.

Figure 6.4: Children set update on reconfiguration.

Algorithm 6.1 provides an outline of the reconfiguration algorithm. We describe each step in detail below and illustrate the process with an example in Figure 6.5. In the most general case, the faulty link is not adjacent to the node with the disabled turn for the segment that suffered the fault. In this case, all steps must be executed. In the case where the faulty link is adjacent to the node with the disabled turn, only steps 1 and 5 are required.

Step 1: Disabling the link. Nodes adjacent to the faulty link stop sending packets through it (line 1 in the pseudo-code).

Step 2: Re-enabling the turn. Before the fault, the node with the disabled turn (T in Figure 6.4) was the leaf in the intra-segment tree. After the fault, the portion of the segment between the turn-disabled node and the faulty link becomes isolated (portion between X and T in Figure 6.4). To reconnect it, we need to i) re-enable the turn at T, ii) swap the port types for each link in the isolated portion (Figure 6.5–(2)), and then iii) create an "added-children set" for each port in the isolated portion, which includes all the nodes downstream towards X (lines 3-5 in the pseudo-code). The added-children set is used to detect when a packet has to take a detour because of the fault: if the destination is in the original children set, the packet does not experience a detour, but if the destination is in the added-children set the detour is necessary.

Figure 6.5: Reconfiguration example. The turn restriction in the segment with a fault is eliminated and the new intra-segment leaf is the node next to the faulty link (node 9). An added-children set is created to indicate the new routes to reach the downward nodes after the fault, and it is propagated towards the root node. Once this step completes, acknowledgment messages are propagated back down. A similar process is followed to propagate removed-children sets starting from the other end of the fault (node 8).

Step 3: Enabling alternative routes. Once the added-children set for the turn-restricted node (T in Figure 6.4) is computed, the set is propagated toward the root node, across segment boundaries, to instruct every node of the new route to reach the destinations next to the faulty link. For each node towards the root, the current children set of this node is compared against the incoming added-children set; the latter is then reduced to include only nodes not already present in the children sets of the node under consideration. Indeed, if a destination was already in the children set of a node, no routing change should be applied at this node. The process stops when the added-children set becomes empty. In Figure 6.5–(3), the added-children set is propagated from node 10 all the way to the root node and node 4 (lines 6-9).

Step 4: Waiting for acknowledgment. The last recipients of the added-children set generate an acknowledgment message that is propagated all the way back to the node adjacent to the link failure (line 10).

Step 5: Disabling invalid routes. Finally, the other portion of the segment (the one connected to the parent-port side of the faulty link) generates a "removed-children set" to indicate that it cannot reach the other end of the link. The removed-children set is propagated towards the root in a similar fashion to the added-children set; the only difference is that nodes are eliminated from the removed-children set when they already exist in the union of the children sets of all ports other than the port receiving the remove-children message (lines 11-15). Each router tracks the modifications to its children sets through a bit vector.

Algorithm 6.1 BLINC reconfiguration procedure
 1: (parent node, child node) = disabled link
 2: if (Not HasTurnRestriction(child node))
 3:     turn node = FindTurnRestrictedNode(child node)
 4:     ReversePortType(from child node to turn node)
 5:     added set = GetNewChildren(turn node)
 6:     curr node = Parent(turn node)
 7:     while (Not IsEmpty(added set))
 8:         AddChildrenSet(curr node, added set)
 9:         Update(added set), curr node = Parent(curr node)
10:     WaitForAck(child node)
11: removed set = GetUnreachableChildren(parent node)
12: curr node = Parent(parent node)
13: while (Not IsEmpty(removed set))
14:     RemoveChildrenSet(curr node, removed set)
15:     Update(removed set), curr node = Parent(curr node)

Messages carrying the port swapping and the added- or removed-children sets can be transmitted on the same network using a dedicated virtual channel. Packets in reconfiguring routers are stalled until the reconfiguration process is completed. Then packets are routed as before, simply following the turn restriction rules. However, when a packet is forwarded through a port that includes that packet's destination in its added- or removed-children set, the packet has entered "emergency routing", and a corresponding flag is set in its header flit. Once a packet is in emergency routing, it routes using a minimal-hop tree-routing algorithm by always selecting the port containing the destination in its children set. When multiple options are available, the preference list is used to choose the best one. When no choices are available, the packet is routed towards the upward direction, in the worst case, to the root node. Note that our online reconfiguration process may fail to support rare fault occurrences where multiple faults impact the same segment. Such fault occurrences break the tree structure of segments (Figure 6.1b). Thus, the network may not be able to deliver packets due to the lack of valid routes for some source-destination pairs, although the network still maintains a physical connection between them. In these cases, our process triggers a re-segmentation to change the segment structure.

Table 6.1: Network parameters

Parameter           Value
Router              Wormhole, 3-stage pipelined
Flit buffer         8 flits/input port, 64 bits/flit
Network topology    Mesh 6×6, 8×8, 10×10
Packet size         10 flits/packet
Traffic             Uniform random, 0.05 flits/cycle/router
Number of faults    0%, 1%, 5%, 10%
Processing delay    5 cycles for add and remove, 1 cycle for ack
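The emergency-routing port selection described in Section 6.3.3 can be sketched as follows, reusing the illustrative RouterMetadata type from the earlier sketch in Section 6.3.2; this is a software approximation of the hardware selection logic, with assumed names.

```cpp
#include <array>
// Assumes RouterMetadata, PortType, and kPorts from the earlier sketch.

int select_output_port(const RouterMetadata& md, int dest,
                       const std::array<int, 3>& preferred_ports) {
    // 1) Try the preference-list directions first, if they lead to the destination.
    for (int p : preferred_ports) {
        if (p >= 0 && p < kPorts && md.children[p].test(dest)) return p;
    }
    // 2) Otherwise, any port whose children set contains the destination.
    for (int p = 0; p < kPorts; ++p) {
        if (md.children[p].test(dest)) return p;
    }
    // 3) No downward option: route upward (toward the root) via a parent port.
    for (int p = 0; p < kPorts; ++p) {
        if (md.port_type[p] == PortType::Parent) return p;
    }
    return 0;  // should not happen in a well-formed segmentation
}
```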

6.4 Experimental Evaluation

In this section, we first evaluate the performance characteristics of our solution in Section 6.4.1, and then consider its value in the context of a fault detection and reconfiguration application for uninterrupted availability in Section 6.5. We evaluated our BLINC algorithm with the cycle-accurate NoC simulator Booksim [73]. Table 6.1 shows the detailed parameters used in this experiment.

Fault model. We assume that the faults' spatial distribution is uniform over the component's area [22]. Thus, we derive area values for each major module (input buffers, crossbar, route computation unit, etc.) of the router in [73], using Synopsys DC and the NanGate 45nm target library. The faults' impact noted in this synthesized logic model is then mapped to the link-failure model of the network: i) faults in the router's control logic (9.4%) disable the entire router, affecting all surrounding links; ii) faults in the router datapath (90.6%) disable only one link, including the two I/O ports connected at its ends. Our evaluation considers 4 baseline systems, with the following nominal fault rates: 0%, 1%, 5% and 10%. The rate corresponds to a number of injected faults equal to that fraction of the links in the topology (an 8×8 mesh has 112 links, so a 25% rate would correspond to 28 faults). Note that some faults may affect several links. To evaluate BLINC, we create 10 distinct faulty topologies, using different random seeds for each fault rate, and then generate one more failure at 10 random different sites. In total, each fault rate is evaluated with 100 distinct fault situations (10 baseline topologies × 10 failure locations).

Table 6.2: Average number of affected routers and reconfiguration latency

                                         Impact of next fault
Method                 Initial    # affected routers       Reconf. latency (cycles)
                       faults     6×6    8×8    10×10      6×6     8×8     10×10
BLINC                  0%         7.0    9.0    9.9        21.0    26.0    30.1
                       1%         6.8    9.3    10.3       21.0    28.1    31.1
                       5%         7.1    9.1    10.0       23.9    28.7    30.6
                       10%        6.6    8.9    10.0       24.9    30.0    34.4
ARIADNE [22]           -          All routers              1.3k    4.1k    10k
OSR-Lite [187]         -          All routers              -       ∼569*   -
Wachter et al. [197]   -          All routers              -       -       0.2k–208k

6.4.1 Characterization

Number of affected routers. The left part of Table 6.2 reports the average number of routers affected by a reconfiguration event over a range of fault densities and network sizes. The number of affected routers increases slowly with network size, showing that BLINC localizes the fault manifestation to a small region. Compared to existing methods [22, 187], BLINC achieves more than an 80% reduction in the number of affected routers, across a wide range of fault densities.

Reconfiguration latency. Reconfiguration latency was computed assuming that each node takes 5 cycles to process an add/remove children-set message and 1 cycle to propagate the acknowledgment messages. Our findings are reported in the right part of Table 6.2. While reconfiguration latency is minimally sensitive to network size, it does show a steady increase with growing fault density. We believe this is due to the naturally occurring longer segments in faulty networks, which, in turn, impose more hops in the transmission of reconfiguration messages. Overall, BLINC's reconfiguration latency is 98% shorter than that of previous hardware-based techniques [22]. Note that [187, 197] have the disadvantage of performing reconfiguration in software, thus competing with applications for CPU time.

[Figure 6.6 plot: hop count increase (vs. the 0-disabled baseline and vs. optimal routes, bars) and emergency route utilization (line) as the number of disabled links over the baseline varies; 5% nominal fault rate, 8×8 mesh.]

Figure 6.6: Effect of multiple disabled links. The network performance degrades gracefully as additional links are disabled. The emergency routes show only slightly worse hop counts than optimal ones.

Quality of emergency routes. Figure 6.6 shows the hop count increase and the utilization of emergency routes when the number of disabled links over the baseline topology varies from 1 to 10. The average hop count increases as the number of disabled links increases, up to 11.4%, compared to the topology without the disabled links. In addition, the plot indicates only a 3.0% increase in hop count compared to optimal routes with the disabled links. Further, emergency routing is applied to 7.7% of the packets at 1 fault, and up to 48.7% of the packets at 10 faults. These results suggest that emergency routing provides near-optimal routes.

6.4.2 Discussion

BLINC reconfiguration is capable of tolerating a single link failure per segment. Once a segment is damaged by a fault, the next fault in the same segment may not be recovered by BLINC. We deploy a background rerouting algorithm to handle this. The background rerouting computes an optimal routing function for the surviving network topology. Sub- sequently, the new metadata can be overwritten safely when the network is idle. Although this rerouting process usually takes significantly longer than our fast reconfiguration [85], it is still much shorter than the plausible time to the next failure. Indeed, BLINC supplements existing solutions by providing a fast emergency response against the fault. BLINC’s metadata adds to the router’s area footprint, so it can be affected by faults. However, since the metadata consists mostly of storage, it can be easily protected by error-

112 testing period* ( f ) normal execution period ( 1-f ) user packets testing packets * User packets are delivered link in the testing period 1 23 ... N-1 N L cycles

Figure 6.7: Online testing flow. The network should remain completely available even during testing so that an aggressive testing frequency does not degrade network performance.

6.5 Application: Online Fault Detection

To showcase the value of BLINC’s approach, we present its deployment in a fault detection and reconfiguration solution that provides uninterrupted availability. The methodology tests network resources aggressively to detect early signs of an upcoming fault. Each link, in turn, is taken offline for testing, which is performed through the transmission of testing packets generated by a test pattern generator, such as the one in [93]. This approach can detect a majority of router faults [78]. For this methodology to be valuable, i) the network should be available and connected while a link is being tested, and ii) the testing approach should be capable of detecting early signs of link failure (e.g., increased delay) so that the network can reconfigure around the upcoming fault with no loss of packets. The first requirement can be accommodated by our BLINC algorithm; as shown in the previous section, it can provide emergency routes with minimal overhead. The latter requirement has been solved in the context of microprocessor designs [184, 206] but, to date, no solution of this kind has been developed for NoCs. Because of BLINC’s fast and localized reconfiguration, it is possible to select each link in turn, take it offline, test it in a harsh operating environment that mimics circuit aging (such as a lowered supply voltage [184]), and then bring it back online. BLINC can simply reconfigure the NoC to avoid the target link before the testing phase, and then reactivate the original routing after the test completes. If a link is found at risk of experiencing failure, emergency routing is maintained until a new segment-based routing solution is computed in the background. The testing flow is characterized by two parameters: the length of the test duration for each link (L) and the network testing rate (f), as illustrated in Figure 6.7. One complete testing cycle entails testing each link in turn. Note that the network should remain completely available even throughout testing. Figure 6.8 reports our findings, showing average packet latency under a range of testing rates and test durations. Viable testing rates and durations were derived from [125, 93, 206].
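As a back-of-the-envelope check of these two parameters, the length of one complete testing sweep follows directly from L, f, and the number of links. The helper below is only a sketch of that relationship (the 112-link count is the 8×8 mesh used in the evaluation); it is consistent with the complete-test-period figure quoted for the rightmost data point of Figure 6.8.

def full_test_period(num_links, L, f):
    # Cycles for one complete sweep in which every link is taken offline
    # for L cycles while testing occupies a fraction f of the total time.
    return num_links * L / f

assert full_test_period(num_links=112, L=1000, f=1.0) == 112_000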

[Figure 6.8 plot omitted: average packet latency (cycles) as a function of the testing rate f (0.5% to 100%) for Stall (L = 100, 500, 1,000) and BLINC (L = 100, 500, 1,000); 5% nominal fault rate, 8×8 mesh.]

Figure 6.8: Average packet latency under online testing. BLINC reconfigures routing quickly, supporting online testing with only negligible performance degradation.

For instance, the rightmost data point in our plot corresponds to one complete test period every 112,000 cycles. We compare our measurements against a routing solution (called Stall) that does not benefit from our BLINC solution. In Stall, packets are simply stalled in their buffers whenever they try to use a link undergoing testing. We only compared against Stall, because other solutions [166, 78, 22, 197] have reconfiguration latencies longer than the test durations we are evaluating. From Figure 6.8, we observe that with BLINC, average packet latency is minimally affected (6% in the worst case) by the ongoing testing and reconfiguration process, regardless of test duration. Moreover, Stall cannot provide uninterrupted availability beyond a test duration of 500 cycles, even at very low testing rates (latency increases 17-fold for L = 1,000 and f = 1%). Figure 6.9 shows the accepted flit rate during testing. We evaluated this under a randomly chosen 8×8 mesh topology and 1,000 random traffic patterns with L = 500. BLINC provides a steady packet delivery capability: the accepted flit rate shows only small fluctuations, as shown in the figure. On the contrary, Stall shows a decreased delivery capability, with a periodic increase when the link under test changes.

6.6 Related Work

Fast routing reconfiguration has been investigated mostly for off-chip interconnection networks, such as local area networks (LANs). [60] proposes a dynamic, progressive reconfiguration procedure based on the up*/down* routing algorithm [178], which uses graph manipulation operations. However, it only discusses reconfiguration principles without investigating the details and hardware required to support the reconfiguration procedure.

[Figure 6.9 plot omitted: accepted rate (flits/cycle/router) over time (cycles 9,500-13,500) for BLINC and Stall, with the testing start, the reconfiguration period, and the changes in the link under test annotated; 5% nominal fault rate, 8×8 mesh.]

Figure 6.9: Accepted flit rate during testing. BLINC shows a steady rate with only small fluctuations during reconfiguration (near cycles 12,100 and 13,150). Stall shows a constantly decreasing rate with periodic peaks.

Table 6.3: Comparison with existing routing reconfiguration techniques

Method                     Context   Computation  Impact  Speed      Tolerance
BLINC                      on-chip   hardware     local   very fast  high
ARIADNE [22]               on-chip   hardware     global  fast       high
MD [80]                    on-chip   hardware     local   very fast  low
Sem-Jacobsen et al. [179]  off-chip  software     local   slow       high
OSR-Lite [187]             on-chip   software     global  moderate   moderate

Similarly, [179] proposes a fast dynamic reconfiguration procedure based on partial channel lists. Although both solutions claim fast reconfiguration and uninterrupted operation, they do not provide evidence supporting their claims. In the on-chip network domain, OSR-Lite [187] proposes a fast reconfiguration solution that provides two sets of routing-computation logic based on [171], with only one of them active at a time. Upon a fault occurrence, a central routing manager calculates the replacement routes while the old ones are still in use, and the two are swapped when the calculation ends. While OSR-Lite is reported to be faster than hardware solutions, its dedicated central manager can be a single point of failure. In addition, it suffers from low fault tolerance due to its baseline routing algorithm [171], as discussed earlier. Like these fast reconfiguration solutions, BLINC also strives to reduce reconfiguration latency, but it additionally provides the high fault tolerance that is typical of the global, topology-agnostic routing solutions [171, 22, 157] discussed in Section 2.3. To summarize how BLINC fits in the related research landscape, Table 6.3 presents a comparison of relevant routing reconfiguration techniques.

6.7 Summary

In this chapter, we presented BLINC, a brisk and local-impact NoC routing reconfiguration algorithm. BLINC utilizes a network segmentation process, which partitions an entire network into multiple segments. This decomposition allows alternative routes around a faulty location to be computed from just a small amount of information, called routing metadata. The metadata is distributed across the network, embedded in each router. Upon a fault occurrence, it is exchanged among the routers located within a small fault-impacted region of the network. The computation leads to a partial reconfiguration, which is the key trait of BLINC’s quick routing reconfiguration. Our evaluation shows an 80% reduction in the number of routers affected by reconfiguration, and a 98% reduction in reconfiguration latency, compared to state-of-the-art global routing reconfiguration solutions. While achieving these significant reductions, BLINC’s partial reconfiguration may not handle multiple faults occurring in the same segment. We also discussed how BLINC enables uninterrupted availability for networks-on-chip, by allowing individual network links to be taken offline for testing at high frequency. BLINC maintains stable network performance, with only a 6% increase in latency during testing, in contrast with a 17-fold latency increase for a baseline approach that stalls traffic at the one link under test at a time. In the next chapter, we continue the discussion of routing in on-chip networks by presenting a solution that can efficiently recompute routing paths in the face of failures, while also taking network congestion into account.

CHAPTER 7

Application-Aware Routing for Faulty On-Chip Networks

In continuing to address the problem of interconnect routing in the presence of component failures, this chapter presents a novel routing algorithm that, unlike the solution of the prior chapter, seeks a new set of optimal routes across the entire system after each new failure. The algorithm improves over state-of-the-art solutions in that it minimizes performance degradation even after several fault occurrences. In prior works, finding application-optimized routes involved heavy computation to evaluate the various route options available in the network. To reduce the amount of computation, the solution discussed in this chapter proposes to utilize localized routing rules that can quickly prune suboptimal route options. As explained in the previous chapter, prior fault-tolerant routing solutions [22, 78, 157, 110, 169, 208, 80] re-compute routes upon each fault occurrence to utilize only fault-free paths. Nonetheless, they often lead to severe performance degradation after only a few faults, making the continued deployment of the chip impractical. For example, in [157], the throughput of an 8×8 mesh network drops by over 20% after only 10 faults. This steep loss is mainly due to increased traffic congestion on the remaining healthy paths. Application-aware routing solutions [155, 69, 59] minimize performance degradation by leveraging the application’s traffic patterns to compute application-optimized routes. One of the major drawbacks of these solutions is that they almost always entail very complex and resource-demanding computations. This is because there are a large number of deadlock-free route options possible in fault-ridden, irregular topologies, and each of these options needs to be evaluated to obtain optimal routes. To reduce the computation requirements, we can leverage the fact that most packets flow among only a subset of the network-on-chip (NoC) nodes [204, 55]. In other words, we can prioritize a handful of high-traffic communication paths when exploring route options. More importantly, we noticed that we can exploit localized routing rules, called turn-enabling rules, that decompose a network-wide route decision into a localized one. The turn-enabling rules capture the interactions among turns – when a turn is restricted for the sake of deadlock-freedom, some other nearby turns must be enabled to guarantee maximal network connectivity.

[Figure 7.1 graphics omitted: (a) a normalized source-destination heat map (nodes 0-8) of the traffic of lu con; (b) the corresponding traffic flow on a 3×3 mesh network.]

Figure 7.1: Application’s traffic pattern. (a) The normalized traffic pattern for a portion of the SPLASH-2 application lu con [204] shows that the largest fraction of traffic is between nodes 0-2 and 8. (b) When lu con runs on a 3×3 mesh network, there is a significant amount of traffic flow between the upper-left region and the lower-right region.

Here, we propose a few such rules for 2D mesh networks. With turn-enabling rules, we propose to decide turn restrictions iteratively; in each iteration, we determine the location of one turn restriction and then enable adjacent turns by applying the rules. This iterative procedure completes when we finally find deadlock-free, maximally-connected, application-optimized routes. Note that this iterative procedure corresponds to the reassembly process in our decomposition approach. We call the routing solution proposed in this chapter FATE, from Fault- and Application-aware Turn model Extension. FATE attains efficiency by leveraging an iterative procedure that aggressively prunes the search space of route options in faulty network topologies. FATE is designed to find routes that maximize the effective path diversity that an application experiences at runtime, thus minimizing network congestion. While the iterative procedure significantly reduces route-search effort, it may not be able to find an optimum (i.e., the best set of routes), which can be overlooked due to the pruning process.

7.1 Motivation

Applications running on network-on-chip interconnects exhibit unique communication demands among processing units. To illustrate this point, we profiled the amount of traffic from a portion of the lu con benchmark from the SPLASH-2 suite [204], as summarized in the normalized heat map in Figure 7.1a.

[Figure 7.2 graphics omitted: two 3×3 meshes showing route restrictions and valid routes; (a) Routing function 1, (b) Routing function 2.]

Figure 7.2: Traffic-aware routing restrictions. (a) When selecting routing function 1, which forbids the north-east turns at nodes 3, 4, 6 and 7, there are only two allowed turns to cross the mid-diagonal line. (b) In contrast, the placement of routing restrictions as in routing function 2 allows four distinct turns across the mid-diagonal.

In the heat map, each row corresponds to a source node, and each column corresponds to a destination node. The heat map clearly shows that there is a great amount of traffic to and from node 8. In particular, node 8 communicates more frequently with nodes 0, 1, and 2. These three traffic routes (nodes 0 ↔ 8, 1 ↔ 8, and 2 ↔ 8) are depicted in Figure 7.1b, showing that there is much more traffic flow on the paths between the upper-left region and the lower-right region than on the other paths (e.g., between the upper-right region and the lower-left region). As introduced in Section 6.1, routing restrictions are necessary in order to prevent deadlock situations among packets. Using the route restrictions shown in Figure 7.2a, there are only two turns that allow transferring traffic from the upper-left region of the network to its lower-right region. In contrast, the route restrictions shown in Figure 7.2b allow four distinct turns. As a result, the routing solution of Figure 7.2b is less likely to cause congestion. Application-aware solutions of this type have been investigated in [155, 69, 116, 182, 59], but they often entail high routing computation overheads. To generalize the impact of turn restrictions with respect to traffic flow, Figure 7.3 shows three different turn-restriction patterns that are deadlock-free on a faulty 4×4 mesh network. Note that there are many other deadlock-free turn-restriction placements, which are not shown in the figure. Any turn restriction prohibits the routes that take that turn, reducing path diversity in the network. Some paths are likely to be utilized more frequently than others, depending on the application’s network traffic flow. Thus, it is important that we choose turn restrictions that block as little traffic as possible. To illustrate this point further, the turn restrictions in Figure 7.3a prevent all west-to-north or north-to-west turns, while all east-to-south or south-to-east turns are allowed.

[Figure 7.3 graphics omitted: three 4×4 meshes with two faulty links each, showing (a) north-west turn restrictions, (b) north-east turn restrictions, and (c) irregular turn restrictions.]

Figure 7.3: Various deadlock-free turn-restriction patterns. In the mesh network topology, there are many possible patterns of turn restrictions. This figure shows three turn-restriction patterns in a 4×4 mesh network with two faulty links. The faulty links are marked with red X marks.

Specifically, there is only one route between node 3 and node 12 (3↔2↔1↔0↔4↔8↔12), while there are 10 distinct, minimal (i.e., 5-hop) routes between node 0 and node 14 (e.g., 0↔1↔5↔9↔13↔14). Figure 7.3b has a very different path-diversity characteristic: only one path between node 0 and node 14, but more than 10 paths between node 3 and node 12. Figure 7.3c shows an irregular turn-restriction pattern. Even without detailed analysis, it is clear that this pattern has multiple routes between node 3 and node 12, as well as multiple routes between node 0 and node 14. In other words, this pattern has more balanced path diversity across different source-destination pairs. Application-specific routing solutions aim at offering full flexibility in the turn-restriction placement [155, 69, 59]. They analyze an application’s communication paths, and selectively remove routes in the network until no cyclic resource dependency exists. As discussed in [69], however, this type of approach (e.g., [155]) scales poorly as the network size increases. [69] tries to overcome this scalability issue in two steps: it first minimizes the number of cyclic dependencies, and then uses VCs to break the remaining cycles. However, using VCs to avoid deadlock is a costly option in resource-constrained NoCs. In contrast, our approach does not require VCs for deadlock-freedom, and VC resources can be entirely dedicated to prioritizing traffic or reducing congestion.
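The route counts quoted above can be sanity-checked with a one-line combinatorial bound: ignoring faults and turn restrictions, the number of minimal routes between two mesh nodes that are dx columns and dy rows apart is C(dx+dy, dx). The snippet below computes only this upper bound, not FATE's path-diversity estimate; the annotated counts refer to the discussion of Figure 7.3a.

from math import comb

def minimal_routes(src, dst, width=4):
    # Upper bound on path diversity: minimal-route count between two nodes
    # of a width x width mesh, ignoring faults and turn restrictions.
    dx = abs(src % width - dst % width)
    dy = abs(src // width - dst // width)
    return comb(dx + dy, dx)

assert minimal_routes(0, 14) == 10   # matches the 10 routes quoted for Figure 7.3a
assert minimal_routes(3, 12) == 20   # the restrictions of Figure 7.3a cut this to just 1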

[Figure 7.4 diagram omitted: FATE is triggered by the OS scheduler on a new application launch, or by the fault detector on a fault occurrence; it iteratively chooses turn restrictions to balance the traffic traversing the links, using the weighted traffic flow and the estimated traffic load, and stores deadlock-free routes in the routers' routing tables (legend: disabled turn, enabled turn).]

Figure 7.4: FATE overview. FATE is triggered by either a new application launch or a new fault occurrence. It leverages a traffic load estimator to compute a deadlock-free routing function that minimizes congestion. The new routing function is then stored at the routers.

7.2 FATE Overview

FATE is a software-based offline routing algorithm that generates a deadlock-free routing function, while maximizing the number of distinct paths available between nodes with high communication requirements, for applications running on faulty 2D mesh networks. Figure 7.4 shows how FATE operates: it is triggered by a new fault occurrence or an application launch. It then uses the SoC’s idle cores to compute a new routing function. If no idle core is available, the start of the application may be delayed to compute the routing function, or a prior, possibly non-optimal, function may be used until a core becomes available. While finding optimal, deadlock-free routes is an intractable problem [116, 182, 69], FATE’s efficient heuristic quickly discovers a near-optimal routing function: it first computes the minimal number of turn restrictions that must be placed in the network. This value can be easily derived from the total number of cycles in the topology. The construction is then based on an iterative exploration, where turn restrictions are placed one at a time. After placing each turn restriction, FATE deploys its turn-enabling rules (Section 7.3) to enable turns that must be active in order to maintain both maximal connectivity (i.e., no disconnection) and deadlock-freedom in light of the most recent turn-restriction choice. Moreover, in choosing the location of each new turn restriction, FATE leverages a novel traffic load estimation using the communication patterns extracted via application analysis (Section 7.4). If a deadlocked or disconnected routing configuration is encountered during the exploration, FATE uses backtracking to broaden the search until it finds a satisfiable solution. Finally, the network’s routers are re-programmed using the new routing function.

FATE assumes that information about the application’s communication patterns is available: indeed, these can be observed in SoC applications using runtime profiling, and then modeled through Markov chains [54, 51]. As in many fault-tolerant routing solutions, we also assume that the NoC is equipped with a fault diagnosis solution (e.g., the online network testing presented in Chapter 6). In addition, we assume that our solution is deployed in systems where the OS is notified of new fault detections and can launch FATE’s routing computation software, updating routing tables accordingly. This assumption may not hold in some systems. For those systems, FATE can be adopted by deploying a dedicated network manager. In this setup, FATE calculates in advance optimized routing functions for representative communication patterns, and stores them in memory. At runtime, the network manager evaluates current traffic patterns, and chooses the best routing function by comparing against the patterns of the stored ones.
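The runtime selection performed by such a network manager could look as follows. This is a hypothetical sketch, not part of FATE's specification: the cosine-similarity metric and the flit-count-dictionary representation of a traffic pattern are illustrative choices.

def pick_routing_function(observed, stored_patterns):
    # observed and each stored pattern map (src, dst) pairs to flit counts;
    # return the key of the stored pattern most similar to the observed one,
    # so the caller can load the routing function precomputed for it.
    def cosine(a, b):
        dot = sum(a.get(p, 0) * b.get(p, 0) for p in set(a) | set(b))
        na = sum(v * v for v in a.values()) ** 0.5
        nb = sum(v * v for v in b.values()) ** 0.5
        return dot / (na * nb) if na and nb else 0.0
    return max(stored_patterns, key=lambda k: cosine(observed, stored_patterns[k]))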

7.3 Turn-Enabling Rules

This section discusses the rules that determine which turns must be enabled as a consequence of another turn being disabled. These rules, grouped into basic and advanced, can be applied in any order, and are illustrated in Figure 7.5. We first describe how the rules operate in regular meshes, and then extend them to faulty topologies. Note that our approach minimizes the number of turn restrictions, and thus maximizes the cumulative bandwidth to all destinations, but it does not necessarily enable minimal-length routes. 1) Basic rules identify which turns must be enabled because of a turn restriction on the surrounding cycle, node and links; they are illustrated in Figure 7.5a.

• Rule 1 (cycle)— Once a turn in a cycle is disabled, all other turns in the same cycle should be enabled so that all nodes in the cycle can still communicate.

• Rule 2 (node)— Once a turn in a node (router) is disabled, all other turns incident on the same node should be enabled.

• Rule 3 (link)— Once a turn adjacent to a link is disabled, the turn on the same link that lies on the opposite side of the disabled turn, and is not incident on the same router, should be enabled.

In a fault-free mesh network, there is only one turn that should be enabled for each link affected by a disabled turn as a result of Rule 3. Note that Rules 2 and 3 are not necessary to guarantee the connectivity of the network, but any violation of these rules would lead to a superfluous number of disabled turns. For instance, if both the turns 1-4-3 and 1-4-5 were disabled in the middle network of Figure 7.5a, then the network would still allow a dependency along the path 0→1→2→5→4→3→0, as shown in the figure (gray arrows), and we would need to include an additional turn restriction to break it. Similarly, with reference to the right network in Figure 7.5a, if turn 4-1-2 were disabled, a cyclic dependency would exist along the path 0→1→2→5→4→3→0.
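The basic rules lend themselves to a very small implementation. The sketch below assumes a representation in which a turn is a (prev, node, next) triple, and in which the enumeration of each cycle's turns and of the turns incident on each node is maintained elsewhere; it shows only how Rules 1 and 2 propagate a new restriction.

UNDECIDED, ENABLED, DISABLED = 0, 1, 2

def apply_rule1(turn_state, cycle_turns, disabled_turn):
    # Rule 1: once one turn of a cycle is disabled, enable every other
    # still-undecided turn of that same cycle.
    for t in cycle_turns:
        if t != disabled_turn and turn_state.get(t, UNDECIDED) == UNDECIDED:
            turn_state[t] = ENABLED

def apply_rule2(turn_state, turns_at_node, disabled_turn):
    # Rule 2: once a turn at a node is disabled, enable every other
    # still-undecided turn incident on that node.
    node = disabled_turn[1]                      # (prev, node, next) triple
    for t in turns_at_node[node]:
        if t != disabled_turn and turn_state.get(t, UNDECIDED) == UNDECIDED:
            turn_state[t] = ENABLED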

[Figure 7.5 graphics omitted: (a) basic rules illustrated on 3×3 meshes (Rule 1: cycle, Rule 2: node, Rule 3: link); (b) advanced rules illustrated on 4×4 meshes (Rule 4: common link, Rule 5: opposite-corner turn); the legend distinguishes the turn restriction, turns enabled by basic rules, turns enabled by advanced rules, and the deadlock examples.]

Figure 7.5: FATE’s turn-enabling rules. Whenever FATE selects a new turn restriction, it can infer a number of other turns that should be enabled at other locations to obtain an optimal routing function (minimal number of disabled turns) faster. Basic rules (a) enable turns in adjacent locations, while advanced rules (b) impact remote turn locations. In the diagram for Rules 2 and 5, we show a deadlock cycle that we would obtain if we did not enable the turns indicated.

2) Advanced rules force FATE to enable turns that are remote with respect to the restricted turn. They make it possible to aggressively prune the search space towards a solution with a minimal number of disabled turns. Figure 7.5b illustrates the two rules below; we marked in red the restricted turn, in teal the turns enabled by the basic rules, and in purple the turns enabled by the advanced rule being illustrated.

• Rule 4 (common link)— If a cycle has only two undecided turns that share a common link, then all turns that are adjacent to that link and lie outside the cycle should be enabled.

This rule can be inferred from Rules 1 and 3. For a cycle with two undecided turns, by Rule 1, one of the two should be disabled. If these two turns are adjacent (i.e., sharing a link), disabling either of the two turns will always involve the shared link. We apply Rule 3 to this link so that we do not allow another cycle to also place its turn restriction on this link. An example is shown on the left side of Figure 7.5b, where the cycle 1-2-6-5 has two undecided turns after placing the turn restriction at 1-5-4: the turns 1-2-6 and 2-6-5, sharing the link 2-6. This shared link is also adjacent to two other turns: 3-2-6 and 2-6-7 (both belong to the cycle 2-3-7-6), which should be enabled by Rule 4. Note that Rule 4 can be applied to enable turns in both the horizontal and vertical directions. Also, it can be applied iteratively to further reduce the disabled-turn-placement search space.

• Rule 5 (opposite-corner turn)— If two turns are located on opposite nodes in a cycle, and the two turns are not adjacent to any of the cycle’s links, then only one of them can be disabled.

We say that two such turns are in opposite-corner locations. The reasoning behind this rule can be understood by considering the example on the right part of Figure 7.5b: if we had disabled both the opposite-corner turns 1-5-4 and 11-10-14, then we could no longer avoid a deadlock. Indeed, by Rule 2, only two turns would remain undecided in the central cycle (5-6-10-9): 5-6-10 and 5-9-10, and by Rule 1, we would have to disable one of them. However, disabling either of these turns creates a cyclic dependency that could lead to deadlock. As an example, the figure shows the cyclic dependency we obtain if we disable turn 5-9-10, as well as the two opposite-corner turns. Note that it is possible to apply Rule 5 repeatedly by considering increasingly larger cycles. For instance, the rule could be applied to the cycle 5-7-15-13, and force the south-east turn on node 15 to be enabled (assuming that we had a larger network where that turn existed).


[Figure 7.6 graphics omitted: three 3×3 mesh examples illustrating (a) cycle recomputation, (b) a mutual turn, and (c) finding deadlocks.]

Figure 7.6: Extending turn-enabling rules to faulty topologies. The rules in Figure 7.5 can be applied in the presence of faults with a few modifications. For instance, (a) cycles must be recomputed in faulty regions, and (b) a turn shared by convex- and concave-shaped cycles should be used to break only one of the cycles. (c) A deadlock may occur when there is a path between two nodes belonging to different cycles and each of those nodes hosts the disabled turn of its cycle.

7.3.1 Turn-Enabling Rules in Faulty Topologies

Among the basic rules, Rule 1 is the one affected the most by faults: cycles may become merged because of faults, as shown in the example of Figure 7.6a (link 1-4 is faulty). In the figure, the two cycles 0-1-4-3 and 1-2-5-4 no longer exist, and they are merged into the cycle 0-1-2-5-4-3. Moreover, faulty networks may have both concave and convex cycles (unlike fault-free mesh networks, which have only convex cycles), and Rule 1 must be appropriately applied in the case of concave cycles. When a turn is common to two cycles (e.g., turn 3-4-1 in Figure 7.6b), then disabling that turn can only be counted towards breaking one of the two cycles, not both. Rules 2 and 3 remain unchanged for faulty topologies. Consider link 3-4-5 in Figure 7.6a as an example: if we were to disable the turn 2-5-4, Rule 3 would require the turn 5-4-7 to be enabled. In addition, we limit the application of Rule 4 to links contributing to cycles that have not been affected by faults. For instance, with reference to the left network in Figure 7.5b, if the link 5-6 were faulty, we would not apply Rule 4 to link 2-6, because it contributes to a cycle that has been opened by a fault; but we would still apply the rule to link 8-9. The reason we limit the application of Rule 4 is that it could become complex to identify which turns should be enabled when a link spans multiple routers.

Finally, we simply apply Rule 5 as we described for regular meshes. Applying this rule in faulty networks allows us to aggressively prune the search for an optimal disabled-turn placement. However, in faulty networks, some of the turns enabled this way do not actually prevent a deadlock, and thus did not need to be enabled. If a turn enabled this way should instead have been disabled, our backtracking step (Section 7.4.3) corrects the situation. To avoid the backtracking, it is also possible to pre-emptively check whether a turn enabled via Rule 5 could lead to a deadlock configuration. Figure 7.6c shows how to check for deadlock for this purpose: when there are two distinct cycles connected through a path starting at node 1 and ending at node 2, we cannot disable both the turns indicated in the figure, because this placement would enable the deadlock cycle shown in gray.
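The pre-emptive check mentioned above amounts to searching for a cycle in the dependency graph induced by the currently allowed turns. The following is a generic sketch of such a check, assuming a helper elsewhere builds, for each directed link, the set of directed links a packet may legally request next under the current turn configuration.

def has_deadlock(dependencies):
    # dependencies: directed link -> iterable of directed links reachable
    # through a currently allowed turn. A cycle in this graph means the
    # turn configuration admits a deadlock.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    def dfs(u):
        color[u] = GRAY
        for v in dependencies.get(u, ()):
            c = color.get(v, WHITE)
            if c == GRAY or (c == WHITE and dfs(v)):
                return True
        color[u] = BLACK
        return False
    return any(color.get(u, WHITE) == WHITE and dfs(u) for u in dependencies)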

7.4 Route Computation

In this section, we propose a heuristic algorithm to identify deadlock-free routes with minimal routing restrictions. In addition, the routes identified by our heuristic algorithm do not entail any routing-induced disconnection (i.e., the algorithm provides valid routes for every connected source-destination pair). FATE relies on the information it receives about the application’s communication patterns, striving to place turn restrictions on low-traffic links. When the FATE algorithm begins, all turns are undecided. Turn restrictions are then placed one at a time, starting from the regions transferring the most traffic. Upon placing each restriction, the turn-enabling rules are applied to enable the related set of turns. This process is repeated until each turn is either enabled or disabled. To identify which turn to disable next, we first estimate the traffic load on each link, turn and cycle. We then disable the turn that minimally worsens the most heavily loaded link, choosing among the turns of the most heavily loaded cycle. The intuition behind this choice is that we want to resolve the routing function first around the regions (i.e., cycles) transferring the most traffic, so that we have plenty of flexibility in our choices. Within each region, we want to disable the turn that minimally affects the congestion in the hotspot.

7.4.1 Traffic Load Estimation

To estimate the load on each of the network’s links, turns and cycles, we consider one source-destination pair provided by the application at a time. For each pair, we compute all the possible paths that packets can take from source to destination, and we then derive the fraction of traffic that would go through each link. The computation of all the loads proceeds with the four steps below (see Figure 7.7).

[Figure 7.7 diagrams omitted: (a) path diversity, (b) link load, and (c) turn load, computed on a 4×4 mesh for source node 12 and destination node 3, with the per-hop path-diversity totals and the link and turn loads around node 9 annotated.]

Figure 7.7: Link-load and turn-load estimation example. We compute traffic load of links and turns using path diversity. (a) The path diversity is calculated at each link considering fault locations and turn restrictions. (b) The link load is computed by dividing its path diversity by the per-hop path diversity. (c) The turn load is derived from the load of its input link and the number of permissible outputs.

Step 1: Compute path diversity. We calculate the number of different routes (i.e., path diversity) to reach each link from the source node. For instance, in Figure 7.7a, the east link of node 9 can be used by two different routes from node 12: 12→8→9 and 12→13→9, while only one route can use the north link of node 9: 12→13→9. In this process, we only allow minimal-length routes within the limits of the turn restrictions that are already in place.

Step 2: Compute link-load estimates. We now use the results of Step 1 to estimate the load on each link based on the path diversity available. We calculate the total path diversity at each hop from the source (as illustrated in Figure 7.7a), and divide the link’s path diversity by the total diversity. Figure 7.7b shows the computation for all the links associated with node 9. For instance, the load of the north link is 0.17 (= 1 / 6).

Step 3: Compute turn-load estimates. To estimate the load at each turn, we divide the input load from the source direction of the turn by the number of output links allowed for that source. Figure 7.7c shows the computation for all the turns at node 9. For instance, the load of the south-east (SE) turn is 0.125 (= 0.25 / 2).

Step 4: Compute cycle-load estimates. For each cycle, the load is computed by simply summing the loads of all the turns in the cycle.
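To make Steps 1 and 2 concrete, the sketch below counts, for one source-destination pair, the minimal-length partial routes that traverse each directed link, and then converts those counts into link loads. It is only an illustrative reimplementation under assumed conventions (nodes are numbered row-major, a turn is encoded as a (node, incoming direction, outgoing direction) triple, and faulty links are given as unordered node pairs); turn loads (Step 3) and cycle loads (Step 4) follow by dividing each link load by the permissible outputs and by summing over each cycle's turns.

from collections import defaultdict

W, H = 4, 4                                     # mesh dimensions (as in Figure 7.7)
DIRS = {'E': (1, 0), 'W': (-1, 0), 'N': (0, -1), 'S': (0, 1)}

def manhattan(a, b):
    return abs(a % W - b % W) + abs(a // W - b // W)

def step1_path_diversity(src, dst, faulty_links=frozenset(), disabled_turns=frozenset()):
    # Step 1: for every directed link, count the minimal-length partial routes
    # from src that traverse it. Search states carry the incoming direction so
    # that turn restrictions and faulty links can be honored.
    frontier = {(src, None): 1}
    link_div = defaultdict(int)
    per_hop_total = []
    for _ in range(manhattan(src, dst)):
        nxt = defaultdict(int)
        for (u, din), cnt in frontier.items():
            ux, uy = u % W, u // W
            for dout, (dx, dy) in DIRS.items():
                vx, vy = ux + dx, uy + dy
                if not (0 <= vx < W and 0 <= vy < H):
                    continue
                v = vy * W + vx
                if frozenset({u, v}) in faulty_links:
                    continue
                if manhattan(v, dst) != manhattan(u, dst) - 1:
                    continue                    # keep only minimal-length routes
                if din is not None and (u, din, dout) in disabled_turns:
                    continue
                nxt[(v, dout)] += cnt
                link_div[(u, v)] += cnt
        per_hop_total.append(sum(nxt.values()))
        frontier = nxt
    return link_div, per_hop_total

def step2_link_loads(link_div, per_hop_total, src):
    # Step 2: link load = the link's path diversity divided by the total
    # path diversity at the same hop distance from the source.
    return {link: d / per_hop_total[manhattan(src, link[0])]
            for link, d in link_div.items()}

div, totals = step1_path_diversity(src=12, dst=3)   # the node pair of Figure 7.7
loads = step2_link_loads(div, totals, src=12)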

[Figure 7.8 diagrams omitted: (a) turn loads, (b) traffic-weighted turn loads and cycle loads, (c) the turns enabled by FATE's turn-enabling rules, and (d) the turn loads recomputed for the second iteration, for the two communicating pairs 0→15 (weight 8) and 12→3 (weight 20) on a 4×4 mesh with two failed links; the legend marks disabled, enabled and undecided turns.]

Figure 7.8: FATE’s routing algorithm example. We show the first two iterations for a network with two failed links and an application with two communicating pairs. (a) Computation of turn loads. (b) Communication weights and cycle loads indicate that 10-9-13 is the most promising turn-disabling location. (c) Eight turns are then enabled by our turn-enabling rules. (d) Loads computation for the second iteration.

7.4.2 Iterative Turn-Restriction Decision

Once all load estimates have been computed, we can apply the FATE routing algorithm, as illustrated in Algorithm 7.1. Note that we weigh the load estimates by multiplying each estimate by the traffic weight associated with its source-destination pair. The algorithm starts by selecting a turn to disable, choosing the turn with the lightest impact on the heaviest link load among those in the highest-load cycle (lines 2-4). Once the turn to be disabled is selected, we apply the turn-enabling rules to enable as many other turns as possible (line 5). If the set of enabled/disabled turns leads to a deadlock or a disconnected network (line 6), we backtrack and update the list of conflicting selections (line 8). Our backtracking algorithm is discussed in more detail in the next subsection.

Algorithm 7.1 FATE routing algorithm
1: repeat until there is no undecided turn
2:     compute_link,turn,cycle_loads()
3:     cycle = cycle_with_heaviest_load()
4:     disabled_turn = turn_with_smallest_link_load_increase(cycle)
5:     enabled_turns = apply_turn_enabling_rules(disabled_turn)
6:     if ( not (check_deadlock() or check_disconnected()) )
7:         disable(disabled_turn), enable(enabled_turns)
8:     else update_conflict_history(), backtrack()

In designing our routing algorithm, we evaluated a few other strategies to select the next turn to be disabled: besides the one just described, we also tried (1) the turn with the absolute lowest load in the network, (2) the turn with the lowest load among those in the highest-load cycle, and (3) the turn connected with the highest-load link in the highest-load cycle. Experimentally, we found that those strategies performed slightly worse than the one we described. Figure 7.8 illustrates the algorithm with an example. The application provided two communication pairs: 0→15 (shown in orange), with a weight of 8, and 12→3 (shown in blue), with a weight of 20. We first compute link and turn loads, as shown in Figure 7.8a. Then we apply the weights and compute cycle loads in Figure 7.8b. Cycle 9-10-14-13 is the one with the highest load. By analyzing each turn, one at a time, we find that the one that entails the smallest link-load increase is 10-9-13, so we disable it. Figure 7.8c shows the network after the application of FATE’s turn-enabling rules. At this point, 21 turns are left undecided; thus, we start a second iteration by updating the link, turn and cycle load estimates, as shown in Figure 7.8d.
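The weighting step that combines the per-pair estimates can be written compactly. This is a sketch under assumed data structures (one loads dictionary per source-destination pair, keyed by link, turn or cycle identifiers, plus the per-pair traffic weights); it shows how the per-pair results of the previous subsection would be merged before selecting the heaviest cycle.

from collections import defaultdict

def weighted_loads(per_pair_loads, weights):
    # per_pair_loads: {(src, dst): {element: load, ...}}, where an element is
    # a link, turn or cycle identifier; weights: {(src, dst): traffic weight}.
    total = defaultdict(float)
    for pair, loads in per_pair_loads.items():
        w = weights.get(pair, 0)
        for element, load in loads.items():
            total[element] += w * load
    return total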

129 7.4.3 Avoiding Illegal Routing Function (Backtracking)

As mentioned earlier, the FATE routing algorithm may require backtracking if the set of turn restrictions in place allows for deadlock (usually along a complex cycle) or disconnects the network. These issues may arise because FATE’s rules do not take into account all the implications of a turn restriction, but only the simpler ones, so that their application is computationally cheap. When these situations occur, we record the location of all restrictions and we add the current set of turn-disabling placements to a conflict history, which we use to avoid repeating the same configurations. While the majority of topologies and communication patterns in our experiments completed successfully with only a small amount of backtracking, we found a few cases (less than 1%) that required more than 1,000 iterations to finish. We believe these situations occurred because of our simplistic backtracking model, which rolls back to the most recent decision, instead of a more intricate model that selects a promising roll-back point. This limitation is partially contained by a random restart technique that we implemented, similar to that used in some SAT solvers. Upon a restart trigger, previous decisions are forgotten, and a new initial turn location is selected. We set the restart threshold to 1,000 backtracking events in our experiments.

7.5 Experimental Evaluation

We evaluated FATE with a cycle-accurate NoC simulator [112] modeling an 8×8 2D mesh NoC. The network’s 3-stage routers are capable of look-ahead routing and speculative switch allocation (i.e., switch allocation is concurrent with VC allocation). Each input port within a router contains 2 VCs per protocol class, and 5 flits per VC, unless otherwise noted. We utilized 3 protocol classes to support the MESI cache coherence protocol when running SPLASH-2 benchmarks, and a single protocol class when evaluating synthetic traffic patterns. We apply the FATE algorithm to place turn restrictions, then deploy a minimal adaptive routing approach where packets choose at each router which direction to take among those enabled. In addition, we deploy a local congestion monitoring scheme based on the credit count, so that output channels towards non-congested input buffers are favored. We analyzed FATE’s performance both on fault-free and faulty 2D mesh networks, injecting a varying number of link faults in the latter scenario. Specifically, we experimented with configurations containing 1 (1%), 3 (3%), 6 (5%), 11 (10%) and 17 (15%) faulty links, and averaged the results over 10 different random sets of fault placements for each data point. Note that we do not consider configurations that lead to disconnected nodes, as this situation would require thread migration support when running parallel benchmarks.

130 Table 7.1: gem5 configuration for SPLASH-2 traces

Core             64 cores, x86 ISA, 2GHz, out-of-order, 8-wide issue
Cache coherence  MESI, CMP directory, 64-byte block
Network          8×8 mesh, 3 classes/port, 2 VCs/class, 5 flits/VC
L1 cache         Private, 16kB inst. + 16kB data, 4-way set, 3-cycle lat.
L2 cache         Shared, distributed, 256kB/node, 16-way set, 15-cycle lat.
Memory           1GB/controller; 1 controller at each mesh corner

This requirement, in turn, would make it difficult to reason about performance scaling due to a different number of active cores in different configurations. Even though we model only link failures, FATE is equally effective in tackling routers’ logic failures by mapping such failures to one or more links connected to the failed routers [157]. Note that we evaluated our solution under various corner-case topologies to take into account the unpredictability of fault locations. These random topologies include, for instance, nodes with only a single surviving link, cycles consisting of more than 10 nodes, etc. We evaluated our testbed with both synthetic traffic [112] and traces from the SPLASH-2 benchmark suite [204]. We used 5 different synthetic patterns: bit complement (bitcomp), bit reversal (bitrev), shuffle (shuffle), transpose (transpose) and uniform random (uniform). Our synthetic traffic consists of a mix of equal amounts of 1- and 5-flit packets. The 11 SPLASH-2 traces we experimented with, on the other hand, were obtained from full-system simulation using gem5 [56] in syscall emulation mode. Our gem5 model was configured as shown in Table 7.1. The traces were collected for 10 million cycles after spawning threads and initializing caches. The traffic weights were calculated by counting the number of flits for each source-destination pair. Latency values were capped at 1,000 cycles during the trace-based simulation to avoid extremely large latencies due to network saturation. We did not evaluate FATE with the PARSEC benchmark suite, because PARSEC is known to underutilize the network, and its traffic is more evenly distributed than SPLASH-2’s. We expect PARSEC to yield results similar to those of uniform.
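Computing those traffic weights from a trace takes only a few lines. The sketch below assumes the trace has already been parsed into (source, destination, flit count) records, which is an assumption about the trace format, not the simulator's actual output.

from collections import Counter

def traffic_weights(trace):
    # Count flits per source-destination pair; FATE weights its link, turn
    # and cycle load estimates with these counts.
    weights = Counter()
    for src, dst, flits in trace:
        weights[(src, dst)] += flits
    return weights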

7.5.1 Performance on Faulty Networks

We compared our solution against fault-tolerant, application-oblivious routing solutions that are based on the construction of spanning trees: up*/down* with breadth-first search (BFS) [178, 22], and up*/down* with depth-first search (DFS) [176]. For BFS, we experimented with 4 different configurations: each configuration placed the root node at a different corner of the mesh. Then we calculated the geometric mean of the results obtained over all four configurations. For DFS, we implemented the two heuristics from [176]: Ht1 for selecting the root node, and Fv for choosing a tree-spanning direction.

[Figure 7.9 plots omitted: saturation throughput (packet/cycle/router) for BFS, DFS, BSOR, APSRA and FATE, as a function of the number of faulty links (left) and per traffic pattern at 15% faults (right).]

Figure 7.9: Saturation throughput for synthetic traffic patterns over various fault rates and traffic patterns. FATE provides 10%-33% better saturation throughput than BFS, and up to 9% better throughput than APSRA.

We also compared our solution against two existing application-aware routing solutions: the Application-Specific Routing Algorithm (APSRA) [155] and Bandwidth-Sensitive Oblivious Routing (BSOR) [116]. Both algorithms are modified for distributed routing. For APSRA, we applied APSRA’s turn cost calculation instead of our load estimation method of Section 7.4.1. For BSOR, we first placed turn restrictions according to the negative-first turn model, then applied Dijkstra’s shortest path algorithm. We tried four different turn restrictions, each with four rotations. The algorithm is then applied iteratively by reducing the link capacity to aggressively optimize for congested links, as shown in [116]. Figure 7.9 reports the average saturation throughput (i.e., the injection rate at which latency reaches 3 times the zero-load latency) across various fault rates and traffic patterns. In the left half of the figure, we observe that the performance of our scheme degrades more gracefully than both application-oblivious spanning-tree solutions (BFS and DFS) as faults increase. FATE achieves a 10% improvement over BFS when there is only one faulty link. Although the network at this low fault rate maintains an almost-regular topology, FATE still offers better throughput than BFS by leveraging distinct traffic patterns. FATE’s improvement over BFS increases to 33% when 15% of the links are faulty. At this high fault rate, the spanning-tree solutions often fail to ensure minimal routing restrictions, leading to high performance loss. FATE also achieves consistently higher throughput than both APSRA and BSOR at all fault rates. While it shows throughput comparable to APSRA’s at low fault rates, FATE provides a 9% higher throughput at the 15% faulty-link rate. We believe that this is because APSRA’s traffic estimation becomes inaccurate at high fault rates, as the estimation relies on the path diversity between sources and destinations using source routing.

[Figure 7.10 plots omitted: average packet latency (cycles) for BFS, DFS, BSOR, APSRA and FATE, as a function of the number of faulty links (left) and per selected benchmark program at 15% faults (right).]

Figure 7.10: Packet latency for SPLASH-2 traces over various fault rates and benchmarks. Except for the 1-fault case, FATE shows 18%-59% improvements in packet latency over BFS, and up to a 13% improvement over APSRA.

In contrast, FATE estimates traffic load by considering hop-by-hop routing decisions using distributed routing. Finally, BSOR provides lower throughput than both APSRA and FATE, most probably because it lacks a dynamic approach to congestion management. FATE also performs better than BFS across various traffic patterns at the 15% faulty-link rate, as shown in the right half of Figure 7.9. We observe that traffic patterns where packets utilize a few turns more frequently (e.g., transpose and bitrev) are those that benefit the most from FATE. Our solution provides at least on-par performance for the patterns where turns are used evenly (e.g., uniform). Figure 7.10 reports the average packet latency from our trace-driven SPLASH-2 simulations across various fault rates and benchmarks. As shown in the left half of the figure, FATE experiences a negligible latency increase up to the 5% faulty-link rate. Beyond that point, the latency increase is more significant. However, note how the latency increase begins at lower fault rates, and is much steeper, for BFS and DFS routing. This dramatic increase comes from the earlier saturation effect in resource-constrained faulty networks. Note also that in networks with only one faulty link, FATE performs worse than the other solutions. This result is due to the high-impact contribution to the average by ocean con, which exhibits phases with very high injection rates for short time periods. These phases are not representative of the entire benchmark execution, and hence our solution is unable to optimize for them. In addition, FATE attains lower latency than APSRA at all fault rates except with one faulty link. The two solutions show comparable latency until the 5% faulty-link rate. At the 15% rate, however, FATE shows 13% lower packet latency than APSRA. FATE also outperforms BSOR at all fault rates.

Finally, we show results for five selected benchmarks at the 15% faulty-link rate on the right side of Figure 7.10. FATE consistently provides much lower latency than BFS and DFS. For instance, when running fft, each node frequently accesses memory nodes located at the corners of the mesh, and communicates with a few other nodes. As a result, FATE enables more routes among these frequently communicating nodes. On the other hand, both BFS and DFS are application-oblivious, so their disabled-turn placements are not as favorable to those nodes.

7.5.2 Performance on Fault-Free Networks

We compared FATE’s fault-free operation and congestion management capabilities against 3 fully-adaptive routing techniques: DyXY [127], NoP [49] and RCA1D [92]. For those solutions, we implemented deadlock detection based on timeouts, and reserved one VC for deadlock recovery. We also considered prior fault-tolerant solutions and application-aware solutions for fault-free networks. Figure 7.11 shows the average saturation throughput of FATE against all the solutions above across 3 different VC settings (3, 4 and 6 VCs), and synthetic traffic patterns. As shown in the left part of the figure, FATE outperforms DOR (dimension-order routing): FATE achieves a 4.5% improvement when using 3 VCs, and up to 23% with 6 VCs. FATE’s advantage grows with the number of VCs, since it manages resources more efficiently than oblivious routing techniques. However, the reverse trend holds when FATE is compared against the fully-adaptive solutions, i.e., DyXY, NoP and RCA1D. FATE’s improvement over the fully-adaptive solutions diminishes as the number of VCs increases, because the cost of reserved VCs in the fully-adaptive solutions is amortized when more VCs per class are available. For networks with only 3 VCs, FATE shows a 33% improvement over DyXY and a 21% improvement over RCA1D. This advantage is lost at 4 VCs, while at 6 VCs FATE provides 13% lower throughput than RCA1D. In the area- and power-constrained on-chip environment, NoCs with fewer VCs are more prominent, and hence our solution is widely applicable. We further analyze FATE’s fault-free performance by considering various synthetic traffic patterns. The right part of Figure 7.11 shows the average saturation throughput for networks with 3 VCs under various traffic patterns. Our solution outperforms the fully-adaptive solutions for most patterns in this setting: the exceptions are bitcomp and uniform, where DOR is the best performer. Considering the benefits of FATE for faulty networks (Section 7.5.1), we believe that a slight performance loss on fault-free networks for rare adverse traffic patterns is acceptable. Note that in our experiments, FATE leverages local congestion information to adaptively choose better routes at runtime. However, it has been shown in prior research [49, 92] that global network-level congestion monitoring schemes deliver superior performance. In future work, we plan to evaluate the benefits of applying some of these schemes to FATE.

[Figure 7.11 plots omitted: saturation throughput (packet/cycle/router) for DOR, DyXY, NoP, RCA1D, BFS, DFS, BSOR, APSRA and FATE, as a function of the number of VCs (left) and per traffic pattern at 3 VCs (right).]

Figure 7.11: Saturation throughput for synthetic traffic patterns over a varying number of VCs and traffic patterns.

7.5.3 Overheads

Routing-function computation overhead. We evaluated the overhead of computing the routing function, and compared our findings against APSRA [155]. Table 7.2 reports the average computation time to derive a routing function, for both FATE and APSRA. In evaluating computation time, we only included the total time spent in estimating traffic load (Section 7.4.1 for FATE), since that is by far the major contributor to the algorithm’s computation time, for both APSRA and FATE. The other activities contributing to the routing function computation (e.g., backtracking, deadlock checking) are minor contributors and identical for both solutions. Execution times were measured by averaging over 5 executions for each routing function computation on an Intel Xeon E5520. Overall, it can be noted that FATE is a significantly faster solution than APSRA. In the right portion of Table 7.2, we also compare the number of turn-disabling placement attempts, averaged over 10 different faulty topologies and 16 traffic weights (5 synthetic traffic patterns and 11 SPLASH-2 benchmark traces) at each fault rate, in order to gain insight into the gap between the computation of FATE and APSRA. This value is the cumulative sum of all turn-disabling placement attempts, including all the placements that had to be removed because they led to a conflict or deadlock. Note that FATE’s number of attempts is minuscule compared to APSRA’s: this result comes from our turn-enabling rules, which greatly prune the search space for an optimal set of turn-disabling placements. In many situations, APSRA’s routing algorithm made an extremely large number of turn-disabling placement attempts before reaching a stable solution. We capped those runs at 200,000 placement attempts, and we report the fraction of occurrences where the cap was reached. Note that FATE never had a case that required 200k placement attempts, while APSRA had a noticeable fraction of runs that went over the limit. Although FATE requires much less computation than existing application-aware routing, it may still not be sufficiently fast because of its software-based computation. In those situations where recovery performance is of the essence, alternative hardware-only solutions (e.g., retransmission [147] and BFS-based routing [22]) can be deployed concurrently with FATE, while it executes in software. Upon FATE’s completion, its solution can replace the interim recovery solution.

Table 7.2: Computation overhead for FATE and APSRA

                Average time     Average number of        % runs reaching
                (sec)            placements attempted     200k cap
                APSRA    FATE    APSRA         FATE       APSRA    FATE
Fault-free      733      3.61    71,321        117        19%      0%
1 fault         973      3.27    94,639        107        31%      0%
3% faults       1,044    3.29    107,956       107        41%      0%
5% faults       1,149    3.21    120,877       105        48%      0%
10% faults      1,257    3.62    151,802       118        69%      0%
15% faults      1,223    2.93    159,667       96         74%      0%

Area and power overheads. FATE requires a reconfigurable routing infrastructure (e.g., routing tables) to recompute the routing function at runtime. We deploy routing tables as shown in [22, 116], one for each router. Each table contains N entries, where N is the number of nodes, and each entry contains four directional 2-bit fields (8 bits per entry). The 2-bit fields are used to prioritize valid output directions by using the number of hops to the destination. In addition, we utilize four routing-restriction bits (similar to [59]) to avoid making invalid decisions at each router. Thus, to store the computed routing function in memory, we require N×(N×8+4) bits per application of the FATE algorithm. Moreover, we use the number of used credits as our congestion metric: this is often already available in routers and comes at no extra cost.
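As a quick check of this storage estimate (not a synthesis result), the formula works out as follows for the 8×8 mesh used in our experiments (N = 64):

def routing_table_bits(N):
    # Per router: N entries of four 2-bit direction fields (8 bits each),
    # plus four routing-restriction bits.
    return N * 8 + 4

N = 64
per_router = routing_table_bits(N)      # 516 bits per router
network_total = N * per_router          # N x (N x 8 + 4) = 33,024 bits, about 4 KB
assert network_total == 33_024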

We evaluated the area overhead of the routing tables and route-computation logic, targeting the Nangate 45nm library using Synopsys DC. The microarchitecture of a baseline router is configured as specified in Table 7.1, with an operating frequency of 400MHz. With this configuration, our routing computation adds approximately 6% area overhead, mostly for the routing table. Note that the other fault-tolerant, adaptive routing solutions we compared against (BFS, DFS and APSRA) entail similar overheads, as they utilize routing infrastructures similar to FATE’s. While we have not evaluated FATE’s power overhead, we provide here a qualitative comparison. Our solution entails significantly less computation than APSRA while, at the same time, producing routing functions that lead to lower packet latency. Thus, we believe FATE would consume less power than APSRA. When comparing to BSOR, BFS and DFS, the two trends are in opposition: we do attain lower packet latency, but at the cost of a higher computation time for the routing function. Thus, we provide an overall power gain only if the benefits in packet latency can absorb the additional power cost of the computation.

7.6 Related Work

Fault-tolerant routing. As discussed in Section 2.3, fault-tolerant, deadlock-free routing solutions have been extensively investigated in the past. Among them, FATE shares the most traits with Ariadne [22], uDIREC [157], CBCG [169], and Hermes [110], as all of them put no constraints on the number and location of faults. However, these prior solutions do not take traffic flow into consideration to optimize for throughput. Similar to the above on-chip solutions, in the off-chip network domain, topology-agnostic routing algorithms [85, 178, 176] focus mostly on placing routing restrictions using topological characteristics, while ignoring traffic flow during the placement. In contrast, we exploit communication patterns to find better routing restrictions.

Application-aware routing. Application-aware routing solutions tune the routing func- tion to traffic flow. APSRA [155] strives to meet the bandwidth requirement of an ap- plication, while achieving deadlock-freedom by breaking cyclic dependencies. However, deploying this solution to identify optimal routes in faulty networks has an extremely high computational cost. Addressing APSRA’s limitation, ACES [69] investigates a quick application-aware routing heuristic in irregular topologies by partitioning VCs to break cycles. LBDRx [59] develops a resource-mapping tool and a routing algorithm for SoC applications, while reducing routing overheads by leveraging logic-based routing instead of routing tables. Note that all of the above solutions use adaptive routing to mitigate con- gestion. In contrast, BSOR [116] selects load-balanced, oblivious paths via either mixed integer linear programming (MILP) or Dijkstra’s shortest path algorithm to approximate

optimally balanced paths. Similarly, ETM [182] reduces the computation requirements of the MILP problem with the aid of a genetic algorithm. These latter two oblivious routing solutions, however, suffer from low performance, as they do not manage runtime congestion well. Although all of these application-aware solutions can be deployed in faulty networks, they often require either intractable computation or extra hardware.

Adaptive routing. Adaptive routing techniques [127, 49, 92] can be used to minimize the performance impact of faulty networks. They mitigate interference among packets by routing some of them through underutilized network links. Application-aware routing techniques can manage congestion more efficiently when they employ runtime congestion monitoring, as in these adaptive routing techniques. Various congestion monitoring schemes have been explored in 2D mesh NoCs. For example, DyXY [127] observes buffer occupancy in adjacent routers, and then favors forwarding packets to routers with more vacancies. NoP [49] and RCA [92] extend this idea by monitoring congestion in routers farther away. All such solutions are fully adaptive and, thus, require dedicated VCs to guarantee deadlock-freedom, a costly endeavor in area- and power-constrained NoCs.

7.7 Summary

This chapter proposed FATE, a lightweight route-computation algorithm that calculates high-performing, adaptive routes for faulty on-chip networks. The algorithm takes into account the knowledge of an application's communication patterns to find application-optimized routes, which offer high throughput and low latency. FATE's lightweight computation is enabled by turn-enabling rules that decompose a network-wide route-search process into localized turn-restriction decisions, avoiding unnecessary route-search efforts, e.g., exploring routes that would incur either a deadlock or a disconnection. On top of the turn-enabling rules, we leverage an iterative route-search process where each iteration determines one turn restriction. In choosing the most promising turn-restriction locations, we strive to balance traffic loads evenly among healthy network resources, using our novel load-estimation metric. Experimental results show a more than two orders-of-magnitude speedup, compared to a state-of-the-art application-aware routing solution. The results also show improvements in throughput and latency for synthetic traffic patterns and SPLASH-2 benchmark traces, over state-of-the-art fault-tolerant and application-aware routing solutions. In addition, FATE keeps a low silicon area overhead profile, at only 6%. Finally, we demonstrated that FATE can be used even for fault-free networks, showing saturation throughput comparable

to several mesh-based adaptive routing solutions, topology-agnostic routing solutions, and application-aware routing solutions.

We have now come to the last critical type of component in SoCs: accelerator modules. In this regard, the next chapter addresses the challenge of verifying complex interactions among accelerators in a system on chip. This verification is one of the last and most complex phases of SoC verification at design time, and it may have a high impact on the product development timeline.

CHAPTER 8

Test Generation for Verifying Accelerator Interactions

This chapter investigates evasive errors in accelerators, when they are integrated into system-on-chip (SoC) designs. In a typical SoC design flow, accelerators are individually verified before being integrated into the system. These individually-verified accelerators are then brought together to form a system; this integration process strives to check if the entire system operates as expected in as many use-case scenarios as possible. This chapter focuses on one of the key verification challenges in this integration process: the verification of accelerator interactions.

Accelerator interactions are becoming more diverse, as SoC designs integrate increasingly many specialized accelerators that can efficiently execute certain domains of applications. While there are numerous accelerator interactions that an SoC design may exhibit, we observe that there are relatively few interactions that the SoC design is likely to exhibit during its lifetime. To identify such probable interactions, we leverage a high-level model of the SoC design, available in an early design phase, and we collect and analyze execution traces (e.g., memory-access sequences) from the model. Then, to capture essential interactions among accelerators, we propose to dissect the long accelerator interactions observed in the collected execution traces into basic interaction elements. These basic interaction elements represent elementary relationships among accelerator executions (e.g., concurrent or sequential execution). These elements are then reassembled to create synthetic tests, which will be run on the hardware under verification (e.g., RTL models, gate-level netlists). Note that the hardware under verification is usually much more detailed and complex than its high-level model.

The decomposition and reassembly of accelerator interactions enable a novel accelerator-interaction verification framework, called AGARSoC, Automated Test and Coverage-Model Generation for Verification of Accelerator-Rich Systems on Chip. AGARSoC is a methodology and a set of tools that guide the verification of accelerator-rich SoCs toward high-priority accelerator interaction scenarios. From a high-level model of the SoC design,

AGARSoC extracts and generates coverage models and high-quality test programs that hit the coverage goals. The generated programs are much more compact than the original software suite they are derived from, and hence require fewer simulation cycles. In addition, AGARSoC is highly versatile for analyzing test runs, adapting verification goals, and generating test programs, requiring only minimal input from engineers.

8.1 Background and Motivation

The term accelerator refers to a variety of processing units and is used with different meanings in the literature. At one end of the spectrum, it is often used to indicate coarse-grained processing units other than microprocessor cores, such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs). Each of these processing units is capable of computing many different applications. Thus, they are, to some degree, general-purpose processing units. However, they are usually very efficient at processing certain domains of applications. For instance, GPUs are specialized to process data-parallel applications. FPGAs are efficient at performing complex operations on data streams. At the other end of the spectrum, accelerators indicate small, fine-grained computation units, such as data compression engines and cryptographic engines [99]. This type of accelerator is also known as a coprocessor or an intellectual property (IP) block. Each of these computation units is designed to perform only a specific computation. For instance, the IBM POWER9 processor includes a couple of AES engines that support various encryption modes and key lengths. These AES engines are connected to the on-chip interconnect. Here, we focus on accelerators similar to the latter category (i.e., fine-grained accelerators).

Accelerator execution model. We assume that the executions of accelerators are governed by microprocessor cores. Specifically, an execution of an accelerator is initiated by a microprocessor core. The accelerator then reads the input data from the memory, performs the computation, then writes the computation results back to the memory. Afterward, the accelerator notifies the microprocessor core when it finishes the computation. The microprocessor core configures the accelerator before launching a task. The configuration generally includes (1) where the accelerator must retrieve input data, (2) where the accelerator must store its computation results, (3) the mode of computation, etc. Usually, accelerators support multiple modes of computation, and these modes are specific to accelerator implementations.
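To make this execution model concrete, the Python sketch below mimics how a core-side driver might configure and launch one such accelerator against a high-level simulation model. All register offsets, the write_reg/read_mem helpers, and the polling scheme are hypothetical assumptions chosen only to illustrate the configure-launch-notify flow described above.

import time

# Hypothetical configuration-register offsets of one accelerator.
REG_SRC_ADDR    = 0x40  # where the accelerator reads its input data
REG_DST_ADDR    = 0x44  # where it writes its results
REG_MODE        = 0x48  # mode of computation (implementation-specific)
REG_STATUS_ADDR = 0x4c  # memory location the accelerator writes on completion
REG_START       = 0x50  # writing here launches the task

def run_accelerator(write_reg, read_mem, src, dst, mode, status_loc):
    # 1. The core configures the accelerator through memory-mapped registers.
    write_reg(REG_SRC_ADDR, src)
    write_reg(REG_DST_ADDR, dst)
    write_reg(REG_MODE, mode)
    write_reg(REG_STATUS_ADDR, status_loc)
    # 2. The core launches the task.
    write_reg(REG_START, 1)
    # 3. The accelerator reads its input, computes, writes results to dst,
    #    and finally notifies the core by writing a non-zero status value.
    while read_mem(status_loc) == 0:
        time.sleep(0)  # yield; a real driver may poll or wait for an interrupt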

Verifying accelerator interactions. It is common that tens or hundreds of accelerators are integrated in an SoC design. Each of these accelerators is designed and verified individually; this design and verification process is often agnostic to the other accelerators integrated in the SoC design. This individual verification process often assumes an ideal scenario with respect to the remainder of the system, i.e., no interaction or interference from other accelerators. This verification method is fairly efficient in finding bugs in the early SoC integration phase, where accelerators are integrated into the system for the first time. However, this method often misses many corner-cases where multiple accelerators interfere with each other. These missed corner-cases can cause errors that manifest themselves only in very rare circumstances.

Later in the SoC integration phase, accelerators are more thoroughly verified by leveraging randomized verification techniques. For instance, accelerators can be connected to an on-chip network that simulates random interference in the system, such as varying packet latency or packet loss. While randomization techniques have been very successful in detecting corner-case errors, they might not be the most efficient verification method. Due to their random nature, it is hard to guarantee 100% verification coverage. Thus, they rely on an extensive amount of executions, many of which exercise behaviors that have already been verified in the past.

Here, we advocate a directed verification approach; we strive to find the corner-cases that the system may actually experience. In designing our directed verification solution, we consider two types of accelerator interactions:

1. Direct interactions – for instance, an accelerator takes as input the outcome of another accelerator. This kind of interaction is frequently found in image and video accelerators, or in multithreaded applications.

2. Indirect interactions – for instance, an accelerator runs while other accelerators are running independently. Such concurrent executions may create contention on shared system components (e.g., memory subsystems, on-chip networks). This kind of interaction is common in multitasking environments (see the sketch after this list).
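The sketch below contrasts the two cases, assuming a hypothetical run_acc(accelerator, src, dst, mode) helper that configures, launches, and waits for one accelerator; buffer addresses, modes, and the threading setup are illustrative only.

import threading

# Direct interaction: accelerator A1 consumes the buffer produced by A0.
def direct(run_acc):
    run_acc('A0', src=0x1000, dst=0x2000, mode=0)  # A0 writes its result to 0x2000
    run_acc('A1', src=0x2000, dst=0x3000, mode=0)  # A1 reads A0's output as its input

# Indirect interaction: A0 and A1 run concurrently on unrelated buffers,
# contending only for shared resources (interconnect, memory).
def indirect(run_acc):
    t0 = threading.Thread(target=run_acc, args=('A0',), kwargs=dict(src=0x1000, dst=0x2000, mode=0))
    t1 = threading.Thread(target=run_acc, args=('A1',), kwargs=dict(src=0x4000, dst=0x5000, mode=1))
    t0.start(); t1.start()
    t0.join(); t1.join()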

8.2 AGARSoC Overview

We recognize a need for tools to manage the complexity of verifying the large space of accelerator interactions in accelerator-rich SoCs. We observe that even though the space of possible interactions in an accelerator-rich SoC is large, the actual interactions that are exercised by software are much more limited. It is of paramount importance to get these likely-exercised interactions correct, as they affect customers the most.


Figure 8.1: AGARSoC overview. We analyze the execution of software on a high-level model of an accelerator-rich SoC to identify high-priority accelerator interaction scenarios for verification. We then generate corresponding coverage models and compact test programs, which are used to guide the verification of the RTL for the SoC. AGARSoC is mostly automated and can be flexibly controlled by parameters supplied at runtime.

By analyzing software execution traces from high-level models, we propose to identify the accelerator interaction scenarios that are likely to be executed by software and thus are of highest priority for verification. We then generate coverage models that capture these scenarios and test programs that exercise these scenarios. Specifically, we identify basic elements of high-priority interactions, where the basic elements capture concurrent execution scenarios and sequential execution scenarios.

Our proposed solution, illustrated in Figure 8.1, leverages HW/SW co-verification environments where software to be shipped with the SoC is developed and tested on high-level simulation models, concurrently with RTL development. By using our generated test programs, the integration team can significantly cut down the amount of effort and simulation cycles spent on RTL verification. In addition, our automated tools simplify adjusting verification goals as new software becomes available during the verification lifetime. Our solution is designed to guide the verification of accelerator interactions and hence is complementary to approaches that target other goals.

8.3 Capturing Software Behavior

Accelerator-rich SoC architectures integrate several accelerators, general-purpose cores, and peripheral components, as illustrated in Figure 8.2. Each accelerator in the system is designed to execute a specialized task quickly and efficiently. Accelerators for cryptographic hashing, video encoding/decoding, and image processing are already prevalent in current SoCs. Software executing on the general-purpose cores launches tasks on the accelerators by writing task parameters to the memory-mapped accelerator configuration registers.


Figure 8.2: Accelerator-rich SoC architecture. Multiple cores, accelerators, memories, and peripheral components are integrated in a chip through an interconnect. These components interact via memory operations, which are routed by the interconnect to the proper destinations. Memory-mapped registers in the accelerators are used to configure accelerators for execution.

A launched accelerator performs its task, possibly operating on a chunk of data in memory. Upon completion of its task, an accelerator notifies the software, often by writing its status to a memory location specified during configuration. To capture software behavior, AGARSoC analyzes memory operation traces collected from software running on a high-level model. These traces include regular read and write accesses to memory from processor cores and accelerators, write operations to accelerator configuration registers, and special atomic memory operations for synchronization. Each access in a memory trace includes the cycle, memory address, type, size, and data for the access. The individuality of each accelerator and the existence of multiple memory transactions for configuring and executing an accelerator are features unique to accelerator-rich SoCs. Previous works on analyzing software behavior focus on capturing performance and target processor cores alone [89, 95].
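As an illustration of the trace format just described, the snippet below sketches one possible record structure for a memory-trace entry; the MemAccess class and its field names are hypothetical, chosen only to mirror the fields listed above (cycle, address, type, size, and data).

class MemAccess(object):
    """One entry of a memory operation trace from a core or an accelerator."""
    def __init__(self, cycle, addr, kind, size, data, source):
        self.cycle = cycle    # simulation cycle of the access
        self.addr = addr      # target memory or register address
        self.kind = kind      # 'read', 'write', or 'atomic'
        self.size = size      # access size in bytes
        self.data = data      # data value read or written
        self.source = source  # issuing component, e.g., 'C0' or 'A1'

    def isWrite(self):
        return self.kind == 'write'

# Example: a core writing a source-buffer address into a configuration register.
acc = MemAccess(cycle=1200, addr=0x40, kind='write', size=4, data=0x80000000, source='C0')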

8.3.1 Analyzing Memory Traces

We scan the memory traces from the processor cores to detect patterns of memory accesses to accelerator configuration registers, which we refer to as accelerator configuration patterns. We also extract accelerator access patterns from accelerator memory traces and match them with their triggering configuration patterns to obtain accelerator execution patterns. We group these accelerator execution patterns into execution classes based on their configured modes of operation. AGARSoC's analysis is a mostly automated procedure with minimal engineer input. Engineers specify the memory mappings for the different components, which are obtained from the SoC configuration. We expect accelerator designers to provide descriptions of start and end conditions for accelerator configuration and access patterns, as well as to specify

which configuration writes define a mode of operation for their accelerator. Integration engineers can then encapsulate these conditions as Python functions and easily pass them as arguments to AGARSoC's analysis tools. Below are examples of designer-specified conditions for the multi-writers, multi-readers (MWMR) protocol in the Soclib framework [196], encapsulated in the Python functions shown in Figure 8.3:

• "An accelerator signals task completion by writing a non-zero value to the memory location whose address was written to configuration register 0x4c."

• "Writes to configuration registers 0x44, 0x48, 0x60, 0x50, and 0x5c define mode of operation."

def MWMREndDetector(configPat, accOp):
    endLoc = configPat.accesses[0x4c].data
    if accOp.addr != endLoc or not accOp.isWrite():
        return False
    return accOp.data != 0

def MWMRConfigHash(configPat):
    mode = [configPat.target]
    for reg, acc in configPat.accesses.iteritems():
        if reg in [0x44, 0x48, 0x60, 0x50, 0x5c]:
            mode.append(acc.data)
    return tuple(mode)

Figure 8.3: User-defined functions. MWMREndDetector returns True if a particular accelerator memory operation is signaling the end of a task. MWMRConfigHash returns a hashable tuple that contains the mode of execution. The configPat (accelerator configuration pattern) and accOp (accelerator memory operation) data structures are provided by AGARSoC.

8.3.2 Identifying Interaction Scenarios

We identified two types of interaction that are likely to lead to scenarios that were not observed during independent verification. Firstly, concurrently executing accelerators interact indirectly through shared resources. Any conflicting assumptions about shared resource usage manifest during concurrent executions. Secondly, the state of the system after one accelerator execution affects subsequent accelerator executions. We develop heuristics that identify concurrent and sequential accelerator interaction scenarios by analyzing the observed execution patterns. We detect and retain only execution classes in our scenarios, instead of the actual execution instances. Thus, our heuristics avoid capturing redundant scenarios that do not belong to a diverse set of interactions.

Concurrent accelerator interaction scenarios gather accelerator execution classes whose execution instances have been observed to overlap. The overlap is detected by comparing the start and end times of the accelerators' execution patterns. While start and end times observed from a high-level model may not be exactly identical to those observed in RTL, they represent the absence of ordering constraints and therefore indicate a possibility of concurrent execution on the SoC. Concurrently executing accelerators interact indirectly through shared resources (interconnect, memory, etc.), leading to behaviors that may not have been observed during the independent verification of the accelerators.
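A minimal sketch of the overlap check described above is shown below; the ExecPattern-style objects with start/end times and a class_id attribute are assumptions made for illustration, not AGARSoC's actual internal API.

from itertools import combinations

def overlaps(p1, p2):
    # Two execution patterns overlap if neither finishes before the other starts.
    return p1.start < p2.end and p2.start < p1.end

def concurrent_scenarios(exec_patterns):
    """Collect unordered pairs of execution *classes* observed to run concurrently."""
    scenarios = set()
    for p1, p2 in combinations(exec_patterns, 2):
        if overlaps(p1, p2):
            # Retain only the execution classes, not the concrete instances.
            scenarios.add(frozenset((p1.class_id, p2.class_id)))
    return scenarios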

Sequential accelerator interaction scenarios are either intended by the programmer (program-ordered and synchronized) or coincidentally observed in a particular execution. We infer three types of sequential accelerator executions:

1. Program-ordered – Non-overlapping accelerator execution classes invoked in program order from the same thread, detected by comparing start and end times of accelerator executions invoked from the same thread.

2. Synchronized – Accelerator execution classes whose instances are invoked from multiple threads, but are synchronized by a synchronization mechanism. We detect these by first identifying synchronization scopes (critical sections) from core traces and grouping accelerator invocations that belong to scopes that are protected by the same synchronization primitive. For lock-based synchronization, for instance, we scan the memory traces to identify accelerator invocations that lie between lock-acquire and lock-release operations (see the sketch after this list). We group the accelerator execution classes for these invocations by the lock variables used, giving us groups of synchronized classes.

3. Observed pairs – Pairs of accelerator execution classes whose execution instances were observed not to overlap. Unlike program-ordered scenarios, which can be of any length, observed pairs are only between two execution classes.
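The lock-based detection in item 2 can be sketched as follows. The trace fields and helper predicates are assumptions for illustration; real lock-acquire/release detection would be tied to the platform's synchronization idiom (e.g., LDREX/STREX pairs or a mutex IP).

from collections import defaultdict

def synchronized_groups(core_trace, is_lock_acquire, is_lock_release, exec_class_of):
    """Group accelerator execution classes by the lock variable protecting them."""
    groups = defaultdict(set)  # lock address -> set of execution classes
    held_lock = None           # address of the lock currently held, if any
    for op in core_trace:
        if is_lock_acquire(op):
            held_lock = op.addr
        elif is_lock_release(op):
            held_lock = None
        elif held_lock is not None:
            cls = exec_class_of(op)  # None unless op launches an accelerator
            if cls is not None:
                groups[held_lock].add(cls)
    return groups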

8.3.3 Abstract Representation

Following the analysis and scenario detection steps, AGARSoC creates a representation of software execution that abstracts away execution-specific information irrelevant to our goal, such as specific memory locations and actual contents of data manipulated by the accelerators. Our representation contains three major categories: accelerator execution classes, concurrent scenarios, and happens-before (sequential execution) scenarios. Each entry in these categories contains a unique hash for the entry and a count of how many times

it was observed in the analyzed execution. The entries for accelerator execution classes and concurrent scenarios are directly obtained from the analysis. The happens-before category contains combined entries for all unique observed pairs and program-ordered sequences. In addition, we generate all possible non-deterministic interleavings of synchronized accelerator executions. The entries in the happens-before category have counts of how many times they were observed in the execution traces. For any generated ordering that has not been observed, we store a count of 0. Figure 8.4 illustrates an example of how a software execution is turned into an abstract representation. The SoC in this example has 3 cores and 3 accelerators. The software threads running on these cores trigger multiple accelerator executions, labeled 1 – 6 in the

figure. A circle, a diamond, and a triangle are used to represent invocations of A0, A1, and

A2, respectively. The different colors indicate different modes of operation. AGARSoC will identify 5 different classes of accelerator execution corresponding to the different accelerators and modes observed during execution. Executions 1 and 4, 3 and 5, and 3 and 6 overlap, resulting in the three concurrent scenarios in the abstract representation. Similarly, AGARSoC identifies two sequences of program-ordered execution classes. Lock-acquire and lock-release operations using the same lock variable, indicated by locked and unlocked padlocks, respectively, mark one group of synchronized accelerator executions. AGARSoC generates all 6 possible interleavings of sequence length 2 for the accelerator execution classes in this group. The unobserved interleavings are marked with a count of 0. The immediate observed pairs extracted from the execution are not shown in the figure because they have all been included in the program-ordered and synchronized interleavings groups shown. Note that the observed pair resulting from executions 4 and 2 is also a possible interleaving of 2 and 5.

Abstract representations for different software executions can be combined to create a union of their scenarios by simply summing up the total number of occurrences for each unique scenario observed across the multiple executions. Through the process of combining, redundant accelerator interactions that are exhibited across multiple programs are abstracted into a compact set of interaction scenarios. This is one of the key features that allows AGARSoC to generate compact test programs that exhibit the interaction scenarios observed across several real programs.
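Since each category of the abstract representation is essentially a multiset of scenario hashes with occurrence counts, combining representations reduces to summing counters, as the hypothetical sketch below illustrates (the category names mirror the three categories described above; the data layout is an assumption for illustration only).

from collections import Counter

def combine(reprs):
    """Union of abstract representations: sum occurrence counts per unique scenario."""
    combined = {'classes': Counter(), 'concurrent': Counter(), 'happens_before': Counter()}
    for r in reprs:
        for category, counts in r.items():
            combined[category].update(counts)
    return combined

# Example: two per-program representations merged into one.
prog_a = {'classes': Counter({'A0.m0': 3}),
          'concurrent': Counter({('A0.m0', 'A1.m0'): 1}),
          'happens_before': Counter({('A0.m0', 'A2.m1'): 2})}
prog_b = {'classes': Counter({'A0.m0': 1, 'A2.m1': 2}),
          'concurrent': Counter(),
          'happens_before': Counter({('A0.m0', 'A2.m1'): 0})}  # generated but unobserved: count 0
merged = combine([prog_a, prog_b])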

8.4 Coverage Model Generation and Analysis

AGARSoC generates a coverage model that guides the RTL verification effort toward high-priority accelerator interactions as identified from software execution analysis.


Figure 8.4: Abstracting an execution. Accelerator tasks are launched from the general-purpose cores. We group these task executions into unique execution classes, represented by shape and color. We represent concurrent execution classes and sequential execution classes. The ordering among classes is obtained from program order, generated interleavings of synchronized invocations, and observed immediate ordered pairs.

The coverage model generation algorithm performs a direct translation of the abstract representation extracted from the analysis: scenarios are translated into coverage events and scenario categories are translated into event groups. The coverage events in each event group can be sorted by the count associated with each scenario entry in the abstract representation. A higher count indicates a more likely scenario. In addition to coverage model generation, AGARSoC's coverage engine is capable of evaluating and reporting the functional coverage data from RTL simulation. To take advantage of this capability, the RTL verification environment should output traces of memory operations. We believe this is a feature that most real-world RTL verification environments have, since memory traces are valuable during RTL debugging. AGARSoC's coverage engine is designed to be very flexible, to adapt to changing verification requirements. Coverage models that AGARSoC generates can be merged without the need to analyze executions again. In addition, coverage output files are designed such that a coverage analysis output for a particular execution can be used as a coverage model for another execution. These features give engineers the flexibility to incorporate new software behaviors at any stage of verification, to compare the similarity between different test programs, and to incrementally refine verification goals. Figure 8.5 shows a snippet of a coverage report generated by AGARSoC. This report shows the coverage analysis for an AGARSoC-generated program versus a coverage model extracted from a software suite.

Figure 8.5: Sample coverage analysis output. The Goal column shows the counts of each coverage event as observed in the original software suite. AGARSoC currently uses these goals only as indicators of how frequently scenarios are observed, which can guide the choice of scenarios to include for test generation. The Count column shows how many times the events were observed in the generated test program. The Seen column indicates the presence of an event occurrence with a green color and its absence with red. The % of goal column indicates how many times the synthetic events were observed compared to the events in the original software suite. This column is ignored when all that is required is at least one hit of a coverage event. However, it becomes relevant when comparing the similarity between different test programs.

8.5 Test Generation

In addition to coverage models, AGARSoC also generates test programs that exhibit the accelerator interaction scenarios captured in its abstract representation. The test generation engine is highly flexible, allowing for the generation of test programs that target chosen scenarios. The number of threads in the generated test program, the length of ordered sequences to reproduce, and the range of scenarios to generate (from a given scenario group, based on frequency of occurrence) are some of the configurable parameters that control AGARSoC’s creation of a multi-threaded schedule of accelerator executions to be executed by the generated test program.

8.5.1 Generated Test Program Structure

AGARSoC generates multi-threaded C programs that invoke accelerator executions through calls to library functions. Using barriers, the executions of these threads are divided into phases, where the accelerator executions invoked in each phase can execute concurrently and accelerator executions in different phases are guaranteed to be ordered. For each execution, AGARSoC generates the definition of accelerator execution classes, a data layout, and queues of accelerator execution classes for each thread to execute. At the beginning of each phase, each thread in the phase-based execution mechanism pops an entry from its queue and invokes an accelerator, based on the generated data layout and execution class definitions.
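The phase-based structure can be pictured with the short Python sketch below; AGARSoC emits multi-threaded C programs, so this is only a conceptual model of the barrier-separated phases, and the invoke_accelerator helper and queue contents are hypothetical.

import threading

NUM_THREADS = 3
barrier = threading.Barrier(NUM_THREADS)

def worker(my_queue, invoke_accelerator):
    # my_queue[p] holds the accelerator execution class this thread runs in phase p.
    for phase, exec_class in enumerate(my_queue):
        barrier.wait()                      # all threads enter the phase together
        if exec_class is not None:
            invoke_accelerator(exec_class)  # configure, launch, and wait for completion
        barrier.wait()                      # phase ends only when every invocation is done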

8.5.2 Schedule Generation Algorithm

AGARSoC uses multiple heuristics to generate a compact schedule of accelerator executions that exhibits the desired accelerator scenarios. Our first step adapts the concurrent scenarios to an N-threaded environment, where N is specified by the engineer. If there are any concurrent scenarios with more than N concurrent executions, we enumerate all possible N-long sets of concurrent executions. The outcome of the first step is a list of unique scenarios with N or fewer concurrently executing accelerator execution classes. Our second step creates a list of ordered, unique execution class groups of size M or less, where M is also specified by the engineer. Our third step uses a heuristic that creates a multi-phase schedule of the concurrent scenarios that also satisfies the largest number of sequential scenarios. Note that all invocations of a concurrent scenario are placed in a single phase, to be triggered by multiple threads. All accelerator executions invoked in a phase must complete before the threads proceed to the next phase. Therefore, all accelerator executions in the earlier phase are ordered before all the accelerator executions in the subsequent phase, satisfying multiple sequential scenarios. Our heuristic creates partial schedules of length M until all concurrent scenarios are scheduled. At each step, it searches for the length-M schedule that satisfies the largest number of previously unsatisfied sequential scenarios. We perform this schedule search multiple times, starting from a different concurrent scenario each time. We finally pick the complete schedule that satisfies the largest number of sequential scenarios. Our last step constructs a single sequence of program-ordered executions from the remaining sequential scenarios. Using partial pattern matching, we ensure that the sequence we create has no repetitions and has the shortest length possible.
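The greedy core of the third step might be sketched as follows; the data representation (phases as sets of execution classes, sequential scenarios as ordered class pairs), the single-pass structure, and the tie-breaking are simplifying assumptions chosen only to convey the flavor of the heuristic, not AGARSoC's exact algorithm.

def greedy_schedule(concurrent, sequential):
    """Simplified one-pass construction of a multi-phase schedule.

    concurrent: list of sets of execution classes (one set per concurrent scenario)
    sequential: list of (before, after) execution-class pairs to satisfy
    """
    schedule = []      # list of phases; each phase is one concurrent scenario
    scheduled = set()  # classes already placed in earlier phases
    satisfied = set()  # sequential scenarios satisfied so far
    remaining = list(concurrent)

    def gain(phase):
        # How many unsatisfied (a, b) pairs become satisfied if 'phase' is placed next?
        return sum(1 for (a, b) in sequential
                   if (a, b) not in satisfied and a in scheduled and b in phase)

    while remaining:
        best = max(remaining, key=gain)  # greedy choice of the next phase
        remaining.remove(best)
        satisfied.update((a, b) for (a, b) in sequential
                         if a in scheduled and b in best)
        scheduled.update(best)
        schedule.append(best)
    return schedule, satisfied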

Table 8.1: Test software suites

Suite group    # of progs    Description
Soclib 1       9             Sequential accesses with locks
Soclib 2       9             Sequential accesses without locks
Soclib 3       9             Concurrent accesses with locks
Soclib 4       9             Concurrent accesses without locks
Soclib 5       18            Combinations of the four above
µBlaze 1       7             No synchronization
µBlaze 2       8             Lock, single-accelerator invocation
µBlaze 3       5             Lock, multiple-accelerator invocation
µBlaze 4       7             Barrier, redundant computations
µBlaze 5       13            Semaphore

8.6 Experimental Evaluation

We tested AGARSoC's capabilities on two different SoC platforms. We used Soclib [196] to create a Soclib ARM platform in SystemC that has 3 ARMv6k cores with caches disabled, 3 accelerators, 3 memory modules, and 1 peripheral, all connected via a bus. The three accelerators are a quadrature phase shift keying (QPSK) modulator, a QPSK demodulator, and a finite impulse response (FIR) filter. All accelerators communicate with software running on the cores using a common communication middleware. Multi-threaded synchronization is implemented using locks, which are acquired by code patterns that utilize ARM's atomic pair of load-locked and store-conditional instructions (LDREX and STREX). Using the Xilinx Vivado Design Suite [17], we created a MicroBlaze platform in RTL that comprises three MicroBlaze processors with caches and local memories, six accelerator IP blocks, one shared memory module, and peripheral devices connected via AXI interconnects. The six IP blocks are a FIR filter, a cascaded integrator-comb (CIC) filter, a CORDIC module, a convolutional encoder, a fast Fourier transform (FFT) module, and a complex multiplier. Synchronization primitives (i.e., lock, barrier, semaphore) are implemented using a mutex IP. For each platform, we created software suites that contain several patterns of accelerator execution, as summarized in Table 8.1. We designed our software suites to have a diverse set of accelerator usage patterns. After analyzing all of the Soclib software suites, AGARSoC identified 264 accelerator invocations, 130 observed concurrent scenarios, and 315 observed happens-before scenarios. Similarly for MicroBlaze, it identified a total of 358 accelerator invocations, 592 observed concurrent scenarios, and 502 observed happens-before scenarios. Figure 8.6 reports average compaction rates from AGARSoC's abstraction process of these observed scenarios. Better compaction rates are achieved for test suites that exhibit redundant interactions. We report the compaction rates for each suite group and for the three categories in the abstract representation.


Figure 8.6: Compaction rates from abstract representation. The compaction of accelerator executions into unique instances of accelerator execution classes, concurrent scenarios, and happens-before scenarios for our two software suites. The compaction rate is measured as a ratio of the number of unique instances versus the total number of instances.


Figure 8.7: Normalized simulation runtime for each test group. While generated tests are usually shorter than the original test suites, MicroBlaze suites 1 and 3 show longer runtime, because of the inability to generate minimal schedules. However, the test program generated from all MicroBlaze suites combined does not exhibit this property.

Note that, by automatically identifying important scenarios, AGARSoC focuses verification on a small, albeit important, portion of the large space of possible interactions. For instance, for the 17 unique classes identified from the MicroBlaze suites, there can potentially be 680 concurrent scenarios; AGARSoC identified only 74 cases. The potential number of unique classes is much larger than 17, and it is only due to AGARSoC's analysis that we were able to narrow it down to 17. If we conservatively assume that the SoC allows 100 unique classes, for instance, 161,700 concurrent scenarios are possible. We observed that the test programs generated from the compact abstract representation exercise 100% of the scenarios captured in the abstract representations.
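The scenario-space figures above are consistent with counting unordered sets of three concurrently executing classes, one per core of the three-core platform; the quick check below reflects that illustrative assumption, since the text does not state the counting formula explicitly.

from math import comb  # Python 3.8+

print(comb(17, 3))   # 680 potential concurrent scenarios for 17 classes
print(comb(100, 3))  # 161700 for a hypothetical 100 classes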

Figure 8.7 summarizes the reduction in simulation cycles achieved by running these tests instead of the original software suites. The AGARSoC-generated tests simulate in only 8.6% of the time taken by the complete software suite for the Soclib platform and 38% for the MicroBlaze platform. However, the tests generated independently from the MicroBlaze software suites 1, 3, and 4 offer no execution time reduction, mainly due to the inability of the test generator to create minimal schedules to satisfy all scenarios. Suite 3 contains a program that is intentionally designed to exhibit accelerator patterns that push AGARSoC's schedule generator to the limit. Also, generated tests potentially exercise more happens-before scenarios than the original suites, due to all unobserved interleavings generated for synchronized accesses. Note that we are able to generate a more efficient schedule when the scenarios observed across all test suites are combined, because we can get more compaction from scenarios that repeat across multiple programs. Compared to the MicroBlaze suites, the Soclib suites execute more routines that are not related to accelerator execution and hence are not reproduced in the generated tests. AGARSoC detects synchronized accelerator executions and generates all possible interleavings, as discussed in Section 8.3.3. During execution, only a handful of these possible interleavings are observed. We can instruct AGARSoC's test generator to generate tests to exercise the unobserved interleavings. We evaluated this capability by generating separate tests for the unobserved scenarios in the MicroBlaze suite groups 2 and 3. Our generated tests successfully exhibited all of the interleavings that were yet to be observed.

8.7 Discussions

Flexibility. AGARSoC provides a flexible set of tools that can support the demands of different design efforts. Capturing design-specific knowledge, as discussed in Section 8.3.1, is simple and easily portable to any verification effort. We were able to adapt AGARSoC to two quite different SoC platforms with minimal effort. In addition, AGARSoC can analyze and incorporate new software into the coverage model as it is developed, and generate tests that hit different coverage goals.

Enhancements. Even though we have not fully implemented them yet, we have several enhancements planned. Firstly, in addition to capturing concurrency and ordering among accelerator executions, we can also track their data sharing patterns to identify producer-consumer or data race scenarios. These scenarios can be easily incorporated in the abstract representation as extra scenario groups and can be specified as constraints for AGARSoC's test generation algorithm. These data sharing scenarios, together with a straightforward

enhancement to detect cases where accelerators can invoke other accelerators, can enable AGARSoC to support several different accelerator-rich SoC design approaches. For instance, we can support scenarios where a task is executed by a pipeline of multiple accelerators, without the involvement of general-purpose cores in between. Secondly, we can randomize some parameters of the high-level simulation model to model non-deterministic execution of software. This allows us to generate multiple legal memory traces per program, increasing the richness of our software suite and further refining the accuracy of our analysis. Thirdly, we can implement mechanisms to detect other multi-threaded, synchronized accelerator executions, in addition to the lock-based and mutex-IP-based mechanisms we currently support. Finally, AGARSoC's test generator can be enhanced, with minimal effort, to fully utilize the parallelism available in the SoC-under-verification and also in the verification environment. The schedules we generate to target happens-before scenarios can be multi-threaded. In addition, we can launch multiple concurrent accelerator executions from a single thread. We can also generate multiple, execution-time-balanced tests that hit distinct coverage goals, to be simulated in parallel.

Accuracy. AGARSoC relies on several heuristics to abstract executions into scenarios. These heuristics rely on timing information from a high-level simulation model, which does not necessarily reflect the timing that will be observed during RTL simulation. However, note that we utilize the timing information from the high-level model only to infer accelerator executions that are likely to run concurrently or are likely to be ordered. As such, we believe that the accuracy of our heuristics is sufficient to guide verification engineers to the high-priority scenarios. In addition, AGARSoC uses scheduling heuristics when generating test programs, which may result in sub-optimal simulation cycle counts. Even though the resulting schedule may produce large test programs in unfavorable cases, these programs mostly simulate significantly faster than the original software suites.

Bug detection. We only focus on identifying high-priority scenarios, defining coverage, and generating tests that hit these coverage goals. We do not investigate the ability to check for and find bugs.

8.8 Related Work

Using high-level models for hardware verification. Hardware and software co-design and co-verification have been investigated extensively as a means to shorten time-to-market in designing SoCs [52, 75, 88]. Here, virtual prototyping platforms allow software

engineers to develop and verify their software while hardware is still in development. As discussed in Section 2.4, the verification of individual accelerators can benefit from high-level models of SoC designs; these models are usually provided as part of the virtual platform. AGARSoC extends beyond the current usage of virtual platforms to enable early analysis of software with the purpose of guiding hardware verification.

Analyzing execution traces. Simpoints [95] have been widely used to find representative sections of single-threaded software for use in architectural studies. Unlike AGARSoC, Simpoints are neither designed for analyzing concurrent software in accelerator-rich SoCs nor for guiding the verification process. Ganesan et al. [89] propose a mechanism for generating proxy programs to represent multi-threaded benchmarks. Unlike AGARSoC, their solution does not address accelerator interaction scenarios.

Coverage model and test generation. In contrast to previous work on automatic coverage-directed test generation [109], AGARSoC presents a solution to automatically extract coverage models by analyzing software. While a few solutions have been proposed for generating coverage models from dynamic simulation traces [138], they focus on low-level interactions obtained from RTL simulations. AGARSoC's coverage models focus on high-level accelerator interactions and are ready much earlier in the verification process.

8.9 Summary

We presented AGARSoC, a verification methodology and a set of tools that can guide the verification of accelerator interactions in accelerator-rich SoCs. AGARSoC is based on the decomposition of accelerator interactions into basic interaction elements, which can be reassembled to create new synthetic tests that target interesting interaction scenarios. To do so, AGARSoC analyzes the executions of software test suites using a high-level model of the SoC, which we use to identify high-priority accelerator interactions. Using this analysis, our tools generate coverage models for these high-priority interactions to be used during RTL verification. In addition, we can also generate compact test programs that can hit these coverage goals. We built and successfully demonstrated the capabilities of our tools for two different SoC platforms.

CHAPTER 9

Conclusion and Future Directions

Systems on chip (SoCs) integrate heterogeneous system components into a single chip. They are widely deployed in various computing devices, from tiny internet-of-things devices to data centers. Although the SoC design paradigm has been very successful in improving system performance and energy efficiency, two challenges may limit the viability of the SoC design paradigm in the coming years: functional verification and reliability.

The functional correctness of SoCs is challenged by their growing complexity. Design verification, ideally, must check every functional behavior that SoCs can exhibit; thus, it may entail massive software computation for simulating and checking a vast number of features and corner-cases. Unfortunately, some rarely-occurring buggy behaviors go unnoticed during verification and validation, and slip into shipped SoC products. To avoid such disastrous situations, it is important to not only detect design bugs at design time but also promptly patch escaped bugs at runtime, even if bug-patching often requires non-trivial hardware resources.

At the same time, the reliability of SoCs is challenged by diminishing transistor sizes and increasing transistor densities. Nano-scale transistors are more likely to fail within their lifespan, possibly making the entire chip dysfunctional if transistor failures occur in critical system components. Unfortunately, it is extremely difficult to accurately predict transistor failures. There are a myriad of transistors integrated in the chip, so predicting the degradation of each transistor is almost impossible within the short period of time allowed for design and verification. As a result, SoCs are more and more likely to experience design bugs and permanent faults after deployment, negatively affecting their viability and market growth. For the SoC design paradigm to flourish in the future, evasive errors must be addressed with solutions that are pervasive, efficient, and practical.

This dissertation investigates a decompose-and-conquer approach to reduce the complexity of software computation and hardware resources required to address these evasive

errors. We apply this approach to each major SoC component: microprocessor cores, memory subsystems, on-chip networks, and accelerators. By applying our approach, we observe that overall complexity is greatly reduced. For instance, our solutions for bug-masking analysis and application-aware routing reduce the overall computation by up to three orders of magnitude. However, reduced complexity comes with losses in error coverage. For instance, our bug-masking analysis is unable to predict 23% of bug-masking incidents, and our bug patching cannot fix up to 39% of the memory bugs among those considered in our study, as we summarize in more detail in the following section. In the rest of this chapter, we first summarize the contributions of the solutions proposed in this dissertation, and then discuss future research directions in these domains.

9.1 Dissertation Summary

This dissertation presents decomposition-based methods for each major type of SoC component, as discussed below, striving to improve the efficiency of detecting and recovering from design bugs and permanent faults.

Microprocessor cores. We investigate bug-masking issues in post-silicon microprocessor functional verification. Bug-masking incidents may happen due to limited capabilities to trace and analyze detailed results generated from test execution in post-silicon validation platforms. In this context, we propose BugMAPI, which strives to quickly evaluate the likelihood of bug-masking incidents in instruction tests. It decomposes bug-masking analysis from the program level to the instruction level, then reassembles instruction-level results to obtain the program-level bug-masking probability. Thanks to this decomposition method, BugMAPI reduces the software computation involved in bug-masking analysis by three orders of magnitude. This reduction, however, comes with a loss of accuracy in the analysis results; indeed, BugMAPI fails to predict 23% of bug-masking incidents on average. We also demonstrate an application of the proposed bug-masking analysis, which instruments test-cases to reduce bug-masking incidents.

Memory subsystems. We study subtle design bugs in memory subsystems. Among these bugs, we first focus on memory consistency bugs, which manifest themselves as incorrect memory-access interleavings that do not abide by the memory consistency specification for the system. Indeed, SoCs may exhibit numerous memory-access interleaving patterns, each of which must be checked for conformance to the memory consistency specification. This verification process involves heavy result-checking computation, which reasons about

the order among memory operations in multithreaded tests, a major bottleneck in the entire memory consistency verification process. MTraceCheck strives to minimize this result-checking computation. It leverages the similarity among test executions to develop a novel incremental-checking scheme for memory ordering patterns. As a result, we achieve a 5× speedup in topological sorting computation. Due to this decomposition, however, the proposed result-checking scheme may not be able to detect very subtle memory ordering violations whose detection requires inferences across multiple decomposed groups.

Furthermore, we investigate hardware patching solutions to recover from general memory subsystem bugs: not only memory consistency bugs, but also deadlock, livelock, and data-corruption bugs. To accurately identify and recover from these bugs, prior solutions deploy global hardware structures, which in turn entail significant overheads. This dissertation investigates an alternative solution that significantly reduces hardware overheads, with acceptable degradation in bug-patching capability. For this purpose, we propose µMemPatch, a distributed patching solution that leverages a small monitoring logic attached to each core node. This monitoring logic observes speculative executions in the core's back-end pipeline stages, as well as data races at the core-cache interface. When a bug-prone state is detected at runtime, µMemPatch triggers fine-grained microarchitectural recovery actions entailing low performance overheads. µMemPatch's area overhead is acceptable, at 5.7% on average; however, due to its limited monitoring capability, it does not cover 39% of the bugs we studied.

On-chip networks. We propose two routing reconfiguration solutions to address reliability challenges in on-chip networks. Compared to other solutions (e.g., spare resources), routing reconfiguration is a cost-effective fault-tolerant approach to bypassing faults on 2D mesh networks, because it leverages the path diversity inherently present in mesh network topologies. Ideally, routing reconfiguration should incur no network downtime, while network performance (e.g., throughput, latency) should not degrade due to faults. However, prior available solutions entail high reconfiguration impact and performance degradation after reconfiguration. BLINC aims to minimize reconfiguration impacts, especially reconfiguration latency. To achieve this goal, it localizes routing reconfiguration by decomposing the whole network into segments. It then computes a set of routing metadata that can quickly calculate valid routes. This routing metadata is embedded in each router, and updated by simple Boolean operations during routing reconfiguration. Our experimental evaluation shows a significant reduction in reconfiguration latency: three orders of magnitude for 8×8 mesh networks. However, BLINC's online reconfiguration is unable to handle subtle fault patterns, i.e.,

multiple faults impacting the same segment.

We then strive to address performance degradation, using application-aware routing. In deploying application-aware routing, we observe that significant software computation is required to find optimal routes for the application's traffic patterns. To reduce such computation, we proposed FATE, which leverages localized routing rules that expedite the exploration of new routes that avoid the fault. The proposed routing rules prune seemingly unfruitful routes in advance, so as to quickly converge to a quasi-optimal set of routes. FATE significantly reduces the routing computation by more than two orders of magnitude, compared to a state-of-the-art application-aware routing solution for on-chip networks. However, this solution provides no guarantee on the optimality of the computed routes for the given network topology and the application's traffic patterns.

Accelerators. This dissertation focuses on the verification of accelerator interactions, a critical activity during SoC integration. SoCs consist of tens or hundreds of heterogeneous accelerators, and are expected to include many more in the future. Thus, the number of accelerator interaction scenarios is increasing, while verification strategies evolve slowly. Usually, massive simulation and testing campaigns are conducted to trigger buggy accelerator interactions that rarely occur. AGARSoC presents a novel verification methodology to expedite coverage closure in the verification of accelerator interactions. The key technical contribution we offer is a decomposition of accelerator interactions into basic interaction elements. Specifically, AGARSoC obtains accelerator interaction patterns from high-level simulations, extracts basic interaction elements from them, reassembles the basic elements to create synthetic test-cases, and runs the synthetic test-cases on the design under verification. Our evaluation shows that synthetic test-cases can hit the same coverage goals in a fraction of the time spent running the original test suites. However, AGARSoC focuses only on simple interaction patterns, both concurrent and sequential, which may limit its bug-finding capabilities.

9.2 Future Directions

We recognize several research directions that may extend the contributions of the research proposed in this dissertation.

Rethinking trade-offs. We have shown that the proposed decomposition methods sacrifice coverage, accuracy, or performance to some degree. To be specific, BugMAPI was not able to identify some bug-masking incidents that are predicted by dynamic bug-injection

analysis. MTraceCheck may be unable to detect intricate memory ordering violations that can be caught by sophisticated inferences of memory orderings across decomposed groups. µMemPatch is unable to accurately detect bugs related to rare microarchitectural conditions across multiple memory subsystem components. BLINC can handle at most one fault per segment at runtime; more faults must be recovered by a conventional global reconfiguration (i.e., network re-segmentation). FATE may not produce the optimal set of routes. AGARSoC may be inefficient in capturing sophisticated interaction scenarios. In this dissertation research, we have not evaluated the implications of these limitations in terms of coverage, accuracy, and performance; thus, it would be worthwhile to assess the practicality of the proposed solutions with respect to those metrics.

Heterogeneous decompositions. Our decomposition-based approach splits a complex, unit- or system-level problem into smaller, homogeneous problems. For instance, BugMAPI decomposes the whole program into individual instructions. However, such homogeneous decompositions may lead to missing some factors of the solution. In BugMAPI, some sections of the program may be more prone to bug masking than others; thus, we could apply a more accurate analysis to critical sections, while leveraging a faster, but less accurate, approach for the others, by devising decompositions at adjustable granularities. Similarly, we could explore a range of distinct decompositions for the other solutions as well, to best address the quality vs. cost trade-offs.

Using formal methods for verifying accelerator interactions. Formal methods could be used to systematically explore accelerator interactions. In particular, formal methods could replace the current scheduling algorithm used in AGARSoC, which may produce a sub-optimal test schedule, with one that produces an optimal test schedule for all scenarios. To this end, accelerator interactions can be captured as constraints specified in a formal specification language.

Realistic evaluations for functional verification solutions. Throughout our research endeavors on functional verification, we have performed several bug-injection experiments on simulators. Bug-injection experiments, however, may not faithfully model the effect of real-world bugs: injected bugs are often modeled after gate- or transistor-level faults, which might be vastly different from real-world bugs. We believe that future research on functional verification can become more impactful if its evaluation used realistic bug datasets. Moreover, future research can take advantage of closed-loop functional verification

environments. Many measurements in this dissertation are conducted in open-loop environments, measuring only the specific portions we are interested in. For instance, we measured the time spent on topological sorting in MTraceCheck; however, there are some important portions of time that we did not consider, e.g., test generation, test instrumentation, and graph reconstruction. These portions may be significant in real-world verification scenarios. In developing AGARSoC, we strived to collect realistic test suites for SoCs. Unfortunately, the SoC designs we used in our evaluation are much smaller than cutting-edge ones, and accordingly, the test suites for these SoC designs are fairly simple. Future research could evaluate real test suites for cutting-edge SoCs.

Automated diagnoses of memory subsystem bugs. µMemPatch assumes manual diagnoses to pinpoint the root causes of bugs, which may be very time-consuming for bugs that are related to very complex sequences of microarchitectural events. Automated diagnoses can reduce human effort and hence be much faster. A possible approach in this direction would 1) model both the microarchitecture of a given memory subsystem and a given bug symptom in a formal specification language, then 2) combine the two specifications to automatically explore microarchitectural events in the memory subsystem that lead to the bug. This formal approach can be modeled as a planning problem, which has been well studied in the artificial intelligence field.

Leveraging hardware-assisted debugging features in SoCs. Modern SoCs deploy several debugging features available in silicon, e.g., Intel's Processor Trace (PT) and Branch Trace Store (BTS), and ARM's Embedded Trace Macrocell (ETM). These features can collect dynamic information from program execution, e.g., control flow and memory-access results. Intel PT can also compress collected data, which can greatly reduce the overheads involved in tracing. Thus, post-silicon and runtime functional verification solutions can take advantage of these debugging features to alleviate observability issues.

BIBLIOGRAPHY

[1] Apple iPhone Xs Max teardown. http://www.techinsights.com/about-techinsights/overview/blog/apple-iphone-xs-teardown/.

[2] ARM’s Cortex M: Even smaller and lower power CPU cores. https://www.anandtech.com/show/8400/arms-cortex-m-even-smaller-and-lower-power-cpu-cores.

[3] Bathtub curve. https://en.wikipedia.org/wiki/Bathtub_curve.

[4] Berkeley out-of-order machine. https://github.com/ucb-bar/riscv-boom.

[5] Big trouble at 3nm. https://semiengineering.com/big-trouble-at-3nm/.

[6] The Coq proof assistant. https://coq.inria.fr.

[7] Das U-Boot – the universal boot loader. http://www.denx.de/wiki/U-Boot.

[8] Deadlock with RMW operations. https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7g.

[9] gem5 mercurial repository host. http://repo.gem5.org.

[10] GNU coreutils version 8.25, January. http://ftp.gnu.org/gnu/coreutils/.

[11] Intel redesigns flawed Atom CPUs to stave off premature chip death. https://www.theregister.co.uk/2017/04/27/intel_redesigns_atom_line/.

[12] k-medoids algorithm. https://en.wikipedia.org/wiki/K-medoids.

[13] Qualcomm unveils Snapdragon 850, further empowering the always-connected Windows 10 PC. https://www.forbes.com/sites/davealtavilla/2018/06/05/qualcomm-unveils-snapdragon-850-further-empowering-the-always-connected-windows-10-pc.

[14] Rocket chip generator. https://github.com/freechipsproject/rocket-chip.

[15] Semi-critical Intel Atom C2000 SoC flaw discovered, hardware fix required. http://www.anandtech.com/show/11110/semi-critical-intel-atom-c2000-flaw-discovered.

[16] TILE64 processor overview. http://www.tilera.com/products/processors/TILE64.

[17] Xilinx Vivado design suite. http://www.xilinx.com/products/design-tools/vivado.html.

[18] R. Abdel-Khalek, R. Parikh, A. DeOrio, and V. Bertacco. Functional correctness for CMP interconnects. In Proc. ICCD, 2011.

[19] A. Adir, M. Golubev, S. Landa, A. Nahir, G. Shurek, V. Sokhin, and A. Ziv. Threadmill: A post-silicon exerciser for multi-threaded processors. In Proc. DAC, 2011.

[20] S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. Computer, 29(12), 1996.

[21] S. V. Adve and M. D. Hill. Weak ordering–a new definition. In Proc. ISCA, 1990.

[22] K. Aisopos, A. DeOrio, L.-S. Peh, and V. Bertacco. Ariadne: Agnostic reconfiguration in a disconnected network environment. In Proc. PACT, 2011.

[23] J. Alglave. A formal hierarchy of weak memory models. Formal Methods in System Design, 41(2), 2012.

[24] J. Alglave, L. Maranget, S. Sarkar, and P. Sewell. Fences in weak memory models. In Proc. CAV, 2010.

[25] J. Alglave, L. Maranget, S. Sarkar, and P. Sewell. Litmus: Running tests against hardware. In Proc. TACAS, 2011.

[26] J. Alglave, L. Maranget, and M. Tautschnig. Herding cats: Modelling, simulation, testing, and data mining for weak memory. ACM Trans. Program. Lang. Syst., 36(2), 2014.

[27] AMD. Revision Guide for AMD Athlon64 and AMD Opteron Processors, 2009.

[28] AMD. Revision Guide for AMD Family 11h Processors, 2011.

[29] AMD. Revision Guide for AMD Family 10h Processors, 2012.

[30] AMD. Revision Guide for AMD Family 12h Processors, 2012.

[31] AMD. Revision Guide for AMD Family 14h Models 00h-0Fh Processors, 2013.

[32] AMD. Revision Guide for AMD Family 16h Models 00h-0Fh Processors, 2013.

[33] AMD. Revision Guide for AMD Family 15h Models 00h-0Fh Processors, 2014.

[34] ARM. Barrier Litmus Tests and Cookbook, 2009.

[35] ARM. Cortex-A9 MPCore Programmer Advice Notice Read-after-Read Hazards, ARM Reference 761319, 2011.

[36] ARM. ARM Architecture Reference Manual – ARMv7-A and ARMv7-R edition, 2012.

[37] ARM. Cortex-A5 MPCore Software Developers Errata Notice, 2013.

[38] ARM. Cortex-A17 MPCore Software Developers Errata Notice, 2015.

[39] ARM. Cortex-A9 Processors Software Developers Errata Notice, 2015.

[40] ARM. Cortex-A15 MPCore - NEON Software Developers Errata Notice, 2016.

[41] ARM. Cortex-A7 MPCore Software Developers Errata Notice, 2016.

[42] ARM. Cortex-A53 MPCore Software Developers Errata Notice, 2017.

[43] ARM. Cortex-A55 Processor MP070 Software Developer Errata Notice, 2017.

[44] ARM. Cortex-A57 MPCore Software Developer Errata Notice, 2017.

[45] ARM. Cortex-A72 MPCore Software Developer Errata Notice, 2017.

[46] ARM. Cortex-A73 MPCore Software Developer Errata Notice, 2017.

[47] ARM. Cortex-A75 (MP058) Software Developer Errata Notice, 2017.

[48] Arvind and J.-W. Maessen. Memory model = instruction reordering + store atomicity. In Proc. ISCA, 2006.

[49] G. Ascia, V. Catania, M. Palesi, and D. Patti. Implementation and analysis of a new selection strategy for adaptive routing in networks-on-chip. IEEE Trans. Computers, 57(6), 2008.

[50] T. M. Austin. DIVA: a reliable substrate for deep submicron microarchitecture design. In Proc. MICRO, 1999.

[51] M. Badr and N. Jerger. SynFull: Synthetic traffic models capturing cache coherent behavior. In Proc. ISCA, 2014.

[52] B. Bailey and G. Martin. ESL Models and Their Application: Electronic System Level Design and Verification in Practice. Springer Publishing Company, Incorporated, 2009.

[53] T. Ball and J. R. Larus. Efficient path profiling. In Proc. MICRO, 1996.

[54] N. Barrow-Williams, C. Fensch, and S. Moore. A communication characterisation of Splash-2 and Parsec. In Proc. IISWC, 2009.

[55] C. Bienia, S. Kumar, J. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. In Proc. PACT, 2008.

[56] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood. The gem5 simulator. ACM SIGARCH Computer Architecture News, 39(2), 2011.

[57] H. W. Cain, M. H. Lipasti, and R. Nair. Constraint graph analysis of multithreaded programs. In Proc. PACT, 2003.

[58] K. A. Campbell, D. Lin, S. Mitra, and D. Chen. Hybrid quick error detection (H-QED): Accelerator validation and debug using high-level synthesis principles. In Proc. DAC, 2015.

[59] J. Cano, J. Flich, A. Roca, J. Duato, M. Coppola, and R. Locatelli. Efficient routing in heterogeneous SoC designs with small implementation overhead. IEEE Trans. Computers, 63(2), 2014.

[60] R. Casado, A. Bermudez, F. Quiles, J. Sanchez, and J. Duato. Performance evaluation of dynamic reconfiguration in high-speed local area networks. In Proc. HPCA, 2000.

[61] Y. C. Chang, C. T. Chiu, S. Y. Lin, and C. K. Liu. On the design and analysis of fault tolerant noc architecture using spare routers. In Proc. ASPDAC, 2011.

[62] D. Chatterjee, A. Koyfman, R. Morad, A. Ziv, and V. Bertacco. Checking architectural outputs instruction-by-instruction on acceleration platforms. In Proc. DAC, 2012.

[63] K. Chen, S. Malik, and P. Patra. Runtime validation of memory ordering using constraint graph checking. In Proc. HPCA, 2008.

[64] Y. Chen, L. Li, T. Chen, L. Li, L. Wang, X. Feng, and W. Hu. Program regularization in memory consistency verification. IEEE Trans. Parallel and Distributed Systems, 23(11), 2012.

[65] Y. Chen, Y. Lv, W. Hu, T. Chen, H. Shen, P. Wang, and H. Pan. Fast complete memory consistency verification. In Proc. HPCA, 2009.

[66] Y.-K. Chen and S. Y. Kung. Trend and challenge on system-on-a-chip designs. Journal of Signal Processing Systems, 53(1), Nov 2008.

[67] E. Cheng, S. Mirkhani, L. G. Szafaryn, C.-Y. Cher, H. Cho, K. Skadron, M. R. Stan, K. Lilja, J. A. Abraham, P. Bose, and S. Mitra. CLEAR: Cross-layer exploration for architecting resilience - combining hardware and software techniques to tolerate soft errors in processor cores. In Proc. DAC, 2016.

[68] J. H. Collet, P. Zajac, M. Psarakis, and D. Gizopoulos. Chip self-organization and fault tolerance in massively defective multicore arrays. IEEE Trans. Dependable and Secure Computing, 8(2), 2011.

[69] J. Cong, C. Liu, and G. Reinman. ACES: Application-specific cycle elimination and splitting for deadlock-free routing on irregular network-on-chip. In Proc. DAC, 2010.

[70] K. Constantinides, O. Mutlu, T. Austin, and V. Bertacco. Software-based online detection of hardware defects mechanisms, architectural support, and evaluation. In Proc. MICRO, 2007.

[71] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press, 3rd edition, 2009.

[72] W. Dally and C. Seitz. Deadlock-free message routing in multiprocessor interconnection networks. IEEE Trans. Computers, C-36(5), 1987.

[73] W. Dally and B. Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann, 2003.

[74] J. Darringer, E. Davidson, D. J. Hathaway, B. Koenemann, M. Lavin, J. K. Morrell, K. Rahmat, W. Roesner, E. Schanzenbach, G. Tellez, and L. Trevillyan. EDA in IBM: past, present, and future. IEEE Trans. CAD, 19(12), 2000.

[75] G. De Micheli, R. Ernst, and W. Wolf, editors. Readings in Hardware/Software Co-design. Kluwer Academic Publishers, 2002.

[76] F. M. De Paula, M. Gort, A. J. Hu, S. J. E. Wilton, and J. Yang. Backspace: Formal analysis for post-silicon debug. In Proc. FMCAD, 2008.

[77] D. Denning and P. Denning. Certification of programs for secure information flow. Communications of the ACM, 20(7), 1977.

[78] A. DeOrio, D. Fick, V. Bertacco, D. Sylvester, D. Blaauw, J. Hu, and G. Chen. A reliable routing architecture and algorithm for NoCs. IEEE Trans. CAD, 31(5), 2012.

[79] D. L. Dill. The Murphi verification system. In Proc. CAV, 1996.

[80] M. Ebrahimi, M. Daneshtalab, J. Plosila, and F. Mehdipour. MD: Minimal path-based fault-tolerant routing in on-chip networks. In Proc. ASPDAC, 2013.

[81] M. Elver and V. Nagarajan. McVerSi: A test generation framework for fast memory consistency verification in simulation. In Proc. HPCA, 2016.

[82] F. Fallah, S. Devadas, and K. Keutzer. OCCOM—efficient computation of observability-based code coverage metrics for functional verification. IEEE Trans. CAD, 20(8), 2001.

[83] S. Feng, S. Gupta, A. Ansari, and S. Mahlke. Shoestring: Probabilistic soft error reliability on the cheap. In Proc. ASPLOS, 2010.

[84] D. Fick, A. DeOrio, J. Hu, V. Bertacco, D. Blaauw, and D. Sylvester. Vicis: A reliable network for unreliable silicon. In Proc. DAC, 2009.

[85] J. Flich, T. Skeie, R. Juarez-Ramirez, O. Lysne, P. Lopez, A. Robles, J. Duato, M. Koibuchi, T. Rokicki, and J. Sancho. A survey and evaluation of topology-agnostic deterministic routing algorithms. IEEE Trans. Parallel and Distributed Systems, 23(3), 2012.

[86] H. Foster. Trends in functional verification: A 2014 industry study. In Proc. DAC, 2015.

[87] N. Foutris, D. Gizopoulos, M. Psarakis, X. Vera, and A. Gonzalez. Accelerating microprocessor silicon validation by exposing ISA diversity. In Proc. MICRO, 2011.

[88] M. Fujita, I. Ghosh, and M. Prasad. Verification Techniques for System-Level Design. Morgan Kaufmann Publishers Inc., 2008.

[89] K. Ganesan and L. K. John. Automatic generation of miniaturized synthetic proxies for target applications to efficiently design multicore processors. IEEE Trans. Computers, 63(4), 2014.

[90] A. Ghofrani, R. Parikh, S. Shamshiri, A. DeOrio, K. T. Cheng, and V. Bertacco. Comprehensive online defect diagnosis in on-chip networks. In Proc. VTS, 2012.

[91] C. Glass and L. Ni. The turn model for adaptive routing. In Proc. ISCA, 1992.

[92] P. Gratz, B. Grot, and S. Keckler. Regional congestion awareness for load balance in networks-on-chip. In Proc. HPCA, 2008.

[93] C. Grecu, A. Ivanov, R. Saleh, and P. Pande. Testing network-on-chip communication fabrics. IEEE Trans. CAD, 26(12), 2007.

[94] S. Gupta, S. Feng, A. Ansari, J. Blome, and S. Mahlke. The stagenet fabric for constructing resilient multicore systems. In Proc. MICRO, 2008.

[95] G. Hamerly, E. Perelman, J. Lau, and B. Calder. SimPoint 3.0: Faster and more flexible program analysis. Journal of Instruction-Level Parallelism, 2005.

[96] S. Hangal, D. Vahia, C. Manovit, J.-Y. J. Lu, and S. Narayanan. TSOtool: A program for verifying memory systems using the memory consistency model. In Proc. ISCA, 2004.

[97] Y. Hoskote, S. Vangal, A. Singh, N. Borkar, and S. Borkar. A 5-GHz mesh interconnect for a teraflops processor. IEEE Micro, 27(5), 2007.

[98] IBM. Power ISA Version 2.07B, 2015.

[99] IBM. POWER9 Processor Users Manual, 2018.

[100] Intel. Intel Core2 Extreme Quad-Core Processor QX6000 Sequence Specification Update, 2008.

[101] Intel. Intel Core2 Extreme Processor QX9000 Series Specification Update, 2011.

[102] Intel. Intel 64 and IA-32 Architectures Software Developer’s Manual, 2015.

[103] Intel. 2nd Generation Intel Core Processor Family Desktop Specification Update, 2016.

[104] Intel. 6th Generation Intel Processor Family Specification Update, September 2016.

[105] Intel. Desktop 3rd Generation Intel Core Processor Family Specification Update, 2016.

[106] Intel. 5th Generation Intel Core Processor Family Specification Update, 2017.

[107] Intel. Desktop 4th Generation Intel Core Processor Family Specification Update, 2017.

[108] Intel. Intel Atom Processor C2000 Product Family Specification Update, January 2017.

[109] C. Ioannides and K. I. Eder. Coverage-directed test generation automated by machine learning – a review. 17(1), 2012.

[110] C. Iordanou, V. Soteriou, and K. Aisopos. Hermes: Architecting a top-performing fault-tolerant routing algorithm for networks-on-chips. In Proc. ICCD, 2014.

[111] H. Jeon and M. Annavaram. Warped-DMR: Light-weight error detection for GPGPU. In Proc. MICRO, 2012.

[112] N. Jiang, D. U. Becker, G. Michelogiannakis, J. Balfour, B. Towles, J. Kim, and W. J. Dally. A detailed and flexible cycle-accurate network-on-chip simulator. In Proc. ISPASS, 2013.

[113] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P.-l. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T. V. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C. R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, D. Killebrew, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, and D. H. Yoon. In-datacenter performance analysis of a tensor processing unit. In Proc. ISCA, 2017.

[114] J. Keane, S. Venkatraman, P. Butzen, and C. H. Kim. An array-based test circuit for fully automated gate dielectric breakdown characterization. In Proc. CICC, 2008.

[115] J. Keshava, N. Hakim, and C. Prudvi. Post-silicon validation challenges: How EDA and academia can help. In Proc. DAC, 2010.

[116] M. Kinsy, M. H. Cho, K. S. Shim, M. Lis, G. Suh, and S. Devadas. Optimal and heuristic application-aware oblivious routing. In IEEE Trans. Computers, 2013.

[117] M. Koibuchi, A. Jouraku, and H. Amano. The impact of path selection algorithm of adaptive routing for implementing deterministic routing. In Proc. PDPTA, 2002.

[118] R. Komuravelli, S. V. Adve, and C.-T. Chou. Revisiting the complexity of hardware cache coherence and some implications. ACM Transactions on Architecture and Code Optimization, 11(4), 2014.

[119] L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Computers, 28(9), 1979.

[120] D. Lee and V. Bertacco. MTraceCheck: Validating non-deterministic behavior of memory consistency models in post-silicon validation. In Proc. ISCA, 2017.

[121] D. Lee, T. Kolan, A. Morgenshtein, V. Sokhin, R. Morad, A. Ziv, and V. Bertacco. Probabilistic bug-masking analysis for post-silicon tests in microprocessor verification. In Proc. DAC, 2016.

[122] D. Lee, O. Matthews, and V. Bertacco. Low-overhead microarchitectural patching for multicore memory subsystems. In Proc. ICCD, 2018.

[123] D. Lee, R. Parikh, and V. Bertacco. Brisk and limited-impact NoC routing reconfiguration. In Proc. DATE, 2014.

[124] D. Lee, R. Parikh, and V. Bertacco. Highly fault-tolerant NoC routing with application-aware congestion management. In Proc. NOCS, 2015.

[125] T. Lehtonen, D. Wolpert, P. Liljeberg, J. Plosila, and P. Ampadu. Self-adaptive system for addressing permanent errors in on-chip interconnects. In IEEE Trans. VLSI Systems, 2010.

[126] G. Li, S. K. S. Hari, M. Sullivan, T. Tsai, K. Pattabiraman, J. Emer, and S. W. Keckler. Understanding error propagation in deep learning neural network (DNN) accelerators and applications. In Proc. SC, 2017.

[127] M. Li, Q.-A. Zeng, and W.-B. Jone. DyXY: A proximity congestion-aware deadlock-free dynamic routing method for network on chip. In Proc. DAC, 2006.

[128] S. Li, V. Sridharan, S. Gurumurthi, and S. Yalamanchili. Software-based dynamic reliability management for GPU applications. In Proc. IRPS, 2016.

[129] D. Lin, T. Hong, Y. Li, E. S, S. Kumar, F. Fallah, N. Hakim, D. S. Gardner, and S. Mitra. Effective post-silicon validation of system-on-chips using quick error detection. IEEE Trans. CAD, 33(10), 2014.

[130] Q. Liu, C. Jung, D. Lee, and D. Tiwari. Low-cost soft error resilience with unified data verification and fine-grained recovery for acoustic sensor based detection. In Proc. MICRO, 2016.

[131] D. Lustig, M. Pellauer, and M. Martonosi. PipeCheck: Specifying and verifying microarchitectural enforcement of memory consistency models. In Proc. MICRO, 2014.

[132] D. Lustig, C. Trippel, M. Pellauer, and M. Martonosi. ArMOR: Defending against memory consistency model mismatches in heterogeneous architectures. In Proc. ISCA, 2015.

[133] D. Lustig, A. Wright, A. Papakonstantinou, and O. Giroux. Automated synthesis of comprehensive memory model litmus test suites. In Proc. ASPLOS, 2017.

[134] S. Mador-Haim, R. Alur, and M. M. Martin. Generating litmus tests for contrasting memory consistency models. In Proc. CAV, 2010.

[135] B. W. Mammo. Reining in the Functional Verification of Complex Processor Designs with Automation, Prioritization, and Approximation. PhD thesis, University of Michigan, 2017.

[136] B. W. Mammo, V. Bertacco, A. DeOrio, and I. Wagner. Post-silicon validation of multiprocessor memory consistency. IEEE Trans. CAD, 34(6), 2015.

[137] B. W. Mammo, D. Lee, H. Davis, Y. Hou, and V. Bertacco. AGARSoC: Automated test and coverage-model generation for verification of accelerator-rich SoCs. In Proc. ASPDAC, 2017.

[138] E. E. Mandouh and A. G. Wassal. CovGen: A framework for automatic extraction of functional coverage models. In Proc. ISQED, 2016.

[139] Y. A. Manerkar, D. Lustig, M. Pellauer, and M. Martonosi. CCICheck: Using µhb graphs to verify the coherence-consistency interface. In Proc. MICRO, 2015.

[140] L. Maranget, S. Sarkar, and P. Sewell. A tutorial introduction to the ARM and POWER relaxed memory models, 2012.

[141] A. Meixner, M. E. Bauer, and D. J. Sorin. Argus: Low-cost, comprehensive error detection in simple cores. In Proc. MICRO, 2007.

[142] A. Meixner and D. J. Sorin. Detouring: Translating software to circumvent hard faults in simple cores. In Proc. DSN, 2008.

[143] A. Meixner and D. J. Sorin. Dynamic verification of memory consistency in cache-coherent multithreaded computer architectures. IEEE Trans. Dependable and Secure Computing, 6(1), 2009.

[144] A. Mejia, J. Flich, and J. Duato. On the potentials of segment-based routing for NoCs. In Proc. ICPP, 2008.

[145] A. Mejia, J. Flich, J. Duato, S.-A. Reinemo, and T. Skeie. Segment-based routing: An efficient fault-tolerant routing algorithm for meshes and tori. In Proc. IPDPS, 2006.

[146] P. Mishra and N. Dutt. Graph-based functional test program generation for pipelined processors. In Proc. DATE, 2004.

[147] S. Murali, L. Benini, T. Theocharides, N. Vijaykrishnan, M. J. Irwin, and G. De Micheli. Analysis of error recovery schemes for networks on chips. IEEE Design & Test, 22(5), 2005.

[148] A. Nahir, M. Dusanapudi, S. Kapoor, K. Reick, W. Roesner, K.-D. Schubert, K. Sharp, and G. Wetli. Post-silicon validation of the IBM POWER8 processor. In Proc. DAC, 2014.

[149] P. J. Nair, D.-H. Kim, and M. K. Qureshi. ArchShield: Architectural framework for assisting dram scaling by tolerating high error rates. In Proc. ISCA, 2013.

[150] S. Narayanasamy, B. Carneal, and B. Calder. Patching processor design errors. In Proc. ICCD, 2006.

[151] A. Nassar, F. Kurdahi, and W. Elsharkasy. NUVA: Architectural support for runtime verification of parametric specifications over multicores. 2015.

[152] R. Nathan and D. J. Sorin. Nostradamus: Low-cost hardware-only error detection for processor cores. In Proc. DATE, 2014.

[153] M. Naylor, S. W. Moore, and A. Mujumdar. A consistency checker for memory subsystem traces. In Proc. FMCAD, 2016.

[154] J. Newsome and D. Song. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. In Proc. NDSS, 2005.

[155] M. Palesi et al. Design of bandwidth aware and congestion avoiding efficient routing algorithms for networks-on-chip platforms. In Proc. NOCS, 2008.

[156] R. Parikh and V. Bertacco. Formally enhanced runtime verification to ensure noc functional correctness. In Proc. MICRO, 2011.

[157] R. Parikh and V. Bertacco. uDIREC: Unified diagnosis and reconfiguration for frugal bypass of NoC faults. In Proc. MICRO, 2013.

[158] D. Park, C. Nicopoulos, J. Kim, N. Vijaykrishnan, and C. R. Das. Exploring fault-tolerant network-on-chip architectures. In Proc. DSN, 2006.

[159] S. B. Park and S. Mitra. IFRA: Instruction footprint recording and analysis for post-silicon bug localization in processors. In Proc. DAC, 2008.

[160] A. Pellegrini and V. Bertacco. Cobra: A comprehensive bundle-based reliable architecture. In Proc. SAMOS, 2013.

[161] A. Pellegrini, J. L. Greathouse, and V. Bertacco. Viper: Virtual pipelines for enhanced reliability. In Proc. ISCA, 2012.

[162] M. Pirretti, G. Link, R. Brooks, N. Vijaykrishnan, M. Kandemir, and M. Irwin. Fault tolerant algorithms for network-on-chip interconnect. In Proc. ISVLSI, 2004.

[163] M. D. Powell, A. Biswas, S. Gupta, and S. S. Mukherjee. Architectural core salvaging in a multi-core processor for hard-error tolerance. In Proc. ISCA, 2009.

[164] A. Prodromou, A. Panteli, C. Nicopoulos, and Y. Sazeides. NoCAlert: An on-line and real-time fault detection mechanism for network-on-chip architectures. In Proc. MICRO, 2012.

[165] M. Prvulovic, Z. Zhang, and J. Torrellas. Revive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors. In Proc. ISCA, 2002.

[166] V. Puente, J. Gregorio, F. Vallejo, and R. Beivide. Immunet: A cheap and robust fault-tolerant packet routing mechanism. In Proc. ISCA, 2004.

[167] E. A. Rambo, O. P. Henschel, and L. C. V. dos Santos. Automatic generation of memory consistency tests for chip multiprocessing. In Proc. ICECS, 2011.

[168] G. A. Reis, J. Chang, N. Vachharajani, R. Rangan, and D. I. August. Swift: Software implemented fault tolerance. In Proc. CGO, 2005.

[169] P. Ren, Q. Meng, X. Ren, and N. Zheng. Fault-tolerant routing for on-chip network without using virtual channels. In Proc. DAC, 2014.

[170] M. Rinard. Analysis of multithreaded programs. In Static Analysis, Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2001.

[171] S. Rodrigo, J. Flich, A. Roca, S. Medardoni, D. Bertozzi, J. Camacho, F. Silla, and J. Duato. Addressing manufacturing challenges with cost-efficient fault tolerant routing. In Proc. NOCS, 2010.

[172] A. Ros, T. E. Carlson, M. Alipour, and S. Kaxiras. Non-speculative load-load reordering in TSO. In Proc. ISCA, 2017.

[173] A. Roy, S. Zeisset, C. J. Fleckenstein, and J. C. Huang. Fast and generalized polynomial time memory consistency verification. In Proc. CAV, 2006.

[174] A. Sabelfeld and A. Myers. Language-based information-flow security. Selected Areas in Communications, IEEE Journal on, 21(1), 2003.

[175] D. Sabena, M. S. Reorda, L. Sterpone, P. Rech, and L. Carro. On the evaluation of soft-errors detection techniques for GPGPUs. In Proc. IDT, 2013.

[176] J. Sancho, A. Robles, and J. Duato. An effective methodology to improve the performance of the Up*/Down* routing algorithm. IEEE Trans. Parallel and Distributed Systems, 15(8), 2004.

[177] S. Sarangi, A. Tiwari, and J. Torrellas. Phoenix: Detecting and recovering from permanent processor design bugs with programmable hardware. In Proc. MICRO, 2006.

[178] M. D. Schroeder, A. D. Birrell, M. Burrows, H. Murray, R. M. Needham, T. L. Rodeheffer, E. H. Satterthwaite, and C. P. Thacker. Autonet: A high-speed, self-configuring local area network using point-to-point links. IEEE Journal of Selected Areas in Communications, 9(8), 1991.

[179] F. Sem-Jacobsen and O. Lysne. Topology agnostic dynamic quick reconfiguration for large-scale interconnection networks. In Proc. CCGrid, 2012.

[180] P. Sewell, S. Sarkar, S. Owens, F. Z. Nardelli, and M. O. Myreen. x86-TSO: A rigorous and usable programmer’s model for x86 multiprocessors. Communications of the ACM, 53(7), 2010.

[181] O. Shacham, M. Wachs, A. Solomatnikov, A. Firoozshahian, S. Richardson, and M. Horowitz. Verification of chip multiprocessor memory systems using a relaxed scoreboard. In Proc. MICRO, 2008.

[182] A. Shafiee, M. Zolghadr, M. Arjomand, and H. Sarbazi-azad. Application-aware deadlock-free oblivious routing based on extended turn-model. In Proc. ICCAD, 2011.

[183] J. Shen and M. Lipasti. Modern Processor Design: Fundamentals of Superscalar Processors. Waveland Press, Inc., 2005.

[184] J. Smolens, B. Gold, J. Hoe, B. Falsafi, and K. Mai. Detecting emerging wearout faults. In Proc. SELSE, 2007.

[185] D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on Memory Consistency and Cache Coherence. Morgan & Claypool Publishers, 1st edition, 2011.

[186] D. J. Sorin, M. M. K. Martin, M. D. Hill, and D. A. Wood. Safetynet: Improving the availability of shared memory multiprocessors with global checkpoint/recovery. In Proc. ISCA, 2002.

[187] A. Strano, D. Bertozzi, F. Trivino, J. Sanchez, F. Alfaro, and J. Flich. OSR-Lite: Fast and deadlock-free NoC reconfiguration framework. In Proc. SAMOS, 2012.

[188] A. Strong, E. Wu, R. Vollertsen, J. Sune, G. Rosa, T. Sullivan, and S. Rauch. Reliability Wearout Mechanisms in Advanced CMOS Technologies. IEEE Press Series on Microelectronic Systems. Wiley, 2009.

[189] S. M. Tam, H. Muljono, M. Huang, S. Iyer, K. Royneogi, N. Satti, R. Qureshi, W. Chen, T. Wang, H. Hsieh, S. Vora, and E. Wang. SkyLake-SP: A 14nm 28-core Xeon processor. In Proc. ISSCC, 2018.

[190] L. Tan, B. Brotherton, and T. Sherwood. Bit-split string-matching engines for intrusion detection and prevention. ACM Transactions on Architecture and Code Optimization, 3(1), 2006.

[191] O. Temam. A defect-tolerant accelerator for emerging high-performance applications. In Proc. ISCA, 2012.

[192] S. Thiruvathodi and D. Yeggina. A random instruction sequence generator for ARM based systems. In Proc. MTV, 2014.

[193] C. Trippel, Y. A. Manerkar, D. Lustig, M. Pellauer, and M. Martonosi. TriCheck: Memory model verification at the trisection of software, hardware, and ISA. In Proc. ASPLOS, 2017.

[194] S. Tselonis, V. Dimitsas, and D. Gizopoulos. The functional and performance tolerance of GPUs to permanent faults in registers. In Proc. IOLTS, 2013.

[195] S. Tselonis and D. Gizopoulos. GUFI: A framework for GPUs reliability assessment. In Proc. ISPASS, 2016.

[196] E. Viaud, F. Pêcheux, and A. Greiner. An efficient TLM/T modeling and simulation environment based on conservative parallel discrete event principles. In Proc. DATE, 2006.

[197] E. Wachter, A. Erichsen, A. Amory, and F. Moraes. Topology-agnostic fault-tolerant NoC routing method. In Proc. DATE, 2013.

[198] I. Wagner and V. Bertacco. Reversi: Post-silicon validation system for modern microprocessors. In Proc. ICCD, 2008.

[199] I. Wagner and V. Bertacco. Caspar: Hardware patching for multicore processors. In Proc. DATE, 2009.

[200] W. Wang, S. Yang, S. Bhardwaj, S. Vrudhula, F. Liu, and Y. Cao. The impact of NBTI effect on combinational circuit: Modeling, simulation, and analysis. IEEE Trans. VLSI Systems, 18(2), 2010.

[201] X. Wang, J. Keane, T. T. H. Kim, P. Jain, Q. Tang, and C. H. Kim. Silicon odometers: Compact in situ aging sensors for robust system design. IEEE Micro, 34(6), 2014.

[202] D. Weaver and T. Germond. The SPARC Architectural Manual (Version 9). Prentice- Hall, Inc., 1994.

[203] B. Wile, J. Goss, and W. Roesner. Comprehensive Functional Verification: the Complete Industry Cycle (Systems on Silicon). Morgan Kaufmann Publishers Inc., 2005.

[204] S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. In Proc. ISCA, 1995.

[205] J. Wu. A fault-tolerant and deadlock-free routing protocol in 2D meshes based on odd-even turn model. IEEE Trans. Computers, 52(9), 2003.

[206] B. Zandian, W. Dweik, S. H. Kang, T. Punihaole, and M. Annavaram. WearMon: Reliability monitoring using adaptive critical path testing. In Proc. DSN, 2010.

[207] J. J. Zhang, T. Gu, K. Basu, and S. Garg. Analyzing and mitigating the impact of permanent faults on a systolic array based neural network accelerator. In Proc. VTS, 2018.

[208] Z. Zhang, A. Greiner, and S. Taktak. A reconfigurable routing algorithm for a fault-tolerant 2D-mesh network-on-chip. In Proc. DAC, 2008.
