Testing Memory Consistency of Shared-Memory Multiprocessors
TESTING MEMORY CONSISTENCY OF SHARED-MEMORY MULTIPROCESSORS

A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

Chaiyasit Manovit
June 2006

© Copyright by Chaiyasit Manovit 2006
All Rights Reserved

Abstract

Shared-memory multiprocessors are becoming the dominant architecture for single-chip and multi-chip microprocessor based systems. Shared memory architectures are difficult to design because they must correctly implement the complexity of cache coherence and a memory consistency model. Memory consistency is a contract between hardware and software that specifies how memory behaves with respect to read and write operations from multiple processors.

We address the challenge of correctly implementing a memory consistency model by developing a methodology for testing shared-memory multiprocessors which is composed of three steps: generating pseudo-random multithreaded programs, executing these programs on a system under test, and checking their compliance with the given memory consistency model. Although the last step is known to be an NP-complete problem, we develop a suite of novel algorithms that work efficiently in practice. Using these algorithms, our methodology has found hundreds of bugs during design and verification of several commercial-graded processors. Many of these bugs are subtle and could not have been detected otherwise.

We also successfully apply our methodology to transactional memory, an emerging architecture that can significantly improve programmability while preserving or even enhancing efficiency of the memory system.

Acknowledgments

First and foremost, I would like to thank all my dissertation committee members, Oyekunle Olukotun, Giovanni De Micheli, and Robert Cypher, all of whom are truly great advisers and mentors.
Without the generous support from Giovanni De Micheli, my Ph.D. pursuit may not have even started. His optimism also encouraged me to welcome changes when my research interest began to shift, which resulted in my joining Sun Microsystems and switching to Olukotun’s group. At Sun, I am grateful to Robert Cypher for his exceptional expertise and the inspiration which saw me navigate through the research in verifying memory consistency and related concepts. Oyekunle Olukotun helped connect my work to a trendy research topic, and it is with his vision and support that I was eventually able to reach this final milestone.

I would also like to thank Bernard Widrow who graciously served as my orals committee chairman.

The team at Sun were also a great source of support. Sudheendra Hangal was practically my fourth adviser, with many interesting questions and ideas often bouncing between us. In particular, the following people have made Sun one of my best experiences: Durgam Vahia, Sridhar Narayanan, Gopal Reddy, Aleksandr Gert, and Juin-Yeu Joseph Lu.

With De Micheli’s group, I received financial support from Stanford’s Electrical Engineering Department, the Microelectronics Advanced Research Corporation (MARCO), and the National Science Foundation (NSF). I am thankful for the guidance from many of his former students, especially Jim Smith, Luca Benini, Tajana Simunic, and Yung-Hsiang Lu. I also enjoyed the friendships with other group members and visitors, particularly Armita Peymandoust, Terry Tao Ye, Luc Semeria, Eui-Young Chung, Davide Bertozzi, and Srinivasan Murali.

The following people helped improve the quality of this thesis one way or another: Christoforos Kozyrakis, Hassan Chafi, Austen McDonald, John Davis, and David Lande. I am also appreciative of the administrative support from Kathleen DiTommaso, Evelyn Ubhoff, and Darlene Hadding.
Finally, I would like to thank my friends and family for fulfilling my life outside school and work, with special thanks to my parents for their constant enthusiasm in providing me as best an education as they could.

Contents

Abstract
Acknowledgments
1 Introduction
  1.1 Motivation
    1.1.1 Shared-Memory Multiprocessors
    1.1.2 Memory Consistency Models
    1.1.3 Verifying Shared-Memory Multiprocessors
  1.2 Thesis Contributions
  1.3 Thesis Organization
2 Memory Consistency Models
  2.1 Sequential Consistency
  2.2 Specification of Sequential Consistency
    2.2.1 Memory Operations
    2.2.2 Orders
    2.2.3 Axioms
  2.3 Relaxing Sequential Consistency
    2.3.1 Relaxing the Write Atomicity
    2.3.2 Relaxing the Program Order
    2.3.3 Relaxing the Value Semantics
  2.4 Specifications of Relaxed Memory Models
    2.4.1 Total Store Order (TSO)
    2.4.2 Processor Consistency (PC)
    2.4.3 Relaxed Memory Order (RMO)
    2.4.4 Other Relaxed Memory Models
  2.5 Related Work
3 TSOtool: A Testing Methodology
  3.1 Overview of TSOtool
  3.2 TSOtool Operation
    3.2.1 Test Generation
    3.2.2 Test Run
    3.2.3 Analysis
    3.2.4 Debug
  3.3 Related Work
4 Algorithms for Verifying Memory Consistency
  4.1 The Problems
    4.1.1 The VTSO Problem
    4.1.2 The VTSO-read Problem
    4.1.3 The VTSO-conflict Problem
  4.2 Baseline Algorithms
    4.2.1 Algorithm for VTSO-conflict
    4.2.2 Baseline Algorithm for VTSO-read
    4.2.3 VTSO-read Example
  4.3 Optimizations for VTSO-read
    4.3.1 Vector Clocks
    4.3.2 Transitivity
    4.3.3 Optimized Baseline Algorithm for VTSO-read
  4.4 Incompleteness of Baseline Algorithm for VTSO-read
  4.5 Complete Algorithm for VTSO-read
    4.5.1 Heuristic for Topological Sort (Heu)
    4.5.2 Deriving Edges During Topological Sort (Deriv)
    4.5.3 Backtracking (Heu+Back, Deriv+Back)
  4.6 Characterization of Algorithms for VTSO-read
  4.7 Related Work
5 Transactional Memory
  5.1 Motivation for Transactional Memory
  5.2 Flavors of Transactional Memory
  5.3 Formal Specification of Transactional Memory
  5.4 Transactional Memory Verification
    5.4.1 Test Generation
    5.4.2 Analysis
    5.4.3 Analysis Algorithms
    5.4.4 Example
    5.4.5 Characterization of Algorithms for VTM-read
  5.5 Related Work
6 Results
  6.1 Testing TSO Implementations
  6.2 Testing TM Implementations
  6.3 Summary
7 Conclusions and Future Work
  7.1 Thesis Summary
  7.2 Future Directions
A Equivalence of Definitions of the Atomicity Axiom
Bibliography

List of Tables

4.1 A summary of complexities of VSC problems
4.2 Baseline analysis time and slowdown ratio of Deriv+Back
6.1 Classification of bugs found by TSOtool on various processors
6.2 Bugs found by TSOtool in various functional areas

List of Figures

1.1 A programmer’s model of a simple shared-memory multiprocessor
2.1 A conceptual model of Sequential Consistency (SC)
2.2 Examples of execution results
2.3 Examples of relaxing the SC requirements
2.4 A conceptual model of Total Store Order (TSO)
2.5 A conceptual model of Processor Consistency (PC)
3.1 TSOtool usage flow
4.1 An execution result example
4.2 Baseline algorithm for VTSO-read
4.3 An execution result which violates TSO
4.4 Vector clocks example
4.5 Optimized baseline algorithm for VTSO-read (rules R6 and R7)
4.6 Examples of incompleteness
4.7 Effectiveness of Heu and Deriv in finding valid TOO’s
4.8 Analysis time of Baseline and Deriv+Back for VTSO-read
5.1 Producer-consumer example
5.2 Analysis time of Deriv+Back and its slowdown ratio for VTM-read
6.1 Examples of UltraSPARC bugs found by TSOtool
6.2 Importance of the Atomicity enforcement
6.3 Examples of TCC bugs found by TSOtool

Chapter 1

Introduction

1.1 Motivation

Although microprocessor performance has been growing at an exponential rate as suggested by Moore’s law, there is always demand for large computing capacity beyond what can readily be provided by a single processor, even the most advanced one, hence a need for multiprocessor systems. Furthermore, striving to achieve ever higher performance, the industry has now turned more toward dual-core or multi-core designs as these are proving to be a better way to utilize