Specifying, Verifying, and Translating Between Memory Consistency Models

Daniel Joseph Lustig

A Dissertation Presented to the Faculty of Princeton University in Candidacy for the Degree of Doctor of Philosophy

Recommended for Acceptance by the Department of Electrical Engineering Adviser: Professor Margaret Martonosi

November 2015

© Copyright by Daniel Joseph Lustig, 2015. All rights reserved.

Abstract

Gone are the days when single-threaded performance was the primary metric of interest for a processor design. Driven by increasingly tight power constraints and by the slowing down of Moore’s law and Dennard scaling, architects have turned to multicore parallelism and architectural heterogeneity as a means to continue delivering increases in chip performance and power efficiency. Both approaches can deliver improved characteristics at the expense of creating systems which are dramatically more complex to design and harder to program. In order to share data and coordinate their actions, the compute elements (e.g., CPU cores, GPU cores, accelerators) in modern processors communicate via shared virtual memory. In this model, regardless of the underlying physical reality, cores see the abstraction of a single unified address space shared with the other processing elements in the system. Communication through shared virtual memory takes place according to a memory consistency model—the set of rules and guarantees about the ordering and visibility of memory accesses issued by one processing element and observed by other processing elements. Unfortunately, in practice, hardware consistency model definitions frequently suffer from a lack of formalism, precision, and/or completeness, resulting in numerous situations in which the correct outcome(s) for a given scenario may be ambiguous or even simply undefined. To solve these problems, this thesis develops formal analysis techniques, specification formats, and automated tools that aim to mitigate the problems of incompleteness, imprecision, and/or incompatibility among memory consistency models.

First, this thesis proposes Memory Ordering Specification Tables (MOSTs), a systematic method for fully and explicitly enumerating the memory ordering requirements of axiomatic memory models. The architecture- and model-independent approach used by the MOST format allows for the direct comparison of preserved program order, fences, and other ordering enforcement mechanisms from the same or even from different models. In particular, it allows for the direct comparison of models that are weakly-ordered (e.g., ARM, IBM Power, GPUs), strongly-ordered (e.g., sequential consistency, SPARC/x86 Total Store Ordering), or points in between. This thesis also presents the ArMOR framework for systematically analyzing and manipulating MOSTs. As a demonstration of the power of ArMOR, this thesis presents a methodology for automatically and dynamically translating code compiled under the assumptions of one memory model into code which can be executed by a (micro)architecture implementing a different memory model. By analyzing MOSTs for the source and destination architectures, ArMOR analysis can be used to produce optimized, self-contained translation engines called shims. The shim designs can be derived offline and in advance, and they can be easily implemented in software or hardware with manageable performance overhead.

Second, this thesis proposes PipeCheck, a framework for verifying the correctness of particular implementations of a given architectural memory model. PipeCheck defines a domain-specific language for specifying memory ordering behavior at the microarchitecture level. This language allows PipeCheck to explicitly specify and verify the behavior of common microarchitectural optimizations such as speculative load reordering which are intentionally abstracted away by higher-level models.
PipeCheck’s fast constraint solver software quantifies in just minutes whether an implementation is stronger than, weaker than, or equal in strength to its memory model requirements. It also reduces a currently intractable problem—verifying register transfer level (RTL) hardware descriptions against architectural memory model requirements—into a much more tractable task: verifying RTL against a set of locally-scoped, per-stage ordering properties. This thesis applies PipeCheck at a number of locations within a processor: within the pipeline, within the coherence protocol, and at the coherence/consistency interface, and it demonstrates the practicality of rigorously analyzing a broad range of microarchitectural scenarios.

In summary, this thesis presents architects with new techniques for reducing the complexity of defining memory consistency models and building systems that correctly implement them. In doing so, it narrows the existing gap between performance-first architectural design methodologies and rigorous verification of implementation correctness. As hardware and software complexity and dynamism continue to grow, the techniques developed in this thesis will help designers avoid the memory model bugs that continue to appear in hardware even today, and they will present programmers, compiler writers, and runtime systems with usable, precise descriptions of the memory ordering behaviors of any simple or complex computing system.

Acknowledgements

It’s hard to believe that my time at Princeton is coming to an end. It was an amazing experience, and I can’t help but acknowledge all of the help and friendship I received from so many people over my years here. None of this would have been possible without all of their support and guidance, and I’m going to do my best to thank them all here.

First of all, I would like to give enormous thanks to my adviser Margaret Martonosi. I couldn’t have asked for anything more from an adviser. She helped guide me through the highs and the lows of my Ph.D., and she always pushed me to improve in all aspects of my career. She was always there for me whenever I needed advice, whether it be technical, professional, personal, or otherwise. She accepted nothing less than my best, and I will be forever grateful for her mentorship.

I would also like to thank the rest of my thesis committee: Andrew Appel, Sharad Malik, Michael Pellauer, and David Wentzlaff. Their feedback, suggestions, and mentorship made this thesis possible. I would especially like to thank Michael Pellauer for all of his support and guidance during my internships and during my time back at Princeton. Michael went out of his way to keep our collaborations going strong, and his input really helped my Ph.D. start to take shape. I’m extremely thankful that he was willing to make time for our weekly meetings and frequent emails, even through a change in employers.

I’d also like to thank all of the other members of the former VSSAD, especially Joel, Angshu, Aamer, Neal, Michael, Elliott, Bushra, and Mohit. Joel Emer was an excellent mentor to me during my three internships with the group, and I’m grateful for his support during and after my time there. I’m really looking forward to working with many of them again at NVIDIA.

Abhishek Bhattacharjee has been an amazing friend and mentor to me since the day I joined MRMGroup. From the beginning, he taught me what it means to be a Ph.D. student and an academic, and I’m infinitely better off for having him as someone I can trust and count on for advice. I’m really glad we got to start working together again, and I hope there’s more where that came from in the future.

In addition, I have to give huge thanks to all of the other members of MRMGroup: Sibren, Carole, Wenhao, Yavuz, Ozlem, Ali, Tae Jun, Logan, Elba, Caroline, Yatin, Themis, Shruti, Daeki, and Geet. They made it fun to be in the office every day, and we shared so many thoughts, laughs, lunches, insights, and stories that it’s hard to even remember them all. I also have to give a shoutout to the Malik group members, the Wentzlaff group members, the reading group members, and everyone else with whom I’ve worked and become friends during my time at Princeton.

I’ve benefited tremendously from other collaborations and interactions as well. Thanks to Kevin Skadron, Kelly Shaw, and Ke Wang for their help during our earlier teleconferences. Thanks also to the staffs of the Electrical Engineering and Computer Science departments for helping me through the logistics of the program. I’d especially like to thank Lori Bailey and Stacey Weber Jackson for being so friendly and helpful with anything I ever asked of them.

A very special thank you goes to my wonderful girlfriend, Sunha Ahn. Since that first EE holiday party we attended, we shared so many special and important moments together, at Princeton and around the world. We made each other laugh, we supported each other throughout the bad times, and we celebrated together during the good times.
She kept me motivated, and she kept me focused on what is really important in life. I couldn’t have made it through my Ph.D. without her.

Finally, I would like to extend a huge thank you to everyone on both sides of my family. First and foremost, I have to thank my parents for channeling my interests in math and science into a successful career path, for making the sacrifice to send me to Loyola and to Penn, and for the endless and uninterrupted love and support. I also have to thank so many others:

David, Mamá, Papá, Martha, Laura, John, Brad, Reagan, Carmen, and the hundreds of other actual and honorary Cubans that were behind me all the way. On the other side, I would like to thank Grandpa, Marion, Josh, Patty, Ashley, Sharon, Jessica, Adam, and Bryan for being there for me as well. I am extremely lucky to have come from such a huge and loving extended family. I am also extremely lucky to have benefited from their decisions to escape bad situations across the world to make better lives for themselves and for their families here in America. Their love and their sacrifices allowed me to get where I am today, and for that reason, I dedicate this thesis to them.

To my parents.

Contents

Abstract
Acknowledgements

1 Introduction
  1.1 Motivation
  1.2 Research Challenges and Goals
  1.3 Defining Memory Models in an Architecture-Independent Manner
    1.3.1 Dynamic Migration Between Consistency Models
  1.4 Microarchitecture-Level Memory Consistency Models
    1.4.1 Pipeline Models
    1.4.2 Cache and Memory System Models
    1.4.3 Address Translation
  1.5 Thesis Contributions
  1.6 Thesis Outline

2 Background and Related Work
  2.1 Basic Overview
    2.1.1 From Sequential Consistency to Weak Ordering
    2.1.2 A Stack of Memory Consistency Models
    2.1.3 Coherence vs. Consistency
    2.1.4 Litmus Tests
  2.2 Defining Weak Memory Models
    2.2.1 Motivating Example
    2.2.2 Operational vs. Axiomatic Models
    2.2.3 Defining Models Which Reflect Actual Hardware
    2.2.4 Other Subtle Ordering Relationships
  2.3 Empirical Analysis of Memory Models
    2.3.1 Correctness
    2.3.2 Performance
  2.4 Comparing and Mapping Between Models
  2.5 Simplifying Abstractions Made in This Thesis
  2.6 Chapter Summary

3 Memory Ordering Specification Tables and ArMOR
  3.1 Introduction
  3.2 Motivating Example
  3.3 Memory Ordering Specification Tables
    3.3.1 Store Atomicity and Ordering Strength
    3.3.2 Same-Address Dependencies
    3.3.3 Fence Cumulativity
    3.3.4 Summary
  3.4 ArMOR: Comparing and Manipulating MOSTs
    3.4.1 MOST Partition Refinement
    3.4.2 ArMOR Comparison Operators
    3.4.3 ArMOR Comparison Examples
  3.5 Related Work
  3.6 Chapter Summary

4 Dynamic Translation Between Memory Consistency Models
  4.1 Introduction
  4.2 Motivating Example
  4.3 Cross-MCM Dynamic Binary Translation
    4.3.1 High-Level Operation
    4.3.2 Shim Finite State Machines: Overview
    4.3.3 Shim FSM Generation
    4.3.4 Shim FSM Generation Example
    4.3.5 Design Considerations
  4.4 Evaluation Methodology
    4.4.1 Pintool-Based Exploration Methodology
    4.4.2 Hardware Simulation Methodology
  4.5 Experimental Results
    4.5.1 Shims for a Broad Range of Scenarios
    4.5.2 Performance of Software Shim Implementations
    4.5.3 Performance of Hardware Shim Implementations
  4.6 Takeaways
  4.7 Related Work
  4.8 Chapter Summary

5 PipeCheck: Specifying and Verifying Microarchitectural Enforcement of Memory Consistency Models
  5.1 Introduction
  5.2 Motivating Example
    5.2.1 Microarchitecture-Level Analysis
    5.2.2 Deriving Correctness from Microarchitecture-Level Axioms
  5.3 PipeCheck Microarchitecture-Level Analysis
    5.3.1 Microarchitecture Definition
    5.3.2 PipeCheck Model Specification Language
    5.3.3 Transitivity of µhb Edges
    5.3.4 Modeling Microarchitectural Optimizations
  5.4 Verification Flow
    5.4.1 Litmus Test-Based Verification
    5.4.2 Automated Constraint Solver Algorithm
    5.4.3 Software Implementation
    5.4.4 Runtime
    5.4.5 Caveats
    5.4.6 Future Work: Formal Equivalence/Implication
  5.5 Experimental Methodology
  5.6 Results Across Litmus Tests
  5.7 Case Studies
    5.7.1 Speculative Load Reordering
    5.7.2 Consistency Bug in gem5 O3 Pipeline
    5.7.3 WeeFence
  5.8 Related Work
  5.9 Chapter Summary

6 CCICheck: Verifying the Coherence-Consistency Interface
  6.1 Introduction
  6.2 Motivating Example
    6.2.1 Coherence-Consistency Interface Mismatches
    6.2.2 The Window of Vulnerability Problem
  6.3 The ViCL Abstraction
    6.3.1 ViCLs: Definition and Usage
    6.3.2 ViCL Timeline Example
    6.3.3 Using ViCLs in µhb Graphs
  6.4 CCICheck and µhb Graphs
    6.4.1 ViCL-Aware Microarchitecture Definitions
    6.4.2 Example: A Model with L1 Cache ViCLs Only
    6.4.3 Multilevel Caches
  6.5 Experimental Methodology
  6.6 Case Studies
    6.6.1 Partially-Incoherent Caches and L1 Cache Bypassing
    6.6.2 Lazy Coherence
    6.6.3 Window of Vulnerability/Peekaboo
    6.6.4 Performance Results
  6.7 Related Work
  6.8 Chapter Summary

7 Conclusion and Future Directions
  7.1 Future Directions
    7.1.1 Further Extensions of PipeCheck
    7.1.2 Formal Correctness Proofs
  7.2 Thesis Conclusions

A Gallery of MOST-based Definitions for Well-Known Architectures

B Gallery of Shim FSMs

Bibliography

Chapter 1

Introduction

1.1 Motivation

Over the second half of the twentieth century, general-purpose computing developed from a theoretical exercise into a practical and ubiquitous reality. In 1936, Alan Turing invented what is now called a universal Turing machine, an idea which is now widely considered as the theoretical basis for the modern notion of a computer [Tur37]. ENIAC, cited by many as the world’s first fully-digital general-purpose computer¹, was publicly unveiled in 1946 [EM]. The first stored-program computer, the Manchester Small-Scale Experimental Machine, ran its first program in 1948 [Cen15]. The first commercially-available general-purpose digital computer, the Ferranti Mark 1, was purchased in 1951. By the end of the century, fundamental technology advances such as the bipolar transistor, the integrated circuit, CMOS logic, and corresponding advances in software and design technologies drove computing from being a niche application into the commodity that it remains today.

¹ Other competitors for this or similar claims include the Atanasoff-Berry Computer, the Colossus Machine, and the [Com15]. I acknowledge my personal bias towards ENIAC, if only because I received my bachelor’s degree from the University of Pennsylvania, where ENIAC was created.

The first half-century of digital computing followed a number of remarkable trends. Gordon Moore, based on a few early observations, predicted in 1965 that transistor density would double roughly every two years [Moo65]. This phenomenon, popularly dubbed Moore’s Law, combined with improvements to the architectures of the processors themselves to lead to a doubling of performance roughly every 18 months. Likewise, Dennard et al. observed in 1974 that by scaling supply voltage downwards with transistor area, transistor power density would be roughly constant [DGY+74]. Dennard scaling, together with Moore’s law, allowed for high levels of integration, giving high performance at manageable power. Although these trends were generally extrapolated from no more than a few data points, each remarkably held true for decades, leading them to be popularly classified as "laws" and to guide the progress of the semiconductor industry for nearly half a century.

Unfortunately, in the early 2000s, computing hit the power wall [BC11, KM08, SKM15]. Dennard scaling started to break down around 2005, as the power consumption and thermal dissipation requirements of increasingly smaller and faster transistors became impossible to satisfy. This breakdown brought about an end to the exponential growth in single-core performance seen during the previous decades, and it drove the industry to shift to multicore processors as a way to deliver continued processor performance scaling [ABC+06, HM08]. The ability to extract and make efficient use of parallelism hence became a prerequisite for making the most use of the hardware. Multicore processors are now found throughout most major segments of the industry, from mobile devices to laptops and desktops to supercomputers.

As a consequence of the paradigm shift to multicore processors, shared memory parallelism has taken on renewed importance. While software parallelism models can (and do) vary widely, at a hardware level, the predominant paradigm is for the cores in a multicore processor to synchronize and communicate data by reading and writing into a shared memory space. This communication and synchronization takes place according to the memory consistency model: the set of rules specifying how and when accesses made by each core become visible to other cores.

Unfortunately, the task of building consistency models and synchronization primitives that are correct, high-performance, and easy to use has proven notoriously difficult over the decades, and so memory models remain the subject of ongoing research even today.

Lamport defined sequential consistency (SC), the first explicit memory consistency model, in 1979 [Lam79]. Sequential consistency (then and today) represents the most intuitive model of shared memory concurrency. It states, roughly, that concurrent programs behave as if they were a single-threaded program formed by interleaving the instructions from each thread. SC was notable for formalizing the rules that many programmers were implicitly assuming about the behavior of parallel programs, particularly in light of the fact that even then, concurrent hardware did not always enforce SC. Today, most computer architects consider the performance of sequentially consistent processors to be too low to be commercially competitive. In modern processors, the requirements of sequential consistency are routinely violated by hardware optimizations such as memory access reordering and store buffering [AMD13, ARM13, IBM13, Int10, Int13a].

In 1986, Dubois et al. proposed the idea of defining a weak memory model—one which is weaker (i.e., more permissive) than SC [DSB86]. Weak memory models present a different (usually more complicated but higher-performance than SC) set of rules for shared memory communication. In this approach, the burden of building correct communication and synchronization primitives is shifted onto the programmer and/or the programming language, as either or both must be able to understand and reason about the increasingly complicated set of rules governing cross-core communication. In spite of the significant added complexity that weak memory models bring, nearly all modern multicore processors implement some form of weak memory model.

The shift of computing into the multicore era drove a renewed practical interest in memory consistency model research in the 2000s, and this led to efforts to improve and formalize the memory models used by C/C++ [BA08], Java [MPA05], x86(-64) [OSS09a], and Power [AMT14, MHMS+12, SSA+11], among others. However, numerous challenges remain. On one hand, memory model analyses following a formal mathematical approach are sufficiently challenging that they tend to take place on the timeframe of years, rendering them unable to fully adjust to the fast-moving world of computing. On the other hand, with every advance in the scientific understanding of memory models comes another discovery that some existing model or framework is broken, thereby revealing just how challenging these problems are [ABD+15, ND13, VBC+15].

Although memory models have been studied for decades, two contemporary trends in computer architecture are further increasing the difficulty of specifying and reasoning about memory consistency models. First of all, many modern systems are architecturally heterogeneous—they contain processor cores and/or other components which have varying instruction sets and therefore, usually, varying memory consistency models. The GPGPU paradigm, in which a graphics processing unit (GPU) performs general-purpose (GP) computation in collaboration with a CPU, is one notable example, but mobile systems-on-chip (SoCs) may contain as many as a half dozen different components with different instruction sets and MCMs [Qua15, Tex10]. Second, the traditional abstractions of programming language, architecture, and microarchitecture are becoming increasingly blurred. Intermediate representations such as Java bytecode and LLVM IR are increasingly commonly being used as semi-permanent representations of code, and "virtual instruction sets" such as NVIDIA PTX [NVI13b] and HSAIL [HSA13] are diminishing the dominance of any one particular hardware instruction set. Concurrent with this trend is an increase in the popularity of dynamic just-in-time (JIT) compilers [NVI13a, Khr], optimizers [NVIb], and binary translators [DVT12, VT14], cases for which the memory consistency implications have not been as widely studied.

These two trends show the importance of having memory consistency model analysis techniques which are flexible enough to adapt to such rapidly-changing conditions.

This thesis bridges the gap between the benefits of rigorous formalism and the need for practical applicability in memory model analysis, all while addressing the recent trends discussed above. The next section describes some of the key challenges that arise in this space, and it lays out the specific goals for this thesis.

1.2 Research Challenges and Goals

The past three decades have shown that specifying and reasoning about weak memory models is extremely difficult and prone to numerous pitfalls and unexpected corner cases. In fact, it remains a challenge even to precisely and completely define models such as C++, Java, ARM, or Power which are nevertheless in widespread use. Furthermore, since the software memory model needs to be implemented in terms of the hardware memory model of the processor on which it will execute, the job of building tools such as compilers in a way that correctly accounts for memory models is itself a significant additional challenge [BMO+12, VBC+15]. Existing approaches and models are insufficient in various ways. For example:

• They are imprecise: many industry specifications are written in natural language and are often too vague to be used to analyze certain cases [AMD13, ARM13, IBM13, Int13a, NVI13b].

• They are incomplete and/or incompatible: e.g., there are cases in which various recent formalizations of the Power memory model disagree with each other [SSA+11, MHMS+12, AMT14].

• They are impractical and/or inaccessible: e.g., the Java memory model is considered by many to be difficult to use in practice due to its "replay" approach to deriving valid executions [PVJ15].

• They are unsound: e.g., numerous early iterations of the C++, Java, and Power memory models were shown to disagree with the realities of the underlying implementations [Alg12, BOS+11, SSA+11, ŠA08] and each model continues to be refined even today [Bat04, BMO+12].

As a result of the above shortcomings, it remains difficult to analyze the memory consistency model implications of building new architectures, toolchains, and programming languages. Even when model definitions exist, they may not be practical to use. For one thing, nearly all existing models are, by necessity, abstracted somewhat from the full realities of the microarchitecture. For another, user-friendly tools which allow non-experts to analyze the memory model behavior are often either non-existent [MHMS+12] or prone to state space explosion [SSA+11]. As a result, determining correct and optimal mappings for critical standard data structures onto weak memory models remains an active area of research even today [BMO+12, LGCP13, LPCZN13].

An additional ongoing challenge is to ensure that hardware being built and shipped to customers is correct with respect to its specification. There have been numerous (in)famous examples of bugs appearing in all parts of the processor, and the memory system is no exception. Although the memory consistency model is more precisely and formally defined than many other components of the system, the tools which allow these formalisms to be applied to practical microarchitectural use cases are still not sufficiently developed. This fact leads bugs such as those in implementations of transactional memory [Int15], load-load reordering [ARM11, Elv], or fences [ABD+15] to continue to appear in hardware even today. The trend towards increased architectural heterogeneity and the blurring of abstractions in the hardware-software stack will only serve to make these problems more challenging.

In response to the above challenges, this thesis focuses on specifying, verifying, and translating between memory consistency models. It does so in a way that bridges the gap between formal analysis methods and practical microarchitectural use cases. In particular, it applies rigorous methodologies to the analysis of microarchitecturally-important cases such as out-of-order execution, speculative reordering, partially incoherent caches, virtual-to-physical address translation, and coherence protocols which violate the formal specifications of coherence. It also builds practical tools which can be used to analyze both currently relevant and forward-looking use cases. The rest of this chapter outlines the approach taken to overcome the challenges and to achieve the goals described above.

1.3 Defining Memory Models in an Architecture-Independent Manner

One shortcoming of existing approaches to defining memory models is that in general, two unrelated memory models cannot easily be directly compared. In the best case, the two models may each be specified using some precise formalism, but unless the two formalisms use matching approaches and notations, it is difficult or impossible to simply take a component from one model and map it into the language of the other. Instead, the process of deriving a correct mapping between two models generally requires either a conservative over-approximation or a custom third formalism consisting of manual proofs that often take years to complete [BMO+12, LPCZN13, SMO+12]. This fact makes it difficult to reliably build compilers, dynamic binary translators, or any other mechanism which has to perform such cross-model mappings.

Chapter 3 presents the Memory Ordering Specification Table (MOST), an architecture-independent and model-independent specification format for memory models [LTPM15a, Lus15]. MOSTs provide a precise, self-contained notation for defining the memory ordering requirements or enforcements of the various individual components of a memory model.

MOSTs describe preserved program order (PPO, i.e., the set of orderings enforced by hardware by default), fences, inter-instruction dependencies, and all other forms of ordering mechanism used by a model. Taken together, the MOSTs for each individual component collectively form the definition of the memory model in question. The benefit of the MOST specification format is that by providing a common model-independent language, any two MOSTs can be compared to each other, even when the MOSTs originate from unrelated memory models.

In addition to MOSTs themselves, Chapter 3 also presents the ArMOR framework, a primitive "arithmetic" on MOSTs which can be used to analyze, compare, and manipulate MOSTs. ArMOR enables MOSTs to be flexibly applied in compilers, dynamic translators, and other interesting use cases. The ArMOR framework is released as an open-source tool [Lus15].
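As a rough illustration of the kind of comparison this enables (a minimal sketch only; the actual MOST format defined in Chapter 3 is richer, covering store atomicity, fence cumulativity, and finer-grained access categories), the following Python fragment represents two simplified ordering-requirement tables and finds the cells where one model demands an ordering that the other does not preserve. This is exactly the kind of information a compiler or translator needs when mapping code between models.

```python
# Illustrative sketch only: a simplified stand-in for a MOST, ignoring store
# atomicity, cumulativity, and same-address subtleties (see Chapter 3).
ACCESS_TYPES = ("load", "store")

# table[(earlier, later)] = True if the ordering must be preserved by default.
SC_PPO  = {(a, b): True for a in ACCESS_TYPES for b in ACCESS_TYPES}
TSO_PPO = dict(SC_PPO)
TSO_PPO[("store", "load")] = False   # TSO relaxes store -> load ordering

def missing_orderings(source_ppo, target_ppo):
    """Orderings the source model requires but the target does not preserve.

    Each such cell is a place where a source-to-target translation would have
    to insert a fence or some other ordering mechanism."""
    return [pair for pair, required in source_ppo.items()
            if required and not target_ppo.get(pair, False)]

print(missing_orderings(SC_PPO, TSO_PPO))   # [('store', 'load')]
print(missing_orderings(TSO_PPO, SC_PPO))   # []: SC is at least as strong
```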

1.3.1 Dynamic Migration Between Consistency Models

A key benefit of the ArMOR framework is that it enables new memory consistency-related use cases which were not previously feasible to build. Chapter 4 highlights the example of dynamic binary translators (DBTs) that translate between different instruction set architectures (ISAs) [LTPM15a]. Although cross-ISA binary translation has been shown to deliver performance and/or energy benefits, existing frameworks simply do not address the memory consistency implications of doing so [DVT12, VT14]. Likewise, existing architectural emulators such as the Android emulator and QEMU implement single-threaded backends in order to simply avoid the memory model issue altogether [Goo15, Qem15]. Chapter 4 presents an in-depth case study of using MOSTs and the ArMOR framework to build memory-model-aware dynamic binary translators.

Given the set of MOSTs defining the source and target memory models, this chapter presents an algorithm for automatically deriving shims, or small, self-contained translation modules which automatically insert fences or other ordering mechanisms as needed. This chapter demonstrates that both software and hardware shim implementations have low to negligible performance overheads, making them practical to implement. The appendix to this thesis also presents a gallery of shims for different scenarios. The code used to generate this appendix is open source as well [Lus15]. The chapter concludes by considering extended use cases such as optimization via dynamic removal of redundant fences and by presenting a broader view of the ways in which future architectures can make the analysis and implementation of such technologies easier.
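To give a flavor of what a shim does at runtime, the sketch below is a deliberately simplified, hypothetical fence-insertion pass written in Python. The two model tables and the generic "fence" instruction are illustrative assumptions; the real shims of Chapter 4 are finite state machines derived automatically from MOSTs and emit the target architecture's actual ordering primitives.

```python
# Rough sketch of a shim-like pass (hypothetical models and instruction names).
SOURCE_PPO = {("load", "load"): True, ("load", "store"): True,
              ("store", "store"): True, ("store", "load"): False}  # x86-TSO-like
TARGET_PPO = {pair: False for pair in SOURCE_PPO}                  # very weak target

def translate(instructions):
    """Insert a full fence whenever an ordering required by the source model
    between a pending earlier access and the next access would not be
    preserved by the target model by default."""
    pending, out = set(), []
    for op in instructions:                     # each op is "load" or "store"
        if any(SOURCE_PPO[(p, op)] and not TARGET_PPO[(p, op)] for p in pending):
            out.append("fence")
            pending.clear()
        out.append(op)
        pending.add(op)
    return out

# store -> store must stay ordered under the source model, so a fence is added;
# store -> load need not be, so the trailing load needs no fence.
print(translate(["store", "store", "load"]))
# ['store', 'fence', 'store', 'load']
```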

1.4 Microarchitecture-Level Memory Consistency Models

Even though hardware memory models are intentionally abstracted from the details of any one particular implementation, they are often nevertheless inspired by some class of microarchitectural features. For example, microarchitectural store buffering inspired the design of processor consistency and SPARC total store ordering (TSO), and some models contain explicit references to microarchitectural features such as branch prediction [SSA+11, PVJ15]. Nevertheless, the models are intentionally designed to be portable across varying implementations for the sake of backward/forward compatibility: a memory model may be implemented by more than one company (e.g., TSO by SPARC/Oracle [SPA94b], Intel [Int13a], AMD [AMD13], and others) and/or by more than one chip produced by the same company (e.g., Intel Haswell/Broadwell "big cores", Bonnell/Silvermont "small cores"). This fact means that by design, existing models cannot specify behaviors specific to any one implementation. As a result, existing frameworks cannot generally be applied either to microarchitectural verification or to scenarios such as modern GPUs which do not have well-defined architecture-level memory model specifications.

Chapter 5 introduces PipeCheck, a methodology and open-source tool for verifying the correctness of a given microarchitecture against a given architecture-level memory model specification [LPM14, LPM15, Lus14]. PipeCheck provides a domain-specific language (DSL) for defining a multi-event axiomatic model of the memory consistency behavior of one particular microarchitecture. It uses this model to quickly and automatically identify cases in which the microarchitecture may be stricter than necessary (e.g., an SC implementation of TSO) or weaker than necessary (e.g., due to an implementation bug).

Specifically, PipeCheck defines the memory ordering behavior of a pipeline using a set of microarchitecture-specific ordering axioms: some defining per-instruction behavior, some defining per-pipeline-stage behavior, and some defining other relevant ordering enforcement behaviors. For example, an axiom may state (using more formal language) that "the decode stage is FIFO" or that "a load may either take a cache hit, take a cache miss, or forward from the store buffer". PipeCheck then determines whether any executions which violate the architecture-level specification are observable in terms of the axioms provided. The axioms are designed with the intention that, to the extent possible, they can be verified and analyzed entirely independently. In this way, the PipeCheck methodology reduces the burden of verifying a pipeline as a whole to the more tractable problem of verifying the localized behaviors of individual pipeline stages or events.

There are numerous microarchitectural components which play a role in memory ordering: the pipeline, the cache hierarchy, the address translation mechanisms, the networks-on-chip (NoCs), and so on. The analysis of various microarchitectural subsystems using the PipeCheck methodology is organized as described below.

1.4.1 Pipeline Models

Chapter 5 focuses on the role of the pipeline itself in enforcing the memory consistency model [LPM14, LPM15]. The memory system itself often provides little to no ordering guarantees beyond some form of coherence, leaving the pipeline largely responsible for enforcing consistency guarantees. Likewise, fences or other explicit ordering mechanisms are generally enforced within the pipeline itself; such mechanisms do not often propagate explicitly to the memory system. Nevertheless, the pipeline may itself contain a number of performance optimizations which may themselves introduce weak memory behavior. Chapter 5 therefore demonstrates how to verify pipelines containing optimizations such as out-of-order execution, speculative load reordering, or other similar mechanisms.
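As a toy illustration of the underlying idea (not PipeCheck's actual specification language or constraint solver), the sketch below builds a small microarchitectural happens-before graph from hypothetical per-stage ordering edges and checks it for a cycle; an execution whose required edges form a cycle cannot actually be observed on that microarchitecture.

```python
# Toy sketch of a microarchitectural happens-before (uhb) check. The node names,
# stages, and edges are illustrative assumptions, not a real pipeline model.
from collections import defaultdict

def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))

# Nodes are (instruction, pipeline stage) events. Axioms such as "the decode
# stage is FIFO" contribute some edges; the proposed litmus test outcome
# contributes the rest. A cyclic graph means the outcome is unobservable.
edges = [(("i1", "Fetch"), ("i2", "Fetch")),    # program order at fetch
         (("i1", "Fetch"), ("i1", "Memory")),   # each instruction flows forward
         (("i2", "Fetch"), ("i2", "Memory")),
         (("i2", "Memory"), ("i1", "Memory"))]  # ordering implied by the outcome
print(has_cycle(edges))   # False: this particular execution is observable
```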

1.4.2 Cache and Memory System Models

While Chapter 5 abstracts away much of the behavior of the cache hierarchy, Chapter 6 highlights ways in which specific low-level behaviors of certain cache hierarchies may need to be addressed explicitly [MLPM15]. For example, although most architecture-level memory models contain some formal notion of coherence, there are cases in which particular cache coherence protocols may in fact violate certain interpretations of the word "coherence", as "coherence" is defined in varying ways by different authors. Adding an explicit model of the coherence-consistency interface (CCI) to PipeCheck allows it to verify that the microarchitecture as a whole continues to implement the memory model correctly, given the precise behavior of the coherence protocol.

CCICheck extends the methodology and software implementation of PipeCheck to explicitly model the behavior of a particular cache coherence protocol. CCICheck provides the key innovation of the value-in-cache-lifetime (ViCL), an abstraction of the lifetime of a given piece of data being held in a particular cache line in some specific cache. The ViCL abstraction allows CCICheck to explicitly and scalably model behaviors such as cache hits vs. cache misses, incoherent caches, and lazy cache invalidation, and these models allow CCICheck to verify the correctness of each behavior with respect to the memory model. The chapter concludes with an exploration of how CCICheck can be used to verify the correctness of various coherence protocols which have been recently proposed in the academic literature.
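The sketch below gives a minimal, purely illustrative rendering of the ViCL idea using explicit timestamps. It is meant only to convey the intuition; CCICheck itself reasons about ViCL create and expire events through happens-before edges rather than a global clock, and its actual representation is richer than this.

```python
# Minimal sketch of the ViCL intuition (illustrative assumptions only).
from dataclasses import dataclass

@dataclass(frozen=True)
class ViCL:
    cache: str      # which cache holds the line (e.g., "Core0 L1")
    address: int    # the address covered by the cache line
    value: int      # the data value held during this lifetime
    create: int     # time at which this (address, value) pair appears here
    expire: int     # time at which the line is invalidated, evicted, or overwritten

def can_source(load_cache, load_addr, load_time, vicl):
    """A load can hit in a cache only if some ViCL for that address is live there."""
    return (vicl.cache == load_cache and vicl.address == load_addr
            and vicl.create <= load_time < vicl.expire)

v = ViCL(cache="Core0 L1", address=0x40, value=1, create=10, expire=25)
print(can_source("Core0 L1", 0x40, 12, v))   # True: the load hits on this ViCL
print(can_source("Core0 L1", 0x40, 30, v))   # False: the ViCL has already expired
```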

1.4.3 Address Translation

A related but less-studied question is the role that virtual-to-physical address translation plays in the specification and verification of memory consistency models. Romanescu et al. [RLS10] brought attention to the fact that problems such as TLB synonyms—two virtual addresses which map to the same physical address—can imply ordering requirements that cannot be inferred from virtual or physical addresses alone. Section 7.1.1 discusses ongoing work to apply PipeCheck to the verification of such scenarios [LSMB15].

1.5 Thesis Contributions

This thesis makes an impact via the following significant contributions:

• Inspired by the shortcomings of existing informal memory consistency model specification techniques, this thesis defines the Memory Ordering Specification Table (MOST), a precise, self-contained notation for defining components of a memory consistency model [LTPM15a]. MOSTs improve upon existing specification formats by directly including features (e.g., fence cumulativity and degree of store atomicity) that are left unaddressed by simpler models. The key contribution of MOSTs is that they eliminate much of the imprecision found in typical industry definitions of memory consistency model behaviors and hence also of the analyses that make use of such definitions.

• To demonstrate the practical value of MOSTs, this thesis also presents the open-source ArMOR framework for systematically analyzing and manipulating MOSTs [Lus15]. The precision and completeness of MOSTs allows them to be analyzed to improve the rigor of use cases which already exist (e.g., compilers) and in use cases which are forward-looking (e.g., cross-ISA dynamic binary translation). This thesis performs an in-depth analysis of the latter, showing not only that the necessary analysis can be automated using ArMOR, but also that such translation can be made practical in a variety of hardware and software scenarios.

• The work in this thesis is the first to present a general-purpose formal axiomatic memory model framework for specifying and verifying microarchitectural memory access reordering behavior [LPM14, LPM15]. While previous work had shown the need for rigor and formalism when analyzing memory consistency models at the architecture or software level, no previous work had applied such techniques to the microarchitecture space. PipeCheck's microarchitectural happens-before (µhb) graphs and the domain-specific language (DSL) that defines how they are enumerated provide a clear, visualizable approach for specifying and verifying implementation-level memory reordering optimizations.

1.6 Thesis Outline

The rest of this thesis is organized as follows. Chapter 2 presents background on the history and current practice of memory consistency models. Chapter 3 introduces the MOST specification format and the ArMOR framework for systematically manipulating MOSTs. Chapter 4 then presents an in-depth case study of the use of MOSTs to perform flexible and practical cross-ISA dynamic binary translation. Chapter 5 introduces PipeCheck, a methodology and automated tool for defining and analyzing microarchitecture-level consistency models, while focusing on the behavior of the pipeline itself. Chapter 6 then develops a microarchitecture-level model for verifying the consistency implications of cache coherence protocols. Finally, Chapter 7 presents ongoing and future research directions and then concludes this thesis.

Chapter 2

Background and Related Work

This chapter presents an overview of past and ongoing research in the field of memory consistency models. This overview serves as the foundation for the rest of this thesis. Section 2.1 presents high-level background on memory consistency models in general, and Section 2.2 describes the approaches most often used to define weak memory models. Section 2.3 explains how empirical methods are used to complement formal analyses in practical usage scenarios. Section 2.4 discusses some challenges faced when trying to compare or map between different memory models. Section 2.5 describes some simplifying assumptions that we make in this thesis, in keeping with other related work on memory consistency models. Section 2.6 then concludes.

2.1 Basic Overview

A memory consistency model is defined by Adve and Gharachorloo as follows [AG96]:

[A] memory consistency model [. . .] provides a formal specification of how the memory system will appear to the programmer, eliminating the gap between the behavior expected by the programmer and the actual behavior supported by a system. Effectively, the consistency model places restrictions on the values that can be returned by a read in a shared-memory program execution.

The most intuitive way to define a memory model is to state that each load is required to return the value of the latest store to the same address. This works in single-threaded code, because "latest" can be defined as the most recent such store in program order: the order in which the instructions were originally laid out in the code being executed. In multithreaded programs, however, accesses from other cores may be interleaved with the accesses of the core in question. This means that there will in general be many values that any given load might plausibly return. The goal of a memory model is therefore to specify which values are legal to return and which are not.

A naive approach to defining the multithreaded sense of "latest" is to choose the store (originating from any core in the system) that has happened most recently in an absolute physical sense. This is sometimes known as strict consistency. However, global synchronization or coordination among a distributed set of cores is difficult or even impossible to implement in practice, as any inter-core communication or synchronization requires passing of messages which themselves take non-trivial (and often unpredictable) amounts of time to propagate. Strict consistency is therefore considered infeasible to implement in practice. In practical models, "latest" must therefore be defined in such a way that it does not refer to any notion of a global physical timeline.

2.1.1 From Sequential Consistency to Weak Ordering

The first memory consistency model, sequential consistency (SC), was defined in 1979 by Lamport [Lam79]. SC was originally defined in terms of two properties:

1. The result of any execution is the same as if the operations of all the threads were executed in some sequential order.

The key benefit of SC is that it closely matches programmers’ intuitions about how multithreaded programs should behave. Under SC, for each load, the set of values that can be returned consists of exactly one element: the most recent store to the same address, according to the interleaving implied by rule 1. In spite of its intuitiveness, Lamport noted in his paper that enforcing sequential consistency “may not be worth the price of slowing down the processors” [Lam79]. In fact, nearly all modern systems intentionally sacrifice the elegance of SC to deliver higher performance. Memory accesses may be reordered from how they originally appear in the code into an order that improves latency and/or throughput, thereby violating rule 2 of SC. Furthermore, accesses may be buffered in ways that make them visible to certain threads before they become visible to other threads. This causes different threads to potentially observe the same set of events occurring in different orders, thereby violating rule 1 of SC. Weak memory models legalize the reordering of memory accesses, buffering of memory acceses, and lack of global consensus. Weak models have been defined in various ways over the years. Dubois et al. first defined them as follows [DSB86]:

In a multiprocessor system, storage accesses are weakly ordered if 1) ac- cesses to global synchronizing variables are strongly ordered and if 2) no access to a synchronizing variable is issued in a processor before all previ- ous global data accesses have been performed and if 3) no access to global data is issued by a processor before a previous access to a synchronizing variable has been performed.

Here, synchronizing variables are variables which are explicitly declared as such by the programmer. Strong ordering is a concept that has been found to be too weak;

17 most modern definitions replace “strongly ordered” in the above definition with “se- quentially consistent”. Finally, “perform” is defined by the authors as follows:

A LOAD by processor I is considered performed with respect to processor K at a point in time when the issuing of a STORE to the same address by processor K cannot affect the value returned to processor I. A STORE by processor I is considered performed with respect to processor K, at a point in time when an issued LOAD to the same address by processor K returns the value defined by the STORE. An access by processor I is performed [(or globally performed)] when it is performed with respect to all processors.

Some memory models use the same definition of “perform” to this day [IBM13]. However, it has proven difficult to formalize, as the loads and stores to which the definition refers are hypothetical: they may not (and in general do not) actually exist, thereby making it hard for formal analyses even to refer to them. Chapter 5 addresses this problem and presents a microarchitecturally-inspired solution. In response to the above shortcomings in the Dubois et al. definition, Adve and Hill proposed an alternative definition of weak ordering that by design focused on outcomes rather than mechanisms [AH90]:

Hardware is weakly ordered with respect to a synchronization model if and only if it appears sequentially consistent to all software that obey[s] the synchronization model.

To go along with this definition, they also propose the now widely-used data-race-free (DRF) family of synchronization models. DRF rules state that all conflicting accesses (i.e., accesses from different cores to the same address and for which at least one is not a read) must be ordered by a happens-before relationship induced by explicit synchronization operations on the cores in question.

In modern informal usage, any memory model which violates either of the two rules of SC may be considered "weak" or "relaxed". The Adve and Hill definition of weak ordering focuses on the ability of a processor to restore sequential consistency, and most modern architectures do satisfy this requirement. However, modern usage frequently diverges from the Adve and Hill definition in two key ways. First, their definition technically requires synchronization operations to access memory, while modern hardware more commonly performs synchronization using fences or other mechanisms which do not themselves access memory. Second, and more fundamentally, a large body of modern research in memory models focuses explicitly on behaviors which are not sequentially consistent, even though these behaviors are explicitly forbidden under the Adve and Hill definition [Adv93]. Essentially all modern hardware uses a model weaker than SC, and many performance-critical data structures are carefully and explicitly optimized to take advantage of the performance benefits of non-SC behavior [BMO+12, LGCP13, LPCZN13, MRP+14]. This thesis primarily falls within this context of properly specifying and verifying behaviors which are desirable even though they are not sequentially consistent.

Even from the beginning, Lamport cautioned that when designing synchronization protocols "at the lowest level of the machine code" (i.e., on non-SC machines), "verifying their correctness becomes a monumental task" [Lam79]. Today, weak memory models are often considered highly counterintuitive and difficult to work with, and software models often make it a goal to hide such behavior from the user when it does exist [BA08, MPA05]. Nevertheless, since non-SC behavior is in widespread use at both the hardware and the software levels, a proper understanding of weaker-than-SC memory models is crucial to the correctness of modern systems.

Layer                         Notable Modern Examples
Software                      C++ [BA08], Java [MPA05]
Intermediate Representation   HSAIL [HSA13], LLVM IR [LA04], NVIDIA PTX Virtual ISA [NVI13b]
Architecture                  SC [Lam79], x86-TSO [OSS09a], IBM Power [AMT14, MHMS+12, IBM13, SSA+11]
Microarchitecture             (this thesis)

Figure 2.1: A stack of memory models. Each builds on top of the layers below it.

2.1.2 A Stack of Memory Consistency Models

Just as in many other computing disciplines, memory models are generally built in multiple layers. Hardware-level memory models provide a layer of abstraction across different microarchitectural implementations. Software-level memory models then provide a layer of portability across different hardware models. Other layers (e.g., compiler intermediate representations) may be used as well. Figure 2.1 provides some examples relevant to systems in use today. Each layer must be implemented in terms of the layer below it in the stack. Note that much of this complexity is often hidden inside compilers, libraries, or other similar tools.

Prior to this thesis, most research into formally specifying and analyzing memory models took place at the architecture level and above. This left a large gap between the high-level architecture specification and the implementation level, and verification within this gap could not be easily performed. As such, this thesis focuses on specifying and analyzing architecture- and microarchitecture-level memory models in an effort to bridge this gap.

2.1.3 Coherence vs. Consistency

Cache coherence is a property that is distinct from yet closely related to and often confused with memory consistency. In this section, we summarize the competing definitions of coherence used by various authors, identify how they relate to consistency, and then state the terminology we will use in the rest of this thesis.

Coherence is defined in many varying ways by different authors; no single definition can be considered universal. One common definition is the property that the accesses to each individual address are sequentially consistent [CLS03]. Many authors use the narrower property that there exists a globally-agreed-upon total ordering on the visibility of stores (but not loads) performed to each address [Alg12, SSA+11].

This particular ordering is called write serialization (ws). Cache coherence protocols often define coherence in terms of a pair of properties: the Single Writer/Multiple Reader (SWMR) property says that there may be zero or more read-only copies or exactly one read-write copy (but not both), and the Get Latest Value (GLV) property states that reads to each address return the value of the latest write to that address [SHW11]. Other authors use still other subtly distinct definitions [HP11, MHMS+12, MHW03, SSA+11]. To avoid confusion arising from competing definitions of coherence, the term "coherence" is not used in this thesis without any other context. Instead, preference is given to more precisely-defined properties such as write serialization, SWMR, and so on.

Perhaps surprisingly, the role of coherence in memory consistency model specifications is still partially unsettled. Nearly all consistency models for multicore CPUs and heterogeneous SoCs (e.g., those with CPUs and GPUs) explicitly require coherence [AMT14, HSA13, MHMS+12, NVI13b, OSS09a, SSA+11]. In software models, however, the role of coherence is less clear. C++11 did not originally address coherence explicitly [BA08]. However, this omission was later found (during a subsequent academic formalization process) to make the specification unsound [Bat04, BOS+11]. C++ now requires coherence in the form of four axioms relating the "happens-before" ordering to the "modification order" [ISO11a]. The current Java memory model does not mention coherence.

At the same time, it has not yet been formally analyzed to the same degree as the C++11 model has, and so it is possible that future revisions may need to include it as well.

Many architects consider coherence and consistency to be entirely decoupled concepts [Mar05]. According to this point of view, the job of the cache coherence protocol is simply to enforce orderings between accesses to the same address (i.e., coherence, for some definition of coherence) by correctly responding to requests and/or by stalling requests when necessary. The job of the consistency model is then to enforce orderings between accesses to different addresses. The key benefit of decoupling coherence and consistency is that verification of each part independently becomes dramatically easier. This is reflected in the literature: there are numerous papers on verifying coherence protocols [McM01, SSH+13, SKA13, ZLS10, ZBES14] or on verifying consistency while assuming a working coherence protocol [Alg12, AMT14, OSS09a, SSA+11], but there are far fewer on verifying coherence and consistency together [GSSVD00, TDF+02]. However, many consistency model implementations are tightly interwoven with the coherence protocol implementation to enable performance optimizations such as speculative load reordering [GGH91]. At this level, coherence and consistency inherently cannot always be decoupled, and as Chapter 6 shows, the details of individual coherence protocols can play an important role in enforcing consistency.

The question of how coherence relates to consistency comes up repeatedly in this thesis. Chapter 3 introduces a notation for precisely specifying the orderings that are or are not included in "coherence" for a variety of architectures. Chapter 5 uses an idealized model of memory which enforces only write serialization, but Chapter 6 dives deeper and carefully explores how consistency model implementations depend on write serialization, SWMR, and many other varying properties which may or may not be enforced by any given coherence protocol.
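As a small illustration of why write serialization is a convenient property to reason about (an illustrative brute-force check, not a tool from this thesis), the following sketch asks whether the orders in which different cores observed the stores to a single address can all be explained by one total order.

```python
# Illustrative check: do per-core observations of the stores to one address
# admit a single agreed-upon total order (write serialization)?
from itertools import permutations

def write_serializable(observations):
    """observations: one sequence of store IDs per core, in observed order.
    Returns True if some total order of all stores is consistent with each."""
    stores = {s for obs in observations for s in obs}
    def respects(total, obs):
        pos = {s: i for i, s in enumerate(total)}
        return all(pos[a] < pos[b] for a, b in zip(obs, obs[1:]))
    return any(all(respects(total, obs) for obs in observations)
               for total in permutations(stores))

print(write_serializable([["w1", "w2"], ["w1", "w2"]]))   # True
print(write_serializable([["w1", "w2"], ["w2", "w1"]]))   # False: cores disagree
```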

(a) Architecture-independent version:
    Thread 0: st [x] ← 1;  st [y] ← 1
    Thread 1: ld [y] → r1;  ld [x] → r2
    Proposed Outcome: 1:r1=1, 1:r2=0

(b) x86-specific version:
    Thread 0: MOV [EAX], $1;  MOV [EBX], $1
    Thread 1: MOV ECX, [EBX];  MOV EDX, [EAX]
    Proposed Outcome: 1:ECX=1, 1:EDX=0

Figure 2.2: Litmus test mp.

2.1.4 Litmus Tests

Litmus tests are very small programs used to test and/or reason about the behavior of a memory consistency model. Figure 2.2 gives one example. A typical litmus test contains a small number of threads (generally 2-4) each executing a small number of instructions (generally 1-6). The instructions in a litmus test generally consist of explicit loads, stores, fences, or atomic read-modify-write operations. Non-memory instructions are generally omitted, except where relevant to, e.g., establish the presence of an address, control, or data dependency. The initial state of memory is, by convention, that all addresses start holding the value zero unless otherwise explicitly stated. Litmus tests also specify a particular outcome of interest. This outcome specifies the values returned by loads and/or the final value stored at a particular address. If the test is associated with a particular memory model, the test may also specify whether the proposed outcome is permitted or forbidden by the rules of that model. Such an annotation would serve as a reference during testing.

The instructions in a litmus test are often presented in a way that makes them independent of any particular architecture. In particular, litmus tests often abstract away details of instruction sets, use notation loosely, and so on. This allows the litmus test to focus on the memory model itself, as opposed to the irrelevant details of any one instruction set. It also in many cases allows for one test to be analyzed across a variety of memory models.
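For intuition, the following brute-force sketch (illustrative only; real litmus test tools use far more efficient techniques) encodes the mp test of Figure 2.2 as data and enumerates every SC interleaving to decide whether the proposed outcome is permitted under sequential consistency.

```python
# Brute-force SC checker for tiny litmus tests (illustrative sketch only).
# Each operation is ("st", address, value) or ("ld", address, register).
from itertools import permutations

def sc_outcomes(threads):
    """Enumerate every SC interleaving and collect the register results."""
    results = set()
    slots = [tid for tid, ops in enumerate(threads) for _ in ops]
    for order in set(permutations(slots)):            # which thread issues next
        mem, regs, idx = {}, {}, [0] * len(threads)
        for tid in order:
            op = threads[tid][idx[tid]]
            idx[tid] += 1
            if op[0] == "st":
                mem[op[1]] = op[2]
            else:                                      # load; memory starts at 0
                regs[(tid, op[2])] = mem.get(op[1], 0)
        results.add(frozenset(regs.items()))
    return results

# Litmus test mp (Figure 2.2): may Thread 1 observe y == 1 but x == 0?
mp = [[("st", "x", 1), ("st", "y", 1)],
      [("ld", "y", "r1"), ("ld", "x", "r2")]]
proposed = {(1, "r1"): 1, (1, "r2"): 0}
permitted = any(dict(outcome) == proposed for outcome in sc_outcomes(mp))
print(permitted)   # False: SC forbids this outcome of mp
```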

Litmus tests are used in various ways. First of all, litmus tests are used as a means of allowing humans and computers to reason about a particular behavior of interest. Second, they can provide a concrete complement to definitions which are otherwise incomplete and/or highly abstract. I will use litmus tests in both ways throughout this chapter and the rest of this thesis.

2.2 Defining Weak Memory Models

Driven by the need for higher-performance alternatives to SC, much modern research and development into memory consistency focuses on the specification and implementation of weak memory models. Unfortunately, as Lamport so presciently predicted, reasoning about weak memory models (i.e., determining which values may be legally returned by memory loads) is made significantly more difficult by the reordering, buffering, and lack of global consensus described earlier. Here, I provide some background on why these weakly-ordered implementations do not lend themselves to simple descriptions or specifications. Chapter 3 then introduces ways in which memory models can be described while taking these important subtleties into account, and Chapters 5 and 6 then describe how to rigorously analyze this buffering and reordering at the microarchitecture level.

2.2.1 Motivating Example

The most prominent property of weak memory models is that, in contrast to rule 2 of sequential consistency, they allow certain program order relationships to be relaxed. The specification of which program order relationships must be globally maintained is called preserved program order (PPO). In other words, if (a) comes before (b) in program order, and a → b must be preserved, then (a) must happen before (b) from

          Loads        Stores
Loads     X            X
Stores    — (mfence)   X

Legend:  X         Ordering preserved
         —         Ordering not preserved
         (mfence)  Ordering preserved by mfence

Figure 2.3: One common (yet incomplete) definition of the total store ordering (TSO) memory model

Thread 0             Thread 1
(i1) St [x] ← 1      (i4) St [y] ← 1
(i2) Ld [x] → r1     (i5) Ld [y] → r3
(i3) Ld [y] → r2     (i6) Ld [x] → r4
Proposed Outcome: 0:r1=1, 0:r2=0, 1:r3=1, 1:r4=0

(The figure also depicts the cycle of happens-before edges among (i1)-(i6) discussed below.)

Figure 2.4: Litmus test iwp2.4 demonstrates that proper memory models need to be able to both capture and reason about very subtle behaviors

the point of view of every observer in the system. Otherwise, if a → b need not be preserved, some cores may see the effect of (b) before they see the effect of (a).

Preserved program order is often loosely specified using tables such as the one in Figure 2.3. Every cell in such a table indicates whether the corresponding types of accesses in program order must remain ordered. In this example, Total Store Ordering (TSO) allows for store → load reordering, but it forbids other types of reorderings [OSS09a, SPA94b]. The inspiration for TSO is that it allows for the insertion of store buffers between the core and the memory. These store buffers allow loads to bypass earlier stores, thereby improving overall latency. Preserved program order exists across a wide spectrum. Under sequential consistency, all program order relationships must be preserved. Under models such as those used by ARM and Power, few orderings must be preserved, but describing the orderings can be very difficult [AMT14].
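A traditional reordering table like Figure 2.3 translates naturally into a simple lookup structure. The sketch below is an illustrative encoding of the TSO table shown above; it deliberately carries only the information the table carries, which (as the following example shows) is not enough to fully characterize the model.

    # Illustrative sketch: the TSO reordering table of Figure 2.3 as a lookup,
    # keyed by (earlier access type, later access type) in program order.
    # True means the ordering is preserved; the store->load entry is the one
    # relaxation TSO allows (restorable with an mfence).
    TSO_PRESERVED = {
        ("ld", "ld"): True,
        ("ld", "st"): True,
        ("st", "ld"): False,   # store buffering permits this reordering
        ("st", "st"): True,
    }

    def must_stay_ordered(first: str, second: str) -> bool:
        """Return whether TSO preserves the program order of this pair."""
        return TSO_PRESERVED[(first, second)]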

Consider litmus test iwp2.4 in Figure 2.4. This particular test is designed to highlight how store buffering introduces behavior that is non-trivial to characterize

and therefore requires non-trivial memory model specification approaches. The focus on store buffering can be seen from the choices of instructions in the threads of the test: because (i1) and (i2) are a write followed by a read to the same address, it is likely (but not strictly required) that (i2) will forward its data from (i1) while (i1) is still in the store buffer. Similarly, (i5) will likely forward from (i4).

Naive happens-before analysis would proceed as follows. Since (i1) is before (i2) in program order and (i2) reads from the value written by (i1), (i1) must happen before (i2). Likewise, (i4) must happen before (i5). If (i4) were to happen before (i3), then (i3) would be forced to return the value 1, contradicting the proposed outcome. Therefore, (i3) must happen before (i4). For similar reasons, (i6) must happen before (i1).

The iwp2.4 litmus test outcome is not legal under sequential consistency, as there is no single ordering in which 1) program order is respected, and 2) all of the above ordering relationships hold. This is depicted by the cycle shown in Figure 2.4. However, in spite of the cycle, and in spite of what Figure 2.3 might imply, the outcome proposed in Figure 2.4 is permitted under Total Store Ordering (TSO). If (i2) and (i5) store buffer forward from (i1) and (i4), respectively, then the edges from (i1) to (i2) and from (i4) to (i5) are effectively “weaker” than the others (in a formal sense), and the cycle loses its effect. Therefore, this type of weakening must be taken into account by the analysis mechanism for TSO (and likewise for other models).

Since naive specification approaches are insufficient, a number of different approaches have been developed in the academic literature and put into use in practice. The next section describes some of these more advanced techniques, and we return to litmus test iwp2.4 in Section 5.3.3.
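The naive analysis and its TSO-specific weakening can be made concrete with a small graph sketch. The edge set below corresponds to the happens-before relationships derived above for iwp2.4; the helper function and its use are illustrative, not part of any tool described in this thesis.

    from itertools import chain

    # Happens-before edges derived above for iwp2.4 (illustrative encoding).
    edges = {
        ("i1", "i2"), ("i4", "i5"),   # each load reads the value its own thread stored
        ("i2", "i3"), ("i5", "i6"),   # program order within each thread
        ("i3", "i4"), ("i6", "i1"),   # loads that returned 0 precede the stores of 1
    }

    def has_cycle(edge_set):
        """Depth-first search for a cycle in a set of directed edges."""
        succs = {}
        for a, b in edge_set:
            succs.setdefault(a, set()).add(b)
        visited, on_stack = set(), set()
        def visit(node):
            if node in on_stack:
                return True
            if node in visited:
                return False
            visited.add(node)
            on_stack.add(node)
            if any(visit(s) for s in succs.get(node, ())):
                return True
            on_stack.remove(node)
            return False
        return any(visit(n) for n in set(chain.from_iterable(edge_set)))

    print(has_cycle(edges))   # True: the naive analysis finds a cycle
    # Under TSO, store buffer forwarding makes (i1)->(i2) and (i4)->(i5) too
    # weak to participate in the cycle, so the outcome is actually permitted:
    print(has_cycle(edges - {("i1", "i2"), ("i4", "i5")}))   # False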

All Models
  Informal Models [AMD13, ARM13, IBM13, Int13a]
  Formal Models
    Operational Models [OSS09a, SSA+11]
    Axiomatic Models
      Single-Event Axiomatic Models [AMT14]
      Multi-Event Axiomatic Models [MHMS+12], PipeCheck (Chs. 5-6)1

Figure 2.5: A taxonomy of common approaches used to define hardware memory consistency models

2.2.2 Operational vs. Axiomatic Models

Given that overly-simplistic definition approaches are insufficient to capture the behavior of real-world processors, a variety of more sophisticated approaches are used in practice. Figure 2.5 presents a taxonomy of approaches that are commonly used to define hardware memory consistency models. These approaches apply varying degrees of formality. Many models, from industry in particular, are described informally using natural language descriptions, specific examples, and/or explicitly enumerated rules. More recently, there has been a surge of interest in defining memory models using more formal mathematical notation which is amenable to being analyzed using rigorous theoretical techniques. Informal models often provide more intuitive descriptions, but they also by their nature tend to be imprecise and/or incomplete. Formal models are very precise, but they are often considered obscure and difficult to use by non-experts.

1MOSTs and ArMOR do not form a memory model per se, as they cannot directly determine whether a proposed litmus test outcome is legal. However, they can be used to define components within models of the above types. As I tend to think in terms of multi-event axiomatic models, I tend to visualize MOSTs as being used in this way as well.

Formal hardware memory models can be classified into two main categories2. Operational models describe legal executions in terms of a series of steps taken by some hypothetical abstract machine. Each step represents a transition from one well-defined state to another according to some well-defined transition function. Operational models are often relatively intuitive, but they can suffer from a state space explosion that makes them less useful within practical tools [AMT14].

Axiomatic memory models define correct executions in terms of logical predicates that must be satisfied. One prominent type of predicate is the notion that a happens-before graph, or some subset thereof, be acyclic. A happens-before graph is a graph in which nodes represent instructions and/or events in some abstract machine and in which edges represent some formal notion that the source node happens prior to the destination node. A cycle of happens-before edges (i.e., an indication that an event happens before itself) is (in most cases) considered to be a proof by contradiction that the proposed scenario is impossible.

Axiomatic memory models based on happens-before graphs can be further subdivided into two categories [AMT14]. In single-event axiomatic models, each instruction in a program corresponds to one node in a happens-before graph. As Section 2.2.1 demonstrated, these models cannot naively check for cycles. They instead generally define legal executions as those for which certain carefully-defined subsets of the graph are acyclic and/or irreflexive. In multi-event axiomatic models, each instruction corresponds to multiple nodes in a happens-before graph. Multi-event axiomatic graphs are larger, but the analysis is often simpler than for single-event axiomatic models. Most often, any cycle in a multi-event axiomatic happens-before graph is enough to rule out an execution.

2Denotational models, which associate each state or program with a formal mathematical counterpart, form a third category. They are commonly used to define programming language semantics, but they are seldom used to define hardware memory models.
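To make the operational style concrete, the sketch below gives a toy store-buffer machine in the spirit of the description above: explicit machine state plus small-step transitions whose interleavings generate the permitted executions. It loosely resembles TSO-like operational models, but it is a simplified illustration, not any published formalization.

    # A toy operational-style model: explicit state plus small-step transitions.
    class ToyTSOMachine:
        def __init__(self, num_threads):
            self.memory = {}                                  # address -> value
            self.buffers = [[] for _ in range(num_threads)]   # per-thread FIFO store buffers

        def store(self, tid, addr, value):
            # Stores enter the local store buffer rather than memory directly.
            self.buffers[tid].append((addr, value))

        def load(self, tid, addr):
            # Loads forward from the youngest buffered store to the same address,
            # falling back to memory (default 0) otherwise.
            for a, v in reversed(self.buffers[tid]):
                if a == addr:
                    return v
            return self.memory.get(addr, 0)

        def drain_one(self, tid):
            # A separate transition makes the oldest buffered store globally visible;
            # interleaving these transitions yields the TSO-permitted outcomes.
            if self.buffers[tid]:
                addr, value = self.buffers[tid].pop(0)
                self.memory[addr] = value

    m = ToyTSOMachine(2)
    m.store(0, "x", 1)      # thread 0: st [x] <- 1 (buffered, not yet visible)
    print(m.load(1, "x"))   # thread 1 still reads 0 until drain_one(0) runs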

Neither axiomatic nor operational models have proven to be objectively superior, and both approaches are still in use today. However, the lack of a single standard approach often makes it difficult in general to compare any two models written using two different frameworks and languages. This in turn can make it difficult to determine whether models are equivalent (e.g., the models of Figure 2.6 below), whether one model is stronger or weaker than another (Section 3.2), or how to map operations from one model to another [BMO+12, LGCP13, LPCZN13]. Even models which follow the same approach are often incompatible due to differences in the languages and formalisms used to specify them. Alglave et al. recently proposed the herd framework, which defines a general-purpose language, cat, for defining single-event axiomatic memory models; however, this only captures the subset of models which follow that approach. In response, Chapter 5 will define a general-purpose language for defining multi-event axiomatic memory models at the architecture and the microarchitecture levels.

In spite of the different approaches and languages used to define different memory models, researchers have had some success in manually proving equivalence between models [AMT14, MHMS+12, OSS09a]. This equivalence, once established, allows users and researchers to choose whichever form they prefer. Supposed proofs of equivalence have also at times been shown to be incorrect, and models thought to be equivalent have turned out to actually differ in subtle ways [AMT14, LPCZN13]. Nevertheless, the ability to prove equivalence between approaches remains a powerful mechanism for explaining and analyzing a wide variety of use cases for a wide variety of audiences.

2.2.3 Defining Models Which Reflect Actual Hardware

Ideally, a memory model would exactly match the behavior of real hardware implementations of that model: any behavior observable on hardware would be considered legal by the model, and any behavior legal under the model would be observable on hardware. In practice, hardware implementations may at times never

produce behaviors which are permitted by the specification, and hardware may also permit behaviors which are forbidden by the specification. The latter is clearly erroneous, but the causes for this situation vary: there may be bug(s) in the model, in the hardware, or in both. Some hardware designs have been shown to incorrectly implement certain requirements of their architecture-level models [ABD+15, ARM11, Int15]. Researchers and designers also have proposed models which incorrectly forbade behavior that turned out to be observable on real hardware [AMT14]. Many memory model designs and implementations therefore end up following a pattern of repeated iteration and refinement, fixing discovered bugs at each step, with the goal of converging towards a sound result.

The reasons for implementing stricter-than-necessary ordering enforcement can be subtle. Often, it may even be intentional. For example, a company may want to implement a simpler core design which is nevertheless fully software-compatible with more complex, higher-performance designs. Another reason is that a memory model may be specified in a way that allows for future optimizations that are not yet implemented in any hardware, as with load buffering on the Power architecture [SSA+11]. In cases like these, outcomes forbidden by the model must remain unobservable in hardware, but certain outcomes that are permitted by the model may never be observed on actual hardware.

As an example of the kinds of discrepancies that can arise, Figure 2.6 shows two litmus tests being analyzed using various formal and informal models of the Power architecture. The first test, mp+lwsync+addr-po-detour, is permitted by two formalizations but forbidden by a third, and it is unclear whether the industry documentation forbids or permits it. The second test, mp+lwsync+addr-bigdetour-addr, is similar, except that it is forbidden by two of the three models. These two tests clearly demonstrate that the three formal models disagree, and that there is not yet a

Thread 0         Thread 1            Thread 2
stw [x], 2       lwz r1, [y]         stw [x], 1
lwsync           xor r5, r1, r1
stw [y], 1       lwz r2, [z+r5]
                 lwz r3, [x]
                 lwz r4, [x]
Proposed Outcome: [x]=1, 1:r1=1, 1:r2=0, 1:r3=0, 1:r4=1

(a) Litmus test mp+lwsync+addr-po-detour. The naming convention is as follows: “mp” refers to the high-level pattern being tested (see Figure 2.2). lwsync refers to the synchronization used in thread 0. addr-po-detour refers to the synchronization used in thread 1: an address dependency, followed by a program order relationship, followed by a detour relationship. The latter appears to be a typo; detour is described as (po ∩ (coe; rfe)), while (po ∩ (fre; rfe)), which is used in this test, is labeled rdw, for “read different writes”. To avoid confusion, I maintain the original name.

Thread 0         Thread 1            Thread 2
stw [x], 1       lwz r1, [y]         stw [z], 2
lwsync           xor r5, r1, r1      lwsync
stw [y], 1       lwz r2, [z+r5]      stw [w], 1
                 lwz r3, [w]
                 xor r6, r3, r3
                 lwz r4, [x+r6]
Proposed Outcome: 1:r1=1, 1:r2=0, 1:r3=1, 1:r4=0

(b) Litmus test mp+lwsync+addr-bigdetour-addr, where bigdetour refers to detour/rdw with extra synchronization added in thread 2.

Model       Style                    mp+lwsync+addr-po-detour   mp+lwsync+addr-bigdetour-addr
[IBM13]     Informal                 (unclear)                  (unclear)
[SSA+11]    Operational              Forbidden                  Forbidden
[MHMS+12]   Multi-Event Axiomatic    Permitted                  Forbidden
[AMT14]     Single-Event Axiomatic   Permitted                  Permitted
—           Real Hardware            Observed                   Not Observed

(c) Disagreement of various models with hardware and with each other

Figure 2.6: The Power memory model still has some unresolved corner cases; there is not yet a consensus on the correct behavior for programs such as the two litmus tests shown here [AMT14].

consensus on the “officially correct” behavior in these corner cases, highlighting that increased formalism does not in itself solve the problem of completeness.

The discussion above motivates two key features in this thesis. First, it motivates the need to be able to rigorously compare an architectural memory model specification and a microarchitectural implementation of that specification. This comparison must also allow for the case in which an outcome is permitted by the architecture but not observable on a given microarchitecture. Second, it motivates the need for specification formats which are flexible and which can easily be adjusted when necessary. Both points are addressed throughout the rest of this thesis.

2.2.4 Other Subtle Ordering Relationships

This section describes various other properties related to orderings between instructions. All four are closely related and interdependent yet subtly different; memory models vary widely in whether (and how) they do or do not enforce these properties. As such, it is important that memory model analysis techniques (including those proposed in this thesis) be able to distinguish between them and reason about them.

Store Atomicity. Atomicity is a heavily overloaded word. Its usage in the context of memory consistency models refers to the manner in which stores perform with respect to different cores in the system. A store is single-copy atomic if it performs with respect to every core in the system (including the issuing core) at a single point in time [Col92]. A store is multi-copy atomic if it performs with respect to every core other than the issuing core at a single point in time. Multi-copy atomic stores may perform with respect to the issuing core early. Non-multi-copy-atomic stores may perform with respect to cores in any order without restriction, although in general they still perform with respect to the issuing core first.

Store atomicity is beneficial because it ensures that transitivity and causality (discussed below) are automatically enforced, meaning that there is no need to deal

with the idiosyncrasies of cumulativity (also discussed below). It is the lack of multi-copy atomicity that allows many of the most counterintuitive behaviors to become visible. On the other hand, store atomicity allows fewer forms of buffering than does non-atomicity, and so performance may be degraded as compared to the optimum. Both choices are in widespread use; TSO is multi-copy atomic, but ARM and Power are not, for example.

Transitivity and Causality. Transitivity is the property that, for some events A, B, and C, if A happens before B and B happens before C, then A happens before C as well. Causality is the property that, for some events A and B, if A causes B to happen (or to happen in a certain way), then A should happen before B. Violations of causality and/or transitivity are generally considered highly counterintuitive to most programmers. Nevertheless, both can occur on some weakly-ordered architectures.

Figure 2.7 demonstrates how both properties can be violated in practice. This example assumes that stores are not multi-copy atomic but that loads perform with respect to all cores in a single step. Because the proposed outcome of Figure 2.7a states that (i2) returns the value 1, (i2) must perform after (i1) has performed with respect to thread 1. Likewise, (i4) must perform after (i3) has performed with respect to thread 2. Due to the dependencies3, (i3) may only perform after (i2) has performed, and (i5) may perform only after (i4) has performed.

By the above reasoning, since (i1) must happen before (i2) and (i2) must happen before (i3), it would appear that due to transitivity, (i1) must happen before (i3). Likewise, by causality, since (i1) created the value that (i3) wrote to memory, it would again appear that (i1) must happen before (i3). Unfortunately, neither intuition is guaranteed to hold true on weakly-ordered architectures such as Power and ARM. Figure 2.7b describes one execution which produces the outcome proposed in

3ARM and Power maintain the dependency ordering even though r1 xor r1 is always zero.

Thread 0            Thread 1               Thread 2
(i1) stw [x], 1     (i2) lwz r1, [x]       (i4) lwz r1, [y]
                    xor r2, r1, r1         xor r2, r1, r1
                    (i3) stw [y+r2], r1    (i5) lwz r3, [x+r2]
Proposed Outcome: 1:r1=1, 2:r1=1, 2:r3=0

(a) Litmus test code

(Events listed in the order in which they occur in time:)
  (i1) perf. wrt. thread 0
  (i1) perf. wrt. thread 1
  (i2) perf. globally
  (i3) perf. wrt. thread 1
  (i3) perf. wrt. thread 2
  (i4) perf. globally
  (i5) perf. globally
  (i3) perf. wrt. thread 0
  (i1) perf. wrt. thread 2

(b) Timeline of one execution producing the proposed outcome

Figure 2.7: Litmus test wrc+addrs. The proposed outcome is permitted on Power, even though it violates causality and transitivity.

Figure 2.7a while satisfying all of the above constraints. The key distinction is that reasoning performed from the point of view of one core or thread need not hold true from the point of view of other threads. In particular, although (i1) must have performed with respect to thread 1 before (i2) performs, it does not have to perform with respect to thread 2 before (i2) performs. This breaks both transitivity and causality.

Figure 2.8 demonstrates how axiomatic models capture the lack of transitivity and causality for architectures such as Power. Figure 2.8a presents Power litmus test blw-w-006. This test highlights the fact that in the Power model, edges due to lwsync fences cannot be transitively composed with edges due to coherence orderings in order to form cycles that rule out an execution. In the single-event axiomatic approach of Figure 2.8b, the lack of transitivity is represented by searching for cycles (or reflexive

Thread 0            Thread 1            Thread 2
(i1) stw [x], 1     (i3) stw [y], 2     (i5) lwz r1, [z]
lwsync              lwsync              xor r2, r1, r1
(i2) stw [y], 1     (i4) stw [z], 1     (i6) lwz r3, [x+r2]
Proposed Outcome: 2:r1=1, 2:r3=0

(a) Power litmus test blw-w-006 [SSA+11]

(Happens-before graphs over nodes (i1)-(i6) for each model; edge diagrams omitted.)

(b) Single-event models [AMT14] check a number of subgraphs for acyclicity or irreflexivity: acyclic(po-loc ∪ rf ∪ co ∪ fr), acyclic(hb), irreflexive(fre; prop; hb*), and acyclic(co ∪ prop). Although the union of the graphs would be cyclic, each individual relevant subgraph is acyclic/irreflexive.

(c) Multi-event models [MHMS+12] generally check a small number of graphs for cyclicity: acyclic(uniproc_thread0), acyclic(cord), and acyclic(evord). The graphs themselves are bigger and more complicated, but the analysis (i.e., cycle checking) is simpler.

Figure 2.8: Comparing single-event and multi-event axiomatic models

Thread 0          Thread 1            Thread 2
(i)  st [x] ← 1   (ii)  ld [x] → r1   (v)  ld [y] → r2
                  (iii) sync          (vi) st [y] ← 3
                  (iv)  st [y] ← 2
Outcome r1=1, r2=2:
  Group A of (iii) = {(i), (ii)}
  Group B of (iii) = {(iv), (v), (vi)}

Figure 2.9: Since Power's sync is A- and B-cumulative, it includes accesses from other threads into its scope. Most [ARM13, IBM13, SSA+11] but not all [AMSS10] formalizations consider (vi) to be in group B.

relations) in particular subsets of the graph rather than within the graph as a whole. In the multi-event axiomatic approach of Figure 2.8c, five separate graphs (three of which are pictured) are simply checked for cycles; no subsets need to be taken.

Cumulativity. Power and ARM hardware enforce transitivity and causality only when cumulative fences are inserted. Cumulativity is a recursively defined property. The base case is that instructions program-order-before a given fence are members of “group A”, and instructions program-order-after a fence are members of “group B”. With A-cumulativity, instructions from any thread which happen before a member of group A are themselves included in group A, recursively. Likewise, with B-cumulativity, instructions from any thread which happen after a load from that thread returns a value written by a store in group B are themselves members of group B, recursively. A cumulative fence is defined to enforce ordering between each member of group A and each member of group B matching the specification of the fence type4.

Figure 2.9 demonstrates the cumulativity of the Power sync fence (iii). In the base case, group A consists of (ii) and group B consists of (iv). Then, since (ii) reads

from (i), (i) happens before (ii), and so since the sync is A-cumulative, (i) is included into group A of the sync instruction. Similarly, (v) reads from (iv), and (vi) happens after (v), so (v) and (vi) are included in group B of the sync by B-cumulativity.

4e.g., the Power lwsync fence is cumulative but does not enforce ordering between stores in group A and loads in group B.

Cumulativity provides a compromise: it preserves much of the performance benefit of allowing transitivity and causality violations by default, while providing the ability to restore both properties where necessary. Unfortunately, because cumulativity is inherently a dynamic relationship, it can be hard to reason about the orderings it implies, and many models of ARM and Power require groups A and B to be calculated using some explicit form of iteration until convergence. Likewise, as “happens before” can be understood in many ways, the use of “happens before” within the definition of cumulativity has been formalized in many competing ways [Alg12, AMT14, ARM13, IBM13, MHMS+12, SSA+11].
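The recursive definition of groups A and B lends itself to exactly the kind of iterate-until-convergence calculation mentioned above. The sketch below is one illustrative way to compute the groups given happens-before and reads-from information; the parameter names and encoding are assumptions of this sketch, not any of the cited formalizations.

    # Illustrative fixpoint computation of a cumulative fence's groups A and B.
    def cumulative_groups(po_before, po_after, hb, rf, po_after_load):
        # po_before / po_after: accesses before/after the fence on its own thread.
        # hb: set of (a, b) pairs meaning "a happens before b" (model-specific).
        # rf: set of (store, load) reads-from pairs.
        # po_after_load: maps a load to the accesses after it on its thread.
        group_a, group_b = set(po_before), set(po_after)
        changed = True
        while changed:
            changed = False
            # A-cumulativity: anything that happens before a group A member joins A.
            for (a, b) in hb:
                if b in group_a and a not in group_a:
                    group_a.add(a); changed = True
            # B-cumulativity: a load (and what follows it) that reads from a
            # group B store joins group B.
            for (store, load) in rf:
                if store in group_b:
                    for x in [load] + list(po_after_load.get(load, [])):
                        if x not in group_b:
                            group_b.add(x); changed = True
        return group_a, group_b

Applied to Figure 2.9 (po_before = {(ii)}, po_after = {(iv)}, hb containing ((i), (ii)), and rf containing ((iv), (v))), this sketch reproduces the groups shown there: A = {(i), (ii)} and B = {(iv), (v), (vi)}.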

2.3 Empirical Analysis of Memory Models

Memory models can be evaluated in a number of ways. First and foremost is correctness: implementations (software, microarchitecture, or otherwise) should always meet whatever correctness specifications are required. Once correctness is established, performance can be measured. Performance is hard to quantify exactly, and it is specific to an implementation rather than a specification. A third important metric is ease of use, which is much harder to quantify, but nevertheless very important in practice.

This thesis takes the point of view that correctness should be established before performance results can be considered meaningful. Very often, microarchitectures may be able to realize large performance gains directly from even subtle violations of correctness. In other words, the cost of maintaining 100% correctness may be large as compared to maintaining even “99% correctness” [BA08, LPCZN13]. If correctness has not yet been established for a new architectural proposal, then it can be unclear whether the performance gains are due to a true contribution of the proposal or simply due to an as-yet-undiscovered bug. A goal of all of the techniques and tools

developed in this thesis is to make a verification-first approach more practical than is often feasible with the tools and resources available today.

2.3.1 Correctness

Formal proofs of correctness for any given implementation can be prone to unexpected bugs or corner cases. The formalism may assume some property which is not in fact true, meaning that any result derived from the faulty axiom itself becomes faulty. Or, the implementation may not actually enforce the rules of the specification correctly. Knuth famously summed up this problem in the following quote: “Beware of bugs in the above code; I have only proved it correct, not tried it.” [Knu97].

A standard approach for empirically verifying the correctness of an implementation is testing: executing well-defined test cases with known expected outcomes, and then seeing if the observed outcome matches the expected outcome. Testing-based techniques are generally much faster than formal verification, and they can be used as a sanity check that no assumptions made during formalization are violated in practice. However, testing techniques generally suffer from a lack of coverage that formal models can often better provide, as many bugs may be triggered only in extremely uncommon corner cases. For this reason, formal verification and empirical testing are generally used to complement each other.

Testing-based techniques have proven valuable for performing empirical correctness checking. Hangal et al. developed TSOTool, a framework for automatically generating tests for the TSO memory model and then for using the test results to track down the hardware origin(s) of any bugs [HVML04]. Many others have developed similar frameworks for other use cases [MHC+06]. Researchers have also proposed architectural structures for dynamically watching for ordering violations [CL02, CLH+09, CMP08, MS05].
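The testing workflow described above can be summarized by a small harness sketch: run a litmus test many times on the implementation under test and compare the observed outcomes against the set the model permits. The run_once() hook, the iteration count, and the reporting below are illustrative placeholders, not the interface of TSOTool or any other named framework.

    # Illustrative litmus-test harness sketch.
    def check_litmus(test, run_once, permitted_outcomes, iterations=1_000_000):
        observed = set()
        for _ in range(iterations):
            observed.add(run_once(test))   # e.g., one execution on real hardware
        illegal = observed - permitted_outcomes
        if illegal:
            print(f"{test}: BUG, observed forbidden outcome(s): {illegal}")
        missing = permitted_outcomes - observed
        if missing:
            # Not necessarily a bug: permitted outcomes may simply never be
            # produced by this implementation (see Section 2.2.3).
            print(f"{test}: permitted but never observed: {missing}")
        return not illegal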

Testing often does identify bugs in hardware, but like everything else it comes with tradeoffs. Early (e.g., pre-silicon) testing may be slower and/or incomplete, while later (e.g., post-silicon) testing may be more comprehensive due to the ability to test real hardware. However, post-silicon bugs are dramatically more expensive to correct due to the time and money cost of fabricating a new corrected version. Both pre- and post-silicon approaches have become standard as the burden of verification has grown increasingly difficult over time.

2.3.2 Performance

The key motivation for defining and using weak memory models is the performance benefit they enable. However, it is a slight misnomer to talk about the performance of a given memory model itself, as a model is just a specification. It is more correct to talk about the ability of a memory model to allow (or not allow) for high-performance implementation(s). Furthermore, as with any studies of performance, the choice of benchmarks can make a big difference in the final scores for each case.

The extent to which the choice of memory model affects the ability to build high-performance implementations is unclear. Researchers have proposed various hardware optimizations using techniques such as speculation in order to build microarchitectures for models as strict as SC yet which apparently have low to negligible performance overhead as compared to weaker models [BMW09, CTMT07, DMT13, HS13, SNM+12]. However, industry does not appear to have reached the same conclusions. In any case, the power, area, and/or verification costs of building such proposals in real hardware may limit their applicability.

Many features of weak hardware and software memory models exist so that “expert” programmers can take advantage of them when writing key libraries or data structures [BA08]. Often, these features are added with particular benchmarks in mind. One prominent example is the C++11 memory_order_consume parameter,

which behaves similarly to an acquire operation except that it only enforces orderings for dependent instructions. This feature was added to C++ in large part so that a Read-Copy-Update (RCU) data structure could be efficiently and portably implemented within the Linux kernel [MRP+14].

An additional concern is the ability to write high-performance software and compilers that can take full advantage of the available hardware. Lê et al. built, proved correct, and measured the performance of a work-stealing queue and a FIFO data structure in C11, ARM, and Power [LGCP13, LPCZN13]. They showed that performance can vary dramatically across memory models, and that sometimes, the optimal design for one memory model is sub-optimal on others5. This sequence of papers demonstrates the need for better and more reliable toolchains when putting memory model analysis techniques into practice.

2.4 Comparing and Mapping Between Models

A key practical challenge for consistency model specification and analysis techniques is to enable synchronization primitives to be correctly mapped from one model to another. Although programmers generally work at the top of the stack of models shown in Figure 2.1, the code must be mapped onto the lower levels of the stack, generally through one or more intermediate layers (e.g., compiler intermediate representations), before it can be executed by hardware. Unfortunately, as Section 2.2.2 showed, the frameworks used to define different memory models may in general differ widely in the approach used (operational vs. axiomatic vs. informal) and the languages used to define them. As a result, it can become very difficult to reason about how to map consistency model features between models.

5Highlighting the difficulty of endeavors such as these, the analysis was later shown to itself be slightly buggy [ND13]. Their conclusion remains valid nevertheless.

           memory_order_relaxed   memory_order_release   memory_order_seq_cst
x86        mov                    mov                     xchg
Power      st                     lwsync; st              sync; st
ARMv7      st                     dmb; st                 dmb; st
ARMv8      str                    stl                     stl
Itanium    st.rel                 st.rel                  st.rel; mf

Table 2.1: Mapping C11 low-level atomics with different ordering specifications onto hardware. The software constructs map onto different architectures in different ways [S+].

As just one example, consider the process of compiling C11 atomic stores onto different architectures. C11 is a data-race-free language, meaning that all cross-thread communication must take place via explicit synchronization operations [AH90, ISO11b]. In C11 terminology, two types of operations are considered synchronization operations: operations on mutexes, and loads and stores which are annotated as

being “atomic”6. Atomic operations in turn come with a memory_order parameter indicating the ordering semantics associated with that instruction. Various types are available, including sequentially consistent semantics, acquire/release semantics [GLL+90], and relaxed semantics, where the latter implies indivisibility but

no inter-instruction ordering requirements. Use of memory_order parameters other than memory_order_seq_cst is described by the designers as being intended for experts only [BA08].

Table 2.1 shows how three flavors of C11 atomic store orderings map onto

different architectures in very different ways [S+]. The memory_order_release and memory_order_seq_cst options map onto some combination of normal memory access opcodes (e.g., mov), opcodes which combine memory access and synchronization (stl, st.rel), and normal opcodes plus explicit fences (lwsync; st, dmb; st). No one type of mapping is universal. The corresponding table for loads is even more

6“Atomic” is a heavily overloaded word; it is used here 1) to indicate an indivisible sequence of operations, and 2) to annotate memory accesses as being synchronization operations. It does not imply store atomicity as defined in Section 2.2.4.

diverse, enforcing orderings through the use of features such as explicit false dependencies. The process of deriving these mappings and proving them correct is generally shielded from application programmers, but they represent very real complexity for library and compiler writers.

Unfortunately, the current most reliable method for determining such mappings requires the construction of complicated formal models and dense mathematical correctness proofs which may take years to complete [BMO+12, Gha95, PVJ15]. Furthermore, the process of deriving mappings for new primitives can uncover unexpected and unintended incompatibilities between models. For example, Lê et al. discovered that a subtle incompatibility between the C11 and the Power memory models led to “unrecoverable overheads” in the design of work-stealing queues for Power [LPCZN13]. This problem will only continue to get worse as new languages and new paradigms introduce even more new features which have not yet been thoroughly formally analyzed [OCY+15, WBDB15].

Lastly, the goal of mapping primitives from one model to another carries an implicit assumption that doing so is even possible in the first place. Most models can be made to enforce even sequential consistency given a sufficient amount of synchronization, but there are exceptions. For example, on Itanium, although acquire loads and release stores can be made to behave in a sequentially consistent manner, unordered loads and stores cannot be made sequentially consistent through any amount of synchronization [Int10]. Furthermore, it is not clear whether many GPU memory models can be made to behave in a sequentially consistent manner through any choice of operations or any amount of synchronization [ABD+15, NVI13b]. Omissions such as these make it effectively impossible to map certain types of synchronization onto certain types of hardware.
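The store mappings of Table 2.1 can be expressed as a simple lookup table of the kind a compiler backend or translation tool might consult. The encoding below restates the table above (including the Itanium attribution of its final row); the dictionary layout and helper function are illustrative only, not any compiler's actual interface.

    # C11 atomic-store lowering table, restated from Table 2.1 (illustrative).
    C11_STORE_LOWERING = {
        #  arch        relaxed       release          seq_cst
        "x86":       ("mov",        "mov",           "xchg"),
        "Power":     ("st",         "lwsync; st",    "sync; st"),
        "ARMv7":     ("st",         "dmb; st",       "dmb; st"),
        "ARMv8":     ("str",        "stl",           "stl"),
        "Itanium":   ("st.rel",     "st.rel",        "st.rel; mf"),
    }

    def lower_atomic_store(arch: str, order: str) -> str:
        """Return the instruction sequence for an atomic store with this ordering."""
        idx = {"memory_order_relaxed": 0,
               "memory_order_release": 1,
               "memory_order_seq_cst": 2}[order]
        return C11_STORE_LOWERING[arch][idx]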

2.5 Simplifying Abstractions Made in This Thesis

Many memory model specifications are somewhat abstracted from the details of any actual instruction set. Issues such as false sharing of cache lines, partially-overlapping accesses, different cacheability modes, ordering of intra-SIMD instruction accesses, etc., are often left unspecified by the formalism. Sometimes, memory models may abstract away hardware details because even the abstracted model may be difficult enough to build and/or analyze. In this sense, understanding the simplified model becomes a prerequisite for building an extended version which incorporates the finer details. Other times, the model may be intentionally abstracted so that the same model can be applied to different instruction sets. For example, the total store ordering (TSO) memory model was originally defined for SPARC and hence used

SWAP (atomic swap) and LDSTUB (atomic test and set) as its synchronization primitives [SPA94a]. TSO has since been adapted to x86, which uses mfence and the LOCK prefix as its primary synchronization primitives. The spirit of both model instances is the same, but due to the difference in synchronization primitives, the implementations differ slightly.

Unless otherwise stated, in keeping with the standard approach taken in memory model research, this thesis makes the following basic assumptions about memory access behavior [AMT14, LPM14, MHMS+12, OSS09a, SSA+11]. First, it assumes that there is some set of basic memory access types: loads, stores, and synchronization accesses of various kinds (e.g., atomic read-modify-write). It does not distinguish between different types of stores, stores to different memory types (e.g., write-back vs. write-through vs. non-cacheable), etc. I also assume the existence of synchronization primitives in the form of fences, dependencies, the x86 LOCK prefix, and so on. However, the details of, e.g., the instruction sequences giving rise to particular types of dependencies are not generally elaborated upon in an effort to keep the focus on the memory model itself. Second, it assumes that values are identically-sized (at the granularity at which cores address memory). Lastly, it assumes that accesses are non-overlapping, meaning that any two accesses either alias entirely (i.e., they are in fact accessing the same address) or not at all.

2.6 Chapter Summary

This chapter presented the necessary background for understanding the goals and mechanisms presented in this thesis. The subsequent chapters explore solutions to many of the problems posed by this chapter; namely, the need for clear and precise specifications of memory ordering behavior, the need for rigorous and systematic memory model analysis techniques, and the need to understand how memory models are implemented at the microarchitecture level.

Chapter 3

Memory Ordering Specification Tables and ArMOR

The previous chapter highlighted, among other things, the problem that memory models are often specified in ways that make it difficult to compare models or to map primitives from one model to another. This chapter introduces the Memory Ordering Specification Table (MOST), an architecture-independent format for defining memory model primitives [LTPM15a, Lus15]. It also introduces the ArMOR framework for comparing and manipulating MOSTs. The next chapter then uses the ArMOR framework for an in-depth case study of MCM-aware dynamic binary translation.

3.1 Introduction

Across both academia and industry, a large number of memory models have been studied in detail, formalized into rigorous definitions, and/or put into practice in the real world. As Chapter 2 showed, these memory models vary widely in relative strength, degree of formality, definition approach taken (i.e., operational, axiomatic, or other), choices of fence types to include, and so on. Furthermore, given that many of these specifications are vague, obscure, and/or simply incompatible with each other, it can be challenging to check whether a given pair of ordering mechanisms from different models is equivalent or whether they differ in subtle ways. Such a need would arise when compiling ordering mechanisms from a given software model onto a given hardware model, to give just one example.

Due to the complexity of even properly defining memory models such as those used by C/C++11, Java, ARM, or Power, and due to the one-off manner in which memory models are often defined, any analysis or toolflow (e.g., any compiler or any synchronization library) can only be built in a reliably correct manner through the use of formal analysis [Alg12, BMO+12, BA08, LGCP13, LPCZN13, MHMS+12, MPA05, SSA+11]. Informal analyses are prone to being either overconstrained or simply incorrect. Unfortunately, the lack of any common specification language means that attempts to build formal cross-model analyses can take multiple person-years of dense formalism to complete.

This chapter presents ARchitecture-independent Memory Ordering Requirements (ArMOR), a model-independent framework for specifying, reasoning about, and translating between memory consistency models. ArMOR defines memory ordering requirements (MORs)—fences, dependencies, or any other ordering enforcement mechanisms—in a self-contained, complete, and precise format known as a Memory Ordering Specification Table (MOST). MOSTs resemble widely-used reordering tables (e.g., for TSO, as in Figure 2.3) that indicate whether load → load, load → store, store → load, and store → store orderings need to be maintained. However, a key contribution of MOSTs is that they also directly encode subtle details such as store multi-copy atomicity, fence cumulativity, and so on. This added precision makes MOST-based analysis less prone to the types of under- or over-constraints that can result from relying on less systematic techniques.

We envision MOSTs being used to specify the behaviors of memory models at the hardware, software, and/or intermediate representation level. While this thesis

focuses on analyzing hardware models, approaches which are broadly similar to MOSTs have been considered at the software level as well [PVJ15]. In particular, this chapter focuses on the definition and analysis of MOSTs themselves, and it concludes with a gallery of MOSTs used to define a large number of widely-used hardware models. Chapter 4 then presents a series of in-depth case studies in which MOSTs are used to perform cross-instruction set dynamic binary translation.

The rest of this chapter is arranged as follows. Section 3.2 starts with a motivating example. Section 3.3 describes the MOST specification syntax. Section 3.4 presents the ArMOR framework for analyzing, manipulating, and comparing MOSTs. Section 3.5 describes related work, and Section 3.6 concludes. Finally, Appendix A presents the MOST definitions for a range of popular hardware memory models [Lus15].

3.2 Motivating Example

Although many programmers write parallel code under the assumption of sequential consistency (SC), few software or hardware models today directly implement SC due to its performance cost. As a result, application programmers or library writers must explicitly specify additional consistency-related synchronization points, whether at coarse grain (e.g., function call or GPGPU kernel boundaries), medium grain (e.g., mutex operations, the Java synchronized keyword), or fine grain (e.g., C11/C++11 low-level atomics, inline assembly). As discussed in Section 2.4, one key challenge in each case is determining how to map a given software synchronization primitive onto a sufficiently strong hardware primitive in the target architecture. Figure 3.1 highlights how dangerous memory model analysis pitfalls can arise from the use of informal specification languages and the lack of ability to easily compare features of different memory models. Figure 3.1 depicts a commonly-used manner of

TSO PPO                                Power lwsync
          Loads   Stores                         Loads   Stores
Loads     X       X                    Loads     X       X
Stores    —       X                    Stores    —       X
Stores are multi-copy atomic           A- and B-cumulative

Processor Consistency                  IBM 370/390/z-Series
          Loads   Stores                         Loads   Stores
Loads     X       X                    Loads     X       X
Stores    —       X                    Stores    —       X
Stores are not multi-copy atomic       Store buffer forwarding forbidden

Figure 3.1: Reordering tables for various architectures. All four appear deceivingly similar; they differ only in the subtle details encoded below the tables using natural language.

describing the ordering enforcement of various architectures and architectural mechanisms. Shown first is the total store ordering (TSO) consistency model used by SPARC and x86 [SPA94b, OSS09a]. This table was first presented in Section 2.2.1. More specifically, the figure depicts TSO preserved program order (PPO)—the subset of the original (per-thread) program order relationships that an architecture guarantees to maintain. As in Figure 2.3, the table in Figure 3.1 specifies whether an access of one type (the row heading) may be reordered with a program-order-subsequent access of another type (the column heading). Under TSO, stores may be reordered with later loads, but all other orderings are required. The table also specifies that on x86, stores are multi-copy atomic, as defined in Section 2.2.4.

Figure 3.1 also depicts the definition of three other architectures and/or features which appear to enforce very similar orderings to TSO. The figure depicts PPO for two other memory models: processor consistency and the IBM 370 memory model [GLL+90, IBM83]. It also depicts the orderings enforced by lwsync, a fence on the Power architecture. All three are loosely similar to TSO PPO in that they permit only store → load reordering. Nevertheless, they differ in the extent to which they enforce atomicity and/or cumulativity (Section 2.2.4), meaning that their apparent similarity is in fact misleading.

iriw litmus test in abstract form
Thread 0           Thread 1           Thread 2           Thread 3
(i1) st [x] ← 1    (i2) ld [x] → r1   (i4) ld [y] → r3   (i6) st [y] ← 1
                   (i3) ld [y] → r2   (i5) ld [x] → r4
Outcome r1=1, r2=0, r3=1, r4=0

iriw litmus test on x86
Thread 0       Thread 1        Thread 2        Thread 3
mov [x], 1     mov rax, [x]    mov rcx, [y]    mov [y], 1
               mov rbx, [y]    mov rdx, [x]
Outcome rax=1, rbx=0, rcx=1, rdx=0: forbidden

iriw+lwsyncs litmus test on Power
Thread 0       Thread 1        Thread 2        Thread 3
stw [x], 1     lwz r1, [x]     lwz r3, [y]     stw [y], 1
               lwsync          lwsync
               lwz r2, [y]     lwz r4, [x]
Outcome r1=1, r2=0, r3=1, r4=0: observable

(Happens-before cycle: (i1) --rfe--> (i2) --ppo/lwsync--> (i3) --fre--> (i6) --rfe--> (i4) --ppo/lwsync--> (i5) --fre--> (i1))

Figure 3.2: The iriw litmus test in various forms. The proposed outcome is observable on Power (with lwsync fences) but not on x86-TSO, indicating that the orderings enforced by TSO PPO and by lwsync differ. Even experts have been prone to the pitfall of assuming that similarities between tables in Figure 3.1 indicate equivalence of the features being represented [SHW11].

Subtle differences between the tables in Figure 3.1 can lead to pitfalls when overly-informal analyses are used. For example, as a sneak preview of Chapter 4, consider the problem of executing TSO code on a processor implementing the Power architecture memory model [SVZN+13]. Memory accesses on Power may be reordered liberally by default; orderings on Power are only enforced through inter-instruction dependencies

or explicit fences such as lwsync. Given the tables in Figure 3.1, it may appear that insertion of lwsync between every pair of accesses should be sufficient to restore all of the orderings required by TSO. However, this appearance is deceiving, as the two

are in fact not equivalent, and lwsync is not sufficient to restore TSO. The difference in strength between the default orderings of TSO and the orderings enforced by

lwsync can be demonstrated explicitly by a litmus test called iriw (independent reads of independent writes), shown in Figure 3.2. In particular, although TSO enforces orderings between the Thread 0 store to [x] and the Thread 1 load of [y] and between the Thread 3 store to [y] and the Thread 2 load of [x], lwsync on Power does not. This example shows that subtle or seemingly-unimportant low-level details of a memory model can result in differences which are visible to the programmer and which therefore must be properly documented. If the language used to specify or- derings states only that load load, load store, and store store orderings need to → → → be enforced, but does not specify the degree of atomicity and/or cumulativity, then the specification is incomplete at best and prone to unexpected or buggy outcomes at worst. This chapter avoids the pitfall of the above example by encoding information about atomicity and cumulativity directly within the reordering tables themselves. This added precision is what distinguishes MOSTs from the more common and less precise tables shown in the figure. Each cell in a MOST lists not just an ordering, but also the strength of the ordering (i.e., whether it is single-copy atomic, multi-

50 copy atomic, or neither). New rows and columns are introduced to directly address cumulativity (as required by iriw above). The details of these new features are elaborated in Section 3.3. The highlight, however, is that the MOSTs will be able to clearly distinguish between the models in a way that is both clear to humans and amenable to algorithmic analysis.

3.3 Memory Ordering Specification Tables

MOSTs describe the reordering behavior of memory consistency models at a precise and detailed level sufficient to support algorithmic analysis and automated comparisons and translation. Just as with traditional reordering tables, each cell in a MOST specifies whether instructions of the type in the row heading must maintain their ordering with subsequent instructions of the type in the column heading. Traditional reordering tables are most often used to define preserved program order, the set of program order relationships which are maintained by default. In contrast, MOSTs can be used to define not just preserved program order, but also fences or any other type of ordering enforcement mechanism, as shown with lwsync in the previous section. As will be discussed below, where necessary, MOSTs also include orderings that are enforced between accesses of different threads. As two running examples, we will derive the MOSTs for TSO preserved program order and for Power lwsync step by step. The complete MOSTs will be given at the end of this section once all of the necessary notation and details have been presented.

3.3.1 Store Atomicity and Ordering Strength

The first source of imprecision in traditional reordering tables is the fact that they do not address how orderings may have different strengths. In particular, stores may in general perform with respect to (i.e., become visible to) different cores in a system

Thread 0         Thread 1         Thread 2
st [x] ← 1       ld [x] → r1      ld [y] → r2
                 fence            fence
                 st [y] ← 1       ld [x] → r3
Outcome r1=1, r2=1, r3=0:
  Forbidden if stores are single- or multi-copy atomic
  Allowed if stores are non-multi-copy-atomic

Figure 3.3: The wrc+ncfences litmus test with non-cumulative fences. Note the subtle distinction from Figure 2.7.

at different times. The restrictions on such visibility events can be summarized into three degrees of store atomicity, as described in Section 2.2.4 and recapped below.

Single-copy atomic stores must become visible to all cores in the system at a single time [AM06]. Single-copy atomicity is uncommon, as it forbids even forwarding from a private local store buffer. Multi-copy atomic stores must become visible to all cores besides the issuing core simultaneously. In other words, a multi-copy atomic store cannot ever be visible to some but not all remote cores. TSO (used by SPARC and x86) falls into this category. Non-multi-copy-atomic stores may become visible with respect to remote cores in any order and in any number of steps. Power and ARM fall into this category.

The effect of store atomicity (or a lack thereof) is commonly depicted by the write-to-read causality litmus test (wrc+ncfences) of Figure 3.3. This test works as follows. If the thread 1 load reads the value written by the thread 0 store and then forwards it along to thread 2, must thread 2 have also seen the effect of the thread 0 store? If the store from thread 0 is multi- or single-copy atomic, then thread 2 must see the thread 0 store before it sees the thread 1 store. However, if the store from thread 0 is not multi-copy atomic, then the thread 1 store may propagate to thread 2 before the thread 0 store does, even though the thread 0 store executed first. This violates the intuitive notion of causality: even though the thread 0 store causes the thread 1 store value to exist, the thread 0 store need not become visible to

(a) Store → store                  (b) Store → load              (c) Other
Abb.  Description                  Abb.  Description             Abb.  Description
XS    Single-copy atomic           X     Ordered                 X     Ordered
XM    Multi-copy atomic            XL    Locally ordered         —     Unordered
XN    Not multi-copy atomic        —     Unordered
—     Unordered

Ld St Ld St Ld X X Ld X X St — XM St — XS (a) TSO (b) IBM 370/390/zSeries Figure 3.5: The addition of explicit strength levels allows MOSTs to distinguish cases that would appear identical using traditional reordering tables. These MOSTs include only properties addressed up through and including Section 3.3.1; the full MOSTs are shown in Figure 3.7.

other threads before the thread 1 store. Nevertheless, this execution remains a legal outcome on non-multi-copy-atomic architectures such as ARM and Power. To account for such strength differences in an architecture-independent manner, we introduce various strength levels into our MOST notation. Figure 3.4 summarizes the ordering strength levels used to describe MORs for architectures surveyed in this thesis. Additional (e.g., scoped) strength levels could easily be added if necessary. As an example of the benefit of these strength levels, Figure 3.5 shows MOSTs for the TSO and IBM 370/390/zSeries memory models. With traditional reordering tables, the architectures would appear equivalent. With the improved precision of MOSTs, the difference in store store ordering strength is made explicit. →

3.3.2 Same-Address Dependencies

Accesses from the same thread to the same address generally must maintain the ordering specified by program order. This property is sometimes called coherence1.

1As discussed in Section 2.1.3, coherence protocols often use stronger definitions of coherence (e.g., single writer or multiple readers), while other consistency model papers may use weaker notions such as total orders only on stores to the same address. Chapter 6 explores this question in greater detail.

Thread 0            Thread 1           Thread 2
(i1) st [x] ← 1     (i4) st [y] ← 1    (i5) ld [y] → r3
(i2) ld [x] → r1                       (i6) ld [x] → r4
(i3) ld [y] → r2
Outcome: r1=r3=1, r2=r4=0: Allowed

Figure 3.6: TSO litmus test n7 [OSS09a]. Although (i1) and (i2) access the same address, that store → load same-address ordering is not enforced from the point of view of other observers.

There are exceptions; SPARC RMO and old Power models relax load → load orderings to the same address, while the behavior is forbidden yet observable on some GPUs [ABD+15, AMT14, SPA94b, TDF+02]. To address this in MOSTs, we explicitly distinguish accesses to the same address (labeled in the MOSTs in this thesis as “SA”) from those to different addresses (“DA”).

The notion of ordering strength from the previous subsection is also relevant to per-address orderings. In particular, a store → load ordering may need to be enforced locally to ensure that each load returns the value written by the latest store to the same address. However, the same store → load ordering may not need to be enforced from the point of view of any remote observers. This somewhat surprising example is highlighted in Figure 3.6. In this example, (i6) can occur after (i2) but before (i1) becomes visible to thread 2. In other words, from the point of view of thread 2, (i1) happens after (i2), even though (i1) and (i2) access the same address. This re-emphasizes the need not just to specify that orderings must be enforced, but also to precisely specify their strength.

This added detail is enough to complete the MOSTs for SC, IBM 370, TSO, and RMO PPO, as shown in Figures 3.7a through 3.7d, respectively. In particular, for TSO, store → store ordering has been marked as being multi-copy atomic, and store → load ordering is marked as being enforced, but only locally, if the instructions access the same address. Figure 3.7d also shows how the MOST for SPARC RMO clearly

(a) Sequential Consistency PPO
        Load   Store
Load    X      X
Store   XS     XS

(b) IBM 370/390/z-Series PPO
        Load to        Load to
        Same Address   Diff. Address   Store
Load    X              X               X
Store   XS             —               XS

(c) TSO PPO
        Load to        Load to
        Same Address   Diff. Address   Store
Load    X              X               X
Store   XL             —               XM

(d) RMO PPO
        Load to        Load to         Store to       Store to
        Same Address   Diff. Address   Same Address   Diff. Address
Load    —              —               X              —
Store   XL             —               XM             —

Figure 3.7: Complete MOSTs for various single- or multi-copy atomic models

(a) Power lwsync
          PO+SA Ld   PO+DA Ld   BC Ld   PO St   BC St
PO Ld     X          X          X       X       X
AC Ld     X          X          X       X       X
PO St     —          —          —       XN      XN
AC St     —          —          —       XN      XN

(b) Power sync
          PO Ld   BC Ld   PO St   BC St
PO Ld     X       X       X       X
AC Ld     X       X       X       X
PO St     X       X       XS      XS
AC St     X       X       XS      XS

Figure 3.8: Incorporating cumulativity into MOST definitions

indicates that load → load ordering of accesses to the same address does not need to be enforced.

3.3.3 Fence Cumulativity

Another property of MORs on many weakly-ordered architectures is cumulativity (Section 2.2.4). MOSTs address cumulativity by including A-cumulative (AC) and B-cumulative (BC) operations as explicit rows and columns. Orderings of accesses related by cumulativity are specified in MOSTs in exactly the same way as for accesses

55 related by program order or in the same thread as the MOR in question. Figure 3.8

shows the MOSTs for the lwsync and sync fences of the Power architecture. The fact that the sync fence (iii) enforced ordering from (i) to (iv) in Figure 2.9, for example, is captured by the XN entry in row (AC St) and column (PO St). From the point of view of (iii), (i) is related by A-cumulativity, and (iv) is later in program order.

3.3.4 Summary

By incorporating the details discussed above, MOSTs serve as a precise, architecture- independent, and self-contained specification of the semantics of memory ordering requirements (MORs). To demonstrate the usefulness of this approach, the next sec- tion describes how to algorithmically compare and manipulate MOSTs. In addition, Appendix A includes a comprehensive gallery of MOSTs for numerous architectures and architectural features [LTPM15b].

3.4 ArMOR: Comparing and Manipulating MOSTs

A key benefit of the MOST notation is that it allows for flexible, algorithmic com- parison of MOSTs, even when they originally come from different architectures. This flexible comparison forms a key component of the compiler, mapper, and translator use cases envisioned earlier in Section 3.1. Taken together, the comparison and ma- nipulation approaches described in this chapter form the Architecture-Independent Memory Ordering Requirements (ArMOR) framework.

3.4.1 MOST Partition Refinement

Because different architectures emphasize different consistency model features, as described in Section 3.3, they may use distinct choices of rows and columns to define their MOSTs. To resolve this, before any MOST-MOST comparisons can occur, the

56 TSO PPO Mid-Refinement TSO PPO Pre-Refinement PO+ PO+ PO+ PO+ SA DA BC PO BC SA DA PO Refine Ld Ld Ld St St Ld Ld St −−−−→ PO Ld X X ? X ? PO Ld X X X AC Ld ? ? ? ? ? PO St XL — XM PO St XL — ? XM ? AC St ? ? ? ? ? (a) Because cumulativity is not explicitly addressed by the TSO PPO specification, the MOST must be refined in order to compare it with MOSTs from the Power architecture. However, not every cell in the refined MOST can be determined from the original MOST.

TSO PPO Post-Refinement PO+ PO+ SA DA BC PO BC Ld Ld Ld St St PO Ld X X X X X AC Ld X X X X X PO St XL — X XM XM AC St X X X XM XM (b) The entries with unknown values can be filled in by reasoning that the multi-copy atomicity of TSO directly implies that cumulativity holds automatically. Figure 3.9: Using MOST partition refinement to compare TSO PPO and Power lwsync rows and the columns of the MOSTs must be refined into matching partitions. The MOST refinement process has two steps. The first is to find the set of categories that should be used as the row and/or the column headings for the refined MOSTs. Standard partition refinement techniques can be used to merge the row and/or column choices from different MOSTs into a finer-grained partition capturing both; thus we omit a full algorithmic description here [PT87]. The second step is to fill in the cells of the newly-refined MOST. In most cases, this simply requires duplicating the original contents of a cell that was refined into multiple “child” cells. However, if a particular MOST feature is architecture-specific, partition refinement can lead to scenarios in which the ordering strength of a particular cell is left unspecified. These cells can be filled in conservatively (i.e., by assuming the unspecified orderings are required, or by assuming they are not enforced) or using some external reasoning. 57 Figure 3.9 shows an example. The MOST for the lwsync fence of the Power architecture (Figure 3.8) is laid out differently from the MOST defining TSO PPO (Figure 3.7c), as TSO does not explicitly define its MOSTs in terms of cumulativity. Here, the two MOSTs cannot yet be compared directly, even though Section 3.2 mo- tivated the need for performing this comparison rigorously. Refining the partitions of TSO PPO to match lwsync produces the MOST on the right side of Figure 3.9a, and this MOST has cells whose entries cannot be directly determined from the con- tents of the original TSO PPO MOST. In this particular case, we can reason that cumulativity follows implicitly from the multi-copy atomicity of TSO, and therefore the cumulative ordering cells are in fact enforced. We therefore fill in the cells cor- responding to cumulative orderings in the manner implied by multi-copy atomicity, resulting in the refined MOST of Figure 3.9b.

3.4.2 ArMOR Comparison Operators

Once two MOSTs have been refined (if necessary) into the same layout of rows and columns, then a comparison of the two can be defined by comparing each pair of corresponding cells. The cell-by-cell comparison is defined by checking whether one strength level implies the other. For example, enforcement of single-copy atomic store store ordering of strength S implies that multi-copy atomic store store or- → X → dering of strength M is also enforced, and hence that S M . We define the X X ≥X full complement of comparison operations (<, ,=,=, ,>) analogously. Note that in ≤ 6 ≥ general, this ordering is partial, not total. Two MOSTs may also be combined to produce a single MOST representing en- forcement of both orderings. This can occur if, for example, there are two fences or MORs back-to-back in a program. We define this operation as the join operator ( ). A join operation is intuitively similar to a max operation, except that the re- ∨ sult may not be equal to any one of the inputs, because comparison is not totally

58 Power lwsync Power sync PO+ PO+ PO+ PO+ SA DA BC PO BC SA DA BC PO BC Ld Ld Ld St St Ld Ld Ld St St < PO Ld X X X X X PO Ld X X X X X AC Ld X X X X X AC Ld X X X X X PO St — — — XN XN PO St X X X XS XS AC St — — — XN XN AC St X X X XS XS Figure 3.10: MOST comparison example: Power lwsync is strictly weaker than Power sync, since each entry in the MOST for lwsync is weaker than or equal to its coun- terpart in the sync MOST, and at least one comparison is strict. ordered. Instead, the join produces a new MOST which is at least as strong (in terms of above) as each of the input MOSTs. The calculation of a join is also defined ≥ cell-by-cell; each cell in the result MOST must be an ordering strength which implies the strength levels in the corresponding cells of both input tables. In other words, if A B = C, then C must satisfy C A and C B. ∨ ≥ ≥ Lastly, subtraction ( ) produces a MOST which specifies the orderings which are − enforced by the first MOST but not by the second. Conceptually, this corresponds to a scenario in which a certain set of orderings is required, but a particular MOR may only enforce some subset of those orderings; subtraction of these two MOSTs produces the set of required orderings that remain unenforced. Again, this can be calculated in a cell-by-cell manner. Finally, it would be easy to consider other ArMOR comparison operators which could be defined analogously. For example, there may be some scenarios in which it becomes useful to define a meet ( ) operator, which would perform the opposite ∧ operation of the join operator: it would produce a MOST which is at least as weak as each input. Because such operators were not needed in any analyses in this thesis, we do not consider them further.

59 Power PPO PO+ PO+ PO+ PO+ SA DA BC SA DA BC Ld Ld Ld St St St PO Ld X — — X — — AC Ld — — — — — — PO St XL — — XL — — AC St — — — — — —

Power lwsync PO+ PO+ PO+ PO+ SA DA BC SA DA BC Ld Ld Ld St St St ∨ PO Ld X X X X X X AC Ld X X X X X X PO St — — — XN XN XN AC St — — — XN XN XN

PO+ PO+ PO+ PO+ SA DA BC SA DA BC Ld Ld Ld St St St = PO Ld X X X X X X AC Ld X X X X X X PO St XL — — XN XN XN AC St — — — XN XN XN Figure 3.11: Power PPO lwsync, where “ ” is the MOST join operator, produces the complete set of orderings∨ enforced when∨ a lwsync instruction is executed.

3.4.3 ArMOR Comparison Examples

As a relatively simple example, consider a comparison of the two MOSTs of Fig- ure 3.8. By comparing each pair of corresponding cells in the table, it is clear that lwsync < sync: every cell in the sync MOST is at least as strong as the correspond- ing cell in the lwsync MOST, and some comparisons are strict. In this case, the join ( ) of the two tables is equivalent to the sync MOST. ∨ A less trivial example of a join operation is shown in Figure 3.11. This figure shows Power PPO being joined with the Power lwsync fence. Notably, the latter is not strictly stronger than the former; in particular, the ordering of a store followed by a load to the same address is maintained (with strength XL) by PPO but not at all by lwsync. Therefore, to calcualte the complete set of orderings maintained when an

60 TSO PPO PO+ PO+ PO+ PO+ SA DA BC SA DA BC Ld Ld Ld St St St PO Ld X X X X X X AC Ld X X X X X X PO St XL — X XM XM XM AC St X X X XM XM XM

Power PPO PO+ PO+ PO+ PO+ SA DA BC SA DA BC Ld Ld Ld St St St − PO Ld X — — X — — AC Ld — — — — — — PO St XL — — XL — — AC St — — — — — —

Power lwsync PO+ PO+ PO+ PO+ SA DA BC SA DA BC Ld Ld Ld St St St − PO Ld X X X X X X AC Ld X X X X X X PO St — — — XN XN XN AC St — — — XN XN XN

PO+ PO+ PO+ PO+ SA DA BC SA DA BC Ld Ld Ld St St St = PO Ld — — — — — — AC Ld — — — — — — PO St — — X XM − XN XM − XN XM − XN AC St X X X XM − XN XM − XN XM − XN Figure 3.12: TSO PPO, properly refined (Figure 3.9b) Power lwsync (Figure 3.8), where “ ” is the MOST subtraction operator. The shaded− cell highlights the ordering that distinguishes− the two cases in Figure 3.2. lwsync is executed, the two MOSTs can be joined together, resulting in the MOST shown in the figure. Finally, as an example of MOST subtraction, we return to the TSO PPO vs.

Power lwsync example shown earlier in the chapter. To show explicitly how the two differ, Power lwsync can be subtracted from TSO PPO (once the latter is properly refined). This result is shown in Figure 3.12. The fact that the subtraction result is

61 non-empty not only shows that lwsync enforces fewer orderings than TSO requires; it also shows exactly which orderings are unenforced. A major benefit of ArMOR is that it supports manipulations that can be per- formed entirely algorithmically. In the next chapter, we will use ArMOR to automat- ically derive the designs of consistency model translation modules called shims, given only the set of MOSTs used by the input and output memory consistency models.

3.5 Related Work

Model-Independent Specification Frameworks. Adve and Gharachorloo pro- vide a comprehensive survey [AG96] of early attempts to define various consistency models. Gharachorloo studied many of the architectures analyzed in this thesis, and he developed a framework which covered all of those models [Gha95]. However, some of the models in that thesis are no longer in use, and others have undergone dramatic changes since that time. In particular, cumulativity was not yet a feature of the Power architecture (which was at the time was called PowerPC). Furthermore, this work took place prior to more recent formalization efforts that have shone light on other unexpected corner cases not yet considered at the time. Other authors have proposed frameworks which unify some consistency models, but it is not clear whether these models apply to all modern relaxed hardware mod- els [SN04, YGLS04]. Mador-Haim et al. analyze and classify a family of 90 single- and multi-copy atomic memory models; they distinguish same- vs. different-address rela- tionships, but locally-enforced orderings, cumulativity, and non-multi-copy-atomicity are not handled by their framework [MHAM11]. Recently, Alglave et al. developed

the general-purpose herd software suite and the cats language for specifying single- event axiomatic memory models [AMT14, AMS+12]. However, this framework does not allow for easy comparison of different models in the way that ArMOR does.

62 Petri et al. use an approach very similar to MOSTs to describe the ordering requirements of an abstracted version of Java bytecode which they call “cookbook high” [PVJ15]. They analyze an abstracted subset of Java containing volatile and normal loads and stores, and they describe the ordering requirements using a large table which vaguely resembles a MOST. Their encoding, however, can only be inter- preted in the context of their specific model. In particular, their specification defines orderings that need to be enforced within the “temporary store”, a feature of their particular operational model which is used to model partial and incremental visibility of memory accesses, and it refers explicitly to orderings on speculative accesses and futures which exist only within the formalism. Defining Fence/MOR Behavior. Both operational and axiomatic models of- ten “hard-code” fence behavior into the model in some way, in the sense that the semantics of each type of fence or ordering mechanism are built into the models in ways that cannot easily be changed [Alg12, MHMS+12, OSS09a, SSA+11]. Owens et al. explicitly state that the behavior of mfence on an x86-TSO processor is to flush an idealized store buffer. The Power operational model of Sarkar et al. omits store buffers in favor of more abstract sets of visibility events, and fences have both visibil- ity events and (in the case of the sync fence only) acknowledgment events [SSA+11]. Adding new fences would likewise require new fence-specific operational events to be added. The axiomatic models of Mador-Haim et al. and of Alglave et al. are somewhat more flexible in the sense that axioms can simply be added, modified, or removed [Alg12, MHMS+12]. However, even with this flexibility, the semantics of less common fences such as eieio and dsb are left vague, as the industry specifica- tions are unclear on many of the subtle details of their behaviors. To the best of our knowledge, no existing model specifies fence types and ordering specifications in a way that is sufficiently general and architecture-independent that inter-architecture comparison can be rigorously performed.

63 Applicability to Related Topics. Recent work has also explored the appli- cation of consistency models to non-volatile storage [PCW14]. We see ArMOR as applicable to memory persistency model analysis as well.

3.6 Chapter Summary

As Chapter 4 will show, MOST notation is being useful across a broad range of compilation and translation tasks including static compilation, JIT compilation, dy- namic binary translation, and more. ArMOR highlights the pros and cons of different choices of fences and MORs, and we use ArMOR to provide insights that can assist in exploring memory system design tradeoffs in future heterogeneous systems. Appendix A presents a gallery of MOST-based definitions for well-known hardware memory consistency models.

64 Chapter 4

Dynamic Translation Between Memory Consistency Models1

While the previous chapter demonstrated the MOST notation and the ArMOR com- parison and manipulation framework, this chapter presents an in-depth case study of the new kinds of technologies that ArMOR can enable.

4.1 Introduction

As earlier chapters have discussed, computer architecture is undergoing a dramatic shift away from homogeneous multicores and towards increasing microarchitectural and architectural heterogeneity [CRDI07, Gre11, PCC+14, Shi, Top15]. In the context of memory consistency models, heterogeneity brings with it a number of challenges: how to compile from a given software model onto a given hardware model, how to design memory model-aware intermediate representations (e.g., LLVM IR, NVIDIA PTX), how to dynamically migrate code from one ISA to another, and so on. All of these use cases require the direct comparison of memory ordering mechanisms between

1Some of the work in this chapter was performed in collaboration with fellow graduate student Caroline Trippel and other contributors [LTPM15a].

65 different architectures and/or between different microarchitectures, and so all of the above could serve as interesting use cases for the ArMOR framework developed in the previous chapter. In this chapter, we highlight one use case in particular. Recent work has demonstrated the performance and/or power benefits of per- forming dynamic binary translation across ISAs and/or microarchitectures [DVT12, VT14]. However, this previous work focused on opcode-for-opcode translation and memory layout issues; it did not address memory consistency models. Inter- consistency model translation has previously been studied only for specific cases such as SC TSO [DMT13, VN11]. → In this chapter, we fill this gap by showing how ArMOR can be used to automat- ically derive the designs of self-contained translation modules called shims. Shims dynamically adapt code compiled for one memory model to execute on hardware implementing another, without recompilation or offline code analysis, by dynami- cally injecting fences or other enforcement mechanisms as needed into a code stream. Other use cases for shims could include removing redundant fences to optimize per- formance or using programmable shims as a means of allowing memory-accessing IP blocks to be built independently of the reordering properties of the underlying infras- tructure (e.g., the network-on-chip). In the future, we envision shims being used in JIT compilers [Khr, NVI13a], dynamic binary translators [DVT12, VT14], dynamic code optimizers [NVIb], and updates to reprogrammable [Int13a], with the latter used to fix implementation bugs [ABD+15]. Although translation results in some overhead, we demonstrate that this cost can be outweighed by the benefits of migrating to faster or more power-efficient hardware. When memory models are sufficiently compatible, we demonstrate that the performance overhead of implementing shims in hardware can be as low as 10- 77%. In other cases, our experiments motivate ways in which instruction sets could be augmented in ways that could mitigate much of the overheads. For example, we

66 propose the addition of finer-grained fence types and/or the maintenance of otherwise- redundant consistency metadata within the instruction set definition. The rest of this chapter is organized as follows. Section 4.2 presents a motivating example. Section 4.3 describes the operation of translation shims. Section 4.4 presents an experimental evaluation methodology, and Section 4.5 presents the results of these experiments. Section 4.6 then presents some takeaways from the experiments. Sec- tion 4.7 presents some related work, and Section 4.8 concludes. Finally, a gallery of shim designs is presented in Appendix B of this thesis, and the code used to generate this appendix is open source [Lus15].

4.2 Motivating Example

Figure 4.1a shows C11 source code for the mp (message passing) litmus test. For this test, the C11 memory ordering rules specify that if the consumer reads 1 from y, then it must also return 1 from x. In a traditional scenario, the compiler ensures that all of the C11 ordering rules within each thread are respected by the generated assembly code with respect to the relevant hardware memory model. This generally occurs by looking up the architecture-specific implementations of the software syn- chronization constructs in a pre-calculated table like the one in Figure 2.1. On Power, the orderings are enforced by inserting lwsync fences, as shown in Figure 4.1b. On x86-TSO, as Figure 4.1c shows, no fences are needed. Problems arise if one tries to perform naive binary translation of the x86 code to execute on the Power architecture. Naive opcode-for-opcode translation would produce the code in Figure 4.1d. Unfortunately, because the source x86 code lacks fences, the translated code also lacks fences, meaning that the extra lwsync fences required to prevent the bad outcome of the mp litmus test on Power are missing. This demonstrates that if cross-ISA binary translation techniques do not account for the

67 // Producer/Thread 0 *x = 1; atomic_store_explicit(&y, 1, memory_order_release);

// Consumer/Thread 1 if (atomic_load_explicit(&y, 1, memory_order_acquire)) assert(*x != 0); (a) C11 source code for mp

x86 Compiler

Producer/Thread 0 Consumer/Thread 1 mov 0(rdx),rax mov rax,8(rdx) mov 8(rdx),rbx mov rbx,0(rdx) Power Compiler Outcome 1:rax=1, 1:rbx=0: Forbidden (c) Compiled natively for x86: no fences are needed to prevent the illegal outcome

x86 Opcode to Power Opcode Naive Translation

Producer/Thread 0 Consumer/Thread 1 Producer/Thread 0 Consumer/Thread 1 stw r1,0(r2) lwz r1,8(r2) stw r1,0(r2) lwz r1,8(r2) lwsync lwsync stw r3,8(r2) lwz r3,0(r2) stw r3,8(r2) lwz r3,0(r2) Outcome 1:r1=1, 1:r3=0: Outcome 1:r1=1, 1:r3=0 Forbidden Observable × × (b) Compiled natively for Power: fences pre- (d) Compiled for x86 and translated to vent the illegal outcome Power. Since x86 code does not contain fences, it becomes the job of the DBT en- gine to insert fences. Otherwise, the bad outcome becomes observable.

Figure 4.1: A compiler targeting either architecture directly would produce correct code. However, binary translation that does not account for differences in consistency models would lead to the invalid outcome becoming observable.

68 consistency model, the resulting code could produce illegal outcomes. The goal of this chapter is therefore to generate translator shims which automatically and dynamically determine where to insert MORs and which MORs to insert, without requiring offline code analysis. The above example motivates the need for cross-layer analysis, mapping, and compilation frameworks to aware of the memory models of each layer. It also mo- tivates the need for a memory model analysis framework which is flexible enough to reason about both the source and the target model concurrently. In general, there are many scenarios to which this applies: compilation from one model to an- other (as in Section 2.4), static and/or dynamic binary translation between mod- els [DVT12, LCM+05, Qem15, VT14], dynamic optimization [NVIb, DGB+03], and so on. This chapter performs an in-depth case study of one of these scenarios: dy- namic binary translation from one hardware memory model to another.

4.3 Cross-MCM Dynamic Binary Translation

In this section, we first explain the goal and design criteria for dynamic binary trans- lator shims. We then explain how ArMOR and MOSTs can be used to automatically derive the basic designs of shims. Finally, we discuss a number of design questions that come up during the implementation process.

4.3.1 High-Level Operation

Conceptually, we consider dynamic binary translation to take place on a stream: an ordered sequence of memory operations (loads, stores, or fences) passing through some particular point in a processor or IP core. The specific format of the stream operations depends on the location where the translation is conducted. Streams may carry macroops, microops, or whatever other form operations may take at the chosen

69 location. A stream may also carry implicit (via preserved program order) or explicit (via fences or other MORs) ordering requirements on its memory operations. We refer to incoming (newer) operations as upstream operations and outgoing (older) operations as downstream operations. The goal of a shim is to map each incoming upstream operation into zero or more downstream operations which are strong enough to enforce the memory order- ing requirements of the upstream operation. More specifically, to translate an explicit upstream MOR such as a fence, the shim must emit zero or more downstream opera- tions which combine to implement all of the ordering requirements specified by that fence. Likewise, to handle implicit upstream ordering requirements, the shim must enforce any upstream PPO requirements that are not enforced by downstream PPO. An overly-conservative (and hence low-performing) but correct baseline would be to insert the strongest possible fence between each pair of instructions. In most cases, this is sufficient to restore sequential consistency2, let alone the requirements of the source architecture. However, this approach is overkill, as many inserted fences would be redundant and unnecessary. Instead, we build shims as finite state machines (FSMs) which insert MORs lazily—just before they are actually needed. The design and derivation of these FSMs is the subject of the next subsection.

4.3.2 Shim Finite State Machines: Overview

Conceptually, shims are finite state machines in which downstream MOR insertion takes place while traversing certain state transitions. Each FSM state represents a particular set of pending ordering requirements. An ordering requirement a b of → some strength (as defined in Section 3.3.1) is said to be pending if 1) the ordering is not enforced by the preserved program order (i.e., default ordering enforcement) of the target architecture, 2) an access of type a has been observed, and 3) no other

2Such restorations are not universally achievable; for example, Itanium unordered accesses cannot be made sequentially consistent [Int10]. 70 s0 Ld/Ld St/St Ld/mfence;Ld

s1 St/St

(a) FSM (in agreement with previous work [DMT13, VN11])

Notation Meaning x/y On an incoming upstream operation x, send y downstream x/y;z On an incoming upstream operation x, send y followed by z downstream

(b) Key Figure 4.2: Shim FSM for SC upstream and TSO downstream. Although this par- ticular case has been studied before (and hence serves as a sanity check), analogous shims can be generated for any pairing of source and target memory models that are defined using MOSTs. ordering mechanism (fence or otherwise) that would enforce the a b ordering has → been inserted between a and b. Together, these conditions imply that before an access of type b is emitted downstream, some explicit ordering mechanism must be inserted downstream to enforce the a b ordering in a sufficiently strong manner. → As mentioned above, shims insert MORs lazily in order to avoid the insertion of redundant MORs. This laziness is implemented by emitting MORs downstream along the transitions between certain pairs of FSM states, as described below. As an example, consider a scenario in which a piece of code was compiled to run on a sequentially consistent processor, but it will instead be executed on a processor only guaranteeing to enforce TSO by default. This particular case has been studied before in a more restricted context [DMT13, VN11]. In this way, it serves as a useful sanity check for our algorithm. The most prominent difference between SC and TSO is that TSO allows store load reordering while SC does not. Previous work therefore → concluded that to restore SC on a TSO processor, mfence operations must be inserted such that there is (at least) one mfence between each store and any subsequent load.

71 PO+SA PO+DA PO Loads Loads Stores PO Loads — — — --- s s - PO Stores s s — (a) Example pending orderings table. Pending (b) Corresponding depiction ordering tables match the layout and design of within the shim FSM figures MOSTs, but they track orderings which are pending used in this thesis rather than orderings which are required.

Label Description

s An ordering of strength XS is pending m An ordering of strength XM is pending n An ordering of strength XN is pending l An ordering of strength XL is pending X An ordering of strength X is pending — No ordering is pending

(c) Pending orderings legend (derived from Fig. 3.4) Figure 4.3: Pending ordering tables, which are used to described states within shim FSMs

Figure 4.2 shows the FSM translating SC to TSO. The shim for this scenario has two states: a starting state s0 and a second state s1. After any load, the FSM transitions to state s0. After any store, the FSM transitions to state s1. For three of the four transitions, the shim does not need to insert any new MORs downstream; it simply passes the upstream operation through unmodified. For the last transition, when the FSM is in state s1 and a load arrives upstream, the shim inserts an mfence operation downstream before passing the store through and then transitioning to s0.

The fact that a mfence is inserted for this particular state and input fortunately matches the intuition of previous work. Although the particular case of SC-to-TSO translation has been studied by previ- ous work, our use of MOSTs and the ArMOR framework provides two key advantages over previous work. First, they are flexible enough to apply to other models beyond SC and TSO. Second, they can be automatically generated for any source/target pairing. All that is needed is specification of the source and target memory models using MOSTs, as discussed in Chapter 3. We discuss the generation algorithm below. 72 MOSTS for Upstream MCM

FSM Generation Shim FSM for (Algorithm 4.1) Upstream-to- Downstream Translation

MOSTS for Downstream MCM

Figure 4.4: Shim FSM generation process overview

Later in this chapter, the analysis of Section 4.5 analyzes a broader set of cases in detail, and then Appendix B presents a large gallery of shim designs for a wide variety of upstream and downstream memory models.

4.3.3 Shim FSM Generation

This subsection describes the process by which shims FSMs are generated. The following subsection presents a detailed example of this process in action. At a high level, the procedure is as shown in Figure 4.4: the upstream and downstream MCMs are defined using sets of MOSTs, and the MOSTs are fed into the FSM generation algorithm described below. The output of this process is the shim FSM translating between the two models. Each shim FSM state represents a particular pending ordering table. A pend- ing ordering table tracks whether the each type of upstream operations has actually been observed since any relevant earlier fence(s) or ordering mechanism(s). Pending ordering tables are in a sense the inverses of MOSTs; rather than specifying which orderings are required, they specify the orderings that have not (yet) been enforced. Nevertheless, as the layout and contents are analogous, we use similar notations for both. As shown in Figure 4.3, we depict pending orderings of a given strength with the lowercase equivalent of the uppercase notation from Figure 3.4. Each cell in the

73 table tracks whether an ordering of the access type in the corresponding row heading is pending with respect to the access type in the corresponding column heading. Shim FSMs are generated by calculating all states (i.e., all pending ordering tables) reachable from the given start state. The start state represents the empty pending or- dering table—the table in which no orderings are marked as pending. Upon receiving an input (i.e., an upstream operation), the shim first checks whether any new MORs must be inserted downstream before passing the input through. It then calculates and transitions to the next state, reflecting the fact that the orderings considered pending may have changed due to the receipt of a new type of input and/or the insertion of a new ordering enforcement MOR into the stream. The detailed shim FSM state transition function is given in Algorithm 4.1. We first describe the general procedure and give an example demonstrating the process. After that, we discuss broader design questions and address details that come up when implementing shims in practice. State Transition Function. To determine which pending orderings (if any) need to be enforced, the algorithm does two things. First, it searches the pending ordering table for that state to find a column corresponding to the input operation type. This represents the fact that the input operation would effectively serve as the b component for any table cell tracking a pending a b ordering. All cells → marked pending in the column(s) in question must be enforced. The input operation may not have an associated column (e.g., fences are not often listed as columns, because fence fence orderings need not always be explicitly tracked), in which case → this step returns the empty MOST. Second, the algorithm also checks whether the input operation itself has an associated MOST (e.g., again, as fences would), and if so, it adds those to the set of operations to be enforced. This addition takes place using the join ( ) operator of Section 3.4.2. The result of this addition is a MOST ∨

74 Algorithm 4.1 Shim FSM Transition Function Function: NextState(currentState, op): // Enforce pending orderings in relevant column(s) orderingsToEnforce = MOST(op) KeepColumn(currentState, op) assumedReqs ∨ ∨ // Mark relevant orderings in relevant row(s) pending newOrderings = KeepRow(upstreamPPO downstreamPPO, op) − // Find a MOR which enforces the orderingsToEnforce, if necessary insertedMOR = WeakestSufficientMOR(orderingsToEnforce)

// propagated = pending and not enforced by the inserted MOR propagatedOrderings = state MOST(insertedMOR) − // Join old and new orderings to calculate the next state nextState = propagatedOrderings newOrderings ∨ // If necessary, insert the new MOR before passing the input operation through if insertedMOR = ∅ then Emit(insertedMOR)6

// Pass the input operation through Emit(op)

// Done; transition to the next state return nextState

// Helper functions used above: ( j = col KeepColumn(s, col)[i][j] = − 6 s[i][j] j = col ( i = row KeepRow(s, row)[i][j] = − 6 s[i][j] i = row

orderingsToEnforce describing the pending orderings that must be enforced. We omit a discussion of assumedReqs for the moment; see Section 4.3.5 for details. If any pending orderings do need to be enforced, the algorithm searches for the weakest (i.e., fastest) MOR which is sufficiently strong (according to the operator ≥ from Section 3.4.2), and it inserts that MOR downstream. This has the effect of subtracting (as defined in Section 3.4.2) the enforced orderings from the current set of pending orderings. This subtraction produces a new intermediate pending orderings table called propagatedOrderings.

75 Next, the shim FSM algorithm also then tracks the fact that each input operation will serve as the a component for future a b orderings. This is tracked by marking → cells in the row corresponding to the input operation type as pending if, for that particular cell, the upstream PPO is stronger than the downstream PPO (according to the operator of Section 3.4.2). This produces a second intermediate pending ≥ ordering table newOrderings. The two intermediate tables are then joined to form the pending ordering table nextState representing the next state of the FSM. Lastly, the operation(s) can be emitted downstream. First, if the above steps that a new MOR needed to be inserted downstream to enforce some pending ordering(s), this inserted MOR is emitted downstream first. Second, the original upstream in- put operation is propagated downstream. Once, this is completed, the transition is complete, the state has been updated, and the next iteration can begin. In certain degenerate cases, the above algorithm may produce a shim that consists of a FSM with only a single state. This corresponds to cases in which the shim behaves the same in all scenarios, and hence the FSM can be condensed into a single state to save performance/power/area overhead. As we show in later sections, this degenerate case does occur in practice. We therefore refer to shims with more than one distinct state as stateful shims and to those which reduce to a single state as stateless shims. Note that while the description above presents the conceptual overview of shim FSM generation, all of the necessary calculation and analysis described below can be performed offline and in advance. Furthermore, the state space of the shim FSMs is small enough that the space can be explored completely in just seconds. As Sec- tion 4.5.1 shows, shim FSMs generally end up with a very small number of states, and hence they can be both derived and implemented very cheaply.

76 (start) St/St Ld/?? s0 s1 ? Ld/Ld ------s s - St/?? ?

(a) Assume at the start of this example that state s0 has already been explored, and that state s1 is currently being explored.

PO+SA PO+DA PO PO+SA PO+DA PO Loads Loads St Loads Loads St PO Ld — — — PO Ld — — — PO St s s — PO St — — — (b) The pending orderings table state (e) newOrderings is calculated by s1 in Figure 4.2. During the shim gen- keeping the “PO Ld” row of the differ- eration process, state s1 represents the ence between SC PPO and TSO PPO. above pending orderings table. In this case, the input rows are identical (Figure 3.7, after proper refinement), PO+SA PO+DA PO and so the difference is empty. Loads Loads St PO Ld — — — PO+SA PO+DA PO PO St — s — Loads Loads St (c) Normal loads do not have their PO Ld — — — PO St — — — own MOSTs, so orderingsToEnforce is formed by keeping the column of Fig- (f) propagatedOrderings is calculated ure 4.5b which corresponds to the input by subtracting the MOST for the operation. insertedMOR from the current state. In this case, the subtraction produces PO+SA PO+DA PO an empty MOST. Loads Loads St PO Ld X X X PO+SA PO+DA PO PO St s s s Loads Loads St (d) The MOST for mfence is strong PO Ld — — — PO St — — — enough to enforce the pending order- ings of Figure 4.5c. Since mfence (g) nextState is calculated by joining is the only choice, it is also the Figures 4.5e and 4.5f. In this partic- WeakestSufficientMOR and hence the ular case, the nextState matches the insertedMOR. already-explored state s0.

(start) St/St s0 s1 Ld/Ld ------s s - ? Ld/mfence;Ld St/??

(h) The above analysis adds one new transition to the FSM. From here, the algorithm continues exploring all unexplored possibilities until the FSM converges. Figure 4.5: Example of shim FSM generation (Algorithm 4.1) in action for the case of translating SC to TSO (as in Figure 4.2).

77 4.3.4 Shim FSM Generation Example

As an example of how Algorithm 4.1 is used to generate shim FSMs, consider Fig- ure 4.5. This figure continues the example started in Figure 4.2. It shows how MOSTs can be used to derive one particular transition in that FSM. Other transitions in the FSM are derived analogously. This example starts from the partially-explored FSM of Figure 4.5a, and it calcu- lates what should happen when the FSM is in state s1 and a load to a not-yet-seen address is received. The pending orderings table for state s1 is shown in Figure 4.5b. The algorithm searches the column of this table to find orderings that need to be enforced before the load is propagated (Figure 4.5c). Since there is at least one such pending ordering, the algorithm searches for a sufficiently-strong MOR to insert. In

this case, there is one option: mfence (Figure 4.5d). An mfence is therefore emitted downstream before the load is passed through the shim. To calculate the next state, the algorithm performs a similar set of calculations. First, it determines whether any new orderings need to be marked as pending. Since the “PO Ld” row of SC PPO and TSO PPO are identical (once they are properly refined), their difference is the empty MOST, and so no orderings need to be enforced (Figure 4.5e). Second, it determines whether any old pending orderings need to be

propagated. Intuitively, the emitted mfence has the effect of enforcing all orderings, and so none need to be propagated. Using MOSTs, the subtraction of the mfence MOST from the current state indicates the same thing (Figure 4.5f). Third, the results of the two previous steps are joined together to form the next state (Figure 4.5g). In this particular example, the next state happens to correspond to a state that has already been explored: state s0. Figure 4.5h therefore adds a transition from s1 to s0, indicating that upon receiving a load, an mfence should be emitted downstream before the load is passed through, and that the next state is s1. In other cases, the edge may point to an as-of-yet unexplored state, in which case the new state would

78 be created and the state space exploration algorithm would recursively explore all possible transitions from that new state as well. Finishing this example, running Algorithm 4.1 to explore the final unexplored edge in Figure 4.5h would create the self-loop from/to s1 as seen in Figure 4.2, and this would complete the shim generation process.

4.3.5 Design Considerations

Handling Non-Visible Operations. One subtlety in Algorithm 4.1 is the use of assumed pending ordering requirements, or “assumedReqs”. This property handles the practical need to support situations in which accesses from certain rows and/or columns are not directly observable by the shim. Most commonly, this refers to situations in which a shim within one core cannot see some or all of the events happening at other cores, even though cumulative fences (Section 3.3.3), for example, require cross-core orderings to be tracked and enforced. Other similar situations are discussed below. Regardless of the reason, events which cannot be directly observed must be conservatively assumed to be pending ordering enforcement. The “assumedReqs” pending orderings table contains a set of orderings which are permanently marked as pending and which are added during every transition. This approach allows shims to translate correctly even when the necessary information is only partially available. If the information were to somehow become available later (e.g., in a subsequent microarchitectural revision), the “assumedReqs” could be weakened accordingly, and a new (and likely more efficient) shim could be regenerated. Special Case Optimization. We include one special-case optimization in our evaluations: we allow a store store ordering of strength S to be enforced by a → X MOR with enforcing store store ordering of strength M if store load ordering → X → of strength X is marked as pending. Intuitively, this captures the notion that the only difference between single- and multi-copy atomicity is that forwarding from a

79 local store buffer is permitted only in the latter. This optimization, which is used implicitly in previous work [DMT13, VN11], makes shims slightly lazier and hence improves performance. Microarchitectural Placement. ArMOR requires that a stream be sorted into a legal visibility order so that preserved program order needs can be detected. This means that when shims are implemented in hardware, their placement within the pipeline matters. Placing a shim too early in the pipeline may render it unable to track same/different address dependencies (Section 3.3.2), as the addresses may not have been resolved by that point. Placing it at a later location through which non-memory instructions do not pass may prevent it from being able to observe information such as inter-instruction dependencies. In such cases, these unobservable dependencies would have to conservatively be included into “assumedReqs” (Algorithm 4.1). Likewise, shims may not be able to observe any accesses made by other cores, as the mechanisms that would be needed to enable such visibility would likely have large performance/power/area overheads in software and/or in hardware. To handle such cases, any orderings which explicitly refer to accesses being made by other cores (e.g., for cumulative fences) would need to be conservatively included into “assumedReqs” as well. Our hardware evaluation (Section 4.4.2) places a shim into the issue stage of a pipeline, as all necessary information about accesses in program order is observable at that point. Our software dynamic binary translation-based evaluation operates on the full stream of instructions in their original program order. Microarchitecture-Level Integration. The analysis in this section so far has been at the level of the architecture—i.e., at the level of the hardware specification. However, ArMOR and the shim FSM generation algorithm easily adapt to situations in which the set of fences provided microarchitecturally varies from the architectural specification. Section 4.4.2 provides one example: the pipeline analyzed in that sec-

80 tion implements a finer-grained set of fence choices than would otherwise be available according to the architecture specification. We therefore adapt the MOSTs and the analysis of that section to include those fences. Laziness. Lazy insertion is not the only possible design approach. More eager insertion could make it easier to hide the latency of inserted fences, but it may also result in inserting a larger number of fences. Our experience is that the benefits of laziness outweigh the small potential latency hiding of eagerness. Atomic Instructions. Atomic (i.e., uninterruptible) instructions such as compare-and-swap can be easily added into our model. If atomics are considered a separate class of instruction from loads and stores, they would simply form new MOST rows and columns. Alternatively, if atomics are treated as a bundled load- store pair, Algorithm 4.1 could be modified to look up multiple rows and columns for such instructions, rather than just one. Either solution is viable as long as the ordering implications of such operations are correctly specified. Downstream Non-Fence MORs. Downstream MORs need not be fences. It is possible, for example, to use lightweight MORs such as ARM/Power address dependencies as well. Doing so would require more than just fence insertion; it would also require rewriting instruction operands, but dynamic binary translators already do this regularly [LCM+05]. Stream Interruptions. Streams may be interrupted by events such as context and hence lose their state. A conservative solution is to simply insert a full fence instruction and then return to the start state. A more aggressive solution would be to jump to a state which marks all orderings not enforced by the downstream PPO as pending. In either case, ArMOR’s fundamental operation remains the same. Uniqueness. There may be multiple valid shim FSMs for a given scenario, de- pending on the search order of WeakestSufficientMOR in Algorithm 4.1, whether the emission of multiple downstream fences are allowed, and so on. We highlight one

81 case study in our results section: although the MOR specification of the x86 mfence instruction and LOCK prefix are identical, their performance costs are not. In that case, the performance analysis becomes the deciding factor for which MOR to use. Dynamically Changing Partition Subsets. Algorithm 4.1 as shown assumes that the categorization of a given instruction into a particular row and column is based only on static properties of that instruction. In particular, if an access were to change from being in one partition to being in another, Algorithm 4.1 as shown would not be able to track that movement. However, the assumption that the classification stays constant is not always true. For example, AMD GPUs contain a fence s waitcnt which only enforces ordering with respect to accesses other than the most recent n. If the classification of a memory access can change, the state transition function must be modified to account for such changes. In practice, few MORs are defined this way, and so a cheaper option may be to overapproximate the state and avoid the need for such dynamic reclassification altogether. Insufficient Target MORs. In almost all cases, we assume there is a sufficiently strong set of downstream MORs to support any upstream ordering requirement. In rare cases, this may not actually hold, and then Algorithm 4.1 would fail when search- ing for the WeakestSufficientMOR. This is not a limitation of ArMOR, but rather an inherent limitation of certain architectures themselves. For example, sequential consistency cannot be enforced onto Itanium relaxed loads and stores, even with mf fences interleaved [Int10], and NVIDIA GPUs simply do not provide any mecha- nism for enforcing cumulativity or atomicity [NVI13b]. In practice, there are often workarounds such as replacing Itanium relaxed loads and stores with ld.acq and st.rel, respectively. For all architectures surveyed in this thesis other than GPUs, a sufficiently strong MOR is in fact present, and so this limitation does not appear.

82 Property Real System Simulator System 8-core 4-core CPU Xeon X7560 gem5 O3 Frequency 2.27 GHz 2.0 GHz Pipeline OoO OoO L1I Cache 32kB, private 32kB, private L1D Cache 16kB, private 64kB, private L2 Cache 256kB, private 2MB, shared Cache coherence protocol MESI MOESI hammer Memory timing model N/A Ruby

Table 4.1: System configurations

Speculation. ArMOR does not inhibit the use of speculative ordering en- forcement techniques [DMT13, GGH91], as long as these techniques maintain the architecturally-required behaviors.

4.4 Evaluation Methodology

In this section, we describe our evaluation of the translation shims. We first provide a characterization of the breadth of applicability of ArMOR by generating shims for a number of upstream and downstream models. We then evaluate the performance of a subset of these models. We break our performance evaluation into two parts. We first implement ArMOR shims as software Pintools [LCM+05]. With near-native speeds, this approach allows for rapid exploration of various design possibilities. We then evaluate ArMOR in hardware by inserting shims into the gem5 O3 simulator pipeline [BBB+11].

4.4.1 Pintool-Based Exploration Methodology

Software-based dynamic binary translation can be used by architects to explore the performance impact of different hardware ordering requirements, fence implementa- tions, or translation approaches prior to their being hardened into a processor. It can also be used as an implementation in and of itself, whether it be standalone, within 83 an emulator, within a virtual machine manager, or as a component within any other tool. We use the Pintool approach to quantify the performance impact of stateful- ness (as opposed to naive statelessness) in shims, and we explore some additional performance-oriented optimizations. We use Intel Pin [LCM+05], an x86-based dynamic binary translation framework, to implement our software shims. Pin is a widely-used tool which allows users to write custom “pintools”, or instrumentation routines, to perform the desired analysis. Be- cause Pin executes on the x86 architecture and therefore has TSO as the downstream model, we use SC as the upstream model. We evaluate three shim configurations. The first is the naive case which always inserts a LOCKed instruction or mfence between each pair of memory instructions. The second is the stateful shim generated by Algorithm 4.1 and shown in Figure 4.2. Third, the ISA-assisted scenario approximates the benefits of augmenting an ISA to track software- or compiler-provided information about accesses that do not need to enforce consistency. An increasing body of work has proven the benefits of provid- ing hardware support for finer-grained specification of memory consistency behav- ior [CKS+11, SNM+12]. Because we are constrained by Pin’s need to execute on real hardware (which has no such ISA support), we instead present approximations which closely model the performance benefits of enabling such modifications. The ISA-assisted scenario considers two ways in which the ISA can be augmented. First, certain accesses might be marked thread-private and hence not subject to re- ordering rules. Even relatively straightforward compiler analysis is able to classify as many as 81% of memory accesses [SNM+12] as private. We approximate this by inferring thread-privacy for all accesses to the stack. While this is not safe in general, our analysis reveals that it is safe for our benchmark suite3. This approximation

3There are cases in which worker threads access objects allocated by the main thread, but these are properly synchronized via pthreads.

84 classifies 75% of accesses as thread-private, very close to the percentage found by the previous work. Second, we model the benefits of a compiler annotating memory accesses as being data-race-free, and thus not subject to any reordering constraints [AH90, BA08]. For our pre-C11/C++11 benchmark suite, all synchronization accesses occurred through libraries such as libpthread or inline assembly, with the remainder of the program accesses remaining data-race-free. Because library behavior may not be precisely known at compilation time, we conservatively assume that all library code is poten- tially subject to races (and hence needs to be analyzed by the shim). We run Pintool experiments on the real system from Table 4.1. We use bench- marks from PARSEC [Bie11] with the native input set and four threads. We take three measurements for each scenario: the non-Pintool native runtime of the bench- mark (“native”), the runtime of the benchmark with analysis enabled but fence in- sertion itself disabled (“instrumentation”), and the runtime with fence insertion en- abled (“shim”). This allows us to roughly separate the overhead of the shim from the overheads of Pin itself. We use LOCK-prefixed add instructions as the primary downstream MOR; these are equivalent to mfence but 28% faster in our experiments.

4.4.2 Hardware Simulation Methodology

While Pin offers opportunities for early exploration, hardware support can further accelerate translation. Here we study incorporating shims as hardware FSMs within a processor pipeline. In particular, we use the gem5 simulator to implement a hardware shim within the issue queue of the gem5 O3 pipeline [BBB+11]. In the issue queue, the shim has enough information to properly track both instruction dependencies and preserved program order. It does not, however, have enough information to distinguish same- vs. different-address relationships, and so the experiments in this section always conservatively assume that both need to be enforced.

85 PO+ PO+ PO+ PO+ PO+ PO+ PO+ PO+ SA DA SA DA SA DA SA DA Ld Ld St St Ld Ld St St PO Ld X X X X PO Ld X — X X PO St X X XS XS PO St XL — XM XM (a) Full fence (b) PLO PPO

PO+ PO+ PO+ PO+ PO+ PO+ PO+ PO+ SA DA SA DA SA DA SA DA Ld Ld St St Ld Ld St St PO Ld X — X X PO Ld X X X X PO St XL — XS XS PO St XL — XM — (c) MSFence (d) PSO PPO

PO+ PO+ PO+ PO+ PO+ PO+ PO+ PO+ SA DA SA DA SA DA SA DA Ld Ld St St Ld Ld St St PO Ld X X X X PO Ld X — X X PO St X X XM — PO St XL — XM — (e) MLFence (f) LSO PPO Figure 4.6: Available downstream PPO and MORs of the gem5 O3 simulated CPU.

By design, gem5 decouples the pipeline model itself from the ISA being imple- mented, and so the same pipeline can be used for each scenario. This fact makes gem5 a good simulation environment for isolating the effect of memory consistency models from the effects of the ISA itself. As Figure 4.6 summarizes, the gem5 O3 pipeline is multi-copy-atomic and inherently enforces load store ordering, while load load → → and/or store store ordering enforcement (of strength S) are optional. We adjust → X these options to implement a variety of downstream models. At the gem5 O3 issue queue, regardless of the architecturally-defined fences, three downstream fences are available microarchitecturally: a load fence, a store fence, and a full fence. The MOSTs for these fences are also summarized in Figure 4.6. The fences are implemented microarchitecturally by treating an associated memory access as non-speculative. This requires that before the access executes, it must be at the head of the reorder buffer and the store buffer must be empty. The associated

86 Ld/MLFence;Ld Ld/MLFence;Ld X - X - St/St MLFence -- m - St/St PO PO Ld/MLFence;Ld Ld St Ld/Ld PO Ld X X PO St — St/St -- -- (start) X m - St/St --

Figure 4.7: The shim FSM generated for SC on PLO (for visual clarity, the MOST and pending ordering tables are abbreviated). The bottom two states are transient because there are no downward arrows to return to them from the top two states. The top two states are redundant because their behavior is identical. Therefore, this FSM reduces to a single state. operation is also treated as a load and/or store barrier to prevent subsequent memory microops of the relevant type(s) from executing until it has itself completed. Table 4.1 gives specifications for our simulated system. Because the generated FSMs are small, we assume they can be updated in parallel with other pipeline operations with no incurred latency. We use PARSEC [Bie11] benchmarks with the simsmall input set and 4 threads. We execute these benchmarks on four downstreams: TSO, PSO, PLO (partial load order, named by analogy to SPARC PSO), and LSO (load store order enforced). We compare the performance for each case, including → both stateless and stateful shims.

4.5 Experimental Results

4.5.1 Shims for a Broad Range of Scenarios

To demonstrate the full breadth of applicability of ArMOR, we automatically generate shim FSMs for various combinations of the upstream and downstream consistency models. A summary is shown in Table 4.2; the shaded subset highlights the scenarios evaluated in the next two sections. The full analysis is presented in Appendix B. We also optimize away transient and redundant states, as in Figure 4.7, to reduce the

Model   Informal Description
SC      Sequential consistency
TSO     SPARC Total Store Ordering
PSO     SPARC Partial Store Ordering
PLO     Partial load ordering: TSO, except load→load ordering is not enforced
LSO     Only load→store ordering is enforced
RMO     SPARC relaxed memory ordering
RMO+    RMO variant with sixteen fences representing four independent choices of load→load, load→store, store→load, and store→store ordering
PwrA    A multi-copy-atomic variant of Power
Pwr     Power
ARM     ARMv7

(a) Memory models evaluated (see Appendix B for details)

                                Target MCM
Source MCM   TSO  PLO  PSO  LSO  RMO  RMO+  PwrA  Pwr  ARM
SC            2    1    2    1    1    2     2     1    1
TSO           -    2    2    4    3    5     4     1    1
PLO           -    -    2    2    2    4     3     1    1
PSO           -    2    -    2    3    3     2     1    1
LSO           -    -    -    -    2    2     2     1    1
RMO           -    -    -    -    -    -     1     1    1
PwrA          -    -    -    -    -    -     -     1    1
Pwr           -    -    -    -    -    -     -     -    1
ARM           -    -    -    -    -    -     -     1    -

(b) Number of states in the shim for each pairing of upstream and downstream models. The performance of the shaded FSMs is evaluated in Section 4.5.3.

Table 4.2: ArMOR analysis can be used to generate shims for a broad range of upstream and downstream memory model pairings.

implementation cost. These results show that shim FSMs are generally very small, and hence that they can be implemented in practice with little area cost. Notably, adding state does not help in every situation. For example, Figure 4.8 shows that the FSM generated for TSO upstream and ARM/Power downstream minimizes to a trivial machine which always inserts sync.

While it may seem to be overkill to insert a sync before each memory access, Figure 3.2 highlighted that anything weaker would permit behaviors such as iriw to become illegally observable. In fact, every multi-copy-atomic upstream paired with a downstream allowing non-multi-copy-atomic stores produces an FSM which is just as inefficient. This is not a shortcoming of ArMOR, but rather a fundamental difference in the behavior of stores on each architecture.


Figure 4.8: Automatically-generated shim FSM for TSO upstream and Power downstream (for visual clarity, the MOST and pending ordering tables are abbreviated). This FSM can be simplified into a single state.

We return to this observation in Section 4.6. Lastly, we note that we attempted to target GPUs as well, but we were limited both by the incompleteness of current specifications and, more fundamentally, by a lack of both multi-copy atomicity and cumulative fences [ABD+15]. With neither feature, GPUs simply have no means by which to enforce implicit (e.g., on TSO) or

explicit (e.g., Power sync or ARM dmb) cumulativity requirements, and hence they are unable to serve as downstream targets for translation.

4.5.2 Performance of Software Shim Implementations

Figure 4.9 shows the performance of the three Pintool shim configurations of Section 4.4.1. We normalize to the runtime of each benchmark when it is compiled for x86-TSO and executed natively on x86-TSO hardware; this conservatively attributes the inherent overhead of SC vs. TSO to shims as well. Ideally, an optimized “native x86-SC” machine would be a more appropriate baseline, but such hardware is not readily available. Each level of optimization provides dramatic performance increases over more naive configurations. The stateless shim has a geometric mean overall performance cost of 9.33×.

[Figure 4.9 bar chart: runtime normalized to TSO on TSO for blacksch., bodytr., canneal, dedup, facesim, ferret, fluidan., freqmine, raytr., streamcl., swapt., vips, and GMean, each with naive, stateful, and ISA-assist bars; the tallest bars range from 10.39× to 18.27×.]

Figure 4.9: Performance overhead of shims using dynamic binary translation and different levels of performance optimization. From left to right, the three bars represent the stateless, stateful, and ISA-assisted stateful cases, respectively.


Figure 4.10: Simulated performance with hardware shims, x86-TSO software and varying hardware models.

The stateful configuration improves this to 3.05×. Finally, making use of the ISA augmentations discussed in Section 4.4.1 reduces the total overhead to just 1.33×. The instrumentation overhead was approximately the same for each Pintool—1.31× on average. This shows that the shims themselves do not introduce significant overhead beyond the overhead of instrumentation itself—175% in the case of our conservative stateful configuration, but only 3% in the more aggressive ISA-assisted case. These numbers demonstrate that shim-based translation can take place with low or, under the right conditions, even negligible overhead in practice beyond what is already needed to perform dynamic binary translation. They also demonstrate the value of using software-based DBT as a tool for exploring the design space of and profiling the use of synchronization in practice.
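For reference, the geometric-mean figures quoted in this section (e.g., 9.33×, 3.05×, 1.33×) are computed over per-benchmark runtimes normalized to the TSO-on-TSO baseline. The numbers in the sketch below are made up purely to show the calculation; they are not the measured PARSEC results.

# Sketch: how a geometric-mean slowdown is computed from per-benchmark
# runtimes normalized to TSO-on-TSO.  Placeholder values only.
from math import prod

normalized = [2.8, 3.4, 2.6, 3.5]   # runtime_shim / runtime_baseline per benchmark
geomean = prod(normalized) ** (1.0 / len(normalized))
print(f"geometric mean slowdown: {geomean:.2f}x")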


Figure 4.11: Simulated performance with hardware shims, x86-SC software and varying hardware models, normalized to x86-TSO software on x86-TSO hardware as described in the text.

4.5.3 Performance of Hardware Shim Implementations

Figures 4.10 and 4.11 show the overheads of running x86-TSO and x86-SC software, respectively, on cores with weaker hardware memory models. Stateful shims are shown with striped bars; however, the optimized FSMs for SC-on-PLO and SC-on-LSO are stateless, and so we draw only one bar for these. We normalize to x86-TSO hardware, just as we do for the Pintool results. For x86-TSO upstream, in some cases, the benefits of statefulness are negligible. Since upstream LOCKed operations and fences are rare under x86-TSO, these FSMs turn out to mostly stay in a single state equivalent to the stateless case. For this reason, implementations may choose to treat the lightly-used state as transient (as in Figure 4.7) and optimize it away to make the FSM even cheaper than it already is. Notably, if the shims had been placed in a position at which they could distinguish and track whether accesses were made to the same or to different addresses, then the distinction between stateful and stateless shims would become more meaningful. However, this positioning and/or tracking may itself introduce new area, power, or performance overheads which may outweigh the benefits of the more sophisticated shims. These kinds of design decisions must be made on a case-by-case basis. Fortunately, ArMOR provides the flexibility to derive shim designs for any such case. In all cases, the overheads remain well within an acceptable range for dynamic binary translation [BCC+10, LCM+05]: even the worst-case overhead of SC on RMO requires a geomean slowdown of only 77%. In the best case, TSO on PSO, the overheads are as small as 10%.

4.6 Takeaways

Our explorations both via Pintools and via simulation of hardware-supported shims have led to several major takeaways. First, architectures should provide a way to optionally make stores multi-copy atomic. A multi-copy-atomic variant of Power, labeled “PwrA” in Table 4.2, does allow for more efficient FSMs than does the original non-multi-copy-atomic version. If the user is sure that no iriw-like behavior will occur, then multi-copy atomicity can be disabled to improve performance; otherwise, it can be enabled to ensure safety.4 Notably, ARMv8 has taken this approach with new load-acquire and store-release opcodes [ARM13]. ArMOR provides a rigorous methodology for performing this analysis. Our second observation is that the more downstream MORs (i.e., fence variations) are available, the more intelligent the translation can be. The difference between “RMO” and “RMO+” in Table 4.2 is that the former implements only the three fences shown in Figure 4.6, while the latter implements sixteen possibilities, with one choice each for load→load, load→store, store→load, and store→store. Having finer-grained downstream fences allows for smarter fence choices and higher-performance implementations.

4 The iriw litmus test is frequently debated in the memory model community; it is generally considered esoteric and unlikely to occur in reality, and the cost of preventing it is generally large, yet the behavior is highly counterintuitive and as a result it is often forbidden in spite of the cost [Alg12, BA08].

Third, ISAs and intermediate representations should maintain consistency metadata even if it is redundant. In particular, ISAs with strong models (e.g., SC, TSO) carry little information about consistency in the binary, as it is mostly redundant with PPO. However, this lack of metadata makes translation much more difficult, as the overly-constrained preserved program orderings of a strong model like TSO are themselves costly and mostly unnecessary. Specifically, weak software models derive their performance from enforcing strong ordering only on specially-marked synchronization accesses. On strong hardware models in which both types of accesses map onto the same loads and stores, non-synchronization accesses can no longer be reordered freely, even when translating back onto a weaker model, as the metadata has by that point been lost. Keeping consistency information in the ISA would provide numerous benefits (shown in Section 4.5.2 and previous work) at the cost of a modest code size increase. In such a scenario, shims can be used to dynamically remove upstream synchronization that becomes redundant under a stronger downstream model. Finally, we note that non-multi-copy-atomic architectures cannot ignore both cumulativity and store atomicity. If they do, then there simply is no way to implement communication across more than two cores safely. While current hardware that does so (e.g., GPUs) simply limits the amount of inter-thread communication that can take place, the increasingly heterogeneous hardware of the future will demand the ability to perform such many-threaded concurrent tasks. Fortunately, ArMOR provides a way to evaluate those needs early in the design process.

4.7 Related Work

Fence Insertion and/or Elimination. The work of Alglave et al. [AMSS10] has a goal similar to ours in that it studies how to restore the behavior of one architecture by inserting fences on a weaker architecture. Their definition of cumulativity is subtly different from the definition given in the Power architectural specification [IBM13], and their proof-based method does not readily adapt to a modified definition. More critically, their solution is declarative: it specifies only a static correctness condition rather than a constructive dynamic translation method. Furthermore, their correctness condition depends partially on inserting fences between loads and their source stores. ArMOR makes no such assumption about identifying a load’s source store, as such information is often simply unavailable. Since the work of Shasha and Snir [SS88], researchers have considered topics such as verifying the insertion of fences to implement a stronger consistency model [BAM07, KVY10] and/or the elimination of redundant fences [VN11]. Others focus on automatically determining where to insert fences [Alg12, HR07], and also on incorporating such methods into a compiler [LP00, SFW+05]. Cross-ISA Translation. DeVuyst et al. [DVT12] study heterogeneous-ISA code migration. They focus on laying out data in an architecture-independent manner, and they use compiler support and bursts of dynamic binary translation to smooth the migration process. They assume, however, that the source and target ISAs have identical consistency models; they do not address translation of memory ordering requirements. Various case studies have studied translation in more specific contexts, including Baraz et al. [BDE+03] for x86 code on Itanium processors, Higham and Jackson [HJ06] for SPARC to and from Itanium, and Gschwind et al. [GEAS00] from the “firm” model (similar to TSO) onto Power. Industry white papers [Bro02] have also discussed this

topic. None of these techniques, however, easily generalize to other architectures as ArMOR does.

4.8 Chapter Summary

This chapter provided an in-depth case study of the broad applicability of MOSTs. It demonstrated not only how ArMOR analysis techniques can be applied to various existing architectures, but also how they can be applied to varying microarchitectures and/or to hypothetical ISA extensions as well. Furthermore, it showed how ArMOR can be used to deliver both correctness and performance in use cases which are forward-looking and which will likely become increasingly relevant. In summary, we believe that ArMOR can inspire the designs of future heterogeneity-aware architectures and that it can be easily extended to support many other MCM analysis use cases in the future. Appendix B presents a gallery of dynamic binary translation shims for a wide range of upstream and downstream memory model combinations.

Chapter 5

PipeCheck: Specifying and Verifying Microarchitectural Enforcement of Memory Consistency Models

Previous chapters have focused on developing a specification language and an analysis framework for memory models. This chapter and the next instead focus on specifying and verifying how a given memory model specification is actually implemented at the microarchitecture level.

5.1 Introduction

Memory model analysis performed at the architecture level—that is, at the level of the specification provided by vendors—intentionally abstracts away the details of any one particular microarchitecture itself. This fact leaves such models unable to describe how the ordering requirements of the memory model are actually enforced.

In particular, although any one particular microarchitecture may implement optimizations such as out-of-order execution or speculative reordering of instructions, architecture-level memory models are by design unable to capture such behaviors or to check whether they correctly enforce all required orderings. This leaves a large and problematic verification gap between the careful formal analysis happening at the software and architecture levels and the reality of how high-performance or even commodity processors implement those models at the microarchitecture level. As a result, implementations of memory system features frequently contain bugs which can only be solved through high-overhead patches [AMD11, ARM11] or by simply disabling new features altogether [Int15]. This chapter presents PipeCheck, a tool and methodology for specifying microarchitecture-level multi-event axiomatic consistency models and for automatically verifying that those models meet (or exceed) the requirements of a given architectural consistency model. Each microarchitecture-level model consists of a set of localized axioms about the behavior of the microarchitecture at various physical locations. The relationships prescribed by these axioms are then merged to form microarchitectural happens-before (µhb) graphs, an extension of standard happens-before graphs used in axiomatic memory models to describe microarchitecture-level events and happens-before relationships. Using these graphs, PipeCheck verifies that each ordering edge that must be preserved according to the architectural consistency model (e.g., each store→store ordering for Total Store Ordering (TSO)) is in fact provably maintained by the microarchitecture. As a result, PipeCheck reduces the problem of verifying consistency model implementation correctness to the more tractable problem of verifying a series of independent axioms about ordering enforcement at various points in the microarchitecture. It also reinterprets abstract architecture-level axioms such as preserved program order (Section 2.2.1) as propositions which can themselves be re-derived from the same microarchitecture-level axioms.

At the microarchitecture level, consistency is collectively enforced by the combination of many components: the pipeline, the cache hierarchy, the coherence protocol, and so on. Furthermore, while properties such as coherence and consistency are often intentionally decoupled at the architecture level [Mar05], they are nevertheless often tightly coupled at the microarchitecture level. This thesis decomposes the problem into two chapters. This chapter presents the PipeCheck approach and µhb graphs in general, and it focuses on the enforcement responsibilities of the pipeline under the assumption of an “idealized” memory system. Chapter 6 then analyzes in depth the responsibilities of various coherence protocols in enforcing consistency, and it then delivers a complete model of their combined enforcement. The rest of this chapter is organized as follows. Section 5.2 describes a motivating example. Section 5.3 describes the PipeCheck microarchitecture-level approach. Section 5.4 presents the verification approach. Our analysis methodology and tool flow is described in Section 5.5. Sections 5.6 and 5.7 summarize overall results and highlight important case studies, respectively. Lastly, Section 5.8 describes related work, and Section 5.9 concludes. Finally, Chapter 6 extends PipeCheck analysis to coherence protocols and to the coherence-consistency interface.

5.2 Motivating Example

Recall from Chapter 2 that axiomatic memory model analysis typically involves the creation of a happens-before (hb) graph subject to certain axioms or constraints. Happens-before graphs represent each program as a graph in which vertices represent memory instructions and/or abstracted instruction visibility events in the program. Edges between these vertices represent ordering relationships between the source and

destination of the edge: an edge from an instruction s to another instruction d indicates that s happens before d, in some formal sense defined by the model. Figure 5.1 presents an example using litmus test mp. The fact that analyzing the litmus test (Figure 5.1a) according to the TSO specification (Figure 5.1b) results in a cyclic graph (Figure 5.1c) indicates that the result is forbidden. The specification of TSO in Figure 5.1b uses the following edge types which are (informally) summarized below [Alg12]:

• “reads from” (rf): if d reads from s, then s must happen before d, from at least some points of view. “rfe” (rf-external) represents rf orderings between different cores.

• “write serialization” or “coherence order” (ws): s comes before d in the (assumed) total ordering of all stores accessing a given location, from the point of view of the memory hierarchy

• “from reads” or “reads before” (fr): s is a read that gets its value from a write that comes before d in the set of ws edges. “fre” (fr-external) represents fr orderings between different cores.

• “mfence” (mfence): s comes before d in program order, and there is an mfence instruction in between them in program order

• “preserved program order” (ppo): s comes before d in program order, and either s is a load or d is a store

• “program order, same location/address” (po-loc): s comes before d in program order, and s and d access the same memory address

One goal of PipeCheck microarchitecture-level analysis is to reinterpret the abstract architecture-level relationships listed above in the context of a particular microarchitecture. Some of these abstract edges will be made more concrete, in the sense that they will become associated not just with instructions but also with specific points in a pipeline. Others will be entirely replaced by multiple nodes and edges that represent finer-grained relationships that correspond to specific microarchitecture-level behavior, as discussed below.

Thread 0           Thread 1
(i1) [x] ← 1       (i3) r1 ← [y]
(i2) [y] ← 1       (i4) r2 ← [x]
Outcome r1=1, r2=0: Forbidden under TSO
(a) Litmus Test Code

acyclic (rfe ∪ ws ∪ fre1 ∪ ppo ∪ mfence)
acyclic (rf ∪ ws ∪ fr ∪ po-loc)
(b) Axiomatic specification of the TSO memory model [Alg12]

[Graph: (i1) →ppo→ (i2) →rfe→ (i3) →ppo→ (i4) →fre→ (i1)]
(c) The cycle in the graph for the first axiom indicates that this execution is forbidden.

Figure 5.1: Load→load and store→store ordering litmus test iwp2.1/amd1/mp.
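The acyclicity check at the heart of this analysis is straightforward to mechanize once the edges are known. The toy sketch below hard-codes the edges relevant to the first TSO axiom for this mp outcome and runs a standard cycle detection; it is illustrative only, and it is neither herd nor the PipeCheck solver.

# Toy version of the acyclicity check in Figure 5.1, with the mp edges
# written out by hand.  Illustrative only.

def has_cycle(nodes, edges):
    """Standard depth-first-search cycle detection over a directed graph."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}
    def dfs(n):
        color[n] = GRAY
        for (s, d) in edges:
            if s == n:
                if color[d] == GRAY or (color[d] == WHITE and dfs(d)):
                    return True
        color[n] = BLACK
        return False
    return any(color[n] == WHITE and dfs(n) for n in nodes)

instrs = ["i1", "i2", "i3", "i4"]
# Edges in (rfe U ws U fre U ppo U mfence) for the outcome r1=1, r2=0 of mp:
edges = [
    ("i1", "i2"),   # ppo: both stores are on thread 0
    ("i2", "i3"),   # rfe: (i3) reads the value written by (i2)
    ("i3", "i4"),   # ppo: both loads are on thread 1
    ("i4", "i1"),   # fre: (i4) reads the value that (i1) overwrites
]
print(has_cycle(instrs, edges))   # True -> the outcome is forbidden under TSO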

5.2.1 Microarchitecture-Level Analysis

The analysis in Figure 5.1c says nothing about the behavior of any individual microarchitectural implementation of that architecture. On one hand, certain architecturally-permitted behaviors may not be observable on a given microarchitecture. For example, a sequentially consistent (SC) pipeline is a valid implementation of a TSO architecture, although many executions that are legal under TSO will not be observable in such a pipeline—the microarchitectural memory model is stricter than the architecture requires. On the other hand, architecturally-forbidden behaviors may be observable on a given microarchitecture—this would correspond to a bug in the implementation. In this case, the microarchitecture is erroneously weaker than the architecture requires.

1 Alglave et al. use fr in place of fre here, but the two are equivalent: fr-internal, fri = fr − fre, must be in agreement with po by the second axiom, and therefore fri is contained in ppo.

(a) Classic five-stage RISC pipeline plus a FIFO store buffer and an unordered memory system. This pipeline example recurs throughout this chapter.

[µhb graph for (i1)–(i4) over the locations FetchStage, DecodeStage, ExecuteStage, MemoryStage, WritebackStage, LeaveStoreBuffer, and WriteToMemHierarchy; edge legend: Intra-Instruction, Intra-Location, Non-Local, Program Order, and rf/fr/ws edges replacing ppo.]
(b) In this example, the preserved program order (ppo) edges are discarded as an architecture-level assumption, and the model verifies that enforcement required by those edges is instead enforced by some combination of µhb edges.

Figure 5.2: Microarchitecture-level analysis of mp on the given microarchitecture.

As a running example of a microarchitecture, we will use the classic RISC five-stage pipeline, augmented with a store buffer, as shown in Figure 5.2a. The meanings of the nodes and edges are described in detail in Section 5.3.2. For clarity, we assume here that this microarchitecture has no cache; Chapter 6 will describe how to model cache behavior in pipelines for which consistency and coherence are interdependent. Furthermore, in this example we assume that the memory system and any interconnects and networks-on-chip are unordered. This assumption is made in order to highlight an important point: even for an in-order pipeline, the presence of an unordered network and/or memory hierarchy can result in memory accesses being reordered from the point of view of other processors.

As a result, PipeCheck must also account for such behavior. Although we use a simple example here to build intuition, we verify more complex processors, including OpenSPARC T2, later in the chapter. Figure 5.2b shows the edges from Figure 5.1c as translated into the PipeCheck model of the microarchitecture of Figure 5.2a. The four memory operations, (i1), (i2), (i3), and (i4), are depicted from left to right, and the various stages in the microarchitecture are shown from top to bottom. Specifically, each vertex corresponds not just to a memory instruction, but also to a particular location within the pipeline or memory system. Each column of vertices therefore corresponds to that instruction progressing through the various locations in the microarchitecture. The various edge types will be described in detail in later sections.

While ws edges are defined with respect to main memory, PipeCheck maps the endpoints of rf and fr edges to the points in the pipeline at which the instructions can be considered to have performed with respect to the core at the other endpoint. As discussed in Section 2.1.1, a store from core i has traditionally been said to have performed with respect to core j when a load of the same address issued by core j returns the value written by the store (or by some subsequent store) [DSB86, SD87]. A load from core i has performed with respect to core j when a subsequent store from core j to the same address cannot affect the value returned by the load. A store or a load has performed globally when it has performed with respect to all cores. More recent studies, however, have criticized this definition as being too hypothetical for formal analysis [AFI+09, NSS+09]. From a microarchitectural point of view, Section 5.3.1 will therefore unambiguously define “perform” in terms of locations in the pipeline. Continuing the example,

the rf edge from (i2.WriteToMemHierarchy) to (i3.MemoryStage) indicates that instruction (i2) must have performed with respect to core 1 (i.e., been written back

from core 0 to the memory hierarchy) before instruction (i3) performs with respect to core 0 (i.e., reads memory from the memory stage of the core 1 pipeline).

5.2.2 Deriving Correctness from Microarchitecture-Level Axioms

A key benefit of our microarchitecture-level analysis is that it treats microarchitecture-level correctness as a goal that can be derived from a set of simpler, easier-to-verify self-contained axioms of a microarchitecture. Each microarchitecture-level axiom is independent, and hence each can be independently verified as truly representative of the underlying transistors. The goal of this approach is to create a situation in which verification of the validity of each individual axiom with respect to the register transfer level (RTL) specification of the microarchitecture is significantly more tractable than verification of the microarchitecture as a whole would be. Figure 5.2b presents one example of the axiom-based approach. In this figure, we draw (in dashed green) orderings that are maintained at each location individually; this calculation will be the focus of Section 5.3. In this example, with an in-order pipeline, each pipeline stage independently maintains at its output the instruction ordering it observes at its input. Taken together, these axioms result in the po orderings being maintained throughout the pipeline, with the notable exception of writes sent to memory. The ordering of stores at the memory hierarchy is instead maintained by the store buffer via the diagonal edge from (i1.WriteToMemoryHierarchy) to (i2.LeaveStoreBuffer).

The analysis in Figure 5.2b treats ppo edges as propositions rather than axioms: it checks that the pipeline properly maintains the ppo edges, rather than simply assuming their presence. Consider the ppo edges that were part of the cycle in Figure 5.1c. While their presence was assumed in the architecture-level analysis,

PipeCheck does not assume them; it checks for them. The ppo edge from (i1) to (i2) is enforced by a sequence of three microarchitecture-level “happens before” edges, as

shown highlighted in gray in Figure 5.2b. The ppo edge from (i3) to (i4) is enforced by a single microarchitecture-level edge. Together with the observed microarchitecture-level rf and fr, the union of the highlighted edges forms the microarchitecture-level equivalent of the cycle in Figure 5.1c. Thus, at least for this particular example, the ppo edges are correctly maintained, and hence the pipeline correctly implements the TSO restrictions on this litmus test. While this example covers only one test, later sections will describe the process of fully verifying all possible orderings.
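The same check applies unchanged at the µhb level, where each node is an (instruction, location) pair. The sketch below hand-codes a subset of the edges from Figure 5.2b (path, store-buffer, in-order-stage, rf, and fr edges) and confirms that they form a cycle, so this candidate execution cannot occur. The encoding is invented for illustration; the real tool derives these edges from the axioms rather than listing them by hand.

# Toy µhb-level check for mp on the pipeline of Figure 5.2a.  Illustrative only.
from collections import defaultdict

def acyclic(edges):
    """Kahn's algorithm: True iff the directed graph given by `edges` is acyclic."""
    nodes = {n for e in edges for n in e}
    indeg = defaultdict(int)
    for _, d in edges:
        indeg[d] += 1
    work = [n for n in nodes if indeg[n] == 0]
    visited = 0
    while work:
        n = work.pop()
        visited += 1
        for s, d in edges:
            if s == n:
                indeg[d] -= 1
                if indeg[d] == 0:
                    work.append(d)
    return visited == len(nodes)

edges = [
    # Path edges: each store passes through the store buffer to memory.
    (("i1", "Writeback"), ("i1", "LeaveStoreBuffer")),
    (("i1", "LeaveStoreBuffer"), ("i1", "WriteToMemHierarchy")),
    (("i2", "Writeback"), ("i2", "LeaveStoreBuffer")),
    (("i2", "LeaveStoreBuffer"), ("i2", "WriteToMemHierarchy")),
    # FIFO store buffer: (i1) must drain before (i2) may leave.
    (("i1", "WriteToMemHierarchy"), ("i2", "LeaveStoreBuffer")),
    # In-order memory stage on core 1 (propagated down from fetch order).
    (("i3", "MemoryStage"), ("i4", "MemoryStage")),
    # rf: (i3) reads the value that (i2) wrote to the memory hierarchy.
    (("i2", "WriteToMemHierarchy"), ("i3", "MemoryStage")),
    # fr: (i4) reads the value that (i1) later overwrites.
    (("i4", "MemoryStage"), ("i1", "WriteToMemHierarchy")),
]
print("cycle found; this execution is unobservable" if not acyclic(edges)
      else "acyclic; this execution would be observable")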

5.3 PipeCheck Microarchitecture-Level Analysis

A central observation of PipeCheck is that orderings between instructions are often too complicated to be captured by a single architecture-level “happens before” edge. A single pair of instructions may fetch in order, issue out of order, execute out of order, commit in order, and perform globally out of order. Restricting “happens before” edges to specify only orderings with respect to the memory hierarchy ignores all of the other important memory orderings that take place within the pipeline itself. PipeCheck therefore defines “microarchitectural happens before” (µhb) edges to specify both an instruction and a particular location within the pipeline:

Definition 1 (Microarchitectural Happens-Before (µhb)). A µhb graph is a directed graph (V, E) in which each vertex v ∈ V represents some microarchitecture-level event taking place at some particular physical location, and in which each edge (s, d) ∈ E represents a guarantee that event s happens before event d.

Most commonly, each microarchitecture-level event refers to some instruction i passing through some particular physical location l (e.g., a particular pipeline stage) in the microarchitecture. However, events can also refer to the sending or receiving of a message, a change in the state of some element, or any other physical event. The requirement that each event be associated with some location is key to the transitivity properties that we discuss in Section 5.3.3. Throughout this thesis, we depict µhb graphs in a grid, as in Figure 5.2b, with instructions along the x-axis and microarchitectural locations or events along the y-axis. Not all instructions pass through all locations (e.g., loads do not occupy the store buffer), and so some entries in the grid are left empty. Despite the grid depiction, only relationships depicted by arrows provide any ordering guarantee. As with any model checking approach, PipeCheck is able to verify a pipeline only to the extent that the model provided is faithful to the orderings enforced by the underlying implementation. If the user specifies an axiom which is not in fact guaranteed by the microarchitecture, then PipeCheck may erroneously declare a pipeline to be correct, even though it may actually be buggy. On the other hand, if the user omits an ordering axiom which turns out to be valid, then PipeCheck may erroneously declare a pipeline incorrect, even though it may actually work correctly. A key advantage of the PipeCheck approach is that each PipeCheck axiom tends to be more localized and/or finer-grained, meaning that verification of each axiom individually should be significantly easier than verifying coarser, more abstract architecture-level edges. For example, the next subsection will decompose the coarse-grained preserved program order ppo edges into a series of finer-grained, per-pipeline-stage ordering guarantees. Since each per-stage axiom is localized to a single module of the pipeline, it will be easier to verify each per-stage property than to verify the correctness of the coarser ppo, which involves many pipeline components.

5.3.1 Microarchitecture Definition

In PipeCheck, a microarchitecture is modeled as a set of microarchitecture-level axioms. These axioms specify various properties of and/or orderings enforced by the given microarchitecture. For example, axioms may specify:

• The legal path(s) per instruction type.

• A local ordering guarantee at each pipeline stage

• Performing locations within each path, and a specification of how instructions interact with each other

• Edges specifying non-local happens-before relationships enforced via sending/receiving of messages

• Any other edges which represent orderings that are somehow guaranteed to be enforced

These terms are more carefully defined below. During execution, as instructions flow through the pipeline, they pass through the chosen locations along some well-defined path. A memory instruction may have more than one legal path through a pipeline, depending on the type of instruction, the state of the pipeline, and/or the state of the memory system during execution. For example, a read may take a different path depending on whether it performs by reading from the store buffer, from the cache via a cache hit, or from the cache after a cache miss. Paths may overlap entirely, significantly, or not at all. A typical path will involve a straight-line sequence of µhb edges enforcing the order in which each instruction passes through pipeline stages: e.g., Fetch → Decode → · · · → Commit. Different pipelines will of course require varying sequences of path edges. To more precisely define “in order” and “out of order” sections of a pipeline, we can define a local reordering guarantee axiom for each pipeline stage. This specifies the reorderings that the pipeline stage does or does not permit on instructions passing through it. In one extreme, a FIFO local reordering guarantees that instructions arriving at a particular pipeline stage in a certain order will leave that stage in the same order. For example, a local reordering guarantee for a pipeline decode stage

may make the following guarantee:

If: (i1, Fetch) --µhb--> (i2, Fetch),
Then: (i1, Decode) --µhb--> (i2, Decode).
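Operationally, such a local reordering guarantee behaves like a rule: whenever the premise edge is present in a candidate µhb graph and both instructions pass through the next stage, the conclusion edge is added. The sketch below applies per-stage FIFO rules of this form until no new edges appear; the encoding is invented for illustration and does not reflect the PipeCheck solver's internal representation.

# Sketch: FIFO local-reordering axioms as edge-adding rules.  Illustrative only.

IN_ORDER_STAGES = [("Fetch", "Decode"), ("Decode", "Execute"),
                   ("Execute", "MemoryStage"), ("MemoryStage", "Writeback")]

def apply_fifo_axioms(edges, nodes):
    """edges: set of ((instr, loc), (instr, loc)); nodes: set of (instr, loc)."""
    changed = True
    while changed:
        changed = False
        for (prev, nxt) in IN_ORDER_STAGES:
            for ((i1, l1), (i2, l2)) in list(edges):
                if l1 == l2 == prev and (i1, nxt) in nodes and (i2, nxt) in nodes:
                    new_edge = ((i1, nxt), (i2, nxt))
                    if new_edge not in edges:
                        edges.add(new_edge)
                        changed = True
    return edges

nodes = {(i, s) for i in ("i1", "i2")
         for s in ("Fetch", "Decode", "Execute", "MemoryStage", "Writeback")}
edges = {(("i1", "Fetch"), ("i2", "Fetch"))}          # program order at fetch
apply_fifo_axioms(edges, nodes)
print((("i1", "Writeback"), ("i2", "Writeback")) in edges)   # True: order propagates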

At the other extreme, a pipeline stage may not guarantee any such orderings, and so there would be no local reordering guarantee axiom for such locations. For example, an out-of-order issue stage or an unordered network may fall into this category. In between the two extremes can be any other form of guarantee, such as “maintain orderings between dependent instructions only” at an issue queue. Also, notably, the local reordering performed by a reorder buffer stage may take the form of “restore the ordering seen at the entrance to the rename stage”, for example. The specific guarantees of each pipeline stage will vary from processor to processor. Inter-instruction µhb edges specify how instructions interact with one another. At a high level, these edges correspond to static relationships (e.g., address, control, and/or data dependencies) and/or dynamic dependencies (e.g., reads-from, write serialization, or from-reads). Each such edge cannot just be interpreted as a purely inter-instruction relationship, however. As each µhb edge specifies a relationship between particular µhb nodes, each inter-instruction µhb relationship inherently specifies an ordering between two events at two specific physical locations. In this way, µhb edges are more directly representative of the actual microarchitecture-level mechanisms used to enforce the specified ordering than more abstract architecture-level edges such as preserved program order (ppo). For example, consider a reads-from (rf) relationship. If a load reads from a previous store on the same pipeline, then one µhb interpretation is to say that the store must have entered but not yet left the store buffer at the time the load accesses

it. This would imply two µhb edges:

(i1, EnterStoreBuffer) --µhb--> (i2, AccessMemory)
(i2, AccessMemory) --µhb--> (i1, ExitStoreBuffer),

where EnterStoreBuffer and AccessMemory both occur in the MemoryStage in our running example. On the other hand, if a load reads from main memory, that store must have already been written into the memory hierarchy, in which case the following edge would be added:

(i1, WriteToMemHierarchy) --µhb--> (i2, AccessMemory).

The choice of source and destination nodes for each inter-instruction edge represents the locations at which each instruction can be said to have performed with respect to the target core. As described in Section 5.2.1, previous work defined the point at which an instruction performs as the point at which the effect of that instruction becomes visible to (possibly hypothetical) memory accesses from other cores [DSB86]. PipeCheck instead defines the point at which an instruction has performed with respect to a given core in terms of that instruction reaching some particular location within the microarchitecture. This maintains the intention of the traditional definition given in Section 5.2.1 while removing its hypothetical nature.

5.3.2 PipeCheck Model Specification Language

PipeCheck microarchitecture models are described using a µhb graph-centric domain-specific language (DSL) based on first-order logic.2 Variables in the DSL represent instructions or threads. As shown in Table 5.1, the logic includes a pre-defined set of predicates ranging over variables of the types listed above.

2 The original PipeCheck paper used a more primitive specification format directly embedded within the Coq specification language Gallina [LPM14, LPM15]. The DSL described in this thesis is a standalone evolution of that original approach.

Predicate              Argument Types        Notes
IsAnyRead              uop
IsAnyWrite             uop
IsAnyFence             uop
AccessType             string, uop           e.g., AccessType("RMW", i) or AccessType("lwsync", f)
SameMicroop            uop, uop
SameThread             uop, uop
ProgramOrder           uop, uop
ConsecutiveMicroops    uop, uop
InThread               threadID, uop
SameThreadID           threadID, threadID
SameAddress            uop, uop
SameData               uop, uop
DataFromInitialState   uop                   by address
DataFromFinalState     uop                   by address
NodeExists             node
NodesExist             list node
EdgeExists             edge
EdgesExist             list edge

Table 5.1: Predicates used in the PipeCheck DSL

The predicates are used to build first-order logic formulas which represent the axioms of the model. Two predicates in particular—NodesExist and EdgesExist—are interpreted as adding (or, if they are negated, forbidding the addition of) nodes and edges to a µhb graph, respectively. In this way, the DSL encodes a primitive satisfiability modulo theories (SMT) problem over the theory of directed acyclic graphs [BSST09]. The constraint solver must do more than find a satisfying assignment for all of the literals; it must find a solution such that the nodes and edges represented by the predicates chosen form an acyclic µhb graph. Figures 5.3 through 5.5 show the PipeCheck definition of the classic RISC pipeline of Figure 5.2a. First, Figure 5.3 defines a set of macros that are expanded within the axioms. The macros serve as an organizational tool to keep the specification clean. Figure 5.4 then defines a set of axioms specifying the behavior of load, store, and mfence instructions. Figure 5.5 adds three additional components. First, it adds

% Legend:
% "/\" = AND
% "\/" = OR
% "~" = NOT
% "=>" = IMPLIES
% "%" = COMMENT
%
% Graph node = (instruction, location/event)
% - The notation (instruction, (0, location/event)) indicates that all cores
%   share a single memory hierarchy, and that the shared memory hierarchy
%   is associated with core 0 in PipeCheck
% Graph edge = (node, node, label)
%
% "c" is predefined to be the core ID

StageName 0 "Fetch".
StageName 1 "Decode".
StageName 2 "Execute".
StageName 3 "MemoryStage".
StageName 4 "Writeback".
StageName 5 "LeaveStoreBuffer".
StageName 6 "WriteToMemHierarchy".

DefineMacro "STBFwd": exists microop "w", ( IsAnyWrite w /\ SameCore w i /\ SameAddress w i /\ SameData w i /\ EdgesExist [((w, MemoryStage), (i, MemoryStage ), "rf_stbuf"); ((i, MemoryStage), (w, (0, WriteToMemHierarchy)), "rf_stbuf")]) /\ (~exists microop "w’", IsAnyWrite w’ /\ SameAddress w w’ /\ SameCore w w’ /\ ProgramOrder w w’ /\ ProgramOrder w’ i).

DefineMacro "STBEmpty": forall microop "w", ( (IsAnyWrite w /\ SameCore w i /\ SameAddress w i /\ ProgramOrder w i) => EdgeExists ((w, (0, WriteToMemHierarchy)), (i, MemoryStage), "STBEmpty")).

DefineMacro "ReadsFromMemory": % Read from "w", and there must not exist any writes w’ to the same address % between w and i exists microop "w", ( IsAnyWrite w /\ SameAddress w i /\ SameData w i /\ EdgeExists ((w, (0, WriteToMemHierarchy)), (i, MemoryStage), "rf") /\ ~(exists microop "w’", SameAddress i w’ /\ EdgesExist [((w , (0, WriteToMemHierarchy)), (w’, (0, WriteToMemHierarchy))); ((w’, (0, WriteToMemHierarchy)), (i , MemoryStage ))])).

DefineMacro "BeforeOrAfterEveryWriteToSameAddr": % Either before or after every write to the same physical address forall microop "w", ( (IsAnyWrite w /\ SameAddress w i) => (EdgeExists ((w, (0, WriteToMemHierarchy)), (i, MemoryStage ), "ws*;rf") \/ EdgeExists ((i, MemoryStage ), (w, (0, WriteToMemHierarchy)), "fr"))).

Figure 5.3: PipeCheck model of the microarchitecture of Figure 5.2a, 1/3.

111 Axiom "WriteSerialization": forall microops "i1", forall microops "i2", ((~SameMicroop i1 i2) /\ IsAnyWrite i1 /\ IsAnyWrite i2 /\ SameAddress i1 i2) => (EdgeExists ((i1, (0, WriteToMemHierarchy)), (i2, (0, WriteToMemHierarchy)), "ws") \/ EdgeExists ((i2, (0, WriteToMemHierarchy)), (i1, (0, WriteToMemHierarchy)), "ws")).

Axiom "Reads": forall microops "i", (OnCore c i /\ IsAnyRead i) => EdgesExist [((i, Fetch ), (i, Decode ), "path"); ((i, Decode ), (i, Execute ), "path"); ((i, Execute ), (i, MemoryStage), "path"); ((i, MemoryStage), (i, Writeback ), "path")] /\ ( ExpandMacro STBFwd \/ ( ExpandMacro STBEmpty /\ ( ExpandMacro ReadsFromMemory /\ ExpandMacro BeforeOrAfterEveryWriteToSameAddr ) ) ).

Axiom "Writes": forall microops "i", OnCore c i => IsAnyWrite i => EdgesExist [((i, Fetch ), (i, Decode ), "path"); ((i, Decode ), (i, Execute ), "path"); ((i, Execute ), (i, MemoryStage ), "path"); ((i, MemoryStage ), (i, Writeback ), "path"); ((i, Writeback ), (i, LeaveStoreBuffer ), "path"); ((i, LeaveStoreBuffer), (i, (0, WriteToMemHierarchy)), "path")] /\ ( % Only one at a time allowed out from the store buffer forall microop "w", ( (IsAnyWrite w /\ SameCore w i /\ ProgramOrder i w) => EdgesExist [((i, (0, WriteToMemHierarchy)), (w, LeaveStoreBuffer), "STBOne")]) ).

Axiom "mfence": forall microops "f", OnCore c f => IsAnyFence f => EdgesExist [((f, Fetch ), (f, Decode ), "path"); ((f, Decode ), (f, Execute ), "path"); ((f, Execute ), (f, MemoryStage), "path"); ((f, MemoryStage), (f, Writeback ), "path")] /\ ( forall microops "w", ((IsAnyWrite w /\ SameCore w f /\ ProgramOrder w f) => EdgeExists ((w, (0, WriteToMemHierarchy)), (f, Execute), "mfence"))).

Figure 5.4: PipeCheck model of the microarchitecture of Figure 5.2a, 2/3.

112 Axiom "RMW": forall microop "w", IsAnyWrite w => AccessType RMW w => (forall microops "i2", ProgramOrder w i2 => IsAnyRead i2 /\ EdgeExists ((w, (0, WriteToMemHierarchy)), (i2, MemoryStage), "rmw")) /\ (exists microop "r", ConsecutiveMicroops r w /\ IsAnyRead r /\ AccessType RMW r /\ ~exists microop "w’", IsAnyWrite w’ /\ SameAddress w w’ /\ EdgesExist [((r, MemoryStage ), (w’, (0, WriteToMemHierarchy))); ((w’, (0, WriteToMemHierarchy)), (w , (0, WriteToMemHierarchy)))]).

Axiom "PO/Fetch": forall microops "i1", forall microops "i2", (OnCore c i1 /\ OnCore c i2 /\ ProgramOrder i1 i2) => EdgeExists ((i1, Fetch), (i2, Fetch), "po").

Axiom "Decode_stage_is_in-order": forall microops "i1", forall microops "i2", EdgeExists ((i1, Fetch), (i2, Fetch)) => NodesExist [(i1, Decode); (i2, Decode)] => EdgeExists ((i1, Decode), (i2, Decode), "ppo_uarch").

Axiom "Execute_stage_is_in-order": forall microops "i1", forall microops "i2", EdgeExists ((i1, Decode), (i2, Decode)) => NodesExist [(i1, Execute); (i2, Execute)] => EdgeExists ((i1, Execute), (i2, Execute), "ppo_uarch").

Axiom "Memory_stage_is_in-order": forall microops "i1", forall microops "i2", EdgeExists ((i1, Execute), (i2, Execute)) => NodesExist [(i1, MemoryStage); (i2, MemoryStage)] => EdgeExists ((i1, MemoryStage), (i2, MemoryStage), "ppo_uarch").

Axiom "Writeback_stage_is_in-order": forall microops "i1", forall microops "i2", EdgeExists ((i1, MemoryStage), (i2, MemoryStage)) => NodesExist [(i1, Writeback); (i2, Writeback)] => EdgeExists ((i1, Writeback), (i2, Writeback), "ppo_uarch").

Axiom "STB_FIFO": forall microops "i1", forall microops "i2", EdgeExists ((i1, Writeback), (i2, Writeback)) => NodesExist [(i1, LeaveStoreBuffer); (i2, LeaveStoreBuffer)] => EdgeExists ((i1, LeaveStoreBuffer), (i2, LeaveStoreBuffer), "ppo_uarch").

Figure 5.5: PipeCheck model of the microarchitecture of Figure 5.2a, 3/3.

an axiom asserting that pairs of x86-TSO microops corresponding to an x86-TSO macroop with a LOCK prefix asserted (i.e., atomic read-modify-write instructions)

are indeed atomic. Second, it asserts that two instructions related by ProgramOrder pass through the fetch stage in that order. Lastly, it provides a set of local reordering guarantees for each remaining stage in the pipeline, in this case representing the fact that this pipeline is in-order. Some of the edges specified in Figure 5.3 are directly analogous to the reads-from

(rf), from-reads (fr), and write serialization (ws) edges defined in Section 5.2. Each is interpreted in terms of the performing location for each microop with respect to the destination core. Some may even have multiple legal interpretations; for example,

rf may be interpreted in the sense of forwarding from the store buffer or in the sense of reading from memory. In other cases, an architecture edge may not have an

analogous edge. Note how the architecture-level ppo is omitted entirely in favor of a series of finer-grained ppo_uarch edges. Lastly, there may be microarchitecture-level edges which have no analogous architecture-level edges. For example, the "path" and "STBEmpty" edges do not directly correspond to any of the edge types described in Figure 5.1b. While the complexity of a model depends on the complexity of the underlying microarchitecture, the complexity of the model does not necessarily scale linearly with the complexity of the microarchitecture. The model of Figures 5.3 to 5.5 is a relatively simple pedagogical example. Nevertheless, complex real-world models may be analyzed in a very similar manner; our PipeCheck model of the OpenSPARC T2 industry-strength processor is only just under twice as long as the model of our simple pedagogical pipeline. Sections 5.6 and 6.6.4 demonstrate how well the PipeCheck approach scales, both in terms of the ability to describe complex real-world microarchitectures and the ability to verify their correctness very quickly.

Furthermore, in keeping with standard tradeoffs in the field of model checking, the contents and granularity of the model are somewhat subject to the discretion of the user. Finer-grained models (i.e., ones with nodes modeling comparatively more distinct locations or events) will generally be closer to the behaviors of the register transfer level (RTL) definition of the microarchitecture. This will in turn generally mean that verification of the validity of each axiom will be comparatively easier. However, the graphs (and graph counts) will also likely be larger, and so verification time will increase. On the other hand, coarser-grained models (with fewer distinct locations/events) are likely faster to analyze, but the user then takes on a larger burden in ensuring that the coarser axioms are still truly representative of the underlying microarchitecture.
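To give a flavor of what a satisfiability problem over the theory of directed acyclic graphs means in this setting, the toy sketch below enumerates the alternatives offered by a disjunctive axiom (for instance, a load reading either from the store buffer or from memory, as in the rf example earlier) and reports whether any choice yields an acyclic graph. This brute-force enumeration is purely conceptual; the actual solver is considerably more sophisticated.

# Conceptual sketch: search over disjunctive edge choices for an acyclic graph.
from itertools import product

def acyclic(edges):
    """Return True iff the directed graph given by `edges` has no cycle."""
    nodes = {n for e in edges for n in e}
    indeg = {n: 0 for n in nodes}
    for _, d in edges:
        indeg[d] += 1
    work = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while work:
        n = work.pop()
        seen += 1
        for s, d in edges:
            if s == n:
                indeg[d] -= 1
                if indeg[d] == 0:
                    work.append(d)
    return seen == len(nodes)

def observable(base_edges, disjunctions):
    """base_edges: edges required by every axiom.  disjunctions: a list of
    disjunctive axioms, each a list of alternative edge sets; one alternative
    must be chosen per axiom.  Observable iff some combination is acyclic."""
    for choice in product(*disjunctions):
        candidate = list(base_edges)
        for alt in choice:
            candidate.extend(alt)
        if acyclic(candidate):
            return True
    return False

# Hypothetical example: one required edge plus a load that may read either
# from the store buffer or from memory, where only the first choice is acyclic.
base = [("A", "B")]
alternatives = [[[("B", "C")],                    # e.g., forward from the store buffer
                 [("C", "A"), ("B", "C")]]]       # e.g., read from memory (cyclic here)
print(observable(base, alternatives))   # True: the store-buffer choice is acyclic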

5.3.3 Transitivity of µhb Edges

Section 2.2.4 demonstrated that in many memory models, transitivity of inter-instruction orderings is not always guaranteed. In other words, since there are in general many competing points of view by which orderings can be defined, it is possible that for instructions (i1), (i2), and (i3), (i1) may appear to happen before (i2), (i2) may appear to happen before (i3), and (i3) may appear to happen before (i1), all simultaneously. Likewise, Figure 2.8 demonstrated that edges in single-event axiomatic models cannot in general be transitively composed, because single-event models search for cycles only among edges with certain carefully-chosen labels, and the transitive closure of two edges with different labels may not itself always have its own well-defined label. A major benefit of PipeCheck (and of many multi-event axiomatic models [MHMS+12]) is that all edges are in fact treated equally, regardless of their label or their reason for being added to the graph. As described in Section 2.2.2, in most single-event axiomatic models, the axioms search for cycles only among

Thread 0            Thread 1
(i1) [x] ← 1        (i4) [y] ← 1
(i2) r1 ← [x]       (i5) r3 ← [y]
(i3) r2 ← [y]       (i6) r4 ← [x]
Outcome: 0:r1=1, 0:r2=0, 1:r3=1, 1:r4=0: Permitted under TSO
(a) Litmus test iwp2.4/amd9

[Graph: (i1) →rfi→ (i2) →ppo→ (i3) →fre→ (i4) →rfi→ (i5) →ppo→ (i6) →fre→ (i1)]
(b) Architecture-level analysis. Even though there is a cycle, there is no cycle involving the subsets in Figure 5.1b, and so the outcome is permitted.

[µhb graph for (i1)–(i6) over FetchStage, DecodeStage, ExecuteStage, MemoryStage, WritebackStage, LeaveStoreBuffer, and WriteToMemoryHierarchy, with po, rf, and fr edges; the rf edges originate at WriteToMemoryHierarchy.]
(c) Cyclic (and hence unobservable) µhb graph for an execution in which all reads skip the store buffer and access memory

[Similar µhb graph in which the rf edges for (i2) and (i5) originate at the MemoryStage, i.e., the loads are satisfied by store-buffer forwarding.]
(d) Acyclic (and hence observable) µhb graph for an execution in which reads (i2) and (i5) forward data from writes still in the store buffer

Figure 5.6: Analyzing litmus test iwp2.4/amd9.

edges with certain specific labels, and therefore edges with different labels must be treated differently. In PipeCheck, on the other hand, cycle checking is performed using all edges in the graph regardless of their labeling. As a consequence, it is always legal to take the transitive closure of µhb edges.3 Microarchitecture-level transitivity is respected because each µhb edge represents either a local ordering at a particular microarchitectural location or a communicated message (i.e., a Lamport clock [Lam78] “happens before”). Note that microarchitecture-level transitivity does not imply architecture-level transitivity, and that the presence of architecture-level cycles does not imply the presence of microarchitecture-level cycles. Similarly, causality (Section 2.2.4) is also not a consequence of µhb transitivity. This lack of implications ensures that PipeCheck can describe architecture-level memory models which are not themselves transitive or causal (e.g., Power [IBM13]). For example, Figure 5.6a shows a litmus test which demonstrates that the presence of store buffering can be observed by software. This litmus test was first analyzed in Section 2.2.1. Figure 5.6b shows that naive architecture-level analysis would produce a cyclic graph. However, neither of the two specially-chosen hb graph subsets used to define TSO in Figure 5.1b are cyclic, and hence the outcome remains permitted. Figures 5.6c and 5.6d show two µhb graphs (i.e., two of the various legal graphs) for the same litmus test. Figure 5.6c shows that a scenario in which all inter-instruction communication takes place through memory results in a cyclic graph. However, Figure 5.6d shows that the scenario in which two of the loads read from the store buffer is in fact acyclic and hence observable. This last case shows an example of how the presence of an architecture-level cycle (Figure 5.6b) does not imply the presence of a microarchitecture-level cycle (Figure 5.6d).

3For legibility, we do not draw the full transitive closure in our figures. Nevertheless, edges resulting from the transitive closure of any other edges in the graph can always be inferred.

This analysis highlights two facts. First, µhb graphs restore an intuitive one-to-one correspondence between the presence/absence of a cycle and non-observable/observable outcomes, respectively. Although it may not matter to automated analysis algorithms, this one-to-one correspondence makes it easier for humans to reason about the µhb graphs being drawn. Second, µhb graphs directly depict the manner in which a particular outcome may be observed. In other words, it is clear from Figure 5.6d that the execution is observable only when loads are allowed to forward data from the store buffer. Although µhb graphs have significantly more nodes and edges than hb graphs, the association of a particular physical location with each µhb node means that each edge is described by the source and destination nodes alone.

5.3.4 Modeling Microarchitectural Optimizations

High-performance processors implement a number of microarchitecture-level optimizations which are functionally invisible to the user (and to architecture-level memory models) but which are nevertheless very important to correct memory ordering behavior in practice. Many such optimizations map very naturally into µhb graphs and PipeCheck axioms. Others require somewhat more abstraction. We discuss some particular cases below. One potential point of difficulty comes about in specifying models for superscalar pipelines and/or components such as caches with multiple ports. These can be most directly modeled in PipeCheck by simply treating each component with multiple lanes as distinct locations and by adding axioms to capture any cross-lane orderings that are enforced. However, using this strategy, the state space would grow quickly, as the microarchitecture specification would need to explicitly model both/all possibilities, leading to an exponential number of possibilities if analysis is done naively. Alternatively, one could assign an arbitrary priority between lanes sharing a location to resolve ties, or one could even imagine µhb edges which indicate “happens before or

at the same time as”. However, these deviate further from the basic approach of PipeCheck, and so we do not consider them further. Many microarchitectures use various forms of speculation to improve performance. PipeCheck models this simply by not including squashed speculated events in µhb graphs, as the consistency model imposes no requirements on squashed instructions. If a squashed instruction is later replayed, then that replay will be included in the graph (unless it is itself squashed). If a squashed instruction is not replayed (e.g., because it was only executed due to an incorrect speculation), then it simply does not appear. On the other hand, PipeCheck models can and do explicitly model correctly-speculated behaviors, as the implementation of the speculation mechanism(s) does in general affect the values returned by loads and hence is relevant to consistency verification. For example, if a load returns the value written by a speculative store, or if two loads may speculatively execute out of order, then the PipeCheck model of the microarchitecture in question must 1) capture the speculation explicitly, and 2) ensure that the loads only commit if their returned values are legal according to the memory model. This enforcement can take place in different ways. For example, squash events may themselves imply µhb behavior, in the sense that the replayed instruction will be guaranteed to restart after the event that caused the squash to take place. Section 5.7 describes various examples in more detail. Out-of-order execution and correct speculation both map very naturally into PipeCheck. To enable them, the axioms of a model must simply be relaxed to take into account the fact that some events might happen earlier than they might otherwise occur under in-order, non-speculative circumstances. Of course, relaxing the microarchitecture too much will result in violations of the architecture-level requirements. Often, then, the weakening of some axioms will require the addition of axiom(s) representing some “fallback” mechanism which watches for illegal reorderings and handles them in some

119 way. Section 5.7.1 provides a detailed example of how speculative load reordering is modeled in PipeCheck.

5.4 Verification Flow

While the previous section described how PipeCheck models are specified, this section describes how the correctness of each model is verified.

5.4.1 Litmus Test-Based Verification

Litmus tests are a standard and widely-used tool in verification of consistency models [AMT14, HVML04, MHMS+12, OSS09a, SSA+11]. As described in Section 2.1.4, litmus tests are (usually very short) programs designed to test particular rules or subcases for a consistency model. Due to the inherent complexity of defining even simple consistency models and/or due to incomplete or even incorrect documentation, even programs as short as five or six instructions

(e.g., amd6/iriw [BA08], n4/n5/n6 [OSS09a], A-cumulativity tests [AFI+09], coRSDWI/mp+dmb+fri-rfi-ctrlisb [AMT14]) can be very difficult to analyze properly. Nevertheless, the ability to execute litmus tests on real hardware allows them to serve as a valuable means for checking that a model is sound with respect to an actual implementation, and/or vice-versa. At a high level, verification proceeds according to the flow diagram of Figure 5.7. The PipeCheck tool itself takes in two inputs. First, it takes a microarchitecture model specification. These specifications are written in the domain-specific language of Section 5.3.2. Second, PipeCheck takes in a litmus test written using a pre-existing and standardized syntax [AMSS11]. Given these inputs, the automated constraint solver of Section 5.4.2 is used to determine whether the specification admits an acyclic

Figure 5.7: PipeCheck block diagram and toolflow. Pre-existing components are shown in white; shaded components were created for PipeCheck. (The µarch. spec. and litmus test feed FindAcyclicGraph (Alg. 5.1), yielding “Observable” or “Not Observable”; the arch. spec. and litmus test feed herd [AMSS10], yielding “Permitted” or “Forbidden”; the two verdicts are compared (Table 5.2), yielding “Matches arch. spec.”, “Stricter than necessary”, or “Weaker than expected (BUG)”.)

µhb graph for the litmus test. If so, the outcome is observable on the given microarchitecture model. If not, the outcome is not observable.

The next step depends on whether there exists an architecture-level specification against which the above outcome can be compared. Ideally, we would compare the above result to a specification of the expected architecture-level behavior to ensure that no forbidden outcomes are observable. However, an unfortunate reality of modern hardware development is that formal (or even informal) architecture-level MCM specifications simply may not exist. Accelerators in general, and GPUs in particular, serve as currently-prominent examples: GPU hardware specifications from industry are often weak and informal-to-nonexistent [AMD12, HSA13, NVI13b, NVI15]. Academic research has demonstrated that these specifications are both fundamentally flawed and often implemented incorrectly, meaning that they are clearly unsuitable as an absolute statement of correctness [ABD+15]. Likewise, modern GPU toolchains differ from CPU toolchains in that the GPU flow uses just-in-time (JIT) compilation from a software intermediate representation to a microarchitecture-specific instruction set, meaning that the traditional notion of an architecture-level abstraction layer

                           Microarchitecturally Observable         Microarchitecturally Not Observable
Architecturally Permitted  OK                                      OK (but stricter than necessary)
Architecturally Forbidden  Bug in model and/or microarchitecture   OK

Table 5.2: Approach to verifying a microarchitecture-level model against an architecture-level specification

does not even exist [HSA13, NVI13b]. Because of this, we consider two slightly distinct workflows, each of which is described in turn below.

When an Architecture-Level MCM Specification is Available. When possible, PipeCheck compares the expected behavior (according to the architecture-level model) to the observed behavior (according to the microarchitecture-level model). To determine the expected behavior, PipeCheck currently uses the herd tool, a pre-existing architecture-level analysis tool [AMT14], to determine whether the outcome proposed by the litmus test is permitted or forbidden on that architecture. The herd tool takes in the same litmus test and an architecture-level memory model specification written in the cats language, which has itself been used to describe a wide variety of memory models. However, PipeCheck and herd are decoupled, so other tools or models could easily be used as well.

The set of possible analysis outcomes is summarized in Table 5.2. The two most common outcomes are those in which the expected and observed behaviors correspond. Outcomes which are architecturally permitted and microarchitecturally observable are correct, as are outcomes which are architecturally forbidden and microarchitecturally unobservable.

An architecturally-permitted but microarchitecturally-unobservable outcome is not in itself an incorrect result. Such a situation may simply mean that the pipeline is stronger than strictly necessary. It may imply a performance penalty, given that stricter consistency models are correlated with lower performance, but the situation is generally legal.

It may also imply that one of the axioms in the model is stricter than what the underlying microarchitecture actually permits, which in turn implies that the overly-strong constraint should be removed from the model. In any case, care should be taken to ensure that the microarchitecture permits at least some outcomes (e.g., at least the sequentially consistent ones), as it is possible to (usually unintentionally) write a model which simply forbids all outcomes. A microarchitecture model which forbids all outcomes should not be considered correct.

An architecturally-forbidden but microarchitecturally-observable outcome does indicate a problem. The problem may be that the model is underconstrained: that some ordering enforced by the microarchitecture is not currently reflected in the model. In other cases, if the model is faithful, then the unexpected outcome may actually correspond to a consistency bug in the microarchitecture. In either case, in contrast to some constraint solvers, PipeCheck returns the explicit µhb graph corresponding to the unexpected outcome. An architect can then study this graph to determine where the fault lies and how best to correct it.

When an Architecture-Level MCM Specification is Not Available. Although a specification of expected behavior may not be available, it is still possible to simply analyze the behavior of a microarchitecture independently of any correctness specification. A microarchitecture model, once properly refined, may even serve as the inspiration for a later architecture-level specification. In this sense, our hope is that PipeCheck is used both as an exploration tool and as a verification tool.

It is also possible to use PipeCheck to reason about correctness with respect to a level of abstraction even higher than the architecture level. For example, one might reason about the correctness of a particular compiler/microarchitecture combination with respect to a given software-level specification. Previous work has proven the correctness of such mappings and/or compilers in the past [BMO+12, BSDA14, Ler09, PVJ15]. However, in such cases, the framework must explicitly address the way in

Algorithm 5.1 PipeCheck Constraint Solver
Given: µarch spec fFOL, as defined using the DSL of Section 5.3.2
Given: litmus test t

    // Eliminate quantifiers by explicitly enumerating over the (finite) domain(s) in t
    fprop ← EliminateQuantifiers(fFOL, t)

    // Convert fprop into the form used by the solver
    fprop,nnf ← ConvertToNegationNormalForm(fprop)

    // Algorithm 5.2, starting with an empty graph
    return FindAcyclicGraph(∅, fprop,nnf)

which higher-level constructs are mapped or compiled onto microarchitecture-level primitives. As such, the consideration of such mappings is outside of the scope of this thesis.
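To make the comparison step concrete, the following OCaml fragment sketches the four-way classification of Table 5.2. The type and function names are illustrative only and are not taken from the PipeCheck source.

type arch_verdict  = Permitted | Forbidden
type uarch_verdict = Observable | NotObservable

type comparison =
  | Matches                 (* expected and observed behaviors agree *)
  | StricterThanNecessary   (* architecturally permitted, but not observable *)
  | Bug                     (* architecturally forbidden, yet observable *)

let classify (a : arch_verdict) (u : uarch_verdict) : comparison =
  match (a, u) with
  | (Permitted, Observable)    -> Matches
  | (Forbidden, NotObservable) -> Matches
  | (Permitted, NotObservable) -> StricterThanNecessary
  | (Forbidden, Observable)    -> Bug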

5.4.2 Automated Constraint Solver Algorithm

The PipeCheck constraint solver is inspired by the Davis-Putnam-Logemann-Loveland (DPLL) algorithm that forms the core of many existing boolean satisfiability (SAT) solvers [DLL62, MMZ+01]. Pseudocode for the solver is given in Algorithms 5.1 and 5.2. As Figure 5.7 and Algorithm 5.1 show, the solver takes in two inputs: the microarchitecture specification (as specified using the DSL of Section 5.3.2), and a litmus test specified using the pre-existing litmus format [AMS+12]. The solver is divided into two phases: a preprocessing step and the core solver. We address each below in turn.

Solver Algorithm: Preprocessing Phase. The goal of the preprocessing phase is to parse the two inputs (the µarch spec and the litmus test) into a form that is suitable for automated satisfiability checking. The preprocessing phase is divided into the two steps discussed below.

First, because the formula is being applied to a particular litmus test, the domains over which variables in the formula may be quantified—microops and threads—are concrete and finite. Therefore, to ease the job of the constraint solver, the quantifiers are eliminated: each forall quantifier becomes a conjunction over its domain, and each exists quantifier becomes a disjunction over its domain. Likewise, all predicates except NodesExist and EdgesExist are evaluated to either true or false. The NodesExist and EdgesExist predicates become simple propositions. The above process produces a propositional formula representing the constraints of the microarchitecture model as applied to the litmus test in question.

Second, the propositional formula generated above is converted into negation normal form (NNF), a form in which the NOT operator may only be applied to individual propositions (as opposed to clauses) and in which only AND and OR operators are allowed as connectives. As a design choice, the formula is not converted into conjunctive normal form (CNF) as would be required by standard SAT solvers. Our implementation (Section 5.4.3) provides the ability at any step to view the partially-completed graph and the tree of remaining constraints. As such, we find that maintaining the structure in the form originally specified by the user is a significant help during the debugging process. Once the above two steps are completed, the NNF formula is passed (along with the empty graph) as the arguments to Algorithm 5.2 below.

Solver Algorithm: FindAcyclicGraph. At a high level, the algorithm follows a recursive backtracking approach. Candidate solutions (i.e., acyclic graphs) are built up incrementally by adding new edges to the graph. At various points in the search process, there may be two (or more) possible choices to take, and the solver will consider each choice one by one. If at any point the graph under construction becomes cyclic and/or the set of remaining constraints becomes unsatisfiable, all remaining solutions derived from that point are discarded, and the algorithm backtracks to a previous point to explore a different part of the solution space.
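Before turning to the pseudocode, the two preprocessing steps can be illustrated with a short sketch. The following OCaml fragment is purely illustrative and assumes a simple formula datatype with function-valued quantifier bodies; it is not the actual PipeCheck implementation (which is written in Coq and extracted to OCaml), and all names are hypothetical.

type 'lit formula =
  | Lit    of 'lit
  | Not    of 'lit formula
  | And    of 'lit formula list
  | Or     of 'lit formula list
  | Forall of (int -> 'lit formula)   (* body parameterized by a microop id *)
  | Exists of (int -> 'lit formula)

(* Step 1: each forall becomes a conjunction over the (finite) domain, and
   each exists becomes a disjunction over the same domain. *)
let rec eliminate_quantifiers (microops : int list) (f : 'lit formula) : 'lit formula =
  match f with
  | Lit _       -> f
  | Not g       -> Not (eliminate_quantifiers microops g)
  | And gs      -> And (List.map (eliminate_quantifiers microops) gs)
  | Or gs       -> Or  (List.map (eliminate_quantifiers microops) gs)
  | Forall body -> And (List.map (fun i -> eliminate_quantifiers microops (body i)) microops)
  | Exists body -> Or  (List.map (fun i -> eliminate_quantifiers microops (body i)) microops)

(* Step 2: push negations inward (De Morgan) until they rest on literals only,
   yielding negation normal form. *)
let rec to_nnf (f : 'lit formula) : 'lit formula =
  match f with
  | Lit _ | Not (Lit _) -> f
  | And gs       -> And (List.map to_nnf gs)
  | Or gs        -> Or  (List.map to_nnf gs)
  | Not (Not g)  -> to_nnf g
  | Not (And gs) -> Or  (List.map (fun g -> to_nnf (Not g)) gs)
  | Not (Or gs)  -> And (List.map (fun g -> to_nnf (Not g)) gs)
  | Forall _ | Exists _ | Not (Forall _) | Not (Exists _) ->
      invalid_arg "to_nnf: eliminate quantifiers first"

The real tool additionally evaluates all predicates other than NodesExist and EdgesExist during this phase, so that only graph-node and graph-edge propositions remain as literals.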

Algorithm 5.2 FindAcyclicGraph
Given: µhb graph g
Given: formula f of remaining constraints, in negation normal form

    repeat
        // Prune terms in f that are redundant/invalid given g
        f ← PropagateConstraints(f, g)
        // Find guaranteed literals, i.e., nodes or edges guaranteed to be in every possible
        // solution derived from the current state
        l ← GuaranteedLiterals(f)
        g ← g ∪ l
        // If g has become cyclic, prune the rest of this subtree
        if Cyclic(g) then return (NoSolutionFound, ∅)
    // If new guaranteed literals were found, iterate to keep pruning the search space
    until l = ∅

    if AllConstraintsSatisfied(f) then
        // The iteration has converged, and g is acyclic (because otherwise it would have
        // been pruned above). Hence we have found a valid acyclic graph.
        return (Observable, g)
    else if RemainingConstraintsUnsatisfiable(f) then
        // Backtrack and try a different candidate
        return (NoSolutionFound, ∅)
    else
        // Find a list of “branching edges”
        for all b ∈ FindBranchingEdges(f) do
            // Recurse using each branching edge candidate
            (result, g′) = FindAcyclicGraph(g ∪ b, f)
            // If a valid scenario was found during recursion, stop searching and return it
            if result = Observable then return (Observable, g′)
            else
                // Otherwise, backtrack to the current state and try another candidate
                continue
        // If none of the candidates returned a valid acyclic graph, then give up on this
        // subtree and backtrack one step further
        return (NoSolutionFound, ∅)

Each iteration is composed of two steps: an inner simplification loop and a recursive subcase exploration step. The simplification step relies on the concept of a guaranteed literal—a literal (a proposition or its negation) which must be true given its position within the formula. Guaranteed literals are analogous to unit clauses, or clauses containing only one literal, in CNF formulas: since every clause in a CNF formula must hold true, a literal in a CNF unit clause must itself hold true. The value of guaranteed literals (and of unit clauses) is that they allow for unit propagation, a rule for simplifying a propositional formula by substituting truth values for other occurrences of that literal in the formula. Under unit propagation, every non-negated occurrence of the same literal can be replaced with boolean true, and every negated occurrence of the same literal is unsatisfiable and can be replaced by boolean false. In PipeCheck, multiple instantiations of the same edge or node can likewise be replaced by boolean true, even when the edges may have different labels, as discussed in Section 5.3.3.

The µhb graph-centric focus of PipeCheck introduces two key distinctions from standard unit propagation/boolean constraint propagation. First, because PipeCheck formulas are not in CNF, literals in NNF unit clauses are not inherently guaranteed to be true, as they may be part of a broader disjunction. Nevertheless, this makes no difference in practice, as in either case literals are considered to be guaranteed if and only if they are reachable only through conjunctions. Second, the assignment of a truth value to a given literal (i.e., edge or node) may have a direct impact on the satisfiability of other literals. For example, the addition of an edge to the graph may cause other literals to become unsatisfiable, as the addition of the second edge may (in conjunction with the first edge) complete a cycle. Edge literals and node literals may affect each other in similar ways. In this sense, PipeCheck behaves as a primitive SMT solver for the theory of directed acyclic graphs, rather than as a pure SAT solver [BSST09].
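The guaranteed-literal rule above (a literal is guaranteed if and only if it is reachable only through conjunctions) can be sketched directly over an NNF formula. The fragment below is a simplified illustration with hypothetical names, not the PipeCheck implementation; in particular, it collects only positive guaranteed literals, which are the nodes and edges to be added to the µhb graph.

type node = string * string                 (* (microop, pipeline location) *)

type lit =
  | NodeExists of node
  | EdgeExists of node * node

type nnf =
  | Lit    of lit
  | NegLit of lit
  | And    of nnf list
  | Or     of nnf list
  | TrueF
  | FalseF

let rec guaranteed_literals (f : nnf) : lit list =
  match f with
  | Lit l    -> [l]
  | NegLit _ -> []                   (* guaranteed negations prune candidates; they add nothing to g *)
  | And fs   -> List.concat_map guaranteed_literals fs
  | Or _ | TrueF | FalseF -> []      (* anything under a disjunction is merely possible *)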

As the SAT-solving literature has observed, constraint propagation often has a large and cascading effect, and in practice most of the runtime of the solver may be spent in the constraint propagation phase [MMZ+01]. PipeCheck models tend to be no different in this regard; most models tend to have many guaranteed nodes and edges. For example, each instruction in a program will always have a fetch-to-decode edge, a decode-to-execute edge, and so on, and so these edges will be marked as guaranteed from the beginning. Furthermore, the sequence of per-pipeline-stage preserved program order axioms found in many pipeline definitions (e.g., as in Figure 5.5) will become guaranteed one iteration at a time. Often, in PipeCheck, this process runs to completion even without requiring any of the recursive calls described below.

Once the constraint propagation process has converged, there are three possibilities. First, if the remaining constraints reduce to boolean true, then a valid acyclic graph has been found, and the algorithm completes. Second, if the remaining constraints reduce to boolean false, then no solution derived from this point is valid, so the algorithm discards the current state and backtracks to the previous step. Third, if the remaining constraints reduce neither to boolean true nor to boolean false, then the constraints represent the set of possible solutions derived from the current state. In the latter case, the algorithm recurses to search for a solution among the set of remaining possibilities.

The recursion into subcases proceeds as follows. First, the algorithm searches among the remaining constraints for a CNF clause: a set of unassigned literals whose disjunction holds true according to the remaining constraints, as a clause in a CNF formula would. In other words, a CNF clause is a set of edges such that at least one of the edges in the set must be present in all solutions derived from the current point. For each candidate literal in the CNF clause, the algorithm recurses, passing the remaining constraints and the graph with the candidate edge(s) added. Each candidate is considered in turn, until a solution is found (at which point there is no need to continue searching) or until all possibilities have been exhausted (in which case the algorithm backtracks further).

If the algorithm finds a solution at any point during its search, it returns Observable, along with the graph that satisfies the constraints. If the algorithm backtracks all the way to the beginning without finding a solution, then the constraints are unsatisfiable for the given litmus test, and the algorithm returns NotObservable.
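One primitive that Algorithm 5.2 relies on but does not spell out is the Cyclic(g) test. Any standard cycle-detection routine suffices; the following depth-first search over an adjacency-list µhb graph is a minimal sketch with hypothetical types, not the actual PipeCheck code.

module IntMap = Map.Make (Int)

type graph = int list IntMap.t     (* µhb node id -> successor node ids *)

let cyclic (g : graph) : bool =
  (* 0 = unvisited, 1 = on the current DFS path, 2 = finished *)
  let state = Hashtbl.create 16 in
  let successors n = try IntMap.find n g with Not_found -> [] in
  let rec dfs n =
    match Hashtbl.find_opt state n with
    | Some 1 -> true                         (* back edge: a cycle exists *)
    | Some 2 -> false
    | _ ->
        Hashtbl.replace state n 1;
        let found = List.exists dfs (successors n) in
        Hashtbl.replace state n 2;
        found
  in
  IntMap.exists (fun n _ -> dfs n) g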

5.4.3 Software Implementation

To aid in exploration and verification of microarchitecture-level memory models, we implemented the PipeCheck flow of Figure 5.7 as an automated toolchain. We use a mixture of Coq, OCaml, and external tools. The core analysis routines (constraint checking, cycle detection, iteration until convergence, etc.) are written in Coq, an interactive theorem prover [The04]. The use of Coq allows propositions about correctness to be formally stated and proven, which can provide an extremely strong guarantee of reliability. It also allows our framework to be integrated with existing open-source architecture-level frameworks also written in Coq [AMSS10, Alg12]. At this time, however, we have not completed any formal proofs; see Section 5.4.6.

The interface to PipeCheck is written in OCaml. The core analysis routines are automatically extracted from their purely-functional Coq specifications into a set of OCaml functions using built-in features of the Coq framework. We then write an external interface to these analysis routines using native OCaml. This includes parsers for litmus tests and microarchitecture models (such as the ones presented in Figures 5.3 through 5.5). The entire framework is then compiled into a standalone tool, resulting in the flow depicted in Figure 5.7.

5.4.4 Runtime

We chose to write the analysis routines in Coq to allow the analysis to be formally specified and analyzed. However, this does come with a cost in terms of performance. Coq routines and data structures are generally optimized to be amenable to formal analysis rather than to be high-performance, and so the runtimes of many operations are both absolutely and asymptotically slower than might be possible when coding directly in OCaml (or any other language). Nevertheless, even with the overheads of naive Coq functions and data structures and of naive constraint solving, the graph counts and graph sizes remain small enough to be tractable, and so the analysis remains very practical. We provide quantitative evidence of this in Section 5.6. As a result, we consider the current balance between formalism and practical usability to be an important feature of PipeCheck.

The number of graphs each algorithm enumerates varies with the number of instructions in the test and the size of each pipeline. While potentially exponential in the worst case, the absolute numbers for each quantity are very small: long pipelines contain just dozens of stages, and the longest litmus test of the suite we survey contains eight instructions. Furthermore, the algorithm can (and does) terminate early if it finds an observable outcome. Finally, the graph checking can be trivially parallelized. This ensures that PipeCheck remains fast in practice.

5.4.5 Caveats

As with any formal verification process, the verification outcome is only valid to the extent that the model faithfully represents the underlying microarchitecture. In particular, a forbidden but observed outcome indicates either a pipeline bug or an incomplete specification. If the model is missing an axiom representing some ordering that is in fact enforced by the hardware, then the axiom can be added and the verification process can be retried. On the other hand, if all behaviors actually enforced by the pipeline are represented by some axioms in the model, then a verification failure does in fact represent an ordering bug in the underlying microarchitecture. In this case, the µhb edge which is determined to be missing can be enforced by modifying the microarchitecture accordingly; see Section 5.7.2 for an example.

5.4.6 Future Work: Formal Equivalence/Implication

One drawback of the litmus test-based approach in general is that it can suffer from a lack of coverage. Although test suites are large, they are not guaranteed to test every possible scenario, and so it is possible that some bug in a microarchitecture could go undetected due to a lack of coverage. This situation in fact occurred more than once during the development of PipeCheck; we have continued to add new microarchitecturally-inspired litmus tests throughout the development process. Researchers have studied the cost of enumerating a fully-comprehensive suite of litmus tests for a given architecture. However, even after careful elimination of redundancies, which leads to two-orders-of-magnitude reductions in test counts, the numbers still grow into the tens of thousands and beyond [MHAM10, MHAM11].

A compelling alternative to using litmus tests would be to formally prove that a microarchitecture is sound with respect to the given architectural memory model. The theorem of correctness would state that any program outcome observable on the microarchitecture is allowed by the architecture specification:

∀ (p ∈ programs),
    (∃ gµhb, IsValidScenarioµarch(p, gµhb)) =⇒ (∃ e, IsValidExecutionarch(p, e))

Or, contrapositively, any outcome forbidden by the architecture should be forbidden by the microarchitecture as well. The caveat of Table 5.2 would still hold:

Pipeline                        Lines of Code   Notes
RISC 5-Stage (w/o SB)           136             In-order pipe, unordered memory
RISC 5-Stage (Figs. 5.3-5.5)    167             In-order pipe, unordered memory
gem5 O3                         190             Simulator, OoO pipe, unordered memory
OpenSPARC T2                    233             Industry-strength, in-order pipe, mostly unordered memory

Table 5.3: Microarchitectures analyzed in this chapter.

permitted outcomes need not be observable on the microarchitecture. For example, to prove a microarchitecture correct with respect to the TSO memory model, one might use the formalization of Alglave et al. [Alg12, AMT14]:

∀ (p ∈ programs),
    (∃ gµhb, IsValidScenarioµarch(p, gµhb)) =⇒
        ∃ rf, ∃ ws,
            acyclic(ppo(p) ∪ mfence(p) ∪ rf ∪ ws ∪ fr(rf, ws)) ∧
            acyclic(poloc(p) ∪ rf ∪ ws ∪ fr(rf, ws))

For this thesis, we chose to focus on building a practical tool with broad applicability, a goal that stands in contrast to the often years-long process of properly formalizing hardware, microarchitecture, and mappings between them. We therefore leave full formalization and formal proofs to future work. See Section 7.1.2 for details.

5.5 Experimental Methodology

To demonstrate the effectiveness of PipeCheck, this chapter focuses on verification of processors implementing the TSO consistency model. TSO imposes non-trivial preserved program order requirements on all memory operations, making verification of TSO a particularly interesting target. Furthermore, its widespread use on x86 and other platforms makes its verification very important. However, PipeCheck can also be used to model implementations of Power, ARM, or other architectures which have weaker PPO requirements and use fences or dependencies as synchronization primitives.

Table 5.3 summarizes the four microarchitectures we survey in our results. The first two are the five-stage RISC pipeline of Figure 5.2a both without and with a store buffer. The former is effectively a sequentially-consistent core, meaning that some of the litmus test outcomes permitted under TSO should not be observable. These two microarchitectures reflect the size of pipelines that might be used in classrooms or as small embedded cores. The third is the gem5 O3 simulated pipeline (v10013) [BBB+11]. This represents an average-sized core and demonstrates how simulated cores are also amenable to analysis. Finally, we describe the OpenSPARC T2 pipeline, representing a well-documented industry microarchitecture [Sun07].

We analyze a comprehensive set of TSO litmus tests from Intel and AMD manuals, academic studies, random test generators, and custom additions [AMD13, AMS+12, Int07, OSS09b]. Each litmus test was analyzed on a four-core version of each pipeline, as none of our tests required more than four cores. We execute the extracted OCaml code and collect timing results using an Intel Xeon E5-2667 v3 server processor.

5.6 Results Across Litmus Tests

Table 5.4 shows the results of running the litmus test suite on each pipeline. Although we run the whole suite, only a core subset of the results is reproduced in the table; the rest of the suite behaves similarly. In this table, individual litmus tests are depicted as rows. For each row, the table shows whether TSO forbids or permits the outcome proposed by the test, and then shows its observability on the microarchitectures considered.

Litmus test        Expected (TSO)   RISC (no SB)   RISC (w/ SB)   gem5 O3   gem5 (fixed)   OpenSPARC
iwp2.1/amd1/mp     Forbid           =              =              O²        =              =
iwp2.2/amd2/lb     Forbid           =              =              =         =              =
iwp2.3a/amd4/sb    Permit           N¹             =              =         =              =
iwp2.3b            Permit           =              =              =         =              =
iwp2.4/amd9        Permit           N¹             =              =         =              =
iwp2.5/amd8/wrc    Forbid           =              =              O²        =              =
iwp2.6             Forbid           =              =              =         =              =
amd3               Permit           N¹             =              =         =              =
amd6/iriw          Forbid           =              =              O²        =              =
n1                 Permit           N¹             =              =         =              =
n2                 Forbid           =              =              =         =              =
n4                 Forbid           =              =              =         =              =
n5                 Forbid           =              =              =         =              =
n6                 Permit           =              =              =         =              =
n7                 Permit           N¹             =              =         =              =
rwc                Permit           N¹             =              =         =              =

¹Implementation more restrictive than TSO requires.
²Indicates the presence of a bug. See Section 5.7.2.

Table 5.4: Summary of litmus test results. “=”: Matches expected TSO outcome. “O”: Observable. “N”: Not observable.

In almost all cases, the microarchitecturally-observable behaviors correspond with the architecturally-specified behaviors. However, there are exceptions. For the RISC pipeline without a store buffer, six of the proposed litmus test outcomes are permitted under TSO but forbidden under sequential consistency. They should therefore not be observable on the simplest pipeline, as it was designed to execute in a sequentially consistent manner. The six rows with “N” markers confirm the hypothesis. On the other hand, three of the test results demonstrate the presence of a bug in the original specification of the gem5 O3 pipeline. This bug is discussed in detail as a case study in Section 5.7.2. For now, we simply observe that a problem is immediately apparent from the fact that three of the tests fail. As discussed in Section 5.4.1, the finding could indicate either that the model was under-constrained with respect to the underlying hardware or that the underlying hardware was itself buggy. As Section 5.7.2 demonstrates, the latter turned out to be the case. For comparison, we present results for the corrected version of the pipeline as well.


Figure 5.8: Verification Time Results (computed using extracted OCaml). (Runtime in seconds for each litmus test in the suite, for the FiveStage (No SB), FiveStage (w/ SB), gem5 O3, gem5 (fixed), and OpenSPARC T2 pipelines.)

Figure 5.8 shows the time taken to complete the verification process for each pipeline. This figure includes the entire suite of litmus tests. Notably, the entire suite runs in less than a second for each pipeline. The fast runtimes allow PipeCheck to be used in an interactive manner, thereby greatly aiding in the debugging process. They also demonstrate that even though the code is written in Coq and optimized for verifiability rather than performance, the automated PipeCheck analysis remains very practical.

The performance of the PipeCheck solver progressed over time. Published early prototypes of the PipeCheck tool used naive, exhaustive, brute-force exploration of the space. This remained scalable for early cases; runtimes were originally on the order of seconds to minutes [LPM14, LPM15]. However, as we scaled PipeCheck to run larger and/or more complicated cases (as in Chapter 6 and Section 7.1.1), we were motivated to improve both the algorithm and its generality in order to reduce the runtime. This resulted in a performance improvement of roughly two orders of magnitude.

Likewise, the generality of the microarchitecture specification format also improved greatly over time. The first version of PipeCheck was not flexible enough to handle atomic read-modify-write operations or fences of different varieties [LPM14, LPM15]. As a result, many of the tests shown in Figure 5.8 could not be run under the original model. Fortunately, the development of the PipeCheck DSL of Section 5.3.2 since the original publication allows for more flexible models, and we are now able to run 100% of the test suite.

5.7 Case Studies

Having defined the PipeCheck methodology, we now demonstrate its use by highlighting cases of particular interest.

Figure 5.9a: Speculative load reordering: although ppoarch is not enforced, a legal replacement pposlr is enforced, and it completes the cycle. (µhb graph for instructions (i1)-(i4) over the stages FetchStage, DecodeStage, RenameStage, IssueStage, ExecuteStage, WritebackStage, CommitStage, LeaveStoreBuffer, and WriteToMemHierarchy, plus CacheLine Invalidated events.)

5.7.1 Speculative Load Reordering

Many microarchitectures speculatively reorder load instructions for performance reasons [AMD13, GGH91, Int13a, SPA94b]. The key principle is that two loads l1 and l2 in program order can be speculatively reordered (i.e., l2 can execute before l1) as long as the value read speculatively by l2 is the same as it would have been had l2 in fact performed after l1 (i.e., non-speculatively). One implementation, as used by the gem5 O3 pipeline [BBB+11] that we analyze in this thesis, is to hook into the cache coherence protocol. Namely, if a private cache line has not been overwritten or invalidated (due to cache replacement or an external invalidate request) since an earlier read of that line, then the core can safely assert that a subsequent read of that line would return the same value. On the other hand, if the cache line is invalidated, the core is conservative and assumes that the invalidate indicates a failed speculation.

This implementation of speculative load reordering can be modeled in PipeCheck by including cache line invalidation as a “location” within the model. Figure 5.9a shows an example of PipeCheck’s use of this as applied to the gem5 O3 pipeline model and to the depicted litmus test. Extra vertices have been added to represent the invalidations of the cache lines that (i3) and (i4) read from, and the observed edges in the graph have been adjusted to account for these new vertices. In particular, the cache line that (i4) reads from must have been invalidated before (i1) wrote to memory to observe the proposed result.

PipeCheck uses this µhb graph to analyze the correctness of speculative load reordering. This is an out-of-order pipeline, so there is no sequence of edges guaranteeing ppoarch as there was in Figure 5.2b. However, although a µhb version of this edge would be sufficient, speculative load reordering proposes that it is not strictly necessary. Should the processor guarantee that the pposlr edge is enforced instead, this would be sufficient to prevent the forbidden outcome from being observed regardless of the presence of the ppoarch edge. As a result, the core can safely realize the performance benefits of reordering (i3) and (i4) as long as it enforces either ppoarch or pposlr.
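The invalidation-based check described above can be summarized with a small sketch. The following is an illustrative approximation, with hypothetical names and a simplified timestamp model, not the gem5 or PipeCheck source: a speculatively-performed load may commit only if the line it read was not invalidated between its speculative access and its commit; otherwise it is squashed and replayed, which is how the pposlr ordering of Figure 5.9a is enforced.

type spec_load = {
  line_addr      : int;
  performed_at   : int;          (* logical time of the speculative access *)
  invalidated_at : int option;   (* logical time of an invalidation of that line, if any *)
}

let may_commit (l : spec_load) (commit_time : int) : bool =
  match l.invalidated_at with
  | None   -> true
  | Some t -> not (l.performed_at <= t && t <= commit_time)

(* If may_commit returns false, the pipeline squashes and replays the load. *)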

5.7.2 Consistency Bug in gem5 O3 Pipeline

For the gem5 O3 pipeline, our PipeCheck results indicated that load→load ppo ordering was not guaranteed, and that four of the litmus tests (including mp, shown in Figure 5.10a) failed validation. These results could mean either that we omitted a critical set of non-local edges in the PipeCheck definition of the pipeline, or that PipeCheck had in fact found a bug in the implementation. To analyze further, we wrote a microbenchmark to execute mp in a tight loop. With this, the software was in fact able to observe the forbidden result, clearly indicating that PipeCheck had found a bug.

Figure 5.10a: Pipeline bug shown via the mp litmus test. The lack of a cycle indicates that the behavior is (erroneously) observable. (µhb graph for instructions (i1)-(i4) over the stages FetchStage through WriteToMemHierarchy, plus CacheLine Invalidated events.)

While difficult to find without PipeCheck, the load→load ordering bug is easily correctable in this case⁴. The pipeline already does correctly implement load→load ordering in some cases: it squashes and restarts the second of the two reordered loads if the core sees an invalidate (as described in Section 5.7.1) to the line read by the second load, but only if the accesses are to the same address. Simply removing the second condition is sufficient to restore correctness. Fortunately, as actual ordering violations are relatively rare, we believe this fix results in only minimal performance changes in practice. This case study demonstrates the ability of PipeCheck to automatically find and identify very specific pipeline bugs and/or missing guarantees in the specification of the pipeline.
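The fix itself amounts to weakening the squash condition. The following one-line sketch (hypothetical names, boolean flags standing in for the actual pipeline state) captures the difference between the buggy and corrected checks described above.

(* Buggy: squash the reordered younger load on an invalidation of its line
   only when the two loads target the same address. *)
let should_squash_buggy invalidation_to_younger_line same_address =
  invalidation_to_younger_line && same_address

(* Fixed: squash on any invalidation of the younger load's line, regardless
   of whether the two loads share an address. *)
let should_squash_fixed invalidation_to_younger_line _same_address =
  invalidation_to_younger_line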

Figure 5.11a: WeeFence [DMT13] eliminates the slow “baseline” dependency while maintaining the necessary ordering. (µhb graph for instructions (i1)-(i3) over the stages FetchStage through WriteToMemHierarchy, showing the “baseline” and “WeeFence” ordering edges and CacheLine Invalidated events.)

5.7.3 WeeFence

Our third case study uses PipeCheck to verify the correctness of WeeFence [DMT13]. WeeFence proposes a microarchitectural optimization to make enforcement of store→mfence→load orderings cheap under TSO. Specifically, they propose allowing post-fence loads to perform and retire prior to the fence itself, thereby reducing latency. We check that these continue to correctly enforce TSO in a particular implementation.

Figure 5.11a demonstrates the use of PipeCheck to validate the correctness of the WeeFence approach. Since their technique is not specific to a particular implementation, we apply it to the gem5 O3 pipeline model, as it allows out-of-order execution. In their baseline microarchitecture, the load may speculatively perform before the fence has retired, similarly to Section 5.7.1, but it may not retire until after the fence has retired. This in turn must happen after the store has written back

⁴The bug was independently fixed in revision 10149.

to memory. In other words, ordering is enforced from (i1.WriteToMemHierarchy) to (i2.CommitStage) to (i3.CacheLineInvalidate), where the last segment is enforced by squashing (i3) if necessary. They then propose the optimization of buffering or bouncing invalidates rather than monitoring for them, which in turn allows the read to safely retire non-speculatively, even before the store has written back to memory. This approach also enforces (i1.WriteToMemHierarchy) →µhb (i3.CacheLineInvalidate), but without the slow intermediate step of (i2.CommitStage), thereby saving latency. This analysis demonstrates how PipeCheck can be used to verify and then demonstrate the correctness of a microarchitectural optimization proposal.

5.8 Related Work

Shasha and Snir [SS88] and Collier [Col92] provided early frameworks to analyze programs running on machines with memory models weaker than sequential consistency. Graph-based axiomatic memory models are now widely used in academia and industry to model various architecture-level memory models of theoretical and practical interest [AAS03, Dig92, Alg12, AFI+09, AMSS10, AM06, Int13b, SPA94b, MHMS+12, YGLS03]. Most of these approaches do not model the details of any one particular pipeline; in fact, most intentionally abstract away microarchitecture-specific details in favor of presenting an implementation-independent model suitable for use at the architecture level. That being said, some of these models do (by necessity) incorporate abstracted versions of features such as branch prediction, out-of-order execution, and speculation, even though they may not all be strictly related to memory ordering, as the memory model in some cases inherently depends on the behavior of these features [AMT14, SSA+11]. In contrast, PipeCheck takes the opposite approach: it intentionally does aim to capture microarchitecture-specific features, with the intention being to verify their correctness with respect to these more abstract architecture-level models.

Modern specifications for programming languages such as C11/C++11 [BA08, ISO11b, ISO11a] and Java [MPA05] use axiomatic specifications as well. Software models are multiple layers of indirection away from any given hardware implementation, and it remains difficult to this day to expose hardware-level features such as dependency-based ordering enforcement in a sufficiently abstract way that they can be incorporated into general-purpose software models [MRP+14]. Due in part to this additional complexity, software models such as C++ cannot easily be incorporated into even fully general-purpose hardware memory model specification frameworks such as herd [AMT14].

In contrast to the static approach of PipeCheck, some authors have proposed dynamic consistency model verification techniques which incrementally build and analyze happens-before graphs [CLN03, CMP08, MS05, RZFH06]. These approaches require some degree of microarchitectural awareness, as the structures used to dynamically observe instructions must be placed at suitable points in the pipeline and/or cache hierarchy. However, in most cases they simply observe the orderings produced by the pipeline; they do not in general explicitly model how the pipeline created such an ordering, or whether other orderings may or may not be possible. Nevertheless, PipeCheck is able to model the use of dynamic tracking to enforce the consistency model rather than just to verify its correctness. In other words, PipeCheck can be used to statically verify the correctness of the dynamic enforcement mechanism.

The computational complexity of memory model analysis is largely prohibitive on the surface. Gibbons and Korach demonstrated the NP-completeness of verifying sequential consistency and linearizability [GK92, GK94, GK97]. Alur et al. showed that formally verifying the correctness of an implementation of sequential consistency is in general undecidable [AMP96]. However, as with many such formally intractable problems, a great number of use cases can be successfully analyzed in practice. Authors have demonstrated the ability to successfully use model checking [AKT13, Qad03], SAT solving [TVD10], and/or formalization using proof assistants such as Coq [The04] or HOL [SN08] in spite of the theoretical limitations. Likewise, the PipeCheck approach benefits from small constant factors that keep its runtime tractable in spite of the complexity of verification.

Much of the effort in attempting to verify microarchitectural implementations of consistency models has focused on creating and evaluating litmus tests [AMSS10, AMT14, SSA+11, TQB+98], including those described explicitly in vendor specifications [AMD13, ARM09, Dig92, IBM13, Int13a, SPA94b]. Hangal et al. made an explicit effort to track down the microarchitectural sources of bugs found by testing [HVML04]. However, their approach relies on explicit debugging performed by the user, and it assumes that bugs will be found through testing. PipeCheck, in contrast, uses static analysis and automates the microarchitecture-level analysis process.

Lastly, researchers have developed certifiably-correct embeddings of certain computation patterns down to the register transfer level (RTL). Examples of circuits verified in this way include state machines, a , and even a small single-core processor [BJK+06, BC13, CGJ04, SOIG07]. More recently, Vijayaraghavan et al. have proven the correctness of a shared-memory multiprocessor written using BlueSpec [Inc04, VCAD15]. PipeCheck does not aim to replace such RTL-level formal verification; it instead aims to serve both processors which are in earlier design stages and processors which are (as of the moment) too sophisticated for RTL-level verification to be feasible, including most industry-strength processors.

5.9 Chapter Summary

We presented PipeCheck, a methodology and tool for verifying the correctness of a microarchitecture with respect to its consistency model. PipeCheck demonstrates the practicality and tractability of defining microarchitectures in terms of their location-by-location ordering properties and then verifying the correctness of their implementation of the given consistency model. Our techniques complement other ongoing efforts to verify the memory ordering correctness of various synchronization primitives and data structures from the programming language level down to the microarchitecture. We hope that in the future, PipeCheck will serve both as a framework in which designers can define their microarchitectures and as a tool by which they can verify the correctness of their implementations. PipeCheck is open-source and is publicly available at http://github.com/daniellustig/pipecheck [Lus14].

The next chapter extends PipeCheck to verify correct consistency enforcement in the presence of non-idealized caches.

Chapter 6

CCICheck: Verifying the Coherence-Consistency Interface¹

While the previous chapter focused on verifying how the pipeline contributes to consistency model enforcement, this chapter presents CCICheck, an extension of PipeCheck to account for the role cache coherence protocols play in enforcing consistency models.

6.1 Introduction

In real systems, enforcement of consistency models is a collective effort. The previous chapter focused on the aspects of consistency models enforced by the pipeline, and it did so while assuming an idealized form of memory. This chapter moves towards more detailed models of consistency-related aspects of particular cache hierarchies and coherence protocols. We refer to this extended version of PipeCheck as CCICheck, where “CCI” stands for the coherence-consistency interface. CCICheck shows that coherence protocols can be analyzed using PipeCheck and µhb graphs as well, and in

¹Some of the work in this chapter was performed in collaboration with fellow graduate student Yatin Manerkar and other collaborators [MLPM15].

turn, it shows how PipeCheck can be used to analyze incoherent caches, caches which occasionally return stale data, and caches which implement lazy forms of coherence.

In an abstract sense, coherence protocols and consistency models are often specified and verified independently [CGH+93, McM01, ZBES14, ZLS10]. The coherence protocol makes guarantees which are independent of the consistency model—guarantees such as the single writers/multiple readers (SWMR) invariant. Likewise, consistency models are defined assuming the existence of some abstract notion of “coherence”. However, as discussed in Section 2.1.3, the word coherence is often overloaded. Most often, consistency models assume only write serialization: the existence of a total order on all stores to the same address.

In practice, however, the line between coherence and consistency is often blurred for the sake of aggressive performance optimizations such as speculative load reordering [AMD13, GGH91, Int13a, SPA94b]. This tight coupling drastically increases the challenge of verifying correct overall system operation. Consider the following examples. Coherence protocols may enforce write serialization, but they may also enforce additional properties such as multi-copy atomicity. This means that even though multi-copy atomicity is a consistency property rather than a coherence property, it may be enforced by the coherence protocol rather than by the pipeline. Likewise, as will be described in this chapter, some coherence protocols operate in ways that violate abstractions such as the “from-reads” edges used in the analysis of Chapter 5, and the pipeline must be aware of such behaviors when enforcing consistency in the presence of such protocols. This blurring of responsibilities indicates that verification of correct enforcement of consistency models requires more than just the pipeline models of the previous chapter. It shows that such verification also requires precise microarchitecture-level models of the ordering properties enforced (or not) by a particular cache hierarchy.

A key contribution of this chapter is that it adds much-needed precision to the definition of the interface between the coherence protocol and the consistency model. We focus in particular on the role specific coherence protocols play in enforcing consistency at the microarchitecture level. We define as the coherence-consistency interface (CCI) the set of memory ordering guarantees that the coherence protocol provides and that the rest of the microarchitecture expects the coherence protocol to provide. Due to the common decoupling of coherence and consistency at higher levels of abstraction, precise yet general-purpose models of the CCI did not previously exist. Empirically, this has led to much confusion about how coherence and consistency interact, and it has led to verification of such properties being an under-appreciated problem in the community.

This chapter presents CCICheck, an extension to PipeCheck to model the effects of a coherence protocol on the consistency model. Just as with PipeCheck, CCICheck models the set of orderings enforced by a coherence protocol as axioms specifying self-contained (and therefore more easily verifiable) sets of behaviors. It also introduces the concept of a ViCL, or “value in cache lifetime”, which loosely represents the period of time within which a given value in a cache line can be used by an instruction in a pipeline. ViCLs allow CCICheck to model cache occupancy and coherence protocol events of interest, including the demand fetching of a line, the behavior of partially incoherent and lazily coherent memory hierarchies, and other relevant scenarios widely seen in today’s systems.

CCICheck shows not only that the CCI can be formally defined; it also shows how the PipeCheck framework of Chapter 5 can make verification tractable for microarchitectures with complex CCIs. The CCICheck approach replaces the reads-from (rf), write serialization (ws), and from-reads (fr) edges used in the previous chapter with CCI-aware edges that represent microarchitectural enforcement of relationships such as SWMR by a particular coherence protocol. This finer-grained level of detail, combined with the concept of ViCLs, helps CCICheck maintain a level of abstraction that balances generality, portability, and tractability. The key overall contribution of CCICheck is that it achieves scalable CCI-aware verification by clearly enumerating how implementation-level guarantees provided by the pipeline, the coherence protocol, and the rest of the microarchitecture combine to enforce all of the requirements of the consistency model.

The rest of this chapter is organized as follows. Section 6.2 presents a more in-depth motivation. Section 6.3 introduces the ViCL abstraction, and Section 6.4 describes how ViCLs are used in µhb graphs. Section 6.5 presents the experimental methodology, and Section 6.6 demonstrates some case studies following this methodology and some performance numbers to show that CCICheck analysis remains tractable. Lastly, Section 6.7 presents related work, and Section 6.8 concludes.

6.2 Motivating Example

Using examples of CCI interactions that need verifying in real implementations, this section motivates and introduces some key characteristics of CCICheck.

6.2.1 Coherence-Consistency Interface Mismatches

During the analysis of the previous chapter, some relationships (e.g., ppo) were decomposed into sets of more microarchitecturally-inspired µhb edges. Others (e.g., rf, ws, and fr) were interpreted in a way more directly matching the architecture-level abstractions. A key insight of this chapter is that in some cases, rf, ws, and fr are also too abstract to properly model the behavior of some memory hierarchies, and these edges must therefore also be decomposed into smaller components in the same way as was done for ppo. Since rf, ws, and fr edges come about due to the interaction of memory-accessing instructions in the pipeline with the caches they access, CCICheck

acyclic(rfe ∪ ws ∪ fre ∪ ppo ∪ mfence)
acyclic(rf ∪ ws ∪ fr ∪ po-loc)

Figure 6.1: Axiomatic specification of the TSO memory model [Alg12] (reproduced from Figure 5.1b)

focuses on specifying the behavior of a given cache coherence protocol in a way that can be incorporated directly into µhb graphs. This allows coherence protocol µhb edges to serve as the microarchitecture-level replacements for the architecture-level

rf, ws, and fr hb edges. Consider the definition of TSO given in Figure 6.1. TSO analysis searches for

cycles among edges such as rf, ws, and fr, among others. Unfortunately, coherence protocols do not generally specify correctness or behaviors in terms of these edge types directly. They instead verify properties such as the Single Writers/Multiple Readers (SWMR) property, which states that if a line containing address a is writable, there may be no other valid lines containing a, and the Data Value Invariant (DVI), which states that the value held by any valid line is the value written by the most recent store to that address [SHW11, SSH+13, ZBES14]. While the ws property is implicit in these two properties, the rfe and fre properties (and the fact that they are used to check for cycles in TSO) are stronger assumptions than SWMR and DVI inherently provide. This CCI mismatch between the guarantees provided by the coherence protocol and the guarantees expected by the rest of the microarchitecture can lead to consistency violations if not carefully addressed.

In particular, the definitions of rfe and fre (and their use in cycle checking) implicitly assume multi-copy atomicity, which is not a universal property of coherence protocols, and it is beyond what SWMR and DVI inherently provide. In scenarios where multi-copy atomicity is not enforced by the coherence protocol, enforcement falls to the pipeline instead. The previous chapter verified pipeline correctness under

the assumption that multi-copy atomicity is being provided “for free” by the memory hierarchy, even though it is not a universal coherence protocol guarantee. If the coherence protocol were replaced with another which does not enforce multi-copy atomicity, the consistency model would be broken, and outcomes such as that in the wrc litmus test of Figure 3.3 would become illegally observable. This example demonstrates how at an implementation level, the pipeline and the coherence protocol must 1) have a well-defined specification of the ordering properties that they do (or do not) enforce, and 2) combine to enforce the overall consistency requirements of the architecture. Unfortunately, there are a number of important subtleties which can arise when trying to precisely specify the consistency-relevant orderings enforced by a coherence protocol. For instance, in a protocol such as the recently-proposed TSO-CC in which coherence is enforced lazily, L1 cache lines in shared states are neither tracked nor invalidated by the L2 cache [EN14]. It is instead explicitly the job of the pipeline to invalidate shared lines in L1 caches. This implies that the coherence protocol itself does not provide strict multi-copy atomicity, and hence that the approach of the previous chapter would be insufficient to verify TSO-CC. We discuss TSO-CC further in Section 6.6.2.

6.2.2 The Window of Vulnerability Problem

As an even more subtle example of non-intuitive CCI behavior, consider the window of vulnerability problem, a situation in which certain coherence protocols are prone to livelock due to repeated invalidation-before-use of data propagating through a cache hierarchy [KCA92]. High-performance split-phase cache coherence protocols allow for a delay between the sending of a request and the arrival of a response, and they allow for other messages to become interleaved in between the two. Whenever this causes a conflict, the coherence protocol is responsible for detecting the situation. It then should either cancel and retry the request or tell the pipeline to do so.

Figure 6.2 demonstrates three operational execution sequences of the mp litmus test. As discussed in previous chapters, under TSO, the mp test outcome r1=1, r2=0 is forbidden, as neither the stores nor the loads may be reordered. The executions of Figure 6.2 begin identically, as denoted by Steps 1-10, which are common to each: a prefetch or speculative request for x is issued (step 1) before the load of y executes. The prefetched line is invalidated (step 4) by core 0’s store to x before the data is received. The demand request for x from the core is also issued (step 10) before the (now stale) data for x arrives. At step 11, the sequences diverge in behavior.

In the first execution, the stale data is dropped and the load is retried. This maintains the TSO requirements, but no forward progress has been made, and in fact the entire sequence is prone to being repeated indefinitely, a situation known as livelock [KCA92]. In the second execution, when the stale value of x arrives at core 1, the coherence protocol returns that value of x to the core (step 11b) in an effort to prevent the livelock situation described above. When it does so, it creates a consistency violation by allowing the forbidden outcome r1=1, r2=0 to occur. This is known as the “Peekaboo” problem [SHW11].

One solution which avoids livelock while also satisfying TSO ordering requirements is to allow access to invalidated data if and only if the accessing instruction was the oldest unperformed load or store in program order at the time the coherence request for the now-invalidated data was issued [SHW11]. This effectively reorders the load to have executed at the coherence point, but the extra oldest-in-program-order constraint ensures that this does not cause a consistency violation. It also ensures that forward progress will be made and hence that livelock will be avoided.

The above example clearly depicts a case where a feature of the coherence protocol (the livelock-avoidance mechanism) affects and is affected by the MCM implementation

Core 0: (i1) [x] ← 1; (i2) [y] ← 1
Core 1: (i3) r1 ← [y]; (i4) r2 ← [x]
Under TSO: Forbid r1=1, r2=0

(a) Code for Litmus Test mp

1.   Core 1: x: prefetchS miss, issue GetS/ISD
2.   Core 0: x: receive Fwd-GetS, send Data[0]/S
3.   Core 0: x: store miss; issue GetM/SMAD
4.   Core 0: x: receive Data[0](ack=1)/SMA; Core 1: x: receive Inv, send Inv-Ack/ISDI
5.   Core 0: x: receive Inv-Ack, perform store/M[1]
6.   Core 0: y: store hit/M[1]
7.   Core 1: y: load miss, issue GetS/ISD
8.   Core 0: y: receive Fwd-GetS, send Data[1]/S[1]
9.   Core 1: y: receive Data[1], perform load r1=1/S[1]
10.  Core 1: x: load miss, stall/ISDI
11a. Core 1: x: receive and drop Data[0], replay GetS/ISD
12a. Core 0: x: receive Fwd-GetS, send Data[1]/S
13a. Core 1: x: receive Data[1], perform load r2=1/S[1]

(b) The Baseline WoV solution drops stale data upon receipt: livelock-prone, but no consistency violation

11b. Core 1: x: receive Data[0], perform load r2=0/I

(c) If livelock avoidance is naively added for WoV cases, steps 11a, 12a, and 13a are replaced simply by step 11b. Stale data is returned, resulting in a consistency violation.

11c. Core 1: x: receive Data[0], but drop it because there was another load (step 7) between the coherence request (step 1) and this load (step 10); replay GetS/ISD
12c. Core 0: x: receive Fwd-GetS, send Data[1]/S
13c. Core 1: x: receive Data[1], perform load r2=1/S[1]

(d) A livelock-free solution to the Peekaboo problem is to return invalidated data if and only if the load (step 10) was the oldest in program order at the time the coherence request (step 1) was issued [SHW11]

Figure 6.2: Three executions of mp using different coherence protocols

in a way that goes beyond traditional coherence protocol properties such as

SWMR. This particular case affects the fre edge of Figure 6.1: it effectively states that at an implementation level, fre does not hold whenever the coherence protocol returns already-invalidated data. Just as the previous chapter described sets of µhb edges which indirectly replace ppo, there must be a set of µhb edges which indirectly replace fre during the Peekaboo situation. This in turn requires the µhb graph to explicitly account for the behavior of the coherence protocol and for events such as the “original coherence request” used in the example above.

6.3 The ViCL Abstraction

This section introduces the Value-in-Cache-Lifetime (ViCL) abstraction. ViCLs provide the abstraction by which cache occupancy and value propagation can be modeled and verified using µhb graphs. The ViCL abstraction can track coherence events relevant to memory ordering while remaining sufficiently abstract that the PipeCheck verification approach remains scalable and fully general.

6.3.1 ViCLs: Definition and Usage

Conceptually, a ViCL represents the period of time (relative to a single cache) over which a given value2 is present in a specific cache or memory. To formally define a ViCL, we first conceptually assign a unique cache id to every cache in the system, and a unique generation id to each line brought into a given cache over the duration of an execution. This allows us to uniquely refer to each cache line in an execution

2We assume without loss of generality that each store produces a unique value even if its contents are identical with the value of another store. This does not affect correctness, as our solver would consider either value to be a possible source for a load returning the value in question.

using its cache id and generation id. Formally, a ViCL is a 4-tuple

(cache id, address, data value, generation id)

which maps onto the period of time within which the cache line corresponding to generation id in the cache identified by cache id holds the value data value for address address. A given address and data value pair may have many matching ViCLs over the course of an execution. These ViCLs could be in different caches (different cache ids), be brought into the same cache at different points in the execution (different generation ids), or both. Likewise, changes in data values also cause ViCL changes, which is key to their use in enumerating possible read-write pairings for consistency verification. There may also be gaps between ViCLs for a given address and cache pair where there is no value in the cache for that address. In addition, our definition allows for the (admittedly uncommon, but feasible) possibility that a cache may hold two lines for the same address simultaneously. A ViCL time period starts at a ViCL Create event and ends at a ViCL Expire event. ViCL Create and ViCL Expire events represent the points in time at which the corresponding ViCL 4-tuple either starts or stops serving the data in question.

A ViCL Create event occurs for address x when either (i) a cache line containing x enters a usable state from a previously unusable state, or (ii) a value is written to x in a cache line. A ViCL Expire event for address x occurs when (i) its cache line enters an unusable state from a previously usable state, or (ii) a value is written to x in a cache line. ViCLs are finer-grained than cache lines, as cache lines hold more than one address. Note that the creation and expiration of a ViCL for a given address have no inherent effect on ViCLs of other addresses, even if they do share a cache line. Any such sharing

possibilities (e.g., evictions due to false sharing) are already handled by CCICheck's comprehensive enumeration. In such cases, our analysis is conservative but correct. We can also assign a unique cache id to capture all of main memory. Because memory lines are not evicted in the same way that cache lines are, the generation id for all main memory ViCLs will never change3. ViCLs representing main memory can be useful to represent uncacheable accesses.
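To make the bookkeeping concrete, the following minimal Python sketch models a ViCL as a 4-tuple together with its Create and Expire events. The class and field names are illustrative only and do not correspond to CCICheck's internal data structures.

from dataclasses import dataclass

@dataclass(frozen=True)
class ViCL:
    """One Value-in-Cache-Lifetime: the span over which a given cache holds a
    given value for a given address (illustrative sketch of the 4-tuple)."""
    cache_id: int
    address: str
    data_value: int
    generation_id: int

@dataclass
class ViCLEvent:
    """A ViCL Create or Expire event; these become µhb graph nodes."""
    vicl: ViCL
    kind: str          # "Create" or "Expire"

# Writing a new value to a line that is already present expires the old ViCL
# and creates a new one: the data value component of the 4-tuple changes.
old = ViCL(cache_id=0, address="x", data_value=1, generation_id=0)
new = ViCL(cache_id=0, address="x", data_value=2, generation_id=0)
events = [ViCLEvent(old, "Expire"), ViCLEvent(new, "Create")]
assert old != new and len(events) == 2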

6.3.2 ViCL Timeline Example

To explain the intuition behind ViCLs, Figure 6.3 shows an example of how litmus test co-mp (Figure 6.3a) might be executed on a processor with private L1 caches and a shared L2 cache (all of which are coherent with each other). Figure 6.3b presents a timeline using traditional notions of cache line state, while Figure 6.3c presents the same timeline using ViCLs. In addition to ViCL create and expire points, it also denotes the value of ViCL 4-tuples at noteworthy points in the timeline. At the beginning, the shared L2 cache holds the value 0 for address x. The first thread 0 store then misses in its L1 cache, causing it to fetch the line for exclusive ownership. When the L2 cache receives the request, the L2 cache’s ViCL expires4. Once the data arrives at core 0’s L1 cache and the store completes, a new ViCL is created at the L1 cache, representing both the move to a valid state and the writing of new data. When the second store then completes, the first L1 ViCL (which had a data value of

1) expires, and a new ViCL is created for x with the value 2. When thread 1 starts executing sometime later, it will fetch the data from core 0’s L1 cache. This does not cause a ViCL expiration, as none of the components of the core 0 L1 ViCL’s 4-tuple change. Instead, the L1 cache forwards the data to the other L1 through the L2, creating new ViCLs in those caches in the process.
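For reference, the Figure 6.3c timeline can be transcribed as an ordered list of ViCL Create and Expire events over plain (cache id, address, data value, generation id) tuples. This is an illustrative Python transcription of the figure, using the figure's convention that cache ids 0, 1, and 2 denote core 0's L1, the shared L2, and core 1's L1, respectively.

# Illustrative transcription of the Figure 6.3c timeline as (event, 4-tuple) pairs,
# where the 4-tuple is (cache id, address, data value, generation id).
timeline = [
    ("Expire", (1, "x", 0, 0)),  # L2's ViCL for x=0 expires when core 0 requests ownership
    ("Create", (0, "x", 1, 0)),  # core 0 L1 ViCL created when (i1) St [x]<-1 completes
    ("Expire", (0, "x", 1, 0)),  # the second store expires the first L1 ViCL...
    ("Create", (0, "x", 2, 0)),  # ...and creates a new one for x=2 (same generation)
    ("Create", (1, "x", 2, 1)),  # forwarding to core 1 creates a fresh L2 ViCL (new generation)
    ("Create", (2, "x", 2, 0)),  # ...and a core 1 L1 ViCL; core 0's ViCL does not expire
]
for event, (cache_id, addr, data, gen) in timeline:
    print(f"{event:6s}  cache {cache_id}: {addr}={data} (gen {gen})")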

3 ...unless data is swapped from memory out to disk. We do not consider such a scenario in this thesis, although the formalism supports it.
4 The line may not necessarily be invalidated; it may move to a state tracking the L1 cache as the new owner. Nevertheless, the ViCL expires because the old data is no longer being served.

(a) Variant of litmus test co-mp (normally, the outcome has r2=1). Thread 0: (i1) St [x] ← 1; (i2) St [x] ← 2. Thread 1: (i3) Ld [x] → r1; (i4) Ld [x] → r2. In TSO: r1=2, r2=2 Allowed.
(b) Sample timeline for an execution which produces the legal outcome r1=2, r2=2. (M = Modified state, S = Shared state)
(c) Same timeline, but with ViCLs added. The thicker red arrows correspond to those in Figure 6.4. [ViCLs shown: (0,x,1,0) and (0,x,2,0) in core 0's L1; (1,x,0,0) and (1,x,2,1) in the shared L2; (2,x,2,0) in core 1's L1.]
(d) Same timeline, but with additional cache line events that may be needed to model certain scenarios. [Legend: ViCL Create/Expire nodes are always enumerated; cache line Request and Downgrade nodes are enumerated as needed.]

Figure 6.3: An example of how ViCL nodes relate to events in the cache hierarchy.

In some scenarios, ViCL create and expire events may be sufficient to verify all necessary orderings. In more complex scenarios, however, we can add additional coherence-related events to the timeline, and infer orderings for those events with respect to ViCL create and/or expire events. For example, a cache line downgrade event may take place between ViCL Create (for a cache line in an exclusive state) and ViCL Expire. Similarly, a cache line request event will happen before the ViCL it fetches gets created. Figure 6.3d shows how these additional events can be included

into the timeline for the same execution of co-mp. Such additional nodes are required for our Peekaboo and TSO-CC case studies in Section 6.6.

6.3.3 Using ViCLs in µhb Graphs

A key benefit of ViCL events is that they map naturally into nodes within µhb graphs. Likewise, orderings which are enforced between ViCL events due to coherence protocol behavior map naturally onto µhb graph edges. Each possible execution of a program (whether allowed or forbidden by a given consistency model) gives rise to a mapping from each cache-accessing instruction to the ViCL accessed by that instruction. When modeling a particular execution by a µhb graph, we add µhb nodes to the graph for every cache-accessing instruction representing the ViCL Create and ViCL Expire events for the ViCL(s) that instruction accesses in the execution. We rely on the microarchitecture definition (Section 6.4.1) to list all possible caches that a memory-accessing instruction could interact with (e.g., read directly from L1, read from L2 through the L1, etc.). The address and data of a ViCL's 4-tuple must match those of its instruction, and the possibilities for a ViCL's generation ID are used to check whether two ViCLs of the same address, value, and location are the same ViCL or not. Figure 6.4 illustrates a CCICheck µhb graph with ViCL nodes for the execution timeline depicted in Figure 6.3c.

[Figure 6.4 shows a µhb graph with one column per instruction (i1)-(i4), connected by po edges, and one row each for FetchStage, DecodeStage, ExecuteStage, MemoryStage, WritebackStage, LeaveStoreBuffer, and the L1/L2 ViCL Create and Expire events; its labeled edges include "UsesViCL", "NoDups", "SWMR", and "SourcedFrom".]

Figure 6.4: CCICheck graph for the Figure 6.3c scenario.

The four ViCLs in the graph represent the four rightmost ViCLs in Figure 6.3c (the very first ViCL does not need to be enumerated, as its data is not read by any instruction). The edges between the ViCLs reflect the ordering constraints in the microarchitecture specification (Section 6.4.1). For example, the dot-dashed brown "SWMR" edge enforces that ViCLs for the first write to x (i1) must expire before the ViCLs for the subsequent write (i2) are created, as per the coherence protocol's SWMR invariant. The "NoDups" edge (which in this case overlaps with the "SWMR" edge) between the two ViCLs for x in core 0's L1 cache enforces that the first ViCL for x in the cache must expire before a second one for the same address in the same cache can be created (i.e., there can be no duplicates within a single cache at any given time). This reflects a local property of each individual cache. The two red "SourcedFrom" edges correspond to the arrows in Figure 6.3c showing x=2 being propagated from the core 0 L1 cache to the core 1 L1 cache through the L2. Finally, the four thick solid black "UsesViCL" edges represent the fact that i3 and i4 access the same ViCL in this execution.

In general, for each load, µhb edges represent the fact that the instruction must access the ViCL sometime between its creation and expiration. Likewise, for each store, µhb edges represent the fact that the ViCL is created when the store value reaches the cache (because it changes the data component of the ViCL 4-tuple). µhb edges are also used to represent any orderings enforced or implied between ViCLs due to coherence protocol behavior (such as the "SWMR" edge in Figure 6.4) or memory hierarchy restrictions (like the "NoDups" edge in Figure 6.4). The details of these orderings are specific to each protocol (or class of protocols). If other cache line events (like fetch requests and downgrades) are modeled, then the graph will include µhb nodes representing these events and edges representing their orderings with respect to other events in the graph. Examples of microarchitectures that require such modeling can be found in Section 6.6. The maps from instructions to ViCLs for an execution are not injective (since there may be multiple instructions which map to a single ViCL); each ViCL's create and expire nodes appear in the µhb graph no more than once. They are also not surjective because there may be ViCLs which are not accessed by any instruction, such as the first ViCL in Figure 6.3c. We do not need to draw such non-accessed ViCLs as they are not relevant to consistency enforcement.
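The role these nodes and edges play in verification can be summarized in a few lines: collect the µhb nodes and edges for a candidate execution and test the resulting directed graph for a cycle. The following Python sketch is purely illustrative (CCICheck's actual solver operates on the DSL axioms described in Section 6.4.1); the node and edge names are taken from Figure 6.4.

def has_cycle(nodes, edges):
    """Return True if the directed µhb graph contains a cycle. A cyclic graph
    means the candidate execution cannot occur; if every graph enumerated for
    a forbidden outcome is cyclic, that outcome is correctly prevented."""
    adj = {n: [] for n in nodes}
    for src, dst, _label in edges:
        adj[src].append(dst)
    state = {n: "unvisited" for n in nodes}

    def dfs(n):
        state[n] = "in-progress"
        for m in adj[n]:
            if state[m] == "in-progress":
                return True                          # back edge: cycle found
            if state[m] == "unvisited" and dfs(m):
                return True
        state[n] = "done"
        return False

    return any(state[n] == "unvisited" and dfs(n) for n in nodes)

# A small fragment of Figure 6.4: the SWMR edge orders the expiration of (i1)'s
# ViCL before the creation of (i2)'s ViCL for the same address.
nodes = ["i1.ViCLCreate", "i1.ViCLExpire", "i2.ViCLCreate", "i2.ViCLExpire"]
edges = [("i1.ViCLCreate", "i1.ViCLExpire", "path"),
         ("i1.ViCLExpire", "i2.ViCLCreate", "SWMR"),
         ("i2.ViCLCreate", "i2.ViCLExpire", "path")]
print(has_cycle(nodes, edges))   # False: this partial graph is acyclic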

6.4 CCICheck and µhb Graphs

This section describes how a CCICheck microarchitecture definition specifies ordering relationships, as well as how CCICheck conducts its enumeration of µhb graphs. We begin by describing the general approach, and we give an in-depth example of how the approach is used to model a single level of caching. We then follow that with a

description of how the approach extends to models which explicitly include multiple levels of caching.

6.4.1 ViCL-Aware Microarchitecture Definitions

CCICheck microarchitecture definitions extend the approach of Section 5.3 to account for the presence of ViCLs. At a high level, models should attempt to decouple the implementation-level orderings enforced by the pipeline from the implementation-level orderings enforced by the cache coherence protocol. Note, however, that the line between the pipeline and the coherence protocol may differ from the architecture-level understanding of the same line, and the line may at times be blurry. In most cases, the pipeline definitions from Chapter 5 can be incorporated and directly reused in

this context. However, the idealized rf, ws, and fr orderings of the previous chapter have been removed in favor of the ViCL-based approach described in this section. On one hand, a desirable property of the coherence protocol models is that they involve self-contained axioms of the form that are (or might naturally be) verified by a coherence protocol verifier. For example, the models may define axioms representing properties such as the Single-Writer/Multiple-Readers (SWMR) guarantee and Data Value Invariant (DVI) (Section 6.2) rather than lower-level protocol implementation details. This choice serves two purposes. First, it allows us to build off of (rather than compete with) the significant research effort that has gone into verifying coherence protocols [SSH+13, ZBES14, ZLS10, VV14, McM01]. Second, it (alongside our ViCL abstraction) meets our goals of explicitly modeling the orderings required to verify consistency while abstracting away any lower-level details. We expect that these details are at a similar level to properties such as SWMR and DVI, and we therefore expect that they could independently be verified in a similar manner to those properties.

(a) Explicitly using L1 and L2 cache ViCLs
(b) Using only L1 cache ViCLs
Figure 6.5: It is possible to model a system with multiple levels of caching using a microarchitecture model with only L1 cache ViCLs. Some "SourcedFrom" edges simply become more abstract, and others may simply be lost. Edges of the latter class (e.g., the dashed edge above) must be replaced by other axioms as needed.

On the other hand, at the implementation level, in contrast to what might be expected, few coherence protocol axioms are universal. While properties such as SWMR may hold true in general in an abstract sense, the implementations may vary widely. For example, while eager coherence protocols may enforce SWMR in a strict physical sense, lazy coherence protocols often enforce it only in a more abstract logical sense, and the two approaches would not behave the same when modeled in µhb graphs. We therefore leave the specific choice of axioms to each individual microarchitecture model.

6.4.2 Example: A Model with L1 Cache ViCLs Only

We start by explaining in detail a microarchitecture model which only explicitly includes ViCLs for the L1 cache. This does not inherently preclude the model from describing systems with multiple levels of cache; it simply abstracts away any communication which takes place via deeper levels of the cache. For example, two of the three thicker SourcedFrom edges in Figure 6.5a can be captured using the one thicker SourcedFrom edge in Figure 6.5b. The third edge cannot; if this edge is needed, it

161 DefineMacro "FindL1ViCL": exists microop "s", SamePhysicalAddress s i /\ SameData s i /\ SameCore s i /\ EdgesExist [((s, ViCLCreate ), (i, MemoryStage), "SourcedFrom") /\ ((i, MemoryStage), (s, ViCLExpire ), "SourcedFrom")].

Axiom "Reads": forall microops "i", OnCore c i => IsAnyRead i => EdgesExist [((i, Fetch ), (i, Decode ), "path"); ((i, Decode ), (i, Execute ), "path"); ((i, Execute ), (i, MemoryStage), "path"); ((i, MemoryStage), (i, Writeback ), "path")] /\ (ExpandMacro STBFwd \/ (ExpandMacro STBEmpty /\ ExpandMacro FindL1ViCL)).

Axiom "Writes": forall microops "i", OnCore c i => IsAnyWrite i => EdgesExist [((i, Fetch ), (i, Decode ), "path"); ((i, Decode ), (i, Execute ), "path"); ((i, Execute ), (i, MemoryStage ), "path"); ((i, MemoryStage ), (i, Writeback ), "path"); ((i, Writeback ), (i, LeaveStoreBuffer), "path"); ((i, LeaveStoreBuffer), (i, ViCLCreate ), "path"); ((i, ViCLCreate ), (i, ViCLExpire ), "path")].

Figure 6.6: CCICheck axioms representing pipeline behavior for a processor with private L1 caches. See Figure 6.8 for the associated ViCL axioms.

must be captured by some other axiom instead. Section 6.4.3 describes how ViCLs for deeper levels of the cache can be explicitly modeled when needed. Figures 6.6 and 6.8 present the pipeline-relevant and ViCL-relevant subsets of a relatively simple CCICheck microarchitecture model, respectively. Macros STBFwd and STBEmpty, axioms mfence, RMW, and the pipeline stage in-order axioms are unmodified from Figures 5.3 through 5.5 and hence are not pictured. Axioms

BeforeOrAfterEveryWriteToSameAddr and WriteSerialization are replaced entirely by the cache-centric axioms in this section. We explain each of these new axioms in turn below.

Macro FindL1ViCL and Axioms Reads and Writes. Every cache-accessing instruction interacts with exactly one ViCL. Loads and stores behave very differently, however. Each store creates a new ViCL, since each store is assumed to write a new unique data value (Section 6.3.1) and hence instantiates a new ViCL 4-tuple. The ViCL Create and ViCL Expire µhb nodes are simply appended to the end of the

path for each store. Note that the ViCL Expire node may occur long after the events represented by the rest of the nodes in the path have taken place. Loads require more complicated behavior, as they may or may not share ViCLs with other loads or stores in the program. Store buffer forwarding behaves identically to the implementation in the previous chapter; our explanations in the rest of this

section therefore assume a store buffer miss (i.e., STBFwd returns false and STBEmpty returns true), and we focus on the implementation of FindL1ViCL. Every load instruction which takes a store buffer miss will read from the cache, which means that it will return the data stored in some ViCL. We refer to this as a sourcing relationship.

As its name suggests, the FindL1ViCL macro searches for a ViCL from which load instruction i sources its data. This macro enumerates all possible mappings between load instructions and ViCLs. It does not enumerate all concrete generation IDs; it instead only enumerates all possible ways in which instructions may or may not share ViCLs by enumerating all ways in which they do or do not share a matching generation ID. The actual generation IDs themselves are moot.

The FindL1ViCL macro searches for all L1 cache ViCLs which match the same cache id, address, and data. Some possible solutions may already be associated with other instructions. In this case, i reuses the ViCL associated with that instruction. Otherwise, the macro may instantiate a new ViCL which is not yet associated with another instruction.
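The enumeration the macro performs can be pictured with a small, purely illustrative Python sketch: for a given load, the candidate sources are every existing ViCL with matching cache id, address, and data, plus one fresh ViCL not yet associated with any instruction. This is a conceptual illustration of the search space, not the DSL's constraint-solving mechanism, and all names below are hypothetical.

def candidate_sources(load, existing_vicls, fresh_generation_id):
    """Enumerate the ViCLs a load could source its data from (illustrative).
    load: (cache_id, address, data) required by the load instruction.
    existing_vicls: 4-tuples already associated with other instructions."""
    cache_id, address, data = load
    # Option 1: reuse a ViCL that another instruction already maps to.
    shared = [v for v in existing_vicls
              if v[0] == cache_id and v[1] == address and v[2] == data]
    # Option 2: a fresh ViCL with the same cache, address, and data that is not
    # yet associated with any instruction.
    fresh = (cache_id, address, data, fresh_generation_id)
    return shared + [fresh]

# Instruction (i3) of Figure 6.7: it may share the ViCL chosen for (i1), share
# the ViCL chosen for (i2), or be given its own ViCL (the three subfigures).
existing = [(0, "x", 1, 0),   # ViCL chosen for (i1)
            (0, "x", 1, 1)]   # ViCL chosen for (i2)
for option in candidate_sources((0, "x", 1), existing, fresh_generation_id=2):
    print(option)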

Figure 6.7 depicts the FindL1ViCL macro in action. Figure 6.7a shows the code evaluated in this scenario. This code is single-threaded for the sake of having legible µhb graphs, although multithreaded examples work identically. Figure 6.7b shows a partially-evaluated scenario in which (i1) and (i2) were chosen to each have their

own ViCL (i.e., s was chosen to equal i). The remaining three subfigures depict the three ways in which the same axiom can be evaluated for (i3). Figure 6.7c depicts (i3) sharing a ViCL with (i1) but not with (i2); this scenario will likely be ruled out

(a) Sample thread: (i1) St [x] ← 1; (i2) Ld r1 ← [x]; (i3) Ld r2 ← [x]. Outcome r1=1, r2=1: Allowed.
(b) Before Constraints
(c) Constraint Solution 1
(d) Constraint Solution 2
(e) Constraint Solution 3

Figure 6.7: Evaluation of macro FindL1ViCL for instruction (i3). [Panels (b)-(e) show µhb graphs over the pipeline stages, LeaveStoreBuffer, and the L1 ViCL Create/Expire nodes, differing in the "SourcedFrom" edges chosen for (i3).]

in any memory model forbidding load reordering. Figure 6.7d depicts (i3) sharing a ViCL with (i2); this is valid and even the most likely case, as (i3) would most likely benefit from the locality in the cache and take a cache hit. Lastly, Figure 6.7e depicts (i3) having its own ViCL; this is less likely than the preceding cache hit scenario, but it is possible if, for example, the thread were to be interrupted between (i2) and (i3). As in the previous example, some instruction-to-ViCL mappings may be ruled out by coherence protocol orderings which combine to form a cycle. The coherence protocol axioms are shown in Figure 6.8. We discuss them below.

Macros L1ViCLSource and L1ViCLSourceInitial and Axiom L1ViCLs. Just as every load instruction which does not forward data from the store buffer must source its data from some ViCL, each ViCL in turn must have its own source for its data value. L1 cache ViCLs associated with store instructions are sourced directly from a store instruction. This is represented by the inclusion of L1 ViCL Create and

L1 ViCL Expire nodes directly within the path of each store (via the Writes axiom). All other ViCLs must source their data from some other ViCL, thereby forming a “chain” of ViCL sourcing relationships that ultimately points back to the store that originally produced the data in question. In models with multiple levels of caching, an L1 cache ViCL would most often source its data from a ViCL in a lower-level cache. Figure 6.4 presented an example of this sourcing requirement.
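The chain of sourcing relationships can be pictured as repeatedly following "SourcedFrom" links until a store-created ViCL is reached. A hedged Python sketch follows; the function and argument names are hypothetical and the data is taken from the Figure 6.3c/6.4 example.

def producing_store_vicl(vicl, sourced_from, store_created):
    """Follow the chain of ViCL sourcing relationships back to the ViCL created
    directly by a store (illustrative).
    sourced_from: maps a ViCL to the ViCL it sources its data from.
    store_created: the set of ViCLs created directly by store instructions."""
    while vicl not in store_created:
        vicl = sourced_from[vicl]          # one "SourcedFrom" hop toward the producer
    return vicl

# The chain behind Figure 6.4: core 1's L1 ViCL for x=2 sources from the L2
# ViCL, which in turn sources from the ViCL created by store (i2) in core 0's L1.
chain = {(2, "x", 2, 0): (1, "x", 2, 1),
         (1, "x", 2, 1): (0, "x", 2, 0)}
print(producing_store_vicl((2, "x", 2, 0), chain, store_created={(0, "x", 2, 0)}))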

The macros L1ViCLSource and L1ViCLSourceInitial and the axiom L1ViCLs work as follows. Axiom L1ViCLs asserts that every ViCL instantiated by the search process described above must have a source, and that this sourcing may take place according to one of the two following macros. Macro L1ViCLSource finds a candidate ViCL i' and adds two edges indicating that the memory access takes place while the ViCL is still valid. The latter enforces the Data Value Invariant (DVI) (Section 6.2.1). Macro

L1ViCLSourceInitial asserts that the ViCL associated with i itself sources from the

165 DefineMacro "L1ViCLSourceInitial": DataFromInitialStateAtPA i /\ forall microop "w", ~SameMicroop i w => IsAnyWrite w => SamePhysicalAddress w i => EdgeExists ((i, ViCLExpire), (w, ViCLCreate), "DVI", "red").

DefineMacro "L1ViCLSource": exists microop "i’", ~SameMicroop i i’ /\ amePhysicalAddress i i’ /\ SameData i i’ /\ EdgesExist [((i’, ViCLCreate), (i , ViCLCreate), "SourceFrom")] /\ ~exists microop "i’’", SamePhysicalAddress i i’’ /\ IsAnyWrite i’’ /\ EdgesExist [((i’ , ViCLCreate), (i’’, ViCLCreate), "DVI"); ((i’’, ViCLCreate), (i’’, ViCLCreate), "DVI")].

Axiom "L1ViCLs": forall microop "i", OnCore c i => IsAnyRead i => NodeExists (i, ViCLCreate) \/ NodeExists (i, ViCLExpire)) => EdgeExists ((i, ViCLCreate), (i, ViCLExpire), "path") /\ (ExpandMacro L1ViCLSourceInitial \/ ExpandMacro L1ViCLSource).

Axiom "SWMR": forall microops "i1", IsAnyWrite i1 => (NodeExists (i1, ViCLCreate) \/ NodeExists (i1, ViCLExpire)) => forall microops "i2", (NodeExists (i2, ViCLCreate) \/ NodeExists (i2, ViCLExpire)) => ~SameMicroop i1 i2 => SamePhysicalAddress i1 i2 => ( (EdgeExists ((i1, ViCLCreate), (i2, ViCLCreate), "swmr")) \/ (EdgeExists ((i2, ViCLExpire), (i1, ViCLCreate), "swmr"))).

Axiom "L1ViCLNoDups": forall microop "i1", (NodeExists (i1, ViCLCreate) \/ NodeExists (i1, ViCLExpire)) => forall microop "i2", ~SameMicroop i1 i2 => SameCore i1 i2 => (NodeExists (i2, ViCLCreate) \/ NodeExists (i2, ViCLExpire)) => ( EdgeExists ((i1, ViCLExpire), (i2, ViCLCreate), "NoDups") \/ EdgeExists ((i2, ViCLExpire), (i1, ViCLCreate), "NoDups")).

Figure 6.8: CCICheck axioms representing memory hierarchy behavior for a processor with private L1 caches, the vanilla Single Writer/Multiple Readers (SWMR) axiom, and the Data Value Invariant (DVI). See Figure 6.6 for the associated pipeline axioms.

(unpictured) initial condition, and that this implies the additional constraint that no other ViCL of the same cache and address has been created at that point.

Axiom SWMR. Axiom SWMR describes a straightforward implementation of the Single Writers/Multiple Readers axiom. It states that for any pair i1 and i2 of ViCLs accessing the same address, if i1 is a write, then either i2 must have been invalidated prior to the creation of i1, or i2 must be created after i1 is created. The former represents eager invalidation of cache line sharers, while the latter indicates that subsequent cache lines may either be sharers of the same data or may be events happening even later in the timeline. Note that the cache line serving the ViCL i1 may be downgraded to allow for sharers (i.e., “multiple readers”), but this does not cause the ViCL to expire. Section 6.6.2 returns to subtleties such as lazy invalidation of sharers and cache line downgrade events.

Axiom L1ViCLNoDups. The L1ViCLNoDups axiom defines a relatively straightforward property: that in a given cache, at a given time, there cannot be more than one valid cache line holding a single address. It is technically possible for this assumption to be violated. Virtually-indexed virtually-tagged caches may indirectly fall victim to this due to the presence of synonyms (i.e., two virtual addresses which point to the same physical address), although VIVT caches are uncommon in no small part to avoid this problem. We assume that L1ViCLNoDups (and its counterpart L2ViCLNoDups) hold in all microarchitectures modeled in this thesis, but since it is an axiom specified using the DSL of Section 5.3.2, we could easily remove it to test scenarios with synonyms and potential duplicates.
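Both the SWMR and L1ViCLNoDups axioms reduce to pairwise ordering obligations between ViCL lifetimes. The following Python sketch checks the two properties over a trace in which each ViCL has explicit create and expire times; it is an illustrative simplification, since CCICheck expresses these as alternatives over µhb edges rather than checks over concrete timestamps.

def swmr_ok(writer_vicls, all_vicls):
    """SWMR over a concrete trace (illustrative): for each writer-created ViCL w
    and any other ViCL v of the same address, either v expired no later than w
    was created, or v was created no earlier than w. ViCLs here are
    (address, create_time, expire_time) triples."""
    for w in writer_vicls:
        for v in all_vicls:
            if v is w or v[0] != w[0]:
                continue
            if not (v[2] <= w[1] or v[1] >= w[1]):
                return False
    return True

def no_dups_ok(cache_vicls):
    """NoDups (illustrative): within one cache, two ViCLs for the same address
    must not have overlapping lifetimes."""
    for i, a in enumerate(cache_vicls):
        for b in cache_vicls[i + 1:]:
            if a[0] == b[0] and a[1] < b[2] and b[1] < a[2]:
                return False
    return True

# The two writes to x in Figure 6.4: (i1)'s ViCL expires before (i2)'s is created.
vicl_i1 = ("x", 1, 2)    # (address, create_time, expire_time)
vicl_i2 = ("x", 2, 5)
print(swmr_ok([vicl_i2], [vicl_i1, vicl_i2]), no_dups_ok([vicl_i1, vicl_i2]))  # True True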

6.4.3 Multilevel Caches

Although Figure 6.7 depicts the model of a single level of caching, CCICheck models naturally adapt to systems with multiple levels of private and/or shared caches. The approach is generally the same as the approach for single-level models, except that

Name            | Cache Hierarchy        | Protocol Classification
SharedL1        | Shared L1              | Eager
PrivL1          | Private L1s            | Eager
Peekaboo        | Private L1s            | Eager, with Livelock Prevention
PrivL1/SharedL2 | Private L1s, shared L2 | Eager
TSOCC           | Private L1s, shared L2 | Lazy

Table 6.1: Memory hierarchy specifications and coherence protocol features of the microarchitectures analyzed in this chapter

multi-level cache models must be more precise about differences between levels of the hierarchy. For example, a model must specify whether L1 ViCLs may or may not source data directly from the L1 caches of other cores, or whether any such communication must pass through the L2 cache. It must also specify whether L1 cache bypassing is allowed, whether caches are write-back vs. write-through, and so on. All of these properties can be expressed directly within the PipeCheck DSL. In many cases, the correctness of data propagation through the hierarchy can be left to the independent coherence protocol verification process. Lower-level caches need to be modeled only when some aspect of their behavior is critical to the enforcement of the consistency model. The Figure 6.7 model of an L1 cache "zooms in" on the idealized model of the previous chapter in order to provide enough detail to model situations such as the Peekaboo problem. For simple cases, this zooming may be unnecessary, but it is still correct. In more complex cases, this zooming may in fact be necessary. Likewise, certain protocols may require the model to "zoom in" even further to capture behaviors specific to the L2 cache as well. Section 6.6 will provide examples.

6.5 Experimental Methodology

Table 6.1 describes the various microarchitectures that we analyze in this section. In order to emphasize how unexpected coherence protocol behaviors can appear even in

simpler microarchitectures with in-order cores, we model the architectures as having a five-stage in-order pipeline. Nevertheless, CCICheck easily adapts to more complex and/or less restrictive pipelines. Our models cover a variety of memory hierarchies and coherence protocols, including single and dual layers of private and/or shared caches. The "Eager" coherence protocol classification refers to an abstract vanilla coherence protocol that eagerly invalidates sharers before every write and which is adapted to each different cache hierarchy arrangement. Other classifications listed in Table 6.1 reflect those described in Section 6.6. The analysis is performed using the tool and methodology of Section 5.4 and the suite of x86-TSO litmus tests from Section 5.5. As Section 6.6.4 shows, although the µhb graphs are larger in this chapter due to the more sophisticated coherence protocols, the performance of our automated tool remains acceptable in practice.

Interestingly, at various times during the development process, a pipeline model would pass all of the tests in the suite, yet we would later discover that certain corner cases were not addressed. Whenever we became aware of such situations, we added hand-written litmus tests to stress coherence protocol-inspired features that were not covered by the existing suite. This highlights the inherent limitations of litmus test-based approaches, and it indicates the value of more complete formal proofs of the kind discussed in Section 5.4.6.

6.6 Case Studies

The previous section presented the general approach to modeling ViCLs using CCICheck. This section presents some more advanced case studies which model coherence protocols of the kinds that are used in real-world high-performance processors.

6.6.1 Partially-Incoherent Caches and L1 Cache Bypassing

Our first example studies extremely weak memory models such as those found in many GPU systems today [NVIa]. GPUs have very weak default memory models, so their consistency enforcement tends to come in the form of explicit fences rather than preserved program order. GPUs also generally claim to be coherent, for some (often unspecified) definition of coherence (cf. Section 2.1.3) [NVI13b]. In spite of this, Alglave et al. recently found that coherence requirements are often violated in practice [ABD+15]. CCICheck and ViCLs provide a natural framework in which the consistency implications of such scenarios can be analyzed. Alglave et al. tested GPUs using (among other tests) a modified version of the mp litmus test. As Figure 6.9a shows, this version of mp has membar fences inserted between the two stores and between the two loads. Since GPUs are otherwise allowed to reorder freely, the fences should prevent any reorderings, which should in turn prevent the forbidden outcome r1 = 1, r2 = 0. However, they discovered empirically that the forbidden outcome is incorrectly observable on many modern GPUs. As the studied microarchitectures are proprietary, we apply CCICheck to a hypothetical yet realistic GPU model with small, in-order cores and private L1 caches following the no-allocate-on-write policy (i.e., on a store miss, the L1 cache is bypassed entirely). Figure 6.9b depicts a CCICheck graph for the litmus test of Figure 6.9a. Both stores write directly to the L2 cache, while both loads read from the L1. The L1 ViCL for (i6) can only have the test's prescribed value of 0 if it reads its value from the L2 before the (i1) store of x reaches the L2 cache. Unfortunately, in this scenario, there is no cycle in the graph to prevent the bad outcome from happening. One explanation for this hypothetical scenario is that there is a subtle race condition which allows the (i6) L1 ViCL to be created after the fence (i5) has retired but sourced from an L2 ViCL which had been invalidated before the fence retired. GPUs contain complicated throughput-optimizing buffers and networks-on-chip

(a) Code. Thread 0: (i1) [x] ← 1; (i2) membar; (i3) [y] ← 1. Thread 1: (i4) [y] → r1; (i5) membar; (i6) [x] → r2. Outcome 1:r1=1, 1:r2=0: Forbidden.
(b) µhb graph. [Nodes span the pipeline stages, LeaveStoreBuffer, and the L1/L2 ViCL Create/Expire events; labeled edges include "membar", "SourcedFrom", and "InvSharers,NoDups".]

Figure 6.9: Modified mp [ABD+15] on a hypothetical GPU with no-allocate-on-write and no SWMR guarantee. In particular, the thread 1 fence does not enforce the load→load ordering, and so there is no cycle.

between the cores and the memory system, and it is possible that fences do not take such buffering into account. This example demonstrates how CCICheck can be used to catch these subtle bugs in new architectures prior to their release.

6.6.2 Lazy Coherence

The second case study uses CCICheck to analyze CCI behavior in the recently proposed TSO-CC protocol [EN14]. TSO-CC is a scalable lazy coherence protocol for TSO architectures that uses private L1s and a shared L2 that doubles as a directory cache. TSO-CC does not track lines in shared state, allowing them instead to be lazily and automatically invalidated by the core, either due to natural eviction from the cache or after a certain fixed number of uses. The above results in low on-chip storage requirements for coherence but requires the pipeline to enforce orderings that it would not need to enforce in a more standard protocol. Scenarios like this in which a coherence protocol's design is tightly coupled with features of an MCM's implementation are great examples of the need for CCICheck. Due to its lazy invalidation policy, the Single Writers/Multiple Readers property is enforced logically rather than physically. In other words, although SWMR does not hold true in a strict physical sense, it does hold true in a more abstract sense: the cores may be considered non-synchronous, except at communication points, and SWMR holds according to this abstract timeline. To account for this non-synchrony, TSO-CC requires each core to "catch up" to a future point in logical time whenever it sees a load miss or a fence by invalidating all shared lines in its private L1 cache (and hence any possibly out-of-date values) at that time. This ensures that when a core sees one value from a core other than itself, it is made aware of all previous values from all other cores as well. We model TSO-CC in CCICheck by adjusting features in the baseline models as appropriate. A snippet of this code is shown in Figure 6.10.

172 DefineMacro "L1ViCLFlushSharedLines": forall microops "i’", (NodeExists (i’, L1ViCLCreate) \/ NodeExists (i’, L1ViCLExpire)) => ~SameMicroop i i’ => SameCore i i’ => ( EdgeExists ((i’, L1ViCLExpire), (i, L1ViCLCreate), "Flush") \/ (EdgeExists ((i, L1ViCLCreate), (i’, L1ViCLDowngrade), "StillInModified") /\ IsAnyWrite i’) \/ EdgeExists ((i, L1ViCLCreate), (i’, L1ViCLCreate), "CreatedAfter")).

Axiom "L1ViCLs": forall microop "i", OnCore c i => IsAnyRead i => (NodeExists (i, L1ViCLCreate) \/ NodeExists (i, L1ViCLExpire)) => ( EdgeExists ((i, L1ViCLCreate), (i, L1ViCLExpire), "path") /\ ExpandMacro L1ViCLSource /\ ExpandMacro L1ViCLFlushSharedLines).

Axiom "SWMR": forall microops "i1", (IsAnyWrite i1 \/ AccessType RMW i1) => (NodeExists (i1, L1ViCLCreate) \/ NodeExists (i1, L1ViCLExpire)) => forall microops "i2", ((NodeExists (i2, L1ViCLCreate) \/ NodeExists (i2, L1ViCLExpire)) => ~SameMicroop i1 i2 => SameAddress i1 i2 => (IsAnyWrite i2 \/ AccessType RMW i2) => ( (EdgeExists ((i2, L1ViCLDowngrade), (i1, L1ViCLCreate), "swmr")) \/ (EdgeExists ((i1, L1ViCLCreate ), (i2, L1ViCLCreate), "swmr")))) /\ ((NodeExists (i2, (0, L2ViCLCreate)) \/ NodeExists (i2, (0, L2ViCLExpire))) => (~SameMicroop i1 i2) => SameAddress i1 i2 => ( (EdgeExists ((i2, (0, L2ViCLExpire)), (i1, L1ViCLCreate ), "swmr")) \/ (EdgeExists ((i1, L1ViCLCreate ), (i2, (0, L2ViCLCreate)), "swmr")))).

Figure 6.10: Relevant subset of the CCICheck axioms for TSO-CC.

First, the Single Writers/Multiple Readers (SWMR) axiom is weakened to ensure that only lines in modified state are affected; such lines are downgraded to shared, while existing shared lines

(which are not tracked) are ignored. Likewise, the L1 ViCL sourcing axiom L1ViCLs is modified to state that for each ViCL matching the same address as a ViCL brought in from the L2 cache, either 1) the cache line was in shared state and must be flushed, 2) the cache line is in modified state (i.e., has not yet been downgraded) and may be kept as-is, or 3) the cache line is created after the flush occurs. Fences and atomic read-modify-write instructions behave similarly.
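Operationally, the flush behavior these axioms encode can be sketched in a few lines of Python: on a load miss or a fence, a core drops every line it holds in shared state from its private L1, while modified lines are unaffected. This is an illustrative sketch of the rule, not TSO-CC's actual implementation, and the names below are hypothetical.

def flush_shared_lines(l1_cache):
    """TSO-CC "catch up" step (illustrative): on a load miss or a fence, drop
    every line this core holds in Shared state from its private L1, leaving
    Modified lines in place. l1_cache maps address -> (state, data)."""
    return {addr: (state, data)
            for addr, (state, data) in l1_cache.items()
            if state != "S"}

# Core 1 in the mp test: its possibly stale shared copy of x is dropped when
# the load of y misses, so the subsequent load of x must fetch the new value.
core1_l1 = {"x": ("S", 0), "z": ("M", 7)}
print(flush_shared_lines(core1_l1))   # {'z': ('M', 7)}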

Figure 6.11 shows a µhb graph for the mp litmus test (Figure 6.2a) executing on a microarchitecture implementing TSO-CC. Since shared lines in the L1 caches are

invalidated lazily, the L1 ViCL for the load of x need only be sourced before the store of x occurs as opposed to being invalidated before it. This relationship is identical to the case of the partially incoherent architecture covered in Section 6.6.1. Intuitively,

[Figure 6.11 shows the µhb graph for instructions (i1)-(i4): rows for the pipeline stages, StoreBuffer, L1 ViCL Create/Downgrade/Expire, and L2 ViCL Create/Expire, with "SourcedBefore", "Flush", "SourcedFrom", and "SWMR" edges.]

Figure 6.11: µhb graph for mp on TSO-CC [EN14]. Although the coherence protocol does not eagerly invalidate any sharers of x before allowing (i1) to perform, the necessary orderings are enforced by (i3) flushing the cache upon taking a cache miss.

in TSO-CC, if the load of y returns the value 1, it must have taken a cache miss at some point, allowing the new value of 1 to be fetched from core 0. Core 1 would have invalidated shared lines in its L1 cache at the time of this cache miss, and thus the

ViCL for the load of x must be created after the ViCL for the load of y is created. The litmus test specifies a forbidden outcome, so the verification goal is to identify a cycle in the µhb graph. If the TSO-CC implementation properly flushes the cache as specified, then the "Flush" edge will exist, creating a cycle in the graph and indicating that TSO-CC maintains TSO for this execution. If the "Flush" edge did not exist, the graph would be acyclic and the forbidden test outcome would be potentially observable, indicating that TSO-CC's flushing of shared L1 lines on a cache

miss is a critical part of how it maintains TSO. Not only does the ViCL abstraction allow for comprehensive CCI verification, but the automatically-enumerated µhb graphs give the designer considerable intuition about how and why correct behavior is preserved. Although the authors of TSO-CC performed testing via simulation, CCICheck is more comprehensive because it allows for exhaustive enumeration of possible ordering scenarios, where simulation may be limited to just a few different orderings or interleavings.

We also hope that CCICheck helps clarify statements such as the TSO-CC authors' claim that most coherence protocols are designed for sequential consistency [EN14]. As numerous widely-used MCMs such as TSO, ARM, and Power are considered coherent yet not sequentially consistent, it is easy to misinterpret such a claim. The coherence protocol cannot enforce the consistency model independently of pipeline components such as store buffers or of pipeline behaviors such as speculative load reordering. Our point of view is that both coherence protocols and pipelines provide certain (implementation-level) ordering properties, and that consistency models require certain ordering properties. A coherence protocol may be designed to provide the kinds of strong properties that a stricter consistency model (such as SC) may require, but it is nevertheless up to the system as a whole to ensure that consistency is properly enforced.

6.6.3 Window of Vulnerability/Peekaboo

This section demonstrates the use of CCICheck to verify the solution to the window of vulnerability (WoV) and Peekaboo problem previously introduced in Section 6.2.2. The solution allows loads and stores to access already-invalidated data if and only if they were the oldest in program order at the time of the coherence request for the accessed line. The CCICheck model for the Peekaboo protocol therefore includes a Cache Request node and a Cache Line Invalidate node. The Cache Request node

175 DefineMacro "L1ViCLSource": exists microop "i’", ( SamePhysicalAddress i i’ /\ ~SameMicroop i i’ /\ SameData i i’ /\ EdgesExists ((i’, ViCLCreate), (i , ViCLCreate ), "src", "blue") /\ ((i’, ViCLCreate), (i’, ViCLInvalidate), "DontSrcFromPeekabooViCL")] /\ ~exists microop "i’’", SamePhysicalAddress i i’’ /\ IsAnyWrite i’’ /\ EdgesExist [((i’ , ViCLCreate), (i’’, ViCLCreate), "DVI"); ((i’’, ViCLCreate), (i , ViCLCreate), "DVI")]).

DefineMacro "FindL1ViCLNormal": exists microop "s", ( SamePhysicalAddress s i /\ SameData s i /\ SameCore s i /\ EdgesExist [((s, ViCLCreate ), (i, MemoryStage ), "rf"); ((i, MemoryStage), (s, ViCLInvalidate), "rf"); ((s, ViCLCreate ), (s, ViCLInvalidate), "path")]).

DefineMacro "FindL1ViCLPeekaboo": (~AccessType RMW i) /\ exists microop "s", ( SamePhysicalAddress s i /\ SameData s i /\ SameCore s i /\ EdgesExist [((s, ViCLInvalidate), (s, ViCLCreate ), "path"); ((s, ViCLCreate ), (i, MemoryStage), "rf"); ((i, MemoryStage), (s, ViCLExpire ), "rf")] /\ forall microop "i’", ProgramOrder i’ i => ( (IsAnyRead i’ => EdgeExists ((i’, MemoryStage), (s, ViCLRequest), "peekaboo")) /\ (IsAnyWrite i’ => EdgeExists ((i’, ViCLCreate ), (s, ViCLRequest), "peekaboo")))).

DefineMacro "FindL1ViCL": ExpandMacro FindL1ViCLNormal \/ ExpandMacro FindL1ViCLPeekaboo.

Figure 6.12: Relevant subset of the CCICheck axioms for the Peekaboo scenario.

represents the time at which a core makes a coherence request for a particular line, and the Cache Line Invalidate node represents the point at which an invalidation request is processed. Here, unlike the cases previously discussed, the invalidation of the ViCL is not the same as its expiration. This is because a load may return the invalidated data to one instruction in an effort to avoid livelock. Figure 6.12 shows the key axioms used to model the Peekaboo scenario. Each load may choose one of two options: it may behave as a normal load (i.e., it may behave the same as in the baseline models), or it may be a Peekaboo load in which the cache line invalidation arrives before the data is used. In the latter case, we simply add a new set of µhb edges to represent the fact that the instruction must be the oldest in program order at the time of the coherence request: all previous instructions must have completed before the Cache Request for the instruction in question takes place.

[Figure 6.13 shows the µhb graph for instructions (i1)-(i4): rows for the pipeline stages, LeaveStoreBuffer, Cache Request, L1 ViCL Create, Cache Line Invalidate, and L1 ViCL Expire, with "Peekaboo", "SourcedFrom", and "SWMR" edges.]

Figure 6.13: µhb graph for the solution to the Peekaboo coherence problem.

Figure 6.13 demonstrates a µhb graph for the Peekaboo solution technique, again for the mp litmus test. Since invalidated data is only returned to the core if the instruction it is returned to was the oldest in program order at the time of the coherence request, our µhb graph must be able to include happens-before edges that indicate that. In particular, consider the load (i4) which accessed invalidated data. The graph includes a µhb edge (in this case labeled “Peekaboo”) from the memory stage of all preceding accesses on the same core (i.e., (i3)) to the Cache Request node of (i4). The presence of the Peekaboo µhb edge ensures that a cycle is created, thereby preventing the illegal outcome. Thus, CCICheck helps designers verify that as long as the implementation ensures that the requesting instruction was the oldest in program order at the time of the coherence request, this livelock avoidance approach abides by TSO consistency.

Finally, we note that (unsurprisingly) the act of formally specifying CCI behavior can sharpen a designer's understanding of system behavior. For example, the original English-language description of the problem [SHW11]—while very thorough and perhaps clear to many—required us to interpret how far instructions preceding the Peekaboo load (in program order) must have progressed before performing the load. In creating the CCICheck model, we were able to use the tool to determine that a correct design would only allow the load to access invalidated data if all previous loads were performed and all previous stores had reached the memory hierarchy and become visible to all cores before the coherence request for the Peekaboo instruction was issued. By requiring formal and explicit documentation regarding such decisions, the CCICheck model becomes a more precise solution specification than any natural language description could be.

6.6.4 Performance Results

Figure 6.14 shows the runtimes of CCICheck across the x86-TSO litmus test suite. As in Section 5.6, there can be significant variation in the time it takes CCICheck to run a litmus test across tests and across microarchitectures. This is to be expected, as litmus tests have varying numbers of instructions and varying potential for ViCL pairing and thus different ranges of possibilities. However, even though the graphs and the models are larger than they were in the previous chapter, and even though the ViCL abstraction adds complexity to the constraints that must be satisfied, the runtimes clearly still remain very tractable. Just as in the previous chapter, the published CCICheck solver runtimes were originally much longer [MLPM15], with some tests taking hours and requiring large amounts of memory. The development of the solver of Section 5.4.2 provided a much-needed boost in performance while maintaining (and through the use of the DSL, even improving) the generality of the approach.

[Figure 6.14 plots per-test runtimes on a log scale from 0.01 s to 1000 s for the SharedL1, PrivL1, Peekaboo, PrivL1SharedL2, and TSOCC models across the x86-TSO litmus test suite.]

Figure 6.14: Runtimes of the x86-TSO litmus tests across the various processors we modeled.

6.7 Related Work

CCI Definition and Verification. It is a compelling notion that coherence and consistency should be decoupled [Mar05]. Because of this, and because verification of each individually is more tractable than combined verification, much of the prior work discussed in this section focuses primarily on one or the other. Unfortunately, since most implementations of coherence and consistency become interwoven for performance or design reasons [ABD+15, ADC11, CKS+11, EN14, KSB95, RK12, SHW11], such a separation is insufficient, and verification of the CCI is an under-researched necessity. CCICheck's µhb graphs, ViCL abstractions, and axiomatic treatment of coherence protocol behaviors allow it to fill the CCI verification gap.

Computational Complexity. Alur et al. showed that verifying that a protocol correctly implements sequential consistency can in the worst case be formally undecidable [AMP96]. However, follow-up work later showed that verification of SC given certain realistic assumptions about loads not being able to receive data from future stores is in some cases decidable [BCH03, CH03]. Verifying memory coherence, as a decision problem, is NP-complete if coherence is considered equivalent to verifying sequential consistency for programs with only one variable [CLS03]. Furthermore, verifying sequential consistency remains NP-hard even if coherence is assumed to already hold true [CLS03].

In spite of the theoretical intractability of many consistency model verification problems, researchers have proposed numerous static and dynamic analysis tools which have been able to verify practically-relevant cases [AMT14, CMP08, HVML04, MS05, SSA+11]. Furthermore, many authors conjecture that consistency model behavior can be captured through the understanding of certain well-defined patterns, and these patterns can be directly encoded as litmus tests and/or sets of constraints in a model [AMT14]. These litmus tests tend to be small, consisting of no more

Ref.      | Protocols             | Coherence   | Liveness | Deadlock Freedom | Safety Properties
[CGH+93]  | Futurebus+            | —           | —        | —                | X
[PSCH98]  | SGI Origin 2000-like  | SC          | —        | —                | —
[McM01]   | FLASH [KOH+94]        | SC per loc. | X        | —                | —
[CMP04]   | German [PRZ01], FLASH | —           | —        | —                | X
[SSH+13]  | Chained Cache Coh.    | MOESI, DVI  | —        | X                | —
[STM14]   | FLASH, German         | —           | —        | system-wide      | —

Table 6.2: A brief sampling of coherence protocol verification approaches

than a dozen instructions or so. As in this chapter and the previous chapter, the small constant factors keep our analysis tractable in practice.

Coherence Verification. The field of cache coherence protocol verification has also seen significant attention from architecture researchers, from formal methods researchers, and from industry. Early work in this realm explored the verification of the FutureBus+ protocol using temporal logic model checking [CGH+93]. In modern research, formal modeling languages such as Murphi or TLA+ are often used to specify and to verify properties of many coherence protocols [DDHY92, Lam02, McM01, SSH+13, STM14]. Some of these projects directly verify coherence (for some definition of coherence, see Section 2.1.3). Some verify other properties: for example, freedom from deadlock, a liveness guarantee, or various safety properties concerning certain invariants that should always hold true or that should never hold true in any state. Just as the definition of coherence varies across different studies, the properties being verified vary widely. Table 6.2 gives a very brief sampling of how approaches differ in this regard. Among the protocols verified, the German protocol is notable for being designed to serve as a challenge for formal verification methods [CMP04]. More recently, Zhang et al. [ZBES14, ZLS10] have proposed techniques for designing cache coherence protocols in ways that make them inherently more amenable to formal verification. As stated earlier, PipeCheck interprets the properties verified by the above research as axioms which it then uses to verify the correctness of a consistency model implementation built on top of such protocols.

Vijayaraghavan et al. prove the correctness of a parameterized processor design written in BlueSpec [Inc04, VCAD15]. They demonstrate that the processor as a whole, including multiple speculative out-of-order cores and a directory-based coherence protocol with an arbitrary number of nodes and levels, correctly implements sequential consistency. Their approach, however, used manual proofs written in Coq [The04], and it does not yet support verification of memory models weaker than sequential consistency. PipeCheck supports automated verification of a broader range of weak memory models.

6.8 Chapter Summary

This chapter demonstrated that the µhb analysis techniques of the previous chapter can be readily extended to model the consistency-related features of a specific coherence protocol. More broadly, it demonstrated how coherence and consistency are inherently intertwined at a microarchitecture level, and that naive extensions of architecture-level relationships such as rf, ws, and fr do not always correspond to reality. To address this, we introduced the notion of a value-in-cache-lifetime, or ViCL, which allowed us to replace the no-longer-applicable edges with cache-focused axioms such as the Single Writers/Multiple Readers (SWMR) axiom. Using this strategy, CCICheck builds off of previous work in coherence protocol verification [CGH+93, McM01, SSH+13, ZBES14, ZLS10] rather than competing with or reproducing it. This breeds confidence in the potential for future extensions to PipeCheck and CCICheck to be able to capture the behavior of particular networks-on-chip, address translation subsystems, and so on, thereby improving the tractability of whole-chip verification in the real world.

Chapter 7

Conclusion and Future Directions

This chapter describes ongoing and future work, and it then concludes the thesis.

7.1 Future Directions

There are many important and exciting possible ways in which the work in this thesis can be further extended. Some of these possibilities are considered below.

7.1.1 Further Extensions of PipeCheck

One exciting avenue for future work is to extend PipeCheck and CCICheck to analyze notions of correctness beyond those that consistency models traditionally address. One extension which is ongoing as this thesis is being written extends PipeCheck to verify the intersection of memory consistency models with address translation. Romanescu et al. demonstrated that consistency models for virtual addresses may differ from consistency models for physical addresses [RLS10]. They also demonstrated the need for consistency verification methods to take into account address remapping and permission-changing functions such as mmap and mprotect. Along with some collaborators, I am currently developing ATCheck to

explicitly model both virtual and physical addresses, memory mapping/remapping functions, OS effects related to TLB maintenance, and other related mechanisms. I believe that ATCheck will be able to help resolve bugs (which are unfortunately all too common) in the address translation subsystem, just as PipeCheck and CCICheck were able to do for the pipeline and cache coherence protocol, respectively.

More broadly, I believe that the PipeCheck DSL and software have the potential to find widespread adoption in the architecture community. The PipeCheck DSL is flexible enough to describe orderings at many different levels of abstraction, thereby allowing users to describe their ideas at the level of abstraction that makes the most sense for the given scenario and/or that they are most comfortable using. The software is fast enough to provide quick or even nearly-instant feedback to the user, thereby providing a lower barrier to entry for architects wishing to validate their work. My hope is that one day, PipeCheck (or an extension thereof) will become a standard for demonstrating the MCM correctness of any new architectural design, much as verification of a coherence protocol using, e.g., Murphi or TLA+ has become a de facto requirement when proposing new coherence protocol designs [DDHY92, Lam02].

Lastly, I believe that µhb graphs and the axiom-based approach of the PipeCheck DSL will be able to serve as a valuable teaching tool for students learning how memory models are defined and implemented. A common criticism of memory models in general is that they are highly confusing and opaque, and this often makes it difficult for students to fully grasp the details of memory model analysis techniques or even to fully appreciate the need for such techniques in the first place. The PipeCheck approach provides two key advantages in this regard. First, it decomposes the problem of specifying memory models into smaller axioms which are often finer-grained and more concrete (in the sense of representing some specific microarchitectural behavior), making them easier to comprehend. Second, it supplements

dense mathematical formalisms with easily-visualizable µhb graphs, thereby hopefully increasing the accessibility of the topic to a broader audience.

7.1.2 Formal Correctness Proofs

Although the PipeCheck analysis framework has been implemented within Coq, it has not yet taken advantage of Coq's ability to build formal proofs of correctness. The first challenge, as stated in Section 5.4.1, is that there are many cases in which a proper architecture-level consistency model specification simply does not exist. Nevertheless, for cases in which a specification does exist, it should be possible to derive fully formal correctness specifications using the approach laid out in Section 5.4.6. It may well also be possible to leverage the ability of PipeCheck to perform automated exploration of the search space in order to generate proofs or proof components entirely automatically.

Likewise, our verification approach is only as sound as our microarchitecture models are with respect to the actual underlying microarchitectures. A follow-up project that would be very beneficial would be to rigorously compare the behavior of a design at the register transfer level (RTL) to the axioms stated in our microarchitecture models. With both of the above, it would be possible to build a complete formal proof of correctness from the software level all the way down to the transistor level.

Lastly, one can explore how ArMOR might be formalized more rigorously as well. This could help lend increased reliability to case studies such as the dynamic translation shims of Chapter 4. It could also help in the process of integrating ArMOR and PipeCheck to form a complete and unified framework for analyzing consistency models at the hardware level and below.

7.2 Thesis Conclusions

Exciting times lie ahead for the field of computer architecture. Dennard scaling has seemingly faltered, and the continued transistor density doubling predicted by Moore's law will eventually reach its limit. Even Intel, currently the world's leading semiconductor manufacturer, announced that its 10nm silicon fabrication process would be delayed by a year [HS15]. While it is impossible to predict the future, it is not inconceivable that the future of computer architecture may soon look dramatically different from the silicon-based world of today.

From a research perspective, this future presents a fascinating new world just waiting to be explored. It is fairly clear that architectures of the near future will become increasingly specialized and heterogeneous. Accelerators are already ubiquitous; GPUs are widespread, and mobile systems-on-chip in particular already make significant use of specialized compute units in order to meet their performance and power targets. Furthermore, systems are increasingly interconnected, both directly and indirectly through the Internet. In this setting, communication between components and between devices (via shared memory or otherwise) will be of paramount importance for performance and for correctness. Therefore, techniques like PipeCheck become all the more central.

If and when new materials and device technologies supplant the traditional silicon-based paradigm, they could very well introduce questions analogous to those addressed by existing memory consistency models. Many exciting new memory technologies are being studied and implemented in modern systems; examples such as flash and 3D-stacked DRAM have already gained commercial acceptance. They also raise new questions. For example, many of these new memories are persistent: they do not lose their state when power is removed. As a consequence, researchers have also begun to consider the correctness implications of memory "persistency", the set of rules for enforcing orderings among the points at which memory accesses become persistent [PCW14]. For these kinds of complex designs, achieving programmability without sacrificing performance, power efficiency, and correctness will require robust and flexible frameworks for specifying and analyzing memory consistency and any other form of correctness.

This thesis makes various contributions to the state of the art of computer architecture and memory consistency model analysis techniques. Building on a growing body of research demonstrating the need for rigor when specifying and verifying memory consistency models, this thesis presents new techniques for extending rigorous memory model analysis down to the microarchitecture layer. ArMOR brings much-needed clarity to the widely-used yet fundamentally-flawed informal specifications used by many industry products and academic projects. PipeCheck then derives the first formal microarchitecture-level memory models and verifies their behavior against a given architecture-level specification. Overall, this thesis makes the following contributions:

• Chapter 3 presents a novel unifying framework for specifying and comparing memory models. The MOST syntax defines a single precise, self-contained, and architecture-independent framework that can be used to define the memory models of many widely-used architectures. The ArMOR framework then allows for the systematic comparison, analysis, and manipulation of MOSTs, thereby allowing MOST-based consistency model analysis to be embedded within compilers, dynamic binary translators, or other software or hardware components. ArMOR allows features such as preserved program order and fences to be directly compared to each other, even when they originate from different models. This ensures that such analyses are less prone to the kinds of subtle bugs that often appear in memory model analyses. The flexibility of ArMOR also allows memory model analysis techniques to be extended to heterogeneous systems with components implementing more than one type of memory model. This flexibility will become invaluable as architectures become increasingly heterogeneous and, following the progression of accelerators such as GPUs, as components are increasingly expected to share memory flexibly and at fine granularity.

• Chapter 4 provides a concrete example of how the techniques developed in this thesis can be used to enable new technologies that simply could not exist before. The example used in this chapter, the derivation of dynamic binary translation shims which account for differences in memory consistency models between the source and target architectures, addresses a crucial gap that previous dynamic binary translators and emulators were unable to close [DVT12, Qem15, VT14]. MOSTs and the ArMOR framework provide an elegant solution to these problems by presenting the first general-purpose algorithm for translating between any two memory models provided as inputs. More broadly, this chapter presents an example of how the algorithms enabled by ArMOR can easily scale to the needs of forward-looking technologies.

• Chapters 5 and 6 present the first general-purpose framework for rigorously specifying and verifying memory consistency model behavior at the microarchitecture level. While the past decade has seen significant progress made to improve the specifications of software models such as C++ and Java and of hardware models such as x86-TSO, Power, and ARM, it has seen little similar effort to verify that architecture-level models were being implemented correctly. PipeCheck and CCICheck allow formal analysis techniques to extend down to the implementation level, thereby enabling proofs of correct behavior to extend from software all the way down to a particular microarchitecture. They also allow microarchitectures which do not yet have formal (or even informal) memory models (e.g., most accelerators) or which rely on microarchitecture-specific just-in-time (JIT) compilation paradigms (e.g., GPUs) to nevertheless be rigorously analyzed. The PipeCheck approach derives architecture-level correctness from a set of localized implementation-level axioms. The approach also enables a new, more tractable path towards extending verifiability all the way down to the transistor level.

Although memory consistency models have been studied since the 1970s, they remain an important and active area of research. Decades of effort have shown that memory models are more than just an engineering problem. Most attempts to define sound models which enable high-performance implementations and which are programmable by non-“experts” [BA08] tend to start their life flawed and only improve after years of iteration [AMT14, AS07, BA08, MPA05, SSA+11]. Many popular models are still iterating even today [ABD+15, BMO+12, SMO+12, VBC+15]. Given the trend toward increased heterogeneity and specialization, problems of cross-component communication and synchronization are likely only to increase in importance.

This thesis makes a number of important contributions to memory consistency model research from a microarchitect's point of view, and it does so in a way that balances the theoretical need for formalism and rigor with the practical need to be able to analyze real systems. The analysis techniques introduced here can help eliminate some of the consistency-related hardware and software bugs that continue to appear in real, widely-used systems. They can also help shed much-needed light on why memory models are inherently complicated yet fundamentally important, how memory model enforcement mechanisms are implemented, and how all of the above will need to be adapted to the architectures of the future.

Appendix A

Gallery of MOST-based Definitions for Well-Known Architectures

The following appendix presents a gallery of MOSTs for well-known architectures. Its purpose is twofold. First, it demonstrates how MOSTs can be used to describe the MORs of many highly relevant hardware memory models. Second, it demonstrates how ordering enforcement mechanisms differ across these architectures. All of the MOSTs in this appendix have been refined to distinguish same vs. different address relationships and to account for cumulativity, even though many of the MOSTs could otherwise be defined in simpler forms. We model the following architectures:

• Sequential Consistency (SC) [Lam79].

• SPARC hierarchy: total store ordering (TSO), partial store ordering (PSO), and relaxed memory ordering (RMO) [OSS09a, SPA94b]. Note that although the word “relaxed” is often used as a generic term, in the context of SPARC RMO it has a well-defined meaning.

• POWER and ARM [ARM13, AMT14, IBM13, MHMS+12, SSA+11]. Note that our MOSTs treat loads as single events, while recent formalizations treat them as consisting of multiple events. This causes our MOST-based Power model to be strictly stronger than the standard Power model, with discrepancies in 394 (5%) of the 7536 litmus tests in an existing suite for the Power architecture. The ARM model is likewise stronger than standard ARM models. This could be corrected by adding strength levels (Section 3.3.1) which distinguish between the multiple load events as well.

• NVIDIA PTX [ABD+15, NVI13b]. Note that 1) this model is not yet as well formalized as the other models described above, and 2) this model does not provide a way to enforce cumulativity, and it therefore cannot be made to behave in a sequentially consistent manner, in contrast with essentially all other general-purpose MCMs.

The code which automatically generated these shims (and this appendix) is available online [Lus15]: https://github.com/daniellustig/armor.
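
As an aid to reading the tables that follow, the sketch below shows one way a MOST could be represented programmatically: as a mapping from (preceding operation class, succeeding operation class) to an ordering-strength symbol. This is an illustrative Python fragment written for this appendix, not code taken from the ArMOR repository; the names OP_CLASSES, ROW_CLASSES, sc_ppo, tso_ppo, and enforced are assumptions of the sketch. The entries reproduce the PPO MOSTs of Figures A.1 and A.2 below.

    # Illustrative sketch only: a MOST as a mapping from
    # (preceding op class, succeeding op class) -> ordering strength.
    # "XS" marks an ordering the model enforces, "-" marks one it does not,
    # and "XL"/"XM"/"XN" are the weaker strengths used in the tables below.

    OP_CLASSES = ["PO LD/DA", "PO LD/SA", "BC LD", "PO ST/DA", "PO ST/SA", "BC ST"]
    ROW_CLASSES = ["AC LD", "AC ST", "PO LD", "PO ST"]

    # SC's PPO MOST (Figure A.1): every ordering is enforced.
    sc_ppo = {(first, second): "XS" for first in ROW_CLASSES for second in OP_CLASSES}

    # TSO's PPO MOST (Figure A.2): identical to SC except in the PO ST row.
    tso_ppo = dict(sc_ppo)
    tso_ppo[("PO ST", "PO LD/DA")] = "-"   # store -> later load (diff. addr.) relaxed
    tso_ppo[("PO ST", "PO LD/SA")] = "XL"  # remaining values as printed in Figure A.2
    tso_ppo[("PO ST", "PO ST/DA")] = "XM"
    tso_ppo[("PO ST", "PO ST/SA")] = "XM"

    def enforced(most, first, second):
        """Return True iff the MOST unconditionally enforces first -> second."""
        return most[(first, second)] == "XS"

    print(enforced(sc_ppo, "PO ST", "PO LD/DA"))   # True: SC keeps store -> load order
    print(enforced(tso_ppo, "PO ST", "PO LD/DA"))  # False: TSO relaxes store -> load

Representing each table as a flat dictionary keyed by operation-class pairs keeps the comparison and manipulation operations of Chapter 3 down to simple lookups over the same key space, even when two tables come from different architectures.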

A.1 Sequential Consistency (SC)

PPO      PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       XS        XS      XS       XS        XS      XS
AC ST       XS        XS      XS       XS        XS      XS
PO LD       XS        XS      XS       XS        XS      XS
PO ST       XS        XS      XS       XS        XS      XS

Figure A.1: MOSTs for MCM Sequential Consistency (SC)

A.2 SPARC® Total Store Ordering (TSO)

PPO      PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       XS        XS      XS       XS        XS      XS
AC ST       XS        XS      XS       XS        XS      XS
PO LD       XS        XS      XS       XS        XS      XS
PO ST       —         XL      XS       XM        XM      XS

mfence   PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       XS        XS      XS       XS        XS      XS
AC ST       XS        XS      XS       XS        XS      XS
PO LD       XS        XS      XS       XS        XS      XS
PO ST       XS        XS      XS       XS        XS      XS

Figure A.2: MOSTs for MCM SPARC® Total Store Ordering (TSO)

A.3 SPARC® Partial Store Ordering (PSO)

PPO      PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       XS        XS      XS       XS        XS      XS
AC ST       XS        XS      XS       XS        XS      XS
PO LD       XS        XS      XS       XS        XS      XS
PO ST       —         XL      XS       —         XM      XS

mfence   PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       XS        XS      XS       XS        XS      XS
AC ST       XS        XS      XS       XS        XS      XS
PO LD       XS        XS      XS       XS        XS      XS
PO ST       XS        XS      XS       XS        XS      XS

Figure A.3: MOSTs for MCM SPARC® Partial Store Ordering (PSO)

A.4 SPARC® Relaxed Memory Ordering (RMO)

PPO      PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       XS        XS      XS       XS        XS      XS
AC ST       XS        XS      XS       XS        XS      XS
PO LD       —         XS      XS       —         XS      XS
PO ST       —         XL      XS       —         XM      XS

mfence   PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       XS        XS      XS       XS        XS      XS
AC ST       XS        XS      XS       XS        XS      XS
PO LD       XS        XS      XS       XS        XS      XS
PO ST       XS        XS      XS       XS        XS      XS

Figure A.4: MOSTs for MCM SPARC® Relaxed Memory Ordering (RMO)

A.5 Power Architecture®

PPO      PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       —         —       —        —         —       —
AC ST       —         —       —        —         —       —
PO LD       —         XS      —        —         XS      —
PO ST       —         XL      —        —         XN      —

dep      PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       —         —       —        —         —       —
AC ST       —         —       —        —         —       —
PO LD       XS        XS      —        XS        XS      —
PO ST       —         XL      —        —         XN      —

lwsync   PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       XS        XS      XS       XS        XS      XS
AC ST       —         —       —        XN        XN      XN
PO LD       XS        XS      XS       XS        XS      XS
PO ST       —         XL      —        XN        XN      XN

sync     PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       XS        XS      XS       XS        XS      XS
AC ST       XS        XS      XS       XS        XS      XS
PO LD       XS        XS      XS       XS        XS      XS
PO ST       XS        XS      XS       XS        XS      XS

Figure A.5: MOSTs for MCM Power Architecture®

A.6 ARM® Architecture

PPO      PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       —         —       —        —         —       —
AC ST       —         —       —        —         —       —
PO LD       —         XS      —        —         XS      —
PO ST       —         XL      —        —         XN      —

dep      PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       —         —       —        —         —       —
AC ST       —         —       —        —         —       —
PO LD       XS        XS      —        XS        XS      —
PO ST       —         XL      —        —         XN      —

dmb      PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       XS        XS      XS       XS        XS      XS
AC ST       XS        XS      XS       XS        XS      XS
PO LD       XS        XS      XS       XS        XS      XS
PO ST       XS        XS      XS       XS        XS      XS

Figure A.6: MOSTs for MCM ARM® Architecture

A.7 NVIDIA® PTX

PPO      PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       —         —       —        —         —       —
AC ST       —         —       —        —         —       —
PO LD       —         XS      —        —         XS      —
PO ST       —         XL      —        —         XN      —

membar   PO LD/DA  PO LD/SA  BC LD  PO ST/DA  PO ST/SA  BC ST
AC LD       —         —       —        —         —       —
AC ST       —         —       —        —         —       —
PO LD       XS        XS      —        XS        XS      —
PO ST       XS        XS      —        XS        XS      —

Figure A.7: MOSTs for MCM NVIDIA® PTX

Appendix B

Gallery of Shim FSMs

The following appendix presents a gallery of MOSTs and shim FSMs used in the evaluation of Chapter 4. The goal of the appendix is to demonstrate 1) the breadth of applicability of ArMOR, and 2) that shims generally require no more than a few states. In the implementation of Section 4.4.2, addresses may not all be resolved by the time instructions reach the issue queue. The analysis in this appendix therefore ignores the distinction between same- and different-address relationships when deriving shim designs, choosing the more conservative option in each case. The following architectures are considered in this appendix:

• The models from Appendix A: SC, SPARC TSO, SPARC PSO, SPARC RMO, Power, and ARM, except that the shims do not distinguish same vs. different address relationships.

• Partial load ordering (PLO), named by analogy to SPARC PSO, and load-store ordering (LSO), a slightly more permissive variant. These provide more variety within the SPARC hierarchy [MHAM11, SPA94b].

• RMO16: like SPARC RMO, but with a finer-grained set of fences.

• PowerA: a multi-copy atomic variant of Power, added as a comparison point and to demonstrate how shims targeting this variant can often be more intelligent than shims targeting Power, as the lack of multi-copy atomicity in the latter introduces unrecoverable overheads (Section 4.6).

Lastly, the caveat from Appendix A about Power and ARM being stronger than the standard definitions remains true. This turns out not to affect the results of this appendix. When Power or ARM is the upstream model, it is conservative to assume a stricter-than-necessary specification of the model, and so this does not pose a problem. When Power or ARM serves as the downstream model, all of the shims already require sync or dmb, respectively, to be inserted before every instruction emitted downstream, so the discrepancy again does not pose a problem. The code which automatically generated these shims (and this appendix) is available online [Lus15]: https://github.com/daniellustig/armor.
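
Before the tables themselves, a brief note on how to read the shim FSMs: each state summarizes which upstream ordering requirements are still pending, and each transition maps an incoming upstream operation to the downstream operation sequence to emit (possibly prefixed by a fence) together with the next state. The Python fragment below is a minimal sketch of this mechanism, not the generated shim code; the dictionary and function names are assumptions of the sketch, and the transitions shown follow the two-state SC-upstream, TSO-downstream shim of Figure B.3.b.

    # Illustrative sketch: a shim FSM as a dictionary mapping
    # (current state, upstream op) -> (downstream ops to emit, next state).
    # The transitions below follow Figure B.3.b (SC upstream, TSO downstream):
    # once a store has been emitted, the next load must be preceded by an
    # mlfence so that the downstream machine still enforces store -> load order.

    SC_TO_TSO = {
        (0, "ld"): (["ld"], 0),
        (0, "st"): (["st"], 1),
        (1, "ld"): (["mlfence", "ld"], 0),
        (1, "st"): (["st"], 1),
    }

    def translate(transitions, upstream_ops, start_state=0):
        """Translate a stream of upstream memory ops into downstream ops."""
        state = start_state
        downstream = []
        for op in upstream_ops:
            emitted, state = transitions[(state, op)]
            downstream.extend(emitted)
        return downstream

    print(translate(SC_TO_TSO, ["st", "ld"]))  # ['st', 'mlfence', 'ld']
    print(translate(SC_TO_TSO, ["ld", "st"]))  # ['ld', 'st']

The node counts in Figure B.0.a below correspond to the number of states in each such (minimized) shim FSM.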

Upstream \ Downstream   TSO  PLO  PSO  LSO  RMO  RMO16  POWERA  POWER  ARM
SC                       2    1    2    1    1     2      2       1     1
TSO                      -    2    2    4    3     5      4       1     1
PLO                      -    -    2    2    2     4      3       1     1
PSO                      -    2    -    2    3     3      2       1     1
LSO                      -    -    -    -    2     2      2       1     1
RMO                      -    -    -    -    -     1      1       1     1
POWERA                   -    -    -    -    1     1      -       1     1
POWER                    -    -    -    -    -     -      -       -     1
ARM                      -    -    -    -    -     -      -       1     -

Figure B.0.a: FSM node count summary.

Figure B.0.b: FSM summary.

B.1 Upstream MCMs

Figure B.1.a: MOSTs for Upstream SC (PPO)
Figure B.1.b: MOSTs for Upstream TSO (PPO, mfence)
Figure B.1.c: MOSTs for Upstream PLO (PPO, mfence)
Figure B.1.d: MOSTs for Upstream PSO (PPO, mfence)
Figure B.1.e: MOSTs for Upstream LSO (PPO, mfence)
Figure B.1.f: MOSTs for Upstream RMO (PPO, mfence)
Figure B.1.g: MOSTs for Upstream POWERA (PPO, lwsync, sync)
Figure B.1.h: MOSTs for Upstream POWER (PPO, lwsync, sync)
Figure B.1.i: MOSTs for Upstream ARM (PPO, dmb)

B.2 Downstream MCMs

Figure B.2.a: MOSTs for Downstream TSO (PPO, msfence, mlfence, mfence)
Figure B.2.b: MOSTs for Downstream PLO (PPO, msfence, mlfence, mfence)
Figure B.2.c: MOSTs for Downstream PSO (PPO, msfence, mlfence, mfence)
Figure B.2.d: MOSTs for Downstream LSO (PPO, msfence, mlfence, mfence)
Figure B.2.e: MOSTs for Downstream RMO (PPO, msfence, mlfence, mfence)
Figure B.2.f: MOSTs for Downstream RMO16 (PPO and the fifteen fence variants fence LL, fence LS, fence SL, fence SS, fence LL LS, fence LL SL, fence LL SS, fence LS SL, fence LS SS, fence SL SS, fence LL LS SL, fence LL LS SS, fence LL SL SS, fence LS SL SS, fence LL LS SL SS)
Figure B.2.g: MOSTs for Downstream POWERA (PPO, lwsync, sync)
Figure B.2.h: MOSTs for Downstream POWER (PPO, lwsync, sync)
Figure B.2.i: MOSTs for Downstream ARM (PPO, dmb)

B.3 SC Upstream, TSO Downstream

Figure B.3.a: PPO of (Upstream SC - Downstream TSO)
Figure B.3.b: FSM Transition Table
Figure B.3.c: FSM

B.4 SC Upstream, PLO Downstream

Figure B.4.a: PPO of (Upstream SC - Downstream PLO)
Figure B.4.b: FSM Transition Table
Figure B.4.c: FSM (Pre-minimization)
Figure B.4.d: FSM (Minimized)

B.5 SC Upstream, PSO Downstream

Figure B.5.a: PPO of (Upstream SC - Downstream PSO)
Figure B.5.b: FSM Transition Table
Figure B.5.c: FSM (Pre-minimization)
Figure B.5.d: FSM (Minimized)

B.6 SC Upstream, LSO Downstream

Figure B.6.a: PPO of (Upstream SC - Downstream LSO)
Figure B.6.b: FSM Transition Table
Figure B.6.c: FSM (Pre-minimization)
Figure B.6.d: FSM (Minimized)

B.7 SC Upstream, RMO Downstream

Figure B.7.a: PPO of (Upstream SC - Downstream RMO)
Figure B.7.b: FSM Transition Table
Figure B.7.c: FSM (Pre-minimization)
Figure B.7.d: FSM (Minimized)

B.8 SC Upstream, RMO16 Downstream

Figure B.8.a: PPO of (Upstream SC - Downstream RMO16)
Figure B.8.b: FSM Transition Table
Figure B.8.c: FSM (Pre-minimization)
Figure B.8.d: FSM (Minimized)

B.9 SC Upstream, POWERA Downstream

Figure B.9.a: PPO of (Upstream SC - Downstream POWERA)
Figure B.9.b: FSM Transition Table
Figure B.9.c: FSM (Pre-minimization)
Figure B.9.d: FSM (Minimized)

B.10 SC Upstream, POWER Downstream

Figure B.10.a: PPO of (Upstream SC - Downstream POWER)
Figure B.10.b: FSM Transition Table
Figure B.10.c: FSM (Pre-minimization)
Figure B.10.d: FSM (Minimized)

B.11 SC Upstream, ARM Downstream

Figure B.11.a: PPO of (Upstream SC - Downstream ARM)
Figure B.11.b: FSM Transition Table
Figure B.11.c: FSM (Pre-minimization)
Figure B.11.d: FSM (Minimized)

B.12 TSO Upstream, PLO Downstream

Figure B.12.a: PPO of (Upstream TSO - Downstream PLO)
Figure B.12.b: FSM Transition Table
Figure B.12.c: FSM

B.13 TSO Upstream, PSO Downstream

Figure B.13.a: PPO of (Upstream TSO - Downstream PSO)
Figure B.13.b: FSM Transition Table
Figure B.13.c: FSM

B.14 TSO Upstream, LSO Downstream

Figure B.14.a: PPO of (Upstream TSO - Downstream LSO)
Figure B.14.b: FSM Transition Table
Figure B.14.c: FSM

B.15 TSO Upstream, RMO Downstream

Figure B.15.a: PPO of (Upstream TSO - Downstream RMO)
Figure B.15.b: FSM Transition Table
Figure B.15.c: FSM (Pre-minimization)
Figure B.15.d: FSM (Minimized)

B.16 TSO Upstream, RMO16 Downstream

Figure B.16.a: PPO of (Upstream TSO - Downstream RMO16)
Figure B.16.b: FSM Transition Table
Figure B.16.c: FSM

B.17 TSO Upstream, POWERA Downstream

Figure B.17.a: PPO of (Upstream TSO - Downstream POWERA)
Figure B.17.b: FSM Transition Table
Figure B.17.c: FSM

B.18 TSO Upstream, POWER Downstream

Figure B.18.a: PPO of (Upstream TSO - Downstream POWER)
Figure B.18.b: FSM Transition Table
Figure B.18.c: FSM (Pre-minimization)
Figure B.18.d: FSM (Minimized)

B.19 TSO Upstream, ARM Downstream

Figure B.19.a: PPO of (Upstream TSO - Downstream ARM)
Figure B.19.b: FSM Transition Table
Figure B.19.c: FSM (Pre-minimization)
Figure B.19.d: FSM (Minimized)

B.20 PLO Upstream, PSO Downstream

Figure B.20.a: PPO of (Upstream PLO - Downstream PSO)
Figure B.20.b: FSM Transition Table
Figure B.20.c: FSM

B.21 PLO Upstream, LSO Downstream

Figure B.21.a: PPO of (Upstream PLO - Downstream LSO)
Figure B.21.b: FSM Transition Table
Figure B.21.c: FSM

B.22 PLO Upstream, RMO Downstream

Figure B.22.a: PPO of (Upstream PLO - Downstream RMO)
Figure B.22.b: FSM Transition Table
Figure B.22.c: FSM (Pre-minimization)
Figure B.22.d: FSM (Minimized)

B.23 PLO Upstream, RMO16 Downstream

Figure B.23.a: PPO of (Upstream PLO - Downstream RMO16)
Figure B.23.b: FSM Transition Table
Figure B.23.c: FSM

B.24 PLO Upstream, POWERA Downstream

Figure B.24.a: PPO of (Upstream PLO - Downstream POWERA)
Figure B.24.b: FSM Transition Table
Figure B.24.c: FSM (Pre-minimization)
Figure B.24.d: FSM (Minimized)

B.25 PLO Upstream, POWER Downstream

Figure B.25.a: PPO of (Upstream PLO - Downstream POWER)
Figure B.25.b: FSM Transition Table
Figure B.25.c: FSM (Pre-minimization)
Figure B.25.d: FSM (Minimized)

B.26 PLO Upstream, ARM Downstream

Figure B.26.a: PPO of (Upstream PLO - Downstream ARM)
Figure B.26.b: FSM Transition Table
Figure B.26.c: FSM (Pre-minimization)
Figure B.26.d: FSM (Minimized)

B.27 PSO Upstream, PLO Downstream

Figure B.27.a: PPO of (Upstream PSO - Downstream PLO)
Figure B.27.b: FSM Transition Table
Figure B.27.c: FSM

B.28 PSO Upstream, LSO Downstream

Figure B.28.a: PPO of (Upstream PSO - Downstream LSO)
Figure B.28.b: FSM Transition Table
Figure B.28.c: FSM

B.29 PSO Upstream, RMO Downstream

Figure B.29.a: PPO of (Upstream PSO - Downstream RMO)
Figure B.29.b: FSM Transition Table
Figure B.29.c: FSM

B.30 PSO Upstream, RMO16 Downstream

Figure B.30.a: PPO of (Upstream PSO - Downstream RMO16)
Figure B.30.b: FSM Transition Table
Figure B.30.c: FSM

B.31 PSO Upstream, POWERA Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       —       —       —       —
    AC ST       —       —       —       —
    PO LD       XS      —       XS      —
    PO ST       —       —       —       —

Figure B.31.a: PPO of (Upstream PSO - Downstream POWERA)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       SSSS SSSS SSSS SS--   ld          lwsync; ld      0
    0       SSSS SSSS SSSS SS--   mfence      sync            1
    0       SSSS SSSS SSSS SS--   st          lwsync; st      1
    1       SSSS SSSS SS-- SS--   ld          ld              0
    1       SSSS SSSS SS-- SS--   mfence      sync            1
    1       SSSS SSSS SS-- SS--   st          st              1

Figure B.31.b: FSM Transition Table

PSO -> POWERA. Figure B.31.c: FSM. [State diagram omitted; states and transitions as listed in Figure B.31.b.]

B.32 PSO Upstream, POWER Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       XS      XS      XS      XS
    AC ST       XS      XS      XS      XS
    PO LD       XS      XS      XS      XS
    PO ST       —       —       XS      XS

Figure B.32.a: PPO of (Upstream PSO - Downstream POWER)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       SSSS SSSS SS** SS--   ld          sync; ld        0
    0       SSSS SSSS SS** SS--   mfence      sync            0
    0       SSSS SSSS SS** SS--   st          sync; st        0

Figure B.32.b: FSM Transition Table

PSO -> POWER (Non-minimized) and PSO -> POWER (Minimized). Figure B.32.c: FSM (pre-minimization); Figure B.32.d: FSM (minimized). [State diagrams omitted.]
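Several of the shims in this appendix are shown both before and after FSM minimization: the machine produced directly by the ArMOR derivation may contain states that are behaviorally indistinguishable, and merging them yields the smaller machine labeled "Minimized". The sketch below shows one standard way to perform that merging, partition refinement over a Mealy-style machine; it is a generic illustration under assumed data structures, not the thesis's implementation.

    # Generic sketch of merging equivalent shim-FSM states by partition
    # refinement. fsm: {state: {input_op: (output_ops, next_state)}}, and every
    # state is assumed to handle every input op. Two states merge if they emit
    # the same outputs for every input and their successors land in the same
    # block. Not the ArMOR tool's code; an illustration only.
    def minimize(fsm):
        inputs = sorted({op for edges in fsm.values() for op in edges})

        # Initial partition: group states by their output signature alone.
        blocks = {}
        for s in fsm:
            key = tuple(tuple(fsm[s][op][0]) for op in inputs)
            blocks.setdefault(key, set()).add(s)
        partition = list(blocks.values())

        def block_index(state):
            return next(i for i, b in enumerate(partition) if state in b)

        # Refine: split any block whose members disagree on successor blocks.
        changed = True
        while changed:
            changed = False
            refined = []
            for block in partition:
                groups = {}
                for s in block:
                    key = tuple(block_index(fsm[s][op][1]) for op in inputs)
                    groups.setdefault(key, set()).add(s)
                refined.extend(groups.values())
                changed |= len(groups) > 1
            partition = refined

        # Rebuild the machine using one representative state per block.
        rep = {s: min(block) for block in partition for s in block}
        reps = [s for s in fsm if rep[s] == s]
        return {s: {op: (out, rep[nxt]) for op, (out, nxt) in fsm[s].items()}
                for s in reps}

Run on a table in the format used above (state -> {op: (output ops, next state)}), the refinement loop collapses indistinguishable states; for example, the minimized PSO -> POWER shim of Figure B.32.b needs only a single state.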

B.33 PSO Upstream, ARM Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       XS      XS      XS      XS
    AC ST       XS      XS      XS      XS
    PO LD       XS      XS      XS      XS
    PO ST       —       —       XS      XS

Figure B.33.a: PPO of (Upstream PSO - Downstream ARM)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       SSSS SSSS SS** SS--   ld          dmb; ld         0
    0       SSSS SSSS SS** SS--   mfence      dmb             0
    0       SSSS SSSS SS** SS--   st          dmb; st         0

Figure B.33.b: FSM Transition Table

PSO -> ARM (Non-minimized) and PSO -> ARM (Minimized). Figure B.33.c: FSM (pre-minimization); Figure B.33.d: FSM (minimized). [State diagrams omitted.]

B.34 LSO Upstream, RMO Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       —       —       —       —
    AC ST       —       —       —       —
    PO LD       —       —       XS      —
    PO ST       —       —       —       —

Figure B.34.a: PPO of (Upstream LSO - Downstream RMO)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       SSSS SSSS SS-- SS--   ld          ld              1
    0       SSSS SSSS SS-- SS--   mfence      mfence          0
    0       SSSS SSSS SS-- SS--   st          st              0
    1       SSSS SSSS SS-S SS--   ld          ld              1
    1       SSSS SSSS SS-S SS--   mfence      mfence          0
    1       SSSS SSSS SS-S SS--   st          msfence; st     0

Figure B.34.b: FSM Transition Table

LSO -> RMO. Figure B.34.c: FSM. [State diagram omitted; states and transitions as listed in Figure B.34.b.]

B.35 LSO Upstream, RMO16 Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       —       —       —       —
    AC ST       —       —       —       —
    PO LD       —       —       XS      —
    PO ST       —       —       —       —

Figure B.35.a: PPO of (Upstream LSO - Downstream RMO16)

    State   MOST                  Input Op.   Output Op(s).        Next State
    0       SSSS SSSS SS-- SS--   ld          ld                   1
    0       SSSS SSSS SS-- SS--   mfence      fence LL LS SL SS    0
    0       SSSS SSSS SS-- SS--   st          st                   0
    1       SSSS SSSS SS-S SS--   ld          ld                   1
    1       SSSS SSSS SS-S SS--   mfence      fence LL LS SL SS    0
    1       SSSS SSSS SS-S SS--   st          fence LS; st         0

Figure B.35.b: FSM Transition Table

LSO -> RMO16. Figure B.35.c: FSM. [State diagram omitted; states and transitions as listed in Figure B.35.b.]

B.36 LSO Upstream, POWERA Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       —       —       —       —
    AC ST       —       —       —       —
    PO LD       —       —       XS      —
    PO ST       —       —       —       —

Figure B.36.a: PPO of (Upstream LSO - Downstream POWERA)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       SSSS SSSS SS-- SS--   ld          ld              1
    0       SSSS SSSS SS-- SS--   mfence      sync            0
    0       SSSS SSSS SS-- SS--   st          st              0
    1       SSSS SSSS SS-S SS--   ld          ld              1
    1       SSSS SSSS SS-S SS--   mfence      sync            0
    1       SSSS SSSS SS-S SS--   st          lwsync; st      0

Figure B.36.b: FSM Transition Table

LSO -> POWERA. Figure B.36.c: FSM. [State diagram omitted; states and transitions as listed in Figure B.36.b.]

B.37 LSO Upstream, POWER Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       XS      XS      XS      XS
    AC ST       XS      XS      XS      XS
    PO LD       —       XS      XS      XS
    PO ST       —       —       XS      XS

Figure B.37.a: PPO of (Upstream LSO - Downstream POWER)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       SSSS SSSS SS-* SS--   ld          sync; ld        0
    0       SSSS SSSS SS-* SS--   mfence      sync            0
    0       SSSS SSSS SS-* SS--   st          sync; st        0

Figure B.37.b: FSM Transition Table

LSO -> POWER (Non-minimized) and LSO -> POWER (Minimized). Figure B.37.c: FSM (pre-minimization); Figure B.37.d: FSM (minimized). [State diagrams omitted.]

B.38 LSO Upstream, ARM Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       XS      XS      XS      XS
    AC ST       XS      XS      XS      XS
    PO LD       —       XS      XS      XS
    PO ST       —       —       XS      XS

Figure B.38.a: PPO of (Upstream LSO - Downstream ARM)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       SSSS SSSS SS-* SS--   ld          dmb; ld         0
    0       SSSS SSSS SS-* SS--   mfence      dmb             0
    0       SSSS SSSS SS-* SS--   st          dmb; st         0

Figure B.38.b: FSM Transition Table

LSO -> ARM (Non-minimized) and LSO -> ARM (Minimized). Figure B.38.c: FSM (pre-minimization); Figure B.38.d: FSM (minimized). [State diagrams omitted.]

B.39 RMO Upstream, RMO16 Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       —       —       —       —
    AC ST       —       —       —       —
    PO LD       —       —       —       —
    PO ST       —       —       —       —

Figure B.39.a: PPO of (Upstream RMO - Downstream RMO16)

    State   MOST                  Input Op.   Output Op(s).        Next State
    0       SSSS SSSS SS-- SS--   ld          ld                   0
    0       SSSS SSSS SS-- SS--   mfence      fence LL LS SL SS    0
    0       SSSS SSSS SS-- SS--   st          st                   0

Figure B.39.b: FSM Transition Table

RMO -> RMO16. Figure B.39.c: FSM. [State diagram omitted; the single state's transitions are listed in Figure B.39.b.]

B.40 RMO Upstream, POWERA Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       —       —       —       —
    AC ST       —       —       —       —
    PO LD       —       —       —       —
    PO ST       —       —       —       —

Figure B.40.a: PPO of (Upstream RMO - Downstream POWERA)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       SSSS SSSS SS-- SS--   ld          ld              0
    0       SSSS SSSS SS-- SS--   mfence      sync            0
    0       SSSS SSSS SS-- SS--   st          st              0

Figure B.40.b: FSM Transition Table

RMO -> POWERA. Figure B.40.c: FSM. [State diagram omitted; the single state's transitions are listed in Figure B.40.b.]

B.41 RMO Upstream, POWER Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       XS      XS      XS      XS
    AC ST       XS      XS      XS      XS
    PO LD       —       —       XS      XS
    PO ST       —       —       XS      XS

Figure B.41.a: PPO of (Upstream RMO - Downstream POWER)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       SSSS SSSS SS-- SS--   ld          sync; ld        0
    0       SSSS SSSS SS-- SS--   mfence      sync            0
    0       SSSS SSSS SS-- SS--   st          sync; st        0

Figure B.41.b: FSM Transition Table

RMO -> POWER. Figure B.41.c: FSM. [State diagram omitted; the single state's transitions are listed in Figure B.41.b.]

B.42 RMO Upstream, ARM Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       XS      XS      XS      XS
    AC ST       XS      XS      XS      XS
    PO LD       —       —       XS      XS
    PO ST       —       —       XS      XS

Figure B.42.a: PPO of (Upstream RMO - Downstream ARM)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       SSSS SSSS SS-- SS--   ld          dmb; ld         0
    0       SSSS SSSS SS-- SS--   mfence      dmb             0
    0       SSSS SSSS SS-- SS--   st          dmb; st         0

Figure B.42.b: FSM Transition Table

RMO -> ARM. Figure B.42.c: FSM. [State diagram omitted; the single state's transitions are listed in Figure B.42.b.]

B.43 POWERA Upstream, RMO Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       —       —       —       —
    AC ST       —       —       —       —
    PO LD       —       —       —       —
    PO ST       —       —       —       —

Figure B.43.a: PPO of (Upstream POWERA - Downstream RMO)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       SSSS SSSS SS-- SS--   ld          ld              0
    0       SSSS SSSS SS-- SS--   lwsync      mfence          0
    0       SSSS SSSS SS-- SS--   sync        mfence          0
    0       SSSS SSSS SS-- SS--   st          st              0

Figure B.43.b: FSM Transition Table

POWERA -> RMO. Figure B.43.c: FSM. [State diagram omitted; the single state's transitions are listed in Figure B.43.b.]

B.44 POWERA Upstream, RMO16 Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       —       —       —       —
    AC ST       —       —       —       —
    PO LD       —       —       —       —
    PO ST       —       —       —       —

Figure B.44.a: PPO of (Upstream POWERA - Downstream RMO16)

    State   MOST                  Input Op.   Output Op(s).        Next State
    0       SSSS SSSS SS-- SS--   ld          ld                   0
    0       SSSS SSSS SS-- SS--   lwsync      fence LL LS SS       0
    0       SSSS SSSS SS-- SS--   sync        fence LL LS SL SS    0
    0       SSSS SSSS SS-- SS--   st          st                   0

Figure B.44.b: FSM Transition Table

POWERA -> RMO16. Figure B.44.c: FSM. [State diagram omitted; the single state's transitions are listed in Figure B.44.b.]

B.45 POWERA Upstream, POWER Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       XS      XS      XS      XS
    AC ST       XS      XS      XS      XS
    PO LD       —       XS      —       XS
    PO ST       —       XS      —       XS

Figure B.45.a: PPO of (Upstream POWERA - Downstream POWER)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       SSSS SSSS SS-- SS--   ld          sync; ld        0
    0       SSSS SSSS SS-- SS--   lwsync      sync            0
    0       SSSS SSSS SS-- SS--   sync        sync            0
    0       SSSS SSSS SS-- SS--   st          sync; st        0

Figure B.45.b: FSM Transition Table

POWERA -> POWER. Figure B.45.c: FSM. [State diagram omitted; the single state's transitions are listed in Figure B.45.b.]

B.46 POWERA Upstream, ARM Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       XS      XS      XS      XS
    AC ST       XS      XS      XS      XS
    PO LD       —       XS      —       XS
    PO ST       —       XS      —       XS

Figure B.46.a: PPO of (Upstream POWERA - Downstream ARM)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       SSSS SSSS SS-- SS--   ld          dmb; ld         0
    0       SSSS SSSS SS-- SS--   lwsync      dmb             0
    0       SSSS SSSS SS-- SS--   sync        dmb             0
    0       SSSS SSSS SS-- SS--   st          dmb; st         0

Figure B.46.b: FSM Transition Table

POWERA -> ARM. Figure B.46.c: FSM. [State diagram omitted; the single state's transitions are listed in Figure B.46.b.]

B.47 POWER Upstream, ARM Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       —       —       —       —
    AC ST       —       —       —       —
    PO LD       —       —       —       —
    PO ST       —       —       —       —

Figure B.47.a: PPO of (Upstream POWER - Downstream ARM)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       ---- ---- ---- ----   ld          ld              0
    0       ---- ---- ---- ----   lwsync      dmb             0
    0       ---- ---- ---- ----   sync        dmb             0
    0       ---- ---- ---- ----   st          st              0

Figure B.47.b: FSM Transition Table

POWER -> ARM. Figure B.47.c: FSM. [State diagram omitted; the single state's transitions are listed in Figure B.47.b.]

B.48 ARM Upstream, POWER Downstream

    PPO Diff.   PO LD   BC LD   PO ST   BC ST
    AC LD       —       —       —       —
    AC ST       —       —       —       —
    PO LD       —       —       —       —
    PO ST       —       —       —       —

Figure B.48.a: PPO of (Upstream ARM - Downstream POWER)

    State   MOST                  Input Op.   Output Op(s).   Next State
    0       ---- ---- ---- ----   dmb         sync            0
    0       ---- ---- ---- ----   ld          ld              0
    0       ---- ---- ---- ----   st          st              0

Figure B.48.b: FSM Transition Table

ARM -> POWER. Figure B.48.c: FSM. [State diagram omitted; the single state's transitions are listed in Figure B.48.b.]
