Improving Programming Support for Hardware Accelerators Through Automata Processing Abstractions

by

Kevin A. Angstadt

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in the University of Michigan 2020

Doctoral Committee:

Professor Westley Weimer, Chair
Assistant Professor Reetuparna Das
Assistant Professor Jean-Baptiste Jeannin
Professor Kevin Skadron, University of Virginia

If you find that you're spending almost all your time on theory, start turning some attention to practical things; it will improve your theories. If you find that you're spending almost all your time on practice, start turning some attention to theoretical things; it will improve your practice. — Donald Knuth

A new gateway into the incredible and the possible, a new day on which anything could happen, if one only wanted it to. — Tove Jansson, Muminvaters wildbewegte Jugend

Kevin A. Angstadt

[email protected]

ORCID iD: 0000-0002-0104-5257

Kevin A. Angstadt 2020

acknowledgments

Research is an inherently collaborative endeavor, and throughout this long journey, I have been supported by some phenomenal individuals. I would like to acknowledge, with brevity unbefitting of their contributions, many of those who helped me reach this moment in my life.

First, I would like to thank my advisor, Westley Weimer, who has patiently guided me through my journey of doctoral studies from start to finish. Wes continually challenged me to step outside my academic comfort zone while helping me learn the skills necessary to be a successful researcher. Wes's knowledge of program analysis techniques played a key role in the success of the work described in this dissertation. I may have needed to take graduate-level Programming Languages two and a half times, but I got there eventually. I am also seriously indebted to Wes's efforts to support my mentorship and teaching interests. I would not be the educator and scholar I am today without Wes. Finally, I must thank Wes for always being a good sport when it comes to terrible puns—even in our most overworked hours testing and implementing quadcopter-based systems.

Next, I would like to thank Kevin Skadron, my former advisor and continued mentor. Kevin is the reason this dissertation exists; he got me hooked during a visit to the University of Virginia when he described a new project involving an

experimental processor. Little did I know that it would be the start of a six-year journey into understanding how to best leverage finite automata to program new kinds of hardware! I am also thankful for Kevin's ability to always consider the "big picture" and to remind me of its importance.

None of this work would be possible without the expertise and experience of my collaborators. Thank you to Jack Wadden, Tommy Tracy II, Matt Casias, Arun Subramaniyan, Xiaowei Wang, Elaheh Sadredini, Reza Rahimi, Vinh Dang, Ted Xie, Nathan Brunelle, Chunkun Bo, Dan Kramp, Reetuparna Das, Stephanie Forrest, Jean-Baptiste Jeannin, Mircea Stan, and Lu Feng for all that they have taught me while we conducted research together. To my computer architecture colleagues, thank you for your patience whenever I would ask ignorant questions. To my software engineering and programming languages colleagues, thank you for your patience whenever I would ask ignorant questions. I am also very much appreciative of my officemates over the years: Jonathan Dorn, Kevin Leach, Yu Huang, Madeline Endres, Colton Holoday, Zohreh Sharafi, Jamie Floyd, Kate Highnam, Hammad Ahmad, Fee Christoph, Yirui Liu, Ryan Krueger, Xinyu Liu, and Martin Kellogg. I truly enjoyed our conversations over the years and their willingness to teach me about their research.

I also wish to acknowledge my teaching colleagues and mentors. While teaching is not an explicit part of the doctoral experience, it was a significant portion of my experience. Teaching is what has kept me motivated to finish. I am indebted to Amir Kamil, Dave Paoletti, Marcus Darden, Mark Sherriff, Luther Tychonievich,

Ed Harcourt, and Patti Frazer Lock (among others) for all that they have done to help my development as an educator over the years.

I would not have been able to complete this degree without the unconditional love and support of my family (Mom, Dad, Mike, Charlotte, Linnea, Shay, my grandparents, and my aunts and uncles). I might not have always been able to communicate clearly to them what my research is about, but they've stuck by me nonetheless. I'm fortunate to be able to celebrate my successes and overcome my setbacks with them by my side. To my family: thank you, and I love you.

Thank you to my friends (Katja, Elaine, Nya, Terry, Liz, Joey, Tristan, Erin, Samyukta, Christabel, and Clara, among others) for tolerating my quirkiness. They have made this six-year journey significantly more fun and tolerable.

Finally, thank you to you, the reader. The fact that you are reading this now means that my efforts were not for naught. If I have forgotten to thank you, I apologize. It has more to do with fatigue than anything else.

table of contents

acknowledgments
list of figures
list of tables
list of listings
list of algorithms
list of acronyms
abstract

1 introduction
1.1 Approach
1.2 Contributions
1.2.1 Adapting Legacy Code for Execution on Hardware Accelerators
1.2.2 High-Level Languages for Automata Processing
1.2.3 Interactive Debugging for High-Level Languages and Accelerators
1.2.4 Architectural Support for Common Applications
1.3 Methodology
1.4 Summary and Organization

2 background
2.1 Finite Automata
2.1.1 Deterministic and Non-Deterministic Finite Automata
2.1.2 Deterministic Pushdown Automata
2.2 Accelerating Automata Processing
2.2.1 Micron's D480 AP
2.2.2 Cache Automaton
2.2.3 Field-Programmable Gate Arrays
2.3 Programming Models
2.3.1 Automata Representations and Regular Expressions
2.3.2 Languages for Streaming Applications
2.3.3 Non-Deterministic Languages
2.3.4 Programming Models for Portability
2.3.5 Languages for Programming FPGAs
2.3.6 State Machine Learning Algorithms
2.3.7 Program Synthesis
2.4 Maintenance Tools
2.4.1 Debugging on Hardware Accelerators
2.4.2 Understanding the Importance of Debugging
2.4.3 Software Verification
2.5 Applications Benefiting from Acceleration
2.5.1 Parsing of XML Files
2.5.2 Architectural Side-Channel Attacks
2.5.3 Runtime Intrusion Detection Systems
2.6 Chapter Summary

3 acceleration of legacy string functions
3.1 Learning State Machines from Legacy Code
3.1.1 L* Primer
3.1.2 AutomataSynth Problem Description
3.1.3 Using Source Code as a MAT
3.1.4 Synthesizing Hardware Descriptions from Automata
3.1.5 System Architecture
3.2 Implementation and Correctness
3.2.1 Bounded Model Checking
3.2.2 Reasoning about Strings
3.2.3 Verification for Termination Queries
3.2.4 Correctness
3.2.5 Implications
3.3 Experimental Methodology
3.3.1 Benchmark Selection
3.3.2 Experimental Setup
3.4 Evaluation
3.4.1 State Machine Learning
3.4.2 Hardware Acceleration
3.5 Discussion
3.5.1 Learning More Expressive Models
3.5.2 Expressive Power and Performance of String Solvers
3.5.3 Scaling Termination Queries
3.5.4 Characterizing and Taming Approximation
3.6 Chapter Summary

4 rapid: a high-level language for portable automata processing
4.1 Automata Processing Stability
4.1.1 Performance Stability
4.1.2 Automata Processing Performance
4.1.3 Discussion
4.2 The RAPID Language
4.2.1 Program Structure
4.2.2 Types and Data in RAPID
4.2.3 Parallel Control Structures
4.3 Code Generation
4.3.1 Converting Expressions
4.3.2 Converting Statements
4.3.3 Converting Counters
4.4 Executing RAPID Programs
4.4.1 Targeting the Automata Processor
4.4.2 Targeting CPUs
4.4.3 Targeting GPUs
4.4.4 Targeting FPGAs
4.5 Evaluation
4.5.1 Expressive Power
4.5.2 Empirical Evaluation
4.6 Chapter Summary

5 interactive debugging for high-level languages and accelerators
5.1 Hardware-Supported Debugging
5.1.1 Example Program
5.1.2 Breakpoints
5.1.3 Hardware Abstractions for Debugging
5.1.4 Accessing the State Vector
5.1.5 Hardware Support for Breakpoints
5.1.6 Debugging of RAPID Programs
5.1.7 Time-Travel Debugging
5.2 FPGA Evaluation
5.2.1 Experimental Methodology
5.2.2 FPGA Results
5.3 Human Study Evaluation
5.3.1 Experimental Methodology
5.3.2 Statistical Analysis
5.3.3 Threats to Validity
5.4 Chapter Summary

6 architectural support for automata-based computation
6.1 Detecting Attacks with Memory Accesses
6.1.1 The Memory Access Pattern Abstraction
6.1.2 Dictionaries of Program Behavior
6.1.2.1 ∆-Windows
6.1.2.2 Truncation
6.1.2.3 Compression
6.1.3 Detecting Anomalous Program Execution
6.2 Compiling Grammars to Pushdown Automata
6.2.1 Context-Free Grammars
6.2.2 Compiling Grammars to DPDAs
6.2.2.1 Parsing Automaton Generation
6.2.2.2 hDPDA Generation
6.2.2.3 Optimization
6.2.3 Compilation Summary
6.3 Martini Architectural Design
6.3.1 From Dictionaries to Automata
6.3.2 Martini Address Monitor
6.3.3 Automata Processing Core
6.3.4 System Integration
6.4 Aspen Architectural Design
6.4.1 Cache Slice Design
6.4.2 Operation
6.4.3 Critical Path
6.4.4 Support for Lexical Analysis
6.4.5 System Integration
6.5 Experimental Methodology
6.5.1 Recording Memory Traces to Evaluate Martini
6.5.2 Building and Testing Dictionaries
6.5.3 Benchmarks
6.6 Architectural Evaluation
6.6.1 System Performance Impact
6.6.2 Martini Parameters
6.6.3 ASPEN Parameters
6.7 Attack Detection Evaluation
6.7.1 Differentiating Programs
6.7.2 Effects of Dictionary Compression
6.7.3 Distinguishing Malicious from Benign Inputs
6.7.4 Detecting Anomalous and Malicious Programs
6.7.5 Martini Evaluation Summary
6.8 DPDA Processing Engine Evaluation
6.8.1 Parsing Generality
6.8.2 XML Parsing Performance
6.8.3 ASPEN Evaluation Summary
6.9 Chapter Summary

7 conclusions
7.1 Summary of Contributions
7.2 A Look to the Future
7.3 Final Remarks

appendix

bibliography

list of figures

Figure 2.1 A behaviorally equivalent NFA and homogeneous NFA
Figure 2.2 A behaviorally equivalent DPDA and hDPDA
Figure 2.3 Overview of AP Architecture
Figure 2.4 Conventional parser performance
Figure 3.1 AutomataSynth system architecture
Figure 4.1 Relative performance of automata processing vs. application-specific algorithms on the CPU
Figure 4.2 Transformations of RAPID expressions into automata
Figure 4.3 Automaton designs for RAPID statements
Figure 4.4 Structure of whenever statement with counters
Figure 4.5 Supported pipelines for executing RAPID programs
Figure 5.1 An example debugging scenario
Figure 5.2 Transformation of a line breakpoint to an input breakpoint
Figure 5.3 A question from the human study including generated debugging information
Figure 6.1 Example of a four-address, fixed-width window
Figure 6.2 Visualization of n-gram representation for two programs
Figure 6.3 Example of address truncation
Figure 6.4 Example CFG and parse tree
Figure 6.5 Two compiler optimizations for reducing the number of stalls incurred by ε-transitions
Figure 6.6 Homogeneous NFA representation of a dictionary
Figure 6.7 High-level architectural design of Martini
Figure 6.8 Specialized Martini automata processing architecture
Figure 6.9 Xeon processor with SRAM arrays repurposed for DPDA processing
Figure 6.10 DPDA processing on ASPEN
Figure 6.11 Comparison of memory traces between pairs of utilities
Figure 6.12 Martini performance on objdump CVE-2018-6263, as a function of threshold for benign and malicious inputs
Figure 6.13 Experimental Martini results for tested anomalies
Figure 6.14 Detection of anomalous, out-of-dictionary execution
Figure 6.15 Performance and energy evaluation of ASPEN
Figure A.1 Tools supplied as part of MNCaRT

list of tables

Table 3.1 Benchmark Suite of Real-World, Legacy String Kernels
Table 3.2 Experimental Results
Table 4.1 Performance stability of OpenCL programs
Table 4.2 Performance stability of Automata Processing optimizations
Table 4.3 Rules for thresholds and outputs on counters
Table 4.4 Description of benchmarks
Table 4.5 Comparison between RAPID and hand-crafted code with respect to lines of code (LOC) and STE usage
Table 4.6 Space utilization on AP and FPGA targets
Table 5.1 ANMLZoo benchmark overview
Table 5.2 FPGA-Based debugging system performance results
Table 5.3 Participant subsets and average accuracies
Table 6.1 Summary of benchmarks used to evaluate Martini
Table 6.3 Runtime overhead of reducing LLC capacity
Table 6.4 Stage delays and operating frequencies in ASPEN
Table 6.5 Description of grammars
Table 6.6 Grammar compilation results
Table 7.1 Major Publications Supporting This Dissertation
Table A.1 Custom Attributes for MNRL Node Types
Table A.2 Modes for Enabling MNRL Nodes

list of source code listings

Listing 3.1 Formulating termination queries as software verification problems
Listing 4.1 A RAPID program for computing Hamming distances
Listing 4.2 Example RAPID program using counters
Listing 4.3 An example usage of an either/orelse statement
Listing 4.4 Execution of a sliding window search over the entire input stream for the string "rapid"
Listing 4.5 Implementing Kleene closures in RAPID
Listing 5.1 An example RAPID program that matches "hello world" anywhere in an input string
Listing A.1 Sample MNRL homogeneous hState node

list of algorithms

Algorithm 3.1 Angluin's L* Learner [10]

list of acronyms

µJ Microjoule
AL Stack Action Lookup
ANML Automata Network Markup Language
ANOVA Analysis of Variance
AP Automata Processor
API Application Programming Interface
ARM Association Rule Mining
ART Aligned Rank Transform
ASIC Application-Specific Integrated Circuit
ASPEN Accelerated in-SRAM Pushdown ENgine
ATR Automata-to-Routing
AUC Area Under Curve
BRAM Block RAM
BSD Berkeley Software Distribution
C-BOX Control Box
CA Cache Automaton
CEGIS Counterexample-Guided Inductive Synthesis
CFG Context-Free Grammar (or Control-Flow Graph)
CPU Central Processing Unit
CVE Common Vulnerabilities and Exposures
DFA Deterministic Finite Automaton
DNA Deoxyribonucleic Acid
DOM Document Object Model
DPDA Deterministic Pushdown Automaton
DRM Disjoint Report Merging
DSL Domain-Specific Language
EPLD Erasable Programmable Logic Device
ER Entity Resolution
FF Flip-Flop
FIS Frequent Itemset
FPGA Field-Programmable Gate Array
GB Gigabyte
GPGPU General-Purpose Graphics Processing Unit
GPU Graphics Processing Unit
GUI Graphical User Interface
HDL Hardware Description Language
hDPDA Homogeneous Deterministic Pushdown Automaton
HLS High-Level Synthesis
HMD Hardware-based Malware Detector
IDS Intrusion Detection System
ILA Integrated Logic Analyzer
IM Input Match
IO Input/Output
IP Intellectual Property
IRB Institutional Review Board
ISA Instruction Set Architecture
JSON JavaScript Object Notation
KB Kilobyte
LLC Last Level Cache
LOC Lines of Code
LUT Look-up Table
Martini Memory Address Representation To INfer Intrusions
MAT Minimally Adequate Teacher
MB Megabyte
MISD Multiple Instruction, Single Data
MNCaRT MNRL Network Computation and Research Testbed
MNRL MNRL Network Representation Language
NFA Non-Deterministic Finite Automaton
nm Nanometer
ns Nanosecond
OS Operating System
PAC Probably Approximately Correct
PAL Programmable Array Logic
PANDA Platform for Architecture-Neutral Dynamic Analysis
PCRE Perl-Compatible Regular Expressions
PDA Pushdown Automaton
pJ Picojoule
PLD Programmable Logic Device
PLY Python Lex-Yacc
ps Picosecond
QEMU Quick EMUlator
RAM Random Access Memory
RF Random Forest
ROC Receiver Operating Characteristic
RTL Register-Transfer Level
SAX Simple API for XML
SM Stack Match
SPM Sequential Pattern Mining
SRAM Static RAM
ST State Transition
STE State Transition Element
SU Stack Update
TDP Thermal Design Power
TOS Top of Stack
TPU Tensor Processing Unit
UML Unified Modeling Language
VHDL Very High Speed Integrated Circuit Hardware Description Language
VIO Virtual Input/Output
VPR Versatile Place and Route
W Watt
XML Extensible Markup Language

abstract

The adoption of hardware accelerators, such as Field-Programmable Gate Arrays, into general-purpose computation pipelines continues to rise, driven by recent trends in data collection and analysis as well as pressure from challenging physical design constraints in hardware. The architectural designs of many of these accelerators stand in stark contrast to the traditional von Neumann model of CPUs. Consequently, existing programming languages, maintenance tools, and techniques are not directly applicable to these devices, meaning that additional architectural knowledge is required for effective programming and configuration. Current programming models and techniques are akin to assembly-level programming on a CPU, thus placing significant burden on developers tasked with using these architectures. Because programming is currently performed at such low levels of abstraction, the software development process is tedious and challenging and hinders the adoption of hardware accelerators. This dissertation explores the thesis that theoretical finite automata provide a suitable abstraction for bridging the gap between high-level programming models and maintenance tools familiar to developers and the low-level hardware representations that enable high-performance execution on hardware accelerators. We adopt a principled hardware/software co-design methodology to develop a

xxiv programming model providing the key properties that we observe are necessary for success, namely performance and scalability, ease of use, expressive power, and legacy support. First, we develop a framework that allows developers to port existing, legacy code to run on hardware accelerators by leveraging automata learning algo- rithms in a novel composition with software verification, string solvers, and high-performance automata architectures. Next, we design a domain-specific programming language to aid programmers writing pattern-searching algorithms and develop compilation algorithms to produce finite automata, which supports efficient execution on a wide variety of processing architectures. Then, we develop an interactive for our new language, which allows developers to accu- rately identify the locations of bugs in software while maintaining support for high-throughput data processing. Finally, we develop two new automata-derived accelerator architectures to support additional applications, including the detec- tion of security attacks and the parsing of recursive and tree-structured data. Using empirical studies, logical reasoning, and statistical analyses, we demonstrate that our prototype artifacts scale to real-world applications, maintain manageable overheads, and support developers’ use of hardware accelerators. Collectively, the research efforts detailed in this dissertation help ease the adoption and use of hardware accelerators for data analysis applications, while supporting high- performance computation.

xxv chapter 1

Introduction

Hardware accelerators are currently experiencing a resurgence in adoption for data processing pipelines. These devices often consist of custom-designed silicon that trades off general computing capability for increased performance on very specific workloads. The confluence of several factors is driving this increased use. In particular, there is a rapid growth of data collection (a fivefold increase over the five-year 2020–2025 period according to one report [177]), and business leaders believe that real-time analysis of this data is critical for their success [62, 67]. On the technical front, Dennard Scaling and Moore's law, which describe scaling trends in semiconductor development, either no longer hold or have significantly decreased impact [194]. Consequently, a reinvigorated study of these devices is vital as their need in industry increases. Accelerators are varied in their design and usage, and types include Field-Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) as well as more esoteric accelerators, such as Google's Tensor Processing Unit (TPU) [117] and Micron's D480 Automata Processor (AP). While present in industry for prototyping and application-specific deployments for quite some time, reconfigurable

architectures, such as FPGAs, are now becoming commonplace in everyday computing as well [187]. In fact, FPGAs are in use in Microsoft datacenters and are also widely available through Amazon's cloud infrastructure [5, 51, 125, 175]. These devices, however, require additional architectural knowledge to effectively program and configure. Current programming models are akin to assembly-level development on traditional CPU architectures, in which developers must specify their application using minute operations that are device-specific. While these hardware solutions provide high throughputs [95, 181], programming them can be challenging. Consequently, programs written for these accelerators are tedious to develop and challenging to write correctly [57]. Additionally, these low-level representations do not lend themselves well to debugging and maintenance tasks, which are key challenges as it is estimated that developers spend roughly 80–90% of their time on these activities [247]. Put in other words, current programming models lack sufficient abstraction from the underlying hardware. Abstraction, as defined by Patterson and Hennessy, refers to hiding low-level details of a system to enable development of complex hardware or software systems to help cope with design complexity [172]. We hypothesize that this lack of abstraction places a high burden on developers and is a key barrier for the adoption of hardware accelerators. Higher levels of abstraction for programming FPGAs have been achieved with languages such as OpenCL [208] and frameworks such as Xilinx's SDAccel [248]; however, these models still require low-level knowledge of the underlying architecture to allow for efficient implementation and execution of applications [223, 268].

1.1 approach

To reap the benefits of the performance of hardware accelerators, while enabling higher levels of abstraction and ease of maintenance, we argue that a successful programming model must satisfy the following criteria:

• performance and scalability. Maintaining the performance gains provided by hardware accelerators is critical and is achieved by minimizing the overhead introduced by high-level programming models and tools.

• ease of use. Tools must aid developers in effectively writing and maintain- ing software for hardware accelerators by providing familiar abstractions and a shallow learning curve.

• expressive power. The underlying computational model (of both the programming model and the hardware) must be sufficiently rich to support the applications that developers wish to accelerate with dedicated hardware.

• legacy support. Programming models must support the adaptation of existing software to execute efficiently on hardware accelerators while placing a minimal burden on developers.

In this dissertation, we adopt a principled hardware/software co-design approach to developing a programming model that meets these requirements [213]. By doing so, we recognize that both the software development process and the hard- ware architectural designs of accelerators are evolving in response to each other.

Consequently, we develop new software tools for supporting hardware accelerators as well as new architectural designs that better support these tools. To do so, we leverage the theoretical findings from automata theory [199] to bridge the gap between these two sides. Automata model computation mathematically as transitions between discrete states following a predefined function. The overarching thesis of this work is:

Finite automata provide a suitable abstraction for bridging the gap between high-level programming models and maintenance tools familiar to developers and the low-level representations that execute efficiently on hardware accelerators.

Our approach in this dissertation leverages several key insights. First, finite automata are a good fit for representing a diversity of applications. Recently, researchers have successfully developed new algorithms using the automata processing abstraction to accelerate analyses across many domains, including: natural language processing [267], network security [184], graph analytics [183], high-energy physics [240], bioinformatics [185, 186, 220], pseudo-random number generation and simulation [232], data-mining [238, 239], and machine learning [221]. Second, finite automata maintain compact state, which admits debugging on accelerators by minimizing and supporting the capture of relevant program state. Third, finite automata can be mapped efficiently to reconfigurable architectures [75, 252], allowing for scalability and performance. Finally, we observe that support for the execution of existing software on hardware accelerators can leverage recent

results from automata theory and software maintenance to construct and execute functionally equivalent automata [10, 60].

1.2 contributions

In this dissertation, we introduce new programming models, software maintenance tools, and architectural designs to improve programming support for current and future hardware accelerators. Our contributions include two programming models, a software debugger for accelerator-based applications, and two new hardware accelerator designs for common applications. We briefly describe each in turn.

1.2.1 Adapting Legacy Code for Execution on Hardware Accelerators

First, we focus on the task of porting existing, legacy source code for execution on FPGAs and other hardware accelerators. As companies and individuals adopt hardware accelerators into their application workflows, they will need to port existing code to these new devices. Ultimately, we wish to reduce the burden on developers tasked with porting legacy code. We develop AutomataSynth, an algorithm for accelerating a particular, relevant class of functions (known as Boolean string kernels) found in extant source code. Our approach uses a novel combination of techniques and approaches from software engineering, machine learning, formal methods, and high-performance automata processing architectures to learn the behavior of a program and construct

a behaviorally equivalent FPGA hardware description. We also formally prove the correctness and termination of AutomataSynth for our target class of functions. For programs that do not meet these criteria, AutomataSynth is able to produce an approximate hardware description.

1.2.2 High-Level Languages for Automata Processing

After establishing the feasibility of porting extant code for execution on hardware accelerators, we next focus on supporting development of new applications. We observe that one common technique used in hardware accelerator application design is to quickly scan the data for “interesting” regions (the definition of interesting varies between applications), and return to these regions to perform a more thorough analysis later, reducing the amount of data being processed by a complex algorithm. The initial scan can often be re-phrased as a pattern-searching problem, in which many searches are conducted against a single stream of data. A pattern defines a sequence of data that should be identified within another collection of data. As such, we present RAPID, a new programming model that supports high-level representation of pattern-searching algorithms while maintaining the performance benefits of hardware accelerators. To provide familiar abstractions, we extend a C- or Java-like language with domain-specific parallel control structures to support common pattern-searching paradigms. We also develop compilation algorithms to lower programs written in RAPID to an automata-based representation. The

language supports execution on many hardware platforms, including FPGAs, GPUs, CPUs, and the Micron D480 Automata Processor, and this is achieved through adapting existing automata-based architectures to work with the RAPID language. Further, RAPID programs use code abstractions similar to functions or procedures that allow for efficient reuse while mapping naturally to pattern-matching problems and the underlying automata computational model.

1.2.3 Interactive Debugging for High-Level Languages and Accelerators

Next, we design an interactive debugger to help developers maintain code written in high-level languages for hardware accelerators. Debuggers are a vital part of a developer's arsenal of maintenance tools [190]. Although debugging support for CPUs is mature and fully featured (e.g., including standard tools [206], successful technology transfer [24] and annual conferences [1]), the throughput of automata processing applications on CPUs is typically orders of magnitude slower than execution on hardware accelerators [166, 231], making CPUs too slow for effective debugging of automata processing. Unfortunately, current debugging techniques are limited or nonexistent for hardware accelerators. For example, debugging of FPGA designs is typically conducted at extremely low levels of abstraction, such as monitoring individual voltages of hardware elements in the device [21, 128, 219, 246]. We develop a high-throughput, interactive debugger for the RAPID programming language. Our approach bridges the semantic gap between low-level hard-

ware signal inspection available on accelerators and the high levels of abstraction in a programming language using finite automata as an intermediate representation. We focus on the implementation of two key debugging operations: setting breakpoints to stop program execution, and the inspection of program variables [190]. In addition to supporting a traditional notion of breakpoints in RAPID programs, we also introduce a novel breakpoint scheme where breakpoints are set on streams of data. We argue that this new form of breakpoint aids debugging of the class of applications commonly represented in RAPID. To reduce latency in our system (i.e., the time taken between executing a statement or expression in the program and being able to view updated variable values), we also develop a combined hardware accelerator and CPU-software simulation design. While we focus our presentation on RAPID, the general techniques we develop for exposing state from low-level accelerators to provide debugging support lay out a general path for providing such capabilities for other accelerators and languages.

1.2.4 Architectural Support for Common Applications

Finally, we develop accelerator architectures to improve the performance of vital applications, such as the parsing of tree-structured and recursively nested data as well as the detection of security policy violations [38, 135]. To enable accelerated parsing of data (e.g., data stored in common text-based serialization formats such as XML and JSON), we develop ASPEN, an Accelerated in-SRAM Pushdown ENgine that implements a more expressive computational abstraction than

previous automata processors. For monitoring of security policies, we develop Martini, a simplified automata processing architecture designed to store a Memory Address Representation To INfer Intrusions. Both leverage recent advances in high-performance automata processing architectures and can be implemented by repurposing a portion of the cache subsystem in a modern processor. By doing so, we leverage existing, suitable hardware resources to minimize latency in data processing pipelines and to gain access to internal CPU state. To support parsing of data, we observe that a computational model more expressive than finite automata—notably deterministic pushdown automata (DPDA)—is necessary [199]. We thus develop a novel, five-stage architecture for execution of DPDA. To support direct adaptation of a large class of legacy parsing applications, we design a compiler that supports existing grammars used to define many common languages and introduce two key optimizations for improving the runtime of parsers on ASPEN. We restrict the expressive power in Martini to minimize hardware resources while providing support for detection of security violations. Martini monitors sequences of abstracted memory accesses to validate system behavior. This approach supports the rapid detection of a variety of low-level anomalous behaviors and attacks not otherwise easily discernible at the software level. In particular, our architecture is capable of detecting attacks that exploit internal CPU state, such as Spectre and Meltdown vulnerabilities in Intel processors, discovered in 2018 and 2019 [130, 142].

1.3 methodology

In this dissertation, we develop programming models and maintenance tools primarily suited for logic-based or spatial-reconfigurable accelerators, such as FPGAs and the Micron D480 AP. Additionally, we develop new logic-based architectures to support these models. In many cases, our models may also be used with more traditional von Neumann architectures, such as CPUs and GPUs. Our evaluation focuses primarily on measuring the extent to which our languages, tools, and architectures satisfy the criteria for successful programming models: performance and scalability, ease of use, expressive power, and legacy support (as described in Section 1.1). As our contributions and criteria are varied, we employ a variety of evaluation approaches, including simulations, empirical studies of real hardware and software, human subjects studies, and formal proofs. In particular, we strive to align our individual methodologies with the metrics of interest. In general, we evaluate our prototypes using real-world applications. Whenever possible, we strive to use existing benchmark suites and previously published implementations, thereby admitting direct comparison with previous results. For cases where no such benchmarks exist, we develop rigorous protocols for selecting benchmark applications, often based on the mining of open-source repositories of source code (e.g., GitHub).

1.4 summary and organization

To summarize, this dissertation makes the following contributions:

1. AutomataSynth, an automata synthesis system for porting legacy source code to execute on hardware accelerators by learning functionally equivalent automata.

2. A high-level programming language, RAPID, for accelerating sequential pattern matching applications on hardware accelerators.

3. A high-throughput, interactive debugging system for RAPID programs for maintenance tasks on FPGAs and the Micron D480 AP.

4. An in-cache accelerator and associated optimizing compilation algorithms for execution of deterministic pushdown automata, such as those used for parsing of serialized data formats.

5. An in-cache accelerator for monitoring memory accesses to detect secu- rity policy violations, including many attacks not easily discernible at the software level.

The remainder of this dissertation is organized in the following manner. In Chapter 2, we provide relevant background material on the formalisms and techniques used in the remainder of this dissertation, including finite automata models, common automata-based accelerator designs, programming models, and maintenance tools. Chapter 3 introduces our algorithm for porting legacy code

to hardware accelerators. In Chapter 4 we develop RAPID, a new programming model for representing pattern-searching algorithms, and then develop a high-throughput, interactive debugger for the language in Chapter 5. Next, we develop new architectural designs to support the execution of deterministic pushdown automata as well as detect security policy violations in Chapter 6. Finally, in Chapter 7 we summarize our results and lay out proposed directions of future exploration of related and emerging research challenges.

chapter 2

Background

Prior to commencing our exploration of improving programming support for hardware accelerators, we introduce key concepts and formalisms used heavily throughout the remainder of the chapters. First, we formally define finite automata, which we will use as an abstraction and intermediate representation of computation (Section 2.1). Next, we describe common architectural approaches for accelerating automata computation (Section 2.2). Then, we introduce several extant programming models related to our efforts (Section 2.3) and describe debugging (a typical software maintenance task) with a particular emphasis on hardware accelerators (Section 2.4). Finally, we conclude our presentation of background material with a discussion of two target application areas (Section 2.5).

2.1 finite automata

We employ several automata-based models of computation to support performant execution of code on hardware accelerators. In this subsection, we describe these

models, introduce notation used in this dissertation, and summarize relevant properties of each model. Readers are encouraged to refer to a theory of computation reference (e.g., Sipser [199]) for a more thorough handling of these computational models.

2.1.1 Deterministic and Non-Deterministic Finite Automata

Deterministic and non-deterministic finite automata (DFAs and NFAs) provide useful models of computation for identifying patterns in a string of symbols. A DFA, formally, is defined as a five-tuple, (Q, Σ, q0, δ, F), where Q is a finite set of states,

Σ is a finite alphabet, q0 ∈ Q is the initial state, δ : Q × Σ → Q is a transition function, and F ⊆ Q is the set of accepting states. The finite alphabet defines the allowable symbols within the input string. The transition function takes, as input, the currently active state and a symbol, and the function returns a new active state. A DFA processes input data through the repeated application of the transition function with each subsequent symbol in the input string. After the application of the transition function, a single state within the DFA becomes active. If an accepting state is active after all input characters have been processed, the DFA accepts the input (i.e., the input matches the pattern encoded by the DFA).
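To make this definition concrete, the following minimal Python sketch (ours, for illustration only; the example machine and names are not from this dissertation) simulates a DFA by repeatedly applying the transition function:

```python
# A DFA as a five-tuple (Q, Sigma, q0, delta, F), following Section 2.1.1.
# This toy machine accepts strings over {'a', 'b'} with an even number of 'a's.
Q = {"even", "odd"}
Sigma = {"a", "b"}
q0 = "even"
delta = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}
F = {"even"}

def dfa_accepts(s: str) -> bool:
    state = q0
    for symbol in s:
        state = delta[(state, symbol)]  # repeated application of delta
    return state in F  # accept iff an accepting state is active at end of input

assert dfa_accepts("abab")      # two 'a's: accept
assert not dfa_accepts("ab")    # one 'a': reject
```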


Figure 2.1: A behaviorally equivalent NFA and homogeneous NFA (both accept exactly aa, aab, and aaca). Note that there is a singleton start state in (a) (i.e., Qstart = {q0}), but there are two start states in (b).

An NFA modifies this five-tuple to be (Q, Σ, Qstart, δ, F), where Qstart ⊆ Q is a set of initial states and δ : 2^Q × Σ → 2^Q is the transition function.¹ Note that non-determinism in terms of finite state machines does not refer to stochastic non-determinism, but rather refers to the transition function, which given a set of active states and a symbol, returns a new set of active states. This allows for multiple transitions to occur for every symbol processed, effectively forming a tree of computation. NFAs have the same representative power as DFAs but have the advantage of being more spatially compact [199].

In this dissertation, we use an alternate form of DFAs and NFAs known as homogeneous DFAs and NFAs. These automata restrict the possible transition rules such that all incoming transitions to a state must occur on the same symbol. Because all transitions to a state occur on the same symbol, we can label states with symbols rather than labeling the transitions. We refer to these combined

¹ NFAs traditionally support ε-transitions between a source and target state without consuming a symbol. These are not present in our definition of an NFA. An ε-transition may be removed by duplicating all incident transitions to the source state on the target state.

states and labels as state transition elements (STEs), following the nomenclature adopted by Dlugosch et al. [75]. An STE accepts the symbols in its label, which we refer to as the character class of the STE. Figure 2.1 depicts an NFA and a behaviorally equivalent homogeneous NFA. Additionally, we relax the definition of machine acceptance. Instead of accepting only if an accepting state is active at the end of input, we report the relative offset in the input stream whenever an accepting state is active. This allows for pattern recognition in streams of data symbols by supporting multiple matches in a sequence of input data.
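As a rough illustration of this streaming acceptance model (a sketch with invented names, not code from this dissertation), a homogeneous NFA can be simulated by tracking a set of active STEs and reporting an offset whenever an accepting STE activates:

```python
# Homogeneous NFA sketch: each STE has a character class (its label),
# outgoing edges, and flags marking start and reporting states.
STES = {
    "q1": {"chars": set("a"),  "out": ["q2"], "start": True,  "report": False},
    "q2": {"chars": set("a"),  "out": ["q3"], "start": False, "report": True},
    "q3": {"chars": set("ab"), "out": [],     "start": False, "report": True},
}

def run(stream: str):
    reports = []
    active = set()
    for offset, symbol in enumerate(stream):
        # Start STEs are candidates on every symbol (streaming matches),
        # plus any STE enabled by a currently active STE.
        enabled = {s for s, ste in STES.items() if ste["start"]}
        for s in active:
            enabled.update(STES[s]["out"])
        # An enabled STE activates only if its character class matches.
        active = {s for s in enabled if symbol in STES[s]["chars"]}
        # Report the stream offset for every active reporting STE.
        reports.extend((s, offset) for s in active if STES[s]["report"])
    return reports

print(run("baab"))  # [('q2', 2), ('q3', 3)]: "aa" ends at 2, "aab" at 3
```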

2.1.2 Deterministic Pushdown Automata

Pushdown automata (PDAs) extend basic finite automata by including a stack memory structure. A PDA is represented by a 6-tuple, (Q, Σ, Γ, δ, S, F), where Γ is the finite alphabet of the stack, which need not be the same as the input symbol alphabet. The transition function, δ, is extended to consider stack operations. The transition function for a PDA considers the current state, the input symbol, and the top of the stack and returns a new state along with a stack operation (one of: push a specified symbol, pop the top of the stack, or no operation). Note that PDAs are, by definition, non-deterministic, meaning that multiple transitions may occur while processing a single input character. Consequently, the state of the stack memory may diverge. That is, the transition function induces a tree of computation in which each branching point creates a duplicate copy of the stack memory.

Figure 2.2: A behaviorally equivalent DPDA (a) and hDPDA (b) for recognizing odd-length palindromes with a given center character. For simplicity, we consider strings formed from Σ = {X, Y} with center character c. Transition rules for the DPDA (a) are written as "a, b/c", where a is the matched input symbol, b is the matched stack symbol, and c is the top of the stack after a push or ε for a pop. Note that ⊥ is a special symbol to represent the bottom of the stack. The hDPDA (b) lists (in order) the input symbol match (ε for no match), stack symbol match (∗ is a wildcard match), number of symbols to pop, and symbol to push.

In Chapter 6, we restrict attention to deterministic pushdown automata (DPDAs), which limit the transition function to only allow a single transition for any valid configuration of the DPDA and an input symbol. This restriction prevents stack divergence, a property we leverage for efficient implementation in hardware. Some transitions perform stack operations without considering the next input symbol, and we refer to these transitions as epsilon- or ε-transitions. To maintain

determinism, all ε-transitions take place before transitions considering the next input symbol. Unlike basic finite automata, where non-deterministic and deterministic machines have the same representative power (any NFA has an equivalent DFA and vice versa), DPDAs are strictly weaker than PDAs [199, Theorems 2.52 and 2.57]. DPDAs, however, are still powerful enough to parse most programming languages and serialization formats (as described in Chapter 6) as well as other common tasks, such as mining for frequent subtrees within a dataset [17]. For hardware efficiency, we extend the definition of homogeneous finite automata to DPDAs. In a homogeneous DPDA (hDPDA), all transitions to a state occur on the same input character, stack comparison, and stack operation. Concretely, the homogeneous property for DPDAs states that for any q, q′, p, p′ ∈ Q, σ, σ′ ∈ Σ, γ, γ′ ∈ Γ, and op, op′ that are operations on the stack, if δ(q, σ, γ) = (p, op) and δ(q′, σ′, γ′) = (p′, op′), then

p = p′ ⇒ σ = σ′ ∧ γ = γ′ ∧ op = op′. (2.1)

This restriction on the transition function does not limit computational power, but may increase the number of states needed to represent a particular computation. It is possible to characterize this increase as follows.

Claim 1. Given any DPDA A = (Q, Σ, Γ, δ, S, F), the number of states in an equivalent hDPDA is bounded by O(|Σ||Q|²).

Proof. We consider the worst case: A is fully connected with |Σ| · |Q| incident edges to each state and each of these incoming edges performs a different set of input/stack matches and stack operations. Therefore, we must duplicate each node |Σ|(|Q| − 1) times to ensure the homogeneity property as defined in Equation (2.1). For any node q ∈ Q, we add |Σ| · |Q| copies of q to the equivalent hDPDA, one node for each of the different input/stack operations on incident edges. Therefore, there are at most |Σ| · |Q| · |Q| = |Σ||Q|² vertices in the equivalent hDPDA.

In practice, DPDAs tend to have a fixed alphabet (e.g., ASCII) and are not fully connected, resulting in less than quadratic growth. Even in the worst case, hDPDAs do not significantly increase the number of states (cf. the exponential NFA to DFA transformation [199, Theorem 1.39]). Figure 2.2 provides an example of equivalent DPDA and hDPDA for odd-length palindromes with a known middle character.
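As a concrete illustration of these definitions (a minimal sketch using our own encoding, not an interface defined in this dissertation), the following Python fragment steps a DPDA through the odd-length palindrome language of Figure 2.2:

```python
# DPDA sketch for odd-length palindromes w c reverse(w), with w over {X, Y},
# mirroring Figure 2.2(a): push symbols before 'c', then match and pop.
BOTTOM = "_"  # stands in for the bottom-of-stack symbol (⊥)

def dpda_accepts(s: str) -> bool:
    state, stack = "push", [BOTTOM]
    for ch in s:
        top = stack[-1]
        if state == "push" and ch in "XY":
            stack.append(ch)          # push phase: record the first half
        elif state == "push" and ch == "c":
            state = "pop"             # center character: switch to matching
        elif state == "pop" and ch == top:
            stack.pop()               # pop phase: input must mirror the stack
        else:
            return False              # deterministic: no other transition exists
    return state == "pop" and stack[-1] == BOTTOM

assert dpda_accepts("XYcYX")
assert not dpda_accepts("XYcXY")
```

Note that exactly one transition applies for any configuration and input symbol, so the stack never diverges—the property that DPDAs guarantee by definition.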

2.2 accelerating automata processing

As improvements in semiconductor technology have slowed while demand for increased throughput for complex algorithms remains, there is a trend in hardware design toward specialized accelerator architectures [176, 195]. For example, the use of GPUs and Field-Programmable Gate Arrays (FPGAs) to accelerate general-purpose computation has become commonplace [187]. A recent body of work studies the acceleration of finite automata (NFA and DFA) processing across multiple architectures. Becchi et al. have developed a set of tools and algorithms for efficient CPU-based automata processing [27]. Several regular-expression-matching and DFA-processing ASIC designs have also been proposed [79, 91, 146, 212]. Some (e.g., [80]) incorporate regular expression matching into an extract-transform-load pipeline, supporting a richer set of applications. In this work, we focus on three architectures that accelerate automata processing applications: the Micron D480 AP, commodity FPGAs, and the Cache Automaton. While we focus on these particular architectures, automata processing engines have been developed for other hardware platforms, including CPUs and GPUs [12, 111]. Readers may refer to Appendix A for more details.

Figure 2.3: Overview of AP architecture. STEs are stored in a memory array, and edges are encoded in a reconfigurable routing matrix. [The figure depicts an 8-bit input symbol driving a row decoder, memory columns STE0 through STEn−1 with activation bits, and a reconfigurable routing matrix.]

The AP is a hierarchical, memory-derived architecture for direct execution of homogeneous non-deterministic finite automata developed by Dlugosch et al. at Micron Technology (a large producer of computer memory) [75]. A homogeneous state (and its transition symbols) is referred to as a State Transition Element (STE). The processing core of the AP consists of a DRAM array and a hierarchical, reconfigurable routing matrix, representing the STEs and edges respectively. The architecture is depicted in Figure 2.3. A single column of the memory array is used to represent an STE. For the NFA given in Figure 2.1, six columns of memory are needed. The transition symbol(s) of an STE are encoded in the rows of the memory array; each row represents a different symbol in the alphabet. At runtime, a decoded input symbol drives a single row in the DRAM, and the architecture simultaneously computes the subset of STEs that match the input. STEs that match and are active (determined by an additional activation bit stored with each column) generate an output signal that passes through the reconfigurable routing matrix to set the activation bits of downstream STEs.

21 The ability to record locations in input data where patterns are matched is supported by connecting accepting STEs to special reporting elements. When an accepting STE activates, the reporting element generates a report, which encodes information about which STE generated the signal and the offset in the data stream where the report was made (as defined in Section 2.1.1). Reports are collected on the AP through a series of buffers and caches before being copied back to the host system (i.e., the AP supports an offload model similar to GPU programming). Because the AP allows for the execution of many NFAs in parallel and because a single NFA may contain multiple reporting STEs, Dlugosch et al. extend the definition of an NFA to contain a labeling function that maps nodes to unique labels. We represent labeled NFAs as (Q, Σ, δ, S, F, id), where id is the labeling function. In Chapter 5, we leverage this mapping information to lift hardware runtime state to the semantics of the user-level program. In addition to STEs, there may be additional special purpose elements. For example, the current generation AP contains saturating counters and combina- tional logic. These elements connect to the STEs via transition edges and allow for aggregation and thresholding of transitions between STEs. While these elements do not necessarily add any expressive power over traditional NFAs, the use of counters and logic often reduces the overall size of automata. This allows the AP architecture to be flexible. Future implementations might contain additional special purpose elements. In Chapter 4, we use these elements to aid in generating efficient automata from a high-level programming language.

22 2.2.2 Cache Automaton

In addition to the DRAM-based AP, a recent research effort by Subramaniyan et al. developed an SRAM-based automata processor, known as the Cache Automaton (CA) [209]. The CA repurposes SRAM-based memory arrays typically found in the Last Level Cache (LLC) [81] of modern CPUs to perform automata computations. The theoretical underpinnings of the design closely mirror those of the AP: STEs are stored in arrays and are connected with a reconfigurable routing matrix (see Figure 2.3). Subramaniyan et al. note two primary advantages of a cache- based design. First, SRAM allows for higher clock frequencies than DRAM (i.e., one iteration of state updating can be made faster in a cache). Second, caches are integrated on a processor die, which typically implies improved overall performance due to improved and optimized logic and interconnect performance. However, the disadvantage of embedding the CA in the cache of a processor is capacity: the CA supports approximately an order of magnitude fewer STEs than the AP in practice. Subramaniyan et al. introduced several novel components to improve perfor- mance and space efficiency and support the execution of automata in cache. These include: (1) a sense-amplifier2 cycling technique to more quickly read all columns of data in an SRAM array, (2) an eight-transistor-based SRAM array-based imple- mentation for a compact and programmable switching architecture for the routing

2 A sense-amplifier is the electrical component in a memory system responsible for detecting the bits being read out of a memory array and converting the voltages to match the rest of the circuit [97].

23 matrix,3 and (3) a hierarchical switch and wire-routing topology leveraging the existing design of LLC cache ways4 to support tens of thousands of states in an NFA. In Chapter 6, we leverage many of the optimizations proposed by Subramaniyan et al. in the development of two automata-derived architectures.

2.2.3 Field-Programmable Gate Arrays

Field-Programmable Gate Arrays (FPGAs) are reconfigurable integrated circuits containing both combinational logic (e.g., logical gates) and memory [172]. These sub-units are typically laid out in a regular, repeated pattern, often referred to as a fabric, of reconfigurable look-up tables (LUTs), flip-flops (FFs), and block RAMs (BRAMs). LUTs can be configured to compute arbitrary logic functions and are connected together with memory using a reconfigurable routing matrix. This architectural model allows FPGAs to dynamically form arbitrary circuits, which can be useful for prototyping logic circuits. FPGAs have been in use for decades and were introduced as an improvement over earlier programmable devices, such as programmable array logic (PAL), pro- grammable logic devices (PLDs), erasable programmable logic devices (EPLDs) [25]. Their use has been varied over the years, including rapid system prototyping,

3 SRAM arrays traditionally use six transistors to store one bit [198]. Eight transistors may be used to increase redundancy and improve signals, and this property enables the reuse as a switching architecture. 4 Last Level Caches are typically hierarchically subdivided into manageable units. The first level of subdivisions are referred to as slices [81]. Slices are further subdivided into ways.

24 communications processors, digital signal processing, industrial control systems, wearables, fully custom computing machines, and dynamically reconfigurable systems [180]. There have also been efforts to combine traditional CPUs with FPGAs, which can take many forms. For example, some FPGAs include processing units as discrete components in the fabric (i.e., hard cores)[228]. More common are soft cores, which implement a traditional microprocessor using combinational logic and memory (e.g., Xilinx MicroBlaze [154], ARM Cortex-M1 [64], and the Oracle OpenSPARC T1 [167]). Finally, there is an effort to manufacture hybrid devices that combine CPUs and FPGAs into a single package [107]. Prior work has investigated implementing finite automata processing on FP- GAs [28, 119, 196, 204, 241, 256]. Because automata can be thought of as circuits— where each state transition element is a specialized logic gate—they can be naturally implemented in an FPGA fabric. Recently Xie et al. combined these recent advances with optimized input/output circuitry to support integration of high-performance automata processing into data processing pipelines [252]. The aptly named Reconfigurable Engine for Automata Processing (REAPR) config- ures an FPGA to operate very similarly to the AP-style processing model (see Section 2.2). LUTs are used in place of columns of memory to determine input symbol matches each clock cycle (logically, one LUT is assigned to each STE). Flip-flops are then used to store the activation bits for STEs, and transition signals are propagated through the FPGA’s reconfigurable routing matrix. Although FPGAs are a natural and successful fit for acceleration of automata processing and have been the subject of significant study [28, 119, 252], the ability

to port software to FPGAs while maintaining performance remains an open research problem [268]. In this dissertation, we leverage the recent advances in automata processing to support porting existing software to FPGAs (Chapter 3) as well as maintaining the performance of design choices across architectures (Chapter 4).

2.3 programming models

Next, we consider the current landscape of relevant programming models. We begin by introducing models most relevant for automata processing and hardware-accelerator-based computation. We also briefly discuss programming models that we use as springboards for porting legacy code to accelerators in Chapter 3.

2.3.1 Automata Representations and Regular Expressions

The most direct programming model for automata processing is to develop DFAs and NFAs. Traditionally, NFAs are represented as directed graphs with states as vertices and the transition function encoded as edges. While capable of specifying search patterns, NFAs are difficult to write and maintain. Text-based NFA formats, such as the XML-based Automata Network Markup Language (ANML) and Becchi's transition table representation [27], are extremely verbose. For example, a toy example for measuring the pairwise difference of characters between an input string and a fixed five-character string requires 62 lines of

ANML to represent [155]. Maintenance tasks on this code are also cumbersome: changing such an automaton to compare against a string of length 12 requires modification of 65% of the code. NFAs can be challenging and tedious to write correctly, especially for developers lacking familiarity with automata theory. In research areas such as program verification, the task of specifying automata is often automated [7]. Regular expressions are another common option for representing a search pattern as matched by an NFA or DFA and are defined recursively by Sipser using the following rules [199]:

1. a for some a in the alphabet Σ (i.e., match a single character),

2. ε (i.e., match the empty string),

3. ∅ (i.e., match nothing),

4. (R1 ∪ R2), where R1 and R2 are regular expressions (i.e., match the union of R1 and R2),

5. (R1R2), where R1 and R2 are regular expressions (i.e., match the concatenation of R1 and R2), and

6. (R1∗), where R1 is a regular expression (i.e., match the Kleene closure, which is zero or more occurrences of R1).
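
As a brief worked illustration (ours, not Sipser's): over the alphabet Σ = {a, b}, rules 1 and 4 yield the expression (a ∪ b); applying rule 6 gives (a ∪ b)∗, which matches every string over Σ; and concatenating with a via rule 5 produces ((a ∪ b)∗a), which matches exactly the strings ending in a.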

We will use this definition in Chapter 4 to demonstrate the expressive power of the RAPID language. Regular expressions suffer from similar maintainability chal- lenges as explicit automata representations. For many of our target applications,

such as motif searches, particle tracking, and rule mining, the regular expression representing the search is non-intuitive and may simply be an exhaustive enumeration of all possible strings that should be matched (much in the same way an overfit machine learning classifier might directly encode a lookup table of the training data [131]). Additionally, programming of regular expressions can be extremely error-prone due to variations in regular expression syntax, which leads to high rates of runtime exceptions [205]. Therefore, both of these programming models fail to meet our required ease-of-use criterion (see Section 1.1).

2.3.2 Languages for Streaming Applications

Streaming applications process a sequence of data received in real time. Common examples include radio receivers and software routers. Automata processing can be viewed as a streaming application because input symbols are processed in real time to update the automaton’s active states. Languages for streaming applications have been studied in great detail. StreamIt [215], an exemplar of this class of languages, provides structures for stream pipelining, splitting and joining, and feedback loops. StreamIt objects may peek and pop from the input stream, store input, and perform computations before outputting a result. Automata processing, however, does not readily admit this computational model. Finite automata have no inherent memory (see Section 2.1.1) and cannot generally peek at the input stream. Many of the operations allowed by StreamIt

are thus not applicable in our domain, and it is not evident how to extend the StreamIt model to describe complex automata or non-deterministic execution. Ultimately, StreamIt targets a different computational abstraction and is not directly applicable.

2.3.3 Non-Deterministic Languages

Non-determinism is a useful formalism for identifying patterns in parallel within a data stream. In a state machine, non-determinism arises when multiple states are active simultaneously, allowing for parallel exploration of input data. Several existing languages contain non-deterministic control structures to facilitate these types of operations. Dijkstra’s Guarded Command Language [74] introduces non-deterministic alternative and repetitive constructs. These constructs are predicated with a Boolean guard that must be true for the encapsulated statements to execute. The alternative construct chooses arbitrarily one command with a satisfied guard and executes it. In the repetitive construct, the program loops, choosing one command with a satisfied guard to execute, until no guards are satisfied. Rather than proposing a concrete syntax, the Guarded Command Language presents guiding formalisms for supporting non-determinism. We develop a programming model that provides similar constructs in Chapter 4, with a particular focus on identifying patterns in streams of data.
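
To make the construct concrete, the following C sketch (ours; Dijkstra proposes no concrete syntax) simulates the repetitive construct for Euclid's algorithm, do a > b → a := a − b [] b > a → b := b − a od, by arbitrarily selecting one command whose guard is satisfied on each iteration:

#include <stdio.h>
#include <stdlib.h>

/* Simulate Dijkstra's repetitive construct for positive a and b:
   each iteration arbitrarily executes one command whose guard holds;
   the loop terminates when no guard is satisfied (here, a == b). */
int gcd_guarded(int a, int b) {
    for (;;) {
        int enabled[2], n = 0;
        if (a > b) enabled[n++] = 0;   /* guard: a > b */
        if (b > a) enabled[n++] = 1;   /* guard: b > a */
        if (n == 0) break;             /* no satisfied guard: exit */
        if (enabled[rand() % n] == 0)  /* non-deterministic choice */
            a = a - b;
        else
            b = b - a;
    }
    return a;                          /* a == b == gcd(a, b) */
}

int main(void) {
    printf("%d\n", gcd_guarded(12, 18)); /* prints 6 */
    return 0;
}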

An additional non-deterministic programming language is Alma-0 [19], a declarative extension of Modula-2. Alma-0 supports the use of Boolean expressions as statements, an ORELSE statement allowing for execution of multiple paths through the program, and a SOME statement that is the non-deterministic dual of a traditional for-loop. In Alma-0, ORELSE and SOME are defined via backtracking.

Execution is single-threaded: when an ORELSE statement or a SOME statement is encountered, the program will choose a single option to execute. If an exploration fails, the program backtracks to the last choice point, restoring all program state, and attempts a different option. These additional constructs provide natural extensions to traditional languages to support parallel processing of data and thus satisfy our stated requirement for ease of use. In Chapter 4, we introduce a programming model that leverages these constructs for abstractly representing automata computation.
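
As a rough illustration (ours; Alma-0 extends Modula-2, so this C rendering is only an analogy), a SOME statement can be viewed as a loop over choice points that commits to the first successful exploration and backtracks otherwise:

#include <stdbool.h>
#include <stdio.h>

/* Toy rendering of Alma-0's SOME statement as backtracking in C:
   each loop iteration is a choice point; a failed exploration
   implicitly restores state before the next alternative is tried. */
static bool body_succeeds(int i) { return i * i == 49; } /* toy goal */

static bool some_statement(int n) {
    for (int i = 1; i <= n; i++) {   /* SOME i := 1 TO n DO ... END */
        if (body_succeeds(i))        /* exploration succeeds: commit */
            return true;
        /* failure: backtrack to the choice point and try the next i */
    }
    return false;                    /* all alternatives exhausted */
}

int main(void) {
    printf("%s\n", some_statement(10) ? "SOME succeeds (i = 7)" : "SOME fails");
    return 0;
}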

2.3.4 Programming Models for Portability

The holy grail of programming for heterogeneous environments is to “write once and run anywhere.” Research into portability dates back decades and has its origins in attempts to support multiple mainframe computer architectures. For example, the Parallel Programming Language (PPL) was a strongly typed language that abstracted away from machine-dependent types to support multi- ple architectural targets [236]. More recently, the focus has been on supporting portability across different coprocessors.

The OpenCL language boasts support for CPUs, GPUs, FPGAs, and other microprocessors [208]. The language provides an abstract notion of computational devices and processing elements, which allow for task- and data-parallel applications to be executed in heterogeneous environments. While the same code can be run on multiple types of hardware, code written for one architecture rarely performs well on another architecture. To execute efficiently on both GPUs and FPGAs, for example, a developer must often write two copies of the application, crafting the code to make use of the particular strengths of each platform. Our goal in this dissertation is to avoid this rewriting step, allowing the developer to write an application using a computational abstraction that performs well across architectures. High-level constructs, such as Map-Reduce, have been demonstrated to provide portability across architectures [102, 258]. We also make use of high-level constructs, but constructs in our language are more specific to sequential pattern identification tasks.
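
For instance, a minimal OpenCL C device kernel (our example, not drawn from this dissertation's artifacts) illustrates the abstraction: each processing element executes one data-parallel work-item, and the same source can be compiled for CPUs, GPUs, or FPGAs, even though efficient execution typically requires device-specific restructuring.

// Device-side OpenCL C kernel: element-wise vector addition.
// Each work-item handles the index returned by get_global_id(0).
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *c) {
    size_t i = get_global_id(0);
    c[i] = a[i] + b[i];
}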

2.3.5 Languages for Programming FPGAs

Adopting hardware accelerators into existing application workflows requires porting code to these new programming models. Unfortunately, porting legacy code remains difficult. The primary programming model for FPGAs remains Hardware Description Languages (HDLs) such as Verilog and VHDL [173, 216]. HDLs have a level of abstraction akin to assembly-level development on traditional

CPU architectures, allowing circuits to be specified as logical formulas and their connections to memory and each other. While these hardware solutions provide high throughputs, these languages and their abstractions are not a part of computer science curricula. For example, the Joint Task Force on Computing Curricula of the Association for Computing Machinery (ACM) and the IEEE Computer Society recommends in its 2013 curriculum that a computer science degree program include only three lecture hours of "digital logic and digital systems" [116]. HDLs are listed as a topic, but none of the learning outcomes specify familiarity with HDLs—let alone competency. Therefore, we should not expect software developers entering the workforce to have the necessary skills to port code to HDLs. High-Level Synthesis (HLS) allows development for FPGAs at a much higher level of abstraction than HDLs [161]. Indeed, HLS has been demonstrated to reduce the time to develop FPGA designs [134]. Most tools support programs written in C-like languages, suggesting that HLS would be amenable for adapting and accelerating legacy code bases. However, the performance of designs constructed using HLS can be unimpressive, requiring significant optimization [223, 268]. HLS tools may also not support all features of the language (e.g., dynamic data structures), meaning that legacy code must be refactored before the approach is applicable. In Chapter 3, we present an alternative to HLS that decouples the existing design and implementation of legacy code from the final design produced for an FPGA. In doing so, we avoid many of the limitations of HLS techniques.

2.3.6 State Machine Learning Algorithms

In this dissertation, we employ state machine learning algorithms as an alternative to HLS. We briefly summarize learning of state machines here, detailing the most relevant instance in Section 3.1.1. These algorithms are a subset of model learning in learning theory and have been the subject of study for several decades [9, 207, 225, 269]. The most common approach is to use active learning, in which the model is learned by performing experiments (tests) on the software or system to be learned. State machine learning has been applied to the domains of internet banking [2], network protocols [72, 82], legacy systems [148, 189], and describing machine learning classifiers [245]. Most efforts have focused on developing suitable algorithms for learning finite automata [10, 41]. More recent advances simplify the internal data structures of the algorithms, reduce the number of tests necessary to learn a model, or combinations thereof [113, 114, 123, 179]. Learning an equivalent state machine from software remains challenging, and most approaches employ some form of approximation [10, 137]. In Chapter 3, we apply this body of model learning research to the problem of adapting legacy source code for efficient execution on hardware accelerators. Our approach attempts to learn a model that is fully equivalent to the original program using software verification techniques but may also produce approximate results in some situations.

2.3.7 Program Synthesis

Program synthesis is a holistic term for automatically generating software from some input description. Recent efforts have focused on different applications of synthesis, such as sketching [6, 200, 201], programming by example [94], and automated program repair [152, 164]. Many of these approaches employ counterexample-guided inductive synthesis (CEGIS) to produce a final solution [201]. CEGIS is an iterative technique that constructs candidate solutions, which are tested (typically via formal methods) for equivalence. If a candidate solution is incorrect, a counterexample, or model of undesirable behavior, is provided and begins the next iteration of synthesis. We note that CEGIS is largely equivalent to the techniques used in the learning theory community for model learning. A related body of research focuses on extracting program behavior from legacy code for acceleration using domain-specific languages (DSLs), an approach referred to as verified lifting. Examples include extracting stencil computations [118, 153], database queries [55], and sparse and dense linear algebra calculations [89]. By targeting DSLs, verified lifting can leverage known properties of the given problem domain to aid extraction and acceleration. For generality in this dissertation, we intentionally limit the domain-specific assumptions leveraged by our approach.
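
The following toy C program (ours) sketches the CEGIS loop described above for a deliberately simple synthesis problem: finding a constant c such that x + c matches a hidden specification. The bounded "verifier" here is an illustrative stand-in for the formal-methods machinery used in real systems.

#include <stdio.h>

/* Toy CEGIS loop: synthesize a constant c such that f(x) = x + c
   matches a hidden specification on a bounded domain. */
static int hidden(int x) { return x + 7; }  /* stand-in specification */

/* Bounded "verifier": search for an input where the candidate and the
   specification disagree; returns 1 if equivalent, else writes *cex. */
static int verify(int c, int *cex) {
    for (int x = -100; x <= 100; x++) {
        if (x + c != hidden(x)) { *cex = x; return 0; }
    }
    return 1;
}

int main(void) {
    int c = 0, cex;                   /* initial candidate */
    while (!verify(c, &cex)) {
        /* inductive synthesis: choose a new candidate consistent
           with the counterexample just produced */
        c = hidden(cex) - cex;
    }
    printf("synthesized: f(x) = x + %d\n", c);
    return 0;
}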

2.4 maintenance tools

Software maintenance tasks are varied and account for a significant proportion of developer effort [174, 191]. In this section, we describe efforts to support and evaluate the common maintenance task of debugging. Further, we introduce the maintenance task of verification, which we leverage in a framework for porting legacy code to hardware accelerators.

2.4.1 Debugging on Hardware Accelerators

In this dissertation, we focus on the task of debugging, including aiding fault localization. Fault localization is an aspect of debugging that attempts to implicate particular statements or expressions as the likely source of undesirable behavior [259]. The development of debugging tools has a lengthy history [96, 136, 188, 262], and software debuggers are commonplace in development toolchains [149]. Ungar et al. argue that immediacy is important for debugging tasks and developed a step-through debugger [224]. There has also been significant effort devoted to improving the efficiency of debugging tools, such as quickly transferring control when a breakpoint is reached [124] and efficiently supporting large numbers of watchpoints [265]. These approaches provide debugging support for general-purpose processors and languages. The technique presented in this work is in the same spirit: we provide immediacy for debugging big data pattern-matching

applications through low-overhead breakpoints on specialized hardware and interactive, step-through program inspection. Previous research has considered debugging for specialized hardware, including support for distributed sensor networks [203] and energy-harvesting systems [61]. Hou et al. developed a debugger for general-purpose GPU programs which leverages automatic dataflow recording to allow users to analyze errors after program execution [104]. Similarly, there are approaches for debugging FPGA applications [8, 92]; however, these techniques typically focus on inspection of the underlying hardware description, rather than programs written in high-level languages. Debugging of high-level synthesis (HLS) designs has focused on monitoring trace registers and using record-replay techniques to expose program state for segments of single-threaded applications [90, 159]. Our work further develops the area of debugging for specialized processors by presenting a technique for inspecting source-level program state during program execution on highly parallel automata processing engines.

2.4.2 Understanding the Importance of Debugging

Human studies have shed light on debugging and the role of automated tools. Weiser found that programmers inspect "program slices" when debugging, which may not be textually contiguous but follow data and control flow [244]. Ko and Myers demonstrated that their debugging tool, Whyline, allowed study participants to perform debugging tasks more quickly [129]. Fry and Weimer found

that localization accuracy is not uniform across various bug types [86]. Romero et al. found that debugging performance is related to balanced use of available information in programming systems that provide multiple representations of state [182]. Our results in Chapter 5 complement these findings by demonstrating that our debugging system improves fault localization accuracy for the domain of pattern-matching automata processing applications.

2.4.3 Software Verification

Software maintenance encompasses more than just debugging, including activities such as validating that a program provides pre-specified functionality and meets pre-defined requirements. Program verifiers and software model checkers prove that a program adheres to a specification or produce counterexamples that violate the specification [35]. These tools typically interleave the control flow graph (CFG) and a specification automaton and explore the resulting graph to determine if any path leads to an error state in the specification. There has been significant research and engineering effort applied to making these techniques scalable and applicable to real applications [34, 60, 151]. Of particular relevance here are bounded or iterative techniques that address recursive control flow [37, 115], which typically unroll loops a fixed number of times before checking if an error state is reachable in the straight-line portion of the CFG. Most closely related to our work has been the use of bounded model checking to

verify string-processing web applications; however, this work often focused on secure information flow rather than constraints over strings [106]. There are also theoretical results on the decidability of straight-line programs on strings, which naturally arise in bounded model checking [140]. These techniques have been employed to verify operating system drivers [24], validate communication protocols [82], and find bugs in concurrent data structures [169], among other applications. In Chapter 3, we combine software verification with model learning techniques to port legacy code to FPGAs.

2.5 applications benefiting from acceleration

The principle of hardware/software co-design suggests that choices made when designing hardware should be influenced by the target applications. In this section, we introduce two application domains, data parsing and computer security, that we use as motivation for proposing new, automata-based accelerator architectures in Chapter 6.

2.5.1 Parsing of XML Files

As data continues to be collected and processed at ever-increasing rates, it is paramount that software be able to efficiently read stored data. Such data is often stored in structured text files, commonly formatted in Extensible Markup Language (XML) or JavaScript Object Notation (JSON) [77, 98].

Parsing, or syntactic analysis, is the process of validating and reconstructing tree (nested) data structures from a sequence of input tokens [4]. In natural language, this process relates to validating that a sequence of words forms a valid sentence structure, and for a programming language, a parser will verify that a statement has the correct form (e.g., a conditional in C contains the correct keywords, expressions, and statements in the correct order). In this dissertation, we focus primarily on the task of parsing XML files, which is common to many applications [135, 145]. Parsing XML produces a special tree data structure called the Document Object Model (DOM) [98]. Parsers are typically implemented as the second stage of a larger pipeline [190]. In the first stage, a lexer, tokenizer, or scanner reads raw data and produces a list of tokens (i.e., a lexer converts a stream of characters into a stream of words), which are passed to the parser. The parser produces a tree from these input tokens, which can be further validated and processed by later pipeline stages. For example, an XML parser will validate that tags are properly nested, but a later stage in the pipeline performs semantic checks, such as verifying that text in opening and closing tags match.
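
As a small illustration (ours): for the input <name>Ada</name>, a lexer might emit the tokens OPEN(name), TEXT(Ada), and CLOSE(name); the parser then checks that the OPEN and CLOSE tokens nest properly and builds a single-element DOM tree, while a later semantic stage confirms that the opening and closing tag names match.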

Conventional software-based parsers exhibit complex input-dependent data and control flow patterns resulting in poor performance when executed on CPUs. Figure 2.4 (b) shows two state-of-the-art open-source XML parsers, Expat [58] and Xerces [18], which can require approximately 6–25 branch instructions to process a single byte of input depending on the markup density of the input XML file (i.e., ratio of syntactic markup to document size). These overheads result from nested

switch-case statements that determine the next parsing state. Furthermore, as the parser alternates between markup processing and processing of variable-length content (e.g., free-form strings), there is little data reuse, leading to high cache miss rates (approximately 100 L1 cache misses per kB for Xerces). As a result of both high branch misprediction and cache miss rates, software parsers take about 12–45 CPU cycles to parse a single input byte, as depicted in Figure 2.4 (a). In Chapter 6, we demonstrate that it is possible to avoid these overheads by applying principles from in-memory automata processing architectures (Section 2.2) to develop an architecture that supports DPDA computation (Section 2.1.2).

Figure 2.4: Conventional parser performance for Expat and Xerces on three XML inputs of increasing markup density (ebay (Low), psd7003 (Med), soap (High)). (a) CPU cycles per byte. (b) Branch instructions per byte.

2.5.2 Architectural Side-Channel Attacks

The security of software and hardware systems is the subject of much study [38]. Side-channel attacks are a particularly insidious form of security vulnerabilities

in a system. Side-channel attacks steal information from a system indirectly by monitoring properties of the system, such as power dissipation, high-frequency sounds, or timing, to infer program state and data [44, 59, 88]. Of particular interest in recent years have been so-called architectural side-channel attacks, which leverage properties of—or design flaws in—commodity computational hardware [242]. Architectural side-channel attacks, such as Spectre [130] and Meltdown [142], use cache timing to leak information in memory. Such attacks can exploit side effects of branch prediction and speculative execution to read or affect arbitrary memory locations. The key problem is that changes to the state of the cache persist even if the CPU discards instructions that are speculatively executed. As a result, a malicious program can execute a controlled sequence of memory accesses and then leverage its knowledge of the cache structure to read arbitrary locations in memory that are cached by other programs. In addition to Spectre and Meltdown, other cache side-channel attacks include: Foreshadow [227], Flush+Reload [257], Evict+Time [168], Prime+Probe [143], and Nailgun [165]. Since this class of attacks relies on hardware vulnerabilities, they are OS-independent and challenging to patch efficiently in software. To defend against Spectre and Meltdown, CPU speculation features can be disabled, but the performance impact is high [150, 237] and doing so does not address other extant hardware vulnerabilities. Further, novel microarchitecture redesigns are non-trivial and costly in terms of time and resources and may expose new—or overlook existing—hardware vulnerabilities.

In Chapter 6, we develop a custom processor unit for accelerating the detection of side-channel attacks, providing additional system security with minimal overhead.

2.5.3 Runtime Intrusion Detection Systems

Broadly, an intrusion detection system (IDS) is tasked with classifying a sequence of inputs as being normal or anomalous according to some detection technique [139]. Anomaly-based approaches have the advantage of being able to detect zero-day attacks (i.e., previously unreleased or undocumented attacks), and they can be customized to particular operating environments. The IDS efforts most relevant to this dissertation use n-grams of system calls to detect misbehaving Unix processes [84, 202]. System calls are special functions that allow a program to interact with the operating system [45]. Forrest et al. found that modeling sequences of these calls could accurately detect software-level attacks. Unfortunately, system calls do not capture the low-level behavior exploited by many hardware side-channel attacks (see Section 2.5.2) and therefore cannot be directly applied to our use case in this dissertation. A hardware-based malware detector (HMD) monitors micro-architectural traces and raises alerts about anomalous behavior (e.g., [122]). HMDs can detect side-channel attacks that leave no system call traces and can potentially be secured against a compromised OS [263]. For example, Demme et al. used performance counters as the data source for an HMD [73], though there are concerns about using

performance counters in this domain [69]. Similarly, Wei et al. proposed a power anomaly detection system for embedded systems which can detect side-channel attacks, including Spectre, with high accuracy [243]. However, this method targets embedded systems that run fixed jobs with consistent behavior. In Chapter 6, we combine the concepts of n-gram-based monitoring with HMDs to detect attacks (including Spectre and Meltdown) with minimal system overhead. We design a microarchitecture that monitors n-grams of memory access sequences, which we model as finite automata.
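
As a simplified sketch of the idea (ours; the microarchitecture in Chapter 6 performs this matching with finite automata in hardware), an n-gram monitor flags a trace as anomalous whenever a window of n consecutive events was never observed during training:

#include <stdbool.h>
#include <stdio.h>

#define N 3  /* n-gram length */

/* Toy database of 3-grams of event IDs observed during training. */
static const int db[][N] = { {1, 2, 3}, {2, 3, 1}, {3, 1, 2} };
static const int db_size = 3;

static bool gram_known(const int *g) {
    for (int i = 0; i < db_size; i++) {
        int j;
        for (j = 0; j < N && db[i][j] == g[j]; j++) ;
        if (j == N) return true;
    }
    return false;
}

/* A trace is normal only if every window of N consecutive events
   appears in the training database; otherwise, raise an alert. */
static bool trace_is_normal(const int *events, int len) {
    for (int i = 0; i + N <= len; i++)
        if (!gram_known(&events[i])) return false;
    return true;
}

int main(void) {
    int trace[] = {1, 2, 3, 1, 2};
    printf("%s\n", trace_is_normal(trace, 5) ? "normal" : "anomalous");
    return 0;
}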

2.6 chapter summary

We introduce several key concepts and describe how they apply to the work presented in this dissertation. We describe several theoretical models from automata theory. Then, we consider several architectural approaches to accelerating their execution. Next, we explore various programming models and describe their relationship with automata-based computation. We introduce concepts from various software maintenance tools, which we both implement and leverage in this dissertation. Finally, we describe two application domains (security and data parsing) that we use as case studies motivating further architectural development. In the next chapter, we introduce a new programming model to help developers port existing code to run on hardware accelerators.

chapter 3

Acceleration of Legacy String Functions

As hardware accelerators begin to be adopted in industry, it will be necessary to port extant software to these new platforms. In Section 2.3.5, we described HLS, the state-of-the-art approach for porting legacy code to FPGAs, which unfortunately lacks support for some language features and typically still requires significant refactoring to produce performant FPGA code. In this chapter, we present AutomataSynth,1 a new approach for executing code (including legacy programs and automata computations) on FPGAs and other hardware accelerators. Unlike HLS, which statically analyzes a program to produce a hardware design, AutomataSynth both dynamically observes and statically analyzes program behavior to synthesize a functionally equivalent hardware design. Our approach is based on several key insights. First, state machines provide an abstraction that has successfully accelerated applications across many domains [183–186, 220, 221, 232, 238, 240, 267] and admit efficient implementations in hardware [75, 146, 252], but typically require rewriting applications. Second,

1 https://github.com/kevinaangstadt/automata-synth

there is an entire body of work on query-based learning of state machines (e.g., see Angluin for a classic survey of computational learning theory [11]), but these algorithms commonly rely on unrealistic oracle assumptions. Third, we observe that the combination of software model checking (e.g., [33, 35]) and recent advances in string decision procedures (e.g., [71, 120, 217, 222]) can be used in place of oracles for certain classes of legacy code kernels, such as those that recognize regular languages. While AutomataSynth is based on a general approach for synthesizing hardware designs from high-level source code, we focus in this chapter specifically on synthesizing Boolean string kernels (functions that return true or false given a string). We accelerate these string kernels using automata processing, which requires representing functions as finite automata [75, 146, 252]. We demonstrate how software model checking, using a novel combination of bounded model checking with incremental loop unrolling augmented with string decision procedures, can answer the oracle queries required by Angluin-style learning algorithms, resulting in a framework to iteratively infer automata corresponding to legacy kernels. We focus our evaluation of AutomataSynth on scalability and legacy support. As such, we develop a benchmark suite of string kernels mined from public repositories on GitHub and measure the correctness of the automata generated by AutomataSynth as well as the time required to synthesize them and the size of the resulting automata. Our evaluation demonstrates that our approach is viable for small functions and exposes new opportunities for improving current-generation tools.

We identify four key challenges associated with using state-of-the-art methods to compile legacy kernels to FPGAs and suggest paths forward for addressing current limitations. In summary, we present the following scientific contributions in this chapter:

• AutomataSynth, a framework for accelerating legacy string kernels by learning equivalent state machines. We extend an Angluin-style learning algorithm to use a combination of iterative bounded software model checking and string decision procedures to answer oracle queries.

• A proof that AutomataSynth terminates and is correct (i.e., relatively complete with respect to the underlying model checker) for kernels that recognize regular languages. The proof leverages the minimality of machines learned by L* and the Pumping Lemma for regular languages.

• An empirical evaluation of AutomataSynth on 18 indicative kernels mined from public GitHub repositories. We learn 13 exactly equivalent models and 2 near approximations.

In the remainder of this chapter, we introduce our formulation of the state machine learning problem in Section 3.1. Then, we detail a composition of formal methods and software verification techniques for solving this problem and formally prove the correctness of our approach in Section 3.2. Next, we describe our experimental methodology in Section 3.3 and present an empirical evaluation of AutomataSynth in Section 3.4, before finally concluding with a discussion of open challenges for learning-based approaches in Section 3.5.

3.1 learning state machines from legacy code

We present AutomataSynth, a framework for learning functional behavior models for off-the-shelf, legacy code implementing regular languages and synthesizing hardware descriptions from those models. Our approach extends Angluin's L* algorithm [10] by (1) using bounded software model checking with incremental unrolling to implement one of its assumptions, (2) using software testing to implement another of its assumptions, and (3) transforming learned models into homogeneous DFAs for hardware synthesis.

3.1.1 L* Primer

Dana Angluin's foundational L* algorithm was popularized in 1987 [10]. Because many of our framework decisions (such as how to implement its required queries and counterexamples in a legacy source code context) and results (such as correctness and termination arguments) depend on the steps and invariants of her algorithm, we sketch it here in some detail. We claim no novelty in this subsection, and readers familiar with L* can proceed to Section 3.1.2. At its core, the L* algorithm relies on a minimally adequate teacher (MAT) to answer two kinds of queries about a held-out language, L. First, the MAT must answer membership queries, yielding a Boolean value indicating if the queried string is a member of L. Second, the MAT must answer conjecture or termination

queries.2 Given a candidate regular language A, typically expressed as a finite

state machine, the MAT responds with true if A = L or responds with a counterexample string for which A and L differ. (Note that automata learning is used in applications where L is not a DFA, and thus this query is typically not resolved by standard DFA equivalence checking.) These queries are used to construct an observation table that can be transformed directly into a DFA. This table may be defined as a 3-tuple, (S, E, T), where S is

a nonempty, finite, prefix-closed3 set of strings over Σ; E is a nonempty, finite, suffix-closed set of strings over Σ; and T is a function mapping ((S ∪ S · Σ) · E) to

{true, false}. (S, E, T) may be visualized as a two-dimensional array where rows

are indexed by a value s ∈ S · Σ, columns are indexed by a value e ∈ E, and entries are equal to T(s · e). For ease of notation, Angluin defines row(s) to be a finite

function, f, mapping values from E to {true, false} defined as f(e) = T(s · e). Informally, row(s) denotes the values in a particular row of the observation table. An observation table must be both closed and consistent before a DFA may

be correctly constructed. A table is closed if for every t ∈ S · Σ, there exists an

s ∈ S such that row(t) = row(s). A table is consistent if, for all s1, s2 ∈ S where

row(s1) = row(s2), row(s1 · a) = row(s2 · a) for all a ∈ Σ. These properties ensure that there is a valid transition out of each state in the DFA (closed) and that transitions on any character remain the same regardless of the characters already

2 These are also called equivalence queries, but we avoid this term to prevent confusion with similar uses of the term in software verification.
3 A set is prefix-closed if ∀s ∈ S, every prefix of s is also a member of S. Suffix-closure is defined similarly.

processed (consistent). Given a closed and consistent observation table, a DFA over the alphabet Σ may be constructed as follows:

Q = {row(s) | s ∈ S},

q0 = row(ε),

F = {row(s) | s ∈ S ∧ T(s) = true},

δ(row(s), a) = row(s · a).

Each unique row in the observation table becomes a state in the candidate automaton, and outgoing transitions from a state are defined by the row indexed by the current row's prefix concatenated with the transition character. Pseudocode for the L* algorithm is provided in Algorithm 3.1. A closed, consistent observation table is constructed using membership queries. Then, the table is transformed into a candidate automaton for a termination query. If the MAT responds with a counterexample, the counterexample and its prefixes are added to the observation table. The process repeats until the MAT responds to a termination query in the affirmative. The final automaton is the learned language.
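
As a small worked example (ours, not Angluin's): let L be the language of strings over Σ = {a, b} that end in a. One closed, consistent observation table with S = {ε, a} and E = {ε} is:

        E = {ε}
  ε     false
  a     true
  b     false
  aa    true
  ab    false

Here row(b) = row(ε) and row(aa) = row(a), so the table is closed (and trivially consistent). The construction above yields a two-state DFA with q0 = row(ε), accepting state row(a), and transitions δ(row(ε), a) = row(a), δ(row(ε), b) = row(ε), δ(row(a), a) = row(a), and δ(row(a), b) = row(ε), which is the minimal DFA for L.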

3.1.2 AutomataSynth Problem Description

In this subsection, we formalize the problem of learning a state machine from a legacy string kernel.

Algorithm 3.1: Angluin's L* Learner [10]

Data: MAT for held-out language, L
Result: A DFA, M, representing the held-out language, L
Initialize S and E to {ε};
Ask membership queries for ε and each a ∈ Σ;
Construct initial observation table (S, E, T);
repeat
    while (S, E, T) is not closed or not consistent do
        if (S, E, T) is not consistent then
            Find s1, s2 ∈ S, a ∈ Σ, e ∈ E such that row(s1) = row(s2) ∧ T(s1 · a · e) ≠ T(s2 · a · e);
            Add a · e to E;
            Extend T to include the new suffix with membership queries;
        end
        if (S, E, T) is not closed then
            Find s1 ∈ S, a ∈ Σ such that row(s1 · a) ≠ row(s) for all s ∈ S;
            Add s1 · a to S;
            Extend T to include the new prefix with membership queries;
        end
    end
    Construct DFA M from (S, E, T);
    Make termination query with M;
    if MAT responds with counterexample t then
        Add t and all prefixes to S;
        Extend T to include the new prefixes using membership queries;
    end
until the MAT responds with true to the termination check on M;
return DFA M

AutomataSynth operates on a function that takes one string argument and returns a Boolean value:

kernel : string -> bool

We assume that the source code for this function is provided and that the function halts and returns a value on all inputs (i.e., kernel is an algorithm). If kernel recognizes a regular language, AutomataSynth returns a state machine, M, with equivalent behavior to kernel. That is, for all s ∈ Σ∗, M(s) = kernel(s). For runs which exceed a resource budget or expose incompleteness in the underlying

theorem prover (including functions that recognize non-regular languages), our prototype implementation alerts the user and provides approximate equivalence, where M(s) = kernel(s) when the length of s is less than an arbitrary fixed length (see Section 3.2). In Section 3.2.4, we present a formal proof that our framework produces an equivalent DFA for input kernels that recognize regular languages. Our empirical evaluation in Section 3.4 demonstrates that real-world legacy string kernels either recognize regular languages, or our tool can produce an approximation of the original function. We discuss the challenges associated with supporting a broader class of functions in Section 3.5.1.

3.1.3 Using Source Code as a MAT

We extend Angluin’s L* algorithm to learn a DFA representation of a legacy string kernel. To succeed, we must construct a MAT that can answer membership and termination queries about an input string kernel. While the L* algorithm provides a framework for query-based DFA learning, the original work does not define any one mechanism for implementing the teacher. Our proposed MAT implementation leverages the source code of the target function. membership queries. We observe that a membership query for a string, s, may be implemented by executing the legacy kernel on s. The result returned by the function is the answer to the query. For languages akin to C employing integers, we interpret Boolean values in the standard way (i.e., 0 is false and all

other values are true). While direct and intuitive in theory, we note that there are several challenges in practice. Following the C standard, many runtime systems assume that strings are null-terminated (i.e., a null character must only occur as the final character in a string). In practice, however, we find that legacy string kernels will sometimes allow null characters in other positions. This often occurs when the length of the input string is known a priori. While seemingly innocuous, this deviation from the standard limits the runtime mechanisms by which the legacy kernel may be invoked. We found that compiling the kernel to a shared object and then invoking the function dynamically provided the best stability in our experiments. termination queries. At the heart of our problem formulation is the challenge that a legacy string kernel does not admit a direct means for answering termination queries. A recent survey of model learning indicates that testing is commonly used to approximate equivalence queries [225]; however, our initial efforts found testing alone to be unsuitable for termination queries in this domain. Our insight is that verification strategies from software model checking may be used to test for equivalence between the kernel and a candidate automaton. Traditionally in verification, equivalence would be proven using bisimulation or interleaving of the automaton and the source kernel. However, this formulation presupposes that the "state transitions" are directly encoded in the source code and can be aligned with the state transitions in the candidate automaton. We do not make this assumption in our problem definition in Section 3.1.2, and we prefer an approach that does not

require manual annotation. Indeed, we do not even assume that the states of the equivalent automaton are visited "in order" during the execution of the legacy kernel. We verify an alternate reachability property that places additional constraints on the input string. In particular, we observe that a counterexample t ∈ Σ+ is in either L(kernel) or L(M) but not in both, and thus will always satisfy the constraint t ∈ L(kernel) ⊕ L(M), where ⊕ is the symmetric difference operator. Therefore, we ask the software verifier to prove that there is no execution of kernel on t where kernel returns true and t ∉ L(M) or kernel returns false and t ∈ L(M). To test this reachability property, we use a novel combination of bounded model checking with incremental loop unrolling augmented with a string constraint solver. We discuss the implementation of this verification task in depth in Section 3.2. Software verifiers are relatively complete (e.g., [22]), meaning that there are certain programs that cannot be fully verified due to limitations in the underlying

SMT solvers. Verifiers often return an answer in three-valued logic: true in our application means that the kernel and candidate automaton were proved equivalent, allowing for termination; false in our application means that the property was not satisfied, and there is a counterexample to provide to the L* algorithm; and unknown in our application means that the verifier was unable to prove equivalence, but also does not provide a counterexample. In the case of an unknown answer from the verifier, we halt our algorithm and warn the user

53 that the resulting automaton is approximate; there may be inputs for which the automaton returns an incorrect answer.
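
To make the membership-query mechanism concrete, the following C sketch (ours) loads the compiled kernel as a shared object and invokes it dynamically, as described above; the object path and exported symbol name are illustrative, and the file must be linked with -ldl:

#include <dlfcn.h>
#include <stdio.h>

typedef int (*kernel_fn)(char *);

/* Answer a membership query by executing the legacy kernel on the
   input string; 0 is interpreted as false and all other values as true. */
int membership_query(char *input) {
    void *handle = dlopen("./kernel.so", RTLD_NOW);
    if (!handle) { fprintf(stderr, "%s\n", dlerror()); return -1; }
    kernel_fn kernel = (kernel_fn)dlsym(handle, "kernel");
    if (!kernel) { dlclose(handle); return -1; }
    int accepted = kernel(input) != 0;
    dlclose(handle);
    return accepted;
}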

3.1.4 Synthesizing Hardware Descriptions from Automata

Once a state machine has been learned using the L* algorithm with our custom MAT, the kernel is amenable to acceleration. There has been a significant effort to accelerate automata using FPGAs [252] and other platforms, including GPUs [231, 261] and custom ASICs such as Micron's AP [75]. We convert the learned automaton to a hardware description and synthesize the design for loading onto an FPGA. Other execution platforms are possible [12], but we focus on FPGAs in this work due to their widespread deployment.

3.1.5 System Architecture

Figure 3.1 depicts the high-level system architecture of our framework. The L* learner (shown to the left) queries a MAT (shown to the right) consisting of the legacy source code, software model checker, SMT solver, and string decision procedure. The legacy string kernel is used by the MAT to answer membership queries. Termination queries are transformed by a mapper into a software verification problem that searches for strings that distinguish the language of a candidate automaton from the target language implemented in the kernel. The output of the Learner is a DFA that encodes the same computation as the Kernel. We use this DFA to synthesize a hardware design for execution on an FPGA.

Figure 3.1: AutomataSynth system architecture. The Minimally Adequate Teacher uses the legacy kernel to answer membership queries. The kernel is combined with a candidate automaton in the mapper to produce a software verification problem. Using bounded software model checking combined with string decision procedures, we search for a counterexample that distinguishes the target language from the language of the candidate automaton. Finally, we synthesize the learned automaton for execution on an FPGA.

3.2 implementation and correctness

In this section, we lay out formal properties of our implementation, first demonstrating that iterative, bounded software model checking conforms to the required properties for MAT termination queries. We then prove correctness and termination for AutomataSynth. Because AutomataSynth operates on arbitrary source code and employs theorem proving techniques, correctness and termination are relative.

3.2.1 Bounded Model Checking

There are several SMT-based model checking algorithms that have been employed to verify properties of software. We use bounded model checking, an algorithm best-suited for the queries currently supported by string constraint solvers. In particular, string solvers do not currently support most interpolation queries (e.g., [66]), which are used heavily by counterexample-guided [60] algorithms such as SLAM and BLAST [23, 35]. Developing interpolation algorithms that support string constraints is beyond the scope of this work. Bounded model checking enumerates all program paths up to a certain bound that reach a target error state (e.g., an error label in the source code) [37]. For each path, the algorithm generates a set of constraints over the program’s variables, and the disjunction of these constraints is passed to an SMT solver to determine if the constraints for at least one path are satisfiable. If so, the set of variable assignments

represents a configuration of the program that would result in an error condition at runtime. In this approach, loops are unrolled a fixed number of times (the "bound"). We follow the standard practice of incremental unrolling (cf. [43]), which iteratively applies bounded model checking for increasing unrolling depths. For programs with unbounded loops, this strategy results in a semi-algorithm; however, we demonstrate in Section 3.2.4 that there exists a finite unrolling that fully verifies a kernel deciding a regular language for our property.
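
For intuition, the following C fragment (ours) shows how a bounded model checker might view a simple string loop unrolled with a bound of two; in a real checker, paths that would require deeper iterations are excluded from the bounded analysis rather than silently truncated:

/* original kernel fragment: count occurrences of 'a' */
int count_a(const char *s) {
    int n = 0;
    while (*s) { if (*s == 'a') n++; s++; }
    return n;
}

/* the same loop unrolled twice, as seen during bounded checking;
   inputs requiring a third iteration fall outside this bound */
int count_a_unrolled2(const char *s) {
    int n = 0;
    if (*s) {
        if (*s == 'a') n++;
        s++;
        if (*s) {
            if (*s == 'a') n++;
            s++;
        }
    }
    return n;
}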

3.2.2 Reasoning about Strings

As described in Section 3.1.3, AutomataSynth must verify that there are no strings in the symmetric difference of the languages of the legacy kernel and a candidate automaton. We encode the language of the candidate automaton as a regular expression constraint on the input string parameter of the kernel. We then solve the encoded problem using a bounded model checker that reasons about strings. A suitable string decision procedure must support (at minimum):

• Unbounded string length,

• Regular expression-based constraints over strings,

• Access to individual characters of strings,

• Comparison of individual characters and strings,

• Reasoning about the length of strings, and

• The ability to generate strings that satisfy a set of constraints.

Additional features supported by string decision procedures can be helpful for representing standard library string functions. Recent decision procedures, such as Z3str3 [31] or S3 [222], support these required properties. To combine bounded model checking with string decision procedures, we extend the CPAChecker extensible program analysis framework [33] to generate and solve string constraints. We modify CPAChecker's predicate analysis algorithm to generate "String" sort constraints for string-like types in C programs (e.g., char and char* types [163]). We produce a character extraction constraint for each indexing (and dereferencing) of string variables. Additionally, we add support for functions such as strlen. The C language specification does not directly support regular expressions. To embed these constraints in a program's source code, we also add an additional function for checking if a string variable conforms to a regular expression.
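
The following fragment (ours) illustrates the kinds of constraints our modified analysis must discharge, using the verifier intrinsics that appear in Listing 3.1; __VERIFIER_assume follows the common software-verification convention, and __VERIFIER_inregex is the added regular expression check (its exact signature here is an assumption):

#include <string.h>

/* Resolved by the model checker, not a C compiler. */
extern void __VERIFIER_assume(int condition);
extern int __VERIFIER_inregex(char *s, const char *regex);

void constraint_examples(char *s) {
    __VERIFIER_assume(strlen(s) > 2);                    /* length reasoning */
    __VERIFIER_assume(s[0] == 'a');                      /* character access */
    __VERIFIER_assume(__VERIFIER_inregex(s, "a(b|c)*")); /* regex membership */
    /* a satisfiable path requires the solver to produce a witness string */
}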

3.2.3 Verification for Termination Queries

Listing 3.1 demonstrates our formulation of termination queries using bounded model checking with incremental loop unrolling and string decision procedures. We construct a wrapper around the source code for the kernel that adds additional assertions to the path constraints used by the software verifier. When the kernel returns true, we add the constraint that the input string cannot be represented by the regular expression representing the candidate automaton. We add a similar

 1 // KERNEL is the legacy kernel function
 2 // REGEX is the language of the candidate automaton
 3 int termination_query(char* input) {
 4
 5     // call the kernel and record result
 6     int ret = KERNEL(input);
 7
 8     if (ret) {
 9         // if kernel accepts, candidate DFA must reject
10         __VERIFIER_assume(
11             !__VERIFIER_inregex(input, REGEX)
12         );
13         goto ERROR;
14     } else {
15         // if kernel rejects, candidate DFA must accept
16         __VERIFIER_assume(
17             __VERIFIER_inregex(input, REGEX)
18         );
19         goto ERROR;
20     }
21
22     // Error state to prove unreachable
23     ERROR:
24     return ret;
25 }

Listing 3.1: Formulating termination queries as software verification problems. We embed regular expression constraints to force the legacy kernel and candidate automaton to disagree on the input string. If the return statement is unreachable, the two representations are equivalent. Otherwise, there is a string counterexample that can be used to continue the L* algorithm.

constraint for the false case as well. Finally, we ask the verifier to prove that the error label (line 23) is unreachable (note that assume constraints influence reachability).

3.2.4 Correctness

In this subsection, we conclude our formal development of termination queries based on the combination of bounded model checking with incremental loop unrolling and string decision procedures. We demonstrate that this approach satisfies the Angluin constraints for MAT termination queries (see Section 3.1.1). In particular, we prove that this algorithm always halts with a counterexample or proof of equivalence between the legacy string kernel and a candidate automaton (assuming the underlying decision procedure is correct). While incremental unrolling is a semi-algorithm in general, we demonstrate that a finite unrolling is sufficient to prove equivalence for pure programs that decide a regular language (i.e., programs that both recognize a regular language and halt on all inputs as described in Section 3.1.2). We assume that the program is pure to avoid non-deterministic behavior and side effects (e.g., non-deterministic behavior resulting from I/O) and that the program decides a regular language to leverage formal results from automata theory. While these assumptions restrict the class of programs to which our formal result applies, we note that AutomataSynth can handle more complex functions, but may produce approximate results. Ultimately, our goal is to prove the following theorem:

Theorem 3.2.1. Let K be a pure program that decides a regular language L(K). There exists a finite unrolling K′ of K such that if M is the candidate DFA learned by L* from K′, then K ≡ M.

We write (≡) to denote equality of accepted languages, i.e., K ≡ M if and only if L(K) = L(M). We will prove this theorem using a sequence of lemmas as well as theoretical results from the L* algorithm. First, we demonstrate that there exists a finite unrolling of a program K that recognizes all strings in L(K) shorter than a given length. The intuition is that the finite unrolling K′ corresponds to the use of bounded model checking.

Lemma 3.2.2. Let p ∈ N and K be a program that recognizes a subset of Σ∗. There exists an n ∈ N such that the n-finitely-unrolled program K′ obtained from K (with all loops replaced with the finite unrolling of the first n iterations and all subsequent iterations removed) satisfies ∀s ∈ Σ∗, |s| < p =⇒ K′(s) = K(s).

Proof. Given p ∈ N, we construct the set of strings S = {s | s ∈ Σ∗ ∧ |s| < p}, on which K′ must agree with K. We let n be the maximum number of iterations performed by K(s) for all s ∈ S. Because S is a finite set, its maximum value is guaranteed to exist and be finite. We construct K′ by unrolling the program K n times. By construction, the property ∀s ∈ Σ∗, |s| < p =⇒ K′(s) = K(s) holds.

We also reason using the standard Pumping Lemma for regular languages. For reference, we recall the Lemma here without proof as defined by Sipser [199, Theorem 1.70].

Lemma 3.2.3 (Pumping Lemma for Regular Languages). If A is a regular language, then there is a number p such that for all s ∈ A, if |s| ≥ p then s may be divided into three pieces, s = xyz, satisfying the following conditions: for each i ≥ 0, xyⁱz ∈ A; |y| > 0; and |xy| ≤ p.

The smallest such p is called the pumping length. We call out as a Lemma the association between pumping lengths and minimality [199, Proof of 1.70]:

Lemma 3.2.4. The (smallest) pumping length of a regular language L is equal to the number of states in the minimal DFA that recognizes L.

Additionally, our proof makes use of two theorems about the output of the L* algorithm. We paraphrase these results here [10]. See Section 3.1.1 for L* definitions, such as (S, E, T).

Theorem 3.2.5 (L* [10], Theorem 1). If (S, E, T) is a closed, consistent L* observation table, then the DFA M constructed from (S, E, T) is consistent with the finite function T. Any other DFA consistent with T but not equivalent to M must have more states.

We will use the following corollary of this result.

Corollary 3.2.5.1. Let p be the pumping length of the target language, L, and M be a DFA constructed from a closed, consistent L* observation table. The pumping length of L(M) does not exceed p.

Finally, we make use of the L* algorithm termination result. The property we use in our proof has been emphasized.

Theorem 3.2.6 (L* [10], Theorem 6). Given any MAT presenting a regular language L, L* eventually terminates and outputs a DFA isomorphic to the minimum DFA accepting L. Additionally, if n is the number of states in the minimum DFA recognizing L and m is an upper bound on the length of any counterexample provided by the MAT, then the total running time of L* is bounded by a polynomial in m and n.

With these properties in hand, we are now ready to prove our original theorem.

Proof (Theorem 3.2.1). Given a pure program K, which decides a regular language, and a candidate DFA M constructed from a closed, consistent L* observation table (S, E, T), let p be the pumping length of L(K). By Lemma 3.2.4, the minimal DFA that recognizes L(K) has p states. By Lemma 3.2.2, there exists a finite unrolling K′ of program K such that ∀s ∈ Σ∗, |s| < p =⇒ K′(s) = K(s). We will show that verifying K′ ≡ M is sufficient to verify K ≡ M using bounded model checking.

Verifying the property that there is no t ∈ Σ∗ such that t ∈ L(K′) ⊕ L(M) (the symmetric difference, i.e., t ∈ L(K′) ∪ L(M) and t ∉ L(K′) ∩ L(M)) with incremental bounded model checking (recall K′ is a finite unrolling) can result in two outcomes:

Case 1: ∃t ∈ Σ∗ such that |t| < p ∧ K′(t) ≠ M(t).

Case 2: ∀t ∈ Σ∗ such that |t| < p, K′(t) = M(t) holds.

In the first case, we return t as a counterexample, concluding K ≢ M. In the second case, we conclude that K′ ≡ M and any counterexample must be at least as long as p; however, no such counterexample exists. The proof proceeds by contradiction.

Suppose, for the sake of contradiction, that ∃t′ ∈ Σ∗ such that |t′| ≥ p and K(t′) ≠ M(t′). Let n be the number of states in the candidate DFA M. We now relate n to the number of states in the minimal DFA recognizing L(K). By Lemma 3.2.4 and Corollary 3.2.5.1, n ≤ p because the pumping length of L(M) is at most p and the number of states in M is equal to the pumping length of L(M). Additionally, because the finite unrolling K′ ≡ M, n ≥ p by Theorem 3.2.5. Therefore, the number of states in M is bounded above and below by the pumping length of our target language, implying that n = p. Using our assumption about t′, we note that K is consistent with T but not equivalent to M, and thus by another application of Theorem 3.2.5 we conclude that the DFA recognizing L(K) must have more than n = p states. This contradicts the fact, from Lemma 3.2.4, that the minimal DFA recognizing L(K) has exactly p states. Therefore, no such t′ exists. Because L* produces a minimal DFA (Theorem 3.2.6), and M was produced from a closed, consistent observation table, we can conclude that M must be a DFA isomorphic to the minimal DFA accepting the language L(K). Thus, K ≡ M. This means that, using bounded model checking on the program K′ (recall that K′ is a finite unrolling and thus admits bounded model checking), we either find a counterexample or can conclude equivalence of K and M. Therefore, K′ ≡ M =⇒ K ≡ M.

From this result, we can establish the following corollary, which allows us to conclude that our approach may be used in a MAT to answer termination queries.

Corollary 3.2.6.1. For a given program K, there exists a finite number of iterations of incremental unrolling needed for our approach to respond to a termination query with either a counterexample or a proof of equivalence.

3.2.5 Implications

In our formulation, termination queries return an answer if the bounded software model checking with incremental loop unrolling and the string decision procedures terminate. Our result is therefore relative to the completeness of the model checker and the underlying SMT theories (see Ball et al. for a discussion of relative completeness in software model checking [22]). For pure kernels that decide a regular language, we proved that there is a finite bound on the incremental unrolling that will determine equivalence of the kernel and a candidate automaton. In practice, we make use of a timeout on the verification process to ensure timely termination at the expense of correctness in some cases. This design decision results in an approximate solution in cases where either the finite unrolling bound has not yet been reached or the legacy kernel recognizes a non-regular language. The approximate solution is correct for strings of length up to a particular bound but may disagree on longer strings. Our empirical evaluation in Section 3.4 demonstrates that AutomataSynth successfully learns an equivalent state machine for thirteen of eighteen real-world string kernels mined from legacy source code.

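To make this workflow concrete, the following sketch shows how a termination query could drive incremental unrolling under a wall-clock budget. It is a simplified illustration rather than the AutomataSynth implementation; the helpers unroll and bounded_check, and the bound p, are hypothetical stand-ins for the model checker and the pumping-length argument developed above.

import time

def termination_query(kernel, candidate_dfa, p, budget_s=24 * 60 * 60):
    """Return (equivalent, counterexample, approximate)."""
    start = time.time()
    for bound in range(1, p + 1):
        if time.time() - start > budget_s:
            # timeout before reaching the unrolling bound: approximate answer
            return True, None, True
        k_prime = unroll(kernel, bound)        # finite unrolling K' (hypothetical helper)
        cex = bounded_check(k_prime, candidate_dfa, bound)  # BMC on strings below the bound
        if cex is not None:
            return False, cex, False           # genuine counterexample: K is not M
    # no counterexample of length below p: equivalence follows
    return True, None, False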
3.3 experimental methodology

In this section, we describe our process for selecting real-world, legacy string kernel benchmarks as well as our experimental setup for the evaluation described in Section 3.4.

3.3.1 Benchmark Selection

In our evaluation, we focus on measuring the extent to which AutomataSynth learns models for real-world string functions using varied library methods. We construct our benchmark suite by mining legacy string kernels from open-source software projects on GitHub using the following protocol. First, we filter all projects for those with C source code and order the resulting repositories by number of stars (i.e., popular repositories first). Next, we use the Cil [163] framework to iteratively parse each source file and extract all functions with an appropriate type signature (see Section 3.1.2). We filter these functions to exclude those that reference functions or data outside the compilation unit, but we allow the use of common library functions (e.g., strlen, strcmp, etc.). In total, we considered 26 repositories and mined 973 separate string kernel functions using this protocol. After filtering for duplicates and a manual analysis to identify functions that return Boolean values (we note that while C has the _Bool data type, many functions still use integers of varying widths), we collected 18 meaningfully distinct real-world benchmarks. Table 3.1 provides an overview of these string kernels.

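As a rough illustration of the signature filter in this protocol, the fragment below approximates the first mining step with a regular expression. The real pipeline parses the code properly with Cil; this stand-alone sketch, with the hypothetical helper candidate_kernels, only conveys the shape of the filter.

import re

# A Boolean-like return type followed by a function taking a C string; a crude
# approximation of the type-signature filter (the actual tool parses with Cil).
CANDIDATE = re.compile(
    r"\b(?:_Bool|int|char|short|long)\s+(\w+)\s*\(\s*(?:const\s+)?char\s*\*")

def candidate_kernels(source_text):
    """Return the names of functions that look like string kernels."""
    return CANDIDATE.findall(source_text)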
Table 3.1: Benchmark Suite of Real-World, Legacy String Kernels

function                   project                                             loc   support
git_offset_1st_component   Git: Revision control system                          6   ✓
is_encoding_utf8           Git: Revision control system                         38   ∗
checkerrormsg              jq: Command-line JSON processor                       4   ✓
checkfail                  jq: Command-line JSON processor                      14   ✓
skipline                   jq: Command-line JSON processor                      17   ✓
end_line                   Linux: OS kernel                                     11   ✓
start_line                 Linux: OS kernel                                     11   ✓
is_mcounted_section_name   Linux: OS kernel                                     54   ✓
is_numeric_index           MASSCAN: IP port scanner                             17   ✓
is_comment                 MASSCAN: IP port scanner                             11   ✓
AMF_DecodeBoolean          OBS Studio: Live streaming and recording software     2   ✓
cf_is_comment              OBS Studio: Live streaming and recording software    28   ✓
cf_is_splice               OBS Studio: Live streaming and recording software    22   ✓
is_reserved_name           OBS Studio: Live streaming and recording software    39   ✓
has_start_code             OBS Studio: Live streaming and recording software    18   ✓
number_is_valid            openpilot: Open-source driving agent                 72   †
utf8_validate              openpilot: Open-source driving agent                 72   ‡
stbtt__isfont              openpilot: Open-source driving agent                 24   ✓

∗Requires strcasecmp support   †Requires strtod support   ‡Performs math on characters

We use the function name to refer to each benchmark and also indicate the source project for each. Lines of code (LOC) provides a count of the total number of non-comment lines in the post-processed version of the benchmark. Finally, we also indicate whether the kernel is supported by our prototype system. Our prototype implementation supports all but three of these legacy string kernels. The unsupported kernels use computation that is difficult to capture with present string decision procedures.

The kernels in our benchmark suite interact with strings in various manners. Some kernels, such as is_numeric_index, skipline, and cf_is_comment, loop over all characters in the string checking various constraints. Several also make heavy use of strcmp to check for the presence of specific strings (e.g., checkerrormsg, is_mcounted_section_name, and start_line). We also found examples of kernels (e.g., git_offset_1st_component and AMF_DecodeBoolean) that perform single character comparisons. While a developer will likely not be interested in accelerating a single character comparison, these kernels remain indicative of real-world code and allow us to demonstrate a proof-of-concept for synthesizing designs for accelerators such as FPGAs. An evaluation of benchmarks more typical of kernels accelerated by FPGAs (e.g., long-running kernels with hundreds or thousands of states) is left for future work.

3.3.2 Experimental Setup

Our AutomataSynth implementation produces MNRL, an open-source state machine representation language intended for large-scale automata processing applications [12]. We transform the learned DFA to be homogeneous, a property that admits a simplified transition rule while maintaining expressive power and that is amenable to hardware acceleration [13, 48, 75, 231]. We use Brzozowski’s algorithm [46] for converting candidate DFAs to regular expressions as part of the software verification step (see Section 3.2).

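The homogeneity transformation admits a brief sketch. The code below is a minimal illustration under an assumed complete transition function delta, not our actual implementation: every state is split into (state, symbol) pairs so that all transitions entering a node agree on the symbol it matches, the property required by STE-based designs.

def homogenize(states, delta, q0, accepting, alphabet):
    """Split DFA states into (target, symbol) nodes with uniform incoming labels."""
    nodes = {(delta[(q, a)], a) for q in states for a in alphabet}
    edges = {(q, a): {(delta[(q, b)], b) for b in alphabet} for (q, a) in nodes}
    start_enabled = {(delta[(q0, b)], b) for b in alphabet}   # entered from the start state
    reporting = {(q, a) for (q, a) in nodes if q in accepting}
    return nodes, edges, start_enabled, reporting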
Table 3.2: Experimental Results

benchmark                  membership   termination   number      total         correct
                           queries      queries       of states   runtime (s)
git_offset_1st_component        4,090             2           2             7   ✓
checkerrormsg                  32,664             2          15        86,195   ∗
checkfail                     189,013             3          35        86,308   ∗
skipline                        7,663             3           3           294   ✓
end_line                      510,623             4          44        29,531   ✓
start_line                    206,613             2          46         4,813   Approx.
is_mcounted_section_name      672,041             7          57        86,399   Approx.
is_numeric_index               10,727             3           4           297   ✓
is_comment                      4,090             2           2            14   ✓
AMF_DecodeBoolean               2,557             2           2             4   ✓
cf_is_comment                   4,599             2           4           300   ✓
cf_is_splice                    1,913             2           4             3   ✓
is_reserved_name              350,705             8          42        85,469   ✓
has_start_code                 10,213             2           7             5   ✓
stbtt__isfont                  79,598             5          19            13   ✓

∗AutomataSynth warned of a potential approximate solution due to timeout, but manual analysis confirmed correctness

For termination queries, we add string constraint handling to CPAChecker 1.8 [33]. We also extend the JavaSMT framework [120] to support the draft SMT-LIB strings theory interface [217]. We use Microsoft’s Z3 version 4.8.6 SMT solver [71] with the Seq string solver [222] for all queries. All evaluations use an Ubuntu 16.04 Linux server with a 3.0 GHz Intel Xeon E5-2623-v3 with four physical cores and 16 GB of RAM and a maximum time budget of 24 hours.

3.4 evaluation

In this section, we evaluate AutomataSynth on fifteen real-world, legacy string kernels mined from open-source projects. We first evaluate the correctness of the state machines generated by AutomataSynth and report runtime and query counts. Second, we evaluate the suitability of the generated automata for hardware acceleration. Our evaluation focuses on metrics related to legacy support and performance. At a high level, we are guided by the following research questions:

1. How many of the real-world string kernels can AutomataSynth correctly learn? With approximation?

2. Does AutomataSynth learn automata that fit within the design constraints of modern, automata-derived, reconfigurable architectures?

3.4.1 State Machine Learning

Table 3.2 presents results from our empirical evaluation of AutomataSynth on a benchmark suite of fifteen legacy string kernels. We do not report results for the three benchmarks that are not supported. We report the number of membership and termination queries executed for each kernel as well as the number of states in the learned automaton and the total runtime in seconds. The final column indicates if AutomataSynth correctly learned the kernel’s functionality. A check

mark means that our tool learned a fully equivalent automaton. We also indicate approximate results, in which the maximum time limit was exceeded. On average, it took seven hours to learn an automaton from a legacy string kernel, with more than half of the benchmarks terminating in fewer than five minutes. AutomataSynth correctly learned thirteen of the fifteen benchmarks. The remaining two benchmarks yield approximate solutions that are extremely similar to the target kernel functionality. In our evaluation, this approximation was always the result of timeouts rather than the relative completeness of the SMT solver used for termination queries. There were no instances in our benchmark set for which the SMT solver returned an unknown result due to a limitation in the string decision procedures. We determined that there were two primary causes for AutomataSynth reaching the timeout without learning a fully equivalent state machine. First, Brzozowski’s algorithm for constructing regular expressions can produce large expressions that require simplification to remove redundant and superfluous clauses. This was most relevant to kernels that compared string suffixes with a string constant. We believe this performance limitation is an artifact of design choices in our prototype, which could be solved with more careful construction of regular expressions. Second, some SMT queries were significantly less performant than others. We discuss this challenge in more detail in Section 3.5. The relative utility of the membership and termination queries varies between the benchmarks. For example, the function git_offset_1st_component checks a string to see if the first character is a forward slash (/). Using membership queries,

AutomataSynth learned that the first character of the string must be a slash and that any number of characters may follow. The termination query provided a single counterexample of a longer string that was initially misclassified. For this kernel, the membership queries provided much of the “learning.” This is in contrast to the stbtt__isfont kernel, which ultimately compares an input string against four hard-coded strings. In this case, the membership queries provided only minimal information. Instead, the termination queries discovered the string constants in the kernel’s source code and provided much of the learned information. In general, membership queries tended to provide more information when each character in the input string was considered separately, while termination queries helped to discover string constants used for comparison by the kernels.

AutomataSynth successfully learned automata for fifteen of the eighteen legacy kernels mined from open-source projects. Of these, thirteen were exactly equivalent and two were near approximations.

3.4.2 Hardware Acceleration

In this work, we claim no novelty for accelerating automata using hardware accelerators, such as FPGAs. Instead, we leverage existing work in the area of high-performance automata processing. Xie et al.’s REAPR framework supports high-throughput processing of data with finite automata on FPGAs [252]. For spatially reconfigurable architectures akin to FPGAs, the dominant factor

affecting performance is the number of hardware resources used by a design. For ANMLZoo benchmarks, which contain tens of thousands of states [231], REAPR successfully synthesized designs running in the range of 200–700 MHz. Because the automata learned by AutomataSynth are significantly smaller, we expect that similar throughputs could be achieved.

The finite automata learned by AutomataSynth fall within the design constraints of FPGA-based automata accelerators, allowing for high-throughput execution.

3.5 discussion

At a high level, AutomataSynth learns the behavior of a Boolean string kernel through a combination of dynamic and static analyses and emits a functionally equivalent state machine that is amenable to acceleration with FPGAs. We believe that approaches such as AutomataSynth are very promising and could offer solutions to limitations inherent in current HLS techniques. HLS relies heavily on the structure of C-like source code, which was designed for performance on (and as an abstraction of) von Neumann architectures, to produce a hardware description. As such, HLS is unlikely to produce performant FPGA designs from legacy code that was heavily optimized for CPUs [223, 268]. This places a heavy burden on developers tasked with porting code and represents a significant barrier to adoption. Our approach decouples the implementation choices of the

legacy program from the emitted hardware design. This allows us to produce a design using a model of computation—state machines—that is performant on FPGAs [231, 252]. This chapter represents an initial effort to understand the benefits and limitations of using state machine learning algorithms to compile code for FPGAs. A significant research effort remains before approaches akin to AutomataSynth are mature enough for industry adoption. In the remainder of this section, we identify four key research challenges whose solutions would lead to significant advances in learning-based synthesis for FPGAs. Additionally, we describe candidate future directions to tackle each of these.

3.5.1 Learning More Expressive Models

We present an approach for accelerating regular language Boolean string kernels with FPGAs. Our prototype soundly transforms such kernels to functionally equivalent hardware descriptions; Boolean functions with inputs that may be transformed into a serial data stream are also applicable. However, legacy code contains many other types of functions, and these remain an open challenge. Supporting a new function type presents a two-fold challenge: (1) identifying suitable computational models for acceleration and (2) designing or adapting an algorithm suitable for learning these models. Finite automata, as formally defined, produce a single bit of output for each string processed and are limited to recognizing regular languages. Additional models, such as Mealy and Moore

machines, support transforming an input value into an output value, while others, such as pushdown automata, support more expressive classes of languages. Several efforts are underway to extend learning algorithms to more expressive computational models [41, 50, 157]. It may also be possible to leverage insights from the architecture community and recent efforts to accelerate automata processing, in which designs often support tagging output report signals with additional metadata [75, 233]. Further, existing DFA learning algorithms may admit learning functions that output an enumerated—or even a multi-bit—value. An additional challenge is that determining program equivalence is, in the limit, undecidable. For example, Angluin notes that termination queries are not generally decidable for context-free languages [10]. However, existing software verifiers suffer from this same challenge and provide relative completeness [22]. Further, this challenge may be addressed in some cases through careful use of approximation.

3.5.2 Expressive Power and Performance of String Solvers

Our empirical evaluation of AutomataSynth demonstrated some limitations of present string decision procedures. Certain string operations (e.g., case-insensitive lexicographic comparisons and casting between characters and numbers to perform arithmetic operations) occur in real-world software but are difficult to represent as constraints in string theories. Additionally, SMT queries generated by bounded model checking algorithms can result in long-running computation.

These challenges are not new: the formal methods community has been laboring to improve string decision procedures for over a decade. Early efforts often focused on the problem domain of identifying cross-site scripting and SQL code injection vulnerabilities (e.g., [103]) and introduced new constraint types. These efforts often reasoned about fixed-size string variables (e.g., [127]). Subsequent efforts, such as Z3str3, also focus on improving the performance of these decision procedures and have extended support to unbounded strings [31]. AutomataSynth is one of the first efforts to combine bounded software model checking with string decision procedures. This combination presents a novel and compelling use case for string solvers that requires new constraint types and optimizations. We make our tool and all of the SMT queries automatically generated by our process available4 to the community to encourage renewed interest in—and efforts to improve—the performance of string solvers.

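To give a flavor of the queries involved, the toy constraint below uses Z3’s Python bindings. It is an illustrative example of a string-theory query, not one generated by AutomataSynth; the easy operations shown (prefix and length constraints) contrast with the case-insensitive comparisons and character arithmetic that remain difficult to encode.

from z3 import String, Solver, Length, PrefixOf, sat

s = String("s")
solver = Solver()
solver.add(PrefixOf("GET ", s))   # s must begin with "GET "
solver.add(Length(s) > 8)         # and be longer than eight characters
if solver.check() == sat:
    print(solver.model()[s])      # a witness string satisfying both constraints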
3.5.3 Scaling Termination Queries

We found, in practice, that termination queries consumed an average of 66% of the total runtime of AutomataSynth. As candidate state machines increase in size, we expect the scalability of termination queries to dominate. This challenge presents an opportunity for innovation. We presented an approach based on the novel combination of bounded software model checking and string decision

4 See the AutomataSynth repository at https://github.com/kevinaangstadt/automata-synth

procedures; however, alternate formulations of termination queries may provide better performance while maintaining correctness. Many applications of model learning focus on the use of testing to provide answers to termination queries [225]. We have observed that the application of automated testing presents several challenges, such as producing a suitable quantity and diversity of inputs to identify counterexamples. Test input generators, such as Klee [47], may only support bounded-length strings (rather than unbounded strings). The application of other software verification techniques may also provide performance gains. Counterexample-guided abstraction refinement verifiers can abstract much of a program’s state to gain performance, but require support for interpolation queries. These are not currently supported by string solvers, but present an additional area for research.

3.5.4 Characterizing and Taming Approximation

Because scalability and decidability of termination queries are challenges, approximation may play an important role in improving the performance of learning-based approaches to synthesizing FPGA designs. Indeed, there is already significant interest in the architecture and software communities in producing approximate programs [156, 160, 170, 178]. Approximation has been a key parameter in model learning algorithms from the start [10, 225]. Results from learning theory often analyze approximation using Valiant’s probably approximately correct (PAC) framework, which bounds the

probability of the error being less than a fixed threshold for an approximately learned model [226]. Such results can predict the number of queries necessary to bound the error but do not characterize the locations or significance of the remaining error. Anomalous results for frequently used inputs have a very different impact than anomalous results for seldom-used inputs. Given the design of Angluin-style algorithms, it may be possible to determine which inputs result in approximate solutions. For example, pre-populating the observation table with rows pertaining to known inputs (i.e., those taken from the test suite) ensures that the learned state machine produces the correct output for those relevant values.

3.6 chapter summary

We present AutomataSynth, a framework for accelerating legacy regular language Boolean string kernel functions using FPGAs. Our approach uses a novel combination of state machine learning algorithms, software verification algorithms, string decision procedures, and high-performance automata processing architectures to learn the behavior of a program and construct a behaviorally equivalent FPGA hardware description. We demonstrate a proof-of-concept of this approach using a benchmark suite of eighteen string kernels mined from open-source projects on GitHub. AutomataSynth successfully constructs equivalent (or near-equivalent) FPGA designs for more than 80% of these benchmarks. We believe this approach shows promise for overcoming some of the limitations of current HLS techniques.

By leveraging automata abstractions, we are able to successfully port certain classes of legacy code to execute efficiently on hardware accelerators. AutomataSynth thus meets the requirements of legacy support and performance as detailed in Section 1.1. In the next chapter, we explore a custom programming language to help developers write new applications for hardware accelerators.

chapter 4

RAPID: A High-Level Language for Portable Automata Processing

Having demonstrated the utility of automata-based abstractions for porting extant code, we next focus on leveraging automata for the development of new software for hardware accelerators. Because accelerator ecosystems can often be heterogeneous [5, 162, 175], we first evaluate the extent to which automata processing enables the portability of applications across CPUs, GPUs, and FPGAs, which have all been considered for the execution of automata applications [166, 231]. Then we develop a high-level programming language, RAPID, for representing pattern search problems with respect to NFAs, targeting Micron’s Automata Processor (AP), CPUs, GPUs, and FPGAs. Together, these two contributions provide a programming model that is portable, reduces code size, and improves maintainability. To evaluate the portability of the automata processing paradigm, we consider two questions: 1) do design and optimization choices for finite automata port across architectures? and 2) to what extent does automata processing support high performance across architectures? In particular, we measure the stability of finite automata designs across hardware platforms. We evaluate six implementation

and optimization techniques and demonstrate that performance gains achieved by these design choices are consistent across architectures. We contrast this result with the OpenCL programming model, which frequently demonstrates performance inversions across platforms. Further, we present a comparison of the performance of automata processing (demonstrated to be performant on hardware accelerators) with highly optimized, application-specific algorithms on CPUs. In total, our results indicate that the performance of automata algorithms shows great promise on the CPU platform. We argue that these stability and performance results demonstrate the viability of automata processing as a portable computation paradigm.

While automata processing provides a suitable abstraction for performance portability, finite automata programming is tedious and error-prone. This chapter presents RAPID, a high-level language that maintains the performance benefits of pattern-recognition processors while also providing concise, clear, maintainable, and efficient representations of pattern-identification algorithms. We introduce three parallel control structures to facilitate common pattern-matching tasks. These allow the concise specification of multiple, simultaneous comparisons against a single data stream and provide high-level support for the variable-offset sliding window comparisons that are integral to many pattern-recognition problems. We also demonstrate that RAPID maintains the performance and portability benefits of automata processing across multiple architectures, including CPUs, GPUs, FPGAs, and the Micron D480 AP. We present algorithms for converting RAPID

programs into NFAs for execution via automata processing. We describe code generation and tool pipelines that are efficient across all target architectures. To evaluate the expressiveness of RAPID, we re-implement, in RAPID, a benchmark suite of real-world automata-based applications that have significant speedups when executed using specialized hardware accelerators. Then, we evaluate the performance and scalability of these compiled RAPID programs against their hand-crafted equivalents, measuring program size, resource utilization, and runtime metrics. Our evaluation demonstrates that RAPID programs introduce little overhead compared with applications written at a lower level of abstraction and maintain the performance and functional portability provided by the automata paradigm. In summary, this chapter makes the following contributions:

• An empirical evaluation of the stability and performance of automata processing optimizations and design choices across CPUs, GPUs, and FPGAs.

• RAPID, a high-level language for programming automata processing applications.

• A set of algorithms for converting RAPID programs into non-deterministic finite automata for execution with multiple automata processing engines.

• An experimental evaluation of the RAPID language against hand-crafted applications demonstrating improved density of generated NFAs as compared with hand-optimized NFAs.

The remainder of this chapter is organized as follows. Section 4.1 presents our empirical evaluation of automata processing stability with respect to state-of-the-art algorithms. Section 4.2 describes the RAPID programming language. Next, Section 4.3 describes the algorithms for generating finite automata from a RAPID program. Section 4.4 discusses the tool pipelines for compiling and executing finite automata applications on CPUs, GPUs, FPGAs, and the D480 AP. Finally, Section 4.5 evaluates the performance of the RAPID programming language.

4.1 automata processing stability

In this section, we evaluate the suitability of the automata processing paradigm as a performant, portable programming abstraction across disparate computer architectures. We consider both the stability of implementations across architectures (whether design choices impact performance on platforms differently) as well as average throughput of applications, as compared with state-of-the-art algorithms. While a thorough evaluation of performance portability is out of scope, our initial results demonstrate the potential of automata processing as a suitable abstraction.

4.1.1 Performance Stability

We first compare the stability of design choices in automata processing applications with the stability of those in OpenCL. OpenCL supports execution across a variety of architectures [208]. However, code written for one processor may not

compile for another target or may require significant re-writing to be performant on the new architecture [268]. Given two implementations of the same application and two hardware architectures, if one implementation outperforms the other on the first architecture and the opposite is true for the second architecture, we say that there is a performance inversion. Performance stability is the lack of observable performance inversions.

The OpenCL language has many observable performance inversions and is therefore not stable across architectures. We demonstrate such inversions using applications in the Rodinia HPC benchmark suite, which were optimized for multi-threaded execution [52]. Zohouri et al. have developed a second implementation based on an iterative approach [268]. For each benchmark, we time both implementations on the CPU, GPU, and FPGA. Table 4.1 presents high-level relative performance results for loop- and thread-based OpenCL Rodinia benchmarks; performance is stable if arrows within a row do not reverse direction. We find that all six benchmarks demonstrate performance inversions. That is, for all benchmarks we consider, the design decisions needed for performant code vary with each architecture.

Table 4.1: Performance stability of OpenCL programs

benchmark    cpu   gpu   fpga   stable
CFD           ↓     ↓     ↑     ✗
Hotspot       ↓     ↓     ↑     ✗
LUD           ↑     ↑     ↓     ✗
NW            ↓     ↓     ↑     ✗
Pathfinder    ↓     ↓     ↑     ✗
SRAD          ↓     ↓     ↑     ✗

↑ – loop-based performs best   ↓ – thread-based performs best

We next examine performance stability in automata processing applications, focusing on six implementation and optimization techniques from recent literature:

• automata folding [221]: reducing automata states by combining non-overlapping input comparisons.

• counters [75]: reducing states by rewriting automata to use saturating counters.

• disjoint report merging (drm) [233]: reducing data transfer overheads on spatial automata processors.

• prefix collapsing [28]: combining common automata states to form a trie-like structure.

• race logic [147]: providing general support for dynamic programming at the cost of performance.

• striding [28]: transforming automata to support compressed input streams (a brief sketch of two-symbol striding follows this list).

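As a flavor of the last optimization, the sketch below strides a deterministic automaton by two symbols. It is a simplification of the cited NFA transformation, assuming a complete transition function delta over an explicit alphabet.

from itertools import product

def stride2(delta, states, alphabet):
    """Build a transition function over symbol pairs: half the steps, squared alphabet."""
    return {(q, (a, b)): delta[(delta[(q, a)], b)]
            for q in states
            for a, b in product(alphabet, repeat=2)}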
For each, we select an arbitrary application that supports the optimization1 from the ANMLZoo automata processing benchmark suite [231]. Using one implementation with the optimization and one without, we measure relative performance (i.e., relative time-to-solution) across CPUs, GPUs, FPGAs, and the AP. These results are presented in Table 4.2.

Table 4.2: Performance stability of Automata Processing optimizations

optimization        cpu   gpu   fpga   ap    stable
Automata Folding     ↑     ↑     ↑     ↑     ✓
Counters             ↓    n/a    ↓     ↑     ✗
DRM                  —     —     —     ↑     ✓
Prefix Collapsing    ↑     ↑     ↑     ↑     ✓
Race Logic           ↓     ↓     ↓     ↓     ✓
Striding             ↑     ↑     ↑     ↑     ✓

↑ – improved performance   ↓ – reduced performance   — – no change

We observe only a single performance inversion (counters) in our experiments and believe this inversion is an artifact of current implementation support.2 These results provide initial evidence that the automata processing abstraction provides stable performance across architectures for many implementations and optimizations. Design decisions for performant code in automata processing do not appear to vary as much across architectures as they do with OpenCL.

4.1.2 Automata Processing Performance

In addition to stability, performant code across architectures is a desirable quality of a portable programming model. Recent studies by Wadden et al. and Nourian

1 No support results in the optimization being a no-op and thus has no impact on stability. 2 Nourian et al. support counters on GPUs [166], but their software artifacts have not been made public. Performance on the CPU and FPGA is degraded due to the complexity of circuit simulation (CPU) and routing constraints (FPGA). AP counters are discussed in Section 4.5.



et al. investigate (and demonstrate) the performance of automata processing on several hardware accelerators, including GPUs, FPGAs, and the AP [166, 231]. Therefore, we restrict our attention in this section to CPU-based automata processing. We compare the performance of applications mapped to the automata processing paradigm with state-of-the-art CPU algorithms.

Figure 4.1: Relative performance of automata processing vs. application-specific algorithms on the CPU (benchmarks ER, RF, Brill, SPM, Fermi, and Protomata; the y-axis is log-scale, spanning roughly 10⁻¹ to 10²). Higher bars indicate better performance of the automata-based algorithm.

We evaluate all applications from the ANMLZoo benchmark suite [231] that were adapted from state-of-the-art, non-automata-based algorithms. These applications are:

• brill [267], a rule-writing processor for part of speech tagging in natural language processing.

• entity resolution (er) [40], an algorithm for detecting duplicated (or similar) names from a list.

• fermi [240], a path recognition algorithm for particle physics experiments.

• protomata [185], a protein motif signature classification application.

• random forest (rf) [221], a random forest ensemble classifier for handwriting recognition.

• spm [238], a sequential pattern mining application.

Each of these applications has been demonstrated to outperform a state-of-the-art CPU implementation when executed on the AP. Here, we study whether the algorithms designed for the AP outperform the state-of-the-art when executed on CPUs using an automata processing engine. For each experiment, we executed the state-of-the-art implementation ten times and measured the average throughput of the core algorithm. Then, we averaged ten runs of an automata engine running the same application. We executed the benchmark automata using the Intel Hyperscan framework supplied as part of MNCaRT [12]. Experiments were performed on an Intel Core i7-5820K (3.30 GHz) processor with six physical cores and 32 GB of RAM.

Figure 4.1 shows the relative speedup of automata engines over application-specific algorithms on the CPU. For three applications (Brill, Protomata, and SPM), the automata-based algorithm outperforms the state-of-the-art in terms of average throughput. By representing Brill and Protomata as automata, new

opportunities for optimization are exposed, allowing for orders of magnitude increased performance. For Fermi, the automata algorithm is within 3× of the application-specific algorithm. Entity Resolution and Random Forest are an order of magnitude slower primarily due to the increased accuracy and/or work [39] of the automata implementations. When adapting a new application to the automata paradigm, researchers should consider carefully how this might impact the work—the time or steps needed for a serial processor to complete the task—performed by the algorithm. Large increases in work may not be suitable for performant automata processing algorithms across architectures.

4.1.3 Discussion

We observe that automata processing is more stable across disparate architectures, relative to design choices and optimizations, than the OpenCL programming model. We also observe that four of our six automata benchmarks perform within 3× of application-specific algorithms on the CPU, and two of these state machine-based implementations are at least an order of magnitude faster than the state-of-the-art. Additionally, automata processing is already a widely used computational model in areas such as network security [184], computational finance [3], and software engineering [7, 23]. There has been significant development of new optimizations for state machine performance on CPUs [27], and we anticipate continued improvement of automata processing performance on von Neumann architectures.

We conclude that automata processing provides stability (Section 4.1.1) and performance (Section 4.1.2) across architectures and implementations, including CPUs, GPUs, FPGAs, and the AP. That is, the performance of an automata-based algorithm is stable across architectures and often similar to the performance of an application-specific algorithm on the same hardware platform. Note that this performance portability includes applications that go beyond traditional regular expression-based algorithms, even on the CPU. We believe that portability across these architectures is beneficial to both the research and end-user communities. In particular, there is a lower overhead and risk incurred by developers who learn a programming model that is usable on multiple architectures. We believe that automata processing provides a suitable abstraction for representing and porting computation across multiple, dissimilar computer architectures.

4.2 the rapid language

While automata processing provides a suitable abstraction for performance portable execution of algorithms, current programming models for pattern searches have significant drawbacks (see Section 2.3). In this section, we discuss a new programming language, RAPID, which allows developers to write concise, clear, and maintainable algorithms for use with automata engines. In particular, RAPID supports searching a stream of data for many patterns in parallel. This sort of execution model is often referred to as multiple instruction, single data (MISD) in the architecture literature [83]. Programs are written in a combined imperative

and declarative style using a C-like syntax. In this section, we present a high-level overview of the control structures and data representations in the RAPID programming language.

4.2.1 Program Structure

macros and networks. RAPID programs consist of one or more macros and a network. The basic unit of computation in a RAPID program is a macro, which defines a reusable pattern-matching algorithm. Macros in RAPID share similarities with both C-style macros and ANML3 macros, allowing code to be written once and then used as a “rubber stamp.” RAPID macros admit more customized usage than their namesakes in C and ANML; the same macro can generate all designs for a particular problem. Statements within a macro are executed sequentially and define actions that should be taken to identify a pattern. RAPID provides several control structures,

including if statements, while loops, and foreach loops. Unlike some languages,

we guarantee in-order traversal when iterating with a foreach loop. The language also provides parallel control structures useful for pattern-matching, which we describe later in this section. Additionally, macros can instantiate other macros. When a macro is called, control shifts to the called macro; all of its statements are executed, and then control returns to the calling macro. While the macro code defines how to identify

3 ANML is the automata representation language for the AP. See Section 2.3.1

a pattern in the input stream, the macro parameters can specify the particular characters to match, allowing for comparisons of varying lengths. Consider the macro in Listing 4.1, which performs a Hamming distance computation between a string parameter, s, and the input stream. Changing from a comparison against a string of length five to a string of length twelve only requires passing a different string argument to the macro. As noted in Section 2.3.1, more than half of the code in the corresponding ANML implementation must be modified to make an identical change.

1  macro hamming_distance(String s, int d) {
2      Counter cnt;
3      foreach(char c : s)
4          if(c != input()) cnt.count();
5      cnt <= d;
6      report;
7  }
8  network(String[] comparisons) {
9      some(String s : comparisons)
10         hamming_distance(s, 5);
11 }

Listing 4.1: A RAPID program for computing Hamming distances

The network represents the highest level of pattern-matching within a RAPID program, and statements within a network definition are executed in parallel. The most common use of the network is to define a collection of macros for instantiation, which are executed in parallel at runtime to identify patterns in the input data stream. The network may also have parameters to specify certain values at runtime. Listing 4.1 contains a RAPID program that computes the Hamming distance for a number of given strings and reports on input within a distance

of five. The network is parameterized on an array of strings, which is used at runtime to specify the comparisons being made.

reporting. RAPID programs passively observe the input data stream; they cannot modify the stream. Programs can indicate interesting regions within the

stream by using the report statement, which generates a report event. These events provide the offset in the input data stream where the report occurred and additional identifying metadata, such as the reporting macro. For the program in Listing 4.1, reports indicate offsets where the input stream is within a Hamming

distance of five from the strings in comparisons.

boolean expressions as statements. Inspection of the input data stream is central to the RAPID programming model. Often, pattern identification algorithms only continue if a certain sequence of characters is detected. RAPID provides concise support for this common domain idiom by allowing Boolean expressions whenever full statements are allowed.4 These declarative

assertions terminate the thread of computation if the expression returns false. Line 5 in Listing 4.1 illustrates this usage.

4 This is merely syntactic sugar; the same behavior may be implemented using a less compact if statement.

4.2.2 Types and Data in RAPID

There are six primary data types in RAPID: char, int, bool, String, Counter, and ref. Both String and Counter are lightweight objects, while the remaining four are primitive types. The ref type stores a reference to an instantiated macro. Additionally, there is support for nested arrays of these types. In RAPID, pattern-matching occurs in a stream of characters. Therefore, the language provides the char primitive type for interacting with input data. The input data stream, however, is a stream of bits and does not need to be interpreted as characters. To support this, a char may also store escaped hexadecimal values. RAPID also defines two character constants, which represent special symbols in the input stream: ALL_INPUT and START_OF_INPUT. The former represents any symbol within the input and the latter is a reserved symbol (character 0xFF) for indicating the start of data. For example, if the input data stream consists of the flattening of an array, the entries would be concatenated into a stream, separated by the START_OF_INPUT symbol.

A Counter represents a saturating up-counter. Upon instantiation, a counter is initialized to zero. Counters provide two functions: reset() and count(), which set the value to zero and increment by one, respectively. Although programs cannot access the internal value of the counter, it is possible to check against a threshold. Listing 4.2 demonstrates the usage of counters and interaction with the input stream.

Counter cnt;
foreach(char c : "rapid") {
    if( c == input() ) cnt.count();
}
if( cnt >= 3 ) report;

Listing 4.2: The above code counts the number of characters matched in “rapid” and reports if the count is at least three

The foreach loop iterates over each character in the string “rapid” sequentially. If that character matches the next character from the input stream, the counter is incremented. After iterating over the entire string, the program checks if the counter is at least three and reports if so. For example, if the stream contained “tepid,” the count would be three, and there would be a report, but

“party” results in a count of one and no report.

The input data stream in RAPID is privileged and is accessed via the input() function. A call to this function returns a single character from the head of the data stream. Access to the input data is destructive—no peeking or insertion is allowed during program execution. Calls to input() act as synchronization points across active threads in a RAPID program. Similar to how active states in an NFA process the same input symbol, all active threads execute up to an input() statement and then receive the same character from the input stream. For example, if the stream contains “abcd...,” input() would return ‘a’ to all active threads of computation, and the stream would now contain “bcd....” Threads are not required to make the same number of calls to input(), and there is no communication between threads. Threads with fewer calls to input() than another thread will simply terminate earlier. This data model supports the heterogeneity of MISD computations.

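The lock-step semantics of input() can be mimicked with Python generators. The sketch below is a hypothetical illustration of the data model, not the RAPID runtime: each matcher yields whenever it would call input(), and all active matchers receive the same symbol.

def run_misd(matchers, stream):
    """Drive generator-based matchers in lock step over one input stream."""
    active = []
    for m in matchers:
        next(m)                  # advance each matcher to its first input() request
        active.append(m)
    for symbol in stream:
        survivors = []
        for m in active:
            try:
                m.send(symbol)   # every active matcher sees the same symbol
                survivors.append(m)
            except StopIteration:
                pass             # a matcher with fewer input() calls ends early
        active = survivors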
RAPID’s design represents the input stream as a FIFO only accessible through a special function, input(), rather than as a special indexed array. This is for conceptual clarity: arrays afford a notion of random access into the stored data, while pattern-recognition processors support sequential access to an ordered data stream. Global input access is intentionally similar to C’s “fgetc” rather than “fread/fseek” or “mmap.”

4.2.3 Parallel Control Structures

In pattern-matching problems, it is often useful to explore multiple possibilities in parallel. For example, a spam filter may wish to check for many black-listed subject lines simultaneously, or a gene aligner may begin matching a sequence at any point in the input stream. To facilitate such operations, RAPID provides both the network environment and also parallel control structures. Networks, as described previously, allow for parallelism at the macro level, which is useful for checking several patterns in tandem. The parallel control structures (either/orelse, some, and whenever) provide finer-grain control over parallel operations.

either/orelse statements. This structure provides basic support for parallel exploration. An either/orelse statement consists of two or more blocks, which allows for an arbitrary, static number of parallel computations. Computation splits when an either/orelse statement is encountered during execution, and each of the blocks is executed in parallel. When the end of a block is reached,

computation continues with the next statement in the program. No blocking or joining occurs, meaning that different paths in the either/orelse statement may begin executing the following statement at different times. This behavior is desirable because it allows for the matching of different length patterns containing the same suffix.

either {
    hamming_distance(s, d);    // hamming distance
    ’y’ == input();            // next input is ’y’
    report;                    // report candidate
} orelse {
    while(’y’ != input());     // consume until ’y’
}

Listing 4.3: An example usage of an either/orelse statement

As an example usage of the either/orelse statement, consider the code fragment in Listing 4.3, adapted from the MOTOMATA benchmark [186] evaluated in Section 4.5. Candidates in the input stream are separated by the control character

’y’. The computation should report the candidates within a Hamming distance of d from the string stored in variable s. We use an either/orelse statement to ensure that computation continues to the next candidate when the current candidate does not fall within the threshold. The first block of the either/orelse statement performs the Hamming distance comparison, while the second block consumes input until the control character is reached, always preparing the program to check the next candidate.

some statements. In certain cases, for example instantiating macros based on the content of an array, the ability to generate a dynamic number of parallel paths is desirable. The some statement provides this functionality.

This statement is the parallel dual of a foreach loop. During execution, the program iterates over a provided array or string and instantiates a parallel thread of execution for each item. Similar to an either/orelse statement, the execution of each parallel thread continues with the subsequent statement in the program; different threads in the some statement may reach this next statement at disjoint times. The some statement in Listing 4.1 instantiates a Hamming distance macro for each string in the comparisons array. The number of parallel threads executed depends on the number of entries in comparisons.

whenever statements. A common operation in pattern-matching algo- rithms is a sliding window search, in which a pattern could begin on any character within the input stream. The whenever statement consists of a Boolean guard and an internal statement. The guard specifies a condition on the input stream that must be true or a counter threshold that must be met before the internal statement is executed. At any point in the data stream (potentially multiple times) where this guard is satisfied, the internal statement will be executed in parallel with the rest of the program. A whenever statement is the parallel dual of a while statement. Whereas a while statement checks the guard condition before each iteration of the internal statement, a whenever statement checks the guard in parallel with all other computations, if any.

whenever( ALL_INPUT == input() ) {
    foreach(char c : "rapid")
        c == input();
    report;
}

Listing 4.4: Execution of a sliding window search over the entire input stream for the string “rapid”

The code fragment in Listing 4.4 will perform a sliding window search for the string “rapid.” The predicate within the guard will return true on any input, and therefore the block of code will begin execution at every character in the input stream. The whenever statement can also perform restricted sliding window searches depending on the predicate in the guard. For example, an application searching through HTTP transactions might use a predicate matching “GET” before matching specific URLs. Sliding window searches are fundamental to stream pattern recognition. All

RAPID programs perform a sliding window search on the START_OF_INPUT symbol. In the common case, this sliding window search occurs at the topmost level of a RAPID program, i.e., within the network. To reduce verbosity, RAPID infers this whenever statement, requiring developers to specify whenever statements only for non-default sliding window searches.

4.3 code generation

In this section, we present techniques for converting RAPID programs into automata for execution with automata processing. Our technique takes two files as input: the RAPID program and a file annotating properties of the network parameters (e.g., lengths of arrays and strings). Our tool converts the RAPID program into two files: an ANML or MNRL5 specification and host driver code. The ANML or MNRL file specifies the configuration of the automata processing engine needed to perform the pattern-matching algorithm given by the RAPID program. The driver code is executed on the CPU at runtime; it handles execution of the automata processing core and collects report events. This section focuses on the transformation of RAPID into the ANML or MNRL specification.

We employ a staged computation model to convert RAPID programs: comparisons with the input stream and counters occur at runtime, while all other values are resolved at compile time. To aid in partitioning, we annotate expressions with their return type during type checking. Allowable annotations include the

types listed in Section 4.2.2, as well as an internal Automata type, which denotes expressions interacting with the input stream. Expressions annotated with Automata

or Counter are converted into ANML or MNRL (allowing for runtime execution), while the remaining expressions are evaluated during compilation. Our conversion algorithm recursively transforms RAPID programs into finite automata in much the same way that regular expressions can be transformed

5 MNRL is an open-source state machine representation language. See Appendix A.

into NFAs. Comparisons with the input stream are transformed into STEs. The statement in which the comparison occurs determines how the STEs attach to the rest of the automaton. Rules for transforming automata expressions determine the structure of the STEs within a given statement. We describe the conversion of expressions, statements, and counters in turn.

4.3.1 Converting Expressions

Expression transformation results in the formation of a chain of STEs. No cycles are generated by expressions, but chains may include bifurcations. Figure 4.2 provides examples of transformations from RAPID expressions to automata structures. The most basic transformation is a comparison between a character and the input stream, generating a single STE. AND expressions behave as concatenation because reading from the input stream is destructive. The conversion of an OR expression generates a bifurcation in the generated automaton. A special case occurs when both sides of the OR expression contain input comparisons of length one. In these instances, we take advantage of STE character classes to specify multiple accepting symbols for a single STE.

Negations of expressions generate the most complex structures of all the expression types. Traditionally, an automaton is negated by swapping accepting and non-accepting states. This construction, however, does not work for our use case because RAPID programs consume the same number of symbols for an expression and its negation, a property the traditional transformation does not maintain. Instead, we transform the expression via De Morgan’s laws and generate STEs for the resulting statement. After any mismatch in this negation, the remaining symbols do not matter, but they still must be consumed. We therefore use star states, which match on any character.

Figure 4.2: Transformations of RAPID expressions into automata. For example, ’a’ == input() becomes the single STE [a], while ’a’ != input() becomes [^a]; the conjunction ’a’ == input() && ’b’ == input() becomes the chain [a] → [b]; the disjunction ’a’ == input() || ’b’ == input() collapses into the single STE [ab]; and the negation !(’a’ == input() && ’b’ == input() && ’c’ == input()) becomes the three parallel chains [^a] → ∗ → ∗, [a] → [^b] → ∗, and [a] → [b] → [^c].

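The recursive flavor of these rules can be sketched compactly. The fragment below is an illustrative lowering over an assumed tuple-based AST (for example, ("eq", "a") for ’a’ == input()), not the RAPID compiler; negation is assumed to have been rewritten by De Morgan’s laws before lowering.

from itertools import count

_ids = count()

def lower(expr):
    """Return (entry_stes, exit_stes); each STE is [id, character_class, successors]."""
    kind = expr[0]
    if kind in ("eq", "neq"):
        cls = expr[1] if kind == "eq" else "^" + expr[1]
        ste = [next(_ids), "[" + cls + "]", []]
        return [ste], [ste]
    if kind == "and":                          # concatenation: input() is destructive
        l_in, l_out = lower(expr[1])
        r_in, r_out = lower(expr[2])
        for ste in l_out:
            ste[2].extend(r_in)
        return l_in, r_out
    if kind == "or":
        l_in, l_out = lower(expr[1])
        r_in, r_out = lower(expr[2])
        single = len(l_in) == 1 and l_in == l_out and len(r_in) == 1 and r_in == r_out
        if single and "^" not in l_in[0][1] + r_in[0][1]:
            merged = [next(_ids), "[" + l_in[0][1][1:-1] + r_in[0][1][1:-1] + "]", []]
            return [merged], [merged]          # one STE with a merged character class
        return l_in + r_in, l_out + r_out      # otherwise, a bifurcation
    raise ValueError(kind)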
4.3.2 Converting Statements

Statements in RAPID are transformed into high-level automaton structures, allowing for additional pipelining, feedback loops, and parallel exploration of patterns. We present the overall structures in Figure 4.3.

During compilation, a foreach loop is unrolled into straight-line pattern-matching. Parallel either/orelse and some statements are transformed by generating the code for each statement and connecting these structures in parallel into the overall design. This mirrors the language semantics: the some statement is the parallel dual of foreach. Note that some statements typically depend on compile-time parameters (via input annotations on the network) while either/orelse statements do not (see Section 4.2.3).

There is also a similarity between while loops and whenever statements. While loops alternately perform predicate checks and execute the body code. This generates a feedback loop structure in the automaton. In a whenever statement, predicate checking begins on every character consumed. To support this, we generate a self-activating STE that accepts all symbols (see the ∗ node in Figure 4.3d). This

added STE maintains an active transition into the predicate, allowing matching to begin on every symbol consumed. Once the predicate accepts, the body of the whenever statement will begin to execute (although the predicate is still checked again in parallel on subsequent input characters).

4.3.3 Converting Counters

Counters in RAPID are challenging to implement because the state of a hardware counter on the AP cannot be directly accessed. Therefore, counter comparisons in RAPID programs are transformed into a pattern-matching operation using a combination of one or more saturating counters and Boolean gates. The basic structure consists of a saturating counter set to latch (once the threshold is reached, the output signal remains active) and an inverter, which allows for detection of the counter target not being reached. Physical counters on the AP have three connection ports: count enable, reset, and output. Counter object function calls to count() and reset() in RAPID are connected to their respective ports on the counter. Output signals then connect to the next statement in the program. We follow the set of rules for determining the threshold and outputs of a

Counter shown in Table 4.3. Equality checking with a Counter requires the use of two physical counter elements. While traversing the program, we note which


Figure 4.3: Automaton designs for RAPID statements. (a) Foreach loops unroll into a linear chain (Statement 1 → Statement 2 → · · · → Statement n). (b) Either/orelse and some statements connect Statement 1 through Statement n in parallel between a shared entry and a shared exit. (c) While loops route the predicate into the loop body in a feedback loop, with a !predicate transition exiting the loop. (d) Whenever statements use a self-activating ∗ STE to keep the predicate active on every symbol; the predicate in turn activates the body.

Table 4.3: Rules for thresholds and outputs on counters

comparison   threshold   true output
< x          x           inverted
<= x         x + 1       inverted
> x          x + 1       non-inverted
>= x         x           non-inverted
== x         convert to <= x && >= x
!= x         convert to < x || > x

Counter objects are used for equality checking and, during code generation, emit two counter elements for each. This technique only allows for one threshold to be checked per counter in the RAPID program. An alternate solution would be to use positional encodings, which duplicate an automaton for each value of a counter, encoding the count in the position of states within an automaton. While this design allows for easy checking of multiple thresholds, it also significantly increases the number of states in the final automaton and does not support counter resetting. We chose not to implement this technique in our initial compiler because it does not support full, generic functionality.

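The rules of Table 4.3 amount to a small lookup that can be written down directly. The sketch below is a hypothetical helper, not the compiler: for each comparison, it returns the latching counters to instantiate along with their thresholds and output polarities.

def lower_counter_compare(op, x):
    """Map a RAPID counter comparison to (threshold, polarity) counter configs."""
    if op == "<":
        return [(x, "inverted")]
    if op == "<=":
        return [(x + 1, "inverted")]
    if op == ">":
        return [(x + 1, "non-inverted")]
    if op == ">=":
        return [(x, "non-inverted")]
    if op == "==":                             # two counters, outputs ANDed
        return lower_counter_compare("<=", x) + lower_counter_compare(">=", x)
    if op == "!=":                             # two counters, outputs ORed
        return lower_counter_compare("<", x) + lower_counter_compare(">", x)
    raise ValueError(op)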
We must also support the use of Counter variables as predicates in a whenever statement. For the body of a whenever statement to execute, the Counter must have reached its threshold, and the statement itself must have been reached within the control flow of the RAPID program. We use a self-activating STE matching all symbols to track when the statement is reached. An AND gate checks both of


these conditions before executing the body of the whenever statement. This design is demonstrated in Figure 4.4.

Figure 4.4: Structure of a whenever statement with counters. A self-activating ∗ STE (tracking that the statement has been reached) and the counter’s latched true output both feed an AND gate, which activates the whenever body.

Counter threshold checks are also used as assertions or as predicates in if statements and while loops. Because NFAs do not have dynamic memory (beyond the states themselves), we handle this case by both generating automata and also pre-transforming the input stream. For each such Counter, we create a unique reserved input symbol. This new symbol indicates that the threshold for that particular Counter has been met. We add an STE matching the symbol to the subsequent statement; whenever the symbol is encountered in the input data stream, the appropriate subsequent statement begins execution. This symbol must be injected into the input data stream before the RAPID program begins execution. Actual injection is handled by the runtime code and can occur while data is being streamed to the AP (but before execution of the RAPID program begins). We attempt to automatically determine the pattern for inserting the count threshold symbol into the input stream. An example pattern is “insert the symbol after every 25 characters in the input stream.” Often, the compiler can infer the pattern by counting the number of symbols consumed before the counter check occurs. When certain while loops are included in the program, however, it may



not be possible to determine where in the input stream to inject the symbols. In these cases, we currently output a warning at compile time and rely on the developer to provide the pattern for inserting the control character into the data stream.

Figure 4.5: Supported pipelines for executing RAPID programs. RAPID programs can be executed on CPUs (using VASim or Hyperscan), GPUs (using iNFAnt2), FPGAs, and Micron’s D480 AP. The RAPID compiler translates a RAPID program into a flat ANML (or MNRL) file, which the backend tools consume: VASim and the MNRL-based Hyperscan compiler for CPUs, iNFAnt2 for GPUs, Micron’s AP compiler for an AP binary, and Verilog processed by Xilinx place-and-route for FPGAs.

4.4 executing rapid programs

A primary goal of the RAPID programming language is to support cross-platform portability of pattern searching applications. This allows an application to be tested on a developer’s machine, which might not contain high-performance hardware, and be easily deployed into a heterogeneous hardware environment. Finite automata provide a portable, intermediate computation form that can be ported to many hardware backends, including CPUs, GPUs, FPGAs, and Micron’s

D480 AP. We achieve this by developing and adapting automata engines for each platform. As discussed in Section 4.1, automata processing provides a suitable abstraction for efficient execution of applications across architectures. Such an approach effectively decouples high-level application development from low-level optimizations. Any advances in automata processing performance (e.g., new optimizations and new computational approaches) can benefit all high-level applications. In the previous section, we described the process for compiling a high-level RAPID program to finite automata. Now, we discuss workflows for executing automata across common computer architectures. Figure 4.5 outlines our workflow for targeting CPUs, GPUs, FPGAs, and the AP.

4.4.1 Targeting the Automata Processor

Micron provides a proprietary tool chain for converting ANML specifications into a loadable binary image for the AP. This tool places and routes the NFAs onto the hardware states and reconfigurable routing mesh of the processor. We use this tool directly to synthesize ANML for the AP.

4.4.2 Targeting CPUs

We have developed and collected a set of algorithms for optimizing and transforming finite automata. These algorithms are implemented in VASim, a tool we

created to facilitate automata research and experimentation [235]. This framework supports easy prototyping, debugging, simulation, and analysis of automata-based applications and architectures. We use VASim to optimize the automaton from the RAPID compiler using common prefix collapsing [28]. This process merges states that match the same input symbols, beginning with the starting states, producing a functionally equivalent NFA with fewer states. In our Brill tagging benchmark, for example, prefix collapsing results in a 57% reduction in the number of states. Additionally, VASim contains a multi-threaded simulation core, which is capable of executing automata on an input stream. The simulator was designed specifically to execute ANML files, making VASim an excellent candidate for a RAPID CPU backend. While VASim is currently 4×–694× faster than existing simulation tools for Micron's AP, regular expression processors, such as Hyperscan [111], outperform VASim for pure NFA applications. When a compiled RAPID program contains no counters, we choose to execute with Hyperscan, using the compilation and runtime tools supplied as part of the MNCaRT ecosystem [12]. We instruct the RAPID compiler to emit MNRL and then use the Hyperscan compiler to generate a serialized pattern dictionary and perform Hyperscan-specific optimizations on the automata. We then execute the pattern dictionary against a supplied input stream using the hsrun tool provided with MNCaRT.
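Returning to the prefix-collapsing optimization above, the following sketch conveys the idea under simplifying assumptions: the NFA is a dict of states with 'symbols', 'out', and 'report' fields, and the prefix region forms a tree rooted at the start states. It mirrors the concept, not VASim's implementation.

    def collapse_prefixes(nfa, frontier, seen=None):
        # Group the frontier states by symbol set (and report status);
        # states in the same group match identically and can be merged.
        seen = set() if seen is None else seen
        groups = {}
        for sid in frontier:
            if sid in seen or sid not in nfa:
                continue
            seen.add(sid)
            key = (frozenset(nfa[sid]['symbols']), nfa[sid]['report'])
            groups.setdefault(key, []).append(sid)
        for keep, *dups in groups.values():
            for d in dups:
                # Merge the duplicate into the kept state by
                # inheriting its outgoing edges.
                nfa[keep]['out'] |= nfa[d]['out']
                del nfa[d]
            # Recurse on the merged state's children.
            collapse_prefixes(nfa, sorted(nfa[keep]['out']), seen)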

4.4.3 Targeting GPUs

We support the execution of pure NFAs with a GPU backend. RAPID programs that do not use counters can therefore be executed on GPUs. We use iNFAnt2, the optimized GPU-based NFA engine used by Wadden et al. with the ANMLZoo benchmark suite [231]. The iNFAnt2 engine reads in a transition table and uses individual SIMD threads to compute possible transitions on a given input symbol. We use VASim to convert the ANML produced by the RAPID compiler to the transition tables needed by iNFAnt2. Similar to the CPU target, we optimize the ANML using VASim’s optimization framework. Next, we output the NFA transition table using the Becchi-style format [27]. To execute on the GPU, we provide both this transition table and an input stream to iNFAnt2, which produces reporting output.
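The core computation of such a transition-table engine can be modeled in a few lines. The sequential loop below stands in for the per-transition SIMD threads that iNFAnt2 launches; it is an illustrative model, not iNFAnt2's code.

    def nfa_step(transitions, active, symbol):
        # On a GPU, each table entry would be examined by its own
        # thread; here we scan the flat table sequentially.
        next_active = set()
        for src, sym, dst in transitions:
            if sym == symbol and src in active:
                next_active.add(dst)
        return next_active

    def run(transitions, start_states, stream):
        active = set(start_states)
        for symbol in stream:
            active = nfa_step(transitions, active, symbol)
        return active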

4.4.4 Targeting FPGAs

When targeting an FPGA, we first optimize the compiled automata and then convert to a hardware description using VASim. VASim transforms the optimized NFA into a Verilog hardware description. Our tool generates a module with inputs for clock, reset, and an 8-bit input symbol and outputs for report events. Within the module, activations of states in the automaton are stored in registers, which are updated on every clock cycle. A state becomes active if it is enabled (a state with an incident edge to the current state is active) and the current

input symbol matches. Using this update rule, it is possible to execute the NFA directly in hardware. Finally, we target Xilinx FPGAs by synthesizing the hardware description produced by VASim. Additional optimization of automata kernel generation for FPGAs using this same technique has been explored by Xie et al. [252].
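In software, the update rule reads as follows. The state records ('start', 'preds', 'symbols') are a hypothetical simplification of the generated Verilog, in which each state's activation bit is a register updated every clock cycle.

    def clock_tick(states, active, symbol):
        # A state becomes active when it is enabled (a predecessor is
        # active, or it is a start state) and its symbol set matches
        # the current 8-bit input symbol.
        nxt = set()
        for sid, st in states.items():
            enabled = st['start'] or any(p in active for p in st['preds'])
            if enabled and symbol in st['symbols']:
                nxt.add(sid)
        return nxt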

4.5 evaluation

We evaluate RAPID against hand-crafted designs for five real-world benchmark applications, which were selected based upon previous research demonstrating significant acceleration using Micron’s AP [40, 186, 239, 267]. We predominantly consider metrics related to expressive power, scalability, and performance. We consider the following research questions:

1. Do RAPID constructs allow for the representation of regular languages?

2. Do RAPID constructs generalize to pattern search problems across multiple problem domains?

3. Do RAPID programs require fewer lines of code than a functionally equivalent ANML program to represent a given pattern search problem?

4. Are RAPID programs no less efficient at runtime and during synthesis than hand-optimized ANML programs?

4.5.1 Expressive Power

We begin our evaluation of the RAPID language by demonstrating that regular expressions can be represented in our language. To do this, we will briefly sketch the RAPID constructs necessary to implement each of the rules detailed in Section 2.3.1. Because we consider a streaming model of computation, we do not consider empty strings (i.e., we assume there is input). RAPID programs for singleton matches and empty set matches are trivial: a

RAPID program with a single character comparison with input() followed by a report and a program with no reports, respectively. A regular expression union matches either one regular expression or another. In RAPID, this same behavior is achieved using an either/orelse statement where the blocks of the statement encode each expression in the union. Concatenation is similarly direct: statements and expressions for matching both expressions are written in sequence within the RAPID program. Kleene closures are the most challenging to represent because they do not map naturally to the looping constructs in our language. However, we can leverage a macro reference to the body of the closure to provide the same semantics as a

Kleene closure. After calling this macro, we use an either/orelse statement to either call the same instance of the macro again or continue on to match the next portion of the pattern. Finally, we use a surrounding either/orelse statement to allow for zero matches of the body of the closure. We provide an example RAPID program for matching b∗c in Listing 4.5. While this construction is neither intuitive nor

    macro b() {
        b == input();
    }

    macro bstar_c() {
        // create a reference to the body of the Kleene closure
        ref b_inst = b();

        either {
            b_inst;

            // this either/orelse creates the backwards loop to the
            // body of the Kleene closure
            either {
                b_inst;
            } orelse {
                // this empty block allows us to transfer
                // control past the Kleene closure
            }
        } orelse {
            // this empty block lets us match 0 instances
        }

        c == input();
        report;
    }

    network {
        bstar_c();
    }

Listing 4.5: Example implementation of the regular expression b∗c in RAPID.

concise, we note that such a formulation never arose in our implementation of real-world applications for our evaluation in Section 4.5.2. The control structures in the RAPID language were designed to address pattern-matching paradigms found in real-world applications. We note that embedding of regular expression operators into the RAPID language would provide a better solution in many cases; however, we have demonstrated that this is not necessary with respect to the expressive power of the language.

RAPID has sufficient expressive power to represent all regular expression operations.

4.5.2 Empirical Evaluation

Next, we conduct an empirical evaluation of the RAPID language. Table 4.4 provides descriptions of the benchmarks used. For each benchmark, we chose an instance size representative of a real-world problem. These sizes come either directly from previous work or from conversations with the authors of the previous work. The generation method column indicates the technique used to create the handcrafted code, which ranged from custom Java or Python programs for generating an ANML design to the use of a GUI design tool (Workbench) for crafting automata by hand. The authors of the ARM [239] and Brill [267] benchmarks provided us with their original code, including a collection of regular

Table 4.4: Description of benchmarks

    benchmark         description                           generation method   sample instance size
    ARM/FIS [239]     Association rule mining / Frequent    Python + ANML       24 Item-Set
                      itemset
    Brill [267]       Rule re-writing for Brill part of     Java                219 Rules
                      speech tagging
    Exact [40]        Exact match DNA sequence search       Workbench           25 Base Pairs
    Gappy [40]        DNA string search with gaps           Workbench           25-bp, Gaps ≤ 3
                      between characters
    MOTOMATA [186]    Fuzzy matching for bioinformatics     Workbench           (17,6) Motifs
                      planted motif search

expressions for performing the Brill benchmark. We recreated the remaining designs, using algorithms and specifications published in previous work.

RAPID constructs generalize to a range of application domains for pattern-searching problems.

Table 4.5 lists design statistics for the benchmarks. We compare the lines of code needed to generate ANML. For ARM, the RAPID code requires six times fewer lines to represent, and Brill requires about half of the lines of the hand-crafted solution. The regular expression representation for Brill is more compact than RAPID. We created the Gappy, Exact, and MOTOMATA benchmarks using a GUI design tool. For these, we present the lines of code in ANML, which is roughly equivalent to the number of actions taken within the design tool. ANML file sizes are

dependent on the specific instance of a problem, and the numbers we present are for a single instance of the problem listed in Table 4.4. In all cases, the RAPID program is significantly more compact than the ANML it generates.

RAPID programs are significantly more concise to write than hand-crafted automata. Further, RAPID programs can also be more concise than automata-generator scripts.

As an approximation for the size of the resulting automaton, we measure the number of STEs generated and the number of STEs loaded to the AP after placement and routing. The placement and routing tools modify the original automaton to better match the architectural design of the AP. These optimizations are similar to those applied by VASim for our CPU, GPU, and FPGA targets. For most benchmarks, RAPID-generated automata contain fewer device STEs, taking up less space on the device. Only the Gappy benchmark requires more device STEs. Although we could optimize the RAPID code to reduce the size of the generated automaton, we found that this more natural design, although larger, has comparable placement and routing efficiency. For MOTOMATA, the RAPID version requires approximately half the STEs of the hand-crafted version. The compiled RAPID version makes use of a saturating counter, while the handcrafted version uses positional encoding. Due to the lock-step execution of automata on the AP, runtime performance of loaded designs is linear in the length of a given input stream. Therefore, we focus on evaluating the space efficiency of RAPID programs. In Table 4.6, we present

Table 4.5: Comparison between RAPID and hand-crafted code with respect to lines of code (LOC) and STE usage

    benchmark         loc      anml loc   stes    device stes
    ARM         H     118      301        79      58
                R     18       214        58      56
    Brill       H     1,292    9,698      3,073   1,514
                R     688      10,594     3,322   1,429
                Re    218      –‡         4,075   1,501
    Exact       H     –†       193        28      27
                R     14       85         29      27
    Gappy       H     –†       2,155      675     123
                R     30       2,337      748     399
    MOTOMATA    H     –†       587        150     149
                R     34       207        53      72

    R – RAPID    H – Hand-coded    Re – Regular Expression
    † The GUI tool does not have a LOC equivalent metric.
    ‡ No ANML statistics are provided by the regular expression compiler.

the performance of RAPID programs compared to hand-crafted ANML based on placement and routing statistics for the AP, using version 1.4-11 of the AP SDK to generate the placement and routing information. The total blocks column measures the number of routing matrix blocks⁶ needed to accommodate the design; lower numbers represent a more compact design. STE utilization indicates the percent of used STEs within the routed blocks; high numbers indicate a design with fewer unused STEs. Mean BR allocation (AP MBRA) is a metric provided by the AP SDK that approximates the routing complexity of the design. Here, a lower number is better, signifying lower congestion within the routing matrix. The AP Clk column indicates whether the clock cycle of the AP must be reduced to accommodate a design. In one instance (the RAPID MOTOMATA program), the clock cycle must be halved due to a limitation in signal propagation between counters and combinatorial elements in the current-generation AP. However, the RAPID version is four times more compact. Although this is a performance loss for a single instance, it is a net performance gain for a full problem, which will fill the AP board: four times as many instances execute in parallel at half the speed, for a net improvement factor of two. Although RAPID provides a higher level of abstraction than ANML, the final device binaries are more compact, using fewer resources on the AP. We also evaluate the space efficiency of the FPGA engines our tools produce. We synthesize our designs for a Xilinx Kintex UltraScale XCKU060. Table 4.6 also lists the number of LUTs and registers needed to implement the hardware description of each benchmark.

⁶ Blocks are a subunit of the hierarchical routing matrix found on the AP [75].

Table 4.6: Space utilization on AP and FPGA targets. Lower values for AP States, FPGA LUTs, and FPGA Registers indicate a smaller footprint; lower values for AP MBRA indicate less stress on the routing network.

    benchmark         ap stes   ap mbra   ap clk   fpga luts   fpga reg
    ARM         H     58        20.8%     1        73          76
                R     56        20.8%     1        83          65
    Brill       H     1,514     65.4%     1        201         1,483
                R     1,429     60.6%     1        358         1,360
    Exact       H     27        4.2%      1        6           25
                R     27        4.2%      1        28          27
    Gappy       H     123       77.1%     1        73          123
                R     399       70.8%     1        52          399
    MOTOMATA    H     149       75.0%     0.5      114         148
                R     72        75.0%     1        85          60

    H – Handcrafted    R – RAPID

Lower numbers indicate smaller footprints for the circuits, which allows for more widgets to be run in parallel on the FPGA. As with the AP results, RAPID programs do not incur significant space overheads on the FPGA. A complete timing analysis and comparison with other FPGA engines falls outside the scope of this work but is examined by Xie et al. [252].

Despite representing problems at significantly higher levels of abstraction than hand-optimized automata, RAPID programs do not incur significant hardware overheads once compiled.

4.6 chapter summary

As data sets continue to grow in size, new hardware and software approaches are needed to quickly process and analyze available data. This chapter explores the viability of automata processing as an intermediate computational representation to support high-throughput processing across computer architectures. We present RAPID, a new language for defining pattern-matching algorithms. RAPID is motivated by pattern-recognition processors, such as the Automata Processor, which greatly accelerate pattern detection in streams of data, but lack easy-to-use programming models. Automata processing allows a developer to write a single application and execute it on all common architectures. Further, our empirical evaluation demonstrates that automata optimizations maintain performance stability across CPUs, GPUs, FPGAs, and the AP. RAPID raises the level of abstraction for programming pattern-recognition applications, resulting in clear, concise, maintainable, and efficient programs. We develop a notion of macros and networks, which we argue improve program maintainability. Additionally, RAPID provides parallel control structures to support common tasks in pattern-matching algorithms, such as sliding window searches. We present techniques for converting RAPID programs to finite automata that can be executed on CPUs, GPUs, FPGAs, and Micron's D480 AP. Although RAPID programs are written at a higher level of abstraction than current hand-crafted code, our evaluation indicates that RAPID programs have similar, if not better,

device utilization. RAPID therefore meets the requirements of scalability and performance. Thus, in addition to supporting extant code (see Chapter 3), our programming model also allows developers to write new applications for hardware accelerators. Next, we will develop software maintenance tools, built atop our RAPID language, to help developers identify and fix bugs in their code.

chapter 5

Interactive Debugging for High-Level Languages and Accelerators

The introduction of new domain-specific languages (DSLs), such as the RAPID language presented in Chapter 4, and the adoption of new accelerators both create challenges from a software maintenance standpoint. Developers may wish to port existing code to these new languages or rewrite algorithms to be better suited for these new accelerators, tasks which can introduce new faults [264, 266]. For automata processing applications, these faults can be particularly difficult to localize. Developers may not observe abnormal behavior until processing large quantities of data (i.e., testing samples may not exhibit high coverage of corner cases). Extracting a smaller input for analysis from the large data set can be challenging or costly, since many pattern-matching algorithms perform a sliding-window comparison where the relevant piece of data is not known a priori. It is therefore desirable to support high-throughput data processing with the ability to interrupt accelerated program execution and transfer control to a debugging environment. As described in Section 1.2.3, CPUs are too slow for effective debugging of many automata-based applications, and debugging on accelerators is currently conducted at extremely low levels of abstraction.

Therefore, we present an approach for building an interactive, source-level debugger using low-level signal inspection on hardware accelerators. Our debugging system includes support for breakpoints and data inspection. We demonstrate prototype implementations for both the AP and Xilinx FPGAs; no modifications to the underlying accelerators are needed. While we focus our presentation on one indicative DSL, the techniques we present for exposing state from low-level accelerators to provide debugging support lay out a general path for providing such capabilities for other accelerators and languages. Our approach leverages four key insights:

• A co-designed hardware accelerator and CPU-software simulation system design allows for both high-speed data processing as well as interactive debugging.

• Micron’s AP contains context-switching hardware resources, which are often left unused, for processing multiple input streams in parallel. Additionally, FPGA manufacturers provide logic analyzer APIs to inspect the values of signals during data processing. We repurpose these hardware features to transfer control from the execution context on the accelerator to an interactive debugger on the host system.

• Runtime state for automata processing applications is compact, consisting only of the set of active states. We lift this state to the semantics of the source-level program through a series of mappings generated at compile time. The mapping from source-level expressions to architecture-level automata

states is traceable within the RAPID compiler; our approach is applicable to any high-level programming language for which such a mapping from expressions to hardware resources may be inferred.

• Setting breakpoints on expressions in a program is not directly supported by the automata processing paradigm. Instead, we set and trigger breakpoints on input data, pausing execution after processing N bytes. We can leverage these pauses to provide the abstraction of more traditional breakpoints set on lines of code.

We also extend our basic design to support low-latency time-travel debugging near breakpoints by stopping accelerated computation early and recording execution traces with a software-based automata simulator. The addition of software simulation allows our system to support logical backward steps in the subject program near breakpoints without incurring significant delays while data is re-processed. Capturing the state information from each automaton state on FPGAs incurs a hardware, performance, and power overhead, in contrast to the AP (where support is built into the architecture). We evaluate the scalability of our debugging approach on the ANMLZoo benchmarks [231] using the REAPR automata-to-FPGA tool [252] and a server-class FPGA. We were able to achieve an average of 81.70% of the baseline clock frequencies. We also discuss the trade-off between resource overheads and support for debugging. We evaluate the ease of use of our debugging approach using an IRB-approved human study to understand how our technique affects developers' abilities to

localize faults in pattern-matching applications. During the study, we collected data using a set of ten programs indicative of real-world applications with a total of twenty seeded defects. Our human study included 61 participants with a wide range of programming experience, including a mix of undergraduate and graduate students at our home institution, as well as a professional developer. We found a statistically significant 22% increase (p = 0.013) in localization accuracy when participants were provided with debugging information generated by our system. This chapter, therefore, makes the following contributions:

• A technique for interactive debugging of automata processing applications written in a high-level DSL. We leverage an accelerator to quickly process input data and repurpose existing hardware mechanisms to transfer control and initiate a debugging session.

• A characterization of breakpoint types for the automata processing domain. We differentiate between breakpoints set on input data and on expressions.

• An empirical evaluation of our debugging system on a Xilinx FPGA. We achieve an average of 81.70% of the baseline clock frequencies for the ANMLZoo benchmarks.

• A human study of 61 participants using our debugging tool on real-world applications. We observe a statistically significant (p = 0.013) increase in fault localization accuracy when using our tool.

In the remainder of the chapter, we first introduce our debugging system in Section 5.1. Then, we evaluate scalability on FPGAs in Section 5.2, and present the statistical analysis of our human subjects study in Section 5.3.

5.1 hardware-supported debugging

In this section, we present a novel technique for accelerating debugging tasks for sequential pattern-matching applications using a hardware-based automata processor. Our technique bridges the semantic gap between the underlying computation and the source-level RAPID program and can be extended to other languages whose compilers map program expressions and state to hardware resources. We consider two varieties of breakpoints (line and input) and describe how input-based breakpoints can be used in our system to implement traditional line-based breakpoints. We also extend our debugging system to support low-latency time-travel debugging by using a software-based automata simulator. While the technique generalizes to various automata processing architectures (including CPUs), we present the approach with respect to Xilinx FPGAs and Micron's D480 AP.

5.1.1 Example Program

Listing 5.1 provides an example RAPID program, which we will consider at various points in this chapter. The program matches the string “hello world”.

     1  macro helloWorld() {
     2      // match "Hello world" anywhere in the input stream
     3      whenever( ALL_INPUT == input() ) {
     4          // match the word "Hello" in the input data stream
     5          foreach(char c : "Hello") {
     6              // match each character in turn
     7              // computation stops if a character doesn't match
     8              c == input();
     9          }
    10
    11          // match with a space (' ') between the two words
    12          input() == ' ';
    13
    14          // match with the word "world" in the input data stream
    15          foreach(char c : "world") {
    16              c == input();
    17          }
    18
    19          // if we successfully matched everything, report
    20          report;
    21      }
    22  }
    23
    24  network() {
    25      // instantiate a single search using the helloWorld macro
    26      helloWorld();
    27  }

Listing 5.1: An example RAPID program that matches "hello world" anywhere in an input string

To do this, we instantiate a single instance of the helloWorld macro. This macro continually attempts to match our target string (line 3). To match the "hello world" string, we iterate over the characters in "hello", matching each in turn (lines 5–9). Then, we match a space (line 12) and iterate over the characters in "world" (lines 15–17). If these characters are successfully matched, a report event is generated (line 20). For a detailed description of the keywords and operators in the RAPID language, please refer to Section 4.2.

5.1.2 Breakpoints

Breakpoints allow a developer to begin interacting with a debugger [124]. The subject program executes until a breakpoint is reached, and then control is transferred into an interactive session, allowing the user to inspect program state [149]. Watchpoints, or conditional breakpoints, are another common tool developers use to debug programs. Unlike breakpoints, a watchpoint only transfers control when the value of a variable changes or an assertion becomes true. Because watchpoints may be implemented as breakpoints [190], we focus solely on breakpoints in this work.

line breakpoints. Traditionally, breakpoints are set on lines of code, statements, or expressions in a program. Execution stops every time control reaches the corresponding program point. We refer to this type of breakpoint as a line breakpoint. In the example RAPID program in Listing 5.1, a line breakpoint could

be set on line 16 to halt execution for each match of a character in the sequence "world".

input breakpoints. Automata-based pattern-recognition programs often process large quantities of data, and spurious or incorrect reports¹ may only appear after a significant portion of the input stream has been consumed. To debug these defects, a developer may wish to pause program execution after a given number of input symbols have been processed by all parallel searches. In other words, the developer might wish to set a breakpoint on the input stream given to an application. We refer to this type of breakpoint as an input breakpoint. This abstraction provides functionality similar to several automata simulators that support "jumping" to a given offset in input data.

5.1.3 Hardware Abstractions for Debugging

Unlike traditional (non-parallel) CPU debugging, we explicitly target a setting with a particular kind of parallelism, one where multiple pattern-matching searches and multiple automata states can be active simultaneously. Central to our technique is the ability to inspect the active set, or currently active states, in the executing automata. On both the AP and FPGA, this information is tracked using the activation bit stored within each STE (see Section 2.2), and we refer to this collection of data as the state vector. The state vector provides a complete and

¹ False negatives (missing reports) remain an open challenge.

compact snapshot of machine execution after processing a given number of input symbols (in NFAs, there is no other notion of "memory" such as a stack or tape).
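Concretely, a state vector can be modeled as one activation bit per state. The helpers below are an illustrative representation only, not the AP's on-chip format.

    def pack_state_vector(active, num_states):
        # One bit per automaton state: bit i is set iff state i is active.
        vec = bytearray((num_states + 7) // 8)
        for s in active:
            vec[s // 8] |= 1 << (s % 8)
        return bytes(vec)

    def unpack_state_vector(vec):
        # Recover the active set from the packed bits.
        return {i for i in range(len(vec) * 8) if (vec[i // 8] >> (i % 8)) & 1}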

5.1.4 Accessing the State Vector

To support our debugging system, a target hardware platform must provide access to the state vector of the executing automata. We describe accessing this vector on both the AP and Xilinx FPGAs; no modifications or additions to the hardware platform are needed to support these techniques.

micron's ap. Off-chip access to the state vector is provided through the context switching cache on the AP [75]. This cache was developed to allow automata executing on the AP to switch between—and process in parallel—several input streams. Additionally, the AP runtime allows the host system to inspect the contents of the context switching cache. We repurpose this hardware to transfer control to the interactive debugging session: when a breakpoint is reached, our debugger captures the state vector from the executing automata and copies the values back to the host system.

xilinx fpga. We consider two approaches to accessing the state vector on Xilinx FPGAs: integrated logic analyzers (ILAs) [109] and virtual IOs (VIOs) [229]. Both of these Xilinx IP (Intellectual Property) blocks are used for runtime debugging of the FPGA and come with different design trade-offs [230].

The ILA is a signal-probing core that can be used to monitor a hardware design's internal signals by attaching logical probes to these signals. It supports advanced, dynamically configurable triggering conditions that specify when the ILA captures data. This functionality allows the developer to trigger data capture on complex hardware events represented by a combination of signals. ILAs use block RAM to probe the internal design signals at the clock speed of the design under test but have a fairly high hardware utilization cost. For our application, ILAs allow us to dynamically specify breakpoint triggering conditions while having a negligible impact on the data throughput of automata being debugged. VIOs are similar to the ILAs, allowing logical probes to sample data within a target design but without the advanced triggering functionality. Consequently, VIOs are more compact than ILAs while still providing the needed access to data in automata state vectors. Because they are instantiated within the design and are synchronous with the design, VIOs can result in reduced design clock speeds. While ILAs provide a richer set of features with little impact on clock frequency, we found that the space requirements needed to interface with automata processing designs frequently exceeded the capacity of FPGAs for indicative applications. In particular, ILAs for our debugging system require more BRAM resources than our server-class FPGA made available. Therefore, we choose to implement our debugging system using VIOs, which require fewer hardware resources but may reduce clock frequencies. Our empirical evaluation (see Section 5.2) demonstrates that these reductions are less than 20% for most automata applications.

We extend Xie et al.'s REAPR (see Section 2.2.3) to automatically generate VIOs or ILAs attached to the activation bits of STEs for a given automaton. Applications built with automata often consist of tens of thousands of states (see Table 5.1), but the current VIO implementation provided by Xilinx only supports 256 individual probes, and ILAs are limited to 1024. To address this dichotomy in scale, we increase the width of each VIO probe, treating a set of N STEs as a single, multi-bit value. Once the state vector data is transferred to the host system, we disambiguate the individual STEs. For a probe width of 256 (the maximum supported width), our technique is able to monitor a total of 256 × 256 = 65,536 STEs with a single VIO; multiple VIOs may be used for larger designs. We greedily assign STEs to VIO probes in the order STEs are encountered in an input automaton. A more sophisticated graph analysis (e.g., calculating connectivity of states) could result in probe assignments that reduce final placement and routing overheads. We leave exploration of such optimizations to future work.

other processors. Other processors may be used in place of the AP in our debugging system as long as the state vector abstraction is exposed. For example, inspection of the state vector for some CPU-based automata processors (e.g., VASim [235]) requires iterating through all states in the automaton to capture the active set. Other custom accelerators for automata processing, such as the Cache Automaton [209], also provide direct support for accessing the state vector.
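The probe packing described above for FPGAs has a compact software model; the helper names below are illustrative, not part of our modified REAPR.

    PROBE_WIDTH = 256  # maximum supported VIO probe width

    def assign_probes(ste_ids):
        # Greedily assign STEs to (probe index, bit position) pairs in
        # the order they are encountered in the input automaton.
        return {ste: divmod(i, PROBE_WIDTH) for i, ste in enumerate(ste_ids)}

    def active_set(probe_values, assignment):
        # Disambiguate individual STEs from the wide probe values read
        # back on the host: one bit per STE activation.
        return {ste for ste, (probe, bit) in assignment.items()
                if (probe_values[probe] >> bit) & 1}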


Figure 5.1: An example debugging scenario. While executing the RAPID program, abnormal behavior is observed deep into processing data. The user sets an input breakpoint, and the debugging system sets an input breakpoint N symbols prior for low-latency time-travel support. Data is processed on the hardware accelerator until the input breakpoint is reached, the state vector is exported, and the final N symbols are processed using a software automata simulator. The resulting state vector is then lifted to the semantics of the user-level RAPID program and control is transferred to the interactive debugging session.

5.1.5 Hardware Support for Breakpoints

A typical use case for our debugging system begins with developers observing abnormal behavior during the execution of a RAPID program. They then set a breakpoint that triggers near the abnormal behavior and re-execute the program. When the breakpoint is reached, runtime state is transferred to the host system, lifted to the semantics of the source-level RAPID program, and control is transferred to an interactive debugger. An overview of this process is given in Figure 5.1. In this subsection, we describe the steps needed to trigger a breakpoint

on an automata processing engine. We first consider input breakpoints, and then we describe how line breakpoints may be transformed into input breakpoints. Input breakpoints are implemented through partitioning of the input data stream. We split the data such that the input stops at the offset of the desired breakpoint and process this using the AP. When processing completes, we export the state vector of the executing automata to the host system. Line breakpoints in source-level RAPID programs cannot be directly implemented in the underlying AP or VIO-based FPGA hardware platforms. The automata processing paradigm only generates reports; there is no notion of a program counter or printf-like behavior that we can leverage. We thus use reports to map line breakpoints to input breakpoints by recording the offsets at which the NFA states associated with a RAPID statement or expression (determined during compilation) are active while processing the input data. This is achieved by compiling two distinct sets of automata from an input RAPID program. One set of automata (machine A) performs computation as normal. The second set (machine B) reports whenever selected lines of code execute. We modify the RAPID compiler to emit machine B. Given a set of line numbers, the modified compiler removes all previous reporting states and instead configures STEs associated with the given lines to report. By processing data with machine B, we identify the input stream offsets at which breakpoints are triggered. Processing the input data a second time with machine A allows our system to capture relevant hardware state and trigger input breakpoints at offsets discovered with machine

B. Updating or selecting new line breakpoints requires regenerating machine B. This transformation is illustrated in Figure 5.2.

Figure 5.2: Transformation of a line breakpoint to an input breakpoint. Reports generated by STEs mapped to RAPID expressions determine input breakpoints.

While the double compilation and execution steps do incur a minimum of a 2× overhead² for line breakpoints over execution containing no line breakpoints, we note that current hardware supports this approach. A more efficient approach would be to support hardware-based debugging signals. On a straightforward modification of the AP, these could be implemented similarly to reporting events, serving a similar role as a hardware break- or watch-point in a general-purpose CPU [190]. Breakpoint signals are supported on FPGA-based automata processing engines using ILAs to capture the state vector; however, space overheads are currently too significant for use with most real-world applications.

² Naively, processing the input stream twice approximately doubles the execution time. However, this does not consider the additional time needed to compile a second automaton, reconfigure the AP or FPGA, or process reporting events.

136 5.1.6 Debugging of RAPID Programs

After capturing the state vector, our system lifts the underlying state to the semantics of the input RAPID program. Our approach is similar to traditional CPU debugging, in which processor state is mapped to expressions in the input program using lookup tables generated at compile time [190].

We augment the RAPID compiler to produce a debugging automaton, (Q, Σ, δ, S, F, id, d). The additional term, d, is a mapping from NFA states to RAPID source locations and known program variable state. RAPID employs a staged computation model (Section 4.3); the values of some variables are resolved at compile time and are known at the time of NFA state generation. These are stored in the mapping. Compilation for the AP transforms an input automaton to a configuration for the processor’s memory array and routing matrix (see Section 2.2), and compilation for the FPGA maps an automaton to LUTs and FFs. These compilation processes may result in multiple states being mapped to a single hardware location (state merging) or a single state being mapped to multiple hardware locations (state duplication) as a result of optimizations to better utilize available hardware resources (cf. debugging with optimizations [101]). The compiler also produces a mapping, loc, from hardware locations to automaton state IDs. This debugging technique can be directly extended to any underlying automata processing engine that can provide this location mapping.

When an STE-level breakpoint is triggered, we determine the corresponding location(s) in the original RAPID program by calculating

    ⋃_{q ∈ Q_active} d(id(loc(q)))

where Q_active is the set of active states extracted from the state vector. Due to the inherent parallelism in RAPID programs, the locus of control may be on several statements in the program simultaneously. Our technique for lifting the underlying program state of the automata processing core to the semantics of the RAPID program therefore returns a minimal set of the currently executing RAPID statements.
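This lifting step has a direct software analogue. The sketch below uses plain dictionaries (loc, id_map, and d) as hypothetical stand-ins for the compiler-generated mappings, with d mapping each automaton state to a set of RAPID source locations.

    def lift(active_hw_locations, loc, id_map, d):
        # Compute the union of d(id(loc(q))) over all active hardware
        # locations q, yielding the current RAPID source locations.
        sources = set()
        for q in active_hw_locations:
            sources |= set(d[id_map[loc[q]]])
        return sources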

5.1.7 Time-Travel Debugging

Many debuggers provide the ability to step backward in a program, a functionality often referred to as time-travel debugging [126]. This feature is beneficial for automata-based applications to find the start of a spuriously matched sequence. To step backward in the source-level RAPID program or data stream, our debugger would have to reprocess the input data, leading to high latency when breakpoints are set deep in the data stream. We now describe a modification to our system that significantly reduces this overhead. When triggering input breakpoints, our debugging system splits the input stream N bytes (symbols) before the user-specified location (rather than splitting

the data at the specified input offset). Once the input has been processed, we export the current state vector as before and have access to the state vector N bytes before the user's breakpoint. We then load the automata into a modified version of VASim [235], a CPU-based automata execution engine. We have modified VASim to record and output state vectors similar to those produced by the AP and FPGA.³ We then execute the final N bytes before the breakpoint using VASim and save the state vector. For the N bytes before the breakpoint, our system has low-latency access to the execution state that is lifted to the semantics of the source-level RAPID program. This allows a developer to step forward and backward near a breakpoint with minimal processing delay. In our initial implementation, we choose to stop processing on the accelerator 50 bytes (symbols) before the actual breakpoint. We find that this provides suitable time travel without incurring significant slow-downs; however, a complete sensitivity analysis is beyond the scope of this work.
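The split between accelerated and simulated execution can be sketched as follows. The accelerator and simulator objects are hypothetical interfaces standing in for the AP/FPGA runtime and our modified VASim, and recording one state vector per simulated symbol is one plausible way to support backward steps.

    REWIND_WINDOW = 50  # symbols replayed in software before the breakpoint

    def prepare_time_travel(accelerator, simulator, data, bp_offset):
        # Fast path: process everything up to (breakpoint - N) on the
        # accelerator and export its state vector.
        split = max(0, bp_offset - REWIND_WINDOW)
        vector = accelerator.run(data[:split])
        # Slow path: replay the final N symbols in the software simulator,
        # saving a state vector per symbol so the debugger can step
        # backward without reprocessing the whole stream.
        simulator.load_state(vector)
        history = [vector]
        for symbol in data[split:bp_offset]:
            history.append(simulator.step(symbol))
        return history  # history[i] is the state after split + i symbols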

5.2 fpga evaluation

In this section, we present the results of an empirical evaluation of our FPGA- based debugging system. Our evaluation focuses on the overheads of debugging support. We repurpose existing hardware on the AP for debugging, and therefore do not introduce additional overhead. Thus, we focus our evaluation on the space

³ Modified version available at https://github.com/kevinaangstadt/VASim/tree/statevec.

and time overheads incurred for the additional FPGA hardware needed in our system. Our goal is to characterize the performance and scalability of our framework. We consider the following research questions:

1. What percent of baseline (standard execution) clock frequency can applica- tions achieve when synthesized with our debugging hardware?

2. Are applications synthesized with debugging hardware able to fit within the resource constraints of server-class FPGAs? How many passes over the data are needed when an application cannot fit?

5.2.1 Experimental Methodology

We evaluate our prototype automata debugging system on a server-grade Xilinx FPGA using the ANMLZoo automata benchmark suite, which consists of fourteen real-world-scale finite automata applications and associated input data [231]. The benchmarks are varied, including both regular expression-based and hand-crafted automata. We present a summary of the applications in Table 5.1, including the number of states in each benchmark as well as the average degree (number of incoming and outgoing transitions) for each state. The higher the degree, the more challenging the benchmark is to map efficiently to the FPGA’s underlying routing network. For each benchmark, we generate an FPGA configuration using our modified version of REAPR [252], producing Verilog including both VIOs (for capturing

Table 5.1: ANMLZoo benchmark overview

    benchmark                 family      states    avg. node degree
    Brill                     Regex       42,658    1.03287
    ClamAV                    Regex       49,538    1.00396
    Dotstar                   Regex       96,438    0.97396
    PowerEN                   Regex       40,513    0.97601
    Protomata                 Regex       42,009    0.99110
    Snort                     Regex       69,029    1.08831
    Hamming                   Mesh        11,346    1.69672
    Levenshtein               Mesh        2,784     3.26724
    Entity Resolution (ER)    Widget      95,136    2.28372
    Fermi                     Widget      40,783    1.41176
    Random Forest (RF)        Widget      33,220    1.00000
    SPM                       Widget      100,500   1.70000
    BlockRings                Synthetic   44,352    1.00000
    CoreRings                 Synthetic   48,002    1.00000

BlockRings Synthetic 44,352 1.00000 CoreRings Synthetic 48,002 1.00000 state) and also Wadden et al.’s reporting architecture [233] for efficient transfer of reports to the host system. We also use REAPR to generate a baseline configuration that does not include the VIOs. We synthesize and place-and-route each application for an Alphadata board rev 1.0 with a Xilinx Kintex-Ultrascale xcku060-ffva1156-2-e FPGA using Vivado 2017.2 on an Ubuntu 14.04.5 LTS Linux server with a 3.70GHz 4-core Intel Core i7-4820K CPU and 32GB of RAM. As of 2019, this configuration represents a high-end FPGA on a mid-range server. For both the baseline and our version supporting

debugging, we measure the hardware resources required, the maximum clock frequency and the total power utilized. We present these results next.

5.2.2 FPGA Results

Performance results for FPGA-based debugging are presented in Table 5.2. We were able to successfully place and route thirteen of the fourteen benchmarks; the Xilinx toolchain fails with a segmentation fault for one of the synthetic benchmarks. We limit our discussion to these thirteen benchmarks. Entity Resolution, Snort, and SPM require two VIOs due to the number of states in the automata. Nonetheless, all but Entity Resolution and SPM (our two largest benchmarks) fit within the hardware constraints when synthesized with debugging hardware. We support these two benchmarks by partitioning the automata. Most applications in ANMLZoo, including these two, are collections of many small automata or rules. By splitting the applications into two pieces, we still support debugging on an FPGA, but throughput is halved if run serially on a single FPGA. The numbers presented in Table 5.2 include this overhead. Our additional debugging hardware has average LUT and FF overheads of 2.82× and 6.09×, respectively. The overheads vary significantly between applications, and we suspect that this is due to aggressive optimization during synthesis. The area overhead of state capture is unknown in the AP (area details for structures are not published), but since it is provided for context switching, using it for debugging incurs no extra hardware cost.

Table 5.2: FPGA-based debugging system performance results

                   without debugging                    with debugging                       num.   lut    ff     percent       power
    benchmark      LUTs     FFs      Clock    Power     LUTs      FFs       Clock    Power   vios   ovh.   ovh.   orig. freq.   ovh.
                                     (MHz)    (W)                           (MHz)    (W)
    Brill          27,621   27,782   166.67   0.817     89,605    169,323   166.67   1.973   1      3.24   6.09   100.00%       2.41
    ClamAV         42,178   42,067   204.08   0.923     95,891    199,336   121.95   1.257   1      2.27   4.74   59.75%        1.36
    Dotstar        49,774   46,965   169.49   0.938     172,350   372,074   142.86   2.622   1      3.46   7.92   84.29%        2.80
    PowerEN        35,359   31,530   163.93   0.832     77,900    161,156   149.25   1.302   1      2.20   5.11   91.05%        1.56
    Protomata      49,791   36,285   126.58   0.838     85,604    167,646   108.70   1.206   1      2.10   4.62   85.87%        1.44
    Snort          43,061   28,047   98.04    0.783     128,684   266,600   91.74    1.478   2      2.99   9.51   93.58%        1.89
    Hamming        5,602    6,637    312.50   0.701     25,170    46,080    312.50   1.065   1      4.49   6.94   100.00%       1.52
    Levenshtein    2,538    2,242    434.78   0.666     4,218     11,263    400.00   0.737   1      1.66   5.02   92.00%        1.11
    ER*            50,349   47,102   212.77   1.066     213,461   381,258   56.82    1.447   2      4.24   8.09   26.70%        1.36
    Fermi          36,314   32,261   116.28   0.991     86,879    167,089   99.01    1.537   1      2.39   5.18   85.15%        1.55
    RF             31,060   25,769   200.00   0.990     66,686    130,007   192.31   1.611   1      2.15   5.05   96.15%        1.63
    SPM*           64,615   59,106   126.58   1.017     225,315   406,241   60.24    2.605   2      3.49   6.87   47.59%        2.56
    BlockRings     44,446   44,185   256.41   1.215     90,333    178,905   256.41   2.119   1      2.03   4.05   100.00%       1.74
    CoreRings†     –        –        –        –         –         –         –        –       –      –      –      –             –
    average                                                                                         2.82   6.09   81.70%        1.76

    * Benchmark must be partitioned to fit within FPGA resource limits with added debugging. The clock frequency reflects this partitioning.
    † The current commercial Xilinx toolchain terminates with a segmentation fault during synthesis.

For FPGAs, the area overhead of our approach is 2–3× for LUTs (except for Hamming) and 5–10× for FFs. This area overhead is high. For complex programs, the compiled automata may need to be partitioned, which is straightforward and supported by our infrastructure. However, partitioning requires either running multiple passes over the input (end-to-end latency increases as passes are added) or using multiple FPGAs (increasing hardware costs, but as of August 2018, cloud computing providers offer instances with up to eight FPGAs⁴ for $13.20 an hour⁵). We believe this is a small price to pay for debugging support: any extra costs (e.g., FPGA overheads) are small compared to the value of a programmer's time, and the presence and quality of debugging support can increase accuracy (see Section 5.3) and reduce maintenance time (e.g., [171, Sec. 5.1]). Lowering the area cost, either via more selective state monitoring or more optimized synthesis, remains future work.

Adding VIOs to a design can reduce operating clock frequencies (see Section 5.1.4) and increase power usage. For our benchmarks, the average power overhead is 1.76×, and we are able to achieve an average of 81.70% of the baseline clock frequencies. Even with the partitioned automata, the throughput of our prototype remains at least an order of magnitude greater than the throughput reported by Wadden et al. for a CPU-based automata processing engine [231]. Therefore, we expect our FPGA-accelerated system to provide better performance than a CPU-only approach.

⁴ https://aws.amazon.com/ec2/instance-types/f1/
⁵ https://aws.amazon.com/ec2/pricing/on-demand/

144 Despite high resource overheads, our debugging system achieves an average of 81.70% of the baseline clock frequencies for all benchmarks. Our system remains an order of magnitude faster than a CPU-based automata processing engine.

5.3 human study evaluation

In this section, we evaluate our debugging system using a human study: we present participants with code snippets and ask them to localize seeded defects. We measure their accuracy and the time taken to answer questions. This section characterizes our study protocol and participant selection and presents a statistical analysis of our results.

5.3.1 Experimental Methodology

Our IRB-approved human study⁶ was formulated as an online survey that presented participants with a sequence of fault localization tasks. Participants were provided with a written tutorial on the RAPID programming language and sample programs. These resources were made available to the participants for the duration of the survey. We presented each participant with ten randomly selected and ordered fault localization tasks from a pool of twenty. For each task, participants were asked to identify faulty lines in the code and justify their answer. We

⁶ University of Virginia IRB for Social and Behavioral Sciences #2016-0358-00.

recorded the participants' responses and the total time taken for each question. Participants were given the opportunity to receive extra credit (for students) and enter a raffle for a $50 gift certificate. Each fault localization task consisted of a description of the program and fault, the code for the program with a seeded defect, and an input data stream. On half of the tasks, our debugging information was also displayed. The description of the program detailed the purpose of the presented code and also provided the expected output. Code for each task ranged from 15–30 lines and was based on real-world use cases [231]. Similar to GPGPU programs, RAPID programs accelerate a kernel computation within a larger program. While our selected programs are relatively small in terms of line count, they are both complete and indicative: we adapted automata processing kernels to RAPID programs from various published applications, such as Brill tagging [267], frequent subset mining [239], and string alignment for DNA/Protein sequencing [40, 186]. We seeded a variety of defects into the code for our fault localization tasks, based on RAPID developer mistakes discovered by our initial study of RAPID in Chapter 4. When provided, the debugging information included buttons to step forward and backward in the data stream. For a given offset in the input stream, our tool highlights lines of code corresponding to the current locus of control. We also provided variable state information for each of the loci. Figure 5.3 provides an example fault localization task presented to survey participants.

Participants were all voluntary and predominantly from the University of Virginia. We advertised in Data Structures, Theory of Computation, and Programming Language undergraduate CS courses, in a graduate software engineering seminar, and to members of the D480 AP professional development team. Participants are enumerated in Table 5.3.

Figure 5.3: A question from the human study including generated debugging information. Task text and program state information are elided for space.

5.3.2 Statistical Analysis

Next, we present statistical analyses of the responses to our human study with an eye toward understanding the ease of use of our debugger. We address the following research questions:

1. Does our technique improve fault localization accuracy?

2. Is there an interaction between programming experience and ability to interpret RAPID debugging information?

Table 5.3: Participant subsets and average accuracies. The study involved n = 61 participants. Average completion times are for individual fault localization tasks.

    subset                                 average time (min.)   average accuracy   participants
    All                                    8.17                  50.3%              61
    Intermediate Undergraduate Students    7.3                   49.2%              37
    Advanced Undergraduate Students        10.14                 50.0%              21
    Grad Students and Prof. Developers     5.07                  66.7%              3

In total, 61 users participated in our survey, each completing ten fault localization tasks, resulting in over 600 individual data points. Table 5.3 provides average accuracy rates and task completion times for subpopulations in our study.

Does our debugging information improve fault localization in RAPID programs?

To measure the effect of debugging information on programmer performance, we used the following metrics: accuracy and time taken. We defined accuracy as the number of correctly identified faults. We manually assessed correctness after the completion of the survey, taking into account both the marked fault location and justification text provided. Using Wilcoxon signed-rank tests, we did not observe a statistically significant difference in time taken to localize faults (p = 0.55); however, we determined that there is a statistically significant increase in accuracy when participants were given debugging information (p = 0.013).

Mean accuracy increased from 45.1% to 55.1%, meaning participants were 22% more accurate when using our tool. Fault localization improvements can be difficult to evaluate: researchers must be careful to avoid simply reporting the fraction of lines implicated [171, Sec. 6.2.1] rather than the actual impact on developers. Independent of time, accuracy is important because even in mature, commercial projects, 15–25% of bug fixes are incorrect and impact end users [260]. The improvement in accuracy provided by our information is modest but significant and is orthogonal to other approaches.
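For readers unfamiliar with the procedure, the snippet below shows the shape of a paired Wilcoxon signed-rank analysis using SciPy; the accuracy values are fabricated for illustration and do not reproduce our study data.

    from scipy.stats import wilcoxon

    # Hypothetical per-participant accuracy (fraction of tasks correctly
    # localized) on tasks with and without debugging information.
    acc_with = [0.6, 0.5, 0.8, 0.4, 0.7, 0.9]
    acc_without = [0.4, 0.45, 0.55, 0.3, 0.42, 0.5]

    # Paired, nonparametric test; appropriate when per-participant
    # differences cannot be assumed to be normally distributed.
    statistic, p_value = wilcoxon(acc_with, acc_without)
    print(statistic, p_value)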

Our debugging tool improves a user’s fault localization accuracy for RAPID programs in a statistically significant manner (p = 0.013).

Is there an interaction between programmer experience and our tool?

Previous studies (cf. Parnin and Orso [171]) have found that the effectiveness of debugging tools can vary with programmer experience. We examined our data for similar trends. Following an established practice from previous software engineering human studies (e.g., Fry and Weimer [86]), we partitioned our data between experienced (students in final-year undergraduate electives or above) and inexperienced (students not yet in final-year undergraduate classes) programmers. Such a partitioning likens final-year undergraduates to entry-level developers. To measure the interaction between programmer experience and our debugging tool, we used Aligned Rank Transform (ART) analyses. This technique allows us to perform factorial nonparametric analyses with repeated measures (such

149 as the interaction between experience and debugging information in our study) using only ANOVA procedures after transformation [249]. We found that there was no statistically significant interaction between experience and our debugging tool with respect to either accuracy (p = 0.92) or time (p = 0.38). This suggests that novices and experts alike benefit from our tool. Due to the limited number of professional developers in our initial study, we leave investigation of further partitions for future work.

There is no statistically significant interaction between experience and the ability to interpret our debugging information: both novices and relative experts benefit.

5.3.3 Threats to Validity

Our results may not generalize to industrial practices. In particular, our selection of benchmarks may not be indicative of applications written by developers in industry. We attempt to mitigate this threat by selecting a diverse set of applications from common automata processing tasks [231]. One threat to construct validity relates to our analysis of expertise. A different partitioning of participants into inexperienced and experienced programmers (i.e., a different definition of expertise) could yield different results; however, testing multiple partitions requires adjustment for multiple analyses. Additionally, our study recruited predominantly undergraduate students. A more balanced

participant pool may also provide additional insight into the interaction between expertise and debugging information in automata processing applications. We leave a larger-scale study including more professional developers for future work.

5.4 chapter summary

Debuggers aid developers in quickly localizing and analyzing defects in source code. We present a technique for extending interactive debugging, including breakpoints and variable inspection, to the domain of automata processing. We describe the mappings needed to bridge the gap between the state of the executing finite automata and the semantics of a high-level programming language. We focus on the RAPID DSL, but our approach to exposing state from low-level accelerators lays the groundwork for more general support. Our system provides high-throughput data processing before transferring control to a debugger at breakpoints by executing automata on either Micron’s D480 AP or a server-class FPGA. Only one bit of information per automata state at a given breakpoint must be copied to the host to support an interactive debugger. For FPGAs, we automatically generate custom logic, leveraging virtual IO ports, and capture state information from the executing automata. On the AP, we leverage built-in context switching hardware. We achieve an average of 81.70% of the original clock frequency across 13 benchmarks while supporting interactive debugging. Despite high resource overheads, our system provides a valuable tool for debugging at a level of abstraction higher

than hardware signals. Reducing these overheads with, for example, static or dynamic analyses and innovative hardware remains an open challenge for future work. To analyze the utility of our debugging system, we conducted a human study of 61 programmers tasked with localizing faults in RAPID programs. We observed a statistically significant 22% increase (p = 0.013) in accuracy from our tool’s debugging information and found that our tool helps both novices and experts alike. In summary, our debugging framework provides the performance and scalability afforded by hardware accelerators while improving ease of use by aiding developers in locating the source of bugs in programs written for these accelerators. This concludes our development of front-end programming tools. In the next chapter, we consider architectural back-ends to support additional high-level applications.

chapter 6

Architectural Support for Automata-Based Computation

We now shift away from the development of software tools and instead focus on building out additional architectural support for automata-based computation. Current architectures have been demonstrated to be suitable for a plethora of application domains [183, 184, 186, 220, 221, 232, 238, 240, 267]; however, these architectures do not support all applications. In this chapter, we consider two case studies—detection of security attacks and parsing of data—to develop new system integrations of automata architectures and to expand the expressive power of this hardware. As described in Section 2.5, two trends point to the need for robust, low-overhead detection of novel attacks: (1) the advent of attacks that exploit architectural vulnerabilities, such as Spectre [130] or Meltdown [142], and (2) the widespread use of embedded systems intended to run a set of authorized programs but vulnerable to the injection of unauthorized code [63, 65]. These problems, especially architectural vulnerabilities, are not easily and efficiently mitigated with software patches. Thus, there is a need for solutions that can be deployed with minimal modification to existing hardware, that impose

minimal overhead on running software (cf. disabling hardware features to defeat attacks [150, 237]), and that generalize to detect novel attacks. In addition to considering security applications that are integral to most computing platforms, we also address processing of tree-structured or recursively nested data, which is intrinsic to many computational applications. Data serialization formats such as XML and JSON are inherently nested (with opening and closing tags or braces, respectively), and structures in programming languages, such as arithmetic expressions, form trees of operations. Further, the grammatical structure of English text is tree-like in nature [56]. Studies on data processing and analytics in industry demonstrate both increased rates of data collection and also increased demand for real-time analyses [62, 67, 211]. Therefore, scalable and high-performance techniques for parsing and processing data are needed to keep up with industrial demand. Unfortunately, parsing is an extremely challenging task to accelerate and falls within the “thirteenth dwarf” in the Berkeley parallel computation taxonomy, which characterizes important classes of computation [20]. Software parsing solutions often exhibit irregular data access patterns and branch mispredictions, resulting in poor performance (see Section 2.5.1). Custom accelerators exist for particular parsing applications (e.g., for parsing XML [68]), but do not generalize to multiple important problems. To tackle these two challenges, we develop two new automata-based architectures. We choose to implement both of these architectures in the Last Level Cache (LLC) of a CPU, which has two primary advantages. First, we are able to reuse and repurpose existing hardware elements in the CPU, and second, embedding

these architectures in the CPU enables low-latency, tightly coupled execution with other CPU-based processing. We briefly describe each architecture in turn. First, we present Martini,¹ a low-overhead, hardware-assisted anomaly-based intrusion detection framework that detects anomalous and malicious program execution at the memory access level, including cache side-channel attacks (Section 6.1). Martini extends earlier behavior-based IDSs that typically operate at the software level by focusing on memory access patterns (cf. system calls), representing them in a way that is implementable in hardware with negligible run-time overhead. In Martini, authorized behavior is modeled with dictionaries that represent an n-gram, or sliding window, of short sequences of memory accesses, where each memory access is compressed into eight bits of information. Because Martini uses n-grams rather than complex pattern matching, once the dictionary is trained on indicative, authorized behavior, subsequent queries can be formulated in terms of finite automata inputs. Thus, Martini can be deployed in hardware with low overhead and latency by leveraging near-memory processing and in-cache computation (Section 6.3). We develop a new functional unit with a custom data path that can be deployed in the processor core or Last Level Cache of modern CPUs, which admits real-time monitoring of memory accesses. Next, we present ASPEN,² for efficient parsing of data (Section 6.4). Our key insight is that many parsing applications can be modeled using deterministic pushdown automata (DPDA) as defined in Section 2.1.2. ASPEN implements a DPDA processing engine in LLC, and our design is based on the insight that

¹ Martini = Memory Address Representation To INfer Intrusions.
² ASPEN = Accelerated in-SRAM Pushdown ENgine.

much of the DPDA processing can be architected as LLC SRAM array lookups without involving the CPU. By performing DPDA computation in-cache, ASPEN avoids conventional CPU overheads such as random memory accesses and branch mispredictions. Execution of a DPDA with ASPEN is divided into five stages: (1) input symbol match, (2) stack symbol match, (3) state transition, (4) stack action lookup, and (5) stack update, with each stage making use of SRAM arrays to encode matching and transition operations. To support direct adaptation of a large class of legacy parsing applications, we implement a compiler for converting existing grammars for common parser generators to DPDAs executable by ASPEN (Section 6.2). We develop two key optimizations for improving the runtime of parsers on ASPEN, which work together to reduce stalls in input symbol processing (Section 6.2.2.3). Our evaluations of Martini and ASPEN measure the expressive power, scalability, and performance of each design. We evaluate Martini with respect to two benchmark suites and four recent exploits, finding an overall false positive rate of 4.4% with a true positive rate of 100% (area-under-curve = 0.9954). In total, we consider more than 2,400 program traces and more than 13 billion individual memory accesses. We evaluate the expressive power of ASPEN by compiling parsers for four different languages to demonstrate that our architecture supports common data formats and that the resulting pushdown automata fit within the hardware resources of the architecture. We evaluate the performance of ASPEN on a benchmark suite of 23 XML files, observing that our approach is 14× faster than a state-of-the-art software-based XML parser.

In summary, this chapter presents the following scientific contributions:

• Martini, an approach for detecting unauthorized program behavior, including architectural side-channel attacks, using dictionaries of n-grams of memory accesses. We develop a system integration of an automata processing architecture to provide per-cycle monitoring of memory accesses.

• ASPEN, a scalable execution engine which re-purposes LLC slices for DPDA acceleration. We design a custom data path for DPDA processing using SRAM array lookups. ASPEN implements state matching, state transitions, and stack updates; includes efficient multipop support; and can parse one token per cycle.

• An optimizing compiler for transforming existing language grammars into DPDAs. Our compiler optimizations reduce the number of stalled cycles during execution. We demonstrate this compilation on four different languages: Cool (object-oriented programming), DOT (graph visualization), JSON, and XML.

• An empirical evaluation of ASPEN on a tightly coupled XML tokenizer and parser pipeline. Our results demonstrate an average of 704.5 ns per KB parsing XML compared to 9983 ns per KB in a state-of-the-art XML parser across 23 XML benchmarks.

• An empirical evaluation of Martini’s classification accuracy on over 2,400 program traces from two large benchmark suites and four exploits, including

Spectre and Meltdown proofs-of-concept. We find that Martini is able to classify intrusive activity with high accuracy and precision (AUC 0.9954) while requiring a very small chip area. Moreover, deploying Martini in hardware would enable classification without runtime overhead.

We organize the remainder of this chapter as follows. In Section 6.1, we describe our approach for detecting malicious program behavior by monitoring memory accesses. Then, we describe the process of transforming parsing applications into pushdown automata computation in Section 6.2. Next, we describe the architectural designs of Martini and ASPEN in Section 6.3 and Section 6.4, respectively. We then describe our unified experimental methodology in Section 6.5 and perform architectural evaluations in Section 6.6. Following the architectural evaluation, we present application-specific evaluations of Martini in Section 6.7 and of ASPEN in Section 6.8.

6.1 detecting attacks with memory accesses

In this section, we present an application-level design and implementation of the Martini framework; the micro-architectural design appears in Section 6.3.

6.1.1 The Memory Access Pattern Abstraction

Many abstractions have been proposed to compactly characterize program behavior, including the cadence of cache misses [254], program counters [192], taint tracking in I/O inputs [210], and hardware performance counters [69, 197]. Despite demonstrating their ability to accurately identify anomalous behavior, these models often require intrusive instrumentation that degrades system performance. We aim for an abstraction of memory access patterns that is suitable for low-overhead, high-accuracy real-time monitoring. Programs are represented as data stored in memory, and program execution proceeds by reading, modifying, and storing data in memory. Program behavior therefore partially manifests as a sequence of memory accesses produced during execution. These sequences are inherent to the underlying execution path and structure of program code. That is, alterations to the way in which a program processes information are revealed by its memory access patterns. For example, a calculator program that parses and interprets expression strings will generate distinct memory traces when multiplying vs. adding operands due to variations in the execution’s control flow. By contrast, changing the operands (i.e., the numerical data) will typically not result in a change in the trace of memory addresses. The Martini design leverages this insight to characterize program execution as either benign or malicious (i.e., either authorized or unauthorized by the system operator) in a way that, ideally, generalizes to subsequent inputs.

    Sliding Window →

    address stream:  0x17  0xB2  0xC3  0xC4  0xDF  0xC8  ...
    Window 1:       [0x17  0xB2  0xC3  0xC4]
    Window 2:             [0xB2  0xC3  0xC4  0xDF]
    Window 3:                   [0xC3  0xC4  0xDF  0xC8]

Figure 6.1: Example of a four-address, fixed-width window. Here, an executing program accesses addresses 0x17, 0xB2, 0xC3, etc. Each window (n-gram) thus represents a local snapshot of accessed memory locations as the window slides across all memory accesses.

Instead of considering a program’s unique sequence of memory accesses as a whole, we present a stream-based approach that can scale to arbitrary-size programs, observing a fixed-width window of the n most recently accessed memory addresses. This window acts as a shift register, allowing Martini to observe a sliding window of memory addresses as program execution proceeds. This provides a localized, contextual view of a program’s recent memory behavior that can be monitored during the execution of the program. Figure 6.1 provides an example of the construction of windows of width four from a stream of memory addresses.
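To make the windowing concrete, the following sketch reproduces the construction of Figure 6.1 in plain Python. It is a reference model only; the function name sliding_windows and the width parameter n are ours, not part of the Martini hardware.

    from collections import deque
    from typing import Iterable, Iterator, Tuple

    def sliding_windows(addresses: Iterable[int], n: int = 4) -> Iterator[Tuple[int, ...]]:
        """Yield each n-address window as the stream advances one address at a time."""
        window = deque(maxlen=n)
        for addr in addresses:
            window.append(addr)
            if len(window) == n:
                yield tuple(window)

    # The stream from Figure 6.1 yields (0x17, 0xB2, 0xC3, 0xC4),
    # (0xB2, 0xC3, 0xC4, 0xDF), and (0xC3, 0xC4, 0xDF, 0xC8).
    print(list(sliding_windows([0x17, 0xB2, 0xC3, 0xC4, 0xDF, 0xC8])))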

While individual n-grams are likely shared across the execution of different programs, we hypothesize that the set of all windows for a given program provides a unique signature that is difficult to spoof. This is the intuition behind our approach—programs are characterized by the pattern of memory addresses accessed during execution. As an example, consider three-address windows: Figure 6.2 plots windows for the Linux utilities cal and dmesg in three-dimensional space, where each dimension represents one address in the window. Each point represents a unique sequence of three memory addresses recorded during the execution of the program. The two plots are structurally dissimilar (e.g., the dense behavior on the “center-left,” a0–a2 region, for cal), which suggests that simply comparing fixed-width memory access window sets will differentiate the execution of individual programs. We rigorously evaluate this hypothesis in Section 6.7.

[Figure 6.2: two three-dimensional scatter plots of recorded windows, one for cal and one for dmesg.]

Figure 6.2: Visualization of n-gram representation for two programs. Sets of windows of size three are shown for memory accesses of cal and dmesg. Each point (a0, a1, a2) in three-dimensional space represents a unique window recorded during the execution of the program, where a_n represents the n-th address in the window. The plots are structurally different between the two programs, indicating significant differences in their behavior. We consider such windows in 8 dimensions.

6.1.2 Dictionaries of Program Behavior

Next, we extend the notion of memory access windows to (1) allow a system operator to define a collection of authorized programs and (2) compactly represent sets of valid windows. Statically determining the exact execution path of a program is undecidable. Instead, we sample many indicative memory traces from each authorized program. We next construct a dictionary with all of the windows generated by this training set. The dictionary is an abstract model of authorized program behavior. New programs and traces can be added to a dictionary without retraining from scratch. Similar to other IDS approaches, the quality of the model is determined by the extent to which the training set generalizes to all normal behaviors. However, previous experience has shown that most programs have highly conserved execution patterns under benign inputs. Two related challenges in offline learning of labeled training data include overfitting and model size [54, 100]. We require a solution that avoids overfitting (so that it will generalize to untrained benign program input data for high-assurance whitelisting) and that admits a compact representation (so that it can be efficiently deployed in hardware, such as in a compact, automata-derived functional unit). To address these challenges, we introduce three additional refinements:

[Figure 6.3: a 32-bit address delta is bitwise AND’d with a truncation mask that keeps the sign bit and the least significant bits, producing the truncated delta.]

Figure 6.3: Example of address truncation. Memory address deltas are bitwise AND’d with a truncation mask and packed into 8-bit values. In practice, a sign bit and the seven least significant bits produce accurate results.

6.1.2.1 ∆-Windows

Defensive techniques such as address space layout randomization (ASLR) randomize important memory locations of processes to harden systems against classes of exploits [193]. For Martini, the execution of identical processes could produce significantly different absolute memory traces. We generalize otherwise identical memory traces by storing the distance between consecutively accessed memory addresses rather than absolute locations. While absolute addresses can vary across executions, we hypothesize that these distances, or deltas, likely remain constant and generalize. We refer to windows of address deltas as ∆-windows.

6.1.2.2 Truncation

∆-windows mitigate some of the risk of overfitting due to address randomization, but differences between physical and virtual addresses remain. We mitigate this by truncating the address delta values to b bits, excluding bits in the delta that may be specific to the physical page selected at runtime. This also significantly reduces the address space represented by our model, helping generalize the model and reduce

overfitting. Martini supports general masking of the address deltas. Although a full parameter sweep falls outside the scope of this work, our experimentation showed that storing a sign bit and the seven least significant bits of the address deltas produces good results. An example of address delta truncation is given in Figure 6.3.
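Taken together, ∆-windows and truncation admit a short reference model. The sketch below is a minimal software rendering under one plausible packing (a sign bit plus the seven least significant magnitude bits); the helper names are ours, and the hardware masking may differ in detail.

    def truncate_delta(delta: int, bits: int = 8) -> int:
        """Pack an address delta into `bits` bits: a sign bit plus the
        (bits - 1) least significant bits of the magnitude."""
        sign = 1 if delta < 0 else 0
        magnitude = abs(delta) & ((1 << (bits - 1)) - 1)
        return (sign << (bits - 1)) | magnitude

    def delta_windows(addresses, n=8, bits=8):
        """Convert an absolute address trace into truncated ∆-windows of length n."""
        deltas = [truncate_delta(b - a, bits) for a, b in zip(addresses, addresses[1:])]
        return [tuple(deltas[i:i + n]) for i in range(len(deltas) - n + 1)]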

6.1.2.3 Compression

Simply truncating deltas (e.g., to 8 bits) improves the model, but is still insufficient for our needs. For example, for ∆-windows of length 8 containing 8-bit truncated deltas, there are 2^(8·8) = 2^64 unique values that could be stored in a dictionary. Even when storing fewer than half of these values, we found that a dictionary trained on a subset of Linux Coreutils contained approximately 40 million windows, which is several orders of magnitude larger than what Martini can efficiently support (Section 6.3). To address this scalability challenge, we compress dictionaries using a method similar to earlier work on system calls [84]. In this scheme, the first element of a window is stored exactly, but each subsequent position is represented by the unordered set of all observed values at that offset from that starting element. For example, if the windows ⟨c, a, t⟩, ⟨c, o, w⟩, and ⟨d, o, g⟩ were observed, the compressed dictionary would store ⟨c, {a, o}, {t, w}⟩ and ⟨d, {o}, {g}⟩. Note that this compressed dictionary accepts the original three windows as well as “caw” and “cot”; the compression is not lossless. Additionally, “dog” is stored separately in the compressed dictionary because its first element is distinct. Thus, while the

compression generalizes a dictionary, it also reduces the size, admitting efficient hardware implementation. For windows of length k consisting of b-bit deltas, the number of possible values stored in the compressed dictionary reduces from 2^(k·b) to k · 2^b. Our empirical evaluation in Section 6.7 demonstrates that compressed dictionaries retain sufficient fidelity to detect unauthorized program execution, including difficult-to-observe hardware side-channel attacks.
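A software model of the compressed dictionary makes the lossy generalization explicit. In this sketch (the class name and in-memory layout are ours), the first element is stored exactly and each later offset keeps only an unordered set of observed values, mirroring the ⟨c, {a, o}, {t, w}⟩ example above.

    class CompressedDictionary:
        """Lossy window dictionary: exact first element, per-offset value sets."""

        def __init__(self):
            self.entries = {}  # first symbol -> list of sets, one per later offset

        def add(self, window):
            first, rest = window[0], window[1:]
            sets = self.entries.setdefault(first, [set() for _ in rest])
            for offset, symbol in enumerate(rest):
                sets[offset].add(symbol)

        def __contains__(self, window):
            sets = self.entries.get(window[0])
            return sets is not None and all(
                sym in sets[off] for off, sym in enumerate(window[1:]))

    d = CompressedDictionary()
    for w in [("c", "a", "t"), ("c", "o", "w"), ("d", "o", "g")]:
        d.add(w)
    assert ("c", "a", "w") in d      # lossy: "caw" is accepted
    assert ("d", "a", "g") not in d  # 'a' was never observed after 'd'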

6.1.3 Detecting Anomalous Program Execution

During the training phase, Martini records all of the memory traces associated with runs of authorized programs on indicative workloads. We consider fixed-width sliding windows of addresses from those traces, convert adjacent addresses to ∆-windows, truncate each delta to a smaller number of bits, and finally generate a compressed dictionary to store (an over-approximation of) the set of truncated ∆-windows associated with those programs and runs. Our experimental results show that such sets of abstracted memory addresses characterize program behavior in a way that is sensitive to the classes of anomalies in which we are interested. After training, we determine whether a new sequence of memory accesses matches the model by converting incoming accesses to a truncated ∆-window and querying the dictionary for membership. If observations fall outside the dictionary, Martini flags the sequence as anomalous (and possibly malicious). We refer to

these anomalous sequences as mismatches. A mismatch counter c, initialized to zero, increments by one whenever Martini detects a mismatch. The mismatch counter is multiplied by a decay coefficient d (0 ≤ d < 1) every N windows to retain local context and eventually forgive past mismatches. The mismatch rate r (0 ≤ r < 1) is defined as r = (1 − d)c/N. An alarm triggers when the mismatch rate exceeds a predefined threshold t. Briefly, the mismatch rate reflects the concentration of mismatches at any point in time. This allows the system to tolerate some false positives, while still responding to legitimate deviations. The threshold t controls the sensitivity of the system and allows Martini to be configured to optimize the trade-off between false and true positives in different settings. Proper tuning of thresholds has been demonstrated to mitigate many false positives [202]. We evaluate mismatch rate thresholds in Section 6.7. This work focuses on detecting anomalies. Responses to anomalies could be incorporated in various ways: (1) alarm signals could be used by the OS to terminate the process or (2) the memory system could delay completion of the memory transaction. Termination would require careful implementation to avoid denial of service and livelock. Delay-based approaches are effective in some OS settings because users can often tolerate an occasional, slight delay [202]. For some versions of Spectre and Meltdown, delays would explicitly defeat relevant timing-based calculations. We leave the development of robust response mechanisms for future work.
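The mismatch-rate mechanism can be prototyped directly from these definitions. The sketch below is a software model; the parameter values are illustrative placeholders, not tuned settings from our evaluation.

    class TriggerArbiter:
        """Model of the alarm logic: decay coefficient d, period N, threshold t."""

        def __init__(self, d: float = 0.5, N: int = 1024, t: float = 0.05):
            self.d, self.N, self.t = d, N, t
            self.c = 0.0      # mismatch counter
            self.windows = 0  # windows seen since the last decay

        def observe(self, mismatch: bool) -> bool:
            """Process one window; return True if the alarm fires."""
            if mismatch:
                self.c += 1
            self.windows += 1
            if self.windows == self.N:
                self.c *= self.d   # decay: eventually forgive past mismatches
                self.windows = 0
            r = (1 - self.d) * self.c / self.N  # mismatch rate
            return r > self.t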

6.2 compiling grammars to pushdown automata

In this section, we describe context-free grammars, our algorithms to compile such grammars to pushdown automata, and our prototype implementation.

6.2.1 Context-Free Grammars

While DPDAs provide a functional definition of computation, it can often be helpful to use a higher-level representation that generates the underlying machine. Just as regular expressions can be used to generate finite automata, context-free grammars (CFGs) can be used to generate pushdown automata. We briefly review relevant properties of these grammars (the interested reader is referred to references such as [87, 93, 99, 199] for additional details).

CFGs allow for the definition of recursive, tree-like structures using a collection of substitution rules or productions. A production defines how a symbol in the input may be legally rewritten as another sequence of symbols (i.e., the right-hand side of a production may be substituted for the symbol given in the left-hand side). Symbols that appear on the left-hand side of productions are referred to as non-terminals while symbols that do not are referred to as terminals. The language of a CFG is the set of all strings produced by recursively applying the productions to a starting symbol until only terminal symbols remain. The sequential application

of these productions to an input produces a derivation or parse tree, where all internal nodes are non-terminals and all leaf nodes are terminals. An example CFG for a subset of arithmetic operations is given in Figure 6.4 (a). This particular grammar demonstrates recursive nesting (balanced parentheses), operator precedence (multiplication is more tightly bound than addition), and associativity (multiplication and addition are right-associative in this grammar, as both productions are right-recursive). Figure 6.4 (b) depicts the parse tree given by the grammar for the expression 3 ∗ (4 + 5).

    S    → Exp ⊣
    Exp  → Term + Exp
         | Term
    Term → int * Term
         | ( Exp )
         | int

(a)

[Figure 6.4 (b): parse tree for 3 ∗ (4 + 5), rooted at S, in which int * Term expands the parenthesized sum Term + Exp.]

Figure 6.4: An example CFG (a) and parse tree (b). The grammar represents a subset of arithmetic expressions. We use ⊣ to signify the endmarker for a given token stream, which is needed for transformation to a DPDA. The parse tree given in (b) is for the expression 3 ∗ (4 + 5). Note that integer numbers are transformed to int tokens prior to deriving the parse tree.

6.2.2 Compiling Grammars to DPDAs

Next, we consider the process of compiling an input CFG to a DPDA. As noted in Section 2.1.2, PDAs and DPDAs do not have equal representative power. Therefore, there are CFGs that cannot be recognized by a DPDA. We focus on support for a strict subset of CFGs known as LR(1) grammars, which are of practical importance and supported by DPDAs. Most programming language grammars have a deterministic representation [199], and many common parser generator tools focus on supporting LR(1) grammars [26, 108, 138]. By targeting this class of grammars, we can therefore support parsing common languages such as XML, JSON, and ANSI C.

Existing parser generators (e.g., YACC or PLY) are unsuitable for compiling to ASPEN because these tools do not produce hDPDAs (or even DPDAs!). Instead, they generate source code that makes use of the richer set of operations supported by CPUs. We do, however, demonstrate how existing tools may be leveraged for a portion of our compilation process. This transformation from grammar to hDPDA is broken down into three stages: (1) parsing automaton generation, (2) hDPDA generation, and (3) optimization.

6.2.2.1 Parsing Automaton Generation

Parsing of input according to an LR(1) grammar makes use of a DFA known as a parsing automaton,³ a state machine that processes input symbols and determines

³ Also referred to as DK in the literature after its creator, Donald Knuth [199].

the next production to apply. This machine encodes shift and reduce operations. Shifts occur when another input token is needed to determine the next production and are encoded as transitions between states in the parsing automaton. Reduce operations (the reverse applications of productions) occur when the machine has seen enough input to determine which substitution rule in the grammar to apply and are encoded as accepting states in the DFA. Each accepting state represents a different production. Determining the correct shift or reduce operation may require inspecting the current input symbol and also a subsequent lookahead symbol. We leverage off-the-shelf tools to generate parsing automata. Concretely, we

support parsing automata generated by the GNU Bison⁴ and PLY⁵ parser generator tools. These two tools produce CPU-based parsers and generate parsing automata as an intermediate output. Conceptually, parsing proceeds by processing input symbols using the parsing automaton and pushing symbols to the stack until an accepting state is reached. The input string is rewritten by popping symbols from the stack. The most recently pushed symbols are replaced by the left-hand side of the discovered substitution rule. Processing is then restarted from the beginning of the rewritten input, repeating until only the starting non-terminal symbol remains. With this classical approach, parsing requires multiple iterations over (and transformations to) the input symbols.

⁴ https://www.gnu.org/software/bison/
⁵ http://www.dabeaz.com/ply/
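As a concrete example of the compiler’s input, the arithmetic grammar of Figure 6.4 can be written as a PLY specification. This is a minimal sketch assuming PLY is installed; semantic actions are omitted, and running it with debugging enabled emits parser.out, the textual parsing-automaton description consumed by the next stage.

    import ply.lex as lex
    import ply.yacc as yacc

    tokens = ("INT", "PLUS", "TIMES", "LPAREN", "RPAREN")
    t_PLUS = r"\+"
    t_TIMES = r"\*"
    t_LPAREN = r"\("
    t_RPAREN = r"\)"
    t_INT = r"\d+"
    t_ignore = " \t"

    def t_error(t):
        t.lexer.skip(1)

    def p_exp(p):
        """exp : term PLUS exp
               | term"""

    def p_term(p):
        """term : INT TIMES term
                | LPAREN exp RPAREN
                | INT"""

    def p_error(p):
        pass

    lexer = lex.lex()
    parser = yacc.yacc(debug=True)  # writes parser.out, the parsing automaton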

6.2.2.2 hDPDA Generation

To improve the efficiency of parsing, we simulate the execution of the parsing automaton using a DPDA [199, Lemmas 2.58, 2.67] to process input tokens in a single pass with no transformations to the input. With this approach, input symbols are not pushed to the stack. Instead, the stack of the hDPDA is used to track the sequence of states visited in the parsing automaton. Shift operations push the destination parsing automaton state to the stack (shifts are transitions to other states in the parsing automaton). When a reduce operation rewriting n symbols to a single non-terminal symbol is performed by the parsing automaton, the hDPDA pops n symbols off the stack. The symbol at the top of the hDPDA stack is the state of the parsing automaton that immediately preceded the shift of the first token from the reduced rule. In other words, popping the stack for a reduction “runs the parsing automaton in reverse” to undo shifting the symbols from the matched rule. The hDPDA then continues simulation of the parsing automaton from this restored state. Our prototype compiler generates an hDPDA by first reading in the textual description of the parsing automaton generated by Bison or PLY. Next, for each state in the parsing automaton, we generate hDPDA states for each terminal and non-terminal in the grammar. A separate state is needed for each terminal and non-terminal symbol because the homogeneity property only supports a single pushdown automata operation per state, as defined in Equation (2.1):

• For each terminal symbol, we generate two states: one state matches the lookahead symbol (i.e., lookahead symbols are stored in “positional” memory) and one state encodes the relevant shift or reduce operation. A shift operation pushes parsing automaton states on the stack, while a reduce operation pops a symbol from the stack and generates an output signal.

• For each non-terminal symbol, only one state is generated: the state performing the shift/reduce operation. In addition, this state must also match the top of the stack to validate undoing shift operations.

Then, we add additional states to perform stack pop operations for the reduce operations, one pop for each symbol reduced from the right-hand side of a production. Finally, we connect the states with transitions according to transition rules from the parsing automaton. The final hDPDA is emitted in the MNRL file format. MNRL is an open-source JSON-based state machine serialization format that is used within the MNCaRT automata processing and research ecosystem [16]. We extend the MNRL schema to support hDPDA states, encoding the stack operations with each state. Using MNRL admits the reuse of many analyses from MNCaRT with minimal modification.
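The simulation strategy above (shift pushes the destination state; a reduce of n symbols pops n states and resumes from the exposed state) is the standard LR driver loop. The following sketch is a CPU reference model of that behavior, not the hDPDA itself; the table and goto structures are hypothetical stand-ins for the Bison/PLY output that our compiler actually reads.

    def lr_parse(table, goto, start_state, tokens):
        """Reference LR driver: the stack holds parsing-automaton states only.
        table[state][lookahead] is ("shift", dest), ("reduce", lhs, n), or
        ("accept",); goto[state][lhs] is the state entered after a reduction."""
        stack = [start_state]
        i = 0
        while True:
            action = table[stack[-1]][tokens[i]]
            if action[0] == "shift":
                stack.append(action[1])  # remember where the shift landed
                i += 1
            elif action[0] == "reduce":
                _, lhs, n = action
                if n:
                    del stack[-n:]       # undo the n shifts of the rule body
                stack.append(goto[stack[-1]][lhs])
            else:                        # ("accept",)
                return True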

6.2.2.3 Optimization

While our algorithm to transform the parsing automaton to a DPDA is direct, the resulting DPDA contains a large number of ε-transitions and extraneous states.

[Figure 6.5 (a): a state matching [A-Z] (pop 0, no push) followed by an ε-state (pop 1, push ‘a’) is merged into a single state that matches [A-Z], pops 1, and pushes ‘a’. Figure 6.5 (b): a chain of four ε-states, each popping 1 with no push, collapses into a single ε-state that pops 4.]

Figure 6.5: Two compiler optimizations for reducing the number of stalls incurred by ε-transitions. Epsilon merging (a) attempts to combine states to perform non-overlapping operations. Multipop (b) allows for the stack pointer to be moved a configurable distance in one operation.

First, we remove all unreachable states (states with no incoming transitions). Then, we perform optimizations to reduce the total number of ε-transitions within the hDPDA. Recall that ε-transitions occur when stack operations take place without reading additional input (e.g., when popping the stack during a reduce operation and transitioning to another state). We make two observations about the hDPDA produced by our compilation algorithm. First, the algorithm produces separate states to “read in” input symbols and to perform stack operations. In many cases, these states may be combined, or

merged, to match the input and perform stack operations simultaneously. After producing the initial hDPDA, we perform a post-order depth-first traversal of the machine and merge such connected states when possible. We call this optimization epsilon merging and apply it conservatively: only states that occur on a linear chain are merged. Figure 6.5 (a) shows an example in which a state performing input matching on capital letters and a state (with no input comparison) performing a pop and a push are merged. Second, our basic algorithm assumes a computational model that only supports popping one symbol at a time. On reduction operations for productions containing several symbols on the right-hand side, this results in long-duration stalls. Note, however, that no comparisons are made with these intermediate stack symbols. If our architecture can support moving the stack pointer by a variable amount, then a reduction may be performed in one step. We refer to this as multipop. Figure 6.5 (b) demonstrates a reduction of four states to one state with multipop.
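In software, multipop is a peephole rewrite over linear chains of pop-only ε-states. A minimal sketch follows; the dictionary-based state encoding is our own stand-in for the MNRL representation.

    def apply_multipop(chain):
        """Collapse runs of pop-only ε-states into single multipop states,
        as in Figure 6.5 (b). `chain` is a linear list of state records."""
        def pop_only(s):
            return s["epsilon"] and s["push"] is None and s["pop"] > 0

        out = []
        for state in chain:
            if out and pop_only(state) and pop_only(out[-1]):
                out[-1]["pop"] += state["pop"]  # one stack-pointer move
            else:
                out.append(dict(state))
        return out

    chain = [{"epsilon": True, "push": None, "pop": 1} for _ in range(4)]
    assert apply_multipop(chain) == [{"epsilon": True, "push": None, "pop": 4}]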

6.2.3 Compilation Summary

We presented an overview of CFGs, a high-level language representation that may be used to generate pushdown automata. Then, we described an algorithm for compiling an important subset of CFGs (LR(1) grammars) to hDPDAs. We leverage existing tools to produce an intermediate parser representation (the parsing automaton), which we then encode in an hDPDA for execution with ASPEN. We also introduce two optimizations, epsilon merging and multipop,

to reduce stalls while processing input. Our approach supports and accelerates existing parser specifications without modification. This means that parsers do not have to be redesigned to take advantage of ASPEN’s increased parsing performance.

6.3 martini architectural design

Having detailed the application-level designs of both Martini and ASPEN, we now describe the microarchitectural design and efficient implementation of Martini. We describe the design of ASPEN in the following section. First, we present the homogeneous finite automaton representation of compressed dictionaries. Then, we describe our architecture for monitoring memory accesses, which is embedded in the Last Level Cache (LLC) of the CPU.

6.3.1 From Dictionaries to Automata

We represent compressed dictionaries as homogeneous NFAs (as defined in Section 2.1.1) to facilitate hardware implementation and execution. A separate automaton, or connected component [209], is created for each ∆-window in the trained dictionary. Because there is a single entry in the compressed dictionary for

each unique address delta that begins a window, b-bit deltas result in 2^b connected components. Within a given automaton, we allocate one STE for each window offset, which forms a linear chain. The symbol match conditions are taken directly from the compressed dictionary. Additionally, each initial STE contains a self-loop to account for the sliding window comparison. The last state in each chain (equivalent to the final position in the window) generates a report signal if activated. Figure 6.6 illustrates the automata layout in Martini. When a new dictionary is trained, the overall automata topology is unchanged; only the symbols within individual STEs change. This insight allows us to simplify hardware-level routing to save space in silicon.

[Figure 6.6: 256 chains of eight STEs; each chain begins with an STE matching one distinct starting delta (0x00, 0x01, ..., 0xFF), and each subsequent STE matches a trained set of delta values.]

Figure 6.6: Homogeneous NFA representation of a dictionary. The NFA consists of 256 connected components, each containing a chain of eight STEs. The initial STE for each component matches a unique value; all subsequent STEs match a set of possible values. Training determines the values within.

[Figure 6.7: processor cores with private caches and LLC slices; each Martini unit comprises an Address Delta Unit, a Martini AP Core, and a Trigger Arbitration Unit that raises the anomaly output.]

Figure 6.7: High-level architectural design of Martini: an 8-core Xeon processor, each core with private L1 and L2 caches and a shared 2.5 MB Last Level Cache (LLC) slice with an embedded Martini processor, and a block diagram of the Martini processor (shown in pink). Note that regions are not to scale.

The automata input is the sequence of truncated memory address deltas generated by the execution of a program. The automata generate a report for every input ∆-window that matches the encoded dictionary.

6.3.2 Martini Address Monitor

To support real-time monitoring of memory accesses, we embed Martini in the Last Level Cache (LLC) region of the CPU. Figure 6.7 shows an enterprise 8-core Intel Xeon-E5 processor. The Xeon family of processors typically includes 8–16 slices of LLC (one slice per core) [42, 53, 105]. In our prototype, each processor core is allocated a dedicated Martini unit (the pink rectangles within each private cache in the figure). Our Martini unit consists of three components: the Address Delta Unit, Automata Processing (AP) Core, and Trigger Arbitration Unit.

The Address Delta Unit snoops the memory address lines of the core and calculates the truncated delta between two consecutive addresses (as described in Section 6.1.2). In its simplest form, this unit performs two’s complement arithmetic on 8-bit values; however, a more sophisticated unit could support dynamically masking and truncating address deltas. The generated address deltas are then fed into the AP core, described in the next subsection. This core executes the automata computation, producing triggers when a window of address deltas is not found in the loaded dictionary. The Trigger Arbitration Unit tracks triggers and generates an alarm signal when a pre-defined threshold is exceeded. Our prototype implementation consists of two counters. The first counter tracks the number of windows processed, while the second counter tracks the number of triggers produced by the AP core. Whenever the window counter reaches its threshold, the trigger counter is shifted to decay the value and favor local context. If the trigger counter reaches its threshold (i.e., the mismatch rate from Section 6.1.3), the Trigger Arbitration Unit produces a hardware signal indicating an anomaly, which is handled by the OS or memory system.

6.3.3 Automata Processing Core

The AP Core is responsible for taking an input ∆-window and determining whether it is present in the dictionary. We next describe our prototype design for

implementing Martini in a current-generation Intel Xeon CPU, which represents a novel, system-level integration of automata processing.

[Figure 6.8: a 256 × 256 SRAM array; the row decoder receives the 8-bit address delta, columns are grouped into connected components (CC0–CC31) of eight STEs each, and routing and activation bits carry component outputs to trigger arbitration.]

Figure 6.8: Specialized Martini automata processing architecture. Routing of activation signals is simplified because connected components in the automata consist of chains of eight STEs (shown in the dashed region). A single 256 × 256 SRAM array contains 32 connected components. We require eight arrays (256 connected components) in total. Masked address deltas are fed as input to the row decoders, and outputs from each connected component are fed to trigger arbitration.

Our AP Core follows much of the implementation of the Cache Automaton [209]; however, we make several application-specific modifications to reduce space overhead and improve performance. Each 2.5 MB slice of LLC in the Xeon processor is organized into 20 ways, each of which is subdivided into five 32 kB banks. Four of these banks constitute data arrays, while the fifth is used for storing cache state [42, 53, 105]. Internally, the banks used for data arrays are made

up of four 8 kB (256 × 256) SRAM arrays. We repurpose these SRAM arrays to perform automata computation. A single SRAM array can accommodate 256 STEs, meaning that to accommodate all 2048 STEs of the compressed dictionary, eight arrays—two banks—are repurposed for the AP Core. Figure 6.8 depicts the repurposed SRAM array. As described in Section 2.2, each column encodes the input matching rule for an STE following the state-match design of previous memory-centric AP models [75, 209]. The row decoder converts the current address delta to a 256-bit one-hot encoding. The homogeneity property of the automata ensures that STEs can be represented by a single column of SRAM. Each STE also has a corresponding activation bit. An STE must both match the input symbol and also be active to generate a transition signal. One exception is the initial STE in each connected component: this STE is always active (every cycle is also the start of a new sliding window). In general-purpose automata processing, a second SRAM array is used to support a reconfigurable routing matrix for transition signals. For Martini, this is not needed; the topology of the automata is fixed, consisting of chains of eight STEs. This allows for static routing in which the transition signal from the previous STE feeds into the activation bit register of the next STE, resulting in a more compact design. The transition signal out of the last STE in each connected component chain feeds into a NOR gate, which aggregates signals from all of the connected components and produces a trigger for the Trigger Arbitration Unit.
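The column-per-STE encoding can be modeled as a boolean matrix: reading one row returns the match vector for all 256 STEs at once, and ANDing it with the activation bits yields the transition signals. The NumPy sketch below is a reference model, not RTL, and it assumes (our convention) that each eight-STE chain occupies eight consecutive columns.

    import numpy as np

    def make_match_array(ste_symbol_sets):
        """256x256 bit array; column j holds the match set of STE j."""
        arr = np.zeros((256, 256), dtype=bool)
        for j, symbols in enumerate(ste_symbol_sets):
            for s in symbols:  # s is an 8-bit truncated delta
                arr[s, j] = True
        return arr

    def cycle(arr, active, chain_heads, symbol):
        """One symbol cycle: row read (match), AND with activation bits, then
        static routing shifts each transition to the next STE in its chain."""
        match = arr[symbol]                           # row decoder + read-out
        transitions = match & (active | chain_heads)  # heads are always active
        next_active = np.roll(transitions, 1)         # routing: STE k -> k + 1
        next_active[0::8] = False                     # heads have no input edge
        reports = transitions[7::8]                   # tails feed trigger logic
        return next_active, reports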

6.3.4 System Integration

Compressed automata dictionaries are (1) placed and routed for hardware resources and (2) stored as a bitmap containing STE input match symbols and the thresholds for the Trigger Arbitration Unit. At runtime, the OS loads the bitmap into the monitoring unit using standard load instructions and Intel Cache Allocation Technology [110]. Anomaly alarms trigger a hardware interrupt, allowing the OS to implement custom mitigation strategies. The configuration overheads are small (roughly equivalent to loading 2 kB of data into the LLC) and typically only occur once. The unit only needs to be reconfigured when loading a new dictionary.

6.4 aspen architectural design

Having described the architectural design of Martini, we now focus on describing the ASPEN architecture that augments LLC slices with support for DPDA processing. We also discuss the design of a DPDA processing pipeline based on ASPEN and the trade-offs involved.

[Figure 6.9: panel (a) shows the 8-core Xeon processor; panel (b) shows one LLC slice with its ways and C-BOX; panel (c) shows a 32 kB bank whose repurposed SRAM arrays implement Input Match (IM), Stack Match (SM), Stack Action Lookup (AL), and Stack Update (SU), together with the L-switch and G-switch interconnect, global stack, stack pointer, TOS registers, Active State Vector, ε-mask, and ε-stall signal.]

Figure 6.9: Xeon processor with SRAM arrays repurposed for DPDA processing. The figure shows (a) an 8-core Xeon processor, (b) one 2.5 MB Last Level Cache (LLC) slice, and (c) the internal organization of one 32 kB bank with two 8 kB SRAM arrays repurposed for DPDA processing.

6.4.1 Cache Slice Design

The ASPEN architecture augments the Last Level Cache slices of a general purpose processor to support in-situ DPDA processing. Figure 6.9 (a) shows an 8-core enterprise Xeon-E5 processor with LLC slices connected using a ring interconnect (not shown in figure). Typically, the Intel Xeon family includes 8-16 such slices [42, 53, 105]. Each Last Level Cache slice macro is 2.5 MB and consists of a centralized cache control box (C-BOX). A slice is organized into 20 ways, with each way further organized as five 32 kB banks, four of which constitute data arrays, while the fifth one is used to store the tag, valid and LRU state (Figure 6.9 (b)). All the ways of the cache are interconnected using a hierarchical bus supporting a bandwidth of 32 bytes per cycle. Internally, each bank consists of four 8 kB SRAM arrays (256 × 256).

A bank can accommodate up to 256 states and a DPDA can span several banks. We repurpose two of the four arrays in each bank to perform the different stages of DPDA processing. The remaining two arrays (addressed by the PA[16] bit) can be used to store regular cache data. State-transitions are encoded in a hierarchical memory-based interconnect, consisting of local and global crossbar switches (L-switch, G-switch). A 256-bit register is used to track the active states in each cycle (Active State Vector in Figure 6.9 (c)). We provision input buffers in the C-BOX to broadcast input symbols or tokens to different banks. Output buffers are also provided to track the report events generated every processing cycle.

6.4.2 Operation

This subsection provides the details of DPDA processing. Recall that, in a DPDA, only a single state is active in every processing cycle, and initially, only the start state is active. Each input symbol from the DPDA input buffer is processed in five phases. In the input match and stack match phases, we identify the active DPDA state which has the same label as that of the input symbol and the top of stack (TOS) symbol respectively. In the stack action lookup phase, the stack action defined for that state is determined (i.e., push symbol or number of symbols to pop from the stack). The stack is updated in the following phase (stack update). Finally, in the state-transition phase, a hierarchical transition interconnect matrix determines the next active state.

Cycles in which states with an ε-transition are active require special handling. These states do not consume an input symbol but perform a stack action in that cycle (i.e., push or pop). A 256-bit ε-mask register tracks the ε-states in each bank. A logical AND of the ε-mask register and Active State Vector is used to determine if an ε-state is active in the next processing cycle. If an ε-state is active, a 1-bit ε-stall signal is sent to the C-BOX to stall the input for the next processing cycle. While a single stack action per cycle is sufficient to support DPDA functionality, reducing stalls to the input stream can significantly improve performance. The multipop optimization, discussed in Section 6.2.2.3, reduces stalls due to ε-transitions and is supported in hardware by manipulating the stack pointer and encoding the number of popped symbols in the stack action lookup phase. We now proceed to discuss the different stages involved in DPDA processing. (1) input-match (im): We adapt the state-match design of memory-centric automata processing models [75, 209] for the input-match phase. Each state is mapped to a column of an SRAM array as shown in Figure 6.9 (c). A state is given a 256-bit input symbol label which is the one-hot encoding of the ASCII symbol that it matches against. The homogeneous representation of DPDA states ensures that each state matches a single input symbol and each state can be represented using a single SRAM column. The input symbol is broadcast as the row address to the SRAM arrays using 8 bits of global wires. By reading out the contents of the row into the Input Match Vector, the set of states with the same label as the input symbol can be determined in parallel.

(2) stack-match (sm): In contrast to NFAs, where all active states that match the input symbol are candidates for state-transition, DPDA states have valid transitions defined only for those states that match both the input symbol and the symbol on the top of the stack (8-bit TOS in Figure 6.9). We re-purpose an SRAM array in each bank to determine the set of DPDA states that match the top of stack (TOS) symbol. Similar to Input-Match, we provision 8 bits of global wires to broadcast the TOS symbol as the row address to SRAM arrays. By reading out the contents of the row into the TOS Match Vector and performing a logical AND with the Input Match Vector and the Active State Vector, the candidate states for state-transition are determined. We refer to these candidate states simply as active states. We leverage sense-amplifier cycling techniques [209] to accelerate the IM and SM stages. (3) stack action lookup (al): Each DPDA state is also associated with a corresponding stack action. The supported stack actions are push, pop, and multipop. The stack action is encoded with 16 bits. Each push action uses 8 bits to indicate the symbol to be pushed onto the stack. The remaining 8 bits are used by the pop action to indicate the number of symbols to be popped from the stack (>1 for multipop).

The stack action corresponding to each state is packed along with the IM SRAM array in each bank. However, in the AL stage, we look up this SRAM array using the 256-bit result vector obtained after the logical AND in the previous step (see Figure 6.9). This removes the decoding overhead from the array access time. We

reserve 16 bits of global wires to communicate the stack action results from each bank to the stack control logic in the C-BOX. (4) state transition (st): The state-transition phase determines the set of states to be activated in the next cycle. We observe that the state transition function can be compactly encoded using a hierarchy of local and global memory-based crossbar switches. The state transition interconnect is designed to be flexible and scales to several thousand states. The L-switches provide dense connectivity between states mapped to the same bank, while the G-switch provides sparse connectivity between states mapped to multiple banks. A graph partitioning based algorithm [121] is used to satisfy the local and global connectivity constraints while maximizing space utilization. The crossbar switches consisting of N input and output ports and N×N crosspoints are implemented using regular 6-T SRAM arrays (e.g., the L-switch in Figure 6.9 (c)). The 6-T bitcell holds the state of each cross-point. A flip-flop or register can also be used for this purpose, but these are typically implemented using 24 transistors, making them area inefficient. A ‘1’ is stored in bitcell (i, j) if there is a valid transition defined from state i to state j. All the cross-points are programmed once during initialization and used for processing several MBs to GBs of input symbols. The set of active states from the previous phase serve as inputs to the crossbar switch. For DPDAs, only a single state can be active every cycle, and we can use 6-T SRAM arrays for state transition, since only a single row is activated. (5) stack update (su): To allow for parallel processing of small DPDAs (e.g., in non-parsing applications such as subtree mining), we provide a local

stack in each bank. We repurpose 8 columns of the SM array to accommodate the local stack. Larger DPDAs (e.g., in XML parsing) make use of a global stack to keep track of parsing state. The global stack is implemented in the C-BOX using a 256×8 register file and is shared by all the DPDAs mapped to two adjacent ways. Providing a stack depth of 256 is sufficient for our parsing applications (see Section 6.8). Note that only one sort of stack (local or global) is enabled at configuration time based on the DPDA size. The stack pointer is stored in an 8-bit register and is used to address the stack. We also store the symbols at stack positions TOS and TOS+1 in separate 8-bit registers. This optimization saves a write and read access to the larger stack register file and ensures early availability of the top-of-stack symbol for the next processing cycle. The push operation writes the stack symbol to TOS+1. A lazy mechanism is used to update the stack with the contents of TOS. Similarly, the pop operation copies TOS to TOS+1, while lazily reading the stack register file to update TOS.
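These five stages compose into the following single-active-state software model. The edge encoding is our own: each state carries a list of (input, tos, action, next) tuples, with input None marking an ε-transition (which stalls the input stream); accept handling is simplified to reaching the end of the input.

    def find_edge(edges, input_sym, tos):
        """Input match and stack match combined; tos None acts as a wildcard."""
        for inp, top, action, nxt in edges:
            if inp == input_sym and (top is None or top == tos):
                return inp, action, nxt
        return None

    def run_dpda(states, start, inputs):
        stack, state, i = [], start, 0
        while i < len(inputs):
            tos = stack[-1] if stack else None
            # ε-edges take priority and stall the input for one cycle.
            edge = (find_edge(states[state], None, tos)
                    or find_edge(states[state], inputs[i], tos))
            if edge is None:
                return False                 # no transition defined: reject
            inp, action, state = edge        # state transition
            kind, arg = action               # stack action lookup
            if kind == "push":
                stack.append(arg)            # stack update
            elif kind == "pop" and arg > 0:
                del stack[-arg:]             # multipop moves the pointer once
            if inp is not None:
                i += 1                       # only non-ε edges consume input
        return True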

6.4.3 Critical Path

ASPEN’s performance depends on two critical factors: (1) the time taken to process each symbol in the input stream (i.e., clock period) and (2) the time spent stalling due to ε-transitions. The multipop optimization reduces stalls due to ε-transitions. We now consider the clock period.

[Figure 6.10: panel (a) shows the dependency graph among the IM, SM, AL, SU, and ST stages for inputs x and x+1; panel (b) shows the resulting serial schedule of stages across cycles.]

Figure 6.10: DPDA processing on ASPEN. (a) Dependency graph between stages. (b) Serial processing of input symbols.

In a naïve approach, each input symbol would be processed sequentially in five phases, leading to a significant increase in the clock period. However, not all phases are dependent on each other and need to be performed sequentially. Figure 6.10 (a) shows the dependency graph for the DPDA processing stages. The intra-symbol dependencies are shown in black, while the inter-symbol dependencies are marked in red. Using the dependency graph, each of the five stages can be scheduled as shown in Figure 6.10 (b), where the propagation through the interconnect (wire and switches) for state-transition is overlapped with stack action lookup and stack update. Since the top of stack cannot be determined until the stack has been updated based on the previous input symbol, DPDA processing is serial. We contrast this with NFA processing, which has two independent stages (input-match and state-transition) that can be overlapped to design a two-stage pipeline [209]. We find that the critical path delay (clock period) of ASPEN is the time spent for input/stack-match and the time taken for stack action lookup and update. The time spent in state-transition is fully overlapped with stack

related operations. Section 6.6.3 discusses the pipeline stage delays and operating frequency.

6.4.4 Support for Lexical Analysis

Two critical steps in parsing are lexical analysis, which partitions the input character stream to generate a token stream, and parsing, where different grammar rules are applied to verify the well-formedness of the input tokens (see Section 2.5.1). ASPEN can accelerate both of these phases. We leverage the NFA-computing capabilities of the Cache Automaton architecture [209] for lexical analysis. To identify the longest matching token, we run each NFA until there are no active states. When the Active State Vector is zero, a state exhaustion signal is sent to the lexer control logic in the C-BOX. The symbol cycle and reporting state ID of the most recent report are tracked in a 64-bit report register in the C-BOX. A 256-bit reporting mask register is used to mask out certain reports based on lexer state. On receiving the state exhaustion signal from all banks, the lexer control logic resets the reporting mask, reloads the NFA input buffer for the next token, and generates a token stream to be written into the DPDA input buffer (using a lookup table to convert report codes to tokens).
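In software, the longest-match loop driven by state exhaustion looks as follows; nfa_step and report_to_token are hypothetical helpers standing in for the Cache Automaton banks and the report-to-token lookup table.

    def tokenize(nfa_step, start_states, text, report_to_token):
        """Longest-match lexing: run the NFA until no states remain active,
        then emit the token for the most recent report and restart."""
        tokens, pos = [], 0
        while pos < len(text):
            states, last_report, last_end = set(start_states), None, pos
            i = pos
            while i < len(text) and states:
                states, report = nfa_step(states, text[i])
                i += 1
                if report is not None:   # track the most recent report
                    last_report, last_end = report, i
            if last_report is None:
                raise ValueError(f"lexical error at offset {pos}")
            tokens.append(report_to_token[last_report])
            pos = last_end               # resume after the matched lexeme
        return tokens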

6.4.5 System Integration

ASPEN shares the Last Level Cache with other CPU processes. By restricting DPDA computation to only 8 ways of an LLC slice, we allow for regular operation in other ways. Furthermore, the cache ways dedicated to ASPEN may be used as regular cache ways for non-parsing workloads. Cache access latency is unaffected since DPDA-related routing logic uses additional wires in the global metal layers. DPDAs are (1) placed and routed for ASPEN’s hardware resources, and (2) stored as a bitmap containing states and stack actions. At runtime, the driver loads these binaries into cache arrays and memory mapped switches using standard load instructions and Intel Cache Allocation Technology [110]. The input/output buffers for ASPEN are also memory-mapped to facilitate input streaming and output reporting, and ISA extensions are used to start/stop DPDA functions. We disable LLC slice hashing at configuration time. The configuration overheads are small, especially when processing MBs or GBs of input, but are included in our reported results. To support automata-based applications that require counting, we provision four 16-bit counters per way of the LLC. Post-processing of output reports takes place on the CPU. For XML parsing pipelines, a DOM tree representation (see Section 2.5.1) can be constructed by performing a linear pass over the DPDA reports. Richer analyses (such as verifying opening and closing tags match for XML parsing supporting arbitrary tags) may be implemented as part of tree construction. Although the CPU-ASPEN pipeline can support this, we leave evaluation of DOM tree construction for future work.

6.5 experimental methodology

In this section, we describe our methodology for evaluating both Martini and ASPEN. Although we have not fabricated the custom data paths described earlier in this chapter, we evaluate both using cycle-accurate simulation. We first describe our methodology for collecting data for our evaluation of Martini. Then, we describe the benchmark suites used in each of our evaluations.

6.5.1 Recording Memory Traces to Evaluate Martini

We built two helper tools to collect memory access traces of target programs. First, we leverage an extension to QEMU [30] called PANDA (the Platform for Architecture-Neutral Dynamic Analysis) [76]. Since QEMU is a full-system emulator, this approach has the advantage that we can instrument every instruction executed by the guest system without perturbing its behavior. Second, we used Intel's Pin tool [144] to collect memory traces of userspace programs. In contrast to the QEMU-based approach, Pin can collect memory traces much more quickly in cases where faithful modeling of the cache hierarchy is not necessary. However, the primary disadvantage of Pin is the need to statically modify a target binary, potentially changing the memory addresses that are accessed at runtime. These PANDA and Pin instrumentations are used only in our simulation evaluation to establish a ground truth; they are not part of our proposed deployment. We use both approaches to collect memory traces of a suite of benchmark programs. While Pin instrumentation modifies the software under test, we observed a difference of less than 1% between the traces collected by the two approaches.

6.5.2 Building and Testing Dictionaries

We next construct a dictionary from the recorded memory traces by applying the refinements described in Section 6.1.2. First, we calculate the differences between consecutive memory accesses in the traces. Next, we slide a fixed-width window across this data to form 8-delta-long ∆-windows while simultaneously truncating each value to 8 bits. Finally, we construct a dictionary by creating sets of address deltas for each window offset. We use MNRL to generate automata from a compressed dictionary. MNRL is an open-source state machine representation language and API intended for large-scale automata processing applications [12, 16]. To simulate the execution of our accelerator architecture, we use VASim, a cycle-accurate simulator for automata processing architectures [235]. We extend the simulator to support the operations of the Address Delta and Trigger Arbitration units. In our evaluation, we load memory traces from the testing set into VASim and process the data using the compressed dictionary NFA, producing a list of generated alarms.
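The construction just described fits in a few lines of code. The following sketch assumes a trace is a sequence of absolute addresses and that truncation to 8 bits keeps the low byte of each delta (one plausible reading of the text); the window size of 8 matches our ∆-windows.

def build_dictionary(trace, window=8):
    """Build a Martini-style dictionary from a memory access trace."""
    # Differences between consecutive accesses, truncated to 8 bits.
    deltas = [(b - a) & 0xFF for a, b in zip(trace, trace[1:])]
    # One set of observed deltas per window offset.
    dictionary = [set() for _ in range(window)]
    # Slide a fixed-width window across the delta stream.
    for start in range(len(deltas) - window + 1):
        for offset in range(window):
            dictionary[offset].add(deltas[start + offset])
    return dictionary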

6.5.3 Benchmarks

We next describe the benchmarks we use to evaluate the performance of both Martini and ASPEN. Because these two architectures are designed to perform disjoint tasks, we consider different workloads for each.

martini benchmarks. We use multiple software benchmarks as indicative examples of both benign and malicious behavior. Table 6.1 shows these programs aggregated into one of three benchmark suites based on general behavior. In total, we collect and test on over 2,400 program traces and over 13 billion memory accesses. The Coreutils Subset features a subset of 16 of the Linux coreutils programs, commonly used as benchmarks (e.g., [158, 250]). Programs in this suite represent a wide range of benign applications; we executed each with a variety of command-line arguments to gather memory access traces. The PARSEC benchmark suite [36] is composed of larger multithreaded programs that are designed to simulate a diverse set of highly parallelizable workloads (e.g., ray tracing, fluid simulation, and video compression). We use these programs as a test set for evaluating whether our technique can successfully detect execution that is not part of a trained dictionary. The Security benchmarks include representative malicious behavior: Spectre, Meltdown, and two recent CVEs associated with Linux programs: dnstracer and objdump. The additional "Detect" column in Table 6.1 indicates whether Martini can successfully detect the execution of these CVEs using a dictionary of indicative benign programs (see Section 6.7).

Table 6.1: Summary of benchmarks used to evaluate Martini

Coreutils Subset Suite

Program     Version             Traces   Avg. Trace Length
cal         N/A                 269      307,994
cat         coreutils 8.25      227      133,667
cp          coreutils 8.25      175      274,652
date        coreutils 8.25      29       102,393
diff        diffutils 3.3       50       266,856
dmesg       util-linux 2.27.1   50       7,077,967
dnstracer   1.8.1               100      107,111
du          coreutils 8.25      257      6,498,869
grep        2.25                266      429,125
ls          coreutils 8.25      260      884,068
objdump     Binutils 2.26.1     150      1,066,007
ps          procps-ng 3.3.10    30       2,112,550
readelf     Binutils 2.26.1     50       8,911,505
sed         4.2.2               50       370,897
tar         1.28                232      458,187
uname       coreutils 8.25      57       93,281

PARSEC Suite

Program         Version   Traces   Avg. Trace Length
blackscholes    3.0       1        233,020,782
bodytrack       3.0       1        614,815,076
canneal         3.0       1        946,425,872
dedup           3.0       1        1,005,640,971
facesim         3.0       1        232,921,684
ferret          3.0       1        766,278,169
fluidanimate    3.0       1        640,500,257
freqmine        3.0       1        1,287,177,742
raytrace        3.0       1        1,005,358,609
streamcluster   3.0       1        473,597,488
swaptions       3.0       1        786,879,131
vips            3.0       1        1,596,912,331
x264            3.0       1        233,132,151

Security Suite

Program     Version         CVE         Traces   Avg. Trace Length   Detect
Meltdown    N/A             N/A         64       13,361,880          ✓
Spectre     N/A             N/A         52       3,614,351           ✓
dnstracer   1.8.1           2017-9430   52       227,616             ✓
objdump     Binutils 2.26   2018-6323   50       814,034             ✓

We trained dictionaries in our evaluation using a random 60% of the corresponding program's traces. For example, a dictionary containing diff would be trained using 30 of its traces, randomly selected.

aspen benchmarks. We evaluate ASPEN against the widely used open-source XML tools Expat (v2.0.1) [58], a non-validating parser, and Xerces-C (v3.1.1) [18], a validating parser and part of the Apache project. The validation application used is SAXCount, which verifies the syntactic correctness of the input XML document and returns a count of the number of elements, attributes, and content bytes. We restrict our analysis to the SAX interface and WFXML scanner of Xerces-C and filter out all non-ASCII characters in the input document. We do not include DOM tree generation in our evaluation; this is consistent with prior work and evaluations (e.g., Parabix, Xerces SAX, and Expat). We assume that input data is already loaded into main memory. Our XML benchmark dataset is derived from Parabix [141], Ximpleware [253], and the UW XML repository [251]. We only evaluate XML files larger than 512 kB, as we were unable to obtain reliable energy estimations when baseline benchmark execution time was under 1 ms.

To evaluate the lexing-parsing pipeline, we extend the open-source, cycle-accurate virtual automata simulator, VASim [235], to support DPDA computation and derive per-cycle statistics. The tight integration of the lexer and parser in the LLC enables ASPEN to largely overlap parsing with lexing. Each lexing report can be processed and used to generate the token stream for the DPDA in 2 cycles. All CPU-based evaluations use a 2.6 GHz dual-socket Intel Xeon E5-2697-v3 with 28 cores in total. We used PAPI [214] and Intel's RAPL tool [70] to obtain performance and power measurements. We utilize the METIS graph partitioning framework [121] to map DPDA states to cache arrays.

Table 6.3: Runtime overhead of reducing LLC capacity

                Full Cache                Reduced Cache
Program         Runtime (ms)  Std. Dev.   Runtime (ms)  Std. Dev.   Change
blackscholes    152.7         4.7         154.8         3.7         1.42%
bodytrack       457.2         25.5        455.2         30.1        -0.43%
canneal         2774.9        24.3        2809.4        26.1        1.24%
dedup           3236.1        25.3        3242.9        33.8        0.21%
facesim         11.4          0.6         11.5          0.5         0.92%
ferret          467.9         18.4        463.2         12.2        -1.02%
fluidanimate    3.1           0.3         3.1           0.3         -2.00%
freqmine        1053.8        78.5        1042.6        71.5        -1.07%
raytrace        2798.6        10.8        2794.5        11.2        -0.15%
streamcluster   35375.0       8428.7      32491.0       8015.4      -8.15%
swaptions       3.7           0.5         3.8           0.4         0.45%
vips            259.0         10.1        259.7         10.1        0.27%
x264            718.9         8.9         717.2         9.8         -0.23%

6.6 architectural evaluation

In this section, we evaluate the runtime performance, chip area, and energy consumption of our hardware units.

6.6.1 System Performance Impact

Because computation in Martini is decoupled from cache operations, performance impacts are predominantly the result of reduced LLC capacity. ASPEN couples tightly with CPU operations but also reduces LLC capacity in a similar fashion. We therefore evaluate the performance impact incurred by repurposing part of the LLC for the hardware address monitor using the PARSEC benchmark suite. We collected runtime performance metrics using a server running Ubuntu 16.04 with 192 GB of RAM and two Intel Xeon Platinum 8275CL CPUs, each with 36 cores running at 3 GHz. Each processor has 36 MB of LLC, subdivided into eleven cache ways. We execute each benchmark twenty times, recording wall-clock execution time. Then, using Intel Cache Allocation Technology [110], we reduce the number of cache ways from eleven to ten and execute each benchmark an additional twenty times. We note that one cache way exceeds the resources required by our Martini design for each processor core; the results here thus present an upper bound on the runtime overhead. ASPEN will generally consume more cache ways and may thus incur a larger overhead. Aggregate results are presented in Table 6.3. In general, we find that the runtime overhead of our hardware is negligible. In the worst case (blackscholes), we observed a 1.42% increase in runtime. The benchmark with the largest change in performance (streamcluster) actually ran 8.15% faster with the reduced cache size. We hypothesize that this performance gain is caused by improved cache data alignment. However, any observed performance gain or loss is negligible: using a Wilcoxon signed-rank test, we are unable to find a significant difference between the average execution times for the full and reduced cache configurations (p = 0.1536).
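The statistical test is standard and can be reproduced with off-the-shelf tooling. The sketch below applies SciPy's Wilcoxon signed-rank test to the per-benchmark mean runtimes from Table 6.3; because it aggregates to means rather than using the underlying twenty runs per configuration, the resulting p-value will differ somewhat from the reported 0.1536.

from scipy.stats import wilcoxon

# Mean runtimes (ms) from Table 6.3, paired by benchmark.
full = [152.7, 457.2, 2774.9, 3236.1, 11.4, 467.9, 3.1,
        1053.8, 2798.6, 35375.0, 3.7, 259.0, 718.9]
reduced = [154.8, 455.2, 2809.4, 3242.9, 11.5, 463.2, 3.1,
           1042.6, 2794.5, 32491.0, 3.8, 259.7, 717.2]

# Paired, non-parametric test of full- vs. reduced-cache runtimes.
stat, p = wilcoxon(full, reduced)
print(f"p = {p:.4f}")  # a large p-value indicates no significant difference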

6.6.2 Martini Parameters

Next, we study the feasibility of deploying Martini in real silicon by considering a current-generation Intel Xeon CPU. The 256 × 256 SRAM arrays in the Xeon LLC can operate at 4 GHz [53, 105]. We estimate the area overhead for in-memory automata processing accelerators [17, 209]. We model the area overhead of the AP core with IBM's 45 nm soi12s0 cell library and Synopsys Design Compiler. The total area is 0.016 mm². For comparison, an 8-core Intel Xeon E5 processor has a die size of 354 mm² [42] in a 22 nm manufacturing process. Thus, our architectural changes would increase the overall die size by less than 0.04%. The synthesis results also show that all the additional circuits achieve a 4 GHz frequency after technology scaling to 22 nm, thus maintaining existing clock frequencies.

Our AP core is built by adapting an SRAM array, which we use to estimate energy consumption. In the absence of publicly available data on array area and energy, we use the standard foundry memory compiler at 0.9 V in the 28 nm technology node to estimate the power and area of a 256 × 256 6-T SRAM array. The energy to read out all 256 bits was calculated to be 22 pJ. Since Martini is based on a Xeon E5 processor modeled at 22 nm, we scale down the energy per access to 13.6 pJ. As there are eight arrays in an AP core, the peak power of our AP core is estimated at 0.435 W. The peripherals of the AP core, the Address Delta Unit, and the Trigger Arbitration Unit together consume 0.035 W of power based on the synthesis results with IBM's 45 nm cell library. In total, these sum to 0.470 W, which is far below the TDP of the Xeon E5 processor core (160 W). Therefore, the architecture does not incur any significant power overhead to the system.

Table 6.4: Stage delays and operating frequencies in ASPEN

Design   IM/SM    ST       AL       SU       Max Freq.   Oper. Freq.
ASPEN    438 ps   573 ps   349 ps   349 ps   880 MHz     850 MHz
CA       250 ps   250 ps   -        -        4 GHz       3.4 GHz

6.6.3 ASPEN Parameters

We use the same 13.6 pJ estimate for the energy to read out all 256 bits from the SRAM arrays in ASPEN. The area of each array and of a 6-T crossbar switch were estimated to be 0.015 mm² and 0.017 mm², respectively. Each LLC slice contains 32 L-switches and 4 G-switches to support DPDA computation in up to 8 ways. These switches can leverage standard 6-T SRAM push-rules to achieve a compact layout and have low area overhead (∼6.4% of LLC slice area). Being 6-T SRAM based, these switches can also be used to store regular data when not performing DPDA computation. Similar to the Cache Automaton [209], we use global wires to broadcast input/stack symbols and propagate state-transition signals. These global wires with repeaters have a 66 ps/mm delay and an energy consumption of 0.07 pJ/mm/bit. Table 6.4 shows the stage delays for DPDA processing on ASPEN. The IM/SM phases leverage sense-amplifier cycling [209] and take 438 ps. The ST stage requires 573 ps, composed of 198 ps of wire delay and 375 ps of local and global switch traversal. AL and SU each take 349 ps, composed of 99 ps of wire delay and 250 ps for array access. Thus, when evaluating ASPEN in Section 6.8, we consider a CA running at 3.4 GHz feeding tokens to our DPDA engine executing at 850 MHz.
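The operating frequency follows directly from these stage delays: because state-transition is hidden beneath stack action lookup and stack update, the clock period is the input/stack-match delay plus the longer of the two overlapped paths. A quick check against the Table 6.4 numbers:

im_sm, st, al, su = 438, 573, 349, 349   # stage delays in ps (Table 6.4)

# ST overlaps with AL + SU, so only the longer of the two paths matters.
period_ps = im_sm + max(st, al + su)     # 438 + 698 = 1136 ps
max_freq_mhz = 1_000_000 / period_ps     # ~880 MHz, matching the table
print(f"{period_ps} ps -> {max_freq_mhz:.0f} MHz")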

6.7 attack detection evaluation

In this section, we present an empirical evaluation of the Martini framework for detecting anomalous and malicious program execution at the memory access level. We focus primarily on expressive power and performance. In particular, we consider the following four research questions to validate design assumptions and investigate system-level hypotheses:

1. Can Martini identify program executions and differentiate between them?

2. What are the effects of compressing dictionaries?

3. Can Martini tell malicious from benign inputs?

4. Can Martini detect anomalous/malicious programs?

6.7.1 Differentiating Programs

We collected traces of memory accesses for each program in the Coreutils Subset (Table 6.1) and constructed a dictionary for each utility using 8-access windows from the traces (see Section 6.1.2). Then, we used execution traces from the other utilities and measured the fraction of tested 8-access windows that are found in the trained dictionary. This experiment measures sequences of memory accesses that are the same between a trained dictionary and some subsequent test program execution. For example, we expect that a dictionary constructed from ls will match a high number of memory accesses in a subsequent run of ls on different arguments, but will match a low number of accesses from a trace of cat, an entirely different program.

Figure 6.11: Comparison of memory traces between pairs of Linux utilities (one bar chart per trained utility; x-axis: testing utility, y-axis: hit rate). We train a dictionary using each utility, then measure the ratio of similarities between additional traces of that utility and traces of the other utilities. Each red bar shows the result of comparing the trained utility to a subsequent execution of that same utility. Note that the red bar is approximately one for all utilities while all blue bars are less than one, demonstrating that memory traces can effectively identify programs.

Figure 6.11 summarizes our findings, showing a different bar graph for each utility (the remainder show similar results and are elided for space). For each bar graph, the x-axis shows the testing program and the y-axis shows the "hit rate," or fraction of 8-access sequences in the testing program trace that match the dictionary. We gain confidence in our assumptions if (1) testing and training on the same program shows a high hit rate (i.e., the red bar is near 1.0), and (2) testing and training on separate programs shows a low hit rate. The figure shows clear separation between these two measurements, which establishes that we can set a threshold to distinguish between different programs based only on 8-access sequences of memory accesses.
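Computing the hit rate for a test trace is a direct extension of the dictionary construction from Section 6.5.2. In this sketch, a window counts as a hit only if the delta at every offset appears in the corresponding dictionary set; the exact matching semantics of the hardware are abstracted away.

def hit_rate(dictionary, test_trace, window=8):
    """Fraction of 8-access windows in a test trace found in a dictionary."""
    deltas = [(b - a) & 0xFF for a, b in zip(test_trace, test_trace[1:])]
    total = len(deltas) - window + 1
    hits = sum(
        all(deltas[start + off] in dictionary[off] for off in range(window))
        for start in range(total)
    )
    return hits / total if total > 0 else 0.0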

Martini is able to differentiate programs from each other by observing sequences of abstracted memory accesses.

6.7.2 Effects of Dictionary Compression

Next, we evaluate the effectiveness of dictionary compression (Section 6.1.2.3). Recall that (1) the primary goal of compression is minimizing the chip area required for implementation, and (2) we hypothesize that we can compress the model without significantly increasing collisions of ∆-windows, which would cause false negatives. To study this question, we compare Martini's accuracy at classifying authorized vs. unauthorized program behavior for both uncompressed and compressed dictionaries. We built compressed and uncompressed dictionaries from traces of 12 of the Coreutils Subset programs. We then used traces from the four CVE proofs-of-concept and the four held-out Coreutils Subset programs to determine whether those programs would trigger alarms. In this setup, traces from the in-dictionary programs should not trigger alarms, while traces from the out-of-dictionary programs should. We measured true- and false-positive and -negative rates (these results are detailed in Section 6.7.4). We found that the uncompressed dictionary yielded an AUC of 0.9995, while the compressed dictionary yielded an AUC of 0.9928, a negligible decrease. Thus, we conclude that compressing dictionaries does not significantly influence classification performance.

Dictionary compression introduces minimal error in program classification, while allowing a dictionary to fit within the resource constraints of the Martini data path.

6.7.3 Distinguishing Malicious from Benign Inputs

Having established that Martini separates benign from anomalous behavior (e.g., cal from objdump) and validated that compression is effective, we consider malicious program inputs (e.g., objdump's normal operation vs. an objdump exploit). CVE-2018-6323 describes an unsigned integer overflow in the elf_object_p function of objdump that can be triggered when it reads a specially crafted ELF file.

Figure 6.12: Martini performance on objdump CVE-2018-6323, as a function of threshold for benign and malicious inputs (x-axis: threshold, 0-0.1; y-axis: pass rate; interval size: 20,000; decay: 0.50). The dictionary was trained on traces from objdump running on benign inputs; the vulnerability was exploited using the malicious input, and Martini classified the runs as benign or malicious. True negatives are shown in solid blue and false negatives in dashed red. Note that thresholds of 0.01-0.05 allow all benign runs to pass, while raising alarms on 100% of malicious runs.

We generated a training dictionary by running objdump over 34 different ELF files. We evaluated with respect to three traces each of 16 benign ELF inputs and traces of the malicious ELF.

We used these to evaluate the detector's performance as a function of the chosen threshold t (see Section 6.1.3). We say that a trace passes if it does not trigger any alerts. Figure 6.12 plots the pass rate of the held-out benign objdump traces (i.e., true negatives, shown in solid blue) and of the held-out malicious CVE traces (i.e., false negatives, shown in dashed red), both as a function of the threshold. Note that interval size and decay are configurable parameters discussed in Section 6.1.3. Thresholds from 0.01 to 0.05 catch all malicious behavior with no false positives.
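The detector itself is defined in Section 6.1.3; for intuition, the experiment can be approximated as follows. Over fixed-size intervals of the input, compute the fraction of mismatched windows, fold it into a decayed running rate, and raise an alert whenever that rate exceeds the threshold t. The interval size (20,000) and decay (0.50) match Figure 6.12, but the exact update rule below is an assumption, not the precise hardware algorithm.

def trace_passes(mismatches, threshold, interval=20_000, decay=0.50):
    """Sketch of thresholded alarm logic (the exact rule is in Section 6.1.3).

    `mismatches` holds one boolean per window: True if the window was
    absent from the trained dictionary.  Returns True if no alert fires.
    """
    rate = 0.0
    for start in range(0, len(mismatches), interval):
        block = mismatches[start:start + interval]
        block_rate = sum(block) / len(block)
        # Decayed running mismatch rate (assumed form of the detector).
        rate = decay * rate + (1 - decay) * block_rate
        if rate > threshold:
            return False   # alert raised: classify the trace as malicious
    return True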

Martini accurately and precisely distinguishes malicious from benign program behavior arising from the inputs given to a program.

6.7.4 Detecting Anomalous and Malicious Programs

We also investigate Martini’s ability to detect anomalous and malicious programs, such as novel (unauthorized) programs or multiple exploits and attacks. In this evaluation, we train a dictionary with 60% of all traces of 12 of our Coreutils Subset programs. The resulting model is then simultaneously subjected to four types of testing traces: (1) trained programs, (2) untrained Linux utilities (i.e., the remaining held-out Coreutils), (3) exploits of trained programs, and (4) Spectre and Meltdown proofs-of-concept. In our use case, only trained programs are considered normal; all the others are anomalous and should cause Martini to raise an alarm. We find that we can detect anomalous behaviors with a high true positive rate and a low false positive rate. Figure 6.13 shows the receiver operating characteristic (ROC) curve of the results. The curve plots the true positive rate and false positive rate parametrically as a function of the threshold: each point on the curve represents a different threshold that can be chosen and thus a different trade-off in the space. A common metric to evaluate such figures is the Area Under the ROC Curve (AUC); an effective

205 ROC 1

0.9

0.8

0.7

0.6 True Positive Rate

0.5 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 False Positive Rate

Figure 6.13: Experimental Martini results for tested anomalies. The ROC curve reports data for all our benchmarks (they are combined as they share identical shape). Martini’s dictionary was trained using 60% of traces collected from 12 Coreutils programs, then evaluated against the held-out traces of those programs, all traces of held-out Coreutils programs, and traces from Spectre, Meltdown, and the two CVE traces. The blue line reports the average detection rate across all data points using the 12 Coreutils dictionary with 2400 program traces. AUC=0.9954. classifier that admits many true positives and few false positives has a high AUC. Our results demonstrate that when trained on Linux utilities, Martini distinguishes them from other utilities, and all of our other benchmark security exploits and side-channel attacks powerfully, with an area-under-curve (AUC) of 0.9954. At 100% true positive rate, our smallest false positive rate was 4.4%.
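Given a per-trace anomaly score (e.g., the trace's mismatch rate against the trained dictionary), the ROC curve and AUC can be computed with standard tooling. The labels and scores below are hypothetical placeholders, not our experimental data.

from sklearn.metrics import roc_curve, roc_auc_score

labels = [0, 0, 0, 1, 1, 1]                    # 1 = trace that should raise an alarm
scores = [0.01, 0.02, 0.05, 0.40, 0.85, 0.90]  # higher = more anomalous

# Each (fpr, tpr) pair corresponds to one candidate threshold.
fpr, tpr, thresholds = roc_curve(labels, scores)
print("AUC =", roc_auc_score(labels, scores))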

Figure 6.14: Detection of anomalous, out-of-dictionary program execution (one plot per PARSEC benchmark; x-axis: block index; y-axis: mismatch rate, ×10⁻³). We trained a dictionary using our Coreutils subset of 16 Linux utilities. Then, we ran each of the PARSEC benchmarks to determine whether our approach would correctly identify such execution as anomalous. Because the PARSEC benchmarks are long-running, we broke their traces into blocks of 2.5 million memory accesses each. On each graph, the x-axis represents execution time in terms of these blocks, and the y-axis shows the mismatch rate, or the fraction of access sequences in each block that were not contained in the trained dictionary. The horizontal line represents a configurable threshold (set to 0.00068 in the figure). These plots suggest that we can use our approach, along with a simple counter for mismatches between memory access sequences and the trained dictionary, to determine when behavior is anomalous. In particular, across all benchmarks, we detect the anomalous execution within an average of 85,000 memory references. Using this simple thresholding approach, we can correctly identify all benchmarks as anomalous very early in their execution.

We also collected PARSEC traces to act as indicative long-running processes that could be considered anomalous with respect to our Coreutils dictionary. We test whether Martini detects anomalous behavior early, regardless of the trace size (i.e., detection soon after an attack rather than minutes later). For this experiment, PARSEC traces are split into blocks of 2.5 million memory accesses each, and a reasonable threshold is assigned. The results are shown in Figure 6.14. The x-axis of each graph shows execution time (in block index units) and the y-axis shows the mismatch rate. The programs generally fall into two categories. Some, like bodytrack or vips, consistently trigger alarms. Others, like fluidanimate or streamcluster, trigger alarms sporadically, likely due to coincidental overlap with address sequences in the dictionary. For example, blackscholes and facesim show identical behavior because these benchmarks shared a common helper binary that produced very similar memory access traces. Across all benchmarks, we correctly identified anomalous execution within an average of 85,000 memory references (minimum 20,000; maximum 360,000; standard deviation 48,400). For this benchmark suite, we first note that in each case, anomalous behavior is detected almost immediately (i.e., the blue bar crosses the red line very far to the left on each subgraph). Second, the overall performance is quite high: a 91.87% true positive rate and a 2.39% false positive rate.
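The block-based detection behind these plots reduces to a few lines: split the mismatch stream into fixed-size blocks, compute each block's mismatch rate, and flag the first block that exceeds the threshold. This sketch uses the parameters from Figure 6.14 and, for simplicity, treats windows and accesses interchangeably.

def first_anomalous_block(mismatches, block=2_500_000, threshold=0.00068):
    """Index of the first block whose mismatch rate exceeds the threshold.

    `mismatches` holds one boolean per window (True = not in dictionary);
    returns None if the trace never triggers an alarm.
    """
    for start in range(0, len(mismatches), block):
        chunk = mismatches[start:start + block]
        if sum(chunk) / len(chunk) > threshold:
            return start // block   # block index at which we detect
    return None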

Martini is able to distinguish anomalous and malicious program execution— including Spectre and Meltdown—with high accuracy. Further, Martini is capable of detecting abnormal behavior early in program execution.

6.7.5 Martini Evaluation Summary

In this section, we evaluated the Martini framework with respect to expressive power and performance. Our experiments provide evidence that our dictionary-based approach to modeling abstracted memory accesses provides sufficient expressiveness to differentiate programs. Further, Martini is able to quickly and accurately detect both malicious inputs to trained programs and unseen, malicious programs. Next, we evaluate our architecture for in-memory processing of DPDAs.

Table 6.5: Description of grammars

Language   Description            Token Types   Grammar Productions   Parsing Aut. States
Cool       Programming language   42            61                    147
DOT        Graph visualization    22            53                    81
JSON       Data interchange       13            19                    29
XML        Data interchange       13            31                    64

6.8 dpda processing engine evaluation

In this section, we evaluate ASPEN on real-world applications with indicative workloads. We focus on a study of parsing (one of our motivating applications from Section 2.5.1) to evaluate ASPEN with respect to expressive power, scalability, and performance. In particular, we are guided by the following research questions:

1. Does the underlying hDPDA computational model of ASPEN generalize to real-world parsers?

2. Do the multipop and epsilon merging optimizations improve performance?

3. What is the end-to-end performance improvement of ASPEN over state-of-the-art parsers?

Table 6.6: Grammar compilation results. Our optimizations reduce the number of epsilon states by an average of 65%.

Language   Optimizations    hDPDA States   Epsilon States   Avg. Compilation Time (sec)
Cool       None             3505           2733             0.88
           Multipop + Eps   1666           894              2.75
DOT        None             1690           1494             0.34
           Multipop + Eps   1062           866              0.98
JSON       None             764            619              0.16
           Multipop + Eps   461            316              0.50
XML        None             2068           1653             0.36
           Multipop + Eps   865            450              0.88

6.8.1 Parsing Generality

We first demonstrate compilation of four different languages: Cool, an object-oriented programming language;6 DOT, the language used by the GraphViz graph visualization tool [78]; JSON; and XML. We selected these benchmarks because grammar specifications (for either PLY or Bison) were readily available. Importantly, no modification to existing legacy grammars was necessary to support compilation to ASPEN. The architecture is general-purpose enough to support these diverse applications, and our prototype compiler supports a large class of existing parsers. Details for each of these languages, including the number of token types, the number of grammar rules, and the size of the parsing automaton, are provided in Table 6.5. Higher numbers of parsing automaton states (see Section 6.2) indicate more complex computation for determining which grammar production rule to apply. This complexity is related both to the number of token types and to the total number of productions in the grammar.

In Table 6.6, we present compilation statistics using our prototype compiler. We report the average time across ten runs of our compiler and optimizations. Compilation of all grammars, including optimization, takes well below 5 seconds, meaning that grammar compilation is not a significant bottleneck for ASPEN. With both our multipop and epsilon reduction optimizations enabled, we observe a 47% decrease, on average, in the number of states. The number of epsilon states is reduced by 65% on average. As noted in Section 6.2, reducing epsilon states reduces input stalls. Note that the numbers reported here are prior to placement and routing of the design for ASPEN. The final hDPDA may contain more states to reduce fan-in or fan-out complexity; however, the length of epsilon chains will neither increase nor decrease.

Next, we evaluate the performance of XML parsing using our compiled XML grammar. While we expect performance results to generalize, for space considerations we do not evaluate the other parsers in detail.

6 https://en.wikipedia.org/wiki/Cool_(programming_language)
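Returning to the compilation statistics, the 47% average reduction in total hDPDA states can be recomputed directly from Table 6.6:

# (unoptimized, optimized) hDPDA state counts from Table 6.6
states = {"Cool": (3505, 1666), "DOT": (1690, 1062),
          "JSON": (764, 461),   "XML": (2068, 865)}

reductions = [1 - opt / base for base, opt in states.values()]
print(f"average state reduction: {sum(reductions) / len(reductions):.0%}")
# -> 47%, matching the average reported above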

The hDPDA computational model is sufficiently rich to support parsing of common serialization formats, such as JSON and XML. Further, our multipop and epsilon merging optimizations reduce the number of epsilon states in compiled hDPDA by an average of 65%.

Figure 6.15: Performance and energy evaluation of ASPEN: (a) performance (in ns/kB) and (b) energy (in µJ/kB) on SAXCount for ASPEN, ASPEN-MP, Expat, and Xerces, with XML datasets grouped by markup density (low: < 30%, medium: 30-70%, high: > 70%).

6.8.2 XML Parsing Performance

Using the METIS graph partitioning framework, we find that the XML parser hDPDA (with optimizations) maps to 8 cache arrays and results in an LLC cache occupancy of 128 KB. Figure 6.15 compares ASPEN's performance and energy against Expat and Xerces on the SAXCount application (lower is better). We evaluate two DPDA configurations: (1) ASPEN-MP, which enables both the multipop and epsilon merging optimizations, and (2) ASPEN, which enables only epsilon merging. We group our XML datasets based on markup density, which is an indirect measure of XML document complexity. The performance of Expat and Xerces drops as the markup density of the input XML document increases, because complex documents tend to produce a large number of tokens for verification. ASPEN also sees a slight increase in runtime as markup density increases, but the dependence is less pronounced.

There is a noticeable trend in the performance and energy benefits of ASPEN-MP over ASPEN as markup density increases. As the density increases, tokens are generated more frequently, and ε-transition stalls are less likely to be masked by the tokenization stage of the pipeline. ASPEN-MP reduces the number of stalling cycles during parsing, thus improving performance for high markup densities. ASPEN-MP achieves a 30% improvement in both performance and energy over ASPEN. Overall, averaged across the datasets evaluated, ASPEN-MP takes 704.5 ns/kB and consumes 20.9 µJ/kB of energy. Compared with Expat, ASPEN-MP achieves a 14.1× speedup and a 13.7× energy saving. ASPEN-MP also achieves an 18.5× speedup and consumes 16.9× less energy than Xerces on SAXCount. Even after accounting for the idle power of the CPU core, XML parsing on ASPEN consumes 20.15 W, which is well within the TDP of the Xeon E5 processor (160 W). The low power consumption can be attributed to (1) the removal of the data movement and instruction processing overheads present in a conventional core, and (2) only a single bank of the cache being active in any processing cycle, due to the deterministic nature of the automaton, resulting in energy savings.
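The reported ratios also imply the baselines' absolute costs, which is a useful sanity check when comparing against other published numbers:

aspen_time = 704.5     # ns/kB, ASPEN-MP average
aspen_energy = 20.9    # µJ/kB, ASPEN-MP average

# Implied baseline costs from the reported speedup and energy ratios.
expat_time = 14.1 * aspen_time        # ~9,900 ns/kB
xerces_time = 18.5 * aspen_time       # ~13,000 ns/kB
expat_energy = 13.7 * aspen_energy    # ~286 µJ/kB

print(f"Expat ~{expat_time / 1000:.1f} µs/kB, "
      f"Xerces ~{xerces_time / 1000:.1f} µs/kB")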

ASPEN achieves a 14.1× speedup and 13.7× energy savings over Expat, a state-of-the-art XML parser. Further, our multipop optimization results in a 30% improvement in performance and energy over a baseline design of ASPEN.

6.8.3 ASPEN Evaluation Summary

In this section, we evaluated ASPEN, an architecture that repurposes a portion of the LLC to support high-performance processing of DPDAs. Our evaluation demonstrates that ASPEN provides sufficient expressive power to support parsing of popular data formats such as JSON and XML. Further, we demonstrate that our DPDA optimizations reduce the required hardware resources and improve runtime performance by at least 30%. Finally, ASPEN outperforms state-of-the-art XML parsers on representative workloads by a factor of at least 14×.

6.9 chapter summary

In this chapter, we provide additional architectural support for automata-based computation. As part of our hardware/software co-design approach to addressing the challenges of programming hardware accelerators, we strive to ensure that the underlying architectures support common applications. In particular, we focus on case studies of detecting security attacks and parsing data, two common and wide-reaching application domains that benefit from hardware acceleration.

Architectural side-channel attacks such as Spectre and Meltdown potentially impact billions of devices. There is a need for techniques that efficiently and precisely identify when such attacks occur. In this chapter, we present Martini, an algorithmic approach for leveraging the memory accesses of programs to classify behavior as authorized or anomalous. We also describe an implementation strategy using NFAs appropriate for in-cache computation and demonstrate that Martini accurately and precisely classifies benign and malicious program execution using a suite of Coreutils programs, PARSEC benchmarks, and four CVEs.

We also present ASPEN, a general-purpose, scalable, and reconfigurable memory-centric architecture that supports rich push-down automata processing for tree-like data. We design a custom data path that performs state matching, stack update, and transition routing using memory arrays. We also develop a compiler for transforming large classes of existing grammars to pushdown automata executable on ASPEN. Our evaluation against state-of-the-art CPU tools shows that our approach is general (supporting multiple languages and serialization formats), highly performant (up to 18.5× faster for parsing), and energy efficient (up to 16.9× lower energy for parsing). By providing hardware support for DPDAs, ASPEN brings the efficiency of recent automata acceleration approaches to a new class of applications.

Martini and ASPEN provide architectural support for the key application areas of security and data parsing, achieving improvements over the current state of the art. These architectures provide sufficient expressive power, performance, and scalability to support new applications. We successfully leverage automata abstractions in both architectures to bridge the gap between these high-level applications and the underlying hardware resources available in the LLCs of modern CPUs.

chapter 7

Conclusions

Data is being collected and analyzed at ever-increasing rates, and hardware component scaling trends have resulted in a shift toward the adoption of hardware accelerators in data processing pipelines. The state of support for these devices, however, is such that programming and debugging are difficult. In this dissertation, we leverage hardware/software co-design principles to develop a programming model built on automata abstractions to ease the adoption of hardware accelerators. We consider the thesis that:

Finite automata provide a suitable abstraction for bridging the gap between high-level programming models and maintenance tools familiar to developers and the low-level representations that execute efficiently on hardware accelerators.

We provide evidence in support of this thesis by evaluating prototype tools leveraging the following insights. First, finite automata computation applies to a broad class of applications, including natural language processing [267], network security [184], graph analytics [183], high-energy physics [240], bioinformatics [185, 186, 220], pseudo-random number generation and simulation [232], data mining [238, 239], and machine learning [221]. Second, the compact state of finite automata enables efficient inspection and capture of relevant program state on hardware accelerators for debugging tasks. Third, finite automata map naturally to, and are performant on, reconfigurable architectures [75, 209, 252]. Finally, techniques from machine learning [10], software engineering [33, 37], and formal methods [31, 222] can be combined to aid in the design of automata that bridge the gap between high-level languages and low-level hardware.

7.1 summary of contributions

This dissertation makes five primary contributions: two front-end programming interfaces, an interactive debugger, and two automata-derived architectures. We briefly summarize each.

In Chapter 3, we develop AutomataSynth, a framework for porting certain classes of legacy source code to execute on hardware accelerators, with a focus on FPGAs. AutomataSynth is a fundamentally new approach to solving this problem and overcomes some of the von Neumann-centric limitations of current high-level synthesis techniques. We develop a novel combination of state machine learning algorithms, software verification algorithms, string decision procedures, and high-performance automata processing architectures to learn behaviorally equivalent FPGA hardware descriptions of functions deciding regular expressions. For functions with more expressive power, AutomataSynth will produce an approximate solution that is correct for all inputs shorter than some fixed bound. Using a benchmark suite of real-world string functions mined from open-source repositories, we find that AutomataSynth constructs equivalent (or near-equivalent) hardware descriptions for more than 80% of benchmarks.

Next, we consider the problem of writing new software for hardware accelerators in Chapter 4. We develop a new, high-level programming language, RAPID, which allows developers to concisely write pattern searches over streams of input data. RAPID extends the standard syntax of an imperative programming language with three domain-specific control structures. RAPID programs compile to a set of finite automata, which can then be executed efficiently across a plethora of hardware platforms. We describe mechanisms for executing RAPID programs on FPGAs, Micron's D480 AP, CPUs, and GPUs. We also provide empirical evidence that automata enable portable programming, producing fewer performance inversions across architectures than OpenCL. Using a suite of real-world applications taken from recent publications, we demonstrate that RAPID programs are significantly more compact than hand-crafted and optimized automata while maintaining similar performance and hardware resource characteristics.

We then tackle the challenge of debugging applications executing on hardware accelerators in Chapter 5. For this dissertation, we restrict attention to RAPID programs executing on Micron's AP and FPGAs. By leveraging the automata-based intermediate representation of the RAPID language, we develop (1) a compilation strategy that constructs a mapping from automata states to statements and expressions in the RAPID program, (2) mechanisms to automatically generate logic on FPGAs and repurpose hardware on the AP to extract the set of currently active automata states on any given clock cycle, and (3) an interactive debugger that combines the previous two developments, thereby allowing a developer to set breakpoints in RAPID programs, pause and single-step RAPID programs, and inspect program variables. Using a suite of common automata processing applications, we find that our debugging hardware requires significant additional hardware resources but maintains around 80% of clock frequencies, thus supporting high-performance processing during debugging tasks. Additionally, a human subjects study of 61 programmers, each debugging ten RAPID applications, found a statistically significant improvement in fault localization accuracy when using our debugger (p = 0.013).

Finally, in Chapter 6, we develop two new automata-based architectures to support computer security and data parsing applications. The recent discovery of hardware bugs in Intel CPUs leaves millions of computers vulnerable to attacks that can leak critical system and user information. We design Martini, a novel system integration of an automata processing core that monitors memory accesses to detect such hardware attacks. We demonstrate that the automata abstraction enables real-time monitoring with sufficient fidelity to detect recent hardware attacks as well as traditional, software-based attacks. In addition to security concerns, recent trends in data processing mean that significant quantities of structured data must be parsed and analyzed. We design ASPEN, a new in-memory automata accelerator that supports parsing of recursively nested and tree-structured data. We observe that deterministic pushdown automata (DPDA) provide a suitable abstraction for representing many common serialization formats. We develop a new, five-stage data path for executing DPDA in a tightly coupled data processing loop with a cache automaton (for tokenization) and CPU cores (for later stages of the processing pipeline). Our evaluations of Martini and ASPEN demonstrate that our new architectures (1) can detect attacks with high accuracy and (2) can parse serialized data an average of 14× more quickly than the current state of the art. We also find that the reduction in cache capacity needed to implement these architectures has a statistically negligible impact on runtime performance.

Taken collectively, these contributions represent a programming model that can help ease the adoption and use of hardware accelerators for data analysis applications, while also supporting high-performance computation. We argue that the work presented in this dissertation satisfies the requirements for a suitable programming model laid out in Section 1.1:

• performance and scalability. Programs written in RAPID, as well as hardware descriptions produced by AutomataSynth, maintain the performance of hardware accelerators by introducing minimal overhead. Additionally, our proposed languages, tools, and architectures scale to support real-world applications.

• ease of use. The RAPID debugger improves developers’ accuracy in localizing common bugs. Further, RAPID extends a familiar programming language with constructs that are natural for pattern-matching applications. Critically, our model allows developers to reuse and port existing code with AutomataSynth.

220 • expressive power. Both AutomataSynth and RAPID provide the full expressive power of the underlying automata abstraction, which has been demonstrated to be sufficient for many applications. We also develop a hardware data path to support parsing of tree-structured data, a more expressive computational model.

• legacy support. AutomataSynth explicitly supports porting legacy applications to hardware accelerators, such as FPGAs. Additionally, the compiler we develop for ASPEN allows developers to use existing parser grammar descriptions.

7.2 a look to the future

While the work presented in this dissertation provides improved programming support for hardware accelerators, significant room remains for understanding and characterizing the design space as well as for supporting even broader classes of applications. Throughout this thesis, we have noted open challenges and future directions; we provide a high-level summary here.

We believe that programming techniques that compile or transform programs written in existing languages to current commodity hardware accelerators (e.g., FPGAs) can have high impact. For techniques that adopt a model learning approach, such as AutomataSynth in Chapter 3, we identified several open challenges. In this dissertation, we chose to learn models represented as finite automata, but there is significant opportunity to study more expressive models as well as the principled application of approximation to this problem. Further, we exposed new opportunities for string solver optimizations and richer constraint languages. Improvements to the tooling that supports our approach can result in significantly improved performance, applicability, and scalability.

While developing debugging support for FPGAs in Chapter 5, we observed that hardware resource overheads can be nontrivial. This presents new opportunities for optimized logic design as well as potential for new FPGA architectures. Open challenges on the hardware side include dynamic selection of signals to monitor, decoupling of clocks between debugging and runtime hardware, and dedicated signal monitoring hardware to reduce overheads. On the software side, efforts to analyze programs to minimize capture and inspection of program state, as well as support for more sophisticated debugging features (e.g., watchpoints), can further improve debugging on hardware accelerators. There is a need for additional scientific understanding of the human factors associated with debugging on these new platforms. Better understanding of how developers use and perceive tools can help guide the development and design of future iterations of programming support.

Automata-derived architectures have been the subject of significant study in recent years. In particular, there has been interest in embedding such architectures in the memory of a system, and we presented two such architectures in Chapter 6. The work presented in this dissertation represents only an initial foray into understanding the implications of such a system design. There may be many applications that this design approach renders tractable or accelerates. Further, this dissertation presented an initial analysis of the whole-system performance impact of reducing cache capacity to embed an automata processing core. However, a systematic exploration of the design space (including prototyping) could reveal key performance trade-offs, improvements, and impacts.

Table 7.1: Major Publications Supporting This Dissertation

Venue            Title
ASPLOS '16       RAPID Programming of Pattern-Recognition Processors [15]
ASPLOS '19       Debugging Support for Pattern-Matching Languages and Accelerators [49]
ASPLOS '20       Accelerating Legacy String Kernels via Bounded Automata Learning [14]
CAL '18          MNCaRT: An Open-Source, Multi-Architecture Automata-Processing Research and Execution Ecosystem [12]
MICRO 51         ASPEN: A Scalable In-SRAM Architecture for Pushdown Automata [17]
TPDS '19         Portable Programming with RAPID [13]
In Preparation   Martini: Memory Access Traces to Detect Attacks

7.3 final remarks

As the adoption of hardware accelerators into data processing pipelines continues to grow, we believe that the work in this dissertation represents only a subset of the exciting area of programming support for emerging technologies. The primary findings of our work have been (or are currently being) published in prominent computer architecture, programming languages, and parallel and distributed systems research venues and journals. Table 7.1 provides a summary of these manuscripts in chronological order. As we consider the results of our research efforts, it is quite phenomenal that the lowly finite automaton, often overlooked due to its perceived limited expressive power, has such a profound and broadly applicable benefit for bridging the gap between high-level programming languages and the low-level resources of many hardware accelerators. Researchers are only just beginning to understand how we might leverage such abstractions to ease the adoption of accelerators, and we believe there is a bright and bountiful future in their continued study.

appendix a

MNCaRT: An Open-Source, Multi-Architecture Automata-Processing Research and Execution Ecosystem

Years of research and development have resulted in high-throughput automata processing architectures and software engines [75, 79, 111, 133, 204, 218, 235, 241]. This has led to the discovery of new use cases and application domains for finite automata, such as natural language processing [267], network security [184], graph analytics [183], high-energy physics [240], bioinformatics [185, 186, 220], pseudo-random number generation and simulation [232], data mining [238, 239], and machine learning [221].

Unfortunately, the software frameworks for the construction, manipulation, and translation of automata are frustratingly fractured (e.g., they have inconsistent serialization formats) and restrictively licensed (e.g., Micron licenses a comprehensive SDK, but it is closed-source and specifically targets their D480 Automata Processor, or AP [75]). While these tools are useful for developing applications for the AP, they do not allow researchers to easily evaluate designs across hardware platforms, such as CPUs, GPUs, and FPGAs. The tools also cannot be easily extended to support new architectures and automata paradigms. Instead, a general and extensible framework is needed to enable the development of platform-independent applications and to support experimental automata designs.

Therefore, we have developed a suite of tools for creating, manipulating, and executing finite automata, which we refer to as MNCaRT (the MNRL Network Computation and Research Testbed, pronounced "minecart").1 MNCaRT collects a diverse set of automata processing tools and algorithms into a central location and will grow as new tools are developed. We currently provide support for compiling state machines from Perl-compatible regular expressions (PCRE) to automata, high-speed execution of NFAs and DFAs using Intel Hyperscan [111], and optimization and simulation of experimental automata designs with the Virtual Automata Simulator (VASim) [235]. Further, we provide back-ends for executing on GPUs [231], FPGAs [252], and the AP [75]. Finally, we allow users to explore routing constraints for experimental spatial architectures via the Automata-to-Routing (ATR) tool [234].

To support our ecosystem, we have created MNRL, the MNRL Network Representation Language (pronounced "mineral"), a JSON-based, open-source language to support the development of, and experimentation with, new automata-based applications and architectures. MNRL allows a user to define a network (or collection) of MNRL nodes, which represent the states within automata. Each node stores configuration information (such as node type, name, etc.) and connections to other nodes within the network. The language specification is general, allowing state machines other than finite automata to be represented. We provide initial definitions for traditional finite automata states, homogeneous states, up-counters, and Boolean logic in the MNRL specification; additional node types may be defined by the user for specific applications. Both MNRL and the tools in MNCaRT are publicly available (typically under BSD licenses), allowing both academics and industry experts to contribute to, and use, the ecosystem.

1 https://github.com/kevinaangstadt/mncart

In summary, this appendix presents the following:

• MNCaRT, a comprehensive repository of compatible tools for developing and experimenting with automata processing on CPUs, GPUs, and FPGAs.

• MNRL, an extensible, open-source JSON specification for representing state machines.

• Python and C++ APIs for reading, creating, manipulating, and writing MNRL files.

• Extensions to Intel’s Hyperscan PCRE engine, supporting compilation to and execution of MNRL files.

• An extended version of VASim, which supports reading and writing of MNRL files.

a.1 mnrl: a new automata language

We have developed MNRL, an extensible, open-source automata representation language, which allows for the topological specification of a collection of finite state machines using JSON syntax. While JSON is supported by most common general-purpose programming languages, we provide C++ and Python bindings to support additional validation checks. It is important to note that the MNRL format specifies the layout of a machine but does not specify how elements behave, allowing many types of state machines to be represented, including traditional NFAs [199] and homogeneous NFAs [48].2 Behavior is left for the execution engine to specify and implement, allowing MNRL to be an extremely flexible file format. Therefore, MNRL is similar in intent to the Unified Modeling Language (UML), in which developers describe and design software systems while eliding implementation details [85].

2 MNRL is general enough to represent more powerful machines (e.g., push-down automata, cellular automata, and Turing machines).

a.1.1 MNRL Format

A MNRL file contains a single MNRL network: a collection of one or more state machines that are executed in parallel using the same input. The file holds an array of MNRL nodes, which define each element in the network. A node consists of:

• A unique identifier

• A node type (state, homogeneous state, up counter, boolean, etc.)

• How the node is enabled


228 • Whether the node reports (generates an output signal) when activated

• An array of input ports, each with a unique ID and specified width (number of wires)

• An array of output ports, each with a unique ID, specified width, and list of connected nodes

• Custom attributes, specific to each element type

A developer can encode the topological layout of the state machines within the network and specify the sort of behavior the underlying execution engine should assign to each node. The implementation of behavior is not defined in the MNRL file; instead, the computation engine that processes a MNRL network is responsible for specifying the semantics for each node type. Therefore, node types and execution engines are typically co-designed. If an engine needs information (e.g., symbol sets for matching against an input stream) to process a node, this configuration can be embedded in a MNRL node's attributes. For the standard node types, we have specified additional attributes to support their respective expected behaviors.

We provide the specification of MNRL as a JSON schema [112], which allows for validation of file syntax. The MNRL schema defines four node types: standard automata states (state), homogeneous automata states (hState), saturating up-counters (upCounter), and combinatorial logic (boolean). Custom attributes for each of these node types are described in Table A.1. Each of these node types defines a reportId attribute, which allows an additional string or integer to be associated and returned with any reporting event during execution. MNRL states and hStates map directly to notions of NFA states and homogeneous NFA states. We provide the upCounter and boolean node types to maintain compatibility with Micron's D480 Automata Processor [75]; however, these element types are general, and similar elements are in use in other engines and automata styles [29, 166].

Additionally, the MNRL schema defines four valid modes for enabling a node, which are similar to those used by both the AP [75] and Intel Hyperscan [111]. An enabled node performs a predefined computation on a given clock cycle. These modes are described in Table A.2.

1  {
2      "id": "0t_15l_5r",
3      "type": "hState",
4      "enable": "onActivateIn",
5      "report": true,
6      "inputDefs": [
7          {
8              "width": 1,
9              "portId": "i"
10         }
11     ],
12     "outputDefs": [
13         {
14             "width": 1,
15             "activate": [],
16             "portId": "o"
17         }
18     ],
19
20     "attributes": {
21         "reportId": 5,
22         "latched": false,
23         "symbolSet": "[\\xFF]"
24     }
25 }

Listing A.1: Sample MNRL homogeneous hState node. The node is enabled (performs computation) only after an incoming edge is active (line 4), and this node matches against the input character \xFF (line 23). When this occurs, the node generates a report signal (line 5). Lines 6-11 define a single input port for incoming edges. Lines 12-18 define a single output port for outgoing edges. The array on line 15 is empty, indicating that there are no outgoing edges.

Table A.1: Custom Attributes for MNRL Node Types

state:
• symbolSet (required, object): Mapping from each output port name to a symbol set string representing the matched character set that enables the outgoing connections from the given port
• latched (optional, Boolean): Determines whether a state remains enabled after the first enable signal

hState:
• symbolSet (required, string): Represents the matched character set that enables the outgoing connections
• latched (optional, Boolean): Determines whether a state remains enabled after the first enable signal

upCounter:
• threshold (required, number): The internal value at which the counter enables outgoing connections
• mode (required, enum): "trigger" enables the outgoing connections for one clock cycle when the threshold is reached; "high" enables the outgoing connections for all subsequent clock cycles while the internal value is at the threshold; "rollover" is similar to trigger, but also resets the internal value

boolean:
• gateType (required, enum): Must be one of the following values: 'and', 'or', 'nor', 'not', or 'nand'

MNRL states and hStates map directly to notions of NFA states and homogeneous NFA states. We provide upCounter and boolean node types to maintain compatibility with Micron's D480 Automata Processor [75]; however, these element types are general, and similar elements are in use in other engines and automata styles [29, 166]. Additionally, the MNRL schema defines four valid modes for enabling a node, which are similar to those used by both the AP [75] and Intel Hyperscan [111]. An enabled node performs a predefined computation on a given clock cycle. These modes are described in Table A.2.

Table A.2: Modes for Enabling MNRL Nodes

• always: The node is enabled on every clock cycle
• onActivateIn: The node is enabled on the clock cycle following a high signal to an input port
• onStartAndActivateIn: The node is enabled on the first clock cycle and then follows the "onActivateIn" mode
• onLast: The node is only enabled for the final clock cycle
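To complement Listing A.1, the following is a minimal sketch that assembles an upCounter node in Python and serializes it to MNRL's JSON syntax. This is an illustration rather than output from the MNRL APIs: the identifier, port names, and attribute values are hypothetical, while the field structure follows Listing A.1 and the attributes follow Tables A.1 and A.2.

import json

# A hypothetical saturating up-counter node. The id, port names, and
# attribute values are illustrative only; the structure mirrors Listing A.1.
up_counter = {
    "id": "count_to_16",
    "type": "upCounter",
    "enable": "onActivateIn",          # enable mode from Table A.2
    "report": True,
    "inputDefs": [
        {"portId": "i", "width": 1}    # single-wire count input
    ],
    "outputDefs": [
        {"portId": "o", "width": 1, "activate": []}  # no outgoing edges here
    ],
    "attributes": {
        "reportId": 7,     # returned with any reporting event
        "threshold": 16,   # internal value at which outgoing edges enable
        "mode": "trigger"  # enable outgoing edges for one cycle at threshold
    }
}

print(json.dumps(up_counter, indent=2))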

a.1.2 Extending the MNRL Schema

MNRL is designed to be extensible, enabling research on new, custom automata functionality and allowing researchers to quickly define custom attributes for new node types. Because custom node types become part of the JSON schema, prototype extensions to the MNRL format can still be statically checked with minimal effort from the developer. The MNRL file format could easily be extended to support additional node types, such as non-deterministic counters [29] and stacks (to support push-down automata), as sketched below. Because MNRL supports variable-width ports, it is also possible to represent elements that share more than a single bit of data with elements downstream.
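As a concrete illustration, the sketch below instantiates a hypothetical custom node type for a push-down automaton stack. The pdaStack type name, its attributes, and the 8-bit port are inventions for this example and are not part of the released MNRL schema; an execution engine co-designed with the type would assign its semantics, and the JSON schema would be extended so such nodes can still be validated.

import json

# A hypothetical custom node type: a push-down automaton stack element.
# The "pdaStack" type and its attributes are illustrative assumptions.
stack_node = {
    "id": "stack_0",
    "type": "pdaStack",                # custom type added to the schema
    "enable": "onActivateIn",
    "report": False,
    "inputDefs": [
        {"portId": "i", "width": 8}    # variable-width port: a full byte
    ],
    "outputDefs": [
        {"portId": "top", "width": 8, "activate": []}  # shares a byte downstream
    ],
    "attributes": {
        "pushSymbolSet": "[a-z]",      # hypothetical: symbols that push
        "popSymbolSet": "[A-Z]"        # hypothetical: symbols that pop
    }
}

print(json.dumps(stack_node, indent=2))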

a.2 the mncart ecosystem

Our goal with the MNRL language is to enable the development of a rich, vibrant ecosystem of compatible tools for manipulating and executing automata. We are collecting these tools in an umbrella repository, the MNRL Network Computation and Research Testbed (or MNCaRT). By keeping tools catalogued in a single location, we hope to maintain the interoperability of tools and reduce fracturing in the ecosystem. We also provide a Linux container configured to use all of the MNCaRT tools (https://hub.docker.com/r/kevinaangstadt/mncart). Figure A.1 describes the interaction between tools provided with MNCaRT. Our ecosystem supports workflows beginning with high-level languages, such as PCRE, and ending with execution on CPUs, GPUs, and FPGAs. We also support execution on Micron's Automata Processor via conversion to ANML. Additionally, we provide compatible benchmarks for testing and experimenting with tools in MNCaRT. In this section, we briefly describe the tools that make up the initial release of MNCaRT.


[Figure A.1 depicts the MNCaRT ecosystem: high-level languages (PCRE, RAPID) and benchmarks (ANMLZoo) feed the state machine representations (MNRL, with ANML as an alternate); analysis, transformation, and compilation tools (hscompile, REAPR, VASim, ATR, Automata Lab) connect these representations to execution engines on CPUs (Hyperscan, VASim), GPUs (DFAGE, iNFAnt2), and FPGAs.]

Figure A.1: Tools supplied as part of MNCaRT. These fall into four categories: front-end representations (both high-level and representation languages), benchmarks, transformation and compilation tools, and hardware and software execution engines. While ANML is not officially part of MNCaRT, we indicate where this alternate representation falls within the ecosystem using dashed lines.

a.2.1 High-Level Languages

Our framework supports programming models that represent pattern searches at a higher level of abstraction.

We compile PCRE to MNRL files using Intel Hyperscan's parsing and compilation routines [111]. Hyperscan is an open-source, high-performance regular expression processing library supported by Intel. The tool returns a graph representation of the compiled state machine, which we traverse to generate a MNRL file. Our regular expression compiler (pcre2mnrl) reads a file of regular expressions and compiles the given set of patterns to a single MNRL file. The line number of each given PCRE pattern is used as the report ID to allow for easy identification of matched patterns in processing output (sketched below).

RAPID is a high-level programming language for execution of sequential pattern-matching applications (see Chapter 4 for a full introduction to the language). This C-like language is extended with three keywords to support parallel matching of patterns against a single data stream as well as sliding window pattern recognition. The RAPID compiler can emit MNRL files, allowing for high-level programming within the MNCaRT ecosystem.
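The report-ID convention used by pcre2mnrl is simple enough to sketch in a few lines of Python. This mimics the behavior described above rather than reproducing the tool's implementation; the file name is hypothetical.

# Sketch of the pcre2mnrl report-ID convention: the line number of each
# pattern in the input file becomes the report ID, so a match report can
# be traced back to the pattern that produced it.
with open("patterns.regex") as f:
    report_ids = {line_no: pattern.strip()
                  for line_no, pattern in enumerate(f, start=1)}

# A report with reportId 2 during execution corresponds to the pattern
# on line 2 of patterns.regex.
print(report_ids)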

a.2.2 Benchmarks

The ANMLZoo benchmark suite contains a diverse set of automata applications and associated input stimuli [231]. Applications range from configurable, synthetic benchmarks to algorithms not easily represented by regular expressions and can therefore demonstrate vastly different execution characteristics. We have generated MNRL for all benchmarks in the suite.

a.2.3 Analysis, Transformation, and Compilation

hyperscan compilation. We provide an extension to Hyperscan (hscompile) that parses MNRL files and compiles the finite automata to a serialized Hyperscan pattern database, allowing offline compilation. Additionally, our tool serializes a mapping from MNRL node IDs and report IDs to Hyperscan's internal naming for each state machine element. This mapping enables human-readable output when processing input data using Hyperscan.

vasim. We have extended VASim [235] to support parsing of MNRL files. VASim is a general-purpose framework for automata simulation, optimization, transformation, and performance modeling. The tool enables prototyping, debugging, simulation, and analysis of automata-based applications and architectures. Additionally, VASim can parse Micron ANML files, allowing for conversion between ANML and MNRL. VASim also provides a common codebase for applying state-of-the-art optimizations, transformations, and static and dynamic analyses to finite automata. This platform allows researchers to easily and quickly share new algorithms and perform fair apples-to-apples comparisons to prior work, accelerating automata processing research. We provide several optimizations in the core of VASim, including common prefix merging [28] and a literal matching engine [111].

automata lab. Automata Lab is a web-based graphical environment for visualizing, editing, and simulating finite automata [132]. The tool uses VASim to manipulate automata, and the resulting state machines are displayed graphically, allowing for user interaction. Users may upload MNRL files or choose from applications in the ANMLZoo benchmark suite.

reapr. We adapt REAPR [252], a tool for generating highly efficient FPGA automata accelerator kernels, to support MNRL. The tool generalizes prior work [75, 196, 255] to be applicable for automata processing applications other than just regular expressions. Hardware automata accelerator engines such as REAPR take advantage of the one-to-one mapping between the spatial distribution of automaton states and hardware resources such as lookup tables (LUTs), block RAM (BRAM), and wires. In REAPR, there are two main types of RTL elements to consider: 1) the state transition element (STE), which contains state activation information and transition logic; and 2) the wiring between all of the STEs in the automaton. Von Neumann automata engines iterate over every active STE and check whether the current input symbol will activate outgoing transition(s). If so, the next cycle's activation state is updated with the list of STEs that the current state affects. In an FPGA circuit generated by REAPR, STEs that affect each other are physically connected with wires, and if a single STE has multiple incoming transitions, they are combined in an OR gate so that any incoming transition can change the activation state of an STE. A sketch of this contrast appears below.

automata-to-routing. We extend the Automata-to-Routing (ATR) [234] tool to support placement and routing of MNRL state machines. ATR utilizes the Versatile Place and Route (VPR) tool to model spatial automata-processing architectures [32]. We use VASim to emit VPR-readable circuits of MNRL networks and provide guidance to construct custom, parameterizable, spatial architecture description files to accept these custom state machine circuits. ATR is thus capable of modeling spatial architectures that are purpose-built to accept MNRL state machines.
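The contrast between the two execution styles can be sketched as follows. This is an illustrative simulation loop standing in for a Von Neumann engine, assuming each STE carries a matching symbol set and a list of outgoing STEs; it is not REAPR's implementation.

# Sketch of the Von Neumann-style loop described above: iterate over every
# active STE, check whether the input symbol matches, and collect the STEs
# enabled for the next cycle. In a REAPR-generated circuit this loop
# disappears: transitions become physical wires, and multiple incoming
# transitions to an STE are merged with an OR gate.
def step(active, symbol, symbol_set, out_edges, reporting):
    nxt = set()
    for ste in active:
        if symbol in symbol_set[ste]:   # input symbol activates this STE
            if ste in reporting:
                print("report from STE", ste)
            nxt.update(out_edges[ste])  # enable downstream STEs next cycle
    return nxt

# Hypothetical two-STE machine matching "ab"; STE 1 reports.
symbol_set = {0: {"a"}, 1: {"b"}}
out_edges = {0: [1], 1: []}
active = {0}  # STE 0 is enabled at the start of input
for ch in "ab":
    active = step(active, ch, symbol_set, out_edges, reporting={1})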

a.2.4 Execution Engines

hyperscan cpu engine. We provide a tool (hsrun) for processing MNRL files against an input stream using the Hyperscan execution core. This tool deserializes the Hyperscan pattern database and node mapping produced by hscompile. The tool then scans the given input file against the database and prints out human-readable reporting information (e.g. MNRL ID and input stream offset). If multiple compiled MNRL files and/or input files are passed to hsrun, the tool will execute all pairings of the files using a supplied number of threads.

vasim cpu engine. In addition to support for transformation and analysis of finite automata, VASim supports simulation of a diverse set of finite automata models. While Hyperscan achieves higher throughput, VASim's modular design allows for quick prototyping to test new automata elements and designs, such as those including custom compute units.

fpga engine. In addition to generating hardware NFA kernels, REAPR can also generate a full platform execution environment for certain automata applications. The REAPR platform has been demonstrated to offer up to 183× speedup over best-effort CPU implementations [252]. We are also actively developing a general-purpose reporting architecture to support execution environments for all automata kernels.

gpu engines (dfage and infant2). MNCaRT contains both a GPU-based DFA engine (DFAGE) and NFA engine (iNFAnt2). The NFA engine was described previously by Wadden et al. [231]; we therefore focus on describing DFAGE here. Use of DFAGE first requires compilation to one or more DFAs using VASim. Note that the compilation process is performed offline by the CPU. Often, compiling to a single DFA is inefficient. Therefore, users may partition rulesets into several DFAs, and each DFA consists of a state transition table and an acceptance vector. State transition tables corresponding to different DFAs are stored consecutively in the GPU's global memory. The same layout is applied for acceptance vectors. It should be noted that each transition table is represented by a 2-D array containing the next state identifiers for every pair of current state identifier and input symbol. Similar to previous implementations, our DFA matching engine supports multi-packet processing to take advantage of the extreme parallelism of GPU architectures. Input packets also reside in the GPU's global memory. Workloads are mapped to a 2-D grid of threads. Similar to Yu et al. [261], different packets are mapped to different blocks on the x-dimension of the grid. Each thread within the block processes a different DFA for the assigned packet. However, for large datasets in our benchmark suite where the number of DFAs can exceed the block size, different blocks on the y-dimension of the grid will also be used.
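The table layout and work mapping described above lend themselves to a short sketch. The nested Python loops below stand in for DFAGE's 2-D grid of GPU threads; the alphabet, transition tables, and packets are invented for illustration.

# Each DFA is a 2-D transition table indexed by (current state, input
# symbol) plus an acceptance vector. Packets map to blocks on the grid's
# x-dimension; each thread in a block walks a different DFA.
SYMBOLS = {"a": 0, "b": 1}
transition_tables = [
    [[1, 0], [1, 0]],   # DFA 0: accepts strings ending in 'a'
    [[0, 1], [0, 1]],   # DFA 1: accepts strings ending in 'b'
]
accepting = [
    [False, True],      # DFA 0 accepts in state 1
    [False, True],      # DFA 1 accepts in state 1
]
packets = ["abab", "bbaa"]

for pkt_id, packet in enumerate(packets):               # blocks (x-dimension)
    for dfa_id, table in enumerate(transition_tables):  # threads within a block
        state = 0
        for ch in packet:
            state = table[state][SYMBOLS[ch]]           # 2-D table lookup
        if accepting[dfa_id][state]:
            print(f"packet {pkt_id}: DFA {dfa_id} accepts")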

a.3 appendix summary

MNRL is a general and extensible format for representing state machines. The language specification and associated tools are released with open-source licenses to promote collaboration and usage within both academia and industry. MNRL is supported by general-purpose programming languages because it is based on the JSON format. Further, we provide MNRL-specific APIs for Python and C++ to perform more direct manipulation and validation of networks. MNRL is a component of MNCaRT, a suite of tools for analyzing, executing, and transforming automata processing applications. We support execution of MNRL networks on CPUs, GPUs, and FPGAs, and we provide a workflow for execution on Micron's AP. Support for high-level pattern-matching languages, such as PCRE and RAPID, is also provided as part of MNCaRT. Finally, we allow for design space exploration through analysis functionality in the VASim and ATR tools.

In this dissertation, we leverage aspects of MNRL and MNCaRT for each of our main contributions. We use the MNRL format internally and as the output from AutomataSynth in Chapter 3, as output of the RAPID compiler in Chapter 4, and in an extended form to represent hDPDA in Chapter 6. We employ several resources from MNCaRT to evaluate our FPGA-based debugging framework in Chapter 5, including the ANMLZoo benchmark suite and a customized version of REAPR. Further, research challenges encountered representing non-traditional finite automata (e.g., hDPDA in Chapter 6) informed the design of MNRL. In summary, the success of the research detailed in this dissertation was significantly supported by the development of MNRL and MNCaRT.

bibliography

[1] 2017 IEEE International Workshop on Program Debugging (IWPD), Symposium on Software Reliability Engineering Workshops, ISSRE Workshops. IEEE Computer Society. Toulouse, France: IEEE, 2017.
[2] F. Aarts, J. De Ruiter, and E. Poll. “Formal Models of Bank Cards for Free.” In: Sixth International Conference on Software Testing, Verification and Validation Workshops. 2013, pp. 461–468.
[3] Alfred V. Aho and Margaret J. Corasick. “Efficient String Matching: An Aid to Bibliographic Search.” In: Commun. ACM 18.6 (June 1975), pp. 333–340.
[4] Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools (2nd Edition). USA: Addison-Wesley Longman Publishing Co., Inc., 2006.
[5] Gustavo Alonso. “FPGAs in Data Centers.” In: Queue 16.2 (Apr. 2018), 60:52–60:57.
[6] R. Alur, R. Bodik, G. Juniwal, M. M. K. Martin, M. Raghothaman, S. A. Seshia, R. Singh, A. Solar-Lezama, E. Torlak, and A. Udupa. “Syntax-guided synthesis.” In: Formal Methods in Computer-Aided Design. 2013, pp. 1–8.
[7] Rajeev Alur, Pavol Černý, P. Madhusudan, and Wonhong Nam. “Synthesis of Interface Specifications for Java Classes.” In: Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. Long Beach, California, USA, 2005, pp. 98–109.
[8] H. Angepat, G. Eads, C. Craik, and D. Chiou. “NIFD: Non-intrusive FPGA Debugger – Debugging FPGA ‘Threads’ for Rapid HW/SW Systems Prototyping.” In: International Conference on Field Programmable Logic and Applications. 2010, pp. 356–359.
[9] Dana Angluin. “A note on the number of queries needed to identify regular languages.” In: Information and Control 51.1 (1981), pp. 76–87.
[10] Dana Angluin. “Learning Regular Sets from Queries and Counterexamples.” In: Information and Computation 75.2 (Nov. 1987), pp. 87–106.
[11] Dana Angluin. “Computational Learning Theory: Survey and Selected Bibliography.” In: Symposium on Theory of Computing. 1992, pp. 351–369.

[12] K. Angstadt, J. Wadden, V. Dang, T. Xie, D. Kramp, W. Weimer, M. Stan, and K. Skadron. “MNCaRT: An Open-Source, Multi-Architecture Automata-Processing Research and Execution Ecosystem.” In: IEEE Computer Architecture Letters 17.1 (2018), pp. 84–87.
[13] K. Angstadt, J. Wadden, W. Weimer, and K. Skadron. “Portable Programming with RAPID.” In: IEEE Transactions on Parallel and Distributed Systems 30.4 (2019), pp. 939–952.
[14] Kevin Angstadt, Jean-Baptiste Jeannin, and Westley Weimer. “Accelerating Legacy String Kernels via Bounded Automata Learning.” In: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS ’20. Lausanne, Switzerland: ACM, 2020.
[15] Kevin Angstadt, Westley Weimer, and Kevin Skadron. “RAPID Programming of Pattern-Recognition Processors.” In: Architectural Support for Programming Languages and Operating Systems. 2016, pp. 593–605.
[16] Kevin Angstadt, Jack Wadden, Westley Weimer, and Kevin Skadron. MNRL and MNCaRT: An Open-Source, Multi-Architecture State Machine Research and Execution Ecosystem. Tech. rep. CS2017-01. University of Virginia, 2017.
[17] Kevin Angstadt, Arun Subramaniyan, Elaheh Sadredini, Reza Rahimi, Kevin Skadron, Westley Weimer, and Reetuparna Das. “ASPEN: A Scalable in-SRAM Architecture for Pushdown Automata.” In: Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-51. Fukuoka, Japan: IEEE Press, 2018, pp. 921–932.
[18] Apache Software Foundation. Xerces C++ XML Parser. http://xerces.apache.org/xerces-c/.
[19] Krzysztof R. Apt, Jacob Brunekreef, Vincent Partington, and Andrea Schaerf. Alma-0: An Imperative Language That Supports Declarative Programming. Tech. rep. 1997.
[20] Krste Asanović et al. The Landscape of Parallel Computing Research: A View from Berkeley. Tech. rep. UCB/EECS-2006-183. EECS Department, University of California, Berkeley, 2006.
[21] Z. K. Baker and J. S. Monson. “In-situ FPGA Debug Driven by On-Board Microcontroller.” In: 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines. 2009, pp. 219–222.

[22] Thomas Ball, Andreas Podelski, and Sriram K. Rajamani. “Relative Completeness of Abstraction Refinement for Software Model Checking.” In: Tools and Algorithms for the Construction and Analysis of Systems, TACAS. 2002, pp. 158–172.
[23] Thomas Ball and Sriram K. Rajamani. “Automatically Validating Temporal Safety Properties of Interfaces.” In: Proceedings of the 8th International SPIN Workshop on Model Checking of Software. SPIN ’01. Toronto, Ontario, Canada: Springer-Verlag, 2001, pp. 103–122.
[24] Thomas Ball, Byron Cook, Vladimir Levin, and Sriram K. Rajamani. “SLAM and Static Driver Verifier: Technology Transfer of Formal Methods inside Microsoft.” In: Integrated Formal Methods. Ed. by Eerke A. Boiten, John Derrick, and Graeme Smith. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 1–20.
[25] S. W. Beal. “Rapid design implementation with field-programmable gate arrays.” In: Digest of Papers. COMPCON Spring 89. Thirty-Fourth IEEE Computer Society International Conference: Intellectual Leverage. 1989, pp. 487–490.
[26] David Beazley. PLY (Python Lex-Yacc). http://www.dabeaz.com/ply/index.html.
[27] Michela Becchi. Regular Expression Processor. http://regex.wustl.edu. Accessed 2017-04-06. 2011.
[28] Michela Becchi and Patrick Crowley. “Efficient Regular Expression Evaluation: Theory to Practice.” In: Proceedings of Architectures for Networking and Communications Systems. ANCS ’08. San Jose, California, 2008, pp. 50–59.
[29] Michela Becchi and Patrick Crowley. “Extending Finite Automata to Efficiently Match Perl-compatible Regular Expressions.” In: Proceedings of the ACM International Conference on emerging Networking EXperiments and Technologies. CoNEXT ’08. Madrid, Spain, 2008, 25:1–25:12.
[30] Fabrice Bellard. “QEMU, a fast and portable dynamic translator.” In: USENIX Annual Technical Conference, FREENIX Track. Vol. 41. 2005, p. 46.
[31] Murphy Berzish, Vijay Ganesh, and Yunhui Zheng. “Z3str3: A string solver with theory-aware heuristics.” In: Formal Methods in Computer Aided Design. FMCAD 2017. 2017, pp. 55–59.

[32] Vaughn Betz and Jonathan Rose. “VPR: A new packing, placement and routing tool for FPGA research.” In: Proceedings of the International Workshop on Field Programmable Logic and Applications. Springer. 1997, pp. 213–222.
[33] Dirk Beyer and M. Erkan Keremoglu. “CPAchecker: A Tool for Configurable Software Verification.” In: Computer Aided Verification. Ed. by Ganesh Gopalakrishnan and Shaz Qadeer. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 184–190.
[34] Dirk Beyer, M. Erkan Keremoglu, and Philipp Wendler. “Predicate Abstraction with Adjustable-block Encoding.” In: Proceedings of the 2010 Conference on Formal Methods in Computer-Aided Design. FMCAD ’10. Lugano, Switzerland: FMCAD Inc, 2010, pp. 189–198.
[35] Dirk Beyer, Thomas A. Henzinger, Ranjit Jhala, and Rupak Majumdar. “The software model checker Blast.” In: International Journal on Software Tools for Technology Transfer 9.5 (2007), pp. 505–525.
[36] Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, and Kai Li. “The PARSEC Benchmark Suite: Characterization and Architectural Implications.” In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. PACT ’08. Toronto, Ontario, Canada: ACM, 2008, pp. 72–81.
[37] Armin Biere. “Bounded Model Checking.” In: Handbook of Satisfiability. 2009, pp. 457–481.
[38] M. Bishop. “What is computer security?” In: IEEE Security Privacy 1.1 (2003), pp. 67–69.
[39] Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. “Cilk: An Efficient Multithreaded Runtime System.” In: Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPOPP ’95. Santa Barbara, California, USA: ACM, 1995, pp. 207–216.
[40] Chunkun Bo, Ke Wang, Yanjun Qi, and Kevin Skadron. “String Kernel Testing Acceleration using the Micron Automata Processor.” In: Workshop on Computer Architecture for Machine Learning. 2015.
[41] Benedikt Bollig, Peter Habermehl, Carsten Kern, and Martin Leucker. “Angluin-Style Learning of NFA.” In: International Joint Conference on Artificial Intelligence. 2009.

[42] William J. Bowhill et al. “The Xeon® Processor E5-2600 v3: a 22 nm 18-Core Product Family.” In: J. Solid-State Circuits 51.1 (2016), pp. 92–104.
[43] Aaron R. Bradley. “SAT-Based Model Checking without Unrolling.” In: Verification, Model Checking, and Abstract Interpretation. Ed. by Ranjit Jhala and David Schmidt. 2011, pp. 70–87.
[44] David Brumley and Dan Boneh. “Remote Timing Attacks Are Practical.” In: Proceedings of the 12th Conference on USENIX Security Symposium - Volume 12. SSYM’03. Washington, DC: USENIX Association, 2003, p. 1.
[45] Randal E. Bryant and David R. O’Hallaron. Computer Systems: A Programmer’s Perspective. 2nd. USA: Addison-Wesley Publishing Company, 2010.
[46] Janusz A. Brzozowski. “Derivatives of Regular Expressions.” In: Journal of the ACM 11.4 (Oct. 1964), pp. 481–494.
[47] Cristian Cadar, Daniel Dunbar, and Dawson Engler. “KLEE: Unassisted and Automatic Generation of High-coverage Tests for Complex Systems Programs.” In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation. OSDI’08. San Diego, California: USENIX Association, 2008, pp. 209–224.
[48] Pascal Caron and Djelloul Ziadi. “Characterization of Glushkov automata.” In: Theoretical Computer Science 233.1 (2000), pp. 75–90.
[49] Matthew Casias, Kevin Angstadt, Tommy Tracy II, Kevin Skadron, and Westley Weimer. “Debugging Support for Pattern-Matching Languages and Accelerators.” In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS ’19. Providence, RI, USA: ACM, 2019, pp. 1073–1086.
[50] Sofia Cassel, Falk Howar, Bengt Jonsson, and Bernhard Steffen. “Active Learning for Extended Finite State Machines.” In: Form. Asp. Comput. 28.2 (Apr. 2016), pp. 233–263.
[51] Adrian M. Caulfield et al. “A Cloud-scale Acceleration Architecture.” In: The 49th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-49. Taipei, Taiwan: IEEE Press, 2016, 7:1–7:13.
[52] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S. H. Lee, and K. Skadron. “Rodinia: A benchmark suite for heterogeneous computing.” In: International Symposium on Workload Characterization. 2009, pp. 44–54.

[53] Wei Chen, Szu-Liang Chen, Siufu Chiu, Raghuraman Ganesan, Venkata Lukka, Wei Wing Mar, and Stefan Rusu. “A 22nm 2.5 MB slice on-die L3 cache for the next generation Xeon® processor.” In: Symposium on VLSI Technology. 2013, pp. C132–C133.
[54] Yu Cheng, Duo Wang, Pan Zhou, and Tao Zhang. “A survey of model compression and acceleration for deep neural networks.” In: arXiv preprint arXiv:1710.09282 (2017).
[55] Alvin Cheung, Armando Solar-Lezama, and Samuel Madden. “Optimizing Database-Backed Applications with Query Synthesis.” In: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI ’13. Seattle, Washington, USA: Association for Computing Machinery, 2013, pp. 3–14.
[56] Noam Chomsky and George A. Miller. “Introduction to the Formal Analysis of Natural Languages.” In: Handbook of Mathematical Psychology. Vol. 2. 1963. Chap. 11, pp. 269–322.
[57] Robert G. Clapp, Haohuan Fu, and Olav Lindtjorn. “Selecting the right hardware for reverse time migration.” In: The Leading Edge 29.1 (2010), pp. 48–58. eprint: https://doi.org/10.1190/1.3284053.
[58] James Clark. The Expat XML Parser. http://expat.sourceforge.net.
[59] Shane S. Clark, Benjamin Ransford, Amir Rahmati, Shane Guineau, Jacob Sorber, Wenyuan Xu, and Kevin Fu. “WattsUpDoc: Power Side Channels to Nonintrusively Discover Untargeted Malware on Embedded Medical Devices.” In: Presented as part of the 2013 USENIX Workshop on Health Information Technologies. Washington, D.C.: USENIX, 2013.
[60] Edmund Clarke, Orna Grumberg, Somesh Jha, Yuan Lu, and Helmut Veith. “Counterexample-guided Abstraction Refinement for Symbolic Model Checking.” In: Journal of the ACM 50.5 (Sept. 2003), pp. 752–794.
[61] Alexei Colin, Graham Harvey, Brandon Lucia, and Alanson P. Sample. “An Energy-interference-free Hardware-Software Debugger for Intermittent Energy-harvesting Systems.” In: Architectural Support for Programming Languages and Operating Systems. 2016, pp. 577–589.
[62] Computer Sciences Corporation. Big Data Universe Beginning to Explode. http://www.csc.com/insights/flxwd/78931-big_data_universe_beginning_to_explode. 2012.

[63] Nassim Corteggiani, Giovanni Camurati, and Aurélien Francillon. “Inception: System-Wide Security Testing of Real-World Embedded Systems Software.” In: 27th USENIX Security Symposium (USENIX Security 18). Baltimore, MD: USENIX Association, 2018, pp. 309–326.
[64] Cortex™-M1 Technical Reference Manual. r1p0. ARM Limited. 2008.
[65] Andrei Costin, Jonas Zaddach, Aurélien Francillon, and Davide Balzarotti. “A Large-scale Analysis of the Security of Embedded Firmwares.” In: Proceedings of the 23rd USENIX Conference on Security Symposium. SEC’14. San Diego, CA: USENIX Association, 2014, pp. 95–110.
[66] William Craig. “Three Uses of the Herbrand-Gentzen Theorem in Relating Model Theory and Proof Theory.” In: The Journal of Symbolic Logic 22.3 (1957), pp. 269–285.
[67] DNV GL. Are you able to leverage big data to boost your productivity and value creation? https://www.dnvgl.com/assurance/viewpoint/viewpoint-surveys/big-data.html. 2016.
[68] Zefu Dai, Nick Ni, and Jianwen Zhu. “A 1 cycle-per-byte XML parsing accelerator.” In: Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays. ACM. 2010, pp. 199–208.
[69] Sanjeev Das, Jan Werner, Manos Antonakakis, Michalis Polychronakis, and Fabian Monrose. “SoK: The Challenges, Pitfalls, and Perils of Using Hardware Performance Counters for Security.” In: 40th IEEE Symposium on Security and Privacy (S&P’19). 2019.
[70] Howard David, Eugene Gorbatov, Ulf R Hanebutte, Rahul Khanna, and Christian Le. “RAPL: Memory power estimation and capping.” In: International Symposium on Low-Power Electronics and Design. 2010.
[71] Leonardo De Moura and Nikolaj Bjørner. “Z3: An Efficient SMT Solver.” In: Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems. TACAS’08/ETAPS’08. Budapest, Hungary: Springer-Verlag, 2008, pp. 337–340.
[72] Joeri De Ruiter and Erik Poll. “Protocol State Fuzzing of TLS Implementations.” In: Proceedings of the 24th USENIX Conference on Security Symposium. SEC’15. Washington, D.C.: USENIX Association, 2015, pp. 193–206.

[73] John Demme, Matthew Maycock, Jared Schmitz, Adrian Tang, Adam Waksman, Simha Sethumadhavan, and Salvatore Stolfo. “On the feasibility of online malware detection with performance counters.” In: ACM SIGARCH Computer Architecture News 41.3 (2013), pp. 559–570.
[74] Edsger W. Dijkstra. “Guarded commands, nondeterminacy and formal derivation of programs.” In: Communications of the ACM 18.8 (1975), pp. 453–457.
[75] Paul Dlugosch, Dave Brown, Paul Glendenning, Michael Leventhal, and Harold Noyes. “An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing.” In: IEEE Transactions on Parallel and Distributed Systems 25.12 (2014), pp. 3088–3098.
[76] Brendan Dolan-Gavitt, Tim Leek, Josh Hodosh, and Wenke Lee. “Tappan zee (north) bridge: mining memory accesses for introspection.” In: Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security. ACM. 2013, pp. 839–850.
[77] ECMA Technical Committee 39. The JSON Data Interchange Format. Tech. rep. ECMA-404 1st Edition. ECMA, Oct. 2013.
[78] John Ellson, Emden Gansner, Lefteris Koutsofios, Stephen North, Gordon Woodhull, Short Description, and Lucent Technologies. “Graphviz — open source graph drawing tools.” In: Lecture Notes in Computer Science. Springer-Verlag, 2001, pp. 483–484.
[79] Yuanwei Fang, Tung T. Hoang, Michela Becchi, and Andrew A. Chien. “Fast support for unstructured data processing: the unified automata processor.” In: Proceedings of the ACM International Symposium on Microarchitecture. Micro ’15. 2015, pp. 533–545.
[80] Yuanwei Fang, Chen Zou, Aaron J Elmore, and Andrew A Chien. “UDP: a programmable accelerator for extract-transform-load workloads and more.” In: International Symposium on Microarchitecture. ACM. 2017, pp. 55–68.
[81] Alireza Farshin, Amir Roozbeh, Gerald Q. Maguire, and Dejan Kostić. “Make the Most out of Last Level Cache in Intel Processors.” In: Proceedings of the Fourteenth EuroSys Conference 2019. EuroSys ’19. Dresden, Germany: Association for Computing Machinery, 2019.

[82] Paul Fiterău-Broștean, Ramon Janssen, and Frits Vaandrager. “Combining Model Learning and Model Checking to Analyze TCP Implementations.” In: Computer Aided Verification. Ed. by Swarat Chaudhuri and Azadeh Farzan. Cham: Springer International Publishing, 2016, pp. 454–471.
[83] Michael Flynn. “Flynn’s Taxonomy.” In: Encyclopedia of Parallel Computing. Ed. by David Padua. Boston, MA: Springer US, 2011, pp. 689–697.
[84] Stephanie Forrest, Steven A. Hofmeyr, Anil Somayaji, and Thomas A. Longstaff. “A Sense of Self for Unix Processes.” In: Proceedings of the IEEE Symposium on Security and Privacy. SP ’96. Washington, DC, USA: IEEE Computer Society, 1996, pp. 120–.
[85] Martin Fowler. UML Distilled: A Brief Guide to the Standard Object Modeling Language. Object Technology Series. Addison-Wesley, 2004.
[86] Z. P. Fry and W. Weimer. “A human study of fault localization accuracy.” In: International Conference on Software Maintenance. 2010, pp. 1–10.
[87] Matthew M. Geller, Michael A. Harrison, and Ivan M. Havel. “Normal forms of deterministic grammars.” In: Discrete Mathematics 16.4 (1976), pp. 313–321.
[88] Daniel Genkin, Adi Shamir, and Eran Tromer. “RSA Key Extraction via Low-Bandwidth Acoustic Cryptanalysis.” In: Advances in Cryptology – CRYPTO 2014. Ed. by Juan A. Garay and Rosario Gennaro. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014, pp. 444–461.
[89] Philip Ginsbach, Toomas Remmelg, Michel Steuwer, Bruno Bodin, Christophe Dubach, and Michael F. P. O’Boyle. “Automatic Matching of Legacy Code to Heterogeneous APIs: An Idiomatic Approach.” In: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS ’18. Williamsburg, VA, USA: Association for Computing Machinery, 2018, pp. 139–153.
[90] J. Goeders and S. J. E. Wilton. “Effective FPGA debug for high-level synthesis generated circuits.” In: 24th International Conference on Field Programmable Logic and Applications (FPL). 2014, pp. 1–8.
[91] Vaibhav Gogte, Aasheesh Kolli, Michael J. Cafarella, Loris D’Antoni, and Thomas F. Wenisch. “HARE: Hardware accelerator for regular expressions.” In: International Symposium on Microarchitecture. 2016, pp. 1–12.

[92] P. Graham, B. Nelson, and B. Hutchings. “Instrumenting Bitstreams for Debugging FPGA Circuits.” In: Field-Programmable Custom Computing Machines, 2001. FCCM ’01. The 9th Annual IEEE Symposium on. 2001, pp. 41–50.
[93] Sheila A. Greibach. “A New Normal-Form Theorem for Context-Free Phrase Structure Grammars.” In: J. ACM 12.1 (Jan. 1965), pp. 42–52.
[94] Sumit Gulwani. “Programming by Examples: Applications, Algorithms, and Ambiguity Resolution.” In: Proceedings of the 8th International Joint Conference on Automated Reasoning. Berlin, Heidelberg: Springer-Verlag, 2016, pp. 9–14.
[95] Kaiyuan Guo, Shulin Zeng, Jincheng Yu, Yu Wang, and Huazhong Yang. “A Survey of FPGA-Based Neural Network Inference Accelerators.” In: ACM Trans. Reconfigurable Technol. Syst. 12.1 (2019).
[96] D. Hao, L. Zhang, L. Zhang, J. Sun, and H. Mei. “VIDA: Visual interactive debugging.” In: International Conference on Software Engineering. 2009, pp. 583–586.
[97] Tegze P. Haraszti. “Sense Amplifiers.” In: CMOS Memory Circuits. Boston, MA: Springer US, 2002, pp. 163–275.
[98] Elliotte Rusty Harold and W. Scott Means. XML in a Nutshell, Third Edition. O’Reilly Media, Inc., 2004.
[99] Michael A. Harrison and Ivan M. Havel. “Real-Time Strict Deterministic Languages.” In: SIAM Journal on Computing 1.4 (1972), pp. 333–349. eprint: https://doi.org/10.1137/0201024.
[100] Douglas M. Hawkins. “The Problem of Overfitting.” In: Journal of Chemical Information and Computer Sciences 44.1 (2004). PMID: 14741005, pp. 1–12. eprint: https://doi.org/10.1021/ci0342472.
[101] Urs Hölzle, Craig Chambers, and David Ungar. “Debugging Optimized Code with Dynamic Deoptimization.” In: Programming Language Design and Implementation. San Francisco, California, USA, 1992, pp. 32–43.
[102] Chuntao Hong, Dehao Chen, Wenguang Chen, Weimin Zheng, and Haibo Lin. “MapCG: Writing Parallel Program Portable Between CPU and GPU.” In: Parallel Architectures and Compilation Techniques. Vienna, Austria, 2010, pp. 217–226.

[103] Pieter Hooimeijer and Westley Weimer. “A decision procedure for subset constraints over regular languages.” In: Programming Language Design and Implementation (PLDI). 2009, pp. 188–198.
[104] Qiming Hou, Kun Zhou, and Baining Guo. “Debugging GPU Stream Programs Through Automatic Dataflow Recording and Visualization.” In: SIGGRAPH Asia. 2009, 153:1–153:11.
[105] Min Huang, Moty Mehalel, Ramesh Arvapalli, and Songnian He. “An Energy Efficient 32-nm 20-MB Shared On-Die L3 Cache for Intel® Xeon® Processor E5 Family.” In: J. Solid-State Circuits 48.8 (2013), pp. 1954–1962.
[106] Yao-Wen Huang, Fang Yu, Christian Hang, Chung-Hung Tsai, Der-Tsai Lee, and Sy-Yen Kuo. “Verifying web applications using bounded model checking.” In: International Conference on Dependable Systems and Networks. IEEE. 2004, pp. 199–208.
[107] Jennifer Huffstetler. Intel Processors and FPGAs—Better Together. https://itpeernetwork.intel.com/intel-processors-fpga-better-together. Accessed 2020-02-04. 2018.
[108] INRIA. Lexer and parser generators (ocamllex, ocamlyacc). http://caml.inria.fr/pub/docs/manual-ocaml-4.00/manual026.html. Accessed 2018-04-06.
[109] Integrated Logic Analyzer v6.2: LogiCORE IP Product Guide. PG172. Xilinx Inc. San José, CA, 2016.
[110] Intel. Cache Allocation Technology. 2017.
[111] Intel. Hyperscan. https://01.org/hyperscan. Accessed 2017-04-07. 2017.
[112] Internet Engineering Task Force. JSON Schema: core definitions and terminology. json-schema-core. Jan. 2013.
[113] M. Isberner. “Foundations of active automata learning: an algorithmic perspective.” PhD thesis. Technical University of Dortmund, 2015.
[114] Malte Isberner, Falk Howar, and Bernhard Steffen. “The TTT Algorithm: A Redundancy-Free Approach to Active Automata Learning.” In: Runtime Verification. Ed. by Borzoo Bonakdarpour and Scott A. Smolka. Cham: Springer International Publishing, 2014, pp. 307–322.
[115] Ranjit Jhala and Rupak Majumdar. “Software Model Checking.” In: ACM Computing Surveys 41.4 (Oct. 2009), 21:1–21:54.

[116] Joint Task Force on Computing Curricula, Association for Computing Machinery (ACM) and IEEE Computer Society. Computer Science Curricula 2013: Curriculum Guidelines for Undergraduate Degree Programs in Computer Science. New York, NY, USA: Association for Computing Machinery, 2013.
[117] Norman P. Jouppi et al. “In-Datacenter Performance Analysis of a Tensor Processing Unit.” In: Proceedings of the 44th Annual International Symposium on Computer Architecture. ISCA ’17. Toronto, ON, Canada: ACM, 2017, pp. 1–12.
[118] Shoaib Kamil, Alvin Cheung, Shachar Itzhaky, and Armando Solar-Lezama. “Verified Lifting of Stencil Computations.” In: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI ’16. Santa Barbara, CA, USA: Association for Computing Machinery, 2016, pp. 711–726.
[119] Yusaku Kaneta, Shingo Yoshizawa, SI Minato, and Hiroki Arimura. “High-Speed String and Regular Expression Matching on FPGA.” In: Proceedings of the Asia-Pacific Signal and Information Processing Association. 2011.
[120] Egor George Karpenkov, Karlheinz Friedberger, and Dirk Beyer. “JavaSMT: A Unified Interface for SMT Solvers in Java.” In: Verified Software. Theories, Tools, and Experiments. Ed. by Sandrine Blazy and Marsha Chechik. Cham: Springer International Publishing, 2016, pp. 139–148.
[121] George Karypis and Vipin Kumar. “A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs.” In: SIAM J. Scientific Computing 20.1 (1998), pp. 359–392.
[122] Mikhail Kazdagli, Vijay Janapa Reddi, and Mohit Tiwari. “Quantifying and improving the efficiency of hardware-based mobile malware detectors.” In: The 49th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Press. 2016, p. 37.
[123] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. Cambridge, MA, USA: MIT Press, 1994.
[124] Peter B. Kessler. “Fast Breakpoints: Design and Implementation.” In: Programming Language Design and Implementation. White Plains, New York, USA, 1990, pp. 78–84.

[125] Ahmed Khawaja, Joshua Landgraf, Rohith Prakash, Michael Wei, Eric Schkufza, and Christopher J. Rossbach. “Sharing, Protection, and Compatibility for Reconfigurable Fabric with Amorphos.” In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation. OSDI’18. Carlsbad, CA, USA: USENIX Association, 2018, pp. 107–127.
[126] Yit Phang Khoo, Jeffrey S. Foster, and Michael Hicks. “Expositor: Scriptable Time-travel Debugging with First-class Traces.” In: International Conference on Software Engineering. San Francisco, CA, USA: IEEE Press, 2013, pp. 352–361.
[127] Adam Kiezun, Vijay Ganesh, Philip J Guo, Pieter Hooimeijer, and Michael D Ernst. “HAMPI: a solver for string constraints.” In: Proceedings of the eighteenth international symposium on Software testing and analysis. ACM. 2009, pp. 105–116.
[128] G. Knittel, S. Mayer, and C. Rothlaender. “Integrating Logic Analyzer Functionality into VHDL Designs.” In: 2008 International Conference on Reconfigurable Computing and FPGAs. 2008, pp. 127–132.
[129] Andrew J. Ko and Brad A. Myers. “Debugging Reinvented: Asking and Answering Why and Why Not Questions About Program Behavior.” In: International Conference on Software Engineering. Leipzig, Germany, 2008, pp. 301–310.
[130] Paul Kocher et al. “Spectre Attacks: Exploiting Speculative Execution.” In: 40th IEEE Symposium on Security and Privacy (S&P’19). 2019.
[131] Ron Kohavi. “A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection.” In: Proceedings of the 14th International Joint Conference on Artificial Intelligence—Volume 2. IJCAI’95. Montreal, Quebec, Canada: Morgan Kaufmann Publishers Inc., 1995, pp. 1137–1143.
[132] Dan Kramp, Jack Wadden, and Kevin Skadron. Automata Lab: An Open-Source Automata Visualization, Simulation, and Manipulation Tool. Tech. rep. CS2017-03. University of Virginia, 2017.
[133] Anil Krishna, Timothy Heil, Nicholas Lindberg, Farnaz Toussi, and Steven VanderWiel. “Hardware acceleration in the IBM PowerEN processor: Architecture and performance.” In: Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques. 2012, pp. 389–400.

[134] S. Lahti, P. Sjövall, J. Vanne, and T. D. Hämäläinen. “Are We There Yet? A Study on the State of High-Level Synthesis.” In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 38.5 (2019), pp. 898–911.
[135] A. C. Lear. “XML seen as integral to application integration.” In: IT Professional 1.5 (1999), pp. 12–16.
[136] T. J. Leblanc and J. M. Mellor-Crummey. “Debugging Parallel Programs with Instant Replay.” In: IEEE Transactions on Computers C-36.4 (1987), pp. 471–482.
[137] D. Lee and M. Yannakakis. “Principles and methods of testing finite state machines—a survey.” In: Proceedings of the IEEE 84.8 (1996), pp. 1090–1123.
[138] John Levine. Flex & Bison. 1st. O’Reilly Media, Inc., 2009.
[139] Hung-Jen Liao, Chun-Hung Richard Lin, Ying-Chih Lin, and Kuang-Yuan Tung. “Intrusion detection system: A comprehensive review.” In: Journal of Network and Computer Applications 36.1 (2013), pp. 16–24.
[140] Anthony W Lin and Pablo Barceló. “String solving with word equations and transducers: towards a logic for analysing mutation XSS.” In: ACM SIGPLAN Notices. Vol. 51. 1. ACM. 2016, pp. 123–136.
[141] Dan Lin, Nigel Medforth, Kenneth S. Herdy, Arrvindh Shriraman, and Robert D. Cameron. “Parabix: Boosting the efficiency of text processing on commodity processors.” In: International Symposium on High Performance Computer Architecture. 2012, pp. 373–384.
[142] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. “Meltdown.” In: arXiv preprint arXiv:1801.01207 (2018).
[143] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B Lee. “Last Level Cache side-channel attacks are practical.” In: 2015 IEEE Symposium on Security and Privacy. IEEE. 2015, pp. 605–622.
[144] Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. “Pin: building customized program analysis tools with dynamic instrumentation.” In: ACM SIGPLAN Notices. Vol. 40. 6. ACM. 2005, pp. 190–200.

[145] Robert W.P. Luk, H.V. Leong, Tharam S. Dillon, Alvin T.S. Chan, W. Bruce Croft, and James Allan. “A survey in indexing and searching XML documents.” In: Journal of the American Society for Information Science and Technology 53.6 (2002), pp. 415–437. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/asi.10056.
[146] Jan van Lunteren, Christoph Hagleitner, Timothy Heil, Giora Biran, Uzi Shvadron, and Kubilay Atasu. “Designing a Programmable Wire-Speed Regular-Expression Matching Accelerator.” In: International Symposium on Microarchitecture. 2012, pp. 461–472.
[147] Advait Madhavan, Timothy Sherwood, and Dmitri Strukov. “Race Logic: A Hardware Acceleration for Dynamic Programming Algorithms.” In: Proceeding of the 41st Annual International Symposium on Computer Architecuture. ISCA ’14. Minneapolis, Minnesota, USA: IEEE Press, 2014, pp. 517–528.
[148] T. Margaria, O. Niese, H. Raffelt, and B. Steffen. “Efficient Test-based Model Generation for Legacy Reactive Systems.” In: Proceedings of the High-Level Design Validation and Test Workshop, 2004. Ninth IEEE International. HLDVT ’04. Washington, DC, USA: IEEE Computer Society, 2004, pp. 95–100.
[149] Norman Matloff and Peter Jay Salzman. The Art of Debugging with GDB, DDD, and Eclipse. San Francisco, CA, USA: No Starch Press, 2008.
[150] Sergey Maximov. Performance Implications of the Meltdown and Spectre Fixes. https://www.virtuozzo.com/connect/details/blog/view/performance-implications-of-the-meltdown-and-spectre-fixes.html. 2018.
[151] Kenneth L. McMillan. “Lazy Abstraction with Interpolants.” In: Proceedings of the 18th International Conference on Computer Aided Verification. CAV’06. Seattle, WA: Springer-Verlag, 2006, pp. 123–136.
[152] Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. “Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis.” In: Proceedings of the 38th International Conference on Software Engineering. ICSE ’16. Austin, Texas: ACM, 2016, pp. 691–701.
[153] Charith Mendis, Jeffrey Bosboom, Kevin Wu, Shoaib Kamil, Jonathan Ragan-Kelley, Sylvain Paris, Qin Zhao, and Saman Amarasinghe. “Helium: Lifting High-Performance Stencil Kernels from Stripped X86 Binaries to Halide DSL Code.” In: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI ’15. Portland, OR, USA: Association for Computing Machinery, 2015, pp. 391–402.

[154] MicroBlaze Micro Controller System v3.0. PG116. Xilinx Inc. San José, CA, 2019.
[155] Micron Technology. Calculating Hamming Distance. http://www.micronautomata.com/documentation/cookbook/c_hamming_distance.html.
[156] Sparsh Mittal. “A Survey of Techniques for Approximate Computing.” In: ACM Comput. Surv. 48.4 (Mar. 2016), 62:1–62:33.
[157] Joshua Moerman, Matteo Sammartino, Alexandra Silva, Bartek Klin, and Michal Szynwelski. “Learning nominal automata.” In: Principles of Programming Languages (POPL). 2017, pp. 613–625.
[158] Martin Monperrus. “Automatic software repair: a bibliography.” In: ACM Computing Surveys (CSUR) 51.1 (2018), p. 17.
[159] J. S. Monson. “Using Source-to-Source Transformations to Add Debug Observability to HLS-Synthesized Circuits.” PhD thesis. Brigham Young University, 2016.
[160] Thierry Moreau, Joshua San Miguel, Mark Wyse, James Bornholt, Armin Alaghi, Luis Ceze, Natalie D. Enright Jerger, and Adrian Sampson. “A Taxonomy of General Purpose Approximate Computing Techniques.” In: Embedded Systems Letters 10.1 (2018), pp. 2–5.
[161] R. Nane et al. “A Survey and Evaluation of FPGA High-Level Synthesis Tools.” In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 35.10 (2016), pp. 1591–1604.
[162] R. Nathuji, C. Isci, and E. Gorbatov. “Exploiting Platform Heterogeneity for Power Efficient Data Centers.” In: Fourth International Conference on Autonomic Computing (ICAC’07). 2007, pp. 5–5.
[163] George C. Necula, Scott McPeak, Shree P. Rahul, and Westley Weimer. “CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs.” In: Compiler Construction. Ed. by R. Nigel Horspool. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002, pp. 213–228.
[164] Hoang Duong Thien Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chandra. “SemFix: Program Repair via Semantic Analysis.” In: Proceedings of the 2013 International Conference on Software Engineering. ICSE ’13. San Francisco, CA, USA: IEEE, 2013, pp. 772–781.
[165] Zhenyu Ning and Fengwei Zhang. “Understanding the security of ARM debugging features.” In: Proceedings of the 40th IEEE Symposium on Security and Privacy (S&P’19). 2019.

[166] Marziyeh Nourian, Xiang Wang, Xiaodong Yu, Wu-chun Feng, and Michela Becchi. “Demystifying Automata Processing: GPUs, FPGAs, or Micron’s AP?” In: Proceedings of the International Conference on Supercomputing. ICS ’17. Chicago, Illinois: ACM, 2017, 1:1–1:11.
[167] Oracle. OpenSPARC T1. https://www.oracle.com/technetwork/systems/opensparc/opensparc-t1-page-1444609.html. Accessed 2020-02-04.
[168] Dag Arne Osvik, Adi Shamir, and Eran Tromer. “Cache attacks and countermeasures: the case of AES.” In: Cryptographers’ track at the RSA conference. Springer. 2006, pp. 1–20.
[169] Peizhao Ou and Brian Demsky. “Checking Concurrent Data Structures Under the C/C++11 Memory Model.” In: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPoPP ’17. Austin, Texas, USA: Association for Computing Machinery, 2017, pp. 45–59.
[170] Jongse Park, Hadi Esmaeilzadeh, Xin Zhang, Mayur Naik, and William Harris. “FlexJava: language support for safe and modular approximate programming.” In: Foundations of Software Engineering (ESEC/FSE). 2015, pp. 745–757.
[171] Chris Parnin and Alessandro Orso. “Are Automated Debugging Techniques Actually Helping Programmers?” In: International Symposium on Software Testing and Analysis. Toronto, Ontario, Canada, 2011, pp. 199–209.
[172] David A. Patterson and John L. Hennessy. Computer Organization and Design: The Hardware/Software Interface. 4th. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2008.
[173] Douglas L. Perry. VHDL (2nd Ed.) USA: McGraw-Hill, Inc., 1993.
[174] Shari Lawrence Pfleeger. Software Engineering: Theory and Practice. 2nd. Upper Saddle River, NJ, USA: Prentice Hall PTR, 2001.
[175] Andrew Putnam et al. “A Reconfigurable Fabric for Accelerating Large-scale Datacenter Services.” In: Proceeding of the 41st Annual International Symposium on Computer Architecuture. ISCA ’14. Minneapolis, Minnesota, USA: IEEE Press, 2014, pp. 13–24.
[176] B. Reagen, R. Adolf, Y. S. Shao, G. Y. Wei, and D. Brooks. “MachSuite: Benchmarks for accelerator design and customized architectures.” In: International Symposium on Workload Characterization. 2014, pp. 110–119.

[177] David Reinsel, John Gantz, and John Rydning. Data Age 2025: The Digitization of the World from Edge to Core. White Paper US44413318. IDC, 2018.
[178] Martin C. Rinard. “Acceptability-oriented computing.” In: Object-Oriented Programming, Systems, Languages, and Applications, (OOPSLA). 2003, pp. 221–239.
[179] R.L. Rivest and R.E. Schapire. “Inference of Finite Automata Using Homing Sequences.” In: Information and Computation 103.2 (1993), pp. 299–347.
[180] J. J. Rodriguez-Andina, M. J. Moure, and M. D. Valdes. “Features, Design Tools, and Application Domains of FPGAs.” In: IEEE Transactions on Industrial Electronics 54.4 (2007), pp. 1810–1823.
[181] Antonio Roldao and George A. Constantinides. “A High Throughput FPGA-Based Floating Point Conjugate Gradient Implementation for Dense Matrices.” In: ACM Trans. Reconfigurable Technol. Syst. 3.1 (Jan. 2010).
[182] P. Romero, B. du Boulay, R. Lutz, and R. Cox. “The effects of graphical and textual visualisations in multi-representational debugging environments.” In: Symposium on Human Centric Computing Languages and Environments, 2003. Proceedings. 2003, pp. 236–238.
[183] I. Roy, N. Jammula, and S. Aluru. “Algorithmic Techniques for Solving Graph Problems on the Automata Processor.” In: Proceedings of the IEEE International Parallel and Distributed Processing Symposium. IPDPS ’16. 2016, pp. 283–292.
[184] I. Roy, A. Srivastava, M. Nourian, M. Becchi, and S. Aluru. “High Performance Pattern Matching Using the Automata Processor.” In: Proceedings of the IEEE International Parallel and Distributed Processing Symposium. IPDPS ’16. 2016, pp. 1123–1132.
[185] Indranil Roy. “Algorithmic Techniques for the Micron Automata Processor.” PhD thesis. Georgia Institute of Technology, 2015.
[186] Indranil Roy and Srinivas Aluru. “Finding Motifs in Biological Sequences Using the Micron Automata Processor.” In: Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium. 2014, pp. 415–424.
[187] S. Sarkar, T. Majumder, A. Kalyanaraman, and P. P. Pande. “Hardware accelerators for biocomputing: A survey.” In: International Symposium on Circuits and Systems. 2010, pp. 3789–3792.
[188] E. Satterthwaite. “Debugging tools for high level languages.” In: Software: Practice and Experience 2.3 (1972), pp. 197–217.

[189] Mathijs Schuts, Jozef Hooman, and Frits Vaandrager. “Refactoring of Legacy Software Using Model Learning and Equivalence Checking: An Industrial Experience Report.” In: Proceedings of the 12th International Conference on Integrated Formal Methods - Volume 9681. IFM 2016. Reykjavik, Iceland: Springer-Verlag, 2016, pp. 311–325.
[190] M.L. Scott. Programming Language Pragmatics. Elsevier Science, 2015.
[191] Robert C. Seacord, Daniel Plakosh, and Grace A. Lewis. Modernizing Legacy Systems: Software Technologies, Engineering Process and Business Practices. Addison-Wesley Longman Publishing Co., Inc., 2003.
[192] R. Sekar, M. Bendre, P. Bollineni, and D. Dhurjati. “A Fast Automaton-Based Method for Detecting Anomalous Program Behaviors.” In: Proceedings of the 2001 IEEE Symposium on Security and Privacy. Oakland, 2001.
[193] Hovav Shacham, Matthew Page, Ben Pfaff, Eu-Jin Goh, Nagendra Modadugu, and Dan Boneh. “On the Effectiveness of Address-space Randomization.” In: Proceedings of the 11th ACM Conference on Computer and Communications Security. CCS ’04. Washington DC, USA: ACM, 2004, pp. 298–307.
[194] J. M. Shalf and R. Leland. “Computing beyond Moore’s Law.” In: IEEE Computer 48.12 (2015), pp. 14–23.
[195] Y. S. Shao, B. Reagen, G. Y. Wei, and D. Brooks. “Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures.” In: International Symposium on Computer Architecture. 2014, pp. 97–108.
[196] Reetinder Sidhu and Viktor K. Prasanna. “Fast Regular Expression Matching Using FPGAs.” In: Proceedings of the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM). Washington, DC, USA: IEEE Computer Society, 2001, pp. 227–238.
[197] Baljit Singh, Dmitry Evtyushkin, Jesse Elwell, Ryan Riley, and Iliano Cervesato. “On the Detection of Kernel-Level Rootkits Using Hardware Performance Counters.” In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ASIA CCS ’17. Abu Dhabi, United Arab Emirates: ACM, 2017, pp. 483–493.
[198] Jawar Singh, Saraju P. Mohanty, and Dhiraj K. Pradhan. “Introduction to SRAM.” In: Robust SRAM Designs and Analysis. New York, NY: Springer New York, 2013, pp. 1–29.

[199] Michael Sipser. Introduction to the Theory of Computation. 2nd. Thomson Course Technology, 2006.
[200] Armando Solar-Lezama, Christopher Grant Jones, and Rastislav Bodik. “Sketching Concurrent Data Structures.” In: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI ’08. Tucson, AZ, USA: ACM, 2008, pp. 136–148.
[201] Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. “Combinatorial Sketching for Finite Programs.” In: Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS XII. San Jose, California, USA: ACM, 2006, pp. 404–415.
[202] Anil Somayaji and Stephanie Forrest. “Automated Response Using System-Call Delays.” In: Proceedings of the 9th USENIX Security Symposium. Denver, CO, 2000.
[203] Tamim Sookoor, Timothy Hnat, Pieter Hooimeijer, Westley Weimer, and Kamin Whitehouse. “Macrodebugging: Global Views of Distributed Program Execution.” In: Conference on Embedded Networked Sensor Systems. Berkeley, California, 2009, pp. 141–154.
[204] Ioannis Sourdis, João Bispo, João M. P. Cardoso, and Stamatis Vassiliadis. “Regular Expression Matching in Reconfigurable Hardware.” In: Journal of Signal Processing Systems 51.1 (2008), pp. 99–121.
[205] Eric Spishak, Werner Dietl, and Michael D. Ernst. “A Type System for Regular Expressions.” In: Proceedings of the 14th Workshop on Formal Techniques for Java-like Programs. FTfJP ’12. Beijing, China, 2012, pp. 20–26.
[206] Richard Stallman, Roland Pesch, and Stan Shebs. Debugging with GDB. Free Software Foundation, 2002.
[207] Bernhard Steffen, Falk Howar, and Maik Merten. “Introduction to Active Automata Learning from a Practical Perspective.” In: Formal Methods for Eternal Networked Software Systems: 11th International School on Formal Methods for the Design of Computer, Communication and Software Systems. Ed. by Marco Bernardo and Valérie Issarny. SFM 2011. Bertinoro, Italy: Springer Berlin Heidelberg, 2011, pp. 256–296.
[208] J. E. Stone, D. Gohara, and G. Shi. “OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems.” In: Computing in Science Engineering 12.3 (2010), pp. 66–73.

[209] Arun Subramaniyan, Jingcheng Wang, Ezhil R. M. Balasubramanian, David Blaauw, Dennis Sylvester, and Reetuparna Das. “Cache Automaton.” In: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-50. Cambridge, Massachusetts: ACM, 2017, pp. 259–272.
[210] G. Edward Suh, Jae W. Lee, David Zhang, and Srinivas Devadas. “Secure Program Execution via Dynamic Information Flow Tracking.” In: Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS XI. Boston, MA, USA: ACM, 2004, pp. 85–96.
[211] Audie Sumaray and S. Kami Makki. “A Comparison of Data Serialization Formats for Optimal Efficiency on a Mobile Platform.” In: Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication. ICUIMC ’12. Kuala Lumpur, Malaysia: Association for Computing Machinery, 2012.
[212] Prateek Tandon, Faissal M. Sleiman, Michael J. Cafarella, and Thomas F. Wenisch. “HAWK: Hardware support for unstructured log processing.” In: International Conference on Data Engineering. 2016, pp. 469–480.
[213] J. Teich. “Hardware/Software Codesign: The Past, the Present, and Predicting the Future.” In: Proceedings of the IEEE 100.Special Centennial Issue (2012), pp. 1411–1430.
[214] Dan Terpstra, Heike Jagode, Haihang You, and Jack Dongarra. “Collecting Performance Data with PAPI-C.” In: Tools for High Performance Computing 2009. Ed. by Matthias S. Müller, Michael M. Resch, Alexander Schulz, and Wolfgang E. Nagel. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 157–173.
[215] William Thies, Michal Karczmarek, and Saman P. Amarasinghe. “StreamIt: A Language for Streaming Applications.” In: International Conference on Compiler Construction. Springer-Verlag, 2002, pp. 179–196.
[216] Donald Thomas and Philip Moorby. The Verilog® Hardware Description Language. Springer Science & Business Media, 2008.
[217] Cesare Tinelli, Clark Barrett, and Pascal Fontaine. SMT-LIB 2.6 Strings Theory: Draft 2.1. Tech. rep. Department of Computer Science, The University of Iowa, 2019.

[218] Titan IC Systems. Helios RXPF Soft IP for FPGA Security Analytics Acceleration. http://titan-ic.com/products/helios-rxpf. Accessed 2017-04-05. 2017.
[219] A. Tiwari and K. A. Tomko. “Scan-chain based watch-points for efficient run-time debugging and verification of FPGA designs.” In: Proceedings of the ASP-DAC Asia and South Pacific Design Automation Conference, 2003. 2003, pp. 705–711.
[220] Tommy Tracy II, Mircea Stan, Nathan Brunelle, Jack Wadden, Ke Wang, Kevin Skadron, and Gabe Robins. “Nondeterministic Finite Automata in Hardware—the Case of the Levenshtein Automaton.” In: Architectures and Systems for Big Data (ASBD), in conjunction with ISCA (2015).
[221] Tommy Tracy II, Yao Fu, Indranil Roy, Eric Jonas, and Paul Glendenning. “Towards Machine Learning on the Automata Processor.” In: Proceedings of ISC High Performance Computing. 2016, pp. 200–218.
[222] Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. “S3: A Symbolic String Solver for Vulnerability Detection in Web Applications.” In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. CCS ’14. Scottsdale, Arizona, USA: ACM, 2014, pp. 1232–1243.
[223] L. Di Tucci, M. Rabozzi, L. Stornaiuolo, and M. D. Santambrogio. “The Role of CAD Frameworks in Heterogeneous FPGA-Based Cloud Systems.” In: 2017 IEEE International Conference on Computer Design (ICCD). 2017, pp. 423–426.
[224] David Ungar, Henry Lieberman, and Christopher Fry. “Debugging and the Experience of Immediacy.” In: Communications of the ACM 40.4 (Apr. 1997), pp. 38–43.
[225] Frits Vaandrager. “Model Learning.” In: Communications of the ACM 60.2 (Jan. 2017), pp. 86–95.
[226] L. G. Valiant. “A Theory of the Learnable.” In: Communications of the ACM 27.11 (Nov. 1984), pp. 1134–1142.
[227] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Thomas F. Wenisch, Yuval Yarom, and Raoul Strackx. “Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution.” In: 27th USENIX Security Symposium (USENIX Security 18). 2018, pp. 991–1008.
[228] Virtex-4 Family Overview. DS112 (v3.1). Xilinx Inc. San José, CA, 2010.

[229] Virtual Input/Output v3.0: LogiCORE IP Product Guide. PG159. Xilinx Inc. San José, CA, 2018.
[230] Vivado Design Suite User Guide: Programming and Debugging. UG908 (v2018.1). Xilinx Inc. San José, CA, 2018.
[231] J. Wadden et al. “ANMLzoo: a benchmark suite for exploring bottlenecks in automata processing engines and architectures.” In: International Symposium on Workload Characterization. IISWC ’16. 2016, pp. 1–12.
[232] J. Wadden, N. Brunelle, K. Wang, M. El-Hadedy, G. Robins, M. Stan, and K. Skadron. “Generating efficient and high-quality pseudo-random behavior on Automata Processors.” In: 2016 IEEE 34th International Conference on Computer Design (ICCD). 2016, pp. 622–629.
[233] Jack Wadden, Kevin Angstadt, and Kevin Skadron. “Characterizing and Mitigating Output Reporting Bottlenecks in Spatial Automata Processing Architectures.” In: 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE. 2018, pp. 749–761.
[234] Jack Wadden, Samira Khan, and Kevin Skadron. “Automata-to-Routing: An Open Source Toolchain for Design-Space Exploration of Spatial Automata Processing Architectures.” In: Proceedings of the IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM). 2017.
[235] Jack Wadden and Kevin Skadron. VASim: An Open Virtual Automata Simulator for Automata Processing Application and Architecture Research. Tech. rep. CS2016-03. University of Virginia, 2016.
[236] Peter J. L. Wallis. “The design of a portable programming language.” In: Acta Informatica 10.2 (1978), pp. 157–167.
[237] Steven Walton. Patched Desktop PC: Meltdown and Spectre Benchmarked. https://www.techspot.com/article/1556-meltdown-and-spectre-cpu-performance-windows/page4.html. 2018.
[238] Ke Wang, Elaheh Sadredini, and Kevin Skadron. “Sequential Pattern Mining with the Micron Automata Processor.” In: Proceedings of the ACM International Conference on Computing Frontiers. CF ’16. Como, Italy: ACM, 2016, pp. 135–144.
[239] Ke Wang, Mircea Stan, and Kevin Skadron. “Association Rule Mining with the Micron Automata Processor.” In: Proceedings of the 29th IEEE International Parallel & Distributed Processing Symposium. 2015.

[240] Michael H. L. S. Wang, Gustavo Cancelo, Christopher Green, Deyuan Guo, Ke Wang, and Ted Zmuda. “Using the Automata Processor for fast pattern recognition in high energy physics experiments — A proof of concept.” In: Nuclear Instruments and Methods in Physics Research (2016).
[241] Xiang Wang. “Techniques for Efficient Regular Expression Matching Across Hardware Architectures.” MA thesis. University of Missouri-Columbia, 2014.
[242] Z. Wang and R. B. Lee. “Covert and Side Channels Due to Processor Architecture.” In: 2006 22nd Annual Computer Security Applications Conference (ACSAC’06). 2006, pp. 473–482.
[243] Shijia Wei, Aydin Aysu, Michael Orshansky, Andreas Gerstlauer, and Mohit Tiwari. “Using Power-Anomalies to Counter Evasive Micro-Architectural Attacks in Embedded Systems.” In: 2019 IEEE International Symposium on Hardware Oriented Security and Trust (HOST). IEEE. 2019, pp. 111–120.
[244] Mark Weiser. “Programmers Use Slices when Debugging.” In: Communications of the ACM 25.7 (July 1982), pp. 446–452.
[245] Gail Weiss, Yoav Goldberg, and Eran Yahav. “Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples.” In: Proceedings of the 35th International Conference on Machine Learning. Ed. by Jennifer Dy and Andreas Krause. Vol. 80. Proceedings of Machine Learning Research. Stockholmsmässan, Stockholm, Sweden: PMLR, 2018, pp. 5247–5256.
[246] Timothy Wheeler, Paul Graham, Brent E. Nelson, and Brad Hutchings. “Using Design-Level Scan to Improve FPGA Design Observability and Controllability for Functional Verification.” In: Proceedings of the 11th International Conference on Field-Programmable Logic and Applications. FPL ’01. London, UK: Springer-Verlag, 2001, pp. 483–492.
[247] Titus Winters, Hyrum Wright, and Tom Manshreck. Software Engineering at Google: Lessons Learned from Programming over Time. O’Reilly Media, 2020.
[248] Loring Wirbel. Xilinx SDAccel: A Unified Development Environment for Tomorrow’s Data Center. Tech. rep. The Linley Group, 2014.
[249] Jacob O. Wobbrock, Leah Findlater, Darren Gergle, and James J. Higgins. “The Aligned Rank Transform for Nonparametric Factorial Analyses Using Only Anova Procedures.” In: Conference on Human Factors in Computing Systems. Vancouver, BC, Canada, 2011, pp. 143–146.

[250] W. Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, and Franz Wotawa. “A survey on software fault localization.” In: IEEE Transactions on Software Engineering 42.8 (2016), pp. 707–740.
[251] XML Data Repository. http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/www/repository.html.
[252] T. Xie, V. Dang, J. Wadden, K. Skadron, and M. Stan. “REAPR: Reconfigurable engine for automata processing.” In: 27th International Conference on Field Programmable Logic and Applications. FPL ’17. 2017, pp. 1–8.
[253] XimpleWare. XimpleWare XML dataset. http://www.ximpleware.com/xmls.zip.
[254] Mengjia Yan, Yasser Shalabi, and Josep Torrellas. “ReplayConfusion: Detecting Cache-based Covert Channel Attacks Using Record and Replay.” In: The 49th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-49. Taipei, Taiwan: IEEE Press, 2016, 39:1–39:14.
[255] Y. H. Yang and V. Prasanna. “High-Performance and Compact Architecture for Regular Expression Matching on FPGA.” In: IEEE Transactions on Computers 61.7 (2012), pp. 1013–1025.
[256] Yi-Hua E. Yang, Weirong Jiang, and Viktor K. Prasanna. “Compact Architecture for High-throughput Regular Expression Matching on FPGA.” In: Symposium on Architectures for Networking and Communications Systems. 2008, pp. 30–39.
[257] Yuval Yarom and Katrina Falkner. “FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack.” In: 23rd USENIX Security Symposium (USENIX Security 14). 2014, pp. 719–732.
[258] J. H. C. Yeung, C. C. Tsang, K. H. Tsoi, B. S. H. Kwan, C. C. C. Cheung, A. P. C. Chan, and P. H. W. Leong. “Map-reduce as a Programming Model for Custom Computing Machines.” In: International Symposium on Field-Programmable Custom Computing Machines. 2008, pp. 149–159.
[259] Cemal Yilmaz, Amit Paradkar, and Clay Williams. “Time Will Tell: Fault Localization Using Time Spectra.” In: Proceedings of the 30th International Conference on Software Engineering. ICSE ’08. Leipzig, Germany: ACM, 2008, pp. 81–90.
[260] Zuoning Yin, Ding Yuan, Yuanyuan Zhou, Shankar Pasupathy, and Lakshmi N. Bairavasundaram. “How do fixes become bugs?” In: Foundations of Software Engineering. 2011, pp. 26–36.

[261] Xiaodong Yu and Michela Becchi. “GPU Acceleration of Regular Expression Matching for Large Datasets: Exploring the Implementation Space.” In: Proceedings of the ACM International Conference on Computing Frontiers. CF ’13. Ischia, Italy: ACM, 2013, 18:1–18:10.
[262] Polle Trescott Zellweger. “Interactive Source-level Debugging for Optimized Programs (Compilation, High-level).” PhD thesis. University of California, Berkeley, 1984.
[263] F. Zhang, K. Leach, A. Stavrou, H. Wang, and K. Sun. “Using Hardware Features for Increased Debugging Transparency.” In: 2015 IEEE Symposium on Security and Privacy. 2015, pp. 55–69.
[264] Tianyi Zhang and Miryung Kim. “Automated Transplantation and Differential Testing for Clones.” In: Proceedings of the 39th International Conference on Software Engineering. ICSE ’17. Buenos Aires, Argentina: IEEE Press, 2017, pp. 665–676.
[265] Qin Zhao, Rodric Rabbah, Saman Amarasinghe, Larry Rudolph, and Weng-Fai Wong. “How to Do a Million Watchpoints: Efficient Debugging Using Dynamic Instrumentation.” In: Compiler Construction. Berlin, Heidelberg, 2008, pp. 147–162.
[266] Hao Zhong, Suresh Thummalapenta, Tao Xie, Lu Zhang, and Qing Wang. “Mining API Mapping for Language Migration.” In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1. ICSE ’10. Cape Town, South Africa: ACM, 2010, pp. 195–204.
[267] Keira Zhou, Jeffrey J. Fox, Ke Wang, Donald E. Brown, and Kevin Skadron. “Brill tagging on the Micron Automata Processor.” In: Proceedings of the 9th IEEE International Conference on Semantic Computing. 2015, pp. 236–239.
[268] Hamid Reza Zohouri, Naoya Maruyama, Aaron Smith, Motohiko Matsuda, and Satoshi Matsuoka. “Evaluating and Optimizing OpenCL Kernels for High Performance Computing with FPGAs.” In: High Performance Computing, Networking, Storage and Analysis. Salt Lake City, Utah, 2016, 35:1–35:12.
[269] Colin de la Higuera. Grammatical Inference: Learning Automata and Grammars. New York, NY, USA: Cambridge University Press, 2010.
