
A Unified Model for Hardware/Software Codesign

by

Nirav Dave

Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2011

© Nirav Dave, MMXI. All rights reserved. The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part.

Author...... Department of Electrical Engineering and Computer Science July 26, 2011

Certified by...... Arvind Johnson Professor of Electrical Engineering and Computer Science Thesis Supervisor

Accepted by ...... Professor Leslie A. Kolodziejski, Chair, Department Committee on Graduate Students

A Unified Model for Hardware/Software Codesign
by
Nirav Dave

Submitted to the Department of Electrical Engineering and Computer Science on July 26, 2011, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering and Computer Science

Abstract

Embedded systems are almost always built with parts implemented in both hardware and software. Market forces encourage such systems to be developed with different hardware-software decompositions to meet different points on the price-performance-power curve. Current design methodologies make the exploration of different hardware-software decompositions difficult because such exploration is expensive and introduces significant delays in time-to-market. This thesis addresses this problem by introducing the Bluespec Codesign Language (BCL), a unified language model based on guarded atomic actions for hardware-software codesign. The model provides an easy way of specifying which parts of the design should be implemented in hardware and which in software without obscuring important design decisions. In addition to describing BCL's operational semantics, we formalize the equivalence of BCL programs and use this to mechanically verify design refinements. We describe the partitioning of a BCL program via computational domains and the compilation of different computational domains into hardware and software, respectively.

Thesis Supervisor: Arvind Title: Johnson Professor of Electrical Engineering and Computer Science

Acknowledgments

This thesis has been a long time in coming and, as one might expect, the list of those who deserve proper acknowledgment is long and hard to enumerate. Foremost, I would like to thank my advisor, Professor Arvind, for his help and mentorship. It is not an easy thing to find an advisor with a love of both the theoretical aspects of the work and an appreciation for practical application. Arvind never forced me to step away from any vector of interest. This perhaps is part of why my time at MIT was so long, but also so worthwhile.

I would also like to thank Rishiyur Nikhil, Joe Stoy, Jamey Hicks, Kattamuri Ekanadham, Derek Chiou, James Hoe, and John Ankcorn for their advice and discussion on various aspects of this work and its presentation over the years. To Jacob Schwartz, Jeff Newbern, Ravi Nanavati, and everyone else at Bluespec Inc. who patiently dealt with both my complaints about the compiler and my off-the-wall ideas. To my fellow graduate students, Charles O'Donnell, Ryan Newton, Man Cheuk Ng, Jae Lee, Mieszko Lis, Abhinav Agarwal, Asif Khan, Muralidaran Vijayaraghavan, and Kermin Fleming, who put up with my distractions.

Special thanks are needed for Michael Pellauer, who helped slog through some of the semantic details; Myron King, who spent many years working with me on the BCL compiler; and Michael Katelman, who was instrumental in the verification effort. To Peter Neumann, for significant English assistance, and for convincing me to keep up the editing process even after the thesis was acceptable.

To my parents, who were always there to remind me that my thesis need not solve the field and that perhaps eight years of work was enough. And last but not least, Nick and Larissa, who got me out of the office and kept me sane over the years. I do not think I would have made it without you both.

Contents

1 Introduction
    1.1 Desirable Properties for a Hardware-Software Codesign Language
    1.2 Thesis Contributions
    1.3 Thesis Organization
    1.4 Related Work
        1.4.1 Software Models of Hardware Description Languages
        1.4.2 C-based Behavioral Synthesis
        1.4.3 Implementation Agnostic Parallel Models
        1.4.4 Heterogeneous Simulation Frameworks
        1.4.5 Algorithmic Approaches to Design
        1.4.6 Application-Specific Programmable Processors
        1.4.7 Single Specification Hardware/Software Approaches
        1.4.8 Previous Work on Atomic Actions and Guards

2 BCL: A Language of Guarded Atomic Actions
    2.1 Example: Longest Prefix Match
    2.2 Semantics of Rule Execution in BCL
        2.2.1 Action Composition
        2.2.2 Conditional versus Guarded Actions
        2.2.3 Looping Constructs
        2.2.4 Local Guard
        2.2.5 Method Calls
        2.2.6 Notation of Rule Execution
    2.3 A BCL Program as a State Transition System
    2.4 Program Equivalence and Ordering
        2.4.1 Derived Rules

3 A Rule-level Semantics of BCL
    3.1 Functionalization of BCL: The GCD Example
    3.2 Translation of BCL to λ-Calculus
    3.3 The GCD Example Revisited

4 Hardware Synthesis
    4.1 Implementing a Single Rule
        4.1.1 Implementing State Merging Functions
        4.1.2 Constructing the FSM
    4.2 Modularizing the Translation
    4.3 Understanding the FSM Modularization

5 Scheduling
    5.1 A Reference Scheduler
    5.2 Scheduling via Rule Composition
        5.2.1 Rule Composition Operators
        5.2.2 Sequencing Rules: compose
        5.2.3 Merging Mutually Exclusive Rules: par
        5.2.4 Choosing from Rules: restrict
        5.2.5 Repeating Execution: repeat
        5.2.6 Expressing Other Rule Compositions
    5.3 Scheduling in the Context of Synchronous Hardware
        5.3.1 Exploiting Parallel Composition
        5.3.2 Extending for Sequentiality
        5.3.3 Improving Compilation Speed
        5.3.4 Introducing Sequential Composition
    5.4 Expressing User-defined Schedules via Rule Compositions
        5.4.1 The Three Schedules
        5.4.2 Performance Results

6 Computational Domains
    6.1 Representing Partitioning in BCL: Computational Domains
    6.2 Partitioning with Computational Domains
        6.2.1 Modifying the Program for Partitioning
    6.3 Isolating Domains for Implementation
    6.4 A Design Methodology in the Presence of Domains

7 Software Generation
    7.1 Rule Canonicalization
    7.2 Syntax-Directed Compilation
        7.2.1 Compiling Expressions
        7.2.2 Compiling Actions
        7.2.3 Compiling Rules and Methods
        7.2.4 Compiling Modules
    7.3 The Runtime: Constructing main()
    7.4 Software Optimizations
        7.4.1 Shadow Minimization
        7.4.2 Action Sequentialization
        7.4.3 When Lifting
        7.4.4 Scheduling and Runtime Improvements

8 Refinements and Correctness
    8.1 Understanding Design Refinements
        8.1.1 Observability
        8.1.2 An Example of an Incorrect Refinement
        8.1.3 Refinements in the Context of Choice
    8.2 Establishing Correctness of Implementation
    8.3 Checking Simulation Using SMT Solvers
        8.3.1 Checking Correctness
        8.3.2 The Algorithm
        8.3.3 Formulating the SMT Queries
        8.3.4 Step-By-Step Demonstration
    8.4 The Debugging Tool and Evaluation

9 Conclusion

List of Figures

2-1 Grammar of BCL. For simplicity we will assume all module and method names are unique.
2-2 Sequential C code for longest prefix match example
2-3 Example: A Table-Lookup Program. Other module implementations are given in Figure 2-4
2-4 One-element FIFO and Naïve Memory Modules
2-5 Operational semantics of BCL expressions. When no rule applies the expression evaluates to NR
2-6 Operational semantics of BCL actions. When no rule applies the action evaluates to NR. Rule bodies which evaluate to NR produce no state update.
2-7 Helper Functions for Operational Semantics
2-8 When-Related Axioms on Actions

3-1 GCD Program in BCL
3-2 Functionalization of Actions. Fixed-width text is concrete syntax of the λ-calculus expression
3-3 Functionalization of BCL Expressions. Fixed-width text is concrete syntax of the λ-calculus expression
3-4 Conversion of Top-level BCL Design to top-level definitions. Fixed-width text is concrete syntax of the λ-calculus expression.
3-5 Functionalized form of GCD program

4-1 Initial Example translation of λ-expressions to Circuits. White boxes can be implemented solely with wires, gray boxes need gates, dotted boxes correspond to λ abstractions.
4-2 Result of β-reduction on expression in Figure 4-1. Notice how the fundamental circuit structure does not change
4-3 Result of Distribution and Constant Propagation on expression in Figure 4-2. Sharing the white box structures (wire structures) does not change the semantics
4-4 Example Single-Rule Program for Hardware Synthesis
4-5 Simplified λ expressions of functionalization of Figure 4-4
4-6 Final functionalization of Program in Figure 4-4
4-7 Example of implemented rule. State structure s0, s1, ns, and final output s2 have been flattened into a single bit-vector. {} is bit-vector concatenation.
4-8 Generation of δ Functions. Fixed-width text is concrete syntax of the λ-calculus expression.
4-9 Generation of π Functions. Fixed-width text is concrete syntax of the λ-calculus expression
4-10 Generation of method functions. Notice that meth π g may be evaluated without knowing the particular value p, though it shares the same input argument x as meth δ g. Fixed-width text is concrete syntax of the λ-calculus expression.

5-1 A BCL Scheduling Language
5-2 Total guard lifting procedure for BCL Expressions, resulting in the guard being isolated from the body.
5-3 Total guard lifting procedure for a restricted subset of BCL Actions, resulting in the guard being isolated from the body.
5-4 Total guard lifting procedure for a restricted subset of BCL Methods. Methods are transformed into two separate methods; one for the body and one for the guard.
5-5 The Table-Lookup Program
5-6 Implementation Results

6-1 Original BCL Pipeline
6-2 Updated Module Grammar of BCL with Primitive Synchronizers
6-3 Grammar of Domain Annotations.
6-4 Domain Inference Rules for Actions and Expressions
6-5 Domain Inference Rules for Programs, Modules, Rules, and Methods
6-6 Pipeline Example with IFFT put in hardware

7-1 Procedure to lift when clauses to the top of all expressions of BCL. This is the same as the expression lifting procedure of the restricted language in Figure 5-2. Method calls and bound variables are expected to already be split between body and guard.
7-2 Translation of Expressions to a C++ expression and the C++ statement to be evaluated for the expression to be meaningful
7-3 Translation of Actions to C++ Statements
7-4 Translation of Rules and Methods. The initial state is the current object, which is the "real" state in the context that we are calling. Thus if we call a method or rule on a shadow state, we will execute it in that state.
7-5 Helper Functions used in Action compilation
7-6 Translation of Module Definitions to C++ Class Definitions
7-7 Implementation of the C++ Register class. This primitive is templated to hold any type. A pointer to the parent is kept to avoid unnecessary duplication of large values.
7-8 Simple Top-Level Runtime Driver
7-9 Shadow Minimized Translation of BCL's Expressions
7-10 Shadow Minimized Translation of BCL's Actions
7-11 Shadow minimized translation of Rules and Methods.
7-12 Helper Functions for Shadow minimized translation
7-13 Procedure to apply when-lifting to actions, referencing the procedure in Figure 7-1. Method expression calls and bound variables are expected to already be split between body and guard.

8-1 Initial FSM
8-2 Refined FSM
8-3 A Rule-based Specification of the Initial Design
8-4 A Refinement of the Design in Figure 8-3
8-5 Program of Figure 8-3 with an Observer
8-6 System of Figure 8-3 with an Observer
8-7 An incorrect refinement of the system in Figure 8-6
8-8 A correct refinement of the system in Figure 8-6
8-9 A system with a nondeterministic observer
8-10 Correct refinement of Figure 8-9
8-11 Second correct refinement of Figure 8-9
8-12 Tree visualization of the algorithmic steps to check the refinement of the program in Figure 8-6 to the one in Figure 8-7
8-13 SMIPS processor refinement

Chapter 1

Introduction

Market pressures are pushing embedded systems towards both higher performance and greater energy efficiency. As a result, designers are relying more on specialized hardware, both programmable and non-programmable, which can offer orders of magnitude improvements in performance and power over standard software implementations. At the same time, designers cannot implement their designs entirely in hardware for reasons of flexibility and cost, which leaves the remaining parts to be implemented in software. Even fairly autonomous non-programmable hardware blocks are frequently controlled by software device drivers. In this sense all embedded designs involve hardware-software codesign.

In businesses where embedded designs are necessary, first-to-market entries enjoy a substantially higher profit margin than subsequent ones. Thus designers are under great pressure to prevent delays, especially those caused by final integration and testing. For subsequent products, the novelty of new functionalities has a much lower effect, and their value is driven by performance, power, and of course cost. Thus, over the life-cycle of a product class, individual embedded designs frequently have to make the transition from rapidly designed but good-enough implementations to time-consuming but highly efficient designs.

Given this shifting set of requirements, engineers would like to be able to start with a design implemented mainly in software using already developed hardware blocks, and gradually refine it to one with more hardware and better power/performance properties. Isolating such refinements from the rest of the system is important to smooth the testing and integration processes. Unfortunately such flexibility is not easily provided due to the radically different representation of hardware and software. Embedded software is generally represented as low-level imperative code. In contrast, hardware systems are described at the level of registers, gates, and wires operating as a massively parallel finite state machine. The differences between these two idioms are so great that the hardware and software parts of the design are done by entirely separate teams.

The disjointedness of software and hardware teams strongly affects the standard design process. Since time-to-market is of utmost importance, both parts of the design must be specified and implemented in parallel. As a result, the hardware-software decomposition and the associated interface are specified early. Even when the hardware-software decomposition is fairly obvious, specifying the interface for interaction without the design details of the parts is fraught with problems. During the design process the teams may jointly revisit the early decisions to resolve specification errors or deal with resource constraints. Nevertheless, in practice, the implemented interface rarely matches the specification precisely. This is quite understandable, as the hardware-software interface must necessarily deal with both semantic models. Frequently during the final stages of integration, the software component must be modified drastically to conform to the actual hardware to make the release date. This may involve dropping useful but non-essential functionalities (e.g., using low-power modes or exploiting concurrency). As a result, designs rarely operate with the best power or performance that they could actually achieve.

The problem of partitioning can be solved if we unify the description of both software and hardware using a single language. An ideal solution would allow designers to give a clear (sequential) description of the algorithm in a commonly-used language, and specify the cost, performance, and power requirements of the resulting implementation. The tool would then take this description, automatically determine which parts of the computation should be done in hardware, insert the appropriate hardware-software communication channel, parallelize the hardware computation efficiently, parallelize the software sufficiently to exploit the parallelism in the hardware, and integrate everything without changing the semantic meaning of the original program. In the general case, each of these tasks is difficult and requires the designer's input, making this type of design flow infeasible.

This thesis discusses a more modest language-based approach. Instead of trying to solve the immense problem of finding the optimal solution from a single description, our goal is to facilitate the task of exploration by allowing designers to easily experiment with new algorithms and hardware-software partitionings without significant rewriting. The designer's task is to construct not one, but many different hardware-software decompositions, evaluate each one, and select the best for his needs. This approach lends itself to the idea of retargeting, since each design becomes a suite of designs and thus is robust to changes needed for performance, functionality, or cost. This approach also helps in keeping up with the constantly evolving semiconductor technology.

1.1 Desirable Properties for a Hardware-Software Codesign Language

Any hardware-software codesign solution must be able to interoperate with existing software stacks at some level. As a result, a hardware-software codesign language need not be useful for all types of software, and can focus only on the software that needs to interact with hardware or that might potentially be implemented in hardware. In such a context, the designer isolates the part of the design that could potentially be put into hardware and defines a clean and stable interface with the rest of the software. As we now only consider "possibly hardware" parts of the design, i.e., parts that will be implemented in hardware or as software expressed naturally in a hardware style, the semantic gap between the hardware and software is smaller and it becomes reasonable to represent both in a single unified language.

With a single language, the semantics of communication between hardware and software are unambiguous, even when the hardware-software partitioning is changed. To be viable, a unified language must have the following properties:

1. Fine-grain parallelism: Hardware is inherently parallel, and any codesign language must be flexible enough to express meaningful hardware structures. Low-level software that drives the hardware does so via highly concurrent untimed transactions, which must also be expressible in the language.

2. Easy specification of partitions: In complex designs it is important for the designer to retain a measure of control in expressing his insights about the partitioning between hardware and software. Doing so within suitable algorithmic parameters should not require any major changes in code structure. Further, the addition of a partition should not affect the semantics of the system; designers should be able to reason about the correctness of a hardware-software design as either a pure hardware or a pure software system.

3. Generation of high-quality hardware: Digital hardware designs are usually expressed in RTL (Register-Transfer Level) languages like Verilog, from which low-level hardware implementations can be automatically generated using a number of widely available commercial tools. (Even for FPGAs it is practically impossible to completely avoid RTL.) The unified language must compile into efficient RTL code.

4. Generation of efficient sequential code: Since the source code is likely to contain fine-grained transactions to more clearly expose pipeline parallelism for exploitation in hardware, it is important that software implementations are able to effectively sequence transactions for efficiency without introducing unnecessary stalls when interacting with hardware or other external events.

5. Shared communication channels: Often the communication between a hardware device and a processor is accomplished via a shared bus. The high-level concurrency model of the codesign language should permit sharing of such channels without introducing deadlocks.

Given such a language, it should be possible for designers to reason about system changes in a straightforward manner. Not only should it be possible to easily modify a hardware-software design by changing the partitioning, but it should also be possible to reason about the correctness of this system as easily as if it had been implemented entirely in software or entirely in hardware.

1.2 Thesis Contributions

This thesis is about the semantic model embodied in the language and the challenges that must be addressed in the implementation of such a language. It is not about the surface syntax, types, or the meta-language features one may wish to include in a design language. The starting point of our design framework is guarded atomic actions (GAAs) and Bluespec SystemVerilog (BSV), a language based on such a framework. BSV is an industrial-strength language for hardware design [20]. Significant work has been done towards a full implementation of BCL, the proposed language. BCL programs are currently running on multiple mixed hardware-software platforms. However, even a preliminary evaluation of BCL's effect requires a significant amount of additional platform- and application-specific effort; this effort has been done mostly by Myron King and will appear in his PhD thesis. This thesis makes the following contributions:

• We introduce Bluespec Codesign Language (BCL), a unified hardware-software language, and give its operational semantics. BCL is an extension of the semantic model underlying BSV, adding sequential composition of actions, dynamic loops, and localization of guards. Like BSV, the execution of a BCL program must always be understood as a serialized execution of the individual rules. However, as any serialization of rule executions is valid, BCL programs are naturally parallel; multiple rules can be executed concurrently without committing to a single ordering resolution. The additional capabilities of BCL make it convenient to express low-level software programs in addition to hardware designs. The operational semantics were developed jointly with Michael Pellauer. (Chapter 2)

• We introduce a notion of equivalence of BCL programs based on state observability. This, for the first time, provides a proper foundation for some well-known properties of BSV, such as derived rules and the correctness of parallel scheduling. (Chapters 2 and 5).

• We use this notion of observability to develop an equivalence checking tool based on satisfiability-modulo-theories (SMT) solvers. This tool is particularly useful as a debugging aid, as it can automatically verify whether splitting a rule into smaller rules has introduced new behaviors. This is a common step-wise refinement used in the development of BCL programs. This tool was developed jointly with Michael Katelman. (Chapters 2 and 8).

• We extend the notion of clock domains [30] to allow for multiple computational domains both in hardware and software. We use this type-based mechanism to express the precise partitioning of a BCL program. Domains allow the compiler to figure out the precise communication across hardware-software boundaries. Annotating a BCL program with domains also suggests how to directly implement the resulting communication. (Chapter 6).

• Efficient implementations of a BCL program (and thus a BSV program) practically must restrict the choice inherent in the program, i.e., schedule the rules. We provide a representation of this scheduling process via rule composition. This allows the designer to understand the scheduling restrictions programmatically and even express them directly. It also enables partial scheduling, where nondeterminism is left in the final implementation, an important property for efficient hardware-software implementations. (Chapter 5).

• The construction of an initial BCL compiler that can partition a BCL design into hardware and software. We introduce methods for compiling the software partition to both Haskell and C++. The former is used in our verification effort, while the latter is used for implementing embedded systems and makes use of nontrivial optimizations to remove the need for the non-strictness and dynamic allocation associated with Haskell. The C++ compilation was developed jointly with Myron King, who is continuing the development of the BCL compiler for his PhD thesis. (Chapters 3 and 7).

1.3 Thesis Organization

This thesis has many topics and the reader can read the chapters in multiple orders. This section serves to prime the reader as to the chapter contents and the dependencies between them.

The remainder of this chapter discusses the various works in different areas that have bearing on this work. It is included for context and does not greatly impact knowledge transfer.

Chapter 2 introduces BCL and its operational semantics. It also introduces various concepts needed to understand program refinement and implementation. This chapter provides necessary preliminaries for all subsequent chapters, though the later sections may be skipped if one is not interested in verification (Chapter 8) or the semantic issues of implementation (Chapter 5).

Chapter 3 discusses the translation of BCL to λ-calculus. This translation is an important step in hardware generation (Chapter 4) and the verification effort (Chapter 8). It also lends insight into software generation (Chapter 7) but is not strictly necessary to understand software generation.

In Chapter 4 we discuss the generation of synchronous hardware from a BCL program, and how BCL module interfaces relate to their FSM implementations. This chapter sheds some light on choices made by the historical approaches to scheduling of hardware implementations, which is discussed in Chapter 5, but is not necessary to understand the rest of the thesis.

In Chapter 5 we describe the task of scheduling programs, i.e., reducing choice in an implementation with the goal of improving efficiency. We discuss this in terms of rule composition and derived rules, which enables us to schedule the hardware and software portions of a program independently. This chapter does not have any direct bearing on the understanding of subsequent chapters.

In Chapter 6, we show the partitioning of a BCL program via computational domains. We also discuss how communication channels can be abstracted and the compilation task isolated to each individual substrate. This chapter is not necessary for any subsequent chapters.

We discuss the implementation of BCL in C++ in Chapter 7. This involves overcoming various restrictions due to the imperative nature of the backend language. We also discuss various optimizations that reduce the memory and execution time overhead.

In Chapter 8 we discuss a notion of state-based equivalence of programs and show how this model allows us to consider many non-trivial refinements (e.g., pipelining) as equivalence preserving. The chapter also introduces an algorithm that decomposes the notion of observability into a small number of SMT queries, and a tool which embodies it.

Finally, in Chapter 9 we conclude with a summary of the thesis contributions and a discussion of future work.

1.4 Related Work

The task of dealing with hardware-software systems is well-established and appears in multiple contexts with vastly different goals. However, collectively this literature gives a good intuition about the state of the art. In the remainder of this chapter, we will discuss previous work as it relates to the task of hardware-software codesign.

1.4.1 Software Models of Hardware Description Languages

Hardware implementation is an expensive and time-consuming task, and it is impractical to design hardware directly as circuits during the design process. Even with programmable hardware blocks, such as PLAs [44] or Field Programmable Gate Arrays (FPGAs) [21], it is still useful to execute models of hardware in software.

Fundamentally, digital hardware is represented at the transistor level, and we can model the underlying physics of circuits in systems such as SPICE [63]. Practically, this level of detail is too great, and designers moved to gate-level schematics, which model only the digital aspects of a design. This abstraction greatly speeds up simulation of circuits but is still immensely slow to construct and test. Languages at the Register-Transfer Level (RTL), such as Verilog [90] and VHDL [12], allowed designers to go from graphical descriptions to textual descriptions. This was a significant improvement, as it decoupled the description from its physical implementation, a necessary step in moving towards higher-level representations.

As a matter of course, designers wanted to express not only their designs, but also their test bench infrastructure in RTL. As such, certain C-like "behavioral" representations were allowed. As this became more prevalent, approaches were proposed for expressing not only the test bench but parts of the design themselves in a behavioral style [24, 28, 94]. However, it is hard to distinguish the parts of the description meant to be interpreted behaviorally from those which are meant to serve as static elaboration. As a result, such behavioral representations are used in highly stylized fashions. SystemVerilog [3], the successor to Verilog, formalized some of these distinctions by introducing the concept of generate blocks to be used explicitly for static elaboration.

The hardware description language Lava [18], an embedded domain-specific language in Haskell, made this distinction precise by expressing all higher-level functional operators as circuit connections. At some level, Lava can be viewed as a meta-programming layer on top of the previous gate-level representation.

Cycle-accurate simulation of RTL models is not always necessary. Practically, it is often of value to be able to significantly speed up simulation in exchange for small semantic infidelities. Popular commercial products like Verilator [92] and Carbon [1] do just this. However, the resulting performance is still often several orders of magnitude slower than natural software implementations of the same algorithm.

1.4.2 C-based Behavioral Synthesis

Many consider hardware description languages like Verilog and VHDL to be too low-level. One proposed solution to ease the burden of hardware design is to generate hardware from familiar software languages, e.g., C or Java. These Electronic System Level (ESL) representations generally take the control-data flow graphs and, through a series of transformations, optimize the result and directly implement it as a circuit [46, 48, 57, 75, 91]. Systems like CatapultC [62], HandelC [23], Pico Platform [86], or AutoPilot [13] have been effective at generating some forms of hardware from C code. Assuming the appropriate state elements can be extracted, these can be quite efficient in the context of static schedules. However, generating an efficient design in the context of dynamic choice can be very hard, if not impossible [5].

1.4.3 Implementation Agnostic Parallel Models

There are several parallel computation models whose semantics are agnostic to implementation in hardware or software. In principle, any of these can provide a basis for hardware-software codesign. Threads and locks are used extensively in parallel programming and also form the basis of SystemC [59], a popular language for modeling embedded systems. However, SystemC has the same problem as other C-like languages in generating good hardware, in that only highly restrictive idioms are efficiently implementable.

Dynamic dataflow models, both at macro levels (Kahn [54]) and fine-grained levels (Dennis [35], Arvind [10]), provide many attractive properties but abstract away important resource-level issues that are required to express efficient hardware and software. Nevertheless, dataflow models where the rates at which each node works are specified statically have been used successfully in signal processing applications [58]. However, such systems become inefficient in the context of conditional operations. The Liquid Metal Project [53] extends StreamIt [89], one such static dataflow language aimed at parallel software, to make use of hardware. It leverages the type system, an extension of the Java type system, to annotate which aspects can be implemented in hardware and which in software.

In contrast to the asynchronous or untimed dataflow models mentioned earlier, synchronous dataflow offers a model of concurrency based on synchronous clocks. It is the basis for a number of programming languages, e.g., LUSTRE [25], Esterel [17], Rapide [60], Polysynchrony [87], SyncCharts [6], and SHIM [40]. All of these languages are used in mixed hardware-software designs. Though still synchronous like RTL models, they represent a clear improvement over behavioral synthesis of RTL in their semantics and implementation. Edwards [39, 74] and Berry [16] have presented methods to generate hardware from such descriptions, but these efforts have yet to yield the high-quality hardware, predictability, and descriptional clarity needed to overtake well-understood RTL-based design.

1.4.4 Heterogeneous Simulation Frameworks

There are numerous systems that allow co-simulation of hardware and software modules. Such systems often suffer from both low simulation speeds and improperly specified semantics. Additionally, they are typically not used for direct hardware or software synthesis.

Ptolemy [22] is a prime example of a heterogeneous modeling framework, which concentrates more on providing an infrastructure for modeling and verification, and less on the generation of efficient software; it does not address the synthesis of hardware at all. Metropolis [14], while related, has a radically different computational model and has been used quite effectively for hardware/software codesign, though primarily for validation and verification rather than to synthesize efficient hardware.

SystemC [59], a C++ class library, is the most popular language to model heterogeneous systems. The libraries provide great flexibility in specifying modules, but SystemC lacks clear compositional semantics, producing unpredictable behaviors when connecting modules. Synthesis of high-quality hardware from SystemC remains a challenge.

Matlab [2] and Simulink [69] generate production code for embedded processors as well as VHDL from a single algorithmic description. Simulink employs a customizable set of block libraries that allows the user to describe an algorithm by specifying the component interactions. Simulink does allow the user to specify modules, though the nature of the Matlab language is such that efficient synthesis of hardware would be susceptible to the same pitfalls as C-based tools. A weakness of any library-based approach is the difficulty for users to specify new library modules.

1.4.5 Algorithmic Approaches to Design

One possible approach to embedded design is to automatically generate a design from a high-level algorithm. This can be very successful in contexts where the domain is well understood (e.g., SPIRAL [88]). However, it does not generally apply to less well-understood and complex systems. Nurvitadhi et al. [68] have made some progress on a more general design class, taking a synchronous datapath and automatically pipelining it while leveraging user-provided speculation mechanisms.

Another area of interest that can be solved algorithmically is the selection of an appropriate cut between hardware and software. Viewed as an optimization problem, designers can make a selection based on estimates of the cost-performance tradeoff. In certain restricted cases, the choice of implementation of an algorithm is sufficiently constrained that it becomes reasonable for an automated process [7, 27, 42] to be used in selecting the appropriate partitioning. This is especially effective in digital approximations where numeric errors from approximations must be analyzed.

All automated approaches try to avoid implementing and checking all cases by leveraging high-level knowledge to approximate the relevant parts of the exploration process. Such analysis should be used in a manner complementary to the hardware-software codesign approach discussed in this thesis.

1.4.6 Application-Specific Programmable Processors

Another approach to the hardware-software codesign problem is to limit hardware efforts to the specialization of programmable processors [4, 49, 82, 93]. These systems are a very limited form of hardware-software codesign, as they find kernels or CISC instructions which can be accelerated by special-purpose hardware functional units. These units are either user-generated or automatically derived from the kernel implementation. The majority of attention in these approaches is spent on processor issues and the associated compilation stack. This approach is attractive as it gives some of the performance/power benefits of general hardware-software solutions while still appearing to be "standard" software. However, it is fundamentally unable to achieve the many orders of magnitude of improvement that more general-purpose hardware-software codesign solutions can.

1.4.7 Single Specification Hardware/Software Approaches

One of the earliest efforts to do total-system design of hardware and software was SRI's Hierarchical Development Methodology (HDM) [64, 76] in the 1970s, with its SPECIfication and Assertion Language (SPECIAL). HDM is aimed at breaking the task of verifying abstract high-level properties of a real production-level system into small manageable steps, gradually refining from an abstract machine to the real design. The expression of the state machines at each level is sufficiently abstract that the change from software to hardware does not change the abstraction methodology, requiring only the limitations necessary to be implementable in the appropriate substrate. HDM and SPECIAL were used in the design of SRI's Provably Secure Operating System and its underlying capability-based hardware [64, 65]. This approach is being revisited in a joint project of SRI and the University of Cambridge [66]. HDM was later extended to EHDM [81] and has led to many of SRI's subsequent systems, e.g., PVS [70], SAL [15], and Yices [38].

1.4.8 Previous Work on Atomic Actions and Guards

BCL and BSV are most closely related to Chandy and Misra's UNITY programming language [26], whose execution model is virtually identical to ours, differing only in a few intra-atomic action constructors. That said, there are many other languages that use guards or atomic actions to describe distributed software systems, e.g., Dijkstra's Guarded Command Language [36], Hoare's Communicating Sequential Processes [50], and Lynch's IO Automata [61].

Another set of languages in the context of hardware employ vastly different techniques than their software counterparts. Initially such descriptions were used for the purpose of precise specification of hard-to-verify hardware models such as cache coherence processors, e.g., Dill's Murphi [37] system and Arvind and Shen's TRSs [11]. Such models focused on modeling the protocol, and not on actual synthesis.

Initial aspects of hardware synthesis from guarded atomic actions can be found in Staunstrup's Synchronous Transactions [85] and Sere's Action Systems [73]. These systems used basic processor pipelines to demonstrate their practicality. Staunstrup was able to demonstrate automatic synthesis; however, this synthesis was unable to reach the level of concurrency required to make the system practical.

In contrast, Arvind and Shen's TRSs [11] focused on more complex and more obviously parallel structures such as reorder buffers and cache-coherence protocols [83], represented using bounded guarded atomic actions. Hoe and Arvind then showed that such descriptions could be synthesized into highly concurrent and relatively efficient structures [52] by attempting to execute each individual rule in parallel each cycle. These were later refined by Esposito [43] and Rosenband [78] to allow more efficient results without compromising understandability.

This idea was commercialized and packaged into the hardware description language Bluespec SystemVerilog [20] and the surrounding tool chain and infrastructure [19]. Significant work has gone into making the hardware generation efficient; it has been shown that compiled BSV is as efficient as hand-written RTL [9]. BSV has been used in many contexts to develop large hardware systems such as processor designs [31], video decoders [45], kernel accelerators [33], cache coherence engines [34], hardware-based processor simulators [41, 72], and wireless basebands [67]. These have leveraged Guarded Atomic Actions to reduce code size, increase modularity and design flexibility, and reduce design time. These projects have shown that Guarded Atomic Actions are a good abstraction for hardware designs.

The rest of this thesis discusses the Bluespec Codesign Language.

Chapter 2

BCL: A Language of Guarded Atomic Actions

This chapter introduces the semantics and fundamental properties of Bluespec Codesign Language (BCL). BCL can be considered as an extension of Bluespec SystemVerilog (BSV) [20]. BSV is a statically-typed language with many useful language features that are not meaningful after the early part of the compilation (i.e., after static elaboration). To simplify our discussion of the semantics, we consider BCL [32], a language roughly corresponding to BSV programs after type checking and instantiation of modules.

In BCL, behavior is described using guarded atomic actions (GAAs) or rules [51]. Each rule specifies a state transition (its body) on the state of the system and a predicate (a guard) that must be valid before this rule can be executed, i.e., before the state transformation can take place. One executes a program by randomly selecting a rule whose predicate is valid and executing its body. Any possible sequencing of rules is valid; the implementation is responsible for determining which rules are selected and executed.

The grammar for BCL is given in Figure 2-1. Most of the grammar is standard; we discuss only the novel parts of the language and explain the language via an illustrative example in Section 2.1. A BCL program consists of a name, a set of modules, and a set of rules. Each BCL module consists of a set of (sub)modules and sets of action methods and value methods which are called by methods of the enclosing module or rules in the top-level program. Register is a primitive module with special syntax for calling its read and write methods. Though rules are allowed at all levels of the module hierarchy in BCL, we restrict BCL such that all rules are at the top level to simplify our discussion of scheduling. This can be done programmatically by repeatedly replacing a rule in a submodule with an action method containing the body of the rule and a rule that calls just that action method. We refer to the collection of all registers in a BCL program as its state.

    program ::= Program name [m] [Rule R: a]    // A list of Modules and Rules

    m ::= [Register r (v0)]                     // Reg with initial values
       ‖  Module name [m]                       // Submodules
            [ActMeth g = λx.a]                  // Action method
            [ValMeth f = λx.e]                  // Value method

    v ::= c                                     // Constant Value
       ‖  t                                     // Variable Reference

    a ::= r := e                                // Register update
       ‖  if e then a                           // Conditional action
       ‖  a | a                                 // Parallel composition
       ‖  a ; a                                 // Sequential composition
       ‖  a when e                              // Guarded action
       ‖  (t = e in a)                          // Let action
       ‖  loop e a                              // Loop action
       ‖  localGuard a                          // Localized guard
       ‖  m.g(e)                                // Action method call g of m

    e ::= r                                     // Register Read
       ‖  c                                     // Constant Value
       ‖  t                                     // Variable Reference
       ‖  e op e                                // Primitive Operation
       ‖  e ? e : e                             // Conditional Expression
       ‖  e when e                              // Guarded Expression
       ‖  (t = e in e)                          // Let Expression
       ‖  m.f(e)                                // Value method call f of m

    op ::= && | || | ...                        // Primitive operations

Figure 2-1: Grammar of BCL. For simplicity we will assume all module and method names are unique.

The evaluation of a BCL rule produces a new value for the state and a boolean guard value which specifies if the state change is permitted. Every rule in BCL is deterministic in the sense that the guard value and state change computed when evaluating a rule are a function of the current state. The execution of a BCL program can be described as follows:

1. Choose a rule R to execute.

2. Evaluate the new state and guard value for rule R on the current state.

3. If the guard is true, update the state with the new value.

4. Repeat Step 1.

Since this procedure involves a nondeterministic choice, and the choice potentially affects the observed behaviors, our BCL program is more like a specification than an implementation. To obtain an effective implementation we selectively restrict the model to limit nondeterminism and introduce a notion of fairness in rule selection. As we discuss in Chapter 5, in the case of synchronous hardware we usually want to execute as many rules as possible concurrently, whereas in software we construct long chains of rules to maximize locality.
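To make this execution procedure concrete, the following C++ sketch (an illustration only, not the runtime generated by the BCL compiler; State, Update, Rule, and step are hypothetical names) implements the four steps above: pick a rule arbitrarily, evaluate its guard and updates as a function of the current state, and commit the updates only if the guard holds.

    #include <cstdlib>
    #include <functional>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    using State  = std::map<std::string, int>;   // register name -> current value
    using Update = std::map<std::string, int>;   // register name -> new value

    struct Rule {
        std::string name;
        // Evaluating a rule on a state yields its guard and its proposed updates.
        std::function<std::pair<bool, Update>(const State&)> evaluate;
    };

    // One iteration of the abstract execution procedure (Steps 1-3).
    void step(State& s, const std::vector<Rule>& rules) {
        const Rule& r = rules[std::rand() % rules.size()];   // Step 1: arbitrary choice
        auto [guard, upd] = r.evaluate(s);                    // Step 2: evaluate on current state
        if (guard) {                                          // Step 3: commit only if the guard is true
            for (const auto& [reg, v] : upd) s[reg] = v;
        }
    }                                                         // Step 4: the caller repeats step()

Any implementation whose observable behavior matches some sequence of such steps is legal; restricting or batching the choice in Step 1 without changing the set of observable behaviors is precisely the scheduling problem taken up in Chapter 5.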

int lpm(IPA ipa) {
    int p;
    p = RAM[rootTableBase + ipa[31:16]];
    if (isLeaf(p)) { return p; }
    p = RAM[p + ipa[15:8]];
    if (isLeaf(p)) { return p; }
    p = RAM[p + ipa[7:0]];
    return p;   // must be a leaf
}

Figure 2-2: Sequential C code for longest prefix match example
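The slice notation ipa[31:16] in Figure 2-2 is hardware-style pseudocode rather than legal C. A plain C++ rendering of the same table walk, shown here purely for illustration (RAM, rootTableBase, and isLeaf are assumed to be defined elsewhere, as in the figure), replaces the slices with shifts and masks.

    #include <cstdint>

    using IPA = std::uint32_t;

    extern int  RAM[];           // assumed: the preprocessed multi-lookup table
    extern int  rootTableBase;   // assumed: base index of the 2^16-entry root table
    extern bool isLeaf(int p);   // assumed: true when p encodes a final result

    int lpm(IPA ipa) {
        int p = RAM[rootTableBase + (ipa >> 16)];   // ipa[31:16]
        if (isLeaf(p)) return p;
        p = RAM[p + ((ipa >> 8) & 0xFF)];           // ipa[15:8]
        if (isLeaf(p)) return p;
        return RAM[p + (ipa & 0xFF)];               // ipa[7:0], must be a leaf
    }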

2.1 Example: Longest Prefix Match

We use the example of the Longest Prefix Match module in a high-speed router to illustrate the essential concepts in BCL. This module determines to which physical output a particular packet should be routed based on its IPv4 destination address. The Longest Prefix Match is based on a routing table that consists of a set of IP address prefixes, each associated with an output port. Since more than one prefix can match an incoming packet, we choose the output corresponding to the longest matching prefix.

Since these routing tables are updated infrequently, it is possible for us to preprocess the prefixes for efficient lookup and update the hardware memories when necessary. The most natural way of expressing this is with a flat table of size 2^32. However, this is not feasible for cost and power reasons. To reduce the cost, we exploit the tree-like structure of the table to produce a multi-lookup table. Now, to do a lookup we start with a prefix of the IP address and get either a result (a leaf) or a pointer back into the table, to which we add the next 8-bit part of the IP address to form a new memory address to look up. In C, this would look like the code in Figure 2-2.

This drastically cuts down on total memory size, but now requires, in the worst case, three sequential memory lookups, each of which may take k clock cycles. This means in the worst case it may take 3k cycles to process a request, which, given standard hardware components, is too slow to meet the required packet processing rate. A way to overcome this is to overlap the computation of multiple packets, because these computations are independent of each other. To do so we use a pipelined memory that is capable of handling a new request every cycle, though the latency of results is still k cycles.

We use a circular pipeline organization, shown in Figure 2-3, to handle multiple concurrent requests. We assume that the memory module (mem) may take an arbitrary amount of time to produce a result and will internally buffer waiting responses. The program has an input FIFO inQ to hold incoming requests and an internal FIFO fifo to hold outstanding requests while a memory reference is in progress. Once a lookup request is completed, the message leaves the program via the FIFO outQ. The program has three rules: enter, recirc, and exit.

The enter rule enters a new IP request into the pipeline from inQ. In parallel, it enqueues a new memory request and places the IP residual into fifo. While there are no explicit guards, this rule is guarded by the guards implicit in the methods it calls. Therefore, the guard of the enter rule is simply the conjunction of the guards of the fifo.enq and mem.req methods. These guards may represent, for example, that the fifo or mem is full and cannot handle a new request.

The recirc rule handles the case when a response from memory results in a pointer, requiring another request. It is guarded by the condition that the memory response is not the final result. In parallel, the body of the rule accepts the result from memory, makes the new memory request and, in sequence, dequeues the old IP address residual and enqueues the new one. It is very important that this sequencing occurs and that we not use parallel composition. For this rule to execute, the guards on both method calls must be valid simultaneously. This means the fifo must both have space and not be empty. Note that this is not possible if we have a one-element FIFO. A definition of such a FIFO is given in Figure 2-4. Consequently fifo.deq | fifo.enq(x) will never cause a change in the state of a one-element FIFO. However, this issue goes away with sequential composition, as the deq call will make

[Diagram for Figure 2-3: circular pipeline with memory MEM and rules enter, recirc, and exit.]

Program IPLookup
  Module mem ...
  Module fifo ...
  Module inQ ...
  Module outQ ...

  Rule enter:
    x = inQ.first() in
        inQ.deq()
      | fifo.enq(x)
      | mem.req(addr(x))

  Rule recirc:
    x = mem.res() in
    y = fifo.first() in
      (  mem.resAccept()
       | mem.req(addr(x))
       | (fifo.deq(); fifo.enq(f2(x,y)))) when !isLeaf(x)

  Rule exit:
    x = mem.res() in
    y = fifo.first() in
      (  mem.resAccept()
       | fifo.deq()
       | outQ.enq(f1(x,y))) when isLeaf(x)

Figure 2-3: Example: A Table-Lookup Program. Other module implementations are given in Figure 2-4

Module fifo
  Register vf0 (false)
  Register f0 (0)
  ActMeth enq(x)  = (vf0 := true | f0 := x) when !vf0
  ActMeth deq()   = (vf0 := false) when vf0
  ValMeth first() = f0 when vf0

Module mem
  Register r0 (0)
  Register r1 (0)
  ...
  Register rN (0)
  Module memfifo ...
  ActMeth req(x) =   if (x = 0) memfifo.enq(r0)
                   | if (x = 1) memfifo.enq(r1)
                   | ...
                     if (x = n) memfifo.enq(rN)
  ActMeth resAccept() = memfifo.deq()
  ValMeth res()       = memfifo.first()

Figure 2-4: One-element FIFO and Naïve Memory Modules

the subsequent enq call valid to execute.

The exit rule deals with removing requests that are fulfilled by the most recent memory response. In parallel, it accepts the memory response, dequeues the residual IP in fifo, and enqueues the final result into the outQ. "x = mem.res()" and "y = fifo.first()" represent pure bindings of values being returned by the methods mem.res() and fifo.first(). The entire action is guarded by the condition that we found a leaf, isLeaf(x). In addition to this guard are the guards embedded in the method calls themselves. For instance, fifo.deq is guarded by the condition that there is an element in the queue.

Figure 2-4 includes an implementation of a one-element FIFO. Its interface has two action methods, enq and deq, and one value method, first. It has a register f0 to hold a data value and a one-bit register vf0 to hold the valid bit for f0. The encoding of all methods is self-explanatory, but it is worth pointing out that all methods have guards represented by the when clauses. The guards for first and deq signify that the FIFO is not empty, while the guard for enq signifies that the FIFO is not full.

We have not shown an implementation of the actual memory module, only a naïve implementation that operates as a flat memory space with the FIFO memfifo to hold intermediate requests. All reasoning about correctness remains the same for this design. The guard of mem.req indicates when it can accept a new request. The guards of the value method mem.res and the action method mem.resAccept would indicate when mem has a result available.
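From a software point of view, each guarded method of Figure 2-4 can be read as a readiness predicate (its when clause) paired with a body that may only run when the predicate holds. The C++ sketch below is purely illustrative (it is not the output of the compilation scheme of Chapter 7; OneElementFifo and the can* methods are hypothetical names):

    #include <stdexcept>

    // Illustrative software reading of Module fifo from Figure 2-4: each BCL method
    // becomes a guard predicate plus a body; a scheduler checks the guard before
    // invoking the body, and a false guard simply means the enclosing rule cannot fire.
    class OneElementFifo {
        bool vf0 = false;   // Register vf0 (false): valid bit
        int  f0  = 0;       // Register f0 (0): data value
    public:
        bool canEnq()   const { return !vf0; }   // guard of enq:   when !vf0
        bool canDeq()   const { return  vf0; }   // guard of deq:   when vf0
        bool canFirst() const { return  vf0; }   // guard of first: when vf0

        void enq(int x) {
            if (!canEnq()) throw std::runtime_error("enq guard failed");
            vf0 = true; f0 = x;
        }
        void deq() {
            if (!canDeq()) throw std::runtime_error("deq guard failed");
            vf0 = false;
        }
        int first() const {
            if (!canFirst()) throw std::runtime_error("first guard failed");
            return f0;
        }
    };

Under this reading, the guard of a rule such as enter is the conjunction of the guards of every method it calls, exactly as described for Figure 2-3.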

2.2 Semantics of Rule Execution in BCL

Having given some intuition about BCL in the context of our example, we now present the operational semantics of a rule execution using Structured Operational Semantics (SOS) style evaluation rules. The state S of a BCL program is the set of values in its registers. The evaluation of a rule results in a set of state updates U, where the empty set represents no updates. In case of an ill-formed rule it is possible that multiple updates to the same register are specified. This causes a double-update error, which we represent with the special update value DUE.

Our semantics (described in Figures 2-5, 2-6, and 2-7) builds the effect of a rule execution by composing the effects of its constituent actions and expressions. To do this compositionally, the evaluation of an expression that is not ready due to a failing guard must return a special not-ready result NR in lieu of its expected value. A similar remark applies to the evaluation of actions. Each action rule specifies a list of register updates given an environment ⟨S, U, B⟩ where S represents the values of all the registers before the rule execution; U is a set of register-value pairs representing the state update; and B represents the locally-bound variables in scope in the action or expression. Initially, before we execute a rule, U and B are empty and S contains the value of all registers. The NR value can be stored in a binding, but cannot be assigned to a register.

To read a rule in our semantics, the part over the bar represents the antecedent derivations, the part under the bar the conclusion, and ⇒ represents the evaluation of both actions and expressions. Thus we read the reg-update rule in Figure 2-6 as "If e returns v, a non-NR value, in some context, then the action r := e returns an update of r to the value v". The semantic machinery is incomplete in the sense that there are cases where no rule may apply, for example, if one of the arguments to the op rule is NR. Whenever such a situation occurs in the evaluation of an expression, we assume the NR value is returned. This keeps the semantics uncluttered without loss of precision. Similarly, when no rule applies for the evaluation of an action, the NR value is returned. For rule bodies, a NR value is interpreted as an empty U.

We now discuss some of the more interesting aspects of the BCL language in isolation.

2.2.1 Action Composition

The language provides two ways to compose actions together: parallel composition and sequential composition.

When two actions A1|A2 are composed in parallel they both observe the same

reg-read        ⟨S, U, B⟩ ⊢ r ⇒ (S++U)(r)

const           ⟨S, U, B⟩ ⊢ c ⇒ c

variable        ⟨S, U, B⟩ ⊢ t ⇒ B(t)

op              ⟨S, U, B⟩ ⊢ e1 ⇒ v1,  v1 ≠ NR      ⟨S, U, B⟩ ⊢ e2 ⇒ v2,  v2 ≠ NR
                ───────────────────────────────────────────────────────────────
                ⟨S, U, B⟩ ⊢ e1 op e2 ⇒ v1 op v2

tri-true        ⟨S, U, B⟩ ⊢ e1 ⇒ true      ⟨S, U, B⟩ ⊢ e2 ⇒ v
                ─────────────────────────────────────────────
                ⟨S, U, B⟩ ⊢ e1 ? e2 : e3 ⇒ v

tri-false       ⟨S, U, B⟩ ⊢ e1 ⇒ false     ⟨S, U, B⟩ ⊢ e3 ⇒ v
                ─────────────────────────────────────────────
                ⟨S, U, B⟩ ⊢ e1 ? e2 : e3 ⇒ v

e-when-true     ⟨S, U, B⟩ ⊢ e2 ⇒ true      ⟨S, U, B⟩ ⊢ e1 ⇒ v
                ─────────────────────────────────────────────
                ⟨S, U, B⟩ ⊢ e1 when e2 ⇒ v

e-when-false    ⟨S, U, B⟩ ⊢ e2 ⇒ false
                ──────────────────────────────
                ⟨S, U, B⟩ ⊢ e1 when e2 ⇒ NR

e-let-sub       ⟨S, U, B⟩ ⊢ e1 ⇒ v1      ⟨S, U, B[v1/t]⟩ ⊢ e2 ⇒ v2
                ──────────────────────────────────────────────────
                ⟨S, U, B⟩ ⊢ t = e1 in e2 ⇒ v2

e-meth-call     ⟨S, U, B⟩ ⊢ e ⇒ v,  v ≠ NR      m.f = ⟨λt.eb⟩      ⟨S, U, B[v/t]⟩ ⊢ eb ⇒ v′
                ───────────────────────────────────────────────────────────────────────────
                ⟨S, U, B⟩ ⊢ m.f(e) ⇒ v′

Figure 2-5: Operational semantics of a BCL Expressions. When no rule applies the expression evaluates to NR

38 hS, U, Bi ` e ⇒ v, v 6= NR reg-update hS, U, Bi ` r := e ⇒ {r 7→ v} hS, U, Bi ` e ⇒ true, hS, U, Bi ` a ⇒ U 0 if-true hS, U, Bi ` if e then a ⇒ U 0 hS, U, Bi ` e ⇒ false if-false hS, U, Bi ` if e then a ⇒ {} hS, U, Bi ` e ⇒ true, hS, U, Bi ` a ⇒ U 0 a-when-true hS, U, Bi ` a when e ⇒ U 0

hS, U, Bi ` a1 ⇒ U1, hS, U, Bi ` a2 ⇒ U2, U1 6= NR,U2 6= NR par hS, U, Bi ` a1 | a2 ⇒ U1 ] U2

hS, U, Bi ` a1 ⇒ DUE, seq-DUE hS, U, Bi ` a1 ; a2 ⇒ DUE

hS, U, Bi ` a1 ⇒ U1,U1 6= NR,U1 6= DUE hS,U++U1,Bi ` a2 ⇒ U2,U2 6= NR seq hS, U, Bi ` a1 ; a2 ⇒ U1++U2 hS, U, Bi ` e ⇒ v, hS, U, B[v/t]i ` a ⇒ U 0 a-let-sub hS, U, Bi ` t = e in a ⇒ U 0 hS, U, Bi ` e ⇒ v, , v 6= NR, m.g = hλt.ai, hS, U, B[v/t]i ` a ⇒ U 0 a-meth-call hS, U, Bi ` m.g(e) ⇒ U 0 hS, U, Bi ` e ⇒ false a-loop-false hS, U, Bi ` loop e a ⇒ {} hS, U, Bi ` e ⇒ true , hS, U, Bi ` a ; loop e a ⇒ U 0 a-loop-true hS, U, Bi ` loop e a ⇒ U 0 hS, U, Bi ` a ⇒ NR a-localGuard-fail hS, U, Bi ` localGuard a ⇒ {} hS, U, Bi ` a ⇒ U 0,U 0 6= NR a-localGuard-true hS, U, Bi ` localGuard a ⇒ U 0

Figure 2-6: Operational semantics of a BCL Actions. When no rule applies the action evaluates to NR. Rule bodies which evaluate to NR produce no state update.

39 Merge Functions: L1++(DUE) = DUE L1++(L2[v/t]) = (L1++L2)[v/t] L1++{} = L1 U1 ] U2 = DUE if U1 = DUE or U2 = DUE = DUE if ∃r.{r 7→ v1} ∈ U1 ∧ {r 7→ v2} ∈ U2 otherwise U1 ∪ U2 {}(x) = ⊥ S[v/t](x) = v if t = x otherwise S(x)

Figure 2-7: Helper Functions for Operational Semantics

initial state and do not observe the effects of each other’s actions. This corresponds

closely to how two concurrent updates behave. Thus, the action r1 := r2 | r2 := r1 swaps the values in registers r1 and r2. Since all rules are determinate, there is never any ambiguity due to the order in which subactions complete. Actions composed in parallel should never update the same state; our operational semantics views a double update as a dynamic error. Alternatively we could have treated the double update as a guard failure. Thus, any rule that would cause a double update would result in no state change. Sequential composition is more in line with other languages with atomic actions.

Here, A1; A2 is the execution of A1 followed by A2. A2 observes the full effect of A1 but no other action can observe A1’s updates without also observing A2’s updates. In BSV, only parallel composition is allowed because sequential composition severely complicates the hardware compilation. BSV provides several workarounds in the form of primitive state (e.g., RWires) with internal combinational paths between its meth- ods to overcome this lack of sequential composition. In BCL we include both types of action composition.

2.2.2 Conditional versus Guarded Actions

BCL has both conditional actions (ifs) as well as guarded actions (whens). These are similar as they both restrict the evaluation of an action based on some condition. The difference is their scope: conditional actions have only a local effect whereas

40 A.1 (a1 when p) | a2 ≡ (a1 | a2) when p A.2 a1 | (a2 when p) ≡ (a1 | a2) when p A.3 (a1 when p); a2 ≡ (a1 ; a2) when p A.4 if (e when p) then a ≡ (if e then a) when p A.5 if e then (a when p) ≡ (if e then a) when (p ∨ ¬e) A.6 (a when p) when q ≡ a when (p ∧ q) A.7 r := (e when p) ≡ (r := e) when p A.8 m.f(e when p) ≡ m.f(e) when p A.9 m.g(e when p) ≡ m.g(e) when p A.10 localGuard (a when p) ≡ if p then localGuard (a) p has no internal guards A.11 Rule n if p then a ≡ Rule n (a when p)

Figure 2-8: When-Related Axioms on Actions guarded actions have an effect on the entire action in which it is used. If an if’s predicate evaluates to false, then that action doesn’t happen (produces no updates). If a when’s predicate is false, the subaction (and as a result the whole atomic action) is invalid. If we view a rule as a function from the original state to the new state, then whens characterize the partial applicability of the function. One of the best ways to understand the differences between whens and ifs is to examine the axioms in Figure 2-8. Axioms A.1 and A.2 collectively say that a guard on one action in a parallel composition affects all the other actions. Axioms A.3 says similar things about guards in a sequential composition. Axioms A.4 and A.5 state that guards in conditional actions are reflected only when the condition is true, but guards in the predicate of a condition are always evaluated. Axiom A.6 deals with merging when-clauses. A.7, A.8 and A.9 state that arguments of methods are used strictly and thus the value of the arguments must always be ready in a method call. Axiom A.10 tells us that we can convert a when to an if in the context of a localGuard. Axiom A.11 states that top-level whens in a rule can be treated as an if and vice versa. In our operational semantics, we see the difference between if and when being manifested in the production of special value NR. Consider the rule e-when-false. When the guard fails, the entire expression results in NR. All rules explicitly forbid the use of the NR value. As such if an expression or action ever needs to make use of

41 NR it gets “stuck” and fails, evaluating to NR itself. When a rule body is evaluated as NR, the rule produces no state update. In BSV, the axioms of Figure 2-8 are used by the compiler to lift all whens to the top of a rule. This results in the compilation of more efficient hardware. Unfortunately with the sequential connective, this lifting cannot be done in general. We need an axiom of the following sort:

0 a1 ;(a2 when p) ≡ (a1 ; a2) when p 0 where (p is p after a1) The problem with the above axiom is that in general there is no way to evaluate p0 statically. As a result BCL must deal with whens interspersed throughout the rules.

2.2.3 Looping Constructs

The loop action operates as a “while” loop in a standard imperative language. We execute the loop body, repeated in sequence until the loop predicate returns false. We can always safely unroll a loop action according to the rule:

loop e a = if e then (a ; loop e a)

The BSV compiler allows only those loops that can be unrolled at compile time. Compilation fails if the unrolling procedure does not terminate. In BCL, loops are unrolled dynamically.

Suppose the unrolling of loop e a produces the sequence of actions a1; a2; ...; an. According to the semantics of sequential composition, this sequence of actions implies that a guard failure in any action ai causes the entire action to fail. This is the precise meaning of loop e a.

2.2.4 Local Guard

In practice, it is often useful to be able to bound the scope of guards to a fixed action, especially in the expression of schedules as discussed in Chapter 5. In this semantics,

42 localGuard a operates exactly like a in the abscence of guard failure. If a would fail, then localGuard a causes no state updates.

2.2.5 Method Calls

The semantics of method calls have a large impact on efficiency of programs and on the impact of modularization. For instance, the semantically simplest solution is for methods to have inlining semantics. This means that adding or removing a module boundary has no impact on the behavior of the program. However, this notably increases the cost of hardware implementation as a combinational path from the calling rule’s logic to the method’s implementation and back before execution can happen. It is extremely complicated to implement this semantic choice in the context of a limited number of methods, where different rules must share the same implementation. As such, we may want to increase the strictness of guards to ease the implementa- tion burden. This will necessarily make the addition/removal of a modular boundary change the behavior; this is not an unreasonable decision as modules represent some notion of resources, and changing resources should change the behavior. Our choice of restrictions will have to balance the efficiency and implementation concerns against the semantic cleanliness. There are two restrictions we can make:

1. The guards of method arguments are applied strictly. To understand this, consider the following action: Action 1: m.g(e when False) where m.g is λ x.if p then r := (x) and the result of inlining the definition of m.g:

Action 2: if p then r := (x when False)

Under this restriction Action 1 is never valid. However, Action 2 is valid if p is false. This restriction corresponds to a call-by-value execution. Such semantic

43 changes due to language-level abstractions (e.g., functions) are very common in software language (e.g., C, Java, and SML).

2. The argument of a method does not affect the guard. Under this semantic interpretation guards can be evaluated regardless of which particular rule is calling them drastically simplifying the implementation. This restriction is used by the BSV compiler for just this reason. However, this causes unintuitive restrictions. Consider the method: ActMeth g = λ (p,x).if p then fifo1.enq(x) | if !p then fifo2.enq(x)) This method is a switch which enqueues requests into the appropriate FIFO. The enqueue method for each FIFO is guarded by the fact that there is space to place the new item. Given this restriction that we cannot use any of the input for the guard value, this method’s guard is the intersection of both the guards of fifo1 and fifo2 which means that if either FIFO is full this method cannot be called, even if we wish to enqueue into the other FIFO. This is likely not what the designer is expecting.

We choose to keep the first restriction and not the second restriction in BCL, as the second restriction’s unintuitiveness is highly exacerbated in the context of sequential composition and loops. In contrast, most arguments are used unconditionally in methods and as such the first restriction has small practical downside while still giving notable implementation simplification.

2.2.6 Notation of Rule Execution

R 0 Let RP = {R1,R2, ..., RN } be the set of rules of Program P . We write s −−→ s , where R ∈ R, to denote that application of rule R transforms the state s to s0. An execution σ of a program is a sequence of such rule applications and is written as:

Ri1 Ri2 Rin s0 → s1 → s2... → sn

44 σ 0 or simply as: s  s . A program is characterized by the set of its executions from a valid initial state.

2.3 A BCL Program as a State Transition System

So far we have described how a rule transforms the state of a BCL program. To give complete operational semantics, we also need to specify how a rule is chosen at each step of the execution. Since the choice is nondeterministic, the same BCL program can have more than one outcome or behavior. In order to characterize the range of permitted behaviors, we associate a state transition system with each program:

Definition 1 (State Transition System of Program P ). Each BCL program P is modeled by a state transition system given by a triple of the form:

(S,S0, −→)

where S is the set of states associated with program P ; S0 ⊆ S is the set of states

corresponding to initial configurations of P ; and −→S ⊆ S × S is the transition 0 relation, defined such that (s, s ) ∈ −→S if and only if there exists some rule R in 0 P whose execution takes the state s to s . In addition, we write  to denote the reflexive transitive closure of −→. 

Notice that state transition systems are closed; there is no notion of an outside world. We show later that the BCL provides enough flexibility to model any reason- able context in which we would want to place an “open” BCL program. We discuss these notions in the context of closed systems. The question naturally arises as to what aspects of a closed BCL program is observable. Abstractly, we can think of program P as a box with two buttons: “Start” and “Display”. Pressing the “Start” button initializes the internal state and starts the execution. Once started, we can press “Display” at which point the machine pauses execution, and displays the current program state. Between subsequent presses of

45 “Display” any non-zero number of rule applications can occur. This leads to the following definition of observation:

Definition 2 (Observation of a Program P ). An observation of Program P modeled by (S,S0, −→) is a sequence of states σ = s0, s1, s2, ..., sN such that so ∈ S0 and for

all 0 ≤ i < N, si  si+1. 

Note that we observe the effect of a rule execution only indirectly by observing the current state. As such, we cannot tell when an executed rule produces no state update (a degenerate execution) or how many rules have fired between two observations of the state. This means that it is perfectly acceptable for us to “skip” states in the one-rule-at-a-time understanding of an execution. From a practical point of view sometimes it is not worth distinguishing between

some observations of a program. Consider the following program P1 with two registers x and y initialized to 0:

Program P1

Register x (x0) Register y (y0)

Rule incX: x := x + 1 Rule incY: y := y + 1 In this program there exists an execution corresponding to executing incX once which takes the state from (x, y) = (0, 0) to (1, 0). Similarly there is an execution corre- sponding to one execution of incY which takes the state from (0, 0) to (0, 1). Clearly these executions are different. However, we can bring them both to (1, 1) by executing the appropriate rule. Whether we select one or the other rule does not matter from the point of view of observations.

In contrast, consider another program P2 with the same registers x and y when x0 6= y0:

46 Program P2

Register x (x0) Register y (y0)

Rule copyX2Y: y := x Rule copyY2X: x := y

The first execution of one of these rules causes a transition to either state (x0, x0) or (y0, y0). Once such a transition has been made, the system can never leave that state. An implementation produces either (x0, x0), or (y0, y0), but not both. We characterize these different “behaviors” using the notion of joinability or confluence.

Definition 3 (A Behavior of Program P ). A behavior B, written as B of program

P modeled by S = (S,S0, −→S ) is a maximal subset of S such that each pair of

executions in B are joinable. That is, ∀s0, s1, s2 ∈ S.(s0 B s1) ∧ (s0 B s2) =⇒

∃s3 ∈ S.(s1 B s3) ∧ (s2 B s3). BP is the set of behaviors of a program P . 

By this definition P1 has one behavior corresponding the following set of executions: {(0, 0) −→ (0, 1) −→ (1, 1), (0, 0) −→ (1, 0) −→ (1, 1), ...}

and P2 has two behaviors corresponding to the following executions:

{(x0, y0) −→ (x0, x0), ...}

{(x0, y0) −→ (y0, y0), ...}

Definition 4 (Live Behavior). B is a live behavior if s0 B s1 implies there exists

s2 6= s1 such that s1 B s2. 

Definition 5 (Terminal State). A terminal state of transition relation  is a state 0 0 s such that s  s =⇒ s = s . 

Definition 6 (Terminating Behavior). B is a terminating behavior if s0 B s1

then there exists s2 such that s1  s2 and s2 is a terminal state of B.  Lemma 1. A behavior is either live or terminating, but not both.

The behavior of program P1 is live while both the behaviors of program P2 are terminating. A program with multiple behaviors may have some of the behaviors which are live and some which are terminating.

47 Definition 7 (Deterministic Program). P is deterministic if it has exactly one behavior. Otherwise P is nondeterministic. 

According to this definition program P1 is deterministic while program P2 is non- deterministic. Nondeterminism is not a sign that a program is incorrect or buggy. In fact, nondeterminism is essential if one wants to represent higher-level requirements of a program. For instance, consider the description of a speculative microprocessor. The choice of whether to speculatively fetch a new instruction or resolve the execution of an outstanding branch instruction is an integral part of the processor specification. One would like to be able to express the idea that the correctness of description does not rely on how many instructions execute on the wrong path as long as the processor resumes executing instructions on the correct path in some finite number of steps. However, because designers do not see the exhaustive set of executions of a BCL program it is possible that a latent bug will not be found. For instance, in implemen- tations of our speculative processor we may never speculatively fetch an instruction; execution of this implementation gives us no guarantee that the speculation we spec- ified in the BCL program was done correctly. We discuss a more complete guarantee of correctness in Chapter 8.

2.4 Program Equivalence and Ordering

It is important to know if any two programs are equivalent. This concept is necessary both in terms of doing refinements and implementation. In the context of nondeter- minism, it is possible for a program P to be “smaller” than another program P 0 in that it exhibits fewer transitions. We define these concepts using the natural partial ordering induced by the relation .

Definition 8 (Program Ordering). Consider Program P modeled by (S,S0, −→S ) 0 0 0 0 0 and Program P modeled by (S ,S0, −→S0 ). P is less than P , i.e., P v P , if and 0 0 only if (S = S ) ∧ (S0 = S0) ∧ (S ⊆S0 ) 

Definition 9 (Program Equivalence). Program P and P 0 are equivalent when (P v P 0) ∧

48 0 (P v P ). 

It is common to want to restrict the executions allowed in a program. This may be a refinement to remove undesirable behavior or to improve efficiency in implementation.

Definition 10 (Behavioral Restriction). Program P , modeled by S = (S,S0, −→S ) is 0 0 0 a behavioral restriction of P , modeled by S = (S,S0, −→S ) if and only if: BP ⊆ BP 0 . 

To understand this consider the following program P3 in relation to P2.

Program P3

Register x (x0) Register y (y0)

Rule copyX2Y: y := x

In this program we have only one behavior which moves from (x0, y0) to (x0, x0). This

new program has only one of the behaviors of P2. When implementing a program P , it is common for the desired final result P 0 to approximate the program in that some transitions may not be implemented, i.e., P 0 v P . This may remove some behaviors entirely, which is reasonable as we rarely want all behaviors in practice. However, for an approximation to be complete, it should not get stuck unexpectedly; that is we do not have unexpected terminal states.

Definition 11 (Complete Approximation of Program P ). Program P 0 is a complete approximation P if and only if: P 0 v P , and for all states s, if s is a terminal state

0 of P it is also a terminal state of P . 

To understand this definition concretely, consider the following complete approxima-

tion of P1, which does two rule executions at a time:

49 Program P4

Register x (x0) Register y (y0)

Rule incXY: y := y + 1 | x := x + 1

While P1 has many executions, allowing us to increment x and y each by one repeated in any way, allowing us to reach any natural number value for x and y, this program has only one of those executions: {(0, 0) −→ (1, 1) −→ (2, 2) −→ (3, 3) −→ ...} Essentially, if P 0 is a complete approximation of P that P 0 cannot stop (i.e., reach a terminal state) before P does.

2.4.1 Derived Rules

Various interesting equivalence-preserving transformations of a BCL program involve the modification of rules. The following definitions are motivated by these program transformations. We use these definitions in our scheduling discussion in Chapter 5.

Definition 12 (Null Rule). A rule R is said to be a null rule if it never changes the R state of a program, i.e., ∀s ∈ S. (s −→ s). 

Definition 13 (Restricted Rule). A rule R0 is a restriction of rule R if R0 can 0 0 0 R 0 R 0 change state only when R can, i.e., ∀s, s ∈ S. s 6= s ∧ (s → s ) =⇒ (s → s ). 

It is useful to know whether two rules are mutually exclusive, that is, there is no state from which both rules produce nondegenerate executions.

Definition 14 (Mutually Exclusive Rules). Two rules R1 and R2 are mutually

R1 R2 exclusive when ∀s.(s → s) ∨ (s → s). 

It is sometimes useful to partition a rule R into multiple rules that can directly mimic a single execution of R.

50 Definition 15 (Partitioned Rules). Rules R1 and R2 are said to be a partition of rule R if they are mutually exclusive and ∀s ∈ S.(s →R s0) =⇒ ((s →R1 s0) ∨ (s →R2 s0)).



A compound rule of R1 and R2 is a rule R12 that behaves as if R1 is applied and then

R2 is applied. Choosing to execute R12 is akin to making the decision to execute R1 and then R2 as a single step.

Definition 16 (Compound Rule). Rule R12 is said to be a compound rule of rules

R1 and R2 if:

0 R12 0 R1 R2 0 ∀si, s ∈ S.((s −−→ s ) ⇐⇒ ∃st.(s −→ st −→ s )) 

Definition 17 (Derived Rule of P ). A rule R is said to be a derived rule of program P if for every state s, if s −−→R s0, then the same transition can be made by a sequence

0 of rule executions in RP − {R} (i.e., s  s ). 

Intuitively, a derived rule of a program P is a rule that always can be emulated with the rules already in P . Although they may change our model, they do not change the executions that characterize our model.

Lemma 2. A null rule is a derived rule of program P

Lemma 3. A restriction of rule R ∈ RP is a derived rule of program P

Lemma 4. Each partitioned rule R1,R2 of rule R ∈ RP is a derived rule of program P

Lemma 5. A sequenced rule of two rules R1,R2 ∈ RP is a derived rule of program P

Theorem 1 (Equality of Programs under Derived Rules). Adding a derived rule Rd to program P1 results in an equivalent program P2.

Corollary: Removing a rule Rd of Program P2 which is a derived rule of P1, the program after removing Rd is an equivalent program.

51 52 Chapter 3

A Rule-level Interpreter of BCL

The operational semantics in Chapter 2 specifies how a rule in a BCL program trans- forms the state of the system described by that program. The semantics includes the possibility that the guard of a rule may evaluate to false, and thus the application of the rule may not change the state of the system. In this chapter, for each rule R we

describe a method to compute the function fR such that:

fR : State → (Boolean, State)

0 where fR(s) produces either (True, s ) consistent with the operational semantics pre- 0 sented in Chapter 2, i.e., s  s , or (False, “don’t care”) in case the operational semantics dictates that rule R is not applicable in state s, i.e., s −→ s.

The set of fR functions of a program is useful in understanding both the hardware and software compilation of BCL. It also serves as a starting point for the verification of BCL programs. We use normal-order typed λ-calculus with let blocks to describe these functions.

3.1 Functionalization of BCL: The GCD Example

Before we discuss the translation of rules into λ-expressions, let us consider the GCD program shown in Figure 3-1.

53 Program GCD Rule start: gcd.req(0x17, 0x31)

Rule subtract: gcd.subtract()

Rule swap: gcd.swap()

Module gcd Register x (0) Register y (0)

ActMeth req = λ(a,b). (x := a | y := b) when (x == 0 && y == 0)

ValMeth resp = λ(). (x) when (x != 0 && y == 0)

ActMeth getResp() = λ(). (x := 0) when (x != 0 && y == 0)

ActMeth subtract = λ(). (x := x-y) when (x >= y) && (y != 0)

ActMeth swap = λ(). (x := y | y := x))) when (x < y)

Figure 3-1: GCD Program in BCL

54 The program uses Euclid’s algorithm to compute GCD. It repeatedly applies meth- ods subtract and swap to the values in registers x and y until the register y holds the value 0 and register x holds the answer. The computation is started by invoking the method req, which sets x and y to the initial values a and b respectively. A computation can be started only when both x and y are zero. The resp method lets us read the answer when we have finished computing it (i.e., when y is 0). The action method getResp resets the GCD module for a new request once the answer has been computed by setting x to 0. The state of this program can be represented in Haskell-like syntax as the following data structure:

data State = State{ x :: Bit 32, y :: Bit 32 } We can create a new state, where the register x holds the value 0x17 and register y holds the value 0x31 by writing State{x=0x17,y=0x31}. To extract a field we use the familiar dot notation, where s.x refers to the x field of s. Often we wish to construct a new state where most of the field values are the same as another state value. In these cases it is convenient to express the new state as an update of the value. For instance, given a state value s with fields x and y, the new state value s{x=v} has an x field with value v and the y field with s’s y field value, s.y, i.e., State{x=v,y=s.y}. Functions corresponding to the three rules in our GCD example may be repre- sented as follows:

rule start = λs.((s.x = 0) ∧ (s.y = 0), State{x=0x17, y=0x31}) rule subtract = λs.((s.x ≥ s.y) ∧ (s.y 6= 0), s{x= s.x - s.y }) rule swap = λs.((s.x < s.y), State{x=s.y, y=s.x})

Each rule is represented by a function from the state of the system to a tuple con- taining a boolean guard predicate and a new state. For example, the swap rule is

55 valid to execute when the value of the register x is less than the value of the register y in the state that it is passed in as an argument and the new state has the value of x as the old value of y and the value of y is the old value of x. We revisit this example again after explaining how these functions are generated from the original BCL program. As we explain below, we choose a slightly more complicated representation of the state to keep the translation scheme simple.

3.2 Translation of BCL to λ-Calculus

We define the construction of the λ expressions corresponding to our rules in a syntax- directed and compositional way. The main challenge in this arises when we consider the parallel composition of actions. Consider the translation of actions x := y and y := x that turn into functions f1 = λs.(True, s{x=s.y}) and f2 = λs.(True, s{y=s.x}) respectively. We can model the sequential composition by composing the two functions as:

λs0.let (g1, s1) = f1 s0 in let (g2, s2) = f2 s1 in (g1 ∧ g2, s2) In contrast, for parallel composition as we saw in the operational semantics, we have to take the least upper bound of state changes dictated by f1 and f2. One way to model this is by first attaching a boolean (bit value) to each register in our state to indicate whether it has been modified or not, and then building the new state based on which elements have been modified in each function. In case both functions have modified the same state, we need to generate a double-update error. In this regard our state becomes a bounded representation of the update list U value in our operational semantics. It is worth noting that we cannot remove the boolean guard value, as we must distinguish between NR and the empty update {}. Thus we change the default representation of state so that registers are modeled, not just as a value, but as a structure with a modified bit and a value. Thus the state of our GCD example is now:

56 data State = State{ x :: Reg (Bit 32) y :: Reg (Bit 32) } data Reg n = Reg { modified :: Bool value :: n } With this change the three functions from the GCD now look like the following:

rule start=λs.(s.x.value = 0 ∧ s.y.value = 0, State{x=Reg{modified=True, value=0x17}, y=Reg{modified=True, value=0x31}}) rule subtract=λs.(s.x.value ≥ s.y.value ∧ s.y.value 6= 0, s{x=Reg{modified=True, value=s.x.value-s.y.value}) rule swap=λs.(s.x.value < s.y.value, State{x=Reg{modified=True, value=s.y.value}, y=Reg{modified=True, value=s.x.value})

Now our translated rules also mark the bits of the registers that they modify. Notice that because these bits are initially false, we update them to true only; all registers that are not updated are unmentioned, e.g., the register y in rule subtract. We could have added a wrapper function to add modified bits initialized as false at the beginning of a rule and one to the modified bits from the final state. However, this has minor benefit and would clutter the description; as such, we do not clutter our description with this. With the change of state representation the corresponding representations for x := y and y := x respectively are:

f1 = λs.(True, s{x=Reg{modified=True, value=s.y}}) f2 = λs.(True, s{y=Reg{modified=True, value=s.x}})

Now we can implement the parallel composition by running the two actions on the initial state to generate two different copies of the state, and merge the resulting states together looking at the modified bits to determine which parts of the state

57 to select. To keep the invariant that the modified bits correspond to whether we modified the register or not, we need to first reset all the modified bits in our initial state before running the two parallel actions, and then after merging the two resulting states in parallel, merge back the original modified bits to the output state. To finish the translation we need to reintroduce all the missing modified bits from the initial state. The parallel swap action x := y | y := x is represented in our translation as:

λs.let ns = newState s in let (g1, s1) = f1 ns in let (g2, s2) = f2 ns in let (g3, s12) = parMerge s1 s2 in ((g1 ∧ g2 ∧ g3), (seqMerge s s12)) where newState clears the modified bits, parMerge selects the appropriate parts of the two states to keep for the parallel composition , and seqMerge sequentially copies the new updates back on the old state reintroducing the correct modified bits. Notice that parMerge returns a boolean guard, which represents that the parallel merge did not cause a double-update error. In this translation we interpret such errors are guard failures. A full translation of BCL to the typed is given in Figures 3-2, 3-3, and 3-4. The only complication in understanding this translation is distinguishing the meta-syntax to construct the λ expressions (the italic font) from the concrete syntax of the λ expressions being constructed (the fixed-width font). It’s also helpful to understand the signature of the translation procedure for each major syntactic category. We translate actions to functions taking the initial state and returning a tuple of a valid predicate and a new state. Similarly we translate expressions to functions taking the state and returning a tuple of a valid predicate and the translated expression. Methods are translated into functions that take the argument and return the appropriate translated action or expression; as these are functions themselves, translated methods can be thought of as taking two arguments. The translation of a program results in a definition for each method and rule in the system as well as the three merging functions newState, parMerge, and seqMerge. Notice that each of these functions calculate the value of the part of state corresponding to a register

58 TA :: BCL-Action → (Translated-State → (Bool, Translated-State)) TA r := e0 = λs.(J g, s{rK = Reg{modified:True, value:e1}}) where (g, e1) = (TE e0 ) s J K TA if (p0) a0 = 0 λs.(J g1 ∧ (¬p1 K∨ g2), if p1 then s else s) where (g1, p1) = (TE p0 ) s 0 (g2, s ) = (TA J a0 K) s TA a0 when e0 = J K 0 λs.(J g1 ∧ g2 ∧ e1,K s ) 0 where (g1, s ) = (TA a0 ) s (g2, e1) = (TE J e0 K) s TA a1 | a2 = J K λs.letJ nsK = newState s in let (g, s’)=parMerge s1 s2 in (g1 ∧ g2 ∧ g, seqMerge s s’) where (g1, s1) = (TA a1 )(ns) (g2, s2) = (TA J a2 K)(ns) TA a1 ; a2 = J K 00 λs.(J g1 ∧ g2,K s ) 0 where (g1, s ) = (TA a1 )(newState s) 00 0 (g2, s ) = (TA J a2 K)(newState s ) TA t = e0 in a0 = J K 0 λs.letJ tg = g1 inK (let tb = e2 in (g2, s )) where (g1, e1) = (TE e0 ) s 0 (g2, s ) = (TA J a0[(Ktb when tg)/t] ) s tg, tb are fresh namesJ K TA loop e0 a0 = loopUntilJ (TEK e0 )(TA a0 ) where loopUntilJ =K λfe.λfa.J λs0.K let (g1, e1) = fe s0 in let (g2, s1) = fa s0 in let (g3, s2) = loopUntil fe fa s1 in (g1 ∧ (¬ e1 ∨ (g2 ∧ g3)), if e1 then s2 else s0) TA localGuard a0 = λs.(True,J if g thenK s0 else s) 0 where (g, s ) = (TA a0 ) s J K TA m.g(e0) = λs.letJ (g,K s’) = meth g e1 s in (g0 ∧ g, s’) where (g0, e1) = (TE e0 ) s J K Figure 3-2: Functionalization of Actions. Fixed-width text is concrete syntax of the λ-calculus expression

59 TE :: BCL-Expr → (Translated-State → (Bool, Translated-Expr)) TE r =λs. (True, s.r.value) TE J c K =λs. (True, c) TE J t K =λs. (True, t) TE J e1Kop e2 = λs.(J g1 ∧ g2, Kfe1 op fe2) where (g1, fe1) = (TE e1 ) s (g2, fe2) = (TE J e2 K) s TE eb ? et : ef = J K λs.(J gb∧ (if feKb then gt else gf ), if feb then fet else fef ) where (gb, feb) = (TE eb ) s (gt, fet) = (TE J et K) s (gf , fef ) = (TE J ef K ) s TE e when eg = J K λs.(J gg ∧ feg ∧ gKe, fe) where (ge, fe) = (TE e ) s (gg, feg) = (TE J egK ) s TE t = e1 in e2 = J K λs.(letJ tg = g1K in (let tb = fe1 in (g2, fe2))) where (g1, fe1) = (TE e1 ) s (g2, fe2) = (TE J e2[(Ktb when tg)/t] ) s tg, tb are fresh namesJ K TE m.f(e0) = λs.letJ (g,e)K = meth f fe0 s in (g0 ∧ g, e) where (g0, fe0) = (TE e0 ) s J K Figure 3-3: Functionalization of BCL Expressions. Fixed-width text is concrete syntax of the λ-calculus expression

60 TP :: BCL-Program → (λ-calculus declarations) TP (Main ms rules) = (map TM ms) (map TR rules) seqMerge = λs0.λs1. (foreach Reg r. (λs.x{r=if s1.r.modified then s1.r else s0.r})) (True, s0) parMerge = λs0.λs1. (foreach Reg r. (λ(g, s).(g ∧ !(s0.r.modified ∧ s1.r.modified), s{r=if s0.r.modified then s0.r else s1.r})) (True, s0) newState =λs0.(foreach Reg r. (λs.s{r = s.r{modified = False}})) s0 TM :: BCL-Module → (λ-calculus declarations) TM Module mn ms ameths vmeths = mapJ TM ms K map TVM vmeths map TAM ameths TR :: BCL-Rule → (λ-calculus declarations) TR Rule rn a = ruleJ rn = λs0.K TA a s0 TVM :: BCL-Value-MethodJ K → (λ-calculus declarations) TVM VMeth f λx. e = methJ f = λx.(λs.(TEK e ) s) TAM :: BCL-Action-MethodJ →K (λ-calculus declarations) TAM AMeth g λx. a = methJ g = λx.(λs.(TAK a s)) J K Figure 3-4: Conversion of Top-level BCL Design to top-level definitions. Fixed-width text is concrete syntax of the λ-calculus expression.

61 r using the corresponding parts of the input states. This means that each merge computation is completely parallel. This is important when we discussion generating hardware in Chapter 4.

3.3 The GCD Example Revisited

The full translation of the GCD example after some α-renaming and β-reductions can be found in Figure 3-5. To understand this translation and its relation to the definitions we had earlier, consider the swap rule (rule swap). First we do an η- reduction to get meth swap (). After inlining the definition of meth swap and doing a β-reduction we get the following expression:

λs0.let ns = newState s0 (g, s1) = parMerge(ns{x=Reg{modified=True, value=ns.y.value}}) ns{y=Reg{modified=True, value=ns.x.value}}) in (s.x.value < s.x.value ∧ g, seqMerge s0 s1)

Further inlining and constant propagation results in the following expression:

λs.(s.x.value < s.y.value, State{x=Reg{modified=True, value=s.y.value}, y=Reg{modified=True, value=s.x.value}) which is the same as what we showed at the beginning of the previous section.

62 seqMerge = λs0.λs1.s0{x=if s1.x.modified then s1.x else s0.x, y=if s1.y.modified then s1.y else s0.y} parMerge = λs0.λs1.(¬(s0.x.modified ∧ s1.x.modified) ∧ ¬(s0.y.modified ∧ s1.y.modified), s0{x=if s1.x.modified then s1.x else s0.x, y=if s1.y.modified then s1.y else s0.y}) newState = λs0.s0{x=s0.x{modified=False}, y=s0.y{modified=False}} meth resp = λe0.λs0.(s0.x.value 6= 0 ∧ s0.y.value=0, s0.x.value) meth getResp = λe0.λs0.(s0.x.value 6= 0 ∧ s0.y.value=0, s0{x=Reg{modified=True,value=0}}) meth req = λ(a,b)λs0.let ns = newState s0 (g, s1) = parMerge(ns{x=Reg{modified=True, value=a}}) ns{y=Reg{modified=True, value=b}}) rule start = λs0.meth req (0x17, 0x31) s0 meth subtract = λ().λs0.(s0.x.value ≥ s0.y.value ∧ s0.y.value 6= 0, s0{x=Reg{modified=True,value=s0.x.value − s0.y.value}}) rule subtract = λs.meth subtract () s meth swap = λ(a,b).λs0.let ns = newState s0 (g, s1) = parMerge(ns{x=Reg{modified=True, value=ns.y.value}}) ns{y=Reg{modified=True, value=ns.x.value}}) in (s.x.value < s.x.value ∧ g, seqMerge s0 s1) rule swap = λs.meth swap () s

Figure 3-5: Functionalized form of GCD program

63 64 Chapter 4

Hardware Synthesis

There are many issues in generating a hardware implementation of a BCL program. Most obvious of these issues is how exactly does one implements a rule. Atomicity requires that we construct some notion of shadow state so that we can unwind a computation if we reach a guard failure in the evaluation. These shadows can require a substantial hardware cost if they are implemented via stateful constructs. How- ever, if we can store them ephemerally in wires, they become cheap. In synchronous hardware, we can guarantee that all shadows are implementable in wires if each rule is totally executed in a single clock cycle. This is one of the primary optimizations applied by previous schemes [43, 51, 52] to compile guarded atomic actions into syn- chronous circuits. It may not be possible to implement loops in a single cycle as they may require an unbounded amount of work and therefore an unbounded amount of hardware to implement in a single cycle. This is a major reason why dynamic loops are not directly supported by Bluespec SystemVerilog; they must be statically removed by unrolling. Practically, this approximation is not an issue as almost all reasonable loops in a BCL rule have a statically known upper bound. As such we assume the following transformation has already been applied for all discussions in this chapter. The other significant challenge in hardware generation is how to deal with the choice of executing multiple rules in a single cycle. This issue becomes more compli- cated if we wish to exploit hardware’s inherent parallelism in executing multiple rules

65 concurrently in a single cycle. As will become clearer in Chapter 5, in the context of synchronous hardware we can reduce the construction of a hardware implementation to the implementation of a single rule as an FSM that executes that rule once a cycle. This drastically simplifies the discussion of hardware generation. The remainder of this chapter deals with the synthesis of program with a single rule in synchronous hardware.

4.1 Implementing a Single Rule

Given a program P with a single rule R, there is a straightforward way of imple- menting the program as an FSM. Each cycle, the state of the system is read from persistent memory (e.g., flip flops), the next state of the system is calculated, and the resulting state is stored back into persistent memory.

We do this by constructing a direct implementation of fR, the functionalization of rule R as defined in Chapter 3 as a circuit. To understand this transliteration, first notice that because all non-function expressions have a bounded number of values, it can be represented with a fixed-sized bit-vector. Each basic type (e.g., int or boolean) has a natural representation as fixed-size bit-vectors; more complex types can be represented as the concatenation of the bit-vector of its components. Thus the tuple type (Bool, Bool) would be a 2-bit value with the first bit corresponding to the first boolean, and the second bit to the second. Each typed λ-expression in our translation of lambda term translates to a circuit with a set of input and output wires. To keep from having to deal with functions (and thus circuits) being arguments to other circuits, we restrict our translation to the first-order λ-expression. Our functionalization never abstracts functions as variables save in the case of methods. We assume these definitions are inlined. Our translation is effectively identical to the translation between functional expressions and circuits in Lava [84]. Intuitively each primitive function is represented using a gate-level circuit. For instance a + operator adding to 32-bit integers would correspond to a 32-bit ripple-

66 carry adder circuit. An if expression would translate to a multiplexer circuit of the appropriate size with the boolean condition being the first input, the true clause the second input, and the false clause as the third input. Constant values are just a set of wires tied to power or ground to represent the correct value. Field selection can be implemented using wires only; it takes as input a set of wires corresponding to the structure data and outputs only the wires corresponding the correct field. Similarly, value updates can be done with wires by constructing a new bundle of wires where the new field value are used in lieu of the wires from the old field value. Thus the operation of incrementing a register x, i.e., x := x + 1, results in a circuit that takes the state of the system, separates out the wires associated with the value of x, sets that as input to an incrementer circuit, and creates a new bundle of wires representing state where x’s value is the new incremented value and the wire associated with x’s modified bit is replaced with a wire tied to true. In Verilog this is the following RTL module:

module incXCircuit(inputState, outputState);

// modified bit of x is at bitlocation [hiX] // value of x is at bitlocations [hiX-1:loX] input [stateWidth-1:0] inputstate;

output [stateWidth-1:0] outputstate;

wire [xRegisterWidth-1:0] oldxvalue = inputstate[hiX]; // wire [xRegisterWidth-1:0] oldxvalue = inputstate[hiX-1:loX]; wire [xRegisterWidth-1:0] newxvalue = oldxvalue + 1; outputstate = {inputState[stateWidth-1:hiX+1], true,newxvalue, inputState[loX:0]}; endmodule Lambda abstraction adds a new first input to the circuit and connects each in- stance of the variable in the circuit to that input. Its dual operation, application, connects the first input to the output of the operand function (which must be a circuit with no inputs). Let bindings are just like redexs and correspond to wire

67 λs.if s.x.value > 1 then (λs.(s{x=Reg{modified = True, value = s.x.value + 1}}) s) else (λs.(s{x=Reg{modified = True, value = s.x.value − 1}}) s)

Extract X + Upd X 1

Extract X - Upd X 1

Extract X >

1

Figure 4-1: Initial Example translation of λ-expressions to Circuits. White boxes can be implemented solely with wires, gray boxes need gates, dotted boxes correspond to λ abstractions. diagrams; the output of the bound variable is connected to every wire input use of the expression. Notice that β-reduction does not change the circuit that we describe, only removes a circuit boundary. By not doing β-reductions, it is easier to isolate circuitry to the part of the BCL program associated with it; this makes carrying over a notion of modularity from BCL to the FSM circuit. Also, note that constant propagation on the λ-expression can be understood as circuit-level simplification. To get a feel for this, consider the λ-expression in Figure 4- 1 and subsequent simplifications of the expression in Figure 4-2 and Figure 4-3.

4.1.1 Implementing State Merging Functions

It is worthwhile to consider what hardware is generated by the state-merging functions parMerge, and seqMerge. While these functions have many combinators internally, being described as a sequence of composed functions, they generate parallel sets

68 λs.if s.x.value > 1 then s{x=Reg{modified = True, value = s.x.value + 1}} else s{x=Reg{modified = True, value = s.x.value - 1}}

Extract X + Upd X 1

Extract X - Upd X 1

Extract X >

1

Figure 4-2: Result of β-reduction on expression in Figure 4-1. Notice how the funda- mental circuit structures does not change

λs.s{s=Reg{modified = True, value = if s.x.value > 1 then s.x.value + 1 else s.x.value - 1}}

+ Upd X Extract X 1 -

1 >

1

Figure 4-3: Result of Distribution and Constant Propagation on expression in Fig- ure 4-2. Sharing the white box structures (wire structures) do not change the

69 of multiplexers to merge state on a per-register level only. Further the selection line on these multiplexers are known statically, and as such will be removed either by simple constant propagation in our translation or base-level circuit simplification in the implementing RTL synthesis tool. To see an example of this, consider the parMerge generated from the functionalization of action:

x := y | y := x

Assuming only the only state is x and y, this action translates to the following λ- expression which after some beta-reduction to make the term more understandable is:

λs.let ns = State{x=Reg{modified = False, value = s.x.value}, y=Reg{modified = False, value = s.y.value}} in let (g1, s1) = (True, ns{x=Reg{modified = True, value = ns.y.value) in let (g2, s2) = (True, ns{y=Reg{modified = True, value = ns.x.value) in let (g3, s12) = (True ∧ !(s1.x.modified ∧ s2.x.modified) ∧ !(s1.y.modified ∧ s2.y.modified), s1{x=if s1.x.modified then s1.x else s2.x, y=if s1.y.modified then s1.y else s2.y}) in (g1 ∧ g2 ∧ g3, (seqMerge s s12)) Here we statically know s1.x.modified and s2.y.modified are true and s1.y.modified and s2.x.modified are false. As such the multiplexers implied by the if statements can be statically removed and we are left with:

λs.let ns = State{x=Reg{modified = False, value = s.x.value}, y=Reg{modified = False, value = s.y.value}} in let (g1, s1) = (True, ns{x=Reg{modified = True, value = ns.y.value}) in let (g2, s2) = (True, ns{y=Reg{modified = True, value = ns.x.value}) in let (g3, s12) = (True, s1{x = s1.x, y = s2.y}) in (g1 ∧ g2 ∧ g3, (seqMerge s s12)) If we continue the simplification, we end up with:

70 λs.let ns = State{x=Reg{modified = False, value = s.x.value}, y=Reg{modified = False, value = s.y.value}} in (True,seqMerge s s{x=Reg{modified = True, value = ns.y.value}, y=Reg{modified = True, value = ns.x.value}})

At this point all the circuitry associated with parMerge has been removed. This is also true of the instance seqMerge that we had left symbolic. In general we only generate additional circuitry for merges when we have two writes of the same register where the selection of which one dominates (is in the final result) is dynamically dependent on the state of the rule.

4.1.2 Constructing the FSM

Given the circuit representation of the rule, we only need to connect this to persistent state to finish our FSM. To do so, we construct a hardware register for each register in our program. The input of our circuit representing the rule R consists of a modified bit and a value for each BCL register. We connect the read output of the register to the associated value input and tie the modified bit inputs to the constant false value. As output of our combinational we have modified bits and the new values for each register as well as a boolean guard bit. We connect the new values to the associated next state input for each register and the result of and-ing the guard into and the modified bit to the enable line of the register. This causes us to update only those registers that changed. To understand this more concretely consider the program in Figure 4-4. After functionalization, some β-reduction, and some constant propagation we are left with the definitions in Figure 4-5. By inlining the top-level definitions doing further sim- plification we get the final λ expression in Figure 4-6. Translating this to a circuit and adding the appropriate state we get Verilog module in Figure 4-7.

71 Program P

Rule top: v = r1 + 1 in if (r1 < 3) (r1 := v | m.add(r1))

Register r1 (0)

Module m Register r2 (0)

ActMeth add(x) = r2 := r2 + x

Figure 4-4: Example Single-Rule Program for Hardware Synthesis

parMerge = λs0.λs1. ((!s0.r1.modified ∧ !s1.r1.modified) ∧ (!s0.r2.modified ∧ !s1.r2.modified), State{r1 = if s1.r1.modified then s1.r1 else s0.r1, r2 = if s1.r2.modified then s1.r2 else s0.r2}) seqMerge = λs0.λs1. State{r1 = if s1.r1.modified then s1.r1 else s0.r1, r2 = if s1.r2.modified then s1.r2 else s0.r2}) newState = λs.State{r1=s.r1{modified = False}, r2=s.r2{modified = False}} meth add = λx.λs.(s{r2 = Reg{modified = True, value = s.r2.value + x}) fR = λs0.let vb = s0.r1.value + 1 vg = true ns = newState s0 (g,s1) = parMerge (ns{r1 = Reg{modified = True, value = vb}}) (meth add ns.r1.value ns) in (g,seqMerge s0 (if s0.r1.value<3 then s1 else s0))

Figure 4-5: Simplified λ expressions of functionalization of Figure 4-4

72 fR = λs0.let vb=s0.r1.value + 1 ns=State{r1 = s0.r1{modified = False}, r2 = s0.r2{modified = False}} (g,s1)=(ns{r1 = Reg{modified= True, value=vb}, r2 = Reg{modified = True, value = ns.r2.value + ns.r2.value} in (g,if s0.r1.value < 3 then s1 else s0)

Figure 4-6: Final functionalization of Program in Figure 4-4

4.2 Modularizing the Translation

Our initial translation is very simple. Given the correctness of the functionalization given in Chapter 3, it is fairly easy to see why it is correct. However, there are a few practical issues with this approach. First, the description of the compilation is monolithic and is not a readily available way to partition the task. Second, no system is truly closed; we must interact with the outside world. In the context that some aspect of our BCL program interacts with hardware, we must have some way of interfacing the two parts. These problems can be addressed if we can keep the modular boundaries of the BCL program meaningful. These modules can be compiled separately, and the characterization will give the outside world some understanding of how to interact with the hardware implementation of the BCL module. Although this change may appear potentially difficult, conceptually we can achieve most of the desired changes by taking the flat circuit generation, but keeping the modular hierarchy correctly preserved. The top-level modules become the object compiled, and the top-level rule serves only to characterize how the outside world may use the compiled FSMs. Parallelization of the compilation can be added once we can characterize the possible use of a RTL implementation of a BCL module and which usages are valid. Before we get into the details about the precise meaning of modules, we discuss some issues with our notion of keeping the module boundaries in the circuit. Intu- itively we want the wire-level interface of the RTL module corresponding to a par-

73 module program(CLK, RST N);

input CLK, RST N;

// bit 65 is r1.modified, bits 64:33 is r1.value // bit 32 is r2.modified, bits 31:0 is r2.value wire [65:0] s0, s1, ns, s2;

wire [31:0] r1 in r1 out, r1 in, r2 out, vb; wire r1 en, r2 en, g, true, false;

assign true = 1; assign false = 0;

Register#(.width(32),.init(0)) r1(.CLK(CLK), .RST N(RST N), .D(r1 in), .EN(r1 en), .Q(r1 out));

Register#(.width(32),.init(0)) r2(.CLK(CLK), .RST N(RST N), .D(r2 in), .EN(r2 en), .Q(r2 out));

assign r1 en = g && s2[65]; assign r2 en = g && s2[32];

assign r1 in = s2[64:33]; assign r2 in = s2[31: 0];

assign s0 = {false,r1 out,false,r2 out}; assign ns = {false,s0[64:33],false,s0[31:0]}; assign s1 = {true,(ns[64:33] + 1),true,(ns[31:0] + ns[64:33])}; assign s2 = (s0[64:33] < 3) ? s1 : s0;

assign vb = s0[64:33] + 1; assign g = true; endmodule Figure 4-7: Example of implemented rule. State structure s0, s1, ns, and final output s2 have been flattened into a single bit-vector. {} is Verilog bit-vector concatenation.

74 ticular BCL module to correspond to the BCL module’s methods. This means that every RTL port (except the clock and reset signal ports needed to allow synchronous hardware) should correspond to some aspect of an instance of a BCL method. As a result, all wires in the circuit associated with the state internal to the module should be internal to the module, the only exposure of the state is through the methods. Given this, it is fairly easy to mark where the modular boundary would go for any particular module. The interfaces of the circuits associated with the translation of the meth f and meth g definitions form the interface save for those wires asso- ciated with the state. Further, the part of the state bit-vector associated with the module needs to be separated from the rest of the state and be internal to the mod- ule. Conceptually this is just partitioning the circuit across a state boundary; for instance, (s0[64:33] < 3) ? s1 : s0 can be turned into {(s0[64:33] < 3) ? s1[65:33]: s0[65:33], (s0[64:33] < 3) ? s1[32:0]: s0[32:0]}. This ex- plicitly reflects the data flow in the resulting circuit. This intuitive module partitioning and interface scheme has one problem; multi- plexing of state requires more inputs that this interface implies. To understand this, consider the ns, s0, s1, and s2 in Figure 4-7. These wires represent the whole state of program P , however only the state associated with r2 should go into the RTL module implementation of BCL module m. We have isolated the state by splitting the wires between r1 and r2, but we must deal with the if expression (i.e., the multiplexer in the circuit) in the definition of s2. The issue is that the selector line (s0[64:33] < 3), does not relate to m directly and so must be calculated outside of RTL module and passed in as an input. Unfortunately it is not clear how this input corresponds to a method of m. Taking a step back, we can see that the issue comes from our use of whole-state multiplexing, generated from if and localGuard actions. We do not have this issue with our action compositions, as parMerge and seqMerge operate on a per-register level and can be easily split across module boundaries. To resolve this problem we modify the functionalization so that all multiplexing done by conditional state updates are done in the state-merging operators. We add

75 an extra “valid” bit as input (the enable bit) to the translated λ term. This bit signifies whether this action is used dynamically in the evaluation. We use this to mark whether any particular state value (e.g., the appropriate subvector of the state bit-vector of s0, s1, s2, or ns in our example) is actually used. Essentially we are distributing the boolean predicate from the whole-state multiplexer to each module that it affects. Figure 4-8 and Figure 4-9 show the modified functionalization. The non-trivial changes occur in the register assignment, conditional action, and local guard rule. Notice that we have split the body (δ) and the guard (π) in the description. This separation does not change the circuit or the result, but makes it more obvious that the guard of an action is not affected by the enable bit. With this characterization our intuitive boundary point works; the unassociated input port is now clearly associated to the method add. The interface has ports associated with each method class at the level of given by the functional definitions in Figure 4-10, e.g., meth g. An input port when needed, an output port for value methods, a π guard output port (or ready signal) for all methods and an enable bit input for action methods. As future work, we could exploit this notion of enable to value methods (and guards) as well. The resulting additional hardware serves as an enable for the com- binational circuit and could allow dynamic power optimizations to occur.

4.3 Understanding the FSM Modularization

With our new functionalization, modularizing a program becomes quite straightfor- ward. However, we still do not have a view through which to understand exactly what our RTL module interface means. The ports represent different instances of the BCL methods of the module, but the real question is how do two method instances relate to each other. If we could note all of these relations, we could completely characterize the RTL implementation; the BCL module it implements explains what each instance does, and the relation explains how they operate in regards to each other.

76 δA :: BCL-Action → (Bool → Translated-State → Translated-State) δA r := e0 = λp.λs.(s{r = Reg{modified:p, value:δE e0 s}}) δA J if (p0) aK0 = λp.λs.δA a0 (p ∧ δE p0 s) s J K δA J a0 when eK0 = λp.λs.(JδA Ka0 ) p s J K δA J a1 | a2 = K J K λp.J λs.letK ns = newState s in seqMerge s (parMerge (δA a1 p ns) (δA a2 p ns)) δA a1 ; a2 λp.λs.(δA a2 p (δA J a1 K p s)) J K δA J t = e0 inK a0 = J K J K λp.J λs.let tg =K πE e0 s in let tb = δE e0 s in δJA aK0[tb when tg/t] pJ s K tb and tg are aJ fresh names K δA localGuard a0 = λp.λs.δA a0 (p ∧ (πA a0 s)) s δA J loop e0 a0 = Kλp.λs.δA loopJ e0 Ka0 J K J K (pJ ∧ δE e0 K s) (δA a0 p s) δA m.g(e0) = λp.λs.meth δ g (δE Je0 Ks) p s J K J K J K δE :: BCL-Expr → (Translated-State → Translated-Expr) δE r =λs.(s.r.value) δE J c K =λs.c δE J t K =λs.t δE J e1Kop e2 = λs.(δE e1 s ∧ δE e2 s) δE J eb ? et : Kef = λs.(ifJ δKE eb sJ thenK δE et s else δE ef s) δE J e when eg K= λs.(δE e J s) K J K J K δE J t = e1 in e2K = J K λs.(letJ tg = πKE e1 s in (let tb = δE e1 in J δE K e2[tb when tg/t] s))J K tb and tg are aJ fresh names K δE m.f(e0) = λs.meth δ f (δE e0 s) s J K J K Figure 4-8: Generation of δ Functions. Fixed-width text is concrete syntax of the λ-calculus expression.

77 πA :: BCL-Action → (Translated-State → Bool) πA r := e0 = λs.πE e0 s}}) πA J if (p0) aK0 = λs.(ifJ δKE p0 then πA a0 s else True) πA J a0 when eK0 = λs.(πE Je0 Ks ∧ δE eJ0 sK ∧ πA a0 s) πA J a1 | a2 = K J K J K J K λs.letJ nsK = newState s in πA a1 ns ∧ πA a1 ns πA a1 ; a2 = λs.(πA a1 s ∧ JπA K a2 (δA Ja1 K(s))) πA J t = e0 inK a0 = J K J K J K λs.letJ tg = πEK e0 s in let tb = δE e0 in πA a0[tb when tg/t] s tb and tgJare freshK names J K J K πA localGuard a0 = λs.True πA J loop e0 a0 = K λs.(J πE e0 Ks) ∧ if (δEJ e0K s) then πA a0 s ∧ πA loop e0 a0 (δA a0 s) J K else TrueJ K J K J K πA m.g(e0) = λs.meth π g (δE e0 ) s) s J K J K πE :: BCL-Expr → (Translated-State → Bool) πE r =λs. True πE J c K =λs. True πE J t K =λs. True πE J e1Kop e2 = λs.(πE e1 s ∧ πE e2 s) πE J eb ? et : Kef = J K J K λs.(J πE eb sK ∧ (if δE eb s then πE et s else πE ef s)) πE e whenJ eKg = λs.(πE J eg K s) ∧ (δE Jeg Ks) ∧ (πE eJ s))K πE J t = e1 in e2K = J K J K J K λs.(letJ tg = πKE e1 s in (let tb = δE e1 s in JπE Ke2[tb when tg/t] s))J K tb and tg are freshJ names K πE m.f(e0) = λs.meth π f (δE e0 s) s J K J K Figure 4-9: Generation of π Functions. Fixed-width text is concrete syntax of the λ-calculus expression

78 To understand this, first consider how actions and expressions interact with the execution context in our operational semantics given in Chapter 2. Each action and expression takes an input notion of state. Actions also results in an output notion of state. These have been split into the original state (S) and the new updates (U), but they represent a concrete view of the state upon which the action occurs and the subsequent state. If we think about the input and output states for two actions a and b, there are three possible data flow relationships possible: the output state of a may be used directly or indirectly as the input of b (a < b), the output state of b may be used directly or indirectly in the input state of a (b < a), or neither a nor b observe the outputs state of the other (a | b). These three relations represent everything we need to know about two method instances in an RTL implementation. If a < b, then the effect of instance a effect is visible to instance b. If b < a or a|b, then a effect is not visible. As it is impossible for loops in this observability, < forms a partial ordering on the method instances of an implementation. To better understand this, consider the following program:

Program P Rule doOperation m.wr(m.rd() + 1) ; m.wr(m.rd() + 1) Module m Register r (0) ActMeth wr(x) = r := x ValMeth rd() = r

This program sequentially reads and writes a single register many times. Our func- tionalization results in the following module definition for m:

79 module m(CLK, RST N, RDY rd0, rd0, RDY wr0, EN wr0, wr0 IN, RDY rd1, rd1, RDY wr1, EN wr1, wr1 IN);

output RDY rd0, RDY wr0, RDY rd1, RDY wr1; output [31:0] rd0, rd1; input EN wr0, EN wr1; input [31:0] wr0 IN, wr1 IN; wire [31:0] r0, r1, r2;

Register#(.width(32),.init(0)) r2(.CLK(CLK), .RST N(RST N), .D(r0), .EN(1), .Q(r2));

assign RDY rd0 = true; assign RDY rd1 = true; assign RDY wr0 = true; assign RDY wr1 = true;

assign rd0 = r0; assign r1 = (EN wr0) ? wr0 IN : r0; assign rd1 = r1; assign r2 = (EN wr1) ? wr1 IN : r1;

endmodule This module models multiple read-write “cycles” of a single register. Each of the two BCL methods have two instances (which we have labeled by version 0 and 1 to disambiguate them). Both the rd0 and wr0 see the same initial state as do rd1 and wr1. This later pair of method instances does observe the changes from wr0 because of the sequential composition. If we enumerate these facts pairwise, we get the following list of relations: {rd0 | wr0, rd0 | rd1, rd0 | wr1, wr0 < rd1, wr0 < wr1, rd1 | wr1}. We can denote this concisely by the partial ordering relation: (rd0, wr0) < (rd1, wr1). Given this annotation and the BCL module it implements, we know how the methods operate when they are used in its context; In a cycle where we use wr0 (i.e., the enable bit of wr0 is true), rd1 observes its update meaning that the module outputs the value just written through wr0’s interface. If wr1 is also enabled it

80 happens last, and dominates wr0’s write; however, it does not affect the results of rd0 or rd1. If we change our perspective, this annotation becomes a requirement for safe use of this module. The context that uses this module must use all methods in a way that has the same < relation to which we have built the RTL module. Note that practically, it is easy to deal with a method g that we do not use in our new context rule; it is akin to adding an action that uses g but never dynamically calls it (e.g., if false then m.g(x)). In general each partial ordering of method instances results in a different RTL module. However, in many cases two different orderings result in identical hardware. For instance if two methods operate on completely different states, and if we construct the module associated with calling them in either order, we end up with the same RTL module. In light of this, it would be better to annotate each module, not with one partial ordering, but all valid partial orderings that it represents. This can be represented by enumerating all valid pairwise interpretations between two methods. Any partial ordering where each pairwise relation is acceptable. There are 7 possible subsets of possible interpretations ranging from any three interpretations ({<, |, >}). Having the full set means that we can always call both methods in a cycle and have it be valid; having only one means that we can only interpret it in one way. This codification of relations is very similar to Rosenband and Arvind’s pairwise rule relations [79]. The difference is that in their construction, it was possible for an invalid parallel composition to occur due to a double-update error. To prohibit this, the notion of conflicting methods (corresponding to the empty set relation) was developed. We need not worry about such errors as we prohibit double update errors as guard failures. Thus the empty pairwise relation a{}b is always the same as a{|}b.

81 TVM :: BCL-Value-Method → (λ-calculus declarations) TVM VMeth f λx. e = methJ f = λx.(λs.(πEK e s,δE e s)) meth π f = λx.(λs.(letJ K (g,s’)J = methK f x s in g)) meth δ f = λx.(λs.(let (g,s’) = meth f x s in s’)) TAM :: BCL-Action-Method → (λ-calculus declarations) TAM AMeth g λx. a = methJ g = λx.(λp.λs.(K πA a s,δA a p s)) meth π g = λx.(λs.(let (g,s’)J K = methJ K g x s in g)) meth δ g = λx.(λp.λs.(let (g,s’) = meth g x p s in s’))

Figure 4-10: Generation of method functions. Notice that meth π g may be evaluated without knowing the particular value p, though it shares the same input argument x as meth δ g. Fixed-width text is concrete syntax of the λ-calculus expression.

82 Chapter 5

Scheduling

We presented a nondeterministic procedure in Chapter 2 to describe the semantics of BCL. An essential step to implement a BCL program is to construct a procedure that will do this selection. We call this procedure the scheduler. At first blush, one may assume that this is a bounded problem, where at each step we select exactly one rule. However, as we see shortly, there is no reason to make this choice deterministically. Depending on the current dynamic resources of our system, we may want to make a different choice. Further, in implementations it would be often far more efficient to execute multiple rules concurrently. However, it is unclear which executing multiple rules concurrently is valid. One can evaluate an implementation with respect to different metrics such as performance, resource allocation, power consumption, etc. In this chapter, we focus only on the semantic aspects. In this context we can consider an implementation of a program as a program itself. Before we get to our formal definition of implementation, we revisit our two-button black-box model introduced in Chapter 2. An implementation can be thought of as just such a black box. However, the result of “Display” may not directly show the internal state. Rather, it can display a function of the internal state that mimics the program implemented. In this regard it externally looks just like a program, but can have a completely different internal structure. An implementation is correct if all observations of it can be understood as observations of the specification program. As we have stated before, for efficiency we may want to have an implementation

83 of a program execute multiple rules concurrently while still having them logically happen in sequence. This results in us being unable to observe every state change in the implementation as we would in the original program. Such implementations correspond directly to our definition of a complete approximation; in the context of implementations, such a program is called a direct implementation.

Definition 18 (Direct Implementation). P 0 is a direct implementation of P if it is a

complete approximation of P . 

5.1 A Reference Scheduler

We mentioned in Chapter 4 that all synchronous hardware implementations can be represented as a BCL program with exactly one rule. A direct implementation of this form would have a single derived rule. By definition we can understand the execution of this implementation as a deterministic sequence of rules of program P . It is possible that this derived rule may be degenerate in a state that is not a terminating state of the original program. In this case, when we reach that state, we will necessarily be stuck, as no other rules exist to take us out of this state. This means that the direct implementation terminated prematurely. To prevent this we would like to guarantee that our derived rule always makes progress when possible; one easy way to do this is to consider each rule in sequence.

We could construct a new program PI with a single compound rule of each rule described in sequence, but this may generate poor hardware as it allows very long critical paths. Another approach would be to create a single rule that executes as the first rule on the first execution, then the second rule on the second cycle, and so on, before returning to the first rule after it has finished. As the behavior of a rule is a function of the state, this necessarily involves chang- ing the state of the program. We add a cnt register that reflects which one of the original rules our single rule will emulate. Our new rule then checks the value of cnt, executes the body of the corresponding rule, and increments cnt to point to the next rule. To prevent guard failure from preventing cnt from incrementing, we must wrap

84 each body in a localGuard. If each rule Ri has body ai, the new rule is:

Rule topRule: (if (cnt == 0) then (localGuard(a0)) | (if (cnt == i) then (localGuard(ai)) | ... (if (cnt == max) then (localGuard(a–max˝)) | (cnt := (cnt == max) ? 0 : cnt + 1) This program is clearly not a direct implementation since we added cnt to the state. However there is a natural way to understand each state in this new program as a state in the original, namely eliding the extra register. In general, we would like PI to be an implementation of P if we have a function allowing us to understand the states and transitions of PI as states and transitions of program P . This leads to the following more general definition of an implementation.

Definition 19 (Implementation of Program P ). An implementation I of program

0 0 0 0 P modeled by (S ,S0, −→P 0 ) is a pair (P , f) where P is a program modeled by 0 0 0 (S ,S0, −→P 0 ) and f is a function of type S −→ S. I is an implementation of P when the following are true:

• Total Correspondence: S = {f(s)|s ∈ S0}

0 • Initial Correspondence:S0 = {f(s)|s ∈ S0}

0 0 • Faithfulness: s P 0 s =⇒ f(s) P f(s ) 

Notice that the function f is total, meaning that at every point in the execution we retain a notion of the computation performed. In fact we can project all aspects of P 0 via f to get a new program P 00 v P . Practically, f may become quite complicated to allow for various speculative execution approaches. A direct implementation is now just a implementation where fx = x. With a formal notion of an implementation we can define the observations of an implementation thus:

85 Definition 20 (Observation of Implementation I). An observation of implemen-

0 0 tation I = (P , f) of program P modeled by (S,S0, −→P ) where P is modeled by 0 0 (S ,S0, −→P 0 ) is a sequences of states in S, σ = s0, s1, s2, ..., sN such that:

0 0 0 0 0 • s0 −→P 0 s1 −→p0 s2 −→P 0 ... −→P 0 sN and ∀i.si = f(si).

0 0 • ∀ 0 ≤ i < N. si 6= si+1. 

Lemma 6. if σ is an observation of implementation I of program P , σ is also an observation of P .

5.2 Scheduling via Rule Composition

Given our definition of correctness, it is straightforward to verify whether any program P 0 is a valid implementation of BCL program P . However, this approach is unwieldy when one is exploring possible implementations; having to check not only the perfor- mance of an implementation, but also its correctness is impractical. Instead, it would be far better to constrain ourselves so that all possible implementations considered are correct by construction. This would leave only the performance/cost tradeoff about which to reason. Before we deal with adding state to our implementation, let us consider imple- mentations which have the same notion of state as our program. In this context, each rule in an implementation of Program P when projected is a derived rule of P . It must be the case that we can construct its behaviors by composing the rules of P into larger rules. For instance, if wanted to always execute Rule R1 with body a1 followed by R2 with body a2, we could generate a new rule:

Rule R1R2: (a1;a2)

Executing this rule has the same result as executing R1 then R2. Extending this idea, we can create a set of rule compositions, functions taking rules as input and returning a new rule, which will let us construct a large set of rules. If we prove that for each composition the output rule can always be understood as a derived rule of the input

86 rules, we are guaranteed that any rule we get from this starting from the rules of P must be a derived rule of P . In general, it is not practical to construct a set of compositions that generate all possible derived rules; we would need compositions that do significant inter-rule analysis. For instance consider the two rules:

Rule R3: (if p1 then r1 := 5) ; (if !p2 then r2 := 2) Rule R4: (if !p1 then r1 := 3) ; (if p2 then r2 := 8)

A valid derived rule would be:

Rule R3R4: (r1 := (p1) ? 5 : 3) | (r2 := (p1) ? 2 : 8)

This rule behaves as R3 followed by R4. However, this is not simply sequentially composing the bodies; it makes use of the fact that the updates in both rules are mutually exclusive to merge the updates to each register and to parallelize the bodies. A composition function that takes in R3 and R4 would need to do nontrivial analysis to be able to get this level of optimization. However, it is straightforward to generate a rule that emulates any particular derived rule R. For each state in the system there is some sequence of rules that behaves as R. We can compose their bodies in sequence and construct a rule that examines the state of the program to select which of the 2N bodies to execute. While capturing the dynamics we expect this rule will lead to poor implementation; in hardware this would generate an exponential number of method instances and thus exponentially large multiplexers. To generate efficient hardware (or software), either a better strategy for generating the derived rule should be found, or else we should apply semantics-preserving rule-to-rule transformations to simplify the bodies of the newly composed rules. In general, working with a different notion of state is complicated and cannot be easily built up in an incremental manner; once the notion of implementation state becomes decoupled from the program’s state there is so much choice that it is not clear how one can incrementally build up a design. However, just augmenting state

87 DR ::= R k compose(DR,DR) k par(DR,DR) k restrict(DR,DR) k repeat(DR) Figure 5-1: A BCL Scheduling Language for scheduling can be viewed as just a preprocessing step. We first add the new state and rules that modify only this state, leaving the original state untouched. Our relation function elides this extra state. As such these new rules do not change the program state and appear as degenerate rule executions. Conceptually, scheduling can be reduced to the following steps:

1. Add state to the program and rules which may read the entire state of the new program but are allowed to modify only the added state.

2. Construct new derived rules via a set of verified rule compositions. When we have sufficient rules, remove rules undesired in the final implementation

3. Use semantics-preserving transformations to change the rules to be more effi- cient to implement without changing the program meaning.

Practically, most desirable implementations correspond closely to simple compo- sitions. As such the final simplification step is generally not necessary.

5.2.1 Rule Composition Operators

We can define all possible derived rules with four “basic” rule compositions (compose, par, restrict, and repeat) producing the composed derived rule grammar shown in Figure 5-1. These compositions form the basis of a user-defined scheduling language, which can be used to define the desired implementation. For clarity, we assume that all parameter rules to a composition are in the form a when p: they have had all guards lifted to the top by following the axioms presented in Figure 2-8. If a rule does not have loop or sequential composition, i.e., the subset of BCL corresponding to BSV, this lifting is always total, allowing us to lift the guard

88 to the top. A procedure to do the total lifting of this subset is given in Figure 5-2, Figure 5-3, and Figure 5-4. Lifting is not truly necessary; but doing so makes the compositions easier to understand.

5.2.2 Sequencing Rules: compose

To sequence two rule executions R1 and R2 together atomically with R1 observing R2, we introduce the compose operator.

compose(Rule R1 a1 when p1, Rule R2 a2 when p2) = Rule R1R2 (a1 when p1);(a2 when p2) Note that the composed rule can be enabled only when both R1 and R2 can fire in sequence.

5.2.3 Merging Mutually Exclusive Rules: par

To deal with choice, we introduce the parallel composition operator — which uses the parallel action composition. If the rules are not mutually exclusive, the new rule may exhibit new behaviors. For instance if R1 was r1 := r2 and R2 was r2 := r1, then the above action would swap the values, a behavior not expressible via sequential executions of R1 and R2. We prevent this situation from occurring by forcing the new rule to be enabled only when exactly one of the rules is ready. The operator is:

par(Rule R1 a1 when p1, Rule R2 a2 when p2) = Rule R1R2 (if p1 then a1)|(if p2 then a2) when (p1 6= p2) This rule can be made simpler if we interpret a DUE error as NR; this is exactly the choice we made in the functionalization presented in Chapter 3. With this interpre- tation we can guarantee the composition is correct by augmenting the actions a1 and a2 such that they always result in a DUE. For instance, we could introduce a new register r and have both update it as follows:

89 LWe :: BCL-Expr → BCL-Expr LWe r = (r when true) LWeJcK = (c when true) LWeJtK = (t when true) J K 0 0 LWe e1 op e2 = (e1 op e2) when (e1g ∧ e2g) J K 0 0 where (e1 when e1g) = LWe e1 0 0 J K (e2 when e2g) = LWe e2 0 0 0 J K 0 LWe e1 ? e2 : e3) = (e1 ? e2 : e3) when (e1g ∧ (e1 ? e2g : e3g)) J K 0 where (e1 when e1g) = LWe e1 0 J K (e2 when e2g) = LWe e2 0 J K (e3 when e3g) = LWe e3 0 0 J K LWe e1 when e2 = e1 when (e2 ∧ e1g ∧ e2g J K 0 K where (e1 when e1g) = LWe e1 0 J K (e2 when e2g) = LWe e2 0 0 0 J K LWe t = e1 in e2 = ((t = e1) ; e2)) when J K 0 0 ((t = e1) ; (tg = e1g) ; e2g 0 K where (e1 when e1g) = LWe e1 0 e3 = e2[(t when tg)/tJ] K 0 (e2 when e2g) = LWa e3 0 0 LWe m.f(e) = m.fb(e ) when eg ∧ m.fgJ(e K) 0 J whereK (e when eg) = LWe e J K Figure 5-2: Total guard lifting procedure for BCL Expressions resulting in the guard being isolated from its guard.

90 LWa :: BCL-Action → BCL-Action 0 0 0 0 0 LWa t = e1 in e2 = ((t = e1) ; e2)) when ((t = e1) ; (tg = e1g) ; e2g J K 0 K where (e1 when e1g) = LWe e1 0 J K 0 (e2 when e2g) = LWa e2[(t when tg)/t] 0 LWe m.f(e) = m.f(e ) when eg J K 0 J whereK (e when eg) = LWe e 0 LWa r := e = (r := e ) when eg J K 0 J whereK (e when eg) = LWe e 0 0 LWa a when e = a when (ag ∧ e ∧ eg) J K 0 J whereK (a when ag) = LWa a 0 (e when eg) = LWe Je K 0 0 0 LWa if e then a = (if e then a ) when (eJg K∧ (ag ∨ ¬e )) 0 J whereK (a when ag) = LWa a 0 (e when eg) = LWe Je K 0 0 J K LWa a1 | a2 = (a1 | a2) when (a1g ∧ a2g) J K 0 where (a1 when a1g) = LWa a1 0 J K (a2 when a2g) = LWa a2 0 0 0 0 0 LWa t = e in a = ((t = e ) in a ) when ((tJ =K e ) in (tg = eg) in ag) 0 J whereK (e when eg) = LWe e) 0 0 (a when ag) = LWaJ a[(t when tg)/t] 0 0 LWa m.g(e) = (m.gb(e ) when eg ∧ m.gJ g(e ) K 0 J whereK (e when eg) = LWe e J K Figure 5-3: Total guard lifting procedure for restricted subset of BCL Action resulting in the guard being isolated from its guard.

LWVM :: BCL-Value-Method → (BCL-Value-Method,BCL-Value-Method) 0 LWa VMethfλx.e = (VMeth fbλx.e , VMeth fgλx.eg) 0 J whereK (e when eg) = LWe e J K

LWAM :: BCL-Value-Method → (BCL-Action-Method,BCL-Value-Method) 0 LWa VMethfλx.a = (VMeth fbλx.a , VMeth fgλx.eg) 0 J whereK (a when eg) = LWa a J K Figure 5-4: Total guard lifting procedure for restricted subset of BCL Methods. Meth- ods are transformed into two separate methods; one for the body and one for the guard.

91 par(Rule R1 a1 when p1, Rule R2 a2 when p2) = Rule R1R2 (localGuard(a1 when p1 | r := dummyvalue) | localGuard(a2 when p2 | r := dummyvalue)) This allows us to avoid explicitly having to extract the guards. In the context of BSV either implementation is valid.

5.2.4 Choosing from Rules: restrict

Sometimes we wish to restrict a rule so that it is not valid to fire in certain circum- stances. For instance, we want to make a rule mutually exclusive with another rule so we can compose them in parallel safely. Restricting a rule to be nondegenerate always results in a derived rule. We can express this as a derived rule where the boolean is extracted from the guard. As new rules may have any guard it is sufficient to just extract the top-level guard; this preserves our “rules as inputs” property. In general, restricting rules are used only in the context of par composition. As such we rarely need to consider the added rules when applying this composition to generate our final implementation.

restrict(Rule R2 a1 when p1, Rule R2 a2 when p2) = Rule R1R2 a2 when (¬p1 ∧ p2)

5.2.5 Repeating Execution: repeat

The previous three compositions gives us enough expressivity to represent any possible bounded deterministic schedulers for a BCL program. However, it is possible for an unboundedly long rule to execute. In principle we could just add a “never terminates” composition, but practically, it is more helpful to introduce a looping combinator that represents “try until failure”, a strategy in execution that is very natural in software implementations. For terminating loops this could be more safely done using a loop rerolling transformation onto the fully composed rule, but having this directly is convenient.

92 repeat(Rule R1 a1 when p1) = Rule repeatR1 r := true; loop r (r := false ; localGuard(a2 when p1; r := true))

5.2.6 Expressing Other Rule Compositions

With these compositions, we can generate a slew of new rule compositions by com- posing them. These new operators are useful in two ways. First, they can be a useful shorthand when doing compositions. For example, in the context of hardware imple- mentation, it is common practice to execute rule R2 unless rule R1 could be done, in which case we only do R1. Thus we can generate the following operator from restrict and par: pri(R1, R2) = par(R1, restrict(R1, R2))

Second, they can serve as a place rule-level optimizations; any optimization we apply within these compositions can be immediately shared when used. Consider a rule composition seq. This composition take rules R1 and R2 and tries to execute R1 and then (whether it was executed or not) attempts to execute R2. This could be expressed with our three compositions as:

seq(R1, R2) = let R1R2 = compose(R1,R2) R1nR2 = restrict(R1R2,R1) nR1R2 = restrict(R1,R2) in par(par(R1R2, R1nR2), nR1R2) This breaks out each non-degenerate case and unions them. However, we could also express this as:

seq(Rule R1 a1 when p1, Rule R2 a2 when p2) = Rule R1R2 (localGuard(a1 when p1);localGuard(a2 when p2)) This representation has much more sharing and results in less circuitry if implemented in hardware or less code if implemented in software.

93 5.3 Scheduling in the Context of Synchronous Hard- ware

Consider the resulting FSM from the implementation of a BCL program P in syn- chronous hardware. As our grammar is complete enough to express arbitrary bounded expressions, we can always understand the execution of the FSM over a cycle as the application of a single rule that reads the state of the system, and calculates the new state and produces the new state. This corresponds to the dual transformation as the implementation discussed in Chapter 4. As such, all synchronous FSM implementations can be understood as a single rule. This gives us an opportunity to better understand various scheduling algorithms. Previously, such algorithms were described via their circuits, which immediately in- troduced questions of the correctness of concurrency and the multiplexing of logic. In this section we reexamine these algorithms via rule compositions with an eye towards simplifying this arguments. Although we can describe these algorithms with the four compositions given previously, when appropriate we introduce other safe composi- tions which encapsulate the rule-level transformations that would need to be done to get the appropriate FSM. All previous algorithms have the same restrictions we made regarding loops — that they must be unrolled completely for implementation. They also had no ability to deal with the sequential connective in the input, though they can support some limited form of sequencing between rules. Also they do not add state for scheduling purposes. As such the schedulers they generate are unfair with respect to rule selection, and the designer may need to coerce the scheduling algorithm to make a different choice for correctness, not just performance.

5.3.1 Exploiting Parallel Composition

Hoe was the first to describe [51] how to implement a rule-based system (an abstract transition system) in synchronous hardware. To keep hardware costs understandable

94 are most one circuit for any particular method was ever created. As only parallel composition is allowed, multiple instances of the same method called in the same rule can be programmatically transformed and shared. For this discussion we assume that all rules and methods are already in this form. If two rules use the same method (e.g., enqueue into the same FIFO), they have a structural hazard and can not fire in the same cycle. This restriction does not apply to zero-argument read methods or methods with the same input expression as both can read the same value without additional circuitry. At a high-level Hoe’s scheduling algorithm merges rules using the following merge operation: parCompose(Rule R1 a1 when p1, Rule R2 a2 when p2) = Rule R1R2 (localGuard(a1 when p1) | localGuard(a2 when p2)) This is repeated until we have a single rule that represents the operation of a single clock cycle. Note that as boolean intersection and parallel composition are commu- tative, the order of this does not change the meaning of the final rule. Note also that each rule appears exactly once, meaning rule logic is never duplicated. To deal with structural hazards from both rules using the same method call, one rules one rule is replaced with a restricted version of itself (using the restrict composition) making the two rules mutually exclusive. This allows intra-rule level optimizations to merge any calls to the same method. While this does duplicate the guard of one of the rules into the other, as they are composed in parallel both rules observe the same state and as such the two instances of the expression can be shared. This resolves structural hazards but may still introduce an error due to the com- position. Consider a program with the following rules:

95 Program P1

Rule R1: x := f(x,y)

Rule R2: y := f(x,y)

Register x (0) Register y (0)

These two rules read both x and y, an operation that we can share between the two, and do not have any necessary structural hazards. However, the result of the parallel composition is not a derived rule. It produces the transition (x0, y0) −→

(f(x0, y0), f(x0, y0)). This is not what we get from executing one rule and then the other, i.e., (x0, y0) −→ (f(x0, f(x0, y0)), f(x0, y0)) or (x0, y0) −→ (f(x0, y0), f(f(x0, y0), y0)). Our intuition for this parallel composition was to execute two rules in parallel. As such it should be the case, that we can understand the execution of the new composed rule as executing the rules in either order. Thus, if Rule R1 has body a1 and Rule R2 has body a2, then the composition is only valid if the pair are conflict-free, that is: TA a1 ; a2 = TA a2 ; a1 = TA a1 | a2 J K J K J K We denote this property as R1 <> R2. Hoe’s scheduler did not directly analyze conflict-free analysis and instead does the following approximation based on the set

of read and written states of an Rule A (RA) and (WB) respectively:

A <> B ⇐⇒ WA ∩ WB 6= ∅ ∧

WA ∩ RB 6= ∅ ∧

WB ∩ RA 6= ∅

To resolve this the algorithm first restricts all rules so that they are all pairwise conflict-free. The choice of which rules to restrict and in what order changes the resulting behavior, but there is no way a priori to evaluate which is better. Thus the

96 algorithm makes an arbitrary choice. Once all the rules are conflict-free, they can be composed in parallel. As the composition is commutative, the order of this does not matter.

5.3.2 Extending for Sequentiality

Hoe’s initial parallel scheduler gets a significant amount of cycle-level parallelism, given our one instance of a rule per cycle and no combinational data passing between rules in a cycle restriction. However, we still fail to get parallelism when two rules are not conflict free but can be executed in parallel in a way that is understandable. Consider the rules:

Program P2

Rule R1: x := f(x,y)

Rule R2: y := f2(y)

Register x (0) Register y (0)

Here, R1 and R2 are not conflict-free, but the parallel composition is correct. However, when both rules execute, we can understand them only as one sequence. We say that Rule R1 with body a1 is “sequenceable-before” Rule R2 with body a2 (denoted R1 < R2) when: TA a1 ; a2 = TA a1 | a2 J K J K As before this is approximated using read and write sets as:

A < B ⇐⇒ WA ∩ WB 6= ∅ ∧ WA ∩ RB 6= ∅

As only one ordering make sense, the relative order of rules becomes important. Depending on which rules are valid to fire in a cycle, a different relative order will be seen. We must assure that whatever parallel composition we end with can always be understood as a total ordering of the composite rules. There exists not total ordering

97 of rules that is optimal in that it allows a maximum number of rules to execute in every case. Consider the following program:

Program P3

Rule R1: x := y

Rule R2: y := z

Rule R3: z := x

Register x (0) Register y (0) Register z (0)

Each pair of rules can be executed in parallel as a particular sequence. However there is no total ordering of rules of which each two-rule order can be understood; moveX < moveY, moveY < moveZ, and moveZ < moveX. It was also noticed that the multiplexing logic to deal with multiple rules writing to the same register is a single cycle is the same as the multiplexing logic in the context that only one rule may write a rule in a single cycle except for the control signal. Thus we can loosen our structural restriction to allow multiple ordered write method instances for registers. To resolve this, we need to fundamentally be aware of the order of rule execution in a cycle. This idea was formed into a more efficient scheduling algorithm [52], which ex- ploits these two improvements. The fundamental algorithm remains the same. First we restrict the rules so that they may be composed in parallel safely, and we compose them in parallel. The only difference is that we must now consider each of the expo- nential subsets of rules that may be valid in a cycle to determine if we have to restrict any particular rule. Doing so efficiently requires sharing decisions across multiple similar subsets of rules. The process is fairly involved and does not fundamentally add anything to this discussion. As such we do not discuss it further.

98 5.3.3 Improving Compilation Speed

This sequencing scheduler removed many of the unnecessary rule conflicts that re- duced performance. However, it had a few major faults. First, it was extremely slow as it requires hyper-exponential time to build a final design. Second, the com- plexity of correctly multiplexing the correct value to each state element may cause huge additional logic costs if the decisions do not directly align. Lastly, because the understanding of what happened in a single state was so compiled, designers had sig- nificant issues understanding what was the problem if an unexpected scheduler was generated. As a solution to this problem, Esposito [43] suggested a greedy scheduling model that established a total temporal ordering of rules. This reduces the exponential considerations to O(n2). To do this, a directed graph was constructed where each node represents a rule. An edge exists between R1 and R2 if parCompose(R1,R2) cannot be understood as trying R2 then trying R1. That is, the edge exists, if this is a required order when both rules happen in parallel. To generate a total order, we go through the nodes in some ascending priority and restrict “less important” rules in case they appear in a cycle in the graph. The resulting graph can now be linearized into our total order, and we have a known order in which to execute and can compose them as before.

5.3.4 Introducing Sequential Composition

Esposito’s schedule produces high-quality FSMs with high concurrency quickly. How- ever, it is fundamentally limited to composing rules in parallel. As such there can be no data passing between rules. One way to allow this is to introduce new primitives where the interpretation of parallel composition passes data. For instance, a register where the execution of write is visible to a read in parallel. If we forget the stored value at the end of each cycle, this corresponds to a hardware wire. This gives us the ability to emulate sequential composition without changing from the previous schedule. This approach is exactly what the current version of the BSV compiler

99 does. However, this is deeply unsatisfying for many reasons. Most obviously, we no longer have a clear execution model as various parts of a rule may affect other parts of a rule; it is even possible to construct an infinite loop, e.g., wire.write(wire.read() +1). To prevent this we can require that a rule cannot call two methods with such a path. This is still not enough as cycles may appear through a cycle of many rules, like so:

Program P4

Rule R1: x := y

Rule R2: y := z

Rule R3: z := x

Wire x (0) Wire y (0) Wire z (0) A much greater problem is that the semantic meaning of a rule is now tied to the primitive state implementation and not to the method itself. A much better approach is to actually expose some limited form of sequential composition, but how does one concisely express this? Rosenband introduced one possibility with Performance Guarantees [80]. Performance guarantees form a sketch of the desired total ordering that the user gives to the compiler, expressing his desired concurrent execution. Consider a program implementing the five-stage DLX processor in [71] with a rule corresponding to each of the five pipeline stages, Fetch, Decode, Execute, Memory, and Writeback. To allow these to be untimed, we implement each pipeline register as a single element FIFO; each rule dequeues from the source register and enqueues into the next. The expected implementation would allow each rule to happen in sequence, This corresponds to trying Writeback then Memory then Execute, Decode and finally Fetch. As each FIFO has only one space, this necessarily requires us to pass data, specifically the “full” bit, combinationally. This is represented by the guarantee: Writeback < Memory < Execute < Decode < Fetch

100 The < guarantee is an associative operator accepting two sets of rules. It denotes that all rules on the left happen in a cycle, and then their updates are visible to all rules to the right. In our rule composition parlance, this can be represented by the following composition.

perfGurantee(Rule R1 a1 when p1, Rule R2 a2 when p2) = Rule R1R2 (localGuard(a1 when p1) ; localGuard(a2 when p2)) Rosenband implemented these schedules by way of Ephemeral History Registers (EHRs) [77], which are registers with an arbitrary number of read and write ports in sequence. Effectively it is a normal register with an arbitrarily long chain of multiplexers. To implement a guarantee, one merely has to number each method instances, to establish the order of methods each rule calls must have to implement. This is recursively repeated until we reach the primitive registers whereupon we drop in the appropriately large EHR and replace the numbered calls with references to the appropriate EHR methods.

5.4 Expressing User-defined Schedules via Rule Com- positions

Using the rule composition primitives introduced in this chapter, the user can repre- sent desirable derived rules. This allows the user to guide the final implementation directly and explore the tradeoff in schedules between high-level performance proper- ties (e.g., pipeline throughput) and low-level physical properties (e.g., clock frequency, area, locality) without changing the rules themselves. This scheduling may be very tedious and so it is likely that this choice should be automated for each computational substrate (e.g., synchronous hardware or software). The key advantage is that this description explains precisely how scheduling affects the programs execution, which is of paramount importance to the designer.

101 We now discuss such an exploration for implementing the circular IP lookup ex- ample from Chapter 2. For convenience, we include the code again in Figure 5-5. For this discussion, we assume we are implementing the design in synchronous hardware and as such we want a single rule implementation. One of the most important efficiency issues in the circular pipeline is whether we are able to enter a packet into the system in the same clock cycle when another one is leaving the system. That is, can the recirc rule and the enter method execute concurrently. If these actions do not take place in the same cycle, then the system is supposed to contain a dead cycle. Is this a serious issue? Suppose our concrete lookup algorithm takes at most 3 lookups for each request. The dead cycle in this case would increase the total number of cycles needed to serve the incoming requests by at least 33%! The user can avoid such a dead cycle by giving an appropriate schedule. However, as we show later, exploiting this level of concurrency may increase the critical combinational path delay. A designer may want to consider the total performance when selecting a schedule. One important implementation detail is that it is infeasible for a memory of this size to do a single cycle lookup at the frequency we want. As a result our imple- mentation of memory will need to take multiple cycles and for concurrency we want the one-element FIFO to be replaced with a multi-element one; in our experiment we chose a size of six to match the expected latency at the desired clock frequency. Note also that since all three rules interact with fifo and mem, scheduling these systems will imply the specific method instances of these modules. Some of these may lead to undesirable hardware. For instance, the recirc and enter cannot execute together without two memory ports. This necessarily doubles the size of the memory, but removed only one cycle of latency from the processing time of a packet (we can overlap the entering of a new packet with a recirculation). However, exit and enter use disjoint methods and do not require additional methods instances in the implementation if implemented concurrently.

102 MEM

enter enter exitexit

recircrecirc

Program IPLookup Module mem ... Module fifo ... Module inQ ... Module outQ ...

Rule enter: x = inQ.first() in inQ.deq() fifo.enq(x) | mem.req(addr(x))

Rule recirc: x = mem.res() in y = fifo.first() in (mem.resAccept() | mem.req(addr(x)) | (fifo.deq(); fifo.enq(f2(x,y))) when !isLeaf(x)

Rule exit: x = mem.res() in y = fifo.first() in (mem.resAccept() | fifo.deq() | outQ.enq(f1(x,y))) when isLeaf(x) Figure 5-5: The Table-Lookup Program

103 Schedule 1 Schedule 2 Schedule 3 Clock Period (ns) 2.0 2.0 2.0 Area (µm2) 36,320 42,940 43,206 Max Latency (CCs) 15 18 15 Benchmark Perf. (CCs) 28,843 18,927 18,927

Worst-Case Latency refers to the maximum number of clock cycles that an operation can take to complete the pipeline. Although Schedules 2 and 3 have the same performance on our benchmark, the worst-case latencies of Schedule 2 is worse.

Figure 5-6: Implementation Results

5.4.1 The Three Schedules

We consider three schedules: Schedule 1 where enter and exit do not execute con- currently, Schedules 2 and 3 where they do, but with different priorities. Schedule 1: pri(recirc, pri(exit,enter)) Schedule 2: pri(recirc,seq(exit, enter)) Schedule 3: pri(seq(exit,enter), recirc)

Schedule 1 executes only one rule per clock cycle, with recirc being the highest priority, followed by exit, then enter. Schedule 2 can execute exit and enter in the same cycle. It will choose to execute recirc over either or both of these. Schedule 3 also allows exit and enter to execute in the same cycle. However, in this schedule both of these rules take priority over recirc.

5.4.2 Performance Results

We evaluated each of these schedules using Bluespec Compiler version 2006.11 and synthesized using Design Compiler version Y-2006.06 with TSMC 180 nm libraries. To deal with the variations between Bluespec and BCL, we manually per- formed the necessary program transformations. The performance and synthesis re- sults are shown in Figure 5-6. To keep both area and timing comparable, we show results within 100 ps of the minimal clock period. We can see that all schedules are able to meet a 2 ns timing requirement, but schedules 2 and 3 result in significantly larger area than Schedule 1.

104 Schedule 1 takes 28,803 cycles to complete our synthetic benchmark. In contrast, both schedules 2 and 3 only take 18,927 cycles, an improvement of nearly 35%. This matches our intuition of the cycle-level effect of allowing exit and enter to execute concurrently. The same analysis shows that in the worst case an operation under schedule 2 can take 3 more cycles to complete than an operation in schedule 3. This is because when recirc has priority it prevents a sixth instruction from entering the memory until something has left, whereas in schedule 3, we will enter new requests until fifo is full (i.e. we will have 6 in-process requests). Thus, though our benchmark did not exercise this feature, the design generated from Schedule 2 has better performance in this regard. A designer considering these three schedules would thus choose either Schedule 1 or 2, depending what mixture of area and performance is more valued.

105 106 Chapter 6

Computational Domains

A key aspect of designing in BCL is how one can split a unified BCL design into multiple partitions, some of which are implemented in hardware and some in soft- ware. There are numerous requirements that must hold for any sort of partitioning framework. Specifically:

1. Partitioning computation between multiple substrates, e.g., hardware or soft- ware, is a first-order concern in many designs. There should be no ambiguity as to how a design is to be implemented. An important aspect of this implementa- tion is how communication channels are constructed. Since such channels must be implemented in multiple substrates, it may seem natural to abstract the de- tails of the construction into the language and have the compilation procedure insert communication channels as necessary. However, this would drastically reduce the ability of the designer to reason about the communication itself and hamper attempts to improve efficiency. A proper partitioning scheme must al- low enough of the communication structure to be exposed within the language that the designer can reason about it.

2. Partitioning must not impede the modularity of the code. For instance, it would be highly undesirable if the hardware and software parts of a design needed to be segregated into separate disjoint modules. This would mean that we could not use library modules that have both hardware and software components without

107 first partitioning them into their individual single-substrate components. Any restriction to our modularity due to our partitioning scheme can cause even a small change in one part of the design to have non-trivial affects on the rest of the design. As a result partitioning should be orthogonal to our notion of modularity.

3. Once partitioned, we should be able to reason about the implementation of single partition without concerns for the other partitions in the system. This isolation does not preclude whole-system reasoning, but rather asserts that low- level optimizations from one partition should not unduly restrict other portions of the design.

4. Partitioning should be not be limited to a single hardware and software parti- tion. Our partitioning notion should allow multiple software partitions to be implemented on separate cores as well as multiple hardware partitions operating at different clocks. Similarly, it should be natural to extend the partitioning task to allow partitions implemented on new substrates like special-purpose DSPs and GPUs.

To represent this partitioning, we introduce the notion of computational domains within our design. Each computational domain corresponds to a computational sub- strate. For instance, a simple embedded system may consist of a single software domain and a single hardware domain, while a more complex system may have mul- tiple hardware domains operating at different clock frequencies and multiple software domains operating on different processor cores. To understand this more concretely, consider the BCL pipeline in Figure 6-1. This systems represents a simplified packet-based audio decoder pipeline targeted at an embedded mobile platform. We receive packets of data from inQ. We then apply a preprocessing phase splitting the data into frequency spectra and additional decoding information. The frequency data is passed to the ifft block, which applies an Inverse Fast Fourier Transform to convert the data to the time domain and integrated it with the rest of the data using a post-processing computation.

108 IFFT

getDatagetData sendDatasendData

inQ outQ

sideChannelQ

Program P1 Rule getData: x = inQ.first() in (nx,ny) = preprocess(x) in s (inQ.deq() | ifft.input(nx); | sideChannelQ.enq(ny)) Rule sendData: x = ifft.output() in y = sideChannelQ.first(); in nxy = postprocess(x,y) in (ifft.getOutput() | outData.enq(nxy) | sideChannelQ.deq()) Module inQ ... Module outQ ... Module sideChannelQ ... Module ifft ... Figure 6-1: Original BCL Pipeline

109 m ::= [Register r v0] // Regs with initial values k Module name [m] // Submodules [ActMeth g = λx.a] //Action method [ValMeth f = λx.e] //Value method k PrimSync name t // Primitive Synchronizer with domain type t

Figure 6-2: Updated Module Grammar of BCL with Primitive Synchronizers

Given the relatively large computational cost of the IFFT, the designer may wish to implement the ifft block in hardware. However, as the remainder of the pipeline has relatively low computing requirements, it may not we worth the hardware area cost to implement the remainder of the design in hardware, saving this additional space for a faster IFFT or just reducing the area requirements. Having decided on this partitioning of the design, we have a good sense of roughly where each particular computation must take place. The ifft module is entirely in hardware, the preprocess and postprocess functions should be entirely in software. What is not clear is where exactly the hand-off from hardware to software and vice versa must take place. In principle, any possible partitioning of the part of the program that is unclear is valid. However, some partitionings are easier than others. We insist that all rules must be fully implemented in a single partition. This avoids having to coordinate a single atomic action across two or more different physical substrates. As each rule occurs fully in a single partition, all communication must be done implicitly in modules that have methods in two or more different partitions. We call such modules synchronizers. Synchronizers internally must do all the complex cross- domain interactions and as a result will be nontrivial to write efficiently. However, because they fit cleanly in our module abstraction, we can effectively reuse synchroniz- ers as a library across multiple implementations. Synchronizers can have any interface so long as it has methods in different domains. Practically, almost all synchronizers are variations of a FIFO interface – which exposes the inherent decoupling between partitions naturally. However, in some cases synchronizers with different interfaces, e.g.,a memory array, are used.

110 This partitioning approach prohibits some partitionings of a BCL program. Con- sider the case where the ifft module in the previous example is completely in hard- ware. As such, its methods are also implemented in hardware. This means that both rules given must also be implemented in hardware. As these are the only rules in the system, the entire system must be purely hardware. This is clearly not what we want. In this chapter, we discuss not only how this notion of partition is represented in BCL, but also how designers can modify the program to have this and other partitionings.

6.1 Representing Partitioning in BCL: Computa- tional Domains

We represent each partition in the implementation of a BCL program as a compu- tational domain. Abstractly a domain can be viewed as another type abstraction representing the partition that an object will be implemented in. Each rule, method, expression, and action has an associated domain. All objects implemented in hard- ware are typed with the domain “HW” and all objects implemented in software are labeled “SW”. Modules do not have a single domain type; instead their domain type is a conglomeration of the types of its methods. Notice that this representation gives no guarantees about the domains of the internal rules and submodules in a module given its domain; it is possible that all methods may be in the “SW” domain but some internal operation is hardware. Synchronizers may require extra-lingual capabilities, like interactions with a hard- ware bus, requiring new primitives to encapsulate the idea. As the original BCL grammar does not allow any primitive save registers, we must modify out grammar to allow new synchronizer modules. This is shown in Figure 6-2. To ensure safe usage, each primitive synchronizer is annotated with its domain type. To allow the partitioning of a design to be easily changed, we must be able to move BCL objects, e.g., rules and modules, from one domain to another easily. The natural way for this to happen is for the description of an object to be agnostic to

111 τ ::= pi // Primitive domain (e.g.,HW1, SW1) k () // No domain (the program domain type) k t // Domain variable k Module mm // Module domain representation containing // method name to domain map

mm ::= {} Empty Map k mm[n 7→ τ] Map with association of name n to domain type τ

Figure 6-3: Grammar of Domain Annotations. the partition it is implemented in as much as possible. For instance, a register should be implementable in any domain; we only really care that our notions of reading and writing occur in the same domain. This is not possible if the description of each module or rule has a concrete domain associated with it. This does not appear directly in the instantiated BCL, but in the definitions of the module definitions. An obvious solution is to leverage the notion of parametric polymorphism to allow module definitions to be moved from domain to domain. This polymorphism holds true for constants and other generic expressions. We can construct the value 1 or the + operator in any domain. However, we do not want polymorphism to appear on all objects. Consider the implications of a polymorphic method? Do we implement it in hardware, software, or both? Does it depend on how we call it? We insist that all such polymorphism is restricted to the module definition; after module instantiation, the domains of all internal objects must be fixed. We can enforce domain safety using the standard typing mechanisms. The gram- mar of domain annotations is listed in Figure 6-3. We present domain inference rules for BCL are presented in Figures 6-4 and 6-5. As this description is on the instantiated grammar, we do not explicitly need to deal with polymorphism; all polymorphism exists only in pre-elaborated language.

112 Σ ` e : τ, Σ ` r :(Module m ), m (write) = τ reg-update r r Σ ` r := e : τ Σ ` e : τ, Σ ` a : τ if-action Σ ` if e then a : τ Σ ` a : τ, Σ ` e : τ when Σ ` a when e : τ Σ ` a : τ, Σ ` a : τ par 1 2 Σ ` a1 | a2 : τ Σ ` a : τ, Σ ` a : τ seq 1 2 Σ ` a1 ; a2 : τ Σ ` e : τ, Σ ∪ {e : τ} ` e : τ let Σ ` t = e in e : τ Σ ` e : τ, Σ ` m : Module m, m(h) = τ meth-call Σ ` m.h(e): τ Σ ` e : τ, Σ ` a : τ loop Σ ` loop e a : τ Σ ` a : τ loopGuard Σ ` loopGuard a : τ Σ ` r : Module m, m(read): τ reg-read Σ ` r : τ const ` c : τ

variable Σ ∪ x : τ ` x : τ Σ ` e : τ, Σ ` e : τ op 1 2 Σ ` e1 op e2 : τ Σ ` e : τ, Σ ` e : τ, Σ ` e : τ if-expr 1 2 3 Σ ` (e1 ? e2 : e3): τ

Figure 6-4: Domain Inference Rules for Actions and Expression

113 module-reg ` (Register r v): Module ({}[read 7→ τ, write 7→ τ]) module-primsync ` (PrimSync n τ): τ ∀(ValMethf λx.e) ∈ vm.Σ ` (ValMethf λx.e): τf , ∀(ActMethg λx.a) ∈ am.Σ ` (ValMethf λx.e): τg, ∀m ∈ subs.Σ ` m : τmn, mm = {}(map(λ(ValMethf λx.e).[f 7→ τf ])vm) (map(λ(ActMethg λx.a).[g 7→ τg])am), module Σ ` (Module subs am vm): Module mm

Σ ` a : τ rule Σ ` (Rule R a): τ

Σ[x : τ] ` e : τ value-method Σ ` (ValMeth f λx.e): τ

Σ[x : τ] ` a : τ action-method Σ ` (ActMeth f λx.a): τ

∀m ∈ mods.Σ ` m : τm, ∀r ∈ rules.Σ ` r : τr, program Σ ` (Main mods rs) : ()

Figure 6-5: Domain Inference Rules for Programs, Modules, Rules, and Methods

114 6.2 Partitioning with Computational Domains

To partition a BCL program we must introduce synchronizers to separate the design. In the simplest case, this is merely replacing a module with a synchronizer that has the same semantic properties. For instance consider the program:

Program P2

Rule swRule: (r1 := f1(r1) | fifo.enq(f3(r)))

Rule hwRule: x = fifo.first() in (fifo.deq() | r2 := f2(r2))

Module fifo ... Register r1 (0) Register r2 (0)

Assuming a standard one-element FIFO implementation with registers, this system has only one domain. To partition it so that hwRule is implemented in hardware and swRule is implemented in software, we merely have to replace the fifo module with a primitive FIFO synchronizer giving us the new program:

Program P3

Rule swRule: (r1 := f1(r1) | fifo.enq(f3(r)))

Rule hwRule: x = fifo.first() in (fifo.deq() | r2 := f2(r2))

PrimSync fifo ({}[enq 7→ SW] [deq 7→ HW] [first 7→ HW]) Register r1 (0) // in SW Register r2 (0) // in HW

115 Assuming the primitive synchronizer acts as a FIFO and can hold the same maximum number of elements as the original fifo module this is P2 has the exact same one- rule-at-a-time execution semantics as P3; the partitioning only comes into play in the implementation, not in the program semantics. Practically a synchronizer does not precisely match an obvious register-based equivalents due to important efficiency concerns. For instance, consider an efficient software-to-hardware FIFO synchronizer implemented on a shared bus. To do the communication, the software part of the communication must grab the bus and mar- shal the data onto the bus, while the hardware part must constantly check for writes that it unmarshals and adds to an internal FIFO. To make sure software does not send data when hardware does not have space, hardware must communicate back to the software over the bus how much space it has available. This is generally done with a token-counting scheme. We do not want to have to wait for the round-trip communication to send or receive a message. Rather it should be able to leave that information at the synchronizer rendezvous to be sent later; this also allows us to improve interpartition communication drastically by sending multiple messages in bursts. The net result of this decision is that the “full” and “empty” guard signals do not propagate atomically with enq and deq methods, clearly differentiating the synchronizer from the na¨ıve hardware FIFO. This synchronizer is actually equivalent to a module with three FIFOs in sequences with two separate rules which dequeuing data from one FIFO and enqueuing it into the next one; the delays in signal propaga- tion are represented by the choice of rule execution. For the replacement of fifo by the synchronizer to be correct, it must also be the case that replacing it with three FIFOs should preserve our high-level correctness guarantee.

6.2.1 Modifying the Program for Partitioning

Sometimes it is not reasonable to merely replace some state with synchronizers. For instance, in our audio pipeline (Program P1) we could replace some of the internal state of the ifft module with synchronizers. However, ifft is a well-understood computational core; it is likely that we want to use a library module and not have to

modify it so that communication does not have a significant impact on performance. Instead, it makes sense to modify the program, adding new state elements to serve as our partitioning point. As before, this change possibly changes the structure of the program, and thus we must be sure that the new program is correct. In the case of our audio pipeline we can exploit the fact that the ifft operates in a latency insensitive manner to introduce new state; we can split the rules sending data to and taking data from the ifft module into two: one responsible for doing the data transfer to or from the ifft, and the other for doing the rest of the original rule's work. The resulting new program (P4), obtained by splitting the rules, is described in Figure 6-6. These rules interact with one of the two logical FIFOs that we have added to hold the messages between the two partitions. These FIFOs are implemented as a single module, a bidirectional synchronizer with methods corresponding to each of the methods from both FIFOs. Internally, they both marshal data on the same bus; the unified representation lets us share circuitry and logic between the channels, both of which require bidirectional communication to pass token information.

One could imagine that, for each possible set of communication channels over a single shared medium, we would want a new primitive synchronizer to properly allow for sharing. Practically, this single bidirectional channel will work for all systems with latency insensitive communication; we can model larger ones by explicitly representing what would have been the internal multiplexing logic as a BCL shim around this primitive, allowing us to ask about the correctness of the new synchronizers as a BCL refinement problem, as we discuss in more detail in Chapter 8.

6.3 Isolating Domains for Implementation

This domain-based partitioning allows the user to represent how the computation should be partitioned without affecting the modular decomposition of the program itself. This is extremely important for partition exploration. However, it requires the interactions of partitions to be handled by the compiler. This can in principle be

Program P4

Rule getDataSW: x = inQ.first() in (nx,ny) = preprocess(x) in (inQ.deq() | hwswSync.swenq(nx) | sideChannelQ.enq(ny))

Rule getDataHW: ifft.input(hwswSync.hwfirst()) | hwswSync.hwdeq()

Rule sendDataSW: x = hwswSync.swfirst() in y = sideChannelQ.first() in nxy = postprocess(x,y) in (hwswSync.swdeq() | outQ.enq(nxy) | sideChannelQ.deq())

Rule sendDataHW: hwswSync.hwenq(ifft.output()) | ifft.getOutput()

Module inQ ... Module outQ ... Module sideChannelQ ... Module ifft ...
PrimSync hwswSync ({}[hwenq ↦ HW] [hwdeq ↦ HW] [hwfirst ↦ HW] [swenq ↦ SW] [swdeq ↦ SW] [swfirst ↦ SW])

Figure 6-6: Pipeline example with the IFFT put in hardware.

extremely complicated, as the compiler may need to reason about every partition at once to correctly orchestrate the computation without violating the execution semantics. However, our domain abstraction requires that all such issues be handled completely by the synchronizers. As such, each domain can be isolated into its own subprogram, which can be compiled without worrying about the rest of the design.

Intuitively, we can extract one domain from a program. This is just a matter of removing all methods and rules from the program that are not typed with that particular domain. To deal with primitive synchronizers, which are monolithic and cannot be obviously reduced, we replace them with a primitive representing the part of the synchronizer in that domain. This programmatic description corresponds nicely to the actual synchronizer implementation, as the synchronizer must have a subcomponent in the appropriate computational substrate.

To better understand how this works, consider the bus-based synchronizer from our audio pipeline example. The synchronizer primitive consists of a software component and a hardware component, each of which has an interface corresponding to the methods of the synchronizer in its domain. In addition to the user-exposed methods, each component has an additional interface to communicate between them. For the hardware component, this is a physical bus interface on which the entire hardware partition sits. Software has an interface to the processor's bus as a memory-mapped region in the address space. The software component is able to deal with the hardware-software communication via a series of loads and stores. Hardware can notify software of the need to do work by raising an interrupt. Depending on the desire to exploit bursts and improve latency, this interface may become very complicated.

6.4 A Design Methodology in the Presence of Domains

One of the main goals of partitioning is to allow the designer to partition a design in many different ways. However, as we have discussed, partitioning a design in general imposes a correctness requirement: we cannot always just replace some module with a synchronizer without changing the rules. Thus, when designing a BCL program where the partition is not necessarily clear, the designer needs to "overpartition" the design, i.e., construct many more partitions than would be desired in the final implementation. By assigning the same domain to many communicating partitions, we can then construct a partitioning with a realistic granularity. To make this possible we need only assign the appropriate synchronizers (or normal modules if the domains are the same).

Overpartitioning imposes an additional burden on the designer beyond what would be needed had they been designing for one particular partitioning. It is, however, work that fundamentally must be done to explore various partitioning possibilities. There are multiple ways for a designer to go about this. One effective approach is to limit the interactions between parts of the program that may be implemented in different domains to a latency insensitive style, or some other style that allows us to drastically change the components.

Some of this may be simplified by automatic transformations. An effective example would take a rule, split it in two, and add a FIFO module to pass the relevant data, modifying the guards of all rules in the system that are not conflict-free so that they fail, to prevent unexpected behavior. Our hand translation of the audio pipeline did exactly this. The major concern with such approaches is the efficiency of the resulting program. Significant exploration into partitioning must be done before such automation becomes practically desirable.

Chapter 7

Software Generation

While the purpose of this thesis involves primarily the language-level aspects of the solution, significant implementation was necessary to motivate the changes and choices in the language. In this chapter we discuss the compilation process and the obvious optimizations one might wish to perform. With the exceptions of loops and sequential composition, the translation of BCL to an RTL-level hardware description via BSV is relatively straightforward. Since BSV does not support loops or sequential composition, there is currently no support for the compilation of these constructs when they occur in hardware partitions. However, since multiple theses have discussed strategies for their implementation in hardware [55, 78], we consider this a "solved" problem. This chapter presents a compilation strategy for a software implementation in C++ for the full BCL language.

7.1 Rule Canonicalization

The translation of a single rule to a straight-line C++ program is, in most regards, very similar to the functionalization of BCL given in Chapter 3. The first major difference is due to state construction. For each action, we must construct a new state value. However, as this construction is expensive, requiring the copying of the entire state, we would like to simply modify state values in place when possible, i.e., when the input state is used only once. Then we would need to copy only when

explicit duplication is necessary. This sort of optimization is done implicitly within most functional compilers; as such, this detail is not reflected in our functionalization. However, it must appear in our translation to C++. The other substantial difference in the compilation strategy presented here involves the linearization of parallel composition (a result of the fact that the generated C++ is single-threaded). Unlike the normal-order λ-calculus, C++ is strict in bindings and function arguments. If we implemented the functionalization directly, we would always execute the entire rule body, even in the event of an early guard failure. A more efficient translation would abort immediately upon encountering a failing guard, avoiding the unnecessary work of completing the action. Doing this presents a complication, since bound expressions in BCL have inlining semantics with regard to guard failure; that is, failures from bound expressions happen only when the associated variable is used. The following expression:

x = (() when False) in 0

always evaluates to 0 and never encounters the guard failure. Since C++ evaluates variables strictly, we must make sure that we do not fail prematurely (or in this case: ever). Similar concerns arise for ⊥, but can be dismissed since expressions cannot represent infinite computation.

Instead of the naïve translation's approach of keeping guards as pure data, we transform our program so that no bound expression, either from let bindings or value method definitions, contains a when clause. This change not only prevents unnecessary computation in translated rule bodies, but also reduces the code needed to deal with guard predicates. Intuitively, we "push" all internal whens to the top level by lifting whens using the axioms in Figure 2-8. Then we transform these bound top-level guards by splitting each binding into a body part and a guard part and moving the whens into the call sites. This can be done mechanically with the following procedure; a small worked example follows the steps:

1. Replace all value method invocations m.f(e) with m.fBody(e) when m.fGuard(e). Similarly, replace any let-bound variable instance x whose value is a when expression with the expression of two fresh variables, xb when xg.

2. Lift all whens in expressions to the top level using the procedure in Figure 7-1. After this, all guards exist only in top-level expressions (i.e., those which are not directly subexpressions of other expressions, but appear only in actions and method bindings).

3. Replace all value method definitions m.f = λ x. e when g with the two method definitions:

• m.fBody = λ x.e

• m.fGuard = λ x.g

4. Similarly, convert each let binding of a variable x to define the two fresh variables. This defines the variables we used but did not define in Step 1:

x = eb when eg in e ⇒ xb = eb in xg = eg in e
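As a small illustration, consider a rule fragment that uses a hypothetical value method m.f:

x = m.f(r1) in r2 := x

Steps 1 and 4 rewrite this to

xb = m.fBody(r1) in xg = m.fGuard(r1) in r2 := (xb when xg)

The remaining when is now attached directly to the action rather than hidden inside a binding, so the translation of Section 7.2 can implement it as a simple control-flow check instead of as a data value.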

7.2 Syntax-Directed Compilation

Each BCL module definition is compiled into a C++ class, and each of the module's rules and methods is compiled into a separate class method. To handle guard failures and atomicity issues arising from concurrent rule execution, we take a lazy transactional-memory approach and create shadow copies of the object state, which are committed only after an action has completed without guard failures and no data conflicts have been detected that would violate the sequential consistency of the execution. We present a syntax-directed compilation of BCL. For the sake of brevity, we take a few notational liberties in describing translation rules. Generated C++ code is represented by the combination of three different idioms: literal C++ code (given in fixed-width font), syntactic objects that evaluate to yield C++ code (given in the document font), and environment variables used by the compiler procedures

LWe⟦r⟧ = (r when true)

LWe⟦c⟧ = (c when true)

LWe⟦t⟧ = (t when true)

LWe⟦e1 op e2⟧ = (e1′ op e2′) when (e1g ∧ e2g)
    where (e1′ when e1g) = LWe⟦e1⟧
          (e2′ when e2g) = LWe⟦e2⟧

LWe⟦e1 ? e2 : e3⟧ = (e1′ ? e2′ : e3′) when (e1g ∧ (e1′ ? e2g : e3g))
    where (e1′ when e1g) = LWe⟦e1⟧
          (e2′ when e2g) = LWe⟦e2⟧
          (e3′ when e3g) = LWe⟦e3⟧

LWe⟦e1 when e2⟧ = e1′ when (e2′ ∧ e1g ∧ e2g)
    where (e1′ when e1g) = LWe⟦e1⟧
          (e2′ when e2g) = LWe⟦e2⟧

LWe⟦t = e1 in e2⟧ = ((t = e1′) in e2′) when ((t = e1′) in (tg = e1g) in e2g)
    where (e1′ when e1g) = LWe⟦e1⟧
          e3 = e2[(t when tg)/t]
          (e2′ when e2g) = LWe⟦e3⟧

LWe⟦m.f(e)⟧ = m.f(e′) when eg
    where (e′ when eg) = LWe⟦e⟧

Figure 7-1: Procedure to lift when clauses to the top of all expressions of BCL. This is the same as the expression lifting procedure of the restricted language in Figure 5-2. Method calls and bound variables are expected to already be split between body and guard.

(represented as symbols). The names of compiler procedures that generate code fragments are given in boldface.

7.2.1 Compiling Expressions

The translation of a BCL expression produces a C++ expression and one or more statements that must be executed before the expression. These statements are responsible for executing parts of the expression that do not have corresponding representations in C++ expressions, such as let bindings. The procedure to translate expressions (TE) is shown in Figure 7-2. The when transformations we performed during rule canonicalization have modified the structure in such a way that a direct translation to the call-by-value semantics of C++ is equivalent to the original call-by-name semantics of BCL. Thus, upon the evaluation of a failed guard, the execution can immediately throw away all speculative work. Since execution occurs in speculative state, we can accomplish this by simply throwing an exception (which is caught by the lexically outermost action enclosing the expression) and failing to commit the speculative state.
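For instance, translating the guarded expression (r + 1) when (r < 10) with TE yields, roughly, the statement/expression pair sketched below, assuming r has been compiled to an instance of the Reg class of Figure 7-7 (the names are illustrative):

// statement part: evaluate the guard and abort the rule on failure
if (!(r.read() < 10)) { throw GuardFail; }
// expression part: handed back to the enclosing translation
r.read() + 1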

7.2.2 Compiling Actions

A rule is composed of actions, which may be guarded. Earlier we explained the meaning of a guarded action by saying that a rule is not eligible to fire (execute) unless its guard evaluates to true. However, due to conditional and sequential composition of actions, in general it is impossible to know whether the guards of all the constituent actions of a rule are true before we execute the rule. To circumvent this limitation, we execute a rule in three phases. In the first phase, we create a shadow of all the state elements using the copy constructor. We then execute all constituent actions, updating the shadow state. Sometimes we need more shadows to support the internal actions; e.g., for parallel composition we operate each of the two composed actions in a separate shadow, which we later compose. Finally, if no guard failures are encountered, we commit the shadows, that is, atomically update the original state variables with

TE :: Env × e → (CStmt, CExpr)

TE ρ ⟦r⟧ = (;, ρ[r].read())

TE ρ ⟦c⟧ = (;, c)

TE ρ ⟦t⟧ = (;, t)

TE ρ ⟦e1 op e2⟧ = (s1; s2, ce1 op ce2)
    where (s1, ce1) = TE ρ ⟦e1⟧
          (s2, ce2) = TE ρ ⟦e2⟧

TE ρ ⟦ep ? et : ef⟧ = (sp; st; sf, cep ? cet : cef)
    where (sp, cep) = TE ρ ⟦ep⟧
          (st, cet) = TE ρ ⟦et⟧
          (sf, cef) = TE ρ ⟦ef⟧

TE ρ ⟦e when ew⟧ = (sw; mthrow; se, ce)
    where (se, ce) = TE ρ ⟦e⟧
          (sw, cw) = TE ρ ⟦ew⟧
          mthrow = if(!cw){throw GuardFail;}

TE ρ ⟦t = et in eb⟧ = (st; t = ct; sb, cb)
    where (st, ct) = TE ρ ⟦et⟧
          (sb, cb) = TE ρ ⟦eb⟧

TE ρ ⟦m.f(e)⟧ = (se, ρ[m].f(ce))
    where (se, ce) = TE ρ ⟦e⟧

Figure 7-2: Translation of expressions to a C++ expression and the C++ statements to be evaluated for the expression to be meaningful

values of the shadowed state variables. On the other hand, if the evaluation encounters a failed guard, it aborts the computation and the original state is not updated. For perspicuity, the rules present an inefficient but simple translation in which shadows of the entire environment are created whenever a shadow may be needed. Figure 7-3 gives the procedure for translating BCL actions (TA).

State Assignment (r := e): This causes a side-effect in the relevant part of the state of the object that can be extracted from ρ. If e evaluates to bottom, the control would have already been transferred automatically up the call stack via the throw in e.

Parallel Composition (a1 | a2): Both a1 and a2 observe the same initial state, though they update the state separately. Consider the parallel action r1 := r2 | r2 := r1, which swaps the values of r1 and r2. Such semantics are naturally implemented in hardware, as swaps can be done with no intermediate state (the values are read at the beginning of a clock cycle and updated at the end of it). However, in software, if we update r1 before executing the second action, the second action will read the new value of r1 instead of the old one. To avoid this problem, the compiler creates shadow states for each parallel action, which are subsequently merged after both actions have executed without guard failures. In a legal program, the updates of parallel actions must be to disjoint state elements. Violation of this condition can be detected only dynamically, in which case an error is thrown. The compiler uses several procedures to generate code to be used in implementing parallel composition. The makeShadow procedure takes as its argument an environment (ρ) and returns a tuple consisting of a new environment (say ρ1) and C++ statements (say cs1). cs1 is executed to declare and initialize the state elements referenced in ρ1. The new environments are then used in the translation of each of the actions. The procedure unifyParShadows is used to unify ρ1 and ρ2, implicitly checking for consistency. Along with ρ3, which contains the names of the unified state elements, it returns a C++ statement (pm) that actually implements the unification. Lastly, the commitShadow procedure generates code (ms) to commit the

TA :: Env × a → CStmt

TA ρ ⟦r := e⟧ = se; ρ[r].write(ce);
    where (se, ce) = TE ρ ⟦e⟧

TA ρ ⟦if e then a⟧ = se; if(ce){ TA ρ ⟦a⟧ }
    where (se, ce) = TE ρ ⟦e⟧

TA ρ ⟦a1 | a2⟧ = cs1; cs2; (TA ρ1 ⟦a1⟧); (TA ρ2 ⟦a2⟧); pm; ms;
    where (cs1, ρ1) = makeShadow ρ
          (cs2, ρ2) = makeShadow ρ
          (pm, ρ3) = unifyParShadows ρ1 ρ2
          ms = commitShadow ρ ρ3

TA ρ ⟦a1 ; a2⟧ = cs; (TA ρ1 ⟦a1⟧); (TA ρ1 ⟦a2⟧); ms;
    where (cs, ρ1) = makeShadow ρ
          ms = commitShadow ρ ρ1

TA ρ ⟦a when e⟧ = se; if(!ce){throw GuardFail;}; ca
    where (se, ce) = TE ρ ⟦e⟧
          ca = TA ρ ⟦a⟧

TA ρ ⟦t = e in a⟧ = se; t = ce; (TA ρ ⟦a⟧)
    where (se, ce) = TE ρ ⟦e⟧

TA ρ ⟦m.g(e)⟧ = se; (ρ[m].g(ce));
    where (se, ce) = TE ρ ⟦e⟧

TA ρ ⟦loop e a⟧ = cs; while(true){se; if(!ce) break; ca;} ms;
    where (cs, ρ1) = makeShadow ρ
          (se, ce) = TE ρ ⟦e⟧
          ms = commitShadow ρ ρ1
          ca = TA ρ1 ⟦a⟧

TA ρ ⟦localGuard a⟧ = try{do{ cs; ca; ms;}while(false);}catch{}
    where (cs, ρ1) = makeShadow ρ
          ms = commitShadow ρ ρ1
          ca = TA ρ1 ⟦a⟧;

Figure 7-3: Translation of Actions to C++ Statements

speculative state held in ρ3 back into the original state ρ. In order to understand unifyParShadows, consider the parallel merge of two primitive Registers found in Figure 7-7. The reader should be able to extrapolate the implementation of other primitive modules that we may wish to have, e.g., a VectorReg. The parMerge for this module would most likely allow updates to disjoint locations in parallel branches of execution, but throw an error if the same location were written. unifyParShadows generates code that recursively invokes parMerge pointwise on the two environments, whereas commitShadow ρ ρ1 simply performs pointwise updates of objects in ρ from dirty objects in ρ1 by invoking seqMerge.
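As a hand-simplified illustration (the actual translation shadows entire environments rather than individual registers), the code generated for the swap r1 := r2 | r2 := r1 behaves roughly as follows, using the Reg class of Figure 7-7:

Reg<int> s1_r1(r1), s1_r2(r2);   // shadows for the left action; unmodified reads fall through to the parent
Reg<int> s2_r1(r1), s2_r2(r2);   // shadows for the right action
s1_r1.write(s1_r2.read());       // left action: r1 := r2 (reads the old value of r2)
s2_r2.write(s2_r1.read());       // right action: r2 := r1 (reads the old value of r1)
s1_r1.ParMerge(s2_r1);           // unify the shadows; throws ParMergeFail if both branches wrote r1
s1_r2.ParMerge(s2_r2);
r1.ModuleMerge(s1_r1);           // commit the unified shadow back into the original state
r2.ModuleMerge(s1_r2);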

Sequential composition (a1; a2): This composition translates very closely to the C++ model. We take our notion of the state of the system, apply the effects of a1, and then the effects of a2. As the sequential rule in Figure 7-3 shows, after creating the shadow ρ1, we pass it to the translation of both a1 and a2. If no failures occur during that computation, the code block ms commits the resulting state. Notice that this translation requires a new shadow for every sequential composition. However, a1; a2; a3 can all be executed using only one shadow. We address this obvious inefficiency in Section 7.4.

Guarded Action (a when e): Actions are invoked only when they are to be used. As such, when we run into a failing when guard in an action, we know that the entire action must fail. We do this by evaluating e and throwing a GuardFail exception if ce evaluates to false. This immediately jumps us out of the rule body, skipping the final shadow commit. If the guard is true, we move on to executing the code for a.

Atomic Loop (loop e a): Loops translate directly into a C++ while loop. Much as with the sequential composition, a shadow must be made before the loop to deal with failures midway through the execution.

Protected Loop (loopGuard e a): The difference between the protected loop and the atomic loop is in the termination semantics. Since the protected loop doesn’t

throw away the entire rule body on a failure, we must make a shadow of the state for each iteration. These shadows are merged back in after the iteration successfully completes.

7.2.3 Compiling Rules and Methods

Figure 7-4 gives the translation of rules and methods, differentiating between action and value methods. The important thing to note is that rules catch guard failures, whereas methods do not. This is consistent with our implementation of the BCL semantics that specify rules as top-level objects that cannot be invoked within the language. Methods, on the other hand, must be called from inside a rule or another method.

7.2.4 Compiling Modules

The C++ class corresponding to a BCL module has five methods in addition to a method for each rule and method in BCL: a default constructor (used to initialize a module instance and instantiate its submodules recursively), a copy constructor (used to generate shadows, also recursive), ParMerge (the parallel merging operator we have discussed), SeqMerge (to merge a shadow into the shadow, including the original state, from which it was generated), and execSchedule (which tries to run each rule in the module in order). These methods are used indirectly during the compilation of actions through calls of the helper functions given in Figure 7-5. All of these methods are recursively definable in terms of the corresponding methods of their submodules. Figure 7-6 gives their generic description and Figure 7-7 gives the full definition of the register primitive.

7.3 The Runtime: Constructing main()

After compilation of each module definition, we can instantiate our system to its initial state and enumerate the individual rules from each module instance. Now

genRule ⟦Rule nm a⟧ =
    void nm(){
      try{ TA *this ⟦a⟧ }
      catch{ //guard failure }
    };

genAMeth ρ ⟦AMeth nm v a⟧ =
    void nm(t v){
      TA *this ⟦a⟧
    }

genVMeth ρ ⟦VMeth nm v e⟧ =
    let (se, ce) = TE *this ⟦e⟧ in
      void nm(t v){
        se;
        return ce;
      }

Figure 7-4: Translation of Rules and Methods. The initial state is the current object, which is the "real" state in the context that we are calling. Thus, if we call a method or rule on a shadow state, we will do its execution in that state.

Env :: BCLExpr → CExpr

makeShadow :: Env → ([CStmt], Env)
makeShadow ρ = (copy stmts, new mapping)
    where sh ⟦e ↦ n⟧ = (new t = n.copy(), [e ↦ t])
          (copy stmts, new mapping) = unzip (map sh ρ)

commitShadow :: Env × Env → [CStmt]
commitShadow ρ1 ρ2 = map (λ⟦e ↦ n⟧. e.SeqMerge(ρ2[n])) ρ1

unifyParShadows :: Env × Env → ([CStmt], Env)
unifyParShadows ρ1 ρ2 = (merge stmts, new mapping)
    where sh ⟦e ↦ n⟧ = (new t = n.ParMerge(ρ2[e]), [e ↦ t])
          (merge stmts, new mapping) = unzip (map sh ρ1)

Figure 7-5: Helper Functions used in Action compilation

TM :: m → CClassDef

TM ⟦ModuleDef name args insts rules ameths vmeths⟧ =
    class name {
      public:
        map (λ(Inst mn n _). mn* n;) insts

        name(){
          map (λ(Inst mn n v). n = new mn(v);) insts
        }

        name* copy(){
          name rv = new name;
          map (λ(Inst mn n _). rv.n = n.copy();) insts
          return rv;
        }

        ~name(){ map (λ(Inst _ n _). delete n;) insts }

        void ParMerge(name shadow){
          map (λ(Inst mn n _). n.ParMerge(shadow.n)) insts
        }

        void SeqMerge(name shadow){
          map (λ(Inst mn n _). n.ModuleMerge(shadow.n)) insts
        }

        map genRule rules
        map genAMeth ameths
        map genVMeth vmeths

        void execSchedule(){
          map (λ(Rule n a). n();) rules
          map (λ(Inst mn n _). n.execSchedule();) insts
        }
    }

Figure 7-6: Translation of Module Definitions to a C++ Class Definition

template<typename T> class Reg{
  public:
    bool modified;
    T value;
    Reg *parent;

    inline T& read(){
      return (modified) ? value : parent->read();
    }
    inline void write(const T& new_val){
      modified = true;
      value = new_val;
    }
    Reg(T def): // constructor
      modified(true), value(def), parent(NULL) {}
    Reg(class Reg& ref): // copy constructor
      modified(false), // copied shadows are not modified
      parent(&ref) {}
    ~Reg(){ // destructor
      value.~T();
    }
    inline void ModuleMerge(class Reg& a){
      modified |= a.modified;
      if (a.modified){ value = a.value; }
    }
    inline void ParMerge(class Reg& a){
      if (a.modified){
        if (modified){ throw ParMergeFail; }
        value = a.value;
        modified = true;
      }
    }
    inline void execSchedule(){}
};

Figure 7-7: Implementation of the C++ Register class. This primitive is templated to hold any type. A pointer to the parent is kept to avoid unnecessary duplication of large values.

TP :: mds × ModInst → CProgram

TP ⟦ms⟧ (md mn v) =
    map TM ms;
    int main(void){
      se;
      mn topState = new mn(ce);
      while(true){
        topState.execSchedule();
      }
    }

Figure 7-8: Simple Top-Level Runtime Driver

all that is left is the driving runtime loop. We present the simplest scheduler in Figure 7-8, which tries each rule in sequence repeatedly. This matches exactly a fair interpretation of the execution semantics. With the addition of this driver, we now have a complete C++ implementation.

7.4 Software Optimizations

Having implemented a naïve translation, the runtime speed of BCL translated to software is roughly on par with the speed of the BCL program translated to RTL via Bluespec and run through an efficient RTL simulator. This is orders of magnitude off from hand-written RTL. Some of this gap can be addressed with standard optimizations such as inlining and loop unrolling. In this section we discuss a set of nonstandard optimizations that drastically improve the performance of the software implementation.

7.4.1 Shadow Minimization

A large portion of the memory and computational overhead for our initial translation stems from copying and initialization associated with shadows. However, much of this can be removed by static analysis of the computation structure. We use two minimization techniques to reduce the shadow generation.

Context-Aware Shadow Generation

Once we have a shadow for a particular state in a rule, we need not construct a new shadow to prevent premature writes; we can reuse the current shadow. This can save significant overhead, especially when dealing with sequences of sequential actions. We devise a new translation scheme that makes use of two state maps, represented as a set of shadow maps: an "initial" state, which represents the place where updates are finally committed, and an "active" state, which represents the current shadow copy. These form our new "state" representation ρ in the translation of expressions, actions, rules, and methods presented in Figures 7-9, 7-10, and 7-11; the translation of module definitions remains the same as before. The translation remains fairly direct, with the notable additional concern that we must guarantee that new shadow states are always in scope for their eventual merges. The helper functions in Figure 7-12 reflect this.
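For example, under this scheme a rule body consisting of the sequence r1 := e1 ; r2 := e2 ; r3 := e3 generates a single shadow, created lazily at the first write and committed once at the end of the rule. The following sketch assumes a module class Top with integer registers r1, r2, and r3 generated as in Figure 7-6; the expression values are placeholders:

Top* shadow = top->copy();      // created by guaranteeShadowState at the first write
shadow->r1->write(/* e1 */ 1);  // r1 := e1
shadow->r2->write(/* e2 */ 2);  // r2 := e2, reusing the same shadow
shadow->r3->write(/* e3 */ 3);  // r3 := e3
top->SeqMerge(*shadow);         // single commit for the whole rule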

135 TE :: Env × e → ( CStmt, CExpr, Env ) J K TE ρ r = (merges, ρ’[r].read(), ρ’) JwhereK (merges, ρ’) = getReadState(r, ρ)

TE ρ c = (;, c, ρ) J K TE ρ t = (;, t, ρ) J K TE ρ e1 op e2 = (s1;s2;, ce1 op ce2, ρ”) J whereK (s1, ce1, ρ’ ) = TE ρ e1 (s2, ce2, ρ”) = TE ρ’ J e2 K J K TE ρ ep ? et : ef = (sp;st;sf;, cep ? cet : cef, ρ”’) J whereK (ms, ρ0) = guaranteeReadState ρ (ReadState et) ∪ (ReadState ef) where (sp, cep, ρ’ ) = TE ρ ep (st, cet, ρ”) = TE ρ’ J et K (sf, cef, ρ”’) = TE ρ” J ef K J K TE ρ e when ew = (sw; se; mthrow;, ce, ρ”) J whereK (sw, cw, ρ’) = TE ρ ew (se, ce, ρ”) = TE ρ’ J e K mthrow = if(!cw)J{throwK GuardFail;}

TE ρ t = et in eb = (st; t = ct; sb, cb, ρ”) J whereK (st, ct, ρ’) = TE ρ et (sb, cb, ρ”) = TE ρ’ J ebK J K TE ρ m.f(e) = (merges;se;, ρ’[m].f(ce), ρ’) J whereK (merges, ρ’) = getReadState(m, ρ) (se, ce, ρ”) = TE ρ’ e J K Figure 7-9: Shadow Minimized Translation of BCL’s Expressions

136 TA :: Env × a → (CStmt, Env) TA ρJ rK := e = (se; gen; s.write(ce);,ρ”) J whereK (se, ce, ρ’) = TE ρ e (gen, ρ”, s) = getActiveStateJ K ρ’ r TA ρ if e then a = (se;gens;if(ce){ca};,ρ”’) J whereK (se, ce, ρ’) = TE ρ e (gens,ρ”) = guaranteeShadowStateJ K ρ’ (WriteState a) (ca, ρ”’) = TA ρ” a J K TA ρ a1 | a2 = (ca1;ca2;merges;, ρ’) J whereK ρ1 = newStateMap ρ ρ2 = newStateMap ρ (ca1, ρ1’) = TA ρ1 a1 (ca2, ρ2’) = TA ρ2 J a2 K (merges,ρ’) = unifyParMergesJ K ρ1’ ρ2’ TA ρ a1 ; a2 = (cs; ca1; ca2,ρ”) J whereK (ca1, ρ’) = TA ρ a1 (ca2, ρ”) = TA ρ’ J a2 K J K TA ρ a when e = (se;if(!ce){throw GuardFail};ca;, ρ”) J whereK (se, ce, ρ’) = TE ρ e (ca, ρ”) = TA ρ’J aK J K TA ρ t = e in a = (se;t = ce; ca, ρ”) J whereK (se, ce, ρ’) = TE ρ e (ca, ρ”) = TA ρ’ J aK J K TA ρ m.g(e) = (se; gen; s.g(ce);,ρ”) J whereK (se, ce, ρ’) = TE ρ e (gen, ρ”, s) = getActiveStateJ K ρ’ m TA ρ loop e a = (cs; while(true){ J K se;if(!ce) break; ca;}) where (cs,ρ’) = guaranteeShadowState ρ (WriteState a) (se, ce) = TE ρ’ e (ca,ρ”)= TA ρ’ J aK J K TA ρ loopGuard e a = cs;try{while(true){ca;ms;}} catch{}; J whereK (cs, ρ’) = guaranteeShadowState ρ (WriteState a) ρ” = newStateMap ρ’ ms = commitShadow ρ’ ρ”

Figure 7-10: Shadow Minimized Translation of BCL’s Actions

137 genRule (Rule nm a) = let (se,Jρ) = TA initStateK a in void nm(){ J K try{ se; commitState initState ρ; }catch{//guard failure} }

genAMeth ρ (AMeth nm v a) = let (se,ρ) = JTA initState aK in void nm(t v){ J K se; commitState initState ρ; }

genVMeth ρ (VMeth nm v e) = let (se,ce,ρ)J = TE initState K r in void nm(t v){ J K se; return ce; }

Figure 7-11: Shadow minimized translation of Rules and Methods.

138 data Env = Env{ validActiveState :: Bool, activeState :: CExpr, initState :: CExpr} guaranteeReadState ρ = (;, ρ) getReadState m ρ = if (validActiveMap ρ) then activeState ρ else initState ρ guaranteeShadowState m ρ = if (validActiveMap ρ) then (;, activeState ρ) else (Mod v = new Mod(initState ρ);), ρ{validActiveMap = True, activeState = v} newStateMap ρ = if !(validActiveMap ρ) then ρ{validActiveMap = False, initState = activeState ρ} else ρ unifyParMerges ρ1 ρ2 = if (validActiveMap ρ1) then if (validActiveMap ρ2) then (ρ1.parMerge(ρ2), ρ1) else (;, ρ1) else (;, ρ2) getActiveState m ρ = let (ms, ρ’) = guaranteeShadowState m ρ in (ms, ρ’, ρ’[m])

Figure 7-12: HelperFunctions for Shadow minimized translation

Increasing the Granularity of Shadowing

Having removed completely unnecessary shadows with the previous optimization, much of the remaining overhead from shadowing comes from the fact that each shadow is a complete duplication of the entire state. This is far more than necessary, as most actions touch only a local part of the state. As such, it makes sense to construct shadow states not of the whole system but only of the state that may possibly be updated. This translates to extending our notion of state to a set of shadow variables, each associated with the point in the module hierarchy that it represents. We keep the invariant that the longest matching path represents the most up-to-date data. Thus, if we have a shadow s of module m and a shadow s' of module m.m', the full shadow state of module m is the merged state s{m'=s'}. We can modify our new translation to make use of this, affecting only the helper functions. For the most part, this is a mundane translation. It requires only that, when we get a shadow variable, we merge all partial shadows of that state together. The only complexity comes from dealing with parallel actions. When generating a new notion of state, our scheme merges the active state with the initial state. Later, if we need a copy of a module whose most recent state is held only partially in the speculative state, we must not merge the states together but rather make a new shadow copy. As this is simple but tedious, we omit the new helper functions.
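For example, if a rule holds a shadow s of module m and a shadow s' of the submodule m.m', a read of state inside m.m' consults s' (the longest matching path), a read of any other state in m consults s, and when the full state of m is needed, the partial shadow is folded in as s{m'=s'}.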

7.4.2 Action Sequentialization

Frequently, in BCL code originally aimed at hardware generation, there are many parallel compositions. These pose a major cost to the software runtime due to the need for shadowing. Conversely, sequential compositions can reuse the same shadow data, and thus are much more efficient for software implementations. Thus, we can reduce shadow overhead by automatically transforming parallel compositions into sequential ones when they are provably equivalent. We can conservatively approximate when this is safe by observing which modules are read (via

expression methods) and written (via action methods) as follows:

The action a1|a2 can be rewritten to a1;a2 when:

WriteState(a1) ∩ ReadState(a2)= ∅ ∧

WriteState(a1) ∩ WriteState(a2)= ∅

Appropriate inlining of method definitions and lifting of let bindings out of actions allows us to isolate most parallel actions to the scope of the updates to a single sequential action.
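For example, r1 := f1(r2) | r3 := f2(r3) can safely become r1 := f1(r2) ; r3 := f2(r3), since WriteState(a1) = {r1} is disjoint from both ReadState(a2) = {r3} and WriteState(a2) = {r3}. The swap r1 := r2 | r2 := r1 cannot be sequentialized, because WriteState(a1) = {r1} intersects ReadState(a2) = {r1}.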

7.4.3 When Lifting

As part of our translation, we first had to lift all whens out of expressions. This allowed us to efficiently implement whens via control-flow instructions instead of as data-level values. A side benefit of this change is that guards bubble up to the top of expressions. Thus, if the expression would eventually fail, we fail before the evaluation of the expression result is done. A logical extension of this is to extend the lifting process to actions as well. Then we should be able to check the validity of a rule before we execute any actions. Like expression lifting, this requires separating action methods into body and guard methods. This transformation may not be easy; e.g., moving a guard buried in the second action of a sequential composition requires that we "lift" the expression to emulate the changes due to the action that it had once observed. This only gets more complicated when we consider loops; in general there is no way to transform a guard in (or after) a loop into one before the loop execution. As such, we cannot necessarily lift all when guards to the top of a rule. Instead we leave guards in loop bodies and sequentialized actions. The procedure to do this is exactly the same as the one described in Section 7.1, save that the operations on methods apply to action methods as well. The additional procedure for lifting whens out of actions is given in Figure 7-13.

141 0 LWa r := e = (r := e ) when eg 0 J whereK (e when eg) = LWe e J K 0 0 LWa a when e = a when (ag ∧ e ∧ eg) 0 J whereK (a when ag) = LWa a 0 (e when eg) = LWe Je K J K 0 0 LWa if e then a = (if e then a ) when 0 J K (eg ∧ (ag ∨ ¬e )) 0 where (a when ag) = LWa a 0 (e when eg) = LWe Je K J K 0 0 LWa a1 | a2 = (a1 | a2) when (a1g ∧ a2g) J K 0 where (a1 when a1g) = LWa a1 0 J K (a2 when a2g) = LWa a2 J K 0 LWa a1 ; a2 = (a1 ; LWa a2 when a1g J K 0 J K where (a1 when a1g) = LWa a1 J K 0 0 0 LWa t = e in a = ((t = e ) in a ) when 0 0 J K ((t = e ) in (tg = eg) in ag) 0 where (e when eg) = LWe e) 0 a2 = a[(t when tg)/tJ] 0 (a when ag) = LWa a2 J K 0 0 LWa m.g(e) = (m.gb(e ) when eg ∧ m.gg(e ) 0 J whereK (e when eg) = LWe e J K

LWa loop e a = loop LWe e LWa a LWa localGuardJ aK = localGuardJ KLWa aJ K J K J K Figure 7-13: Procedure to apply when-lifting to actions, referencing the procedure in Figure 7-1. Method Expression calls and bound variables are expected to already be split between body and guard.

7.4.4 Scheduling and Runtime Improvements

There is much space for further optimization. Most obviously, by doing static analysis we can improve locality and intra-rule parallelism by doing partial scheduling, as discussed in Chapter 5. The simplest forms of these techniques would fuse logical rule pipelines into efficient sequential code. Another scheduling improvement that would be highly useful is to dynamically cache guard evaluations to improve selection of the next rule. Initial implementations of these techniques have been tried, but a more in-depth analysis is left as future work.

Chapter 8

Refinements and Correctness

An important aspect of the design process is the task of modifying a hardware-software design for better performance, area, or power. To be truly good for design exploration, BCL should not only allow us to make refinements easily, but also let the designer convince himself formally that such refinements are correct. Generally, it is extremely difficult for a designer to give a full formal correctness specification for a system. Specifying correctness requires a level of knowledge of the overall system and familiarity with formal verification methods that few designers possess. As a consequence, common practice is to settle for partial verification via testing. This works, but as test suites tend to be built in conjunction with the design itself, designers rarely gain sufficient confidence in their refinements' correctness until near the end of the design cycle. An alternative is to restrict the types of refinements to ones whose local correctness guarantees that the overall behavior will remain unaffected; designs then rely on the notion of equivalence supported by the design language semantics for proving or testing local equivalence. The most common use of such techniques is in hardware design, where the cost of mistakes is high enough to justify the effort. Most hardware description languages describe synthesizable systems at the level of gates and wires. This limits their language-level notion of equivalence to FSM (finite state machine) equivalence. Tools usually require the designer to specify the mapping of state elements (e.g., flip-flops), and thus reduce the problem of FSM equivalence

to combinational equivalence, which can be checked efficiently. FSM-equivalence-preserving refinements have proven to be quite useful because tools are available to prove the local correctness automatically and there is no negative impact on the overall verification strategy. However, FSM refinement is too restrictive and disallows many desirable changes. For instance, adding a buffer to cut a critical path in a pipeline is prohibited. Thus these tools are limited to verification in the later stages of design, when the timing has been decided. In contrast, BCL's nondeterministic one-rule-at-a-time model offers a much different level of abstraction. For instance, adding a pipeline stage can be implemented in a natural way by splitting the rule corresponding to the appropriate stage into multiple rules, and introducing state to hold the intermediate results. As designs at this level are meant to be correct for all possible traces of execution, we can reason about whether refinements preserve all possible behaviors of the system. In this chapter we discuss the sorts of refinements we wish to capture, and how the designer can concisely express the relationship between the specification program and the implementation program.

8.1 Understanding Design Refinements

To understand the refinements we are considering, it is helpful to consider the final implementation of the BCL program, as changes to it will be motivated by efficiency or performance concerns about the final result. For simplicity, we will discuss our motivations in terms of a single-domain hardware design, i.e., a synchronous FSM. Consider the FSM shown in Figure 8-1. The FSM consists of two registers, r1 and r2, both initially zero, and some combinational logic implementing functions f1 and f2. The critical path in this system goes from r1 to r2 via f1 and f2. In order to improve performance, a designer may want to break this path by adding a buffer (say, a one-element FIFO) on the critical path, as shown in Figure 8-2. Though we have not shown the circuitry to do so, we will assume that r2 does not change and the output z is not defined when the FIFO is empty.

[Diagram: input x feeds f1 together with register r1; the result y feeds f2 together with register r2, producing output z.]

    for i ≥ 0:
        y_i = f1(x_i, r1_i);    r1_0 = 0;    r1_{i+1} = y_i;
        z_i = f2(y_i, r2_i);    r2_0 = 0;    r2_{i+1} = z_i;

Figure 8-1: Initial FSM

In this refined FSM, the operation that was done in one cycle is now done in two; f1 is evaluated in the first cycle, and f2 in the second. The computation is fully pipelined so that each stage is always productive (except in the first cycle of the second stage, when the FIFO buffer is empty), and we have the same cycle-level computation rate. However, we have the benefit of increased system throughput, as the clock period in the refined system can be much shorter. The refined FSM no longer matches the input-output behavior of the initial FSM, meaning that they are no longer FSM equivalent. However, a little analysis shows that the sequences of values assumed by r2 and z are the same in both systems. In other words, the refined system produces the same answer as the original system, but one cycle later. Therefore, in many situations such a refinement may be considered correct; our notion of equivalence should let us consider practical BCL programs that compile into these FSMs equivalent. We can represent the original FSM design as a single-rule BCL program as shown in Figure 8-3. While it is reasonable to deal with streams of inputs in FSMs, it makes more sense in rule-based designs to think of input and output in terms of queues, which we have added. For simplicity we can assume that inQ is never empty and outQ

[Diagram: as in Figure 8-1, with a buffer holding y′ inserted between f1 and f2.]

    for i ≥ 0:
        y_i = f1(x_i, r1_i);        r1_0 = 0;    r1_{i+1} = y_i;
        y′_0 = ⊥;                   y′_{i+1} = y_i;
        z_0 = ⊥;                    z_{i+1} = f2(y′_{i+1}, r2_{i+1});
        r2_0 = 0;    r2_1 = r2_0;   r2_{i+2} = z_{i+1};

Figure 8-2: Refined FSM

is never full. If we assume that a rule executes in one clock cycle, then the rule in Figure 8-3 specifies that every cycle r1 and r2 should be updated, one value should be dequeued from inQ, and one value should be enqueued into outQ. The refined FSM in Figure 8-2 may be described by splitting our single rule into two rules: produce and consume, which communicate via the FIFO q, as shown in Figure 8-4. This refined system has choice and thus corresponds to many possible FSMs. The particular FSM we are considering can be obtained by constructing the derived rule that attempts to execute consume if possible, and then attempts to execute produce. In this sense we are checking a more general question than FSM equivalence. Thus, we must account for each of the following possible initial sequences of the system:


Program P1

Rule produce consume: x = inQ.first() in y = f1(r1,x) in z = f2(y,r2) in (inQ.deq() | r1 := y | r2 := z | outQ.enq(z))

Register r1 (0) Register r2 (0) Module inQ ... Module outQ ...

Figure 8-3: A Rule-based Specification of the Initial Design


Program P1a

Rule produce: x = inQ.first() in y = f1(r1,x) in (inQ.deq() | q.enq(y) | r1 := y)

Rule consume: y = q.first() in z = f2(y,r2) in (q.deq() | outQ.enq(z) | r2 := z)

Register r1 (0) Register r2 (0) Module q ... Module inQ ... Module outQ ...

Figure 8-4: A Refinement of the Design in Figure 8-3

Behavior 1    Behavior 2    Behavior 3
produce       produce       produce
consume       produce       produce
produce       consume       consume
consume       consume       produce
...           ...           consume
                            ...

In the first execution, the program repeatedly enters a token into the FIFO and then immediately takes it out. This emulates the execution of the rule in the unrefined system (Figure 8-3) and leaves the FIFO q empty after each consume rule execution. The execution also does the same set of updates to registers r1 and r2 as the original system. The second execution repeatedly queues up two tokens before removing them. Note that this schedule is valid only if q has space for two tokens. The third schedule corresponds to the refined FSM (Figure 8-2); save for the initial state, there is always at least one token in q.

8.1.1 Observability

In what sense are the modules in Figure 8-3 and Figure 8-4 equivalent? Notice that given any sequence of inputs x0, x1, x2, x3, ..., both programs produce the same sequence of outputs z1, z2, z3, .... However, the relative order of the production and consumption of these values is different. Assuming all FIFOs are of size 1, both systems can observe the following sequences: x0, z1, x1, z2, x2, z3, ... and x0, x1, z1, z2, x2, x3, z3, .... However, the sequence x0, x1, x2, z1, z2, z3, ... can only be observed for the refined system, as the refined system has more buffering. In spite of this, we want a notion of equivalence that permits this refinement.

Our notion of equality applies only to full BCL programs. To express equality between systems that interact with the outside world, we need to construct a "generic" context that represents all possible interactions with the outside world. This can be done naturally by adding infinite source and sink queues to drive interactions and store results. It is easy to see why: this closed system models all possible interactions of inQ and outQ with the outside world. Practically, we can always find a finite-size


register r1 = 0, r2 = 0
fifo inQ, outQ, obsQ;
rule produce consume observe when (!inQ.empty() && !outQ.full() && !obsQ.full()):
    let x = inQ.first(); inQ.deq();
    let y = f1(x,r1);
    let z = f2(y,r2);
    let a = f3(r1,r2);
    r1 := y; r2 := z;
    outQ.enq(z); obsQ.enq(a);

Figure 8-5: Program of Figure 8-3 with an Observer

approximation of these infinite state elements that will serve to guarantee the "never empty" and "never full" properties that we want for any particular result. Under a weaker notion of equality, which relies on the transitive closure of rule applications instead of trace equivalence, the previously discussed refinement is correct. At the same time, this weaker notion of equality can lead to errors if the module is used incorrectly (for instance, if the input to the module changes depending on the number of values outstanding). We rely on the user to express desired distinctions programmatically. For instance, if the user believes the relative order of inputs and outputs is necessary, he can add an additional FIFO to which we enqueue witnesses of both input and output events. We believe this is a good tradeoff between greater

flexibility of refinements and user responsibility in expressing correctness [8]. As another example of describing a context, consider refinements of a processor. To show the correctness of a refinement, it is sufficient to show that the refined processor generates the same sequence of instruction addresses of committed instructions as the original. As such, we can add a single observation FIFO to the context to observe all relevant differences, and consider all possible initial instruction and data memory configurations to verify correctness.

8.1.2 An Example of an Incorrect Refinement

While refinements are often easy to implement, it is not uncommon for a designer to make subtle mistakes that we would like to catch. Consider the original one-rule produce-consume example augmented with observation logic as shown in Figure 8-6. In addition to doing the original computation, this system computes a function of the state of r1 and r2, and at each iteration inserts the result into a new FIFO queue (obsQ). A designer may want to do the same rule-splitting refinement he had done with the first design, leading to the system in Figure 8-7.

This refinement is clearly wrong; we can observe r1 and r2 out of sync via the new observer circuit. Thus, the sequence produce observe consume has no correspondence in the original system. For our tool to be useful to a designer, it must be able to correctly determine that this refinement is incorrect (or rather, that it failed to find a matching behavior in the original system). A correct refinement is shown in Figure 8-8, where extra queues have been introduced to keep relevant values in sync. The correct solution would be obvious to an experienced hardware designer because


Program P2

Rule produce consume observe: x = inQ.first() in y = f1(r1,x) in z = f2(y,r2) in a = f3(r1,r2) in (inQ.deq() | r1 := y | r2 := z | outQ.enq(z) | obsQ.enq(a))

Register r1 (0) Register r2 (0) Module inQ ... Module outQ ... Module obsQ ...

Figure 8-6: System of Figure 8-3 with an Observer


Program P2a

Rule produce: x = inQ.first() in y = f1(r1,x) in (inQ.deq() | q.enq(y) | r1 := y)

Rule consume: y = q.first() in z = f2(y,r2) in (q.deq() | outQ.enq(z) | r2 := z)

Rule observe: a = f3(r1,r2) in obsQ.enq(a)

Register r1 (0) Register r2 (0) Module q ... Module inQ ... Module outQ ... Module obsQ ...

Figure 8-7: An incorrect refinement of the system in Figure 8-6

all paths in a pipeline have the same number of stages.

8.1.3 Refinements in the Context of Choice

The examples that we have considered so far have started with single-rule programs. Such systems by definition have no choice. Much of the value of rule-based systems comes from the ability to specify systems that can have multiple distinct executions. An example of a useful nondeterministic specification is that of a speculative processor whose correctness does not depend upon the number of instructions that are executed on the incorrect path. What does it mean to do a refinement in such a system? Consider the example in Figure 8-9, which is a variation of our producer-consumer example with an observer (Figure 8-6). Unlike the lockstep version that records the state of the registers each iteration, in this system we are allowed not only to miss some updates of r1 and r2, but also to repeatedly make the same observations. An implementation, i.e., a particular execution, of this rule-based specification would pick some deterministic sequence of observations from the allowed set. By giving such a specification, the designer is saying, in effect, that any such sequence of observations is acceptable. In that sense, the observations made in the system in Figure 8-6 are an acceptable implementation of this nondeterministic system. By the same reasoning we could argue that the refinement shown in Figure 8-8 is a correct refinement of Figure 8-9. But suppose we did not want to rule out any behaviors prematurely in our refinements; then a correct refinement will have to preserve all possible behaviors. This can be done by specifying a projection function by which state in two different systems can be related. The partial-function relationship is both natural for designers to come up with and easy to specify. Having manually defined this function, the designer could conceivably pass it to a tool that would either tell them that the refinement is correct, or give them an execution from one system that cannot be simulated by the other. We show a correct refinement of the nondeterministic system in Figure 8-10, where we introduce an extra register, r1p, to keep a relevant copy of r1 in sync with r2 with which to make legal observations. It is nontrivial to show that all behaviors in the new


Program P2b

Rule produce: x = inQ.first() in y = f1(r1,x) in (inQ.deq() | q.enq(y) | r1Q.enq(r1) | r1 := y)

Rule consume: y = q.first() in z = f2(y,r2) in (q.deq() | outQ.enq(z) | r2Q.enq(r2) | r2 := z)

Rule observe: a = f3(r1Q.first(),r2Q.first()) in (r1Q.deq() | r2Q.deq() | obsQ.enq(a))

Register r1 (0) Register r2 (0) Module q ... Module inQ ... Module outQ ... Module obsQ ... Module r1Q ... Module r2Q ...

Figure 8-8: A correct refinement of the system in Figure 8-6


Program P3

Rule produce consume: x = inQ.first() in y = f1(r1,x) in z = f2(y,r2) in (inQ.deq() | r1 := y | r2 := z | outQ.enq(z))

Rule observe: a = f3(r1,r2) in (obsQ.enq(a))

Register r1 (0) Register r2 (0) Module inQ ... Module outQ ... Module obsQ ...

Figure 8-9: A system with a nondeterministic observer


Program P3a

Rule produce: x = inQ.first() in y = f1(r1,x) in (inQ.deq() | q.enq(y) | r1 := y)

Rule consume: y = q.first() in z = f2(y,r2) in (q.deq() | outQ.enq(z) | r1p := y | r2 := z)

Rule observe: a = f3(r1p,r2) in (obsQ.enq(a))

Register r1 (0) Register r2 (0) Register r1p (0) Module q ... Module inQ ... Module outQ ... Module obsQ ...

Figure 8-10: Correct refinement of Figure 8-9

system can be modeled by the original nondeterministic specification and vice versa. As we show later, our tool can automatically perform such verification. Figure 8-11 shows an alternative refinement. Instead of adding logic to let us emulate safe observations when we are in an unsafe state, we simply restrict when

the observer rule can operate to times when the FIFO is empty. Because we are always free to consume until we reach this state, we can effectively emulate any of the original traces by emptying the pipeline before observation. Though these two refinements both accomplish the critical path splitting we wanted, they have very different properties. Since the second refinement requires us to empty the pipeline before observing, when we select a final scheduling strategy we must balance between the desire to keep the pipeline fully fed (i.e., always having a token in the FIFO) and being able to observe (i.e., having no tokens in the FIFO). This leads to problems. For instance, the Bluespec compiler would choose to keep the design fully parallel, leading to observations never taking place (with the exception of the initial state). In contrast, the first refinement can easily allow all rules to fire in parallel. While our notion of equivalence can help designers verify the correctness of their refinements, this level of analysis is still necessary to produce good refinements.

8.2 Establishing Correctness of Implementation

Correctness of a refinement is slightly more complicated than the notion of implementation discussed in Chapter 5. While both must deal with different notions of state, our definition of implementation relies on a total function between the implementation and the specification. This makes it very easy to understand the execution of a rule in the implementation in terms of the specification, as it is always meaningful. However, a total function is often relatively difficult to construct. For instance, consider the refinement from the program in Figure 8-3 to the one in Figure 8-4. To deal with the differences in the relative updating of r1 and r2, the function relating them must "flush" partially completed operations, effectively folding in the operation of consume. This is too complicated to expect an average user to define. Instead we use an "obvious" partial relation and exploit rule executions to fill in the rest. In our concrete example we can relate states when q is empty. To generate a total function, we apply rules of the implementation program until there is a relation. This is the same as saying we


Program P3b

Rule produce: x = inQ.first() in y = f1(r1,x) in (inQ.deq() | q.enq(y) | r1 := y)

Rule consume: y = q.first() in z = f2(y,r2) in (q.deq() | outQ.enq(z) | r2 := z)

Rule observe: (a = f3(r1,r2) in (obsQ.enq(a))) when (q.empty())

Register r1 (0) Register r2 (0) Register r1p (0) Module q ... Module inQ ... Module outQ ... Module obsQ ...

Figure 8-11: Second correct refinement of Figure 8-9

Notice that this technique in general allows one state of the implementation to correspond to multiple states in the specification due to nondeterminism; our system needs to check every correspondence.

Recharacterizing our notion of implementation in terms of partial functions leads to our notion of partially correct refinement.

Definition 21 (Partially Correct Refinement). Let P be a program modeled by the transition system S = (S, S0, −→S), P′ be a program modeled by the transition system T = (T, T0, −→T), and p : T ⇀ S a partial function relating states having the property that T0 ⊆ Dom(p). P′ is a partially correct refinement of P exactly when the following conditions hold:

1. Correspondence of Initial State: {p(t) | t ∈ T0} = S0.

2. Soundness: For all t1, t2 ∈ Dom(p) such that t1 −→*T t2, also p(t1) −→*S p(t2).

3. Limited Divergence: For all t0 ∈ T0 and t1 ∈ T such that t0 −→*T t1, there exists t2 ∈ Dom(p) such that t1 −→*T t2.

The first clause states that the initial states correspond to each other. The second clause states that every possible execution in the implementation whose starting and ending states have corresponding states in the specification must have a corresponding execution in the specification. The third clause states that from any reachable state in the implementation we can always get back to a state that corresponds to a state in the specification; thus we can never be in a state that has "no meaning" in relation to the specification. Notice that this definition only guarantees that the implementation can always be understood in terms of the specification. Total correctness additionally requires that all executions in the specification have a correspondence in the implementation.

Definition 22 (Totally Correct Refinement). A totally correct refinement is a partially correct refinement that, in addition, satisfies:

4. Completeness: For all s1, s2 ∈ S and t1 ∈ Dom(p) such that s1 −→*S s2 and p(t1) = s1, there exists a t2 ∈ Dom(p) such that t1 −→*T t2 with p(t2) = s2.

Of the conditions for total correctness, correspondence of initial state and completeness are easy to verify in cases where we refine a program from larger rules to more fine-grained rules. This leaves us concerned only with soundness and limited divergence.
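
To make these conditions concrete, the following sketch checks Definition 21 by brute-force enumeration on explicitly given finite transition systems. It is only a sketch under stated assumptions: the state sets, the step functions, and the partial map p are hypothetical inputs supplied by the user, and such enumeration is feasible only for tiny examples, which is precisely why the SMT-based procedure of the next section is needed.

def reachable_pairs(states, step):
    # All pairs (s, s') such that s' is reachable from s in zero or more rule steps.
    pairs = {(s, s) for s in states}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(pairs):
            for c in step(b):
                if (a, c) not in pairs:
                    pairs.add((a, c))
                    changed = True
    return pairs

def partially_correct(S, S0, step_S, T, T0, step_T, p):
    # Check the three conditions of Definition 21 by enumeration.
    # p is a dict representing the partial map T -> S; its keys are Dom(p).
    RS = reachable_pairs(S, step_S)
    RT = reachable_pairs(T, step_T)
    dom = set(p)
    # Premise and Condition 1: T0 lies in Dom(p) and the initial states correspond.
    if not set(T0) <= dom or {p[t] for t in T0} != set(S0):
        return False
    # Condition 2 (Soundness): related implementation executions map to spec executions.
    for (t1, t2) in RT:
        if t1 in dom and t2 in dom and (p[t1], p[t2]) not in RS:
            return False
    # Condition 3 (Limited Divergence): every reachable state can return to Dom(p).
    reachable = {t for (t0, t) in RT if t0 in set(T0)}
    for t1 in reachable:
        if not any((t1, t2) in RT for t2 in dom):
            return False
    return True

For the programs of Figures 8-3 and 8-4, S and T would be their reachable state sets and p would be defined exactly on those implementation states in which q is empty, as described above.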

8.3 Checking Simulation Using SMT Solvers

We can understand the execution of rule R as the application of a pure function fR of type S −→ S to the current state. When the guard of R fails, it causes no state change (i.e., fR(s) = s). We can compose these functions to generate a function fσ corresponding to a sequence of rules σ. To prove the correctness of refinements, we pose queries about fσ to an SMT solver. SMT solvers are conceptually Boolean Satisfiability (SAT) solvers extended to allow predicates over non-boolean domains (characterized by the particular theories they implement). SMT solvers do not directly reason about computation, but rather permit assertions about the input-output relations of functions. They provide concrete counter-examples when an assertion is false. For example, suppose we wish to verify that some concrete function f behaves as the identity function. We can formulate a universal quantification representing the property: ∀x, y. (x = f(y)) =⇒ (x = y). An SMT solver can be used to solve this query, provided the domains of x and y are finite and f is expressed in terms of boolean variables. If the SMT solver can find a counter-example, then the property is false. If not, then we are assured that f must be the identity. The speed of SMT solvers on large domains is due to their ability to exploit symmetries in the search space [29]. When we reason about rule execution it is often useful to discard all executions where a rule produces no state update (a degenerate execution); such an execution is clearly equivalent to the same execution with that rule removed. As such, when posing questions to the solver it is useful to add clauses stating that sequential states of an execution are different. To represent this assertion for the rule R, we define the predicate function f̂R(s2, s1) that asserts that the guard of rule R evaluates to true in s1 and that s2 is

the updated state: f̂R(s2, s1) = (s2 = fR(s1)) ∧ (s2 ≠ s1). As with the functions, we can construct a larger predicate f̂σ(s2, s1) that is true when a non-degenerate execution of σ takes us from s1 to s2. Now we explain how the propositions in Definition 21 can be checked via a small set of easily answerable SMT queries.
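
As an illustration of the kind of query involved, the sketch below checks whether a made-up 8-bit function is the identity and then builds the non-degeneracy predicate f̂R for a made-up guarded rule. The state size, guard, and rule body are all assumptions chosen for brevity, and the Z3 Python bindings are used here purely for exposition; the tool described in Section 8.4 uses STP instead.

from z3 import BitVec, If, Solver, And, Not, sat

def guard(s):
    return s < 100                      # hypothetical guard of rule R

def f_R(s):
    # Hypothetical rule body: add one when the guard holds, otherwise no state change.
    return If(guard(s), s + 1, s)

# Is f_R the identity?  Ask the solver for a counter-example to "forall y. f_R(y) = y";
# any satisfying assignment falsifies the property.
y = BitVec('y', 8)
solver = Solver()
solver.add(Not(f_R(y) == y))
print("f_R is the identity:", solver.check() != sat)   # False (e.g. y = 0 is a witness)

# The predicate fhat_R(s2, s1): the guard of R holds in s1 and s2 is the distinct
# updated state, i.e. the execution of R from s1 is non-degenerate.
def fhat_R(s2, s1):
    return And(guard(s1), s2 == f_R(s1), s2 != s1)

s1, s2 = BitVec('s1', 8), BitVec('s2', 8)
solver = Solver()
solver.add(fhat_R(s2, s1))
print("a non-degenerate execution of R exists:", solver.check() == sat)   # True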

8.3.1 Checking Correctness

For this discussion let us assume we have a specification program P and a refinement P′, and that their respective transition systems S = (S, S0, −→S) and T = (T, T0, −→T) are related by the projection function p : T ⇀ S.

Now let us consider the soundness proposition from Definition 21: ∀t1, t2 ∈ Dom(p). (t1 −→*T t2) =⇒ (p(t1) −→*S p(t2)).

A naïve approach to verifying this property entails explicitly enumerating all pairs (t1, t2) in the relation −→*T and checking the corresponding pair (p(t1), p(t2)) in the relation −→*S. As the sets of states in both systems are finite, both of these relations are similarly finite (bounded by |T|² and |S|², respectively), and thus we can mechanically check the implication. We can substantially reduce this work by noticing two facts. First, because of transitivity, if we have already checked the correctness of t1 −σ1→T t2 and t2 −σ2→T t3 (i.e., executing σ1 in T takes t1 to t2, and σ2 takes t2 to t3), then there is no need to check the correctness of the execution σ = σ1σ2. Second, if we have already found an execution σ such that t −σ→T t′, then we can ignore all other executions σ′ ≠ σ that have the same starting and ending states, as they must also be correct. This essentially reduces the task from checking the entire transitive closure to checking only a covering of it. Unfortunately, the size of this covering is still very large. The insight on which our algorithm is built is that proving this property for a small set of finite rule sequences is tantamount to proving the property for any execution. We explain this idea using the program in Figure 8-4; a small executable sketch of the walkthrough follows the list below.

• Let's begin by considering all rule sequences of length one: produce and consume.

• The sequence consume is never valid for execution starting in a relatable state so we need not consider it further.

• The sequence produce is valid to execute but does not take us to a relatable state, so we construct more sequences by extending it with each rule in the implementation. These new sequences are produce produce and produce consume.

• The sequence produce consume always takes a relatable state to another relatable state. We check that all concrete executions of produce consume have a corresponding execution in the specification. We do this check over a finite set of sequences in S (in this case: produce consume), the selection of which we explain later. Since all executions of produce consume end in a relatable state, we need not extend it.

• produce produce never takes us from a relatable state to a relatable state, so we again extend the sequence to get the new sequences produce produce produce and produce produce consume.

• produce produce produce is degenerate if q is of length 2 (q has to have some known finite length).

• Suppose we could prove that the sequence produce produce consume always behaves like produce consume produce. Then any execution prefixed by produce produce consume is equal to an execution prefixed by produce consume produce. Notice that we need not consider any sequences prefixed by produce consume produce because it itself has the prefix produce consume. Therefore we need not consider further sequences prefixed by produce produce consume.

• Because we have no new extension to consider, we have proved the correctness of this refinement.
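
The walkthrough above can be replayed on a small executable model of the two-rule pipeline. All concrete choices in the sketch below are assumptions made only for illustration: the state is a tuple (r1, r2, q, inQ, outQ), f1 and f2 are arbitrary functions, the FIFO q holds at most two elements, and a state is relatable exactly when q is empty.

f1 = lambda r, x: (r + x) & 0xFF          # arbitrary stand-ins for the datapath functions
f2 = lambda y, r: (y ^ r) & 0xFF

def produce(st):
    r1, r2, q, inq, outq = st
    if not inq or len(q) >= 2:            # guard: input available and q not full
        return None
    y = f1(r1, inq[0])
    return (y, r2, q + (y,), inq[1:], outq)

def consume(st):
    r1, r2, q, inq, outq = st
    if not q:                             # guard: q not empty
        return None
    z = f2(q[0], r2)
    return (r1, z, q[1:], inq, outq + (z,))

RULES = {'produce': produce, 'consume': consume}

def run(st, seq):
    # Execute a rule sequence; None means some guard failed (no such execution).
    for name in seq:
        st = RULES[name](st)
        if st is None:
            return None
    return st

relatable = lambda st: st is not None and st[2] == ()   # relate states whose q is empty

init = (0, 0, (), (3, 5, 7), ())
print(run(init, ['consume']))                            # None: never valid initially
print(relatable(run(init, ['produce'])))                 # False: not yet relatable
print(relatable(run(init, ['produce', 'consume'])))      # True: back to a relatable state
print(run(init, ['produce', 'produce', 'produce']))      # None: degenerate once q is full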

Each of these steps involved an invocation of the SMT solver on queries that are much simpler than the general query presented previously, though the solver still must conceptually traverse the entire state space. The queries themselves are simple because they are always presented using rule sequences of concrete length, which are much shorter than arbitrary executions in T. The only problem with this procedure is that in the worst case this algorithm will run for the maximum number of states in S. If we give up before reaching the terminating condition, this only means we have failed to establish the correctness of the refinement. We think it is unlikely that the type of refinements we consider here will enter this case. In fact, most refinements can be shown to be correct with a very small number of considered sequences.

8.3.2 The Algorithm

The algorithm constructs three sets, each of whose elements corresponds to a set of finite executions of T. For each iteration, Rσ represents the set of finite sequences for which we have explicitly found a corresponding member, and U represents the set of finite executions we have yet to verify (each element of U conceptually represents all finite sequences starting with some concrete sequence of rule executions σ). NU is the new value of U being constructed for the next iteration. A skeletal sketch of this loop follows the numbered steps below.

The Verification Algorithm:

1. Initially: Rσ := ∅, U := {Ri|Ri ∈ RT }, NU := ∅

2. if U = ∅, we have verified all finite executions. Exit with Success.

3. Check if we have reached our iteration limit. If so, give up, citing the current U set as the cause of the uncertainty.

4. For each σ ∈ U:

(a) Check if the execution of σ from a relatable state is ever non-degenerate: ∃t1 ∈ Dom(p), t2 ∈ T. (t1 −σ→T t2). If no such execution exists, we can stop considering σ immediately.

(b) Check if σ should be added to Rσ. That is, check whether some execution of σ should have a correspondence in S: ∃t, t′ ∈ Dom(p). (t −σ→T t′). If so, Rσ := Rσ ∪ {σ}.

(c) Check that all finite executions of σ that should have a correspondence in S have such a correspondence: ∀t, t′ ∈ Dom(p). (t −σ→T t′) =⇒ ∃σ′. (p(t) −σ′→S p(t′)). If this fails due to some concrete execution of σ, exit with Failure, providing the counter-example as justification.

(d) For every execution where σ does not put us in a relatable state, we must show that extensions of the form σσ′ have an equivalent execution σ1σ2σ′ taking us to the same state, where σ1 is a member of Rσ and |σ1σ2| ≤ |σ|. Thus, the correctness of σσ′ is reduced to the correctness of the shorter sequence σ2σ′.

∀t ∈ Dom(p), t′ ∉ Dom(p). (t −σ→T t′) =⇒ ∃σ1 ∈ Rσ, σ2. (|σ1σ2| ≤ |σ|) ∧ (σ1(t) ∈ Dom(p)) ∧ (σ2(σ1(t)) = t′).

If this succeeds, we need not consider executions for which σ is a prefix. If not, extend the prefix σ by each of the rules in RT: NU := NU ∪ {σ.Ri | Ri ∈ RT}.

5. U := NU, NU := ∅, Go to Step 2. 
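
A skeletal rendering of this loop is shown below. It is a sketch under stated assumptions: the four checks of Step 4 appear only as caller-supplied predicates (no_valid_execution, should_correspond, all_correspond, reducible) standing in for the SMT queries of the next subsection, and the iteration limit is approximated by a bound on prefix length.

def verify(rules, max_len,
           no_valid_execution, should_correspond, all_correspond, reducible):
    # rules: the rule names of the implementation (R_T).  Each predicate takes a rule
    # sequence sigma (a tuple of names); reducible additionally takes the current R_sigma.
    R_sigma = set()                         # sequences with an explicitly found correspondence
    U = {(r,) for r in rules}               # Step 1: unverified prefixes of length one
    while U:                                # Step 2: empty U means all finite executions verified
        if max(len(s) for s in U) > max_len:
            return ('unknown', U)           # Step 3: give up, reporting the unresolved prefixes
        NU = set()
        for sigma in U:                     # Step 4
            if no_valid_execution(sigma):   # (a) no non-degenerate execution from a relatable state
                continue
            if should_correspond(sigma):    # (b) some execution of sigma should correspond
                R_sigma.add(sigma)
            if not all_correspond(sigma):   # (c) a required correspondence is missing
                return ('failure', sigma)
            if not reducible(sigma, R_sigma):   # (d) cannot reduce extensions of sigma
                NU |= {sigma + (r,) for r in rules}
        U = NU                              # Step 5
    return ('success', None)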

8.3.3 Formulating the SMT Queries

The four conditions in the inner-most loop of the algorithm can be formulated as the following SMT queries using the f̂σ predicate, a computational version p̂ : T −→ S of the projection function p, and a predicate rel : T −→ {0, 1}, where p̂ agrees with p wherever p is defined and rel(t) returns true exactly when p(t) is defined. A sketch of how such a query is posed to a solver follows the list.

1. Existence of a valid execution of σ starting from a relatable state:

∃t1, t2 ∈ T. f̂σ(t2, t1) ∧ rel(t1)

2. Verifying that each execution of σ in the implementation starting and ending in a relatable state has a corresponding execution in the specification:

∀t1, t2 ∈ T. (rel(t1) ∧ rel(t2) ∧ f̂σ(t2, t1)) =⇒ ⋁σ′∈EC(σ) f̂σ′(p̂(t2), p̂(t1))

where EC is the "expected correspondences" function that takes a sequence of rules σ in T and returns a finite set of sequences in S to which σ is likely to correspond. This function can easily be generated by the tool or the user; since the refinements are rule splittings, it is easy to predict the candidates in the specification that could possibly mimic σ. For instance, consider the refinement of the program in Figure 8-3 to the one in Figure 8-4. Each occurrence of produce in the implementation should correspond to an occurrence of produce consume in the specification. Thus, the sequence produce produce consume produce, if it has a correspondence at all, could only correspond to the sequence produce consume produce consume produce consume.

3. Checking that every valid execution of σ in the implementation has an equivalent sequence that is correct by concatenation of smaller sequences:

∀t1, t2 ∈ T. (rel(t1) ∧ f̂σ(t2, t1)) =⇒ ∃tm ∈ T. rel(tm) ∧ ⋁σ1∈Rσ ⋁σ2∈EA(σ,σ1) (f̂σ1(tm, t1) ∧ f̂σ2(t2, tm))

Our algorithm requires us to find, given σ and σ1 in T, a σ2 such that the execution of σ is the same as the execution of σ1σ2, and |σ1σ2| ≤ |σ|. We assume the existence of an "expected alternatives" function EA that enumerates all possible σ2 given σ and σ1.
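
To show the shape of such a query concretely, the sketch below checks query 2 for a deliberately tiny refinement invented for this purpose: a specification rule inc2 that adds 2 to an 8-bit counter is split into implementation rules incA (add 1 and set a pending flag) and incB (add 1 and clear it). The projection keeps only the counter, rel holds exactly when the flag is clear, and EC(incA incB) = {inc2}. The negation of the query is asserted, and the solver is expected to report it unsatisfiable. As before, the Z3 bindings are used only for exposition; none of this is the SMIPS example or the tool's actual encoding.

from z3 import BitVec, Bool, And, Not, Implies, Solver, unsat

def rel(c, pending):
    return Not(pending)                    # relatable exactly when nothing is pending

def proj(c, pending):
    return c                               # the projection keeps only the counter

def fhat_sigma(c2, p2, c1, p1):
    # fhat for sigma = incA incB, written functionally: incA requires Not(p1) and sets
    # the flag, so incB's guard (pending) holds by construction; incB clears the flag.
    return And(Not(p1), c2 == (c1 + 1) + 1, Not(p2),
               Not(And(c2 == c1, p2 == p1)))       # non-degeneracy

def fhat_inc2(s2, s1):
    return And(s2 == s1 + 2, s2 != s1)

c1, c2 = BitVec('c1', 8), BitVec('c2', 8)
p1, p2 = Bool('p1'), Bool('p2')

# Query 2: every relatable-to-relatable execution of sigma is mimicked by inc2.
query = Implies(And(rel(c1, p1), rel(c2, p2), fhat_sigma(c2, p2, c1, p1)),
                fhat_inc2(proj(c2, p2), proj(c1, p1)))

solver = Solver()
solver.add(Not(query))                     # a model here would be a counter-example
print("correspondence holds:", solver.check() == unsat)   # expected: True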

8.3.4 Step-By-Step Demonstration

For the sake of clarity, we provide an additional example of the algorithm’s execution. Figure 8-12 gives the trace of reasoning through which our algorithm progresses in order to verify the refinement of the program in Figure 8-6 to the one in Figure 8-7. Each node represents an element in the algorithm’s set U, and the path from the root to any node in the graph corresponds to the concrete value σ for that node. At each node, we verify the correctness of all corresponding finite executions of σ: nodes displayed as ⊥ are vacuously true by Step 4a, while other leaf nodes are either true by Step 4d or incorrect by Step 4c. The program is ultimately rejected as the refinement being checked is incorrect:

• We begin by considering all rule sequences of length one executed in a relatable state: produce, consume, and observe. The rule observe always ends in a relatable state, and corresponds directly to the observe rule in the specification program. consume is never valid to execute, so the only sequence that we extend is produce since it never ends in a relatable state.

• We now extend produce, giving us three new sequences to consider: produce produce, produce consume, and produce observe. produce consume always ends in a relatable state and corresponds to the execution of produce consume in the specification. Neither produce produce nor produce observe ever ends in a relatable state, and since we are unable to prove their equivalence to an execution we have already verified, we extend both.

• In the third iteration, we consider the sequence produce observe consume, which always ends in a relatable state. This exposes an error in the refinement, since there is no possible sequence of rules in the specification that produces this final state (in this case, the implementation enqueues a value to obsQ that the specification is unable to replicate).


Figure 8-12: Tree visualization of the algorithmic steps to check the refinement of the program in Figure 8-6 to the one in Figure 8-7

8.4 The Debugging Tool and Evaluation

This algorithm was embodied in an initial tool based on the BCL frontend. The algorithm in Section 8.3 works more efficiently when rule sizes are small; therefore the first phase of the tool reduces the size of actions by action sequentialization, conditional merging, and "when lifting" [32]. Next, the tool generates the function fR for each rule R. We use a typed λ-calculus with let blocks to represent these functions and apply many small transformations to simplify them. As we have discussed, this algorithm makes many queries to an SMT solver; we use the STP SMT solver [47] for this purpose. By static analysis of the programs (e.g., rule commutativity and sequence degeneracy), we remove unneeded sequences from consideration in the sets EA and EC. This has a substantial impact on the size of the SMT queries. To demonstrate our tool, we consider a refinement of a Simplified MIPS (SMIPS) processor, whose ISA contains a representative subset of 35 instructions from the MIPS ISA. While the ISA semantics are specified one instruction at a time, our program is pipelined with five stages in the style of the DLX processor [71], and resembles soft cores used in many FPGA designs. The execution of the final implementation is split into the following five separate stages (see Figure 8-13(b)):

1. Fetch requests the next instruction from the instruction memory (imem) based on the pc register which it then updates speculatively to the next consecutive pc.

2. Decode takes the data from the instruction memory and the fetch stage, decodes the instruction, and passes it along to the execute stage. It also reads the appropriate locations in the register file rf, stalling to avoid data hazards (stall logic is not shown).

3. Execute gets decoded instructions from the execute queue, performs ALU operations, and translates addresses for memory operations. To handle branch operations, it kills mispredicted instructions and sets the pc.

4. Memory performs reads and writes to the data memory, passing the data to the writeback stage. (A further refinement might introduce a more realistic split-phase memory, which would move some of this functionality into the writeback stage.)

5. Writeback gets instructions in the form of register destination and value pairs, performing the update on the register file.

The implementation program contains one rule per stage, and stages communicate via FIFO connections. If we were to execute the rules for each stage in reverse order (starting from writeback and finishing with fetch), the result is a fully pipelined system. If each FIFO is implemented as a single register with a valid bit, this is indistinguishable from the standard processor, complete with pipeline stalls. If instead we execute the rules in pipeline order, we end up with a system where the instructions fly through the processor one at a time. For code simplicity, our final implementation actually decomposes the execute stage into three mutually exclusive cases, implementing each with a separate rule (exec, exec branch, and exec branch mispredict). Since the rule guards are mutually exclusive, this does not modify the pipeline structure, nor does it change the analysis. Our implementation is relatively complicated and we would like to know if it matches the ISA. One way to achieve this is to start with a single-rule description of the behavior (transliterated directly from the documentation, which we consider to be correct), and incrementally refine the program towards the final five-stage implementation. After each refinement, our tool can be used to verify correctness with regard to the previous iteration.

(a) 4-stage SMIPS processor   (b) 5-stage SMIPS processor

Figure 8-13: SMIPS processor refinement
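
The claim about rule ordering can be seen on a toy model far simpler than the SMIPS design: two stages and a writeback connected by one-place FIFOs, with every name and datapath below invented solely for this illustration. Firing the stage rules in reverse pipeline order each cycle keeps the FIFOs occupied (the fully pipelined behavior), while firing them in pipeline order lets each instruction fly through alone.

def mk_state(inputs):
    return {'in': list(inputs), 'q1': None, 'q2': None, 'out': []}

def stage1(s):                     # fetch-like stage: move an item from the input into q1
    if s['in'] and s['q1'] is None:
        s['q1'] = s['in'].pop(0)

def stage2(s):                     # execute-like stage: move an item from q1 into q2
    if s['q1'] is not None and s['q2'] is None:
        s['q2'], s['q1'] = s['q1'], None

def writeback(s):                  # drain q2 into the output
    if s['q2'] is not None:
        s['out'].append(s['q2'])
        s['q2'] = None

def run(order, cycles, inputs):
    s = mk_state(inputs)
    occupancy = []
    for _ in range(cycles):
        for rule in order:
            rule(s)
        occupancy.append((s['q1'], s['q2']))
    return occupancy

data = [1, 2, 3, 4]
print(run([writeback, stage2, stage1], 4, data))   # reverse order: FIFOs stay full (pipelined)
print(run([stage1, stage2, writeback], 4, data))   # pipeline order: FIFOs empty every cycle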

For the sake of brevity, we examine only the final refinement, which takes a four-stage processor (Figure 8-13(a)) and splits the fetch-decode stage. Though the transformation is straightforward, the tool must be able to correctly resolve the effect of speculative execution from branch prediction. The tool is able to establish the correctness of this refinement step in under 7 minutes. To do so it needed to check 21 executions in the refined program of maximum length 3. In general, for pipeline partitionings, the length is the maximum number of steps between states unrelated by our state relationship plus 1; in most pipelines this is the pipeline depth plus 1. In our example, we find correspondences in the four-stage program for the refined program's sequences: fetch decode for fetch decode, and exec branch mispredict for the misspeculating sequences fetch exec branch mispredict and fetch fetch exec branch mispredict. The performance of this tool can be improved in three orthogonal dimensions. First, we currently only leverage the theory of bit vectors. By adding additional

theories of FIFOs, arrays, and uninterpreted functions [56], we can dramatically reduce the complexity of our SMT queries. Secondly, our interface with the SMT solver is inefficient, requiring file-level IO. More than half of the compute time comes from marshaling and unmarshaling the query representation; this can clearly be eliminated by directly integrating an SMT solver into the tool. Finally, our algorithm allows us to reason about each element of U in parallel. Exploiting all three has the potential to make this tool fast enough to be viable for rapid refinement.

Chapter 9

Conclusion

This thesis introduces BCL, a unified language for hardware-software codesign, and addresses many important aspects of the design process needed for effective implementation. To allow designers to explore which hardware-software partitioning is needed, BCL lets them maintain an agnostic view of whether a computation should be placed in hardware or software and annotate the decision directly via domains. The domain abstraction allows designers to isolate messy synchronization logic into easily reused synchronizer modules, while still allowing important channel-multiplexing logic to be visible at the program level and thus available to be optimized and verified by the common user, not the synchronizer writer. It also factors the compilation task to a per-domain level, which greatly simplifies compilation. BCL is higher level than the standard low-level descriptions of hardware (RTL) and software (C code), factoring out many of the decisions in execution and opting for a highly nondeterministic execution model. This allows us to handle many refinement problems as local rule refinements, which both makes correctness understandable to the average user and makes mechanical verification cheap enough to move from a post-facto step to a part of the design loop itself. While nondeterminism makes BCL easier to understand and verify, it can be difficult to implement efficiently; as such, it is restricted in implementations. To understand how this affects the correctness of the program we introduce a compositional

view of the scheduling task that encompasses previous algorithms and also permits multi-substrate algorithms. Given a scheduled BCL program, we describe how to compile BCL rules into synchronous hardware efficiently. This view of implementation is conceptually simpler than the previous views given in the compilation of BSV, the predecessor hardware description language, without compromising efficiency or expressivity. This makes BCL hardware designs as efficient as RTL equivalents. Similarly, we describe how BCL rules can be compiled into high-quality C++. The net result is that BCL users can effectively tune their programs for hardware and software without losing precision in their meaning or having to resort to extra-lingual approaches or indirect nudging of their BCL design to get the correct result. In this sense, BCL is a perfect base on which to build further hardware-software design frameworks; it expresses the desired computation and partitioning concisely and precisely, making it easy to reason about correctness while keeping code and hardware generation efficient. While this work deals with the fundamental aspects of the codesign problem, there are no evaluations in this thesis. This is not to say that no implementation work has been done; in fact, a major part of the effort has been devoted to implementing a full compiler, and complete BCL programs can now be compiled, partitioned, implemented in hardware and software, and run on multiple FPGA platforms. However, substantial work is needed to make even a slightly meaningful evaluation of the codesign problem, because making a design truly efficient takes a great deal of tuning and specialization. To do this properly in BCL we need efficient hardware-software synchronizers and other basic primitives, a good example application, quality schedules for various partitionings, and nontrivial analysis of the system as a whole. Much, but not all, of this work has been done; when complete it will appear in Myron King's PhD thesis. Beyond this work, there is significant room for improvement in the presentation of BCL to the user. This is necessary to make BCL a viable representation for industrial design.

The most obvious improvement to be made to BCL is the introduction of a type system, both for values and actions and for domains. Type systems are well understood, and the type view of domains from Chapter 6 is so simple that little theoretical work would need to be done to properly give BCL a type system. A natural choice may be to simply copy BSV's type system, which extends Haskell's type system with whole-number "numeric types" to allow the expression of size, an important aspect of hardware design. Another practical improvement for the user would be to effectively automate the scheduling task. Forcing the designer to select a schedule without assistance for each of the many designs that they are exploring is impractical. In the context of single-cycle synchronous hardware, automatic algorithms have proven to be sufficient for automating the task. Though the task is still fundamentally the designer's responsibility, in almost all cases little to no user-given information need be conveyed to the algorithm to get it to give the desired implementation. Extensions to allow this for software as well as multi-cycle hardware would complete the picture and allow rapid implementation. Perhaps the most important work needed to make BCL practical is a rich library of synchronizers and design components to enable the construction of designs for various platforms. Though this is already happening in the context of evaluation, even more work is necessary to fit into all the desired contexts and to be sufficiently specializable to achieve all desired behaviors. This task is not just a matter of wrapping current hardware-software communication into a module; a proper synchronizer design will need to provide enough control to mimic virtual channels, fairness and prioritization, and bursting. The related example of hardware clock-domain synchronizers is an indication that finding the best representation for this will take significant work. A further extension that may be of interest is to extend the notion of partitions beyond synchronous hardware and software executed on a general-purpose processor. In principle a domain can be implemented in any computational substrate, e.g., asynchronous hardware or GPUs. There may be significant value if BCL's model can be efficiently implemented in such contexts.

These extensions, both the planned work for efficiency and the speculative work on the user-level representation, are needed to fully enable our proposed extreme design exploration approach to embedded implementations. If these extensions can be realized in a single language and compilation infrastructure, then the old unwieldy multi-language methodologies of hardware-software codesign may cease to be used.

Bibliography

[1] Carbon Design Systems Inc. http://carbondesignsystems.com.

[2] Matlab/Simulink. http://www.mathworks.com.

[3] Accelera Organization, Inc., Napa, CA. SystemVerilog 3.1a Language Reference Manual, May 2004.

[4] Shail Aditya, B. Ramakrishna Rau, and Vinod Kathail. Automatic architectural synthesis of VLIW and EPIC processors.

[5] A. Agarwal, Man Cheuk Ng, and Arvind. A comparative evaluation of high-level hardware synthesis using Reed-Solomon decoder. Embedded Systems Letters, IEEE, 2(3):72–76, 2010.

[6] Charles André and Marie-Agnès Peraldi-Frati. Behavioral specification of a circuit using SyncCharts: A case study. In EUROMICRO, 2000.

[7] Péter Arató, Zoltán Ádám Mann, and András Orbán. Algorithmic aspects of hardware/software partitioning. ACM Trans. Des. Autom. Electron. Syst., 10:136–156, January 2005.

[8] Arvind, Nirav Dave, and Michael Katelman. Getting formal verification into design flow. In FM 2008: Formal Methods, volume 5014 of Lecture Notes in Computer Science, pages 12–32, 2008.

[9] Arvind, Rishiyur S. Nikhil, Daniel L. Rosenband, and Nirav Dave. High-level Synthesis: An Essential Ingredient for Designing Complex ASICs. In Proceedings of ICCAD’04, San Jose, CA, 2004.

[10] Arvind and R.S. Nikhil. Executing a program on the MIT Tagged-Token Dataflow Architecture. IEEE Transactions on Computers, 39(3):300–318, March 1990.

[11] Arvind and Xiaowei Shen. Using Term Rewriting Systems to Design and Verify Processors. IEEE Micro, 19(3):36–46, May 1999.

[12] Peter J. Ashenden. The Designer’s Guide to VHDL. Morgan Kaufmann Pub- lishers Inc., San Francisco, CA, USA, 2nd edition, 2001.

[13] AutoESL Desisgn Technologies, Inc. http://www.autoesl.com.

[14] Felice Balarin, Yosinori Watanabe, Harry Hsieh, Luciano Lavagno, Claudio Passerone, and Alberto Sangiovanni-Vincentelli. Metropolis: An integrated electronic system design environment. Computer, 36:45–52, 2003.

[15] Saddek Bensalem, Vijay Ganesh, Yassine Lakhnech, César Muñoz, Sam Owre, Harald Rueß, John Rushby, Vlad Rusu, Hassen Saïdi, N. Shankar, Eli Singerman, and Ashish Tiwari. An overview of SAL. In C. Michael Holloway, editor, LFM 2000: Fifth NASA Langley Formal Methods Workshop, pages 187–196, Hampton, VA, June 2000. NASA Langley Research Center.

[16] Gérard Berry. Esterel on hardware. In Mechanized Reasoning and Hardware Design, pages 87–104. Prentice Hall, Hertfordshire, UK, 1992.

[17] Gérard Berry and Laurent Cosserat. The Esterel synchronous programming language and its mathematical semantics. In Seminar on Concurrency, pages 389–448, 1984.

[18] Per Bjesse, Koen Claessen, Mary Sheeran, and Satnam Singh. Lava: Hardware design in Haskell. In ICFP, pages 174–184, 1998.

[19] Bluespec Inc. http://www.bluespec.com.

[20] Bluespec, Inc., Waltham, MA. Bluespec SystemVerilog Version 3.8 Reference Guide, November 2004.

[21] Stephen D. Brown, Robert J. Francis, Jonathan Rose, and Zvonko G. Vranesic. Field-programmable gate arrays. Kluwer Academic Publishers, Norwell, MA, USA, 1992.

[22] Joseph T. Buck, Soonhoi Ha, Edward A. Lee, and David G. Messerschmitt. Ptolemy: A framework for simulating and prototyping heterogeneous systems. Int. Journal in Computer Simulation, 4(2), 1994.

[23] Andrew Butterfield. A semantics for Handel-C. Formal Aspects of Computing, 22, 2010.

[24] Cadence. NC-Verilog. http://www.cadence.com.

[25] Paul Caspi, Daniel Pilaud, Nicolas Halbwachs, and John Plaice. Lustre: A declarative language for programming synchronous systems. In POPL, pages 178–188, 1987.

[26] K. Mani Chandy and Jayadev Misra. Parallel Program Design: A Foundation. Addison-Wesley, Reading, Massachusetts, 1988.

[27] Karam S. Chatha and Ranga Vemuri. Magellan: multiway hardware-software partitioning and scheduling for latency minimization of hierarchical control-dataflow task graphs. In CODES, pages 42–47, 2001.

[28] H. K. Chaudhry, P. Eichenberger, and D. R. Chowdhury. Mixed 2-4 state simulation with VCS. In Proceedings of the 1997 IEEE International Verilog HDL Conference (IVC '97), pages 77–, Washington, DC, USA, 1997. IEEE Computer Society.

[29] Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing, STOC '71, pages 151–158, New York, NY, 1971.

[30] Ed Czeck, Ravi Nanavati, and Joe Stoy. Reliable Design with Multiple Clock Domains. In Proceedings of Formal Methods and Models for Codesign (MEMOCODE), 2006.

[31] Nirav Dave. Designing a Reorder Buffer in Bluespec. In Proceedings of MEMOCODE'04, San Diego, CA, 2004.

[32] Nirav Dave, Arvind, and Michael Pellauer. Scheduling as Rule Composition. In Proceedings of Formal Methods and Models for Codesign (MEMOCODE), Nice, France, 2007.

[33] Nirav Dave, Kermin Fleming, Myron King, Michael Pellauer, and Muralidaran Vijayaraghavan. Hardware acceleration of matrix multiplication on a FPGA. In MEMOCODE, pages 97–100. IEEE, 2007.

[34] Nirav Dave, Man Cheuk Ng, and Arvind. Automatic Synthesis of Cache-Coherence Protocol Processors Using Bluespec. In Proceedings of Formal Methods and Models for Codesign (MEMOCODE), Verona, Italy, 2005.

[35] Jack B. Dennis, John B. Fosseen, and John P. Linderman. Data flow schemas. In International Symposium on Theoretical Programming, 1972.

[36] Edsger W. Dijkstra. Guarded Commands, Nondeterminacy and Formal Derivation of Programs. Commun. ACM, 18(8):453–457, 1975.

[37] David L. Dill. The Murphi verification system. In Computer Aided Verification: 8th International Conference, pages 390–393. Springer-Verlag, 1996.

[38] Bruno Dutertre and Leonardo de Moura. The Yices SMT solver, 2006.

[39] Stephen Edwards. High-Level Synthesis from the Synchronous Language Esterel. In Proceedings of IWLS’02, New Orleans, LA, 2002.

[40] Stephen A. Edwards and Olivier Tardieu. SHIM: a deterministic model for heterogeneous embedded systems. IEEE Trans. VLSI Syst., 14(8):854–867, 2006.

[41] K. Ekanadham, J. H. Tseng, P. Pattnaik, A. Khan, M. Vijayaraghavan, and Arvind. A PowerPC Design for Architectural Research Prototyping. The 4th Workshop on Architectural Research Prototyping, 2009.

[42] Rolf Ernst, Jörg Henkel, and Thomas Benner. Hardware-software cosynthesis for microcontrollers. IEEE Des. Test, 10(4):64–75, 1993.

[43] Thomas Esposito, Mieszko Lis, Ravi Nanavati, Joseph Stoy, and Jacob Schwartz. System and method for scheduling TRS rules. United States Patent US 133051-0001, February 2005.

[44] H. Fleisher and L. I. Maissel. An introduction to array logic. IBM Journal of Research and Development, 19:98–109, 1975.

[45] Kermin Fleming, Chun-Chieh Lin, Nirav Dave, Arvind, Gopal Raghavan, and Jamey Hicks. H.264 decoder: A case study in multiple design points. In MEMOCODE, pages 165–174. IEEE Computer Society, 2008.

[46] D. Gajski, N. Dutt, A. Wu, and S. Lin. High-level synthesis: Introduction to chip and system design. 1994.

[47] Vijay Ganesh and David L. Dill. A Decision Procedure for Bit-Vectors and Arrays. In 19th International Conference on Computer Aided Verification (CAV-07), volume 4590, pages 519–531, 2007.

[48] Sumit Gupta, Nikil Dutt, Rajesh Gupta, and Alex Nicolau. Spark: A high-level synthesis framework for applying parallelizing compiler transformations. In International Conference on VLSI Design, pages 461–466, 2003.

[49] George Hadjiyiannis, Silvina Hanono, and Srinivas Devadas. ISDL: An instruction set description language for retargetability, 1997.

[50] C. A. R. Hoare. Communicating sequential processes. Commun. ACM, 21(8):666–677, 1978.

[51] James C. Hoe. Operation-Centric Hardware Description and Synthesis. PhD thesis, MIT, Cambridge, MA, 2000.

[52] James C. Hoe and Arvind. Operation-Centric Hardware Description and Synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23(9), September 2004.

[53] Shan Shan Huang, Amir Hormati, David F. Bacon, and Rodric Rabbah. Liquid metal: Object-oriented programming across the hardware/software boundary. In ECOOP ’08: Proceedings of the 22nd European conference on Object-Oriented Programming, pages 76–103, Berlin, Heidelberg, 2008. Springer-Verlag.

[54] Gilles Kahn. The semantics of a simple language for parallel programming. In IFIP Congress, pages 471–475, 1974.

[55] Michal Karczmarek. Multi-cycle Synthesis from Bluespec Descriptions. PhD thesis, MIT, Cambridge, MA, 2011.

[56] S. Lahiri, S. Seshia, and R. Bryant. Modeling and verification of out-of-order microprocessors in UCLID. In FMCAD '02, volume 2517 of LNCS, pages 142–159. Springer-Verlag, November 2002.

[57] Leslie Lamport. Specifying concurrent program modules. ACM Trans. Program. Lang. Syst., 5:190–222, April 1983.

[58] Edward A. Lee and David G. Messerschmitt. Static scheduling of synchronous data flow programs for digital signal processing. IEEE Trans. Computers, 36(1):24–35, 1987.

[59] S. Y. Liao. Towards a new standard for system level design. In Proceedings of the Eighth International Workshop on Hardware/Software Codesign, pages 2–7, San Diego, CA, May 2000.

[60] David C. Luckham. Rapide: A language and toolset for causal event modeling of distributed system architectures. In WWCA, 1998.

[61] Nancy A. Lynch and Mark R. Tuttle. An introduction to input/output automata. CWI Quarterly, 2:219–246, 1989.

[62] Mentor Graphics. Catapult-C Manual and C/C++ style guide, 2004.

[63] Laurence W. Nagel. SPICE2: A Computer Program to Simulate Semiconductor Circuits. PhD thesis, EECS Department, University of California, Berkeley, 1975.

[64] P.G. Neumann, R.S. Boyer, R.J. Feiertag, K.N. Levitt, and L. Robinson. A Provably Secure Operating System: The system, its applications, and proofs. Technical report, Computer Science Laboratory, SRI International, Menlo Park, California, May 1980. 2nd edition, Report CSL-116.

[65] P.G. Neumann and R.J. Feiertag. PSOS revisited. In Proceedings of the 19th Annual Computer Security Applications Conference (ACSAC 2003), Classic Papers section, pages 208–216, Las Vegas, Nevada, December 2003. IEEE Computer Society. http://www.acsac.org/ and http://www.csl.sri.com/neumann/psos03.pdf.

[66] P.G. Neumann and Robert N.M. Watson. Capabilities revisited: A holistic approach to bottom-to-top assurance of trustworthy systems. In Fourth Layered Assurance Workshop, Austin, Texas, December 2010. U.S. Air Force Cryptographic Modernization Office and AFRL. http://www.csl.sri.com/neumann/law10.pdf.

[67] Man Cheuk (Alfred) Ng, Kermin Elliott Fleming, Mythili Vutukuru, Samuel Gross, Arvind, and Hari Balakrishnan. Airblue: A System for Cross-Layer Wireless Protocol Development. In ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), La Jolla, CA, October 2010.

[68] Eriko Nurvitadhi, James C. Hoe, Shih-Lien L. Lu, and Timothy Kam. Automatic multithreaded pipeline synthesis from transactional datapath specifications. In Proceedings of the 47th Design Automation Conference, DAC ’10, pages 314–319, New York, NY, USA, 2010. ACM.

[69] Jingzhao Ou and Viktor K. Prasanna. PyGen: A Matlab/Simulink based tool for synthesizing parameterized and energy efficient designs using FPGAs. In Proc. Int. Symp. on Field-Programmable Custom Computing Machines, pages 47–56. IEEE Computer Society Press, 2004.

[70] S. Owre, S. Rajan, J.M. Rushby, N. Shankar, and M.K. Srivas. PVS: Combining specification, proof checking, and model checking. In Rajeev Alur and Thomas A. Henzinger, editors, Computer-Aided Verification, CAV '96, volume 1102 of Lecture Notes in Computer Science, pages 411–414, New Brunswick, NJ, July/August 1996. Springer-Verlag.

[71] David A. Patterson and John L. Hennessy. Computer Organization & Design: The Hardware/Software Interface, Second Edition. Morgan Kaufmann, 1997.

[72] Michael Pellauer, Michael Adler, Michel Kinsy, Angshuman Parashar, and Joel S. Emer. HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing. In HPCA, pages 406–417. IEEE Computer Society, 2011.

[73] J. Plosila and K. Sere. Action systems in pipelined processor design. In Proc. of the 3rd Int. Symp. on Advanced Research in Asynchronous Circuits and Systems, pages 156–166. IEEE Computer Society Press, 1997.

[74] Dumitru Potop-Butucaru, Stephen A. Edwards, and Gérard Berry. Compiling Esterel. Springer, 2007.

[75] Anand Raghunathan and Niraj K. Jha. Behavioral synthesis for low power. In International Conference on Computer Design, pages 318–322, 1994.

[76] L. Robinson, K.N. Levitt, and B.A. Silverberg. The HDM Handbook. Computer Science Laboratory, SRI International, Menlo Park, California, June 1979. Three Volumes.

[77] Daniel L. Rosenband. The Ephemeral History Register: Flexible Scheduling for Rule-Based Designs. In Proceedings of MEMOCODE’04, San Diego, CA, 2004.

[78] Daniel L. Rosenband. A Performance Driven Approach for Hardware Synthesis of Guarded Atomic Actions. PhD thesis, MIT, Cambridge, MA, 2005.

[79] Daniel L. Rosenband and Arvind. Modular Scheduling of Guarded Atomic Actions. In Proceedings of DAC'04, San Diego, CA, 2004.

[80] Daniel L. Rosenband and Arvind. Hardware Synthesis from Guarded Atomic Actions with Performance Specifications. In Proceedings of ICCAD’05, San Jose, CA, 2005.

[81] J.M. Rushby, F. von Henke, and S. Owre. An introduction to formal specification and verification using EHDM. Technical Report SRI-CSL-91-02, Menlo Park, California, February 1991.

[82] Oliver Schliebusch, Andreas Hoffmann, Achim Nohl, Gunnar Braun, and Heinrich Meyr. Architecture implementation using the machine description language LISA. In Asia and South Pacific Design Automation Conference, pages 239–244, 2002.

[83] Xiaowei Shen, Arvind, and Larry Rudolph. Cachet: An adaptive cache coherence protocol for distributed shared-memory systems. In International Conference on Supercomputing, pages 135–144, 1999.

[84] Satnam Singh. Designing reconfigurable systems in Lava. In Proceedings of the 17th International Conference on VLSI Design, VLSID '04, pages 299–, Washington, DC, USA, 2004. IEEE Computer Society.

[85] Jørgen Staunstrup and Mark R. Greenstreet. From high-level descriptions to VLSI circuits. BIT, pages 620–638, 1988.

[86] Synfora. PICO Platform. http://www.synfora.com/.

[87] Jean-Pierre Talpin, Christian Brunette, Thierry Gautier, and Abdoulaye Gamatié. Polychronous mode automata. In EMSOFT, pages 83–92, 2006.

[88] Marek Telgarsky, James C. Hoe, and José M. F. Moura. Spiral: Joint runtime and energy optimization of linear transforms. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 3, 2006.

[89] William Thies, Michal Karczmarek, and Saman P. Amarasinghe. StreamIt: A language for streaming applications. In R. Nigel Horspool, editor, CC, volume 2304 of Lecture Notes in Computer Science. Springer, 2002.

[90] Donald E. Thomas and Philip Moorby. The Verilog hardware description lan- guage (3. ed.). Kluwer, 1996.

[91] Cheng-Tsung Hwang, Jiahn-Humg Lee, and Yu-Chin Hsu. A formal approach to the scheduling problem in high level synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 10:464–475, 1991.

[92] W. Snyder, P. Wasson, and D. Galbi. Verilator. http://www.veripool.com/verilator.html, 2007.

[93] Albert Wang, Earl Killian, Dror Maydan, and Chris Rowen. Hardware/software instruction set configurability for system-on-chip processors. In Design Automation Conference, pages 184–188, 2001.

[94] Stephen Williams and Michael Baxter. Icarus Verilog: open-source Verilog more than a year later. Linux J., 2002:3–, July 2002.

188