Hardware Synthesis from a Stream-Processing Functional Language
Total Page:16
File Type:pdf, Size:1020Kb
UCAM-CL-TR-824 Technical Report ISSN 1476-2986 Number 824 Computer Laboratory Hardware synthesis from a stream-processing functional language Simon Frankau November 2012 15 JJ Thomson Avenue Cambridge CB3 0FD United Kingdom phone +44 1223 763500 http://www.cl.cam.ac.uk/ c 2012 Simon Frankau This technical report is based on a dissertation submitted July 2004 by the author for the degree of Doctor of Philosophy to the University of Cambridge, St. John’s College. Technical reports published by the University of Cambridge Computer Laboratory are freely available via the Internet: http://www.cl.cam.ac.uk/techreports/ ISSN 1476-2986 Abstract As hardware designs grow exponentially larger, there is an increasing challenge to use transistor budgets effectively. Without higher-level synthesis tools, so much effort may be spent on low-level details that it becomes impractical to efficiently design circuits of the size that can be fabricated. This possibility of a design gap has been documented for some time now. One solution is the use of domain-specific languages. This thesis covers the use of software-like lan- guages to describe algorithms that are to be implemented in hardware. Hardware engineers can use the tools to improve their productivity and effectiveness in this particular domain. Software engineers can also use this approach to benefit from the parallelism available in modern hardware (such as reconfig- urable systems and FPGAs), while retaining the convenience of a software description. In this thesis a statically-allocated pure functional language, SASL, is introduced. Static allocation makes the language suited to implementation in fixed hardware resources. The I/O model is based on streams (linear lazy lists), and implicit parallelism is used in order to maintain a software-like approach. The thesis contributes constraints which allow the language to be statically-allocated, and synthesis tech- niques for SASL targeting both basic CSP and a graph-based target that may be compiled to a register- transfer level (RTL) description. Further chapters examine the optimisation of the language, including the use of lenient evaluation to increase parallelism, the introduction of closures and general lazy evaluation, and the use of non- determinism in the language. The extensions are examined in terms of the restrictions required to ensure static allocation, and the techniques required to synthesise them. 3 4 Acknowledgements I would like to thank my supervisors, Simon Moore and Alan Mycroft, without whose advice and insight this thesis would not have been written. I also gratefully acknowledge Altera for the studentship they generously provided. Thanks to all my fellow students, for making the Computer Laboratory such an enjoyable and inter- esting place to (attempt to) work. Also, thanks to my family and house-mates, who have been highly supportive. This thesis is dedicated to the memory of my mother, Patricia. 5 6 Contents 1 Introduction and Related Work 15 1.1 TheNeedforHigh-LevelHDLs . 16 1.1.1 A Brief History of HDLs . 16 1.1.2 A Comparison to Software Languages . 19 1.1.3 Modern Hardware Development . 19 1.1.4 Runtime Reconfigurable Systems . 20 1.2 The Hardware Description Language Space . ....... 21 1.2.1 Language Assumptions . 21 1.2.2 ExampleLanguages ................................ 23 1.3 The Statically-Allocated Stream Language . 25 1.3.1 SASL’sNiche.................................... 26 1.3.2 FunctionalLanguages ............................... 27 1.3.3 Static Allocation . 28 1.3.4 Static Allocation of Functional Languages . 28 1.3.5 SASL’sI/OModel ................................. 29 1.3.6 A Comparison to Other Languages . 30 1.4 Thesis Contributions and Organisation . 30 2 The SASL Language 33 2.1 TheMotivation:SAFLandSAFL+. 33 2.1.1 TheSAFLLanguage................................ 33 2.1.2 SAFL+: An Attempt to Improve I/O . 34 2.1.3 FunctionalI/O ................................... 34 2.2 OtherRelatedWork.................................... 37 2.3 A Na¨ıveStreamProcessingLanguage . 38 2.3.1 The Stream-less Language . 38 2.3.2 Stream-processing extensions . 38 2.3.3 Problems raised . 39 2.4 Restrictions for Static Allocation . 40 2.4.1 The stratified type system . 40 2.4.2 Linearity ...................................... 43 2.4.3 Stability . 43 2.4.4 Static Allocation . 44 2.4.5 ExamplePrograms ................................. 44 7 8 CONTENTS 2.5 SASLSemantics ...................................... 44 2.6 Deforestation ....................................... 45 2.7 A Comparison to SAFL+ and Synchronous Dataflow . ..... 47 2.7.1 SAFL+ ....................................... 47 2.7.2 Lustre........................................ 47 2.8 Summary .......................................... 50 3 Translation to CSP 55 3.1 Synthesis Aims . 55 3.2 Synthesis Outline and Function Interfacing . 57 3.3 Variableaccess ..................................... 59 3.3.1 Broadcastvariables................................ 59 3.3.2 Unicast variables . 59 3.3.3 Stream Variable Access . 59 3.4 CSPSynthesis....................................... 61 3.4.1 Non-stream CSP Synthesis . 62 3.4.2 Stream CSP Synthesis . 64 3.5 Summary .......................................... 65 4 Dataflow Graph Translation 67 4.1 Pipelining SASL . 68 4.2 DataflowGraphGeneration.............................. 70 4.2.1 Translation to Linear SASL . 71 4.2.2 Translation to Dataflow Graph . 75 4.2.3 GraphProperties .................................. 75 4.2.4 Node Implementation . 78 4.2.5 Other Dataflow Architectures . 80 4.3 The Control/Dataflow Graph . 81 4.3.1 Removing CONS-enclosed Tail Recursion . 81 4.3.2 Removing Direct Tail Recursion . 83 4.3.3 Node Implementation . 85 4.4 Extracting stream buses . 85 4.4.1 StreamBuses.................................... 88 4.4.2 StreamBusTyping................................. 89 4.4.3 Typing Implementation . 89 4.4.4 TypingExamples.................................. 91 4.4.5 Representing Stream Buses . 91 4.4.6 ManagingStreamBuses .............................. 96 4.4.7 Node Implementation . 97 4.5 Summary .......................................... 99 5 Optimisation 101 5.1 Static Scheduling . 101 5.1.1 TheProblem ....................................102 5.1.2 ASAPandALAPScheduling ........................... 102 5.2 Lenient Evaluation . 106 5.2.1 Signalling on Lenient Streams: The “Push” Model . 107 5.2.2 Cancelling Lenient Evaluation . 108 5.2.3 Basic Lenient Evaluation . 109 5.2.4 Lenient Evaluation with a Stream Bus Controller . 112 CONTENTS 9 5.2.5 Changing the Evaluation Model: Lazy Tail Matching . 112 5.2.6 Rearranging Graphs for Lazy Tail Evaluation . 114 5.3 ProgramTransformation ............................... 117 5.3.1 Enabling Graph Optimisations . 117 5.3.2 Peep-hole Optimisation . 120 5.3.3 Flattening Conditionals . 123 5.3.4 Removing Conditional Nodes . 124 5.3.5 Unrolling Loops . 125 5.4 Summary ..........................................125 6 Closures and Statically-Allocated Laziness 127 6.1 Higher-order Functions as Macros . 127 6.1.1 Nested Function Definitions . 128 6.1.2 Lazily-Evaluated Closures . 130 6.2 Leniently-evaluated Expressions . 130 6.3 Statically-Allocated Laziness . 134 6.4 Summary ..........................................138 7 Multi-Set Processing and Non-Determinism 141 7.1 Non-Deterministic Stream Reading . 141 7.1.1 Language Considerations . 142 7.1.2 Hardware Implementation . 144 7.2 Identifying Non-Deterministic Values . 144 7.3 Generalising Streams . 148 7.3.1 Dealing with Multi-sets . 148 7.3.2 Syntax and Types for Reorderable Streams . 149 7.3.3 Implementing Reorderable Streams . 150 7.4 Identifying Reorderable Streams . 153 7.5 Restoring Referential Transparency . ....... 156 7.6 Summary ..........................................157 8 Conclusions and Further Work 159 8.1 Conclusions........................................ 159 8.2 LanguageExtensions ................................. 159 8.3 Synthesis Extensions . 162 A Example node implementations 165 A.1 Signalling . 165 A.2 NormalNodes....................................... 166 A.3 CONSNodes ........................................ 167 A.4 MatchNodes ........................................ 168 A.5 ResetNodes....................................... 169 A.6 OtherNodes........................................ 169 B CaseStudy 171 B.1 TheExample ........................................171 B.2 CSPSynthesis....................................... 171 B.3 GraphSynthesis..................................... 175 B.4 Performance....................................... 178 B.4.1 Tools ........................................178 10 CONTENTS B.4.2 Thesignalprogram................................. 178 B.4.3 Themapprograms ................................. 178 C Extending the Identification of Reorderable Streams 181 C.1 TheTypeSystem ..................................... 181 C.2 Stream-Generating Functions . 185 C.3 ForwardsAnalysis ................................... 185 C.4 BackwardsAnalysis.................................. 187 Bibliography 192 List of Figures 1.1 AselectionofHDLs .................................... 23 2.1 TheabstractgrammarofSharp’sSAFL . ..... 34 2.2 A SAFL program embedded in a system with state . 34 2.3 Streamless-SASL’s abstract grammar . ..... 39 2.4 Grammar extensions for stream processing . ...... 39 2.5 Programs that cause problems for static allocation . ..... 40 2.6 Grammar extensions for handling tuples . 41 2.7 Typingrules........................................ 42 2.8 Linearity rules . 42 2.9 Examples of recursive functions . ..... 44 2.10 Examples of common functions . 45 2.11 Examples of merge and (illegal) duplication functions in SASL . ..... 45 2.12 Big step transition relation for SASL . 46 2.13 Examples of function composition in SASL . 48 2.14 LustreWatchDogprogram . 49 2.15 SASLWatchDogprogram .............................. ..