Formal Design and Synthesis of GALS Architectures
Total Page:16
File Type:pdf, Size:1020Kb
School of Electrical, Electronic & Computer Engineering Formal Design and Synthesis of GALS Architectures Sohini Dasgupta Technical Report Series NCL-EECE-MSD-TR-2008-131 March 2008 Contact: [email protected] Supported by EPSRC grant GR/S12036 NCL-EECE-MSD-TR-2008-131 Copyright c 2008 University of Newcastle upon Tyne School of Electrical, Electronic & Computer Engineering, Merz Court, University of Newcastle upon Tyne, Newcastle upon Tyne, NE1 7RU, UK http://async.org.uk/ Formal Design and Synthesis of GALS Architectures by Sohini Dasgupta School of Electrical, Electronic & Computer Engineering Newcastle University PhD Thesis January 2008 To My Parents Contents List of Figures viii List of Tables xiii List of Abbreviations xv List of Publications xvii Acknowledgements xix Abstract xx 1 Introduction 1 1.1 Motivation and contribution . 4 1.2 Thesis Outline . 7 2 Background 9 2.1 Introduction . 9 2.1.1 Synchronous Design . 9 iii CONTENTS 2.1.2 Asynchronous Design . 10 2.1.2.1 Classes of Asynchronous Circuits . 12 2.1.2.2 Handshake Protocols . 13 2.1.3 GALS Design . 14 2.1.3.1 GALS architecture . 17 2.1.4 Local clock Generator . 18 2.1.5 Handshake Unit . 20 2.2 GALS Behavioural Modeling Schemes . 23 2.2.1 Synchronous Transition Systems . 23 2.2.2 Petri Nets . 26 2.2.3 Signal Transition Graphs . 29 3 GALS Design Technique 32 3.1 Introduction . 32 3.2 System Integration Strategies . 32 3.2.1 Standard Synchronisers . 34 3.2.2 Adaptive Synchronisers . 36 3.2.3 FIFO Synchronisation . 37 3.3 Local Clock Control Scheme . 39 3.4 System Desynchronisation Strategies . 43 3.4.1 Asynchronous Deployment . 44 3.4.2 GALS Deployment . 44 3.4.3 Endochronous and Weakly Endochronous Systems . 49 iv CONTENTS 3.4.3.1 Transition System model . 52 4 Comparative Analysis of GALS Clocking Schemes 54 4.1 Introduction . 54 4.2 Overview of the GALS system . 56 4.2.1 Models of the clocking schemes . 57 4.2.2 Discussions of the models . 62 4.3 Verification and Logic Synthesis . 64 4.4 Circuit Implementation . 66 4.5 Performance Analysis . 72 4.5.1 GALS system characterisation parameters . 72 4.5.2 Model Level Analysis . 73 4.6 Circuit Level: Experimental Results . 76 4.7 Summary . 87 5 GALS Implementation of Weakly Endochronous Systems 90 5.1 Introduction . 90 5.1.1 Overview of the methodology . 92 5.2 Preliminaries for modeling . 94 5.2.1 Microstep Transition System model . 95 5.2.2 Theory of Regions . 97 5.3 Composition-GALS idea . 98 5.3.0.1 Synchronous Composition: . 99 5.3.0.2 Asynchronous Composition: . 100 v CONTENTS 5.4 Weak endochrony . 102 5.4.1 Weakly Endochronous Criteria . 103 5.4.2 Correctness results . 107 5.5 Synthesis of the weakly endochronous modules . 109 5.5.1 Requirements of the specification for synthesis . 109 5.5.2 Pre-requisites for transformation . 112 5.5.3 Synthesis algorithm steps . 113 5.6 Case Study:DLX architecture . 123 5.7 Correctness of Handshake Refinement . 126 5.8 Implementation . 128 5.9 Summary . 129 6 Desynchronisation Technique using Petri nets 131 6.1 Introduction . 131 6.2 Preliminaries . 134 6.2.1 Petri nets . 135 6.3 Motivation for using Localities . 136 6.3.1 Max-O semantics and validity criteria using Processes . 141 6.4 Synchronous model description . 143 6.4.1 Net Transformations and notion of validity . 147 6.5 Petri nets with localities . 151 6.6 Notion of partitioning correctness . 153 6.7 Rules for Locality Allocation . 158 vi CONTENTS 6.8 Allocation of Localities . 160 6.8.1 Algorithm for locality allocation . 160 6.8.2 A simple Example . 165 6.9 Locality Optimisation . 169 6.10 GALSification . 173 6.11 Implementation of clock control . 174 6.12 Summary . 176 7 Conclusion 178 7.1 Summary of contribution . 179 7.2 Future Work . 181 Bibliography 184 Bibliography 184 vii List of Figures 2.1 Two and Four- phase handshake protocols . 14 2.2 Bundled data and dual rail channels . 14 2.3 A single port GALS System . 17 2.4 Clock generator circuit . 19 2.5 Mutual Exclusion Element . 19 2.6 ME with multiple port controllers . 20 2.7 A multi-port GALS system . 21 2.8 D-port Controller . 21 2.9 P-port Controller . 23 2.10 D-type and P-Type port controllers . 23 3.1 Standard synchronisers . 35 3.2 Standard two-flop synchroniser . 36 3.3 Adaptive synchroniser . 37 3.4 Mixed async-sync FIFOs . 39 3.5 Locally clocked module . 40 viii LIST OF FIGURES 3.6 Pausible clock controller . 41 3.7 Stretchable clock controller . 41 3.8 De-synchronisation technique . 45 3.9 System synchronisation . 50 3.10 Mapping of a synchronous program to LST S . 52 4.1 Design flow . 55 4.2 Overall system architecture for producer-consumer interface . 56 4.3 Circuit blocks and PN fragments . 59 4.4 Clock control circuits and their corresponding PN models . 60 4.5 PN model of pausible clocking scheme . 63 4.6 PN model of stretchable clock scheme . 64 4.7 PN model of data driven clock scheme . 64 4.8 Pausible clock circuit . 67 4.9 Stretchable clock circuit . 68 4.10 Data driven clock circuit . 69 4.11 Asynchronous communication-phase relationship at the producer block . 71 4.12 Best case req-ack latency in producer block . 75 4.13 Worst case req-ack latency in producer block . 76 4.14 Number of pauses for pausible clocking scheme . 78 4.15 Number of clock pauses for stretchable clock . 79 4.16 Pause latency in pausible clock . 80 ix LIST OF FIGURES 4.17 Pause latency in stretchable clock . 81 4.18 FIFO design . 81 4.19 Power analysis . 83 4.20 Throughput analysis . 84 4.21 Request delay analysis for pausible clock . 84 4.22 Request delay analysis for stretchable clock . 85 4.23 Throughput Analysis with varying FIFO sizes . 86 4.24 Throughput with FIFO size 0 . 88 5.1 Delay Insensitive System . 93 5.2 GALS wrappers constructed around the synchronous WE blocks . 94 5.3 Translation of a T S into a P N . 98 5.4 True vs interleaved concurrency . 110 5.5 clock assumptions . 111 5.6 Variable flow consistency . 111 5.7 Original model . 114 5.8 Model without stuttering steps . 114 5.9 Model partitioned into reactions . 115 5.10 Handshake refinement . 118 5.11 Partitioned model with states assigned to sets . 120 5.12 Final transformed model . 121 5.13 DLX architecture . 124 5.14 DLX-ID automaton . 124 x LIST OF FIGURES 5.15 Reaction refinement . 126 5.16 The final model . 127 5.17 Single rail FIFO model and implementation . 128 6.1 Synchronous system transformation into distributed architecture . 132 6.2 Flow diagram of the proposed methodology . 135 6.3 Simple synchronous block . 137 6.4 PN model of the synchronous block . ..