Self-Timed Asynchronous Architecture of an Advanced General Purpose Microprocessor


RESEARCH THESIS

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Science

Rakefet Kol

Submitted to the Senate of the Technion - Israel Institute of Technology
Elul 5757, Haifa, September 1997

To my mother, with endless love.

The research was done under the supervision of Dr. Ran Ginosar from the Department of Electrical Engineering, and Prof. Michael Yoeli from the Department of Computer Science.

My deepest gratitude to Dr. Ran Ginosar and Prof. Michael Yoeli for their devoted guidance and invaluable help and comments. Their encouragement to explore new and exciting areas has taught me a lot and is greatly appreciated.

Table of Contents

Abstract

Chapter 1 : Introduction
  1.1 Future Technological Constraints and Asynchronous Design
  1.2 Timing Disciplines
  1.3 Previous Asynchronous Processors
  1.4 Thesis Outline

Chapter 2 : Kin Architecture
  2.1 Microarchitecture
    2.1.1 General Description
    2.1.2 Modules and Instruction Pruning
  2.2 Race and Deadlock Problems
  2.3 Multi-Execution on Kin
  2.4 Kin Model
  2.5 Implementation Methodologies

Chapter 3 : Avid Execution and Instruction Pruning
  3.1 Introduction and Previous Work
    3.1.1 Execution Model
    3.1.2 Previous Work
  3.2 Avid Execution
    3.2.1 Avid Execution Concept
    3.2.2 Performance Analysis of Avid Execution
  3.3 Pathmarks, Pruning Management, and Beheading Mechanism
  3.4 Asynchronous Architecture for Avid Execution
  3.5 Simulation Results
  3.6 Concluding Remarks

Chapter 4 : An Asynchronous Instruction Length Decoder
  4.1 Introduction
    4.1.1 Author's Contribution
  4.2 AILD Architecture
    4.2.1 General Description
    4.2.2 Handling Branch Instructions
    4.2.3 Handling Long Instructions

    4.2.4 Handling Prefixes
    4.2.5 The Length Decoder Operation
  4.3 AILD Implementation
  4.4 Concluding Remarks

Chapter 5 : A Doubly-Latched Asynchronous Pipeline
  5.1 Introduction
  5.2 The Doubly-Latched Asynchronous Pipeline
  5.3 Edge-Triggered DLAP
  5.4 Latched DLAP
  5.5 Comparative Analysis
  5.6 Non-Linear DLAPs
  5.7 Synchronous to Asynchronous Conversion
    5.7.1 Motivation
    5.7.2 Post-Synthesis Conversion Algorithm
  5.8 Concluding Remarks

Chapter 6 : Adaptive Synchronization for Multi-Synchronous Systems
  6.1 Introduction
    6.1.1 Previous Work
    6.1.2 Multi-Synchronous Systems
  6.2 Data Adaptive Synchronization
  6.3 Data Adaptive Synchronization Circuit
  6.4 Training Sessions
  6.5 Probability of Synchronization Failure
  6.6 Concluding Remarks

Chapter 7 : Adapting Statecharts Methodology for Asynchronous Design
  7.1 Introduction
  7.2 The Statechart-based Statemate MAGNUM CAD System
  7.3 The qDI FSM
  7.4 Specifying the qDI FSM with Statecharts
  7.5 Validation
  7.6 Simulation
  7.7 Concluding Remarks

Chapter 8 : Summary and Further Research

Appendix A : Clock Coordination for Synchronization of Multi-Synchronous Systems
  A.1 Introduction
  A.2 Clock Coordination Circuit
  A.3 Performance Analysis of Clock Coordination
  A.4 Limits of Clock Coordinated Synchronization
    A.4.1 Metastability
    A.4.2 Oscillation
    A.4.3 No Convergence
  A.5 Concluding Remarks

References

Extended Abstract in Hebrew

List of Figures

Figure 2-1: Kin asynchronous processor architecture
Figure 2-2: Dynamic Instance Tag structure
Figure 2-3: Inter-module communication
Figure 2-4: FIFO control statechart
Figure 2-5: Fair arbiter statechart
Figure 3-1: Instruction execution rate
Figure 3-2: Number of executed instructions till misprediction
Figure 3-3: Studies of ILP as a function of window size
Figure 3-4: Cumulated executed instructions
Figure 3-5: Hardware parallelism and misprediction penalty effect on execution rate
Figure 3-6: Tree of possible execution paths
Figure 3-7: Misprediction effects on performance
Figure 3-8: Examples of Avid Execution depth
Figure 3-9: Misprediction penalties in Avid Execution
Figure 3-10: Pathmarks based on prefix notation
Figure 3-11: Synthetic traces simulation results (for w=20)
Figure 3-12: SpecInt95 simulation results (for w=40)
Figure 4-1: Block diagram of the Asynchronous Instruction Length Decoder
Figure 4-2: Length decoder interconnections and handshake signals
Figure 4-3: Marking unit interconnections and handshake signals
Figure 4-4: A simplified statechart of length decoder behavior
Figure 5-1: A Doubly-Latched Asynchronous Pipeline (DLAP)
Figure 5-2: A DLAP stage structure
Figure 5-3: The DLAP test circuit
Figure 5-4: STG for a Master-Slave Edge Triggered stage controller
Figure 5-5: A Master-Slave Edge Triggered stage controller circuit implementation
Figure 5-6: STG for a Master-Slave Latch stage controller
Figure 5-7: A Master-Slave Latch stage controller circuit implementation
Figure 5-8: Waveforms of Master-Slave Edge Triggered DLAP test circuit
Figure 5-9: Waveforms of Master-Slave Latch DLAP test circuit
Figure 5-10: Scheduling comparison of alternative pipelines
Figure 5-11: A Fork stage implementation
Figure 5-12: A Join stage implementation
Figure 5-13: A ring DLAP
Figure 5-14: Synchronous-to-asynchronous conversion
Figure 6-1: Arrival time distribution of inputs
Figure 6-2: A multi-synchronous system

Figure 6-3: Data delay adjusting for Adaptive Synchronization
Figure 6-4: Adaptive Synchronization for single sender, multiple receivers buses
Figure 6-5: Adaptive Synchronization circuit
Figure 6-6: Adjustable delay circuit
Figure 6-7: Phase detection waveforms
Figure 6-8: Typical phase detection counter outputs
Figure 6-9: Model for analyzing synchronization failure
Figure 6-10: Gate delay vs. clock frequency
Figure 6-11: Probability of synchronization failure
Figure 7-1: qDI FSM
Figure 7-2: FSM handshake
Figure 7-3: CL handshake
Figure 7-4: REG handshake
Figure 7-5: Activity chart, FSM with validation
Figure 7-6: CL Bounded Delay statechart
Figure 7-7: CL qDI statechart
Figure 7-8: Validation statechart
Figure 7-9: Dual-rail validation statechart
Figure 7-10: Monotonic validation statechart
Figure 7-11: Protocol validation statechart
Figure A-1: A clock/data coordination circuit
Figure A-2: Alternative coordination methods
Figure A-3: Coordination circuit block diagram
Figure A-4: Clock/data edge Conflict Detection
Figure A-5: Positive inertial delay
Figure A-6: Clock Regeneration with a negative inertial delay
Figure A-7: Clock Disable
Figure A-8: Waveforms of Clock-Disable method
Figure A-9: Clock Toggle
Figure A-10: Waveforms of Clock-Toggle method
Figure A-11: Clock Shift
Figure A-12: Waveforms of Clock-Shift method
Figure A-13: Clock coordination slow-down
Figure A-14: Conflict detection pulse width
Figure A-15: Communicating modules cycle

List of Tables

Table 3-1: Average execution rates achieved by various Avid depths
Table 3-2: M88ksim trace simulations performance results
Table 5-1: DLAP SPICE simulation results
Table 5-2: Processing times
Table 5-3: Cycle time and overhead SPICE simulation results

Abstract

Future semiconductor technology (such as 1 billion transistors per chip and over 1 GHz clocks planned for the year 2010) places severe new constraints on the design of high performance microprocessors. In particular, the chip is too large and the clock is too fast for single clock synchronous operation. Rather, new forms of distributed architectures and asynchronous interconnects are called for.

This research describes the architecture of the asynchronous microprocessor Kin, which supports out-of-order and deep speculative Avid execution. Kin contains unique architectural features, targeted to achieve high performance by utilizing generous hardware resources that will become available by future technology. The research concludes that technological constraints necessarily lead to asynchronous solutions. The development of Kin included addressing and solving problems at the architecture level, as well as developing architectural concepts and design methodologies for the required building-blocks. The highest level of Kin's architecture is asynchronous, while the various units of Kin may be implemented internally as either asynchronous or synchronous.

A new branching lookahead strategy, Avid execution, is shown to offer reduced misprediction penalty and increased performance. Avid execution prefetches and executes the predicted path as well as some of the non-predicted paths. Unneeded paths are dynamically pruned. The depth of the alternative paths is dynamically adjusted according to the branch prediction accuracy and confidence. Pathmarks are added dynamically to instructions for identification and efficient pruning. Analytical study shows that Avid execution can significantly increase performance over single-path speculative execution with similar resources. Simulations of the SpecInt95 benchmarks confirm the analysis results. Avid execution is most suitable for asynchronous processors like Kin, since it incurs dynamically changing computation loads at the various processor modules.

Decoding a variable length instruction set is a bottleneck in a high performance microprocessor. The architecture and design of an optimized asynchronous instruction length decoder (AILD) is presented as an example of a fully asynchronous module.

A novel doubly-latched asynchronous pipeline (DLAP) architecture is introduced. DLAP offers improved performance over previous asynchronous pipelines in important special cases. DLAP is also suitable as the target for an automatic synchronous-to-asynchronous conversion, and the proper algorithm is described.

As a transition from fully synchronous to fully asynchronous implementations, Kin can be implemented as a multi-synchronous system, wherein a common clock is distributed over thin wires, avoiding the massive power investment in clock distribution trees and circuits for phase matching and skew minimization. Adaptive synchronization reduces the probability of synchronization failures. In contrast with methods like clock stretching, adaptive synchronization adjusts data delays. We show that it is more widely applicable to high performance microprocessors than other synchronization methods. Training sessions are devised to minimize adaptation overhead.

Finally, statechart CAD methodology is adapted for the formal specification of asynchronous systems. It is also useful for generating simulation models, validations, and direct synthesis. Statecharts have been employed intensively in this thesis for the design and simulation of Kin, Avid execution, AILD, and DLAP.

List of Symbols and Abbreviations

µOp - micro-Operation
AILD - Asynchronous Instruction Length Decoder
A/S - Adaptive Synchronization
BU - Branch Unit
CAD - Computer Aided Design
CISC - Complex Instruction Set Computer
CL - Combinational Logic
DIC - Decoded Instruction Cache
DIT - Dynamic Instance Tag
DLAP - Doubly-Latched Asynchronous Pipeline
EU - Execution Unit
FF - Flip Flop
FIFO - First In First Out
FSM - Finite State Machine
GALS - Globally Asynchronous Locally Synchronous
ILP - Instruction Level Parallelism
LSU - Load/Store Unit
MRR - Most Recent Results
MTBF - Mean Time Between Failures
PMU - Prune Management Unit
PU - Prefetch Unit
qDI - quasi Delay Insensitive
RISC - Reduced Instruction Set Computer
ROB - ReOrder Buffer
ROU - ReOrder Unit
RRU - Register Renaming Unit
RS - Reservation Station
STG - Signal Transition Graph

Chapter 1 : Introduction

Microprocessor performance has risen over the past 20 years from 500 KIPS (thousand instructions per second), to 300 MIPS (million instructions per second) today. The industry plans to achieve 100 BIPS (billion instructions per second) by the year 2010, through the integration of almost one billion transistors on a chip operating at over 1GHz [SIA94, Wei96]. This explosive growth in performance has been made possible thanks to the rapid development of semiconductor technology [Yu96], and improvements in architecture. Technological progress has contributed both to higher clocking frequencies and to growing levels of integration. As more transistors are integrated, more architectural features (pipeline, superscalar processing, out-of-order execution, caches, etc.) are introduced into microprocessors, contributing to their growing utility.

While this impressive growth is expected to continue in the future [SIA94], we are facing a turning point in computer architecture. The methodology that has brought us from the early microprocessor days to the present is about to change. All microprocessors, past and present, are designed as synchronous, single clock machines. In the future, this will no longer be feasible: The basic axioms of synchronous design are intended for limited equipotential domains (where signal propagation times over all wires are negligible). Synchronous methodology has been stretched a bit further by able designers and power-hungry skew minimization techniques. But future large chips will extend well beyond the synchronous domain simply because it will take any signal (clock or data) many clock cycles to propagate from one part of the chip to another. In simple electrical engineering terms, the processors of the future will transcend from lumped systems into distributed ones.

This change has already started. Modern processors have introduced some elements of distributed computing, such as decoupling modules with FIFO buffers and executing out-of-order. However, at the circuit level, present day processors still insist on a single clock with minimal skew. In this thesis we show that this aging paradigm is best exchanged for asynchronous architecture, which is much more suitable for distributed systems. And in contrast with all previous research on asynchronous architecture, we emphasize the high-level architecture, rather than the asynchronous design of each and every circuit. In fact, while the physical constraints dictate asynchrony at the high level, the individual modules may still be synchronous, and we investigate this in the thesis. Unlike all other research projects on asynchronous processors, we do not promote a revolutionary shift in all aspects of design and CAD tools. Rather, we consider how contemporary synchronous designs can smoothly evolve into asynchronous ones. With this in mind, we search for the path that the industry will be willing to adopt, namely one that will not enforce any change until it has become absolutely necessary.

This thesis investigates a processor architecture for the asynchronous future, including a novel aggressive speculative execution method (necessary for high speed and suitable for asynchronous processors). The thesis also delves into a number of associated issues: Fully asynchronous design of one module, algorithmic conversion of synchronous pipelines into asynchronous ones, mixed timed globally asynchronous locally synchronous systems, and the design methodology suitable for high level asynchronous design.

1.1 Future Technological Constraints and Asynchronous Design

In this section we summarize the relevant constraints that may be posed by 2010 technology and which directly affect this thesis [KG97].

While the electromagnetic field travels in vacuum at the speed of light (c = 30 mm / 100 pSec, in VLSI terms), the electric signals inside chips progress about 10-100× slower, depending on the drive strength (how much power and area are invested) and on the capacitive load of the bus. Let’s assume c/20 signal (clock and data) propagation speeds; given a chip size of 25-35mm in 2010 technology [SIA94], typical signals will require 2.5-3.5 nSec to cross the chip end-to-end. If the chip is clocked at 2GHz, about 5-7 clock cycles may be required for signal propagation alone. As a result, it will no longer be feasible to separate the logical and physical design of the pipelines, as is done today; rather, today's wire buses will be transformed into explicit pipeline stages, whose only task is to move data around, and the number of stages per bus will depend strongly on where the various modules are placed on the VLSI chip. To make the situation even worse, the signal may arrive at the various receivers on multi-drop buses at different cycles.
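As a rough check of these numbers, the following back-of-envelope sketch (in Python) reproduces the 2.5-3.5 nSec and 5-7 cycle estimates. The assumed effective signal speed of 10 mm/nSec (roughly c/30, within the 10-100× slowdown range quoted above) and the 2 GHz clock are illustrative assumptions, not measured values.

```python
# Back-of-envelope estimate of cross-chip signal propagation (illustrative only).
C_MM_PER_NS = 300.0          # speed of light: ~30 mm / 100 pSec = 300 mm/nSec
SIGNAL_SPEED = 10.0          # assumed effective on-chip speed, mm/nSec (~c/30)
CLOCK_GHZ = 2.0              # assumed clock frequency
CYCLE_NS = 1.0 / CLOCK_GHZ   # clock cycle time in nSec

for chip_size_mm in (25, 35):                    # projected 2010 die edge [SIA94]
    t_ns = chip_size_mm / SIGNAL_SPEED           # end-to-end propagation time
    cycles = t_ns / CYCLE_NS                     # cycles consumed by wire delay alone
    print(f"{chip_size_mm} mm chip: {t_ns:.1f} nSec, about {cycles:.0f} clock cycles")
```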

Other effects of technological progress on processor speed relate to clock distribution. Several cycles may be required for the propagation of a single clock transition over the entire chip, compared to less than a cycle today. A worse aspect of this is that many transitions will be present simultaneously on the clock distribution wires. While this wavefront superpipelining is not impossible, it is highly undesirable. Optical clock distribution (e.g., a strobe light flashing at the chip from above, and multiple detectors and amplifiers spread over the chip) might provide a solution. Clock skew is also expected to be a very difficult issue. If the clock is not distributed optically, jitter of clock drivers and distribution lines will result in a skew much wider than the clock cycle. This skew can be balanced only at the very high area and power cost of phase lock circuits and powerful drivers. The power dissipated by complex VLSI chips increases as clock frequency rises [Hor93, Int94, Str94]. An increasing portion (currently over 40% [Bow95]) of the power budget in a chip is required by the clock distribution network in order to contain clock skew problems. This ratio is expected to grow even higher, when processors are predicted to dissipate almost 200W [SIA94].

As technology progresses and operating frequency rises, only a small number (3-4) of logical gate delays fit into the shorter clock cycle time per pipeline stage. This implies deeper pipelines, which are relatively difficult to design, and substantial effort must be devoted to balancing them. In addition, a severe performance penalty is incurred for stalling such pipelines.

Even if clock distribution problems (skew and power dissipation) are solved, signal propagation delays will make it impossible to design a synchronous processor that operates with a common single clock. This thesis applies asynchronous microprocessor architecture to answer future technological and architectural constraints, forecast for the year 2010 and beyond, when feature size is less than 0.1µm, over one billion transistors are integrated on a single chip, and the clock (if used) operates at over 1GHz.

Asynchronous design has been studied for the past 40 years, and has attracted new interest in recent years [Async, Hau95]. Asynchronous systems eliminate or restrict the use of clocks, thus avoiding some of the problems of clock distribution and accommodating excessive data propagation delays. Instead of a global clock controlling when data can be safely moved from one unit to another, asynchronous units employ local handshake over asynchronous channels [Hau95, Sei80]. Asynchronous logic trades time for discrete events. Actual delays are hidden (abstracted), and only sequences of events (as depicted by transitions) matter. Thus, the correctness of computation is made independent of delays. Another advantage is that events can be treated hierarchically and local details can be abstracted, similar to hierarchical logic design, whereas the design of continuous timing is global and 'flat'.

Clocks typically switch over the entire chip, feeding into every flip-flop, and thus dissipate substantial power. Asynchronous handshake, on the other hand, is local, and happens only when needed, thus minimizing power dissipation and spreading it more evenly over time. The local handshake lines spread over shorter distances than the global clock distribution network. Thus, less area and less power are required. Some of the handshakes can be completed concurrently with computations, and they automatically accommodate process variations and jitter.
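To make the local handshake idea concrete, the following minimal sketch (Python, with threads standing in for hardware modules; the channel structure and shutdown behavior are illustrative, not a circuit from this thesis) models a four-phase bundled-data handshake: data is bundled with a request, and an acknowledge closes each transfer.

```python
import threading, time

class Channel:
    """Toy four-phase bundled-data channel: data travels bundled with req;
    ack closes the cycle, then both signals return to zero."""
    def __init__(self):
        self.req = threading.Event()
        self.ack = threading.Event()
        self.data = None

    def send(self, value):
        self.data = value              # bundle the data...
        self.req.set()                 # ...and raise the request
        self.ack.wait()                # wait for acknowledge
        self.req.clear()               # return-to-zero phase
        while self.ack.is_set():       # wait for ack to drop
            time.sleep(0)

    def receive(self):
        self.req.wait()                # data is valid once req is seen
        value = self.data
        self.ack.set()                 # acknowledge
        while self.req.is_set():       # wait for req to drop
            time.sleep(0)
        self.ack.clear()
        return value

chan = Channel()

def producer():
    for v in range(3):
        chan.send(v)                   # a transfer happens only when both sides are ready

threading.Thread(target=producer).start()
for _ in range(3):
    print("received", chan.receive())
```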

Asynchronous circuits can be designed to operate according to the average case delay instead of the worst case, thus achieving typically a factor of two in performance [GM90]. For instance, while no carry is generated when adding 1+0, maximal carry propagation is needed for 1-1. Synchronous adders must always allocate ample time for the worst case. Asynchronous adders are self-timed, namely they detect and announce completion as soon as the computation is over.

In the case of 1+0, the asynchronous operation completes much earlier, saving time and power. Moreover, if there is no addition to perform (i.e. the adder receives no valid inputs), the asynchronous adder simply idles, whereas typical synchronous circuits compute every cycle, unless special enabling control logic is employed. As another example, assume the modules in a synchronous pipeline take t time units each to complete, but in 1% of the cases one module needs 2t time units to finish its task. If the clock cycle is set to t, that rare case will cause the circuit to fail. If the clock cycle is set to 2t, the performance is reduced by 50%. In an asynchronous implementation, the same scenario will only cause a 1% performance degradation. Indeed, some synchronous pipelines go into great complexity to achieve the same flexibility.
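The 1% example can be verified with a two-line calculation; a minimal sketch under the stated assumptions (99% of operations take t, 1% take 2t, handshake overhead ignored):

```python
# Average time per operation for the example in the text (t = 1 time unit).
t = 1.0
p_slow = 0.01                                      # fraction of operations needing 2t

sync_worst_case = 2 * t                            # synchronous clock set to 2t: every op pays 2t
async_avg = (1 - p_slow) * t + p_slow * (2 * t)    # asynchronous: average-case completion

print(f"synchronous (2t clock): {sync_worst_case:.2f} per op (50% throughput loss vs. t)")
print(f"asynchronous (average): {async_avg:.2f} per op "
      f"({(async_avg / t - 1) * 100:.0f}% degradation vs. the ideal t)")
```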

Asynchronous circuits are expected to achieve low power operation, resulting from eliminating the clock, and taking advantage of the varying computational load of the application. Some 80% power saving has been reported in [vBB+94], by power supply voltage scaling of an asynchronous DSP chip designed for a battery-operated consumer product, and by slowing down the operation of the circuit when appropriate. Other expected benefits of asynchronous design [Hau95] include modularity (easy scalability), correctness by design, and testability [DGY95].

Applying asynchronous methodology to a synchronous microarchitecture is not simply removing the clock. Rather, two issues should be addressed: The high level architecture, and the circuit implementation (inside modules and at their interfaces). In the former, any implicit timing assumptions made by system architects should be carefully reconsidered, to accommodate arbitrary inter-module delays. We claim that this process leads to distributed architectures with a data-flow flavor, and in the thesis we show how all relative timing assumptions are relieved. This is demonstrated on Kin¹, an asynchronous high performance processor. We also attend to the circuit level, where both locally-synchronous and fully asynchronous implementations of Kin are examined.

1.2 Timing Disciplines

Existing definitions of timing disciplines, as prevalent in the asynchronous literature, are insufficient for the purpose of this thesis. Some more precise definitions are needed:

Def. 1.1: A self-timed system is one which generates a completion signal.

¹ Kin was the God of Time of the Maya.

Def. 1.2: A uni-synchronous (unisync) system is one operated by a single global clock, where all modules receive the same frequency and phase of the clock.

Def. 1.3: An asynchronous system is one which is not uni-synchronous.

Def. 1.4: A multi-synchronous (multisync) system is an asynchronous system operated by a single global clock, where all modules receive the same frequency, but the relative phases are considered arbitrary and unknown.

Def. 1.5: A multi-clocked system is an asynchronous system in which each module is clocked independently of the others, and its local clock is not synchronized with the other local clocks.

Def. 1.6: A mixed-timed system combines various timing disciplines in the same system.

Kin architecture (Chapter 2) comprises multiple fast self-timed units, interconnected over asynchronous channels, using handshake communication protocols. The asynchronous microarchitecture is a high level description and it allows flexible and robust implementation. Although Kin is designed as asynchronous at its top level, its modules can be implemented according to the various timing disciplines defined above. In this thesis we investigate the various timing alternatives.

Multi-clocked and multisync systems both present two fundamental synchronization challenges: At the low level, data lines coming into any module must be synchronized with the local clock. At the higher level, inter-module data delays are long and in particular they are layout-dependent; unlike contemporary practices, the architect of processors for future technology will not be free to assume that data emanating during a particular cycle will arrive at their destination during the same cycle (or any other specific cycle). Thus, it is safer at the high level to make the data self-identifying and to introduce handshaking mechanisms. As a result, large integrated systems may evolve as Globally Asynchronous, Locally Synchronous systems (GALS) [Cha84]. Such systems consist of multiple synchronous modules (e.g., 100 modules, each having 10 million transistors), each driven by its own clock driver. The modules intercommunicate asynchronously, as they are ignorant of each other's clock. The multiple clocks may be completely independent, or they may be driven by a single clock source, where each module receives an arbitrary phase but all operate at the same frequency (multi-sync system). In a fully synchronous processor, substantial area and power are invested to keep the multiple clocks in full synchrony, with minimal relative skew, striving to ascertain the exact same phase in all parts of the chip. In GALS, this increasingly difficult goal is abandoned. Instead, each module is left to deal with asynchronous inputs on its own. If the clocks of the various locally synchronous modules need to run at exactly the same frequency, a minimal area, minimal power network will be used to distribute a single source sine wave clock, and each module will derive its local clock from that sine wave. While such derived clocks are subject to arbitrary relative phase delays and jitter, the main advantage is that substantially less power and area are needed because sine waves incur fewer harmonics and reflections than square wave clocks.

1.3 Previous Asynchronous Processors

Most of the previously published asynchronous microprocessor designs have rather simple and straightforward architectures. None of those processors supports out-of-order execution or considers performance enhancement by branch prediction. Their architectures are either based on micropipelines or built from small building blocks (bottom-up design).

The first asynchronous microprocessor was built at Caltech [MBL+89]. It is based on four common busses connecting the various units in the data path. The architecture is a pipeline with only two stages (Fetch and Execute).

The AMULET [FDG+93, Pav94] is an asynchronous implementation of the commercial ARM processor. Its architecture is based on micropipelines [Sut89]. It does not support out-of-order execution, except for out-of-order completion of load instructions relative to normal ALU instructions. All instructions are conditional, but the AMULET does not contain a prediction mechanism. It suffers from a long penalty when a branch is taken.

The NSR RISC processor [Bru93] is based on a four stage pipeline (Fetch, Decode, Execute, and Write Back). The processor units are each implemented as a micropipeline, and they communicate through FIFO buffers (similar to what is designed in Kin). The processor supports a variable delayed branch, but has no branch prediction or out-of-order execution. Its successor Fred [RB96] has multiple functional units, but instructions are issued (in order) only one at a time, thus limiting the level of parallelism.

The ST-RISC [DGY93] processor is built of FSM and combinational logic basic blocks, and basically consists of Fetch-Execute stages. Modules communicate through FIFO buffers. A branch processor and an ALU operate concurrently and communicate only when a conditional branch is encountered. Neither out-of-order execution, nor parallel issue, nor branch prediction is supported.

TITAC [NUK+94] was developed as a processing element in a parallel computer system. It was optimized for delay insensitivity, rather than performance. Its simple architecture is based on a single accumulator.

The Counterflow Pipeline Processor (CFPP) developed by Sun [SSM94] has an interesting architecture based on two pipelines, one for instructions (flowing in one direction) and the other for results (flowing in the opposite direction). Neither branch prediction nor out-of-order execution is supported, and performance may suffer from the many comparisons and arbitration required at each stage.

STRiP [Dea92] is a pipelined RISC, implementing a predicted prefetch for both instructions and data. Its implementation is based on a dynamic clock, where local clocks are generated in the circuit and their speed changes according to the instruction being executed.

The A3000 [Wol92] is an asynchronous version of the MIPS-R3000 processor, and has five pipeline stages. It is based on an extension of the micropipeline approach, by applying several processing elements between the FIFO units, and constructed of parallel micropipelines.

The design goal of SCALP [End95] was low power operation. As reported, this goal was not achieved, and the processor was 3-4 times slower than comparable processors. No branch prediction was employed (all branches are treated as not taken), which severely limited the performance, due to the high latency when a branch is taken (i.e., a high misprediction penalty).

All the previously designed asynchronous microprocessor architectures are targeted at current technology, and are not extendable to take advantage of the growing amount of resources that will become available with future technology. The Kin architecture is aimed at exploiting future technology and taking advantage of asynchronous architecture, as discussed in the rest of this thesis.

1.4 Thesis Outline

The thesis starts at the high conceptual level, descends into more particular issues, and ends with contributions in the area of Computer Aided Design (CAD).

Kin, a high performance processor architecture suitable for future technology (one billion transistors per chip), is presented in Chapter 2. Kin takes advantage of asynchrony to allow aggressive Avid speculation, to overcome the limitations of branch prediction and dependencies, and to support multi-execution. The novel Avid execution concept is defined and analyzed in Chapter 3.

As an example of fully asynchronous design of a critical component of Kin, an asynchronous instruction length decoder is presented in Chapter 4.

Automatic conversion of synchronous pipelines into asynchronous ones is discussed in Chapter 5. A new pipeline scheme (a doubly-latched asynchronous pipeline) is introduced, which addresses some of the limitations of existing proposals.

Chapter 6 presents a novel methodology for multi-synchronous implementation of Kin, in which the individual modules are all synchronous. That methodology eliminates most of the drawbacks of existing clocking methods.

The design methodology of Kin is introduced in Chapter 7. Statecharts are employed for specification, and special design rules are defined to adapt them for asynchronous design. A complete model of Kin has been developed with the statechart tool, and has been used for simulation and performance evaluation. Validation statecharts are also introduced.

Chapter 2 : Kin Architecture

Kin is an asynchronous microprocessor that supports out-of-order and deep speculative (Avid) execution. Kin architecture comprises multiple fast self-timed units, interconnected over asynchronous channels, using handshake communication protocols. The architecture of Kin supports the Avid execution model (introduced in Ch. 3), where multiple alternative paths, including the targets of some taken and not-taken branches, are prefetched and speculatively executed in order to reduce misprediction stalls. A dynamic instance tag is associated with every instruction, to enable management of the multiple paths. Instructions of different paths are executed together in the out-of-order zone. Paths descending from the branch direction that was not chosen are pruned without preempting useful execution.

Kin has been designed at the conceptual level. A complete specification has been modeled, and a full simulation of a standard performance benchmark (SpecInt95) on the microarchitecture of Kin has been carried out. Kin is designed with future technologies in mind (such as 1B transistor chips); thus, we suspect that full physical implementation will not be feasible for some time.

The novel architecture of Kin is presented in this chapter. The microarchitecture details and instruction pruning are described in Sect. 2.1. Race and deadlock problems and possible solutions are discussed in Sect. 2.2. Kin support of multi-execution is presented in Sect. 2.3, and Sect. 2.4 describes the model constructed for simulating Kin. Section 2.5 discusses possible implementation methodologies for Kin, which are further described in other chapters in this thesis.

2.1 Microarchitecture

2.1.1 General Description

Kin is a general purpose microprocessor that supports out-of-order and deep speculative (Avid) execution. The instruction set supported is not restricted and can be that of either a CISC or a RISC processor [HP96a, Joh91]. Each machine language instruction fetched from the memory is decoded in the processor and translated into (one or several) internal micro-operations (µOps); i.e., the processor uses a ‘RISC’ instruction format internally. Note that the only module of the processor that needs to be changed when a different instruction set is required is the instruction decoder; the rest of the processor remains unchanged. As an example, we simulated the support of three generic instruction types (ALU, Load-Store, and Branch).

Kin’s architecture is based on a distributed network of asynchronously interconnected modules, with no central control. The architecture is designed as asynchronous at the top level, but modules can be implemented according to various timing disciplines, as described in Sect. 2.5. Each module operates at its own speed, and communicates with other modules only when needed. The communication is done through asynchronous channels (which may contain FIFO buffers), by using handshake protocols [Hau95, Sei80]. The data is encoded (e.g., as dual-rail, or bundled data [Hau95]) to signal its validity, and is acknowledged. The use of FIFO buffers decouples the processor. Ideally, all modules are balanced (i.e., on average they have similar delays). However, if one of the modules is temporarily slow, it will not affect the other modules. Only when a FIFO is full will the sending module be stalled.

In contemporary synchronous processors, the collective knowledge about an executed instruction is distributed among the pipeline stages, the controller, the channels into and out of the register file, etc. This location-dependent distribution of data and control information is unmanageable in large distributed systems like Kin. Rather, instructions flow through the system carrying their own identity tags. Each instruction is a self-sustained packet, containing all the information needed for its execution. It may leave some indications, e.g., an instruction entry in the reorder buffer, but these eventually reunite with the instruction, e.g., when it commits or is pruned. Each module receives instruction packets from its input queues, executes them at its own local rate, and sends them towards their next stop over one or more of its output channels. This model resembles the data flow architecture concept.
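The following sketch illustrates this packet-flow view in software terms. It is an illustration only (module names, queue depths, and the shutdown convention are invented, and Python queues stand in for asynchronous FIFO channels): each module is an independent loop that consumes self-contained packets from its input FIFO at its own rate and forwards results, stalling only when its successor's FIFO is full.

```python
import threading
from queue import Queue

def module_loop(name, input_q: Queue, output_q: Queue, process):
    """Generic Kin-style module sketch: no global clock, only local 'handshakes'
    (blocking queue operations). Each packet carries everything it needs."""
    while True:
        packet = input_q.get()       # blocks until a packet arrives
        if packet is None:           # shutdown convention for this sketch only
            output_q.put(None)
            return
        result = process(packet)     # local, self-timed processing
        output_q.put(result)         # forward; blocks only if the next FIFO is full

# Two toy modules connected by bounded FIFOs (depths are illustrative).
q1, q2, q3 = Queue(maxsize=4), Queue(maxsize=4), Queue(maxsize=4)
threading.Thread(target=module_loop, args=("decode", q1, q2, lambda p: p), daemon=True).start()
threading.Thread(target=module_loop, args=("rename", q2, q3, lambda p: p), daemon=True).start()

q1.put({"opcode": "add", "operands": (1, 2), "dit": ("ctx0", "root0", "path", 0)})
q1.put(None)
while (pkt := q3.get()) is not None:
    print("reached end of pipe:", pkt)
```

The bounded queues capture the decoupling described above: a temporarily slow module delays only its own output, and a sender stalls only when the FIFO in front of it is full.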

The Kin architecture is described in Fig. 2-1. It combines many known features, like multiple execution units, out-of-order execution, register renaming, etc. [HP96a, Joh91, Tom67], and some new ones (e.g., Avid Execution, Dynamic Instance Tagging, unified Multi-Execution, and Pruning). Multiple instructions are executed concurrently by employing multiple execution units, and instruction level parallelism is exploited by out-of-order execution, whereby independent instructions are executed before preceding ones which are waiting for data and/or resources. To preserve the serial nature of the code, instructions are committed (completed) in their original serial order, typically more than one at a time. Speculative execution is employed to avoid processor stalls; branches are predicted and code is prefetched from the more likely paths of the program.

Figure 2-1: Kin asynchronous processor architecture.

2.1.2 Processor Modules and Instruction Pruning

The Prefetch Unit (Fig. 2-1) selects the address from which instructions should be fetched and handled. The proper instructions are then either prefetched from the Decoded Instruction Cache, or (if not found there) are fetched from the Instruction Cache and are decoded before being further processed. After the decoded instruction registers are renamed (in a Register Renaming Unit), the instruction enters the ReOrder Unit (ROU), which manages the out-of-order execution. The instruction is sent to a Reservation Station to wait for its operands before it is executed, and returns to the ROU to be committed (in-order). An instruction may be pruned before completing the whole path through the processor units. The operation of each unit is explained below.

Kin architecture supports the Avid execution described in Ch. 3. In Avid execution, the predicted path is prefetched and executed. In addition, a small portion of the non-predicted path is also prefetched and executed. Eventually, one of the two commits and the other is pruned. Pathmarks distinguish between the alternative paths. Kin generates dynamic instance tags and employs branch pruning (instruction purging) to discard unneeded instructions. The branching decision is made by the Branch Unit (BU, in Fig. 2-1). The BU notifies the Prune Management Unit (PMU), and the latter immediately issues a broadcast message prune() of the not-followed path. The message is sent over special channels spanning the entire processor. Each unit, as it receives the prune() message, scans its internal data structures and discards all redundant instructions identified by the prune() message, and also updates the pruning table it has at its input. Subsequently, any instruction arriving at a unit is examined, and if it belongs to a pruned path, it is disposed of. This process need not be exhaustive: An instruction which escapes pruning in one unit will be pruned eventually in another. At the latest, the instruction will be erased (without committing) when it comes back to the reorder unit. This process seems slow, but the cumulative rate of disposal meets the requirements. A beheading mechanism (explained in Ch. 3) is applied to control pathmark growth. The PMU generates and distributes (at proper times) a behead() message over the pruning channels. Every instruction arriving at a unit is checked to see if it should either be discarded or beheaded, before it is processed. Not all the modules in Kin receive prune() and behead() messages. Deciding which units should handle these messages and manage the proper prune and behead tables is a tradeoff between the module execution time, the overhead of handling prune/behead messages at that module, and the predicted performance gain by preventing redundant instructions from leaving the module. For example, computing an integer ‘add’ in an execution unit takes about the same time as doing the compare operation required to detect whether the instruction should be discarded or executed. In Kin, we decided to handle prune/behead messages in the Prefetch Unit (PU), Decoded Instruction Cache (DIC), Register Renaming Unit (RRU), ReOrder Unit (ROU) and Reservation Station (RS). The PU must handle prune/behead messages since it generates the dynamic instance tags. The DIC should prevent unneeded instructions from entering the processor, to decrease the load. The RRU must keep track of the paths for maintaining coherent renaming tables. Every instruction handled by the processor passes through the ROU, and must be pruned there if necessary. Pruning is also important in the RS, so that irrelevant instructions will not wait there forever for the results of other pruned instructions.
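A sketch of the pruning test that such a unit applies to arriving instructions is given below. It assumes, consistent with the prefix pathmarks of Ch. 3 but in simplified form, that a pathmark is a string of branch decisions and that a prune() message names a path prefix; anything descending from a pruned prefix is discarded. The class, method names, and string encoding are illustrative, not taken from the Kin specification.

```python
class PruneTable:
    """Per-unit table of pruned path prefixes, updated by prune() broadcast messages."""
    def __init__(self):
        self.pruned_prefixes = set()

    def prune(self, path_prefix: str):
        # Record a not-followed path announced by the PMU.
        self.pruned_prefixes.add(path_prefix)

    def is_pruned(self, pathmark: str) -> bool:
        # An instruction is redundant if its pathmark descends from any pruned path.
        return any(pathmark.startswith(p) for p in self.pruned_prefixes)

# Example: the PMU announced that the path with prefix "100" was not followed.
table = PruneTable()
table.prune("100")
print(table.is_pruned("1001"))   # True  -> discard the instruction
print(table.is_pruned("101"))    # False -> keep processing it
```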

Prefetch Unit and Dynamic Instance Tagging

The pathmarks, which fully identify the path (see Ch. 3), are generated by the Prefetch Unit and are attached to each instruction at prefetch, as part of a unique dynamic instance tag (DIT), shown in Fig. 2-2. The Prefetch Unit determines which paths to follow according to the branch prediction [LS84, YP92] and Avid execution depth. It issues requests for fetching from the required instruction addresses, along with a proper DIT. The PU, having received a behead() message, henceforth modifies the generated DITs accordingly (as explained in Ch. 3).

[Figure 2-2 shows an instruction packet: the instruction fields (opcode, operands) together with its Dynamic Instance Tag fields (root, path, context, pc).]

Figure 2-2: Dynamic Instance Tag structure.
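A minimal data-structure sketch of the fields in Fig. 2-2 follows; the types and representations are assumptions for illustration, as the thesis does not fix encodings or widths at this level.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DynamicInstanceTag:
    # Fields follow Fig. 2-2; types and widths are illustrative only.
    context: int   # process / thread identifier (see Sect. 2.3)
    root: int      # pathmark root, updated by behead() messages (Ch. 3)
    path: str      # pathmark: the branch-decision prefix identifying this path
    pc: int        # program counter along that path

@dataclass
class TaggedInstruction:
    # The DIT and the instruction travel together as one self-contained packet.
    dit: DynamicInstanceTag
    opcode: str
    operands: tuple
```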

The same basic block of code (or part thereof) may be fetched simultaneously multiple times. Consider a simple loop which ends with a conditional branch. Each time we reach that branch, we should most likely prefetch the same loop again. Each time, the loop is prefetched (and tagged) as a new instance, and must be treated separately by the rest of the machine (e.g., proper register renaming), regardless of the fact that it is the same original code. Another example is the instructions following an ‘if’ clause. They might have different dependencies and require different register renaming, depending on whether the code in the ‘if’ clause was executed or not. This can only be decided at run time. Since we wish to prefetch multiple levels of the branching tree simultaneously, the instruction cache is multiported to provide all the required bandwidth, including multiple separate accesses from the same line. Of course, access optimization techniques are employed to replace brute force multiple reads of the same line by a single access and intelligent duplication, but this should be transparent to the PU. Since many instructions should be fetched and decoded more than once, and repeated decoding is inefficient, predecoded instructions are maintained in a Decoded Instruction Cache (DIC). The relevant µOps are fetched from the DIC, and are sent with their DIT to the register renaming unit. Requested instructions not found in the DIC are fetched from the Instruction Cache and are decoded by the decode unit.

Register Renaming Unit

A Register Renaming Unit (RRU) keeps and handles the renaming tables for the many possible execution paths avidly prefetched, to allow them to be speculatively executed out of order. The renaming process replaces architectural register names with virtual ones, thus filtering out false dependencies [HP96a]. The condition codes are treated as one of the registers and are renamed accordingly. A new physical entry in the ReOrder Buffer (ROB) is allocated for each µOp destination (architectural) register. This entry number serves as the virtual name of the destination register. The µOp source registers are renamed according to the last name allocated to them on the same path or their ancestor path.
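The core renaming step can be sketched as follows. This is a deliberately simplified, single-path illustration (the real RRU keeps such a table per speculative path and also consults ancestor paths); all names are invented.

```python
class RenameSketch:
    """Simplified single-path register renaming: the destination gets a fresh
    ROB entry as its virtual name, sources take the last name allocated to them."""
    def __init__(self):
        self.last_name = {}        # architectural register -> current virtual name
        self.next_rob_entry = 0

    def rename(self, dest, sources):
        # Sources are renamed to the most recent name allocated for them, if any.
        renamed_sources = [self.last_name.get(s, s) for s in sources]
        # The destination is given a freshly allocated ROB entry as its virtual name.
        virtual_dest = f"rob{self.next_rob_entry}"
        self.next_rob_entry += 1
        self.last_name[dest] = virtual_dest
        return virtual_dest, renamed_sources

r = RenameSketch()
print(r.rename("r1", ["r2", "r3"]))   # ('rob0', ['r2', 'r3'])
print(r.rename("r2", ["r1"]))         # ('rob1', ['rob0']): source r1 now names its producing ROB entry
```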

ReOrder Unit

The ReOrder Unit (ROU) manages the out-of-order execution in the processor, and enforces in-order committing of instructions. A ReOrder Buffer (ROB) is used in the ReOrder Unit to keep track of the instructions from the many possible execution paths generated by Avid execution. Instructions may be executed out of order, but they are committed (i.e., their results are written to the ‘real’ registers and to memory) in the same order they appear in the code. Since Kin supports Avid execution, the ROU needs to keep track of a binary tree of paths rather than just a linear sequence of instructions. The reorder unit also keeps a copy of what is usually referred to as the ‘real’ register file, containing the architectural registers. When a µOp arrives at the ROU, not all of its operands are necessarily valid, so the committed register values and speculative values from the ROB are searched in order to fill unresolved operands, before the µOp is forwarded to the reservation stations. After commit, ROB instruction entries and RRU allocations are released.

Reservation Station and Execution Unit

Several Reservation Stations (RS) are used by the instructions as waiting posts until their operands are available. These awaited operand values may arrive as the result of another instruction, or be fetched from memory. Ready instructions (i.e., ones which have all their operands ready) are routed to one of several Execution Units (EU) in the processor. A scheduler (not shown explicitly in Fig. 2-1) is used to determine which instructions go to which execution unit. This scheduler can be implemented either as a simple router or as a sophisticated allocator. After execution, the result of the instruction is distributed to the ReOrder Buffer and to all the Reservation Stations, wherein other instructions might be waiting for it.

Load/Store Unit

The Load/Store Unit (LSU) handles memory access and bypass. It is used to take advantage of locality of reference in data accesses. It is similar to another cache level, but is implemented as an independent smart associative table that tracks load and store operations. Ordering is enforced only when true dependencies are encountered, to guarantee correctness: For instance, Store(X) instructions can bypass Load instructions, but the LSU keeps a record of the previous value of X until Store(X) commits, in case it is needed by an earlier Load(X). Similarly, Load instructions can bypass Store instructions, except for Stores to the same address, in which case the argument is forwarded from the Store instruction. Thus, the LSU can return values even before they are physically written to memory. Giving higher priority to Loads over Stores can increase the issue rate of instructions, because Loads produce values/operands for successive instructions, while Stores can wait without stalling any other instructions. Loads can be executed speculatively without affecting correct operation. However, Stores can only be performed at commit, at which time it is known that the Store is on the actual true path of the program.
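A toy sketch of the bypass behavior described above (pending Stores buffered until commit, Loads served from the youngest matching pending Store, memory written only at commit). It ignores speculation, pruning, and multiple paths, and all names are illustrative.

```python
class LSUSketch:
    """Toy model of LSU bypass: Stores are buffered until commit; Loads are
    served from the youngest pending Store to the same address, else from memory.
    (The real LSU also tracks speculative paths and ordering.)"""
    def __init__(self, memory):
        self.memory = memory           # address -> value
        self.pending_stores = []       # (address, value) pairs, in program order

    def store(self, addr, value):
        self.pending_stores.append((addr, value))   # not yet visible in memory

    def load(self, addr):
        for a, v in reversed(self.pending_stores):  # youngest matching Store wins
            if a == addr:
                return v                             # forwarded before any memory write
        return self.memory[addr]

    def commit_oldest_store(self):
        addr, value = self.pending_stores.pop(0)     # commit in program order
        self.memory[addr] = value

lsu = LSUSketch({0x10: 7})
lsu.store(0x10, 42)
print(lsu.load(0x10))        # 42: forwarded from the pending Store
lsu.commit_oldest_store()    # only now is memory actually updated
print(lsu.memory[0x10])      # 42
```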

Branch Unit

The Branch Unit (BU) resolves a branch instruction and returns the result to the ReOrder Unit, the Prune Management Unit and the Prefetch Unit. Upon receiving branch results, the Prefetch Unit updates the prediction algorithm accordingly, and prefetches new instructions.

2.2 Race and Deadlock Problems

When designing a processor like Kin, one is likely to face the usual difficulties associated with distributed systems and algorithms, such as races and deadlocks. We have encountered several of them during the design and analysis of Kin. Knowledge and experience in the areas of distributed computing, operating systems, and communication networks already exist, and many solutions are applicable to distributed asynchronous processors.

Races

Races are a characteristic of an asynchronous microarchitecture. They might happen when two items (e.g., instruction and operand) must meet and merge, but one is in transition along a channel and the other chases the first. For instance, a µOp result might arrive at a unit before it is required by another µOp: Suppose that some number α, computed by an execution unit, is to be stored into architectural register R, and is also needed as an operand for some instruction I. Assume that I has just passed through the ROU, and is travelling through some FIFO channel from the ROU to one of the reservation stations. Just then, α is sent to all Reservation Stations and to the ROU, where it is stored in R. The various reservation stations realize that none has any instruction waiting for α, so they all discard it. When I finally enters one of the RSs, it is too late: α will not arrive again. This misfortune happens due to the fact that I could have met α in any one of multiple places (e.g., at the ROU before being sent to the reservation stations, or at the RS while waiting in a reservation table), an otherwise desirable feature. We resolve this non-determinism by maintaining multiple stored copies of α: each RS holds a Most Recent Results (MRR) table, wherein (even unclaimed) results are kept for a while. Thus, the RS can keep track of results arriving while the µOps needing them are on their way (e.g., waiting in the FIFO to be processed, or waiting to be written to the reservation table). If the architecture is to be delay insensitive, then the MRR might need to be as large as the ROB (because it might have to store the results of all executed µOps not yet updated in the ROB), and should be updated at commit and pruning time. However, it is enough if it is as big as the sum of the FIFO sizes along the path that might cause the race (e.g., the FIFOs on the channels from ROU to RS, from EU to ROU, etc.), with some safety margins depending on the delay of the modules in the path. The MRR can be cleared completely at certain safe points, e.g., when the affected FIFO channels are empty. Another solution to this kind of race could be to have the reorder unit send the results it receives to the reservation stations. However, this would unjustifiably increase the communication bandwidth between the reorder unit and the reservation stations, which already receive the same results through the bypass mechanism for performance reasons. Implementing the ROU and the RS as a unified module results in a complex and large module, without eliminating the potential for a similar race occurring when updating the shared table.
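A behavioral sketch of the MRR idea follows: results broadcast to a Reservation Station are retained for a while even when no waiting instruction claims them, so a µOp still in transit can pick up its operand later. The capacity, eviction policy, and interface are assumptions for illustration only.

```python
from collections import OrderedDict

class MostRecentResults:
    """Bounded table of recently broadcast results kept at each Reservation
    Station, so operands are not lost while their consumers are still in
    transit through FIFOs. Capacity and eviction policy are illustrative."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.results = OrderedDict()        # virtual register name -> value

    def record(self, tag, value):
        # Keep the result even if no instruction is currently waiting for it.
        self.results[tag] = value
        self.results.move_to_end(tag)
        if len(self.results) > self.capacity:
            self.results.popitem(last=False)   # drop the oldest entry

    def lookup(self, tag):
        return self.results.get(tag)           # None if evicted or never seen

    def clear(self):
        # Safe to flush, e.g., when the affected FIFO channels are empty.
        self.results.clear()

mrr = MostRecentResults(capacity=8)
mrr.record("rob0", 5)        # the result arrives before its consumer instruction
print(mrr.lookup("rob0"))    # the late-arriving instruction still finds its operand: 5
```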

Mutual Exclusion

Mutual exclusion [Ray86] is required when executing some critical calculations such as updating FIFO pointers, or table entries. Arbiters are used to impose mutual exclusion. Although they may take a long time to resolve some conflicts, they never fail, and the asynchronous Kin is insensitive to arbitrary delays (they may affect performance, but not correctness).

A mutual exclusion mechanism for accessing shared tables is demonstrated by the passive FIFO shown in Fig. 2-3. While other types of FIFO channels are also implemented in Kin which do not require arbitration, the arbitrated FIFO is desirable when the FIFO is expected to contain a large amount of data on average. Non arbitrated FIFOs consume substantial power when heavily loaded, due to the need to move all the data along shift registers. The arbitrated FIFO is implemented as a shared memory with pointers to its head and tail. The behavior of the FIFO control is presented by the statechart ([Har87] and Ch. 7) in Fig. 2-4. It defines three concurrently executing processes (separated by dotted lines): writing to and reading from the FIFO, and the arbiter that imposes mutually exclusive access. The operation of the fair arbiter is defined by the statechart in Fig. 2-5. The arbiter is implemented in hardware [Sei80, Mar85].

Figure 2-3: Inter-module communication. (The figure shows Module A and Module B communicating through a FIFO channel, using Data lines together with Valid/Ready and Ack handshake signals.)

Figure 2-4: FIFO control statechart. (The statechart defines three concurrent processes: Write_To_FIFO, Read_From_FIFO, and the shared Fair_Arbiter that grants Write_Grnt/Read_Grnt for mutually exclusive access to the FIFO memory.)

Figure 2-5: Fair arbiter statechart. (The arbiter toggles a TURN variable to serve competing Write_Req and Read_Req requests fairly.)
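A software analogue of the arbitrated FIFO of Figs. 2-3 through 2-5 is sketched below: a shared circular buffer whose head/tail pointer updates are serialized by an arbiter, modeled here by a (not necessarily fair) lock. This is a behavioral sketch only; the real arbiter is a hardware mutual exclusion element [Sei80, Mar85], and the class and method names are invented.

```python
import threading

class ArbitratedFIFO:
    """Behavioral sketch of the shared-memory FIFO of Fig. 2-3: data stays in
    place and only head/tail pointers move, so a heavily loaded queue need not
    shift data. The lock stands in for the hardware arbiter of Fig. 2-5."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0                      # next entry to read
        self.tail = 0                      # next entry to write
        self.count = 0
        self.arbiter = threading.Lock()    # grants exclusive pointer access

    def write(self, item):
        with self.arbiter:                 # 'Write_Grnt' in the statechart
            if self.count == len(self.buf):
                return False               # Full: the sender must retry (stall)
            self.buf[self.tail] = item
            self.tail = (self.tail + 1) % len(self.buf)
            self.count += 1
            return True

    def read(self):
        with self.arbiter:                 # 'Read_Grnt' in the statechart
            if self.count == 0:
                return None                # Empty: the receiver must retry
            item = self.buf[self.head]
            self.head = (self.head + 1) % len(self.buf)
            self.count -= 1
            return item
```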

Deadlocks

Deadlocks in Kin might happen when cyclic buffers are clogged. This is the only kind of deadlock that is possible in the architecture of Kin, since the processor units are self-contained, and do not depend on each other for resources or operation. The only correlation between the modules is by messages sent between them. A module receives a message, processes it and forwards it to its successor. The deadlock we have encountered in Kin was caused by a cycle of modules unable to complete sending a message because the FIFO at the input of their successor in the cycle was full. Each module is released only after it has successfully sent the message. When a module waits to complete communication, it cannot read a message from its input FIFO and free an entry in it. An abundance of distributed algorithms have been developed for prevention and avoidance of such deadlocks, as well as for detection thereof and recovery in case they do happen.

The simplest solution is to prevent, or avoid, deadlocks by making the FIFOs large enough. If unlimited FIFOs were available, there would be no chance for a deadlock to happen. The question is how to determine the practical size needed to prevent the deadlock. A heuristic algorithm can be applied to calculate the size of the FIFOs, as a function of the ROB size, RS size, module execution times, instruction types and mixes, possible paths an instruction might take among the modules in the processor, etc. The ROB module serves as a ‘sink’ for the µOps in the sense that every µOp arriving at it will eventually be consumed, without a risk of deadlock (since triggering the committing process is done by signaling, e.g., updating a mutually exclusive flag, rather than by sending a message, there is no danger of having a closed cycle of FIFOs). The size of the ROB is an upper bound on the number of µOps being processed in the various modules. Thus, by making the FIFOs as large as that number, we can obviously prevent the communication deadlock from occurring. However, this bound can be tightened by noticing the way a µOp is handled and processed by the modules in the processor: if a (µOp) message is waiting in one FIFO to be read and processed, it (or its result) cannot, at the same time, either be waiting in another FIFO or be sent out. That is, it suffices that the sum of the sizes of the FIFOs along each possible cycle in the processor architecture equals this bound. This method of choosing the size of the FIFOs is delay independent, and might be improved by considering the module delays (which change when using another implementation process), and statistical analysis of the instructions being executed by the asynchronous processor.
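The sizing rule described above can be sketched as a small helper: given the channels forming one cycle and the ROB size (the bound on in-flight µOps), it distributes capacities so that they sum to that bound. The example cycle, the even split, and the names are illustrative; as noted above, the split can be biased by module delays and instruction statistics.

```python
def fifo_sizes_for_cycle(cycle_channels, rob_size):
    """Distribute FIFO capacity so that the FIFOs along one cycle of channels
    sum to the ROB size (the bound on in-flight micro-ops), which suffices to
    prevent the communication deadlock described above. The even split is
    arbitrary; it may be biased by module delays and instruction mixes."""
    n = len(cycle_channels)
    base, extra = divmod(rob_size, n)
    return {ch: base + (1 if i < extra else 0)
            for i, ch in enumerate(cycle_channels)}

# Hypothetical cycle of channels: ROU -> RS -> EU -> ROU, with a 64-entry ROB.
print(fifo_sizes_for_cycle(["ROU->RS", "RS->EU", "EU->ROU"], rob_size=64))
# {'ROU->RS': 22, 'RS->EU': 21, 'EU->ROU': 21}
```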

Another option to prevent a deadlock is to have a ‘deadlock warning’. Since deadlocks always contain a cycle of full FIFOs, detecting a cycle which is close to this situation can be treated as a warning of a potential deadlock. This warning can trigger a mechanism to slow down (or even temporarily stop) new messages from entering the cycle of modules (similar to a ‘leaky-bucket’ mechanism [BG92]). This can be done in the asynchronous architecture without affecting the correct operation of the system. The warning should be given early enough to enable some entries to be freed before letting new messages propagate into the suspected cycle, but not too early, so that the processor does not have to slow down unnecessarily.

Algorithms to detect deadlocks and recover from them can be either centralized or distributed. The messages that need to be transferred for detection and recovery must use separate communication channels, so that they will not be stopped by the deadlock. An interesting idea might be adapted from the algorithm described in [JS89] for uni-cast communication networks. An ‘emergency buffer’ can be allocated in each module, to be used only in case a deadlock is detected. Since all modules in the cycle are waiting for a free entry in their successor's buffer, moving one message from the FIFO to the emergency buffer will temporarily release the deadlock, and might solve the problem if it enables some messages to be sent outside the cycle. However, for a reliable solution, this algorithm should be further developed and defined. Since a deadlock in the processor might be caused by a module outside the cycle (similar to multi-cast communication networks), the suggested algorithm does not guarantee recovery from the deadlock.

We should consider the effect on performance when selecting the proper solution to the deadlock problem. The processor modules are designed to be as balanced as possible, to avoid creating bottlenecks which limit performance. Variances of module delays and instruction flow rates can be used to simulate and analyze the architecture in order to select the proper FIFO sizes. Simulations of Kin showed that the FIFOs are, most of the time, either empty or holding only a single message. Thus, small FIFOs can be used, and in some cases the FIFOs are redundant and a DLAP structure (introduced in Ch. 5) is quite adequate.

2.3 Multi-Execution on Kin

The high processing bandwidth of advanced processors is not fully exploited, because of true dependencies in the code and mispredicted branches. Three separate efforts, all aimed at increasing instruction processing rates by concurrent execution of multiple instructions, are typically employed: out-of-order superscalar execution (multiple instructions of the same serial code), multi-threading (multiple threads of the same program), and multi-processing (of different programs). All three (and more) can be unified under a single multi-execution model, and are handled almost identically by Kin. In addition, Avid execution (defined below in Ch. 3) is employed to speculatively execute multiple alternative paths of the same context.

In Kin’s unified multi-execution model, instructions from different contexts co-exist simultaneously within the processor. Each instruction is fully identified by its tag (Fig. 2-2): the specific path it belongs to and the program counter along that path, its specific thread, and its specific process. The prefetch, register renaming, and the reorder units take care of organizing all this. Others, such as reservation stations, schedulers, and the various functional units (all residing inside the Out-Of-Order zone), largely ignore the identification tag and simply process instructions as they come.

Instructions are fetched under control of the Prefetch Unit (PU). The PU maintains multiple contexts, one for each of the active processes and each of the active threads inside each process. To support Avid Execution, the PU initiates parallel prefetches of multiple basic blocks per thread and process, and also computes and assigns a unique pathmark per path. The PU prepares, for each instruction, a unique dynamic instance tag (DIT). The DIT identifies the context (process and thread), the pathmark (root and path), and the program counter (Fig. 2-2). The DIT and the instruction are henceforth packaged together. The non-preemptive pruning mechanism is managed by the PMU.
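A possible C sketch of such a tagged instruction is given below; the field names and widths are illustrative assumptions, not the actual Kin encoding:

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical layout of the Dynamic Instance Tag (DIT). */
    typedef struct {
        uint16_t process;    /* context: process id                    */
        uint16_t thread;     /* context: thread id within the process  */
        uint16_t root;       /* root mark (advanced by behead messages)*/
        uint32_t pathmark;   /* prefix-notation path within the tree   */
        uint32_t pc;         /* program counter along that path        */
    } dit_t;

    /* Each instruction travels through the processor with its DIT. */
    typedef struct {
        dit_t   dit;
        uint8_t opcode;      /* decoded operation (illustrative)       */
    } tagged_instr_t;

    int main(void)
    {
        tagged_instr_t ti = { { 1, 0, 0, 0x5u /* path "101" */, 0x400100u }, 0x83 };
        printf("process %u, thread %u, pc %x, DIT size %zu bytes\n",
               ti.dit.process, ti.dit.thread, ti.dit.pc, sizeof(dit_t));
        return 0;
    }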

Like the PU, the in-order instruction management units RRU and ROU are multiplied in Kin to support multiple contexts. Each instruction is channeled, based on the context portion of its DIT, to the appropriate RRU and ROU. Furthermore, the RRU and ROU internal data structures are implemented as associative tables, and their algorithms are enhanced to support the multiple simultaneous paths of the same context. All three units (drawn as triangles in Fig. 2-1) maintain binary trees of paths, rather than linear instruction sequences.

As described, Kin enforces only the minimal data dependencies (both register and memory), and dynamically exploits the Instruction Level Parallelism (ILP) of a program during its execution. Since Kin is asynchronous and operates as a distributed system, it is not limited by a single clock cycle period, so its resource allocation can be dynamically adjusted according to the ILP characteristics of the multi-execution paths it follows.

2.4 Kin Model

A model of Kin is specified based on statecharts [Har87] and C code. The architecture of Kin (i.e., the modules, the channels between them (including fork and merge of channels), FIFOs on the channels, etc.) is defined by using the Statemate MAGNUM tool [iLo96], which is based on statecharts. Asynchronous communication handshake protocols, as well as mutual exclusion mechanisms, are also implemented as statecharts. Internal algorithms completely specifying module behavior (e.g., branch prediction algorithm, internal tables handling, etc.) are described as program fragments in C code. The statechart model controls the operation of the model, and activates the C code parts by interfacing and triggering the required computations. This formal and operational specification of Kin enables us to execute event driven simulations, required for asynchronous design. Modules react to messages arriving at their inputs, process the data and generate proper outputs. Handshake protocols and mutual exclusions are controlled and executed by statecharts. The interface between the statechart model and the external C programs uses handshaking protocols for communications and regards the programs as self-timed modules. Each program may be assigned a (variable) delay as its operation duration (Ch. 7).

Kin’s model has a (partly) synthesizable specification: The parts defined by statecharts can automatically be synthesized into VHDL or Verilog description, and be converted to asynchronous implementation afterwards (as explained in Ch. 5). The parts written in C can also be defined by statecharts and treated in the same way.

Kin’s model was used for debugging and performance evaluations of the architecture and specifically for simulating the avid execution concept with various depths. Animation of the model helped us identify deadlocks, races and bottlenecks in earlier versions of the architecture. As

explained in Ch. 3, we used SpecInt95 traces for the simulations, and gathered information on average and worst case FIFO and table sizes, committing and pruning rates, and program execution times. The author developed and constructed the model of Kin, and defined the algorithms of all its modules. Some of the C programs were written by [Sha97], and used for simulating Avid execution in Kin, as described in Ch. 3.

2.5 Implementation Methodologies

As explained above in Ch. 1, a uni-synchronous implementation is not feasible for Kin. Technological constraints and sheer complexity dictate a distributed system, as opposed to single-clocked, fully synchronized contemporary processors. Kin is designed as an asynchronous system at the architecture level and is not based on a synchronous pipeline, i.e., there are no synchronized stages operating at the same rate, where one can say exactly what happens at each stage at specific points in time. Increasing performance by applying and managing Avid execution in such an asynchronous architecture is described in Ch. 3. Each of the units in Kin can be internally implemented as an asynchronous circuit. As an example of a complete asynchronous design, from architectural level to circuit implementation, Chapter 4 presents the architecture of an asynchronous instruction length decoder. The processor architecture, consisting of the modules and channels (with or without FIFOs), can be considered as having a complex, non-linear, pipeline structure. To increase performance, each module can be internally implemented as a pipeline. An asynchronous pipeline implementation is presented in Ch. 5. If the processor is implemented as a multi-synchronous system, the adaptive synchronization method presented in Ch. 6 should be applied.

Chapter 3 : Avid Execution and Instruction Pruning

Performance of present processors is limited by a number of factors, including true and false dependencies, limits to inherent instruction level parallelism in serial code, and pipeline stalls due to misprediction of branches. Processors must concurrently execute and complete many more instructions than provided by the average serial code. Although many execution units may be available, data and control dependencies limit the instruction level parallelism. To exploit instruction-level parallelism in full, it is not enough to look just at the instructions in a single basic block (i.e., the instructions between consecutive branches). A wider window is required. On average, every fifth instruction is a branch. To enable the prefetch of sufficiently many instructions, processors have to look beyond the next branch even before it has executed. This can be done by using various branch lookahead strategies.

As explained above (Ch. 1), technology advances will provide many more transistors to be available for use in the design of complex processors. It will also dictate an asynchronous architecture. We propose to take advantage of the available hardware resources, and the asynchronous architecture, to gain higher performance, by applying a dynamically adjustable smart speculative Avid Execution, defined in this chapter. The asynchronous architecture of Kin is most suitable for handling the resulting variations of computational load, because it is not limited to a worst case operation.

In the following sections, we examine the limitations to the execution rate of a program, and analyze the effects of branch mispredictions and misprediction penalties. Previous work on speculative execution methods is reviewed, and the novel Avid Execution concept (supported by the asynchronous processor Kin, Ch. 2) is presented. Instruction pruning and beheading mechanisms, used as part of the implementation of Avid execution in Kin, are defined. Performance analysis, as well as simulation results, are given.

3.1 Introduction and Previous Work

3.1.1 Execution Model

Statistical analysis of program traces shows [HP96a] that 20% of all instructions are branches, i.e., on average, every fifth instruction is a branch. Processor architecture is generally based on

pipelines to increase parallelism and performance [HP96a]. Conditional branch instructions found in programs cause the processor pipeline to stall, since the next instruction to be fetched after the branch instruction is unknown until the branch is executed and resolved. To prevent this kind of stall in the processor, various branching lookahead strategies are applied. The branch result is predicted and instructions are fetched and handled from the predicted address. If the prediction is correct, the processor operation continues without any stall, as the right instructions are already in the pipeline. On a misprediction (when the prediction is incorrect), the pipeline is flushed to clear it of all the wrongly fetched instructions. Instructions are then fetched from the proper address and handled by the processor pipeline. Misprediction Penalty is defined as the time required for the pipeline to fill up after a flush, until instructions start to commit again. This time depends linearly on the pipeline depth, and the number of stages between the fetch and branch resolution stages.

Out-of-order execution takes advantage of the Instruction Level Parallelism (ILP) found in programs and reduces program execution time by executing independent instructions concurrently. Value prediction methods [LS96, LWS96, GM97a, GM97b] have been suggested for speculatively reducing data dependencies in order to increase the available ILP. The higher the parallelism utilized by hardware resources, the higher the instruction execution rate. It also means that more branch instructions are executed at a higher rate, and the misprediction rate increases as well. This adverse effect is compounded by another setback: the deeper the processor pipeline and the higher the parallelism, the higher the penalty paid for a misprediction. Most of the processor units are stalled while flushing the wrongly fetched instructions and waiting for the correct ones to arrive.

Instruction execution rate (defined as the number of instructions executed per time unit) is thus limited by several factors (mispredictions, available ILP, hardware resource parallelism, and misprediction penalty). Figure 3-1 defines the following execution model. If there were no mispredictions, the execution rate (represented by the dashed line) would be limited by the available instruction level parallelism and the available hardware resources. However, since mispredictions do happen, the processor is able to execute instructions only during n time units before a misprediction is encountered. During that time, E instructions (the shaded area in the graph) are committed. Following a misprediction, no instructions are committed for the next m time units (the misprediction penalty time). After that, instructions start to commit again. As described in the graph, a misprediction might happen even before the maximum execution rate is achieved. This phenomenon repeats itself, and the average execution rate, represented by the dotted line in Fig. 3-1, is thus given by

R = E / (n + m)     (3-1)

The number of instructions that are executed before a misprediction occurs is a function of the prediction accuracy of the branch prediction algorithm used, as can be seen from Fig. 3-2. Given that about every fifth instruction is a branch, and a prediction accuracy of p, misprediction occurs once every Emiss=5/(1-p) instructions. E.g., for a prediction accuracy of p=0.9, one out of ten predictions is probably wrong and a misprediction is thus expected about once every 50 instructions. For a prediction accuracy of p=0.95, misprediction is expected once every 100 instructions, etc.
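The quoted figures follow directly from this relation; a trivial C illustration (assuming, as above, one branch per five instructions):

    #include <stdio.h>

    /* Expected number of instructions between mispredictions. */
    static double emiss(double p) { return 5.0 / (1.0 - p); }

    int main(void)
    {
        double ps[] = { 0.85, 0.90, 0.95 };
        for (int i = 0; i < 3; i++)
            printf("p=%.2f -> Emiss = %.0f instructions\n", ps[i], emiss(ps[i]));
        return 0;
    }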

Figure 3-1: Instruction execution rate (execution rate over time, limited by min(ILP, H/W); E instructions execute during n time units, followed by a misprediction penalty of m time units).

Figure 3-2: Number of executed instructions till misprediction, as a function of prediction accuracy p (Emiss = 5/(1-p)).

From studies (cf. [HP96a]) of instruction level parallelism (ILP) found in programs (when there are no mispredictions), the ILP can be shown as a function of the instruction window size (Fig. 3-3). The ‘window’ is the set of instructions examined as candidates for potential execution. The maximum window size is limited by the hardware resources available (e.g., the size of the reorder buffer). ILP is defined as the number of independent instructions in the window that can be executed concurrently. Figure 3-3 presents the ILP of four benchmark programs [HP96a], and their average ILP. When a program executes, the window size gradually increases, and the number of instruction candidates for execution depends on the number of new instructions entering the window every time unit. The rate of new instructions depends on the processor width and parallelism (i.e., on the number of instructions that can be fetched, decoded, and renamed concurrently in the processor pipeline stages at each time unit). The window size is also a function of the number of instructions leaving the window every time unit when issued to be executed. This number depends on the available ILP which, as explained above, depends on the instructions inside the window and the window size at that time.

In order to make the explanation and analysis clearer and easier to understand, we shall define the average time period required to complete handling instructions at a processor stage as a ‘cycle’.

Note that we use the term ‘cycle’ for convenience; it does not necessarily imply that the processor has either a synchronous or an asynchronous implementation. ‘Cycles’ should be thought of as time periods, which might be all equal, e.g., (one or several) clock cycles in the case of a synchronous processor, or have an average length and variance in an asynchronous processor. The analyzed behavior, however, remains the same for the synchronous and asynchronous cases, since both are designed to be as balanced as possible to prevent execution bottlenecks.

We define the possible parallelism available by the hardware as w (processor width), and say that at every cycle w more instructions enter the window. Also, every cycle some instructions (according to the ILP) leave the window and are executed. By analyzing the behavior of the average ILP as a function of the window size, we can conclude how many instructions can be issued for execution. Results of executed instructions enable other instructions from the window to be executed in the following cycles. The number of instructions executed every cycle is a function of the hardware parallelism and the ILP. As explained above, the ILP is in turn a function of the variable window size, which depends on the number of instructions entering and leaving the window every cycle. The result of this dynamic process is shown in Fig. 3-4, which plots the total executed instructions as a function of the number of cycles since the start of execution. The processor width (hardware parallelism) is a parameter.
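The dynamic process described above can be illustrated by a small simulation sketch (not from the thesis); the logarithmic ILP model and its constants are assumptions standing in for the average curve of Fig. 3-3:

    #include <stdio.h>
    #include <math.h>

    /* Assumed average-ILP model as a function of window size. */
    static double avg_ilp(double window)
    {
        return window > 1.0 ? 2.0 * log(window) : 1.0;
    }

    int main(void)
    {
        const int    w       = 10;      /* processor width           */
        const double max_win = 400.0;   /* hardware window limit     */
        double window = 0.0, executed = 0.0;

        for (int cycle = 1; cycle <= 20; cycle++) {
            window += w;                        /* w instructions enter */
            if (window > max_win) window = max_win;
            double issued = avg_ilp(window);    /* ILP-limited issue    */
            if (issued > window) issued = window;
            window   -= issued;                 /* issued ones leave    */
            executed += issued;
            printf("cycle %2d: executed %6.1f\n", cycle, executed);
        }
        return 0;
    }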

Figure 3-3: Studies of ILP as a function of window size (average ILP and the ILP of the fpppp, espresso, li, and gcc benchmarks).

Figure 3-4: Cumulative executed instructions as a function of time cycles (assuming no misprediction), with hardware parallelism (w = 5, 10, 15, 20) as a parameter.

The dashed line in Fig. 3-4 marks Emiss=100 instructions executed (which, as explained above, is the number of instructions expected to be executed before a misprediction occurs at prediction accuracy of 95%). As can be observed from Fig. 3-4, the higher the hardware parallelism (the

more resources available), the less time (fewer cycles) required to complete Emiss instructions, but obviously there are diminishing returns. Further, it is clear that the higher the processor width (w), the more frequently a misprediction occurs. The lines in Fig. 3-4 are clearly linear, and regression analysis yields the following empirical trends for E (the number of executed instructions) as a function of the number of cycles n and the hardware parallelism w:

E = (4.1 ln w - 1.5) n + (4.7 ln w - 5)     (3-2)

Substituting E = Emiss and Eq. (3-2) into Eq. (3-1) yields:

R = (4.1 ln w - 1.5) / { 1 + [(1-p)/5] [(4.1 ln w - 1.5) m + 4.7 ln w - 5] }     (3-3)

This expression is plotted in Fig. 3-5. Again, it can be seen in Fig. 3-5 that a high hardware parallelism w does not contribute much to increased performance, since the limit is the ILP of the program. Increasing the maximum window size available in hardware does not help either, since the misprediction occurs long before the window fills up. As the misprediction penalty increases, the effect of a higher width w becomes negligible, since most of the time is spent paying the misprediction penalty rather than doing computation. Thus, if the misprediction penalty is high, investing in higher parallelism of the processor does not improve the average execution rate and processor performance. The higher the hardware parallelism, the higher the misprediction rate; and the deeper the pipeline, the higher the penalty paid for a misprediction.
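The behavior captured by Eq. (3-3) can be tabulated with a few lines of C; the regression constants (4.1, 1.5, 4.7, 5) are those of Eq. (3-2), and the chosen w, m, and p values are illustrative:

    #include <stdio.h>
    #include <math.h>

    /* Average execution rate per Eq. (3-3). */
    static double rate(double w, double m, double p)
    {
        double a = 4.1 * log(w) - 1.5;     /* slope of Eq. (3-2)     */
        double b = 4.7 * log(w) - 5.0;     /* intercept of Eq. (3-2) */
        return a / (1.0 + (1.0 - p) / 5.0 * (a * m + b));
    }

    int main(void)
    {
        const double p = 0.95;
        for (int w = 5; w <= 20; w += 5)
            for (int m = 0; m <= 100; m += 20)
                printf("w=%2d m=%3d  R=%5.2f\n", w, m, rate(w, m, p));
        return 0;
    }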

Figure 3-5: Hardware parallelism and misprediction penalty effect on execution rate (execution rate as a function of the misprediction penalty in ‘cycles’, for w = 5, 10, 15, 20 and p = 0.95).

Figure 3-6: Tree of possible execution paths (vertices are branch instructions, edges are basic blocks).

The motivation for Avid execution is reducing the misprediction penalty, by utilizing ‘spare’ hardware resources not used due to limited ILP. Before defining and explaining Avid execution principles, we briefly survey and discuss previous works on applying speculative execution. These methods include single path speculative execution, eager execution, multiple path exploration, and disjoint eager execution.

3.1.2 Previous Work

The tree of possible execution paths is shown in Fig. 3-6. The vertices of the tree are the branch instructions. Each edge is a basic block, i.e., a sequence of nonbranch instructions terminated by a branch instruction.

Single Path Speculative Execution

Contemporary processors employ Single Path Speculative Execution, whereby each branch is predicted as either taken or not-taken, based on its past history [Cra92, HP96a, LS84, YP92]. The branch prediction can either be static (e.g., based on trace profiling, or always predicted as not taken) or dynamic (based on its behavior history at run time). For taken branches, the target address is also predicted. Only instructions from the predicted path are prefetched. On misprediction, the processor is flushed and the correct path is fetched. Current branch prediction algorithms are p=85%-95% accurate [HP96a]. For p=90%, Emiss=50 (Fig. 3-2). For example, if the misprediction penalty is 5 cycles, and on average 1.5 instructions are executed per cycle, then the slowdown due to misprediction is 15%:

slowdown = (m × IPC) / Emiss = (5 × 1.5) / 50 = 15%     (3-4)

Figure 3-7 presents the misprediction effect on performance as a function of parallelism and misprediction penalty (with prediction accuracy as a parameter).

As explained above, single path speculative execution is highly sensitive to the quality of branch prediction and to pipeline depth. An execution tree of depth n contains n edges for single path speculative execution, so the cost is linear in the overall depth of prediction. However, the probability of correct prefetch over n levels falls off exponentially as p^n.

Figure 3-7: Misprediction effects on performance (slowdown as a function of IPC and pipeline depth, with prediction accuracy p = 0.85-0.95 as a parameter).

Eager Execution

In Eager Execution all paths are prefetched and (speculatively) executed. When a branch is encountered, execution proceeds down both paths of the branch. Multiple resources are required to support the parallel prefetch and execution of multiple paths. Once a branch is executed, its ‘losing’ sub-tree may be aborted and disposed of, and the corresponding resources can be released. As explained below, Eager execution is not practical and has not been implemented in any processor. Thus, no mechanism was developed for discarding irrelevant instructions. Since Avid execution (as explained below) requires efficient instruction discarding, a special non-preemptive pruning mechanism (Sect. 3.3) was developed for Kin (Ch. 2); it is performed continuously and simultaneously with regular execution, without flushing the processor. The principal benefit of eager execution is that misprediction stalls are eliminated. However, eager execution is exponentially wasteful: of the 2^n - 1 edges of an n-level execution tree, only n edges are on the true path and eventually commit, while the remaining 2^n - 1 - n edges should be discarded. Since about 20% of all instructions are branches, the average basic block length is 5 instructions. If we consider an execution tree of depth n=3, then only 20 (out of 75) instructions are to be committed, while 55 should be purged. As the tree depth grows, the ratio between the irrelevant instructions and the required ones grows exponentially. Due to the enormous amount of resources required to implement eager execution, and the relatively high accuracy of available prediction algorithms, eager execution is impractical.

Multiple Path Exploration

Multiple Path Exploration was suggested in [Mag80, MTM81]. The idea is to limit the eager execution to a tree depth of m levels. Thus, 2^m paths are explored simultaneously. After processing a set of 2^m paths, only one path remains valid, while all others are discarded. Another set of 2^m paths, generated from the valid path, is explored next. A path code is used to identify each path, similar to the pathmarks (Sect. 3.3) used for Avid execution. However, these path codes have a constant length, and no pruning is applied. Only after all 2^m paths are completed,

the invalid ones are cleared. The system implementation presented in [Mag80, MTM81] contains 2^m processors (one for each possible path), and a central processor which generates the instructions of each branch path from the original program and issues them to the proper machines. While the instructions of m branch levels are being executed, the controller generates 2^2m paths of the next m branch levels. Only 2^m of these paths are used, and the rest are discarded, when the valid path from the previous m-levels block is determined. The results from the data cache of the selected processor are copied into the shared memory, and the processors are then assigned to the next 2^m paths. Multiple Path Exploration requires approximately exponential hardware resources. The same instruction might be executed many times concurrently in different processors (since it belongs to many paths), even if eventually all of its copies are invalidated. This implementation also requires a very high power dissipation.

Disjoint Eager Execution

Disjoint Eager Execution [US95] is based on calculating cumulative prediction probabilities, and assigning resources to the code most likely to be executed over all unexecuted code. The basic blocks which have the highest cumulative probabilities are fetched and executed. On a misprediction the processor is flushed. When the prediction accuracy p → 1, disjoint eager execution practically converges into single path speculative execution, since the first alternative path will be selected only after going n levels deep down the execution tree, when (1-p) > p^n. For values of p close to 1, the number of levels in the tree that will be followed before considering any alternative path is very high. E.g., for p=0.9, the condition holds for n=22. But since p=0.9, a misprediction is expected about once every 10 branch predictions. For p=0.95, the condition holds for n=59, while a misprediction is most likely to happen about once every 20 branches. Thus, this model is inconsistent for typical levels of p.

Implementing disjoint eager execution requires a dynamic computation of cumulative prediction accuracies, every time the root of the execution tree changes (i.e., every time a branch instruction is committed). Because of difficulties with dynamic computation of those probabilities, static profile-based probabilities are proposed instead. The architecture presented in [US95] is based on a static instruction window, which is replicated along with bookkeeping hardware matrices per each execution unit. Unit latencies are assumed for the model, and low misprediction penalty is expected if the misprediction does not change the contents of the static instruction queue. It is stated that the effect on performance of using non-unit latencies in the model is not clear. Dependency lists are computed and maintained, so that if a branch mispredicts, its dependent instructions are squashed. A result is written to memory only when all of an instruction’s depending branches have been resolved.

Avid Execution, defined in the following section, is designed to avoid the pitfalls of all these methods.

3.2 Avid Execution

3.2.1 Avid Execution Concept

Avid Execution results from combining the single path speculative and eager execution methods, such that the probability of misprediction is kept very low, while the exponential cost of eager execution is replaced by an approximately linear cost. The Avid execution method is basically an eager execution whose eagerness is limited, based on prediction. As in single path speculative execution, the predicted path is prefetched and executed. In addition, for each branch encountered and predicted, part of the subtree predicted as not-taken, some k levels deep, is also fetched into the processor and speculatively executed.

The number k of prefetched levels in the non-predicted subtree is adjustable. Figure 3-8 shows two examples of Avid execution depth, for k=2 and k=5. The main predicted path is marked by solid thick lines, and the extra (avidly handled) paths are drawn as dashed lines. Note that if k=0, Avid execution is reduced to single path speculative execution. For k=1, about 50% of all instructions fetched will be pruned, since for every predicted basic block another basic block from the not-predicted path is also fetched. The price of exponential demand for resources is avoided in case of Avid execution, and is replaced by an approximately linear one: For Avid execution with depth k, the number of edges in the execution tree is (k+1)×n-1. Avid execution can produce instructions at a sufficient rate to reduce or even eliminate the stall on misprediction, as analyzed below in Sect. 3.2.2. The unneeded instructions are pruned asynchronously, without preempting continuous operation of the processor, as described in Sect. 3.3.
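A small C sketch (not from the thesis) contrasts the two costs, using the edge counts quoted above: 2^n - 1 edges for an n-level eager tree versus (k+1)×n - 1 edges for Avid execution of depth k:

    #include <stdio.h>

    int main(void)
    {
        for (int n = 1; n <= 10; n++) {
            long eager = (1L << n) - 1;                 /* 2^n - 1 edges     */
            for (int k = 0; k <= 2; k++) {
                long avid = (long)(k + 1) * n - 1;      /* (k+1)*n - 1 edges */
                printf("n=%2d k=%d: avid=%3ld eager=%5ld\n", n, k, avid, eager);
            }
        }
        return 0;
    }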

Selecting the Avid depth can be done either statically (e.g., all conditional branches have the same alternative path depth), or dynamically. Dynamic adjustment of the Avid depth can be done for each branch instruction, and can be based on statistics collected at run time. If confidence is attached to predictions [JRS96, Smi81], the Avid depth can be adapted accordingly. For example, when a saturating up-down counter is used for making predictions [LS84, YP92], predictions made at the counter extremes are more accurate (about 85%-95% of them are correct) than predictions made away from the counter extremes (when only about 60%-70% are correct) [Smi81]. When the prediction confidence level is low, a deeper Avid depth should be used, while for a high confidence prediction a small Avid depth (or none at all) might be better. Obviously, k=0 for unconditional branches. A possible depth-selection policy is sketched below.
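One possible (hypothetical) policy, mapping the state of a 2-bit saturating prediction counter to an Avid depth, might look as follows; the specific depths chosen are illustrative only, since the thesis only simulated fixed depths:

    #include <stdio.h>

    enum { STRONG_NT = 0, WEAK_NT = 1, WEAK_T = 2, STRONG_T = 3 };

    /* Deeper Avid subtree when the prediction confidence is low. */
    static int avid_depth(int counter_state, int is_conditional)
    {
        if (!is_conditional)
            return 0;                                  /* no alternative path */
        if (counter_state == STRONG_NT || counter_state == STRONG_T)
            return 1;                                  /* high confidence     */
        return 3;                                      /* low confidence      */
    }

    int main(void)
    {
        for (int s = 0; s < 4; s++)
            printf("counter=%d -> k=%d\n", s, avid_depth(s, 1));
        printf("unconditional -> k=%d\n", avid_depth(STRONG_T, 0));
        return 0;
    }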

Observe that the first edge of each alternative path described in Fig. 3-8, originating from each branch instruction (a tree vertex), is the branch direction predicted as not being followed. The following edges of the alternative paths are selected by branch prediction. Another option for selecting the alternative paths in Avid execution (instead of following ‘single path’ alternatives),

is to span a (limited depth) sub-tree from the path predicted as not taken. Avid execution can be recursively applied to the alternative paths as well. Obviously, if the alternative sub-tree is as deep as the predicted one, and follows all possible paths, then Avid execution becomes eager execution.

The more alternative paths followed by Avid execution, the more resources required. Our simulations verify that following single path alternatives is quite adequate when prediction accuracies are high. Spanning more alternative paths results in diminishing returns.

Figure 3-8: Examples of Avid Execution depth (k). m is the number of processor pipeline stages between prefetch (PF) and branch resolution (EX) stages.

Some related methods have been previously proposed. IBM RS/6000 [Cra92, HP96a] predicts every branch as not taken, but fetches instructions from the other path in case the prediction is wrong. Instructions from the alternative path are not issued for speculative execution. Similarly, IBM 360/91 [Cra92, HP96a] could prefetch instructions from both branch directions, but speculative execution of instructions from both paths was not possible. The IBM 370 [Flo74] had a small auxiliary instruction buffer for speculative prefetched instructions. Branches were predicted as taken or not taken based on their type (i.e., opcode). Once a branch was resolved as mispredicted, the content of the auxiliary instruction buffer was copied into the main instruction buffer.

3.2.2 Performance Analysis of Avid Execution

Assume there are m stages in the processor between the prefetch stage (marked as ‘PF’ in Fig. 3-8), in which instructions are fetched into the processor, and the execution stage (marked as ‘EX’), where the branches are resolved. These stages include operations such as decoding, renaming, operand fetching, scheduling, etc., of the instructions, as described in Ch. 2. If the Avid depth is k=2 (Fig. 3-8 on the left), then each processor stage must be able to handle instructions

from three basic blocks. For k=m, each processor stage handles instructions from m+1 basic blocks. After the instructions of a basic block commit in-order, the entire subtree is shifted and progresses in the processor stages, while new paths are predicted and prefetched. For every branch on the predicted path, alternative paths (emerging from it) of length k are also predicted. When a branch is resolved at the execution stage (possibly out-of-order), the redundant path is pruned. Execution continues along the resolved path uninterrupted, regardless of whether it was predicted or not. The misprediction penalty of stalling all processor stages while waiting for the correct instructions to be fetched and handled is avoided, or reduced, as follows. Two cases of misprediction penalty for Avid execution are presented in Fig. 3-9.

(a) Case I: No penalty. (b) Case II: Reduced penalty. Figure 3-9: Misprediction penalties in Avid Execution.

In this example, the Avid depth k=m. In Case I (Fig. 3-9(a)), execution is following the path marked A. Assuming there is no misprediction of several previous branches, the Avid subtree (including the alternative paths marked B to H) has been fetched and is handled in the various stages of the processor. The first two predictions are proved correct (checkmarked ‘T’), execution continues along path A, and paths B and C are pruned. The next prediction is incorrect (marked ‘X’), and the rest of path A is pruned along with all the alternative paths emerging from it (E through H). After the misprediction, execution starts following path D. Since path D has already been predicted, fetched, and partially handled for depth m, none of the processor stages is stalled.

Note that none of the alternative paths of the new main speculative path D has been prefetched, so they must now be predicted and fetched. These ‘holes’ in the Avid tree are presented by dotted lines. If there is no subsequent misprediction for at least m branches, then the Avid tree is completed again. If a misprediction happens at a later time, e.g., at the ‘X’ mark leading to path O, then again no misprediction penalty is incurred, since path O already has m basic blocks in the processor.

Figure 3-9(b) shows Case II. After switching to path D, a second misprediction occurs early, after only a few (fewer than m) correctly predicted branches. Execution should now follow path M, which has not

yet reached the execution stage. Some of the processor stages are stalled while waiting for the M path to proceed through them. While path M moves forward towards the execution stage, its alternative Avid paths are predicted and prefetched to restore the proper Avid tree. Since part of the alternative path M does exist in the processor at the time of the misprediction, only a reduced penalty is incurred. This is in contrast to what happens after misprediction on a single path speculative execution processor, wherein the whole processor is flushed and stalled.

Average execution rate (as defined in Sect. 3.1.1) is used to measure the performance improvement that can be achieved by Avid execution (this measure is similar to the ‘instructions per cycle’ parameter used to express synchronous processor performance). We identify the possible cases of mispredictions, and weigh the different misprediction penalties according to the probability of their occurrence. We concentrate on cases of two mispredictions separated by some correctly predicted branches. For prediction accuracies as high as can be achieved today (i.e., p>0.9), the probability of three successive mispredictions is (1-p)^3, which is negligible.

If i is the number of consecutive basic blocks executed without a misprediction between two successive mispredictions, then i-1 consecutive branches were correctly predicted. The misprediction penalty depends on the number of stages between the prefetch and execution stages (m), on the Avid depth (k), and on the number of basic blocks executed without a misprediction (i). The last two parameters affect the size of the ‘bubble’ in the processor. Thus, the misprediction penalty (in ‘cycles’) is given by

(3-5)

and the average rate Ri is defined as

Ri = Ei / (n + mi)     (3-6)

where n is the time (number of ‘cycles’) required to execute Ei instructions between the two mispredictions, and mi is the misprediction penalty given by Eq. (3-5).

Mispredictions which occur once every m branches or more (i.e., i ≥ m) have the same reduced penalty, which depends on k (if k=m, then there is no penalty, as explained above). The total average execution rate is defined as the sum of the execution rates of all possible cases, weighted by their probabilities, and is calculated as:

R = Σi (1-p) p^(i-1) Ri     (3-7)

where p is the prediction accuracy.

As an example of performance improvement achievable by Avid execution, consider a case of m=5, p=0.95, and enough hardware resources available (for the alternative paths as well) so execution is limited only by ILP and mispredictions. As can be seen from Tab. 3-1, up to 50% increase in performance can be achieved by Avid execution, depending on the avid depth applied.

k     Average Rate [#instr/time]     % Performance
0              6.62                       100
1              7.09                       107
2              7.64                       115
3              8.28                       125
4              9.03                       136
5              9.93                       150

Table 3-1: Average execution rates achieved by various Avid depths (k), for m=5, p=0.95, and high bandwidth (execution limited by ILP).

Obviously, Avid execution can contribute more to performance gain when misprediction penalty (m) is higher, since reducing that penalty has a larger effect on overall performance. The lower the prediction accuracy (p), the higher the increase in performance that is achievable by Avid execution, because mispredictions happen more often and Avid reduces the penalty paid.

On the other hand, even though Avid execution can optimally use any ‘spare’ processing bandwidth which is not utilized due to limited parallelism in code, it might also hamper performance if there are insufficient resources. The deeper the avid depth, the more resources are required. Most of the instructions are pruned at early stages of the processor, but if the bandwidth (w) is not wide enough the extra instructions might slow the execution down. Still, if the reduction of misprediction penalty increases performance more than the decrease in execution, the overall performance is increased. Some examples are presented and further explained by simulation results in Sect. 3.5.

3.3 Pathmarks, Pruning Management, and Beheading Mechanism

Avid execution prefetches and executes both directions of each branch. Eventually, one of the two commits and the other is pruned. Many of the instructions flowing through the processor should

be discarded. As explained in Ch. 2, Kin performs clean-up tasks (‘pruning’) on the fly, without preempting execution, and without stalling the processor for a centralized flush. Pruning removes the unneeded instructions, while the others (the relevant ones and the ones that are still speculative) remain untouched. Pruning does not stall the processor: Rather, it is executed concurrently while the processor continues to fetch, issue, execute, and commit instructions. Since Kin is an asynchronous processor, it cannot rely on a central clock or control to be used for simultaneously flushing unneeded instructions. Non-preemptive pruning employs pathmarks and a distributed algorithm to discard useless instructions.

Pathmarks distinguish alternative paths. Each edge of the execution tree is assigned a unique pathmark, based on prefix notation of binary trees. If an edge (a basic block, terminated by a branch instruction) is marked by m, then the sequentially following edge and the branch target edge are marked m0 and m1, respectively (Fig. 3-10). The root is marked by the empty string. The pathmark is described by accumulating these bits as a road map to follow from the root until the edge the instruction is on. Note that the marks of all edges in the (dashed) subtree of node n are prefixed by n. The pathmarks, which fully identify the path, are generated dynamically during program execution and are affixed to each instruction at prefetch, as part of the Dynamic Instance Tag (DIT), as explained in Ch. 2.

Figure 3-10: Pathmarks based on prefix notation.

When a branch is resolved (i.e., executed), one of the directions (according to the branch result) becomes obsolete, and the entire subtree emerging from it should be pruned. All instructions from the obsolete subtree that have already been fetched and issued into the processor (none of them has been committed, of course) can be ‘marked for deletion’ by broadcasting a single prune() message to all the units. This message contains the prefix that defines the subtree to be expunged. Pruning is performed by comparing pathmark prefixes. All instructions having a matching pathmark prefix are discarded. Once the branch marked m is executed, the subtree that must be pruned is known:

m1 if the branch was not taken, m0 otherwise. Respectively, a message prune(m1) or prune(m0) is distributed to the entire processor over the pruning channels, as described in Ch. 2. Out-of-order resolution is supported: assume that a branch mk (which is the k-descendant of m, where k is some binary word) is encountered before m. An appropriate prune(mki) message (i ∈ {0,1}) is sent. When, at a later time, the prune(m) message is sent, it overrides the former one, which is contained in the latter.
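The following C sketch (illustrative only; Kin packs pathmarks into a fixed number of DIT bits rather than into strings) shows the pathmark construction and the prefix comparison performed on a prune() message:

    #include <stdio.h>
    #include <string.h>

    #define MAXLEN 32

    /* The fall-through successor of edge m is m0, the taken target is m1. */
    static void child_mark(const char *m, int taken, char *out)
    {
        snprintf(out, MAXLEN, "%s%c", m, taken ? '1' : '0');
    }

    /* prune(prefix): an instruction is discarded iff its pathmark starts
     * with the pruned prefix. */
    static int is_pruned(const char *pathmark, const char *prefix)
    {
        return strncmp(pathmark, prefix, strlen(prefix)) == 0;
    }

    int main(void)
    {
        char m0[MAXLEN], m1[MAXLEN];
        child_mark("10", 0, m0);     /* "100": fall-through of edge "10" */
        child_mark("10", 1, m1);     /* "101": taken target of edge "10" */

        /* The branch on edge "10" resolves as taken: prune subtree "100". */
        printf("%s pruned? %d\n", m0, is_pruned(m0, "100"));   /* 1 */
        printf("%s pruned? %d\n", m1, is_pruned(m1, "100"));   /* 0 */
        return 0;
    }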

The pathmark length grows very fast, as one more bit is attached on every branch. On the other hand, much of the information of the pathmark becomes irrelevant when processing progresses down the tree. Consider a mark m, L-bits long. Once a branch instruction marked m commits, the pathmarks of all useful subsequent instructions in the processor will be prefixed by m. Since the m prefix is now redundant, it should be beheaded. A distributed beheading algorithm is used to contain pathmarks growth.

To behead prefixes, a root mark R is added to the DIT (Fig. 2-2). Occasionally (e.g., every L committed branches), a behead(R,m) message (R=path root, m=path prefix) is generated (by the PMU) and distributed over the pruning channels. Following the receipt of such a message, each unit modifies each instruction it handles as follows: If the instruction’s DIT contains root mark R and pathmark prefix m (of L bits), then the root mark is updated to R+1 and the pathmark is left-shifted by L bits. In effect, linear pathmark growth is thus replaced by logarithmic growth of the root mark.
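Continuing the same string-based illustration (again, not the Kin implementation), the beheading step can be sketched as follows: on behead(R, m), every instruction whose DIT carries root mark R and a pathmark prefixed by m advances its root mark to R+1 and drops the leading bits of m from its pathmark.

    #include <stdio.h>
    #include <string.h>

    struct tag { unsigned root; char pathmark[32]; };

    static void behead(struct tag *t, unsigned R, const char *m)
    {
        size_t L = strlen(m);
        if (t->root == R && strncmp(t->pathmark, m, L) == 0) {
            t->root += 1;
            /* "left-shift" the pathmark by L bits (drop the prefix) */
            memmove(t->pathmark, t->pathmark + L, strlen(t->pathmark) - L + 1);
        }
    }

    int main(void)
    {
        struct tag t = { 0, "10110" };
        behead(&t, 0, "10");                      /* branch "10" committed */
        printf("root=%u pathmark=%s\n", t.root, t.pathmark);  /* 1, "110" */
        return 0;
    }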

A similar prefix notation of binary trees is used in the multiple path exploration scheme [Mag80, MTM81], to distinguish between the 2^m paths that are executed. However, in that application all paths have the same length, and there is no need to use beheading, since the next new 2^m paths are introduced into the processor only after one of the previous paths has committed, and the entire processor is cleared of old instructions.

FIFOs can be used for the implementation of the pruning and beheading tables. Thus, a limited number of messages are kept at all times. New messages arriving will ‘push’ older ones out. This way the old and redundant messages are automatically discarded.

Since Kin is an asynchronous and distributed processor, races, such as a new prune message being tested against an instruction which has not been beheaded yet, should be resolved. This can be handled either by having the beheading process update the pruning tables, or by rechecking for pruning after beheading an instruction. Cumulative beheading is also handled.

The number of the execution tree levels which exist simultaneously in the processor, although changing dynamically, is limited by the available hardware capacity. The number of bits in the

pathmarks reflects this limit. It need not necessarily accommodate the worst case; upon saturation, execution may be delayed until the beheading mechanism frees some bits in the pathmarks. The number of different root values that might exist in the processor is also limited, and the root mark may be allowed to ‘wrap around’. A beheading message can be sent whenever a branch commits. But, to reduce the overhead of behead messages, the beheading information is accumulated by the PMU (see Ch. 2), and broadcasted only after L branches commit. The value of L is a function of the commit rate in the processor and the number of bits in a pathmark.

Kin’s Prefetch Unit generates the DITs, but it has no knowledge that an instruction is a branch until it is decoded and recognized as a branch in the decode unit. Since no prediction is available for branches when they are encountered for the first time, some extra bits are used (as part of the DIT) to indicate a basic block id. The basic block id is changed whenever the decoding unit sees a branch instruction for the first time (this means, of course, that the branch has been treated as predicted not-taken, but the other edge of the branch has not been prefetched). This way the pruning (if needed) can be done to a part of an edge, starting after the mispredicted branch.

3.4 Asynchronous Architecture for Avid Execution

Avid execution is more suitable to asynchronous architectures than to synchronous ones. Since many hardware resources are utilized and the architecture is complex, a large chip (or even a multi-chip implementation) is required to implement the processor design. As explained in Ch. 1, due to signal propagation delays and clock distribution problems, the processor will have the architecture of a distributed system, most suitable for asynchronous design.

The workload that the processor needs to handle varies widely. Since an adaptive Avid depth is applied, the number of instructions handled varies considerably over time. Handling the register renaming for instructions from multiple paths of the execution tree requires flexible comparisons and updating. Even the ‘in-order’ commit process is not the same as in contemporary synchronous processors, since the instructions to be committed are not necessarily stored contiguously in the ReOrder Buffer; rather, the ROB contains ‘holes’ and should be searched associatively. A variable number of instructions can be ready to commit at different times. To increase performance, it is better to operate at speeds close to the average case and slow down occasionally, than to always work as slow as the worst case of the variable load. This is easily achieved by a self-timed asynchronous architecture.

Broadcasting prune and behead messages (required as part of Avid execution) to various processor units may slow down a synchronous processor because of long signal transmission time, affecting the clock cycle time. A dynamic and distributed algorithm is applied for instructions

pruning, and it is most suitable for an asynchronous architecture. There is no central control, and each unit handles the pruning at its own speed. There is no stall of the whole processor while pruning is done, since there is no need to wait until all the units acknowledge that they have completed the pruning. The rate at which prune and behead messages are generated and handled is not constant; it can vary significantly as a function of some properties of the executed program (e.g., the length of the basic blocks), and even during the execution of a code (e.g., as prediction accuracies change). An asynchronous architecture is tolerant to such variances, while a synchronous design would either have to work much slower all the time, or would fail when occasionally a long computation occurs.

Avid Execution requires a wide memory bandwidth to prefetch and decode many instructions concurrently. Increasing code density, thus reducing the number of bytes fetched from memory, is achieved by using a variable length instruction set. Chapter 4 describes the design of an asynchronous instruction length decoder that speeds up the decoding of such instructions.

Avid execution was developed for asynchronous processors like Kin. Although it can be adapted for use in a synchronous processor (if such a complex and large processor is ever feasible as a single synchronous processor), its performance potential will not be fully exploited by a synchronous processor, for the reasons explained above.

3.5 Simulation Results

A model of Kin with Avid execution was developed by the author, as described in Ch. 2. The author defined the Avid execution and pruning handling algorithms that were implemented, and specified the tests to be made, as well as analyzed their results. Simulations of synthetic traces, and SpecInt95 benchmark programs, were carried out on the Kin model by [Sha97], with various parameters, as explained below. This section summarizes and analyzes the simulation results.

A branch prediction algorithm based on [YP92] was implemented. Various prediction accuracies were obtained by changing the size of the branch target buffer (implemented as 1-way set associative). The pathmarks were limited to 32 bits, and 16 bits (of which at most 5 were used during the simulation runs) were allocated for the root mark. Beheading was issued every two branch commits.

Avid execution was simulated for three possible (fixed) Avid depths: k=0, 1, or 2. The processor hardware width (i.e., the number of instructions that can be handled concurrently in each processor unit) was varied as either 20, 40 or 80 instructions. Avid execution spanning an ‘eager’ subtree for k=2 was also simulated, and demonstrated a diminishing return, as expected. The

results were at best the same as those obtained by Avid execution spanning a ‘single path’ for k=2, and at times worse, due to high prediction accuracies and lack of resources.

Two synthetic traces were generated and simulated on a processor model of width w=20, assuming several prediction accuracies (p=85%-100%). Each of the traces contained basic blocks of length 5. The first trace had no dependencies between instructions, thus the performance was limited only by branch mispredictions. The second trace was built with full dependency (i.e., every instruction depends on its predecessor), hence execution was limited by data dependencies and branch mispredictions.

The simulation results are summarized in Fig. 3-11. The values in the tables are the average execution rate (i.e., the number of instructions committed per time period), and the performance percentage of Avid execution (for k=1,2) relative to the case of single path speculative execution (k=0). When there are no dependencies (Fig. 3-11(a)) and no mispredictions (i.e., p=100%), Avid execution reduces the execution rate (from 19.69, which is almost equal to the processor width w=20) to 50% and 33%, for k=1 and 2, respectively. This is expected, since Avid execution uses resources that could otherwise serve the ‘true’ path. Since there are no dependencies, the execution along the ‘true’ path can proceed with no stall, except when a branch is mispredicted. When the prediction accuracy is reduced to p=80%, Avid execution no longer decreases the performance. With even lower prediction accuracies, the overall performance can be increased by applying Avid execution.

            p=80%       p=85%       p=90%       p=95%       p=100%
            R      %    R      %    R      %    R      %    R       %
k=0        3.56   100  4.97   100  6.22   100  9.95   100  19.69   100
k=1        3.56   100  4.35    88  5.52    89  6.99    70   9.90    50
k=2        3.56   100  4.35    88  4.97    80  5.54    56   6.61    33

(a) Trace with No Dependencies.

            p=80%       p=85%       p=90%       p=95%       p=100%
            R      %    R      %    R      %    R      %    R       %
k=0        1.04   100  1.16   100  1.22   100  1.30   100   1.43   100
k=1        1.26   121  1.32   114  1.35   111  1.38   106   1.43   100
k=2        1.42   137  1.42   122  1.42   116  1.43   110   1.43   100

(b) Trace with Full Dependencies.

Figure 3-11: Synthetic traces simulation results (for w=20).

When there are many dependencies between the instructions (Fig. 3-11(b)), the available resources in the processor are not fully utilized, and execution is mainly stalled due to the data dependencies. When the branch prediction is not perfect (i.e., p<100%), Avid execution can use the ‘spare’ idle resources to work concurrently on some of the instructions from alternative paths. For p=95%, Avid of depth k=1 is 6% better, and Avid of depth k=2 is 10% better than the single path speculative case. The performance improvements grow to 21% and 37% (for k=1,2, respectively), when p=80%. Similar behavior was observed for other tested synthetic traces, which had smaller and larger basic block sizes.

Obviously, non-synthetic programs have a mixture of dependencies between the instructions they contain. All SpecInt95 programs were simulated with several Avid execution depths (k=0, 1, 2) and various processor widths (w=20, 40, 80) [Sha97]. The simulation results for w=40 are presented in Fig. 3-12, and analyzed below. Not all simulations yielded prediction accuracies higher than 90%, thus they do not reflect the diminishing-return point, expected at higher prediction accuracies, where Avid execution utilizes resources that might be better used in the ‘main’ predicted path. The highest prediction accuracy achieved was 97%, for the Ijpeg program.

The simulation of Compress95 (Fig. 3-12(a)) shows that for k=2, performance is best up to p=86% (except for the case of p=84%, where k=1 performs better). Above p=88%, k=0 is best. For Gcc (Fig. 3-12(b)), k=2 is better than k=1 up to p=83%, where they become equal; at that point, they both give 11% higher performance than k=0. For the Go program (Fig. 3-12(c)), k=2 shows better performance than either k=1 or k=0, up to p=85%. Simulation of Ijpeg (Fig. 3-12(d)) resulted in the highest performance for k=2 up to p=77%; then k=1 gives better performance up to p=97%. Both are always better than k=0. A similar behavior was seen for the Li program (Fig. 3-12(e)), where k=2 performs better than k=1 up to p=92%. At p=93% they switch, but both are still better than k=0. The M88ksim program (Fig. 3-12(f)) also behaves the same, but the switch in performance gain between k=2 and k=1 occurs at p=76%. For Perl (Fig. 3-12(g)), k=2 is best up to p=86%; then k=1 is better than k=2, while k=0 becomes the best. Vortex (Fig. 3-12(h)) always resulted in the best performance for k=2 (up to p=95%), while even k=1 was better than k=0.

As explained in Sect. 3.1, the average execution rate is affected by several parameters, namely the instruction level parallelism, the prediction accuracy, the processor width and the misprediction penalty (pipeline depth). The effects of prediction accuracies and different instruction level parallelism are shown in Fig. 3-12. To demonstrate the effects of the other parameters on the execution rate and the performance gain by applying Avid execution, the same program (M88ksim) was simulated on the Kin model, with different parameter values. The results are summarized in Tab. 3-2.

Figure 3-12: SpecInt95 simulation results (for w=40): (a) Compress95, (b) Gcc, (c) Go, (d) Ijpeg, (e) Li, (f) M88ksim, (g) Perl, (h) Vortex. The graphs describe the average execution rate (R) as a function of prediction accuracy (p), with Avid execution depth (k) as the parameter.

Table 3-2 contains both the absolute values of the execution rates measured for every case, and the performance percentage relative to that achieved by k=0 in each comparable case. As can be seen, when the processor width is doubled from w=20 to w=40, the execution rate (for k=0) increases by less than 8% in the best case. This is because the execution rate is limited by instruction

level parallelism (data dependencies) and misprediction penalty. However, Avid execution has more free resources to utilize and for p=88% increases the performance by 6% in case of w=20, and by 12% in case of w=40. For p=71%, performance is better by 20-30% when Avid execution is applied. Further increasing the processor width to w=80, results in a diminishing return, due to the limited parallelism found in the program. It is worth noting that the simulation model suffered from a limited number of pathmark bits (which causes the processing of Kin to stall when pathmark bits are exhausted, until some are beheaded). This can be avoided if more bits are allocated for the pathmarks, but it is not expected to change the results very much.

The wider the processor, the more instructions are handled concurrently. Thus, more branch instructions are executed, more mispredictions happen per time period, and more often the misprediction penalty has to be paid. As can be seen from Tab. 3-2, k=2 is better than k=1 only up to p=76% for w=20, and w=40, but it remains better for p=88% for w=80. Similar behavior was observed for the other traces simulated, where the ‘switch’ between the performance gains of k=0,1,2 occurs at different prediction accuracies, depending on the processor width.

Kin does not have a pure pipeline structure, but the units in it can be viewed abstractly as pipeline stages. By changing relative timing of the units we could change the pipeline effective ‘depth’, and affect the misprediction penalty. Setting the processor width to w=20, and changing the pipeline depth (the misprediction penalty) is also described in Tab. 3-2. Although the execution rates increase because of the lesser stall on each misprediction, the performance increase gained by Avid execution remains relatively the same (comparing by percentage of improvements).

Due to the limited number of pathmark bits implemented in the simulated model, we tested the effect of a deeper Avid depth by using a very short pipeline (2-3 stages deep, compared to a maximum Avid depth of k=2). As can be seen from the results in Table 3-2, the execution rates increased due to the even smaller misprediction penalty, and Avid execution resulted in an average performance improvement of 25% for p=88%, and much more (up to 80%) for lower prediction accuracies.

Another interesting effect of Avid execution observed in the simulations concerns the total number of instructions (and bytes) fetched from memory. In several cases, k=1 actually resulted in fewer instructions being fetched, since k=0 had to flush many of the instructions brought from memory. Hence, Avid execution does not always result in increased memory bandwidth.

We only implemented and simulated a fixed Avid depth scheme. The results indicate, however, that better performance can be achieved when an adaptive Avid depth is used, based on the prediction accuracy (or prediction confidence) of each branch, as defined in Sect. 3.2.

p        50%        56%        61%        71%        76%        83%        88%
         R     %    R     %    R     %    R     %    R     %    R     %    R     %
w=20
  k=0   2.29  100  2.65  100  2.79  100  3.87  100  4.10  100  4.78  100  5.32  100
  k=1   3.35  146  3.43  129  3.88  139  4.54  117  4.91  120  5.55  116  5.65  106
  k=2   3.73  163  3.92  148  4.24  152  5.00  129  5.14  125  5.51  115  5.62  106
w=40
  k=0   2.27  100  2.63  100  2.83  100  4.03  100  4.42  100  5.07  100  5.48  100
  k=1   3.51  155  3.55  135  4.08  144  4.84  120  5.34  121  6.14  121  6.14  112
  k=2   3.94  174  4.31  164  4.50  159  5.22  130  5.69  129  6.09  120  6.09  111
w=80
  k=0   2.39  100  2.70  100  2.96  100  4.01  100  4.39  100  4.70  100  5.48  100
  k=1   3.63  152  3.66  136  4.23  143  4.96  124  5.22  119  5.68  121  6.00  109
  k=2   4.13  173  4.47  166  4.64  157  5.27  131  5.47  125  5.94  126  6.05  110
w=20, Shorter pipeline
  k=0   3.46  100  3.91  100  4.12  100  5.44  100  5.65  100  6.74  100  7.51  100
  k=1   5.04  146  5.12  131  5.65  137  6.64  122  6.93  123  7.81  116  7.83  104
  k=2   5.49  159  5.74  147  6.14  149  7.12  131  7.38  131  7.86  117  7.95  106
w=20, Very short pipeline
  k=0   3.51  100  4.04  100  4.21  100  5.38  100  6.06  100  6.93  100  8.38  100
  k=1   5.51  157  5.53  137  6.26  149  7.19  134  7.83  129  9.13  132 10.34  123
  k=2   6.35  181  6.77  168  6.98  166  7.99  149  8.62  142  9.57  138 10.63  127

Table 3-2: M88ksim trace simulations performance results.

3.6 Concluding Remarks

Greater investments in hardware resources might result in diminishing returns, when no meaningful performance increase is achieved and performance might even suffer because of larger chips and deeper pipelines. Avid execution is aimed at better utilization of the available resources.

We have developed, analyzed, and simulated the Avid execution concept, along with the appropriate pruning and beheading mechanisms. The Avid parameters (e.g., depth) can be adjusted dynamically according to prediction confidence. We introduced the Dynamic Instance Tag (DIT) to uniquely define a path, and defined a set of operations on the DIT to ensure that useful computation is executed and useless computation is discarded. Avid execution applies pathmarks and pruning to execute instructions from many paths as soon as their operands are ready, but to stop executing the remaining instructions on a path as soon as it is known that the path will not be taken.

We have simulated a fixed Avid scheme, but other alternatives (e.g., dynamically adjusted depths and various spanning trees) were defined, and should be further simulated and analyzed with regard to their effect on performance. The structure of the spanned tree (the Avid depth and width) can be dynamically adjusted, depending, for example, on the prediction accuracy, the prediction confidence, and the distance (in tree levels) from the main predicted path (without computing cumulative probabilities). We expect higher performance improvements when such a dynamic Avid scheme is applied.
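As an illustration only, a confidence-based depth policy could look like the following sketch (the function name and the thresholds are hypothetical and are not part of the Kin model):

def avid_depth(confidence, max_depth=2):
    # Sketch of an adaptive Avid depth policy: the depth k spawned for a
    # branch is chosen from the estimated probability (0..1) that its
    # prediction is correct.  Threshold values are invented for illustration.
    if confidence > 0.95:
        return 0            # near-certain prediction: pure speculation
    if confidence > 0.80:
        return 1            # fairly confident: keep one alternative level
    return max_depth        # low confidence: spawn the full Avid depth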

Asynchronous architectures (such as Kin) are best suited for Avid execution, because of the design complexity (large chips) and the great variance of the computational load. As explained in Sect. 3.5, Avid execution does not necessarily result in excessive memory bandwidth requirements.

As with any speculative algorithm, Avid execution is limited by branches whose target address varies and is not predictable because it is calculated at run time (e.g., a jump through a register). These branches can be handled either by the compiler (e.g., by replacing a multiple-target branch with a sequence of compare and jump instructions), or by better predictors implemented in hardware. When Avid execution detects a mispredicted branch target, all instructions in the processor are pruned, and execution continues from the correct address with a new root number, so no centralized flushing is required.

Avid execution can use ‘hints’ provided by the compiler, indicating what to do with a branch the first time, or each time, it is encountered. The help provided by a compiler can include initial information about the target address, the branch prediction, the branching probability, and the recommended depth of Avid execution, per branch instruction.

Chapter 4 : An Asynchronous Instruction Length Decoder

The Kin architecture is asynchronous at its highest level, while the implementation of each module can follow almost any timing discipline. To complete this investigation at all levels of the design, a fully asynchronous implementation of a non-trivial module in Kin has also been attempted. The author has been fortunate to take part in such a study performed at Intel Corporation. While the Intel team focused on ‘hacking’ an asynchronous instruction length decoder (AILD) for best performance, the author investigated a strictly formal specification of the same module, employing the statechart tool (Ch. 7). Thus, the architecture of the AILD presented in this chapter is not the same as the circuit designed at Intel. A secondary purpose of this effort was to provide a verification framework for the Intel project. The third goal was to investigate the applicability of the statechart tool and of our methodology to the complete design cycle. At the same time, the author also contributed to the Intel design, as described below.

Avid Execution (Ch. 3) requires a wide memory bandwidth to prefetch and decode many instructions concurrently. While a variable length instruction set reduces the number of bytes that need to be fetched from memory, its decoding is difficult and may pose a bottleneck in a high performance microprocessor. The AILD architecture employs extensive parallelism to accelerate this operation as much as possible.

4.1 Introduction

The Intel x86 processor family implements a variable length instruction architecture wherein instructions can vary in length from one byte to eleven bytes or more [Int94]. However, memory systems, and in particular the cache memory used to store instructions prior to execution, typically store data in fixed size blocks, such as 16 bytes. Instructions are fetched in 16 byte lines aligned on 16 byte boundaries. Accordingly, in a variable length instruction architecture, each fixed sized line fetched from memory contains instructions of various lengths that may start anywhere within the line and may even cross line boundaries.

The decoding of a variable length instruction is, by nature, a serial operation, since the beginning of a particular instruction can be determined with certainty only after the beginning and length of the previous instruction have been determined. To decode several instructions concurrently, a fast length decoding mechanism is needed to mark the beginning of each instruction before the relevant bytes can be sent to be decoded. Decoding variable length instructions is rather simple if the length is explicitly stated at the beginning of each instruction, or if a ‘control field’ is associated with each group of instructions, indicating the layout of the instructions within the group, as in [End95]. Neither of these approaches applies to the x86 instruction set, and they cannot be used for existing program code. In the AMD-K5 [Chr96], which is an x86-compatible microprocessor, the decoder partially decodes instructions in a serial manner when they enter the instruction cache. Five bits of information are stored along with each byte to indicate instruction boundaries, distinguish between key bytes such as prefix versus opcode, etc. This adds a large overhead to the cache size, and does not avoid the serial decoding of instruction length. In this chapter we describe the architecture of a highly parallel asynchronous instruction length decoder. Lengths are speculatively calculated in parallel, and a fast marking mechanism is used to detect the first byte of each instruction.

Current implementations of the variable length instruction decoding circuits are synchronous (clocked). Each unit in a synchronous system must complete its calculation within the given clock cycle, and the clock cycle is determined according to the worst case delay of all units. Hence, the design should be optimized for the worst case, even though only a small subset of all the instructions in the instruction set are encountered most of the time. Unlike synchronous circuits, the asynchronous instruction length decoder can be optimized for the most frequently used instructions, by handling the common cases very fast, while rare cases are handled more slowly.

The self-timed design of the instruction length decoder is optimized for the common (most frequent) instructions, and its speed is not limited by the slowest path. Thus, average case (instead of worst case) performance can be achieved. There is no need to optimize the rarely used circuits and computations, and data dependency is exploited for faster operation. The design takes advantage of the fact that a small subset of the instructions are executed most frequently, and the length decoding is optimized for this subset. Statistical analysis [BGK+97] of instruction traces of SpecInt92 benchmark programs shows that only 30% of all possible opcodes are used 90% of the time. The statistical analysis also indicates [BGK+97, HP96a] that 99.8% of the instructions are seven bytes long or less, and 92% are no longer than 5 bytes. Only 1% of the instructions include a prefix.

The format of an instruction from the x86 instruction set [Int94] consists of an opcode, ModR/M and SIB bytes, and displacement and immediate fields. All the parts except the opcode are optional. The opcode is either one or two bytes long. The existence of a ModR/M byte depends on the opcode, with no easy-to-decode rule. When present, the ModR/M byte indicates the existence of the SIB byte and the existence and length of the displacement and immediate fields (up to four bytes each). Some instructions have a displacement field even though there is no ModR/M byte. An instruction may be preceded by prefix bytes which affect the instruction. Only two prefixes affect the instruction length. The maximum valid instruction length is 11 bytes (excluding prefixes). The length of an instruction is determined by up to four bytes (a 1-2 byte opcode and additional optional bytes); however, in many cases a single byte suffices.

The AILD design is optimized for the common cases (common instructions and lengths), while rare cases are handled more slowly. The goal was to decode 3-4 instructions in one nanosecond, which is five times faster than a 200MHz PentiumPro processor.

4.1.1 Author’s Contribution

The design and implementation of an asynchronous x86 instruction length decoder were undertaken at Intel Corporation by a group of researchers [BGK+97]. This section seeks to clarify the contribution of the author to the work described in this chapter.

The author participated in the design of the AILD, and contributed to the architecture definition, design methodology, and logic design. Considering alternative designs, making design decisions, and choosing an implementation for the AILD were done by the whole group but with substantial input from the author.

The entire AILD architecture and the interfaces between the modules were specified and codified by the author using statechart models (some examples are shown in Sects. 4.2.1 and 4.2.5). This contributed to the completeness and correctness of the architecture and to a better understanding of the control protocols that needed to be implemented. The statechart models define the module behaviors at various levels of detail.

Section 4.2.1 contains the description of the AILD architecture. The author contributed to the design by defining a regular structure of the columns, including the routing of the many marking lines that flow between columns and rows. The author also defined the logic for decoding the marking and priority encoded switches for the steering circuit, based on the one-hot encoding of the lengths.

The author contributed to the definition and characterization of handling special cases, such as prefixes, long instructions, and branches, which are described in Sects. 4.2.2-4.2.4.

The author did not directly contribute to the implementation synthesis described in Sect. 4.3; it is included for completeness of the description. However, the statechart model built by the author was used to help understand the architecture and the changes made during the implementation phase.

4.2 AILD Architecture

4.2.1 General Description

A block diagram of the asynchronous instruction length decoder architecture is shown in Fig. 4-1. It consists of multiple identical units arranged in ‘columns’, corresponding to each byte in the input instruction line. Thus, for a 16-byte instruction line, there are 16 columns. Each column includes a byte latch (which stores the byte from the instruction line), length decoding logic, marking units, and cross-bar switches.

Once a new instruction line is fetched from the fixed line-size memory, each byte of the instruction line is separately input to a corresponding byte latch, and processed in the length decoder of that column. Once all bytes are latched, a new instruction line can be made available. Each length decoder processes the respective byte, together with any additional bytes as may be required by the length decoding algorithm. The length decoders compute the length of the instruction, assuming that the byte being processed is the first byte of an instruction. Since all length decoders perform the calculations in parallel, all speculative lengths are available, and once a column is marked as being the first byte of an instruction, the proper bytes can be steered to the instruction decoder unit.
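The following sketch (purely illustrative; decode_length stands in for the actual x86 length decoding logic, and the names are informal) captures the essence of this organization: the speculative lengths are independent of each other and are computed concurrently in hardware, while only the fast marking step follows them serially.

def decode_length(line, i):
    # Placeholder for the real x86 length decoding logic: the length of the
    # instruction that would start at byte i of the line, assuming byte i is
    # a first byte.
    raise NotImplementedError

def mark_instruction_starts(line, first_start=0):
    # All speculative lengths are available at once (a parallel computation
    # in hardware, written here as a comprehension); marking then simply hops
    # from one first byte to the next using the precomputed lengths.
    lengths = [decode_length(line, i) for i in range(len(line))]
    starts, i = [], first_start
    while i < len(line):
        starts.append(i)        # column i is marked as a first byte
        i += lengths[i]         # the marking line points lengths[i] bytes ahead
    return starts               # when i passes the end of the line, marking wraps to the next line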

Figure 4-1: Block diagram of the Asynchronous Instruction Length Decoder architecture.

The AILD design is optimized for common cases, and the handling of rare cases involves communication between length decoders, as explained below. The handshake signals and interconnections between length decoders, and between length decoders and marking units are described in Fig. 4-2. Thin lines indicate single lines, and thick lines denote groups of several lines. The use of each signal is described in the following sections.

The length decoder outputs the speculative length to the marking units in its column. The length is used to mark the first byte of the next instruction, if the current column is found to be the first byte of an instruction. The marking lines are directly coupled to the proper marking units of subsequent instruction bytes. The marking outputs of the marking units in columns close to the end of the line are wrapped around to the marking units at the beginning of the line, and therefore mark the first byte of the first instruction in the next fetched line. The generation and transmission of the marking information flow through the marking units in a self-timed manner. Since the marking scheme is very fast, a possible bottleneck might occur at the steering circuit which transfers the instruction bytes to a buffer at the instruction decoder unit: the marking may propagate and wrap around before the bytes have been steered forward and replaced by new bytes from the next line. To avoid such a stall, several ‘rows’ of marking units and steering circuits are used. The marking signals from row k are hardwired to the marking units of row k+1, modulo the number of rows (see Figs. 4-1, 4-3). The number of rows is defined by the timing of the marking process. Dashed lines in Fig. 4-1 demonstrate examples of marking between columns and rows.

Figure 4-2: Length decoder interconnections and handshake signals.

A marking unit waits for an indication that its column contains the first byte of an instruction, as provided by the marking signal received over the marking lines from previous marking units (Fig. 4-3). A marking unit also waits for an indication that the bytes that comprise the instruction have been loaded into their respective byte latches and are ready for transmission. This indication is given by the Inst_Rdy signal, provided by the length decoder over the handshake lines (see Figs. 4-2, 4-3). The Buf_Avail signal, produced by the instruction steering circuit, indicates to a marking unit that the instruction steering circuit is available to receive an instruction for decoding and execution. These signals can arrive in any order. The instruction bytes are transferred to the output buffer of the same row as the marking unit that has processed those instruction bytes, and thus they are incrementally spread across the output buffers and can be fetched in order by the instruction decoder unit. The length of the instruction, as well as some other indications (such as Long and Pref, as explained below), is also transferred to the buffer along with the instruction bytes.

Figure 4-3: Marking unit interconnections and handshake signals.

4.2.2 Handling Branch Instructions

Branch instructions can change the control flow in a program. As explained in Ch. 3, branch prediction is applied to prevent stalls in the processor operation, by predicting whether a branch instruction will be taken, and what its target address will be. The proper information is added to the bytes fetched from the cache line. A branch instruction can appear at any byte location in a cache line, and the bytes following a taken branch are not used. Similarly, bytes preceding a branch target in a line are irrelevant. Each byte in the line can have at most one of three mutually exclusive indication bits set: taken_branch (the byte is the first byte of a branch instruction predicted as taken), branch_target (the byte is the first byte of an instruction which is a target of a branch), unused (the byte is not part of any instruction and should be ignored, e.g., bytes between the branch instruction and the target instruction).

When an instruction indicated as a taken branch is marked, the next marked instruction should be the instruction at the branch target, rather than the next instruction in sequential order. Branch handling units (Fig. 4-1) are used to control the marking of branch instructions in the rows of the instruction marking circuit. When a taken_branch indication is set, normal marking of the next instruction is aborted and the branch handling logic is activated. Instead of sending a marking signal to a marking unit on the next row, a token_out signal (Fig. 4-3) is sent to the branch handling unit of the next row, indicating that a target instruction should be marked as the next instruction. The branch handling circuit for the next row signals all marking units of that row (by the inject line, Fig. 4-3) that the first byte of the target instruction is identified by a set branch_target bit. The column containing the byte having its branch_target bit set will thus be marked by the branch marking process. The normal self-timed marking process then continues with subsequent columns, as explained in Sect. 4.2.1. A branch target FIFO buffer is used to avoid conflicts caused when multiple instruction bytes have their branch_target bits set in a single instruction line [BGK+97].

4.2.3 Handling Long Instructions

The marking and steering mechanisms described above can handle instructions of any predetermined instruction length. However, x86 instructions are typically short (i.e., of length 1-7), while the maximum instruction length is 11. If the marking and steering are implemented for length n+1 instead of length n, then n+1 additional wires are needed, crossing each column in each row. Having dedicated marking lines for lengths 8 to 11 would require about 40 more lines per marking unit row, which implies more area, wires, power, and latency, and would complicate the design of the marking circuit. Thus, the AILD is optimized for the most frequent instruction lengths, and long instructions are handled by a special mechanism (without dedicated marking lines), as described in this section.

Since the most frequent instruction lengths are short, the AILD accordingly implements the self-timed marking and steering for lengths of up to 7. Longer instructions are handled by being separated into two parts, head and tail, each at most 7 bytes long. There are two possible solutions for handling the two parts of a long instruction. The first handles long instructions by using an instruction head of a fixed length (equal to 4), with a variable length instruction tail (4-7 bytes long). The second handles long instructions by using an instruction tail of a fixed length (equal to 7), with a variable length instruction head (1-4 bytes long). As described in Fig. 4-2, the first solution was preferred for the AILD architecture, since it requires each length decoder to communicate with only one other length decoder (rather than four), and thus fewer handshake lines are needed.

The algorithm of handling a long instruction according to the first solution (fixed head, variable tail) is as follows. The length decoder at column i (LDi) decodes the actual length (say 10). When column i is marked as containing the first byte of an instruction, LDi sends to the length decoder of column i+4 (modulo the number of bytes in the memory line) a long instruction indication, with the tail length (say 10-4=6). This is done via the Long_Out lines, which are the Long_In lines for the receiving column (Fig. 4-2). The four possible tail lengths are encoded as one-hot lines, for the reasons explained below. LDi+4 acknowledges the message, via the Long_Out_Ack line (Fig. 4-2). LDi sets its length output to 4, while LDi+4 sets its length output to the tail length, as indicated by its Long_In lines (say 6). The marking and steering of the head and tail operate as described above in Section 4.2.1. The head is placed in output buffer k (where k is the row number of the marking unit that was indicated as the first byte of the instruction). The tail is placed in output buffer k+1 (modulo the number of rows). A Long indication bit is also sent to the output buffer k, so that the instruction decoder unit would know the bytes in that buffer are only the head of the instruction.
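A small sketch of the fixed-head/variable-tail split described above (illustrative only; the constants follow the text, and the returned tuples are informal stand-ins for the marking and steering signals):

LINE_BYTES = 16    # columns per fetched line
HEAD_LEN = 4       # fixed head length for long instructions
MAX_SHORT = 7      # longest length handled directly by the marking lines

def split_long_instruction(first_col, length, row, num_rows):
    # A long instruction (length 8..11) decoded at column first_col keeps a
    # fixed head of HEAD_LEN bytes; the tail (length-HEAD_LEN bytes) is owned
    # by the column HEAD_LEN places ahead (modulo the line), which receives
    # the Long_In indication.  The head goes to output buffer 'row' (with the
    # Long bit set) and the tail to the next buffer, modulo the number of rows.
    assert MAX_SHORT < length <= 11
    tail_col = (first_col + HEAD_LEN) % LINE_BYTES
    head = (first_col, HEAD_LEN, row)
    tail = (tail_col, length - HEAD_LEN, (row + 1) % num_rows)
    return head, tail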

4.2.4 Handling Prefixes

The opcode of an instruction may be prefixed by (up to four) prefix bytes [Int94]. The prefixes are one byte long each, and are uniquely defined by that one byte. Prefixes have neither value nor operands, and typically the instruction decoder sets a single bit flag when a prefix is encountered. There are only two prefix values, operand size (66H) and address size (67H), which affect the length of an instruction. Each of these prefixes affects the instruction length in a different way.

Since each length decoder speculatively calculates the instruction length assuming that an instruction starts at the byte of its column, prefixes cause a problem with this paradigm. However, prefixes are hardly used at all (statistical analysis shows only 1% of the instructions to have a prefix). The solution (for the AILD) is to consider a prefix as an instruction of length one. A prefix is decoded as a separate instruction, and information about it is forwarded to the affected instruction.

The length decoder in column i detects whether the byte in its column is a prefix. Each prefix (length affecting or not) is decoded as an instruction of length one. The information about length affecting prefixes is accumulated and passed on until the first byte of the instruction is reached, and these indications are then used to recalculate the instruction length. The indication passed on by length decoder i (LDi) to the next length decoder (LDi+1) is based on the indication received from the previous length decoder (LDi-1) about prefixes detected, and on the current byte (in column i). Handshake lines, as described below, are used for communication in the case of a length affecting prefix.

The algorithm of prefix handling is as follows. The length decoder at column i (LDi) detects byte i (Bi) as a prefix. LDi uses any previous prefix indication and the current byte information to determine which information (if any) to send to LDi+1. When column i is marked, LDi sends the prefix information to the length decoder in the next column (LDi+1) on the prefix lines (encoded as one-hot lines for the three possible cases: 66, 67, or both). The prefix indication is done via the Pref_Out lines; the Pref_Out lines of column i are the Pref_In lines of column i+1 (Fig. 4-2). LDi+1 acknowledges the prefix lines (via the Pref_Out_Ack line, see Fig. 4-2) and latches that information. LDi sets its length output to one. If the byte in column i+1 is not itself a prefix, namely it is the first byte of the instruction, then LDi+1 redoes its length calculation according to the prefix indications it has received. The marking and steering in both columns i and i+1 proceed as described in Sect. 4.2.1.
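The per-column prefix rule can be summarized by the following sketch (illustrative only; decode_length_with_prefixes is a placeholder for the recalculation performed by the length decoder, and 'incoming' stands for the information received on the Pref_In lines):

PREFIX_66, PREFIX_67 = 0x66, 0x67    # the only length-affecting prefix values

def decode_length_with_prefixes(line, i, prefixes):
    # Placeholder for the recalculated length of the instruction starting at
    # byte i, given the accumulated length-affecting prefixes.
    raise NotImplementedError

def prefix_step(line, i, incoming):
    # 'incoming' is the set of length-affecting prefixes reported by column i-1.
    byte = line[i]
    if byte in (PREFIX_66, PREFIX_67):
        # A prefix is decoded as an instruction of length one, and the
        # accumulated indication is forwarded to the next column (Pref_Out).
        return 1, incoming | {byte}
    # First byte of the actual instruction: recalculate its length using the
    # accumulated prefix indications; nothing is forwarded further.
    return decode_length_with_prefixes(line, i, incoming), set()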

4.2.5 The Length Decoder Operation

The entire AILD architecture was specified using statecharts, to guarantee completeness and correctness of the architecture and of the interconnections between the modules. The statechart of the length decoder (Fig. 4-4) is presented in this section as an (abstract) example. It is only a partially detailed statechart (some details are omitted for clarity), but it summarizes the various options the length decoder handles. For the syntax and use of statecharts, refer to Ch. 7. The following operational description builds on the detailed discussions of the previous sections.

When a new byte is latched in the byte latch, the length decoder starts to decode it. If it receives an indication over the Select_In lines (case 1 in Fig. 4-4), it knows the byte is one of the bytes of an instruction which started at some previous column. When the current byte is steered to one of the output buffers (i.e., when valid_ack becomes false, case 2 in Fig. 4-4), the length decoder operation is restarted to prepare to handle the next byte.

A Long_In indication (case 3 in Fig. 4-4) means that the byte is the first byte of the tail of an instruction, and the length is forced to the value received. An input at the Pref_In lines (case 4 in Fig. 4-4) indicates that a length affecting prefix has been detected, and the length is recalculated accordingly.

If none of the above three cases occurs, further handling depends on the column being marked. If the byte is recognized as a prefix (case 5 in Fig. 4-4), the length is forced to 1 (as explained above), and for a length affecting prefix the information is sent to the length decoder at the next column by the proper handshake protocol. Otherwise, the byte is handled as the first byte of an instruction (provided, of course, it is not an unused byte). If the calculated length is longer than 7 (case 6 in Fig. 4-4), the handshake over the Long_Out lines is initiated, and the length is forced to 4 (the fixed head length), as explained above. When the local length is either short or forced to a short value (case 7 in Fig. 4-4), the marking process continues through the marking units, as detailed above.

In summary, the events detailed above (and described in Fig. 4-4) are mutually exclusive: either a byte is selected as part of an instruction (if one of the Select_In lines is activated), or it is marked. If it is marked, it is either signaled to respond to a previously detected prefix (if one of the Pref_In lines is set), or signaled (by the Long_In lines) to be the tail of a long instruction, or (when Inst_Rdy_Ack arrives) to use the result of its local computation regarding prefix, long, and short instructions.

Figure 4-4: A simplified statechart of length decoder behavior.

4.3 AILD Implementation

Four-phase handshake protocols are used for the communication between length decoders, and between length decoders and marking units, as can be observed from the various signal and acknowledge lines in Figs. 4-2 and 4-3. All units in the AILD are designed as self-timed circuits. Completion detection and data transfer could be implemented using the bundled-data approach [Hau95]. However, that would require design effort to guarantee fulfillment of the bundling condition between the units, and would prevent taking advantage of variable data dependent delays. When the possible data values to be communicated are few, it is better to encode them using a delay insensitive code [Ver88]. We have chosen to use one-hot encoded lines for the lengths sent from the length decoders to the marking units, and for the long and prefix indications sent between length decoders. When one of the input lines is set, the receiving unit is signaled to handle the input, according to the value uniquely defined by the set line. Using one-hot encoding for the instruction length, for example, makes the implementation of the marking and selection (steering) logic simpler [BGK+97].
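For illustration, a one-hot code for the instruction lengths could be as simple as the following sketch (function names are informal; in the circuit these are wires, and any single wire rising also serves as the completion indication, which is what makes the code delay insensitive):

def length_to_one_hot(length, max_len=7):
    # Encode a length in 1..max_len on max_len wires, exactly one of which is set.
    assert 1 <= length <= max_len
    return tuple(1 if i == length - 1 else 0 for i in range(max_len))

def one_hot_to_length(wires):
    # The receiver simply reacts to whichever single wire is set.
    assert sum(wires) == 1
    return wires.index(1) + 1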

For higher performance, the delay-insensitive operation of some of the signals was compromised, and they were implemented as self-resetting signals. These signals are pulsed, based on timing assumptions, to accelerate operation by eliminating the use of handshake protocols (e.g., there is no acknowledge of a mark being received) [BGK+97].

The length decoding logic was designed as a self-timed circuit, optimized for the common instructions, thus achieving better average case performance [BGK+97]. Statistical information was used to implement the length decoding logic as an unbalanced tree of gates (skewed monotonic logic), where frequently occurring instructions are decoded through shorter paths of logic and rare instructions are decoded through longer paths. A dual-rail, self-timed approach is used for a monotonic, hazard-free domino logic implementation. Completion detection is done by having the output length encoded on one-hot lines. The contribution to the length calculation from the following bytes (other than the byte of the length decoder column) is calculated separately and combined only at the last stage, so that if those bytes are not needed and not yet ready, they do not stall the local calculation.

The controllers of the various AILD units were specified as either burst-mode or timed circuits, and were synthesized by the appropriate tools [Yun96, Mye95]. Several levels in the design hierarchy were formally verified [Ste94].

4.4 Concluding Remarks

Decoding variable length instruction sets is a major bottleneck in high performance processors. The design of an asynchronous instruction length decoder was described as an example of applying self-timed techniques and using asynchronous design for high performance circuits. The design is optimized for the common cases, thus achieving a better average case performance. The complex design of the AILD was achieved by integrating various concepts of asynchronous design methodologies. The architecture design of the AILD is independent of the implementation style of its modules.

The AILD architecture is based on speculative parallel length computation with a fast marking system. It is optimized to handle the common cases (instruction lengths and types) very fast, and provides proper mechanisms to handle special cases (i.e., rare instructions, prefixes, long instructions, and branches). Asynchronous control, based on one-hot encoding and handshake communication, is applied, and the circuit operation is event driven, reacting to computation completion.

Currently, the AILD design is being implemented in silicon and sent for fabrication. It is expected to be tested and analyzed within a few months.

Chapter 5 : A Doubly-Latched Asynchronous Pipeline

Synchronous and asynchronous systems use pipelines as their basic architecture. The faster the pipeline, the better the performance. The pipeline structure is widely used as the basis of computer architecture, and other processing modules, in order to increase performance [HP96a, WH90]. Kin architecture (described in Ch. 2) can be considered a complex, non-linear, pipeline. Each of the modules in the processor can be implemented internally as a pipeline, for higher parallelism and performance.

In this chapter we present the Doubly-Latched Asynchronous Pipeline (DLAP), developed as part of our research. The DLAP is an asynchronous pipeline with master-slave (dual) registers, which offers improved performance. DLAP is capable of truly decoupled operation: All pipeline stages can shift data simultaneously, and execution is faster than previous asynchronous pipeline designs when variable delays are encountered. Implementations based on both edge triggered registers and transparent latches are shown, fully analyzed, and compared to previous designs.

Converting synchronously designed circuits into asynchronous ones has the advantage of using existing synchronous synthesis tools, and achieving data dependent and low power operation, without redesigning the circuit. DLAP was found suitable for automatic synchronous-to-asynchronous conversion.

5.1 Introduction

Asynchronous micropipelines were first introduced in [Sut89]. They were based on a 2-phase communication protocol. Four-phase handshake protocol pipelines are presented in [MBM89, MBM91], where edge triggered registers are employed. Various similar control structures were proposed in order to enhance the performance of the asynchronous pipeline [DW95, FD96, FL96, YBA96], based on either edge triggered registers or transparent (level sensitive) latches. However, all those asynchronous pipeline designs suffer from one of the following drawbacks: they either achieve only 50% utilization of the pipeline stages (only every other stage is active at any one time, while idle stages contain bubbles), or (in some cases) incur a long backwards propagation of the acknowledge signals. The backwards latching scheme cannot be avoided when only a single register is used in each stage, since a storage element cannot release its value until the following stage has signaled that it is ready for another value. This might result in a major performance problem for deeply pipelined circuits, e.g., rings or linear pipes with data dependent stage delays. In [SS93] it was shown that pipeline performance depends on the number of bubbles, namely registers ready to accept new values. When only a single bubble exists in the above mentioned designs, their performance is limited by the lack of bubbles. Synchronous pipelines, on the other hand, are not so limited: if each register is a master-slave, then a bubble is always available, and all values can propagate simultaneously.

Asynchronous circuits are expected to achieve lower power consumption and/or higher performance, by eliminating the driving clock. In addition, computational delays can be data dependent. While synchronous designs are timed according to the worst case delay over all stages, an asynchronous circuit can be designed to determine and signal its own completion time, typically saving time and power. On average, self-timed operation with completion detection results in about a 2× speedup of the individual units; the power saving depends on the particular application, but can reach as high as 80% [vBB+94]. We have developed an algorithm for converting synchronous circuits into asynchronous ones, thus exploiting some advantages of asynchronous circuits while retaining investments in synchronous designs and tools.

A general description of the DLAP structure and operation is presented in Sect. 5.2. The designs of the edge-triggered register based and the transparent latch based DLAPs, and their implementation details, are described in Sects. 5.3 and 5.4, respectively. Simulation results and a comparative analysis are reported in Sect. 5.5. Section 5.6 extends the DLAP concept to non-linear pipelines, and Sect. 5.7 defines the synchronous-to-asynchronous conversion algorithm.

5.2 The Doubly-Latched Asynchronous Pipeline

DLAP (Doubly-Latched Asynchronous Pipeline) is shown in Fig. 5-1. It is designed for a single rail, 4-phase communication protocol between the stages. DLAP imitates the operation of a synchronous master-slave pipeline, by decoupling the pipeline stages. If the pipeline is balanced, DLAP operates the same as a synchronous pipeline: since all pipeline stages finish their computation at the same time, they can all latch the values concurrently into the master part of the registers, while the slave parts retain the previous values. Subsequently, all values latched into the masters are simultaneously transferred to the slaves. DLAP takes advantage of variable delays, as other asynchronous pipelines do. However, unlike other implementations, DLAP is truly decoupled: Thanks to double latching, a stage that has completed early can start processing the next data even if the following stage is still occupied. This is demonstrated in Sect. 5.5 below.

Figure 5-1: A Doubly-Latched Asynchronous Pipeline (DLAP).

A single stage controller for DLAP is shown in Fig. 5-2. The controller communicates with neighbor stages by Ready (Ri, Ro) and Acknowledge (Ai, Ao) lines. The latching of data into the master and slave registers is controlled by appropriate signals (Lm, Ls). The Done lines (Dm, Ds) signal when latching has occurred. If the Done signals cannot be generated by the registers, they can be created by routing back the Latch signals after passing through all the registers in the column, ensuring that all have been triggered [Pav94].

An example DLAP test circuit is shown in Fig. 5-3. The ReadyOut signal emerging from stage i is delayed before entering stage i+1. That delay matches the computation delay of the combinational logic between the two stages. We employ an asymmetrical delay [Sei80] in order to make the reset phase as short as possible. If the logic generates a completion signal (e.g., DCVSL [MBM89] or dynamic logic [FL96]), there is no need to add a special matched delay in the control circuit: ReadyOut feeds the combinational logic and the completion signal serves as ReadyIn for the following controller.

Figure 5-3: The DLAP test circuit.

Figure 5-2: A DLAP stage structure.

DLAP can be implemented with either edge-triggered registers or transparent latches, as discussed in the following two sections. In the former case the control is simpler, while in the latter case simpler registers may be employed.

5.3 Edge-Triggered DLAP

The behavior of a controller for an edge triggered register based DLAP stage is defined by the Signal Transition Graph (STG) [Chu87] of Fig. 5-4. STG nodes represent signal transitions (underlined signals are inputs), directed edges are precedence relations, and the black dots are tokens, shown at the initial marking. The graph is ‘executed’ by moving tokens around. A transition is enabled by the presence of tokens on all edges leading to it. The transition removes those tokens and places new ones on all edges emanating from it (thus, the number of tokens may change). As described in Fig. 5-4, the master register is activated by the rising edge of Lm when a new value is ready (Ri is set), and the previous value has been moved to the slave (as marked by the internal signal B, for ‘bubble’). Similarly, the slave register is activated by the rising edge of Ls, when a new value is ready at the master, and the previous value has been consumed by the following pipeline stage. Petrify [CKK+96] has been employed to ensure that the STG is safe, persistent, and has a complete state coding so it can be implemented as a speed independent circuit with no hazards. The control circuit implementation synthesized by Petrify is depicted in Fig. 5-5.
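The token-game semantics used here can be illustrated by a tiny simulator sketch (purely illustrative; the actual STGs of Figs. 5-4 and 5-6 were analyzed and synthesized with Petrify, not with this code). Edges are (source, target) pairs of transitions, and a marking maps each edge to its token count.

def enabled(transition, edges, marking):
    # A transition is enabled when every edge leading to it holds a token.
    return all(marking[e] > 0 for e in edges if e[1] == transition)

def fire(transition, edges, marking):
    # Firing removes a token from each incoming edge and places a token on
    # each outgoing edge, so the total number of tokens may change.
    assert enabled(transition, edges, marking)
    for e in edges:
        if e[1] == transition:
            marking[e] -= 1
        if e[0] == transition:
            marking[e] += 1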

Figure 5-5: A Master-Slave Edge Triggered stage controller circuit implementation. Figure 5-4: STG for a Master-Slave Edge Triggered stage controller.

Some waveforms obtained from the simulation of a DLAP test circuit (Fig. 5-3) based on edge triggered registers are presented in Fig. 5-8. We have designed the two types of DLAP (edge triggered and latched), and have simulated them with SPICE for a 0.8μm, 5V, typical CMOS process. Transistor sizes are optimized for speed and symmetric transitions. We have loaded the latch control signals (Lm, Ls) to simulate the drive of 32 bit registers. The simulated register driving delay is 0.9nS. Note that this delay is included in the cycle time (to ensure the correct operation of the control circuit). The relative timings are summarized in Sect. 5.5. Observe that since the pipeline is balanced, all Ri lines are set simultaneously. Consequently, all masters are triggered simultaneously (Lm lines). After completing the handshake on the Ai/Ao lines, all the slaves are triggered (Ls lines), and the Ro signals are set. Following the computational delays, the Ri signals are set again. The cycle time (from Ri+ to the following Ri+) is 10.9nS, which includes a combinational logic delay of 5.03nS, a logic reset delay of 1nS, and four times 0.9nS for driving the two registers. In other words, the control overhead is only 1.27nS. The response time (passing the data through the double registers, i.e., Ri+ to Ro+) is 2.9nS (which also includes twice the 0.9nS register driving delay).
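Spelled out with the figures just quoted (the symbol names below are ours, introduced only to make the accounting explicit), the measured cycle time decomposes as

$t_{cycle} = t_{logic} + t_{reset} + 4\,t_{reg} + t_{ctrl} = 5.03 + 1.0 + 4 \times 0.9 + 1.27 \approx 10.9\ \mathrm{nS},$

and the response time as $t_{response} = 2\,t_{reg} + 1.1 = 2.9\ \mathrm{nS}$; the remaining 1.1nS is consistent with the Ri+ → Ai+ entry of Tab. 5-1.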

5.4 Latched DLAP

Transparent latches are simpler than edge triggered registers. To save power, the latches are used according to the ‘blocking latch’ scheme [YBA96], i.e., they are kept closed at all times except when data must be latched. Power is saved because hazards are blocked. Note that since the latches are transparent, the master and slave cannot both be open at the same time. Consequently, the controller is a bit more complex than for the edge triggered DLAP. An extra internal signal (G, for ‘gate’) is needed to mark which of the two latches has been opened last, and to ensure that the STG has a complete state coding. The proper STG is presented in Fig. 5-6, and the implementation (synthesized by Petrify [CKK+96]) is presented in Fig. 5-7.

The waveforms obtained from the simulation of a DLAP test circuit (Fig. 5-3) based on transparent latches are presented in Fig. 5-9, and the relative timings are summarized in Sect. 5.5 below. Comparing this to the waveforms of the edge triggered DLAP, one can see that the pipeline is balanced and all the masters are enabled at the same time (Lm signals), followed by a simultaneous transfer of the data through the slaves (by activating the Ls signals). Note also that the Lm and Ls signals are mutually exclusive. The cycle time (from Ri+ to the following Ri+) is 12.06nS (including the combinational logic delay of 5.03nS, the logic reset delay of 1nS, and 3.6nS for driving the two registers; thus, the control overhead is only 2.43nS). The response time (Ri+ to Ro+) is 6.9nS, including four times the 0.9nS for driving the registers.

Figure 5-7: A Master-Slave Latch stage controller circuit implementation. Figure 5-6: STG for a Master-Slave Latch stage controller.

5.5 Comparative Analysis

As explained above, we have designed the two types of DLAP (edge triggered and latched), and have simulated them with SPICE. The basic test circuit with several stages and the resulting waveforms are presented in Figs. 5-3, 5-8, and 5-9, respectively. The measured times are summarized in Tab. 5-1. Note that the register driving delays are included in the cycle time.

Figure 5-8: Waveforms of the Master-Slave Edge Triggered DLAP test circuit. Figure 5-9: Waveforms of the Master-Slave Latch DLAP test circuit.

                              Edge Triggered DLAP   Latch DLAP   Comments
                                     [nS]              [nS]
1  Ri+ → Ai+                         1.10              2.06
2  Ai+ → Ri-                         2.24              1.62      Including logic reset delay of 1nS
3  Ri- → Ai-                         1.13              0.60
4  Ai- → Ri+                         6.43              7.78      Including computational delay of 5.03nS
5  Cycle time (Ri+ → Ri+)           10.90             12.06      Sum of lines 1-4 (includes the delay set and reset times)
6  Response time (Ri+ → Ro+)         2.90              6.90      Measured on a single empty pipe stage (i.e., the time to pass data through the master and slave)

Table 5-1: DLAP SPICE simulation results.

The latched DLAP (relative to the edge triggered DLAP) incurs slightly longer cycle time (about 1nS longer), due to the need to precisely sequence more transitions. The total overhead is still negligible compared to typical computational logic delays.

Relative to synchronous pipelines, DLAP requires about twice as many registers and a small control circuit per stage. However, when replacing each edge-triggered FF with double latches, the area overhead is kept to a minimum. The timing overhead required (for edge-triggered DLAP) is one more register loading delay. The return to zero phase of the handshake protocol is kept to a minimum. These times are typically negligible compared to the logic computational time.

Four-phase handshake protocol pipelines with edge triggered registers are also used in [MBM89, MBM91], where two types of control circuits are presented: ‘half handshake’ utilizes only 50% of the pipe, as only every other stage operates at a time; ‘full handshake’ is more efficient, and the acknowledge signals, propagating backwards sequentially, can sometimes overlap stage operation. Four-phase pipelines with transparent latches are presented in [DW95, FD96, FL96] (the latter employs dynamic logic). The latches are left open most of the time, resulting in possibly higher power dissipation due to data hazards. Their ‘semi-decoupled’ and ‘fully-decoupled’ schemes are similar to the ‘half-handshake’ and ‘full-handshake’ of [MBM89], respectively. A 2-phase protocol micropipeline using double edge triggered registers is presented in [YBA96], as well as a 4-phase protocol micropipeline using latches and a ‘blocking latch’ scheme. The design is reportedly faster than [DW95], but it is still a semi-decoupled circuit, limited to 50% pipeline utilization.

The cycle time of a semi-decoupled pipeline includes approximately twice the processing delay of the combinational logic because of its 50% duty-cycle operation (i.e., a stage must wait for the following stage to clear before initiating its own next calculation). In a fully decoupled pipeline, even if all stages finish evaluation at almost the same time, a stage cannot latch the result into its output register until the following stage has done so. When the pipe is full, operation is limited by a single ‘bubble’ flowing backwards, and the latency overhead is proportional to the length of the pipe. The master/slave action of the storage in a DLAP serves the purpose of interleaved bubbles in the pipeline, and relaxes the coupling between the stages. Thus, DLAP is more tolerant of changes in the output rate from the pipe than the other asynchronous pipelines.

We employ a scheduling notation (Fig. 5-10) to compare the latency incurred by four kinds of two stage pipelines, namely a synchronous pipeline, a ‘semi-decoupled’ (‘half handshake’) asynchronous pipeline, a ‘fully-decoupled’ (‘full handshake’) asynchronous pipeline, and DLAP. Three tasks (i, j, k) are to be processed, and the computational delays of each task in each pipeline stage are listed in Tab. 5-2. A synchronous design requires the clock cycle time to accommodate the worst case of all calculations over all stages, namely two time units, thus requiring eight time units to complete the computation (Fig. 5-10(a)). The semi-decoupled (or half handshake) pipeline [DW95, FD96, MBM89] achieves only 50% utilization; since the pipeline contains only two stages, and they must operate alternately, the computation takes eight time units (Fig. 5-10(b)). In a fully-decoupled (‘full handshake’) asynchronous pipeline [FD96, MBM89], task k cannot start execution at stage A, since task j is stalled there until task i frees stage B; thus, the computation requires six time units to complete (Fig. 5-10(c)). The DLAP completes the computation in only five time units (Fig. 5-10(d)), since stages A and B are decoupled by the double latches between them, and task k is not stalled (a toy model reproducing these schedules is sketched after Fig. 5-10).

We designed a three stage fully-decoupled pipeline [FD96] and a full-handshake pipeline [MBM89, MBM91] in a MOSIS 0.8μm CMOS process, and ran detailed SPICE simulations to compare their performance with DLAP. The results are summarized in Tab. 5-3. Values in the overhead columns are calculated as the difference between the measured cycle time and the slowest (stage or output) delay. The pipelines were implemented as balanced, i.e., all stages of the pipeline had the same delay. The processing delay of the combinational logic (the data path in each pipeline stage) was measured as 11.2nS, and its resetting time as 1.8nS. Thus, its total contribution to the cycle time was 13nS. The cycle time (i.e., Ri+ → Ri+) was measured both for an empty pipeline and for a full one.

          Task i   Task j   Task k
Stage A      1        1        2
Stage B      2        1        1

Table 5-2: Processing times (in relative time units).

Figure 5-10: Scheduling comparison of alternative pipelines: (a) synchronous; (b) semi-decoupled asynchronous; (c) fully-decoupled asynchronous; and (d) DLAP.
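The schedules of Fig. 5-10 can be reproduced with the following toy model (a sketch only: latching is assumed instantaneous, the source and sink are always ready, and the per-task stage delays are those of Tab. 5-2):

# Per-task delays for the two stages A and B (tasks i, j, k), from Tab. 5-2.
dA = [1, 1, 2]
dB = [2, 1, 1]

def synchronous():
    clock = max(dA + dB)                   # worst case over all stages
    return (len(dA) + 2 - 1) * clock       # (tasks + stages - 1) clock cycles

def semi_decoupled():
    # Only every other stage is active: B starts a task when A finishes it,
    # and A may start the next task only after B finishes the previous one.
    a_fin = b_fin = 0
    for a, b in zip(dA, dB):
        a_fin = b_fin + a
        b_fin = a_fin + b
    return b_fin

def fully_decoupled():
    # Single register per stage: A is freed only when B takes the task out
    # of A's output register, i.e., when B starts processing it.
    a_start = b_fin = 0
    for a, b in zip(dA, dB):
        b_start = max(a_start + a, b_fin)
        b_fin = b_start + b
        a_start = b_start
    return b_fin

def dlap():
    # The master/slave pair acts as a one-deep buffer between the stages, so
    # A is freed as soon as its result is latched into that buffer.
    a_free = buf_free = b_free = 0
    for a, b in zip(dA, dB):
        a_done = a_free + a
        a_free = max(a_done, buf_free)     # release the result into the buffer
        b_start = max(a_free, b_free)
        buf_free = b_start                 # the buffer empties when B takes the task
        b_free = b_start + b
    return b_free

print(synchronous(), semi_decoupled(), fully_decoupled(), dlap())   # 8 8 6 5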

The tested empty pipeline had an output rate lower than the processing rate, so the pipe never filled up and operation was not limited by a lack of bubbles. The cycle time was determined by the stage logic delay and by the pipeline control circuit (including the latching time). The results show that the DLAP cycle time is slightly longer than that of the other asynchronous pipelines. However, it is only 1.5-2nS slower, mainly because of the extra register load required. Since DLAP is best for data-dependent delays, this extra overhead is tolerable.

When the pipe starts from an empty state, the acknowledge propagates backwards concurrently with the data forwarding, and thus it does not depend on the number of stages in the pipe. Since the pipeline is balanced (i.e., all stages in the pipe take the same time to complete), stage i starts processing its data before stage i-1 starts processing the next data item, and it also finishes its work earlier; therefore the acknowledge from stage i is ready when stage i-1 needs it. When a DLAP starts working from an empty pipe state it behaves similarly, and although it has many bubbles, they do not help, since bubbles are not the limiting factor. Since DLAP has one more latch to load in each stage than the fully-decoupled pipeline, and its latches are normally kept closed rather than open, the cycle time overhead obviously includes the time of opening and closing that latch, even if the control circuit takes the same time. Driving the latches usually takes longer than the control. Thus, DLAP is not faster than the other asynchronous pipelines when the pipeline is empty most of the time.

The tested full pipeline had an output (sink) slower than the pipe stages, which caused the pipe to fill up. The measured delays of the slow output processing and resetting times were t(Ro+ → Ao+) = 16.8nS and t(Ro- → Ao-) = 2.7nS, respectively. When the delay of the pipeline sink stage is longer than that of each pipe stage, the pipeline becomes full, and its throughput and cycle time are limited by the output rate. All the pipe stages and the pipeline source are stalled until the bubble propagates from the last stage to the first one. The longer the pipeline, the longer the backwards acknowledge delay. Since the DLAP starts with n bubbles, it takes a longer time until the number of bubbles is gradually reduced and all the 2n registers are filled, before the pipe is stalled. Because it has more bubbles, DLAP is more tolerant of temporarily slow outputs than the other asynchronous pipelines. However, when the pipeline sink is slow for long enough, the DLAP will eventually be stalled just the same.

DLAP is best for cases of variable (data-dependent) stage delays, as shown by the scheduling analysis. It is also suitable for automatic conversion of (balanced) synchronous pipelines into asynchronous ones, without redesign, as described in Sect. 5.7.

                     Edge Triggered Registers           Transparent Latches
                   Full-Handshake      DLAP          Fully-Decoupled      DLAP
                   Cycle  Overhead  Cycle  Overhead  Cycle  Overhead  Cycle  Overhead
Empty Pipe          17.0    4.0      18.7    5.7      17.0    4.0      19.2    6.2
Full Pipe           20.2    0.7      22.0    2.5      20.8    1.3      22.3    2.8

Table 5-3: Cycle time and overhead SPICE simulation results (measured in nS).

5.6 Non-Linear DLAPs

Evidently, a linear DLAP is not enough to implement complex data-path structures, which are needed in a processor design. Non-linear DLAP data paths can be created by using Fork and Join interconnection circuits. A Fork is basically a two output pipeline stage. Figure 5-11 shows the implementation of an edge triggered Fork DLAP. Note that both following stages share the same Ro line, while their Ao lines are combined by the C-Element. Similarly, the Join is a two input pipeline stage, as presented in Fig. 5-12. The Ri signals are combined by the C-element.

Many synthesized circuits have complex structures that contain loops (aka rings). A DLAP ring can be constructed by employing Join and Fork circuits as shown in Fig. 5-13.

A ring structure based on the fully-decoupled pipeline scheme must contain an extra register (i.e., an empty stage) to prevent deadlock [MBM91]. The single bubble going backwards around the loop might limit and slow down the ring operation. A DLAP ring is faster than a fully-decoupled one when the stage delay is shorter than the acknowledge round trip delay. Using a semi-decoupled pipeline to construct a feedback loop yields a ring with only half the stages full, since only alternate blocks can store valid values [MBM91]. DLAP rings have enough bubbles and can operate in the same way as synchronous rings.

Figure 5-11: A Fork stage implementation, a two-output pipeline interconnection circuit. Figure 5-12: A Join stage implementation, a two-input pipeline interconnection circuit.

Figure 5-13: A ring DLAP.

5.7 Synchronous to Asynchronous Conversion

5.7.1 Motivation

Several advantages can be achieved by converting synchronous circuits to asynchronous ones. The clock in synchronous circuits always switches (i.e., it has 100% activity) and it must arrive at all parts of the chip with the same phase. The power dissipated by complex VLSI chips increases as the clock frequency rises [Hor93, Int94, Str94]. A growing portion (currently over 40% [Bow95]) of the power budget in a chip is required by the clock distribution network in order to reduce clock skew problems. Asynchronous logic does not use a clock. Logic elements can be self-timed: they can detect and announce when the computation is complete and the outputs are ready [Hau95, Sei80]. They also wait for inputs to be announced before starting the computation. Registers load their inputs under local control, rather than on a global clock edge. Thus, eliminating the clock and replacing it with local handshakes can save power.

Since power is a square function of the operating voltage, much power can be saved, even with the same circuit design, by applying a scalable power supply [NNS+94]. If the circuit is self timed, reducing the supply voltage will only cause it to work more slowly, but will not affect its correct operation. This can be exploited in portable devices, e.g., in standby mode or while processing data requiring variable computation loads [vBB+94].
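For reference, the standard first-order expression for dynamic power in CMOS (the symbols are the conventional ones and are not defined in this thesis) is

$P_{dyn} = \alpha\, C\, V_{dd}^{2}\, f,$

where $\alpha$ is the switching activity, $C$ the switched capacitance, $V_{dd}$ the supply voltage, and $f$ the clock frequency; the quadratic dependence on $V_{dd}$ is what makes supply voltage scaling so effective.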

Asynchronous pipelines can potentially have higher performance than synchronous pipelines because they are not restricted to operate according to the worst case delay. Using self-timed logic enables the circuit to operate according to actual data-dependent delays, i.e., the average case delay. The worst case delay might be very rare and the average case delay is usually about half of the worst case delay [GM90].

When self-timed logic is not available, a delay (matching the worst case delay of the pipeline stage) can be used to signal the end of the combinational logic evaluation time. The worst case delay is matched per each pipeline stage and does not affect the delay of other stages. Even in this case complex asynchronous pipelines can be faster than synchronous pipelines. In a complex pipeline (e.g., a microprocessor) data can go through short or long alternative paths, depending on the instruction: A pipeline stage containing an ALU with an adder and a multiplier can have a variable matched delay, since only one of the units is active at a time. Thus, each possible path operates at a rate affected only by the slowest unit in its path, and not affected by other paths.
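As a trivial sketch of such a variable matched delay (the unit names and delay values are invented for illustration):

# Per-unit matched delays for one ALU pipeline stage (hypothetical values).
MATCHED_DELAY_NS = {"adder": 2.0, "multiplier": 9.0}

def stage_completion_delay(active_unit):
    # Only one unit is active per instruction, so completion is signalled
    # after the delay matched to that unit, not after the slowest path of
    # the whole stage.
    return MATCHED_DELAY_NS[active_unit]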

Synchronous synthesizers are widely used as CAD tools for designing VLSI circuits. They are proven reliable and able to handle large and complex designs. Currently available asynchronous synthesis tools [Async] typically generate only rather small asynchronous controllers, and do not handle data path elements. Using a post-synthesis conversion enables us to take advantage of the design structure and combinational logic of the data path synthesized by the synchronous tool, while achieving the benefits of asynchronous circuits.

Synchronous to asynchronous circuit conversions can also be used in mixed-timed designs. Converting the synchronous modules to asynchronous ones eliminates the synchronizers needed at the synchronous module inputs, and prevents synchronization failures (cf. Ch. 6). Changing one or some of the modules requires no changes in other parts of the design.

We consider synchronous logic synthesized into netlists according to the common architecture of ‘register-and-cloud’ pipelines [Per94], where ‘clouds’ of combinational logic are separated by clocked registers. We would like to take advantage of the general pipeline structure and of the combinational logic clouds, but we need to get rid of the clocked registers, thus converting a synchronous circuit into an asynchronous one. To that end, we must identify the best target asynchronous pipeline. The DLAP architecture was found to be most suitable for such a synchronous-to-asynchronous conversion, since it can imitate the synchronous operation, and can also benefit from variable computation loads. The master-slave architecture of the DLAP operates the same way a synchronous pipeline does. DLAP is also ‘synchronous compatible’ when values are loaded in parallel before the pipeline operation starts.

This section describes the methodology for converting a synchronous design into an asynchronous one, at the gate level, based on the DLAP architecture. Our previous work on post-synthesis conversion, targeted at different implementation styles, is described in [KGS96, KGS97].

5.7.2 Post-Synthesis Conversion Algorithm

The conversion algorithm applies to synchronous netlists which are typically synthesized by a tool like Synopsys. The algorithm retains the (possibly complex) pipeline structure as generated by the synthesizer. The combinational logic (the ‘cloud’) between the registers is not altered. However, if the combinational logic of the design is not self-timed (i.e., it does not generate a completion signal), matching delays are generated by the conversion algorithm and are used to generate proper completion signals, as explained in Sect. 5.2.

The same algorithm is suitable for either the edge-triggered register based or the transparent latch based DLAP as the conversion target architecture. The only difference is replacing each original edge-triggered flip-flop with either double edge-triggered flip-flops or double transparent latches, and adding the proper DLAP stage controller. As explained above, the double-latch architecture has less area overhead, and is only slightly slower than the edge-triggered version.

The input to the conversion algorithm is a synchronous netlist, containing edge-triggered D flip-flops, and combinational logic blocks. The netlist includes the definitions of the design primary inputs, primary outputs, and list of registers. The algorithm also requires as input the delays of the combinational logic blocks between pipeline stages. These delays are obtained from a timing analysis tool that analyzes the worst case delay of each data path between all pairs of points connected by logic. Such a data path begins at either a primary input or an output of a flip-flop, and ends at either a primary output or an input to a flip-flop.

Every flip-flop (FF) bit of every original register is replaced with two latches (or FFs), the first one feeding the second. The clock signal is replaced by either of the two latch signals (Lm, Ls, as explained in Sect. 5.2) connecting to the clock input of the first (master) FF and second (slave) FF, respectively. The required Done signals are generated by returning the latch signals (driven by the DLAP controller), which are delayed by the register driving delay (Fig. 5-2).

A DLAP stage controller circuit is added for every pair of latches (or a group of pairs, treated as a bus, as explained below). The DLAP stage controllers are interconnected as follows: If stage X produces data for stage Y (cf. Fig. 5-14(a)), then the ReadyOut signal of the stage controller X is connected to the ReadyIn input of stage Y (cf. Fig. 5-14(b)) via a proper matched delay unit. The AcknowledgeIn output of stage Y is connected to AcknowledgeOut input of stage X (refer also to Figs. 5-1 and 5-2).

(a) Original circuit. (b) Post conversion circuit.

Figure 5-14: Synchronous-to-asynchronous conversion.

Hence, each flip-flop in the original netlist is replaced by a DLAP stage, containing either two edge-triggered D-FFs or two transparent latches, with a proper DLAP stage controller. Each stage controller generates the local ‘clock’ signals to latch data into the stage. The control lines are connected (through properly matched delays) to reflect the data flow. The architecture of the netlist and the interconnections between the DLAP controllers determine the dependencies among them. Splitting computation paths (from a single source to several destinations) is done by Fork controllers, and merging multiple computation paths requires Join controllers, as described in Sect. 5.6.
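To make the flow concrete, the following Python sketch outlines the conversion described above; the data structures and names (Netlist, Register, convert_to_dlap) are ours and merely stand in for the actual netlist and timing-analysis interfaces.

from dataclasses import dataclass, field

@dataclass
class Register:
    name: str
    bits: int

@dataclass
class Netlist:
    registers: list                 # clocked registers found by the synthesizer
    path_delays: dict               # worst-case delay per stage pair, e.g. {("X", "Y"): 4.2}
    stages: dict = field(default_factory=dict)

def convert_to_dlap(netlist, use_latches=True):
    """Replace every clocked register by a DLAP stage (master/slave storage plus
    a stage controller), then wire the controllers to mirror the data flow
    through matched delays."""
    for reg in netlist.registers:
        storage = "two transparent latches" if use_latches else "two edge-triggered FFs"
        netlist.stages[reg.name] = {
            "storage": storage,                  # master feeds slave, clocked by Lm/Ls
            "controller": "DLAP stage controller",
            "fanout": [],
        }
    # ReadyOut(X) -> matched delay -> ReadyIn(Y), and Ack(Y) -> Ack(X),
    # for every combinational path X -> Y reported by the timing analyzer.
    for (src, dst), worst_delay in netlist.path_delays.items():
        netlist.stages[src]["fanout"].append({"to": dst, "matched_delay": worst_delay})
    return netlist

# Hypothetical three-stage pipeline:
nl = Netlist(registers=[Register("X", 32), Register("Y", 32), Register("Z", 32)],
             path_delays={("X", "Y"): 3.1, ("Y", "Z"): 4.7})
convert_to_dlap(nl)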

Designs usually contain many busses, i.e., several bits emerging from a set of flip-flops and passing through a combinational logic block before entering another set of flip-flops (e.g., a 32-bit number). All signals belonging to such a bus are aggregated and treated as one, using the worst delay of any bit as the bus delay. This peephole local optimization significantly reduces the number of paths (and therefore controllers and delay units) that have to be handled. For example, two 32-bit busses feeding an adder require only two DLAP controllers rather than 64. In case some bits are extracted from the bus towards different destinations, the relevant delays are calculated and separate delay units are used.
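The bus aggregation step can be pictured as the following sketch (the path-record layout is hypothetical): per-bit timing paths are grouped by their source and destination registers, keeping the worst delay per group; bits extracted toward a different destination simply end up in a group of their own.

from collections import defaultdict

def aggregate_bus_paths(bit_paths):
    """Merge per-bit paths (source_register, dest_register, worst_case_delay)
    that share the same endpoints, keeping the worst delay as the bus delay."""
    grouped = defaultdict(float)
    for src_reg, dst_reg, delay in bit_paths:
        grouped[(src_reg, dst_reg)] = max(grouped[(src_reg, dst_reg)], delay)
    return dict(grouped)

# 64 per-bit paths from two 32-bit operand registers into the adder's result
# register collapse into just two controller/delay pairs.
paths = ([("opA", "sum", 2.0 + 0.01 * i) for i in range(32)] +
         [("opB", "sum", 2.5 + 0.01 * i) for i in range(32)])
print(aggregate_bus_paths(paths))   # worst delays: about 2.31 and 2.81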

The post-synthesis conversion algorithm has been implemented and tested on a collection of small circuits, e.g., a simple finite impulse response (FIR) filter design. The converted design was tested by simulation, after modifying the testbench program (dropping the dependence on a single clock signal and adopting a single-rail, four-phase communication protocol). The design was verified by using test vectors and comparing the results to those generated by the original synchronous design. The modified testbench controls the response of the environment to events and responses of the design under test, and can be used to test various environment behaviors. Work is underway to complete the programming and perform large scale tests.

5.8 Concluding Remarks

The DLAP scheme is not limited to the implementation of an asynchronous microprocessor. It applies to any asynchronous circuit structured as a simple or complex pipeline. In our quest for high performance asynchronous implementations for Kin, as well as for efficient conversion of synchronous circuits into asynchronous ones, we have examined various asynchronous pipeline schemes [MBM89, MBM91, DW95, FD96, FL96, YBA96, SS93], and have found that none operates as efficiently as a balanced synchronous pipeline. Consequently, we have developed the doubly-latched asynchronous pipeline (DLAP) which employs master-slave registers. DLAP is capable of truly decoupled operation: All pipeline stages can shift data simultaneously, and execution is faster even if variable delays are encountered. We have shown implementations based on either edge-triggered registers or transparent latches. Both designs have been defined with STGs, verified, and fully simulated and compared with previous architectures.

DLAP is best for variable (and data-dependent) delays, both of the internal stages and of the output process, and for rings, which otherwise suffer from a lack of bubbles. It is suitable for balanced pipelines. However, if the computational load per stage is small (relative to the overhead), it is slightly slower than a fully-decoupled pipeline due to the extra registers.

DLAP is suitable for automatic synchronous to asynchronous conversion from (balanced) synchronous pipelines, in order to eliminate clocks (for power saving, or for easier interface to other asynchronous units), without redesign.

The DLAP controllers described in this chapter operate according to the four-phase communication protocol. However, DLAP controllers can also be implemented for two-phase protocol operation, which might be faster (due to the fact that only half the number of handshake

transitions are required [AML97]). The DLAP pipe control logic is fully contained, hence the handshake overhead can be reduced without affecting anything else.

A variant of DLAP can be constructed by employing a semi-decoupled pipeline with logic units only at every other stage. However, it appears that this would require more hardware, and consume more power, than the DLAP scheme presented in this chapter, because of the redundant internal handshake transitions.

Newer versions of Petrify and other synthesis tools may be applied to the STGs presented in this chapter, to synthesize simpler DLAP control circuits. The generated circuits can be implemented with Set-Reset FFs instead of C-elements. Faster DLAP controllers might be implemented based on generalized C-elements, or designed to operate according to non-blocking schemes, as in [FD96], or semi-blocking schemes, where the first latch is normally open while the second one is normally closed. The control circuits we used were designed to be delay insensitive. However, simple engineering optimization techniques can be applied for lower latency overhead, e.g., overlapping the control circuit timing with the latch operation.

Chapter 6 : Adaptive Synchronization for Multi-Synchronous Systems

While the highest architectural level of Kin is asynchronous, as described above in Ch. 2, the various units in it may be implemented according to different timing disciplines. Chapters 4 and 5 presented asynchronous design methodologies, where the circuits are either originally designed as asynchronous, or converted from synchronous designs. In this chapter we present another possible implementation for Kin, as a multi-synchronous system. This proposed implementation can be used as an alternative migration path from a complete synchronous design to a complete asynchronous design.

Multi-synchronous clocking discipline is based on a common clock distributed over thin wires, avoiding the massive power investment in clock distribution trees and circuits for phase matching and skew minimization. Hence, all the processor units operate at the same clock frequency, but have an arbitrary clock phase. Adaptive synchronization is used to substantially reduce the probability of synchronization failure (when data are sent between the units), and reduce performance degradation caused by synchronizers. In contrast with clock-manipulating techniques, such as clock stretching, adaptive synchronization adjusts data delays. Thus, adaptive synchronization can be used to handle synchronization problems that are harder to solve by clock phase adjusting methods, e.g., when data are received from more than one source.

6.1 Introduction

With the advance of technology, the integration levels of VLSI chips grow from millions of transistors per chip towards hundreds of millions. VLSI chips, e.g., microprocessors, grow larger and run faster. Over half a billion transistors on a die and clock rates in excess of 1 GHz are predicted for the year 2010 [KG97, SIA94]. Distributing a single 1 GHz clock to all parts of a 0.5B-transistor chip becomes very expensive: to assure minimal skew and short rise and fall times, a growing portion of the total power is dissipated by the clock distribution network, including the phase lock loops, buffers, and tuning circuits [Fri95]. For instance, currently over 40% of the power budget in an Alpha chip is consumed by a clock distribution network for reducing clock skew problems [Bow95].

Asynchronous design, as discussed and presented in previous chapters, is often proposed as a viable solution, removing the clock altogether [DGY93, Hau95, Pav94, SSM94]. In this chapter, however, we propose a new clocking method which both saves power and facilitates synchronization.

A possible solution to the clock distribution problem lies in non-synchronized operation, wherein the various modules on the chip do not maintain a known relative clock phase to each other, and intercommunicate asynchronously. However, the design of non-single clocked synchronous systems should be closely woven with the concern for synchronization. In addition, it seems useless to synchronize far-apart clocks, since data lines spanning large portions of the chip may be subject to substantial propagation delay (close to, or in excess of, the clock cycle) and be out of sync in any case. On the other hand, since the relative skew of the clock as it arrives at the various modules is immaterial for non-synchronized operation, the clock signal can be distributed over networks that are designed to dissipate minimal power and occupy minimal area.

This chapter discusses some previous work done on synchronization issues, and defines multi-synchronous systems. Then, in contrast with clock-manipulating techniques, an adaptive synchronization is presented, to adjust data delays, and substantially reduce the probability of synchronization failure. It is further proposed that the adaptive synchronization be performed semi-statically, adapting various data delays from time to time. The suggested adaptive synchronization approach is compared with synchronizers and stoppable clock schemes.

6.1.1 Previous work

The most tightly coupled, highest performance systems today (such as high end microprocessors) operate on a single common clock with minimal skew (dissipating a lot of power to achieve that), avoiding synchronization issues altogether. Variations include multiple synchronized clocks operating at several frequencies and phase locking. For low bandwidth communications among systems with uncorrelated clocks, synchronizers are employed successfully [CM73, CW75, Mar81, Pec76, RMC+88, Sei80, Sei94, Sto82, Vee80]. Alternative methods that have been proposed include stretchable clocks and clock tuning [Cha84, Cha87, Keh93, Pec76, PN95, RMC+88, Sei80, YD96]. Self-clocked data (such as Manchester coding on Ethernet [MB76], and start/stop bits on RS232 serial communications) are exchanged when even lower bandwidth is needed.

The synchronization problem has received a lot of attention [Cha87, CM73, CW75, Gre95, Keh93, Mar81, Pec76, PN95, RMC+88, Sei80, Sei94, Sto82, Vee80, YD96]. Solutions have been developed for a wide range of applications, from intra-chip communications to wide area

networks. As technology progresses, integration levels and computational speeds increase, and systems which used to require multi-board implementations are expected to fit inside single chips. Likewise, the synchronization methods that were once applicable to backplanes and multiple boards should now be considered for the inner circles of chips.

Low level clock/data synchronization is typically handled by synchronizers. However, they are principally suitable for low bandwidth communications, and a number of issues render them less effective in high performance chips. First, synchronizers may occasionally fail due to metastability [CM73, CW75, Mar81, Pec76, Sto82, Vee80]: a synchronizer might enter a metastable state, or take an abnormally long time to settle. While the probability of failure has been kept very low, this is exponentially more difficult to achieve as the cycle time becomes aggressively shorter (as described below in Sect. 6.5). Second, in high performance systems, modules may receive many data inputs concurrently from many other modules and at high rates; consequently, the probability of at least one input switching at the same time as the clock grows beyond negligible levels. Third, synchronizers incur at least one clock cycle delay; this may lead to unacceptably long latencies accumulating over multi-module paths, and be especially limiting on cyclic paths, such as between a reservation station and the execution units of a high performance processor.

Stretchable (or stoppable) clocks [Cha84, Cha87, Pec76, RMC+88, Sei80, YD96] have been proposed as an alternative to synchronizers. A ring-oscillator based clock generator is attached to each synchronous module. An arbiter detects clock/data conflicts and stretches the ‘off’ phase of the clock (thus trading failure for a long delay). Stretchable clocks are subject to two drawbacks. First, the multiple clock generators typically develop frequency variations, due to temperature and supply voltage in-die variations. As a result, relative inter-module phase shifts drift continuously, causing frequent recurrences of conflicts. Second, as with synchronizers, high bandwidth communications received over many channels increase the probability of clock/data conflicts. This fact leads to a high rate of clock stretching events, severely impeding performance.

Many other variations have also been proposed. [Keh93] suggests clock (phase and frequency) tuning for performance enhancement. [Sei94] hides some synchronization latency by inter-module FIFO buffers. In the STARI protocol [Gre95], asynchronous FIFOs are employed; synchronization is achieved on the first data transfer, and is automatically maintained thereafter. The FIFO must be kept about half full, and each insertion and removal operation must complete within one cycle. If these requirements are violated (e.g., on FIFO underflow), synchronization is lost. [PN95] employs analog adjustable clock generators, achieving local self-alignment of all clocks.

We have developed a clock synchronization method that applies an external crystal clock, rather than a self-generated one. That method and its limitations are presented in Appendix A. It was

found less desirable than data adaptive synchronization, which is the subject of this chapter.

6.1.2 Multi-Synchronous Systems

Consider a 0.5B transistor chip comprising 100 synchronous modules of 5M transistors each. Various clocking schemes may be employed. A single clock is feasible, but as mentioned above the cost in power and area may be prohibitive for some applications. In multi-clocked systems each module is clocked independently of the others, and as explained above, using synchronizers severely limits performance. Instead, we propose a multi-synchronous (multisync) scheme (as defined above in Sect. 1.1), whereby all modules feed off the same external crystal clock, while arbitrary relative clock phases are permitted.

Most synchronization methods assume that the arrival (switching) time of data at any module is uniformly distributed over the clock cycle, as in Fig. 6-1(a). However, in multisync systems, the arrival time of certain data channels incident upon a certain module may be distributed unevenly, e.g., as in Fig. 6-1(b). When one synchronous module outputs data to another, data output is synchronized with the local clock of the sender. Since the phase difference between the receiver and sender clocks, as well as the data interconnect delay, are stationary, data arrival time at the receiver is correlated with the receiver clock. Stationarity can be assumed because the delays and phase differences among the modules in the system are functions of the implementation, of physical parameters, and of temperature and supply voltages, and they typically change very slowly during operation. However, in systems with a high degree of connectivity the combined distribution of all channels incident upon a specific module looks more like Fig. 6-1(c), and the danger of clock/data conflicts cannot be ignored.

A multi-synchronous system is presented in Fig. 6-2. The common clock is distributed over thin wires (saving area and power, compared to minimal skew clock distribution nets). While clock frequency is the same for all modules, the actual phase shifts are considered unknown (a similar clocking is described in [Gre95], but there the system has to be reset upon synchronization loss). While the clock phase differences among modules are affected by supply voltage and temperature, these variations appear like very slow drifts, and take a huge number of cycles to be noticed. Thus, we may safely assume that the phases do not shift for relatively long time, and the phase differences can be considered stationary.

Consider modules A and B in Fig. 6-2, which are tightly coupled over an asynchronous channel without a FIFO, for high bandwidth communication. Similar to the clock delays, the data delay δAB is also stationary and is considered unknown. Module A generates output transitions on DA at a fixed phase difference relative to its own clock CA. The data propagate to module B, which

samples DataRdy on the rising edge of its own clock CB. New data are sent over from A to B at a high rate, e.g., on almost every clock cycle. Since the relative clock phase difference φA − φB of modules A and B is presumed unknown, the data may arrive at B simultaneously with the rising edge of clock CB, creating a clock/data conflict and possibly resulting in a metastable state at the input of B, and in loss of data. If the relative clock phases and data delays remain fixed (stationary), and since both modules operate at the same clock frequency, this unfortunate situation is most likely to recur. The use of a synchronizer in this case does not solve the problem, because the synchronizer may enter a metastable state on every conflict, increasing the probability of failure.

Figure 6-1: Arrival time distribution of inputs (over a clock cycle T): (a) uniform distribution (asynchronous input); (b) clustered distribution when the sender and receiver clocks are correlated (single synchronous input); (c) combined distribution of many independently correlated inputs is similar to uniform distribution (multiple synchronous inputs).
Figure 6-2: A multi-synchronous system.

The novel method we propose suggests adjusting data timing rather than the clock, thus converting data arrival time distributions into forms like Fig. 6-1(b) and substantially reducing the synchronization problem. [Sei94] has employed pipeline synchronizers in order to convert the uniform distribution of asynchronous inputs into a non-uniform distribution, useful for a synchronous receiver. The latency of the pipeline synchronizer is rendered unnecessary when the sender is also synchronous, as discussed above.

In the following we assume the system architecture to be asynchronous (as is the case for Kin), so altering various delays does not affect system correctness. We describe systems operating with a four-phase handshake protocol, but the results may also be applied to a two-phase handshake.

6.2 Data Adaptive Synchronization

Data adaptive synchronization adjusts the delays on the data lines instead of adjusting the local

clock phase. Since the communication channels are connected point to point, the delays on them can be changed so that they do not conflict with the local clock, without affecting the other channels (this approach also applies to bus taps). We add a data coordination circuit for each communication channel, as in Fig. 6-3(a). When a conflict is detected, the data delay is adjusted to prevent conflicts in future communications.

Note that three different phases are assumed stationary in the multisync model (Fig. 6-2): the clock phase difference φA − φB, the sender data phase DA − CA, and the data delay δAB. Consequently, the phase of the arriving DataRdy at module B relative to CB is also stationary. In other words, the arrival time distribution is represented in this model by Fig. 6-1(b), and is highly non-uniform. In the following, we take advantage of this fact and control data delays so as to assure that the center of this distribution is safely remote from the clock transition for every data line in the system. This is achieved by tuning the data delay δAB.

The adaptive mechanism architecture for a specific module is shown in Fig. 6-3(a). Data input channels DIi are each subject to given data delays δIi (cf. δAB in Fig. 6-2). Adaptive synchronization circuits Ai, clocked by the local clock CKin, monitor the DataRdyi lines and control adjustable data delays δi, whose value is in the range 0 ≤ δi < T (T is the clock cycle). The function of the Ai is to separate the clock and data transitions. The multiple input delays can be adjusted independently of each other, so the combined data arrival time distribution at the entry to the module looks like Fig. 6-3(b). Adaptive synchronization applies equally well to single-sender, multiple-receiver buses (Fig. 6-4).

Figure 6-3: (a) Adaptive Synchronization; (b) Combined data arrival time distribution — data delays are adjusted to avoid conflicts.

Figure 6-4: Adaptive Synchronization for single sender, multiple receivers buses; each receiver adjusts its own data input.

6.3 Data Adaptive Synchronization Circuit

The principles of adaptive synchronization resemble self-clocking communication mechanisms, such as in UARTs. The challenge is to obtain proper operation even in the presence of metastability. Consider the adaptive synchronization circuit in Fig. 6-5, with an adjustable delay (Fig. 6-6). A four-phase data signaling discipline is assumed, wherein DataRdy rises to ‘1’ after the new data are available (the circuit may be readily extended to two-phase operation as well). The receiving module (cf. Fig. 6-2) latches the inputs upon the positive edge of its local clock, and only if DataRdy is ‘1’. Thus, the purpose of the adaptive synchronization circuit is to detect the phase of DataRdy relative to the local clock, and to adapt the δi delay if that phase is dangerously close to 0 or T (2π).

Figure 6-5: Adaptive Synchronization circuit. Figure 6-6: Adjustable delay circuit, consisting of multiple delay lines and a selector.

First, the data is fed into a phase detector. Let us assume that the DataRdy lines switch (up or down) on every cycle. In [Keh93], several delayed phases of the clock are used to detect the data transition time. In Fig. 6-5, several delayed versions of the data are employed instead. The XOR gates generate a sequence of pulses, as in Fig. 6-7. The delays marked ‘d’ assure a small pulse overlap. The outputs of the XOR gates are the enable signals of counters which are triggered by the local clock edge. On the rising edge of the clock, one or two of the counters increment their count. This is repeated for a large number of cycles, e.g., 1000 times. At the end of that time, the counters are expected to show a distribution similar to either Fig. 6-8(a) or Fig. 6-8(b). In either case, two or three counters show large counts, and the remaining ones are close to zero. The spread is caused by pulse overlap, by clock and delay jitter, and by pulse/clock conflicts which may result in metastable states, in long settling times, and in indeterminate counting. In spite of such physical difficulties, the phase detector is robust thanks to the many repeated counts, and it produces a very clear indication of the relative phase of the DataRdy line. The circuit in Fig. 6-5 is similar to delay-locked loop (DLL) circuits, except that the proposed circuit is digital rather than analog, and its operation is algorithmically controlled.

Next, the MaxFinder circuit determines, according to which counter has won, whether the δi delay of the data lines should be changed, and by how much. For example, the count depicted in Fig. 6-8(a) indicates no change, while that of Fig. 6-8(b) calls for adding a delay of at least T/5. The adjustable delay consists of multiple parallel delay lines and a selector (Fig. 6-6). Notice that although the phase detector examines only the DataRdy line, δi is applied to all data lines of the i’th channel.
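The counting-and-decision procedure can be sketched in software as follows; the five-bin granularity and the 1000-cycle count follow the example above, while the function names and the specific adjustment policy (re-center the winning bin at mid-cycle) are our own illustration rather than the circuit itself.

import random

T = 1.0          # clock cycle (normalized)
NBINS = 5        # cycle divided into five phase windows of T/5 each

def run_training(arrival_phase, jitter, cycles=1000):
    """Bin the DataRdy arrival phase (relative to the local clock) over many cycles."""
    counters = [0] * NBINS
    for _ in range(cycles):
        phase = (arrival_phase + random.gauss(0.0, jitter)) % T
        counters[int(phase / (T / NBINS))] += 1
    return counters

def choose_adjustment(counters):
    """Return the extra delay (a multiple of T/NBINS) that moves the winning bin
    to the middle of the cycle, away from the clock edges at phases 0 and T."""
    winner = counters.index(max(counters))
    target = NBINS // 2
    return ((target - winner) % NBINS) * (T / NBINS)

counts = run_training(arrival_phase=0.02, jitter=0.01)   # arrivals hug the clock edge
print(counts, "-> add delay", choose_adjustment(counts))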

Figure 6-7: Phase detection waveforms.
Figure 6-8: Typical phase detection counter outputs: (a) data transition is safely within the cycle; (b) data delay should be increased to avoid clock/data conflicts.

Although the examples present circuits for the case wherein the clock cycle is divided into five periods, at high frequencies it might be simpler to implement using only three such periods (since the clock cycle time may be only a few gate delays long). The circuit complexity of a proper adaptive synchronization circuit, for a 32-bit data path, is approximately 5,000 transistors (comprising the delays, XOR gates, counters, comparators and switches in the MaxFinder circuit, and the adjustable delay circuits). Thus, the total overhead for a 5M-transistor module with 10 input channels is about 50,000 / 5M = 1% (recall that it replaces a massive clock distribution network). The extra power consumption is similarly marginal.

Adaptive synchronization is suitable for a wide range of applications. The typical data delay range is 0.1T < δIi < 1.5T for a 0.5B-transistor, 1 GHz chip, but the delay may be much larger than T for multi-chip and MCM configurations. In such cases, new asynchronous data signaling methods could be used, such as multiple message windows (wherein multiple messages are sent before an acknowledge is expected). As long as relative delays are stationary, adaptive synchronization remains applicable.

6.4 Training Sessions

Adaptive synchronization may be performed continuously, in parallel with normal circuit operation. However, modifying the data delays may cause timing problems at the time of change, so this is best carried out while the system is not performing any real task. In addition, during normal operation it cannot be guaranteed that all DataRdy lines switch frequently enough. And continuous adaptation may be unnecessary if all delays are highly stationary and stable.

Consequently, special training sessions are proposed for adaptive synchronization. During a training session the system stops performing all real computations. Instead, all DataRdy lines are toggled every cycle, and all adaptive synchronization circuits operate and adjust the δi delays. Any synchronization failures during a training session can obviously be ignored. The training session requires a relatively small number of counting cycles. Since all adaptive synchronization circuits operate in parallel, 100,000 clock cycles (0.1 ms at 1 GHz) seems a safe bound on the required session duration.

A training session is always employed after reset, for initial adjustment of all delays. Thereafter, training sessions can be invoked either periodically or as required. The periodical training frequency depends on process parameters (especially delay stability) and operational parameters (such as clock frequency and dynamic temperature and voltage variations), but it is estimated that at 2010 technology much less than one training per second will be required. The expected performance overhead is thus much less than 10^5 cycles / 10^9 Hz = 0.01%.

Training sessions are also proposed in [SCI92], wherein a point-to-point communication ring architecture is defined. Training sessions are utilized to send sync packets at ringlet initialization, and once every time interval appropriate for normal operation of the particular implementation. Clock skew in [SCI92] is handled (using Phase Lock Loop circuits) by observing incoming clock and local clock phases.

Significant temperature and voltage variations may be sensed on-chip by special sensors in order to invoke a training session when a problem seems imminent. Alternatively, the adaptive synchronization circuits themselves may be modified to act as the sensors. If any such circuit detects that any switching phase approaches 0 (or 2π) closer than some safety threshold, a hardware interrupt is invoked to start a training session. In addition, a training session can also be triggered when higher level logic (or software) detects a synchronization or communication failure. A similar tuning idea is used in [Keh93].

6.5 Probability of Synchronization Failure

In this section we analyze the failure probability of the adaptive synchronization (A/S), and compare it to synchronizers.

Synchronization failure might happen either during a training session, causing the delay adaptation process to fail, or during regular operation after a successful training, causing the system to fail. Failures during training sessions do not affect system operation, and might only cause the training itself to fail. These synchronization failures can happen in the phase detection circuit (Fig. 6-5), when one of the counters enters a metastable state while incrementing its count, due to marginal triggering. Since a training session takes many cycles, the counters are allowed sufficient time to resolve any metastability before their outputs are read. Thus, the probability of failure of the training session is practically zero. After a successful training session, all delays are adapted properly so that data is expected to arrive at a module around the middle of the local clock cycle, and avoid synchronization failures. However, due to possible jitters in clock phase and line delays, the data arrival time might randomly change from cycle to cycle, and become dangerously close to a clock edge. The system model for the failure analysis is described in Fig. 6-9. The phase of the clock at module B is affected by the delay along the clock distribution network from the clock source to module B. Data sent from module A will arrive at module B with a phase affected by the delay of the clock signal to module A, the internal logic delay from clock edge to data output, and the data propagation delay to the input of B (Fig. 6-9(a)). We assume normally distributed jitters, and define two random variables with normal (Gaussian) distribution, Xc and Xd, representing the phases of the clock and data at module B, respectively. Xc = N(μc, σc) is normally distributed with mean μc (equal to the clock cycle time T) and standard deviation σc (caused by jitter

effects). Without loss of generality, we take the phase of the clock to be 0 (i.e., μc = T), since we are only interested in the relative phase of data to clock, and cyclically the phase is 2πk (k an integer). Xd = N(μd, σd), where μd is the expected arrival time within a clock cycle. After a training session, μd is expected to be at the middle of the clock cycle, i.e., μd = T/2, assuming Xc is centered on 0 + 2πk (see Fig. 6-9(b)). Note that Xd is actually a sum of three normally distributed variables, so its variance (σd²) is calculated as the sum of three variances. Assume ε is the time window within a clock cycle (Fig. 6-9(c)) in which data must be stable (generally considered to be the setup-and-hold period) to avoid metastability.

Figure 6-9: Model for analyzing synchronization failure.
Figure 6-10: Gate delay vs. clock frequency.

When using a synchronizer, there is no knowledge of the data arrival time, so uniform arrival time distribution is assumed. Once a synchronizer has entered the metastable state, the probability that it will still be metastable some time later has been shown to be an exponentially decreasing function [Cha83, RC82]. The probability of synchronization failure of a synchronizer is given by

P_fail(synchronizer) = e^(−(T−ε)/τ)          (6-1)

It equals the probability that a synchronizer which enters a metastable state (at time t=0) still remains in the metastable state at the time its output should be stable for sampling in the next clock cycle. The parameter τ is the exponential time constant of the decay rate of the metastability (discussed below).

The probability of failure of the adaptive synchronization is the probability that the values of the two random variables Xc and Xd are too close (within ε) to each other, i.e., the data switches too close to the clock edge. This probability can be calculated as the probability that a random

variable, equal to the difference of the two random variables, has a value in the forbidden range:

P_fail(A/S) = P(|Xd − Xc| < ε) + P(|Xd − (Xc − T)| < ε)          (6-2)

Note that the normal distribution of the difference random variable spans beyond [0,T], and because of the 2π cycling, Xc should be considered at both 0 and T. Since we assumed normal distributions, each of the probabilities in Eq. 6-2 can be calculated by the Gaussian function, with the proper parameters [Pap91], e.g.,

P(|Xd − Xc| < ε) = G((ε − (μd − μc)) / √(σc² + σd²)) − G((−ε − (μd − μc)) / √(σc² + σd²))          (6-3)

where G(·) denotes the standard normal (Gaussian) distribution function.

The value of the Gaussian function is determined by the error function, erf(x), whose value can be obtained by the ERF(x) function with parameter transformation:

G(x) = ½ [1 + erf(x / √2)]          (6-4)

Technology is defined by the gate delay, which also limits the highest clock frequency that can be used. However, the clock frequency increases faster than the gate delay decreases, as can be observed from Fig. 6-10 (based on data from [KG97, SIA94, Wei96]). Since gate delay does not scale linearly with frequency, fewer gate delays fit within a clock cycle as the frequency rises. The probability of failure goes up because the clock cycle time T (= μc) shrinks faster than ε (the settling window). To compare the failure probabilities, we assume the following model: the metastability window ε is assumed to be equal to a gate delay, the parameter τ to be 1/3 of a gate delay, and the jitter (which equals 6σ) to be half a gate delay (and no more than 15% of the clock cycle). Figure 6-11 presents a logarithmic graph comparing the synchronization failure probabilities of a synchronizer relative to the A/S scheme. The graphs of the probability of failure for the synchronizer and for A/S were calculated according to the assumed model for various gate delays, and the lines show the trend, as explained below.
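Under the stated model, the two failure probabilities can be evaluated numerically, as in the following sketch; the gate delay value is an arbitrary illustration, and the clock jitter is folded into a single standard deviation as assumed above.

import math

def phi(x):
    """Standard normal distribution function via erf (cf. Eq. 6-4)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_fail_synchronizer(T, eps, tau):
    # Eq. 6-1: metastability entered at t=0 has not decayed by the sampling
    # point of the next clock cycle.
    return math.exp(-(T - eps) / tau)

def p_fail_adaptive(T, eps, sigma):
    # Eq. 6-2 with clock jitter folded into sigma: data centered at T/2 after
    # training, failing if it lands within eps of the clock edges at 0 or T.
    mu = T / 2.0
    return phi((eps - mu) / sigma) + (1.0 - phi((T - eps - mu) / sigma))

gate_delay = 100e-12                  # illustrative 100 ps gate delay
T = 1e-9                              # 1 GHz clock
eps = gate_delay                      # settling window = one gate delay
tau = gate_delay / 3.0                # metastability time constant
sigma = (gate_delay / 2.0) / 6.0      # total 6-sigma jitter = half a gate delay

print("synchronizer:", p_fail_synchronizer(T, eps, tau))
print("adaptive sync:", p_fail_adaptive(T, eps, sigma))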

For high communication bandwidth (e.g., almost every cycle), the mean time between failures (MTBF) is given by

MTBF = 1 / (f · P_fail) = T / P_fail          (6-5)
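As a worked example (our numbers): with a transfer on nearly every cycle of a 1 GHz clock, an MTBF of one year (about 3.15×10^7 s) requires P_fail ≤ 1 / (10^9 Hz × 3.15×10^7 s) ≈ 3×10^-17, while an MTBF of one minute still requires P_fail ≤ 1.7×10^-11.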

The failure probabilities required to achieve an MTBF of once a year and once a minute at the various technologies are also presented in the graph. As can be observed from the graph, using a synchronizer can be practical at lower frequencies, but as the clock frequency increases, the synchronizer has less time to resolve and the probability of failure rises rapidly. Using a sequence of synchronizers decreases the failure probability, but increases the latency and affects performance. Note also that the failure probability presented is that of a single synchronizer, and since many synchronizers are required (for every bit in every bus between modules), the overall failure probability is worse than drawn on the graph. When synchronizers fail to deliver flawless operation at higher frequencies, A/S still applies. The zero values of A/S failure probability cannot be plotted on the logarithmic graph. The inter-module clock jitter will be the limiting factor on the maximum clock frequency in the A/S scheme. At even higher frequencies, when A/S fails, it can be used together with a synchronizer, to decrease the probability of the synchronizer entering a metastable state. Beyond a certain technology (e.g., when the jitter is more than 15% of the clock cycle), all synchronization methods fail, and the only solution is a completely asynchronous design, with asynchronous communication, as described in other chapters of this thesis.

Figure 6-11: Probability of synchronization failure (synchronizer vs. A/S, with the failure probabilities required for MTBF of once a year and once a minute).

6.6 Concluding Remarks

The architecture of Kin (described in Ch. 2) and its support of Avid execution (detailed in Ch. 3) imply a large and complex microprocessor. A single clock is either impractical or impossible for such very high performance chips, e.g., as predicted by the SIA technology roadmap for the year 2010 (over 0.5B transistors operating at over a 1 GHz clock) [SIA94]. We have presented an adaptive synchronization solution for multi-synchronous systems. Multi-synchronous architectures (locally synchronous, globally asynchronous) could be a viable alternative to a fully asynchronous design. We focus on common clock multi-synchronous systems, where a single crystal clock is

distributed over minimal area, minimal power networks, so that all modules operate on the same clock frequency (or its frequency divided versions) but at unknown phase differences.

We take advantage of the stationary nature of clock and data delays, and of the consequential non-uniform arrival time distribution of asynchronous signals. Data timing is dynamically adjusted to avoid clock/data conflicts.

We have presented a novel adaptive method addressing the synchronization problem. While most previously proposed methods manipulate the clocks, adaptive synchronization adjusts data delays. The method exploits the high stability of delays and the stationarity of most relative phases. The probability of synchronization failures is reduced substantially. Timing adaptation can be limited to special training sessions (as commonly practiced in data communication networks). Thus, the synchronization monitoring circuits are kept off the critical paths. The adaptation circuits incur only marginal overhead in area, power and performance. A study of alternative methods (such as synchronizers and stretchable clocks) shows that they may not be as usable as adaptive synchronization.

The solution presented in this chapter was developed as a possible implementation methodology for Kin, but it can also stand by itself and be used in other systems.

Chapter 7 : Adapting Statecharts Methodology for Asynchronous Design

This chapter describes how to apply a novel methodology, based on statecharts, to the design of large scale asynchronous systems. The design is specified at multiple levels, simulated, animated, and compiled into synthesizable VHDL code by using the Statemate Magnum CAD tool. We add a validation sub-system to check correct operation. Statemate Magnum is originally synchronous, but we employ it for asynchronous design by avoiding any design dependence on the clock, and simulating with a fast clock and on-line delays. We have used statecharts to specify and simulate several systems, large and small, including the asynchronous instruction length decoder (described in Ch. 4), the doubly-latched asynchronous pipelines from Ch. 5, and the Kin model (as explained in Ch. 2). The methodology is demonstrated here through a simple FSM design example.

7.1 Introduction

Numerous applicable methodologies have been developed for the design of asynchronous logic [Hau95]. Most of those methodologies and tools were developed for the design of small systems. Only a few tools and methodologies have addressed large scale system level design. They include the CHP [Mar90] and Tangram [vBKR+91] compilers, and the combination of commercial and special-purpose tools used for the PostOffice [CDS93] and AMULET [Pav94]. However, no single and complete methodology and tool set have been demonstrated as yet for the design of large scale asynchronous systems.

The research described in this thesis focuses on the architecture and design methodology of a microprocessor, which is a large scale asynchronous system. We were looking for one complete CAD system, based on a well-understood methodology, suitable for all levels of the system, for all timing disciplines, and for all design tasks including specification, design, simulation, animation, validation, verification, debugging and synthesis. Naturally, since the CAD system is the tool rather than the research, and since such a grand CAD system requires immense resources to develop and maintain, we have turned to the domain of commercial CAD products in our quest. Unfortunately, no large scale commercial CAD systems are available for asynchronous design.

Thus, we employed a commercial synchronous CAD system and adapted it for the design of

asynchronous circuits. A design discipline is developed by which any explicit dependence on the clock is carefully avoided. The circuit is synthesized by the tool into a synchronous structure, but it is subsequently converted into an asynchronous one, as described in Sect. 5.7.

This chapter describes the adaptation of the novel commercial high-level CAD system Statemate MAGNUM™ [iLo96] to the design of large scale asynchronous systems. Statemate MAGNUM (Magnum for short) is based on statecharts [Har87], and is introduced succinctly in Sect. 7.2. Magnum provides an environment for hierarchical graphical specification of the design, as well as facilities for simulation, animation, verification, and compilation into either a software program (in C or Ada) or a hardware description (in VHDL or Verilog, at either the behavioral or RTL level).

The application of Magnum is described through the design of a small quasi-delay insensitive finite state machine (qDI FSM) [DGY92b]. Section 7.3 defines the FSM and Sect. 7.4 explains its design with Magnum. In Sect. 7.5 we describe special validation statecharts for asynchronous logic. Simulation is discussed in Sect. 7.6.

7.2 The Statechart-based Statemate MAGNUM CAD System

Statecharts [Har87] constitute a specification formalism [HP96b, HPS+87] for reactive systems, and can be used to design complex discrete-event systems and communication protocols. Asynchronous systems can be considered as reactive systems, since data are transferred by using handshake protocols and each module reacts to changes in its inputs, does some processing, and signals to other modules when done. Statecharts extend conventional state-transition diagrams of finite state machines by adding hierarchy, concurrency and communications. While statecharts describe system behavior, structural, functional and data-flow aspects of the system are specified with the related activity charts. The nature of these charts is demonstrated in Sect. 7.4 wherein they are applied to the design of qDI FSM.

Historically, statecharts were applied to the design of real-time, reactive software systems. Recently the same methodology has also been applied to hardware specification. Magnum presently compiles the specification into either VHDL or Verilog. It can compile into either behavioral or RTL styles, and it can target the code for various commercial synthesizers and simulators. We have successfully employed the Compass RTL synthesizer on the Synopsys-targeted VHDL.

The applicability of tools like Magnum to asynchronous design is far from obvious. Magnum is inherently a tool for designing synchronous logic. It generates RTL VHDL that is commonly

synthesized according to the common synchronous ‘register-and-cloud’ structure (cf. Sect. 5.7.1).

7.3 The qDI FSM

The qDI FSM [DGY92b] consists of a combinational logic block (CL, Fig. 7-1) and a register (REG). It is based on the dual rail, four phase design methodology. The inputs and outputs are either defined, or undefined, or in transition between those states. The ACK line is a single rail control line. The system/environment protocol is shown in Fig. 7-2. The corresponding protocols for the CL [DGY92a] and the REG components are shown in Figs. 7-3 and 7-4, respectively (note their duality). This FSM is the asynchronous version of the synchronous Mealy machine (the outputs are defined only when the inputs are).

Figure 7-1: qDI FSM. Figure 7-2: FSM handshake. Figure 7-3: CL handshake. Figure 7-4: REG handshake.

7.4 Specifying the qDI FSM with Statecharts

To demonstrate the use of Magnum, consider a two-input, two-output FSM. Figure 7-5 shows a simplified activity chart of the qDI FSM (the validation parts are explained in Sect. 7.5 below). Activities are drawn as boxes, with CL and REG nested inside the qDI_FSM activity (they are hierarchically described in separate activity charts, as indicated by ‘@’). The environment ENV, being an external activity, is drawn as dotted boxes, and can appear multiple times for clarity. Besides providing inputs and outputs, ENV also supplies circuit delays for simulation (which can validate delay insensitivity).

Figure 7-5: Activity chart, FSM with validation.

The detailed hierarchical description of the CL and REG activity charts can be found in [KGS96]. Two alternative CL statecharts are shown in Figs. 7-6 and 7-7 (note that states are drawn with round corners). The former is designed for implementation and simulation with bounded delays, and the latter assumes (quasi-) delay insensitive operation.

Figure 7-6: CL Bounded Delay statechart. Figure 7-7: CL qDI statechart.

Statechart transitions are labeled trigger/action, where the trigger is an event and/or a [condition]. All parts are optional in the synchronous Magnum. Unlabeled transitions are triggered by the next clock in synchronous systems, so they are ruled out in our case. Magnum events, unlike the common concept of events in asynchronous design, are dangerous, as they can go unnoticed and disappear unless some receiver is ready and waiting for them. Since Magnum

has no inherent countermeasure for this problem, we require that regular statechart transitions be labeled with conditions instead (we do allow the use of some special events in Sect. 7.5 below). Asynchronous signal transition events (such as ‘X↑’) are replaced by conditions (such as ‘[X=1]’).

The initial state is marked with the default entry arrow. The CL starts with inputs and outputs undefined. When all CL inputs (i.e., the inputs to the FSM, and the present state (PS) from the registers) become defined, the statechart moves to the next state, wherein the Boolean functions are evaluated. The actual Boolean expressions are hidden in the data dictionary (as indicated by ‘>’). In Fig. 7-6, exit from that state occurs when the computation delay expires. Alternatively, in the qDI implementation (Fig. 7-7), exit from the same state depends on the CL outputs becoming defined, rather than on the delay, as appropriate for a delay insensitive circuit. A similar sequence occurs on the way down, when the inputs return to undefined.

This FSM design is generic by nature, and only a small part of it is affected by the details of the specific FSM being designed.
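For illustration only, the CL behaviour captured by these statecharts can also be written as a small software model; the encoding helpers and the scripted environment below are ours, and the model mirrors the protocol of Fig. 7-3 rather than the statechart syntax.

UNDEF = (0, 0)                      # dual-rail 'undefined'; 0 -> (1,0), 1 -> (0,1)

def encode(bit):
    return (0, 1) if bit else (1, 0)

def cl_step(state, inputs, func):
    """One reaction of the CL block: inputs defined -> outputs defined,
    then inputs undefined -> outputs undefined (cf. Figs. 7-3, 7-6, 7-7)."""
    if state == "waiting_defined" and all(w != UNDEF for w in inputs):
        return "waiting_undefined", [encode(b) for b in func(inputs)]
    if state == "waiting_undefined" and all(w == UNDEF for w in inputs):
        return "waiting_defined", [UNDEF] * 2   # two outputs in this example
    return state, None              # no enabled transition; outputs unchanged

# A two-input, two-output example (AND, OR) driven by a scripted environment.
func = lambda ins: (ins[0] == (0, 1) and ins[1] == (0, 1),
                    ins[0] == (0, 1) or ins[1] == (0, 1))
state, trace = "waiting_defined", []
for ins in ([UNDEF, UNDEF], [encode(1), encode(0)], [UNDEF, UNDEF]):
    state, outs = cl_step(state, ins, func)
    trace.append(outs)
print(trace)    # [None, [(1,0), (0,1)], [(0,0), (0,0)]]  ->  AND=0, OR=1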

7.5 Validation

As with any asynchronous methodology, the design of qDI circuits depends on assumptions about correct operation by both the circuit and its environment. Those assumptions can be proven correct through a formal verification framework. However, in many cases this is an elusive goal, so we resort to on-line validation, namely continuous checking that all assumed properties are never violated. Validation may be limited to the design, simulation, and debugging phases, after which all validation sub-systems are removed from the design. Alternatively, all or part of the validation sub-system may be retained during synthesis, so that they continue to function through the lifetime of the hardware. This latter safety measure may be valuable especially for the interfaces between the circuit and the external world.

In the qDI FSM, we wish to validate the following properties: (a) The inputs and outputs must carry only legal dual-rail values {0,1,undefined}/{01,10,00}; (b) Changes of inputs and outputs must be monotonic (from all-undefined to all-defined, without some lines becoming defined and then returning to undefined, etc.); (c) The handshake protocols of Figs. 7-2 - 7-4 are adhered to.

The activity chart of the qDI FSM with validators is shown in Fig. 7-5. The validation statechart (Fig. 7-8) demonstrates hierarchy and concurrency — all enclosed statecharts, separated by the dotted lines, are active concurrently. They continuously monitor the relevant signals of the FSM. The four instances of the generic DR_STATUS (Fig. 7-9) perform two tasks: First, while in

super-state DR_NORMAL_OP, if both wires are ‘1,’ the chart exits immediately to the DR_MALFUNCTION terminal state and issues the ERROR event. Incidentally, this is one of the few cases wherein we do use Magnum events, since another statechart is continuously watching for that event (see below). ERROR event can be designed to ring some bells. Note that the transition into DR_MALFUNCTION exits from the external boundary of DR_NORMAL_OP, meaning that this transition takes place regardless of which internal state has been active at the time of error. The second task is to detect the direction of the transitions on the lines (‘up’ or ‘down’), for use by the monotonic validator.

Figure 7-8: Validation statechart. Figure 7-9: Dual-rail validation statechart.

Monotonicity is validated by the regular expression [Undef [up]+ Def [down]+ ]*, as described in Fig. 7-10. The MONOTONIC_CHECK statechart monitors the DR_STATUS statecharts, watching for non-monotonic transitions. For an N-bit dual rail line, two N-bit vectors are defined, VUP and VDOWN, consisting of the corresponding UP and DOWN flags that are set by the dual rail validators. When at least one of them marks its UP flag, the MONOTONIC_CHECK validator moves to the UP_GOING state. It stays there as long as not all DR_STATUS validators have set their UP flags. If any of the lines returns to an undefined value, its DOWN flag is set and the monotonic validator immediately escapes to the NON_MONOTONIC terminal state, and generates an ERROR event. Upon a normal transition to VALUE_DEFINED (after all N bits are set) the VUP vector is reset. A similar process takes place on the way down. Note that while input validation checks the environment, output validation checks our own system.
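The same legality and monotonicity checks can be expressed in ordinary code; the sketch below (our own naming, operating on a recorded trace rather than on live statechart events) flags the illegal code ‘11’ and any violation of the [Undef [up]+ Def [down]+]* pattern.

def legal(pair):
    return pair != (1, 1)           # '11' is never a valid dual-rail code

def defined(pair):
    return pair in ((0, 1), (1, 0))

def validate_trace(trace):
    """Check dual-rail legality and monotonic rise/fall over a trace of
    N-bit words; each word is a list of (wire0, wire1) pairs.
    Returns None if the trace is valid, or an error message."""
    phase = "undef"                 # expecting an all-undefined word
    for t, word in enumerate(trace):
        if any(not legal(p) for p in word):
            return f"illegal code 11 at sample {t}"
        n_def = sum(defined(p) for p in word)
        if phase in ("undef", "up"):
            if n_def == len(word):
                phase = "def"       # monotonic rise completed
            elif phase == "up" and n_def < prev_def:
                return f"non-monotonic rise at sample {t}"
            else:
                phase = "up" if n_def else "undef"
        else:                       # phase in ("def", "down")
            if n_def == 0:
                phase = "undef"     # monotonic fall completed
            elif phase == "down" and n_def > prev_def:
                return f"non-monotonic fall at sample {t}"
            else:
                phase = "down" if n_def < len(word) else "def"
        prev_def = n_def
    return None

# A 2-bit word rising monotonically, and one where a bit drops back mid-rise:
ok  = [[(0, 0), (0, 0)], [(0, 1), (0, 0)], [(0, 1), (1, 0)], [(0, 0), (0, 0)]]
bad = [[(0, 0), (0, 0)], [(0, 1), (0, 0)], [(0, 0), (0, 0)], [(0, 1), (1, 0)]]
print(validate_trace(ok), "|", validate_trace(bad))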

Figure 7-11: Protocol validation statechart.

Figure 7-10: Monotonic validation statechart.

The PROTOCOL_VALIDATION statechart (Fig. 7-11) checks adherence to the four phase protocol, of both the environment and the system. Any deviation is considered an error, which results in immediate escape to the terminal state ERROR_OCCURRED. Note how the chart contains all the intermediate transitional states, as well as the stable ones {1-4}. The protocol validator also watches for ERROR events generated elsewhere. Upon such an event, the validator jumps to the error state, and can alert the user. For clarity reasons, the statechart in Fig. 7-11 is somewhat simplified, and does not contain all the state transition details. A full description (with all the conditions for each transition) can be found in [KGS96].

While validation relates to dynamic checking of properties, and the checks are limited to the cases that are actually simulated, verification usually involves exhaustive checks or proofs that guarantee that certain properties are always true. Verification aspects of statecharts have been discussed by [Day92].

7.6 Simulation

The simulation facilities of Magnum include features such as graphic feedback in color, animation, links to external code, and multiple manners of affecting the simulation and observing the results, either interactively or in batch mode. However, the interesting question is how to perform asynchronous simulations with a synchronous simulator.

The most practical solution we have found is to simulate the design with an extremely fast clock (relative to the circuit delays). Even when all conditions are set for a transition to take place in a statechart, the (synchronous) simulator would not make the transition unless the clock has ticked. Having that clock toggle very fast results in transitions happening almost immediately after

they are enabled. Again, this approach relies heavily on a strict discipline assuring no hidden synchronous transitions in the design, and avoiding all functional dependencies on the clock.

Another issue is the generation and simulation of delays. For bounded delay and timed disciplines it is useful to simulate the design having delays varying within the (lower and upper) bounds set for them. For qDI disciplines, it is useful to simulate with arbitrary delays. Most important, the circuit should be simulated at many different ‘timing corners,’ namely different combinations wherein some parts of the circuit are very fast and other parts are very slow, in order to detect illegal behavior. The solution to all these requirements lies in on-line delays.

On-line delays are specified neither in the design nor in the simulator. Rather, they are supplied to the simulator, on demand in real-time, as the simulator executes. Every time a delay is needed, it is re-read from an external, concurrently executing procedure, the delay generator. The generator can be programmed to either use fixed delays, draw them from a table, draw them from a predefined statistical distribution, or execute special functions to create data-dependent delays. Each of those delay models is important for some aspects of asynchronous design.
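Such a delay generator is easy to picture in software; the sketch below (names and interface are ours) supports the fixed, table-driven, random, and data-dependent modes mentioned above.

import random

def make_delay_generator(mode, fixed=1.0, table=None, mean=1.0, spread=0.3,
                         data_rule=None):
    """Return a callable that supplies the next delay on demand, in one of the
    modes described above: a constant, values from a table, draws from a
    distribution, or a data-dependent rule."""
    table_iter = iter(table or [])
    def next_delay(data=None):
        if mode == "fixed":
            return fixed
        if mode == "table":
            return next(table_iter)
        if mode == "random":
            return max(0.0, random.gauss(mean, spread))
        if mode == "data":
            return data_rule(data)      # e.g. a longer delay for wider operands
        raise ValueError(mode)
    return next_delay

# Example: a data-dependent rule where the delay grows with the operand width.
gen = make_delay_generator("data", data_rule=lambda x: 0.5 + 0.1 * x.bit_length())
print(gen(5), gen(1023))                # 0.8 and 1.5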

7.7 Concluding Remarks

We have presented the applicability of a complete CAD system, originally developed for the design of synchronous circuits, to asynchronous logic design. Magnum is based on statecharts, a methodology well suited to the design of asynchronous architectures. Specific design rules have been developed for the specification (no dependence on clock transitions, use of conditions rather than events) and for simulating asynchronous circuits on the synchronous simulator (use an extremely fast clock, apply on-line delays). Other aspects of Magnum, such as static and dynamic verification, animation, and compilation, are discussed in [KGS96] and [iLo96]. Validation statecharts have been added to monitor the simulations, and they can also be synthesized into validating hardware. They provide an important complement to formal verification for large and complex systems.

Simulations were used for debugging while developing the models, and also for performance evaluation (using on-line delay generation). As explained in Chs. 2 and 3, a model of Kin was built based on statecharts and external C code. Statechart models of various asynchronous pipelines (cf. DLAP in Ch. 5) were also built to analyze their operation before implementing them at the transistor level. The asynchronous instruction length decoder of Ch. 4 was fully described by statecharts, providing a complete specification of its architecture and module interfaces.

Chapter 8 : Summary and Further Research

Analysis of future semiconductor technology (such as 1 billion transistors per chip and over 1GHz clocks planned for the year 2010) shows that it places severe constraints on the design of high performance microprocessors. In particular, the chip is too large and the clock is too fast for single clock synchronous operation. Rather, new forms of distributed architectures and asynchronous interconnects are called for. In this thesis it is argued that the technological constraints necessarily lead to asynchronous solutions.

This research describes the architecture of the asynchronous microprocessor Kin, which supports a novel, aggressively speculative Avid execution method (necessary for high speed and well suited to asynchronous processors). The development of Kin included addressing and solving problems at the architecture level, as well as developing architectural concepts and design methodologies for the required building blocks. The use of these building blocks is not restricted to the design of a microprocessor; they can be applied to other systems as well.

The thesis also discusses a number of associated issues: the fully asynchronous design of one module (an asynchronous instruction length decoder), the algorithmic conversion of synchronous pipelines into (doubly-latched) asynchronous ones (DLAP), mixed-timing globally asynchronous, locally synchronous (GALS) systems with adaptive synchronization, and a design methodology (based on statecharts) suitable for high-level asynchronous design.

Some future research directions can be identified. They relate either to the near-term evolutionary path from present-day CAD and architecture towards asynchronous systems, or to the longer-term issue of what lies beyond Kin.

Regarding the near term, substantial research must be devoted to the transitional methodologies described in the thesis:
• Automatic conversion of synchronous circuits into asynchronous ones should be further investigated.
• Adaptive synchronization for multi-synchronous systems may be studied further, together with suitable clocking methodologies.
• Synthesis tools must be adapted to produce and verify GALS systems.
• Seamless interfaces between the various low-level synchronous synthesis tools and high-level specification tools like Statecharts should be developed.

Further, the architectural aspects need additional study:
• The Kin architecture can be refined. For instance, alternative organizations of the reservation stations and instruction schedulers can be simulated and compared.
• More Avid execution simulations should be performed and analyzed. We have only simulated a constant Avid scheme, but Avid performs best when used as an adaptive mechanism, where the Avid depth is adjusted according to the quality of the prediction of each branch.
• We have presented the fully asynchronous design of an instruction length decoder for Kin. Other modules can be implemented according to various asynchronous methodologies.
• The DLAP design can be refined, and two-phase DLAP schemes may be researched.

On the longer term:
• The technological and architectural constraints that may be posed by technologies beyond the 2010 SIA prediction should be investigated.
• More aggressive deviations from contemporary architectures should be considered. In particular, both the instruction sets and the computing paradigms most suitable for asynchronous processing must be identified.

Last but not least, Kin could be implemented in silicon!

Appendix A : Clock Coordination for Synchronization of Multi-Synchronous Systems

This appendix describes the adaptive coordinated synchronization for effective communications among non-synchronized synchronous blocks, based on adapting the clock. When a collision between the incoming data and clock is detected, the phase of the clock is changed and adjusted to avoid the conflict. Unlike stretchable clocks which use local clock generators, the schemes presented here are designed for multi-synchronous systems operated by an external crystal clock, thus guaranteeing a nominal frequency. We propose three alternative clock coordination methods (clock-disable, clock-toggle, and clock-shift), and analyze their effect on system performance and their limitations.

A.1 Introduction

In clock coordination schemes, we take advantage of the non-uniform arrival time distribution (explained in Ch. 6) and adapt the clock of the receiver to switch when incoming data is stable. Incoming data and clock are processed by a clock/data coordinator prior to their entry into the synchronous module, as in Fig. A-1. The purpose of the coordination circuit is to avoid clock/data collision, by modifying the clock. Unlike symmetric arbitration, the coordination circuit gives higher priority to the data input and is designed to delay the clock in cases of conflict. A proper delay, matching the delay of the coordination circuit, is added on the data lines, to preserve the relative timing of Data and DataRdy.

We consider three alternative methods of coordination (Fig. A-2). The clock-disable method disables the local clock for a complete cycle. The clock-toggle method toggles the clock, delaying the rising edge by one half cycle. The clock-shift method adds a small delay to the clock, i.e., it shifts the phase of the local clock for all future cycles.

Figure A-1: A clock/data coordination circuit for each synchronous module.

Figure A-2: Alternative coordination methods.

A.2 Clock Coordination Circuit

For all three methods, the coordination circuit comprises three components (Fig. A-3): The Conflict Detection (CD) block detects when data and clock edges have occurred simultaneously. The Clock Phase Delay (CPD) block delays the clock by a selectable amount of time; its actual function varies from one method to another (full cycle, half cycle, or a variable delay). The Clock Regeneration (CR) block recovers the duty cycle of the input clock. Upon an edge conflict, CD generates hold and SelectNext, to keep CoordClock in its low phase and to change the delay, respectively.

Figure A-3: Coordination circuit block diagram.

Figure A-4: Clock/data edge Conflict Detection.

Inertial delay line d1 matches the delay of CD (as explained below). Thus, upon edge conflict, CR is disabled sufficiently early to block the faulty clock transition. Delay d2 matches d1+delay(CR), to preserve the relative timing of the clock and data.

When SelectNext is asserted, the CPD clk output is first reset (also signaling CD to reset and release hold), the clock ‘off’ phase is stretched as much as necessary, and clk is set again to signal the beginning of the next valid ‘on’ phase.

In this discussion we assume that DataRdy rises to ‘1’ after the new data are available and ready to be sampled. A simplified CD implementation is proposed in Fig. A-4. When both inputs to CD rise simultaneously, a pulse is generated at x, setting the D-FF (used as an SR latch with an edge-triggered ‘set’ input) and generating hold and SelectNext. The latch is cleared when the clk input turns low. Possible metastability of the latch is discussed in Sect. A.4.1 below.

In addition to matching the CD delay, the positive inertial delay d1 (implemented by the circuit in Fig. A-5) also filters short positive clk pulses. Such pulses are created on clock/data conflicts, when clk is reset shortly after it has risen (Figs. A-10, A-12). Delay d1 is asymmetric [Sei80]: the positive delay is longer (by Δ) than the negative delay. To preserve the duty cycle of the input clock, a matching negative inertial delay is appended to CR (as described in Fig. A-6).

Figure A-5: Positive inertial delay d1.

Figure A-6: Clock Regeneration with a negative inertial delay.

The content of the CPD block depends on the coordination method implemented (see Figs. A-7, A-9, A-11). In the clock-disable case CPD is a wire. For clock-toggle it selects either ClockIn or its inverse. For the clock-shift method, a multiplicity of delayed clock signals is generated by a tapped delay line, and the proper phase is selected by a sequential selector, which switches inputs on every rising edge of its control input. Typical waveforms of the three coordination circuits are shown in Figs. A-8, A-10, A-12, wherein conflicts are marked by dotted lines.
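The effect of the three CPD variants on the conflicting clock edge can be summarized by the small C fragment below; it is only illustrative (the function and type names are invented), with C denoting the clock cycle and delta the shift resolution delay of the tapped delay line.

enum coord_method { CLOCK_DISABLE, CLOCK_TOGGLE, CLOCK_SHIFT };

/* Returns by how much the conflicting rising edge is displaced.  For
 * CLOCK_SHIFT the same displacement is retained for all future edges,
 * whereas CLOCK_DISABLE and CLOCK_TOGGLE affect only the current edge.   */
double edge_displacement(enum coord_method m, double C, double delta)
{
    switch (m) {
    case CLOCK_DISABLE: return C;        /* skip a whole cycle             */
    case CLOCK_TOGGLE:  return C / 2.0;  /* delay the rising edge by C/2   */
    case CLOCK_SHIFT:   return delta;    /* small phase shift, kept later  */
    }
    return 0.0;
}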

Note that, for very high clock frequencies, the total latency of the coordination circuit (d2) may approach, or even exceed, a whole clock cycle. Although this resembles the latency incurred by a synchronizer, coordination is still preferable. Since sender and receiver clocks are correlated in multi-synchronous systems, clock/data conflicts are highly likely to recur (and cause recurring metastability situations) if the relative clock phase remains fixed. Coordination circuits are designed to adapt the local clocks and thus counter this problem.

Figure A-7: Clock Disable.

Figure A-8: Waveforms of the Clock-Disable method (signals: ClockIn = clk, DataRdy, hold, CoordClock, CoordDataRdy).

Figure A-9: Clock Toggle.

Figure A-10: Waveforms of the Clock-Toggle method (signals: ClockIn, clk, DataRdy, hold, SelectNext, CoordClock, CoordDataRdy).

Figure A-11: Clock Shift.

Figure A-12: Waveforms of the Clock-Shift method (signals: ClockIn, ClockPhase1, ClockPhase2, ClockPhase3, clk, DataRdy, hold, SelectNext, CoordClock, CoordDataRdy).

A.3 Performance Analysis of Clock Coordination

The Clock-Disable method is the simplest one to implement, but its obvious limitation is that it may cause a substantial performance degradation, and the module is doomed to suffer from the same problem repeatedly. When a clock/data edge conflict is detected, the local clock is disabled for one cycle. We assume that the data arrival time is uniformly distributed over the clock cycle. Although we have claimed above that this distribution is non-uniform, it is valid to assume uniformity when the expected relative arrival time is not yet known, or when it has changed, e.g., due to temperature or supply voltage drifts. Thus, the probability of conflict is

$p = \frac{t_{su} + t_h}{C}$     (A-1)

where $t_{su}$ and $t_h$ are the setup and hold times, respectively, and C is the clock cycle. If data arrive every R cycles, then a computation that takes R cycles when there are no conflicts needs an extra cycle with probability p due to conflicts. Thus, the slow-down $S_R$ is

$S_R = \frac{R + p}{R} = 1 + \frac{p}{R}$     (A-2)

If the incoming data arrive from N different sources, and all the bits arriving over the same bus are synchronized, then the probability of conflict is

$p_N = 1 - (1 - p)^N$     (A-3)

The slow-down in such a case is

$S_N = \frac{R + p_N}{R} = 1 + \frac{1 - (1 - p)^N}{R}$     (A-4)

According to the Clock-Toggle method, the clock is delayed by one half cycle upon a conflict. Hence:

$S_R = 1 + \frac{p}{2R}$     (A-5)

$S_N = 1 + \frac{1 - (1 - p)^N}{2R}$     (A-6)

For the Clock-Shift method, we estimate the shift resolution to be 3 gate delays. If $\delta$ = (shift resolution delay)/C, then

$S_R = 1 + \frac{\delta\, p}{R}$     (A-7)

$S_N = 1 + \frac{\delta \left(1 - (1 - p)^N\right)}{R}$     (A-8)

Figure A-13 shows the slow-down ($S_N$) of the three coordination methods relative to R (assuming p=0.05, N=10, and, for C of about 15 gate delays, $\delta$=0.2). While $S_N$(Clock-Disable) is only about 1.04 for large R, it grows to 1.2-1.4 for the higher bandwidth case (R=2 or R=1, respectively). The slow-down in the case of clock-toggle is only 1.2 for R=1, and diminishes to 1.08 in the clock-shift case. The clock-shift method causes the least performance degradation: only 8% slow-down for high rate data transfer (i.e., every cycle), and as low as 0.8% for low rate. This performance analysis underestimates the performance of the clock adjusting approach, since after the clock phase is adjusted, it is likely to guarantee the successful synchronization of subsequent data transmissions. After adaptation, the arrival-time distribution is no longer uniform and hence p is substantially smaller.

Figure A-13: Clock coordination slow-down.
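As a quick numeric check of the figures quoted above, the following C program evaluates the slow-down formulas (A-3), (A-4), (A-6) and (A-8) as reconstructed in this appendix, with p = 0.05, N = 10 and delta = 0.2; it is provided only for illustration and is not part of the thesis tools.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double p = 0.05, delta = 0.2;
    int    N = 10;
    double pN = 1.0 - pow(1.0 - p, N);            /* (A-3): about 0.40    */

    for (int R = 1; R <= 10; R++) {
        double s_disable = 1.0 + pN / R;           /* (A-4)               */
        double s_toggle  = 1.0 + pN / (2.0 * R);   /* (A-6)               */
        double s_shift   = 1.0 + delta * pN / R;   /* (A-8)               */
        printf("R=%2d  disable=%.3f  toggle=%.3f  shift=%.3f\n",
               R, s_disable, s_toggle, s_shift);
    }
    /* R=1 gives about 1.40, 1.20 and 1.08; R=10 gives about 1.04, 1.02
     * and 1.008, matching the numbers cited for Fig. A-13.               */
    return 0;
}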

A.4 Limits of Clock Coordinated Synchronization

Clock coordination suffers from three difficulties, namely metastability, oscillation, and lack of convergence. In this section we explain the problems and show how some of them can be avoided in practice.

A.4.1 Metastability

When a clock/data conflict is detected, a pulse is generated in CD (x, Fig. A-4) to set the latch, disable the clock, and shift the phase. The width T of the pulse depends on the time difference Δt between the rising edges of clk and DataRdy (Fig. A-14). Maximal T is achieved when clk and DataRdy rise simultaneously; the longer |Δt| is, the shorter the pulse. For certain high values of Δt (in the J regions of Fig. A-14(b)), T is dangerously short and the pulse might cause the latch to either enter a metastable state, or to take an abnormally long time to settle [Mar81, Sto82]. Note that CD is designed such that if Δt ∈ J, data and clock are safely separated, non-conflict operation is desired, and CD should not declare a conflict. Long propagation delays do not affect the circuit. If the latch enters a metastable state but does not exit it before clk goes low, the latch will be forced out of metastability by clk (Fig. A-4). However, if the latch exits metastability while clk is still high, and (erroneously) declares a conflict, the clock is aborted in mid-cycle and may cause the module to fail. Normally, the CD has enough time to resolve the metastable state, since DataRdy will not change until an acknowledge is returned. This will most likely take more than one cycle, depending on the handshake protocol and the processing time.

Figure A-14: Conflict detection pulse width T as a function of clk/DataRdy concurrency Δt.

The adaptive synchronization brings the system, after a convergence period, to a stationary state. As explained above, the final state is stationary because the delays and phase differences among the modules in the system are functions of the implementation, of physical parameters, and of temperature and supply voltages, and they typically change very slowly during the operation. After adjusting the clock phase in all modules to avoid conflicts, there is no need to make any more changes and the system is expected to operate without synchronization problems for a long time. Thus, adaptive synchronization is best used during special training sessions.

A training session is dedicated to adaptation of clock delays, and does not perform real computations. All the modules intercommunicate heavily in order to entice the coordination circuits to shift phases as needed. Synchronization failures during training sessions are ignored (no real data are lost). After a training session, clock/data conflicts are less likely. The system should be trained upon reset and periodically (but infrequently) during operation. Training is expected to affect system utilization by less than 1% (as explained in Ch. 6). Each module should send messages on all its output channels during training, so that all potential conflicts are exercised.

A.4.2 Oscillation

When many modules communicate by clock-shift coordinated synchronization, an oscillation may develop. Each module shifts its own clock phase every time a clock/data conflict is detected, and subsequently sends its data to other modules using the newly shifted clock. This change of clock phase might cause another module to develop a conflict, and its own local clock is consequently shifted. The propagating clock shifts may settle down after a few adjustments, or they may cycle and affect the first module again. Thus, the clock phases may be updated and shifted forever. In the following we show that oscillations of the adaptation process may be predicted and avoided at design time.

The system can be described as a graph, wherein modules are modeled by vertices, and communication channels are modeled by directed edges. The danger of oscillating clock shifts can only occur due to cycles in the graph, and only in a very special case. In Fig. A-15, a cycle of three communicating modules is shown.

Figure A-15: Communicating modules cycle.

Let dA be the delay in module A from its local (coordinated) clock edge till the data are ready at its output. Let dAB be the delay along the communication channel from module A to module B. When module A sends data to module B, they arrive at module B after dA+dAB time (relative to some transition of the coordinated clock at module A). If a clock/data conflict occurs now in module B, it will update its local clock phase by the shift resolution delay δ, and will send data to module C using this new clock. The data will arrive at module C (dB+dBC) after the new clock edge of module B. Those data may cause module C to shift its local clock by δ. The data sent from module C to module A will take (dC+dCA) to arrive. The total time required for data to be transferred from module A, to B, to C, and back to A (including the clock shifts at B and C) is (dA+dAB+δ+dB+dBC+δ+dC+dCA). To cause a conflict and a clock shift in module A (i.e., an oscillation), this sum of delays along the cycle has to equal an integral multiple of the clock cycle (i.e., mC). In the general case, when there are k modules along the cycle, the danger of oscillation exists when:

$\sum_{(i,j) \in \mathrm{cycle}} (d_i + d_{ij}) + (k-1)\,\delta = m\,C$, for some integer m,     (A-9)

where module j follows module i along the directed k-cycle.

To avoid the oscillation problem, the graph model of the system architecture should be searched (at design time) for cycles with this property. A small delay is added to each such cycle to avoid the situation. This approach is limited, however, by the accuracy of delay modeling. Alternatively, the coordination circuits may be redesigned to add varying delays to the clocks, so as to break such cycles.
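A possible design-time check of condition (A-9) for a single candidate cycle is sketched below in C. The helper name, the tolerance parameter and the use of (k-1) clock shifts follow the three-module example above; they are assumptions of this sketch rather than a prescribed tool, and enumerating the cycles of the graph is left to the caller.

#include <math.h>
#include <stdbool.h>

/* d_mod[i]  : delay d_i inside module i of the cycle (clock edge to data out)
 * d_chan[i] : delay d_ij on the channel from module i to the next module
 * k         : number of modules along the cycle
 * delta     : shift resolution delay;  C : clock cycle;  tol : delay margin */
bool cycle_may_oscillate(const double *d_mod, const double *d_chan,
                         int k, double delta, double C, double tol)
{
    double sum = (k - 1) * delta;            /* clock shifts along the cycle */
    for (int i = 0; i < k; i++)
        sum += d_mod[i] + d_chan[i];

    double m = sum / C;                      /* cycles spanned by the loop   */
    return fabs(m - floor(m + 0.5)) * C < tol;   /* close to an integer mC?  */
}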

The same oscillation problem is also expected to occur when stoppable clocks are applied. Since each module changes its own clock phase, it might cause others to adjust accordingly and vice versa. Data adaptive synchronization (Sect. 6.2) breaks the dependency cycle and avoids oscillation problems.

A.4.3 No Convergence

When a module is interconnected to many other modules, and each of the other modules has a different relative clock phase shift, it is possible that they will not be able to communicate without failure, since the local clock will be shifted constantly. This might happen when there is no common phase they all agree on, and the combined distribution of data arrival times on the many inputs is uniform (Fig. 6-1). Stretchable or pausable clock solutions cannot be used in this case, since the clock would be kept in its ‘off’ phase for arbitrarily long periods of time. This can be avoided by adaptively coordinating delays on the data lines (as described in Sect. 6.2) rather than changing the local clock.

A.5 Concluding Remarks

Alternative methods and circuits for clock coordination have been described and their performance analyzed. An external common clock is used and, based on the stationary nature of clock and data delays, a proper phase of the clock is dynamically selected to avoid synchronization failures. Metastability, oscillation, and convergence problems have been discussed. As with data coordination, the clock adaptation can be limited to special training sessions.

References

[AML97] S. S. Appleton, S. V. Morton, and M. J. Liebelt, “Two-Phase Asynchronous Pipeline Control,” 3rd Int. Symp. on Advanced Research in Asynchronous Circuits and Systems (Async’97), pp. 12-21, Apr. 1997.

[Async] The Asynchronous Logic Home Page, at: http://www.cs.man.ac.uk/amulet/async/index.html

[BG92] D. Bertsekas and R. Gallager, Data Networks, 2nd ed., Prentice Hall, 1992.

[BGK+97] P. Beerel, R. Ginosar, R. Kol, C. Myers, S. Rotem, K. Stevens, and K. Yun, “Design of a High-Speed Asynchronous Instruction Length Decoder,” paper in preparation, 1997.

[Bow95] W. J. Bowhill, et al., “Circuit Implementation of a 300-MHz 64-bit Second-generation CMOS Alpha CPU,” Digital Technical Journal, 7(1), pp. 100-115, 1995.

[Bru93] E. Brunvand, “The NSR Processor,” Proc. of the 26th Annual Hawaii Int. Conf. on System Sciences, Vol. 1, pp. 428-435, 1993.

[CDS93] W. S. Coates, A. L. Davis, and K. S. Stevens, “The Post Office Experience: Designing a Large Asynchronous Chip,” Integration - the VLSI Journal, 15(3), pp. 341-366, Oct. 1993.

[Cha83] T. J. Chaney, “Measured Flip-Flop Responses to Marginal Triggering,” IEEE Trans. on Computers, 32(12), pp. 1207-1209, Dec. 1983.

[Cha84] D. M. Chapiro, Globally-Asynchronous Locally-Synchronous Systems, PhD thesis, Dept. of Computer Science, Stanford Univ., 1984.

[Cha87] D. M. Chapiro, “Reliable High-Speed Arbitration and Synchronization,” IEEE Trans. on Computers, 36(10), pp. 1251-1255, Oct. 1987.

[Chr96] D. Christie, “Developing the AMD-K5 Architecture,” IEEE Micro, pp. 16-26, Apr. 1996.

[Chu87] T.-A. Chu, Synthesis of Self-Timed VLSI Circuits from Graph-Theoretic Specifications, PhD thesis, MIT Laboratory for Computer Science, Jun. 1987.

[CKK+96] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno and A. Yakovlev, “Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers,” Technical Report, Departament d'Arquitectura de Computadors, Universitat Politecnica de Catalunya, Apr. 1996 (See also: http://www.ac.upc.es/~vlsi/petrify/petrify.html).

[CM73] T. J. Chaney and C. E. Molnar, “Anomalous Behavior of Synchronizer and Arbiter Circuits,” IEEE Trans. on Computers, 22(4), pp. 421-422, Apr. 1973.

[Cra92] H. G. Cragon, Branch Strategy Taxonomy and Performance Models, IEEE Computer Society Press, 1992

[CW75] G. R. Couranz and D. F. Wann, “Theoretical and Experimental Behavior of Synchronizers Operating in the Metastable Region,” IEEE Trans. on Computers, 24(6), pp. 604-616, Jun. 1975.

[Day92] N. Day, “A Comparison between Statecharts and State Transition Assertion,” Proc. Int. Workshop on Higher Order Logic Theorem Proving and its Application - HOL'92, Sep. 1992.

[Dea92] M. E. Dean, STRiP: A Self-Timed RISC Processor, PhD thesis, Stanford Univ., 1992.

[DGY92a] I. David, R. Ginosar, M. Yoeli, “An Efficient Implementation of Boolean Functions as Self-Timed Circuits,” IEEE Trans. on Computers, 41(1), pp. 2-11, Jan. 1992.

[DGY92b] I. David, R. Ginosar, M. Yoeli, “Implementing Sequential Machines as Self-Timed Circuits,” IEEE Trans. on Computers, 41(1), pp. 12-17, Jan. 1992.

[DGY93] I. David, R. Ginosar, and M. Yoeli, “Self-Timed Architecture of a Reduced Instruction Set Computer,” in Asynchronous Design Methodologies, S. Furber and M. Edwards editors, IFIP Transactions Vol. A-28, Elsevier Science Publishers, pp. 29-43, 1993.

[DGY95] I. David, R. Ginosar, M. Yoeli, “Self-Timed is Self-Checking,” Journal on Electronic Testing: Theory and Applications, 6, pp. 219-228, Apr. 1995.

[DW95] P. Day and J. V. Woods, “Investigation into Micropipeline Latch Design Styles,” IEEE Trans. on VLSI Systems, 3(2), pp. 264-272, Jun. 1995.

[End95] P. B. Endecott, SCALP: A Superscalar Asynchronous Low-Power Processor, PhD thesis, Dept. of Computer Science, Univ. of Manchester, 1995.

[FD96] S. B. Furber and P. Day, “Four-Phase Micropipeline Latch Control Circuits,” IEEE Trans. on VLSI Systems, 4(2), pp. 247-253, Jun. 1996.

[FDG+93] S. B. Furber, P. Day, J. D. Garside, N. C. Paver, and J. V. Woods, “A micropipelined ARM,” VLSI’93, 1993.

[FL96] S. B. Furber and J. Liu, “Dynamic Logic in Four-Phase Micropipelines,” 2nd Int. Symp. on Advanced Research in Asynchronous Circuits and Systems (Async’96), pp. 11-16, Mar. 1996.

[Flo74] I. Flores, “Lookahead Control in the IBM System 370 Model 165,” IEEE Computer, 7(11), pp. 24-38, Nov. 1974.

[Fri95] E. G. Friedman, editor, Clock Distribution Networks in VLSI Circuits and Systems, IEEE Press, 1995.

[GM90] R. Ginosar and N. Michell, “On the Potential of Asynchronous Pipelined Processors,” ACM SIGARCH Computer Architecture News, 18(4), pp. 27-34, Dec. 1990.

[GM97a] F. Gabbay and A. Mendelson, “An Experimental and Analytical Study of Speculative Execution Based on Value Prediction,” Technical Report EE PUB#1098, Department of Electrical Engineering, Technion, Israel, Jul. 1997.

[GM97b] F. Gabbay and A. Mendelson, “Can Program Profiling Support Value Prediction?,” To be published in the 30th Annual Int. Symp. on Microarchitecture (MICRO-30), Dec. 1997.

[Gre95] M. R. Greenstreet, “Implementing a STARI chip,” ICCD’95, pp. 38-43, 1995.

[Har87] D. Harel, “Statecharts: A Visual Formalism for Complex Systems,” Science of Computer Programming, 8(3), pp. 231-274, 1987.

[Hau95] S. Hauck, “Asynchronous Design Methodologies: An Overview,” Proc. IEEE, 83(1), pp. 69-93, Jan. 1995.

[HP96a] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 2nd ed., Morgan Kaufmann, 1996.

[HP96b] D. Harel and M. Politi, Modeling Reactive Systems with Statecharts: The Statemate Approach, i-Logix Inc., 1996.

[HPS+87] D. Harel, A. Pnueli, J. P. Schmidt, and R. Sherman, “On the Formal Semantics of Statecharts,” Proc. 2nd IEEE Symp. on Logic in Computer Science, pp. 54-64, 1987.

[Hor93] M. Horten, “The Hot New Star of Microchips (Pentium Microprocessor),” New Scientist, 138(1871), pp. 31-34, May 1993.

[iLo96] i-Logix, Inc., Statemate MAGNUM documentation, 1996 (See also: http://www.ilogix.com).

[Int94] Intel Corporation, “Pentium Family User’s Manual,” Intel, 1994.

[Joh91] M. Johnson, Superscalar Microprocessor Design, Prentice Hall, 1991.

[JRS96] E. Jacobsen, E. Rotenberg, and J. E. Smith, “Assigning Confidence to Conditional Branch Prediction,” Proc. of the 29th Int. Symp. on Microarchitecture (Micro-29), pp. 142-152, Dec. 1996.

[JS89] J. M. Jaffe and M. Sidi, “Distributed Deadlock Resolution in Store-and-Forward Networks,” Algorithmica, 4, pp. 417-436, 1989.

[Keh93] T. Kehl, “Hardware Self-Tuning and Circuit Performance Monitoring,” ICCD’93, pp. 188-192, 1993.

[KG97] R. Kol and R. Ginosar, “Future Processors will be Asynchronous (sub-title: Kin: A High Performance Asynchronous Processor Architecture),” Technical Report CC PUB#202 (EE PUB#1099), Department of Electrical Engineering, Technion, Israel, Jul. 1997.

[KGS96] R. Kol, R. Ginosar, and G. Samuel, “Statechart Methodology for the Design, Validation, and Synthesis of Large Scale Asynchronous Systems,” 2nd Int. Symp. on Advanced Research in Asynchronous Circuits and Systems (Async’96), pp. 164-174, Mar. 1996.

[KGS97] R. Kol, R. Ginosar, and G. Samuel, “Statechart Methodology for the Design, Validation, and Synthesis of Large Scale Asynchronous Systems,” IEICE Trans. on Information and Systems, E80-D(3), pp. 308-314, Mar. 1997.

[LS84] J. K. F. Lee and A. J. Smith, “Branch Prediction Strategies and Branch Target Buffer Design,” IEEE Computer, 17(1), pp. 6-22, Jan. 1984.

[LS96] M. H. Lipasti and J. P. Shen, “Exceeding the Dataflow Limit via Value Prediction,” Proc. of the 29th Annual ACM/IEEE Int. Symp. on Microarchitecture, Dec. 1996.

[LWS96] M. H. Lipasti, C. B. Wilkerson, and J. P. Shen, “Value Locality and Load Value Prediction,” Proc. of the 7th Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS- VII), Oct. 1996.

[Mag80] N. F. Magid, High Speed Computer Systems as a Result of Concurrent Execution of Sequential Instructions, PhD thesis, Dept. of Electrical Engineering, Illinois Institute of Technology, Chicago, Illinois, 1980.

[Mar81] L. R. Marino, “General Theory of Metastable Operation,” IEEE Trans. on Computers, 30(2), pp. 107-115, Feb. 1981.

[Mar85] A. J. Martin, “A Delay-Insensitive Fair Arbiter,” Technical Report 5193:TR:85, Computer Science Dept., Caltech, 1985.

[Mar90] A. J. Martin, “Synthesis of Asynchronous VLSI Circuits,” in Formal Methods in VLSI Design, ed. J. Straunstrup, pp. 237-283, North-Holland, Amsterdam, 1990.

[MB76] R. M. Metcalfe and D. R. Boggs, “Ethernet: Distributed packet switching for local computer networks,” Comm. ACM., 19, pp. 395-404, Jul. 1976.

[MBL+89] A. J. Martin, S. M. Burns, T. K. Lee, D. Borkovic, and P. J. Hazewindus, “The Design of an Asynchronous Microprocessor,” Technical Report, Caltech-CS-TR-89-02, 1989.

[MBM89] T. H.-Y. Meng, R. W. Brodersen, and D. G. Messerschmitt, “Automatic Synthesis of Asynchronous Circuits from High-Level Specification,” IEEE Trans. on Computer-Aided Design, 8(11), pp. 1185-1205, Nov. 1989.

[MBM91] T. H.-Y. Meng, R. W. Brodersen, and D. G. Messerschmitt, “Asynchronous Design for Programmable Processors,” IEEE Trans. on Signal Processing, 39(4), pp. 939-952, Apr. 1991.

[MTM81] N. Magid, G. Tjaden, and H. Messinger, “Exploitation of Concurrency by Virtual Elimination of Branch Instructions,” Int. Conf. on Parallel Processing (ICPP), pp. 164-165, Aug. 1981.

[Mye95] C. J. Myers, Computer-Aided Synthesis and Verification of Gate-Level Timed Circuits, PhD thesis, Dept. of Elec. Eng., Stanford University, Oct. 1995.

[NNS+94] L. S. Nielsen, C. Niessen, J. Sparso, and C. H. van Berkel, “Low-Power Operation Using Self-Timed Circuits and Adaptive Scaling of the Supply Voltage,” IEEE Trans. on VLSI Systems, 2(4), pp. 391-397, Dec. 1994.

[NUK+94] T. Nanya, Y. Ueno, H. Kagotani, M. Kuwako, and A. Takamura, “TITAC: Design of a Quasi-Delay-Insensitive Microprocessor,” IEEE Design & Test of Computers, 11(2), pp. 50-63, Summer 1994.

[Pap91] A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed., McGraw-Hill, 1991.

[Pav94] N. C. Paver, The Design and Implementation of an Asynchronous Microprocessor, PhD thesis, Dept. of Computer Science, Univ. of Manchester, 1994.

[Pec76] M. Pechoucek, “Anomalous Response Times of Input Synchronizers,” IEEE Trans. on Computers, 25(2), pp. 133-139, Feb. 1976.

[Per94] D. L. Perry, VHDL, 2nd ed., Ch. 9,10: Synthesis, McGraw-Hill, 1994.

[PN95] G. A. Pratt and J. Nguyen, “Distributed Synchronous Clocking,” IEEE Trans. on Parallel and Distributed Systems, 6(3), pp. 314-328, Mar. 1995.

[Ray86] M. Raynal, Algorithms for Mutual Exclusion, MIT Press, 1986.

[RB96] W. F. Richardson and E. Brunvand, “Fred: An Architecture for a Self-Timed Decoupled Computer,” 2nd Int. Symp. on Advanced Research in Asynchronous Circuits and Systems (Async’96), pp. 60-68, Mar. 1996.

[RC82] F. U. Rosenberger and T. J. Chaney, “Flip-Flop Resolving Time Test Circuit,” IEEE J. of Solid-State Circuits, SC-17(4), pp. 731-738, Aug. 1982.

[RMC+88] F. U. Rosenberger, C. E. Molnar, T. J. Chaney, and T.-P. Fang, “Q-modules: Internally clocked delay-insensitive modules,” IEEE Trans. on Computers, 37(9), pp. 1005-1018, Sep. 1988.

[SCI92] IEEE std 1596-1992, IEEE standard for Scalable Coherent Interface (SCI), 1992.

[Sei80] C. L. Seitz, System timing, in C. A. Mead and L. A. Conway, Introduction to VLSI Systems, Ch. 7, Addison-Wesley, 1980.

[Sei94] J. N. Seizovic, “Pipeline Synchronization,” Proc. Int. Symp. on Advanced Research in Asynchronous Circuits and Systems, pp. 87-96, 1994.

[Sha97] H. Shafi, Avid Execution and Instruction Pruning in the Asynchronous Processor Kin, MSc thesis, Dept. of Electrical Eng., Technion, Israel, in preparation, 1997.

[SIA94] SIA, The National Technology Roadmap for Semiconductors, 1994 (See also: http://www.sematech.org/public/roadmap/index.htm).

[Smi81] J. E. Smith, “A Study of Branch Prediction Strategies,” Proc. Eighth Symp. on Computer Architecture, pp. 135-148, May 1981.

[SS93] J. Sparso and J. Staunstrup, “Delay-insensitive multi-ring structures,” INTEGRATION, the VLSI journal, 15(3), pp. 313-340, 1993.

[SSM94] R. F. Sproull, I. E. Sutherland, and C. E. Molnar, “The Counterflow Pipeline Processor Architecture,” IEEE Design & Test of Computers, 11(3), pp. 48-59, Fall 1994.

[Ste94] K. S. Stevens, Practical Verification and Synthesis of Low Latency Asynchronous Systems, PhD thesis, Univ. of Calgary, Calgary, Alberta, Sep. 1994.

[Sto82] P. A. Stoll, “How to Avoid Synchronization Problems,” VLSI Design, pp. 56-59, Nov./Dec. 1982.

[Str94] D. Strassberg, “Cooling Hot Microprocessors,” EDN (European Edition), 39(2), pp. 40-44,46,48, Jan. 1994.

[Sut89] I. E. Sutherland, “Micropipelines,” Comm. ACM, 32(6), pp. 720-738, Jun. 1989.

[Tom67] R. M. Tomasulo, “An Efficient Algorithm for Exploiting Multiple Arithmetic Units,” IBM Journal of Research and Development, 11(1), pp. 25-33, Jan. 1967.

[US95] A. K. Uht and V. Sindagi, “Disjoint Eager Execution: An Optimal Form of Speculative Execution,” Proc. of the 28th Int. Symp. on Microarchitecture (Micro-28), pp. 313-325, Nov. 1995.

[vBB+94] K. van Berkel, R. Burgess, J. Kessels, A. Peeters, M. Roncken, and F. Schalij, “Asynchronous Circuits for Low Power: a DCC Error Corrector,” IEEE Design & Test, 11(2), pp. 22-32, Jun. 1994.

[vBKR+91] K. van Berkel, J. Kessels, M. Roncken, R. Saeijs, and F. Schalij, “The VLSI programming language Tangram and its translation into handshake circuits,” EDAC, pp. 384-389, 1991.

[Vee80] H. J. M. Veendrick, “The Behavior of Flip-Flops Used as Synchronizers and Prediction of Their Failure Rate,” IEEE J. of Solid-State Circuits, 15(2), pp. 169-176, Apr. 1980.

[Ver88] T. Verhoeff, “Delay-Insensitive Codes - an Overview,” Distributed Computing, 3(1), pp. 1-8, 1988.

[Wei96] U. Weiser, “Future Directions in Microprocessor Design,” Invited lecture, presented at 2nd Int. Symp. on Advanced Research in Asynchronous Circuits and Systems (Async’96), Mar. 1996.

[WH90] S. A. Ward and R. H. Halstead, Jr., Computation Structures, McGraw-Hill, 1990.

[Wol92] T. L. Wolf, The A3000: An Asynchronous Version of the R3000, MSc thesis, Dept. of Computer Science, Univ. of Utah, 1992.

[YBA96] K. Y. Yun, P. A. Beerel, and J. Arceo, “High-Performance Asynchronous Pipeline Circuits,” 2nd Int. Symp. on Advanced Research in Asynchronous Circuits and Systems (Async’96), pp. 17-28, Mar. 1996.

[YD96] K. Y. Yun and R. P. Donohue, “Pausible Clocking: A First Step Toward Heterogeneous Systems,” ICCD’96, pp. 118-123, 1996.

[YP92] T.-Y. Yeh and Y. N. Patt, “Alternative Implementations of Two-Level Adaptive Branch Prediction,” The 19th Int. Symp. on Computer Architecture (ISCA), ACM SIGARCH Computer Architecture News, 20(2), pp. 124-134, May 1992.

[Yu96] A. Yu, “The Future of Microprocessors,” IEEE Micro, 16(6), pp. 46-53, Dec. 1996.

[Yun96] K. Yun, “Automatic synthesis of extended burst-mode circuits using generalized C-elements,” Proc. of the 1996 European Design Automation Conference, pp. 290-295, Sep. 1996.

Extended Abstract

Future semiconductor technology (e.g., one billion transistors on a chip and clocks faster than 1 GHz, as planned for the year 2010) imposes severe new constraints on the design of high performance microprocessors. In particular, the chip is too large and the clock is too fast for synchronous operation with a single clock. Suitable new distributed architectures and asynchronous interconnects must therefore be developed.

Clock distribution problems (such as non-simultaneous arrival (skew) and high power dissipation) and data propagation delays preclude the design of a synchronous processor operated by a single common clock. Today, over 40% of the power consumed by a processor is dissipated by the clock generation and distribution circuits. As technology advances, data propagation times inside the processor become long relative to the clock cycle. Consequently, data sent from one unit arrive at different destination units in different clock cycles. The effects of the physical implementation on the operation of the processor can no longer be separated from the architecture level. Asynchronous systems do not suffer from clock distribution problems, and they can provide higher performance by exploiting delays that vary with the computation and the data; they are not limited to worst-case delays, as synchronous systems are. Self-timed systems generate signals that indicate the completion of their operation.

This research describes the architecture of the asynchronous processor Kin (Kin is the name of the Mayan god of time), which supports aggressive Avid speculative execution and out-of-order execution of instructions. The architecture of Kin includes unique architectural features intended to achieve high performance by exploiting the generous hardware resources that future technology will make available. The conclusion of this research is that the technological constraints necessarily lead to asynchronous solutions. The design and development of Kin included identifying and solving problems at the architecture level, and formulating design methodologies and building blocks for the construction of an asynchronous computer. These solutions are required for the design of Kin, and they are also suitable for other systems that operate in a similar manner. Some of the solutions developed in this work are applicable even in current technology.

Advanced processors today employ sophisticated methods for predicting the outcome of branch instructions, in order to reduce their effect on processor performance. Processors with high internal parallelism (multiple execution units and deep pipelines) cannot utilize their resources efficiently because of the limited instruction-level parallelism of common programs. The penalty of stalling the processor on every branch misprediction reduces performance. As part of this research, the Avid Execution method for handling branch instructions has been developed. This method is more effective than the single-path approach in common use today, and it makes better use of idle processor resources. It turns out that, in order to obtain an optimal performance improvement in a processor implementing Avid execution, the processor should be built with an asynchronous architecture capable of adapting to varying computational loads.

Avid execution is presented and analyzed as a branch prediction strategy for Kin. The misprediction penalty is reduced by efficiently prefetching and speculatively executing several alternative paths of the program. According to Avid execution, the predicted path is fetched and executed, as in the single-path approach. In addition, certain parts of the paths that are predicted not to be taken are also fetched and processed. When a misprediction is discovered, at least part of the correct path is thus already present in the processor, at various stages of processing. Most of the fetched and processed paths are eventually discarded by a dynamic pruning mechanism, which does not stall the processor. The depth of the alternative paths is determined dynamically and adaptively, according to the branch probability and the quality of its prediction. Pathmarks are attached dynamically to the instructions handled by the processor, for identification and efficient pruning. Analytical study shows that, by efficiently exploiting a linear increase in resources, Avid execution can reduce the misprediction penalty from over 30% to less than 1%, thereby improving performance by about 50% relative to what can be achieved by following a single path with similar resources. Simulations of standard benchmark programs (SpecInt95) show that Avid execution achieves better performance than the single-path approach in most cases, and that the optimal way to apply it is by dynamic selection of the depth of the alternative paths on an asynchronous computer such as Kin.

The top level of the Kin architecture is asynchronous, and is composed of a number of self-timed units (which generate completion signals), communicating over asynchronous channels using communication protocols. Race and deadlock problems, as well as the management of Avid execution and instruction pruning, are handled at the top level of the asynchronous architecture of Kin. The units of Kin can be implemented internally either asynchronously, to obtain better performance (operating at average rather than worst-case delays) and to save power (by eliminating the clock), or synchronously (partially reusing existing designs). In either case, the top level of the Kin architecture remains asynchronous.

Decoding variable-length instructions is a bottleneck in high performance processors, since it is an inherently serial operation (the beginning of an instruction cannot be identified before the beginning and the length of the preceding instruction have been identified). As an example of a complete design of an asynchronous system, from the architecture level down to the circuit level, an asynchronous instruction length decoder has been developed as part of this research. The asynchronous decoder is optimized for handling the most common instruction lengths and types, and contains efficient mechanisms for the (slower) handling of the rare cases.

A common way to increase parallelism and improve the performance of both synchronous and asynchronous systems is to use a pipeline as the basic architecture; the faster the pipeline, the better the performance. The Kin architecture can be viewed as a complex non-linear pipeline, and each of the processor units can itself be implemented internally as a pipeline. Some of the asynchronous pipelines described in the literature operate at only 50% utilization of the pipeline stages, where alternate stages compute while the stages in between contain bubbles. The other known asynchronous pipelines suffer (in certain cases) from long backward propagation of the acknowledge signals: a storage element cannot change the value it holds until the following stage has indicated that it is ready for a new value. This may limit the performance of deeply pipelined circuits, e.g., rings or linear pipelines with data-dependent stage delays. In this work, a Doubly-Latched Asynchronous Pipeline (DLAP) has been developed. It is an asynchronous pipeline with master-slave registers, which achieves higher performance than previous asynchronous pipeline designs when delays vary. Two types of DLAP have been developed, one based on edge-triggered registers and the other on level-sensitive latches. SPICE simulations of the various implementations show that the edge-triggered DLAP is slightly slower than the latch-based one, but in both circuits the time overhead is negligible relative to typical computation times. Compared to a synchronous pipeline, DLAP requires twice the number of registers, but using latches reduces the area overhead to a minimum. The difference between the cycle time of DLAP and that of the most efficient asynchronous pipeline amounts to the loading time of one additional register, which is negligible relative to the data processing time of the pipeline stages. On the other hand, DLAP is faster than any other pipeline (synchronous or asynchronous) when the delays are data dependent, and it is less sensitive to temporary slow-downs at the pipeline output, since it contains more bubbles. DLAP was developed for Kin, but it is applicable to many other pipelined systems. Asynchronous circuits are expected to offer reduced power consumption and higher performance by removing the clock and exploiting data-dependent delays. As part of this research, an algorithm has been developed for the automatic conversion of synchronous circuits into asynchronous ones. This makes it possible to exploit some of the advantages of asynchronous circuits while preserving the investment in synchronous designs and tools. The conversion is applied to a post-synthesis circuit, and the DLAP structure was found suitable to serve as the target architecture of such a conversion.

While the top level of the Kin architecture is asynchronous, the various units of the processor may be implemented according to different timing disciplines. The asynchronous instruction length decoder and the DLAP are examples of asynchronous design methodology, in which the circuits were either designed as asynchronous from the start or converted from a synchronous design. The work also presents another possible implementation of Kin, as a multi-synchronous system. This methodology can also serve as an intermediate step in the transition from fully synchronous to fully asynchronous design. Multi-synchronous systems are based on a common clock distributed to the various units over thin wires, thus saving the considerable power invested in clock distribution trees and phase alignment circuits. All the units in the system operate at the same clock frequency, but at unknown relative phases. Hence, at the top architectural level of the interconnection between the units, the system is not synchronous. Since the phases are not aligned, data sent between units may cause synchronization failures. The common treatment of such phenomena in synchronous systems today is the use of synchronizers. However, a synchronizer (implemented as an additional register at every input) does not solve the problem; it only reduces the probability of its occurrence, and it delays the reception and processing of the data. A synchronizer may enter a metastable state and fail to exit it within a full clock cycle when operating at a high frequency. Another solution proposed in the literature is the use of stoppable local clocks, where the clock edge is blocked and delayed when a collision between the data arrival time and the clock transition is detected. When data arrive from multiple sources, the probability of a clock/data collision increases, and frequent clock stoppages degrade performance. Stopping a local clock also changes the phase at which data are sent to the receiving unit. When a number of intercommunicating units are connected in a closed cycle, a situation may arise in which every change of one local clock phase forces a change of the local clock of the following unit, and so on indefinitely. This research presents an adaptive synchronization method for multi-synchronous systems, which significantly reduces the probability of synchronization failure. When one synchronous unit sends data to another, the transmitted data are synchronized with the local clock of the sender. Since the phase difference between the clocks of the sender and the receiver, as well as the delay on the data lines, are stationary, the data arrival time is correlated with the clock of the receiver. The stationarity assumption stems from the fact that the delays and phase differences between the units in the system depend on the implementation, on physical parameters of the manufacturing process, and on temperature and supply voltages; these change very slowly during system operation, and are not noticeable over a very large number of clock cycles. Unlike methods that manipulate the clock, adaptive synchronization adjusts the delays on the data lines and does not touch the clock. The multi-synchronous system can therefore be driven by a precise external crystal clock. During (short) training sessions, which do not significantly degrade performance, the phase difference between the unit's clock and the data arrival times on the various channels is identified, and the delay of each channel is tuned to prevent collisions. The probability of synchronization failure is analyzed and compared with the failure probability of a synchronizer. We show that there are technologies (enabling operation at high frequencies) in which the use of a synchronizer is ineffective, while the adaptive synchronization method still works. Where adaptive synchronization fails as well, a fully asynchronous design should be adopted, following the other methods described in this work.

Asynchronous design requires specification methods and tools different from those available for synchronous design. Most of the methods and tools developed specifically for asynchronous design are suitable mainly for relatively small systems. In this research we needed a complete design and development environment, suitable for the various levels of specifying the architecture of Kin and of its units. For this purpose, a methodology based on statecharts was used for the formal specification and description of the architecture and the design, and for performing simulations. Special rules were defined (such as complete independence of clock transitions) in order to adapt the statechart methodology to asynchronous design and simulation. Using statecharts, the architecture of Kin, the implementation of Avid execution in the asynchronous processor, the asynchronous instruction length decoder, and the various types of asynchronous pipelines were specified and examined. In addition, validation statecharts were defined for checking the validity and correctness of communication protocols and system behavior.
