STRATEGIES FOR THE MODELLING AND SIMULATION OF ASYNCHRONOUS COMPUTER ARCHITECTURES

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Science and Engineering

September 1995

By Georgios Theodoropoulos
Department of Computer Science

Contents

Abstract ...... 15

The author ...... 19

Acknowledgements ...... 22

1 Introduction ...... 23
1.1 Background ...... 23
1.2 Motivation and Objectives ...... 23
1.3 Structure of the Thesis ...... 24
1.3.1 Related Publications ...... 26

2 The Quest for High Performance ...... 27
2.1 Introduction ...... 27
2.2 Bit and Instruction Level Parallelism ...... 28
2.3 Reduced Instruction Set Computers ...... 29
2.4 The Limits of Sequential Computation ...... 30
2.5 Parallel Computer Architectures ...... 31
2.5.1 SIMD ...... 32
2.5.2 MIMD ...... 33
2.5.2.1 Shared Memory MIMD Architectures ...... 33
2.5.2.2 Distributed Memory MIMD Architectures ...... 35
2.5.3 Parallel Programming Models and Languages ...... 36
2.5.3.1 Communicating Sequential Processes ...... 37
2.6 Occam and the Transputer ...... 38
2.6.1 The Occam Programming Language ...... 38
2.6.1.1 The SEQ and PAR Constructs ...... 40
2.6.1.2 The ALT Construct ...... 41
2.6.1.3 Timers ...... 42
2.6.1.4 Functions and Procedures ...... 45
2.6.2 The Transputer ...... 45
2.6.2.1 Configuring Occam Programs ...... 48
2.6.2.2 The T9000 Transputer ...... 49
2.7 Summary ...... 50

3 Modelling and Simulation ...... 51
3.1 Introduction ...... 51
3.2 Discrete Event Simulation Modelling ...... 55
3.3 The Need for Parallel Discrete Event Simulation ...... 57
3.3.1 Exploiting Parallelism ...... 58
3.4 The Logical Process Paradigm ...... 60
3.4.1 Timing Issues ...... 61
3.5 Synchronous versus Asynchronous Simulation ...... 63
3.6 Time Driven Logical Process Simulation ...... 64
3.7 Event Driven Logical Process Simulation ...... 65
3.7.1 Conservative Techniques ...... 67
3.7.1.1 Deadlock Avoidance ...... 68
3.7.1.2 Deadlock Detection and Recovery ...... 70
3.7.1.3 Characteristics of Conservative Protocols ...... 70
3.8 Optimistic Synchronization Protocols ...... 71
3.8.1 Time Warp ...... 72
3.8.1.1 Global Virtual Time ...... 72
3.8.1.2 State Saving and Memory Management ...... 73
3.8.1.3 Characteristics of Optimistic Protocols ...... 74
3.9 Modelling and Simulation in Computer Architecture Research ...... 74
3.9.1 The Need for Improved Digital System Simulation Performance ...... 78
3.9.1.1 Parallel Digital System Simulation ...... 78
3.10 Summary ...... 80

4 Asynchronous Systems ...... 81
4.1 Introduction ...... 81
4.2 Advantages of Asynchronous Systems ...... 82
4.2.1 Clock Distribution Problems ...... 83
4.2.2 Potential for Low Power ...... 83
4.2.3 Potential for High Performance ...... 84
4.2.4 Better Technology Migration Potential ...... 85
4.3 Basic Characteristics of Asynchronous Systems ...... 85
4.3.1 Timing Model ...... 85
4.3.2 Signalling Protocols ...... 86
4.3.2.1 Two-phase Signalling ...... 86
4.3.2.2 Four-phase Signalling ...... 87
4.3.3 Data Passing Techniques ...... 88
4.3.3.1 The Four-Wire Technique ...... 88
4.3.3.2 The Three-Wire Technique ...... 88
4.3.3.3 The Two-Plus-Wire Technique ...... 89
4.3.3.4 The Bundled Data Technique ...... 90
4.4 Micropipelines ...... 90
4.4.1 Event Control Elements ...... 91
4.4.2 Event Controlled Storage Element ...... 93
4.4.3 Micropipelines Without Processing ...... 94
4.4.4 Micropipelines With Processing ...... 95
4.5 AMULET ...... 96
4.6 The AMULET1 Microprocessor ...... 98
4.6.1 The AMULET1 Interface ...... 98
4.6.2 The AMULET1 Internal Organization ...... 100
4.6.2.1 The Address Interface Unit ...... 100
4.6.2.2 The Data Interface Unit ...... 102
4.6.2.3 The Register Bank Unit ...... 103
4.6.2.4 The Execution Unit ...... 104
4.6.2.5 The Primary Decode Unit ...... 105
4.6.3 AMULET2 ...... 105
4.7 Summary ...... 106

5 Modelling Asynchronous Systems ...... 107
5.1 Introduction ...... 107
5.2 Modelling Techniques ...... 108
5.2.1 CSP-based Modelling Approaches ...... 108
5.3 Modelling Micropipelined Systems with Occam ...... 110
5.3.1 Why Occam ...... 111
5.3.1.1 The Deadlock Problem ...... 112
5.3.2 The Modelling Philosophy ...... 114
5.3.3 Modelling a Pipeline Without Processing ...... 115
5.3.4 Modelling a Pipeline With Processing ...... 116
5.3.5 Modelling Control Logic ...... 119
5.3.6 Timing Issues ...... 119
5.3.6.1 Synchronous Merge ...... 120
5.3.6.2 Data Dependent Merge ...... 121
5.3.6.3 Arbitrated Merge ...... 122
5.3.6.4 Delay Independence ...... 125
5.4 Summary ...... 126

6 Occarm: An Occam Model of AMULET1 ...... 127
6.1 Introduction ...... 127
6.2 Occarm General Structure ...... 128
6.2.1 Non-Bundled Signals ...... 130
6.3 The Address Interface ...... 131
6.3.1 The Address Interface Internal Organization ...... 131
6.3.1.1 The PC Loop ...... 131
6.3.1.2 The PC Pipe ...... 133
6.3.1.3 The LSM Loop ...... 134
6.3.2 The Address Interface Occam Model ...... 135
6.4 The Data Interface ...... 137
6.5 Instruction Flow Control ...... 139
6.5.1 Condition Code Evaluation ...... 140
6.5.2 Branch Execution ...... 140
6.5.3 Exception Handling ...... 142
6.5.3.1 Software Interrupts ...... 143
6.5.3.2 Instruction Prefetch Aborts ...... 143
6.5.3.3 Hardware Interrupts ...... 143
6.5.3.4 Data Transfer Aborts ...... 144
6.6 The Primary Decode ...... 147
6.6.1 The Dec1CtrlA Process ...... 148
6.6.1.1 Modelling of the Arbitration Logic ...... 149
6.6.1.2 Detecting Data Aborts ...... 152
6.6.2 The Dec1CtrlB Process ...... 154
6.7 The Register Bank ...... 155
6.7.1 Modelling the Register Bank ...... 157
6.8 The Execution Unit Model ...... 160
6.8.1 The CPSR Model ...... 161
6.8.2 Decode2 ...... 162
6.8.3 Decode3 ...... 164
6.9 The Write Bus Control ...... 165
6.10 Summary ...... 166

7 Simulation Issues ...... 168
7.1 Introduction ...... 168
7.2 The Host Machine: The ParSiFal T-Rack ...... 169
7.3 Monitoring ...... 171
7.3.1 Monitoring Occarm ...... 175
7.3.1.1 Debugging ...... 176
7.3.1.2 Performance Evaluation ...... 177
7.4 Termination ...... 180
7.5 The Simulator Environment ...... 182
7.6 Multiprocessor Implementation ...... 183
7.6.1 Mapping Occarm onto the T-Rack ...... 184
7.6.1.1 Balancing the Workload ...... 186
7.6.1.2 Balancing the Communication Load ...... 187
7.6.1.3 The Monitoring Path ...... 190
7.6.1.4 The Generic Simulator Node ...... 191
7.7 Summary ...... 191

8 Validation of the Occarm Model ...... 192
8.1 Introduction ...... 192
8.2 Benchmark Programs ...... 193
8.3 Accuracy ...... 195
8.4 Performance ...... 204
8.5 Summary ...... 208

9 Addressing the Time Modelling Problem ...... 209
9.1 Introduction ...... 209
9.2 Requirements ...... 210
9.3 The Program Driven Synchronization Protocol (PDSP) ...... 211
9.3.1 The Basis ...... 211
9.3.2 The Rules ...... 212
9.3.3 The PDSP Arbiter Process ...... 213
9.3.3.1 Improving PDSP Performance ...... 214
9.3.4 The Limitations ...... 216
9.4 Applying PDSP to Occarm ...... 217
9.5 The Address Interface Arbiter ...... 218
9.5.1 Providing Instruction Lookahead Information ...... 218
9.5.2 The PCch Link ...... 221
9.5.2.1 Filling of the Datapath ...... 224
9.5.2.2 Register Read Instructions ...... 230
9.5.2.3 Instructions Activating the ALUgo Signal ...... 232
9.5.2.4 Load/Store Multiple Instructions ...... 233
9.5.2.5 The Instruction Lookahead Table ...... 233
9.5.3 The Wch Link ...... 234
9.5.3.1 Colour Mismatch ...... 235
9.5.3.2 Condition Codes Failure ...... 236
9.6 The Primary Decode Arbiter ...... 238
9.7 The Write Control Arbiter ...... 242
9.7.1 The DINch Link ...... 242
9.7.2 The DPch Link ...... 244
9.8 Performance Evaluation of PDSP ...... 248
9.9 Summary ...... 249

10 Conclusions and Further Work ...... 250
10.1 Background ...... 250
10.2 Contribution of the Thesis ...... 252
10.2.1 Modelling ...... 252
10.2.2 Simulation ...... 253
10.3 The Program Driven Synchronization Protocol ...... 255
10.4 Performance ...... 256
10.5 Occam as an Asynchronous Hardware Description Language ...... 257
10.6 Further Work ...... 258
10.6.1 Modelling and Simulation ...... 258
10.6.2 Automatic Synthesis ...... 259

A The ARM6 Programmer's Model ...... 260
A.1 The Registers ...... 260
A.2 The Instruction Set ...... 262

B Modelling the Control Logic of AMULET1 ...... 266

Bibliography ...... 277

List of Tables

7.1 Communication Load on Occarm Links ...... 186

8.1 Timestamp Drift ...... 194
8.2 Dhrystone Numbers ...... 194
8.3 AMULET1 Pipeline Occupancy (Dhrystone (1 loop)) ...... 196
8.4 AMULET1 Pipeline Stalls (Dhrystone (1 loop)) ...... 197
8.5 Asim versus Occarm (Single Transputer Implementation) ...... 206
8.6 Performance of Occarm ...... 206

9.1 PDSP: Number of Free Stages in the Datapath ...... 226
9.2 Performance of PDSP (Address Interface) ...... 248

List of Figures

2.1 The Use of the Occam SEQ and PAR Constructs ...... 40
2.2 The Occam ALT Construct ...... 41
2.3 Introducing Delays with Occam Timers ...... 42
2.4 Programming Timeout Behaviour ...... 43
2.5 An Example Occam Program ...... 44
2.6 The Architecture of the T800 Transputer ...... 46

3.1 A Taxonomy of Models ...... 53
3.2 Abstraction Levels in Digital Systems ...... 75

4.1 The Request-Acknowledge Interface ...... 86
4.2 Two-phase Signalling: Rising and Falling Edges Equivalent ...... 86
4.3 Two-phase Signalling Protocol ...... 87
4.4 Four-phase Signalling Protocol ...... 87
4.5 The Bundled Data Interface ...... 89
4.6 The Two-phase Bundled Data Protocol ...... 89
4.7 Event Control Modules ...... 91
4.8 The Capture-Pass Storage Element ...... 93
4.9 Micropipeline Without Processing ...... 94
4.10 Micropipeline With Processing ...... 96
4.11 The AMULET1 Interface ...... 99
4.12 The AMULET1 Internal Organization ...... 101
4.13 The AMULET1 Processor Physical Layout ...... 102

5.1 Micropipeline Without Processing: The Register Model ...... 116
5.2 Micropipeline With Processing: A High Level View ...... 117
5.3 Micropipeline With Processing: The Register Model ...... 118
5.4 Synchronous Merge ...... 121
5.5 Data Dependent Merge ...... 122
5.6 Arbitrated Merge ...... 123

6.1 Occarm Top Level Process Graph ...... 129
6.2 The Buffer Process ...... 130
6.3 The Address Interface ...... 131
6.4 The Address Interface Model (AddInt) ...... 136
6.5 The Data Interface Model (DatInt) ...... 138
6.6 The Exception Pipe ...... 145
6.7 Aborts Modelling ...... 147
6.8 The Primary Decode Model (Decode1) ...... 148
6.9 Dec1CtrlA Logic ...... 149
6.10 Detecting the PCcol ...... 151
6.11 Modelling Dec1CtrlA Arbitration Logic ...... 152
6.12 The Register Bank Internal Organization ...... 155
6.13 The Register Bank Model (RegBank) ...... 158
6.14 The Execution Unit of AMULET1 ...... 160
6.15 The CPSR Unit ...... 162
6.16 CPSR: An Alternative Design ...... 163
6.17 The First Execution Stage Model (Decode2) ...... 164
6.18 The Second Execution Stage Model (Decode3) ...... 165
6.19 The Write Bus Control Model ...... 166

7.1 The T-Rack ...... 169
7.2 Event Traces for Debugging ...... 175
7.3 Collecting Event Traces in Occarm ...... 178
7.4 Terminating Occarm ...... 180
7.5 The Single Transputer Environment of Occarm ...... 182
7.6 Occarm Process Connectivity Table ...... 185
7.7 Modified Occarm Top Level Process Graph ...... 186
7.8 Occarm Graph Mappings ...... 188
7.9 Mapping Occarm onto the T-Rack ...... 189
7.10 The Generic Simulator Node ...... 190

8.1 A Section of the Dhrystone Synthetic Benchmark ...... 193
8.2 Decode1: Preemption Count (1 Dhrystone Loop) ...... 199
8.3 Decode1: Preemption Magnitude (1 Dhrystone Loop) ...... 200
8.4 AddInt: Preemption Count (1 Dhrystone Loop) ...... 201
8.5 AddInt: Preemption Magnitude (1 Dhrystone Loop) ...... 202
8.6 WrtCtrl: Preemption Count (1 Dhrystone Loop) ...... 203
8.7 WrtCtrl: Preemption Magnitude (1 Dhrystone Loop) ...... 204
8.8 An Example Process Graph ...... 205

9.1 The PDSP Arbiter Process ...... 214
9.2 PDSP: Taking MLL into Account ...... 217
9.3 Providing Instruction Lookahead Knowledge to AddC ...... 219
9.4 The Arrival of Instruction Lookahead Information ...... 220
9.5 The Address Interface - Datapath Loop ...... 223
9.6 Stalling of the Datapath ...... 224
9.7 PDSP: Providing CPS to AddC ...... 228
9.8 Informing AddC of Colour Mismatches at Decode1 ...... 231
9.9 The Instruction Lookahead Table ...... 233
9.10 PDSP Messages from Ctrl3 to AddC ...... 237
9.11 The Decode1 Arbiter: PCcol due to Aborts ...... 240
9.12 Informing Decode1 of the Selected Channel ...... 241
9.13 The Memory-WrtCtrl Pipeline ...... 243
9.14 WrtCtrl: Reading Data Values from Memory ...... 244
9.15 Bypassing the Register Bank ...... 246
9.16 WrtCtrl: Reading Messages from the Datapath ...... 246

A.1 The ARM6 Register Organization ...... 261
A.2 The ARM Program Counter and Program Status Word ...... 262
A.3 The ARM6 Program Status Registers ...... 262
A.4 ARM Instruction Formats ...... 263

B.1 Dec1CtrlB Control Circuit ...... 267
B.2 The Dec1CtrlB Process ...... 268
B.3 AddC Control Circuit ...... 269
B.4 The AddC Process ...... 270
B.5 Ctrl2 Control Circuit ...... 271
B.6 The Ctrl2 Process ...... 272
B.7 Ctrl3 Control Circuit ...... 273
B.8 The Ctrl3 Process ...... 274
B.9 Write Control Circuit ...... 275
B.10 The WrtCtrl2 Process ...... 276

Abstract

Synchronous VLSI design is approaching a critical point, with clock distribution becoming an increasingly costly and complicated issue and power consumption rapidly emerging as a major concern. Asynchronous digital design styles promise to liberate VLSI systems from clock skew problems, offer the potential for low power and high performance, and encourage a modular design philosophy which makes incremental technological migration a much easier task. The desire to exploit the potential advantages offered by asynchronous logic has recently fueled a revival of interest in asynchronous systems.

Modelling and simulation, being at the heart of digital system design, may perform a catalytic role in the quest to realize the potential offered by asynchronous logic. Hence, the recurrence of interest in asynchronous design has been accompanied by intense research activity aimed at developing techniques appropriate for modelling and simulating asynchronous systems. Contributing to this effort, and motivated by the increasing debate regarding the potential use of CSP for this purpose, this thesis investigates the suitability of occam, a CSP-based programming language, for the modelling and simulation of complex asynchronous designs.

A modelling approach is introduced which aims to exploit the strong relationship between the semantics of occam and the structure and operation of asynchronous systems, as well as the parallelism inherent in asynchronous hardware, to achieve the rapid development of asynchronous architectural simulation models,

which may be executed on transputer networks to achieve high performance. The applicability and robustness of the approach are demonstrated by employing it to construct occarm, an occam model of the AMULET1 asynchronous microprocessor.

The distributed nature of the proposed modelling approach introduces the problem of maintaining temporal precision and of ensuring that the causality principle is not violated. The thesis provides a quantitative analysis of the timing error introduced in the model if violations of the causality principle are permitted. It then introduces the Program Driven Synchronization Protocol, a novel conservative, deadlock avoidance synchronization technique for dealing with causality problems within the framework of the proposed modelling approach.

Monitoring, debugging, termination and load balancing issues are also discussed.

DECLARATION

No portion of the work referred to in this thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institution of learning.

All trademarks cited within this work are acknowledged by the author.

COPYRIGHT AND OWNERSHIP OF INTELLECTUAL PROPERTY RIGHTS

1. Copyright in the text of this thesis rests with the Author. Copies (by any process) either in full, or of extracts, may be made only in accordance with instructions given by the Author and lodged in the John Rylands University Library of Manchester. Details may be obtained from the Librarian. This page must form part of any such copies made. Further copies (by any process) of copies made in accordance with such instructions may not be made without the permission (in writing) of the Author.

2. The ownership of any intellectual property rights which may be described in this thesis is vested in the University of Manchester, subject to any prior agreement to the contrary, and may not be made available for use by third parties without the written permission of the University, which will prescribe the terms and conditions of any such arrangement.

Further information on the conditions under which disclosures and exploitation may take place is available from the Head of Department of Computer Science, University of Manchester.

The author

The author obtained a Diploma degree in Computer Engineering and Informatics from the University of Patras, Greece, in July 1989. He was subsequently awarded a scholarship for graduate studies in Computer Science by the University of Manchester, U.K. In October 1990, he joined the Department of Computer Science at the University of Manchester as a research student (ParSiFal group) and a year later he completed his MSc (Method II, by research). In October 1991 he joined the AMULET group at the Department of Computer Science, University of Manchester, to pursue a Ph.D.

στους γονείς μου, Κωνσταντίνο και Χρυσούλα,

στην αλήθεια...

—————————–

to my parents, Konstantinos and Chrysoula,

to the truth...

“Η Ιθάκη σ’ έδωσε τ’ ωραίο ταξείδι. Χωρίς αυτήν δεν θάβγαινες στο δρόμο. Άλλα δεν έχει να σε δώσει πια. Κι’ αν φτωχική την βρεις, η Ιθάκη δεν σε γέλασε. Έτσι σοφός που έγινες, με τόση πείρα, ήδη θα το κατάλαβες η Ιθάκες τι σημαίνουν.”

Κ. Καβάφης

—————————–

“Ithaca gave you the lovely journey. Without her you would never have set out. But she has nothing more to give you. And if you find her poor, Ithaca has not deceived you. Wise as you will have become, so full of experience, you must have understood already the meaning of Ithacas.”

C. Cavafy.

Acknowledgements

I am grateful to my supervisor, Dr. J. V. Woods, for his support during the four years in which the research for this thesis was carried out. Professor S. B. Furber, as my co-supervisor and director of the AMULET group, has always been a source of inspiration; for this, for his advice and guidance, and for patiently providing me with the facilities to carry out the research for this thesis, I express my thanks. I also wish to thank Dr. P. Capon for allowing me access to the T-Rack, and for his invaluable assistance during the implementation of the occam systems required for the research presented in this thesis. Thanks are also due to all those who supported the T-Rack over the last four years, allowing me to complete my research. I would also like to thank all the members of the AMULET group for their support all these years. In particular, I wish to express my thanks to Rob Kelly (now with ICL, Manchester), Paul Day, Nigel Paver and Steve Temple for always being prepared to discuss and provide answers to my questions regarding the peculiarities of the AMULET1 architecture; Rob's help during some crucial moments of the work was invaluable. David Jackson, Craig Farnsworth, Rahul Mehra, Rhod Davies, Jim Garside, Oleg Petlin, Steve Nicklin and Dave Gilbert all helped me on a number of occasions and I thank them all. Nigel also provided his Lapwing Court 'mansion', which proved an excellent environment for writing up this thesis, not least because of his strict "Ph.D policing"; my special thanks. The many friends I made during my five year stay in Manchester have been an invaluable source of emotional support and have made the Ph.D a less stressful task to undertake and Manchester a great place to live; I thank them all. The research presented in this thesis was funded by a University of Manchester Research Scholarship and by the Mpakalas Foundation, Athens, Greece; I gratefully acknowledge this support.

Chapter 1

Introduction

1.1 Background

This thesis is concerned with methodologies and techniques to support the modelling and distributed simulation of asynchronous computer architectures. The research presented in this thesis took place in the period 1991-1994 and relates to work in asynchronous system design undertaken by the AMULET group at the Department of Computer Science, University of Manchester.

1.2 Motivation and Objectives

Clocked VLSI systems are approaching a critical point, due to certain deficiencies inherent in synchronous operation. Asynchronous logic promises to provide the means to overcome these deficiencies and limitations of the synchronous VLSI design approach. Hence, there has recently been a resurgence of interest in asynchronous design techniques.

The quest for the exploitation of the potential advantages offered by asynchronous logic has revealed a need for modelling and simulation techniques appropriate to the asynchronous design style. Thus, the recurrence

of interest in asynchronous design has fueled intense research activity aiming to develop techniques appropriate for modelling and simulating asynchronous systems. CSP, in particular, has attracted the attention of many researchers as a potential notation for describing asynchronous behaviour.

Contributing to the quest for modelling and simulation techniques suitable for asynchronous design, and motivated by the increasing debate regarding the potential of CSP for this purpose, the work described in this thesis investigates the suitability of occam, a CSP-based programming language, for the modelling and simulation of complex asynchronous systems.

1.3 Structure of the Thesis

The thesis comprises 10 chapters. Chapters 2, 3, 4 and the first part of chapter 5 provide a theoretical background to the areas directly related to the subject of the thesis, namely parallelism, simulation modelling, asynchronous systems and modelling techniques for asynchronous hardware respectively. The remainder of the thesis, namely the second part of chapter 5 and chapters 6, 7, 8 and 9, describes the author's work and contribution.

Chapter 2 provides a short introduction to parallelism as a natural path in the quest for high performance, with emphasis on the MIMD approach to parallel computation. The various parallel programming models are mentioned and the CSP model of computation is discussed. Finally, a more detailed description of the occam programming language and its associated processor, the transputer, is provided.

Chapter 3 deals with the issues of modelling and simulation. After a short introduction to modelling, the chapter concentrates on discrete event simulation modelling and, in particular, its distributed implementation. The various approaches for exploiting parallelism in simulation are mentioned, and the Logical Process Paradigm is described. The causality-related issues arising from the distributed nature of the Logical Process Paradigm are described and the techniques that have been developed to address these issues are discussed. The chapter concludes with a discussion of the role of modelling and simulation in digital system design.

Chapter 4 discusses issues related to asynchronous design techniques. After a short discussion of the nature and advantages of asynchronous logic, the chapter presents some of the most influential asynchronous design techniques and then proceeds to provide a more detailed description of the Micropipeline design approach. The final part of the chapter is occupied by a short description of the Micropipelined AMULET1 microprocessor.

The thesis then moves to the modelling of asynchronous systems. Chapter 5 provides an overview of existing notations and techniques; emphasis is placed on the techniques that employ CSP-like notations. Advocating the potential use of occam for the modelling and simulation of asynchronous systems, a methodology is introduced which employs occam to build parallel models of asynchronous systems based on the Micropipeline approach. The methodology neglects causality problems, thus allowing timing errors to occur; it is argued, however, that the timing inaccuracy will be acceptable.

Chapter 6 describes how the modelling approach introduced in chapter 5 has been employed to build an occam model of the AMULET1 asynchronous microprocessor. Emphasis is given to the aspects of the modelling that proved to be complex. The chapter also includes a more detailed description of the operation of the AMULET1 processor, as this defines the functionality of the model as well.

Chapter 7 addresses issues related to the execution of the occam model on a computer system. Two environments are presented, for the execution of the model on a single transputer and on multiple transputers respectively. Monitoring, termination, mapping and load balancing issues are discussed.

Chapter 8 presents a validation of the occam model, providing quantitative results and analysis regarding both the accuracy and the performance of the model.
Chapter 9 introduces the "Program Driven Synchronization Protocol" (PDSP), a novel approach for eliminating the timing problems raised by the distributed nature of the proposed modelling philosophy. The concept of "Instruction Lookahead" is introduced and a set of rules is presented which specifies the behaviour of processes in the occam model so that timing accuracy is ensured. The application of PDSP to the occam model of AMULET1 is then described. Finally, some performance results are given.

Chapter 10 summarizes the conclusions drawn from the research presented in the thesis, and indicates a number of areas where further research and development work is possible.

1.3.1 Related Publications

Different aspects of the research work presented in this thesis have been presented at the AMULET Modelling Workshop (Windermere, Cumbria, England) [Theo94], the World Transputer Congress 1994 (Como, Italy) [Theo94a], the European Simulation Symposium 1994 (Istanbul, Turkey) [Theo94b], the Eurosim Congress 1995 (Vienna, Austria) [Theo95], the World Transputer Congress 1995 (Harrogate, England) [Theo95a] and the 4th Euromicro Workshop on Parallel and Distributed Processing (Braga, Portugal) [Theo95b].

Chapter 2

The Quest for High Performance

2.1 Introduction

The principles of computer organization have traditionally been based upon the processing model described by John von Neumann in the late 1940s, whereby a computer comprises a single processing unit connected to a single1 memory in which both the program code and the data to be operated upon are stored. This is commonly referred to as the von Neumann model of computation. The operation of the system consists of sequential fetches of instructions and their operands from memory, which are then executed, with the result of the execution written back to memory. The memory addresses of all instructions and data involved in a computation, as well as the instructions, their operands and the produced results, are communicated over the single path that connects the processor and the memory; Backus called this path the "von Neumann bottleneck" [Back78].

Since the first commercial, general purpose von Neumann computers were commissioned in the early 1950s, there has been an ever increasing demand for more computational power and speed.

1 The basic memory model, which uses the same physical memory to store both instructions and data, is referred to as the "Princeton Architecture".

Traditionally, this demand has been satisfied by enhancing the performance of sequential computers through technological advances and architectural innovations.

The progression from electromechanical relays, vacuum tubes, Williams tubes and magnetic drums, to transistor switching devices and solid state memories, to integrated circuits and eventually to VLSI semiconductor devices has had a dramatic impact on computational speed. Reduced physical size and shorter device switching times have resulted in digital systems that can be driven at extremely high clock rates (e.g. the 21164 EV-5, the successor to DEC's Alpha processor, is expected to run at a speed of 300MHz [Gwen94]). Optical technology promises to reduce gate delays to picoseconds while increasing the bandwidth and the number of interconnections [Huan90].

Advances in technology have been accompanied by architectural novelties. These include the use of cached store hierarchies, virtual address spaces, arithmetic optimizations and more efficient instruction sets. Techniques to overcome the von Neumann bottleneck have also been devised; these include the "Harvard Architecture", which uses separate instruction and data memories, and memory interleaving, which allows concurrent access to several independent memory modules [Kraf79] [Ibbe82] [Furb89] [Heud92].

2.2 Bit and Instruction Level Parallelism

A technique which has made a significant contribution to the improvement of the performance of von Neumann architectures has been the exploitation of parallelism at the bit and instruction levels. Bit-parallel arithmetic was the first technique to be employed, as hardware components became less expensive.

Bit parallelism was followed by the introduction of multiple functional units (stages) which could operate in an overlapped manner. These stages are connected one to the next to form a pipeline; instructions enter at one end of the pipeline and travel through consecutive stages to exit at the other end. This arrangement allows an instruction to be (pre)fetched and decoded while the previous instruction is being executed. The execution unit may also be divided into a number of stages, with each stage performing a different operation contributing to the execution of an instruction. Several advanced architectural techniques have been devised to optimize pipelined operation; these include branching schemes (e.g. delayed branch), advanced scheduling techniques, register bank management techniques, superpipelining, superscalar and very long instruction word (VLIW) systems [Henn91]. Superpipelined machines employ "deeper" pipelines, while superscalar machines issue more than one instruction per clock cycle into the pipeline; in VLIW systems the compiler finds operations which can be issued in the pipeline together and creates a single instruction containing those operations.

In general, the utilization of multiple pipelined units can reduce the average execution time per instruction and improve throughput significantly; however, data and control dependencies may cause considerable delays [Kogg81]. Traditionally, pipelines have been driven by a common external clock. Recently, unconventional techniques have been developed which allow the asynchronous operation of pipeline stages; these techniques are at the core of the research work described in this thesis and are discussed in detail in chapter 4.

2.3 Reduced Instruction Set Computers

A milestone in computer architecture research was the development of Reduced Instruction Set Computers (RISCs).

The quest for high performance and the aim to reduce the "semantic gap" between hardware and high level programs had generated a trend towards increasingly sophisticated and complex hardware and instruction sets [Furb89]. Against this background of increasing complexity the concept of RISC was introduced in 1980, advocating simplicity and efficiency [Patt80]. RISC systems implement small instruction sets using simple and fixed instruction formats on load/store architectures to achieve fast, single cycle execution [Stal88]. They employ pipelining and hardwired (instead of microcoded) control, and rely on optimising compilers and large cache memories to maximise the use of registers and minimise references to main memory.

The underlying principles of the RISC approach had already been laid down by IBM in their 801 computer, developed in the late 1970s at IBM's Thomas J Watson Research Center [Radi83]; the first prototype VLSI RISC processors were developed in the early 1980s at the University of California, Berkeley [Kate85] and at Stanford University [Henn81]. Since then, and amidst a continuing controversy on the issue of system complexity [Whar92] [Alli92], several powerful commercial RISC processors have been developed, including the VL86C010 Acorn RISC Machine (ARM) [Furb89] [VLSI90], the MB86900 SUN SPARC [Fuji87] [Sun87], the MIPS R2000 [Kane87], AT&T's CRISP [Ditz87], AMD's Am29000 [AMD87], Hewlett-Packard's HPPA [HP86] and Motorola's M88000 [Dobb88]; comprehensive surveys of VLSI implementations of RISC architectures may be found in [Furb89] and [Heud92].

2.4 The Limits of Sequential Computation

Technological and architectural advances have yielded a dramatic rise in the performance of von Neumann machines, and computational speed increases of an order of magnitude every five years have been witnessed. However, this rate of improvement cannot be sustained, due to fundamental limitations imposed on VLSI technology by the laws of physics [Lind93] [Prep94]. One such limitation is the speed at which electrical signals can be transmitted over the physical links which connect the circuit components; this speed cannot be greater than the speed of light (i.e. about 0.3 metres per nanosecond) [Russ78]. Miniaturization techniques aim at high package densities and a high degree of integration to reduce distances between components, but are limited by heat dissipation and quantum effects at sub-micron levels [Mead80]. These two fundamental constraints impose an ultimate limit on the maximum theoretical performance that can be achieved by a single von Neumann computer; this performance has been estimated to be around 3 Gigaflops (3 billion floating point operations per second) [Wile87].

2.5 Parallel Computer Architectures

Despite the dramatic rise in their performance, sequential von Neumann computers still cannot offer the processing rates required for the solution of a wide range of applications (the so called "Grand Challenge" problems) [Fox89] [Lazo93] [Kung94]. This has led to the introduction of architectural concepts which attempt to take advantage of the high level, algorithmic or data-set, parallelism inherent in many problems. These concepts call for the utilization of multiple processing elements which can operate in parallel [Hwan84] [Hock88] [Dunc90] [Mold93].

Traditionally, parallel computer architectures are classified as either SIMD (Single Instruction Multiple Data Streams) or MIMD (Multiple Instruction Multiple Data Streams), following Flynn's taxonomy [Flyn72]. This taxonomy defines two more classes of architectures, namely SISD (Single Instruction Single Data Streams) and MISD (Multiple Instruction Single Data Streams). SISD refers to conventional, serial von Neumann machines; in these computers, at any instant, one stream of instructions (and therefore only one instruction processing unit) operates on a single data stream. MISD would involve multiple processing units applying different instructions to a single datum; this theoretical possibility is generally deemed impractical.

2.5.1 SIMD

Two major classes of SIMD machines exist, namely vector processors and array processors.

Vector processors utilize multiple pipelined processing elements to apply identical arithmetic operations to a data stream linearly organised in vector registers. Vectorising compilers are employed to replace blocks of sequential code by vector instructions. Examples of successful commercial vector machines are the CRAY-1 [Russ78], the Control Data Corporation CYBER 200 series [Linc82] and the VP-200 [Miur84].

Array processors typically employ a central control unit and multiple processing elements which operate in parallel, applying the same instruction sequence to arrays of data in a synchronized fashion. Array machines can achieve high performance rates with suitable problem classes where the same operation must be performed on different data sample points. Generally, parallelism must be expressed explicitly by the programmer, although compiling techniques can also be used for this purpose. Examples of array processors are the ILLIAC IV [Hord82], the Burroughs Scientific Processor (BSP) [Aust79], the ICL Distributed Array Processor (DAP) [Redd79], the Goodyear Aerospace MPP [Batc80] and the Connection Machine [Hill85].

SIMD machines are well suited to data parallel applications such as weather forecasting, image processing, finite state analysis and general linear algebra problems.

2.5.2 MIMD

MIMD architectures employ multiple connected processors which can execute independent instruction streams. MIMD computers are asynchronous systems characterized by decentralized hardware control. They offer greater flexibility than their SIMD counterparts and they support higher level parallelism (subprogram and task levels), which can be exploited by divide and conquer algorithms: a large problem is split into a number of sub-tasks, which can then be executed concurrently on different processors.

The degree of coupling between processors categorizes MIMD machines as either tightly or loosely coupled. In tightly coupled (or shared memory) systems, processors communicate and synchronize through a global shared main memory. In loosely coupled (or distributed memory) systems, each processor has its own local store and communicates with the other processors by exchanging messages via a network2. Virtual shared memory systems have also been developed, wherein the memory is physically distributed but is logically viewed by the programmer as a single global shared memory; these systems include DDM [Warr88] and KSR-1 [KSR].

2 Distributed memory architectures are also referred to as message-passing systems.

2.5.2.1 Shared Memory MIMD Architectures

In shared memory architectures, performance may be degraded due to store contention problems which may occur if more than one processor requires access to the store simultaneously. To reduce this possibility, multiple banks of store may be used. Another technique uses local cache memories to reduce accesses to the global memory. This introduces the issue of maintaining coherency among the multiple copies of the same data that may exist in various processors' caches at a given time. Cache coherency protocols, implemented in hardware or software, may solve this problem but usually reduce the performance of the system [Hill89] [Sten90].

Various alternatives for connecting multiple processors to shared memory have been proposed. Small machines with relatively few processors typically feature a time-shared data bus; processors contest control of this to access the store. Bus saturation problems make this scheme ineffective for large processor numbers. Flexible Corporation's Flex/32, the Encore Computer's Multimax and the Sequent Balance machines [Hock88] are examples of bus based architectures. Crossbar interconnection technology uses a crossbar switch of n^2 crosspoints to connect n processors to n memories. The Carnegie Mellon multi-mini-processor (C.mmp) [Wulf72] was based on this interconnection scheme. Power, pinout, and size considerations make fully connected networks very expensive, because the complexity grows as the square of the number of processors and memories. Multistage interconnection networks such as Delta and Omega networks [Pate79] provide the same connectivity as crossbar networks but at reduced cost3; an example of a multilevel switching network based machine is the Illinois Cedar computer [Gajs83], which uses an Omega network to connect clusters of processors to global memory modules.

3 Delta networks, for instance, have a cost logarithmic in the number of inputs.

Another problem associated with shared memory architectures is the issue of mutual exclusion, namely preventing a task from accessing a shared data structure while it is being modified by a different parallel task. Signals [Wirt77], semaphores [Dijk68], conditional critical regions [Hoar72] [Hans73], and monitors [Hoar74] are schemes that have been developed to solve the mutual exclusion problem. Languages based on these concepts have also been developed, including Ada [Barn89] and Modula [Wirt77].

2.5.2.2 Distributed Memory MIMD Architectures

Message passing MIMD machines have principally been constructed in an effort to provide architectures with the potential to scale up into systems consisting of many thousands of processors.

In distributed memory architectures4, processors are connected by an interconnection network; bus-based configurations also exist (e.g. the PRINGLE computer [Hock88]) but are not common. The network that provides for the connection of the processors in a message passing system can be either static or reconfigurable. In the first case, the connections between the processors are fixed and permanent. In static networks, communication between non-neighbouring processors can result in significant latency, as data is queued and forwarded by intermediate nodes to reach its destination. Various interconnection network topologies have been proposed to support expandability and scalability and to minimize latency for certain classes of problems [Sieg85] [Reed87]; these include rings, grid based networks such as meshes, cylinders and toroids, trees, cube connected cycles, shuffle exchange networks, and hypercubes. Reconfigurable topology architectures provide programmable switches that allow users to select a logical topology which matches a particular application's communication pattern. Usually the desired configuration is arranged in advance, though dynamic reconfiguration is also an available option [Murt91]. Examples of commercial distributed memory MIMD machines are the BBN Butterfly [Hock88], the Cosmic Cube [Seit85], the ChiP computer [Snyd82] and the Intel iPSC [Inte86].

4 The term typically applies to systems in which the nodes are closely connected via very high speed communication links. Physically distributed computers connected by slow message passing networks (e.g. ARPANET) are usually excluded.

Recently, some less conventional MIMD based architectural paradigms have been developed. Dataflow architectures [Gurd85] and graph reduction machines [Trel82] are examples of such paradigms.

2.5.3 Parallel Programming Models and Languages

The von Neumann model of computation is typically characterized as control driven. There is a single thread of control, normally passed sequentially from instruction to instruction, thus determining the sequence in which instructions are executed. The control driven model of computation is supported by conventional imperative languages such as C [Kern88]. The advent of parallel architectures has resulted in a strong interest in different computation models which have the potential for parallel execution, and in new programming styles and languages to support them.

In the data driven model of computation, an instruction may be executed as soon as the necessary data are available, without taking into account its textual position in the program. This model is supported by single assignment languages such as SISAL [Böhm91] and LUCID [Asch77]. Data flow machines constitute implementations of the data driven model of computation.

In the demand driven model, an operation is performed only when its result is required. Functional languages [Glas84] [Read89] such as HASKELL [Huda90] are well suited to this model, while graph reduction architectures provide for their efficient implementation.

Logic languages are based on pattern driven models of computation [Alma89]. In Prolog [Cloc81], for instance, the execution of a program consists of a search for facts (patterns) which satisfy a given query; the facts are organized to form a search tree which may be searched in parallel (using AND-parallelism or OR-parallelism).

The three aforementioned models of computation support fine grain parallelism, which it is the responsibility of the compiler to exploit. However, the predominant computational model for MIMD architectures is based upon the concept of concurrent processes that communicate and synchronize through the exchange of messages (message passing); this model is well suited to distributed memory machines, though it can also be implemented on shared memory systems.

The process based model of computation exploits the algorithmic parallelism of programs and calls for their partitioning into a number of different processes which perform different parts of the overall algorithm5. The parallel processes may then be mapped onto different processors of the parallel architecture. The detection of algorithmic parallelism and the mapping of the resulting partitioning onto the multiple processors of an MIMD machine are two major problems associated with the process based model of computation. Ideally, the latter task should be performed automatically, though in general it is the programmer's responsibility; chapter 7 discusses how this problem has been tackled within the context of the research presented in this thesis.

Various process based programming models have been proposed, including PLITS (Programming Language in the Sky) [Feld79], Linda [Ahuj86], and CSP (Communicating Sequential Processes) [Hoar78] [Hoar85]. These differ in the naming conventions they use to refer to sender and receiver processes and in the semantics of their communication.

5 Usually the process structure of an algorithm implemented using message passing is represented by a graph whose nodes represent the processes and whose edges represent the communication paths between them.

2.5.3.1 Communicating Sequential Processes

The theoretical model of Communicating Sequential Processes is one of the most elegant schemes which have been proposed for process based parallel computation. Within the framework of CSP, a program is a collection of sequential processes which execute asynchronously and concurrently and communicate by exchanging messages through channels.

The communication is unbuffered, point-to-point and synchronous. A communication requires a rendezvous between the sender and the receiver processes and will take place only if both these processes have requested participation in such an activity; otherwise the process that arrives at the rendezvous first has to block until the corresponding action is reached at the other end of the channel. A guarded input construct [Dijk75] is available which provides the means for a receiver process to select, arbitrarily, one of several input channels to communicate on, depending on the readiness or otherwise of these channels; the choice is nondeterministic.

2.6 Occam and the Transputer

With the aim of providing a practical realization of the CSP model of computation, in the early 1980s Inmos Limited developed the occam programming language [Inmo88] and the transputer [Inmo86] [Inmo88a], a microprocessor designed to support the execution of occam.

2.6.1 The Occam Programming Language

The basic unit of the occam language6 is the process. The concept of the process can be viewed at many levels within an occam program, the lowest being the command level.

6 Within the context of this thesis, the term occam refers to the occam2 language. This is a development of an earlier occam language known as occam1 or proto-occam. The main difference between the two occam variants is that occam2 allows the transmission of structured data types along channels, by providing extended channel protocols. Recently, Inmos developed another variant of occam, referred to as occam3, which supports records and user-definable data types [Inmo92]. Tutorial introductions to occam2 may be found in [Kerr87] [Poun87] [Burn88] [Gold88] [Dows88] and [Gall90].

Occam programs are built from three primitive processes, namely assignment, input and output:

• v := e : assign expression e to variable v

• c ! e : output expression e to channel c (send)

• c ? v : input variable v from channel c (receive)

Channels are used to enable the exchange of messages between concurrently executing processes. The semantics of occam channel communications are based on CSP. A communication through a channel is unidirectional, unbuffered, point-to-point, and synchronous; as in CSP, a sender and a receiver have to establish a rendezvous in order to communicate (a minimal example is sketched after the list below). The primitive processes may be combined to form constructs:

• IF : conditional

• WHILE : iterative

• SEQ : sequential

• PAR : parallel

• ALT : alternative
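As a minimal illustration of the rendezvous (a sketch, not taken from the thesis; the channel name and the literal value are arbitrary), the following fragment runs a sender and a receiver in parallel; whichever process reaches the channel first blocks until its partner arrives:

CHAN OF INT c:
PAR
  c ! 42   -- sender: blocks until the receiver is ready to input
  INT x:
  c ? x    -- receiver: the rendezvous completes and x becomes 42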

A construct is itself a process and may be used as a component of another construct. The scope of these constructs is indicated in the text of the program by indentation (a single unit of indentation being two spaces). The execution of a compound construct terminates when all the processes within it (i.e. constructs and primitive processes) have terminated. IF and WHILE constructs are the standard conditional and iterative commands, respectively, encountered in all imperative languages.

SEQ
  P1
  P2
  PAR
    P3
    SEQ
      P4
      P5
    P6
  P7
:

Figure 2.1: The Use of the Occam SEQ and PAR Constructs

2.6.1.1 The SEQ and PAR Constructs

The mode of process execution is declared using the SEQ and PAR constructs. Processes which are grouped using SEQ are executed in sequential, textual order. Processes grouped by means of PAR are not restricted to any specific order of execution and may thus be executed in parallel. SEQ and PAR constructs may be combined to allow the specification of arbitrarily complex execution orderings; the only restriction is adherence to the communication rules (e.g. parallel output to the same channel is illegal). In figure 2.1, processes P1 and P2 will execute sequentially. Upon completion of P2 the PAR construct is activated, which allows process P3, process P6 and the sequential composition of P4 and P5 to execute concurrently. Upon the termination of the PAR construct, P7 is executed.

Occam allows the replication of the SEQ and PAR constructs by providing an extension of their syntax. A replicated SEQ corresponds to the FOR command encountered in most imperative languages. The parallel replicator is more interesting, for with it arrays of parallel, identical processes may be specified.

BOOL ready:
CHAN OF BYTE ch1, ch2:
BYTE any:

ALT
  ch1 ? any                --input guard of process P1
    P1
  ready=TRUE & ch2 ? any   --input guard of process P2
    P2
:

Figure 2.2: The Occam ALT Construct

Priorities among parallel processes executing on the same processor may be enforced by the use of the PRI operator (i.e. PRI PAR); the precedence in this case is declared by the textual order of the processes within the PRI PAR construct. The transputer implementation of occam supports two levels of priority, namely low and high. Low priority processes are time-sliced and are executed only when there are no active high priority processes. High priority processes are not time-sliced. Both low and high priority processes are descheduled each time they need to wait on a channel or a timer communication (see section 2.6.1.3). By default, occam processes execute at low priority.
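By way of illustration, a hedged sketch (not from the thesis; the channels req and resp would in practice be connected to a communications partner) of a PRI PAR construct in which a request handler takes precedence over background computation:

CHAN OF BYTE req, resp:
PRI PAR
  BYTE b:        -- textually first branch: runs at high priority
  WHILE TRUE
    SEQ
      req ? b    -- service a request as soon as it arrives
      resp ! b
  WHILE TRUE     -- second branch: time-sliced at low priority
    SKIP         -- placeholder for background computation

On a transputer, the first branch preempts the second whenever it becomes runnable, in line with the scheduling behaviour described above.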

2.6.1.2 The ALT Construct

The ALT construct implements the guarded input command of the CSP model. A guard can be either an input process or an input clause accompanied by a boolean condition (figure 2.2).

ALT enables a receiver to select for execution one of several alternative channel input processes; this allows the receiver to communicate with more than one sender process, receiving randomly ordered messages. The input process selected for execution is the one whose boolean condition, if any, is TRUE and whose corresponding sender process at the other end of the channel is first ready to communicate; if more than one ready sender exists, an arbitrary, non-deterministic choice is made. A prioritized ALT is also provided (i.e. PRI ALT), in which case the readiness of the guards is examined in their textual order.

VAL Delay IS 15625:   --Constant definition
TIMER Clock:          --Timer channel declaration
INT Now:

SEQ
  Clock ? Now                      --Read current clock value
  Clock ? AFTER (Now PLUS Delay)   --Deschedule
  ...                              --Resume operation
:

Figure 2.3: Introducing Delays with Occam Timers

2.6.1.3 Timers

One of the original target areas of occam was embedded systems applications, where support for real time control is required. This support is provided in occam by means of timers. Timers are syntactically treated as communication channels which can provide only input. The value returned is the current time, which is represented as an integer value. The duration of the clock tick depends on the priority level of the process wherein the timer is invoked. In transputer implementations of occam, for low priority processes each clock tick represents 64 microseconds; for high priority processes, the clock is incremented every 1 microsecond.

Timers can be used to control the temporal behaviour of occam programs by forcing processes to delay their execution for a pre-determined period. This is illustrated in figure 2.3. The timer input statement, when used in conjunction with the AFTER keyword, causes the timer input to be held up until the current clock reaches the value specified in the AFTER clause. During this period, the process remains descheduled. In general, scheduling delays introduced by the transputer's scheduler (see section 2.6.2) will slightly extend the period that the process remains inactive [Gall90]. Furthermore, an extra delay is imposed by the latency of the process queue of the processor, namely the time from when the process is rescheduled to the time at which it reaches the front of the queue and starts processing; the maximum latency is estimated to be (2n - 2) * timeslice, where n is the number of processes in the queue when the timer process is rescheduled [Mitc90]. Therefore, the period for which a process is caused to delay its execution can only be approximate and non-deterministic, and is generally greater than the delay specified in the AFTER clause.

Used within an ALT construct, a timer may generate time-out behaviour and thus prevent deadlock situations that might be caused by processes waiting for input on unresponsive channels. Figure 2.4 depicts a process with time-out behaviour; if channel ch1 fires before the Time.Out period has elapsed, P1 will be executed, otherwise the timer channel will fire and process P2 will be executed.

VAL Time.Out IS 15625:   --Constant definition
TIMER Clock:             --Timer channel declaration
INT Now:
BYTE any:
SEQ
  Clock ? Now            --Read current clock value
  WHILE (NOT end)
    SEQ
      ALT
        ch1 ? any                           --Poll channel ch1
          P1
        Clock ? AFTER (Now PLUS Time.Out)   --Timeout?
          P2
:

Figure 2.4: Programming Timeout Behaviour
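Since a timer is simply an input channel, elapsed time can also be measured by reading it twice. The sketch below (illustrative, not taken from the thesis) uses occam's modulo subtraction operator MINUS, so that the result remains correct even if the clock value wraps around:

TIMER Clock:
INT Start, Stop, Elapsed:
SEQ
  Clock ? Start
  SKIP                          -- the computation being timed goes here
  Clock ? Stop
  Elapsed := Stop MINUS Start   -- modulo arithmetic survives wrap-around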

[Process graph: Feed.Pipe --ch[0]--> Pipe.Stage --ch[1]--> Pipe.Stage --ch[2]--> ... --ch[N-1]--> Pipe.Stage --ch[N]--> Flush.Pipe]

PROC Pipeline(...)
  VAL INT N IS ...:       --Size of pipeline
  [N+1] CHAN OF INT ch:   --Array of channels
  INT i:
  ...
  ------ Definition of the Feed.Pipe process ------
  PROC Feed.Pipe(CHAN OF INT To.Pipe)
    INT x:
    SEQ
      x:=0
      WHILE TRUE
        SEQ
          To.Pipe ! x
          x:=x+1
  :
  ------ Definition of the Flush.Pipe process ------
  PROC Flush.Pipe(CHAN OF INT From.Pipe)
    INT x:
    SEQ
      WHILE TRUE
        SEQ
          From.Pipe ? x
  :
  ------ Definition of the Pipe.Stage process ------
  PROC Pipe.Stage(CHAN OF INT From.Previous, To.Next)
    INT x:
    SEQ
      WHILE TRUE
        SEQ
          From.Previous ? x
          ...
          To.Next ! x
  :
  ------ Body of the calling Pipeline process ------
  SEQ
    PAR
      Feed.Pipe(ch[0])
      PAR i=0 FOR N
        Pipe.Stage(ch[i], ch[i+1])
      Flush.Pipe(ch[N])
:

Figure 2.5: An Example Occam Program

2.6.1.4 Functions and Procedures

As in all modular imperative languages, occam allows the definition of functions (FUNCTION) and procedures (PROC). Procedures encapsulate occam processes; a call to a procedure instantiates the occam process defined by the body of the procedure. To illustrate the structure and philosophy of occam systems, a small example program is presented in figure 2.5. Three different processes namely Feed.Pipe, Flush.Pipe and Pipe.Stage are defined as occam procedures. A replicated PAR construct is used to create a pipeline of N processes of type Pipe.Stage. Each process in the pipeline inputs data from the preceding process and outputs data to the succeeding process after performing some processing on the data. The pipeline may thus produce an overlapped operation with each component process of the replicated PAR executing concurrently with every other component process, input and output being automatically synchronized between processes, and a stream of data passing through the pipeline. A one-dimensional array of channels ch is declared, along the elements of which the processes communicate.
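Figure 2.5 illustrates procedures only; for completeness, a sketch of the two forms an occam function definition may take (illustrative, not taken from the thesis; the function names are arbitrary):

INT FUNCTION double (VAL INT x) IS 2 * x:   -- shorthand form

INT FUNCTION square (VAL INT n)             -- long form, using a VALOF body
  VALOF
    SKIP                                    -- (any side-effect-free process)
    RESULT n * n
:

Unlike procedures, occam functions must be free of side effects, which allows them to be used safely within expressions.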

2.6.2 The Transputer

To provide the means for the efficient execution of occam-based parallel applications, Inmos accompanied the development of occam with the design and implementation of a new family of microprocessors, collectively referred to as the transputer. The most important classes of the family have been the T400 series and the T800 series; more recently, Inmos have developed a more advanced type of transputer, namely the T9000. The transputer is distinguished from conventional microprocessors in that it is designed specifically as a building block for distributed memory MIMD systems.

[Figure: block diagram of the T800, showing the 32-bit processor, the Floating Point Unit, System Services, Timers, 4 Kbytes of SRAM, the Event handler, the Memory Interface and four Link Interfaces, interconnected by a 32-bit bus.]

Figure 2.6: The Architecture of the T800 Transputer

Such a system can be constructed from a collection of transputers which operate concurrently and communicate through serial communication links. In the transputer, Inmos have implemented on a single chip the main components of a traditional von Neumann computer, while at the same time providing a high level of support for a concurrent, process-based view of computation.

The processing part of the transputer is a 32-bit processor which executes a small range of instructions and addressing modes at a rate of up to 10 MIPS (for 20 MHz implementations); the transputer is often classified as a RISC, although there is controversy over the correctness of this classification. The T800 (figure 2.6) features a microcoded 64-bit floating point unit, providing a sustainable performance of 1-2 MFLOPS. The devices of the T400 series do not have a floating point unit and give an estimated performance of 0.1 MFLOPS. The transputer has two (T400) to four (T800) Kbytes of high speed, on-chip static RAM, which is used as local memory.

Off-chip memory may be accessed via a 32-bit wide external memory interface which allows data transfer rates of more than 25 Mbytes per second. In multi-transputer systems, each transputer has sole use of its own on-chip and off-chip memory and thus does not have to compete with the other transputers for access to a common memory.

The transputer incorporates four communication links. Each link can transfer data at over 1 Mbyte/s in each direction, with automatic handshaking synchronization, providing a bidirectional, point-to-point, synchronous connection between transputers. A transputer may be linked to four other transputers; in this way, networks of transputers of various sizes and topologies may be built up. Each transputer in a network operates as an independent unit, communicating as and when necessary with the other transputers to which it is linked. All transputer components operate concurrently; each of the four links and the floating point coprocessor (on the T800) can perform useful work while the processor is executing other instructions.

The transputer is closely integrated with the occam programming language: occam reflects the concurrency found in the transputer and provides for control of the transputer hardware, while the transputer provides efficient support for the occam model of concurrency and communication. In occam, concurrency may be specified between transputers or indeed within a single transputer. To support internal concurrency, the transputer includes a process queue, with a microcoded scheduler which enables any number of concurrent processes to be executed together, sharing the processor time. The microcoded scheduler, combined with the minimal context of each active process, achieves very rapid process switching (typically less than one microsecond).

Examples of T400/T800 transputer-based machines are the ESPRIT Supernode machine [Hock88], the Meiko Computing Surface [Bott86], Parsytec's GCel and XPlorer machines [Pars], and the T-Rack [Capo86], a 64-transputer machine which has been implemented at the University of Manchester (the T-Rack is described in chapter 7).


2.6.2.1 Configuring Occam Programs

At the top level, an occam program is a network of concurrent communicating processes, defined as occam procedures. These procedures are independent, making no reference to any shared program variables or data structures; any required communication takes place by exchanging messages over channels. The top level occam processes may run time-sliced on a single transputer or be distributed over a network of transputers to achieve real parallelism. In the first case, the communication channels are realized by internal (or soft) links. These are implemented by means of a status word in memory which may be accessed by both the sending and the receiving processes; consequently, the number of internal occam channels is limited only by the size of local memory. For multi-transputer configurations, channels are placed on one of the four physical links (hard channels) of the transputer. Hardware links can support only a single occam channel in each direction; if more than one occam process needs to access the same link, software multiplexing is required.

In occam, the allocation of processes to processors and of channels to hardware links in the network is the responsibility of the programmer, who must devise and explicitly specify an appropriate mapping scheme. This allocation is static, though in principle it is possible to use occam to code mechanisms for the dynamic placement and running of processes [Waym89]. Occam includes a notation, the occam configuration language, which provides the means for specifying the adopted mapping of the occam program onto the transputer network [Inmo91a]. (Providing two different languages, one for the code that describes the functionality of the different processes and one to specify the configuration of the program, is a typical technique in distributed systems which require the explicit specification of configuration by the programmer; this approach is closely related to the ideas of programming-in-the-large and the use of a module interconnection language [DeRe76].) A textual configuration description created using this language basically describes the top level of the occam program, annotated with enough information to map the program onto a transputer network.

This information includes:

• The type of processors in the system.

• Which processes will execute on which processor; one or more processes may execute on a single transputer.

• The mapping of occam channels to transputer links for inter-transputer communication.

The considerations that need to be taken into account by the programmer for the mapping of occam programs onto transputer networks are discussed in chapter 7.

2.6.2.2 The T9000 Transputer

To alleviate the connectivity limitations and the channel mapping problems imposed by the four links of the conventional transputer, Inmos have developed a new family of transputer devices, namely the T9000 processor and the C104 transputer link switch [Inmo91]. The T9000 transputer still possesses four communication links; however, it also incorporates a Virtual Channel Processor which allows the (hardware) multiplexing of an arbitrary number of occam channels onto each of the four links [Inmo93]. The C104 link switch is a wormhole routing device which can direct a packet arriving on any of its 32 input links to one of its 32 output links, based on the packet's header. The combination of T9000 transputers with possibly multistage C104 switches enables the construction of arbitrarily complex networks without the need for software harnesses to store and forward messages [Inmo93a].

The T9000 also exhibits enhanced performance characteristics, operating at 30 MHz and yielding an instruction throughput of about 50 MIPS and an estimated floating point performance of 5 MFLOPS. Recently, some T9000-based machines have become commercially available, including the Parsys SN9400SP, SN9500 and SN9800 systems [Pars95].

2.7 Summary

This chapter has presented a short overview of parallelism as an approach to satisfying the need for high performance. Emphasis was placed on the process-based model of computation supported by the occam programming language and the associated processor, the transputer. The next chapter deals with modelling and simulation; emphasis is placed on distributed simulation, which is a particular instance of parallel processing.

Chapter 3

Modelling and Simulation

3.1 Introduction

Since the dawn of civilization, people have tried to understand the principles and behaviour of the systems in their environment, with which they have been in constant contact. An essential tool in this endeavour has been the development of models, namely simplified representations of the systems under consideration:

“For everything that exists there are three things through which knowl- edge about it must come; the knowledge itself is a fourth; and as a fifth we must posit the actual object of knowledge which is the true reality. We have then:- first, a name; second, a description; third an image; fourth knowledge of the object . . . There is, for example, something called a circle, whose name is the very word I just now uttered. In the second place there is a description of it, made up of nouns and verbs. The description of the object whose name is round and circumference and circle would be: that which has everywhere the same distance between the extremities and the middle. In the third place, there is the object which is drawn and erased and turned on the lathe and destroyed


- processes which the real circle, in relation to which these other circles exist, can in no wise suffer, being different from them. In the fourth place there are knowledge and understanding and correct opinion about them, all of which must be posited as one thing more, inasmuch as it is found not in sounds nor in the shapes of bodies but in soul, whereby it manifestly differs in nature both, from the real circle and from the aforesaid three. Of these, understanding approaches nearest to the fifth in kinship and likeness, while the others are more distant . . . Every circle drawn or turned on a lathe in practice abounds in the opposite to the fifth-for it everywhere touches the straight, while the real circle, we maintain, contains in itself neither more nor less of the opposite nature. The name, we maintain, is in no case stable; there is nothing to prevent the things now called round from being called straight, and the straight round; and those who transpose them and use them in the opposite way will find them no less stable than they are now.” Plato, Epistle, vii, 342A-343B.

A model is intended to:

• Act as a communication vehicle, making available a description of the behaviour of a system.

• Enable users to gain insight and understanding regarding the behaviour of the system and develop theories that account for that behaviour.

• Provide the means for the analysis and evaluation of the system as well as the prediction of its future behaviour.

Figure 3.1 depicts a taxonomy of the different types of models [Neel87]. Mathematical modelling employs mathematical notations and equations to produce a description of the system.

[Figure: a taxonomy tree. Models divide into Physical and Symbolic. Physical models are Static (e.g. scale models such as model engines, cars and ships; imitation models such as dolls and cartoons) or Dynamic (e.g. analogue models such as water flow models of electrical phenomena, or animals used for medical experiments; prototypes such as a functional experimental user interface of an information system). Symbolic models are Non-Mathematical (Linguistic, e.g. verbal or written informal descriptions; Graphical, e.g. pictures, graphs, layout and circuit diagrams) or Mathematical; Mathematical models are Static or Dynamic, and each may be solved analytically or numerically, with the numeric solution of dynamic models leading to Simulation.]

Figure 3.1: A Taxonomy of Models

A system is characterized as static if its state does not change but remains stable over time (i.e. the system is in equilibrium); state is the collection of variables that contain all the information necessary to describe the system at any point in time, and is thus basically a function of time [Bank86]. The state of a static system is typically expressed mathematically using algebraic equations. Systems with dynamic behaviour, i.e. systems whose state changes over time, are typically classified as either continuous (referred to as Continuous Variable Dynamic Systems, CVDS) or discrete (Discrete Event Dynamic Systems, DEDS); typically, physical systems encountered in nature are treated as continuous, while systems that are the result of human activity are treated as discrete, the latter including the vast majority of systems that modern society is concerned with [Ho89].


In CVDS, state changes occur continuously over time. Such systems are typically modelled by a set of ordinary or partial differential equations. Continuous systems are not pertinent to this thesis and are not discussed further; in his excellent classic book, Dugas [Duga35] chronicles the development of continuous dynamic systems from Aristotle to Kepler to Newton and on to quantum mechanics.

In contrast to CVDS, in discrete systems state changes (also referred to as state transitions) are assumed to take place only at a set of discrete instants in time, rather than continuously. The happening whose occurrence causes a state transition is referred to as an event. The discrete instants in time at which state transitions occur are referred to as event times; the system state between two event times is called a snapshot.

The behaviour of Discrete Event Dynamic Systems cannot easily be described by ordinary or partial differential equations [Ho89]. (In principle, continuous modelling techniques may be applied to produce continuous approximations of discrete systems, just as discrete approaches can produce discrete approximations of continuous systems [Neel87].) Several mathematical notations and techniques have been developed to allow the mathematical modelling of DEDS; these include Markov processes [Dynk82], Petri nets [Pete91], queueing theory [Bund86] and Finite State Machines [Gill62].

Once a mathematical model has been constructed, it may be solved to provide answers to questions regarding the behaviour of the modelled system. This may be achieved analytically, by employing mathematical theory and deductive reasoning. Analytical approaches provide general solutions; in practice, however, only certain forms of equations may be solved. In that case, the application of numerical methods is the alternative option for the solution of the mathematical model. Numerical methods employ computational procedures to produce solutions in steps: each step gives the solution for one set of conditions, and the calculation must be repeated to expand the range of the solution [Neel87]. This stepwise philosophy makes numerical models particularly suitable for execution on computers.

The process of executing a model on a computer system in order to derive answers to questions regarding the operation of the modelled system is referred to as computer simulation. A model adapted for simulation on a computer is known as a computer simulation model, or simply a simulation model.

3.2 Discrete Event Simulation Modelling

Mathematical modelling may provide answers regarding certain characteristics of a discrete system; however, it is often unable to capture the dynamic behaviour and other important aspects of the operation of the system in sufficient detail. Furthermore, for most real discrete event systems (e.g. stochastic problems), mathematical models have no simple and practical analytical or numerical solutions [Thes90] [Cars92] [Prit91].

In these cases, discrete event simulation (DES) is the only alternative available. In discrete event simulation, the computer program (the simulator) is not intended to solve mathematical equations; rather, it encapsulates a detailed description of the structure and operational rules of the system under consideration and, when executing, imitates the system, mimicking its operation.

Typically, a discrete event simulation model designed to run on a conventional von Neumann computer utilizes three main structures:

• A global clock variable, which keeps track of the progress of the simulation in terms of logical or simulated time.

• The state variables, which define the state of the system at any particular point in simulated time.

• The event list, which contains all events which have been scheduled but have not yet occurred (i.e. have not yet taken effect); a simulation event may be viewed as modelling an event in the physical system which causes a state transition to take place.


Each scheduled event is assigned a timestamp, which denotes the point in simulated time at which the event occurs. The simulator is driven by a main loop (referred to as the simulation engine), which repeatedly selects from the event list the event with the smallest timestamp, processes it and removes it from the event list. The processing of an event involves the execution of a piece of code which describes the steps required for the appropriate state transition to occur, and the generation and scheduling of a number of new events into the simulated future [Bank86] (a code sketch of such an engine is given after the following list).

Discrete event simulation has several advantages and benefits over mathematical modelling or prototyping [Thes90] [Prit91] [Shan92] [Cars92]:

• It makes possible the testing and evaluation of systems in cases in which the system does not exist, mathematical modelling is impossible and prototypes are expensive, time consuming or hazardous to build; it is generally easier, faster and cheaper to implement a simulator than to build a prototype.

• It provides a higher degree of flexibility than prototypes, as it can easily be modified. Thus it makes possible efficient experimentation with new situations, providing answers to “what if” questions which would otherwise be too costly and time consuming to contemplate. Consequently, it can dramatically reduce both the cost and the duration of a system's development phase.

• It allows the representation of systems at any level of detail sufficient to meet the objectives of the designer, thus providing support for hierarchical design approaches.

• It provides for the study of the dynamic behaviour of systems by allowing the manipulation of time. Time can be compressed or expanded, thus providing a rapid view of long time horizons in the past or future of the system under consideration.

• The process of building a discrete event simulation model calls for the provision of a detailed description of the system, an activity which in itself may prove invaluable in enhancing the designer's understanding of how the system operates.
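The simulation engine described before the list above is compact enough to express directly in code. The following Python fragment is an illustrative sketch only: it does not appear in the thesis, and all names in it are invented. A priority queue holds the event list ordered by timestamp, and the main loop repeatedly removes and processes the earliest event:

import heapq

class Simulator:
    """Minimal sequential discrete event simulation engine (sketch)."""

    def __init__(self):
        self.clock = 0.0        # global clock variable (simulated time)
        self.state = {}         # state variables of the modelled system
        self.event_list = []    # scheduled events, ordered by timestamp
        self._seq = 0           # tie-breaker for equal timestamps

    def schedule(self, timestamp, handler):
        """Schedule a new event into the simulated future."""
        assert timestamp >= self.clock, "cannot schedule into the simulated past"
        heapq.heappush(self.event_list, (timestamp, self._seq, handler))
        self._seq += 1

    def run(self, until):
        """The simulation engine: repeatedly remove the event with the
        smallest timestamp and process it; processing an event performs
        a state transition and may schedule further events."""
        while self.event_list and self.event_list[0][0] <= until:
            timestamp, _, handler = heapq.heappop(self.event_list)
            self.clock = timestamp      # simulated time jumps event to event
            handler(self)

# Usage: a trivial arrival process scheduling its own successor.
def arrival(sim):
    sim.state["arrivals"] = sim.state.get("arrivals", 0) + 1
    sim.schedule(sim.clock + 5.0, arrival)

sim = Simulator()
sim.schedule(0.0, arrival)
sim.run(until=20.0)
print(sim.clock, sim.state["arrivals"])   # -> 20.0 5

The counter merely gives events with equal timestamps a deterministic order; the handler plays the role of the piece of code describing the state transition, exactly as in the description above.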

Discrete event simulation modelling is an essential tool in the design, development and analysis of discrete event systems and is currently undergoing a dramatic increase in its range of application and an explosion of innovation [Ruta91] [Radi92]; innovation includes the exploitation of advanced software engineering [Fish92] and object-oriented approaches [Zeig90] [Rumb91], the use of Artificial Intelligence techniques [Roth89], the exploitation of different conceptual modelling frameworks [Zeig76] [Kreu86] [Derr89], and the development of specialized simulation languages and interactive support environments [SCS85] [Bart89] [Schr92] [Balm90] [Gord90] [Bank92].

3.3 The Need for Parallel Discrete Event Simulation

Parallel processing technology and the availability of extremely cost-effective multiprocessor machines have had a significant and dramatic impact on the area of computer simulation.

The application of simulation to ever more complex discrete event problems has long placed it in the world of highly computation-intensive applications. Applications in which the computational requirements of simulations far exceed the capabilities of conventional sequential von Neumann computer systems include health care systems [Lomo88], training [Stei91], military systems [Wiel89] [Mors90], environment systems [Ebli89] [Pres90] [Bagr90] [Nico90], flexible manufacturing systems [Nevi90], automobile traffic modelling [Merr90], aerodynamic simulation [Norr87], telecommunication networks [Muft90] [Robe92] [Eick91] [Turn92], queueing networks [Reed88a] [Fuji89] [Fuji89a] [Chan89] [Ayan89] [Nico93] and Petri nets [Kuma90] [Thom91] [Nico91] [Bacc93] [Fers93].

The execution of these computationally intensive simulations requires either modelling at higher levels of abstraction in order to reduce the computational load (for the simulation of communications systems, for instance, processing of an average of 5 million events per minute is typically required [Flec92]; higher abstraction levels aim to reduce the number of events that need to be processed) or more powerful machines (a third alternative is to apply statistical methods in order to reduce the number of runs required to make decisions regarding certain characteristics of the simulated system [Ripl87]; these techniques, however, are still at an early stage of development [Righ89]). The former is not considered a satisfactory approach, as it does not allow the required detail to be incorporated in the model and may thus result in over-simplification of the investigated problem [Unge89]. Therefore, in the last fifteen years, the attention of modellers and simulationists has been directed to parallel, multiprocessor machines, in which they see the potential solution to the demand for increased computational power.

3.3.1 Exploiting Parallelism

Three main approaches for exploiting parallelism at different levels in simulation problems have been developed and followed (Righter [Righ89] mentions one more approach, namely applying a parallelizing compiler to a sequential simulation program; although this technique is transparent to the user, it exploits only a small portion of the available parallelism and is thus very rarely used):

• Application Level. This approach assigns independent replications of the same sequential simulation model (with possibly different input parameters) onto different processors [Bile85] [Heid86]; Fujimoto uses the name replicated trials for this approach [Fuji90].


This approach can allow the exploration of large search spaces without the cost that parallelization of the code would involve (see subsequent sections). As no coordination is required between processors during the execution, unlimited scalability is possible. However, distribution of the entire simulation may not always be possible, due to memory limitations in the individual processors. Furthermore, this approach is not suitable for design environments wherein experiments must be sequenced, as the results of one experiment are used to determine the experiment that should be performed next.

• Subroutine Level. Following this approach, different dedicated functional units (simulation engine subtasks or subroutines) are used to implement specific functions of the sequential simulation (e.g. random number generation, event list manipulation, state update, statistics collection etc.); these units are then distributed to different processors. Examples of subroutine level parallel simulation may be found in [Wyat83] [Wyat84] [Comf84] [Kris85] [Wyat85] [Rees85] [Shep88] [Comf88] [Davi88]. The advantage of this approach is that it is transparent to the user. However, due to the small number of subtasks in the simulation engine, only a limited amount of speedup may be achieved with a subroutine level distribution.

• Event Level. Neither of the two distribution levels above makes any attempt to exploit the parallelism of the physical system being modelled. In a system with inherent parallelism (and this includes the majority of real life systems), several state changes occur concurrently; the objective of event level distribution is the concurrent execution of the corresponding events. Two main techniques for the distribution of events may be distinguished based on the manipulation of the event list:


1. Centralized. In this technique, the event list is a centralized data structure maintained by a master processor. The simulation model is decomposed into functional modules which are then assigned to a set of slave processors. The master process selects from the event list the events that can be processed concurrently and distributes them for execution to the slave processors; new events scheduled by this execution are inserted back into the centralized event list [Jone86]. For the efficient implementation of this technique, shared memory multiprocessors are particularly suitable, with the event list being implemented as a shared data structure accessed by all processors [Jone89]; a code sketch of this scheme is given after this list.

2. Decentralized. This technique involves the distribution of the event list onto the multiple processors, in order to allow the concurrent execution of events at different points in simulated time. Decentralized, event level distribution is the simulation approach with the greatest potential for high performance and, consequently, has attracted considerable attention from the research community and has been employed almost exclusively for practical simulation applications. However, the distribution of the event list introduces synchronization problems which have been addressed by a number of researchers; these problems, as well as a brief description of the most influential of the techniques that have been developed to address them, are the subjects of the subsequent sections.
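As an illustration of the centralized technique (point 1 above), the following Python sketch, which is not from the thesis, uses a thread pool to stand in for the slave processors and, conservatively, takes "the events that can be processed concurrently" to be those sharing the minimum timestamp:

import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor

def centralized_simulation(initial_events, workers=4):
    # The master keeps the centralized event list; handlers are functions
    # taking the current simulated time and returning a list of new
    # (timestamp, handler) pairs. All names here are hypothetical.
    seq = itertools.count()                  # tie-breaker for equal timestamps
    event_list = []
    for t, handler in initial_events:
        heapq.heappush(event_list, (t, next(seq), handler))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while event_list:
            t = event_list[0][0]
            batch = []                                      # every event at the
            while event_list and event_list[0][0] == t:     # minimum time: these
                batch.append(heapq.heappop(event_list)[2])  # may run concurrently
            for new in pool.map(lambda h: h(t), batch):     # distribute to slaves
                for nt, nh in new:                          # reinsert new events
                    heapq.heappush(event_list, (nt, next(seq), nh))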

3.4 The Logical Process Paradigm

Typically, a physical system being modelled is viewed as consisting of a number of independent, concurrent, interacting, functional entities, which are referred to as physical processes (PP). Decentralized parallel simulation strategies seek to divide the simulation model into a network of concurrent logical processes (LP), topologically identical to the physical system, with each logical process corresponding to a functional entity of the physical system (i.e. a physical process); this approach is referred to as the Logical Process Paradigm.

Interactions between physical processes are modelled by timestamped messages exchanged between the logical processes; the timestamp of each message denotes the point in time when the corresponding event occurs in the physical process being modelled by the receiving logical process. A logical process will repeatedly consume and process messages arriving on its input links, possibly generating, as a result, a number of messages on its output links; a new input message will be accepted only after the processing of the preceding message has completed. For the calculation of the timestamp of an output message, both the timestamp of its parent input message and the simulated time required for its processing are taken into account.

Since each message corresponds to an event, each logical process may be viewed as possessing and processing a portion of the event list of the simulation model. The distribution of the logical processes onto the different processors of a multiprocessor machine enables the concurrent execution of events, thus providing for the exploitation of the parallelism in the physical system.
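The behaviour of a logical process, as just described, can be sketched as follows; this is illustrative Python rather than anything from the thesis, with each link represented by a FIFO queue and all names invented:

import queue
import threading

class Message:
    def __init__(self, timestamp, payload=None):
        self.timestamp = timestamp   # simulated time of the modelled event
        self.payload = payload

class LogicalProcess(threading.Thread):
    def __init__(self, in_link, out_links, delay):
        super().__init__(daemon=True)
        self.in_link = in_link       # input link carrying timestamped messages
        self.out_links = out_links   # links to successor logical processes
        self.delay = delay           # simulated time needed to process a message
        self.local_clock = 0.0

    def run(self):
        while True:
            msg = self.in_link.get()     # consume the next input message
            # The output timestamp accounts for the timestamp of the parent
            # input message plus the simulated processing time.
            self.local_clock = msg.timestamp + self.delay
            for link in self.out_links:  # generate the resulting output messages
                link.put(Message(self.local_clock, msg.payload))

# Wiring example: a, b = queue.Queue(), queue.Queue()
# LogicalProcess(a, [b], delay=2.0).start(); a.put(Message(0.0))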

3.4.1 Timing Issues

The Logical Process Paradigm allows the spatial characteristics of the physical system to map naturally onto the simulation model. However, this does not hold for the temporal characteristics of the system, whose mapping onto the simulation model is not straightforward.

All physical systems obey the causality principle, which defines the relationships between the various system states. More specifically, the causality principle requires that the cause must always precede the effect in time: state transitions which have some effect on other transitions must occur before the latter, while state transitions that do not affect each other may take place in any order. Thus, the causality principle imposes a partial ordering on the system's state transitions.

This ordering of state transitions in the physical system also imposes an equivalent partial ordering on the corresponding events in the simulation model. In order to ensure that the simulation model faithfully and accurately reproduces the behaviour of the simulated physical system, the order in which logical processes receive, process and generate events must be the same as the order of the corresponding state transitions in the physical system.

In the physical system, causality is tracked using physical time (i.e. a real time clock). In a sequential simulation model, wherein physical time is modelled by a single global variable, causality is preserved through adherence to the rule that events are processed in non-decreasing timestamp order [Fuji88]; in a distributed setting, however, in which concurrent processing of events is allowed, preservation of this fundamental monotonicity property associated with causality is not straightforward. The concept of causality between events is one of the fundamental problems in distributed computation and has been addressed by a number of researchers including Lamport [Lamp78], Schneider [Schn82], Reed [Reed83], Gusella et al. [Guse84], Chandy et al. [Chan85], Cristian [Chri89], Dunigan [Duni91], Fidge [Fidg88] [Fidg91], Basten et al. [Bast94], Rabin [Rabi94] [Rabi94a] and Mizuno et al. [Mizu95].

[Causality has in fact been a fundamental issue in both Greek/western (e.g. [Aris] [Aris-1]) and eastern (e.g. [Kalu75]) thought; in his “Laws”, dealing with causality, Plato says: “When we find one thing changing another, and this in turn another and so on, of these things shall we ever find one that is the prime cause of change? How will a thing that is moved by another ever be itself the first of the things that cause change? It is impossible. But when a thing that has moved itself and that other a third and the motion thus spreads progressively through thousands upon thousands of things, will the primary source of all their motions be anything else than the movement of that which has moved itself?...It has been proved most sufficiently that soul is of all things the oldest since it is the first principle of motion.” [Plat].]

With regard to distributed simulation, two main approaches have been developed to address the causality problem, usually referred to as time driven and event driven respectively; a further classification applies to these two main approaches, characterizing a technique as either synchronous or asynchronous (synchronous and asynchronous are sometimes referred to as tight and loose respectively [Peac79]). Before the time driven and event driven approaches are discussed, a short description of the synchronous and asynchronous implementations is provided.

3.5 Synchronous versus Asynchronous Simulation

The synchronous framework requires that processes advance together in a lock-step fashion. This is achieved by means of a global clock whose role is to synchronize the logical processes of the simulation model. The global clock may be implemented either using a centralized approach, with a single dedicated process acting as a synchronizer [Venk86] [Chri82], or in a distributed, decentralized fashion, where a more complex structure of dedicated processes is responsible for synchronizing the logical processes of the model [Peac79] [Baik85] [Conc89] [Zhan89].

The asynchronous approach allows the processes of the simulation model to operate asynchronously, advancing at completely different rates. Each process maintains a local clock variable which contains the current value of the simulated time. This value represents the process's local view of the global simulated time and denotes how far in simulated time the corresponding process has progressed.

[In distributed simulations, the domain of the local clock variables is the set of non-negative integers (i.e. scalar); other schemes have also been proposed for asynchronous distributed systems, whereby timestamps have the form of vectors (e.g. [Fidg88] [Fidg91] [Schm88] [Matt88]) or matrices (e.g. [Sari87]) [Rayn95].]

One emerging theme in parallel simulation research is the study of techniques which combine aspects of both synchronous and asynchronous operation; these are usually referred to as time window techniques. These approaches involve barrier synchronizations to constrain asynchronous operation to be within some window of global simulated time; once the local clocks of all processes reach the end of the window, a global synchronization scheme is applied to allow the processes to move to the next window. Usually, the window size is application dependent.
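A time window scheme might be sketched as follows; this is illustrative Python, and both run_until and the fixed window size are assumptions rather than features of any particular protocol:

def time_window_simulation(processes, window, end_time):
    # Processes run asynchronously, but only up to the end of the current
    # window of global simulated time; a barrier then admits them all to
    # the next window. run_until is a hypothetical per-process method.
    window_start = 0.0
    while window_start < end_time:
        window_end = window_start + window
        for p in processes:          # conceptually concurrent: each process
            p.run_until(window_end)  # advances asynchronously inside the window
        # A global barrier synchronization would stand here before all
        # processes are admitted to the next window.
        window_start = window_end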

3.6 Time Driven Logical Process Simulation

In time driven simulation, the clock is an autonomous entity, capable of incrementing of its own accord, and is the driving force of the simulation. The simulated time advances in time increments of fixed, constant size (ticks or time steps). Each logical process must process all events in a particular time step before it is allowed to proceed with the events of the next time step.

In synchronous time driven simulation [Gilm86], all activity in the current time step must cease before the processes of the model are allowed to advance to the next time step. The asynchronous approach permits processes to begin executing events in the next time step as soon as their source processes (i.e. their predecessors) have finished the last time step [DeBe88]; the local clocks are synchronized by sending extra messages from each process to its successors.

Time driven simulation guarantees that the causality principle is not violated. However, its potential for speedup is limited to situations where the number of events to be processed by each and every process per time step is high; otherwise starvation phenomena will occur, as processes remain idle waiting for the clock to advance to the next time step. This approach is more appropriate for shared memory machines, as global synchronization is at present neither easily implementable nor efficient on asynchronous distributed memory machines [Fuji88].
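The synchronous variant can be sketched as follows in Python (an illustration only; execute_events_at is a hypothetical per-process method):

def synchronous_time_driven(processes, ticks, step=1):
    # All activity in the current time step ceases before any process
    # advances to the next step.
    clock = 0
    for _ in range(ticks):
        for p in processes:             # in a parallel implementation this
            p.execute_events_at(clock)  # inner loop runs concurrently and is
        clock += step                   # followed by a barrier synchronization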

3.7 Event Driven Logical Process Simulation

In the event driven approach, the driving force of the simulation, which triggers actions, is the availability of events to be processed. The simulated clock is a slave object which is incremented each time an event is executed, to accommodate the duration of that execution. Simulation may be either synchronous or asynchronous; event driven time window techniques have also been developed, including Moving Time Window (MTW) [Soko88] [Soko89] [Soko91], Bounded Lag [Luba88] and Conservative Time Windows (CTW) [Ayan92].

Like their time driven counterparts, synchronous event driven approaches keep the processes of the model synchronized using a global synchronization scheme, but with each update the global clock is set to the minimum time of the next event over all processes, rather than to the next clock tick [Peac79] [Baik85] [Conc89] [Zhan89].

In asynchronous event driven simulation, all computation in a logical process is initiated by the presence of messages on the process's input links. Upon receipt of an input message, the process is activated to act upon the message and, as a result, updates its local clock, which is set to the minimum next event time for that process. Processes are allowed to consume and execute messages as soon as they become available, without having to wait for a global clock to tick through periods of inactivity or for other slower, but unrelated, processes to advance. Events may be simulated simultaneously, even if they occur at completely different simulated times. Consequently, asynchronous event driven simulation has the greater potential for high performance. However, the concurrent simulation of events at different simulated times makes the model susceptible to violations of the causality principle.

It has been shown (e.g. by Lamport [Lamp78] and Misra [Misr86]) that a distributed system consisting of logical processes which operate asynchronously and interact exclusively via timestamped messages will adhere to the partial ordering imposed by the causality constraints in the physical system if each logical process consumes and processes events in non-decreasing timestamp order; this condition is referred to as the local causality constraint [Fuji88]. Thus, the problem of guaranteeing that the model implements exactly the same global causal precedence relationships as the physical system reduces to ensuring that each logical process obeys the local causality constraint and processes messages in non-decreasing timestamp order.

If each process in the distributed model has a single input link (i.e. it receives messages from just one source process), then adherence to the local causality constraint is straightforward. Indeed, in single input processes there is a direct correlation between order of arrival and order of consumption: assuming that the timestamps on the input link are ordered in time, the output timestamps are also guaranteed to be ordered. Thus, on any particular output (and therefore input) link, messages will be issued in non-decreasing timestamp order.

However, most, if not all, practical simulation models will include multiple input processes which receive messages from more than one source process; such processes are usually referred to as merge processes. In this case, the order in which messages arrive on the different input links does not adhere strictly to the order in which the corresponding events occur in the physical system; it
merely depends on the relative real-time propagation delays of the messages in the distributed machine, and not on the simulated time timestamps that these messages carry. Therefore, in the general case, messages will not arrive at the merge processes of the model in non-decreasing timestamp order. Consequently, immediate consumption and processing of messages by merge processes may result in violation of the local causality constraint; the processing of an out-of-order message (i.e. a message which, according to its timestamp, should have been processed in the simulated past) is referred to as a preemption.

Several techniques have been developed to address the preemption problem in asynchronous event driven simulations and to ensure that the local causality constraint is not violated. These techniques employ different synchronization protocols to enable processes to decide when it is safe to consume and process events. Asynchronous protocols are traditionally classified into two broad categories, namely conservative and optimistic (Reynolds has introduced an alternative, more detailed taxonomy [Reyn88] [Reyn89]; within the scope of this thesis, however, the conventional taxonomy that distinguishes between conservative and optimistic techniques is sufficient); hybrid approaches with both conservative and optimistic behaviour also exist [Luba89a] [Dick90]. Detailed surveys of the various asynchronous event driven simulation techniques developed to date may be found in [Righ89] [Fuji90] [Fuji92] [Fuji93] [Fers94] and [Nico94]; the following sections provide a short presentation of the most influential of them.

3.7.1 Conservative Techniques

Conservative techniques were originally proposed by Chandy and Misra [Chan79a] and Bryant [Brya77], and are often referred to as the Chandy-Misra-Bryant (CMB) protocols. These techniques allow a logical process to accept and process an event only if it is absolutely safe to do so, thus strictly avoiding the possibility of a preemption ever occurring.

In order to determine when it is safe to process an event, it is required that messages from any process to any other process be transmitted in chronological order according to their timestamps. This ensures that the timestamp of the last message issued on an input link provides a lower bound on the timestamp of any subsequent message that will later arrive on the same link; this bound (i.e. the timestamp of the current message pending processing on an input link or, if no such message exists, the timestamp of the last message received on that link) is referred to as the link clock.

A merge process repeatedly selects the input link with the smallest clock and, if there is a message pending, it is consumed and processed; otherwise the process is forced to block until a message is issued on the link, as the timestamp of that message might be less than the timestamps of the messages currently waiting on the other incoming links. This algorithm guarantees that merge processes will process messages in non-decreasing timestamp order, thus ensuring adherence to the local causality constraint. However, deadlocks may occur if a message expected by a blocked process is never issued; this situation may arise if a cycle of processes is formed, with each process blocked waiting for a message from another process in the cycle [Peac79] [Righ89] [Fuji90].
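The link clock selection rule just described might be sketched as follows; the Python below is illustrative, and the link interface it assumes is hypothetical:

def next_message(input_links):
    # Conservative merge rule (sketch). Each link is assumed to offer:
    #   link.clock()   -- the link clock: timestamp of the message pending
    #                     on the link or, if none, of the last one received
    #   link.receive() -- blocking receive of the next message
    link = min(input_links, key=lambda l: l.clock())
    # Safe: any message arriving later on `link` must carry a timestamp of
    # at least link.clock(), so no smaller timestamp can appear on another
    # link first; blocking here is precisely what makes deadlock possible.
    return link.receive()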

3.7.1.1 Deadlock Avoidance

In order to address the deadlock problem, Chandy and Misra [Chan79] [Chan79b] have proposed a deadlock avoidance mechanism, whereby Null messages are used continuously to advance the output link clocks, even if no events are issued on the link; a Null message informs the corresponding receiving process of the minimum potential timestamp of the next event to appear on the link, thus preventing it from blocking indefinitely upon that link. This scheme is based on the assumption that each process has the ability to predict its actions in the immediate simulated future, so that it may calculate a lower bound on the timestamp of the next outgoing message on each of its output links. Each time a logical process finishes processing an event, it issues to its output links a Null message with that bound; based on this information, the receiving process calculates a new bound for its own output links, and so on. The larger the output timestamp bounds that may be predicted by a process, the smaller the number of Null messages generated and the less the time that a receiving process blocks upon its input links.

The ability of a process to predict its simulated future is referred to as lookahead. The most common type of lookahead is the Minimum Timestamp Increment, which is the minimum simulated processing time for each message passing through the process [Fuji90]; another form of lookahead is the distance between two causally related but not necessarily directly connected processes, namely a lower bound on the amount of simulated time that must elapse for an unprocessed event to propagate from one process to the other [Ayan89]. Lookahead is an important aspect of conservative simulation, essential for both the correctness and the performance of the protocols, and several mechanisms have been developed which attempt to exploit the characteristics of the simulated system to improve lookahead [Nico88] [Lin89] [Wagn89] [Louc90] [Pete93]. Generally, the lookahead information required for the calculation of the timestamps of Null messages is application dependent and is explicitly provided to the processes in advance by the simulation programmer, although attempts at its automatic calculation have been made [Cota90].

Null messages are used only for synchronization and impose a significant communication overhead which may dramatically affect the performance of the simulation [Fuji90]. To address this problem, a number of optimizations have been proposed which aim to reduce the number of Null messages generated [Holm78] [Nico84] [Misr86] [Bain88] [Su89] [DeVr90].

The carrier Null message protocol [Cai90] [Wood94] attempts to reduce the number of Null messages by using them to acquire and propagate additional knowledge to the processes in order to enhance their lookahead. Another approach is to eliminate the need for Null messages altogether by exploiting the structural characteristics of the application in order to eliminate the cycles in the model which may cause deadlocks [Lin90].
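A minimal sketch of Null message generation, assuming the Minimum Timestamp Increment form of lookahead, might look as follows (illustrative Python; the lookahead value and the link operations are assumptions):

def after_event(lp, event_timestamp):
    # After processing an event, advance every output link clock by sending
    # a Null message carrying a lower bound on the timestamp of any future
    # real message; lp.lookahead is supplied in advance by the simulation
    # programmer. All names here are hypothetical.
    bound = event_timestamp + lp.lookahead
    for link in lp.out_links:
        if bound > link.clock_sent:     # advance the output link clock even
            link.send_null(bound)       # though no real event is being sent
            link.clock_sent = bound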

3.7.1.2 Deadlock Detection and Recovery

Another approach to deal with the deadlock problem is the deadlock detection and recovery scheme proposed by Chandy and Misra [Chan81], whereby the simulation proceeds until it deadlocks (the parallel phase), and when deadlock is detected, it is resolved (the synchronization or interface phase) to enable the next parallel phase to commence. Distributed deadlock detection algorithms have been proposed by Dijkstra and Scholten [Dijk80], Misra [Misr86] and Groselj and Tropper [Gros89]. Various other deadlock detection algorithms have been proposed which address particular situations: Kumar and Harous [Kuma91] propose deadlock detection algorithms for simulation of queueing networks, Reed and Malony [Reed88b] and Fujimoto [Fuji89] have investigated the deadlock detection problem in shared memory machines, Misra [Misr86] has proposed alternative algorithms for detecting deadlocks when only a portion of the process network has deadlock, Groselj and Tropper [Gros88] propose a scheme to be used for deadlock detection within one processor, and Liu and Tropper [Liu90] propose algorithms that target specific types of cycles of blocked processes.

3.7.1.3 Characteristics of Conservative Protocols

The major advantage of the conservative techniques is that they are simple and straightforward to implement. However, the requirement for strict adherence to the local causality constraint does not allow full exploitation of the inherent parallelism of the model. Conservative techniques require static configuration of the distributed model [Fuji90]; systems with dynamic behaviour generally cannot easily be modelled. Furthermore, conservative protocols rely heavily on lookahead, and are thus suitable only for applications with good lookahead properties. The knowledge required for the exploitation of lookahead must be explicitly provided by the simulation programmer. This implies that, in order to design a correct and fast synchronization protocol, the programmer must be familiar with the details of both the simulated system and the synchronization protocol; this requirement is considered the most serious drawback of conservative techniques, although research into adding transparency to conservative algorithms is continuing [Jha93].

With regard to the performance of conservative techniques, reported speedups include 16 on 25 processors of a Sequent Balance and 1900 on a 16384-processor Connection Machine [Luba89] [Luba89b], 7 on 12 and 9 on 24 iPSC processors [Chan89], 18 on 31 transputers [Merr90], 8 on 64 and 10 to 20 on 128 iPSC processors [Su89], and 16 on a 16-processor BBN Butterfly [Fuji89].

3.8 Optimistic Synchronization Protocols

The objective of optimistic approaches is to detect and recover from causality errors, rather than strictly to avoid them. The most important and influential optimistic mechanism is Time Warp (or Virtual Time), proposed by Jefferson and Sowizral [Jeff82] [Jeff85] [Jeff85a]; Jefferson proposed Virtual Time as a paradigm and Time Warp as a mechanism to implement it, but the two terms are used interchangeably.


3.8.1 Time Warp

In Time Warp, processes consume and process events as they arrive, without first deciding whether such an action is safe or not. When a preemption is detected (i.e. when an event arrives whose timestamp indicates that it should have been processed in the past; such an event is called a straggler in Time Warp), the process “rolls back” in simulated time and undoes all its actions up to the point indicated by the timestamp of the straggler. It also sends anti-messages to its successor processes to inform them of the preemption and to cancel the effects of previous, erroneous messages issued by the process; upon receipt of an anti-message, the successor processes follow the same procedure, undoing their own actions and generating more anti-messages.

Anti-messages may be sent immediately upon detection of a preemption (aggressive cancellation) or at a later time, after the process has established that the re-execution of its actions generates different output messages (lazy cancellation) or different process states (lazy re-evaluation). Lazy cancellation avoids unnecessary rollbacks and thus improves performance; however, if rollbacks are necessary, aggressive cancellation will enable them to occur sooner, thus preventing erroneous computation from spreading further into the model. It has been shown that in most cases lazy cancellation tends to perform better than its aggressive counterpart [Gafn85] [Gafn88] [Lomo88] [Righ89].
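The receive path of a Time Warp process with aggressive cancellation might be sketched as follows; the Python below is purely illustrative, and the state saving and output recording machinery it assumes is hypothetical:

def on_message(lp, msg):
    # A message older than the local clock is a straggler: roll back to the
    # latest saved state not later than its timestamp, cancel previously
    # sent messages with anti-messages, and re-execute from there.
    if msg.timestamp < lp.local_clock:          # causality violation detected
        lp.restore_state(msg.timestamp)         # undo actions past the straggler
        for rec in lp.sent_after(msg.timestamp):
            rec.link.send(rec.antimessage())    # cancel erroneous output messages
    lp.enqueue(msg)                             # (re)process in timestamp order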

3.8.1.1 Global Virtual Time

In order to have the ability to cancel past actions, each process maintains a record of its past history, which is periodically updated; typically, this record includes past state vectors, processed input events and previously sent output messages. For the rollback of the simulation to be feasible, each process must hold information regarding its history up to the last “correct time”, which generally is the smallest local clock value amongst all independent processes. This value is referred to as the Global Virtual Time (GVT); algorithms for the computation of the GVT have been developed [Sama85] [Bell90] [Lin90a] [Prei89].

3.8.1.2 State Saving and Memory Management

The periodic need to save and maintain information regarding the past history of the logical processes has a significant impact on both the performance and the memory requirements of the simulation [Fuji90] [Akyi93] [Prei92] [Bell92] [Clea94]. State saving may require the traversal of complex dynamic data structures, a function which involves a significant amount of computation and can degrade the performance of the simulation, even if the state vector is of relatively moderate size. To address this issue, the use of hardware support for speeding up state saving has been proposed [Fuji89b] [Fuji88a]. This approach, however, does not provide a solution to the demand for large amounts of storage to maintain process history records. Infrequent state saving can reduce both the time spent saving and the amount of information that needs to be stored, but increases the cost of rollbacks, as processes have to roll back further into the simulated past, cancelling and re-executing more events [Lave83] [Mitr84] [Lin90b].

Another approach to reducing the memory requirements is the frequent computation of GVT; each time the GVT is computed, storage used for history before the new GVT may be reclaimed (this is referred to as fossil collection [Jeff85a]). However, the computation of GVT involves global synchronization and, consequently, frequent updates of GVT can degrade performance.

The aforementioned techniques attempt to manage memory so that the simulation does not run out of space; a number of techniques have also been developed to address the problem of freeing memory when this situation does indeed occur. These techniques include message sendback [Jeff85a] [Gafn88], cancelback [Jeff90] and artificial rollback [Lin92]; the basic idea behind these mechanisms is to roll back overly optimistic computations and reclaim the memory they use.
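The relationship between GVT and fossil collection can be sketched as follows (illustrative Python; note that a full GVT computation must account not only for the local clocks but also for messages still in transit):

def gvt_and_fossil_collect(processes, in_transit):
    # GVT is a lower bound on how far the simulation might ever roll back:
    # the minimum over all local clocks and over the timestamps of all
    # messages still in transit. Method names here are hypothetical.
    gvt = min(min(p.local_clock for p in processes),
              min((m.timestamp for m in in_transit), default=float("inf")))
    for p in processes:
        p.discard_history_before(gvt)   # fossil collection: saved states and
                                        # message records older than GVT can
    return gvt                          # never be needed again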

3.8.1.3 Characteristics of Optimistic Protocols

Optimistic synchronization approaches can accommodate the dynamic creation of logical processes [Tink89], and are typically more transparent to the simulation programmer. However, they are more complex to implement, owing to the requirement for effective memory management, and are more difficult to debug than their conservative counterparts [Fuji90]. Reported speedups achieved by optimistic techniques include 10 to 20 on a 32-processor hypercube machine, 27 using 100 processors of a BBN Butterfly [Fuji90], and 57 on a 64-node hypercube [Fuji89a].

3.9 Modelling and Simulation in Computer Architecture Research

Technological and architectural advances have dramatically increased the size and complexity of computer system designs. The need to cope with this complexity, together with the requirement for shorter development times and reduced cost, has assigned key roles to modelling and simulation in computer architecture research. Modelling and simulation are essential tools for experimenting with alternative ways of using the available silicon area, for verifying the timing behaviour and functional correctness of designs, and for measuring the performance of alternative architectural designs. A digital system may be modelled at different levels of abstraction (figure 3.2) [Hart87] [Arms89]:

[Figure: the abstraction levels of a digital system, from highest to lowest: Chip Level, Register Transfer Level, Gate Level, Switch Level, Circuit Level, Silicon Level.]

Figure 3.2: Abstraction Levels in Digital Systems

• Silicon Level. At this level the real geometry of the physical layout of materials such as diffusion, polysilicon and metals on the silicon surface is described. Typically the layout is specified by means of a schematic, graphical representation, though textual layout formats are also used (e.g. the “Caltech Intermediate Form” [Mead80]). The layout is a symbolic, non-mathematical model, a purely morphological description without any behavioural information incorporated, and therefore cannot be used directly for simulation.

• Circuit Level. At this level the system is expressed in terms of traditional passive and active electrical circuit components, such as resistors, capacitors and bipolar or MOS/CMOS transistors. The circuit may be specified as a schematic diagram or via a textual description; the textual version of a circuit diagram is called a circuit level net list. The circuit net list may be specified explicitly or be extracted from the physical layout or circuit schematics.

At circuit level, the input and output of each component in the model have analogue values. The analogue behaviour of the components at this level is typically expressed in terms of differential equations, which may be solved by a circuit simulator (e.g. SPICE [Nage73]); circuit simulators typically take the circuit net list as input.

• Switch Level. At this level, the system is described at the same level of abstraction as at the circuit level, but only digital, rather than analogue, signal values are considered (i.e. {1,0}, {on, off}, {high, low}). The same circuit diagram and circuit net list are used, but the components are modelled in a much simpler way: instead of a precise analogue behaviour, only the digital behaviour is described. Transistors are modelled as switches with two states, namely “on” (low impedance) and “off” (high impedance). A model at switch level, and at all levels above it, may be simulated using a discrete event simulator.

• Gate Level. This is the logic design level, where the implementation of the system in terms of gates (AND, OR, inverters etc) and flipflops is described. This description may be provided graphically, by means of a logic diagram, or textually, using a Hardware Description Language (HDL); a gate level net list may be automatically extracted from the schematic or textual specification. The net list may be provided as input to a discrete event logic simulator. Gate level has traditionally been the main design level for digital systems.

• Register Transfer Level (RTL). Here, the model of the system is expressed in terms of higher level components such as registers, counters, multiplexers, ALUs, multipliers, shifters, memory blocks etc.; these components are sometimes referred to as functional blocks, assigning the term functional level modelling to RTL descriptions.

Although, in principle, RTL components may be expressed in terms of an interconnection of lower level primitives (such as gates), this is not the design approach adopted at this level; rather, RTL components are primitive behavioural models, directly expressed using a functional HDL. RTL models may be simulated by a functional discrete event simulator.

• Chip (or Architectural) Level. At this level, the structural primitives of the model are blocks such as processors, memories, serial and parallel ports, interrupt controllers etc. Typically, the model boundaries are defined by the chip boundaries; however, this requirement is not restrictive (e.g. when modelling parallel architectures, where the system consists of more than one chip). As with the functional blocks at the Register Transfer level, chip level components are primitive behavioural models and are not specified hierarchically in terms of lower level blocks. Discrete event simulation is employed for the execution of chip level models.
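As a concrete illustration of the sketch promised above, the fragment below expresses a single gate as a logical process in the event driven style of section 3.7, using occam (the modelling notation adopted later in this thesis). It is only a sketch under simplifying assumptions: the EVENT protocol, process and channel names are invented for illustration, an output event is generated on every input event rather than only when the output value changes, and the synchronization issues of sections 3.7 and 3.8 are ignored.

    PROTOCOL EVENT IS INT; BOOL :        -- simulated timestamp; signal value

    PROC NandGate (VAL INT delay, CHAN OF EVENT in.a, in.b, out)
      INT t :
      BOOL a, b :
      SEQ
        a := FALSE
        b := FALSE
        WHILE TRUE
          SEQ
            ALT                          -- consume whichever input event arrives
              in.a ? t; a
                SKIP
              in.b ? t; b
                SKIP
            out ! t + delay; NOT (a AND b)   -- re-evaluate the gate and stamp
                                             -- the output with its delay
    :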

A number of Computer Aided Design tools have been developed and are commercially available [Wern84]. Typically, these tools allow the modelling of the system at different levels of abstraction (sometimes providing automatic transformation of the model from level to level), provide a Hardware Description Language for the behavioural specification of functional blocks and employ a discrete event simulator for the simulation of the modelled system.

3.9.1 The Need for Improved Digital System Simulation Performance

The level of abstraction at which a system is modelled has a direct impact on the performance of the simulation of the model. Switch and logic level models typically consist of hundreds of thousands of components, whose simulation on conventional von Neumann machines is extremely time consuming. The increasing complexity of architectural designs has also dramatically decreased the speed of higher level simulators (e.g. Register Transfer and chip level); this is particularly true in the design of parallel architectures, where the simulation of the parallel execution of even a small workload can consume large amounts of CPU time, while in order to obtain a reliable performance evaluation of the system, large workloads need to be simulated. Long execution times have made simulation a major and increasing bottleneck in the VLSI design process. Various approaches have been followed to speed up the simulation of digital systems. One such approach makes use of special purpose hardware accelerators to execute the simulation; these can achieve impressive simulation performance, however, they are expensive and have limited flexibility in terms of simulated element types and delay models [Avra83] [Agra87] [Blan84]. Another technique is partial simulation, wherein only parts of the system are tested through simulation; this can accelerate the overall design process but entails the danger of overlooking costly design errors.

3.9.1.1 Parallel Digital System Simulation

An alternative approach to speed up simulation is to employ parallel simulation techniques, whereby gates, functional blocks, etc. are modelled as logical processes and the simulation model is executed on a multiprocessor machine. This approach has the potential for higher performance, as digital systems typically have a degree of inherent parallelism, and consequently it has received considerable attention and interest.

Su and Seitz [Su89] have used variations of the conservative deadlock avoidance algorithm (Null messages) for gate level simulations and have achieved a speedup of 8 on 64 and 10 to 20 on 128 processors of an Intel iPSC machine. Soule and Gupta [Soul91] have examined gate level simulations using both deadlock detection and recovery algorithms and centralized simulation on shared memory machines, achieving speedups of 16 to 29 on a 64-processor parallel machine, 6 to 9 on a 14-processor Encore Multimax and 2 to 4 on a 16-processor DASH respectively. DeBenedictis et al. [DeBe91] examine gate level simulation, using a conservative algorithm which eliminates cycles in the logic diagram. Briner [Brin91] and Sporrer and Bauer [Spor93] use Time Warp for switch and gate level simulations on a 32-processor BBN GP100 machine, reporting speedups from 5 to 12 and 4 to 8 respectively.

Comparison studies of various synchronization protocols for gate level simulation have been performed by Lin et al. [Lin90c], Chung and Chung [Chun91] and Manjikian and Loucks [Manj93]; these report that optimistic protocols yield better performance.

Parallel techniques have also been applied for simulation at the architectural level (parallel architectural simulation). Yu et al. [Yu89] use time driven algorithms to simulate multistage interconnection networks, reporting speedups ranging from 8 to 14 on a 16-processor Sequent machine; Ayani and Rajaei [Ayan90] also simulate multistage interconnection networks but using a conservative protocol. Lin et al. [Lin89a], Heidelberger and Stone [Heid90] and Nicol et al. [Nico92] employ parallel techniques for the simulation of cache memories driven by memory reference traces. Reinhardt et al. [Rein93] use synchronous conservative algorithms to simulate a shared memory parallel architecture, while Bailey and Pagels [Bail91] employ the deadlock avoidance algorithm for the simulation of bus-based multiprocessors. Konas and Yew [Kona92] compare the performance of the deadlock avoidance algorithm with Time Warp and a synchronous technique based on a global clock for simulating a synchronous multiprocessor system; they conclude that the synchronous, global clock method yields the best performance.

3.10 Summary

This chapter has presented a short overview of modelling and simulation, with emphasis on distributed simulation. The causality issues arising in distributed simulations have been discussed and the various techniques that have been developed to address these issues have been presented. The chapter has concluded by discussing the role of modelling and simulation in digital system design. The next two chapters deal with asynchronous hardware systems, and in particular with issues related to the modelling and simulation of such systems.

Chapter 4

Asynchronous Systems

4.1 Introduction

A digital system is typically designed as a collection of sub-systems, each performing a different computation and communicating with its peers to exchange information. Before a communication transaction takes place, the sub-systems involved need to synchronize, namely to wait for a common control state to be reached, which guarantees the validity of data exchanged. In synchronous systems, the synchronization of communicating sub-systems is achieved by means of a global clock whose transitions define the points in time when communication transactions can take place. The operation of a synchronous system proceeds in lockstep, with the different sub-systems being activated to perform their computations in a strict, predefined order. Another digital design philosophy allows sub-systems to communicate only when it is necessary to exchange information. The operation of the system does not proceed in lockstep, but rather is asynchronous; each sub-system operates at its own rate, synchronizing with its peers only when it needs to exchange information. This synchronization is not achieved by means of a global clock but

rather, by the communication protocol employed.

Asynchronous design techniques have been explored since, at least, the mid 1950s by a number of researchers including Huffman [Huff54], Muller and Bartky [Mull56], Unger [Unge69], Miller [Mill65], Keller [Kell74] and Seitz [Seit70] [Seit80]. An influential contribution in the field was the Macromodules project at Washington University, St. Louis, where Molnar and Clark demonstrated the design simplicity and modularity resulting from asynchronous logic [Clar67] [Clar74]. Some of the early mainframe computers, such as MU-5 and Atlas at Manchester University [Ibbe78] [Lavi78], were constructed as entirely asynchronous systems.

Despite these efforts, the asynchronous approach has not hitherto been established as a major philosophy in digital design, due to a number of inherent difficulties of asynchronous logic. The asynchronous, concurrent operation of sub-systems significantly complicates the task of specifying the ordering of computations so that correct functionality is ensured; in synchronous systems the ordering of operations is fixed by the placement of latches and the global clock. Furthermore, in synchronous systems, circuit hazards and dynamic states may easily be dealt with by adjusting the clock period; addressing these problems in asynchronous systems is somewhat more complicated [Hauc93]. As a result, synchronous techniques have been favoured by the VLSI design community and most current digital design is based upon the synchronous approach.

However, recently, there has been a resurgence of interest in asynchronous design techniques worldwide, due to the several potential benefits that the elimination of global synchronization may offer.

4.2 Advantages of Asynchronous Systems

Four major areas in digital design may benefit from the application of asynchronous logic, namely clock distribution, power consumption, performance and technology migration.

4.2.1 Clock Distribution Problems

In a synchronous clocked system, the global clock signal must be distributed across the silicon to control the operation of the different circuit elements. However, varied propagation delays prevent the clock signal from arriving at all circuit elements at exactly the same time; the difference in arrival times of the clock signal at different parts of the circuit is referred to as clock skew. Clock skew is typically accommodated by longer clock periods, which imply a reduced maximum clock frequency. Advances in VLSI technology have led to a significant decrease in process feature size and an increase in the number of devices which can be built on a single chip. As VLSI systems become smaller, denser and faster, clock skew becomes increasingly severe and accounts for more of the design expense [Dobb92]. The lack of global synchronization in asynchronous systems eliminates concerns regarding clock skew.

4.2.2 Potential for Low Power

Power consumption is increasingly a major concern in the rapidly growing market for portable equipment, where battery life is crucial, and in the design of high performance RISC processors, where high heat dissipation introduces difficulties in packaging as well as cooling CMOS VLSI devices [Furb95]. In CMOS, the power dissipated is proportional to the clock frequency [Eshr89]. Lower voltage VLSI processes can reduce power consumption but not to a degree sufficient to meet the performance requirements of current and future VLSI systems1 [Furb95]. Power consumption can be further reduced by activating only those units of the system that do useful work, i.e. those used in the current computation. Typically, in synchronous systems, the global clock toggles clock lines, charging and discharging capacitance throughout the silicon area, even in portions of the circuit unused in the current computation. To address this problem, logic design techniques which gate off clock signals from certain areas of the circuit, and architectural techniques which attempt to eliminate redundant operations (such as speculative prefetching), have been developed [Furb95] [May94]. These approaches, however, compromise the flexibility of the design and increase its complexity, making global synchronization even more difficult. In asynchronous systems, circuit components are activated only when necessary, while at other times they remain idle without dissipating significant power.

1For example, a decrease in supply voltage from 5V to 3V reduces the power by a factor of 3 while a decrease to 2V reduces power by a factor of 6 [Furb95].

4.2.3 Potential for High Performance

As already mentioned, the elimination of clock skew removes the limitation imposed on performance by the need for extended clock periods and hence reduced clock frequencies; furthermore, the lower power consumption promised by asynchronous logic may allow increased supply voltages with decreased heat dissipation and, consequently, increased performance. However, there is a more direct impact that asynchronous design may have on performance. Synchronous systems are optimized for worst-case conditions; the clock period is adjusted according to the time that might be required for the slowest operation to complete, even though, in general, the average case will complete in a much shorter time (e.g. in a ripple-carry adder). Asynchronous systems can be optimized for average-case conditions, with each operation taking as long as required in any particular situation.

4.2.4 Better Technology Migration Potential

Asynchronous philosophies allow a system to be designed as a set of sub-systems communicating via well defined interfaces (see section 4.3.2). Since there is no global synchronization, components in an asynchronous system may be substituted by faster ones, without altering other components or the overall structure of the system, provided the interfaces are compatible; the substitution of a component may directly affect the performance of the overall system. In a synchronous system, the overall performance depends on the worst-case conditions and therefore it is often the case that, in order to exploit the speed potential offered by a new technology, reorganization of the whole system is required to deal with the new worst-case conditions.

4.3 Basic Characteristics of Asynchronous Systems

Current asynchronous designs are typically categorised by the timing model they assume, the signalling protocol they use and the technique they employ for the transfer of data between two elements.

4.3.1 Timing Model

The timing model defines the assumptions which are made regarding the circuit and signal delays of the system. Two main categories are typically distinguished:

• Delay insensitive systems guarantee correct functionality regardless of the delays in circuit elements and the delays in the wires which connect them. A speed independent system tolerates arbitrary delays in circuit elements but assumes zero delays in the wiring.

Figure 4.1: The Request-Acknowledge Interface

Figure 4.2: Two-phase Signalling: Rising and Falling Edges Equivalent.

• Bounded delay systems base their correct functioning on the assumption that processing and communication delays are bounded by some predefined upper limit.

4.3.2 Signalling Protocols

The signalling protocol specifies the sequence of events which must take place in a communication transaction between two elements of the asynchronous system. Typically, an asynchronous communication protocol employs two signals, namely a request and an acknowledge, as depicted in figure 4.1. A communication transaction between a sending and a receiving element can be considered as having two or four phases.

4.3.2.1 Two-phase Signalling

Two-phase signalling recognizes and responds to transitions of the voltage on a wire, regardless of whether the transition is rising or falling; rising and falling edges are equivalent and carry the same information (figure 4.2). A transition is referred to as an event; two-phase signalling is also called transition signalling. Figure 4.3 illustrates the interaction between a sending and a receiving element using two-phase signalling. The sender initiates the transaction and issues a request event to the receiver by causing a transition on the request wire (first phase); the receiver responds by issuing an event on the acknowledge wire (second phase). Rising and falling transitions on the request wire alternate, each transition initiating a new communication transaction.

Figure 4.3: Two-phase Signalling Protocol
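Anticipating the modelling approach developed in chapter 5, the two-phase protocol just described can be sketched as a pair of communicating occam processes, with each wire modelled as a channel carrying transition events; the names are illustrative assumptions and the fragment is not part of the AMULET models. The two processes would be composed in parallel, sharing the request and acknowledge channels.

    PROC TwoPhaseSender (CHAN OF BOOL request, acknowledge)
      BOOL req.wire, ack.event :
      SEQ
        req.wire := FALSE              -- wires initially low
        WHILE TRUE
          SEQ
            req.wire := NOT req.wire   -- either edge carries the event
            request ! req.wire         -- first phase: request event
            acknowledge ? ack.event    -- second phase: acknowledge event
    :

    PROC TwoPhaseReceiver (CHAN OF BOOL request, acknowledge)
      BOOL req.event, ack.wire :
      SEQ
        ack.wire := FALSE
        WHILE TRUE
          SEQ
            request ? req.event        -- detect a request event (either edge)
            ack.wire := NOT ack.wire
            acknowledge ! ack.wire     -- respond with an acknowledge event
    :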

4.3.2.2 Four-phase Signalling

In four-phase signalling, only one type of transition (typically rising) is used to signal events; the other (falling) transitions are used once the transaction is complete, to return the wires to their initial state. Four-phase signalling is illustrated in figure 4.4. The first two phases of the transaction are similar to the transition signalling protocol (i.e. request high - acknowledge high), but in this case they are followed by another two phases which restore the wires to their initial state (i.e. request low - acknowledge low).

Figure 4.4: Four-phase Signalling Protocol
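The return-to-zero discipline may be sketched in the same illustrative style, with each communication modelling one transition on the corresponding wire; the receiver (not shown) simply mirrors these four actions.

    PROC FourPhaseSender (CHAN OF BOOL request, acknowledge)
      BOOL level :
      WHILE TRUE
        SEQ
          request ! TRUE        -- phase 1: request goes high
          acknowledge ? level   -- phase 2: acknowledge goes high
          request ! FALSE       -- phase 3: request returns low
          acknowledge ? level   -- phase 4: acknowledge returns low
    :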

4.3.3 Data Passing Techniques

The request and acknowledge signals are used to regulate the flow of information between two communicating elements in an asynchronous system. This information is a set of bits, with each bit being either “1” (high) or “0” (low). A variety of techniques have been developed to encode the value of each bit being transmitted during a communication transaction.

4.3.3.1 The Four-Wire Technique

The four-wire technique uses two pairs of request-acknowledge signals, one for each value of the transmitted bit. An event on one request signal denotes a “1” while an event on the other indicates a “0”. The two request signals are mutually exclusive; the value of a bit cannot be both “0” and “1” at the same time. Furthermore, in every transaction there is always an event on one of the two request wires for each bit; thus, the entire data word has reached the receiver when an event has been detected for each bit of the word. A new transaction will not commence until an event has been detected by the sender on an acknowledge signal for each bit.
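As an illustration (again in the occam style of chapter 5, with assumed names), the receiving end of a single four-wire bit may be sketched as a process which waits for an event on either request wire, decodes the bit value accordingly, and acknowledges on the corresponding wire.

    PROC FourWireBitReceiver (CHAN OF BOOL req0, req1, ack0, ack1,
                              CHAN OF BOOL value)
      BOOL event :
      WHILE TRUE
        ALT
          req0 ? event          -- an event on this wire denotes a "0"
            SEQ
              value ! FALSE
              ack0 ! TRUE       -- acknowledge on the corresponding wire
          req1 ? event          -- an event on this wire denotes a "1"
            SEQ
              value ! TRUE
              ack1 ! TRUE
    :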

4.3.3.2 The Three-Wire Technique

The three-wire technique is similar to the four-wire mentioned above, but uses one acknowledge wire, instead of two, per pair of request wires. This scheme has the advantage of using fewer wires per bit of information.

Figure 4.5: The Bundled Data Interface

Figure 4.6: The Two-phase Bundled Data Protocol

4.3.3.3 The Two-Plus-Wire Technique

Another variation of the two aforementioned techniques is the two-plus-wire scheme, whereby two request wires per bit are used, one for encoding each of the two values, but the acknowledgements for all bits are combined into a single event which is transmitted over a single wire. Thus, for an n-bit data word, 2n+1 wires are required. The two-plus-wire scheme, like both its aforementioned variants, may use transition or four-phase signalling for the communication of the request and acknowledge events; if four-phase signalling is employed the protocol is referred to as dual rail encoding.

4.3.3.4 The Bundled Data Technique

The bundled data technique employs a single pair of request and acknowledge signals for the entire data word. This scheme is illustrated in figure 4.5. As for the three techniques mentioned above, transition or four-phase signalling may be used. Figure 4.6 illustrates the two-phase bundled data protocol. The sender places the data to be transmitted on the data wires (grey area in figure 4.6) and then initiates a communication transaction by issuing a request event to the receiver. Upon detecting the request event, the receiver commences the processing of the data. When the processing is completed, the receiver issues an acknowledge event to the sender, whereupon the sender can remove the data and start preparing the next value; the data must be kept stable until the sender receives the acknowledgement from the receiver.

Contrary to the three data passing techniques described in sections 4.3.3.1-4.3.3.3, which are delay insensitive, the bundled data technique is based on the bounded delay model; all transitions on the data wires must be observed at the receiver before the request event, i.e. the delay on the data wires must be less than the delay on the request signal. This requirement is known as the bundled data delay constraint.
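It is worth noting, in anticipation of chapter 5, how naturally a bundled data transaction maps onto a single occam communication: the channel rendezvous plays the role of the request/acknowledge pair, and the data travels with it. The following is a minimal sketch under the assumption of one data word per transaction; the BUNDLE protocol is defined here only for illustration.

    PROTOCOL BUNDLE IS INT :    -- one data word per bundled transaction

    PROC BundledStage (CHAN OF BUNDLE in, out)
      INT data :
      WHILE TRUE
        SEQ
          in ? data     -- data arrives with its implicit request; completion
                        -- of the communication acts as the acknowledgement
          out ! data    -- forward the value under the same discipline
    :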

4.4 Micropipelines

In his influential 1988 Turing Award lecture, Ivan Sutherland introduced a new conceptual framework for designing asynchronous systems [Suth89]. Within this framework an asynchronous system is designed as a set of “Micropipelines”. A Micropipeline is a simple, data processing, elastic2 pipeline whose stages operate asynchronously and communicate using the two-phase bundled data protocol.

2A pipeline is elastic if the amount of data in it may vary in time. The input and output rates of an elastic pipeline may not be equal.

Figure 4.7: Event Control Modules: a) Muller-C, b) Xor, c) Select, d) Toggle, e) Call, f) Arbiter

Sutherland also proposed a set of event control blocks for the design of control circuits in micropipelined systems as well as event controlled storage elements to be used in such systems.

4.4.1 Event Control Elements

The basic set of event control blocks proposed by Ivan Sutherland is depicted in figure 4.7:

• The Muller-C element provides the AND function for transition signals (events). An event is generated on the output side only when an event has been received on both inputs of the Muller-C (a behavioural sketch in occam is given after this list).

• The Xor element implements the OR function for transition events. An event arriving on either input will cause an output event to occur; for correct operation, events should not be issued simultaneously on both inputs. Xor is also referred to as merge.

• The Select block implements a conditional test, steering events arriving on

its input to the appropriate output according to a Boolean select signal (denoted by a diamond in figure 4.7c).

• The Toggle block also steers input events to one of its outputs, however this is not done on the basis of a Boolean test but alternately, starting with the output indicated by the bullet (figure 4.7d).

• The Call block is used to allow two separate circuit components to share access to a single sub-circuit. The two requesting components may issue request events on r1 and r2 respectively. Simultaneous events on both request inputs are not allowed; a new input request will be issued only after the processing of the previous input event has been completed. Upon receipt of an input request, the Call generates an output request (r); when the acknowledge signal is issued by the requested sub-circuit (d), Call steers it to the appropriate requesting component (either d1 or d2). The functionality provided by the Call element is analogous to a procedure call in software.

• The Arbiter block is used to permit two independent, asynchronous circuit components mutually exclusive access to a common sub-circuit. Request events may arrive (on the r1, r2 wires) at arbitrary times and it is the responsibility of the arbiter to guarantee that only one request event is let through (issuing an event on either g1 or g2) and served at each particular moment. Typically the first request to arrive is granted service (in a fashion similar to the Call block) while the arbiter, acting like a semaphore, delays subsequent grants until after the acknowledge signal (d) corresponding to the previous request has been received. If both input requests arrive simultaneously, an arbitrary, non-deterministic choice is made.
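As promised above, the behaviour of the first two elements can be sketched in occam, anticipating the modelling style of chapter 5 (the names and the use of BOOL channels to carry events are illustrative assumptions). The Muller-C waits for an event on each input before firing, while the Xor (merge) fires on an event from either input.

    PROC MullerC (CHAN OF BOOL in1, in2, out)
      BOOL e1, e2 :
      WHILE TRUE
        SEQ
          PAR             -- wait for an event on each input, in either order
            in1 ? e1
            in2 ? e2
          out ! TRUE      -- only then generate the output event
    :

    PROC Merge (CHAN OF BOOL in1, in2, out)   -- the Xor element
      BOOL e :
      WHILE TRUE
        ALT               -- an event on either input produces an output event
          in1 ? e
            out ! TRUE
          in2 ? e
            out ! TRUE
    :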

Figure 4.8: The Capture-Pass Storage Element

4.4.2 Event Controlled Storage Element

In his “Micropipelines” lecture, Sutherland also introduced the Capture-Pass latch, a storage element suitable for use in micropipelined systems. A high level view of the Capture-Pass element is depicted in figure 4.8. The latch is controlled by two control signals, namely Capture (C) and Pass (P). Initially the latch is in its transparent state, where the input is connected through to the output (i.e. Din = Dout). When an event is issued on the Capture wire (C), the input-output connection is interrupted, the data is “latched”, and an event is issued on the Cd signal (Capture done) to indicate the change of state in the latch (i.e. from transparent to opaque); the latched data does not change with subsequent data input changes. When an event arrives on the Pass wire, the input is connected back through to the output, thus making the latch transparent again; this change is indicated by an event on the Pd (Pass done) signal. The capture-pass cycle may repeat, with events arriving alternately on the C and P wires. Sutherland described various implementations of the Capture-Pass latch, using inverters and switches suitably connected ([Suth89], pp. 727-728).
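A simplified behavioural sketch of the Capture-Pass latch in the same illustrative occam style might read as follows; the names are assumed, the data wires are modelled as channels carrying value changes, and the sketch issues a (possibly redundant) output communication on each pass event.

    PROC CapturePass (CHAN OF BOOL c, p, cd, pd,
                      CHAN OF INT din, dout)
      INT current :
      BOOL event :
      SEQ
        current := 0
        WHILE TRUE
          BOOL transparent :
          SEQ
            transparent := TRUE
            WHILE transparent         -- transparent: Din connected to Dout
              ALT
                din ? current
                  dout ! current      -- input changes propagate through
                c ? event
                  SEQ                 -- capture event: become opaque
                    transparent := FALSE
                    cd ! TRUE         -- signal "Capture done"
            BOOL opaque :
            SEQ
              opaque := TRUE
              WHILE opaque            -- opaque: the latched output is held
                ALT
                  din ? current       -- input changes are absorbed
                    SKIP
                  p ? event
                    SEQ               -- pass event: transparent again
                      opaque := FALSE
                      dout ! current  -- output follows the input once more
                      pd ! TRUE       -- signal "Pass done"
    :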

Figure 4.9: Micropipeline Without Processing

4.4.3 Micropipelines Without Processing

By combining control blocks with event controlled storage elements, Micropipelines of arbitrary complexity may be constructed. The simplest Micropipeline is a series of Capture-Pass registers connected together to form a First-In-First-Out (FIFO) structure. The registers are controlled by a series of Muller-C elements, as depicted in figure 4.9. The small circles on the Muller-C elements' inputs indicate those inputs which have been initialized in an active state, even though an event has not yet been received; thus, as soon as the first event arrives on the other input wire, the Muller-C generates an output event.

Initially, when the pipeline is empty, the value of all the wires is low and all the registers are transparent. When the first request event arrives on Rin, it passes through the Muller-C element causing the first Capture-Pass register to capture the data on the data bus (Din), enter its opaque state, and issue a “Capture done” event on the Cd wire. This event is sent as an acknowledgement on the Ain wire to indicate that the data has been latched and, therefore, may be removed from the data bus. The “Capture done” event is also sent as the request signal (R1) to the next register of the pipeline, to indicate the availability of data to be latched by that register; this is done through a delay unit which is intended to slow down the request event and give the data enough time to arrive at the register before the request, thus guaranteeing that the bundled data delay constraint is not violated. R1 will propagate through the Muller-C element and will force the second register to latch the data, whereupon Cd will be activated, issuing an acknowledge event back to the first register in the pipeline. This acknowledgement will activate the Pass control wire (P) of the first register, forcing it to become transparent again. The “Pass done” generated by the first register as a response is directed to the Muller-C element to enable a subsequent request event on Rin to propagate through the Muller-C and close the register.

This process continues, with the data propagating through consecutive stages until it reaches the output of the pipeline (Dout), whereupon a request event on Rout will be issued; with the arrival of the corresponding acknowledgement on Aout, the data value will be removed from the pipeline. While Aout is awaited, further request events arriving on R3 will not be allowed to propagate through the Muller-C element; thus, any subsequent data values which enter the pipeline during this time will occupy consecutive registers of the pipeline until the pipeline becomes full, whereupon no more requests will be able to enter the pipeline until Aout is issued and the data value at the output is removed.

4.4.4 Micropipelines With Processing

The simple FIFO micropipeline described in the previous section can be enhanced to perform processing on the data, by interposing the necessary logic (combinatorial circuits) between adjacent register stages as depicted in figure 4.10. The operation of the pipeline is similar to that of the simple, non-processing FIFO, but the request signals in the forward propagation path must be further delayed to accommodate the extra delays imposed on the datapath by the processing logic.

Figure 4.10: Micropipeline With Processing

4.5 AMULET

With the aim of exploring the potential advantages of asynchronous logic and, in particular, the potential for reduced power consumption, the AMULET group [AMUL] was established in 1990 by Professor S. B. Furber at the University of Manchester. The group has been involved in a number of ESPRIT funded projects, including OMI-MAP3 (Open Microprocessor systems Initiative - Microprocessor Architecture Project, ESPRIT project 5386), whose objective was to investigate trends in technology and microprocessor design in order to define standards for a European computer architecture, OMI-DE/ARM4 (Open Microprocessor Systems Initiative - Deeply Embedded ARM project, ESPRIT project 6909), which investigated the utilization of the ARM microprocessor core in highly embedded applications, OMI-EXACT5 (Open Microprocessor Systems Initiative - Exploitation of Asynchronous Circuit Techniques project, ESPRIT project 6143), which is investigating the application of asynchronous design techniques for electronic consumer products, and OMI-HORN6 (Open Microprocessor Systems Initiative - Highly Optimized Reusable Nucleus project, ESPRIT project 7249), which is investigating the use of low power logic in microprocessor design; AMULET also participated in the DTI funded TAM-ARM project7 (Transforming Architectural Models) which explored the potential use of bipolar technology for asynchronous design, and was an active member of the ACiD (Asynchronous Circuit Design) basic working group.

In order to investigate the suitability of Micropipelines for the design of complex systems, the AMULET group have developed AMULET1, an asynchronous implementation of the ARM RISC microprocessor, within the OMI-MAP project. AMULET1 has been designed to offer object code compatibility with the 32-bit ARM6 processor8. ARM6 is described in [Furb89] [VLSI90]; a short description of ARM6's instruction set is provided in appendix A.

3The industrial and academic partners of AMULET within OMI-MAP included Acorn Computer Limited, Advanced RISC Machines Limited, Bull, U.K. Defence Research Agency, EO Computer Limited, IMEC in Leuven, Inmos Limited, Oxford University, Siemens AG and Thomson CSF-DOI.
4The industrial and academic partners of AMULET within OMI-DE/ARM included Advanced RISC Machines Limited, Electronica S.p.A., GEC Plessey Semiconductors, Hagenuk, Hannover University, IRIS, Manchester University and UMIST.
5The industrial and academic partners of AMULET within OMI-EXACT included Philips Research Laboratories in Eindhoven, Eindhoven University of Technology, IMEC in Leuven, South Bank University and EDC in Leuven.
6The industrial and academic partners of AMULET within OMI-HORN included ACRI, Thomson CSF, Oxford University, CTI in Patras, Greece, Inmos Limited, University of Bristol, Blue Star in UK, ACSET in Belgium and the University of Karlsruhe.
7The industrial partners of AMULET within TAM-ARM include Advanced RISC Machines Limited and GEC Plessey Semiconductors.
8Actually, a number of architectural features of ARM6 such as coprocessor instructions, the MLA (multiply with accumulate) instruction and the 26-bit mode of operation, are not supported by AMULET1.

The design and operation of AMULET1 have been described by Furber, Paver, Day, Garside and Woods in a number of publications including [Furb92] [Furb93] [Furb93a] [Furb94] [Furb94a] [Furb95] [Day92] [Day95] [Gars92] [Gars93] [Pave91] [Pave92] and [Pave92a]; a more complete and detailed description of AMULET1 is provided by Paver in [Pave94]. The next section presents a short overview of AMULET1; a further discussion of various aspects of AMULET1’s operation which are relevant to this thesis is provided in chapter 6, as part of the description of AMULET1’s occam model.

4.6 The AMULET1 Microprocessor

AMULET1 was designed in the period 1990-1993 and was implemented using a mixture of custom datapath and compiled control logic elements. Two silicon implementations have been developed, one fabricated on a 0.7 micron CMOS process and yielding a performance of 28kDhrystones9, and the other on a 1.2 micron process yielding a performance of 20kDhrystones; the performance of AMULET1 is estimated as 70% of the performance of its synchronous counterpart using the same geometry and operating at 20 MHz. The power consumption of AMULET1 is similar to that of ARM6 (75mW operating at 10 MHz).

4.6.1 The AMULET1 Interface

The interface via which AMULET1 interacts with its environment is illustrated in figure 4.11, where a configuration including a memory module and an MMU (Memory Management Unit) is depicted. The interface includes an output and an input data bundle for the exchange of data with the memory, two wires to signal data aborts, two interrupt signals and one reset signal.

9A description of the Dhrystone benchmark is provided in section 8.2.

Figure 4.11: The AMULET1 Interface

The output bundle contains a memory address, control information and, in the case of a write operation, the data to be written to memory. The control information consists of a number of bits specifying the type of operation being performed (read or write), the type of information being fetched (instruction or data), the mode of operation (privileged or user), so that memory protection can be implemented, and whether sequential address access is likely, so that the fast page mode of DRAM may be used. In the case of read operations, the memory responds to the processor's request by issuing a bundle with the requested information (instruction or data).

The definition of the ARM6 architecture includes support for virtual memory. Each time a data address is issued to memory, the current state of the processor is preserved until the MMU responds by issuing an abort/no abort signal indicating whether or not a page fault has occurred. If a fault occurs, the exception handling software is invoked. In AMULET1, the response from the MMU is issued using a two-plus-wire protocol (although no acknowledgement event is included in the transaction, see section 4.3.3.3); if a page fault occurs, the MMU issues an event on the Dabt1 wire (abort), otherwise an event on Dabt0 (no abort) is generated. In the case of an instruction address causing a page fault, no explicit signalling is required as the state of the processor is not directly affected; the aborted instruction is tagged as invalid and the exception handling software is invoked when the invalid instruction is detected by the processor. The operation of AMULET1 with regard to aborts is described in section 6.5.

AMULET1 also supports the two level-sensitive interrupt signals specified by the ARM6 architecture, namely FIQ and IRQ. These are completely asynchronous to the operation of the processor and may be issued at any time. The reset signal is used to initialize the state of the processor, whereupon the issuing of sequential instruction addresses to memory, starting from zero, commences.
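In a model of this interface, the wait for the MMU's two-plus-wire response is naturally expressed as a choice between the two abort wires; the fragment below is only an illustrative sketch with assumed names, not the actual occam model described in chapter 6.

    PROC AwaitMMUResponse (CHAN OF BOOL dabt0, dabt1, CHAN OF BOOL fault)
      BOOL event :
      ALT
        dabt0 ? event     -- event on Dabt0: no abort, the access succeeded
          fault ! FALSE
        dabt1 ? event     -- event on Dabt1: a page fault has occurred
          fault ! TRUE
    :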

4.6.2 The AMULET1 Internal Organization

The internal organization of AMULET1 is depicted in figure 4.12. The processor consists of five major units, namely the address interface, the data interface, the execution unit, the register bank and the primary decode. Figure 4.13 illustrates the layout of the functional units of AMULET1 on the 1.2 micron implementation of the processor.

4.6.2.1 The Address Interface Unit

The address interface is responsible for providing all address information to memory. It operates as an autonomous unit, issuing sequential instruction addresses to maintain a steady flow of prefetched instructions to the processor. Data transfer and branch target addresses, generated by the execution unit of the processor, are also issued to memory through the address interface unit, temporarily interrupting its autonomous operation. For multiple data transfer instructions10, the Incrementer module is activated to generate appropriate sequential memory addresses.

Figure 4.12: The AMULET1 Internal Organization

Figure 4.13: The AMULET1 Processor Physical Layout

The address interface is also responsible for providing the execution unit with the appropriate Program Counter (PC) value (via the PC Pipe), to be used as the general purpose register R15 for the execution of the current instruction, as specified by the definition of the ARM architecture. For instructions which have the potential to abort, the PC value is also copied into the Exception Pipe (XPipe) to be preserved. The address interface unit is described in more detail in section 6.3.

4.6.2.2 The Data Interface Unit

All data flow between the processor and the memory subsystem is channeled through, and controlled by, the data interface. A memory access via the data interface may be either a write or a read operation; the latter concerns either data values or instructions.

Read addresses are simply forwarded to the memory. The data interface keeps a record (in the MemCP FIFO) of the type referenced by the address (i.e. data or instruction), so that the value returned by the memory will be directed (by the DestCtrl module) to its correct destination; instructions are stored in a FIFO (IPipe) before they are executed, while data values are forwarded to the register bank via DataIn; this may rotate the data word by byte quantities for non word-aligned loads and mask out the top 24 bits for byte reads. For instructions which use immediate values as operands, the Immediate Pipe (ImmPipe) is activated to extract the immediate value from the instruction word; ImmPipe also performs sign extensions where required.

For write operations, the write address from the address interface needs to synchronize with the value to be written before it is forwarded to the memory; this value originates from the execution unit of the processor. Before pairing up with their associated addresses, data values pass through Dout, a three stage pipeline which incorporates control logic performing an optional byte replication operation to enable byte writes to any byte-aligned address. The operation of the data interface is further discussed in section 6.4.

10This refers to LDM and STM instructions, see appendix A.

4.6.2.3 The Register Bank Unit

The register bank provides the top level of the memory hierarchy by means of thirty general purpose and five status registers (Saved Processor Status Registers, SPSRs). The register bank is the central structure of the processor's datapath. The execution of a typical instruction by the processor involves a) reading the contents of the source registers11 specified by the instruction operands, b) performing the operation dictated by the opcode, and c) sending the result back to the register bank to be written in the instruction's destination register. The register bank includes mechanisms for dealing with multiple pending write operations and instruction interdependencies, issues which may cause problems in an asynchronous environment. A more detailed description of the operation of the register bank is provided in section 6.7.

4.6.2.4 The Execution Unit

The execution unit is the computational core of the processor, comprising a multiplier, a shifter and an ALU. It is organized as two major sub-units, referred to as Decode2 (Dec2-Ctrl2) and Decode3 (Dec3-Ctrl3), which control the operation of the multiplier/shifter and the ALU respectively. The multiplier performs a shift-and-add operation using carry-save adders. It operates on two source operands and yields a partial product and carry output which have to be forwarded to the ALU for the calculation of the final result. The shifter is based on the ARM6 barrel shifter and is connected to one of the operand buses, in series with the ALU. The asynchronous ALU performs all the logical and arithmetic operations specified by the ARM architecture, namely XOR, AND, OR and addition. Results produced by the ALU can be placed onto the write bus (WBus) to be transferred to the register bank or to memory via the address interface. The execution unit shares the write bus with the data interface, which uses it to transfer data arriving from memory. Since values from the execution unit arrive asynchronously with relation to the incoming data, arbitration is necessary to enable the sharing of the write bus between the two data streams.

11The maximum number of source registers being accessed at any particular moment is dictated by the number of output ports in the register bank, which in ARM6 is two.

The execution unit also includes logic for the control of the Current Processor Status Register (CPSR). Section 6.8 provides a more detailed discussion of the operation of the execution unit of AMULET1.

4.6.2.5 The Primary Decode Unit

The primary decode constitutes the entry point to the datapath of AMULET1. This is where instructions arriving from memory are first decoded before they proceed to be executed. The primary decode produces the signals which control the operation of the register bank and performs an initial partial decoding for the execution unit. The primary decode unit is described in section 6.6.

4.6.3 AMULET2

The prototype AMULET1 has demonstrated the feasibility12 of employing asynchronous logic for building large and complex systems, but it has failed to achieve the performance and power efficiency promised by asynchronous logic. Thus, the AMULET group have commenced the design of AMULET2, a new, improved version of AMULET1, within the OMI-DE/ARM project. The objective of the AMULET2 design is to apply technological improvements (e.g. alternative latch designs, use of the four-phase protocol etc.) and architectural enhancements (e.g. improved pipeline sizes, incorporation of a “Last Result Register”, branch prediction mechanisms etc.) in order to increase instruction throughput while reducing power consumption [OMI94].

12Although the first asynchronous processor was developed by Martin et al. at Caltech [Mart89a] [Mart89b], and various other asynchronous microprocessor designs were proposed in the late 1980's and early 1990's [Davi89] [Gino90] [Dean92], AMULET1 was the first Micropipelined processor to address critical and essential issues such as hardware interrupts and exact exceptions. More recently, following AMULET1's construction, a number of asynchronous microprocessors have been developed, including the Counterflow Pipeline Processor, by Sutherland, Sproull, Molnar et al. at Sun Labs [Spro94], the NSR RISC processor, by Brunvand et al. at the University of Utah [Brun93] and the TITAC processor, by Naya et al. at the Tokyo Institute of Technology [Nany94].

4.7 Summary

This chapter has discussed issues related to asynchronous hardware systems. The various approaches to designing asynchronous systems have been presented, and a description of Sutherland's “Micropipelines” has been provided. The last part of the chapter has provided a short description of the AMULET1 asynchronous microprocessor. The next chapter concentrates on the modelling and simulation of asynchronous hardware systems.

Chapter 5

Modelling Asynchronous Systems

5.1 Introduction

The revival of interest in asynchronous digital logic has revealed a strong need for suitable techniques for modelling asynchronous systems. Recent years have witnessed the development of an increasing number of different methodologies for the design, automatic synthesis and verification of asynchronous circuits. Typically, these methodologies employ different notations and techniques for the specification and description of asynchronous designs. A detailed description of asynchronous design methodologies is beyond the scope of this thesis. In depth surveys of existing asynchronous methodologies may be found in [Broz89] [Gopa90] [Hauc93] [Broz95] and [Davi95] where comprehensive bibliographies are provided; additionally, the Asynchronous Online Bibliography ([email protected]) provides continuous, up to date information regarding asynchronous research. The following section provides a short overview of the major and most influential asynchronous modelling techniques.


5.2 Modelling Techniques

Most, if not all, existing asynchronous modelling techniques are intended for the automatic generation of asynchronous circuits from high level specifications.

I-Nets is a graphical notation for describing asynchronous systems, proposed by Molnar [Moln83]. I-Nets are based on Petri nets with transitions being labelled with signal names; firing a labelled transition models a change in the corresponding signal in the circuit. By applying a series of transformations, an I-Net may lead to an Interface State Graph which may be used to obtain a Karnaugh map and finally the asynchronous circuit; algorithms for the realization of these transformations may be found in [Spro86], pp. 7.23-7.27.

Another graphical, Petri net based notation is the Signal Transition Graph (STG), developed by Chu [Chu85] [Chu86] [Chu86a] [Chu87]. STGs may be transformed to asynchronous circuits using a procedure similar to that used for I-Nets; techniques for the transformation of STGs have also been developed by Lin and Lin [LinK91] [LinK92] [LinK92a], Meng et al. [Meng89], Vanbekbergen et al. [Vanb90] and Yakovlev [Yako92].

State transition diagrams have also been used as a specification notation for the automatic synthesis of asynchronous finite state machines. Davis et al. [Coat93] [Davi95a] have developed a collection of tools (known as MEAT) which generate schematic diagrams at the complex gate-circuit level from state transition diagrams. A similar approach for the automatic synthesis of asynchronous finite state machines has been used by Nowick, Dill and Yun [Nowi91] [Nowi91a] [Yun92] [Yun92a].

5.2.1 CSP-based Modelling Approaches

The “Communicating Sequential Processes” (CSP) model of computation has attracted the interest of many researchers as a potential means for the modelling of asynchronous designs due to the strong relationship between its semantics and the behaviour and structure of asynchronous systems:

• CSP supports a concurrent, process-based, asynchronous, non-deterministic model of computation which exactly matches the behaviour of hardware built using asynchronous logic.

• In CSP, the communication between different modules is point-to-point, synchronous and unbuffered. This behaviour directly reflects the interaction between subsystems in asynchronous hardware, where a sender and a receiver rendezvous before physically exchanging data via wires, which are memoryless media.

Several asynchronous modelling techniques have been developed which use CSP-based notations. A research group at Eindhoven University [Eber91] [Rem83] have developed a formalism called trace theory whereby, starting from high level specifications expressed in a CSP-like mathematical notation, circuits may be derived via transformations (which are referred to as commands by Ebergen). A variant of trace theory has been used by Dill to verify asynchronous circuits [Dill89].

Udding and Josephs [Jose90] [Jose91] have adopted an algebraic approach whereby a system specification expressed in a CSP-like notation may be transformed into a circuit via the use of a set of lemmas and axioms, known as Delay-Insensitive algebra. Employing this method, a stack, a routing chip and an up-down counter have been developed.

Martin [Mart86] [Mart89] [Mart90] and Burns [Burn87] have developed compiler-oriented techniques for the translation of specifications expressed in CSP into four-phase delay insensitive circuits. Their techniques have been used to develop a number of circuits [Mart85] [Mart85a] [Mart85b], including a complete asynchronous microprocessor [Mart89a] [Mart89b]. Akella and Gopalakrishnan have extended Martin's work in a system called SHILPA [Akel91] [Gopa93], which also employs a CSP-based notation but allows global shared variables.

Van Berkel's group at Philips Research labs (Eindhoven) have developed Tangram, a CSP-like language, for the specification of asynchronous systems [VaBe91]. A Tangram program can be compiled by syntax oriented translation [VaBe88] [VaBe88a] [Nies88] into an intermediate form, which is referred to as a handshake circuit [VaBe92]. A handshake circuit is a network of asynchronous components, the handshake processes, which communicate via channels. Handshake processes are directly mapped onto VLSI implementations.

Brunvand and Sproull [Brun89] [Brun91] [Brun91a] employ an occam-like language to describe asynchronous systems. An occam-like specification can then be compiled into an intermediate form using syntax directed translation, which after peephole optimization can be mapped onto a library of transition signalling components.

5.3 Modelling Micropipelined Systems with Occam

Contributing to the quest for suitable modelling notations and techniques for asynchronous systems, and triggered by the increasing debate regarding the potential use of CSP for this purpose, part of the research presented in this thesis investigated the suitability of the occam programming language for modelling asynchronous systems. The investigation targeted asynchronous systems that are based on Sutherland's Micropipelines; however, the results may also be applied to other asynchronous design methodologies.

5.3.1 Why Occam

There are several factors which advocate the candidacy of occam for the construction of models of asynchronous systems:

• As explained in section 2.6.1, occam forms a practical realization of CSP and, consequently, maintains CSP's strong relationship, with regard to communication and computation, with asynchronous systems (see section 5.2.1).

• Occam allows explicit description of parallel as well as sequential computation. This explicit control of concurrency, which extends down to the command level, along with its simple but powerful syntax and “send” and “receive” commands, makes occam ideal for describing digital systems; indeed, occam has been employed for modelling digital systems at various levels by a number of researchers including Welch [Welc87], Dowsing [Dows85], Chiu et al. [Chiu94] (gate level), de Almeida [Alme94] and Neto [Neto91].

• Occam is primarily a general purpose programming language which may be executed on a computer (transputer). Thus, a specification developed using occam is automatically an executable simulation model of the asynchronous system.

• Occam is a parallel programming language and thus may be used to perform distributed simulation1. A simulation model written in occam may be distributed on a transputer network and execute concurrently to achieve high

1Occam has indeed been used as a programming language for building both conservative and optimistic distributed simulations [Xu89] [Nevi89] [Djan89] [Cai90a] [Alon93].

performance. Asynchronous hardware systems are an excellent candidate for distributed simulation. The concurrent operation of the different sub-systems of an asynchronous system, the inherent parallelism within each subsystem and the lack of any global synchronization are characteristics which support the concurrent execution of events in a simulation model. In his flashback simulation approach [Suth93], Sutherland attempts to exploit these characteristics of asynchronous systems and allow “out-of-order” processing of events to increase simulation speed; however, his simulation retains its sequential nature, and is intended for execution on conventional von Neumann computers. In section 3.9.1, the increasing importance of simulation performance for the design of digital systems was discussed. In the case of asynchronous hardware, there is one more factor that makes simulation performance extremely important, namely the need to test the delay independence of designs with regard to deadlocks.

5.3.1.1 The Deadlock Problem

The concurrent nature of asynchronous hardware systems, along with the absence of global synchronization, introduces a problem common in asynchronous, parallel structures, namely deadlock. Deadlock is a high-level issue of the design, and occurs when the system, as a result of a particular sequence of events, reaches a state wherein at least one sub-system becomes indefinitely blocked. In general, the sequence of events in an asynchronous system is non-deterministic. This is due mainly to the behaviour of the arbiters. As explained in section 4.4.1, an arbiter will service request events in arrival order. If two requests arrive at the same time, the choice will be non-deterministic. Asynchronous logic allows variable delays within the different sub-systems, which will affect the order in which independent request events arrive at the arbiters of the system. The correct functionality of the asynchronous system should not depend on the ordering of independent streams of events; a correct design should be deadlock free for all possible combinations of events.

Verifying that a concurrent, asynchronous structure is deadlock free is a complex and difficult issue. Substantial research effort has been invested to develop formal methods which guarantee deadlock freedom [Sifa80] [Rosc86] [FDR93]. However, existing formal techniques are not yet mature enough to tackle systems of the complexity of asynchronous computer architectures [Furb94b], although research is ongoing in this area [Birt94a] [Nick95]. A different approach is to guarantee deadlock freedom by construction, namely by applying certain rules during the design of the system [Welc93]. However, the applicability of this approach to the design of asynchronous systems has not yet been investigated2. In practice, it is generally possible to identify, and thus avoid, certain design decisions that are susceptible to deadlock [Pave94]. However, the size, complexity and non-deterministic behaviour of asynchronous hardware systems do not allow intuition to guarantee a deadlock free design.

Simulation can be an invaluable aid for this problem. The approach is to run the simulation model of the system many times, each time with a different set of delays in the component sub-systems [Furb95]. Changing the internal delays of the sub-systems changes the order in which events are generated. Consequently, the order in which events from different data streams arrive at the arbiters also changes. Since delays dictate event orderings, following this approach the design can be tested for possible deadlocks. The degree of confidence that a design is deadlock free is proportional to the number of runs of the simulation model. The speed of simulation here is crucial; a fast simulator would allow the delay independence of the system with regard to deadlocks to be rigorously and extensively tested for a large number of possible combinations of events. This technique requires a modelling approach which would allow the rapid production of executable models of the architecture at a high level, so that possible deadlocks are located at an early stage of the design process.

2Such a task, which would attempt to exploit the characteristics of asynchronous hardware systems in order to establish design rules that guarantee deadlock freedom, would be extremely important and would contribute enormously to asynchronous digital system design.
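As a concrete (hypothetical) illustration of this approach, a stage of an occam model can be given an adjustable internal delay by means of an occam TIMER (section 2.6.1.3); re-running the simulation with different values of the delay parameter perturbs the order in which events from independent streams reach the arbiters of the model. The channel protocol and names are assumed for illustration.

    PROC DelayedStage (VAL INT delay, CHAN OF BUNDLE in, out)
      TIMER clock :
      INT now, data :
      WHILE TRUE
        SEQ
          in ? data
          clock ? now
          clock ? AFTER now PLUS delay   -- adjustable internal delay
          out ! data
    :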

5.3.2 The Modelling Philosophy

The main objective of the modelling philosophy proposed in this thesis is to exploit:

1. The strong relationship between CSP (and occam) and asynchronous hardware, in order to achieve easy and rapid construction of models.

2. The inherent parallelism of the hardware, in order to achieve high simulation performance.

The latter may be exploited at any level of abstraction at which the system is modelled. The former, however, may be exploited only at the Register Transfer, or higher, level of a Micropipelined system. Assuming a correct implementation of the communication protocol, at the Register Transfer Level a Micropipelined system may be viewed as a network of concurrent modules communicating via synchronous, unbuffered communication. The modules are data-driven; each module will start computation as soon as data is available on its input wires, and will signal when its result has been computed. At this level, the system may be directly modelled using the “Logical Process Paradigm” described in section 3.4; the model will consist of a network of concurrent, communicating occam processes, topologically identical to the asynchronous system, with each occam process modelling the behaviour of a different functional module.

Since the correct operation of an asynchronous system does not depend on a global clock, simulated time is not required for the synchronization of the occam processes of the model. Processes are entirely data-driven and self-scheduling; they are synchronized by the protocol employed in the communication semantics of occam, in the same way that the communication protocol employed in the asynchronous system synchronizes the different functional modules. Each process will always consume event messages as soon as they become available, and will always wait for subsequent messages once the messages it has generated have been successfully forwarded.

This methodology provides a natural way of modelling an asynchronous system, based on the similarities between the system's behaviour and the semantics of occam. This basis, however, is not available for modelling at lower levels of abstraction. In this case, no assumptions should be made regarding the correctness of the communication protocol in the system; instead, explicit modelling of the protocol to verify that it adheres to the bundled data delay constraint is required. Occam may still be used for the description of low level circuit elements (event control elements and gates); however, simulated time is essential for the synchronization and the correct operation of the simulation model.

5.3.3 Modelling a Pipeline Without Processing

Following the modelling philosophy described in the previous sections, a register in a Micropipeline without processing may be modelled as depicted in figure 5.1. The request and acknowledge signals in the circuit are used to synchronize the register with its neighbouring registers in the pipeline. In the model, synchronization between occam processes is performed by the communication protocol specified by the occam channel; thus, no extra channels are required for the request and acknowledge signals. The register model makes use of two channels, for input and output respectively.

[Figure: the Micropipeline register circuit (Rin/Aout handshake, Muller-C element and delay) alongside its occam model]

PROC Register(CHAN OF BUNDLE In, Out)
  SEQ
    WHILE TRUE
      SEQ
        In ? Data
        Out ! Data
:

Figure 5.1: Micropipeline Without Processing: The Register Model

The register process repeatedly reads data from its input channel and forwards it to the next process in the pipeline before it reads the next input value, thus manifesting a behaviour similar to that of the Micropipeline stage described in section 4.4.3. A multi-stage Micropipeline may be modelled by means of a parallel replication (PAR) of the register process, as described in section 2.6.1.4.
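For instance (the channel array, its BUNDLE protocol and the stage count here are illustrative, not part of the original model), a four-stage pipeline may be built as a replicated PAR over the register process:

VAL INT stages IS 4:
[stages + 1]CHAN OF BUNDLE link:
PAR i = 0 FOR stages
  Register(link[i], link[i + 1])

Each instance is connected to its neighbours through the link channels, so the topology of the model directly mirrors that of the hardware pipeline.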

5.3.4 Modelling a Pipeline With Processing

At the Register Transfer Level, a general Micropipeline with processing may be viewed as depicted in figure 5.2. The sending register outputs its contents, consisting of data and control bits, onto the data bus and produces a request event (request wires are indicated in the figure by solid lines, while acknowledge wires are denoted by dotted lines; dotted lines appearing in figures in this thesis hereafter denote acknowledgement paths). The control bits are used by the control logic to direct the request event to its correct destination, activating, if necessary, the data processing elements (DPEs, e.g. ALUs, multipliers, shifters etc.) of the circuit. Data passes through the DPEs and propagates to the next stage.

[Figure: two registers connected through control logic and a data processing element (DPE)]

Figure 5.2: Micropipeline With Processing: A High Level View

This general Micropipeline may be modelled by three occam processes, two for the registers and one for the control/data processing logic; the control logic and the DPE may be modelled as one process, with the DPE being a procedure called by the control process.

The simple register model described in the previous section is not suitable for modelling the behaviour of a stage in a Micropipeline with processing. Using this simple register model would force the control process to act as a buffer, decoupling the register processes; the sending process would be free to read the next value from its input channel without first ensuring that the previous value had been received by the destination register process. Thus, the control logic process would introduce an extra pipeline stage in the model, a stage that does not exist in the physical system. To avoid this situation, the register processes must be kept tightly coupled and synchronized. This may be achieved by using two channels for a communication transaction between two register processes, one for the request/data and one for the acknowledge event³.

Figure 5.3 illustrates the generic occam register model.

[Figure: the register circuit, with RDin/RDout request-and-data paths and Ain/Aout acknowledge paths, alongside its occam model]

PROC Register(CHAN OF BUNDLE RDin, RDout,
              CHAN OF ACK Ain, Aout)
  SEQ
    RDin ? Data
    WHILE TRUE
      SEQ
        PAR              -- fork
          RDout ! Data
          Ain ! any
        PAR              -- Muller-C
          RDin ? Data
          Aout ? any
:

Figure 5.3: Micropipeline With Processing: The Register Model

The model makes use of two PAR statements, one to model the Muller-C element and one to model the fork on the Ain/Rout wire. Initially the register is empty. The first value to appear on the input side will be immediately latched, activating the Cd signal and issuing an acknowledgement to the source and a request to the destination register via the Ain/Rout wire. This behaviour is modelled by reading the first input request message on the RDin channel before the register process enters its main loop. Upon receiving the first input request message, the process enters its main loop, where it simulates first the fork, issuing in parallel an acknowledge to the source and a request to the destination register via the channels Ain and RDout respectively, and then the Muller-C, waiting on the RDin channel for a new data value to be latched and on the Aout channel for the acknowledgement message indicating that the value previously issued to the next register process has been received by that process.

³ In his PhD thesis [Brun91], Brunvand argues that the incorporation of an extra acknowledgement channel is optional, a matter of style rather than substance; the analysis presented in this section, however, shows that the extra acknowledgement channel is essential if the model is to accurately describe the modelled system.
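To connect two instances of this register model (channel names here are illustrative), the request/data channel of each stage is paired with an acknowledge channel running in the opposite direction:

CHAN OF BUNDLE RD01:
CHAN OF ACK A01:
PAR
  Register(RDin, RD01, Ain, A01)    -- stage 0 acknowledges its source on Ain
  Register(RD01, RDout, A01, Aout)  -- stage 1 acknowledges stage 0 on A01

The A01 channel is what keeps the two stages tightly coupled: stage 0 cannot recycle until stage 1 has consumed its output.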

5.3.5 Modelling Control Logic

The control logic is inherently concurrent; different parts of the circuit operate concurrently while, within each part, events take place in a deterministic sequential order, i.e. the control logic implements a partial ordering of events. The simulation model should have the same degree of concurrency as the physical circuit. The control logic may be implemented as a network of communicating processes, with the occam PAR and SEQ commands being used within each process to implement the partial ordering of events of the circuit; a schematic example is given at the end of this section. The number of these processes depends on the degree of modularity and fidelity required in the simulation model.

Adopting a data-driven approach to model asynchronous systems, it is essential to have a mechanism for modelling the functionality and the non-deterministic behaviour of arbiters. The occam ALT construct, described in section 2.6.1.2, provides for the non-deterministic choice of messages from different channels and therefore may effectively model the behaviour of an arbiter.
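By way of illustration of the partial ordering, a control process of the following form (channel and variable names are illustrative) imposes a fixed order on the events within each SEQ branch, while the enclosing PAR leaves the two sequences unordered with respect to each other:

PROC Control(CHAN OF BUNDLE req1, ack1, req2, ack2)
  WHILE TRUE
    PAR                -- the two halves of the circuit operate concurrently
      SEQ              -- within each half, events occur in a fixed order
        req1 ? x
        ack1 ! x
      SEQ
        req2 ? y
        ack2 ! y
: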

5.3.6 Timing Issues

As mentioned in section 5.3.1, an occam description of an asynchronous system is automatically a simulation model which may be executed on a transputer network. The modelling methodology does not make use of simulated time for the synchronization of the occam processes in the model. However, simulated time is still needed as a quantifier, to provide the means for the evaluation of the simulated asynchronous system. Thus, following the Logical Process Paradigm, the messages exchanged between occam processes are timestamped, while each process maintains a local clock to keep track of the simulated time.

However, the distributed nature of occam and the event driven philosophy of the simulation model introduce the problem of ensuring that preemptions do not occur and the local causality constraint is not violated. As explained in section 3.7, in distributed simulations causality errors occur if merge processes consume and process input messages out of timestamp order. In a Micropipelined system, Micropipelines may be merged in one of the following ways:

• Synchronous merge.

• Data dependent merge.

• Arbitrated merge.

5.3.6.1 Synchronous Merge

In a synchronous merge, the merge module has to wait for all input data to become available before it starts its operation. This is the case when a Muller-C element is used for the corresponding request events (figure 5.4a). In the simulation model, the corresponding occam control process has to wait for all its input channels to fire. The message with the greatest timestamp is used to advance the local clock variable of the process, and therefore the causality principle is preserved. The occam process which implements the synchronous merge is illustrated in figure 5.4b.

a) [circuit: a Muller-C element synchronizes the RDin1 and RDin2 request events of the Synchronous.Merge process before the data is processed and forwarded on RDout]

b) the corresponding occam process:

PROC Synchronous.Merge(CHAN OF BUNDLE RDin1, RDin2, RDout,
                       CHAN OF ACK Ain1, Ain2, Aout)
  -- data.delay : time taken to process the data
  -- ack.delay  : time taken for the acknowledge signal to propagate
  SEQ
    WHILE TRUE
      SEQ
        PAR                          -- Muller-C element
          RDin1 ? in1.data
          RDin2 ? in2.data
        -- process data
        out.data(timestamp) := max(in1.data(timestamp), in2.data(timestamp)) + data.delay
        RDout ! out.data
        Aout ? ack
        ack(timestamp) := ack(timestamp) + ack.delay
        clock := ack(timestamp)
        PAR
          Ain1 ! ack
          Ain2 ! ack
:

Figure 5.4: Synchronous Merge
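As a worked illustration (timestamps chosen arbitrarily): if the messages on RDin1 and RDin2 carry timestamps 10 and 14 and data.delay is 3, the output message is stamped max(10, 14) + 3 = 17; when the acknowledgement returns, its timestamp is increased by ack.delay and becomes the new value of the local clock. Since the output timestamp is never smaller than either input timestamp, the causality principle is preserved.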

5.3.6.2 Data Dependent Merge

In a data dependent merge, the functionality of the system dictates the order in which messages from different source processes should be consumed and processed. This situation is implemented in hardware using a combination of a Select and a Call, as depicted in figure 5.5a. The behaviour of the merge process in this case is similar to that of a process with just a single input channel; hence causality is not violated (figure 5.5b).

a) [circuit: a Select steers each request to one of the two input paths and a Call merges them into the Data.Dependent.Merge process]

b) the corresponding occam process:

PROC Data.Dependent.Merge(CHAN OF BUNDLE RDin1, RDin2, RDout,
                          CHAN OF ACK Ain1, Ain2, Aout)
  -- data.delay : time taken to process the data
  -- ack.delay  : time taken for the acknowledge signal to propagate
  SEQ
    WHILE TRUE
      SEQ
        IF                           -- Select
          TRUE
            SEQ                      -- behaviour of a single input process
              RDin2 ? in.data
              -- process data
              out.data(timestamp) := in.data(timestamp) + data.delay
              RDout ! out.data
              Aout ? ack
              ack(timestamp) := ack(timestamp) + ack.delay
              clock := ack(timestamp)
              Ain2 ! ack
          FALSE
            SEQ                      -- same as the TRUE clause, reading RDin1
              RDin1 ? in.data
              ...
:

Figure 5.5: Data Dependent Merge

5.3.6.3 Arbitrated Merge

In an arbitrated merge, the order of request arrival defines the order of consumption. If events from two Micropipelines arrive at the same time, an arbitrary choice is made. In the asynchronous circuit, arbiters are used to achieve this behaviour (figure 5.6a).

(a) [circuit: an ARBITER grants the RDin1 and RDin2 request events access to a Call in arrival order, forming the Arbitrated.Merge process]

(b) the corresponding occam process:

PROC Arbitrated.Merge(CHAN OF BUNDLE RDin1, RDin2, RDout,
                      CHAN OF ACK Ain1, Ain2, Aout)
  -- data.delay : time taken to process the data
  -- ack.delay  : time taken for the acknowledge signal to propagate
  SEQ
    clock := 0
    WHILE TRUE
      SEQ
        ALT                          -- arbiter
          RDin1 ? in.data
            SEQ
              -- process data
              out.data(timestamp) := max(in.data(timestamp), clock) + data.delay
              RDout ! out.data
              Aout ? ack
              ack(timestamp) := ack(timestamp) + ack.delay
              clock := ack(timestamp)
              Ain1 ! ack
          RDin2 ? in.data
            SEQ
              -- process in.data in a similar way as above
              ...
:

Figure 5.6: Arbitrated Merge

In the proposed modelling approach, an arbiter is modelled by the occam ALT construct. The order in which the ALT construct will consume messages in the simulation model does not adhere strictly to the order in which events arrive at the corresponding arbiter in the physical circuit; it depends merely on the order in which the corresponding input occam channels fire, and not on the timestamps of simulated time that these messages carry. Therefore, messages may be consumed by the ALT construct out of timestamp order, thus violating the local causality constraint.

This violation does not affect the correct functionality of the model; the very presence of an arbiter in the design implies that the order of consumption may be arbitrary. However, it introduces an error in the simulated time and, consequently, in the values obtained during the evaluation of the simulated system.

Despite this error introduced in simulated time, the characteristics of the simulated asynchronous systems suggest that the inaccuracy of the obtained results will be limited and indeed tolerable at this high level of simulation. The local clocks of communicating processes will become too skewed (and thus the timing error due to a preemption large) only if the ALT construct selects the same source channel a large number of times before accepting a message from the other one (assuming that both the corresponding source processes produce request messages). The self-regulating nature of asynchronous systems, with Micropipelines acting as throttles, will however balance the throughput of the occam processes, thus preventing the local clocks from becoming too skewed. The results presented in chapter 8 confirm this claim; a similar approach adopted for the simulation of dataflow architectures has produced comparable results [Neto91]. Chapter 9 discusses an alternative technique for modelling arbiters, so that preemptions are avoided.

Figure 5.6b presents a description of the operation of an arbitrated merge process (an arbiter process). The value of the local clock variable is the timestamp of the acknowledge message forwarded by the arbiter process to the source process which issued the last request message. If tk is the timestamp of the Rk request message currently arriving at the arbiter process on one input link, and tk−1 is the timestamp of the last request message that arrived at the arbiter process on the other input link, the following three cases may be distinguished:

1. tk >= clock. This case indicates that the arbiter process has been idle for tk − clock.

2. tk−1 <= tk < clock. This indicates that when Rk was issued, the arbiter process was busy processing the Rk−1 message and was thus unable to consume and process it immediately (an overload situation for tk − tk−1).

3. tk−1 > tk. This manifests the occurrence of a preemption, as Rk should have been processed before Rk−1. In this case, the current value of the clock is used by the arbiter process to calculate the timestamp of the output request message, thus ensuring that messages are issued on the output channel in increasing timestamp order (i.e. preemptions are prevented from propagating further in the simulation model).
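As a numerical illustration (values chosen arbitrarily): if clock = 20 and tk−1 = 18 when a request with tk = 15 arrives on the other link, case 3 applies; the output message is stamped max(15, 20) + data.delay = 20 + data.delay, so the timestamp error remains local to the arbiter process instead of propagating downstream.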

5.3.6.4 Delay Independence

As explained in section 5.3.1.1, an asynchronous system may be tested for deadlocks by changing the order in which events are issued to arbiter processes in the simulation model. In a simulation approach where simulated time is the synchronizing force, it is the actual simulated time delays within the processes of the model which need to be modified to change the sequence in which events will occur in the simulation model. In the proposed modelling methodology, however, the order in which an arbiter process consumes messages is completely independent of the timestamps of the messages. Hence, changing the simulated time delays of the occam processes would have no effect on the ordering of events in the model.

In this case, the ordering of events may be changed by using occam “Timers” to alter the order in which processes are scheduled, as discussed in section 2.6.1.3; this will change the order in which processes execute and produce messages. Although “Timers” do not allow full control of the process scheduling mechanism, as the time for which a process can be delayed is only approximate, this approach still allows the occam model to be used for testing the delay independence of the simulated system. By using different benchmark programs, different paths of the design may be activated. By altering the order in which occam processes are executed for a particular benchmark, the probability of an undetected deadlock condition is reduced.
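As an illustration of this technique, the following sketch (the process name and the per-run perturbation parameter are illustrative, not part of the original model) shows a register process which deschedules itself with a Timer before forwarding its data; varying perturb.delay between simulation runs changes the order in which competing processes reach an arbiter process:

PROC Perturbed.Register(CHAN OF BUNDLE In, Out, VAL INT perturb.delay)
  TIMER tim:
  INT now:
  WHILE TRUE
    SEQ
      In ? Data
      tim ? now
      tim ? AFTER now PLUS perturb.delay  -- deschedule for at least perturb.delay ticks
      Out ! Data
:

Because the scheduler only guarantees that the process is not resumed before the deadline, the reordering obtained in this way is approximate, as noted above.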

5.4 Summary

This chapter has provided an overview of existing notations and techniques for modelling asynchronous hardware systems; emphasis has been placed on the techniques that employ CSP-like notations. A modelling approach has been introduced which employs occam as a description language. Finally, the causality problems due to the distributed nature of the proposed modelling approach have been discussed. The next chapter describes how the proposed modelling approach has been employed to develop an occam model of the AMULET1 microprocessor.

Chapter 6

Occarm: An Occam Model of AMULET1

6.1 Introduction

To investigate the suitability and applicability of the approach described in the previous chapter for modelling large and complex asynchronous hardware systems, occarm¹, an occam simulation model of the AMULET1 processor, has been developed.

When the modelling work commenced, AMULET1 was in the early stages of the design process. A model of AMULET1 had already been developed using Asim, ARM Ltd's in-house simulation language [ARM]. This is a conventional, sequential, discrete event simulation model which describes AMULET1 at a mixed gate/Register Transfer level.

The construction of occarm proved a challenging task. The complexity of AMULET1 introduced a number of design problems which forced the designers to sacrifice the purely asynchronous operation of the system in a few instances,

¹ The name of the model is derived from the combination of the words occam and ARM.

in their effort to make the construction of the chip feasible and efficient; this presented problems for the modelling of the system using an asynchronous language such as occam.

This chapter discusses the basic functionality and structure of occarm; emphasis is given to the modelling techniques developed to address the problems imposed by the non-asynchronous operation of certain parts of the processor. A description of the internal organization of the occam processes which model the major components of AMULET1's control logic is provided in appendix B.

6.2 Occarm General Structure

Occarm consists of more than fifteen thousand lines of occam code and describes AMULET1 at the Register Transfer Level. It executes ARM6 machine code produced by a standard ARM compiler. Instructions enter the simulator as 32-bit quantities in hexadecimal format. Instruction decoding is performed by means of PLA models implemented as two dimensional arrays of boolean values; the model makes use of a library of occam functions developed to allow instructions to be treated both as integer values and as one dimensional boolean arrays.

Occarm has been implemented as a hierarchy of occam processes, with each process modelling a different functional module of AMULET1. Its top level process structure graph is depicted in figure 6.1. The AddInt and DatInt processes model AMULET1's address and data interface units respectively. The datapath is modelled by four processes, namely Decode1, Decode2, Decode3 and RegBank. Decode1 describes the primary decode unit, while Decode2 and Decode3 model the two major components of the execution unit of the processor (see section 4.6.2.4). The RegBank process incorporates the functionality of the register bank. WrtCtrl models the operation of AMULET1's write bus control logic (see section 4.6.2.4).

[Figure: the top-level occarm process graph — AddInt, DatInt, Decode1, Decode2, Decode3, RegBank and WrtCtrl, connected to memory and to each other; single channels model wires, while pairs of Req/Data and Ack channels model bundles]

Figure 6.1: Occarm Top Level Process Graph
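As an indication of the flavour of the instruction-handling library mentioned above (the function name and form below are illustrative assumptions, not the actual occarm code), a helper extracting bit n of an instruction word held as an INT might be written as a single-line occam function:

BOOL FUNCTION instr.bit(VAL INT word, n) IS ((word >> n) /\ 1) = 1:

Predicates of this kind allow the PLA models to be indexed directly with fields of the instruction word.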

All the registers of AMULET1 have been modelled using the generic register model described in section 5.3.4, with interprocess communication being performed using pairs of request/data and acknowledgement channels.

PROC buffer()
  SEQ
    WHILE TRUE
      SEQ
        in ? data
        out ! data
:

Figure 6.2: The Buffer Process

6.2.1 Non-Bundled Signals

To achieve code compatibility with its synchronous counterpart, it was necessary to include in AMULET1 a number of control signals which are not part of a bundled communication and do not obey the protocol specified by the Micropipelines framework. These signals are transmitted via simple wires. The modelling of these signals by simple occam channels may lead to deadlocks in the simulation model due to:

1. The communication semantics of occam, and

2. The direction of the signals.

In AMULET1 the simple wires are used to send information to previous stages of the pipeline, thus forming closed paths (loops). The synchronous communication supported by occam forces the processes in the loop to block waiting for each other, a situation that is susceptible to deadlock. To overcome this problem, in occarm these signal wires have been modelled as buffer processes which hold the value of the signal at any particular moment. The buffer decouples the processes involved in the wire communication, thus eliminating the deadlock susceptible behaviour. The size of the buffer for each signal is one; this is adequate since the value on the corresponding wire will change only once for each interaction of the processes. The buffer process is shown in figure 6.2.

[Figure: the address interface — the PC loop (MAR, Incrementer and PC Holding Latches), the two-stage PC Pipe supplying R15 (PC+8) values to the primary decode, and the LSM loop, with an arbitrating multiplexer admitting write bus (WBus) and APipe (ABus) addresses to the MAR]

Figure 6.3: The Address Interface

6.3 The Address Interface

Before the modelling of the address interface unit is discussed, a description of its operation is provided.

6.3.1 The Address Interface Internal Organization

The internal organization of the address interface unit of AMULET1 is depicted in figure 6.3. The unit consists of three main components, namely the prefetching loop (also referred to as the PC loop), the PC Pipe and the Load/Store Multiple loop (referred to as the LSM loop).

6.3.1.1 The PC Loop

The PC loop is the instruction prefetching mechanism of AMULET1. It consists of the Memory Address Register (MAR), the Incrementer and two PC Holding Latches, connected to form a ring as illustrated in figure 6.3. The processor begins execution upon the deactivation of the hardware reset signal, whereupon MAR's output byte address is forced to zero. This is the initial value of the Program Counter (PC) and the first address to be sent to memory. Subsequently, the PC value circulates around the prefetching loop, incrementing by four each time it passes through the Incrementer. For each cycle of the PC, the next sequential address appears at the output of the Memory Address Register and is sent out to memory, together with the associated control information as described in section 4.6.1.

This operation may be temporarily interrupted by branch target or data transfer addresses produced and sent to the address interface by the execution unit of the processor. These addresses arrive at the address interface asynchronously with respect to the operation of the prefetching loop; therefore, arbitration is required to resolve conflicts which might occur as the respective event streams attempt to gain control of the Memory Address Register.

Whenever a branch target address is produced by the execution unit, it appears on the write bus (WBus) of the processor. After an arbitration phase, it eventually gains control of the multiplexer and is forwarded to the Memory Address Register. From there, it is sent out to memory and the Incrementer to become the new PC value of the prefetching loop. The old PC value, which has been waiting in the PC Holding Latches, is discarded as invalid as it passes through the control circuitry at the input of the Memory Address Register; the mechanism used in AMULET1 to discard invalid events is described in section 6.5.

A data transfer address is supplied either by the execution unit on WBus or from the APipe on the ABus, depending on the instruction and the addressing mode it uses ([Pave94], Chapter 4). A similar procedure to that of a branch operation is followed, but here the data address is not permitted to take control of the prefetching loop; it is discarded just before entering the Incrementer, allowing the PC loop to continue its normal operation.

The presence, as well as the number, of the PC Holding Latches have been dictated by the need to resolve potential deadlock situations; these might arise as a result of a data transfer request gaining control of the arbiter immediately after a branch target address has been forwarded to the Memory Address Register, and before the old PC has been discarded. If no, or only one, holding latch were included, this sequence of events would prevent the old PC from being discarded and, consequently, the new PC from circulating around the incrementing loop.

6.3.1.2 The PC Pipe

The ARM instruction set specifies that the PC may be used as an operand. In the synchronous ARM architecture, the PC is made available to the programmer as the general purpose register R15 (see appendix A). The value of R15 is taken directly from the prefetch unit of the ARM, and in most cases is PC+8, where PC is the current contents of the Program Counter. This value reflects the depth of the execution pipeline of the ARM6; the three stages of the pipeline (i.e. fetch, decode and execute) cause the value of R15 to be 8 bytes (i.e. two instructions) ahead of the address of the instruction being executed.

In the synchronous implementation, the depth of prefetching is deterministic and the value of R15 at the moment when the instruction reaches the ALU to be executed is well defined. In the asynchronous environment of AMULET1, however, the amount of prefetching at any particular moment is nondeterministic. Consequently, it is not possible to define a relationship between the PC in the prefetching loop and the value of R15 associated with a particular instruction.

The technique adopted in the design of AMULET1 to overcome this problem, and thus to achieve code compatibility with ARM6, is to explicitly supply the required PC value from the prefetching loop to the execution unit. This is achieved by means of a two stage Micropipeline, the PC Pipe, whose input and output are connected to the prefetching loop and the processor datapath respectively. For each cycle of the prefetching loop, a copy of the circulating value is provided to the PC Pipe, through which it is sent to the primary decode to pair up with its associated instruction arriving from memory (see section 6.5); it then accompanies the instruction through the datapath to the execution unit, to be used as the R15 value if required.

6.3.1.3 The LSM Loop

The LSM loop comprises the Incrementer, the MAR and the LSM register, as depicted in figure 6.3. It is activated during the execution of load/store multiple instructions. These instructions provide for the transfer of multiple data values to and from consecutive memory locations (see appendix A). During their execution, only the initial data transfer address (i.e. the base address) is sent to the address interface via the write bus. Once the base address gets control of the arbiter, it is forwarded through the MAR to memory. Unlike the transfer of single data items, however, the LSM base address is also sent to the Incrementer, to generate the next sequential transfer address. This address is sent to memory through the LSM register and the cycle is repeated.

The operation of the LSM loop is controlled by the primary decode (see section 6.6). Before initiating a cycle to produce the next sequential data address, the address interface waits for a signal from the primary decode. This signal indicates whether this is the last cycle, in which case no further incrementing takes place, and, for load instructions, whether the value to be loaded is intended to be the new PC. If a new PC transfer is signalled, then the value being loaded will eventually arrive at the address interface to become the new PC value of the PC loop.

To take advantage of cache memories and fast sequential modes of DRAM, the LSM loop must not be broken during the transfer operation, to ensure that the data addresses sent out to memory refer to sequential locations. This is achieved by blocking the PC loop during the transfer and allowing it to resume its operation when the last LSM cycle completes. Consequently, no arbitration is required for the LSM address to gain access to the Memory Address Register.

6.3.2 The Address Interface Occam Model

The asynchronous, time-independent operation of the address interface, as well as its well-defined handshake interface to its peer functional modules, make its description using occam straightforward.

The process structure of the address interface occam model (i.e. the AddInt process) is shown in figure 6.4². Replicated register processes are used to implement the pipelines of the unit, while two extra occam processes, namely AddC³ and Incrementer, are included to model the functionality of the control circuits. The interconnection pattern of the processes in the model provides for the formation of the PC and LSM loops of the address interface.

The MAReg register process models the Memory Address Register of the address interface unit, and makes use of two output channels to forward each new PC value in the PC loop to memory and the Incrementer respectively. The execution of the AddInt process, and indeed of occarm, commences with the MAReg register process firing its output channels, and thus issuing address 0; this causes instruction prefetching to begin at the corresponding memory location.

² Within the context of this thesis, hereafter, in figures which illustrate occam process graphs, register models are depicted as rectangles, while processes which model control logic are shown as squares with rounded edges.

³ See also figure 4.12.

[Figure: the AddInt process graph — the MAReg, LSMreg, PC Holding Latch, APipe and PC Pipe (PC0, PC1) register processes, together with the AddC and Incrementer control processes; the PCch and Wch channels carry PC loop and write bus addresses to AddC, while Wlx and LsmTrm connect AddInt to the write control and the primary decode respectively]

Figure 6.4: The Address Interface Model (AddInt)

The acknowledgement message to the MAReg process is sent via the Incrementer, after the synchronization of the corresponding acknowledge messages from the memory and the PC loop takes place. The Incrementer process also incorporates the logic which prevents data addresses from taking control of the prefetching loop (see section 6.3.1.1).

The PC Pipe has been modelled using two register processes, namely PC0 and PC1. In AMULET1 the data bus at the input of the PC Pipe is also connected to the input of the MAR, thus allowing the PC value entering the PC Pipe to appear also at the input of the MAR; the acknowledgement issued from the first register of the PC Pipe is directed back to the beginning of the prefetching loop, to be used as a request event to the arbiter and, eventually, to the Memory Address Register. To emulate this behaviour in occarm, the acknowledgement message issued by PC0 upon receiving a new PC value carries a copy of this value, allowing the corresponding channel to be treated as a request/data one.

The main function of the AddC process is to model the arbitration performed on the PC loop and the write bus data streams; this is achieved by employing an occam ALT construct to select arbitrarily one of the PCch and Wch channels. Messages arriving on the PCch are either forwarded to MAReg or discarded (e.g. if they immediately follow a branch target address). The Wlx channel is used to synchronize the address interface process with the process modelling the write bus control logic (see section 6.9); a signal is issued on the Wlx channel in parallel with the acknowledgement message to the write bus control process.

The LsmTrm channel connects AddInt with the primary decode model (Decode1) and is used during load/store multiple operations to transmit the LSM loop terminating signal, as described in section 6.3.1.3 (see also section 6.6).
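A minimal sketch of the arbitration performed by AddC (the process and channel names follow figure 6.4, but the body, the MARch channel and the pc.valid test are illustrative assumptions rather than the actual occarm code) might take the following form:

PROC AddC(CHAN OF BUNDLE PCch, Wch, MARch, CHAN OF ACK Wlx)
  WHILE TRUE
    ALT                    -- arbiter: PC loop versus write bus
      PCch ? pc.data
        IF
          pc.valid         -- discard PCs which immediately follow a branch target
            MARch ! pc.data
          TRUE
            SKIP
      Wch ? wbus.data
        SEQ
          MARch ! wbus.data
          Wlx ! any        -- synchronize with the write bus control process
: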

6.4 The Data Interface

Figure 6.5 depicts the process graph of the occarm DatInt process, whose structure directly reflects the organization of the data interface unit of AMULET1 described in section 4.6.2.2. The synchronization of address messages arriving from the address interface (AddInt - MAReg) process, and data messages from the execution unit during write operations, takes place in the MemCtrl process. The synchronized values are then issued to memory.

The byte replication logic of the data interface, described in section 4.6.2.2, is incorporated in Dout, and has been modelled by a separate process, namely DataOut. DataOut is a single input, single output process, whose functionality may therefore be incorporated into either the source or destination processes. Such a configuration would eliminate the communication overhead introduced by the input and output channels of the DataOut process. However, in order to follow a consistent modelling strategy and maintain the generality and portability of the register processes, DataOut has been modelled as a separate process, with the acknowledgement channel bypassing it.

[Figure: the DatInt process graph — the MemCtrl process synchronizes addresses from the address interface with data from the execution unit and feeds the memory and the Memory Control Pipeline (MemCP); returning values pass from the MRReg register through DestCtrl to the IPipe or, via DataIn/DInCtrl, to the write control, while outgoing data passes through Dout/DataOut]

Figure 6.5: The Data Interface Model (DatInt)

The same methodology has been adopted for the modelling of the DataIn module, with the DInCtrl process incorporating the byte rotating and mask logic (see section 4.6.2.2).

For each data address sent out to memory, the MemCtrl process sends a message containing control information to the Memory Control Pipeline (MemCP). When the memory receives a read address, it responds by sending the corresponding data to the Memory Read Register (MRReg), which is then forwarded to the Destination Control (DestCtrl). Messages arriving from the Memory Read Register are synchronized with their corresponding control extracted from MemCP. DestCtrl uses the control information to direct incoming data to its correct destination. Instruction messages are sent through the Instruction Pipe (IPipe) to the primary decode, to be decoded and forwarded for execution. Data values are directed to the write control, either to be forwarded and stored in the register bank, or to be sent to the address interface to interrupt the prefetching loop and be used as the new PC; the latter takes place in the case of load register and data processing instructions with R15 as destination register (see appendix A).
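As an illustration of the routing performed by DestCtrl, the following sketch (the channel names and the control test are illustrative assumptions, not the actual occarm code) pairs each returning memory value with its saved control word and directs it accordingly:

PROC DestCtrl(CHAN OF BUNDLE MRRegCh, MemCPCh, IPipeCh, WrtCh)
  WHILE TRUE
    SEQ
      PAR
        MRRegCh ? data         -- value returned by the memory
        MemCPCh ? ctrl         -- control stored when the address was issued
      IF
        ctrl = instruction     -- instruction fetch: send to the IPipe
          IPipeCh ! data
        TRUE                   -- data value: send to the write control
          WrtCh ! data
: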

6.5 Instruction Flow Control

Before the modelling of the datapath of AMULET1 is described, it is pertinent to provide a short description of the flow of information in the AMULET1 processor. A more detailed discussion of the issues involved may be found in [Pave94], pp. 92-97.

In AMULET1, instructions arriving from memory through the data interface rendezvous with their associated R15 value extracted from the PC Pipe, whereupon they enter the datapath for execution. There are three cases wherein the execution of an instruction which has entered the processor will not take place:

• The instruction fails its condition codes in the ALU.

• The instruction follows a branch that is taken.

• An exception occurs before the instruction reaches the ALU.

The following sections examine these cases.

6.5.1 Condition Code Evaluation

To minimize the number of branches in compiled code, all ARM instructions are conditionally executed⁴. Their execution depends on the outcome of the comparison between the Current Processor Status Register (CPSR) arithmetic flags (Negative, Zero, Carry and oVerflow) and the condition field of the instruction word (bits [31:28], see appendix A). Typically, the majority of a program's instructions will use the “always” condition, which specifies unconditional execution.

The test of the condition flags is performed by the CPSR unit of AMULET1 (see section 6.8), in parallel with the operation of the shifter and multiplier units in the first stage of the execution pipeline. The result of the comparison is used by the execution unit to invalidate and discard the corresponding instruction. For discarded instructions which have locked their destination registers, an invalid request event is issued to the register bank to unlock them (see section 6.7).

6.5.2 Branch Execution

According to the behaviour described in the previous section, only when a branch reaches the ALU will a decision be made as to whether or not it will be taken. By that time, a number of instructions will have been prefetched and entered the processor. Since the prefetch unit is completely autonomous and decoupled from the rest of the processor, the exact number of the prefetched instructions is nondeterministic and therefore unpredictable⁵. However, prefetched instructions following a taken branch must be discarded before instructions from the branch target are executed.

Therefore, the processor must distinguish between instructions originating from the branch target, which may thus be executed, and instructions already prefetched when the branch was taken, which must therefore be thrown away. This is achieved by “colouring” the state of the processor at any particular moment. The colour of the processor, referred to as the PCcol, is a single bit flag which changes every time a transfer of control takes place in the processor. Each instruction address issued to memory carries the current operating colour of the processor, which will be used to mark the corresponding fetched instruction. When a branch is taken, the colour of the processor changes, causing a change in the colour of instructions subsequently fetched from the branch target.

The colour bit of an instruction which arrives at the datapath for execution is compared with the current colour of the processor. If a match is found, the instruction belongs to the current valid instruction stream and is thus executed; otherwise it is discarded.

The colour of the processor changes at the ALU, after it has been decided that the branch passes its condition codes and hence will be taken. Instructions subsequently arriving at the ALU, not matching the processor's colour, are discarded until an instruction from the new valid instruction stream (i.e. the branch target) is encountered.

To make the colour mechanism more efficient in terms of time spent and power consumed during the rejection of invalid instructions ([Pave94], pp. 95-96), each time the operating colour changes at the ALU, the new colour is also sent to the primary decode to allow colour checking at the top of the datapath and, thus, to prevent invalid instructions from entering the execution unit. The colour is transmitted to the primary decode by means of a simple wire (referred to as the PCcol signal) with the aid of an arbiter; the arbitration circuitry is described in section 6.6.1.

⁴ A more detailed analysis and justification of the conditional execution of instructions in ARM is provided in [Furb89], pp. 229-230.

⁵ The depth of the prefetching depends on the precise point at which the interruption of the PC loop by the branch target address takes place.

6.5.3 Exception Handling

To comply with the definition of the ARM architecture, AMULET1 supports four types of exact exceptions [Furb89] [VLSI90]:

• Software Interrupts

• Instruction Prefetch Aborts

• Hardware Interrupts, and

• Data Transfer Aborts

In AMULET1, the exception entry routine is initiated by the primary decode and is completed in three cycles. The first cycle is used to send the exception vector address to memory. The exception vector address is generated by the primary decode once it detects the exception, and is forwarded through the ALU to the address interface to become the new PC of the prefetching loop.

The specification of the ARM architecture requires that the processor should be able to restart the execution of the interrupted instruction, once the exception entry routine has been completed and the cause of the exception has been removed. To make the resumption of the execution of the interrupted program feasible, the state of the processor when the exception occurred needs to be saved; this includes the Current Processor Status Register and the return address of the instruction that was about to execute. The preservation of the processor status is performed during the remaining two cycles of the exception entry. The second cycle copies the CPSR from the ALU into the SPSR of the corresponding exception mode (see appendix A), while the third cycle is responsible for saving the instruction return address.

Exceptions result in a change of the operating colour of the processor. The change in colour takes place in the ALU, when it detects the occurrence of the exception. Once the colour changes, a similar behaviour to the execution of branch instructions is exhibited, with prefetched instructions subsequently arriving at the ALU being discarded, while the new colour is propagated back to the primary decode to prevent instructions from entering the datapath.

6.5.3.1 Software Interrupts

Software interrupts occur when executing the SWI (SoftWare Interrupt) or an undefined instruction (see appendix A). The interrupt is detected at the primary decode by examining the bit pattern of the instruction's opcode. The execution of the exception entry routine commences immediately after the detection of the interrupt, with the generation of the exception vector address. The required return address is obtained from the R15 value of the instruction, extracted from the top of the PC Pipe.

6.5.3.2 Instruction Prefetch Aborts

Instructions which have caused a prefetch abort are marked as such by means of the prefetch abort flag (Pabt). This is a single control bit, generated by the memory system and copied into the instruction pipeline together with the aborted instruction. When the instruction enters the primary decode, the Pabt flag is detected, causing the instruction data to be ignored and the exception entry routine to be entered. As in the case of software interrupts, the return address is a modified version of the R15 value of the aborted instruction arriving from the PC Pipe.

6.5.3.3 Hardware Interrupts

Hardware interrupts enter the processor via an arbiter in the primary decode and are detected by examining the level of the interrupt signals FIQ and IRQ just before each new instruction arriving from the Instruction Pipe is allowed to enter the datapath. If an interrupt is detected, the instruction at the top of the IPipe, waiting to enter the datapath, is usurped and its execution is replaced by the exception entry routine. As in the case of software interrupts and instruction prefetch aborts, the return address is provided by the R15 value of the usurped instruction, extracted from the PC Pipe.

If the instruction immediately preceding the detection of interrupts is a branch which is taken, however, then the execution of the exception entry routine is postponed until the first instruction from the branch target arrives at the primary decode. The postponement of an exception entry following a branch is necessary to ensure that the R15 value obtained from the PC Pipe is the correct return address (i.e. the branch target address); it is achieved by forcing the exception entry to adopt the colour of the usurped instruction. This causes the exception entry to be rejected repeatedly at the ALU until an instruction from the branch target with the correct colour arrives. To ensure correct operation, the interrupt signals remain active until the exception entry becomes valid.

Hardware interrupts have not been implemented in the occarm model and therefore are not discussed further.

6.5.3.4 Data Transfer Aborts

The occurrence of data aborts is initially detected in the ALU rather than in the primary decode. When a data transfer instruction is executed, the ALU issues the data transfer address to memory and then blocks until a response arrives from the Memory Management Unit (MMU) to indicate whether the transfer aborted; the MMU's response is sent by means of the Dabt0 and Dabt1 signals described in section 4.6.1.

[Figure: a) the Exception Pipe (XPipe), fed with R15 values from the PC Pipe and emptying, under the control of the primary decode and the ALU, into the XLatch; b) the aborts signalling circuitry, in which the Dabt0/Dabt1 events from memory and the AIabt event from the ALU steer the exception PC into, or away from, the XLatch]

Figure 6.6: The Exception Pipe

If a data abort is signalled, the ALU changes the operating colour of the processor and sends it back to the primary decode; the detection of the new colour enables the primary decode to realize that a data abort has taken place and hence to initiate the exception entry routine (section 6.6.1).

Unlike the three aforementioned exceptions, in the case of data aborts the return address cannot be taken directly from the PC Pipe. The PC that happens to be at the top of the PC Pipe when the data abort is detected by the primary decode is not the R15 value of the aborted instruction; rather, it is associated with the prefetched instruction arriving from the Instruction Pipe at that particular moment. Since the amount of prefetching is nondeterministic, the R15 value of the aborted instruction cannot be computed from the PC extracted from the PC Pipe.

To preserve the return address of instructions which may cause a data abort, an extra pipeline, referred to as the Exception Pipe (XPipe), has been incorporated into AMULET1; here, the R15 values of instructions which have the potential to generate a data abort are stored. The interaction between the XPipe and the rest of the processor is depicted in figure 6.6a. Each time the R15 value of a data transfer instruction is extracted from the top of the PC Pipe, to be sent to the primary decode, it is also copied into the XPipe. The end of the XPipe (entry point) is controlled by the primary decode; the PC value is placed on the data bus, but it is latched by the XPipe only after the decoding of the corresponding instruction indicates that it is a data transfer instruction, whereupon the primary decode issues a latch request to the XPipe.

If a data abort occurs, the value at the top of the XPipe, which will be the address of the aborted instruction, is sent via the XLatch to the primary decode to be used as the return address; if the data transfer is successful, the value is simply discarded before it enters the XLatch. The exception PC is also discarded, at the initiative of the ALU, if the corresponding instruction will not be executed as a result of it following a branch or failing its condition codes. The interaction between the XPipe and the XLatch is controlled by the abort signalling circuitry which is depicted in figure 6.6b; in the figure, AIabt represents the event issued by the ALU to signal the discarding of the exception PC.

Figure 6.7 illustrates the interaction between the processes in the occarm model with regard to data aborts. The PC from the PC Pipe is sent to the primary decode (Decode1) and, if necessary, is forwarded to the XPipe. Two separate channels are used to read the PC, either from the PC Pipe or the XLatch. Similarly, two pairs of channels are used for the abort signals produced by the memory, while an extra channel models the AIabt signal issued by the ALU. The XPtoXL process controls the input to the XLatch, either discarding or forwarding the exception PC; it is implemented using an ALT construct which is completely deterministic, since only one of the abort channels will fire for any particular instruction.

[Figure: the occarm process structure for data aborts — the XPipe and XLatch processes sit between Decode1, Decode3 and the memory, with buffer processes on the Dabt0, Dabt1 and AIabt signal paths and the XPtoXL process selecting among them]

PROC XPtoXL()
  SEQ
    ALT
      Dabt0 ? signal
        SEQ            -- no abort
          ...
      Dabt1 ? signal
        SEQ            -- abort!!
          ...
      AIabt ? signal
        SEQ            -- no abort
          ...
:

Figure 6.7: Aborts Modelling

6.6 The Primary Decode

Conceptually, the primary decode unit of AMULET1 may be divided into two different sections, each performing a distinct function. In occarm, these sections are modelled as two separate processes, Dec1CtrlA and Dec1CtrlB; these form parallel subprocesses of the Decode1 process and are connected together as depicted in figure 6.8. Decode1 also incorporates the functionality of the Immediate Pipe of AMULET1, described in section 4.6.2.2.

[Figure: the Decode1 process structure — Dec1CtrlA receives instructions from the IPipe and R15 values from the PC Pipe and activates Dec1CtrlB and RdGen; the ImmPipe, RegA, RegB and ImmExt register processes feed the register bank, the address interface and the execution unit (Decode2, Decode3), with the PCcol, ALUgo, RBch, RGch, XP and LsmTrm channels providing the remaining connections]

Figure 6.8: The Primary Decode Model (Decode1)

6.6.1 The Dec1CtrlA Process

The modelling of the Dec1CtrlA logic demonstrated the power and usefulness of occam as an asynchronous hardware description language, as it identified parts of AMULET1 whose correct operation is not guaranteed by well-defined asynchronous interfaces, but depends heavily on the timing characteristics and relevant delays of the particular functional modules involved.

The basic circuitry modelled by the Dec1CtrlA process is depicted in figure 6.9. This incorporates the logic for discarding instructions based on the colour information received from the ALU; it is also responsible for issuing the exception vector address and extracting the return address if an exception occurs.

[Figure — Part (a): the arbitration hardware, in which instruction (Ir) and PC (PCr) events are synchronized at a Muller-C element and compete with the PCcol signal from the execution unit for an arbiter; the SPCcol flag, Select S1 and Xors X1/X3 implement the colour comparison and the generation of the Iack/PCack acknowledgements. Part (b): the data abort detection logic, in which the XLr request from the XLatch is captured by the L1 latch and used, via Select S2 and Xors X2/X4, to flag a data abort towards Dec1CtrlB]

Figure 6.9: Dec1CtrlA Logic

6.6.1.1 Modelling of the Arbitration logic

In figure 6.9, part (a) of the circuit includes the arbitration hardware for the detection of the PCcol signal sent by the ALU⁶. The detection takes place just before a new instruction from the IPipe is let through to be decoded and executed. Instructions from the IPipe (Ir) are synchronized with their associated R15 value (PCr) at the Muller-C element; the Lreq event indicates that the arbiter has been locked by the instruction stream (the first Lreq is generated by the reset signal Cdn). The value of SPCcol indicates the current operating colour of the processor as far as the primary decode is concerned. Whenever a new instruction arrives, its colour Icol is compared with the SPCcol (Xor X1); the bundle constraints guarantee that Icol will have a stable value and the comparison will have completed before the Muller-C generates its output event. If the two colours are the same, the instruction is let through to Dec1CtrlB to be decoded (Select S1); once the decoding is complete and the instruction has been forwarded to subsequent stages of the datapath for further processing, the acknowledgement event Pack is generated and sent to the IPipe and PC Pipe (Iack and PCack respectively). If a mismatch is found, the “True” path is followed and Pack is immediately generated, thus discarding the incoming instruction. The Pack signal is also used to unlock the arbiter. At this stage, if a new PCcol has been issued by the ALU and is pending, it is allowed to take control of the arbiter, hence changing the SPCcol bit; the arbiter is immediately unlocked to let more instructions enter the datapath.

An initial attempt to model the arbiter using an ALT statement revealed the situation illustrated in figure 6.10. A stream of instructions whose colour matches the current operating colour of the processor (“1”) is forwarded through the primary decode to be executed. The stream includes a branch instruction which, passing its condition codes in the ALU, changes the operating colour of the processor to “0”. The branch target address is sent to the address interface to initiate prefetching of the new instruction stream while, in parallel, the new colour is sent back to the primary decode. If the change of the value of the PCcol wire is detected in the primary decode while incoming instructions still belong to the old stream (figure 6.10a), correct operation is ensured; the remaining “old” instructions will be discarded as invalid while the instructions of the branch target will be let through to be executed. However, if the detection of the new PCcol takes place after the first instructions from the branch target have entered the datapath, then a number of valid instructions will be discarded (figure 6.10b).

⁶ In AMULET1, this logic involves two extra arbiters for the hardware interrupt signals FIQ and IRQ, but since hardware interrupts are not supported in occarm, these arbiters have not been included in the figure.

[Figure: two scenarios of an instruction stream flowing through Dec1CtrlA after a branch changes the ALU colour from "1" to "0": a) Correct Operation, where the new PCcol is detected while old-colour instructions are still arriving, so only invalid instructions are rejected; b) Rejecting Valid Instructions, where the new PCcol arrives after branch target instructions have already entered the datapath, so valid instructions are discarded.]

Figure 6.10: Detecting the PCcol

The timing characteristics of the current design of AMULET1 ensure the correct operation of the system, because the propagation delay on the PCcol wire is less than the latency of the "ALU-memory-primary decode" pipeline. Thus, the primary decode is notified of the new colour before any instruction from the branch target arrives from the IPipe. However, any attempt to experiment with alternative designs of AMULET1 (e.g. decreasing the number of stages in the pipeline) might alter the balance of the respective propagation delays, thus affecting the correctness of the system. Clearly, the asynchronous, nondeterministic nature of occam cannot guarantee that the PCcol channel will fire before the colour of instructions on the Ir channel changes. An alternative modelling of the arbitration hardware which guarantees correct operation is depicted in figure 6.11. This model eliminates the time dependent behaviour of the PCcol mechanism by keeping a record of the colour of the last instruction which entered the system (Previous.Icol).

PROC Dec1CtrlA()
  SEQ
    PAR
      PAR
        Ir ? Instruction               -- Read Instruction and Control(Icol)
        PCr ? PC                       -- Read PC from PC Pipe
      PRI ALT                          -- Sample PCcol channel
        PCcol ? SPCcol
          SKIP
        TRUE & SKIP
          SKIP
    IF
      Icol <> Previous.Icol            -- New instruction stream
        SEQ
          IF
            SPCcol = Previous.Icol     -- ERROR!! New PCcol should have arrived
              PCcol ? SPCcol           -- Read it NOW!!!
            TRUE
              SKIP
          Previous.Icol := Icol
      TRUE
        SKIP
    ...
:

Figure 6.11: Modelling Dec1CtrlA Arbitration Logic

If the instruction stream changes before the PCcol channel has fired, the process blocks and waits for the new PCcol; an ALT is still used to sample the PCcol channel so that the performance of the system remains high.

6.6.1.2 Detecting Data Aborts

The rest of the logic of Dec1CtrlA (part (b) in figure 6.9) deals with the detection of data aborts. The primary decode needs to distinguish between the different types of exceptions in order to issue the appropriate vector address. Hardware and software interrupts, as well as instruction prefetch aborts, are detected directly by the control logic of the primary decode. This is not the case with data aborts, however; these are initially detected by the ALU, which changes the operating colour of the processor and consequently the value of the PCcol signal (section 6.5.3.4). A change in the value of PCcol, however, is not adequate to inform the primary decode that a data abort has taken place, for the execution of a branch involves such a change too. The additional information required for the detection of data aborts is provided to the primary decode by the request event XLr of the XLatch (see also figures 6.7 and 6.6). The occurrence of an abort will result in the exception PC at the top of the XPipe being copied into the XLatch which, in turn, will generate a request event at its output. When, as a result of a colour change, PCcol takes control of the arbiter, the L1 latch is enabled to latch the XLr event. The latched value is used both as an event (XOR X2) and as a boolean (XOR X4) to indicate that a data abort has occurred. Here again, the timing characteristics of the system ensure that the XLr event will arrive while the latch is enabled. Occam cannot guarantee this behaviour, for the exact time at which the XLr channel fires is nondeterministic and therefore unpredictable. A simple ALT involving XLr (to sample its value) may yield the wrong result if an abort has occurred but the channel has not fired yet; a polling mechanism, whereby the ALT is included in a loop, might result in a livelock if the colour change was not due to an abort. To make the construction of the occarm model feasible, an alternative means is required to provide the Dec1CtrlA process with the necessary information and enable a decision on whether it should perform a read operation on the XLr channel. This information may become available from the ALU, since the reason why the colour of the processor changed is known at the ALU when the new PCcol is issued. In occarm, the reason for the change is sent by the ALU as an extra bit accompanying the colour information over the PCcol channel ("0" and "1" for branches and data aborts respectively).
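The scheme may be illustrated with the following occam sketch; the process names, the packing of the reason bit into an INT message and the use of plain INT channels are illustrative assumptions, not occarm's actual definitions:

VAL INT branch IS 0 :              -- colour change caused by a branch (assumed encoding)
VAL INT data.abort IS 1 :          -- colour change caused by a data abort

PROC send.pccol (CHAN OF INT PCcol, VAL INT colour, VAL INT reason)
  -- ALU side: the reason bit accompanies the colour over the PCcol channel
  PCcol ! ((colour << 1) \/ reason)
:

PROC abort.check (CHAN OF INT PCcol, XLr, vector)
  -- primary decode side: read XLr only when an XLr event is known to be coming
  INT msg, colour, reason, exception.pc :
  SEQ
    PCcol ? msg
    colour := msg >> 1
    reason := msg /\ 1
    IF
      reason = data.abort
        SEQ
          XLr ? exception.pc       -- safe: the XLatch will fire
          vector ! exception.pc
      TRUE
        SKIP                       -- a branch: no XLr event will arrive
:

Because the reason accompanies the colour over the same channel, the decision to read XLr is made only when an XLr event is certain to follow, so neither a speculative ALT on XLr nor a polling loop is needed.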

6.6.2 The Dec1CtrlB Process

Instructions whose colour matches the value of PCcol are forwarded, with their associated R15 values, to the Dec1CtrlB process to be decoded (see figure 6.8). The decoding is performed by means of a PLA model invoked by Dec1CtrlB, whereupon the appropriate control messages are produced and sent to their corresponding destination processes. The instruction's R15 value and the control information are sent to the register bank via the PCch and RBch channels respectively. Control messages for the two execution pipeline stages are sent via the InC2 (through the RegB register) and InC3 channels respectively. The ALUgo signal synchronizes the primary decode with the execution unit during the execution of certain multicycle instructions. The signal, which is issued in the first cycle, informs the primary decode whether the instruction will be executed and hence whether the primary decode should proceed to lock the registers in the register bank; it also carries the current mode of the processor to inform the primary decode of the currently accessible register set. If the decoding points to a data transfer instruction, the R15 value of the instruction is sent to the XPipe via the XP channel. For multicycle instructions, an acknowledgement will not be issued to Dec1CtrlA until all cycles of the instruction have been completed. For load/store multiple operations, Dec1CtrlB initiates an interaction with the RdGen process; this, in turn, responds by issuing a message either on the LSMPr channel, to indicate that the reading/writing of registers is not complete, or on RdGa, to indicate completion. For each response received from the RdGen process, a message is sent to the address interface over the LsmTrm channel to indicate the destination of the next output of the Incrementer (i.e. either the LSM or the PC loop). Control information provided by RdGen is forwarded to the execution unit via the RGch channel.
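The load/store multiple interaction may be pictured as the following occam-style loop; this is a sketch only, and the channel protocols, message contents and encodings shown are illustrative assumptions rather than occarm's actual definitions:

VAL INT to.lsm IS 0 :              -- next Incrementer output feeds the LSM loop
VAL INT to.pc IS 1 :               -- next Incrementer output rejoins the PC loop

PROC lsm.cycles (CHAN OF INT LSMPr, RdGa, LsmTrm, RGch)
  BOOL complete :
  INT ctrl :
  SEQ
    complete := FALSE
    WHILE NOT complete
      ALT
        LSMPr ? ctrl               -- more registers remain to be transferred
          SEQ
            LsmTrm ! to.lsm
            RGch ! ctrl            -- forward control information to the execution unit
        RdGa ? ctrl                -- reading/writing of registers is complete
          SEQ
            LsmTrm ! to.pc
            complete := TRUE
: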

[Figure: decoded instructions (I, R15) from the primary decode enter through the IReg and WReg registers; SELECT S1 routes read addresses to the A and B read decoders and the lock gating (AND1, AND2), driven by the read enable lines of the lock FIFOs (AFifo, MFifo); read data is latched in OReg and sent to the execution unit (Decode2). Muller-C C1 signals read completion and activates the write path; SELECT S2 routes the W decode output to the lock FIFOs (Muller-C C2, write enable lines); SELECT S3 discards invalid write data from the write control; Muller-C C3 frees IReg when both the lock and read operations are done.]

Figure 6.12: The Register Bank Internal Organization

6.7 The Register Bank

The concurrent, asynchronous nature of AMULET1 introduces a number of hazards to the operation of the processor’s register bank:

• The pipelined execution unit of AMULET1 may result in multiple pending write operations; the system must keep a record of outstanding write addresses and ensure that results are written to the correct registers.

• Instructions must be prevented from attempting to read registers pending alteration. Read operations must be blocked until all the preceding writes have completed and the corresponding registers contain valid data. The synchronization of read and write operations is a complicated issue, for the asynchronous operation of the processor makes any assumption about their interaction infeasible.

In AMULET1, the aforementioned issues have been resolved by a novel register locking mechanism, namely the lock FIFO [Pave91] [Pave92]. This is an asynchronous micropipeline which holds the destination register addresses of pending write operations; these addresses are considered to be "locked", and therefore a read operation on any of them stalls until the address is removed from the lock FIFO as a result of the corresponding write operations being completed7. Addresses are stored in a fully decoded form, as unary values, so that each stage of the FIFO contains at most a single "1", whose position indicates the address of the locked register. The status of a register is determined simply by examining whether there is a bit set in the corresponding column of the lock FIFO; this is achieved by OR-ing together the bits of the column.

The organization of the register bank is depicted in figure 6.12. In order to allow internal results from the ALU to overtake data from the much slower memory, two lock FIFOs are used, AFifo and MFifo respectively. Decoded instructions from the primary decode enter the register bank through the IReg register. If the instruction specifies read registers (Select S1), their addresses are extracted by the A and B read decoders and are directed to the lock gating (AND1 and AND2); when the read enable lines are activated, as a result of the corresponding registers being unlocked, the register contents are latched in the OReg register and forwarded to the execution unit. Once the read operation is complete, the write path is activated (Muller-C C1). If a write operation is required (Select S2), the unary representation of the destination address (W decode) is sent to the appropriate lock FIFO to pair with the data to be written to the register (Muller-C C2). If the data is invalid, it is discarded while the register is unlocked (Select S3). The completion of both the read and locking operations frees IReg to enable further instructions to enter the register bank (Muller-C C3).

7The size of the lock FIFO dictates the maximum number of write operations that may be outstanding at any particular moment.
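As a concrete illustration of the unary encoding, the following occam fragment (a sketch: the names, sizes and the representation of the FIFO as a boolean array are illustrative assumptions) determines whether a register is locked by OR-ing the bits of the corresponding lock FIFO column:

VAL INT depth IS 4 :               -- number of lock FIFO stages (illustrative)
VAL INT regs IS 16 :               -- number of registers

PROC locked (VAL [depth][regs]BOOL fifo, VAL INT reg, BOOL result)
  -- OR together the bits of column 'reg'; a set bit in any stage means
  -- that a write to this register is still outstanding
  SEQ
    result := FALSE
    SEQ i = 0 FOR depth
      result := result OR fifo[i][reg]
: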

6.7.1 Modelling the Register Bank

Any attempt to build a simulation model of the register bank should have the following objectives:

• To model correctly the interdependencies between the asynchronous read and write operations. This involves the accurate modelling of the behaviour of the register locking mechanism.

• To exploit the parallelism within the register bank so that high performance is achieved.

Satisfying the above requirements is not straightforward. The central structure of the register bank is the lock FIFO, whose operation is based on the examination of a global state provided by the contents of its stages. The application of the logical OR to the columns of the lock FIFO creates a global view of the FIFO which makes the locking mechanism feasible and efficient. However, within the context of the occam language, the concept of global variables shared between parallel processes does not exist, and thus the global state of an occam system is not directly available. Therefore a mechanism is required to allow the construction of the required global view and thus make the development of the model feasible.

A possible technique to achieve this, which has been adopted for the development of occarm, is illustrated in figure 6.13, where the complete model of the register bank is depicted. This technique requires the use of an extra process, namely ReadLock, whose role is to maintain a copy of the contents of the lock FIFOs at any particular moment and, based on this information, to implement the register locking.

[Figure: the register bank model. Instructions (I, PC) from the primary decode enter through the IReg and WrtReg registers; the ReadCtrl (A, B decode) and WrtCtrl (W decode) processes extract read and write addresses; write addresses enter the lock FIFO register processes (AFifo, MFifo) and are copied to ReadLock; the Read and Write processes drive the register cell processes, whose out.bus channels feed a multiplexor towards OReg and the execution unit (Decode2); ReadDone and LockDone signal completion. The ReadLock process has the following skeleton:]

PROC ReadLock()
  SEQ
    ...
    ALT
      lock ? address
        ...                        -- insert into buffer
      unlock ? signal
        SEQ
          ...                      -- remove top of buffer
          IF
            read.address.waiting
              ...                  -- send address to Read
            TRUE
              SKIP
      read ? address
        IF
          address.in.buffer
            SKIP                   -- register locked: stall the read
          TRUE
            ...                    -- send address to Read
:

Figure 6.13: The Register Bank Model (RegBank)

Each time a destination address enters one of the lock FIFOs, a copy is also sent to ReadLock by the first register process of the FIFO. To ensure correct operation, the register process will not issue an acknowledgement message until the write address has been received by ReadLock. ReadLock incorporates two circular buffers, one for each of the lock FIFOs, where destination addresses are stored. Once the write operation is complete, the last register process of the FIFO involved issues a message to ReadLock to signal the removal of the write register address from the lock FIFO, whereupon the relevant entry is removed from the corresponding circular buffer. This technique allows ReadLock to hold a local copy of the locked addresses at any particular moment. The possession of this information enables ReadLock to control the interaction between read and write operations and to ensure that reads and writes take place asynchronously and concurrently when there is no interdependency.

The functionality of the ReadLock process is built around an ALT construct (figure 6.13). Read register addresses are sent to ReadLock from the ReadCtrl decoding process and are compared with the contents of the circular buffers. A match implies that the register is locked and thus the read operation is stalled until ReadLock is informed that the register has been removed from the lock FIFOs; no buffering is required in ReadLock for stalled read addresses, as only one read operation is outstanding in the register bank at any particular moment. Read register addresses which are not locked are forwarded to the Read process which, in turn, sends read enable messages to the register cell processes involved. Register cells use an ALT to receive either read enable signals from Read or the new data to be stored from Write. A read enable signal causes a message with the register's contents to be output on the out.bus channel. For each instruction, at most two out.bus channels will be fired to provide input to the multiplexor process, which directs the messages to their appropriate destinations. The PC value is sent over a separate channel to ReadLock and is either discarded or propagated to the multiplexor process to be output as an operand.

An alternative way to model the register bank would be to abandon the use of separate lock FIFO processes and replace both the lock FIFOs and ReadLock by a single occam process. The functionality of this new process would be similar to that of ReadLock, but with the circular buffers representing the actual lock FIFOs; a single ALT would handle all the interactions of the process with its environment, including messages to be written to the registers.

[Figure: the execution pipeline stages Decode2 and Decode3, bridged by the CPSR unit.]

Figure 6.14: The Execution Unit of AMULET1

Using this scheme, the implementation of the register locking is straightforward, as the global view required is directly available in the circular buffers. However, having both read and write operations handled by a single process introduces a bottleneck which limits the concurrency of the model and degrades its performance; if a write operation were selected by the ALT, any outstanding read operation would have to stall and wait for the completion of the write, even in the absence of any interdependency. The mechanism adopted allows read and write operations to be handled concurrently by different processes, a behaviour which increases the parallelism of the model and, consequently, its potential for high performance.

6.8 The Execution Unit Model

As mentioned in section 4.6.2.4, the execution pipeline comprises two major stages, namely Decode2, which controls the operation of the shifter and multiplier units of the processor, and Decode3, which controls the ALU. Decode2 and Decode3 are bridged by the CPSR (Current Processor Status Register) unit of the processor, as depicted in figure 6.14. This unit incorporates the logic required for the calculation of the Current Processor Status (CPS), namely the arithmetic flags and the processor mode information. The hardware to evaluate the condition codes of executed instructions is also included in the CPSR unit.

6.8.1 The CPSR Model

The operation of the CPSR is based on information provided by both stages of the execution unit (i.e. Decode2 and Decode3), as depicted in figure 6.15. The active CPS (flags and mode) is stored in ALatch. The condition evaluation hardware takes into account these values, as well as the condition flags of the instruction (Imd, provided by Decode2, see section 6.8.2) and the operating colour of the processor (provided by Decode3), to decide whether the instruction should be discarded as invalid. The active CPS is also used to calculate the current mode of the processor. The role of the CPReg register is to keep a copy of the CPS so that the arithmetic flags provided to the ALU remain stable while the new flags are being calculated.

The CPSR unit is another part of AMULET1 where fully asynchronous operation was traded for efficiency and speed. The input of the CPReg register is controlled by Decode2, while the data latched by the register are provided by both Decode2 and Decode3. The timing characteristics of the execution unit allow the correct operation of the system. However, this situation, where the input request to a register is issued by the previous stage in the pipeline while the data of the same bundle is provided by the next stage, cannot be described in occam, for it violates the handshake protocol that provides the basis for asynchronous micropipelined operation.

An alternative implementation of the CPSR, which emerged as a result of the effort to construct the occarm model, is depicted in figure 6.16. This implementation achieves a fully asynchronous, and thus safer, operation at the expense of an extra latch (the BLatch); this latch is enabled by the request event of the CPReg register to hold the copy of the CPS required to provide stable inputs to the ALU.

[Figure: the CPSR unit. Handshake signals Ain/Rin arrive from Decode2 and Aout/Rout go to Decode3; ALatch holds the active CPS; the CPS calculation (MUXs), mode calculation and condition evaluation hardware take the new CPS, Imd and ShC from Decode2 and the ALU output, colour, control and enable signals from Decode3; the CPReg register (D/Q, controlled by Pass) supplies the stable CPS copy to Decode3.]

Figure 6.15: The CPSR Unit

The handshake interfaces at the input and output sides of the CPReg register are clear and well defined, making their modelling in occam straightforward. Within this implementation, the two latches, the logic for the calculation of the CPS and the condition evaluation hardware logically form part of Decode3, since all the information required for these functions is available locally at this stage; in occarm, this circuitry has been modelled as a sequential piece of occam code in the Decode3 process.

6.8.2 Decode2

The process structure of the model of the first execution stage is depicted in figure 6.17. Control information generated by the partial decoding of instructions in the primary decode (Dec1CtrlB) arrives at the Dec2Ctrl process.

[Figure: the alternative CPSR design, local to the occarm Decode3 process: the condition evaluation hardware (inputs Pass, Imd, colour from Decode3), the CPS calculation (MUXs, fed by the ALU output and the new CPS), the control and mode calculation, ALatch and the extra BLatch (enabled by the CPReg request); CPReg receives ShC, Imd, CPS, control and Pass from Decode2 over a conventional Rin/Ain handshake and supplies Decode3 via Rout/Aout.]

Figure 6.16: CPSR: An Alternative Design

Further decoding takes place in this process (using a PLA model) and the complete control information is forwarded to Ctrl2, where it is paired with the operands of the instruction extracted from the register bank (A and B channels) or the Immediate Pipe. The NGenCtrl process calculates the number of registers to be transferred in a load/store multiple operation from the bottom sixteen bits of the instruction; this value is used by the ALU to perform the required base calculations. The multiplier and/or shifter are invoked as occam functions and the results of the operation are placed onto the OPa and OPb channels to be sent to Decode3. The carry required for the correct operation of the shifter is sent by the ALU via the ALx channel. Data to be written to memory are sent intact to the data interface (to synchronize with their associated memory address from the address interface) via the Dout channel.

[Figure: the Decode2 model. Control information from the primary decode (Dec1CtrlB) enters the Dec2Ctrl and NGenCtrl processes through the Dec2 and NGen registers; Ctrl2 pairs this information with operands from the register bank (A and B channels), the Immediate Pipe (ImmPipe) and the RSh register, receives the shifter carry on ALx, and produces outputs on the AP (to the APipe), CP, Dout (to the data interface) and OPa/OPb (to Decode3) channels.]

Figure 6.17: The First Execution Stage Model (Decode2)

The CP channel provides input to the CPReg register described in the previous section. This information includes the carry produced by the shifter (ShC) and the condition flags of the instruction (Imd, generated and sent to Decode2 by the primary decode). The AP channel provides input messages to the APipe in the address interface, while the RSh register temporarily stores the shift value during the first cycle of register based shift operations.

6.8.3 Decode3

The Decode3 process is depicted in figure 6.18. Dec3Ctrl makes use of a PLA model to generate the control information required for the operation of the ALU. The PLA input is provided by the primary decode as a result of the initial partial instruction decoding. Ctrl3 is used to invoke the ALU; it also incorporates the CPSR logic of figure 6.16.

[Figure: the Decode3 model. Control information from the primary decode (Dec1CtrlB) enters Dec3Ctrl through the Dec3 register; operands from the previous execution stage (Decode2) arrive through the OPreg and CPreg registers; Ctrl3 invokes the ALU, receives memory abort signals through the Dabt0 and Dabt1 registers, and drives the RReg register towards the write control, with buffered outputs on the ALx, AIabt, PCcol and ALUgo channels.]

Figure 6.18: The Second Execution Stage Model (Decode3)

The input operands of the ALU arrive from Decode2 through OPReg, while the results of the ALU operation are forwarded via RReg to the register bank (write data) and/or the address interface (data address or new PC). The ALUgo and PCcol channels are used to send back to the primary decode the current mode (to Dec1CtrlB) and colour (to Dec1CtrlA) of the processor respectively. Abort signals from memory arrive on the Dabt0 and Dabt1 channels, while AIabt is fired each time a data transfer instruction is invalidated and discarded (see section 6.5.3.4).

6.9 The Write Bus Control

The WrtCtrl process (figure 6.19) models the arbitration logic which provides access to the write bus of AMULET1.

[Figure: the write bus control model, comprising the WrtCtrl1 and WrtCtrl2 processes, the WReg register and a Wlx buffer; the DINch channel arrives from the data interface, the DPch channel from the execution unit, with outputs to the register bank and connections to and from the address interface.]

Figure 6.19: The Write Bus Control Model

The arbitration is performed in the WrtCtrl2 process by means of an occam ALT construct on messages arriving from memory (via the DataIn process in the data interface) and from the execution unit (Decode3, via WReg); these messages are directed either to the register bank (write data) or to the address interface via the WReg register (data addresses arriving from the execution unit, or new PC values sent from either the execution unit or the memory). The role of the Wlx signal is to synchronize the address interface with the write bus control logic in order to avoid deadlock situations which might occur as a result of the write bus becoming locked while WReg is full, waiting for an acknowledgement from the address interface. A new message arriving from the execution unit will not be forwarded until the address interface issues Wlx to indicate that WReg is free.
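A much-simplified occam sketch of this arbitration follows; the names, message types and routing are illustrative assumptions (the actual process routes each message according to its destination, and the Wlx protocol is richer than a single boolean):

PROC WrtCtrl2 (CHAN OF INT DINch, DPch, RegBank, AddrInt,
               CHAN OF BOOL Wlx)
  WHILE TRUE
    INT msg :
    ALT
      DINch ? msg                  -- message from memory (via DataIn)
        RegBank ! msg              -- e.g. load data for the register bank
      DPch ? msg                   -- message from the execution unit (via WReg)
        BOOL free :
        SEQ
          Wlx ? free               -- block until the address interface signals WReg free
          AddrInt ! msg            -- e.g. a data address for the address interface
: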

6.10 Summary

This chapter has described occarm, a model of the AMULET1 asynchronous microprocessor developed using the proposed modelling philosophy and employing occam as a description language. The next chapter discusses issues related to the execution of the occarm model, and of occam models of asynchronous architectures in general, on a multi-transputer system.

Chapter 7

Simulation Issues

7.1 Introduction

In a sequential environment, the execution and testing of a simulation model written in a conventional sequential language is a straightforward issue. This, however, does not apply to asynchronous parallel models. The extra dimension that concurrency introduces to distributed systems [Back78] complicates the programming activity and raises a number of issues which need to be addressed before any parallel program may be executed. These issues include monitoring, debugging and terminating the processes of the system; for multiprocessor configurations, remote I/O, partitioning, mapping and load balancing issues also need to be addressed. This is particularly true for the development of occam systems, since the static nature of occam and the lack of a transputer operating system make the aforementioned tasks the responsibility of the application developer; recent versions of the occam toolset, however, provide Virtual Channel support (at added runtime costs). This chapter discusses how the above issues have been addressed in the context of the occarm simulation model. Two different environments for occarm have been developed: one for the execution of the model on a single transputer, and one which provides for a multi-transputer implementation.


[Figure: the T-Rack. A Sun 3/160 host computer with a VME/IO interface card (C012) connects to the root transputer T-1; a control card provides switch control and a byte-wide monitoring bus over the backplane, with a console interface and connections to other T-Racks; transputers T0-T63, each with links 0-3, form the necklace via links 0 and 1, while links 2 and 3 enter the crossbar switch network (switch boards S1 and S2).]

Figure 7.1: The T-Rack

7.2 The Host Machine: The ParSiFal T-Rack

The computer which hosts the occarm simulation model is the ParSiFal T-Rack [Know87], a reconfigurable, transputer based machine which has been developed at the University of Manchester as part of the Parallel Simulation Facility project

[Capo86] under the U.K. Alvey program1 [Logi85]. The T-Rack was developed to serve as a facility for the parallel simulation of computer architectures. The basic architecture of the T-Rack is illustrated in figure 7.1. It comprises sixty-four T800 transputers (T0-T63), each with one or two megabytes of local dynamic memory. The transputers are housed on sixteen identical boards (four transputers per board). Each transputer communicates via four asynchronous bidirectional links, numbered from 0 to 3. Two of the four links of each transputer (link0 and link1) are permanently hardwired to form a processor chain known as the necklace. The off-necklace links (link2 and link3) may be connected by means of a crossbar switch built from twenty-six INMOS C004 switch chips housed on two boards (S1 and S2, also referred to as the near and far boards respectively [Gars89] [Murt91]). Connections within the switch networks may be defined using a software switch utility, allowing the transputers to be connected in the configuration required [Jone87]. The switches are statically set before the application is loaded, though dynamic reconfiguration is also possible [Jone88] [Murt91]. The crossbar switches also provide for the connection of transputer links to external devices.

The T-Rack is hosted by a Sun 3/160 workstation [Sun86] containing a Tadpole Transputer Board [Tadp87] which acts as a "root" node for the T-Rack and is used for downloading code onto the rack and for I/O operations to and from the host machine. The Tadpole transputer forms part of the necklace (T-1, see figure 7.1).

The control board is used for switch control functions and for system backplane monitoring, which is achieved by means of a byte-wide monitoring bus. This bus provides an alternative route between the T-Rack and the outside world, as a terminal may be attached to the control board to display monitoring information [Know89]; the occarm simulator does not make use of this bus.

1The ParSiFal industrial and academic partners also included Logica plc., Inmos Limited, GEC Research Limited, FEGS Limited, The Polytechnic of Central London and the Engineering Department of Cambridge University.

The existence of the necklace in the T-Rack limits the set of processor graphs which may be implemented to those which possess a Hamiltonian Cycle [Murt87]. An occam program can execute on the T-Rack if the required processor network either contains, or can be modified to contain, a Hamiltonian Cycle. The route taken by the Hamiltonian Cycle through the network corresponds to the necklace of the T-Rack, while the edges not on the Hamiltonian Cycle are mapped onto switched links. The switching network implements a "split-link" switching policy, whereby link 2 outputs are connected to link 3 inputs via one switch board, while the connection between link 2 inputs and link 3 outputs is achieved via the other switch card2 [Hill86].

2The split link switching mechanism provides a solution to the Odd Cycle problem, which does not allow the construction of networks that possess an off-necklace closed path (cycle) with an odd number of edges [Murt87].

7.3 Monitoring

Monitoring the runtime behaviour of the simulation model and collecting information regarding the characteristics of the simulated system is one of the main objectives of the simulation process. Monitoring is essential for the testing and performance evaluation of the simulated system, as well as for the debugging of both the simulated system and the simulation model.

The inherent properties of distributed asynchronous systems make monitoring a difficult and complicated issue for which sequential techniques are insufficient [Riek94]; distributed monitoring is currently an active area of research. The main problems which are associated with distributed systems, and which an ideal monitoring system should address, include the multiple threads of control, the intrusiveness, the non-determinism and the need to cope with a vast amount of monitoring data [Riek94] [Joyc87]:

The Multiple Threads of Control

To understand the system's behaviour, it is essential to be able to obtain a global view of the system (a snapshot) at any particular moment. In sequential programming this is straightforward. In distributed simulations, however, snapshots are not easily obtainable; to determine the system state, all the different local process states as well as the channel states need to be taken into account [Chan85] [Baba93]. The monitoring system should be able to correlate the histories of the different processes and put them in a global temporal perspective. This is a complicated issue, related to the problem of dealing with causality in a distributed environment discussed in section 3.4. An approach for constructing snapshots of a distributed system, employed by most of the existing monitoring tools, is to collect runtime traces of appropriate event records and put them in a temporal perspective off-line, in a postmortem way.

Non-Determinism

Distributed asynchronous systems are nondeterministic structures. Multiple executions of the same system may produce different, albeit valid, orderings of events. This makes the repeatability of an error situation a difficult task, although systems which achieve a postmortem recreation of the system's state based on a transcript file of event records have been developed [Garc84] [Unge86].

Intrusiveness

The monitoring system should not mask or affect in any way the behaviour and characteristics of the simulation model. Hardware monitoring tools are usually non-intrusive [Riek94]. However, using a software environment for monitoring alters the behaviour of the distributed system [Mari92]. Indeed, monitoring facilities make use of the host machine's resources, thus imposing delays on the simulator's execution; the host machine executes simulation and monitoring code alternately (e.g. in a time-sliced fashion) and the monitoring messages produced share the communication network. As already explained in section 5.3.1.1, delaying a process alters the event timings in the distributed system in an arbitrary way, thus changing its behaviour3. Clearly, software monitoring also has an impact on the distributed simulation's performance. Although the intrusiveness of software monitoring is recognised, hitherto no precise model has been developed for the quantification of these effects [Riek94]; generally, efforts focus on finding ways to minimize the influence of the monitoring on the simulation system (e.g. [Sega86]).

3In a sequential environment, a program can be stopped and resumed without the elapsed time between the two actions affecting its behaviour.

Dealing with Monitoring Data

This involves the generation, transportation, representation and analysis of event records.

Monitoring messages may be generated using either a time driven or an event driven approach [McLa92]; in the former, the monitoring system scans, at regular intervals, event records from the simulation model (time sampling), while in the latter the simulation model detects and reports event records to the monitoring system on its own initiative. In software monitoring, the detection of events to be reported is achieved by means of probes, small pieces of code which are inserted into the simulation model's code, triggering the generation of monitoring messages each time they are executed4 (this process is referred to as instrumentation of the simulation model). Typically, probes are inserted in the model's source code5 manually, though some monitoring tools exist which attempt this automatically [Mohr90] [Lump92].

Typically6, in distributed simulation, monitoring information needs to be transported for processing to a remote node of the distributed system; this is particularly true for transputer based systems, as only the root transputer of the network has access to the file system of the host machine. The monitoring data are generated in copious volume and their transportation can have a significant negative impact on both the computational resources and the communication network of the distributed system. To address this problem several transport strategies have been developed, the most important of which are immediate transport, which sends the monitoring data as soon as they are generated (e.g. [Bemm90] [Leu92]), and store and unload, whereby the data are stored in a buffer before they are transported; in the latter, the buffer may flush its contents when it becomes full, upon request, when the communication load is low, or when the simulation is complete (store and unload afterwards) [Riek94].

Once the monitoring messages have been collected, they must be analysed to extract the information which will enable the user to draw conclusions regarding the behaviour of the simulated system. This is not an easy task, as the monitoring data represent an enormous amount of information7, usually at a fairly low level of abstraction, which also encapsulates a high degree of parallelism implicitly contained in the event records; representing the monitoring data in an appropriate way is crucial to enable the user to deal with it. This is an active area of research, related to the field of Human Computer Interaction. In existing tools the monitoring data is presented to the user in the form of textual traces and pictures (graphical visualizations), although tools which also employ sound are currently under development [Zaba92].

4A probe is always inserted next to (before or after) the instruction it monitors.
5This process is referred to as source code instrumentation, as opposed to object code instrumentation, wherein the probes are inserted into the object code at compile time by an instrumenting compiler; schemes which have the probes inserted in the operating system (instrumenting kernel) also exist, for monitoring the activities of the simulation model which invoke kernel functions [Malo90] [Rudo89] [Bemm90] [McLa92].
6Quoting Riek: "no entirely distributed monitoring tool has yet been developed (i.e. a complete parallel program that will gather and use the monitored information). In the existing monitors, the runtime information is used at a different place from where it was generated" [Riek94].
7Different techniques, such as event record filtering [Holl93], event record clustering [Mohr90] or event abstraction [Bast94], have been proposed to reduce the amount of monitoring information in distributed systems.

Process process_name: instruction instruction_type was executed at time clock

(a) Execution event trace

Process process_name: variable variable_name has the value val at time clock

(b) Data event trace

Process process_name: sending value val to process proc on channel chan at time clock
Process process_name: receiving value val from process proc on channel chan at time clock

(c) Parallelity event traces

Figure 7.2: Event Traces for Debugging


7.3.1 Monitoring Occarm

Occam architectural models of micropipelined systems are distributed asynchronous structures, and consequently their monitoring presents the problems described in the previous section. The simulation of an asynchronous architecture has two main objectives:

1. The testing and debugging of the architecture.


2. The evaluation of the architecture’s performance.

7.3.1.1 Debugging

For the debugging of the architecture (as well as the simulation model) it is necessary to monitor both the flow of control and the flow of data in each of the different occam processes in the model; this is achieved by collecting traces of the execution and data events of the processes respectively [Riek94] (figure 7.2a,b). For the detection of deadlocks, it is essential to know the state of the channels in the system when the deadlock occurred. For this purpose, the parallelity events, which correspond to communication actions, need to be monitored (figure 7.2c). These appear in pairs, one for the sending and one for the receiving process. The probe in the sending process code is inserted before the output command, while the receiving probe follows the input command. The absence of one parallelity event from a pair in the final trace indicates the occurrence of a deadlock.

In occarm, monitoring is performed by means of a parallel network of occam monitoring processes (known as reactive processes [Mari92]), one for each of the top level processes (the active processes) of the occarm model8. Monitoring messages are issued to the reactive processes in an event driven fashion. Probes, in the form of a procedure call, have been inserted (manually) in the source occam code of the active processes. Each time the procedure which implements the probe is called, it constructs the corresponding monitoring message and sends it to the monitoring process. Since probes will be invoked in different parallel sections of the active process, several monitoring messages may be issued simultaneously. Thus, for the communication between an active and the corresponding reactive process, a channel array is used (monitor, see figure 7.3); in the current implementation of occarm the total size of the monitoring channel array is fifty (50). The monitoring process acts as a multiplexor, employing an ALT construct to gather the messages issued by the corresponding active process on the channel array; in order to reduce the effects of the ALT bottleneck, the channel array is buffered to decouple the processes involved.

Monitoring processes support both immediate and store and unload transport policies. The latter makes use of a circular buffer to store only the recent history of the active process. If no monitoring message arrives for a user-defined time interval (i.e. in the case of deadlock), the monitoring process flushes the contents of the history buffer. The store and unload transport option is particularly useful, as in most, if not all, cases the most recent history of the processes is sufficient to identify the cause of errors or deadlocks. Since the monitoring messages do not propagate further into the system, the communication overhead of the store and unload policy is minimal (see section 8.4).

8Within the ParSiFal project, a number of experimental graphical tools were developed which illustrate or monitor different aspects of an occam parallel program [Step86] [Step88]; these however were not used in connection with occarm due to their limited capabilities and the need for portability of the occarm simulation environment.
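The probe/monitor pair may be sketched as follows, assuming, purely for illustration, a trace message of two INTs; occarm's actual message format, buffering and transport options are richer:

VAL INT n.probes IS 50 :           -- size of the monitoring channel array

PROC probe (CHAN OF INT monitor, VAL INT event, VAL INT clock)
  -- inserted next to the monitored action in an active process
  SEQ
    monitor ! event
    monitor ! clock
:

PROC monitor.process ([n.probes]CHAN OF INT monitor, CHAN OF INT to.io)
  WHILE TRUE
    INT event, clock :
    ALT i = 0 FOR n.probes         -- replicated ALT over the channel array
      monitor[i] ? event
        SEQ
          monitor[i] ? clock
          to.io ! event            -- immediate transport: forward at once
          to.io ! clock
: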

7.3.1.2 Performance Evaluation

When analysing the performance of asynchronous pipelines, measures of special importance and interest are the occupancy, as well as the stall and idle periods, of the pipeline. The occupancy of an N-stage pipeline is defined as the percentage of time the pipeline has 1, 2, . . . , N elements in it [Furb94c]. In a synchronous pipeline, the clock frequency defines the period that any element stays in the pipeline; thus, for the calculation of the occupancy, only the entry (arrival) times of elements are required.

[Figure: each occarm (active) process, e.g. A and B, is paired with its own monitoring process, connected by a monitor[] channel array.]

Figure 7.3: Collecting Event Traces in Occarm

In asynchronous micropipelines, however, the times at which a particular data bundle enters and leaves the pipeline are arbitrary. Therefore, the calculation of occupancy in this case requires knowledge of both the entry and exit times of bundles in the pipeline. A bundle enters a pipeline when the corresponding data is latched by the first register of the pipeline; thus the entry time is represented by the timestamp of the acknowledgement (Ain) signal issued by this register. Similarly, the exit time of a bundle is the value of the timestamp of the acknowledgement signal to the last register of the pipeline.

The values of the two timestamps required for the calculation of the duration of a message's stay in the pipeline are not directly available, for they are possessed by different occam processes and occam does not support global variables. To overcome this problem a solution has been devised whereby request messages exiting the pipeline carry with them an extra timestamp denoting the time of their entry. Using this information, the calculation of the pipeline occupancy by the control process at the output side is straightforward.

In occarm, control processes maintain a set of occupancy tables, one for each of their input pipelines. The occupancy table is a circular buffer which contains the input and output timestamps of messages passing through the corresponding pipeline, thus providing a (postmortem) global view of the pipeline at any particular moment. Each time the control process at the output side issues an acknowledgement message to a pipeline (i.e. each time a message exits the pipeline), it also invokes a probe procedure (calculate.occupancy()) to calculate the current occupancy values for the pipeline; idle periods are also calculated at that point by the probe.

Contrary to the debugging traces, which are sent immediately to the corresponding monitoring process, the type and quantity of the monitoring values concerning the performance characteristics of the architecture permit active occarm processes to calculate and store them locally; this eliminates the extra communication overhead that their transport would impose. The stored values are unloaded by the occarm control processes upon their receiving the termination signal (see section 7.4).
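The occupancy bookkeeping may be sketched as follows. This is a sketch under stated assumptions: the occupancy table has been flattened into time-ordered arrays of entry/exit events, and the internals of calculate.occupancy() are not published in this form, so all names are illustrative:

VAL INT entry.event IS 0 :         -- event kinds in the flattened occupancy table
VAL INT exit.event IS 1 :
VAL INT max.level IS 16 :          -- assumed deeper than any modelled pipeline

PROC occupancy.from.table (VAL []INT event.time, VAL []INT event.kind,
                           [max.level]INT time.at.level)
  -- walk the merged, time-ordered entry/exit events of one pipeline and
  -- accumulate, for each occupancy level, the total time spent at it
  INT resident, last.event :
  SEQ
    resident := 0
    last.event := event.time[0]
    SEQ i = 0 FOR SIZE event.time
      SEQ
        time.at.level[resident] := time.at.level[resident] +
                                   (event.time[i] - last.event)
        last.event := event.time[i]
        IF
          event.kind[i] = entry.event
            resident := resident + 1
          TRUE
            resident := resident - 1
:

Dividing each time.at.level bucket by the total elapsed time then yields the percentage of time the pipeline spent holding 0, 1, 2, . . . , N elements.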

Stalls. An asynchronous pipeline will stall if the rate at which request events are issued to the pipeline is greater than the rate at which events propagate through the pipeline or the rate at which events exit the pipeline (i.e. the rate at which events are consumed and processed at the output side). Stall situations refer to the input side of the pipeline. They may be detected by examining the delay between the sending of a Request event (Rin) to the pipeline and the issuing of the corresponding acknowledgement signal (Ain) by the first register of the pipeline: a stall situation has occurred if timestamp(Rin) < timestamp(Ain). The duration of the stall is timestamp(Ain) − timestamp(Rin); clearly, the minimum stall period is equal to the propagation delay of the first register in the pipeline. Monitoring information regarding pipeline stalls is collected by the occarm control processes at the input side and is unloaded upon detection of the termination signal.
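In the model this check reduces to simple timestamp arithmetic, as in the following illustrative occam function (the name is assumed, not occarm's):

INT FUNCTION stall.period (VAL INT rin.time, ain.time)
  -- the delay between a request entering the pipeline and its
  -- acknowledgement; at minimum, the first register's propagation delay
  VALOF
    SKIP
    RESULT ain.time - rin.time
: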

[Figure: the route of the KILL message through the occarm process graph (Memory, MemCtrl, MRReg, IPipe, PCPipe, Dec1CtrlA, Dec1CtrlB, Decode2, Decode3, XPipe, ReadCtrl, ReadLock, LockFifos, Read, Write, Registers, WrtCtrl, DestCtrl, AddC, Incrementer), marking the points where PC values, PCcol/R15 values, ALUgo events, register bank data and register addresses are sunk.]

Figure 7.4: Terminating Occarm

7.4 Termination

Detection of termination in a distributed parallel environment was brought to prominence in 1980 by Francez [Franc80] and by Dijkstra and Scholten [Dijk80], and has since constituted one of the basic problems in distributed computing. The fundamental problem in detecting termination is the difficulty of constructing a global state of the distributed system (see section 7.3). Several termination detection algorithms have been developed; these differ in the way they ensure correctness9 and in the assumptions they make about the semantics and behaviour of the communication links in the distributed system10. Typically, termination mechanisms consist of a termination detector, superimposed on the distributed system, which either monitors the activity of the processes or uses a deadlock detection/breaking approach [Brze93].

A simulation model of an asynchronous architecture has completed its operation, and thus must terminate, if it has executed all the instructions of a particular benchmark program. Within the ARM development environment used in the AMULET project [ARM], ARM programs notify their completion by writing a special End Of Program (EOP) character to a particular address in memory. This mechanism may be exploited for the termination of the occarm model too: upon receiving EOP, the memory process issues a KILL message which then propagates through the model, progressively killing the occam processes.

A possible route for the KILL message, which has been adopted for the termination of occarm, is depicted in figure 7.4. The KILL signal enters occarm by means of the Acknowledgment message issued to the MemCtrl process by the memory for the EOP, and is then forwarded to Dec1CtrlA (through the IPipe) and to the Incrementer (on the corresponding Acknowledgment message) to terminate the datapath and the address interface respectively. To cope with closed paths (loops) in the model and allow the KILL message to reach all processes in the loop, certain processes forward the KILL signal but do not terminate immediately; they continue their operation, sinking subsequent messages, until they receive the KILL for a second time. These processes include MemCtrl and the Incrementer, which sink PC values from the PC loop, DestCtrl, which sinks messages sent from memory before EOP, Dec1CtrlA, for PCcol signals and prefetched instructions, Dec1CtrlB, for ALUgo signals, Decode2, which sinks data from the register bank, and the Write process in the register bank, which sinks register addresses arriving from the lock FIFOs.

9The detection algorithm has to report termination in finite time (liveness); if such a report occurs, then the underlying distributed system must indeed have terminated (safety) [Brze93].

10These assumptions distinguish between synchronous or asynchronous communication, FIFO or non-FIFO links, atomicity of communication actions, etc.
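The "sink until the second KILL" behaviour may be sketched as follows; the message encoding is an illustrative assumption (occarm's messages are structured), and the normal processing of the first loop is elided:

VAL INT kill IS -1 :               -- distinguished termination message (illustrative)

PROC sink.twice (CHAN OF INT in, out)
  INT msg :
  SEQ
    in ? msg
    WHILE msg <> kill              -- normal operation (real processing elided)
      in ? msg
    out ! kill                     -- forward the first KILL around the loop
    in ? msg
    WHILE msg <> kill              -- sink messages still circulating in the loop
      in ? msg                     -- the second KILL: now safe to terminate
: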

[Figure: the single transputer environment. On transputer T-1, the occarm model communicates with the Memory process over the from.memory[]/to.memory[] channel arrays and with the monitoring processes over monitor[] channel arrays; the I/O process (Request.out, Data.in) connects to the host Sun 3/160, where the trace files, the memory file and the results file reside.]

Figure 7.5: The Single Transputer Environment of Occarm

7.5 The Simulator Environment

The single-transputer simulator environment is depicted in figure 7.5. The Memory process models the memory control logic of the processor; the memory itself is implemented as a binary file to achieve compatibility with the existing ARM development environment [ARM]. The operation of the simulator consists of reading instructions from the memory file, executing them and, possibly, writing results back into it; for compatibility reasons, messages regarding the correct operation of the simulated architecture produced by benchmark programs are written to a separate text file, one character at a time.

Within the INMOS occam toolset environment, only one occam process may have access to the host machine's file system [Inmo91a]. In the occarm simulator environment this role is served by the I/O process, via which all interactions with the outside world are performed. The I/O process employs an ALT construct to allow the multiplexing of system and monitoring messages arriving from the Memory and Monitoring processes respectively. Event traces arriving from the monitoring processes are distributed to different trace files according to their "process name" field (see figure 7.2); the existence of a separate trace file for each process provides a view of the parallelism of the system and thus facilitates the postmortem debugging task.

7.6 Multiprocessor Implementation

As discussed in section 5.3.1, one of the advantages of using occam as a specification language for asynchronous architectures is the ability to exploit the inherent parallelism of the simulated architecture, and thus to achieve higher performance, by executing the simulation model on a multiprocessor machine. In order to exploit this potential, a multi-transputer configuration of occarm on the T-Rack has been developed.

7.6.1 Mapping Occarm onto the T-Rack

Mapping a parallel program onto a parallel machine11 is one of the fundamental and most difficult problems in parallel processing12, and a detailed discussion of the mapping problem would exceed the scope of this thesis. The excellent paper by Norman and Thanisch [Norm93] provides a comprehensive list of references concerning the subject. The mapping problem has been investigated within the context of parallel simulation too, for both conservative [Nand92] [Bouk94] and optimistic approaches [Reih90a] [Glaz92]. The static nature of the occam language requires that the mapping of the occam process graph on the transputer network is specified in advance by the application developer, though a number of tools have been developed to automate various steps of this task [Boil87] [Murt87] [Lau88] [Theo91] [Theo94c]. The mapping scheme should ideally take into account the following considerations13 [Murt91]:

• The limitations imposed by the four links of the transputer: each node of the top level process graph should have at most four neighbours14.

• The limitations imposed by the interconnection network of the machine.

11The mapping problem is essentially a graph embedding problem, and may be described as follows: given two graphs G1 (the program) and G2 (the multiprocessor machine), find a function f : G1 → G2 such that certain performance criteria are satisfied.
12Actually, the mapping of a parallel program onto a parallel machine is only a particular manifestation of the more general mapping problem which stems from the very nature of Computer Science: Computer Science deals not with objects themselves but, rather, with the representation of objects. This representation involves the reconciliation of conflicts between the logical structure of objects and the physical medium wherein the objects are dealt with; this accommodation takes the form of mapping problems [Rose94].
13As discussed in section 2.6.2.2, the T9000 transputer, with the message forwarding facilities of the associated C104 link switch device, eliminates most of these considerations; however, for systems based on older generation transputers (the vast majority of existing transputer based machines, including the ParSiFal T-Rack), these considerations have to be taken into account.
14More formally, the degree of each node in the graph should not be greater than four, with an edge in the graph representing a bidirectional link.

[Figure: the Process Interconnection Table, a 9x9 connectivity matrix over Decode1, Decode2, Decode3, DatInt, AddInt, RegBank, WrtCtrl, Memory and I/O.Process; asterisks mark neighbouring processes, and boxed entries mark pairs of channels merged onto one link.]

Figure 7.6: Occarm Process Connectivity Table

Typically, transputer networks are not fully connected and therefore process graphs have to be transformed to match the underlying structure; the aim here is proximity, namely placing communicating processes as close to each other in the network as possible.

• The computation and communication load should be evenly balanced over the transputers and the links of the system respectively.

Based on the above considerations, the first step in mapping occarm onto the T-Rack is the modification of the top level process graph of occarm, as depicted in figure 6.1, so that each node has at most four neighbours. To address this issue, the Process Interconnection Table (PIT) depicted in figure 7.6 has been devised. The number of asterisks in a row of the table represents the number of neighbour processes of that particular process. Merging two columns together effectively adds one more level of abstraction to the process hierarchy, assigning the corresponding processes to the same processor and forcing the two channels to share the same link.

[Figure: the modified top level process graph, with channels labelled A-J and the following process-to-node assignment:]

Occarm Process       Node
Decode1 + Decode2    1
AddInt               2
Decode3 + DatInt     3
RegBank              4
WrtCtrl              5
Memory               6
I/O.Process          -1

Figure 7.7: Modified Occarm Top Level Process Graph

Link    No. of Multiplexed Channels
A       13
B       9
C       4
D       6
E       3
F       6
G       2
H       8
I       3

Table 7.1: Communication Load on Occarm Links

7.6.1.1 Balancing the Workload

The criterion adopted for the selection of the level of the occarm process hierarchy at which each process has at most four neighbours is the maximization of processor utilization, namely to occupy as many processors as possible. Following this criterion, the merges presented in figure 7.6 have been applied to occarm, deriving as a result the alternative graph of figure 7.7; this graph represents the15 lowest level in the process hierarchy which satisfies the four-link-per-transputer limitation.

15A possible alternative would be to merge columns 4 and 5, and 7 and 8 (placing onto the same processors DatInt/AddInt and WrtCtrl/Memory respectively), instead of columns 3 and 4. This arrangement would require five, instead of seven, processors.

7.6.1.2 Balancing the Communication Load

The new top level occarm process graph possesses more than one Hamiltonian cycle, thus allowing an equivalent number of possible mappings onto the T-Rack. For the selection of the appropriate mapping, the criterion which has been followed is to balance the communication load. In the T-Rack, the communication performance of a hardwired link is approximately double that of a switched link16. Consequently, the objectives of the communication load balancing policy are to:

• Use as few switched links as possible, and

• Place onto hardwired links as many (multiplexed) channels as possible.

The latter is based on the assumption that, on average, during the execution of benchmark programs all channels in the system will have similar traffic levels. The pipelined structure of asynchronous architectures provides a basis for this assumption; indeed, instructions, as they execute, propagate through successive stages of the pipeline, thus activating most, if not all, channels on their path. Thus, in the initial stages of the design process, when no data regarding the behaviour of the simulated architecture is available (as was the case with AMULET1 when occarm was developed), the above mapping criterion provides a reasonable option. Once a detailed performance analysis of the architecture has been performed, the mapping of the simulator may be altered accordingly; in this case, a possible criterion would be to map onto the fast links as many as possible of the channels which are part of the architecture's critical path. The communication load on the links of the top level occarm process graph is given in table 7.1. Figure 7.8 presents alternative mappings of the graph onto the T-Rack; a...c are examples of mappings which are not feasible due to the limited link connectivity of the T-Rack.

16The performance of a hardwired unidirectional link is reported as being 1.72 Mbytes per second, with that of a switched link being 0.87 Mbytes per second [Murt91], pp 63-65; this is due to the increased latency of acknowledgment messages imposed by the C004 link switches.

[Figure: eight candidate mappings, a) to h), of the modified occarm process graph onto the T-Rack; in each, nodes -1 and 1 to 6 are arranged along the necklace, with the legend distinguishing necklace (hardwired) links from switched links. Mappings a) to c) require connections (marked ??) that the T-Rack cannot provide.]

Figure 7.8: Occarm Graph Mappings

[Figure: nodes 1 to 6 of the occarm graph are mapped onto necklace transputers T0-T5, while node -1 (the I/O process) resides on transputer T-1, which is connected to both ends of the necklace. Memory requests and incoming data use one end of the necklace; monitoring information flows towards the other end, forwarded by a forward process F on each of the transputers T6-T63 on the monitoring path to/from the host.]

Figure 7.9: Mapping Occarm onto the T-Rack

limited link connectivity of the T-Rack. The remaining mappings, namely d...h, illustrate the different feasible ways to map occarm onto the T-Rack. From these, mapping h satisfies the communication balancing criteria specified above and therefore has been selected for occarm. This mapping uses the minimum possible number of switched links (namely 4) while allocating the maximum total number of channels onto the necklace (namely 36); mapping e makes use of 5 switched links, while mappings d, f and g place onto the necklace 30, 28 and 30 channels respectively. The selected mapping h also has the advantage that the maximum number of channels from the Decode3-Memory path (namely links F, H and I) are placed on the necklace. This is the path followed by the data transfer addresses and the corresponding abort signals during the execution of data transfer instructions. As explained in section 6.5.3.4, after sending the data transfer address to memory, Decode3 blocks until the corresponding abort signal is issued; therefore, in order to prevent stall and/or starvation phenomena in the simulation model, it is

[Figure: a generic node (transputer Ti) hosting two occarm processes (A and B) with their monitor processes; each of the four transputer links (North, South, East and West) is served by a multiplexor (Mux) and a distributor, and a demultiplexor with explicit buffers feeds the local processes.]

Figure 7.10: The Generic Simulator Node

essential that Decode3 receives the abort message as soon as possible.

7.6.1.3 The Monitoring Path

To minimize the communication overhead imposed by the monitoring messages, the characteristics of the T-Rack may be exploited. As explained in section 7.5, the I/O process receives messages from both the Memory and the monitoring processes. Since the Tadpole transputer, where the I/O process resides, is connected to both ends of the necklace, the interaction with the Memory process may take place via one end of the necklace, with the monitoring messages travelling in the other direction, towards the other end of the necklace. This scheme is depicted in figure 7.9, where the complete mapping of occarm onto the T-Rack is presented; transputers T0-T5 host the simulation model, while transputers T6-T63 are simply used to forward monitoring information.

7.6.1.4 The Generic Simulator Node

Figure 7.10 depicts a generic node of the distributed implementation of the simulator. Typically this will include a number of active processes together with the corresponding monitoring modules. Extra multiplexing/demultiplexing processes are included to allow the sharing of the transputer links; to prevent deadlock situations (which, for example, might occur if one of the transputer links is blocked by a message destined for a particular process, while this process is blocked waiting for a message that may follow the former on the link), extra buffering has been incorporated into the demultiplexing modules (i.e. the distributor process). In practice, not all of the modules depicted in figure 7.10 will be included in a typical node.
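A minimal occam sketch of this buffering arrangement for one incoming link is given below; all process and channel names are illustrative assumptions rather than occarm's own, and the one-place buffers shown stand in for whatever buffer depth the worst-case backlog requires:

    PROC buffer (CHAN OF INT in, out)
      WHILE TRUE
        INT x:
        SEQ
          in ? x
          out ! x
    :

    PROC distributor (CHAN OF INT link.in, to.a, to.b)
      WHILE TRUE
        INT dest, msg:
        SEQ
          link.in ? dest              -- destination tag precedes each message
          link.in ? msg
          IF
            dest = 0
              to.a ! msg
            TRUE
              to.b ! msg
    :

    PROC node.input (CHAN OF INT link.in, proc.a, proc.b)
      CHAN OF INT buf.a, buf.b:
      PAR                             -- the buffers decouple the shared link from
        distributor (link.in, buf.a, buf.b)
        buffer (buf.a, proc.a)        -- the consuming processes, so a process that
        buffer (buf.b, proc.b)        -- is blocked elsewhere cannot block the link
    :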

7.7 Summary

This chapter has discussed issues related to the execution of the occarm model on the T-Rack, a 64-transputer machine. Techniques to deal with the problems of monitoring, termination, mapping and load balancing have been presented; the techniques that have been presented regarding the monitoring of the occarm model are general and can be applied to any asynchronous architectural model which is based on the same philosophy as occarm. The next chapter presents a set of results obtained from the execution of occarm on the T-Rack. These results are used for the validation of occarm.

Chapter 8

Validation of the Occarm Model

8.1 Introduction

The last two chapters have described the occarm simulation model and the associated environment which provides for the execution of occarm on the ParSiFal T-Rack. This chapter presents a set of quantitative results which provide the basis for the validation of occarm. Simulation model verification and validation is a complicated1 albeit important issue and is currently an active area of research. An overview of existing validation and verification approaches may be found in [Whit89], [Cars89], [Sado89], [Sarg92], [Sarg94], [Balc94] and [Balc94a]. The validation of occarm has been performed by comparing results produced by occarm with those produced by the Asim simulation model and concerns two different characteristics of occarm, namely:

• The accuracy of timing.

• The performance.

1Yucesan et al. [Yuce92] have shown that the verification of certain properties of discrete event simulation models is an NP-complete problem.


...
time();
for (Run_Index = 1; Run_Index <= Number_Of_Runs; ++Run_Index)
{
  Proc_5();
  Proc_4();

  Int_1_Loc = 2;
  Int_2_Loc = 3;
  strcpy (Str_2_Loc, "DHRYSTONE PROGRAM, 2'ND STRING");
  ...
} /* loop "for Run_Index" */
time();
...
Proc_4 ()
{
  Boolean Bool_Loc;

  Bool_Loc = Ch_1_Glob == 'A';
  Bool_Glob = Bool_Loc | Bool_Glob;
  Ch_2_Glob = 'B';
} /* Proc_4 */
...
Proc_5 ()
{
  Ch_1_Glob = 'A';
  Bool_Glob = false;
} /* Proc_5 */
...

Figure 8.1: A Section of the Dhrystone Synthetic Benchmark

8.2 Benchmark Programs

Within the AMULET project, for the verification and evaluation of the AMULET1 processor, the ARM validation programs [ARM] as well as the Dhrystone benchmark [Weic84] are used. The former are “toy” benchmarks which invoke different instruction types to test different parts of the design. Dhrystone is a synthetic benchmark which has traditionally2 been used for the evaluation of computer architectures. The main body of the Dhrystone program consists of a loop as depicted in figure 8.1. Using this benchmark, the performance of the simulated architecture

2A discussion on the different benchmarks used in connection with computer architecture research may be found in [Henn90] pp. 45-48.

                  Tbefore                          Tafter
Model             Value(ns)  Drift(ns)  Error(%)   Value(ns)  Drift(ns)  Error(%)
Occarm (Single)   39036      10313      20.9       114432     28834      20.13
Occarm (Multi)    39274      10075      20.42      116090     27176      18.97

Asim Tbefore = 49349ns    Asim Tafter = 143266ns
Benchmark: Dhrystone (1 loop)

Table 8.1: Timestamp Drift

Model             Dhrystone Number   Error (%)
Occarm (Single)   13263.30           24.57
Occarm (Multi)    13018.30           22.26

Asim Dhrystone Number = 10647.69

Table 8.2: Dhrystone Numbers

is expressed in terms of the “Dhrystone number”, which denotes the number of times that the loop is executed during a period of one second (“Dhrystones”). The “Dhrystone number” is calculated by sampling the current clock value before

(Tbefore) and after (Tafter) the execution of the loop and employing the formula:

Dhrystone.Number = (10^9 × Number.Of.Runs) / (Tafter − Tbefore)
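Substituting the values of table 8.1 into this formula (Number.Of.Runs = 1 for a single Dhrystone loop) reproduces the Dhrystone numbers of table 8.2:

Occarm (Single): 10^9 / (114432 − 39036) = 10^9 / 75396 ≈ 13263.30 Dhrystones
Asim: 10^9 / (143266 − 49349) = 10^9 / 93917 ≈ 10647.69 Dhrystones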

The clock values in the above formula refer to either real or simulated time, depending on whether Dhrystone is executed on the physical processor or within a simulator respectively. The functional correctness of occarm has been verified by (meta-)executing the complete set of the ARM validation programs. For the validation of occarm, Dhrystone has been used for the following two reasons:

• Dhrystone, like all synthetic benchmarks, tries to match the average behaviour (i.e. the average frequency of operations and operands) of a large set of real programs; thus, the results obtained may be considered representative of the average behaviour of occarm too.

• Dhrystone is the benchmark that has typically been used for the evaluation of the AMULET1.

Within the ARM development environment used in the AMULET project, the “time()” function in the Dhrystone code (figure 8.1) is compiled as a write request to a particular address in memory; the memory responds by issuing a value which denotes the time that the write request is issued to memory. In occarm, this value is the timestamp of the request message to memory.
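A minimal occam sketch of this convention is given below; the protocol, the magic address value and all names are illustrative assumptions rather than occarm's actual definitions:

    PROTOCOL MSG IS INT; INT:           -- timestamp; address or data word

    VAL INT time.addr IS #AAAA:         -- hypothetical "time" address

    PROC memory (CHAN OF MSG req.in, reply.out)
      INT t, addr:
      WHILE TRUE
        SEQ
          req.in ? t; addr
          IF
            addr = time.addr            -- the compiled time() write request:
              reply.out ! t; t          -- reply with the request's own timestamp
            TRUE
              SKIP                      -- normal read/write handling omitted
    :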

8.3 Accuracy

The accuracy of timing in occarm has been tested by comparing the results produced by occarm with those obtained from Asim (see chapter 6). These are the results that are used for the performance evaluation of AMULET1, namely the Dhrystone number, as well as the occupancy and stall periods of the AMULET1 pipelines (see section 7.3.1). Since the calculation of these values is based on the simulated time, they are particularly suitable to be used as a means for measuring the degree of timing accuracy of occarm.

[Table 8.3 lists, for each of the 27 AMULET1 pipelines and registers (IPipe, PCPipe, XPipe, Dec1.RegB, Dec1.RegA, Dec2.OP, Dec2.Rsh, NGen, RB.OReg, RB.Wreg, RB.Ireg, RB.Afifo, RB.Mfifo, Dec3, CPreg, OPreg, MemCP, MRReg, MAReg, Dout, PCHLat, LSMreg, APipe, WReg, DataIn, RReg and ImmPipe, with sizes of 1 to 5 stages), the percentage of simulated time for which the pipeline holds 1 to 5 items, as measured by the Single and Multi configurations of occarm and by Asim. NA: Not Available.]

Table 8.3: AMULET1 Pipeline Occupancy (Dhrystone (1 loop))

[Table 8.4 lists, for the same pipelines, the average, minimum and maximum stall period (ns) and the total number of messages entering each pipeline, again for the Single and Multi configurations of occarm and for Asim.]

Table 8.4: AMULET1 Pipeline Stalls (Dhrystone (1 loop))

In order to make the results obtained from the two different models comparable, care has been taken during the implementation of occarm to ensure that the propagation delays (in simulated time) incorporated in the occam processes exactly match those of Asim. For modules described in Asim at the Register Transfer Level, this is straightforward; for Asim gate level descriptions3, the equivalent occarm delays have been calculated by taking into account the propagation delays of all the circuit elements on the corresponding path4. Results from both the single and multiple transputer configurations (referred to in the result tables as Single and Multi respectively) of occarm on the T-Rack have been obtained. These represent two different and arbitrary process schedulings, thus providing an indication of the effect that different process schedulings may have on the timing inaccuracy.

Table 8.1 illustrates the clock values Tbefore and Tafter of the Dhrystone program as produced by occarm, while table 8.2 presents the Dhrystone numbers obtained from the aforementioned values; in these tables, the values “Drift” and “Error” refer to the comparison between the corresponding results from occarm and Asim. Tables 8.3 and 8.4 present the comparative results regarding the occupancy and the stall periods of the pipelines5 of AMULET1 respectively. As shown in table 8.2, the timing error with regard to Dhrystone values is 24.57% and 22.26% for the single and multiple transputer configurations of occarm respectively. These values may be considered reasonable and indeed acceptable at this level of simulation and at an early stage of the design process [WooJ94]; the same applies for the values obtained regarding the pipeline occupancy and stall periods since, as can be seen in tables 8.3 and 8.4, the

3This refers to modules that are described in terms of logic gates, latches and Event Control Blocks.
4These delays correspond to the CMOS implementation of AMULET1.
5A register is a pipeline of size one.

[Impulse graphs: preemption count (0 to 4) against simulated time (0 to 120000ns); a) single transputer implementation, b) multi transputer implementation.]

Figure 8.2: Decode1: Preemption Count (1 Dhrystone Loop)

disparities between the occarm and Asim results are insignificant. It is important to note that the discrepancies between the results obtained from occarm and the Asim model are due not only to the timing error introduced into the message timestamps by the occurrence of preemptions in occarm, but also to the change in the system's behaviour that these preemptions cause. Preemptions in the arbiter processes change not only the order, but also the type and number of events in the model. A preemption in the address interface (AddInt process) which involves a branch target address from the datapath (Wch channel, see section 6.9) affects the number of addresses issued to memory and, consequently, the number of invalid instructions which will enter the system. Similarly, a preemption in the Decode1 arbiter process means that a different number

[Impulse graphs: accumulated error (0 to 500ns) against simulated time (0 to 120000ns); a) single transputer implementation, b) multi transputer implementation.]

Figure 8.3: Decode1: Preemption Magnitude (1 Dhrystone Loop)

of invalid instructions will enter the datapath. This explains the disparities between the number of messages which enter each of the pipelines in the different models (see table 8.4). In order to examine how preemptions are distributed over time, the number of

[Impulse graphs: preemption count (0 to 4) against simulated time (0 to 120000ns); a) single transputer implementation, b) multi transputer implementation.]

Figure 8.4: AddInt: Preemption Count (1 Dhrystone Loop)

times that a preemption is detected6 (the preemption count) and the corresponding accumulated preemption magnitude (in nanoseconds) for each arbiter process have been sampled at regular intervals7. Figures 8.2 - 8.7 present the obtained values in the form of impulse graphs. These figures indicate that preemptions take place at a low frequency, with the corresponding accumulated preemption magnitude being relatively small (ranging from less than 25ns to 475ns). Most of the preemptions occur in the AddInt process; this may be explained by the

6A preemption in an arbiter process is detected each time a message is received with timestamp less than the timestamp of the last message on the other input link.
7In the current implementation of occarm, the sampling period is 10000ns (simulated time). However, the distributed and event-driven nature of occarm allows reports to be generated only after the report boundary instant has been crossed. In the impulse graphs, “accumulated error” refers to the total preemption magnitude within a sampling period.

[Impulse graphs: accumulated error (0 to 500ns) against simulated time (0 to 120000ns); a) single transputer implementation, b) multi transputer implementation.]

Figure 8.5: AddInt: Preemption Magnitude (1 Dhrystone Loop)

fact that the activity of the address interface (i.e. the number of messages arriving on the input channels of the AddC process) is higher, thus increasing the probability of preemptions. An interesting aspect with regard to preemptions which emerged from the

[Impulse graphs: preemption count (0 to 4) against simulated time (0 to 120000ns); a) single transputer implementation, b) multi transputer implementation.]

Figure 8.6: WrtCtrl: Preemption Count (1 Dhrystone Loop)

Figure 8.6: WrtCtrl: Preemption Count (1 Dhrystone Loop) obtained results is that although locally, within each arbiter process, the accu- mulated timing error increases as the simulation progresses, the error which is incorporated in the timestamps of the messages in the model does not necessarily follow this pattern; certain individual messages may even have timestamps that exactly match the time that the corresponding events would occur in the absence of preemptions. This is illustrated in figure 8.8, where an example process graph with two arbiter processes (P1 and P2) is depicted. Assuming zero propagation delays within the processes, if no preemptions occur, messages would be sent to process P3 in the order a, b, c, d, e, f, with timestamps 3, 5, 7, 10, 13 and 15 respectively. If preemptions occur, a possible sequence of messages to P3 could be d, a, b, f, c, e, with timestamps 10, 10, 10, 15, 15 and 15 and with timing CHAPTER 8. VALIDATION OF THE OCCARM MODEL 204

[Impulse graphs: accumulated error (0 to 250ns in a, 0 to 200ns in b) against simulated time (0 to 120000ns); a) single transputer implementation, b) multi transputer implementation.]

Figure 8.7: WrtCtrl: Preemption Magnitude (1 Dhrystone Loop)

errors 0, 7, 5, 0, 8 and 2 respectively.
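The detection rule of footnote 6 can be sketched in occam as follows; the process and variable names are invented for illustration, timestamps are assumed to be plain INTs, and the forwarding of the messages and the periodic sampling of the two counters are omitted:

    PROC monitor.arbiter (CHAN OF INT in1, in2)
      INT t, last1, last2, count, error:
      SEQ
        last1 := 0
        last2 := 0
        count := 0              -- preemption count (figures 8.2, 8.4, 8.6)
        error := 0              -- accumulated magnitude (figures 8.3, 8.5, 8.7)
        WHILE TRUE
          ALT
            in1 ? t
              SEQ
                IF
                  t < last2     -- earlier than the last message seen on
                    SEQ         -- the other link: a preemption is detected
                      count := count + 1
                      error := error + (last2 - t)
                  TRUE
                    SKIP
                last1 := t
            in2 ? t
              SEQ
                IF
                  t < last1
                    SEQ
                      count := count + 1
                      error := error + (last1 - t)
                  TRUE
                    SKIP
                last2 := t
    :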

8.4 Performance

With regard to performance, occarm and the Asim model are not directly comparable, since each describes the AMULET1 architecture at a different level, and the supporting machines (namely the T-Rack and any SPARC workstation) are intrinsically very different. Nevertheless, a comparative examination of the performance of the two models may provide an indication of the impact that the use of the occam model may have on the duration (and cost) of the design

[Figure: an example process graph in which arbiter processes P1 and P2 feed process P3; the messages a, b, c, d, e and f carry timestamps 3, 5, 7, 10, 13 and 15 respectively.]

Figure 8.8: An Example Process Graph

cycle, within the boundaries of existing technology. Within the single transputer configuration (i.e. on a single 20MHz T414 transputer), occarm requires on average8 1.72 minutes to execute one Dhrystone loop9 when no monitoring traces are generated; as illustrated in table 8.5, this time is longer than that required by the Asim model executing on an IPX Sun workstation10 by a factor of 1.16. This is a reasonable and expected performance, for the execution of the occarm processes on a single transputer is performed not in parallel but rather in a time sharing fashion, and the large number of processes in the model11 makes the context switching overhead in the transputer significant. Table 8.6 presents the performance of occarm for both its single and multiple transputer configurations and for the different policies employed for the transportation of monitoring data (see section 7.3.1). When no monitoring traces are generated, the distribution of occarm onto the seven transputers of the T-Rack yields a speedup of 1.69. The “store and unload” transport policy allows a speedup of 2.26 to be

8The performance results presented in this section represent the average (i.e. the mean [Grah94], page 384) value derived from (10) runs of the simulators in question.
9This corresponds to the execution of 20 ARM machine instructions per second.
10The elapsed time for the Asim model has been obtained by taking into account the user and sys values provided by the Bourne shell “time” command.
11The current implementation of occarm consists of approximately 120 parallel processes.

Model             Elapsed Time (minutes)
Occarm (Single)   1.72
Asim              1.48

Benchmark: Dhrystone (1 loop)

Table 8.5: Asim versus Occarm (Single Transputer Implementation)

                      Elapsed Time (minutes)
Transport Policy      Occarm (single)   Occarm (multi)   Speedup
Tracing Off           1.72              1.02             1.69
Store and Unload      4.22              1.87             2.26
Immediate Transport   9.75              7.21             1.35

Benchmark: Dhrystone (1 loop)

Table 8.6: Performance of Occarm

achieved since, in this mode of operation, the performance of occarm on a single transputer drops by a factor of 2.45, compared to 1.83 in the multi transputer implementation. This difference in the performance drop may be attributed to the fact that the activation of the monitoring processes severely increases the frequency and, consequently, the overhead of context switching on the single transputer. The distribution of the monitoring processes onto multiple transputers alleviates this phenomenon as the context switching overhead is also distributed. When the “immediate transport” policy is employed, the performance of both the single and multiple transputer configurations of occarm drops dramatically, by factors of 5.67 and 7.07 respectively, allowing a speedup of only 1.35. This behaviour may be attributed to the operation of the I/O process. As explained in section 7.5, the I/O process acts as a multiplexor for messages arriving from both the Memory and the Monitoring processes. This introduces a major bottleneck in the system, which imposes the ultimate limit on the performance of the simulator. The large number of monitoring messages generated by the “immediate transport” policy occupy a large proportion of I/O process activity, thus reducing the rate at which instructions and data are supplied to the model; as a consequence, the processes of the model remain idle for substantial periods. The speedups achieved by the distribution of occarm onto the multiple transputers of the T-Rack may be considered acceptable and reasonable [Birt94], but not satisfactory. The poor speedup achieved may be attributed to a number of factors related to the characteristics of both the simulated architecture and the machine that hosts the simulator.

• Amdahl's law [Amda67] specifies that the maximum possible speedup depends on the inherent parallelism of the executed system which may potentially be exploited12. In the case of AMULET1, the requirement for instruction compatibility with the synchronous ARM has resulted in an asynchronous design with a very complex pipeline structure and limited inherent parallelism. The performance of AMULET1 itself is 70% of the performance of the synchronous ARM [Furb94]. The complexity of the AMULET1 architecture makes an analysis of the inherent parallelism of the design a complicated task which, as yet, has not been undertaken.

• Asynchronous architectures are communication bound systems and therefore the efficiency of the communication system is crucial. The complex irregular interconnection pattern of AMULET1's functional modules and the extra multiplexing/demultiplexing processes required to cope with the connectivity constraints of the Transputer and the T-Rack introduce major

12The estimation of the maximum possible speedup of a distributed simulation is currently an active area of research. Techniques which have been suggested for this purpose include the employment of a critical path analysis of a trace from a given simulation [Berr85] [Som89], and the treatment of the process graph as a queueing network model [Wagn89].

bottlenecks in the system which severely reduce the communication efficiency.

As explained in section 7.3.1, most of the time the simulation model will operate under the “store and unload” transport policy, which permits the maximum speedup.
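The figures of table 8.6 are internally consistent: the speedup column is the ratio of the single to the multi transputer elapsed times (1.72/1.02 ≈ 1.69, 4.22/1.87 ≈ 2.26 and 9.75/7.21 ≈ 1.35), the performance drops quoted above are ratios against the tracing-off times (4.22/1.72 ≈ 2.45 and 9.75/1.72 ≈ 5.67 on a single transputer; 1.87/1.02 ≈ 1.83 and 7.21/1.02 ≈ 7.07 on the T-Rack), and the factor of 1.16 of table 8.5 is the ratio 1.72/1.48.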

8.5 Summary

This chapter has presented a set of quantitative results regarding both the accuracy and the performance of the occarm simulation model. The accuracy of the model has been measured by comparing the results obtained from occarm with those provided by the Asim model; the results have been obtained by executing the Dhrystone benchmark on the two models. The distribution of preemptions over time and their effects on the messages' timestamps in the model have also been discussed. The next chapter introduces an approach for eliminating the occurrence of preemptions within the framework of the proposed modelling approach.

Chapter 9

Addressing the Time Modelling Problem

9.1 Introduction

The results presented in the previous chapter confirm that the timing error introduced into the occarm simulation model by the lack of any synchronization between the parallel processes may be considered acceptable at this level of simulation and at the early stages of the simulation process. Greater accuracy is needed, however, if the model is to serve as a tool for a more extensive and elaborate evaluation of the performance characteristics of the asynchronous architecture. The requirement to test the architecture for potential deadlocks by modifying the delays in the system (see section 5.3.1.1) makes the accuracy requirement even more stringent. Using the real time transputer clocks to change the relative scheduling of the occam processes has the major drawback that small run time delays cannot guarantee the intended effects and behaviour in the model, as these delays are only approximate. Furthermore, run time delays have a direct

effect on the performance of the simulator. Thus, large delays which would guarantee the planned process scheduling would also affect the performance of the simulator; experiments with occarm have shown that the drop in performance can be significant. This chapter describes a technique that has been developed to enforce time accuracy in occam models of asynchronous architectures.

9.2 Requirements

As discussed in chapter 5, the rationale behind using occam for modelling asynchronous architectures is the exploitation of the close relationship between the language semantics and the characteristics of the asynchronous system. Once the occam simulation model is constructed, any attempt to introduce time accuracy into it must preserve this modelling philosophy. Furthermore, any complexity added to the model as a result of the synchronization protocol should be minimal. One of the purposes served by the occam model is to provide a description of the architecture's specification and operation; this information should not be obscured by extra functionality which is related only to the accurate operation of the model and not to the simulated architecture per se. Any attempt to employ one of the existing synchronization protocols surveyed in chapter 3 would force the model to depart from the modelling philosophy which provided its original basis. Therefore, to meet the aforementioned requirements, a novel synchronization protocol has been devised, namely the Program Driven Synchronization Protocol (PDSP). This is a conservative protocol which is based on a combination of the exploitation of the characteristics of the simulated system and the employment of Null messages to achieve deadlock avoidance, while maintaining the philosophy of the model intact. It seeks to enable the development of accurate arbiter models involving only the processes required for this purpose. The processes of the model remain entirely data driven.

9.3 The Program Driven Synchronization Protocol (PDSP)

9.3.1 The Basis

Von Neumann computer architectures, synchronous or asynchronous, are deterministic systems; they accept as input instructions which they execute sequentially in a specific and predefined order. Each instruction defines the steps that are required for its execution as well as the behaviour of each functional module of the architecture. Consequently, the kind and sequence of events that occur in the system are determined at any time by the executing instructions. This ability to predict events in the architecture, based on the information provided by the program under execution, forms the basis of the Program Driven Synchronization Protocol; by examining the instructions being executed, the arbiter processes of the simulation model can determine whether an event is expected on a particular input link and thus whether their blocking upon this link would result in a deadlock. The key concept in the Program Driven philosophy is the “Instruction Lookahead Set”, which is defined as follows:

Definition 1 The Instruction Lookahead Set (ILS) of a link λ is the set of instructions1 whose execution will potentially result in an event occurring on λ:

ILSλ = {Instruction I: I generates an event on link λ}.

The Instruction Lookahead Set of any particular link in the system is directly

defined by the architecture's specification and thus may become available to the arbiter processes of the simulation model in advance. Based on the ILS of their input links, arbiter processes may directly make decisions regarding the potential arrival of messages, provided of course that they are also informed of the instructions being executed in the system.

1An instruction I is referred to as an ILSλ instruction if and only if I ∈ ILSλ.

9.3.2 The Rules

Based on the Instruction Lookahead Set defined above, the behaviour of arbiter processes with regard to message consumption may be specified as follows:

Rule 1 An arbiter process Π is allowed to block and wait for an event on its input link λ during the execution of an instruction I if and only if I ∈ ILSλ.

The above rule ensures that arbiter processes block only for instructions that are likely to generate the corresponding events. However, depending on the status of the system, such an event might not occur during the instruction's execution; in this case Null messages are required, otherwise the arbiter process will become blocked and the simulation model will deadlock. The following rule is concerned with the production of Null messages:

Rule 2 A Null message will be sent to link λ of the arbiter process Π if and only if Π expects an event on λ based on the ILSλ, and for the current state of the system the event will not be generated.

The two rules above specify the behaviour of arbiter processes and their peers when their interaction depends on the executed instructions. However, not all events in an asynchronous system occur in an instruction dependent fashion. Indeed, certain parts of the system may operate autonomously, irrespective of which instructions are being executed; the PC loop in the AMULET1 processor is an example of such an autonomous unit. In this case it is the state of the simulated system that dictates the behaviour of the arbiter process:

Rule 3 An arbiter process Π is allowed to block and wait for an event on its input link λ which fires in an instruction independent way, if and only if the state of the system guarantees that a message will be issued on λ.
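From the sending side, Rules 1 and 2 can be sketched in occam as follows; the tagged protocol, the delay value and the two predicate functions are illustrative assumptions rather than occarm definitions:

    VAL INT stage.delay IS 12:            -- invented propagation delay (ns)

    -- Placeholder predicates; a real model would decode the instruction word:
    BOOL FUNCTION in.ils (VAL INT instr) IS (instr >= 0):
    BOOL FUNCTION fires.now (VAL INT instr) IS (instr > 0):

    PROTOCOL EVENT
      CASE
        real.ev; INT                      -- timestamp of a genuine event
        null.ev; INT                      -- timestamp guarantee only (Rule 2)
    :

    PROC stage (CHAN OF INT instr.in, CHAN OF EVENT out)
      INT instr, t:
      SEQ
        t := 0
        WHILE TRUE
          SEQ
            instr.in ? instr
            t := t + stage.delay
            IF
              in.ils (instr) AND fires.now (instr)
                out ! real.ev; t          -- Rule 1: the arbiter may block for this
              in.ils (instr)
                out ! null.ev; t          -- Rule 2: expected but, in the current
              TRUE                        --   state of the system, not generated
                SKIP                      -- not an ILS instruction for this link
    :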

9.3.3 The PDSP Arbiter Process

The basic functionality of an arbiter process with regard to the Program Driven Synchronization Protocol is depicted in figure 9.1. Upon receiving a message on one of its links (e.g. message msg1 on In1) the arbiter invokes the Select process to determine whether the processing of this message would cause a preemption. If there is a pending message msg2, already received from the other input, then the message with the minimum timestamp is selected to be processed and forwarded to the arbiter process's output; if both timestamps have the same value, the selection is made in a random fashion to emulate the behaviour of the corresponding hardware arbiter. If, however, no pending message exists, but a positive prediction (based on the Instruction Lookahead) is made regarding its potential arrival, the arbiter process blocks and waits until this second message arrives. The arrival of this message provides the arbiter process with the information required to proceed with its operation and enable Select to make a decision, namely the next timestamp on its other input link.

PROC PDSP_Arbiter()
  PROC Select(msg1, msg2)
    SEQ
      IF
        msg2_pending=TRUE
          SEQ
            IF
              timestamp(msg1)<timestamp(msg2)
                SEQ
                  process(msg1)
                  msg1_pending:=FALSE
              timestamp(msg1)>timestamp(msg2)
                SEQ
                  process(msg2)
                  msg2_pending:=FALSE
              timestamp(msg1)=timestamp(msg2)
                SEQ
                  make_random_selection(msg1,msg2)
        msg2_pending=FALSE
          SEQ
            IF
              msg2_expected=TRUE
                SEQ
                  In2?msg2
                  msg2_pending:=TRUE
              msg2_expected=FALSE
                SEQ
                  process(msg1)
                  msg1_pending:=FALSE
  :
  WHILE TRUE
    SEQ
      IF
        msg1_pending=TRUE
          SEQ
            Select(msg1,msg2)
        msg2_pending=TRUE
          SEQ
            Select(msg2,msg1)
        TRUE
          SEQ
            ALT
              In1?msg1
                msg1_pending:=TRUE
              In2?msg2
                msg2_pending:=TRUE
:

Figure 9.1: The PDSP Arbiter Process

9.3.3.1 Improving PDSP Performance

The basic algorithm described in the previous section (figure 9.1) enables arbiter processes to receive and process messages arriving on their input links in increasing timestamp order, always selecting the message with the smallest timestamp; this guarantees the accurate, preemption-free operation of the simulation model. However, this process does not ensure that the concurrency of the simulated system is sufficiently exploited to increase the potential of the model for high performance. Indeed, as soon as it predicts that a message is expected on one of its input links (e.g. In2), the arbiter process will stop accepting any messages arriving on its other input link In1 until the expected message on In2 arrives. As a consequence, all the processes which are part of the path that leads to In1 will block and wait, and the pipelines at the output side of the arbiter process will starve; during this time, large parts of the simulator will remain idle. A solution to this problem is to provide arbiter processes with some indication as to when, in the simulated future, an expected message will actually arrive. This information will enable them to consume a number of events occurring on the In1 link before they block on In2, thus increasing the concurrency of the simulation model. This information can be obtained by taking into account the propagation delays in the architecture being simulated. An event generated by an instruction will propagate through a number of pipeline stages before it reaches an arbiter. The path followed by the event is completely defined by its parent instruction; the latency of the path at any particular time, however, depends on the number of elements in the micropipelines involved and thus it is non-deterministic. Consequently, it is not feasible to know in advance the exact time required for an event to propagate through a given micropipeline. However, there is a lower bound to this time, namely the latency of the micropipeline when, at the moment of the event's entry, it is empty. Based on this observation, the Minimum Latency Lookahead may be defined:

Definition 2 The Minimum Latency Lookahead of a link λ, MLLλ, is defined as the total propagation delay of the path leading to λ, when the pipelines of the path are empty:

MLLλ = Σi di

where di is the propagation delay of the i-th pipeline stage in the path.
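As an illustration with invented delays (not AMULET1 figures): if the path leading to λ consists of three initially empty stages with propagation delays of 12ns, 20ns and 12ns, then MLLλ = 12 + 20 + 12 = 44ns, and a message entering the path at simulated time t cannot appear on λ before t + 44ns, however full the pipelines subsequently become.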

The Instruction Lookahead Set of a link informs the corresponding arbiter process whether a message should be expected on that link; the Minimum Latency Lookahead reveals when in the simulated future the expected message may arrive. Based on the Minimum Latency Lookahead, the following rule may be specified:

Rule 4 An arbiter process Π will not process a message µ1 received on its input link λ1 but instead it will block and wait for a message µ2 expected on its other input link λ2, if and only if the timestamp of µ2 as predicted by the MLLλ2 is less than or equal to the timestamp of µ1.

Rule 4, combined with the ALT statement in the main loop of the PDSP arbiter process, will enable arbiter processes to process messages with appropriate timestamps as soon as they arrive. Figure 9.2 depicts the functionality of the arbiter process when the MLL is taken into account.

9.3.4 The Limitations

The Program Driven Approach is based on the exploitation of the Instruction Lookahead properties of the simulated architecture. Such an exploitation presupposes that arbiter processes have knowledge as to which instructions are being executed. If this knowledge is not directly available, then an appropriate mechanism needs to be devised to provide arbiter processes with this information. Generally the functionality of the architecture being modelled will make the development of such a mechanism feasible. Otherwise the instruction lookahead properties of the system cannot be exploited and the PDSP rules cannot apply.

PROC PDSP_Arbiter()
  PROC Select(msg1, msg2)
    SEQ
      IF
        msg2_pending=TRUE
          SEQ
            IF
              timestamp(msg1)<timestamp(msg2)
                SEQ
                  process(msg1)
                  msg1_pending:=FALSE
              timestamp(msg1)>timestamp(msg2)
                SEQ
                  process(msg2)
                  msg2_pending:=FALSE
              timestamp(msg1)=timestamp(msg2)
                SEQ
                  make_random_selection(msg1,msg2)
        msg2_pending=FALSE
          SEQ
            IF
              msg2_expected=TRUE
                SEQ
                  IF
                    timestamp(msg1) < MLL_timestamp(msg2)
                      SEQ
                        process(msg1)
                        msg1_pending:=FALSE
                    timestamp(msg1) >= MLL_timestamp(msg2)
                      SEQ
                        In2?msg2
                        msg2_pending:=TRUE
              msg2_expected=FALSE
                SEQ
                  process(msg1)
                  msg1_pending:=FALSE
  :
  WHILE TRUE
    SEQ
      IF
        msg1_pending=TRUE
          SEQ
            Select(msg1,msg2)
        msg2_pending=TRUE
          SEQ
            Select(msg2,msg1)
        TRUE
          SEQ
            ALT
              In1?msg1
                msg1_pending:=TRUE
              In2?msg2
                msg2_pending:=TRUE
:

Figure 9.2: PDSP: Taking MLL into Account

9.4 Applying PDSP to Occarm

In order to demonstrate its applicability, the Program Driven Synchronization Protocol was employed to develop models of the arbiter processes in occarm which would provide accurate modelling of time.

As already mentioned, occarm makes use of three arbiter processes, namely AddC, Dec1CtrlA and WrtCtrl, which form part of the address interface, primary decode and write bus control units of the AMULET1 respectively.

9.5 The Address Interface Arbiter

As described in section 6.3.2, the AddC process employs arbitration to allow addresses arriving from the datapath on the Wch channel to break the PC loop and gain access to the MAReg. The Instruction Lookahead Set of the Wch channel is2:

ILSWch = {B, BL, SWI, LDR, STR, LDM, STM, Data Processing with PC as Dest. Reg.}.

The other input of the arbiter, namely the PCch, forms part of the PC loop and is fed with PC addresses from the PC Pipe. The operation of the PC loop is autonomous and independent of the operation of the rest of the processor. Thus, on the PCch channel there will be a continuous, instruction independent flow of messages.

9.5.1 Providing Instruction Lookahead Information

AddC is an example of an arbiter process which has no direct knowledge regarding executing instructions. An address produced by AddC is sent to memory following the path from AddC, through MAReg and DataInt (MemCtrl) to memory. If it is an instruction address, the instruction message from memory enters the processor following the path from memory through DataInt (IPipe) to Decode1. AddC is not in the path followed by the instruction and therefore has no direct information as to which instructions have entered the system; in order to apply PDSP a mechanism is required to provide AddC with this information.

2See appendix A.

[Figure: memory addresses from the PC Pipe (on PCch) and from WrtCtrl (on Wch) are arbitrated by AddC and pass through the MAReg register and MemCtrl to Memory; the instructions read from Memory pass through the MRReg register towards the IPipe, and the acknowledgements flowing back along this path return to AddC.]

Figure 9.3: Providing Instruction Lookahead Knowledge to AddC

A neat and efficient solution is to take advantage of the hidden links in the above paths, namely the contra flow of the acknowledgement messages. Acknowledgement messages are sent from register to register through control processes in the model. In the first path above, an address message produced by AddC will propagate to memory and from there to MRReg; MRReg will generate an acknowledgement message which will follow the flow back to AddC (figure 9.3). This acknowledgement message can be used to carry the corresponding instruction to AddC; no communication overhead is generated as the acknowledgement messages would be sent anyway. Clearly, after issuing an instruction address to memory, AddC will receive and process a number of PC messages arriving on the PCch link from the PC Pipe before it receives the acknowledgement with the corresponding instruction. However, during this period no preemption will occur. Indeed, the acknowledgement message carrying an ILSWch instruction from memory will arrive at AddC just before the first or the second PC value following this instruction's address on the PCch channel is received by the AddC process. This is illustrated in figure 9.4.
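In occam terms the piggy-backing might look as follows; the protocol and the process name are assumptions for illustration, not occarm's own:

    PROTOCOL MSG IS INT; INT:        -- timestamp; data word

    PROC mrreg (CHAN OF MSG from.mem, ack.out)
      INT t, instr:
      WHILE TRUE
        SEQ
          from.mem ? t; instr        -- instruction word fetched by memory
          -- the acknowledgement that would be sent anyway, now also
          -- carrying the instruction word back towards AddC:
          ack.out ! t; instr
    :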

[Figure: two possible interleavings of the values forwarded to MAReg and the acknowledgements returned to AddC. In a), AddC forwards PCk and then the pending Wch message, and the acknowledgement carrying Ik arrives before PCk+1; in b), AddC forwards PCk and PCk+1, and Ik arrives before PCk+2, with subsequent acknowledgements carrying Ik+1, Ik+2, and so on.]

Figure 9.4: The Arrival of Instruction Lookahead Information

The acknowledgment carrying an instruction Ik will be issued by MAReg in response to its receiving, from AddC, the next memory address message immediately after the instruction's address PCk. If, after PCk, AddC selects and forwards a message from the Wch channel (namely message Wk−1), then Ik will reach AddC immediately before this process is ready to receive address PCk+1 (figure 9.4a); otherwise Ik will arrive after PCk+1 has been forwarded to MAReg and before PCk+2 is processed (figure 9.4b). The value of PCk+2 is PCk + 8. Therefore, it is the R15 value which, according to the PC+8 rule, will rendezvous with Ik in Decode1 before they both enter the datapath (see section 6.5). Since the corresponding message Wk on the Wch channel will be generated as a result of the execution of Ik, its timestamp tWk will be equal to: max(tPCk+2, tIk) + ddatapath, where tIk is the timestamp of the instruction when it reaches Decode1 and ddatapath is the total propagation delay within the datapath. Therefore, tPCk+2 < tWk.
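As an illustration with invented values: if tPCk+2 = 100ns, tIk = 96ns and ddatapath = 40ns, then tWk = max(100, 96) + 40 = 140ns; AddC may therefore safely consume the PC message timestamped 100ns before blocking for the Wch message.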

9.5.2 The PCch Link

The PCch channel carries the acknowledgement signal from the PC Pipe. As already mentioned, this signal is issued in a continuous, instruction independent fashion, each time the current PC circulating in the PC loop is latched by the first register of the PC Pipe (the PC0 register, see section 6.3.2). The issuing of the acknowledgement signal closes the prefetching loop, thus triggering a new cycle. Therefore, a message on the PCch channel will be expected by AddC if a new PC value has been forwarded to be incremented and sent to the PC Pipe. However, if the PC Pipe is full when the request signal to the PC0 register is issued, the circulating PC value will not enter the PC Pipe until at least one element from the PC Pipe is removed; this will empty the PC0 register, enabling it to latch the pending PC value and consequently to issue the acknowledgement signal on the PCch channel. The PC Pipe becomes full as a result of the datapath being stalled. The role of the PC Pipe is to provide the processor's datapath with the R15 values required for the execution of instructions (see section 6.3.1.2); its output is connected to the first stage of the datapath, namely the primary decode. If, for any reason, the datapath stalls, instructions will start to backlog, filling the datapath up to the primary decode; consequently, further instructions with their associated PC values will be prevented from entering the datapath. As a result, the PC Pipe will become full and will remain so for as long as the datapath is stalled. During this period no further PC values will be allowed to enter the PC Pipe and thus no acknowledgment signal will be issued on the PCch channel. This behaviour of the system may lead to deadlock situations in the simulation model when the PDSP algorithm is employed. This is due to the fact that not only the input, but also the output links of the datapath are connected to the address interface, thus forming closed communication paths as illustrated in figure

9.5. If, therefore, the datapath stalls as a result of its waiting to interact with the AddC process, and this process is blocked waiting for an acknowledgement message from the PC Pipe which however will never arrive because the PC Pipe has become full as a result of the datapath’s stall, a deadlock will occur.

This situation may arise when, as a result of the execution of an ILSWch instruction, a message with timestamp tW is produced by the datapath on the Wch link. Obeying the PDSP algorithm, AddC receives this message but does not process it immediately. Instead, it turns its attention to serving the PCch channel, continuously reading acknowledgement messages arriving from the PC Pipe; this operation will continue for as long as the timestamp tPC of the message on the

PCch is less than tW. Each acknowledgement message received on the PCch channel carries the PC value which has just entered the PC Pipe. This value is forwarded by AddC to subsequent stages of the PC loop, to be incremented and eventually channelled through the PC Pipe to the datapath. If, during this operation of the AddC process, the datapath stalls, then the PC Pipe will eventually fill up and, as a result, will be unable to issue any more acknowledgement messages to AddC. If, before the PC Pipe fills up, an acknowledgement message with timestamp tPC greater than tW is issued on the PCch channel, the operation of the simulator will proceed smoothly and without any implications. AddC will process the pending message from the Wch link, since it has the smallest timestamp; this will unblock the datapath, thus enabling more instructions to enter it and, consequently, allowing the PC Pipe to output its contents.

If, however, no message with timestamp greater than tW is received from the PCch channel before the PC Pipe fills up, the simulator will deadlock: AddC will remain blocked waiting for the next acknowledgment, which will never arrive since the datapath will remain blocked and the PC Pipe full.

[Figure: the closed loop between the address interface and the datapath: the PC loop and PC Pipe deliver R15 values (PC) to the datapath and acknowledgements to AddC on the PCch channel, while the datapath returns addresses (W) to AddC on the Wch channel; the LsmTrm signal also enters the datapath.]

Figure 9.5: The Address Interface - Datapath Loop

It is essential therefore that the AddC process be able to decide whether it should wait for yet another message from the PCch or whether the PC Pipe has become full and thus no more messages will be sent on the PCch link.

During the execution of an ILSWch instruction, the datapath may stall in the following cases:

• If the datapath fills up; this will occur as a result of Ctrl3 and WrtCtrl processes waiting for the abort/no abort and Wlx signals respectively.

• If the ILSWch instruction is followed by register read operations which refer to locked registers.

• If the ILSWch instruction is followed by instructions which activate the ALUgo signal.

• During the execution of load/store multiple instructions.

Figure 9.6 illustrates the blocking of the datapath as a result of the aforementioned cases.

[Figure: the blocking points of the datapath. Instructions from the IPipe flow through Decode1 and the datapath stages to Ctrl3 and the RegBank; Decode1 may block on a register read or waiting for R15 values from the PC Pipe, Ctrl3 may block waiting for the abort signals from memory (Dabt0, Dabt1) or due to ALUgo or a load/store multiple (LsmTrm), and the result path through RReg, WReg and WrtCtrl to AddC may block waiting for the Wlx signal.]

Figure 9.6: Stalling of the Datapath

9.5.2.1 Filling of the Datapath

Address values produced by the datapath as a result of the execution of ILSWch instructions are forwarded to the AddC process through the WrtCtrl process (see figure 9.6). As explained in section 6.9, after forwarding the address message to AddC (via the WReg register), WrtCtrl will block until both the acknowledgement from the WReg and the Wlx signal from AddC are issued, whereupon WrtCtrl will continue its operation producing an acknowledgement message to the RReg register. The acknowledgment from the WReg register will be issued as soon as it latches the address message, in parallel with the propagation of this message to AddC via the Wch channel. The Wlx channel, however, will not fire until AddC has processed the address message and has forwarded it to the Memory Address Register (see section 6.3.2); during this period, WrtCtrl will remain suspended, thus blocking the exit of the datapath.

For ILSWch instructions which activate the abort signals from memory, WrtCtrl will be suspended in a similar fashion; the datapath, however, will block before that point, due to Ctrl3 waiting to receive those signals from memory (figure 9.6). The execution of instructions generates a flow of messages towards the exit of the datapath. Thus if either WrtCtrl or Ctrl3 is blocked, the datapath will gradually fill up and eventually stall. The execution of a single-cycle instruction in a micropipelined architecture consists of the decoded instruction word being propagated to consecutive stages of the datapath, with each stage performing a different, predefined operation to produce an intermediate result. Thus, at any time a single-cycle instruction will occupy only one stage of the datapath. For multi-cycle instructions, a separate message will be generated and forwarded down the datapath for each cycle. Hence, as a general rule, the datapath pipeline will become full if the total number of cycles of the instructions which have entered the datapath after an

ILSWch instruction and are being executed is equal to the number of stages left in the datapath. Table 9.1 presents the number of stages in the datapath for different

ILSWch instructions. As described in section 9.5.1, the AddC process is informed of the instructions which have entered the processor at any particular moment. However, not all instructions which enter the datapath will eventually be executed; instructions with colours not matching the operating colour of the processor or instructions which fail their condition codes will be discarded as invalid (see section 6.5). An instruction which is discarded will free the registers of the datapath which

ILSWch Instruction        No. of Stages
B                         N + 1
BL                        N + 1
SWI                       N + 1
LDR                       N
STR                       N
LDM                       N
STM                       N
Data Proc. with R15       N + 1

For the current implementation of AMULET1, N=3.

Table 9.1: PDSP: Number of Free Stages in the Datapath

it would otherwise occupy, thus allowing more instructions to enter the datapath. Therefore, in order to enable the AddC process to determine whether the datapath has become full and as a result has stalled, it is necessary to provide this process with information regarding which of the instructions which enter the datapath are executed; AddC is aware of both the colour and the condition field of instructions entering the datapath, as these form part of the instruction word which is provided to AddC.
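Under this rule, the fill test available to AddC can be sketched as follows; the names are invented, and occarm's actual bookkeeping, which must also account for discarded instructions as discussed below, is necessarily richer:

    PROC watch.fill (VAL INT free.stages, CHAN OF INT cycles.in,
                     CHAN OF BOOL full)
      INT used, c:
      SEQ
        used := 0
        WHILE used < free.stages
          SEQ
            cycles.in ? c            -- cycles of the next instruction to enter
            used := used + c         -- the datapath (table 9.1 gives the stages
        full ! TRUE                  -- left after an ILSWch instruction)
    :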

Condition Codes Failure

In AMULET1, the test of the condition flags is performed by the CPSR unit which, in the occarm model, has been incorporated in the functionality of the Ctrl3 process (see section 6.8.1).

As already mentioned, an ILSWch instruction which has the potential to cause a memory abort will keep the Ctrl3 process blocked until AddC lets the address message from the Wch channel through to memory (figure 9.6). As a result, during this period, none of the instructions following the ILSWch instruction in the datapath will be able to enter Ctrl3, and possibly be rejected due to failing their condition codes. Therefore, in order to determine whether the datapath is full, AddC must take into account all instructions which enter the datapath, irrespective of their colour.

For ILSWch instructions which do not activate the abort signal, however, Ctrl3 does not block waiting for memory's response. As soon as the memory address produced by Ctrl3 as a result of the execution of the ILSWch instruction is forwarded to the RReg register process (i.e. as soon as the RReg has issued the corresponding acknowledgment message), Ctrl3 is free to receive subsequent instructions and, possibly, reject them. The operation of Ctrl3 will proceed until an instruction is encountered whose execution produces a result to be forwarded to RReg. Since RReg is blocked waiting for the acknowledgment from AddC (through WrtCtrl), the production of the result will block Ctrl3 as well. Henceforth, no further instructions will enter Ctrl3 until AddC processes the pending address message and issues an acknowledgment. The Current Processor Status (CPS) at the time of the execution of the current

ILSWch instruction may be sent to AddC with the address message; knowledge of the CPS will enable AddC to decide which of the subsequent instructions in the datapath will fail their condition codes. However, it is possible that the execution of a valid instruction which enters Ctrl3 and passes its condition codes will involve the modification of the CPS. If the execution of such an instruction also generates a result to RReg, then it is not necessary to provide AddC with the new CPS; since RReg is unable to produce an acknowledgement message, Ctrl3 will block and thus no further instructions will be discarded by this process. For instructions which change the CPS without producing a result though3, it is essential to provide AddC explicitly with the updated CPS, as Ctrl3 will be free to process and possibly discard more

3These instructions are: TST, TEQ, CMP, CMN, MSR and MRS - see appendix A.

[Figure: the instruction flow into Ctrl3, each instruction tagged with a valid bit (1: valid, 0: invalid). After an ILSWch instruction, Ctrl3 issues the CPS to AddC for each valid instruction, stopping ("Stop issuing CPS") at the first instruction in S2; S1 = {TST, TEQ, CMP, CMN, MSR, MRS} and S2 = {I : the execution of I produces a result to RReg}.]

In order to provide the new CPS to AddC, an alternative communication path between this process and Ctrl3 is required, for the existing one through WrtCtrl will be blocked. Hence, an extra path connecting the two processes is required in the model; since this communication is not included in AMULET1, buffering is necessary to decouple the processes and ensure that deadlocks due to their synchronization do not occur. The interaction between AddC and Ctrl3 with regard to the CPS is illustrated in figure 9.7. A message with the current CPS will be issued by Ctrl3 for each valid instruction which follows an ILSWch instruction, until an instruction (valid or invalid) whose execution produces a result to be forwarded to RReg is encountered.
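The following occam fragment sketches this pattern of operation. It is an illustrative outline only: the names (instr.in, cps.to.addc, valid, produces.result, current.cps) are assumptions for the sketch, not occarm's actual declarations.

BOOL finished:
SEQ
  finished := FALSE
  WHILE NOT finished
    SEQ
      instr.in ? instruction
      ...                            -- execute; update current.cps, valid, produces.result
      IF
        produces.result              -- a result to RReg: Ctrl3 will block anyway
          finished := TRUE
        valid
          cps.to.addc ! current.cps  -- one CPS message per valid instruction
        TRUE
          SKIP                       -- invalid and resultless: nothing to report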

Typically, the number of consecutive valid instructions which follow an ILSWch without generating a result will be small. Therefore the communication overhead in occarm, due to the extra messages sent to AddC by Ctrl3, will be insignificant.

Colour Mismatch

As described in section 6.5, instructions may be discarded due to a colour mismatch either in the ALU (Ctrl3 process) or in the primary decode unit (Decode1 process). In order to determine the state of the datapath at any particular moment, it is essential for AddC to have information regarding the exact point where each of the invalid instructions is discarded. Such information is required by AddC only for instructions which immediately follow valid ILSWch instructions. This is necessary, as:

• Instructions which are discarded at Decode1 do not enter the datapath at all.

• Instructions which pass Decode1 may block the datapath before reaching Ctrl3 to be discarded (see sections 9.5.2.2-9.5.2.4) and,

• Instructions which do reach Ctrl3 to be discarded may still block the datapath if they produce an (invalid) result.

However, since the PCcol signal arrives at Decode1 in a nondeterministic fashion, the exact moment when instructions start being discarded in Decode1 is not directly available to AddC and thus must be explicitly provided by Decode1. The most efficient way to provide a remote process (namely AddC) with a single piece of information (namely the point of arrival of the PCcol signal) is by means of a single message. However, such a scheme is not feasible in the occarm model. This message would have to be sent by Decode1 after the arrival of the PCcol signal; since this is nondeterministic, so is the generation of the message to AddC. Hence, if the blocking of AddC upon waiting for the message precedes the issuing of the message by Decode1, a deadlock may occur: the operation of the PC loop will stop and thus no more R15 values will be provided to Decode1, and eventually this process will block; if this happens before the PCcol has arrived, the simulator will deadlock.

The alternative technique is to have Decode1 provide AddC with information regarding the outcome of colour checking, for each of the instructions involved, by issuing a separate message for each instruction, as sketched below. The generation of these messages commences as soon as Decode1 recognizes a valid ILSWch instruction which has the potential to activate PCcol (figure 9.8). AddC, in turn, blocks as soon as it detects that the ILSWch instruction has entered the datapath and thus no deadlock may occur. The transmission of the messages will stop as soon as the first instruction to be discarded by Decode1 is encountered or, in the case of the ILSWch instruction being invalidated in Ctrl3, upon arrival of the corresponding PCcol Null message from Ctrl3 (see section 9.6). As with the communication between AddC and Ctrl3 regarding the CPS, an extra communication link with buffering is required to inform AddC of colour mismatches in Decode1.
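A minimal occam sketch of this per-instruction reporting follows; the names (from.ipipe, colour.to.addc, operating.colour) are hypothetical, not occarm's own:

BOOL reporting:
SEQ
  reporting := TRUE                  -- a valid ILSWch instruction has been recognized
  WHILE reporting
    SEQ
      from.ipipe ? instruction
      IF
        instruction.colour = operating.colour
          colour.to.addc ! TRUE      -- instruction will enter the datapath
        TRUE
          SEQ
            colour.to.addc ! FALSE   -- first instruction to be discarded:
            reporting := FALSE       -- stop issuing messages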

9.5.2.2 Register Read Instructions

As explained in section 6.7, instructions attempting to read registers which are locked will block until the registers are written and their addresses removed from the corresponding lock fifos; the blocking of read instructions is achieved by means of the register bank's read lock logic which, in occarm, is modelled by the ReadLock process (see figure 6.13). While a read instruction is blocked and pending, the register bank will be unable to process any subsequent read operations; since every instruction which enters the datapath issues a read request to the register bank, instructions will backlog and the datapath will stall.


Figure 9.8: Informing AddC of Colour Mismatches at Decode1

Write data are sent to the register bank through the write bus control logic (WrtCtrl process). However, during the execution of an ILSWch instruction, WrtCtrl remains blocked until the Wlx message from AddC arrives; during this time, no further write data messages will be able to reach the register bank and unlock the corresponding registers. Thus, if the ILSWch instruction is followed by an instruction which attempts to read registers with pending writes, and the corresponding write data has not been forwarded to the register bank before WrtCtrl blocks, the datapath will stall, imposing a limit on the number of consecutive PC messages that AddC may process during the execution of the PDSP algorithm.

Write data originate either from the datapath or from memory. If the write instruction which locks the register(s) follows the ILSWch in the datapath, the read instruction will definitely block at the register bank, for the result of the write instruction (write data or read memory address) will not be able to pass through WrtCtrl.

If the write instruction precedes the ILSWch, however, and the write data originates from memory, then AddC can have no direct knowledge as to whether the data had been forwarded to the register bank before the blocking of WrtCtrl took place; data and instructions from memory follow different paths (via DataIn and IPipe respectively, see section 6.4) and the order in which the respective messages arrive at WrtCtrl is non-deterministic. This knowledge, however, is possessed by WrtCtrl. Indeed, WrtCtrl may keep a record of the number of data values from memory that have been forwarded to the register bank since the last ILSWch instruction. This number is sent to AddC with the ILSWch message, to be compared against the respective record, maintained by AddC, of the number of write operations preceding the current ILSWch instruction. Based on this information, AddC may decide which of the write operations have been completed and thus which of the registers remain locked.
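A sketch of this counting scheme in occam follows; the counter name and the message structure (a sequential protocol on the hypothetical to.addc channel) are assumptions for illustration:

INT writes.from.memory:
SEQ
  writes.from.memory := 0
  ...
  -- each time a data value from memory is forwarded to the register bank:
  writes.from.memory := writes.from.memory + 1
  ...
  -- when the next ILSWch address message is forwarded to AddC:
  SEQ
    to.addc ! address; writes.from.memory
    writes.from.memory := 0          -- restart the count for the next ILSWch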

9.5.2.3 Instructions Activating the ALUgo Signal

The ALUgo signal is issued by the Ctrl3 process to inform Decode1 of the validity of certain multicycle instructions4. It is activated by Ctrl3 in response to the request message generated by Decode1 during the first cycle of the instruction. Decode1 blocks until the ALUgo message arrives; if it is positive (i.e. the instruction is valid), the next cycle of the instruction commences, otherwise the instruction is discarded so that it does not lock any registers (see section 6.6.2).

If the multicycle instruction follows an ILSWch instruction, and the datapath stalls before the request message from Decode1 reaches Ctrl3, ALUgo will not be issued and thus Decode1 will remain blocked. Since AddC is informed of the state of the datapath at any particular moment, the decision as to whether the ALUgo signal is issued is straightforward.

4 These instructions are: LDM, STM, any data processing instruction with Rd=R15 and S bit set (user mode), SWI, and MSR that writes to the CPSR (bit 22 (Pd) = 0).

Instruction    no.of.pcs
??             0            R15 in PC loop
I_k            1            R15 in PC Pipe or Datapath
I_k+1          2
I_k+2          3
I_k+3          4            R15 / Instructions in the Datapath
...

Figure 9.9: The Instruction Lookahead Table

9.5.2.4 Load/Store Multiple Instructions

As described in sections 6.6.2 and 6.3.1.3, during the execution of LDM/STM instructions Decode1 interacts directly with AddC via the LsmTrm channel (see figures 6.8 and 6.4). During this interaction, which starts upon the arrival at Decode1 of the corresponding ALUgo message and lasts until the operation of the LSM loop in AddC is complete, Decode1 will remain blocked, not allowing any further instructions to enter the datapath. Here too, the blocking of the datapath is directly detectable by AddC based on the information possessed by this process, and thus no further action is required.

9.5.2.5 The Instruction Lookahead Table

The main structure maintained by arbiter processes to make use of PDSP is the Instruction Lookahead Table (ILT). This may be implemented as a circular buffer and contains all the information required by the arbiter process to make decisions regarding the potential arrival of messages on its input links.

In the case of AddC, a new entry is appended to the ILT each time a new instruction address (PC value) is let through to MAReg, to be forwarded to memory and the PC loop. Each entry of the ILT corresponds to an instruction and includes a field (no.of.pcs) that specifies the number of acknowledgement messages that have been received on the PCch channel after the corresponding instruction has been forwarded to MAReg (figure 9.9). For each new entry, no.of.pcs is initialized to 0; the existence of such an entry indicates that a message is expected on PCch. Each time a new PC value is let through to MAReg, the no.of.pcs field of all entries in the ILT is incremented by one.

The no.of.pcs field provides information regarding the relative position of instructions in the system. Indeed, since the size of the PC Pipe is two, if no.of.pcs >= 3, the instruction has definitely entered the datapath. If no.of.pcs = 1, the instruction may or may not be in the datapath (i.e. its R15 value may still be in the PC Pipe), depending on the number of preceding instructions. If no.of.pcs = 2, the instruction has entered Decode1 and the behaviour described in sections 9.5.2.3 - 9.5.2.4 is taken into account.

The Instruction Lookahead Table also enables AddC to predict future events on its Wch input link; an event is expected if the ILT contains an entry with an ILSWch instruction.
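As an illustration, a circular-buffer ILT might be declared and updated along the following lines; the names and the fixed capacity are assumptions of the sketch, not occarm's actual declarations (occam2 has no records, so parallel arrays are used):

VAL INT ilt.size IS 16:              -- assumed capacity
[ilt.size]INT ilt.instr:             -- instruction word of each entry
[ilt.size]INT ilt.pcs:               -- the no.of.pcs field; -1 marks a free slot
INT tail:
SEQ
  tail := 0
  ...
  -- each time a new PC value is let through to MAReg:
  SEQ
    SEQ i = 0 FOR ilt.size
      IF
        ilt.pcs[i] >= 0              -- increment no.of.pcs of every live entry
          ilt.pcs[i] := ilt.pcs[i] + 1
        TRUE
          SKIP
    ilt.instr[tail] := instruction   -- append the new entry
    ilt.pcs[tail] := 0               -- a new entry starts at zero
    tail := (tail + 1) \ ilt.size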

9.5.3 The Wch Link

Messages arriving on the Wch channel are primarily sent to AddC from the datapath (through WrtCtrl) as a result of the execution of ILSWch instructions, and carry either branch target or data transfer addresses. For data transfer operations whose destination register is R15, a second message will be sent over Wch, namely the new value of the Program Counter from memory.

According to the PDSP algorithm, when AddC detects an ILSWch instruction it blocks until it receives the corresponding messages on Wch. If, however, the ILSWch instruction will not be executed, or the memory fails to respond (i.e. an abort occurs), the expected messages will never be issued, thus leaving AddC blocked and causing the simulator to deadlock. There are two reasons why an instruction may not be executed, namely if its colour does not match that of the processor or if it fails its condition codes.

9.5.3.1 Colour Mismatch

All instructions whose execution may change the operating colour of the processor - either by explicitly writing a new value to the PC (i.e. the branch target address) or by causing an abort - belong to the Instruction Lookahead Set of the Wch channel. Thus, an ILSWch instruction will suffer a colour mismatch only if it follows another ILSWch which has changed the processor's colour. For instructions which explicitly change the PC, the new colour is provided to AddC with the branch target address. Since AddC is also informed of the colour of subsequent instructions entering the system, as well as of instructions which enter the datapath, the decision as to whether an ILSWch instruction will be discarded is straightforward.

If, however, the colour changes due to an abort, AddC has no direct knowledge regarding this change; Ctrl3 will receive the abort signal from memory and will change the PCcol, rejecting subsequent instructions. In this case a Null message must be sent by Ctrl3 to inform AddC of the occurrence of an abort and enable it to decide which of the following ILSWch instructions will be executed. If no such instruction exists, the Null message is simply ignored by AddC; however, it still needs to be issued, since Ctrl3 can have no knowledge regarding the existence of subsequent ILSWch instructions in the datapath or their possible rejection by Decode1 (see footnote 5).

Thus, for ILSWch instructions which may abort, AddC waits for two messages on the Wch link, namely the memory address and a message regarding the abort status; if R15 is included in the destination register list of the instruction, the second message expected by AddC will be either the new value of the PC arriving from memory or, in the case of an abort, a Null message from Ctrl3.
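In occam terms, the wait might be sketched as follows (the channel and message names are hypothetical):

-- AddC, having detected an ILSWch instruction which may abort:
SEQ
  Wch ? address.message              -- the memory address from the datapath
  Wch ? status.message               -- new PC from memory, or a Null from Ctrl3 on abort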

9.5.3.2 Condition Codes Failure

As described in section 9.5.2.1, when a valid ILSWch instruction which does not activate the abort signal is executed, AddC is provided by Ctrl3 with all the CPSR-related information required to predict the fate of subsequent valid instructions regarding their condition codes; this is performed via a dedicated path and lasts until a result is produced by Ctrl3 and forwarded to the RReg register. The generation of the result to the RReg register has a dual effect:

• Firstly, as explained in section 9.5.2.1, it initially blocks Ctrl3, preventing it from receiving any more instructions and thus eliminating the need to further inform AddC of possible changes in the CPSR. This role is performed for as long as RReg is unable to produce an acknowledgment message to Ctrl3.

• Secondly, it informs Ctrl3 that the last address message produced by this process has been selected and processed by AddC. Indeed, the acknowledgement message from RReg will not be issued until this register receives an acknowledgement from WrtCtrl; this in turn will be generated upon receipt of the Wlx signal.

5 The Null message could be sent by Ctrl3 upon the arrival of the first ILSWch instruction after the abort. If, however, no such instruction reached Ctrl3 (i.e. they are all discarded in Decode1), then the Null message would never be sent and AddC would remain blocked.

S1 = {TST, TEQ, CMP, CMN, MSR, MRS}
S2 = {I : the execution of I produces a result to RReg}
ILS1Wch = {I : I does not activate aborts and, if invalidated in Ctrl3, I still produces a message to AddC}
ILS2Wch = {I : no message is sent to AddC if I is invalidated in Ctrl3}

Figure 9.10: PDSP messages from Ctrl3 to AddC

Once the acknowledgement from RReg is received, no further CPSR-related messages are sent to AddC. Thus, for subsequent ILSWch instructions which fail their condition codes, Null messages are required to be sent by Ctrl3 to inform AddC of this event. These messages may be redundant when they refer to instructions of whose fate AddC has already been informed (i.e. instructions reaching Ctrl3 immediately after the receipt of the acknowledgement from RReg); however, Null messages are still required, since Ctrl3 can have no knowledge as to which these instructions might be. Once a Null message is sent to AddC, no more messages of this kind will be issued for ILSWch instructions subsequently invalidated in Ctrl3, until a valid instruction is executed.

This pattern will be followed until the next ILSWch to produce a result is encountered, whereupon the production of CPSR-related messages to AddC, as described in section 9.5.2.1, will commence. This is illustrated in figure 9.10, where a complete picture of the interaction between Ctrl3 and AddC is provided. It is interesting to note that when Null messages are produced by Ctrl3, the path leading to AddC via Wch is unblocked, and thus these messages will be forwarded without affecting the behaviour of the system, i.e. without filling registers that would otherwise be occupied by the system's messages.

9.6 The Primary Decode Arbiter

As described in section 6.6.1, arbitration is employed by the primary decode unit (Decode1) to detect the PCcol signal from the ALU. Accurate modelling of this arbiter in occarm would ensure that the number of instructions discarded as invalid by the Dec1CtrlA process is equal to the respective number in AMULET1. PCcol is activated each time the processor's colour changes as a result of instructions which modify R15 or abort. Thus the Instruction Lookahead Set of the PCcol channel is:

ILSPCcol={B, BL, SWI, LDR, STR, LDM, STM, Data Processing with PC as Dest. Reg.}

which is the same as ILSWch. As the Dec1CtrlA process constitutes the datapath's entry point, it has immediate knowledge of the instructions entering the system and consequently of the potential of the PCcol to be activated. Thus, the application of Rule 1 of the PDSP (section 9.3.2) is straightforward.

However, it is essential to ensure that the blocking of Dec1CtrlA does not affect the operation of the AddC process, to avoid deadlocks. This is due to the fact that ILSPCcol = ILSWch and the Dec1CtrlA, Ctrl3 and AddC processes are connected together in a closed loop (see figure 9.6); if Dec1CtrlA blocks while Ctrl3 is waiting to interact with AddC before it activates the PCcol signal, and AddC expects an acknowledgement from Dec1CtrlA (through the PC Pipe), PCcol will never be issued and occarm will deadlock.

For branches, software interrupts and data processing instructions, the PCcol message is issued by Ctrl3 in parallel with the new PC value (branch target address) sent to AddC. Thus, since there is no interdependency between the interactions of Ctrl3 with Dec1CtrlA and AddC, the blocking of Dec1CtrlA will not lead to deadlock. If the instruction fails its condition codes, a Null message is sent by Ctrl3 over the PCcol channel; Dec1CtrlA will not block until the next valid ILSPCcol instruction is encountered.

For instructions which cause a colour change only if they abort, however, PCcol will not be issued until after Ctrl3 has received the abort signals from memory. Thus, to avoid deadlocks, Dec1CtrlA should not block until the data address which is produced as a result of the ILSPCcol/Wch instruction has been processed by AddC and forwarded to memory.

This situation is illustrated in figure 9.11, where I1 is an ILSPCcol instruction which may abort (e.g. LDR). The execution of this instruction results in a message W_I1 being sent to AddC with timestamp t_W_I1. If Dec1CtrlA blocks waiting for PCcol, and no acknowledgement from the PC Pipe with timestamp t_Ack_k > t_W_I1 is encountered by AddC, the PC Pipe will stall and the system will deadlock. Since Dec1CtrlA has no immediate knowledge as to when the PC loop is interrupted (i.e. when an acknowledgement with timestamp t_Ack_k > t_W_I1 is encountered), a technique needs to be devised to provide it with this information.

Figure 9.11: The Decode1 Arbiter: PCcol due to Aborts

A possible approach would be to provide this information with the PC value forwarded by AddC immediately after the interrupt, i.e. with PC_k+1 (figure 9.11). This technique would not involve any extra communication overhead; however, it does not guarantee the prevention of preemptions. Indeed, if t_I_k > t_PCcol, then by the time Dec1CtrlA receives PC_k+1 a preemption will have already occurred. Thus, it is essential to provide Dec1CtrlA with information regarding the fate of Ack_k before the entry of I_k into the datapath takes place.

This may be achieved by means of an extra (buffered) link directly connecting AddC and Dec1CtrlA, as depicted in figure 9.12.

Figure 9.12: Informing Decode1 of the selected channel

Each time AddC selects the next address to be forwarded to memory, it issues a message to inform Dec1CtrlA of the selection's result; the production of messages commences as soon as AddC detects an ILSPCcol instruction and stops upon selection of the address message on the Wch channel. Similarly, for each instruction received after an ILSPCcol, Dec1CtrlA waits for the corresponding message from AddC before it decides whether it should let the instruction enter the datapath, or block and wait for PCcol (see the sketch below). It is important to note that the size of the instruction pipe (IPipe) guarantees that once a data address is forwarded by AddC to MAReg it will reach memory irrespective of the state of the datapath ([Pave94], pp. 128-131). Thus, blocking Dec1CtrlA has no effect on the behaviour of the system and, consequently, on the issuing of the abort signals by memory.
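A minimal occam sketch of Dec1CtrlA's side of this exchange follows; the names (from.ipipe, sel.info, pccol, wch.selected) are illustrative assumptions:

BOOL blocked:
SEQ
  ...                                -- an ILSPCcol instruction which may abort has entered
  blocked := FALSE
  WHILE NOT blocked
    SEQ
      from.ipipe ? instruction
      sel.info ? wch.selected        -- selection result from AddC (buffered link)
      IF
        wch.selected                 -- the data address has been forwarded to memory:
          SEQ
            pccol ? colour.msg       -- now safe to block and wait for PCcol
            blocked := TRUE
        TRUE
          ...                        -- let the instruction enter the datapath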

If the ILSPCcol instruction does not abort, a Null message is sent over the PCcol channel by Ctrl3. If the instruction fails its condition codes, Dec1CtrlA is informed by means of a Null message sent from AddC over the extra link. This is due to the fact that Dec1CtrlA will have to receive an unknown number of messages issued on the extra link before AddC becomes aware that the ILSPCcol instruction was discarded.

For load/store multiple instructions, no special messages are required to be sent from AddC, for no more instructions will enter the datapath until the load/store multiple operation is complete; the Dec1CtrlA process will block and wait for PCcol as soon as its interaction with the RdGen process completes (see section 6.6.2).

9.7 The Write Control Arbiter

Within the write control unit of AMULET1, arbitration is used to enable data values originating from either the datapath or the memory to gain access to the write bus (see section 6.9); in occarm, an ALT statement is employed to provide arbitration on the corresponding channels, namely DPch and DINch respectively (see figure 6.19). The Instruction Lookahead Sets of these channels are:

ILSDPch={B, BL, SWI, LDR, STR, LDM, STM, Data Processing}

ILSDINch={LDR, LDM}

9.7.1 The DINch Link

The DINch channel will fire as a result of a message having previously been sent to the WrtCtrl process over the DPch channel. Indeed, addresses of data to be loaded from memory are forwarded from the datapath (Ctrl3) to the address interface (AddC) via WrtCtrl. Thus the prediction as to whether a data value is expected from memory is straightforward. For an LDR instruction, a single message will be issued from memory. If the instruction aborts, a data value, albeit invalid, will still arrive on the DINch channel, so there is no need for extra Null messages. In the case of load multiple operations (LDM), however, more than one message will be issued from memory; the number of these messages will be equal to the number of registers to be loaded.

Figure 9.13: The Memory-WrtCtrl pipeline

According to the AMULET1 specification, during the first cycle of the LDM instruction, the Ctrl3 process outputs the base address and then blocks until the abort/no abort signal arrives from memory ([Pave94], pp. 86-90); hence, during this time no message will be sent over the DPch channel6, and thus WrtCtrl will continuously accept and process messages arriving on DINch. The abort/no abort signal is issued as soon as the last transfer address arrives at memory, whereupon Ctrl3 is free to proceed with its operation, executing further instructions and producing results on the DPch channel. Consequently, as soon as the abort/no abort signal is issued, the continuous processing of messages from DINch by WrtCtrl must stop, otherwise preemptions might occur. WrtCtrl should accept and process sufficient data messages to allow the last data address to reach memory. The pipeline between the memory and WrtCtrl consists of two registers, namely MRReg and DataIn (figure 9.13). Hence, if N is the number of registers to be loaded, then the last address will reach memory as soon as WrtCtrl processes the (N-2)th message from DINch; it is at this point that the abort signal will be activated to free Ctrl3. Any attempt to block WrtCtrl before the processing of the (N-2)th message has completed will result in deadlock, as the abort will never be issued, while processing the (N-1)th message before taking into account the DPch channel may result in a preemption.

6 Actually, if writeback is specified, one more message will be sent from Ctrl3 on the DPch channel.

PROC PDSP.WrtCtrl()
  SEQ
    ...                        --DINch.selected=TRUE
    IF
      instruction = LDM
        SEQ i = 1 FOR N-2
          SEQ
            DINch ? message
            --process and forward message
      instruction = LDR
        SEQ
          DINch ? message
    --continue normal PDSP operation
    ...
:

Figure 9.14: WrtCtrl: Reading data values from memory

The functionality of WrtCtrl with regard to the DINch link is depicted in figure 9.14.
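As a hypothetical worked example: for an LDM loading N = 4 registers, WrtCtrl accepts and processes N-2 = 2 messages from DINch; the remaining two transfers occupy the MRReg and DataIn registers, so the last transfer address has already reached memory and the abort/no abort signal can be issued before WrtCtrl turns its attention back to DPch.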

9.7.2 The DPch Link

As explained in the previous section, in AMULET1 there is a regular flow of messages arriving at the write control from the datapath, while messages from memory will be issued only occasionally, as a result of the former. Consequently, the WrtCtrl process will normally be ready to accept messages from the DPch channel. However, there is a possibility that such messages will never arrive, and as a result the system will deadlock. This may happen only if there are data messages from memory pending, and may be due to:

• The register bank being stalled as a result of an instruction that attempts to read a locked register which is to be written by one of the pending messages on the DINch link.

• The emptying of the datapath. This may occur if no more instructions enter the datapath due to the following:

1. The address interface stops issuing instruction addresses to memory. As soon as an address whose contents are to be written to R15 is issued to memory7, the operation of the PC loop stops until the corresponding data arrives from memory (through the write control) to become the new circulating value of the PC loop.

2. Instructions issued from memory are prevented from reaching the instruction pipe (IPipe). Both instructions and data values from memory enter the processor via the MRReg register. If the number of pending data values from memory is equal to the number of registers in the pipeline which connects the memory to WrtCtrl (i.e. two, namely MRReg and DataIn, see figure 9.13), then the following instructions will be unable to enter MRReg and, consequently, to reach Decode1.

In order to resolve the first of the aforementioned issues, namely the stalling of the register bank, the WrtCtrl process needs to have knowledge regarding the instructions which have entered the datapath. A possible technique to provide WrtCtrl with this information is depicted in figure 9.15. An extra buffered link (namely, the RBBch) has been introduced, whereby Decode1 (Dec1CtrlA) sends a message to WrtCtrl each time a new valid instruction enters the datapath. The functionality of WrtCtrl regarding the acceptance of messages from the datapath is depicted in figure 9.16. An ALT statement enables WrtCtrl to read messages arriving either from Decode1 (new instructions) or from Ctrl3 (produced as a result of the execution of ILSDPch instructions). If a message regarding the entry of a particular ILSDPch instruction into the datapath is received by WrtCtrl before the corresponding result from Ctrl3, then an entry is appended to the Instruction Lookahead Table of the DPch channel; otherwise it is simply ignored.

7 In the case of an LDM, the operation of the PC loop will stop upon issuing the base address to memory.

Figure 9.15: Bypassing the register bank

PROC PDSP.WrtCtrl()
  SEQ
    ...                              --DPch.expected=TRUE
    PRI ALT
      DPch ? result.message          --from Ctrl3
        SEQ
          --process and forward message
          --if corresponding entry exists in ILT, delete it
      RBB ? instruction              --from Decode1
        SEQ
          IF
            corresponding.message.from.DPch.already.processed
              SEQ
                --ignore instruction
            TRUE
              SEQ
                --append entry in ILT of DPch channel
    ...
:

Figure 9.16: WrtCtrl: Reading messages from the datapath


This scheme, whereby messages from Ctrl3 may be received before WrtCtrl is informed of their potential arrival, ensures a higher degree of parallelism in the model, as WrtCtrl will receive and process messages from Ctrl3 as soon as they become available, without having to block first for the corresponding messages from Decode1. The PRI ALT ensures that messages from Ctrl3, which are crucial as they form part of the operation of the simulated system, are treated with higher priority.

For ILSDPch instructions which are discarded in Ctrl3, a Null message is issued by this process to inform WrtCtrl of this event.

Regarding the emptying of the datapath due to the loading of R15 (case 1 above), WrtCtrl needs to be informed that no more instructions are going to enter the datapath before the corresponding message from the DINch channel is processed and forwarded to AddC. This may be achieved by having the AddC process issue a Null message to Dec1CtrlA immediately after the operation of the PC loop stops; this message is then forwarded directly to WrtCtrl by Dec1CtrlA. It is important to note that the Null message will reach Dec1CtrlA; after the PC loop interruption the PC Pipe will become empty, thus ensuring the propagation of the Null message.

This mechanism cannot be employed to deal with the blocking of MRReg (case 2 above), as AddC has no knowledge regarding the number of pending data values. Furthermore, the PC Pipe will be occupied by PC values pending their pairing with the corresponding instructions, which will not arrive until MRReg is freed; thus no information can reach Dec1CtrlA through this path. AddC, however, possesses a record (in the ILT) of all the instructions which have been sent to the instruction pipe immediately before the last data address was issued to memory. Indeed, the acknowledgement message which will be issued by MAReg in response to its receiving the data address will carry the last instruction which has been issued by memory; if the corresponding data value blocks MRReg, no more instructions will reach the instruction pipe.

Model             Elapsed Time (minutes)
OccarmALT         1.72
OccarmPDSP        2.15
OccarmPDSP-MLL    1.93

Benchmark: Dhrystone (1 loop)

Table 9.2: Performance of PDSP (Address Interface)

This information may be made available to the WrtCtrl process by means of a single message, namely the Wlx signal, which is issued as soon as the acknowledgement from MAReg is received. Based on this information, WrtCtrl is able to decide the maximum number of messages it should expect from the datapath. Not all the instructions included in the ILT of AddC will enter the datapath, however; some may be discarded as invalid at Dec1CtrlA. Therefore, in order to avoid deadlocks that might occur as a result of WrtCtrl waiting for discarded instructions, a Null message is sent by Dec1CtrlA upon receipt of PCcol to inform WrtCtrl of the point after which invalid instructions will be discarded.

9.8 Performance Evaluation of PDSP

In order to get an indication of the potential impact of PDSP on the performance of the simulator, the algorithms described in section 9.5 have been implemented and incorporated into occarm to develop an accurate model of the AddC process. The performance results obtained are illustrated in table 9.2. The application of PDSP to the AddC arbiter process of the occarm model, when no attempt is made to exploit the Minimum Latency Lookahead (i.e. using the algorithm of figure 9.1), has increased the time required by the model to execute one Dhrystone loop by 0.43 minutes (from 1.72 to 2.15 minutes), thus resulting in a 25% decrease in the performance of the simulator.

Some preliminary experiments have indicated that, by exploiting MLL, a performance improvement of at least 10% can be achieved, reducing the time required for the accurate execution of one Dhrystone loop by 0.22 minutes. Work is still to be done to implement and incorporate into occarm all the algorithms described in this chapter, so that a detailed performance evaluation and confirmation of the correctness of PDSP is possible.

9.9 Summary

This chapter has presented the Program Driven Synchronization Protocol, a novel approach for dealing with the causality problems introduced by the distributed nature of the proposed modelling philosophy. This is a conservative, deadlock avoidance approach whose objective is to exploit the ability of arbiter processes to predict events in the simulated future based on the instructions executed in the model. The feasibility and robustness of this approach has been demonstrated by applying its concepts to the occarm simulation model.

Chapter 10

Conclusions and Further Work

“Τί φῄς; Νεφέλης ἄρ’ ἄλλως εἴχομεν πόνους πέρι;” Εὐριπίδης, Ἑλένη. “What are you saying? It was only for a cloud that we struggled so much?” Euripides, Helen.

10.1 Background

Synchronous VLSI design is approaching a critical point, with clock distribution becoming an increasingly costly and complicated issue and power consumption rapidly emerging as a major concern. Asynchronous digital design styles promise to liberate VLSI systems from clock skew problems, offer the potential for low power and high performance, and encourage a modular design philosophy which makes incremental technological migration a much easier task. The desire to exploit the potential advantages offered by asynchronous logic has recently fueled a revival of interest in asynchronous systems. Micropipelines, one of several design techniques that have been proposed, offer a good engineering framework for the design of asynchronous systems. AMULET1, the first asynchronous Micropipelined implementation of a commercial RISC processor, has demonstrated the feasibility of employing asynchronous logic to build large and complex systems.

Modelling and simulation, being at the heart of digital system design, may perform a catalytic role in the quest for the realization of the potential offered by asynchronous logic. Hence, the recurrence of interest in asynchronous design has been accompanied by intense research activity aimed at developing notations and techniques appropriate for modelling and simulating asynchronous systems.

The concurrent, asynchronous, process-based model of computation of CSP, with the support for non-deterministic behaviour, and the point-to-point, synchronous and unbuffered inter-process communication, are particularly suitable for describing the concurrent, non-deterministic behaviour of asynchronous hardware systems and provide a natural and convenient means for the rapid construction of asynchronous hardware models. Hence, CSP and occam have long and extensively been advocated as potential notations for the description of asynchronous hardware, and various CSP-based and occam-like notations have already been employed for this purpose. However, all the work undertaken so far in this area has placed emphasis on producing specifications to be used as input to silicon compilers for the automatic synthesis of asynchronous circuits. As a consequence, the notations developed and the modelling approaches used have been intended to match the particular approach employed for the silicon compilation process.

However, if the benefits of using CSP for describing asynchronous systems are to be exploited and taken advantage of, it is essential to use a standard specification language that is easily and widely available. Occam may well serve this purpose: it is based on CSP, it is an executable programming language with well defined syntax and semantics, it is widely used and commercially supported, and it is expected to be supported by a wide range of hardware platforms [Ofal]. Furthermore, occam is a parallel language which may be executed on multiprocessor systems and thus has the potential for high performance.

The goal of the research presented in this thesis was to investigate the suitability of occam for the modelling and simulation of complex asynchronous hardware systems.

10.2 Contribution of the Thesis

The research presented in this thesis is the first to investigate the use of the standard occam language for the description of asynchronous, micropipelined architectures. There were a number of challenges presented by this endeavour:

• A modelling approach had to be developed, which would provide the framework for the construction of occam models.

• A number of important problems related to the execution of occam models needed to be resolved.

The research presented in this thesis has addressed all these issues.

10.2.1 Modelling

A factor which is crucial for the exploitation of the advantages offered by CSP is the availability of a consistent and generic modelling framework. This thesis has introduced such a framework. The framework exploits the connection between the semantics of occam and the behaviour of asynchronous hardware systems, as well as the parallelism inherent in asynchronous hardware, to facilitate the development of parallel architectural simulation models. Within this framework, the system is modelled as a network of concurrent, communicating, data-driven occam processes, with each process modelling either a register or a piece of control circuitry; a generic register model has been introduced for the first time, which accurately and faithfully represents the actions and parallel behaviour of event-driven registers. The suitability and robustness of the modelling framework introduced in this thesis has been demonstrated by building occarm, an occam model of the AMULET1 asynchronous microprocessor.

10.2.2 Simulation

An important issue, which the advocates of CSP and occam as modelling notations for asynchronous systems have largely overlooked and neglected, and which is addressed for the first time by the research presented in this thesis, is simulation, namely the execution of the occam model on a computer system. The research presented in this thesis has concluded that there are two major issues that need to be taken into account if an occam model is to be used for simulation:

• The exploitation of the relationship between CSP/occam and asynchronous hardware systems implies a data-driven operation of the processes of the model. The execution of a model consisting of data-driven processes which describe the modelled hardware at the Register Transfer (or higher) level is straightforward. This, however, is not true for models at lower levels of abstraction. One can certainly use CSP/occam to produce textual descriptions of gates and event processing blocks. However, the data-driven operation of an occam model whose concurrent processes model gates and event processing blocks with level sensitive inputs (i.e. Select) may lead to deadlocks or incorrect results. In this case, the simulated time is needed to act as the synchronization agent in the model; this results in a conventional, discrete event simulation activity where the relationship between CSP/occam and asynchronous hardware is no longer the basis of the operation of the model.

• Even at the Register Transfer, or higher, level, the exploitation of the relationship between CSP/occam and asynchronous hardware systems trades temporal accuracy for ease of modelling. Although the topological characteristics of the modelled hardware system map naturally onto the model, the temporal characteristics do not, a situation that may lead to causality errors in the model. This is a problem inherent in any decentralized, distributed simulation approach. In occam architectural models, causality errors are a consequence of adopting a data-driven approach to model arbiters, using the occam ALT construct. Although causality errors have no impact on the correct operation of the model, they introduce an error in the simulated time and, consequently, affect the accuracy of the evaluation of the simulated system. The research presented in this thesis has a dual contribution to this area:

– Firstly, it has attempted to quantify the inaccuracy introduced by causality errors, using the occam model of the AMULET1 microprocessor as a testbed.

– Secondly, it has introduced the Program Driven Synchronization Protocol, a novel synchronization protocol which attempts to eliminate causality errors within the proposed modelling framework.

In his PhD thesis, Brunvand mentions that occam-like “programs written to be translated into a circuit may just as well be compiled into object code and executed on ordinary computers. The system may then be simulated by running the compiled program like any other program” [Brun91]; the execution of an occam model on a computer system, however, is a much more complicated endeavour. The distributed nature of occam and the characteristics of the development tools associated with the language raise a number of issues that need to be resolved if the model is to serve as a simulation tool rather than just a simple textual description. These issues include debugging, monitoring and terminating the simulation; for distributed, multi-processor implementations, mapping and load balancing issues also need to be considered. Within the research presented in this thesis, a number of techniques and mechanisms have been developed which address all the aforementioned issues in the context of occam asynchronous architectural models.

10.3 The Program Driven Synchronization Protocol

Addressing the causality problems introduced by the distributed nature of the proposed modelling philosophy, this thesis has introduced the Program Driven Synchronization Protocol (PDSP). This is a general theoretical framework for the development of accurate arbiter processes which eliminate preemptions, while preserving the data-driven philosophy of the modelling approach.

PDSP is based on the conservative, deadlock avoidance approach. It exploits the characteristics of the simulated architectures to enable arbiter processes to predict their simulated future based on the instructions executed by the model; the term “Instruction Lookahead” has been introduced to refer to this concept. The exploitation of Instruction Lookahead ensures that Null messages are issued only when necessary, and are directed only to arbiter processes. This philosophy constitutes a significant departure from the conventional, Chandy/Misra/Bryant-based, conservative, deadlock avoidance protocols, where all processes in the model are required to handle and generate a regular flow of Null messages throughout the model.

The success of PDSP depends on the ability to exploit the “Instruction Lookahead” properties of the simulated architecture. The application of the PDSP to the occarm simulation model has demonstrated that such an exploitation is feasible, even for systems of the AMULET1's complexity. The work with occarm has demonstrated that the application of PDSP may require the development of appropriate algorithms to provide arbiter processes with the necessary knowledge and enable them to make decisions regarding the arrival of messages on their input channels. The design of such algorithms requires an extensive knowledge of the operation of the simulated architecture. This may be considered the major drawback of PDSP; however, this is a problem typical of the conservative framework and not specific to PDSP. The algorithms per se are simple to implement and their impact on the model's philosophy and complexity is minimal.

10.4 Performance

The simulation of digital systems in general, and computer architectures in particular, has long been categorized among the highly computation intensive applications. The same is true for the simulation of asynchronous digital systems. For the testing and evaluation of the AMULET1 design, for instance, more than 4 million instruction cycles were simulated [Pave94], a number which corresponds to many hours of simulation. Hence, a parallel approach to simulation, such as the one described in this thesis, could contribute significantly to reducing the duration and cost of the design cycle.

However, the performance achieved by the multi-processor implementation of occarm is far from satisfactory. This deficiency has been attributed to the particular characteristics of the testbed architecture (AMULET1) and the transputer technology used for the research described in this thesis. The asynchronous architectures currently under development are generally characterized by a higher degree of parallelism and more regular interconnection patterns than AMULET1. These characteristics, coupled with the communication efficiency promised by the T9000 transputer, will possibly allow the potential for high performance offered by the modelling approach presented in this thesis to be realized.

10.5 Occam as an Asynchronous Hardware Description Language

CSP-based parallel languages, with the static, process-based model of computation they support, provide a natural and convenient means for the expression of the behaviour and structure of asynchronous computer systems. Occam in particular, with its support for explicit control of concurrency even at the command level and its simple “send” and “receive” commands, is particularly suitable for describing digital systems. Occam can describe asynchronous control circuits at a level which is very close to their implementation; consequently, it may provide guidance for the realization of the design (e.g. an IF statement will correspond to a Select block, a PAR of input commands will be implemented using a Muller-C block, etc.). This characteristic may also be exploited for the automatic derivation of circuits from occam specifications, as suggested in the next section.

Furthermore, the parallel, distributed nature of occam forces the designer to think in “asynchronous terms” and to perceive the design as a distributed, asynchronous structure where a global state does not exist. The work with occarm has suggested that this may be the most important advantage of using occam for asynchronous architectural modelling. Indeed, the construction of occarm exposed thitherto unknown behaviour patterns of AMULET1 (e.g. the behaviour of the PCcol signal) whose operation was time-dependent rather than asynchronous.

On the negative side, occam lacks several data structures (and protocols, for that matter) that would be extremely useful in a distributed simulation endeavour (e.g. records). Furthermore, for the manipulation of hardware circuit models, efficient and easy means for treating numbers as booleans (and the reverse) are extremely important, an area where occam is weak. Another disadvantage of occam, as suggested by the programming effort of the research presented in this thesis, is its rigid and verbose layout format (reminiscent of FORTRAN) and the semantic significance of indentation, which make both the development and debugging of programs time-consuming and frustrating tasks. Occam2.1 and the new development system in support of the T9000 transputer alleviate some of the aforementioned deficiencies (e.g. records).

10.6 Further Work

The research presented in this thesis suggests a number of areas where further research and development work may be undertaken.

10.6.1 Modelling and Simulation

The work described in this thesis was based on a particular case example, namely the AMULET1 architecture. The ideas and techniques produced as a result of this work should be applied to different asynchronous designs so that their validity and generality may be established.

In particular, the feasibility and applicability of PDSP as a general synchronization protocol should be examined by employing it to devise preemption-free arbitration algorithms in other existing asynchronous architectures. Furthermore, the remaining algorithms described in chapter 9 should be implemented, to allow a more extensive performance evaluation of PDSP to be performed. Work in this direction has already commenced.

Tools for the postmortem analysis of the monitoring data would also be particularly desirable. The analysis could be directed towards the generation of graphical representations of the system's behaviour (e.g. waveforms) or the automatic detection of deadlocks based on the collected traces of parallelity events.

10.6.2 Automatic Synthesis

CSP-type specifications are being increasingly used for the automatic synthesis of digital circuits. Occam may also be used for this purpose. Already, subsets of occam and occam-like languages have been used for the automatic synthesis of both synchronous [Page91] and asynchronous [Brun91] circuits. A silicon compiler could be developed which would accept, as input, RTL asynchronous architectural occam models, built following the approach presented in this thesis, and automatically generate the equivalent gate-level circuit descriptions using syntax-directed translation. The output of the compiler could be a netlist, which could then be simulated (using a sequential, albeit time-accurate, discrete event simulation) or fed into a CAD tool to be implemented.

Appendix A

The ARM6 Programmer’s Model

This appendix provides a short description of the programming model of the ARM6 processor. A more detailed description of the processor and its programming may be found in [Furb89] [VLSI90]. The information presented in this appendix is based on that given in [Furb89] [Furb95] and [Pave94]; figures A.1 and A.2 have been kindly provided by N. Paver.

ARM6 is a 32-bit RISC processor. The data types supported by the processor are bytes (8 bits) and words (32 bits), which are transferred by means of a 32-bit data bus; memory addresses make use of a separate 32-bit address bus. ARM6 supports six modes of operation, namely USER (for normal program execution), FIQ and IRQ (for dealing with interrupt handling), SVC (protected supervisor mode for the operating system), ABT (for dealing with data and instruction prefetch aborts) and UND (entered when an undefined instruction is executed).

A.1 The Registers

The ARM6 processor has a total of 37 registers consisting of 31 general purpose 32-bit registers and 6 status registers.



Figure A.1: The ARM6 Register Organization

The 31 general purpose registers are organized into a set of partially overlapping banks, as depicted in figure A.1. At any time, 16 general purpose registers are visible to the programmer, namely R0 - R15. The visible registers depend on the processor mode, while the rest (referred to as banked registers) are switched in to support the IRQ, FIQ, SVC, ABT and UND modes. R15 contains both the program counter and the program status register, as illustrated in figure A.2. R14 is used as the subroutine link register and receives a copy of R15 when a Branch and Link instruction is executed. The status registers consist of the Current Processor Status Register (CPSR), which is visible in every mode, and a set of Saved Processor Status Registers (SPSRs), one for each of the non-user modes (figure A.3).

31 30 29 28 27 26 25 ........ 2 1 0
 N  Z  C  V  I  F  PROGRAM COUNTER (PC)  M1 M0

N: Negative/Signed Less Than;  Z: Zero;  C: Carry/Not Borrow/Rotate Extend;  V: Overflow
I: IRQ Disable (0 - Enable, 1 - Disable);  F: FIQ Disable (0 - Enable, 1 - Disable)
Bits 25-2: Program Counter (Word Aligned)
M1 M0: Processor Mode (00 - User, 01 - FIQ, 10 - IRQ, 11 - Supervisor)

Figure A.2: The ARM Program Counter and Program Status Word


Figure A.3: The ARM6 Program Status Registers

A.2 The Instruction Set

The ARM instruction set is based on the load/store model; no memory to memory operations are supported. Instructions are exactly one word, and data operations are performed on word quantities. An unusual feature of the ARM processor is that the entire instruction set is conditionally executable; the execution of an instruction may or may not take place depending on the value of the CPSR. The instruction formats used by ARM6 are illustrated in figure A.4. Seven classes of instructions may be distinguished.

Cond 00 I Opcd S Rn Rd Operand 2             Data Processing
Cond 000000 A S Rd Rn Rs 1001 Rm             Multiply
Cond 0001 xxxxxxxxxxxx 1xx1 xxx              Undefined
Cond 01 I P U B W L Rn Rd Offset             Single Data Transfer
Cond 011 xxxxxxxxxxxxxxxx 1 xxx              Undefined
Cond 100 P U S W L Rn Register list          Block Data Transfer
Cond 101 L offset                            Branch
Cond 110 P U N W L Rn CRd CP# offset         Co-Proc Data Transfer
Cond 1110 CPop CRn CRd CP# CP 0 CRm          Co-Proc Data Op
Cond 1110 CP L CRn Rd CP# CP 1 CRm           Co-Proc Register Transfer
Cond 1111 ignored by ARM                     Software Interrupt

Figure A.4: ARM Instruction Formats

The first class contains data processing instructions. These perform arithmetic or logical operations on one or two operands. The first operand is always a register (Rn), while the other operand may be a shifted register or a rotated 8-bit immediate value, depending on the value of the I bit in the instruction word. The status flags of the processor may be updated according to the result of the operation if the S bit has been set. Possible results are written to a destination register (Rd); a certain set of instructions do not produce any result: these are CMP (compare), CMN (compare negated), TST (bit test, equivalent to AND) and TEQ (test equal, equivalent to EOR). The remaining data processing instructions, which do produce a result, include the logical operations AND, EOR, ORR, BIC (bit clear), MOV (move), MVN (move not), and the arithmetic operations ADD, ADC (add with carry), SUB, SBC (subtract with carry), RSB (reverse subtract), RSC (reverse subtract with carry), MUL (multiply, result is the least significant 32 bits of a 32x32 product) and MLA (multiply accumulate, as MUL with the result initialized). All the ARM data operations (except the multiplies) allow an arbitrary shift to be applied to one of the operands before the operation is performed, so there are no separate shift instructions.

The second class of instructions are used to load (LDR) or store (STR) byte and word data values from and to memory. The memory address used in the transfer is calculated by adding to (if the U bit is set) or subtracting from a base register (Rn) an offset, which may be either a register (optionally shifted) or a 12-bit immediate value. The offset may be applied either before (pre-indexed, P bit set) or after (post-indexed) the base is used as the transfer address. The instruction also supports optional auto increment/decrement of the base register, depending on the value of the W bit.

The instructions of the third class specify transfers of blocks of data between memory and any subset of the currently visible registers; LDM specifies a multiple load and STM a multiple store operation. The registers to be transferred are specified by the programmer by means of a 16-bit register list field, with each bit corresponding to a register. As in the single data transfer operation, the transfer addresses are determined by the base register (Rn) and the pre/post (P) and increment/decrement (U) bits. The registers are transferred in a sequential manner, with the highest in the list transferred last (R15, if specified, will always be transferred last). Multiple register transfer instructions are intended to provide an efficient mechanism for saving or restoring context, or for moving large blocks of data around main memory.

The fourth class includes instructions which specify branches. The control transfer address is calculated by shifting an offset left by two bits and adding the result to the program counter, any overflow being ignored; the offset is a signed 2's complement 24-bit number. Thus, the branch can reach an address within +/- 32 Mbytes.
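As a quick arithmetic check of this range: the 24-bit signed offset spans +/- 2^23 words, and the two-bit left shift converts words to bytes, giving +/- 2^23 x 4 = +/- 2^25 bytes = +/- 32 Mbytes.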
A fifth instruction class contains the software interrupt instruction (SWI), which is used to enter supervisor mode in a controlled manner. The instruction causes the software interrupt trap to be taken, an operation that effects the mode change. The PC is forced to a fixed value (&08) and the CPSR is saved in SPSR_svc.

The sixth class includes the MSR and MRS instructions, which provide access to the CPSR and SPSR registers. MRS allows the contents of the CPSR or SPSR_mode to be moved to a general register, while MSR allows the contents of a general register to be moved to the CPSR or SPSR_mode register.

Finally, the instructions of the last class are used to support external coprocessors.

Appendix B

Modelling the Control Logic of AMULET1

This appendix illustrates the internal organization of the occarm processes that model the control circuitry of AMULET1. Emphasis is given to the use of the SEQ and PAR occam constructors in order to describe the partial orderings of events specified by the circuit. Five major processes have been selected to be presented, namely Dec1CtrlB (primary decode, see section 6.6.2), AddC (address interface, see section 6.3.2), Ctrl2 (execution unit, see section 6.8.2), Ctrl3 (execution unit, see section 6.8.3), and WrtCtrl (write bus control unit, see section 6.9). The circuit schematics presented in the appendix have been produced within the Compass Design Automation environment and have been kindly provided by Paul Day.
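Before turning to the individual processes, the basic idiom may be summarized. Where the circuit specifies that one event causes two causally unordered events, both of which must occur before a third, the model expresses this partial order by nesting a PAR within a SEQ. The following fragment is a minimal sketch of the idiom only; the channel and process names are illustrative and are not taken from occarm:

  PROC ctrl.sketch (CHAN OF INT in, out.a, out.b, out.c)
    INT token:
    WHILE TRUE
      SEQ
        in ? token        -- the triggering event
        PAR               -- out.a and out.b are causally unordered
          out.a ! token
          out.b ! token
        out.c ! token     -- out.c follows both
  :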


Figure B.1: Dec1CtrlB Control Circuit

SEQ
  P1
  P2
  IF
    TRUE
      SEQ
        PAR
          P3
          P4
        P5
        P6
        P7
        P8
        P9
    FALSE                        --LDM/STM
      SEQ
        WHILE done = FALSE
          SEQ
            P12
            ALT                  --deterministic
              LSMPr?
                SEQ
                  PAR
                    P4
                    P14
                  P13
              RdGA?
                SEQ
                  done := TRUE
                  P10
        P11
:

Figure B.2: The Dec1CtrlB Process
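The ALT constructor, which appears above in Dec1CtrlB (labelled deterministic) and again in AddC (Figure B.4, labelled ARBITER), waits for the first of several input events to become ready. A minimal sketch of the idiom, with illustrative names not taken from occarm:

  PROC merge.sketch (CHAN OF INT a, b, out)
    INT token:
    WHILE TRUE
      ALT
        a ? token         -- forward whichever input event
          out ! token     -- occurs first
        b ? token
          out ! token
  :

When at most one guard can be ready at a time the choice is deterministic; when several may be ready simultaneously, the arbitrary selection made by the ALT models hardware arbitration.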

Figure B.3: AddC Control Circuit

SEQ
  ALT                            --ARBITER
    PCr?
      P1                         --PC loop
    Wr?
      P2.1
  IF
    FALSE
      SEQ
        P2.2                     --data transfer
    TRUE
      SEQ
        P2.3
        IF
          TRUE
            SEQ
              WHILE Ntrm = FALSE -- LSM loop
                SEQ
                  P2.4
                  P2.5
          FALSE
            SEQ
              P2.6               --data transfer: Apipe
:

Figure B.4: The AddC Process

Figure B.5: Ctrl2 Control Circuit


SEQ
  P1
  PAR
    P2
    P3
    P4
  SEQ                            --P5
    P5.1
    IF
      TRUE
        SEQ
          P5.2
          IF
            TRUE
              SEQ
                PAR
                  P5.3
                  SEQ
                    P5.9         --Multiplier
                    P5.4         --Shifter
                PAR
                  P5.5
                  P5.6
            FALSE
              SKIP
      FALSE
        SEQ
          PAR
            P5.3
            SEQ
              P5.8
              P5.4               --Shifter
          PAR
            P5.5
            P5.6
    P5.7
:

Figure B.6: The Ctrl2 process

Figure B.7: Ctrl3 Control Circuit

SEQ
  P1                             --input and CPSR
  PAR
    P2
    SEQ                          --P3
      P3.1                       --ALU
      PAR
        P3.4
        P3.3
      SEQ
        P3.2
        P3.5
  P4
  P5
:

Figure B.8: The Ctrl3 process

Figure B.9: Write Control Circuit


ALT
  Dr?                            --from data interface
    SEQ
      P1
      IF
        TRUE
          SEQ
            P2
            IF
              TRUE
                PAR
                  P3
                  P9
              FALSE
                SKIP
        FALSE
          SEQ
            P4
      P7
  Wr?                            --from execution unit
    SEQ
      PAR
        SEQ
          P5
          IF
            TRUE
              PAR
                P3
                P9
            FALSE
              SKIP
        SEQ
          P6
          IF
            TRUE
              SEQ
                P4
            FALSE
              SKIP
      P8
:

Figure B.10: The WrtCtrl2 process

Bibliography

[AMD87] “Am29000 User’s Manual”, Advanced Micro Devices, 1987.

[AMUL] “The AMULET Group”, World Wide Web Home Page, URL: http://www.cs.man.ac.uk/amulet/index.html

[OMI94] “Report on the Integer Macrocell Revision”, ESPRIT Project 6909 - OMI/DE - ARM, Deliverable 7.4.2, University of Manchester, May 1994.

[Agra87] Agrawal, P., et al., “MARS: A Multiprocessor Based Multiprogrammable Accelerator”, IEEE Design and Test of Computers, 4, 10, October 1987, pp. 28-36.

[Ahuj86] Ahuja, S., Carriero, N., Gelernter, D., “Linda and Friends”, IEEE Computer, August 1986, pp. 26-34.

[Akel91] Akella, V., Gopalakrishnan, G., “Hopcp: A Concurrent Hardware Description Language”, Technical Report UUCS-TR-91-021, Department of Computer Science, University of Utah, 1991.

[Alli92] Allison, A., “Is RISC Really Doomed?”, Microprocessor Report, Issue 47, October 1992.

[Alma89] Almasi, G. S., Gottlieb, A. J., “Highly Parallel Computing”, The Benjamin/Cummings Publishing Company Inc., 1989.

[Alme94] Almeida, F. A., Welch, P. H., “A Parallel Emulator for a Multiprocessor Dataflow Machine”, Proceedings of the World Transputer Congress 1994, Como, September 1994, pp. 259-272.

[Alon93] Alonso, J. M., et al., “Conservative Parallel Discrete Event Simulation in a Transputer-Based Multicomputer”, Proceedings of the World Transputer Congress 1993, Aachen, September 1993, pp. 636-650.

[Aris] Aristotle, “Physics”, English Translation: Loeb Classical Library, Harvard University Press.

[Aris-1] Aristotle, “Metaphysics”, English Translation: Loeb Classical Library, Harvard University Press.


[Arms89] Armstrong, J. R., “Chip Level Modelling with VHDL”, Prentice Hall International, 1989.

[Akyi93] Akyildiz, I. F., et al., “The Effect of Memory Capacity on Time Warp Performance”, Journal of Parallel and Distributed Computing, 18, 4, August 1993, pp. 411-422.

[Amda67] Amdahl, G. M., “Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities”, Proceedings AFIPS 1967 Spring Joint Computer Conference, Atlantic City, April 1967, pp. 483-485.

[Asch77] Ashcroft, E. A., Wadge, W. W., “LUCID: A Nonprocedural Language with Iteration”, Communications of the ACM, 20, 7, July 1977, pp. 519-526.

[Aust79] Austin, J. H., “The Burroughs Scientific Processor”, Infotech State of the Art Report: Supercomputers, Vol. 2, 1979, pp. 1-31.

[Avra83] Abramovici, M., Levendel, Y. M., Menon, P. R., “A Logic Simulation Engine”, IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, 2, 2, April 1983.

[Ayan89] Ayani, R., “A Parallel Simulation Scheme based on the Distance Between Objects”, Proceedings of the 1989 SCS Multiconference on Distributed Simulation, SCS Simulation Series, March 1989, pp. 113-118.

[Ayan90] Ayani, R., Rajaei, H., “Parallel Simulation of a Generalized Cube Multistage Interconnection Networks”, Proceedings of the 1990 SCS Multiconference on Distributed Simulation, SCS Simulation Series, January 1990, pp. 60-63.

[Ayan92] Ayani, R., Rajaei, H., “Parallel Simulation using Conservative Time Windows”, Proceedings of the 1992 Winter Simulation Conference, December 1992, pp. 709-717.

[ARM] ARM Ltd, Fulbourn Road, Cherry Hinton, Cambridge, CB1 4JN, England.

[Baba93] Babaoglu, O., Marzullo, K., “Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms”, Technical Report UBLCS-93-1, Laboratory for Computer Science, University of Bologna, January 1993.

[Bacc93] Baccelli, F., Canales, M., “Parallel Simulation of Stochastic Petri Nets Using Recurrence Equations”, ACM Transactions on Modeling and Computer Simulation, 3, 1, January 1993, pp. 20-41.

[Back78] Backus, J., “Can Programming be Liberated from the von Neumann Style? A Functional Style and its Algebra of Programs”, Communications of the ACM, 21, 8, August 1978, pp. 613-641.

[Bagr90] Bagrodia, R. L., Liao, W. T., “Parallel Simulation of the Sharks World Problem”, Proceedings of the 1990 Winter Simulation Conference, December 1990, pp. 191-198.

[Baik85] Baik, D., Zeigler, B. P., “Performance Evaluation of Hierarchical Distributed Simulators”, Proceedings of the 1985 Winter Simulation Conference, December 1985, pp. 421-427.

[Bail91] Bailey, “Measuring the Overhead in Conservative Parallel Simulations of Multicomputer Programs”, Proceedings of the 1990 Winter Simulation Conference, December 1990, pp. 627-636.

[Bain88] Bain, W. L., Scott, D. S., “An Algorithm for Time Synchronization in Distributed Discrete Event Simulation”, Proceedings of the 1988 SCS Multiconference on Distributed Simulation, SCS Simulation Series, July 1988, pp. 30-33.

[Balc94] Balci, O., “Validation, Verification and Testing Techniques Throughout the Life Cycle of a Simulation Study”, Proceedings of the 1994 Winter Simulation Conference, December 1994, pp. 215-220.

[Balc94a] Balci, O., “Principles of Simulation Model Validation, Verification and Testing”, Technical Report TR-94-24, Department of Computer Science, Virginia Tech, Virginia, USA, 1994.

[Balm90] Balmer, D. W., Paul, R. J., “Integrated Support Environments for Simulation Modelling”, Proceedings of the 1990 Winter Simulation Conference, December 1990, pp. 243-249.

[Bank86] Banks, J., Carson, J. S., “Introduction to Discrete Event Simulation”, Proceedings of the 1986 Winter Simulation Conference, December 1986, pp. 17-23.

[Bank92] Banks, J., “Simulation Languages and Simulator Environments”, Proceedings of the 1992 Winter Simulation Conference, December 1992, pp. 88-96.

[Barn89] Barnes, J. G. P., “Programming in Ada”, Addison Wesley, 1989.

[Bart89] Barton, R. R., Schruben, L. W., “Graphical Methods for the Design and Analysis of Simulation Experiments”, Proceedings of the 1989 Winter Simulation Conference, December 1989, pp. 51-61.

[Bast94] Basten, T., et al., “Time and the Order of Abstract Events in Distributed Computations”, Technical Report TR-94-01, University of Waterloo, Canada, February 1994.

[Batc80] Batcher, K. E., “The Design of a Massively Parallel Processor”, IEEE Transactions on Computers, 29, 9, September 1980, pp. 836-840.

[Bell90] Bellenot, S., “Global Virtual Time Algorithms”, Proceedings of the 1990 SCS Multiconference on Distributed Simulation, SCS Simulation Series, January 1990, pp. 122-127.

[Bell92] Bellenot, S., “State Skipping Performance with the Time Warp Operating System”, Proceedings of the 6th Workshop on Parallel and Distributed Simulation (PADS92), SCS, January 1992, pp. 53-64.

[Bemm90] Bemmerl, T., et al., “TOPSYS: Tools for Parallel Systems”, Technical Report TUM-I9047, SFB-Bericht Nr. 342/25/90 A, Institut für Informatik der Technischen Universität München, January 1990.

[Berr85] Berry, O., Jefferson, D., “Critical Path Analysis of Distributed Simulation”, Proceedings of the SCS Distributed Simulation Conference, SCS Simulation Series, 1985, pp. 57-60.

[Bile85] Biles, W. E., Daniels, D. M., O’Donnel, T. J., “Statistical Considerations in Simulation on a Network of Microcomputers”, Proceedings of the 1985 Winter Simulation Conference, December 1985, pp. 388-393.

[Birt94] Birtwistle, G., Private Communication, 1994.

[Birt94a] Birtwistle, G., Liu, Y., “Specification of the Manchester AMULET1: Top Level Specification”, Technical Report, Computer Science Department, University of Calgary, December 1994.

[Birt95] “Asynchronous Digital Circuit Design”, Editors Birtwistle, G., Davis, A., Springer Verlag, 1995.

[Blan84] Blank, T., “A Survey of Hardware Accelerators Used in Computer Aided Design”, IEEE Design and Test of Computers, 1, 8, August 1984, pp. 21-39.

[Böhm91] Böhm, A. P. W., et al., “SISAL 2.0 Reference Manual”, Technical Report CS-91-118, Department of Computer Science, Colorado State University, 1991.

[Boil87] Boillat, J. E., et al., “An Analysis and Reconfiguration Tool for Mapping Parallel Programs onto Transputer Networks”, Proceedings of the 7th Occam User Group, September 1987, pp. 186-194.

[Bott86] Bottomley, R., “The Meiko Computing Surface: A Configurable ”, IEE Colloquium: The Transputer: Applications and Case Studies, IEE Digest, 1986/91, 23rd May 1986.

[Bouk94] Boukerche, A., Tropper, C., “A Static Partitioning and Mapping Algorithm for Conservative Parallel Simulations”, Proceedings of the 8th Workshop on Parallel and Distributed Simulation (PADS94), SCS, July 1994, pp. 164-172.

[Brin91] Briner, J. Jr., “Fast Parallel Simulation of Digital Systems”, SCS Advances in Parallel and Distributed Simulation, 23, January 1991, pp. 71-77.

[Broz89] Brzozowski, J. A., Ebergen, J. C., “Recent Developments in the Design of Asynchronous Circuits”, Research Report CS-89-18, Computer Science Department, University of Waterloo, May 1989.

[Broz95] Brzozowski, J. A., Seger, C-J. H., “Asynchronous Circuits”, Springer Verlag, 1995.

[Brun89] Brunvand, E., Sproull, R. F., “Translating Concurrent Programs into Delay-Insensitive Circuits”, Proceedings of ICCAD, 1989, pp. 262-265.

[Brun91] Brunvand, E., “Translating Concurrent Communicating Programs into Asynchronous Circuits”, Ph.D Thesis, Carnegie Mellon University, 1991.

[Brun91a] Brunvand, E., Starkey, M., “An Integrated Environment for the Design and Simulation of Self Timed Systems”, Proceedings of VLSI 1991, August 1991. pp. 4a.2.1-4a.3.1.

[Brun93] Brunvand, E., “The NSR Processor”, Proceedings of the 26th Hawaii International Conference on System Sciences (HICSS 1993), January 1993, pp. 428-435.

[Brya77] Bryant, R. E., “Simulation of Packet Communication Architecture Computer Systems”, Technical Report MIT-LCS-TR-188, Laboratory for Computer Science, Massachusetts Institute of Technology, USA, 1977.

[Brze93] Brzezinski, J., Helary, J. M., Raynal, M., “Distributed Termination Detection: General Model and Algorithms”, Technical Report 1964, INRIA, March 1993.

[Bund86] Bunday, B. D., “Basic Queueing Theory”, Edward Arnold, 1986.

[Burn87] Burns, S. M., Martin, A. J., “Synthesis of Self-Timed Circuits by Program Transformations”, Technical Report 5253:TR:87, Computer Science Department, Caltech, 1987.

[Burn88] Burns, A., “Programming in Occam 2”, Addison Wesley, 1988.

[Cai90] Cai, W., Turner, S. J., “An Algorithm for Discrete Event Simulation: The Carrier Null Message Approach”, Proceedings of the 1990 SCS Multiconference on Distributed Simulation, SCS Simulation Series, January 1990, pp. 3-8.

[Cai90a] Cai, W., Turner, S. J., “Experimental Studies of Conservative Distributed Discrete Event Simulation on Transputer Networks”, Proceedings of the 12th Occam User Group Technical Meeting, IOS Press, 1990, pp. 138-147.

[Capo86] Capon, P.C., Gurd, J. R., Knowles, A. E., “ParSiFal: A Parallel Simulation Facility”, IEE Colloquium: The Transputer: Applications and Case Studies, IEE Digest, 1986/91, 23rd May 1986.

[Cars89] Carson, J. S., “Verification and Validation: A Consultant’s Perspective”, Proceedings of the 1989 Winter Simulation Conference, December 1989, pp. 552-558.

[Cars92] Carson, J. S., “Modelling”. Proceedings of the 1992 Winter Simulation Conference, December 1992, pp. 82-87.

[Chan79] Chandy, K. M., Holmes, V., Misra, J., “Distributed Simulation of Networks”, Computer Networks, 3, 1, January 1979, pp. 105-113.

[Chan79a] Chandy, K. M., Misra, J., “Distributed Simulation: A Case Study in the Design and Verification of Distributed Programs”, IEEE Transactions on Software Engineering, 5, 5, September 1979, pp. 440- 452.

[Chan79b] Chandy, K. M., Misra, J., “Deadlock Absence Proofs for Networks of Communicating Processes”, Information Processing Letters, 9, 4, November 1979, pp. 185-189.

[Chan81] Chandy, K. M., Misra, J., “Asynchronous Distributed Simulation via a Sequence of Parallel Computations”, Communications of the ACM, 24, 11, November 1981, pp. 198-205.

[Chan85] Chandy, K. M., Lamport, L., “Distributed Snapshots: Determining Global States of Distributed Systems”, ACM Transactions on Computer Systems, 3, 1, February 1985, pp. 63-75.

[Chan89] Chandy, K. M., Sherman, R., “The Conditional Event Approach to Distributed Simulation”, Proceedings of the 1989 SCS Multiconference on Distributed Simulation, SCS Simulation Series, March 1989, pp. 93-99.

[Chiu94] Chiu, P. P. K., Ku, K. M., “Simulation of Logic Circuits Using a Network of Transputers”, Proceedings of the World Transputer Congress 1994, Como, September 1994, pp. 273-284.

[Chri89] Cristian, F., “Probabilistic Clock Synchronization”, Distributed Computing, 3, 3, March 1989, pp. 146-158.

[Chri82] Christopher, T., et al., “Structure of a Distributed Simulation System”, IEEE COMPSAC 1982, IEEE Computer Society Press, pp. 548-589.

[Chu85] Chu, T. A., Leung, C. K. C., Wanuga, T. S., “A Design Methodology for Concurrent VLSI systems”, Proceedings of ICCD 1985, 1985, pp. 407-410.

[Chu86] Chu, T. A., “On the Models for Designing VLSI Asynchronous Digital Systems”, INTEGRATION, the VLSI Journal, 4, 1986, pp. 99-113.

[Chu86a] Chu, T. A., Glasser, L. A., “Synthesis of Self-timed Control Circuits from Graphs: An Example”, Proceedings of ICCD 1986, 1986, pp. 565-571.

[Chu87] Chu, T. A., “Synthesis of Self-timed VLSI Circuits from Graph-Theoretic Specifications”, Ph.D Thesis (MIT/LCS/TR-393), M.I.T., June 1987.

[Chun91] Chung, M., Chung, Y., “An Experimental Analysis of Simulation Clock Advancement in Parallel Logic Simulation on an SIMD Machine”, SCS Advances in Parallel and Distributed Simulation, 23, January 1991, pp. 125-132.

[Clar67] Clark, W. A., “Macromodular Computer Systems”, Proceedings of the 1967 AFIPS Spring Joint Computing Conference, April 1967, pp. 335-336.

[Clar74] Clark, W. A., Molnar, C. E., “Macromodular Computer Systems”, Chapter 3, in “Computers in Biomedical Research”, Editors Stacy, R. W., Waxman, B. D., Academic Press, 1974.

[Clea94] Cleary, J., et al., “Cost of State Saving and Rollback”, Proceedings of the 8th Workshop on Parallel and Distributed Simulation (PADS94), SCS, July 1994, pp. 94-101.

[Cloc81] Clocksin, W. F., Mellish, C. S., “Programming in Prolog”, Springer Verlag, 1981.

[Coat93] Coates, B., Davis, A., Stevens, K. S., “Automatic Synthesis of Fast Compact Self-timed Control Circuits”, Proceedings of the IFIP Working Conference on Asynchronous Design Methodologies, Manchester, England, 1993.

[Comf84] Comfort, J. C., “The Simulation of a Master-Slave Event Set Processor”, Simulation, 42, 3, March 1984, pp. 117-124.

[Comf88] Comfort, J. C., Gopal, P. R., “Environment Partitioned Distributed Simulation with Transputers”, Proceedings of the 1988 SCS Multiconference on Distributed Simulation, SCS Simulation Series, July 1988, pp. 103-108.

[Conc89] Concepcion, A. I., “A Hierarchical Computer Architecture for Distributed Simulation”, IEEE Transactions on Computers, 38, 2, February 1989, pp. 311-319.

[Cota90] Cota, B. A., Sargent, R. G., “A Framework for Automatic Lookahead Computation in Conservative Distributed Simulations”, Proceedings of the 1990 SCS Multiconference on Distributed Simulation, SCS Simulation Series, January 1990, pp. 56-59.

[Davi89] David, I., Ginosar, R., Yoeli, M., “Self-Timed Implementation of a Reduced Instruction Set Computer”, Technical Report 732, Technion - Israel Institute of Technology, October 1989.

[Davi88] Davis, C. K., Sheppard, S. V., Lively, W. M., “Automatic Development of Parallel Simulation Models in Ada”, Proceedings of the 1988 Simulation Conference, December 1988, pp. 339-343.

[Davi95] Davis, A., Nowick, S. M., “Asynchronous Circuit Design: Motivation, Background and Methods”, In [Birt95], pp. 1-49.

[Davi95a] Davis, A., Nowick, S. M., “Synthesizing Asynchronous Circuits: Practice and Experience”, In [Birt95], pp. 104-150.

[Day92] Day, P., “A Micropipelined Multiplier”, Proceedings of the ACiD-WG/EXACT Workshop on Asynchronous Data Processing, Veldhoven, The Netherlands, December 1992.

[Day95] Day, P., “Investigations into Micropipeline Latch Design Styles”, to be published in IEEE Transactions on VLSI.

[DeBe88] DeBenedictis, E. P., Ackland, B. D., “Circuit Simulation on a Hypercube”, Proceedings of the 1988 SCS Multiconference on Distributed Simulation, SCS Simulation Series, July 1988, pp. 89-93.

[DeBe91] DeBenedictis, E. P., Ghosh, S., Yu, M. L., “A Novel Algorithm for Discrete Event Simulation”, Computer, 24, 6, June 1991, pp. 21-33.

[DeRe76] DeRemer, F., Kron, H. H., “Programming-in-the-Large versus Programming-in-the-Small”, IEEE Transactions on Software Engineering, 2, 2, February 1976, pp. 114-121.

[Dean92] Dean, M. E., “STRiP: A Self-Timed RISC Processor Architecture”, Ph.D Thesis, Stanford University, 1992.

[Derr89] Derrick, E. J., Balci, O., Nance, R. E., “A Comparison of Selected Conceptual Frameworks for Simulation Modelling”, Proceedings of the 1989 Winter Simulation Conference, December 1989, pp. 711-718.

[DeVr90] DeVries, R. C., “Reducing Null Messages in Misra’s Distributed Discrete Event Simulation Method”, IEEE Transactions on Software Engineering, 16, 1, January 1990, pp. 82-91.

[Dick90] Dickens, P. M., Reynolds, P. F., “SRADS With Local Rollback”, Proceedings of the 1990 SCS Multiconference on Distributed Simulation, SCS Simulation Series, January 1990, pp. 161-164.

[Dijk68] Dijkstra, E.W., “The Structure of THE Multiprogramming System”, Communications of the ACM, 11, 5, May 1968, pp. 341-346.

[Dijk75] Dijkstra, E.W., “Guarded Commands, Nondeterminacy and Formal Derivation of Programs”, Communications of the ACM, 18, 8, August 1975, pp. 453-457.

[Dijk80] Dijkstra, E. W., Scholten, C. S., “Termination Detection for Diffusing Computations”, Information Processing Letters, 11, 1, August, 1980.

[Dill89] Dill, D. L., “ACM Distinguished Dissertations: Trace Theory for Automatic Hierarchical Verification of Speed-Independent Circuits”, MIT Press, 1989.

[Ditz87] Ditzel, D. R., McLellan, H. R., Berenbaum, A. D., “The Hardware Architecture of the CRISP Microprocessor”, Proceedings of the 14th Annual Symposium on Computer Architecture, 1987, pp. 309-319.

[Djan89] Djanhaguir, A. H., Geffroy, J. C., “Use of Occam for Validation of Distributed Event Driven Simulation”, Proceedings of the 10th Occam User Group Technical Meeting, Enschede, April 1989, pp. 213-221.

[Dobb92] Dobberpuhl, D., et al., “A 200 MHz 64b Dual-Issue CMOS Microprocessor”, IEEE Journal of Solid-State Circuits, 27, 11, November 1992, pp. 1555-1565.

[Dobb88] Dobbs, C., Reed, P., Ng, T., “Supercomputing on a Chip: Genesis of the 88000”, VLSI System Design, May 1988, pp. 24-33.

[Dows85] Dowsing, R. D., “Simulating Hardware Structures in occam”, Software and Microsystems, 4, 4, August 1985, pp. 77-84.

[Dows88] Dowsing, R. D., “Introduction to Concurrency Using Occam”, Van Nostrand Reinhold, 1988.

[Duga35] Dugas, R., “A History of Mechanics”, English Translation of the Original 1935 French Edition, Central Book Company, New York, 1955.

[Dunc90] Duncan, R., “A Survey of Parallel Computer Architectures”, Computer, February 1990, pp. 5-16.

[Duni91] Dunigan, T. H., “Hypercube Clock Synchronization”, Technical Report TM-11744, Oak Ridge National Laboratory, February 1991.

[Dynk82] Dynkin, E. B., “Markov Processes and Related Problems of Analysis”, Cambridge University Press, 1982.

[Eber91] Ebergen, J. C., “A Formal Approach to Designing Delay-Insensitive Circuits”, Distributed Computing, 5, 3, March 1991, pp. 107-119.

[Ebli89] Ebling, M., et al., “An Ant Foraging Model Implemented on the Time Warp Operating System”, Proceedings of the 1989 SCS Multiconference on Distributed Simulation, SCS Simulation Series, March 1989, pp. 151-163.

[Eick91] Eick, S. G., et al., “Synchronous Relaxation for Parallel Simulations with Applications to Circuit-Switched Networks”, Proceedings of the 5th Workshop on Parallel and Distributed Simulation (PADS91), SCS, January 1991, pp. 151-163.

[Eshr89] Eshraghian, K., Weste, N., “Principles of CMOS VLSI Design: A Systems Perspective”, Addison Wesley, 1989.

[FDR93] FDR User Manual (available via anonymous ftp at ftp.comlab.ox.ac.uk), Formal Systems Europe, 3 Alfred Street, Oxford, 1993.

[Feld79] Feldman, J. A., “High Level Programming for Distributed Computing”, Communications of the ACM, 22, 6, June 1979, pp. 353-368.

[Fers93] Ferscha, A., Chiola, G., “Distributed Simulation of Timed Petri Nets: Exploiting the Net Structure to Obtain Efficiency”, Proceedings of the 14th International Conference on Application and Theory of Petri Nets, Lecture Notes in Computer Science, 691, 1993, pp. 146-165.

[Fers94] Ferscha, A., Tripathi, S. K., “Parallel and Distributed Simulation of Discrete Event Systems”, Technical Report CS.TR.3336, University of Maryland, August 1994.

[Fidg88] Fidge, C. J., “Timestamps in Message Passing Systems that Preserve the Partial Ordering”, In Proceedings of the 11th Australian Computer Science Conference, Brisbane, Australia, February 1988, pp. 55-66.

[Fidg91] Fidge, C. J., “Logical Time in Distributed Computing Systems”, IEEE Computer, 24, 8, August 1991, pp. 28-33.

[Fish92] Fishwick, P. A., “Integrating Modeling in Simulation, Software Engi- neering and AI”, in [Radi92], pp. 773-774.

[Flec92] Fleckenstein, C. J., et al., “Multiprocessing”, in [Yovi92], pp. 255-324.

[Flyn72] Flynn, M., “Some Computer Organizations and their Effectiveness”, IEEE Transactions on Computers, 21, 9, September 1972, pp. 948-960.

[Fox89] Fox, G. C., “Parallel Computing Comes of Age: Supercomputer Level Parallel Computations at Caltech”, Concurrency: Practice and Experience, 1, 1, John Wiley, September 1989, pp. 63-104.

[Franc80] Francez, N., “Distributed Termination”, ACM TOPLAS, 2, 1, 1980, pp. 42-55.

[Fuji88] Fujimoto R., “Applications: Distributed Simulation”, in [Reed88], Chapter 6, pp. 240-267.

[Fuji88a] Fujimoto R., Tsai, J., Gopalakrishnan, G., “Design and Performance of Special Purpose Hardware for Time Warp”, Proceedings of the 15th Annual Symposium on Computer Architecture, June 1988, pp. 401-408.

[Fuji89] Fujimoto R., “Performance measurements of Distributed Simulation Strategies”, Transactions of the Society for Computer Simulation, 6, 2, April 1989, pp. 88-132.

[Fuji89a] Fujimoto R., “Time Warp on a Shared Memory Multiprocessor”, Transactions of the Society for Computer Simulation, 6, 3, July 1989, pp. 211-239.

[Fuji89b] Fujimoto R., “The Virtual Time Machine”, Proceedings of the 1989 International Symposium on Parallel Algorithms and Architectures, June 1989, pp. 199-208.

[Fuji90] Fujimoto R., “Parallel Discrete Event Simulation”, Communications of the ACM, 33, 10, October 1990, pp. 31-53.

[Fuji92] Fujimoto R., Nicol, D., “State of the Art in Computer Simulation”, Proceedings of the 1992 Winter Simulation Conference, December 1992, pp. 246-254.

[Fuji93] Fujimoto R., “Parallel and Distributed Discrete Event Simulation: Algorithms and Applications”, Proceedings of the 1993 Winter Simulation Conference, December 1993, pp. 106-114.

[Fuji87] “MB86900 High Performance 32-bit RISC SPARC Data Sheet”, Fujitsu Microelectronics Inc., 1987.

[Furb89] Furber, S. B., “VLSI RISC Architecture and Organization”, Marcel Dekker, Inc., 1989.

[Furb92] Furber, S. B., “Micropipelines - A Case Study”, Proceedings of the ACiD-WG/EXACT Workshop on Asynchronous Data Processing, Veldhoven, The Netherlands, December 1992.

[Furb93] Furber, S. B., “AMULET1 - An Asynchronous ARM Processor”, Symposium Record of Hot Chips V, Stanford University, August 1993.

[Furb93a] Furber, S. B., et al., “A Micropipelined ARM”, Proceedings of VLSI ‘93 (Best Paper Award), September 1993, pp. 5.4.1-5.5.8.

[Furb94] Furber, S. B., “AMULET1: A Micropipelined ARM”, IEEE CompCon ‘94, Invited Paper, March 1994.

[Furb94a] Furber, S. B., “The Design and Evaluation of an Asynchronous Microprocessor”, Proceedings of ICCD 1994, October 1994, pp. 217-220.

[Furb94b] Furber, S. B., Talk at the AMULET Modelling Workshop, Windermere, Cumbria, England, July 1994.

[Furb94c] Furber, S. B., Private Communication.

[Furb95] Furber, S. B., “Computing Without Clocks”, In [Birt95], pp. 211-262.

[Gafn85] Gafni, A., “Space Management and Cancellation Mechanisms for Time Warp”, Technical Report, TR-85-3-41, Department of Computer Science, University of Southern California, 1985.

[Gafn88] Gafni, A., “Rollback Mechanisms for Optimistic Distributed Simulation Systems”, Proceedings of the 1988 SCS Multiconference on Distributed Simulation, SCS Simulation Series, July 1988, pp. 61-67.

[Gajs83] Gajski, D. et al., “Cedar - A large scale multiprocessor”, IEEE Proceedings of the 1983 International Conference on Parallel Processing, 1983, pp. 524-529.

[Gall90] Galletly, J., “Occam2”, Pitman, 1990.

[Garc84] Garcia-Molina, H., Germano, F., Kohler, W. H., “Debugging a Distributed Computing System”, IEEE Transactions on Software Engineering, 10, 2, February 1984, pp. 210-219.

[Gars89] Garside J. D., “T-Rack Switch Card”, ParSiFal Internal Document PSF/MU/WP5/89/3, Department of Computer Science, University of Manchester, February 1989.

[Gars92] Garside, J. D., “Micropipeline Structures”, Proceedings of the ACiD-WG/EXACT Workshop on Asynchronous Data Processing, Veldhoven, The Netherlands, December 1992.

[Gars93] Garside J. D., “A CMOS VLSI Implementation of an Asynchronous ALU”, Proceedings of the IFIP Working Conference on Asynchronous Design Methodologies, Manchester, England, 1993.

[Gill62] Gill, A., “Introduction to the Theory of Finite State Machines”, McGraw Hill, 1962.

[Gilm86] Gilmer, J. B., Hong, J. P., “Replicated State Space Approach for Parallel Simulation”, Proceedings of the 1986 Winter Simulation Conference, December 1986, pp. 430-433.

[Gino90] Ginosar, R., Michell, N., “On the Potential of Asynchronous Pipelined Processors”, Technical Report UUCS-90-015, VLSI Systems Research Group, University of Utah, 1990.

[Glas84] Glaser, H., Hankin, C., Till, D., “Principles of Functional Programming”, Prentice Hall International, 1984.

[Glaz92] Glazer, D. W., “Load Balancing Parallel Discrete Event Simulations”, Ph.D. Thesis, Department of Computer Science, McGill University, 1992.

[Gold88] Goldsmith, M., Jones, G., “Programming in occam 2”, Prentice Hall International, 1988.

[Gopa90] Gopalakrishnan, G., Jain, P., “Some Recent Asynchronous System Design Methodologies”, Technical Report UU-CS-TR-90-016, Department of Computer Science, University of Utah, October 1990.

[Gopa93] Gopalakrishnan, G., Akella, V., “Specification, Simulation, and Synthesis of Self-Timed Circuits”, Proceedings of the 26th Hawaii International Conference on System Sciences (HICSS 1993), January 1993, pp. 399-408.

[Gord90] Gordon, R. F., et al., “Hierarchical Modelling in a Graphical Simulation System”, Proceedings of the 1990 Winter Simulation Conference, December 1990, pp. 499-503.

[Grah94] Graham, R. L., Knuth, D. E., Patashnik, O., “Concrete Mathematics”, Addison Wesley, 1994.

[Gros88] Groselj, B., Tropper, C., “The Time of Next Event Algorithm”, Proceedings of the 1988 SCS Multiconference on Distributed Simulation, SCS Simulation Series, July 1988, pp. 25-29.

[Gros89] Groselj, B., Tropper, C., “A Deadlock Resolution Scheme for Distributed Simulation”, Proceedings of the 1989 SCS Multiconference on Distributed Simulation, SCS Simulation Series, March 1989, pp. 108-112.

[Gurd85] Gurd, J.R. et al., “The Manchester Prototype Dataflow Computer”, Communications of the ACM, 28, 1, January 1985, pp. 34-52.

[Guse84] Gusella, R., Zatti, S., “Tempo: A Network Time Controller for a Distributed Berkeley Unix System”, Distributed Processing Technical Communication Letter, IEEE, SI-6, June 1984, pp. 7-15.

[Gwen94] Gwennap, L., “Digital Leads the Pack with 21164”, Microprocessor Report, 8, 12, September 1994, pp. 1, 6-10.

[HP86] “Precision Architecture and Instruction Reference Manual”, Hewlett-Packard Company, 1986.

[Hans73] Hansen, B. P., “Operating Systems Principles”, Prentice Hall International, 1973.

[Hart87] “Hardware Description Languages”, Advances in CAD for VLSI, Volume 7, Hartenstein, R. W., (Editor), North Holland, 1987.

[Hauc93] Hauck, S., “Asynchronous Design Methodologies: An Overview”, Technical Report UW-CSE-93-05-07, University of Washington, April 1993.

[Heid86] Heidelberger, P., “Statistical Analysis of Parallel Simulations”, Proceedings of the 1986 Winter Simulation Conference, December 1986, pp. 290-295.

[Heid90] Heidelberger, P., Stone, H., “Parallel Trace Driven Cache Simulation by Time Partitioning”, Proceedings of the 1990 Winter Simulation Conference, December 1990, pp. 734-737.

[Heud92] Heudin, J. C., Paneto, C., “RISC Architectures”, Chapman & Hall, 1992.

[Henn81] Hennessy, J. L., Jouppi, N. P., Baskett, F., Gill, J., “MIPS: A VLSI Processor Architecture”, Proceedings of the CMU Conference on VLSI Systems and Computations, 1981, pp. 337-346.

[Henn90] Hennessy, J. L., Patterson, D. A., “Computer Architecture: A Quantitative Approach”, Morgan Kaufmann Publishers Inc., 1990.

[Henn91] Hennessy, J. L., Jouppi, N. P., “Computer Technology and Architecture: An Evolving Interaction”, IEEE Computer, 24, 9, September 1991, pp. 18-29.

[Hill86] Hill, G., “Backplane Design for the T-Rack”, ParSiFal Internal Document, Inmos Ltd., 1986.

[Hill89] Hill, M. D., Larus, J. R., “Cache Considerations for Programmers of Multiprocessors”, Technical Report TR-CS-891, Computer Sciences Department, University of Wisconsin, Madison, November 1989.

[Hill85] Hillis, W. D., “The Connection Machine”, MIT Press, 1985.

[Ho89] Ho, Y. C., “Dynamics of Discrete Event Systems”, Proceedings of the IEEE, 77, 1, January 1989, pp. 3-6.

[Hoar72] Hoare, C.A.R., “Towards a theory of parallel programming”, in C.A.R. Hoare and R.H. Perrot, (Editors), “Operating Systems Techniques”, Academic Press, 1972.

[Hoar74] Hoare, C.A.R., “Monitors: An Operating System Structuring Concept”, Communications of the ACM, 17, 10, October 1974, pp. 549-557.

[Hoar78] Hoare, C.A.R., “Communicating Sequential Processes”, Communications of the ACM, 21, 8, August 1978, pp. 666-677.

[Hoar85] Hoare, C.A.R., “Communicating Sequential Processes”, Prentice Hall International, 1985.

[Hock88] Hockney, R.W., Jesshope, C.R., “Parallel Computers 2”, Adam Hilger, 1988.

[Holl93] Hollingsworth, J., Miller, B. P., “Dynamic Control of Performance Monitoring on Large Scale Parallel Systems”, report accessible by anonymous ftp in grilled.cs.wisc.edu: technical papers/w3search.ps.Z.

[Holm78] Holmes, V., “Parallel Algorithms on Multiprocessor Architectures”, Ph.D. Dissertation, Computer Science Department, University of Texas, Austin, 1978.

[Hord82] Hord, R. M., “The ILLIAC IV, The First Supercomputer”, Computer Science Press, Rockville, 1982.

[Hwan84] Hwang, K., Briggs, F.A., “Computer Architecture and Parallel Processing”, McGraw Hill, 1984.

[Huan90] Huang, A., “Optical Computing”, Lecture Notes, Sun Annual Lecture, Department of Computer Science, University of Manchester, June 1990.

[Huda90] Hudak, P., Wadler, P., “Report on the Programming Language Haskell”, Version 1.0, Department of Computer Science, Yale University, April 1990.

[Huff54] Huffman, D. A.,“The Synthesis of Sequential Switching Circuits”, Journal of the Franklin Institute, 257, 3, March 1954, pp. 161-190.

[Ibbe78] Ibbett, R. N., Capon, P. C., “The Development of the MU-5 Computer System”, Communications of the ACM, 21, 1, January 1978, pp. 13-24.

[Ibbe82] Ibbett, R. N., “The Architecture of High Performance Computers”, Macmillan, 1982.

[Inmo86] “IMS T800 Architecture”, Technical Note 6, Inmos Limited, 1986.

[Inmo88] “Occam 2 Reference Manual”, Prentice Hall International, 1988.

[Inmo88a] “Transputer Reference Manual”, Prentice Hall Inc., 1988.

[Inmo91] “The T9000 Transputer Products Overview Manual”, Inmos Limited, 1991.

[Inmo91a] “Occam2 Toolset User Manual”, Inmos Limited, 1991.

[Inmo92] “Occam3 Reference Manual”, Inmos Limited, 1992.

[Inmo93] “The T9000 Transputer Hardware Reference Manual”, Inmos Limited, 1993.

[Inmo93a] “Networks Routers and Transputers”, May, M. D., Thomson, P. W., Welch, P. H., (Editors), IOS Press, Inmos Limited, 1993.

[Inte86] “Intel iPSC System Overview”, Order Number 310610-001, Intel Scientific Computers, 1986.

[Jeff82] Jefferson, D., Sowizral, H., “Fast Concurrent Simulation Using the Time Warp Mechanism, Part I: Local Control”, Technical Report N-1906-AF, RAND Corporation, December 1982.

[Jeff85] Jefferson, D., Sowizral, H., “Fast Concurrent Simulation Using the Time Warp Mechanism”, Proceedings of the SCS Distributed Simulation Conference, SCS Simulation Series, 1985, pp. 63-69.

[Jeff85a] Jefferson, D., “Virtual Time”, ACM Transactions on Programming Languages and Systems, 7, 3, July 1985, pp. 404-425.

[Jeff90] Jefferson, D., “Virtual Time II: Storage Management in Distributed Simulation”, Proceedings of the 9th Annual ACM Symposium on Principle of Distributed Computing, August 1990, pp. 75-89.

[Jha93] Jha, V., Bagrodia, R. L., “Transparent Implementation of Conservative Algorithms in Parallel Simulation Languages”, Proceedings of the 1993 Winter Simulation Conference, December 1993, pp. 677-686.

[Jone86] Jones, D. W., “Concurrent Simulation: An Alternative to Distributed Simulation”, Proceedings of the 1986 Winter Simulation Conference, December 1986, pp. 417-423.

[Jone89] Jones, D. W., et al., “Experience with Concurrent Simulation”, Proceedings of the 1989 Winter Simulation Conference, December 1989, pp. 756-764.

[Jone87] Jones, P., “An Extractor/Loader For the MU T-Rack”, ParSiFal Internal Document PSF/MU/87/WP4/PJ/1, Department of Computer Science, University of Manchester, 1987.

[Jone88] Jones, P., Murta, A., “Support for Occam Channels via Dynamic Switching in Multi-Transputer Machines”, Occam and the Transputer - Research and Applications, Editor Askew, C., IOS 1988, pp. 101-112.

[Jose90] Josephs, M. B., Udding, J. T., “Delay-Insensitive Circuits: An Algebraic Approach to their Design”, Lecture Notes in Computer Science, 458, 1990, pp. 342-366.

[Jose91] Josephs, M. B., Udding, J. T., “An Algebra for Delay-Insensitive Circuits”, Proceedings of the Workshop on Computer-Aided Verification, Editors Kurshan, R., Clarke, E. M., AMS-ACM, 1991, pp. 147-175.

[Joyc87] Joyce, J., et al., “Monitoring Distributed Systems”, ACM Transactions on Computer Systems, 5, 2, May 1987, pp. 121-150.

[KSR] Kendall Square Research Corporation, 170 Tracer Lane, Waltham, MA 02154-1379, USA.

[Kalu75] Kalupahana, D. J., “Causality: The Central Philosophy of Buddhism”, University Press of Hawaii, 1975.

[Kane87] Kane, G., “MIPS R2000 Architecture”, Prentice Hall International, 1987.

[Kate85] Katevenis, M. G. H., “Reduced Instruction Set Computer Architectures for VLSI”, ACM Doctoral Dissertation Award 1984, The MIT Press, 1985.

[Kell74] Keller, R. M., “Towards a Theory of Universal Speed-Independent Modules”, IEEE Transactions on Computers, 32, 1, June 1974, pp. 21-33.

[Kern88] Kernighan, B. W., Ritchie, D. M., “The C Programming Language”, Prentice Hall International, 1988.

[Kerr87] Kerridge, J., “Occam Programming, a Practical Approach”, Blackwell Scientific, 1987.

[Know87] Knowles, A. E., “ParSiFal - A Parallel Simulation Facility based on the Transputer”, Presented at the School of High Performance Architectures and Algorithms, Primorsko, Bulgaria, 1987.

[Know89] Knowles, A. E., Illiev, M. S., “Monitoring Facilities on the ParSiFal T-Rack”, Proceedings of the ConPar’88, Cambridge University Press, 1988.

[Kogg81] Kogge, P. M., “The Architecture of Pipelined Computers”, McGraw Hill Inc., 1981.

[Kona92] Konas, P., Yew, P. C., “Synchronous Parallel Discrete Event Simulation on Shared Memory Multiprocessors”, Proceedings of the 6th Workshop on Parallel and Distributed Simulation (PADS92), SCS, January 1992, pp. 12-21.

[Kraf79] Kraft, G. D., Toy, W. N., “Mini/Microcomputer Hardware Design”, Prentice Hall International, 1979.

[Kreu86] Kreutzer, W., “System Simulation: Programming Styles and Languages”, Addison Wesley, 1986.

[Kris85] Krishnamurthi, M., Chandrasekaran, U., Sheppard, S. V., “Two Approaches to the Implementation of a Distributed Simulation System”, Proceedings of the 1985 Winter Simulation Conference, December 1985, pp. 435-443.

[Kuma90] Kumar, D., Harous, S., “An Approach Towards Distributed Simulation of Timed Petri Nets”, Proceedings of the 1990 Winter Simulation Conference, December 1990, pp. 428-435.

[Kuma91] Kumar, D., Harous, S., “Deadlock Detection and Recovery Based Distributed Simulation: A Performance Study”, Proceedings of the 1991 Winter Simulation Conference, December 1991, pp. 608-617.

[Kung94] Kung, H. T., “Will Emerging High Speed Networks Provide a Solution to Parallel Architectures?”, Lecture in the Sixth International School For Computer Science Researchers, July 1994, Lipari, Sicily.

[Lamp78] Lamport, L., “Time, Clocks and the Ordering of Events in Distributed Systems”, Communications of the ACM, 21, 7, July 1978, pp. 558-565.

[Lau88] Lau, F. C. M., Shea, K. M., “Mapping a Process Network onto a Processor Network”, in “Occam and the Transputer - Research and Applications”, Editor Askew, C., IOS 1988, pp. 91-100.

[Lave83] Lavenberg, S., Muntz, R., “Performance Analysis of a Rollback Method for Distributed Simulation”, Performance 1983, Elsevier Science Pub., North Holland, 1983, pp. 117-132.

[Lavi78] Lavington, S. H., “The Manchester Mark I and Atlas: A Historical Perspective”, Communications of the ACM, 21, 1, January 1978, pp. 4-12.

[Lazo93] Lazowska, E. D., Statement at the U.S. House of Representatives Subcommittee on Science, Hearing on the High Performance Computing and High Speed Networking Act of 1993, H.R. 1757.

[Leu92] Leu, E., Schiper, A., “ParaRex: A Programming Environment Integrating Execution Replay and Visualization”, in “Environments and Tools for Parallel Scientific Computing”, Editors Dongarra, J., Tourancheau, B., Elsevier Science Publishers, 1992, pp. 155-170.

[Lin89] Lin, Y. B., Lazowska, E., “Exploiting Lookahead in Parallel Simulation”, Technical Report 89-10-6, Department of Computer Science, University of Washington, 1989.

[Lin89a] Lin, Y. B., Baer, J. L., Lazowska, E., “Tailoring a Parallel Trace Driven Simulation Technique to Specific Multiprocessor Cache Coherence Protocols”, Proceedings of the 1989 SCS Multiconference on Distributed Simulation, SCS Simulation Series, March 1989, pp. 191-196.

[Lin90] Lin, Y. B., Lazowska, E., Baer, J. L., “Conservative Parallel Simulation for Systems with No Lookahead Prediction”, Proceedings of the 1990 SCS Multiconference on Distributed Simulation, SCS Simulation Series, January 1990, pp. 144-149.

[Lin90a] Lin, Y. B., Lazowska, E., “Determining the Global Virtual Time in a Distributed Environment”, Technical Report 90-01-02, Department of Computer Science, University of Washington, 1990.

[Lin90b] Lin, Y. B., Lazowska, E., “Reducing the State Saving Overhead for Time Warp Parallel Simulation”, Technical Report 90-02-03, Department of Computer Science, University of Washington, 1990.

[Lin90c] Lin, Y. B., Lazowska, E., Bailey, M. L., “Comparing Synchronization Protocols for Parallel Logic Level Simulation”, Proceedings of the 1990 International Conference on Parallel Processing, August 1990, pp. 223-227.

[Lin92] Lin, Y. B., “Memory Management Algorithms for Optimistic Parallel Simulation”, Proceedings of the 6th Workshop on Parallel and Distributed Simulation (PADS92), SCS, January 1992, pp. 43-52.

[LinK91] Lin, K. J., Lin, C. S., “Automatic Synthesis of Asynchronous Circuits”, Proceedings of DAC, 1991, pp. 296-301.

[LinK92] Lin, K. J., Lin, C. S., “A Realization Algorithm of Asynchronous Circuits from STG”, Proceedings of EDAC, 1992, pp. 322-326.

[LinK92a] Lin, K. J., Lin, C. S., “On the Verification of State-Coding in STGs”, Proceedings of ICCAD 1992, 1992, pp. 118-122.

[Linc82] Lincoln, N. R., “Technology and Design Tradeoffs in the Creation of a Modern Supercomputer”, IEEE Transactions on Computers, 31, 5, May 1982, pp. 363-376.

[Lind93] Lindsay, D., “The Limits of Chip Technology”, Microprocessor Report, 7, 1, January 1993, pp. 21-24.

[Liu90] Liu, L. Z., Tropper, C., “Local Deadlock Detection in Distributed Simulations”, Proceedings of the 1990 SCS Multiconference on Distributed Simulation, SCS Simulation Series, January 1990, pp. 64-69.

[Logi85] “Proposal to the Alvey Directorate for a Parallel Simulation Facility”, Logica U.K. Ltd., April 1985.

[Lomo88] Lomow, G., et al., “A Performance Study of Time Warp”, Proceedings of the 1988 SCS Multiconference on Distributed Simulation, SCS Simulation Series, July 1988, pp. 50-55.

[Louc90] Loucks, W. M., Preiss, B. R., “The Role of Knowledge in Distributed Simulation”, Proceedings of the 1990 SCS Multiconference on Distributed Simulation, SCS Simulation Series, January 1990, pp. 9-16.

[Luba88] Lubachevsky, B. D., “Bounded Lag Distributed Discrete Event Simulation”, Proceedings of the 1988 SCS Multiconference on Distributed Simulation, SCS Simulation Series, July 1988, pp. 183-191.

[Luba89] Lubachevsky, B. D., “Scalability of the Bounded Lag Distributed Discrete Event Simulation”, Proceedings of the 1989 SCS Multiconference on Distributed Simulation, SCS Simulation Series, March 1989, pp. 100-107.

[Luba89a] Lubachevsky, B. D., Shwartz, A., Weiss, A., “Rollback Sometimes Works . . . If Filtered”, Proceedings of the 1989 Winter Simulation Conference, December 1989, pp. 630-639.

[Luba89b] Lubachevsky, B. D., “Efficient Distributed Event Driven Simulations of Multiple Loop Networks”, Communications of the ACM, 32, 1, January 1989, pp. 111-123.

[Lump92] Lumpp, J. E., et al., “Specification and Identification of Events for Debugging and Performance Monitoring of Distributed Multiprocessor Systems”, Proceedings of the 10th International Conference on Distributed Computing Systems, IEEE, 1992, pp. 476-483.

[Malo90] Malony, A, Nichols, K., “Standards Working Group Summary”, in “Performance Instrumentation and Visualization”, Editors Simmons, M., Koskela, R., Addison Wesley, 1990.

[Manj93] Manjikian, N., Loucks, W. M., “High Performance Parallel Logic Simulation on a Network of Workstations”, Proceedings of the 7th Workshop on Parallel and Distributed Simulation (PADS93), SCS, May 1993, pp. 76-84.

[Mari92] Marinescu, D. et al., “Models for Monitoring and Debugging Tools for Parallel and Distributed Software”, Journal of Parallel and Distributed Computing, Academic Press, June 1992, pp. 171-184.

[Mart85] Martin, A. J., “Distributed Mutual Exclusion on a Ring of Processes”, Science of Computer Programming, 5, 1985, pp. 265-276.

[Mart85a] Martin, A. J., “The design of a Self-timed circuit for Distributed Mutual Exclusion”, Proceedings of the 1985 Chapel Hill Conference on VLSI, Computer Science Press, 1985, pp. 247-260.

[Mart85b] Martin, A. J., “A Delay-Insensitive Fair Arbiter”, Technical Report 5193:TR:85, Computer Science Department, Caltech, 1985.

[Mart86] Martin, A. J., “Compiling Communicating Processes into Delay-Insensitive VLSI Circuits”, Distributed Computing, 1, 4, April 1986, pp. 226-234.

[Mart89] Martin, A. J., “Formal Program Transformations for VLSI Circuit Synthesis”, UT Year of Programming Institute on Formal Developments of Programs and Proofs, Editor Dijkstra, E.W., Addison Wesley, 1989.

[Mart89a] Martin, A. J., et al., “Design of an Asynchronous Microprocessor”, Advanced Research in VLSI 1989, Proceedings of the Decennial Caltech Conference on VLSI, 1989, pp. 351-373.

[Mart89b] Martin, A. J., “The First Asynchronous Microprocessor”, Technical Report CS-TR-89-06, Computer Science Department, Caltech, 1989.

[Mart90] Martin, A. J., “Synthesis of Asynchronous VLSI Circuits”, Formal Methods for VLSI Design, Editor Staunstrup, J., North Holland, 1990.

[Matt88] Mattern, F., “Virtual Time and Global States of Distributed Systems”, Proceedings of the International Workshop in Parallel and Distributed Algorithms, Gers, France, October 1988, pp. 215-226.

[May94] May, D., Invited Lecture, World Transputer Congress 1994, Como, Italy, September 1994.

[McLa92] McLaren, R., “Instrumentation and Performance Monitoring of Distributed Systems”, Proceedings of the 5th Distributed Memory Computing Conference, IEEE, 1992, pp. 1180-1186.

[Mead80] Mead, C. A., Conway, L. A., “Introduction to VLSI Systems”, Addison Wesley, 1980.

[Meng89] Meng, T. H. Y., Brodersen, R. W., Messerschmitt, D. G., “Automatic Synthesis of Asynchronous Circuits from High-Level Specifications”, IEEE Transactions on CAD, 8, 11, November 1989, pp. 1185-1205.

[Merr90] Merrifield, B. C., Richardson, S. B., Roberts, J. B. G., “Quantitative Studies of Discrete Event Simulation Modelling of Road Traffic”, Proceedings of the 1990 SCS Multiconference on Distributed Simulation, SCS Simulation Series, January 1990, pp. 188-193.

[Mill65] Miller, R. E., “Sequential Circuits”, Chapter 10, In “Switching Theory”, Wiley, New York, 1965.

[Misr86] Misra, J., “Distributed Discrete-Event Simulation”, ACM Computing Surveys, 18, 1, March 1986, pp. 39-65.

[Mitr84] Mitra, D., Mitrani, I., “Analysis and Optimum Performance of Two Message Passing Parallel Processors Synchronized by Rollback”, Performance 1984, Elsevier Science Pub., North Holland, 1984, pp. 35-50.

[Mitc90] Mitchell, D. A. P., et al., “Inside the Transputer”, Blackwell Scientific Publications, 1990.

[Miur84] Miura, K., Uchida, K., “FACOM VP-100/VP-200”, High Speed Computation, Editor Kowalik, J. S., NATO ASI Series, F7, Springer Verlag, 1984.

[Mizu95] Mizuno, M., Raynal, M., Zhou, J. Z., “Sequential Consistency in Distributed Systems: Theory and Implementation”, Research Report 2437, INRIA, France, May 1995.

[Mohr90] Mohr, B., “Performance Evaluation of Parallel Programs in Parallel and Distributed Systems”, Proceedings of the ConPar’90 - VAPP IV, Lecture Notes in Computer Science, 475, 1990, pp. 176-187.

[Mold93] Moldovan, D. I., “Parallel Processing, From Applications to Systems”, Morgan Kaufmann Publishers Inc., 1993.

[Moln83] Molnar, C. E., Fang, T-P., “Synthesis of Reliable Speed-Independent Circuit Modules: I. General Method for Specification of Module-Environment Interaction and Derivation of a Circuit Realization”, Technical Report 297, Computer Systems Laboratory, Institute for Biomedical Computing, Washington University, St. Louis, 1983.

[Mors90] Morse, K., “Parallel Distributed Simulation in Mosim”, in Proceedings of the 1990 International Conference on Parallel Processing, Volume 3, August 1990, pp. 210-217.

[Muft90] Mouftah, H. T., Sturgeon, R. P., “Distributed Discrete Event Simulation for Communication Networks”, IEEE Journal on Selected Areas in Communications, 8, 9, December 1990, pp. 1723-1734.

[Mull56] Muller, D. E., Bartky, W. S., “A Theory of Asynchronous Circuits”, Digital Computer Laboratory 75, University of Illinois, November 1956.

[Mull57] Muller, D. E., Bartky, W. S., “A Theory of Asynchronous Circuits”, Digital Computer Laboratory 78, University of Illinois, March 1957.

[Murt87] Murta, A., “Tools for the Automated Configuration of a Transputer Network”, MSc Thesis, University of Manchester, October 1987.

[Murt91] Murta, A. D., “Support for Transputer Based Program Development via Run Time Link Reconfiguration”, Ph.D Thesis, Department of Computer Science, University of Manchester, 1991.

[Nage73] Nagel, L.W., Pederson, D. O., “SPICE (Simulation Program with Integrated Circuit Emphasis)”, University of California, Berkeley, Electronics Research Laboratory, Memorandum ERL-M 382, April 1973.

[Nand92] Nandy, B., Loucks, W. M., “An Algorithm for Partitioning and Mapping Conservative Parallel Simulation onto Multicomputers”, Proceedings of the 6th Workshop on Parallel and Distributed Simulation (PADS92), SCS, January 1992, pp. 139-146.

[Nany94] Nanya, T., et al., “TITAC: Design of a Quasi-delay-Insensitive Microprocessor”, IEEE Design and Test of Computers, 11, 2, February 1994, pp. 50-63.

[Neel87] Neelamkavil, F., “Computer Simulation and Modelling”, John Wiley & Sons, 1987.

[Neto91] Neto, G. A., “Distributed Simulation Using Relaxed Timing”, Technical Report UMCS-91-2-1, Department of Computer Science, University of Manchester, 1991.

[Nevi89] Nevison, C., “Discrete Event Simulation Using Occam”, Proceedings of the 10th Occam User Group Technical Meeting, Enschede, April 1989, pp. 222-230.

[Nevi90] Nevison, C., “Parallel Simulation of Manufacturing Systems: Structural Factors”, Proceedings of the 1990 SCS Multiconference on Distributed Simulation, SCS Simulation Series, January 1990, pp. 17-19.

[Nick95] Nicklin, S., Ph.D. Thesis, to be submitted, Department of Computer Science, University of Manchester.

[Nico84] Nicol, D. M., Reynolds, P. F., “Problem Oriented Protocol Design”, Proceedings of the 1984 Winter Simulation Conference, December 1984, pp. 471-474.

[Nico88] Nicol, D. M., “Parallel Discrete Event Simulation of FCFS Stochastic Queueing Networks”, SIGPLAN Notes, 23, 9, September 1988, pp. 124-137.

[Nico90] Nicol, D. M., Riffe, S. E., “A Conservative Approach to Parallelizing the Sharks World Problem”, Proceedings of the 1990 Winter Simulation Conference, December 1990, pp. 186-190.

[Nico91] Nicol, D. M., Roy, S., “Parallel Simulation of Timed Petri Nets”, Proceedings of the 1991 Winter Simulation Conference, December 1991, pp. 574-583.

[Nico92] Nicol, D. M., et al., “Massively Parallel Algorithms for Trace Driven Simulation”, Proceedings of the 6th Workshop on Parallel and Distributed Simulation (PADS92), SCS, January 1992, pp. 3-11.

[Nico93] Nicol, D. M., Heidelberger, P., “Parallel Algorithms for Simulating Continuous Time Markov Chains”, Proceedings of the 7th Workshop on Parallel and Distributed Simulation (PADS93), SCS, January 1993, pp. 3-11.

[Nico94] Nicol, D. M., Fujimoto, R., “Parallel Simulation Today”, Annals of Operations Research, 53, December 1994, pp. 249-286.

[Nies88] Niessen, C., Van Berkel, C. H., Rem, M., Saeijs, R. W. J. J., “VLSI Programming and Silicon Compilation: A Novel Approach from Philips Research”, Proceedings of ICCD, 1988, pp. 150-151.

[Norm93] Norman, M. G., Thanisch, P., “Models of Machines and Computation for Mapping in Multicomputers”, ACM Computing Surveys, 25, 3, September 1993, pp. 263-302.

[Norr87] Norrie, C., “Supercomputers for Superproblems: An Architectural Introduction”, IEEE Computer, 17, 3, March 1984.

[Nowi91] Nowick, S. M., Dill, D. L., “Synthesis of Asynchronous State Machines Using a Local Clock”, Proceedings of 1991 ICCD, 1991, pp. 192-197.

[Nowi91a] Nowick, S. M., Dill, D. L., “Automatic Synthesis of Locally-Clocked Asynchronous State Machines”, Proceedings of ICCAD 1991, 1991, pp. 318-321.

[Ofal] “Occam For All, A Case for Support”, available at URL: http://www.hensa.ac.uk/parallel.

[Page91] Page, I., Luk, W., “Compiling Occam into FPGAs”, in FPGAs, Editors Moore, W., Luk, W., Abingdon EE&CS Books, 1991, pp. 271-283.

[Pars] Parsytec Computer GmbH, Juelicher Strasse 338, 52070 Aachen, Germany.

[Pars95] “Small Systems”, Parsys Bulletin No. 1, Parsys Ltd, March 1995.

[Pate79] Patel, J. H., “Processor-memory Interconnection for Multiprocessors”, Proceedings of the 6th International Symposium on Computer Architecture, 1979, pp. 168-177.

[Pave91] Paver, N. C., “Condition Detection in Asynchronous Pipelines”, UK Patent no 9114513, October 1991.

[Pave92] Paver, N. C., et al., “Register Locking in an Asynchronous Microprocessor”, Proceedings of ICCD 1992, October 1992, pp. 351-355.

[Pave92a] Paver, N. C., “Micropipelines - Implementation”, Proceedings of the ACiD-WG/EXACT Workshop on Asynchronous Data Processing, Veldhoven, The Netherlands, December 1992.

[Pave94] Paver, N. C., “The Design and Implementation of an Asynchronous Microprocessor”, Ph.D Thesis, Department of Computer Science, University of Manchester, 1994.

[Patt80] Patterson, D. A., Ditzel, D. R., “The Case for the Reduced Instruction Set Computer”, ACM Computer Architecture News, 8, 6, October 1980, pp. 25-32.

[Peac79] Peacock, J. K., Wong, J. W., Manning, E. G., “Distributed Simulation Using a Network of Processors”, Computer Networks, 3, 1, January, 1979, pp. 44-56.

[Pete91] Peterson, J. L., “Petri Net Theory and the Modelling of Systems”, Prentice Hall, 1991.

[Pete93] Peterson, G. D., Chamberlain, R. D., “Exploiting Lookahead in Synchronous Parallel Simulation”, Proceedings of the 1993 Winter Simulation Conference, December 1993, pp. 706-712.

[Plat] Plato, “Laws”, English Translation: Loeb Classical Library, Harvard University Press.

[Poun87] Pountain, R., May, D., “A Tutorial Introduction to Occam Programming”, BSP Professional Books, 1987.

[Prei89] Preiss, B. R., “The Yaddes Distributed Discrete Event Simulation Specification Language and Execution Environment”, Proceedings of the 1989 SCS Multiconference on Distributed Simulation, SCS Simulation Series, March 1989, pp. 139-144.

[Prei92] Preiss, B. R., “On the Trade-off between Time and Space in Optimistic Parallel Discrete Event Simulation”, Proceedings of the 6th Workshop on Parallel and Distributed Simulation (PADS92), SCS, January 1992, pp. 33-42.

[Prep94] Preparata, F. P., “Parallel Computation Models and Basic Techniques”, Lecture in the Sixth International School For Computer Science Researchers, July 1994, Lipari, Sicily.

[Pres90] Presley, M. T., Reiher, P. L., Bellenot, S., “A Time Warp Implementation of Sharks World”, Proceedings of the 1990 Winter Simulation Conference, December 1990, pp. 199-203.

[Prit91] “Principles of Modelling”, Panel Session, Chair Pritsker, A. A. B., Proceedings of the 1991 Winter Simulation Conference, December 1991, pp. 1199-1208.

[Radi83] Radin, G., “The 801 Minicomputer”, IBM Journal of Research and Development, 27, 3, 1983, pp. 237-246.

[Rabi94] Rabin, M., “Compiling Programs for Practical Parallel Computers”, Lecture in the Sixth International School For Computer Science Researchers, July 1994, Lipari, Sicily.

[Rabi94a] Rabin, M., “Clock Construction in Fully Asynchronous Parallel Systems and PRAM Simulation”, Lecture in the Sixth International School For Computer Science Researchers, July 1994, Lipari, Sicily.

[Radi92] “Discrete Event Simulation Modelling: Directions for the 90s”, Panel Session, Chair Radiya, A., Proceedings of the 1992 Winter Simulation Conference, December 1992, pp. 773-782.

[Rayn95] Raynal, M., Singhal, M., “Logical Time: A Way to Capture Causality in Distributed Systems”, Research Report 2472, INRIA, France, May 1995.

[Read89] Reade, C., “Elements of Functional Programming”, Addison Wesley, 1989.

[Redd79] Reddaway, S.F., “The DAP approach”, Infotech State of the Art Report: Supercomputers, Vol. 2, 1979, pp. 311-329.

[Reed83] Reed, D. A., “Implementing Atomic Actions on Decentralized Data”, ACM Transactions on Computer Systems, 1, 1, February 1983, pp. 3-23.

[Reed87] Reed, D. A., Grunwald, D. C., “The Performance of Multicomputer Interconnection Networks”, IEEE Computer, June 1987, pp. 63-73.

[Reed88] Reed, D. A., Fujimoto, R., “Multicomputer Networks: Message-Based Parallel Processing”, MIT Press, 1988.

[Reed88a] Reed, D. A., Malony, A. D., McCredie, B. D., “Parallel Discrete Event Simulation Using Shared Memory”, IEEE Transactions on Software Engineering, 14, 4, April 1988, pp. 541-553.

[Reed88b] Reed, D. A., Malony, A. D., “Parallel Discrete Event Simulation: The Chandy-Misra Approach”, Proceedings of the 1988 SCS Multiconference on Distributed Simulation, SCS Simulation Series, July 1988, pp. 8-13.

[Rees85] Reese, R. M., “A Software Development Environment for Distributed Simulation”, Proceedings of the SCS Distributed Simulation Conference, 1985, pp. 37-40.

[Reih90] Reiher, P. L., et al., “Cancellation Strategies in Optimistic Execution Systems”, Proceedings of the 1990 SCS Multiconference on Distributed Simulation, SCS Simulation Series, January 1990, pp. 112-121.

[Reih90a] Reiher, P. L., Jefferson, D., “Dynamic Load Management in the Time Warp Operating System”, Transactions of the Society for Computer Simulation, 7, 2, June 1990, pp. 91-120.

[Rein93] Reinhardt, S. K., et al., “The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel Computers”, in ACM SIGMETRICS Conference on Measurement and Modelling of Computer Systems, 21, June 1993, pp. 48-60.

[Rem83] Rem, M., Snepscheut, J. L. A., Udding, J. T., “Trace Theory and the Definition of Hierarchical Modules”, Proceedings of the 3rd Caltech Conference on VLSI, 1983, pp. 225-239.

[Reyn88] Reynolds, P. F., “A Spectrum of Options for Parallel Simulation”, Technical Report IPC-TR-88-007, University of Virginia, September 1988. Also in Proceedings of the 1988 Winter Simulation Conference, December 1988, pp. 325-332.

[Reyn89] Reynolds, P. F., Weight, C. F., Filder, J. R., “Comparative Analysis of Parallel Simulation Protocols”, Technical Report IPC-TR-89-011, University of Virginia, December 1989.

[Riek94] Riek, M., Tourancheau, B., Vigouroux, X. F., “Monitoring of Distributed Memory Multicomputer Programs”, Technical Report UT-CS-93-204, University of Tennessee, October 1993.

[Righ89] Righter, R., Walrand, J. C., “Distributed Simulation of Discrete Event Systems”, Proceedings of the IEEE, 77, 1, January 1989, pp. 99-113.

[Ripl87] Ripley, B. D., “Stochastic Simulation”, Wiley, 1987.

[Robe92] Roberts, J. W., editor, “Performance Evaluation and Design of Multiservice Networks”, Publication of the Commission of the European Communities, Luxembourg, 1992.

[Rosc86] Roscoe, A. W., Dathi, N., “The Pursuit of Deadlock Freedom”, Technical Monograph PRG-57, Programming Research Group, Computing Laboratory, Oxford University, November 1986.

[Rose94] Rosenberg, A. L., “An Overview of Mapping Problems”, Lecture in Sixth International School For Computer Science Researchers, July 1994, Lipari, Sicily.

[Roth89] Rothenberg, J., “Tutorial: Artificial Intelligence and Simulation”, Proceedings of the 1989 Winter Simulation Conference, December 1989, pp. 33-34.

[Rudo89] Rudolph, D. C., Reed, D. A., “Crystal: Intel iPSC/2 Operating System Instrumentation”, Proceedings of the 4th Conference on Hypercube Concurrent Computers and Applications, 1989, pp. 249-252.

[Rumb91] Rumbaugh, J., et al., “Object Oriented Modelling and Design”, Prentice Hall International, 1991.

[Russ78] Russell, R. M., “The CRAY-1 Computer System”, Communications of the ACM, 21, 1, January 1978, pp. 63-72.

[Ruta91] Rutan, A. H., “Advances in Computer Simulation”, Proceedings of the 24th Annual Simulation Symposium, IEEE 1991, pp. 2-6.

[SCS85] “Catalogue of Simulation Software”, SCS Simulation, 45, 4, April 1985, pp. 196-209.

[Sado89] Sadowski, R. P., “The Simulation Process: Avoiding the Problems and Pitfalls”, Proceedings of the 1989 Winter Simulation Conference, December 1989, pp. 72-79.

[Sama85] Samadi, B., “Distributed Simulation, Algorithms and Performance Analysis” Ph.D. Thesis, Department of Computer Science, University of California, Los Angeles, 1985.

[Sarg92] Sargent, R. G., “Validation and Verification of Simulation Models”, Proceedings of the 1992 Winter Simulation Conference, December 1992, pp. 104-114.

[Sarg94] Sargent, R. G., “Validation and Verification of Simulation Models”, Proceedings of the 1994 Winter Simulation Conference, December 1994, pp. 77-87.

[Sari87] Sarin, S. K., Lynch, N. A., “Discarding Obsolete Information in a Replicated Data Base System”, IEEE Transactions on Software Engineering, 13, 1, January 1987, pp. 39-46.

[Schm88] Schmuck, F., “The Use of Efficient Broadcast in Asynchronous Distributed Systems”, Technical Report TR88-927, Cornell University, 1988.

[Schn82] Schneider, F. B., “Synchronization in Distributed Programs”, ACM Transactions in Programming Languages and Systems, 4, 2, April 1982, pp. 51-64.

[Schr92] Schruben, L. W., “Graphical Model Structures for Discrete Event Simulation”, Proceedings of the 1992 Winter Simulation Conference, December 1992, pp. 241-245.

[Shan92] Shannon, R. E., “Introduction to Simulation”, Proceedings of the 1992 Winter Simulation Conference, December 1992, pp. 65-73.

[Sega86] Segall, Z., Rudolph, L., “PIE: A Programming and Instrumentation Environment For Parallel Processing”, IEEE Software, November 1985, pp. 22-37.

[Seit70] Seitz, C. L., “Asynchronous Machines Exhibiting Concurrency”, Conference Record of the Project MAC Conference on Concurrent Systems and Parallel Computation, 1970.

[Seit80] Seitz, C. L., “System Timing”, Chapter 7, In [Mead80].

[Seit85] Seitz, C. L., “The Cosmic Cube”, Communications of the ACM, 28, 1, January 1985, pp. 22-33.

[Shep88] Sheppard, S. V., Davis, C. K., Chandra, U., “Parallel Simulation Environments for Multiprocessor Architectures”, Proceedings of the 1988 SCS Multiconference on Distributed Simulation, July 1988, pp. 109-114.

[Sieg85] Siegel, H. J., “Interconnection Networks for Large Scale Parallel Processing: Theory and Practice”, Lexington Books, 1985.

[Sifa80] Sifakis, J., “Deadlocks and Livelocks in Transition Systems”, Lecture Notes in Computer Science, 88, 1980, pp. 587-599.

[Snyd82] Snyder, L., “Introduction to the Highly Parallel Computer”, IEEE Computer, January 1982, pp. 47-55.

[Soko88] Sokol, L. M., Briscoe, D. P., Wieland, A. P., “MTW: A Strategy for Scheduling Discrete Simulation Events for Concurrent Simulation”, Proceedings of the SCS Multiconference on Distributed Simulation, SCS Simulation Series, July 1988, pp. 34-42.

[Soko89] Sokol, L. M., Stucky, B. K., Hwang, V. S., “MTW: A Control Mechanism for Parallel Discrete Simulation”, Proceedings of the 1989 International Conference on Parallel Processing, August 1989, Pennsylvania, pp. 250-254.

[Soko91] Sokol, L. M., Weissman, J. B., Mutchler, P. A., “MTW: An Empirical Performance Study”, Proceedings of the 1991 Winter Simulation Conference, December 1991, pp. 557-563.

[Som89] Som, T. K., Cota, B. A., Sargent, R. G., “On Analysing Events to Estimate the Possible Speedup of Parallel Discrete Event Simulation”, Proceedings of the 1989 Winter Simulation Conference, December 1989, pp. 729-737.

[Soul91] Soule, L., Gupta, A., “An Evaluation of the Chandy-Misra-Bryant Algorithm for Digital Logic Simulation”, ACM Transactions on Modelling and Computer Simulation, 1, 4, October 1991, pp. 308-347.

[Spor93] Sporrer, C., Bauer, H., “Corolla Partitioning for Distributed Logic Simulation of VLSI Circuits”, Proceedings of the 7th Workshop on Parallel and Distributed Simulation (PADS93), SCS, May 1993, pp. 85-92.

[Spro86] Sproull, R. F., Sutherland, I. E., “Asynchronous Systems Volume I: Introduction”, SSA 4706, Sutherland, Sproull and Associates, Inc., 1986.

[Spro94] Sproull, R. F., Sutherland, I. E., Molnar, C. E., “The Counterflow Pipeline Processor Architecture”, IEEE Design and Test of Computers, 11, 3, March 1994, pp. 48-59.

[Stal88] Stallings, W., “Reduced Instruction Set Architecture”, Proceedings of the IEEE, 76, 1, January 1988, pp. 38-55.

[Stei91] Steinman, J., “SPEEDES: Synchronous Parallel Environment for Emulation and Discrete Event Simulation”, Proceedings of the 5th Workshop on Parallel and Distributed Simulation (PADS91), SCS, January 1991, pp. 95-103.

[Sten90] Stenström, P., “A Survey of Cache Coherence Schemes for Multiprocessors”, Computer, 23, 6, June 1990, pp. 12-24.

[Step88] Stephenson, M., Boudillet, O., “A Graphical Tool for the Modeling and Manipulation of Occam Software and Transputer Hardware Topologies”, in “Occam and the Transputer - Research and Applications”, Editor Askew, C., IOS 1988, pp. 139-144.

[Step86] Stepney, S., “Prototype of GRAIL-Occam Screen Display”, ParSiFal Internal Document PSF/GEC/WP3/86/10, 1986.

[Su89] Su, W. K., Seitz, C. L., “Variants of the Chandy-Misra Distributed Discrete-Event Simulation Algorithm”, Proceedings of the 1989 SCS Multiconference on Distributed Simulation, SCS Simulation Series, March 1989, pp. 38-43.

[Sun86] Sun Microsystems Inc., “Sun 3 Architecture: A Sun Technical Report”, Sun Microsystems Europe Inc., Ascot, Berkshire, 1986.

[Sun87] “The SPARC Architecture Manual”, Sun Microsystems Inc., 1987.

[Suth89] Sutherland, I. E., “Micropipelines”, Communications of the ACM, 32, 6, June 1989, pp. 720-738.

[Suth93] Sutherland, I. E., “Flashback Simulation”, Research Report SunLab 93:0285, Sun Microsystems Laboratories, Inc., August 1993.

[Tadp87] Tadpole Technology PLC., “Transputer System Card TP-TSC”, Preliminary Product Description, Tadpole Technology PLC, 1987.

[Theo91] Theodoropoulos, G., “Specification of Replicated Structures in a Graphical Environment for a Transputer based Machine”, MSc Thesis, University of Manchester, October 1991.

[Theo94] Theodoropoulos, G., “An occam model of the AMULET1”, In Proceedings of the AMULET Modelling Workshop, Windermere, Cumbria, England, July 1994.

[Theo94a] Theodoropoulos, G., Woods, J. V., “Building Parallel Distributed Models for Asynchronous Computer Architectures”, Proceedings of the World Transputer Congress 1994, Como, September 1994, pp. 285-301.

[Theo94b] Theodoropoulos, G., Woods, J. V., “Distributed Simulation of Asynchronous Computer Architectures: The Program Driven Conservative Approach”, Proceedings of the European Simulation Symposium 1994, Volume 2, Istanbul, Turkey, October 1994, pp. 230-234.

[Theo94c] Theodoropoulos, G., West, A., “Graphical Configuration of Transputer Systems: The Graphical Configuration Assistant”, Proceedings of the 1994 Transputer Research and Applications Conference (NATUG-7), Athens, Georgia, October 1994.

[Theo95] Theodoropoulos, G., Woods, J. V., “Analysing the Timing Error in Distributed Simulations of Asynchronous Computer Architectures”, Eurosim Congress ’95, Vienna, Austria, September 1995, to appear.

[Theo95a] Theodoropoulos, G., Woods, J. V., “Dealing with Time Modelling Problems in Parallel Models of Asynchronous Computer Architectures”, World Transputer Congress 1995, Harrogate, England, September 1995, to appear.

[Theo95b] Theodoropoulos, G., Woods, J. V., “Simulating Asynchronous Architectures on Transputer Networks”, 4th IEEE Euromicro Workshop on Parallel and Distributed Processing, Braga, Portugal, January 1996, to appear.

[Thes90] Thesen, A., Travis, L. E., “Introduction to Simulation”, Proceedings of the 1990 Winter Simulation Conference, December 1990, pp. 14-21.

[Thom91] Thomas, G. S., Zahorjan, J., “Parallel Simulation of Performance Petri Nets: Extending the Domain of Parallel Simulation”, Proceedings of the 1991 Winter Simulation Conference, December 1991, pp. 564-573.

[Tink89] Tinker, P. A., Agre, J. R., “Object Creation, Messaging, and State Manipulation in an Object Oriented Time Warp System”, Proceedings of the 1989 SCS Multiconference on Distributed Simulation, SCS Simulation Series, March 1989, pp. 79-84.

[Trel82] Treleaven, P. C., Brownbridge, D. R., Hopkins, R. P., “Data-Driven and Demand-Driven Computer Architecture”, ACM Computing Surveys, 14, 1, January 1982, pp. 93-143.

[Turn92] Turner, S., Xu, M., “Performance Evaluation of the Bounded Time Warp Algorithm”, Proceedings of the 6th Workshop on Parallel and Distributed Simulation (PADS92), SCS Simulation Series, January 1992, pp. 117-128.

[Unge86] Unger, B., et al., “A Distributed Software Prototyping and Simulation Environment: Jade”, Proceedings of the SCS Multiconference on Intelligent Simulation Environments, SCS Simulation Series, 1986, pp. 63-71.

[Unge89] Unger, B., “The Impact of Parallel Processing”, in “How Technology Limits Simulation Methodology”, Panel Session, Chair Pegden, C. D., Proceedings of the 1989 Winter Simulation Conference, December 1989, pp. 686-691.

[Unge69] Unger, S. H., “Asynchronous Sequential Switching Circuits”, Wiley-Interscience, John Wiley & Sons, Inc., New York, 1969.

[VLSI90] “Acorn Risc Machine (ARM) Family Data Manual”, VLSI Technology Inc., Prentice Hall International, 1990.

[VaBe88] Van Berkel, C. H., Rem, M., Saeijs, R. W. J. J., “VLSI Programming”, Proceedings of ICCD, 1988, pp. 152-156.

[VaBe88a] Van Berkel, C. H., Rem, M., Saeijs, R. W. J. J., “Compilation of Communicating Processes into Delay-Insensitive Circuits”, Proceedings of ICCD, 1988, pp. 157-162.

[VaBe91] Van Berkel, C. H., Kessels, J., Roncken, M., Saeijs, R. W. J. J., Schalij, F., “The VLSI-Programming Language Tangram and its Translation into Handshake Circuits”, Proceedings of EDAC, 1991, pp. 384-389.

[VaBe92] Van Berkel, C. H., “Handshake Circuits: An Intermediary Between Communicating Processes and VLSI”, Ph.D. Thesis, Eindhoven University of Technology, 1992.

[Vanb90] Vanbekbergen, P., et al., “Optimized Synthesis of Asynchronous Control Circuits from Graph-Theoretic Specifications”, Proceedings of ICCAD 1990, 1990, pp. 184-187.

[Venk86] Venkatesh, K., Radhakrishnan, T., Li, H. F., “Discrete Event Simulation in a Distributed System”, IEEE COMPSAC 1986, IEEE Computer Society Press, pp. 123-129.

[Wagn89] Wagner, D. B., Lazowska, E. D., “Parallel Simulation of Queueing Networks: Limitations and Potentials”, Proceedings of the International Conference on Measurement and Modeling of Computer Systems, Berkeley, USA, May 1989, pp. 146-155.

[Warr88] Warren, D. H. D., Haridi, S., “The Data Diffusion Machine - A Scalable Shared Virtual Memory Architecture for Parallel Execution of Logic Programs”, Proceedings of the 1988 International Conference on Fifth Generation Computer Systems, 1988, pp. 943-952.

[Waym89] Wayman, R., “Transputer Development Systems”, in Transputer Applications, Harp, G. (Editor), Pitman, 1989, pp. 31-83.

[Weic84] Weicker, R. P., “Dhrystone, A Synthetic Systems Programming Benchmark”, Communications of the ACM, 27, 10, October 1984, pp. 1013-1030.

[Welc87] Welch, P. H., “Emulating Digital Logic Using Transputer Networks (Very High Parallelism = Simplicity = Performance)”, Lecture Notes in Computer Science, 258, (PARLE’87), 1987, pp. 357-373.

[Welc93] Welch, P. H., Justo, G., Willock, C., “High-Level Paradigms for Deadlock-Free High-Performance Systems”, Proceedings of the World Transputer Congress 1993, September 1993, pp. 981-1004.

[Wern84] Werner, J., Beresford, R., “A System Engineer’s Guide to Simulators”, VLSI Design, February 1984, pp. 27-31.

[Whar92] Wharton, J., “Why RISC is Doomed”, Microprocessor Report, 6, 11, August 1992, pp. 14-17.

[Whit89] Whitner, R., Balci, O., “Guidelines for Selecting and Using Simulation Model Verification Techniques”, Proceedings of the 1989 Winter Simulation Conference, December 1989, pp. 559-568.

[Wiel89] Wieland, F., et al., “Distributed Combat Simulation in Time Warp: The Model and its Performance”, Proceedings of the 1989 SCS Multiconference on Distributed Simulation, SCS Simulation Series, March 1989, pp. 14-20.

[Wile87] Wiley, P., “A Parallel Architecture Comes of Age at Last”, IEEE Spectrum, June 1987, pp. 46-50.

[Wirt77] Wirth, N., “Modula: A Language for Modular Multiprogramming”, Software Practice and Experience, 7, 1, January 1977, pp. 3-35.

[Wood94] Wood, K. R., Turner, S. J., “A Generalized Carrier-Null Method for Conservative Parallel Simulation”, Proceedings of the 8th Workshop on Parallel and Distributed Simulation (PADS94), SCS, July 1994, pp. 50-57.

[WooJ94] Woods, J. V., Private Communication, 1994.

[Wulf72] Wulf, W. A., Bell, C. G., “C.mmp - A multi-mini-processor”, Proceedings of the AFIPS Fall Joint Computing Conference, 41, part 2, 1972, pp. 765-777.

[Wyat83] Wyatt, D. L., Sheppard, S., Young, R. E., “An Experiment in Microprocessor-Based Distributed Digital Simulation”, Proceedings of the 1983 Winter Simulation Conference, December 1983, pp. 271-277.

[Wyat84] Wyatt, D. L., Sheppard, S., “A Language Directed Distributed Discrete Simulation System”, Proceedings of the 1984 Winter Simulation Conference, December 1984, pp. 463-464.

[Wyat85] Wyatt, D. L., “Simulation Programming on a Distributed System: A Preprocessor Approach”, Proceedings of the SCS Distributed Simulation Conference, SCS Simulation Series, 1985, pp. 32-36.

[Xu89] Xu, M., Pin, N., “An Irregular Distributed Simulation Problem with a Dynamic Logical Process Structure”, Proceedings of the 11th Occam User Group Technical Meeting, Edinburgh, September 1989, pp. 213-221.

[Yako92] Yakovlev, A. V., “On Limitations and Extensions of STG Model for Designing Asynchronous Control Circuits”, Proceedings of ICCD 1992, pp. 396-400.

[Yovi92] Yovits, M. C., editor, “Advances in Computers”, Vol. 35, Academic Press, 1992.

[Yu89] Yu, Q., Towsley, D., Heidelberger, P., “Time Driven Parallel Simulation of Multistage Interconnection Networks”, Proceedings of the 1989 SCS Multiconference on Distributed Simulation, SCS Simulation Series, March 1989, pp. 191-196.

[Yuce92] Yucesan, E., Jacobson, S., “Building Correct Simulation Models is Difficult”, Proceedings of the 1992 Winter Simulation Conference, December 1992, pp. 783-789.

[Yun92] Yun, K. Y., Dill, D. L., Nowick, S. M., “Synthesis of 3D Asynchronous State Machines”, Proceedings of ICCD 1992, 1992, pp. 346-350.

[Yun92a] Yun, K. Y., Dill, D. L., Nowick, S. M., “Practical Generalizations of Asynchronous State Machines”, Technical Report CSL-TR-92-544, Computer Systems Laboratory, Stanford University, July 1992.

[Zaba92] Zabala, E., Taylor, R., “Process and Processor Interaction”, in “Environments and Tools for Parallel Scientific Computing”, Editors Dongarra, J., Tourancheau, B., Elsevier Science Publishers, 1992, pp. 55-72.

[Zeig76] Zeigler, B. P., “Theory of Modelling and Simulation”, John Wiley and Sons, 1976.

[Zeig90] Zeigler, B. P., “Object Oriented Simulation with Hierarchical, Modular Models: Intelligent Agents and Endomorphic Systems”, Academic Press, 1990.

[Zhan89] Zhang, G., Zeigler, B. P., “DEVS-Scheme Supported Mapping of Hierarchical Models onto Multiple Processor Systems”, Proceedings of the 1989 SCS Multiconference on Distributed Simulation, SCS Simulation Series, March 1989, pp. 57-60.