Strategies for the Modelling and Simulation of Asynchronous Computer Architectures

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Science and Engineering

September 1995

By Georgios Theodoropoulos
Department of Computer Science

Contents

Abstract
The author
Acknowledgements

1 Introduction
  1.1 Background
  1.2 Motivation and Objectives
  1.3 Structure of the Thesis
    1.3.1 Related Publications

2 The Quest for High Performance
  2.1 Introduction
  2.2 Bit and Instruction Level Parallelism
  2.3 Reduced Instruction Set Computers
  2.4 The Limits of Sequential Computation
  2.5 Parallel Computer Architectures
    2.5.1 SIMD
    2.5.2 MIMD
      2.5.2.1 Shared Memory MIMD Architectures
      2.5.2.2 Distributed Memory MIMD Architectures
    2.5.3 Parallel Programming Models and Languages
      2.5.3.1 Communicating Sequential Processes
  2.6 Occam and the Transputer
    2.6.1 The Occam Programming Language
      2.6.1.1 The SEQ and PAR Constructs
      2.6.1.2 The ALT Construct
      2.6.1.3 Timers
      2.6.1.4 Functions and Procedures
    2.6.2 The Transputer
      2.6.2.1 Configuring Occam Programs
      2.6.2.2 The T9000 Transputer
  2.7 Summary

3 Modelling and Simulation
  3.1 Introduction
  3.2 Discrete Event Simulation Modelling
  3.3 The Need for Parallel Discrete Event Simulation
    3.3.1 Exploiting Parallelism
  3.4 The Logical Process Paradigm
    3.4.1 Timing Issues
  3.5 Synchronous versus Asynchronous Simulation
  3.6 Time Driven Logical Process Simulation
  3.7 Event Driven Logical Process Simulation
    3.7.1 Conservative Techniques
      3.7.1.1 Deadlock Avoidance
      3.7.1.2 Deadlock Detection and Recovery
      3.7.1.3 Characteristics of Conservative Protocols
  3.8 Optimistic Synchronization Protocols
    3.8.1 Time Warp
      3.8.1.1 Global Virtual Time
      3.8.1.2 State Saving and Memory Management
      3.8.1.3 Characteristics of Optimistic Protocols
  3.9 Modelling and Simulation in Computer Architecture Research
    3.9.1 The Need for Improved Digital System Simulation Performance
      3.9.1.1 Parallel Digital System Simulation
  3.10 Summary

4 Asynchronous Systems
  4.1 Introduction
  4.2 Advantages of Asynchronous Systems
    4.2.1 Clock Distribution Problems
    4.2.2 Potential for Low Power
    4.2.3 Potential for High Performance
    4.2.4 Better Technology Migration Potential
  4.3 Basic Characteristics of Asynchronous Systems
    4.3.1 Timing Model
    4.3.2 Signalling Protocols
      4.3.2.1 Two-phase Signalling
      4.3.2.2 Four-phase Signalling
    4.3.3 Data Passing Techniques
      4.3.3.1 The Four-Wire Technique
      4.3.3.2 The Three-Wire Technique
      4.3.3.3 The Two-Plus-Wire Technique
      4.3.3.4 The Bundled Data Technique
  4.4 Micropipelines
    4.4.1 Event Control Elements
    4.4.2 Event Controlled Storage Element
    4.4.3 Micropipelines Without Processing
    4.4.4 Micropipelines With Processing
  4.5 AMULET
  4.6 The AMULET1 Microprocessor
    4.6.1 The AMULET1 Interface
    4.6.2 The AMULET1 Internal Organization
      4.6.2.1 The Address Interface Unit
      4.6.2.2 The Data Interface Unit
      4.6.2.3 The Register Bank Unit
      4.6.2.4 The Execution Unit
      4.6.2.5 The Primary Decode Unit
    4.6.3 AMULET2
  4.7 Summary

5 Modelling Asynchronous Systems
  5.1 Introduction
  5.2 Modelling Techniques
    5.2.1 CSP-based Modelling Approaches
  5.3 Modelling Micropipelined Systems with Occam
    5.3.1 Why Occam
      5.3.1.1 The Deadlock Problem
    5.3.2 The Modelling Philosophy
    5.3.3 Modelling a Pipeline Without Processing
    5.3.4 Modelling a Pipeline With Processing
    5.3.5 Modelling Control Logic
    5.3.6 Timing Issues
      5.3.6.1 Synchronous Merge
      5.3.6.2 Data Dependent Merge
      5.3.6.3 Arbitrated Merge
      5.3.6.4 Delay Independence
  5.4 Summary

6 Occarm: An Occam Model of AMULET1
  6.1 Introduction
  6.2 Occarm General Structure
    6.2.1 Non-Bundled Signals
  6.3 The Address Interface
    6.3.1 The Address Interface Internal Organization
      6.3.1.1 The PC Loop
      6.3.1.2 The PC Pipe
      6.3.1.3 The LSM Loop
    6.3.2 The Address Interface Occam Model
  6.4 The Data Interface
  6.5 Instruction Flow Control
    6.5.1 Condition Code Evaluation
    6.5.2 Branch Execution
    6.5.3 Exception Handling
      6.5.3.1 Software Interrupts
      6.5.3.2 Instruction Prefetch Aborts
      6.5.3.3 Hardware Interrupts
      6.5.3.4 Data Transfer Aborts
  6.6 The Primary Decode
    6.6.1 The Dec1CtrlA Process
      6.6.1.1 Modelling of the Arbitration logic
      6.6.1.2 Detecting Data Aborts
    6.6.2 The Dec1CtrlB Process
  6.7 The Register Bank
    6.7.1 Modelling the Register Bank
  6.8 The Execution Unit Model
    6.8.1 The CPSR Model
    6.8.2 Decode2
    6.8.3 Decode3
  6.9 The Write Bus Control
  6.10 Summary

7 Simulation Issues
  7.1 Introduction
  7.2 The Host Machine: The ParSiFal T-Rack
  7.3 Monitoring
    7.3.1 Monitoring Occarm
      7.3.1.1 Debugging
      7.3.1.2 Performance Evaluation
  7.4 Termination
  7.5 The Simulator Environment
  7.6 Multiprocessor Implementation
    7.6.1 Mapping Occarm onto the T-Rack
      7.6.1.1 Balancing the Workload
      7.6.1.2 Balancing the Communication Load
      7.6.1.3 The Monitoring Path
      7.6.1.4 The Generic Simulator Node
  7.7 Summary

8 Validation of the Occarm Model
  8.1 Introduction
  8.2 Benchmark Programs
  8.3 Accuracy
  8.4 Performance
  8.5 Summary

9 Addressing the Time Modelling Problem
  9.1 Introduction
  9.2 Requirements
  9.3 The Program Driven Synchronization Protocol (PDSP)
    9.3.1 The Basis
    9.3.2 The Rules
    9.3.3 The PDSP Arbiter Process
      9.3.3.1 Improving PDSP Performance
    9.3.4 The Limitations
  9.4 Applying PDSP to Occarm
  9.5 The Address Interface Arbiter
    9.5.1 Providing Instruction Lookahead Information
    9.5.2 The PCch Link
      9.5.2.1 Filling of the Datapath
      9.5.2.2 Register Read Instructions
      9.5.2.3 Instructions Activating the ALUgo Signal
      9.5.2.4 Load/Store Multiple Instructions
      9.5.2.5 The Instruction Lookahead Table
    9.5.3 The Wch Link
      9.5.3.1 Colour Mismatch
      9.5.3.2 Condition Codes Failure
  9.6 The Primary Decode Arbiter
  9.7 The Write Control Arbiter
    9.7.1 The DINch Link
    9.7.2 The DPch Link
  9.8 Performance Evaluation of PDSP
  9.9 Summary

10 Conclusions and Further Work
  10.1 Background
  10.2 Contribution of the Thesis
    10.2.1 Modelling
    10.2.2 Simulation
  10.3 The Program Driven Synchronization Protocol
  10.4 Performance
  10.5 Occam as an Asynchronous Hardware Description Language
  10.6 Further Work
    10.6.1 Modelling and Simulation
    10.6.2 Automatic Synthesis

A The ARM6 Programmer’s Model
  A.1 The Registers
  A.2 The Instruction Set

B Modelling the Control Logic of AMULET1

Bibliography

List of Tables

  7.1 Communication Load on Occarm Links
  8.1 Timestamp Drift
  8.2 Dhrystone Numbers
  8.3 AMULET1 Pipeline Occupancy (Dhrystone (1 loop))
  8.4 AMULET1 Pipeline Stalls (Dhrystone (1 loop))
  8.5 Asim versus Occarm (Single Transputer Implementation)
  8.6 Performance of Occarm
  9.1 PDSP: Number of Free Stages in the Datapath
  9.2 Performance of PDSP (Address Interface)

List of Figures

  2.1 The Use of the Occam SEQ and PAR Constructs
  2.2 The Occam ALT Construct
  2.3 Introducing Delays with Occam Timers
  2.4 Programming Timeout Behaviour
  2.5 An Example Occam Program
  2.6 The Architecture of the T800 Transputer
  3.1 A Taxonomy of Models
  3.2 Abstraction Levels in Digital Systems
  4.1 The Request-Acknowledge Interface
  4.2 Two-phase Signalling: Rising and Falling Edges Equivalent
  4.3 Two-phase Signalling Protocol
  4.4 Four-phase Signalling Protocol
  4.5 The Bundled Data Interface
  4.6 The Two-phase Bundled Data Protocol
  4.7 Event Control Modules
  4.8 The Capture-Pass Storage Element
  4.9 Micropipeline Without Processing
  4.10 Micropipeline With Processing
  4.11 The AMULET1 Interface
  4.12 The AMULET1 Internal Organization
  4.13 The AMULET1 Processor Physical Layout
  5.1 Micropipeline Without Processing: The Register Model
  5.2 Micropipeline With Processing: A High Level View
  5.3 Micropipeline With Processing: The Register Model
  5.4 Synchronous Merge
