Design and Implementation of an Asynchronous Version of the MIPS R3000 Microprocessor

Rochester Institute of Technology RIT Scholar Works Theses 1-1994 Design and implementation of an asynchronous version of the MIPS R3000 microprocessor Kevin Johnson Follow this and additional works at: https://scholarworks.rit.edu/theses Recommended Citation Johnson, Kevin, "Design and implementation of an asynchronous version of the MIPS R3000 microprocessor" (1994). Thesis. Rochester Institute of Technology. Accessed from This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected]. DESIGN AND IMPLEMENTATION OF AN ASYNCHRONOUS VERSION OF THE MIPS R3000 MICROPROCESSOR by Kevin Johnson A Thesis Submitted In Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE In Computer Engineering Approved by: Graduate Advisor - George A. Brown, Professor Roy S. Czemikowski, Professor and Department Head Tony H. Chang, Professor Department of Computer Engineering College of Engineering Rochester Institute of Technology Rochester, New York January, 1994 THESIS RELEASE PERMISSION FORM ROCHESTER INSTITUTE OF TECHNOLOGY COLLEGE OF ENGINEERING Title: Design and Implementation of an Asynchronous Version of the MIPS R3000 Microprocessor. I, Kevin Johnson, hereby deny permission to the Wallace Memorial Library of RIT to reproduce my thesis in whole or in part. Date: , II~-Ilf 'f ABSTRACT The purpose of this thesis is to show some of the advantages for the asynchronous implementation of a major synchronous structure. Three people were involved with the design of an asynchronous microprocessor modeled after the MIPS R3000 microprocessor. This microprocessor implemented all of the MIPS reduced instruction set, while eliminating the need for synchronous clocking throughout the chip. Paul Fanelli modeled the asynchronous processor using VHDL (hardware description language). Kevin Johnson created circuit level designs and layouts of the arithmetic logic unit and supporting hardware. Scott Siers created circuit level designs and layouts of the instruction fetch, write back, memory and instruction decode stages. Table Of Contents Abstract ii Table of Contents iii List of Figures iv List of Tables vi Glossary vii 1. Introduction 1 1.1 MIPSR3000 2 1.2 Pipeline 3 1.3 ALU stage 4 2. Registers 6 2.1 Data Dependencies 6 2.2 Register Design 13 3. Arithmetic Logic Unit 21 3.1 TTL Model 21 3.2 DataPath 28 3.3 Dual Purpose 28 3.4 Conditional Statements 29 3.5 Opcode Interfacing 32 4. Shifter Design 38 5 . Multiplier Divider in the Asynchronous Microprocessor 48 6. Bus Control 50 6.1 Inputs 50 6.2 Outputs 52 7. Exception Creation 57 8. Handshaking 61 8.1 Completion Signals 61 8.2 Handshaking Controller Circuit 67 9. Results 70 10. Conclusions 78 Bibliography 80 m List Of Figures Figure 1.1 Pipeline Stages 3 Figure 1.2 Top Level Design of the ALU stage 5 Figure 2.1 Simple Method For Adding Three Numbers 6 Figure 2.2 Possible Paths for the Valid Bits 9 Figure 2.3 Program Showing Data Dependency 9 Figure 2.4 Petri Net for the Data Dependency Problem 12 Figure 2.5 Two Phase Clocking Scheme..... 13 Figure 2.6 Bus Utilization Scheme 14 Figure 2.7 Example of a Load Word Left Instruction 16 Figure 2.8 Logic Diagram of One Register Bit 18 Figure 2.9 Layout of a One-Bit Register 19 Figure 2.10 Logic Diagram of One 32-Bit Register 20 Figure 3.1 Schematic of the Modified ALU 22 Figure 3.2 Equations for Intermediate Carry Signals 24 Figure 3.3 Look Ahead Circuit 26 Figure 3 .4 Schematic of a 32-bit Adder Using 4-bit Look Ahead Carry Adders 27 Figure 3.5 Sign Extension Circuit 30 Figure 3.6 Condition Analyzer 31 Figure 3.7 Condition Bit Generator 34 Figure 3.8 Instruction Interface Circuit 35 Figure 3.9 Bit Coding of the R3000 Instruction Set 36 Figure 4.1 Arithmetic and Logical Shifts 38 Figure 4.2 Basic Shifting Cell 39 Figure 4.3 Simple Connection For Shifting Cells 40 Figure 4.4 Circuit Diagram of A 3-1 Multiplexor 42 Figure 4.5 Layout of a 3-1 Multiplexor 43 Figure 4.6 Layout of a 2-1 Multiplexor 44 Figure 4.6 Example of a Driving Circuit 46 Figure 4.7 Analog Simulation of the Driving Circuit 47 Figure 6.1 Input Bus Control Block 51 Figure 6.2 A-Bus Selector 53 Figure 6.3 B-Bus Selector 54 Figure 6.4 Bus Selector Decoder 55 Figure 6.5 Output Selector 56 Figure 7.1 Example of Overflow Conditions 58 Figure 7.2 Schematic of the Overflow Logic 59 Figure 7.3 Layout of the Overflow Logic 60 IV Figure 8.1 DCVSL NOR Gate with Completion Signal 62 Figure 8.2 Paths of DCVSL Outputs 63 Figure 8.3 Comparison of DCVSL and Complementary CMOS Logic 65 Figure 8.4 Layout Comparison of DCVSL versus Complementary CMOS 66 Figure 8.5 Handshaking Controller Circuit 68 Figure 9.1 Simulation of Unsigned Addition 73 Figure 9.2 Simulation of a Branch Not Equal to Zero Instruction 74 Figure 9.3 Simulation of a Signed Instruction Causing an Exception 75 Figure 9.4 Simulation of a Shift Instruction 76 Figure 9.5 Simulation of a Move From HI Instruction 77 List of Tables Table 2.1 Incorrect Scheduling of Dependent Instructions 10 Table 9.1 Average Times of Various Operations 71 VI Glossary: ALU Arithmetic Logic Unit ALU Stage Part of the pipeline where all arithmetic operations takes place. Asynchronous Actions do not occur at specific time periods or when another action is scheduled to take place. CISC Complex instruction set computer. CMOS Complementary metal-oxide-semiconductor DCVSL Differential Cascade Voltage Switched Logic FPU Floating point unit. HCC Handshaking Controller Circuit. HI 32-bit register used to hold the high 32 bits in a multiply operation or the quotient in a divide operation. ID stage Instruction decode stage of the pipeline where the instruction is decoded into signals. IF stage Instruction fetch stage of the pipeline where the instruction is obtained from memory. LOW 32-bit register used to hold the low 32 bits in a multiply operation or the remainder in a divide operation. LWL Load word left. LWR Load word right. MEM stage Memory stage of the pipeline where all memory operations are performed. MIPS Company that makes microprocessors currently now a division of Sun Microsystems. Vll MIPS Millions of instructions per second. NMOS N-type metal-oxide-semiconductor. NOP Instruction that cuses nothing to happen. PC Program counter. PCL Signal name for the latched output of the program counter. R3000 A version of the MIPS architecture used in a particular 32-bit microprocessor. RISC Reduced instruction set computer. RIT Rochester Institute of Technology Synchronous Actions occur according to a deterministic schedule, e.g. Actions occur at specific phases of a clock pulse. TTL Transistor Transistor Logic VHDL VHSIC Hardware Description Language VHSIC Very High Speed Integrated Circuit VLSI Very Large Scale Integration WB stage Write back stage of the pipeline where the data is written into the registers. Vlll 1. Introduction The purpose of this project is to show some of the advantages of the asynchronous design methodology. The design should be large enough to allow the increase in speed caused by the asynchronous operation to be greater than the decrease in speed caused by the extra circuitry. A thirty-two bit microprocessor was chosen for this project. The asynchronous design methodology eliminates the need for clocking circuits by adding handshaking circuitry. Clock lines in a microprocessor need to be routed to every part of the chip. These lines cause a problem in the operation of the chip called clock skew. The clock lines, upon which the clock signal travels, spread out in many directions from the clock driver. This signal is supposed to change the state of many circuit elements throughout the chip simultaneously. Since these lines are not the same length, the clock pulses on one side of the chip may or may not be in phase with the pulses on the other side. In order to make sure that circuit elements on both sides of the chip have changed state using the same clock pulse, the period of the clock is extended. This makes sure that the clock pulse has had time to travel throughout the chip. The clock lines and the gates that are driven by the clock create a large capacitive load. This capacitance must be overcome in order for the clock signal to change state in a reasonable amount of time. This causes the clock driver to be necessarily large. The capacitance in the line must be charged in order for the voltage on the line to come to a true logical-one value. The capacitance also must be discharged in order for the value to return to the logical-zero state. The time needed to charge and discharge the large capacitance limits the maximum possible clock frequency. If the clock frequency is faster than the charging and discharging times, the clock pulses mid- will never reach their true values but will stay fluctuating around some level voltage. Within a large clocked circuit, the clock speed must be slow enough to allow the slowest part of the circuit to be able to complete its task before the beginning of the next clock cycle. In large designs, this minimum time between clock pulses becomes a problem. For example, in a microprocessor this minimum time is the time it takes for the slowest instruction to execute. This can be a long period of time because operations such as multiplication take longer than operations such as shifting.

Load more