ISSN 2322-0929 Vol.02,Issue.01, January-2014, Pages:0018-0025

ww.semargroup.org www.ijvdcs.org

Speed Optimized Implementation of 32 Bit RISC (MIPS) Architecture

KURAMANA HARIKA1, SOLOMON J V GOTHAM2
1PG Scholar, Dept of ECE, Kaushik College of Engineering, Vishakapatnam, Andhrapradesh, India, E-mail: [email protected].
2Asst Prof, Dept of ECE, Kaushik College of Engineering, Vishakapatnam, Andhrapradesh, India, E-mail: [email protected].

Abstract: The implementation of a 32-bit RISC microprocessor without interlocked pipeline stages (MIPS) is presented. It was implemented in VHDL so as to reduce the instruction set present in the programmable memory. As a result, the processor contains only the logic necessary for the implementation, which requires fewer gates to be synthesized in the programmable matrix and can increase the speed of the target processor with reduced memory. In this paper we propose a novel technique of run-time loading for the MIPS-32 soft-core processor. Implementing fewer instructions on silicon reduces the complexity of the instruction decoder, the addressing logic, and the execution unit. This allows the machine to be clocked at a faster speed, since less work needs to be done in each clock period. We used the Xilinx ISE tool for logical verification and for synthesis on the target technology, followed by placement and routing for system verification.

Keywords: RISC Processor, MIPS-32, VHDL, Xilinx-ISE.

Copyright @ 2013 SEMAR GROUPS TECHNICAL SOCIETY. All rights reserved.

I. INTRODUCTION
MIPS (originally an acronym for Microprocessor without Interlocked Pipeline Stages) is a reduced instruction set computing (RISC) instruction set architecture (ISA) developed by MIPS Computer Systems (now MIPS Technologies). MIPS RISC microprocessor architecture characteristics include:
• Fixed-length, straightforwardly decoded instruction format
• Memory accesses limited to load and store instructions
• Hardwired control unit
• A large general purpose register file
• All operations are done within the registers of the microprocessor.

II. MIPS versus RISC
In 1981, a team led by John L. Hennessy at Stanford University started work on what would become the first MIPS processor. The basic concept was to increase performance through the use of deep instruction pipelines. Pipelining as a basic technique was well known before (IBM 801 for instance), but not developed into its full potential. CPUs are built up from a number of dedicated sub-units such as instruction decoders, ALUs (integer arithmetic and logic), load/store units (handling memory), and so on. In a traditional non-optimized design, a particular instruction in a program sequence must be (almost) completed before the next can be issued for execution; in a pipelined architecture, successive instructions can instead overlap in execution. For instance, at the same time a math instruction is fed into the floating point unit, the load/store unit can fetch the next instruction. One major barrier to pipelining was that some instructions, like division, take longer to complete, and the CPU therefore has to wait before passing the next instruction into the pipeline. One solution to this problem is to use a series of interlocks that allow stages to indicate that they are busy, pausing the other stages upstream. A major aspect of the MIPS design was to fit every sub-phase, including cache access, of all instructions into one cycle, thereby removing any need for interlocking and permitting single-cycle throughput.

Although this design eliminated a number of useful instructions such as multiply and divide, it was felt that the overall performance of the system would be dramatically improved because the chips could run at much higher clock rates. This ramping of the speed would be difficult with interlocking involved, as the time needed to set up locks is as much a function of die size as clock rate. The elimination of these instructions became a contentious point. The other difference between the MIPS design and the competing Berkeley RISC involved the handling of subroutine calls. RISC used a technique called register windows to improve the performance of these very common tasks, but this limited the maximum depth of multi-level calls.

In other ways the MIPS design was very much a typical RISC design. To save bits in the instruction word, RISC designs reduce the number of instructions to encode. The MIPS design uses 6 bits of the 32-bit word for the basic opcode; the rest may contain a single 26-bit jump address, or it may have up to four 5-bit fields specifying up to three registers plus a shift value, combined with another 6 bits of opcode; another format, among several, specifies two registers combined with a 16-bit immediate value. This was one of the major performance improvements that RISC offered.
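The field layout described above can be illustrated with a short sketch. It assumes the standard MIPS32 bit positions (6-bit opcode in bits 31..26, 5-bit register and shift fields below it); the function name `decode_fields` and the sample word are illustrative and are not taken from the implemented design.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into the fields named in the text.

    Assumes the standard MIPS32 bit positions; every field is extracted
    regardless of format, and the caller picks the ones that apply.
    """
    word &= 0xFFFFFFFF
    return {
        "opcode": (word >> 26) & 0x3F,      # 6-bit basic opcode
        "rs": (word >> 21) & 0x1F,          # 5-bit register field
        "rt": (word >> 16) & 0x1F,          # 5-bit register field
        "rd": (word >> 11) & 0x1F,          # 5-bit register field
        "shamt": (word >> 6) & 0x1F,        # 5-bit shift value
        "funct": word & 0x3F,               # second 6 bits of opcode
        "immediate": word & 0xFFFF,         # 16-bit immediate value
        "jump_target": word & 0x03FFFFFF,   # 26-bit jump address
    }


# 0x00221820 is "add $3, $1, $2" in the standard MIPS32 encoding.
fields = decode_fields(0x00221820)
print(fields)
```

Only a subset of the returned fields is meaningful for any one instruction: the register-format fields for an opcode of zero, the immediate for the two-register format, and the 26-bit target for jumps.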

Figure1. Pipelined Datapath

III. THE MIPS INSTRUCTION SET ARCHITECTURE
MIPS is a RISC microprocessor architecture. The MIPS architecture defines thirty-two 32-bit general purpose registers (GPRs). Register $r0 is hard-wired and always contains the value zero. The CPU uses byte addressing, and word accesses must be aligned on a byte boundary divisible by four (0, 4, 8, …). MIPS has only three instruction types: I-type is used for the load and store instructions, R-type is used for arithmetic instructions, and J-type is used for the jump instructions. Table1 provides a description of each of the fields used in the three different instruction types.

MIPS is a load/store architecture, meaning that all operations are performed on operands held in the processor registers, and the main memory can only be accessed through the load and store instructions (e.g. lw, sw). A load instruction loads a value from memory into a register. A store instruction stores a value from a register to memory. The load and store instructions use the sum of the offset value in the address/immediate field and the base register in the $rs field to address the memory. Arithmetic instructions include ALU immediate instructions (e.g. addi), which are encoded in the I-type format, and the R-type three-operand (e.g. add, and, slt) and shift instructions (e.g. sll, srl). The J-type instructions are used for jump instructions (e.g. j). Branch instructions (e.g. beq, bne) are I-type instructions which add the offset value in the address/immediate field to the program counter (PC) to compute the branch target address; this is considered PC-relative addressing.

Figure2. Summary of the MIPS Instruction Types: Immediate (I-Type), Jump (J-Type), and Register (R-Type) CPU Instruction Formats.
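The two address calculations described in this section can be sketched as follows. The sketch assumes standard MIPS32 conventions that the text does not spell out: the 16-bit field is sign-extended, and a branch offset counts words relative to the instruction following the branch. The function names are illustrative.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """Load/store address: base register ($rs) plus the immediate offset."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


def branch_target(pc, offset16):
    """PC-relative branch target (assumed MIPS32 rule: PC + 4 + offset * 4)."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


def is_word_aligned(address):
    """Word accesses must fall on a byte address divisible by four."""
    return address % 4 == 0


print(hex(effective_address(0x1000, 0xFFFC)))  # base 0x1000 with offset -4
print(hex(branch_target(0x100, 3)))            # three words past the next instruction
```

A design that measures the offset from the branch instruction itself, as the wording "from the current address" may suggest, would drop the `+ 4` term.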

International Journal of VLSI System Design and Communication Systems Volume.02, IssueNo.01, January-2014, Pages:0018-0025

Table1. MIPS Instruction Fields

A. ECOMIPS Architecture
ECOMIPS is a compact 32-bit MIPS CPU module to be embedded in an FPGA chip. By taking advantage of modern FPGA chips, the system itself consumes very few chip resources and leaves abundant room for implementing other specialized control and processing modules. Some of these resources are general purpose ones and are used by almost all modules in the chip. However, some resources, like multipliers, wide multiplexers, block RAM and digital clock managers (DCMs, with high-resolution phase shifting), are not extensively used by most computational modules. When a specialized FPGA chip is at work, many of its dedicated resources are left unused. ECOMIPS takes advantage of those often unused resources and relieves contention for the other logic cells on the chip.

In a CPU, the two modules that consume the most gate resources are the Arithmetic Logic Unit (ALU) and the sequencing control unit (or micro program controller). For speed reasons, most CPU systems derive the sequencing control unit from well designed finite state machines. Another approach is to code all sequencing logic into a micro-program and store it in a RAM/ROM block. Both designs have their own advantages over the other. ECOMIPS uses the RAM based micro program approach for the following reasons:
• Block RAM is a rich resource in modern FPGA chips.
• It is much easier to design new CPU functions, such as adding an application specific instruction set and a custom data path.
• The complexity of the sequencing circuits does not increase with the complexity of the sequencing logic.
• By turning sequencing logic into software (micro program), it is much easier to do a kernel context-switch or update the sequencing logic at both design time and run-time.

The micro program storage unit is implemented as a block RAM primitive in ECOMIPS. If the microinstruction width is 18 bits, a single block RAM is able to accommodate 256 micro-instructions on a Xilinx Spartan 3 chip. In most cases, one block RAM will be more than sufficient for storing the entire micro program. So the sacrifice of one block RAM can replace all the sequencing logic, which would otherwise use up a large amount of gate resources in the FPGA. The microinstruction in ECOMIPS maps directly to control signals in the data path. Therefore, no translation circuit is needed.

However, reading one word from a block RAM is not as fast as reading it from a combinatorial circuit, and the maximum system clock rate of ECOMIPS is almost linearly dependent on the speed of the block RAM in the FPGA. Fortunately, RAM access has been made very fast in modern FPGA chips, currently less than 15 nanoseconds. If the clocking scheme is properly designed, the maximum CPU clock rate can still be as high as 64 MHz.

Every MIPS instruction is 32 bits, i.e. one word, in length, and every instruction which does not access memory executes in one cycle. Those that do access memory are assumed to "usually" take two cycles, meaning that this will be the case if the desired memory access results in a cache hit; otherwise, the CPU enters a stall mode until the memory access is satisfied.

B. Design Implementable Architecture
A 32-bit MIPS CPU module to be embedded in an FPGA chip is shown in Figure3.

The features of the 32-bit MIPS processor that we implemented are:
1. Arithmetic and Logic Unit (ALU), ALU Control, ALU Control Mux, Branch Adder, PC Control Mux, Register Control Mux, Register File, Left Extender, Right Extender, Control Unit, Data Memory, Left Shift Register, Program Counter, PC Adder, Set Control Mux, and Write Control Mux are identified as individual modules in the design.
2. The design aspects taken into consideration are:
• 32-bit data
• 32-bit address
• 32-bit instruction format length
• 5-bit opcode
• 32-bit registers
• No interrupts
• No conditional branches
• 15 instructions in instruction set
• Register addressing and memory addressing formats
• Data: unsigned, signed integer type
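The RAM based micro program approach of Section A, in which each microinstruction bit drives one control signal directly, can be sketched as below. The 18-bit width and the depth of 256 come from the text; the signal names, their bit positions and the three sample microinstructions are hypothetical and chosen only for illustration.

```python
MICRO_WIDTH = 18    # microinstruction width in bits (from the text)
MICRO_DEPTH = 256   # microinstructions held by one block RAM (from the text)

# Hypothetical layout: bit position -> control signal in the data path.
CONTROL_BITS = {
    0: "reg_write",
    1: "mem_read",
    2: "mem_write",
    3: "alu_src_imm",
    4: "pc_write",
}


def make_microword(*signals):
    """Build a microinstruction word with the named control signals asserted."""
    positions = {name: bit for bit, name in CONTROL_BITS.items()}
    word = 0
    for name in signals:
        word |= 1 << positions[name]
    return word


def control_signals(word):
    """Read control signals straight off the bits; no translation step is needed."""
    if not 0 <= word < (1 << MICRO_WIDTH):
        raise ValueError("microinstruction does not fit in MICRO_WIDTH bits")
    return {name: bool((word >> bit) & 1) for bit, name in CONTROL_BITS.items()}


# The micro program store modelled as a list standing in for the block RAM.
microprogram = [0] * MICRO_DEPTH
microprogram[0] = make_microword("pc_write")
microprogram[1] = make_microword("mem_read", "alu_src_imm")
microprogram[2] = make_microword("reg_write")

storage_bits = MICRO_WIDTH * MICRO_DEPTH
print(control_signals(microprogram[1]), storage_bits)
```

Changing the sequencing behaviour then amounts to rewriting entries of `microprogram`, which mirrors the claim that sequencing logic can be updated at design time or run-time without touching the data path.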

Figure3. Architecture of 32-bit MIPS RISC processor.


The architecture consists of the following modules:

Arithmetic and Logic Unit, ALU Control, ALU Control Mux, Branch Adder, PC Control Mux, Register Control Mux, Register File, Left Extender, Right Extender, Control Unit, Left Shift Register, Program Counter, PC Adder, Set Control Mux, Write Control Mux, and Data Memory.

IV. RESULTS and DISCUSSION
A. PROCESSOR SCHEMATIC:

Figure4. Properties of processor

B. PROCESSOR RTL VIEW

Figure5. RTL Schematic view of processor

C. SYNTHESIS REPORT:
Total power consumed: 40.95 mW
Time delay: 32.867 ns
Total memory usage: 248984 kilobytes

D. SIMULATION OF MIPS INSTRUCTION TYPES
1. Immediate Type Arithmetic Addition: Mnemonic: addi $r1, $r2, 32, i.e. $r1 = $r2 + 32.

Figure6. Arithmetic Addition
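As an illustration of the immediate addition, the sketch below encodes addi $r1, $r2, 32 and applies it to a small register file. It assumes the standard MIPS32 I-type layout with a 6-bit opcode of 001000 for addi; the implemented processor may use a different encoding, so the resulting bit pattern should not be read as the instruction word used in the simulation. The starting value of $r2 is arbitrary.

```python
ADDI_OPCODE = 0b001000  # standard MIPS32 opcode for addi (assumption)


def encode_i_type(opcode, rs, rt, immediate):
    """Pack an I-type word: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def execute_addi(registers, rs, rt, immediate):
    """Apply rt = rs + immediate on a 32-entry register file; register 0 stays zero."""
    if rt != 0:
        registers[rt] = (registers[rs] + immediate) & 0xFFFFFFFF
    return registers


# addi $r1, $r2, 32: destination rt = 1, source rs = 2
word = encode_i_type(ADDI_OPCODE, rs=2, rt=1, immediate=32)
bits = format(word, "032b")

registers = [0] * 32
registers[2] = 10           # arbitrary starting value for $r2
execute_addi(registers, rs=2, rt=1, immediate=32)
print(bits, registers[1])
```

The sketch ignores the overflow trap that a full MIPS32 addi would raise and simply wraps the result to 32 bits.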


2. Jump-Type:
i. Unconditional Jump: Mnemonic: j 44
The jump operation jumps to the target address, as shown in the waveform in Figure7.

Figure7. Unconditional Jump
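The unconditional jump can be sketched in the same way. The example assumes the standard MIPS32 J-type rule, in which the 26-bit field holds the byte address divided by four and the upper four bits of the target come from the address of the next instruction, and it assumes that 44 in the mnemonic is a byte address. Neither assumption is confirmed by the text.

```python
J_OPCODE = 0b000010  # standard MIPS32 opcode for j (assumption)


def encode_j(byte_address):
    """Pack a J-type word: opcode | 26-bit word index of the target."""
    return (J_OPCODE << 26) | ((byte_address >> 2) & 0x03FFFFFF)


def jump_target(pc, word):
    """Rebuild the target: upper 4 bits of PC + 4, then the 26-bit field times four."""
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)


word = encode_j(44)
print(hex(word), jump_target(0, word))
```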

2. Conditional jump
i. Jump if Zero (condition satisfied): Mnemonic: JZ 24
If the result of the previous instruction is zero, the processor jumps to address 24, as shown in the waveform in Figure8.

Figure8. Jump if Zero (condition satisfied)


ii. Jump if Zero (condition not satisfied):

Figure9. Jump if Zero (condition not satisfied)

From Figure9, the result of the previous instruction is non-zero (condition not satisfied), so the jump is not taken.
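The two Jump if Zero cases can be summarised with a minimal next-PC sketch. It assumes that the condition is simply whether the previous result equals zero and that sequential execution advances the PC by four bytes; the function name and the sample PC value are illustrative.

```python
def next_pc_jz(pc, target, previous_result, step=4):
    """Return the next PC for 'JZ target' given the previous instruction's result.

    step is the assumed instruction size in bytes (one 32-bit word).
    """
    if previous_result == 0:
        return target        # condition satisfied: jump is taken
    return pc + step         # condition not satisfied: fall through


taken = next_pc_jz(pc=8, target=24, previous_result=0)
not_taken = next_pc_jz(pc=8, target=24, previous_result=5)
print(taken, not_taken)
```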

3. Register-Type:
i. Complement Condition: Mnemonic: NOT1 $r1 $r2 $r3, i.e. $r1 = complement of $r2
Here the data, i.e. 00000000000000000000000000000010, is complemented as 11111111111111111111111111111101, as shown in Figure10.

Figure10. Complement Condition
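The complement result can be checked with a one-line bitwise operation masked to 32 bits. The function name is illustrative; the input is the 32-bit pattern quoted in the text.

```python
def complement32(value):
    """Bitwise complement restricted to a 32-bit register width."""
    return ~value & 0xFFFFFFFF


data = 0b00000000000000000000000000000010
result = format(complement32(data), "032b")
print(result)
```

The mask is needed because Python integers are unbounded, so a bare `~value` would yield a negative number rather than a 32-bit pattern.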

V. CONCLUSION
Modern FPGA chips have provided developers with rich resources for embedding a CPU module in the original application specific circuit. ECOMIPS is designed to take full advantage of chip resources. It explores how a general reconfigurable MIPS design can be implemented on modern chip families with minimum resource consumption. We have included many design details and principles in this paper, hoping to provide a useful guide for implementing integrated designs in real time applications. The advantage of the project is that the IP core can be used anywhere; for example, this architecture is generally used in RISC processors. So we can develop our own RISC microprocessor and also develop different RISC microprocessors as per the client requirements. We can also easily develop an application specific integrated circuit (ASIC). So the area and power consumption are reduced. The hardware of the design will be reduced, and in turn the cost will be reduced. So it is called ECONOMICAL MIPS (ECO MIPS). The project can be extended up to 64-bit sizes as per the client requirements. We can also increase the lengths of the instruction set and memory unit. Using this IP core we can easily develop the MIPS application specific integrated circuit (ASIC).

Author's Profile:
KURAMANA HARIKA has completed B.Tech (E.C.E) from Gayatri Vidya Parishad College of Engineering and is pursuing M.Tech at Kaushik College of Engineering, affiliated to JNTUK, Andhra Pradesh, India. Her main research interests include Electronics, Embedded and VLSI Systems.

SOLOMON JV GOTHAM did B.Tech from VR Siddhartha College of Engineering, affiliated to Acharya Nagarjuna University, Vijayawada, and M.Tech from GITAM College of Engineering, affiliated to Andhra University, Andhra Pradesh, India. His main research interests include Communications, Signal Processing, VLSI and Biomedical Signal Processing. He has 22 years of experience.