Design and Development MIPS Processor Based on a High Performance and Low Power Architecture on FPGA
Total Page:16
File Type:pdf, Size:1020Kb
I.J.Modern Education and Computer Science, 2013, 5, 49-59 Published Online June 2013 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ ijmecs.2013.05.06 Design and Development MIPS Processor Based on a High Performance and Low Power Architecture on FPGA Tina Daghooghi Shahid Chamran University, Ahvaz, Iran E-ma il: [email protected] Abstract— This paper presents the design and architecture must contain all of this state. Based on the development of a high performance and low power current architectural state, the processor executes a MIPS microprocessor and implementation on FPGA. In particular instruction with a particular set of data to this method we for achieving high performance and low produce a new architectural state. Some power in the operation of the proposed microprocessor microarchitectures contain additional nonarchitectural use different methods including, unfolding state to either simplify the logic or improve performance transformation (parallel processing), C-slow retiming [2]. The IBM System z10™ microprocessor is currently technique, and double edge registers are used to get even the fastest running 64-bit CISC (co mple x instruction set reduce power consumption. Also others blocks designed computer) microprocessor. This microprocessor operates based on high speed digital circuits. Because of at 4.4 GHz and provides up to two times performance feedback loop in the proposed architecture C-slo w improvement compared with its predecessor, the System retiming can enhance designs that contain feedback z9® microprocessor. In addition to its ultrahigh- loops. The C-slow retiming is well-known for frequency pipeline, the z10™ microprocessor offers optimization and high performance technique, it such performance enhancements as a sophisticated automatically rebalances the registers in the proposed branch-prediction structure, a large second-level private design. The proposed high performance microprocessor cache, a data-prefetch engine, and a hardwired decimal is modeled and verified using FPGA and simulation floating-point arithmetic unit. The z10 microprocessor results. The proposed methods in microprocessor have also implements new architectural features that allow been successfully synthesized and implemented in better software optimization across compiled Quartus II 9.1 and Stratix II FPGA, to target device applications. These features include new instructions EP2S15F484C3, and power is analyzed with Xpower that help shorten the code path lengths and new facilities analyzer. Results demonstrate that the proposed method for software-directed cache management and the use of has high performance. 1-M B v irtual pages [3]. In recent years, a number of researches have been Index Terms— High Performance, Lo w Power, FPGA , proposed for Implementation of microprocessor by using double edge register, Combinational Logic. the FPGA by some companies [4-11]. In continuing we present some researches, The Aviation Microelectronic Center of NPU (Northwestern Polytechnical University) I. INTRODUCTION has recently completed the development of a 32-bit super-scalar RISC microprocessor, called "Longtium" Microcontrollers and microprocessors are used in R2. In [4], the architecture of "Longtium" R2 is everyday systems. Basically, any electronic systems that presented. As explained in [5], the PowerPC 603e require computation or instruction execution require a microprocessor is a high performance, low cost, low microcontroller or microprocessor. Therefore, at the core power microprocessor designed for use in portable of electronic systems with computational capability is a computers. The 603e is an enhanced version of the microprocessor. Microprocessors have grown from 8 PowerPC 603 microprocessor and extends the bits to 16 bits, 32 bits, and currently to 64 bits. performance range of the PowerPC microprocessor Microprocessor architecture has also grown from family of portable products. The enhancements include complex instruction set computing (CISC) based on increasing the frequency to 100 MHZ doubling the on- reduced instruction set computing (RISC) based on a chip instruction and data caches to 16K bytes each, combination of RISC-CISC based and currently very increasing the cache associativity to 4-way set- long instruction word (VLIW) based [1]. So me associative, adding an extra integer unit, and increasing microprocessors are optimized for high-performance the throughput of stores and misaligned accesses. Three servers, whereas others are optimized for long battery new bus modes are added to allow for more flexibility in life in laptop computers. A computer architecture is system design. The estimated performance of the 603e at defined by its instruction set and architectural state. The 100 MHz is 120 SPECint92 and 105 SPECfp92. In [6], a architectural state of the MIPS processor consists of the new microprocessor design framework, called DOTTA program counter and the 32 registers. Any MIPS micro Copyright © 2013 M ECS I.J. Modern Education and Computer Science, 2013, 5, 49-59 50 Design and Development MIPS Processor Based on a High Performance and Low Power Architecture on FPGA (dynamic operation transport triggered array) is way on FPGA. We applied several methods to introduced. An FPGA (field programmable gate array) microarchitecture the performances and the implementation of DOTTA is presented. The aim of this effectiveness of the proposed methods of the throughput new processor framework is to eliminate bottlenecks rate and hardware cost for the proposed structure are introduced by traditional microprocessor architectures. given. This paper is organized as follows. An overview The framework defines a task-specific microprocessor of the MIPS processor will be given in section II. In that is application customisable on the target system it is section III description of the proposed methods is operating on; this has been achieved by using presented. A comparison of performance, power Xilinxtrade FPGA fabric. In [7], a 8-b it CISC consumption, chip utilization that verifies the microprocessor core development which is intended as performance of the proposed work is focus of section IV. an open core for teaching applications in the digital In section V conclusion of the paper is presented. systems laboratory. The core is fully open and therefore, the user can have access to all internal signals as well as the opportunity to make changes to the structure itself II. MIPS PROCESSOR which is very useful when lecturing microprocessor design. The main advantages of the present core, MIPS is a 32-bit architecture, so its a 32-bit data path. The control unit receives the current instruction from the compared with commercially available equivalent cores, data path and tells the datapath how to execute that are not being vendor sensitive allowing its implementation in almost any FPGA family and being instruction. Specifically, the control unit produces an open core, that can be fully monitored and modified multiplexer select, register enable, and memory writes signals to control the operation of the data path. The to fit specific design constraints. Several tests were program counter is an ordinary 32-bit register. Its output, performed to the microprocessor core, including an ’ embedded microcontroller with RAM, ROM and I/O PC, points to the current instruction. Its input, PC , indicates the address of the next instruction. The capabilities. The present development includes a meta- instruction memory has a single read port. It takes a 32- assembler and linker to embed user programs in a ROM, which is automatically generated as a VHDL description. bit instruction address input, A, and reads the 32-bit data (i.e., instruction) from that address onto the read data In [8], a new methodology based on practical sessions × with real devices and chips is proposed. Simple designs output, RD. The 32-element 32-bit register file has of microprocessors are exposed to the students at the two read ports and one write port. The read ports take 5- bit address inputs, A1 and A2, each specifying one of beginning, raising the complexity gradually toward a 5 final design with a multiprocessor integrated in a single 2 =32 reg isters as source operands. They read the 32-bit FPGA chip. In [9], the implementation and delivery of a register values onto read data outputs RD1 and RD2, microprocessor based on laboratory design is presented respectively. The write port takes a 5-bit address input, design laboratory, in an attempt to achieve tighter A3; a 32-bit write data input, WD; a write enable input, integration with theory and improve student's WE3; and a clock. If the write enable is 1, the register performance. The design process follows a hierarchical file writes the data into the specified register on the structure, requiring students to first build basic devices rising edge of the clock. The data memory has a single such as logic gates, multiplexers, one-bit me mo ry cells read/write port. If the write enable, WE, is 1, it writes etc. These basic devices are then used to build an ALU, data WD into address A on the rising edge of the clock. registers, (registers are used to build larger memories), a If the write enable is 0, it reads address A onto RD. The data path and a control unit. Designs are completed instruction memory, register file, and data memory are without any high level programming ensuring that all read comb inationally. In other words, if the address students cannot rely on the compiler to transform changes, the new data appears at RD after some specifications into imp le mentations. In [10], several case propagation delay; no clock is involved. They are studies that examine the effects of various embedded written only on the rising edge of the clock. In this processor memory strategies and peripheral sets. fashion, the state of the system is changed only at the Comparing the benchmark system to a real-world clock edge. The address, data, and write enable must system, the study examines techniques for optimizing setup sometime before the clock edge and must remain the performance and cost of an FPGA embedded stable until a hold time after the clock edge.