Introduction to the MIPS Processor
Introduction

This tutorial was written by Edsko De Vries and makes use of the Vivio animation of a DLX/MIPS processor.

The processor we will be considering in this tutorial is the MIPS processor. The MIPS processor, designed in 1984 by researchers at Stanford University, is a RISC (Reduced Instruction Set Computer) processor. Compared with their CISC (Complex Instruction Set Computer) counterparts (such as the Intel Pentium processors), RISC processors typically support fewer and much simpler instructions.

The premise is, however, that a RISC processor can be made much faster than a CISC processor because of its simpler design. These days, it is generally accepted that RISC processors are more efficient than CISC processors; and even the only popular CISC processor that is still around (Intel Pentium) internally translates the CISC instructions into RISC instructions before they are executed [1].

RISC processors typically have a load store architecture. This means there are two instructions for accessing memory: a load (l) instruction to load data from memory and a store (s) instruction to write data to memory. It also means that none of the other instructions can access memory directly. So, an instruction like "add this byte from memory to register 1" from a CISC instruction set would need two instructions in a load store architecture: "load this byte from memory into register 2" and "add register 2 to register 1".

CISC processors also offer many different addressing modes. Consider the following instruction from the Intel 80x86 instruction set (with simplified register names):

    add r1, [r2+r3*4+60]    // i86 (not MIPS) example

This instruction loads a value from memory and adds it to a register. The memory location is given in between the square brackets. As you can see, the Intel instruction set, as is typical for CISC architectures, allows for very complicated expressions for address calculations or "addressing modes".
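The address expression in the Intel example above (a base register, plus an index register times a scale, plus a constant) can be sketched in Python. This is only a minimal model under stated assumptions: registers and memory are plain dictionaries, and the names effective_address and cisc_add are hypothetical helpers invented for illustration, not part of any real instruction set or library.

```python
# Hypothetical model: registers and memory are plain dicts (illustrative only).

def effective_address(regs, base, index, scale, disp):
    """Address of a [base + index*scale + disp] operand."""
    return regs[base] + regs[index] * scale + disp


def cisc_add(regs, memory, dest, base, index, scale, disp):
    """Model of `add dest, [base+index*scale+disp]`.

    The memory read and the addition happen inside one instruction,
    which a load store architecture does not allow.
    """
    address = effective_address(regs, base, index, scale, disp)
    regs[dest] += memory[address]


regs = {"r1": 5, "r2": 100, "r3": 3}
memory = {172: 37}  # 100 + 3*4 + 60 = 172

cisc_add(regs, memory, "r1", "r2", "r3", 4, 60)
print(regs["r1"])  # 42
```

The sketch makes the cost visible: one CISC instruction hides a multiplication, two additions, a memory read, and the final add.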
The MIPS processor, on the other hand, only allows for one, rather simple, addressing mode: to specify an address, you can specify a constant and a register. So, the above Intel instruction could be translated as:

    slli r3, r3, 2     // r3 := r3 << 2 (i.e. r3 := r3 * 4)
    add  r2, r2, r3    // r2 := r2 + r3
    l    r4, 60(r2)    // r4 := memory[60+r2]
    add  r1, r1, r4    // r1 := r1 + r4

We need four instructions instead of one, and an extra register (r4), to do what can be done with one instruction in a CISC architecture. The internal circuitry of the RISC processor is much simpler, however, and can thus be made very fast. How this is done is the topic of this tutorial.

Basic Processor Architecture

The execution of an instruction in a processor can be split up into a number of stages. How many stages there are, and the purpose of each stage, is different for each processor design. Examples include 2 stages (Instruction Fetch / Instruction Execute) and 3 stages (Instruction Fetch, Instruction Decode, Instruction Execute). The MIPS processor has 5 stages:

IF  The Instruction Fetch stage fetches the next instruction from memory using the address in the PC (Program Counter) register and stores this instruction in the IR (Instruction Register).

ID  The Instruction Decode stage decodes the instruction in the IR, calculates the next PC, and reads any operands required from the register file.

EX  The Execute stage "executes" the instruction. In fact, all ALU operations are done in this stage. (The ALU is the Arithmetic and Logic Unit and performs operations such as addition, subtraction, shifts left and right, etc.)

MA  The Memory Access stage performs any memory access required by the current instruction. So, for loads, it would load an operand from memory. For stores, it would store an operand into memory. For all other instructions, it would do nothing.

WB  For instructions that have a result (a destination register), the Write Back stage writes this result back to the register file.
Note that this includes nearly all instructions, except nops (a nop, no-op or no-operation instruction simply does nothing) and s (stores).

Consider the execution of the following program and try to work out the operation of each of the 5 processor stages (IF, ID, EX, MA and WB) for each individual instruction. (Note that register r0 in the MIPS processor is always 0.)

    l   r1, 0(r0)      // r1 := memory[0]
    l   r2, 1(r0)      // r2 := memory[1]
    add r3, r1, r2     // r3 := r1 + r2
    s   2(r0), r3      // memory[2] := r3

Try it out. Follow the execution of this program in the MIPS animation and check if you were right. If you don't understand each and every bit of the diagram yet, don't despair - each stage will be explained separately and in detail in the next section.

Details of the Processor Stages

As I said before, the Instruction Fetch phase fetches the next instruction. First, it sends the contents of the PC register, which contains the address for the next instruction, to the instruction memory (1). The instruction memory will then respond by sending the correct instruction. This instruction is sent on to the next (instruction decode) phase (2).

The instruction decode phase will calculate the next PC and will send it back to the IF phase (4) so that the IF phase knows which instruction to fetch next. To be able to execute a jr instruction (which changes the PC to the contents of a register), we also need a connection from the register file to the PC (5). One of these (4 or 5) is then selected (MUX1) to update the PC on the next rising clock edge. The control line for MUX1 (not shown on the diagram) would also come from the ID phase, and is based on the instruction type.

The Instruction Decode phase then has two main tasks: calculate the next PC and fetch the operands for the current instruction. There are three possibilities for the next PC.
In the most common case, for all "normal" instructions (meaning instructions that are not branches or jumps), we simply calculate PC+4 to get the address of the next instruction (3, ADD4). For jumps and branches (if the branch is taken), we might also have to add some immediate value (the branch offset) to the PC (ADDi). This branch offset is encoded in the instruction itself (6). For the jr and jalr instructions, we need to use the value of a register instead.

MUX3 selects between the first two possibilities (+4 or +immediate), and MUX1 (in the IF phase) selects between MUX3 and the third possibility (value from the register file). The control line for MUX3 comes from a zero-detector (not shown), which in turn gets its input from the register file (thin line marked "a"). The control line for MUX1 is not shown and is based on the instruction type.

The second main task is to fetch the operands for the current instruction. These operands come from the register file for all but two instructions (7, 8). The two instructions that are the exception are jal and jalr. These two jump instructions save the address of the next instruction in a destination register, so instead of sending an operand from the register file, we need to send the contents of PC+4 (10). MUX5 selects between (7) and (10). Again, the control line for this MUX (not shown) would also come from the ID phase, and is based on the instruction type.

The Execution phase "executes" the instruction. Any calculations necessary are done in this phase. These calculations are all done by the ALU (Arithmetic and Logic Unit). The internals of this unit are beyond the scope of this tutorial, but you should find an explanation of this unit in any elementary book on digital logic; see for example [2].

Conceptually this stage is very simple. The ALU needs two operands.
These either come from the ID phase (8, 9) and thus in turn from the register file (or PC+4 for jump and link instructions), or one operand comes from the ID phase (8) and the other comes from the instruction register (6) to supply an immediate value. MUX7 selects between (9) and (6). The control line for this MUX is not shown, but would come from the ID phase and is based on the instruction type (it will select (9) for instructions that operate on two registers, and (6) for instructions that work on one register and one immediate value).

(11) is used for the s (store) instruction. Consider the following instruction:

    s 10(r0), r2       // memory[10+r0] := r2

The calculation 10+r0 will be done by the ALU (so which source does MUX7 select?), and the result of this calculation will be passed on to the MA stage (12). In the MA phase, however, we will also need to know the contents of r2. This is why we need (11).

The purpose of the Memory Access phase is to store operands into memory or load operands from memory. So, for all instructions except loads and stores, the MA simply passes the value from the ALU on to the WB stage (12, 16). For loads and stores, the MA needs to send the effective address calculated by the ALU to memory (13). For loads, the memory will then respond by sending the requested data back (15). For stores, we need to send the data to be stored along with the effective address (11, 14), and the memory will respond by updating itself. Finally, MUX9 will select between (12) and (15) for the input for (16). As usual, this decision would be based on the instruction type and the control line would come from the ID phase (not shown).
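To tie the five stages together, here is a rough Python sketch that steps the four-instruction example program (two loads, an add, a store) through IF, ID, EX, MA and WB one instruction at a time. It is not the Vivio animation and it does not model pipelining, the numbered data paths, or the multiplexers. The tuple encoding of instructions, the function name run, and the initial memory contents are all assumptions made for illustration.

```python
# Assumed encoding (not from the tutorial): each instruction is a tuple.
#   ("l",   rd, imm, rs)   rd := memory[imm + rs]
#   ("s",   imm, rs, rt)   memory[imm + rs] := rt
#   ("add", rd, rs, rt)    rd := rs + rt
PROGRAM = [
    ("l", 1, 0, 0),     # l   r1, 0(r0)
    ("l", 2, 1, 0),     # l   r2, 1(r0)
    ("add", 3, 1, 2),   # add r3, r1, r2
    ("s", 2, 0, 3),     # s   2(r0), r3
]


def run(program, memory):
    """Execute the program one instruction at a time (no pipelining)."""
    regs = [0] * 32
    pc = 0
    while pc // 4 < len(program):
        # IF: fetch the instruction addressed by the PC.
        ir = program[pc // 4]

        # ID: decode, compute the next PC, read the register operands.
        op = ir[0]
        next_pc = pc + 4
        if op == "l":
            dest, a, b, store_value = ir[1], regs[ir[3]], ir[2], None
        elif op == "s":
            dest, a, b, store_value = None, regs[ir[2]], ir[1], regs[ir[3]]
        elif op == "add":
            dest, a, b, store_value = ir[1], regs[ir[2]], regs[ir[3]], None
        else:
            raise ValueError(f"unsupported instruction: {op}")

        # EX: the ALU adds (an effective address for l/s, a sum for add).
        alu_out = a + b

        # MA: loads read memory, stores write it, others pass the ALU value on.
        if op == "l":
            result = memory[alu_out]
        elif op == "s":
            memory[alu_out] = store_value
            result = None
        else:
            result = alu_out

        # WB: write the result back, keeping r0 fixed at 0.
        if dest is not None and dest != 0:
            regs[dest] = result

        pc = next_pc
    return regs


memory = {0: 20, 1: 22}
regs = run(PROGRAM, memory)
print(regs[1:4], memory[2])  # [20, 22, 42] 42
```

Note how the store carries its data value (store_value) past the ALU to the MA step, which is the role of connection (11) in the text, and how the store and any write to r0 skip the WB step.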