Computer Architecture TDTS10

Erik Larsson
Department of Computer Science
Linköping University, Sweden

Outline
- Internal Structure of the CPU
- Instruction Pipelining

[Figure: block diagram of a computer. The CPU is connected to an input device, main memory, an output device and secondary memory; the arrows are buses carrying data and control.]

Program execution
0000101110001011 = MOVE Y,R3
(1) Get the instruction at address 00001000.
(2) Move the instruction 0000101110001011 to the CPU.
(3) Decode the instruction: 00001 = MOVE, 01110001 = address, 011 = Reg3.
(4) Get the data at address 01110001.
(5) Store the data in register number 3.

Registers
- Program Counter (PC): holds the address of the instruction to be fetched.
- Instruction Register (IR): holds the last instruction fetched.
- Memory Address Register (MAR): holds the address of a memory location that is to be read or written.
- Memory Buffer Register (MBR): holds the data to be written to memory or the data most recently read.
- Program Status Word (PSW): condition code flags plus other bits defining the status of the CPU (interrupt enabled/disabled, supervisor, etc.).

Registers
- Some architectures provide a set of registers which can be used without restrictions as operands for any opcode and as address registers; these are the so-called general-purpose registers.
- Often the architecture creates a separation between:
  - data registers: can be used to hold only data. Some architectures impose restrictions on the use of data registers: for example, there can be disjoint sets of registers for integer and for floating-point computation.
  - address registers: registers used only for address representation and computation: base registers, index registers, stack pointer, etc. In some architectures address registers can be specialized for some of the previous functions.

Registers
- The registers in the CPU are the top level of the memory hierarchy.
- User-visible registers: can be accessed by assembly language programmers.
- Control and status registers: used by the control unit to control the operation of the CPU; not directly accessible by the programmer.
- Many general-purpose registers -> large number of bits for encoding register operands; specialization of registers reduces this need.
- Too few registers create problems for the programmer and lead to increased memory traffic.
- The number of general-purpose or data registers is often between 8 and 32.
- RISC processors often have a very large number of registers (~ 100).

Some Examples of Register Organizations
- Z8000: 16 general-purpose registers; no restrictions in use.
- Intel 80X86, Pentium:
  - 4 data registers
  - 4 index and address registers
  - 4 base (segment) registers
  - Some of the address registers can also be used for general purposes.
- PowerPC:
  - 2 groups of general-purpose registers, each of 32 registers; one group is for integer (fixed-point) computation, the other one for floating-point computation.

Instruction cycle
FI: Fetch Instruction
EI: Execute Instruction

  Instr. 1   Instr. 2   Instr. 3
  FI EI      FI EI      FI EI

Each instruction takes time T; the total time for 3 instructions is 3*T.
[Figure: CPU and main memory.]

Pipelining
Without pipelining (3*T):
  FI EI FI EI FI EI
With pipelining (2*T):
  Instr. 1   FI EI
  Instr. 2      FI EI
  Instr. 3         FI EI
[Figure: CPU and main memory.]

Pipelining
FI: Fetch Instruction
DI: Decode Instruction
CO: Calculate Operand
FO: Fetch Operand
EI: Execute Instruction
WO: Write Operand

[Figure: six-stage pipeline diagrams. Each instruction passes through FI DI CO FO EI WO, and successive instructions overlap in the pipeline.]

Pipelining
- Instruction execution is complex and several operations are executed successively.
- This implies much hardware, but only one part of the hardware works at a given moment.
- In pipelining, instructions are overlapped in execution, but:
  - there is no additional hardware
  - different parts of the hardware work for different instructions
- The pipeline is similar to an assembly line:
  - the work of an instruction is broken into smaller steps
  - each step is a pipe stage
- The time required for moving an instruction from one stage to the next is a machine cycle (often this is one clock cycle). The execution of one instruction takes several machine cycles as it passes through the pipeline.

Pipelining
- After N-1 instructions, all N stages are working: the pipeline now provides maximal parallelism.
- Many stages provide better performance.
- However, many stages:
  - increase the overhead
  - increase CPU complexity
  - make it difficult to keep the pipeline full
- 80486 and Pentium:
  - five-stage pipeline for integer instructions
  - eight-stage pipeline for FP instructions
- PowerPC:
  - four-stage pipeline for integer instructions
  - six-stage pipeline for FP instructions

Program execution
0000101110001011 = MOVE Y,R3
(1) FI: Get the instruction at address 00001000.
(2) FI: Move the instruction 0000101110001011 to the CPU.
(3) DI: Decode: 00001 = MOVE, 01110001 = address, 011 = Reg3.
(4) FO: Get the data at address 01110001.
(5) EI-WO: Store the data in register 3.

Pipeline Hazards
- Structural hazards
- Data hazards
- Control hazards

Pipeline hazards prevent the next instruction from executing during its designated clock cycle. The instruction is said to be stalled. When an instruction is stalled, all instructions later in the pipeline than the stalled instruction are also stalled. Instructions earlier than the stalled one can continue. No new instructions are fetched during the stall.
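The timing claims in the pipelining slides can be checked with a small calculation. The sketch below is an illustration only, not part of the original slides: it assumes every stage takes exactly one cycle, that the pipeline starts empty, and that stalls simply add cycles. The function names are hypothetical. For the two-stage FI/EI example, one stage is T/2, so 6 cycles correspond to 3*T and 4 cycles to 2*T.

```python
def sequential_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed when each instruction finishes before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int, stall_cycles: int = 0) -> int:
    """Cycles needed in an ideal pipeline, plus any stall cycles.

    Assumption: the first instruction needs n_stages cycles to pass through
    the pipeline; every further instruction completes one cycle later.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


if __name__ == "__main__":
    # Two-stage FI/EI example: 3 instructions, each stage lasting T/2.
    print(sequential_cycles(3, 2))   # 6 half-T cycles = 3*T
    print(pipelined_cycles(3, 2))    # 4 half-T cycles = 2*T
    # Six-stage pipeline (FI DI CO FO EI WO), 3 instructions, 2 stall cycles.
    print(pipelined_cycles(3, 6, stall_cycles=2))
```

The `stall_cycles` parameter is where a hazard penalty would enter: each stalled cycle delays every later instruction by the same amount.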
Structural hazards
ADD R4, X         FI DI CO FO EI WO
Instruction 2        FI DI CO FO EI WO
Instruction 3           FI DI CO FO EI WO
Instruction 4              FI DI CO FO EI WO
Instruction 5                 FI DI CO FO EI WO
Instruction 6                    FI DI CO FO EI WO
Instruction 7                       FI DI CO FO EI WO

Structural hazards
(-- marks a stall cycle)
ADD R4, X         FI DI CO FO EI WO
Instruction 2        FI DI CO FO EI WO
Instruction 3           FI DI CO FO EI WO
Instruction 4              -- FI DI CO FO EI WO
Instruction 5                    FI DI CO FO EI WO
Penalty: 1 cycle

Structural hazards occur when a certain resource (memory, functional unit) is requested by more than one instruction at the same time.

Data hazards
MUL R2,R3   // R2 = R2*R3
ADD R1,R2   // R1 = R1+R2

MUL R2,R3         FI DI CO FO EI WO
ADD R1,R2            FI DI CO FO EI WO
Instruction 3           FI DI CO FO EI WO
R2 is needed before its new value is ready!

Data hazards
MUL R2,R3         FI DI CO FO EI WO
ADD R1,R2            FI DI CO -- -- FO EI WO
Instruction 3           FI DI -- -- CO FO EI WO
Penalty: 2 cycles

Data hazards
Forwarding (bypassing) can handle some hazards: the result is passed on without waiting for the WO stage (skips WO).

Data hazards
MUL R2,R3         FI DI CO FO EI WO
ADD R1,R2            FI DI CO -- FO EI WO
Instruction 3           FI DI -- CO FO EI WO
Penalty: 1 cycle

Control hazards
BR changes the value of the program counter and makes the CPU execute instructions at another part of the program.

          instruction1
          instruction2
          BR target
          instruction4
          instruction5
          instruction6
  target: instruction7
          instruction8

Control hazards
BR target         FI DI CO FO EI WO
instruction 4        FI (discarded)
instruction 7                 FI DI CO FO EI WO
instruction 8                    FI DI CO FO EI WO
Figure annotations: "Something is wrong" (the instruction fetched after BR is not the one to execute); "Target known"; "Control register + update PC!".
Penalty: 3 cycles

Control hazards
- Conditional branch:

          ADD R1,R2      R1 <- R1 + R2
          BEZ TARGET     branch if zero
          instruction i+1
          ...
  TARGET  ...

- Two alternatives:
  - branch is taken
  - branch is not taken
- How do we handle conditional jumps?

Control hazards
Assumption: Branch is taken
ADD R1,R2         FI DI CO FO EI WO
BEZ TARGET           FI DI CO FO EI WO
Instruction i+1         FI (discarded)
Target instr.                       FI DI CO FO EI WO
Figure annotations: "Something can be wrong"; "Evaluation ok"; "Target known".
Penalty: 3 cycles

Control hazards
Assumption: Branch is not taken
ADD R1,R2         FI DI CO FO EI WO
BEZ TARGET           FI DI CO FO EI WO
Instruction i+1         FI -- -- DI CO FO EI WO
Figure annotations: "Something can be wrong"; "Evaluation ok"; "Target known".
Penalty: 2 cycles

With a conditional branch there is a penalty even if the branch is not taken, because we have to wait until the branch condition is available.

Branch instructions represent a major problem in assuring an optimal flow through the pipeline. Several approaches have been taken for reducing branch penalties.

Reducing Pipeline Branch Penalties
- Branch instructions can dramatically affect pipeline performance.
- Control operations (conditional and unconditional branches) are very frequent in current programs. If the pipeline is stopped for each of them, performance goes down.
- Some statistics:
  - 20% - 35% of the instructions executed are branches
  - ~ 65% of the branches actually take the branch
  - Conditional branches are more frequent than unconditional ones
- Techniques:
  - Delayed branch
  - Branch prediction

Instruction Fetch Units and Instruction Queues
Most processors employ sophisticated fetch units that fetch instructions before they are needed and store them in a queue.

The fetch unit also has the ability to recognize branch instructions and to generate the target address. Thus, the penalty produced by unconditional branches can be drastically reduced: the fetch unit computes the target address and continues to fetch instructions from that address, which are sent to the queue. Thus, the rest of the pipeline gets a continuous stream of instructions, without stalling.
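As a rough illustration of the fetch unit described above, the following sketch walks through the example program from the Control hazards slide and fills an instruction queue, following the unconditional branch as soon as it is recognized. The program encoding, the `prefetch` function and the `queue_size` parameter are assumptions made for this example; a real fetch unit works on encoded instructions and in hardware.

```python
from collections import deque

# Hypothetical encoding: (label, operation, branch target or None).
PROGRAM = [
    ("", "instruction1", None),
    ("", "instruction2", None),
    ("", "BR", "target"),
    ("", "instruction4", None),
    ("", "instruction5", None),
    ("", "instruction6", None),
    ("target", "instruction7", None),
    ("", "instruction8", None),
]


def prefetch(program, start=0, queue_size=8):
    """Fill the instruction queue, following unconditional branches (BR)."""
    # Map each label to the index of the instruction it marks.
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    queue = deque()
    pc = start
    while pc < len(program) and len(queue) < queue_size:
        _, op, target = program[pc]
        queue.append(op if target is None else f"{op} {target}")
        if op == "BR":
            pc = labels[target]   # continue fetching at the branch target
        else:
            pc += 1
    return list(queue)


if __name__ == "__main__":
    for entry in prefetch(PROGRAM):
        print(entry)
```

In this sketch instruction4 to instruction6 never enter the queue, so the stages behind the fetch unit see instruction7 directly after the branch.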
The rate at which instructions can be read (from the instruction cache) must be sufficiently high to avoid an empty queue.

With conditional branches, penalties cannot be avoided. The branch condition, which usually depends on the result of the preceding instruction, has to be known in order to determine the following instruction.

Delayed branching
- The idea is to let the CPU do some useful work during the cycles it would otherwise spend stalled.
- The CPU always executes the instruction that follows after the branch and only then alters (if necessary) the sequence of execution.

Delayed branching
Assumption: Branch is taken
ADD R1,R2         FI DI CO FO EI WO
BEZ TARGET           FI DI CO FO EI WO
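The rule that the instruction after the branch is always executed can be sketched with a toy interpreter. Everything below is hypothetical and only meant to show the control flow: the three operations (ADD, BEZ, NOP), the tuple encoding and the function name are invented for this example, and the sketch assumes a single delay slot that never contains a branch itself.

```python
# Hypothetical encoding: (label, operation, arguments).
PROGRAM = [
    ("", "ADD", ("R1", "R2")),       # R1 <- R1 + R2, sets the zero flag
    ("", "BEZ", ("TARGET",)),        # branch if zero
    ("", "ADD", ("R3", "R4")),       # delay slot: always executed
    ("", "NOP", ()),                 # skipped when the branch is taken
    ("TARGET", "NOP", ()),
]


def run_with_delay_slot(program, regs, max_steps=50):
    """Run the program; return the indices of the executed instructions."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    trace = []
    zero = False
    pending = None            # branch target waiting for the delay slot
    pc = 0
    while pc < len(program) and len(trace) < max_steps:
        _, op, args = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending is not None:
            # This instruction was the delay slot: now the branch takes effect.
            next_pc, pending = pending, None
        if op == "ADD":
            dst, src = args
            regs[dst] += regs[src]
            zero = regs[dst] == 0
        elif op == "BEZ" and zero:
            pending = labels[args[0]]
        pc = next_pc
    return trace


if __name__ == "__main__":
    taken = {"R1": 1, "R2": -1, "R3": 5, "R4": 6}
    print(run_with_delay_slot(PROGRAM, taken))      # branch taken
    not_taken = {"R1": 1, "R2": 1, "R3": 5, "R4": 6}
    print(run_with_delay_slot(PROGRAM, not_taken))  # branch not taken
```

In both runs the instruction at index 2 is executed, which is the point of the technique: the slot after the branch does useful work whether or not the branch is taken.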
