MIPS R-Format Instructions
Total Page:16
File Type:pdf, Size:1020Kb
MIPS R-format Instructions op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits n Instruction fields n op: operation code (opcode) n rs: first source register number n rt: second source register number n rd: destination register number n shamt: shift amount (00000 for now) n funct: function code (extends opcode) Chapter 2 — Instructions: Language of the Computer — 1 R-format Example op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits add $t0, $s1, $s2 special $s1 $s2 $t0 0 add 0 17 18 8 0 32 000000 10001 10010 01000 00000 100000 000000100011001001000000001000002 = 0232402016 Chapter 2 — Instructions: Language of the Computer — 2 MIPS I-format Instructions op rs rt constant or address 6 bits 5 bits 5 bits 16 bits n Immediate arithmetic and load/store instructions n rt: destination or source register number 15 15 n Constant: –2 to +2 – 1 n Address: offset added to base address in rs n Design Principle 4: Good design demands good compromises n Different formats complicate decoding, but allow 32-bit instructions uniformly n Keep formats as similar as possible Chapter 2 — Instructions: Language of the Computer — 3 Branch Addressing n Branch instructions specify n Opcode, two registers, target address n Most branch targets are near branch n Forward or backward op rs rt constant or address 6 bits 5 bits 5 bits 16 bits n PC-relative addressing n Target address = PC + offset × 4 n PC already incremented by 4 by this time Chapter 2 — Instructions: Language of the Computer — 4 Jump Addressing n Jump (j and jal) targets could be anywhere in text segment n Encode full address in instruction op address 6 bits 26 bits n (Pseudo)Direct jump addressing n Target address = PC31…28 : (address × 4) Chapter 2 — Instructions: Language of the Computer — 5 Addressing Mode Summary Chapter 2 — Instructions: Language of the Computer — 6 §2.11 Parallelism and Instructions:Synchronization §2.11 Synchronization n Two processors sharing an area of memory n P1 writes, then P2 reads n Data race if P1 and P2 don’t synchronize n Result depends of order of accesses n Hardware support required n Atomic read/write memory operation n No other access to the location allowed between the read and write n Could be a single instruction n E.g., atomic swap of register ↔ memory n Or an atomic pair of instructions Chapter 2 — Instructions: Language of the Computer — 7 Synchronization in MIPS n Load linked: ll rt, offset(rs) n Store conditional: sc rt, offset(rs) n Succeeds if location not changed since the ll n Returns 1 in rt n Fails if location is changed n Returns 0 in rt n Example: atomic swap (to test/set lock variable) try: add $t0,$zero,$s4 ;copy exchange value ll $t1,0($s1) ;load linked sc $t0,0($s1) ;store conditional beq $t0,$zero,try ;branch store fails add $s4,$zero,$t1 ;put load value in $s4 Chapter 2 — Instructions: Language of the Computer — 8 §2.16 Real Stuff: ARM Instructions §2.16 Real Stuff: ARM & MIPS Similarities n ARM: the most popular embedded core n Similar basic set of instructions to MIPS ARM MIPS Date announced 1985 1985 Instruction size 32 bits 32 bits Address space 32-bit flat 32-bit flat Data alignment Aligned Aligned Data addressing modes 9 3 Registers 15 × 32-bit 31 × 32-bit Input/output Memory Memory mapped mapped Chapter 2 — Instructions: Language of the Computer — 9 Compare and Branch in ARM n Uses condition codes for result of an arithmetic/logical instruction n Negative, zero, carry, overflow n Compare instructions to set condition codes without keeping the result n Each instruction can be conditional n Top 4 bits of instruction word: condition value n Can avoid branches over single instructions Chapter 2 — Instructions: Language of the Computer — 10 Instruction Encoding Chapter 2 — Instructions: Language of the Computer — 11 §2.17 Real Stuff: x86Instructions §2.17 Real Stuff: The Intel x86 ISA n Evolution with backward compatibility n 8080 (1974): 8-bit microprocessor n Accumulator, plus 3 index-register pairs n 8086 (1978): 16-bit extension to 8080 n Complex instruction set (CISC) n 8087 (1980): floating-point coprocessor n Adds FP instructions and register stack n 80286 (1982): 24-bit addresses, MMU n Segmented memory mapping and protection n 80386 (1985): 32-bit extension (now IA-32) n Additional addressing modes and operations n Paged memory mapping as well as segments Chapter 2 — Instructions: Language of the Computer — 12 The Intel x86 ISA n Further evolution… n i486 (1989): pipelined, on-chip caches and FPU n Compatible competitors: AMD, Cyrix, … n Pentium (1993): superscalar, 64-bit datapath n Later versions added MMX (Multi-Media eXtension) instructions n The infamous FDIV bug n Pentium Pro (1995), Pentium II (1997) n New microarchitecture (see Colwell, The Pentium Chronicles) n Pentium III (1999) n Added SSE (Streaming SIMD Extensions) and associated registers n Pentium 4 (2001) n New microarchitecture n Added SSE2 instructions Chapter 2 — Instructions: Language of the Computer — 13 The Intel x86 ISA n And further… n AMD64 (2003): extended architecture to 64 bits n EM64T – Extended Memory 64 Technology (2004) n AMD64 adopted by Intel (with refinements) n Added SSE3 instructions n Intel Core (2006) n Added SSE4 instructions, virtual machine support n AMD64 (announced 2007): SSE5 instructions n Intel declined to follow, instead… n Advanced Vector Extension (announced 2008) n Longer SSE registers, more instructions n If Intel didn’t extend with compatibility, its competitors would! n Technical elegance ≠ market success Chapter 2 — Instructions: Language of the Computer — 14 x86 Instruction Encoding n Variable length encoding n Postfix bytes specify addressing mode n Prefix bytes modify operation n Operand length, repetition, locking, … Chapter 2 — Instructions: Language of the Computer — 15 §2.18 Fallaciesand Pitfalls Fallacies n Powerful instruction ⇒ higher performance n Fewer instructions required n But complex instructions are hard to implement n May slow down all instructions, including simple ones n Compilers are good at making fast code from simple instructions n Use assembly code for high performance n But modern compilers are better at dealing with modern processors n More lines of code ⇒ more errors and less productivity Chapter 2 — Instructions: Language of the Computer — 16 Fallacies n Backward compatibility ⇒ instruction set doesn’t change n But they do accrete more instructions x86 instruction set Chapter 2 — Instructions: Language of the Computer — 17 Concluding Remarks n Measure MIPS instruction executions in benchmark programs n Consider making the common case fast n Consider compromises Instruction class MIPS examples SPEC2006 Int SPEC2006 FP Arithmetic add, sub, addi 16% 48% Data transfer lw, sw, lb, lbu, 35% 36% lh, lhu, sb, lui Logical and, or, nor, andi, 12% 4% ori, sll, srl Cond. Branch beq, bne, slt, 34% 8% slti, sltiu Jump j, jr, jal 2% 0% Chapter 2 — Instructions: Language of the Computer — 18 Chapter 3 Arithmetic for Computers §3.1Introduction Arithmetic for Computers n Operations on integers n Addition and subtraction n Multiplication and division n Dealing with overflow n Floating-point real numbers n Representation and operations Chapter 3 — Arithmetic for Computers — 20 §3.2 Addition andSubtraction §3.2 Integer Addition n Example: 7 + 6 n Overflow if result out of range n Adding +ve and –ve operands, no overflow n Adding two +ve operands n Overflow if result sign is 1 n Adding two –ve operands n Overflow if result sign is 0 Chapter 3 — Arithmetic for Computers — 21 Integer Subtraction n Add negation of second operand n Example: 7 – 6 = 7 + (–6) +7: 0000 0000 … 0000 0111 –6: 1111 1111 … 1111 1010 +1: 0000 0000 … 0000 0001 n Overflow if result out of range n Subtracting two +ve or two –ve operands, no overflow n Subtracting +ve from –ve operand n Overflow if result sign is 0 n Subtracting –ve from +ve operand n Overflow if result sign is 1 Chapter 3 — Arithmetic for Computers — 22 Dealing with Overflow n Some languages (e.g., C) ignore overflow n Use MIPS addu, addui, subu instructions n Other languages (e.g., Ada, Fortran) require raising an exception n Use MIPS add, addi, sub instructions n On overflow, invoke exception handler n Save PC in exception program counter (EPC) register n Jump to predefined handler address n mfc0 (move from coprocessor reg) instruction can retrieve EPC value, to return after corrective action Chapter 3 — Arithmetic for Computers — 23 Other Adders n BASICS of ADDING LOGIC n Carry-out n Sum Generation n Ripple Add n Carry Bypass n Carry Select n Carry Lookahead Chapter 3 — Arithmetic for Computers — 24 The Ripple-Carry Adder A0 B0 A1 B1 A2 B2 A3 B3 Ci,0 Co,0 Co,1 Co,2 Co,3 FA FA FA FA (= Ci,1) S0 S1 S2 S3 Worst case delay linear with the number of bits td = O(N) tadder = (N-1)tcarry + tsum Goal: Make the fastest possible carry path circuit 25 Carry-Bypass Adder P0 G1 P0 G1 P2 G2 P3 G3 Also called Carry-Skip Ci,0 Co,0 Co,1 Co,2 Co,3 FA FA FA FA P0 G1 P0 G1 P2 G2 P3 G3 BP=PoP1P2P3 Ci,0 Co,0 Co,1 Co,2 r FA FA FA FA e x e l Co,3 p i t l u M Idea: If (P0 and P1 and P2 and P3 = 1) then Co3 = C0, else “kill” or “generate”. 26 Linear Carry Select Bit 0-3 Bit 4-7 Bit 8-11 Bit 12-15 Setup Setup Setup Setup (1) "0" Carry "0" Carry "0" Carry "0" Carry "0" "0" "0" "0" (1) "1" Carry "1" Carry "1" Carry "1" Carry "1" "1" "1" "1" (5) (5) (5) (5) (5) (6) (7) (8) Multiplexer Multiplexer Multiplexer Multiplexer Ci,0 (9) Sum Generation Sum Generation Sum Generation Sum Generation S0-3 S4-7 S8-11 S12-15 (10) 27 LookAhead - Basic Idea A0, B0 A1, B1 • • • AN-1, BN-1 Ci,0 P Ci,1 P 0 1 C i, N-1 PN-1 S0 S1 • • • SN-1 Cok, ==fA()k,,Bk Cok, – 1 Gk + PkCok, – 1 28 Look-Ahead: Topology Expanding Lookahead equations: VDD Cok, = Gk + Pk ()Gk1– + Pk1– Cok, – 2 G3 G2 All the way: G1 C = G + P ()G + P ()… + P ()G + P C ok, k k k1– k1– 1 0 0 i0, G0 Ci,0 Co,3 P0 P1 P2 P3 29 Carry Lookahead Trees Co0, = G0 + P0Ci0, Co1, = G1 ++P1 G0 P1P0 Ci0, Co2, = G2 ++P2G1 P2 P1G0 + P2 P1P0Ci0, = ()G2 + P2G1 + ()P2P1 ()G0 + P0Ci0, = G 2:1 + P2:1Co0, Can continue building the tree hierarchically.