RM5271™ Microprocessor with External Cache Interface
Document Rev. 1.3
Date: 02/2000

FEATURES

• Dual Issue superscalar microprocessor
  - 200, 250, 266, 300, 350 MHz operating frequencies
  - 420 Dhrystone 2.1 MIPS maximum
• High-performance system interface
  - 64-bit multiplexed system address/data bus for optimum price/performance with up to 125 MHz operating frequency
  - High-performance write protocols to maximize uncached write bandwidth
  - Processor clock multipliers 2, 2.5, 3, 3.5, 4, 4.5, 5, 6, 7, 8, 9
  - IEEE 1149.1 JTAG boundary scan
• Integrated on-chip caches
  - 32KB instruction and 32KB data, 2-way set associative
  - Virtually indexed, physically tagged
  - Write-back and write-through on a per-page basis
  - Pipeline restart on first doubleword for data cache misses
• Integrated secondary cache controller (R5000 compatible)
  - Supports 512K or 2MByte block write-through secondary
• Integrated memory management unit
  - Fully associative joint TLB (shared by I and D translations)
  - 48 dual-entries map 96 pages
  - Variable page size (4KB to 16MB in 4x increments)
• High-performance floating point unit, up to 700 MFLOPS
  - Single cycle repeat rate for common single precision operations and some double precision operations
  - Two cycle repeat rate for double precision multiply and double precision combined multiply-add operations
  - Single cycle repeat rate for single precision combined multiply-add operation
• MIPS IV instruction set
  - Floating point multiply-add instruction increases performance in signal processing and graphics applications
  - Conditional moves to reduce branch frequency
  - Index address modes (register + register)
• Embedded application enhancements
  - Specialized DSP integer Multiply-Accumulate instructions and 3-operand multiply instruction
  - Instruction and Data cache locking by set
  - Optional dedicated exception vector for interrupts
• Fully static CMOS design with power down logic
  - Standby reduced power mode with WAIT instruction
  - 2.5V core with 3.3V IO’s
• 304-pin SBGA package (31x31mm)

BLOCK DIAGRAM

[Block diagram not reproduced. Blocks shown: External Cache Controller; Primary Data Cache and Primary Instruction Cache (each 2-way set associative) with DTag, ITag, DTLB, and ITLB; Store Buffer, Write Buffer, Read Buffer, Pad Buffer, and Address Buffer on the A/D Bus and Pad Bus; Instruction Dispatch Unit with FP and Integer Instruction Registers; Joint TLB; Coprocessor 0; System/Memory Control; Integer Register File, Load Aligner, Integer Address/Adder, Shifter/Store Aligner, Logic Unit, Int Mult, Div, Madd, and Integer Control; Floating-Point Register File, Packer/Unpacker, Floating-Point MultAdd, Add, Sub, Cvt, Div, Sqrt, and Floating-Point Control; Program Counter, PC Incrementer, and Branch PC Adder; PLL/Clocks.]

Quantum Effect Devices    RM5271 Microprocessor, Document Rev. 1.3    www.qedinc.com

DESCRIPTION

The QED RM5271 is a highly integrated superscalar microprocessor that is ideally suited for high-end embedded control applications such as internetworking, high-performance image manipulation, high-speed printing, and 3-D visualization. The RM5271 is also applicable to the low-end workstation market, where its balanced integer and floating-point performance and direct support for a large secondary cache (up to 2MB) provide outstanding price/performance.

HARDWARE OVERVIEW

The RM5271 offers a high level of integration targeted at high-performance embedded applications. The key elements of the RM5271 are briefly described below.

Superscalar Dispatch

The RM5271 has an asymmetric superscalar dispatch unit which allows it to issue an integer instruction and a floating-point computation instruction simultaneously. With respect to superscalar issue, integer instructions include alu, branch, load/store, and floating-point load/store, while floating-point computation instructions include floating-point add, subtract, combined multiply-add, converts, etc. In combination with its high-throughput fully pipelined floating-point execution unit, the superscalar capability of the RM5271 provides unparalleled price/performance in computationally intensive embedded applications.

CPU Registers

The RM5271 CPU contains 32 general purpose registers, two special purpose registers for integer multiplication and division, a program counter, and no condition code bits. Figure 1 shows the user visible state.

[Figure 1 not reproduced. It shows the 64-bit (bits 63..0) General Purpose Registers r0 through r31, with r0 holding zero, the 64-bit Multiply/Divide Registers HI and LO, and the 64-bit Program Counter PC.]

Figure 1 CPU Registers

Pipeline

For integer operations, loads, stores, and other non-floating-point operations, the RM5271 uses a 5-stage pipeline. In addition to the 5-stage integer pipeline, the RM5271 uses an extended 7-stage pipeline for floating-point operations.

Figure 2 shows the RM5271 integer pipeline. Up to five integer instructions can be executing simultaneously.

    I0    1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
    I1          1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
    I2                1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
    I3                      1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
    I4                            1I 2I 1R 2R 1A 2A 1D 2D 1W 2W

    (one cycle spans two phases, for example 1I and 2I)

    1I-1R:  Instruction cache access
    2I:     Instruction virtual to physical address translation
    2R:     Register file read, Bypass calculation, Instruction decode, Branch address calculation
    1A:     Issue or slip decision, Branch decision
    1A:     Data virtual address calculation
    1A-2A:  Integer add, logical, shift
    2A:     Store Align
    2A-2D:  Data cache access and load align
    1D:     Data virtual to physical address translation
    2W:     Register file write

Figure 2 Pipeline

Integer Unit

The RM5271 implements the MIPS IV Instruction Set Architecture, and is therefore fully upward compatible with applications that run on processors implementing the earlier generation MIPS I-III instruction sets. Additionally, the RM5271 includes two implementation-specific instructions that are not found in the baseline MIPS IV ISA but are useful in the embedded market place. These instructions are integer multiply-accumulate (MAD) and 3-operand integer multiply (MUL).

The RM5271 integer unit includes thirty-two general purpose 64-bit registers, a load/store architecture with single cycle ALU operations (add, sub, logical, shift), and an autonomous multiply/divide unit. Additional register resources include the HI/LO result registers for the two-operand integer multiply/divide operations and the program counter (PC).

Register File

The RM5271 has thirty-two general purpose registers with register location 0 (r0) hard wired to a zero value. These registers are used for scalar integer operations and address calculation. The register file has two read ports and one write port and is fully bypassed to minimize operation latency in the pipeline.

ALU

The RM5271 ALU consists of an integer adder/subtractor, a logic unit, and a shifter. The adder performs address calculations in addition to arithmetic operations. The logic unit performs all logical and zero shift data moves. The shifter performs shifts and store alignment operations. Each of these units is optimized to perform all operations in a single processor cycle.

Integer Multiply/Divide

The RM5271 has a dedicated integer multiply/divide unit optimized for high-speed multiply and multiply-accumulate operations. Table 1 shows the performance of the multiply/divide unit on each operation.

Table 1: Integer Multiply/Divide Operations

    Opcode           Operand Size   Latency   Repeat Rate   Stall Cycles
    MULT/U, MAD/U    16 bit         3         2             0
                     32 bit         4         3             0
    MUL              16 bit         3         2             1
                     32 bit         4         3             2
    DMULT, DMULTU    any            7         6             0
    DIV, DIVU        any            36        36            0
    DDIV, DDIVU      any            68        68            0

The baseline MIPS IV ISA specifies that the results of a multiply or divide operation be placed in the Hi and Lo registers. These values can then be transferred to the general purpose register file using the Move-from-Hi and Move-from-Lo (MFHI/MFLO) instructions.

In addition to the baseline MIPS IV integer multiply instructions, the RM5271 also implements the 3-operand multiply instruction, MUL. This instruction specifies that the multiply result go directly to the integer register file rather than the Lo register. The portion of the multiply that would normally have gone into the Hi register is discarded. For applications where it is known that the high half of the multiply result is not required, using the MUL instruction eliminates the necessity of executing an explicit MFLO instruction.

The multiply-add instruction (MAD) multiplies two operands and adds the resulting product to the current contents of the Hi and Lo registers. The multiply-accumulate operation is the core primitive of almost all signal processing algorithms, allowing the RM5271 to eliminate the need for a separate DSP engine in many embedded applications.

Floating-Point Co-Processor

The RM5271 incorporates a high-performance fully pipelined floating-point coprocessor which includes a floating-point register file and autonomous execution units for multiply/add/convert and divide/square root. The floating-point coprocessor is a tightly coupled execution unit, decoding and executing instructions in parallel with, and in the case of floating-point loads and stores, in cooperation with the integer unit. The superscalar capabilities of the RM5271 allow floating-point computation instructions to be issued concurrently with integer instructions.

Floating-Point Unit

The RM5271 floating-point execution unit supports single and double precision arithmetic, as specified in the IEEE Standard 754.

Table 2 gives the latencies of the floating-point instructions in internal processor cycles.

Table 2: Floating-Point Instruction Cycles

    Operation   Latency   Repeat Rate
    fadd        4         1
    fsub        4         1
    fmult       4/5       1/2
    fmadd       4/5       1/2
    fmsub       4/5       1/2
    fdiv        21/36     19/34
    fsqrt       21/36     19/34
    frecip      21/36     19/34
    frsqrt      38/68     36/66
    fcvt.s.d    4         1
    fcvt.s.w    6         3
    fcvt.s.l    6         3
    fcvt.d.s    4         1
    fcvt.d.w    4         1
    fcvt.d.l    4         1
    fcvt.w.s    4         1
    fcvt.w.d    4         1
    fcvt.l.s    4         1
    fcvt.l.d    4         1
    fcmp        1         1
    fmov        1         1
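
The repeat rates in Table 2 can be related to the 700 MFLOPS figure listed under FEATURES. The following is a minimal sketch, not part of the QED documentation, under two assumptions: that paired entries such as 4/5 give single precision/double precision values (consistent with the repeat rates listed under FEATURES, but not stated explicitly in this excerpt), and that operations are independent and issue back to back with no stalls. The function names cycles_for_stream and peak_mflops are hypothetical.

```python
# Illustrative model only; an idealized pipeline, not a cycle-accurate simulator.

def cycles_for_stream(count, latency, repeat_rate):
    """Cycles to complete `count` independent operations issued back to back.

    Assumes a new operation starts every `repeat_rate` cycles and each
    result is available `latency` cycles after its operation starts.
    """
    if count <= 0:
        return 0
    return (count - 1) * repeat_rate + latency


def peak_mflops(clock_mhz, flops_per_op, repeat_rate):
    """Upper bound on MFLOPS when one operation starts every `repeat_rate` cycles."""
    return clock_mhz * flops_per_op / repeat_rate


# Table 2 values for fmadd, read as (single precision, double precision).
# That reading of the paired entries is an assumption.
FMADD_LATENCY = (4, 5)
FMADD_REPEAT = (1, 2)

if __name__ == "__main__":
    # 100 independent multiply-adds, single then double precision.
    print(cycles_for_stream(100, FMADD_LATENCY[0], FMADD_REPEAT[0]))  # 103
    print(cycles_for_stream(100, FMADD_LATENCY[1], FMADD_REPEAT[1]))  # 203

    # A multiply-add counts as two floating-point operations.
    print(peak_mflops(350, 2, FMADD_REPEAT[0]))  # 700.0
    print(peak_mflops(350, 2, FMADD_REPEAT[1]))  # 350.0
```

Under this model, a single precision multiply-add started every cycle at 350 MHz gives the 700 MFLOPS upper bound, while the two cycle repeat rate for double precision halves it to 350 MFLOPS. Dependent operations would be limited by latency rather than repeat rate.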