6x86 PROCESSOR
Abbreviated Data Book, Version 1.1
Contains selected pages from:

6x86 PROCESSOR
Superscalar, Superpipelined, Sixth-Generation, x86 Compatible CPU
Advancing the Standards

Introduction

Sixth-Generation Superscalar, Superpipelined Architecture
- Dual 7-stage integer pipelines
- High-performance 80-bit FPU with 64-bit interface
- Operating frequencies of 100, 110, 120, and 133 MHz
- 16-KByte unified write-back L1 cache

X86 Instruction Set Compatible
- Runs Windows 95, Windows 3.x, Windows NT, DOS, UNIX, OS/2, Solaris, and others
- Optimized to run both 16-bit and 32-bit software applications

Best-in-Class Performance Through Superior Architecture
- Intelligent instruction dispatch
- Register renaming
- Out-of-order completion
- Data dependency removal
- Multi-branch prediction
- Speculative execution

64-Bit Data Bus
- P54C socket compatible
- Supports "one-plus-four" and linear burst modes

The Cyrix 6x86™ processor is a superscalar, superpipelined, sixth-generation CPU that offers the highest level of performance available for desktop personal computers. Optimized to run both 16-bit and 32-bit software applications, the 6x86 processor is fully compatible with the x86 instruction set and delivers industry-leading performance running Windows® 95, Windows, Windows NT, OS/2®, DOS, Solaris, UNIX®, and other operating systems.

The 6x86 processor achieves top performance through the use of two optimized superpipelined integer units and an on-chip floating point unit. The superpipelined architecture reduces timing constraints and increases frequency scalability to 150 MHz and beyond. Additionally, the integer and floating point units are optimized for maximum instruction throughput by using advanced architectural techniques including register renaming, out-of-order completion, data dependency removal, branch prediction, and speculative execution. These design innovations eliminate many data dependencies and resource conflicts to achieve high performance when executing existing 16-bit and future 32-bit software applications.

[Block diagram: Integer Unit with X and Y pipes (IF, ID1, ID2, AC1, AC2, EX, WB stages), 256-byte instruction line cache, 16-KByte unified cache, Floating Point Unit with FPU opcode and data queues, Memory Management Unit generating X and Y linear and physical addresses, and the Bus Interface driving the A31-A3, BE7#-BE0#, and D63-D0 signals.]

PRELIMINARY, March 1996, Order Number: 94175-01

1. ARCHITECTURE OVERVIEW

The Cyrix 6x86 CPU is a leader in the sixth generation of high-performance, x86-compatible processors. Increased performance is accomplished by the use of superscalar and superpipelined design techniques.

The 6x86 CPU is superscalar in that it contains two separate pipelines that allow multiple instructions to be processed at the same time. The use of advanced processing technology and the increased number of pipeline stages (superpipelining) allow the 6x86 CPU to achieve clock rates of 100 MHz and above.
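As an illustration of why superpipelining raises the attainable clock rate, the short sketch below (not part of the original data book; the stage delays and latch overhead are invented for the example) computes the clock period as the slowest stage delay plus pipeline-register overhead, before and after splitting the two longest stages in two.

#include <stdio.h>

/* Return the delay of the slowest pipeline stage. */
static double max_delay(const double *stages, int n)
{
    double worst = 0.0;
    for (int i = 0; i < n; i++)
        if (stages[i] > worst)
            worst = stages[i];
    return worst;
}

int main(void)
{
    const double latch = 1.0; /* assumed pipeline-register overhead, ns */

    /* Hypothetical 5-stage pipeline: fetch, decode, address calc, execute, write-back. */
    const double five[]  = { 6.0, 12.0, 12.0, 7.0, 5.0 };

    /* The same work superpipelined into 7 stages: decode and address
       calculation are each split in two, halving the worst-case stage delay. */
    const double seven[] = { 6.0, 6.0, 6.0, 6.0, 6.0, 7.0, 5.0 };

    double t5 = max_delay(five, 5)  + latch;  /* clock period = slowest stage + latch */
    double t7 = max_delay(seven, 7) + latch;

    printf("5-stage pipeline: period %.1f ns -> about %.0f MHz\n", t5, 1000.0 / t5);
    printf("7-stage pipeline: period %.1f ns -> about %.0f MHz\n", t7, 1000.0 / t7);
    return 0;
}

With these made-up numbers the seven-stage split raises the attainable clock from roughly 77 MHz to 125 MHz; the actual 6x86 stage delays are not published in this excerpt.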
Through the use of unique architectural features, the 6x86 processor eliminates many data dependencies and resource conflicts, resulting in optimal performance for both 16-bit and 32-bit x86 software.

The 6x86 CPU contains two caches: a 16-KByte dual-ported unified cache and a 256-byte instruction line cache. Since the unified cache can store instructions and data in any ratio, the unified cache offers a higher hit rate than separate data and instruction caches of equal size. An increase in overall cache-to-integer unit bandwidth is achieved by supplementing the unified cache with a small, high-speed, fully associative instruction line cache. The inclusion of the instruction line cache avoids excessive conflicts between code and data accesses in the unified cache.

The on-chip FPU allows floating point instructions to execute in parallel with integer instructions and features a 64-bit data interface. The FPU incorporates a four-deep instruction queue and a four-deep store queue to facilitate parallel execution.

The 6x86 CPU operates from a 3.3 volt power supply, resulting in reasonable power consumption at all frequencies. In addition, the 6x86 CPU incorporates a low power suspend mode, stop clock capability, and system management mode (SMM) for power sensitive applications.

1.1 Major Functional Blocks

The 6x86 processor consists of five major functional blocks, as shown in the overall block diagram on the first page of this manual:

• Integer Unit
• Cache Unit
• Memory Management Unit
• Floating Point Unit
• Bus Interface Unit

Instructions are executed in the X and Y pipelines within the Integer Unit and also in the Floating Point Unit (FPU). The Cache Unit stores the most recently used data and instructions to allow fast access to the information by the Integer Unit and FPU.

Physical addresses are calculated by the Memory Management Unit and passed to the Cache Unit and the Bus Interface Unit (BIU). The BIU provides the interface between the external system board and the processor's internal execution units.

1.2 Integer Unit

The Integer Unit (Figure 1-1) provides parallel instruction execution using two seven-stage integer pipelines. Each of the two pipelines, X and Y, can process several instructions simultaneously.

[Figure 1-1. Integer Unit: the X and Y pipelines, each consisting of Instruction Fetch, Instruction Decode 1, Instruction Decode 2, Address Calculation 1, Address Calculation 2, Execution, and Write-Back stages; instructions are processed in order through the early stages, with out-of-order completion at the end.]

The Integer Unit consists of the following pipeline stages:

• Instruction Fetch (IF)
• Instruction Decode 1 (ID1)
• Instruction Decode 2 (ID2)
• Address Calculation 1 (AC1)
• Address Calculation 2 (AC2)
• Execute (EX)
• Write-Back (WB)

The instruction decode and address calculation functions are both divided into superpipelined stages.
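Before each stage is described individually, the following hypothetical sketch (not from the data book) prints which stage each instruction occupies on each clock, assuming the ideal case of one instruction pair entering the X and Y pipes every cycle with no stalls.

#include <stdio.h>

/* The seven Integer Unit pipeline stages, in order. */
static const char *stage_names[] = { "IF", "ID1", "ID2", "AC1", "AC2", "EX", "WB" };

enum { NUM_STAGES = 7, NUM_INSTR = 6, NUM_CYCLES = 10 };

int main(void)
{
    /* Header: instruction number and the pipe (X or Y) it was dispatched to. */
    printf("cycle ");
    for (int i = 0; i < NUM_INSTR; i++)
        printf("  I%d-%c", i, (i % 2 == 0) ? 'X' : 'Y');
    printf("\n");

    for (int cycle = 0; cycle < NUM_CYCLES; cycle++) {
        printf("%5d ", cycle);
        for (int i = 0; i < NUM_INSTR; i++) {
            int issue = i / 2;          /* ideal case: one pair enters IF every clock */
            int stage = cycle - issue;  /* stage index this instruction occupies now  */
            if (stage >= 0 && stage < NUM_STAGES)
                printf("%6s", stage_names[stage]);
            else
                printf("%6s", "-");
        }
        printf("\n");
    }
    return 0;
}

In practice instructions may finish the EX and WB stages out of order, as described in Section 1.2.2; the table above shows only the idealized lockstep flow.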
1.2.1 Pipeline Stages

The Instruction Fetch (IF) stage, shared by both the X and Y pipelines, fetches 16 bytes of code from the cache unit in a single clock cycle. Within this stage, the code stream is checked for any branch instructions that could affect normal program sequencing. If an unconditional or conditional branch is detected, branch prediction logic within the IF stage generates a predicted target address for the instruction. The IF stage then begins fetching instructions at the predicted address.

The superpipelined Instruction Decode function contains the ID1 and ID2 stages. ID1, shared by both pipelines, evaluates the code stream provided by the IF stage and determines the number of bytes in each instruction. Up to two instructions per clock are delivered to the ID2 stages, one in each pipeline.

The ID2 stages decode instructions and send the decoded instructions to either the X or Y pipeline for execution. The particular pipeline is chosen based on which instructions are already in each pipeline and how fast they are expected to flow through the remaining pipeline stages.

The Address Calculation function contains two stages, AC1 and AC2. If the instruction refers to a memory operand, the AC1 stage calculates a linear memory address for the instruction.

The AC2 stage performs any required memory management functions, cache accesses, and register file accesses. If a floating point instruction is detected by AC2, the instruction is sent to the FPU for processing.

The Execute (EX) stage executes instructions using the operands provided by the address calculation stage.

The Write-Back (WB) stage is the last IU stage. The WB stage stores execution results either to a register file within the IU or to a write buffer in the cache control unit.

1.2.2 Out-of-Order Processing

If an instruction executes faster than the previous instruction in the other pipeline, the instructions may complete out of order. All instructions are processed in order up to the EX stage. While in the EX and WB stages, instructions may be completed out of order.

If there is a data dependency between two instructions, the necessary hardware interlocks are enforced to ensure correct program execution. Even though instructions may complete out of order, exceptions and writes resulting from the instructions are always issued in program order.

1.2.3 Pipeline Selection

In most cases, instructions are processed in either pipeline and without pairing constraints on the instructions. However, certain instructions are processed only in the X pipeline:

• Branch instructions
• Floating point instructions
• Exclusive instructions

Branch and floating point instructions may be paired with a second instruction in the Y pipeline. Exclusive instructions cannot be paired with instructions in the Y pipeline.

1.2.4 Data Dependency Solutions

When two instructions that are executing in parallel require access to the same data or register, one of the following types of data dependencies may occur:

• Read-After-Write (RAW)
• Write-After-Read (WAR)
• Write-After-Write (WAW)

Data dependencies typically force serialized execution of instructions. However, the 6x86 CPU implements three mechanisms that allow parallel execution of instructions containing data dependencies.
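The three dependency types can be made concrete with a small classifier. This is an illustrative sketch rather than the 6x86's actual dispatch logic; it simply compares the destination and source registers of two instructions according to the definitions above. RAW is a true data dependency, while WAR and WAW are name dependencies that a technique such as the register renaming mentioned earlier can eliminate; this excerpt does not spell out which of the three 6x86 mechanisms handles which case.

#include <stdio.h>

/* Simplified one-destination, one-source view of an instruction. */
struct instr {
    const char *text;
    char dest;   /* register written, e.g. 'A' for AX */
    char src;    /* register read                     */
};

/* Classify how 'second' depends on 'first', following the definitions above. */
static const char *classify(struct instr first, struct instr second)
{
    if (second.src  == first.dest) return "RAW: true dependency, result must be available first";
    if (second.dest == first.src)  return "WAR: name dependency, removable by register renaming";
    if (second.dest == first.dest) return "WAW: name dependency, removable by register renaming";
    return "none: free to execute in parallel";
}

int main(void)
{
    /* Hypothetical x86-style pairs; registers are abbreviated to one letter. */
    struct instr a1 = { "MOV AX, BX", 'A', 'B' }, a2 = { "MOV CX, AX", 'C', 'A' };
    struct instr b1 = { "MOV AX, BX", 'A', 'B' }, b2 = { "MOV BX, DX", 'B', 'D' };
    struct instr c1 = { "MOV AX, BX", 'A', 'B' }, c2 = { "MOV AX, DX", 'A', 'D' };

    printf("%-11s ; %-11s -> %s\n", a1.text, a2.text, classify(a1, a2));
    printf("%-11s ; %-11s -> %s\n", b1.text, b2.text, classify(b1, b2));
    printf("%-11s ; %-11s -> %s\n", c1.text, c2.text, classify(c1, c2));
    return 0;
}

Compiling and running this prints one line per pair: the first pair is RAW because the second MOV reads the AX value produced by the first, the second pair is WAR, and the third is WAW.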