POWER CHALLENGE™

Technical Report

Contents

Chapter 1  Overview
  1.1  Innovative Supercomputing Design
  1.2  History of High-End Computing at Silicon Graphics
  1.3  Product Line Overview
  1.4  Usage Environments
    1.4.1  Desktop Supercomputing
    1.4.2  Dedicated Project Supercomputing
    1.4.3  Departmental Supercomputing
    1.4.4  Enterprise Supercomputing
    1.4.5  Grand Challenge Supercomputing
    1.4.6  Interactive Visual Supercomputing

Chapter 2  The Silicon Graphics Supercomputing Philosophy
  2.5  Trends in Integrated Circuit Technology
  2.6  High-Performance Microprocessors
  2.7  Meeting Memory Demands
  2.8  Shared Memory Programming
    2.8.1  Throughput Processing
    2.8.2  Parallel Processing
    2.8.3  Parallel Throughput Processing
  2.9  Designing for Economic Scalability

Chapter 3  POWER CHALLENGE Hardware Architecture
  3.10  R10000™ Processor Architecture
    3.10.1  Modern Computing Challenges
      3.10.1.1  Memory and Secondary Cache Latencies
      3.10.1.2  Data Dependencies
      3.10.1.3  Branches
    3.10.2  Tolerating Memory Latency
      3.10.2.1  High-Bandwidth Secondary Cache Interface
      3.10.2.2  Block Accesses
      3.10.2.3  Interleaved Memory
      3.10.2.4  Non-Blocking Cache
      3.10.2.5  Prefetch
    3.10.3  Data Dependency
      3.10.3.1  Register Renaming
      3.10.3.2  Out-of-Order Execution
    3.10.4  Branch Prediction
    3.10.5  R10000 Product Overview
      3.10.5.1  Primary Data Cache
      3.10.5.2  Secondary Data Cache
      3.10.5.3  Instruction Cache
      3.10.5.4  Branch Prediction
    3.10.6  Queueing Structures
      3.10.6.1  Integer Queue
      3.10.6.2  Floating-Point Queue
      3.10.6.3  Address Queue
    3.10.7  Register Renaming
      3.10.7.1  Mapping Tables
    3.10.8  Execution Units
      3.10.8.1  Integer ALUs
      3.10.8.2  Floating-Point Units
    3.10.9  Load/Store Units and the TLB
      3.10.9.1  Secondary Cache Interface
      3.10.9.2  System Interface
      3.10.9.3  Multiprocessor Support
  3.11  R8000™ Processor Architecture
    3.11.1  Superscalar Implementation
      3.11.1.1  Balanced Memory Throughput Implementation
      3.11.1.2  Instruction and Addressing Mode Extensions
      3.11.1.3  Data Streaming Cache Architecture
    3.11.2  R8000 Integer Unit Organization
      3.11.2.1  Instruction Cache and Instruction Cache Tag RAM
      3.11.2.2  Instruction/Tag Queues and Dispatching
      3.11.2.3  Branch Prediction Cache and Branch Prediction
      3.11.2.4  Integer
      3.11.2.5  Arithmetic Logic Units
      3.11.2.6  Translation Lookaside Buffer (TLB) Organization
      3.11.2.7  Data Cache and Data Cache Tag
    3.11.3  Integer Operations
    3.11.4  R8000 Floating-Point Unit Organization
    3.11.5  Data Streaming Cache and Tag RAM
    3.11.6  FPU Operations
  3.12  POWERpath-2™ Coherent Interconnect
    3.12.1  POWERpath-2 Protocol
    3.12.2  Cache Coherency
    3.12.3  Bus Timing
      3.12.3.1
      3.12.3.2  Minimizing Read Latency
      3.12.3.3  Memory Read Response Latency
      3.12.3.4  MP System Latency Reduction
  3.13  Synchronization
    3.13.1  Semaphores and Locks
    3.13.2  Fine-Grain Parallel Dispatch
      3.13.2.1  Piggyback Reads
      3.13.2.2  Synchronization Counter
    3.13.3  Tolerating High Loads
  3.14  Memory Subsystem
  3.15  I/O Subsystem
  3.16  HIO Modules
  3.17  VME Bus
  3.18  Peripherals
  3.19  References

Chapter 4  64-bit IRIX™ Architecture and Standards
  4.20  Standards
    4.20.1  X/Open
    4.20.2  SVID
    4.20.3  MIPS ABI
    4.20.4  POSIX
    4.20.5  Single UNIX® Specification (SPEC 1170)
  4.21  User Interface and Graphics
    4.21.1  X11 and Motif™
    4.21.2  Indigo Magic™ User Environment
    4.21.3  Bundled Multimedia Products
    4.21.4  OpenGL®
    4.21.5  Open Inventor™
    4.21.6  ImageVision Library®
    4.21.7  Shells
  4.22  Filesystems
    4.22.1  File Services
    4.22.2  EFS
    4.22.3  XFS™
    4.22.4  NFS™
    4.22.5  ISO 9660
    4.22.6  DOS and Macintosh Floppies
    4.22.7  swap
    4.22.8  /proc
  4.23  Volume Management
  4.24  Networking Standards
  4.25  Memory Management
  4.26  Scheduling
    4.26.1  Gang Scheduling
    4.26.2  Deadline Scheduling
    4.26.3  Processor Sets
    4.26.4  Processor Cache Affinity
  4.27  Symmetric Multiprocessing and Multitasking
    4.27.1  Synchronization Primitives
    4.27.2  Lightweight Processes
  4.28  Enhancements to Support 64-Bit Computing
    4.28.1  Kernel Structures
    4.28.2  Include Structures
    4.28.3  Extended Linking Format
    4.28.4  Dynamic Shared Objects
  4.29  IRIX 6.2 Application Binary Options
    4.29.1  O32 for 32-Bit Applications
    4.29.2  64: Full Addressability and MIPS 4 Performance
    4.29.3  n32: Speed without Porting
  4.30  Development Environment
  4.31  Performance Services
    4.31.1  Asynchronous I/O
    4.31.2  Memory-Mapped I/O
    4.31.3  Direct I/O
  4.32  Real Time
  4.33  Supercomputing Features
    4.33.1  Fair Share Scheduler
    4.33.2  Checkpoint–Restart
    4.33.3  Extended Accounting
    4.33.4  Other Batch Scheduling Options
  4.34  Security
  4.35  Internationalization (I18n) and Localization (L10n)
  4.36  System Management
    4.36.1  Cadmin
    4.36.2  IRIXpro™
    4.36.3  Performance Co-Pilot™
  4.37  Backup
  4.38  Data Migration
  4.39  Compatibility
    4.39.1  COFF Obsoleted
    4.39.2  Binary Compatibility between IRIX 5.3 and IRIX 6.2
    4.39.3  Object Compatibility between IRIX 6.1 and IRIX 6.2

Chapter 5  MIPSpro Compiler Technology
  5.40  The MIPSpro™ Compiler
    5.40.1  Optimizations
    5.40.2  Memory Hierarchy (Cache) Optimizations
    5.40.3  Automatic and User-Assisted Parallelization
    5.40.4  High-Performance Scientific Libraries
    5.40.5  Software Development Support
  5.41  Optimization Technology in the MIPSpro Compilation System
    5.41.1  Processor Architecture
    5.41.2  System Architecture
    5.41.3  Application Characteristics
    5.41.4  Optimization Technology
    5.41.5  Basic Block Optimizations
    5.41.6  Global Optimizations
    5.41.7  Advanced Optimizations
    5.41.8  Floating-Point Optimizations
      5.41.8.1  Roundoff Option
      5.41.8.2  IEEE Option
      5.41.8.3  Reciprocal and Reciprocal Square Root
      5.41.8.4  Fast Intrinsics
    5.41.9  Global Code Motion
    5.41.10  Software Pipelining
    5.41.11  Pointer Optimization
    5.41.12  Loop Optimizations
      5.41.12.1  Loop Interchange
      5.41.12.2  Loop Distribution
      5.41.12.3  Loop Fusion
      5.41.12.4  Loop Blocking
      5.41.12.5  Loop Unrolling
      5.41.12.6  Loop Multiversioning
      5.41.12.7  Pattern Matching
    5.41.13  Parallelization
    5.41.14  Procedure Inlining
    5.41.15  Interprocedural Analysis (IPA)
  5.42  Porting Cray™ Code
    5.42.1  Language Extensions
    5.42.2  Compiler Directives for Scalar and Vector Performance
    5.42.3  Cray Compiler Parallel-Processing Directives
  5.43  References

Chapter 6  CHALLENGEcomplib: High-Performance Scientific and Math Libraries

Chapter 7  ProDev WorkShop: Application Development Tools
  7.44  ProDev™ WorkShop
  7.45  ProDev WorkShop Static Analyzer
  7.46  ProDev WorkShop Debugger
  7.47  ProDev WorkShop Performance Analyzer
  7.48  ProDev WorkShop Pro MPF

Chapter 1 Overview

In the last fourteen years, Silicon Graphics has supplied engineers, scientists, and creative professionals with hardware and software solutions that have redefined the state of the art. Silicon Graphics POWER CHALLENGE supercomputing servers, successors to the precedent-setting POWER Series™ of shared-memory multiprocessing systems, use faster, denser CMOS VLSI technology to continue leading the industry in symmetric multiprocessing (SMP).

1.1 Innovative Supercomputing Design

Figure 1-1 summarizes the Silicon Graphics corporate supercomputing vision.

[Figure: achievable performance plotted against cost, marking the engineering sweet spot between systems that are hard to build and program and those that are easy to build and program, with a linear cost building-block model and a consistent long-term programming model.]

Figure 1-1 Silicon Graphics Supercomputing Vision

It plots the achievable performance of computers against the cost of building the systems. The knee of the curve represents the point of diminishing marginal returns. To get higher performance, the designer ends up paying very high costs.

To achieve higher and higher system performance, some designers have either resorted to exotic device technologies, such as ECL or gallium arsenide, or have chosen an inherently expensive topology to interconnect processors.


Silicon Graphics aims to build systems using mass-produced technology and to push the technology just far enough so that the marginal returns on investment are significantly high. Silicon Graphics' ability to build high-performance systems without resorting to esoteric technologies derives from certain trends in technology:
• Continuing improvements in integrated circuit technology and computer architecture have driven microprocessors to performance levels that rival those of traditional supercomputers, at a fraction of the price.
• The use of sophisticated memory hierarchies enables microprocessor-based machines to have very large memories built from commodity DRAMs while retaining the high bandwidth and low access time needed in a high-performance machine.
• Advanced optimization techniques in compiler technology that maximize the utilization of system resources have become available.

These technology trends are discussed in greater detail in Chapter 2, “The Silicon Graphics Supercomputing Philosophy.”

1.2 History of High-End Computing at Silicon Graphics

One of the first companies to embrace RISC microprocessor-based system design, Silicon Graphics introduced shared-memory multiprocessing systems in 1988.

Beginning with a two-way symmetric shared-memory multiprocessor using the 16 MHz MIPS microprocessor, the POWER Series of multiprocessor systems was enhanced in computing power to contain up to eight 40 MHz MIPS R3000® microprocessors.

Figure 1-2 on page 3 shows the history of high-end computing at Silicon Graphics.

Here are some highlights of the evolution of multiprocessing systems at Silicon Graphics:
• IRIX has its roots in AT&T UNIX System V. IRIX 3 and IRIX 4 were SVR3-based operating systems. IRIX 4, the second-generation implementation, featured symmetric multiprocessing enhancements and refinements. The IRIX operating system implements sophisticated features in the process scheduler for resource utilization in a multiprocessing complex. Both IRIX 5 and IRIX 6 are SVR4-based operating systems. IRIX 6 is a true 64-bit operating system that maintains complete forward binary compatibility with earlier IRIX releases.
• The compiler family has evolved through three generations.


[Figure: GFLOPS performance plotted against time, 1990 to 1996. Milestones include the POWER Series systems (2 × 16 MHz R3000s at 2 MFLOPS, then 8 × 25 MHz, 8 × 33 MHz, and 8 × 40 MHz R3000s at 20, 30, and 45 MFLOPS) on the 64 MB per second POWERpath bus under IRIX 4 (32-bit SVR3); the 36-R4400 CHALLENGE at 2.7 GFLOPS under IRIX 5 (32-bit SVR4); and the POWER CHALLENGE systems on the 1.2 GB per second POWERpath-2 bus, first with 18 R8000s and then with 36 R10000s at 14.4 GFLOPS, under IRIX 6 (64-bit SVR4).]

Figure 1-2 The Growth of High-end Computing at Silicon Graphics


• The system hardware has evolved through two generations: the POWERpath and POWERpath-2 architectures. Noteworthy points in this evolution include:
– Bandwidth of the system bus has increased twentyfold, from 64 MB (megabytes) per second on the POWERpath bus to 1.2 GB (gigabytes) per second on the POWERpath-2 bus.
– Floating-point performance has increased by a factor of 7200 from the original POWER Series systems to the POWER CHALLENGE XL servers.
– Memory capacity has increased from 256 MB on the POWER Series servers to 16 GB of eight-way interleaved memory on POWER CHALLENGE multiprocessors.
– I/O bandwidth has increased from a maximum of 256 MB per second on the POWER Series to 1.2 GB per second on the POWER CHALLENGE multiprocessors. Connectivity options, including very high-performance FDDI and HiPPI, were greatly enhanced. Disk capacity has increased from a few hundred GB to 17 TB (terabytes) on POWER CHALLENGE XL.

1.3 Product Line Overview

The POWER CHALLENGE product line includes the low-priced deskside POWER CHALLENGE L; the mid-range, graphics-ready POWER CHALLENGE GR; the highly expandable rack-mounted POWER CHALLENGE XL; and the multiple-cabinet, high-scalability POWER CHALLENGEarray™. All POWER CHALLENGE servers feature MIPS R10000™ and R8000™ superscalar CMOS RISC CPUs. Figure 1-3 compares the four POWER CHALLENGE systems.

POWER CHALLENGE L:      2 to 12 R10000s (4.8 GFLOPS) or 1 to 6 R8000s (2.16 GFLOPS); up to 6 GB of memory; up to 960 MB per second of I/O
POWER CHALLENGE GR:     2 to 24 R10000s (9.6 GFLOPS) or 1 to 12 R8000s (4.3 GFLOPS); up to 16 GB of memory; up to 1 GB per second of I/O
POWER CHALLENGE XL:     2 to 36 R10000s (more than 13.5 GFLOPS) or 2 to 18 R8000s (5.4 GFLOPS); up to 16 GB of memory; up to 1.2 GB per second of I/O
POWER CHALLENGEarray:   2 to 288 R10000s (115.2 GFLOPS) or 2 to 144 R8000s (51.8 GFLOPS); up to 128 GB of memory; up to 9.6 GB per second of I/O

Figure 1-3 POWER CHALLENGE Systems


Highlights of these systems include:
• POWER CHALLENGE L is a deskside multiprocessor system that expands from two to 12 CPUs with secondary caches of either 1 MB or 4 MB. Memory can be expanded from 64 MB up to 6 GB and can be four-way interleaved. The system uses the Silicon Graphics HIO bus for high-performance I/O. Up to three such buses can be configured, each with a sustained bandwidth of 320 MB per second, for a total of 960 MB per second of I/O bandwidth. I/O can be expanded through the industry-standard VME64 bus.
• POWER CHALLENGE GR is a rack configuration supporting from two to 24 CPUs. Memory expandability ranges from 64 MB to 16 GB, and memory can be eight-way interleaved. With as many as four HIO buses, the GR's maximum I/O bandwidth is 1.2 GB per second. Because POWER CHALLENGE GR is graphics-ready, it can be upgraded with Onyx® InfiniteReality™ or RealityEngine2 subsystems at the factory or in the field. (For more information on advanced graphics subsystems from Silicon Graphics, refer to the Onyx® Technical Report.) I/O can be expanded through the industry-standard VME64 bus.
• POWER CHALLENGE XL is a highly configurable, high-end, rack multiprocessor system that can be expanded from two to 36 CPUs. Memory expandability is from 64 MB to 16 GB, and memory can be eight-way interleaved. This system can be configured with up to six HIO buses. I/O can be expanded through the industry-standard VME64 bus.
• POWER CHALLENGEarray is the Grand Challenge-class system, expandable from eight to 288 CPUs. Memory is expandable up to 128 GB, and I/O bandwidth up to 9.6 GB per second. POWERnodes in the array can include POWER CHALLENGE L, XL, and GR configurations. For more information on the POWER CHALLENGEarray, see the POWER CHALLENGEarray Technical Report.

All POWER CHALLENGE servers run the 64-bit IRIX 6 operating system and the MIPSpro compiler suite, and they support integrated operation with the Extreme™ Visualization Console.

1.4 Usage Environments

POWER CHALLENGE systems and their software are designed to provide supercomputing solutions for you in a variety of computational environments. POWER CHALLENGE multiprocessor servers can be configured to suit very high computational, memory, or I/O demands, depending on the workload. Typically, these servers are used in:
• Desktop supercomputing
• Dedicated project supercomputing
• Departmental supercomputing
• Enterprise supercomputing
• Grand challenge computing
• Interactive visual supercomputing


1.4.1 Desktop Supercomputing

In today’s computing environment, supercomputing has found its way to the desktop. For many applications, single-threaded, compute-intensive jobs can be effectively computed within the confines of a single desktop workstation. Alternatively, several independent jobs may be dispersed to systems accessible through a local area network to achieve a simple level of coarse-grained parallelism. High performance can be achieved on R10000- and R8000-based systems, including POWER Indigo2, Indigo2 IMPACT, Infinity Station, and POWER CHALLENGE L with the Visualization Console. Figure 1-4 depicts such a configuration.

[Figure: desktop supercomputing systems, including POWER Indigo2, Indigo2 Impact, Infinity Station, and POWER CHALLENGE with Visualization Console.]

Figure 1-4 Desktop Supercomputing

1.4.2 Dedicated Project Supercomputing

Dedicated project supercomputing is perhaps the best example of deployable supercomputing: individual projects can deploy one or more POWER CHALLENGE supercomputers to solve a particular kind of problem for an organization. For example, a POWER CHALLENGE can be dedicated to simulating molecular dynamics. Such a machine would run the same simulation codes again and again for a team of four to eight molecular biologists, providing a very low time to solution. Figure 1-5 diagrams dedicated project supercomputing.


[Figure: a POWER CHALLENGE L dedicated to a project team, with Indy™, Indigo®, Indigo2™, and Onyx desktop systems as clients.]

Figure 1-5 Dedicated Project Supercomputing

1.4.3 Departmental Supercomputing

The multidisciplinary needs of departments are amply met by the scalable POWER CHALLENGE machines. A typical example would be an automobile corporation, with a department of 20 to 30 people doing structural analysis as well as safety design of automobiles, and interacting with other corporate departments to share design data. The growing computational needs of the departments can easily be met by scaling the CPUs or the number of chassis configured. A rich selection of network connectivity—Ethernet, FDDI, HiPPI, or ATM—provides low-cost or high-performance connections to file servers and mass storage. The agility of configurations permitted by POWER CHALLENGE multiprocessors makes the computational center architect’s job easy. Figure 1-6 on page 8 diagrams departmental supercomputing.


[Figure: a departmental network with a CHALLENGE L file server and a POWER CHALLENGE XL compute server sharing Ethernet/FDDI/ATM connections and disk vaults, and routers linking Indy workstations and additional POWER CHALLENGE L systems.]

Figure 1-6 Departmental Supercomputing

1.4.4 Enterprise Supercomputing

Either a single large POWER CHALLENGE XL server or a number of smaller machines can be employed, as shown, with file servers and visual clients using high-performance networks as an enterprise-wide computing solution. The highly scalable I/O capability of POWER CHALLENGE servers, via either the HIO bus or VME64, offers connectivity options such as Ethernet, FDDI, HiPPI, SCSI-2, and ATM to hundreds of client workstations, file servers, high-performance rotating media, and data archival and retrieval systems. Figure 1-7 on page 9 diagrams enterprise supercomputing.


[Figure: an enterprise configuration with POWER CHALLENGE XL and L compute servers, an Onyx rack for simulation, Indy clients, RAID, disk vaults, and tape and CD robots, connected by Ethernet, routers, and a HiPPI switch.]

Figure 1-7 Enterprise Supercomputing

1.4.5 Grand Challenge Supercomputing

Multiple POWER CHALLENGE systems may be tightly coupled across HiPPI interconnects to form a POWER CHALLENGEarray, as shown in Figure 1-8. Each SMP-based POWERnode exploits fine-grained shared-memory parallelism internally as well as message-passing for communication among nodes. High-bandwidth interfaces support visualization and disk subsystems for visualizing and archiving computational results.


[Figure: two POWER CHALLENGE XL systems with CHALLENGE disk vaults and two POWER CHALLENGE GR systems with Onyx® InfiniteReality™ graphics, joined by HiPPI, driving multiple video output channels to a projection system or headsets.]

Figure 1-8 Using POWER CHALLENGEarray to Solve Grand Challenges

1.4.6 Interactive Visual Supercomputing

In many environments, tight integration of supercomputing and visualization technology is being exploited to define a new class of applications that allow users to interact with computations as they are being performed. Computational steering applications optimize time-to-insight by removing the batch/queue paradigm, thus providing individual users with full command of the computational resource through a high-performance, interactive visualization interface.

In such environments, a POWER CHALLENGE GR configured with Onyx® InfiniteReality™ or RealityEngine2 graphics provides a unique solution by integrating the visualization and computational subsystems into a single, high-performance system solution.


Chapter 2 The Silicon Graphics Supercomputing Philosophy

The Silicon Graphics design is founded upon these beliefs:
• Riding trends in integrated circuit technology, RISC microprocessors, and memory hierarchies can result in very high-performance supercomputers.
• It is essential to build scalable, high-performance, easily programmable supercomputers. A shared-memory programming model is common to all Silicon Graphics multiprocessor platforms.
• Building very high-performance supercomputers that scale down in price is important. This desire has direct implications for achieving the highest performance, which the most demanding applications require.
• Users should be able to develop applications on inexpensive, binary-compatible, low-end platforms and deploy them on high-performance supercomputers. High-bandwidth interconnects are needed to upload and download data sets to binary-compatible, state-of-the-art graphics workstations and high-performance file servers.
• High-performance visualization subsystems should be tightly integrated into the design of the supercomputer to enable new and emerging interactive supercomputing applications.

Silicon Graphics continues to capitalize on trends in technology. This chapter discusses:
• Trends in integrated circuit technology
• High-performance microprocessors
• Meeting memory demands
• Shared-memory programming
• Designing for economic scalability

2.5 Trends in Integrated Circuit Technology*

For the last 30 years, improvements in integrated circuit technology have increased computer performance and capability. During this time, continuous reductions in the size of transistors and wires have made it possible to increase the number of devices that can be put on a single silicon die. Further improvements in manufacturing technology have allowed die sizes to increase because the number of flaws is now much lower.

Decreases in the size of devices and lengths of wires also lead to faster transistors and faster interconnections, which lead to improvements in the speed of the chip. Thus, the chip architect is motivated to find ways to improve performance that take advantage of this increase in density as well as the faster transistors.

* This section was excerpted from “Microprocessors: From Desktops to Supercomputers,” by Forest Baskett and John L. Hennessy, Science, August 13, 1993, Volume 261, pp. 864-871.

The impact of these dramatic improvements in density is most easily seen in memory technology. Memory technology is of two main types: dynamic random access memory (DRAM) and static random access memory (SRAM). DRAM uses fewer transistors per bit of memory (one transistor versus four to six for SRAM) and thus is higher density. New memory technology offering a factor-of-four improvement in density has been introduced every three years, leading to tremendous increases in the number of bits per chip, as shown in Figure 2-1.

[Figure: bits per chip (thousands, log scale) versus year of introduction, 1978 to 1993, for DRAM and SRAM.]

Figure 2-1 Growth in Number of Bits on Single DRAM and SRAM Chips

Because of the advantage in density, DRAMs are cheaper per bit. In addition, since DRAMs have become the primary technology for building main memories, they also have the benefit of high-volume production; their cost advantage per bit is therefore even larger than their density advantage, typically six to ten times less per bit. Thus, for large main memories on computers ranging from PCs to supercomputers, DRAM is the technology of choice for building main memory.

Prices per bit of DRAM have decreased at roughly the same rate as density has increased. The commodity price for DRAMs in 1993 was about $25 per MB, which means that each 1 Mbit chip was selling for about $3. By 1995, this price had dropped to less than $10 per MB.

However, DRAM has a disadvantage: it is significantly slower than SRAM. This speed difference arises from the significant internal differences between SRAM and DRAM and from the standard low-cost package which DRAMs use. The standard DRAM package uses fewer pins to reduce cost, but this leads to increased access time. Low cost and density, as opposed to access time, are the focus of DRAM technology development. For example, since 1980 DRAM access times have decreased by more than a factor of three, while density has increased by a factor of more than 250.

Furthermore, microprocessor cycle times, which in 1980 were similar to DRAM access times, have improved by a factor of more than 100. The situation is somewhat better for SRAM technology, where access time is also emphasized. This large and growing gap between the rate at which the basic memory technology can cycle and the rate at which modern processors cycle could lead to fundamental difficulties in building faster machines. However, computer designers have used the concept of memory hierarchies to overcome this limitation.

2.6 High-Performance Microprocessors

In engineering high-performance microprocessors, designers take advantage of the improvements in technology both to increase the rate at which the processor operates and to increase the amount of work done per clock cycle. The amount of work done per clock cycle is typically measured by the CPI (Cycles Per Instruction). Together, the CPI and the clock rate determine the instruction execution rate:

    Instruction execution rate = Clock rate / CPI

Since instruction execution rate determines performance, we can maximize performance by increasing the clock rate and decreasing the CPI. The clock rate and the CPI can trade off against one another: by doing more work in a clock cycle, we can lower the CPI (since the number of clock cycles required will be less), but we may end up decreasing the clock rate by an equal factor, yielding no performance benefits. Thus, the challenge is to increase clock rate without increasing CPI and to decrease the CPI without decreasing the clock rate.
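
As a numerical illustration with hypothetical figures (not taken from this report), a 200 MHz processor that sustains an average CPI of 0.5, completing two instructions per cycle, executes 200 MHz / 0.5 = 400 million instructions per second, whereas the same clock rate at a CPI of 1.0 yields only 200 million instructions per second.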

Clock rate increases have been achieved both through the use of better technology that offers shorter switching delays and through better computer architectures that reduce the number of switching elements that must be traversed in a clock cycle. Figure 2-2 on page 16 shows the improvement in clock rate over time for microprocessors. The factor of 50 improvement in eight years derives from a combination of device speed, which has contributed about 40 percent of the improvement, and improved architecture, which has contributed roughly 60 percent of the improvement.

[Figure: clock rate (MHz, log scale) versus year of introduction, 1977 to 1992, for supercomputers and microprocessors.]

Figure 2-2 Clock Rate Improvement for Cray Supercomputers and High-performance Microprocessors

This factor-of-50 improvement compares to a factor-of-3 improvement in traditional supercomputer clock rates in the same time frame. The result of this, shown in Figure 2-2, is that microprocessor clock rates are nearly equal to those of supercomputers.

As mentioned earlier, the challenge is to increase clock rate and to decrease CPI. Designers make use of the reduced feature size to build smaller and faster processors. Simultaneously, designers take advantage of the even larger increase in device count to build machines that accomplish more per clock cycle. For scientific computation a good way to measure how much work is done in a clock period is to measure the number of FLoating-point OPerations (FLOPS) per clock. As the number of FLOPS per clock increases, the CPI will fall and performance will increase.

How do designers increase the number of FLOPS per clock? They implement architectural techniques that take advantage of parallelism among the instructions. There are two basic ways in which this can be done:
• Pipelining. Instructions are overlapped in execution and a new operation is started every clock cycle, even though it takes several clock cycles for one operation to complete. Pipelining is the computer architect’s version of an assembly line.
• Multiple issue. Several instructions or operations are started on the same clock cycle. The instructions can be executed in parallel.

In either case, the overlapping instructions must be independent of one another; otherwise, they cannot be executed correctly at the same time. In most machines, this independence property is checked by the hardware, though many compilers help out by trying to create independent instruction sequences that can be overlapped by the hardware.

Many machines combine these techniques. For example, vector machines allow a single vector instruction to specify a number of floating-point operations that are executed in pipelined fashion. When vector instructions are executed in parallel, multiple floating-point operations can be completed in a single clock cycle.

Microprocessors have used pipelining for about ten years, though the earlier machines used little pipelining in the floating-point units, since transistor counts did not allow it. Pipelined floating-point units became the norm in microprocessors about five years ago. In the last few years, several microprocessors have implemented a capability for issuing more than one instruction per clock. The result of this evolution has been a steady increase in the number of floating point operations per clock cycle.

There are two different aspects of this increase in instruction throughput per clock to consider: the peak floating point execution rate and the rate for a benchmark problem. Figure 2-3 shows the peak rate for floating-point operations; this data reflects both the clock rate and the peak floating point issue capability.

[Figure: millions of floating-point operations per second (log scale) versus year of introduction, 1977 to 1992, for Cray supercomputers and RISC microprocessors.]

Figure 2-3 Peak Floating-Point Execution Rates Measured with the 1000 x 1000 Linpack

Figure 2-4 shows the rate for a 1000 x 1000 Linpack problem, a common benchmark that involves doing an LU decomposition.

[Figure: floating-point operations per clock cycle (0 to 3.5) versus year of introduction, 1977 to 1992, for Cray supercomputers and RISC microprocessors.]

Figure 2-4 Floating-Point Operations per Clock, as Measured Using 1000 x 1000 Linpack

The dramatic improvements in integrated circuit technology have enabled microprocessors to close the performance gap with conventional supercomputers. The improvement in device speed and integration allows the entire CPU core to be placed on a single chip, yielding significant advantages in the speed of interconnections and allowing microprocessor clock speeds to approach, and in the near future surpass, the clock speeds of high-end vector supercomputers.

Furthermore, rapid improvements in density have allowed microprocessor architects to incorporate many of the techniques used in large machines for increasing the throughput per clock. The combined result is that microprocessor-based machines have rapidly closed the performance gap with supercomputers. Because of the enormous price advantage that microprocessors have over conventional supercomputers, microprocessors are becoming the computing engines of choice for all levels of computing.

2.7 Meeting Memory Demands

To maintain a high instruction-throughput rate, memory requests must be satisfied at a high rate. These memory requests consist of accesses for both instructions and data. In scientific and engineering applications, the access patterns for instructions and data are fairly specialized, which allows us to optimize the memory system accordingly. For example, for many scientific applications the program is much smaller than the data set. The program also exhibits high locality; that is, only a small portion of the program is heavily used during any interval. This temporal locality arises because the program often consists of nested loops that execute the same instructions many times. Temporal locality can be exploited by keeping recently accessed instructions in a place where they can be fetched quickly.

Data accesses often contain temporal locality, even with large data sets. Another form of locality in data accesses is spatial locality: the tendency to use data elements that are close together in memory at the same time. For example, in accessing a matrix, many programs access all the elements in a row or column in sequence. Another example is points in a mesh that are accessed in some sequential order. One can take advantage of spatial locality by retrieving, in parallel with a requested word, the memory words close to it, in the hope that the processor will need the nearby words soon.
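
As a hedged illustration (not code from this report), the C sketch below performs the same matrix reduction with and without spatial locality. C stores rows contiguously, so the row-order loop walks consecutive addresses and uses every word of each cache block it fetches, while the column-order loop strides through memory and touches a different block on nearly every access. The matrix size N is an assumed value for illustration.

    #define N 1024

    /* Good spatial locality: consecutive addresses within each row. */
    double sum_row_order(const double m[N][N])
    {
        double sum = 0.0;
        int i, j;
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                sum += m[i][j];         /* unit-stride accesses */
        return sum;
    }

    /* Poor spatial locality: each access strides over N doubles. */
    double sum_column_order(const double m[N][N])
    {
        double sum = 0.0;
        int i, j;
        for (j = 0; j < N; j++)
            for (i = 0; i < N; i++)
                sum += m[i][j];         /* a new cache block nearly every time */
        return sum;
    }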

Given locality in memory accesses and speed-versus-cost trade-offs between SRAMs and DRAMs, how can a memory system be organized that meets the demand of a high-performance CPU and also takes advantage of low-cost DRAM technology? The answer lies in using a hierarchy of memories with the memories closest to the CPU being composed of small, fast, but more expensive memories.

A memory hierarchy is managed so that the most recently used data is kept in the memories closer to the CPU. These memories, which hold copies of data in the levels further away from the CPU, are called caches. The lowest level of the memory hierarchy is built using the lowest cost (and lowest performance) memory technology, namely DRAMs. Figure 2-5 shows the typical structure in a memory hierarchy and some typical sizes and access times.

Level        Size               Access time
Registers    64 to 256 words    < 1 cycle
Cache 1      8 K words          1 to 2 cycles
Cache 2      256 K words        5 to 15 cycles
Memory       4 G words          40 to 100 cycles

(Registers and the first-level cache reside on the microprocessor chip.)

Figure 2-5 Typical Memory Hierarchy

The goal of this organization is to allow most memory accesses to be satisfied from the fastest memory, while still allowing most of the memory to be built from the lowest cost technology. Temporal locality is exploited in such a structure, since newly accessed data items are kept in the top levels of the hierarchy. Spatial locality is exploited by moving blocks consisting of multiple words from a lower level to an upper level, when a request cannot be satisfied in the upper level. Today, machines from low-end personal computers to high-speed multiprocessors use caches as the most cost-effective method to meet the memory demands of fast CPUs.

Vector supercomputers also use a memory hierarchy, but most such machines do not contain caches. Instead, most vector supercomputers contain a small set of vector registers and provide instructions to move data from the main memory to the vector registers in a high bandwidth bulk transfer. The goal in such a design is to be able to move an arbitrary vector from memory to the CPU as fast as possible. Moving an entire vector takes advantage of spatial locality. Once the vector is loaded into the CPU, the compiler tries to keep it in a vector register to take advantage of temporal locality. Since there are not many vector registers (typically, eight to 64) only a small number of vectors can be kept.

What are the important differences between these two approaches? One major advantage of a memory hierarchy with caches is that it can provide much lower average access time, since it can use the fastest parts for the upper levels of the cache and that memory is small, yielding a faster access. To reduce the long access time in a vector supercomputer, many such machines construct the main memory from SRAM. This reduces the time to access this large memory and makes it easier to provide high bandwidth. This approach, however, is much more expensive, since SRAM is six to ten times as expensive per bit. For example, using the hierarchy and main memory size shown in Figure 2-5 and assuming that SRAM is ten times as expensive per bit as DRAM, the memory hierarchy system is about one-tenth the cost of a system that uses only main memory implemented in SRAMs.

The major disadvantage of a cache memory hierarchy is that it typically provides lower bandwidth to the main memory. The result is that programs that do not exhibit good temporal or spatial locality are penalized. Although this has been a major drawback, recent progress in algorithms and compiler technology has led to the development of methods for improving the locality of access in programs. Although these techniques do not apply to all problems, many of the most important scientific problems are amenable to such techniques.

One technique, called blocking, can significantly reduce the number of requests to main memory, making caches work extremely well for algorithms with blockable data access. Another technique that is becoming important for capturing temporal locality more efficiently is called prefetching, in which compilers predict the need for data and cause it to be staged up the memory hierarchy at the right time.
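
As a hedged sketch of the blocking idea (not code from this report), the C loop nest below multiplies two matrices one tile at a time so that the tiles being worked on stay resident in the cache. The matrix size N and block size B are illustrative assumptions; in practice B is chosen so that the active tiles fit in the cache level being targeted.

    #define N 512
    #define B 64                /* assumed tile size; tune to the cache */

    void matmul_blocked(const double X[N][N], const double Y[N][N], double Z[N][N])
    {
        int i, j, k, ii, jj, kk;

        for (i = 0; i < N; i++)                  /* clear the result */
            for (j = 0; j < N; j++)
                Z[i][j] = 0.0;

        for (ii = 0; ii < N; ii += B)            /* walk over tiles */
            for (kk = 0; kk < N; kk += B)
                for (jj = 0; jj < N; jj += B)
                    for (i = ii; i < ii + B; i++)          /* work inside one tile */
                        for (k = kk; k < kk + B; k++) {
                            double xik = X[i][k];          /* reused across a whole tile row */
                            for (j = jj; j < jj + B; j++)
                                Z[i][j] += xik * Y[k][j];  /* unit-stride accesses */
                        }
    }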

Finally, caches are typically larger and more general purpose than vector registers, which makes cache-based memory hierarchies more efficient across a larger range of computing problems.

2.8 Shared Memory Programming

As microprocessors gain dominance in science and engineering computations, with less than one tenth the cost of conventional supercomputers, you may question whether multiple microprocessors can be used in parallel for single problems to further increase speed or make possible new and more ambitious applications. Research has demonstrated this potential in most of the application areas of science and engineering.

This section examines one way in which multiple microprocessors are used in parallel for single problems and for multiple problems: shared-memory multiprocessing. Shared memory in this context refers to the singularity of the machine address space. That is, memory in the machine is logically shared by all processes executing on any or all of the processors that make up the multiprocessing system. To the programmer, this situation implies implicit communication between multiple processes using loads and stores (machine instructions).

The opposite concept, distributed memory, makes use of multiple private address spaces. Irrespective of the physical organization of memory, this model implies that processors in a multiprocessing machine have logically private memories. To the programmer, this situation implies explicit communication between multiple processes using sends and receives (higher-level constructs usually involving a message-passing interface specification). To understand these two programming models, consider the way machines are programmed and used.
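
As a hedged illustration of the two models (not code from this report), the C sketch below communicates one value first through the shared address space, where an ordinary store by one thread is seen by a load in another, and then through an explicit send and receive, with a POSIX pipe standing in for the message channel of a private-address-space machine. All names and values are illustrative assumptions.

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static double shared_result;   /* shared address space: visible to every thread */
    static int channel[2];         /* pipe standing in for an explicit message channel */

    static void *store_producer(void *arg)
    {
        (void)arg;
        shared_result = 42.0;      /* an ordinary store communicates the value */
        return NULL;
    }

    static void *send_producer(void *arg)
    {
        double value = 42.0;
        (void)arg;
        write(channel[1], &value, sizeof value);        /* explicit send */
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        double received;

        pthread_create(&t, NULL, store_producer, NULL);
        pthread_join(t, NULL);                          /* wait, then read with an ordinary load */
        printf("shared memory: %.1f\n", shared_result);

        pipe(channel);
        pthread_create(&t, NULL, send_producer, NULL);
        read(channel[0], &received, sizeof received);   /* explicit receive */
        pthread_join(t, NULL);
        printf("message passing: %.1f\n", received);
        return 0;
    }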

2.8.1 Throughput Processing

Many applications, particularly commercial applications and most general-purpose technical applications, are inherently serial, or are very difficult to automatically parallelize. A good example of multiprocessing is the typical development environment workload, consisting of compilation, document editing, simulations, mail processing, database queries, printing, spreadsheet processing, and so on. These tasks are automatically time-sliced onto the available CPUs in the system. For this kind of workload, POWER CHALLENGE systems deliver very high throughput, since there are many high-performance CPUs executing these tasks simultaneously, as in Figure 2-6.

[Figure: multiple, independent tasks running on different CPUs, CPU 1 through CPU 8.]

Figure 2-6 Throughput Mode of Computations

This capability is referred to as N on N, meaning that N different tasks may be executing on N different CPUs in parallel, each task performing unique, serial work.

With the extraordinary throughput capabilities of POWER CHALLENGE architecture, it may be better to think of this capability as M on N, where M is the number of tasks to run and N is the number of processors. In a heavily loaded situation, M > N, and IRIX shows its powerful abilities to properly balance the execution of these many tasks while providing excellent response times and high system throughput.

2.8.2 Parallel Processing

Many applications exhibit parallelism that can be exposed by compilers such as the MIPSpro Power C and MIPSpro Power FORTRAN 77 compilers. In general, a normal application can be parallelized into an alternating sequence of serial and parallel sections. This is because there are actions the application takes which are inherently serial (opening files, initializing data areas, etc.) and actions which are potentially parallel (loops, subroutine calls, and more).

Conceptually, upon entering a parallel segment, the application splits into a number of threads which execute portions of the parallel segment at once. At the end of the parallel segment there exists a barrier where the threads are collected until all are finished executing, at which time the execution of the next serial segment begins. This kind of parallelism is referred to as 1 on N, meaning that one application is running on N CPUs. Figure 2-7 illustrates this situation.

[Figure: a nonparallel segment splits into threads dispatched on 8 CPUs for the parallel segment, with a barrier for synchronization before the next nonparallel segment.]

Figure 2-7 Parallel Mode of Computation
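
As a hedged sketch of the 1 on N pattern (not code from this report), the C program below makes the fork and join explicit with POSIX threads; in practice the MIPSpro parallelizing compilers generate an equivalent structure from ordinary loops. The thread count and array size are illustrative assumptions.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 8
    #define N        800000

    static double a[N], b[N];

    struct slice { int begin, end; };

    static void *parallel_segment(void *arg)     /* each thread handles one slice of the loop */
    {
        struct slice *s = arg;
        int i;
        for (i = s->begin; i < s->end; i++)
            a[i] = 2.0 * b[i] + 1.0;
        return NULL;
    }

    int main(void)
    {
        pthread_t threads[NTHREADS];
        struct slice slices[NTHREADS];
        int i, t;

        for (i = 0; i < N; i++)                  /* serial segment: initialize data */
            b[i] = (double)i;

        for (t = 0; t < NTHREADS; t++) {         /* split into threads: one application on N CPUs */
            slices[t].begin = t * (N / NTHREADS);
            slices[t].end   = (t + 1) * (N / NTHREADS);
            pthread_create(&threads[t], NULL, parallel_segment, &slices[t]);
        }
        for (t = 0; t < NTHREADS; t++)           /* barrier: collect the threads */
            pthread_join(threads[t], NULL);

        printf("a[%d] = %f\n", N - 1, a[N - 1]); /* next serial segment continues here */
        return 0;
    }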

For most parallel applications, there is a limit to the performance that can be achieved through parallelism because of various factors, including the overhead of communication between the threads of the application and the parallelism support provided by the underlying hardware. Most of the parallel speedup achievable occurs with between four and eight processors (see Figure 2-8 on page 23), although this number varies, depending on the application and the manner in which it is written.

[Figure: speedup versus number of CPUs (4, 6, 8) for three applications, compared with the ideal linear speedup.]

Figure 2-8 Typical Parallel Application Speedup

The IRIX operating system includes specific features for managing these kinds of threaded applications, including thread manipulation, shared-memory and high-performance user-level locks and semaphores. It is generally acknowledged that a shared-memory programming model provides the highest performance and the most efficient programming environment for most threaded applications.

2.8.3 Parallel Throughput Processing

Workloads that include multiple parallel applications are often characterized by one or more of the following:
• More than one user of the same parallel application or a different parallel application
• Parallel applications that typically achieve the best efficiency with a small number of parallel threads, usually four to eight, this number being less than the maximum number of CPUs in POWER CHALLENGE multiprocessors
• Loosely coupled message-passing applications, which in most respects resemble a typical multiprocessor workload
• High multiprocessor workload, in addition to a parallel-execution workload, to satisfy the general-purpose and non-parallel application needs of the users

POWER CHALLENGE servers are excellent for these types of workloads. Parallel throughput refers to the performance achievable for multiple parallel applications and serial applications running at the same time on a multiprocessor. Figure 2-9 on page 24 illustrates this workload.

[Figure: four tasks, each made up of multiple threads, running together on the multiprocessor.]

Figure 2-9 Parallel Throughput Mode of Computations

As one might imagine, mixing these different kinds of parallel processing places a number of interesting demands upon the underlying operating system. The operating system must be able to quickly start a number of threads for an application when entering a parallel segment, and then just as quickly shut them down when leaving it, while normal computing applications and other parallel applications execute at the same time. Balancing the demands of competing applications to achieve the greatest possible system throughput, the highest performance for parallel applications, and the fastest possible response time, is truly a grand challenge for operating system designers.

In summary, the shared-memory, single-address-space programming model is not only the most intuitive way to program computers, but a very efficient one as well. This efficiency depends on the amount of data sharing that goes on in parallel processing. Workstation clusters, which are an extreme example of a multiple-address-space programming model, scale well for throughput processing or for embarrassingly parallel codes, but are unsuitable for codes which have a poor computation-to-communication ratio. In such clusters, the communication to synchronize between multiple threads of the same process occurs on a network link such as Ethernet or FDDI. These links provide much less bandwidth for synchronization than either busses or specially designed high-bandwidth interconnects.

Providing a shared-memory, single-address-space programming model does not come for free. The hardware designer faces the challenge of providing hardware cache coherence, whether the memory is physically shared or replicated closer to each individual processor.

2.9 Designing for Economic Scalability

The recent trend in the industry toward massively parallel processing has focused attention on the problem of achieving scalability to thousands of processors. Many interconnection topologies have arisen out of efforts to maximize the interconnection bisection bandwidth and to minimize the latency of access to remote private memories. The goal of achieving scalability to a large number of processors requires, by necessity, a connectivity infrastructure that prohibits economical scaling of such machines down into the deployable, affordable range.

At Silicon Graphics, the emphasis is on designing multiprocessors with a moderate range of parallelism, while scaling down to very affordable low-end multiprocessors in the same family, based on the same technology.

Ultra-high performance may be achieved with multiple shared-memory multiprocessor systems connected by a very high bandwidth interconnect, such as HiPPI, in an optimized topology. POWER CHALLENGEarray is an interconnection of high-performance, shared-memory multiprocessors with demonstrable performance on grand challenge problems. The POWER CHALLENGEarray approach, while involving no more work than traditional distributed-memory, private-address-space multiprocessors, has the great advantage of a high computation-to-communication ratio, which means that the amount of message passing is low compared to the amount of work done for each message sent.

[Figure: the scalable family, from POWER CHALLENGE L and POWER CHALLENGE GR through POWER CHALLENGE XL to POWER CHALLENGEarray.]

Figure 2-10 Silicon Graphics Scalable Supercomputer Family

No matter how affordable or deployable a supercomputer is, it is still highly desirable to use the large multiprocessor machine for production runs, while program development and debugging is done on decentralized computers, preferably desktops. Silicon Graphics provides this environment with desktop platforms based on the same MIPS RISC microprocessor. From a choice of graphics options and a complete suite of development and debugging tools on the low-cost desktop workstations to a scalable family of powerful supercomputers, the Silicon Graphics family of products provides a complete development and deployment environment for high-performance computing needs.


Chapter 3 POWER CHALLENGE Hardware Architecture

POWER CHALLENGE supercomputers are shared-memory multiprocessor computers designed to give engineers and scientists powerful computational, design, and visualization tools at aggressive price/performance levels. Balance and scalability across processor, memory, and I/O subsystems provide supercomputer performance at a price associated with commodity microprocessors.

For high performance and flexible configuration to meet the needs of wide-ranging application environments, the POWER CHALLENGE series features:
• Parallel processing for running a single application on multiple processors simultaneously to improve time to solution
• Multiprocessing throughput for running many applications simultaneously on different CPUs for increased system throughput
• File and network services for accessing and manipulating data residing in large secondary storage systems or on other network-interconnected computers

The POWER CHALLENGE system consists of at least one of each of the following: quad R10000™ CPU boards, interleaved memory boards, and POWERchannel-2™ I/O boards, plus a wide variety of I/O controllers and peripheral devices. These boards are interconnected by the POWERpath-2 bus, which provides high-bandwidth, low-latency, cache-coherent communication between processors, memory, and I/O.

To provide a cost-effective solution for every need, POWER CHALLENGE is available in either a deskside configuration, supporting up to 12 processors, or a rack configuration, supporting up to 36 processors and very large memory and I/O subsystems. Table 1 summarizes maximum system configurations:

Table 1: POWER CHALLENGE XL Maximum System Configurations (a)

Component       Description
Processor       36 MIPS R10000 streaming superscalar RISC
Main memory     16 GB, 8-way interleaving
I/O bus         6 POWERchannel-2, each providing 320 MB per second I/O bandwidth
SCSI channels   42 fast and wide independent SCSI-2 channels
Disk            17.4-TB disk (RAID) or 5.6-TB disk (non-RAID)
Connectivity    4 HiPPI channels, 20 Ethernet channels, 6 FDDI channels, 8 ATM channels
VME slots       5 VME64 expansion buses provide 25 VME64 slots

(a) For expansion beyond XL cabinet restraints, see the POWER CHALLENGEarray Technical Report.

This chapter describes:

• R10000 processor architecture
• R8000 processor architecture
• POWERpath-2 coherent interconnect
• Synchronization
• Memory subsystem
• I/O subsystem
• HIO bus modules
• VME bus
• Peripherals

3.10 R10000™ Processor Architecture

The R10000 microprocessor from MIPS Technologies is a four-way superscalar architecture that fetches and decodes four instructions per cycle. Each decoded instruction is appended to one of three instruction queues, and each queue can perform dynamic scheduling of instructions. The queues determine the execution order based on the availability of the required execution units. Instructions are initially fetched and decoded in order, but can be executed and completed out of order, allowing the processor to have up to 32 instructions in various stages of execution. The impressive integer and floating-point performance of the R10000 makes it ideal for applications such as scientific computing, engineering workstations, 3-D graphics workstations, database servers, and multi-user systems. The high throughput is achieved through the use of wide, dedicated data paths and large on- and off-chip caches.

The R10000 microprocessor implements the MIPS IV instruction set architecture. MIPS IV is a superset of the MIPS III instruction set architecture and is backward compatible. The R10000 microprocessor delivers peak performance of 800 MIPS (400 MFLOPS) with a peak data transfer rate of 3.2GB per second to secondary cache. The R10000 microprocessor is available in a 599 CLGA package and is fabricated using a CMOS sub-.35-micron silicon technology.

Key features of the R10000 include:
• ANDES Advanced Superscalar Architecture
– Supports four instructions per cycle
– Two integer and two floating-point execute instructions plus one load/store per cycle
• High-performance design
– 3.3 volt technology
– Out-of-order instruction execution
– 128-bit dedicated secondary cache data bus
– On-chip integer, FP, and address queues
– Five separate execution units
– MIPS IV instruction set architecture
• High Integration Chip-Set
– 32 KB 2-way set associative, 2-way interleaved data cache with LRU replacement algorithm
– 32 KB 2-way set associative instruction cache
– 64 entry translation lookaside buffer
– Dedicated second level cache support
• Second Level Cache Support
– Dedicated 128-bit data bus
– Generation of all necessary SSRAM signals
– 3.2 GB per second peak data transfer rate
– Programmable clock rate to SSRAM
• Compatible with Industry Standards
– ANSI/IEEE Standard 754-1985 for binary floating-point arithmetic
– MIPS III instruction set compatible
– Conforms to MESI cache consistency protocol
– IEEE Standard 1149.1/D6 boundary scan architecture
• Avalanche Bus System Interface
– Direct connect to SSRAM
– Split transaction support
– Programmable interface

3.10.1 Modern Computing Challenges

Today’s microprocessor architectures outperform their earlier counterparts by orders of magnitude. Such radical increases in performance, speed, and transistor count from generation to generation, often separated by only a few years, can seem overwhelming to the casual observer.

Although current microprocessor designs vary greatly, there are many commonalities between them. Each one performs address generation, each contains arithmetic logic units, register files, and a system interface. Most have on-chip caches, a translation lookaside buffer (TLB), and almost all current architectures have on-chip floating-point units.

Different design techniques perform these basic functions, but the nature and existence of these functions and the need to perform them lend themselves to inherent problems that must be overcome. This section discusses some of the common computing challenges faced by all microprocessor designers. Section 3.10.2 on page 32 discusses some of the techniques used to overcome these challenges. Section 3.10.3 on page 35 discusses how the MIPS R10000 microprocessor implements the techniques discussed in Section 3.10.2.

3.10.1.1 Memory and Secondary Cache Latencies

Early microprocessors had to fetch instructions directly from memory. Historically, memory access times have lagged far behind the data-rate requirements of the processor. After issuing a request for data, the processor had to wait long periods of time for the data to return. This severely hampered the processor’s ability to operate efficiently at the speeds for which it was designed.

Implementation of off-chip secondary cache memory systems has helped to address this problem. A cache memory system comprises a small amount of memory, normally 32-256 KB, that holds a block of memory addresses covering a small portion of main memory. Cache memory has much faster access times and can deliver data to the processor at a much higher rate than main memory.

On-chip cache memory systems can greatly improve processor performance because they often allow an access to be completed in one cycle. The performance gains of on-chip cache systems have caused many architectures to dedicate increasing amounts of space and logic to cache design; many cache designs require up to half of the total die. Performance is highest when the application can run within the cache. However, when the application is too large for the cache, performance decreases significantly. Figure 3-1 on page 30 shows the relationship between application performance and size.

[Figure: application performance (vertical axis) plotted against application size (horizontal axis) for a traditional microprocessor and the R10000, across workloads ranging from desktop productivity through database, decision support, technical, SPEC benchmark, and scientific applications. Performance falls off once the application is too large to fit in cache.]

Figure 3-1 Application Performance versus Size

The on-chip cache contains a range of addresses which comprise a subset of those addresses in the secondary cache. In turn, the secondary cache contains a range of addresses which comprise a subset of those addresses in main memory. Figure 3-2 shows the relationship between caches in a typical computer system.

[Figure: memory hierarchy of a typical computer system. The CPU is backed by an on-chip cache (8-64 KB typical), a secondary cache (32-256 KB typical), and main memory (1-64 MB typical), with memory latency increasing at each level away from the CPU.]

Figure 3-2 Memory Relationships in a Typical Computer System

As beneficial as on-chip cache systems are to processor performance, current semiconductor technology and available transistor counts limit cache size. Currently, 64 KB (32 KB data, 32 KB instruction) is a large on-chip cache, requiring several million transistors to implement.

The size limits on on-chip caches place increasing importance on secondary cache systems, where cache size is limited only by the cost constraints of the market into which the product is sold. However, cache memory has its limitations.

The access times of most currently available RAM devices are long relative to processor cycle times, forcing the memory system designer to find ways to hide them. Interleaving the cache system is one way to accomplish this. Interleaved cache memory systems allow processor memory requests to be overlapped. Both cache and main memory can be interleaved; two-way and four-way interleaving are common. Increasing the amount of interleaving hides more of the access and recovery time of each bank, but requires additional complexity to support. See "Interleaved Memory" on page 34 for a further discussion of memory interleaving.

3.10.1.2 Data Dependencies

In a computer program, instructions are fetched from the instruction cache, decoded, and executed. The corresponding data is often fetched from a register, manipulated within an ALU, and the result placed either in the same register or perhaps in another register.

Data dependency occurs when the next instruction in the sequence requires the result of the previous instruction before it can execute. This particularly hurts instructions requiring many cycles, because execution of the second instruction must wait until the first instruction completes and its result is written to the register. Some dependencies can be avoided by rearranging the program so that the result of a given instruction is not used by the next few instructions, as illustrated in the sketch below.
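
The following C fragment is a minimal sketch of that rearrangement idea; the function names and constants are invented for illustration only. In the first version each statement consumes the result of the one before it, so a long-latency multiply stalls the whole chain; in the second, two independent chains are interleaved so their latencies can overlap.

#include <stdio.h>

/* Dependent chain: each multiply needs the previous result,
 * so a long-latency multiply delays every following statement. */
static double dependent_chain(double x) {
    double t = x * 1.1;   /* result needed by the next line */
    t = t * 1.2;          /* waits for the previous multiply */
    t = t * 1.3;          /* waits again */
    t = t * 1.4;
    return t;
}

/* Two independent chains interleaved: the second chain's multiplies
 * do not depend on the first chain's results, so they can proceed
 * while the first chain's results are still being computed. */
static double interleaved_chains(double x, double y) {
    double a = x * 1.1;
    double b = y * 2.1;   /* independent of a */
    a = a * 1.2;
    b = b * 2.2;          /* independent of a */
    a = a * 1.3;
    b = b * 2.3;
    return a + b;
}

int main(void) {
    printf("%f\n", dependent_chain(3.0));
    printf("%f\n", interleaved_chains(3.0, 4.0));
    return 0;
}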

Out-of-order execution using register renaming helps alleviate data dependency problems, as explained in Section 3.10.3 on page 36.

3.10.1.3 Branches

All computer programs contain branches. Some branches are unconditional, meaning that the program flow is interrupted as soon as the branch instruction is executed. Other branches are conditional, meaning that the branch is taken only if certain conditions are met. Program flow interruption is inherent to all computer software and the microprocessor hardware has little choice but to deal with branches in the most efficient way possible.

When a branch is taken, the new address at which the program is to resume may or may not reside in the secondary cache. The latency increases depending on where the new instruction block is located. Since the access times of main memory and the secondary cache are far greater than that of the on-chip cache, as shown in Figure 3-2, branching can often degrade processor performance.

The branching problem is further compounded in superscalar machines, where multiple instructions are fetched every cycle and progress through the stages of a pipeline toward execution. At any given time, depending on the depth of the pipeline, numerous instructions can be in various stages of execution. When a conditional branch instruction enters the pipeline, it is not known until many cycles later, when the branch is actually executed, whether or not it should be taken.

Implementing branching efficiently is an important architectural problem. To improve performance, many current architectures incorporate branch prediction circuitry, which can be implemented in a number of ways. Section 3.10.4 on page 37 discusses some commonly used branch prediction techniques.

3.10.2 Tolerating Memory Latency

Memory latency reduction is a critical issue in increasing processor performance. This section discusses some of the common architectural techniques used to reduce memory latency.

3.10.2.1 High-Bandwidth Secondary Cache Interface

In an ideal secondary cache interface, the cache receives a data request from the processor and can always return data in the following clock. This is referred to as a true zero wait-state cache. To design a secondary cache that can approach zero wait state performance, the processor’s system interface must be designed such that data can be transferred at the maximum rate allowed.

The address and data busses of most processors interface to the entire computer system. Any number of different devices can be accessed by the processor at any given time.

Whenever an on-chip cache miss occurs, an address is driven out onto the bus and the secondary cache is accessed, transferring the requested data to the on-chip cache.

If an on-chip cache miss occurs in a shared-bus system while the processor is using the external bus to read or write some other device, the access to secondary cache must wait until the external data and address busses are free. This can take many cycles, depending on the peripheral being accessed.

In a dedicated bus system, the data, address, and control busses for the secondary cache are separate from those which interface to the rest of the system. These busses allow secondary cache accesses to begin immediately following an on-chip cache miss, regardless of what else is happening in the system.

Figure 3-3 on page 33 is a block diagram of both a shared and a dedicated secondary cache interface. Refer to Section 3.10.9.1 on page 44 for more information on the dedicated secondary cache interface of the R10000 microprocessor.

[Figure: block diagrams of a shared cache bus interface and a dedicated cache bus interface. In the shared design, the CPU reaches the secondary cache, main memory, and I/O over common data, address, and control busses. In the dedicated design, separate CacheData, CacheAddr, and CacheCntrl busses connect the CPU directly to the secondary cache, while SysData, SysAddress, and SysControl connect it to main memory and I/O.]

Figure 3-3 Dedicated Secondary Cache Bus Interface

3.10.2.2 Block Accesses

When an on-chip cache miss occurs, a programmable number of bytes is transferred each time the secondary cache is accessed. This number is referred to as the cache line size. A common line size for many current architectures is 32 bytes.

The number of accesses required to perform a line fill depends on the size of the external data bus of the processor. For example, a processor with a 64-bit data bus interfacing to a 64-bit-wide memory and performing a 32-byte (256-bit) cache line fill would require four secondary cache accesses to fetch all of the data. To accomplish this, the processor must generate four separate addresses and drive each one out onto the external address bus, along with the appropriate control signals.

Block access mode allows the processor to generate only the beginning address of the sequence. The remaining three addresses are generated either by cache control logic, or within the cache RAM itself. The R10000 microprocessor system interface supports block accesses.
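
The short C fragment below is a sketch of the addressing pattern just described, assuming a 32-byte line and a 64-bit bus (both assumptions for illustration): the processor supplies only the starting address, and the remaining beat addresses are derived by stepping through the block.

#include <stdio.h>

#define LINE_SIZE_BYTES 32u   /* assumed cache line size */
#define BUS_WIDTH_BYTES  8u   /* assumed 64-bit external data bus */

/* Given the address that missed, list the beat addresses a block
 * access would cover.  Only the first address comes from the
 * processor; the rest are derived by the cache control logic. */
static void block_access(unsigned miss_addr) {
    unsigned line_base = miss_addr & ~(LINE_SIZE_BYTES - 1u);
    unsigned beats = LINE_SIZE_BYTES / BUS_WIDTH_BYTES;   /* 4 beats */

    printf("processor drives 0x%08x only\n", line_base);
    for (unsigned i = 0; i < beats; i++)
        printf("  beat %u: 0x%08x\n", i, line_base + i * BUS_WIDTH_BYTES);
}

int main(void) {
    block_access(0x1000A4u);   /* arbitrary address for illustration */
    return 0;
}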

3.10.2.3 Interleaved Memory

Interleaving is a technique used to increase memory bandwidth. The concept of interleaving can be applied both to secondary cache and main memory.

Simple memory systems have one bank of memory. If the memory is accessed, some amount of time must pass before it can be accessed again. This time depends both on the system design as well as the speed of the memory devices being used. Having multiple banks allows bank accesses to be overlapped. The ability to overlap bank accesses helps to hide these inherent memory latencies and becomes increasingly important as the amount of data requested increases.

A typical interleaved memory system has even and odd banks. For example, the processor places a request for data at an even address. The memory control logic then initiates a cycle to the even bank. Once the address has been latched by the memory control logic, the processor can generate a new address, often in the next clock. If the new address falls in the odd bank of memory, that access can begin immediately because the odd bank is currently idle. By the time the access time of the even bank has elapsed and its data has been returned, the odd bank is also ready to return data. Zero-wait-state performance is achievable as long as back-to-back accesses to the same bank are kept to a minimum.

Two-way and four-way interleaved memory systems are the most common. The number of banks and the data bandwidth of each is often determined by the processor. For example, if the cache line size of the processor is 32 bytes, then each time a memory access is initiated 32 bytes must be returned to the processor. Since 32 bytes is 256 bits, a common approach is to have four banks of 64 bits each. This scheme requires a processor with a 64-bit data bus in order to avoid any external multiplexing of data. Each bank is accessed in an order determined by the processor. Section 3.10.5.1 on page 38 discusses the interleaving characteristics of the R10000 microprocessor.
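
As a minimal sketch of the four-bank, 64-bit-wide example above, the fragment below maps an address to a bank by taking the address bits just above the 64-bit word offset; the bank-selection rule is an assumption chosen so that consecutive words fall in consecutive banks and their accesses can overlap.

#include <stdio.h>

#define NUM_BANKS        4u   /* assumed four-way interleave */
#define BANK_WIDTH_BYTES 8u   /* each bank returns 64 bits per access */

/* Select the bank from the address bits just above the byte offset,
 * so consecutive 64-bit words land in consecutive banks. */
static unsigned bank_of(unsigned addr) {
    return (addr / BANK_WIDTH_BYTES) % NUM_BANKS;
}

int main(void) {
    /* Walk one 32-byte line: the four 64-bit words hit banks 0..3. */
    for (unsigned addr = 0x2000u; addr < 0x2020u; addr += BANK_WIDTH_BYTES)
        printf("addr 0x%04x -> bank %u\n", addr, bank_of(addr));
    return 0;
}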

3.10.2.4 Non-Blocking Cache

In a typical implementation, the processor executes out of the cache until a cache miss is taken. A number of cycles elapse before data is returned to the processor and placed in the on-chip cache, allowing execution to resume. This type of implementation is referred to as a blocking cache because the cache cannot be accessed again until the cache miss is resolved.

Non-blocking caches allow subsequent cache accesses to continue even though a cache miss has occurred. Locating cache misses as early as possible and performing the required steps to solve them is crucial in increasing overall cache system performance. Figure 3-4 shows an example of how a blocking and non-blocking cache would react to multiple cache misses.

[Figure: timing of multiple cache misses. In a blocking cache the operations are performed serially: address A1 is issued, the cache is blocked until data D1 returns, and only then can address A2 be issued and data D2 returned, so the two miss latencies add. In a non-blocking cache the operations are overlapped: A2 is issued while the first miss is still outstanding, so D1 and D2 return back to back and much of the second latency is hidden.]

Figure 3-4 Multiple Misses in a Blocking and Non-Blocking Cache

The major advantage of a non-blocking cache is the ability to stack memory references by queuing up multiple cache misses and servicing them simultaneously. The sooner the hardware can begin servicing the cache miss, the sooner data can be returned.
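
The fragment below is a minimal sketch of the bookkeeping a non-blocking cache needs: a small table of outstanding misses so that later accesses can proceed while earlier ones are being serviced. The table size, names, and interface are invented for illustration and are not the R10000's actual structures.

#include <stdbool.h>
#include <stdio.h>

#define MAX_OUTSTANDING 4   /* assumed limit on in-flight misses */

struct miss_entry {
    bool     valid;
    unsigned line_addr;     /* address of the missing cache line */
};

static struct miss_entry miss_queue[MAX_OUTSTANDING];

/* Record a miss if a slot is free; return false if the queue is full,
 * in which case even a non-blocking cache must stall. */
static bool record_miss(unsigned line_addr) {
    for (int i = 0; i < MAX_OUTSTANDING; i++) {
        if (!miss_queue[i].valid) {
            miss_queue[i].valid = true;
            miss_queue[i].line_addr = line_addr;
            return true;
        }
    }
    return false;
}

/* Called when memory returns a line: free the matching entry. */
static void miss_returned(unsigned line_addr) {
    for (int i = 0; i < MAX_OUTSTANDING; i++)
        if (miss_queue[i].valid && miss_queue[i].line_addr == line_addr)
            miss_queue[i].valid = false;
}

int main(void) {
    record_miss(0x1000);    /* first miss outstanding */
    record_miss(0x2000);    /* second miss queued behind it */
    miss_returned(0x1000);  /* data for the first miss comes back */
    printf("entry 0 valid: %d, entry 1 valid: %d\n",
           miss_queue[0].valid, miss_queue[1].valid);
    return 0;
}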

3.10.2.5 Prefetch

Prefetching of instructions is a technique whereby the processor can request a cache block prior to the time it is actually needed. The prefetch instruction must be integrated as part of the instruction set and the appropriate hardware must exist to execute the prefetch instruction.

For example, assume a program is progressing sequentially through a segment of code. The compiler can assume that this sequence will continue beyond the range of addresses available in the on-chip cache and issue a prefetch instruction that fetches the next block of instructions in the sequence and places it in the secondary cache. Therefore, when the processor requires the next sequence, the block of instructions already exists in the secondary cache or a special instruction buffer, rather than in main memory, and can be fetched by the processor at a much faster rate. If for some reason the block of instructions is not needed, the area in the secondary cache or the buffer is simply overwritten with other instructions.

Prefetching allows the compiler to anticipate the need for a given block and place it as close to the CPU as possible.
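
The loop below sketches the usual compiler-visible pattern: request data a fixed distance ahead of where it is being used. The prefetch_hint macro is a hypothetical placeholder standing in for whatever prefetch mechanism the instruction set and compiler provide (MIPS IV defines a prefetch instruction for this purpose); here it is a no-op so the example still compiles, and the distance of 8 elements is an arbitrary assumption.

#include <stdio.h>

/* Hypothetical stand-in for a prefetch instruction or compiler
 * built-in; a no-op here so the sketch remains portable. */
#define prefetch_hint(addr) ((void)(addr))

#define N        1024
#define DISTANCE 8         /* assumed prefetch distance, in elements */

static double a[N], b[N];

int main(void) {
    double sum = 0.0;
    for (int i = 0; i < N; i++) {
        if (i + DISTANCE < N)
            prefetch_hint(&a[i + DISTANCE]);  /* ask for data early */
        sum += a[i] * b[i];                   /* use data fetched earlier */
    }
    printf("%f\n", sum);
    return 0;
}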

3.10.3 Data Dependency

Two common techniques are used to reduce the negative performance impact of data dependencies. Each of these is discussed below.

3.10.3.1 Register Renaming

Register renaming distinguishes between logical registers, which are referenced within instruction fields, and physical registers, which are located in the hardware register file. Logical registers are dynamically mapped onto physical register numbers using mapping tables that are updated after each instruction is decoded. Each new result is written into a new physical register. However, the previous contents of each logical register are saved and can be restored in case the instruction must be aborted following an exception or an incorrect branch prediction.

As the processor executes instructions, a multitude of temporary register results are generated. These temporary values are stored in register files along with the permanent values. A temporary value becomes the new permanent value when the corresponding instruction graduates. An instruction graduates when all previous instructions have been successfully completed in program order.

The programmer is aware of only the logical registers; the implementation of physical registers is hidden. As described above, logical register numbers are dynamically mapped onto physical register numbers through mapping tables that are updated after each instruction is decoded, and each new result is written into a new physical register. Until the corresponding instruction graduates, however, that value is considered temporary.

Register renaming simplifies data dependency checking. In a machine that can execute instructions out of order, logical register numbers can become ambiguous because the same register may be assigned a succession of different values. But because physical register numbers uniquely identify each result, dependency checking becomes unambiguous. Section 3.10.7 on page 40 discusses how the R10000 microprocessor implements register renaming.
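
The fragment below is a minimal sketch of the mapping-table and free-list bookkeeping described above. The register-file sizes, names, and interface are assumptions made for illustration; they do not describe the R10000's actual rename hardware.

#include <stdio.h>

#define NUM_LOGICAL  32   /* architectural registers */
#define NUM_PHYSICAL 64   /* assumed physical register file size */

static int map[NUM_LOGICAL];          /* logical -> physical mapping */
static int free_list[NUM_PHYSICAL];   /* pool of unused physical regs */
static int free_count;

static void init_rename(void) {
    for (int l = 0; l < NUM_LOGICAL; l++)
        map[l] = l;                   /* identity mapping at reset */
    free_count = 0;
    for (int p = NUM_LOGICAL; p < NUM_PHYSICAL; p++)
        free_list[free_count++] = p;  /* the rest start out free */
}

/* Rename the destination of one instruction.  The previous mapping is
 * handed back so it can be saved and restored if the instruction is
 * later aborted (exception or branch misprediction). */
static int rename_dest(int logical, int *old_physical) {
    *old_physical = map[logical];
    map[logical]  = free_list[--free_count];   /* allocate a new reg */
    return map[logical];
}

int main(void) {
    init_rename();
    int old;
    int p1 = rename_dest(5, &old);   /* first write to logical r5 */
    int p2 = rename_dest(5, &old);   /* second write gets a fresh reg, so
                                        the two results never clash */
    printf("r5 -> p%d, then p%d (previous was p%d)\n", p1, p2, old);
    return 0;
}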

3.10.3.2 Out-of-Order Execution

In a typical pipelined processor, which executes instructions in order, each instruction depends on the previous instructions that produce its operands. Execution cannot begin until those operands become valid; if they are not, the pipeline stalls until they become valid. Because instructions execute in order, stalls usually delay all subsequent instructions.

In an in-order superscalar machine where multiple instructions are fetched each cycle, several consecutive instructions can begin execution simultaneously if all corresponding operands are valid. However, the processor stalls at any instruction whose operands are not valid.

In an out-of-order superscalar machine, each instruction can begin execution as soon as its operands become available, regardless of the original instruction sequence. The hardware effectively rearranges instructions to keep the various execution units busy. This process is called dynamic issuing. Section 3.10.7.1 on page 41 discusses the out-of-order implementation used in the R10000 microprocessor.
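
As a minimal sketch of the issue-when-ready rule, the fragment below scans a small instruction queue and issues any entry whose source registers hold valid values, regardless of program order. The queue contents, register counts, and the simplification that a result is "ready" as soon as its producer issues are all assumptions for illustration, not the R10000's queue logic.

#include <stdbool.h>
#include <stdio.h>

#define NUM_REGS  64
#define QUEUE_LEN  4

struct queued_instr {
    int  src1, src2, dest;
    bool issued;
};

static bool reg_ready[NUM_REGS];

/* Issue every queued instruction whose source operands are ready,
 * in whatever order readiness allows (dynamic issue). */
static void issue_ready(struct queued_instr *q, int n) {
    for (int i = 0; i < n; i++) {
        if (!q[i].issued && reg_ready[q[i].src1] && reg_ready[q[i].src2]) {
            q[i].issued = true;
            reg_ready[q[i].dest] = true;   /* result will be produced */
            printf("issued instruction %d\n", i);
        }
    }
}

int main(void) {
    for (int r = 0; r < 8; r++)
        reg_ready[r] = true;               /* r0..r7 hold valid values */

    struct queued_instr q[QUEUE_LEN] = {
        { 1, 20, 10, false },   /* waits: r20 not ready yet */
        { 2,  3, 11, false },   /* ready now, issues ahead of entry 0 */
        { 4,  5, 20, false },   /* produces r20 */
        {10, 20, 12, false },   /* waits on both earlier results */
    };

    issue_ready(q, QUEUE_LEN);  /* pass 1: entries 1 and 2 issue */
    issue_ready(q, QUEUE_LEN);  /* pass 2: entries 0 and 3 issue */
    return 0;
}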

3.10.4 Branch Prediction

Branches interrupt the pipeline flow. Therefore, branch prediction schemes are needed to minimize the number of interruptions. Branches occur frequently, averaging about one out of every six instructions. In superscalar architectures where more than one instruction at a time is fetched, branch prediction becomes increasingly important. For example, in a four-way superscalar architecture, where four instructions per cycle are fetched, a branch instruction can be encountered every other clock.

Most branch prediction schemes use algorithms that keep track of how a conditional branch instruction behaved the last time it was executed. For example, if the branch history circuit shows that the branch was taken the last time the instruction was executed, the assumption could be made that it will be taken again. A hardware implementation of this assumption would mean that the program would vector to the new target address and that all subsequent instruction fetches would occur at the new address. The pipeline now contains a conditional branch instruction fetched from some address, and numerous instructions fetched afterward from some other address. Therefore, all instructions fetched between the time the branch instruction is fetched and the time it is executed are said to be speculative. That is, it is not known at the time they are fetched whether or not they will be completed. If the branch was predicted incorrectly, the instructions in the pipeline must be aborted.

Many architectures implement a branch stack that saves alternate addresses. If the branch is predicted not taken, the address of the branch instruction itself is saved; if the branch is predicted taken, the address immediately following the branch instruction is saved. Section 3.10.5.4 on page 39 discusses the branch mechanism of the R10000 microprocessor.

3.10.5 R10000 Product Overview

The R10000 microprocessor implements many of the techniques mentioned above. This section discusses some of these features. Figure 3-5 shows a block diagram of the R10000 microprocessor.

[Figure: block diagram of the R10000 microprocessor. The chip contains a 32 KB two-way set associative instruction cache (16-word blocks, fetching four 32-bit instructions per cycle), a 32 KB two-way set associative data cache organized as two banks (8-word blocks, 64-bit loads and stores, unaligned access support), instruction decode and register mapping logic, 64-entry integer and floating-point register files with their instruction queues, two integer ALUs, floating-point execution units including a multiplier, an address queue, address calculation unit, TLB, and branch unit. A dedicated secondary cache controller with a 128-bit refill/writeback path connects to an external synchronous static RAM secondary cache of 512 KB to 16 MB (a 4 MB cache requires ten 256K x 18-bit RAM chips). The system interface (64-bit data, 8-bit check, 12-bit command) connects to an external agent or cluster controller, and up to four R10000 microprocessors may be directly connected.]

Figure 3-5 R10000 Processor Block Diagram

3.10.5.1 Primary Data Cache

The primary data cache of the R10000 microprocessor is 32 KB in size and is arranged as two identical 16 KB banks. The cache is two-way interleaved, and each of the two banks is two-way set associative. The cache line size is 32 bytes.

The data cache is virtually indexed and physically tagged. The virtual indexing allows the cache to be indexed in the same clock in which the virtual address is generated. However, the cache is physically tagged in order to maintain coherency with the secondary cache.
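
The fragment below sketches how such an index/tag split can be computed, using the cache geometry stated above (32 KB total, two 16 KB banks, two-way set associative, 32-byte lines). The choice of bank-select bit and the tag width are assumptions made for this sketch, not the R10000's actual wiring.

#include <stdio.h>

/* 32-byte line -> 5 offset bits; 16 KB / (2 ways * 32 B) = 256 sets
 * per bank; offset + bank bit + index spans 14 bits (16 KB). */
#define OFFSET_BITS   5
#define SETS_PER_BANK 256

/* Bank and set index come from the *virtual* address, so the cache
 * can be indexed in the same clock the address is generated. */
static unsigned cache_bank(unsigned vaddr)  { return (vaddr >> OFFSET_BITS) & 1u; }
static unsigned cache_index(unsigned vaddr) { return (vaddr >> (OFFSET_BITS + 1)) % SETS_PER_BANK; }

/* The tag comes from the *physical* address (assumed here to be the
 * bits above the 14-bit index span) so the cache stays coherent with
 * the physically addressed secondary cache. */
static unsigned cache_tag(unsigned paddr)   { return paddr >> 14; }

int main(void) {
    unsigned vaddr = 0x00403A60u;   /* virtual address (example) */
    unsigned paddr = 0x01F03A60u;   /* after TLB translation (example) */
    printf("bank %u, set %u, tag 0x%x\n",
           cache_bank(vaddr), cache_index(vaddr), cache_tag(paddr));
    return 0;
}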

38 3.10.5.2 Secondary Data Cache

The secondary cache interface of the R10000 microprocessor provides a 128-bit data bus which can operate at a maximum of 200MHz, yielding a peak data transfer rate of 3.2GB per second. All of the standard Synchronous Static RAM interface signals are generated by the processor. No external interface circuitry is required. The minimum cache size is 512KB. Maximum cache size is 16MB. Secondary cache line size is programmable at either 64 bytes or 128 bytes.
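
The peak rate quoted above follows directly from the bus width and clock rate; the short calculation below simply makes that arithmetic explicit (128 bits is 16 bytes per transfer, at 200 million transfers per second).

#include <stdio.h>

int main(void) {
    const double bus_bytes  = 128.0 / 8.0;   /* 128-bit bus = 16 bytes */
    const double clock_hz   = 200e6;         /* 200 MHz SSRAM clock */
    const double peak_bytes = bus_bytes * clock_hz;
    printf("peak transfer rate: %.1f GB/s\n", peak_bytes / 1e9);
    return 0;
}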

3.10.5.3 Instruction Cache

The instruction cache is 32KB and is two-way set associative. Instructions are partially decoded before being placed in the instruction cache. Four extra bits are appended to each instruction to identify the execution unit to which the instruction will be dispatched. The instruction cache line size is 64 bytes.

3.10.5.4 Branch Prediction

The branch unit of the R10000 microprocessor can decode and execute one branch instruction per cycle. Since each branch is followed by a delay slot, a maximum of two branch instructions can be fetched simultaneously, but only the earlier one will be decoded in a given cycle.

A branch bit is appended to each instruction during instruction decode. These bits are used to locate branch instructions in the instruction fetch pipeline.

The direction a branch will take is predicted using a branch history RAM. Each two-bit entry in this RAM keeps track of how often that particular branch was taken in the past, and the two-bit code is updated whenever a final branch decision is made.
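
The fragment below is a minimal sketch of a two-bit saturating-counter history of the general kind just described: the counter is nudged toward "taken" or "not taken" once the branch outcome is finally known, and the prediction is simply whether the counter sits in its upper half. The table size, indexing, and update policy are assumptions for illustration, not the R10000's actual circuitry.

#include <stdbool.h>
#include <stdio.h>

#define HISTORY_ENTRIES 512            /* assumed branch-history RAM size */

static unsigned char history[HISTORY_ENTRIES];   /* 2-bit counters: 0..3 */

static unsigned entry_for(unsigned branch_pc) {
    return (branch_pc >> 2) % HISTORY_ENTRIES;   /* word-aligned PCs */
}

/* Predict taken when the counter is in its upper half (2 or 3). */
static bool predict_taken(unsigned branch_pc) {
    return history[entry_for(branch_pc)] >= 2;
}

/* Update the counter once the branch outcome is finally known,
 * saturating at 0 and 3. */
static void update(unsigned branch_pc, bool taken) {
    unsigned char *c = &history[entry_for(branch_pc)];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}

int main(void) {
    unsigned pc = 0x1000u;
    update(pc, true);
    update(pc, true);                   /* counter now 2: predict taken */
    printf("predict taken: %d\n", predict_taken(pc));
    update(pc, false);                  /* one not-taken: counter back to 1 */
    printf("predict taken: %d\n", predict_taken(pc));
    return 0;
}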

Any instruction fetched after a branch instruction is speculative, meaning that it is not known at the time these instructions are fetched whether or not they will be completed. The R10000 microprocessor allows up to four outstanding branch predictions which can be resolved in any order.

Special on-chip branch stack circuitry contains an entry for each branch instruction being speculatively executed. E