
XBOX 360 SYSTEM ARCHITECTURE

THIS ARTICLE COVERS THE XBOX 360'S HIGH-LEVEL TECHNICAL REQUIREMENTS, A SHORT SYSTEM OVERVIEW, AND DETAILS OF THE CPU AND THE GPU. THE AUTHORS DESCRIBE THEIR ARCHITECTURAL TRADE-OFFS AND SUMMARIZE THE SYSTEM'S SOFTWARE PROGRAMMING SUPPORT.

Jeff Andrews
Nick Baker
Microsoft Corp.

Microsoft's Xbox 360 game console is the first of the latest generation of game consoles. Historically, game console architecture and design implementations have provided large discrete jumps in system performance, approximately at five-year intervals. Over the last several generations, game console systems have increasingly become graphics supercomputers in their own right, particularly at the launch of a given game console generation.

The Xbox 360, pictured in Figure 1, contains an aggressive hardware architecture and implementation targeted at game console workloads. The core silicon implements the product designers' goal of providing game developers a hardware platform to implement their next-generation game ambitions. The core chips include the conceptual blocks of CPU, graphics processing unit (GPU), memory, and I/O. Each of these components and their interconnections are customized to provide a user-friendly game console product.

Design principles

One of the Xbox 360's main design principles is the next-generation gaming principle; that is, a new game console must provide value to customers for five to seven years. Thus, as for any true next-generation game console hardware, the Xbox 360 delivers a huge discrete jump in hardware performance for gaming.

The Xbox 360 hardware design team had to translate the next-generation gaming principle into useful feature requirements and next-generation game workloads. For the game workloads, the designers' direction came from interaction with game developers, including game engine developers, middleware developers, tool developers, API and driver developers, and game performance experts, both inside and outside Microsoft.

One key next-generation game feature requirement was that the Xbox 360 system must implement a pervasive high-definition (HD), progressive-scan, 16:9 aspect ratio screen in all Xbox 360 games. This feature's architectural implication was that the Xbox 360 required a huge, reliable fill rate.

Another design principle of the Xbox 360 architecture was that it must be flexible to suit the dynamic range of game engines and game developers. The Xbox 360 has a balanced hardware architecture for the software game pipeline, with homogeneous, reallocatable hardware resources that adapt to different game genres, different developer emphases, and even to varying workloads within a frame of a game. In contrast, heterogeneous hardware resources lock software game pipeline performance in each stage and are not reallocatable. Flexibility helps make the design "futureproof." The Xbox 360's three CPU cores, 48 unified shaders, and 512-Mbyte DRAM main memory will enable developers to create innovative games for the next five to seven years.


A third design principle was programmability; that is, the Xbox 360 architecture must be easy to program and develop software for. The silicon development team spent much time listening to software developers (we are hardware folks at a software company, after all). There was constant interaction and iteration with software developers at the very beginning of the project and all along the architecture and implementation phases.

This interaction had an interesting dynamic. The software developers weren't shy about their hardware likes and dislikes. Likewise, the hardware team wasn't shy about where next-generation hardware architecture and design were going as a result of changes in silicon processes, hardware architecture, and system design. What followed was further iteration on planned and potential workloads.

An important part of Xbox 360 programmability is that the hardware must present the simplest programming models to let game developers use hardware resources effectively. We extended programming models that developers liked. Because software developers liked the first Xbox, using it as a working model was natural for the teams. In listening to developers, we did not repackage or include hardware features that developers did not like, even though that may have simplified the hardware implementation. We considered the software tool chain from the very beginning of the project.

Another major design principle was that the Xbox 360 hardware be optimized for achievable performance. To that end, we designed a scalable architecture that provides the greatest usable performance per square millimeter while remaining within the console's system power envelope. As we continued to work with game developers, we scaled chip implementations to result in balanced hardware for the software game pipeline. Examples of higher-level implementation scalability include the number of CPU cores, the number of GPU shaders, CPU L2 size, bus bandwidths, and main memory size. Other scalable items represented smaller optimizations in each chip.

Figure 1. Xbox 360 game console and wireless controller.

Hardware designed for games

Figure 2 shows a top-level diagram of the Xbox 360 system's core silicon components. The three identical CPU cores share an 8-way set-associative, 1-Mbyte L2 cache and run at 3.2 GHz. Each core contains a complement of four-way single-instruction, multiple-data (SIMD) vector units.1 The CPU L2 cache, cores, and vector units are customized for Xbox 360 game and 3D graphics workloads.

The front-side bus (FSB) runs at 5.4 Gbit/pin/s, with 16 logical pins in each direction, giving a 10.8-Gbyte/s read and a 10.8-Gbyte/s write bandwidth. The bus design and the CPU L2 provide added support that allows the GPU to read directly from the CPU L2 cache.

As Figure 2 shows, the I/O chip supports abundant I/O components. The Xbox media audio (XMA) decoder, custom-designed by Microsoft, provides on-the-fly decoding of a large number of compressed audio streams in hardware.

[Figure 2 block diagram: the CPU (three cores, each with L1 instruction and data caches, sharing a 1-Mbyte L2) connects over the FSB to the GPU (bus interface, memory controllers MC0 and MC1 to the 512-Mbyte GDDR3 DRAM, 3D core, 10-Mbyte EDRAM, and video out through an analog chip). The GPU connects to the I/O chip, which hosts the XMA decoder, SATA DVD and HDD ports, front and rear USB ports for controllers and memory units, IR receiver, flash, audio out, and the SMC. BIU = bus interface unit; MC = memory controller; MU = memory unit; IR = infrared receiver; SMC = system management controller; XMA = Xbox media audio.]

Figure 2. Xbox 360 system block diagram.

Other custom I/O features include the NAND flash controller and the system management controller (SMC).

The GPU 3D core has 48 parallel, unified shaders. The GPU also includes 10 Mbytes of embedded DRAM (EDRAM), which runs at 256 Gbytes/s for reliable frame and z-buffer bandwidth. The GPU includes interfaces between the CPU, I/O chip, and the GPU internals.

The 512-Mbyte unified main memory controlled by the GPU is a 700-MHz graphics-double-data-rate-3 (GDDR3) memory, which operates at 1.4 Gbit/pin/s and provides a total main memory bandwidth of 22.4 Gbytes/s.

The DVD and HDD ports are serial ATA (SATA) interfaces. The analog chip drives the HD video out.
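As a quick sanity check of the bus arithmetic quoted above, the short stand-alone C snippet below recomputes the FSB and main-memory figures. Note that the 128-pin GDDR3 data width is an inference from the quoted 22.4-Gbyte/s total (1.4 Gbit/pin/s x 128 pins / 8), not a number stated in the text.

    /* Sanity check of the quoted bus bandwidths (illustrative sketch,
       not Xbox 360 code). */
    #include <stdio.h>

    int main(void)
    {
        /* FSB: 5.4 Gbit/pin/s, 16 logical pins in each direction. */
        double fsb = 5.4 * 16 / 8.0;    /* 10.8 Gbytes/s each way */

        /* GDDR3: 1.4 Gbit/pin/s; 128 data pins inferred from the
           22.4-Gbyte/s total the article quotes. */
        double mem = 1.4 * 128 / 8.0;   /* 22.4 Gbytes/s total */

        printf("FSB read %.1f, FSB write %.1f, main memory %.1f Gbytes/s\n",
               fsb, fsb, mem);
        return 0;
    }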

CPU chip

Figure 3 shows the CPU chip in greater detail. Microsoft's partner for the Xbox 360 CPU is IBM. The CPU implements the PowerPC instruction set architecture,2-4 with the VMX SIMD vector instruction set (VMX128) customized for graphics workloads.

The shared L2 allows fine-grained, dynamic allocation of cache lines between the six threads. Commonly, game workloads vary significantly in working-set size. For example, scene management requires walking larger, random-miss-dominated data structures, similar to database searches. At the same time, audio, Xbox procedural synthesis (described later), and many other game processes that require smaller working sets can run concurrently. The shared L2 allows workloads needing larger working sets to allocate significantly more of the L2 than would be available if the system used private L2s (of the same total L2 size) instead.

The CPU core has two-per-cycle, in-order instruction issuance. A separate vector/scalar issue queue (VIQ) decouples instruction issuance between integer and vector instructions for nondependent work. There are two symmetric multithreading (SMT),5 fine-grained hardware threads per core. The L1 caches include a two-way set-associative, 32-Kbyte L1 instruction cache and a four-way set-associative, 32-Kbyte L1 data cache. The write-through data cache does not allocate cache lines on writes.

[Figure 3 block diagram: three identical cores (Core 0, Core 1, Core 2), each with an instruction unit, 32-Kbyte L1 instruction and data caches, branch, integer, and load/store units, a VIQ, an FPU, an MMU, and a VSU containing the VMX FP, VMX permute, and VMX simple units. The cores connect through a node crossbar/queuing block to the shared L2 (data arrays and directories), per-core uncached units, the PIC, and the bus interface to the FSB, along with test, debug, clocks, and temperature-sensor logic. VSU = vector/scalar unit; Perm = permute; Simp = simple; MMU = memory management unit; Int = integer; PIC = programmable interrupt controller; FPU = floating-point unit; VIQ = vector/scalar issue queue.]

Figure 3. Xbox 360 CPU block diagram.

The integer execution pipelines include branch, integer, and load/store units. In addition, each core contains an IEEE-754-compliant scalar floating-point unit (FPU), which includes single- and double-precision support at full hardware throughput of one operation per cycle for most operations. Each core also includes the four-way SIMD VMX128 units: floating-point (FP), permute, and simple. As the name implies, the VMX128 includes 128 registers, of 128 bits each, per hardware thread to maximize throughput.

The VMX128 implementation includes an added dot product instruction, common in graphics applications. The dot product implementation adds minimal latency to a multiply-add by simplifying the rounding of intermediate multiply results. The dot product instruction takes far less latency than discrete instructions.

Another addition we made to the VMX128 was direct 3D (D3D) compressed data formats,6-8 the same formats supported by the GPU. This allows graphics data to be generated in the CPU and then compressed before being stored in the L2 or memory. Typical use of the compressed formats allows an approximate 50 percent savings in required bandwidth and memory footprint.
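The article does not enumerate the D3D compressed formats themselves, but the bandwidth claim is easy to see with any 16-bit encoding. The hypothetical sketch below packs 32-bit floats in [-1, 1] into 16-bit signed-normalized integers on the producer side and expands them on the consumer side, halving footprint at the cost of precision; this is the kind of bits/range/precision trade-off described later.

    /* Hypothetical illustration of a 2:1 compressed vertex format:
       float (4 bytes) -> signed-normalized 16-bit integer (2 bytes). */
    #include <stdint.h>

    static int16_t pack_snorm16(float x)
    {
        if (x >  1.0f) x =  1.0f;          /* clamp to representable range */
        if (x < -1.0f) x = -1.0f;
        return (int16_t)(x * 32767.0f);    /* quantize to 16 bits */
    }

    static float unpack_snorm16(int16_t v)
    {
        return (float)v / 32767.0f;        /* consumer-side expansion */
    }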

CPU data streaming

In the Xbox 360, we paid considerable attention to enabling data-streaming workloads, which are not typical PC or server workloads. We added features that allow a given CPU core to execute a high-bandwidth workload (both read and write, but particularly write) while avoiding thrashing its own cache and the shared L2.

First, some features shared among the CPU cores help data streaming. One of these is the 128-byte cache line size in all the CPU L1 and L2 caches. Larger cache line sizes increase FSB and memory efficiency. The L2 includes a cache-set-locking functionality, common in embedded systems but not in PCs.

Specific features that improve streaming bandwidth for writes and reduce thrashing include the write-through L1 data caches. Also, there is no write allocation of L1 data cache lines when writes miss in the L1 data cache. This is important for write streaming because it keeps the L1 data cache from being thrashed by high-bandwidth transient write-only data streams.

We significantly upgraded write gathering in the L2. The shared L2 has an uncached unit for each CPU core. Each uncached unit has four noncached write-gathering buffers that allow multiple streams to concurrently gather and dump their gathered payloads to the FSB yet maintain very high uncached write-streaming bandwidth.

The cacheable write streams are gathered by eight nonsequential gathering buffers per CPU core. This allows programming flexibility in the write patterns of cacheable, very high bandwidth write streams into the L2. The write streams can randomly write within a window of a few cache lines without the writes backing up and causing stalls. The cacheable write-gathering buffers effectively act as a bandwidth compression scheme for writes, because the L2 data arrays see a much lower bandwidth than the raw bandwidth required by a program's store pattern, which would otherwise make poor use of the L2 cache arrays. Data transformation workloads commonly don't generate data in a way that allows sequential write behavior. If the write-gathering buffers were not present, software would have to gather write data in the register set before storing. This would put a large amount of pressure on the number of registers and increase the latency (and thus reduce the throughput) of inner loops of computation kernels.

We applied similar customization to read streaming. For each CPU core, there are eight outstanding loads/prefetches. A custom prefetch instruction, extended data cache block touch (xDCBT), prefetches data but delivers it to the requesting CPU core's L1 data cache and never puts data in the L2 cache as regular prefetch instructions do. This modification seems minor, but it is very important because it allows high-bandwidth read-streaming workloads to run on as many threads as desired without thrashing the L2 cache. Another option we considered for read streaming was to lock a set of the L2 per thread. In that case, if a user wanted to run four threads concurrently, half the L2 cache would be locked down, hurting workloads requiring a large L2 working-set size. Instead, read streaming occurs through the L1 data cache of the CPU core on which the given thread is operating, effectively giving a private read-streaming first-in, first-out (FIFO) area per thread.

A system feature planned early in the Xbox 360 project was to allow the GPU to directly read data produced by the CPU, with the data never going through the CPU cache's backing store of main memory. In a specific case of this data streaming, called Xbox procedural synthesis (XPS), the CPU is effectively a data decompressor, procedurally generating geometry on the fly for consumption by the GPU 3D core. For 3D games, XPS allows a far greater amount of differentiated geometry than simple traditional instancing allows, which is very important for filling large HD screen worlds with highly detailed geometry. We added two features specifically to support XPS.
The first was support in the GPU and the FSB for a 128-byte GPU read from the CPU. The other was to directly lower communication latency from the GPU back to the CPU by extending the GPU's tail pointer write-back feature.


[Figure 4 block diagram: the CPU of Figure 3, annotated with the data-streaming example's paths: an xDCBT 128-byte prefetch around the L2 into Core 0's L1 data cache; D3D compressed data stores gathered nonsequentially into a locked set in the L2; and a GPU 128-byte read from the L2 over the FSB, with data arriving from memory and leaving to the GPU.]

Figure 4. CPU cached data-streaming example.

Tail pointer write-back is a method of controlling communication from the GPU to the CPU by having the CPU poll on a cacheable location, which is updated when a GPU instruction writes an update to the pointer. The system coherency scheme then updates the polling read with the GPU's updated pointer value. Tail write-backs reduce communication latency compared to using interrupts. We lowered GPU-to-CPU communication latency even further by implementing the tail pointer's backing-store target on the CPU die. This avoids the round trip from CPU to memory when the GPU pointer update causes a probe and castout of the CPU cache data, which would require the CPU to refetch the data all the way from memory. Instead, the refetch never leaves the CPU die. This lower latency translates into smaller streaming FIFOs in the L2's locked set.

A previously mentioned feature very important to XPS is the addition of the D3D compressed formats that we implemented in both the CPU and the GPU. To get an idea of this feature's usefulness, consider this: Given a typical average of 2:1 compression and an XPS-targeted 9-Gbyte/s FSB bandwidth, the CPU cores can generate up to 18 Gbytes/s of effective geometry and other graphics data and ship it to the GPU 3D core. Main memory sees none of this data traffic (or footprint).
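A minimal sketch of the tail-pointer handshake described above, with hypothetical names: the CPU spins on a cacheable location whose backing store sits on the CPU die, and the coherency scheme delivers the GPU's pointer updates to that location.

    /* Hypothetical polling loop for tail pointer write-back. */
    #include <stdint.h>

    extern volatile uint32_t gpu_tail;  /* advanced by GPU write-backs */

    static void wait_until_consumed(uint32_t bytes_needed)
    {
        /* The polling load usually hits in cache; only a GPU pointer
           update changes the line, so the loop itself generates little
           bus traffic while it waits. */
        while (gpu_tail < bytes_needed)
            ;  /* spin */
    }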

CPU cached data-streaming example

Figure 4 illustrates an example of the Xbox 360 using its data-streaming features for an XPS workload. Consider the XPS workload acting as a decompression kernel running on one or more CPU SMT hardware threads. First, the XPS kernel must fetch new, unique data from memory to enable generation of the given piece of geometry. This likely includes world-space coordinate data and specific data to make each geometry instance unique. The XPS kernel prefetches this read data during a previous geometry-generation iteration to cover the fetch's memory latency. Because none of the per-instance read data is typically reused between threads, the XPS kernel fetches it using the xDCBT prefetch instruction around the L2, which puts it directly into the requesting CPU core's L1 data cache. Prefetching around the L2 separates the read data stream from the write data stream, avoiding L2 cache thrashing. Figure 4 shows this step as a solid-line arc from memory to Core 0's L1 data cache.

The XPS kernel then crunches the data, primarily using the VMX128 computation ability to generate far more geometry data than the amount read from memory. Before the data is written out, the XPS kernel compresses it, using the D3D compressed data formats, which offer simple trade-offs between number of bits, range, and precision. The XPS kernel stores these results as generated to the locked set in the L2, with only minimal attention to the write access pattern's randomness (for example, the kernel places write accesses within a few cache lines of each other for efficient gathering). Furthermore, because of the write-through and no-write-allocate nature of the L1 data caches, none of the write data will thrash the L1 data cache of the CPU core. The diagram shows this step as a dashed-line arc from load/store in Core 0 to the locked set in L2.

Once the CPU core has issued the stores, the store data sits in the gathering buffers waiting for more data until timed out or forced out by incoming write data demanding new 64-byte ranges. The XPS output data is written to software-managed FIFOs in the L2 data arrays in a locked set (the unshaded box in Figure 4). There are multiple FIFOs in one locked set, so multiple threads can share one L2 set. This is possible within the 128 Kbytes of one set because tail pointer write-back communication frees completed FIFO area with lowered latency. Using the locked set is important; otherwise, high-bandwidth write streams would thrash the L2 working set.

Next, when more data is available to the GPU, the CPU notifies the GPU that the GPU can advance within the FIFO, and the GPU performs 128-byte reads to the FSB. This step is shown in the diagram as the dotted-line arc starting in the L2 and going to the GPU. The GPU design incorporates special features allowing it to read from the FSB, in contrast with the normal GPU read from main memory. The GPU also has an added 128-byte fetch, which enables maximum FSB and L2 data array utilization.

The two final steps are not shown in the diagram. First, the GPU uses the corresponding D3D compressed data support to expand the compressed D3D formats into single-precision floating-point formats native to the 3D core. Then, the GPU commands tail pointer write-backs to the CPU to indicate that the GPU has finished reading data. This tells the streaming FIFOs' CPU software control that the given FIFO space is free to be written with new geometry or index data.

Figure 5. Xbox 360 CPU die photo (courtesy of IBM).

Figure 5 shows a photo of the CPU die, which contains 165 million transistors in an IBM second-generation 90-nm silicon-on-insulator (SOI) enhanced transistor process.
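Putting the streaming example's steps together, the skeleton below follows the structure described above. The prefetch wrapper, FIFO layout, and packing helper are hypothetical stand-ins; a real kernel would use the xDCBT instruction and VMX128 vector code rather than these scalar placeholders.

    /* Skeleton of one XPS decompression iteration (hypothetical names). */
    #include <stdint.h>

    extern void prefetch_around_L2(const void *p); /* stand-in for xDCBT  */
    extern int16_t pack_snorm16(float x);          /* from earlier sketch */

    void xps_iteration(const float *inst, const float *next_inst,
                       int16_t *l2_fifo, int n)
    {
        /* Step 1: prefetch the next iteration's per-instance data around
           the L2, directly into this core's L1 data cache. */
        prefetch_around_L2(next_inst);

        /* Steps 2-3: generate and compress geometry (scalar stand-in for
           VMX128 work), storing to the software-managed FIFO that lives
           in the locked L2 set; neighboring addresses gather well. */
        for (int i = 0; i < n; i++)
            l2_fifo[i] = pack_snorm16(inst[i]);

        /* Step 4 happens on the GPU: 128-byte FSB reads of the FIFO,
           format expansion, then a tail pointer write-back that frees
           the FIFO space for reuse. */
    }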


Graphics processing unit

The GPU is the latest-generation graphics processor from ATI. It runs at 500 MHz and consists of 48 parallel, combined vector and scalar shader ALUs. Unlike earlier graphics engines, the shaders are dynamically allocated, meaning that there are no distinct vertex or pixel shader engines; the hardware automatically adjusts to the load on a fine-grained basis. The hardware is fully compatible with D3D 9.0 and High-Level Shader Language (HLSL) 3.0,9,10 with extensions.

The ALUs are 32-bit IEEE 754 floating-point ALUs, with relatively common graphics simplifications of rounding modes, denormalized numbers (flush to zero on reads), NaN handling, and exception handling. They are capable of vector (including dot product) and scalar operations with single-cycle throughput; that is, all operations issue every cycle. The superscalar instructions encode vector, scalar, texture load, and vertex fetch within one instruction. This allows peak processing of 96 shader calculations per cycle while fetching textures and vertices.

Feeding the shaders are 16 texture fetch engines, each capable of producing a filtered result in each cycle. In addition, there are 16 programmable vertex fetch engines with built-in tessellation that the system can use instead of CPU geometry generation. Finally, there are 16 interpolators in dedicated hardware.

The render back end can sustain eight pixels per cycle, or 16 pixels per cycle for depth- and stencil-only rendering (used in z-prepass or shadow buffers). The dedicated z or blend logic and the EDRAM guarantee that eight pixels per cycle can be maintained even with 4× antialiasing and transparency. The z-prepass is a technique that performs a first-pass rendering of a command list, with no rendering features applied except occlusion determination. The z-prepass initializes the z-buffer so that on a subsequent rendering pass with full texturing and shaders applied, the hardware won't spend shader and texturing resources on occluded pixels. With modern scene depth complexity, this technique significantly improves rendering performance, especially with complex shader programs.

As an example benchmark, the GPU can render each pixel with 4× antialiasing, a z-buffer, six shader operations, and two texture fetches, and it can sustain this at eight pixels per cycle. This blazing fill rate enables the Xbox 360 to deliver HD-resolution rendering simultaneously with many state-of-the-art effects that traditionally would be mutually exclusive because of fill rate limitations. For example, games can combine particle effects, high-dynamic-range (HDR) lighting, fur, depth of field, motion blur, and other complex effects.

For next-generation geometric detail, shading, and fill rate, the pipeline's front end can process one triangle or vertex per cycle. These are essentially full-featured vertices (rather than a single parameter), with the practical limitation of required memory bandwidth and storage. To overcome this limitation, several compressed formats are available for each data type. In addition, XPS can transiently generate data on the fly within the CPU and pass it efficiently to the GPU without a main-memory pass.

The EDRAM removes the render-target and z-buffer fill rate from the bandwidth equation. The EDRAM resides on a separate die from the main portion of GPU logic. The EDRAM die also contains dedicated alpha-blend, z-test, and antialiasing logic. The interface to the EDRAM macro runs at 256 Gbytes/s: (8 pixels/cycle + 8 z-compares/cycle) × (read + write) × 32 bits/sample × 4 samples/pixel × 500 MHz.

The GPU supports several pixel depths; 32 bits per pixel (bpp) and 64 bpp are the most common, but there is support for up to 128 bpp for multiple-render-target (MRT) or floating-point output. MRT is a graphics technique of outputting more than one piece of data per sample to the effective frame buffer, interleaved efficiently to minimize the performance impact of having more data. The data is used later for a variety of advanced graphics effects. To optimize space, the GPU supports 32-bpp and 64-bpp HDR lighting formats. The EDRAM supports only rendering operations to the render target and z-buffer. For render-to-texture, the GPU must "flush" the appropriate buffer to main memory before using the buffer as a texture.

Unlike a fine-grained tiler architecture, the GPU can achieve common HD resolutions and bit depths within a couple of EDRAM tiles. This simplifies the problem substantially.
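The 256-Gbyte/s EDRAM figure follows directly from the factors in the equation above; the snippet below just multiplies them out.

    /* Recomputing the EDRAM interface bandwidth quoted above. */
    #include <stdio.h>

    int main(void)
    {
        double bytes_per_cycle = (8 + 8)   /* pixels + z-compares/cycle */
                               * 2         /* read + write              */
                               * (32 / 8)  /* 32 bits/sample = 4 bytes  */
                               * 4;        /* 4 samples/pixel (4x AA)   */
        double bytes_per_s = bytes_per_cycle * 500e6;  /* 500-MHz clock */
        printf("%.0f Gbytes/s\n", bytes_per_s / 1e9);  /* prints 256    */
        return 0;
    }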

[Figure 6 block diagram: the main die contains the bus interface unit (to the FSB), command processor, vertex assembly/tessellator, sequencer, texture and vertex caches, Hi-Z, interpolators, the shader complex with shader export, the blending interface, an I/O controller (two-lane PCI-E to the I/O chip), memory controllers MC0 and MC1 (to GDDR3 memory), the memory interface, and the display/video output. A high-speed I/O bus connects the graphics core to the separate DRAM die, which holds the 10-Mbyte EDRAM and the AA+AZ logic.]

Figure 6. GPU block diagram.

Traditional tiling architectures typically include a whole process inserted in the traditional graphics pipeline for binning the geometry into a large number of bins. Handling the bins in a high-performance manner is complicated (for example, overflow cases, memory footprint, and bandwidth). Because the GPU's EDRAM usually requires only a couple of bins, bin handling is greatly simplified, allowing more-optimal hardware-software partitioning.

With a binning architecture, the full command list must be presented before rendering. The hardware uses a few tricks to speed this process up. Rendering increasingly relies on a z-prepass to prepare the z-buffer before executing complex pixel shader algorithms. We take advantage of this by collecting object extent information during this pass, as well as priming a full-resolution hierarchical z-buffer. We use the extent information to set flags to skip command-list sections not needed within a tile. The full-resolution hi-z buffer retains its state between tiles.

In another interesting extension to normal D3D, the GPU supports a shader export feature that allows data to be output directly from the shader to a buffer in memory. This lets the GPU serve as a vector math engine if needed, as well as allowing multipass shaders. The latter can be useful for subdivision surfaces. In addition, the display pipeline includes an in-line scaler that resizes the frame buffer on the fly as it is output. This feature allows games to pick a rendering resolution to work with and then lets the display hardware make the best match to the display resolution.

As Figure 6 shows, the GPU consists of the following blocks:

• Bus interface unit. This interface to the FSB handles CPU-initiated transactions, as well as GPU-initiated transactions such as snoops and L2 cache reads.
• I/O controller. Handles all internal memory-mapped I/O accesses, as well as transactions to and from the I/O chip via the two-lane PCI-Express bus (PCI-E).
• Memory controllers (MC0, MC1). These 128-byte interleaved GDDR3 memory controllers contain aggressive address tiling for graphics and a fast path to minimize CPU latency.
• Memory interface. Memory crossbar and buffering for non-CPU initiators (such as graphics, I/O, and display).


• Graphics. This block, the largest on the chip, contains the rendering engine.
• High-speed I/O bus. This bus between the graphics core and the EDRAM die is a chip-to-chip bus (via substrate) operating at 1.8 GHz and 28.8 Gbytes/s. When multisample antialiasing is used, only pixel center data and coverage information is transferred and then expanded on the EDRAM die.
• Antialiasing and alpha/z (AA+AZ). Handles pixel-to-sample expansion, as well as z-test and alpha blend.
• Display.

Figure 7. Xbox 360 GPU "parent" die (courtesy of Taiwan Semiconductor Manufacturing Co.).

Figure 8. Xbox 360 GPU EDRAM ("daughter") die (courtesy of NEC Electronics).

Figures 7 and 8 show photos of the GPU "parent" and EDRAM ("daughter") dies. The parent die contains 232 million transistors in a TSMC 90-nm GT process. The EDRAM die contains 100 million transistors in an NEC 90-nm process.

Architectural choices

The major choices we made in designing the Xbox 360 architecture were to use chip multiprocessing (CMP), in-order issuance cores, and EDRAM.

Chip multiprocessing

Our reasons for using multiple CPU cores on one chip in the Xbox 360 were relatively straightforward. The combination of power consumption and diminishing returns from instruction-level parallelism (ILP) is driving the industry in general to multicore. CMP is a natural twist on traditional symmetric multiprocessing (SMP), in which all the CPU cores are symmetric and have a common view of main memory but are on the same die versus separate chips. Modern process geometries afford hardware designers the flexibility of CMP, which was usually too costly in die area previously. Having multiple cores on one chip is more cost-effective. It enables a shared L2 implementation and minimizes communication latency between cores, resulting in higher overall performance for the same die area and power consumption.

In addition, we wanted to optimize the architecture for the workload, optimize in-game utilization of silicon area, and keep the system easy to program. These goals made CMP a good choice for several reasons:

First, for the game workload, both integer and floating-point performance are important. The high-level game code is generally a database management problem, with plenty of object-oriented code and pointer manipulation. Such a workload needs a large L2 and high integer performance. The CMP shared L2, with its fine-grained, dynamic allocation, means this workload can use a large working set in the L2 while running. In addition, several sections of the application lend themselves well to vector floating-point acceleration.

Second, to optimize silicon area, we can take advantage of two factors. To start with, we are presenting a stable platform for the product's lifetime. This means tools and programming expertise will mature significantly, so we can rely more on generating code than optimizing performance at runtime. Moreover, all Xbox 360 games (as opposed to Xbox games from Microsoft's first game console, which are emulated on Xbox 360) are compiled from scratch and optimized for the current microarchitecture. We don't have the problem of running legacy, but compatible, instruction set architecture executables that were compiled and optimized for a completely different microarchitecture. This problem has significant implications for CPU microarchitectures in PC and server markets.

Third, although we knew multicore was the way to go, the tools and programming expertise for multithreaded programming are certainly not mature, presenting a problem for our goal of keeping programming easy. For the types of workloads present in a game engine, we could justify at most six to eight threads in the system. The solution was to adapt the "more-but-simpler" philosophy to the CPU core topology. The key was keeping the number of hardware threads limited, thus increasing the chance that they would be used effectively. We decided the best approach was to tightly couple dedicated vector math engines to integer cores rather than making them autonomous. This keeps the number of threads low and allows vector math routines to be optimized and run on separate threads if necessary.

In-order issuance cores

The Xbox 360 CPU contains three two-issue, in-order instruction issuance cores. Each core has two SMT hardware threads, which support fine-grained instruction issuance. The cores allow out-of-order execution in the common cases of loads and vector/floating-point versus integer instructions. Loads, which are treated as prefetches, don't stall until a load dependency is present. Vector and floating-point operations have their own, decoupled vector/float issue queue (VIQ), which decouples vector/floating-point versus integer issuance in many cases.

We had several reasons for choosing in-order issuance. First, the die area required by in-order-issuance cores is less than that of out-of-order-issuance cores. In-order cores simplify issue logic considerably. Although not directly a big area user, out-of-order issue logic can consume extra area because it requires additional pipeline stages to meet clock-period timing. Further, common implementations of out-of-order issuance and completion use rename registers and completion queues, which take significant die area.

Second, in-order implementation is more power efficient than out-of-order implementation. Keeping power levels manageable was a major issue for the design team. All the additional die area required for out-of-order issuance consumes power. Out-of-order cores commonly increase performance because their issuance, tracking, and completion enable deeper speculative instruction execution. This deeper speculation means wasted power, since whole execution strings are often thrown away. Xbox 360 execution does speculate, but to a lesser degree.

Third, the Xbox 360's two SMT hardware threads per core keep the execution pipelines more fully utilized than they are in traditional in-order designs. This helps keep the execution pipelines busy without out-of-order issuance.

Finally, in-order design is simpler, aiding design and implementation. Simplicity also makes performance more predictable, simplifying programming and tool optimizations.
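The load-handling behavior described above rewards simple instruction scheduling. In this hypothetical fragment, the two loads issue back to back and a stall, if any, occurs only at the first dependent use, so independent work placed between a load and its use hides memory latency without out-of-order hardware.

    /* Scheduling for an in-order core whose loads act as prefetches. */
    void scale_pair(const float *a, const float *b, float *out, float k)
    {
        float va = a[0];       /* load issues; no stall here            */
        float vb = b[0];       /* second load issues in its shadow      */
        float t  = k * 2.0f;   /* independent work overlaps the loads   */
        out[0] = va * t;       /* first dependent use; stalls only here */
        out[1] = vb * t;
    }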
EDRAM

HD, alpha blending, z-buffering, antialiasing, and HDR pixels take a heavy toll on memory bandwidth. Although more effects are being achieved in the shaders, postprocessing effects still require a large pixel-depth complexity. Also, as texture filtering improves, texel fetches can consume large amounts of memory bandwidth, even with complex shaders.

One approach to solving this problem is to use a wide external memory interface. This limits the ability to use higher-density memory technology as it becomes available, as well as requiring compression.


Unfortunately, any compression technique must be lossless, and therefore unpredictable, which is generally not good for game optimization. In addition, the required bandwidth would most likely require using a second memory controller in the CPU itself, rather than having a unified memory architecture, further reducing system flexibility.

EDRAM was the logical alternative. It has the advantage of completely removing the render-target and z-buffer bandwidth from the main-memory bandwidth equation. In addition, alpha blending and z-buffering are read-modify-write processes, which further reduce the efficiency of memory bandwidth consumption. Keeping these processes on-chip means that the remaining high-bandwidth consumers, namely geometry and texture, are now primarily read processes. Changing the majority of main-memory bandwidth to read requests increases main-memory efficiency by reducing wasted memory bus cycles caused by turning around the bidirectional memory buses.

Software

By adopting SMP and SMT, we're using standard parallel models, which keep things simple. Also, the unified memory architecture allows flexible use of memory resources. Our OS opens all three cores to game developers to program as they wish. For this, we provide standard APIs, including Win32 and OpenMP, as well as D3D and HLSL. Developers can also bypass these and write their own CPU assembly and shader microcode, referred to in the game industry as "to the metal" programming.

We provide standard tools, including the XNA-based tools Performance Investigator (PIX) and Xbox Audio Creation Tool (XACT). XNA is Microsoft's game development platform, which developers of PC and Xbox 360 games (as well as other platforms) can use to minimize cross-platform development costs.11 PIX is the graphics profiler and debugger.12 It uses performance counters embedded in the CPU and GPU and architectural simulators to provide performance feedback.

The Xbox 360 development environment is familiar to most programmers. Figure 9 shows a screen shot from the XNA Studio Integrated Development Environment (IDE), a version of Visual Studio with additional features for game developer teams. Programmers use the IDE for building projects and debugging, including debugging of multiple threads. When stepping through code, programmers find the instruction set's low-level details completely hidden, but when they open the disassembly window, they can see that PowerPC code is running.

Figure 9. Multithreaded debugging in the Xbox 360 development environment.

Other powerful tools that help Xbox 360 developers maximize productivity and performance include CPU profilers, the Visual C++ 8.0 compiler, and audio libraries. These tools and libraries let programmers quickly exploit the power of the Xbox 360 chips and then help them code to the metal when necessary.
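As one example of the standard parallel models mentioned above, an OpenMP loop can spread per-object game work across the six hardware threads; update_object is a hypothetical game function, not an Xbox 360 API.

    /* Minimal OpenMP sketch: three cores x two SMT threads = six threads. */
    #include <omp.h>

    extern void update_object(int i);

    void update_world(int num_objects)
    {
        #pragma omp parallel for num_threads(6) schedule(dynamic)
        for (int i = 0; i < num_objects; i++)
            update_object(i);
    }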

Xbox 360 was launched to customers in the US, Canada, and Puerto Rico on 22 November 2005. Between then and the end of 2005, it launched in Europe and Japan. During the first quarter of 2006, Xbox 360 was launched in Central and South America, Southeast Asia, Australia, and New Zealand.

Xbox 360 implemented a number of firsts and/or raised the bar for performance for users of PC and game console machines for gaming. These include:

• the first CMP implementation with more than 2 cores (3);
• the highest-frequency and highest-bandwidth CPU front-side bus (5.4 Gbps and 21.6 Gbytes/s);
• the first CMP chip with a shared L2;
• the first game console with SMT;
• the first game console with a GPU-unified shader architecture; and
• the first game console with an MCM GPU/EDRAM die implementation.

The Xbox 360 core silicon contains approximately 500 million transistors. It is the most complex, highest-performance consumer-electronic product shipping today and presents a large discrete jump in 3D graphics and gaming performance.

References
1. PowerPC Microprocessor Family: AltiVec Technology Programming Environments Manual, version 2.0, IBM Corp., 2003.
2. E. Silha et al., PowerPC User Instruction Set Architecture, Book I, version 2.02, IBM Corp., 2005.
3. E. Silha et al., PowerPC Virtual Environment Architecture, Book II, version 2.02, IBM Corp., 2005.
4. E. Silha et al., PowerPC Operating Environment Architecture, Book III, version 2.02, IBM Corp., 2005.
5. J. Hennessey and D. Patterson, Computer Architecture: A Quantitative Approach, 3rd ed., Morgan Kaufmann, 2002.
6. K. Gray, The Microsoft DirectX 9 Programmable Graphics Pipeline, Microsoft Press, 2003.
7. F. Luna, Introduction to 3D Game Programming with DirectX 9.0, 1st ed., Wordware, 2003.
8. MSDN DX9 SDK Documentation, Overview, 2005, http://msdn.microsoft.com/library/default.asp?url=/library/en-us/directx9_c/dx9_graphics.asp.
9. S. St.-Laurent, The Complete Effect and HLSL Guide, Paradoxal Press, 2005.
10. MSDN DX9 SDK Documentation, HLSL Shaders Overview, 2005, http://msdn.microsoft.com/library/default.asp?url=/library/en-us/directx9_c/HLSL_Workshop.asp.
11. Microsoft XNA, 2006, http://www.microsoft.com/xna.
12. MSDN DX9 SDK Documentation, PIX Overview, 2005, http://msdn.microsoft.com/library/default.asp?url=/library/en-us/directx9_c/PIX.asp.

Jeff Andrews is a CPU architect and project leader in Microsoft's Xbox Console Architecture Group, focusing on the CPU. Andrews has a BS in computer engineering from the University of Illinois Urbana-Champaign.

Nick Baker is the director of Xbox console architecture at Microsoft. His responsibilities include managing the Xbox Console Architecture, System Verification, and Test Software teams. Baker has an MS in electrical engineering from Imperial College London.

Direct questions and comments about this article to Jeff Andrews, Microsoft, 1065 La Avenida St., Mountain View, CA 94043; [email protected].

For further information on this or any other computing topic, visit our Digital Library at http://www.computer.org/publications/dlib.
