MICROPROCESSOR REPORT — THE INSIDER'S GUIDE TO MICROPROCESSOR HARDWARE — www.MPRonline.com

CELL MOVES INTO THE LIMELIGHT
ISSCC Begins the Rollout of Cell Architecture
By Kevin Krewell {2/14/05-01}

Cell is real. Cell is out—finally! IBM has been eager to reveal more about a project at which it has been hard at work for almost five years. The Cell has been shrouded in secrecy, necessitated by the competitive nature of the multibillion-dollar game-console market. With the first production units still at least a year away, Sony, Toshiba, and IBM now feel comfortable beginning to reveal the nature of the processor at the 2005 International Solid-State Circuits Conference (ISSCC), but there was little information on the system design. What the announcement at ISSCC did not lack was publicity—the presentations got extensive national and international press coverage. Some of the stories hyped the Cell processor as competition for Intel, but this chip is designed for Sony's next-generation gaming console, not PCs (although there is the open question of whether Apple would use Cell). The chip designers did tweak Intel by showing that the prototype chips can operate at clock frequencies up to 4.6GHz (Intel had very public problems reaching 4.0GHz with the Pentium 4). We've made a conscious effort to ignore the hype and focus on the technical aspects of the chip.

Early concepts of the Cell processor have already been glimpsed in patent reports (see MPR 1/03/05-01, "New Patent Reveals Cell Secrets"), but, for the most part, the details of the architecture's real implementation were hidden until the ISSCC abstract was released in December. (It was somewhat like trying to predict what a new production car would look like on the basis of the concept car.) At the 2005 ISSCC, the design began to come out, and we can reveal more details about why we chose the Cell processor for the MPR Analysts' Choice Award for Best Technology of 2004. (See MPR 1/31/05-01, "Chips, Software, and Systems.")

The Cell development program has involved people located around the world, with a large core team of designers in an Austin, Texas, facility called the STI Design Center. The team had some very lofty goals and a tight schedule to execute against. It started in mid-2000, when IBM, Sony, and Toshiba began exploring the idea of working together. Toshiba had been Sony's partner in the PlayStation 2 chip set and brought its experience in large-scale integration and volume manufacturing. By fall 2000, the three partners had refined the concepts, and by March 2001, the contracts for the design center were signed and work began to accelerate.

Cell is a mix of ideas from the partners that has evolved since the initial patent application, referenced in a previous story, was submitted. While all partners contributed and are involved with the management of the program, IBM will be able to sell the Cell processor to others. So we could see devices using Cell coming from companies beyond IBM, Sony, and Toshiba.

The overarching goal was to create a new architecture that could process the next generation of broadband media and graphics with greater efficiency than the traditional approaches of ultradeep pipelines and the ganging of numerous complex and power-inefficient out-of-order RISC or CISC cores. The designers took a clean-slate approach to the problem but ultimately based the main core on the familiar Power architecture. Using Power gave them a well-known base to extend upon and could jump-start software development. In addition to the Power core, the design includes eight other processing cores called the Synergistic Processor Element (SPE), shown in Figure 1.


The Cell processor is technically a family of processors compliant with the specifications of the Broadband Processor Architecture (BPA), the new architecture designed to process media data. Future implementations could have differing numbers of Power and Synergistic Processor cores. (IBM loves three-letter acronyms, and the company spread plenty of love on the Cell processor—or should I say the BPA design.) In designing the BPA, IBM looked at different workloads in areas of cryptography, graphics transform and lighting, physics, fast Fourier transforms, matrix math, and other more scientific workloads.

BPA (Cell) design features include the following:
• Extension of the Power Architecture
• Coherent and cooperative off-load processing
• Enhanced SIMD architecture
• Power efficiency improved over that of conventional architectures
• Port derived from work on PowerPC
• Resource-allocation management
• Locking caches (via replacement management tables)
• Multiple memory sizes
• Isolation mechanism for secure code execution

Getting Inside the Cell Processor
Fundamentally, the Cell processor consists of three main units supported by two interfaces. There is a single Power processor element (PPE) that acts as the main host processor, eight single-instruction, multiple-datastream (SIMD) processors, and a highly programmable DMA controller (see Figure 1). Cell has eight SIMD processing elements and the Power core, so Cell can also be considered a multiple instruction stream, multiple datastream (MIMD) processor. Although SIMD and vector processors have been built in the past, and none too successfully, IBM believes this is the best architecture to process a great deal of media and graphics data with a minimum of power and a reasonable die size. Programming techniques for a processor such as Cell will take some time to develop, but IBM hopes that starting with a Power core and building on that foundation will speed things along.

The general architecture seems in parts inspired by Sony's experience with the PlayStation 2 processor, IBM's experience with network processors, and a number of vector computers dating back to the mid-1970s. In the 1970s and 1980s, a number of computer systems were built that attached a series of processing functions around a ring network, including the Programmable Signal Processor (PSP) for the Air Force JSTARS program, which was inspired by the CDC Advanced Flexible Processor (AFP). The Project MAC dataflow processor developed at M.I.T. in the 1970s used a packet communications architecture that passed a bundle of an instruction and two operands, called an "Instruction Cell," to an array of processing elements through a routing network. The complexity of these system designs also meant that programming them was equally complex. So while the Cell processor design is unique today for a proposed mainstream processor, its design is built on research and development that is decades old.

The goal of building a chip with a large amount of parallel processing, and still being able to offer deterministic processing times, meant that some conventional microprocessor concepts had to be discarded. That included out-of-order execution by the Power core, and local store memory for the Synergistic Processor Elements, avoiding the indeterminate nature of cache misses. The SPEs do not use hardware cache-coherency snooping protocols.
The Cell processor supports Rambus's XDR interface for memory and the Redwood interface for I/O (see MPR 8/04/03-02, "Rambus Yellowstone Becomes XDR") and multiprocessor link. Cell doesn't integrate networking, peripherals, or large memory arrays (unlike the IBM BlueGene/L processor) on chip. The reason for this design difference between Cell and the IBM BlueGene chip is that BlueGene could use an older process technology that had embedded DRAM (eDRAM) available; it worked for BlueGene because the goal was to add many thousands of modestly clocked processors (700MHz) into one system to build a powerful system. Each Cell processor chip needs to have greater performance, because each system is expected to contain only one or a very few chips. The BPA architecture was designed with certain performance and die-size goals. To reach the price points for the Cell processor, IBM had to consider die size.

[Figure 1. Block diagram of the Cell processor. Eight SPUs, each with its local store (LS), the PPU with its L1 and L2 caches, the memory interface controller (MIC) with dual XDR channels, and the bus interface controller (BIC) with the Rambus RRAC I/O all attach to the element interconnect bus (EIB, up to 96B/cycle). The diagram labels the SPU row "Synergistic Processor Elements for High (Fl)ops/Watt" and the PPU "64-bit Power Architecture w/VMX for traditional computation." The synergistic processor element (SPE) is the combination of the synergistic processor unit (SPU) with its local store (LS) memory and memory flow controller (MFC). The Power processor unit and the L2 cache form the Power processor element. The element interconnect bus (EIB) connects the SPE and PPE blocks together. The Cell processor uses I/O and memory interfaces licensed from Rambus.]


The die size of the Cell processor is actually slightly smaller than that of the original Emotion Engine in the PlayStation 2 (240mm2 in a 0.25-micron process). To reach both the right die size and performance, IBM also needed to use its 90nm SOI technology, which at present doesn't support eDRAM. Placing the I/O off chip gave the designers more flexibility with system design. A game console contains different peripherals than do a TV or supercomputer. For Sony's next-generation gaming system, the peripheral chip will likely be Sony proprietary.

The Cell processor can reach its performance goals by using high clock speeds, supported by the high-performance XDR DRAM interface licensed from Rambus. The on-chip memory controller is combined with the high-bandwidth XDR interface, which is capable of 25.6GB/s of bandwidth and helps to compensate for the modest 512K of on-chip L2 cache (see Figure 2).

[Figure 2. The layout of the Cell processor places the Rambus XDR DRAM memory and Redwood I/O interfaces at each side of the die; the eight SPEs, the Power core, the 0.5MB L2, and the PERV block surround the EIB internal bus, with the Rambus XIO memory controller and the Rambus RRAC I/O controller at the die edges. In IBM's 90nm SOI process, the Cell processor production die has 235 million transistors and is 235mm2. The prototype processor presented at ISSCC is slightly smaller, at 221mm2.]

Power Processor Element Provides Control
The "brain" of the Cell chip is the Power Architecture core. If the Cell processor were a network processor, the Power core would be the control-plane processor. This is a new 64-bit in-order, two-issue superscalar Power core design, optimized for the Cell processor, and not a recycled core from another processor. The Power core combined with the L2 cache is called the Power processor element, or PPE, in IBM documentation. The PPE is called the PU (processor unit) in the original Cell patent. This implementation of the PPE includes the Power core with the AltiVec (which IBM calls VMX) instruction-set extensions. The Power core also has support for multithreading, with up to two threads.

It was noted earlier that Sony and Toshiba do not have exclusive rights to Cell. While IBM cannot comment on our speculation, it would seem possible that the Cell processor could be used by Apple Computer in a Power Mac G-series computer (as a G6 perhaps). The Power processor core in Cell has some significant architectural differences from the PowerPC 970FX, but it's not clear whether those differences would preclude running Apple-compatible software on the Cell processor. With the larger power envelope of the G5 computer case, the Cell processor could even be clocked at higher frequencies than in a home-entertainment console. Imagine what Apple could do to the graphics user interface and media capabilities of the Mac with this much processing-power capability!

Although the team did look at other processor cores, using Power was the obvious solution for IBM because of the company's investment in the architecture and experience with it. Development-time constraints were also a factor in the choice. But the team didn't just take an existing core like the PowerPC 970FX and build an SoC around it. The core for Cell is new and appears to have been designed before the clock-frequency-is-dead era. The core was designed to reach certain power and die-size goals and is designed to be able to run at clock frequencies in the 4+GHz range (see Figure 3). The engineering team did simplify some of the core design (for example, it's an in-order design and only a dual-issue superscalar) and used some dynamic logic in certain critical timing areas.

The core complies with the PowerPC instruction-set architecture version 2.02 (and the 2.01 public version of the specification). The core was designed with a particular balance of die size, clock speed, and architectural efficiency that is different from that of the PowerPC 970. (See MPR 10/28/02-02, "IBM Trims Power4, Adds AltiVec.") This instantiation of the Power Architecture still has a relatively long pipeline, much like the Power4 and PowerPC 970, but the Cell design does not have a very wide issue pipeline or out-of-order execution, nor does it have as many functional units. The Cell Power core has hardware fine-grained multithreading with round-robin scheduling. If both threads are active, the processor will fetch an instruction from each thread in turn.

When one thread cannot issue a new instruction or is not active, the other active thread will be allowed to issue an instruction every cycle. Threading does add some burden to the die size (around 7% in this case), as there must be duplicated register files, program counters, and parallel instruction buffers (before the decode stage). The benefit of threading is widely known, and with broadband and gaming content, multiple concurrent threads should be easy to extract from software. A mispredicted branch in the Power core has an eight-cycle penalty, and a load has a four-cycle data-cache access time. When one thread is stalled, possibly by a mispredicted branch or a cache miss, the second thread often can execute to fill the execution stall. This leads to greater architectural efficiency and higher utilization of processor resources.

The Power core contains a conventional 32K two-way instruction cache and a 32K four-way set-associative data cache. The design is an in-order execution pipeline, and there is also a one-cycle cache-use penalty.

Although the ISSCC presentations talked about 4.6GHz operation, don't expect the final product to run at that speed. To fit within the space, power, and noise constraints for a game console, IBM will likely run the processor at lower voltages and frequencies. The clock frequency should still be in the 4GHz range, as the processor core will still be quite busy running processing overhead and the operating system(s), despite having eight SPEs to accelerate media processing. At this point, the final core frequency has not been set.

One side note is that the BPA/Cell architecture is big-endian. The Power architecture supports both big- and little-endian data formats, but with Cell, little-endian support was jettisoned. For the PlayStation implementation, it might have been considered extraneous. As a self-contained system with dedicated software, the designers didn't feel compelled to offer interoperability with PC software (which is little-endian). The ordering of Apple Computer's software is also big-endian.

Interrupt management is similar to that of standard PowerPC. Interrupts arising from SPU and memory flow controller (MFC) events in the Cell processor are treated as external interrupts to the PPE. The PPE is also capable of running multiple operating systems through a hypervisor (low-level software below the operating system for virtual processing support).

[Figure 3. The pipeline of the Power Architecture core of the Cell processor is 21 clock cycles long and was designed with 11 FO4 delays between clock periods. The diagram shows microcode stages MC1–MC8; a front end of instruction-cache, instruction-buffer, decode, and issue stages with branch prediction; and back-end branch, fixed-point, and load pipelines running through register read, execute, and writeback.]

SPE Provides the Brawn
If the PPE is the brain of the processor, the Synergistic Processor Elements (SPEs) are the muscles that do the heavy lifting. In the network-processor analogy for the Cell processor, the SPEs are the data-plane processors. In fact, with a little change to the SPE architecture and some integrated packet parsing, Cell would make a very interesting network processor. The SPE includes the Synergistic eXecution Unit (SXU) and 256KB of local-store (LS) SRAM. Each of the eight SPEs has its own private local store, and the local-store memory is aliased to main memory but does not participate in a cache-coherency protocol. Software must manage the movement of data and instructions in and out of the LS, and that movement is controlled by the MFC. The LS has data-synchronization facilities but does not participate in hardware cache coherency. The eight local stores do have an alias in the memory map of the processor, and a PPE can load or store from a memory location that is mapped to the local store (but it's not a high-performance option). Similarly, another SPE can use the DMA controller to move data to an address range that is mapped onto a local store of another SPE or even to itself. LS memory, if cached in the system, is not coherent with respect to modifications made by the SPE to which it belongs.

The eight SIMD units on Cell are identical and can process both integer and floating-point numbers. The SPE can handle 8-, 16-, or 32-bit integer and single-precision (32-bit) or double-precision (64-bit) data formats. These SPEs provide a coherent off-load engine for the PPE. As such, the SPE is more independent than what is typically considered a coprocessor. And the eight SPEs are not directly tied to the PPE core. They take command streams from the memory allocated to them, and the movement of commands and data is controlled by the MFC.
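To picture those lane widths concretely, here is our own illustration of one 128-bit register viewed through a plain C union; the type and its name are ours, not an IBM definition or part of any Cell tool chain.

```c
/* Illustrative only: one 128-bit SIMD register's possible lane layouts,
 * matching the data formats listed above. Not an IBM/SPE type definition. */
#include <stdint.h>

typedef union {
    int8_t  i8[16];   /* sixteen 8-bit integers            */
    int16_t i16[8];   /* eight 16-bit integers             */
    int32_t i32[4];   /* four 32-bit integers              */
    float   f32[4];   /* four single-precision floats      */
    double  f64[2];   /* two double-precision floats       */
} vec128_t;
```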

Each Synergistic Processor Unit (SPU) is a four-way SIMD unit optimized for single-precision (32-bit) floating point but can support double-precision math at reduced performance (about ten times slower). Figure 4 shows the pipeline of the SPU. It has a rich set of 128 registers that are 128 bits wide and are low latency. The large number of registers is purposeful and was added to hold more data values closer to the SIMD unit and reduce the need for LS accesses. The instruction set of the SPU can be said to have been "inspired" by the VMX/AltiVec instruction extensions and is a superset of the vector unit instruction set in the Emotion Engine processor for PlayStation 2. The SPE instruction set (see Figure 5) supports a multiply-add operation with three sources and one destination. At a 4GHz clock frequency, the eight SPEs would be capable of a theoretical peak single-precision floating-point performance of 256GFLOPS (eight SPEs, each completing a four-wide multiply-add—eight floating-point operations—per cycle at 4GHz: 8 x 8 x 4GHz = 256GFLOPS). For one chip, that is quite an impressive number, but it is short of some expectations for the chip.

[Figure 4. The SPU pipeline diagram shows that the SPU has a relatively short pipeline but can still keep pace with the Power core clock speeds and was designed to have the same level of gates per cycle as the Power core. Instruction latencies shown: simple fixed (FX) 2 cycles, shift (FX3) 4, single precision (FP1–6) 6, floating integer (FP7) 7, byte 4, permute 4, local store (LS) 6, channel (not shown) 6, branch (not shown) 4.]

[Figure 5. The instruction format of the SPE (OP RT RB RA RC) supports three source operands and one destination.]
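For a sense of what one such instruction does, here is the four-wide multiply-add written as ordinary scalar C; the function is our illustration of the operation's semantics, not SPE intrinsic or assembly code.

```c
/* Sketch only: the multiply-add (a*b + c) that one SPE SIMD instruction
 * applies across four single-precision lanes at once, written as portable
 * scalar C. Function and variable names are illustrative. */
void vec4_madd(const float a[4], const float b[4],
               const float c[4], float d[4])
{
    for (int i = 0; i < 4; i++)
        d[i] = a[i] * b[i] + c[i];   /* three sources, one destination */
}
```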
IBM engineers admitted that they considered a very long instruction word (VLIW) architecture for the SPE (somewhat like the Trimedia TM32). However, they considered VLIW code expansion a problem, and their existing experience with the VMX instruction extensions showed that media operations could use SIMD operations very effectively. Once the media data is loaded into the local store, the SIMD units can be very efficient by processing multiple data words at the same time. The issue with SIMD units added onto conventional processors is the overhead of gathering operands into the registers. With the SPUs, the MFC can gather the data into the local store.

The SPEs were not multithreaded for a couple of good reasons. The one stated by IBM was that the goal of the design was to provide the SPU with enough memory and registers for all the required data to be processed to be present without having a miss penalty. The second reason we deduce is that multithreading the SPUs would have complicated the issue of scheduling and isolating operations from each other and significantly increased the die area (times eight). In programming an SPE operation, the programmer can be assured that all the resources are available to the task and are not shared with another thread. Sharing an SPE would have made it harder to isolate a critical process, such as data encryption/decryption, from another task sharing the same local store. At the same time, the SPE can double-buffer tasks. The MFC can begin to transfer the data set of the next task while the present task is running (within the bounds of the bandwidth of the local store).

The floating-point operation is presently geared for throughput of media and 3D objects. That means, somewhat like AMD's 3DNow SIMD extensions, that IEEE correctness is sacrificed for speed and simplicity. Especially with these workloads, exact rounding modes and exceptions are largely unimportant. For overflows and underflows, saturation results are desirable, rather than a precise exception or undetermined results. A small display glitch in one display frame is tolerable; missing objects, tearing video, or incomplete rendering due to long exception handling is objectionable.
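A minimal sketch of that saturation idea, in generic C and assuming simple clamp-to-largest-finite-value semantics on overflow (the underflow side would similarly flush toward zero); this is not the SPE's exact floating-point behavior.

```c
/* Sketch: clamp an overflowed single-precision result to the largest
 * finite value rather than raising a precise IEEE exception. Generic C
 * illustration only, not actual SPE semantics. */
#include <math.h>
#include <float.h>

static float mul_saturate(float a, float b)
{
    float r = a * b;
    if (isinf(r))                        /* overflow detected */
        r = (r > 0.0f) ? FLT_MAX : -FLT_MAX;
    return r;
}
```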

On the other hand, the double-precision calculations, while considerably slower on the Cell processor, do include more complete support for IEEE 754 rounding modes and exceptions. It doesn't, however, support IEEE 754 precise mode. This instantiation of Cell was not optimized for DP math, as most of the target workloads can use SP, so for a supercomputer application, the SPE might need further development. The Power core has the VMX (AltiVec) SIMD extensions available for additional processing.

The 256KB local storage is a good size, but there were some design compromises. The memory is only single-ported and can be accessed only on aligned quadword (128-bit) transfers. The DMA transfers support 1,024-bit transfers with quadword enables. Any DMA transfer less than a quadword is performed by a read-modify-write transaction. Although the local store is single-ported (to save area and power), it does support both a wide 128-byte access and a narrow 16-byte access. DMA commands are aggregated into a 128-byte latch and then written in a single local-store cycle; similarly, DMA reads occupy only a single cycle for 128 bytes. Instruction fetches also (pre)fetch 128 bytes (64-byte aligned to guarantee at least 17 instructions on a sequential path). Thus, the number of cycles that the local store is unavailable for loads and stores is minimized. IBM expects to see very high utilization of the local store on optimized code (often near 85–90%).

Access to the local store is prioritized, with the highest priority being DMA transfers of PPE loads and stores. The secondary priority is SPE loads and stores, and SPE instruction prefetch gets the lowest priority. With the large local store and a relatively long branch-mispredict penalty in the SPE (18 cycles), software programmers will be encouraged to do loop unrolling. The SPE does have some help for branches, including branch hint instructions and a three-source bitwise select. The SPE defaults to not-taken branch prediction without a penalty, and no penalty is imposed for branches taken and hinted correctly.

The architecture allows the local storage of the SPE to be mapped into the real address space of the memory system. Privileged software will control the mapping of the memory. Memory mapping in Cell is a two-stage process. First, the effective address is converted to a virtual address, using the segment lookaside buffer (SLB). Then, the virtual address is converted to the real address using the translation lookaside buffer (TLB). In the PowerPC architecture, privileged software manages the SLB, and hardware manages the TLB. The Cell memory management is a superset of the PowerPC's. In the BPA, privileged software manages the SLB, and there is an option to manage the TLB either by the hardware or by the privileged software. The latter option offers greater page-table flexibility, but at the cost of more software overhead. The Cell Power core's implementation of the MMU is still consistent with the Power architecture.
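The two-stage mapping can be pictured with the following conceptual sketch; the lookup functions are stubs, and none of the names reflect actual PowerPC or BPA data-structure formats.

```c
/* Conceptual model only of the two-stage mapping described above:
 * effective -> virtual via the SLB, then virtual -> real via the TLB. */
#include <stdint.h>

static uint64_t slb_lookup(uint64_t effective)
{
    /* stage 1: segment lookaside buffer, managed by privileged software */
    return effective;  /* stub: a real SLB substitutes the segment bits */
}

static uint64_t tlb_lookup(uint64_t virt)
{
    /* stage 2: translation lookaside buffer, managed by hardware or,
     * optionally in the BPA, by privileged software */
    return virt;       /* stub: a real TLB substitutes the page-frame bits */
}

static uint64_t translate(uint64_t effective)
{
    return tlb_lookup(slb_lookup(effective));
}
```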
The SPE is capable of limited dual-instruction issue when an integer or floating-point operation is even-word aligned and a load instruction is odd-word aligned (see Figure 6). In this regard, it is reminiscent of the original Pentium processor. In this way, there is still a hint of LIW to the SPE.

[Figure 6. The SPE is capable of limited dual-issue operation: an instruction at address 0 can go to the simple-fixed, shift, single-precision, floating-integer, or byte units, while an instruction at address 4 goes to the permute, local-store, channel, or branch units. If the instructions are not properly aligned, the instruction swap operation will force single-issue operation.]

The SPU units are designed for aggregation into an array of processing elements. The programming model for the SPU has not been predetermined, and the BPA can support both process pipelining and parallel processing. In a pipeline implementation, a transformation on a data set performs a specific task at each SPU and then passes the intermediate results to the next SPU. The advantage of this technique is that the code in each SPU is usually quite small and the operations are easy to manage. The pipeline is also more predictable. This predictability is important to the design goals of the Cell processor and is one of the reasons the SPEs have local memory and not cache memory. The disadvantage is that it's often difficult to make each stage equal in complexity and time. The weak-link-of-a-chain effect is that the overall throughput of the pipeline is limited by the time taken by the slowest stage. Faster stages will often be stalled.

The parallel implementation is more flexible, but completion times are more stochastic. Each program routine is expected to run to completion on an SPE. This has the benefit of better data locality, less data copying, and overall better throughput. The problems include the need for more thread overhead and management, as well as data-coherency management. With so many SPU units, it is also possible to program with a mix of both methods. The processing of media data is different from general-purpose computing: the data structures have more parallelism, and, in addition, IBM found there are shorter distances between dependencies.
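The contrast between the two models can be sketched in plain C; every function name and the two-stage split here are ours, standing in for code that would actually be resident on individual SPEs.

```c
/* Schematic contrast of the two SPE programming models described above.
 * The stages are trivial placeholders; in a real Cell program each helper
 * would be the code running on one SPE. */
#define N 1024

static void stage_a(float *d, int n) { for (int i = 0; i < n; i++) d[i] *= 2.0f; }
static void stage_b(float *d, int n) { for (int i = 0; i < n; i++) d[i] += 1.0f; }

/* Pipelined: each SPE performs one stage and passes the data along. */
static void run_pipelined(float d[N])
{
    stage_a(d, N);              /* would run on SPE 0 */
    stage_b(d, N);              /* would run on SPE 1, and so on */
}

/* Parallel: each SPE runs the whole routine to completion on its own slice. */
static void run_parallel(float d[N])
{
    for (int spe = 0; spe < 8; spe++) {
        float *slice = d + spe * (N / 8);
        stage_a(slice, N / 8);
        stage_b(slice, N / 8);
    }
}
```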

Either way, the memory model of the Cell processor supports the sharing of memory locations, and the effective address of the local store may be aliased among different processing units. And it is highly recommended that each SPU be assigned a task that is allowed to run to completion, because a context switch can be costly in both time and storage space, due to the large number of wide registers and memory-translation buffers. The programming goal should be that the SPE has all the data and instructions to complete its assigned task.

The communication between the MFC and the SPEs is through an SPE channel. Each channel is a unidirectional queue of varying depth and can be configured as blocking or nonblocking. SPU commands must be executed by the SPU in the order received. Through the channel and an external event controller, the SPU can be directed to perform an action or can be set to a pending state. Short communications between the SPUs and the PPE can be accomplished using two mailbox queues provided by the MFC. These 32-bit mailbox messages are user defined. In the PPE, the mailbox is in memory-mapped I/O space.

While the SPE is quite capable of processing various data types, handling loads and stores, and branching, it was not designed to run an operating system. This simplifies its design compared with that of a multiprocessor, such as the Cavium Octeon with sixteen 64-bit MIPS processors. The SPU has been designed for high clock speeds, as was the PPE. The use of an in-order design did simplify logic design and dependency checking, but a branch mispredict costs a painful 18 cycles of latency. IBM also mixed in some dynamic logic in the quest for speed and lower power. This is another example of a growing trend of mixing more dynamic logic into designs that need speed and low power.

Security Built In
Instead of the addition of a dedicated security processor, the SPUs can be used to perform security algorithms. Details of how the SPE can be used for security processing have not been released.

One unique design feature of Cell is the ability to fence off SPE processing units from each other through hardware protection features. This way, SPUs that are dedicated to security processing can be isolated from the rest of the system and have special reserved and protected memory that cannot be accessed by another process. This feature will be essential in future digital rights management (DRM) implementations, where the SPU creates a trusted environment.

Memory Flow Controller Moves the Data
To continue the biology analogy further, the memory flow controller (MFC) is the heart, and the Element Interconnect Bus (EIB) is the vascular system, keeping the components fed with the digital lifeblood of data. To keep the Cell processor running, the MFC can support more than 128 outstanding requests to memory. Keeping SPE unit utilization high requires the MFC to support many transaction flows simultaneously.

The MFC has its own memory management unit (MMU) that is a subset of the Power core's MMU. The MFC, like the Power core, supports a 64-bit virtual address space. The flow controller's MMU supports the same page sizes as the Power core and includes the new 64K and 16M page sizes. Transfer sizes can range from one byte up to 16KB, but transfers of less than one cache-line size (128B) are discouraged. The controller supports scatter-gather and interleaved operations.

IBM's experience with a DMA-list command scheme (where the list of DMA commands is placed in the local store and processed asynchronously by the DMA unit) showed that programs that would have benefited from strided memory addressing and prefetch become memory-bandwidth bound before they become SPE-compute bound. The DMA command-list processing used by the MFC can be considered similar to display-list processing used in graphics.

The MFC transfers data to and from the SPE and memory using get and put commands. Each command can have an instruction modifier (an "s" prefix) that instructs the SPU to begin processing instructions from the next program-counter register after the data transfer is complete. The MFC can also take the data from the SPE and load it directly into the Power core's L2 cache as well. This is a way to get critical data to the PPE more quickly.
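A sketch of how a programmer might double-buffer those get and put transfers follows; dma_get, dma_put, and dma_wait are hypothetical stand-ins, since the article does not disclose the real command mnemonics or synchronization interface.

```c
/* Sketch of the double-buffering pattern described earlier: the MFC fetches
 * the next block into one local-store buffer while the SPU works on the
 * other. The dma_* calls are hypothetical stand-ins for the MFC's get/put
 * commands, not a real Cell API. CHUNK matches the 16KB maximum transfer. */
#define CHUNK 16384

extern void dma_get(void *ls_dst, unsigned long mem_src, int bytes, int tag);
extern void dma_put(unsigned long mem_dst, void *ls_src, int bytes, int tag);
extern void dma_wait(int tag);          /* block until transfers tagged 'tag' finish */
extern void process(char *buf, int bytes);

void spu_stream(unsigned long src, unsigned long dst, int nchunks)
{
    static char buf[2][CHUNK];          /* two local-store buffers */

    dma_get(buf[0], src, CHUNK, 0);                       /* prime buffer 0 */
    for (int i = 0; i < nchunks; i++) {
        int cur = i & 1, nxt = cur ^ 1;
        if (i + 1 < nchunks) {
            dma_wait(nxt);                                /* prior put from buf[nxt] done */
            dma_get(buf[nxt], src + (unsigned long)(i + 1) * CHUNK, CHUNK, nxt);
        }
        dma_wait(cur);                                    /* current block has arrived */
        process(buf[cur], CHUNK);                         /* compute on it */
        dma_put(dst + (unsigned long)i * CHUNK, buf[cur], CHUNK, cur);
    }
    dma_wait(0);                                          /* drain outstanding puts */
    dma_wait(1);
}
```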
The Memory and I/O by Rambus
The memory and I/O channels are based on licensed Rambus technology. The XDR controller actually consists of two independent controllers, offering more flexibility than one. The memory interface can support an incredible 25.6GB/s of bandwidth. To meet performance goals for the Cell processor, the design requires a tremendous amount of memory and I/O bandwidth. Otherwise the Cell architecture will be stalled by memory or slowed by an inability to feed the graphics buffer fast enough.

Rambus presented a paper at ISSCC on the RRAC. It supports up to 6.4GB/s per RX or TX byte. The Cell processor has five RX bytes and seven TX bytes. The I/O RAC can also be used to connect two Cell processors, using dual unidirectional channels. The design is optimized for one- and two-way systems; for more than two Cell processors to be clustered together, an outside hub is required.

EIB Ties It All Together
To support this many units and the requirements for bandwidth, IBM made a design decision that traded off lowest latency for greater concurrency and better bandwidth. The center of the chip layout shows the element interconnect bus (EIB), which is not actually one bus—it is a data-ring structure with a control bus. Each ring is 16 bytes wide and runs at half of the core clock frequency, so some IBM graphics call it an 8-byte interface (running at full core frequency).

There are four unidirectional rings, but two rings run counter to the direction of the other two. In this way, the worst-case maximum latency is only half the distance of the ring, not the complete ring. The ring network is able to support up to three simultaneous transfers when transactions are between neighbors on the bus. Because ring transactions are deleted by the destination node, multiple neighbor transactions can occur simultaneously. Resource allocation is provided by a token exchange mechanism, with varying allowable transfer rates, depending on class. The EIB bus manages the token transactions.
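As a back-of-the-envelope check on the "up to 96B/cycle" label in Figure 1: each ring carries 16 bytes per bus clock at half the core frequency, or 8 bytes per core cycle per transfer. If each of the four rings can sustain three concurrent neighbor-to-neighbor transfers—one reading of the concurrency described above—the aggregate is 4 x 3 x 8B = 96 bytes per core cycle. That reconciliation is our arithmetic, not a figure IBM has stated.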

Power Under Control
Many of the design decisions were driven by power conservation. The Power core and SPE designs could have added more speculative operations, but those can waste power when the speculative result is not used. While the design doesn't offer a large number of power-management modes, it is capable of clocking at one-eighth the normal speed when idling.

Multiple power-management states are available to privileged software in the BPA specification. The five major states are active, slow, pause, state retained and isolated (SRI), and state lost and isolated (SLI). Each is progressively more aggressive in saving power. Software controls the transitions, but they can be linked to external events. The slow and pause states have numerical modifiers that can represent varying levels of aggressive power management. The SRI state maintains component state information, whereas SLI does not. In SLI, the device is effectively shut off from the system.

While one of the ISSCC papers mentions clock speeds on the order of 4.6GHz, these are unlikely to be the system goals of the Sony game system. A lower clock frequency, with an associated lower core voltage, will be needed to get the Cell processor into a relatively small game-console enclosure.
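For illustration only, the five states might be represented as the following enumeration; the names are ours, and the BPA specification's actual encodings and numerical modifiers are not shown.

```c
/* Illustrative encoding of the five BPA power-management states listed
 * above; enumerator names are invented, not the specification's. */
enum bpa_power_state {
    BPA_ACTIVE,
    BPA_SLOW,                      /* carries a numerical modifier for aggressiveness */
    BPA_PAUSE,                     /* likewise carries a numerical modifier */
    BPA_STATE_RETAINED_ISOLATED,   /* SRI: component state preserved */
    BPA_STATE_LOST_ISOLATED        /* SLI: state lost, device effectively off */
};
```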
Programmed for Success?
The processing capability of the Cell processor is still being explored. One demo that IBM has shown was a detailed 3D contour map with satellite images imposed on the geography. The Cell processor can render the ray-cast graphics around an order of magnitude faster than a contemporary PC processor. The performance of only the 4GHz Power core without the SPEs is harder to compare with the slower 2.5GHz G5 processor, because of the Cell core's simpler microarchitecture. The higher clock speed of the Cell processor can still process many math operations faster than the G5 can, so we would give the performance edge to the Cell processor. IBM and its partners are not disclosing performance data at this time.

One design goal of the Cell processor was predictable execution times, so that programmers could better estimate the processing time of their software to meet frame rates. While this provided another reason for in-order execution processors, there are also extensive timers and counters that can be used to manage the real-time response of the system.

The tool chain for Cell is built on PowerPC Linux. The programming of the SPE is based on C, with limited C++ support. Software research is under way for other languages. Debugging tools include extensions for P-Trace and an extended GNU debugger (GDB). The ultimate goal of the software research is to build an abstraction layer on top of the hardware that can scale with additional Cell processors or Cell processors with differing amounts of resources. Programming the Cell processor will be unlike programming any other processor in mainstream use. It will require new tools, and possibly a new programming paradigm, because programs for the SPE should be self-contained with data and instruction bundles (or Cells). This is not the same programming model used for languages with strict class structures like Java. Because the first system implementation of the Cell processor is a game and media console, programmers will craft custom and optimized code for it. To gain more widespread use, the STI partners will need to develop a more mainstream solution, possibly a software virtual-machine architecture on top of the processor.

The Cell processor will be fabricated in IBM's 90nm SOI process, with eight levels of metal and one layer of local interconnect, at IBM's Fishkill fab and Sony's fab. The production version of the Cell processor will have a die size of 235mm2. At ISSCC, IBM presented material on a prototype version of the chip that is slightly smaller, at 221mm2. Power dissipation levels have not been released, but we estimate that 80W at 1.2V and 4GHz is not unreasonable. The schmoo plot shown at ISSCC indicated the processor could run at 3GHz at a 0.9V core voltage, which would reduce power considerably. As mentioned earlier, the final specifications of the chip are still being determined for system integration. The die also has 2,965 C4 bumps.

                              Sony Emotion Engine                 Cell Processor
CPU Core ISA                  MIPS64                              64-bit Power Architecture
Core Issue Rate               Dual                                Dual
Core Frequency                300MHz                              ~4GHz (est.)
Core Pipeline                 6 stages                            21 stages
Core L1 Cache                 16KB I-Cache + 8KB D-Cache          32KB I-Cache + 32KB D-Cache
Core Additional Memory        16KB scratch                        512KB L2
Vector Units                  2                                   8
Vector Registers (#, width)   32, 128-bit + 16, 16-bit            128, 128-bit
Vector Local Memory           4K/16KB I-Cache + 4K/16KB D-Cache   256KB unified
Memory Bandwidth              3.2GB/s peak                        25.6GB/s peak (est.)
Total Chip Peak FLOPS         6.2GFLOPS                           256GFLOPS
Transistors                   10.5 million                        235 million
Power                         15W @ 1.8V                          ~80W (est.)
Die Size                      240mm2                              235mm2
Process                       250nm, 4LM                          90nm, 8LM + LI

Table 1. A comparison between the Cell processor and the Emotion Engine from the Sony PlayStation 2 shows well over an order of magnitude performance improvement (256GFLOPS versus 6.2GFLOPS, roughly 40x), but it looks to be a bit short of two orders of magnitude.

IBM, Sony, and Toshiba have put together a unique processor that could signal the return of the vector processor. This time around, the vector design has the support of some heavy hitters. A look back at the Sony PlayStation 2 design shows there were two vector units in the Emotion Engine chip as well, so the SPEs can be considered an extension of the older design (see Table 1). But this chip is a quantum leap over the Emotion Engine, and with its parallel floating-point performance it is a virtual supercomputer on a chip.

Price & Availability
Cell will be used in Sony's next-generation gaming console and in Toshiba TVs. Those products are expected in 2006.

Despite the excessive hype surrounding the Cell processor at ISSCC, the possibilities of processing this much media and graphics data are still being explored. The potential impact is very exciting if the right programming model can be found to utilize the capabilities. We have only scratched the surface of the chip and its potential. Luckily, we can count on an army of game programmers to dig deep into this chip, and we'll hopefully be able to learn from them how much capability can be exploited. The next challenge for the STI Design Center (after building the chip and systems) will be to find a way to make this architecture accessible to programmers beyond the aforementioned game developers.

Editor's Note: The Cell architecture deserves more exploration, and MPR will have additional coverage on it in the near future, including more details on the architecture and programming-model issues.

