An Introduction to the Cell Broadband Engine Architecture


Owen Callanan, IBM Ireland, Dublin Software Lab
Email: [email protected]
IBM Systems & Technology Group – Cell Programming Workshop, 5/7/2009
© 2009 IBM Corporation

Agenda
– About IBM Ireland development
– Cell introduction
– Cell architecture
– Programming the Cell
  – SPE programming
  – Controlling data movement
  – Tips and tricks
Trademarks: Cell Broadband Engine and Cell Broadband Engine Architecture are trademarks of Sony Computer Entertainment, Inc.

Introduction – Development in IBM Ireland
– Lotus Software
– Research
– Tivoli
– Industry Assets & Models
– RFID Center of Competence
– High Performance Computing

Introduction – HPC in IBM Dublin Lab
– Blue Gene team
– Deep Computing Visualization (DCV)
– Dynamic Application Virtualization (DAV)
– Other areas also, e.g. RDMA communications systems

Cell Introduction

Cell Broadband Engine History
– IBM, SCEI/Sony, Toshiba alliance formed in 2000
– Design Center opened March 2001
– ~$400M investment, 5 years, 600 people
– February 7, 2005: first technical disclosures
– January 12, 2006: alliance extended 5 additional years
– Contributing sites: YKT, EFK, Burlington, Endicott, Rochester, Boeblingen, Tokyo, San Jose, RTP, Austin, India, Israel

First implementation of the Cell B.E. Architecture

IBM PowerXCell 8i – QS22
– PowerXCell 8i processor, 65 nm
– 9 cores, 10 threads
– 230.4 GFlops peak (SP) at 3.2 GHz
– 108.8 GFlops peak (DP) at 3.2 GHz
– Up to 25 GB/s memory bandwidth
– Up to 75 GB/s I/O bandwidth

Cell Processor – "Supercomputer-on-a-Chip"
(Block diagram: eight SPEs, each with an SPU, Local Store, and MFC, connected by the Element Interconnect Bus to the PPE (Power core, L2 cache, NCU), the 25.6 GB/sec memory interface, a 20 GB/sec coherent interconnect, and a 5 GB/sec I/O bus.)
Power Processor Element (PPE):
– General purpose, 64-bit RISC processor (PowerPC 2.02)
– Integer and floating point capable
– 2-way hardware multithreaded
– L1: 32KB I; 32KB D
– L2: 512KB
– Coherent load/store
– VMX
– 3.2 GHz
Synergistic Processor Elements (SPE):
– 8 per chip
– 128-bit wide SIMD units
– 256KB Local Store
– Up to 25.6 GF/s per SPE – 200 GF/s total
Internal interconnect (Element Interconnect Bus):
– Coherent ring structure
– 300+ GB/s total internal interconnect bandwidth
– DMA control to/from SPEs supports >100 outstanding memory requests
External interconnects:
– 25.6 GB/sec BW memory interface
– 2 configurable I/O interfaces: coherent interface (SMP) and normal I/O interface (I/O & graphics)
– Total BW configurable between interfaces: up to 35 GB/s out, up to 25 GB/s in
Memory management & mapping:
– SPE Local Store aliased into PPE system memory
– MFC/MMU controls SPE DMA accesses; SPE DMA access protected by MFC/MMU
– Compatible with PowerPC Virtual Memory architecture
– S/W controllable from PPE MMIO
– Hardware or software TLB management
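
Controlling data movement is the central programming task on this chip: an SPE can only compute on data that is already in its 256KB local store, and its MFC moves that data with asynchronous DMA. As a rough illustration (a minimal sketch, not taken from this presentation; the buffer size, alignment, and tag number are illustrative assumptions), an SPE-side DMA "get" using the spu_mfcio.h intrinsics from the Cell SDK looks roughly like this:

    #include <spu_mfcio.h>

    /* Illustrative local-store buffer: 16KB (the maximum size of a single
       DMA transfer), 128-byte aligned for efficient transfers. */
    #define CHUNK_SIZE (16 * 1024)
    static char buffer[CHUNK_SIZE] __attribute__((aligned(128)));

    /* Fetch CHUNK_SIZE bytes from main memory (effective address ea)
       into the local store and wait for the transfer to finish. */
    void fetch_chunk(unsigned long long ea)
    {
        const unsigned int tag = 3;       /* any DMA tag-group id, 0..31 */

        /* Queue the transfer on this SPE's Memory Flow Controller. */
        mfc_get(buffer, ea, CHUNK_SIZE, tag, 0, 0);

        /* Block until every DMA in this tag group has completed. */
        mfc_write_tag_mask(1 << tag);
        mfc_read_tag_status_all();
    }

On the PPE side, programs typically create, load, and run SPE contexts through libspe2, and because the local store is aliased into the PPE's effective address space (as noted above) it can also be mapped and accessed directly from the PPE.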

IBM BladeCenter QS21 – announced August 28, 2007
IBM BladeCenter QS22 – announced May 13, 2008

IBM BladeCenter QS21
Core electronics:
– Dual 3.2GHz Cell/B.E. processor configuration
– 2GB XDR DRAM (1GB per processor, 18x XDR)
– Dual Gigabit Ethernet (GbE) controllers
– Single-wide blade (uses 1 BladeCenter H slot)
– Optional InfiniBand 4x channel adapters: Cisco Systems 4X InfiniBand HCA Expansion Card for BladeCenter (32R1760)
– Optional Serial Attached SCSI (SAS) daughter card (39Y9190)
Chassis configuration:
– Standard IBM BladeCenter H
– Max. 14 QS21 per chassis
– 2 Gigabit Ethernet switches
– External IB switches required for the IB option: Cisco Systems 4X InfiniBand Switch Module for BladeCenter (32R1756)
Peak performance:
– Up to 460 GFLOPS per blade
– Up to 6.4 TFLOPS (peak) in a single BladeCenter H chassis
– Up to 25.8 TFLOPS in a standard 42U rack
(Block diagram: two Cell/B.E. processors with XDR memory, Rambus XIO and FlexIO links, south bridges, flash/RTC/NVRAM, UART/SPI, PCI-X, PCI-E x16/x8, 2x USB 2.0, GbE, and an optional 2-port IB 4x HCA, connecting to the BladeCenter H midplane and high-speed fabric.)

BladeCenter QS22 – PowerXCell 8i
Core electronics:
– Two 3.2GHz PowerXCell 8i processors
– SP: 460 GFlops peak per blade
– DP: 217 GFlops peak per blade
– Up to 32GB DDR2 800MHz
– Standard blade form factor
– Supports the BladeCenter H chassis
Integrated features:
– Dual 1Gb Ethernet (BCM5704)
– Serial/console port, 4x USB on PCI
Optional:
– Pair of 1GB DDR2 VLP DIMMs as I/O buffer (2GB total) (46C0501)
– 4x SDR InfiniBand adapter (32R1760)
– SAS expansion card (39Y9190)
– 8GB flash drive (43W3934)
(Block diagram: two PowerXCell 8i processors with DDR2 memory, Rambus FlexIO links to IBM south bridges, flash/RTC/NVRAM, UART/SPI, PCI-X, PCI-E x16/x8, 2x USB 2.0, GbE, an optional 2-port IB 4x HCA, and an optional 4x HSC* interface, connecting to the BladeCenter midplane and high-speed fabric.)
*The HSC interface is not enabled on the standard products. This interface can be enabled on "custom" system implementations for clients by working with the Cell services organization in IBM Industry Systems.
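
The peak figures for these blades follow from the per-chip numbers quoted earlier for the PowerXCell 8i (and its Cell/B.E. predecessor). As a rough back-of-the-envelope check — the split between SPE and PPE contributions below is an assumption, based on 4-wide single-precision and 2-wide double-precision fused multiply-add per cycle at 3.2 GHz:

    SP per chip:   8 SPEs x 4 lanes x 2 flops (FMA) x 3.2 GHz = 204.8 GFlops
                   + PPE (VMX): 4 x 2 x 3.2 GHz               =  25.6 GFlops
                   total                                      = 230.4 GFlops
    SP per blade:  2 x 230.4 GFlops                           ~ 460 GFlops
    SP per chassis: 14 blades x 460.8 GFlops                  ~ 6.4 TFlops

    DP per chip (PowerXCell 8i, eDP SPEs):
                   8 SPEs x 2 lanes x 2 flops (FMA) x 3.2 GHz = 102.4 GFlops
                   + PPE (FPU): 2 x 3.2 GHz                   =   6.4 GFlops
                   total                                      = 108.8 GFlops
    DP per blade:  2 x 108.8 GFlops                           ~ 217 GFlops

Four 9U BladeCenter H chassis fit in a standard 42U rack, which is where the quoted ~25.8 TFLOPS per rack comes from.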

IBM to Build World's First Cell Broadband Engine™ Based Supercomputer
Revolutionary hybrid supercomputer at Los Alamos National Laboratory will harness PowerXCell 8i and AMD Opteron™ technology.
– x86 Linux master nodes: 6,912 dual-core AMD Opteron™ chips, ~50 TFlop/s, 55 TB memory
– PowerXCell 8i accelerators: 12,960 PowerXCell 8i chips, ~1,330 TFlop/s, 52 TB memory
– Goal: 1 PetaFlop sustained double-precision floating point; 1.4 PetaFlop peak DP (2.8 SP)
– 216 GB/s sustained file system I/O
– 107 TB of memory
– Roadrunner building blocks: the Roadrunner tri-blade node and the IBM BladeCenter H
– 296 server racks that take up around 5,000 square feet; 3.9 MW power
– Hybrid of AMD Opteron x64 processors (System x3755 servers) and Cell B.E. blade servers connected via a high-speed network
– The modular approach means the master cluster could be made up of any type of system, including Power or Intel

Roadrunner at a Glance
– Cluster of 18 connected units (CUs)
  – 6,912 AMD dual-core Opterons; 49.8 Teraflops peak (Opteron)
  – 12,960 IBM Cell eDP accelerators; 1.33 Petaflops peak (Cell eDP)
  – 1 PF sustained Linpack
  – System-wide GigE network
– InfiniBand 4x DDR fabric
  – 2-stage fat-tree; all-optical cables
  – Full bi-section BW within each CU: 384 GB/s (bi-directional)
  – Half bi-section BW among CUs: 3.45 TB/s (bi-directional)
  – Non-disruptive expansion to 24 CUs
– 80 TB aggregate memory: 28 TB Opteron, 52 TB Cell
– 216 GB/s sustained file system I/O: 216 x 2 10G Ethernets to Panasas
– Software: RHEL & Fedora Linux, SDK for Multicore Acceleration, xCAT cluster management
– 3.9 MW power; 0.35 GF/Watt
– Area: 296 racks, 5,500 ft²

IBM iRT – Interactive Ray-tracer
– Ray-traces images in interactive time
– GDC 07 configuration: seven QS20 blades (3.2 TFLOPS) with SPE partitions across blades 0–6, plus a PS3 head node, connected over a Gigabit network; also shown at GDC, March 2007
– Demo content: a 3-million-triangle urban landscape with >100 textures, rendered at 1080p HDTV resolution
– Composite image including reflection, transparency, detailed shadows, ambient occlusion, and 4x4 jittered multi-sampling, totaling up to 288 rays per pixel; the car model is comprised of 1.6 million polygons and the image resolution is 1080p hi-def
– (Chart: iRT performance scaling, frames/sec versus number of demo blades, 1 to 7)

Cell Architecture

The new reality of microprocessors
Moore's law, the doubling of transistors on a chip every 18 months (top curve), continues unabated. Three other metrics impacting computer performance (bottom three curves) have leveled off since 2002: the maximum electric power a microprocessor can use, the clock speed, and performance (operations) per clock tick. “The biggest reason for the leveling off is the heat dissipation problem.”