Introduction to Cell Broadband Engine Architecture

PRACE Petascale Computing Winter School, 10-13 February 2009, Athens - Greece
Gabriele Carteni, Operations Department, Barcelona Supercomputing Center

A brief history...
• The Cell Broadband Engine Architecture (CBEA) defines a processor structure directed toward distributed processing
• Jointly developed by Sony Computer Entertainment, Toshiba and IBM (the STI Alliance)
• Architecture design and first implementation:
  - 4 years (2001-2005)
  - Over 400 engineers from the STI Alliance
  - Enhanced versions of the design tools for the POWER4 processor
• SOI process shrinking from 90 nm to 65 nm (2007) and then to 45 nm (2008)
• In May 2008, IBM introduced the PowerXCell 8i, a high-performance double-precision floating-point version of the Cell processor

PRACE Winter School, February 2009 Introduction to Cell Broadband Engine™ Architecture

Towards a hardware-accelerated concept
• Moore's law continues unabated.
• Three other metrics affect computer performance:
  - the maximum electric power a microprocessor can use
  - the clock speed
  - performance (operations) per clock tick
• “The biggest reason for the leveling off is the heat dissipation problem. With transistors at 65-nanometer sizes, the heating rate would increase 8 times whenever the clock speed was doubled” - Ken Koch, Roadrunner project (Los Alamos National Laboratory, USA)
  Source: “Roadrunner: Computing in the fast lane”, 1663 Los Alamos Science and Technology Magazine, May 2008

Towards a hardware-accelerated concept
• Specialized co-processors: augment a standard CPU with many (8-32) small, efficient hardware accelerators (SIMD or vector) with private memories visible to applications.
• Storage hierarchy: rewrite code for a highly parallel model with explicit on-chip vs. off-chip memory.
• Applications coded for parallelism: stream data between cores on the same chip to reduce off-chip accesses.

Cell BE Components
• PowerPC Processor Element (PPE)
  - runs the operating system and manages and allocates tasks to the SPEs in a system
  - 64-bit Power Architecture-compliant core
  - compatibility with the Power Architecture provides a base for porting existing software to Cell
• Synergistic Processor Element (SPE)
  - shares the Cell system functions provided by the Power Architecture
  - less complex computational units
  - Single Instruction, Multiple Data (SIMD) capability
  - enables applications that require a higher density of computational units
  - cost-effective processing over a wide range of applications
• Bus subsystem (Element Interconnect Bus, EIB)
  - provides (@3.2 GHz):
    - aggregate main memory bandwidth: ∼25.6 GB/s
    - I/O bandwidth: 35 GB/s (inbound), 40 GB/s (outbound)
    - a fair amount of bandwidth left over for moving data within the processor

Cell BE Components: a closer look at the SPE
• 2 storage domains: main storage and local storage
• Implements a new instruction-set architecture optimized for power and performance on compute-intensive applications
• Operates on a local store memory (256 KB) that holds both instructions and data
• Data and instructions are transferred between the local store and system memory by asynchronous, coherent DMA commands, executed by the memory flow controller (MFC) included in each SPE
• The local store adds a new level to the memory hierarchy beyond the registers, which are what provide local data storage in most processor architectures. This provides a mechanism to combat the “memory wall”.

Cell/B.E.: from consumer products to HPC
[Figure: Cell-based systems spanning the Consumer, Business, Enterprise and HPC markets]
• SCE PlayStation 3 (Cell/B.E. + GPU)
• Toshiba SpursEngine (SPUs + host)
• Sony Cell/B.E. Computing Unit (Cell/B.E. + GPU + AV I/O)
• Mercury 1U Dual Cell
• PowerXCell 8i PCI card (Cell/B.E. + host)
• IBM BladeServer (2 Cell/B.E. or PowerXCell 8i)
• Roadrunner (16,000 PowerXCell 8i + AMD)
• MariCel BSC prototype (144 PowerXCell 8i)

The IBM PowerXCell™ 8i Processor
• CMOS SOI @65 nm
• 9 cores, 10 threads (the PPE runs 2 threads using simultaneous multithreading, SMT)
• 230.4 GFlops peak (SP) @3.2 GHz
• 108.8 GFlops peak (DP) @3.2 GHz
• Up to 25 GB/s memory bandwidth
• Up to 75 GB/s I/O bandwidth
• Die size: 212 mm²
• Maximum power dissipation (est.): 92 W
[Figure: PowerXCell 8i @3.2 GHz die photo]

The IBM BladeCenter® QS22
• Core electronics
  - Two 3.2 GHz PowerXCell 8i processors (announced on May 13, 2008)
  - SP: 460 GFlops peak per blade
  - DP: 217 GFlops peak per blade
  - Up to 32 GB DDR2 800 MHz
  - Standard blade form factor
  - Supports the BladeCenter H chassis
• Integrated features
  - Dual 1 Gb Ethernet (BCM5704)
  - Serial/console port, 4x USB on PCI
  - 2x 1 GB DDR2 VLP DIMMs as I/O buffer (optional)
  - 4x DDR InfiniBand adapter (optional)
  - SAS expansion card (optional)
[Figure: QS22 block diagram: two PowerXCell 8i processors, each with its own DDR2 memory and IBM South Bridge, connected by Rambus® FlexIO™; PCI-E x16/x8 and PCI-X links to the optional 4x DDR InfiniBand HCA, 1 GbE, USB, flash, RTC & NVRAM and the BladeCenter midplane]

MariCel: The PRACE Prototype @BSC

The PRACE Project
• 16 partners from 14 countries
• Establish the PRACE permanent Research Infrastructure as a single legal entity in 2010
• Perform the technical work to prepare operation of the Tier-0 systems in 2009/2010 (deployment and benchmarking of prototypes for Petaflops Tier-0 systems; porting, optimising and peta-scaling of applications)
• Six prototypes for Petaflops systems:
  - IBM Blue Gene/P at FZJ
  - IBM POWER6 at SARA
  - Cray XT5 at CSC (joint proposal with ETHZ-CSCS)
  - IBM Cell/POWER6 at BSC
  - NEC SX9/x86 system at HLRS
  - Intel Nehalem/Xeon IB cluster at FZJ/GENCI-CEA
[Figure: HPC pyramid: Tier-0 European centres, Tier-1 national centres, Tier-2 regional/university centres]
PRACE Winter School, February 2009 MariCel: the PRACE prototype @BSC

MariCel: Specs and Performance
• Vendor/Integrator: IBM
• CPUs: IBM PowerXCell 8i 3.2 GHz + IBM POWER6 4.0 GHz
• 6 BladeCenter H chassis
• 12 QS22 + 2 JS22 per BC-H (84 nodes, 1344 cores)
• 960 GB of memory (12 nodes with 32 GB, 72 nodes with 8 GB)
• InfiniBand 4x DDR direct connection network
• Peak performance: 15.6 Teraflops (LINPACK: 10 Teraflops)
• 870 GB SAN for global file systems (GPFS)
• External SAS: 90 GB per node
• Internal SAS: 146 GB per JS22 node
• Energy consumption: ∼20 kW
• Cooling: air cooling, front to back

MariCel: Hardware overview
• 6 BladeCenter H, each with 2 JS22 (management) and 12 QS22 (computing)
• Hypernode concept: 1 JS22 + 6 QS22 = 1 hypernode
• 3 IBM System Storage DS3200 (1 DS3200 per 2 BC-H)
• 2 IBM System p5 servers:
  - Head node
  - Virtual I/O Server (login node and OS masters)
• 1 System x3650 storage server (Intel Xeon quad-core @2.50 GHz)
• 1 Voltaire ISR2004 96-port 4x DDR InfiniBand switch
• 1 Force10 S50N 48-port Gigabit switch
• 1 10/100/1000 standard Ethernet switch

MariCel: Operational Model
• The MariCel cluster can be characterised as a standard HPC cluster of thin shared-memory nodes
• Key differences from a standard homogeneous cluster:
  - The use of the Cell processor in the compute nodes, where Cell-aware code running on the single PPU core is accelerated by the SPU cores (fine-grain hybrid).
  - The mixture of 2 different node types: the JS22s provide system services and the QS22s run compute tasks.
• In practice, the JS22 nodes can be considered part of the I/O subsystem: they export GPFS over NFS to the QS22s and provide no other services.
• IBM's original design of the MariCel cluster was based on exporting more system services from the JS22 to the QS22s in a hypernode (a project called Virtual PowerXCell Environment)

MariCel: The BladeCenter H
• The BC-H manages a set of 14 blade server nodes, providing power and connectivity via a backplane
• The BC-H is split into 2 halves for power supply
• The prototype has 6 such BladeCenters
• Specification:
  - 4 redundant power supplies, 2900 W each
  - Management Module with Ethernet
  - Nortel Gb Ethernet switch with 6 external ports
  - InfiniBand pass-through module with 14 external ports
  - 2 SAS switches
  - 9U height (4 units in a 42U rack)

MariCel: The InfiniBand Network
• A Voltaire ISR2004 96-port 4x DDR switch runs the InfiniBand network connecting the JS22 and QS22 blade servers and the GPFS file servers
• The network uses a switched-fabric (point-to-point serial link) topology.