Accelerator Architectures

Guest Editors' Introduction

ACCELERATOR ARCHITECTURES

Sanjay Patel
Wen-mei W. Hwu
University of Illinois at Urbana-Champaign

Published by the IEEE Computer Society 0272-1732/08/$20.00 © 2008 IEEE
JULY–AUGUST 2008
Authorized licensed use limited to: Politecnico di Milano. Downloaded on January 29, 2009 at 09:01 from IEEE Xplore. Restrictions apply.

We are entering the golden age of the computational accelerator. The commercial accelerator space is vibrant with activity from semiconductor vendors, large and small, that are designing accelerators for graphics, physics, network processing, and a variety of other applications. System vendors are introducing tools and programming systems to lower the barriers to entry for software development on their platforms. We are already seeing the initial stream of applications that benefit from these accelerators, and there are definite signs that more are yet to come. The research space is blossoming with very broad, multidisciplinary activity in advanced research and development on new classes of accelerator architecture and on applications that tap into their power.

It is our honor to serve as coeditors of this special issue of IEEE Micro on accelerator architectures. There is much to be said about this area, and the authors of our articles have provided a good sampling of the commercial and academic work in this emerging area.

As with any emerging area, the technical boundaries of the field are not well established. The challenges and problems associated with acceleration are still formative. The commercial potential is not generally accepted. There is still the question, "If we build it, who will come?"

In this introductory article, let us spend some time trying to define the area of computational acceleration, discuss some of the architectural trade-offs, clarify some of the issues, drawbacks, and advantages of applications development on accelerator architectures, and try to articulate who might come if we build it.

What is an accelerator?

Let us first attempt to better define the notion of an accelerator. An accelerator is a separate architectural substructure (on the same chip, or on a different die) that is architected using a different set of objectives than the base processor, where these objectives are derived from the needs of a special class of applications. Designed this way, the accelerator is tuned to provide higher performance at lower cost, or at lower power, or with less development effort than the general-purpose base hardware. Depending on the domain, accelerators often bring greater than a 10× advantage in performance, cost, or power over a general-purpose processor. It's worth noting that this definition is quite broad, covering everything from special-purpose function units added to a base processor, to computational offload units, to separate, special processors added to the base platform. Examples of accelerators include floating-point coprocessors, graphics processing units (GPUs) that accelerate the rendering of a vertex-based 3D model into a 2D viewing plane, and accelerators for the motion estimation step of a video codec.

We view an accelerator as an augmentation of a base-class, general-purpose, or commodity system. As such, the accelerator is added to the system to achieve greater functionality or performance. Figure 1 shows the general system architecture of an accelerator, where the accelerator is attached to the base processor via an interconnect. Many variations of this model are possible, including the accelerator connected via a system bus such as PCI Express or HyperTransport. This is a relatively low-cost path, given the commodity support for these protocols, but, because these buses are intended to support a wide variety of devices, they are typically of modest performance and capability. PCI Express 2.0, for example, provides up to 16 Gbytes/s of bandwidth at microseconds of latency.

Figure 1. Several generalized accelerator system architectures.

Some accelerator domains require tighter coupling between the accelerator and the processor, and for such domains other models may be more appropriate. The accelerator can be attached via the processor bus, such as a front-side bus, where the accelerator would sit in a processor-like socket in the system, as is the case with AMD Torrenza.¹ This model can provide higher bandwidth at lower latency, and tighter integration than the system bus (for example, direct access to processor memory, possibly with coherence activity), but at a higher cost, which might not be feasible for cost-sensitive markets.

Even tighter coupling is possible with the accelerator directly on the same die as the processor, as is done in the CellBE processor. The choice of a particular integration model depends on the nature of the domain, the volume of chips its market can support, and the general cost of the solution it can bear. In this view, the accelerator can be thought of as a heterogeneous extension to the base platform.

Accelerators can have macroarchitectures that span from fixed-function, special-purpose chips (early generations of graphics chips were of this variety) to highly programmable engines tuned to the needs of a particular domain (latest-generation graphics chips are of this variety). The choice of macroarchitecture is driven primarily by the diversity of computation in the domain requiring the accelerator. The more varied the computation, the more programmable the accelerator will need to be.

Generally speaking, accelerator architectures maximize throughput per unit area of silicon (or, depending on product and technology constraints, throughput per watt) by invariably exploiting parallelism via fine-grained data-parallel hardware. Almost all exploit some form of multiword single-instruction, multiple-data (SIMD) operation, as SIMD hardware can achieve better throughput per area than a more general parallelism model; however, this passes some optimization cost on to the developer.

Various memory models are employed: some accelerators have software-managed memories, some have hardware-managed caches, some have direct hardware support for interprocessor communication, and some have very high-bandwidth channels to external memory. Accelerators also tend to use specialized, fixed-function hardware for frequent, regular computation. When contrasted with general-purpose CPU architectures, which are optimized for low latency and a richer application programming model, the fine-grained parallel accelerator architectures appear more akin to digital signal processors (DSPs) and early RISC processors, which moved the performance burden out of the hardware and into the software layers.

Viewed more from a market-driven perspective, accelerators arise because small sets of economically relevant applications demand more performance or more functionality than the base platform can provide. The economics of the situation justify the inclusion of additional hardware. History has taught us that this is a precarious [...] law reduces the cost of performance in the CPU. Eventually, the market for the accelerator erodes, and the accelerator dies. Examples include floating-point coprocessors, audio DSPs for PCs, and video decode acceleration chips. A sustainable accelerator model requires an application domain where "too much performance is never enough."

Application domains

Although it would be highly presumptuous of us to attempt to articulate future applications that will demand acceleration, we can examine a few domains that demand it today. Graphics, gaming, network processing (which includes TCP offload, encryption, deep packet inspection, intelligent routing, IPTV, and XML processing), and video encoding are the well-known spaces in which commercial accelerator chips for improving system performance are generally viable. Markets exist for specialized chips for image processing and for specialized functional blocks in SoCs for mobile devices (to improve overall performance per watt). Scientific computing, oil and gas exploration, and financial modeling have also been strong markets in which the accelerator model has provided value, particularly as more computation is done on an interactive, client-side basis to drastically reduce the delay to discovery or decision making.

Why does the acceleration model work well for these domains? Primarily, these domains fit the mold in which "too much performance is never enough." Additional performance from the base platform is too costly (in dollars or in watts) or not presently possible; thus, a customized solution makes economic sense.

From a more technical perspective, these domains are amenable to an accelerator-based [...]