Abstract of “Improving performance, energy-efficiency and error-resilience of multicore embedded systems through speculative synchronization mechanisms” by Dimitra Papagiannopoulou, Ph.D., Brown University, May 2016.

Embedded systems are becoming ubiquitous and like their general-purpose counterparts they have embraced the multicore design paradigm. However, embedded systems need to satisfy specific requirements in performance, energy-efficiency and error-resilience. This thesis proposes design techniques based on speculative synchronization mechanisms such as Hardware Transactional Memory (HTM), Speculative Lock Elision (SLE) and Transactional Lock Removal (TLR) to address these issues.

The first part of the thesis introduces Embedded-Spec, an energy-efficient and lightweight implementation for transparent speculation on a shared-bus multicore embedded architecture. A major advantage of Embedded-Spec is that it can be transparently used with lock-based, non-speculative legacy code. An extensive set of experiments over a wide range of parameters shows that compared to traditional locking, Embedded-Spec can improve the energy-delay product to different degrees based on the chosen configuration.

In order to overcome scalability limitations and achieve better performance per Watt, high-end embedded systems are turning to many-core cluster-based NUMA architectures that employ simple scratchpad memories instead of area- and power-hungry data caches. For these types of architectures without caches and cache coherence support, no speculative synchronization design exists. The second part of this thesis introduces the first implementation of HTM for a coherence-free many-core embedded system. The design employs distributed conflict management and resolution for increased scalability. Experiments show that the proposed HTM design can achieve significant performance improvement over traditional locking.

The final part of this thesis explores how HTM can be used beyond data synchronization and specifically as an error-recovery mechanism from variability-induced errors. Two integrated HW/SW schemes are introduced that adaptively scale the supply voltage in order to save energy. These schemes use lightweight checkpointing and roll-back mechanisms adopted from HTM to recover both from intermittent timing errors and catastrophic failures that may occur due to scaling beyond a safe supply voltage. Experiments over a range of operating parameters show that both techniques can achieve significant energy savings at low overhead compared to using conservative voltage guardbands, while guaranteeing forward progress and reliability.

Improving performance, energy-efficiency and error-resilience of multicore embedded systems through speculative synchronization mechanisms

by
Dimitra Papagiannopoulou
M.Sc., Brown University, 2013
M.Sc., University of Patras, Greece, 2014
BSE, University of Patras, Greece, 2008

A dissertation submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in the School of Engineering at Brown University

Providence, Rhode Island
May 2016

© Copyright 2016 by Dimitra Papagiannopoulou

This dissertation by Dimitra Papagiannopoulou is accepted in its present form by the School of Engineering as satisfying the dissertation requirement for the degree of Doctor of Philosophy.

Date ________  R. Iris Bahar, Director

Recommended to the Graduate Council

Date ________  Maurice Herlihy, Reader
Date ________  Sherief Reda, Reader

Approved by the Graduate Council

Date ________  Peter M. Weber, Dean of the Graduate School

Vita

Dimitra Papagiannopoulou was born in 1985 in Athens, Greece and grew up in Patras, Greece. She holds a Bachelor of Science in Engineering (Dipl.-Ing.) from the department of Electrical and Computer Engineering of the University of Patras, a Master of Science degree on “Integrated Software and Hardware Systems” from the department of Computer Science and Engineering of the University of Patras and a Master of Science degree from the department of Electrical Sciences and Computer Engineering of Brown University. Her research interests span the areas of computer architecture, embedded systems, low-power design, multiprocessor synchronization, reliability and variability-aware design.

Acknowledgements

I would like to express my sincere gratitude to my advisor, Prof. Iris Bahar for her continuous support, encouragement and guidance throughout my PhD studies. Prof. Bahar was the reason I chose to attend Brown University. She has been a great mentor to me all these years and I would like to thank her for her active participation in developing me as a researcher. I would also like to thank Prof. Maurice Herlihy for working with me throughout my Ph.D., for his invaluable feedback, support and mentorship. I am grateful to Prof. Sherief Reda, for being in my thesis committee and for his constructive feedback concerning this thesis manuscript.

I would also like to express my gratitude to my research collaborators, Prof. Tali Moreshet, Prof. Luca Benini and Dr. Andrea Marongiu for their great insight, help and feedback. It has been a pleasure working with them.

Many thanks to my present and past colleagues for making the experience at Brown so special. I would like to thank Cesare Ferri, Thomas Carle, Onur Ulusel, Marco Donato, Kumud Nepal, Christopher Picardo, Christopher Harris, Octavian Biris, Kapil Dev, Monami Nowroz and many more. I would also like to thank my friends for always being there for me and for the great times we had together.
Last but not least, I would like to thank my mother Ioanna, my father Angelos, my sister Katerina and my fiancé Sotiris for their love and their continuous support, encouragement and motivation. Without them, I would not be where I am today.

Contents

List of Tables
List of Figures
1 Introduction
2 Background and Previous Work
  2.1 Traditional Locking
  2.2 Speculative Synchronization Mechanisms
    2.2.1 Transactional Memory
    2.2.2 Speculative Lock Elision
    2.2.3 Transactional Lock Removal
    2.2.4 Speculation for Embedded Systems
    2.2.5 Error-resilient and energy-efficient execution on embedded systems
3 Energy-efficient and transparent speculation on embedded MPSoC
  3.1 Embedded-Spec: Speculative Memory Design
  3.2 Architecture
    3.2.1 The Bloom Module Hardware
  3.3 The Embedded-Spec Algorithms
    3.3.1 Embedded-LE
    3.3.2 Embedded-LR
  3.4 Experimental Results
    3.4.1 Benchmarks
    3.4.2 Embedded-LE Parameter Exploration
    3.4.3 Embedded-LR Parameter Exploration
    3.4.4 Speculative Execution vs. Locks
  3.5 Summary and Discussion
4 Speculative Synchronization on Coherence-free Many-core Embedded architectures
  4.1 Target Architecture
  4.2 Transactional Memory Design
    4.2.1 Transactional Bookkeeping
    4.2.2 Data Versioning
    4.2.3 Transaction Control Flow
  4.3 Experimental Results
    4.3.1 Overhead Characterization
    4.3.2 Performance Characterization
    4.3.3 EigenBench
  4.4 Summary and Discussion
5 Transactional Memory Revisited for Error-Resilient and Energy-Efficient MPSoC Execution
  5.1 Motivation
  5.2 Target Architecture
  5.3 Implementation
    5.3.1 Checkpointing and Rollback
    5.3.2 Data Versioning
    5.3.3 Error-Resilient Transactions
    5.3.4 Programming model
  5.4 Experimental Results
    5.4.1 Overhead characterization
    5.4.2 Energy characterization
  5.5 Summary and Discussion
6 Adaptive voltage scaling policies for improving energy savings at near-edge operation
  6.1 Addressing critical and non-critical errors
  6.2 Error policy design
  6.3 The Thrifty uncle/Reckless nephew policy
  6.4 Experimental Results
    6.4.1 Energy consumption
    6.4.2 Overhead characterization
    6.4.3 Energy savings vs. transaction size
  6.5 Summary and Discussion
7 Conclusions and future directions

List of Tables

3.1 EMBEDDED-SPEC: All Configurations.
3.2 Hardware configurations.
3.3 EMBEDDED-SPEC: Top Best two configurations when considering performance only, energy only, or energy-delay product.
4.1 Per-core transactional write footprint for each application.
4.2 Experimental setup for VSoC platform.

List of Figures

2.1 The lock interface.
2.2 Example of transactional events handling (based on the implementation proposed in [1]).
2.3 Classification of TM designs.
2.4 Percentage error rate versus supply voltage for intermittent timing errors and the Critical Operating Point.
2.5 Pipeline augmented with Razor latches and control lines (taken from [2]).
3.1 Logic for Transactional Management used in Embedded-Spec. The architectural configuration is taken from [3]. The dark blocks show the additional hardware required. That is, the Tx bit for each line of the data cache to indicate if the data is transactional, the Tx logic in the cache controller to handle transactional accesses, and the Bloom module to detect and resolve conflicts.
3.2 Modifications to the cache coherence protocol for transactional accesses. The gray block indicates the added operations. Note: The TX decision diamond denotes whether the Tx bit is already set or not.
3.3 Architecture overview, as proposed in [3].
3.4 (a) Overview of the Bloom Module. (b) Internal details of a core Bloom Filter Unit (BFU). Taken from [3].
3.5 The flowchart of the Embedded-LE algorithm.
3.6 Execution time for Embedded-LE and Embedded-LE-Sleep modes.
3.7 Energy Consumption for Embedded-LE and Embedded-LE-Sleep modes.
3.8 Energy Delay Product for Embedded-LE and Embedded-LE-Sleep modes.
3.9 Performance of Embedded-LE and varying maximum number of retries.
3.10 Energy Consumption of Embedded-LE and varying maximum number of retries.
3.11 Energy Delay Product of Embedded-LE and varying maximum number of retries.
3.12 Performance of Embedded-LE-Sleep and varying maximum number of retries.